Learning the parameters of a BN from incomplete data

by Francois de Bergeyck last modified 2007-07-02 13:43

How to learn the parameters of a BN with known structure from a set of incomplete observations. To use this algorithm, please download the learning.zip file from the development section. That file contains the Numarray and excellreader packages that my algorithms need. The version of OpenBayes bundled in that file is not the latest one; my algorithms work with that version, but not necessarily with any other.

Create the set of observations

We first have to import the OpenBayes package and some other packages (such as copy) that we will need:

from OpenBayes import learning
from copy import deepcopy
from time import time
import random

Then we generate our observations by sampling the Water-Sprinkler network. Please note that this is a tutorial; in real life you don't generate your observations this way. Usually the set of observations is already available (from an experiment, for example). We then delete some of the data so that we work with incomplete observations. Unknown values are marked with '?'.

# first create a bayesian network
from WaterSprinkler import *

N = 2000
# sample the network N times
cases = G.Sample(N) # cases = [{'c':0,'s':1,'r':0,'w':1},{...},...]
# delete some observations: one randomly chosen value in every third case
for i in range(500):
    case = cases[3*i]
    rand = random.sample(['c','s','r','w'],1)[0]
    case[rand] = '?'
# second pass over the first 50 of those cases, so some of them may end up with two unknown values
for i in range(50):
    case = cases[3*i]
    rand = random.sample(['c','s','r','w'],1)[0]
    case[rand] = '?'

Now cases contains 2000 dictionaries, each one of them containing one sampled value (or '?' if the value is unknown) for each node in the Water-Sprinkler BN.
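Before learning, it can be useful to check how many values are actually missing for each node. The following is a minimal sketch in plain Python that does not use OpenBayes; the helper count_missing and the small example_cases list are hypothetical stand-ins for the sampled data above.

```python
from collections import Counter

def count_missing(cases, missing='?'):
    """Return a Counter mapping each variable name to its number of missing entries."""
    counts = Counter()
    for case in cases:
        for name, value in case.items():
            if value == missing:
                counts[name] += 1
    return counts

# Hypothetical stand-in for the sampled data: six complete cases, then three deletions.
example_cases = [{'c': 0, 's': 1, 'r': 0, 'w': 1} for _ in range(6)]
example_cases[0]['s'] = '?'
example_cases[3]['w'] = '?'
example_cases[4]['s'] = '?'

missing_counts = count_missing(example_cases)
print(dict(missing_counts))
```

Calling count_missing(cases) on the real list of 2000 cases should work the same way, since it only relies on the list-of-dictionaries layout described above.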

If the data is stored in an Excel file, you can use the ReadFile function (see "Learn these parameters!" below). That function returns a list of n dictionaries, one for each of your n cases. Put a '?' in the Excel file wherever a value is unknown.
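If the data is available as plain text rather than as a spreadsheet, the same list-of-dictionaries layout can be built by hand. The sketch below uses Python's standard csv module, not ReadFile; the function name rows_to_cases, the column names, and the assumption that node states are integers are all illustrative choices, not part of OpenBayes.

```python
import csv
import io

def rows_to_cases(text, missing='?'):
    """Parse CSV text with a header row into a list of case dictionaries.

    Empty cells and cells holding the missing marker become the marker;
    every other cell is converted to int (node states are assumed to be integers).
    """
    cases = []
    for row in csv.DictReader(io.StringIO(text)):
        case = {}
        for name, cell in row.items():
            cell = cell.strip()
            case[name] = missing if cell in ('', missing) else int(cell)
        cases.append(case)
    return cases

# Hypothetical three-case table; the second and third rows each lack one value.
sample_text = "c,s,r,w\n0,1,0,1\n1,?,1,1\n0,0,,0\n"
csv_cases = rows_to_cases(sample_text)
print(csv_cases)
```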

Create a BN with no parameters

# copy the BN
G2 = deepcopy(G)
# set all parameters to 1s
G2.InitDistributions()

Now we have a copy of the structure of the original network, but all of its parameters are set to 1.

Learn these parameters!

# Learn the parameters from the set of cases
engine = learning.EMLearningEngine(G2)
# cases = engine.ReadFile('file.xls') # To use the data of file.xls
t = time()
engine.EMLearning(cases, 10)
print 'Learned from %d cases in %1.3f secs' %(N,(time()-t))

# print the learned parameters
for v in G2.all_v:
    print v.name, v.distribution.cpt,'\n'

EMLearning uses the Expectation-Maximisation algorithm to learn the parameters.
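To give an idea of what Expectation-Maximisation does with '?' values, here is a minimal sketch for a hypothetical two-node network A -> B with binary nodes. It is written in plain Python and is not the OpenBayes implementation; the names joint and em_step, the toy data, and the uniform starting point are assumptions made for illustration. In the E-step each incomplete case is spread over its possible completions, weighted by the current parameters; in the M-step the parameters are re-estimated from those expected counts.

```python
from itertools import product

MISSING = '?'

def joint(a, b, p_a, p_b):
    """P(A=a, B=b) for the two-node network A -> B."""
    pa = p_a if a == 1 else 1.0 - p_a
    pb = p_b[a] if b == 1 else 1.0 - p_b[a]
    return pa * pb

def em_step(cases, p_a, p_b):
    """One EM iteration: expected counts (E-step), then re-estimation (M-step)."""
    counts = {(a, b): 0.0 for a, b in product((0, 1), repeat=2)}
    for case in cases:
        # All completions (a, b) consistent with the observed values of this case.
        options = [(a, b) for a, b in counts
                   if case['a'] in (MISSING, a) and case['b'] in (MISSING, b)]
        weights = [joint(a, b, p_a, p_b) for a, b in options]
        total = sum(weights)
        for option, w in zip(options, weights):
            counts[option] += w / total
    n = float(len(cases))
    new_p_a = (counts[(1, 0)] + counts[(1, 1)]) / n
    new_p_b = {a: counts[(a, 1)] / (counts[(a, 0)] + counts[(a, 1)])
               for a in (0, 1)}
    return new_p_a, new_p_b

# Hypothetical toy data: six complete cases and two incomplete ones.
toy_cases = [
    {'a': 1, 'b': 1}, {'a': 1, 'b': 1}, {'a': 1, 'b': 0},
    {'a': 0, 'b': 0}, {'a': 0, 'b': 0}, {'a': 0, 'b': 1},
    {'a': '?', 'b': 1}, {'a': 1, 'b': '?'},
]

# Start from a uniform guess and iterate a fixed number of times.
p_a, p_b = 0.5, {0: 0.5, 1: 0.5}
for _ in range(20):
    p_a, p_b = em_step(toy_cases, p_a, p_b)
print(p_a, p_b)
```

When every case is complete, a single call to em_step reduces to ordinary frequency counting, which is a quick way to check the sketch.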

Note: the commented-out third line, "cases = engine.ReadFile('file.xls')", loads the cases from an Excel file instead of using the sampled ones.

Save the results

You can easily save the results in a text file:

engine.SaveInFile('test.txt',G2,engine.BNet,engine)
