Discussion
first steps in OpenBayes
hmmm....
It seems you have reported this in the right section: this does look like a bug.
Using the MCMCEngine gives the correct answer for Irene but NaNs for Henry, so we probably have two separate bugs:
- one in the JTree
- one in the MCMC
I'll check it out as soon as I have some spare time, probably before the end of the week.
Thanks, Ivan, for pointing out this error. Bugs like this one are becoming rarer and rarer, but unfortunately some still persist.
I'll keep in touch.
kosta
Kosta
I've done a bit of work locating the part of the network which might be triggering the bug. It seems to be in the edge list. Everything works fine with just one or two generations (i.e., only nodes d,e,f,g,h,i & j).
With the fourth generation (i.e., nodes a, b, c, anon_i & anon_k), the fact that a is the parent of three children seems to trigger the bug. If I change the edge list so that a is the parent of only two nodes (as below), the results come out fine; if I change it again so that c is the parent of three (e.g., change (a,d) to (c,d)), the fault reappears.
Hope this is useful.
edgeList = [(a,f), (anon_i,f),
            (a,d), (b,d),
            (c,e), (b,e),
            (c,g), (anon_k,g),
            (f,h), (d,h), (e,i), (g,i),
            (h,j), (i,j)
            ]
for edge in edgeList:
    G.add_e( DirEdge( len( G.e ), *edge ) )
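Ivan's observation suggests a quick pre-check: count how many children each parent has in a candidate edge list. The sketch below is a hypothetical helper, not part of OpenBayes. It uses plain strings for node names instead of BVertex objects, and the threshold of three children comes only from the pattern reported here, not from any known property of the junction tree code.

```python
from collections import Counter

def parents_with_many_children(edge_list, threshold=3):
    """Return {parent: child_count} for parents with at least `threshold` children.

    `edge_list` holds (parent, child) pairs; names are plain strings here,
    a hypothetical stand-in for the BVertex objects used with OpenBayes.
    """
    counts = Counter(parent for parent, _child in edge_list)
    return {p: n for p, n in counts.items() if n >= threshold}

# Edge list from bn1.py: 'a' is the parent of d, f and g.
original = [('anon_i', 'f'), ('a', 'f'), ('a', 'd'), ('a', 'g'),
            ('b', 'd'), ('b', 'e'), ('c', 'e'), ('anon_k', 'g'),
            ('f', 'h'), ('d', 'h'), ('e', 'i'), ('g', 'i'),
            ('h', 'j'), ('i', 'j')]
# Modified edge list: 'a' and 'c' have two children each.
modified = [('a', 'f'), ('anon_i', 'f'), ('a', 'd'), ('b', 'd'),
            ('c', 'e'), ('b', 'e'), ('c', 'g'), ('anon_k', 'g'),
            ('f', 'h'), ('d', 'h'), ('e', 'i'), ('g', 'i'),
            ('h', 'j'), ('i', 'j')]

print(parents_with_many_children(original))  # {'a': 3}
print(parents_with_many_children(modified))  # {}
```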
Some example results:
edgeList = [(a,f), (anon_i,f),
(a,d), (b,d),
(a,e), (b,e),
(c,g), (anon_k,g), ...
Henry: [ 0.17193194, 0.82806808]
Irene: [ 1., 0.]
edgeList = [(c,f), (anon_i,f),
(c,d), (b,d),
(c,e), (b,e),
(a,g), (anon_k,g), ...
Henry: [ 0.17193194, 0.82806808]
Irene: [ 1., 0.]
edgeList = [(a,f), (anon_i,f),
(a,d), (b,d),
(c,e), (b,e),
(a,g), (anon_k,g), ...
Henry: [ 1., 0.]
Irene: [ 0.29795879, 0.70204121]
edgeList = [(c,f), (anon_i,f),
(c,d), (b,d),
(a,e), (b,e),
(c,g), (anon_k,g), ...
Henry: [ 1., 0.]
Irene: [ 0.29795879, 0.70204121]
edgeList = [(a,f), (anon_i,f),
(c,d), (b,d),
(a,e), (b,e),
(a,g), (anon_k,g), ...
Henry: [ 0.32798684, 0.67201316]
Irene: [ 1., 0.]
edgeList = [(c,f), (anon_i,f),
(a,d), (b,d),
(c,e), (b,e),
(c,g), (anon_k,g), ...
Henry: [ 0.32798684, 0.67201316]
Irene: [ 1., 0.]
edgeList = [(c,f), (anon_i,f),
(a,d), (b,d),
(a,e), (b,e),
(a,g), (anon_k,g), ...
Henry: [ 1., 0.]
Irene: [ 0.17193195, 0.82806808]
edgeList = [(a,f), (anon_i,f),
(c,d), (b,d),
(c,e), (b,e),
(c,g), (anon_k,g), ...
Henry: [ 1., 0.]
Irene: [ 0.17193195, 0.82806808]
Best wishes
Ivan
OK, now it's clearer.
I think I have an idea of what it could be. I'll check it out and report back later.
kosta
Please keep me posted. It might be an excuse for me to look at the code myself.
In the meantime, I've been trying JavaBayes ([1], recommended to me by Kevin Murphy). If it's any consolation, JavaBayes is buggy too. It works OK if you mark up your network beforehand (in 'BIF' format), but if you change the network at all in the GUI, the probabilities start to go pear-shaped. So JavaBayes is no more usable than OpenBayes.
Best
Ivan
[1] http://www.cs.cmu.edu/~javabayes/
I used PNL for some time.
My main impressions:
- the project is huge!
- there is almost no documentation
- there is no support (the project has been abandoned, I think?)
But if you manage to make it work, it's fast and full of features.
good luck!
kosta
This bug has been corrected.
Please download the new OpenBayes version available on BerliOS.
The problem was in the triangulation algorithm of the Junction Tree: some edges were not added, which led to incorrect dependencies between the variables of the Junction Tree.
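Some background on this step: a junction tree is built from the moral graph, where the parents of each child are joined and edge directions are dropped. That graph is then triangulated by eliminating nodes one at a time and adding a "fill-in" edge between every pair of the eliminated node's remaining neighbours that are not already connected. If some fill-in edges are skipped, the graph is not chordal, and the cliques of the junction tree can miss dependencies, which is consistent with the symptom reported in this thread. The sketch below is a generic illustration of moralization and elimination-based triangulation in plain Python 3. It is not the OpenBayes code, the function names are hypothetical, and the elimination order is picked by hand rather than by a heuristic such as min-fill, so it adds more fill-ins than necessary.

```python
from itertools import combinations

def moralize(parents):
    """Moral graph: drop directions and 'marry' the parents of each child.

    `parents` maps child -> tuple of parents; returns {node: set(neighbours)}.
    """
    adj = {}
    def link(u, v):
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    for child, ps in parents.items():
        adj.setdefault(child, set())
        for p in ps:
            link(p, child)
        for p, q in combinations(ps, 2):
            link(p, q)
    return adj

def triangulate(adj, order):
    """Eliminate nodes in `order`, adding fill-in edges as we go.

    Returns (chordal adjacency dict, list of fill-in edges added).
    """
    chordal = {v: set(ns) for v, ns in adj.items()}
    remaining = {v: set(ns) for v, ns in adj.items()}
    fill_ins = []
    for v in order:
        nbrs = sorted(remaining[v])
        # Connect every pair of v's not-yet-eliminated neighbours.
        for p, q in combinations(nbrs, 2):
            if q not in remaining[p]:
                for graph in (remaining, chordal):
                    graph[p].add(q)
                    graph[q].add(p)
                fill_ins.append((p, q))
        # Drop v from the part of the graph still to be eliminated.
        for n in nbrs:
            remaining[n].discard(v)
        del remaining[v]
    return chordal, fill_ins

# Stud farm network from bn1.py (child -> parents).
stud_farm = {'d': ('a', 'b'), 'e': ('b', 'c'), 'f': ('anon_i', 'a'),
             'g': ('a', 'anon_k'), 'h': ('f', 'd'), 'i': ('e', 'g'),
             'j': ('h', 'i')}
order = ['j', 'anon_i', 'anon_k', 'c', 'h', 'i', 'f', 'd', 'e', 'g', 'a', 'b']
chordal, fill_ins = triangulate(moralize(stud_farm), order)
print(fill_ins)
```

A different elimination order gives a different set of fill-ins; what matters is that every fill-in required by the chosen order is actually added.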
Thank you Ivan for reporting this error.
Kosta
Well done Kosta! That was a fast bugfix.
A small side remark: the new copy function of the Graph class doesn't support subclasses of Vertex and DirEdge that have extra variables, since it copies only the known variables (e.g. v.name, v.discrete and v.nvalues for Vertex, passed through __init__).
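To illustrate the point with hypothetical stand-in classes (not the actual OpenBayes Vertex): a copy that rebuilds each vertex through `__init__` from a fixed list of known attributes returns a plain base-class object and drops anything a subclass added, whereas Python's `copy.copy` keeps the subclass and its whole instance `__dict__`. Written for Python 3.

```python
import copy

class Vertex:
    """Hypothetical stand-in for a graph vertex (not the OpenBayes class)."""
    def __init__(self, name, discrete=True, nvalues=2):
        self.name = name
        self.discrete = discrete
        self.nvalues = nvalues

class LabelledVertex(Vertex):
    """A user subclass with an extra attribute."""
    def __init__(self, name, label, **kwargs):
        super().__init__(name, **kwargs)
        self.label = label

def copy_via_init(v):
    # Rebuilds from known fields only: the subclass and `label` are lost.
    return Vertex(v.name, v.discrete, v.nvalues)

def copy_generic(v):
    # Shallow copy: same class, all instance attributes carried over.
    return copy.copy(v)

v = LabelledVertex('Henry', label='stallion')
a, b = copy_via_init(v), copy_generic(v)
print(type(a).__name__, hasattr(a, 'label'))   # Vertex False
print(type(b).__name__, b.label)               # LabelledVertex stallion
```

For a whole graph, `copy.deepcopy` is the usual choice, since its memo dictionary makes the copied edges point at the copied vertices rather than at the originals.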
Also, in the BNet.add_e() function, could the test
"if e.__class__.__name__ == 'DirEdge'"
be replaced by something like
"if isinstance(e, graph.DirEdge)"?
The first doesn't allow subclasses of DirEdge in the net.
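The difference between the two tests can be seen with a hypothetical DirEdge stand-in (not the OpenBayes class; its `(name, v1, v2)` constructor only mirrors how `DirEdge(len(G.e), *edge)` is called in bn1.py) and a user subclass. Written for Python 3.

```python
class DirEdge:
    """Hypothetical stand-in for graph.DirEdge (not the OpenBayes class)."""
    def __init__(self, name, v1, v2):
        self.name, self.v1, self.v2 = name, v1, v2

class WeightedEdge(DirEdge):
    """A user subclass adding a weight."""
    def __init__(self, name, v1, v2, weight=1.0):
        super().__init__(name, v1, v2)
        self.weight = weight

def accepts_by_name(e):
    # The class-name test quoted from BNet.add_e(): rejects subclasses.
    return e.__class__.__name__ == 'DirEdge'

def accepts_by_isinstance(e):
    # The proposed test: true for DirEdge and any subclass of it.
    return isinstance(e, DirEdge)

e = WeightedEdge(0, 'Henry', 'John', weight=0.5)
print(accepts_by_name(e), accepts_by_isinstance(e))  # False True
```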
Kind regards,
wannes
Wannes,
I see that you are implementing a lot of stuff using OpenBayes. Interesting! So here is what I propose:
- The changes you mention will be included in the next release (probably this week or next), together with the new SEM algorithm and a brand-new sampling inference engine that uses importance sampling.
However, since these changes don't concern my own work, nothing guarantees that your code will still work afterwards. Some other small changes may be needed, and it could become time-consuming for me to keep up with every user who implements something.
So here is my second proposal:
- I can give you SVN access to the OpenBayes repository, so that I will be better informed of what you are doing. With SVN access you will also be able to make such small changes directly in the public version of OpenBayes.
You can create a new branch and implement all your work there without affecting anyone else's.
Having SVN access makes you an active developer of OpenBayes, so you can influence the future of this project. It really depends on your commitment and willingness to pursue this work. If you don't want that responsibility, that's fine, but you will have to tell me which changes I should make to the code to make it more generic. Both solutions are fine with me, although I prefer the first one.
By the way, speaking of all this: I'm preparing a new version of OpenBayes for next month, so maybe we could meet on Skype one day to discuss which directions we should take. Anyone else who is interested is also welcome.
Tell me what you think.
Kosta
Excellent and well done!
I've installed the new version and the results all come out right. I'll write up the stud farm example and upload it somewhere accessible.
Ivan
I've uploaded some notes on this stud farm example to
http://www.iau.ukfsn.org/bnadg/bnadg.html . I'll keep it up to date as I work through the book. Comments welcome, as they say. One day it will be a blog.
Ivan
Very nice explanatory example.
Just for your information, you can create any kind of page on the OpenBayes site if you like. Just go to your Members folder and add a page, a folder or anything else. Once you've done this, I can move the new page into a publicly available folder.
Since the plan is to create a platform not only for computing Bayesian networks but also for learning the theory behind them, I would really appreciate it if you created the page on this site.
Anyway, the choice is yours. Let me know what you think.
kosta
First: is this the right place for this discussion? If not, apologies; please redirect me.
I have just started studying Bayesian networks, using the textbook "Bayesian Networks and Decision Graphs" (Jensen, 2001). I need some BN software to help me with the examples. As I have some Python, I thought I would try OpenBayes.
The attached script is my first attempt. I've followed the first two tutorials on setting up and using inference in the water sprinkler network to get myself going. So far I'm finding the tutorials and the code itself very readable and usable.
The script codes up Jensen's stud farm example (p.46f): basically, four generations of horses, the ancestors of the horse John. It turns out that John is sick. None of the other horses are sick, but some may be carriers. We want to find the probability of each horse being a carrier given that John is sick.
Lines 25-37 represent the initial probabilities, copied from tables in the book. The following lines show P(John | Henry, Irene):
# family are [carrier, pure]
# john is [sick, carrier, pure]
j.distribution[{'Henry':0,'Irene':0}]=[0.25, 0.5, 0.25]
j.distribution[{'Henry':0,'Irene':1}]=[0.0, 0.5, 0.5]
j.distribution[{'Henry':1,'Irene':0}]=[0.0, 0.5, 0.5]
j.distribution[{'Henry':1,'Irene':1}]=[0.0, 0.0, 1.0]
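The numbers in these tables can be reproduced from Mendelian inheritance of a single recessive allele, which is the setting of Jensen's example. Each parent passes on one of its two alleles with probability 1/2. A pure horse carries no copy of the allele, a carrier one, and a sick horse two. John's table is the raw offspring distribution. The table for the other horses (0.67/0.33 etc. in the script) is the same distribution conditioned on the horse not being sick, since none of them is. The sketch below is plain Python 3 with hypothetical function names, not part of OpenBayes.

```python
from fractions import Fraction
from itertools import product

# Number of copies of the recessive allele each genotype carries.
ALLELES = {'sick': 2, 'carrier': 1, 'pure': 0}
GENOTYPE = {2: 'sick', 1: 'carrier', 0: 'pure'}

def gamete_probs(genotype):
    """{1: P(passes on the recessive allele), 0: P(does not)}."""
    p = Fraction(ALLELES[genotype], 2)
    return {1: p, 0: 1 - p}

def offspring(parent1, parent2):
    """Offspring distribution {genotype: probability} for two parents."""
    dist = {'sick': Fraction(0), 'carrier': Fraction(0), 'pure': Fraction(0)}
    for (x, px), (y, py) in product(gamete_probs(parent1).items(),
                                    gamete_probs(parent2).items()):
        dist[GENOTYPE[x + y]] += px * py
    return dist

def offspring_not_sick(parent1, parent2):
    """Offspring distribution [carrier, pure] conditioned on not being sick."""
    d = offspring(parent1, parent2)
    z = d['carrier'] + d['pure']
    return [d['carrier'] / z, d['pure'] / z]

for p1, p2 in [('carrier', 'carrier'), ('carrier', 'pure'), ('pure', 'pure')]:
    d = offspring(p1, p2)
    print(p1, p2,
          [float(d[k]) for k in ('sick', 'carrier', 'pure')],
          [float(x) for x in offspring_not_sick(p1, p2)])
```

The carrier x carrier row gives [1/4, 1/2, 1/4], matching John's table, and [2/3, 1/3] once sickness is ruled out, which the script rounds to [0.67, 0.33].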
This should imply that if John is sick (=0), then both Henry and Irene must be carriers (=0). Jensen reckons both parents are definitely carriers. However, the final lines in the script use the JoinTree inference engine to get values for Henry and Irene given an observation of John = 0. They do not give the values I expected:
Multinomial Distribution for node : Henry
Conditional Probability Table (CPT) :
array([ 1., 0.], type=Float32)
Multinomial Distribution for node : Irene
Conditional Probability Table (CPT) :
array([ 0.29795879, 0.70204121], type=Float32)
So Henry is definitely a carrier, but Irene probably isn't. Where have I gone wrong?
Any help gratefully received.
Best wishes
Ivan
bn1.py
# BN for Stud Farm network (Jensen, 2001, 2.2.2, p.46f)
from OpenBayes import BNet, BVertex, DirEdge, JoinTree
G = BNet( 'Stud Farm Bayesian Network' )
family = 'Ann Brian Cecily Dorothy Eric Fred Gwenn Henry Irene I K'.split()
a, b, c, d, e, f, g, h, i, anon_i, anon_k = [G.add_v( BVertex(nm, True, 2))
                                             for nm in family]
j = G.add_v(BVertex('John', True, 3))
for edge in [(anon_i,f), (a,f), (a,d), (a,g), (b,d), (b,e), (c,e), (anon_k,g),
             (f,h), (d,h), (e,i), (g,i),
             (h,j), (i,j)
             ]:
    G.add_e( DirEdge( len( G.e ), *edge ) )
print G # network same shape as in Jensen p.49
G.InitDistributions()
# family are either carrier or pure
# john is sick, carrier or pure
for w in [anon_i, a, b, c, anon_k]:
    w.setDistributionParameters([0.01, 0.99])
for w in [d, e, f, g, h, i]:
    w.distribution[:,0,0]=[0.67, 0.33]
    w.distribution[:,0,1]=[0.5, 0.5]
    w.distribution[:,1,0]=[0.5, 0.5]
    w.distribution[:,1,1]=[0.0, 1.0]
j.distribution[{'Henry':0,'Irene':0}]=[0.25, 0.5, 0.25]
j.distribution[{'Henry':0,'Irene':1}]=[0.0, 0.5, 0.5]
j.distribution[{'Henry':1,'Irene':0}]=[0.0, 0.5, 0.5]
j.distribution[{'Henry':1,'Irene':1}]=[0.0, 0.0, 1.0]
# John is sick
ie = JoinTree(G)
ie.SetObs({'John': 0})
# if john is sick, both parents must be carriers
for name in ['Henry', 'Irene']:  # 'name', not 'f', so the vertex f is not overwritten
    print ie.Marginalise(name)
# prints:
# Multinomial Distribution for node : Henry
# Conditional Probability Table (CPT) :
# array([ 1., 0.], type=Float32)
# Multinomial Distribution for node : Irene
# Conditional Probability Table (CPT) :
# array([ 0.29795879, 0.70204121], type=Float32)
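Because the network is small (11 binary horses, with John observed), the expected answer can be cross-checked without any junction tree by summing over all 2^11 joint states. This is a minimal sketch in plain Python 3 that does not use OpenBayes. The CPT values are copied from bn1.py, and `posterior_given_john` is a hypothetical helper name.

```python
from itertools import product

# Brute-force exact inference over the stud farm network of bn1.py.
# Horses: 0 = carrier, 1 = pure.  John: 0 = sick, 1 = carrier, 2 = pure.
FOUNDERS = ['Ann', 'Brian', 'Cecily', 'I', 'K']
PARENTS = {                      # child -> (parent, parent), as in bn1.py
    'Dorothy': ('Ann', 'Brian'),
    'Eric': ('Brian', 'Cecily'),
    'Fred': ('I', 'Ann'),
    'Gwenn': ('Ann', 'K'),
    'Henry': ('Fred', 'Dorothy'),
    'Irene': ('Eric', 'Gwenn'),
}
HORSES = FOUNDERS + list(PARENTS)
PRIOR = [0.01, 0.99]
CHILD_CPT = {(0, 0): [0.67, 0.33], (0, 1): [0.5, 0.5],
             (1, 0): [0.5, 0.5], (1, 1): [0.0, 1.0]}
JOHN_CPT = {(0, 0): [0.25, 0.5, 0.25], (0, 1): [0.0, 0.5, 0.5],
            (1, 0): [0.0, 0.5, 0.5], (1, 1): [0.0, 0.0, 1.0]}

def posterior_given_john(john_state):
    """Return {horse: [P(carrier | John), P(pure | John)]} by enumeration."""
    totals = {h: [0.0, 0.0] for h in HORSES}
    evidence = 0.0                                      # P(John = john_state)
    for states in product((0, 1), repeat=len(HORSES)):  # 2**11 joint states
        s = dict(zip(HORSES, states))
        p = 1.0
        for h in FOUNDERS:
            p *= PRIOR[s[h]]
        for child, (p1, p2) in PARENTS.items():
            p *= CHILD_CPT[(s[p1], s[p2])][s[child]]
        p *= JOHN_CPT[(s['Henry'], s['Irene'])][john_state]
        evidence += p
        for h in HORSES:
            totals[h][s[h]] += p
    return {h: [t / evidence for t in totals[h]] for h in HORSES}

posterior = posterior_given_john(0)      # evidence: John is sick
for name in ['Henry', 'Irene']:
    print(name, posterior[name])         # both: [1.0, 0.0]
```

The same dictionary also holds the posterior for every other horse, which gives reference values to compare against the JoinTree output (allowing for Float32 rounding).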