OpenBayes version 0.2: Issues for discussion
These are the main issues to be discussed during the Skype meeting that will take place on Sunday 11 March 2007 at 2pm Central America time (9-10pm Central European time).
OpenBayes
Issues for version 0.2
This is a list of the subjects I would like to discuss with the people interested in participating in the development of OpenBayes. This document is divided into three categories:
Issues that have to do with distributing the workload among the users
Strictly programming issues: class specification, naming conventions, ...
New features: Gaussians, SEM, EM, ML, augmented BNs, ...
Distributing the work in a horizontal way among the (active) users
At the moment, I am the only active person with access to the SVN server (some other people also have access but have stopped contributing to OpenBayes). There are currently two ways of contributing:
A user tries something, finds a bug and reports it. I then have to ask him to isolate the bug as much as possible, after which I can track it down, correct it, commit the files and advise everybody to download the latest version.
A user writes a new feature (e.g. augmented BNs, network visualization, LearnMLParameters, ...). I then have to test the new code, make sure that everything works correctly and that no new bugs are introduced (a very difficult task...), commit the code and advise everybody to download the latest version.
This way of working is not very productive: I spend most of my time correcting bugs and integrating new pieces of code. So here is my proposal.
There should be a page on the website that explains how to proceed when a bug is to be reported, new code has to be integrated, etc. The point of this page would be to avoid explaining the same things to each new user who is willing to contribute. It should be a sort of checklist: what to do before reporting a bug, how to report a bug, what to do before writing some code, and how to submit the code yourself.
This document should be accompanied by full documentation of each class and function in OpenBayes, as well as some documentation about the way OpenBayes works (explaining what a distribution, a potential, an inference engine, etc. are). By full documentation I mean that the code should have pydoc strings for each function and class, but we should also provide an introductory guide to the code. Indeed, most users are not expert programmers who will immediately understand what is going on; they will need some guidance. This will save us from explaining the same things to different users, encourage new users to contribute and give a serious boost to code quality. For example, we could offer a tutorial on contributing a new distribution type, based on a simple example (such as a boolean discrete variable).
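To make the tutorial idea concrete, here is a minimal sketch of what a fully documented boolean distribution could look like. The class name BooleanDistribution and its methods probability and sample are hypothetical and chosen only for illustration; they are not the existing OpenBayes API.

```python
import random


class BooleanDistribution:
    """Distribution of a single boolean variable.

    Hypothetical tutorial class: it is NOT part of OpenBayes and only
    illustrates the level of documentation proposed in this document.

    p_true is the probability that the variable is True and must lie
    in the interval [0, 1].
    """

    def __init__(self, p_true):
        """Store p_true after checking that it is a valid probability."""
        if not 0.0 <= p_true <= 1.0:
            raise ValueError("p_true must be between 0 and 1")
        self.p_true = p_true

    def probability(self, value):
        """Return P(variable == value) for a boolean value."""
        return self.p_true if value else 1.0 - self.p_true

    def sample(self, rng=random):
        """Draw one boolean sample.

        rng is any object with a random() method returning a float in
        [0, 1); the standard random module is used by default.
        """
        return rng.random() < self.p_true
```

With docstrings like these in place, running help(BooleanDistribution) in the interpreter, or the pydoc tool on the module, shows the documentation without opening the source file.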
To contribute new features, a new user needs SVN access. For this he must create a BerliOS account, and then I can add him as a developer in OpenBayes. I believe that a good way to work is to create a new branch each time a new feature is being developed (a branch is a copy of the OpenBayes files). Once the coding of the new feature is done, a short period of testing should become standard practice, to avoid regressions in code quality. This means that when you download OpenBayes, you can choose between the latest working version and a new version with an added feature that could potentially contain some bugs. Once two or three users have tested the new feature successfully, it can be included in the main OpenBayes release with a relatively small probability of errors. We should also ask ourselves how the versioning system will work. I'm not aware of an automatic versioning system; maybe you can help me with that.
Another very important thing is to have a page that explains the theory behind the code. We should provide a sort of tutorial on the inference engines that are used, along with references to articles or sites where the user can find more information on the subject. Such a page, coupled with the full documentation, will make the code easy and attractive to understand, because people will learn about Bayesian networks while they play around with OpenBayes. This page could be a wiki for registered users. I was thinking of adding a license to this wiki under which the text belongs to the authors who wrote it and can be freely distributed if the authors have given their consent. If this wiki is well done, it could lead (in a couple of years) to the publication of a book or a scientific article (BNT did more or less the same thing, but it was written only by Kevin Murphy).
Let's summarize what I just explained:
A page explaining how an active user should proceed depending on what he wants to do (report a bug, contribute a new feature, ...).
Full documentation of the classes and functions. This includes pydoc strings inside the code and a web page explaining the basic principles of OpenBayes (this class does that and is inherited by that one in order to ...).
Allow multiple versions of OpenBayes to be downloaded: a main stable version and some testing versions with new features.
Wiki page explaining the theory of Bayesian Networks.
All this clearly means that the web site will change substantially. Any help on this point will be greatly appreciated.
Version numbers ?
Strictly Programming Issues
In this section I will present the things I'd like to see done in the next version of OpenBayes, which should be available soon (in one or two months at most!). This section is not independent of the previous one, and many of the things I'd like to do go in the same direction as what I explained above.
Review the names of classes, functions and variables and propose a uniform naming scheme that follows the Python PEP 8 style guide: http://www.python.org/dev/peps/pep-0008
The class inheritance hierarchy should be revisited for distributions, potentials, inference engines and learning engines. After this revision, a UML diagram of the classes should be created. (Any ideas about which program to use for generating UML diagrams? SPE is very limited in that respect.)
Add clear and complete pydoc strings to each class and function.
Review the test cases for each file and create a general test file that should be used to make sure that new features do not break the code.
The first two points should be addressed with top priority! These steps will make the application more stable and provide a solid basis for future development. It would be suicide to continue adding new features without making sure that the foundations of OpenBayes are solid enough to support them. If we do this well, then once the class specifications are defined we should be able to add Gaussian distributions with no change at all to the rest of the code. This is what I'm looking for, because it will open the door to people contributing new distribution types, new inference engines, etc.
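As an illustration of the goal just described, the following sketch shows one possible shape for such a class specification, together with a general test that every distribution type must pass. All names here (Distribution, likelihood, example_values, log_likelihood, DistributionContractTest) are assumptions made for this example and do not describe the current OpenBayes classes.

```python
import math
import unittest
from abc import ABC, abstractmethod


class Distribution(ABC):
    """Hypothetical common interface for all distribution types."""

    @abstractmethod
    def likelihood(self, value):
        """Return the probability (or density) of value."""

    @abstractmethod
    def example_values(self):
        """Return a few valid values, used by the generic tests."""


class DiscreteDistribution(Distribution):
    """Discrete distribution built from a table of unnormalized counts."""

    def __init__(self, counts):
        total = float(sum(counts.values()))
        self.table = {key: count / total for key, count in counts.items()}

    def likelihood(self, value):
        return self.table.get(value, 0.0)

    def example_values(self):
        return list(self.table)


class GaussianDistribution(Distribution):
    """New type: only this class is added, nothing else changes."""

    def __init__(self, mean, std):
        self.mean = mean
        self.std = std

    def likelihood(self, value):
        z = (value - self.mean) / self.std
        return math.exp(-0.5 * z * z) / (self.std * math.sqrt(2 * math.pi))

    def example_values(self):
        return [self.mean - self.std, self.mean, self.mean + self.std]


def log_likelihood(distribution, observations):
    """Generic code: it relies only on the Distribution interface."""
    return sum(math.log(distribution.likelihood(x)) for x in observations)


class DistributionContractTest(unittest.TestCase):
    """General test applied to every registered distribution type."""

    instances = [
        DiscreteDistribution({True: 3, False: 1}),
        GaussianDistribution(0.0, 1.0),
    ]

    def test_likelihood_is_non_negative(self):
        for dist in self.instances:
            for value in dist.example_values():
                self.assertGreaterEqual(dist.likelihood(value), 0.0)

    def test_generic_code_accepts_every_type(self):
        for dist in self.instances:
            result = log_likelihood(dist, dist.example_values())
            self.assertTrue(math.isfinite(result))
```

In this sketch, log_likelihood never needs to know which kind of distribution it receives, so adding GaussianDistribution requires no change to it. A contributor would add an instance of the new type to the instances list and run the file with python -m unittest to check that nothing else broke.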
New features
Actually, this version will not bring any really new features. We will mostly try to make all the currently available features usable. Take, for example, the SEM algorithm, which works for one user out of two... :-) This is clearly a problem of integration and of bad class specifications, which must be avoided if we want to have a serious application in the near future.
Here is a list of the things that should be integrated in the new version:
EM and SEM Learning
new MCMC engine with importance sampling
Gaussian Distributions and Potentials (this is the only really new feature; I believe it is an important one because it clearly proves that OpenBayes is mature)
Augmented Bayesian Networks
Visualizing a graph
new MLLearning
GUI
...(is that all?)
License
At the moment OpenBayes is distributed under the Lesser GPL license, which means that OpenBayes can be integrated into proprietary software on the condition that the code of OpenBayes (but not the rest of the proprietary program) is distributed together with the program. I would like to change this to the full GPL license, which forbids including OpenBayes in proprietary programs. Normally it is illegal to change licenses like that, but since OpenBayes is still young I do not believe that anybody will sue me. However, this also means that if the license is to change, it should be done right now and not later!
Why change the license? Well, I wouldn't like applications like MS Office Help Assistant to start using OpenBayes for annoying the user. :-) Seriously: if this program has been created through the free, unpaid contributions of its users, I find it more than fair that it not be used by people who will earn money from our work. If they're willing to sell a product that uses Bayesian networks, they should pay their programmers to create it. For example, imagine that somebody creates a super GUI for using OpenBayes and starts selling it: a small amount of work and major profits...
Some last notes
During the Skype meeting that will take place on Sunday at 2pm Central America time (9pm Central European time), I would like to review all these points. Any criticism, ideas or proposals are more than welcome. Please do not hesitate to tell me something like: 'this will never work, it's a bad idea!'.
What I'm trying to do is to implement the most difficult parts of OpenBayes and leave the easy parts for the common users. I think that this is the fastest way to work in an open-source project (correct me if I'm wrong).
I do not believe that OpenBayes belongs to me. It belongs to anybody who is willing to use it. The thing is that at the moment I have a lot of work on my shoulders that could be distributed in a more productive and equitable way among the active users.
If you're willing to participate actively in the OpenBayes project, then this Skype meeting will be the occasion for each of us to decide which tasks we would most enjoy and, on that basis, to decide who does what.
This is an open-source project, which means that nobody is forced to work. No deadlines are imposed. Nobody gets angry if something is not done. Nobody is the chief. On the contrary, everybody decides, everybody launches initiatives, and everybody works whenever it suits him best.
I hope that you agree with my general principles and I'm looking forward to talking with you all on Sunday.
Kosta Gaitanis