Thursday, March 27, 2008

Wednesday, March 19, 2008

The Biosciences in Google's Summer of Code

The Google Summer of Code project participants have been selected. I scanned the list to see how projects specifically aimed at the biosciences and bioinformatics fared:

  • GenMAPP (Gene Map Annotator and Pathway Profiler), a tool for visualizing gene expression data on top of graphical representations of biological pathways.
  • The NESCent (National Evolutionary Synthesis Center) Phyloinformatics project, which has a range of potential projects to do with phylogenetic analysis, covering things like phyloXML integration with BioPerl and BioRuby, phyloinformatics web services and tree analysis using MapReduce (with Hadoop).
  • OMII-UK, which covers a range of tools including the Taverna Workbench for workflow design and execution.
  • Also participating is OpenMRS, a medical record system aimed at developing countries.
There are also at least two platforms for cluster, parallel or grid computing on the list; I spotted the Globus Toolkit and OAR, but there are probably a few more in that broad category (e.g., OMII-UK oversees a bunch of Grid-related projects too).

It's worth noting that I've ignored a bunch of really important pieces of software that are less field-specific, but are actually lower-level components of the platforms critical for most large bioinformatics projects. Things like Python, Perl, R, various Open Source databases, and collaboration tools like wikis (MoinMoin) and CMSs (e.g. Drupal) are also participating.

I don't think coding for bioinformatics applications is as attractive to students as working on some of the other "sexier" projects available (e.g. the Second Life client, or the Apache web server), but kudos to Google for letting a few bioinformatics tools into the fray. Hopefully the students who hack on them learn something, and hone their coding skills (you never know, they may even help improve these tools too :) ).

Monday, March 03, 2008


I see Lars Juhl Jensen has come up with a fun tag cloud of recently popular buzzwords in the biosciences. He calls it a BuzzCloud. The buzzword from the cloud I've noticed most lately is "Quantitative Proteomics" ... quantitation is a good goal for the field of proteomics to aim for, since IMHO it doesn't really deserve the -omics suffix. "Omics" tends to imply the possibility of global proteome coverage, which proteomic studies rarely, if ever, achieve. But enough of the side-rants.

The way Lars' BuzzCloud is constructed, by extracting phrases ending in -ics, -ology, -omy, -phy, -chemistry, -medicine, -sciences and so on, reminded me of a stupid little CGI application I wrote a few years back ... the Biotech company name generator. When you take common prefixes like "Gene-", "Pept-" or "Chemi-" and suffixes like "-omics" or "-agen", it's amazing how often Googling the resulting name turns up a real honest-to-goodness biotech company.
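For the curious, the prefix-plus-suffix trick boils down to something like the rough Python sketch below. This is not the original CGI script, just a minimal illustration of the idea: the function names `all_names` and `random_name` are made up for this example, and the lists only contain the prefixes and suffixes mentioned above.

```python
import itertools
import random

# Illustrative lists only: these are the prefixes and suffixes quoted in
# the post, not the hardcoded list from the original script.
PREFIXES = ["Gene", "Pept", "Chemi"]
SUFFIXES = ["omics", "agen"]


def all_names(prefixes=PREFIXES, suffixes=SUFFIXES):
    """Return every prefix + suffix combination, in list order."""
    return [p + s for p, s in itertools.product(prefixes, suffixes)]


def random_name(prefixes=PREFIXES, suffixes=SUFFIXES, rng=random):
    """Join one randomly chosen prefix with one randomly chosen suffix."""
    return rng.choice(prefixes) + rng.choice(suffixes)


if __name__ == "__main__":
    # Print one candidate company name per run.
    print(random_name())
```

With three prefixes and two suffixes there are only six possible names, so the fun really depends on how long the lists get.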

Feel free to comment on any "biotechie" suffixes and prefixes that I should add ... the hardcoded list in the script isn't that long.