Codefest 2010

From Open Bioinformatics Foundation

OpenBio Codefest 2010 will take place July 7th and 8th, 2010 in conjunction with BOSC 2010. This is an opportunity for OpenBio developers from projects like BioPerl, BioJava, Biopython, BioRuby, and EMBOSS to work collaboratively on improving Open Source Bioinformatics code.


OpenBio projects are typically coordinated remotely, with contributors from all over the world organizing themselves through mailing lists and IRC. Most contributors work on these projects in their spare time, balancing project work with their day jobs and life away from the computer. The objective of the Codefest is to give these talented developers a chance to be fully focused on the projects for a few days, interacting in real time. Previous Hackathons have been immensely successful at producing new high quality code and innovative project developments.

The general aim of the Codefest is to improve the accessibility, functionality and interoperability of the existing libraries. The specific goals are determined by the interests of attending members and input from sponsors. Some current topics of discussion are:

Cloud computing

Improving the presence of OpenBio libraries in distributed computing environments like Amazon Elastic Compute Cloud and Eucalyptus. Ntino has written up an excellent project proposal, available for download in PDF format.

Initial work has started on an automated build environment that incorporates the Cloud BioLinux and bioperl-max efforts. See the blog post for full details. Code and configuration files are available from a GitHub repository. The post outlines several areas for improvement which could be targets for focused work at the Codefest.

Semantic Web

The 3rd DBCLS BioHackathon focused on Semantic Web technologies in bioinformatics. As a result, in addition to UniProt, several database providers including DDBJ, PDBj and KEGG have started to generate their data in RDF. This Linked Data can be queried with SPARQL, and the Biopython and BioRuby groups made initial attempts to provide a high-level library for biological queries. We propose to continue this challenge with all OpenBio projects, with the aim of a standard interface (query builder, ontology mapping, etc.) for querying major biological SPARQL endpoints and handling RDF files.

To achieve this goal, we also need to develop an integrated/distributed triple store such as BioGateway. In our experience, generating and storing large-scale RDF triples is still a major issue, even with standard triple stores. Additionally, we will try to convert biological queries in natural language to SPARQL using NLP technology.
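To make the "query builder" idea concrete, here is a minimal sketch in Python using only the standard library. It is not the interface of any OpenBio project: the function names `build_select` and `endpoint_url`, the `ex:` vocabulary, and the endpoint address are all hypothetical placeholders. The only outside fact it relies on is that the SPARQL protocol accepts a query as the `query` parameter of a GET request.

```python
from urllib.parse import urlencode


def build_select(variables, patterns, prefixes=None, limit=None):
    """Assemble a SPARQL SELECT query string from simple parts.

    variables: names without the leading '?'
    patterns:  (subject, predicate, object) triples, already in SPARQL syntax
    prefixes:  mapping of prefix name to namespace IRI
    """
    lines = []
    for name, iri in (prefixes or {}).items():
        lines.append(f"PREFIX {name}: <{iri}>")
    lines.append("SELECT " + " ".join(f"?{v}" for v in variables))
    lines.append("WHERE {")
    for subject, predicate, obj in patterns:
        lines.append(f"  {subject} {predicate} {obj} .")
    lines.append("}")
    if limit is not None:
        lines.append(f"LIMIT {int(limit)}")
    return "\n".join(lines)


def endpoint_url(endpoint, query):
    """Encode the query as a GET request URL (SPARQL protocol 'query' parameter)."""
    return endpoint + "?" + urlencode({"query": query})


query = build_select(
    variables=["protein", "label"],
    patterns=[
        ("?protein", "rdf:type", "ex:Protein"),
        ("?protein", "rdfs:label", "?label"),
    ],
    prefixes={
        "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
        "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
        "ex": "http://example.org/bio#",  # hypothetical vocabulary, for illustration only
    },
    limit=10,
)
print(query)
# Hypothetical endpoint address; substitute a real SPARQL endpoint to run the query.
print(endpoint_url("http://example.org/sparql", query))
```

A shared interface across the Bio* projects would presumably add ontology mapping and result parsing on top of something like this; the sketch only covers assembling and encoding the query.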


  • July 7, 2010 (10am to whenever) -- Countway Library of Medicine at the Harvard Medical School, 10 Shattuck St Boston [1]
    • Getting there: take the E-line green line train (direction Heath St) to the Brigham Circle stop. Cross over to the right side of the street and walk back a bit in the direction from which you came. There is a passageway directly before the Harvard School of Public Health; walk down the steps into the courtyard area behind the Harvard School of Public Health, and continue straight back through the courtyard and up the steps on the other side. Turn left at the top of the stairs and you will see the Countway Library on your left. When you arrive, the security guard should have your name, but if not, tell him you are attending the Hackathon and he will let you through. If you get lost use the PDF map as a guide.
  • July 8, 2010 (10:30am to whenever) -- Massachusetts General Hospital (MGH), Simches Research Building, Room 3130, 185 Cambridge Street, Boston, MA
    • Getting there: The three closest MBTA stops are 'Charles/MGH' (Red Line), 'Bowdoin' (Blue line) and 'Government Center' (Green line). All three are located next to Cambridge Street. From the Red Line stop, walk away from the river; walk towards the river from either the Blue or Green line. Simches is located slightly off Cambridge Street -- the closest landmark on Cambridge Street is a big yellow Au Bon Pain. Walk up the stairs next to the Au Bon Pain and you'll be in a parking lot: the Simches building is located straight in front of you across the parking lot, to the left of the Whole Foods and CVS. Walk in the lobby and the security desk should have a badge for you. You can take the elevators up to the 3rd floor. Room 3130 is down the only non-secured corridor and is on the left side just past the cafeteria.


  • Amazon Web Services -- We will be able to use EC2 and other Amazon services thanks to Amazon grant support for a proposal put together by Steffan Moellar. Many thanks to Amazon for supporting the initiative and Steffan for putting together the application.
  • Eucalyptus -- The Universitätsklinikum Schleswig-Holstein is sponsoring a 12 node local Eucalyptus cloud for testing and development. Thanks again to Steffan for making this available.


Space and internet for the Codefest are kindly provided by the Harvard School of Public Health Bioinformatics Core and Massachusetts General Hospital. We are actively seeking sponsors to help supplement the travel, lodging and meal costs for developers. If you're interested in contributing to Open Source development in Bioinformatics and helping to direct the focus of the Codefest, please contact Brad.

ToDo List

Add your goals and plans for the Codefest here. This is a brainstorming section to help us organize ourselves.

Cloud computing

Work on the current community bioinformatics image (framework on GitHub):

  • Perl library support and useful package list
  • Java library organization and an expanded list of useful packages
  • Provide packaging for missing programs. See comments at the end of the Package config for some targets.
  • Documentation: especially targeted at new users.
  • Produce an automated manifest for an AMI, listing versions of all installed packages and libraries.
  • Provide standard data like indexed next-gen genomes, blast databases, and so on via EBS snapshots.
  • Automation to build AMI and roll out to Amazon on a bi-weekly/monthly basis based on latest code.
  • Website with documentation, AMI history.
  • Testing on Eucalyptus clouds.
  • Titus Brown's post on using cloud computing to teach a next-gen sequencing course.
  • Richard Holland's post on getting started with Amazon EC2.
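As a starting point for the automated manifest item in the list above, the following Python sketch collects package names and versions into a sorted, tab-separated listing. It assumes a Debian or Ubuntu based image where `dpkg-query` is available; the function names and the sample package data are hypothetical, and libraries installed outside the package manager (CPAN, RubyGems and so on) would need their own collectors.

```python
import subprocess


def parse_package_list(text):
    """Turn 'name<TAB>version' lines into a {name: version} dict."""
    packages = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        name, _, version = line.partition("\t")
        packages[name.strip()] = version.strip()
    return packages


def installed_debian_packages():
    """Ask dpkg for installed packages (assumes a Debian/Ubuntu image)."""
    result = subprocess.run(
        ["dpkg-query", "-W", "--showformat=${Package}\t${Version}\n"],
        check=True,
        capture_output=True,
        text=True,
    )
    return parse_package_list(result.stdout)


def format_manifest(packages):
    """Render the manifest as sorted 'name<TAB>version' lines."""
    return "\n".join(f"{name}\t{packages[name]}" for name in sorted(packages))


# Made-up sample data so the sketch runs without dpkg present.
sample = "tool-b\t2.0-1\ntool-a\t1.4.2-3\n"
print(format_manifest(parse_package_list(sample)))
```

On an actual image, `format_manifest(installed_debian_packages())` would produce the listing, which could then be written to a file published alongside each AMI build.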

Suggested Additions for Cloud computing image






  • UniRef
    • UniRef50 and UniRef90, and if you've got the space, UniRef100, too.

Semantic Web


  • Include SQLite support in BioSQL
  • DBIx::Class integration with BioSQL
  • A bit on Moose and Perl 6

Key signing


Feel free to add yourself if you are interested. We are happy to have you.


After two days of hard work, there will be a celebratory BBQ at Brad's house in Somerville the evening of July 8th. All are welcome for drinks and whatever magic I can whip up on my little charcoal grill.

The easiest way to get there is via cab. From Mass General Hospital, walk up Cambridge Street a few blocks to the Liberty Hotel where there is a cab stand. Ask the cab driver to take you to Medford Street in Somerville, via McGrath Highway. Partridge Avenue is located off Medford Street, on the right a few blocks after Central Street. I'll pass out my cell phone number to everyone during the coding sessions in case more directions are needed en route.


We welcome any thoughts from interested participants. Please direct discussion to the OpenBio mailing list: open-bio-l@lists.open-bio.org.

For short-lived coordination tasks during the hackathon, an IRC channel has been set up on FreeNode: #codefest

Please use the hashtag #bosc2010 on Twitter to help remote folks follow the discussion.