Flat Files to Databases: For Better Speed, Integration and Sharing

In an ordinary dictionary, a word can be sought in two different ways:
  1. Use the index and locate your word of choice, or
  2. Start with the first word and keep going, one by one, until you get there.
Obviously, the first way is the smart way. But when it comes to real-world, organised data, most of us choose the second way: we read flat files line by line and write back to them, even when the task is repetitive. A Relational Database Management System (RDBMS) queried through SQL, such as MySQL, SQLite or PostgreSQL, is well suited to such tasks, yet many bioinformaticians make little use of one.
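
The two ways of looking up a word can be sketched in Python with only the standard library. This is a minimal illustration, not a benchmark: the accessions and sequences are invented, and the names find_in_flat_file, find_in_database and sequences.tsv are hypothetical. The table name protein_sequences follows the SQL example used later in this post.

```python
import os
import sqlite3
import tempfile

# Hypothetical records: (accession, sequence) pairs invented for illustration.
records = [("P%05d" % i, "MKT" + "A" * (i % 7)) for i in range(1000)]


def find_in_flat_file(path, accession):
    """Second way: read line by line until the accession turns up."""
    with open(path) as handle:
        for line in handle:
            acc, seq = line.rstrip("\n").split("\t")
            if acc == accession:
                return seq
    return None


def find_in_database(conn, accession):
    """First way: let the primary-key index locate the row."""
    row = conn.execute(
        "SELECT sequence FROM protein_sequences WHERE id = ?", (accession,)
    ).fetchone()
    return row[0] if row else None


workdir = tempfile.mkdtemp()

# Write the same records to a tab-separated flat file ...
flat_path = os.path.join(workdir, "sequences.tsv")
with open(flat_path, "w") as handle:
    for acc, seq in records:
        handle.write(acc + "\t" + seq + "\n")

# ... and to an SQLite table; PRIMARY KEY makes SQLite build an index on id.
conn = sqlite3.connect(os.path.join(workdir, "sequences.db"))
conn.execute("CREATE TABLE protein_sequences (id TEXT PRIMARY KEY, sequence TEXT)")
conn.executemany("INSERT INTO protein_sequences VALUES (?, ?)", records)
conn.commit()

flat_hit = find_in_flat_file(flat_path, "P00999")
db_hit = find_in_database(conn, "P00999")
print(flat_hit == db_hit)
```

Both functions return the same sequence. The difference is that the flat-file version touches every line before the match, while the database version goes through the index.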


The use of databases can be intimidating without formal training in database management, but this picture has changed to a great extent with the advent of Object-Relational Mapping (ORM) frameworks. ORMs provide language-specific, object-oriented access to databases, bringing database handling into the comfort zone of the user's object-oriented language of choice. For example, to access a sequence in the database, one calls a lookup method on the mapped object, and the ORM issues an SQL command at the back end such as:
SELECT * FROM protein_sequences WHERE id='P22725'
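
To make the idea concrete, here is a minimal hand-rolled sketch of what an ORM does behind the scenes. It is not the API of SQLObject or any other real framework: the class ProteinSequence, its get method and the stored sequence are all assumptions made up for illustration, built directly on Python's standard sqlite3 module.

```python
import sqlite3


class ProteinSequence:
    """Hypothetical mapped class: one instance per row of protein_sequences."""

    connection = None  # shared sqlite3 connection, assigned before use

    def __init__(self, id, sequence):
        self.id = id
        self.sequence = sequence

    @classmethod
    def get(cls, accession):
        # The SQL statement is hidden behind an ordinary method call.
        row = cls.connection.execute(
            "SELECT id, sequence FROM protein_sequences WHERE id = ?",
            (accession,),
        ).fetchone()
        if row is None:
            raise KeyError(accession)
        return cls(*row)


# In-memory database holding one row; the sequence is an invented placeholder,
# not the real sequence of this accession.
ProteinSequence.connection = sqlite3.connect(":memory:")
ProteinSequence.connection.execute(
    "CREATE TABLE protein_sequences (id TEXT PRIMARY KEY, sequence TEXT)"
)
ProteinSequence.connection.execute(
    "INSERT INTO protein_sequences VALUES (?, ?)", ("P22725", "MKTAYIAKQR")
)

protein = ProteinSequence.get("P22725")
print(protein.id, protein.sequence)
```

The caller works with a plain object and its attributes and never writes the SELECT statement. A real ORM generalises this pattern to every table, column and query type.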
Another hassle of database handling is server setup and maintenance. This can be reduced to a great extent by adopting a flexible, server-less and fully embeddable database engine, such as SQLite or Berkeley DB. The remaining operations of creating, modifying and deleting databases, tables and rows are taken care of by the ORM. Popular ORMs include SQLObject (Python), DBIx::Class (Perl) and Hibernate (Java), all of which are open source and easy to adopt.
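
As a rough sketch of the server-less workflow, the snippet below uses Python's built-in sqlite3 module. The whole database is a single file and no server process is started. The file name lab.db, the samples table and its rows are hypothetical. The statements are written by hand here; an ORM would generate comparable SQL for the same create, modify and delete operations.

```python
import os
import sqlite3
import tempfile

# Hypothetical database file; connecting creates it, with no server to set up.
db_path = os.path.join(tempfile.mkdtemp(), "lab.db")
conn = sqlite3.connect(db_path)

# Create a table.
conn.execute(
    "CREATE TABLE samples (id INTEGER PRIMARY KEY, name TEXT, organism TEXT)"
)

# Create rows (invented example data).
conn.execute("INSERT INTO samples (name, organism) VALUES (?, ?)", ("s1", "E. coli"))
conn.execute("INSERT INTO samples (name, organism) VALUES (?, ?)", ("s2", "yeast"))

# Modify a row.
conn.execute(
    "UPDATE samples SET organism = ? WHERE name = ?", ("S. cerevisiae", "s2")
)

# Delete a row.
conn.execute("DELETE FROM samples WHERE name = ?", ("s1",))
conn.commit()

remaining = conn.execute("SELECT name, organism FROM samples").fetchall()
conn.close()
print(remaining, os.path.exists(db_path))
```

Because the database lives in one ordinary file, it can be copied, backed up or shared like any other data file.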

Today, data is integrated from multiple sources and in complex ways. This vast amount of information needs to be extracted in a reasonable way and channelled into manageable, biologically meaningful outcomes, for example for medical applications. A database system handles the data efficiently and at the same time makes it easily accessible through web applications, which makes it well suited to scientific data sharing.