Genome sequencing technologies have improved steadily while costs have dropped, making sequencing much easier to perform.
Such advances in gathering, storing and processing genomic data have become significant to the scientific world.
Researchers from the University of Illinois and the Cold Spring Harbor Laboratory conducted a data science study and found that data gathered from DNA sequencing requires far more computing power and storage capacity than previously anticipated.
Findings of the study were published in the online journal PLOS Biology.
"The only way to handle this data deluge will be to improve the computing infrastructure for genomics," stated Gene Robinson, an entomology professor and director of the Carl R. Woese Institute for Genomic Biology at the University of Illinois.
In the comparative study, the researchers looked at the data needs of genomics and compared them with those of three dominant players in the Big Data world: astronomy, Twitter and YouTube. They predicted that, by 2025, genomics will demand more data acquisition, storage and distribution capacity than the other three.
Astronomy requires complex computation and analysis. It gathers and generates a huge amount of data, but much of the processing happens at the time of data gathering, so later analysis requires somewhat less time and computational power.
In genomics, a whole genome can keep yielding unanticipated insights long after the data are gathered. New ideas could emerge as more people's DNA is sequenced. The researchers say genomics could require more time and computational power, even if it adopts a processing system similar to the one used in astronomy.
Twitter and YouTube, on the other hand, place heavy demands on distribution. Their data come from many users or sources. The data as a whole, however, are not as complex, since they follow standard and specific formats. In genomics, data can come from many different sources in many different formats. Storage and distribution are thus more complex.
Sequence data have to be analyzed through sophisticated and computationally intensive algorithms against a myriad of biological data before arriving at an important clinical finding.
Robinson further said that genomics will lead to some of the "most severe computational changes that we have ever experienced."
Scientists at the Cold Spring Harbor Laboratory named genomics the "biggest beast in the Big Data forest."