Nucleic acids such as DNA and RNA are very long, thread-like polymers made up of a linear chain of monomers called nucleotides. They carry the genetic instructions used in the growth, development, functioning and reproduction of organisms. Understanding the genome of an organism yields insights of scientific and clinical significance, such as the causes that drive cancer progression, the intra-genomic processes that influence evolution, and ways to improve the quality and quantity of food from plants and animals. Genomics is projected to become the largest producer of big data within the decade, eclipsing all other sources of data generation, including astronomical and social data. At the same time, genomics is expected to become an integral part of daily life, providing insight into, and control over, many of the processes taking place within our bodies and in our environment. An exciting prospect is personalised medicine, in which accurate diagnostics identify patients who can benefit from precisely targeted therapies.
Despite the continual development of tools to process genomic data, current approaches have yet to meet the requirements of large-scale clinical genomics, where patient turnaround time, ease of use, robustness and running costs are critical. As the cost of whole-genome sequencing continues to drop, ever more data is being generated, creating a staggering computational demand. Efficient and cost-effective computational solutions are therefore necessary for society to benefit from the potential positive impact of genomics. This research addresses these high computational demands with efficient solutions based on the quantum computing paradigm.
The length of genomes varies greatly among organisms; the human genome, for example, is approximately 3.289 × 10⁹ base pairs (bp) long. Sequencing machines cannot read a sequence of this length in a single pass. Instead, multiple copies of the DNA/RNA are broken into fragments, which are then sequenced using modern technologies such as Illumina that produce reads of approximately 50 to 150 bp each. These short strings are then stitched back together, a process called sequence reconstruction. Genome sequence reconstruction is primarily done using one of two techniques: (i) de novo assembly of reads, and (ii) ab initio alignment of reads to a reference genome.
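To make the fragment-and-reconstruct pipeline concrete, here is a toy, classical Python sketch of both strategies. It is not the quantum approach this research develops. De novo assembly is illustrated with a greedy overlap merge (a simple, well-known heuristic), and reference alignment with an exact-match k-mer seed lookup. The function names, the 30 bp toy genome, the evenly spaced reads and the parameters (read_len=10, step=4, min_overlap=5, k=5) are all hypothetical. Real reads are 50 to 150 bp, contain sequencing errors, and come from genomes with long repeats that defeat greedy merging.

```python
from __future__ import annotations

import random

# Hypothetical toy genome: no 5-mer occurs twice, so overlaps >= 5 bp are unambiguous.
GENOME = "ATGCGTACCTTAGGCATCGAAGTCCGATTG"


def shotgun_reads(genome: str, read_len: int, step: int, seed: int = 0) -> list[str]:
    """Cut a genome into overlapping fixed-length reads (hypothetical helper).

    Real sequencers sample fragments at random positions; here reads start
    every `step` bases so that coverage is guaranteed in this toy example.
    """
    if read_len > len(genome):
        raise ValueError("read_len must not exceed the genome length")
    starts = list(range(0, len(genome) - read_len + 1, step))
    if starts[-1] != len(genome) - read_len:
        starts.append(len(genome) - read_len)  # make sure the last base is covered
    reads = [genome[s:s + read_len] for s in starts]
    random.Random(seed).shuffle(reads)  # the original order of the reads is lost
    return reads


def overlap(a: str, b: str, min_len: int) -> int:
    """Length of the longest proper suffix of `a` equal to a prefix of `b` (0 if < min_len)."""
    for length in range(min(len(a), len(b)) - 1, min_len - 1, -1):
        if a.endswith(b[:length]):
            return length
    return 0


def greedy_assemble(reads: list[str], min_overlap: int) -> list[str]:
    """De novo sketch: repeatedly merge the pair of contigs with the largest overlap."""
    contigs = list(dict.fromkeys(reads))  # drop duplicate reads, keep order
    while len(contigs) > 1:
        best_len, best_i, best_j = 0, -1, -1
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    olen = overlap(a, b, min_overlap)
                    if olen > best_len:
                        best_len, best_i, best_j = olen, i, j
        if best_len == 0:
            break  # no overlap >= min_overlap remains; return the separate contigs
        merged = contigs[best_i] + contigs[best_j][best_len:]
        contigs = [c for n, c in enumerate(contigs) if n not in (best_i, best_j)]
        contigs.append(merged)
    return contigs


def build_kmer_index(reference: str, k: int) -> dict[str, list[int]]:
    """Map every k-mer of the reference to the positions where it starts."""
    index: dict[str, list[int]] = {}
    for pos in range(len(reference) - k + 1):
        index.setdefault(reference[pos:pos + k], []).append(pos)
    return index


def align_exact(read: str, reference: str, index: dict[str, list[int]], k: int) -> list[int]:
    """Reference-based sketch: seed with the read's first k-mer, then verify the full read.

    Only exact matches are reported; real aligners also tolerate mismatches and gaps.
    """
    return [pos for pos in index.get(read[:k], [])
            if reference[pos:pos + len(read)] == read]


if __name__ == "__main__":
    reads = shotgun_reads(GENOME, read_len=10, step=4, seed=1)
    print(greedy_assemble(reads, min_overlap=5) == [GENOME])  # True for this toy genome
    index = build_kmer_index(GENOME, k=5)
    for read in reads:
        print(read, align_exact(read, GENOME, index, k=5))
```

Greedy merging recovers the toy genome only because no 5-mer occurs twice in it, so every suffix-prefix match of at least five bases reflects a true overlap. Repeats longer than the read overlap are what make real assembly hard, and a reference genome sidesteps much of that difficulty at the cost of needing one.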