| dc.description.abstract |
Global data volume is growing rapidly and is expected to reach hundreds of zettabytes soon. This growth has
exposed the limitations of current storage technologies in capacity, durability, and energy efficiency.
DNA, nature’s own data carrier, offers a powerful alternative for data storage. It has an extremely high
storage density, up to hundreds of exabytes per gram, and can remain stable for thousands of years. This
makes DNA an ideal medium for long-term archival and biomedical data storage. However, using DNA
for practical data storage faces significant challenges. These include high synthesis and sequencing error
rates, as well as the need to maintain balanced GC content for reliable reading. To address these issues,
this work presents a new parallelised implementation of Reed–Solomon coding over a Galois field,
specifically optimised for DNA digital storage. Unlike previous methods, our design combines OpenMP-based
multi-core parallelism (8 threads) with SIMD-vectorised block processing, achieving up to a 5.4× speedup
while maintaining 100% decoding accuracy for two-symbol errors per block. The implementation extends the
Schifra Reed–Solomon library with custom modifications that handle DNA-specific error patterns, improving
robustness and enabling smooth integration with molecular data workflows. The proposed framework shows
strong potential for large-scale archival storage, biomedical research, and energy-efficient big data systems.
It highlights how computational parallelism can connect the worlds of molecular and digital information.
Future work will focus on scaling to larger code parameters and exploring GPU-based acceleration for
real-time applications. |
en_US |