World on a string: DNA to be used for data storage

RIA Novosti / S. Solovjev

From cave paintings to cloud computing, man has sought out increasingly complex ways to store data. Now researchers have found that nature’s own hard drive, DNA, can be synthesized to back up a world of knowledge in the information age.

On Wednesday, researchers from the UK-based European Bioinformatics Institute (EBI) published findings in the journal Nature describing how they had stored all 154 of Shakespeare’s sonnets, a digital photo of their lab, a PDF of the 1953 study that described the structure of DNA, and a 26-second sound clip from Martin Luther King Jr.’s “I Have a Dream” speech in manufactured DNA.

“We already know that DNA is a robust way to store information because we can extract it from bones of woolly mammoths, which date back tens of thousands of years, and make sense of it,” says study co-author Nick Goldman of the EBI.

“It’s also incredibly small, dense and does not need any power for storage, so shipping and keeping it is easy,” he continued.

DNA is in fact such an effective means of storage that the researchers estimate that at least 100 million hours of high-definition video would fit in a cup of it.

“A gram of DNA would hold the same information as a bit over a million compact discs,” Goldman said. “Your storage options are: one thing a bit smaller than your little finger, or a million CDs.”
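For a rough sense of scale, the comparison can be turned into a byte count. The article does not say how much a compact disc holds, so the sketch below assumes a typical 700 MB disc; the function name and that capacity are illustrative assumptions, not figures from the study.

```python
# Assumed capacity of one CD: 700 MB (not stated in the article).
CD_BYTES = 700 * 10**6


def cds_to_terabytes(n_cds: int, cd_bytes: int = CD_BYTES) -> float:
    """Return the combined capacity of n_cds discs in terabytes (10**12 bytes)."""
    return n_cds * cd_bytes / 10**12


print(cds_to_terabytes(1_000_000))  # 700.0
```

Under that assumption, a million CDs comes to roughly 700 terabytes, which is the order of magnitude Goldman is describing for a single gram.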

DNA is a long, coiled molecular “ladder” built from four nucleobases (adenine, cytosine, guanine and thymine), usually abbreviated as A, C, G and T. The sequence of these four nucleobases is what encodes information in all known living organisms.

With the aid of a simple cipher, the scientists converted the ones and zeroes used in computing code into the four-letter alphabet of DNA code.
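The article does not spell out the cipher. As an illustration only, the simplest conceivable mapping assigns two bits to each DNA letter. The table and function names below are hypothetical and are not the scheme from the Nature paper.

```python
# Hypothetical two-bits-per-letter mapping, for illustration only.
BITS_TO_BASE = {"00": "A", "01": "C", "10": "G", "11": "T"}
BASE_TO_BITS = {base: bits for bits, base in BITS_TO_BASE.items()}


def bytes_to_dna(data: bytes) -> str:
    """Turn each byte into four DNA letters, two bits per letter."""
    bits = "".join(f"{byte:08b}" for byte in data)
    return "".join(BITS_TO_BASE[bits[i:i + 2]] for i in range(0, len(bits), 2))


def dna_to_bytes(dna: str) -> bytes:
    """Reverse the mapping: four DNA letters back into one byte."""
    bits = "".join(BASE_TO_BITS[base] for base in dna)
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))


print(bytes_to_dna(b"Hi"))       # CAGACGGC
print(dna_to_bytes("CAGACGGC"))  # b'Hi'
```

A mapping this naive can produce the same letter several times in a row (a zero byte becomes "AAAA"), which matters for how reliably the DNA can be read back.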

The EBI team is not the first to encode data in DNA. In 2012 a Harvard University research team published a paper in the journal Science outlining its own method of DNA storage.

Goldman says their study stands apart because of its built-in measures for catching and correcting errors.

Currently, technology allows DNA to be manufactured only in short strings, and reading it is prone to error when the same DNA letter is repeated several times in a row. The researchers settled on a code structure that avoids both pitfalls.
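One way a code can rule out repeated letters is to write the data in base 3 and let each digit pick one of the three letters that differ from the previous one. The sketch below shows that idea under stated assumptions: six base-3 digits per byte and an imaginary starting letter "A" are choices made here for simplicity, and the article does not confirm that the researchers' code works exactly this way.

```python
BASES = "ACGT"


def bytes_to_trits(data: bytes) -> list[int]:
    """Write each byte as six base-3 digits (3**6 = 729 covers 0..255)."""
    trits = []
    for byte in data:
        digits = []
        for _ in range(6):
            digits.append(byte % 3)
            byte //= 3
        trits.extend(reversed(digits))
    return trits


def trits_to_dna(trits: list[int], start: str = "A") -> str:
    """Each digit selects one of the three letters that differ from the last."""
    prev, out = start, []
    for trit in trits:
        choices = [base for base in BASES if base != prev]
        prev = choices[trit]
        out.append(prev)
    return "".join(out)


def dna_to_trits(dna: str, start: str = "A") -> list[int]:
    """Recover the digits by replaying the same choice lists."""
    prev, trits = start, []
    for base in dna:
        choices = [b for b in BASES if b != prev]
        trits.append(choices.index(base))
        prev = base
    return trits


def trits_to_bytes(trits: list[int]) -> bytes:
    """Combine each group of six base-3 digits back into one byte."""
    out = bytearray()
    for i in range(0, len(trits), 6):
        value = 0
        for trit in trits[i:i + 6]:
            value = value * 3 + trit
        out.append(value)
    return bytes(out)


dna = trits_to_dna(bytes_to_trits(b"\x00"))
print(dna)  # CACACA
```

Even a byte of all zeroes comes out as alternating letters, because the code never offers the previous letter as an option.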

“We knew we needed to make a code using only short strings of DNA, and to do it in such a way that creating a run of the same letter would be impossible. So we figured, let’s break up the code into lots of overlapping fragments going in both directions, with indexing information showing where each fragment belongs in the overall code, and make a coding scheme that doesn’t allow repeats. That way, you would have to have the same error on four different fragments for it to fail — and that would be very rare,” co-author Ewan Birney, Associate Director of EMBL-EBI, told Science Daily.
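The fragment idea Birney describes can be sketched in a few lines. The fragment length, the step between fragments, the use of the reverse complement for "the other direction", and the majority vote at read time are all assumptions made for this illustration. The step is set to a quarter of the fragment length so that interior letters appear in four fragments, echoing the quote.

```python
from collections import Counter

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Read a strand in the opposite direction (assumed meaning of 'both directions')."""
    return seq.translate(COMPLEMENT)[::-1]


def fragment(dna: str, length: int = 8, step: int = 2) -> list[tuple[int, str]]:
    """Cut dna into overlapping (index, piece) pairs; odd pieces are flipped."""
    if len(dna) < length or (len(dna) - length) % step != 0:
        raise ValueError("sketch assumes len(dna) - length is a multiple of step")
    pieces = []
    for index, start in enumerate(range(0, len(dna) - length + 1, step)):
        piece = dna[start:start + length]
        if index % 2 == 1:
            piece = reverse_complement(piece)
        pieces.append((index, piece))
    return pieces


def reassemble(pieces: list[tuple[int, str]], total_length: int, step: int = 2) -> str:
    """Place each piece by its index and take a majority vote per position."""
    votes = [Counter() for _ in range(total_length)]
    for index, piece in pieces:
        if index % 2 == 1:
            piece = reverse_complement(piece)
        for offset, base in enumerate(piece):
            votes[index * step + offset][base] += 1
    return "".join(vote.most_common(1)[0][0] for vote in votes)


dna = "CACGTGTACGATCTGA"
pieces = fragment(dna)
index, seq = pieces[2]
pieces[2] = (index, seq[:3] + "C" + seq[4:])  # simulate one misread letter
print(reassemble(pieces, len(dna)) == dna)    # True
```

The misread letter is outvoted by the three other fragments covering the same position. Letters near the two ends of the sequence are covered by fewer fragments in this sketch, so they are less well protected.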

The code was then sent to Agilent, a US biotech company which produces synthetic DNA using a device similar to an inkjet printer.

“We downloaded the files from the web and used them to synthesize hundreds of thousands of pieces of DNA. The result looks like a tiny piece of dust,” Emily Leproust of Agilent said.

Agilent then mailed the sample back to the EBI, where the researchers soaked the DNA in a solution to reconstitute it and used standard sequencing machines to decipher the code. All the files were recovered and read with 100 percent accuracy.

DNA could be useful for keeping huge amounts of information that must be kept for a long time but not retrieved very often, the researchers said. To preserve it, the DNA would simply need to be freeze-dried and kept in a cold, dry and dark place, where it could last anywhere from 600 to 5,000 years.

DNA storage does not come cheap, however. While Agilent Technologies worked pro bono for the sake of science, they say the commercial rate for DNA synthesis runs between $10,000 and $30,000.

But the researchers argued in Nature that “current trends in technological advances are reducing DNA synthesis costs at a pace that should make our scheme cost-effective for sub-50-year archiving within a decade.”

In the meantime, national archives and libraries will likely be the first beneficiaries, though it will eventually be possible for consumers to store information they want to have around in perpetuity, like wedding photos or videos for future grandchildren, Goldman said in an email to AP.

The researchers said they had no intention of ever putting the synthetic DNA into a living being, and it could never accidentally become part of anything organic because its coding scheme would not be compatible.

“We have absolutely no intention of messing with life,” said Goldman.