Microsoft and University of Washington show DNA can store data in a practical way


The technology industry is always looking for new ways to extend performance and functionality. Sometimes that means pushing contemporary technologies to their max, and other times it’s about accessing materials, concepts, and processes from other areas. One great example of the latter is the use of DNA to store information for archival purposes, where it’s not pure speed but rather longevity that’s most important.

Clearly, DNA can carry data for extremely long periods of time. We have information stored in our DNA that’s millions of years old, for example. DNA can also store massive amounts of information in a very small amount of material. Those two factors make it a great option for archiving information that might not be accessed for decades or even centuries.
Some recent research conducted by Microsoft and the University of Washington has made strides in both saving information in DNA and accessing it later.

Generally speaking, storing data in DNA means breaking the data up into small pieces, encoding each piece as a sequence of DNA bases, and including the information needed to put the pieces back together later. Getting the data back out means sequencing the DNA, identifying where specific information is stored, and then decoding it. The researchers’ contribution is to add random access to the process, creating a sort of file system.
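To make the split, encode, and reassemble idea above concrete, here is a minimal Python sketch. It is an illustration only, not the researchers' actual encoding: the two-bits-per-base mapping, the 16-byte chunk size, the 2-byte index, and all function names are assumptions made for this example, and real schemes add error correction and avoid problematic base patterns.

```python
# Assumed toy mapping: 00 -> A, 01 -> C, 10 -> G, 11 -> T.
BASES = "ACGT"


def bytes_to_bases(data: bytes) -> str:
    """Turn each byte into four bases, two bits per base, high bits first."""
    return "".join(BASES[(b >> shift) & 3] for b in data for shift in (6, 4, 2, 0))


def bases_to_bytes(strand: str) -> bytes:
    """Inverse of bytes_to_bases: every four bases become one byte."""
    out = bytearray()
    for i in range(0, len(strand), 4):
        value = 0
        for base in strand[i:i + 4]:
            value = (value << 2) | BASES.index(base)
        out.append(value)
    return bytes(out)


def encode(data: bytes, chunk_size: int = 16) -> list[str]:
    """Split data into chunks and prefix each with a 2-byte index.

    The index is what lets the pieces be reordered later, since strands
    in a pool have no inherent order. Two bytes limits this toy version
    to 65,536 chunks.
    """
    strands = []
    for index, start in enumerate(range(0, len(data), chunk_size)):
        chunk = data[start:start + chunk_size]
        strands.append(bytes_to_bases(index.to_bytes(2, "big") + chunk))
    return strands


def decode(strands: list[str]) -> bytes:
    """Read the index from each strand and reassemble chunks in order."""
    pieces = {}
    for strand in strands:
        raw = bases_to_bytes(strand)
        pieces[int.from_bytes(raw[:2], "big")] = raw[2:]
    return b"".join(pieces[i] for i in sorted(pieces))


if __name__ == "__main__":
    message = b"DNA can store data for a very long time."
    strands = encode(message)
    # Order does not matter on the way back in, thanks to the index.
    print(decode(list(reversed(strands))) == message)
```

The per-strand index stands in for whatever addressing a real system uses. The point is only that each short strand has to carry enough information to find its place in the whole.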

The value of adding random access is simple: the more data there is, the more work it takes to convert the four bases stored in DNA (adenine, thymine, cytosine, and guanine) to and from the zeros and ones that computers can understand. By applying a new algorithm to interpret short marker sequences, called “primers,” that are added to the stored information, the researchers can speed up the process of decoding the data considerably. The process is complex, but the results are simple to understand.
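As a rough software analogy for why primers help, the Python sketch below tags every strand of a file with a short prefix and then decodes only the strands that carry the requested prefix, instead of decoding the whole pool. Everything here is assumed for illustration: the primer sequences, the file names, and the function names are hypothetical, and in a real system the selection happens chemically on the DNA itself rather than by string matching.

```python
# Assumed toy mapping: 00 -> A, 01 -> C, 10 -> G, 11 -> T.
BASES = "ACGT"

# Hypothetical primers, one per stored file. They have equal length and
# differ from each other so a prefix match is unambiguous.
PRIMERS = {
    "photo.jpg": "ACGTTGCAAC",
    "notes.txt": "TTGACCGTAG",
}


def to_bases(data: bytes) -> str:
    """Encode bytes as bases, two bits per base, high bits first."""
    return "".join(BASES[(b >> shift) & 3] for b in data for shift in (6, 4, 2, 0))


def from_bases(strand: str) -> bytes:
    """Decode a base string produced by to_bases back into bytes."""
    out = bytearray()
    for i in range(0, len(strand), 4):
        value = 0
        for base in strand[i:i + 4]:
            value = (value << 2) | BASES.index(base)
        out.append(value)
    return bytes(out)


def store(pool: list[str], name: str, chunks: list[bytes]) -> None:
    """Add a file's chunks to the shared pool, each tagged with its primer."""
    for chunk in chunks:
        pool.append(PRIMERS[name] + to_bases(chunk))


def retrieve(pool: list[str], name: str) -> list[bytes]:
    """Decode only the strands tagged with the requested file's primer."""
    primer = PRIMERS[name]
    return [from_bases(s[len(primer):]) for s in pool if s.startswith(primer)]


if __name__ == "__main__":
    pool: list[str] = []
    store(pool, "photo.jpg", [b"pixels-1", b"pixels-2"])
    store(pool, "notes.txt", [b"remember the milk"])
    # Only the strands for notes.txt are decoded; the rest are skipped.
    print(retrieve(pool, "notes.txt"))
```

The design point is that the cost of reading one file depends on the size of that file rather than on the size of the whole archive, which is what a file system needs.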

As Microsoft Senior Researcher Sergey Yekhanin says: “Our work reduces the effort, both in sequencing capacity and in processing, to completely recover information stored in DNA. For the latter, we have devised new algorithms that are more tolerant to errors in writing and reading DNA sequences to minimize the effort in recovering this information.”

So far, the team has managed to retrieve 35 files amounting to a record 200 megabytes of information including video, audio, images, and text. That breaks the previous record of 22 megabytes accomplished by a collaboration between the Harvard Medical School and Germany’s Technicolor Research & Innovation.

The research takes us closer to a time when DNA might be the preferred storage tool for things like medical records, but that leap is still in the future. For now, it’s an indication that DNA storage is not only durable and dense but can be practical as well.