Humanity is great at creating things, but there’s one thing that our species creates more of than almost anything else: information.
Way back in 2013, a study concluded that 90 percent of all the world’s data had been generated in the previous two years, and yet that amount looks small next to what has been produced since. In 2017 alone, the world created 26 zettabytes (one zettabyte is a billion terabytes) of data, more than everything created from 2010 through 2013 combined.
According to a report published in 2019, each day we share 95 million photos and videos on Instagram, post 500 million tweets on Twitter, and send 294 billion emails. While the internet may seem ethereal, all of that data has to be stored physically, on hard drives and servers around the world. The trouble is, those traditional mediums of data storage probably can’t keep up with the expected flood of data over the coming decade.
What’s the solution? The hard drive of the future could actually be something very old, something that is inside every person reading this: DNA.
Deoxyribonucleic acid, or DNA, is the molecule that dictates how an organism develops. DNA is built from four nitrogenous bases: adenine (A), thymine (T), guanine (G), and cytosine (C). The sequence of these bases forms instructions for how cells should develop, influencing traits such as hair and eye color, height, and so on. DNA is essentially the instruction manual for building a body.
DNA can also hold a staggering amount of information: 215 petabytes (1 petabyte is a million gigabytes) of data in a single gram. Just as impressive is its longevity. Traditional mediums like magnetic tape and flash memory tend to degrade, whether through repeated use or simply with time. DNA degrades, too, but at a significantly slower rate: Depending on the storage conditions, it can last thousands, or even tens of thousands, of years.
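To put that density in perspective, the short Python calculation below estimates how much DNA would be needed, in theory, to hold the 26 zettabytes the world is reported to have created in 2017. It assumes decimal units (1 petabyte = 10^15 bytes, 1 zettabyte = 10^21 bytes) and the 215-petabytes-per-gram figure, and it deliberately ignores the redundancy and error correction that real systems need.

```python
# Decimal (SI) units, assumed here for simplicity.
PETABYTE = 10 ** 15   # bytes
ZETTABYTE = 10 ** 21  # bytes

DNA_DENSITY = 215 * PETABYTE   # bytes per gram, the density figure quoted above
DATA_2017 = 26 * ZETTABYTE     # bytes reportedly created in 2017


def grams_needed(total_bytes: int, density: float = DNA_DENSITY) -> float:
    """Mass of DNA needed at a given density (no copies, no error correction)."""
    return total_bytes / density


if __name__ == "__main__":
    grams = grams_needed(DATA_2017)
    print(f"{grams:,.0f} g (about {grams / 1000:,.0f} kg)")
```

Under those assumptions the answer comes to roughly 121 kilograms of DNA. Practical systems store many copies of each strand, so the real-world figure would be higher.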
It’s no surprise, then, that researchers see nature’s storage system as a vessel for the world’s relentless stream of information.
“It’s almost coming full circle,” says Hyunjun Park, CEO of Catalog, a company building a platform for DNA-based storage. “We’re going back to nature to get the inspiration to develop this medium.”
Catalog is one of the companies on the bleeding edge of this technology, building a DNA-based storage platform meant to accommodate the ever-larger files of the 5G, high-definition era.
The idea of storing data in DNA was proposed back in the ’60s by Soviet scientist Mikhail Neiman. In the decades since, researchers have made great strides in actually doing it, but significant obstacles remain.
“The bottleneck that’s been keeping this technology from going mainstream,” Park explains, “was the fact that it’s really expensive and slow to store a lot of information.”
According to a study published in 2018, the most cost-effective DNA storage technique at the time cost about $3,500 per MB to write the data and $1,000 per MB to read it, so don’t retire your solid-state drive just yet.
Catalog aims to bring down the cost of DNA storage by creating what it compares to a printing press, the revolutionary device that used interchangeable, ink-coated blocks of letters to print pages quickly.
“The way it was done before,” Park explains, is that the bases of DNA (A, T, C, and G) could be used to “represent any long string of 1s and 0s, because that’s the data that you’re trying to write. But the problem with that approach is each base pair that you’re adding on has a cost and is time consuming.”
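To make the base-by-base approach concrete, here is a minimal Python sketch that assumes the simplest possible mapping, two bits per nucleotide (00 to A, 01 to C, 10 to G, 11 to T). The mapping and the function names `encode` and `decode` are illustrative assumptions, not the scheme used by Catalog or any particular research group.

```python
# Hypothetical fixed mapping: each nucleotide stores two bits.
BITS_TO_BASE = {"00": "A", "01": "C", "10": "G", "11": "T"}
BASE_TO_BITS = {base: bits for bits, base in BITS_TO_BASE.items()}


def encode(data: bytes) -> str:
    """Turn each byte into four bases, two bits at a time."""
    bits = "".join(f"{byte:08b}" for byte in data)
    return "".join(BITS_TO_BASE[bits[i:i + 2]] for i in range(0, len(bits), 2))


def decode(strand: str) -> bytes:
    """Reverse the mapping: every four bases become one byte again."""
    bits = "".join(BASE_TO_BITS[base] for base in strand)
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))


if __name__ == "__main__":
    strand = encode(b"Hi")
    print(strand)          # one base per two bits of "Hi"
    print(decode(strand))  # round-trips back to the original bytes
```

Every byte becomes four bases that would each have to be synthesized, which is why cost grows with the length of the data. Practical schemes also add error correction and avoid long runs of the same base, which this sketch ignores.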
In Catalog’s printing press method, the wood blocks are “blocks of DNA molecules that we pre-synthesized, but in large quantities. In the DNA world,” he explains, “if you’re trying to synthesize large quantities of just a few different molecules — say, on the order of 100 — that’s really cheap and easy to do.
“But if you’re trying to synthesize very small quantities of a million different molecules,” he continues, “that’s really expensive and slow. We’re taking these larger blocks that we’ve made in large quantities and we’re using the printer that we developed to arrange them in different combinations and attach them together so that we get this huge variety of different molecules that we can then ascribe different information to.”
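The payoff of assembling pre-made blocks is combinatorial: with k distinct blocks joined n at a time, you can build k^n different molecules. The sketch below assumes a hypothetical library of 100 placeholder blocks (echoing Park’s “on the order of 100”) and 4 slots per molecule, and treats each block as a digit in base 100 to map a number onto a sequence of blocks and back. It illustrates the general idea only; it is not Catalog’s actual encoding, and `number_to_blocks`, `blocks_to_number`, and the block names are invented for illustration.

```python
# Hypothetical library of pre-synthesized blocks; names are placeholders.
LIBRARY = [f"block_{i:02d}" for i in range(100)]
SLOTS = 4  # hypothetical number of blocks joined into one molecule


def number_to_blocks(value: int, library=LIBRARY, slots=SLOTS) -> list:
    """Write value in base len(library); each digit selects one block."""
    k = len(library)
    if not 0 <= value < k ** slots:
        raise ValueError("value does not fit in this many slots")
    digits = []
    for _ in range(slots):
        value, digit = divmod(value, k)
        digits.append(digit)
    # divmod yields the least significant digit first, so reverse it.
    return [library[d] for d in reversed(digits)]


def blocks_to_number(blocks: list, library=LIBRARY) -> int:
    """Read an assembled molecule back into the number it encodes."""
    index = {name: i for i, name in enumerate(library)}
    value = 0
    for name in blocks:
        value = value * len(library) + index[name]
    return value


if __name__ == "__main__":
    molecule = number_to_blocks(123456)
    print(molecule)
    print(blocks_to_number(molecule))
```

With these assumed parameters, only 100 bulk-made blocks yield 100^4, or 100 million, distinct molecules; `number_to_blocks(123456)` returns `['block_00', 'block_12', 'block_34', 'block_56']`.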
While DNA’s storage capabilities are intriguing, Park is also excited about its potential for computing. For years, computers roughly followed the path laid out by Moore’s Law, the observation that the number of transistors that fit on a computer chip doubles every two years or so. However, transistors have become so small that it’s increasingly unlikely we can keep squeezing more of them onto a chip. Essentially, Moore’s Law is dead, or at least in hospice.
Humanity’s appetite for ever more powerful computers shows no sign of fading, however, so researchers are racing to develop new kinds of computers (quantum computers, for example). A DNA-based computer is one possibility.
“We think once you have data in DNA, we can use enzymes and other DNA molecules to compute on that data,” Park says, “and that’s a highly efficient, extremely parallel way to compute on that data. It’s not going to be for all day-to-day applications or all computational problems, but for a set of problems that become increasingly more important to society, we think DNA will be a great way to go about it.”
Park says that DNA computers would be well-suited for problems where you have a vast amount of data, but the computations you need to do are not too complex. As an example, he imagines a scenario where someone needs to comb through exabytes of census data.
“You want to be able to quickly search through all that simultaneously and come up with the names of people that meet a certain set of criteria like a certain age range or income range or geographical region,” he says. “In order to do that in a traditional computer, to be able to go through all the exabytes that you’ve gathered for decades, you’d have to read back the magnetic tape that’s been sitting in cold storage … then compute on it in blocks that fit into the memory, and then in blocks that fit into the processing unit, and do that in a serial manner. If you have it in DNA, the volume would be really small because of the information density of DNA, and so you would be dropping in a few probes that bind to the characteristic that you’re looking for.”
So when should you prepare to toss out your current gear and replace it with bio-organic computer parts? Probably not anytime soon.
“I think that for the foreseeable future,” Park says, “the writing process where you’re converting digital data into DNA is happening at specialized facilities.” These facilities would house the DNA-based data, which people could access much as they would a traditional server, though he suggests people could also get copies of their data in test tubes.
For now, DNA-based storage and computing are not likely to be a noticeable part of everyday life, but they could have a huge impact on humanity’s bigger picture.