The Twice-Forbidden Fruit: A Quest to Create a Living Tree of Knowledge


Last semester in my biology lab class, my peers and I read some research on DNA as a medium for information storage, including a paper from the lab of George Church. Each of us compiled information from several sources into a research paper, and I have shared mine below:

DNA’s Potential for Digital Information Storage Solutions

The onset of the Information Age has given rise to a need for more efficient methods for data storage and retrieval, as archiving data has become an increasingly complex task. In light of this problem, solutions such as cloud computing have been proposed as the savior of storage and now constitute a burgeoning market. However, to quote Einstein, “We can’t solve problems by using the same kind of thinking we used when we created them.” The key to our data storage problems may not lie in thinking bigger but in thinking smaller. DNA offers the possibility for storage of large amounts of data in a small amount of space. Additionally, data storage in the form of DNA can withstand the test of time, unlike many currently used data storage methodologies. DNA-based storage has potential as a practical, cost-effective solution to the digital archiving problem.

Although DNA-based information storage has succeeded on a small scale, a significant limitation to its large-scale practical application lies in the difficulty of synthesizing long strands of DNA de novo. George M. Church and his colleagues at Harvard Medical School were the first to attempt to tackle this problem using next-generation DNA synthesis and sequencing technologies. Rather than work with a single long stretch of DNA, the team opted to use shorter, overlapping fragments which together contain all the necessary information, yet individually are easier to manipulate. To move beyond the limited encoding of uppercase text which served as the basis of previous approaches, the team chose to encode an entire book, Regenesis: How Synthetic Biology Will Reinvent Nature and Ourselves, which included 53,426 words, 11 JPEG images, and one JavaScript program. The book was converted into a 5.27-megabit stream, and the resulting bit sequence was converted to DNA code using an encoding of one bit per base. After the DNA library was synthesized and amplified by PCR, the sequence was read using an Illumina HiSeq next-generation sequencer. All data blocks were recovered with a total of 10 bit errors out of 5.27 million, most of the errors being located within homopolymer runs (Church et al., 2012).
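To make the idea of a one-bit-per-base encoding concrete, here is a minimal Python sketch. It assumes that a 0 may be written as either A or C and a 1 as either G or T, and it uses a simplified rule of my own for choosing between the two candidates (pick whichever differs from the previous base). The function names and this tie-breaking rule are illustrative assumptions, not the Church team's published code.

```python
ZERO_BASES = "AC"  # assumption: either of these bases encodes bit 0
ONE_BASES = "GT"   # assumption: either of these bases encodes bit 1


def text_to_bits(text):
    """Turn text into a string of '0'/'1' characters, 8 bits per byte."""
    return "".join(f"{byte:08b}" for byte in text.encode("utf-8"))


def bits_to_dna(bits):
    """Encode a bit string at one bit per base."""
    bases = []
    for bit in bits:
        first, second = ZERO_BASES if bit == "0" else ONE_BASES
        # Simplified tie-breaking rule (an assumption): avoid repeating
        # the previous base by switching to the other candidate.
        bases.append(second if bases and bases[-1] == first else first)
    return "".join(bases)


def dna_to_bits(dna):
    """Decode: A or C reads as 0, G or T reads as 1."""
    return "".join("0" if base in ZERO_BASES else "1" for base in dna)
```

Under these assumptions, bits_to_dna("0011") gives "ACGT", and dna_to_bits("ACGT") recovers "0011". Because two bases are available for each bit, the encoder has some freedom in which base it writes; this sketch spends that freedom entirely on avoiding repeated bases, which a real scheme may not do.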

Researchers at the European Bioinformatics Institute used a similar strategy to store data in the form of DNA. In their study, Nick Goldman and his colleagues created a code that is resistant to error and can last for at least 10,000 years (Goldman et al., 2013). As was the case in the Harvard study, the encoded information was divided among short, overlapping DNA fragments, which in Goldman's design ran in both directions (Church et al., 2012; Goldman et al., 2013). In an effort to improve upon Church’s work, Goldman’s coding scheme never places the same base twice in a row. This ensures that no homopolymers are generated, significantly reducing errors in DNA writing and in high-throughput sequencing. Given that a majority of the errors associated with the Church method can be ascribed to homopolymers, the Goldman strategy is much less error-prone than its predecessor (Church et al., 2012; Goldman et al., 2013).
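One way a code can rule out homopolymers is to write the data as base-3 digits and let each digit select one of the three bases that differ from the previous base. The Python sketch below illustrates that principle only. The particular digit-to-base table (alphabetical order with the previous base removed) and the virtual starting base are my own assumptions and are not necessarily the table used by Goldman and colleagues.

```python
BASES = "ACGT"


def trits_to_dna(trits, start="A"):
    """Map base-3 digits (0, 1, 2) to bases without repeating a base."""
    prev = start  # assumed virtual starting base; it is not emitted
    out = []
    for trit in trits:
        # Exactly three bases differ from the previous one.
        choices = [b for b in BASES if b != prev]
        prev = choices[trit]
        out.append(prev)
    return "".join(out)


def dna_to_trits(dna, start="A"):
    """Invert trits_to_dna by replaying the same choice lists."""
    prev = start
    trits = []
    for base in dna:
        choices = [b for b in BASES if b != prev]
        trits.append(choices.index(base))
        prev = base
    return trits
```

With this table, trits_to_dna([0, 0, 0, 0]) yields "CACA" rather than a run of a single base: identical digits never produce identical neighboring bases, because the base written for a digit depends on the base before it.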

To test the efficacy of Goldman’s method, five files were encoded: all 154 of Shakespeare’s sonnets (ASCII), Watson and Crick’s seminal paper, “Molecular Structure of Nucleic Acids” (PDF), a medium-resolution color photo (JPEG), an excerpt from Martin Luther King’s 1963 “I Have a Dream” speech (MP3), and the Huffman code used to convert bytes to base-3 digits (ASCII). The DNA sequences corresponding to these files were then synthesized. Four of the five resulting DNA sequences were fully decoded without intervention. The fifth contained two gaps of 25 bases each, but inspection of the neighboring regions allowed the researchers to infer the missing fragments and manually insert the 50 missing bases, so that all of the original files were reconstructed with 100% accuracy (Goldman et al., 2013).
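The role of overlapping fragments in recovering lost data can be sketched as follows. In this Python example, a sequence is cut into fixed-length fragments whose start positions advance by less than the fragment length, so most bases appear in several fragments. The fragment length of 100, the step of 25, and the practice of storing each fragment's offset next to it are illustrative assumptions; a real scheme would have to encode indexing information within the DNA itself.

```python
def fragment(seq, length=100, step=25):
    """Return (offset, fragment) pairs that cover seq with overlap."""
    last_start = max(len(seq) - length, 0)
    starts = list(range(0, last_start + 1, step))
    if starts[-1] != last_start:
        starts.append(last_start)  # make sure the tail is covered
    return [(i, seq[i:i + length]) for i in starts]


def reassemble(fragments, total_length):
    """Rebuild the sequence from surviving fragments; '-' marks gaps."""
    rebuilt = ["-"] * total_length
    for offset, frag in fragments:
        rebuilt[offset:offset + len(frag)] = frag
    return "".join(rebuilt)
```

With these parameters, a base in the interior of a long sequence appears in four different fragments, so the loss of a single interior fragment leaves no gap. Gaps appear only where every fragment covering a region is missing, which is the kind of situation that leaves short stretches of unknown bases.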

Church and Goldman were not the first to hypothesize about DNA’s powerful potential for information storage. Researchers at the University of Phoenix conducted a study comparing and contrasting the structure and function of computer hard drives and DNA. The study proposed that the same properties necessary for information processing in the hard drives of digital computers also reside in the DNA of eukaryotic cells. David D’Onofrio and Gary An identified four essential properties of information in a centralized storage and processing system: (1) orthogonal uniqueness, (2) low level formatting, (3) high level formatting, and (4) translation from stored to usable form. D’Onofrio and An asserted that both the DNA complex and computer hard drives contain these components characteristic of centralized information storage and processing systems. While computer hard drives and the DNA of living organisms seem to exhibit functional equivalence, D’Onofrio and An acknowledged that there are places where the analogy breaks down. For example, biological systems do not have an external source for a map of their stored information or for a set of instructions; instead, they must possess an organizational template within their intermolecular structure. For this reason and several others, the authors of this study are wary of treating hard drives and DNA as interchangeable. The implication is that attempts to disrupt DNA sequences by manipulating their components will invariably lead to unintended consequences, suggesting that the use of DNA for storage solutions is ill-advised (D’Onofrio and An, 2010). While some, such as D’Onofrio and An, may approach the idea of DNA storage solutions with hesitancy, others, like Church and Goldman, champion DNA storage solutions as the beginning of a new digital frontier.


Church, G.M., Gao, Y., Kosuri, S. 2012. Next-Generation Digital Information Storage in DNA. Science. 337(6102): 1628.

D’Onofrio, D.J., An, G. 2010. A Comparative Approach for the Investigation of Biological Information Processing: An Examination of the Structure and Function of Computer Hard Drives and DNA. Theor Biol Med Model. 7(3).

Goldman, N., Bertone, P., Chen, S., Dessimoz, C., LeProust, E.M., Sipos, B., Birney, E. 2013. Toward Practical High-Capacity Low-Maintenance Storage of Digital Information in Synthesized DNA. Nature. 494(7435): 77-80.

Now, an artist from George Church’s lab, Joe Davis, plans to use synthetic biology to insert a DNA-encoded version of Wikipedia into a 4,000-year-old strain of apple to create a living, literal tree of knowledge. He calls his endeavor Project “Malus ecclesia.” (Malus, the genus name for all apples, means both “evil” and “apple tree” in Latin. Ecclesia translates to “church,” an homage to George Church.) The process of inserting this extra information into an apple’s genome is akin to writing in the margins of a book: he will not alter any of the apple’s existing genome, which is responsible for the apple’s appearance, texture, and taste, but will add to it. Furthermore, since the English version of Wikipedia contains two and a half billion words and the space in the bacterial genome is limited to a few thousand words, Davis plans to spread Wikipedia’s information out across many apples and many trees, which will likely compose a large grove. Because the Animal and Plant Health Inspection Service of the U.S. Department of Agriculture has strict regulations concerning the consumption of genetically altered plants, the engineered apple, when complete, will be twice forbidden.

What do you think of this project to create a living tree of knowledge? Are there potential advantages to storing information in this manner as opposed to using a digital platform? [Added December 20, 2014: Perhaps the only pertinent question left is, “Are you, are you coming to the tree?”] Leave your thoughts in the comments section below!

