Pocket DNA computers could save the world by 2030
We first heard about Catalog, a pioneer in DNA-based data storage, in October 2020, when we interviewed David Turek, its CTO and an IBM alumnus. Almost a year later, the company announced a $35 million Series B funding round led by Hanwha Impact Partners, along with plans to launch its first chemistry-based computing platform, which combines data management (and storage) with computation via the manipulation of synthetic DNA.
So the time had come to catch up with Catalog and put its CEO, Hyunjun Park, in the interviewee's seat.
1. So what’s the latest on Shannon? What has happened since the last time we interviewed Dave Turek (Catalog CTO)?
Over the past year, CATALOG has worked with several leading IT, energy, media, and entertainment companies on collaborations that help advance the technology toward commercialization. Through this work, CATALOG has found wide applicability for our platform across industrial sectors, as well as almost universal demand among heavy data users for what DNA-based computing promises. The first applications we can talk about right now include digital signal processing, such as seismic processing in the energy industry, and database comparisons, such as fraud protection and identity management in the financial sector.
2. Right now Shannon is a bit like the ENIAC of its generation: bulky, slow, expensive, limited, but revolutionary. If we fast-forward to 2030, what would Shannon v10 look like?
Shannon helped prove that automating and scaling DNA-based storage, and now DNA-based computation, is achievable. For that purpose alone, it was important to build Shannon. As we move forward a decade, future versions of the technology will be smaller and more portable, faster, and more efficient. It is certainly conceivable that by 2030 you could see desktop and pocket versions of Shannon that use very small amounts of power for both storage and compute.
3. DNA in computer science is generally associated with data storage. Catalog, however, wants to integrate DNA into algorithms and applications. How?
By computation with DNA, we mean the transformation of data encoded in DNA into new information. For example, if I have an input file containing two large numbers, multiplying them together creates a number that was not previously present in the file – new information that represents the product of the two inputs. We believe we can create a set of chemical “instructions” that operate on data encoded in DNA to create new information. Examples include optimization problems (finding the biggest, smallest, or best of something in finance, logistics, or manufacturing), signal processing problems (applied in areas like seismic processing in the oil and gas industry) and, to begin with, inference and machine learning problems. The advantage of DNA is that we can perform these operations at extreme levels of parallelism, which means we can apply billions, or even billions of billions, of computational agents that work collectively to solve the problem at hand. Each computational agent (likely made up of a set of molecules) will be relatively weak as a computational engine, but the ability to pull together billions of billions of them to solve a problem could dramatically reduce the time it takes to reach a solution.
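To make the parallelism idea concrete, here is a purely illustrative Python sketch – our own analogy, not Catalog's chemistry – of an optimization task solved by a large pool of weak "agents", where each agent performs only a single pairwise comparison and the answer emerges from the pool as a whole:

```python
# Conceptual sketch only (not Catalog's actual chemistry): many weak
# "computational agents" each do one tiny comparison, and the answer
# emerges from the population -- a tournament-style reduction.
import random

def parallel_max(values):
    """Find the largest value by repeated pairwise 'reactions'.

    Each round, agents (candidate values) are paired at random and only the
    larger of each pair survives, loosely analogous to selective chemistry
    acting on a pool of molecules in parallel.
    """
    pool = list(values)
    while len(pool) > 1:
        random.shuffle(pool)
        survivors = []
        # Every pair reacts independently, so a physical system could run
        # all of these comparisons at the same time.
        for a, b in zip(pool[::2], pool[1::2]):
            survivors.append(max(a, b))
        if len(pool) % 2:          # an unpaired agent carries over unchanged
            survivors.append(pool[-1])
        pool = survivors
    return pool[0]

print(parallel_max([17, 3, 42, 8, 25]))   # -> 42
```

Each round halves the pool, so the wall-clock time grows with the number of rounds rather than the number of agents, which is the appeal of running the comparisons in parallel.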
Another area of interest to us is search. We can use chemical instructions to quickly find data objects encoded in DNA regardless of the volume of data. This means that as the amount of data we are searching through increases, we can use chemical search techniques that are essentially independent of the volume of data – the retrieval time remains more or less invariant. This is not the case for many electronic search applications today, and the reason for the difference is that a DNA memory is a collection of molecules floating in a liquid, free of the kind of physical organization that exists with electronic media: a tape cartridge must be inspected serially because that is how it is physically organized (A precedes B precedes C, and so on). In a DNA file, the molecules are all mixed together in a liquid and can be searched directly. This reduces both retrieval time and cost.
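As a loose electronic analogy (again our own illustration, not Catalog's method), the difference is similar to scanning a tape record by record versus a keyed lookup that goes straight to the target entry, regardless of how much other data sits alongside it:

```python
# Conceptual sketch only: an electronic analogy for probe-based retrieval.
# A tape-like medium is inspected entry by entry (time grows with the data
# volume), whereas a keyed lookup behaves like a probe that binds directly
# to the record it targets. The record names here are purely illustrative.

records = {f"record-{i}": f"payload-{i}" for i in range(1_000_000)}

def sequential_scan(store, key):
    """Tape-style access: inspect entries in order until the key is found."""
    for k, v in store.items():
        if k == key:
            return v
    return None

def probe_lookup(store, key):
    """Probe-style access: go straight to the target, no full scan needed."""
    return store.get(key)

print(probe_lookup(records, "record-999999"))   # -> payload-999999
```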
4. Your funding news also mentions that DNA-based computation is due in 2022. What does this mean, and when will it be more widely available?
By next year, CATALOG will demonstrate the value of DNA-based computation through a specific business use case. This will likely show the business value of analyzing data previously held in cold storage in a particular industry. Our expectation is that, as use cases develop, we will enable customers to access our technology as a web-based service (sometime in 2024); we are also considering building miniature devices capable of performing computation on customers' premises at some point thereafter.
5. Right now, a DNA storage sample looks like an orange substance in a test tube. What shape or size will it eventually take?
DNA-based storage is DNA molecules floating around in a liquid (orange in CATALOG's case, due to the composition of the inks we use to encode DNA) or perhaps a pellet of DNA for long-term storage. It is very useful to have the storage in liquid form because it offers the possibility of finding “records” in the file directly: we can create probes which, once inserted into the file, will find the targeted record or data directly.
6. There was one question I asked Catalog last year: “how much is it going to cost?” Do we have an answer now that we can share? What kind of storage density are we looking at, and what kind of cost per PB or TB stored?
The first commercialization option for DNA storage, followed by DNA-based computation, will likely be delivered as a service. We will announce pricing models a little closer to the availability of that offering. The goal is to be roughly on par with conventional storage on price, but to deliver value through dramatic improvements in areal density (a million times denser than electronic media), effectively infinite longevity, and the avoidance of technological obsolescence: DNA written today will be readable at any time in the future because DNA does not change; there are no concerns such as firmware, operating system, or device upgrades.
7. What are the biggest obstacles to the rapid development of DNA storage and compute capacity today, and what is being done to resolve them?
Right now, the barriers are technical in nature and focus on issues that customers always consider important in any compute technology: reliability, value for money, availability, consistency, and so on. We have a dedicated team of engineers, chemists, and computer scientists working through each of these issues to deliver the kinds of value metrics customers are used to. This includes miniaturizing the current machine, extending automation to cover the entire process, and designing and implementing the software infrastructure and tooling customers want.
8. What solutions are currently being considered to resolve the throughput problem (for example, 10 MB/s of writes amounts to only about 26 TB per month)?
Shannon’s current throughput attributes are intended to help CATALOG better understand the limiting impact of the design choices we have made on the machine, including the implications of scaling the underlying chemistry to our encoding and computation models. We can adjust the throughput by changing some of the performance parameters on the current system, and that alone could improve it by a few orders of magnitude. But we have started to explore other design choices that could go well beyond this improvement. For example, adding further inkjet printheads has an exponential impact on machine throughput. This is just one example of the many adjustments and design choices available to us.
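For reference, the figure cited in the question is straightforward arithmetic (our own back-of-envelope calculation, assuming a 30-day month and decimal units):

```python
# Back-of-envelope check of the throughput figure in the question:
# a sustained write rate of 10 MB/s over a 30-day month.
mb_per_second = 10
seconds_per_month = 60 * 60 * 24 * 30        # 2,592,000 s in a 30-day month
mb_per_month = mb_per_second * seconds_per_month
tb_per_month = mb_per_month / 1_000_000      # decimal units: 1 TB = 1,000,000 MB
print(tb_per_month)                          # -> 25.92, i.e. roughly 26 TB/month
```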