Engineering Life: How CLASSIC Is Redefining Genetic Circuit Design
How the CLASSIC model characterizes genetic circuits and its application in biotech
The field of synthetic biology involves redesigning and engineering organisms to have new functions, allowing them to perform useful tasks. One of the key ways that scientists redesign organisms such as cells is by building genetic circuits. Genetic circuits are networks of genes and regulatory elements that control gene expression. They are designed to mimic the behavior of natural genetic regulatory systems, often using specific DNA sequences, such as promoters and operators, together with the transcription factors that bind them.
With large-scale gene expression data collected from millions of cells, it is possible to learn the relationship between genetic circuits and the traits of the cells they control. Because genetic circuits have proven to be effective tools, a central question in synthetic biology is: how can we systematically design genetic circuits for new tasks?
Many modeling frameworks and software tools have been developed for biological circuit design, but the sheer complexity of biology has limited their effectiveness. For now, these tools are still nascent. Instead of pushing out new tools, it seems necessary to take a step back and focus on better understanding the underlying biology.
We first need better-quality data on many types of genetic circuits in order to later model complex cellular processes effectively. After all, a lot of meticulous work has gone into developing frameworks for circuit design, yet the biologists building the most complex systems still rely on empirical intuition.
Thus, the main question for the field should shift to: how do we collect the data necessary to accurately model and design new genetic circuits? To answer it, scientists have begun to examine how the individual parts of a genetic circuit shape its behavior.
A group of scientists in the Department of Bioengineering at Rice University has developed a platform called CLASSIC. CLASSIC characterizes a large number of genetic circuits by combining multiple types of sequencing technologies. By characterizing numerous genetic circuits, the scientists can create mappings that reveal rules for how different genetic parts can be combined, and quickly identify the circuits that would perform the desired functions or exhibit the desired behaviors most effectively.
So how does CLASSIC actually work? CLASSIC uses both long-read and short-read next-generation sequencing (NGS) to analyze mixed libraries of DNA constructs. These libraries serve as the input to the platform and are created by combining genetic parts with a pool of barcode sequences. The parts are stitched together using a Golden Gate assembly protocol, and the assembled library is then sequenced using long-read nanopore sequencing. (Sequencing refers to determining the precise order of nucleotides, or bases, in a DNA or RNA molecule.) This step establishes a mapping, or index, between each construct (genetic circuit) and its corresponding barcode, hence the name: construct-to-barcode index.
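To make the idea of a construct-to-barcode index concrete, here is a minimal Python sketch. It is not the authors' pipeline: the read format (a barcode paired with a tuple of part names), the part names, and the rule of discarding barcodes seen with conflicting constructs are all assumptions made for illustration.

```python
from collections import defaultdict


def build_construct_index(long_reads):
    """Map each barcode to the single construct it was observed with.

    long_reads: iterable of (barcode, construct) pairs, where construct is a
    tuple of part names decoded from one long read (hypothetical format).
    Barcodes observed with more than one construct are dropped as ambiguous.
    """
    seen = defaultdict(set)
    for barcode, construct in long_reads:
        seen[barcode].add(construct)
    return {bc: next(iter(cs)) for bc, cs in seen.items() if len(cs) == 1}


# Toy long reads; barcodes and part names are invented for this example.
reads = [
    ("ACGT", ("promoter_A", "synTF_1", "reporter")),
    ("ACGT", ("promoter_A", "synTF_1", "reporter")),
    ("TTAG", ("promoter_B", "synTF_2", "reporter")),
    ("GGCA", ("promoter_A", "synTF_2", "reporter")),
    ("GGCA", ("promoter_B", "synTF_1", "reporter")),  # conflicts with the read above
]
construct_index = build_construct_index(reads)
```

In this toy run, the barcode `GGCA` is excluded because it appears with two different constructs, so only unambiguous barcodes end up in the index.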
Next, these constructs are introduced into mammalian cells, which contain the machinery needed to process and express genes. Once the constructs are inside the cells, they undergo gene expression: the conversion of the genetic information stored in genes into functional products, typically proteins.
The phenotype, or observable characteristics, of the circuits is then measured using an approach called flow-seq. Flow cytometry first measures the range of the trait across the cell population and sorts the cells into bins that each cover part of that range. Each bin is then sequenced using short-read Illumina sequencing, which reads out the barcodes and creates a phenotype-to-barcode index.
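One simple way to turn binned barcode counts into a phenotype-to-barcode index is a count-weighted average over the bins. The Python sketch below assumes that approach; the bin values, barcodes, and function name are hypothetical, and a real flow-seq analysis would typically also correct for how many cells were sorted into each bin.

```python
def phenotype_from_bins(bin_counts, bin_values):
    """Estimate one phenotype value per barcode from short-read counts.

    bin_counts: {barcode: [reads in bin 0, reads in bin 1, ...]}
    bin_values: representative measurement for each bin (assumed known)
    Returns {barcode: count-weighted mean of the bin values}.
    """
    index = {}
    for barcode, counts in bin_counts.items():
        total = sum(counts)
        if total == 0:
            continue  # barcode was never observed in any bin
        index[barcode] = sum(c * v for c, v in zip(counts, bin_values)) / total
    return index


# Toy data: three bins with invented representative values.
bin_values = [10.0, 100.0, 1000.0]
bin_counts = {
    "ACGT": [0, 10, 0],  # all reads fall in the middle bin
    "TTAG": [5, 0, 5],   # reads split between the lowest and highest bins
    "CCCC": [0, 0, 0],   # never recovered, so it is skipped
}
phenotype_index = phenotype_from_bins(bin_counts, bin_values)
```

A barcode whose reads all land in one bin simply inherits that bin's value, while a barcode spread across bins gets an intermediate estimate.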
After this rather complex process, a given barcode is linked to two pieces of information: the composition of a genetic circuit and the phenotype of that circuit. Linking the two forms a composition-to-function relationship: we now know how the composition of a genetic circuit affects its observable traits, which allows us to determine its function.
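The linking step is essentially a join on the barcode. As a minimal sketch, assuming both indexes are held as Python dictionaries keyed by barcode (the data and names here are invented), constructs tagged by several barcodes can be averaged into one value:

```python
from collections import defaultdict
from statistics import mean


def composition_to_function(construct_index, phenotype_index):
    """Join the two indexes on barcode and average per construct.

    construct_index: {barcode: construct}  (from long-read sequencing)
    phenotype_index: {barcode: phenotype}  (from flow-seq)
    Returns {construct: mean phenotype over all of its measured barcodes}.
    """
    per_construct = defaultdict(list)
    # Only barcodes present in both indexes can be linked.
    for barcode in construct_index.keys() & phenotype_index.keys():
        per_construct[construct_index[barcode]].append(phenotype_index[barcode])
    return {construct: mean(values) for construct, values in per_construct.items()}


# Toy indexes; barcodes and part names are hypothetical.
construct_index = {
    "ACGT": ("promoter_A", "synTF_1"),
    "TTAG": ("promoter_A", "synTF_1"),  # second barcode for the same construct
    "GGCA": ("promoter_B", "synTF_2"),
    "AAAA": ("promoter_B", "synTF_1"),  # sequenced but never measured
}
phenotype_index = {"ACGT": 100.0, "TTAG": 300.0, "GGCA": 50.0}
mapping = composition_to_function(construct_index, phenotype_index)
```

Constructs whose barcodes never show up in the phenotype index drop out of the result, which is why the fraction of recovered barcodes matters for how much of a library can be characterized.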
With that, we have fully unpacked the acronym CLASSIC: Combining Long And Short range Sequencing to Investigate genetic Complexity.
Putting this system into practice, the team optimized a system called synTF, a circuit whose transcription factor activity is controlled by the presence or absence of a particular drug. A transcription factor is a protein that binds specific DNA sequences and regulates the rate at which nearby genes are transcribed. Using CLASSIC, they analyzed a circuit design space of 165,888 distinct configurations. The circuit design space is the set of possible configurations, or arrangements of parts, of a genetic circuit.
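A design space grows multiplicatively: its size is the product of the number of options at each position in the circuit. The Python sketch below illustrates this with a deliberately small, made-up set of slots and options. These are not the actual synTF parts, and the source does not say how the 165,888 configurations break down.

```python
from itertools import product
from math import prod

# Hypothetical slots and options, chosen only to illustrate the arithmetic.
slots = {
    "promoter": ["weak", "medium", "strong"],
    "binding_sites": [1, 2, 4, 8],
    "activation_domain": ["AD_1", "AD_2"],
    "terminator": ["term_1", "term_2"],
}

# Every configuration is one choice per slot (a Cartesian product).
design_space = [dict(zip(slots, choice)) for choice in product(*slots.values())]

# The size can also be computed without enumerating anything.
size = prod(len(options) for options in slots.values())
```

Here four slots with 3, 4, 2, and 2 options already give 48 configurations, which shows how a handful of additional slots or options can push a design space into the hundreds of thousands.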
Their pooled screening approach was able to assign barcodes to 95.3% of the dataset, showcasing how much of a large library the approach can cover. The scale of circuit data that can be collected with a system like CLASSIC could greatly speed up and expand the field of synthetic biology, providing a foundation for designing complex genetic systems based on data-driven approaches.
In the concluding paragraph of their research paper, the lab mentions an exciting application of CLASSIC.
They state that the immense data acquired with CLASSIC could allow them to train high-capacity deep-learning algorithms to predict the behavior of, or even build models for, extremely complex genetic circuits [2]. With OpenAI’s language model, ChatGPT, showing that AI models tend to become more capable as they are trained on more data, CLASSIC’s ability to generate huge datasets moves us toward AI eventually becoming an integral part of the synthetic biology space.
If we are eventually able to do this, it could allow us to design new medicines and other biological systems far more quickly, propelling our society into a more digital and ultimately exciting future.
Sources:
https://www.biorxiv.org/content/10.1101/2023.03.16.532704v1
https://centuryofbio.com/p/accelerating-genetic-design