The pace of AI and machine learning has been extraordinary over the last few years, particularly since the field burst onto the public scene at the end of 2022 with the release of ChatGPT. Nearly every media channel is abuzz with the topic, and the rate of academic publication is eye-watering.
It can be difficult to keep up.
However, the basics of machine learning have not actually changed all that much. It is interesting that presentations and discussions by Ilya Sutskever, the chief scientist at OpenAI, consistently note that the ideas behind the GPT products are rather simple and not new. They are simply applied at a much larger scale, enabled largely by developments in computing architectures such as the GPU and by access to vast data sources like the Internet. This relative lack of innovation in actual artificial intelligence and machine learning architectures has been somewhat cynically described in the blog article Machine Learning: The Great Stagnation, whose primary thesis is that the majority of machine learning “research” today is focused on optimization and performance rather than on new architectures that lead toward advances in general intelligence.
Everyone is “Seeking SOTA”
While there have been significant developments in optimizing AI for both language and image generation, applications in biology are really just getting started.
This article is intended to break down the high-level architecture concepts of deep neural networks in order to provide intuition as to how they can be applied to challenges in biology. There is a tremendous amount of high-frequency noise in the machine learning publication sphere, so I will not attempt to cite all relevant publications and references to back up every statement. Rather, the goal is to provide a framework for understanding the jargon, architectures, and use cases of deep learning approaches to biology.
As one last caveat — machine learning and “AI” have been around for a long time in many forms. This article will focus primarily on “deep learning” — or neural networks that have substantial internal depth.
The Goal of Machine Learning is to Find Relationships Between Information
At the root of machine learning is the idea that everything can be represented as information and the goals of a learning algorithm are to determine the relationships between different pieces of that information. Information in isolation isn’t useful — it is the context that gives it value.
For an aside on this topic, I’d recommend this post by Francois Chollet, which discusses the concept of intelligence as a systems property rather than an isolated property. Intelligence exists in the interaction between an agent and its environment, not as a stand-alone concept.
From a structural perspective, there are relatively few ways to define the structural relationships between pieces of information, and these ways have been at the center of most machine learning architectures. I’ll summarize them below along with their machine learning counterparts and then discuss their various applications to biological data sets.
Structure 1: The world is continuous:
Continuity of information is most commonly understood in our experience of the physical and temporal world and is important for relating things that are near each other. The basic gist is that the closer two things are, the more likely they are to be related and, as such, patterns can be found at different scales. This is the basis for how we, for example, see pictures — and how we compress them. We often use this continuous interpretation to fill in data. Our brains, for example, use this continuity property of the world to autocomplete our optical blind spots.
The most common neural network structure used in continuous settings is the Convolutional Neural Network (CNN).
Convolutions can take many forms but the primary one in machine learning is to create various filters that group nearby data together to find patterns. For a deeper dive into this, 3Blue1Brown has a good video.
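To make this concrete, here is a minimal sketch in PyTorch (the channel counts and filter size are arbitrary and purely illustrative) of a 1D convolution sliding a small set of filters along a sequence, so that each output position depends only on a local neighborhood of the input:

```python
import torch
import torch.nn as nn

# A minimal sketch of a 1D convolution: a small set of filters slides along a
# sequence and combines nearby values, so each output position only "sees" a
# local neighborhood of the input.
sequence = torch.randn(1, 4, 100)   # (batch, channels, length); e.g., one-hot DNA uses 4 channels
conv = nn.Conv1d(in_channels=4, out_channels=8, kernel_size=5, padding=2)

local_patterns = conv(sequence)     # shape (1, 8, 100); each of the 8 filters looks for a local pattern
print(local_patterns.shape)
```

Stacking layers like this, often with pooling in between, lets patterns be detected at progressively larger scales.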
Structure 2: The world is discontinuous:
The idea that two pieces of information can be related but are not in direct proximity is an important part of machine learning architectures. This is most readily applied to language, where words can be related across long distances, but it is generally applicable in any setting where two pieces of information are related but not proximal.
The most common neural network structures used in discontinuous settings are variations on either Long Short-Term Memory architectures (or Recurrent Networks) or, more recently, the “Attention” mechanism and its implementation in the Transformer architecture.
Attention mechanisms are relatively straightforward: they take any number of pieces of information and create a matrix in which the values represent how much each piece cares about every other piece. It is a quadratic relationship, which makes it computationally expensive: if you have 3 pieces of information, you get a 3x3 matrix where each piece “attends” to itself and to the other pieces (that is, the degree to which it pays attention to them). Attention methods can also apply to images and other types of data where you want to determine whether there are discontinuous relationships between distal pieces of data. Attention in machine learning is often referred to as “self-attention”; it is not a new concept, but it has been packaged as a core part of the broader Transformer architecture. The basics of self-attention are probably most clearly described in this video series.
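As a rough sketch of the mechanism (not any particular production implementation; the projection matrices here are random stand-ins for what would normally be learned weights), scaled dot-product self-attention over three pieces of information looks like this:

```python
import torch

# A minimal sketch of scaled dot-product self-attention over n pieces of
# information, each represented as a d-dimensional vector.
n, d = 3, 16
x = torch.randn(n, d)                                 # 3 pieces of information

# Hypothetical projection matrices; in a trained model these are learned weights.
W_q, W_k, W_v = torch.randn(d, d), torch.randn(d, d), torch.randn(d, d)
Q, K, V = x @ W_q, x @ W_k, x @ W_v

# n x n matrix: how much each piece "attends" to itself and to every other piece.
attention = torch.softmax(Q @ K.T / d**0.5, dim=-1)   # shape (3, 3); each row sums to 1
output = attention @ V                                # each piece becomes a weighted mix of all pieces
print(attention.shape, output.shape)
```

The n x n attention matrix is exactly why the cost grows quadratically with the number of pieces of information.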
Structure 3: The world is asymmetric continuous:
This is a hybrid of the continuous and discontinuous perspectives on the world and is useful when there is a mix of information that needs to be represented in different ways. In one sense, a graph might be viewed as an attention matrix between nodes and edges, but with more information that can be mapped onto it. The asymmetry indicates that nodes and edges can be directional and have differing influences, and the continuity indicates that paths are connected throughout the network. Networks that combine local and distal relationships, such as social networks, often fall into this category.
The most common neural network structures used in an asymmetric continuous setting are Graph Neural Networks (GNNs). Variations on graph methods also include clustering algorithms.
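As a minimal illustration (a toy example with hypothetical node features and edges, not any specific GNN library), a single message-passing step lets each node update itself using its neighbors:

```python
import torch

# A toy message-passing step in a graph neural network: each node updates its
# features by aggregating (here, summing) transformed features from its neighbors.
num_nodes, d = 4, 8
node_features = torch.randn(num_nodes, d)

# Directed edges (source -> target); direction allows asymmetric influence.
edges = [(0, 1), (1, 2), (2, 0), (3, 2)]
adjacency = torch.zeros(num_nodes, num_nodes)
for src, dst in edges:
    adjacency[dst, src] = 1.0             # messages flow from src into dst

W = torch.randn(d, d)                     # hypothetical learned weight matrix
messages = adjacency @ node_features @ W  # each row: sum of that node's incoming neighbor features
updated = torch.relu(node_features + messages)
print(updated.shape)                      # torch.Size([4, 8])
```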
In general, these are the three primary ways of viewing the relationships between information and they give rise to most (if not all) of the basic deep learning architectures used today in a nearly infinite set of combinations and integrations.
As a note: these frameworks are not intended to be exhaustive, but they are generally at the root of most concepts of information relationships. They can also be used for information representation, such as using a convolutional framework to reduce dimensionality, but those methods are more about process than structure. For example, many generative algorithms are autoencoders, which might use a convolutional network or a transformer to find patterns with which to encode (i.e., compress) an image or text and then decode (i.e., restore) the same information.
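For example, a bare-bones autoencoder can be sketched in a few lines (using simple linear layers and arbitrary sizes for brevity, rather than a convolutional or transformer encoder): the encoder compresses the input into a small latent vector and the decoder attempts to restore it.

```python
import torch
import torch.nn as nn

# A bare-bones autoencoder: the encoder compresses the input into a small
# latent vector and the decoder tries to restore the original.
input_dim, latent_dim = 784, 32          # hypothetical sizes

encoder = nn.Sequential(nn.Linear(input_dim, 128), nn.ReLU(), nn.Linear(128, latent_dim))
decoder = nn.Sequential(nn.Linear(latent_dim, 128), nn.ReLU(), nn.Linear(128, input_dim))
autoencoder = nn.Sequential(encoder, decoder)

x = torch.randn(1, input_dim)            # stand-in for a flattened image or other data
reconstruction = autoencoder(x)
# Training would minimize the reconstruction error, e.g. nn.MSELoss()(reconstruction, x).
print(reconstruction.shape)              # torch.Size([1, 784])
```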
Thinking about Bio
When considering a biological problem or data set for deep learning, the first consideration is the nature of the relationships between the elements of data. A few examples are below:
DNA — DNA is both a linear local and a 3D distal information set. What this means is that bases of DNA are relationally important both in local physical proximity and in 3D conformational folding. Building a neural network that accommodates both of these elements would then likely entail combining the two architectures that represent these feature types: a Convolutional Neural Network (CNN) component and an Attention component.
Examples: The Enformer architecture has this CNN/Transformer hybrid structure.
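As a schematic of what such a hybrid might look like (this is an illustrative sketch with arbitrary sizes, not the actual Enformer implementation), convolutions first capture local motifs over one-hot encoded DNA and self-attention then relates the remaining, coarser positions:

```python
import torch
import torch.nn as nn

# A schematic CNN + attention hybrid for DNA (an illustration, not Enformer):
# convolutions capture local motifs, then self-attention relates distal positions.
class HybridDNAModel(nn.Module):
    def __init__(self, channels=64, n_heads=4):
        super().__init__()
        self.local = nn.Sequential(
            nn.Conv1d(4, channels, kernel_size=15, padding=7),  # local motif detectors
            nn.ReLU(),
            nn.MaxPool1d(2),                                    # coarsen the sequence
        )
        layer = nn.TransformerEncoderLayer(d_model=channels, nhead=n_heads, batch_first=True)
        self.distal = nn.TransformerEncoder(layer, num_layers=2)

    def forward(self, one_hot_dna):                 # (batch, 4, length)
        x = self.local(one_hot_dna)                 # (batch, channels, length // 2)
        x = x.transpose(1, 2)                       # (batch, length // 2, channels)
        return self.distal(x)                       # attention across all remaining positions

model = HybridDNAModel()
dna = torch.randn(1, 4, 2000)                       # stand-in for one-hot encoded DNA
print(model(dna).shape)                             # torch.Size([1, 1000, 64])
```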
It is also possible to consider the three architectures noted above as lying on a continuum — the concept of a “continuous” vs. “discontinuous” world is a matter of degree. In the example of DNA, an extreme version of this would consider that there is no specific local structure in DNA and that every base is a stand-alone feature with independent relationships to all other bases. This approach does not seem biologically accurate but is computationally possible using an attention model.
Examples: The Nucleotide Transformer is an example of an increasingly discontinuous model of the biological relationships of DNA.
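A schematic of this “every base attends to every base” view (again an illustrative sketch with arbitrary sizes, not the actual Nucleotide Transformer, which has its own tokenization and scale) might look like:

```python
import torch
import torch.nn as nn

# A schematic "pure attention" treatment of DNA (an illustration only): every
# base is a token and attends to every other base, with no convolutional
# notion of locality baked in.
vocab = {"A": 0, "C": 1, "G": 2, "T": 3}
sequence = "ACGTACGTGGCTA"
tokens = torch.tensor([[vocab[b] for b in sequence]])       # shape (1, 13)

d_model = 32
embed = nn.Embedding(len(vocab), d_model)
layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=4, batch_first=True)

out = layer(embed(tokens))                                  # attention cost grows as length**2
print(out.shape)                                            # torch.Size([1, 13, 32])
```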
An approach to deep learning of DNA biology that does not consider the differential between local and distal structural impact ignores biological context and is computationally more expensive. However, if computational resources are not limiting, such an agnostic approach may prove more useful in uncovering novel information.
Lastly, a graph approach to DNA may be possible, but it is unclear whether such an architecture would have advantages.
Proteins — For the most part, proteins have similar considerations to DNA in that both molecules have linear (local) and 3D (distal) relationship considerations. The shape and function of proteins are generally more biased toward 3D structure than 1D structure, so distal relationships are likely weighted more heavily, leading toward a more “attention”-based transformer architecture (which is central to AlphaFold’s architecture), though it has also been published that purely convolutional methods can be competitive with transformers for a variety of protein-related tasks.
Cellular -omics — In the greater space of -omics within a cell, the approaches will naturally bias more toward discontinuous or asymmetric continuous models, resulting in either attention- or graph-based models. A general difficulty with defining convolutions over -omic data is that the concept of a local relationship is less well defined.
Examples: scGPT is an example of a transformer architecture applied to transcriptomic data while GEARS is a similar application utilizing a graph neural network approach.
Imaging — Imaging in biology is worth an honorable mention, but it is generally no different from non-biological image analysis, which typically uses versions of convolutional neural networks and/or self-attention models for pattern and feature identification.
Intuition for Machine Learning Use Cases
The domain of machine learning and AI is currently both exciting and mysterious — and when combined with the complexity of biology, perhaps doubly so.
The goal of this article is to provide some intuitive ways of thinking both about what machine learning is actually doing as it relates to different types of biological information and about what we might expect it to be able to do, from a more first-principles perspective on the possible relationships between data.
For scientists, executives, and anyone generally interested in what machine learning is about, or aiming to understand what can or cannot be done (or what is desirable to do), the frameworks here might provide a starting point for asking questions about the data itself and the goals of any learning algorithms.
A Few Questions that May be Useful to Assess Machine Learning Potential
What are you looking for? — Are you looking to find new patterns? Are you looking to identify the most important features and relationships? Are you looking to create models of dynamic behavior?
What type of data is available and what are the features of the data? Is the data continuous? Discrete?
Does the data cover a sufficient amount of the sample space? See the article on Foundation Models.
Asking some starting questions about what you want to accomplish and the type of information available to accomplish it is essential to launching an effective machine learning strategy in biology.
Machine learning architectures are powerful learning tools, largely on account of their size, but their underlying principles are about finding relationships between pieces of information.
Understanding the goals and nature of such information is the first step to gaining intuition about what machine learning might be able to accomplish.
Closing Thoughts
Intuition for the fundamentals of current machine learning architectures aside, there is little doubt that the capabilities certain large models have demonstrated have surpassed many prior expectations. While it is important to be grounded in the first principles of information and its interrelationships as the core elements of machine learning potential, we should not limit our expectations about what is possible.
Indeed, the future potential of machine learning in all domains, but most interestingly in biology, is fascinating.
Disclaimer: The machine learning world is evolving very quickly, and the above is not intended to be comprehensive. It is specifically in such fast-moving situations that first principles are essential.
Image Credit: https://darylchow.com/frontiers/wp-content/uploads/2016/03/Data-intuition-better-decisions.jpg