Self-Learning Computers and the COVID-19 Vaccine You’re Getting
This article explains the mechanics of machine learning and how it is uniquely positioned to aid in drug discovery.
No, the COVID vaccine isn’t a ploy for implantable microchips or a means to turn us into genetically modified organisms. But it does have a less sinister, more fascinating mechanism with artificial intelligence at its core. Before I go into the details of vaccine development, there’s a fundamental question to be answered.
How Do Machines Learn?
This question seems pretty existential at first glance. If computers can understand ideas without being taught, then humans must eventually become unnecessary. That’s as crazy as believing your pet dog is going to take over the world just because you taught it how to roll over after months of training. It’s simply unrealistic.
Although processing power, algorithms, and memory are improving quickly, most scientists predict that computer intelligence is not the solution to all problems. And of course, the whipped cream on this sundae of suspicion is that it is terribly difficult to model the human brain and achieve the same levels of creativity and understanding. Here’s an article to kickstart your research into the topic if you’re still skeptical.
Now that we’ve got the elephant out of the room, let’s talk about the basic algorithm behind machine learning: gradient descent. It’s used to train machine-learning models and improve their performance. Think of it as a minimization algorithm that finds the point of least error.
A gradient just measures how much the error changes in response to a small change in each weight. There are three basic parts to the figure above: two weights as inputs, an output loss, and a 3D depiction of a cost function. The parameters labeled “weights” are tweaked iteratively to arrive at a desired minimum point. The algorithm first assigns random weights, which correspond to an arbitrary starting point on the graph above. It then determines which direction to step in the input space in order to reduce the function’s output most quickly. Think of a ball starting from this random point and taking the path of least resistance as it rolls down the hill.
In multivariable calculus, taking the negative gradient of a function will give you the direction of steepest descent. A self-learning machine just finds ways to repeat this process over and over, oftentimes with more than just two input weights.
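To make the rolling-ball picture concrete, here is a minimal Python sketch of gradient descent on a bowl-shaped cost with two weights. The cost function, the starting point, and the learning rate are all assumptions chosen for illustration; a real model's cost surface is far bumpier.

```python
def cost(w1, w2):
    # Hypothetical bowl-shaped cost with its minimum at (3, -1).
    return (w1 - 3.0) ** 2 + (w2 + 1.0) ** 2


def gradient(w1, w2):
    # Partial derivatives of the cost with respect to each weight.
    return 2.0 * (w1 - 3.0), 2.0 * (w2 + 1.0)


def gradient_descent(w1, w2, learning_rate=0.1, steps=100):
    grad_norms = []
    for _ in range(steps):
        g1, g2 = gradient(w1, w2)
        grad_norms.append((g1 ** 2 + g2 ** 2) ** 0.5)
        # Step against the gradient: the direction of steepest descent.
        w1 -= learning_rate * g1
        w2 -= learning_rate * g2
    return w1, w2, grad_norms


# A fixed starting point stands in for the random initial weights.
w1, w2, grad_norms = gradient_descent(w1=-4.0, w2=5.0)
print("final weights:", w1, w2)
print("final cost:", cost(w1, w2))
print("first and last gradient size:", grad_norms[0], grad_norms[-1])
```

The recorded gradient sizes shrink as the weights approach the bottom of the bowl, so each step is smaller than the one before it.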
Even a small neural network can have around 13,000 weights and biases arranged in a giant column vector. Finding the negative gradient of a cost function will give you a vector that represents the most rapid decrease in error. The algorithm for computing this gradient efficiently is called backpropagation.
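As a rough illustration of what backpropagation does, the sketch below applies the chain rule backward through a toy network with one hidden neuron, then checks the result against a slow numerical estimate. The network shape, the parameter values, and the function names are hypothetical; real networks do the same thing with thousands of parameters at once.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(params, x):
    w1, b1, w2, b2 = params
    h = sigmoid(w1 * x + b1)  # hidden neuron
    y = w2 * h + b2           # linear output neuron
    return h, y


def loss(params, x, target):
    _, y = forward(params, x)
    return 0.5 * (y - target) ** 2


def backprop(params, x, target):
    # Chain rule, applied from the output back toward the input.
    w1, b1, w2, b2 = params
    h, y = forward(params, x)
    dy = y - target            # d(loss)/dy
    dw2 = dy * h
    db2 = dy
    dh = dy * w2
    dz = dh * h * (1.0 - h)    # derivative of sigmoid is h * (1 - h)
    dw1 = dz * x
    db1 = dz
    return [dw1, db1, dw2, db2]


def numerical_gradient(params, x, target, eps=1e-6):
    # Nudge each parameter up and down and measure the change in loss.
    grads = []
    for i in range(len(params)):
        up = params[:i] + [params[i] + eps] + params[i + 1:]
        down = params[:i] + [params[i] - eps] + params[i + 1:]
        grads.append((loss(up, x, target) - loss(down, x, target)) / (2 * eps))
    return grads


params = [0.5, -0.3, 0.8, 0.1]  # made-up values for w1, b1, w2, b2
analytic = backprop(params, x=1.5, target=1.0)
numeric = numerical_gradient(params, x=1.5, target=1.0)
print(analytic)
print(numeric)
```

The two gradients should agree closely. The difference is cost: the numerical version re-runs the network twice per parameter, while backpropagation gets every partial derivative from a single backward pass.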
The first graph has two weights because a machine learning algorithm usually has both weights and biases. Remember your pet dog, Chip, who just learned how to roll over? This is like Chip readjusting his actions based on how positively you react to him and how close his action is to the desired outcome. In computer science, weights are the strengths of connections in a neural network, and biases shift the threshold at which a neuron becomes active or inactive.
A neural network, modeled on the one that exists in our brains, is a series of interconnected switches that turn on or off. Using the figure below, a machine learning algorithm will begin with random weights and biases, just as in the gradient descent explanation above, and use an activation function to find an output. There are many kinds of activation functions: Binary Step, Linear Activation, ReLU, Sigmoid, TanH, Softmax, and Swish. The activation function can be thought of as a way to decide which information is important enough to pass on to the next neuron.
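Here is a short sketch of the activation functions listed above, plus a single neuron that combines weights, a bias, and an activation. The formulas are the standard textbook definitions (Swish is shown with its common default of z times sigmoid(z)); the `neuron` helper and the example numbers are assumptions for illustration.

```python
import math


def binary_step(z):
    return 1.0 if z >= 0 else 0.0


def linear(z):
    return z


def relu(z):
    return max(0.0, z)


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def tanh(z):
    return math.tanh(z)


def swish(z):
    return z * sigmoid(z)


def softmax(zs):
    # Works on a whole layer of values; subtracting the max avoids overflow.
    biggest = max(zs)
    exps = [math.exp(z - biggest) for z in zs]
    total = sum(exps)
    return [e / total for e in exps]


def neuron(inputs, weights, bias, activation):
    # Weighted sum of the inputs, shifted by the bias, then squashed.
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return activation(z)


print(neuron([1.0, 2.0], [0.5, -0.25], 0.0, relu))
print(neuron([1.0, 2.0], [0.5, -0.25], 1.0, sigmoid))
print(softmax([1.0, 2.0, 3.0]))
```

Raising the bias makes the same inputs more likely to push the neuron past its threshold, which is why it is tuned alongside the weights.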
If you observe the graph above closely, you can tell that the algorithm’s pace of learning decreases over time. As the program’s output gets closer to the desired outcome, the gradient vectors that point to the steepest descent get smaller in magnitude.
Let’s zoom out and recap where we are so far. A neural network is a function with i inputs and o outputs defined by weighted sums, as depicted below. The cost function is a layer of complexity on top of this that takes all the weights and biases within the hidden layers of the network and spits out an output called error that’s found relative to all the training data. The gradient of this cost function is the third layer of complexity. It tells you which changes to the weights and biases correspond to the fastest decrease in the output of the cost function, error. In other words, it tells you which changes to which weights matter the most.
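The three layers in this recap fit in a few lines of Python. Everything specific here is an assumption for illustration: the tiny network with two inputs, two hidden neurons, and one output, the four training points, the starting parameters, and the finite-difference gradient, which is a slow stand-in for backpropagation.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def network(params, x):
    # Layer 1: the network itself, a function built from weighted sums.
    w11, w12, w21, w22, b1, b2, v1, v2, c = params
    h1 = sigmoid(w11 * x[0] + w12 * x[1] + b1)
    h2 = sigmoid(w21 * x[0] + w22 * x[1] + b2)
    return v1 * h1 + v2 * h2 + c


def cost(params, data):
    # Layer 2: average squared error over all the training data.
    return sum((network(params, x) - t) ** 2 for x, t in data) / len(data)


def gradient(params, data, eps=1e-6):
    # Layer 3: how the cost responds to a nudge in each parameter.
    grads = []
    for i in range(len(params)):
        up = params[:i] + [params[i] + eps] + params[i + 1:]
        down = params[:i] + [params[i] - eps] + params[i + 1:]
        grads.append((cost(up, data) - cost(down, data)) / (2 * eps))
    return grads


data = [((0.0, 0.0), 0.0), ((0.0, 1.0), 1.0), ((1.0, 0.0), 1.0), ((1.0, 1.0), 0.0)]
params = [0.5, -0.4, 0.3, 0.8, 0.1, -0.2, 0.3, -0.2, 0.0]  # made-up start

grads = gradient(params, data)
# The entry with the largest magnitude is the change that matters most.
most_influential = max(range(len(grads)), key=lambda i: abs(grads[i]))
stepped = [p - 0.05 * g for p, g in zip(params, grads)]

print("most influential parameter index:", most_influential)
print("cost before and after one step:", cost(params, data), cost(stepped, data))
```

A single small step against the gradient lowers the cost; training is just that step repeated many times.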
Even with an understanding of how computers can find their own mistakes and optimize their output, it’s important to extrapolate these ideas to the world around us.
Why Is AI the Key to Curing Disease?
Deep learning (DL) is capable of automatic feature extraction from raw data. This just means the algorithm can reduce the number of random variables by omitting any available features that don’t differentiate a set of data from other groups. In an algorithm inspired by evolutionary biology and natural selection, a computer generates an optimal binary vector where each bit is associated with a feature. If the bit, like a gene in an organism, is highly correlated with the desired output, then it will be set to 1. If the opposite is true, it will be set to 0 and the corresponding feature will not participate in the classification. This search for desirable features is iterative, much like gradient descent, although it relies on selection and mutation rather than gradients.
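The evolutionary search over binary vectors can be sketched roughly as follows. This is a toy genetic algorithm, not the method of any particular paper: the correlation scores, the threshold, the population size, and the mutation rate are all made-up values, and a real pipeline would score each candidate by training a classifier on the selected features.

```python
import random

# Hypothetical correlation of each feature with the desired output.
scores = [0.9, 0.05, 0.7, 0.1, 0.02, 0.6]


def fitness(mask, scores, threshold=0.3):
    # Reward bits that switch on well-correlated features, penalize the rest.
    return sum(s - threshold for bit, s in zip(mask, scores) if bit)


def evolve(scores, pop_size=30, generations=40, mutation_rate=0.1, seed=0):
    rng = random.Random(seed)
    n = len(scores)
    population = [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        population.sort(key=lambda m: fitness(m, scores), reverse=True)
        history.append(fitness(population[0], scores))
        survivors = population[: pop_size // 2]  # natural selection
        children = [survivors[0][:]]             # keep the best one unchanged
        while len(children) < pop_size:
            mom, dad = rng.sample(survivors, 2)
            cut = rng.randint(1, n - 1)          # single-point crossover
            child = mom[:cut] + dad[cut:]
            # Occasionally flip a bit, like a random mutation in a gene.
            child = [1 - b if rng.random() < mutation_rate else b for b in child]
            children.append(child)
        population = children
    best = max(population, key=lambda m: fitness(m, scores))
    history.append(fitness(best, scores))
    return best, history


best, history = evolve(scores)
print("selected feature mask:", best)
print("best fitness per generation:", history)
```

Each 1 in the printed mask marks a feature that takes part in the classification, and each 0 marks one that is left out. Because the best candidate is always carried over unchanged, the best fitness never gets worse from one generation to the next.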
Secondly, with respect to drug discovery, DL models can use their generative ability to create more druggable molecules and improve epitope prediction, lowering the chance of failure in a clinical trial. An isolated epitope, which is the part of a foreign molecule to which an antibody attaches, can stimulate a specific immune response in an organism. Effective vaccines often have cocktails of specific epitopes that elicit cellular immune responses. Hypervariable viruses, like the ones previously mentioned, require antibodies with differing polypeptide segments on their ends. Machine learning models are uniquely positioned to iteratively understand these complexities.
The last defining quality of DL algorithms in the fight against novel viruses is transfer learning. Artificial intelligence is often criticized for its inability to apply its learned knowledge to different tasks after slight changes in parameters. But new advancements are letting algorithms leverage their learned knowledge from previous tasks. The data sets available to train models for in silico drug discovery, just a fancy phrase for a computer simulation, are often small. Moreover, labeled data for drug discovery is hard to find. Current algorithms can use their existing, generalizable knowledge from related tasks to perform new tasks without enormous amounts of additional data.
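One common flavor of transfer learning can be sketched like this: keep the layers learned on a large, related task frozen, and train only a small new output layer on the handful of labeled examples available. The "pretrained" weights below are made-up numbers standing in for a real model, and the four-example data set is hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical weights and biases "learned" on a large source task. Frozen.
PRETRAINED = [([0.8, -0.5], 0.1), ([-0.3, 0.9], -0.2), ([0.5, 0.5], 0.0)]


def extract_features(x):
    # Reuse the existing knowledge: these layers are never updated.
    return [sigmoid(sum(w * xi for w, xi in zip(ws, x)) + b) for ws, b in PRETRAINED]


def predict(head, x):
    feats = extract_features(x)
    return sum(h * f for h, f in zip(head[:-1], feats)) + head[-1]


def loss(head, data):
    return sum((predict(head, x) - t) ** 2 for x, t in data) / len(data)


def fine_tune(data, learning_rate=0.1, steps=500):
    head = [0.0] * (len(PRETRAINED) + 1)  # new weights plus one bias
    for _ in range(steps):
        grads = [0.0] * len(head)
        for x, t in data:
            feats = extract_features(x) + [1.0]  # 1.0 pairs with the bias
            error = predict(head, x) - t
            for i, f in enumerate(feats):
                grads[i] += 2.0 * error * f / len(data)
        # Gradient descent on the new layer only.
        head = [h - learning_rate * g for h, g in zip(head, grads)]
    return head


# A deliberately tiny labeled data set for the new task.
small_dataset = [((0.2, 0.7), 1.0), ((0.9, 0.1), 0.0), ((0.4, 0.8), 1.0), ((0.8, 0.3), 0.0)]
untrained = [0.0] * (len(PRETRAINED) + 1)
head = fine_tune(small_dataset)
print("loss before:", loss(untrained, small_dataset))
print("loss after:", loss(head, small_dataset))
```

Only four numbers are learned from the new data, which is why this approach can work when labeled examples are scarce.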
About the Author:
👋 I’m Tasha — an 18-year-old innovator at The Knowledge Society, with particular interests in artificial intelligence, biology, math, and everything in-between. Check out my other projects here, or contact me via LinkedIn!