This essay was originally part of the series "The Anatomical Basis of Mind", available on this website as The Anatomical Basis of Mind (Neurophysiology & Consciousness). The purpose of that series was to learn & explain what is known about how neurophysiology results in the phenomena of mind & self. Further, to understand which brain/neuronal structures must be preserved to preserve mind & self. And still further, to be able to use that knowledge to suggest & evaluate cryonics and other preservation methods.
This installment addresses the subject of computer-models of neural networks and the relevance of those models to the functioning brain. The computer field of Artificial Intelligence is a vast bottomless pit which would lead this series too far from biological reality -- and too far into speculation -- to be included. Neural network theory will be the singular exception because the model is so persuasive and so important that it cannot be ignored.
Neurobiology provides a great deal of information about the physiology
of individual neurons as well as about the function of nuclei and other
gross neuroanatomical structures. But understanding the behavior of
networks of neurons is exceedingly challenging for neurophysiology,
given current methods. Nonetheless, network behavior is
important, especially in light of evidence for so-called "emergent
properties", ie, properties of networks that are not obvious from an
understanding of neuron physiology. Although neural networks as they are
implemented on computers were inspired by the function of biological neurons,
many of the designs have become far removed from biological reality. Moreover,
many of the designers have lost all interest in simulating neurophysiology --
they are more interested in using their new tools to solve problems. The
theory of computation of artificial neural networks can be highly mathematical,
with some networks existing entirely as mathematical models. My exposition
will attempt to minimize mathematics, using only verbal descriptions and
some simple arithmetic.
The building-block of computer-model neural networks is a processing unit called a neurode, which captures many essential features of biological neurons.
In the diagram, three neurodes are shown which can perform the logical operation "AND", ie, the output neurode will fire only if the two input neurodes are both firing. The output neurode has a "threshold" value (T) of 3/2 (ie, 1.5). If neither or only one input neurode is firing, the total input to the output neurode will be less than 1.5, and the output neurode will not fire. However, if both input neurodes are firing, the total input of 1+1=2 will be greater than the threshold value of 1.5, and the output neurode will fire. An "OR" operation can be implemented using the same architecture by changing the threshold value to 0.5. In that case, the output neurode fires if either or both input neurodes are firing.
The values in parentheses (1) on the connections between the neurodes are weights of the connections, corresponding to the synaptic strength of neuron connections. In biological neural networks the firing of a neuron can result in varying amounts of neurotransmitter released at the synapses of that neuron. Imagine, for example, a neuron whose axon branches to 3 presynaptic terminals. One terminal releases neurotransmitter from 20 vesicles, another from 100 vesicles and the third from 900 vesicles. The synaptic strength (the weight) of the second terminal is 5 times as great as the first, everything else being equal. In the neurodes of computer models, weights tend to be values between -1 and +1. Notice that in the examples shown, the weights could have been (0.8) rather than (1) and the results would be the same.
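To make the arithmetic concrete, here is a minimal sketch in Python (the function name and loop are my own illustration, written to match the diagrams described above):

    def neurode_fires(inputs, weights, threshold):
        # A neurode fires (output 1) when its weighted input sum exceeds its threshold
        total = sum(i * w for i, w in zip(inputs, weights))
        return 1 if total > threshold else 0

    for a in (0, 1):
        for b in (0, 1):
            print(a, "AND", b, "=", neurode_fires([a, b], [1, 1], 1.5))
            print(a, "OR ", b, "=", neurode_fires([a, b], [1, 1], 0.5))

Substituting weights of 0.8 for 1 leaves every output unchanged, as noted above.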
Now consider a more complex network, one designed to do the logical operation "EXCLUSIVE-OR" (XOR). The threshold values are shown inside the neurode circles and the weights are shown alongside the connections. Note the addition of a neurode (the hidden neurode) between the input and output neurodes.
In an XOR operation, the output neurode fires only if one (but not both) of the input neurodes fires. If only one input neurode fires, the hidden neurode does not fire, and the output neurode receives a total input of +1 -- greater than its 0.5 threshold -- so it fires. But if both input neurodes fire, the hidden neurode also fires, and the output neurode receives a total input of 1+1-2=0. Since 0 is less than the 0.5 threshold of the output neurode, the output neurode does not fire.
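Reusing the neurode_fires function from the sketch above, the XOR network just described can be expressed as follows (the hidden threshold of 1.5 is my assumption; the arithmetic in the text only requires it to lie between 1 and 2):

    def xor_net(a, b):
        # The hidden neurode fires only when both inputs fire
        hidden = neurode_fires([a, b], [1, 1], 1.5)
        # Each input excites the output (+1); the hidden neurode inhibits it (-2)
        return neurode_fires([a, b, hidden], [1, 1, -2], 0.5)

    for a in (0, 1):
        for b in (0, 1):
            print(a, "XOR", b, "=", xor_net(a, b))   # fires only for (0,1) and (1,0)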
The solution shown is not the only possible solution to the XOR problem in a simple neurode network. There are, in fact, infinitely many possible solutions. Two more example solutions are shown. Negative connection weights represent inhibitory rather than excitatory connections (synapses). Note that threshold values can also be less than zero.
In these examples the relationships between the thresholds, weights, inputs and outputs can be analyzed in detail. But in neural networks (both computer and biological) with large numbers of inputs, outputs and hidden neurodes (neurons), the task of determining the weights and threshold values required to achieve desired outputs from given inputs becomes practically impossible.
Computer models therefore attempt to train networks to
adjust their weights to give desired outputs from given inputs. If biological
memory and learning are the result of synapse strengths -- and modifications
of synapse strengths -- then the computer models can be very instructive.
Computer neural network models are described in terms of their
architecture (patterns of connection) and in terms of the way they are
trained (rules for modifying weights). I will therefore classify my
descriptions into four categories: (1) Perceptrons & Backpropagation,
(2) Competitive Learning, (3) Attractor Networks and (4) Other Neural
Network Models.
The architecture of a Perceptron consists of a single input layer of many neurodes and a single output layer of many neurodes. The simple "networks" illustrated at the beginning to produce the logical "AND" and "OR" operations have a Perceptron architecture. But to be called a Perceptron, the network must also implement the Perceptron learning rule for weight adjustment. This learning rule compares the actual network output to the desired network output to determine the new weights. For example, if the network illustrated gives a "0 1 0" output when "0 1 1" is the desired output for some input, all of the weights leading to the third output neurode would be adjusted by some factor.
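A sketch of the idea in Python -- the exact update form and learning rate are my own illustrative choices; the text specifies only that the weights leading to an erring output neurode are adjusted by some factor:

    def perceptron_update(weights, inputs, actual, desired, rate=0.1):
        # Adjust the weights leading to one output neurode by comparing
        # its actual output to its desired output (both 0 or 1)
        error = desired - actual                     # -1, 0 or +1
        return [w + rate * error * x for w, x in zip(weights, inputs)]

    # In the "0 1 0" versus desired "0 1 1" example, only the third output
    # neurode is wrong, so only the weights leading to it are changed:
    weights_to_third = perceptron_update([0.2, -0.5, 0.3], [1, 0, 1],
                                         actual=0, desired=1)
    print(weights_to_third)        # weights on the active inputs are nudged upward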
The Adaline is a modification of the Perceptron which substitutes bipolar (-1/+1) inputs for binary (0/1) inputs, and adds a "bias" term. But the most important modification is the use of a delta learning rule. As with the Perceptron, the delta rule compares desired output to actual output to compute the weight adjustments. But the delta rule squares the errors and averages them, to avoid negative errors cancelling out positive ones. Adalines have been used to eliminate echoes in phone lines for nearly 30 years.
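A sketch of the distinction, assuming the standard Widrow-Hoff (least-mean-squares) form of the delta rule; the numbers are illustrative:

    def delta_update(weights, inputs, desired, rate=0.05):
        # Delta (Widrow-Hoff) rule: nudge each weight so as to reduce the squared
        # error between the desired output and the actual (linear) output
        actual = sum(w * x for w, x in zip(weights, inputs))
        return [w + rate * (desired - actual) * x for w, x in zip(weights, inputs)]

    # Squaring errors before averaging keeps positive and negative
    # errors from cancelling out:
    errors = [0.5, -0.5]
    print(sum(errors) / len(errors))                 # 0.0  -- misleadingly "perfect"
    print(sum(e * e for e in errors) / len(errors))  # 0.25 -- the mean squared error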
Neural network research went through many years of stagnation after Marvin Minsky and Seymour Papert showed that Perceptrons could not solve problems such as the EXCLUSIVE-OR problem. Several modifications of the Perceptron model, however, produced the Backpropagation model -- a model which can solve XOR and many more difficult problems. Backpropagation has proven to be so powerful that it currently accounts for 80% of all neural network applications.
In Backprop, a third neurode layer is added
(the hidden layer) and the discrete thresholding function is replaced
with a continuous (sigmoid) one. But the most important modification for
Backprop is the generalized delta rule, which allows for adjustment
of weights leading to the hidden layer neurodes in addition to the
usual adjustments to the weights leading to the output layer neurodes.
Using the generalized delta rule to adjust the weights leading to the hidden units is backpropagating the error-adjustment.
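The following Python sketch puts these pieces together for the XOR problem: a hidden layer, sigmoid outputs, and the generalized delta rule. The layer sizes, learning rate and epoch count are my own choices, and with an unlucky random seed so small a network can settle into a local minimum -- re-seeding and retraining is the usual remedy:

    import math, random

    def sigmoid(x):                        # continuous replacement for the
        return 1.0 / (1.0 + math.exp(-x))  # discrete thresholding function

    random.seed(1)
    # 2 inputs -> 2 hidden -> 1 output; the last weight in each list is a bias
    w_hid = [[random.uniform(-1, 1) for _ in range(3)] for _ in range(2)]
    w_out = [random.uniform(-1, 1) for _ in range(3)]
    data = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 0)]   # XOR
    rate = 0.5

    for _ in range(20000):
        for x, target in data:
            hid = [sigmoid(w[0]*x[0] + w[1]*x[1] + w[2]) for w in w_hid]
            out = sigmoid(w_out[0]*hid[0] + w_out[1]*hid[1] + w_out[2])
            d_out = (target - out) * out * (1 - out)           # delta rule at the output
            d_hid = [d_out * w_out[i] * hid[i] * (1 - hid[i])  # error backpropagated
                     for i in range(2)]                        # through outgoing weights
            for i in range(2):
                w_out[i] += rate * d_out * hid[i]
            w_out[2] += rate * d_out
            for i in range(2):
                for j in range(2):
                    w_hid[i][j] += rate * d_hid[i] * x[j]
                w_hid[i][2] += rate * d_hid[i]

    for x, target in data:
        hid = [sigmoid(w[0]*x[0] + w[1]*x[1] + w[2]) for w in w_hid]
        out = sigmoid(w_out[0]*hid[0] + w_out[1]*hid[1] + w_out[2])
        print(x, "->", round(out, 2), "(target:", target, ")")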
The prototypic competitive learning ("self-organizing") model is the Kohonen network, named after Teuvo Kohonen, the Finnish researcher who pioneered it. A Kohonen network is a two-layered network, much like the Perceptron. But the output layer, known as the "competitive layer", can be represented as a two-dimensional grid of neurodes, each of which receives connections from the input layer (a two-neurode input layer, in the example illustrated). The input values are continuous, typically normalized to values between -1 and +1. Training of the Kohonen network does not involve comparing the actual output with a desired output. Instead, the input vector is compared with the weight vectors leading to the competitive layer. The neurode with a weight vector most closely matching the input vector is called the winning neurode.
For example, if the input vector is (0.35, 0.8), the winning neurode might have weight vector (0.4, 0.78). The learning rule would adjust the weight vector to make it even closer to the input vector. Only the winning neurode produces output, and only the winning neurode gets its weights adjusted. In more sophisticated models, only the weights of the winning neurode and its immediate neighbors are updated.
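A sketch of one training step, using the vectors from the example (the distance measure and learning rate are my own illustrative choices):

    def squared_distance(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

    def winning_neurode(input_vec, weight_vecs):
        # The winner is the neurode whose weight vector best matches the input
        return min(range(len(weight_vecs)),
                   key=lambda i: squared_distance(input_vec, weight_vecs[i]))

    x = [0.35, 0.8]                        # the input vector from the example
    weights = [[0.4, 0.78], [-0.6, 0.2]]   # weight vectors of two competing neurodes
    win = winning_neurode(x, weights)      # neurode 0 wins
    # Move the winner's weight vector a fraction of the way toward the input
    weights[win] = [w + 0.5 * (xi - w) for w, xi in zip(weights[win], x)]
    print(win, weights[win])               # winner is now even closer to (0.35, 0.8)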
After training, a limited number of input vectors will map to
activation of distinct output neurodes. Because the weights are modified
in response to the inputs, rather than in response to desired outputs,
competitive learning is called unsupervised learning, to distinguish
it from the supervised learning of Perceptrons, Adalines and
Backpropagation. In supervised learning, comparison is made between
actual outputs and desired outputs supplied by an external supervisor.
There is no external supervisor in competitive learning.
The most notable attractor networks are the Hopfield Network, the Boltzmann Machine and the Bidirectional Associative Memory (BAM). The Hopfield Network can be represented in a number of ways, all of which are essentially equivalent.
The diagram on the left indicates that every neurode has a connection with
every other neurode in two directions, but it omits the detail that each
neurode is also an input neurode and an output neurode, as is shown in
the middle diagram. The diagram on the right is called a Crossbar
Network representation of a Hopfield Network, and it is a convenient
tool when analyzing connection weights as a matrix of numbers.
The Hopfield Network is presented with an input vector, which remains active as the neurodes update their states one-by-one in sequence (usually more than once for each neurode) until the output is constant. Each neurode updates its state on the basis of the total weighted input it receives from the other neurodes. This process of arriving at the output is called relaxation or annealing, and can be expressed as an energy equation -- which is exactly what was done by the physicist John Hopfield, who conceived of this network.
The lower energy states are the "attractors" of the network. The settling of the network into its lowest energy state can be compared to a ball rolling to the bottom of a hill. If the hill has a hump, however, the ball may not fall to its lowest energy state, but be caught in a local minimum. The Boltzmann Machine is a modified Hopfield Network that adds a "Boltzmann temperature term" ("noise") to jostle the ball out of the local minimum.
The Hopfield Network is an associative memory because it can "recognize" patterns. For example, a fully trained network might give the three outputs (1,1,1,-1,-1,-1), (1,1,-1,-1,1,1) or (-1,1,-1,1,-1,1). If given the input (1,1,1,1,-1,-1) it would most likely give as output (1,1,1,-1,-1,-1) -- the first output -- since that is the pattern closest to the one that the network recognizes. In practice, to avoid errors, a Hopfield Network should not be expected to recognize a number of patterns that is more than 15% of the number of neurodes. That is, a 100 neurode network should not be expected to recognize more than 15 patterns.
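The text does not spell out how the weights are obtained; a common prescription is the Hebbian outer-product rule, used in this sketch together with the patterns from the example (the sweep order is my own choice -- with asynchronous updating, the order can influence which attractor is reached):

    patterns = [
        ( 1, 1,  1, -1, -1, -1),
        ( 1, 1, -1, -1,  1,  1),
        (-1, 1, -1,  1, -1,  1),
    ]
    n = len(patterns[0])
    # Symmetric weight matrix with a zero diagonal (no self-connections)
    W = [[0 if i == j else sum(p[i] * p[j] for p in patterns) for j in range(n)]
         for i in range(n)]

    state = [1, 1, 1, 1, -1, -1]           # the noisy input from the text
    for _ in range(3):                     # a few asynchronous sweeps
        for i in reversed(range(n)):       # update one neurode at a time
            total = sum(W[i][j] * state[j] for j in range(n))
            state[i] = 1 if total >= 0 else -1
    print(state)                           # settles to [1, 1, 1, -1, -1, -1]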
Bidirectional Associative Memories consist of two layers of neurodes, with every neurode in one layer connected to every neurode in the other. For an autoassociative memory, the two layers will have the same number of neurodes and will output patterns similar to the input. For a heteroassociative memory, the two layers can have a different number of neurodes, as would be the case in mapping between ASCII codes and alphabetic letters.
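A minimal heteroassociative sketch (the stored pairs below are arbitrary bipolar stand-ins of my own devising, not real ASCII codes):

    pairs = [
        (( 1,  1, -1, -1), ( 1, -1,  1)),
        (( 1, -1,  1, -1), (-1,  1,  1)),
    ]
    n, m = 4, 3
    # Weight matrix: sum of outer products of the stored pairs
    W = [[sum(x[i] * y[j] for x, y in pairs) for j in range(m)] for i in range(n)]

    sign = lambda v: 1 if v >= 0 else -1
    x = [1, 1, -1, 1]                      # noisy version of the first x-pattern
    for _ in range(5):                     # pass activity back and forth until stable
        y = [sign(sum(x[i] * W[i][j] for i in range(n))) for j in range(m)]
        x = [sign(sum(W[i][j] * y[j] for j in range(m))) for i in range(n)]
    print(x, y)                            # recovers the first stored pair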
Counterpropagation Networks are three-layered networks in which the hidden layer is a Kohonen layer. This model eliminates the need for backpropagation, thereby reducing training time, but performance is worse than with backpropagation.
Recurrent networks take some of the outputs and feed them back to the inputs or to hidden layer neurodes. (Hopfield Networks are totally recurrent.) Adaptive Resonance Theory (ART) networks attempt to simulate biological reality by the use of time-varying inputs rather than simultaneous inputs. Weights may be allowed to decay with time when they are not being continuously updated.
There are other models, but the ones already mentioned are the most prominent in the current field of neural network application and research.
The learning and memory properties of neural networks resemble the properties of human learning and memory. Associative memory is so-called content-addressable memory. For example, to remember the bird that reputedly puts its head in the sand, the description may be adequate to retrieve the name "ostrich" and a visual image of the bird -- comparable to the associative memory retrieval of the Hopfield Network.
Similarly, associative memory can allow one to decipher the word "MAKE" when some of the letters are partly obscured.
Neural networks also have a capacity to generalize from particulars. They can recognize handwritten letters, despite a wide variability in form that is anathema to algorithm-bound von Neumann computers. And neural networks learn by being presented with examples, rather than by being given algorithms. Implicitly, neural networks create their own algorithms.
Neurophysiologists spent many years searching for the "engram", ie, the precise location in the brain for specific memories. The engram proved to be elusive. The idea that memories are stored in a distributed fashion -- as synaptic strengths (weights) in a neural network -- now seems very compelling. Neural networks embody the integration of "software" and "hardware". Biological and artificial neural networks demonstrate the property of "graceful degradation", ie, destruction of individual neurons or of small groups of neurons reduces performance, but does not have the devastating effect that destroying the contents of a computer memory location would have.
This is not to say that localization does not exist in the brain. Neurons in the superior temporal sulcus of the cerebral cortex, for example, respond selectively to faces. But there is no "grandmother cell", ie, no cell that responds specifically to the face of someone's grandmother. Instead, each neuron has a different response pattern to a set of faces. Ensembles of neurons encode the response to identify a particular face. And an overlapping ensemble may identify another face.
A very real difficulty of correlating artificial neural networks with biological ones lies in the way weights are modified in the former and synaptic strengths are modified in the latter. Weights are altered mathematically in a computer network, based on differences in values. Synaptic strengths, on the other hand, are modified in response to synaptic activity. The backpropagation model, in particular, is held to be biologically unrealistic insofar as it would require a supervisor and a violation of the unidirectional flow of information seen in axons. Some researchers have postulated parallel, backward-directed axons to return error information, but the modification of synaptic strength by these axons is still very hypothetical.
Many researchers feel that competitive (unsupervised) learning is a more persuasive model for brain neural networks than any supervised learning model. The kind of learning that occurs in the visual cortex shortly after birth seems to correlate very well with the pattern discrimination that emerges from Kohonen Networks. Nonetheless, the mechanism of synaptic strength modification remains a sticking point.
The CA3 region of the hippocampus receives inputs from diverse regions of the association cortex via the entorhinal cortex and the dentate gyrus. Of the roughly 16,000 synapses seen on a typical CA3 neuron, approximately 12,000 will be inputs from other CA3 neurons. This suggests that the CA3 cells are a recurrent collateral system -- specifically an autoassociation matrix (Hopfield Network). It has been hypothesized that the CA3 neurons autoassociate the impressions from diverse sensations of an event into a single episodic memory. Ensembles of CA3 neurons associated with the episode would further congeal the memory by provoking competitive learning in the CA1 neurons, with the winning CA1 neurons returning impressions of the crystallized episode for storage in the cerebral cortex. Although circumstantial evidence lends support to this theory, it is still very much a theory.
The idea that memory & identity are distributed & redundantly stored, rather than localized & unique, has positive implications for cryonics. It implies that precise reconstruction of all 100 trillion synapses of the brain may not be necessary to restore memory & identity.
Neural networks are "black boxes" of memory. By this I mean that a researcher may know the precise values of the inputs, the precise values of the outputs and the precise values of the connection weights without understanding the relationships -- because such understanding is awesomely difficult with complex networks. Researchers do not program neural networks by assigning weights -- they train the networks to give desired output for given input, and then (perhaps) record the weights. The implication of this approach is that near-term reconstruction of the human mind may take place by deducing and reconstructing synaptic strengths, without any understanding of the direct relationship between those weights and specific memories. For persons concerned about their "mental privacy", this might be reassuring, but for persons hoping for a reconstruction of the brain based on written memoirs, it is not reassuring. On the other hand, far-future reconstructions may be possible by assigning synaptic strengths based on written memoirs. In that case, complete destruction of the original synapses may prove not to be an ultimate disaster.
(For more on neural networks, see my essay Artificial Intelligence and the Preservation of Mind. For more on the subject of the locus of consciousness in the brain and the relevance to preservation of consciousness through cryonics see my essay Neurophysiology and Mental Function.)