Memory in NeurOS
Memory is a topic with a great deal of psychological research and attempts at useful categorization, but with little clear understanding of brain mechanisms. What does seem clear is that different forms of memory are “implemented” largely by different brain regions and mechanisms. Not surprisingly, the same is true of NeurOS. The chart below shows how various forms of memory are modeled and implemented using NeurOS facilities.
|Memory Type|Description/Examples|NeurOS model/implementation|
|---|---|---|
|Sensory|retinal after-images, sensor fatigue (sub-second)|Input modules and the physical systems they encapsulate|
|Working/Short-Term|sustaining current activations during mental activity (seconds to minutes)| |
|Long Term|everything we “know” or “have learned” (hours to lifetime)| |
|– Explicit/Declarative|“knowing what”: knowledge| |
|– Episodic|experiences and events, usually in serial order|Sequence and Temporal modules|
|– Semantic|facts, concepts, relationships|Set module|
|– Implicit/Procedural|“knowing how”: skills|Sequences/sets of precondition patterns, actions and postconditions|
NeurOS imposes no particular macro-architectural structure on memory organization. It is up to cognitive application developers to make such system organizational choices, for example, in modeling a specific biological brain or designing a specific cognitive system.
Sensory memory is handled by the respective input modules (and any physical systems they encapsulate), which emit appropriate event streams. Some processing modules also provide this kind of very short-lived retention as part of their function; the Group Operations module, for example, computes moving (time-windowed) averages of activations.
The internal states of Transformer and other processing modules can be used as a form of very short-term memory, for example, computing cumulative functions over series of input events. The Working Memory module provides classical “blackboard” style short-term memory. It maintains an internal state of its recent input events, updated as new events arrive, and generates periodic repeated output events according to a parameterized decay profile, reflecting items “currently in mind”.
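The decay behavior described above can be illustrated with a minimal sketch. It is not the NeurOS API: the class name `WorkingMemory`, the `half_life` and `floor` parameters, and the choice of an exponential decay profile are all assumptions made for illustration.

```python
class WorkingMemory:
    """Hypothetical stand-in for a Working Memory module (not the NeurOS API)."""

    def __init__(self, half_life=2.0, floor=0.05):
        self.half_life = half_life  # seconds for a value to halve (assumed exponential profile)
        self.floor = floor          # items that decay below this value are dropped
        self.items = {}             # event id -> (value, time of last input)

    def put(self, event_id, value, now):
        """Record a new input event, or refresh an existing one."""
        self.items[event_id] = (value, now)

    def tick(self, now):
        """Return decayed output events for items still 'in mind' at time `now`."""
        out = {}
        for event_id, (value, t0) in list(self.items.items()):
            decayed = value * 0.5 ** ((now - t0) / self.half_life)
            if decayed < self.floor:
                del self.items[event_id]   # faded out of working memory
            else:
                out[event_id] = decayed
        return out


wm = WorkingMemory(half_life=2.0, floor=0.05)
wm.put("edge", 1.0, now=0.0)
wm.put("red", 0.8, now=1.0)
snapshot = wm.tick(now=2.0)   # both items persist, with reduced values
later = wm.tick(now=20.0)     # both items have decayed below the floor
```

Calling `tick` periodically corresponds to the repeated output events mentioned above; a different decay profile would only change the expression that computes `decayed`.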
A working memory module is often used to persist momentary or short-lived events so they can participate in pattern recognitions over a longer time aperture. A typical assembly might start with a high-volatility momentary input source (e.g., a retina or cochlea analog), some filtering and pre-processing, feature recognition, and then a Working Memory module to allow detected features to accumulate (e.g., from eye saccades or image scanning) and persist long enough to be matched to long-term memory patterns.
The Working Memory module is also useful when multiple separate signal paths (e.g., different sensory domains) need to rendezvous with some temporal uncertainty, and in support of creative “thinking” processes that commingle new combinations of events. Feedback loops in neural graphs can function as resonance reservoirs, with feedback events continually restimulating original activations, effectively keeping “balls in the air”. Working memory in NeurOS also serves an important synchronization function: if a long-term memory pattern requires multiple input features to be active concurrently, a Working Memory module allows the features to accumulate (e.g., from eye saccades or changing scenes) over a parameterized time aperture to stimulate the pattern.
NeurOS takes a local rather than distributed approach to memory. Each “memory” is a distinct pattern of (non-zero) features over a potential input field. New patterns are created as distinctive-enough feature combinations appear on a module’s input port. Existing patterns are adjusted as similar-enough feature collections recur. Learning can be turned on and off dynamically, and the learning rate can be adjusted. Typically, the learning profile favors input patterns that repeat in the range of hours to days, but disfavors immediate repetition (“rote” learning and “cramming”). Non-repeated (“accidental”) patterns are garbage-collected. Low-frequency features in patterns have weights diminished over time and new frequently co-occurring features find their way into existing patterns. There is no automatic merging or splitting among patterns. Rather, with a nod to evolution’s “survival of the fittest”, patterns that best represent useful experience tend to dominate pattern matching (i.e., produce the highest pattern matching confidences for most common patterns) and others may fade from disuse.
Long-term memory in NeurOS uses memory pattern modules and sharable pattern spaces. Pattern spaces hold the “memories”. Each memory is a discrete pattern typically represented as a sparse vector or matrix or other similar representation. Memory modules in neural graphs handle (auto-associative) pattern matching, learning and forgetting. It is typical for multiple modules in a graph to share a pattern space, doing pattern matching and learning for different purposes at different times. However, as in biological brains, memory is not (usually) globally shared. Memories of different kinds serve different purposes and tend to inhabit anatomically separate brain regions. NeurOS long-term memory modules are the closest analogs to biology, performing operations similar to neurons, dendrites and small neuron assemblies.
In theory, almost all the functionality available in NeurOS could be built with compositions of these modules. Practically, fixed-function performance-sensitive operations are often better performed by other processing modules, while functions involving learning over time are best handled with long-term memory modules. Set, sequence and temporal patterns are core primitive abstractions that are both biologically plausible and universally powerful: they serve as building blocks for a great many semantics and show wide general utility. Other memory models can be easily incorporated as needed using NeurOS customization and integration facilities.
- Set patterns respond to co-occurrences of input event values. Set patterns are widely reused for feature recognition, synonyms/naming, many-to-one relationships and abstraction. A set pattern is, effectively, a weight vector with weights corresponding to the “importance” of input features. Pattern match scoring multiplies concurrent input feature values and corresponding weights, scaled by a normalizing function. A response curve parameter enables a wide range of pattern matching semantics so that fewer or more input values are needed for significant matching confidence, spanning [any/OR/synonym, a few, some, many, most, all/AND]. Typical usage first feeds momentary events (e.g., sensory inputs) through a working memory module to persist (repeat) them, effectively providing a “concurrency time aperture” to a Set module. The Set module manages a collection of Set patterns, evaluating a matching score on each relevant pattern as new events arrive. A Set pattern is somewhat analogous to a single biological neuron, which, depending on input geometry and cell type, can serve a similar range of semantics.
- Sequence patterns learn and match event sequences independent of time. Although biological constructs for recording and recognizing such sequences are not yet well understood, it seems clear that brains widely employ such a core capability. NeurOS includes several alternative built-in sequence pattern representation and matching styles. The primary one uses a 2D weight matrix of event ID rows and sequential step columns. The weight of an ID element (row) is highest at the sequential step(s) (columns) of an original/preferred pattern instance. A tolerance parameter spreads weights to neighboring sequence columns to allow for matching with missing/extra/misordered sequence events, such as needed for recognizing misspelled words or similar melodies. Non-zero weights for multiple ID rows at the same step column represent multiple concurrent/alternative feature IDs at a sequence step. This is useful to accommodate, for example, multiple similar letter shapes like “a” and “o” arising from poor lighting or ambiguous penmanship, or multiple notes in a musical chord. Fixed-length diagonalizable sequence patterns with 0 positional tolerance function as classical N-gram recognizers. Other built-in sequence representation and matching styles are based on regular expressions, string edit distances, Dehaene’s positional open bigrams and other forms of cross-correlation. Biologically, sequence patterns plausibly relate to neuron chains where one neuron firing for one stimulus enables one or more subsequent neurons to fire in the presence of subsequent sequential stimuli.
- Temporal patterns further impose relative timing constraints and tolerances on sequences, and allow for matching at a range of speeds/tempos. A temporal pattern keeps a sequence of proportional time-relative peak weights/values for each input ID. A temporal tolerance parameter governs the spread and fall-off of weights around each such peak, producing an interpolated weight curve over proportional relative time. Matching aggregates cross-correlation computations on multiple component signals (input IDs). As new events arrive they are matched to the corresponding interpolated curves to allow for matching event occurrences that are near-enough in (virtual) time. Biological brains seem to have some mechanisms for recording, recognizing and replaying time-based and time-relative sequences like music and muscle coordination, although specific biological structures for this capability are not yet clear.
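The set and sequence bullets above can be made concrete with a small sketch. Everything here is illustrative rather than NeurOS code: the function names, the power-law response curve (`frac ** curve`), the `falloff` factor used to spread sequence weights, and the simple position-by-position sequence score are assumptions chosen to keep the example short. Temporal patterns are not covered.

```python
def set_match(weights, inputs, curve=1.0):
    """Score a set pattern: weighted sum of concurrent inputs, normalized to [0, 1].

    `curve` is an assumed response-curve parameter: values below 1 make the
    pattern OR-like (a few features suffice), values above 1 make it AND-like.
    """
    total = sum(w for w in weights.values() if w > 0)
    if total == 0:
        return 0.0
    raw = sum(w * inputs.get(f, 0.0) for f, w in weights.items())
    frac = min(1.0, max(0.0, raw / total))
    return frac ** curve


def sequence_matrix(sequence, tolerance=1, falloff=0.5):
    """Build a weight matrix: one row per event ID, one column per sequence step.

    Each ID gets weight 1.0 at its original step; `tolerance` spreads
    geometrically smaller weights to neighboring steps.
    """
    matrix = {}
    for pos, event_id in enumerate(sequence):
        row = matrix.setdefault(event_id, [0.0] * len(sequence))
        for step in range(len(sequence)):
            dist = abs(step - pos)
            if dist <= tolerance:
                row[step] = max(row[step], falloff ** dist)
    return matrix


def sequence_match(matrix, n_steps, observed):
    """Average the weight found for each observed event at its step."""
    total = 0.0
    for step, event_id in enumerate(observed[:n_steps]):
        total += matrix.get(event_id, [0.0] * n_steps)[step]
    return total / n_steps


cat_pattern = {"fur": 1.0, "whiskers": 1.0, "tail": 1.0, "meow": 1.0}
one_feature = {"fur": 1.0}
or_like = set_match(cat_pattern, one_feature, curve=0.25)   # about 0.71
and_like = set_match(cat_pattern, one_feature, curve=4.0)   # about 0.004

word = sequence_matrix("cat", tolerance=1)
exact = sequence_match(word, 3, "cat")       # every letter at its preferred step
misspelled = sequence_match(word, 3, "cta")  # swapped letters still score 2/3
```

With `tolerance=0` the matrix keeps weight only at the original steps, which corresponds to the strict N-gram behavior described above; multiple rows with non-zero weight in the same column would represent alternative or concurrent features at that step.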
These patterns can be composed via NeurOS sub-graphs as needed to represent nearly arbitrary abstractions and semantics. Feedback connections among memory pattern modules can be used to iteratively “chain” through meshes of patterns. Multiple input features match a pattern (an “abstraction” of those features). This, together with other features or matched patterns, contributes to matching additional patterns, etc. Often the reification of a pattern into its constituent features (see Reification below) is included in the feedback loop. Feature weights in patterns are generally normalized to a range (-1,1) with 0 representing “irrelevant”, positive (excitatory) weights representing feature importance to the pattern, and negative (inhibitory) weights representing the importance that the particular feature NOT be present.
An alternate form of patterns using input differences from optimal feature values instead of or in addition to weights is under development.
Patterns are matched incrementally as input events arrive, and events for matched pattern IDs are emitted with values proportional to matching scores and the strengths (values) of input features. It is typical to see matching confidence scores for multiple pattern candidates grow and shrink as new input events arrive that confirm or disconfirm each pattern. Memory pattern spaces can be saved and reloaded, either as part of state-saving a neural graph, or separately. This allows stopping and restarting a NeurOS application as needed without re-learning, as well as sharing learning among applications.
Learning new patterns starts out as “one-shot” learning. A new-pattern threshold determines how strong a match is required within a memory space before a new input set/sequence spawns a new pattern. This threshold effectively controls how specific a pattern is to a particular feature set: higher thresholds create multiple distinct exemplars, while lower thresholds continually adjust stereotypical “average” patterns. Non-repeated patterns are typically garbage-collected.
Learning an existing pattern merges current input with the pattern based on learning rate and pattern matching history. Learning in patterns can add or remove elements based on experience. Learning in sequential and temporal patterns can shrink, lengthen, insert or remove sequential/temporal elements based on experience, and adjust relative timings and time-aperture tolerances. Memory modules additionally offer a “learning profile” to favor learning with repetition at different time intervals, and a “forgetting profile”, to emulate varying degrees of medium-term (hours-to-days) memory. NeurOS long-term memory modules implement flexible classification/clustering.
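A minimal sketch of the two learning paths described above (spawning a new pattern versus merging into an existing one) follows. It is an illustration under stated assumptions, not the NeurOS implementation: cosine similarity stands in for the match score, merging is a simple weighted average, and the names `similarity`, `learn`, `new_pattern_threshold` and `learning_rate` are hypothetical. Learning and forgetting profiles are not modeled.

```python
def similarity(pattern, features):
    """Cosine similarity between two sparse feature dicts (an assumed match score)."""
    dot = sum(w * features.get(f, 0.0) for f, w in pattern.items())
    norm_p = sum(w * w for w in pattern.values()) ** 0.5
    norm_f = sum(v * v for v in features.values()) ** 0.5
    if norm_p == 0 or norm_f == 0:
        return 0.0
    return dot / (norm_p * norm_f)


def learn(space, features, new_pattern_threshold=0.6, learning_rate=0.2):
    """Update a pattern space (dict of pattern id -> weights) with one input.

    If no existing pattern matches at least as strongly as the threshold,
    the input becomes a new pattern in one shot. Otherwise the best match
    is nudged toward the input by `learning_rate`.
    """
    best_id, best_score = None, 0.0
    for pid, pattern in space.items():
        score = similarity(pattern, features)
        if score > best_score:
            best_id, best_score = pid, score

    if best_id is None or best_score < new_pattern_threshold:
        new_id = f"p{len(space)}"
        space[new_id] = dict(features)      # one-shot learning
        return new_id

    pattern = space[best_id]
    for f in set(pattern) | set(features):  # features can be added or fade
        old = pattern.get(f, 0.0)
        pattern[f] = (1 - learning_rate) * old + learning_rate * features.get(f, 0.0)
    return best_id


space = {}
first = learn(space, {"a": 1.0, "b": 1.0})                # spawns "p0"
second = learn(space, {"a": 1.0, "b": 1.0, "c": 1.0})     # similar enough: merged into "p0"
third = learn(space, {"x": 1.0, "y": 1.0})                # unrelated: spawns "p1"

strict = {}
learn(strict, {"a": 1.0, "b": 1.0}, new_pattern_threshold=0.9)
learn(strict, {"a": 1.0, "b": 1.0, "c": 1.0}, new_pattern_threshold=0.9)
```

In this sketch the higher threshold in `strict` yields two separate exemplars for inputs that the default threshold folds into a single averaged pattern, mirroring the trade-off described for the new-pattern threshold.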
Unlike many classification schemes, there is no built-in splitting or combining of set, sequence and temporal (SST) patterns. Rather, patterns learn progressively and in parallel (several patterns may match concurrently or at different times). The most frequent strongest matches tend to dominate future matching to continuing experience. Pattern spaces can be pre-loaded as well as populated incrementally through learning.
Batch learning can be accomplished with a FileInput module feeding a memory module’s input (along with other more dynamic input links). Initial running of the graph then “plays” the file’s contents through the memory module as events. Neural graphs that have run for a while can be saved together with their learned patterns, and restored and restarted at a later time with their prior learning preserved. Populated pattern spaces can be reused and even shared live with other applications.
Reify modules reverse-transform patterns back to their component features, emitting sets/sequences of events for the constituent elements of a pattern. Reifying a set pattern emits all the set’s components concurrently, with strength values proportional to pattern weights. Reifying a sequence or temporal pattern emits events for its elements with a parameterized tempo. Feature strengths emitted for any specific pattern follow the distribution of feature weights learned over experience matching that pattern.
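The two reification behaviors can be sketched as follows. This is a hedged illustration, not the Reify module itself: events are modeled as `(time, id, value)` tuples, `tempo` is assumed to mean events per second, and skipping negative (inhibitory) weights is an assumption of this sketch.

```python
def reify_set(weights, strength=1.0):
    """Emit all positive-weight components at the same time (t = 0.0).

    Each value is proportional to the learned weight, scaled by the
    strength of the pattern event being reified.
    """
    return [(0.0, f, strength * w) for f, w in weights.items() if w > 0]


def reify_sequence(steps, tempo=2.0, strength=1.0):
    """Emit sequence elements one after another at `tempo` events per second.

    `steps` is a list of (event_id, weight) pairs in sequence order.
    """
    interval = 1.0 / tempo
    return [(i * interval, event_id, strength * w)
            for i, (event_id, w) in enumerate(steps)]


set_events = reify_set({"fur": 1.0, "tail": 0.5, "barks": -0.8}, strength=0.5)
seq_events = reify_sequence([("c", 1.0), ("a", 0.9), ("t", 1.0)], tempo=2.0)
```

Feeding the emitted IDs into another `reify_*` call for patterns they themselves name would give the layered expansion toward lower levels of detail.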
The Reify module possibly models some of the extensive “downward” feedback connections found in brains. Layered or recursive reification expands highly abstract patterns down to successively lower levels of detail. This cascading reification is key to behavior, particularly actions affecting the external environment through layers and sub-layers of what we might think of as “learned action macros”. Reify modules effectively implement “generative” processes over learned patterns.
A frequent sub-assembly is a Memory-Reify pair that implements a form of imagination, pattern completion or prediction. A Memory module watches an input feature stream and generates events for likely candidate patterns in an auto-associative way. These feed a Reify module which generates output event sets/sequences including all the previously learned pattern component features, not just those that have actually been seen or heard. Looping back a Reify module’s output to a corresponding pattern module input port commingles perceived and imagined features to “firm up” recognition of some object, for better or worse: “we see what we expect to see”.
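One pass through such a loop might look like the sketch below. It is only an illustration under assumptions: patterns are plain weight dicts, the match score is a normalized weighted sum, `min_score` is a hypothetical cut-off for which candidates get reified, and perceived and imagined values are combined by taking the maximum.

```python
def match(space, features):
    """Score every set pattern in the space against the current features."""
    scores = {}
    for pid, weights in space.items():
        total = sum(w for w in weights.values() if w > 0)
        raw = sum(w * features.get(f, 0.0) for f, w in weights.items())
        scores[pid] = max(0.0, raw / total) if total else 0.0
    return scores


def complete(space, features, min_score=0.6):
    """Memory step followed by Reify step, then commingle with the input."""
    imagined = {}
    for pid, score in match(space, features).items():
        if score < min_score:
            continue                      # weak candidates are not reified
        for f, w in space[pid].items():
            if w > 0:
                imagined[f] = max(imagined.get(f, 0.0), score * w)
    merged = dict(imagined)
    for f, v in features.items():         # perceived features keep their strength
        merged[f] = max(merged.get(f, 0.0), v)
    return merged


space = {
    "cat": {"fur": 1.0, "whiskers": 1.0, "meow": 1.0, "tail": 1.0},
    "dog": {"fur": 1.0, "bark": 1.0, "tail": 1.0, "collar": 1.0},
}
seen = {"fur": 1.0, "whiskers": 1.0, "tail": 1.0}

before = match(space, seen)["cat"]       # 3 of 4 features present
looped = complete(space, seen)           # adds an imagined "meow"
after = match(space, looped)["cat"]      # recognition is firmer on the next pass
```

The imagined "meow" was never observed, yet it raises the "cat" score on the second pass, which is the “we see what we expect to see” effect described above.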
Neuroscience research indicates that newly formed memories are consolidated and filtered over time, perhaps during sleep, into more permanent memories. There is no direct model of this in NeurOS. Rather, such behavior is likely to emerge from building neural graphs that model these processes, a topic for future research. A likely candidate architecture would form new memories in one relatively transient memory pattern space with a forgetting profile that clears them out after a while, while another part of the neural graph replays these memories into a second, more permanent long-term memory pattern space. In sum, NeurOS allows one to model and build practical implementations of a wide range of biologically motivated memory capabilities.