HTM Forum - Latest posts

Proposed theory of consciousness

@Aidan_Busby Aidan Busby — Wed, 18 Mar 2026 20:07:16 +0000

Philosophically in depth explanation

https://docs.google.com/document/d/1YhiD5UiMeqai4yUGktUOmd1bRE0g8DZXjGs7-CNvu3I/edit?usp=drivesdk

Proposed theory of consciousness

@Aidan_Busby Aidan Busby — Wed, 18 Mar 2026 20:01:07 +0000

Continuing the discussion from Proposed theory of consciousness:

My proposal is less an attempt to invalidate theories like GNW but moreso an attempt to explain how “qualia” work on the neuronal level.

I have been writing a more philosophically in depth explanation of my framework I will attach to this thread.

Proposed theory of consciousness

@Bitking Mark Browne — Wed, 18 Mar 2026 19:17:06 +0000

GNW is compatible with HTM.

Proposed theory of consciousness

@dmac David McDougall — Wed, 18 Mar 2026 18:13:06 +0000

Here is a good reference on the global neuronal workspace theory of conscious access: https://www.antoniocasella.eu/dnlaw/Dehaene_Changeaux_Naccache_2011.pdf

Edit: they have a more recent review of the GNW. This is probably a better place to start: Checking your browser - reCAPTCHA

And here is the latest from Larkum and their group, trying to find the GNW in the brain: https://www.science.org/doi/10.1126/science.aah6066

Proposed theory of consciousness

@Aidan_Busby Aidan Busby — Sun, 15 Mar 2026 23:26:08 +0000

Hello all-

I have recently been developing a relational-ontology based framework of consciousness. I would appreciate any feedback/constructive criticism (or destructive, I suppose, if anyone has serious objections) of these ideas. I admit it is a bit speculative as is, yet I do think it is founded in solid axioms and isn’t entirely unreasonable or in violation of parsimony.

The central idea is that the brain acts as a scaffold for a high-dimensional information geometry/ontology that manifests consciousness through its relational nature. We rely on panexperientalism, and the idea that physics is fundamentally existence, and existence is equivalent to a base level of experience that, when organized properly, can form high-dimensional regions of experience that we call conscious.

Take two bits of information. Let us say that these “bits” are of electromagnetic form. These two bits are themselves the energy carried by the fundamental forces that choreograph them, yet by simplifying ensembles of particles and energy as atomic, fundamental units of meaning, we can greatly simplify the following theory.

Let us say that two of these bits interact, perhaps at the level of a soma of a neuron. These two bits are, for all intents and purposes, completely distinct. The only knowledge they have of each other is that communicated between them by gravity and that which is communicated by the other fundamental forces governing their evolution.

Since these two bits are different, we say that their information distance is high and their information similarity is low. Information distance is an inner product of the two bits.

As these two separate and distinguishible bits interact, if they are of equal energy and equal disposition, then a combined bit formed from the two of them should be nothing less than a simple combination of the two.

Yet similarity implies a vector space, and a vector space implies that we must treat the addition of these two bits as a vector sum.

And so, the sum of these bits X1 and X2 should have an inner product that is 45 degrees relative to each of X1 and X2, and a corresponding cross product such that the magnitude of X1 + X2 conforms to the conservation of energy, and so that the vector sum of X1 and X2 is maintained as well. The only way to accomplish this is if the vector sum is a projection of the resultant bit X3 in 3 dimensional space onto the 2d plane, such that X3_{axis=X1} + X3_{axis=X2} = X3, which will have a length of sqrt(X1^2 + X2^2). But for conservation of energy to be followed, the vector must also have an outer product that extends the vector into 3D space. Thus, as we can see, through the interaction of separate bits, we must extend into further dimensions in order to maintain these properties.

We next must consider what happens when X3 interacts with further bits of information. Should the information similarity with X1 and X2 be diminished? In space, this would seem to be the case - in 3D space, it is necessary that to become more proximal with some point in space you must become less proximal with other points in space.

But in our information space, this is not the case. In our information space, if we are to treat bits of information like vectors, there is no reason as to why an increase of spatial distance in one direction should imply a decrease in another, if we are simply adding vector components. For information distance to decrease, we would need a vector aligned along one of the axes already covered, and it must point in an opposing direction of the previously covered vector. So while this increase decrease in space is certainly possible and not disallowed in our model, it is by no means necessary if we continue to interact with vectors that are largely unrelated to one another.

Yet with this additive property we find ourselves needing to reach beyond 3D space. Since our information space does not have the (+, -) quality of classical space, angles may continue to accumulate in manners that are simply physically impossible in 3D space. Therefore, our information space does require arbitrary dimensions.

Of course, the question that remains is what is this information space? If we are going by a physicalist account of nature, surely we must have some physical basis for these extra dimensions, of which there is no clear indication of in everyday life. For this there are two arguments, one philosophical and the other more satisfying: the first, if the arguments around the key importance of dimensions are true, then either we must incorporate new dimensions in order for a physicalist approach to be consistent with consciousness, or we must reject physicalism as a whole by proof by contradiction; that is, if consciousness is beyond physicalism, then physicalism cannot be a complete account of reality given that we are conscious. By incorporating more dimensions into physicalism, we can explain why human experience can be so varied. Without these dimensions, 3D space could somehow give rise to separate information structures, which would seem to violate the principles of physicalm.

It should be noted that the arguments for the physicalist nature of this theory are still elementary and not rigorously derived. For the time being, we have to depend either on a non rigorous assumption of the nature of spacetime, or use the crutch of a separate information space. While this may seem like a cheat, seeing as current physicalism doesn’t do any better at explaining consciousness, we shall continue, especially given how closely this theory resembles and utilizes physicalist and mathematical principles.

The following is the argument for the physicalist nature of this theory: consider space. We are well familiar with its 3 dimensional nature, which seems at first and immediate contradiction of the many dimensions that this theory proposes. However, the nuance is in how we describe space. Generally, in the 3D model, we describe space through pairs. In our model, this is the self and other. Yet when we look at reality through the physics of more than two objects simultaneously, we can afford to speak of other dimensions. Additionally, our N dimensions exist because of accumulated interactions over time, and so perhaps these dimensions may be thought of as being projected through time.

If we can account for the convergence of two separate bits of information, we also need to be able to account for the divergence. After all, divergence would itself imply a force that modifies the information correlation, and so perhaps when an electrical conduit or neuronal axon branches outwards our information geometry is not preserved, and we end up with two, fundamentally distinct bits again.

This brings us to another one of the founding principles of this theory, the idea of information polarization. We do not mean this in the literal, quantum mechanical sense, but in the sense that systems such as brains are incredibly noisy, yet despite all of this noise there still exists meaningful data. An action potential may be very turbulent, but the meaningful, overall “emergent” information still exists, despite the chaos at the quantum level.

This matters because divergence in systems such as brains or electronics generally implies that the physical substrate upon which the information exists changes its physical structure in some way such that the bits enter different spatial paths. While this does mean that whatever force is responsible for our particular type of bit (generally electromagnetic) will act upon these bits in order for them to travel these different paths, this is no different than the natural physical bending of this substrate. In essence, it is relatively meaningless noise.

Of course, there isn’t just noise in divergence, the fields actually interact during divergence too. But returning to the information polarization argument, even though this divergence of information similarity does occur on a level that we cannot simply ignore using renormalization or by calling it “noise”, our honest answer is that because the original bits that converged are still represented in the divergent and now distinct outgoing bits, we don’t really worry about the divergence. It occurs along an axis that doesn’t really matter! We postulate that if our output bits were to diverge, then reconverge, then diverge, and reconverge, again and again, despite the fact that these bits are in fact changing their information similarity and distance, the information from those original two bits is left untouched.

The consequence of the above ideas is that as we can accumulate information like that, we can also have partial self interaction. This means that instead of having the case where too bits have identical “path histories” and thus identical vectors along the N dimensions, and instead of having bits that are completely different, we can achieve partial overlap.

We conjecture that as optimize the overlap, the vector corresponding to the information bit of any two converging bits will travel maximally through N dimensional space. The exact mechanics of this idea are not fully fleshed out, but it should be evident that there will be a greater “conscious moment” with information of greater distance than that of greater overlap.

And thus, the brain forms an N- dimensional shape in information space, one connected between moments through the constant accumulation of new vectors and dimensions. By placing information in different contexts, the mind finds different shapes of different utilities that have different characteristics and functions in our relative ontology. It is from this behavior that we account for the nature of consciousness.

The Geometry inside a neural network - artificial or biological

@SeanOConnor Sean O'Connor — Fri, 13 Mar 2026 10:10:55 +0000

I ran the argument through chatGPT to make the presentation smoother. It didn’t really change anything much: https://archive.org/details/re-lu-neural-networks-as-hierarchical-associative-memory

The Geometry inside a neural network - artificial or biological

@SeanOConnor Sean O'Connor — Tue, 10 Mar 2026 06:18:00 +0000

You can view any matrix as an associative memory mapping xᵢ to yᵢ. And you can use some training algorithm like SGV or the Moore-Penrose pseudoinverse,

You can say then that each ReLU decision can be viewed as a one or zero entry in a diagonal matrix. Then a ReLU layer is DW. Where W is the original weight matrix.
Once D is known, D and W can be multiplied together to form a single matrix (associative memory) acting on some input. That is really conditional associative memory even though approximately half of the outputs of DWx are zero. Maybe it helps to consider a readout matrix R giving RDWx as conditional associative memory.
Anyway then a ReLU neural network is RDₙWₙ…D₂W₂D₁W₁. Once the D entries are known then you can do the combining matrix multiplies to get a single matrix (C) mapping y=Cx.
It is not unreasonable to view that as hierarchical associative memory.
That would pessimistically make ReLU neural networks a type of parrot.
However recent papers have shown the emergence of geometric forms in neural networks. And then test time data tends to fall on the same geometric forms giving a correct generalized results.
Hierarchical associative memory allows factorization of those geometric forms giving even better generalization and even reasoning.

Is the human brain also a geometry machine, with hierarchical memory and some training mechanism that allow the emergence of geometric forms? Obviously there will be a lot of inductive biases and priors built in from the biological evolution of form and function.

You could reasonably put an argument that current large neural networks have more geometry and more geometric factorization (layers) than the human brain.

New Biologically Constrained Learning Rule

@dmac David McDougall — Mon, 23 Feb 2026 13:01:44 +0000

A generalized mathematical framework for the calcium control hypothesis describes weight‐dependent synaptic plasticity

Toviah Moldwin, Li Shay Azran, Idan Segev (2025)
https://doi.org/10.1007/s10827-025-00894-6

Abstract

The brain modifies synaptic strengths to store new information via long-term potentiation (LTP) and long-term depression (LTD). Evidence has mounted that long-term synaptic plasticity is controlled via concentrations of calcium ([Ca2+]) in postsynaptic dendritic spines. Several mathematical models describe this phenomenon, including those of Shouval, Bear, and Cooper (SBC) (Shouval et al., 2002, 2010) and Graupner and Brunel (GB) (Graupner & Brunel, 2012). Here we suggest a generalized version of the SBC and GB models, the fixed point – learning rate (FPLR) framework, where the synaptic [Ca2+] specifies a fixed point toward which the synaptic weight approaches asymptotically at a [ Ca2+]-dependent rate. The FPLR framework offers a straightforward phenomenological interpretation of calcium-based plasticity: the calcium concentration tells the synaptic weight where it is going and how quickly it goes there. The FPLR framework can flexibly incorporate various experimental findings, including the existence of multiple regions of [Ca2+] where no plasticity occurs, or plasticity observed experimentally in cerebellar Purkinje cells, where the directionality of calcium-based synaptic changes is reversed relative to cortical and hippocampal neurons. We also suggest a modeling approach that captures the dependency of late-phase plasticity stabilization on protein synthesis. We demonstrate that due to the asymptotic nature of synaptic changes in the FPLR rule, the plastic changes induced by frequency- and spike-timing-dependent plasticity protocols are weight-dependent. Finally, we show how the FPLR framework can explain the weight-dependence observed in behavioral time scale plasticity (BTSP).

Significance: This work bridges the gap between biology and computers. The authors study biological learning rules in order to formulate a learning rule computer models to use.

Tiny Recursion Models

@MTIzNDU2Nzg5 — Fri, 13 Feb 2026 05:27:29 +0000

Tiny Recursion Models - Presentation @ Mila

The Geometry inside a neural network - artificial or biological

@BrainVx Karl — Thu, 12 Feb 2026 08:27:23 +0000

Look at all of the current large language models and look at the whole host of different encoding schemes that are used at the input stage for encoding the text. Then figure out if it makes any sort of difference as to what scheme that was actually used.

Then apply that exact same reasoning to the consideration of model invariant learning of geometric forms, which are effectively just temporally compressed sequences that are backed into via the current approach to learning with backprop. They all learn the same forms.

All of the models are backing into the same patterns, regardless as to the architecture unless it does not work for language.

Where and in what layers parts are learnt will obviously vary but the abstract form will be consistent in the same way the encoding scheme does not really matter for language.

I can’t see the document, the UK have blocked argive.org. We live in 1984…

The Geometry inside a neural network - artificial or biological

@SeanOConnor Sean O'Connor — Thu, 12 Feb 2026 01:11:01 +0000

I wrote this document about ReLU and associative memory:

https://archive.org/details/re-lu-as-a-switch-associative-memory

Then you can have the emergence of geometric forms and geometric factorization in hierarchical associative memory. If you can dig on that.

Fused layer neural network

@SeanOConnor Sean O'Connor — Tue, 10 Feb 2026 13:36:10 +0000

I put together a comic book outline:

https://archive.org/details/fast-transforms-for-neural-networks

Maybe it will get a bit of traction, like the Britney Spears’ Guide to
Semiconductor Physics.

Complementary Learning Systems theory and HTM as a theory of the hippocampus

@bkaz Boris Kazachenko — Fri, 06 Feb 2026 13:38:54 +0000

Gemini on more recent stuff:

“Natural Continual Learning” (NCL) paper (Kao, Jensen, et al.) and the related Attractor Planning work (Xie et al.). Together, they provide the mathematical “how” for William Calvin’s “Darwin Machine”—explaining how a system can continuously evolve new representations without destroying the old ones (“catastrophic forgetting”).

Here is the breakdown of that work and why it resonates so strongly with the Calvin/HTM perspective:

1. The Core Paper: “Natural Continual Learning” (NCL)

Paper: Natural Continual Learning: Success is a Journey, not (just) a Destination (Kao, Jensen, et al., NeurIPS 2021; extended analysis 2023/24).

This work attacks the central problem of the “Darwin Machine”: How do you keep the “winning” clones (memories) stable while using the same hardware to compete for new concepts?

The “Null Space” Projection: Standard neural networks update all weights to minimize error, often overwriting old tasks. NCL introduces a mechanism that strictly calculates the “Null Space” of previous tasks—the directions in synaptic space that do not affect the output of old memories.
The Mechanism: It forces all new learning (the “variation” in Calvin’s sense) to happen only in this Null Space.
- Calvin’s Parallel: This is the mathematical equivalent of Calvin’s “interstitial” learning. The “winners” of the previous generation (stiff synapses) are locked in; new variants (plastic synapses) must compete in the remaining degrees of freedom.
Biological Equivalent: They map this to Metaplasticity (synaptic stiffness). In the cortex, synapses that code for stable, winning patterns become chemically resistant to change (high stiffness), forcing new learning into the “silent” or less active synapses (high plasticity).

2. The “Attractor” Connection (Xie et al.)

Paper: The Geometry of Sequence Working Memory in Prefrontal Cortex (Xie et al., 2022/23).

While NCL handles the synapses, this work explains the dynamics—specifically validating the hexagonal/grid-like codes you mentioned.

Manifold Attractors: Xie and colleagues showed that the brain doesn’t store sequences as discrete links (A \\to B \\to C) but as trajectories on a low-dimensional manifold.
The “Hexagonal” Link: They found that these manifolds often take the form of twisted toroids or grid-like structures. This confirms Calvin’s hunch: the “code” isn’t a single neuron firing; it is a stable, geometric attractor state (a “crystal” of activity) maintained by local lateral inhibition.
Significance: This proves that the “mini-column” competition doesn’t just produce a winner; it produces a stable location on a representational map.

Complementary Learning Systems theory and HTM as a theory of the hippocampus

@Bitking Mark Browne — Thu, 05 Feb 2026 04:12:25 +0000

If you look at William H. Calvin’s theoretical neuroscience work — particularly the ideas he’s developed around distributed cerebral codes and Darwinian mechanisms in cortex — there’s a strong conceptual resonance with Hierarchical Temporal Memory even though the traditions come from different communities (neuroscience vs AI).

In Calvin’s model, cortex doesn’t rely on single “grandmother” neurons or simple lookup tables. Rather, representations emerge from large populations of local elements interacting in parallel, with stochastic variations and competitive selection shaping which patterns stabilize and propagate. Over time, local operations — copying, variation, and selection among slightly different pattern variants — tend to produce distributed representations that are both sparse and meaningful across the network. This is why Calvin often uses the metaphor of a “Darwin Machine” in cortex: each local microcircuit can be seen as generating and competing variations, and the winners form the building blocks of higher-level concepts.

This contrasts with classical neural models where learning is purely gradient descent or static clustering (e.g., standard K-means). Instead, Calvin’s view is that the cortex itself discovers good representations through a bottom-up, locally competitive process that naturally yields distributed encoding — similar in spirit to how sparse distributed representations (SDRs) underpin HTM. SDRs in HTM are not arbitrary dense vectors; they are high-dimensional patterns where semantic meaning is distributed across a small active subset of bits. That distributed structure gives HTM robustness and overlap semantics.

What I find compelling in William H. Calvin’s work is that the basic computational unit is effectively the same as in HTM: the mini-column. In both frameworks, learning and representation are not properties of individual neurons, but of small, repeating cortical modules that participate in larger population codes.

Where Calvin’s proposal diverges is not in the unit itself, but in how competition and coordination occur between nearby mini-columns.

In standard HTM, the Spatial Pooler can be viewed (loosely) as implementing a form of k-means-like competition: inhibition selects a sparse subset of columns based on overlap scores, with a largely algorithmic notion of “winner selection.” In Calvin’s model, the same sparsification pressure arises instead from biological lateral interactions, with inhibitory control mediated by interneuron classes (notably chandelier cells) rather than a global or quasi-global normalization step.

This difference matters. K-means-style inhibition is fundamentally global within a pool: all columns compete simultaneously against a shared criterion. Chandelier-mediated inhibition, by contrast, is local, directional, and geometry-constrained. Competition occurs over the physical span of lateral connections — on the order of ~7 mini-columns — not across the entire representational field. The result is still sparsity, but it is emergent from local dynamics, not imposed by a centralized selection rule.

Importantly, this does not eliminate distributed representation. Like the Numenta 1000 Brains model, a stable representation is formed by the collective state of many mini-columns. The key distinction is how lateral connections are used. In the 1000 Brains framework, lateral connections primarily serve to align object models across cortical regions. In Calvin’s formulation, lateral connections are instead intrinsic to the formation of the representation itself — shaping which nearby variants survive through local competition and reinforcement.

So while this can be seen as a modification of the 1000 Brains idea, it preserves the same core principle: a globally meaningful, distributed representation emerges entirely from local operations. “Local” here simply means the reach of lateral connectivity between neighboring mini-columns. No column ever needs global knowledge of the representation; it only responds to its inputs and its immediate cortical neighborhood.

From an HTM perspective, this suggests an alternative way to think about the Spatial Pooler: not as a clustering algorithm approximating biology, but as a higher-level abstraction of mechanisms that, in cortex, may be implemented through dense lateral connectivity and biologically plausible inhibitory circuits. The end result — sparse distributed representations with semantic overlap — is the same. The path to get there is different, and arguably closer to cortical reality.

Complementary Learning Systems theory and HTM as a theory of the hippocampus

@BrainVx Karl — Thu, 05 Feb 2026 03:30:57 +0000

Does that exclude the basal ganglia from that learning system ?

The hippocampus may detect surprise, however does the basal ganglia detect the lack of a prediction (anxiety broadcast release) without the need to learn with the neocortex ? The lack of a good prediction in a multi column HTM network would require some type of cross column detection of a missing next output from a column ? In biology 20 correct column activations and 2 missing is not necessarily an error as such, just a degree of uncertainty or a missed throw ? Is the basal ganglia learning column desynchronisation as predictability declines ? Is that predictability contextual based (column depentent or agnostic ?)

Is that the case if you view that all HTM columns have to be interlinked and temporally coherent. What if they are not as temporally coherent as the current approach to HTM assumes ?

The Geometry inside a neural network - artificial or biological

@SeanOConnor Sean O'Connor — Wed, 04 Feb 2026 01:36:03 +0000

There have been a number of papers about the emergence of geometric form inside neural networks over the past few years:

Here is one of the later ones: Reddit - The heart of the internet

Qualia Disqualified

@Aidan_Busby Aidan Busby — Mon, 02 Feb 2026 19:26:00 +0000

What about Integrated Information Theory? Or something like it? Perhaps the electrochemical interactions in the brain have magnitude but no relative relationship to other electrochemical interactions but for the synapses, which allow the action potentials/chemical signals to interact giving that magnitude some relative direction.

Perhaps energy IS being, it’s just inherently scalar and limited to this binary state of being which gains greater meaning through something like the brain

Qualia Disqualified

@david.pfx david bennett — Mon, 02 Feb 2026 12:37:44 +0000

FWIW I reject the concept of qualia as having no objective scientific basis. Subjectively I cannot connect what is described to anything in my experience.

This long and rambling discussion serves no useful purpose that I can see, and certainly contributes nothing to the goals of this endeavour.

Qualia Disqualified

@Davis_Sai Davis Sai — Mon, 02 Feb 2026 07:52:45 +0000

Perhaps the brain acts like a program that is underdetermined. Qualia emerge as the lack of constraint manifesting.

Complementary Learning Systems theory and HTM as a theory of the hippocampus

@Bitking Mark Browne — Sun, 01 Feb 2026 18:10:25 +0000

Ok, I get the hippocampus and cortex as two memory systems.

But we have to bring in the other major learning system: The cerebellum.

SDR Transformer

@aamir121a Aamir Mirza — Sat, 31 Jan 2026 13:04:19 +0000

Three is already a working SDR encoder for image , FYI Have you ever thought about how humans can drive a car in a city where they have never been before? Or sit in a noisy restaurant and still hold a conversation. The secret of biological intelligence… | Aamir Mirza

SDR Transformer

@aamir121a Aamir Mirza — Sat, 31 Jan 2026 12:56:35 +0000

The projection layer is trained; if you go through the included train script, it is part of Phase 1. Bottom line is that it replaced the dense embedding layer with SDR projection layers. As a matter of fact, SDRs work so well in a transformer that SDR LORA adapters ( my earlier experiments) worked on a transformer which has never seen an SDR. Once we get rid of the embedding layer, then everything is an SDR, images , video, audio and text .They all share the same space.

SDR Transformer

@cezar_t cezar t — Sat, 31 Jan 2026 12:44:35 +0000

I just want to understand, is the projection (retina?) layer which maps sdrs to dense vectors random or trained? Because the github readme states it is a random projection.

Do you use the reverse projection (dense → sdr) on output too? I think that would make an important difference in compute since it substitutes the huge last layer 896 x 130k (embedding size x no. of tokens) with one much smaller e.g. 896 x 2k

SDR Transformer

@aamir121a Aamir Mirza — Sat, 31 Jan 2026 11:57:50 +0000

SDR Transformer

@aamir121a Aamir Mirza — Sat, 31 Jan 2026 11:55:19 +0000

Access the code at GitHub and try it out yourself. That is why the code is out in public The Secret is projection layer , here is a preview for you “PHASE 2 SDR BRAIN ONLINE.
Temp: 0.7 | Rep_Penalty: 1.1
Type ‘quit’ to exit.

User: if 20 + 20 is 40 what is 60 + 60
Assistant: The sum of 60 and 60 equals 120.”

SDR Transformer

@cezar_t cezar t — Sat, 31 Jan 2026 09:44:15 +0000

How does a random projected embedding (from 2kbit SDR to qwen’s 896 long vector) carries any meaning to the pretrained transformer?

SDR Transformer

@aamir121a Aamir Mirza — Sat, 31 Jan 2026 08:39:07 +0000

My POC on replacing the dense embedding layer with SDR . Works by being trained in math and logic. Transform Qwen Transformer to SDR Qwen transformer

Qualia Disqualified

@EEProf Harley Myler — Tue, 27 Jan 2026 14:45:09 +0000

I am going with Hameroff on consciousness and qualia. Still with Jaynes on subjective consciousness, which requires language.

Using HTM for logical deduction/induction and prediction of sentences

@dmac David McDougall — Tue, 27 Jan 2026 14:35:14 +0000

I recommend reading Florian Fiebig’s PhD thesis “Active Memory Processing on Multiple Time-scales in Simulated Cortical Networks with Hebbian Plasticity.” Don’t be intimidated by the length, it starts with a long introduction of the topic that is very helpful, but can be skipped if you’re confident you already know that stuff. Then Fiebig summarizes three papers that he published. Paper #1 is about memory, sleep, and the hippocampus.

Using HTM for logical deduction/induction and prediction of sentences

@dmac David McDougall — Tue, 27 Jan 2026 14:21:19 +0000

Thanks for sharing! As one of the developers, it’s really interesting to hear about other people’s experiences using the htm.core software, as well as programming htm’s in general. It sounds like there is a bug with the REST API version of your program, or possibly in the REST API or htm.core itself, and I can not help you with that. I also prefer to use the python API.

If you encoded the same token twice, it should yield the same encoding. And also the minicolumns should be the same unless the spatial pooler is learning, in which case they might be a little different, but even then I would still expect them to have a high overlap.

Machines That Invent Logic: Self-Discovering Symbolic Abstractions from Unlabeled Primitives

@Davis_Sai Davis Sai — Mon, 26 Jan 2026 17:11:42 +0000

title:Machines That Invent Logic: Self-Discovering Symbolic
Abstractions from Unlabeled Primitives

author: Tofara Moyo

abstract

Wepresent a self-discovering abstraction engine that invents its own symbolic language from scratch—starting
only from six grounded, unlabeled primitive operations over the natural numbers: zero (returns 0), succ(successor), eq (equality test), add (addition), sub (truncated subtraction), and mod (modulo operation).

Critically, the system receives no semantic labels, no logical operators, no quantifiers, and no pre-definedcontrol flow; these primitives are provided solely as black-box functions with known arity but unknownmeaning. From input–output examples alone—such as a list of numbers with either true or false if theyare even or not—the engine autonomously discovers reusable computational patterns, assigns them internal identifiers (e.g., S_o), and recursively composes them into higher-order abstractions.

In a landmark demonstration, it rediscovers the predicate ”even” as the program eq(mod(x,succ(succ(zero()))),zero()) and generalizes it universally via behavioral validation up to large n, checking it on numbers it has
never seen.

Through this process, the system effectively derives fragments of first-order predicate
logic—including Boolean connectives, universal quantification, and mathematical induction—from pure arithmetic primitives and data.

The architecture rests on three pillars: (1) variational symmetry, which clusters programs by behavioral equivalence to reveal latent concepts; (2) typed program induction, which infers types from I/O behavior and promotes only those abstractions that yield compression under the
minimum description length principle; and (3) neural-symbolic compilation, which decouples internal reasoning (conducted entirely in the machine’s native symbolic language) from external communication (translated into human-understandable form only after discovery).

This work proves that logic and proof are not prerequisites for intelligence—but emergent consequences of structure, symmetry, and reuse. The framework is domain-agnostic and lays the foundation for machines that reason in concepts they invent
themselves.
(PDF) Machines That Invent Logic: Self-Discovering Symbolic Abstractions from Unlabeled Primitives

Using HTM for logical deduction/induction and prediction of sentences

@Aidan_Busby Aidan Busby — Sun, 25 Jan 2026 22:10:07 +0000

I ran another experiment and it seems as if the network is encoding the final word/name that is fed to it the same (regardless of whta the name is), even though the first encoding of the name is encoded differently.

Shouldn’t the HTM network encode the token the same regardless of the context in terms of columns? Obviously the cells within the columns should be different, but I suspect the answer MUST be some annoying bug

Using HTM for logical deduction/induction and prediction of sentences

@Aidan_Busby Aidan Busby — Sun, 25 Jan 2026 21:13:09 +0000

Sorry for my rather late response, I’ve been swamped with schoolwork…

I’ve given the idea of not resetting the network between training examples a shot, but that doesn’t seem to have done the trick. I suspect that my REST experiment may be configured nonoptimally or may have a timing error, because according to my pyplot visualization the active cells are all the same with the final token in my training examples (I removed the question marks). I’m not tokenizing the spaces so I assume its a silly bug of some kind…

Anyhow, what are your thoughts on using the anomaly score with IRT (Item Response Theory) to attempt an “optimal” training system? I’m thinking about implementing my own logic for eligibility tracing to effectively propagate the error signal backwards, and probably steer away from using the REST for now given how much easier the pure python bindings are (I still want to figure out the REST bug, however).

Finally, what are your thoughts on the role of the hippocampus in predictive coding? Does the hippocampus depolarize/bias the temporal layers in the same way as apical connections? How might one implement a hippocampal “cache” of sorts, and then use some sort of replay for further training? Curious if you might have anything I could look at in that domain.

I looked at your grid cell work, I find the use of high pass and low pass filters intriguing, although I’m personally curious to see how leaky-integration might work, taking one out of the SNN playbook.

Fused layer neural network

@SeanOConnor Sean O'Connor — Sun, 25 Jan 2026 10:52:43 +0000

There is some evidence that artificial neural networks are only a form of hierarchical memory.

There is evidence for feature extraction and I find that evolution training finds no improvement after gradient training. Also no algorithms as such found to emerge, for example any bubble sort type algorithm, or other simple algorithm.

They are all consistent with hierarchical memory though. Features allow selective memory, evolution cannot improve already fully learned memory, though evolution could adjust an actual algorithm to at least find some small improvement.

That kind of leaves a mystery of how artificial neural networks can generalize in smart ways.

Some recent papers show that small networks trained with modular arithmetic problems learn to generalize by the formation of completed geometric forms inside the net. When test time data is inputted into the net it lands on the geometric manifold and elicits the correct response as a consequence.

If you say an artificial neural networks is hierarchical memory then factorized geometric forms can exist in the network, allowing even more complex generalization.

So first you are accusing artificial neural networks of being much simpler than everyone is saying. Then you are pointing out despite that simplicity, complex emergent factorized geometric forms are learned and allow reasonable forms of generalization to happen.

https://arxiv.org/pdf/2301.02679

A Self-Discovering Abstraction Engine via Variational Symmetry, Typed Program Induction, and Neural-Symbolic Compilation

@Davis_Sai Davis Sai — Sun, 25 Jan 2026 04:03:01 +0000

A Self-Discovering Abstraction Engine via Variational Symmetry, Typed Program Induction, and Neural-Symbolic Compilation

author : Tofara Moyo

abstract

We present a general-purpose abstraction engine capable of discovering, compressing, and reusing structure across problem instances through a self-referential learning loop. The system integrates variational inference over program partitions, category-theoretic symmetry composition , typed abstraction languages, neural-symbolic program compilation, and curriculum-driven abstraction growth. Unlike task-specific solvers, the proposed architecture treats programs themselves as objects of inference, enabling recursive abstraction, symmetry quotienting, and compression-driven primitive discovery. While motivated by the Abstraction and Reasoning Corpus (ARC), the framework is domain-agnostic and targets the broader problem of learning abstractions that generalize across tasks.

(PDF) A Self-Discovering Abstraction Engine via Variational Symmetry, Typed Program Induction, and Neural-Symbolic Compilation

Fused layer neural network

@SeanOConnor Sean O'Connor — Sat, 24 Jan 2026 00:04:25 +0000

I did a version in Processing (Java generative art):

https://discourse.processing.org/t/swnet16-neural-network/47779

Which is a single file to look at.

Shared brain structure among different intelligent species

@BrainVx Karl — Thu, 22 Jan 2026 16:53:42 +0000

The cerebellum is the slow learner of instinct from the fast learning unit that is the neocortex. They work together where the hippocampus detects what a surprise is.

Can the neocortex create AGI without a cerebellum ? Yes, but it would be dysfunctional and very slow to act because the various attention type pools/grouping (dentate gyrus, thalamus, basal ganslia, etc. would need to itterate to derive what the cerebellum does in cascade much faster. The cerebellum learns the short cuts when sleeping and the neocortex is forming lateral hierachical constructs.

Can you have AGI with just a cerebellum, no. That’s where current AI is. It’s can’t learn integrated hierachical concept influences.

Can you have AGI with the current itterative LLM type process, no, because new context window input is not integrated into the hierachical memory. Plus, the current approach learns vastly inefficiently in reverse, not forward via the hippocampus route.

Shared brain structure among different intelligent species

@SeanOConnor Sean O'Connor — Mon, 19 Jan 2026 12:48:53 +0000

There are some recent papers about the emergence of geometric forms within neural networks, especially after grokking. Then the network can generalize because the test time data still falls on the geometric form.

In my view, from some experiments I did (evolution versus gradient descent), I view deep neural networks as hierarchical associative memory.

Because the memory is hierarchical you can even have factorized geometry. And quite a strong capacity to reason can spontaniously emerge.

Maybe there can be a “Hierarchical Memory is all you Need” paper.

Also the fact I am back on this forum means I didn’t earn a single red cent from AI in the past 6 months.

Shared brain structure among different intelligent species

@david.pfx david bennett — Mon, 19 Jan 2026 03:17:48 +0000

Cortical columns are the visual manifestation of replication of a single functional neural unit. An ancestor has 100 units, but a genetic variation yields another with 200 units, it’s smarter and out-competes. Humans have something like a million units (columns), voila: intelligence.

But columns are just the mammal way. Any way will do as long as it manifests as a repeating unit.. The evolutionary drivers and the genetics are much the same across species, so if the advantage comes from intelligence, look for gene expression in repeating neural units as the solution, but very different paths to get there.

Fused layer neural network

@SeanOConnor Sean O'Connor — Mon, 19 Jan 2026 01:04:32 +0000

SWNet16 neural network: https://archive.org/details/sw-net-16-b

Fuses multiple width 16 CReLU layers into one larger layer using the one-to-all connectivity of a fast transform.

Then stacks those layers into a neural network.

Does spectral de-biasing at the input and output using permuted Thue-Morse sequence sign flips.

Java source code.

Shared brain structure among different intelligent species

@sean — Sun, 18 Jan 2026 23:48:20 +0000

I just finished reading “A Thousand Brains” by Jeff Hawkins - great book (here is my review)! He identifies cortical columns in the human neocortex as the key part of intelligence, which implement reference frames (to form / store knowledge).

I wonder now if this is correct: his theory would be falsified if there is an intelligent brain on earth that lacks these cortical columns. I would argue certain birds (like corvids or parrots) and octopi also have advanced cognitive capabilities like tool use and metacognition. From a quick online search it seems they all exhibit some columnar structures as well - though I lack the expertise to understand if corvids, parrots and octopi all share similar cortical columnar brain structure.

In general one should learn more what structural parts in brains are truly essential to implement intelligence by looking for shared structures among intelligent species that mostly co-evolved independently.

Looking for thoughts and references on this.

Using HTM for logical deduction/induction and prediction of sentences

@dmac David McDougall — Wed, 14 Jan 2026 16:51:55 +0000

An HTM should be able to do this task. By the time it’s done training, there should be one anomaly at the very start, and none thereafter.

I have not looked closely at your code, but I assume that your classifier is using the final state of the TM to predict the next token? I’m not sure how you’re getting an output out of the HTM.

That’s a great idea! Resetting the system is unnatural. Animals only reset their brains when they sleep. The HTM should not need to be reset in between runs either. I would also recommend shuffling the samples into a new order in between each epoch, so that it doesn’t merge into one big sequence. This will cause anomalies, and if you plot the anomalies over time they should coincide with the transitions between sequences.

Using HTM for logical deduction/induction and prediction of sentences

@Aidan_Busby Aidan Busby — Wed, 14 Jan 2026 03:57:53 +0000

I have a hypothesis that the problem may be the network is predicting the next tokens correctly, evident in the low anomaly score, but that it isn’t inhibiting connections for the other possibilities and thus every training example the predictions are slightly biased in favor of the current correct token, which causes an incorrect response for the next prediction.

Could this be the case? or does HTM simply struggle with learning multiple sequences? I’ll try putting all the sentences in one long sequence and seeing if that improves anything.

Using HTM for logical deduction/induction and prediction of sentences

@dmac David McDougall — Wed, 14 Jan 2026 03:33:47 +0000

That’s a pretty neat project!

It does look like there is an off-by-one error with the REST API results predicting the previous response?

You seem to be experimenting with two layers of HTM’s, so I will share my findings on that topic too: Video Lecture of Kropff & Treves, 2008

Using HTM for logical deduction/induction and prediction of sentences

@Aidan_Busby Aidan Busby — Sun, 11 Jan 2026 21:44:53 +0000

Also, here is the classifier:

github.com/aidanbusby123/NeoContext

classifier.py

main

import numpy as np
import math
from collections import OrderedDict

class Classifier:
    def __init__(self, mode='weights', alpha=0.2, overlap_threshold=0.9):
        """
        mode: 'weights' for original weight-based learning, 'pattern_match' for direct pattern setting
        alpha: Learning rate (only used in weights mode)
        overlap_threshold: Minimum overlap ratio for pattern matching (0.0 to 1.0)
        """
        self.mode = mode
        self.alpha = alpha
        self.overlap_threshold = overlap_threshold
        
        if mode == 'pattern_match':
            # Direct pattern storage: list of (pattern, category) tuples
            self.stored_patterns = []
            self.pattern_categories = []
        else:

This file has been truncated. show original

Using HTM for logical deduction/induction and prediction of sentences

@Aidan_Busby Aidan Busby — Sun, 11 Jan 2026 21:37:13 +0000

Hello all,

I have recently been experimenting with the HTM.core code to train an HTM network to learn the logic behind a corpus of sentences. Currently, they look like this:

“If A is in X, where is A?”

Here is my training data:

[

    {"input": "Bob is in China. Where is Bob? ", "output": "China"},

    {"input": "Ronald is in Vietnam. Where is Ronald? ", "output": "Vietnam"},

    {"input": "Tracy is in Korea. Where is Tracy? ", "output": "Korea"},

    {"input": "Erick is in Indonesia. Where is Erick? ", "output": "Indonesia"},

    {"input": "Jonathan is in Germany. Where is  Jonathan? ", "output": "Germany"},

    {"input": "Joseph is in Russia. Where is Joseph? ", "output": "Russia"},

    {"input": "Lila is in Bulgaria. Where is Lila? ", "output": "Bulgaria"}

]

Here is my testing data (learning off):

[




    {"input": "Where is Joseph? ", "output": "Russia"},

    {"input": "Where is Jonathan? ", "output": "Germany"},

    {"input": "Bob is in China. Where is Bob? ", "output": "China"}




]

And I am attempting to train the model to predict X. However, the results have not been satisfactory. I am utilizing a custom classifier that I will admit I vibe coded (I am in high school, I am still very new to the topic of ML and cortical learning) to predict the appropriate token given the current active columns of the base layer of the network. I have tried two approaches, utilizing both nltk and simply feeding the network character by character.

My first approach uses the HTM rest API that I modified a slight bit to get around some of the safeguards preventing me from accessing some of the HTM network’s data, while my second approach uses the python HTM bindings, and this is where I have been trying the character-by-character level tokenization, which has worked better.

I’ve spent weeks trying to figure this out, but the REST based, word-level tokenization simply refuses to learn both the training and testing sequences, and while the python, character level approach successfully learns character level predictions, I would think that this would scale up to word-level tokens.

Interestingly, the word level approach does seem to predict the proper tokens as the anomaly does converge, but the best prediction seems to be different.

Here is a graph of the anomaly from one of my tests with the REST approach:

Ignore the red and green, those are for the second layer. My end goal is to use the apical feedback to learn higher level patterns.

The histogram displays the anomalies for the testing data.

However, here is an example of predictions during training after several epochs (the word below each epoch is the prediction):

Epoch 9: New training task: Joseph is in Russia . Where is Joseph ? Russia

Germany

Epoch 9: New training task: Lila is in Bulgaria . Where is Lila ? Bulgaria

Russia

Epoch 9: New training task: Bob is in China . Where is Bob ? China

Bulgaria

Epoch 9: New training task: Ronald is in Vietnam . Where is Ronald ? Vietnam

China

Epoch 9: New training task: Tracy is in Korea . Where is Tracy ? Korea

Vietnam

Epoch 9: New training task: Erick is in Indonesia . Where is Erick ? Indonesia

Korea

And the final predictions:

Epoch 9: New testing task: Where is Joseph ? Russia

Bulgaria

Epoch 9: New testing task: Where is Jonathan ? Germany

Bulgaria

Epoch 9: New testing task: Bob is in China . Where is Bob ? China

Bulgaria

Meanwhile, for the second experiment, the model seems to be learning words from characters, although not the right words. Here are the final predictions for my second experiment:

Where is Joseph?
onesia. Wh



Where is Jonathan?
esia. Wher



Bob is in China. Where is Bob?
d? Bd? Bd?

Here is the code for the first, REST based approach:

github.com/aidanbusby123/NeoContext

main_rest.py

main

from client import HTMRestClient
from embedder import load_training_data, tokenize_training_data, TextEmbedder
from evaluator import Evaluator
from token_sdr_converter import TokenScalarConverter
from pathlib import Path
from classifier import Classifier
import json
import numpy as np
import math
import matplotlib.pyplot as plt


import rnn_benchmark # for comparing with RNN

from memory_sdr import MemSDR

NUM_EVOLUTION_EPOCHS = 1

NUM_COLUMNS = 1024
CELLS_PER_COLUMN=128

This file has been truncated. show original

While here is the code for the python bindings approach:

github.com/aidanbusby123/NeoContext

main.py

main

import htm
from htm.bindings.sdr import SDR
from htm.encoders.rdse import RDSE, RDSE_Parameters
ScalarEncoder           = htm.bindings.encoders.ScalarEncoder
ScalarEncoderParameters = htm.bindings.encoders.ScalarEncoderParameters
from htm.bindings.algorithms import SpatialPooler
from htm.bindings.algorithms import TemporalMemory

from apical_tiebreak_temporal_memory import ApicalTiebreakPairMemory, ApicalTiebreakSequenceMemory

from classifier import Classifier
from memory_sdr import MemSDR

import numpy as np
import matplotlib.pyplot as plt

import json

np.set_printoptions(threshold=np.inf)

This file has been truncated. show original

Thank you for your time!

SDR classification

@spitfire Spitfire — Sun, 11 Jan 2026 01:49:50 +0000

Hi everyone. So I’ve been quietly playing around with this problem for a few days.

I decided to change my approach. Rather than identify entire words, I’m identifying key intent - Did the user mean to swipe that key, or are they just passing through.

The intuition here is that your swipe changes when you reach a key. You change direction or speed and move on to the next key. Few keys in most words are in a linear line to others, they’re deliberately all over the keyboard.

So now I’ve changed to doing a binary classifier a simple “register key” or not.

After doing that approach with a simple SP I’m able to get a 66% accuracy rate in keys.

So now my problem becomes how do I optimize the SP parameters to improve this rate. Here’s my parameters:

"potentialRadius": 7,
"boostStrength": 7.0,
"columnDimensions": (79, 79),
"dutyCyclePeriod": 1402,
"localAreaDensity": 0.1,
"minPctOverlapDutyCycle": 0.2,
"potentialPct": 0.1,
"stimulusThreshold": 6,
"synPermActiveInc": 0.14,
"synPermConnected": 0.5,
"synPermInactiveDec": 0.02,

Working on a classifier for binary images (say 100x70), are there any obvious parameters to play with?

SDR classification

@jacobeverist Jacob Everist — Sat, 10 Jan 2026 01:55:35 +0000

@thanh-binh.to sorry it took so long to respond. I ended up having to do a lot of refactoring and updating my websites so that I can share my simulations easily in the future.

Here is an example of the Cortical Classifier setup for 4 states, 8 states, and 16 states respectively:

The biggest difference between them is that it takes longer to learn for the 16-state classifier. You can speed up the learning by setting the sim speed to max and then you will see it eventually converge.

An interesting experiment I could try is to see if the training time scales linearly with the number of states or if more states to learn has a nonlinear effect. I suspect the latter but not what its magnitude might be.

Other things to try would be varying the total cells assigned to represent a state, as well as varying the total number of cells that activate.

For instance, we could have a classifier setup for m=2 states. Each state is assigned s=8 cells so the total number of cells in the classifier is n=16. We can then set our total number of activations to a=\{4, 8, 12\}. That is, 3 different possibilities.

If a=4, then we are under-activating, only setting 50% of the cell bits assigned to a state.
01101010 00000000

If a=8, then we are fully-activating, setting 100% of the cell bits assigned to a state.
11111111 00000000

If a=12, then we are over-activating, setting 150% of the cell bits assigned to state, inadvertently activating cells assigned to other states.
11111111 00100111

It would be interesting to see what the consequences of this will be. My intuition is that under-activation would be preferred because it would make the state classifiers more robust, preventing the cells from over-fitting, and enabling the classifier to detect a wide variety of different features that would be assigned to that class state.

BDH (Baby Dragon Hatchling)

@Maggus Maggus — Thu, 08 Jan 2026 06:50:05 +0000

Here is also also a interesting youtube interview about the BDH model.

We Watched a Brain Emerge..." The AI That Might Kill Transformers (w/ Pathway's Zuzanna Stamirowska)

SDR classification

@thanh-binh.to thanh-binh.to — Fri, 26 Dec 2025 13:10:14 +0000

@jacobeverist thanks for sharing your interesting works. Do you have any experiment with very long sequences ?