In recent years, Artificial Intelligence (AI) has undergone a massive growth spurt. Not so long ago, Large Language Models (LLMs) such as ChatGPT, Gemini, and Claude were curiosities: you could trick them, confuse them, or make them contradict themselves. Today, they have evolved into versatile companions that can write software, assist scientific research, extract insights from large sets of documents, and offer structured guidance across a wide range of domains. Modern multi-modal AI systems no longer operate on text alone; they interpret images, analyse audio, generate video, and combine these streams seamlessly. Language, reasoning, and creativity, capacities we have long associated with ourselves, are now appearing, at least on the surface, in machines.

Scaling a neuron

Tracing the foundations of these AI systems, one can observe that the core idea behind them is not new. Artificial neural networks have existed since the late 20th century, and their conceptual roots go back even further. In 1943, Warren McCulloch and Walter Pitts proposed a simple mathematical model of a neuron. The McCulloch–Pitts neuron takes numerical inputs, multiplies them by adjustable weights, sums the results, and applies a non-linear function to produce an output. It is similar to how one takes input from multiple people and commits to a course of action only if enough of them agree. Individually, such units are extremely simple. Yet a powerful mathematical insight, known as the universal approximation theorem, shows that networks composed of enough of these simple units can approximate virtually any function connecting input to output. With sufficient scale, they can process remarkably complex patterns. For a long time, scale was precisely the limiting factor: neural networks existed, but the hardware and data required to make them powerful were not available.
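The weighted-vote behaviour of a McCulloch–Pitts-style unit can be sketched in a few lines of Python; the weights and threshold below are invented purely for illustration:

```python
# A minimal sketch of a McCulloch-Pitts-style neuron: weighted inputs,
# a sum, and a non-linear threshold (the values here are illustrative).

def neuron(inputs, weights, threshold=0.5):
    """Fire (return 1) if the weighted sum of inputs crosses the threshold."""
    total = sum(x * w for x, w in zip(inputs, weights))
    return 1 if total >= threshold else 0

# Three "people" vote; the unit fires only if enough weighted agreement exists.
votes = [1, 1, 0]
influence = [0.4, 0.4, 0.2]
print(neuron(votes, influence))  # weighted sum 0.8 >= 0.5, so it fires: 1
```

Stacking many such units in layers, and letting training adjust the weights, is all that separates this toy from the networks described below.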
What changed over the past 15 years was not the invention of neural networks but the availability of enormous computational power and data. Graphics Processing Units (GPUs), originally developed for video games, enabled researchers to train networks with millions and eventually billions of parameters. At the same time, new architectural ideas improved how these networks were organised. Convolutional neural networks proved effective for image recognition by exploiting spatial structure. Recurrent neural networks were designed to handle sequences such as speech and text by allowing information to persist over time. The major breakthrough, however, came with the transformer architecture, which introduced attention mechanisms that allow models to dynamically weigh which parts of their input matter most at any given moment.

GPT, short for Generative Pre-trained Transformer, builds on this architecture. It is trained on vast collections of text to predict the next word in a sequence. Although this objective appears simple, when implemented at enormous scale and trained on extensive datasets, the model begins to capture grammar, facts, stylistic patterns, conceptual relationships, and even fragments of reasoning embedded in language. Intelligence, in this framework, emerges from the statistical regularities underlying the text the model is trained on.

On the artificial and biological

As these systems grow, they are beginning to rival the human brain in sheer numbers. GPT-3 contained 175 billion parameters, while newer models are estimated to reach into the trillions, approaching the roughly 100 trillion synapses in the human brain. Despite this apparent convergence in scale, AI and biological intelligence operate on fundamentally different principles. To take advantage of modern computing hardware, models such as GPT-3 process information in a strictly feed-forward manner. Input enters the network, flows through stacked layers, and produces an output.
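The core attention idea, scoring each part of the input against a query and turning the scores into weights, can be sketched with toy numbers. Everything below (the vectors, the two "contexts") is invented for illustration and bears no relation to a real model's learned values:

```python
import math

def softmax(scores):
    """Turn raw match scores into weights that sum to 1."""
    exps = [math.exp(s) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(query, keys, values):
    """Blend the values, weighted by how well each key matches the query."""
    scores = [sum(q * k for q, k in zip(query, key)) for key in keys]
    weights = softmax(scores)
    dim = len(values[0])
    return [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]

# Two stored associations; the query (context) decides which one dominates.
keys = [[1.0, 0.0], [0.0, 1.0]]       # toy "context A" and "context B" keys
values = [[10.0, 0.0], [0.0, 10.0]]   # toy meanings attached to each key
query = [4.0, 0.0]                    # input strongly matching context A
blended = attend(query, keys, values) # output is dominated by the first value
```

In a real transformer the queries, keys, and values are high-dimensional learned projections, but the mechanism, match, softmax, weighted blend, is the same.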
Each layer transforms the representation and passes it forward without revisiting earlier computations during the same pass. This design enables efficient training across thousands of GPUs simultaneously and allows rapid scaling.

The human brain operates differently. Constrained by biology and metabolism, it cannot rely on brute-force parallel computation. Instead, it is built around dense feedback loops. Signals in the cortex flow forward from sensory areas to higher regions, but also backward and laterally. Perception is therefore not a one-way processing of the world but an ongoing dialogue between incoming information and prior knowledge.

A simple example makes this clear. When you read the name “Harry” in a story about wizards, you probably think of Harry Potter. In a newspaper article about the British royal family, you would think of Prince Harry. Context shapes perception almost immediately. The brain does not first register a neutral word and then decide what it means; meaning is built into perception itself. In ambiguous sentences, visual illusions, or noisy environments, expectations strongly influence what we see and hear. The brain constantly predicts and then updates those predictions based on incoming signals. For example, when searching for a friend in a crowded space, the brain enhances features associated with that person while suppressing irrelevant details.

Neuromodulatory systems carry out this selective attention by changing the input-output gains of synapses rather than by rewiring the hard-wired connections themselves. Acetylcholine, a neurotransmitter, tracks the uncertainty of different cues and adjusts how heavily each one is relied upon. Norepinephrine, another neurotransmitter, signals when context changes abruptly and suppresses feedback strength, allowing circuits to shift to a different behavioural mode. Dopamine conveys reward prediction errors, modulating synaptic plasticity to form behaviourally relevant representations.
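Gain modulation of this kind, changing the effective strength of a connection without rewiring it, can be illustrated with a toy calculation. The numbers are invented; the point is only that the fixed weight stays the same while the gain rescales its influence:

```python
# A toy sketch of gain modulation: the wiring (weight) is fixed, but a
# neuromodulator scales how strongly an input drives the output.
# All values here are illustrative, not physiological measurements.

def response(signal, weight, gain=1.0):
    """Same connection, different effective strength depending on gain."""
    return gain * weight * signal

# Searching for a friend in a crowd: boost the relevant cue, damp the rest.
friend_cue = response(1.0, 0.5, gain=2.0)   # enhanced
background = response(1.0, 0.5, gain=0.2)   # suppressed
print(friend_cue, background)
```

The same input and the same wiring produce very different responses, which is the essence of attention implemented chemically rather than structurally.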
At the cellular level, the brain achieves efficiency through event-driven signalling. Neurons communicate using brief electrical spikes; if a neuron does not spike, it consumes minimal energy. At any moment, only a small fraction of neurons are active. This natural sparsity ensures that energy is expended only when meaningful events occur. Memory and computation are co-located at synapses: the same molecular structures that transmit signals also store the history of past interactions. Learning occurs locally at individual connections rather than through a global optimisation signal sweeping across the entire brain. This feedback-rich, chemically regulated architecture allows the brain to perform perception and reasoning using roughly 20 watts of power, about what a couple of LED bulbs consume.

Large AI systems, in contrast, activate vast numerical matrices for every input and rely on high-precision digital arithmetic. Training and operating them requires data centres consuming megawatts of electricity. They are also far more data-intensive: modern LLMs are trained on trillions of words, whereas humans encounter only a fraction of that amount over a lifetime, yet develop flexible understanding and generalisation.

Mirroring the brain

Researchers are increasingly attempting to borrow insights from biology. GPT-4 is widely reported to use a mixture-of-experts architecture, which activates only specialised portions of a network for a given task. This resembles the brain’s modularity and its selective recruitment of different regions for language, vision, or movement. Newer models now adjust computational effort based on task difficulty. On the hardware side, neuromorphic chips rethink fundamental chip design, using spike-like operations to reduce energy consumption. At the same time, researchers are exploring richer training methods that move beyond mere next-word prediction, allowing machines to learn complex concepts from far less data.
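The mixture-of-experts idea, activating only the specialised sub-network a gate selects for each input, can be sketched like this. The experts and the gating rule are invented for illustration; real systems use learned gating over large neural sub-networks:

```python
# A toy sketch of mixture-of-experts routing: a gate picks one specialist
# per input, so only a fraction of the total parameters does any work.

def expert_double(x):
    """A 'specialist' for small inputs (illustrative)."""
    return x * 2

def expert_square(x):
    """A 'specialist' for large inputs (illustrative)."""
    return x * x

def route(x, threshold=10):
    """Gate: activate only the expert selected for this input."""
    expert = expert_double if x < threshold else expert_square
    return expert(x)

print(route(3))    # small input -> doubling expert: 6
print(route(12))   # large input -> squaring expert: 144
```

Whichever expert the gate skips consumes no computation at all, which is the sparsity benefit the article describes.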
Yet these advances remain approximations. Neural networks implement sparsity through engineered routing rather than metabolic constraint. They approximate feedback using attention mechanisms and deep layering rather than through densely interwoven recurrent circuits. True recurrence, in which activity loops continuously between layers, is computationally difficult to train and challenging to parallelise efficiently at scale. As a result, most LLMs rely primarily on deep feed-forward structures that simulate certain effects of feedback without fully embodying it.

Moreover, biological neurons are far more complex than artificial ones. A single cortical neuron contains branching dendrites that perform intricate nonlinear computations before producing a spike. In some respects, the many stacked layers of a deep network collectively attempt to approximate the internal processing that fewer biological neurons accomplish individually. Artificial neurons are simplified mathematical constructs; biological neurons are biochemical systems embedded in a constantly changing chemical environment.

Where this leads remains uncertain. Machines are not bound by the evolutionary and size constraints that shaped our brains; they can scale in ways biology cannot. It is possible that AI systems will gradually close the gap with, and perhaps even surpass, the brain in computational power, energy efficiency, and learning efficiency. But it is equally possible that machines will diverge further from biological intelligence, evolving architectures better suited to processing information at scales far beyond human experience.

Ultimately, intelligent behaviour does not have to be implemented in brain-like circuits. A pacemaker supports the function of the heart without resembling cardiac tissue. Similarly, AI may one day mimic, extend, or augment human cognition without replicating its biological substrate. In the end, it is intelligence itself, not its resemblance to the brain, that matters.
Sabyasachi Shivkumar is a computational cognitive neuroscientist who recently completed his postdoctoral training at Columbia University, working on understanding behaviour through computational models.