Ask ChatGPT a question and it responds by writing essays, explaining science, even composing poetry. It feels intelligent. But is it really thinking?

To mathematicians, systems like ChatGPT are not conscious minds but highly structured pattern-recognition machines built on mathematics. Beneath every sentence lie geometry, probability, and continuous error correction. The intelligence we perceive is not awareness but mathematics operating at a massive scale.

This article explores the key mathematical ideas that make modern AI work — and why recent advances have brought it into everyday life.

What are the key mathematical principles that allow AI systems like ChatGPT to work?

“When mathematicians look at systems like ChatGPT, we do not see ‘intelligence’ in the human sense. Instead, we see a very large and carefully structured mathematical machine designed to recognise patterns,” says Associate Professor Priyavrat Deshpande of the Chennai Mathematical Institute.

Three core mathematical ideas

He explains this using an analogy of children learning their mother tongues. Children are exposed to everyday language and gradually develop an intuition for what sounds correct—long before they formally learn grammar rules. Similarly, AI systems learn patterns from vast amounts of language data. The difference is that everything in AI is translated into numbers. “Words, sentences, and even abstract ideas are represented as points in a high-dimensional numerical space,” he explains.
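The idea of words as points in space can be sketched with toy numbers. The four-dimensional vectors below are invented purely for illustration (real models learn embeddings with hundreds or thousands of dimensions from data); the point is that related words sit near each other, and a simple geometric measure such as cosine similarity captures that closeness.

```python
import numpy as np

# Hypothetical 4-dimensional "embeddings", invented for illustration only.
# Real models learn much higher-dimensional vectors from massive corpora.
vectors = {
    "king":  np.array([0.9, 0.8, 0.1, 0.2]),
    "queen": np.array([0.9, 0.7, 0.9, 0.2]),
    "apple": np.array([0.1, 0.2, 0.1, 0.9]),
}

def cosine(a, b):
    # Cosine similarity: close to 1.0 when two points lie in the same direction.
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

royal = cosine(vectors["king"], vectors["queen"])
fruit = cosine(vectors["king"], vectors["apple"])
```

With these toy numbers, "king" and "queen" come out geometrically closer to each other than either is to "apple", which is what "meaning as position in space" amounts to in practice.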

According to him, three core mathematical ideas drive these systems. Linear algebra enables the manipulation of enormous collections of numbers, much like coordinates help us navigate a city using maps. Probability theory helps the system deal with uncertainty by estimating what is likely rather than what is certain. Optimisation provides a structured method to improve performance by learning from errors.

“From a mathematician’s perspective, AI is less about machines ‘thinking’ and more about geometry, probability, and feedback operating together at an enormous scale,” he adds.

How do AI systems learn using optimisation?

Deshpande explains optimisation with the analogy of walking down a mountain in the dark. You cannot see the entire landscape, but you can sense whether each step takes you uphill or downhill. By consistently taking small steps downhill, you eventually reach the base.

“Learning in AI works similarly. The system initially makes poor predictions. Each mistake provides a small signal indicating whether a change would improve performance or worsen it. The system adjusts itself accordingly. Optimization is simply this process automated and scaled up—millions or billions of small improvements applied consistently until the system performs remarkably well,” he says.
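The downhill walk can be sketched in a few lines. The valley-shaped function, starting point, and step size below are arbitrary illustrative choices; the loop itself is a bare-bones version of the gradient descent that, scaled up to billions of parameters, trains real models.

```python
# Gradient descent on a one-dimensional "valley", f(x) = (x - 3)^2.
# Each step senses the local slope and moves slightly downhill,
# mirroring the walking-down-a-mountain-in-the-dark analogy.

def f(x):
    return (x - 3) ** 2

def slope(x):
    return 2 * (x - 3)    # derivative of f: positive uphill, negative downhill

x = 10.0                  # start far from the bottom of the valley
learning_rate = 0.1       # size of each small step

for _ in range(100):
    x -= learning_rate * slope(x)   # step against the slope
```

After a hundred small steps, `x` has settled very close to 3, the bottom of the valley, without the loop ever "seeing" the whole landscape.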

When does prediction start to look like intelligence?

“This is where probability becomes surprisingly powerful. If a system can consistently predict what comes next in a conversation, it must have implicitly learned patterns of grammar, facts about the world, logical relationships, and even human preferences and intentions,” explains Tejas Bodas, Assistant Professor in the Computer Systems Group at IIIT Hyderabad and a member of the AlphaGrep lab.

AI systems are built to make very accurate guesses about what will happen next. They do this by finding patterns, even very complicated ones, in large amounts of data. Cognitive scientists think the human brain works in a similar way: it is always guessing what it will see, hear, or experience next.

“So what we call reasoning might actually come from being really good at making smart predictions. When a system (like a brain or an AI) becomes very good at spotting complex patterns, its predictions can look like logical thinking,” he says.
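The simplest version of "predicting what comes next" is just counting. The toy corpus below is invented for illustration; large language models learn vastly richer patterns over whole documents, but the core idea of assigning a probability to the next word is the same.

```python
from collections import Counter, defaultdict

# A tiny bigram model: count which word follows which in a toy corpus,
# then "predict" the most probable next word from those counts.
corpus = "the cat sat on the mat the cat ate the fish".split()

following = defaultdict(Counter)
for current, nxt in zip(corpus, corpus[1:]):
    following[current][nxt] += 1

def predict(word):
    counts = following[word]
    total = sum(counts.values())
    best, n = counts.most_common(1)[0]
    # Return the likeliest next word and its estimated probability.
    return best, n / total

word, prob = predict("the")
```

In this toy corpus, "the" is followed by "cat" twice and by "mat" and "fish" once each, so the model predicts "cat" with probability 0.5. A real model replaces the counting with a learned, context-sensitive probability distribution.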

How does a model improve over time?

“Today’s AI models do not improve continuously on their own,” says Professor Sunita Sarawagi, a member of the Artificial Intelligence Lab at IIT Bombay. “Instead, they evolve through periodic update cycles.”

As millions of people use these models, they generate both implicit signals—such as corrections, errors, or user disengagement—and explicit feedback, such as ratings and annotations. This feedback helps developers identify weaknesses.

To address these gaps, curated datasets are created, often with the help of human annotators who focus on the model’s failure cases. The model is then retrained or fine-tuned using a mix of its original training data and newly collected data.

“Not all improvements rely on human labelling. Reinforcement learning from AI feedback is also becoming increasingly common,” she adds. While periodic retraining remains the dominant approach, there is ongoing research into continual learning and online adaptation, where models could improve incrementally without undergoing full retraining cycles.

How do neural networks learn from errors, and why is that so powerful?

Professor Mausam of the Computer Science and Engineering Department, and Founding Head of the Yardi School of AI at IIT Delhi, said, “A neural-network-based AI, similar to a human brain, consists of millions of interconnected (artificial) neurons. When we provide inputs such as audio or visual signals, the neurons process them and fire with varying intensities across different layers, ultimately allowing the AI to interpret a scene with certainty. Some neurons respond strongly, while others only slightly, depending on their importance for this input, which is determined by associated parameters.”

He explains the learning process with an analogy: “Just as a child learns from her parents what to do in a given situation, neural networks are provided inputs and asked to predict outputs. If the prediction is correct, no change is needed. If it is wrong, the network adjusts its parameters until it gets it right and has learnt that particular task. This process is known as error-driven learning, and it underpins nearly all machine learning approaches.”
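That error-driven loop can be sketched with a single artificial neuron. The task (logical AND), the learning rate, and the number of passes below are illustrative choices; modern networks apply the same correct-only-when-wrong adjustment to billions of parameters at once.

```python
# Error-driven learning with a single artificial neuron (a perceptron),
# trained here to reproduce logical AND. When a prediction is wrong,
# the weights are nudged in the direction that reduces the error.
data = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]

w = [0.0, 0.0]   # one weight per input
bias = 0.0
lr = 0.1         # learning rate: size of each correction

def predict(x):
    return 1 if w[0] * x[0] + w[1] * x[1] + bias > 0 else 0

for _ in range(20):                      # a few passes over the data
    for x, target in data:
        error = target - predict(x)      # zero when the guess is right
        w[0] += lr * error * x[0]        # adjust only when wrong
        w[1] += lr * error * x[1]
        bias += lr * error

results = [predict(x) for x, _ in data]
```

After a handful of passes the neuron's predictions match the AND targets exactly, having been corrected only when it erred.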

What makes neural networks particularly powerful is their ability to train billions of parameters on large datasets. “A neural network’s size determines its capacity to learn, and the amount of data it can meaningfully learn from,” Professor Mausam notes. “For example, an infant has limited experiences and therefore a smaller data ‘training set’ along with a smaller brain, whereas a 10-year-old has much broader experiences and a larger brain, enabling more learning. Similarly, larger neural networks have greater computational power and can learn from far more data.”

The concept of neural networks has been around since the 1960s, with many fundamental innovations in the 1970s-80s. “Until 2016, neural network design was not novel, and key disruptions happened because of large computing power and data availability. The real technical change came in 2017, when Google created the ‘Transformer’ architecture — a novel arrangement of neurons that enabled efficient training on massive datasets — significantly boosting performance,” he says.
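At the heart of the Transformer is the attention operation, which can be sketched with toy numbers. The matrices below are invented for illustration; in a real model they are learned from data, and the computation is repeated across many heads and layers.

```python
import numpy as np

# A minimal sketch of scaled dot-product attention, the core operation
# of the Transformer: every position scores every other position, the
# scores become probabilities, and the output is a weighted average.
def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

Q = np.array([[1.0, 0.0], [0.0, 1.0]])   # toy queries (one row per position)
K = np.array([[1.0, 0.0], [0.0, 1.0]])   # toy keys
V = np.array([[1.0, 2.0], [3.0, 4.0]])   # toy values

scores = Q @ K.T / np.sqrt(K.shape[-1])  # how relevant each position is
weights = softmax(scores)                # rows are probability distributions
output = weights @ V                     # weighted mixture of the values
```

Because every position can attend to every other in one matrix multiplication, the whole sequence is processed in parallel, which is what made training on massive datasets efficient.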

Is modern AI mainly a software achievement, or a mathematics breakthrough?

Regarding the recent AI revolution, Professor Mausam observes, “Modern artificial intelligence combines applied mathematics, statistics, algorithms from computer science, inspiration from behavioural psychology and innovations in neural architecture design. The major breakthrough in neural architecture design, the Transformer, is both a marvel of innovative engineering and, to some extent, a fundamental mathematical advance in AI.”

Are Indian institutions contributing to the foundational mathematics and AI research, or mostly applying models created abroad?

On India’s role in AI research, Professor Mausam adds, “Indian institutions contribute both by applying existing models and conducting foundational research. Globally, almost all researchers today work with these large, standardised neural models pre-trained by large companies, due to the inherent intelligence they possess. These massive models are used to generate artificial data to train other specialised models, critique errors of smaller models, and develop applications in domain areas such as healthcare, law and software engineering.”

Is the work at the Indian AI Research Organisation more about building applications or contributing to the fundamental mathematics behind AI?

“IAIRO is designed as an execution-first institution that bridges the gap between foundational science and real-world deployment. We are not just building wrappers around existing tools; we are developing the C3AN (Custom, Compact, Composite, Collaborative, and Neurosymbolic) framework,” says Amit Sheth, Founding Director at Indian AI Research Organisation.

He explains that this involves contributing to the fundamental mathematics of Compact and Neurosymbolic AI—creating smaller, high-performance models that use less compute—while simultaneously applying these breakthroughs to nationally critical sectors like healthcare, pharma, and sustainability. “We believe you cannot have world-class applications without owning the underlying mathematical IP,” he says.

How is India contributing to AI research globally, particularly in mathematics and modelling, and what gaps still exist?

“India has moved from being a consumer of AI to a significant contributor, now ranking second globally in GitHub AI projects and leading in AI skill penetration. However, a structural gap remains in frontier AI research. While we excel at application and data engineering, we need more ‘Sovereign AI’ that reduces reliance on foreign monolithic models. By focusing on NeSy AI (which combines two different ways of thinking: neural networks and symbolic reasoning) and domain-specific architectures like the C3AN framework, India is now positioning itself to lead the next generation of ‘Reasoning AI’ rather than just chasing the ‘larger is better’ trend of the West,” says Prof. Sheth.


