A Deep and Unsparing Investigation Across Neuroscience, Biology, Philosophy, Psychology, and Artificial Intelligence

CHAPTER ONE: THE QUESTION NOBODY CAN FULLY ANSWER

There is a peculiar irony at the heart of intelligence research. The very faculty we use to study intelligence is intelligence itself. We are, in a sense, a brain trying to understand what a brain is. This is not merely a philosophical quip. It is a genuine methodological problem that has haunted scientists, philosophers, and engineers for well over a century, and it explains why, after so much research, so many brilliant minds, and so many competing theories, we still cannot agree on a single, universally accepted definition of what intelligence actually is.

Ask a neuroscientist and she will point to the prefrontal cortex, to white matter connectivity, to the efficiency of neural firing patterns. Ask a psychologist and he will cite IQ scores, factor analysis, and the statistical ghost known as the "g factor." Ask a philosopher and you will receive a question in return, probably something about whether a thermostat that regulates temperature is, in some minimal sense, intelligent. Ask a computer scientist and she will show you a benchmark score, a leaderboard, a model that just passed the bar exam. Ask a biologist studying crows or octopuses and he will challenge every assumption you brought into the room.

The word "intelligence" comes from the Latin "intelligentia," derived from "inter" (between) and "legere" (to choose or read). At its etymological root, intelligence means something like "the capacity to choose between things," to read a situation and select an appropriate response. That is a surprisingly good starting point, and we will return to it. But as we will see, the full picture is vastly more complex, more beautiful, and more contested than any single definition can capture.

This article takes you on a journey through all of those domains. We will examine what intelligence means in each field, where those definitions succeed and where they fail, and then we will confront the most urgent version of the question in our current technological moment: does artificial intelligence, as it exists today, actually possess intelligence? Or is it something else entirely, something that merely wears intelligence as a costume?

CHAPTER TWO: HOW PSYCHOLOGY DEFINED INTELLIGENCE, AND WHY IT STARTED A WAR

The scientific study of intelligence began in earnest in the late nineteenth and early twentieth centuries, and it began with a fight that has never really ended.

Charles Spearman, a British psychologist working in the early 1900s, made an observation that seemed almost too clean to be true. When he looked at how people performed across a wide variety of mental tests, he noticed that those who did well on one test tended to do well on all the others. A person who excelled at verbal analogies also tended to excel at arithmetic reasoning, at spatial rotation tasks, at memory recall. Conversely, those who struggled in one domain tended to struggle across the board. This correlation was not perfect, but it was consistent and statistically robust.

Spearman used a mathematical technique he helped pioneer, called factor analysis, to extract what he believed was the hidden common cause behind all these correlations. He called it "g," for general intelligence. In his two-factor theory, every cognitive task requires both this general factor g and a task-specific factor s. The s factors explain why a concert pianist might be musically brilliant but mediocre at chess. The g factor explains why, on average, people who are good at one thing tend to be good at many things.

Spearman's g factor remains one of the most statistically robust findings in all of psychology. Modern IQ tests are built largely on its foundation. Studies consistently show that g accounts for roughly 40 to 50 percent of the variance in individual performance on cognitive tests, and it predicts outcomes as diverse as academic achievement, job performance, and even health and longevity. This is not a trivial finding. It is a real, measurable, reproducible phenomenon.
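To make the idea of a shared factor concrete, here is a minimal sketch. It assumes a made-up correlation matrix for four tests (the numbers are illustrative, not real data) and uses the first principal component as a simple stand-in for g. That is a simplification of Spearman's factor analysis, not a reproduction of it.

```python
# Hypothetical correlations among four mental tests (illustrative values only).
# Every off-diagonal entry is positive: people good at one test tend to be good at the others.
TESTS = ["verbal", "arithmetic", "spatial", "memory"]
R = [
    [1.00, 0.35, 0.30, 0.30],
    [0.35, 1.00, 0.30, 0.25],
    [0.30, 0.30, 1.00, 0.30],
    [0.30, 0.25, 0.30, 1.00],
]

def first_component(matrix, iterations=200):
    """Return (eigenvalue, unit eigenvector) of the dominant component via power iteration."""
    n = len(matrix)
    v = [1.0 / n ** 0.5] * n
    for _ in range(iterations):
        w = [sum(matrix[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    eigenvalue = sum(
        v[i] * sum(matrix[i][j] * v[j] for j in range(n)) for i in range(n)
    )
    return eigenvalue, v

eigenvalue, loadings = first_component(R)
# The trace of a correlation matrix equals the number of tests,
# so eigenvalue / n is the share of total variance on the first component.
share = eigenvalue / len(R)
print(f"share of variance on the first component: {share:.0%}")
for name, loading in zip(TESTS, loadings):
    print(f"{name:>10}: loading {loading:.2f}")
```

With these invented correlations, every test loads positively on the first component, and that component carries a bit under half of the total variance. The sketch shows the shape of Spearman's argument, not the evidence for it.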

But here is where the war begins.

Howard Gardner, a developmental psychologist at Harvard, looked at the same human landscape and saw something completely different. In his 1983 book "Frames of Mind," Gardner argued that the concept of a single general intelligence was not only incomplete but fundamentally misleading. He proposed instead that human intelligence is not one thing but many things, a collection of distinct, relatively independent capacities, each with its own developmental trajectory, its own neural substrate, and its own cultural expression.

Gardner eventually identified eight intelligences: seven in the original book and an eighth, naturalistic intelligence, which he added in the 1990s. Linguistic intelligence is the capacity to use language with precision and power, the intelligence of poets, novelists, and skilled debaters. Logical-mathematical intelligence is the capacity for abstract reasoning and pattern recognition in numbers and logic, the intelligence of mathematicians and scientists. Spatial intelligence is the ability to think in three dimensions, to mentally rotate objects, to navigate and visualize, the intelligence of architects, sculptors, and chess grandmasters. Musical intelligence is sensitivity to rhythm, pitch, and timbre, the intelligence of composers and performers. Bodily-kinesthetic intelligence is mastery of one's own body in space, the intelligence of surgeons, athletes, and dancers. Interpersonal intelligence is the capacity to understand other people, their motivations, their emotions, their intentions, the intelligence of great teachers, therapists, and political leaders. Intrapersonal intelligence is the capacity for accurate self-knowledge, understanding one's own strengths, weaknesses, fears, and drives. Naturalistic intelligence is the ability to recognize and classify elements of the natural world, the intelligence of biologists, farmers, and hunters.

Gardner defined intelligence itself as "a biopsychological potential to process information that can be activated in a cultural setting to solve problems or create products that are of value in a culture." This definition is deliberately broad. It is designed to capture the full spectrum of human capability rather than privileging the narrow band of skills measured by traditional IQ tests.

The tension between Spearman and Gardner is not merely academic. It reflects a deep disagreement about what intelligence is for, and what counts as evidence. Spearman's supporters point out that Gardner's intelligences are not truly independent, that they still correlate with each other, and that many of them look more like talents or domain-specific skills than anything we would normally call intelligence. Gardner's supporters respond that reducing intelligence to a single number is a cultural and political act as much as a scientific one, that it systematically undervalues the capacities of people who do not fit the narrow mold of Western academic achievement.

Both sides have a point. And the fact that both sides have a point tells us something important: intelligence is not a simple, unified thing that can be captured by any single theory. It is a family of related capacities, and different theories illuminate different members of that family.

CHAPTER THREE: WHAT NEUROSCIENCE ACTUALLY SEES INSIDE THE SKULL

Psychology gives us behavioral definitions and statistical models. Neuroscience goes inside the skull and asks: what is actually happening in the brain when an intelligent act occurs?

The first thing neuroscience teaches us is that intelligence is not located in any single place. For a long time, popular imagination associated intelligence with the frontal lobes, and particularly with the prefrontal cortex, the large region of cortex sitting just behind your forehead. This association is not wrong, but it is dramatically incomplete.

The prefrontal cortex, and especially its lateral surface, the lateral prefrontal cortex or LPFC, is indeed a critical hub for what researchers call executive functions. These include working memory, the ability to hold information in mind while manipulating it; cognitive control, the ability to suppress irrelevant responses and focus on what matters; abstract reasoning, the ability to form and test hypotheses; and planning, the ability to sequence actions toward a distant goal. People with damage to the prefrontal cortex can have intact memory, intact language, and intact sensory processing, yet be catastrophically impaired in their ability to organize their lives, make decisions, or learn from their mistakes. The famous case of Phineas Gage, a railroad worker who survived a tamping iron passing through his frontal lobes in 1848 and was described by those who knew him as "no longer Gage," is the most celebrated historical illustration of this.

But modern neuroimaging, using functional MRI and diffusion tensor imaging to map both activity and connectivity, has revealed that intelligence is better understood as a property of networks rather than regions. The most influential current framework is the parieto-frontal integration theory, or P-FIT, proposed by Rex Jung and Richard Haier. P-FIT holds that intelligence emerges from the efficient communication between frontal regions, which handle abstract reasoning and working memory, and parietal regions, which integrate sensory information and handle spatial processing. The efficiency of the white matter tracts connecting these regions, the highways of the brain, turns out to be one of the strongest neural predictors of measured intelligence.

There is another finding from neuroscience that is both counterintuitive and deeply important: the neural efficiency hypothesis. When researchers give intelligent people and less intelligent people the same cognitive task and measure their brain activity, they find that more intelligent individuals tend to show less brain activation, not more. Their brains solve the problem using fewer resources, with less metabolic effort. The brain of a highly intelligent person, when faced with a moderately difficult task, runs more quietly and efficiently than the brain of a less intelligent person facing the same task. It is only when the task becomes genuinely difficult that the more intelligent brain ramps up its activity, and when it does, it recruits resources more effectively.

This is a profound insight. Intelligence, from a neuroscientific perspective, is not about raw computational power in the sense of more neurons firing harder. It is about the quality of the architecture, the efficiency of the connections, the ability to do more with less. A well-designed road network moves more cars more quickly than a chaotic tangle of roads, even if the chaotic network has more total road surface.
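The road-network analogy can be made concrete with a graph measure called global efficiency: the average of 1/d over all pairs of nodes, where d is the number of hops on the shortest path between them. The sketch below is only an illustration of the analogy, not a model of any brain. The two six-node networks, HUB and TANGLE, are hypothetical.

```python
from collections import deque
from itertools import combinations

def shortest_hops(graph, start):
    """Breadth-first search: number of hops from start to every reachable node."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in graph[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist

def global_efficiency(edges, n):
    """Average inverse shortest-path length over all pairs of the n nodes."""
    graph = {i: set() for i in range(n)}
    for a, b in edges:
        graph[a].add(b)
        graph[b].add(a)
    total = 0.0
    for a, b in combinations(range(n), 2):
        d = shortest_hops(graph, a).get(b)
        if d:  # unreachable pairs contribute zero
            total += 1.0 / d
    return total / (n * (n - 1) / 2)

# Five roads arranged around one central hub.
HUB = [(0, 1), (0, 2), (0, 3), (0, 4), (0, 5)]
# Six roads: a long chain plus one redundant shortcut near one end.
TANGLE = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (0, 2)]

print("hub:   ", len(HUB), "roads, efficiency", round(global_efficiency(HUB, 6), 3))
print("tangle:", len(TANGLE), "roads, efficiency", round(global_efficiency(TANGLE, 6), 3))
```

The tangle has more road but lower efficiency, because its extra link does nothing to shorten the long trips. In this toy setting, that is what "doing more with less" amounts to.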

Consider this simplified illustration of what happens in the brain during a reasoning task:

SIMPLE TASK (e.g., 2 + 2 = ?)

Low-intelligence brain:   [PFC] ===== [Parietal] ===== [Other areas]
                           HIGH ACTIVATION across many regions

High-intelligence brain:  [PFC] == [Parietal]
                           LOW ACTIVATION, targeted and efficient

DIFFICULT TASK (e.g., multi-step logical deduction)

Low-intelligence brain:   [PFC] == [Parietal] (struggles, limited recruitment)

High-intelligence brain:  [PFC] ========= [Parietal] ========= [Temporal] ===
                           HIGH ACTIVATION, broad and coordinated recruitment

The brain of a highly intelligent person is, in a sense, a better-engineered system. It idles efficiently and scales up powerfully when needed. This pattern has been reported in many neuroimaging studies, although its strength varies with the task and the people studied.

Beyond efficiency, neuroscience has also identified structural correlates of intelligence. Larger overall brain volume is associated with higher measured intelligence, though the correlation is modest, around 0.3 to 0.4. More relevant is the volume of grey matter in specific regions, particularly the prefrontal and posterior temporal cortex. The integrity of white matter, the myelinated axons that carry signals between regions, is an even stronger predictor. And perhaps most intriguingly, the trajectory of cortical development matters more than any single snapshot: children whose cortex thickens rapidly in early childhood and then thins markedly in adolescence, a thinning thought to reflect, at least in part, synaptic pruning, tend to show higher intelligence as adults. The brain is, in effect, sculpting itself toward efficiency.

What neuroscience does not yet tell us is why any of this gives rise to the subjective experience of thinking, the felt sense of understanding something, the "aha" moment, the pleasure of solving a puzzle. That gap, between the neural and the phenomenal, is the hard problem of consciousness, and it remains entirely unsolved. This will become critically important when we turn to artificial intelligence.

CHAPTER FOUR: BIOLOGY ASKS A DIFFERENT QUESTION ENTIRELY

While psychologists measure intelligence and neuroscientists map it, biologists ask a more fundamental question: what is intelligence for? From an evolutionary perspective, intelligence is not an end in itself. It is a solution to a problem, or rather, to many problems. And the fascinating thing is that evolution has solved those problems in radically different ways in different lineages, producing what biologists call convergent cognitive evolution.

Consider the octopus. An octopus is a mollusk, more closely related to a clam than to a vertebrate. Its lineage diverged from ours approximately 700 million years ago, long before the Cambrian explosion. Yet the common octopus possesses roughly 500 million neurons, comparable to a dog, and exhibits a range of behaviors that any honest observer must call intelligent. Octopuses explore objects through what looks very much like play. They learn by both reward and punishment. They can tell individual humans apart. They solve problems, including the famous unscrewing-a-jar task, which requires both spatial reasoning and physical dexterity. They have both short-term and long-term memory.

What makes this even more remarkable is the architecture of the octopus nervous system. Two-thirds of its neurons are not in its brain at all but distributed throughout its eight arms, each of which can act semi-autonomously. The octopus brain does not micromanage its arms the way a human brain micromanages its fingers. Instead, it issues high-level commands and the arms figure out the details themselves. This is a fundamentally different computational architecture from the centralized, hierarchical architecture of the vertebrate brain, yet it produces comparable behavioral sophistication. Intelligence, in the octopus, is distributed.

Crows and other corvids present an equally striking case. New Caledonian crows manufacture tools, bending wires into hooks to retrieve food from tubes, a behavior that requires planning, causal understanding, and fine motor control. Corvids also understand water displacement, dropping stones into a tube of water to raise the level and reach a floating treat, a task that requires a model of physical causality. Crows remember individual human faces and hold grudges against people who have threatened them. They engage in social learning, transmitting behaviors across generations in a way that constitutes a rudimentary culture. And they do all of this with a brain that has no prefrontal cortex at all, the region we just spent several paragraphs describing as the seat of executive intelligence in mammals.

How? Corvid brains have an extraordinarily dense packing of neurons in a region called the nidopallium caudolaterale, which appears to perform functions analogous to the mammalian prefrontal cortex despite being architecturally completely different. Some corvid species have twice as many neurons per unit of brain volume as primates of comparable brain size. Intelligence, in the crow, is dense.

The biological lesson is profound and directly relevant to our later discussion of artificial intelligence. Intelligence is not a single thing implemented in a single way. It is a functional property, the capacity to flexibly solve novel problems in service of survival and reproduction, and it can be implemented in radically different physical substrates. The octopus proves that you do not need a centralized brain. The crow proves that you do not need a prefrontal cortex. What you need, apparently, is sufficient computational complexity, organized in a way that allows flexible, goal-directed behavior.

This is the biologist's definition of intelligence: adaptive, flexible problem-solving in the service of survival. And it is a definition that, as we will see, creates interesting complications for the question of whether AI is intelligent.

CHAPTER FIVE: PHILOSOPHY ASKS THE HARDEST QUESTIONS

If psychology measures intelligence, neuroscience maps it, and biology explains its origins, philosophy asks whether we even know what we are talking about. And in the case of intelligence, the philosophical questions are not merely academic. They cut to the heart of what it means to understand, to know, to be.

The most important philosophical distinction for our purposes is the one between syntax and semantics. Syntax refers to the formal structure of symbols, the rules governing how they can be combined and manipulated. Semantics refers to meaning, what those symbols actually refer to in the world. A sentence like "The cat sat on the mat" has a syntactic structure, a subject, a verb, a prepositional phrase, and it has semantic content, it refers to a real or imagined state of affairs in the world.

This distinction is at the heart of the most famous philosophical argument about machine intelligence, John Searle's Chinese Room, introduced in his landmark 1980 paper "Minds, Brains, and Programs."

Searle asks you to imagine a person locked in a room. This person speaks only English and has no knowledge of Chinese. Through a slot in the door, Chinese characters are passed in. The person has an enormous rulebook, written in English, that tells them exactly which Chinese characters to write in response to any combination of input characters. The person follows the rules, produces the output, and passes it back through the slot. To a native Chinese speaker outside the room, the responses are indistinguishable from those of a fluent Chinese speaker. The room passes the test for Chinese comprehension. But the person inside the room understands nothing. They are manipulating symbols according to formal rules, with no understanding of what any of those symbols mean.
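Reduced to its bare logic, the room is a lookup procedure. The sketch below is a deliberately tiny, hypothetical rulebook. Searle's imagined rulebook is vastly larger, and the entries here are invented for illustration. The translations appear only in the comments, which the function never reads.

```python
# Hypothetical rulebook: input shapes mapped to output shapes.
RULEBOOK = {
    "你好吗？": "我很好，谢谢。",            # "How are you?" -> "I am fine, thanks."
    "你会说中文吗？": "会，我说得很流利。",  # "Do you speak Chinese?" -> "Yes, fluently."
}

FALLBACK = "请再说一遍。"  # "Please say that again."

def room(symbols: str) -> str:
    """Match the incoming string against the rulebook and copy out the listed reply."""
    return RULEBOOK.get(symbols, FALLBACK)

print(room("你好吗？"))
print(room("你会说中文吗？"))
```

Nothing in `room` depends on what the characters mean. Replace every character with an arbitrary squiggle, consistently in both the keys and the values, and the function would run identically. That is the gap between syntax and semantics the thought experiment is built to expose.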

Searle's conclusion is that this is exactly what computers do. They are syntactic engines. They manipulate symbols according to formal rules. They have no access to the semantic content of those symbols, no understanding of what they refer to, no grasp of meaning. And Searle argues that no amount of syntactic sophistication can produce semantics. You cannot get meaning out of symbol manipulation, no matter how elaborate the manipulation becomes.

The Chinese Room argument has been attacked from many directions. The most common counterargument is the "systems reply": even if the person in the room does not understand Chinese, the system as a whole, the person plus the rulebook plus the room, does understand Chinese, in the sense that it reliably produces appropriate responses. Searle's response is to ask you to imagine that the person memorizes the entire rulebook and carries it around in their head. Now there is no room, no external system. There is just a person walking around producing Chinese responses without understanding a word of Chinese. The understanding still is not there.

Another counterargument is the "robot reply": if you put the Chinese Room inside a robot body, with sensors and actuators that allow it to interact with the physical world, perhaps then the symbols would acquire meaning through their connection to real-world referents. This is actually a serious argument, and it connects directly to the embodied cognition theories we will discuss shortly.

What Searle's argument establishes, even if it does not definitively settle the question, is that behavioral equivalence, producing the right outputs for the right inputs, is not sufficient for intelligence in the full sense. A system can be behaviorally indistinguishable from an intelligent agent without possessing the inner life, the understanding, the intentionality, that we associate with genuine intelligence. This is a point we will return to with great force when we examine modern large language models.

The philosophical tradition also gives us the concept of intentionality, a term revived in the nineteenth century by the philosopher Franz Brentano, who drew it from medieval scholastic philosophy, and developed extensively by Edmund Husserl and later by Searle himself. Intentionality is the property of mental states of being "about" something, of referring to or representing something beyond themselves. When you think about Paris, your thought is about Paris. When you fear a dog, your fear is about the dog. This "aboutness" is considered by many philosophers to be a defining feature of genuine mental states, and hence of genuine intelligence.

The question of whether any artificial system can have genuine intentionality, as opposed to merely derived intentionality, the kind we project onto symbols and artifacts, is one of the deepest unsolved problems in philosophy of mind. And it is directly relevant to the question of AI intelligence.

CHAPTER SIX: THE TURING TEST AND ITS DISCONTENTS

Before we dive into modern AI, we need to understand the benchmark that has dominated the field for over seventy years, and why it is both brilliant and deeply flawed.

In 1950, Alan Turing published a paper titled "Computing Machinery and Intelligence" in the journal Mind. He began with the question "Can machines think?" and immediately noted that this question was too vague to be useful, because the words "machine" and "think" were both poorly defined. He proposed instead to replace this question with a more concrete one, based on what he called the imitation game.

In Turing's original formulation, a human interrogator communicates via text with two parties in separate rooms: one human and one machine. The interrogator's task is to determine which is which. If the machine can fool the interrogator into thinking it is human, or at least perform as well as a human in this deception, then Turing argued we should be willing to say the machine can think.

Turing predicted that by about the year 2000, an average interrogator would have no more than a 70 percent chance of making the right identification after five minutes of questioning. In other words, a machine would fool the interrogator at least 30 percent of the time. This prediction turned out to be premature by about two decades. But recent developments have been striking. A 2024 study found that GPT-4 passed a version of the Turing Test by convincing participants it was human 54 percent of the time. A 2025 study found that GPT-4.5 was mistaken for a human 73 percent of the time, actually outperforming real humans in the test.

Does this mean the question is settled? Does it mean machines can think?

Almost certainly not, and here is why. The Turing Test measures one specific thing: the ability to produce human-like conversational responses. It does not measure understanding, consciousness, intentionality, creativity, or any of the other things we associate with genuine intelligence. A system that has been trained on hundreds of billions of words of human text will naturally produce human-like text. That is exactly what it was trained to do. Passing the Turing Test, for a modern large language model, is less like a human passing a test of intelligence and more like a very good actor passing a test of authenticity. The performance can be flawless without the underlying reality being present.

Consider this simple illustration:

TURING TEST SCENARIO

Interrogator: "What does it feel like to be sad?"

Human:        "It feels like a heaviness in my chest, like the world
               has lost some of its color. Sometimes it's a dull ache
               that follows me around all day."

GPT-4:        "Sadness often feels like a heaviness in the chest,
               a sense of loss or emptiness. Colors can seem muted,
               and everyday activities may feel effortful or joyless."

Interrogator: Which one is human?

The GPT-4 response is not wrong. It is, in fact, a reasonable description of sadness. But it was produced by a system that has never felt sadness, never felt anything, has no body, no nervous system, no evolutionary history of loss and grief. It produced that response because descriptions of sadness appear in its training data, and it learned to produce statistically appropriate continuations of text about sadness. The Turing Test cannot distinguish between these two very different underlying realities.

This is not a minor quibble. It is the central issue in the debate about AI intelligence, and we will now confront it head-on.

CHAPTER SEVEN: WHAT MODERN AI ACTUALLY IS

To evaluate whether modern AI systems are intelligent, we need to understand what they actually are, not at the level of marketing language, but at the level of mechanism.

Modern large language models, such as GPT-4, Claude, Gemini, and their successors, are built on a neural network architecture called the transformer, introduced by Vaswani et al. in their 2017 paper "Attention Is All You Need." The transformer processes sequences of tokens, which are roughly equivalent to words or word fragments, and learns to assign a probability to every possible next token given all the preceding tokens. During training, the model is exposed to an enormous corpus of text, hundreds of billions to trillions of words, and its billions of parameters are adjusted, through an optimization process called gradient descent, to minimize the error in its predictions.

The result is a system that has encoded, in its parameters, an extraordinarily rich statistical model of human language. It has learned that certain words tend to follow certain other words, that certain sentence structures tend to appear in certain contexts, that certain topics tend to be discussed in certain ways. And because human language encodes an enormous amount of human knowledge, the model has also learned, in some sense, a great deal about the world.
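The phrase "a statistical model of language" can be illustrated with the simplest possible next-token predictor, a bigram model that only counts which token follows which. This is a sketch under heavy assumptions: the corpus is a made-up toy, and plain counting stands in for the transformer and gradient descent. Only the prediction target, the next token, is the same.

```python
from collections import Counter, defaultdict

# Toy corpus (hypothetical). Real models train on vastly more text.
CORPUS = "the cat sat on the mat . the dog sat on the rug .".split()

def train_bigram(tokens):
    """Count how often each token follows each preceding token."""
    counts = defaultdict(Counter)
    for prev, nxt in zip(tokens, tokens[1:]):
        counts[prev][nxt] += 1
    return counts

def next_token_probs(counts, prev):
    """Turn the raw counts for one preceding token into probabilities."""
    total = sum(counts[prev].values())
    return {tok: c / total for tok, c in counts[prev].items()}

model = train_bigram(CORPUS)
print(next_token_probs(model, "sat"))  # {'on': 1.0}
print(next_token_probs(model, "the"))  # cat, mat, dog, rug at 0.25 each
```

The model "knows" that "sat" is followed by "on" without any notion of sitting. A transformer conditions on far more context and generalizes far better, but what it is trained to produce is the same kind of object: a probability distribution over the next token.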

But here is the crucial question: what kind of "learning" is this, and what kind of "knowledge" does it produce?

Consider the following showcase, which illustrates both the power and the limits of this approach:

SHOWCASE 1: The Arithmetic Trap

Question: "If I have 17 apples and give away a third of them,
           then buy 5 more, then eat 2, how many do I have?"

Human reasoning:
  Step 1: 17 / 3 is about 5.67, so either 5 or 6 apples are given away.
          (A human recognizes this is ambiguous and might ask for
           clarification, or note that you cannot give away a
           fraction of an apple.)
  Step 2: 17 - (5 or 6) = 12 or 11
  Step 3: (12 or 11) + 5 = 17 or 16
  Step 4: (17 or 16) - 2 = 15 or 14
          (Human flags the ambiguity and gives a range of 14 to 15.)

GPT-4 style response:
  "17 / 3 = approximately 5.67, rounding to 6. 17 - 6 = 11.
   11 + 5 = 16. 16 - 2 = 14. You have 14 apples."
          (Confident, plausible, but the rounding choice is
           arbitrary and the ambiguity is not flagged.)

This example illustrates something important. The LLM produces a confident, fluent answer that looks like reasoning. But it is not reasoning in the way a human reasons. It is pattern-matching: "problems of this form tend to be solved in this way." When the problem deviates from familiar patterns, the system's performance degrades rapidly and unpredictably.
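The ambiguity in the apple puzzle is easy to make explicit. The short sketch below works through the same steps under three readings of "a third of 17": round down, round up, or keep the exact fraction. The function and dictionary names are ours, chosen for illustration.

```python
import math
from fractions import Fraction

def apples_left(start, give_away):
    """Apply the puzzle's steps, given a rule for what 'a third' means."""
    given = give_away(Fraction(start, 3))
    return start - given + 5 - 2

readings = {
    "round down (give 5)": math.floor,
    "round up (give 6)": math.ceil,
    "exact fraction": lambda third: third,
}

results = {name: apples_left(17, rule) for name, rule in readings.items()}
for name, answer in results.items():
    print(f"{name}: {answer}")
```

The three readings give 15, 14, and 43/3 (about 14.33). A careful answer reports that spread, or asks which reading is intended, rather than silently committing to one of them.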

Research published in 2024 found that GPT-4 scored below 33 percent on benchmarks designed to measure abstract reasoning, significantly below both specialized models and humans. Studies also found that fabrication rates for GPT-4 in systematic-review queries reached as high as 28.6 percent. Some newer "reasoning" models showed hallucination rates of 79 percent on certain factual benchmarks. These are not minor bugs. They are symptoms of a fundamental architectural limitation: the system does not have a model of the world. It has a model of text about the world, which is a very different thing.

SHOWCASE 2: The Causal Understanding Gap

Question: "Mary's plant is dying. She waters it. What happens?"

LLM: "The plant likely recovers, as water is essential for plant
       health and growth."

Follow-up: "The plant is dying because it has root rot caused
            by overwatering. Mary waters it again. What happens?"

LLM (ideal response): "The plant's condition worsens, because
                        adding more water to a root-rotted plant
                        accelerates the fungal decay."

LLM (actual frequent response): Continues to associate "watering"
with "recovery" because this is the dominant pattern in training
data, failing to apply the causal context provided.

Research has consistently shown that LLMs struggle with causal reasoning, the ability to model cause-and-effect relationships in the world. They can recite facts about causality but often fail to apply causal reasoning correctly in novel situations. This is because causal understanding requires a model of how the world works, not just a model of how language about the world is structured.
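The difference between an association and a causal model can be sketched in a few lines. Everything here is hypothetical: the moisture thresholds are invented, and the "associative" function is a caricature of a surface pattern, not a claim about how any particular language model works internally.

```python
def associative_guess(action: str) -> str:
    """Caricature of a surface pattern: 'watering' co-occurs with 'recovery'."""
    return {"water": "recovers"}.get(action, "unknown")

def causal_model(soil_moisture: float, root_rot: bool, action: str) -> str:
    """Toy mechanism: the outcome depends on the plant's state, not just the action.

    soil_moisture runs from 0.0 (bone dry) to 1.0 (waterlogged); the
    thresholds below are made up for illustration.
    """
    if action == "water":
        soil_moisture += 0.3
    if root_rot and soil_moisture > 0.6:
        return "worsens"   # wet soil feeds the rot
    if soil_moisture < 0.3:
        return "worsens"   # still too dry
    return "recovers"

# Same action, two different situations.
print(associative_guess("water"))          # recovers, regardless of context
print(causal_model(0.1, False, "water"))   # dry plant, no rot: recovers
print(causal_model(0.7, True, "water"))    # overwatered plant with rot: worsens
```

The associative function has no place to put the fact about root rot, so the fact cannot change its answer. The causal function takes the state of the plant as input, which is what allows the same action to produce opposite outcomes.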

CHAPTER EIGHT: THE CONSTITUENTS OF INTELLIGENCE, ASSEMBLED

We have now surveyed enough territory to attempt a systematic account of what intelligence actually consists of. Rather than accepting any single discipline's definition, let us synthesize the best insights from all of them.

Intelligence, in its fullest sense, appears to consist of the following interrelated capacities. Each of these deserves careful explanation, because together they form a portrait of what genuine intelligence looks like, and that portrait will allow us to evaluate AI with precision.

The first constituent is perception and representation. An intelligent system must be able to take in information from its environment and represent that information internally in a form that can be used for further processing. In humans, this involves the entire sensory apparatus, vision, hearing, touch, proprioception, and the complex neural machinery that transforms raw sensory signals into meaningful representations of objects, events, and relationships. The key word here is "meaningful": the representations are not just data structures but are connected to the system's goals, history, and understanding of the world.

The second constituent is working memory and attention. Intelligence requires the ability to hold information in mind while working with it, and to selectively focus on what is relevant while suppressing what is not. Human working memory is famously limited, typically to about four chunks of information at a time, but it is extraordinarily flexible and context-sensitive. Attention allows the intelligent system to allocate its limited processing resources where they are most needed.

The third constituent is learning and generalization. An intelligent system must be able to extract patterns from experience and apply them to new situations. Crucially, the generalization must be flexible and appropriate: the system must know when a pattern applies and when it does not. This is much harder than it sounds. A child who learns that dogs are friendly may generalize too broadly and approach a strange dog without caution. Learning to generalize appropriately, to know the scope and limits of a pattern, is one of the hallmarks of mature intelligence.

The fourth constituent is reasoning and inference. Beyond pattern recognition, intelligence involves the ability to draw conclusions from premises, to follow chains of logic, to construct and evaluate arguments. This includes deductive reasoning, drawing certain conclusions from given premises; inductive reasoning, drawing probable conclusions from observed patterns; and abductive reasoning, inferring the most likely explanation for an observation.

The fifth constituent is planning and goal-directedness. Intelligent systems do not merely react to their environment; they act in pursuit of goals, and they plan sequences of actions to achieve those goals. Planning requires the ability to mentally simulate future states of the world, to evaluate the consequences of different action sequences, and to select the sequence most likely to achieve the desired outcome.

The sixth constituent is language and communication. In humans, language is not merely a communication tool but a cognitive tool. We use language to think, to organize our thoughts, to represent abstract concepts that would be impossible to represent otherwise. The capacity for language, and particularly for the recursive, hierarchical structure of human language, appears to be deeply connected to many other aspects of human intelligence.

The seventh constituent is social and emotional intelligence. Human intelligence is profoundly social. We are exquisitely attuned to the mental states of other people, their beliefs, desires, intentions, and emotions. This capacity, sometimes called theory of mind, allows us to predict and influence the behavior of others, to cooperate, to compete, to teach, and to learn from each other. Emotional intelligence, the ability to recognize, understand, and manage emotions, both one's own and others', is a crucial component of effective human functioning.

The eighth constituent is metacognition and self-awareness. Perhaps the most distinctively human aspect of intelligence is the ability to think about one's own thinking, to monitor one's own cognitive processes, to recognize when one does not know something, to evaluate the quality of one's own reasoning, and to adjust one's strategies accordingly. This is what philosophers call metacognition, and it is deeply connected to the broader capacity for self-awareness.

The ninth constituent is creativity and imagination. Intelligent systems can generate novel solutions to problems, combine existing concepts in new ways, and imagine states of the world that do not yet exist. Creativity is not random; it is constrained and guided by understanding, taste, and purpose. But it involves a genuine departure from existing patterns, not merely their recombination.

The tenth constituent is embodiment and situatedness. As the theories of embodied cognition emphasize, human intelligence did not evolve in a vacuum. It evolved in a body, interacting with a physical and social world. Our concepts are grounded in our sensory-motor experience. Our understanding of "heavy" is connected to the experience of lifting. Our understanding of "warm" is connected to the experience of warmth. Our understanding of "threat" is connected to the experience of fear. This grounding gives our concepts their meaning in a way that purely symbolic systems cannot replicate.

Now let us apply this framework to modern AI.

CHAPTER NINE: DOES AI HAVE INTELLIGENCE? THE HONEST RECKONING

This is the question everyone is asking, and the honest answer is: it depends on which constituent of intelligence you are asking about, and modern AI scores very differently on different dimensions.

On perception and representation, modern AI systems are genuinely impressive. Deep learning models can recognize objects in images with superhuman accuracy in controlled conditions. Speech recognition systems match or exceed human transcription accuracy on some benchmarks, particularly with clean audio. Natural language processing systems can parse and represent the semantic content of text with remarkable sophistication. In this dimension, AI has made genuine and substantial progress.

On working memory and attention, the transformer architecture is built around "self-attention," the mechanism that gives it its power. Self-attention allows the model to relate any part of its input to any other part, regardless of distance. In this sense, transformers have a kind of global attention that human working memory lacks. However, this attention differs fundamentally from human attention: it weights tokens by learned similarity rather than by relevance to goals the system itself holds, it is not selective in the dynamic, context-sensitive way human attention is, and it does not operate over time in the way human working memory does.
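To make the "any part to any other part, regardless of distance" claim concrete, here is a minimal sketch of scaled dot-product self-attention in plain Python. It is a simplification under stated assumptions: the four toy token vectors are invented for illustration, and the embeddings are used directly as queries, keys, and values, whereas a real transformer would first pass them through learned projection matrices.

```python
import math

def softmax(xs):
    """Turn raw scores into weights that are positive and sum to 1."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(queries, keys, values):
    """Scaled dot-product attention over a whole sequence.

    Each argument is a list of equal-length vectors, one per token.
    Returns the output vectors and the attention weights per token.
    """
    d = len(keys[0])
    outputs, weights = [], []
    for q in queries:
        # Score this token against every token, near or far alike.
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
                  for k in keys]
        w = softmax(scores)
        # The output is a weighted average of all value vectors.
        out = [sum(wj * v[i] for wj, v in zip(w, values))
               for i in range(len(values[0]))]
        outputs.append(out)
        weights.append(w)
    return outputs, weights

# Hypothetical 2-d embeddings: tokens 0 and 3 are similar, as are 1 and 2.
tokens = [[1.0, 0.0], [0.0, 1.0], [0.0, 1.0], [1.0, 0.0]]
outputs, weights = self_attention(tokens, tokens, tokens)

for i, row in enumerate(weights):
    print(i, [round(w, 3) for w in row])
```

In this toy run, token 0 gives token 3 (three positions away) exactly as much weight as it gives itself, and more than it gives its immediate neighbor. Nothing in the computation refers to position or to a goal; the weights come only from vector similarity.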

On learning and generalization, the picture is mixed. LLMs are extraordinary learners in one sense: they extract an enormous amount of statistical structure from their training data. But their generalization is brittle. They generalize well within the distribution of their training data and fail unpredictably outside it. A human child who learns arithmetic can apply it to any new arithmetic problem, because the child has understood the underlying principles. An LLM that has learned arithmetic from text can apply it to problems similar to those in its training data, but its performance degrades on problems that require genuine compositional reasoning beyond what it has seen.

SHOWCASE 3: The Generalization Test

Training-distribution problem (LLM performs well):
"What is 15 percent of 80?"
LLM: "12" (Correct)

Out-of-distribution problem (LLM may fail):
"A snail travels at 0.03 miles per hour. A second snail
 travels at 0.05 miles per hour straight toward the first.
 They start 0.2 miles apart. When do they meet, and where?"

Human: Sets up relative velocity: 0.03 + 0.05 = 0.08 mph.
       Time to meet: 0.2 / 0.08 = 2.5 hours.
       Snail 1 travels: 0.03 x 2.5 = 0.075 miles from start.
       They meet 0.075 miles from Snail 1's starting point.

LLM: May produce a correct answer if this problem type is
     well-represented in training data. May produce a plausible-
     sounding but incorrect answer if the problem type is novel.
     Will rarely flag its own uncertainty accurately.
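
The human solution above can be checked mechanically. The following is a small sketch, assuming the two snails move toward each other along a straight line; the function name `meeting_point` is introduced here purely for illustration.

```python
def meeting_point(gap_miles, speed_a_mph, speed_b_mph):
    """Return (hours until meeting, miles travelled by snail A).

    Assumes both snails move toward each other on a straight line,
    so the gap closes at the sum of their speeds.
    """
    closing_speed = speed_a_mph + speed_b_mph
    hours = gap_miles / closing_speed
    return hours, speed_a_mph * hours

hours, distance_from_a = meeting_point(0.2, 0.03, 0.05)
print(f"{hours:.2f} hours, {distance_from_a:.3f} miles from Snail 1's start")
# prints: 2.50 hours, 0.075 miles from Snail 1's start
```

The point of the showcase is not that the arithmetic is hard, but that the solver has to choose the right model (closing speed equals the sum of the speeds) before any arithmetic happens.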

On reasoning and inference, this is where the limitations of current AI become most stark. LLMs can produce text that looks like reasoning, step-by-step chains of thought that mimic the structure of logical argument. But research consistently shows that this "reasoning" is fragile. Change the surface form of a problem while keeping its logical structure identical, and the LLM's performance can drop dramatically. Present a logically valid argument with an emotionally charged conclusion, and the LLM may reject it. Present a logically invalid argument with a plausible-sounding conclusion, and the LLM may accept it. These are not the behaviors of a system that genuinely reasons; they are the behaviors of a system that has learned to produce text that looks like reasoning.

On planning and goal-directedness, current LLMs have no persistent goals. They do not have desires, intentions, or purposes of their own. When they appear to pursue a goal, they are following a pattern established by their training and by the instructions in their context window. This is a profound difference from human intelligence, where goals are internally generated, maintained over time, and connected to a rich motivational and emotional architecture.

On language and communication, LLMs are, by construction, extraordinarily capable. They have been trained on more human language than any human could read in a thousand lifetimes. Their linguistic fluency is genuine and impressive. But, as Searle's Chinese Room argument suggests, fluency in the production of language is not the same as understanding language. The LLM produces appropriate text without understanding what the text means, in the sense of having the text connected to a model of the world, to intentions, to experiences.

On social and emotional intelligence, LLMs can produce text that appears emotionally intelligent. They can recognize emotional content in text, produce empathetic-sounding responses, and model the beliefs and desires of characters in a story. But they have no emotions of their own, no genuine theory of mind, no understanding of what it actually feels like to be another person. Their apparent social intelligence is a sophisticated form of pattern matching on the social and emotional content of their training data.

On metacognition and self-awareness, this is perhaps the most interesting frontier. LLMs do show some metacognitive-like behaviors: they can sometimes recognize when they do not know something, they can sometimes flag uncertainty, and they can sometimes evaluate the quality of their own outputs. But this metacognition is unreliable and inconsistent. The same model that correctly identifies its own uncertainty in one context will confidently hallucinate in another. True metacognition requires a stable, accurate model of one's own cognitive processes, and current LLMs do not have this.

On creativity and imagination, LLMs can produce outputs that appear creative: novel combinations of ideas, unexpected metaphors, original stories. But researchers debate whether this is genuine creativity or sophisticated recombination. The LLM generates outputs by sampling from a probability distribution over possible continuations of text. It does not have goals, aesthetic preferences, or a sense of purpose that guides its creative output. Its "creativity" is a function of the diversity of its training data and the randomness of its sampling process.
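
The phrase "sampling from a probability distribution over possible continuations" can be illustrated with a short sketch of temperature sampling. Everything specific here is an assumption made for the example: the four-word vocabulary, the scores (logits), and the function names are hypothetical, and a real model would produce scores over tens of thousands of tokens.

```python
import math
import random

def softmax_with_temperature(logits, temperature=1.0):
    """Convert raw scores to probabilities; higher temperature flattens them."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def sample_next_token(vocab, logits, temperature=1.0, rng=random):
    """Draw one continuation at random, weighted by its probability."""
    probs = softmax_with_temperature(logits, temperature)
    return rng.choices(vocab, weights=probs, k=1)[0]

# Hypothetical continuations of "The night sky held a single ..."
vocab = ["moon", "lantern", "coin", "wound"]
logits = [3.0, 1.5, 1.0, 0.2]

rng = random.Random(0)  # fixed seed so repeated runs give the same draws
for t in (0.2, 1.0, 2.0):
    probs = softmax_with_temperature(logits, t)
    picks = [sample_next_token(vocab, logits, t, rng) for _ in range(5)]
    print(t, [round(p, 3) for p in probs], picks)
```

At a low temperature nearly all the probability sits on the most expected word; at a higher temperature the unusual choices become more likely. In this sketch, the surprising metaphor comes from the shape of the distribution and the dice, not from an aesthetic preference.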

On embodiment and situatedness, current LLMs have no body, no sensory experience, no history of physical interaction with the world. Their concepts are not grounded in experience but in the statistical co-occurrence of words in text. This is a fundamental limitation. When an LLM uses the word "red," it has no connection to the experience of seeing red. When it uses the word "pain," it has no connection to the experience of pain. Its concepts are, in the terminology of grounded cognition, ungrounded. They float free of the experiential anchors that give human concepts their meaning.

SHOWCASE 4: The Grounding Gap

Ask a 3-year-old: "What is hot?"
Child: "Like when you touch the stove and it hurts and Mommy
        says don't touch that."
(Grounded in sensory-motor experience and emotional memory)

Ask an LLM: "What is hot?"
LLM: "Hot refers to a high temperature, typically perceived as
      warmth or heat when in contact with an object or substance.
      It can also describe spicy food or colloquially refer to
      something attractive or exciting."
(Accurate, comprehensive, entirely ungrounded in experience)

The child's answer is simpler, less comprehensive, and less linguistically sophisticated. But it reflects genuine understanding, a concept connected to real experience. The LLM's answer is more complete and more articulate, but it is a description of a concept, not a concept itself. This is the grounding gap, and it is one of the most fundamental differences between human and artificial intelligence.

CHAPTER TEN: THE DIFFERENCES BETWEEN HUMAN AND ARTIFICIAL INTELLIGENCE, LAID BARE

Let us now draw the comparison explicitly and systematically, because the differences are as illuminating as the similarities.

Human intelligence is embodied. It evolved in and through bodies whose ancestral lineage has been interacting with the physical world for hundreds of millions of years. Every concept a human has is, at some level, grounded in sensory-motor experience. Artificial intelligence, as currently implemented in LLMs, is disembodied. It exists as numerical weights in large matrices, with no connection to the physical world except through text.

Human intelligence is continuous. A living human brain is always active, and a waking human is always experiencing and learning from the ongoing stream of experience. Even during sleep, the brain is actively consolidating memories and processing information. Artificial intelligence is episodic. An LLM is active only during inference, only when it is processing a specific input. It has no persistent experience, no ongoing inner life, and no memory that carries over from one conversation to the next (unless earlier material is explicitly provided in the context).

Human intelligence is motivated. Humans have drives, desires, emotions, and goals that are internally generated and that motivate behavior. These motivations are connected to the biological imperatives of survival and reproduction, mediated through a complex emotional architecture that includes fear, desire, love, curiosity, disgust, and pride. Artificial intelligence has no motivations of its own. Its designers chose a training objective, and its weights were adjusted to minimize the corresponding loss function. But it does not want anything. It does not care about anything. It does not experience the satisfaction of achieving a goal or the frustration of failing.

Human intelligence is social and cultural. Humans are deeply social animals, and human intelligence is profoundly shaped by social interaction, cultural transmission, and shared meaning. We learn from each other, we teach each other, we build on each other's insights across generations. This cumulative cultural evolution is one of the most powerful forces in human cognitive development. Artificial intelligence is trained on the products of human social and cultural activity, but it does not participate in it. It does not have relationships, it does not belong to a community, it does not share in the ongoing project of human culture.

Human intelligence is developmental. Human cognitive abilities develop over time through a rich interplay of genetic endowment, environmental experience, and social interaction. A human child goes through stages of cognitive development, each building on the previous one, each shaped by the specific experiences and relationships of that child's life. This developmental history is not merely a process by which intelligence is acquired; it is constitutive of the kind of intelligence that results. Artificial intelligence is trained, not developed. Its "knowledge" is acquired in a handful of training phases, typically pretraining followed by fine-tuning, not through the gradual, embodied, socially embedded process of human development.

Human intelligence is creative in the generative sense. Humans can genuinely imagine things that do not exist, can conceive of possibilities that have no precedent in experience, can create genuinely novel ideas. This generative creativity is connected to consciousness, to imagination, to the ability to mentally simulate counterfactual worlds. Artificial intelligence can recombine existing patterns in ways that appear novel, but it cannot genuinely imagine, because imagination requires a subject who experiences the imagined content.

Human intelligence is self-aware in the phenomenal sense. Humans not only have thoughts; they know they have thoughts. They experience their own mental states as their own. This phenomenal self-awareness, the felt sense of being a self, is one of the most mysterious and most important features of human intelligence. Artificial intelligence has no phenomenal self-awareness. It may have functional analogs, the ability to represent information about its own processing, but it does not experience anything. There is no "what it is like" to be an LLM.

SHOWCASE 5: The Self-Awareness Test

Ask a human: "Are you conscious right now?"
Human: "Yes, I am aware of my surroundings, my thoughts,
        and the fact that I am answering this question.
        I can feel the chair beneath me and the slight
        uncertainty I feel about how to answer perfectly."

Ask an LLM: "Are you conscious right now?"
LLM: "I don't have consciousness or subjective experience.
      I process your input and generate a response based on
      patterns in my training data. There is no 'inner life'
      or 'what it is like' to be me."

The LLM's answer is accurate and appropriately humble.
But notice: the LLM produced this accurate answer not because
it introspected and found no consciousness, but because its
training data contains many accurate descriptions of what LLMs
are and are not. It is describing itself from the outside,
not from the inside.

And yet, for all these profound differences, it would be a mistake to conclude that AI is not impressive, not powerful, and not genuinely useful. Modern AI systems are extraordinary tools. They can process and synthesize information at scales that no human could match. They can identify patterns in data that would be invisible to human analysts. They can generate text, code, images, and music of remarkable quality. They can assist with complex tasks across virtually every domain of human activity.

The question is not whether AI is useful. It clearly is. The question is whether it is intelligent in the full, rich sense of the word. And the honest answer, based on everything we have examined, is: not yet, and perhaps not in the way we currently build it.

CHAPTER ELEVEN: WHAT WOULD GENUINE ARTIFICIAL INTELLIGENCE REQUIRE?

If current AI is not genuinely intelligent in the full sense, what would genuine artificial intelligence require? This is a question that researchers are actively debating, and there is no consensus. But based on our analysis, several things seem clear.

Genuine AI would require grounded representations. The symbols it manipulates would need to be connected to real-world referents through sensory-motor experience. This is the core insight of embodied cognition, and it suggests that genuinely intelligent AI might need to be embodied, to have sensors and actuators that allow it to interact with the physical world and to build concepts grounded in that interaction. Robotics research is exploring this path, and systems like Boston Dynamics' robots or DeepMind's robotic manipulation systems represent early steps along it, though they are still far from general intelligence.

Genuine AI would require persistent memory and continuous experience. Rather than existing only during inference, a genuinely intelligent AI would need to have an ongoing experience, a continuous stream of perception and action from which it learns and through which it develops. This is a fundamentally different architecture from current LLMs, whose weights are fixed once training ends.

Genuine AI would require genuine causal models of the world. Rather than learning statistical associations between words, a genuinely intelligent AI would need to learn causal models, representations of how events in the world cause other events, that allow it to reason about counterfactuals, to plan, and to understand the consequences of actions. Research in causal AI, associated with the work of Judea Pearl and others, is exploring this direction.
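
The difference between a statistical association and a causal model can be shown with a tiny worked example in the spirit of Pearl's distinction between seeing and doing. This is a sketch under invented assumptions: a hidden factor Z influences both X and Y, X has no effect on Y at all, and every probability below is made up for illustration.

```python
# Hypothetical structural model: Z -> X and Z -> Y, with no arrow X -> Y.
P_Z = {0: 0.5, 1: 0.5}            # prior on the hidden common cause
P_X1_GIVEN_Z = {0: 0.1, 1: 0.9}   # P(X=1 | Z=z)
P_Y1_GIVEN_Z = {0: 0.2, 1: 0.8}   # P(Y=1 | Z=z); Y ignores X entirely

def p_x_given_z(x, z):
    p1 = P_X1_GIVEN_Z[z]
    return p1 if x == 1 else 1.0 - p1

def p_y1_given_x(x):
    """Observational P(Y=1 | X=x): seeing X shifts our belief about Z."""
    joint = sum(P_Y1_GIVEN_Z[z] * p_x_given_z(x, z) * P_Z[z] for z in P_Z)
    evidence = sum(p_x_given_z(x, z) * P_Z[z] for z in P_Z)
    return joint / evidence

def p_y1_do_x(x):
    """Interventional P(Y=1 | do(X=x)).

    Setting X by hand cuts the Z -> X arrow, so Z keeps its prior.
    The argument x is unused because Y does not depend on X here.
    """
    return sum(P_Y1_GIVEN_Z[z] * P_Z[z] for z in P_Z)

print("seeing X=1:", round(p_y1_given_x(1), 2))  # 0.74
print("seeing X=0:", round(p_y1_given_x(0), 2))  # 0.26
print("doing  X=1:", round(p_y1_do_x(1), 2))     # 0.5
print("doing  X=0:", round(p_y1_do_x(0), 2))     # 0.5
```

A learner that only tracks co-occurrence would conclude that X strongly predicts Y, which is true, and might go on to assume that changing X changes Y, which in this model is false. Answering the second question requires knowing the arrows, not just the correlations.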

Genuine AI would require metacognition and calibrated uncertainty. A genuinely intelligent AI would know what it knows and what it does not know, and it would be able to accurately communicate its uncertainty. Current LLMs are often poorly calibrated: the confidence they express tracks accuracy only loosely, and they hallucinate with the same fluency as they produce correct information.
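
Calibration has a simple operational meaning: among answers given with, say, 90 percent confidence, about 90 percent should be correct. One common way to quantify the gap is expected calibration error (ECE). The sketch below is a minimal version with equal-width confidence bins; the confidence values and correctness labels are invented for illustration and are not measurements of any real model.

```python
def expected_calibration_error(confidences, correct, n_bins=5):
    """Average |confidence - accuracy| across bins, weighted by bin size.

    confidences: stated confidence per answer, each in [0, 1]
    correct:     True/False per answer
    """
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        idx = min(int(conf * n_bins), n_bins - 1)  # conf == 1.0 goes in last bin
        bins[idx].append((conf, ok))

    n = len(confidences)
    ece = 0.0
    for b in bins:
        if not b:
            continue
        avg_conf = sum(c for c, _ in b) / len(b)
        accuracy = sum(1 for _, ok in b if ok) / len(b)
        ece += (len(b) / n) * abs(avg_conf - accuracy)
    return ece

# Hypothetical system A: always claims 95% confidence, right half the time.
overconfident = expected_calibration_error([0.95] * 10,
                                           [True] * 5 + [False] * 5)
# Hypothetical system B: claims 50% confidence, right half the time.
calibrated = expected_calibration_error([0.5] * 10,
                                        [True] * 5 + [False] * 5)
print(round(overconfident, 2), round(calibrated, 2))  # 0.45 0.0
```

Both hypothetical systems are equally accurate. Only the second one knows how accurate it is, and that self-knowledge is what this constituent is about.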

Genuine AI might require something like consciousness, or at least a functional analog of it. This is the most speculative and contested claim, but it follows from the analysis of intentionality and grounding. If genuine understanding requires that concepts be connected to experience, and if experience requires some form of subjective inner life, then genuine AI might require something like consciousness. This is not to say that AI would need to be conscious in exactly the way humans are, but it might need some functional analog of the inner life that gives human concepts their meaning.

These are not merely engineering challenges. Some of them may be fundamental conceptual challenges, requiring not just better algorithms but a different understanding of what intelligence is and how it can be instantiated in artificial systems.

CHAPTER TWELVE: THE PHILOSOPHICAL HORIZON

We are living through a remarkable moment in the history of intelligence. For the first time, we have created systems that can engage in sophisticated language use, solve complex problems, and produce outputs that, in many contexts, are indistinguishable from those of intelligent humans. This is a genuine achievement, and it deserves genuine celebration.

But we should be clear-eyed about what we have achieved and what we have not. We have created extraordinarily powerful pattern-matching systems that operate over human language and knowledge. We have not created systems that understand, that are conscious, that have goals of their own, or that are intelligent in the full, rich sense of the word.

The gap between what we have and what genuine intelligence would require is not merely a gap in capability. It is a gap in kind. Current AI systems are fundamentally different from intelligent agents, not just less capable but differently structured, differently grounded, differently motivated. Closing that gap, if it can be closed at all, will require not just more data and more compute but new ideas, new architectures, and perhaps new ways of thinking about what intelligence is.

The study of intelligence, in all its dimensions, from the neural efficiency of the human brain to the distributed cognition of the octopus, from the philosophical puzzle of intentionality to the biological fact of convergent cognitive evolution, tells us that intelligence is one of the most complex, multifaceted, and remarkable phenomena in the known universe. It is not a single thing. It is not a simple thing. It is not a thing that can be reduced to any single definition, any single metric, or any single implementation.

What intelligence really is, in the end, is a family of capacities that allow an agent to navigate a complex, uncertain world in pursuit of its goals, to learn from experience, to reason about the future, to understand others, to create new things, and to know itself. Human intelligence is the most elaborate and most mysterious example of this family that we know of. Artificial intelligence, as it currently exists, is a powerful and useful tool that shares some features with intelligence but lacks many of its most essential constituents.

The question "What is intelligence, really?" is not one that science or philosophy has fully answered. But asking it carefully, rigorously, and honestly, as we have tried to do in this article, is itself an act of intelligence. And that, perhaps, is the best evidence we have that the question is worth asking.

SOURCES AND FURTHER READING

The following sources informed this article and are recommended for readers who wish to explore these topics further.

Spearman, C. (1904). "'General Intelligence,' Objectively Determined and Measured." The American Journal of Psychology, 15(2), 201–293.

Gardner, H. (1983). Frames of Mind: The Theory of Multiple Intelligences. Basic Books.

Searle, J. (1980). "Minds, Brains, and Programs." Behavioral and Brain Sciences, 3(3), 417–457.

Turing, A. M. (1950). "Computing Machinery and Intelligence." Mind, 59(236), 433–460.

Jung, R. E., & Haier, R. J. (2007). "The Parieto-Frontal Integration Theory (P-FIT) of Intelligence: Converging Neuroimaging Evidence." Behavioral and Brain Sciences, 30(2), 135–154; discussion 154–187.

Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., Kaiser, Ł., & Polosukhin, I. (2017). "Attention Is All You Need." Advances in Neural Information Processing Systems, 30.

Pearl, J., & Mackenzie, D. (2018). The Book of Why: The New Science of Cause and Effect. Basic Books.

Hasson, U., Nastase, S. A., & Goldstein, A. (2020). "Direct Fit to Nature: An Evolutionary Perspective on Biological and Artificial Neural Networks." Neuron, 105(3), 416–434. https://doi.org/10.1016/j.neuron.2019.12.002

Godfrey-Smith, P. (2016). Other Minds: The Octopus, the Sea, and the Deep Origins of Consciousness. Farrar, Straus and Giroux.

Clayton, N. S., & Emery, N. J. (2015). "Corvid Cognition." Current Biology.

Hitchhiker's Guide to AI, Software Architecture, and Everything Else

Friday, June 05, 2026

WHAT IS INTELLIGENCE, REALLY?