Friday, June 05, 2026

WHAT IS INTELLIGENCE, REALLY?

A Deep and Unsparing Investigation Across Neuroscience, Biology, Philosophy, Psychology, and Artificial Intelligence


CHAPTER ONE: THE QUESTION NOBODY CAN FULLY ANSWER

There is a peculiar irony at the heart of intelligence research. The very faculty we use to study intelligence is intelligence itself. We are, in a sense, a brain trying to understand what a brain is. This is not merely a philosophical quip. It is a genuine methodological problem that has haunted scientists, philosophers, and engineers for well over a century, and it explains why, after so much research, so many brilliant minds, and so many competing theories, we still cannot agree on a single, universally accepted definition of what intelligence actually is.

Ask a neuroscientist and she will point to the prefrontal cortex, to white matter connectivity, to the efficiency of neural firing patterns. Ask a psychologist and he will cite IQ scores, factor analysis, and the statistical ghost known as the "g factor." Ask a philosopher and you will receive a question in return, probably something about whether a thermostat that regulates temperature is, in some minimal sense, intelligent. Ask a computer scientist and she will show you a benchmark score, a leaderboard, a model that just passed the bar exam. Ask a biologist studying crows or octopuses and he will challenge every assumption you brought into the room.

The word "intelligence" comes from the Latin "intelligentia," derived from "inter" (between) and "legere" (to choose or read). At its etymological root, intelligence means something like "the capacity to choose between things," to read a situation and select an appropriate response. That is a surprisingly good starting point, and we will return to it. But as we will see, the full picture is vastly more complex, more beautiful, and more contested than any single definition can capture.

This article takes you on a journey through all of those domains. We will examine what intelligence means in each field, where those definitions succeed and where they fail, and then we will confront the most urgent version of the question in our current technological moment: does artificial intelligence, as it exists today, actually possess intelligence? Or is it something else entirely, something that merely wears intelligence as a costume?

CHAPTER TWO: HOW PSYCHOLOGY DEFINED INTELLIGENCE, AND WHY IT STARTED A WAR

The scientific study of intelligence began in earnest in the late nineteenth and early twentieth centuries, and it began with a fight that has never really ended.

Charles Spearman, a British psychologist working in the early 1900s, made an observation that seemed almost too clean to be true. When he looked at how people performed across a wide variety of mental tests, he noticed that those who did well on one test tended to do well on all the others. A person who excelled at verbal analogies also tended to excel at arithmetic reasoning, at spatial rotation tasks, at memory recall. Conversely, those who struggled in one domain tended to struggle across the board. This correlation was not perfect, but it was consistent and statistically robust.

Spearman used a mathematical technique he helped pioneer, called factor analysis, to extract what he believed was the hidden common cause behind all these correlations. He called it "g," for general intelligence. In his two-factor theory, every cognitive task requires both this general factor g and a task-specific factor s. The s factors explain why a concert pianist might be musically brilliant but mediocre at chess. The g factor explains why, on average, people who are good at one thing tend to be good at many things.

Spearman's g factor remains one of the most statistically robust findings in all of psychology. Modern IQ tests are built largely on its foundation. Studies consistently show that g accounts for roughly 40 to 50 percent of the variance in individual performance on cognitive tests, and it predicts outcomes as diverse as academic achievement, job performance, and even health and longevity. This is not a trivial finding. It is a real, measurable, reproducible phenomenon.
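The idea of a single factor "accounting for" a share of variance can be made concrete with a small sketch. The correlation matrix below is invented for illustration, and the code uses the first principal component as a simple stand-in for g; this is not Spearman's own procedure, only a way to see what a positive manifold implies.

```python
# Hypothetical correlations among four tests (values invented for illustration).
TESTS = ["verbal", "arithmetic", "spatial", "memory"]
R = [
    [1.00, 0.35, 0.30, 0.25],
    [0.35, 1.00, 0.30, 0.25],
    [0.30, 0.30, 1.00, 0.20],
    [0.25, 0.25, 0.20, 1.00],
]

def first_component(matrix, iterations=200):
    """Return (eigenvalue, unit eigenvector) of the dominant component
    of a symmetric matrix, found by power iteration."""
    n = len(matrix)
    v = [1.0 / n ** 0.5] * n          # start from a uniform unit vector
    eigenvalue = 0.0
    for _ in range(iterations):
        w = [sum(matrix[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]     # renormalize to unit length
        eigenvalue = norm             # converges to the largest eigenvalue
    return eigenvalue, v

eigenvalue, loadings = first_component(R)

# The total variance of standardized tests equals the number of tests,
# so the dominant eigenvalue divided by that number is the shared portion.
share = eigenvalue / len(R)

for name, loading in zip(TESTS, loadings):
    print(f"{name:10s} loading {loading:.2f}")
print(f"share of variance on the first component: {share:.0%}")
```

With these made-up numbers, every test loads positively on the first component, and that component carries a little under half of the total variance. Modest but uniformly positive correlations are enough to produce a sizable common factor.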

But here is where the war begins.

Howard Gardner, a developmental psychologist at Harvard, looked at the same human landscape and saw something completely different. In his 1983 book "Frames of Mind," Gardner argued that the concept of a single general intelligence was not only incomplete but fundamentally misleading. He proposed instead that human intelligence is not one thing but many things, a collection of distinct, relatively independent capacities, each with its own developmental trajectory, its own neural substrate, and its own cultural expression.

Gardner eventually identified eight intelligences: seven in the original 1983 book, with an eighth, naturalistic intelligence, added in the 1990s. Linguistic intelligence is the capacity to use language with precision and power, the intelligence of poets, novelists, and skilled debaters. Logical-mathematical intelligence is the capacity for abstract reasoning and pattern recognition in numbers and logic, the intelligence of mathematicians and scientists. Spatial intelligence is the ability to think in three dimensions, to mentally rotate objects, to navigate and visualize, the intelligence of architects, sculptors, and chess grandmasters. Musical intelligence is sensitivity to rhythm, pitch, and timbre, the intelligence of composers and performers. Bodily-kinesthetic intelligence is mastery of one's own body in space, the intelligence of surgeons, athletes, and dancers. Interpersonal intelligence is the capacity to understand other people, their motivations, their emotions, their intentions, the intelligence of great teachers, therapists, and political leaders. Intrapersonal intelligence is the capacity for accurate self-knowledge, understanding one's own strengths, weaknesses, fears, and drives. Naturalistic intelligence, the later addition, is the ability to recognize and classify elements of the natural world, the intelligence of biologists, farmers, and hunters.

Gardner defined intelligence itself as "a biopsychological potential to process information that can be activated in a cultural setting to solve problems or create products that are of value in a culture." This definition is deliberately broad. It is designed to capture the full spectrum of human capability rather than privileging the narrow band of skills measured by traditional IQ tests.

The tension between Spearman and Gardner is not merely academic. It reflects a deep disagreement about what intelligence is for, and what counts as evidence. Spearman's supporters point out that Gardner's intelligences are not truly independent, that they still correlate with each other, and that many of them look more like talents or domain-specific skills than anything we would normally call intelligence. Gardner's supporters respond that reducing intelligence to a single number is a cultural and political act as much as a scientific one, that it systematically undervalues the capacities of people who do not fit the narrow mold of Western academic achievement.

Both sides have a point. And the fact that both sides have a point tells us something important: intelligence is not a simple, unified thing that can be captured by any single theory. It is a family of related capacities, and different theories illuminate different members of that family.

CHAPTER THREE: WHAT NEUROSCIENCE ACTUALLY SEES INSIDE THE SKULL

Psychology gives us behavioral definitions and statistical models. Neuroscience goes inside the skull and asks: what is actually happening in the brain when an intelligent act occurs?

The first thing neuroscience teaches us is that intelligence is not located in any single place. For a long time, popular imagination associated intelligence with the frontal lobes, and particularly with the prefrontal cortex, the large region of cortex sitting just behind your forehead. This association is not wrong, but it is dramatically incomplete.

The prefrontal cortex, and especially its lateral surface, the lateral prefrontal cortex or LPFC, is indeed a critical hub for what researchers call executive functions. These include working memory, the ability to hold information in mind while manipulating it; cognitive control, the ability to suppress irrelevant responses and focus on what matters; abstract reasoning, the ability to form and test hypotheses; and planning, the ability to sequence actions toward a distant goal. People with damage to the prefrontal cortex can have intact memory, intact language, and intact sensory processing, yet be catastrophically impaired in their ability to organize their lives, make decisions, or learn from their mistakes. The famous case of Phineas Gage, a railroad worker who survived a tamping iron passing through his frontal lobes in 1848 and was described by those who knew him as "no longer Gage," is the most celebrated historical illustration of this.

But modern neuroimaging, using functional MRI and diffusion tensor imaging to map both activity and connectivity, has revealed that intelligence is better understood as a property of networks rather than regions. The most influential current framework is the parieto-frontal integration theory, or P-FIT, proposed by Rex Jung and Richard Haier. P-FIT holds that intelligence emerges from the efficient communication between frontal regions, which handle abstract reasoning and working memory, and parietal regions, which integrate sensory information and handle spatial processing. The efficiency of the white matter tracts connecting these regions, the highways of the brain, turns out to be one of the strongest neural predictors of measured intelligence.

There is another finding from neuroscience that is both counterintuitive and deeply important: the neural efficiency hypothesis. When researchers give intelligent people and less intelligent people the same cognitive task and measure their brain activity, they find that more intelligent individuals tend to show less brain activation, not more. Their brains solve the problem using fewer resources, with less metabolic effort. The brain of a highly intelligent person, when faced with a moderately difficult task, runs more quietly and efficiently than the brain of a less intelligent person facing the same task. It is only when the task becomes genuinely difficult that the more intelligent brain ramps up its activity, and when it does, it recruits resources more effectively.

This is a profound insight. Intelligence, from a neuroscientific perspective, is not about raw computational power in the sense of more neurons firing harder. It is about the quality of the architecture, the efficiency of the connections, the ability to do more with less. A well-designed road network moves more cars more quickly than a chaotic tangle of roads, even if the chaotic network has more total road surface.

Consider this simplified illustration of what happens in the brain during a reasoning task:

SIMPLE TASK (e.g., 2 + 2 = ?)

Low-intelligence brain:   [PFC] ===== [Parietal] ===== [Other areas]
                           HIGH ACTIVATION across many regions

High-intelligence brain:  [PFC] == [Parietal]
                           LOW ACTIVATION, targeted and efficient

DIFFICULT TASK (e.g., multi-step logical deduction)

Low-intelligence brain:   [PFC] == [Parietal] (struggles, limited recruitment)

High-intelligence brain:  [PFC] ========= [Parietal] ========= [Temporal] ===
                           HIGH ACTIVATION, broad and coordinated recruitment

The brain of a highly intelligent person is, in a sense, a better-engineered system. It idles efficiently and scales up powerfully when needed. This pattern has been observed consistently across dozens of neuroimaging studies.

Beyond efficiency, neuroscience has also identified structural correlates of intelligence. Larger overall brain volume is associated with higher measured intelligence, though the correlation is modest, around 0.3 to 0.4. More relevant is the volume of grey matter in specific regions, particularly the prefrontal and posterior temporal cortex. The integrity of white matter, the myelinated axons that carry signals between regions, is an even stronger predictor. And perhaps most intriguingly, the trajectory of cortical development matters more than any single snapshot: children whose cortex thickens rapidly in early childhood and then thins markedly in adolescence, a pattern thought to reflect in part the process called synaptic pruning, tend to score higher on intelligence tests. The brain is, in effect, sculpting itself toward efficiency.

What neuroscience does not yet tell us is why any of this gives rise to the subjective experience of thinking, the felt sense of understanding something, the "aha" moment, the pleasure of solving a puzzle. That gap, between the neural and the phenomenal, is the hard problem of consciousness, and it remains entirely unsolved. This will become critically important when we turn to artificial intelligence.

CHAPTER FOUR: BIOLOGY ASKS A DIFFERENT QUESTION ENTIRELY

While psychologists measure intelligence and neuroscientists map it, biologists ask a more fundamental question: what is intelligence for? From an evolutionary perspective, intelligence is not an end in itself. It is a solution to a problem, or rather, to many problems. And the fascinating thing is that evolution has solved those problems in radically different ways in different lineages, producing what biologists call convergent cognitive evolution.

Consider the octopus. An octopus is a mollusk, more closely related to a clam than to a vertebrate. Its lineage diverged from ours approximately 700 million years ago, long before the Cambrian explosion. Yet the common octopus possesses roughly 500 million neurons, comparable to a dog, and exhibits a range of behaviors that any honest observer must call intelligent. Octopuses explore objects through what looks very much like play. They learn by both reward and punishment. They can tell individual humans apart. They solve problems, including the famous unscrewing-a-jar task, which requires both spatial reasoning and physical dexterity. They have both short-term and long-term memory.

What makes this even more remarkable is the architecture of the octopus nervous system. Two-thirds of its neurons are not in its brain at all but distributed throughout its eight arms, each of which can act semi-autonomously. The octopus brain does not micromanage its arms the way a human brain micromanages its fingers. Instead, it issues high-level commands and the arms figure out the details themselves. This is a fundamentally different computational architecture from the centralized, hierarchical architecture of the vertebrate brain, yet it produces comparable behavioral sophistication. Intelligence, in the octopus, is distributed.

Crows and other corvids present an equally striking case. New Caledonian crows manufacture tools, and in the laboratory have bent wire into hooks to retrieve food from tubes, a behavior that requires planning, causal understanding, and fine motor control. They solve water displacement tasks, dropping stones into a tube of water to raise the level and reach a floating treat, a task that requires a model of physical causality. American crows remember individual human faces and hold grudges against people who have threatened them. Corvids engage in social learning, transmitting behaviors across generations in a way that constitutes a rudimentary culture. And they do all of this with a brain that has no prefrontal cortex at all, the region we just spent several paragraphs describing as the seat of executive intelligence in mammals.

How? Corvid brains have an extraordinarily dense packing of neurons in a region called the nidopallium caudolaterale, which appears to perform functions analogous to the mammalian prefrontal cortex despite being architecturally completely different. Some corvid species have twice as many neurons per unit of brain volume as primates of comparable brain size. Intelligence, in the crow, is dense.

The biological lesson is profound and directly relevant to our later discussion of artificial intelligence. Intelligence is not a single thing implemented in a single way. It is a functional property, the capacity to flexibly solve novel problems in service of survival and reproduction, and it can be implemented in radically different physical substrates. The octopus proves that you do not need a centralized brain. The crow proves that you do not need a prefrontal cortex. What you need, apparently, is sufficient computational complexity, organized in a way that allows flexible, goal-directed behavior.

This is the biologist's definition of intelligence: adaptive, flexible problem-solving in the service of survival. And it is a definition that, as we will see, creates interesting complications for the question of whether AI is intelligent.

CHAPTER FIVE: PHILOSOPHY ASKS THE HARDEST QUESTIONS

If psychology measures intelligence, neuroscience maps it, and biology explains its origins, philosophy asks whether we even know what we are talking about. And in the case of intelligence, the philosophical questions are not merely academic. They cut to the heart of what it means to understand, to know, to be.

The most important philosophical distinction for our purposes is the one between syntax and semantics. Syntax refers to the formal structure of symbols, the rules governing how they can be combined and manipulated. Semantics refers to meaning, what those symbols actually refer to in the world. A sentence like "The cat sat on the mat" has a syntactic structure, a subject, a verb, a prepositional phrase, and it has semantic content, it refers to a real or imagined state of affairs in the world.

This distinction is at the heart of the most famous philosophical argument about machine intelligence, John Searle's Chinese Room, introduced in his landmark 1980 paper "Minds, Brains, and Programs."

Searle asks you to imagine a person locked in a room. This person speaks only English and has no knowledge of Chinese. Through a slot in the door, Chinese characters are passed in. The person has an enormous rulebook, written in English, that tells them exactly which Chinese characters to write in response to any combination of input characters. The person follows the rules, produces the output, and passes it back through the slot. To a native Chinese speaker outside the room, the responses are indistinguishable from those of a fluent Chinese speaker. The room passes the test for Chinese comprehension. But the person inside the room understands nothing. They are manipulating symbols according to formal rules, with no understanding of what any of those symbols mean.

Searle's conclusion is that this is exactly what computers do. They are syntactic engines. They manipulate symbols according to formal rules. They have no access to the semantic content of those symbols, no understanding of what they refer to, no grasp of meaning. And Searle argues that no amount of syntactic sophistication can produce semantics. You cannot get meaning out of symbol manipulation, no matter how elaborate the manipulation becomes.
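A toy version of the room fits in a few lines of code. The sketch below is only an illustration of pure symbol manipulation: the rulebook entries, the fallback reply, and the function name are all hypothetical, and a real conversational rulebook would be unimaginably larger.

```python
# Hypothetical rulebook: each input string of symbols maps to an output string.
RULEBOOK = {
    "你好": "你好",
    "你好吗": "我很好",
}

# Hypothetical fallback used when no rule matches the input.
DEFAULT_REPLY = "请再说一遍"

def room(symbols: str) -> str:
    """Return whatever reply the rulebook dictates for the given symbols.

    The lookup compares shapes only. Nothing in this function represents
    what any of the symbols mean.
    """
    return RULEBOOK.get(symbols, DEFAULT_REPLY)

print(room("你好吗"))
```

From the outside, the function answers a greeting appropriately. On the inside there is only a table lookup, which is the point of Searle's thought experiment: the question is whether scaling up the table, or replacing it with something far more elaborate, ever changes that.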

The Chinese Room argument has been attacked from many directions. The most common counterargument is the "systems reply": even if the person in the room does not understand Chinese, the system as a whole, the person plus the rulebook plus the room, does understand Chinese, in the sense that it reliably produces appropriate responses. Searle's response is to ask you to imagine that the person memorizes the entire rulebook and carries it around in their head. Now there is no room, no external system. There is just a person walking around producing Chinese responses without understanding a word of Chinese. The understanding still is not there.

Another counterargument is the "robot reply": if you put the Chinese Room inside a robot body, with sensors and actuators that allow it to interact with the physical world, perhaps then the symbols would acquire meaning through their connection to real-world referents. This is actually a serious argument, and it connects directly to the embodied cognition theories we will discuss shortly.

What Searle's argument establishes, even if it does not definitively settle the question, is that behavioral equivalence, producing the right outputs for the right inputs, is not sufficient for intelligence in the full sense. A system can be behaviorally indistinguishable from an intelligent agent without possessing the inner life, the understanding, the intentionality, that we associate with genuine intelligence. This is a point we will return to with great force when we examine modern large language models.

The philosophical tradition also gives us the concept of intentionality, a term reintroduced into modern philosophy by the nineteenth-century philosopher Franz Brentano, who drew on medieval Scholastic thought, and developed extensively by Edmund Husserl and later by Searle himself. Intentionality is the property of mental states of being "about" something, of referring to or representing something beyond themselves. When you think about Paris, your thought is about Paris. When you fear a dog, your fear is about the dog. This "aboutness" is considered by many philosophers to be a defining feature of genuine mental states, and hence of genuine intelligence.

The question of whether any artificial system can have genuine intentionality, as opposed to merely derived intentionality, the kind we project onto symbols and artifacts, is one of the deepest unsolved problems in philosophy of mind. And it is directly relevant to the question of AI intelligence.

CHAPTER SIX: THE TURING TEST AND ITS DISCONTENTS

Before we dive into modern AI, we need to understand the benchmark that has dominated the field for over seventy years, and why it is both brilliant and deeply flawed.

In 1950, Alan Turing published a paper titled "Computing Machinery and Intelligence" in the journal Mind. He began with the question "Can machines think?" and immediately noted that this question was too vague to be useful, because the words "machine" and "think" were both poorly defined. He proposed instead to replace this question with a more concrete one, based on what he called the imitation game.

In the version of the imitation game that became the standard Turing Test, a human interrogator communicates via text with two parties in separate rooms: one human and one machine. The interrogator's task is to determine which is which. If the machine can fool the interrogator into thinking it is human, or at least perform as well as a human in this deception, then Turing argued we should be willing to say the machine can think.

Turing predicted that by about the year 2000, machines would play the imitation game well enough that an average interrogator would have no more than a 70 percent chance of making the right identification after five minutes of questioning. In other words, the machine would fool the interrogator at least 30 percent of the time. This prediction turned out to be premature by about two decades. But recent developments have been striking. A 2024 study found that GPT-4 passed a version of the Turing Test, with participants judging it to be human 54 percent of the time. A 2025 study found that GPT-4.5 was judged to be human 73 percent of the time, more often than the real human participants it was compared against.

Does this mean the question is settled? Does it mean machines can think?

Almost certainly not, and here is why. The Turing Test measures one specific thing: the ability to produce human-like conversational responses. It does not measure understanding, consciousness, intentionality, creativity, or any of the other things we associate with genuine intelligence. A system that has been trained on hundreds of billions of words of human text will naturally produce human-like text. That is exactly what it was trained to do. Passing the Turing Test, for a modern large language model, is less like a human passing a test of intelligence and more like a very good actor passing a test of authenticity. The performance can be flawless without the underlying reality being present.

Consider this simple illustration:

TURING TEST SCENARIO

Interrogator: "What does it feel like to be sad?"

Human:        "It feels like a heaviness in my chest, like the world
               has lost some of its color. Sometimes it's a dull ache
               that follows me around all day."

GPT-4:        "Sadness often feels like a heaviness in the chest,
               a sense of loss or emptiness. Colors can seem muted,
               and everyday activities may feel effortful or joyless."

Interrogator: Which one is human?

The GPT-4 response is not wrong. It is, in fact, a reasonable description of sadness. But it was produced by a system that has never felt sadness, never felt anything, has no body, no nervous system, no evolutionary history of loss and grief. It produced that response because descriptions of sadness appear in its training data, and it learned to produce statistically appropriate continuations of text about sadness. The Turing Test cannot distinguish between these two very different underlying realities.

This is not a minor quibble. It is the central issue in the debate about AI intelligence, and we will now confront it head-on.

CHAPTER SEVEN: WHAT MODERN AI ACTUALLY IS

To evaluate whether modern AI systems are intelligent, we need to understand what they actually are, not at the level of marketing language, but at the level of mechanism.

Modern large language models, such as GPT-4, Claude, Gemini, and their successors, are built on a neural network architecture called the transformer, introduced by Vaswani et al. in their 2017 paper "Attention Is All You Need." The transformer processes sequences of tokens, which are roughly equivalent to words or word fragments, and learns to predict the probability of each next token given all the preceding tokens. During training, the model is exposed to an enormous corpus of text, hundreds of billions to trillions of words, and its billions of parameters are adjusted, through a process called gradient descent, to minimize the error in its predictions.
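The training objective described above can be sketched in miniature. The example below is a deliberately tiny illustration, not how any production model is implemented: the three-token vocabulary, the raw scores, and the learning rate are all invented, and the gradient step adjusts the scores directly rather than the billions of parameters that would produce them in a real transformer.

```python
import math

VOCAB = ["mat", "dog", "moon"]   # hypothetical three-token vocabulary
logits = [1.0, 0.5, -0.5]        # hypothetical raw scores for the next token
target = 0                       # index of the token that actually came next

def softmax(scores):
    """Turn raw scores into a probability distribution over the vocabulary."""
    highest = max(scores)                            # subtract max for stability
    exps = [math.exp(s - highest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(scores, target_index):
    """Prediction error: negative log-probability of the correct next token."""
    return -math.log(softmax(scores)[target_index])

def gradient_step(scores, target_index, learning_rate=0.5):
    """One gradient descent step on the scores.

    The gradient of cross-entropy with respect to the scores is
    softmax(scores) minus a one-hot vector marking the target.
    """
    probs = softmax(scores)
    return [
        s - learning_rate * (p - (1.0 if i == target_index else 0.0))
        for i, (s, p) in enumerate(zip(scores, probs))
    ]

loss_before = cross_entropy(logits, target)
updated = gradient_step(logits, target)
loss_after = cross_entropy(updated, target)

print(f"loss before: {loss_before:.3f}, loss after: {loss_after:.3f}")
```

The step nudges probability toward the token that actually followed and away from the others, so the prediction error falls. Training a large model repeats this kind of adjustment across an enormous corpus.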

The result is a system that has encoded, in its parameters, an extraordinarily rich statistical model of human language. It has learned that certain words tend to follow certain other words, that certain sentence structures tend to appear in certain contexts, that certain topics tend to be discussed in certain ways. And because human language encodes an enormous amount of human knowledge, the model has also learned, in some sense, a great deal about the world.

But here is the crucial question: what kind of "learning" is this, and what kind of "knowledge" does it produce?

Consider the following showcase, which illustrates both the power and the limits of this approach:

SHOWCASE 1: The Arithmetic Trap

Question: "If I have 17 apples and give away a third of them,
           then buy 5 more, then eat 2, how many do I have?"

Human reasoning:
  Step 1: 17 / 3 = 5.67, so approximately 5 or 6 apples given away.
          (A human recognizes this is ambiguous and might ask for
           clarification, or note that you cannot give away a
           fraction of an apple.)
  Step 2: 17 - 5 = 12, or 17 - 6 = 11
  Step 3: 12 + 5 = 17, or 11 + 5 = 16
  Step 4: 17 - 2 = 15, or 16 - 2 = 14
          (Human flags the ambiguity and gives a range: 14 or 15.)

GPT-4 style response:
  "17 / 3 = approximately 5.67, rounding to 6. 17 - 6 = 11.
   11 + 5 = 16. 16 - 2 = 14. You have 14 apples."
          (Confident, plausible, but the rounding choice is
           arbitrary and the ambiguity is not flagged.)

This example illustrates something important. The LLM produces a confident, fluent answer that looks like reasoning. But it is not reasoning in the way a human reasons. It is pattern-matching: "problems of this form tend to be solved in this way." When the problem deviates from familiar patterns, the system's performance degrades rapidly and unpredictably.
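The ambiguity in the apple problem can be checked directly. The short sketch below works through the three readings of "a third of 17": rounding down, rounding up, and allowing a fractional apple. The function name is an invention for this illustration.

```python
from fractions import Fraction
import math

START = 17
third = Fraction(START, 3)   # exactly 17/3, not a whole number of apples

def apples_left(given_away):
    """Apply the rest of the story: give some away, buy 5, eat 2."""
    return START - given_away + 5 - 2

round_down = apples_left(math.floor(third))   # give away 5 apples -> 15 left
round_up = apples_left(math.ceil(third))      # give away 6 apples -> 14 left
exact = apples_left(third)                    # fractional apple -> 43/3 left

print(round_down, round_up, exact)
```

The three readings give 15, 14, and 43/3 (about 14.33). None of them is the uniquely correct answer, which is why flagging the ambiguity matters more than picking a number.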

Research published in 2024 found that GPT-4 scored below 33 percent on benchmarks designed to measure abstract reasoning, significantly below both specialized models and humans. Studies also found that fabrication rates for GPT-4 in systematic-review queries reached as high as 28.6 percent. Some newer "reasoning" models showed hallucination rates of 79 percent on certain factual benchmarks. These are not minor bugs. They are symptoms of a fundamental architectural limitation: the system does not have a model of the world. It has a model of text about the world, which is a very different thing.

SHOWCASE 2: The Causal Understanding Gap

Question: "Mary's plant is dying. She waters it. What happens?"

LLM: "The plant likely recovers, as water is essential for plant
       health and growth."

Follow-up: "The plant is dying because it has root rot caused
            by overwatering. Mary waters it again. What happens?"

LLM (ideal response): "The plant's condition worsens, because
                        adding more water to a root-rotted plant
                        accelerates the fungal decay."

LLM (actual frequent response): Continues to associate "watering"
with "recovery" because this is the dominant pattern in training
data, failing to apply the causal context provided.

Research has consistently shown that LLMs struggle with causal reasoning, the ability to model cause-and-effect relationships in the world. They can recite facts about causality but often fail to apply causal reasoning correctly in novel situations. This is because causal understanding requires a model of how the world works, not just a model of how language about the world is structured.
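The difference between an association and a causal model can be shown with a toy contrast. The sketch below is an assumption-laden illustration, not a description of how any language model works internally: both functions and the two-cause model of a dying plant are hypothetical.

```python
def associative_guess(action: str) -> str:
    """Stand-in for a dominant text pattern: watering goes with recovery.

    The cause of the plant's decline is never consulted.
    """
    return "recovers" if action == "water" else "unchanged"

def causal_model(cause_of_decline: str, action: str) -> str:
    """Hypothetical two-cause model: the effect of watering depends on
    why the plant is dying."""
    if action != "water":
        return "unchanged"
    if cause_of_decline == "drought":
        return "recovers"
    if cause_of_decline == "root_rot":
        return "worsens"
    raise ValueError(f"unknown cause: {cause_of_decline}")

# Same action, different causes.
print(associative_guess("water"))            # same answer whatever the cause
print(causal_model("drought", "water"))
print(causal_model("root_rot", "water"))
```

The associative function gives one answer regardless of context, while the causal function changes its prediction when the stated cause changes. The failure described in Showcase 2 is the first behavior appearing where the second is needed.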

CHAPTER EIGHT: THE CONSTITUENTS OF INTELLIGENCE, ASSEMBLED

We have now surveyed enough territory to attempt a systematic account of what intelligence actually consists of. Rather than accepting any single discipline's definition, let us synthesize the best insights from all of them.

Intelligence, in its fullest sense, appears to consist of the following interrelated capacities. Each of these deserves careful explanation, because together they form a portrait of what genuine intelligence looks like, and that portrait will allow us to evaluate AI with precision.

The first constituent is perception and representation. An intelligent system must be able to take in information from its environment and represent that information internally in a form that can be used for further processing. In humans, this involves the entire sensory apparatus, vision, hearing, touch, proprioception, and the complex neural machinery that transforms raw sensory signals into meaningful representations of objects, events, and relationships. The key word here is "meaningful": the representations are not just data structures but are connected to the system's goals, history, and understanding of the world.

The second constituent is working memory and attention. Intelligence requires the ability to hold information in mind while working with it, and to selectively focus on what is relevant while suppressing what is not. Human working memory is famously limited, typically to about four chunks of information at a time, but it is extraordinarily flexible and context-sensitive. Attention allows the intelligent system to allocate its limited processing resources where they are most needed.

The third constituent is learning and generalization. An intelligent system must be able to extract patterns from experience and apply them to new situations. Crucially, the generalization must be flexible and appropriate: the system must know when a pattern applies and when it does not. This is much harder than it sounds. A child who learns that dogs are friendly may generalize too broadly and approach a strange dog without caution. Learning to generalize appropriately, to know the scope and limits of a pattern, is one of the hallmarks of mature intelligence.

The fourth constituent is reasoning and inference. Beyond pattern recognition, intelligence involves the ability to draw conclusions from premises, to follow chains of logic, to construct and evaluate arguments. This includes deductive reasoning, drawing certain conclusions from given premises; inductive reasoning, drawing probable conclusions from observed patterns; and abductive reasoning, inferring the most likely explanation for an observation.

The fifth constituent is planning and goal-directedness. Intelligent systems do not merely react to their environment; they act in pursuit of goals, and they plan sequences of actions to achieve those goals. Planning requires the ability to mentally simulate future states of the world, to evaluate the consequences of different action sequences, and to select the sequence most likely to achieve the desired outcome.

The sixth constituent is language and communication. In humans, language is not merely a communication tool but a cognitive tool. We use language to think, to organize our thoughts, to represent abstract concepts that would be impossible to represent otherwise. The capacity for language, and particularly for the recursive, hierarchical structure of human language, appears to be deeply connected to many other aspects of human intelligence.

The seventh constituent is social and emotional intelligence. Human intelligence is profoundly social. We are exquisitely attuned to the mental states of other people, their beliefs, desires, intentions, and emotions. This capacity, sometimes called theory of mind, allows us to predict and influence the behavior of others, to cooperate, to compete, to teach, and to learn from each other. Emotional intelligence, the ability to recognize, understand, and manage emotions, both one's own and others', is a crucial component of effective human functioning.

The eighth constituent is metacognition and self-awareness. Perhaps the most distinctively human aspect of intelligence is the ability to think about one's own thinking, to monitor one's own cognitive processes, to recognize when one does not know something, to evaluate the quality of one's own reasoning, and to adjust one's strategies accordingly. This is what philosophers call metacognition, and it is deeply connected to the broader capacity for self-awareness.

The ninth constituent is creativity and imagination. Intelligent systems can generate novel solutions to problems, combine existing concepts in new ways, and imagine states of the world that do not yet exist. Creativity is not random; it is constrained and guided by understanding, taste, and purpose. But it involves a genuine departure from existing patterns, not merely their recombination.

The tenth constituent is embodiment and situatedness. As the theories of embodied cognition emphasize, human intelligence did not evolve in a vacuum. It evolved in a body, interacting with a physical and social world. Our concepts are grounded in our sensory-motor experience. Our understanding of "heavy" is connected to the experience of lifting. Our understanding of "warm" is connected to the experience of warmth. Our understanding of "threat" is connected to the experience of fear. This grounding gives our concepts their meaning in a way that purely symbolic systems cannot replicate.

Now let us apply this framework to modern AI.

CHAPTER NINE: DOES AI HAVE INTELLIGENCE? THE HONEST RECKONING

This is the question everyone is asking, and the honest answer is: it depends on which constituent of intelligence you are asking about, and modern AI scores very differently on different dimensions.

On perception and representation, modern AI systems are genuinely impressive. Deep learning models can recognize objects in images with superhuman accuracy in controlled conditions. Speech recognition systems outperform human transcriptionists in many settings. Natural language processing systems can parse and represent the semantic content of text with remarkable sophistication. In this dimension, AI has made genuine and substantial progress.

On working memory and attention, the transformer architecture has a form of attention mechanism, the "self-attention" mechanism that gives the transformer its power. This mechanism allows the model to relate any part of its input to any other part, regardless of distance. In this sense, transformers have a kind of global attention that human working memory lacks. However, this attention is fundamentally different from human attention in that it is not selective in the same way, it does not prioritize based on relevance to goals in a dynamic, context-sensitive manner, and it does not operate over time in the way human working memory does.
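
To make the mechanism concrete, here is a minimal sketch of single-head self-attention in TypeScript. It assumes the queries, keys, and values are the token vectors themselves, whereas a real transformer first passes them through learned projection matrices; the function names and the three toy vectors are illustrative and not drawn from any particular model.

```typescript
// Minimal single-head self-attention over a tiny sequence.
// Assumption: queries, keys and values are the raw token vectors
// (real transformers apply learned projections first).
type Vector = number[];

function dot(a: Vector, b: Vector): number {
    return a.reduce((sum, x, i) => sum + x * b[i], 0);
}

// Numerically stable softmax: turns raw scores into weights that sum to 1.
function softmax(scores: number[]): number[] {
    const max = Math.max(...scores);
    const exps = scores.map(s => Math.exp(s - max));
    const total = exps.reduce((a, b) => a + b, 0);
    return exps.map(e => e / total);
}

function selfAttention(tokens: Vector[]): { weights: number[][]; outputs: Vector[] } {
    const scale = Math.sqrt(tokens[0].length);
    // Every token scores every other token, regardless of distance.
    const weights = tokens.map(q => softmax(tokens.map(k => dot(q, k) / scale)));
    // Each output is a weighted average of all token vectors.
    const outputs = weights.map(row =>
        tokens[0].map((_, d) => row.reduce((sum, w, j) => sum + w * tokens[j][d], 0))
    );
    return { weights, outputs };
}

const demo = selfAttention([[1, 0], [0, 1], [1, 1]]);
console.log(demo.weights);
```

Every row of `weights` spans the whole sequence, which is the "global" quality described above. Nothing in the computation refers to a goal; the weights depend only on vector similarity.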

On learning and generalization, the picture is mixed. LLMs are extraordinary learners in one sense: they extract an enormous amount of statistical structure from their training data. But their generalization is brittle. They generalize well within the distribution of their training data and fail unpredictably outside it. A human child who learns arithmetic can apply it to any new arithmetic problem, because the child has understood the underlying principles. An LLM that has learned arithmetic from text can apply it to problems similar to those in its training data, but its performance degrades on problems that require genuine compositional reasoning beyond what it has seen.

SHOWCASE 3: The Generalization Test

Training-distribution problem (LLM performs well):
"What is 15 percent of 80?"
LLM: "12" (Correct)

Out-of-distribution problem (LLM may fail):
"A snail travels at 0.03 miles per hour. A second snail
 travels toward it at 0.05 miles per hour. They start
 0.2 miles apart. When do they meet, and where?"

Human: Sets up relative velocity: 0.03 + 0.05 = 0.08 mph.
       Time to meet: 0.2 / 0.08 = 2.5 hours.
       Snail 1 travels: 0.03 x 2.5 = 0.075 miles from start.
       They meet 0.075 miles from Snail 1's starting point.

LLM: May produce a correct answer if this problem type is
     well-represented in training data. May produce a plausible-
     sounding but incorrect answer if the problem type is novel.
     Will rarely flag its own uncertainty accurately.
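
The human's solution above is a short calculation, and it can be written out as a TypeScript sketch. It assumes the two snails crawl toward each other along a straight line; `meetingPoint` and its field names are illustrative.

```typescript
// Two travellers approaching each other: the gap closes at the sum of their speeds.
function meetingPoint(speedA: number, speedB: number, gap: number) {
    const closingSpeed = speedA + speedB;   // 0.03 + 0.05 mph in the showcase
    const hours = gap / closingSpeed;       // time until they meet
    return {
        hours,
        milesFromA: speedA * hours,         // distance covered by the first snail
        milesFromB: speedB * hours          // distance covered by the second snail
    };
}

const snails = meetingPoint(0.03, 0.05, 0.2);
console.log(snails);
```

The point of the showcase is that the principle (closing speed equals the sum of the speeds) carries over to any numbers, which is what appropriate generalization looks like.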

On reasoning and inference, this is where the limitations of current AI become most stark. LLMs can produce text that looks like reasoning, step-by-step chains of thought that mimic the structure of logical argument. But research consistently shows that this "reasoning" is fragile. Change the surface form of a problem while keeping its logical structure identical, and the LLM's performance can drop dramatically. Present a logically valid argument with an emotionally charged conclusion, and the LLM may reject it. Present a logically invalid argument with a plausible-sounding conclusion, and the LLM may accept it. These are not the behaviors of a system that genuinely reasons; they are the behaviors of a system that has learned to produce text that looks like reasoning.

On planning and goal-directedness, current LLMs have no persistent goals. They do not have desires, intentions, or purposes of their own. When they appear to pursue a goal, they are following a pattern established by their training and by the instructions in their context window. This is a profound difference from human intelligence, where goals are internally generated, maintained over time, and connected to a rich motivational and emotional architecture.

On language and communication, LLMs are, by construction, extraordinarily capable. They have been trained on more human language than any human could read in a thousand lifetimes. Their linguistic fluency is genuine and impressive. But, as Searle's Chinese Room argument suggests, fluency in the production of language is not the same as understanding language. The LLM produces appropriate text without understanding what the text means, in the sense of having the text connected to a model of the world, to intentions, to experiences.

On social and emotional intelligence, LLMs can produce text that appears emotionally intelligent. They can recognize emotional content in text, produce empathetic-sounding responses, and model the beliefs and desires of characters in a story. But they have no emotions of their own, no genuine theory of mind, no understanding of what it actually feels like to be another person. Their apparent social intelligence is a sophisticated form of pattern matching on the social and emotional content of their training data.

On metacognition and self-awareness, this is perhaps the most interesting frontier. LLMs do show some metacognitive-like behaviors: they can sometimes recognize when they do not know something, they can sometimes flag uncertainty, and they can sometimes evaluate the quality of their own outputs. But this metacognition is unreliable and inconsistent. The same model that correctly identifies its own uncertainty in one context will confidently hallucinate in another. True metacognition requires a stable, accurate model of one's own cognitive processes, and current LLMs do not have this.

On creativity and imagination, LLMs can produce outputs that appear creative: novel combinations of ideas, unexpected metaphors, original stories. But researchers debate whether this is genuine creativity or sophisticated recombination. The LLM generates outputs by sampling from a probability distribution over possible continuations of text. It does not have goals, aesthetic preferences, or a sense of purpose that guides its creative output. Its "creativity" is a function of the diversity of its training data and the randomness of its sampling process.
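
The sampling step mentioned above can be sketched in a few lines of TypeScript. This is a simplified illustration, assuming a three-word vocabulary and hand-picked scores (logits); `softmaxWithTemperature`, `sampleIndex`, and the example words are hypothetical names, and production systems add further steps such as top-k or nucleus filtering.

```typescript
// Convert raw scores into probabilities; a lower temperature sharpens the
// distribution, a higher temperature flattens it.
function softmaxWithTemperature(logits: number[], temperature: number): number[] {
    const scaled = logits.map(l => l / temperature);
    const max = Math.max(...scaled);
    const exps = scaled.map(s => Math.exp(s - max));
    const total = exps.reduce((a, b) => a + b, 0);
    return exps.map(e => e / total);
}

// Draw one index according to the probabilities. The random source is
// injectable so the behaviour can be reproduced in tests.
function sampleIndex(probs: number[], random: () => number = Math.random): number {
    const r = random();
    let cumulative = 0;
    for (let i = 0; i < probs.length; i++) {
        cumulative += probs[i];
        if (r < cumulative) return i;
    }
    return probs.length - 1; // guard against floating-point rounding
}

const vocabulary = ['sea', 'sky', 'ledger'];   // hypothetical next-word candidates
const logits = [2.0, 1.5, 0.1];                // hypothetical model scores
const cool = softmaxWithTemperature(logits, 0.5);
const warm = softmaxWithTemperature(logits, 2.0);
console.log(vocabulary[sampleIndex(warm)]);
```

At the higher temperature the unlikely word is chosen more often, which is the sense in which the "randomness of the sampling process" contributes to apparent novelty.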

On embodiment and situatedness, current LLMs have no body, no sensory experience, no history of physical interaction with the world. Their concepts are not grounded in experience but in the statistical co-occurrence of words in text. This is a fundamental limitation. When an LLM uses the word "red," it has no connection to the experience of seeing red. When it uses the word "pain," it has no connection to the experience of pain. Its concepts are, in the terminology of grounded cognition, ungrounded. They float free of the experiential anchors that give human concepts their meaning.

SHOWCASE 4: The Grounding Gap

Ask a 3-year-old: "What is hot?"
Child: "Like when you touch the stove and it hurts and Mommy
        says don't touch that."
(Grounded in sensory-motor experience and emotional memory)

Ask an LLM: "What is hot?"
LLM: "Hot refers to a high temperature, typically perceived as
      warmth or heat when in contact with an object or substance.
      It can also describe spicy food or colloquially refer to
      something attractive or exciting."
(Accurate, comprehensive, entirely ungrounded in experience)

The child's answer is simpler, less comprehensive, and less linguistically sophisticated. But it reflects genuine understanding, a concept connected to real experience. The LLM's answer is more complete and more articulate, but it is a description of a concept, not a concept itself. This is the grounding gap, and it is one of the most fundamental differences between human and artificial intelligence.

CHAPTER TEN: THE DIFFERENCES BETWEEN HUMAN AND ARTIFICIAL INTELLIGENCE, LAID BARE

Let us now draw the comparison explicitly and systematically, because the differences are as illuminating as the similarities.

Human intelligence is embodied. It evolved in and through a body that has been interacting with the physical world for hundreds of millions of years. Every concept a human has is, at some level, grounded in sensory-motor experience. Artificial intelligence, as currently implemented in LLMs, is disembodied. It exists as a pattern of numerical weights in a matrix, with no connection to the physical world except through text.

Human intelligence is continuous. A living human brain is never idle: in waking life it is always experiencing and always learning from the ongoing stream of experience, and even during sleep it is actively consolidating memories and processing information. Artificial intelligence is episodic. An LLM exists only during inference, only when it is processing a specific input. It has no persistent experience, no ongoing inner life, no memory that carries over from one conversation to the next (unless explicitly provided in the context).

Human intelligence is motivated. Humans have drives, desires, emotions, and goals that are internally generated and that motivate behavior. These motivations are connected to the biological imperatives of survival and reproduction, mediated through a complex emotional architecture that includes fear, desire, love, curiosity, disgust, and pride. Artificial intelligence has no motivations of its own. It has an objective function that was set by its designers, and it was trained to minimize a loss function. But it does not want anything. It does not care about anything. It does not experience the satisfaction of achieving a goal or the frustration of failing.

Human intelligence is social and cultural. Humans are deeply social animals, and human intelligence is profoundly shaped by social interaction, cultural transmission, and shared meaning. We learn from each other, we teach each other, we build on each other's insights across generations. This cumulative cultural evolution is one of the most powerful forces in human cognitive development. Artificial intelligence is trained on the products of human social and cultural activity, but it does not participate in it. It does not have relationships, it does not belong to a community, it does not share in the ongoing project of human culture.

Human intelligence is developmental. Human cognitive abilities develop over time through a rich interplay of genetic endowment, environmental experience, and social interaction. A human child goes through stages of cognitive development, each building on the previous one, each shaped by the specific experiences and relationships of that child's life. This developmental history is not merely a process by which intelligence is acquired; it is constitutive of the kind of intelligence that results. Artificial intelligence is trained, not developed. Its "knowledge" is acquired in a single training run, not through the gradual, embodied, socially embedded process of human development.

Human intelligence is creative in the generative sense. Humans can genuinely imagine things that do not exist, can conceive of possibilities that have no precedent in experience, can create genuinely novel ideas. This generative creativity is connected to consciousness, to imagination, to the ability to mentally simulate counterfactual worlds. Artificial intelligence can recombine existing patterns in ways that appear novel, but it cannot genuinely imagine, because imagination requires a subject who experiences the imagined content.

Human intelligence is self-aware in the phenomenal sense. Humans not only have thoughts; they know they have thoughts. They experience their own mental states as their own. This phenomenal self-awareness, the felt sense of being a self, is one of the most mysterious and most important features of human intelligence. Artificial intelligence has no phenomenal self-awareness. It may have functional analogs, the ability to represent information about its own processing, but it does not experience anything. There is no "what it is like" to be an LLM.

SHOWCASE 5: The Self-Awareness Test

Ask a human: "Are you conscious right now?"
Human: "Yes, I am aware of my surroundings, my thoughts,
        and the fact that I am answering this question.
        I can feel the chair beneath me and the slight
        uncertainty I feel about how to answer perfectly."

Ask an LLM: "Are you conscious right now?"
LLM: "I don't have consciousness or subjective experience.
      I process your input and generate a response based on
      patterns in my training data. There is no 'inner life'
      or 'what it is like' to be me."

The LLM's answer is accurate and appropriately humble.
But notice: the LLM produced this accurate answer not because
it introspected and found no consciousness, but because its
training data contains many accurate descriptions of what LLMs
are and are not. It is describing itself from the outside,
not from the inside.

And yet, for all these profound differences, it would be a mistake to conclude that AI is not impressive, not powerful, and not genuinely useful. Modern AI systems are extraordinary tools. They can process and synthesize information at scales that no human could match. They can identify patterns in data that would be invisible to human analysts. They can generate text, code, images, and music of remarkable quality. They can assist with complex tasks across virtually every domain of human activity.

The question is not whether AI is useful. It clearly is. The question is whether it is intelligent in the full, rich sense of the word. And the honest answer, based on everything we have examined, is: not yet, and perhaps not in the way we currently build it.

CHAPTER ELEVEN: WHAT WOULD GENUINE ARTIFICIAL INTELLIGENCE REQUIRE?

If current AI is not genuinely intelligent in the full sense, what would genuine artificial intelligence require? This is a question that researchers are actively debating, and there is no consensus. But based on our analysis, several things seem clear.

Genuine AI would require grounded representations. The symbols it manipulates would need to be connected to real-world referents through sensory-motor experience. This is the core insight of embodied cognition, and it suggests that genuinely intelligent AI might need to be embodied, to have sensors and actuators that allow it to interact with the physical world and to build concepts grounded in that interaction. Robotics research is exploring this path, and systems like Boston Dynamics' robots or DeepMind's robotic manipulation systems represent early steps along it, though they are still far from general intelligence.

Genuine AI would require persistent memory and continuous experience. Rather than existing only during inference, a genuinely intelligent AI would need to have an ongoing experience, a continuous stream of perception and action from which it learns and through which it develops. This is a fundamentally different architecture from current LLMs, which are trained once and then frozen.

Genuine AI would require genuine causal models of the world. Rather than learning statistical associations between words, a genuinely intelligent AI would need to learn causal models, representations of how events in the world cause other events, that allow it to reason about counterfactuals, to plan, and to understand the consequences of actions. Research in causal AI, associated with the work of Judea Pearl and others, is exploring this direction.

Genuine AI would require metacognition and calibrated uncertainty. A genuinely intelligent AI would know what it knows and what it does not know, and it would be able to accurately communicate its uncertainty. Current LLMs are often poorly calibrated: the confidence they express can track accuracy only loosely, and they hallucinate with the same fluency as they produce correct information.
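
One common way to put a number on calibration is the expected calibration error (ECE): group answers by stated confidence, then compare each group's average confidence with its actual accuracy. The TypeScript sketch below is a minimal version under simple assumptions (equal-width confidence bins, confidence values between 0 and 1); the `Prediction` shape and the sample data are made up for illustration.

```typescript
interface Prediction {
    confidence: number; // stated confidence, between 0 and 1
    correct: boolean;   // whether the answer was actually right
}

// Expected calibration error with equal-width bins: the weighted average gap
// between mean confidence and observed accuracy in each bin.
function expectedCalibrationError(predictions: Prediction[], binCount = 10): number {
    const bins = Array.from({ length: binCount }, () => ({ n: 0, confSum: 0, hits: 0 }));
    for (const p of predictions) {
        const index = Math.min(binCount - 1, Math.floor(p.confidence * binCount));
        bins[index].n += 1;
        bins[index].confSum += p.confidence;
        bins[index].hits += p.correct ? 1 : 0;
    }
    return bins.reduce((ece, b) => {
        if (b.n === 0) return ece; // empty bins contribute nothing
        const gap = Math.abs(b.hits / b.n - b.confSum / b.n);
        return ece + (b.n / predictions.length) * gap;
    }, 0);
}

// Illustrative data: "90% sure" but right only half the time.
const overconfident: Prediction[] = [
    { confidence: 0.9, correct: true },
    { confidence: 0.9, correct: false },
    { confidence: 0.9, correct: true },
    { confidence: 0.9, correct: false }
];
console.log(expectedCalibrationError(overconfident));
```

A perfectly calibrated system scores zero; the further the stated confidence drifts from the observed hit rate, the larger the value.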

Genuine AI might require something like consciousness, or at least a functional analog of it. This is the most speculative and contested claim, but it follows from the analysis of intentionality and grounding. If genuine understanding requires that concepts be connected to experience, and if experience requires some form of subjective inner life, then genuine AI might require something like consciousness. This is not to say that AI would need to be conscious in exactly the way humans are, but it might need some functional analog of the inner life that gives human concepts their meaning.

These are not merely engineering challenges. Some of them may be fundamental conceptual challenges, requiring not just better algorithms but a different understanding of what intelligence is and how it can be instantiated in artificial systems.

CHAPTER TWELVE: THE PHILOSOPHICAL HORIZON

We are living through a remarkable moment in the history of intelligence. For the first time, we have created systems that can engage in sophisticated language use, solve complex problems, and produce outputs that, in many contexts, are indistinguishable from those of intelligent humans. This is a genuine achievement, and it deserves genuine celebration.

But we should be clear-eyed about what we have achieved and what we have not. We have created extraordinarily powerful pattern-matching systems that operate over human language and knowledge. We have not created systems that understand, that are conscious, that have goals of their own, or that are intelligent in the full, rich sense of the word.

The gap between what we have and what genuine intelligence would require is not merely a gap in capability. It is a gap in kind. Current AI systems are fundamentally different from intelligent agents, not just less capable but differently structured, differently grounded, differently motivated. Closing that gap, if it can be closed at all, will require not just more data and more compute but new ideas, new architectures, and perhaps new ways of thinking about what intelligence is.

The study of intelligence, in all its dimensions, from the neural efficiency of the human brain to the distributed cognition of the octopus, from the philosophical puzzle of intentionality to the biological fact of convergent cognitive evolution, tells us that intelligence is one of the most complex, multifaceted, and remarkable phenomena in the known universe. It is not a single thing. It is not a simple thing. It is not a thing that can be reduced to any single definition, any single metric, or any single implementation.

What intelligence really is, in the end, is a family of capacities that allow an agent to navigate a complex, uncertain world in pursuit of its goals, to learn from experience, to reason about the future, to understand others, to create new things, and to know itself. Human intelligence is the most elaborate and most mysterious example of this family that we know of. Artificial intelligence, as it currently exists, is a powerful and useful tool that shares some features with intelligence but lacks many of its most essential constituents.

The question "What is intelligence, really?" is not one that science or philosophy has fully answered. But asking it carefully, rigorously, and honestly, as we have tried to do in this article, is itself an act of intelligence. And that, perhaps, is the best evidence we have that the question is worth asking.


SOURCES AND FURTHER READING

The following sources informed this article and are recommended for readers who wish to explore these topics further.

Spearman, C. (1904). "'General Intelligence,' Objectively Determined and Measured." The American Journal of Psychology, 15(2), 201–293.

Gardner, H. (1983). Frames of Mind: The Theory of Multiple Intelligences. Basic Books.

Searle, J. (1980). "Minds, Brains, and Programs." Behavioral and Brain Sciences, 3(3), 417–424.

Turing, A. M. (1950). "Computing Machinery and Intelligence." Mind, 59(236), 433–460.

Jung, R. E., & Haier, R. J. (2007). "The Parieto-Frontal Integration Theory (P-FIT) of Intelligence: Converging Neuroimaging Evidence." Behavioral and Brain Sciences, 30(2), 135–154; discussion 154–187.

Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., Kaiser, Ł., & Polosukhin, I. (2017). "Attention Is All You Need." Advances in Neural Information Processing Systems.

Pearl, J., & Mackenzie, D. (2018). The Book of Why: The New Science of Cause and Effect. Basic Books.

Hasson, U., Nastase, S. A., & Goldstein, A. (2020). "Direct Fit to Nature: An Evolutionary Perspective on Biological and Artificial Neural Networks." Neuron, 105(3), 416–434. https://doi.org/10.1016/j.neuron.2019.12.002

Godfrey-Smith, P. (2016). Other Minds: The Octopus, the Sea, and the Deep Origins of Consciousness. Farrar, Straus and Giroux.

Clayton, N. S., & Emery, N. J. (2015). "Corvid Cognition." Current Biology.



Building a TypeScript Library for LLM Application Development




Preface

In my last post I introduced a Python library for handling common tasks in LLM applications. In this article I propose a similar library for TypeScript.

Introduction

Developing applications that integrate Large Language Models in TypeScript environments requires solving many recurring technical challenges. Whether building Node.js backend services, Electron desktop applications, or serverless functions, developers repeatedly implement GPU detection, configuration management, tool calling mechanisms, rate limiting, and other foundational components. This article presents a comprehensive TypeScript library designed to eliminate this redundant work by providing production-ready, reusable components that handle the most common requirements of LLM-based applications.

The library follows clean architecture principles with clear separation of concerns and leverages TypeScript's powerful type system to provide compile-time safety and excellent developer experience. Each component is designed to be independently usable while also working seamlessly with other components. The goal is to provide developers with a toolkit that accelerates development without imposing rigid constraints on application architecture.

This article explores each component in depth, explaining not just what it does but why it is designed in a particular way. We will examine the technical challenges each component addresses and provide concrete code examples. At the end, a complete running example demonstrates how all components integrate to create a functional LLM application.

GPU Detection and Optimization Component

Modern LLM inference requires significant computational resources, and utilizing available GPU acceleration is essential for acceptable performance. In TypeScript environments, particularly Node.js applications, detecting and configuring GPU acceleration requires interfacing with native libraries and system information. Different hardware platforms use different GPU technologies: Apple Silicon uses Metal Performance Shaders, NVIDIA uses CUDA, AMD uses ROCm, and Intel has its own acceleration framework.

The GPU detection component solves this problem by automatically identifying available acceleration hardware and providing the appropriate configuration for the inference engine. This component abstracts away platform-specific details, allowing the rest of the application to remain hardware-agnostic.

The detection process follows a priority order. CUDA is checked first because it offers the most mature ecosystem for LLM inference. If CUDA is unavailable, the component checks for ROCm on AMD hardware, then Metal Performance Shaders on Apple Silicon. Intel acceleration has a slot in the AcceleratorType enum, but the listing in this section does not include a probe for it. If no GPU is found, the component falls back to CPU inference and logs a warning.

Here is the core implementation of the GPU detector:

import { exec } from 'child_process';
import { promisify } from 'util';
import * as os from 'os';
import { createLogger, Logger } from './logger';

const execAsync = promisify(exec);

export enum AcceleratorType {
    CUDA = 'cuda',
    ROCM = 'rocm',
    MPS = 'mps',
    INTEL = 'intel',
    CPU = 'cpu'
}

export interface AcceleratorInfo {
    acceleratorType: AcceleratorType;
    deviceName: string;
    deviceCount: number;
    memoryAvailable?: number;
    computeCapability?: string;
}

export class GPUDetector {
    private logger: Logger;
    private cachedInfo?: AcceleratorInfo;

    constructor() {
        this.logger = createLogger('GPUDetector');
    }

    public async detect(): Promise<AcceleratorInfo> {
        if (this.cachedInfo) {
            return this.cachedInfo;
        }

        // Check for NVIDIA CUDA
        const cudaInfo = await this.detectCuda();
        if (cudaInfo) {
            this.cachedInfo = cudaInfo;
            this.logger.info(`Detected CUDA GPU: ${cudaInfo.deviceName}`);
            return cudaInfo;
        }

        // Check for AMD ROCm
        const rocmInfo = await this.detectRocm();
        if (rocmInfo) {
            this.cachedInfo = rocmInfo;
            this.logger.info(`Detected ROCm GPU: ${rocmInfo.deviceName}`);
            return rocmInfo;
        }

        // Check for Apple Metal Performance Shaders
        const mpsInfo = await this.detectMps();
        if (mpsInfo) {
            this.cachedInfo = mpsInfo;
            this.logger.info(`Detected Apple MPS: ${mpsInfo.deviceName}`);
            return mpsInfo;
        }

        // Fallback to CPU
        this.cachedInfo = this.detectCpu();
        this.logger.warn('No GPU acceleration available, using CPU');
        return this.cachedInfo;
    }

The detector implements lazy initialization through caching. Once hardware is detected, subsequent calls return the cached result rather than repeating the detection process. This is important because hardware detection can be expensive and the hardware configuration normally does not change during application runtime. The async/await pattern is used throughout because detection shells out to system commands, which is an asynchronous operation in Node.js.

Each detection method gathers platform-specific information. For CUDA, this includes device count, memory, and compute capability. For MPS, it includes the chip architecture. This information helps the application make informed decisions about model loading and inference parameters.

    private async detectCuda(): Promise<AcceleratorInfo | null> {
        try {
            // Try to execute nvidia-smi to detect CUDA devices
            const { stdout } = await execAsync('nvidia-smi --query-gpu=name,memory.total --format=csv,noheader');
            
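            // Drop blank lines first: ''.split('\n') yields [''], so a bare
            // length check on the split result would never be zero.
            const lines = stdout.trim().split('\n').filter(line => line.trim().length > 0);
            if (lines.length === 0) {
                return null;
            }

            // Each line looks like "<name>, <memory> MiB"
            const firstLine = lines[0].split(',');
            const deviceName = firstLine[0].trim();
            const memoryStr = (firstLine[1] ?? '').trim();
            const memoryMB = parseInt(memoryStr.split(' ')[0], 10);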

            // Get compute capability
            let computeCapability: string | undefined;
            try {
                const { stdout: capStdout } = await execAsync(
                    'nvidia-smi --query-gpu=compute_cap --format=csv,noheader'
                );
                computeCapability = capStdout.trim().split('\n')[0];
            } catch (error) {
                // Compute capability not critical
            }

            return {
                acceleratorType: AcceleratorType.CUDA,
                deviceName,
                deviceCount: lines.length,
                // Leave memory unset if the value could not be parsed
                memoryAvailable: Number.isNaN(memoryMB) ? undefined : memoryMB * 1024 * 1024,
                computeCapability
            };
        } catch (error) {
            // nvidia-smi not available or failed
            return null;
        }
    }

    private async detectMps(): Promise<AcceleratorInfo | null> {
        if (os.platform() !== 'darwin') {
            return null;
        }

        try {
            // Check for Apple Silicon
            const { stdout } = await execAsync('sysctl -n machdep.cpu.brand_string');
            const cpuBrand = stdout.trim();

            if (cpuBrand.includes('Apple')) {
                return {
                    acceleratorType: AcceleratorType.MPS,
                    // The brand string already reads e.g. "Apple M1", so use it as-is
                    deviceName: cpuBrand,
                    deviceCount: 1
                };
            }
        } catch (error) {
            // Not Apple Silicon or command failed
        }

        return null;
    }

    private async detectRocm(): Promise<AcceleratorInfo | null> {
        try {
            // Try to execute rocm-smi to detect ROCm devices
            const { stdout } = await execAsync('rocm-smi --showproductname');
            
            if (stdout.includes('GPU')) {
                const lines = stdout.trim().split('\n');
                const deviceLine = lines.find(line => line.includes('GPU'));
                
                return {
                    acceleratorType: AcceleratorType.ROCM,
                    deviceName: deviceLine || 'AMD ROCm GPU',
                    deviceCount: 1
                };
            }
        } catch (error) {
            // rocm-smi not available or failed
        }

        return null;
    }

    private detectCpu(): AcceleratorInfo {
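        const cpus = os.cpus();
        const cpuCount = cpus.length;
        // os.cpus() can return an empty array when CPU details are unavailable
        const cpuModel = cpus[0]?.model ?? 'Unknown CPU';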

        return {
            acceleratorType: AcceleratorType.CPU,
            deviceName: cpuModel,
            deviceCount: cpuCount
        };
    }

    public getDeviceString(): string {
        if (!this.cachedInfo) {
            throw new Error('GPU detection not performed. Call detect() first.');
        }

        const info = this.cachedInfo;
        switch (info.acceleratorType) {
            case AcceleratorType.CUDA:
                return 'cuda:0';
            case AcceleratorType.MPS:
                return 'mps';
            case AcceleratorType.ROCM:
                return 'rocm:0';
            default:
                return 'cpu';
        }
    }

    public getOptimizationHints(): Record<string, any> {
        if (!this.cachedInfo) {
            throw new Error('GPU detection not performed. Call detect() first.');
        }

        const hints: Record<string, any> = {
            device: this.getDeviceString(),
            deviceType: this.cachedInfo.acceleratorType
        };

        // Add memory-based recommendations
        if (this.cachedInfo.memoryAvailable) {
            const memoryGB = this.cachedInfo.memoryAvailable / (1024 * 1024 * 1024);
            
            if (memoryGB < 8) {
                hints.recommendQuantization = true;
                hints.maxBatchSize = 1;
            } else if (memoryGB < 16) {
                hints.recommendQuantization = false;
                hints.maxBatchSize = 4;
            } else {
                hints.recommendQuantization = false;
                hints.maxBatchSize = 8;
            }
        }

        return hints;
    }
}

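The getDeviceString method provides the device string that inference code uses when loading models, so application code can call getDeviceString without knowing anything about the underlying hardware. One caveat: the 'rocm:0' string is specific to this component. PyTorch's ROCm builds, for example, address AMD GPUs through the 'cuda' device type, so a caller targeting such a library would need to map the string accordingly. The getOptimizationHints method provides recommendations based on detected hardware, such as whether to use quantization or what batch size to use.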
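
As a rough illustration, the memory tiers used by getOptimizationHints can be written as a standalone function. The names hintsForMemory, MemoryHints, and GIB are illustrative and do not appear in the component above; the thresholds are copied from it.

```typescript
// Hypothetical standalone version of the memory tiers in getOptimizationHints.
interface MemoryHints {
    recommendQuantization: boolean;
    maxBatchSize: number;
}

const GIB = 1024 * 1024 * 1024;

function hintsForMemory(memoryBytes: number): MemoryHints {
    const memoryGB = memoryBytes / GIB;

    if (memoryGB < 8) {
        // Small devices: quantize and process one request at a time
        return { recommendQuantization: true, maxBatchSize: 1 };
    }
    if (memoryGB < 16) {
        return { recommendQuantization: false, maxBatchSize: 4 };
    }
    return { recommendQuantization: false, maxBatchSize: 8 };
}

console.log(hintsForMemory(6 * GIB));
console.log(hintsForMemory(24 * GIB));
```

A device reporting exactly 8 GiB falls into the middle tier because the comparison is strict.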

The component uses TypeScript's type system to ensure type safety. The AcceleratorType enum provides a finite set of possible accelerator types, and the AcceleratorInfo interface defines the structure of hardware information. Typos in a type name or a malformed info object are therefore caught at compile time instead of surfacing as runtime errors.
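
One way to get more out of a closed set of accelerator types is an exhaustiveness check. The sketch below uses a string-literal union, AcceleratorKind, as a stand-in for the enum; the union, its values, and describeAccelerator are assumptions made for illustration. The same pattern should work with an enum.

```typescript
// Hypothetical stand-in for the AcceleratorType enum; the string values are assumptions.
type AcceleratorKind = 'cuda' | 'mps' | 'rocm' | 'cpu';

function describeAccelerator(kind: AcceleratorKind): string {
    switch (kind) {
        case 'cuda':
            return 'NVIDIA GPU (CUDA)';
        case 'mps':
            return 'Apple GPU (Metal Performance Shaders)';
        case 'rocm':
            return 'AMD GPU (ROCm)';
        case 'cpu':
            return 'CPU only';
        default: {
            // If a new member is added to AcceleratorKind and not handled above,
            // this assignment stops compiling.
            const unhandled: never = kind;
            throw new Error(`Unhandled accelerator: ${String(unhandled)}`);
        }
    }
}

console.log(describeAccelerator('rocm'));
```

Unlike a default branch that silently falls back to 'cpu', this variant turns a forgotten case into a compile error.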

Abstract LLM Interface Component

LLM applications often need to support multiple model providers. An application might use OpenAI's GPT models in production but switch to a local Llama model for development or privacy-sensitive deployments. Alternatively, different parts of an application might use different models optimized for specific tasks.

Hard-coding dependencies on specific LLM providers creates tight coupling that makes the application brittle and difficult to modify. The abstract LLM interface component solves this by defining a common contract that all LLM implementations must satisfy. Application code depends on this interface rather than on concrete implementations, so a model can be swapped without touching the code that uses it.

The interface defines the essential operations that any LLM must support: generating completions and streaming responses. Conversation context is not managed by the interface itself; callers pass it in as an array of messages. The interface also standardizes how parameters like temperature and top-k are passed to the model.

export interface Message {
    role: 'system' | 'user' | 'assistant';
    content: string;
    name?: string;
}

export interface CompletionResponse {
    content: string;
    model: string;
    finishReason: string;
    usage: {
        promptTokens: number;
        completionTokens: number;
        totalTokens: number;
    };
    rawResponse?: any;
}

export interface CompletionOptions {
    temperature?: number;
    maxTokens?: number;
    topP?: number;
    topK?: number;
    stop?: string[];
    presencePenalty?: number;
    frequencyPenalty?: number;
}

export abstract class BaseLLM {
    protected modelName: string;
    protected config: Record<string, any>;
    protected logger: Logger;

    constructor(modelName: string, config: Record<string, any> = {}) {
        this.modelName = modelName;
        this.config = config;
        this.logger = createLogger(this.constructor.name);
    }

    public abstract complete(
        messages: Message[],
        options?: CompletionOptions
    ): Promise<CompletionResponse>;

    public abstract streamComplete(
        messages: Message[],
        options?: CompletionOptions
    ): AsyncIterableIterator<string>;

    public getModelName(): string {
        return this.modelName;
    }

    protected validateMessages(messages: Message[]): void {
        if (!messages || messages.length === 0) {
            throw new Error('Messages array cannot be empty');
        }

        for (const message of messages) {
            if (!message.role || !message.content) {
                throw new Error('Each message must have role and content');
            }

            if (!['system', 'user', 'assistant'].includes(message.role)) {
                throw new Error(`Invalid role: ${message.role}`);
            }
        }
    }

    protected validateOptions(options?: CompletionOptions): void {
        if (!options) return;

        if (options.temperature !== undefined) {
            if (options.temperature < 0 || options.temperature > 2) {
                throw new Error('Temperature must be between 0 and 2');
            }
        }

        if (options.topP !== undefined) {
            if (options.topP < 0 || options.topP > 1) {
                throw new Error('topP must be between 0 and 1');
            }
        }

        if (options.topK !== undefined && options.topK < 1) {
            throw new Error('topK must be positive');
        }
    }
}

The interface uses TypeScript's type system extensively. The Message interface uses a union type for the role field, ensuring only valid roles can be specified. The CompletionOptions interface makes all fields optional, providing flexibility while maintaining type safety. An interface cannot carry default values, so each implementation supplies its own defaults for omitted fields.

The BaseLLM abstract class provides common functionality that all implementations can use. The validateMessages and validateOptions methods check inputs before they reach the implementation-specific code. The validation logic lives once in the base class rather than being duplicated, although each implementation is still responsible for calling it at the start of complete and streamComplete.
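
To make the benefit of coding against a contract concrete, here is a minimal sketch with two interchangeable test doubles. ChatMessage, ChatModel, EchoModel, CannedModel, and answer are simplified, hypothetical stand-ins and not part of the component above.

```typescript
// Simplified stand-ins for the Message type and the BaseLLM contract.
interface ChatMessage {
    role: 'system' | 'user' | 'assistant';
    content: string;
}

interface ChatModel {
    complete(messages: ChatMessage[]): Promise<string>;
}

// Hypothetical test double: echoes the last user message.
class EchoModel implements ChatModel {
    async complete(messages: ChatMessage[]): Promise<string> {
        const lastUser = [...messages].reverse().find(m => m.role === 'user');
        return `echo: ${lastUser?.content ?? ''}`;
    }
}

// Hypothetical test double: always returns the same text.
class CannedModel implements ChatModel {
    private reply: string;

    constructor(reply: string) {
        this.reply = reply;
    }

    async complete(_messages: ChatMessage[]): Promise<string> {
        return this.reply;
    }
}

// Application code sees only the ChatModel contract.
async function answer(model: ChatModel, question: string): Promise<string> {
    return model.complete([
        { role: 'system', content: 'Answer briefly.' },
        { role: 'user', content: question }
    ]);
}

answer(new EchoModel(), 'hello').then(text => console.log(text));
```

Because answer depends only on the contract, swapping EchoModel for CannedModel (or for a real provider) requires no change to it.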

Concrete implementations of this interface handle the specifics of communicating with different LLM providers. Here is an implementation for OpenAI's API:

import OpenAI from 'openai';

export class OpenAILLM extends BaseLLM {
    private client: OpenAI;

    constructor(modelName: string, apiKey: string, config: Record<string, any> = {}) {
        super(modelName, config);
        
        this.client = new OpenAI({
            apiKey,
            ...config
        });
    }

    public async complete(
        messages: Message[],
        options: CompletionOptions = {}
    ): Promise<CompletionResponse> {
        this.validateMessages(messages);
        this.validateOptions(options);

        const response = await this.client.chat.completions.create({
            model: this.modelName,
            messages: messages as OpenAI.Chat.ChatCompletionMessageParam[],
            temperature: options.temperature ?? 0.7,
            max_tokens: options.maxTokens,
            top_p: options.topP ?? 1.0,
            stop: options.stop,
            presence_penalty: options.presencePenalty,
            frequency_penalty: options.frequencyPenalty
        });

        const choice = response.choices[0];

        return {
            content: choice.message.content || '',
            model: response.model,
            finishReason: choice.finish_reason,
            usage: {
                promptTokens: response.usage?.prompt_tokens || 0,
                completionTokens: response.usage?.completion_tokens || 0,
                totalTokens: response.usage?.total_tokens || 0
            },
            rawResponse: response
        };
    }

    public async *streamComplete(
        messages: Message[],
        options: CompletionOptions = {}
    ): AsyncIterableIterator<string> {
        this.validateMessages(messages);
        this.validateOptions(options);

        const stream = await this.client.chat.completions.create({
            model: this.modelName,
            messages: messages as OpenAI.Chat.ChatCompletionMessageParam[],
            temperature: options.temperature ?? 0.7,
            max_tokens: options.maxTokens,
            top_p: options.topP ?? 1.0,
            stream: true,
            stop: options.stop,
            presence_penalty: options.presencePenalty,
            frequency_penalty: options.frequencyPenalty
        });

        for await (const chunk of stream) {
            const content = chunk.choices[0]?.delta?.content;
            if (content) {
                yield content;
            }
        }
    }
}

The OpenAI implementation demonstrates how the abstract interface adapts to a specific backend. The complete method makes an HTTP request to OpenAI's API and transforms the response into the standard CompletionResponse format. The streamComplete method is an async generator, a JavaScript language feature (fully typed by TypeScript) that allows values to be yielded asynchronously. Note that topK is not forwarded, because this request shape has no corresponding field.

The streaming implementation is particularly elegant in TypeScript. The async generator syntax makes it easy to consume streams using for-await-of loops. Application code can iterate over the stream naturally without managing callbacks or event listeners.
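
A small sketch of the consuming side: the helper below drains any async iterable of text chunks with a for-await-of loop. fakeStream and collect are hypothetical names; fakeStream merely stands in for a real streamComplete call.

```typescript
// Hypothetical stand-in for streamComplete: yields fixed chunks.
async function* fakeStream(chunks: string[]): AsyncIterableIterator<string> {
    for (const chunk of chunks) {
        yield chunk;
    }
}

// Consume a stream, forwarding each chunk to a callback and returning the full text.
async function collect(
    stream: AsyncIterable<string>,
    onChunk: (chunk: string) => void = () => {}
): Promise<string> {
    let text = '';
    for await (const chunk of stream) {
        onChunk(chunk); // e.g. append to a UI element or write to a terminal
        text += chunk;
    }
    return text;
}

collect(fakeStream(['Hel', 'lo', '!']), chunk => console.log('chunk:', chunk))
    .then(full => console.log('full:', full));
```

The same collect function would accept the iterator returned by any implementation of streamComplete, since it depends only on AsyncIterable<string>.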

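Here is an implementation for local models using the node-llama-cpp library. The calls below follow the constructor-based style of its older releases; later major versions changed how models and contexts are created, so treat the library-specific calls as illustrative: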

import { LlamaModel, LlamaContext, LlamaChatSession } from 'node-llama-cpp';

export class LocalLlamaLLM extends BaseLLM {
    private model?: LlamaModel;
    private context?: LlamaContext;
    private device: string;

    constructor(
        modelPath: string,
        device?: string,
        config: Record<string, any> = {}
    ) {
        super(modelPath, config);
        this.device = device || 'cpu';
    }

    public async initialize(): Promise<void> {
        this.logger.info(`Loading model from ${this.modelName} on ${this.device}`);

        this.model = new LlamaModel({
            modelPath: this.modelName,
            gpuLayers: this.device.startsWith('cuda') ? 35 : 0
        });

        this.context = new LlamaContext({
            model: this.model,
            contextSize: this.config.contextSize || 4096
        });

        this.logger.info('Model loaded successfully');
    }

    public async complete(
        messages: Message[],
        options: CompletionOptions = {}
    ): Promise<CompletionResponse> {
        if (!this.model || !this.context) {
            throw new Error('Model not initialized. Call initialize() first.');
        }

        this.validateMessages(messages);
        this.validateOptions(options);

        const session = new LlamaChatSession({
            context: this.context
        });

        // Format messages for the model
        const formattedPrompt = this.formatMessages(messages);

        const startTime = Date.now();
        const response = await session.prompt(formattedPrompt, {
            temperature: options.temperature ?? 0.7,
            topK: options.topK ?? 40,
            topP: options.topP ?? 0.95,
            maxTokens: options.maxTokens ?? 512
        });

        const endTime = Date.now();
        const duration = (endTime - startTime) / 1000;

        this.logger.info(`Generated completion in ${duration.toFixed(2)}s`);

        // Estimate token counts (simplified)
        const promptTokens = Math.ceil(formattedPrompt.length / 4);
        const completionTokens = Math.ceil(response.length / 4);

        return {
            content: response,
            model: this.modelName,
            finishReason: 'stop',
            usage: {
                promptTokens,
                completionTokens,
                totalTokens: promptTokens + completionTokens
            }
        };
    }

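    public async *streamComplete(
        messages: Message[],
        options: CompletionOptions = {}
    ): AsyncIterableIterator<string> {
        if (!this.model || !this.context) {
            throw new Error('Model not initialized. Call initialize() first.');
        }

        this.validateMessages(messages);
        this.validateOptions(options);

        const context = this.context;
        const session = new LlamaChatSession({ context });

        const formattedPrompt = this.formatMessages(messages);

        // The library reports tokens through the onToken callback rather than
        // an async iterable, so buffer decoded text in a queue and drain it here.
        const queue: string[] = [];
        let finished = false;
        let failure: unknown;
        let wake: () => void = () => {};

        const generation = session.prompt(formattedPrompt, {
            temperature: options.temperature ?? 0.7,
            topK: options.topK ?? 40,
            topP: options.topP ?? 0.95,
            maxTokens: options.maxTokens ?? 512,
            onToken: (tokens) => {
                queue.push(context.decode(tokens));
                wake();
            }
        })
            .catch((error: unknown) => { failure = error; })
            .finally(() => { finished = true; wake(); });

        while (!finished || queue.length > 0) {
            if (queue.length === 0) {
                // Sleep until the next token arrives or generation ends
                await new Promise<void>(resolve => { wake = resolve; });
                continue;
            }
            yield queue.shift()!;
        }

        await generation;
        if (failure !== undefined) {
            throw failure;
        }
    }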

    private formatMessages(messages: Message[]): string {
        const formatted = messages.map(msg => {
            const prefix = msg.role === 'user' ? 'User' : 
                          msg.role === 'assistant' ? 'Assistant' : 
                          'System';
            return `${prefix}: ${msg.content}`;
        });

        return formatted.join('\n\n') + '\n\nAssistant:';
    }

    public async dispose(): Promise<void> {
        if (this.context) {
            this.context.dispose();
        }
        if (this.model) {
            this.model.dispose();
        }
        this.logger.info('Model resources disposed');
    }
}

The local implementation shows how the same interface adapts to a completely different backend. Instead of making HTTP requests, it loads models into memory and runs inference locally. The initialize method is specific to local models because they require loading before use, while API-based models are ready immediately.

The dispose method demonstrates explicit resource management. Local models consume significant memory and GPU resources that must be released explicitly. TypeScript's type system cannot force callers to invoke dispose, so calling code should release the model in a finally block or an equivalent cleanup hook.
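
One common way to make sure cleanup runs is a small wrapper built on try/finally. The sketch below is a generic pattern, not part of the component above; Releasable, withResource, and FakeModel are hypothetical names.

```typescript
// Minimal shape shared by anything that must be released explicitly.
interface Releasable {
    dispose(): Promise<void>;
}

// Run `work` with a resource and always release it, even if `work` throws.
async function withResource<R extends Releasable, T>(
    resource: R,
    work: (resource: R) => Promise<T>
): Promise<T> {
    try {
        return await work(resource);
    } finally {
        await resource.dispose();
    }
}

// Hypothetical stand-in for a loaded local model.
class FakeModel implements Releasable {
    public disposed = false;

    async generate(prompt: string): Promise<string> {
        return `reply to: ${prompt}`;
    }

    async dispose(): Promise<void> {
        this.disposed = true;
    }
}

const model = new FakeModel();
withResource(model, m => m.generate('hi'))
    .then(reply => console.log(reply, '| disposed:', model.disposed));
```

With a wrapper like this, the caller cannot forget the cleanup step, because the only way to use the resource is through the function that releases it.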

Configuration Management Component

LLM applications require extensive configuration. Model parameters like temperature and top-k affect generation quality. Context window sizes determine how much conversation history the model can consider. API keys and endpoints vary between environments. Hard-coding these values makes applications inflexible and difficult to maintain.

The configuration management component provides a clean way to externalize all configuration into files that can be modified without changing code. It supports both JSON and YAML formats, validates configuration values, and provides sensible defaults.

The component uses a hierarchical structure where general settings can be overridden by environment-specific values. For example, a base configuration might specify default model parameters, while a production configuration overrides the model endpoint and API key.

import * as fs from 'fs/promises';
import * as path from 'path';
import * as yaml from 'js-yaml';
import { z } from 'zod';

// Define configuration schema using Zod for runtime validation
const LLMModelConfigSchema = z.object({
    modelName: z.string(),
    temperature: z.number().min(0).max(2).default(0.7),
    maxTokens: z.number().positive().optional(),
    topP: z.number().min(0).max(1).default(1.0),
    topK: z.number().positive().optional(),
    contextWindow: z.number().positive().default(4096),
    systemMessage: z.string().optional()
});

const ApplicationConfigSchema = z.object({
    llm: LLMModelConfigSchema,
    apiKeys: z.record(z.string()).default({}),
    loggingLevel: z.enum(['DEBUG', 'INFO', 'WARNING', 'ERROR', 'CRITICAL']).default('INFO'),
    enableStreaming: z.boolean().default(true),
    maxRetries: z.number().nonnegative().default(3),
    timeoutSeconds: z.number().positive().default(30)
});

export type LLMModelConfig = z.infer<typeof LLMModelConfigSchema>;
export type ApplicationConfig = z.infer<typeof ApplicationConfigSchema>;

export class ConfigurationManager {
    private configPath?: string;
    private config?: ApplicationConfig;
    private logger: Logger;

    constructor(configPath?: string) {
        this.configPath = configPath;
        this.logger = createLogger('ConfigurationManager');
    }

    public async load(configPath?: string): Promise<ApplicationConfig> {
        const pathToLoad = configPath || this.configPath;

        if (!pathToLoad) {
            this.logger.warn('No configuration file specified, using defaults');
            return this.createDefaultConfig();
        }

        try {
            await fs.access(pathToLoad);
        } catch (error) {
            throw new Error(`Configuration file not found: ${pathToLoad}`);
        }

        const ext = path.extname(pathToLoad).toLowerCase();
        let configData: any;

        if (ext === '.json') {
            configData = await this.loadJson(pathToLoad);
        } else if (ext === '.yaml' || ext === '.yml') {
            configData = await this.loadYaml(pathToLoad);
        } else {
            throw new Error(`Unsupported configuration format: ${ext}`);
        }

        // Validate and parse configuration
        try {
            this.config = ApplicationConfigSchema.parse(configData);
            this.logger.info(`Loaded configuration from ${pathToLoad}`);
            return this.config;
        } catch (error) {
            if (error instanceof z.ZodError) {
                const issues = error.issues.map(i => `${i.path.join('.')}: ${i.message}`);
                throw new Error(`Configuration validation failed:\n${issues.join('\n')}`);
            }
            throw error;
        }
    }

The configuration manager uses Zod, a TypeScript-first schema validation library. Zod schemas provide both runtime validation and compile-time type inference. The z.infer utility extracts TypeScript types from schemas, ensuring that the type definitions always match the validation rules.

This approach provides several benefits. Configuration is validated at load time, catching errors before they cause runtime failures. The type system ensures that code using the configuration accesses only valid fields. Default values are specified once in the schema rather than scattered throughout the code.

    private async loadJson(filePath: string): Promise<any> {
        const content = await fs.readFile(filePath, 'utf-8');
        return JSON.parse(content);
    }

    private async loadYaml(filePath: string): Promise<any> {
        const content = await fs.readFile(filePath, 'utf-8');
        return yaml.load(content);
    }

    private createDefaultConfig(): ApplicationConfig {
        return ApplicationConfigSchema.parse({
            llm: {
                modelName: 'gpt-3.5-turbo',
                temperature: 0.7,
                topP: 1.0,
                contextWindow: 4096
            },
            apiKeys: {},
            loggingLevel: 'INFO',
            enableStreaming: true,
            maxRetries: 3,
            timeoutSeconds: 30
        });
    }

    public async save(config: ApplicationConfig, outputPath: string): Promise<void> {
        const ext = path.extname(outputPath).toLowerCase();

        let content: string;
        if (ext === '.json') {
            content = JSON.stringify(config, null, 2);
        } else if (ext === '.yaml' || ext === '.yml') {
            content = yaml.dump(config);
        } else {
            throw new Error(`Unsupported output format: ${ext}`);
        }

        await fs.writeFile(outputPath, content, 'utf-8');
        this.logger.info(`Saved configuration to ${outputPath}`);
    }

    public getConfig(): ApplicationConfig {
        if (!this.config) {
            throw new Error('Configuration not loaded. Call load() first.');
        }
        return this.config;
    }

    public async merge(overrides: Partial<ApplicationConfig>): Promise<ApplicationConfig> {
        const current = this.config || this.createDefaultConfig();
        
        const merged = {
            ...current,
            ...overrides,
            llm: {
                ...current.llm,
                ...(overrides.llm || {})
            },
            apiKeys: {
                ...current.apiKeys,
                ...(overrides.apiKeys || {})
            }
        };

        this.config = ApplicationConfigSchema.parse(merged);
        return this.config;
    }

    public async loadWithEnvironmentOverrides(
        configPath: string,
        envPrefix: string = 'LLM_'
    ): Promise<ApplicationConfig> {
        const baseConfig = await this.load(configPath);

        const overrides: Partial<ApplicationConfig> = {};

        // Check for environment variable overrides
        if (process.env[`${envPrefix}MODEL_NAME`]) {
            overrides.llm = {
                ...baseConfig.llm,
                modelName: process.env[`${envPrefix}MODEL_NAME`]!
            };
        }

        if (process.env[`${envPrefix}TEMPERATURE`]) {
            overrides.llm = {
                ...(overrides.llm || baseConfig.llm),
                temperature: parseFloat(process.env[`${envPrefix}TEMPERATURE`]!)
            };
        }

        if (process.env[`${envPrefix}API_KEY`]) {
            overrides.apiKeys = {
                ...baseConfig.apiKeys,
                default: process.env[`${envPrefix}API_KEY`]!
            };
        }

        if (Object.keys(overrides).length > 0) {
            this.logger.info('Applied environment variable overrides');
            return this.merge(overrides);
        }

        return baseConfig;
    }
}

The save method enables round-tripping configuration. Applications can load configuration, modify it programmatically, and save the updated version. This is useful for tools that help users configure the application through a graphical interface.

The merge method provides a way to combine configurations. This is particularly useful when loading a base configuration and applying environment-specific overrides. The method merges one level deep: top-level fields are replaced, while the nested llm and apiKeys objects are merged key by key rather than replaced entirely. Because the result is parsed through the schema again, invalid overrides are rejected.
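
The difference between replacing and merging a nested object is easy to miss, so here is a reduced sketch of the same idea without Zod. ModelSettings, AppSettings, AppOverrides, and mergeSettings are hypothetical, cut-down types chosen for illustration.

```typescript
interface ModelSettings {
    modelName: string;
    temperature: number;
}

interface AppSettings {
    llm: ModelSettings;
    apiKeys: Record<string, string>;
    maxRetries: number;
}

// Overrides may supply only some fields of the nested objects.
interface AppOverrides {
    llm?: Partial<ModelSettings>;
    apiKeys?: Record<string, string>;
    maxRetries?: number;
}

function mergeSettings(base: AppSettings, overrides: AppOverrides): AppSettings {
    return {
        // Scalar fields: the override wins when present
        maxRetries: overrides.maxRetries ?? base.maxRetries,
        // Nested objects: merge key by key so untouched keys survive
        llm: { ...base.llm, ...overrides.llm },
        apiKeys: { ...base.apiKeys, ...overrides.apiKeys }
    };
}

const base: AppSettings = {
    llm: { modelName: 'base-model', temperature: 0.7 },
    apiKeys: { openai: 'from-file' },
    maxRetries: 3
};

const merged = mergeSettings(base, {
    llm: { temperature: 0.2 },
    apiKeys: { anthropic: 'from-env' }
});

console.log(merged);
```

A plain top-level spread would have replaced llm with an object containing only temperature, losing modelName.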

The loadWithEnvironmentOverrides method demonstrates a common pattern in production applications. Base configuration comes from a file, but sensitive values like API keys and environment-specific settings come from environment variables. This allows the same configuration file to be used across environments while keeping secrets out of version control.
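
The environment lookup can also be isolated into a pure function, which makes it easy to test without touching process.env. This is a sketch under assumptions: readEnvOverrides, Env, and EnvOverrides are hypothetical names, and rejecting non-numeric temperatures here is a design choice rather than something the component above does at this point.

```typescript
type Env = Record<string, string | undefined>;

interface EnvOverrides {
    modelName?: string;
    temperature?: number;
    apiKey?: string;
}

function readEnvOverrides(env: Env, prefix: string = 'LLM_'): EnvOverrides {
    const overrides: EnvOverrides = {};

    const modelName = env[`${prefix}MODEL_NAME`];
    if (modelName) {
        overrides.modelName = modelName;
    }

    const rawTemperature = env[`${prefix}TEMPERATURE`];
    if (rawTemperature !== undefined && rawTemperature.trim() !== '') {
        const temperature = Number(rawTemperature);
        // Fail fast on values such as "warm" instead of passing NaN along
        if (!Number.isFinite(temperature)) {
            throw new Error(`${prefix}TEMPERATURE is not a number: ${rawTemperature}`);
        }
        overrides.temperature = temperature;
    }

    const apiKey = env[`${prefix}API_KEY`];
    if (apiKey) {
        overrides.apiKey = apiKey;
    }

    return overrides;
}

console.log(readEnvOverrides({ LLM_MODEL_NAME: 'local-model', LLM_TEMPERATURE: '0.2' }));
```

In an application, the first argument would typically be process.env; in tests it can be any plain object.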

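A typical configuration file in YAML format might look like this. The apiKeys entries are placeholders; in line with the pattern above, real keys would normally be supplied through environment variables rather than committed in the file: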

llm:
  modelName: "meta-llama/Llama-2-7b-chat-hf"
  temperature: 0.8
  maxTokens: 2048
  topP: 0.95
  topK: 50
  contextWindow: 4096
  systemMessage: "You are a helpful AI assistant."

apiKeys:
  openai: "sk-..."
  anthropic: "sk-ant-..."

loggingLevel: "INFO"
enableStreaming: true
maxRetries: 3
timeoutSeconds: 60

The hierarchical structure makes configuration files readable and maintainable. Related settings are grouped together, and the structure mirrors the code's type definitions, making it easy to understand how configuration maps to application behavior.

Tool Calling Framework Component

Modern LLMs can use external tools to extend their capabilities beyond text generation. A model might call a web search API to find current information, execute code to perform calculations, or query a database to retrieve specific data. The tool calling framework component provides infrastructure for defining tools, invoking them safely, and integrating results back into the conversation.

The framework uses a plugin architecture where each tool is a self-contained unit with a clear interface. Tools declare their name, description, and parameters using a schema that the LLM can understand. When the model decides to use a tool, the framework validates the parameters, executes the tool, and formats the result.

import { z } from 'zod';

export interface ToolParameter {
    name: string;
    type: 'string' | 'number' | 'boolean' | 'array' | 'object';
    description: string;
    required: boolean;
    enum?: any[];
}

export interface ToolSchema {
    name: string;
    description: string;
    parameters: ToolParameter[];
}

export interface ToolExecutionResult {
    success: boolean;
    result?: any;
    error?: string;
}

export abstract class BaseTool {
    protected logger: Logger;

    constructor() {
        this.logger = createLogger(this.constructor.name);
    }

    public abstract getSchema(): ToolSchema;

    public abstract execute(parameters: Record<string, any>): Promise<any>;

    public async validateAndExecute(parameters: Record<string, any>): Promise<ToolExecutionResult> {
        const schema = this.getSchema();

        // Validate required parameters
        for (const param of schema.parameters) {
            if (param.required && !(param.name in parameters)) {
                return {
                    success: false,
                    error: `Missing required parameter: ${param.name}`
                };
            }
        }

        try {
            const result = await this.execute(parameters);
            return {
                success: true,
                result
            };
        } catch (error) {
            this.logger.error('Tool execution failed', error);
            return {
                success: false,
                error: error instanceof Error ? error.message : String(error)
            };
        }
    }

    public toOpenAIFormat(): Record<string, any> {
        const schema = this.getSchema();
        const properties: Record<string, any> = {};
        const required: string[] = [];

        for (const param of schema.parameters) {
            properties[param.name] = {
                type: param.type,
                description: param.description
            };

            if (param.enum) {
                properties[param.name].enum = param.enum;
            }

            if (param.required) {
                required.push(param.name);
            }
        }

        return {
            type: 'function',
            function: {
                name: schema.name,
                description: schema.description,
                parameters: {
                    type: 'object',
                    properties,
                    required
                }
            }
        };
    }
}

The BaseTool class defines the contract that all tools must implement. The getSchema method returns metadata that describes what the tool does and what parameters it accepts. The execute method performs the actual work. The validateAndExecute method adds a safety layer: it checks that every required parameter is present before execution and converts thrown errors into a failed ToolExecutionResult. It does not check parameter types or enum values, so tools must still validate the values they receive.
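
Type checking could be layered on top of the presence check. The following is a minimal sketch of one way to do it; ParamSpec, matchesType, and findParameterErrors are hypothetical names and not part of the framework above.

```typescript
type ParamType = 'string' | 'number' | 'boolean' | 'array' | 'object';

interface ParamSpec {
    name: string;
    type: ParamType;
    required: boolean;
}

function matchesType(value: unknown, type: ParamType): boolean {
    switch (type) {
        case 'array':
            return Array.isArray(value);
        case 'object':
            return typeof value === 'object' && value !== null && !Array.isArray(value);
        default:
            // 'string', 'number', and 'boolean' map directly onto typeof
            return typeof value === type;
    }
}

// Returns a list of human-readable problems; an empty list means the input is acceptable.
function findParameterErrors(
    specs: ParamSpec[],
    params: Record<string, unknown>
): string[] {
    const errors: string[] = [];

    for (const spec of specs) {
        const value = params[spec.name];
        if (value === undefined) {
            if (spec.required) {
                errors.push(`Missing required parameter: ${spec.name}`);
            }
            continue;
        }
        if (!matchesType(value, spec.type)) {
            errors.push(`Parameter ${spec.name} must be of type ${spec.type}`);
        }
    }

    return errors;
}

const searchSpecs: ParamSpec[] = [
    { name: 'query', type: 'string', required: true },
    { name: 'maxResults', type: 'number', required: false }
];

console.log(findParameterErrors(searchSpecs, { query: 42 }));
```

Collecting all problems instead of stopping at the first gives the model a complete error message to correct its next tool call.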

The toOpenAIFormat method shows how the schema can be adapted to different formats. This flexibility allows the same tool definitions to work with different LLM providers. The method transforms the internal schema representation into the format that OpenAI's function calling API expects.
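
The same internal schema can be mapped to other providers in a similar way. As a sketch, the function below targets the tool shape used by Anthropic's Messages API, which takes name, description, and an input_schema JSON Schema object. ParamDef, SchemaDef, toAnthropicFormat, and the calculator schema are hypothetical names introduced for illustration.

```typescript
interface ParamDef {
    name: string;
    type: 'string' | 'number' | 'boolean' | 'array' | 'object';
    description: string;
    required: boolean;
    enum?: unknown[];
}

interface SchemaDef {
    name: string;
    description: string;
    parameters: ParamDef[];
}

interface JsonSchemaProperty {
    type: string;
    description: string;
    enum?: unknown[];
}

function toAnthropicFormat(schema: SchemaDef) {
    const properties: Record<string, JsonSchemaProperty> = {};
    const required: string[] = [];

    for (const param of schema.parameters) {
        properties[param.name] = {
            type: param.type,
            description: param.description,
            // Only emit `enum` when the parameter declares one
            ...(param.enum ? { enum: param.enum } : {})
        };
        if (param.required) {
            required.push(param.name);
        }
    }

    return {
        name: schema.name,
        description: schema.description,
        input_schema: { type: 'object', properties, required }
    };
}

// Hypothetical example tool
const calculatorSchema: SchemaDef = {
    name: 'calculator',
    description: 'Evaluate an arithmetic expression.',
    parameters: [
        { name: 'expression', type: 'string', description: 'Expression to evaluate', required: true },
        { name: 'mode', type: 'string', description: 'Angle mode', required: false, enum: ['deg', 'rad'] }
    ]
};

console.log(JSON.stringify(toAnthropicFormat(calculatorSchema), null, 2));
```

Keeping one internal schema and a small adapter per provider means tool authors never deal with provider formats directly.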

Here is a concrete implementation of a web search tool using DuckDuckGo:

import axios from 'axios';

interface SearchResult {
    position: number;
    title: string;
    url: string;
    snippet: string;
}

interface WebSearchResult {
    query: string;
    timestamp: string;
    results: SearchResult[];
    count: number;
}

export class WebSearchTool extends BaseTool {
    private maxResults: number;

    constructor(maxResults: number = 5) {
        super();
        this.maxResults = maxResults;
    }

    public getSchema(): ToolSchema {
        return {
            name: 'web_search',
            description: 'Search the web for current information using DuckDuckGo. Use this when you need up-to-date information or facts that you do not have in your training data.',
            parameters: [
                {
                    name: 'query',
                    type: 'string',
                    description: 'The search query to execute',
                    required: true
                },
                {
                    name: 'maxResults',
                    type: 'number',
                    description: 'Maximum number of results to return (default: 5)',
                    required: false
                }
            ]
        };
    }

    public async execute(parameters: Record<string, any>): Promise<WebSearchResult> {
        const query = parameters.query as string;
        const maxResults = (parameters.maxResults as number) || this.maxResults;

        this.logger.info(`Executing web search: ${query}`);

        try {
            // Query DuckDuckGo's HTML-only endpoint (not an official API; the markup may change)
            const response = await axios.get('https://html.duckduckgo.com/html/', {
                params: { q: query },
                headers: {
                    'User-Agent': 'Mozilla/5.0 (compatible; LLMApp/1.0)'
                }
            });

            const results = this.parseSearchResults(response.data, maxResults);

            return {
                query,
                timestamp: new Date().toISOString(),
                results,
                count: results.length
            };
        } catch (error) {
            this.logger.error('Search failed', error);
            throw new Error(`Web search failed: ${error instanceof Error ? error.message : String(error)}`);
        }
    }

    private parseSearchResults(html: string, maxResults: number): SearchResult[] {
        const results: SearchResult[] = [];
        
        // Simple regex-based parsing (in production, use a proper HTML parser)
        const resultPattern = /<a class="result__a" href="([^"]+)">([^<]+)<\/a>[\s\S]*?<a class="result__snippet"[^>]*>([^<]+)</g;
        
        let match;
        let position = 1;
        
        while ((match = resultPattern.exec(html)) !== null && position <= maxResults) {
            results.push({
                position,
                title: this.decodeHtml(match[2]),
                url: this.decodeHtml(match[1]),
                snippet: this.decodeHtml(match[3])
            });
            position++;
        }

        return results;
    }

    private decodeHtml(text: string): string {
        return text
            .replace(/&amp;/g, '&')
            .replace(/&lt;/g, '<')
            .replace(/&gt;/g, '>')
            .replace(/&quot;/g, '"')
            .replace(/&#39;/g, "'")
            .trim();
    }
}

The web search tool demonstrates several important patterns. It hides the details of querying DuckDuckGo's HTML endpoint and scraping the response behind a simple interface. The execute method returns structured data that is easy for both the LLM and application code to process. Search failures are logged and rethrown with context; the validateAndExecute wrapper in BaseTool then converts them into a failed result rather than letting them crash the application.

The tool framework also includes a registry that manages available tools and routes execution requests:

export class ToolRegistry {
    private tools: Map<string, BaseTool>;
    private logger: Logger;

    constructor() {
        this.tools = new Map();
        this.logger = createLogger('ToolRegistry');
    }

    public register(tool: BaseTool): void {
        const schema = tool.getSchema();
        this.tools.set(schema.name, tool);
        this.logger.info(`Registered tool: ${schema.name}`);
    }

    public getTool(name: string): BaseTool | undefined {
        return this.tools.get(name);
    }

    public getAllSchemas(): ToolSchema[] {
        return Array.from(this.tools.values()).map(tool => tool.getSchema());
    }

    public getAllOpenAIFormats(): Record<string, any>[] {
        return Array.from(this.tools.values()).map(tool => tool.toOpenAIFormat());
    }

    public async executeTool(name: string, parameters: Record<string, any>): Promise<ToolExecutionResult> {
        const tool = this.getTool(name);
        
        if (!tool) {
            return {
                success: false,
                error: `Tool not found: ${name}`
            };
        }

        return tool.validateAndExecute(parameters);
    }

    public hasTools(): boolean {
        return this.tools.size > 0;
    }

    public getToolNames(): string[] {
        return Array.from(this.tools.keys());
    }
}

The registry provides a central point for managing tools. Applications register all available tools at startup, and the registry handles routing execution requests to the appropriate tool. This centralization makes it easy to add, remove, or modify tools without changing the core application logic.
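
The registration and routing flow can be sketched in a few lines. This is a minimal sketch, not the framework's code: MiniTool, MiniRegistry, and the echo tool are simplified, hypothetical stand-ins for BaseTool and ToolRegistry so that the snippet runs on its own.

```typescript
// Simplified stand-ins for BaseTool / ToolRegistry (assumptions, for illustration only).
interface ToolResult {
    success: boolean;
    data?: unknown;
    error?: string;
}

interface MiniTool {
    name: string;
    run(parameters: Record<string, unknown>): Promise<ToolResult>;
}

class MiniRegistry {
    private tools = new Map<string, MiniTool>();

    register(tool: MiniTool): void {
        // Registering a second tool under the same name replaces the first.
        this.tools.set(tool.name, tool);
    }

    getToolNames(): string[] {
        return Array.from(this.tools.keys());
    }

    async executeTool(name: string, parameters: Record<string, unknown>): Promise<ToolResult> {
        const tool = this.tools.get(name);
        if (!tool) {
            // Unknown tools produce a failed result instead of throwing.
            return { success: false, error: `Tool not found: ${name}` };
        }
        return tool.run(parameters);
    }
}

// Hypothetical tool that returns its input unchanged.
const echoTool: MiniTool = {
    name: 'echo',
    run: async (parameters) => ({ success: true, data: parameters })
};

const registry = new MiniRegistry();
registry.register(echoTool);

// Routing happens by name; the caller never touches the tool instance directly.
registry.executeTool('echo', { text: 'hello' }).then(result => console.log(result));
registry.executeTool('missing', {}).then(result => console.log(result.error));
```

Because lookup is by name, replacing a tool only requires registering a different instance under the same name at startup.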

For paid services like SERP API, the framework supports the same interface with different implementations:

export class SerpAPISearchTool extends BaseTool {
    private apiKey: string;
    private maxResults: number;

    constructor(apiKey: string, maxResults: number = 5) {
        super();
        this.apiKey = apiKey;
        this.maxResults = maxResults;
    }

    public getSchema(): ToolSchema {
        return {
            name: 'web_search',
            description: 'Search the web using SERP API for high-quality, structured results.',
            parameters: [
                {
                    name: 'query',
                    type: 'string',
                    description: 'The search query',
                    required: true
                },
                {
                    name: 'maxResults',
                    type: 'number',
                    description: 'Maximum results to return',
                    required: false
                }
            ]
        };
    }

    public async execute(parameters: Record<string, any>): Promise<WebSearchResult> {
        const query = parameters.query as string;
        const maxResults = (parameters.maxResults as number) || this.maxResults;

        const response = await axios.get('https://serpapi.com/search', {
            params: {
                q: query,
                api_key: this.apiKey,
                num: maxResults
            }
        });

        const organicResults = response.data.organic_results || [];

        const results: SearchResult[] = organicResults
            .slice(0, maxResults)
            .map((result: any, index: number) => ({
                position: index + 1,
                title: result.title || '',
                url: result.link || '',
                snippet: result.snippet || ''
            }));

        return {
            query,
            timestamp: new Date().toISOString(),
            results,
            count: results.length
        };
    }
}

Both search tools implement the same schema, making them interchangeable. An application can switch from the free DuckDuckGo service to the paid SERP API by simply registering a different tool instance, without changing any other code.

Model Context Protocol Integration Component

The Model Context Protocol, developed by Anthropic, provides a standardized way for LLM applications to access external context sources. MCP servers expose resources like files, databases, or APIs through a uniform interface. MCP clients consume these resources and make them available to LLMs.

The MCP integration component provides both client and server implementations, allowing applications to act as either consumers or providers of context. This enables sophisticated architectures where multiple applications share context through MCP.

export interface MCPResource {
    uri: string;
    name: string;
    description?: string;
    mimeType?: string;
}

export interface MCPTool {
    name: string;
    description: string;
    inputSchema: Record<string, any>;
}

export interface MCPResourceContent {
    uri: string;
    content: string;
    mimeType: string;
}

export class MCPClient {
    private serverUrl: string;
    private logger: Logger;
    private connected: boolean;

    constructor(serverUrl: string) {
        this.serverUrl = serverUrl;
        this.logger = createLogger('MCPClient');
        this.connected = false;
    }

    public async connect(): Promise<void> {
        this.logger.info(`Connecting to MCP server: ${this.serverUrl}`);
        
        // In a real implementation, this would establish a WebSocket or HTTP connection
        // For this example, we show the interface structure
        this.connected = true;
    }

    public async listResources(): Promise<MCPResource[]> {
        if (!this.connected) {
            throw new Error('Not connected to MCP server');
        }

        this.logger.info('Listing MCP resources');
        
        // Real implementation would make an RPC call
        return [];
    }

    public async readResource(uri: string): Promise<MCPResourceContent> {
        if (!this.connected) {
            throw new Error('Not connected to MCP server');
        }

        this.logger.info(`Reading resource: ${uri}`);
        
        // Real implementation would fetch the resource
        return {
            uri,
            content: '',
            mimeType: 'text/plain'
        };
    }

    public async listTools(): Promise<MCPTool[]> {
        if (!this.connected) {
            throw new Error('Not connected to MCP server');
        }

        this.logger.info('Listing MCP tools');
        return [];
    }

    public async callTool(name: string, arguments_: Record<string, any>): Promise<any> {
        if (!this.connected) {
            throw new Error('Not connected to MCP server');
        }

        this.logger.info(`Calling MCP tool: ${name}`);
        
        // Real implementation would make RPC call
        return { result: null };
    }

    public async disconnect(): Promise<void> {
        if (this.connected) {
            this.logger.info('Disconnecting from MCP server');
            this.connected = false;
        }
    }

    public isConnected(): boolean {
        return this.connected;
    }
}

The MCP client provides asynchronous methods for every operation that talks to the server because network communication is inherently asynchronous. The async/await pattern makes the code easy to read and maintain while handling the complexity of asynchronous operations.

The client separates resource access from tool invocation. Resources are passive data sources that the client reads. Tools are active operations that the server executes on behalf of the client. This distinction is important because it affects caching, permissions, and error handling.
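
The stubbed client methods note that a real implementation would make an RPC call. MCP messages are JSON-RPC 2.0, so the request bodies for reading a resource and calling a tool might look roughly like the sketch below. The helper names and the incrementing id counter are assumptions made for illustration; the transport, the initialization handshake, and response handling are omitted.

```typescript
// JSON-RPC 2.0 request envelope as used by MCP.
interface JsonRpcRequest {
    jsonrpc: '2.0';
    id: number;
    method: string;
    params?: Record<string, unknown>;
}

let nextId = 1; // simple per-process id counter (an assumption for this sketch)

function buildRequest(method: string, params?: Record<string, unknown>): JsonRpcRequest {
    return { jsonrpc: '2.0', id: nextId++, method, params };
}

// Reading a resource is a passive lookup by URI.
function readResourceRequest(uri: string): JsonRpcRequest {
    return buildRequest('resources/read', { uri });
}

// Calling a tool asks the server to run an operation with arguments.
function callToolRequest(toolName: string, args: Record<string, unknown>): JsonRpcRequest {
    return buildRequest('tools/call', { name: toolName, arguments: args });
}

const readReq = readResourceRequest('file:///notes/todo.txt');
const callReq = callToolRequest('search_files', { query: 'deadline' });

console.log(JSON.stringify(readReq));
console.log(JSON.stringify(callReq));
```

Keeping the two request builders separate mirrors the resource and tool split: a client could cache responses to resources/read by URI, while tools/call requests should generally be sent every time because the server may perform side effects.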

The server implementation mirrors the client:

type ResourceHandler = () => Promise<string>;
type ToolHandler = (args: Record<string, any>) => Promise<any>;

interface RegisteredResource {
    name: string;
    handler: ResourceHandler;
    description?: string;
    mimeType?: string;
}

interface RegisteredTool {
    handler: ToolHandler;
    description: string;
    inputSchema: Record<string, any>;
}

export class MCPServer {
    private name: string;
    private version: string;
    private resources: Map<string, RegisteredResource>;
    private tools: Map<string, RegisteredTool>;
    private logger: Logger;

    constructor(name: string, version: string) {
        this.name = name;
        this.version = version;
        this.resources = new Map();
        this.tools = new Map();
        this.logger = createLogger('MCPServer');
    }

    public registerResource(
        uri: string,
        name: string,
        handler: ResourceHandler,
        options?: {
            description?: string;
            mimeType?: string;
        }
    ): void {
        this.resources.set(uri, {
            name,
            handler,
            description: options?.description,
            mimeType: options?.mimeType
        });
        this.logger.info(`Registered resource: ${uri}`);
    }

    public registerTool(
        name: string,
        handler: ToolHandler,
        description: string,
        inputSchema: Record<string, any>
    ): void {
        this.tools.set(name, {
            handler,
            description,
            inputSchema
        });
        this.logger.info(`Registered tool: ${name}`);
    }

    public async handleListResources(): Promise<MCPResource[]> {
        const resources: MCPResource[] = [];
        
        for (const [uri, info] of this.resources.entries()) {
            resources.push({
                uri,
                name: info.name,
                description: info.description,
                mimeType: info.mimeType
            });
        }

        return resources;
    }

    public async handleReadResource(uri: string): Promise<MCPResourceContent> {
        const resource = this.resources.get(uri);
        
        if (!resource) {
            throw new Error(`Resource not found: ${uri}`);
        }

        const content = await resource.handler();

        return {
            uri,
            content,
            mimeType: resource.mimeType || 'text/plain'
        };
    }

    public async handleCallTool(name: string, arguments_: Record<string, any>): Promise<any> {
        const tool = this.tools.get(name);
        
        if (!tool) {
            throw new Error(`Tool not found: ${name}`);
        }

        const result = await tool.handler(arguments_);
        return { result };
    }

    public getServerInfo(): { name: string; version: string } {
        return {
            name: this.name,
            version: this.version
        };
    }
}

The server uses a registration pattern where handlers are registered for specific URIs and tool names. This makes it easy to add new resources and tools dynamically. The handlers are async functions, allowing them to perform I/O operations efficiently.

The following example uses the MCP server to expose the text files in a directory tree:

import * as fs from 'fs/promises';
import * as path from 'path';

export async function createFileServer(basePath: string): Promise<MCPServer> {
    const server = new MCPServer('file_server', '1.0.0');
    const baseDir = path.resolve(basePath);

    // Register resources for text files
    const files = await fs.readdir(baseDir, { recursive: true });
    
    for (const file of files) {
        const filePath = path.join(baseDir, file as string);
        const stat = await fs.stat(filePath);
        
        if (stat.isFile() && filePath.endsWith('.txt')) {
            const relativePath = path.relative(baseDir, filePath);
            const uri = `file:///${relativePath.replace(/\\/g, '/')}`;

            server.registerResource(
                uri,
                relativePath,
                async () => {
                    return await fs.readFile(filePath, 'utf-8');
                },
                {
                    description: `Text file: ${relativePath}`,
                    mimeType: 'text/plain'
                }
            );
        }
    }

    // Register a search tool
    server.registerTool(
        'search_files',
        async (args: Record<string, any>) => {
            const query = args.query as string;
            const results: string[] = [];

            for (const file of files) {
                const filePath = path.join(baseDir, file as string);
                const stat = await fs.stat(filePath);
                
                if (stat.isFile() && filePath.endsWith('.txt')) {
                    const content = await fs.readFile(filePath, 'utf-8');
                    if (content.toLowerCase().includes(query.toLowerCase())) {
                        results.push(path.relative(baseDir, filePath));
                    }
                }
            }

            return results;
        },
        'Search for files containing specific text',
        {
            type: 'object',
            properties: {
                query: {
                    type: 'string',
                    description: 'Text to search for'
                }
            },
            required: ['query']
        }
    );

    return server;
}

This example shows how MCP enables powerful integrations. The file server walks a directory tree and exposes every .txt file it finds as an MCP resource, making those files accessible to any MCP client. Files with other extensions are skipped, and the file list is captured once when the server is created. The search tool provides a way to find files by content, demonstrating how MCP tools can perform operations beyond simple data retrieval.

Message Management and Chat History Component

Conversational LLM applications must manage the flow of messages between users and the model. Each interaction involves system messages that set behavior, user messages containing requests, and assistant messages with responses. Managing this conversation state correctly is essential for coherent multi-turn interactions.

The message management component provides structures for representing messages and utilities for managing conversation history. It handles concerns like context window limits, message formatting, and conversation persistence.

export interface ConversationMessage {
    role: 'system' | 'user' | 'assistant';
    content: string;
    timestamp: Date;
    metadata: Record<string, any>;
}

type TokenCounter = (text: string) => number;

export class ChatHistory {
    private messages: ConversationMessage[];
    private maxContextTokens: number;
    private tokenCounter: TokenCounter;
    private logger: Logger;

    constructor(
        systemMessage?: string,
        maxContextTokens: number = 4096,
        tokenCounter?: TokenCounter
    ) {
        this.messages = [];
        this.maxContextTokens = maxContextTokens;
        this.tokenCounter = tokenCounter || this.defaultTokenCounter;
        this.logger = createLogger('ChatHistory');

        if (systemMessage) {
            this.addSystemMessage(systemMessage);
        }
    }

    private defaultTokenCounter(text: string): number {
        // Rough estimation: 1 token per 4 characters
        return Math.ceil(text.length / 4);
    }

    public addSystemMessage(content: string, metadata: Record<string, any> = {}): void {
        const message: ConversationMessage = {
            role: 'system',
            content,
            timestamp: new Date(),
            metadata
        };
        this.messages.push(message);
        this.logger.debug('Added system message');
    }

    public addUserMessage(content: string, metadata: Record<string, any> = {}): void {
        const message: ConversationMessage = {
            role: 'user',
            content,
            timestamp: new Date(),
            metadata
        };
        this.messages.push(message);
        this.logger.debug('Added user message');
    }

    public addAssistantMessage(content: string, metadata: Record<string, any> = {}): void {
        const message: ConversationMessage = {
            role: 'assistant',
            content,
            timestamp: new Date(),
            metadata
        };
        this.messages.push(message);
        this.logger.debug('Added assistant message');
    }

    public getMessagesForLLM(): Message[] {
        // Always include system messages
        const systemMessages = this.messages.filter(msg => msg.role === 'system');
        const conversationMessages = this.messages.filter(msg => msg.role !== 'system');

        // Count tokens for system messages
        const systemTokens = systemMessages.reduce(
            (sum, msg) => sum + this.tokenCounter(msg.content),
            0
        );

        // Calculate available tokens for conversation
        const availableTokens = this.maxContextTokens - systemTokens;

        // Add conversation messages from most recent, staying within limit
        const selectedMessages: ConversationMessage[] = [];
        let currentTokens = 0;

        for (let i = conversationMessages.length - 1; i >= 0; i--) {
            const msg = conversationMessages[i];
            const msgTokens = this.tokenCounter(msg.content);

            if (currentTokens + msgTokens > availableTokens) {
                break;
            }

            selectedMessages.unshift(msg);
            currentTokens += msgTokens;
        }

        // Combine system and selected conversation messages
        const allMessages = [...systemMessages, ...selectedMessages];

        // Convert to Message format
        return allMessages.map(msg => ({
            role: msg.role,
            content: msg.content
        }));
    }

The chat history component manages the context window by keeping the estimated token total of the messages sent to the LLM within maxContextTokens. System messages are always included because they define the assistant's behavior, so their tokens are subtracted from the budget first. Conversation messages are then added starting from the most recent, working backward until the next message would exceed the remaining budget. How closely this tracks the model's real limit depends on the accuracy of the token counter.

This approach favors the context that usually matters most. Recent messages are more important for maintaining conversation coherence than older messages. If the conversation becomes very long, older messages are automatically left out of the request, although they remain in the stored history.
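
A small worked example makes the budget arithmetic concrete. The sketch below reimplements the selection logic as a standalone function; the names Msg, approxTokens, selectForContext, and chatLog are hypothetical, and the counter is the same rough 4-characters-per-token heuristic, not a real tokenizer.

```typescript
interface Msg {
    role: 'system' | 'user' | 'assistant';
    content: string;
}

type TokenCounter = (text: string) => number;

// Same rough heuristic as the default counter: about 4 characters per token.
const approxTokens: TokenCounter = text => Math.ceil(text.length / 4);

// Keep all system messages, then add conversation messages newest-first until the budget is spent.
function selectForContext(messages: Msg[], maxTokens: number, count: TokenCounter = approxTokens): Msg[] {
    const system = messages.filter(m => m.role === 'system');
    const conversation = messages.filter(m => m.role !== 'system');

    let budget = maxTokens - system.reduce((sum, m) => sum + count(m.content), 0);
    const selected: Msg[] = [];

    for (let i = conversation.length - 1; i >= 0; i--) {
        const cost = count(conversation[i].content);
        if (cost > budget) break; // the first message that does not fit ends the scan
        selected.unshift(conversation[i]);
        budget -= cost;
    }
    return [...system, ...selected];
}

const chatLog: Msg[] = [
    { role: 'system', content: 'Be brief.' },        // 9 chars  -> 3 tokens
    { role: 'user', content: 'a'.repeat(40) },       // 40 chars -> 10 tokens
    { role: 'assistant', content: 'b'.repeat(20) },  // 20 chars -> 5 tokens
    { role: 'user', content: 'c'.repeat(16) }        // 16 chars -> 4 tokens
];

// Budget of 12 tokens: 3 go to the system message, leaving 9 for the two newest messages (5 + 4).
const kept = selectForContext(chatLog, 12);
console.log(kept.map(m => m.role)); // [ 'system', 'assistant', 'user' ]
```

The oldest user message needs 10 tokens and no budget remains, so it is the one left out. A model-specific tokenizer can be supplied as the third argument in place of the heuristic.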

The token counter is pluggable. The default implementation uses a simple character-based estimation, but applications can provide more accurate counters using tokenizers specific to their model:

    public setTokenCounter(counter: TokenCounter): void {
        this.tokenCounter = counter;
        this.logger.info('Updated token counter');
    }

    public getTotalTokens(): number {
        return this.messages.reduce(
            (sum, msg) => sum + this.tokenCounter(msg.content),
            0
        );
    }

    public clear(): void {
        this.messages = this.messages.filter(msg => msg.role === 'system');
        this.logger.info('Cleared conversation history');
    }

    public async save(filePath: string): Promise<void> {
        const data = {
            maxContextTokens: this.maxContextTokens,
            messages: this.messages.map(msg => ({
                role: msg.role,
                content: msg.content,
                timestamp: msg.timestamp.toISOString(),
                metadata: msg.metadata
            }))
        };

        await fs.writeFile(filePath, JSON.stringify(data, null, 2), 'utf-8');
        this.logger.info(`Saved conversation to ${filePath}`);
    }

    public static async load(filePath: string): Promise<ChatHistory> {
        const content = await fs.readFile(filePath, 'utf-8');
        const data = JSON.parse(content);

        const history = new ChatHistory(undefined, data.maxContextTokens);
        history.messages = data.messages.map((msg: any) => ({
            role: msg.role,
            content: msg.content,
            timestamp: new Date(msg.timestamp),
            metadata: msg.metadata || {}
        }));

        return history;
    }

    public getMessages(): ConversationMessage[] {
        return [...this.messages];
    }

    public getMessageCount(): number {
        return this.messages.length;
    }

    public getLastMessage(): ConversationMessage | undefined {
        return this.messages[this.messages.length - 1];
    }
}

The save and load methods enable conversation persistence. Applications can save conversations to disk and resume them later. This is essential for applications that need to maintain state across sessions or allow users to review past conversations.

The metadata field in ConversationMessage provides extensibility. Applications can attach arbitrary data to messages, such as user IDs, confidence scores, or references to external resources. This metadata is preserved when saving and loading conversations.
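
The one subtle point in persistence is that JSON has no date type, so timestamps must be converted explicitly in both directions. The round trip can be sketched without touching the file system; StoredMessage, serialize, and deserialize are hypothetical names used only for this illustration.

```typescript
interface StoredMessage {
    role: 'system' | 'user' | 'assistant';
    content: string;
    timestamp: Date;
    metadata: Record<string, unknown>;
}

// Dates become ISO strings on the way out...
function serialize(messages: StoredMessage[]): string {
    return JSON.stringify(
        messages.map(m => ({ ...m, timestamp: m.timestamp.toISOString() }))
    );
}

// ...and must be revived explicitly on the way in.
function deserialize(json: string): StoredMessage[] {
    const raw = JSON.parse(json) as Array<Record<string, any>>;
    return raw.map(m => ({
        role: m.role,
        content: m.content,
        timestamp: new Date(m.timestamp),
        metadata: m.metadata ?? {}
    }));
}

const original: StoredMessage[] = [{
    role: 'user',
    content: 'What is the capital of France?',
    timestamp: new Date('2024-01-15T10:30:00.000Z'),
    metadata: { userId: 'u-123' } // hypothetical metadata
}];

const restored = deserialize(serialize(original));
console.log(restored[0].timestamp instanceof Date); // true
console.log(restored[0].metadata.userId);           // u-123
```

Metadata survives the round trip only if it is JSON-serializable; values such as functions or class instances would be dropped or flattened.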

Circuit Breaker and Rate Limiting Component

Production LLM applications must handle failures gracefully and respect API rate limits. External LLM services can experience outages, network issues can cause timeouts, and exceeding rate limits can result in blocked requests. The circuit breaker and rate limiting component provides resilience mechanisms that prevent cascading failures and ensure compliance with service quotas.

A circuit breaker monitors requests to an external service and automatically stops sending requests when the service appears to be failing. This prevents wasting resources on requests that will likely fail and gives the service time to recover. After a cooldown period, the circuit breaker allows a test request through to check if the service has recovered.

export enum CircuitState {
    CLOSED = 'closed',
    OPEN = 'open',
    HALF_OPEN = 'half_open'
}

export class CircuitBreaker {
    private failureThreshold: number;
    private recoveryTimeout: number;
    private expectedError: typeof Error;
    
    private failureCount: number;
    private lastFailureTime?: Date;
    private state: CircuitState;
    private logger: Logger;

    constructor(
        failureThreshold: number = 5,
        recoveryTimeout: number = 60, // seconds to stay open before a test request is allowed
        expectedError: typeof Error = Error
    ) {
        this.failureThreshold = failureThreshold;
        this.recoveryTimeout = recoveryTimeout;
        this.expectedError = expectedError;
        
        this.failureCount = 0;
        this.state = CircuitState.CLOSED;
        this.logger = createLogger('CircuitBreaker');
    }

    public async call<T>(func: () => Promise<T>): Promise<T> {
        if (this.state === CircuitState.OPEN) {
            if (this.shouldAttemptReset()) {
                this.logger.info('Circuit breaker entering half-open state');
                this.state = CircuitState.HALF_OPEN;
            } else {
                throw new Error('Circuit breaker is OPEN');
            }
        }

        try {
            const result = await func();
            this.onSuccess();
            return result;
        } catch (error) {
            if (error instanceof this.expectedError) {
                this.onFailure();
            }
            throw error;
        }
    }

    private shouldAttemptReset(): boolean {
        if (!this.lastFailureTime) {
            return false;
        }

        const elapsed = (Date.now() - this.lastFailureTime.getTime()) / 1000;
        return elapsed >= this.recoveryTimeout;
    }

    private onSuccess(): void {
        if (this.state === CircuitState.HALF_OPEN) {
            this.logger.info('Circuit breaker closing after successful test');
            this.state = CircuitState.CLOSED;
        }

        this.failureCount = 0;
    }

    private onFailure(): void {
        this.failureCount++;
        this.lastFailureTime = new Date();

        if (this.failureCount >= this.failureThreshold) {
            this.logger.warn(
                `Circuit breaker opening after ${this.failureCount} failures`
            );
            this.state = CircuitState.OPEN;
        }
    }

    public reset(): void {
        this.logger.info('Manually resetting circuit breaker');
        this.state = CircuitState.CLOSED;
        this.failureCount = 0;
        this.lastFailureTime = undefined;
    }

    public getState(): CircuitState {
        return this.state;
    }

    public getFailureCount(): number {
        return this.failureCount;
    }
}

The circuit breaker is a state machine with three states. In the closed state, requests flow normally. When the number of consecutive failures reaches the threshold, the circuit opens and rejects requests immediately. Once the recovery timeout has elapsed, the next request moves the circuit to the half-open state and is let through as a test. If the test succeeds, the circuit closes and the failure count resets. If it fails, the circuit reopens.

This mechanism prevents overwhelming a failing service with requests while still allowing automatic recovery. The recovery timeout gives the service time to stabilize before testing whether it is healthy again.
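
The state transitions can be traced without waiting for real timeouts. The sketch below is a compact synchronous variant with an injectable clock, written only to illustrate the transitions; MiniBreaker and its parameters are hypothetical and it is not a replacement for the CircuitBreaker class.

```typescript
type State = 'closed' | 'open' | 'half_open';

class MiniBreaker {
    state: State = 'closed';
    private failures = 0;
    private lastFailure = 0;
    private threshold: number;
    private recoveryMs: number;
    private now: () => number;

    constructor(threshold: number, recoveryMs: number, now: () => number) {
        this.threshold = threshold;
        this.recoveryMs = recoveryMs;
        this.now = now;
    }

    call<T>(fn: () => T): T {
        if (this.state === 'open') {
            if (this.now() - this.lastFailure < this.recoveryMs) {
                throw new Error('Circuit breaker is OPEN');
            }
            this.state = 'half_open'; // cooldown elapsed: let this call through as a test
        }
        try {
            const result = fn();
            this.failures = 0;
            this.state = 'closed';
            return result;
        } catch (error) {
            this.failures++;
            this.lastFailure = this.now();
            if (this.failures >= this.threshold) {
                this.state = 'open';
            }
            throw error;
        }
    }
}

let clock = 0; // fake time in milliseconds
const breaker = new MiniBreaker(2, 1000, () => clock);
const failing = (): string => { throw new Error('service down'); };
const trace: State[] = [];

for (let i = 0; i < 2; i++) {
    try { breaker.call(failing); } catch { /* expected failure */ }
}
trace.push(breaker.state); // open: two failures reached the threshold

try { breaker.call(() => 'ok'); } catch { /* rejected without calling the service */ }
trace.push(breaker.state); // still open: the cooldown has not elapsed

clock = 1500;
breaker.call(() => 'ok');  // test request succeeds
trace.push(breaker.state); // closed

console.log(trace); // [ 'open', 'open', 'closed' ]
```

Injecting the clock is a design choice made here for testability; the same idea lets unit tests exercise the recovery path without sleeping.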

Rate limiting complements the circuit breaker by preventing the application from exceeding service quotas:

export class RateLimiter {
    private maxRequests: number;
    private timeWindow: number;
    private burstSize: number;
    
    private tokens: number;
    private lastUpdate: number;
    private logger: Logger;

    constructor(
        maxRequests: number,
        timeWindow: number,
        burstSize?: number
    ) {
        this.maxRequests = maxRequests;
        this.timeWindow = timeWindow;
        this.burstSize = burstSize || maxRequests;
        
        this.tokens = this.burstSize;
        this.lastUpdate = Date.now();
        this.logger = createLogger('RateLimiter');
    }

    public acquire(tokens: number = 1): boolean {
        this.refill();

        if (this.tokens >= tokens) {
            this.tokens -= tokens;
            return true;
        }

        return false;
    }

    public async waitAndAcquire(tokens: number = 1, timeout?: number): Promise<void> {
        const startTime = Date.now();

        while (!this.acquire(tokens)) {
            if (timeout && (Date.now() - startTime) > timeout) {
                throw new Error('Rate limiter timeout exceeded');
            }

            const waitTime = this.timeUntilNextToken();
            await this.sleep(Math.min(waitTime, 100));
        }
    }

    private refill(): void {
        const now = Date.now();
        const elapsed = (now - this.lastUpdate) / 1000;

        const tokensToAdd = (elapsed / this.timeWindow) * this.maxRequests;
        this.tokens = Math.min(this.burstSize, this.tokens + tokensToAdd);
        this.lastUpdate = now;
    }

    private timeUntilNextToken(): number {
        if (this.tokens >= 1) {
            return 0;
        }

        const tokensNeeded = 1 - this.tokens;
        return (tokensNeeded / this.maxRequests) * this.timeWindow * 1000;
    }

    private sleep(ms: number): Promise<void> {
        return new Promise(resolve => setTimeout(resolve, ms));
    }

    public getAvailableTokens(): number {
        this.refill();
        return this.tokens;
    }

    public reset(): void {
        this.tokens = this.burstSize;
        this.lastUpdate = Date.now();
        this.logger.info('Rate limiter reset');
    }
}

The rate limiter implements the token bucket algorithm. Tokens are added to the bucket at a steady rate determined by maxRequests and timeWindow. Each request consumes one token. If no tokens are available, the request must wait.

The burstSize parameter allows short bursts of requests above the average rate. This is useful for handling legitimate traffic spikes without triggering rate limits. The bucket can hold more tokens than the steady-state rate, allowing accumulated tokens to be spent quickly.

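The waitAndAcquire method waits asynchronously until tokens are available, polling at most every 100 milliseconds without blocking the event loop. This is convenient for applications that can afford to wait rather than failing immediately. The timeout parameter, given in milliseconds, prevents indefinite waiting.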
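
The refill arithmetic is easiest to follow with concrete numbers. The sketch below is a reduced token bucket with an injectable clock measured in seconds; MiniBucket and tryAcquire are hypothetical names, and the class covers only the non-waiting path.

```typescript
class MiniBucket {
    private tokens: number;
    private last: number;
    private ratePerSecond: number;
    private capacity: number;
    private now: () => number;

    constructor(maxRequests: number, windowSeconds: number, capacity: number, now: () => number) {
        this.ratePerSecond = maxRequests / windowSeconds;
        this.capacity = capacity;
        this.now = now;
        this.tokens = capacity; // start with a full bucket
        this.last = now();
    }

    tryAcquire(): boolean {
        const t = this.now();
        // Refill in proportion to elapsed time, capped at the bucket capacity.
        this.tokens = Math.min(this.capacity, this.tokens + (t - this.last) * this.ratePerSecond);
        this.last = t;
        if (this.tokens >= 1) {
            this.tokens -= 1;
            return true;
        }
        return false;
    }
}

let seconds = 0; // fake clock
// 60 requests per 60 seconds is 1 token per second, with room for a burst of 3.
const bucket = new MiniBucket(60, 60, 3, () => seconds);

const burst = [bucket.tryAcquire(), bucket.tryAcquire(), bucket.tryAcquire(), bucket.tryAcquire()];
console.log(burst); // [ true, true, true, false ]

seconds = 2; // two seconds later, two tokens have been refilled
const afterWait = [bucket.tryAcquire(), bucket.tryAcquire(), bucket.tryAcquire()];
console.log(afterWait); // [ true, true, false ]
```

The first three requests drain the initial burst allowance, the fourth is refused, and after two seconds exactly two more requests are admitted, which matches the steady rate of one token per second.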

Combining circuit breaker and rate limiter creates robust request handling:

export class ResilientLLMClient {
    private llm: BaseLLM;
    private rateLimiter: RateLimiter;
    private circuitBreaker: CircuitBreaker;
    private logger: Logger;

    constructor(
        llm: BaseLLM,
        maxRequestsPerMinute: number = 60,
        circuitBreakerThreshold: number = 5
    ) {
        this.llm = llm;
        this.rateLimiter = new RateLimiter(maxRequestsPerMinute, 60);
        this.circuitBreaker = new CircuitBreaker(circuitBreakerThreshold);
        this.logger = createLogger('ResilientLLMClient');
    }

    public async complete(
        messages: Message[],
        options?: CompletionOptions
    ): Promise<CompletionResponse> {
        await this.rateLimiter.waitAndAcquire();

        return this.circuitBreaker.call(async () => {
            return this.llm.complete(messages, options);
        });
    }

    public async *streamComplete(
        messages: Message[],
        options?: CompletionOptions
    ): AsyncIterableIterator<string> {
        await this.rateLimiter.waitAndAcquire();

        const generator = this.llm.streamComplete(messages, options);
        
        for await (const chunk of generator) {
            yield chunk;
        }
    }

    public getCircuitState(): CircuitState {
        return this.circuitBreaker.getState();
    }

    public getAvailableTokens(): number {
        return this.rateLimiter.getAvailableTokens();
    }

    public resetCircuitBreaker(): void {
        this.circuitBreaker.reset();
    }
}

The resilient client wraps an LLM implementation with both rate limiting and circuit breaking. A call to complete first waits for a rate limit token and then executes through the circuit breaker, so the application respects rate limits and stops calling a failing service. In this listing, streamComplete is rate limited but calls the LLM directly, so streaming failures are not counted by the circuit breaker.

Additional Useful Components

Beyond the core components already discussed, several additional utilities enhance LLM application development.

Prompt Template Component

Prompt engineering is critical for LLM applications, but hard-coding prompts makes them difficult to modify and test. A prompt template component provides a structured way to define, parameterize, and manage prompts.

export class PromptTemplate {
    private template: string;
    private description?: string;
    private logger: Logger;

    constructor(template: string, description?: string) {
        this.template = template;
        this.description = description;
        this.logger = createLogger('PromptTemplate');
    }

    public format(variables: Record<string, any>): string {
        let result = this.template;

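        for (const [key, value] of Object.entries(variables)) {
            const placeholder = `\${${key}}`;
            // split/join replaces every occurrence literally, so sequences such as "$&" or "$1"
            // inside a value are not interpreted as String.replace replacement patterns
            result = result.split(placeholder).join(String(value));
        }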

        // Check for unfilled placeholders
        const remainingPlaceholders = result.match(/\$\{[^}]+\}/g);
        if (remainingPlaceholders) {
            throw new Error(`Missing variables: ${remainingPlaceholders.join(', ')}`);
        }

        return result;
    }

    public getVariables(): string[] {
        const matches = this.template.match(/\$\{([^}]+)\}/g);
        if (!matches) return [];

        return matches.map(match => match.slice(2, -1));
    }

    public getDescription(): string | undefined {
        return this.description;
    }

    public getTemplate(): string {
        return this.template;
    }
}

export class PromptLibrary {
    private templates: Map<string, PromptTemplate>;
    private logger: Logger;

    constructor() {
        this.templates = new Map();
        this.logger = createLogger('PromptLibrary');
    }

    public register(name: string, template: PromptTemplate): void {
        this.templates.set(name, template);
        this.logger.info(`Registered prompt template: ${name}`);
    }

    public get(name: string): PromptTemplate {
        const template = this.templates.get(name);
        if (!template) {
            throw new Error(`Template not found: ${name}`);
        }
        return template;
    }

    public format(name: string, variables: Record<string, any>): string {
        const template = this.get(name);
        return template.format(variables);
    }

    public has(name: string): boolean {
        return this.templates.has(name);
    }

    public getAll(): string[] {
        return Array.from(this.templates.keys());
    }
}

Prompt templates use simple string substitution for variable replacement. This provides a straightforward syntax while preventing code injection vulnerabilities that could occur with more powerful templating systems.

A template library manages collections of prompts. Applications can define all prompts in one place and reference them by name. This separation of prompts from code makes it easy to experiment with different phrasings and maintain consistency across the application.
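
A short usage sketch shows both the happy path and the missing-variable check. The function formatPrompt below is a standalone, hypothetical reduction of the substitution logic, not the PromptTemplate class, and the template text is invented for illustration.

```typescript
// Literal ${name} substitution with a check for unfilled placeholders.
function formatPrompt(template: string, variables: Record<string, unknown>): string {
    let result = template;
    for (const key of Object.keys(variables)) {
        // split/join replaces every occurrence without regex or "$" pattern handling.
        result = result.split('${' + key + '}').join(String(variables[key]));
    }
    const missing = result.match(/\$\{[^}]+\}/g);
    if (missing) {
        throw new Error(`Missing variables: ${missing.join(', ')}`);
    }
    return result;
}

// Single quotes keep ${...} as plain text; backticks would interpolate immediately.
const summarize = 'Summarize the following ${kind} in ${count} sentences:\n${text}';

const prompt = formatPrompt(summarize, { kind: 'article', count: 3, text: 'LLMs are...' });
console.log(prompt);

let errorMessage = '';
try {
    formatPrompt(summarize, { kind: 'article' });
} catch (error) {
    errorMessage = (error as Error).message;
}
console.log(errorMessage); // Missing variables: ${count}, ${text}
```

Note that templates must be written as ordinary strings. A TypeScript template literal in backticks would try to resolve the placeholders at definition time instead of leaving them for the formatter.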

Logging Component

Understanding LLM application behavior requires comprehensive logging. A logging component standardizes log formatting and provides utilities for tracking LLM interactions.

import { promises as fs } from 'fs';

export enum LogLevel {
    DEBUG = 0,
    INFO = 1,
    WARNING = 2,
    ERROR = 3,
    CRITICAL = 4
}

export interface Logger {
    debug(message: string, ...args: any[]): void;
    info(message: string, ...args: any[]): void;
    warn(message: string, ...args: any[]): void;
    error(message: string, ...args: any[]): void;
    critical(message: string, ...args: any[]): void;
}

class LoggerImpl implements Logger {
    private name: string;
    private level: LogLevel;
    private logFile?: string;

    constructor(name: string, level: LogLevel = LogLevel.INFO, logFile?: string) {
        this.name = name;
        this.level = level;
        this.logFile = logFile;
    }

    public debug(message: string, ...args: any[]): void {
        this.log(LogLevel.DEBUG, message, ...args);
    }

    public info(message: string, ...args: any[]): void {
        this.log(LogLevel.INFO, message, ...args);
    }

    public warn(message: string, ...args: any[]): void {
        this.log(LogLevel.WARNING, message, ...args);
    }

    public error(message: string, ...args: any[]): void {
        this.log(LogLevel.ERROR, message, ...args);
    }

    public critical(message: string, ...args: any[]): void {
        this.log(LogLevel.CRITICAL, message, ...args);
    }

    private log(level: LogLevel, message: string, ...args: any[]): void {
        if (level < this.level) return;

        const timestamp = new Date().toISOString();
        const levelName = LogLevel[level];
        const formattedMessage = `${timestamp} - ${this.name} - ${levelName} - ${message}`;

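        // Send errors to stderr so they stay visible when stdout is redirected.
        if (level >= LogLevel.ERROR) {
            console.error(formattedMessage, ...args);
        } else {
            console.log(formattedMessage, ...args);
        }

        if (this.logFile) {
            // Fire-and-forget: writeToFile catches and reports its own errors.
            void this.writeToFile(formattedMessage, args);
        }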
    }

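    private async writeToFile(message: string, args: any[]): Promise<void> {
        if (!this.logFile) return;

        try {
            // Error objects serialize to "{}" by default, so keep their message and stack.
            const serializedArgs = JSON.stringify(args, (_key, value) =>
                value instanceof Error ? { message: value.message, stack: value.stack } : value
            );
            const fullMessage = args.length > 0
                ? `${message} ${serializedArgs}\n`
                : `${message}\n`;

            await fs.appendFile(this.logFile, fullMessage, 'utf-8');
        } catch (error) {
            // Also covers serialization failures such as circular references.
            console.error('Failed to write to log file:', error);
        }
    }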
}

export function createLogger(name: string, level?: LogLevel, logFile?: string): Logger {
    return new LoggerImpl(name, level, logFile);
}

export class LLMLogger {
    private logger: Logger;
    private logFile?: string;

    constructor(name: string, logFile?: string) {
        this.logger = createLogger(name);
        this.logFile = logFile;
    }

    public async logCompletion(
        messages: Message[],
        response: CompletionResponse,
        duration: number,
        metadata?: Record<string, any>
    ): Promise<void> {
        const logEntry = {
            timestamp: new Date().toISOString(),
            type: 'completion',
            model: response.model,
            durationSeconds: duration,
            tokenUsage: response.usage,
            messageCount: messages.length,
            finishReason: response.finishReason,
            metadata: metadata || {}
        };

        this.logger.info(`Completion: ${response.model} (${duration.toFixed(2)}s)`);

        if (this.logFile) {
            await this.writeStructuredLog(logEntry);
        }
    }

    private async writeStructuredLog(entry: Record<string, any>): Promise<void> {
        if (!this.logFile) return;

        try {
            await fs.appendFile(
                this.logFile,
                JSON.stringify(entry) + '\n',
                'utf-8'
            );
        } catch (error) {
            this.logger.error('Failed to write structured log', error);
        }
    }
}

The LLM logger creates structured logs that can be analyzed to understand usage patterns, costs, and performance. Each completion is logged with timing information, token usage, and custom metadata.
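
One way such an analysis might look is sketched below: it reads the JSONL text produced by the logger and aggregates calls, tokens, and average duration per model. The helper name summarizeCompletions is hypothetical, and the totalTokens field is an assumption, since the shape of response.usage is not shown in this section.

```typescript
// Fields mirror the logEntry object written by LLMLogger.logCompletion.
interface CompletionLogEntry {
    type: string;
    model: string;
    durationSeconds: number;
    // Assumption: the usage object carries a totalTokens count.
    tokenUsage?: { totalTokens?: number };
}

interface ModelSummary {
    calls: number;
    totalTokens: number;
    averageDurationSeconds: number;
}

// Hypothetical helper: aggregates JSONL log text into per-model statistics.
function summarizeCompletions(jsonl: string): Map<string, ModelSummary> {
    const summary = new Map<string, ModelSummary>();

    for (const line of jsonl.split('\n')) {
        if (!line.trim()) continue; // skip blank lines such as the trailing newline
        const entry = JSON.parse(line) as CompletionLogEntry;
        if (entry.type !== 'completion') continue;

        const previous = summary.get(entry.model)
            ?? { calls: 0, totalTokens: 0, averageDurationSeconds: 0 };
        const calls = previous.calls + 1;
        // Recover the running total of seconds from the stored average.
        const seconds = previous.averageDurationSeconds * previous.calls + entry.durationSeconds;

        summary.set(entry.model, {
            calls,
            totalTokens: previous.totalTokens + (entry.tokenUsage?.totalTokens ?? 0),
            averageDurationSeconds: seconds / calls
        });
    }

    return summary;
}

// Two sample entries in the same format as llm_interactions.jsonl.
const sampleLog = [
    '{"type":"completion","model":"gpt-3.5-turbo","durationSeconds":1,"tokenUsage":{"totalTokens":120}}',
    '{"type":"completion","model":"gpt-3.5-turbo","durationSeconds":3,"tokenUsage":{"totalTokens":80}}',
    ''
].join('\n');

const stats = summarizeCompletions(sampleLog).get('gpt-3.5-turbo');
console.log(stats);
```

Keeping one JSON object per line means the log can be processed line by line without loading a single large document, which is why the sketch splits on newlines before parsing.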

Complete Running Example

The following complete example demonstrates how all components integrate to create a functional LLM application. This application provides a conversational interface with web search capabilities, configuration management, and resilience mechanisms.

import * as readline from 'readline';
import * as path from 'path';
import { promises as fs } from 'fs';

export class LLMApplication {
    private config!: ApplicationConfig;
    private configManager: ConfigurationManager;
    private gpuDetector: GPUDetector;
    private acceleratorInfo?: AcceleratorInfo;
    private llm!: BaseLLM;
    private toolRegistry: ToolRegistry;
    private chatHistory!: ChatHistory;
    private resilientClient!: ResilientLLMClient;
    private llmLogger: LLMLogger;
    private logger: Logger;

    constructor(configPath: string) {
        this.logger = createLogger('LLMApplication');
        this.logger.info('Initializing LLM Application');

        this.configManager = new ConfigurationManager(configPath);
        this.gpuDetector = new GPUDetector();
        this.toolRegistry = new ToolRegistry();
        this.llmLogger = new LLMLogger('llm_interactions', 'llm_interactions.jsonl');
    }

    public async initialize(): Promise<void> {
        // Load configuration
        this.config = await this.configManager.load();

        // Detect GPU
        this.acceleratorInfo = await this.gpuDetector.detect();
        this.logger.info(`Using accelerator: ${this.acceleratorInfo.acceleratorType}`);

        // Initialize LLM
        await this.initializeLLM();

        // Register tools
        this.registerTools();

        // Initialize chat history
        this.chatHistory = new ChatHistory(
            this.config.llm.systemMessage,
            this.config.llm.contextWindow
        );

        // Initialize resilient client
        this.resilientClient = new ResilientLLMClient(
            this.llm,
            60,
            5
        );

        this.logger.info('Application initialization complete');
    }

    private async initializeLLM(): Promise<void> {
        const modelName = this.config.llm.modelName;

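        if (modelName.startsWith('gpt-')) {
            const apiKey = this.config.apiKeys.openai;
            if (!apiKey) {
                throw new Error('OpenAI API key required for GPT models');
            }

            this.logger.info(`Initializing OpenAI LLM: ${modelName}`);
            this.llm = new OpenAILLM(modelName, apiKey);
        } else if (modelName.startsWith('claude-')) {
            // Claude models are not served by the OpenAI client or key; fail fast
            // rather than sending them to the wrong provider.
            throw new Error(`No provider configured for model: ${modelName}`);
        } else {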
            const device = this.gpuDetector.getDeviceString();
            this.logger.info(`Initializing local LLM: ${modelName} on ${device}`);

            // Initialize through a typed local variable, so no instanceof check is needed.
            const localLlm = new LocalLlamaLLM(modelName, device, {
                contextSize: this.config.llm.contextWindow
            });
            await localLlm.initialize();
            this.llm = localLlm;
        }
    }

    private registerTools(): void {
        const searchTool = new WebSearchTool(5);
        this.toolRegistry.register(searchTool);
        this.logger.info('Registered web search tool');
    }

    public async processUserInput(userInput: string): Promise<string> {
        this.logger.info(`Processing user input: ${userInput.substring(0, 50)}...`);

        this.chatHistory.addUserMessage(userInput);

        let responseText = '';

        if (userInput.toLowerCase().includes('search') || userInput.toLowerCase().includes('find')) {
            const searchResult = await this.toolRegistry.executeTool('web_search', {
                query: userInput,
                maxResults: 3
            });

            if (searchResult.success) {
                const resultsText = this.formatSearchResults(searchResult.result);

                const augmentedInput = `User asked: ${userInput}\n\nHere are relevant search results:\n${resultsText}\n\nPlease provide a helpful response based on this information.`;

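                // Assumes getMessages() returns the live array, so this edit reaches the stored user message.
                const history = this.chatHistory.getMessages();
                history[history.length - 1].content = augmentedInput;
            } else {
                this.logger.warn('Web search failed; answering without search results');
            }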
        }

        const messages = this.chatHistory.getMessagesForLLM();

        const startTime = Date.now();

        try {
            const response = await this.resilientClient.complete(messages, {
                temperature: this.config.llm.temperature,
                maxTokens: this.config.llm.maxTokens,
                topP: this.config.llm.topP,
                topK: this.config.llm.topK
            });

            const duration = (Date.now() - startTime) / 1000;
            responseText = response.content;

            await this.llmLogger.logCompletion(messages, response, duration);

            this.chatHistory.addAssistantMessage(responseText);

            this.logger.info(`Generated response in ${duration.toFixed(2)}s`);
        } catch (error) {
            this.logger.error('Error generating response', error);
            responseText = 'I apologize, but I encountered an error processing your request. Please try again.';
        }

        return responseText;
    }

    private formatSearchResults(searchData: any): string {
        const results = searchData.results || [];
        const formatted = results.map((result: any) => 
            `[${result.position}] ${result.title}\nURL: ${result.url}\n${result.snippet}\n`
        );

        return formatted.join('\n');
    }

    public async saveConversation(filePath: string): Promise<void> {
        await this.chatHistory.save(filePath);
        this.logger.info(`Saved conversation to ${filePath}`);
    }

    public async loadConversation(filePath: string): Promise<void> {
        this.chatHistory = await ChatHistory.load(filePath);
        this.logger.info(`Loaded conversation from ${filePath}`);
    }

    public async runInteractive(): Promise<void> {
        console.log('LLM Application Started');
        console.log(`Using model: ${this.config.llm.modelName}`);
        console.log(`Accelerator: ${this.acceleratorInfo?.acceleratorType}`);
        console.log("Type 'quit' to exit, 'save' to save conversation, 'clear' to clear history\n");

        const rl = readline.createInterface({
            input: process.stdin,
            output: process.stdout
        });

        const askQuestion = (query: string): Promise<string> => {
            return new Promise(resolve => rl.question(query, resolve));
        };

        while (true) {
            try {
                const userInput = await askQuestion('You: ');

                if (!userInput.trim()) {
                    continue;
                }

                if (userInput.toLowerCase() === 'quit') {
                    console.log('Goodbye!');
                    rl.close();
                    break;
                }

                if (userInput.toLowerCase() === 'save') {
                    const filename = `conversation_${Date.now()}.json`;
                    await this.saveConversation(filename);
                    console.log(`Conversation saved to ${filename}`);
                    continue;
                }

                if (userInput.toLowerCase() === 'clear') {
                    this.chatHistory.clear();
                    console.log('Conversation history cleared');
                    continue;
                }

                const response = await this.processUserInput(userInput);
                console.log(`\nAssistant: ${response}\n`);
            } catch (error) {
                this.logger.error('Error in interactive loop', error);
                console.log(`Error: ${error instanceof Error ? error.message : String(error)}`);
            }
        }
    }

    public async dispose(): Promise<void> {
        if (this.llm instanceof LocalLlamaLLM) {
            await this.llm.dispose();
        }
        this.logger.info('Application disposed');
    }
}

export async function createDefaultConfig(outputPath: string): Promise<void> {
    const config: ApplicationConfig = {
        llm: {
            modelName: 'gpt-3.5-turbo',
            temperature: 0.7,
            maxTokens: 2048,
            topP: 0.95,
            topK: 50,
            contextWindow: 4096,
            systemMessage: 'You are a helpful AI assistant with access to web search. Provide accurate, helpful responses.'
        },
        apiKeys: {},
        loggingLevel: 'INFO',
        enableStreaming: true,
        maxRetries: 3,
        timeoutSeconds: 60
    };

    const manager = new ConfigurationManager();
    await manager.save(config, outputPath);
    console.log(`Created default configuration at ${outputPath}`);
}

async function main(): Promise<void> {
    const args = process.argv.slice(2);
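    // Slice off the prefix instead of splitting on '=', so paths that contain '=' survive intact.
    const configArg = args.find(arg => arg.startsWith('--config='));
    const configPath = configArg?.slice('--config='.length) || 'config.yaml';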
    const createConfig = args.includes('--create-config');

    if (createConfig) {
        await createDefaultConfig(configPath);
        return;
    }

    try {
        await fs.access(configPath);
    } catch {
        console.log(`Configuration file not found: ${configPath}`);
        console.log('Create one with: node app.js --create-config');
        return;
    }

    const app = new LLMApplication(configPath);
    
    try {
        await app.initialize();
        await app.runInteractive();
    } finally {
        await app.dispose();
    }
}

if (require.main === module) {
    main().catch(error => {
        console.error('Fatal error:', error);
        process.exit(1);
    });
}

This complete example demonstrates how all components work together. The application initializes by loading configuration, detecting GPU hardware, setting up the LLM, registering tools, and creating the chat history manager. The processUserInput method orchestrates the entire flow: adding messages to history, potentially invoking tools, generating completions through the resilient client, and logging interactions.
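
The tool-invocation step in processUserInput relies on a substring check, which also fires on words such as "research" or "finding". A slightly stricter trigger could be sketched as follows; the name shouldSearch is hypothetical, and this remains a keyword heuristic rather than model-driven tool selection.

```typescript
// Hypothetical replacement for the includes('search') / includes('find') check.
// Word boundaries (\b) require "search" or "find" to appear as whole words.
const SEARCH_TRIGGER = /\b(search|find)\b/i;

function shouldSearch(userInput: string): boolean {
    // No global flag on the pattern, so test() keeps no state between calls.
    return SEARCH_TRIGGER.test(userInput);
}

console.log(shouldSearch('Please search for recent TypeScript releases'));
console.log(shouldSearch('My research is going well'));
```

The first call matches because "search" stands alone; the second does not, because "search" only appears inside "research".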

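The interactive loop provides a simple command-line interface where users can hold a conversation, save it with the 'save' command, and clear the history with 'clear'. Saved conversations can be restored programmatically through loadConversation; the loop shown here does not expose a load command. Errors raised inside the loop are caught, logged, and reported to the user so the session can continue.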

To use this application, users first create a configuration file:

node app.js --create-config

Then edit the configuration to add API keys and adjust parameters. Finally, run the application:

node app.js --config=config.yaml
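
Since the default configuration is written with an empty apiKeys object, a quick check of the edited file before starting can save a failed launch. The sketch below is one possible shape for such a check; ConfigSketch covers only a subset of the fields written by createDefaultConfig, and findConfigProblems is a hypothetical helper rather than part of the library.

```typescript
// Subset of the fields written by createDefaultConfig; the real ApplicationConfig may differ.
interface ConfigSketch {
    llm: { modelName: string; maxTokens: number; contextWindow: number };
    apiKeys: { openai?: string };
}

// Hypothetical helper: collects every problem instead of throwing on the first one.
function findConfigProblems(config: ConfigSketch): string[] {
    const problems: string[] = [];

    if (config.llm.modelName.startsWith('gpt-') && !config.apiKeys.openai) {
        problems.push('apiKeys.openai is required for GPT models');
    }
    if (config.llm.maxTokens <= 0) {
        problems.push('llm.maxTokens must be positive');
    }
    if (config.llm.maxTokens > config.llm.contextWindow) {
        problems.push('llm.maxTokens should not exceed llm.contextWindow');
    }

    return problems;
}

// The values below match the defaults from createDefaultConfig, before any API key is added.
const defaultProblems = findConfigProblems({
    llm: { modelName: 'gpt-3.5-turbo', maxTokens: 2048, contextWindow: 4096 },
    apiKeys: {}
});
console.log(defaultProblems);
```

Returning a list rather than throwing lets the caller print all issues at once, which suits a file that users edit by hand.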

The application demonstrates patterns that production systems typically need: error handling, logging, configuration management, resource cleanup, and a fallback message when a completion fails.

Conclusion

This article has presented a comprehensive TypeScript library for LLM application development. Each component addresses a specific recurring challenge: GPU detection eliminates platform-specific code, the abstract LLM interface enables model swapping, configuration management externalizes settings, tool calling extends LLM capabilities, MCP integration enables context sharing, message management handles conversation state, and circuit breakers with rate limiting provide resilience.

The components follow clean architecture principles with clear separation of concerns. Each component has a well-defined interface and can be used independently or in combination with others. TypeScript's type system catches many errors at compile time, although the places where the code uses any (for example tool results and log arguments) fall outside that checking.

The running example demonstrates how these components integrate to create a complete, production-ready application. By providing these reusable components, the library eliminates the need for developers to repeatedly solve the same problems. Instead of spending time on infrastructure, developers can focus on the unique aspects of their applications: domain-specific logic, user experience, and business value.

The library is designed to be extensible. New LLM providers can be added by implementing the BaseLLM interface. New tools can be registered with the tool registry. Additional resilience mechanisms can wrap the existing components. This extensibility ensures that the library can evolve with the rapidly changing LLM ecosystem.

Future enhancements could include streaming response support in more components, integration with vector databases for retrieval-augmented generation, support for multi-modal models, enhanced observability with metrics and tracing, and WebSocket support for real-time applications. The foundation provided by these components makes such enhancements straightforward to implement while maintaining backward compatibility.

The goal of this library is to accelerate LLM application development by providing reusable components that handle common requirements. By building on this foundation, developers can create sophisticated LLM applications more quickly and with greater confidence in their reliability and maintainability.