LLMs and AI Agents: How Modern Artificial Intelligence Actually Works

11.01.2026
Today, almost everyone is adopting AI, yet truly functional products are still pretty rare. Why is that?
We explore this topic with Alex Martynov, the first expert of the new season at Lucky Hunter School.
  • Alex Martynov
    CEO & Founder at CombyCode Inc. | AI/ML Researcher since 2005
Alex is a Fractional CTO and AI/ML researcher with 20+ years of experience. He was building neural networks and distributed systems long before they went mainstream.

In this article, we’ll break down the inner workings of Large Language Models (LLMs), the architectural decisions driving their development, and how individual models are assembled into a full-fledged AI stack for real-world products.
Stick around until the end for a quick self-assessment quiz to reinforce what you've learned.

From the Human Brain to the Artificial Neuron

To understand modern LLMs, it is helpful to start with their biological prototype: the human brain. However, it is crucial to dispel one illusion immediately: neither the brain nor a neural network consists of "smart" components.

A single neuron, whether biological or artificial, is extremely simple. It doesn’t reason, understand meaning, or make decisions. Intelligence emerges not at the element level, but through the interaction of vast numbers of these elements.

The Biological Neuron and Synapses

A biological neuron is a living cell whose sole task is processing incoming signals. It receives impulses from other neurons via synapses. These impulses can be excitatory or inhibitory, while the synapses themselves act as signal strength regulators, amplifying some signals and dampening others.

If the sum of input signals exceeds a certain threshold, the neuron activates and passes the impulse along. On its own, this is a primitive mechanism. But when you have tens of billions of neurons with hundreds of trillions of connections, a system emerges that is capable of learning, abstraction, and adaptive behaviour.

The Artificial Neuron

The artificial neuron is a simplified mathematical model of the biological one. Its task is elementary signal processing.

The neuron receives a set of input signals X. Each signal is assigned a weight, which reflects its importance. Then the neuron calculates the weighted sum of the inputs and compares the result to a threshold value.
  • If the final value exceeds the threshold, the neuron activates and passes the signal forward.
  • If not, it remains inactive.
This mechanism directly mirrors the "all-or-nothing" principle: the neuron either fires, or it doesn't.
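This all-or-nothing behaviour fits in a few lines of code. Below is a toy sketch (the function name, weights, and threshold are ours, purely for illustration), not a production implementation:

```python
import numpy as np

def neuron(x, w, threshold):
    """A threshold artificial neuron: fire (1) if the weighted
    sum of inputs exceeds the threshold, stay silent (0) otherwise."""
    weighted_sum = np.dot(w, x)
    return 1 if weighted_sum > threshold else 0

# Two inputs: the first matters more (weight 0.9) than the second (0.2).
print(neuron(np.array([1.0, 1.0]), np.array([0.9, 0.2]), 1.0))  # fires: 0.9 + 0.2 = 1.1 > 1.0
print(neuron(np.array([0.0, 1.0]), np.array([0.9, 0.2]), 1.0))  # silent: 0.2 < 1.0
```

Real networks replace the hard threshold with smooth activation functions so that gradients can flow during training, but the idea is the same.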

The Brain as a Conceptual "Human LLM"

Using ML terminology, the human brain can be loosely imagined as a giant model with over 80 billion neurons and hundreds of trillions of connections (parameters).

However, there is a fundamental difference. The brain doesn't learn from text; it learns in the real world. It has a body, sensory input, and the experience of actions and consequences. Meaning is formed through interaction with physical reality.

Modern LLMs work differently. They identify statistical dependencies between tokens and symbols. They operate beautifully with language but lack personal experience. This limitation defines the "ceiling" for purely text-based models and sets the direction for the future evolution of AI.

Machine Learning: How Neural Networks Learn

The terms AI, ML, and Deep Learning are often used interchangeably, but they actually describe different levels of the same system:
  • AI (Artificial Intelligence)
    The goal. A system designed to perform intellectual tasks.
  • ML (Machine Learning)
    The method. The system learns from data rather than being manually programmed.
  • Deep Learning
    A subset of ML that uses multi-layered neural networks to extract complex features.
The key idea of machine learning is the rejection of rigid rules. We don't explain to the model what an object is or what "meaning" is. We simply show examples and minimise the error.
AI, ML and Deep Learning

The Principle of Neural Network Training

Neural networks are not programmed directly; they are trained on data. The training process boils down to a repetitive numerical cycle:
  • Forward Propagation. The network receives input data and generates a prediction.
  • Evaluation. The prediction is compared to the correct answer, and the error (loss) is calculated: a numerical measure of how far the model deviated from the expected result.
  • Backpropagation. The error is propagated backward through all layers of the network. The weights of the neurons are adjusted so that the next prediction is more accurate.
This process is repeated thousands or millions of times.

Important: This is not "learning" in the human sense. The model doesn't "understand" anything or draw conclusions. It simply solves an optimisation problem, which is finding the specific weight values that minimise error. Yet, this mechanical process allows neural networks to solve tasks that cannot be described by hard logic or rules.
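The forward-evaluate-backpropagate cycle can be illustrated with a deliberately tiny model: a single weight fitted by gradient descent. This is a sketch under our own toy setup (the data, learning rate, and step count are invented for illustration):

```python
import numpy as np

# Toy data: the "correct answers" follow y = 2x, which the model must discover.
x = np.array([1.0, 2.0, 3.0, 4.0])
y_true = 2.0 * x

w = 0.0  # a single trainable weight, starting from scratch

for step in range(200):
    y_pred = w * x                              # forward propagation: make a prediction
    loss = np.mean((y_pred - y_true) ** 2)      # evaluation: mean squared error
    grad = np.mean(2 * (y_pred - y_true) * x)   # backpropagation (here: one analytic gradient)
    w -= 0.01 * grad                            # adjust the weight to reduce the error

print(round(w, 3))  # converges very close to 2.0
```

A real network repeats exactly this loop, only with billions of weights and gradients computed automatically layer by layer.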

Network Architecture: Width, Depth, and Limits

The quality of a neural network is determined not just by data, but by its architecture.
  • Width (number of neurons in layers) determines how many different factors the model can consider simultaneously.
  • Depth (number of layers) is responsible for the ability to learn complex, multi-level, and abstract dependencies.
It might seem that the bigger and deeper the network, the smarter it is. In practice, this isn't true. If the architecture is too complex for the volume and quality of the data, overfitting occurs. The model memorises specific details and noise; it works perfectly on training examples but fails on real-world data.
For over 7 years, Lucky Hunter has been connecting top IT talents with global companies and startups

Looking for an IT Specialist?

The Evolution of Architectures: From Memory to Attention

The rise of Large Language Models is often perceived as a sudden technological leap. In reality, it is the result of a long evolution of architectures, where each new stage solved a specific limitation of the previous one.

From Universal Networks to Data Structure

The first neural networks were architectural generalists. Feedforward networks processed input data as a set of numbers without internal structure. This worked well for classification and regression but scaled poorly for complex data types.

The next step was CNNs (Convolutional Neural Networks). For the first time, an assumption about data structure was explicitly "hard-coded" into the architecture: locality and spatial connectivity. This allowed for efficient work with images and video, extracting features ranging from simple contours to complex shapes.
However, text and speech have a different nature. Here, the order of elements and dependence on previous context are critical.

Sequences and Memory Limits

To work with sequences, RNNs (Recurrent Neural Networks) appeared. Their key idea was passing the "state" from step to step, allowing the model to consider past context.

In practice, RNNs quickly revealed a fundamental flaw. When working with long sequences, useful information would either fade away (vanishing gradients) or grow uncontrollably. The model lost the ability to hold onto distant context, and training became unstable.

LSTM: Managed Memory Instead of Simple Recursion

The solution came in the form of LSTM (Long Short-Term Memory) — recurrent cells with a managed memory mechanism.

Unlike classic RNNs, LSTMs explicitly separate:
  • Long-term memory.
  • Short-term state.
  • Mechanisms for writing, reading, and forgetting information (gates).
This allowed the model to decide for itself which elements of context were important for future generation and which could be discarded. LSTMs became the first stable solution for working with long texts and laid the foundation for early language models.

Why That Wasn't Enough

Despite their success, LSTMs remained architecturally sequential: text was processed step-by-step. This limited parallelism, complicated scaling, and made training expensive. The next evolutionary step required abandoning the idea of "memory as a chain" and moving to a fundamentally different mechanism for handling context. This led to the Attention Mechanism and the Transformer architecture.

Transformers: Attention Instead of Memory

At the core of the Transformer architecture lies the Attention Mechanism—a way to work with context not sequentially, but all at once.

Instead of "remembering" a past state, at every step, the model evaluates which parts of the input text are most important right now, regardless of their position in the sequence. This allows it to account for distant dependencies without signal degradation.

Key consequences of this shift:
  • Context is processed structurally as a whole, not step-by-step.
  • Training is easily parallelised and scaled.
  • The model works efficiently with very long texts.
Modern LLMs are built on this exact idea. Models in the GPT family use a Transformer Decoder, which probabilistically predicts the next token based on the given context—building the answer step by step.
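For the curious, here is a minimal numpy sketch of the scaled dot-product attention at the heart of a Transformer. The input tensors are random placeholders; real models derive Q, K, and V through learned projections:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def attention(Q, K, V):
    """Scaled dot-product attention: every position looks at every
    other position at once and weights it by relevance."""
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)   # pairwise relevance of all positions
    weights = softmax(scores)         # each row is a probability distribution
    return weights @ V, weights       # context-weighted mix of the values

# 3 tokens, 4-dimensional vectors (random, for illustration only)
rng = np.random.default_rng(0)
Q = K = V = rng.normal(size=(3, 4))
out, attn = attention(Q, K, V)
print(attn.sum(axis=1))  # each row of attention weights sums to 1
```

Note that nothing in this computation is sequential: all pairwise scores are computed in one matrix multiplication, which is exactly what makes Transformers so easy to parallelise.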

The Language of Numbers: How LLMs Process Meaning

One of the key reasons LLMs are misunderstood is the misconception that the model "reads" and "understands" text. In reality, a language model never works with words directly.

Any text is first broken down into tokens: minimal fragments of words or characters. These tokens are then converted into embeddings, numerical vectors of a fixed dimension.

Embeddings and the Geometry of Meaning

In the space of embeddings, meaning ceases to be an abstract concept and becomes geometry. Words and phrases used in similar contexts are positioned closer to each other. Differences in meaning are expressed by the distances and directions between these vectors.

All LLM operations (attention, comparison, context weighting) take place within this numerical space. The model does not operate on the definitions of words; it works with their relative positions.
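A toy example makes the geometry concrete. The three-dimensional vectors below are hand-made stand-ins for real embeddings (which typically have hundreds or thousands of dimensions learned from data):

```python
import numpy as np

def cosine_similarity(a, b):
    """Similarity as geometry: 1.0 means same direction, 0 means unrelated."""
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

# Hand-made "embeddings" for illustration only:
vectors = {
    "king":  np.array([0.9, 0.8, 0.1]),
    "queen": np.array([0.9, 0.7, 0.2]),
    "apple": np.array([0.1, 0.2, 0.9]),
}

# Words used in similar contexts sit close together; unrelated words sit far apart.
print(cosine_similarity(vectors["king"], vectors["queen"]))  # close to 1
print(cosine_similarity(vectors["king"], vectors["apple"]))  # much smaller
```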

Generation as a Probabilistic Process

An LLM response is not the retrieval of pre-existing knowledge. It is a sequential process of predicting the next token based on a probability distribution.

At each step, the model evaluates: Which token is most probable given the entire available context? The selected token is then added to the sequence, and the process repeats.

It is this mechanism that creates the illusion of a meaningful answer, even though the underlying principle remains purely statistical.

Managing Uncertainty: Temperature

Text generation in an LLM is probabilistic. At every step, the model selects the next token from a distribution of probabilities. The Temperature parameter controls the shape of this distribution, effectively managing the degree of uncertainty in the choice.
  • Low values (≈0 – 0.3)
    These compress the distribution. The model almost always chooses the single most probable token, yielding stable, predictable, and deterministic responses. This mode is used for code, strict instructions, and tasks where accuracy is paramount.
  • Medium values (≈0.4 – 0.8)
    These provide a balance between accuracy and variety. Responses remain coherent but become less formulaic. This is the most universal mode.
  • High values (≈0.9 – 1.3+)
    These flatten the distribution. The model begins to select less probable tokens more frequently, increasing variability and creativity, but simultaneously raising the risk of errors and hallucinations.
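Mathematically, temperature divides the model's raw scores (logits) before the softmax. A small sketch, with invented logits for three candidate tokens, shows how the distribution sharpens or flattens:

```python
import numpy as np

def token_probabilities(logits, temperature):
    """Softmax with temperature: low T sharpens the distribution,
    high T flattens it."""
    z = np.asarray(logits) / temperature
    e = np.exp(z - z.max())  # subtract max for numerical stability
    return e / e.sum()

logits = [2.0, 1.0, 0.1]  # invented raw scores for three candidate tokens

print(np.round(token_probabilities(logits, 0.2), 3))  # near-deterministic: top token dominates
print(np.round(token_probabilities(logits, 1.0), 3))  # balanced
print(np.round(token_probabilities(logits, 2.0), 3))  # flat: unlikely tokens get real chances
```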

The Modern AI Stack

In real-world products, the LLM is just one component. Modern AI is a system where language, data, and actions are linked into a single process. In practice, this stack can be divided into several levels: Decision Making, Context Management, and Action Execution.

Thinking: AI Agents

An AI Agent is a system where the LLM receives a goal and acts not as a text generator, but as a process orchestrator. An agent can break a task into steps, select tools, evaluate intermediate results, and adjust its strategy.

This agentic approach transforms an LLM from a "smart chat" into an executive system capable of solving applied tasks within a product.
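A minimal sketch of such an orchestration loop might look like the following. Everything here is hypothetical: `decide_next_step` stands in for an LLM call, and the tools are toy lambdas:

```python
# A toy agent loop: plan a step, execute a tool, record the result, repeat.
TOOLS = {
    "search": lambda q: f"results for '{q}'",
    "calculate": lambda expr: str(eval(expr)),  # toy only: never eval untrusted input
}

def decide_next_step(goal, history):
    """Hypothetical planner: a real agent would ask the LLM here."""
    if not history:
        return ("search", goal)          # step 1: gather information
    if len(history) == 1:
        return ("calculate", "2 + 2")    # step 2: act on it (hard-coded for the sketch)
    return ("finish", history[-1])       # step 3: done, return the last result

def run_agent(goal):
    history = []
    while True:
        tool, arg = decide_next_step(goal, history)
        if tool == "finish":
            return arg
        history.append(TOOLS[tool](arg))  # execute the tool, record the result

print(run_agent("price of 2 licenses plus 2 seats"))
```

The essential point is the loop itself: the model observes intermediate results and decides the next action, rather than emitting one block of text and stopping.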

Memory: RAG

LLMs work within a limited context window and do not have access to a company's internal data. This problem is solved via RAG (Retrieval-Augmented Generation)—a mechanism for external search and injecting relevant information into the model's context.

RAG allows the use of corporate data without retraining the model and directly influences the quality and stability of results. In practice, unprepared data, not the model itself, is the most common cause of failed AI implementations.
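The retrieval half of RAG can be sketched in a few lines. The `embed` function below is a crude bag-of-words stand-in for a real embedding model, and the documents are invented:

```python
import numpy as np

DOCS = [
    "Refunds are processed within 14 days of purchase.",
    "Our office is open Monday to Friday, 9 to 18.",
    "Premium support is available on the Enterprise plan.",
]

def embed(text, vocab):
    """Toy embedding: word counts. Real systems use a neural encoder."""
    words = text.lower().split()
    return np.array([words.count(w) for w in vocab], dtype=float)

def retrieve(query, docs, top_k=1):
    """Rank documents by cosine similarity to the query and return the best."""
    vocab = sorted({w for d in docs + [query] for w in d.lower().split()})
    q = embed(query, vocab)
    scores = [
        q @ embed(d, vocab) / (np.linalg.norm(q) * np.linalg.norm(embed(d, vocab)) + 1e-9)
        for d in docs
    ]
    best = np.argsort(scores)[::-1][:top_k]
    return [docs[i] for i in best]

# The retrieved passage is injected into the LLM prompt as context:
context = retrieve("how long do refunds take", DOCS)[0]
prompt = f"Answer using this context:\n{context}\n\nQuestion: how long do refunds take?"
print(context)
```

Production systems replace the toy embedding with a dedicated model and a vector database, but the shape of the pipeline, embed, rank, inject, stays the same.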

Behaviour: Tuning

To control the style of responses and behavioural logic, additional tuning methods can be applied, ranging from System Instructions to Fine-tuning on limited domain data. However, full model retraining remains expensive and is only justified in narrow scenarios. In most business cases, the key role is played not by model training, but by the quality of the data and context provided to it.

Action: Tools and MCP

For an agent to perform actions, it needs access to tools: APIs, databases, and internal services. Through such integrations, an LLM can not only reason but actually influence the system, triggering processes, modifying data, and making decisions.

MCP (Model Context Protocol) provides a standard way to expose these tools to the model, allowing for safe and managed execution of such actions within a product.

Why AI Implementation Often Fails

Even though the technology has matured, a huge number of AI projects still fail to deliver. The culprit is almost never the model choice; it is systemic flaws in implementation.

Data Is Everything

LLMs don’t magically produce high-quality results; they simply mirror the data you feed them. If you input fragmented or outdated information, you will get poor results, no matter how advanced the system is.

Implementing AI is, first and foremost, a data project. The only sustainable workflow looks like this:
Audit → Clean → Structure → Deploy
Trying to skip these steps leads to hallucinations, a loss of user trust, and a project that gets shut down fast.
Implementing AI workflow

Organisation Over Architecture

The second major cause of failure is undefined roles.
  • ML Engineers work inside the model (training, optimisation).
  • LLM/AI Engineers build around the model (architecture, RAG, integrations).
  • AI Product Managers define the business goals and success metrics.
Without that last role, you often end up with a project that is technically sound but delivers zero business value. The model runs, but it doesn't solve the problem.

In our experience, the bottleneck is rarely the AI itself, but the absence of the right people on the team. For example, the startup 42 recently came to Lucky Hunter needing to fill multiple research roles, from Senior ML Researchers to Low-Level Engineers, requiring deep expertise in Foundation Models and Finance under tight deadlines.

We covered all the details of our collaboration in a separate case study.
ML Researchers with Niche Expertise: Where We Searched and How We Found 3 Experts for 42.

The Future of AI: Key Trends

The LLM Scaling Ceiling

Large Language Models have largely reached their scaling plateau. Simply increasing parameter counts and expanding context windows is no longer yielding significant quality improvements. Even relatively compact models are now capable enough to handle the majority of applied tasks. Consequently, the focus of future development is shifting away from "bigger" and toward cheaper, faster, and more accessible. In the coming years, we will see a mass migration of LLMs onto specialised chips, making them available locally on phones and edge devices even without internet connectivity. Cloud optimisation will be driven primarily by architectural improvements rather than model growth.

From Interface AI to AI-First Systems

Today, we can identify two primary scenarios for AI implementation:
  • Interface AI (The "Wrapper" Approach): This is currently the most common method, adding a chatbot layer on top of an existing product. Here, AI acts merely as an interface; it triggers existing functions, tools, and APIs. The user still manually drives the process: receiving a task, searching for information, making decisions, executing actions, and verifying results. The form of interaction changes, but the logic of the work remains the same. This approach rarely provides a strategic advantage.
  • AI-First: In this scenario, processes are re-engineered from the ground up, assuming the system includes a fully capable "executor", an AI Agent that can be assigned a complete task.
The difference becomes obvious when viewed through the lens of a standard task tracker:
  • The Human Workflow. A person receives a ticket describing what needs to be done, the rules to follow, and how to verify the result.
  • The AI Workflow. An AI agent receives the exact same ticket and executes it using available tools.
The key question becomes: which tasks are sufficiently described, formalised, and repeatable enough for an agent to handle? In practice, this leads to a complete process overhaul rather than simple automation. Humans are removed from mechanical stages, or the chain of actions is radically simplified. The human role elevates to setting goals, maintaining control, and making non-standard decisions.

Orchestration Over Rigid Logic

In AI-first products, the user interacts with an orchestrator, not a set of pre-programmed scripts. This layer makes decisions at the product level: which capabilities to use, in what order, and to what extent. The fundamental difference from classic software is that we no longer code rigid execution logic. Instead, we describe the system's available capabilities, and the LLM decides which parts of the program to engage based on the user's request. This is a paradigm shift: previously, humans worked within the limited space defined by a program. Now, the program adapts to the human's request. This is the essence of the AI-first approach.

AI Entering the Physical World

The accessibility of LLMs is paving the way for ubiquitous AI adoption beyond purely digital products. While AI is currently active in marketing, development, and online services, the "real" sector (manufacturing, logistics, construction) remains largely untouched. The main limitation of modern LLMs is their lack of empirical experience. They do not understand the physical world because they have never interacted with it.

The next stage of development is linked to robotics. Cheap, compact LLMs will be embedded into robots and autonomous systems that learn multimodally: through vision, sound, tactile feedback, and action. These systems will begin to form an understanding of physics, objects, and cause-and-effect relationships through experience rather than text.

This data will form the foundation for a new generation of multimodal models that will eventually return to the digital environment with a grounded understanding of reality. Current estimates suggest active robotisation of the real sector will begin within a 5-7 year horizon.

The Human Factor and Market Maturity

It is crucial to understand that today's implementation challenges are less about technology and more about people. The models themselves are already sufficiently reliable.

While many specialists claim they use AI and have become more productive, objective measurements often show the opposite: significant time is spent communicating with the model, refining prompts, and correcting errors. This discrepancy between subjective feeling and actual efficiency is a hallmark of early technology adoption.

Working with AI has not yet become a basic skill. Over time, interacting with AI will become as routine as using a computer. Niche roles like "Prompt Engineers" will fade away, giving way to broader specialisations. The market will experience turbulence for another 2–3 years before a stable set of roles and best practices is established.
AI is no longer an experiment – it's the competitive foundation of modern business. The winners will be companies that implement technology into operational processes faster than their peers. At Lucky Hunter, we help you identify exactly which AI specialists your project needs and source the talent capable of converting business requirements into functioning systems.

Ready to test yourself? Below is the quiz we announced at the beginning of the article.
LLMs and AI Agents: a quiz to reinforce your knowledge
Let’s review what we’ve learned! ✨
To make sure the new knowledge doesn’t slip away, let’s lock it in with a short quiz. Don’t worry if you forget something; that’s exactly why we’re doing this! After submitting your answers, you’ll see the results and everything will fall into place.

Think of it as a helpful brain workout after an intensive lecture. Ready?
Let’s go!
  • What do models use to communicate and perceive information?
  • What is an embedding?
  • What are modern LLMs based on?
  • Which parameter affects the model’s creativity?
  • What is an AI agent?
  • What is RAG?
  • What is model tuning?
  • What is MCP?
Alexandra Godunova
Content Manager at Lucky Hunter