Agent
An LLM that can take actions in the world by invoking tools, not just produce text.
In practice, an agent is a loop: read input → decide what to do → call a tool (or respond) → observe the result → decide again. The same model becomes 'an agent' the moment you give it the ability to call functions like 'book_appointment' or 'query_crm'. SMB-relevant agents almost always have a narrow scope and a small toolbox — open-ended 'do anything' agents fail in production.
Agent loop
The repeating cycle of decide → act → observe → decide that lets an agent complete multi-step tasks.
The classic loop is ReAct (Reasoning + Acting): the model thinks aloud, picks a tool, sees the tool's output, then thinks again. The loop ends when the model decides it's done or hits a step cap. Step caps matter — without one, a confused agent can burn dollars looping forever.
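A minimal sketch of that loop in Python, with a hard step cap. `call_model`, `TOOLS`, and the message format are placeholders standing in for a real LLM client and real tools, not any particular framework's API:

```python
MAX_STEPS = 8  # step cap: a confused agent can't loop forever

def call_model(messages):
    # Placeholder: in production this calls your LLM API and returns either
    # {"type": "final", "text": ...} or {"type": "tool", "name": ..., "args": {...}}.
    return {"type": "final", "text": "done"}

TOOLS = {
    "lookup_order": lambda args: {"status": "shipped"},  # stub tool
}

def run_agent(user_input):
    messages = [{"role": "user", "content": user_input}]
    for _ in range(MAX_STEPS):
        decision = call_model(messages)          # decide
        if decision["type"] == "final":
            return decision["text"]              # done
        result = TOOLS[decision["name"]](decision["args"])  # act
        messages.append({"role": "tool", "content": str(result)})  # observe
    return "Step cap reached, escalating to a human."
```

The `for` loop plus the fallback return line is the whole safety story: the agent either finishes, or hands off after a bounded number of steps.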
RAG (Retrieval-Augmented Generation)
Look up relevant documents from your data first, then ask the LLM to answer using only those documents.
RAG is the most reliable way to ground an LLM in your specific business data — your docs, your past tickets, your SOPs. Pipeline: embed your documents into vectors, store them in a vector DB, and at query time retrieve the top-K most similar chunks and pass them to the LLM with a strict 'answer only from this context' instruction. Cheap, controllable, and dramatically reduces hallucinations.
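The prompt-assembly step can be sketched like this. `build_rag_prompt` is a hypothetical helper; retrieval itself is assumed to have already returned the top-K chunks from your vector store:

```python
def build_rag_prompt(question, chunks):
    # Number the chunks so the model (and you, when debugging) can see
    # exactly which source each claim came from.
    context = "\n\n".join(f"[{i + 1}] {c}" for i, c in enumerate(chunks))
    return (
        "Answer ONLY from the context below. "
        "If the answer is not in the context, say you don't know.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )

chunks = ["Refunds are processed within 5 business days."]
prompt = build_rag_prompt("How long do refunds take?", chunks)
```

The strict 'answer only from this context' instruction is the part that does the hallucination-reduction work.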
Vector database
A database optimized for finding the most semantically similar pieces of text to a given query.
Pinecone, Weaviate, and Postgres's pgvector extension are the common picks. You convert text to embeddings (vectors of numbers) once, store them, and at query time convert the question to a vector and find the closest matches. The 'closest' matches are the ones the model should read before answering.
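A toy illustration of 'closest match' using cosine similarity over hand-made 3-dimensional vectors. Real embeddings have hundreds or thousands of dimensions, and a real vector DB handles indexing at scale; the math is the same:

```python
import math

def cosine(a, b):
    # Cosine similarity: 1.0 means identical direction, 0 means unrelated.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy "embeddings" keyed by document name.
store = {
    "refund policy":  [0.9, 0.1, 0.0],
    "shipping times": [0.1, 0.9, 0.1],
}

query = [0.85, 0.15, 0.05]  # toy embedding of "how do I get my money back?"
best = max(store, key=lambda doc: cosine(query, store[doc]))
```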
Embeddings
Numeric representations of text that capture meaning — close in number-space means close in meaning.
Created by an embedding model (OpenAI's text-embedding-3-large is a common default). Because the whole passage gets embedded, not isolated words, the same word lands in a different neighborhood depending on its context — 'Apple the company' and 'apple the fruit' end up far apart.
Tool calling (function calling)
The mechanism by which an LLM produces a structured request to invoke an external function.
Instead of just producing text, the model outputs JSON like {"tool": "book_appointment", "args": {…}}. Your code reads that, runs the actual function, returns the result, and feeds it back to the model. Every modern voice agent, SDR agent, or workflow agent is built on tool calling under the hood.
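A minimal sketch of the dispatch step, assuming the model has already emitted its JSON. `book_appointment` is a hypothetical function your code owns:

```python
import json

# Your actual business logic — the model never runs this directly.
def book_appointment(date, name):
    return {"confirmed": True, "date": date, "name": name}

TOOLS = {"book_appointment": book_appointment}

# What the model emits as text when it wants to act:
model_output = '{"tool": "book_appointment", "args": {"date": "2025-07-01", "name": "Dana"}}'

call = json.loads(model_output)
result = TOOLS[call["tool"]](**call["args"])  # your code runs the real function
# `result` goes back to the model as the next message in the loop.
```

Note the separation: the model only produces a request; your code decides whether and how to execute it. That boundary is where guardrails and HITL checks live.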
Multi-agent system
Multiple specialized agents collaborating — typically a router/orchestrator that delegates to sub-agents.
Common pattern: a planner agent breaks down the task, sub-agents (researcher, writer, classifier) each handle a slice, an aggregator combines results. Useful for tasks that genuinely span domains. Risky for the same reason — every additional agent multiplies the failure surface. Don't reach for multi-agent until single-agent + tools has clearly failed.
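The planner / sub-agent / aggregator shape, sketched with function stubs standing in for separately prompted LLM calls (all names hypothetical):

```python
def planner(task):
    # In production: an LLM call that decomposes the task.
    return ["research", "write"]

def research_agent(task):
    return {"research": f"notes on {task}"}

def write_agent(task):
    return {"write": f"draft for {task}"}

SPECIALISTS = {"research": research_agent, "write": write_agent}

def run(task):
    results = {}
    for step in planner(task):
        results.update(SPECIALISTS[step](task))  # fan out to sub-agents
    return results                               # aggregator merges the slices
```

Even in this stub form you can see the failure-surface problem: every added specialist is another place where a bad handoff can silently corrupt the final result.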
Orchestrator
The top-level agent or workflow engine that decides which sub-agent or tool to invoke next.
Could be an LLM ('ask Claude which agent to route to') or rule-based ('if the ticket mentions "refund", send to billing-agent'). Rule-based is more predictable; LLM-based is more flexible. SMB stacks usually start rule-based and add LLM routing only where the rules get too messy to maintain.
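A rule-based router is often just a keyword table with a fallback. Agent names here are hypothetical:

```python
# Ordered rules: first match wins.
ROUTES = [
    ("refund",  "billing-agent"),
    ("lawsuit", "human-escalation"),
]

def route(ticket_text):
    text = ticket_text.lower()
    for keyword, agent in ROUTES:
        if keyword in text:
            return agent
    return "general-agent"  # fallback; swap in LLM routing here if rules get messy
```

The fallback line is exactly where LLM routing gets bolted on later, which keeps the predictable rules in charge of the cases you care most about.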
MCP (Model Context Protocol)
An open protocol for connecting LLMs to external data sources and tools through a standard interface.
Anthropic introduced MCP to standardize how an agent 'plugs into' a CRM, a calendar, a knowledge base, etc. Rather than re-implementing tool integrations for every agent framework, you build an MCP server once and any MCP-compatible agent can use it. Adoption is growing across Claude Desktop, IDEs, and increasingly third-party agent platforms.
Human-in-the-loop (HITL)
An agent design where a human approves or corrects an action before it ships.
Mandatory for irreversible, externally visible, or financially sensitive actions: posting a public review reply, refunding money, sending a customer-facing email, deleting data. Don't deploy autonomous agents on these surfaces — the cost of the rare bad output is way higher than the cost of the human approval click.
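One way to sketch the approval gate. `approve` stands in for whatever human channel you use (a Slack button, a CLI prompt); the shape is an assumption, not a prescribed API:

```python
def execute_with_approval(action, approve):
    """Gate risky actions behind a human decision.

    `approve` is any callable returning True/False — in a real deployment
    it would block on a Slack approval button or similar.
    """
    if action["risk"] == "high" and not approve(action):
        return {"status": "rejected", "action": action["name"]}
    return {"status": "executed", "action": action["name"]}

refund = {"name": "refund_customer", "risk": "high"}
```

The point of the pattern: the agent proposes, the human disposes, and low-risk actions skip the gate so the human isn't approving everything.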
Guardrails
Hard rules that constrain what an agent can say, do, or output — independent of the model's judgment.
Examples: a regex filter that stops the agent from arguing with 'remove me' requests, a hard rule that escalates anything mentioning 'lawsuit', a JSON schema validator that rejects malformed tool calls, a max-cost cap on a query. Guardrails should be deterministic, not LLM-based — you never want your safety mechanism to itself be the unreliable part.
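A couple of these as deterministic Python checks, with no LLM in the path. The keyword list and size cap are illustrative:

```python
import re

# Deterministic escalation trigger — regex can't hallucinate.
ESCALATE = re.compile(r"\b(lawsuit|attorney|legal action)\b", re.IGNORECASE)

def check_reply(draft):
    """Return (verdict, payload): escalate, reject, or allow."""
    if ESCALATE.search(draft):
        return ("escalate", None)
    if len(draft) > 1000:               # hard output-size cap
        return ("reject", "reply too long")
    return ("allow", draft)
```

Because these checks are plain code, they behave identically every time, which is the whole point of a guardrail.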
Evals
A test suite for agent behavior — fixed inputs run against the agent and graded for correctness.
Without evals, you're guessing whether your prompt change made things better or worse. The minimum useful eval set: 20-50 representative inputs with the expected behavior labeled. Run after every significant prompt or model change. The discipline matters more than the tooling — a Google Sheet eval beats no eval.
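A Google-Sheet-grade eval harness fits in a few lines. `classify` is a stub standing in for your real agent call, and the cases are illustrative:

```python
# Fixed inputs with labeled expected behavior.
CASES = [
    {"input": "Do you price match?",   "expected": "escalate"},
    {"input": "What are your hours?",  "expected": "answer"},
]

def classify(text):
    # Stand-in for the agent under test; replace with a real call.
    return "answer" if "hours" in text else "escalate"

def run_evals(cases):
    passed = sum(1 for c in cases if classify(c["input"]) == c["expected"])
    return passed / len(cases)  # pass rate: compare before/after each change
```

Run it before and after every prompt or model change; a dropping pass rate is your regression alarm.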
Hallucination
When a model produces confident-sounding output that's factually wrong or invented.
The classic example: an agent invents a price, a policy, or a person's name. Mitigations: (a) ground the model in retrieved context (RAG), (b) instruct it to refuse rather than guess, (c) check confidence and escalate when low. You cannot eliminate hallucinations — you contain them.
Latency
How long the agent takes to respond — often the make-or-break factor for voice agents.
For text agents, 2-5 seconds is fine. For voice agents, anything over 800ms feels broken — humans expect immediate turn-taking. Common levers: smaller model on the first turn, parallel tool calls, streaming the response, caching common answers.
Autonomous agent
An agent that runs without per-action human approval — often on a schedule or trigger.
Examples: a competitor-monitoring agent that runs every Monday and posts to Slack, or an accounts-receivable (AR) chase agent that nudges overdue invoices on a cadence. Autonomy is fine when (a) actions are reversible, (b) there's a kill switch, (c) outputs are observable. Don't go autonomous on customer-facing actions without HITL.
Ready to put these patterns to work?
Browse 12+ end-to-end agentic blueprints — each with prompts, architecture, and failure modes.