What is agent memory and how does it work?

Agent memory comes in two forms: short-term (the active context window — the conversation and tool call history) and long-term (external storage in vector databases or SQL). Long-term memory is retrieved by similarity search when the agent encounters a relevant query.

How AI Agents Use Tools, APIs and Memory to Work Autonomously

Q: How do AI agents call external APIs?

AI agents call external APIs through function/tool calling. The LLM generates a structured JSON object specifying the tool name and parameters. The agent runtime executes the actual API call, returns the result to the LLM, and the LLM decides what to do next.

Q: What tools can AI agents use?

AI agents can use any tool you define: web search, web scraping, code execution, file read/write, database queries, REST API calls, email/calendar access, CRM updates, and more. Tools are Python functions with a description that the LLM reads to decide when to use them.

Last updated: 2026-05-23

What actually happens inside an AI agent when it executes a task? This guide breaks down tool calling, API integrations, and memory systems in plain English — with code examples.

By SpiderHunts Technologies · 23 May 2026 · 10 min read

TL;DR

AI agents use tool calling — the LLM outputs structured JSON and the runtime executes the actual function
Tools can be any Python function: web search, API call, database query, file write
Short-term memory is the context window — all prior steps in the current run
Long-term memory lives in vector databases — retrieved by semantic similarity
The agent loop: perceive → reason → act → observe → repeat
Good tool design is the most important factor in agent reliability

The Agent Loop Explained

Every AI agent runs a continuous loop. Understanding this loop is the key to understanding how agents work:

PERCEIVE

The LLM reads the system prompt (its instructions), the conversation history, tool results from previous steps, and any retrieved memories. This is the full context it reasons from.

REASON

The LLM thinks about what to do next. In ReAct agents this is visible as a "Thought:" section. The model decides: do I have enough information? Do I need to call a tool? Am I done?

ACT

The LLM outputs a tool call — a JSON object naming the function and its arguments. The agent runtime intercepts this, executes the real function, and captures the result.

OBSERVE

The tool result is added to the context window as an observation. The loop repeats — perceive the new state, reason about what to do next, act again — until the goal is achieved.

Tool Calling: How Agents Interact with the World

Tool calling is the mechanism by which an agent moves from language to action. Here's the process:

Step 1 — Define the tool. You write a Python function with a docstring describing what it does, its parameters, and what it returns. This description is shown to the LLM.

def search_web(query: str) -> str:
 """
 Search the web for current information.
 Use this when you need up-to-date facts,
 competitor data, or recent news.

 Args:
 query: The search query string
 Returns:
 Top 5 search results with titles and snippets
 """
 results = serper_api.search(query)
 return format_results(results)

Step 2 — LLM decides to use it. When the LLM determines it needs web data, it outputs a structured call:

{
 "tool": "search_web",
 "arguments": {
 "query": "HubSpot pricing 2026 UK"
 }
}

Step 3 — Runtime executes it. The agent framework intercepts this JSON and calls the actual Python function. It gets the result and appends it to the context as an observation. The LLM never directly touches the internet. It just "requests" actions, and the runtime executes them.

The Standard Tool Library

These are the tools we include in most production agents:

Tool	What It Does	Typical API
web_search	Find current information on the web	Serper, Bing Search
browse_url	Fetch and extract content from a web page	Playwright, BeautifulSoup
run_code	Execute Python code and return output	Sandboxed Python executor
query_database	Run SQL queries against business data	SQLAlchemy, psycopg2
call_crm	Read/update CRM records	HubSpot, Salesforce, Pipedrive APIs
read_file	Read documents (PDF, DOCX, CSV)	PyMuPDF, python-docx
send_email	Draft and send emails	Gmail API, SendGrid
memory_search	Retrieve relevant past context	Pinecone, Qdrant, Chroma

How Agent Memory Works

Memory is what allows an agent to persist knowledge across runs, recall past interactions, and improve over time. There are two types:

Short-Term Memory

The active context window — every message, tool call, and observation in the current run.

Capacity: 128k–1M tokens depending on the model. Automatically managed by the framework.

Cleared when the run ends.

Long-Term Memory

External storage — vector databases (Pinecone, Qdrant, Chroma) or structured SQL. Persists between runs.

Retrieved by semantic similarity: the agent asks "what do I know about X?" and gets the most relevant stored chunks.

Survives run termination — enables learning over time.

How long-term memory retrieval works: When an agent needs to recall something, it calls the memory_search tool with a query. The system converts that query to a vector embedding. Then it searches the vector database for the most semantically similar stored content. Finally it returns the top-k results to the LLM.

This is why agents can "remember" a client's preferences from six months ago. They can also know that a particular supplier always needs a specific format. That information was stored as embeddings after a previous run.

API Integration: How Agents Connect to Business Systems

Connecting an AI agent to your business systems is the same as building an API integration. But it is wrapped as a tool the LLM can call. Here's the pattern:

def update_hubspot_contact(
 contact_id: str,
 properties: dict
) -> dict:
 """
 Update a HubSpot contact record.
 Use this to save information gathered
 about a prospect to the CRM.

 Args:
 contact_id: HubSpot contact ID
 properties: Dict of property names and values
 Returns:
 Updated contact record confirmation
 """
 response = requests.patch(
 f"https://api.hubapi.com/contacts/v1/contact/vid/{contact_id}/profile",
 headers={"Authorization": f"Bearer {HUBSPOT_API_KEY}"},
 json={"properties": [{"property": k, "value": v}
 for k,v in properties.items()]}
 )
 return response.json()

The agent never sees your API keys. It just knows the tool exists and what it does. The runtime handles authentication, rate limiting, error handling, and retries.

What Makes a Well-Designed Agent Tool

Tool design is the biggest predictor of agent reliability. Common mistakes and how to avoid them:

Mistake	Problem	Fix
Vague docstring	LLM calls wrong tool or with wrong args	Be specific: include when to use it, what each param means
Too many tools	LLM confusion, slow reasoning	Max 10–15 tools per agent; break into sub-agents
No error handling	Agent crashes on API failure	Return structured error messages the LLM can reason about
Overlapping tools	LLM picks the wrong one or calls both	Make each tool's scope distinct and mutually exclusive
Large raw outputs	Fills context window, hides relevant data	Pre-process tool output — summarise or extract before returning

Frequently Asked Questions

How do AI agents call external APIs?

The LLM generates structured JSON specifying the tool name and parameters. The agent runtime executes the actual API call, returns the result, and the LLM decides what to do next. The LLM never makes HTTP requests directly.

What is agent memory?

Short-term memory is the active context window — everything the agent knows in its current run. Long-term memory is external storage (vector databases) that persists between runs and is retrieved by semantic similarity search.

What tools can AI agents use?

Any Python function can become a tool: web search, web scraping, code execution, file read/write, database queries, REST API calls, email/calendar access, CRM updates, and more. The LLM reads the function's description to decide when to use it.

Want Us to Build Your AI Agent?

We design and build production-ready AI agents with proper tool design, memory systems, and monitoring. Tell us what you want automated.

Discuss Your Agent

Related Services

Service

AI Agent Development

Custom autonomous agents

Service

Business Automation

End-to-end workflow automation

Service

API Development

Custom APIs and integrations

AI Agents What Are AI Agents? The Complete 2026 Guide for Businesses AI Agents AI Agents vs Chatbots: What's the Difference? AI Agents LangChain AI Agents Explained: A Non-Technical Guide

🤖 More in AI & Machine Learning

How AI Agents Use Tools, APIs and Memory to Work Autonomously

The Agent Loop Explained

Tool Calling: How Agents Interact with the World

The Standard Tool Library

How Agent Memory Works

Short-Term Memory

Long-Term Memory

API Integration: How Agents Connect to Business Systems

What Makes a Well-Designed Agent Tool

Frequently Asked Questions

How do AI agents call external APIs?

What is agent memory?

What tools can AI agents use?

Want Us to Build Your AI Agent?

Related Services

Related Articles

Continue reading

AI Coding Tools 2026: Cursor vs GitHub Copilot vs Windsurf vs Claude Code

LLM API Comparison 2026: OpenAI vs Anthropic vs Google Gemini for SaaS

Vector Database Comparison 2026: Pinecone vs Weaviate vs Qdrant vs pg_vector

AI Automation Agency: What It Is, What to Look For, and What It Costs in 2026