How AI Agents Use Tools, APIs and Memory to Work Autonomously
What actually happens inside an AI agent when it executes a task? This guide breaks down tool calling, API integrations, and memory systems in plain English โ with code examples.
TL;DR
- AI agents use tool calling โ the LLM outputs structured JSON and the runtime executes the actual function
- Tools can be any Python function: web search, API call, database query, file write
- Short-term memory is the context window โ all prior steps in the current run
- Long-term memory lives in vector databases โ retrieved by semantic similarity
- The agent loop: perceive โ reason โ act โ observe โ repeat
- Good tool design is the most important factor in agent reliability
The Agent Loop Explained
Every AI agent runs a continuous loop. Understanding this loop is the key to understanding how agents work:
The LLM reads the system prompt (its instructions), the conversation history, tool results from previous steps, and any retrieved memories. This is the full context it reasons from.
The LLM thinks about what to do next. In ReAct agents this is visible as a "Thought:" section. The model decides: do I have enough information? Do I need to call a tool? Am I done?
The LLM outputs a tool call โ a JSON object naming the function and its arguments. The agent runtime intercepts this, executes the real function, and captures the result.
The tool result is added to the context window as an observation. The loop repeats โ perceive the new state, reason about what to do next, act again โ until the goal is achieved.
Tool Calling: How Agents Interact with the World
Tool calling is the mechanism by which an agent moves from language to action. Here's the process:
Step 1 โ Define the tool. You write a Python function with a docstring describing what it does, its parameters, and what it returns. This description is shown to the LLM.
def search_web(query: str) -> str: """ Search the web for current information. Use this when you need up-to-date facts, competitor data, or recent news. Args: query: The search query string Returns: Top 5 search results with titles and snippets """ results = serper_api.search(query) return format_results(results)
Step 2 โ LLM decides to use it. When the LLM determines it needs web data, it outputs a structured call:
{
"tool": "search_web",
"arguments": {
"query": "HubSpot pricing 2026 UK"
}
}
Step 3 โ Runtime executes it. The agent framework intercepts this JSON, calls the actual Python function, gets the result, and appends it to the context as an observation. The LLM never directly touches the internet โ it just "requests" actions and the runtime executes them.
The Standard Tool Library
These are the tools we include in most production agents:
| Tool | What It Does | Typical API |
|---|---|---|
| web_search | Find current information on the web | Serper, Bing Search |
| browse_url | Fetch and extract content from a web page | Playwright, BeautifulSoup |
| run_code | Execute Python code and return output | Sandboxed Python executor |
| query_database | Run SQL queries against business data | SQLAlchemy, psycopg2 |
| call_crm | Read/update CRM records | HubSpot, Salesforce, Pipedrive APIs |
| read_file | Read documents (PDF, DOCX, CSV) | PyMuPDF, python-docx |
| send_email | Draft and send emails | Gmail API, SendGrid |
| memory_search | Retrieve relevant past context | Pinecone, Qdrant, Chroma |
How Agent Memory Works
Memory is what allows an agent to persist knowledge across runs, recall past interactions, and improve over time. There are two types:
Short-Term Memory
The active context window โ every message, tool call, and observation in the current run.
Capacity: 128kโ1M tokens depending on the model. Automatically managed by the framework.
Cleared when the run ends.
Long-Term Memory
External storage โ vector databases (Pinecone, Qdrant, Chroma) or structured SQL. Persists between runs.
Retrieved by semantic similarity: the agent asks "what do I know about X?" and gets the most relevant stored chunks.
Survives run termination โ enables learning over time.
How long-term memory retrieval works: When an agent needs to recall something, it calls the memory_search tool with a query. The system converts that query to a vector embedding, searches the vector database for the most semantically similar stored content, and returns the top-k results to the LLM.
This is why agents can "remember" a client's preferences from six months ago, or know that a particular supplier always needs a specific format โ that information was stored as embeddings after a previous run.
API Integration: How Agents Connect to Business Systems
Connecting an AI agent to your business systems is the same as building an API integration โ but wrapped as a tool the LLM can call. Here's the pattern:
def update_hubspot_contact(
contact_id: str,
properties: dict
) -> dict:
"""
Update a HubSpot contact record.
Use this to save information gathered
about a prospect to the CRM.
Args:
contact_id: HubSpot contact ID
properties: Dict of property names and values
Returns:
Updated contact record confirmation
"""
response = requests.patch(
f"https://api.hubapi.com/contacts/v1/contact/vid/{contact_id}/profile",
headers={"Authorization": f"Bearer {HUBSPOT_API_KEY}"},
json={"properties": [{"property": k, "value": v}
for k,v in properties.items()]}
)
return response.json()
The agent never sees your API keys. It just knows the tool exists and what it does. The runtime handles authentication, rate limiting, error handling, and retries.
What Makes a Well-Designed Agent Tool
Tool design is the biggest predictor of agent reliability. Common mistakes and how to avoid them:
| Mistake | Problem | Fix |
|---|---|---|
| Vague docstring | LLM calls wrong tool or with wrong args | Be specific: include when to use it, what each param means |
| Too many tools | LLM confusion, slow reasoning | Max 10โ15 tools per agent; break into sub-agents |
| No error handling | Agent crashes on API failure | Return structured error messages the LLM can reason about |
| Overlapping tools | LLM picks the wrong one or calls both | Make each tool's scope distinct and mutually exclusive |
| Large raw outputs | Fills context window, hides relevant data | Pre-process tool output โ summarise or extract before returning |
Frequently Asked Questions
How do AI agents call external APIs?
The LLM generates structured JSON specifying the tool name and parameters. The agent runtime executes the actual API call, returns the result, and the LLM decides what to do next. The LLM never makes HTTP requests directly.
What is agent memory?
Short-term memory is the active context window โ everything the agent knows in its current run. Long-term memory is external storage (vector databases) that persists between runs and is retrieved by semantic similarity search.
What tools can AI agents use?
Any Python function can become a tool: web search, web scraping, code execution, file read/write, database queries, REST API calls, email/calendar access, CRM updates, and more. The LLM reads the function's description to decide when to use it.
Want Us to Build Your AI Agent?
We design and build production-ready AI agents with proper tool design, memory systems, and monitoring. Tell us what you want automated.
Discuss Your Agent