How AI Agents Use Tools, APIs and Memory to Work Autonomously

What actually happens inside an AI agent when it executes a task? This guide breaks down tool calling, API integrations, and memory systems in plain English โ€” with code examples.

By SpiderHunts Technologies  ยท  23 May 2026  ยท  10 min read

TL;DR

  • AI agents use tool calling โ€” the LLM outputs structured JSON and the runtime executes the actual function
  • Tools can be any Python function: web search, API call, database query, file write
  • Short-term memory is the context window โ€” all prior steps in the current run
  • Long-term memory lives in vector databases โ€” retrieved by semantic similarity
  • The agent loop: perceive โ†’ reason โ†’ act โ†’ observe โ†’ repeat
  • Good tool design is the most important factor in agent reliability

The Agent Loop Explained

Every AI agent runs a continuous loop. Understanding this loop is the key to understanding how agents work:

PERCEIVE

The LLM reads the system prompt (its instructions), the conversation history, tool results from previous steps, and any retrieved memories. This is the full context it reasons from.

REASON

The LLM thinks about what to do next. In ReAct agents this is visible as a "Thought:" section. The model decides: do I have enough information? Do I need to call a tool? Am I done?

ACT

The LLM outputs a tool call โ€” a JSON object naming the function and its arguments. The agent runtime intercepts this, executes the real function, and captures the result.

OBSERVE

The tool result is added to the context window as an observation. The loop repeats โ€” perceive the new state, reason about what to do next, act again โ€” until the goal is achieved.

Tool Calling: How Agents Interact with the World

Tool calling is the mechanism by which an agent moves from language to action. Here's the process:

Step 1 โ€” Define the tool. You write a Python function with a docstring describing what it does, its parameters, and what it returns. This description is shown to the LLM.

def search_web(query: str) -> str:
 """
 Search the web for current information.
 Use this when you need up-to-date facts,
 competitor data, or recent news.

 Args:
 query: The search query string
 Returns:
 Top 5 search results with titles and snippets
 """
 results = serper_api.search(query)
 return format_results(results)

Step 2 โ€” LLM decides to use it. When the LLM determines it needs web data, it outputs a structured call:

{
 "tool": "search_web",
 "arguments": {
 "query": "HubSpot pricing 2026 UK"
 }
}

Step 3 โ€” Runtime executes it. The agent framework intercepts this JSON, calls the actual Python function, gets the result, and appends it to the context as an observation. The LLM never directly touches the internet โ€” it just "requests" actions and the runtime executes them.

The Standard Tool Library

These are the tools we include in most production agents:

Tool What It Does Typical API
web_search Find current information on the web Serper, Bing Search
browse_url Fetch and extract content from a web page Playwright, BeautifulSoup
run_code Execute Python code and return output Sandboxed Python executor
query_database Run SQL queries against business data SQLAlchemy, psycopg2
call_crm Read/update CRM records HubSpot, Salesforce, Pipedrive APIs
read_file Read documents (PDF, DOCX, CSV) PyMuPDF, python-docx
send_email Draft and send emails Gmail API, SendGrid
memory_search Retrieve relevant past context Pinecone, Qdrant, Chroma

How Agent Memory Works

Memory is what allows an agent to persist knowledge across runs, recall past interactions, and improve over time. There are two types:

Short-Term Memory

The active context window โ€” every message, tool call, and observation in the current run.

Capacity: 128kโ€“1M tokens depending on the model. Automatically managed by the framework.

Cleared when the run ends.

Long-Term Memory

External storage โ€” vector databases (Pinecone, Qdrant, Chroma) or structured SQL. Persists between runs.

Retrieved by semantic similarity: the agent asks "what do I know about X?" and gets the most relevant stored chunks.

Survives run termination โ€” enables learning over time.

How long-term memory retrieval works: When an agent needs to recall something, it calls the memory_search tool with a query. The system converts that query to a vector embedding, searches the vector database for the most semantically similar stored content, and returns the top-k results to the LLM.

This is why agents can "remember" a client's preferences from six months ago, or know that a particular supplier always needs a specific format โ€” that information was stored as embeddings after a previous run.

API Integration: How Agents Connect to Business Systems

Connecting an AI agent to your business systems is the same as building an API integration โ€” but wrapped as a tool the LLM can call. Here's the pattern:

def update_hubspot_contact(
 contact_id: str,
 properties: dict
) -> dict:
 """
 Update a HubSpot contact record.
 Use this to save information gathered
 about a prospect to the CRM.

 Args:
 contact_id: HubSpot contact ID
 properties: Dict of property names and values
 Returns:
 Updated contact record confirmation
 """
 response = requests.patch(
 f"https://api.hubapi.com/contacts/v1/contact/vid/{contact_id}/profile",
 headers={"Authorization": f"Bearer {HUBSPOT_API_KEY}"},
 json={"properties": [{"property": k, "value": v}
 for k,v in properties.items()]}
 )
 return response.json()

The agent never sees your API keys. It just knows the tool exists and what it does. The runtime handles authentication, rate limiting, error handling, and retries.

What Makes a Well-Designed Agent Tool

Tool design is the biggest predictor of agent reliability. Common mistakes and how to avoid them:

Mistake Problem Fix
Vague docstring LLM calls wrong tool or with wrong args Be specific: include when to use it, what each param means
Too many tools LLM confusion, slow reasoning Max 10โ€“15 tools per agent; break into sub-agents
No error handling Agent crashes on API failure Return structured error messages the LLM can reason about
Overlapping tools LLM picks the wrong one or calls both Make each tool's scope distinct and mutually exclusive
Large raw outputs Fills context window, hides relevant data Pre-process tool output โ€” summarise or extract before returning

Frequently Asked Questions

How do AI agents call external APIs?

The LLM generates structured JSON specifying the tool name and parameters. The agent runtime executes the actual API call, returns the result, and the LLM decides what to do next. The LLM never makes HTTP requests directly.

What is agent memory?

Short-term memory is the active context window โ€” everything the agent knows in its current run. Long-term memory is external storage (vector databases) that persists between runs and is retrieved by semantic similarity search.

What tools can AI agents use?

Any Python function can become a tool: web search, web scraping, code execution, file read/write, database queries, REST API calls, email/calendar access, CRM updates, and more. The LLM reads the function's description to decide when to use it.

Want Us to Build Your AI Agent?

We design and build production-ready AI agents with proper tool design, memory systems, and monitoring. Tell us what you want automated.

Discuss Your Agent