What Is Folium?

Folium is an open-source Python library that wraps Leaflet.js so you can build interactive web maps directly from Python. You hand it a pandas DataFrame with latitude and longitude columns or a GeoJSON object, and it returns an HTML/JS map you can render in a Jupyter notebook, a Streamlit app, or a static page. It supports tile providers, markers, popups, choropleths, and heatmaps, plus enough Leaflet plugin coverage that most “draw a map of these points” tasks land in three lines of code. In LLM stacks it commonly appears as a tool an agent calls or a notebook output during eval-result inspection.
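
A minimal sketch of that path, from a DataFrame with coordinate columns to a saved map (the column names and sample rows are illustrative):

import folium
import pandas as pd

# Illustrative data; the lat/lon column names are an assumption, not a fixed schema.
df = pd.DataFrame({
    "name": ["Oakland", "San Jose"],
    "lat": [37.8044, 37.3387],
    "lon": [-122.2712, -121.8853],
})

# Center the map on the data, drop a marker per row, and write standalone HTML.
m = folium.Map(location=[df["lat"].mean(), df["lon"].mean()], zoom_start=9)
for _, row in df.iterrows():
    folium.Marker([row["lat"], row["lon"]], popup=row["name"]).add_to(m)
m.save("points.html")  # in a Jupyter notebook, the `m` object renders inline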

Why It Matters in Production LLM and Agent Systems

Geospatial questions are an everyday tool-calling failure mode. “Show me delivery delays in the Bay Area,” “Map the failed traces by region,” “Plot fire-risk against vegetation density” — each requires the agent to choose the right tool, pass the right arguments (column names, projection, color scale), and surface an answer the user can interpret. A weak agent picks Folium when matplotlib would be cleaner, or hands raw latitude/longitude pairs to matplotlib when a Folium choropleth was the better answer.
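
To make the choice concrete, here is a hedged sketch of the kind of tool registry such an agent chooses from, in OpenAI-style function-calling format; the tool names and schemas are illustrative, not FutureAGI or Folium APIs:

# Hypothetical tool registry; the agent must pick one tool and fill its arguments.
TOOLS = [
    {
        "type": "function",
        "function": {
            "name": "folium_choropleth",
            "description": "Interactive choropleth map: color regions by a numeric column.",
            "parameters": {
                "type": "object",
                "properties": {
                    "by": {"type": "string", "description": "region key column, e.g. zip"},
                    "value": {"type": "string", "description": "numeric column to color by"},
                },
                "required": ["by", "value"],
            },
        },
    },
    {
        "type": "function",
        "function": {
            "name": "matplotlib_bar_chart",
            "description": "Static bar chart of a numeric column grouped by a category column.",
            "parameters": {
                "type": "object",
                "properties": {"x": {"type": "string"}, "y": {"type": "string"}},
                "required": ["x", "y"],
            },
        },
    },
]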

The pain is felt by everyone in the loop. A data engineer ships a Folium-based reporting agent and gets pinged for empty maps because the agent silently dropped rows with NaN coordinates. An ML lead sees the agent call Folium with the wrong CRS, rendering a “U.S. map” with three points in the Pacific. A product owner asks “is geospatial QA a regression target,” and there is no answer because nobody graded the tool calls separately from the final answer.

In 2026 agent stacks where MCP servers expose internal datasets and Folium-style tools turn rows into maps, geospatial tool fidelity is no longer a niche concern. It is a routine failure surface that needs evaluator coverage like any other tool call.

How FutureAGI Handles Folium-Style Tool Use

FutureAGI does not ship Folium; we evaluate the agent that wraps it. The anchor surfaces are ToolSelectionAccuracy, FunctionCallAccuracy, and the OTel attribute agent.trajectory.step.

Concretely: an internal-analytics team ships a reporting agent on the OpenAI Agents SDK with Folium registered as one of several visualization tools. Every call lands in a trace as a tool span with the function name, arguments, and result preview. ToolSelectionAccuracy scores whether the agent picked Folium when a map was the right answer (and matplotlib when a chart was). FunctionCallAccuracy checks the argument set: did the agent pass the right column names, the right CRS, the right tile provider? JSONValidation confirms the structured arguments parse against the registered tool schema.
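
Sketched concretely, one traced tool step might carry something like the following; the field names are illustrative rather than the exact traceAI attribute set:

# Illustrative shape of a single Folium tool span and its reference call.
tool_step = {
    "tool_name": "folium_choropleth",
    "arguments": {"by": "zip", "value": "churn", "crs": "EPSG:4326", "tiles": "OpenStreetMap"},
    "result_preview": "folium.Map with 312 regions, 0 rows dropped",
}
expected = {
    "tool_name": "folium_choropleth",
    "arguments": {"by": "zip", "value": "churn"},
}
# ToolSelectionAccuracy: was a Folium map the right answer for this query at all?
# FunctionCallAccuracy: do the arguments match what the query and dataset require?
# JSONValidation: do the arguments parse against the tool's registered JSON Schema?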

For datasets, the team curates 200 representative geospatial queries with ground-truth tool selections and argument sets. The same evaluators run offline as a RegressionEval — every prompt change, every model swap, the agent’s geospatial competence is rescored. When a model upgrade breaks one specific category (choropleth construction, say), the dashboard surfaces it before it reaches production. FutureAGI’s approach treats Folium-wielding agents like any other tool-using agent: trace, evaluate per step, regress on every change.
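
A minimal offline loop over such a curated set, reusing the evaluate() call shown later in this entry; the dataset format and pass threshold are illustrative:

from fi.evals import ToolSelectionAccuracy

# Illustrative curated cases: the query plus the tool call to grade.
# In a real regression run, agent_call comes from replaying the agent on each query.
CASES = [
    {"query": "Map customer churn by ZIP code",
     "agent_call": {"tool": "folium_choropleth", "args": {"by": "zip", "value": "churn"}}},
    {"query": "Plot monthly revenue per region as a bar chart",
     "agent_call": {"tool": "matplotlib_bar_chart", "args": {"x": "region", "y": "revenue"}}},
]

evaluator = ToolSelectionAccuracy()
for case in CASES:
    result = evaluator.evaluate(input=case["query"], output=case["agent_call"])
    if result.score < 1.0:  # pass threshold is an assumption; tune per eval
        print(f"REGRESSION: {case['query']} -> {result.reason}")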

How to Measure or Detect It

Folium-using agents are graded on tool-selection and argument fidelity, not just visual output:

  • ToolSelectionAccuracy — returns 0–1 plus a reason for whether each tool call was the correct choice given the user query.
  • FunctionCallAccuracy — scores whether the called function’s name and parameters match what the query required.
  • FunctionCallExactMatch — strict AST-level match against a reference call; useful for high-stakes tool flows.
  • JSONValidation — validates the tool’s structured arguments against its registered JSON Schema.
  • Per-tool failure rate (dashboard signal) — track Folium-call failures separately from chart-call failures; aggregate rates hide one-tool regressions.
For example, scoring a single Folium tool call:

from fi.evals import ToolSelectionAccuracy

# Score whether folium_choropleth was the right tool choice for this query.
tool_eval = ToolSelectionAccuracy()
result = tool_eval.evaluate(
    input="Map customer churn by ZIP code",
    output={"tool": "folium_choropleth", "args": {"by": "zip", "value": "churn"}},
)
print(result.score, result.reason)  # 0-1 score plus the evaluator's reasoning

Common Mistakes

  • Letting the agent pick chart tools without evaluator coverage. “Visualize this” is one of the most ambiguous agent inputs; grade the call, not the picture.
  • Caching map HTML by query text only. Slight rephrasings cause cache misses, and superficially similar but different queries cause false hits; key a semantic cache on query intent instead.
  • Ignoring CRS mismatches. Folium expects WGS84 (EPSG:4326); custom projections silently misalign markers and break trust.
  • Skipping NaN-coordinate handling in the tool. Empty maps look like a model failure but are actually a data-cleaning bug at the tool boundary (see the sketch after this list).
  • Treating the rendered map as the eval target. Score the tool call upstream; the map is the side-effect.
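
A hedged sketch of the guard the CRS and NaN bullets point at, applied at the tool boundary rather than left to the model; the column names, source CRS parameter, and geopandas usage are assumptions about the surrounding pipeline:

import geopandas as gpd
import pandas as pd

def to_wgs84_points(df: pd.DataFrame, lat: str = "lat", lon: str = "lon",
                    source_crs: str = "EPSG:4326"):
    """Drop NaN coordinates and reproject to WGS84 before handing rows to Folium."""
    clean = df.dropna(subset=[lat, lon])
    dropped = len(df) - len(clean)
    gdf = gpd.GeoDataFrame(
        clean,
        geometry=gpd.points_from_xy(clean[lon], clean[lat]),
        crs=source_crs,
    )
    # Folium expects WGS84 (EPSG:4326); reproject instead of plotting raw projected values.
    gdf = gdf.to_crs("EPSG:4326")
    # Surface the dropped-row count so an empty map is explainable, not a silent failure.
    return gdf, dropped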

Frequently Asked Questions

What is Folium?

Folium is a Python library that produces interactive Leaflet.js maps from Python data structures like pandas DataFrames and GeoJSON, mostly used in Jupyter notebooks for geospatial visualization.

How is Folium different from matplotlib or plotly?

Matplotlib produces static plots and plotly produces general interactive charts. Folium is purpose-built for slippy-map visualizations on top of Leaflet.js — tile layers, markers, choropleths — and is the easiest path from a DataFrame to a web map.

How does FutureAGI evaluate agents that use Folium?

When an agent calls Folium as a tool, FutureAGI scores tool selection with ToolSelectionAccuracy and argument correctness with FunctionCallAccuracy, with traces wired through traceAI.