Sahil N

Mar 1, 2025

Elevating SQL Accuracy: How Future AGI Streamlined Retail Analytics

Overview

A leading Fortune 50 retail company encountered significant challenges in maintaining the reliability and efficiency of its RAG-based analytical tool. This tool, powered by SQL agents, was designed to simplify database queries using natural language. However, frequent performance bottlenecks, inaccurate query generation, and limited scalability hindered its adoption. These issues led to inefficiencies in data-driven decision-making, affecting key business operations such as inventory management and customer insights.

To address these limitations, the company integrated Future AGI into its workflow, which systematically improved query accuracy, system stability, and scalability, leading to increased adoption of and trust in AI-powered analytics.

Challenge: Bridging the Gap Between SQL Complexity and Business Needs

Writing SQL queries is inherently complex and requires specialised knowledge, making it challenging for non-technical users to retrieve and analyze large datasets. Traditionally, business users have relied on BI tools (e.g., Tableau, Power BI), query builders (e.g., Metabase, Looker), and no-code/low-code platforms (e.g., Airtable). While these tools offer a level of abstraction, they come with inherent limitations:

  • Rigid Predefined Schemas – Users must work within predefined schemas and query templates, limiting flexibility.

  • Restricted Filters and Metrics – Many tools impose constraints on the types of queries users can run.

  • Limited Exploration – Users are often unable to freely explore data beyond predefined dashboards.

SQL agents are an emerging solution: they leverage natural language processing (NLP) to let users query databases in plain language. These agents automatically convert natural language questions into SQL, making data retrieval more intuitive and accessible. However, accuracy and reliability in SQL generation are critical, because incorrect queries can lead to misleading data insights that affect strategic decision-making and operational efficiency.

To address these concerns, a reliable SQL agent must focus on three key factors:

  • Query Structure Accuracy – Ensuring SQL queries align with schema and syntax standards.

  • Context Awareness – Understanding database relationships to generate relevant queries.

  • Output Precision – Generating queries that return complete and correct data.
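The first of these factors can be approximated locally before any evaluation service is involved. The sketch below is not part of Future AGI's SDK: it uses Python's built-in sqlite3 module and a hypothetical `sales` table to check that a generated query parses and references only columns that exist. The function name and schema are assumptions made for illustration, and SQLite's dialect differs from other engines, so this is only a rough pre-check.

```python
import sqlite3
from typing import Tuple

def check_query_structure(schema_sql: str, query: str) -> Tuple[bool, str]:
    """Return (ok, message) after asking SQLite to compile the query against an empty schema."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.executescript(schema_sql)      # create empty tables from the schema
        conn.execute(f"EXPLAIN {query}")    # compiles the query; no data is needed
        return True, "ok"
    except sqlite3.Error as exc:
        return False, str(exc)
    finally:
        conn.close()

# Hypothetical schema, for illustration only
SCHEMA = "CREATE TABLE sales (store_id INTEGER, product TEXT, units INTEGER);"

good = check_query_structure(SCHEMA, "SELECT product, SUM(units) FROM sales GROUP BY product")
bad = check_query_structure(SCHEMA, "SELECT price FROM sales")  # 'price' is not a column
```

A check like this only catches syntax and schema mismatches. It cannot tell whether a well-formed query actually answers the user's question, which is what the other two factors address.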

Solution: Future AGI’s AI-Driven Evaluation Framework

To enhance SQL agent performance and RAG-based analytical reliability, Future AGI implemented its intelligent optimization framework, which introduced the following improvements:

  • Enhanced Query Validation – Ensured generated SQL queries were syntactically correct and contextually appropriate.

  • Advanced NLP Refinement – Improved query interpretation by aligning AI models with database structures.

  • Scalable Performance Optimization – Reduced query execution times and eliminated system bottlenecks.

Evaluation Methodology

To ensure SQL agents met enterprise standards, Future AGI applied a rigorous three-phase evaluation in the following order:

  1. Validating SQL Queries Using Deterministic Evaluation – Verified SQL structure and adherence to database schema.

  2. Evaluating Context Sufficiency for SQL Queries – Ensured necessary data was present before execution to prevent misleading results.

  3. Evaluating SQL Agent Accuracy in Answering Queries – Assessed whether generated queries returned full and correct datasets.

Installing Future AGI

pip install futureagi

Initialising the Evaluation Client

The evaluation client requires an API key and a secret key to interact with Future AGI’s evaluation framework.

Click here to learn how to access Future AGI’s API key

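import os
from fi.evals import EvalClient

# Credentials are read from environment variables (assumed to be set beforehand)
evaluator = EvalClient(fi_api_key=os.environ["FI_API_KEY"],
                       fi_secret_key=os.environ["FI_SECRET_KEY"],
                       fi_base_url="https://api.futureagi.com")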

Loading the Dataset

The dataset is a subset of our in-house data. Each row starts from a user's natural language question, which an AI agent converts into an SQL query. The query is then executed on the corresponding table to retrieve the relevant information, producing the final output.

import pandas as pd

dataset = pd.read_csv("data.csv")
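The evaluation snippets in this case study read four columns from each row: `table`, `question`, `sql`, and `output`. The file layout itself is not shown, so a quick check that those columns are present can save a failed run. The helper below is a minimal sketch; `missing_columns` is a hypothetical name and not part of any library.

```python
REQUIRED_COLUMNS = ("table", "question", "sql", "output")

def missing_columns(columns, required=REQUIRED_COLUMNS):
    """Return the required column names that are absent, in the order listed."""
    present = set(columns)
    return [name for name in required if name not in present]

# Example with a plain list of column names
print(missing_columns(["table", "question", "sql"]))  # → ['output']
```

With the DataFrame loaded above, the call would be `missing_columns(dataset.columns)`; an empty list means every expected column is there.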

Validating SQL Queries Using Deterministic Eval

The Deterministic Evaluation method checks whether an SQL query accurately represents the natural language question and aligns with the provided table structure. This rule-based approach assigns a Pass if the SQL is correctly formulated and a Fail otherwise.

  • Pass cases confirm that the SQL query is correctly structured and accurately represents the user's intent.

  • Fail cases often arise from incorrect column selection, missing conditions, or logical errors in SQL generation.

This evaluation helps improve SQL agent reliability by identifying structural errors early in the pipeline.

Click here to learn more about Deterministic Eval

from fi.testcases import MLLMTestCase
from fi.evals import Deterministic

class DeterministicTestCase(MLLMTestCase):
    table: str
    question: str
    sql: str
    
# Rule-based eval: returns exactly one of the listed choices
deterministic_eval = Deterministic(config={
    "multi_choice": False,
    "choices": ["Pass", "Fail"],
    "rule_prompt": '''table : {{input_key1}}, question : {{input_key2}}, sql : {{input_key3}}.
                    Given the table, question and sql, choose Pass if the sql is according to the question from table, else choose Fail''',
    # Map each prompt placeholder to the test-case field that fills it
    "input": {
        "input_key1": "table",
        "input_key2": "question",
        "input_key3": "sql"
    }
})

# Collects the ratings and reasons from every evaluation, keyed by column name
complete_result = {}

options = []
reasons = []
for index, row in dataset.iterrows():
    test_case = DeterministicTestCase(
        table=row["table"],
        question=row["question"],
        sql=row["sql"]
    )

    result = evaluator.evaluate([deterministic_eval], [test_case])
    option = result.eval_results[0].data[0]   # selected choice (expected to be Pass or Fail)
    reason = result.eval_results[0].reason    # explanation for the choice
    options.append(option)
    reasons.append(reason)

complete_result["Det-Eval-Rating"] = options
complete_result["Det-Eval-Reason"] = reasons
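Once the loop has finished, the collected ratings can be summarised into a pass rate. This is a minimal sketch that assumes each rating is the plain string "Pass" or "Fail"; `pass_rate` is a hypothetical helper, and the sample values are illustrative rather than results from the case study.

```python
from collections import Counter

def pass_rate(ratings):
    """Return (counts, fraction_passed) for a list of 'Pass'/'Fail' ratings."""
    counts = Counter(ratings)
    total = sum(counts.values())
    rate = counts.get("Pass", 0) / total if total else 0.0
    return counts, rate

# Illustrative ratings, not results from the case study
counts, rate = pass_rate(["Pass", "Fail", "Pass", "Pass"])
print(dict(counts), rate)  # → {'Pass': 3, 'Fail': 1} 0.75
```

Applied to the list built above, this would be `pass_rate(options)`.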

Result After Validating SQL Queries Using Deterministic Eval:

Evaluating Context Sufficiency for SQL Queries

For an SQL agent to generate reliable results, it must ensure that the query has enough supporting data within the available table. If the table lacks relevant columns or necessary details, the SQL query may return incomplete, ambiguous, or incorrect results, thus leading to poor decision-making and unreliable data insights.

To address this, we use Context Sufficiency Evaluation, which determines whether the provided table contains enough information to accurately answer the user's query. This is especially important in enterprise environments, where missing data can introduce critical errors in business intelligence reports and automated workflows.

Click here to learn more about Context Sufficiency Eval

from fi.testcases import TestCase
from fi.evals.templates import ContextSufficiency

context_scores = []
context_reasons = []

# The template is identical for every row, so build it once outside the loop
context_template = ContextSufficiency(config={
    "model": "gpt-4o-mini"
})

for _, row in dataset.iterrows():
    test_case = TestCase(
        query=row["sql"],
        context=row["table"]
    )

    response = evaluator.evaluate(eval_templates=[context_template], inputs=[test_case])

    context_result = response.eval_results[0].metrics[0].value
    reason = response.eval_results[0].reason

    context_scores.append(context_result)
    context_reasons.append(reason)

dataset["context_sufficiency_score"] = context_scores
dataset["context_sufficiency_reason"] = context_reasons

complete_result["Context-Eval-Score"] = context_scores
complete_result["Context-Eval-Reason"] = context_reasons
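To act on these scores, it helps to pull out the rows that need a closer look. The sketch below assumes the scores are numeric with higher meaning more sufficient context; the source does not state the scale, so the 0.5 threshold and the `flag_low_scores` helper are illustrative assumptions only.

```python
def flag_low_scores(scores, reasons, threshold=0.5):
    """Return (row_index, score, reason) for rows whose score is below threshold.

    Rows whose score cannot be converted to a number are also returned, with score None.
    """
    flagged = []
    for i, (score, reason) in enumerate(zip(scores, reasons)):
        try:
            value = float(score)
        except (TypeError, ValueError):
            flagged.append((i, None, reason))   # missing or non-numeric score
            continue
        if value < threshold:
            flagged.append((i, value, reason))
    return flagged

# Illustrative values, not results from the case study
sample = flag_low_scores([1.0, 0.0, None], ["sufficient", "missing column", "no score"])
```

With the lists built above, the call would be `flag_low_scores(context_scores, context_reasons)`, and the returned reasons indicate which tables lacked the information the query needed.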

Result After Evaluating Context Sufficiency for SQL Queries:

Evaluating SQL Agent Accuracy in Answering Queries

Even if an SQL query is syntactically correct, it must also return the correct result for the given question. The Completeness evaluation assesses whether the SQL-generated output fully answers the user's question.

This evaluation ensures SQL agents generate queries that produce the right answers, not just syntactically correct SQL.

Click here to learn more about Completeness Eval

from fi.testcases import TestCase
from fi.evals.templates import Completeness

completeness_scores = []
completeness_reasons = []

# The template is identical for every row, so build it once outside the loop
completeness_template = Completeness(config={
    "required_keys": ["input", "output"],
    "output": "completeness_score",
})

for _, row in dataset.iterrows():
    test_case = TestCase(
        input=row["question"],
        output=row["output"]
    )

    response = evaluator.evaluate(eval_templates=[completeness_template], inputs=[test_case])

    completeness_result = response.eval_results[0].metrics[0].value
    reason = response.eval_results[0].reason

    completeness_scores.append(completeness_result)
    completeness_reasons.append(reason)

dataset["completeness_score"] = completeness_scores
dataset["completeness_reason"] = completeness_reasons

complete_result["Completeness-Eval-Score"] = completeness_scores
complete_result["Completeness-Eval-Reason"] = completeness_reasons
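At this point `complete_result` holds one list per evaluation column. A small helper can turn that dict of lists into CSV text for sharing or archiving. This is a sketch using only the standard library; `results_to_csv` is a hypothetical name, and the sample values are illustrative.

```python
import csv
import io

def results_to_csv(results):
    """Serialise a dict of equal-length lists (column -> values) as CSV text."""
    lengths = {len(values) for values in results.values()}
    if len(lengths) > 1:
        raise ValueError("all result columns must have the same number of rows")
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(results.keys())              # header row
    writer.writerows(zip(*results.values()))     # one row per evaluated example
    return buffer.getvalue()

# Illustrative values, not results from the case study
text = results_to_csv({"Det-Eval-Rating": ["Pass", "Fail"],
                       "Completeness-Eval-Score": [1.0, 0.0]})
```

Since pandas is already imported, `pd.DataFrame(complete_result).to_csv("results.csv", index=False)` is an equivalent route; the file name there is just an example.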

Result After Evaluating SQL Agent Accuracy in Answering Queries:

Evaluate Your RAG-based Analytical Tool Using Future AGI’s Dashboard

  • In addition to the Python SDK, Future AGI provides a dashboard that lets users run the same evaluations without writing code. Users can upload datasets and run evaluations through an interactive interface.

  • This no-code approach makes it easier for analysts and business users to leverage Future AGI’s evaluation capabilities without requiring programming expertise.

Click here to learn how to evaluate your AI-workflow using Future AGI’s Dashboard

  • The screenshot below showcases the evaluation results displayed within Future AGI’s dashboard, highlighting query validation status, context sufficiency, and completeness scores in an intuitive interface.

Key Findings

Our evaluations revealed several critical challenges affecting SQL agent reliability:

  • Flawed SQL structures lead to execution failures due to errors in column selection, omitted conditions, or incorrect query logic.

  • Incomplete or inaccurate results occur when queries do not fully address the user’s question, returning partial or misleading data.

  • Insufficient data context reduces accuracy and completeness when SQL queries rely on missing or inadequate table information.

By systematically applying these evaluation methods, enterprises can:

  • Optimize SQL agent performance by refining query generation models and reducing structural errors.

  • Ensure accurate and reliable data retrieval, which improves business intelligence, reporting, and automation.

  • Strengthen trust in AI-driven SQL systems by minimising incorrect outputs and enhancing response quality.

Impact

  • Faster Query Validation: Future AGI improves SQL query validation speed by 10x, ensuring that generated queries are syntactically correct and contextually appropriate.

  • Reduction in Query Errors: By leveraging deterministic evaluation and context sufficiency analysis, Future AGI reduces incorrect SQL generation by 90%, which cuts down on misleading data insights and improves decision-making accuracy.

  • Enhanced Scalability: The AI-driven framework allows enterprises to process large-scale SQL queries efficiently, ensuring consistent performance and reliable analytics for data-driven business operations.

  • Improved SQL Query Accuracy: Through systematic query structure validation, context evaluation, and output correctness checks, Future AGI enhances SQL query reliability, achieving a 5x reduction in repeated queries thanks to a more robust agent.

  • Increased Trust in RAG-based Analytical Tools: By addressing performance bottlenecks and inaccuracies, Future AGI strengthens trust in AI-driven analytical tools, leading to higher adoption and better decision-making capabilities within enterprise environments.

Conclusion

With the ability to enhance query accuracy, optimize system performance, and support large-scale data analytics, Future AGI has become an essential tool for enterprises. The improvements in query validation, error reduction, and system scalability have made data-driven decision-making more efficient and reliable. The company now confidently processes high volumes of SQL queries while maintaining accuracy and performance. “Without these enhancements, our analytics workflow would still face major roadblocks,” a key stakeholder noted. Future AGI has effectively streamlined operations, reinforcing trust in AI-powered analytical tools.
