Enhancing Meeting Summarization Through Future AGI’s Intelligent Evaluation Framework

Rishav Hada

Jan 7, 2025

Overview

Effective meeting summarization is critical for organizations to streamline communication and maintain productivity. Various AI models generate summaries, but selecting the best-performing model requires systematic evaluation based on specific metrics.

Future AGI provides a robust evaluation platform, accessible via its dashboard and SDK, enabling organizations to:

  1. Compare AI-generated meeting summaries across multiple models.

  2. Evaluate summaries based on relevance, coherence, brevity, and coverage.

  3. Identify the best-performing model through actionable metrics.

  4. Gain insights through a detailed dashboard.

This case study demonstrates how Future AGI’s SDK empowers organizations to optimize meeting summarization workflows efficiently.

Problem Statement

An organization utilizes multiple AI models for generating meeting summaries. The workflow involves:

  • Generating summaries from different models.

  • Comparing and evaluating summaries manually.

Challenges:

  1. Subjective Evaluation: Human biases in assessing summaries lead to inconsistent results.

  2. Scalability Issues: Evaluating outputs from multiple models is time-consuming.

  3. Lack of Metrics: No systematic metrics for evaluating summary quality.

Solution Provided by Future AGI

Future AGI offers an automated evaluation framework to overcome these challenges. Here’s how:

1. Generating Summaries from Multiple Models

Meeting transcripts were processed through various AI models to obtain summaries. These summaries formed the input for the evaluation framework.
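
The original post does not show this step in code. As a minimal sketch, the per-model summaries could be collected into a long-format pandas DataFrame with one row per (model, transcript) pair; the `summarize_with_model` helper and the model names below are hypothetical placeholders for whatever model APIs the organization actually calls. This `data` DataFrame is what the evaluation snippets further down consume.

import pandas as pd

# Hypothetical helper: stands in for the real summarization model/API call.
def summarize_with_model(model_name: str, transcript: str) -> str:
    # Placeholder implementation; replace with the actual model call.
    return f"[{model_name}] summary of: {transcript[:80]}..."

MODELS = ["model_a", "model_b", "model_c"]  # illustrative model names
transcripts = ["Transcript of the weekly planning meeting ..."]  # raw meeting transcripts

rows = []
for model_name in MODELS:
    for transcript in transcripts:
        rows.append({
            "model_name": model_name,
            "transcript": transcript,
            "summary": summarize_with_model(model_name, transcript),
        })

# `data` is the DataFrame used by the evaluation snippets below.
data = pd.DataFrame(rows)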

2. Initializing Future AGI’s Evaluator Client

The evaluator client enables automated evaluation of model outputs based on pre-defined metrics.

Code Snippet:

from fi.evals import EvalClient
from getpass import getpass

evaluator = EvalClient(
    fi_api_key=getpass("Enter your API Key: "),
    fi_secret_key=getpass("Enter your Secret Key: "),
    fi_api_url="<https://dev.api.futureagi.com>"
)
print("Evaluator client initialized successfully!")

3. Defining Evaluation Criteria

The summaries were evaluated using deterministic tests based on the following criteria:

  • Relevance: Does the summary cover the key points?

  • Coherence: Is the summary logically structured?

  • Brevity: Is the summary concise without losing critical information?

  • Coverage: Does the summary include all important topics?

In addition to these per-criterion checks, Future AGI provides a SummaryQuality metric that assesses the overall quality of a summary in a single score.

Code Snippet:

EVALUATION_CRITERIA = {
    "Relevance": "Evaluate whether the summary covers all the key points discussed in the meeting.",
    "Coherence": "Evaluate whether the summary is logically structured and easy to follow.",
    "Brevity": "Evaluate whether the summary is concise and free from unnecessary details.",
    "Coverage": "Evaluate whether the summary includes all important topics discussed."
}

4. Performing Evaluation

Each summary was evaluated against the criteria using Future AGI’s deterministic evaluation module and the SummaryQuality metric.

Code Snippet:

import pandas as pd

from fi.evals import Deterministic
from fi.testcases import MLLMTestCase

class SummaryTestCase(MLLMTestCase):
    transcript: str
    summary: str

complete_results = {}
for criterion, description in EVALUATION_CRITERIA.items():
    eval_task = Deterministic(config={
        "multi_choice": False,
        "choices": ["Good", "Poor"],
        "rule_prompt": f"transcript : {{{{input_key1}}}}, summary : {{{{input_key2}}}}. {description}",
        "input": {
            "input_key1": "transcript",
            "input_key2": "summary"
        }
    })

    results = []
    for index, row in data.iterrows():
        test_case = SummaryTestCase(
            transcript=row['transcript'],
            summary=row['summary']
        )
        result = evaluator.evaluate([eval_task], [test_case])
        results.append(result.eval_results[0].data[0])

    complete_results[criterion] = results

# Collect the per-criterion verdicts into one table and keep track of which model
# produced each summary (assumes `data` also carries a "model_name" column).
results_df = pd.DataFrame(complete_results)
results_df["model_name"] = data["model_name"].values

from fi.testcases import TestCase
from fi.evals.templates import SummaryQuality

# Accumulates one average-quality row per summary column (i.e. per model).
combined_results = []

def evaluate_summary_quality(dataset, summary_column_name):
    scores = []

    for _, row in dataset.iterrows():
        test_case = TestCase(
            input=row["source"],
            output=row[summary_column_name],
        )
        template = SummaryQuality(config={"check_internet": False})
        response = evaluator.evaluate(eval_templates=[template], inputs=[test_case])

        score = response.eval_results[0].metrics[0].value
        scores.append(score)

    average_score = sum(scores) / len(scores) if scores else 0

    combined_results.append({
        "Summary Column": summary_column_name,
        "Avg. Summary Quality": average_score
    })
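
The post does not show this function being invoked. A minimal usage sketch, assuming `dataset` holds a `source` column plus one summary column per model (the column names below are illustrative), could look like this:

# Illustrative summary column names; replace with the actual columns in your dataset.
summary_columns = ["model_a_summary", "model_b_summary", "model_c_summary"]

for column in summary_columns:
    evaluate_summary_quality(dataset, column)

summary_quality_df = pd.DataFrame(combined_results)
print(summary_quality_df.sort_values("Avg. Summary Quality", ascending=False))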

5. Comparing Models

The evaluation scores for each model’s summaries were aggregated to identify the best-performing model.

Code Snippet:

def get_majority(row):
    # Majority verdict ("Good" / "Poor") across the four criteria for one summary.
    frequency = row.value_counts()
    return frequency.idxmax()

criteria_columns = list(EVALUATION_CRITERIA.keys())
results_df["Majority Evaluation"] = results_df[criteria_columns].apply(get_majority, axis=1)

# Rank models by how many of their summaries received a "Good" majority verdict.
model_scores = (
    results_df[results_df["Majority Evaluation"] == "Good"]
    .groupby("model_name")["Majority Evaluation"]
    .count()
)
best_model = model_scores.idxmax()
print(f"Best Performing Model: {best_model}")

Results and Dashboard Integration

The evaluation results were visualized on the Future AGI dashboard, showcasing:

  • Distribution of evaluation scores for each model.

  • Comparison of metrics like relevance, coherence, brevity, and coverage.

  • Identification of the top-performing model.
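
The dashboard produces these views directly. As a quick local approximation (not part of the Future AGI SDK, and assuming `results_df` from the earlier snippet), the per-model verdict distribution can also be plotted with pandas and matplotlib:

import matplotlib.pyplot as plt

# Good/Poor verdict counts per model, mirroring the dashboard's per-model breakdown.
verdict_counts = pd.crosstab(results_df["model_name"], results_df["Majority Evaluation"])
verdict_counts.plot(kind="bar", title="Majority verdict per model")
plt.ylabel("Number of summaries")
plt.tight_layout()
plt.show()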

Key Outcomes

  1. Streamlined Evaluation: Reduced manual effort in assessing summaries by 90%.

  2. Objective Scoring: Eliminated subjectivity with consistent metrics.

  3. Improved Summarization Quality: Enabled data-driven selection of the best model.

  4. Scalability: Evaluated outputs from multiple models efficiently.

Conclusion

Future AGI’s evaluation framework revolutionizes the assessment of AI-generated meeting summaries. By automating the evaluation process and providing actionable insights, it empowers organizations to select the most effective models, enhance communication, and achieve measurable improvements in productivity.
