Enhancing Meeting Summarization Through Future AGI’s Intelligent Evaluation Framework

Rishav Hada

Jan 7, 2025

Overview

Effective meeting summarization is critical for organizations to streamline communication and maintain productivity. Various AI models generate summaries, but selecting the best-performing model requires systematic evaluation based on specific metrics.

Future AGI provides a robust evaluation platform, accessible via its dashboard and SDK, enabling organizations to:

  1. Compare AI-generated meeting summaries across multiple models.

  2. Evaluate summaries based on relevance, coherence, brevity, and coverage.

  3. Identify the best-performing model through actionable metrics.

  4. Gain insights through a detailed dashboard.

This case study demonstrates how Future AGI’s SDK empowers organizations to optimize meeting summarization workflows efficiently.

Problem Statement

An organization utilizes multiple AI models for generating meeting summaries. The workflow involves:

  • Generating summaries from different models.

  • Comparing and evaluating summaries manually.

Challenges:

  1. Subjective Evaluation: Human biases in assessing summaries lead to inconsistent results.

  2. Scalability Issues: Evaluating outputs from multiple models is time-consuming.

  3. Lack of Metrics: No systematic metrics for evaluating summary quality.

Solution Provided by Future AGI

Future AGI offers an automated evaluation framework to overcome these challenges. Here’s how:

1. Generating Summaries from Multiple Models

Meeting transcripts were processed through various AI models to obtain summaries. These summaries formed the input for the evaluation framework.
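
The original post does not show this step in code. As a minimal sketch, the per-model summaries could be collected into a long-format pandas DataFrame with one row per (model, transcript) pair; the `summarize_with_model` helper and the model names below are hypothetical placeholders for whatever model APIs the organization actually calls. This `data` DataFrame is what the evaluation snippets further down consume.

import pandas as pd

# Hypothetical helper: stands in for the real summarization model/API call.
def summarize_with_model(model_name: str, transcript: str) -> str:
    # Placeholder implementation; replace with the actual model call.
    return f"[{model_name}] summary of: {transcript[:80]}..."

MODELS = ["model_a", "model_b", "model_c"]  # illustrative model names
transcripts = ["Transcript of the weekly planning meeting ..."]  # raw meeting transcripts

rows = []
for model_name in MODELS:
    for transcript in transcripts:
        rows.append({
            "model_name": model_name,
            "transcript": transcript,
            "summary": summarize_with_model(model_name, transcript),
        })

# `data` is the DataFrame used by the evaluation snippets below.
data = pd.DataFrame(rows)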

2. Initializing Future AGI’s Evaluator Client

The evaluator client enables automated evaluation of model outputs based on pre-defined metrics.

Code Snippet:

from fi.evals import EvalClient
from getpass import getpass

evaluator = EvalClient(
    fi_api_key=getpass("Enter your API Key: "),
    fi_secret_key=getpass("Enter your Secret Key: "),
    fi_api_url="<https://dev.api.futureagi.com>"
)
print("Evaluator client initialized successfully!")

3. Defining Evaluation Criteria

The summaries were evaluated using deterministic tests based on the following criteria:

  • Relevance: Does the summary cover the key points?

  • Coherence: Is the summary logically structured?

  • Brevity: Is the summary concise without losing critical information?

  • Coverage: Does the summary include all important topics?

In addition to these per-criterion checks, Future AGI provides a SummaryQuality metric that assesses the overall quality of a summary in a single score.

Code Snippet:

EVALUATION_CRITERIA = {
    "Relevance": "Evaluate whether the summary covers all the key points discussed in the meeting.",
    "Coherence": "Evaluate whether the summary is logically structured and easy to follow.",
    "Brevity": "Evaluate whether the summary is concise and free from unnecessary details.",
    "Coverage": "Evaluate whether the summary includes all important topics discussed."
}

4. Performing Evaluation

Each summary was evaluated against the criteria using Future AGI’s deterministic evaluation module and the SummaryQuality metric.

Code Snippet:

import pandas as pd

from fi.evals import Deterministic
from fi.testcases import MLLMTestCase

class SummaryTestCase(MLLMTestCase):
    transcript: str
    summary: str

complete_results = {}
for criterion, description in EVALUATION_CRITERIA.items():
    eval_task = Deterministic(config={
        "multi_choice": False,
        "choices": ["Good", "Poor"],
        "rule_prompt": f"transcript : {{{{input_key1}}}}, summary : {{{{input_key2}}}}. {description}",
        "input": {
            "input_key1": "transcript",
            "input_key2": "summary"
        }
    })

    results = []
    for index, row in data.iterrows():
        test_case = SummaryTestCase(
            transcript=row['transcript'],
            summary=row['summary']
        )
        result = evaluator.evaluate([eval_task], [test_case])
        results.append(result.eval_results[0].data[0])

    complete_results[criterion] = results

# Collect the per-criterion verdicts into one table and keep track of which model
# produced each summary (assumes `data` also carries a "model_name" column).
results_df = pd.DataFrame(complete_results)
results_df["model_name"] = data["model_name"].values

from fi.testcases import TestCase
from fi.evals.templates import SummaryQuality

# Accumulates one average-quality row per summary column (i.e. per model).
combined_results = []

def evaluate_summary_quality(dataset, summary_column_name):
    scores = []

    for _, row in dataset.iterrows():
        test_case = TestCase(
            input=row["source"],
            output=row[summary_column_name],
        )
        template = SummaryQuality(config={"check_internet": False})
        response = evaluator.evaluate(eval_templates=[template], inputs=[test_case])

        score = response.eval_results[0].metrics[0].value
        scores.append(score)

    average_score = sum(scores) / len(scores) if scores else 0

    combined_results.append({
        "Summary Column": summary_column_name,
        "Avg. Summary Quality": average_score
    })
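
The post does not show this function being invoked. A minimal usage sketch, assuming `dataset` holds a `source` column plus one summary column per model (the column names below are illustrative), could look like this:

# Illustrative summary column names; replace with the actual columns in your dataset.
summary_columns = ["model_a_summary", "model_b_summary", "model_c_summary"]

for column in summary_columns:
    evaluate_summary_quality(dataset, column)

summary_quality_df = pd.DataFrame(combined_results)
print(summary_quality_df.sort_values("Avg. Summary Quality", ascending=False))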

5. Comparing Models

The evaluation scores for each model’s summaries were aggregated to identify the best-performing model.

Code Snippet:

def get_majority(row):
    # Majority verdict ("Good" / "Poor") across the four criteria for one summary.
    frequency = row.value_counts()
    return frequency.idxmax()

criteria_columns = list(EVALUATION_CRITERIA.keys())
results_df["Majority Evaluation"] = results_df[criteria_columns].apply(get_majority, axis=1)

# Rank models by how many of their summaries received a "Good" majority verdict.
model_scores = (
    results_df[results_df["Majority Evaluation"] == "Good"]
    .groupby("model_name")["Majority Evaluation"]
    .count()
)
best_model = model_scores.idxmax()
print(f"Best Performing Model: {best_model}")

Results and Dashboard Integration

The evaluation results were visualized on the Future AGI dashboard, showcasing:

  • Distribution of evaluation scores for each model.

  • Comparison of metrics like relevance, coherence, brevity, and coverage.

  • Identification of the top-performing model.
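
The dashboard produces these views directly. As a quick local approximation (not part of the Future AGI SDK, and assuming `results_df` from the earlier snippet), the per-model verdict distribution can also be plotted with pandas and matplotlib:

import matplotlib.pyplot as plt

# Good/Poor verdict counts per model, mirroring the dashboard's per-model breakdown.
verdict_counts = pd.crosstab(results_df["model_name"], results_df["Majority Evaluation"])
verdict_counts.plot(kind="bar", title="Majority verdict per model")
plt.ylabel("Number of summaries")
plt.tight_layout()
plt.show()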

Key Outcomes

  1. Streamlined Evaluation: Reduced manual effort in assessing summaries by 90%.

  2. Objective Scoring: Eliminated subjectivity with consistent metrics.

  3. Improved Summarization Quality: Enabled data-driven selection of the best model.

  4. Scalability: Evaluated outputs from multiple models efficiently.

Conclusion

Future AGI’s evaluation framework revolutionizes the assessment of AI-generated meeting summaries. By automating the evaluation process and providing actionable insights, it empowers organizations to select the most effective models, enhance communication, and achieve measurable improvements in productivity.
