Models

What Is a Parameter?

A parameter is a value a machine-learning model learns from training data. In a neural network these are the weights and biases applied at every layer; in a linear regression they are the coefficients of each feature. The complete set of parameters is what the model “knows.” Parameters are distinct from hyperparameters, which the engineer sets before training. Parameter count drives model size, memory footprint, inference latency, and the cost of fine-tuning. FutureAGI tracks parameter-related metadata such as model version, adapter id, and base model on every trace.
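
To make the definition concrete, here is a minimal sketch (PyTorch is used purely for illustration) that counts the parameters of a toy network:

import torch.nn as nn

# A toy two-layer network: every weight and bias below is a learned parameter.
model = nn.Sequential(
    nn.Linear(128, 64),  # 128*64 weights + 64 biases = 8,256 parameters
    nn.ReLU(),           # activations carry no learned parameters
    nn.Linear(64, 1),    # 64*1 weights + 1 bias = 65 parameters
)

print(sum(p.numel() for p in model.parameters()))  # 8321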

Why Parameters Matter in Production LLM and Agent Systems

Parameter count is the single number that most directly predicts a model’s compute and memory profile. A 70B LLM does not run on a single A10 GPU; a 7B variant does. Parameter count also shapes behavior: smaller models trade some quality for faster, cheaper, more predictable inference. This tradeoff drives every routing, fallback, and cost decision in a 2026 production stack.
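
The memory side is back-of-envelope arithmetic: weights alone take parameter count times bytes per parameter. A minimal sketch (weights only, ignoring KV cache, activations, and runtime overhead):

def weight_memory_gb(params_billions: float, bytes_per_param: float) -> float:
    # Weights only; KV cache, activations, and runtime overhead come on top.
    return params_billions * 1e9 * bytes_per_param / 1024**3

print(weight_memory_gb(7, 2))   # ~13.0 GB: a 7B model at FP16 fits a 24 GB A10
print(weight_memory_gb(70, 2))  # ~130.4 GB: a 70B model at FP16 needs multiple GPUs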

The pain shows up across roles. ML engineers picking a model see token cost rise nonlinearly with parameter count when context windows grow. SREs see p99 latency spike when traffic shifts to a larger variant during an outage failover. Compliance teams need to know whether a fine-tuned variant changed parameter values broadly (full fine-tune) or narrowly (LoRA adapter) before approving release. Product teams feel parameter choices through quality-vs-cost ratios that determine pricing.

Agentic stacks add another wrinkle. A planner step might run on a large model for accuracy while routine tool-calling steps run on a smaller model for cost. The parameter count of each step is now a routing decision, not a model decision. Which adapter is loaded, which base model serves the route, and which quantization is in effect all change the effective parameter signature. Without per-trace metadata, regressions cannot be attributed to a specific configuration.
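
A hypothetical per-step routing table makes the point; every name and setting below is illustrative, not part of any FutureAGI API:

# Illustrative only: each agent step carries its own parameter signature.
STEP_MODELS = {
    "plan":      {"base_model": "large-70b", "adapter": None,            "quantization": "fp16"},
    "tool_call": {"base_model": "small-7b",  "adapter": "tools-lora-v2", "quantization": "int8"},
}

def model_for_step(step_type: str) -> dict:
    # Fall back to the cheap variant for unrecognized step types.
    return STEP_MODELS.get(step_type, STEP_MODELS["tool_call"])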

How FutureAGI Surfaces Parameter Metadata

FutureAGI does not train model weights, so it does not change parameter values. The honest connection is metadata: every traceAI span can carry model.version, adapter.id, base_model, and quantization metadata, plus llm.token_count.prompt and llm.token_count.completion for the cost side of parameter choices. fi.datasets.Dataset records the full configuration so a regression can be attributed to a specific parameter signature.
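
A minimal sketch of attaching that metadata, assuming traceAI spans follow standard OpenTelemetry conventions (all attribute values are illustrative):

from opentelemetry import trace

tracer = trace.get_tracer("support-agent")

# Attach the full parameter signature so a regression can later be
# attributed to a specific configuration.
with tracer.start_as_current_span("llm.generate") as span:
    span.set_attribute("model.version", "support-7b-v3")
    span.set_attribute("adapter.id", "lora-tickets-aug")
    span.set_attribute("base_model", "base-7b")
    span.set_attribute("quantization", "int8")
    span.set_attribute("llm.token_count.prompt", 412)
    span.set_attribute("llm.token_count.completion", 87)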

A practical example: a team runs a support-agent route across two model variants — a base 7B and a LoRA adapter trained on tickets. Both variants log model.version, adapter.id, prompt version, and llm.token_count.prompt to traceAI. After a week, the team builds a Dataset from sampled production traces, attaches Groundedness, AnswerRelevancy, and JSONValidation, and groups the eval results by (model.version, adapter.id). The output: a per-variant scorecard that says exactly which parameter signature is winning. Compared with a vague “the new model is better” claim, this is reproducible. Agent Command Center can then route traffic toward the winning variant or set a model fallback for spikes.
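
The grouping step itself is ordinary dataframe work; a sketch with illustrative column names and scores:

import pandas as pd

# One row per sampled trace; columns and scores are illustrative.
rows = [
    {"model.version": "7b-v3", "adapter.id": "none",         "groundedness": 0.81, "answer_relevancy": 0.78},
    {"model.version": "7b-v3", "adapter.id": "lora-tickets", "groundedness": 0.88, "answer_relevancy": 0.84},
]
df = pd.DataFrame(rows)

# Per-variant scorecard keyed by the parameter signature.
print(df.groupby(["model.version", "adapter.id"]).mean())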

For pure parameter-count questions (“does scaling from 7B to 13B help our task?”), FutureAGI’s role is to make the difference measurable at the user-visible level, not to predict the gain.

How to Measure or Detect Parameter-Driven Effects

Compare variants on the exact same dataset and the exact same evaluator suite.

  • Per-variant evaluator deltas — Groundedness, AnswerRelevancy, TaskCompletion, JSONValidation scores broken down by (model.version, adapter.id).
  • Token cost — llm.token_count.prompt and llm.token_count.completion summed and weighted by the model’s price per token.
  • Latency distribution — p50, p90, p99 across each variant; larger parameter counts often shift the right tail more than the median.
  • Memory pressure signals — kv-cache usage, batch size, queue depth on the inference engine.
  • Cohort consistency — does the larger variant only win on rare or hard cohorts? If so, route by cohort instead of globally.
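
The snippet below runs one evaluator from that suite against a single variant’s response: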
from fi.evals import AnswerRelevancy

# Example inputs; in practice both come from a sampled production trace.
user_query = "How do I rotate my API key?"
variant_b_response = "Open Settings > API Keys and click Regenerate."

evaluator = AnswerRelevancy()
result = evaluator.evaluate(
    input=user_query,
    output=variant_b_response,
)
print(result.score, result.reason)

Common Mistakes

  • Conflating parameters with hyperparameters. They are different concepts; mixing them in incident reports confuses root cause.
  • Reading parameter count as quality. A well-trained 7B model can beat a poorly trained 70B on a specific task; measure, don’t assume.
  • Tracking only the base model. Adapters change effective parameters; missing adapter.id makes comparisons meaningless.
  • Ignoring quantization. A quantized 70B is a different parameter signature than the FP16 70B; treat it as a separate variant.
  • Skipping cost-per-quality math. A 5% quality lift at 4x token cost may not be worth shipping; the sketch below walks the math.
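
The cost-per-quality check is simple arithmetic; a worked sketch with illustrative numbers:

# Illustrative numbers: is a 5% quality lift worth 4x the token cost?
baseline  = {"quality": 0.80, "cost_per_1k": 1.00}
candidate = {"quality": 0.84, "cost_per_1k": 4.00}

lift = candidate["quality"] / baseline["quality"] - 1            # 0.05
cost_ratio = candidate["cost_per_1k"] / baseline["cost_per_1k"]  # 4.0

print(f"{lift:.0%} quality lift at {cost_ratio:.0f}x cost")
# Ships only if the product can price that lift in; otherwise route by cohort.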

Frequently Asked Questions

What is a parameter in machine learning?

A parameter is a numerical value a model learns from training data, such as the weights and biases inside a neural network. The set of parameters is what the model “knows.”

What is the difference between a parameter and a hyperparameter?

Parameters are learned by training; hyperparameters are set by the engineer before training, such as learning rate, batch size, and number of layers. Hyperparameters control how parameters are learned.

How does parameter count affect production?

More parameters generally mean more capacity but higher memory, latency, and cost. FutureAGI surfaces this via traces with `llm.token_count.prompt`, model version, and route-level latency comparisons.