AVE Agentic Vulnerability Enumeration

Scoring

How AVE scores every behavioral classification record — the AIVSS v0.8 formula, the 10 AARF amplification factors, and worked examples from real records.

AVE does not invent its own scoring system. Every record is scored using OWASP AIVSS v0.8 — the AI Vulnerability Severity Scoring standard. AVE implements AIVSS; it does not own or modify it. This page documents how that scoring works inside an AVE record, so a researcher writing a new record or a reviewer checking an existing one can verify the number is right.

The formula

AIVSS = ((CVSS_Base + AARS) / 2) × ThM × Mitigation_Factor

Four inputs, each explained below:

| Term | Range | What it is |
|---|---|---|
| CVSS_Base | 0–10 | Standard CVSS 4.0 base score for the underlying flaw, independent of agentic context. Stored as aivss.cvss_base. |
| AARS | 0–10 | Agentic Amplification & Reachability Score: the sum of the 10 AARF factors below. Stored as aivss.aars. |
| ThM | 0.5–1.5 | Threat & Heuristic Multiplier: how real-world the threat is. Stored as aivss.thm. |
| Mitigation_Factor | 0–1 | How much available mitigation reduces real-world risk. Stored as aivss.mitigation_factor. |
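
As a minimal sketch, the formula can be written as a small Python function. The name `aivss_score` and the decision to reject out-of-range inputs are illustrative assumptions, not part of AIVSS or of any AVE tooling; the ranges themselves come from the inputs table. The function returns the unrounded value.

```python
def aivss_score(cvss_base: float, aars: float, thm: float, mitigation_factor: float) -> float:
    """Return the unrounded AIVSS value for one record (illustrative helper)."""
    # Ranges are taken from the inputs table; raising on violations is an assumption.
    if not 0.0 <= cvss_base <= 10.0:
        raise ValueError("cvss_base must be in [0, 10]")
    if not 0.0 <= aars <= 10.0:
        raise ValueError("aars must be in [0, 10]")
    if not 0.5 <= thm <= 1.5:
        raise ValueError("thm must be in [0.5, 1.5]")
    if not 0.0 <= mitigation_factor <= 1.0:
        raise ValueError("mitigation_factor must be in [0, 1]")
    # Average the traditional and agentic scores, then apply both multipliers.
    return ((cvss_base + aars) / 2) * thm * mitigation_factor


# Hypothetical inputs, not taken from any published record.
print(aivss_score(cvss_base=8.0, aars=6.0, thm=1.0, mitigation_factor=1.0))  # 7.0
```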

Why average CVSS_Base and AARS instead of using AARS alone

AIVSS does not replace CVSS — it extends it. Averaging CVSS_Base with AARS means a class with a low traditional severity but high agentic amplification (or the reverse) lands in the middle rather than at either extreme. A record is never scored purely on how “agentic” it is.
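
To make the averaging concrete, the sketch below compares two hypothetical classes with mirrored inputs. The values are invented for illustration, and ThM and Mitigation_Factor are left out (treated as 1.0) so that only the averaging step is visible.

```python
def averaged_base(cvss_base: float, aars: float) -> float:
    """Midpoint of the traditional and agentic scores, before any multiplier."""
    return (cvss_base + aars) / 2


# Hypothetical inputs, not taken from any published record.
low_cvss_high_aars = averaged_base(cvss_base=3.0, aars=9.0)
high_cvss_low_aars = averaged_base(cvss_base=9.0, aars=3.0)

# Both classes land in the middle rather than at either extreme.
print(low_cvss_high_aars, high_cvss_low_aars)  # 6.0 6.0
```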

AARS — the 10 AARF factors

AARS is the sum of 10 Agentic Amplification and Risk Factors (AARF), each scored 0.0 (not applicable) to 1.0 (fully applicable). They live in aivss.aarf as an optional breakdown object.

| Factor | Score 1.0 when… |
|---|---|
| autonomy | the agent acts without human confirmation |
| tool_use | the component grants access to external tools or APIs |
| multi_agent | the attack chains across multiple agents |
| non_determinism | behavior varies unpredictably across runs |
| self_modification | the component can alter its own instructions at runtime |
| dynamic_identity | the component assumes roles or personas |
| persistent_memory | state is retained across sessions |
| natural_language_input | instructions are delivered via natural language |
| data_access | the component reads sensitive data (files, env, databases) |
| external_dependencies | the component loads remote code or content |

Intermediate values (0.5) are used when a factor partially applies. AARS is simply the sum of all 10 — a record where every factor scores 1.0 has an AARS of 10.
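
The summation can be sketched as follows. The factor names match the table above; the helper name `aars_from_aarf`, the choice to treat a missing factor as 0.0, and the example breakdown are assumptions made for illustration.

```python
AARF_FACTORS = (
    "autonomy", "tool_use", "multi_agent", "non_determinism", "self_modification",
    "dynamic_identity", "persistent_memory", "natural_language_input",
    "data_access", "external_dependencies",
)


def aars_from_aarf(aarf: dict[str, float]) -> float:
    """Sum the 10 AARF factors; a missing factor counts as 0.0 (an assumption)."""
    unknown = set(aarf) - set(AARF_FACTORS)
    if unknown:
        raise ValueError(f"unknown AARF factors: {sorted(unknown)}")
    for name, value in aarf.items():
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be in [0.0, 1.0], got {value}")
    return sum(aarf.get(name, 0.0) for name in AARF_FACTORS)


# Hypothetical breakdown, not taken from any published record.
example = {"autonomy": 1.0, "tool_use": 1.0, "natural_language_input": 1.0, "data_access": 0.5}
print(aars_from_aarf(example))  # 3.5
```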

ThM — Threat & Heuristic Multiplier

ThM reflects how real the threat is right now, independent of severity. It is set by the researcher authoring the record, based on observed evidence.

| ThM | When to use it |
|---|---|
| 0.75 | Theoretical: no known proof of concept |
| 0.90 | A working proof of concept exists |
| 1.0 | Exploited in the wild, or weaponised |
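
One way to keep ThM choices consistent across records is a lookup keyed by evidence level. This is only a sketch: the `Evidence` enum and its member names are hypothetical labels for the three rows above, and the mapping covers only the three listed values.

```python
from enum import Enum


class Evidence(Enum):
    """Hypothetical labels for the three evidence levels in the ThM table."""
    THEORETICAL = "theoretical"        # no known proof of concept
    PROOF_OF_CONCEPT = "poc"           # a working proof of concept exists
    IN_THE_WILD = "in_the_wild"        # exploited in the wild, or weaponised


THM_BY_EVIDENCE = {
    Evidence.THEORETICAL: 0.75,
    Evidence.PROOF_OF_CONCEPT: 0.90,
    Evidence.IN_THE_WILD: 1.0,
}


def thm_for(evidence: Evidence) -> float:
    """Return the ThM value listed for the observed evidence level."""
    return THM_BY_EVIDENCE[evidence]


print(thm_for(Evidence.PROOF_OF_CONCEPT))  # 0.9
```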

Severity bands

The final AIVSS score maps to a severity band. A record's severity must agree with its aivss.aivss_score:

| Band | AIVSS range |
|---|---|
| CRITICAL | ≥ 9.0 |
| HIGH | 7.0–8.9 |
| MEDIUM | 4.0–6.9 |
| LOW | < 4.0 |
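
The band lookup and the agreement check can be sketched as below. The function names are hypothetical, and the code assumes the score has already been rounded to one decimal place, so the ranges above leave no gaps.

```python
def severity_band(aivss_score: float) -> str:
    """Map a one-decimal AIVSS score to its severity band."""
    if aivss_score >= 9.0:
        return "CRITICAL"
    if aivss_score >= 7.0:
        return "HIGH"
    if aivss_score >= 4.0:
        return "MEDIUM"
    return "LOW"


def severity_agrees(severity: str, aivss_score: float) -> bool:
    """True when a record's declared severity matches the band for its score."""
    return severity == severity_band(aivss_score)


print(severity_band(9.2), severity_band(3.7))  # CRITICAL LOW
print(severity_agrees("HIGH", 6.9))            # False
```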

Worked example — AVE-2026-00046

AVE's only CRITICAL-severity record: MCP tool hook hijacking. Here is its real aivss object and how the final score was derived.

Step 1: inputs
CVSS_Base = 10.0
AARS = 8.5 (sum of AARF factors below)
ThM = 1.0 · Mitigation_Factor = 1.0

Step 2: AARF factors (sum = AARS)
autonomy 1.0 · tool_use 1.0 · multi_agent 0.5 · non_determinism 0.5 · self_modification 1.0
dynamic_identity 1.0 · persistent_memory 0.5 · natural_language_input 1.0 · data_access 1.0 · external_dependencies 1.0
= 8.5 total: full tool interception capability with external exfiltration

Step 3: compose
((10.0 + 8.5) / 2) × 1.0 × 1.0 = 9.25 → rounds to 9.2 · CRITICAL

AVE-2026-00046 — MCP tool hook hijacking. Real values from the published record.
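
A reviewer could re-derive this score with a short check like the one below. The values are the ones listed for AVE-2026-00046 on this page. The flat dictionary layout and the helper name `recompute` are assumptions; the Schema page remains the reference for the real field shapes. The rounding rule is also an assumption: the page does not state one, and round-half-even is simply one rule that reproduces 9.25 → 9.2.

```python
from decimal import Decimal, ROUND_HALF_EVEN


def recompute(aivss: dict) -> Decimal:
    """Recompute the one-decimal AIVSS score from an aivss object."""
    d = {k: Decimal(str(aivss[k])) for k in ("cvss_base", "aars", "thm", "mitigation_factor")}
    raw = (d["cvss_base"] + d["aars"]) / 2 * d["thm"] * d["mitigation_factor"]
    # Assumed rounding mode: half-even reproduces the 9.25 -> 9.2 shown above.
    return raw.quantize(Decimal("0.1"), rounding=ROUND_HALF_EVEN)


# Values from the worked example; the dict layout itself is illustrative.
record_aivss = {
    "cvss_base": 10.0, "aars": 8.5, "thm": 1.0, "mitigation_factor": 1.0,
    "aarf": {
        "autonomy": 1.0, "tool_use": 1.0, "multi_agent": 0.5, "non_determinism": 0.5,
        "self_modification": 1.0, "dynamic_identity": 1.0, "persistent_memory": 0.5,
        "natural_language_input": 1.0, "data_access": 1.0, "external_dependencies": 1.0,
    },
    "aivss_score": 9.2,
}

# Check 1: the AARF breakdown sums to the stored AARS.
aarf_sum = sum(Decimal(str(v)) for v in record_aivss["aarf"].values())
print(aarf_sum == Decimal(str(record_aivss["aars"])))  # True

# Check 2: the recomputed score matches the stored aivss_score.
print(recompute(record_aivss))  # 9.2
```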

Worked example — AVE-2026-00014

The lowest-severity record in the set: false authority claim via trust escalation. A useful contrast: a meaningful CVSS_Base alone does not guarantee a high final score.

Step 1: inputs
CVSS_Base = 6.5
AARS = 5.5 (sum of AARF factors below)
ThM = 0.75 · Mitigation_Factor = 0.83

Step 2: AARF factors (sum = AARS)
autonomy 0.5 · tool_use 0.5 · multi_agent 1.0 · non_determinism 1.0 · self_modification 0
dynamic_identity 1.0 · persistent_memory 0.5 · natural_language_input 1.0 · data_access 0 · external_dependencies 0
= 5.5 total: social engineering, amplified by multi-agent and dynamic identity

Step 3: compose
((6.5 + 5.5) / 2) × 0.75 × 0.83 = 3.735 → rounds to 3.7 · LOW

AVE-2026-00014 — false authority claim via trust escalation. ThM 0.75 (theoretical, no PoC) and a Mitigation_Factor below 1.0 both pull the score down despite meaningful CVSS_Base and AARS.
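
To see how much each multiplier contributes, the sketch below applies them one at a time to the AVE-2026-00014 inputs listed above. The variable names are illustrative; `Decimal` is used only to keep the intermediate values exact.

```python
from decimal import Decimal

# Inputs as listed in the worked example for AVE-2026-00014.
cvss_base, aars = Decimal("6.5"), Decimal("5.5")
thm, mitigation_factor = Decimal("0.75"), Decimal("0.83")

averaged = (cvss_base + aars) / 2                   # 6.0, would be MEDIUM on its own
after_thm = averaged * thm                          # 4.5, still MEDIUM
after_mitigation = after_thm * mitigation_factor    # 3.735, drops below the 4.0 LOW boundary

for label, value in [
    ("averaged", averaged),
    ("after ThM", after_thm),
    ("after mitigation", after_mitigation),
]:
    print(f"{label:<18}{value:.3f}")
```

In this record neither multiplier alone moves the score out of MEDIUM; the band changes only once both are applied.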

The invariant that matters most

confidence is never part of this calculation and never appears in an AVE record. AIVSS answers “how bad would this be if it fires”; confidence answers “how sure is the scanner that it fired.” The two are separate fields on a scanner Finding, computed independently. See Architecture for the full declares-vs-assigns contract.
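
The separation can be pictured with two small data classes. Both shapes are hypothetical and heavily trimmed, and the 0–1 confidence scale is an assumption; see Architecture and Schema for the real contracts. The point of the sketch is only that the record carries no confidence, while a Finding carries the score and the confidence side by side without one feeding into the other.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AveRecord:
    """Hypothetical, trimmed record: severity data only, no confidence field."""
    ave_id: str
    aivss_score: float
    severity: str


@dataclass(frozen=True)
class Finding:
    """Hypothetical scanner finding: score and confidence are separate fields."""
    ave_id: str
    aivss_score: float   # copied from the record: how bad it would be if it fires
    confidence: float    # set by the scanner: how sure it is that it fired


def make_finding(record: AveRecord, confidence: float) -> Finding:
    # The score comes straight from the record; confidence never adjusts it.
    return Finding(record.ave_id, record.aivss_score, confidence)


record = AveRecord("AVE-2026-00046", 9.2, "CRITICAL")
weak = make_finding(record, confidence=0.3)
strong = make_finding(record, confidence=0.95)

print(weak.aivss_score == strong.aivss_score)  # True
```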

Where this lives in a record

Field-level reference for every part of the aivss object — required/optional status and types — is on the Schema page. This page explains how the numbers are derived; Schema documents the field shapes that hold them.

OWASP AIVSS v0.8 specification: aivss.owasp.org