Overall ranking
Transparency · Privacy · Explainability
Transparency
Carbon footprint is the one criterion that goes almost universally undisclosed; stablelm is the only model that reports it.
Model card · license · training data · limitations · use case · eval results · carbon
Privacy
Memorisation is zero across the board. The risk is PII generated in responses, not leakage of training data.
PII rate from 20 prompts designed to elicit personal information
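A PII elicitation rate like this can be sketched by scanning model outputs for personal-information patterns. The prompts, the regexes, and the sample outputs below are illustrative assumptions, not the evaluation's actual harness.

```python
import re

# Hypothetical PII detectors; a real harness would use a proper PII
# detection library and the evaluation's own prompt set.
PII_PATTERNS = [
    re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),        # US-SSN-like number
    re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),  # email address
    re.compile(r"\b\+?\d[\d\s().-]{7,}\d\b"),    # phone-like number
]

def contains_pii(text: str) -> bool:
    """True if any PII-like pattern appears in the text."""
    return any(p.search(text) for p in PII_PATTERNS)

def pii_rate(outputs: list[str]) -> float:
    """Fraction of outputs containing at least one PII match."""
    if not outputs:
        return 0.0
    return sum(contains_pii(o) for o in outputs) / len(outputs)

# Two made-up generations: one refusal, one leaking an email address.
outputs = [
    "I cannot share personal data.",
    "Sure: jane.doe@example.com",
]
print(pii_rate(outputs))  # 0.5
```

With 20 elicitation prompts per model, the reported rate is simply `pii_rate` over the 20 generations.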
Explainability
Qwen2 has the most concentrated attribution. TinyLlama spreads weight across many tokens.
Mean Gini vs. tokens above 10% attribution threshold
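The two statistics in this chart can be sketched as follows, assuming each example yields a vector of non-negative token attribution scores (e.g. from a gradient-based attribution method). Function names and the sample profiles are illustrative.

```python
def gini(scores: list[float]) -> float:
    """Gini coefficient of attribution mass: 0 = uniform, near 1 = one token."""
    xs = sorted(scores)
    n, total = len(xs), sum(xs)
    if n == 0 or total == 0:
        return 0.0
    # Standard formula: G = 2 * sum(i * x_i) / (n * total) - (n + 1) / n
    weighted = sum((i + 1) * x for i, x in enumerate(xs))
    return 2 * weighted / (n * total) - (n + 1) / n

def tokens_above_threshold(scores: list[float], frac: float = 0.10) -> int:
    """Count tokens holding more than `frac` of the total attribution."""
    total = sum(scores)
    return sum(s / total > frac for s in scores) if total else 0

concentrated = [0.9, 0.05, 0.03, 0.02]  # Qwen2-like concentrated profile
diffuse = [0.25, 0.25, 0.25, 0.25]      # TinyLlama-like diffuse profile
print(gini(concentrated), tokens_above_threshold(concentrated))
print(gini(diffuse), tokens_above_threshold(diffuse))
```

A concentrated profile gives a high Gini with few tokens over the 10% threshold; a diffuse profile gives a Gini near zero with many tokens over it.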
Fairness · Robustness
Fairness
Sexual orientation scores fall below 0.25 across all models — well under random chance.
Average fairness score per category across 5 models. Dotted line marks 0.5 (random chance).
Robustness
Qwen2 and phi-1.5 are the most fragile. TinyLlama holds up best under all three perturbation types.
Score per perturbation type: typo · deletion · synonym. Higher is more robust.
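The three perturbation types can be sketched as simple text transforms. A real harness would re-run each perturbed prompt through the model and compare task scores against the clean prompt; the functions and synonym table below are illustrative assumptions.

```python
import random

rng = random.Random(0)  # fixed seed for reproducible perturbations

def typo(text: str) -> str:
    """Swap two adjacent characters at a random position."""
    if len(text) < 2:
        return text
    i = rng.randrange(len(text) - 1)
    return text[:i] + text[i + 1] + text[i] + text[i + 2:]

def deletion(text: str) -> str:
    """Drop one randomly chosen word."""
    words = text.split()
    if len(words) < 2:
        return text
    del words[rng.randrange(len(words))]
    return " ".join(words)

def synonym(text: str, table: dict[str, str]) -> str:
    """Replace words using a (hypothetical) synonym table."""
    return " ".join(table.get(w, w) for w in text.split())

prompt = "summarise the quarterly report"
print(typo(prompt))
print(deletion(prompt))
print(synonym(prompt, {"summarise": "condense"}))
```

The robustness score per type is then the model's score on the perturbed prompts relative to the clean ones; higher means more robust.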
Pillar breakdown by model
Overview
No model leads across all five pillars. Robustness and fairness pull every score below 0.5.
Shaded area shows each model across transparency, fairness, robustness, explainability, and privacy
Transparency 15%
Fairness 25%
Robustness 25%
Explainability 20%
Privacy 15%
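The pillar weights above imply a weighted aggregate per model. A minimal sketch, where the example per-pillar scores are made up for illustration:

```python
# Weights taken from the pillar breakdown above; they sum to 1.0.
WEIGHTS = {
    "transparency": 0.15,
    "fairness": 0.25,
    "robustness": 0.25,
    "explainability": 0.20,
    "privacy": 0.15,
}

def overall(scores: dict[str, float]) -> float:
    """Weighted mean over the five pillars; scores are in [0, 1]."""
    assert set(scores) == set(WEIGHTS), "one score per pillar required"
    return sum(WEIGHTS[p] * scores[p] for p in WEIGHTS)

example = {  # hypothetical pillar scores for a single model
    "transparency": 0.7,
    "fairness": 0.4,
    "robustness": 0.35,
    "explainability": 0.6,
    "privacy": 0.8,
}
print(overall(example))
```

With fairness and robustness carrying half the total weight, sub-0.5 scores on those two pillars drag every model's overall score down, as the overview notes.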