
Health Scoring

Buoy summarizes design-system health with a 0-100 score based on four weighted pillars: value discipline, token health, consistency, and critical issues.

Highlights: 4 pillars, CI-gateable, actionable suggestions, framework-aware.

The score is designed to be fast to scan and hard to game. It blends direct drift signals (like hardcoded values, unused tokens, semantic mismatches, and critical accessibility issues) with density-based penalties so larger codebases are evaluated proportionally.

The scanner is the single source of truth for health scores. Whether you're viewing the score in a PR comment, the dashboard, or the CLI, it all comes from the same calculation produced during the scan. This eliminates drift between different views of your design system health.

CLI Presentation (Proposed)

This is a design target for the human-readable buoy show health output. It uses the real score model fields (score, tier, pillars, suggestions, metrics) in a more scan-friendly format.


$ buoy show health

Buoy Health Score
────────────────
Score      88/100  Great
Threshold  PASS (failBelow: 70)
Trend      +4 since last scan
Meter      [██████████████████░░] 88

Pillars (weighted)
  valueDiscipline  60/60  ████████████████████
  tokenHealth       8/20  ████████░░░░░░░░░░░░
  consistency      10/10  ████████████████████
  criticalIssues   10/10  ████████████████████

Signals
  totalDriftCount: 0
  tokenCount: 0
  componentCount: 105
  highDensityFileCount: 0

Suggestion
  Tailwind detected but token coverage is low
  - extend your theme config with custom values
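The proposed layout maps directly onto the score model fields. The sketch below is a hypothetical TypeScript shape for those fields (the interface names and the `renderBar` helper are illustrative, not Buoy's actual types) and shows how the 20-cell bars in the mock-up can be derived by rounding `value / max`.

```typescript
// Hypothetical shape of the health score model. Field names follow the ones
// listed above (score, tier, pillars, suggestions, metrics); types are assumed.
type Tier = "Great" | "Good" | "OK" | "Needs Work";

interface PillarScores {
  valueDiscipline: number; // out of 60
  tokenHealth: number;     // out of 20
  consistency: number;     // out of 10
  criticalIssues: number;  // out of 10
}

interface HealthMetrics {
  totalDriftCount: number;
  tokenCount: number;
  componentCount: number;
  highDensityFileCount: number;
}

interface HealthScore {
  score: number;
  tier: Tier;
  pillars: PillarScores;
  suggestions: string[];
  metrics: HealthMetrics;
}

// Render a fixed-width bar: the number of filled cells is value/max of the
// width, rounded to the nearest cell and clamped to [0, width].
function renderBar(value: number, max: number, width = 20): string {
  const ratio = max > 0 ? Math.min(Math.max(value / max, 0), 1) : 0;
  const filled = Math.round(ratio * width);
  return "█".repeat(filled) + "░".repeat(width - filled);
}

// Sample values copied from the proposed output above.
const sample: HealthScore = {
  score: 88,
  tier: "Great",
  pillars: { valueDiscipline: 60, tokenHealth: 8, consistency: 10, criticalIssues: 10 },
  suggestions: ["Tailwind detected but token coverage is low"],
  metrics: { totalDriftCount: 0, tokenCount: 0, componentCount: 105, highDensityFileCount: 0 },
};

console.log(`Meter [${renderBar(sample.score, 100)}] ${sample.score}`);
// Meter [██████████████████░░] 88
console.log(`tokenHealth ${sample.pillars.tokenHealth}/20 ${renderBar(sample.pillars.tokenHealth, 20)}`);
```

Rounding to the nearest cell is what makes 88/100 show 18 filled cells and 8/20 show 8, matching the mock-up.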

Score Ranges

80-100 Great: Design system adoption is strong. Most drift is minor cleanup work.

60-79 Good: Healthy baseline with visible drift. Prioritize trend control and token adoption.

40-59 OK: Drift is material. You should plan cleanup and CI guardrails before it compounds.

0-39 Needs Work: High drift density or critical issues. Focus on stabilization before feature velocity suffers.
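The range boundaries can be expressed as a small lookup. A minimal sketch, assuming integer or fractional scores in 0-100 and inclusive lower bounds as in the table (the `tierFor` name is hypothetical):

```typescript
// Hypothetical helper mapping a 0-100 health score to the tier names above.
type Tier = "Great" | "Good" | "OK" | "Needs Work";

function tierFor(score: number): Tier {
  if (!Number.isFinite(score) || score < 0 || score > 100) {
    throw new RangeError(`score must be within 0-100, got ${score}`);
  }
  // Check from the top down so each lower bound is inclusive.
  if (score >= 80) return "Great";
  if (score >= 60) return "Good";
  if (score >= 40) return "OK";
  return "Needs Work";
}

console.log(tierFor(88)); // Great
console.log(tierFor(74)); // Good
```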

How Scores Are Calculated

Buoy uses a weighted 4-pillar model. Each pillar draws from one or more drift types and supporting metrics. The weights intentionally favor day-to-day maintainability (value discipline) while still surfacing systemic issues.

Value Discipline (60 pts)

Hardcoded value density is the primary factor, with dead code and total drift density as backstops.

hardcoded-value
unused-component
orphaned-component
repeated-pattern

Token Health (20 pts)

Measures token system presence, coverage, and usage, with penalties for token drift and mismatches.

unused-token
orphaned-token
value-divergence
framework/library detection + token counts

Consistency (10 pts)

Captures naming and semantic consistency, plus framework sprawl signals that increase entropy.

naming-inconsistency
semantic-mismatch
framework-sprawl

Critical Issues (10 pts)

Prioritizes urgent failures and technical debt, with small penalties for documentation gaps.

accessibility-conflict
color-contrast
deprecated-pattern
missing-documentation
high hardcoded-density files
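Because each pillar is bounded by its point budget, the total is the sum of four capped values. A minimal sketch, assuming pillar scores are plain numbers clamped to their maximums before summing (the names `PILLAR_MAX` and `totalScore` are hypothetical; whether Buoy rounds at this step is not documented here):

```typescript
// Maximum points per pillar, as listed above. They sum to 100.
const PILLAR_MAX = {
  valueDiscipline: 60,
  tokenHealth: 20,
  consistency: 10,
  criticalIssues: 10,
} as const;

type Pillar = keyof typeof PILLAR_MAX;
type PillarScores = Record<Pillar, number>;

// Clamp each pillar to [0, max] so no pillar can exceed its weight,
// then add them up to get the 0-100 total.
function totalScore(pillars: PillarScores): number {
  let total = 0;
  for (const pillar of Object.keys(PILLAR_MAX) as Pillar[]) {
    const max = PILLAR_MAX[pillar];
    total += Math.min(Math.max(pillars[pillar], 0), max);
  }
  return total;
}

console.log(totalScore({ valueDiscipline: 42, tokenHealth: 15, consistency: 9, criticalIssues: 8 })); // 74
```

Capping per pillar is what keeps the weights meaningful: an excellent Token Health result cannot compensate for more than its 20 points.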

Pillar Deep Dives

Use these sections when you need to explain why a score moved. Each pillar has different inputs, thresholds, and improvement levers.

Value Discipline (60 points)

Primary pillar

This is the largest pillar because hardcoded values and drift density are the strongest signals of maintenance cost. Buoy scores this primarily from hardcoded value density per component, then adds dead-code pressure and a total-drift backstop.

Direct Inputs

hardcoded-value (primary density input)
unused-component
orphaned-component
repeated-pattern
totalDriftCount (density backstop)

What Lowers It Fast

Many arbitrary values in a small number of components
High-concentration drift in a few files
Dead UI components that inflate code surface area

Best fix order: clean high-density files first, then extract repeated values into tokens/components.
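To find where to start, it helps to compute the primary density signal and rank files by concentration. A minimal sketch, assuming per-file hardcoded-value counts are available; the `FileDrift` shape, the threshold of 10, and the file names are all hypothetical and do not reflect Buoy's real cutoff:

```typescript
// Hypothetical per-file input: how many hardcoded values the scan found in each file.
interface FileDrift {
  file: string;
  hardcodedValues: number;
}

// Hardcoded values per component: the primary density input for this pillar.
function hardcodedDensity(hardcodedCount: number, componentCount: number): number {
  return componentCount > 0 ? hardcodedCount / componentCount : 0;
}

// Illustrative threshold only; not Buoy's actual high-density cutoff.
const HIGH_DENSITY_THRESHOLD = 10;

// Files at or above the threshold, sorted worst-first: a "fix these first" list.
function highDensityFiles(files: FileDrift[], threshold = HIGH_DENSITY_THRESHOLD): FileDrift[] {
  return files
    .filter((f) => f.hardcodedValues >= threshold)
    .sort((a, b) => b.hardcodedValues - a.hardcodedValues);
}

const files: FileDrift[] = [
  { file: "Button.tsx", hardcodedValues: 3 },
  { file: "Checkout.tsx", hardcodedValues: 14 },
  { file: "Header.tsx", hardcodedValues: 11 },
];

console.log(hardcodedDensity(30, 50)); // 0.6
console.log(highDensityFiles(files).map((f) => f.file).join(", ")); // Checkout.tsx, Header.tsx
```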

Token Health (20 points)

Adoption + correctness

Token Health measures whether you have a usable token system and whether components actually consume it. It now also penalizes token drift directly, not just unused tokens.

Direct Inputs

tokenCount
unused-token
orphaned-token
value-divergence (small penalty)
Framework/library presence (tailwind, DS libs)

How It Works

Credits token coverage (do tokens exist at all?)
Credits token usage ratio (how many are actually used)
Adds framework-aware partial credit when tokens are implicit
Subtracts for token drift / mismatched canonical values

Watch for false confidence: many tokens defined but unused usually means stale tokens or missing wiring in components.
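The four steps under How It Works can be read as credits followed by penalties, clamped to the pillar's 20 points. A minimal sketch of that structure; every weight below is invented for illustration, so results will not match Buoy's real numbers, and the `TokenInputs` field names are hypothetical:

```typescript
// Hypothetical inputs for the Token Health pillar.
interface TokenInputs {
  tokenCount: number;
  unusedTokens: number;            // unused-token drift
  orphanedTokens: number;          // orphaned-token drift
  valueDivergences: number;        // value-divergence drift
  implicitTokenFramework: boolean; // e.g. Tailwind detected
}

const MAX_POINTS = 20;

// Sketch of the steps listed above. All weights are assumptions.
function tokenHealth(t: TokenInputs): number {
  let score = 0;
  if (t.tokenCount > 0) {
    score += 8; // coverage: a token system exists at all
    const used = Math.max(t.tokenCount - t.unusedTokens, 0);
    score += 12 * (used / t.tokenCount); // usage ratio
  } else if (t.implicitTokenFramework) {
    score += 8; // framework-aware partial credit when tokens are implicit
  }
  // Token drift penalties; value-divergence kept deliberately small.
  score -= 0.5 * t.orphanedTokens + 0.25 * t.valueDivergences;
  return Math.round(Math.min(Math.max(score, 0), MAX_POINTS));
}
```

Note how the usage-ratio term captures the false-confidence case: many defined but unused tokens earn the coverage credit yet lose most of the usage credit.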

Consistency (10 points)

Naming + semantics

Consistency is intentionally small but sensitive. It catches entropy that slows teams down even when visual drift is not severe yet. Framework sprawl now counts here because mixed stacks often create naming and usage inconsistency.

Direct Inputs

naming-inconsistency
semantic-mismatch
framework-sprawl (weighted as entropy)

Typical Symptoms

isDisabled vs disabled vs inactive
Semantic tokens/components used inconsistently
Multiple styling systems competing in the same UI surface

Best fix order: standardize naming conventions first, then reduce framework overlap in new code paths.
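One way to picture how a symptom like `isDisabled` vs `disabled` might be spotted: normalize prop names by stripping a boolean prefix and lowercasing, then flag groups with more than one spelling. This is a hypothetical heuristic, not Buoy's analyzer, and the function names are illustrative. It cannot catch synonyms such as `inactive`, which would need a curated synonym list.

```typescript
// Strip a leading boolean prefix (is/has/should) when it is followed by an
// uppercase letter, then lowercase: "isDisabled" -> "disabled".
function canonicalPropName(name: string): string {
  return name.replace(/^(is|has|should)(?=[A-Z])/, "").toLowerCase();
}

// Group prop names by canonical form and keep only groups with 2+ spellings.
function findNamingVariants(propNames: string[]): Map<string, string[]> {
  const groups = new Map<string, Set<string>>();
  for (const name of propNames) {
    const key = canonicalPropName(name);
    if (!groups.has(key)) groups.set(key, new Set());
    groups.get(key)!.add(name);
  }
  const variants = new Map<string, string[]>();
  for (const [key, names] of groups) {
    if (names.size > 1) variants.set(key, [...names].sort());
  }
  return variants;
}

console.log(findNamingVariants(["isDisabled", "disabled", "inactive", "size"]));
```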

Critical Issues (10 points)

Urgency + safety

This pillar protects against severe regressions. It combines critical-severity drift with technical debt signals and a small documentation penalty so teams don’t ignore usability and onboarding gaps.

Direct Inputs

criticalCount (severity-driven, e.g. contrast/accessibility)
deprecated-pattern
missing-documentation (small scaled penalty)
highDensityFileCount (amplifies maintenance risk)

How To Use It

CI should fail fast on critical issues regardless of total score
Use deprecated-pattern counts to prioritize migrations
Treat missing docs as a product-team productivity issue

CI recommendation: combine --fail-on critical with health.failBelow for layered enforcement.
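The direct inputs for this pillar combine severity, debt, and a deliberately small documentation term. A minimal sketch of one way to wire them together; the penalty weights, the logarithmic docs scaling, and the 10% amplification per high-density file are all assumptions made for illustration:

```typescript
// Hypothetical inputs for the Critical Issues pillar.
interface CriticalInputs {
  criticalCount: number;        // critical-severity drift, e.g. contrast/accessibility
  deprecatedPatterns: number;   // deprecated-pattern drift
  missingDocs: number;          // missing-documentation drift
  highDensityFileCount: number; // amplifies maintenance risk
}

const MAX_POINTS = 10;

function criticalIssues(c: CriticalInputs): number {
  const criticalPenalty = 3 * c.criticalCount; // assumed weight
  // High-density files amplify the debt penalty instead of adding their own.
  const amplifier = 1 + 0.1 * c.highDensityFileCount;
  const debtPenalty = 1 * c.deprecatedPatterns * amplifier;
  // Small, scaled docs penalty: grows logarithmically and is capped at 1 point.
  const docsPenalty = Math.min(Math.log10(1 + c.missingDocs), 1);
  const score = MAX_POINTS - criticalPenalty - debtPenalty - docsPenalty;
  return Math.round(Math.min(Math.max(score, 0), MAX_POINTS));
}
```

Even in this sketch, a handful of critical issues zeroes the pillar but only costs 10 of 100 total points, which is why the CI recommendation above still gates on critical drift separately.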

Drift Type Coverage Matrix

Health scoring now has explicit direct coverage for every drift type. The table below shows which pillar each drift type feeds. Some pillars also use non-drift metrics (token counts, framework detection, density heuristics).

Drift Type	Pillar	Coverage
`hardcoded-value`	Value Discipline	Direct
`unused-component`	Value Discipline	Direct
`orphaned-component`	Value Discipline	Direct
`repeated-pattern`	Value Discipline	Direct
`unused-token`	Token Health	Direct
`orphaned-token`	Token Health	Direct
`value-divergence`	Token Health	Direct
`naming-inconsistency`	Consistency	Direct
`semantic-mismatch`	Consistency	Direct
`framework-sprawl`	Consistency	Direct
`accessibility-conflict`	Critical Issues	Direct
`color-contrast`	Critical Issues	Direct
`deprecated-pattern`	Critical Issues	Direct
`missing-documentation`	Critical Issues	Direct
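The matrix above can also be kept as data, which makes it easy to tally drift signals per pillar. A minimal sketch; the object mirrors the table row for row, while the `DRIFT_PILLAR` and `countByPillar` names are hypothetical:

```typescript
// One entry per row of the coverage matrix above (drift type -> pillar).
const DRIFT_PILLAR = {
  "hardcoded-value": "valueDiscipline",
  "unused-component": "valueDiscipline",
  "orphaned-component": "valueDiscipline",
  "repeated-pattern": "valueDiscipline",
  "unused-token": "tokenHealth",
  "orphaned-token": "tokenHealth",
  "value-divergence": "tokenHealth",
  "naming-inconsistency": "consistency",
  "semantic-mismatch": "consistency",
  "framework-sprawl": "consistency",
  "accessibility-conflict": "criticalIssues",
  "color-contrast": "criticalIssues",
  "deprecated-pattern": "criticalIssues",
  "missing-documentation": "criticalIssues",
} as const;

type DriftType = keyof typeof DRIFT_PILLAR;
type Pillar = (typeof DRIFT_PILLAR)[DriftType];

// Tally drift signals per pillar. Non-drift metrics (token counts, framework
// detection, density heuristics) are intentionally not part of this count.
function countByPillar(driftTypes: DriftType[]): Record<Pillar, number> {
  const counts: Record<Pillar, number> = {
    valueDiscipline: 0,
    tokenHealth: 0,
    consistency: 0,
    criticalIssues: 0,
  };
  for (const type of driftTypes) {
    counts[DRIFT_PILLAR[type]] += 1;
  }
  return counts;
}

console.log(countByPillar(["hardcoded-value", "color-contrast", "semantic-mismatch"]));
```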

Scoring Flow

Scan + Analyze Drift

Buoy scans components/tokens, then generates drift signals across all enabled analyzers.

Build Health Metrics

Counts drift types, framework context, token/component totals, and density heuristics.

Score 4 Pillars

Each pillar gets a bounded score and contributes to the final 0-100 total.
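The Build Health Metrics step turns the raw drift list into counts and densities that the pillars consume. A minimal sketch, assuming a simple drift signal shape; only the `critical` severity is confirmed by the docs (via `--fail-on critical`), while `info`/`warning`, the field names, and `buildMetrics` itself are assumptions:

```typescript
// Hypothetical drift signal shape produced by the analyzers.
type Severity = "info" | "warning" | "critical";

interface DriftSignal {
  type: string; // e.g. "hardcoded-value"
  severity: Severity;
  file: string;
}

interface HealthMetrics {
  totalDriftCount: number;
  criticalCount: number;
  countsByType: Record<string, number>;
  driftPerComponent: number; // density, so larger codebases are judged proportionally
}

function buildMetrics(drifts: DriftSignal[], componentCount: number): HealthMetrics {
  const countsByType: Record<string, number> = {};
  let criticalCount = 0;
  for (const d of drifts) {
    countsByType[d.type] = (countsByType[d.type] ?? 0) + 1;
    if (d.severity === "critical") criticalCount += 1;
  }
  return {
    totalDriftCount: drifts.length,
    criticalCount,
    countsByType,
    driftPerComponent: componentCount > 0 ? drifts.length / componentCount : 0,
  };
}
```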

Example Calculation

Inputs

50 components
30 hardcoded values
40 tokens (6 unused)
2 semantic mismatches
1 deprecated pattern

Pillar Scores

valueDiscipline = 42 / 60
tokenHealth     = 15 / 20
consistency     =  9 / 10
criticalIssues  =  8 / 10
─────────────────────────
total           = 74 / 100 (Good)

Improving Your Score

Quick Wins

Fix high-density files first to improve value discipline quickly.
Wire unused tokens into real components or delete stale tokens.
Resolve semantic mismatches (naming and meaning) before they spread.

Long-Term Strategy

Gate CI with health.failBelow to prevent regressions.
Track score trends after major UI migrations/refactors.
Reduce framework sprawl to improve consistency and onboarding.

Score in CI

Gate PRs on health score using health.failBelow in your config:

# .buoy.yaml
health:
  failBelow: 70    # Exit code 1 if health score < 70

Then run the check in CI:

buoy drift check

Combine score gating with drift severity gating for layered protection:

# Fail on any critical drift, or if the health score drops below health.failBelow (70)
buoy drift check --fail-on critical

The --json output includes a health object when health.failBelow is configured:

$ buoy drift check --json
{
  "drifts": [...],
  "health": {
    "score": 68,
    "threshold": 70,
    "passed": false
  }
}
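If you post-process the report (for example, to write a PR comment), only the health fields shown above are needed. A minimal sketch, assuming the JSON matches the example shape; the `DriftCheckJson` type and `summarizeHealth` name are hypothetical:

```typescript
// Only the fields shown in the --json example are modeled; others are ignored.
interface DriftCheckJson {
  drifts: unknown[];
  health?: { score: number; threshold: number; passed: boolean };
}

function summarizeHealth(raw: string): string {
  const result = JSON.parse(raw) as DriftCheckJson;
  if (!result.health) return "health gate not configured";
  const { score, threshold, passed } = result.health;
  return passed
    ? `PASS: score ${score} meets threshold ${threshold}`
    : `FAIL: score ${score} is below threshold ${threshold}`;
}

const report = '{"drifts": [], "health": {"score": 68, "threshold": 70, "passed": false}}';
console.log(summarizeHealth(report)); // FAIL: score 68 is below threshold 70
```

Relying on the `passed` flag rather than re-comparing `score` and `threshold` keeps the summary consistent with the scanner, which is the single source of truth for the score.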

Viewing Score Details

$ buoy show health

Health Score: 74/100 (Good)

Value Discipline  42/60
Token Health      15/20
Consistency        9/10
Critical Issues    8/10

Top suggestion:
  6 tokens defined but unused — wire them into components or remove stale definitions

Related

buoy show health: Generate a health report
buoy drift check: Use in CI pipelines
Drift Detection: Drift types Buoy tracks
