
Health Scoring

Buoy summarizes design-system health with a 0-100 score based on four weighted pillars: value discipline, token health, consistency, and critical issues.

4 pillars · CI gateable · Actionable suggestions · Framework-aware

The score is designed to be fast to scan and hard to game. It blends direct drift signals (such as hardcoded values, unused tokens, semantic mismatches, and critical accessibility issues) with density-based penalties. As a result, larger codebases are evaluated in proportion to their size rather than by raw issue counts.

CLI Presentation (Proposed)

This is a design target for the human-readable buoy show health output. It uses the existing score model fields (score, tier, pillars, suggestions, metrics) but lays them out for faster scanning.

$ buoy show health
Buoy Health Score
────────────────
Score 88/100 Great
Threshold PASS (failBelow: 70)
Trend +4 since last scan
Meter [██████████████████░░] 88
Pillars (weighted)
valueDiscipline  60/60  ████████████████████
tokenHealth       8/20  ████████░░░░░░░░░░░░
consistency      10/10  ████████████████████
criticalIssues   10/10  ████████████████████
Signals
totalDriftCount: 0
tokenCount: 0
componentCount: 105
highDensityFileCount: 0
Suggestion
Tailwind detected but token coverage is low
- extend your theme config with custom values
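The meter bars in this mock-up scale a score onto a fixed number of cells. The following is a minimal TypeScript sketch of that rendering. It assumes a 20-cell bar with round-to-nearest. The renderMeter name is hypothetical and is not part of Buoy's code.

```typescript
// Hypothetical helper: draws a fixed-width text meter like the mock-up above.
// Assumption: partial cells are rounded to the nearest whole cell.
const FILLED = "█";
const EMPTY = "░";

function renderMeter(value: number, max: number, width = 20): string {
  // Clamp the ratio to [0, 1] so out-of-range input never breaks the bar.
  const ratio = max > 0 ? Math.min(Math.max(value / max, 0), 1) : 0;
  const filled = Math.round(ratio * width);
  return FILLED.repeat(filled) + EMPTY.repeat(width - filled);
}

// Reproduce the overall meter and the tokenHealth row from the sample output.
console.log(`Meter [${renderMeter(88, 100)}] 88`);
console.log(`tokenHealth 8/20 ${renderMeter(8, 20)}`);
```

With this rounding, 88/100 fills 18 of 20 cells and 8/20 fills 8 cells, which matches the bars in the mock-up.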

Score Ranges

  • Great (80-100): Design system adoption is strong. Most drift is minor cleanup work.
  • Good (60-79): Healthy baseline with visible drift. Prioritize trend control and token adoption.
  • OK (40-59): Drift is material. Plan cleanup and CI guardrails before it compounds.
  • Needs Work (0-39): High drift density or critical issues. Focus on stabilization before feature velocity suffers.
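The score ranges above translate directly into a lookup. This TypeScript sketch encodes the documented boundaries and tier names. The tierFor function is a hypothetical name, not Buoy's API.

```typescript
// Tier names and boundaries come from the documented score ranges.
type Tier = "Great" | "Good" | "OK" | "Needs Work";

function tierFor(score: number): Tier {
  // Reject anything outside the 0-100 scale instead of guessing a tier.
  if (!Number.isFinite(score) || score < 0 || score > 100) {
    throw new RangeError(`score must be within 0-100, got ${score}`);
  }
  if (score >= 80) return "Great";
  if (score >= 60) return "Good";
  if (score >= 40) return "OK";
  return "Needs Work";
}

console.log(tierFor(88)); // Great
console.log(tierFor(74)); // Good
```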

How Scores Are Calculated

Buoy uses a weighted 4-pillar model. Each pillar draws from one or more drift types and supporting metrics. The weights intentionally favor day-to-day maintainability (value discipline) while still surfacing systemic issues.

Value Discipline (60 pts)

Hardcoded value density is the primary factor, with dead code and total drift density as backstops.

  • hardcoded-value
  • unused-component
  • orphaned-component
  • repeated-pattern

Token Health (20 pts)

Measures token system presence, coverage, and usage, with penalties for token drift and mismatches.

  • unused-token
  • orphaned-token
  • value-divergence
  • framework/library detection + token counts

Consistency (10 pts)

Captures naming and semantic consistency, plus framework sprawl signals that increase entropy.

  • naming-inconsistency
  • semantic-mismatch
  • framework-sprawl

Critical Issues (10 pts)

Prioritizes urgent failures and technical debt, with small penalties for documentation gaps.

  • accessibility-conflict
  • color-contrast
  • deprecated-pattern
  • missing-documentation
  • high hardcoded-density files
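Because the pillars are weighted unequally, you can see where points remain by comparing each pillar score with its maximum. The following TypeScript sketch uses the documented weights (60/20/10/10). The PILLAR_MAX and headroom names are hypothetical.

```typescript
// Documented pillar weights; together they form the 100-point scale.
const PILLAR_MAX = {
  valueDiscipline: 60,
  tokenHealth: 20,
  consistency: 10,
  criticalIssues: 10,
};

type Pillar = keyof typeof PILLAR_MAX;
type PillarScores = Record<Pillar, number>;

// Points still available in each pillar, largest gap first.
function headroom(scores: PillarScores): Array<[Pillar, number]> {
  const pillars = Object.keys(PILLAR_MAX) as Pillar[];
  return pillars
    .map((p): [Pillar, number] => [p, PILLAR_MAX[p] - scores[p]])
    .sort((a, b) => b[1] - a[1]);
}

// Scores from the proposed CLI output: only tokenHealth has room to grow.
console.log(headroom({ valueDiscipline: 60, tokenHealth: 8, consistency: 10, criticalIssues: 10 }));
```

In this sample, every other pillar is maxed out, so tokenHealth's 12 missing points are the only lever left. That is consistent with the Tailwind token-coverage suggestion in the sample.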

Pillar Deep Dives

Use these sections when you need to explain why a score moved. Each pillar has different inputs, thresholds, and improvement levers.

Value Discipline (60 points)

Primary pillar

This is the largest pillar because hardcoded values and drift density are the strongest signals of maintenance cost. Buoy scores this primarily from hardcoded value density per component, then adds dead-code pressure and a total-drift backstop.

Direct Inputs

  • hardcoded-value (primary density input)
  • unused-component
  • orphaned-component
  • repeated-pattern
  • totalDriftCount (density backstop)

What Lowers It Fast

  • Many arbitrary values in a small number of components
  • High-concentration drift in a few files
  • Dead UI components that inflate code surface area
Best fix order: clean high-density files first, then extract repeated values into tokens/components.
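The two ideas in this pillar, density per component and "clean high-density files first", can be sketched in a few lines of TypeScript. This page does not document how density maps to points, so the sketch stops at the density value and a cleanup ordering. The FileDrift shape and the function names are hypothetical.

```typescript
// Hypothetical per-file summary; Buoy's internal data model may differ.
interface FileDrift {
  path: string;
  hardcodedValues: number;
}

// Density normalizes by codebase size: 30 values across 50 components and
// 300 values across 500 components both give 0.6 per component.
function hardcodedDensity(hardcodedValues: number, componentCount: number): number {
  return componentCount > 0 ? hardcodedValues / componentCount : 0;
}

// Order files so the most concentrated drift is cleaned up first.
function cleanupOrder(files: FileDrift[]): string[] {
  return files
    .slice()
    .sort((a, b) => b.hardcodedValues - a.hardcodedValues)
    .map((f) => f.path);
}

console.log(hardcodedDensity(30, 50)); // 0.6
console.log(cleanupOrder([
  { path: "src/Card.tsx", hardcodedValues: 3 },
  { path: "src/Hero.tsx", hardcodedValues: 19 },
  { path: "src/Button.tsx", hardcodedValues: 8 },
]));
```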

Token Health (20 points)

Adoption + correctness

Token Health measures whether you have a usable token system and whether components actually consume it. It now also penalizes token drift directly (orphaned tokens and value divergence), not just unused tokens.

Direct Inputs

  • tokenCount
  • unused-token
  • orphaned-token
  • value-divergence (small penalty)
  • Framework/library presence (Tailwind, design-system libraries)

How It Works

  • Credits token coverage (do tokens exist at all?)
  • Credits token usage ratio (how many are actually used)
  • Adds framework-aware partial credit when tokens are implicit
  • Subtracts for token drift / mismatched canonical values
Watch for false confidence: many tokens defined but unused usually means stale tokens or missing wiring in components.
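The usage-ratio credit can be illustrated with the numbers from the worked example on this page (40 tokens, 6 unused). This TypeScript sketch only computes the ratio, because the conversion into Token Health points is not documented here. The TokenStats shape and function name are hypothetical.

```typescript
// Hypothetical input shape for the usage-ratio calculation.
interface TokenStats {
  tokenCount: number;
  unusedTokens: number;
}

// Share of defined tokens that components actually reference.
function tokenUsageRatio({ tokenCount, unusedTokens }: TokenStats): number {
  if (tokenCount === 0) return 0; // no tokens defined: nothing to credit
  const used = Math.max(tokenCount - unusedTokens, 0);
  return used / tokenCount;
}

console.log(tokenUsageRatio({ tokenCount: 40, unusedTokens: 6 })); // 0.85
```

A large tokenCount paired with a low ratio is the "false confidence" pattern described above.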

Consistency (10 points)

Naming + semantics

Consistency is intentionally small but sensitive. It catches entropy that slows teams down even when visual drift is not severe yet. Framework sprawl now counts here because mixed stacks often create naming and usage inconsistency.

Direct Inputs

  • naming-inconsistency
  • semantic-mismatch
  • framework-sprawl (weighted as entropy)

Typical Symptoms

  • isDisabled vs disabled vs inactive
  • Semantic tokens/components used inconsistently
  • Multiple styling systems competing in the same UI surface
Best fix order: standardize naming conventions first, then reduce framework overlap in new code paths.
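To make the isDisabled vs disabled vs inactive symptom concrete, here is a hypothetical TypeScript check that groups prop names by meaning. Buoy's actual naming analyzer is not described on this page. The prefix rule and the synonym table below are illustrative assumptions.

```typescript
// Illustrative synonym table (assumption): alternate words -> canonical meaning.
const SYNONYMS: Record<string, string> = {
  inactive: "disabled",
};

// Canonical form: drop a leading boolean "is"/"has" prefix, lowercase,
// then apply the synonym table.
function canonicalProp(name: string): string {
  const lower = name.replace(/^(is|has)(?=[A-Z])/, "").toLowerCase();
  return Object.prototype.hasOwnProperty.call(SYNONYMS, lower) ? SYNONYMS[lower] : lower;
}

// Group prop names by canonical meaning; keep only groups spelled more than one way.
function namingVariants(props: string[]): Record<string, string[]> {
  const groups: Record<string, string[]> = Object.create(null);
  for (const prop of props) {
    const key = canonicalProp(prop);
    const bucket = groups[key] ?? (groups[key] = []);
    if (bucket.indexOf(prop) === -1) bucket.push(prop);
  }
  const variants: Record<string, string[]> = {};
  for (const key of Object.keys(groups)) {
    if (groups[key].length > 1) variants[key] = groups[key].slice().sort();
  }
  return variants;
}

// Groups isDisabled, disabled and inactive under "disabled"; "size" is consistent.
console.log(namingVariants(["isDisabled", "disabled", "inactive", "size"]));
```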

Critical Issues (10 points)

Urgency + safety

This pillar protects against severe regressions. It combines critical-severity drift with technical debt signals and a small documentation penalty so teams don’t ignore usability and onboarding gaps.

Direct Inputs

  • criticalCount (severity-driven, e.g. contrast/accessibility)
  • deprecated-pattern
  • missing-documentation (small scaled penalty)
  • highDensityFileCount (amplifies maintenance risk)

How To Use It

  • CI should fail fast on critical issues regardless of total score
  • Use deprecated-pattern counts to prioritize migrations
  • Treat missing docs as a product-team productivity issue
CI recommendation: combine --fail-on critical with health.failBelow for layered enforcement.
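The layered policy can be written as two independent checks: any critical issue fails the build, and the score is then compared with health.failBelow. This TypeScript sketch mirrors the policy described above. The GateInput/GateResult types and the evaluateGate name are hypothetical and are not Buoy's internals.

```typescript
interface GateInput {
  criticalCount: number;
  score: number;
  failBelow?: number; // value of health.failBelow; undefined means no score gate
}

interface GateResult {
  exitCode: 0 | 1;
  reasons: string[];
}

function evaluateGate({ criticalCount, score, failBelow }: GateInput): GateResult {
  const reasons: string[] = [];
  // Layer 1: critical issues fail the build no matter how high the score is.
  if (criticalCount > 0) reasons.push(`${criticalCount} critical issue(s)`);
  // Layer 2: the score must not fall below the configured threshold.
  if (failBelow !== undefined && score < failBelow) {
    reasons.push(`health score ${score} is below ${failBelow}`);
  }
  return { exitCode: reasons.length > 0 ? 1 : 0, reasons };
}

console.log(evaluateGate({ criticalCount: 1, score: 92, failBelow: 70 })); // fails on layer 1
console.log(evaluateGate({ criticalCount: 0, score: 68, failBelow: 70 })); // fails on layer 2
```

Keeping the layers separate means a high score never masks a critical regression.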

Drift Type Coverage Matrix

Health scoring now has explicit direct coverage for every drift type. The table below shows which pillar each drift type feeds. Some pillars also use non-drift metrics (token counts, framework detection, density heuristics).

Drift Type              Pillar            Coverage
hardcoded-value         Value Discipline  Direct
unused-component        Value Discipline  Direct
orphaned-component      Value Discipline  Direct
repeated-pattern        Value Discipline  Direct
unused-token            Token Health      Direct
orphaned-token          Token Health      Direct
value-divergence        Token Health      Direct
naming-inconsistency    Consistency       Direct
semantic-mismatch       Consistency       Direct
framework-sprawl        Consistency       Direct
accessibility-conflict  Critical Issues   Direct
color-contrast          Critical Issues   Direct
deprecated-pattern      Critical Issues   Direct
missing-documentation   Critical Issues   Direct
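The coverage matrix can be encoded as a lookup table and used to tally drift signals per pillar. The drift type strings and pillar names in this TypeScript sketch come from the matrix. DRIFT_PILLAR and countByPillar are hypothetical names.

```typescript
type Pillar = "Value Discipline" | "Token Health" | "Consistency" | "Critical Issues";

// Drift type -> pillar, exactly as listed in the coverage matrix.
const DRIFT_PILLAR: Record<string, Pillar> = {
  "hardcoded-value": "Value Discipline",
  "unused-component": "Value Discipline",
  "orphaned-component": "Value Discipline",
  "repeated-pattern": "Value Discipline",
  "unused-token": "Token Health",
  "orphaned-token": "Token Health",
  "value-divergence": "Token Health",
  "naming-inconsistency": "Consistency",
  "semantic-mismatch": "Consistency",
  "framework-sprawl": "Consistency",
  "accessibility-conflict": "Critical Issues",
  "color-contrast": "Critical Issues",
  "deprecated-pattern": "Critical Issues",
  "missing-documentation": "Critical Issues",
};

// Count drift signals per pillar; types not in the matrix are ignored.
function countByPillar(driftTypes: string[]): Record<Pillar, number> {
  const counts: Record<Pillar, number> = {
    "Value Discipline": 0,
    "Token Health": 0,
    "Consistency": 0,
    "Critical Issues": 0,
  };
  for (const type of driftTypes) {
    if (Object.prototype.hasOwnProperty.call(DRIFT_PILLAR, type)) {
      counts[DRIFT_PILLAR[type]] += 1;
    }
  }
  return counts;
}

console.log(countByPillar(["hardcoded-value", "hardcoded-value", "unused-token", "color-contrast"]));
```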

Scoring Flow

1. Scan + Analyze Drift: Buoy scans components and tokens, then generates drift signals across all enabled analyzers.
2. Build Health Metrics: Counts drift types, framework context, token/component totals, and density heuristics.
3. Score 4 Pillars: Each pillar gets a bounded score and contributes to the final 0-100 total.
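Step 2 is essentially an aggregation over drift signals. This TypeScript sketch covers only the counting part. The field names echo the Signals section of the sample output (totalDriftCount, tokenCount, componentCount) and criticalCount from the Critical Issues pillar. The Drift shape, the "warning" severity label, and buildMetrics are assumptions. Framework context and density heuristics are left out.

```typescript
// Assumed drift shape: only the fields this sketch needs.
interface Drift {
  type: string;     // drift type, e.g. "hardcoded-value"
  severity: string; // e.g. "critical"; other labels here are assumptions
}

interface HealthMetrics {
  totalDriftCount: number;
  criticalCount: number;
  driftCounts: Record<string, number>; // drift type -> occurrences
  tokenCount: number;
  componentCount: number;
}

function buildMetrics(drifts: Drift[], tokenCount: number, componentCount: number): HealthMetrics {
  const driftCounts: Record<string, number> = Object.create(null);
  let criticalCount = 0;
  for (const drift of drifts) {
    driftCounts[drift.type] = (driftCounts[drift.type] ?? 0) + 1;
    if (drift.severity === "critical") criticalCount += 1;
  }
  return { totalDriftCount: drifts.length, criticalCount, driftCounts, tokenCount, componentCount };
}

const metrics = buildMetrics(
  [
    { type: "hardcoded-value", severity: "warning" },
    { type: "hardcoded-value", severity: "warning" },
    { type: "color-contrast", severity: "critical" },
  ],
  40,
  50,
);
console.log(metrics.totalDriftCount, metrics.criticalCount); // 3 1
```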

Example Calculation

Inputs

  • 50 components
  • 30 hardcoded values
  • 40 tokens (6 unused)
  • 2 semantic mismatches
  • 1 deprecated pattern

Pillar Scores

valueDiscipline  = 42 / 60
tokenHealth      = 15 / 20
consistency      =  9 / 10
criticalIssues   =  8 / 10
─────────────────────────
total            = 74 / 100 (Good)
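The total is the sum of the four bounded pillar scores. This TypeScript sketch re-checks that arithmetic for the worked example and for the proposed CLI output. Each pillar is validated against its documented maximum. The names PILLAR_LIMITS and totalScore are hypothetical.

```typescript
interface PillarScores {
  valueDiscipline: number;
  tokenHealth: number;
  consistency: number;
  criticalIssues: number;
}

// Documented per-pillar maximums.
const PILLAR_LIMITS: PillarScores = { valueDiscipline: 60, tokenHealth: 20, consistency: 10, criticalIssues: 10 };

// Sum the pillars, rejecting any score outside its bounded range.
function totalScore(p: PillarScores): number {
  let sum = 0;
  for (const key of Object.keys(PILLAR_LIMITS) as (keyof PillarScores)[]) {
    if (p[key] < 0 || p[key] > PILLAR_LIMITS[key]) {
      throw new RangeError(`${key} must be within 0-${PILLAR_LIMITS[key]}, got ${p[key]}`);
    }
    sum += p[key];
  }
  return sum;
}

const workedExample = { valueDiscipline: 42, tokenHealth: 15, consistency: 9, criticalIssues: 8 };
const cliSample = { valueDiscipline: 60, tokenHealth: 8, consistency: 10, criticalIssues: 10 };
console.log(totalScore(workedExample)); // 74, inside the 60-79 "Good" range
console.log(totalScore(cliSample));     // 88, inside the 80-100 "Great" range
```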

Improving Your Score

Quick Wins

  • Fix high-density files first to improve value discipline quickly.
  • Wire unused tokens into real components or delete stale tokens.
  • Resolve semantic mismatches (naming and meaning) before they spread.

Long-Term Strategy

  • Gate CI with health.failBelow to prevent regressions.
  • Track score trends after major UI migrations/refactors.
  • Reduce framework sprawl to improve consistency and onboarding.

Score in CI

Gate PRs on health score using health.failBelow in your config:

# .buoy.yaml
health:
  failBelow: 70    # Exit code 1 if health score < 70

Then run the check in CI:

buoy drift check

Combine score gating with drift severity gating for layered protection:

# With health.failBelow set in .buoy.yaml, this fails on critical drift
# and also fails if the health score drops below 70
buoy drift check --fail-on critical

The --json output includes a health object when a health threshold (health.failBelow) is configured:

$ buoy drift check --json
{
  "drifts": [...],
  "health": {
    "score": 68,
    "threshold": 70,
    "passed": false
  }
}
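If a custom CI step needs the health result, it can read the JSON report. This TypeScript sketch types only the fields shown in the sample above. The HealthReport interface is an assumption, not a published schema, and summarizeHealth is a hypothetical name.

```typescript
// Only the fields visible in the sample output; the real schema may contain more.
interface HealthReport {
  drifts: unknown[];
  health?: {
    score: number;
    threshold: number;
    passed: boolean;
  };
}

// One-line summary; the health object is absent when scoring is not configured.
function summarizeHealth(json: string): string {
  const report = JSON.parse(json) as HealthReport;
  if (!report.health) return "health: not configured";
  const { score, threshold, passed } = report.health;
  return `health ${score}/100 vs threshold ${threshold}: ${passed ? "PASS" : "FAIL"}`;
}

const sampleReport = '{"drifts": [], "health": {"score": 68, "threshold": 70, "passed": false}}';
console.log(summarizeHealth(sampleReport)); // health 68/100 vs threshold 70: FAIL
```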

Viewing Score Details

$ buoy show health

Health Score: 74/100 (Good)

Value Discipline  42/60
Token Health      15/20
Consistency        9/10
Critical Issues    8/10

Top suggestion:
  6 tokens defined but unused: wire them into components or remove stale definitions

Related