Health Scoring
Buoy summarizes design-system health with a 0-100 score based on four weighted pillars: value discipline, token health, consistency, and critical issues.
The score is designed to be fast to scan and hard to game. It blends direct drift signals (like hardcoded values, unused tokens, semantic mismatches, and critical accessibility issues) with density-based penalties so larger codebases are evaluated proportionally.
CLI Presentation (Proposed)
This is a design target for the human-readable buoy show health output. It uses the real score model fields (score, tier, pillars, suggestions, metrics) in a more scan-friendly format.
Score Ranges
| Score | Tier | Meaning |
|---|---|---|
| 80-100 | Great | Design system adoption is strong. Most drift is minor cleanup work. |
| 60-79 | Good | Healthy baseline with visible drift. Prioritize trend control and token adoption. |
| 40-59 | OK | Drift is material. Plan cleanup and CI guardrails before it compounds. |
| 0-39 | Needs Work | High drift density or critical issues. Focus on stabilization before feature velocity suffers. |
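The ranges above map directly to a lookup. This is a minimal TypeScript sketch; the tierFor function and the Tier type are illustrative names, not part of Buoy's published API.

```typescript
// Hypothetical helper: maps a 0-100 health score to the documented tier labels.
type Tier = "Great" | "Good" | "OK" | "Needs Work";

function tierFor(score: number): Tier {
  // The negated range check also rejects NaN.
  if (!(score >= 0 && score <= 100)) {
    throw new RangeError(`score must be between 0 and 100, got ${score}`);
  }
  // Thresholds follow the Score Ranges table: 80-100, 60-79, 40-59, 0-39.
  if (score >= 80) return "Great";
  if (score >= 60) return "Good";
  if (score >= 40) return "OK";
  return "Needs Work";
}

console.log(tierFor(74)); // Good
```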
How Scores Are Calculated
Buoy uses a weighted 4-pillar model. Each pillar draws from one or more drift types and supporting metrics. The weights intentionally favor day-to-day maintainability (value discipline) while still surfacing systemic issues.
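Using the maximum points listed in the pillar deep dives (60, 20, 10, 10), the final score is the sum of four bounded pillar scores. The sketch below assumes each pillar is clamped to [0, max] before summing; PILLAR_MAX, PillarScores, and totalScore are illustrative names, and how Buoy computes each raw pillar value is not shown here.

```typescript
// Hypothetical sketch: maximum points per pillar, taken from the pillar headings.
const PILLAR_MAX = {
  valueDiscipline: 60,
  tokenHealth: 20,
  consistency: 10,
  criticalIssues: 10,
} as const;

type Pillar = keyof typeof PILLAR_MAX;
type PillarScores = Record<Pillar, number>;

// Clamp a raw pillar score into [0, max] so no pillar can exceed its weight.
function clamp(value: number, max: number): number {
  return Math.min(Math.max(value, 0), max);
}

// The total is the sum of bounded pillar scores, so it always lands in 0-100.
function totalScore(pillars: PillarScores): number {
  let total = 0;
  for (const pillar of Object.keys(PILLAR_MAX) as Pillar[]) {
    total += clamp(pillars[pillar], PILLAR_MAX[pillar]);
  }
  return total;
}

console.log(
  totalScore({ valueDiscipline: 42, tokenHealth: 15, consistency: 9, criticalIssues: 8 }),
); // 74
```

Because the weights sum to 100, bounding each pillar is enough to keep the total in range without a final clamp.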
Value Discipline
Hardcoded value density is the primary factor, with dead code and total drift density as backstops.
Inputs: hardcoded-value, unused-component, orphaned-component, repeated-pattern
Token Health
Measures token system presence, coverage, and usage, with penalties for token drift and mismatches.
Inputs: unused-token, orphaned-token, value-divergence, framework/library detection, token counts
Consistency
Captures naming and semantic consistency, plus framework sprawl signals that increase entropy.
Inputs: naming-inconsistency, semantic-mismatch, framework-sprawl
Critical Issues
Prioritizes urgent failures and technical debt, with small penalties for documentation gaps.
Inputs: accessibility-conflict, color-contrast, deprecated-pattern, missing-documentation, high hardcoded-density files
Pillar Deep Dives
Use these sections when you need to explain why a score moved. Each pillar has different inputs, thresholds, and improvement levers.
Value Discipline (60 points)
Primary pillar
This is the largest pillar because hardcoded values and drift density are the strongest signals of maintenance cost. Buoy scores it primarily from hardcoded value density per component, then adds dead-code pressure and a total-drift backstop.
Direct Inputs
- hardcoded-value (primary density input)
- unused-component
- orphaned-component
- repeated-pattern
- totalDriftCount (density backstop)
What Lowers It Fast
- Many arbitrary values in a small number of components
- High-concentration drift in a few files
- Dead UI components that inflate code surface area
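Density normalizes drift by codebase size: the example calculation's 30 hardcoded values across 50 components is 0.6 per component, while the same 30 values in 5 components would be 6 per component. The sketch below computes that aggregate density and lists files whose hardcoded count exceeds a threshold. The FileDrift shape and the threshold of 5 are assumptions for illustration; Buoy's actual cutoffs are not documented here.

```typescript
// Hypothetical input shape: hardcoded-value drift counted per file.
interface FileDrift {
  file: string;
  hardcodedValues: number;
}

// Aggregate density: hardcoded values per component across the codebase.
function hardcodedDensity(totalHardcoded: number, componentCount: number): number {
  return componentCount > 0 ? totalHardcoded / componentCount : 0;
}

// Assumed threshold: a file with more than this many hardcoded values counts as high-density.
const HIGH_DENSITY_THRESHOLD = 5;

// Return high-density files, worst first, so cleanup starts where it pays off most.
function highDensityFiles(files: FileDrift[], threshold = HIGH_DENSITY_THRESHOLD): string[] {
  return files
    .filter((f) => f.hardcodedValues > threshold)
    .sort((a, b) => b.hardcodedValues - a.hardcodedValues)
    .map((f) => f.file);
}

const files: FileDrift[] = [
  { file: "Button.tsx", hardcodedValues: 2 },
  { file: "Checkout.tsx", hardcodedValues: 14 },
  { file: "Banner.tsx", hardcodedValues: 7 },
];

console.log(hardcodedDensity(30, 50)); // 0.6
console.log(JSON.stringify(highDensityFiles(files))); // ["Checkout.tsx","Banner.tsx"]
```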
Token Health (20 points)
Adoption + correctness
Token Health measures whether you have a usable token system and whether components actually consume it. It now also penalizes token drift directly, not just unused tokens.
Direct Inputs
- tokenCount
- unused-token
- orphaned-token
- value-divergence (small penalty)
- Framework/library presence (tailwind, DS libs)
How It Works
- Credits token coverage (do tokens exist at all?)
- Credits token usage ratio (how many are actually used)
- Adds framework-aware partial credit when tokens are implicit
- Subtracts for token drift / mismatched canonical values
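The coverage and usage credits rest on two simple quantities. In this sketch the usage ratio is used tokens divided by defined tokens, so the example calculation's 40 tokens with 6 unused gives 34 / 40 = 0.85. How Buoy converts these into points, and how much framework partial credit is worth, is not specified here; tokenUsage and its return shape are hypothetical.

```typescript
// Hypothetical summary of token adoption; not Buoy's actual scoring code.
interface TokenUsage {
  hasTokens: boolean; // coverage: does a token system exist at all?
  usedCount: number;
  usageRatio: number; // used / defined, 0 when no tokens are defined
}

function tokenUsage(tokenCount: number, unusedCount: number): TokenUsage {
  const used = Math.max(tokenCount - unusedCount, 0);
  return {
    hasTokens: tokenCount > 0,
    usedCount: used,
    usageRatio: tokenCount > 0 ? used / tokenCount : 0,
  };
}

console.log(JSON.stringify(tokenUsage(40, 6)));
// {"hasTokens":true,"usedCount":34,"usageRatio":0.85}
```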
Consistency (10 points)
Naming + semantics
Consistency is intentionally small but sensitive. It catches entropy that slows teams down even when visual drift is not yet severe. Framework sprawl now counts here because mixed stacks often create naming and usage inconsistency.
Direct Inputs
- naming-inconsistency
- semantic-mismatch
- framework-sprawl (weighted as entropy)
Typical Symptoms
- isDisabled vs disabled vs inactive
- Semantic tokens/components used inconsistently
- Multiple styling systems competing in the same UI surface
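As an illustration of the first symptom, the sketch below groups prop names that mean the same thing and reports any group where more than one spelling is in use. The synonym table is a hand-written assumption for this example; it is not how Buoy's naming-inconsistency analyzer works.

```typescript
// Hypothetical synonym groups: each inner list holds names for the same state.
const SYNONYM_GROUPS: string[][] = [
  ["isDisabled", "disabled", "inactive"],
  ["isOpen", "open", "expanded"],
];

// Return every synonym group where the codebase uses more than one spelling.
function inconsistentNames(propNames: string[]): string[][] {
  const seen = new Set(propNames);
  return SYNONYM_GROUPS
    .map((group) => group.filter((name) => seen.has(name)))
    .filter((used) => used.length > 1);
}

const props = ["isDisabled", "disabled", "inactive", "open", "variant"];
console.log(JSON.stringify(inconsistentNames(props)));
// [["isDisabled","disabled","inactive"]]
```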
Critical Issues (10 points)
Urgency + safety
This pillar protects against severe regressions. It combines critical-severity drift with technical debt signals and a small documentation penalty so teams don't ignore usability and onboarding gaps.
Direct Inputs
- criticalCount (severity-driven, e.g. contrast/accessibility)
- deprecated-pattern
- missing-documentation (small scaled penalty)
- highDensityFileCount (amplifies maintenance risk)
How To Use It
- CI should fail fast on critical issues regardless of total score
- Use deprecated-pattern counts to prioritize migrations
- Treat missing docs as a product-team productivity issue
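Failing fast on critical issues regardless of total score means checking two independent conditions. This sketch assumes you already have the score, an optional threshold analogous to health.failBelow, and a count of critical-severity drift; gateReasons and GateInput are hypothetical names. The CLI route for the same idea is buoy drift check with --fail-on critical, covered under Score in CI.

```typescript
// Hypothetical gate: either condition alone is enough to fail the build.
interface GateInput {
  score: number;         // 0-100 health score
  failBelow?: number;    // optional threshold, analogous to health.failBelow
  criticalCount: number; // number of critical-severity drift signals
}

function gateReasons({ score, failBelow, criticalCount }: GateInput): string[] {
  const reasons: string[] = [];
  if (criticalCount > 0) {
    reasons.push(`${criticalCount} critical issue(s) found`);
  }
  if (failBelow !== undefined && score < failBelow) {
    reasons.push(`health score ${score} is below ${failBelow}`);
  }
  return reasons; // an empty array means the gate passes
}

// A high score does not hide critical issues.
console.log(JSON.stringify(gateReasons({ score: 92, failBelow: 70, criticalCount: 1 })));
// ["1 critical issue(s) found"]
```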
Drift Type Coverage Matrix
Health scoring now has explicit direct coverage for every drift type. The table below shows which pillar each drift type feeds. Some pillars also use non-drift metrics (token counts, framework detection, density heuristics).
| Drift Type | Pillar | Coverage |
|---|---|---|
| hardcoded-value | Value Discipline | Direct |
| unused-component | Value Discipline | Direct |
| orphaned-component | Value Discipline | Direct |
| repeated-pattern | Value Discipline | Direct |
| unused-token | Token Health | Direct |
| orphaned-token | Token Health | Direct |
| value-divergence | Token Health | Direct |
| naming-inconsistency | Consistency | Direct |
| semantic-mismatch | Consistency | Direct |
| framework-sprawl | Consistency | Direct |
| accessibility-conflict | Critical Issues | Direct |
| color-contrast | Critical Issues | Direct |
| deprecated-pattern | Critical Issues | Direct |
| missing-documentation | Critical Issues | Direct |
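The matrix can be encoded as a typed lookup so a script can route each drift signal to its pillar. This TypeScript sketch mirrors the table row for row; DriftType, Pillar, DRIFT_PILLAR, and pillarFor are illustrative names, not Buoy exports.

```typescript
type Pillar = "valueDiscipline" | "tokenHealth" | "consistency" | "criticalIssues";

type DriftType =
  | "hardcoded-value" | "unused-component" | "orphaned-component" | "repeated-pattern"
  | "unused-token" | "orphaned-token" | "value-divergence"
  | "naming-inconsistency" | "semantic-mismatch" | "framework-sprawl"
  | "accessibility-conflict" | "color-contrast" | "deprecated-pattern" | "missing-documentation";

// Record<DriftType, Pillar> makes the compiler reject any drift type left unmapped.
const DRIFT_PILLAR: Record<DriftType, Pillar> = {
  "hardcoded-value": "valueDiscipline",
  "unused-component": "valueDiscipline",
  "orphaned-component": "valueDiscipline",
  "repeated-pattern": "valueDiscipline",
  "unused-token": "tokenHealth",
  "orphaned-token": "tokenHealth",
  "value-divergence": "tokenHealth",
  "naming-inconsistency": "consistency",
  "semantic-mismatch": "consistency",
  "framework-sprawl": "consistency",
  "accessibility-conflict": "criticalIssues",
  "color-contrast": "criticalIssues",
  "deprecated-pattern": "criticalIssues",
  "missing-documentation": "criticalIssues",
};

function pillarFor(type: DriftType): Pillar {
  return DRIFT_PILLAR[type];
}

console.log(pillarFor("color-contrast")); // criticalIssues
```

Typing the map as a Record over the full union is one way to keep "every drift type has direct coverage" true as new drift types are added.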
Scoring Flow
1. Scan + Analyze Drift: Buoy scans components and tokens, then generates drift signals across all enabled analyzers.
2. Build Health Metrics: Buoy counts drift types, framework context, token/component totals, and density heuristics.
3. Score 4 Pillars: Each pillar gets a bounded score and contributes to the final 0-100 total.
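Step 2 amounts to tallying drift signals before any pillar is scored. A minimal sketch, assuming each drift signal carries a type string and a severity; the DriftSignal shape, the severity names other than "critical", and buildMetrics are hypothetical rather than Buoy's API.

```typescript
// Hypothetical drift signal shape produced by step 1.
interface DriftSignal {
  type: string; // e.g. "hardcoded-value"
  severity: "info" | "warning" | "critical";
}

interface HealthMetrics {
  totalDriftCount: number;
  criticalCount: number;
  countsByType: Record<string, number>;
}

// Count drift by type and severity so each pillar can read its own inputs.
function buildMetrics(signals: DriftSignal[]): HealthMetrics {
  const countsByType: Record<string, number> = {};
  let criticalCount = 0;
  for (const s of signals) {
    countsByType[s.type] = (countsByType[s.type] ?? 0) + 1;
    if (s.severity === "critical") criticalCount++;
  }
  return { totalDriftCount: signals.length, criticalCount, countsByType };
}

const metrics = buildMetrics([
  { type: "hardcoded-value", severity: "warning" },
  { type: "hardcoded-value", severity: "warning" },
  { type: "color-contrast", severity: "critical" },
]);
console.log(metrics.totalDriftCount, metrics.criticalCount, metrics.countsByType["hardcoded-value"]);
// 3 1 2
```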
Example Calculation
Inputs
- 50 components
- 30 hardcoded values
- 40 tokens (6 unused)
- 2 semantic mismatches
- 1 deprecated pattern
Pillar Scores
```text
valueDiscipline = 42 / 60
tokenHealth = 15 / 20
consistency = 9 / 10
criticalIssues = 8 / 10
─────────────────────────
total = 74 / 100 (Good)
```

Improving Your Score
Quick Wins
- Fix high-density files first to improve value discipline quickly.
- Wire unused tokens into real components or delete stale tokens.
- Resolve semantic mismatches (naming and meaning) before they spread.
Long-Term Strategy
- Gate CI with health.failBelow to prevent regressions.
- Track score trends after major UI migrations/refactors.
- Reduce framework sprawl to improve consistency and onboarding.
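Tracking trends can be as simple as comparing consecutive runs. This sketch assumes you store one score per release or migration; ScoreSnapshot, regressions, and the default tolerance of 2 points are hypothetical choices, not Buoy features.

```typescript
// Hypothetical record of one health-score run, e.g. saved per release.
interface ScoreSnapshot {
  label: string; // release tag, migration name, or date
  score: number;
}

// Report each step where the score dropped by more than `tolerance` points.
function regressions(history: ScoreSnapshot[], tolerance = 2): string[] {
  const out: string[] = [];
  for (let i = 1; i < history.length; i++) {
    const prev = history[i - 1].score;
    const curr = history[i].score;
    const drop = prev - curr;
    if (drop > tolerance) {
      out.push(`${history[i].label}: -${drop} (${prev} -> ${curr})`);
    }
  }
  return out;
}

console.log(JSON.stringify(regressions([
  { label: "v1.0", score: 74 },
  { label: "v1.1", score: 73 },
  { label: "tailwind-migration", score: 66 },
])));
// ["tailwind-migration: -7 (73 -> 66)"]
```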
Score in CI
Gate PRs on health score using health.failBelow in your config:
```yaml
# .buoy.yaml
health:
  failBelow: 70 # Exit code 1 if health score < 70
```

Then run the check in CI:
```shell
buoy drift check
```

Combine score gating with drift severity gating for layered protection:
```shell
# Fail on critical drift AND if health score drops below 70
buoy drift check --fail-on critical
```

The --json output includes a health object when configured:
```text
$ buoy drift check --json
{
  "drifts": [...],
  "health": {
    "score": 68,
    "threshold": 70,
    "passed": false
  }
}
```

Viewing Score Details
```text
$ buoy show health
Health Score: 74/100 (Good)
Value Discipline 42/60
Token Health 15/20
Consistency 9/10
Critical Issues 8/10
Top suggestion:
6 tokens defined but unused - wire them into components or remove stale definitions
```

Related
- buoy show health: Generate a health report
- buoy drift check: Use in CI pipelines
- Drift Detection: Drift types Buoy tracks