How to Measure Technical Debt: 12 Metrics That Matter

Most engineering organizations can describe their technical debt in the abstract. Far fewer can put a number on it. That gap is what keeps managing technical debt off the roadmap and out of the board deck, and what eventually lets it dictate both.
Measurement closes the gap. Like financial debt, technical debt accumulates interest in the form of slower software development, higher costs, and rising risk. With the right metrics in place, that interest becomes a quantity rather than a vibe: cost in engineer-hours, risk in blast radius, drag in lead time. It becomes something you can prioritize against new features, defend in a budgeting conversation, and track across quarters.
This guide is the measurement-focused companion to our pillar on what technical debt is. It covers the 12 metrics that matter across code, architecture, and team layers; the formulas to compute them; how to express debt in dollars; and how to build a dashboard your CTO will actually open. The architecture layer gets the most attention because in systems with many services, teams, and ownership boundaries, that is often where the largest unmeasured liabilities sit.
Why Measuring Technical Debt Is Hard (and Why You Should Anyway)
Technical debt is distributed across software systems, and it resists measurement for three reasons. First, the term covers many things at once. A slow build, a brittle module, a misaligned service boundary, a stale runbook, outdated code, and legacy code paths all count, but they live in different systems and surface as different symptoms, with complexity compounding where they overlap. "Technical debt" names this whole landscape, not a single artifact. Second, the cost is counterfactual. The cost of technical debt is what you would not have spent if the existing code had been built differently, and counterfactuals are hard to instrument. Third, the most expensive debt tends to be architectural, and architecture has historically been the least observable layer of any software system.
The case for measuring anyway is straightforward: without numbers, every technical debt conversation collapses into anecdote, and three concrete benefits unlock the moment you have them:
- Prioritization. When two cleanup proposals compete for the same sprint, ratio-based scoring (estimated savings ÷ remediation cost) settles it.
- Executive buy-in. A board does not fund "refactoring." It funds "$1.8M of recurring engineering drag, reducible by 40% in two quarters."
- Proof of improvement. Without a baseline, the team that just spent six weeks paying down debt cannot show that it worked. With one, the trend line is the deliverable.
Measurement is not the goal. It is the precondition for treating debt as an engineering investment rather than a moral failing.
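The ratio-based scoring mentioned under Prioritization can be sketched in a few lines. This is a minimal illustration, not a prescribed model: the `CleanupProposal` class, its field names, the choice of engineer-hours as the unit, and the sample numbers are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class CleanupProposal:
    """Hypothetical record for one cleanup proposal (names are illustrative)."""
    name: str
    estimated_savings_hours: float  # engineer-hours saved per quarter (assumed unit)
    remediation_cost_hours: float   # engineer-hours needed to do the cleanup

    @property
    def score(self) -> float:
        """Estimated savings divided by remediation cost."""
        if self.remediation_cost_hours <= 0:
            raise ValueError("remediation cost must be positive")
        return self.estimated_savings_hours / self.remediation_cost_hours


def rank_proposals(proposals):
    """Return proposals sorted from highest to lowest savings-to-cost ratio."""
    return sorted(proposals, key=lambda p: p.score, reverse=True)


# Made-up numbers: two proposals competing for the same sprint.
proposals = [
    CleanupProposal("split billing module", 300, 200),
    CleanupProposal("remove legacy auth path", 120, 40),
]
ranked = rank_proposals(proposals)
for p in ranked:
    print(f"{p.name}: {p.score:.1f}")
```

The smaller proposal wins here because it returns three hours for every hour invested, which is the kind of result a raw "biggest problem first" ordering would miss.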
The Three Layers of Tech Debt Measurement
Tech debt lives in three layers, and each one needs its own instrumentation.
Code debt is what most tools measure today: cyclomatic complexity, code duplication, lint counts, low test coverage, and "code smells" in the SonarQube sense. These code quality signals are abundant, cheap to collect, and the easiest to game. They tell you which files are painful to change, but they rarely tell you which decisions are driving the highest costs.
Architectural debt sits between modules: dependency cycles, fan-out hotspots, ownership gaps, and drift between the architecture you intended and the one currently running. This layer is the most predictive of long-term velocity loss and the most under-instrumented. A codebase with clean lint and brutal coupling will still be slow and dangerous to change.
Organizational-level debt shows up in delivery and team health: lead time for changes, change failure rate, on-call hours, and attrition among developers on debt-heavy teams. The DORA program provides the canonical delivery metrics here, while Microsoft Research's SPACE framework is useful for interpreting developer productivity and team-health signals without reducing productivity to activity counts.
A rigorous program for managing technical debt covers all three layers because code metrics capture local pain, organizational metrics capture system-wide drag, and architectural metrics capture the cause of both.
12 Metrics That Matter for Technical Debt Measurement
The metrics below are the working set we recommend instrumenting. Four per layer, chosen for either widespread tooling support or strong predictive value.
A few notes on selection. Lines of code, story points, and commit counts are deliberately omitted because they reward activity rather than outcomes, and the SPACE framework authors are explicit that such metrics tend to backfire. Code coverage is not included among the 12 primary metrics for the same reason: useful as a supporting trend, dangerous as a target, and easily inflated when code reviews wave tests through without scrutiny. These code quality signals work best in combination, not in isolation, and identifying technical debt across them is a measurement problem before it is a remediation problem.
For developer-side key metrics beyond the four organizational ones above, see our companion piece on developer productivity metrics.
How to Calculate the Cost of Technical Debt
A metric is only as useful as the decision it informs, and most decisions need a dollar figure. Three usable methods translate technical debt into informed decisions about money, capacity, and roadmap.
1. The SQALE method: Technical Debt Ratio. The most widely adopted formal definition, originating in Jean-Louis Letouzey's SQALE method:
TDR = (Remediation Effort / Development Effort) × 100
Development Effort = Lines of Code × Cost-to-develop-one-line
Remediation Effort = Σ (effort required to fix each detected issue)
Tools that implement SQALE-style debt ratios often map the result to a maintainability grade such as A through E, with the exact thresholds depending on the tool or quality model. SonarQube, for example, currently grades A at a TDR under 5% and E above 50%. The strength of this approach is that it is reproducible and tool-supported. The weakness is that it only measures what static analysis can see, which is almost entirely code debt.
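As a rough sketch, the ratio and a letter grade can be computed as follows. Several details are assumptions rather than part of the formula above: effort is expressed in minutes, the 30-minutes-per-line development cost is a configurable default, and the intermediate grade thresholds (10% and 20%) and the treatment of exact boundary values are illustrative. Only the A and E boundaries come from the text, so check your tool's quality model before relying on the others.

```python
def technical_debt_ratio(remediation_minutes, lines_of_code, minutes_per_line=30):
    """TDR = (remediation effort / development effort) x 100.

    remediation_minutes: iterable of per-issue fix estimates, in minutes.
    minutes_per_line: assumed cost to develop one line (illustrative default).
    """
    development_minutes = lines_of_code * minutes_per_line
    if development_minutes <= 0:
        raise ValueError("development effort must be positive")
    return 100 * sum(remediation_minutes) / development_minutes


def maintainability_grade(tdr_percent):
    """Map a TDR percentage to a letter grade (thresholds are assumptions)."""
    for upper_bound, grade in [(5, "A"), (10, "B"), (20, "C"), (50, "D")]:
        if tdr_percent <= upper_bound:
            return grade
    return "E"


# Made-up example: three detected issues in a 1,000-line module.
tdr = technical_debt_ratio([120, 45, 600], lines_of_code=1_000)
print(f"TDR = {tdr:.2f}% -> grade {maintainability_grade(tdr)}")
```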
2. The time-based method: developer-hours absorbed by debt. Track, per quarter, how many developer-hours go to remediation efforts: reverts, hotfixes, manual deploys, quick fixes, "while we're in there" cleanup, incident response on known-fragile systems. Multiply by fully loaded developer costs.
Quarterly Debt Cost ($) = Σ (debt-tagged engineer-hours) × loaded hourly rate
This captures organizational and architectural debt that static analysis misses. It requires either time-tracking discipline or a tagging convention on tickets and incidents.
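A tagging convention makes this sum easy to automate. The sketch below assumes work items exported as dictionaries with `hours` and `tags` keys; the tag names in `DEBT_TAGS` are hypothetical and would need to match whatever convention your tracker uses.

```python
# Hypothetical tag names; replace with your own ticket and incident labels.
DEBT_TAGS = {"revert", "hotfix", "manual-deploy", "cleanup", "incident-fragile"}


def quarterly_debt_cost(work_items, loaded_hourly_rate):
    """Sum hours on debt-tagged items and multiply by the loaded hourly rate."""
    debt_hours = sum(
        item["hours"]
        for item in work_items
        if DEBT_TAGS & set(item["tags"])  # counts if any tag is a debt tag
    )
    return debt_hours * loaded_hourly_rate


# Made-up quarter: only the first and third items carry a debt tag.
items = [
    {"hours": 6, "tags": ["hotfix"]},
    {"hours": 10, "tags": ["feature"]},
    {"hours": 3.5, "tags": ["cleanup", "feature"]},
]
cost = quarterly_debt_cost(items, loaded_hourly_rate=96)
print(cost)
```

An item with mixed tags is counted in full here. Splitting its hours between debt and feature work is a reasonable refinement if your tracking supports it.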
3. The opportunity-cost method: lost feature velocity. Compare delivery throughput on high-debt versus low-debt services or teams. The delta, in features per quarter, translates to revenue not shipped and gives a clearer view of business impact.
Velocity Loss ($) = (baseline velocity − actual velocity) × revenue per feature
This is the hardest to defend statistically and the most powerful in a board conversation.
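The formula is simple enough to express directly. In this sketch the baseline is taken from low-debt teams, and both the velocities and the revenue-per-feature figure are invented inputs. Clamping the loss at zero when actual velocity exceeds the baseline is a design choice of the example, not part of the formula.

```python
def velocity_loss_dollars(baseline_velocity, actual_velocity, revenue_per_feature):
    """(baseline velocity - actual velocity) x revenue per feature.

    Velocities are features per quarter. A negative gap is clamped to zero
    (an assumption of this sketch) so faster teams do not show negative loss.
    """
    lost_features = max(baseline_velocity - actual_velocity, 0)
    return lost_features * revenue_per_feature


# Made-up inputs: low-debt teams ship 12 features a quarter, this team ships 8.
loss = velocity_loss_dollars(12, 8, revenue_per_feature=50_000)
print(loss)
```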
Worked example. A 60-developer team at $200K loaded cost per engineer logs roughly 4,800 engineer-hours per quarter on debt-tagged work (about 15% of the roughly 31,200 hours available at 520 hours per engineer per quarter). At $96 per hour, that is $460,800 per quarter, or $1.84M annually, before any opportunity-cost adjustment. Halving that with a focused technical debt reduction program changes what a CFO is willing to fund.
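The dollar figures in the worked example can be reproduced as below. The one assumption not stated in the text is the 2,080 working hours per year (52 weeks at 40 hours) used to turn the annual loaded cost into an hourly rate.

```python
LOADED_COST_PER_ENGINEER = 200_000   # dollars per year, from the example
WORK_HOURS_PER_YEAR = 2_080          # assumption: 52 weeks x 40 hours
DEBT_HOURS_PER_QUARTER = 4_800       # debt-tagged engineer-hours, from the example

hourly_rate = LOADED_COST_PER_ENGINEER / WORK_HOURS_PER_YEAR
rounded_rate = round(hourly_rate)    # the example rounds to whole dollars

quarterly_cost = DEBT_HOURS_PER_QUARTER * rounded_rate
annual_cost = quarterly_cost * 4

print(f"hourly rate: ${hourly_rate:.2f} (rounded to ${rounded_rate})")
print(f"quarterly: ${quarterly_cost:,}  annual: ${annual_cost:,}")
```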
In our work with engineering teams, we see roughly 30-40% of engineering time lost to rework from drift and technical debt. The number is plausible because architectural debt often manifests as repeated rework: duplicate implementations, avoidable coordination, manual reviews, delayed releases, and rework caused by drift.
Architecture-Level Tech Debt: The Hardest to Measure (and the Most Important)
Code-level tools have a thirty-year head start. SonarQube, ESLint, and their peers can measure cyclomatic complexity, code duplication, and coverage in any modern repository. Organizational-level tools have a decade of history behind them, including LinearB, Swarmia, and the open-source DORA metrics-as-a-service ecosystem.
Architecture-level metrics are the gap. They require a dependency graph that spans services, not just files. They require a model of intended architecture to compare runtime behavior against. They require ownership data that stays current as teams reorganize. Most development teams lack those resources, which is why architectural debt is often the largest unmeasured line item on the engineering balance sheet and a major source of hidden interest payments.
What is measurable today, with the right tools and instrumentation:
- Graph metrics: centrality, modularity, and average path length between services. High centrality on a single service means a blast radius problem.
- Dependency cycles: any strongly connected component larger than one node in the service graph signals tightly coupled components and a deployment-ordering bug waiting to happen.
- Drift: the symmetric difference between the runtime call graph and the architecture you said you were building. Outdated architectures show up here first.
- Ownership gaps: services with no active owner, or with owners on teams that have been reorganized. These are often where security vulnerabilities go unnoticed.
- Blast-radius spread: for each service, the number of downstream services that would be affected by a one-hour outage, which is the closest measurable proxy for how far a production failure would propagate.
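Three of the items above (dependency cycles, drift, and blast radius) reduce to standard graph operations. The sketch below is dependency-free and makes several assumptions: the graph is a dictionary mapping each service to the set of services it calls, "downstream" means the direct and transitive callers of a service, and the service names are invented. The recursive traversal is fine for small graphs but would need an iterative version for very large ones.

```python
from collections import defaultdict


def strongly_connected_components(graph):
    """Tarjan's algorithm. graph maps service -> set of services it calls."""
    index, low, on_stack, stack, result = {}, {}, set(), [], []

    def visit(node):
        index[node] = low[node] = len(index)
        stack.append(node)
        on_stack.add(node)
        for callee in graph.get(node, ()):
            if callee not in index:
                visit(callee)
                low[node] = min(low[node], low[callee])
            elif callee in on_stack:
                low[node] = min(low[node], index[callee])
        if low[node] == index[node]:  # node is the root of a component
            component = set()
            while True:
                member = stack.pop()
                on_stack.discard(member)
                component.add(member)
                if member == node:
                    break
            result.append(component)

    for node in list(graph):
        if node not in index:
            visit(node)
    return result


def dependency_cycles(graph):
    """Components with more than one service are dependency cycles."""
    return [c for c in strongly_connected_components(graph) if len(c) > 1]


def edges(graph):
    """Flatten the graph into a set of (caller, callee) pairs."""
    return {(a, b) for a, callees in graph.items() for b in callees}


def drift(intended_edges, runtime_edges):
    """Symmetric difference between intended and observed call edges."""
    return set(intended_edges) ^ set(runtime_edges)


def blast_radius(graph, service):
    """Count services that call `service` directly or transitively."""
    callers = defaultdict(set)
    for caller, callees in graph.items():
        for callee in callees:
            callers[callee].add(caller)
    affected, frontier = set(), [service]
    while frontier:
        current = frontier.pop()
        for caller in callers[current]:
            if caller != service and caller not in affected:
                affected.add(caller)
                frontier.append(caller)
    return len(affected)


# Invented runtime call graph with one cycle between orders and payments.
runtime = {
    "web": {"orders"},
    "orders": {"payments", "inventory"},
    "payments": {"orders", "ledger"},
    "inventory": set(),
    "ledger": set(),
}
intended = {("web", "orders"), ("orders", "payments"), ("orders", "inventory"),
            ("payments", "ledger"), ("orders", "shipping")}

print(dependency_cycles(runtime))
print(drift(intended, edges(runtime)))
print(blast_radius(runtime, "ledger"))
```

In this example the drift set contains one edge that exists at runtime but was never intended (payments calling orders) and one that was planned but never observed (orders calling shipping). Both directions are worth reviewing.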
Capturing these manually is possible, but it does not scale past a few dozen services. This is the gap we built Catio to close. As the Architecture IDE for modern software systems, Catio is designed to maintain a live model of the runtime architecture and cloud environments, including services, dependencies, ownership, and cost context, so teams can continuously track architecture-level debt. The framing we use is straightforward: SonarQube focuses on code, DORA focuses on delivery, and Catio focuses on the architecture in between, which is the layer most teams have the hardest time measuring.
Building a Technical Debt Dashboard
A measurement program that never leaves a wiki page is not a measurement program. The metrics need to live somewhere that a CTO or VP of Engineering will look at weekly.
For a CTO-level dashboard, five to seven metrics is the right working set: more than seven and no one reads it, fewer than five and you cannot see across the three layers. A defensible default:
- Technical Debt Ratio (code-level summary), as a single number with a six-month trend line
- Architecture drift score (architecture-level), where direction matters more than absolute value
- Open dependency cycles (architecture-level), as a count plus the worst three by name
- Ownership coverage % (architecture-level), which should sit at 95% or higher
- Lead time for changes (org-level), reported as median and 90th percentile
- Change failure rate (org-level), as a rolling 30-day average
- Quarterly debt cost in dollars, the executive summary number
Roll these up by team and by service so the conversation can move from "we have technical debt" to "the payments team has three open cycles, drift is rising, and lead time has doubled since Q3." Surface the same view in three places: a permanent CTO page, the architecture-review meeting, and the quarterly board deck. This is how tracking technical debt turns into informed decisions rather than slide-deck talking points.
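Several of the dashboard numbers above are straightforward aggregations once the raw data is exported. The following sketch shows one possible way to compute ownership coverage, the lead-time median and 90th percentile, and a trailing 30-day change failure rate. The input shapes (a service-to-owner mapping, a list of lead times in hours, and deploy records with `date` and `failed` keys) are assumptions for illustration, as are all the sample values.

```python
from datetime import date, timedelta
from statistics import median, quantiles


def ownership_coverage(owners_by_service):
    """Percentage of services that have a non-empty owner."""
    owned = sum(1 for owner in owners_by_service.values() if owner)
    return 100 * owned / len(owners_by_service)


def lead_time_summary(lead_times_hours):
    """Return (median, 90th percentile) of lead time for changes."""
    # With n=10, quantiles() returns nine cut points; the last is the 90th percentile.
    p90 = quantiles(lead_times_hours, n=10, method="inclusive")[-1]
    return median(lead_times_hours), p90


def change_failure_rate(deploys, as_of, window_days=30):
    """Percentage of deploys in the trailing window that caused a failure."""
    start = as_of - timedelta(days=window_days)
    recent = [d for d in deploys if start < d["date"] <= as_of]
    if not recent:
        return 0.0
    return 100 * sum(d["failed"] for d in recent) / len(recent)


# Invented sample data.
owners = {"web": "frontend", "orders": "commerce", "payments": None, "ledger": "finance"}
lead_times = [2, 4, 4, 6, 8, 10, 12, 20, 30, 48, 72]
deploys = [
    {"date": date(2024, 5, 15), "failed": True},   # outside the 30-day window
    {"date": date(2024, 6, 1), "failed": False},
    {"date": date(2024, 6, 10), "failed": True},
    {"date": date(2024, 6, 20), "failed": False},
    {"date": date(2024, 6, 28), "failed": False},
]

coverage = ownership_coverage(owners)
med, p90 = lead_time_summary(lead_times)
cfr = change_failure_rate(deploys, as_of=date(2024, 6, 30))
print(coverage, med, p90, cfr)
```

Reporting the 90th percentile next to the median matters because a few very slow changes, like the 72-hour one in this sample, are often where the debt is hiding.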
Catio surfaces this view automatically through its Architecture Decision Loop, the five-stage cycle of Understand, Decide, Design, Execute, and Compound. The architecture state you measure is then the same one your development teams are deciding and shipping against, not a stale diagram that drifted three quarters ago.
How Often to Measure (and When Measurement Is a Trap)
Different metrics deserve different cadences.
Real-time or per-deploy. Lead time, deployment frequency, change failure rate, and MTTR. These come out of CI/CD and incident data and update themselves.
Daily or weekly. Code-level TDR, complexity hotspots, coverage. Most static analysis tools run on every PR.
Per-sprint. Dependency cycles, ownership coverage, and on-call hours. Cheap to compute, fast to act on, and an early warning that technical debt accumulates faster than your roadmap accounts for.
Quarterly. Architecture drift, opportunity costs in dollars, board-facing rollups. These are strategic, not tactical, and treating them as tactical creates noise rather than informed decisions.
Three anti-patterns to watch for in any technical debt reduction effort, all of them well-documented in the SPACE framework and elsewhere:
- Vanity metrics. Lines of code, commits per developer, story points completed. They feel like measurement, but they reward the wrong behavior.
- Gaming. Any metric that becomes a target stops working as a measure (Goodhart's Law). Coverage is the most common casualty: developers write tests that execute lines without asserting anything meaningful, and no number of code reviews fixes the underlying incentive.
- Measurement paralysis. Tracking 40 metrics is functionally the same as tracking zero, since no one can act on that volume. If a metric does not change a decision, drop it.
The right number of debt metrics is the smallest number that lets you prioritize, defend the prioritization, and prove improvement.
From Measurement to Action: Closing the Loop
Measurement that does not change behavior is a hobby. The point is to translate metrics into roadmap decisions, then to reduce technical debt where the numbers tell you it is most expensive.
The closure mechanism is a set of decision rules tied to your software development process. When a metric crosses a defined threshold, something specific happens:
- TDR above 15% on a service triggers reserving the next sprint's slack capacity for that service.
- Any open dependency cycle blocks new features that would add to the cycle.
- An architecture drift score rising two quarters in a row requires an architecture review before the next roadmap commit.
- Change failure rate above 15% triggers an audit of the release process, mandatory code reviews on hotfix paths, and a sweep of incident postmortems.
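The four rules above can be encoded as a small check that runs whenever metrics refresh. This is a sketch under assumptions: the `ServiceSnapshot` structure and its field names are hypothetical, drift scores are assumed to be stored one per quarter from oldest to newest, and "rising two quarters in a row" is read as three consecutive readings that strictly increase.

```python
from dataclasses import dataclass


@dataclass
class ServiceSnapshot:
    """Hypothetical per-service metrics record (field names are illustrative)."""
    name: str
    tdr_percent: float
    open_cycles: int
    drift_scores: list          # one score per quarter, oldest to newest
    change_failure_rate: float  # percent


def triggered_actions(snapshot):
    """Return the actions required by the threshold rules for one service."""
    actions = []
    if snapshot.tdr_percent > 15:
        actions.append("reserve next sprint's slack capacity for this service")
    if snapshot.open_cycles > 0:
        actions.append("block new features that would add to the cycle")
    last_three = snapshot.drift_scores[-3:]
    if len(last_three) == 3 and last_three[0] < last_three[1] < last_three[2]:
        actions.append("architecture review before the next roadmap commit")
    if snapshot.change_failure_rate > 15:
        actions.append("audit release process, hotfix reviews, and postmortems")
    return actions


# Invented example: three of the four rules fire for this service.
payments = ServiceSnapshot("payments", 18.0, 3, [0.10, 0.14, 0.21], 9.0)
actions = triggered_actions(payments)
for action in actions:
    print(f"{payments.name}: {action}")
```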
Pair the rules with a budget. The common "20% of resources goes to debt" promise needs metrics behind it: which 20%, against which thresholds, judged against which targets. Without that, it is a slogan. With it, it is a policy that ties resources required to business goals. For the strategies and anti-patterns on what to do once the threshold trips, see our companion guide on how to reduce technical debt.
AI-Assisted Tech Debt Measurement
AI is changing the measurement layer in three concrete ways, and overpromising in a fourth.
Better code analysis. LLM-assisted analysis can help identify semantic duplication, suspicious dead paths, and API misuse patterns that simple linters may miss. It should complement, not replace, deterministic static analysis, and it makes identifying technical debt and security vulnerabilities across legacy code more tractable than before.
Architecture intelligence. This is the larger shift. Metrics like cycles, drift, ownership coverage, and blast radius used to require a hand-maintained service graph that almost no organization kept current. Tools like Catio aim to infer the architecture graph from live systems and cloud environments, keep it current, and let teams query it in natural language. In a mature setup, a team should be able to ask a question such as "where is our biggest architecture debt this quarter, and what would it cost to fix?" and get an evidence-backed starting point rather than a week-long manual graph exercise. Archie, Catio's conversational architecture copilot, is built for that workflow, and Catio's modernization solution turns the answer into a costed plan with timelines and actionable insights.
Cost quantification. Translating raw metrics into dollar figures used to be an analyst exercise. Modeling system behavior to quantify cost in ROI terms is now within reach of tooling rather than only spreadsheets, which is what makes business impact something you can defend rather than estimate.
The honest limits. AI does not yet decide which debt is worth paying down, because only humans set the strategy. AI estimates of remediation efforts are still rough, and AI can confidently produce incorrect numbers, so any AI-driven measurement program needs human review for high-stakes calls. Treat AI as the layer that makes the measurement possible at scale, not the layer that decides what to do about it.
Conclusion
Technical debt can be measured rigorously across three layers (code, architecture, and organization) and translated into the dollar figures that move it from engineering retrospectives onto the company P&L and into everyday business operations. Code-level tooling is mature, with quality metrics built into modern static analysis tools. Organizational tools are catching up. Architecture-level tools are the largest gap, and architecture-level metrics are the most predictive of how quickly your teams will be able to ship a year from now.
The right measurement program is not the largest one. It is the smallest set of metrics that lets developers and leaders prioritize cleanup work, defend the prioritization in dollar terms, and prove improvement over time. Start with the twelve in the table above, cut to the seven you would put on a CTO dashboard, tie thresholds to decisions, and revisit the set every quarter to keep managing technical debt in line with business goals.
Catio surfaces architecture-level technical debt metrics, including dependency cycles, drift, ownership gaps, and the interest your architecture is paying every quarter, through a live digital twin of your software systems. Start with a real architecture decision and see your architecture measured in minutes, not quarters, with developers and architects working off the same numbers as the business.
Frequently Asked Questions
How is technical debt measured? Technical debt is measured across three layers. Code-level metrics such as cyclomatic complexity, code duplication, code churn, and TDR describe local pain. Architecture metrics such as dependency cycles, drift, and ownership coverage describe systemic risk. Organizational metrics from DORA and SPACE describe the impact on developers and delivery. Combining the three gives you a clearer understanding of how much technical debt you actually carry and where rapid development decisions have left long-term sustainability problems.
What is the TDR? The technical debt ratio is the cost to remediate debt expressed as a percentage of the cost it would take to redevelop the code. The formula, defined by the SQALE method, is (remediation effort ÷ development effort) × 100. Tools that adopt SQALE-style ratios typically map the result to a maintainability grade such as A through E, with the exact thresholds set by the tool: SonarQube currently grades A under 5% and E above 50%.
What is the 80/20 rule for technical debt? The 80/20 rule is a useful heuristic rather than a formal law: in many systems, a small number of modules, services, or legacy code paths account for a disproportionate share of technical debt cost. For example, a single payment service can account for most of the interest cost in a large system. Measurement programs use this heuristic to focus remediation efforts where they pay back the most, rather than treating debt as a uniform problem and spreading effort evenly across the codebase.
How do you monitor technical debt? Monitor it through a small dashboard that surfaces five to seven key metrics across code, architecture, and organizational layers, refreshed automatically from your existing development process and CI/CD data. The best tools for tracking technical debt close the loop between measurement, code reviews, and roadmap decisions, so that debt trends drive informed decisions rather than slide-deck commentary.
What are the 4 types of technical debt? Martin Fowler's quadrant identifies four types of technical debt by intent and prudence: deliberate-and-prudent (informed trade-offs for short-term gains), deliberate-and-reckless (skipping known good practice under rushed development), inadvertent-and-prudent (debt you only see in hindsight), and inadvertent-and-reckless (debt taken on without awareness of design alternatives). All four show up in real codebases, often in various forms simultaneously.
How do you measure tech debt in dollars? Three methods are often used together. The SQALE method converts static-analysis findings into remediation hours, then multiplies by developer costs. The time-based method tracks developer-hours spent on debt-tagged work each quarter and multiplies by loaded hourly costs. The opportunity-cost method compares feature throughput on high-debt versus low-debt services and translates the delta into revenue not shipped. The right method depends on whether you are reporting to engineering, finance, or the business, and many organizations use all three to build a clearer understanding of what their software actually costs to maintain.
What is the SQALE method? SQALE, short for Software Quality Assessment based on Life-cycle Expectations, is an open-source method developed by Jean-Louis Letouzey in 2010 for quantifying technical debt. It defines a quality model (the practices that constitute "right code"), measures the time to fix each violation, and produces a debt ratio and letter grade. SonarQube is the most widely deployed implementation.
Can AI accurately measure technical debt? AI is most accurate at the layers that produce structured data: code analysis and architecture-graph inference. AI estimates of dollar costs and remediation time are useful as starting points, but should be reviewed by humans for any decision above a few weeks of effort. The current value of AI in this space is making architecture-level measurement scalable, not making cost estimates infallible.
What's the difference between technical debt and code smells? A code smell is a local indicator, such as a long method, a duplicated block, or a deeply nested conditional, that suggests something is wrong. Technical debt is the accumulated principal and interest of those issues, plus everything code smells cannot see: misaligned service boundaries, missing ownership, design debt, architectural drift, and brittle deploys. Code quality issues are inputs to measurement. Technical debt is the output.


