The Measurement Problem in Engineering
Engineering productivity is the most contested measurement problem in software companies. Managers want to measure it. Engineers resist being measured. Most proposed metrics are wrong in subtle ways.
Lines of code rewards padding. Story points create a perverse incentive to inflate estimates. Tickets closed rewards easy tickets. PR count doesn't correlate with business value.
The research from the last decade has converged on two frameworks that actually predict software delivery performance: DORA metrics and the SPACE framework. Neither measures individual output — both measure system performance.
DORA Metrics: The Gold Standard
The DevOps Research and Assessment (DORA) program, now part of Google Cloud, has tracked software delivery performance since 2014, drawing on survey data from 33,000+ professionals. Its four metrics:
1. Deployment Frequency: How often your team deploys to production.
- Elite: On demand (multiple times per day)
- High: Between once per day and once per week
- Medium: Between once per week and once per month
- Low: Between once per month and once every six months
Why it matters: Deployment frequency is a proxy for batch size. Small, frequent deployments mean smaller changes, faster feedback, and lower risk. Teams that deploy daily have 1/5th the change failure rate of teams that deploy monthly.
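If you already have deploy timestamps, classifying your tier takes a few lines. A minimal sketch in Python; the function and the rate thresholds are our own simplification of the bands above:

```python
from datetime import datetime

def deployment_frequency_tier(deploys: list[datetime]) -> str:
    """Map raw production-deploy timestamps onto the DORA bands above."""
    if len(deploys) < 2:
        return "insufficient data"
    span_days = max((max(deploys) - min(deploys)).days, 1)
    per_day = len(deploys) / span_days
    if per_day > 1:
        return "elite"    # on demand, multiple per day
    if per_day >= 1 / 7:
        return "high"     # once per day to once per week
    if per_day >= 1 / 30:
        return "medium"   # once per week to once per month
    return "low"          # monthly or less often
```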
2. Lead Time for Changes: Time from commit to production deployment.
- Elite: Less than one hour
- High: Between one day and one week
- Medium: Between one week and one month
- Low: More than six months
Lead time measures your delivery pipeline efficiency. Long lead times mean slow feedback loops, accumulating WIP, and delayed value delivery.
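A sketch of the calculation, assuming you can pair each change's commit time with its deploy time. We use the median rather than the mean, so one stalled PR doesn't dominate the number:

```python
from datetime import datetime
from statistics import median

def lead_time_hours(changes: list[tuple[datetime, datetime]]) -> float:
    """Median hours from commit to production deploy.

    Each tuple is (commit_time, deploy_time) for one change.
    """
    return median(
        (deploy - commit).total_seconds() / 3600
        for commit, deploy in changes
    )
```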
3. Change Failure Rate: Percentage of deployments that cause production incidents or require rollback.
- Elite: 0-15%
- High: 16-30%
- Medium: 16-30% (the same range as High; the cluster analysis doesn't split the tiers on this metric)
- Low: 46-60%
Note: Elite and High teams don't have zero failures — they have failures but catch and resolve them faster. The goal isn't perfection, it's fast recovery.
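Operationally, this is a simple ratio once each deployment record carries a failure flag. The "failed" field below is our assumed schema, set when a deploy is linked to an incident or was rolled back:

```python
def change_failure_rate(deployments: list[dict]) -> float:
    """Fraction of deploys flagged as causing an incident or rollback."""
    if not deployments:
        return 0.0
    return sum(d["failed"] for d in deployments) / len(deployments)
```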
4. Mean Time to Recovery (MTTR): How long it takes to restore service after a production incident.
- Elite: Less than one hour
- High: Less than one day
- Medium: Between one day and one week
- Low: More than one week
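Given incident open/resolve timestamps (Step 2 below covers capturing them), the calculation is direct. One caution: a mean is easily skewed by a single long outage, so it's worth reporting the median alongside it. A sketch:

```python
from datetime import datetime
from statistics import mean, median

def recovery_hours(incidents: list[tuple[datetime, datetime]]) -> dict:
    """Mean and median hours from incident detection to resolution.

    Each tuple is (detected_at, resolved_at).
    """
    durations = [(resolved - detected).total_seconds() / 3600
                 for detected, resolved in incidents]
    return {"mean": mean(durations), "median": median(durations)}
```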
What the data shows: Elite performers (the top cluster on all four metrics) are 4-5x more likely to exceed organizational performance and profitability targets, and the relationship holds after controlling for org size, industry, and tenure.
The SPACE Framework
DORA measures systems. SPACE (Satisfaction and well-being, Performance, Activity, Communication and collaboration, Efficiency and flow) measures developers. From researchers at GitHub, Microsoft Research, and the University of Victoria:
Satisfaction: Developer job satisfaction, burnout risk, NPS of the developer experience. Why track it? Unhappy developers leave, and developer turnover costs $150-300K per engineer (recruiting + ramp time).
Performance: Code quality outcomes — bug rate, code review quality, security vulnerability rate. Not output volume, but output quality.
Activity: Systems-level activity — PRs merged, code reviews completed, CI runs. Used to identify bottlenecks, not to evaluate individuals.
Communication: Quality of collaboration — PR review turnaround time, async communication effectiveness, documentation quality.
Efficiency: Interruptions per day, context switching cost, wait time (for CI, review, approvals). How much of engineering time is value-added vs. overhead?
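SPACE deliberately leaves metric selection to the team, so any concrete schema is a choice rather than a standard. One illustrative team-level snapshot, with a representative signal or two per dimension; every field name here is our own:

```python
from dataclasses import dataclass

@dataclass
class SpaceSnapshot:
    """One team-quarter of SPACE signals; track trends, not individuals."""
    satisfaction_score: float      # Satisfaction: survey average, 1-5
    burnout_risk_pct: float        # Satisfaction: % flagging burnout risk
    escaped_bug_rate: float        # Performance: prod bugs per release
    prs_merged: int                # Activity: system throughput signal
    review_turnaround_hrs: float   # Communication: median PR wait
    interruptions_per_day: float   # Efficiency: context-switch load
    wait_time_pct: float           # Efficiency: time blocked on CI/review
```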
Tracking DORA in Practice
For a team not already measuring DORA:
Step 1: Instrument your deployment pipeline. Every deployment needs a timestamp and a success/failure flag. If you're on GitHub Actions, CircleCI, or any modern CI/CD platform, this data already exists.
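On GitHub Actions, for example, the workflow-runs API already exposes timestamps and outcomes. A sketch that assumes your production deploys run in a workflow literally named "deploy"; adjust the filter, and add pagination, for real use:

```python
import requests

def fetch_deploys(owner: str, repo: str, token: str) -> list[dict]:
    """Pull deploy timestamps and outcomes from the GitHub Actions API."""
    resp = requests.get(
        f"https://api.github.com/repos/{owner}/{repo}/actions/runs",
        headers={"Authorization": f"Bearer {token}"},
        params={"per_page": 100},
    )
    resp.raise_for_status()
    return [
        {"deployed_at": run["created_at"],
         "ok": run["conclusion"] == "success"}
        for run in resp.json()["workflow_runs"]
        if run["name"] == "deploy"  # assumption: your deploy workflow's name
    ]
```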
Step 2: Instrument your incident management. Every PagerDuty/Opsgenie incident needs a start timestamp, a resolution timestamp, and a link to the deployment that caused it. Most teams capture the first two but haven't connected them to deployment data.
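If your incident tool doesn't record the offending deploy, a last-deploy-before-the-incident heuristic is a serviceable starting point. It's crude, and the data shapes here are assumed:

```python
from datetime import datetime
from typing import Optional

def blame_deploy(incident_opened: datetime,
                 deploys: list[tuple[str, datetime]]) -> Optional[str]:
    """Attribute an incident to the most recent deploy before it opened.

    deploys is a list of (deploy_id, deployed_at) pairs. Returns None
    if no deploy precedes the incident. Replace with explicit links
    once your process captures them.
    """
    prior = [(d_id, at) for d_id, at in deploys if at <= incident_opened]
    return max(prior, key=lambda p: p[1])[0] if prior else None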
Step 3: Calculate the four metrics monthly. Tools that automate this:
- LinearB: DORA + engineering analytics
- Jellyfish: Engineering intelligence platform
- Faros AI: Engineering metrics
- DIY: GitHub Actions + BigQuery + Looker
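The DIY path boils down to one monthly rollup over a joined deployments-plus-incidents table. A sketch in pandas, assuming columns deployed_at, lead_time_hours, failed, and recovery_hours (null when the deploy succeeded); the column names are our own:

```python
import pandas as pd

def monthly_dora(df: pd.DataFrame) -> pd.DataFrame:
    """Roll a joined deployments+incidents table up to the four DORA metrics."""
    return (
        df.assign(month=df["deployed_at"].dt.to_period("M"))
          .groupby("month")
          .agg(
              deploys=("deployed_at", "count"),        # deployment frequency
              lead_time_hrs=("lead_time_hours", "median"),
              change_failure_rate=("failed", "mean"),  # bool mean = fraction
              mttr_hrs=("recovery_hours", "mean"),
          )
    )
```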
Step 4: Set targets, not quotas. DORA metrics are team-level, not individual-level. A target of "weekly deployments by Q2" is a team target. Blaming an individual for deployment frequency creates the wrong incentives.
The Anti-Metrics: What Not to Measure
Lines of code: Rewards verbose code, penalizes refactoring. A 50-line function that replaces 500 lines is a 450-line "productivity decline" by this metric.
Story points per sprint: Inflates over time (velocity gaming), varies by team, and doesn't correlate with business value.
PR count: Rewards trivial PRs. Engineers split features into many small PRs to hit metrics.
Tickets closed: Rewards taking easy tickets, penalizes hard architectural work.
Code coverage %: The metric of bureaucracy. Teams write tests that hit coverage targets without testing anything meaningful.
The common thread: any metric that can be gamed will be gamed. Individual output metrics are especially vulnerable to gaming. System metrics (DORA) are harder to game because they measure outcomes, not activities.
The Engineering Efficiency Calculation
For business leaders who need a dollar figure:
Developer time allocation study (average team):
- Directly productive coding: 32% of time
- Code review and collaboration: 18%
- Meetings and admin: 24%
- Context switching and interruptions: 14%
- Waiting (CI, approvals, blocked): 12%
Reducing waiting and context switching from 26% to 15% of time:
- 11 percentage points of time returned to productive work
- On a team of 10 engineers at a $180K average salary, that's $198,000/year in additional effective engineering capacity, without hiring anyone
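The arithmetic behind that figure, spelled out with the numbers from the allocation study above:

```python
team_size = 10
avg_salary = 180_000

overhead_before = 0.14 + 0.12  # context switching + waiting = 26%
overhead_after = 0.15
recaptured = overhead_before - overhead_after  # 0.11 -> 11 points of time

extra_capacity = team_size * avg_salary * recaptured
print(f"${extra_capacity:,.0f}/year")  # $198,000/year
```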
This is why DORA elite teams have 2-3x the effective output of low performers despite similar headcount and compensation. Velocity is an organizational property, not an individual one.
Use our Developer Productivity Calculator to estimate engineering throughput, delivery frequency targets, and the cost of context switching in your team.