
DORA: Flow Metrics, Invisible Capabilities, and What Really Sustains Continuous Delivery

DORA doesn't measure productivity. It measures symptoms. And what really matters lies in the 24 capabilities that nobody remembers to implement.


Series: Why Productive Teams Fail (4/4)

DORA has become common currency in conversations about DevOps and software delivery. Its four metrics — deployment frequency, lead time for changes, mean time to recovery, and change failure rate — are cited in presentations, used to justify investments, and included in corporate dashboards.

But very few people remember that DORA isn’t about metrics. DORA is about capabilities.

The fundamental misconception

DORA metrics are outcome indicators. They don’t tell you what to do. They signal whether what you’re doing is working. The real work lies in the 24 technical and organizational capabilities that sustain these metrics.

The epistemological problem with metrics

Before diving into the metrics and capabilities, it’s worth questioning something rarely discussed: do metrics describe reality or create it?

When DORA categorizes teams as “Elite”, “High”, “Medium”, and “Low performers”, is it merely observing patterns — or establishing a hierarchy that organizations will pursue? When deployment frequency becomes a success indicator, are we measuring organizational health — or creating a game where “faster” becomes synonymous with “better”, regardless of context?

The hidden premise

DORA starts from statistical correlations observed across thousands of organizations. But correlation isn’t universal law. What works for the majority may not work for you. And what measures one thing well doesn’t necessarily measure another.

This distinction matters because when we forget that metrics are snapshots — not absolute truths — we start treating them as ends in themselves. And then the system stops asking “are we solving the right problem?” and only asks “are we improving the numbers?”.

Implicit premises of the DORA framework

What DORA assumes

  • Organizational contexts are comparable
  • Speed is a universal virtue
  • Metrics reveal objective truth
  • Improvement is always measurable

What it often ignores

  • Each context has its unique constraints
  • Speed has human and technical costs
  • Metrics create incentives and behaviors
  • Not everything valuable shows up in dashboards

This isn’t an argument against metrics — it’s a warning about what happens when we stop questioning what they really mean. With this critical lens established, we can look at the four DORA metrics with more nuance.

DORA metrics as symptoms

The four metrics function as thermometers, not as treatment.

Why doesn’t DORA measure productivity? Because productivity in software isn’t about delivery speed — it’s about value generated sustainably. DORA measures symptoms of a healthy delivery system (fast flow, low risk, agile recovery), but it doesn’t measure:

  • Whether what’s being delivered solves real problems
  • Whether the architecture is evolving or rotting
  • Whether the team can sustain this pace without breaking
  • Whether there’s learning or just mechanical repetition
  • Whether technical decisions are improving or being postponed

A team can have excellent DORA metrics and still:

  • Deliver features nobody uses
  • Accumulate unsustainable technical debt
  • Burn people out in the process
  • Avoid complex problems in favor of easy deliveries

The dangerous confusion

DORA measures pipeline health, not productivity. An efficient delivery system is a necessary but not sufficient condition for real productivity. Confusing the two leads organizations to optimize for speed while destroying value and people.

Inherent limitations of the 4 DORA metrics

What the metrics show

  • How often code goes to production
  • How long from commit to deploy
  • How long to restore service after incident
  • How many changes cause failures

What they don't say

  • Why frequency is low
  • Where the bottleneck is in the process
  • Why incidents happen
  • What makes changes risky

Deployment Frequency

Measures how often code goes to production. High frequency signals that the pipeline is reliable, perceived risk is low, and changes can be small.

Doesn’t measure: Whether these deliveries matter. Whether the team is sustainable. Whether there’s technical debt growing in parallel.

Lead Time for Changes

Time from commit until the code is running in production. Low lead time indicates a lean pipeline, fewer manual steps, less bureaucracy.

Doesn’t measure: Quality of changes. Code review time. Cognitive load to do the work.

Mean Time to Recovery (MTTR)

How long it takes to restore service after a failure. Low MTTR indicates good observability, clear incident response processes, and quick rollback capability.

Doesn’t measure: Why the incident happened. How many incidents could have been prevented. Emotional toll on the team.

Change Failure Rate

Percentage of changes that cause degradation or incidents in production. Low rate indicates effective tests, stable environments, and safe deployments.

Doesn’t measure: Test coverage. Design quality. Technical decisions that increase or reduce fragility.
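
To make these definitions concrete, here is a minimal sketch of how the four metrics could be computed from raw deployment and incident records. The record shapes and field names are illustrative assumptions, not a prescribed schema; in practice the data would come from your CI/CD and incident tooling.

```python
from dataclasses import dataclass
from datetime import datetime
from statistics import mean

# Illustrative record shapes (assumed, not a standard schema).
@dataclass
class Deployment:
    commit_at: datetime      # when the change was committed
    deployed_at: datetime    # when it reached production
    caused_failure: bool     # did it degrade the service?

@dataclass
class Incident:
    started_at: datetime
    restored_at: datetime

def dora_metrics(deployments: list[Deployment], incidents: list[Incident], days: int) -> dict:
    """Compute the four DORA metrics over an observation window of `days` days."""
    return {
        "deployment_frequency_per_day": len(deployments) / days,
        "lead_time_hours": mean(
            (d.deployed_at - d.commit_at).total_seconds() / 3600 for d in deployments
        ),
        "mttr_minutes": mean(
            (i.restored_at - i.started_at).total_seconds() / 60 for i in incidents
        ),
        "change_failure_rate": sum(d.caused_failure for d in deployments) / len(deployments),
    }
```

Notice that everything listed above under “doesn’t measure” is simply absent from the inputs: there is no field for value delivered, cognitive load, or accumulated technical debt.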

Metrics as conversation

DORA metrics are useful because they direct conversations. They force questions: why does it take so long to deploy? Why do our deployments break? But the answers aren’t in the metrics — they’re in the capabilities.

The 24 capabilities nobody implements

The real work of the DORA research lies in identifying the technical and organizational capabilities that differentiate high-performing teams. Here they’re grouped into four categories:

1. Technical Capabilities

Version control for everything

Code, configurations, infrastructure, scripts. Everything versioned. No “manual changes on the server”. No lost configurations. Rollback always possible.

Deployment automation

Automatic, repeatable, and reliable deployments. No manual steps. No “you have to run this script first”. The process is the same in dev, staging, and production.
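
As a rough illustration of “the process is the same in dev, staging, and production”: a single deploy entry point, parameterized only by environment, with no manual steps on the side. The helper functions below are hypothetical placeholders, not a real deployment tool.

```python
import sys

ENVIRONMENTS = ("dev", "staging", "production")

def build_artifact(version: str) -> str:
    # Hypothetical stand-in for building and publishing an immutable artifact.
    print(f"building artifact for version {version}")
    return f"registry.example.com/app:{version}"

def apply_release(environment: str, artifact: str) -> None:
    # Hypothetical stand-in for applying the release via your orchestrator or IaC tooling.
    print(f"deploying {artifact} to {environment}")

def deploy(environment: str, version: str) -> None:
    """One code path for every environment: same steps, nothing to remember to run first."""
    if environment not in ENVIRONMENTS:
        raise ValueError(f"unknown environment: {environment}")
    apply_release(environment, build_artifact(version))

if __name__ == "__main__":
    deploy(sys.argv[1], sys.argv[2])  # e.g. python deploy.py staging 1.4.2
```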

Continuous Integration

Frequent commits, tests running automatically, fast feedback. If the build breaks, everyone knows immediately.

Trunk-based development

Short-lived branches. Frequent merges. Fewer conflicts. Real continuous integration, not “weekend CI”.

Test automation

Unit, integration, and contract tests. Meaningful coverage, not just cosmetic metrics. Confidence to change code.

Test data management

Isolated, reproducible, and secure test data. Test environments that actually resemble production.

Shift left on security

Security from the start. Static analysis, dependency review, properly managed secrets. Not “we’ll look at that later”.
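
One small, concrete piece of “properly managed secrets” is to load credentials from the environment at startup and fail fast if any are missing, rather than hardcoding them or discovering the gap in production. A minimal sketch; the secret names are illustrative.

```python
import os

REQUIRED_SECRETS = ("DATABASE_URL", "PAYMENT_API_KEY")  # illustrative names

def load_secrets() -> dict[str, str]:
    """Fail fast at startup if any required secret is absent from the environment."""
    missing = [name for name in REQUIRED_SECRETS if not os.environ.get(name)]
    if missing:
        raise RuntimeError(f"missing required secrets: {', '.join(missing)}")
    return {name: os.environ[name] for name in REQUIRED_SECRETS}
```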

Continuous Delivery

Code always deployable. Deployment is a business decision, not a technical feat. Pipeline reliable enough for deployment at any time.

Loose coupling

Changes in one service don’t break others. Clear contracts. Explicit dependencies. Independent evolution.

Architecture

Teams can test, deploy, and change their systems without excessive coordination with other teams. Architecture enables autonomy.

2. Process Capabilities

Customer feedback

Short feedback cycles. Validate hypotheses quickly. Learn from real users, not internal speculation.

Value stream

Understand the complete flow: from idea to delivered value. Identify bottlenecks, waste, and unnecessary steps.

Work in small batches

Small, frequent, and incremental changes. Less risk, faster feedback, less rework.

Team experimentation

Teams have autonomy to test ideas, change processes, learn from mistakes. Improvement doesn’t come from above, it comes from practice.

3. Management Capabilities

Change approval

Lightweight approvals, based on trust and automated controls — not committees. Peer review, not bureaucratic gates.

Monitoring and observability

Visibility into what’s happening in production. Structured logs, metrics, traces. Effective debugging.
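
A minimal example of the “structured logs” part, using only Python’s standard library: every record is emitted as a JSON line with stable field names so it can be filtered and aggregated later. The logger name and fields are illustrative.

```python
import json
import logging
import sys
from datetime import datetime, timezone

class JsonFormatter(logging.Formatter):
    """Render each log record as a single JSON line with stable field names."""
    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "timestamp": datetime.now(timezone.utc).isoformat(),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        # Attach any structured context passed via `extra={"context": {...}}`.
        payload.update(getattr(record, "context", {}))
        return json.dumps(payload)

handler = logging.StreamHandler(sys.stdout)
handler.setFormatter(JsonFormatter())
logger = logging.getLogger("checkout")
logger.addHandler(handler)
logger.setLevel(logging.INFO)

logger.info("payment processed", extra={"context": {"order_id": "A-123", "latency_ms": 187}})
```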

Proactive notification

Problems detected before becoming incidents. Meaningful alerts, not noise.

Database change management

Schema changes versioned, tested, and deployed as code. No manual scripts, no “hope it works”.
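
A sketch of what “schema changes versioned, tested, and deployed as code” can look like: ordered migration files applied exactly once, with the applied versions recorded in the database itself. The example uses sqlite3 only to stay self-contained; table and migration names are illustrative.

```python
import sqlite3

# Ordered, versioned migrations that live in version control next to the application code.
MIGRATIONS = [
    ("001_create_users", "CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT NOT NULL)"),
    ("002_add_created_at", "ALTER TABLE users ADD COLUMN created_at TEXT"),
]

def migrate(conn: sqlite3.Connection) -> None:
    """Apply each pending migration exactly once, recording what has already run."""
    with conn:
        conn.execute("CREATE TABLE IF NOT EXISTS schema_migrations (version TEXT PRIMARY KEY)")
    applied = {row[0] for row in conn.execute("SELECT version FROM schema_migrations")}
    for version, statement in MIGRATIONS:
        if version in applied:
            continue
        with conn:  # each migration and its bookkeeping record commit together
            conn.execute(statement)
            conn.execute("INSERT INTO schema_migrations (version) VALUES (?)", (version,))

migrate(sqlite3.connect("app.db"))
```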

WIP limits

Work in progress limited. Focus on finishing before starting new work. Real throughput, not illusion of movement.

Visual management

Work state visible to the team. Dashboards, boards, clear signals of progress and blockers.

4. Cultural Capabilities

Westrum organizational culture

Generative culture: information flows freely, collaboration is expected, and failures are treated as learning opportunities rather than grounds for blame.

Learning culture

Time and space to learn. Share knowledge. Invest in technical development. Not just “deliver more”.

Job satisfaction

Meaningful work. Autonomy. Sense of progress. Teams that feel valued deliver better.

Transformational leadership

Leadership that inspires, intellectually challenges, individually supports, and encourages innovation. Not command-and-control.

Metrics without capabilities

Implementing DORA dashboards without investing in capabilities is cosmetic. Numbers may even improve for a while — through heroic effort, shortcuts, or statistical illusion. But the improvement doesn’t last.

The critique nobody makes: when DORA becomes dogma

Implementing the 24 capabilities is fundamental. They represent the real work of building sustainable delivery systems. But there’s a conversation that rarely happens in meeting rooms, conferences, or articles about DevOps: what if the DORA metrics themselves are leading us in the wrong direction?

This isn’t about denying the framework’s value. DORA brought rigor and evidence to discussions that were previously just anecdotal. But there’s a problem growing silently: when metrics become absolute truths, when correlations become universal laws, when numbers replace judgment.

The uncomfortable question

DORA measures delivery capability. It doesn’t measure whether those deliveries make sense, how much it costs to sustain them, or what’s being sacrificed to keep the numbers green.

The five structural critiques

1. Context-blindness: same metrics, opposite realities

Two teams can have identical deployment frequency — 10 deployments per day — and be living completely opposite realities.

Team A: Deploys frequently because they built a robust automated pipeline, reliable tests, and a culture of trust. Small changes, automatic rollback, zero anxiety.

Team B: Deploys frequently because they’re under constant managerial pressure. Features are broken into artificially small PRs to “raise the number”. Tests were cut back to speed up CI. The team lives in a permanent state of alert.

Both appear as “Elite performers” on the dashboard.

DORA doesn’t distinguish built capability from dangerous shortcut. Metrics are blind to how and why — they only see how much.

What this means

Two teams with identical metrics can be on opposite trajectories: one toward sustainability, another toward collapse. And DORA doesn’t differentiate.

2. Reverse gaming: optimizing in the wrong direction

When metrics become targets, systems start gaming them. Not out of malice, but through organizational dynamics[2].

Real examples of gaming:

  • Deployment Frequency: Features are broken into dozens of tiny PRs. A change that should be atomic becomes 15 deployments “to raise the frequency”.

  • Lead Time: Commits are made but not merged until the last moment. PRs stay in draft. The “official lead time” drops, but real work time stays the same.

  • MTTR (Mean Time to Recovery): Auto-rollback is configured aggressively. Any error triggers an automatic rollback, counted as “fast recovery” — even when it should be investigated.

  • Change Failure Rate: Incidents are reclassified as “planned maintenance”. Problematic changes are hidden in “normal” deployments. The rate drops on the dashboard, but problems keep happening.

Gaming: when numbers improve but reality worsens

What the metric shows

  • High deployment frequency
  • Very low lead time
  • Fast recovery
  • Low failure rate

What's really happening

  • Artificially fragmented features
  • Hidden Work in Progress
  • Uninvestigated problems
  • Masked incidents

The system isn’t improving — it’s learning to lie to the indicator.
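
The reclassification trick is easy to see in code. In the illustrative sketch below, nothing about the changes themselves improves; only a label changes, and that is enough to turn the dashboard green.

```python
# Illustrative deployment records: (change_id, caused_degradation, label)
deployments = [
    ("d1", True, "incident"),
    ("d2", True, "incident"),
    ("d3", False, "normal"),
    ("d4", False, "normal"),
    ("d5", True, "incident"),
]

def reported_failure_rate(records) -> float:
    """What the dashboard shows: only changes still labeled 'incident' count as failures."""
    return sum(label == "incident" for _, _, label in records) / len(records)

def actual_failure_rate(records) -> float:
    """What actually happened, regardless of how the change was labeled afterwards."""
    return sum(caused for _, caused, _ in records) / len(records)

print(actual_failure_rate(deployments))  # 0.6 (reality)

# Reclassify two problematic changes as "planned maintenance"...
gamed = [
    (change_id, caused, "planned maintenance" if change_id in {"d1", "d5"} else label)
    for change_id, caused, label in deployments
]
print(reported_failure_rate(gamed))  # 0.2 (the dashboard improves; reality does not)
```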

3. The invisible human cost of performance

A team can maintain excellent DORA metrics while people break inside.

Real scenario: Team maintains “Elite performer” status for 8 consecutive months. High deployment frequency, low lead time, impressive MTTR. In retrospectives, everything seems fine. On dashboards, everything is green.

Then, in a single month, three key developers resign. When questioned, the answer is uniform: exhaustion.

What metrics didn’t show:

  • Constant work outside hours to maintain frequency
  • Chronic anxiety before each deploy
  • No time for learning or technical improvement
  • Technical debt accumulating silently
  • Rushed decisions to “not break the rhythm”

DORA stays green

Metrics don’t know — and can’t know — whether high performance is sustainable or is being extracted by wearing people down.

A team can be “performing well” according to DORA and simultaneously heading toward collective burnout. The metric doesn’t measure cognitive, emotional, or social cost.

4. The fallacy of universal speed

DORA implicitly assumes that faster is always better. But this premise isn’t universal — it’s contextual.

Example: Financial compliance system

According to DORA, this team is a “Low performer”:

  • Deployment frequency: 2x per month
  • Lead time: 3 weeks
  • Change failure rate: ~5%

But context reveals another story:

  • Every change goes through mandatory audit
  • Deployments require approved maintenance window
  • Rollback isn’t trivial (legacy database, external contracts)
  • The cost of failure isn’t reputational — it’s regulatory

This team is being cautious for good reasons. Accelerating would introduce unacceptable risk. “Low performer” in DORA doesn’t mean “bad team” — it just means speed isn’t the right goal for this system.

Not every system needs to be fast

For certain domains — safety-critical systems, financial infrastructure, embedded hardware — stability, predictability, and careful analysis matter more than speed.

DORA doesn’t help decide when speed is the wrong goal. And when applied blindly, it can penalize teams doing exactly what they should be doing.

5. The epistemological critique: correlation isn’t law

The deepest critique — and most rarely articulated — is philosophical.

DORA comes from empirical research: thousands of organizations were observed, patterns were identified, correlations were established. Teams with high deployment frequency tend to have better overall performance. Teams with low MTTR tend to be more resilient.

But correlation isn’t causation. And tendency isn’t universal law.

Yet DORA is frequently treated as:

  • Absolute scientific truth
  • Organizational maturity checklist
  • Moral ruler to judge teams
  • Unquestionable justification for technical decisions

This transforms an observation instrument into a management ideology.

When DORA stops being a tool

When used for ranking between teams, when metrics become individual performance targets, when numbers replace conversations — DORA stops illuminating problems and becomes the problem.

The framework describes what was observed in specific contexts. It doesn’t prescribe what should be done in all contexts. And it’s definitely not a guarantee that “improving the numbers” means improving the system.

The synthesis phrase of the critique

DORA measures how well a team can change software — not whether those changes make sense, how much it costs to sustain them, or what’s being sacrificed to maintain them.

It’s a thermometer, not a diagnosis. And thermometers don’t tell you if the fever is a symptom of something serious or just the body healthily fighting an infection.

Why we believe: Gartner, legitimacy, and scientific theater

If these critiques are so evident — context matters, metrics can be manipulated, numbers don’t capture human wear — why is DORA treated as unquestionable truth in so many organizations?

The answer runs through a name that rarely appears in technical discussions but exercises disproportionate influence over corporate decisions: Gartner.

What Gartner really is (and isn’t)

Honest definition

Gartner isn’t a scientific body, doesn’t produce experimental knowledge, and doesn’t validate practices neutrally. It is, fundamentally, a consulting company that sells a reduction in perceived decision risk.

When a CIO or VP of Engineering needs to justify millions in DevOps transformation investment, the problem isn’t technical — it’s political:

  • “What if it goes wrong?”
  • “How do I justify this to the board?”
  • “Who else is doing this?”
  • “How do I know it’s not just a fad?”

Citing Gartner solves this problem. Not because Gartner discovered a technical truth nobody else knew. But because it offers something more valuable to executives: reputational cover.

Perception vs reality of Gartner's role

What Gartner seems to be

  • Independent scientific research
  • Neutral technology validation
  • Best practices discovery
  • Quality certifying body

What Gartner really is

  • Corporate advisory
  • Market consensus mapping
  • Organization of existing narratives
  • Reputational insurance for executives

Gartner’s value isn’t in being right. It’s in being defensible.

If an initiative fails, but was based on Gartner recommendation, the failure becomes “market-aligned decision that didn’t work in this context”. If it had no Gartner endorsement and fails, it becomes “risky bet from a reckless leader”.

Magic Quadrant: measuring acceptability, not quality

Gartner’s most famous product — the Magic Quadrant — is frequently interpreted as a technical quality ranking. It’s not.

It measures corporate acceptability: how safe is it to choose this tool without being questioned? How widely adopted is it already? How well does the vendor company position itself in the market?

What the Magic Quadrant really evaluates

The Magic Quadrant classifies vendors into four quadrants based on two axes:

  • Ability to Execute: company size, market reach, financial viability, customer support
  • Completeness of Vision: alignment with market trends, perceived innovation, product strategy

Note what isn’t being directly measured:

  • Technical quality of the solution
  • Ease of use
  • Fit for specific contexts
  • Real cost-benefit
  • Developer experience

A technically superior product from a small startup never reaches the “Leaders” quadrant. A mediocre product from a giant vendor has structural advantage.

Why executives trust it anyway

Because the Magic Quadrant solves their problem: reducing decision anxiety.

Choosing a “Leader” in the Magic Quadrant means:

  • Decision can be explained in a board meeting
  • External consultants will validate the choice
  • Other executives will recognize the brand
  • If it fails, blame is shared with “the market”

Choosing a tool outside the Quadrant requires active justification. Choosing within the Quadrant is the default choice — it doesn’t need to be justified; it has to be actively contested in order not to happen.

Why Gartner recommends DORA

Now it’s clearer why DORA consistently appears in Gartner reports and recommendations. Not because DORA is infallible, but because it has three properties that executives (and Gartner) value:

1. It’s simple and communicable

Four metrics. Easy to explain in a slide. Easy to compare over time. Easy to report to non-technical stakeholders.

This is executive currency. Complex, nuanced, context-dependent metrics don’t fit in 30-minute meetings with the C-level. Simplicity sells — even when it reduces reality to the point of distorting it.

2. It has “sufficient scientific backing”

DORA comes from research with thousands of organizations, annual reports, academic language, statistical correlations. This creates symbolic authority — even without strong causation.

For Gartner, this is enough to legitimize use. It doesn’t need to be perfect. It needs to be defensible.

3. It reinforces a convenient narrative

DORA sustains a story executives want to tell:

“Faster teams are better teams. Let’s invest in automation, DevOps, and agile transformation to raise our numbers.”

This narrative:

  • Justifies budget for tools
  • Creates a sense of measurable progress
  • Enables comparison with competitors
  • Aligns technology with “efficiency” (C-level magic word)

Gartner doesn’t create this narrative from scratch — it organizes, packages, and validates it.

The critical point

When Gartner recommends DORA, it does so as a governance framework, not as a complete explanatory model of reality. The problem is that this distinction disappears in implementation.

The side effect: from observation to control

What starts as:

“Let’s observe our delivery capability”

quickly becomes:

“Let’s manage people by these numbers”

At that moment:

  • Metrics become performance targets
  • Correlation becomes cause
  • Observation becomes control
  • Framework becomes morality

And neither Gartner nor DORA was designed for this — but the corporate system incentivizes exactly this use.

The phrase that summarizes everything

The truth about Gartner and DORA

Gartner doesn’t sell truth. It sells defensible consensus.

DORA enters this package as a metric “scientific enough” to be used — and “simple enough” to be misused.

Executives turn to Gartner not to discover what’s right, but to ensure their decisions are hard to contest. DORA works perfectly in this role: it offers clear numbers, has an appearance of rigor, and reinforces narratives already in progress.

The problem isn’t that Gartner recommends DORA. The problem is that organizations treat this recommendation as scientific validation, when it’s actually market consensus mapping. And consensus isn’t synonymous with wisdom.

DORA’s structural limits

After understanding the critiques of the framework and Gartner’s role in its legitimization, it’s easier to see where DORA ends — not due to design failure, but due to inherent limitations of any metrics system.

DORA is powerful for measuring a system’s delivery capability. But there are fundamental questions it doesn’t answer — and never proposed to answer:

Structural limits of the framework

What DORA sees

  • Pipeline speed
  • Deploy stability
  • Recovery capability
  • Change frequency

What DORA doesn't see

  • Quality of technical decisions
  • Sustainability of the pace
  • Team's cognitive wear
  • Real impact for the user

What’s missing in the framework

  • Team satisfaction: DORA doesn’t measure burnout, turnover, or cognitive load — only output.
  • Code quality: Metrics don’t tell if code is maintainable, testable, or comprehensible.
  • Business value: Frequent deployment doesn’t guarantee that what’s being delivered solves real problems.
  • Developer Experience: Friction, clarity, tools, and processes remain completely invisible.
  • Organizational cost: The political, social, and emotional effort required to keep the numbers green never shows up.
  • Context and choices: DORA doesn’t help decide if speed is the right goal for your system.

What happens when we ignore the limits

When organizations treat DORA as a complete framework — instead of one lens among several — they create systems that optimize for metrics while wearing people out, accumulating technical debt, and producing changes nobody asked for.

These gaps aren’t bugs. They’re inherent characteristics of any framework that tries to reduce organizational and human complexity to comparable numbers.

DORA as starting point, not destination

DORA works best as a starting point, not as complete truth. It forces important conversations about flow, capabilities, and continuous delivery. But it needs to be complemented with other lenses — SPACE, DevEx, qualitative conversations with the team — to create an honest view of what’s happening.

Capabilities first, metrics second — but with eyes open

If there’s one thing the DORA research makes clear, it’s that metrics are a consequence, not a cause. But there’s another truth, less comfortable: metrics are also political instruments, not just technical ones.

You don’t improve deployment frequency by creating a dashboard. You improve by investing in automation, tests, trunk-based development, and a culture of experimentation.

You don’t reduce MTTR just by asking people to be faster. You reduce it with observability, clear runbooks, automated rollback, and blameless culture.
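
To make the contrast concrete: reducing MTTR is about mechanisms, not exhortation. Below is a minimal sketch of a deploy step that verifies health and rolls back automatically; check_health and rollback are hypothetical placeholders for whatever your platform actually provides.

```python
import time
from typing import Callable

def check_health(url: str) -> bool:
    # Hypothetical placeholder: probe a health endpoint, return True if the service is healthy.
    ...

def rollback(previous_version: str) -> None:
    # Hypothetical placeholder: redeploy the last known-good version.
    ...

def deploy_with_rollback(deploy_fn: Callable[[], None], previous_version: str, health_url: str,
                         attempts: int = 5, wait_seconds: float = 10.0) -> bool:
    """Deploy, verify health, and roll back automatically instead of paging a human first."""
    deploy_fn()
    for _ in range(attempts):
        if check_health(health_url):
            return True
        time.sleep(wait_seconds)
    rollback(previous_version)
    return False
```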

The order matters

Implement the capabilities. Observe the results. Use metrics to validate progress, not to force behavior. And always ask: what aren’t these metrics showing?

The complete truth about DORA

DORA measures how well a team can change software in production. It doesn’t measure:

  • Whether those changes make sense
  • How much it costs to sustain them
  • What’s being sacrificed to maintain them

This doesn’t make DORA useless. It makes it incomplete — and potentially dangerous when used as the only lens.

The 24 capabilities aren’t a checklist. They’re investments that compete for time, attention, and resources with features, deadlines, and short-term pressures. And in this competition, metrics frequently win — because they’re easier to measure than real value.

Living with DORA consciously

If your organization uses (or requires) DORA metrics, you don’t need to reject them. But you need to use them consciously:

Always ask:

  • Are we measuring because we want to understand — or because we need to report?
  • Who benefits if these numbers improve?
  • What would be invisible if we only looked at metrics?
  • Does our context really benefit from more speed?

Always complement:

  • DORA metrics with regular conversations about wear and satisfaction
  • Quantitative dashboards with qualitative observation
  • Delivery numbers with business value perception
  • External comparisons with understanding of your own context

And remember:

The choice that precedes metrics

Before choosing DORA (or any framework), choose the problem you want to solve. If this choice isn’t conscious, the system will make it for you. And systems don’t care about burnout, context, or people — they optimize for what’s measured.

The question isn’t whether it’s worth investing in capabilities. The question is: are you willing to question the metrics that justify (or don’t justify) that investment?

Footnotes

  [2] Goodhart’s Law states that “when a measure becomes a target, it ceases to be a good measure”. Formulated by economist Charles Goodhart in 1975, this law describes how systems adapt to optimize metrics instead of real objectives — a phenomenon widely documented in economics, social sciences, and increasingly in software engineering.
