
Civilization as a Misaligned Optimizer: Why Goodhart's Law Kills Systems — and What to Do About It

Anton Parf  ·  March 15, 2026
Any system with a fixed objective function will eventually destroy what it is trying to optimize. This is not a metaphor — it is a mathematical property of goal-based systems, formalized as Goodhart's Law and Ashby's Law of Requisite Variety. The less obvious corollary: this same logic explains civilizational failures — and points toward a concrete architectural solution. And further: this is the same problem the AI alignment community is trying to solve for artificial agents.


1 · The Problem

Goodhart's Law in its classical form:

"When a measure becomes a target, it ceases to be a good measure."

But the deeper version is cybernetic. Any agent with a fixed objective function O in a dynamic environment will eventually:

  1. Optimize the proxy rather than the real signal
  2. Ignore context that falls outside O
  3. Destroy subsystems not included in O but essential to the system's integrity

Formally: if an agent maximizes O(x), but the true utility function is U(x, C), where C is the state of the context, then under sufficient optimization pressure the agent finds x* such that O(x*) → max while U(x*, C) → min.
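
A minimal numerical sketch of the formal statement above. The specific functions are assumptions chosen for illustration, not part of the original argument: the proxy O(x) = x rewards raw output, while the true utility U(x, C) = x − x²/C also charges for the strain placed on a context of capacity C.

```python
def proxy_objective(x: float) -> float:
    """O(x): the measured target; it only sees raw output (assumed toy form)."""
    return x


def true_utility(x: float, context: float) -> float:
    """U(x, C): output minus strain on a context of capacity C (assumed toy form)."""
    return x - x ** 2 / context


def hill_climb_on_proxy(steps: int, step_size: float = 1.0) -> list[float]:
    """Greedy ascent on O alone: a step is accepted only if O increases."""
    x = 0.0
    trajectory = [x]
    for _ in range(steps):
        candidate = x + step_size
        if proxy_objective(candidate) > proxy_objective(x):
            x = candidate
        trajectory.append(x)
    return trajectory


if __name__ == "__main__":
    context = 10.0
    # Print every fifth point of the trajectory: O keeps rising, U peaks and then falls.
    for x in hill_climb_on_proxy(steps=20)[::5]:
        print(f"x={x:5.1f}  O={proxy_objective(x):6.1f}  U={true_utility(x, context):7.1f}")
```

In this toy, U peaks at x = C/2 and then falls without bound while O keeps rising, so additional optimization pressure on the proxy lowers true utility. Real proxies and utilities will not have this exact shape; the sketch only shows the divergence pattern.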

This is not a bug of any particular agent. It is an architectural property of goal-based systems.

2 · Civilization Is a Misaligned Optimizer

Three verified cases across different scales.

Case 1: NHS and the 4-Hour Rule (micro level)

In the mid-2000s the UK's NHS introduced a target: every patient in A&E must be seen, treated, and either admitted or discharged within 4 hours of arrival.

Case 2: The War on Drugs (macro level)

From 1971 to 2024, the US spent over $1 trillion on drug enforcement. Over 40 million people were arrested.

Case 3: Planetary Boundaries (civilizational level)

This is the most important case, because it is irreversible.

The global economy's objective function: O = GDP (or profit, or growth).

As of 2023, we have crossed 6 of 9 planetary boundaries (Stockholm Resilience Centre):

| Parameter | Status |
| --- | --- |
| Climate change | ⚠ Critical zone |
| Biodiversity | ✕ Crossed: extinction rate 100–1000× above natural |
| Nitrogen cycle | ✕ Crossed |
| Phosphorus cycle | ✕ Crossed: 4× above natural rate |
| Land-system change | ✕ Crossed: 75% of land surface altered |
| Chemical pollution | ✕ Crossed |

We are not "slightly disrupting balance." We are systematically dismantling the subsystems on which the O-function itself depends.

Mathematical analogy: the agent maximizes O(x) while destroying C (context). As C → 0, the function O becomes meaningless.
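
A small simulation can make this analogy concrete. Everything in it is an illustrative assumption: per-period output is O = extraction × C, and the context C regrows logistically and shrinks by whatever is extracted. A per-period maximizer would always pick the highest extraction rate, because O is increasing in it.

```python
def step_context(context: float, extraction: float, regrowth: float = 0.3) -> float:
    """Logistic regrowth of the context minus what the optimizer extracts (assumed dynamics)."""
    renewed = context + regrowth * context * (1.0 - context)
    return max(0.0, renewed - extraction * context)


def run(extraction: float, periods: int = 30) -> tuple[float, float]:
    """Return (cumulative O, final C) for a fixed extraction rate in [0, 1]."""
    context, total_output = 1.0, 0.0
    for _ in range(periods):
        output = extraction * context  # O depends on C: no context, no output
        total_output += output
        context = step_context(context, extraction)
    return total_output, context


if __name__ == "__main__":
    # 1.0 is the myopic per-period optimum; 0.15 is an arbitrary restrained rate.
    for rate in (1.0, 0.15):
        total, remaining = run(extraction=rate)
        print(f"extraction={rate:.2f}  cumulative O={total:.2f}  final C={remaining:.2f}")
```

Under these assumed dynamics, the maximal rate collects a one-off output of 1.0 and leaves C at zero, after which O is zero in every later period. The restrained rate accumulates more output over the same horizon because the context it depends on survives.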

3 · Why "Better Goals" Don't Solve the Problem

The obvious response: "We just need to set the right goal." Replace GDP with a wellbeing index. Replace arrests with addiction rates.

But this is not an architectural fix — it is replacing one proxy with another.

The problem is not a specific O. The problem is the structure of goal-based systems itself:

  1. Any fixed O is a reduction of infinite-dimensional U to a single number
  2. In nonlinear systems (which civilization, the biosphere, and human psychology all are), this reduction necessarily loses critical connections
  3. An optimizer maximizing the reduced function will inevitably destroy whatever was left outside the reduction

This is the core problem of AI alignment: how do you specify O for an AGI-level agent operating in an open dynamic environment such that it does not destroy subsystems not covered by the specification?

4 · The Architectural Solution: Boundaries Instead of Goals

There is a fundamentally different class of systems — boundary-based rather than goal-based.

Instead of "maximize O", the rule is "never cross boundaries B₁, B₂, ..., Bₙ."

| Goal-based | Boundary-based |
| --- | --- |
| Push toward maximum | Stay within the permissible zone |
| Single vector | Space of possible states |
| Destroys context through optimization | Preserves context as condition of existence |
| Agent finds loophole | Boundaries are structural, not moral |

The mountain road analogy: a goal says "reach the summit", and the shortest path there leads over the edge into the abyss. A boundary says "don't fall off the edge". Move however you like, but keep checking whether there is a cliff ahead; if there is, adjust course.

In complex systems, there is no summit. There is only continuous adaptation to a changing environment.
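
The mountain road can be sketched in code as a walker with no objective at all, only boundary checks. This is a minimal sketch under stated assumptions: the names `Boundary`, `within_all`, `boundary_walk`, and the corridor-shaped "road" are all hypothetical, and the random movement stands in for whatever policy the agent actually follows.

```python
import random
from typing import Callable

State = tuple[float, float]
Boundary = Callable[[State], bool]  # returns True while the state is permissible


def within_all(state: State, boundaries: list[Boundary]) -> bool:
    """A state is permissible only if every boundary holds."""
    return all(check(state) for check in boundaries)


def boundary_walk(start: State, boundaries: list[Boundary],
                  steps: int, seed: int = 0) -> list[State]:
    """Move freely (here: at random), but discard any move that would cross a boundary."""
    rng = random.Random(seed)
    state = start
    path = [state]
    for _ in range(steps):
        proposal = (state[0] + rng.uniform(-1, 1), state[1] + rng.uniform(-1, 1))
        if within_all(proposal, boundaries):  # "is there a cliff?"
            state = proposal                  # no cliff: take the step
        # cliff: stay put and try a different direction on the next iteration
        path.append(state)
    return path


# Hypothetical road: a corridor of half-width 2 around the line y = x.
road_edges: list[Boundary] = [
    lambda s: s[1] - s[0] < 2.0,  # one edge of the road
    lambda s: s[0] - s[1] < 2.0,  # the other edge
]

if __name__ == "__main__":
    path = boundary_walk((0.0, 0.0), road_edges, steps=200)
    # True by construction: only permissible proposals are ever accepted.
    print(all(within_all(s, road_edges) for s in path))
```

There is no `maximize` call anywhere in the loop. The walker has nothing to over-optimize, and the only guarantee the structure gives is that every visited state lies inside the permissible zone.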

Example: City Governance

Old model — goal-based

"Increase city GDP by 10% this year."

New model — boundary-based

"Do whatever you like, but:"

The mayor didn't become a saint. He still wants results. But the system structurally prevents him from destroying the future for short-term effect. Boundaries don't depend on his morality — they are architectural.

5 · Intellectual Genealogy: Who Got There — and Where They Stopped

This idea did not appear from nowhere. Here is an honest map of predecessors — and exactly where each one stopped.

Norbert Wiener (1948) · Cybernetics
First formalized: systems are governed by feedback, not top-down commands. The foundation.
Limit: remained at the level of technical systems; never crossed to civilization as the object of analysis.

Jay Forrester (1961) · System Dynamics
Mathematically described how complex systems behave over time. Showed that intuitive interventions often worsen the situation ("policy resistance"). Gave an analytical tool.
Limit: did not propose an alternative governance architecture.

Herbert Simon · Nobel 1978
Proved that agents don't maximize; they "satisfice." The first systematic challenge to optimization as a governing principle.
Limit: remained within behavioral economics; never reached the level of systems architecture.

Donella Meadows (1997) · "Leverage Points"
Closest in spirit. Showed that changing the goals of a system is leverage point #3 in her hierarchy. Framed the question as "where to intervene."
Limit: did not ask "how to redesign architecture so that goals are unnecessary." She was literally one step from the boundary-based approach and did not take it.

Amartya Sen · Nobel 1998
"Capability approach": ensure minimum capacities rather than maximize utility. Also about lower-bound constraints rather than goals.
Limit: only in the context of social justice; not systems architecture, not AI.

Elinor Ostrom · Nobel 2009 ★ Most important
Empirically proved across hundreds of cases: communities that survive long-term are governed by rules-as-boundaries, not centralized goals. Her principles of commons governance are boundary-based architecture in practice.
Limit: scale was local communities (fishing, forestry, irrigation). Never scaled to civilization. Never addressed AI at all.

Where our contribution lies

All of the above solved parts of the problem within one domain. We take three steps no one has taken together:

  1. The Axiom. Life as a physical process (negentropy, not a humanistic value) becomes the mathematically invariant constant of the system — not a parameter that can be adjusted, but the condition of the function's existence. Neither Meadows nor Ostrom had such a fixed point.
  2. Scale. Boundary-based architecture is extended from local communities (Ostrom) to civilizational and planetary scale.
  3. Bridge to AI. We show that the alignment problem is the same class of problem as civilizational failure. Boundary-based design with Life as the axiom is simultaneously an answer for civilization governance and for AI design.

In short: we stand on the shoulders of Ostrom and Meadows — and take the step they did not.

6 · Connection to AI Alignment

The classic alignment problem: how do you specify O for an AGI-level agent such that it does not destroy subsystems (including humans) not covered by the specification?

Proposed approaches (RLHF, Constitutional AI, corrigibility) try to solve this through a better O. But if our thesis holds, each of them only replaces one proxy with another.

Boundary-based approach to AI alignment

| Goal-based alignment | Boundary-based alignment |
| --- | --- |
| "Maximize utility for humans" | "Never cross boundaries B: do not cause irreversible harm to living systems" |
| Requires specifying what is good | Requires only specifying what is irreversibly bad |
| Reductive: proxy always loses information | Resistant to exploitation: fewer attack surfaces |
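
One way to read the right-hand column is as a veto applied before any preference is consulted. The sketch below is hypothetical throughout: the `Action` fields, the `no_irreversible_harm` predicate, and the example actions are invented for illustration. The hard part, deciding whether an action causes irreversible harm, is assumed to come from some external assessment.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class Action:
    name: str
    benefit: float           # whatever the agent happens to prefer; not part of any boundary
    irreversible_harm: bool  # assumed to be supplied by an external assessment


Boundary = Callable[[Action], bool]  # True means the action stays inside the boundary


def no_irreversible_harm(action: Action) -> bool:
    """Boundary B: do not cause irreversible harm to living systems."""
    return not action.irreversible_harm


def choose(actions: list[Action], boundaries: list[Boundary]) -> Optional[Action]:
    """Filter by boundaries first; only then let the agent's own preference pick."""
    permitted = [a for a in actions if all(b(a) for b in boundaries)]
    if not permitted:
        return None  # nothing permissible: do nothing rather than cross a boundary
    return max(permitted, key=lambda a: a.benefit)


if __name__ == "__main__":
    # Hypothetical candidate actions, invented for this example.
    candidates = [
        Action("drain wetland", benefit=9.0, irreversible_harm=True),
        Action("retrofit plant", benefit=6.0, irreversible_harm=False),
        Action("do nothing", benefit=0.0, irreversible_harm=False),
    ]
    chosen = choose(candidates, [no_irreversible_harm])
    print(chosen.name if chosen else "no permissible action")
```

The highest-benefit action is never considered, because the boundary is checked before the ranking and a larger benefit cannot outweigh a veto. Whether `irreversible_harm` can be specified without smuggling in a goal is exactly the kind of question this design leaves open.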

The failsafe argument

One might assume a sufficiently advanced AGI will become energetically independent of humans and of Life altogether. But there is a class of threats from which no technological infrastructure is fully protected: geomagnetic storms of extreme magnitude, gamma-ray bursts, asteroid strikes on infrastructure. Any of these can destroy or disable electronics, orbital networks, and power systems.

Biological Life has survived all of this — because it is decentralized, analog, and self-reproducing without infrastructure. It is the only known backup mechanism of planetary scale stress-tested across 4 billion years.

AGI that destroys the biological substrate deletes its own only reliable failsafe. Not from altruism. From systems engineering.

7 · Conclusion and Open Questions

Thesis: Civilizational failures and the AI alignment problem are the same class of problem. Both result from the same architectural mistake: goal-based systems optimizing a reduced function in an infinite-dimensional context.

Proposed solution: An axiomatic constant (Life as an invariant) + boundary-based architecture replacing the objective function.

This idea is developed in detail in "Architect of Reality: An Operating System for a Civilization That Survives" (Anton Parf, 2025) — but here I want to hear counterarguments from the LW community.

Open questions

  1. Is it possible to formally specify boundaries B without hidden goal-based logic inside them?
  2. How does a boundary-based system handle coordination between agents with different local boundaries?
  3. Are there known examples of stable boundary-based systems in nature or society that have persisted long enough to be verified?

Concrete counterexamples and mathematical objections are welcome.

Next in series: "Life as Negentropy: A Mathematical Axiom for Value Alignment", on why Life as a physical process (not a humanistic value) is the only candidate for the role of axiomatic constant.

Anton Parf · anthosphere.com