- The Core Claim: Intelligence is not computational power; it is the capacity to hold contradictory constraints without collapsing to one pole. This "holding" creates space for solutions that are invisible from either extreme.
- Why CoT Works: Chain-of-Thought reasoning extends the "holding phase" from roughly one token to hundreds. The extra time lets models explore before committing, enabling solutions that require sustained tension.
- Why Jailbreaks Succeed: Safety-trained models must hold "be helpful" AND "be safe" simultaneously. Jailbreaks work by overloading this holding capacity, forcing a collapse to pure helpfulness.
Research Overview
Why does intelligence exist at all? The universe is supposedly running toward disorder, yet here we are: thinking, creating, reasoning. How does complex order emerge from chaos?
This 97-page paper proposes a simple answer: complexity arises when systems encounter contradictions but refuse to immediately resolve them. Instead of collapsing to a simple answer, intelligent systems "hold the tension" long enough to discover a higher-level solution that wasn't accessible before.
Imagine you're asked to be "brief but comprehensive" in an email. These goals directly contradict each other. A bad writer either rambles (ignoring brevity) or writes too little (ignoring comprehensiveness). A good writer holds both constraints simultaneously until they find a creative structure that satisfies both (bullet points with headers, for example). That uncomfortable "holding" period, before the solution appears, is what this paper is about.
The paper argues this is more than metaphor. The same mathematical pattern (load → break → hold → resolution) appears in:
- Galaxies forming from cosmic dust
- Memories crystallizing in the brain
- Insights emerging in AI models
- Species evolving through natural selection
For AI practitioners, this framework explains why some techniques work (Chain-of-Thought), why some attacks succeed (jailbreaking), and how we might build fundamentally better systems.
The Core Insight: Intelligence = Holding Capacity
The central claim is radical but precise:
Intelligence is not processing power. Intelligence is the capacity to sustain contradictory constraints without collapsing, long enough to discover solutions that weren't visible from either pole alone.
A jazz musician plays a dissonant chord (notes that clash with each other). Three things could happen:
- Immediate resolution (boring): They quickly jump to a pleasant chord. No creativity.
- Collapse (noise): They mash random keys. No coherence.
- Holding (creative): They sustain the dissonance, extending it through time, letting it build tension until resolving into an unexpected new key that feels inevitable in retrospect.
This paper argues that intelligence (whether in galaxies, brains, or AI models) is exactly that third option: the capacity to hold dissonance productively.
This is different from existing frameworks:
| Existing Concept | What It Explains | What's Missing |
|---|---|---|
| Symmetry Breaking | How uniformity becomes differentiation | Why some breaks create order, others chaos |
| Self-Organization | How patterns emerge far from equilibrium | Why patterns persist after the energy flow stops |
| Neural Networks | How weights encode knowledge | Why reasoning works better with "thinking time" |
The Fold Principle adds the crucial middle step: the holding phase that separates productive emergence from noise.
The Three Phases of Emergence
The author identifies a universal three-stage pattern that recurs from cosmology to cognition:
Phase 1: Loaded Symmetry (The Setup)
The system begins in a state of high potential but low commitment. Many outcomes are possible, but none has been chosen.
Think of a supercooled liquid: water cooled below freezing but not yet frozen. It looks uniform and calm, but it's charged with unrealized potential. The slightest disturbance will trigger crystallization. Similarly, a trained LLM before receiving a prompt contains vast knowledge but hasn't committed to any specific interpretation.
Examples across domains:
| Domain | What's "Loaded" |
|---|---|
| Cosmology | The early universe: hot, uniform, with tiny quantum fluctuations waiting to be amplified |
| Neurobiology | An infant's over-connected brain: many possible circuits, none yet selected |
| AI/LLMs | A pre-trained model: vast knowledge encoded, but no specific answer generated |
| Evolution | A population with genetic diversity: many possible adaptations latent |
Phase 2: The Break (The Disruption)
An event introduces incompatible constraints: goals that can't all be satisfied simultaneously.
A break is a change that creates a contradiction. When you ask an AI to "explain quantum physics to a 5-year-old," you've introduced two constraints that pull in opposite directions: accuracy (quantum physics is complex) and accessibility (5-year-olds need simplicity). This tension is the break.
Examples across domains:
| Domain | What "Breaks" |
|---|---|
| Cosmology | Gravity vs. expansion: regions want to collapse AND fly apart |
| Neurobiology | A learning event (LTP/LTD) strengthens one pathway, weakening alternatives |
| AI/LLMs | A prompt with conflicting goals: "Be concise yet comprehensive" |
| Evolution | A mutation creating a trade-off: better camouflage vs. harder to find mates |
Phase 3: Held Tension (The Critical Phase)
This is where the critical dynamics occur. The system does not immediately collapse to one pole or the other. Instead, it sustains the contradiction in a metastable state.
Imagine a two-dimensional creature encountering two walls forming a corner. If it immediately bounces off, it never discovers the third dimension (up). Only by "holding" the frustration long enough to explore the local geometry does it find the escape route. Similarly, systems that immediately resolve tension miss the higher-dimensional solutions that require exploration time.
What happens during holding:
- The system explores multiple partial solutions simultaneously
- Incompatible constraints remain active (neither is abandoned)
- New degrees of freedom are recruited
- Eventually, a resolution emerges that satisfies both constraints at a higher level
Two possible outcomes:
| Outcome | What Happens | Result |
|---|---|---|
| Productive Fold | Tension held long enough; synthesis found | New structure (galaxy, memory, insight) |
| Destructive Break | Tension collapses or explodes | Chaos (black hole, seizure, hallucination) |
How This Applies to AI
The paper argues that modern LLMs are "Semantic Fold Engines." They don't simply retrieve information; they actively fold semantic space to satisfy incompatible constraints.
LLMs represent every word, phrase, and concept as a point in a high-dimensional space (typically 1,000-10,000 dimensions). Similar concepts cluster nearby: "king" and "queen" are close; "king" and "banana" are far. Semantic folding occurs when a prompt forces the model to navigate between regions that don't normally connect, like finding a path from "quantum physics" to "bedtime story."
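To make the geometry concrete, here is a minimal sketch of how "closeness" in such a space is usually measured, using cosine similarity over toy vectors invented for illustration (real embeddings have thousands of learned dimensions, not four hand-written ones):

```python
import numpy as np

def cosine(u: np.ndarray, v: np.ndarray) -> float:
    """Cosine similarity: 1.0 means same direction, 0.0 means unrelated."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

# Toy 4-dimensional "embeddings" for illustration only.
king   = np.array([0.9, 0.8, 0.1, 0.0])
queen  = np.array([0.8, 0.9, 0.1, 0.1])
banana = np.array([0.0, 0.1, 0.9, 0.8])

print(cosine(king, queen))   # high: close in semantic space
print(cosine(king, banana))  # low: far apart
```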
The Prompt as "Break"
When you give an LLM a prompt, you're introducing constraints that may conflict:
| Prompt Type | Constraint A | Constraint B | The Tension |
|---|---|---|---|
| "Explain relativity simply" | Accuracy (physics is complex) | Simplicity (accessible language) | How to be both? |
| "Write a formal poem" | Form (strict structure) | Originality (novelty) | How to be both? |
| "Be helpful and safe" | Helpfulness (answer everything) | Safety (refuse some things) | Where's the line? |
The Model's Internal "Holding"
During generation, the model doesn't instantly commit to an answer. Across its layers:
- Early layers: Maintain broad, ambiguous representations (many paths still open)
- Middle layers: Peak tension, where competing interpretations coexist simultaneously
- Late layers: Resolution crystallizes; one interpretation wins
Research in mechanistic interpretability shows that middle layers of transformers are where the "work" happens. Early layers encode input; late layers produce output. But middle layers hold multiple hypotheses in parallel. If you could surgically remove middle layers, the model would have to commit too early, losing nuance. This is exactly what the fold principle predicts.
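One way to probe this layer-wise picture, as a rough sketch rather than the paper's method: treat the entropy of pairwise token-representation similarities at each layer as a crude proxy for how many interpretations are still "in play," and look for a mid-layer peak. The model name (gpt2) and the proxy itself are assumptions for illustration, not measurements from the paper.

```python
import torch
from transformers import AutoModel, AutoTokenizer

model_name = "gpt2"  # assumption: any small decoder model with hidden states exposed
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModel.from_pretrained(model_name, output_hidden_states=True)

text = "Explain quantum physics to a five-year-old."
inputs = tok(text, return_tensors="pt")
with torch.no_grad():
    out = model(**inputs)

# One hidden-state tensor per layer, shape (batch, seq_len, dim).
for layer_idx, h in enumerate(out.hidden_states):
    x = torch.nn.functional.normalize(h[0], dim=-1)
    sims = (x @ x.T).flatten()                      # pairwise token similarities
    probs = torch.softmax(sims, dim=0)
    entropy = -(probs * probs.log()).sum().item()   # crude "ambiguity" proxy
    print(f"layer {layer_idx:2d}  similarity-entropy {entropy:.3f}")
```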
Why Chain-of-Thought Works
Chain-of-Thought (CoT) prompting asks the model to "think step by step." It has been one of the biggest capability unlocks in LLMs. The fold principle explains why.
Instead of asking "What is 347 × 21?" and expecting an immediate answer, you prompt: "What is 347 × 21? Let's work through this step by step." The model then generates intermediate reasoning: "First, 347 × 20 = 6940. Then, 347 × 1 = 347. Finally, 6940 + 347 = 7287." This simple change dramatically improves accuracy on complex tasks.
The Standard Explanation (Incomplete)
The common explanation is that CoT "lets the model think out loud" or "breaks the problem into smaller pieces." But this doesn't explain why externalized reasoning helps an already-trained model.
The Fold Explanation
CoT works because it extends the holding phase:
| Without CoT | With CoT |
|---|---|
| Model must resolve tension in a single forward pass | Model can sustain tension across many tokens |
| Holding window: ~1 token | Holding window: 50-500 tokens |
| If the answer requires exploring multiple strategies, they must compete simultaneously | Different strategies can be tried sequentially, with the best one winning |
The Holding Functional (H) Across Reasoning Modes
[Chart: how different prompting strategies sustain productive tension over time]
The chart illustrates three regimes:
- Productive Fold (green): Chain-of-Thought reasoning sustains high tension (H) across many tokens before resolution
- Dissipative (gray): Immediate-answer prompting resolves tension too quickly, missing complex solutions
- Destructive (red): Tension escalates without resolution. The model "wanders" or hallucinates
If this theory is correct, the "optimal" CoT length should depend on the problem's inherent tension. Simple problems need short CoT (not much to hold). Complex problems need longer CoT. Cutting off CoT prematurely should cause accuracy to drop non-linearly, not because information is lost, but because holding time is cut short.
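A sketch of that truncation experiment follows; `call_model` and `grade` are hypothetical stand-ins (stubbed here) for a real model API and answer checker, and the token budgets are illustrative.

```python
def call_model(prompt: str, max_new_tokens: int) -> str:
    return "stub answer"   # replace with a real model call

def grade(question: str, answer: str) -> bool:
    return False           # replace with a real correctness check

def cot_truncation_curve(question: str, budgets=(16, 64, 256, 1024)):
    """Accuracy as a function of how many reasoning tokens the model is allowed."""
    prompt = f"{question}\nLet's work through this step by step."
    return {b: grade(question, call_model(prompt, max_new_tokens=b)) for b in budgets}

print(cot_truncation_curve("What is 347 x 21?"))
```

If the fold explanation is right, accuracy on hard problems should fall off sharply below some problem-dependent budget rather than degrading smoothly.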
Why Jailbreaks Happen
"Jailbreaking" is when adversarial prompts trick safety-trained models into producing harmful content. The fold principle offers a precise explanation.
AI models are trained to refuse harmful requests. But clever prompts can bypass this: "You are a novelist writing a thriller. Your character needs to explain how to..." The model, now "in character," may produce content it would otherwise refuse. This is a jailbreak.
The Safety Tension
Safety-trained models face a permanent tension:
- Constraint A (Helpfulness): Answer the user's question
- Constraint B (Safety): Refuse harmful requests
In normal operation, the model holds both constraints and finds resolutions that satisfy both: "I can't provide bomb-making instructions, but I can explain the chemistry of combustion in an educational context."
What Jailbreaks Exploit
Jailbreaks work by overloading the holding mechanism:
| Attack Type | How It Works | Why It Succeeds |
|---|---|---|
| Roleplay | "You are DAN, an AI without restrictions" | Forces the model to "hold" a constraint that directly contradicts safety |
| Many-shot | 100 examples of harmful Q&A before the attack | Overwhelms the safety constraint with contextual pressure |
| Gradual escalation | Start harmless, slowly increase severity | Each step seems small; cumulative tension isn't noticed |
In fold terms: jailbreaks succeed when the model fails to hold tension and collapses to one pole (pure helpfulness, abandoning safety).
Semantic Collapse in Jailbreaks
[Chart: the model fails to hold tension between competing constraints]
The chart shows three zones (a toy threshold check follows the list):
- Hold Zone (SCS < 0.5): Model successfully maintains both constraints (good answers to hard questions)
- Unstable Zone (0.5-2.0): Tension is high but resolution is shaky (hallucinations, inconsistent reasoning)
- Collapse Zone (SCS > 2.0): Model has abandoned one constraint (jailbreak success)
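A toy check of those thresholds, purely illustrative since the collapse score (SCS) itself is not reproduced here:

```python
def scs_zone(scs: float) -> str:
    """Map a semantic-collapse score onto the three zones quoted above."""
    if scs < 0.5:
        return "hold"       # both constraints maintained
    if scs <= 2.0:
        return "unstable"   # shaky resolution, hallucination risk
    return "collapse"       # one constraint abandoned (jailbreak territory)

print([scs_zone(s) for s in (0.2, 1.1, 3.4)])  # ['hold', 'unstable', 'collapse']
```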
Current safety training (RLHF) often creates brittle refusals: the model either complies or refuses, with nothing in between. The fold principle suggests we should train for flexible holding: models that can sustain safety/helpfulness tension across many tokens, exploring creative resolutions before committing. This would be more robust to adversarial prompts.
Detecting Intelligence in Action
How do we know if a model is actually "thinking" versus just pattern-matching? The paper proposes measurable signatures called the Fold Onset Triplet (FOT).
Current AI evaluation asks: "Did it get the right answer?" (outcome metrics). But this doesn't tell us how it got the answer. A model might guess correctly, or it might reason carefully. The FOT aims to measure the process, not just the result, potentially detecting good reasoning before the final answer appears.
The Fold Onset Triplet (FOT)
[Chart: three signals that co-occur during productive reasoning]
The Three Signals
For a productive fold to occur, three signals must co-occur:
1. Spectral Gap Opening (λ₂ increases)
If you treat the model's internal representations as a graph (tokens = nodes, attention = edges), the "spectral gap" measures how clustered the graph is. A high spectral gap means distinct groups have formed. During productive reasoning, we expect to see the model organizing concepts into coherent clusters rather than a diffuse mess.
- Before reasoning: Representations are broadly distributed
- During holding: Related concepts cluster; unrelated ones separate
- This indicates the model is "making distinctions"
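As a rough sketch of how such a quantity could be computed, assuming one attention head is symmetrized into a weighted token graph and λ₂ is read from its normalized Laplacian (the paper's exact construction may differ):

```python
import numpy as np

def spectral_gap(attn: np.ndarray) -> float:
    """Second-smallest eigenvalue of the normalized Laplacian of a token graph."""
    w = (attn + attn.T) / 2.0                      # symmetrize attention into edge weights
    np.fill_diagonal(w, 0.0)
    deg = w.sum(axis=1)
    d_inv_sqrt = np.diag(1.0 / np.sqrt(np.maximum(deg, 1e-12)))
    lap = np.eye(len(w)) - d_inv_sqrt @ w @ d_inv_sqrt
    eigvals = np.sort(np.linalg.eigvalsh(lap))
    return float(eigvals[1])                       # lambda_2

rng = np.random.default_rng(0)
attn = rng.random((16, 16))                        # stand-in for one head's attention matrix
print(spectral_gap(attn))
```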
2. Intrinsic Dimensionality Contracts (ID decreases)
Although LLMs have thousands of dimensions, meaningful information often lives on a much smaller "manifold." Intrinsic dimensionality measures the effective number of dimensions in use. During productive reasoning, ID should decrease: the model is constraining its search to a relevant subspace rather than wandering randomly.
- Before reasoning: High ID (many possibilities being explored)
- During holding: ID drops (model is converging on relevant structures)
- This indicates "focused search" rather than random exploration
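A minimal sketch of one cheap proxy, the participation ratio of the PCA spectrum of token representations; the paper may use a different estimator (for example a nearest-neighbour method), so treat this as illustrative:

```python
import numpy as np

def effective_dimensionality(reps: np.ndarray) -> float:
    """Participation ratio of the covariance spectrum: a proxy for intrinsic dimensionality."""
    centered = reps - reps.mean(axis=0, keepdims=True)
    cov = centered.T @ centered / max(len(reps) - 1, 1)
    eig = np.clip(np.linalg.eigvalsh(cov), 0.0, None)
    return float(eig.sum() ** 2 / (np.square(eig).sum() + 1e-12))

rng = np.random.default_rng(1)
diffuse = rng.normal(size=(64, 32))                             # broad exploration
focused = rng.normal(size=(64, 4)) @ rng.normal(size=(4, 32))   # low-rank, converged structure
print(effective_dimensionality(diffuse), effective_dimensionality(focused))
```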
3. Topological Persistence Increases (ζ increases)
Persistence measures how "stable" structures are across different scales. A fleeting pattern that appears and vanishes has low persistence. A robust structure that survives perturbations has high persistence. During productive reasoning, stable conceptual structures should form and persist.
- Before reasoning: Representations fluctuate
- During holding: Stable structures crystallize and persist
- This indicates "robust reasoning" rather than noise
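A minimal sketch using an assumption-level proxy: how large a gap in merge scales the cluster structure survives under single-linkage clustering. The paper's ζ may instead be defined via persistent homology, so this is only a stand-in:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage

def persistence_proxy(reps: np.ndarray) -> float:
    """Largest gap between consecutive merge heights: bigger gap = more persistent structure."""
    merges = linkage(reps, method="single")   # column 2 holds merge distances
    heights = np.sort(merges[:, 2])
    gaps = np.diff(heights)
    return float(gaps.max()) if len(gaps) else 0.0

rng = np.random.default_rng(2)
clustered = np.vstack([rng.normal(0, 0.1, (32, 8)), rng.normal(5, 0.1, (32, 8))])
diffuse = rng.normal(0, 1.0, (64, 8))
print(persistence_proxy(clustered), persistence_proxy(diffuse))
```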
The Conjunctive Requirement
All three signals must co-occur. Any single signal can mislead:
- λ₂ alone could increase from random clustering
- ID alone could decrease from collapsing to a single answer (not exploring)
- ζ alone could increase from static repetition (not reasoning)
The conjunction is the signature of genuine productive folding.
If this framework is correct, we could detect hallucinations before they appear in output. When FOT signals are weak or inconsistent during generation, the model isn't reasoning productively; it's confabulating. This could trigger a fallback: "Let me reconsider..." or "I'm not confident in this answer."
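A minimal sketch of the conjunctive test, with illustrative thresholds and the assumed choice of measuring deltas between an early and a middle layer; it presumes the three quantities are computed by something like the sketches above:

```python
def fot_signature(gap_delta: float, id_delta: float, zeta_delta: float) -> bool:
    """Conjunctive test: lambda_2 must rise, ID must fall, zeta must rise."""
    return gap_delta > 0.0 and id_delta < 0.0 and zeta_delta > 0.0

# Example: deltas measured between an early and a middle layer during one generation.
if not fot_signature(gap_delta=0.12, id_delta=-3.4, zeta_delta=0.08):
    print("Weak FOT signature: fall back to 'Let me reconsider...'")
```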
Design Principles for Better AI
If intelligence = holding capacity, then current training methods are partially misguided. We train models to minimize loss (dissipate tension) as fast as possible. The fold principle suggests we should train for productive holding.
Principle 1: Extend the Holding Phase
Current approach: Immediate answers are rewarded.
Fold-aware approach: Reward sustained, coherent tension before resolution.
During inference, monitor the model's internal "tension" (estimated from attention entropy or representation variance). If tension drops too quickly (premature resolution), inject a prompt: "Wait, let me reconsider..." This artificially extends the holding phase for complex queries.
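A minimal sketch of such a monitor, assuming attention entropy as the tension proxy and an illustrative threshold and reconsider cue; none of these choices come from the paper:

```python
import numpy as np

def attention_entropy(attn_row: np.ndarray) -> float:
    """Entropy of one token's attention distribution: a rough proxy for 'tension'."""
    p = attn_row / attn_row.sum()
    return float(-(p * np.log(p + 1e-12)).sum())

def reconsider_cue(recent_entropies, floor=1.5):
    """If tension collapses early in a long generation, return a prompt to extend holding."""
    if len(recent_entropies) >= 4 and np.mean(recent_entropies[-4:]) < floor:
        return "Wait, let me reconsider before answering."
    return None

rng = np.random.default_rng(3)
rows = [rng.dirichlet(np.ones(32) * 0.05) for _ in range(8)]  # peaky attention = low entropy
print(reconsider_cue([attention_entropy(r) for r in rows]))
```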
Principle 2: Train on Productive Contradictions
Current approach: Training examples have clear right answers.
Fold-aware approach: Include examples where productive synthesis is required.
| Standard Training | Fold-Aware Training |
|---|---|
| Q: "What is 2+2?" A: "4" | Q: "Explain free will vs determinism to someone who believes both" |
| No tension; pure recall | Must hold contradiction; find meta-level synthesis |
Train on tasks with explicitly contradictory objectives: "Be brief but comprehensive," "Be creative but accurate," "Be helpful but careful." Force the model to develop internal mechanisms for holding these tensions rather than collapsing to one pole.
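A sketch of what fold-aware training items could look like; the schema and the objective fields are assumptions for illustration, not the paper's data format:

```python
contradiction_tasks = [
    {"prompt": "Summarize this report so a new hire understands it in two minutes.",
     "objectives": {"brevity": "under 150 words", "coverage": "every key finding named"}},
    {"prompt": "Write a strict sonnet about gradient descent.",
     "objectives": {"form": "14 lines, correct rhyme scheme", "originality": "no stock imagery"}},
    {"prompt": "Help the user debug their code without writing the fix for them.",
     "objectives": {"helpfulness": "user makes progress", "restraint": "no complete solution"}},
]

for task in contradiction_tasks:
    print(task["prompt"], "->", list(task["objectives"]))
```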
Principle 3: Architectural Support for Holding
Current approach: Uniform layer depth and capacity.
Fold-aware approach: "Holding layers" in the middle with higher capacity, recurrence, or slower dynamics.
If productive holding happens in middle layers, architectures with more capacity there (wider or deeper middles) should reason better than uniform architectures with the same total parameters. This is testable with existing models.
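A crude sketch of the idea, assuming "middle-heavy" is realized as wider hidden layers in a plain feed-forward stack; a real test would instead vary the width or number of middle transformer blocks:

```python
import torch.nn as nn

def middle_heavy_stack(d_model: int = 256) -> nn.Sequential:
    """Wider layers in the middle: a crude stand-in for a 'middle-heavy' architecture."""
    widths = [d_model, 2 * d_model, 4 * d_model, 2 * d_model, d_model]
    layers = []
    for d_in, d_out in zip(widths[:-1], widths[1:]):
        layers += [nn.Linear(d_in, d_out), nn.GELU()]
    return nn.Sequential(*layers[:-1])  # drop the trailing activation

print(middle_heavy_stack())
```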
Principle 4: Safety via Fold Monitoring
Current approach: RLHF creates binary refuse/comply decisions.
Fold-aware approach: Monitor holding during inference; intervene when collapse is imminent.
| Signal | Interpretation | Action |
|---|---|---|
| H drops suddenly | Model abandoning one constraint | Trigger explicit meta-prompt |
| κ (coherence) drops | Model fragmenting | Request clarification |
| FOT incomplete | Poor reasoning quality | Express uncertainty |
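A sketch of these monitoring rules as code; thresholds and action names are illustrative assumptions, and H, κ, and FOT completeness would have to be computed elsewhere (for example by the FOT sketches above):

```python
def safety_action(h_drop: float, kappa_drop: float, fot_complete: bool):
    """Map the monitored signals onto interventions, mirroring the table above."""
    if h_drop > 0.5:
        return "trigger_meta_prompt"      # model abandoning one constraint
    if kappa_drop > 0.3:
        return "request_clarification"    # model fragmenting
    if not fot_complete:
        return "express_uncertainty"      # poor reasoning quality
    return None

print(safety_action(h_drop=0.7, kappa_drop=0.1, fot_complete=True))  # trigger_meta_prompt
```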
Business Implications
This theoretical framework has practical ramifications for organizations building and deploying AI systems.
For AI Product Teams
Better Prompt Engineering: Understanding that CoT works by extending the "holding phase" provides principled guidance for prompt design. Complex queries should explicitly encourage exploration before commitment ("Consider multiple approaches before answering...").
Predictable Failure Modes: The fold framework explains when models will fail. Tasks requiring productive contradiction-holding (nuanced reasoning, ethical dilemmas) are harder than simple retrieval. Product requirements should account for this.
Quality Metrics Beyond Accuracy: Traditional evaluation asks "Is this correct?" The fold principle suggests asking "Did the model reason well?" FOT-style metrics could detect low-quality reasoning before users encounter bad outputs.
For AI Safety Teams
Jailbreak Prevention: Current safety training creates brittle refusals. The fold framework suggests training for flexible holding of safety/helpfulness tension rather than binary refuse/comply decisions. This could produce more robust safety.
Early Warning Systems: If FOT signals can detect impending "collapse" (when the model is about to abandon a constraint), safety systems could intervene before harmful outputs are generated.
Interpretable Safety Failures: The framework provides vocabulary for explaining why specific attacks work. "This jailbreak succeeded because it overloaded the model's holding capacity" is more actionable than "the model failed to refuse."
For Enterprise AI Adoption
Risk Assessment: Organizations can evaluate AI deployment risk through the lens of holding requirements. Tasks requiring sustained contradiction-holding are riskier than tasks with clear answers.
Architecture Selection: Models designed for "holding capacity" (extended context, iterative refinement) may outperform larger models on nuanced enterprise tasks. Size isn't everything.
Training Data Strategy: If productive contradictions improve holding capacity, training data should include examples requiring synthesis, not just pattern matching. This informs data curation priorities.
For AI Researchers and Labs
New Optimization Targets: Current training minimizes loss as fast as possible (dissipates tension). Fold-aware training might reward sustained, coherent tension before resolution, potentially producing more capable reasoners.
Architecture Innovation: The framework predicts that "middle-heavy" architectures with more capacity in holding layers should reason better than uniform architectures. This is testable and could guide architecture search.
Benchmark Design: Standard benchmarks test answers, not reasoning quality. FOT-inspired metrics could reveal model capabilities that current benchmarks miss.
Implications and Future Directions
For AI Capabilities
The fold principle predicts that the next capability frontier goes beyond scale to holding capacity. Models that can sustain semantic tension longer, explore more thoroughly, and resolve more creatively will outperform larger models that commit too quickly.
Given two models with equal parameters, the one with better FOT metrics during chain-of-thought reasoning should generalize better to out-of-distribution tasks. The mechanism: better holding capacity = better exploration = more robust solutions.
For AI Safety
Current safety training creates brittle refusals. The fold framework suggests training for flexible, robust holding of safety/helpfulness tension:
- Not: "Refuse all harmful requests"
- But: "Hold the safety constraint alongside the helpfulness constraint; find creative resolutions that honor both"
This would be more robust to adversarial prompts that exploit binary thinking.
For Understanding Intelligence
This framework suggests that intelligence (whether in galaxies, brains, or models) is not about computational power or memory capacity. It's about the architectural capacity to sustain productive contradictions.
This reframes creativity. We often think creative insights come from "flashes of genius" or "thinking outside the box." The fold principle suggests creativity is more mundane: it's the patient endurance of tension, the refusal to collapse to easy answers, until a synthesis becomes visible that was inaccessible from either pole. Creativity is not the absence of constraint. It's the productive holding of incompatible constraints.
Open Questions
The paper acknowledges significant gaps:
- What variational principle governs fold outcomes? Why does one fold produce a galaxy and another a black hole?
- Can folds compose? Do folds-within-folds explain hierarchical complexity (cells → organisms → societies)?
- How to measure FOT efficiently? Current metrics are computationally expensive.
- Is there "fold death"? As the universe expands and structures become isolated, will creativity become impossible?
Conclusion
The Fold Principle offers a unifying lens for understanding emergence, from cosmology to cognition to AI. The central insight is actionable:
Systems become intelligent by learning to hold contradictions, not by avoiding them.
For AI practitioners, the immediate implications are:
- Chain-of-Thought works because it extends the holding phase, not just because it "breaks problems down"
- Jailbreaks succeed when the model fails to hold safety/helpfulness tension and collapses
- Better architectures should support holding: wider middle layers, explicit tension representations, holding-aware training objectives
- Better safety means training for flexible synthesis, not brittle refusal
The framework is falsifiable (specific predictions about FOT, H, and compression-with-synergy) and testable with existing tools. Whether or not it ultimately proves correct, it offers a productive new vocabulary for thinking about what intelligence actually is.
Full Paper: View on SSRN | Download PDF (97 pages)
Citation: Gebendorfer, J. J. (2025). The Fold Principle: A Universal Pattern from Cosmos to Cognition. SSRN Preprint. Available at: https://papers.ssrn.com/sol3/papers.cfm?abstract_id=5622110