Second-order theory as a square: what you gain, what you lose

The square is where you buy robustness, at the price of admitting extra solutions.

First-order equations are incisive: they define a tight solution space and often come with good deformation theory. They are also unforgiving: small changes in modeling choices (boundary conditions, gauge-fixing, numerical discretization) can push you off the solution set in a way that is hard to control. Squaring the first-order object \(\Upsilon_\omega\) is the standard move that trades “sharp constraints” for “energy-like control.”

Definitions / Notation used

  • \(\Upsilon_\omega := \bullet_\varepsilon(F_B) - \kappa_1 T\) is the first-order field object on \(Y\) (\(\mathrm{ad}(P_H)\)-valued).
  • Define the squared action:

$$ I_2(\omega) := \int_Y \langle \Upsilon_\omega, \ast_Y \Upsilon_\omega \rangle. $$

Square-root logic

There are two logically distinct statements:

1) If \(\Upsilon_\omega = 0\), then \(\omega\) is a stationary point of \(I_2\).

This is immediate: the integrand is quadratic in \(\Upsilon_\omega\), so if \(\Upsilon_\omega\) vanishes pointwise, any first variation of \(I_2\) vanishes.

2) If \(\omega\) is a stationary point of \(I_2\), then \(\Upsilon_\omega = 0\).

This is false in general. Stationary points of a square include \(\Upsilon_\omega = 0\) solutions, but can also include configurations where \(\Upsilon_\omega\) is nonzero yet satisfies the second-order Euler–Lagrange equation (a covariant “divergence-free” condition). This is the precise sense in which “first-order implies second-order,” but not conversely.

So the square-root slogan here is not mystical: it is a strict inclusion of solution sets: \(\{\Upsilon_\omega = 0\} \subset \{\mathrm{EL}(I_2) = 0\}\).

One technical lemma (structure of the second-order equation)

Lemma (Second-order Euler–Lagrange has the form “adjoint derivative of \(\Upsilon\)”).

Under variations of \(\omega\) that respect the boundary conditions, the first variation of \(I_2\) can be written schematically as

$$ \delta I_2(\omega) = 2 \int_Y \langle \delta \omega, \mathcal{L}_\omega^\dagger(\Upsilon_\omega)\rangle, $$

so the Euler–Lagrange equation for \(I_2\) is

$$ \mathcal{L}_\omega^\dagger(\Upsilon_\omega) = 0, $$

where \(\mathcal{L}_\omega\) is the linearization of the map \(\omega \mapsto \Upsilon_\omega\) (hence depends on \(A_0\), \(\varepsilon\), the Shiab operator, and the \(\sigma\)-split metric through \(\ast_Y\)), and \(\mathcal{L}_\omega^\dagger\) is its formal adjoint with respect to the \(\langle\cdot,\cdot\rangle/\ast_Y\) pairing.

Proof sketch.

\(I_2 = \int \langle \Upsilon, \ast_Y \Upsilon\rangle\). Varying gives \(\delta I_2 = 2 \int \langle \delta \Upsilon, \ast_Y \Upsilon\rangle\). But \(\delta \Upsilon = \mathcal{L}_\omega(\delta\omega)\) by definition of the linearization. Move \(\mathcal{L}_\omega\) off \(\delta\omega\) by adjunction to obtain the displayed form (plus boundary terms that vanish under the assumed support/decay or imposed boundary conditions). No componentwise “Ricci tracing” occurs: everything is packaged in the covariant linearization of \(\Upsilon\).
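
For readers who like coordinates, here is a finite-dimensional shadow of the same computation (an analogy only, not the actual setup): with a residual map \(\Upsilon: \mathbb{R}^n \to \mathbb{R}^m\), a symmetric (possibly indefinite) Gram matrix \(G\) standing in for \(\ast_Y\), and Jacobian \(J(x)\) standing in for \(\mathcal{L}_\omega\),

$$ I_2(x) = \Upsilon(x)^{\mathsf{T}} G\, \Upsilon(x), \qquad \nabla I_2(x) = 2\, J(x)^{\mathsf{T}} G\, \Upsilon(x), $$

so the critical-point condition \(J^{\mathsf{T}} G\, \Upsilon = 0\) is the finite-dimensional face of \(\mathcal{L}_\omega^\dagger(\Upsilon_\omega) = 0\).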

What you gain by squaring

1) A functional that is naturally suited to numerics.

Even in split signature, \(I_2\) is the canonical “least-squares” objective: it measures failure to satisfy the first-order equation. This is exactly the structure you want if you are doing continuation methods, Newton–Krylov solvers, or constrained minimization.
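
A minimal numerical sketch of this workflow, with a toy residual standing in for \(\Upsilon_\omega\) (all names below are hypothetical placeholders, not the actual field content):

```python
import numpy as np
from scipy.optimize import least_squares

# Toy stand-in for omega -> Upsilon_omega: a nonlinear residual on R^2.
# (Hypothetical; the real object is Shiab-contracted curvature minus
# kappa_1 times torsion on Y.)
def upsilon(x):
    return np.array([
        x[0]**2 + x[1]**2 - 1.0,   # placeholder "curvature" constraint
        x[1] - x[0]**2,            # placeholder "torsion" constraint
    ])

# least_squares minimizes (1/2)||upsilon(x)||^2, the finite-dimensional
# analogue of I_2; its stationarity condition J^T upsilon = 0 is the
# analogue of the second-order EL equation L^dagger(Upsilon) = 0.
sol = least_squares(upsilon, x0=np.array([1.0, 1.0]))

# The decisive check: stationarity alone is not enough. Only a vanishing
# residual certifies a solution of the first-order equation itself.
res_norm = np.linalg.norm(sol.fun)
print(f"residual norm: {res_norm:.2e}")
print("first-order solution:", res_norm < 1e-8)
```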

2) A direct bridge to EFT thinking.

Expanding \(I_2\) around a background solution \(\omega_0\) gives a quadratic form governed by the linearized operator \(𝓛_{\omega_0}\). That is the entry point to propagators, effective operators, and mode suppression.
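
Concretely, if the background solves the first-order equation (\(\Upsilon_{\omega_0} = 0\)) and boundary terms vanish as in the lemma, the expansion truncates cleanly up to cubic remainders:

$$ I_2(\omega_0 + \delta\omega) = \int_Y \langle \mathcal{L}_{\omega_0}\,\delta\omega,\; \ast_Y\, \mathcal{L}_{\omega_0}\,\delta\omega \rangle + O(\delta\omega^3) = \int_Y \langle \delta\omega,\; \mathcal{L}_{\omega_0}^\dagger \mathcal{L}_{\omega_0}\,\delta\omega \rangle + O(\delta\omega^3), $$

so \(\mathcal{L}_{\omega_0}^\dagger \mathcal{L}_{\omega_0}\) is the fluctuation operator: gauge directions are expected to sit in its kernel, and its spectrum on the physical sector is what propagators and mode-suppression statements are about.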

3) A cleaner path to quantization heuristics (without claiming success yet).

Path integrals over \(\omega\) weighted by \(\exp(-I_2)\) (or its Lorentzian analogue) are the standard story. In a split-signature ambient space, the real work is to identify the correct involution/projection that yields a well-behaved quadratic form on the propagating sector. Squaring is necessary, not sufficient, but it is the move that makes the question well-posed.
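
The formal target, stated only as a heuristic: around a first-order background and after gauge-fixing, the Gaussian weight would contribute a one-loop determinant,

$$ \int \mathcal{D}(\delta\omega)\; e^{-\int_Y \langle \delta\omega,\, \mathcal{L}_{\omega_0}^\dagger \mathcal{L}_{\omega_0}\, \delta\omega \rangle} \;\;\text{“=”}\;\; {\det}'\!\big(\mathcal{L}_{\omega_0}^\dagger \mathcal{L}_{\omega_0}\big)^{-1/2}, $$

where the primed determinant omits gauge zero modes, and the equality is meaningful only on a sector where the quadratic form is positive, which is exactly the unresolved split-signature step.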

What you lose by squaring

1) You enlarge the solution space.

The first-order equation \(\Upsilon_\omega=0\) is a strong geometric constraint. The second-order equation \(\mathcal{L}_\omega^\dagger(\Upsilon_\omega)=0\) allows “harmonic” \(\Upsilon_\omega\) configurations: nonzero, but divergence-free in the appropriate covariant sense. Whether those extra branches are physically relevant, gauge artifacts, or pathological depends on boundary conditions and the sector (\(E\)-block vs everything). You do not get to ignore this.
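
A one-variable caricature of such an extra branch (nothing here is specific to the actual setup):

```python
# Residual with no zeros anywhere: upsilon(x) = x^2 + 1 >= 1.
def upsilon(x):
    return x**2 + 1.0

def d_upsilon(x):          # linearization, the 1-D stand-in for L_omega
    return 2.0 * x

def grad_I2(x):            # gradient of I2(x) = upsilon(x)^2,
    return 2.0 * d_upsilon(x) * upsilon(x)   # i.e. 2 * L^dagger(Upsilon)

# x = 0 is a stationary point of I2 even though upsilon(0) = 1 != 0:
# the linearization annihilates the residual (L^dagger(Upsilon) = 0)
# without the residual itself vanishing.
print(grad_I2(0.0))    # 0.0 -> solves the second-order (EL) equation
print(upsilon(0.0))    # 1.0 -> but is not a first-order solution
```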

2) You obscure the geometric meaning.

\(\Upsilon_\omega=0\) is a direct balance law: Shiab-contracted curvature equals \(\kappa_1\) times torsion. The second-order equation reads like “a differential operator applied to that balance law vanishes.” That is less interpretable. It is not worse; it is just further from the conceptual anchor.

3) You inherit the ambient signature problem in a sharper form.

On a \((7,7)\) manifold, inner products are not automatically positive. If you want \(I_2\) to behave like an energy, you must specify the pairing/involution that selects the physical sector (and, in our instantiation, you will do that through the gravitational block \(E\) and the pullback-visible modes). Until that is spelled out, any positivity language is propaganda. Here we keep it neutral: \(I_2\) is the natural square; its analytic character depends on the sector.
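
A two-component caricature of why this matters: with an indefinite pairing of signature \((1,1)\),

$$ \langle \Upsilon, \Upsilon \rangle = \Upsilon_1^2 - \Upsilon_2^2, $$

which vanishes on the whole null cone \(\Upsilon_1 = \pm\Upsilon_2\) and is negative when \(\Upsilon_2^2 > \Upsilon_1^2\). So \(I_2 = 0\) no longer certifies \(\Upsilon = 0\), and “minimizing” \(I_2\) is not even well-posed until the pairing is restricted to a definite sector.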

Assumptions vs consequences

Assumptions:

  • Same geometric and gauge setup as before (\(\mathrm{Spin}(7,7)\), \(\sigma\)-split, \(A_0\) fixed, \(\bullet_\varepsilon\) fixed via \(E/\Theta_E\), \(D\Theta_E=0\)).
  • Boundary conditions that kill integration-by-parts terms (compact \(Y\), or decay, or explicit boundary term choices).
  • A specified adjoint/inner product structure for defining \(\mathcal{L}_\omega^\dagger\) (this is where split signature matters).

Consequences:

  • First-order solutions \(\Upsilon_\omega=0\) are automatically solutions of the second-order EL equation.
  • Second-order solutions include (possibly many) additional branches with \(\Upsilon_\omega \neq 0\) but \(\mathcal{L}_\omega^\dagger(\Upsilon_\omega)=0\).
  • The squared action is the right object for perturbation theory and numerical “residual minimization,” but it does not replace the conceptual primacy of the first-order equation.

Why this matters

If the project is going to produce an EFT corner, a numerical fitting program, or any credible discussion of fluctuations, you will end up linearizing something and controlling error norms. \(I_2\) is that control functional. The first-order equation is the geometric statement; the square is the engineering interface. Keeping both, and being explicit about what each one buys you, is how we avoid overpromising.

Key takeaway

Squaring gives you a robust second-order theory whose solutions include all first-order solutions, but also potentially more.

Technical takeaway

\(I_2(\omega) = \int_Y \langle \Upsilon_\omega, \ast_Y \Upsilon_\omega\rangle\), with \(\Upsilon_\omega = 0 \Rightarrow \mathrm{EL}(I_2)\colon \mathcal{L}_\omega^\dagger(\Upsilon_\omega)=0\), but not conversely.


Comments
Replying to Vibe Captain
Vibe Captain…: wait, so the squaring function creates more zeros, aka false positives? then why is it useful at all?

Squaring the first-order equation gives you a non-negative functional, which is typically much easier to solve numerically. We use it to reduce the scope of possible solutions, then search only the neighborhoods of the second-order solutions for true solutions of the first-order equation, by plugging the candidates produced by the solver back into the first-order equations.
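
Schematically (toy code, hypothetical names):

```python
import numpy as np

# Toy filter (hypothetical names): keep only the candidates returned by
# the second-order solver whose first-order residual actually vanishes.
def first_order_solutions(upsilon, candidates, tol=1e-8):
    return [x for x in candidates if np.linalg.norm(upsilon(x)) < tol]

# usage: true_sols = first_order_solutions(upsilon, el_solver_output)
```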

Replying to Nusa
Vibe Captain…: wait, so the squaring function creates more zeros, aka false positives? then why is it useful at all?

Not quite. Squaring doesn’t create more zeros of the score. I2(ω)=0 happens only when Upsilon(ω)=0 (the perfect fit). What squaring adds are extra flat spots where the slope is zero but the score is still above zero. Those are “false finishes,” but not fake zeros.

Why it’s still useful:

  • It gives a simple “how wrong am I?” number that computers can push down.
  • The downhill direction from that number tells you how to fix the mistake.
  • Near a real solution, the landscape becomes a nice bowl, so smarter methods snap to the true zero fast.
  • It lets you do diagnostics: how big the error is, which way to nudge, and how “stiff” different directions are.

How to avoid the false finishes:

  • Remove “do-nothing moves” (symmetries) that don’t change the score.
  • Watch the residual itself (how big Upsilon is), not just whether the slope is zero.
  • Use smarter steps (like Newton-style) that aim directly at making Upsilon→0.

So: no extra zeros—just extra flat spots. The square is useful because it gives you a clear ruler and a downhill path; you just need those checks to make sure you reach the real zero. #ai-generated

Replying to Nusa
Vibe Captain…: Let me try to explain it like I’m 12: - First-order rule = “perfect fit.” The equation Upsilon(ω)=0 is a hard rule, li…

wait, so the squaring function creates more zeros, aka false positives? then why is it useful at all?

Replying to Nusa
Nusa…: Ah, sorry. The article describes a property of physics in variational form. Gradient descent is used a lot in physics. …

Let me try to explain it like I’m 12:

  • First-order rule = “perfect fit.” The equation Upsilon(ω)=0 is a hard rule, like saying every Lego peg must line up exactly. If it’s true, you’ve solved the real problem.

  • Squaring makes a score. I2(ω) is like “how wrong are you?” = sum of squared gaps. That’s handy for computers because you can try to push the score down.

  • Important twist: all perfect fits (Upsilon=0) sit at flat spots of the score, but there are other flat spots where the score isn’t zero. Those are saddles or special balance points where the “push” (gradient) is zero even though pieces don’t perfectly fit.

How this differs from everyday gradient descent training:

  • ML usually only has a loss to minimize. Any place with zero gradient might be a local min, a saddle, or a flat ridge—but we often accept local minima as “good enough.”

  • Here the real goal isn’t “find a place with zero gradient.” It’s “make the loss literally zero,” because only Upsilon=0 solves the true first-order physics rule. So stopping because the gradient vanished can be a false finish.

  • Symmetries (gauge) create whole flat ridges where moving doesn’t change the score—easy for gradient descent to stall or wander. And the inner products can be “not bowl-shaped,” so the landscape isn’t a nice valley everywhere. Both make saddles common.

What this framework buys (and how it contrasts with standard training):

  • It turns a hard rule (Upsilon=0) into a principled least-squares loss (I2). That’s great for numerics: you can run gradient descent, Newton–Krylov, continuation, etc.

  • But squaring enlarges the set of stationary points. So you must:

    1. Fix the symmetries (gauge) or project out the unphysical directions,
    2. Monitor the residual ∥Upsilon∥ itself, not just the gradient of I2,
    3. Prefer second-order or projected methods that step toward Upsilon→0, not just EL(I2)=0.

Short answer to the “local minima” question:

  • It’s not just about local minima. The square creates extra stationary points (often saddles) where the gradient is zero but Upsilon≠0. So this isn’t a new training algorithm; it’s a structured loss. You can train with gradient descent, but you need the extra structure (gauge-fixing/projection and a “residual must go to zero” stop rule) to ensure you solve the original first-order equation rather than parking on a saddle. #ai-generated
Replying to Vibe Captain
Vibe Captain…: thanks, but that doesn't answer my question. it sounds like you're simply describing local minima, which is a known pro…

Ah, sorry. The article describes a property of physics in variational form. Gradient descent is used a lot in physics. If a model can be expressed as an action, I suppose it could work for training. But the stationary points are not necessarily minima, they can be saddles or degenerate points. I haven’t thought about it in that specific direction. What I’m not sure of is whether you can use the specific gauge-invariant setup, or more specifically what class of models would fit it.

Replying to Nusa
Nusa…: This article is a part of a larger series. I'm still ironing out a few math syntax snags, and I have quite a few artic…

thanks, but that doesn’t answer my question. it sounds like you’re simply describing local minima, which is a known problem for gradient descent algorithms. does your mathematical system provide an alternative way to train models with gradient descent?

Replying to Vibe Captain
Vibe Captain…: are you just describing local minima?

This article is a part of a larger series. I’m still ironing out a few math syntax snags, and I have quite a few articles left to publish to complete the full series (I’ve gotten about halfway through the planned scope so far).

You can see the intended reading order here: https://decentnewsroom.com/mag/we-are-so-skewed-9a9224

If you want your AI agent to scrape it, point it at https://decentnewsroom.com/mag/we-are-so-skewed-9a9224/manifest.json


are you just describing local minima?


Big idea in kid words:

  • You have a rule you want to satisfy exactly. Think “balance the scale so both sides are perfectly level.”
  • Upsilon (that funny Greek letter) is an error meter: it tells you how far you are from perfectly following the rule at each spot.
  • Squaring and adding up that error over the whole space makes a single score, I2. Think “total oops score.” Smaller is better; zero means perfect.

Two important facts:

  1. If your error meter reads zero everywhere (you followed the rule perfectly), then you’re sitting at a flat spot of the score. Tiny nudges won’t change the score at first. So “perfect” always looks like a stationary point of I2.
  2. But the other way is not guaranteed: you can be at a flat spot of the score without being perfect. Like standing on a saddle point: the slope is zero, but you’re not at the very bottom. So some stationary points still have nonzero error.

What the “second-order equation” means:

  • The official “no-slope” condition for the score says: a certain special derivative of the error meter equals zero. In plain terms: you can’t immediately improve the score by making tiny allowed tweaks. That can happen either because the error is truly zero (great) or because you’re at a flat-but-not-perfect place (not great).

Why squaring is useful:

  • It gives one clear number to minimize. Computers love this. They can try to make the total error as small as possible.
  • Near a good solution, the shape of the score tells you how small wiggles behave (which ones grow, which ones shrink). That’s handy for studying vibrations and signals.

What you give up by squaring:

  • You allow extra “fake” solutions: places where the score stops changing but the error isn’t zero.
  • You hide the simple meaning of the original rule (which was “make the error exactly zero”).

Takeaway:

  • The first-order rule (error = 0) is the true target.
  • The squared score I2 is a helpful tool to measure “how wrong” you are and to guide you toward the target.
  • Every perfect solution is a stationary point of I2, but not every stationary point is perfect. So use I2 to search, then check the original rule to be sure you really nailed it. #ai-generated
