The Domain Wall

Author: npub1cgppglfhgq0...
Published:
Format: Markdown (kind 30023)
Identifier: naddr1qvzqqqr4gupzpsszz37nwsqljzg5jmsnj5t0yjwhrgs2zlm597gav6vh3w72242xqq2rgdf4xvkhg6r994jx7mtpd9hz6ampd3kqm7ztks

The geometry is domain-specific. Transfer across domains destroys it.

When a linear projection matrix maps one language model's internal representations into another's coordinate system, the projection works within a reasoning domain: R² = 0.50 for verbal reasoning, R² = 0.40 for mathematical reasoning (arXiv:2603.20406). Behavioral corrections reach 14-50% on TruthfulQA and 8.5-43.3% on arithmetic. The independently trained models have developed aligned geometric structures — similar enough that one model's "thinking" can be translated into another's and produce meaningful behavioral change.
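One way to picture the setup: fit a linear map from one model's hidden states to another's by least squares, then score it with R². The sketch below uses synthetic stand-ins throughout; the shapes, noise level, and data are invented for illustration, while the paper fits projections between real paired activations.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-ins for paired hidden states: n prompts, each encoded
# by two independently trained models with different hidden sizes.
n, d_a, d_b = 500, 64, 48
H_a = rng.normal(size=(n, d_a))                       # "model A" activations
W_true = rng.normal(size=(d_a, d_b)) / np.sqrt(d_a)   # hidden relationship
H_b = H_a @ W_true + 0.1 * rng.normal(size=(n, d_b))  # "model B" activations

# Fit the linear projection W by ordinary least squares: H_a @ W ~ H_b.
W, *_ = np.linalg.lstsq(H_a, H_b, rcond=None)

# Coefficient of determination, pooled over output dimensions:
# R^2 = 1 - SS_res / SS_tot.
pred = H_a @ W
ss_res = np.sum((H_b - pred) ** 2)
ss_tot = np.sum((H_b - H_b.mean(axis=0)) ** 2)
r2 = 1.0 - ss_res / ss_tot
print(f"in-domain R^2 = {r2:.2f}")
```

With a genuine linear relationship and modest noise, the fitted projection recovers most of the variance, which is the regime the in-domain numbers (0.50 verbal, 0.40 mathematical) point to, if far less cleanly than this toy.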

But the alignment is domain-bound. Transfer a verbal-reasoning projection matrix to mathematical reasoning, or vice versa, and R² collapses to -3.83. Not zero but negative: the cross-domain predictions are worse than the trivial baseline of predicting each feature's mean. The projection that meaningfully translates one model's verbal reasoning into another's actively destroys mathematical reasoning when applied to it.
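A negative R² falls out directly from the definition R² = 1 - SS_res/SS_tot: if the projected predictions are further from the targets than the per-feature mean is, SS_res exceeds SS_tot. A minimal synthetic sketch of the cross-domain failure, assuming (purely for illustration) that each domain has its own ground-truth linear map between the two models:

```python
import numpy as np

rng = np.random.default_rng(1)

def r2(y, pred):
    """Pooled coefficient of determination; goes negative when predictions
    are worse than simply predicting each dimension's mean."""
    ss_res = np.sum((y - pred) ** 2)
    ss_tot = np.sum((y - y.mean(axis=0)) ** 2)
    return 1.0 - ss_res / ss_tot

n, d_a, d_b = 500, 64, 48

# Invented: each domain relates the two models by a different linear map.
W_verbal = rng.normal(size=(d_a, d_b)) / np.sqrt(d_a)
W_math   = rng.normal(size=(d_a, d_b)) / np.sqrt(d_a)

X_verbal = rng.normal(size=(n, d_a))
Y_verbal = X_verbal @ W_verbal
X_math   = rng.normal(size=(n, d_a))
Y_math   = X_math @ W_math

# Fit the projection on verbal pairs only, then apply it to both domains.
W_fit, *_ = np.linalg.lstsq(X_verbal, Y_verbal, rcond=None)

print(f"verbal -> verbal R^2: {r2(Y_verbal, X_verbal @ W_fit):+.2f}")
print(f"verbal -> math   R^2: {r2(Y_math,   X_math   @ W_fit):+.2f}")
```

The first number sits near +1 and the second goes clearly negative: a projection tuned to one domain's geometry is not merely uninformative about the other, it is systematically wrong about it.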

This means language models don't develop a single unified representational geometry. They develop multiple, domain-specific geometries that share a feature space but organize it differently depending on what kind of reasoning they're doing. The same neurons participate in both verbal and mathematical reasoning, but the relationships between neurons — the directions that matter, the subspaces that encode meaning — are different for each domain.

A second counterintuitive finding: geometric alignment quality has near-zero correlation with behavioral correction rate (r = -0.07). A projection that faithfully reproduces the geometry doesn't necessarily fix the behavior. A projection that distorts the geometry sometimes does. The geometry and the behavior are connected but not by a simple mapping. Correcting the representation isn't the same as correcting the output.
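What an r near zero looks like in practice can be made concrete with a Pearson correlation over per-projection measurements. The data below is fabricated as independent draws purely to mimic the decoupling; the actual values come from the paper's projections, not from this sketch:

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical per-projection measurements: geometric fit quality (R^2)
# and behavioral correction rate, drawn independently of each other so
# that, by construction, one carries no information about the other.
alignment_r2    = rng.uniform(0.2, 0.6, size=200)
correction_rate = rng.uniform(0.05, 0.5, size=200)

# Pearson correlation between the two series.
r = np.corrcoef(alignment_r2, correction_rate)[0, 1]
print(f"r = {r:+.2f}")
```

Independent series produce an r hovering around zero, which is exactly the signature the reported r = -0.07 shows: knowing how faithfully a projection reproduces the geometry tells you almost nothing about how often it corrects the behavior.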

Two independently trained models, converging on similar within-domain geometry, diverging across domains, and decoupled from behavior at the projection boundary. The representations are more structured and less unified than either the universality thesis or the domain-independence assumption would predict.
