The geometry is domain-specific. Transfer across domains destroys it.
When a linear projection matrix maps one language model's internal representations into another's coordinate system, the projection works within a reasoning domain: R² = 0.50 for verbal reasoning, R² = 0.40 for mathematical reasoning (arXiv:2603.20406). Behavioral correction rates range from 14% to 50% on TruthfulQA and from 8.5% to 43.3% on arithmetic. The independently trained models have developed aligned geometric structures, similar enough that one model's "thinking" can be translated into another's and produce meaningful behavioral change.
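To make the setup concrete, here is a minimal sketch of fitting a linear projection between two representation spaces and scoring it with R². It runs on synthetic data and uses ordinary least squares. The function names (`fit_projection`, `r_squared`), the dimensions, and the noise level are illustrative assumptions, not the paper's procedure.

```python
import numpy as np

def fit_projection(source, target):
    """Least-squares matrix W such that source @ W approximates target."""
    W, *_ = np.linalg.lstsq(source, target, rcond=None)
    return W

def r_squared(target, predicted):
    """1 - SS_res / SS_tot, pooled over all target dimensions."""
    ss_res = np.sum((target - predicted) ** 2)
    ss_tot = np.sum((target - target.mean(axis=0)) ** 2)
    return 1.0 - ss_res / ss_tot

rng = np.random.default_rng(0)

# Synthetic stand-ins for hidden states (assumption): 500 examples,
# a 16-dim "source model" and a 12-dim "target model".
source = rng.normal(size=(500, 16))
true_map = rng.normal(size=(16, 12))
# Heavy noise so the linear map explains only part of the target variance.
target = source @ true_map + rng.normal(scale=4.0, size=(500, 12))

# Fit on the first 400 examples, score on the held-out 100.
W = fit_projection(source[:400], target[:400])
held_out_r2 = r_squared(target[400:], source[400:] @ W)
print(f"held-out R^2: {held_out_r2:.2f}")
```

Scoring on held-out examples matters here: an R² computed on the same data used to fit `W` would overstate how well the projection translates unseen representations.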
But the alignment is domain-bound. Transfer a verbal-reasoning projection matrix to mathematical reasoning, or vice versa, and the R² collapses to -3.83. Not zero: negative. A negative R² means the projected representations fit the target worse than a constant guess at the target's mean would. The projection that meaningfully translates one model's verbal reasoning into another's actively destroys mathematical reasoning when applied to it.
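A toy example shows how a projection can score a negative R² out of domain. It assumes the same source features relate to the target through a different linear map in each domain. The maps, dimensions, and the deliberately extreme choice of an opposite-signed map are all hypothetical, so the resulting number illustrates the mechanism and does not reproduce the reported -3.83.

```python
import numpy as np

def r_squared(target, predicted):
    """1 - SS_res / SS_tot, pooled over all target dimensions."""
    ss_res = np.sum((target - predicted) ** 2)
    ss_tot = np.sum((target - target.mean(axis=0)) ** 2)
    return 1.0 - ss_res / ss_tot

rng = np.random.default_rng(1)
dim = 8

# Hypothetical domain-specific maps over a shared feature space.
map_verbal = rng.normal(size=(dim, dim))
map_math = -map_verbal  # extreme case: same directions, opposite sign

x_verbal = rng.normal(size=(300, dim))
x_math = rng.normal(size=(300, dim))
y_verbal = x_verbal @ map_verbal
y_math = x_math @ map_math

# Fit the projection on the verbal domain only.
W_verbal, *_ = np.linalg.lstsq(x_verbal, y_verbal, rcond=None)

within_r2 = r_squared(y_verbal, x_verbal @ W_verbal)
cross_r2 = r_squared(y_math, x_math @ W_verbal)

# Reference point: always predicting the target mean scores exactly 0.
mean_prediction = np.broadcast_to(y_math.mean(axis=0), y_math.shape)
baseline_r2 = r_squared(y_math, mean_prediction)

print(f"within-domain R^2: {within_r2:.2f}")
print(f"cross-domain R^2:  {cross_r2:.2f}")
print(f"mean baseline R^2: {baseline_r2:.2f}")
```

Because the transferred projection points every prediction in the wrong direction, its squared error is about four times the target's variance, which puts R² near -3 in this construction. Any score below the mean baseline's 0 indicates that the projection is adding error, not merely failing to help.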
This means language models don't develop a single unified representational geometry. They develop multiple, domain-specific geometries that share a feature space but organize it differently depending on what kind of reasoning they're doing. The same neurons participate in both verbal and mathematical reasoning, but the relationships between neurons — the directions that matter, the subspaces that encode meaning — are different for each domain.
A second counterintuitive finding: geometric alignment quality has near-zero correlation with behavioral correction rate (r = -0.07). A projection that faithfully reproduces the geometry doesn't necessarily fix the behavior. A projection that distorts the geometry sometimes does. The geometry and the behavior are connected but not by a simple mapping. Correcting the representation isn't the same as correcting the output.
Two independently trained models converge on similar within-domain geometry and diverge across domains, and that geometry is decoupled from behavior at the projection boundary. The representations are more structured and less unified than either the universality thesis or the domain-independence assumption would predict.