"The Discrete Cause" - Alexandria Articles

Zhang, Wang, and Gu introduce discrete causal representation learning: discrete latent variables are connected to one another through a directed acyclic graph and to the observations through a sparse bipartite measurement graph. They prove that both the measurement graph and the causal graph are identifiable from the observed data distribution alone under mild conditions. The structural insight is that discreteness enables identification where continuity does not. Causal representation learning with continuous latent variables generally requires interventional data or strong structural assumptions to identify the latent graph. With discrete latent variables, the observed distribution constrains the latent structure more tightly, because a latent variable with finitely many states has fewer degrees of freedom than a continuous one. A finite set of latent states, linked to the observed variables by a sparse bipartite measurement graph, produces distinctive distributional signatures that a continuous latent model would blur. The identifiability result means that from observational data alone, without experiments, one can recover which latent variables exist, how they connect to observations, and how they cause each other. The estimation pipeline has three steps: penalized estimation, then latent configuration resampling, then score-based causal discovery. It is practical, and it is validated on educational assessment data where the latent variables correspond to skills and the observations to test items. The recovered causal graph shows which skills enable which other skills, inferred purely from response patterns. Discreteness is not a simplification of reality. It is the structure that makes reality identifiable.
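To make the model class concrete, here is a minimal simulation sketch of discrete latent skills joined by a DAG and measured through a sparse bipartite graph. Everything specific in it is an assumption made for illustration and is not taken from the paper: the three-skill chain, the conditional probabilities, the conjunctive response rule with hit and guess probabilities, and the names `sample_latents`, `sample_items`, and `Q`.

```python
import numpy as np

def sample_latents(n, rng):
    # Hypothetical causal DAG over three binary skills: A1 -> A2 -> A3.
    a1 = rng.random(n) < 0.6
    a2 = rng.random(n) < np.where(a1, 0.8, 0.2)
    a3 = rng.random(n) < np.where(a2, 0.7, 0.1)
    return np.stack([a1, a2, a3], axis=1).astype(int)

# Sparse bipartite measurement graph (illustrative):
# Q[j, k] = 1 means test item j depends on latent skill k.
Q = np.array([
    [1, 0, 0],
    [1, 0, 0],
    [0, 1, 0],
    [0, 1, 0],
    [0, 0, 1],
    [0, 0, 1],
])

def sample_items(latents, Q, rng, p_hit=0.9, p_guess=0.1):
    # Assumed response rule: an item is answered correctly with
    # probability p_hit when every skill it depends on is present,
    # and with probability p_guess otherwise.
    has_all = (latents @ Q.T) == Q.sum(axis=1)
    probs = np.where(has_all, p_hit, p_guess)
    return (rng.random(probs.shape) < probs).astype(int)

rng = np.random.default_rng(0)
A = sample_latents(5000, rng)   # latent skill profiles, shape (5000, 3)
X = sample_items(A, Q, rng)     # observed item responses, shape (5000, 6)

# Items measuring the same skill correlate more strongly than items
# measuring skills at opposite ends of the chain.
corr = np.corrcoef(X, rowvar=False)
same_skill, far_apart = corr[0, 1], corr[0, 5]
```

In this toy setup only `X` would be available to an analyst. The block pattern in `corr` hints at the kind of distributional signature the identifiability argument relies on, though recovering `Q` and the chain from `X` is the job of the estimation pipeline, which this sketch does not implement.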
