Identifiability of Homoscedastic Linear Structural Equation Models using Algebraic Matroids
Abstract: We consider structural equation models (SEMs), in which every variable is a function of a subset of the other variables and a stochastic error. Each such SEM is naturally associated with a directed graph describing the relationships between variables. When the errors are homoscedastic, recent work has proposed methods for inferring the graph from observational data under the assumption that the graph is acyclic (i.e., the SEM is recursive). In this work we study the setting of homoscedastic errors but allow the graph to be cyclic (i.e., the SEM to be non-recursive). Using an algebraic approach that compares matroids derived from the parameterizations of the models, we derive sufficient conditions for when two simple directed graphs generate different distributions generically. Based on these conditions, we exhibit subclasses of graphs that allow for directed cycles, yet are generically identifiable. We also conjecture a strengthening of our graphical criterion which can be used to distinguish many more non-complete graphs.
This page contains supplementary material for the computations performed in the article. All files with ending
.ipynb are Jupyter notebooks executed with a \(\verb|SageMath|\). Files endig with
.sobj are sage objects which should also be run in \(\verb|SageMath|\) and files endig with
.m2 are run in \(\verb|Macaulay2|\).
Identifiability of Graphs with Distinct Strongly Connected Components
In the paper, Conjecture 5.1 states that if \(G_1\) and \(G_2\) are simple, non-complete graphs with the same out-degree sequence then there exists a node \(i\) in \(G_1\) and a parentally closed set \(L\) such that \(|ch_1(i) \cap L| > |ch_2(i) \cap L|\). The following Jupyter notebooks contain our supporting evidence for this conjecture for graphs with 4, 5, and 6 nodes. For 4 and 5 node graphs we verify this conjecture for every pair of graphs. For 6 node graphs, if there are less than 100,000 pairs to consider, then we verify the conjecture for all pairs. Otherwise, we randomly sample 100,000 pairs of graphs for each distinct out-degree sequence and reject pairs which have the same strongly connected components. The folder \(\verb|Certificates.zip|\) contains the results for each out-degree sequence stored as a SageMath objects. There is a block of code in the file \(\verb|CyclicSEM6.ipynb|\) which will quickly read in and verify that in each case, we were able to find a parentally closed set which can be used to distinguish the graphs. To check this yourself, extract the certificates folder, and place the folder in the same path as \(\verb|CyclicSEM6.ipynb|\) then simply run the block of code which verifies all of the certificates.
Complete graphs with the same Jacobian matroid
Our other main conjecture (Conjecture 5.4) states that if \(G_1\) and \(G_2\) are complete digraphs which are identical except the direction of the last edge between node \(p-1\) and node \(p\) is reversed, then the associated Jacobian matroid is the same. The following code computes every such pair of complete digraphs and uses \(\verb|Macaulay2|\) to verify that their Jacobian matroids are the same. The code below runs quickly for 4 and 5 node graphs and takes approximately 30 minutes for 6 node graphs.