# Selfadhesivity in Gaussian conditional independence structures¶

ABSTRACT: Selfadhesivity is a property of entropic polymatroids which can be formulated as gluability conditions of the polymatroid to an identical copy of itself along arbitrary restrictions and such that the two pieces are independent given the common restriction.

We show that positive definite matrices satisfy this condition as well and examine consequences for Gaussian conditional independence structures. New axioms of Gaussian CI are obtained by applying selfadhesivity to the previously known axioms of structural semigraphoids and orientable gaussoids.

All data is archived in the MPG Digital Library Keeper at https://keeper.mpdl.mpg.de/d/fbfe463162e94a14ac28/.

## Installation of the tools¶

The computations of structural semigraphoids, orientable gaussoids and their selfadhesions described in the paper rely at their core on state of the art solvers for the boolean satisfiability problem (CaDiCaL and nbc_minisat_all) and on exact rational linear programming solvers (SoPlex). The solvers are connected to conditional independence structures through Perl modules. The source code for these modules is bundled with this MathRepo page. Download and unpack the following archive which contains the module packages:

These modules require `Modern::Perl 2018`

which translates to a Perl
version of 5.26 or later. To install them, you need a CPAN client like
cpanm (set it up to use `local::lib`

as described in the documentation).
All instructions below assume a Linux(-like) environment. Open a terminal
and execute

```
$ cpanm Keyword::Declare Devel::Gladiator
$ cpanm CInet-Alien-*
```

The SAT solvers are bundled with their respective `CInet::Alien`

module
and they are compiled and installed alongside the Perl modules. You need
to have prerequisite build tools (`make`

, C and C++ compiler, etc.)
available on your system. The binary are statically linked and some
operating systems require you to install special `-static`

versions of
the C and C++ standard libraries for this (on Fedora, for example, install
`glibc-static`

and `libstdc++-static`

as well as `zlib-static`

).

The license of SoPlex does not permit bundling its source code. You have
to install a version of SoPlex with arbitrary-length rational number support
(via the GMP library) and then continue installing `CInet::Bridge::SoPlex`

and the rest:

```
$ cpanm CInet-Bridge-SoPlex* CInet-ManySAT*
$ cpanm CInet-Base* CInet-Propositional* CInet-Polyhedral*
$ cpanm CInet-Gaussoids* CInet-Semimatroids* CInet-Adhesive*
```

If you do not use Perl much, then the whole installation procedure is likely to pull in a lot of transitive dependencies and is going to take a while. In case of errors, consult the log file pointed out by cpanm. It usually contains useful information as for which dependencies are missing or what the error exactly is.

## Enumeration and symmetric reduction of gaussoids¶

The initial computations of structural semigraphoid and orientable gaussoid candidates for selfadhesivity receive as input the set of gaussoids over a 5-element ground set reduced modulo isomorphy. This set can be computed as follows:

```
use CInet::Base;
use CInet::Gaussoids;
say for Gaussoids(5)->modulo(SymmetricGroup)->list;
```

This outputs a list of 508817 binary vectors of length 80. The format is the same as on https://gaussoids.de. The computation takes around 12 GiB of memory and 95 minutes on a single core at 2.20 GHz. The output is available for download:

This file and all other text files are quite large and are therefore compressed using the zstd algorithm. You have to install the zstd program and execute the following to unpack the file:

```
$ zstd -d gaussoids.txt.zst
```

## Structural semigraphoids and orientability¶

The Perl modules installed in the first step contain routines for testing whether a given CI structure is a structural semigraphoid or an orientable gaussoid:

```
use Modern::Perl 2018;
use CInet::Base;
use CInet::Semimatroids;
sub chomped {
local $_ = shift // $_;
chomp; $_
}
local $| = 1; # no output buffering
my @input = map CInet::Relation->new(5 => chomped), <<>>;
say for grep is_semimatroid($_), @input;
```

With these programs, the input gaussoids can be filtered for these properties:

```
$ perl get-structurals.pl <gaussoids.txt | tee structurals.txt
$ perl get-orientables.pl <gaussoids.txt | tee orientables.txt
```

The former takes about 144 minutes, the latter about 8 minutes.
This discrepancy is in part caused by the fact that the SAT solver CaDiCaL
can be used as a library and works incrementally: it only has to be filled
once with the axioms for oriented gaussoids which is then solved 508817
times under different assumptions. On the other hand, SoPlex is an external
program for which 508817 linear program files have to be written and each
is solved by a new SoPlex process. Almost 75% of the time in `get-structurals.pl`

is spent writing files and starting SoPlex processes. In this context it is
important to note that linear program files are created usually in a
directory under `/tmp`

. For performance reasons, this should be a
RAM-backed temporary filesystem instead of a real hard disk.

The number of structural semigraphoids and of orientable gaussoids is the number of lines in the output files, 336838 and 175215, respectively:

Their intersection of structural semigraphoids and orientable gaussoids can
be computed easily because the same isomorphy representatives `gaussoids.txt`

have been input into both programs, for example using the following Perl
snippet and giving both file names as arguments:

```
use Modern::Perl 2018;
sub chomped {
local $_ = shift // $_;
chomp; $_
}
my %h;
$h{$_}++ for map chomped, <<>>;
for (keys %h) {
say if $h{$_} == 2;
}
```

The result is available for download below and it consists of 175139 gaussoids.

## Selfadhesion computations¶

Selfadhesivity testing is a very demanding procedure, both in terms
of time and memory. This is because the number of CI statements (and
correspondingly the size of linear programs or boolean formulas)
grow exponentially in the ground set size \(n\), namely as
\(\binom{n}{2} 2^{n-2}\). For testing the selfadhesivity of a
CI structure on \(n=5\) with respect to a given property, that
property has to be tested for each \(L \subseteq [n]\) on a
ground set of size \(2n - |L|\). The programs computing selfadhesions
are all parallelized using the `Forks::Super`

Perl module, which has
to be installed along with some other dependencies:

```
$ cpanm Path::Tiny Forks::Super List::MoreUtils
```

The installation of `Forks::Super`

in particular takes a long time and
may timeout. In that case increase the `--configure-timeout`

,
`--build-timeout`

and `--test-timeout`

in the cpanm call.

The program deciding \(\mathfrak{o}^{\textsf{sa}}\), for example, looks like this:

```
use Modern::Perl 2018;
use Forks::Super;
use List::MoreUtils qw(natatime);
use Path::Tiny;
use CInet::Base;
use CInet::Gaussoids;
use CInet::Semimatroids;
use CInet::Adhesive;
sub chomped {
local $_ = shift // $_;
chomp; $_
}
sub worker {
my $id = shift;
srand($id);
my $outfile = path(shift);
my $fh = $outfile->openw_utf8;
for my $A (@_) {
say {$fh} $A if is_selfadhesive(CInet::Relation->new(5 => $A), \&is_orientable);
}
close $fh;
}
my @input = map chomped, <<>>;
# Warm caches (shared by all workers):
is_selfadhesive(CInet::Relation->new(5 => $input[0]), \&is_orientable);
my $batch_size = 1000;
my $batches = natatime $batch_size, @input;
my $total = @input / $batch_size;
my $done = 0;
$Forks::Super::MAX_PROC = 30;
$Forks::Super::ON_BUSY = 'block';
my $id = 0;
my @outfiles;
while (my @batch = $batches->()) {
my $outfile = Path::Tiny->tempfile;
push @outfiles, $outfile;
fork \&worker, args => [ $id++, "$outfile", @batch ];
$done++;
warn "Scheduled $done out of $total batches";
}
waitall;
# Print total result
for my $outfile (@outfiles) {
say for $outfile->lines_utf8({ chomp => 1 });
}
```

This program occupies `$Forks::Super::MAX_PROC = 30`

CPUs. This can be
increased or decreased according to your resources. On each CPU a dedicated
worker process runs which gets a batch of 1000 input structures. There is a
warmup phase which executes the `is_selfadhesive`

routine once before
starting the workers. The `CInet`

modules for deciding orientability and
structural semigraphoidality make extensive use of caches to not recompute
the definition of oriented gaussoids or structural semigraphoids for every
query. These caches are computed once and are then shared by all workers,
greatly contributing to performance but increasing memory usage.

For orientability checking and its large axiom systems in the background, the setup phase allocates about 15 GiB of caches. With 30 parallel workers (instead of 30), the memory usage peaks at shortly below 280 GiB. The processing takes 5677 processor minutes on Intel Xeon Gold processors at 3.50 GHz.

By contrast, the structural semigraphoid computation only caches a few templates of linear programs. 20 GiB of memory suffice for the entire set of workers, but again it is much slower than the incremental SAT solver. The processing takes 59446 processor minutes of which 8745 minutes are spent on starting processes and writing files, on Intel Xeon E7-8867 v3 processors at 2.50 GHz.

Note that since the selfadhesion operator is recessive, it suffices to input
only `orientables.txt`

(with 175215 elements) into the `get-orientables-sa.pl`

checker and only `structurals.txt`

(with 336838 elements) into
`get-structurals-sa.pl`

. This also skews the comparison between the two
selfadhesion computations.

The most demanding selfadhesion computation is the one for the meet property \(\mathfrak{sg}_* \wedge \mathfrak{o}\). It takes 123071 processor minutes of which 66% are spent in kernel space writing files or executing SoPlex, on Intel Xeon CPU E7- 8837 processors at 2.67 GHz. The number of results is 167989. The resulting files are available for download below:

## Finding new inference rules¶

Every CI structure which does not belong to \((\mathfrak{sg}_* \wedge \mathfrak{o})^{\textsf{sa}}(5)\) cannot be realizable by a Gaussian distribution. Therefore it encodes valid inference rules for Gaussians which are violated by this structure. To find these inference rules, one examines all realizable CI structures which contain this non-realizable structure. In particular, Horn-type inference rules (those without disjunctions on the conclusion side) are precisely given by the elements in the intersection of all realizable structures containing the given non-realizable structure.

Since we do not have access to the set of realizable CI structures, we approximate realizability by a Gaussian distribution via necessary properties. The Horn-type inference rules derivable form necessary properties may be weaker (because the intersection is taken over more structures) but they can be computed. Hence, we take the elements \(\mathcal{L}\) of \((\mathfrak{sg}_* \wedge \mathfrak{o})(5)\) one after another and examine the intersection of all the structures in \((\mathfrak{sg}_* \wedge \mathfrak{o})^{\textsf{sa}}(5)\) which contain \(\mathcal{L}\).

First compute the difference of the files `structural-orientables.txt`

and `structural-orientables-sa.txt`

by giving them in this order as
arguments to the following program:

```
use Modern::Perl 2018;
use Path::Tiny;
my ($f1, $f2) = @ARGV;
my %h;
$h{$_}++ for path($f1)->lines_utf8({ chomp => 1 });
$h{$_}-- for path($f2)->lines_utf8({ chomp => 1 });
for (keys %h) {
say if $h{$_} > 0;
}
```

These are the 7150 candidate structures which may give new inference rules which are not implied by the structural semigraphoid or orientability properties but are implied by their selfadhesion.

To intersect all selfadhesive CI structures containing one of these candidates, we first have to compute the entire list of CI structures which are selfadhesive with respect to structural semigraphoidality and orientability (above we only computed isomorphy representatives). This is accomplished by the following Perl code:

```
use Modern::Perl 2018;
use CInet::Base;
my %h;
for (<<>>) {
chomp;
$h{$_}++ for CInet::Relation->new(5 => $_)->orbit(SymmetricGroup)->list;
}
say for keys %h;
```

The output of sending `structural-orientables-sa.txt`

into this program
is large: its zstd-compressed version is 202 MiB big and it inflates to about
1.5 GiB uncompressed.

The file contains 19721952 CI structures. The search for new Horn-type axioms goes through these almost 20 million CI structures for each candidate and intersects all structures containing the candidate:

```
$ cat axiom-candidates.txt | while read G
> do echo " $G"; echo -n "=> ";
> grep -e $(sed 's/1/./g' <<<"$G") structural-orientables-sa-all.txt |
> perl intersect.pl; done
```

Here, `grep`

extracts the structures containing the candidate `$G`

and the Perl script `intersect.pl`

performs the intersection. Its source
code is below. It requires the `Math::BigInt::GMP`

module to be installed.
(The intersection of CI structures is implemented using (fast) bit operations
on arbitrary-length binary numbers, as each CI structure is encoded as a
string of bits.)

```
use Modern::Perl;
use bignum lib => 'GMP';
my $x = 0;
while (<<>>) {
chomp;
# Bitwise OR is intersection because per convention on
# gaussoids.de, a 0 bit means that a conditional independence
# statement is IN the set.
$x |= "0b$_";
}
say sprintf "%080s", $x->to_bin;
```

The above shell routine is a slow process which could benefit from better algorithm engineering, but it only has to run until one inference rule is found:

```
11111111011111111111111010101111111001111111111111110111110110111111111111111111
=> 11111111000000001011001000000000000000001111111100010111000000001111111111111111
11111111110110111011111111111110101111101111110101111111111111111111111111111111
=> 11111111000000000000000000000000000000000000000000000000111111111111111111111111
11111111111111111111001101111101111011111111111111110111011111111111111111111111
=> 11111111111111111011001000000000010011111111111100010111010101111111111111111111
11110111101111111111111001111011111111011111111111011111101111111111111111111111
=> 00000000000000000000000000000000111111001111111111010111001111111111111111111111
11111111101111111111111001111001111111011111101111011111011111111111111111111111
=> 11111111000000000000000000000000000000000000000000000000000101111111111111111111
11111111111111111111110111101111111111101010111111110111011111111111111111111111
=> 11111111111111111111110011101011101100100000000000100010010111111111111111111111
...
```

In each case the candidate is printed and then on the next line an arrow
(for implication) followed by the intersection from `structural-orientables-sa-all.txt`

.
The results are called the *closures* or *completions* of the inputs with respect
to the property under consideration (here selfadhesive structural orientability).
The zeros that appear after the arrow but not before are additional valid CI
statements which are implied by the candidate. To get the CI statements
printed, use the following snippet:

```
$ perl -MCInet::Base -E 'say join ", ", map FACE, CInet::Relation->new(5 => shift)->independences' 11111111011111111111111010101111111001111111111111110111110110111111111111111111
13|, 14|235, 15|2, 15|4, 23|5, 23|14, 25|13, 34|2, 34|15
$ perl -MCInet::Base -E 'say join ", ", map FACE, CInet::Relation->new(5 => shift)->independences' 11111111000000001011001000000000000000001111111100010111000000001111111111111111
13|, 13|2, 13|4, 13|5, 13|24, 13|25, 13|45, 13|245, 14|2, 14|23, 14|25, 14|235, 15|, 15|2, 15|3, 15|4, 15|23, 15|24, 15|34, 15|234, 23|, 23|1, 23|4, 23|5, 23|14, 23|15, 23|45, 23|145, 25|, 25|1, 25|3, 25|13, 34|, 34|1, 34|2, 34|5, 34|12, 34|15, 34|25, 34|125
```

This proves many Horn-type inference rules, for example

where \((ij|K)\) is shorthand for \([i \mathrel{\text{$\perp\mkern-10mu\perp$}} j \mid K]\).
To check how strong this new axiom is, compute `orientables-all.txt`

by symmetry expansion
as described above and compute the orientable closure of

```
$ echo 11111111001111111111111010101111111001111111111111110111110110111111111111111111 | while read G
> do echo " $G"; echo -n "=> ";
> grep -e $(sed 's/1/./g' <<<"$G") orientables-all.txt |
> perl intersect.pl; done
11111111001111111111111010101111111001111111111111110111110110111111111111111111
=> 11111111000000001011001000000000000000001111111100010111000000001111111111111111
```

The last structure is the closure of \(\mathcal{L}\) in the orientable gaussoids. It coincides with the closure of \(\mathcal{L} \setminus \{\, (13|2) \,\}\) in the selfadhesive structural orientable gaussoids. Hence, the inference rule singled out above is new (not implied by structural semigraphoids and orientability) but it is also sufficient (given orientability axioms) to imply all other new Horn-type inference rules which can be derived from this candidate structure.

## System information¶

Computations were performed on various of the MPI’s compute servers; timings, setup and CPU information is given throughout the text. On consumer hardware, the memory usage will be the first obstacle in carrying out these computations. Parallelization (using at least 30 cores) is necessary for the computations to finish in a matter of days (instead of months).

The source code for the `CInet`

Perl modules is provided for download
above. Versions of other software and the utilized solvers are as follows:
\(\verb|Perl v5.32|\), \(\verb|CaDiCaL v1.3.0|\),
\(\verb|nbc_minisat_all v1.0.2|\), \(\verb|SoPlex v4.0.0|\).

## Colophon¶

Project page created: 16/05/2022

Project contributors: Tobias Boege

Corresponding author of this page: Tobias Boege, tobias.boege@mis.mpg.de