BoolForge Tutorial #5: Example use cases of the random function generator
In this tutorial, we focus on some use cases of the random function generator, which can produce a plethora of different types of random Boolean functions. You will learn how to

- compute the prevalence of canalization, k-canalization, and nested canalization for functions of different degrees,
- determine the distribution of the canalizing strength as well as the normalized input redundancy for different degrees,
- investigate the correlation between absolute bias and canalization, and
- generate and analyze all dynamically different nested canalizing functions of a given degree.
To ensure familiarity with these concepts, we highly recommend working through the previous tutorial first.
[2]:
import boolforge
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
Prevalence of canalization
While most 2-input Boolean functions are canalizing, canalization very quickly becomes an elusive property as the degree increases. Using boolforge.random_function(), we can easily approximate the probability that an n-input Boolean function has canalizing depth \(k\).
[118]:
nsim = 1000
ns = np.arange(2,7)
canalizing_depths = np.arange(max(ns)+1)
count_canalizing_depths = np.zeros((len(ns),max(ns)+1))
for _ in range(nsim):
    for i,n in enumerate(ns):
        f = boolforge.random_function(n)
        count_canalizing_depths[i,f.get_canalizing_depth()] += 1
count_canalizing_depths /= nsim

fig,ax = plt.subplots()
for i,canalizing_depth in enumerate(canalizing_depths):
    ax.bar(ns,count_canalizing_depths[:,i],bottom=np.sum(count_canalizing_depths[:,:i],1),label=str(canalizing_depth))
ax.legend(frameon=False,loc='center',bbox_to_anchor=[0.5,1.1],ncol=8,title='canalizing depth')
ax.set_xticks(ns)
ax.set_xlabel('Number of essential variables')
ax.set_ylabel('Proportion of functions')
pd.DataFrame(count_canalizing_depths,index='n=' + ns.astype(str),columns='k=' + canalizing_depths.astype(str))
[118]:
| | k=0 | k=1 | k=2 | k=3 | k=4 | k=5 | k=6 |
|---|---|---|---|---|---|---|---|
| n=2 | 0.179 | 0.000 | 0.821 | 0.000 | 0.00 | 0.0 | 0.0 |
| n=3 | 0.626 | 0.105 | 0.000 | 0.269 | 0.00 | 0.0 | 0.0 |
| n=4 | 0.952 | 0.033 | 0.005 | 0.000 | 0.01 | 0.0 | 0.0 |
| n=5 | 1.000 | 0.000 | 0.000 | 0.000 | 0.00 | 0.0 | 0.0 |
| n=6 | 1.000 | 0.000 | 0.000 | 0.000 | 0.00 | 0.0 | 0.0 |
We see that hardly any Boolean function with \(n\geq 5\) inputs is canalizing, let alone nested canalizing. This makes the finding that most Boolean functions in published Boolean gene regulatory network models are nested canalizing very surprising (Kadelka et al., Science Advances, 2024).
To zoom in on the few functions that are canalizing for higher \(n\), we can simply require depth=1 and repeat the above analysis.
[119]:
count_canalizing_depths = np.zeros((len(ns),max(ns)+1))
for _ in range(nsim):
    for i,n in enumerate(ns):
        f = boolforge.random_function(n,depth=1)
        count_canalizing_depths[i,f.get_canalizing_depth()] += 1
count_canalizing_depths /= nsim

fig,ax = plt.subplots()
for i,canalizing_depth in enumerate(canalizing_depths):
    ax.bar(ns,count_canalizing_depths[:,i],bottom=np.sum(count_canalizing_depths[:,:i],1),label=str(canalizing_depth))
ax.legend(frameon=False,loc='center',bbox_to_anchor=[0.5,1.1],ncol=8,title='canalizing depth')
ax.set_xticks(ns)
ax.set_xlabel('Number of essential variables')
ax.set_ylabel('Proportion of functions')
pd.DataFrame(count_canalizing_depths,index='n=' + ns.astype(str),columns='k=' + canalizing_depths.astype(str))
[119]:
| | k=0 | k=1 | k=2 | k=3 | k=4 | k=5 | k=6 |
|---|---|---|---|---|---|---|---|
| n=2 | 0.0 | 0.000 | 1.000 | 0.000 | 0.00 | 0.000 | 0.0 |
| n=3 | 0.0 | 0.210 | 0.000 | 0.790 | 0.00 | 0.000 | 0.0 |
| n=4 | 0.0 | 0.596 | 0.094 | 0.000 | 0.31 | 0.000 | 0.0 |
| n=5 | 0.0 | 0.950 | 0.029 | 0.009 | 0.00 | 0.012 | 0.0 |
| n=6 | 0.0 | 1.000 | 0.000 | 0.000 | 0.00 | 0.000 | 0.0 |
This analysis reveals that, among canalizing Boolean functions of degree \(n\geq 5\), functions with few conditionally canalizing variables are much more abundant than functions with many. This is mathematically expected due to the recursive nature of the definition of k-canalization: each additional level of canalization imposes a further constraint on the subfunction that remains after fixing the canalizing variables, and is therefore increasingly unlikely to be satisfied by chance.
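Note that depth=1 only prescribes a minimal canalizing depth. To sample functions with a prescribed exact canalizing depth instead, we can set EXACT_DEPTH=True, an option we will use again in the next section. A minimal sketch, assuming EXACT_DEPTH behaves as in the stratified analysis below:

[ ]:
# Minimal sketch: sample a 5-input function whose exact canalizing depth is 2,
# i.e., with exactly two conditionally canalizing variables.
f = boolforge.random_function(5, depth=2, EXACT_DEPTH=True)
print(f.get_canalizing_depth())  # 2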
Measures of collective canalization for different degrees
Using a similar setup, we can investigate if and how the various measures of collective canalization, specifically canalizing strength (Kadelka et al., Advances in Applied Mathematics, 2023) and the normalized input redundancy (Gates et al., PNAS, 2021), change when the degree of the functions changes.
[120]:
nsim = 100
ns = np.arange(2,8)
canalizing_strengths = np.zeros((len(ns),nsim))
input_redundancies = np.zeros((len(ns),nsim))
for j in range(nsim):
    for i,n in enumerate(ns):
        f = boolforge.random_function(n)
        canalizing_strengths[i,j] = f.get_canalizing_strength()
        input_redundancies[i,j] = f.get_input_redundancy()

width = 0.4
violinplot_args = {'widths':width,'showmeans':True,'showextrema':False}
fig,ax = plt.subplots()
ax.violinplot(canalizing_strengths.T,positions=ns-width/2,**violinplot_args)
ax.scatter([], [], color='C0', label='canalizing strength')
ax.violinplot(input_redundancies.T,positions=ns+width/2,**violinplot_args)
ax.scatter([], [], color='C1', label='normalized input redundancy')
ax.legend(loc='center',bbox_to_anchor=[0.5,1.05],frameon=False,ncol=2)
ax.set_xlabel('Number of essential variables')
a=ax.set_ylabel('Value')
This plot reveals that, on average, both the canalizing strength and the normalized input redundancy decrease as the number of variables increases. However, the dependence of canalizing strength on degree is much stronger.
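To back the visual impression with numbers, we can print the mean of each measure per degree. This short check assumes the arrays from the previous cell are still in memory:

[ ]:
# Mean of each collective-canalization measure per degree, computed from the
# samples generated in the previous cell.
for i, n in enumerate(ns):
    print(f"n={n}: mean canalizing strength = {canalizing_strengths[i].mean():.3f}, "
          f"mean normalized input redundancy = {input_redundancies[i].mean():.3f}")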
If we stratify this analysis by canalizing depth (exact canalizing depth using EXACT_DEPTH=True or minimal canalizing depth using the default EXACT_DEPTH=False), we can confirm that functions with more conditionally canalizing variables tend to also have higher average collective canalization, irrespective of how it is measured. In other words, the various measures of canalization are all highly correlated.
[ ]:
nsim = 100
EXACT_DEPTH = False  # Default: False
ns = np.arange(2,7)
canalizing_strengths = np.zeros((len(ns),max(ns)+1,nsim))
input_redundancies = np.zeros((len(ns),max(ns)+1,nsim))
for k in range(nsim):
    for i,n in enumerate(ns):
        for depth in np.append(np.arange(n-1),n):  # skip n-1: exact canalizing depth n-1 is impossible
            f = boolforge.random_function(n,depth=depth,EXACT_DEPTH=EXACT_DEPTH)
            canalizing_strengths[i,depth,k] = f.get_canalizing_strength()
            input_redundancies[i,depth,k] = f.get_input_redundancy()

width = 0.28
violinplot_args = {'widths': width, 'showmeans': True, 'showextrema': False}
fig, ax = plt.subplots(2, 1, figsize=(6, 6), sharex=True)
base_gap = 1     # gap between groups
intra_gap = 0.3  # gap within group
max_depth = max(ns)
for ii, (data, label) in enumerate(zip(
        [canalizing_strengths, input_redundancies],
        ['canalizing strength', 'normalized input redundancy'])):
    positions = []
    values = []
    colors_used = []
    group_centers = []
    current_x = 0.0
    for i, n in enumerate(ns):
        valid_depths = np.append(np.arange(n-1), n)
        n_viols = len(valid_depths)
        # positions centered on each group's midpoint
        offsets = np.linspace(
            -(n_viols - 1) * intra_gap / 2,
            (n_viols - 1) * intra_gap / 2,
            n_viols
        )
        group_positions = current_x + offsets
        positions.extend(group_positions)
        group_centers.append(current_x)
        for depth in valid_depths:
            values.append(data[i, depth, :])
            colors_used.append('C'+str(depth))
        # advance x-position based on total group width
        group_width = (n_viols - 1) * intra_gap
        current_x += group_width / 2 + base_gap + width + intra_gap
    # plot violins one by one with colors
    for vpos, val, c in zip(positions, values, colors_used):
        vp = ax[ii].violinplot(val, positions=[vpos], **violinplot_args)
        for body in vp['bodies']:
            body.set_facecolor(c)
            body.set_alpha(0.85)
        vp['cmeans'].set_color('k')
    # axis labels
    ax[ii].set_ylabel(label)
    if ii == 1:
        ax[ii].set_xlabel('Number of essential variables (n)')
    ax[ii].set_xticks(group_centers)
    ax[ii].set_xticklabels(ns)
    ax[ii].set_ylim([-0.02,1.02])
# add legend for depth colors
depth_handles = [
    plt.Line2D([0], [0], color='C'+str(d), lw=5, label=f'{d}')
    for d in range(max_depth + 1)
]
a=fig.legend(handles=depth_handles, loc='upper center', ncol=7, frameon=False,
             title='exact canalizing depth' if EXACT_DEPTH else 'minimal canalizing depth')
Correlation between canalization and bias
Essentially all metrics used to assess the sensitivity of Boolean functions (canalization, absolute bias, average sensitivity) are correlated. For example, functions with a higher absolute bias are more likely to be canalizing.
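Recall that the absolute bias of a function that outputs 1 with probability \(p\) is \(|2p-1|\); this is the percentage shown in parentheses on the x-axis of the plot below. A quick worked example:

[ ]:
# The absolute bias of a function with output probability p is |2p - 1|:
# p = 0.5 is unbiased (0%), while p = 0 or p = 1 is maximally biased (100%).
for p in [0.0, 0.25, 0.5, 0.75, 1.0]:
    print(f"bias p = {p}: absolute bias = {abs(2*p - 1):.2f} ({round(200*abs(p - 0.5))}%)")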
[122]:
ns = np.arange(2,6)
nsim = 3000
n_steps = 21
bias_values = np.linspace(0,1,n_steps)
count_canalizing = np.zeros((len(ns),n_steps),dtype=int)
for i,n in enumerate(ns):
    for _ in range(nsim):
        for j,bias in enumerate(bias_values):
            f = boolforge.random_function(n,
                                          bias=bias,
                                          ALLOW_DEGENERATE_FUNCTIONS=True)
            if f.is_canalizing():
                count_canalizing[i,j] += 1

fig,ax = plt.subplots()
for i,n in enumerate(ns):
    ax.plot(bias_values,count_canalizing[i]/nsim,label=f'n={n}')
xticks = [0,0.25,0.5,0.75,1]
ax.set_xticks(xticks)
ax.set_xticklabels([f'{p} ({round(200*np.abs(p-0.5))}%)' for p in xticks])
ax.set_xlabel('bias (absolute bias)')
ax.set_ylabel('probability canalizing')
a=ax.legend(loc='center',frameon=False,bbox_to_anchor=[0.5,1.05],ncol=6)
Similarly, the probability that a function is degenerate (i.e., that it does not depend on all its variables) also increases as the absolute bias increases.
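As a sanity check of the extreme case, a function generated with bias 1 outputs 1 for every input, so it is constant and hence degenerate:

[ ]:
# Extreme-case check: bias = 1 forces the all-ones truth table, which is
# constant and therefore does not depend on any of its variables.
f = boolforge.random_function(3, bias=1, ALLOW_DEGENERATE_FUNCTIONS=True)
print(f.is_degenerate())  # True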
[123]:
ns = np.arange(2,6)
nsim = 3000
n_steps = 21
bias_values = np.linspace(0,1,n_steps)
count_degenerate = np.zeros((len(ns),n_steps),dtype=int)
for i,n in enumerate(ns):
    for _ in range(nsim):
        for j,bias in enumerate(bias_values):
            f = boolforge.random_function(n,
                                          bias=bias,
                                          ALLOW_DEGENERATE_FUNCTIONS=True)
            if f.is_degenerate():
                count_degenerate[i,j] += 1

fig,ax = plt.subplots()
for i,n in enumerate(ns):
    ax.plot(bias_values,count_degenerate[i]/nsim,label=f'n={n}')
xticks = [0,0.25,0.5,0.75,1]
ax.set_xticks(xticks)
ax.set_xticklabels([f'{p} ({round(200*np.abs(p-0.5))}%)' for p in xticks])
ax.set_xlabel('bias (absolute bias)')
ax.set_ylabel('probability degenerate')
a=ax.legend(loc='center',frameon=False,bbox_to_anchor=[0.5,1.05],ncol=6)
Analyzing functions with specific canalizing layer structure
The average sensitivity of the Boolean functions governing the updates in a Boolean network determines the stability of the network dynamics to perturbations. More generally, it determines the dynamical regime of the network (see later tutorials). The ability to generate canalizing functions with a specific canalizing layer structure enables us to investigate the link between layer structure and average sensitivity, as well as other properties such as canalizing strength or effective degree.
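For instance, using the layer_structure keyword (also used in the cell below), we can draw a single function with a prescribed layer structure and confirm that all of its variables are conditionally canalizing. A minimal sketch:

[ ]:
# Minimal sketch: one 5-input function with canalizing layer structure [2, 1, 2].
# Since the layer sizes sum to n = 5, the function is nested canalizing.
f = boolforge.random_function(5, layer_structure=[2, 1, 2])
print(f.get_canalizing_depth())                                 # 5
print(f.get_average_sensitivity(EXACT=True, NORMALIZED=False))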
For nested canalizing functions of a given degree \(n\), there exists a bijection between their absolute bias and their canalizing layer structure. The function boolforge.get_layer_structure_of_an_NCF_given_its_Hamming_weight(degree, hamming_weight) implements this. Iterating over all possible absolute biases (parametrized by the possible Hamming weights), we can thus generate all dynamically different types of n-input nested canalizing functions and investigate their average sensitivity, which we can compute exactly for relatively low degrees.
[124]:
n = 5
all_possible_hamming_weights = np.arange(1,2**(n-1),2)
all_possible_absolute_biases = all_possible_hamming_weights / 2**(n-1)
n_functions = 2**(n-2)
average_sensitivities = np.zeros(n_functions)
canalizing_strengths = np.zeros(n_functions)
effective_degrees = np.zeros(n_functions)
layer_structures = []
for i,w in enumerate(all_possible_hamming_weights):
    layer_structures.append(boolforge.get_layer_structure_of_an_NCF_given_its_Hamming_weight(n,w)[1])
    f = boolforge.random_function(n,layer_structure=layer_structures[i])
    average_sensitivities[i] = f.get_average_sensitivity(EXACT=True,NORMALIZED=False)
    canalizing_strengths[i] = f.get_canalizing_strength()
    effective_degrees[i] = f.get_effective_degree()
number_of_layers = list(map(len,layer_structures))
pd.DataFrame(np.c_[all_possible_hamming_weights,
                   all_possible_absolute_biases,
                   list(map(str,layer_structures)),
                   number_of_layers,
                   average_sensitivities,
                   np.round(canalizing_strengths,4),
                   np.round(effective_degrees,4)],
             columns=['Hamming weight',
                      'Absolute bias',
                      'Layer structure',
                      'Number of layers',
                      'Average sensitivity',
                      'Canalizing strength',
                      'Effective degree'])
[124]:
| | Hamming weight | Absolute bias | Layer structure | Number of layers | Average sensitivity | Canalizing strength | Effective degree |
|---|---|---|---|---|---|---|---|
| 0 | 1 | 0.0625 | [5] | 1 | 0.3125 | 1.0 | 1.125 |
| 1 | 3 | 0.1875 | [3, 2] | 2 | 0.6875 | 0.7705 | 1.3984 |
| 2 | 5 | 0.3125 | [2, 1, 2] | 3 | 0.9375 | 0.6369 | 1.5938 |
| 3 | 7 | 0.4375 | [2, 3] | 2 | 1.0625 | 0.5993 | 1.5833 |
| 4 | 9 | 0.5625 | [1, 1, 3] | 3 | 1.1875 | 0.5033 | 1.7266 |
| 5 | 11 | 0.6875 | [1, 1, 1, 2] | 4 | 1.3125 | 0.4657 | 1.8021 |
| 6 | 13 | 0.8125 | [1, 2, 2] | 3 | 1.3125 | 0.4657 | 1.7708 |
| 7 | 15 | 0.9375 | [1, 4] | 2 | 1.1875 | 0.5033 | 1.6094 |
We notice that nested canalizing functions with higher absolute bias tend to be more sensitive to input changes and less canalizing. However, the relationship between absolute bias and these other metrics is not monotonic. Further, we notice a perfect negative correlation between the average sensitivity of a nested canalizing function and its canalizing strength, and a near-perfect correlation between average sensitivity and effective degree.
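We can quantify these relationships directly from the arrays computed above (assuming they are still in memory):

[ ]:
# Pearson correlations between the metrics computed in the previous cell:
# average sensitivity is strongly negatively correlated with canalizing
# strength and strongly positively correlated with effective degree.
print(np.corrcoef(average_sensitivities, canalizing_strengths)[0, 1])
print(np.corrcoef(average_sensitivities, effective_degrees)[0, 1])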
To investigate the non-monotonic behavior further, we can vary the degree and create line plots that reveal a clear pattern, as shown in (Kadelka et al., Physica D, 2017).
[117]:
ns = np.arange(5,9)
fig,ax = plt.subplots()
for n in ns:
    all_possible_hamming_weights = np.arange(1,2**(n-1),2)
    all_possible_absolute_biases = all_possible_hamming_weights / 2**(n-1)
    n_functions = 2**(n-2)
    average_sensitivities = np.zeros(n_functions)
    layer_structures = []
    for i,w in enumerate(all_possible_hamming_weights):
        layer_structures.append(boolforge.get_layer_structure_of_an_NCF_given_its_Hamming_weight(n,w)[1])
        f = boolforge.random_function(n,layer_structure=layer_structures[i])
        average_sensitivities[i] = f.get_average_sensitivity(EXACT=True,NORMALIZED=False)
    number_of_layers = list(map(len,layer_structures))
    ax.plot(all_possible_absolute_biases,average_sensitivities,'x--',label=f'n={n}')
ax.legend(loc='best',frameon=False)
ax.set_xlabel('Absolute bias of the nested canalizing function')
a=ax.set_ylabel('Average sensitivity')