Path-finding

class connectome_interpreter.path_finding.XORCircuit(input1: list, input2: list, exciter1: str, exciter2: str, inhibitor: str, output: list)[source]

Bases: object

exciter1: str

exciter2: str

inhibitor: str

input1: list

input2: list

output: list

connectome_interpreter.path_finding.compare_layered_paths(paths: List[DataFrame], priority_indices=None, neuron_to_sign: dict | None = None, sign_color_map: dict = {-1: 'blue', 1: 'red'}, el_colours: List[str] = ['rosybrown', 'burlywood'], legend_labels: List[str] = ['Path 1', 'Path 2'], weight_decimals: int = 2, figsize: tuple = (10, 8), label_pos: List[float] = [0.7, 0.7])[source]

Compare two layered paths by overlaying them and annotating the weights. The paths should be in the format of the output from find_path_iteratively(). The width of the edges is based on the weight in the first path, when the connection is present in both paths.

Parameters:

paths (List[pd.DataFrame]) – A list of two DataFrames containing the path data, including columns ‘layer’, ‘pre’, ‘post’, and ‘weight’.
priority_indices (list, set, pd.Series, numpy.ndarray, optional) – A list of neuron indices that should be plotted on top of each layer. Defaults to None.
neuron_to_sign (dict, optional) – A dictionary that maps neuron indices to their signs. Defaults to None.
sign_color_map (dict, optional) – A dictionary that maps neuron signs to their respective colors. Defaults to {1: ‘red’, -1: ‘blue’}.
el_colours (List[str], optional) – A list of two colors for the edge labels of the two paths. Defaults to [‘rosybrown’, ‘burlywood’].
legend_labels (List[str], optional) – A list of two labels for the legend. Defaults to [‘Path 1’, ‘Path 2’].
weight_decimals (int, optional) – The number of decimal places to round the edge weights to. Defaults to 2.
figsize (tuple, optional) – The size of the figure. Defaults to (10, 8).

Returns:

The function plots the graph.

Return type:

None

connectome_interpreter.path_finding.connected_components(paths: DataFrame, threshold: float = 0, quiet: bool = False) → list[source]

Find connected components in a directed graph represented by a DataFrame of paths. The DataFrame should contain columns ‘pre’, ‘post’, ‘layer’, and ‘weight’. The function filters the paths based on a weight threshold and then constructs a directed graph using NetworkX. It identifies weakly connected components in the graph and returns a list of DataFrames of paths, each representing a connected component.

Parameters:

paths (pd.DataFrame) – The DataFrame containing the path data, including the layer number, pre-synaptic index, post-synaptic index, and weight.
threshold (float, optional) – The threshold for the weight of the direct connection between pre and post. Defaults to 0.

Returns:

A list of DataFrames, each representing a connected component in the directed graph.

Return type:

list
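
The thresholding and component-splitting steps described above can be sketched without NetworkX. This is a minimal illustration, not the library's implementation: the function name weakly_connected_components is hypothetical, plain (layer, pre, post, weight) tuples stand in for the DataFrame, nodes are assumed to be identified by name regardless of layer, and the strict > comparison against threshold is an assumption.

```python
from collections import defaultdict


def weakly_connected_components(edges, threshold=0.0):
    """Split edges into weakly connected components (hypothetical helper).

    edges: list of (layer, pre, post, weight) tuples, a stand-in for the DataFrame.
    Edges with weight <= threshold are dropped first (assumed comparison).
    """
    kept = [e for e in edges if e[3] > threshold]
    parent = {}

    def find(x):
        # Union-find lookup with path halving.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for _, pre, post, _ in kept:
        parent[find(pre)] = find(post)  # union, ignoring edge direction

    components = defaultdict(list)
    for edge in kept:
        components[find(edge[1])].append(edge)
    return list(components.values())


edges = [
    (1, "A", "B", 0.5),
    (2, "B", "C", 0.3),
    (1, "X", "Y", 0.4),
    (1, "A", "Y", 0.01),  # weak edge that would otherwise merge the two groups
]
components = weakly_connected_components(edges, threshold=0.05)
merged = weakly_connected_components(edges, threshold=0.0)
```

With threshold=0.05 the weak A to Y edge is removed and the edges fall into two components; with threshold=0 everything is joined into one.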

connectome_interpreter.path_finding.count_paths(edgelist: DataFrame, start_layer: int = 1, end_layer: int | None = None, loop_mode: str = 'allow') → int | Tuple[int, int][source]

Counts all paths without materializing them in memory. Uses memoization to avoid redundant computation. This function was written mostly by Claude Sonnet 4.5 under instructions.

Parameters:

edgelist (pd.DataFrame) – Must contain “layer”, “pre”, “post” columns.
start_layer (int) – Starting layer.
end_layer (int) – Ending layer. If None, uses max layer.
loop_mode (str, optional) –
How to handle loops in paths. Options:
- ”allow” (default): Count all paths including those with loops.
- ”exclude”: Count only loop-free paths.
- ”both”: Return tuple (count_with_loops, count_without_loops) computed efficiently in a single DFS pass.
Note: a node appearing at both the start and end of a path is allowed (e.g., A-B-C-D-A is not considered a loop), but a node appearing again in the middle is a loop (e.g., A-B-C-A-A has a loop).

Returns:

If loop_mode is “allow” or “exclude”, returns an int with the total number of valid paths. If loop_mode is “both”, returns a tuple (count_with_loops, count_without_loops).

Return type:

int or tuple
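
Counting without materializing paths can be illustrated with a small memoized recursion. This is a sketch of the general technique for the "allow" mode only, not the library's code: the name count_layered_paths is hypothetical and (layer, pre, post) tuples stand in for the edgelist DataFrame. Loop-free counting depends on which nodes were already visited, so it cannot be memoized on (layer, node) alone and is left out here.

```python
from collections import defaultdict
from functools import lru_cache


def count_layered_paths(edges, start_layer=1, end_layer=None):
    """Count layer-by-layer paths without listing them (loops allowed).

    edges: list of (layer, pre, post) tuples, a stand-in for the edgelist DataFrame.
    """
    if end_layer is None:
        end_layer = max(layer for layer, _, _ in edges)

    # successors[(layer, pre)] -> post nodes reachable through that layer's edges
    successors = defaultdict(list)
    for layer, pre, post in edges:
        successors[(layer, pre)].append(post)

    @lru_cache(maxsize=None)
    def paths_from(layer, node):
        # Number of ways to finish a path when standing at `node`
        # and about to take an edge in `layer`.
        if layer > end_layer:
            return 1
        return sum(
            paths_from(layer + 1, nxt) for nxt in successors.get((layer, node), ())
        )

    start_nodes = {pre for layer, pre, _ in edges if layer == start_layer}
    return sum(paths_from(start_layer, node) for node in start_nodes)


edges = [
    (1, "A", "B"),
    (1, "A", "C"),
    (2, "B", "D"),
    (2, "C", "D"),
    (2, "B", "A"),
]
total = count_layered_paths(edges)
first_layer_only = count_layered_paths(edges, end_layer=1)
```

Each (layer, node) pair is evaluated once, so the cost grows with the number of edges rather than with the number of paths.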

connectome_interpreter.path_finding.create_layered_positions(df: DataFrame, priority_indices=None, sort_dict: dict | None = None) → dict[source]

Creates a dictionary of positions for each neuron in the paths, so that the paths can be visualized in a layered manner. It assumes that df contains the columns ‘layer’, ‘pre_layer’, ‘post_layer’ (or ‘layer’, ‘pre’, ‘post’). If a neuron exists in multiple layers, it is plotted multiple times.

Parameters:

df (pd.DataFrame) – The DataFrame containing the path data, including the layer number, pre-synaptic index, and post-synaptic index.
priority_indices (list, set, pd.Series, numpy.ndarray, optional) – A list of neuron indices that should be plotted on top of each layer. Defaults to None.
sort_dict (dict, optional) – A dictionary of neuron indices as keys and their sorting order as values (bigger value is higher in the plot). Defaults to None.

Returns:

A dictionary of positions for each neuron in the paths, with the keys as the neuron indices and the values as the (x, y) coordinates.

Return type:

dict

Find paths within a specified number of steps in a directed graph, starting from input indices and ending at output indices. The unique edges are returned. Filtering by threshold happens after grouping if pre_group and post_group are provided.

avg_within_connected calculates the weight based on the connected neurons, instead of all neurons in any group involved. This might be useful when obtaining paths from one single neuron in a cell type, while there are many other neurons in the same type. This might be especially useful for optic lobe connectivity analysis that’s spatially local.

Two neuron groups might be connected to different extents in different layers (when a pair of cell types is connected through different individual neurons in different layers). In that case the highest weight is returned by default, but see the all_connections_between_groups argument.

Parameters:

inprop (spmatrix) – The connectivity matrix, with presynaptic in the rows.
inidx (arrayable) – The input neuron indices to start the paths from.
outidx (arrayable) – The output neuron indices to end the paths at.
n (int) – The maximum number of hops. n=1 for direct connections.
threshold (float, optional) – The threshold for the weight of the direct connection between pre and post. If pre_group or post_group are provided, filtering happens after grouping. Defaults to 0.
pre_group (dict, optional) – A dictionary mapping pre neuron indices to their respective groups. Defaults to None.
post_group (dict, optional) – A dictionary mapping post neuron indices to their respective groups. Defaults to None.
return_raw_el (bool, optional) – If True, returns the raw edges before grouping. Defaults to False.
combining_method (str, optional) – Method to combine inputs (outprop=False) or outputs (outprop=True). Can be ‘sum’, ‘mean’, or ‘median’. Defaults to ‘mean’.
avg_within_connected (bool, optional) – If True, the weight is calculated within the connected neurons of the same group. If False, the weight is calculated across all neurons of the same group. Defaults to False.
all_connections_between_groups (bool, optional) – If True, use all connections between the groups that inidx and outidx belong to, even if inidx doesn’t cover all neurons in that group. For example, if inidx is one L1 neuron, pre_group maps indices to cell type, and outidx is one Tm3 neuron, then the function will return L1->Tm3 connections for all L1 and Tm3 neurons. Defaults to False.
quiet (bool, optional) – If True, suppresses output messages. Defaults to False.

Returns:

A DataFrame containing the edges of the paths found, including columns ‘pre’, ‘post’, and ‘weight’. If return_raw_el is True, returns a tuple of two DataFrames: the first is the grouped edges, and the second is the raw edges before grouping. If no paths are found, returns None.

Return type:

pd.DataFrame or tuple

connectome_interpreter.path_finding.enumerate_paths(edgelist: DataFrame, start_layer: int = 1, end_layer: int | None = None, return_generator: bool = False, loop_mode: str = 'allow') → List[List[Tuple[str | int, str | int, float]]] | Generator[List[Tuple[str | int, str | int, float]], None, None][source]

Finds all paths that begin with an edge in start_layer and end with an edge in end_layer, assuming valid paths proceed layer-by-layer without skipping. This function was written by Claude.

Parameters:

edgelist (pd.DataFrame) – The edgelist of the entire graph. Must contain columns: “layer”, “pre”, “post”, and “weight”. Each row is a directed, weighted edge from “pre” to “post” at a given layer.
start_layer (int) – The layer from which all paths must begin. Must be <= end_layer.
end_layer (int) – The layer at which all paths must terminate. If None, defaults to the maximum layer in the edgelist.
return_generator (bool, optional) – If True, returns a generator instead of a list. Useful for large graphs to avoid memory issues. Defaults to False.
loop_mode (str, optional) –
How to handle loops in paths. Options:
- ”allow” (default): Return all paths, including those that revisit a vertex.
- ”exclude”: Return only loop-free (simple) paths. A node appearing at both the start and end of a path is allowed (e.g. A-B-C-A is kept, capturing the s-t cycle use case), but a node reappearing in the middle is a loop and the path is dropped (e.g. A-B-A-C). This matches the convention in count_paths().

Returns:

If return_generator is False, returns a list of valid paths. If True, returns a generator. Each path is a list of (pre, post, weight) tuples, ordered from start to end.

Return type:

Union[List[List[Tuple]], Generator]
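
The layer-by-layer enumeration and the loop convention can be sketched as a generator. This is an illustration under stated assumptions rather than the library's implementation: iter_layered_paths and its exclude_loops flag are hypothetical names, (layer, pre, post, weight) tuples stand in for the DataFrame, and a repeated node is tolerated only when the final edge returns to the start node, which is one reading of the "exclude" rule.

```python
from collections import defaultdict


def iter_layered_paths(edges, start_layer=1, end_layer=None, exclude_loops=False):
    """Yield paths as lists of (pre, post, weight) tuples, one edge per layer.

    edges: list of (layer, pre, post, weight) tuples (stand-in for the DataFrame).
    """
    if end_layer is None:
        end_layer = max(e[0] for e in edges)

    by_layer_pre = defaultdict(list)
    for layer, pre, post, weight in edges:
        by_layer_pre[(layer, pre)].append((pre, post, weight))

    def extend(layer, node, path, visited, start):
        if layer > end_layer:
            yield list(path)
            return
        for edge in by_layer_pre.get((layer, node), ()):
            post = edge[1]
            if exclude_loops and post in visited:
                # A repeat is tolerated only when the final edge returns to the start.
                if not (layer == end_layer and post == start):
                    continue
            path.append(edge)
            yield from extend(layer + 1, post, path, visited | {post}, start)
            path.pop()

    # dict.fromkeys keeps first-seen order and works for mixed str/int node labels.
    start_nodes = dict.fromkeys(
        pre for layer, pre, _, _ in edges if layer == start_layer
    )
    for start in start_nodes:
        yield from extend(start_layer, start, [], {start}, start)


edges = [
    (1, "A", "B", 0.5),
    (2, "B", "A", 0.2),
    (2, "B", "C", 0.3),
    (3, "A", "C", 0.4),
    (3, "C", "A", 0.6),
]
all_paths = list(iter_layered_paths(edges))
simple_paths = list(iter_layered_paths(edges, exclude_loops=True))
```

In this toy graph A-B-A-C is dropped when loops are excluded, because A reappears mid-path, while A-B-C-A is kept.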

Filters paths so that intermediate neurons are those specified in necessary_intermediate and the cumulative effective weight of the filtered paths exceeds the fraction thre_cumsum of the total effective weight. The minimum edge weight across the selected paths is either the minimum along those filtered paths or the threshold thre_step_min, whichever is larger. As a rough guide, set thre_cumsum larger when all_paths contains fewer paths, and smaller when it contains more.

Parameters:

all_paths (pd.DataFrame | list[pd.DataFrame]) – The DataFrame or list of DataFrames containing the path data, where each DataFrame is like the output from find_paths_of_length(), i.e., contains paths of a specific length.
thre_cumsum (float) – The cumulative effective weight threshold to reach for filtered paths. Should be a number between 0 and 1. Defaults to 0.5.
thre_step_min (float, optional) – The minimum threshold for the weight of the direct connection between pre and post. Defaults to 0.0.
necessary_intermediate (dict, optional) – A dictionary of necessary intermediate neurons, where the keys are the layer numbers (starting neurons: 1; directly downstream: 2) and the values are the neuron indices (can be int, float, list, set, numpy.ndarray, or pandas.Series). Defaults to None.

Returns:

paths: list of pd.DataFrame: Filtered paths.
w_filter: float: Total effective weight of the filtered paths.
w_all: float: Total effective weight of all paths before filtering.
thre_step: float: Minimum edge weight threshold used to filter paths, bounded below by thre_step_min.

Return type:

tuple

Filters the paths based on the weight threshold and the necessary intermediate neurons. The weight threshold refers to the direct connectivity between connected neurons in the path. It is recommended not to put too many neurons in necessary_intermediate, as this may be too stringent and remove all paths.

Parameters:

df (pd.DataFrame) – The DataFrame containing the path data, including the layer number, pre-synaptic index, post-synaptic index, and weight. If df is empty, and quiet is False, raises a ValueError. If quiet is True, returns None.
threshold (float, optional) – The threshold for the weight of the direct connection between pre and post. Defaults to 0.
necessary_intermediate (dict, optional) – A dictionary of necessary intermediate neurons, where the keys are the layer numbers (starting neurons: 1; directly downstream: 2) and the values are the neuron indices (can be int, float, list, set, numpy.ndarray, or pandas.Series). Defaults to None.

Returns:

The filtered DataFrame containing the path data, including the layer number, pre-synaptic index, post-synaptic index, and weight.

Return type:

pd.DataFrame
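
The two filters described above (a weight threshold and a per-layer set of required neurons) can be sketched as follows. This is an illustration only: filter_layered_edges is a hypothetical name, (layer, pre, post, weight) tuples stand in for the DataFrame, the strict > comparison is an assumption, and the values of necessary_intermediate are assumed to be iterables. The layer keys follow the convention stated above, so neuron layer k holds the pre side of edge layer k and the post side of edge layer k - 1.

```python
def filter_layered_edges(edges, threshold=0.0, necessary_intermediate=None):
    """Keep edges above threshold that pass through the required neurons.

    edges: list of (layer, pre, post, weight) tuples (stand-in for the DataFrame).
    necessary_intermediate: {neuron_layer: iterable of allowed neurons}.
    """
    required = {
        layer: set(nodes) for layer, nodes in (necessary_intermediate or {}).items()
    }
    kept = []
    for layer, pre, post, weight in edges:
        if weight <= threshold:  # assumed: only weights strictly above threshold stay
            continue
        # Neuron layer k is the source of edge layer k and the target of edge layer k - 1.
        pre_ok = layer not in required or pre in required[layer]
        post_ok = layer + 1 not in required or post in required[layer + 1]
        if pre_ok and post_ok:
            kept.append((layer, pre, post, weight))
    return kept


edges = [
    (1, "A", "B", 0.5),
    (1, "A", "C", 0.4),
    (2, "B", "D", 0.3),
    (2, "C", "D", 0.02),
]
by_weight = filter_layered_edges(edges, threshold=0.05)
via_b = filter_layered_edges(edges, necessary_intermediate={2: ["B"]})
```

Note that thresholding alone leaves the A to C edge in place even though C no longer reaches D; this sketch does not prune such dangling edges.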

Iteratively finds the path from the specified output (outidx) back to the input (inidx) across multiple layers, using the find_path_once function to traverse each layer.

Parameters:

inprop_csc (scipy.sparse matrix) – The direct connectivity matrix in Compressed Sparse Column format.
steps_cpu (list) – A list of compressed connectivity matrices: one matrix for each compressed path length.
inidx (int, float, list, set, numpy.ndarray, or pandas.Series) – The input neuron indices.
outidx (int, float, list, set, numpy.ndarray, or pandas.Series) – The output neuron indices to start the reverse path finding.
target_layer_number (int) – The number of layers to traverse backwards from the outidx. If target_layer_number = 1, we are looking at the direct synaptic connectivity.
top_n (int, optional) – The number of top connections to consider at each layer based on direct connectivity from inprop_csc. If top_n = -1, all connections are considered.
threshold (float, optional) – The threshold for the average of the direct connectivity from inidx to outidx.
quiet (bool, optional) – If True, suppresses print statements. Defaults to False.

Returns:

A DataFrame containing the path data, including the pre-synaptic and post-synaptic neuron indices, the layer (direct connections from inidx: layer = 1), and the weight (input proportion of the postsynaptic neuron) of the direct connection between pre and post.

Return type:

pd.DataFrame

Finds the path once between input and output, of distance target_layer_number, returning indices of neurons in the previous layer that connect the input with the output. This works by taking the top_n direct upstream partners of the outidx neurons and intersecting them with the neurons ‘effectively’ connected (through steps_cpu) to the inidx neurons.

Parameters:

inprop (scipy.sparse.csc_matrix) – The connectivity matrix in Compressed Sparse Column format.
steps_cpu (list) – A list of compressed connectivity matrices: one matrix for each compressed path length.
inidx (int, float, list, set, numpy.ndarray, or pandas.Series) – The input neuron index/indices.
outidx (int, float, list, set, numpy.ndarray, or pandas.Series) – The output neuron index/indices.
target_layer_number (int) – The target layer number to examine. Must be >= 1. When target_layer_number = 1, we are looking at the direct synaptic connectivity.
top_n (int, optional) – The number of top connections to consider based on direct connectivity from inprop_csc. If top_n = -1, all connections are considered.
threshold (float, optional) – The threshold of the direct connectivity from inidx to an average outidx.

Returns:

An array of neuron indices in the previous layer that have significant connectivity, connecting between the inidx and outidx.

Return type:

np.ndarray

Finds the path of length target_layer_number between inidx and outidx, returning the edgelist in a DataFrame, including the pre and post indices, the layer (direct connections from inidx: layer = 1), and the weight of the direct connection between pre and post.

Parameters:

edgelist (Union[spmatrix, pd.DataFrame]) – The edgelist of the entire graph. If a DataFrame, it must contain columns “pre”, “post”, and “weight”. If a sparse matrix, the pre needs to be in the rows.
inidx (int, float, list, set, numpy.ndarray, or pandas.Series) – The source indices.
outidx (int, float, list, set, numpy.ndarray, or pandas.Series) – The target indices.
target_layer_number (int) – The target layer number to examine. Must be >= 1. When target_layer_number = 1, we are looking at the direct synaptic connectivity.

Returns:

A DataFrame containing the path data, including the pre-synaptic and post-synaptic neuron indices, the layer (direct connections from inidx: layer = 1), and the weight (input proportion of the postsynaptic neuron) of the direct connection between pre and post. If no path is found, returns None.

Return type:

pd.DataFrame

connectome_interpreter.path_finding.find_shortest_paths(paths: DataFrame, start_nodes: list[str], end_nodes: list[str]) → list[list[str]][source]

Find the shortest paths between groups in start_nodes and end_nodes in a paths dataframe (paths is the output of find_path_iteratively).

Parameters:

paths (pd.DataFrame) – DataFrame containing the path data, including columns ‘weight’, ‘pre’, and ‘post’.
start_nodes (list) – List of ‘pre’ groups.
end_nodes (list) – List of ‘post’ groups.

Returns:

A list of shortest paths, where each path is a list of groups that connect the start and end nodes (ordered from start to end).

Return type:

list
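
A breadth-first search is one common way to obtain such paths. The sketch below is not the library's implementation: shortest_paths is a hypothetical name, edges are reduced to (pre, post) pairs, "shortest" is assumed to mean fewest hops (weights and layers are ignored), and every tied shortest path is returned separately for each start and end pair.

```python
from collections import defaultdict, deque


def shortest_paths(edges, start_nodes, end_nodes):
    """Return every minimum-hop path from each start node to each end node.

    edges: list of (pre, post) pairs; layer and weight are ignored in this sketch.
    """
    graph = defaultdict(set)
    for pre, post in edges:
        graph[pre].add(post)

    found = []
    for start in start_nodes:
        # Breadth-first search that records all shortest-path predecessors.
        dist = {start: 0}
        preds = defaultdict(list)
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for nxt in sorted(graph.get(node, ())):
                if nxt not in dist:
                    dist[nxt] = dist[node] + 1
                    queue.append(nxt)
                if dist[nxt] == dist[node] + 1:
                    preds[nxt].append(node)

        def unwind(node):
            # Rebuild paths by walking predecessors back to the start node.
            if node == start:
                return [[start]]
            return [p + [node] for prev in preds[node] for p in unwind(prev)]

        for end in end_nodes:
            if end in dist and end != start:
                found.extend(unwind(end))
    return found


edges = [
    ("A", "B"), ("A", "C"), ("B", "D"), ("C", "D"),
    ("A", "X"), ("X", "Y"), ("Y", "D"),  # a longer detour to D
]
paths = shortest_paths(edges, start_nodes=["A"], end_nodes=["D"])
```

Both two-hop routes to D are returned, while the three-hop detour through X and Y is not.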

connectome_interpreter.path_finding.find_xor(paths: DataFrame) → List[XORCircuit][source]

Find XOR-like circuits in a 3-layer network, based on Wang et al. 2024 (https://www.biorxiv.org/content/10.1101/2024.09.24.614724v2). Note: this function currently ignores middle excitatory neurons that receive both inputs.

Parameters:

paths – DataFrame with columns [‘pre’, ‘post’, ‘sign’, ‘layer’]
pre – source node
post – target node
sign – 1 (excitatory) or -1 (inhibitory)
layer – 1 (input->middle) or 2 (middle->output)

Returns:

A list of XORCircuit objects, each representing a found XOR motif.

Return type:

List[XORCircuit]

connectome_interpreter.path_finding.group_paths(paths: DataFrame, pre_group: dict | None = None, post_group: dict | None = None, intermediate_group: dict | None = None, avg_within_connected: bool = False, outprop: bool = False, combining_method: str = 'mean') → DataFrame[source]

Group the paths by user-specified variable (e.g. cell type, cell class etc.). If outprop=False, weights are summed across presynaptic neurons of the same group and combined across all postsynaptic neurons of the same group using combining_method (even if some postsynaptic neurons are not in paths). If outprop=True, weights are summed across postsynaptic neurons of the same group and combined across all presynaptic neurons of the same group using combining_method (even if some presynaptic neurons are not in paths).

Parameters:

paths (pd.DataFrame) – The DataFrame containing the path data, looking like the output from find_path_iteratively().
pre_group (dict) – A dictionary that maps pre-synaptic neuron indices to their respective group.
post_group (dict) – A dictionary that maps post-synaptic neuron indices to their respective group.
intermediate_group (dict, optional) – A dictionary that maps intermediate neuron indices to their respective group. Defaults to None. If None, it will be set to pre_group.
avg_within_connected (bool, optional) – If True, the average weight is calculated within the connected neurons of the same group. If False, the average weight is calculated across all neurons of the same group. Defaults to False.
outprop (bool, optional) – If True, get the summed output proportion (across recipient single cells in the same cell type) for each average sender. If False (default), get the summed input proportion across all senders for each average recipient.
combining_method (str, optional) – Method to combine inputs (outprop=False) or outputs (outprop=True). Can be ‘sum’, ‘mean’, or ‘median’. Defaults to ‘mean’.

Returns:

The grouped DataFrame containing the path data, including the layer number, pre-synaptic index, post-synaptic index, and weight.

Return type:

pd.DataFrame
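
The arithmetic for the default case (outprop=False, combining_method='mean') can be worked through on a toy example. This is a sketch of one reading of the description above, not the library's code: group_edges is a hypothetical name, (layer, pre, post, weight) tuples stand in for the DataFrame, and intermediate_group handling is omitted.

```python
from collections import defaultdict


def group_edges(edges, pre_group, post_group, avg_within_connected=False):
    """Sum over presynaptic neurons per group, then average over postsynaptic neurons.

    edges: list of (layer, pre, post, weight) tuples (stand-in for the DataFrame).
    Returns {(layer, pre_group, post_group): grouped weight}.
    """
    # Step 1: total input each individual post neuron receives from each pre group.
    per_post = defaultdict(float)
    for layer, pre, post, weight in edges:
        per_post[(layer, pre_group[pre], post)] += weight

    # Step 2: add those totals up per (layer, pre group, post group).
    totals = defaultdict(float)
    connected = defaultdict(set)
    for (layer, pre_g, post), weight in per_post.items():
        key = (layer, pre_g, post_group[post])
        totals[key] += weight
        connected[key].add(post)

    # Step 3: divide by the number of post neurons being averaged over.
    group_size = defaultdict(int)
    for group in post_group.values():
        group_size[group] += 1
    return {
        key: total
        / (len(connected[key]) if avg_within_connected else group_size[key[2]])
        for key, total in totals.items()
    }


pre_group = {"L1_a": "L1", "L1_b": "L1"}
post_group = {"Tm3_a": "Tm3", "Tm3_b": "Tm3"}  # Tm3_b receives nothing in `edges`
edges = [(1, "L1_a", "Tm3_a", 0.25), (1, "L1_b", "Tm3_a", 0.125)]

across_all = group_edges(edges, pre_group, post_group)
within_connected = group_edges(
    edges, pre_group, post_group, avg_within_connected=True
)
```

Here Tm3_a receives 0.375 in total from L1. Averaging across both Tm3 neurons gives 0.1875, whereas averaging only across the connected Tm3 neuron keeps 0.375.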

connectome_interpreter.path_finding.path_for_ngl(path)[source]

Convert a path DataFrame to one that can be used to visualize the path in neuroglancer with get_ngl_link(df_format = ‘long’). Neurons are coloured by their (indirect) connectivity (calculated using effective_conn_from_paths()) to an average neuron in the last layer. pre and post columns must contain neuron ids. This function can be used for visualizing signal propagation in a pathway.

Parameters:

path (pd.DataFrame) – The DataFrame containing the path data, including the layer number, pre-synaptic index, post-synaptic index, and weight.

Returns:

A DataFrame with columns ‘neuron_id’, ‘layer’, and ‘activation’ (which is (indirect) connectivity in this case), suitable for Neuroglancer visualization.

Return type:

pd.DataFrame

connectome_interpreter.path_finding.remove_excess_neurons(df: DataFrame, keep=None, target_indices=None, keep_targets_in_middle: bool = False, quiet: bool = False) → DataFrame[source]

After filtering, some neurons are no longer on the paths between the input and output neurons. This function removes those neurons from the paths.

Parameters:

df (pd.DataFrame) – A filtered DataFrame with a structure similar to the DataFrame returned by find_paths_of_length(). If df is empty, and quiet is False, raises a ValueError. If quiet is True, returns None.
keep (list, set, pd.Series, numpy.ndarray, str, optional) – A list of neuron indices that should be kept in the paths, even if they don’t connect between input and target in the last layer. Defaults to None.
target_indices (list, set, pd.Series, numpy.ndarray, str, optional) – A list of target neuron indices that should be kept in the last layer. Defaults to None, in which case all neurons in the last layer in df would be kept.
keep_targets_in_middle (bool, optional) – If True, the target_indices are kept in the middle layers as well, even if they don’t connect between input and target in the last layer. Defaults to False.

Returns:

A dataframe with similar structure as the result of find_paths_of_length(), with the excess neurons removed. If no path is found, returns None.

Return type:

pd.DataFrame
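
The pruning idea can be sketched as a backward pass from the targets followed by a forward pass from the first layer. This is an illustration of the general technique, not the library's implementation: prune_dangling_edges is a hypothetical name, (layer, pre, post, weight) tuples stand in for the DataFrame, layers are assumed to be consecutive integers, the keep and keep_targets_in_middle options are omitted, and an empty list is returned where the documented function returns None.

```python
def prune_dangling_edges(edges, target_indices=None):
    """Keep only edges lying on a complete path from the first to the last layer.

    edges: list of (layer, pre, post, weight) tuples with consecutive integer layers.
    """
    if not edges:
        return []
    first = min(e[0] for e in edges)
    last = max(e[0] for e in edges)

    # Backward pass: keep edges whose target can still reach the wanted neurons.
    if target_indices is not None:
        wanted = set(target_indices)
    else:
        wanted = {post for layer, _, post, _ in edges if layer == last}
    backward_kept = {}
    for layer in range(last, first - 1, -1):
        layer_edges = [e for e in edges if e[0] == layer and e[2] in wanted]
        backward_kept[layer] = layer_edges
        wanted = {pre for _, pre, _, _ in layer_edges}

    # Forward pass: drop edges whose source is not fed from the first layer.
    kept = []
    reachable = {pre for _, pre, _, _ in backward_kept[first]}
    for layer in range(first, last + 1):
        layer_edges = [e for e in backward_kept[layer] if e[1] in reachable]
        kept.extend(layer_edges)
        reachable = {post for _, _, post, _ in layer_edges}
    return kept


edges = [
    (1, "A", "B", 0.5),
    (1, "A", "C", 0.4),  # C has no outgoing edge in layer 2
    (2, "B", "D", 0.3),
    (2, "E", "D", 0.2),  # E receives nothing in layer 1
]
pruned = prune_dangling_edges(edges)
```

Only the A to B to D chain survives, because C never reaches the last layer and E is never reached from the first.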

connectome_interpreter.path_finding.remove_excess_neurons_batched(df: DataFrame, keep=None, target_indices=None, keep_targets_in_middle: bool = False, quiet: bool = False) → DataFrame[source]

Does the same thing as remove_excess_neurons(), but for batched input (i.e. assumes column batch in df).

Parameters:

df (pd.DataFrame) – a filtered dataframe with similar structure as the dataframe returned by find_path_iteratively(). Must contain a column batch.
keep (list, set, pd.Series, numpy.ndarray, str, optional) – A list of neuron indices that should be kept in the paths, even if they don’t connect between input and target in the last layer. Defaults to None.
target_indices (list, set, pd.Series, numpy.ndarray, str, optional) – A list of target neuron indices that should be kept in the last layer. Defaults to None, in which case all neurons in the last layer in df would be kept.
keep_targets_in_middle (bool, optional) – If True, the target_indices are kept in the middle layers as well, even if they don’t connect between input and target in the last layer. Defaults to False.
quiet (bool, optional) – If True, suppresses print statements. Defaults to False.

Returns:

The filtered DataFrame containing the path data, including the layer number, pre-synaptic index, post-synaptic index, and weight.

Return type:

pd.DataFrame