Example 4: Sparse Multiple Canonical Correlation Network (SmCCNet) + PageRank Clustering + Visualization
This example demonstrates a comprehensive workflow that integrates network construction, clustering, and visualization to explore patterns in multi-omics data.
Workflow Overview:
Network Construction (SmCCNet): Generates an adjacency matrix from multi-omics data using SmCCNet. The resulting matrix represents relationships between features.
PageRank-Based Clustering: Applies the PageRank clustering method to identify meaningful sub-networks (clusters) within the constructed network.
Visualization: Visualizes the identified clusters using a static visualization tool (StaticVisualizer) to provide insights into the network structure.
Step-by-Step Guide:
Setup Input Files:
Ensure the following input files are available in the input/ directory:
- Proteins Data (`proteins.csv`): Contains protein features.
- Metabolites Data (`metabolites.csv`): Contains metabolite features.
- Phenotype Data (`phenotype_data.csv`): Contains phenotype labels.
Example structure:
proteins.csv:
    protein_feature1,protein_feature2
    0.1,0.3
    0.2,0.4

metabolites.csv:
    metabolite_feature1,metabolite_feature2
    0.5,0.7
    0.6,0.8

phenotype_data.csv:
    phenotype
    1
    0
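The script in this example reads every file with `index_col=0`, so pandas treats the first CSV column as the row index rather than as a feature. The sketch below illustrates that behavior with in-memory CSV text. The `sample_id` column and the `S1`/`S2` labels are assumptions made for illustration; the example files above do not name an ID column.

```python
import io

import pandas as pd

# CSV text as in the example structure: no ID column.
proteins_no_id = "protein_feature1,protein_feature2\n0.1,0.3\n0.2,0.4\n"

# Same values with a hypothetical sample_id column in front (assumption).
proteins_with_id = (
    "sample_id,protein_feature1,protein_feature2\n"
    "S1,0.1,0.3\n"
    "S2,0.2,0.4\n"
)
phenotype_with_id = "sample_id,phenotype\nS1,1\nS2,0\n"

# Without an ID column, index_col=0 consumes the first feature as the index.
lost = pd.read_csv(io.StringIO(proteins_no_id), index_col=0)
print("columns without ID column:", list(lost.columns))

# With an ID column, both features survive and rows are labeled by sample.
kept = pd.read_csv(io.StringIO(proteins_with_id), index_col=0)
phenotype = pd.read_csv(io.StringIO(phenotype_with_id), index_col=0)
print("columns with ID column:", list(kept.columns))

# Omics and phenotype tables should describe the same samples in the same order.
samples_match = kept.index.equals(phenotype.index)
print("sample IDs aligned:", samples_match)
```

If your files have no ID column, reading them without `index_col=0` keeps every feature column.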
Run SmCCNet:
Use SmCCNet to construct the network:
import pandas as pd
from bioneuralnet.graph_generation import SmCCNet

def run_smccnet_pagerank_visualization_workflow():
    omics_proteins = pd.read_csv('input/proteins.csv', index_col=0)
    omics_metabolites = pd.read_csv('input/metabolites.csv', index_col=0)
    phenotype_data = pd.read_csv('input/phenotype_data.csv', index_col=0)

    omics_dfs = [omics_proteins, omics_metabolites]
    data_types = ['protein', 'metabolite']

    smccnet_instance = SmCCNet(
        phenotype_data=phenotype_data,
        omics_dfs=omics_dfs,
        data_types=data_types,
        kfold=5,
        summarization='PCA',
        seed=732
    )
    adjacency_matrix = smccnet_instance.run()
This step constructs the adjacency matrix, which encodes relationships between features in the omics data.
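To make the shape of that matrix concrete, here is a minimal sketch with a hand-written adjacency matrix. The weights are invented for illustration and are not SmCCNet output. The matrix is square, labeled by feature name on both axes, and converts to a weighted graph with `nx.from_pandas_adjacency`, the same call the workflow uses.

```python
import networkx as nx
import pandas as pd

features = ["protein_feature1", "protein_feature2", "metabolite_feature1"]

# Toy weights for illustration only (assumption); zero means "no edge".
adjacency = pd.DataFrame(
    [
        [0.0, 0.8, 0.1],
        [0.8, 0.0, 0.0],
        [0.1, 0.0, 0.0],
    ],
    index=features,
    columns=features,
)

# An undirected network needs a symmetric matrix with matching row/column labels.
is_symmetric = bool((adjacency.values == adjacency.values.T).all())
labels_match = list(adjacency.index) == list(adjacency.columns)

# Non-zero entries become weighted edges.
graph = nx.from_pandas_adjacency(adjacency)
for u, v, data in graph.edges(data=True):
    print(u, "--", v, "weight:", data["weight"])
```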
Run PageRank Clustering:
Apply PageRank-based clustering to identify sub-networks (clusters) from the constructed network:
    G = nx.from_pandas_adjacency(adjacency_matrix)

    pagerank_instance = PageRank(
        graph=G,
        omics_data=pd.concat(omics_dfs, axis=1),
        phenotype_data=phenotype_data,
        alpha=0.9,
        max_iter=100,
        tol=1e-6,
        k=0.9,
        output_dir='pagerank_output'
    )

    seed_nodes = ['node1', 'node2']

    try:
        results = pagerank_instance.run(seed_nodes=seed_nodes)
        print("PageRank Clustering Results:")
        print(results)
    except Exception as e:
        print(f"Error running PageRank clustering: {e}")
        return

    cluster_nodes = results.get('cluster_nodes', [])
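This step identifies clusters around the seed nodes you specify. The names 'node1' and 'node2' are placeholders; replace them with feature names that appear in the adjacency matrix.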
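For intuition about how seed nodes steer the result, the following sketch runs a generic seeded (personalized) PageRank by power iteration on a toy graph. It is not the implementation behind the `PageRank` class, whose internals are not shown here. The parameter names `alpha`, `max_iter`, and `tol` simply mirror the ones used above, and `k` is left out because this page does not define it. The graph, node names, and weights are assumptions.

```python
def personalized_pagerank(adj, seeds, alpha=0.9, max_iter=100, tol=1e-6):
    """Seeded PageRank on a weighted graph given as {node: {neighbor: weight}}."""
    nodes = list(adj)
    # Ignore seeds that are not nodes of the graph.
    seeds = [s for s in seeds if s in adj]
    if not seeds:
        raise ValueError("none of the seed nodes are in the graph")

    # The random walk restarts uniformly at the seed nodes.
    restart = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    rank = dict(restart)
    strength = {n: sum(adj[n].values()) for n in nodes}

    for _ in range(max_iter):
        new_rank = {n: (1.0 - alpha) * restart[n] for n in nodes}
        for n in nodes:
            if strength[n] == 0:
                # Isolated node: hand its mass back to the seeds.
                for m in nodes:
                    new_rank[m] += alpha * rank[n] * restart[m]
            else:
                for m, w in adj[n].items():
                    new_rank[m] += alpha * rank[n] * w / strength[n]
        change = sum(abs(new_rank[n] - rank[n]) for n in nodes)
        rank = new_rank
        if change < tol:
            break
    return rank


# Two triangles joined by one weak edge (toy data, for illustration only).
toy_graph = {
    "a": {"b": 1.0, "c": 1.0},
    "b": {"a": 1.0, "c": 1.0},
    "c": {"a": 1.0, "b": 1.0, "d": 0.1},
    "d": {"c": 0.1, "e": 1.0, "f": 1.0},
    "e": {"d": 1.0, "f": 1.0},
    "f": {"d": 1.0, "e": 1.0},
}

scores = personalized_pagerank(toy_graph, seeds=["a", "not_in_graph"])
top_three = sorted(scores, key=scores.get, reverse=True)[:3]
print("highest-ranked nodes:", sorted(top_three))
```

Seeding at `a` concentrates the score on the triangle that contains it, which is why a seeded method tends to return the sub-network around the seeds rather than a global partition.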
Visualization:
Visualize the identified clusters using a static visualization tool:
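    if cluster_nodes:
        subgraph = G.subgraph(cluster_nodes).copy()
        visualizer = StaticVisualizer(
            adjacency_matrix=nx.to_pandas_adjacency(subgraph),
            output_dir='visualization_output',
            output_filename='cluster_visualization.png'
        )
        G_sub = visualizer.generate_graph()
        visualizer.visualize(G_sub)
        print("Visualization saved to visualization_output/cluster_visualization.png")
    else:
        print("No cluster identified for visualization.")

if __name__ == "__main__":
    run_smccnet_pagerank_visualization_workflow()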
The visualization provides a graphical view of the network, highlighting clusters of interest. The resulting image is saved in the visualization_output/ directory.
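The visualizer receives only the cluster's portion of the network. As a rough sketch of that extraction step, the snippet below uses plain NetworkX calls on a toy graph. The node names and weights are assumptions, and no BioNeuralNet class is involved.

```python
import networkx as nx

# Toy graph for illustration; in the workflow, G is built from the adjacency matrix.
G = nx.Graph()
G.add_weighted_edges_from(
    [
        ("a", "b", 1.0),
        ("b", "c", 1.0),
        ("a", "c", 1.0),
        ("c", "d", 0.1),
        ("d", "e", 1.0),
    ]
)

cluster_nodes = ["a", "b", "c"]

# Keep the cluster members and the edges among them; copy() detaches the view from G.
subgraph = G.subgraph(cluster_nodes).copy()

# Convert back to a labeled adjacency matrix, the form passed to the visualizer above.
sub_adjacency = nx.to_pandas_adjacency(subgraph)
print(sub_adjacency)
```

Edges that leave the cluster, such as `c`-`d` here, are dropped, so the plot shows only connections inside the cluster.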
Running the Example:
To execute the complete workflow:
"""
Example 4: Sparse Multiple Canonical Correlation Network (SmCCNet) + PageRank Clustering + Visualization
========================================================================================================
This script demonstrates a workflow where we first construct a network using Sparse Multiple Canonical Correlation
Network (SmCCNet) from multi-omics data, apply PageRank-based clustering to identify meaningful sub-networks, and
visualize the clusters to explore patterns in the data.
Steps:
1. Generate a graph using SmCCNet based on multi-omics and phenotype data.
2. Apply PageRank-based clustering to identify clusters of highly connected nodes.
3. Visualize the resulting clusters using a static visualization tool (StaticVisualizer) to explore the network structure.
"""
import os
import pandas as pd
import networkx as nx
from bioneuralnet.graph_generation import SmCCNet
from bioneuralnet.clustering import PageRank
from bioneuralnet.analysis import StaticVisualizer
def run_smccnet_pagerank_visualization_workflow():
    omics_proteins = pd.read_csv('input/proteins.csv', index_col=0)
    omics_metabolites = pd.read_csv('input/metabolites.csv', index_col=0)
    phenotype_data = pd.read_csv('input/phenotype_data.csv', index_col=0)

    omics_dfs = [omics_proteins, omics_metabolites]
    data_types = ['protein', 'metabolite']

    smccnet_instance = SmCCNet(
        phenotype_data=phenotype_data,
        omics_dfs=omics_dfs,
        data_types=data_types,
        kfold=5,
        summarization='PCA',
        seed=732
    )
    adjacency_matrix = smccnet_instance.run()

    adjacency_output_path = os.path.join(smccnet_instance.output_dir, 'adjacency_matrix.csv')
    adjacency_matrix.to_csv(adjacency_output_path)
    print(f"Adjacency matrix saved to {adjacency_output_path}")

    G = nx.from_pandas_adjacency(adjacency_matrix)

    pagerank_instance = PageRank(
        graph=G,
        omics_data=pd.concat(omics_dfs, axis=1),
        phenotype_data=phenotype_data,
        alpha=0.9,
        max_iter=100,
        tol=1e-6,
        k=0.9,
        output_dir='pagerank_output'
    )

    seed_nodes = ['node1', 'node2']

    try:
        results = pagerank_instance.run(seed_nodes=seed_nodes)
        print("PageRank Clustering Results:")
        print(results)
    except Exception as e:
        print(f"Error running PageRank clustering: {e}")
        return

    cluster_nodes = results.get('cluster_nodes', [])

    if cluster_nodes:
        subgraph = G.subgraph(cluster_nodes).copy()
        visualizer = StaticVisualizer(
            adjacency_matrix=nx.to_pandas_adjacency(subgraph),
            output_dir='visualization_output',
            output_filename='cluster_visualization.png'
        )
        G_sub = visualizer.generate_graph()
        visualizer.visualize(G_sub)
        print("Visualization saved to visualization_output/cluster_visualization.png")
    else:
        print("No cluster identified for visualization.")


if __name__ == "__main__":
    run_smccnet_pagerank_visualization_workflow()
Upon successful execution, the workflow outputs:
- Adjacency Matrix: Generated by SmCCNet, saved as a CSV file.
- Clustering Results: Saved in the pagerank_output/ directory.
- Visualization: Saved as an image in the visualization_output/ directory.
Result Interpretation:
Adjacency Matrix: Represents relationships between features in the omics data.
Clustering Results: Provides metrics like cluster size, conductance, correlation, composite score, and p-value.
Visualization: Offers a graphical representation of the sub-network, revealing patterns and connections among features.
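Two of the clustering metrics listed above, cluster size and conductance, have standard graph-theoretic definitions that are easy to check by hand. The sketch below computes them for a toy graph using the usual definition of conductance: the weight of edges leaving the cluster divided by the smaller of the two total edge weights (inside versus outside). How BioNeuralNet computes its reported values is not shown on this page, so treat this as an illustration of the concept, not of the library's code.

```python
def cluster_size_and_conductance(adj, cluster):
    """Return (size, conductance) for a cluster in a graph {node: {neighbor: weight}}."""
    cluster = set(cluster)
    cut = 0.0      # total weight of edges crossing the cluster boundary
    vol_in = 0.0   # summed weighted degree of nodes inside the cluster
    vol_out = 0.0  # summed weighted degree of nodes outside the cluster
    for node, neighbors in adj.items():
        strength = sum(neighbors.values())
        if node in cluster:
            vol_in += strength
            cut += sum(w for m, w in neighbors.items() if m not in cluster)
        else:
            vol_out += strength
    smaller = min(vol_in, vol_out)
    conductance = cut / smaller if smaller > 0 else float("nan")
    return len(cluster), conductance


# Two triangles joined by one weak edge (toy data, for illustration only).
toy_graph = {
    "a": {"b": 1.0, "c": 1.0},
    "b": {"a": 1.0, "c": 1.0},
    "c": {"a": 1.0, "b": 1.0, "d": 0.1},
    "d": {"c": 0.1, "e": 1.0, "f": 1.0},
    "e": {"d": 1.0, "f": 1.0},
    "f": {"d": 1.0, "e": 1.0},
}

size, conductance = cluster_size_and_conductance(toy_graph, ["a", "b", "c"])
print(f"size={size}, conductance={conductance:.4f}")
```

A low conductance means the cluster is well separated from the rest of the network: here only the weak `c`-`d` edge crosses the boundary.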