User API

The User API provides detailed documentation for BioNeuralNet’s modules, classes, and functions.

`bioneuralnet.graph_generation`
`bioneuralnet.network_embedding`
`bioneuralnet.downstream_task`
`bioneuralnet.analysis`
`bioneuralnet.utils`

Executables

Each class below exposes a run() method that executes its core workflow once an instance has been created with the required data and parameters. These methods encapsulate the end-to-end pipeline for a specific task, such as graph generation, embedding creation, or disease prediction.

Usage:

Instantiate a Class: Create an instance of the desired class, passing in the required data and configurations.
Execute the Workflow: Call the run() method on the instance to execute the workflow.

Example:

from bioneuralnet.downstream_task import DPMON

# Required inputs
adjacency_matrix = adjacency_df  # Adjacency matrix from a prior graph-generation step
omics_list = [omics_df1, omics_df2]  # List of omics data DataFrames
phenotype_data = phenotype_df  # DataFrame of phenotype information
clinical_data = clinical_df  # DataFrame of clinical data
model = 'GCN'  # Specify the model type (e.g., GCN, GAT, SAGE)

# Create an instance of DPMON
dpmon_instance = DPMON(
    adjacency_matrix=adjacency_matrix,
    omics_list=omics_list,
    phenotype_data=phenotype_data,
    clinical_data=clinical_data,
    model=model
)

# Execute the disease prediction workflow
predictions = dpmon_instance.run()
print(predictions)

Below are direct references to the run() methods for quick access to their workflow details:

SmCCNet.run() → DataFrame

Executes the entire Sparse Multiple Canonical Correlation Network (SmCCNet) workflow.

Steps:

Preprocessing Data:
- Formats and serializes the input omics and phenotype data for SmCCNet analysis.
Graph Generation:
- Constructs a global network by generating an adjacency matrix through SmCCNet.
Postprocessing Results:
- Deserializes the adjacency matrix (output of SmCCNet) into a Pandas DataFrame.

Returns: pd.DataFrame

A DataFrame containing the adjacency matrix, where each entry represents the strength of the correlation between features.

Raises:

ValueError: If the input data is improperly formatted or missing.

Exception: For any unforeseen errors encountered during preprocessing, graph generation, or postprocessing.

Notes:

SmCCNet is designed for multi-omics data and requires a well-preprocessed and normalized dataset.

Ensure that omics and phenotype data are properly aligned to avoid errors in graph construction.
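To illustrate the alignment requirement, the sketch below keeps only the samples shared by the omics and phenotype tables, assuming both are pandas DataFrames indexed by sample ID. The helper `align_samples` is hypothetical and not part of BioNeuralNet; SmCCNet may perform its own checks internally.

```python
import pandas as pd


def align_samples(omics: pd.DataFrame, phenotype: pd.DataFrame):
    """Hypothetical helper: keep only samples present in both tables, in the same order."""
    common = omics.index.intersection(phenotype.index)  # sample IDs found in both inputs
    if len(common) == 0:
        raise ValueError("omics and phenotype data share no sample IDs")
    return omics.loc[common], phenotype.loc[common]


omics = pd.DataFrame({"gene_a": [1.0, 2.0, 3.0], "gene_b": [0.5, 0.1, 0.9]},
                     index=["s1", "s2", "s3"])
phenotype = pd.DataFrame({"pheno": [0, 1]}, index=["s3", "s1"])
omics_aligned, pheno_aligned = align_samples(omics, phenotype)
print(list(omics_aligned.index), list(pheno_aligned.index))
```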

Example:

from bioneuralnet.graph_generation import SmCCNet

smccnet = SmCCNet(omics_data, phenotype_data)
adjacency_matrix = smccnet.run()
print(adjacency_matrix.head())

WGCNA.run() → DataFrame

Executes the entire Weighted Gene Co-expression Network Analysis (WGCNA) workflow.

Steps:

Preprocessing Data:
- Prepares and formats the input omics data for WGCNA analysis.
- Serializes the data into a format suitable for the WGCNA pipeline.
Running WGCNA:
- Constructs a weighted correlation network based on the serialized omics data.
- Identifies co-expression modules among genes or features in the dataset.
Postprocessing Results:
- Deserializes the adjacency matrix (output of WGCNA) into a Pandas DataFrame.
- Logs successful completion and prepares the matrix for downstream tasks.
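The network construction step can be sketched with the standard unsigned WGCNA adjacency, a_ij = |cor(x_i, x_j)|^beta, where raising correlations to a soft-thresholding power beta suppresses weak links. This is a minimal pandas/NumPy sketch, not the R pipeline that BioNeuralNet wraps: it assumes samples as rows and features as columns, uses beta = 6 only as an illustrative default, and omits module detection. The function name `soft_threshold_adjacency` is hypothetical.

```python
import numpy as np
import pandas as pd


def soft_threshold_adjacency(omics: pd.DataFrame, beta: int = 6) -> pd.DataFrame:
    """Unsigned WGCNA-style adjacency: a_ij = |cor(x_i, x_j)| ** beta (illustrative sketch)."""
    corr = omics.corr(method="pearson")        # feature-by-feature Pearson correlation
    arr = np.abs(corr.to_numpy()) ** beta      # soft thresholding shrinks weak correlations toward 0
    np.fill_diagonal(arr, 0.0)                 # drop self-loops
    return pd.DataFrame(arr, index=corr.index, columns=corr.columns)


rng = np.random.default_rng(0)
x = rng.normal(size=50)
omics = pd.DataFrame({
    "f1": x,
    "f2": x + 0.1 * rng.normal(size=50),       # strongly correlated with f1
    "f3": rng.normal(size=50),                 # independent noise
})
adj = soft_threshold_adjacency(omics)
print(adj.round(3))
```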

Returns: pd.DataFrame

A DataFrame containing the adjacency matrix, where each entry represents the weighted correlation between features.

Raises:

ValueError: If the input data is improperly formatted or missing.

Exception: For any unforeseen errors encountered during preprocessing, network construction, or postprocessing.

Notes:

The WGCNA workflow is sensitive to input data quality and formatting.

Ensure that the input omics data is preprocessed, normalized, and properly indexed to align with expected formats.

This method is designed for large-scale multi-omics data and may require significant computational resources depending on the dataset size.

Example:

from bioneuralnet.graph_generation import WGCNA

wgcna = WGCNA(omics_data)
adjacency_matrix = wgcna.run()
print(adjacency_matrix.head())

DPMON.run() → DataFrame

Execute the DPMON pipeline for disease prediction.

Steps:

Combining Omics and Phenotype Data:
- Merges the provided omics datasets and ensures that the phenotype column (finalgold_visit) is included.
Tuning or Training:
- Tuning: If tune=True, performs hyperparameter tuning using Ray Tune and returns an empty DataFrame.
- Training: If tune=False, runs standard training to generate predictions.
Predictions:
- If training is performed, returns a DataFrame of predictions with ‘Actual’ and ‘Predicted’ columns.
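A minimal sketch of the combining step, assuming each omics table and the phenotype table are pandas DataFrames indexed by sample ID. The helper name and the choice of an inner join (keeping only samples present in every table) are assumptions; DPMON's internal merge may differ.

```python
import pandas as pd


def combine_omics_and_phenotype(omics_list, phenotype, column="finalgold_visit"):
    """Hypothetical helper: join omics layers on sample index and append the phenotype column."""
    combined = pd.concat(omics_list, axis=1, join="inner")  # samples shared by all omics layers
    if column not in phenotype.columns:
        raise ValueError(f"phenotype data is missing the '{column}' column")
    return combined.join(phenotype[[column]], how="inner")  # attach the target column


omics_df1 = pd.DataFrame({"gene_a": [1.0, 2.0, 3.0]}, index=["s1", "s2", "s3"])
omics_df2 = pd.DataFrame({"prot_x": [4.0, 5.0]}, index=["s1", "s3"])
phenotype_df = pd.DataFrame({"finalgold_visit": [0, 2, 1]}, index=["s1", "s2", "s3"])

combined = combine_omics_and_phenotype([omics_df1, omics_df2], phenotype_df)
print(combined)
```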

Returns: pd.DataFrame

If tune=False, a DataFrame containing disease phenotype predictions for each sample.

If tune=True, returns an empty DataFrame since no predictions are generated.
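Because the result is empty when tune=True, downstream code should check for that case before scoring. A small sketch, assuming the ‘Actual’ and ‘Predicted’ columns hold class labels; `prediction_accuracy` is a hypothetical helper, not part of the library.

```python
import pandas as pd


def prediction_accuracy(predictions: pd.DataFrame) -> float:
    """Fraction of samples where 'Predicted' equals 'Actual'; NaN for an empty (tune=True) result."""
    if predictions.empty:
        return float("nan")
    return float((predictions["Actual"] == predictions["Predicted"]).mean())


preds = pd.DataFrame({"Actual": [0, 1, 2, 1], "Predicted": [0, 1, 1, 1]})
print(prediction_accuracy(preds))            # 0.75
print(prediction_accuracy(pd.DataFrame()))   # nan
```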

Raises:

ValueError: If the input data is improperly formatted or missing.

Exception: For any unforeseen errors encountered during preprocessing, tuning, or training.

Notes:

DPMON relies on internally generated embeddings (via GNNs), node correlations, and a downstream neural network.

Ensure that the adjacency matrix and omics data are properly aligned and that clinical/phenotype data match the sample indices.

Example:

from bioneuralnet.downstream_task import DPMON

dpmon = DPMON(adjacency_matrix, omics_list, phenotype_data, clinical_data, model='GCN')
predictions = dpmon.run()
print(predictions.head())

GNNEmbedding.run() → Dict[str, Tensor]

Generate GNN-based embeddings from the provided adjacency matrix and node features.

Steps:

Node Feature Preparation:
- Computes correlations between omics nodes and clinical variables.
Building PyG Data Object:
- Converts the adjacency matrix and node features into a PyTorch Geometric Data object.
Model Inference:
- Runs the specified GNN model (e.g., GCN, GAT, SAGE, or GIN) to compute node embeddings.
Saving Embeddings:
- Stores the resulting embeddings to a file for future analysis or downstream tasks.
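The first two steps can be sketched without PyTorch: node features as the Pearson correlation of each omics feature with each clinical variable across samples, and the adjacency matrix converted to the 2 x num_edges COO layout that PyTorch Geometric uses for `edge_index`. Both helpers are hypothetical, the inputs are assumed to be pandas DataFrames with samples as rows, and wrapping the arrays in torch tensors is left out to keep the example dependency-light.

```python
import numpy as np
import pandas as pd


def correlation_node_features(omics: pd.DataFrame, clinical: pd.DataFrame) -> pd.DataFrame:
    """Pearson correlation of every omics feature with every clinical variable.

    Returns a (num_omics_features x num_clinical_variables) DataFrame: one feature row per node.
    """
    common = omics.index.intersection(clinical.index)  # correlate over shared samples only
    o, c = omics.loc[common], clinical.loc[common]
    return pd.DataFrame({var: o.corrwith(c[var]) for var in c.columns})


def adjacency_to_edge_index(adj: pd.DataFrame) -> np.ndarray:
    """Nonzero adjacency entries as a 2 x num_edges array (row 0 = source, row 1 = target)."""
    sources, targets = np.nonzero(adj.to_numpy())
    return np.vstack([sources, targets])


omics = pd.DataFrame({"gene_a": [1.0, 2.0, 3.0, 4.0], "gene_b": [4.0, 3.0, 2.0, 1.0]},
                     index=["s1", "s2", "s3", "s4"])
clinical = pd.DataFrame({"age": [30.0, 40.0, 50.0, 60.0]}, index=["s1", "s2", "s3", "s4"])
adj = pd.DataFrame([[0.0, 0.8], [0.8, 0.0]],
                   index=["gene_a", "gene_b"], columns=["gene_a", "gene_b"])

node_feats = correlation_node_features(omics, clinical)
edge_index = adjacency_to_edge_index(adj)
print(node_feats)
print(edge_index)
```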

Returns: Dict[str, torch.Tensor]

A dictionary where keys are graph names (e.g., ‘graph’) and values are PyTorch tensors of shape (num_nodes, embedding_dim) containing the node embeddings.
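To show where the (num_nodes, embedding_dim) shape comes from, the sketch below applies one graph convolution step in the standard GCN form, ReLU(D^-1/2 (A + I) D^-1/2 X W), using NumPy. The weights are random and untrained, so the values carry no meaning; it illustrates the propagation rule only and is not BioNeuralNet's PyTorch Geometric model.

```python
import numpy as np


def gcn_layer(adj: np.ndarray, features: np.ndarray, weight: np.ndarray) -> np.ndarray:
    """One GCN propagation step: ReLU(D^-1/2 (A + I) D^-1/2 X W)."""
    a_hat = adj + np.eye(adj.shape[0])             # add self-loops so every degree is >= 1
    d_inv_sqrt = np.diag(1.0 / np.sqrt(a_hat.sum(axis=1)))
    norm = d_inv_sqrt @ a_hat @ d_inv_sqrt         # symmetric degree normalization
    return np.maximum(norm @ features @ weight, 0.0)


rng = np.random.default_rng(0)
num_nodes, num_feats, embedding_dim = 5, 3, 8
upper = np.triu((rng.random((num_nodes, num_nodes)) > 0.5).astype(float), 1)
adj = upper + upper.T                              # symmetric graph without self-loops
x = rng.normal(size=(num_nodes, num_feats))        # node features (e.g. clinical correlations)
w = rng.normal(size=(num_feats, embedding_dim))    # untrained weights, for shape illustration only

emb = gcn_layer(adj, x, w)
print(emb.shape)  # (5, 8)
```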

Raises:

ValueError: If node features cannot be computed or if required nodes are missing.

Exception: For any unforeseen errors encountered during node feature preparation, model inference, or embedding generation.

Notes:

Ensure that the adjacency matrix aligns with the nodes present in the omics data.

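Clinical variables must share sample indices with the omics data, since node features are computed as correlations between each omics feature and each clinical variable across samples.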

Adjust parameters like model_type, gnn_hidden_dim, or gnn_layer_num as needed to customize the embedding process.
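A quick pre-flight check for the first note, assuming the adjacency matrix is a pandas DataFrame labelled by feature name and the omics data has features as columns. `check_node_alignment` is a hypothetical helper, not part of the library.

```python
import pandas as pd


def check_node_alignment(adj: pd.DataFrame, omics: pd.DataFrame) -> None:
    """Hypothetical check: adjacency rows/columns must match, and every node must be an omics feature."""
    if list(adj.index) != list(adj.columns):
        raise ValueError("adjacency index and columns must list the same nodes in the same order")
    missing = set(adj.index) - set(omics.columns)
    if missing:
        raise ValueError(f"nodes missing from omics data: {sorted(missing)}")


omics = pd.DataFrame({"gene_a": [1.0, 2.0], "gene_b": [3.0, 4.0], "gene_c": [5.0, 6.0]})
adj = pd.DataFrame([[0.0, 1.0], [1.0, 0.0]],
                   index=["gene_a", "gene_b"], columns=["gene_a", "gene_b"])
check_node_alignment(adj, omics)  # passes silently
```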

Example:

from bioneuralnet.network_embedding import GNNEmbedding

gnn_embedding = GNNEmbedding(adjacency_matrix, omics_data, model_type='GCN')
embeddings = gnn_embedding.run()
print(embeddings['graph'].shape)

Node2VecEmbedding.run() → DataFrame

Runs the Node2Vec embedding process.

Steps:

Converting to NetworkX Graph:
- Converts the input adjacency matrix to a NetworkX-compatible graph object.
Embedding Generation:
- Executes the Node2Vec algorithm to generate low-dimensional embeddings for graph nodes.
Output Preparation:
- Returns the generated embeddings as a Pandas DataFrame.
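The embedding generation step can be illustrated by the walk stage of Node2Vec: second-order random walks in which the return parameter p and the in-out parameter q bias each step relative to the previous node. This sketch only produces walks; Node2Vec then learns embeddings by training a skip-gram model on them, which is omitted here. The function `node2vec_walks` and its arguments are hypothetical and assume a symmetric, non-negative adjacency DataFrame.

```python
import random

import pandas as pd


def node2vec_walks(adj: pd.DataFrame, walk_length=10, num_walks=5, p=1.0, q=1.0, seed=0):
    """Second-order biased random walks over a weighted, undirected adjacency DataFrame."""
    rng = random.Random(seed)
    nodes = list(adj.index)
    # Neighbor lists with edge weights (self-loops ignored).
    nbrs = {n: {m: float(adj.at[n, m]) for m in nodes if m != n and adj.at[n, m] > 0}
            for n in nodes}
    walks = []
    for _ in range(num_walks):
        for start in nodes:
            walk = [start]
            while len(walk) < walk_length:
                cur = walk[-1]
                candidates = list(nbrs[cur])
                if not candidates:
                    break                              # isolated node: stop this walk early
                if len(walk) == 1:
                    weights = [nbrs[cur][x] for x in candidates]  # first step: edge weights only
                else:
                    prev = walk[-2]
                    weights = []
                    for x in candidates:
                        w = nbrs[cur][x]
                        if x == prev:
                            w /= p                     # step back to the previous node
                        elif x not in nbrs[prev]:
                            w /= q                     # step outward, away from the previous node
                        weights.append(w)              # x adjacent to prev: weight unchanged
                walk.append(rng.choices(candidates, weights=weights, k=1)[0])
            walks.append(walk)
    return walks


labels = ["a", "b", "c", "d"]
adj = pd.DataFrame(0.0, index=labels, columns=labels)
for u, v in [("a", "b"), ("b", "c"), ("a", "c"), ("c", "d")]:
    adj.at[u, v] = 1.0
    adj.at[v, u] = 1.0
walks = node2vec_walks(adj, walk_length=6, num_walks=2, p=1.0, q=0.5)
print(walks[0])
```

Lower q favors outward, depth-first exploration, while lower p favors returning to recently visited nodes.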

Returns: pd.DataFrame

A DataFrame containing the node embeddings, with nodes as rows and embedding dimensions as columns.

Raises:

Exception: For any errors encountered during graph conversion or embedding generation.

Notes:

Ensure the adjacency matrix is properly formatted and reflects the graph’s structure.

Adjust hyperparameters like walk_length or embedding_dim to tune the Node2Vec process.
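One way to make "properly formatted" concrete, assuming an undirected weighted graph stored as a pandas DataFrame: the matrix should be square, free of NaN values, non-negative (random-walk transition probabilities require non-negative weights), and symmetric. These checks and the helper name `validate_adjacency` are assumptions, not library behavior.

```python
import numpy as np
import pandas as pd


def validate_adjacency(adj: pd.DataFrame, atol: float = 1e-8) -> None:
    """Hypothetical pre-flight checks for an undirected, weighted adjacency matrix."""
    if adj.shape[0] != adj.shape[1]:
        raise ValueError(f"adjacency must be square, got shape {adj.shape}")
    values = adj.to_numpy(dtype=float)
    if np.isnan(values).any():
        raise ValueError("adjacency contains NaN entries")
    if (values < 0).any():
        raise ValueError("adjacency weights must be non-negative")
    if not np.allclose(values, values.T, atol=atol):
        raise ValueError("adjacency is not symmetric; expected an undirected graph")


good = pd.DataFrame([[0.0, 0.5], [0.5, 0.0]], index=["x", "y"], columns=["x", "y"])
validate_adjacency(good)  # passes silently
```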

Example:

from bioneuralnet.network_embedding import Node2VecEmbedding

node2vec = Node2VecEmbedding(adjacency_matrix)
embeddings = node2vec.run()
print(embeddings.head())

GraphEmbedding.run() → DataFrame

Generate subject representations by integrating network embeddings into omics data.

Steps:

Embedding Generation:
- Runs GNN or Node2Vec-based methods to produce node embeddings for the graph.
Dimensionality Reduction:
- Applies PCA to condense the high-dimensional embeddings into a single principal component.
Integration:
- Multiplies original omics features by the reduced embeddings to create enhanced omics data.
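The reduction and integration steps can be sketched with NumPy, assuming omics data with samples as rows and features as columns, and node embeddings with one row per feature. PCA is computed here from an SVD of the centered embeddings rather than through any particular library, and `embedding_weighted_omics` is a hypothetical name. The sign of a principal component is arbitrary, so the weights are defined only up to a global sign flip.

```python
import numpy as np
import pandas as pd


def embedding_weighted_omics(omics: pd.DataFrame, embeddings: pd.DataFrame) -> pd.DataFrame:
    """Scale each omics feature by the first principal-component score of its node embedding.

    omics: samples x features. embeddings: features (nodes) x embedding dimensions.
    """
    emb = embeddings.loc[omics.columns].to_numpy(dtype=float)  # align node order with omics columns
    centered = emb - emb.mean(axis=0)
    # The first right singular vector of the centered matrix is the first PCA loading vector.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    pc1_scores = centered @ vt[0]                  # one score per node; sign is arbitrary
    weights = pd.Series(pc1_scores, index=omics.columns)
    return omics.mul(weights, axis=1)              # multiply each feature column by its node's score


omics = pd.DataFrame({"f1": [1.0, 2.0, 3.0], "f2": [4.0, 5.0, 6.0], "f3": [1.0, 2.0, 3.0]},
                     index=["s1", "s2", "s3"])
embeddings = pd.DataFrame([[1.0, 1.0], [2.0, 2.0], [3.0, 3.0]],
                          index=["f1", "f2", "f3"], columns=["dim0", "dim1"])
enhanced = embedding_weighted_omics(omics, embeddings)
print(enhanced)
```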

Returns: pd.DataFrame

A DataFrame of enhanced omics data where each feature (node) has been weighted by its embedding-derived principal component.

Raises:

ValueError: If embeddings are empty or omics data cannot be integrated.

Exception: For any unexpected issues encountered during the embedding generation, reduction, or integration steps.

Notes:

The enhanced omics data can be used downstream for tasks like clustering, classification, or regression.

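Collapsing each node's embedding to a single principal component discards the variance captured by the remaining components, so check that this step is appropriate for your data and adjust the dimensionality reduction strategy if required.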

Example:

graph_embedding = GraphEmbedding(adjacency_matrix, omics_data)
enhanced_data = graph_embedding.run()
print(enhanced_data.head())