Example 3: Disease Prediction Using Graph Information (SmCCNet + DPMON)

This example demonstrates a hybrid workflow combining SmCCNet for network generation and DPMON for disease prediction using multi-omics data and graph information.

Workflow Overview:

1. Network Construction (SmCCNet): Generates an adjacency matrix from the multi-omics data. The matrix represents relationships between features.
2. Disease Prediction (DPMON): Uses the generated adjacency matrix, together with the omics data, to predict disease phenotypes.

Step-by-Step Guide:

Set Up Input Data:

Prepare the following inputs:

Proteins Data (`omics_proteins`): pandas DataFrame of protein features, with one row per sample.

Metabolites Data (`omics_metabolites`): pandas DataFrame of metabolite features, with one row per sample.

Phenotype Data (`phenotype_data`): pandas Series of phenotype labels, indexed by sample.

Clinical Data (`clinical_data`): pandas DataFrame of clinical features, with one row per sample.

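Example input data structure (two samples are shown only to illustrate the layout; a real analysis needs many more):

import pandas as pd

omics_proteins = pd.DataFrame({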
    'protein_feature1': [0.1, 0.2],
    'protein_feature2': [0.3, 0.4]
}, index=['Sample1', 'Sample2'])

omics_metabolites = pd.DataFrame({
    'metabolite_feature1': [0.5, 0.6],
    'metabolite_feature2': [0.7, 0.8]
}, index=['Sample1', 'Sample2'])

phenotype_data = pd.Series([1, 0], index=['Sample1', 'Sample2'])

clinical_data = pd.DataFrame({
    'clinical_feature1': [5, 3],
    'clinical_feature2': [7, 2]
}, index=['Sample1', 'Sample2'])
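
Because all four inputs describe the same samples, it can help to confirm that their indexes match before building the network. The following is a minimal sketch using only pandas; `check_sample_alignment` is a hypothetical helper written for this illustration and is not part of BioNeuralNet.

```python
import pandas as pd


def check_sample_alignment(**tables):
    """Return the sorted sample IDs shared by every input.

    Hypothetical helper (not part of BioNeuralNet). Raises ValueError if
    any input contains samples that another input lacks.
    """
    names = list(tables)
    shared = set(tables[names[0]].index)
    for name in names[1:]:
        shared &= set(tables[name].index)

    for name, table in tables.items():
        extra = set(table.index) - shared
        if extra:
            raise ValueError(
                f"{name} has samples missing from other inputs: {sorted(extra)}"
            )
    return sorted(shared)


# Same toy inputs as in the example above.
omics_proteins = pd.DataFrame(
    {"protein_feature1": [0.1, 0.2], "protein_feature2": [0.3, 0.4]},
    index=["Sample1", "Sample2"],
)
omics_metabolites = pd.DataFrame(
    {"metabolite_feature1": [0.5, 0.6], "metabolite_feature2": [0.7, 0.8]},
    index=["Sample1", "Sample2"],
)
phenotype_data = pd.Series([1, 0], index=["Sample1", "Sample2"])
clinical_data = pd.DataFrame(
    {"clinical_feature1": [5, 3], "clinical_feature2": [7, 2]},
    index=["Sample1", "Sample2"],
)

samples = check_sample_alignment(
    proteins=omics_proteins,
    metabolites=omics_metabolites,
    phenotype=phenotype_data,
    clinical=clinical_data,
)
print(f"{len(samples)} aligned samples: {samples}")
```

Raising on any mismatch, rather than silently dropping samples, is a deliberate choice in this sketch: it keeps the decision about which samples to discard explicit.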

Run SmCCNet to Generate the Adjacency Matrix:

Use SmCCNet to construct the network from multi-omics data:

from bioneuralnet.graph_generation import SmCCNet

smccnet_instance = SmCCNet(
    phenotype_data=phenotype_data,
    omics_data=pd.concat([omics_proteins, omics_metabolites], axis=1),
    data_types=['protein', 'metabolite'],
    kfold=5,
    summarization='PCA',
    seed=732,
)
adjacency_matrix = smccnet_instance.run()
print("Adjacency matrix generated using SmCCNet.")

This step produces the adjacency matrix, which encodes the relationships between features in your data.
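
Before passing the matrix on, a quick inspection can show whether it looks as expected. The sketch below assumes the adjacency matrix is a square pandas DataFrame whose index and columns are feature names; that layout is an assumption and should be checked against the object SmCCNet actually returns. The matrix values are invented toy numbers, and `summarize_adjacency` is a hypothetical helper, not a BioNeuralNet function.

```python
import numpy as np
import pandas as pd

# Toy stand-in for a SmCCNet adjacency matrix (values are invented).
features = [
    "protein_feature1",
    "protein_feature2",
    "metabolite_feature1",
    "metabolite_feature2",
]
toy_adjacency = pd.DataFrame(
    [
        [0.0, 0.8, 0.1, 0.0],
        [0.8, 0.0, 0.5, 0.2],
        [0.1, 0.5, 0.0, 0.0],
        [0.0, 0.2, 0.0, 0.0],
    ],
    index=features,
    columns=features,
)


def summarize_adjacency(adj, threshold=0.3):
    """Hypothetical helper: check symmetry and list edges at or above a threshold."""
    if adj.shape[0] != adj.shape[1] or list(adj.index) != list(adj.columns):
        raise ValueError("expected a square matrix with matching row and column labels")

    values = adj.to_numpy(dtype=float)
    symmetric = bool(np.allclose(values, values.T))

    # Read each undirected edge once from the upper triangle (diagonal excluded).
    rows, cols = np.triu_indices_from(values, k=1)
    edges = pd.DataFrame(
        {
            "source": adj.index[rows],
            "target": adj.columns[cols],
            "weight": values[rows, cols],
        }
    )
    strong = (
        edges[edges["weight"].abs() >= threshold]
        .sort_values("weight", ascending=False)
        .reset_index(drop=True)
    )
    return symmetric, strong


symmetric, strong_edges = summarize_adjacency(toy_adjacency, threshold=0.3)
print("Symmetric:", symmetric)
print(strong_edges)
```

Reading only the upper triangle presumes an undirected network. If the symmetry check fails, the full matrix should be examined instead.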

Run DPMON for Disease Prediction:

With the adjacency matrix from SmCCNet, use DPMON to predict disease phenotypes:

from bioneuralnet.downstream_task import DPMON

dpmon_instance = DPMON(
    adjacency_matrix=adjacency_matrix,
    omics_list=[omics_proteins, omics_metabolites],
    phenotype_data=phenotype_data,
    features_data=clinical_data,
    model='GCN',
    tune=False,
    gpu=False
)

predictions_df = dpmon_instance.run()

if not predictions_df.empty:
    print("DPMON workflow completed successfully. Predictions generated.")
else:
    print("DPMON hyperparameter tuning completed. No predictions were generated.")

Save and Interpret Results:

DPMON returns its predictions as a pandas DataFrame. Print it to inspect the output; because it is a regular DataFrame, it can also be saved with standard pandas methods such as `to_csv`.
```
print("DPMON Predictions:")
print(predictions_df)
```
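
One way to persist the results is to write the DataFrame to a CSV file. The sketch below uses only pandas and the standard library. The `placeholder_predictions` frame and its `prediction` column are invented stand-ins, since the exact columns DPMON returns are not described here, and `save_predictions` is a hypothetical helper.

```python
import tempfile
from pathlib import Path

import pandas as pd


def save_predictions(predictions_df, path):
    """Hypothetical helper: write predictions to CSV, creating parent folders as needed."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    predictions_df.to_csv(path, index=True)
    return path


# Placeholder standing in for the DataFrame returned by dpmon_instance.run().
placeholder_predictions = pd.DataFrame(
    {"prediction": [1, 0]},
    index=pd.Index(["Sample1", "Sample2"], name="sample"),
)

# A temporary directory keeps the sketch self-contained; use a real path in practice.
with tempfile.TemporaryDirectory() as tmp_dir:
    out_path = save_predictions(
        placeholder_predictions, Path(tmp_dir) / "results" / "dpmon_predictions.csv"
    )
    reloaded = pd.read_csv(out_path, index_col="sample")

print(reloaded)
```

Keeping the sample index in the file (`index=True`) makes it possible to join the saved predictions back to the phenotype or clinical tables later.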

Running the Example:

The complete SmCCNet + DPMON hybrid workflow, as a single script:

"""
Example 3: Disease Prediction Using Graph Information (SmCCNet + Disease Prediction using Multi-Omics Networks (DPMON))
======================================================================================================================

This script demonstrates a workflow that first generates a network with Sparse Multiple Canonical Correlation Network
(SmCCNet) and then passes the resulting adjacency matrix to Disease Prediction using Multi-Omics Networks (DPMON), a
pipeline that uses Graph Neural Networks (GNNs) to predict disease phenotypes.

Steps:
1. Generate an adjacency matrix using SmCCNet based on multi-omics and phenotype data.
2. Utilize DPMON to predict disease phenotypes using the network information and omics data.
"""

import pandas as pd
from bioneuralnet.graph_generation import SmCCNet
from bioneuralnet.downstream_task import DPMON

def run_smccnet_dpmon_workflow(omics_proteins: pd.DataFrame,
                               omics_metabolites: pd.DataFrame,
                               phenotype_data: pd.Series,
                               clinical_data: pd.DataFrame) -> pd.DataFrame:
    """
    Executes the hybrid workflow combining SmCCNet for network generation and DPMON for disease prediction.

    This function performs the following steps:
        1. Generates an adjacency matrix using SmCCNet.
        2. Initializes and runs DPMON for disease prediction based on the adjacency matrix.
        3. Returns the disease prediction results.

    Args:
        omics_proteins (pd.DataFrame): DataFrame containing protein data.
        omics_metabolites (pd.DataFrame): DataFrame containing metabolite data.
        phenotype_data (pd.Series): Series containing phenotype information.
        clinical_data (pd.DataFrame): DataFrame containing clinical data.

    Returns:
        pd.DataFrame: Disease prediction results from DPMON.
    """
    try:
        smccnet_instance = SmCCNet(
            phenotype_data=phenotype_data,
            omics_data=pd.concat([omics_proteins, omics_metabolites], axis=1),
            data_types=['protein', 'metabolite'],
            kfold=5,
            summarization='PCA',
            seed=732,
        )
        adjacency_matrix = smccnet_instance.run()
        print("Adjacency matrix generated using SmCCNet.")

        dpmon_instance = DPMON(
            adjacency_matrix=adjacency_matrix,
            omics_list=[omics_proteins, omics_metabolites],
            phenotype_data=phenotype_data,
            features_data=clinical_data,
            model='GCN',  # graph convolutional network architecture
            tune=False,  # skip hyperparameter tuning
            gpu=False  # run on CPU
        )

        predictions_df = dpmon_instance.run()
        if not predictions_df.empty:
            print("DPMON workflow completed successfully. Predictions generated.")
        else:
            print("DPMON hyperparameter tuning completed. No predictions were generated.")

        return predictions_df

    except Exception as e:
        print(f"An error occurred during the SmCCNet + DPMON workflow: {e}")
        raise

if __name__ == "__main__":
    try:
        print("Starting SmCCNet + DPMON Hybrid Workflow...")

        omics_proteins = pd.DataFrame({
            'protein_feature1': [0.1, 0.2],
            'protein_feature2': [0.3, 0.4]
        }, index=['Sample1', 'Sample2'])

        omics_metabolites = pd.DataFrame({
            'metabolite_feature1': [0.5, 0.6],
            'metabolite_feature2': [0.7, 0.8]
        }, index=['Sample1', 'Sample2'])

        phenotype_data = pd.Series([1, 0], index=['Sample1', 'Sample2'])

        clinical_data = pd.DataFrame({
            'clinical_feature1': [5, 3],
            'clinical_feature2': [7, 2]
        }, index=['Sample1', 'Sample2'])

        predictions = run_smccnet_dpmon_workflow(omics_proteins, omics_metabolites, phenotype_data, clinical_data)

        print("DPMON Predictions:")
        print(predictions)

        print("Hybrid Workflow completed successfully.\n")
    except Exception as e:
        print(f"An error occurred during the execution: {e}")
        raise

Result Interpretation:

Adjacency Matrix: Represents the network constructed from the multi-omics data, indicating which features are related and how strongly.
Disease Predictions: Phenotype predictions derived from the integrated omics and network data. These can support biomarker discovery and patient stratification.
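
As a first sanity check when interpreting predictions, they can be compared against the known phenotype labels. The sketch below assumes the predictions can be reduced to a pandas Series indexed by sample; that layout is an assumption about DPMON's output, the values are invented placeholders, and `compare_with_labels` is a hypothetical helper.

```python
import pandas as pd


def compare_with_labels(predicted, observed):
    """Hypothetical helper: align two Series on shared samples and report agreement."""
    predicted, observed = predicted.align(observed, join="inner")
    comparison = pd.DataFrame({"observed": observed, "predicted": predicted})
    comparison["correct"] = comparison["observed"] == comparison["predicted"]
    agreement = float(comparison["correct"].mean())
    return comparison, agreement


# Invented placeholder values, not real DPMON output.
observed = pd.Series([1, 0, 1], index=["Sample1", "Sample2", "Sample3"])
placeholder_predicted = pd.Series([1, 0, 0], index=["Sample1", "Sample2", "Sample3"])

comparison, agreement = compare_with_labels(placeholder_predicted, observed)
print(comparison)
print(f"Agreement with labels: {agreement:.2f}")
```

Agreement measured on the same samples used for training tends to overstate how well a model generalizes, so a held-out set is preferable for any real evaluation.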