Example 3: Disease Prediction Using Graph Information (SmCCNet + DPMON)

This example demonstrates a hybrid workflow combining SmCCNet for network generation and DPMON for disease prediction using multi-omics data and graph information.

Workflow Overview:

  1. Network Construction (SmCCNet): Generates an adjacency matrix from multi-omics and phenotype data. The matrix represents relationships between features.

  2. Disease Prediction (DPMON): Utilizes the generated adjacency matrix to predict disease phenotypes using the DPMON model.

Step-by-Step Guide:

  1. Set Up Input Data: Prepare the following inputs:

    • Proteins Data (`omics_proteins`): Pandas DataFrame containing protein features.

    • Metabolites Data (`omics_metabolites`): Pandas DataFrame containing metabolite features.

    • Phenotype Data (`phenotype_data`): Pandas Series with phenotype labels.

    • Clinical Data (`clinical_data`): Pandas DataFrame with clinical features.

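    Example input data structure (two samples are shown for brevity; a real analysis, including the 5-fold cross-validation configured in step 2, needs many more):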

    import pandas as pd

    omics_proteins = pd.DataFrame({
        'protein_feature1': [0.1, 0.2],
        'protein_feature2': [0.3, 0.4]
    }, index=['Sample1', 'Sample2'])
    
    omics_metabolites = pd.DataFrame({
        'metabolite_feature1': [0.5, 0.6],
        'metabolite_feature2': [0.7, 0.8]
    }, index=['Sample1', 'Sample2'])
    
    phenotype_data = pd.Series([1, 0], index=['Sample1', 'Sample2'])
    
    clinical_data = pd.DataFrame({
        'clinical_feature1': [5, 3],
        'clinical_feature2': [7, 2]
    }, index=['Sample1', 'Sample2'])
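
All four inputs are keyed by sample ID, so it can help to confirm that their indices agree before building the network. The following is a minimal sketch of such a check. It is not part of BioNeuralNet, and the helper name `check_sample_alignment` is hypothetical.

```python
import pandas as pd


def check_sample_alignment(*tables):
    """Return the sorted sample IDs shared by all inputs.

    Hypothetical helper for illustration only (not a BioNeuralNet function).
    Raises ValueError if any input has samples the first input lacks, or vice versa.
    """
    reference = set(tables[0].index)
    for position, table in enumerate(tables[1:], start=1):
        mismatched = reference.symmetric_difference(table.index)
        if mismatched:
            raise ValueError(f"Input {position} differs in samples: {sorted(mismatched)}")
    return sorted(reference)


# Small stand-ins for the inputs described above.
proteins = pd.DataFrame({'protein_feature1': [0.1, 0.2]}, index=['Sample1', 'Sample2'])
phenotype = pd.Series([1, 0], index=['Sample1', 'Sample2'])

shared_samples = check_sample_alignment(proteins, phenotype)
print(shared_samples)
```

The check works on both DataFrames and Series because it only reads `.index`.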
    
  2. Run SmCCNet to Generate the Adjacency Matrix:

    Use SmCCNet to construct the network from multi-omics data:

    from bioneuralnet.graph_generation import SmCCNet

    smccnet_instance = SmCCNet(
        phenotype_data=phenotype_data,
        omics_data=pd.concat([omics_proteins, omics_metabolites], axis=1),
        data_types=['protein', 'metabolite'],
        kfold=5,
        summarization='PCA',
        seed=732,
    )
    adjacency_matrix = smccnet_instance.run()
    print("Adjacency matrix generated using SmCCNet.")
    

    This step produces the adjacency matrix, which encodes the relationships between features in your data.
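
Before passing the matrix downstream, a quick structural check can catch obvious problems. The sketch below assumes the adjacency matrix is a square pandas DataFrame labeled by feature name on both axes; that is an assumption about the return type, not something this page confirms. The matrix values are toy numbers, not real SmCCNet output, and `summarize_adjacency` is a hypothetical helper.

```python
import numpy as np
import pandas as pd

features = ['protein_feature1', 'protein_feature2', 'metabolite_feature1']

# Toy stand-in values, NOT real SmCCNet output.
toy_adjacency = pd.DataFrame(
    [[0.0, 0.8, 0.1],
     [0.8, 0.0, 0.4],
     [0.1, 0.4, 0.0]],
    index=features, columns=features,
)


def summarize_adjacency(adjacency):
    """Report basic structural properties of a feature-by-feature adjacency matrix."""
    values = adjacency.to_numpy()
    is_square = values.shape[0] == values.shape[1]
    return {
        'is_square': is_square,
        # Symmetry is only meaningful (and only checked) for square matrices.
        'is_symmetric': is_square and bool(np.allclose(values, values.T)),
        'labels_match': list(adjacency.index) == list(adjacency.columns),
        # Count nonzero entries above the diagonal, i.e. undirected edges without self-loops.
        'n_edges': int(np.count_nonzero(np.triu(values, k=1))),
    }


summary = summarize_adjacency(toy_adjacency)
print(summary)
```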

  3. Run DPMON for Disease Prediction:

    With the adjacency matrix from SmCCNet, use DPMON to predict disease phenotypes:

    from bioneuralnet.downstream_task import DPMON

    dpmon_instance = DPMON(
        adjacency_matrix=adjacency_matrix,
        omics_list=[omics_proteins, omics_metabolites],
        phenotype_data=phenotype_data,
        features_data=clinical_data,
        model='GCN',
        tune=False,
        gpu=False
    )
    
    predictions_df = dpmon_instance.run()
    
    if not predictions_df.empty:
        print("DPMON workflow completed successfully. Predictions generated.")
    else:
        print("DPMON hyperparameter tuning completed. No predictions were generated.")
    
  4. Save and Interpret Results:

    DPMON returns its predictions as a pandas DataFrame, which can be printed for inspection:

    print("DPMON Predictions:")
    print(predictions_df)
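
The step title mentions saving, which the snippet above does not show. One way to persist the results, assuming `predictions_df` behaves like an ordinary pandas DataFrame, is `DataFrame.to_csv`. In this sketch the toy frame, its column names (`actual`, `predicted`), and the file name are illustrative assumptions, not DPMON's documented output schema.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Toy stand-in for the DataFrame returned by DPMON; the column names are assumptions.
toy_predictions = pd.DataFrame(
    {'actual': [1, 0], 'predicted': [1, 1]},
    index=pd.Index(['Sample1', 'Sample2'], name='sample'),
)

# Write to a temporary directory so the example leaves the working directory untouched.
output_dir = Path(tempfile.mkdtemp())
output_path = output_dir / 'dpmon_predictions.csv'
toy_predictions.to_csv(output_path)

# Reload to confirm the round trip and compute a simple agreement rate.
reloaded = pd.read_csv(output_path, index_col='sample')
accuracy = (reloaded['actual'] == reloaded['predicted']).mean()
print(f"Saved to {output_path}; accuracy on toy data: {accuracy:.2f}")
```

In a real run, replace `toy_predictions` with the DataFrame returned by `dpmon_instance.run()` and choose a permanent output path.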
    

Running the Example:

To execute the complete workflow:

Complete SmCCNet + DPMON Hybrid Workflow.
"""
Example 3: Disease Prediction Using Graph Information (SmCCNet + Disease Prediction using Multi-Omics Networks (DPMON))
======================================================================================================================

This script demonstrates a workflow that first generates a network with Sparse Multiple Canonical Correlation Network
(SmCCNet) and then passes the resulting adjacency matrix to Disease Prediction using Multi-Omics Networks (DPMON), a
pipeline that uses Graph Neural Networks (GNNs) to predict disease phenotypes.

Steps:
1. Generate an adjacency matrix using SmCCNet based on multi-omics and phenotype data.
2. Utilize DPMON to predict disease phenotypes using the network information and omics data.
"""

import pandas as pd
from bioneuralnet.graph_generation import SmCCNet
from bioneuralnet.downstream_task import DPMON

def run_smccnet_dpmon_workflow(omics_proteins: pd.DataFrame,
                               omics_metabolites: pd.DataFrame,
                               phenotype_data: pd.Series,
                               clinical_data: pd.DataFrame) -> pd.DataFrame:
    """
    Executes the hybrid workflow combining SmCCNet for network generation and DPMON for disease prediction.

    This function performs the following steps:
        1. Generates an adjacency matrix using SmCCNet.
        2. Initializes and runs DPMON for disease prediction based on the adjacency matrix.
        3. Returns the disease prediction results.

    Args:
        omics_proteins (pd.DataFrame): DataFrame containing protein data.
        omics_metabolites (pd.DataFrame): DataFrame containing metabolite data.
        phenotype_data (pd.Series): Series containing phenotype information.
        clinical_data (pd.DataFrame): DataFrame containing clinical data.

    Returns:
        pd.DataFrame: Disease prediction results from DPMON.
    """
    try:
        smccnet_instance = SmCCNet(
            phenotype_data=phenotype_data,
            omics_data=pd.concat([omics_proteins, omics_metabolites], axis=1),
            data_types=['protein', 'metabolite'],
            kfold=5,
            summarization='PCA',
            seed=732,
        )
        adjacency_matrix = smccnet_instance.run()
        print("Adjacency matrix generated using SmCCNet.")

        dpmon_instance = DPMON(
            adjacency_matrix=adjacency_matrix,
            omics_list=[omics_proteins, omics_metabolites],
            phenotype_data=phenotype_data,
            features_data=clinical_data,
            model='GCN',
            tune=False,
            gpu=False
        )

        predictions_df = dpmon_instance.run()
        if not predictions_df.empty:
            print("DPMON workflow completed successfully. Predictions generated.")
        else:
            print("DPMON hyperparameter tuning completed. No predictions were generated.")

        return predictions_df

    except Exception as e:
        print(f"An error occurred during the SmCCNet + DPMON workflow: {e}")
        raise

if __name__ == "__main__":
    try:
        print("Starting SmCCNet + DPMON Hybrid Workflow...")

        omics_proteins = pd.DataFrame({
            'protein_feature1': [0.1, 0.2],
            'protein_feature2': [0.3, 0.4]
        }, index=['Sample1', 'Sample2'])

        omics_metabolites = pd.DataFrame({
            'metabolite_feature1': [0.5, 0.6],
            'metabolite_feature2': [0.7, 0.8]
        }, index=['Sample1', 'Sample2'])

        phenotype_data = pd.Series([1, 0], index=['Sample1', 'Sample2'])

        clinical_data = pd.DataFrame({
            'clinical_feature1': [5, 3],
            'clinical_feature2': [7, 2]
        }, index=['Sample1', 'Sample2'])

        predictions = run_smccnet_dpmon_workflow(omics_proteins, omics_metabolites, phenotype_data, clinical_data)

        print("DPMON Predictions:")
        print(predictions)

        print("Hybrid Workflow completed successfully.\n")
    except Exception as e:
        print(f"An error occurred during the execution: {e}")
        raise

Result Interpretation:

  • Adjacency Matrix: The network constructed from the multi-omics data. Each entry indicates whether two features are related and how strongly.

  • Disease Predictions: Predicted disease phenotypes derived from the integrated omics and network data, which can support biomarker discovery and patient stratification.
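
To make the adjacency matrix easier to read, one option is to list its strongest feature pairs. The sketch below assumes a symmetric, feature-labeled pandas DataFrame, which is an assumption about the matrix format. It uses toy values rather than real SmCCNet output, and `top_edges` is a hypothetical helper, not a BioNeuralNet function.

```python
import numpy as np
import pandas as pd


def top_edges(adjacency, n=3):
    """List the n largest-weight feature pairs from a symmetric adjacency matrix."""
    values = adjacency.to_numpy()
    # Upper triangle only (k=1), so each undirected edge appears once and self-loops are skipped.
    rows, cols = np.triu_indices_from(values, k=1)
    edges = pd.DataFrame({
        'source': adjacency.index[rows],
        'target': adjacency.columns[cols],
        'weight': values[rows, cols],
    })
    return edges.sort_values('weight', ascending=False).head(n).reset_index(drop=True)


features = ['protein_feature1', 'protein_feature2', 'metabolite_feature1']

# Toy stand-in values, NOT real SmCCNet output.
toy_adjacency = pd.DataFrame(
    [[0.0, 0.8, 0.1],
     [0.8, 0.0, 0.4],
     [0.1, 0.4, 0.0]],
    index=features, columns=features,
)

strongest = top_edges(toy_adjacency, n=2)
print(strongest)
```

High-weight pairs that span omics types (for example, a protein and a metabolite) are natural starting points when looking for candidate biomarkers.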