Integration

Core integration

GIMME

GIMME()

Bases: RemovableGeneDataIntegrator

integrate

integrate(model, data, **kwargs)

Integrate the given data with the model.

Parameters:

Name Type Description Default
model

The model to be integrated with the data

required
data

Gene data used to determine the objective function of GIMME

required
kwargs

Keyword arguments passed to apply_GIMME

{}

Returns:

Name Type Description
result GIMMEAnalysis

apply_FASTCORE

apply_FASTCORE(
    C: Union[List[str], Set[str]],
    nonP: Union[List[str], Set[str]],
    model: Model,
    epsilon: float,
    return_model: bool,
    copy_model: bool = True,
    raise_err: bool = True,
    rxn_scaling_coefs: dict = None,
    calc_efficacy: bool = True,
) -> FASTCOREAnalysis

Apply the FASTCORE algorithm to extract a flux-consistent subnetwork.

FASTCORE identifies a minimal set of reactions from a given metabolic model that includes a defined set of core reactions (C) while ensuring flux consistency. Non-penalty reactions (nonP) can be included without affecting the objective function during the sparse mode search.

Parameters:

Name Type Description Default
C list[str] or set[str]

Core reaction IDs that must be included and carry flux.

required
nonP list[str] or set[str]

Non-penalty reaction IDs. Included if needed but not prioritized.

required
model Model

Input genome-scale metabolic model.

required
epsilon float

Tolerance threshold for flux consistency checks. Flux values below this are considered zero. Adjusted per reaction if rxn_scaling_coefs is provided.

required
return_model bool

If True, return the extracted subnetwork as a cobra.Model object.

required
copy_model bool

If True (default), operate on a copy of the input model. If False, modify the input model directly.

True
raise_err bool

If True (default), raise ValueError on inconsistency. If False, warn and potentially remove problematic core reactions.

True
rxn_scaling_coefs dict

Mapping of reaction IDs to scaling coefficients to adjust epsilon per reaction. Defaults to None.

None
calc_efficacy bool

If True (default), calculate efficacy metrics.

True

Returns:

Type Description
FASTCOREAnalysis

An object containing the results:
- result_model (cobra.Model, optional): Extracted subnetwork (if return_model is True).
- kept_rxn_ids (np.ndarray): Reaction IDs included in the subnetwork.
- removed_rxn_ids (np.ndarray): Reaction IDs excluded from the subnetwork.
- algo_efficacy (dict, optional): Efficacy metrics (if calc_efficacy is True).
- log (dict): Algorithm parameters like epsilon.

Raises:

Type Description
ValueError

If raise_err is True and an inconsistency prevents including all core reactions.

Notes

Based on the algorithm described in: Vlassis, N., Pacheco, M. P., & Sauter, T. (2014). Fast reconstruction of compact context-specific metabolic network models. PLoS computational biology, 10(1), e1003424.

apply_CORDA

apply_CORDA(
    model,
    data,
    protected_rxns=None,
    predefined_threshold=None,
    threshold_kws=None,
    rxn_scaling_coefs=None,
    discrete_strategy_name: str = "linear",
    n_iters=np.inf,
    penalty_factor=100,
    penalty_increase_factor=1.1,
    keep_if_support=5,
    met_prod=None,
    upper_bound=1000000.0,
    threshold=1e-06,
    support_flux_value=1,
    skip_last_step=True,
) -> CORDA_Analysis

Apply the CORDA algorithm to generate a context-specific metabolic model.

Orchestrates the CORDA process:
1. Prepare model and data (thresholds, confidence scores, protected reactions).
2. Initialize and run CORDABuilder.
3. Calculate efficacy metrics.
4. Return results in a CORDA_Analysis object.

Parameters:

Name Type Description Default
model Model

Input genome-scale metabolic model.

required
data object

Object with reaction scores (data.rxn_scores) and optionally gene data (data.gene_data) if using gene-based thresholds.

required
protected_rxns list[str]

Reaction IDs to force into the core set (confidence 3). Defaults to None.

None
predefined_threshold str

Name of predefined thresholding strategy (e.g., 'percentile_90'). Requires data.gene_data. Defaults to None.

None
threshold_kws dict

Additional keyword arguments for the thresholding function (used with predefined_threshold). Defaults to None.

None
rxn_scaling_coefs dict

Mapping of reaction IDs to scaling coefficients to adjust flux thresholds. Defaults to None.

None
discrete_strategy_name str

Strategy to convert continuous scores to discrete confidence levels ('linear'). Defaults to "linear".

'linear'
n_iters int

Max iterations for finding support reactions in CORDABuilder. Defaults to infinity.

inf
penalty_factor float

Initial penalty factor in CORDABuilder. Defaults to 100.

100
penalty_increase_factor float

Penalty increase factor in CORDABuilder. Defaults to 1.1.

1.1
keep_if_support int

Support threshold for elevating medium confidence in CORDABuilder. Defaults to 5.

5
met_prod list[str]

Metabolite IDs for which mock production reactions should be added and forced into the core set. Defaults to None.

None
upper_bound float

High upper bound for reactions during optimization. Defaults to 1e6.

1000000.0
threshold float

Flux threshold below which flux is considered zero. Defaults to 1e-6.

1e-06
support_flux_value float or dict

Minimum flux required for support reactions in CORDABuilder. Defaults to 1.

1
skip_last_step bool

Whether to skip the final refinement step in CORDABuilder. Defaults to True.

True

Returns:

Type Description
CORDA_Analysis

Object containing results: context-specific model, confidence scores, removed reactions, efficacy metrics, and logs.

Note

Original paper: Schultz, A., & Qutub, A. A. (2016). Reconstruction of tissue-specific metabolic networks using CORDA. PLoS computational biology, 12(3), e1004808.

apply_rFASTCORMICS

apply_rFASTCORMICS(
    model: Model,
    data,
    protected_rxns: List[str] = None,
    predefined_threshold: Optional[
        Union[dict, analysis_types]
    ] = None,
    threshold_kws: dict = None,
    rxn_scaling_coefs: dict = None,
    consistent_checking_method: Literal[
        "FASTCC", "FVA"
    ] = "FASTCC",
    unpenalized_subsystem: Union[
        str, List[str]
    ] = "Transport.*",
    method: str = "onestep",
    threshold: float = 1e-06,
    FASTCORE_raise_error: bool = False,
    calc_efficacy: bool = True,
) -> rFASTCORMICSAnalysis

Apply the rFASTCORMICS algorithm to build a context-specific model.

Leverages expression data to define core/non-core reaction sets and uses FASTCORE to extract a consistent subnetwork. Optionally includes model consistency checking and handling of protected reactions and unpenalized subsystems.

Parameters:

Name Type Description Default
model Model

Input genome-scale metabolic model.

required
data object

Object with gene expression data (data.gene_data) and reaction scores (data.rxn_scores).

required
protected_rxns list[str]

Reaction IDs always included in the core set. Defaults to None.

None
predefined_threshold dict or analysis_types

Strategy or dictionary defining thresholds to classify reactions based on scores (e.g., 'percentile_90'). See pipeGEM.integration.utils.parse_predefined_threshold. Defaults to None.

None
threshold_kws dict

Additional keyword arguments for the thresholding function. Defaults to None.

None
rxn_scaling_coefs dict

Mapping of reaction IDs to scaling coefficients to adjust flux thresholds in FASTCORE. Defaults to None.

None
consistent_checking_method (FASTCC, FVA)

Method to ensure initial model consistency ('FASTCC' or 'FVA'). Set to None to skip. Defaults to "FASTCC".

'FASTCC'
unpenalized_subsystem str or list[str]

Subsystem name(s) (regex allowed) included in the non-penalty set (nonP) during FASTCORE. Defaults to "Transport.*".

'Transport.*'
method (onestep, twostep)

rFASTCORMICS variant:
- 'onestep': Run FASTCORE once with core and non-penalty sets.
- 'twostep': Run FASTCORE on protected reactions, refine, then run again on the expanded core set. (May need validation.)

Defaults to "onestep".

'onestep'
threshold float

Flux threshold below which flux is considered zero. Defaults to 1e-6.

1e-06
FASTCORE_raise_error bool

If True, FASTCORE raises error on inconsistency. If False, warns. Defaults to False.

False
calc_efficacy bool

If True, calculate efficacy metrics based on expression-defined sets. Defaults to True.

True

Returns:

Type Description
rFASTCORMICSAnalysis

Object containing results: context-specific model (in nested FASTCORE result), core/non-core sets, thresholding analysis, efficacy metrics.

Notes

Original paper: Pacheco, M. P., Bintener, T., Ternes, D., Kulms, D., Haan, S., Letellier, E., & Sauter, T. (2019). Identifying and targeting cancer-specific metabolism with network-based drug target prediction. EBioMedicine, 43, 98-106.
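As a rough illustration of how unpenalized_subsystem patterns could select the non-penalty set (nonP), the sketch below matches subsystem names against one or more regular expressions. The reaction-to-subsystem dictionary, the helper name select_unpenalized, and the choice of re.match (anchored at the start of the name) are assumptions made for this example; they are not taken from pipeGEM's implementation.

```python
import re
from typing import Dict, List, Union


def select_unpenalized(
    rxn_subsystems: Dict[str, str],
    patterns: Union[str, List[str]] = "Transport.*",
) -> List[str]:
    """Return sorted IDs of reactions whose subsystem matches any pattern."""
    if isinstance(patterns, str):
        patterns = [patterns]
    compiled = [re.compile(p) for p in patterns]
    return sorted(
        rxn_id
        for rxn_id, subsystem in rxn_subsystems.items()
        # Assumption: a pattern must match from the start of the subsystem name.
        if any(c.match(subsystem) for c in compiled)
    )


# Hypothetical reaction IDs and subsystem labels.
subsystems = {
    "R1": "Transport, mitochondrial",
    "R2": "Glycolysis",
    "R3": "Transport, extracellular",
}
non_penalty = select_unpenalized(subsystems)
print(non_penalty)
```

Passing a list such as ["Transport.*", "Exchange.*"] would widen the non-penalty set in the same way.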

apply_iMAT

apply_iMAT(
    model,
    data,
    predefined_threshold,
    threshold_kws: dict,
    protected_rxns=None,
    rxn_scaling_coefs=None,
    eps=1e-06,
    tol=1e-06,
    use_gurobi=False,
) -> iMAT_Analysis

Apply the iMAT algorithm to generate a context-specific metabolic model.

iMAT (integrative Metabolic Analysis Tool) uses gene expression data to classify reactions into high-confidence (core) and low-confidence (non-core) sets. It then solves a mixed-integer linear programming (MILP) problem to find a flux distribution that maximizes activity through core reactions while minimizing activity through non-core reactions. Reactions with near-zero flux in the optimal solution are removed.

Parameters:

Name Type Description Default
model Model

The input genome-scale metabolic model.

required
data object

An object containing gene expression data (data.gene_data) and reaction scores (data.rxn_scores) derived from it.

required
predefined_threshold dict or analysis_types

Strategy or dictionary defining thresholds (exp_th, non_exp_th) to classify reactions based on scores. See pipeGEM.integration.utils.parse_predefined_threshold.

required
threshold_kws dict

Additional keyword arguments for the thresholding function.

required
protected_rxns list[str]

A list of reaction IDs that should always be treated as high-confidence (core) and potentially weighted higher in the objective. Defaults to None.

None
rxn_scaling_coefs dict[str, float]

Dictionary mapping reaction IDs to scaling coefficients. Currently unused in the main logic but potentially used for tolerance adjustment. Defaults to None.

None
eps float

Small flux value used in constraints to enforce activity through core reactions selected by the MILP. Defaults to 1e-6.

1e-06
tol float

Flux tolerance threshold. Reactions with absolute flux below this value in the MILP solution are removed from the final model. Defaults to 1e-6.

1e-06
use_gurobi bool

If True, use Gurobi-specific indicator constraints for potentially better performance. Requires Gurobi solver. Defaults to False.

False

Returns:

Type Description
iMAT_Analysis

An object containing the results:
- result_model (cobra.Model): The final context-specific model.
- removed_rxn_ids (np.ndarray): IDs of removed reactions.
- threshold_analysis (ThresholdAnalysis): Details of thresholding used.

Notes

Based on the algorithm described in: Shlomi, T., Cabili, M. N., Herrgård, M. J., Palsson, B. Ø., & Ruppin, E. (2008). Network-based prediction of human tissue-specific metabolism. Nature biotechnology, 26(9), 1003-1010. The implementation uses binary indicator variables to control reaction activity.
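The two bookkeeping steps around the MILP, classifying reactions by threshold and dropping near-zero fluxes, can be sketched in plain Python. This is only an illustration: the helper names, the inclusive comparisons against exp_th and non_exp_th, and the example values are assumptions, and the MILP itself is not shown.

```python
import math
from typing import Dict, List, Tuple


def classify_reactions(
    rxn_scores: Dict[str, float], exp_th: float, non_exp_th: float
) -> Tuple[List[str], List[str]]:
    """Split reactions into core (score >= exp_th) and non-core (score <= non_exp_th)."""
    core, non_core = [], []
    for rxn_id, score in rxn_scores.items():
        if math.isnan(score):
            continue  # no expression evidence: leave the reaction unclassified
        if score >= exp_th:
            core.append(rxn_id)
        elif score <= non_exp_th:
            non_core.append(rxn_id)
    return core, non_core


def reactions_to_remove(fluxes: Dict[str, float], tol: float = 1e-6) -> List[str]:
    """Reactions whose absolute flux in the solution falls below tol."""
    return [rxn_id for rxn_id, flux in fluxes.items() if abs(flux) < tol]


# Hypothetical scores and a hypothetical MILP flux solution.
scores = {"R1": 9.0, "R2": 1.0, "R3": 5.0, "R4": float("nan")}
core, non_core = classify_reactions(scores, exp_th=8.0, non_exp_th=2.0)
removed = reactions_to_remove({"R1": 2.5, "R2": 0.0, "R3": -1e-9})
print(core, non_core, removed)
```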

apply_mCADRE

apply_mCADRE(
    model,
    data,
    protected_rxns,
    predefined_threshold=None,
    threshold_kws: dict = None,
    rxn_scaling_coefs: dict = None,
    exp_cutoff: float = 0.9,
    absent_value: float = 0,
    absent_value_indicator: float = -1e-06,
    tol=1e-06,
    eta=0.333,
    evidence_scores: Union[
        Dict[str, Union[int, float]], Series
    ] = None,
    salvage_check_tasks=None,
    default_salv_test=False,
    func_test_tasks=None,
    required_met_ids=None,
    default_func_test=False,
) -> mCADRE_Analysis

Apply the mCADRE algorithm to generate a context-specific metabolic model.

mCADRE (metabolic Context-specificity Assessed by Deterministic Reaction Evaluation) builds context-specific models by iteratively removing reactions based on expression data, network connectivity, and optional evidence scores, while ensuring the model can still perform essential metabolic functions (tasks).

Parameters:

Name Type Description Default
model Model

The input genome-scale metabolic model.

required
data object

An object containing gene expression data (data.gene_data) and reaction scores (data.rxn_scores) derived from it.

required
protected_rxns list[str]

A list of reaction IDs that should never be removed from the model.

required
predefined_threshold dict or analysis_types

Strategy or dictionary defining thresholds (exp_th, non_exp_th) to classify reactions based on scores. See pipeGEM.integration.utils.parse_predefined_threshold. Defaults to None.

None
threshold_kws dict

Additional keyword arguments for the thresholding function. Defaults to None.

None
rxn_scaling_coefs dict[str, float]

Dictionary mapping reaction IDs to scaling coefficients, used to adjust consistency check tolerance. Defaults to None.

None
exp_cutoff float

Expression score threshold (after mapping to [0, 1]) above which a reaction is considered part of the initial 'core' set. Defaults to 0.9.

0.9
absent_value float

The raw score in data.rxn_scores that indicates a reaction is absent (e.g., 0 for some expression data types). Defaults to 0.

0
absent_value_indicator float

The internal score assigned to absent reactions after mapping. Should be less than 0. Defaults to -1e-6.

-1e-06
tol float

Tolerance used for consistency checks (e.g., FASTCC). Defaults to 1e-6.

1e-06
eta float

Weighting factor used in the consistency check stopping criteria when evaluating removal of medium-confidence reactions. Represents the trade-off between removing non-core reactions and keeping core reactions. Defaults to 0.333.

0.333
evidence_scores dict[str, Union[int, float]] or Series

Additional evidence scores for reactions (e.g., from literature, proteomics). Higher scores favor keeping the reaction. Defaults to None (all zero).

None
salvage_check_tasks TaskContainer or str

Metabolic tasks (e.g., salvage pathways) that the final model must be able to perform. Can be a TaskContainer object or a path to a task file. Defaults to None.

None
default_salv_test bool

If True, use predefined default salvage pathway tasks (Guanine -> GMP, Hypoxanthine -> IMP). Defaults to False.

False
func_test_tasks TaskContainer or str

General metabolic function tasks that the final model must be able to perform. Defaults to None.

None
required_met_ids list[str]

List of metabolite IDs that the model must be able to produce (used if default_func_test is True). Defaults to None.

None
default_func_test bool

If True and required_met_ids is provided, use predefined default functional tasks (production of each required metabolite from glucose). Defaults to False.

False

Returns:

Type Description
mCADRE_Analysis

An object containing the results:
- result_model (cobra.Model): The final context-specific model.
- removed_rxn_ids (np.ndarray): IDs of removed reactions.
- core_rxn_ids (np.ndarray): IDs of reactions initially defined as core.
- non_expressed_rxn_ids (np.ndarray): IDs of reactions initially defined as non-expressed.
- score_df (pd.DataFrame): DataFrame with expression, connectivity, and evidence scores.
- salvage_test_result (TaskAnalysis or None): Results of salvage pathway tests.
- func_test_result (TaskAnalysis or None): Results of functional tests.
- threshold_analysis (ThresholdAnalysis): Details of thresholding used.
- algo_efficacy (float): Efficacy score (e.g., F1) comparing the final model against the initial core/non-core sets.

Raises:

Type Description
RuntimeError

If the initial model fails any of the provided functional or salvage tests.

Notes

Based on the algorithm described in: Wang, Y., Eddy, J. A., & Price, N. D. (2012). Reconstruction of genome-scale metabolic models for 126 human tissues using mCADRE. BMC systems biology, 6, 153.
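To make the roles of exp_cutoff, absent_value, and absent_value_indicator concrete, here is a minimal sketch of one way raw scores could be mapped to [0, 1] and split into initial sets. The min-max scaling, the helper name map_expression_scores, and the example scores are assumptions for illustration; the mapping actually used by the library may differ.

```python
from typing import Dict


def map_expression_scores(
    rxn_scores: Dict[str, float],
    absent_value: float = 0,
    absent_value_indicator: float = -1e-6,
) -> Dict[str, float]:
    """Min-max scale present scores to [0, 1]; absent reactions get a negative marker."""
    present = {r: s for r, s in rxn_scores.items() if s != absent_value}
    if not present:
        return {r: absent_value_indicator for r in rxn_scores}
    lo, hi = min(present.values()), max(present.values())
    span = (hi - lo) or 1.0  # avoid division by zero when all scores are equal
    return {
        r: absent_value_indicator if s == absent_value else (s - lo) / span
        for r, s in rxn_scores.items()
    }


# Hypothetical raw reaction scores; 0 marks an absent reaction.
raw = {"R1": 0, "R2": 2.0, "R3": 10.0, "R4": 9.5}
mapped = map_expression_scores(raw)
exp_cutoff = 0.9
core = sorted(r for r, s in mapped.items() if s > exp_cutoff)
non_expressed = sorted(r for r, s in mapped.items() if s < 0)
print(core, non_expressed)
```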

apply_INIT

apply_INIT(
    model,
    data,
    predefined_threshold,
    threshold_kws: dict,
    protected_rxns=None,
    eps=1e-06,
    tol=1e-06,
    weight_method: Literal[
        "default", "threshold"
    ] = "threshold",
    rxn_scaling_coefs: dict = None,
) -> INIT_Analysis

Apply the INIT algorithm to generate a context-specific metabolic model.

INIT (Integrative Network Inference for Tissues) uses expression data to assign weights to reactions. It then solves a mixed-integer linear programming (MILP) problem, similar to iMAT, to find a flux distribution that maximizes the sum of weights for active reactions. Reactions with near-zero flux in the optimal solution are removed.

Parameters:

Name Type Description Default
model Model

The input genome-scale metabolic model.

required
data object

An object containing gene expression data (data.gene_data) and reaction scores (data.rxn_scores) derived from it.

required
predefined_threshold dict or analysis_types

Strategy or dictionary defining thresholds (exp_th, non_exp_th) used for weight calculation if weight_method is 'threshold'. See pipeGEM.integration.utils.parse_predefined_threshold.

required
threshold_kws dict

Additional keyword arguments for the thresholding function if weight_method is 'threshold'.

required
protected_rxns list[str]

A list of reaction IDs that should always be treated as core reactions and potentially assigned a high weight. Defaults to None.

None
eps float

Small flux value used in constraints to enforce activity through core reactions selected by the MILP (inherited from iMAT constraints). Defaults to 1e-6.

1e-06
tol float

Flux tolerance threshold. Reactions with absolute flux below this value in the MILP solution are removed from the final model. Defaults to 1e-6.

1e-06
weight_method (default, threshold)

Method to calculate reaction weights from scores:
- 'default': Uses 5 * log(score).
- 'threshold': Uses linear interpolation based on exp_th and non_exp_th.

Defaults to "threshold".

'threshold'
rxn_scaling_coefs dict[str, float]

Dictionary mapping reaction IDs to scaling coefficients. Currently unused in the main logic but potentially used for tolerance adjustment. Defaults to None.

None

Returns:

Type Description
INIT_Analysis

An object containing the results:
- result_model (cobra.Model): The final context-specific model.
- removed_rxn_ids (np.ndarray): IDs of removed reactions.
- threshold_analysis (ThresholdAnalysis or None): Details of thresholding used if weight_method was 'threshold'.
- weight_dic (dict): Dictionary of calculated weights used in the objective.
- fluxes (pd.DataFrame): DataFrame of absolute fluxes from the MILP solution.

Notes

Based on the algorithm described in: Agren, R., Bordel, S., Mardinoglu, A., Pornputtapong, N., Nookaew, I., & Nielsen, J. (2012). Reconstruction of genome-scale active metabolic networks for 69 human cell types and 16 cancer types using INIT. PLoS computational biology, 8(5), e1002518. The implementation leverages the MILP formulation structure from iMAT.
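The two weight_method options can be sketched as small functions. The 'default' formula 5 * log(score) comes from the parameter description; the natural logarithm is an assumption. The 'threshold' variant below is one possible linear interpolation (non_exp_th maps to -1, exp_th maps to +1, clipped outside that range) chosen purely for illustration, since the exact interpolation is not specified here.

```python
import math


def init_weight_default(score: float) -> float:
    """'default' weighting: 5 * log(score); requires score > 0."""
    return 5.0 * math.log(score)


def init_weight_threshold(score: float, exp_th: float, non_exp_th: float) -> float:
    """Assumed 'threshold' weighting: linear between the thresholds, clipped to [-1, 1]."""
    if exp_th <= non_exp_th:
        raise ValueError("exp_th must be greater than non_exp_th")
    midpoint = (exp_th + non_exp_th) / 2
    half_range = (exp_th - non_exp_th) / 2
    weight = (score - midpoint) / half_range
    return max(-1.0, min(1.0, weight))


# Hypothetical thresholds and scores.
for s in (1.0, 2.0, 6.0, 10.0, 50.0):
    print(s, init_weight_default(s), init_weight_threshold(s, exp_th=10.0, non_exp_th=2.0))
```

Under either scheme, positive weights reward keeping a reaction active and negative weights reward removing it.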

apply_MBA

apply_MBA(
    model,
    data=None,
    predefined_threshold=None,
    threshold_kws: dict = None,
    protected_rxns=None,
    rxn_scaling_coefs: dict = None,
    medium_conf_rxn_ids=None,
    high_conf_rxn_ids=None,
    consistent_checking_method: str = "FASTCC",
    tolerance: float = 1e-08,
    epsilon: float = 0.33,
    random_state: int = 42,
)

Apply the Model Building Algorithm (MBA) to generate a context-specific model.

MBA iteratively removes reactions with no confidence score ('no-confidence' set) based on consistency checks, while preserving high-confidence reactions and minimizing the removal of medium-confidence reactions.

Parameters:

Name Type Description Default
model Model

The input genome-scale metabolic model.

required
data object

An object containing gene expression data (data.gene_data) and reaction scores (data.rxn_scores). If provided, medium_conf_rxn_ids and high_conf_rxn_ids are derived from this data using thresholds. Defaults to None.

None
predefined_threshold dict or analysis_types

Strategy or dictionary defining thresholds (exp_th, non_exp_th) to classify reactions based on scores when data is provided. See pipeGEM.integration.utils.parse_predefined_threshold. Defaults to None.

None
threshold_kws dict

Additional keyword arguments for the thresholding function when data is provided. Defaults to None.

None
protected_rxns list[str]

A list of reaction IDs that should always be treated as high-confidence and never removed. Defaults to None.

None
rxn_scaling_coefs dict[str, float]

Dictionary mapping reaction IDs to scaling coefficients, used to adjust consistency check tolerance. Defaults to None.

None
medium_conf_rxn_ids list[str]

List of reaction IDs considered medium confidence. Used only if data is None. Defaults to None.

None
high_conf_rxn_ids list[str]

List of reaction IDs considered high confidence. Used only if data is None. Defaults to None.

None
consistent_checking_method str

Method used for consistency checks (e.g., 'FASTCC'). Defaults to "FASTCC".

'FASTCC'
tolerance float

Tolerance used for consistency checks. Defaults to 1e-8.

1e-08
epsilon float

Weighting factor used in the consistency check stopping criteria. Represents the maximum allowed ratio of removed medium-confidence reactions to removed no-confidence reactions during the removal check of a no-confidence reaction. Defaults to 0.33.

0.33
random_state int

Seed for the random number generator used to shuffle the order of no-confidence reactions being tested for removal. Defaults to 42.

42

Returns:

Type Description
MBA_Analysis

An object containing the results:
- result_model (cobra.Model): The final context-specific model.
- removed_rxn_ids (np.ndarray): IDs of removed reactions.
- threshold_analysis (ThresholdAnalysis or None): Details of thresholding used if data was provided.
- algo_efficacy (float): Efficacy score comparing the final model against the initial high/no-confidence sets.

Raises:

Type Description
AssertionError

If data is None and either medium_conf_rxn_ids or high_conf_rxn_ids contain IDs not present in the model.

Notes

Based on the algorithm described in: Jerby, L., Shlomi, T., & Ruppin, E. (2010). Computational reconstruction of tissue-specific metabolic models: application to human tissues. Molecular systems biology, 6(1), 401.
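The role of epsilon in the removal check can be illustrated with a small acceptance test. The sketch assumes that a consistency check has already reported which reactions would be lost if a candidate no-confidence reaction were removed; the helper name accept_removal, the non-strict comparison, and the example IDs are assumptions rather than the library's code.

```python
from typing import Iterable, Set


def accept_removal(
    would_be_removed: Iterable[str],
    high_conf: Set[str],
    medium_conf: Set[str],
    epsilon: float = 0.33,
) -> bool:
    """Accept a removal only if no high-confidence reaction is lost and the
    medium-to-no-confidence loss ratio stays within epsilon."""
    removed = set(would_be_removed)
    if removed & high_conf:
        return False  # high-confidence reactions must be preserved
    n_medium = len(removed & medium_conf)
    n_no_conf = len(removed - medium_conf - high_conf)
    if n_no_conf == 0:
        return False  # nothing from the no-confidence set would be removed
    return n_medium / n_no_conf <= epsilon


# Hypothetical confidence sets.
high = {"H1"}
medium = {"M1", "M2"}
print(accept_removal({"N1", "N2", "N3", "N4", "M1"}, high, medium))  # ratio 1/4
print(accept_removal({"N1", "N2", "N3", "M1"}, high, medium))        # ratio 1/3
print(accept_removal({"N1", "H1"}, high, medium))                    # loses H1
```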

apply_GIMME

apply_GIMME(
    model: Model,
    rxn_expr_score: Dict[str, float],
    high_exp: float,
    protected_rxns=None,
    obj_frac: float = 0.8,
    remove_zero_fluxes: bool = False,
    flux_threshold: float = 1e-06,
    max_inconsistency_score=1000.0,
    return_fluxes: bool = True,
    keep_context: bool = False,
    rxn_scaling_coefs: dict = None,
    predefined_threshold=None,
)

Apply the GIMME algorithm to generate a context-specific metabolic model.

GIMME (Gene Inactivity Moderated by Metabolism and Expression) assumes that cellular metabolism aims to achieve a required metabolic functionality (defined by the model's objective function) with minimal deviation from a reference expression state. It minimizes the flux through reactions with expression below a threshold, subject to maintaining a certain level of the original objective function.

Parameters:

Name Type Description Default
model Model

The input genome-scale metabolic model with a defined objective function representing the required metabolic functionality.

required
rxn_expr_score dict[str, float]

Dictionary mapping reaction IDs to their expression scores. NaN values are ignored.

required
high_exp float

Expression score threshold. Reactions with scores below this threshold are penalized in the GIMME objective function.

required
protected_rxns list[str]

List of reaction IDs that should not be penalized, even if their expression is below high_exp. Defaults to None.

None
obj_frac float

Fraction of the original model's optimal objective value that must be maintained by the GIMME solution. Defaults to 0.8.

0.8
remove_zero_fluxes bool

If True, create a result_model by removing reactions with flux below flux_threshold in the GIMME solution. Defaults to False.

False
flux_threshold float

Flux threshold used when remove_zero_fluxes is True. Defaults to 1e-6.

1e-06
max_inconsistency_score float

Value to cap the penalty applied to low-expression reactions to handle potential numerical issues with very low scores. Defaults to 1e3.

1000.0
return_fluxes bool

If True, include the GIMME flux distribution in the result object. Defaults to True.

True
keep_context bool

If True, modify the input model by adding the GIMME objective and constraining the original objective. If False (default), modifications happen within a context manager.

False
rxn_scaling_coefs dict[str, float]

Dictionary mapping reaction IDs to scaling coefficients, used to adjust objective weights and the removal flux_threshold. Defaults to None (all coeffs 1).

None
predefined_threshold any

This parameter is currently ignored by GIMME. Defaults to None.

None

Returns:

Type Description
GIMMEAnalysis

An object containing the results:
- rxn_coefficients (dict): Dictionary of objective coefficients (penalties) applied to low-expression reactions.
- rxn_scores (dict): The input reaction expression scores.
- flux_result (pd.DataFrame or None): GIMME flux distribution if return_fluxes is True.
- result_model (cobra.Model or None): Pruned model if remove_zero_fluxes is True, otherwise None.

Notes

Based on the algorithm described in: Becker, S. A., & Palsson, B. Ø. (2008). Context-specific metabolic networks are consistent with experiments. PLoS computational biology, 4(5), e1000082. The objective function minimizes the sum of fluxes weighted by (high_exp - score) for reactions with score < high_exp.
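The penalty rule in the note above, a weight of (high_exp - score) for reactions with score < high_exp, can be sketched as follows. The helper names and example values are illustrative assumptions; the handling of NaN scores, protected reactions, and the max_inconsistency_score cap follows the parameter descriptions, but the library's actual code is not reproduced here.

```python
import math
from typing import Dict, Iterable, Optional


def gimme_penalties(
    rxn_expr_score: Dict[str, float],
    high_exp: float,
    protected_rxns: Optional[Iterable[str]] = None,
    max_inconsistency_score: float = 1e3,
) -> Dict[str, float]:
    """Penalty per reaction: (high_exp - score) when score < high_exp, capped."""
    protected = set(protected_rxns or [])
    penalties = {}
    for rxn_id, score in rxn_expr_score.items():
        if rxn_id in protected or math.isnan(score):
            continue  # protected reactions and NaN scores are never penalized
        if score < high_exp:
            penalties[rxn_id] = min(high_exp - score, max_inconsistency_score)
    return penalties


def inconsistency_score(penalties: Dict[str, float], fluxes: Dict[str, float]) -> float:
    """Sum of |flux| weighted by penalty: the quantity GIMME minimizes."""
    return sum(w * abs(fluxes.get(rxn_id, 0.0)) for rxn_id, w in penalties.items())


# Hypothetical scores and fluxes.
scores = {"R1": 2.0, "R2": 8.0, "R3": float("nan"), "R4": -5000.0, "R5": 1.0}
penalties = gimme_penalties(scores, high_exp=5.0, protected_rxns=["R5"])
total = inconsistency_score(penalties, {"R1": -2.0, "R4": 0.0})
print(penalties, total)
```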

apply_RIPTiDe_pruning

apply_RIPTiDe_pruning(
    model,
    rxn_expr_score: Dict[str, float],
    max_gw: float = None,
    obj_frac: float = 0.8,
    threshold: float = 1e-06,
    protected_rxns=None,
    max_inconsistency_score=1000.0,
    rxn_scaling_coefs: Dict[str, float] = None,
    **kwargs
)

Apply the pruning step of the RIPTiDe algorithm.

This step uses parsimonious Flux Balance Analysis (pFBA) with weights derived from reaction expression scores (or RALs - Reaction Activity Levels) to identify and remove low-flux reactions, creating a pruned, context-specific model.

Parameters:

Name Type Description Default
model Model

The input genome-scale metabolic model.

required
rxn_expr_score dict[str, float]

Dictionary mapping reaction IDs to their expression scores (RALs). NaN values are ignored. Scores outside [-max_inconsistency_score, max_inconsistency_score] are capped.

required
max_gw float

Maximum possible reaction expression score (RAL). If None, it's calculated as the maximum finite value in rxn_expr_score. Defaults to None.

None
obj_frac float

Fraction of the optimal objective value to maintain when minimizing fluxes during pFBA. Defaults to 0.8.

0.8
threshold float

Flux threshold below which reactions are considered inactive and removed. Adjusted by rxn_scaling_coefs if provided. Defaults to 1e-6.

1e-06
protected_rxns list[str]

List of reaction IDs that should not be removed, even if their flux is below the threshold. Defaults to None.

None
max_inconsistency_score float

Value to cap reaction scores at (positive and negative) to handle extreme outliers. Defaults to 1e3.

1000.0
rxn_scaling_coefs dict[str, float]

Dictionary mapping reaction IDs to scaling coefficients. Used to adjust pFBA weights and the removal threshold. Defaults to None (all coeffs 1).

None
**kwargs

Additional keyword arguments (currently unused).

{}

Returns:

Type Description
RIPTiDePruningAnalysis

An object containing the results:
- result_model (cobra.Model): The pruned context-specific model.
- removed_rxn_ids (list[str]): List of IDs of removed reactions.
- obj_dict (dict[str, float]): Dictionary of weights used in pFBA.

Raises:

Type Description
ValueError

If max_gw is NaN after calculation or if derived pFBA objective coefficients are outside the expected [0, 1] range (after scaling).

Notes

RIPTiDe (Reaction Inclusion by Parsimony and Transcript Distribution) aims to create context-specific models reflecting metabolic activity based on transcriptomic data. This pruning step is the first part of the algorithm. Original paper: Jenior, M. L., et al. (2020). Transcriptome-guided parsimonious flux analysis improves predictions with metabolic networks in complex environments. PLoS computational biology, 16(4), e1007099.
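A minimal sketch of how expression scores might be turned into pFBA weights in the [0, 1] range is shown below. The formula (max_gw - score) / max_gw, which gives highly expressed reactions a low minimization weight, is an assumption for illustration, as are the helper name and example scores. The NaN handling, capping, and ValueError conditions mirror the descriptions above.

```python
import math
from typing import Dict, Optional


def riptide_pruning_weights(
    rxn_expr_score: Dict[str, float],
    max_gw: Optional[float] = None,
    max_inconsistency_score: float = 1e3,
) -> Dict[str, float]:
    """Map scores to pFBA weights; higher expression gives a lower weight."""
    cap = max_inconsistency_score
    if max_gw is None:
        # Maximum finite score; NaN and infinite values are skipped here.
        finite = [s for s in rxn_expr_score.values() if math.isfinite(s)]
        max_gw = max(finite) if finite else float("nan")
    if math.isnan(max_gw) or max_gw <= 0:
        raise ValueError("max_gw must be a positive, non-NaN number")
    weights = {}
    for rxn_id, score in rxn_expr_score.items():
        if math.isnan(score):
            continue  # NaN scores are ignored
        capped = max(-cap, min(cap, score))
        weights[rxn_id] = (max_gw - capped) / max_gw  # assumed formula
    out_of_range = [r for r, w in weights.items() if not 0.0 <= w <= 1.0]
    if out_of_range:
        raise ValueError(f"weights outside [0, 1] for: {out_of_range}")
    return weights


# Hypothetical reaction activity levels.
weights = riptide_pruning_weights({"R1": 10.0, "R2": 5.0, "R3": float("nan")})
print(weights)
```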

apply_RIPTiDe_sampling

apply_RIPTiDe_sampling(
    model,
    rxn_expr_score: Dict[str, float],
    max_gw: float = None,
    max_inconsistency_score: float = 1000.0,
    obj_frac: float = 0.8,
    sampling_obj_frac: float = 0.8,
    do_sampling: bool = False,
    solver: str = "gurobi",
    sampling_method: str = "gapsplit",
    protected_rxns: Optional[List[str]] = None,
    protect_no_expr: bool = False,
    sampling_n: int = 500,
    keep_context: bool = False,
    rxn_scaling_coefs: Dict[str, float] = None,
    discard_inf_score=True,
    thinning=1,
    processes=1,
    seed=None,
    **kwargs
)

Apply the sampling step of the RIPTiDe algorithm or prepare for it.

This step uses reaction expression scores (RALs) to define an objective function maximizing flux through high-expression reactions. It can optionally perform flux sampling on the model constrained by this objective.

Parameters:

Name Type Description Default
model Model

The input metabolic model, typically the result of RIPTiDe pruning.

required
rxn_expr_score dict[str, float]

Dictionary mapping reaction IDs to their expression scores (RALs). NaN values are ignored. Scores outside [-max_inconsistency_score, max_inconsistency_score] are capped unless discard_inf_score is True.

required
max_gw float

Maximum possible reaction expression score (RAL). If None, it's calculated as the maximum finite value in rxn_expr_score. Defaults to None.

None
max_inconsistency_score float

Value to cap reaction scores at (positive and negative) if discard_inf_score is False. Defaults to 1e3.

1000.0
obj_frac float

Fraction of the optimal objective value (based on maximizing flux through high-RAL reactions) to use as a constraint if keep_context is True or during sampling setup. Defaults to 0.8.

0.8
sampling_obj_frac float

Fraction of the optimal objective value to maintain during flux sampling (passed to the sampler). Defaults to 0.8.

0.8
do_sampling bool

If True, perform flux sampling after setting up the objective and constraints. If False, only sets up the model context. Defaults to False.

False
solver str

Solver to use for optimization and sampling (e.g., 'gurobi', 'cplex'). Defaults to "gurobi".

'gurobi'
sampling_method str

Flux sampling algorithm to use ('achr', 'optgp', 'gapsplit'). Defaults to "gapsplit".

'gapsplit'
protected_rxns list[str]

List of reaction IDs to assign the maximum weight in the objective, regardless of their RAL. Defaults to None.

None
protect_no_expr bool

If True, assign maximum weight to reactions not present in rxn_expr_score. Defaults to False.

False
sampling_n int

Number of flux samples to generate if do_sampling is True. Defaults to 500.

500
keep_context bool

If True, modify the input model by adding the RIPTiDe objective and constraining it based on obj_frac. If False, modifications happen within a context manager only during sampling. Defaults to False.

False
rxn_scaling_coefs dict[str, float]

Dictionary mapping reaction IDs to scaling coefficients, used to adjust objective weights. Defaults to None (all coeffs 1).

None
discard_inf_score bool

If True, treat infinite scores in rxn_expr_score as NaN (ignored). If False, cap them using max_inconsistency_score. Defaults to True.

True
thinning int

Thinning factor for flux sampling (passed to sampler). Defaults to 1.

1
processes int

Number of parallel processes for flux sampling. Defaults to 1.

1
seed int

Random seed for flux sampling. Defaults to None.

None
**kwargs

Additional keyword arguments passed to the flux sampler.

{}

Returns:

Type Description
RIPTiDeSamplingAnalysis

An object containing the results:

  • sampling_result (SamplingAnalysis or None): results from flux sampling if do_sampling is True, otherwise None.

Raises:

Type Description
ValueError

If max_gw is less than the maximum score in rxn_expr_score.

Notes

This function sets up the model for RIPTiDe-based flux analysis or sampling. The objective function maximizes flux weighted by scaled RALs. See: Jenior, M. L., et al. (2020). Transcriptome-guided parsimonious flux analysis improves predictions with metabolic networks in complex environments. PLoS Computational Biology, 16(4), e1007099.
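The score pre-processing implied by rxn_expr_score, max_gw, max_inconsistency_score and discard_inf_score can be sketched in plain Python. This is a minimal illustration, not pipeGEM's implementation: the helper name preprocess_riptide_scores is hypothetical, and it assumes that finite scores outside the cap range are clipped regardless of discard_inf_score.

```python
import math


def preprocess_riptide_scores(rxn_expr_score, max_gw=None,
                              max_inconsistency_score=1e3,
                              discard_inf_score=True):
    """Hypothetical helper: clean RAL scores and resolve max_gw."""
    cap = max_inconsistency_score
    cleaned = {}
    for rxn_id, score in rxn_expr_score.items():
        if math.isnan(score):
            continue  # NaN scores are ignored
        if math.isinf(score) and discard_inf_score:
            continue  # infinite scores are treated like NaN
        # Assumption: every remaining score is clipped into [-cap, cap].
        cleaned[rxn_id] = max(-cap, min(cap, score))

    if not cleaned:
        raise ValueError("no usable scores in rxn_expr_score")
    highest = max(cleaned.values())
    if max_gw is None:
        max_gw = highest  # default: maximum remaining score
    elif max_gw < highest:
        raise ValueError("max_gw is less than the maximum score")
    return cleaned, max_gw
```

Passing discard_inf_score=False keeps infinite scores but caps them at max_inconsistency_score, which in this sketch also raises the resolved max_gw to the cap.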

apply_EFlux

apply_EFlux(
    model: Model,
    rxn_expr_score: Dict[str, float],
    max_ub: float = 1000,
    min_lb: float = 1e-06,
    min_score: float = -1000.0,
    protected_rxns: Union[str, List[str], None] = None,
    flux_threshold: float = 1e-06,
    remove_zero_fluxes: bool = False,
    return_fluxes: bool = True,
    transform: Union[Callable, str] = exp_x,
    rxn_scaling_coefs: Dict[str, float] = None,
    predefined_threshold=None,
) -> EFluxAnalysis

Apply the E-Flux algorithm to constrain model fluxes based on expression.

E-Flux uses reaction expression scores (e.g., derived from transcriptomics) to set reaction bounds. Scores are typically transformed and then linearly scaled to map the expression range [min_exp, max_exp] to the flux range [min_lb, max_ub]. This enforces higher flux capacity for highly expressed reactions. Parsimonious FBA (pFBA) is then run on the constrained model.

Parameters:

Name Type Description Default
model Model

The input genome-scale metabolic model.

required
rxn_expr_score dict[str, float]

Dictionary mapping reaction IDs to their expression scores. NaN values are handled. Scores below min_score are capped.

required
max_ub float

The maximum flux bound assigned to the reaction(s) with the highest (transformed) expression score. Defaults to 1000.

1000
min_lb float

The minimum flux bound assigned to the reaction(s) with the lowest (transformed) expression score. Defaults to 1e-6.

1e-06
min_score float

Minimum expression score to consider; scores below this are capped at this value before transformation and scaling. Defaults to -1e3.

-1000.0
protected_rxns str or list[str] or None

Reaction ID(s) to exclude from bound constraints. Defaults to None.

None
flux_threshold float

Flux threshold used when remove_zero_fluxes is True. Reactions with absolute pFBA flux below this are removed. Defaults to 1e-6.

1e-06
remove_zero_fluxes bool

If True, remove reactions with pFBA flux below flux_threshold from the final model. Defaults to False.

False
return_fluxes bool

If True, include the pFBA flux distribution in the result object. Defaults to True.

True
transform callable or str

Function or name of a function (from pipeGEM.utils.transform.functions or numpy) to apply to expression scores before scaling (e.g., exp_x). Defaults to exp_x.

exp_x
rxn_scaling_coefs dict[str, float]

Dictionary mapping reaction IDs to scaling coefficients. Applied after scaling expression to bounds (divides the calculated bound). Defaults to None (all coeffs 1).

None
predefined_threshold any

This parameter is currently ignored by E-Flux. Defaults to None.

None

Returns:

Type Description
EFluxAnalysis

An object containing the results:

  • rxn_bounds (dict): the final bounds applied to each reaction.
  • rxn_scores (dict): the input reaction expression scores.
  • flux_result (pd.DataFrame or None): pFBA flux distribution if return_fluxes is True.
  • result_model (cobra.Model): the model with E-Flux bounds applied (and pruned if remove_zero_fluxes is True).

Raises:

Type Description
AssertionError

If max_ub <= 0, min_lb < 0, max_ub <= min_lb, or max_exp <= 0.

ValueError

If the denominator used for scaling becomes non-finite (e.g., due to transform function behavior or max_exp == min_exp).

Notes

Based on the method described in: Colijn, C., Brandes, A., Zucker, J., Lun, D. S., Weiner, B., Farhat, M. R., ... & Galagan, J. E. (2009). Interpreting expression data with metabolic flux models: predicting Mycobacterium tuberculosis mycolic acid production. PLoS Computational Biology, 5(8), e1000489. Implementation details such as the transformation and scaling may differ from the original method. Exchange reactions are typically excluded from bound setting.
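The linear mapping from the transformed expression range [min_exp, max_exp] to the flux range [min_lb, max_ub] can be sketched as follows. This is an illustration under stated assumptions, not the library's code: scale_scores_to_bounds is a hypothetical helper, math.exp stands in for exp_x, NaN scores are simply skipped, and rxn_scaling_coefs and protected_rxns are left out.

```python
import math


def scale_scores_to_bounds(rxn_expr_score, max_ub=1000.0, min_lb=1e-6,
                           min_score=-1e3, transform=math.exp):
    """Hypothetical helper: map expression scores to upper flux bounds."""
    assert max_ub > 0 and min_lb >= 0 and max_ub > min_lb
    transformed = {
        rxn_id: transform(max(score, min_score))  # cap low scores first
        for rxn_id, score in rxn_expr_score.items()
        if not math.isnan(score)  # assumption: NaN scores are skipped
    }
    min_exp = min(transformed.values())
    max_exp = max(transformed.values())
    span = max_exp - min_exp
    if span == 0 or not math.isfinite(span):
        raise ValueError("cannot scale: degenerate expression range")
    # Lowest score maps to min_lb, highest score maps to max_ub.
    return {
        rxn_id: min_lb + (value - min_exp) / span * (max_ub - min_lb)
        for rxn_id, value in transformed.items()
    }
```

With an exponential transform the mapping is linear in the transformed values, so most of the flux capacity goes to the highest-scoring reactions.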

apply_SPOT

apply_SPOT(
    model: Model,
    rxn_expr_score: Dict[str, float],
    protected_rxns: Optional[Union[str, List[str]]] = None,
    obj_frac: float = 0.1,
    norm_ub: float = 10000.0,
    remove_zero_fluxes: bool = False,
    flux_threshold: float = 1e-06,
    return_fluxes: bool = True,
    keep_context: bool = False,
    rxn_scaling_coefs: Optional[Dict[str, float]] = None,
    predefined_threshold=None,
) -> SPOTAnalysis

Apply the SPOT algorithm to generate an expression-guided flux distribution.

SPOT (Simplified Pearson cOrrelation with Transcriptomic data) finds a flux distribution that maximises the correlation between reaction fluxes and gene-expression scores while keeping the model's metabolic objective (e.g. biomass) at a user-specified fraction of its FBA-optimal value.

The optimisation problem solved is:

\[
\begin{aligned}
\max \quad & \sum_i w_i \cdot v_i \\
\text{s.t.} \quad & f_{\text{FBA}} \geq \texttt{obj\_frac} \cdot f^*_{\text{FBA}} \\
& \sum_i (v_i^+ + v_i^-) \leq \texttt{norm\_ub} \\
& v \in \text{FBA feasible region}
\end{aligned}
\]

where \(w_i = \texttt{rxn\_expr\_score}[i] \cdot \texttt{rxn\_scaling\_coefs}[i]\).
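To make the objective and the L1 constraint concrete, the following sketch computes the weights w_i and evaluates both terms for a given flux vector. It does not solve the optimisation problem, and the helper names (spot_weights, spot_objective, within_norm_bound) are hypothetical rather than part of pipeGEM. It assumes that v_i^+ + v_i^- equals |v_i|, which holds when at most one direction of a reaction carries flux.

```python
import math


def spot_weights(rxn_expr_score, rxn_scaling_coefs=None, protected_rxns=None):
    """Hypothetical helper: w_i = score_i * coef_i for unprotected reactions."""
    coefs = rxn_scaling_coefs or {}
    if isinstance(protected_rxns, str):
        protected_rxns = [protected_rxns]
    protected = set(protected_rxns or [])
    return {
        rxn_id: score * coefs.get(rxn_id, 1.0)  # missing coefficient -> 1.0
        for rxn_id, score in rxn_expr_score.items()
        if rxn_id not in protected and not math.isnan(score)
    }


def spot_objective(weights, fluxes):
    """Weighted flux sum: sum_i w_i * v_i."""
    return sum(w * fluxes.get(rxn_id, 0.0) for rxn_id, w in weights.items())


def within_norm_bound(fluxes, norm_ub=1e4):
    """Check the L1 constraint sum_i |v_i| <= norm_ub."""
    return sum(abs(v) for v in fluxes.values()) <= norm_ub
```

Protected reactions and NaN scores drop out of the weighted sum, but their fluxes still count toward the L1 norm.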

Parameters:

Name Type Description Default
model Model

The input genome-scale metabolic model with a defined objective function representing the required metabolic functionality.

required
rxn_expr_score dict[str, float]

Mapping of reaction IDs to expression scores. NaN values are ignored.

required
protected_rxns str or list[str] or None

Reaction IDs excluded from the SPOT objective (their expression scores do not contribute to the weighted sum). Defaults to None.

None
obj_frac float

Fraction of the FBA-optimal objective value that must be maintained as a lower-bound constraint during SPOT optimisation. Set to 0 to omit the FBA constraint (free maximisation). Defaults to 0.1.

0.1
norm_ub float

Upper bound for the L1 flux-sum constraint Σ(v_i^+ + v_i^-) ≤ norm_ub. Prevents the solver from exploiting highly-expressed reactions at unbounded flux. Defaults to 1e4.

10000.0
remove_zero_fluxes bool

If True, build a result_model by removing reactions whose absolute flux in the SPOT solution is ≤ flux_threshold. Defaults to False.

False
flux_threshold float

Flux cutoff used when remove_zero_fluxes=True. Defaults to 1e-6.

1e-06
return_fluxes bool

If True, store the SPOT flux distribution in the result object. Defaults to True.

True
keep_context bool

If True, the SPOT modifications (FBA constraint, norm constraint, SPOT objective) are applied permanently to the input model. If False (default), all modifications are made inside a context manager and reverted afterwards.

False
rxn_scaling_coefs dict[str, float] or None

Per-reaction scaling coefficients that multiply the expression weights before forming the objective. Defaults to None (all 1.0).

None
predefined_threshold any

Currently unused by SPOT; accepted for API consistency. Defaults to None.

None

Returns:

Type Description
SPOTAnalysis

Object containing:

  • flux_result (pandas.DataFrame or None): SPOT flux distribution if return_fluxes=True.
  • result_model (cobra.Model or None): pruned model if remove_zero_fluxes=True, otherwise None.
  • rxn_scores (dict): the original rxn_expr_score input.

Notes

Based on: Becker, S. A., & Palsson, B. Ø. (2008). Context-specific metabolic networks are consistent with experiments. PLoS computational biology, 4(5), e1000082. (SPOT is a variant of this family of methods.) The L1 norm constraint is implemented directly via the optlang API so that the function works with GLPK, CPLEX, and Gurobi without requiring any solver-specific imports.

apply_gecko_light

apply_gecko_light(
    model,
    enzyme_data,
    protein_abundance=None,
    sigma=0.5,
    f_factor=0.5,
    ptot=0.5,
    copy_model=True,
    protected_rxns=None,
)

Apply simple kcat-based enzyme constraints (GECKO-light).

For every reaction in model that has associated kcat data, the upper bound is constrained to:

new_ub = kcat [1/s] * abundance [mmol/gDW] * sigma * 3600

where the factor 3600 converts from per-second to per-hour to match typical COBRA flux units (mmol / gDW / h). If absolute protein abundance is not provided, ptot * f_factor is used as a coarse fallback abundance scale so those parameters have a concrete effect.
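A worked instance of the bound formula, including the fallback abundance, might look like this. The function name gecko_light_upper_bound is hypothetical and only mirrors the arithmetic stated above; it is not the library's internal routine.

```python
def gecko_light_upper_bound(kcat_per_s, abundance=None, sigma=0.5,
                            ptot=0.5, f_factor=0.5):
    """Hypothetical helper: new_ub = kcat * abundance * sigma * 3600."""
    if abundance is None:
        # Coarse fallback scale when no measured abundance is available.
        abundance = ptot * f_factor
    # 3600 converts kcat from 1/s to 1/h to match mmol / gDW / h fluxes.
    return kcat_per_s * abundance * sigma * 3600.0
```

For example, a kcat of 10 1/s with a measured abundance of 1e-6 mmol/gDW and sigma = 0.5 gives an upper bound of about 0.018 mmol / gDW / h, while the fallback scale (0.5 * 0.5 = 0.25) gives 4500.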

Parameters:

Name Type Description Default
model Model

The metabolic model to constrain.

required
enzyme_data EnzymeData

Enzyme data aligned with the model (must have been .align()-ed).

required
protein_abundance ProteinAbundanceData

Protein abundance data. If None, abundance is approximated as ptot * f_factor for every enzyme.

None
sigma float

Average enzyme saturation factor (0 – 1). Default 0.5.

0.5
f_factor float

Fraction of the proteome that is metabolic enzymes (0 to 1). Used as part of the fallback abundance scale when protein abundance is not provided. Default 0.5.

0.5
ptot float

Total protein content in g / gDW. Used as part of the fallback abundance scale when protein abundance is not provided. Default 0.5.

0.5
copy_model bool

If True (default), work on a deep-copy of model.

True
protected_rxns list of str

Reaction IDs whose bounds should not be modified.

None

Returns:

Type Description
GECKOLightAnalysis

apply_gecko_full

apply_gecko_full(
    model,
    enzyme_data,
    protein_abundance=None,
    sigma=0.5,
    ptot=0.5,
    f_factor=0.5,
    copy_model=True,
    protected_rxns=None,
)

Build a full enzyme-constrained model (ecModel).

The GECKO formulation constrains total enzyme usage through a shared protein pool. Each enzyme-catalysed reaction draws from this pool in proportion to MW / kcat.
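The pool constraint can be illustrated by computing how much protein a flux distribution would draw. This is a sketch under explicit assumptions, not the function's actual construction: the helper names are hypothetical, MW is taken in g/mmol, kcat in 1/s, and the pool capacity is assumed to be ptot * f_factor * sigma as in the usual GECKO formulation (the text above does not state the capacity).

```python
def protein_pool_usage(fluxes, mw_g_per_mmol, kcat_per_s):
    """Hypothetical helper: sum_i (MW_i / kcat_i) * |v_i| in g / gDW."""
    usage = 0.0
    for rxn_id, flux in fluxes.items():
        if rxn_id not in kcat_per_s:
            continue  # reactions without kcat data draw nothing from the pool
        kcat_per_h = kcat_per_s[rxn_id] * 3600.0  # 1/s -> 1/h
        usage += mw_g_per_mmol[rxn_id] / kcat_per_h * abs(flux)
    return usage


def protein_pool_capacity(ptot=0.5, f_factor=0.5, sigma=0.5):
    """Assumed pool size in g / gDW: ptot * f_factor * sigma."""
    return ptot * f_factor * sigma
```

A flux distribution satisfies the pool constraint in this sketch when protein_pool_usage(...) does not exceed protein_pool_capacity(...).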

Parameters:

Name Type Description Default
model Model

The metabolic model.

required
enzyme_data EnzymeData

Enzyme data aligned with the model.

required
protein_abundance ProteinAbundanceData

Protein abundance data (currently used for logging only; the pool constraint implicitly limits usage).

None
sigma float

Average enzyme saturation factor (0 – 1).

0.5
ptot float

Total protein content in g / gDW.

0.5
f_factor float

Fraction of the proteome that is metabolic enzymes (0 – 1).

0.5
copy_model bool

Work on a deep-copy of model (default True).

True
protected_rxns list of str

Reaction IDs whose bounds should not be modified.

None

Returns:

Type Description
GECKOFullAnalysis

auto_parameterize

auto_parameterize(
    model,
    enzyme_data: EnzymeData,
    kcat_source: Literal[
        "brenda", "sabio-rk", "manual"
    ] = "manual",
    fill_missing: Literal[
        "median", "geometric_mean", "dlkcat"
    ] = "median",
    organism: str = "human",
    metabolite_data=None,
    device: str = "cpu",
) -> EnzymeData

Automated parameter collection and estimation pipeline.

Steps
  1. (Optional) Fetch kcat values from a database (BRENDA, SABIO-RK).
  2. Match to model reactions via EC numbers.
  3. Fill missing kcat values using the specified strategy.
  4. Optionally use DLKcat for prediction of remaining missing values.
  5. Return the enriched EnzymeData.
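Step 3 of the list above can be sketched for the "median" and "geometric_mean" strategies using only the standard library. The helper fill_missing_kcats is hypothetical and works on a plain dict with None for missing values; the real EnzymeData container and the "dlkcat" strategy are not modelled here.

```python
import statistics


def fill_missing_kcats(kcats, strategy="median"):
    """Hypothetical helper: replace None entries with a summary statistic."""
    known = [value for value in kcats.values() if value is not None]
    if not known:
        raise ValueError("no known kcat values to derive a fill value from")
    if strategy == "median":
        fill_value = statistics.median(known)
    elif strategy == "geometric_mean":
        fill_value = statistics.geometric_mean(known)
    else:
        raise ValueError(f"unsupported strategy in this sketch: {strategy!r}")
    return {
        rxn_id: fill_value if value is None else value
        for rxn_id, value in kcats.items()
    }
```

Because kcat values span several orders of magnitude, the geometric mean is less sensitive to a few very fast enzymes than the arithmetic mean, which is one reason to offer it alongside the median.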

Parameters:

Name Type Description Default
model Model

The metabolic model.

required
enzyme_data EnzymeData

Existing enzyme data (may have missing kcat values).

required
kcat_source str

Source for kcat values: "brenda", "sabio-rk", or "manual" (use only what is already in enzyme_data).

'manual'
fill_missing str

Strategy to fill missing kcat values: "median" (use the median of available kcats), "geometric_mean" (use the geometric mean), or "dlkcat" (use DLKcat deep-learning prediction).

'median'
organism str

Organism name (used for database queries).

'human'
metabolite_data MetaboliteData

Metabolite data with SMILES (required when fill_missing is "dlkcat").

None
device str

Device for DLKcat ("cpu" or "cuda").

'cpu'

Returns:

Type Description
EnzymeData

The enriched enzyme data with filled kcat values.

Enzyme-constrained integration

apply_gecko_light

apply_gecko_light(
    model,
    enzyme_data,
    protein_abundance=None,
    sigma=0.5,
    f_factor=0.5,
    ptot=0.5,
    copy_model=True,
    protected_rxns=None,
)

Apply simple kcat-based enzyme constraints (GECKO-light).

For every reaction in model that has associated kcat data, the upper bound is constrained to:

new_ub = kcat [1/s] * abundance [mmol/gDW] * sigma * 3600

where the factor 3600 converts from per-second to per-hour to match typical COBRA flux units (mmol / gDW / h). If absolute protein abundance is not provided, ptot * f_factor is used as a coarse fallback abundance scale so those parameters have a concrete effect.

Parameters:

Name Type Description Default
model Model

The metabolic model to constrain.

required
enzyme_data EnzymeData

Enzyme data aligned with the model (must have been .align()-ed).

required
protein_abundance ProteinAbundanceData

Protein abundance data. If None, abundance is approximated as ptot * f_factor for every enzyme.

None
sigma float

Average enzyme saturation factor (0 – 1). Default 0.5.

0.5
f_factor float

Fraction of the proteome that is metabolic enzymes (0 to 1). Used as part of the fallback abundance scale when protein abundance is not provided. Default 0.5.

0.5
ptot float

Total protein content in g / gDW. Used as part of the fallback abundance scale when protein abundance is not provided. Default 0.5.

0.5
copy_model bool

If True (default), work on a deep-copy of model.

True
protected_rxns list of str

Reaction IDs whose bounds should not be modified.

None

Returns:

Type Description
GECKOLightAnalysis

apply_gecko_full

apply_gecko_full(
    model,
    enzyme_data,
    protein_abundance=None,
    sigma=0.5,
    ptot=0.5,
    f_factor=0.5,
    copy_model=True,
    protected_rxns=None,
)

Build a full enzyme-constrained model (ecModel).

The GECKO formulation constrains total enzyme usage through a shared protein pool. Each enzyme-catalysed reaction draws from this pool in proportion to MW / kcat.

Parameters:

Name Type Description Default
model Model

The metabolic model.

required
enzyme_data EnzymeData

Enzyme data aligned with the model.

required
protein_abundance ProteinAbundanceData

Protein abundance data (currently used for logging only; the pool constraint implicitly limits usage).

None
sigma float

Average enzyme saturation factor (0 – 1).

0.5
ptot float

Total protein content in g / gDW.

0.5
f_factor float

Fraction of the proteome that is metabolic enzymes (0 – 1).

0.5
copy_model bool

Work on a deep-copy of model (default True).

True
protected_rxns list of str

Reaction IDs whose bounds should not be modified.

None

Returns:

Type Description
GECKOFullAnalysis

auto_parameterize

auto_parameterize(
    model,
    enzyme_data: EnzymeData,
    kcat_source: Literal[
        "brenda", "sabio-rk", "manual"
    ] = "manual",
    fill_missing: Literal[
        "median", "geometric_mean", "dlkcat"
    ] = "median",
    organism: str = "human",
    metabolite_data=None,
    device: str = "cpu",
) -> EnzymeData

Automated parameter collection and estimation pipeline.

Steps
  1. (Optional) Fetch kcat values from a database (BRENDA, SABIO-RK).
  2. Match to model reactions via EC numbers.
  3. Fill missing kcat values using the specified strategy.
  4. Optionally use DLKcat for prediction of remaining missing values.
  5. Return the enriched EnzymeData.

Parameters:

Name Type Description Default
model Model

The metabolic model.

required
enzyme_data EnzymeData

Existing enzyme data (may have missing kcat values).

required
kcat_source str

Source for kcat values: "brenda", "sabio-rk", or "manual" (use only what is already in enzyme_data).

'manual'
fill_missing str

Strategy to fill missing kcat values: "median" (use the median of available kcats), "geometric_mean" (use the geometric mean), or "dlkcat" (use DLKcat deep-learning prediction).

'median'
organism str

Organism name (used for database queries).

'human'
metabolite_data MetaboliteData

Metabolite data with SMILES (required when fill_missing is "dlkcat").

None
device str

Device for DLKcat ("cpu" or "cuda").

'cpu'

Returns:

Type Description
EnzymeData

The enriched enzyme data with filled kcat values.