Integration¶

Core integration¶

GIMME ¶

GIMME()

Bases: RemovableGeneDataIntegrator

integrate ¶

integrate(model, data, **kwargs)

Integrate the given data with the model.

Parameters:

Name	Description	Default
`model`	The model to be integrated with the data	required
`data`	Gene data used to determine the objective function of GIMME	required
`kwargs`	Keyword arguments passed to apply_GIMME	`{}`

Returns:

Name	Type	Description
`result`	`GIMMEAnalysis`

apply_FASTCORE ¶

apply_FASTCORE(
    C: Union[List[str], Set[str]],
    nonP: Union[List[str], Set[str]],
    model: Model,
    epsilon: float,
    return_model: bool,
    copy_model: bool = True,
    raise_err: bool = True,
    rxn_scaling_coefs: dict = None,
    calc_efficacy: bool = True,
) -> FASTCOREAnalysis

Apply the FASTCORE algorithm to extract a flux-consistent subnetwork.

FASTCORE identifies a minimal set of reactions from a given metabolic model that includes a defined set of core reactions (C) while ensuring flux consistency. Non-penalty reactions (nonP) can be included without affecting the objective function during the sparse mode search.

Parameters:

Name	Type	Description	Default
`C`	`list[str] or set[str]`	Core reaction IDs that must be included and carry flux.	required
`nonP`	`list[str] or set[str]`	Non-penalty reaction IDs. Included if needed but not prioritized.	required
`model`	`Model`	Input genome-scale metabolic model.	required
`epsilon`	`float`	Tolerance threshold for flux consistency checks. Flux values below this are considered zero. Adjusted per reaction if `rxn_scaling_coefs` is provided.	required
`return_model`	`bool`	If True, return the extracted subnetwork as a cobra.Model object.	required
`copy_model`	`bool`	If True (default), operate on a copy of the input model. If False, modify the input model directly.	`True`
`raise_err`	`bool`	If True (default), raise ValueError on inconsistency. If False, warn and potentially remove problematic core reactions.	`True`
`rxn_scaling_coefs`	`dict`	Mapping of reaction IDs to scaling coefficients to adjust `epsilon` per reaction. Defaults to None.	`None`
`calc_efficacy`	`bool`	If True (default), calculate efficacy metrics.	`True`

Returns:

Type	Description
`FASTCOREAnalysis`	An object containing the results: - result_model (cobra.Model, optional): Extracted subnetwork (if `return_model` is True). - kept_rxn_ids (np.ndarray): Reaction IDs included in the subnetwork. - removed_rxn_ids (np.ndarray): Reaction IDs excluded from the subnetwork. - algo_efficacy (dict, optional): Efficacy metrics (if `calc_efficacy` is True). - log (dict): Algorithm parameters like epsilon.

Raises:

Type	Description
`ValueError`	If `raise_err` is True and an inconsistency prevents including all core reactions.

Notes

Based on the algorithm described in: Vlassis, N., Pacheco, M. P., & Sauter, T. (2014). Fast reconstruction of compact context-specific metabolic network models. PLoS computational biology, 10(1), e1003424.

apply_CORDA ¶

apply_CORDA(
    model,
    data,
    protected_rxns=None,
    predefined_threshold=None,
    threshold_kws=None,
    rxn_scaling_coefs=None,
    discrete_strategy_name: str = "linear",
    n_iters=np.inf,
    penalty_factor=100,
    penalty_increase_factor=1.1,
    keep_if_support=5,
    met_prod=None,
    upper_bound=1000000.0,
    threshold=1e-06,
    support_flux_value=1,
    skip_last_step=True,
) -> CORDA_Analysis

Apply the CORDA algorithm to generate a context-specific metabolic model.

Orchestrates the CORDA process:

1. Prepare model and data (thresholds, confidence scores, protected reactions).
2. Initialize and run CORDABuilder.
3. Calculate efficacy metrics.
4. Return results in a CORDA_Analysis object.

Parameters:

Name	Type	Description	Default
`model`	`Model`	Input genome-scale metabolic model.	required
`data`	`object`	Object with reaction scores (`data.rxn_scores`) and optionally gene data (`data.gene_data`) if using gene-based thresholds.	required
`protected_rxns`	`list[str]`	Reaction IDs to force into the core set (confidence 3). Defaults to None.	`None`
`predefined_threshold`	`str`	Name of predefined thresholding strategy (e.g., 'percentile_90'). Requires `data.gene_data`. Defaults to None.	`None`
`threshold_kws`	`dict`	Additional keyword arguments for the thresholding function (used with `predefined_threshold`). Defaults to None.	`None`
`rxn_scaling_coefs`	`dict`	Mapping of reaction IDs to scaling coefficients to adjust flux thresholds. Defaults to None.	`None`
`discrete_strategy_name`	`str`	Strategy to convert continuous scores to discrete confidence levels ('linear'). Defaults to "linear".	`'linear'`
`n_iters`	`int`	Max iterations for finding support reactions in CORDABuilder. Defaults to infinity.	`inf`
`penalty_factor`	`float`	Initial penalty factor in CORDABuilder. Defaults to 100.	`100`
`penalty_increase_factor`	`float`	Penalty increase factor in CORDABuilder. Defaults to 1.1.	`1.1`
`keep_if_support`	`int`	Support threshold for elevating medium confidence in CORDABuilder. Defaults to 5.	`5`
`met_prod`	`list[str]`	Metabolite IDs for which mock production reactions should be added and forced into the core set. Defaults to None.	`None`
`upper_bound`	`float`	High upper bound for reactions during optimization. Defaults to 1e6.	`1000000.0`
`threshold`	`float`	Flux threshold below which flux is considered zero. Defaults to 1e-6.	`1e-06`
`support_flux_value`	`float or dict`	Minimum flux required for support reactions in CORDABuilder. Defaults to 1.	`1`
`skip_last_step`	`bool`	Whether to skip the final refinement step in CORDABuilder. Defaults to True.	`True`

Returns:

Type	Description
`CORDA_Analysis`	Object containing results: context-specific model, confidence scores, removed reactions, efficacy metrics, and logs.

Note

Original paper: Schultz, A., & Qutub, A. A. (2016). Reconstruction of tissue-specific metabolic networks using CORDA. PLoS computational biology, 12(3), e1004808.

apply_rFASTCORMICS ¶

apply_rFASTCORMICS(
    model: Model,
    data,
    protected_rxns: List[str] = None,
    predefined_threshold: Optional[
        Union[dict, analysis_types]
    ] = None,
    threshold_kws: dict = None,
    rxn_scaling_coefs: dict = None,
    consistent_checking_method: Literal[
        "FASTCC", "FVA"
    ] = "FASTCC",
    unpenalized_subsystem: Union[
        str, List[str]
    ] = "Transport.*",
    method: str = "onestep",
    threshold: float = 1e-06,
    FASTCORE_raise_error: bool = False,
    calc_efficacy: bool = True,
) -> rFASTCORMICSAnalysis

Apply the rFASTCORMICS algorithm to build a context-specific model.

Leverages expression data to define core/non-core reaction sets and uses FASTCORE to extract a consistent subnetwork. Optionally includes model consistency checking and handling of protected reactions and unpenalized subsystems.

Parameters:

Name	Type	Description	Default
`model`	`Model`	Input genome-scale metabolic model.	required
`data`	`object`	Object with gene expression data (`data.gene_data`) and reaction scores (`data.rxn_scores`).	required
`protected_rxns`	`list[str]`	Reaction IDs always included in the core set. Defaults to None.	`None`
`predefined_threshold`	`dict or analysis_types`	Strategy or dictionary defining thresholds to classify reactions based on scores (e.g., 'percentile_90'). See `pipeGEM.integration.utils.parse_predefined_threshold`. Defaults to None.	`None`
`threshold_kws`	`dict`	Additional keyword arguments for the thresholding function. Defaults to None.	`None`
`rxn_scaling_coefs`	`dict`	Mapping of reaction IDs to scaling coefficients to adjust flux thresholds in FASTCORE. Defaults to None.	`None`
`consistent_checking_method`	`(FASTCC, FVA)`	Method to ensure initial model consistency ('FASTCC' or 'FVA'). Set to None to skip. Defaults to "FASTCC".	`'FASTCC'`
`unpenalized_subsystem`	`str or list[str]`	Subsystem name(s) (regex allowed) included in the non-penalty set (nonP) during FASTCORE. Defaults to "Transport.*".	`'Transport.*'`
`method`	`(onestep, twostep)`	rFASTCORMICS variant: - 'onestep': Run FASTCORE once with core and non-penalty sets. - 'twostep': Run FASTCORE on protected reactions, refine, run again on expanded core set. (May need validation). Defaults to "onestep".	`'onestep'`
`threshold`	`float`	Flux threshold below which flux is considered zero. Defaults to 1e-6.	`1e-06`
`FASTCORE_raise_error`	`bool`	If True, FASTCORE raises error on inconsistency. If False, warns. Defaults to False.	`False`
`calc_efficacy`	`bool`	If True, calculate efficacy metrics based on expression-defined sets. Defaults to True.	`True`

Returns:

Type	Description
`rFASTCORMICSAnalysis`	Object containing results: context-specific model (in nested FASTCORE result), core/non-core sets, thresholding analysis, efficacy metrics.

Notes

Original paper: Pacheco, M. P., Bintener, T., Ternes, D., Kulms, D., Haan, S., Letellier, E., & Sauter, T. (2019). Identifying and targeting cancer-specific metabolism with network-based drug target prediction. EBioMedicine, 43, 98-106.
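The `unpenalized_subsystem` parameter accepts one or more regular expressions. A minimal sketch of how such patterns could select the non-penalty set follows. The helper `select_unpenalized`, the example subsystem names, and the use of `re.match` (anchored at the start of the name) are assumptions for illustration, not pipeGEM's actual implementation.

```python
import re
from typing import Dict, List, Union


def select_unpenalized(
    rxn_subsystems: Dict[str, str],
    patterns: Union[str, List[str]] = "Transport.*",
) -> List[str]:
    """Return IDs of reactions whose subsystem matches any pattern (hypothetical helper)."""
    if isinstance(patterns, str):
        patterns = [patterns]
    compiled = [re.compile(p) for p in patterns]
    return sorted(
        rxn_id
        for rxn_id, subsystem in rxn_subsystems.items()
        if any(c.match(subsystem) for c in compiled)
    )


# Toy reaction-to-subsystem mapping (made-up names)
subsystems = {
    "R1": "Transport, mitochondrial",
    "R2": "Glycolysis",
    "R3": "Transport, extracellular",
    "R4": "Exchange/demand reaction",
}
non_penalty_default = select_unpenalized(subsystems)
non_penalty_extended = select_unpenalized(subsystems, ["Transport.*", "Exchange.*"])
```

With the default pattern only the two transport reactions are selected; passing a list widens the non-penalty set.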

apply_iMAT ¶

apply_iMAT(
    model,
    data,
    predefined_threshold,
    threshold_kws: dict,
    protected_rxns=None,
    rxn_scaling_coefs=None,
    eps=1e-06,
    tol=1e-06,
    use_gurobi=False,
) -> iMAT_Analysis

Apply the iMAT algorithm to generate a context-specific metabolic model.

iMAT (integrative Metabolic Analysis Tool) uses gene expression data to classify reactions into high-confidence (core) and low-confidence (non-core) sets. It then solves a mixed-integer linear programming (MILP) problem to find a flux distribution that maximizes activity through core reactions while minimizing activity through non-core reactions. Reactions with near-zero flux in the optimal solution are removed.

Parameters:

Name	Type	Description	Default
`model`	`Model`	The input genome-scale metabolic model.	required
`data`	`object`	An object containing gene expression data (`data.gene_data`) and reaction scores (`data.rxn_scores`) derived from it.	required
`predefined_threshold`	`dict or analysis_types`	Strategy or dictionary defining thresholds (`exp_th`, `non_exp_th`) to classify reactions based on scores. See `pipeGEM.integration.utils.parse_predefined_threshold`.	required
`threshold_kws`	`dict`	Additional keyword arguments for the thresholding function.	required
`protected_rxns`	`list[str]`	A list of reaction IDs that should always be treated as high-confidence (core) and potentially weighted higher in the objective. Defaults to None.	`None`
`rxn_scaling_coefs`	`dict[str, float]`	Dictionary mapping reaction IDs to scaling coefficients. Currently unused in the main logic but potentially used for tolerance adjustment. Defaults to None.	`None`
`eps`	`float`	Small flux value used in constraints to enforce activity through core reactions selected by the MILP. Defaults to 1e-6.	`1e-06`
`tol`	`float`	Flux tolerance threshold. Reactions with absolute flux below this value in the MILP solution are removed from the final model. Defaults to 1e-6.	`1e-06`
`use_gurobi`	`bool`	If True, use Gurobi-specific indicator constraints for potentially better performance. Requires Gurobi solver. Defaults to False.	`False`

Returns:

Type	Description
`iMAT_Analysis`	An object containing the results: - result_model (cobra.Model): The final context-specific model. - removed_rxn_ids (np.ndarray): IDs of removed reactions. - threshold_analysis (ThresholdAnalysis): Details of thresholding used.

Notes

Based on the algorithm described in: Shlomi, T., Cabili, M. N., Herrgård, M. J., Palsson, B. Ø., & Ruppin, E. (2008). Network-based prediction of human tissue-specific metabolism. Nature biotechnology, 26(9), 1003-1010. The implementation uses binary indicator variables to control reaction activity.
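The classification and pruning steps around the MILP can be illustrated without a solver. The following is a minimal sketch, assuming that a score at or above `exp_th` marks a core reaction and a score at or below `non_exp_th` marks a non-core reaction. The helper names and the exact comparison operators are assumptions, not taken from pipeGEM.

```python
import math
from typing import Dict, List, Set, Tuple


def classify_reactions(
    rxn_scores: Dict[str, float], exp_th: float, non_exp_th: float
) -> Tuple[Set[str], Set[str]]:
    """Split reactions into core and non-core sets by score thresholds."""
    core, non_core = set(), set()
    for rxn_id, score in rxn_scores.items():
        if math.isnan(score):
            continue  # no expression evidence: leave unclassified
        if score >= exp_th:
            core.add(rxn_id)
        elif score <= non_exp_th:
            non_core.add(rxn_id)
    return core, non_core


def prune_by_flux(fluxes: Dict[str, float], tol: float = 1e-6) -> List[str]:
    """Return IDs of reactions whose absolute flux is below tol."""
    return sorted(r for r, v in fluxes.items() if abs(v) < tol)


# Toy scores and a toy flux solution (made-up values)
scores = {"R1": 9.2, "R2": 0.4, "R3": 3.0, "R4": float("nan")}
core, non_core = classify_reactions(scores, exp_th=5.0, non_exp_th=1.0)
removed = prune_by_flux({"R1": 2.5, "R2": 0.0, "R3": -1e-9, "R4": 0.3})
```

Reactions between the two thresholds (here `R3`) fall into neither set, so the MILP objective neither rewards nor penalizes them.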

apply_mCADRE ¶

apply_mCADRE(
    model,
    data,
    protected_rxns,
    predefined_threshold=None,
    threshold_kws: dict = None,
    rxn_scaling_coefs: dict = None,
    exp_cutoff: float = 0.9,
    absent_value: float = 0,
    absent_value_indicator: float = -1e-06,
    tol=1e-06,
    eta=0.333,
    evidence_scores: Union[
        Dict[str, Union[int, float]], Series
    ] = None,
    salvage_check_tasks=None,
    default_salv_test=False,
    func_test_tasks=None,
    required_met_ids=None,
    default_func_test=False,
) -> mCADRE_Analysis

Apply the mCADRE algorithm to generate a context-specific metabolic model.

mCADRE (metabolic Context-specificity Assessed by Deterministic Reaction Evaluation) builds context-specific models by iteratively removing reactions based on expression data, network connectivity, and optional evidence scores, while ensuring the model can still perform essential metabolic functions (tasks).

Parameters:

Name	Type	Description	Default
`model`	`Model`	The input genome-scale metabolic model.	required
`data`	`object`	An object containing gene expression data (`data.gene_data`) and reaction scores (`data.rxn_scores`) derived from it.	required
`protected_rxns`	`list[str]`	A list of reaction IDs that should never be removed from the model.	required
`predefined_threshold`	`dict or analysis_types`	Strategy or dictionary defining thresholds (`exp_th`, `non_exp_th`) to classify reactions based on scores. See `pipeGEM.integration.utils.parse_predefined_threshold`. Defaults to None.	`None`
`threshold_kws`	`dict`	Additional keyword arguments for the thresholding function. Defaults to None.	`None`
`rxn_scaling_coefs`	`dict[str, float]`	Dictionary mapping reaction IDs to scaling coefficients, used to adjust consistency check tolerance. Defaults to None.	`None`
`exp_cutoff`	`float`	Expression score threshold (after mapping to [0, 1]) above which a reaction is considered part of the initial 'core' set. Defaults to 0.9.	`0.9`
`absent_value`	`float`	The raw score in `data.rxn_scores` that indicates a reaction is absent (e.g., 0 for some expression data types). Defaults to 0.	`0`
`absent_value_indicator`	`float`	The internal score assigned to absent reactions after mapping. Should be less than 0. Defaults to -1e-6.	`-1e-06`
`tol`	`float`	Tolerance used for consistency checks (e.g., FASTCC). Defaults to 1e-6.	`1e-06`
`eta`	`float`	Weighting factor used in the consistency check stopping criteria when evaluating removal of medium-confidence reactions. Represents the trade-off between removing non-core reactions and keeping core reactions. Defaults to 0.333.	`0.333`
`evidence_scores`	`dict[str, Union[int, float]] or Series`	Additional evidence scores for reactions (e.g., from literature, proteomics). Higher scores favor keeping the reaction. Defaults to None (all zero).	`None`
`salvage_check_tasks`	`TaskContainer or str`	Metabolic tasks (e.g., salvage pathways) that the final model must be able to perform. Can be a TaskContainer object or a path to a task file. Defaults to None.	`None`
`default_salv_test`	`bool`	If True, use predefined default salvage pathway tasks (Guanine -> GMP, Hypoxanthine -> IMP). Defaults to False.	`False`
`func_test_tasks`	`TaskContainer or str`	General metabolic function tasks that the final model must be able to perform. Defaults to None.	`None`
`required_met_ids`	`list[str]`	List of metabolite IDs that the model must be able to produce (used if `default_func_test` is True). Defaults to None.	`None`
`default_func_test`	`bool`	If True and `required_met_ids` is provided, use predefined default functional tasks (production of each required metabolite from glucose). Defaults to False.	`False`

Returns:

Type	Description
`mCADRE_Analysis`	An object containing the results: - result_model (cobra.Model): The final context-specific model. - removed_rxn_ids (np.ndarray): IDs of removed reactions. - core_rxn_ids (np.ndarray): IDs of reactions initially defined as core. - non_expressed_rxn_ids (np.ndarray): IDs of reactions initially defined as non-expressed. - score_df (pd.DataFrame): DataFrame with expression, connectivity, and evidence scores. - salvage_test_result (TaskAnalysis or None): Results of salvage pathway tests. - func_test_result (TaskAnalysis or None): Results of functional tests. - threshold_analysis (ThresholdAnalysis): Details of thresholding used. - algo_efficacy (float): Efficacy score (e.g., F1) comparing the final model against the initial core/non-core sets.

Raises:

Type	Description
`RuntimeError`	If the initial model fails any of the provided functional or salvage tests.

Notes

Based on the algorithm described in: Wang, Y., Eddy, J. A., & Price, N. D. (2012). Reconstruction of genome-scale metabolic models for 126 human tissues using mCADRE. BMC systems biology, 6, 1-16.
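The interplay of `exp_cutoff`, `absent_value`, and `absent_value_indicator` can be sketched as follows. This is an illustrative example only: the helper `partition_mcadre` is hypothetical, and dividing by the maximum score is an assumed way of mapping raw scores to [0, 1] (the example uses non-negative scores only).

```python
from typing import Dict, List, Tuple


def partition_mcadre(
    raw_scores: Dict[str, float],
    exp_cutoff: float = 0.9,
    absent_value: float = 0,
    absent_value_indicator: float = -1e-6,
) -> Tuple[Dict[str, float], List[str], List[str]]:
    """Map raw scores to [0, 1] and derive the initial core / non-expressed sets."""
    present = [s for s in raw_scores.values() if s != absent_value]
    max_score = max(present, default=1.0)
    mapped = {
        # Absent reactions get a small negative marker instead of a mapped score.
        r: absent_value_indicator if s == absent_value else s / max_score
        for r, s in raw_scores.items()
    }
    core = sorted(r for r, s in mapped.items() if s > exp_cutoff)
    non_expressed = sorted(r for r, s in mapped.items() if s < 0)
    return mapped, core, non_expressed


# Toy raw reaction scores (made-up values)
mapped, core_ids, non_expressed_ids = partition_mcadre(
    {"R1": 10.0, "R2": 9.5, "R3": 4.0, "R4": 0.0}
)
```

Because the indicator is negative, absent reactions stay distinguishable from present reactions that merely have a low mapped score.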

apply_INIT ¶

apply_INIT(
    model,
    data,
    predefined_threshold,
    threshold_kws: dict,
    protected_rxns=None,
    eps=1e-06,
    tol=1e-06,
    weight_method: Literal[
        "default", "threshold"
    ] = "threshold",
    rxn_scaling_coefs: dict = None,
) -> INIT_Analysis

Apply the INIT algorithm to generate a context-specific metabolic model.

INIT (Integrative Network Inference for Tissues) uses expression data to assign weights to reactions. It then solves a mixed-integer linear programming (MILP) problem, similar to iMAT, to find a flux distribution that maximizes the sum of weights for active reactions. Reactions with near-zero flux in the optimal solution are removed.

Parameters:

Name	Type	Description	Default
`model`	`Model`	The input genome-scale metabolic model.	required
`data`	`object`	An object containing gene expression data (`data.gene_data`) and reaction scores (`data.rxn_scores`) derived from it.	required
`predefined_threshold`	`dict or analysis_types`	Strategy or dictionary defining thresholds (`exp_th`, `non_exp_th`) used for weight calculation if `weight_method` is 'threshold'. See `pipeGEM.integration.utils.parse_predefined_threshold`.	required
`threshold_kws`	`dict`	Additional keyword arguments for the thresholding function if `weight_method` is 'threshold'.	required
`protected_rxns`	`list[str]`	A list of reaction IDs that should always be treated as core reactions and potentially assigned a high weight. Defaults to None.	`None`
`eps`	`float`	Small flux value used in constraints to enforce activity through core reactions selected by the MILP (inherited from iMAT constraints). Defaults to 1e-6.	`1e-06`
`tol`	`float`	Flux tolerance threshold. Reactions with absolute flux below this value in the MILP solution are removed from the final model. Defaults to 1e-6.	`1e-06`
`weight_method`	`(default, threshold)`	Method to calculate reaction weights from scores: - 'default': Uses 5 * log(score). - 'threshold': Uses linear interpolation based on `exp_th` and `non_exp_th`. Defaults to "threshold".	`'threshold'`
`rxn_scaling_coefs`	`dict[str, float]`	Dictionary mapping reaction IDs to scaling coefficients. Currently unused in the main logic but potentially used for tolerance adjustment. Defaults to None.	`None`

Returns:

Type	Description
`INIT_Analysis`	An object containing the results: - result_model (cobra.Model): The final context-specific model. - removed_rxn_ids (np.ndarray): IDs of removed reactions. - threshold_analysis (ThresholdAnalysis or None): Details of thresholding used if weight_method was 'threshold'. - weight_dic (dict): Dictionary of calculated weights used in the objective. - fluxes (pd.DataFrame): DataFrame of absolute fluxes from the MILP solution.

Notes

Based on the algorithm described in: Agren, R., Bordel, S., Mardinoglu, A., Pornputtapong, N., Nookaew, I., & Nielsen, J. (2012). Reconstruction of genome-scale active metabolic networks for 69 human cell types and 16 cancer types using INIT. PLoS computational biology, 8(5), e1002518. The implementation leverages the MILP formulation structure from iMAT.
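The two `weight_method` options can be sketched in plain Python. The 'default' formula follows the description above, assuming a natural logarithm and positive scores. For 'threshold', the interpolation endpoints (-1 at `non_exp_th`, +1 at `exp_th`) and the clipping are assumptions made for illustration; the function names are hypothetical.

```python
import math
from typing import Dict, Set


def init_weight_default(score: float) -> float:
    """'default' weighting: 5 * log(score); score must be positive."""
    return 5.0 * math.log(score)


def init_weight_threshold(
    score: float, exp_th: float, non_exp_th: float,
    low: float = -1.0, high: float = 1.0,
) -> float:
    """'threshold' weighting: linear interpolation between assumed endpoints."""
    frac = (score - non_exp_th) / (exp_th - non_exp_th)
    frac = min(max(frac, 0.0), 1.0)  # clip scores outside the threshold range
    return low + frac * (high - low)


def init_objective(weights: Dict[str, float], active: Set[str]) -> float:
    """Sum of weights over active reactions, the quantity the MILP maximizes."""
    return sum(w for r, w in weights.items() if r in active)


# Toy scores (made-up values)
scores = {"R1": 10.0, "R2": 3.0, "R3": 0.5}
weights = {r: init_weight_threshold(s, exp_th=5.0, non_exp_th=1.0) for r, s in scores.items()}
objective = init_objective(weights, active={"R1", "R2"})
```

Negative weights make it costly to keep poorly supported reactions active, so they are retained only when needed to carry flux through positively weighted ones.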

apply_MBA ¶

apply_MBA(
    model,
    data=None,
    predefined_threshold=None,
    threshold_kws: dict = None,
    protected_rxns=None,
    rxn_scaling_coefs: dict = None,
    medium_conf_rxn_ids=None,
    high_conf_rxn_ids=None,
    consistent_checking_method: str = "FASTCC",
    tolerance: float = 1e-08,
    epsilon: float = 0.33,
    random_state: int = 42,
)

Apply the Model Building Algorithm (MBA) to generate a context-specific model.

MBA iteratively removes reactions with no confidence score ('no-confidence' set) based on consistency checks, while preserving high-confidence reactions and minimizing the removal of medium-confidence reactions.

Parameters:

Name	Type	Description	Default
`model`	`Model`	The input genome-scale metabolic model.	required
`data`	`object`	An object containing gene expression data (`data.gene_data`) and reaction scores (`data.rxn_scores`). If provided, `medium_conf_rxn_ids` and `high_conf_rxn_ids` are derived from this data using thresholds. Defaults to None.	`None`
`predefined_threshold`	`dict or analysis_types`	Strategy or dictionary defining thresholds (`exp_th`, `non_exp_th`) to classify reactions based on scores when `data` is provided. See `pipeGEM.integration.utils.parse_predefined_threshold`. Defaults to None.	`None`
`threshold_kws`	`dict`	Additional keyword arguments for the thresholding function when `data` is provided. Defaults to None.	`None`
`protected_rxns`	`list[str]`	A list of reaction IDs that should always be treated as high-confidence and never removed. Defaults to None.	`None`
`rxn_scaling_coefs`	`dict[str, float]`	Dictionary mapping reaction IDs to scaling coefficients, used to adjust consistency check tolerance. Defaults to None.	`None`
`medium_conf_rxn_ids`	`list[str]`	List of reaction IDs considered medium confidence. Used only if `data` is None. Defaults to None.	`None`
`high_conf_rxn_ids`	`list[str]`	List of reaction IDs considered high confidence. Used only if `data` is None. Defaults to None.	`None`
`consistent_checking_method`	`str`	Method used for consistency checks (e.g., 'FASTCC'). Defaults to "FASTCC".	`'FASTCC'`
`tolerance`	`float`	Tolerance used for consistency checks. Defaults to 1e-8.	`1e-08`
`epsilon`	`float`	Weighting factor used in the consistency check stopping criteria. Represents the maximum allowed ratio of removed medium-confidence reactions to removed no-confidence reactions during the removal check of a no-confidence reaction. Defaults to 0.33.	`0.33`
`random_state`	`int`	Seed for the random number generator used to shuffle the order of no-confidence reactions being tested for removal. Defaults to 42.	`42`

Returns:

Type	Description
`MBA_Analysis`	An object containing the results: - result_model (cobra.Model): The final context-specific model. - removed_rxn_ids (np.ndarray): IDs of removed reactions. - threshold_analysis (ThresholdAnalysis or None): Details of thresholding used if `data` was provided. - algo_efficacy (float): Efficacy score comparing the final model against the initial high/no-confidence sets.

Raises:

Type	Description
`AssertionError`	If `data` is None and either `medium_conf_rxn_ids` or `high_conf_rxn_ids` contain IDs not present in the model.

Notes

Based on the algorithm described in: Jerby, L., Shlomi, T., & Ruppin, E. (2010). Computational reconstruction of tissue-specific metabolic models: application to human tissues. Molecular systems biology, 6(1), 401.
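The roles of `epsilon` and `random_state` can be illustrated with a small sketch. It assumes that a candidate removal is rejected whenever it would drop a high-confidence reaction, and accepted when the number of removed medium-confidence reactions is at most `epsilon` times the number of removed no-confidence reactions. The helper names and the non-strict comparison are assumptions, not pipeGEM's code.

```python
import random
from typing import Iterable, List, Set


def accept_removal(
    removed: Iterable[str],
    medium_conf: Set[str],
    high_conf: Set[str],
    epsilon: float = 0.33,
) -> bool:
    """Decide whether a set of reactions lost by one removal step is acceptable."""
    removed = set(removed)
    if removed & high_conf:
        return False  # high-confidence reactions must be preserved
    n_medium = len(removed & medium_conf)
    n_no_conf = len(removed - medium_conf - high_conf)
    return n_medium <= epsilon * n_no_conf


def removal_order(no_conf_ids: Iterable[str], random_state: int = 42) -> List[str]:
    """Shuffle the no-confidence reactions reproducibly."""
    order = sorted(no_conf_ids)
    random.Random(random_state).shuffle(order)
    return order


medium, high = {"M1"}, {"H1"}
too_costly = accept_removal({"N1", "N2", "N3", "M1"}, medium, high)    # 1 > 0.33 * 3
acceptable = accept_removal({"N1", "N2", "N3", "N4", "M1"}, medium, high)  # 1 <= 0.33 * 4
order_a = removal_order(["N1", "N2", "N3", "N4"])
order_b = removal_order(["N4", "N3", "N2", "N1"])
```

Since the result depends on the order in which reactions are tested, fixing the seed makes runs reproducible.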

apply_GIMME ¶

apply_GIMME(
    model: Model,
    rxn_expr_score: Dict[str, float],
    high_exp: float,
    protected_rxns=None,
    obj_frac: float = 0.8,
    remove_zero_fluxes: bool = False,
    flux_threshold: float = 1e-06,
    max_inconsistency_score=1000.0,
    return_fluxes: bool = True,
    keep_context: bool = False,
    rxn_scaling_coefs: dict = None,
    predefined_threshold=None,
)

Apply the GIMME algorithm to generate a context-specific metabolic model.

GIMME (Gene Inactivity Moderated by Metabolism and Expression) assumes that cellular metabolism aims to achieve a required metabolic functionality (defined by the model's objective function) with minimal deviation from a reference expression state. It minimizes the flux through reactions with expression below a threshold, subject to maintaining a certain level of the original objective function.

Parameters:

Name	Type	Description	Default
`model`	`Model`	The input genome-scale metabolic model with a defined objective function representing the required metabolic functionality.	required
`rxn_expr_score`	`dict[str, float]`	Dictionary mapping reaction IDs to their expression scores. NaN values are ignored.	required
`high_exp`	`float`	Expression score threshold. Reactions with scores below this threshold are penalized in the GIMME objective function.	required
`protected_rxns`	`list[str]`	List of reaction IDs that should not be penalized, even if their expression is below `high_exp`. Defaults to None.	`None`
`obj_frac`	`float`	Fraction of the original model's optimal objective value that must be maintained by the GIMME solution. Defaults to 0.8.	`0.8`
`remove_zero_fluxes`	`bool`	If True, create a `result_model` by removing reactions with flux below `flux_threshold` in the GIMME solution. Defaults to False.	`False`
`flux_threshold`	`float`	Flux threshold used when `remove_zero_fluxes` is True. Defaults to 1e-6.	`1e-06`
`max_inconsistency_score`	`float`	Value to cap the penalty applied to low-expression reactions to handle potential numerical issues with very low scores. Defaults to 1e3.	`1000.0`
`return_fluxes`	`bool`	If True, include the GIMME flux distribution in the result object. Defaults to True.	`True`
`keep_context`	`bool`	If True, modify the input `model` by adding the GIMME objective and constraining the original objective. If False (default), modifications happen within a context manager.	`False`
`rxn_scaling_coefs`	`dict[str, float]`	Dictionary mapping reaction IDs to scaling coefficients, used to adjust objective weights and the removal `flux_threshold`. Defaults to None (all coeffs 1).	`None`
`predefined_threshold`	`any`	This parameter is currently ignored by GIMME. Defaults to None.	`None`

Returns:

Type	Description
`GIMMEAnalysis`	An object containing the results: - rxn_coefficients (dict): Dictionary of objective coefficients (penalties) applied to low-expression reactions. - rxn_scores (dict): The input reaction expression scores. - flux_result (pd.DataFrame or None): GIMME flux distribution if `return_fluxes` is True. - result_model (cobra.Model or None): Pruned model if `remove_zero_fluxes` is True, otherwise None.

Notes

Based on the algorithm described in: Becker, S. A., & Palsson, B. Ø. (2008). Context-specific metabolic networks are consistent with experiments. PLoS computational biology, 4(5), e1000082. The objective function minimizes the sum of fluxes weighted by (high_exp - score) for reactions with score < high_exp.
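The penalty weights described above can be computed without a solver. This is a minimal sketch: the helper names are hypothetical, `rxn_scaling_coefs` is left out, and the cap is applied to the penalty as described for `max_inconsistency_score`.

```python
import math
from typing import Dict, Iterable, Optional


def gimme_penalties(
    rxn_expr_score: Dict[str, float],
    high_exp: float,
    protected_rxns: Optional[Iterable[str]] = None,
    max_inconsistency_score: float = 1e3,
) -> Dict[str, float]:
    """Penalty (high_exp - score) for each unprotected reaction scoring below high_exp."""
    protected = set(protected_rxns or [])
    penalties = {}
    for rxn_id, score in rxn_expr_score.items():
        if math.isnan(score) or rxn_id in protected:
            continue  # NaN scores are ignored; protected reactions are never penalized
        if score < high_exp:
            penalties[rxn_id] = min(high_exp - score, max_inconsistency_score)
    return penalties


def inconsistency_score(penalties: Dict[str, float], fluxes: Dict[str, float]) -> float:
    """Value of the GIMME objective for a given flux distribution."""
    return sum(w * abs(fluxes.get(r, 0.0)) for r, w in penalties.items())


# Toy scores and fluxes (made-up values)
scores = {"R1": 2.0, "R2": 8.0, "R3": float("-inf"), "R4": 1.0, "R5": float("nan")}
penalties = gimme_penalties(scores, high_exp=5.0, protected_rxns=["R4"])
total = inconsistency_score(penalties, {"R1": -2.0, "R2": 10.0, "R3": 0.5})
```

In the full algorithm this weighted sum is minimized while the original objective is held at or above `obj_frac` times its optimum.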

apply_RIPTiDe_pruning ¶

apply_RIPTiDe_pruning(
    model,
    rxn_expr_score: Dict[str, float],
    max_gw: float = None,
    obj_frac: float = 0.8,
    threshold: float = 1e-06,
    protected_rxns=None,
    max_inconsistency_score=1000.0,
    rxn_scaling_coefs: Dict[str, float] = None,
    **kwargs
)

Apply the pruning step of the RIPTiDe algorithm.

This step uses parsimonious Flux Balance Analysis (pFBA) with weights derived from reaction expression scores (or RALs - Reaction Activity Levels) to identify and remove low-flux reactions, creating a pruned, context-specific model.

Parameters:

Name	Type	Description	Default
`model`	`Model`	The input genome-scale metabolic model.	required
`rxn_expr_score`	`dict[str, float]`	Dictionary mapping reaction IDs to their expression scores (RALs). NaN values are ignored. Scores outside [-max_inconsistency_score, max_inconsistency_score] are capped.	required
`max_gw`	`float`	Maximum possible reaction expression score (RAL). If None, it's calculated as the maximum finite value in `rxn_expr_score`. Defaults to None.	`None`
`obj_frac`	`float`	Fraction of the optimal objective value to maintain when minimizing fluxes during pFBA. Defaults to 0.8.	`0.8`
`threshold`	`float`	Flux threshold below which reactions are considered inactive and removed. Adjusted by `rxn_scaling_coefs` if provided. Defaults to 1e-6.	`1e-06`
`protected_rxns`	`list[str]`	List of reaction IDs that should not be removed, even if their flux is below the threshold. Defaults to None.	`None`
`max_inconsistency_score`	`float`	Value to cap reaction scores at (positive and negative) to handle extreme outliers. Defaults to 1e3.	`1000.0`
`rxn_scaling_coefs`	`dict[str, float]`	Dictionary mapping reaction IDs to scaling coefficients. Used to adjust pFBA weights and the removal `threshold`. Defaults to None (all coeffs 1).	`None`
`**kwargs`		Additional keyword arguments (currently unused).	`{}`

Returns:

Type	Description
`RIPTiDePruningAnalysis`	An object containing the results: - result_model (cobra.Model): The pruned context-specific model. - removed_rxn_ids (list[str]): List of IDs of removed reactions. - obj_dict (dict[str, float]): Dictionary of weights used in pFBA.

Raises:

Type	Description
`ValueError`	If `max_gw` is NaN after calculation or if derived pFBA objective coefficients are outside the expected [0, 1] range (after scaling).

Notes

RIPTiDe (Reaction Inclusion by Parsimony and Transcript Distribution) aims to create context-specific models reflecting metabolic activity based on transcriptomic data. This pruning step is the first part. Original paper: Jenior, M. L., et al. (2020). Transcriptome-guided parsimonious flux analysis improves predictions with metabolic networks in complex environments. PLoS computational biology, 16(4), e1007099.
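A rough sketch of the weighting and pruning logic is shown below. The formula `(max_gw - score) / max_gw` is an assumption chosen so that highly expressed reactions get pFBA weights near 0 and the weights stay in [0, 1]. The helper names are hypothetical and `rxn_scaling_coefs` is omitted.

```python
import math
from typing import Dict, Iterable, List, Optional


def riptide_pruning_weights(
    rxn_expr_score: Dict[str, float],
    max_gw: Optional[float] = None,
    max_inconsistency_score: float = 1e3,
) -> Dict[str, float]:
    """pFBA weights in [0, 1]; low weight means cheap to use (assumed formula)."""
    scores = {r: s for r, s in rxn_expr_score.items() if not math.isnan(s)}
    if max_gw is None:
        finite = [s for s in scores.values() if math.isfinite(s)]
        max_gw = max(finite) if finite else float("nan")
    if math.isnan(max_gw) or max_gw <= 0:
        raise ValueError("max_gw must be a positive number")
    m = max_inconsistency_score
    capped = {r: max(-m, min(s, m)) for r, s in scores.items()}
    weights = {r: (max_gw - s) / max_gw for r, s in capped.items()}
    if any(not 0.0 <= w <= 1.0 for w in weights.values()):
        raise ValueError("pFBA weights fall outside [0, 1]")
    return weights


def prune(
    fluxes: Dict[str, float],
    threshold: float = 1e-6,
    protected_rxns: Optional[Iterable[str]] = None,
) -> List[str]:
    """IDs of unprotected reactions whose absolute flux is below threshold."""
    protected = set(protected_rxns or [])
    return sorted(r for r, v in fluxes.items() if abs(v) < threshold and r not in protected)


# Toy reaction activity levels and pFBA fluxes (made-up values)
weights = riptide_pruning_weights({"R1": 100.0, "R2": 25.0, "R3": 0.0, "R4": float("nan")})
removed = prune({"R1": 5.0, "R2": 1e-9, "R3": 0.0}, protected_rxns=["R3"])
```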

apply_RIPTiDe_sampling ¶

apply_RIPTiDe_sampling(
    model,
    rxn_expr_score: Dict[str, float],
    max_gw: float = None,
    max_inconsistency_score: float = 1000.0,
    obj_frac: float = 0.8,
    sampling_obj_frac: float = 0.8,
    do_sampling: bool = False,
    solver: str = "gurobi",
    sampling_method: str = "gapsplit",
    protected_rxns: Optional[List[str]] = None,
    protect_no_expr: bool = False,
    sampling_n: int = 500,
    keep_context: bool = False,
    rxn_scaling_coefs: Dict[str, float] = None,
    discard_inf_score=True,
    thinning=1,
    processes=1,
    seed=None,
    **kwargs
)

Apply the sampling step of the RIPTiDe algorithm or prepare for it.

This step uses reaction expression scores (RALs) to define an objective function maximizing flux through high-expression reactions. It can optionally perform flux sampling on the model constrained by this objective.

Parameters:

Name	Type	Description	Default
`model`	`Model`	The input metabolic model, typically the result of RIPTiDe pruning.	required
`rxn_expr_score`	`dict[str, float]`	Dictionary mapping reaction IDs to their expression scores (RALs). NaN values are ignored. Scores outside [-max_inconsistency_score, max_inconsistency_score] are capped unless `discard_inf_score` is True.	required
`max_gw`	`float`	Maximum possible reaction expression score (RAL). If None, it's calculated as the maximum finite value in `rxn_expr_score`. Defaults to None.	`None`
`max_inconsistency_score`	`float`	Value to cap reaction scores at (positive and negative) if `discard_inf_score` is False. Defaults to 1e3.	`1000.0`
`obj_frac`	`float`	Fraction of the optimal objective value (based on maximizing flux through high-RAL reactions) to use as a constraint if `keep_context` is True or during sampling setup. Defaults to 0.8.	`0.8`
`sampling_obj_frac`	`float`	Fraction of the optimal objective value to maintain during flux sampling (passed to the sampler). Defaults to 0.8.	`0.8`
`do_sampling`	`bool`	If True, perform flux sampling after setting up the objective and constraints. If False, only sets up the model context. Defaults to False.	`False`
`solver`	`str`	Solver to use for optimization and sampling (e.g., 'gurobi', 'cplex'). Defaults to "gurobi".	`'gurobi'`
`sampling_method`	`str`	Flux sampling algorithm to use ('achr', 'optgp', 'gapsplit'). Defaults to "gapsplit".	`'gapsplit'`
`protected_rxns`	`list[str]`	List of reaction IDs to assign the maximum weight in the objective, regardless of their RAL. Defaults to None.	`None`
`protect_no_expr`	`bool`	If True, assign maximum weight to reactions not present in `rxn_expr_score`. Defaults to False.	`False`
`sampling_n`	`int`	Number of flux samples to generate if `do_sampling` is True. Defaults to 500.	`500`
`keep_context`	`bool`	If True, modify the input `model` by adding the RIPTiDe objective and constraining it based on `obj_frac`. If False, modifications happen within a context manager only during sampling. Defaults to False.	`False`
`rxn_scaling_coefs`	`dict[str, float]`	Dictionary mapping reaction IDs to scaling coefficients, used to adjust objective weights. Defaults to None (all coeffs 1).	`None`
`discard_inf_score`	`bool`	If True, treat infinite scores in `rxn_expr_score` as NaN (ignored). If False, cap them using `max_inconsistency_score`. Defaults to True.	`True`
`thinning`	`int`	Thinning factor for flux sampling (passed to sampler). Defaults to 1.	`1`
`processes`	`int`	Number of parallel processes for flux sampling. Defaults to 1.	`1`
`seed`	`int`	Random seed for flux sampling. Defaults to None.	`None`
`**kwargs`		Additional keyword arguments passed to the flux sampler.	`{}`

Returns:

Type	Description
`RIPTiDeSamplingAnalysis`	An object containing the results. Its `sampling_result` attribute (`SamplingAnalysis` or None) holds the flux sampling results if `do_sampling` was True, otherwise None.

Raises:

Type	Description
`ValueError`	If `max_gw` is less than the maximum score in `rxn_expr_score`.

Notes

This function sets up the model for RIPTiDe-based flux analysis or sampling. The objective function maximizes flux weighted by scaled RALs. See: Jenior, M. L., et al. (2020). Transcriptome-guided parsimonious flux analysis improves predictions with metabolic networks in complex environments. PLoS computational biology, 16(4), e1007099.
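The score handling described in the parameter table can be pictured with a short sketch. The helper name `prepare_scores` is hypothetical and not part of the documented API. It assumes that NaN scores are dropped, that infinite scores are dropped when `discard_inf_score` is True and capped otherwise, that finite scores are only capped when `discard_inf_score` is False, and that `max_gw` falls back to the largest remaining score.

```python
import math


def prepare_scores(rxn_expr_score, max_gw=None,
                   max_inconsistency_score=1e3, discard_inf_score=True):
    """Hypothetical helper mirroring the documented score handling."""
    cap = max_inconsistency_score
    cleaned = {}
    for rxn_id, score in rxn_expr_score.items():
        if math.isnan(score):
            continue  # NaN values are ignored
        if discard_inf_score:
            if math.isinf(score):
                continue  # infinite scores are treated like NaN
        else:
            # cap scores to [-cap, cap]; this also maps +/-inf onto the cap
            score = max(-cap, min(cap, score))
        cleaned[rxn_id] = score
    if not cleaned:
        raise ValueError("no usable expression scores")
    largest = max(cleaned.values())
    if max_gw is None:
        max_gw = largest  # assumed fallback: largest remaining score
    elif max_gw < largest:
        raise ValueError("max_gw is smaller than the maximum score")
    return cleaned, max_gw


scores = {"R1": 2.0, "R2": float("nan"), "R3": float("inf"), "R4": -5.0}
cleaned, max_gw = prepare_scores(scores)
capped, capped_gw = prepare_scores(scores, discard_inf_score=False)
```

With the default `discard_inf_score=True`, `R2` and `R3` are dropped. With `discard_inf_score=False`, `R3` is kept at the cap value instead.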

apply_EFlux ¶

apply_EFlux(
    model: Model,
    rxn_expr_score: Dict[str, float],
    max_ub: float = 1000,
    min_lb: float = 1e-06,
    min_score: float = -1000.0,
    protected_rxns: Union[str, List[str], None] = None,
    flux_threshold: float = 1e-06,
    remove_zero_fluxes: bool = False,
    return_fluxes: bool = True,
    transform: Union[Callable, str] = exp_x,
    rxn_scaling_coefs: Dict[str, float] = None,
    predefined_threshold=None,
) -> EFluxAnalysis

Apply the E-Flux algorithm to constrain model fluxes based on expression.

E-Flux uses reaction expression scores (e.g., derived from transcriptomics) to set reaction bounds. Scores are transformed and then linearly scaled so that the expression range [min_exp, max_exp] maps onto the flux range [min_lb, max_ub]. Highly expressed reactions therefore receive a higher flux capacity. Parsimonious FBA (pFBA) is then run on the constrained model.

Parameters:

Name	Type	Description	Default
`model`	`Model`	The input genome-scale metabolic model.	required
`rxn_expr_score`	`dict[str, float]`	Dictionary mapping reaction IDs to their expression scores. NaN values are handled. Scores below `min_score` are capped.	required
`max_ub`	`float`	The maximum flux bound assigned to the reaction(s) with the highest (transformed) expression score. Defaults to 1000.	`1000`
`min_lb`	`float`	The minimum flux bound assigned to the reaction(s) with the lowest (transformed) expression score. Defaults to 1e-6.	`1e-06`
`min_score`	`float`	Minimum expression score to consider; scores below this are capped at this value before transformation and scaling. Defaults to -1e3.	`-1000.0`
`protected_rxns`	`str or list[str] or None`	Reaction ID(s) to exclude from bound constraints. Defaults to None.	`None`
`flux_threshold`	`float`	Flux threshold used when `remove_zero_fluxes` is True. Reactions with absolute pFBA flux below this are removed. Defaults to 1e-6.	`1e-06`
`remove_zero_fluxes`	`bool`	If True, remove reactions with pFBA flux below `flux_threshold` from the final model. Defaults to False.	`False`
`return_fluxes`	`bool`	If True, include the pFBA flux distribution in the result object. Defaults to True.	`True`
`transform`	`callable or str`	Function or name of a function (from `pipeGEM.utils.transform.functions` or `numpy`) to apply to expression scores before scaling (e.g., `exp_x`). Defaults to `exp_x`.	`exp_x`
`rxn_scaling_coefs`	`dict[str, float]`	Dictionary mapping reaction IDs to scaling coefficients. Applied after scaling expression to bounds (divides the calculated bound). Defaults to None (all coeffs 1).	`None`
`predefined_threshold`	`any`	This parameter is currently ignored by E-Flux. Defaults to None.	`None`

Returns:

Type	Description
`EFluxAnalysis`	An object containing the results: `rxn_bounds` (dict), the final bounds applied to each reaction; `rxn_scores` (dict), the input reaction expression scores; `flux_result` (pd.DataFrame or None), the pFBA flux distribution if `return_fluxes` is True; `result_model` (cobra.Model), the model with E-Flux bounds applied (and pruned if `remove_zero_fluxes` is True).

Raises:

Type	Description
`AssertionError`	If `max_ub` <= 0, `min_lb` < 0, `max_ub` <= `min_lb`, or `max_exp` <= 0.
`ValueError`	If the denominator used for scaling becomes non-finite (e.g., due to `transform` function behavior or `max_exp` == `min_exp`).

Notes

Based on the method described in: Colijn, C., Brandes, A., Zucker, J., Lun, D. S., Weiner, B., Farhat, M. R., ... & Galagan, J. E. (2009). Interpreting expression data with metabolic flux models: predicting Mycobacterium tuberculosis mycolic acid production. PLoS computational biology, 5(8), e1000489. Implementation details such as the transformation and scaling may differ from the original method. Exchange reactions are typically excluded from bound setting.
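The linear mapping from transformed scores to bounds can be sketched as follows. This is an illustration under stated assumptions, not the library's implementation: the helper name `scale_scores_to_bounds` is hypothetical, the default transform is taken to be a plain exponential, and the interpolation formula is the simplest one consistent with the description (lowest transformed score maps to `min_lb`, highest to `max_ub`).

```python
import math


def scale_scores_to_bounds(rxn_expr_score, max_ub=1000.0, min_lb=1e-6,
                           min_score=-1000.0, transform=math.exp):
    """Hypothetical sketch of E-Flux-style bound scaling."""
    # Drop NaNs and cap low scores at min_score before transforming.
    capped = {r: max(s, min_score)
              for r, s in rxn_expr_score.items() if not math.isnan(s)}
    transformed = {r: transform(s) for r, s in capped.items()}
    min_exp = min(transformed.values())
    max_exp = max(transformed.values())
    denom = max_exp - min_exp
    if not math.isfinite(denom) or denom == 0:
        raise ValueError("scaling denominator is zero or non-finite")
    # Linear interpolation: min_exp -> min_lb, max_exp -> max_ub.
    return {r: min_lb + (x - min_exp) / denom * (max_ub - min_lb)
            for r, x in transformed.items()}


bounds = scale_scores_to_bounds({"R1": 0.0, "R2": 1.0, "R3": 2.0})
```

Because the transform is applied before scaling, an exponential transform spreads the bounds unevenly: `R2` ends up much closer to `min_lb` than to `max_ub`.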

apply_SPOT ¶

apply_SPOT(
    model: Model,
    rxn_expr_score: Dict[str, float],
    protected_rxns: Optional[Union[str, List[str]]] = None,
    obj_frac: float = 0.1,
    norm_ub: float = 10000.0,
    remove_zero_fluxes: bool = False,
    flux_threshold: float = 1e-06,
    return_fluxes: bool = True,
    keep_context: bool = False,
    rxn_scaling_coefs: Optional[Dict[str, float]] = None,
    predefined_threshold=None,
) -> SPOTAnalysis

Apply the SPOT algorithm to generate an expression-guided flux distribution.

SPOT (Simplified Pearson cOrrelation with Transcriptomic data) finds a flux distribution that maximises the correlation between reaction fluxes and gene-expression scores while keeping the model's metabolic objective (e.g. biomass) at or above a user-specified fraction of its FBA-optimal value.

The optimisation problem solved is:

$$
\begin{aligned}
\max_{v} \quad & \sum_i w_i \cdot v_i \\
\text{s.t.} \quad & f_{\text{FBA}} \geq \texttt{obj\_frac} \cdot f^*_{\text{FBA}} \\
& \sum_i (v_i^+ + v_i^-) \leq \texttt{norm\_ub} \\
& v \in \text{FBA feasible region}
\end{aligned}
$$

where $w_i = \texttt{rxn\_expr\_score}[i] \cdot \texttt{rxn\_scaling\_coefs}[i]$, and $v_i = v_i^+ - v_i^-$ splits each flux into non-negative forward and reverse parts.
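To make the optimisation problem above concrete, here is a minimal sketch on a three-reaction toy network. It uses `scipy.optimize.linprog` rather than optlang, and every name and number in it (the stoichiometric matrix, weights, and bounds) is made up for illustration; it is not how `apply_SPOT` is implemented.

```python
import numpy as np
from scipy.optimize import linprog

# Toy network with one metabolite A: R1: -> A, R2: A -> (biomass), R3: A ->
S = np.array([[1.0, -1.0, -1.0]])
lb = np.zeros(3)
ub = np.full(3, 10.0)
c_fba = np.array([0.0, 1.0, 0.0])   # metabolic objective: flux through R2
w = np.array([0.0, 1.0, 5.0])       # expression weights w_i
obj_frac, norm_ub = 0.1, 15.0
n_met, n_rxn = S.shape

# Step 1: plain FBA to obtain the optimal objective value f*.
fba = linprog(-c_fba, A_eq=S, b_eq=np.zeros(n_met),
              bounds=list(zip(lb, ub)), method="highs")
f_star = -fba.fun

# Step 2: SPOT-like LP over x = [v_plus, v_minus], with v = v_plus - v_minus.
cost = np.concatenate([-w, w])                 # minimise -(w . v)
A_eq = np.hstack([S, -S])                      # steady state: S v = 0
A_ub = np.vstack([
    np.concatenate([-c_fba, c_fba]),           # c_fba . v >= obj_frac * f*
    np.ones(2 * n_rxn),                        # sum(v_plus + v_minus) <= norm_ub
])
b_ub = np.array([-obj_frac * f_star, norm_ub])
var_bounds = ([(0.0, max(0.0, u)) for u in ub]
              + [(0.0, max(0.0, -l)) for l in lb])
res = linprog(cost, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=np.zeros(n_met),
              bounds=var_bounds, method="highs")
v = res.x[:n_rxn] - res.x[n_rxn:]
```

In this toy case the L1 constraint is what limits the solution: without it, the highly weighted `R3` would be pushed towards its upper bound, which is the behaviour `norm_ub` is meant to prevent.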

Parameters:

Name	Type	Description	Default
`model`	`Model`	The input genome-scale metabolic model with a defined objective function representing the required metabolic functionality.	required
`rxn_expr_score`	`dict[str, float]`	Mapping of reaction IDs to expression scores. `NaN` values are ignored.	required
`protected_rxns`	`str or list[str] or None`	Reaction IDs excluded from the SPOT objective (their expression scores do not contribute to the weighted sum). Defaults to None.	`None`
`obj_frac`	`float`	Fraction of the FBA-optimal objective value that must be maintained as a lower-bound constraint during SPOT optimisation. Set to 0 to omit the FBA constraint (free maximisation). Defaults to 0.1.	`0.1`
`norm_ub`	`float`	Upper bound for the L1 flux-sum constraint `Σ(v_i^+ + v_i^-) ≤ norm_ub`. Prevents the solver from exploiting highly-expressed reactions at unbounded flux. Defaults to 1e4.	`10000.0`
`remove_zero_fluxes`	`bool`	If `True`, build a `result_model` by removing reactions whose absolute flux in the SPOT solution is ≤ `flux_threshold`. Defaults to `False`.	`False`
`flux_threshold`	`float`	Flux cutoff used when `remove_zero_fluxes=True`. Defaults to 1e-6.	`1e-06`
`return_fluxes`	`bool`	If `True`, store the SPOT flux distribution in the result object. Defaults to `True`.	`True`
`keep_context`	`bool`	If `True`, the SPOT modifications (FBA constraint, norm constraint, SPOT objective) are applied permanently to the input model. If `False` (default), all modifications are made inside a context manager and reverted afterwards.	`False`
`rxn_scaling_coefs`	`dict[str, float] or None`	Per-reaction scaling coefficients that multiply the expression weights before forming the objective. Defaults to None (all 1.0).	`None`
`predefined_threshold`	`any`	Currently unused by SPOT; accepted for API consistency. Defaults to None.	`None`

Returns:

Type	Description
`SPOTAnalysis`	Object containing: `flux_result` (pandas.DataFrame or None) — SPOT flux distribution if `return_fluxes=True`. `result_model` (cobra.Model or None) — pruned model if `remove_zero_fluxes=True`, otherwise None. `rxn_scores` (dict) — the original `rxn_expr_score` input.

Notes

Based on: Becker, S. A., & Palsson, B. Ø. (2008). Context-specific metabolic networks are consistent with experiments. PLoS computational biology, 4(5), e1000082. (SPOT is a variant of this family of methods.) The L1 norm constraint is implemented directly via the optlang API so that the function works with GLPK, CPLEX, and Gurobi without requiring any solver-specific imports.

Enzyme-constrained integration¶

apply_gecko_light ¶

apply_gecko_light(
    model,
    enzyme_data,
    protein_abundance=None,
    sigma=0.5,
    f_factor=0.5,
    ptot=0.5,
    copy_model=True,
    protected_rxns=None,
)

Apply simple kcat-based enzyme constraints (GECKO-light).

For every reaction in model that has associated kcat data, the upper bound is constrained to:

    new_ub = kcat [1/s] * abundance [mmol/gDW] * sigma * 3600

The factor 3600 converts per-second rates to per-hour rates, matching the usual COBRA flux units (mmol / gDW / h). If absolute protein abundance is not provided, ptot * f_factor is used as a coarse fallback abundance scale, so ptot and f_factor only influence the result in that case.
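As a worked instance of the formula above, the following sketch computes the bound for a single reaction. The function name and the input values are illustrative only.

```python
SECONDS_PER_HOUR = 3600


def gecko_light_upper_bound(kcat_per_s, abundance_mmol_per_gdw, sigma=0.5):
    """Upper bound in mmol/gDW/h from kcat [1/s] and abundance [mmol/gDW]."""
    return kcat_per_s * abundance_mmol_per_gdw * sigma * SECONDS_PER_HOUR


# Example values (made up): kcat = 100 1/s, abundance = 1e-5 mmol/gDW.
new_ub = gecko_light_upper_bound(kcat_per_s=100.0,
                                 abundance_mmol_per_gdw=1e-5)
print(new_ub)  # approximately 1.8 mmol/gDW/h
```

Halving `sigma` halves the bound, so the saturation factor acts as a uniform scaling on all kcat-constrained reactions.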

Parameters:

Name	Type	Description	Default
`model`	`Model`	The metabolic model to constrain.	required
`enzyme_data`	`EnzymeData`	Enzyme data aligned with the model (must have been `.align()`-ed).	required
`protein_abundance`	`ProteinAbundanceData`	Protein abundance data. If `None`, abundance is approximated as `ptot * f_factor` for every enzyme.	`None`
`sigma`	`float`	Average enzyme saturation factor (0 – 1). Default 0.5.	`0.5`
`f_factor`	`float`	Fraction of the proteome that is metabolic enzymes (0 to 1). Used as part of the fallback abundance scale when protein abundance is not provided. Default 0.5.	`0.5`
`ptot`	`float`	Total protein content in g / gDW. Used as part of the fallback abundance scale when protein abundance is not provided. Default 0.5.	`0.5`
`copy_model`	`bool`	If `True` (default), work on a deep-copy of model.	`True`
`protected_rxns`	`list of str`	Reaction IDs whose bounds should not be modified.	`None`

Returns:

Type	Description
`GECKOLightAnalysis`

apply_gecko_full ¶

apply_gecko_full(
    model,
    enzyme_data,
    protein_abundance=None,
    sigma=0.5,
    ptot=0.5,
    f_factor=0.5,
    copy_model=True,
    protected_rxns=None,
)

Build a full enzyme-constrained model (ecModel).

The GECKO formulation constrains total enzyme usage through a shared protein pool. Each enzyme-catalysed reaction draws from this pool in proportion to MW / kcat.
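The MW / kcat proportionality can be illustrated with a small calculation. This is a sketch under assumptions that the text above does not state: molecular weights are given in g/mol, kcat in 1/s, and the pool capacity is taken to be `ptot * f_factor * sigma` as in the original GECKO formulation. The helper names are hypothetical.

```python
def pool_usage_coefficient(mw_g_per_mol, kcat_per_s):
    """Protein mass (g/gDW) consumed per unit flux (mmol/gDW/h)."""
    mw_g_per_mmol = mw_g_per_mol / 1000.0
    kcat_per_h = kcat_per_s * 3600.0
    return mw_g_per_mmol / kcat_per_h


def pool_capacity(ptot=0.5, f_factor=0.5, sigma=0.5):
    """Assumed size of the shared enzyme pool in g/gDW."""
    return ptot * f_factor * sigma


# Made-up enzyme: 50 kDa with kcat = 100 1/s.
coef = pool_usage_coefficient(mw_g_per_mol=50_000.0, kcat_per_s=100.0)
capacity = pool_capacity()
# Largest flux this reaction could carry if it used the whole pool alone.
max_flux_alone = capacity / coef
```

Heavy or slow enzymes have large coefficients and exhaust the pool quickly, which is how the shared constraint penalises costly pathways.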

Parameters:

Name	Type	Description	Default
`model`	`Model`	The metabolic model.	required
`enzyme_data`	`EnzymeData`	Enzyme data aligned with the model.	required
`protein_abundance`	`ProteinAbundanceData`	Protein abundance data (currently used for logging only; the pool constraint implicitly limits usage).	`None`
`sigma`	`float`	Average enzyme saturation factor (0 – 1).	`0.5`
`ptot`	`float`	Total protein content in g / gDW.	`0.5`
`f_factor`	`float`	Fraction of the proteome that is metabolic enzymes (0 – 1).	`0.5`
`copy_model`	`bool`	Work on a deep-copy of model (default `True`).	`True`
`protected_rxns`	`list of str`	Reaction IDs whose bounds should not be modified.	`None`

Returns:

Type	Description
`GECKOFullAnalysis`

auto_parameterize ¶

auto_parameterize(
    model,
    enzyme_data: EnzymeData,
    kcat_source: Literal[
        "brenda", "sabio-rk", "manual"
    ] = "manual",
    fill_missing: Literal[
        "median", "geometric_mean", "dlkcat"
    ] = "median",
    organism: str = "human",
    metabolite_data=None,
    device: str = "cpu",
) -> EnzymeData

Automated parameter collection and estimation pipeline.

Steps

1. (Optional) Fetch kcat values from a database (BRENDA, SABIO-RK).
2. Match to model reactions via EC numbers.
3. Fill missing kcat values using the specified strategy.
4. Optionally use DLKcat for prediction of remaining missing values.
5. Return the enriched `EnzymeData`.
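The difference between the `"median"` and `"geometric_mean"` fill strategies in the steps above can be sketched with the standard library. The helper `fill_missing_kcats` and the plain-dict input are hypothetical stand-ins; the real function operates on `EnzymeData`.

```python
import math
import statistics


def fill_missing_kcats(kcats, strategy="median"):
    """Hypothetical sketch: fill None/NaN kcat values from the known ones."""
    def is_missing(k):
        return k is None or math.isnan(k)

    known = [k for k in kcats.values() if not is_missing(k)]
    if not known:
        raise ValueError("no known kcat values to derive a fill value from")
    if strategy == "median":
        fill = statistics.median(known)
    elif strategy == "geometric_mean":
        fill = statistics.geometric_mean(known)
    else:
        raise ValueError(f"unsupported strategy: {strategy}")
    return {r: (fill if is_missing(k) else k) for r, k in kcats.items()}


kcats = {"R1": 1.0, "R2": 10.0, "R3": 1000.0, "R4": None}
by_median = fill_missing_kcats(kcats, "median")
by_geomean = fill_missing_kcats(kcats, "geometric_mean")
```

Since kcat values span orders of magnitude, the two strategies can give noticeably different fill values on skewed data, as they do for `R4` here.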

Parameters:

Name	Type	Description	Default
`model`	`Model`	The metabolic model.	required
`enzyme_data`	`EnzymeData`	Existing enzyme data (may have missing kcat values).	required
`kcat_source`	`str`	Source for kcat values: `"brenda"`, `"sabio-rk"`, or `"manual"` (use only what is already in enzyme_data).	`'manual'`
`fill_missing`	`str`	Strategy to fill missing kcat values: `"median"` — use the median of available kcats, `"geometric_mean"` — use the geometric mean, `"dlkcat"` — use DLKcat deep-learning prediction.	`'median'`
`organism`	`str`	Organism name (used for database queries).	`'human'`
`metabolite_data`	`MetaboliteData`	Metabolite data with SMILES (required when fill_missing is `"dlkcat"`).	`None`
`device`	`str`	Device for DLKcat (`"cpu"` or `"cuda"`).	`'cpu'`

Returns:

Type	Description
`EnzymeData`	The enriched enzyme data with filled kcat values.