Integration
Core integration
GIMME
GIMME()
Bases: RemovableGeneDataIntegrator
integrate
integrate(model, data, **kwargs)
Integrate the given data with the model.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `model` | | The model to be integrated with the data | required |
| `data` | | Gene data used to determine the objective function of GIMME | required |
| `kwargs` | | Keyword arguments passed to apply_GIMME | `{}` |

Returns:

| Name | Type | Description |
|---|---|---|
| `result` | `GIMMEAnalysis` | |
apply_FASTCORE
apply_FASTCORE(
C: Union[List[str], Set[str]],
nonP: Union[List[str], Set[str]],
model: Model,
epsilon: float,
return_model: bool,
copy_model: bool = True,
raise_err: bool = True,
rxn_scaling_coefs: dict = None,
calc_efficacy: bool = True,
) -> FASTCOREAnalysis
Apply the FASTCORE algorithm to extract a flux-consistent subnetwork.
FASTCORE searches a given metabolic model for a compact, flux-consistent set of reactions that contains every reaction in a defined core set (C). Non-penalty reactions (nonP) may be added during the sparse-mode search without being penalized in the objective function.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `C` | `list[str]` or `set[str]` | Core reaction IDs that must be included and carry flux. | required |
| `nonP` | `list[str]` or `set[str]` | Non-penalty reaction IDs. Included if needed but not prioritized. | required |
| `model` | `Model` | Input genome-scale metabolic model. | required |
| `epsilon` | `float` | Tolerance threshold for flux consistency checks. Flux values below this are considered zero. Adjusted per reaction if | required |
| `return_model` | `bool` | If True, return the extracted subnetwork as a cobra.Model object. | required |
| `copy_model` | `bool` | If True (default), operate on a copy of the input model. If False, modify the input model directly. | `True` |
| `raise_err` | `bool` | If True (default), raise ValueError on inconsistency. If False, warn and potentially remove problematic core reactions. | `True` |
| `rxn_scaling_coefs` | `dict` | Mapping of reaction IDs to scaling coefficients to adjust | `None` |
| `calc_efficacy` | `bool` | If True (default), calculate efficacy metrics. | `True` |

Returns:

| Type | Description |
|---|---|
| `FASTCOREAnalysis` | An object containing the results: - result_model (cobra.Model, optional): Extracted subnetwork (if |

Raises:

| Type | Description |
|---|---|
| `ValueError` | If |
Notes
Based on the algorithm described in: Vlassis, N., Pacheco, M. P., & Sauter, T. (2014). Fast reconstruction of compact context-specific metabolic network models. PLoS computational biology, 10(1), e1003424.
apply_CORDA
apply_CORDA(
model,
data,
protected_rxns=None,
predefined_threshold=None,
threshold_kws=None,
rxn_scaling_coefs=None,
discrete_strategy_name: str = "linear",
n_iters=np.inf,
penalty_factor=100,
penalty_increase_factor=1.1,
keep_if_support=5,
met_prod=None,
upper_bound=1000000.0,
threshold=1e-06,
support_flux_value=1,
skip_last_step=True,
) -> CORDA_Analysis
Apply the CORDA algorithm to generate a context-specific metabolic model.
Orchestrates the CORDA process:
1. Prepare model and data (thresholds, confidence scores, protected reactions).
2. Initialize and run CORDABuilder.
3. Calculate efficacy metrics.
4. Return results in a CORDA_Analysis object.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `model` | `Model` | Input genome-scale metabolic model. | required |
| `data` | `object` | Object with reaction scores ( | required |
| `protected_rxns` | `list[str]` | Reaction IDs to force into the core set (confidence 3). Defaults to None. | `None` |
| `predefined_threshold` | `str` | Name of predefined thresholding strategy (e.g., 'percentile_90'). Requires | `None` |
| `threshold_kws` | `dict` | Additional keyword arguments for the thresholding function (used with | `None` |
| `rxn_scaling_coefs` | `dict` | Mapping of reaction IDs to scaling coefficients to adjust flux thresholds. Defaults to None. | `None` |
| `discrete_strategy_name` | `str` | Strategy to convert continuous scores to discrete confidence levels ('linear'). Defaults to "linear". | `'linear'` |
| `n_iters` | `int` | Max iterations for finding support reactions in CORDABuilder. Defaults to infinity. | `inf` |
| `penalty_factor` | `float` | Initial penalty factor in CORDABuilder. Defaults to 100. | `100` |
| `penalty_increase_factor` | `float` | Penalty increase factor in CORDABuilder. Defaults to 1.1. | `1.1` |
| `keep_if_support` | `int` | Support threshold for elevating medium confidence in CORDABuilder. Defaults to 5. | `5` |
| `met_prod` | `list[str]` | Metabolite IDs for which mock production reactions should be added and forced into the core set. Defaults to None. | `None` |
| `upper_bound` | `float` | High upper bound for reactions during optimization. Defaults to 1e6. | `1000000.0` |
| `threshold` | `float` | Flux threshold below which flux is considered zero. Defaults to 1e-6. | `1e-06` |
| `support_flux_value` | `float` or `dict` | Minimum flux required for support reactions in CORDABuilder. Defaults to 1. | `1` |
| `skip_last_step` | `bool` | Whether to skip the final refinement step in CORDABuilder. Defaults to True. | `True` |

Returns:

| Type | Description |
|---|---|
| `CORDA_Analysis` | Object containing results: context-specific model, confidence scores, removed reactions, efficacy metrics, and logs. |
Note
Original paper: Schultz, A., & Qutub, A. A. (2016). Reconstruction of tissue-specific metabolic networks using CORDA. PLoS computational biology, 12(3), e1004808.
apply_rFASTCORMICS
apply_rFASTCORMICS(
model: Model,
data,
protected_rxns: List[str] = None,
predefined_threshold: Optional[
Union[dict, analysis_types]
] = None,
threshold_kws: dict = None,
rxn_scaling_coefs: dict = None,
consistent_checking_method: Literal[
"FASTCC", "FVA"
] = "FASTCC",
unpenalized_subsystem: Union[
str, List[str]
] = "Transport.*",
method: str = "onestep",
threshold: float = 1e-06,
FASTCORE_raise_error: bool = False,
calc_efficacy: bool = True,
) -> rFASTCORMICSAnalysis
Apply the rFASTCORMICS algorithm to build a context-specific model.
Leverages expression data to define core/non-core reaction sets and uses FASTCORE to extract a consistent subnetwork. Optionally includes model consistency checking and handling of protected reactions and unpenalized subsystems.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `model` | `Model` | Input genome-scale metabolic model. | required |
| `data` | `object` | Object with gene expression data ( | required |
| `protected_rxns` | `list[str]` | Reaction IDs always included in the core set. Defaults to None. | `None` |
| `predefined_threshold` | `dict` or `analysis_types` | Strategy or dictionary defining thresholds to classify reactions based on scores (e.g., 'percentile_90'). See | `None` |
| `threshold_kws` | `dict` | Additional keyword arguments for the thresholding function. Defaults to None. | `None` |
| `rxn_scaling_coefs` | `dict` | Mapping of reaction IDs to scaling coefficients to adjust flux thresholds in FASTCORE. Defaults to None. | `None` |
| `consistent_checking_method` | `'FASTCC'` or `'FVA'` | Method to ensure initial model consistency ('FASTCC' or 'FVA'). Set to None to skip. Defaults to "FASTCC". | `'FASTCC'` |
| `unpenalized_subsystem` | `str` or `list[str]` | Subsystem name(s) (regex allowed) included in the non-penalty set (nonP) during FASTCORE. Defaults to "Transport.*". | `'Transport.*'` |
| `method` | `'onestep'` or `'twostep'` | rFASTCORMICS variant: - 'onestep': Run FASTCORE once with core and non-penalty sets. - 'twostep': Run FASTCORE on protected reactions, refine, run again on expanded core set. (May need validation). Defaults to "onestep". | `'onestep'` |
| `threshold` | `float` | Flux threshold below which flux is considered zero. Defaults to 1e-6. | `1e-06` |
| `FASTCORE_raise_error` | `bool` | If True, FASTCORE raises error on inconsistency. If False, warns. Defaults to False. | `False` |
| `calc_efficacy` | `bool` | If True, calculate efficacy metrics based on expression-defined sets. Defaults to True. | `True` |

Returns:

| Type | Description |
|---|---|
| `rFASTCORMICSAnalysis` | Object containing results: context-specific model (in nested FASTCORE result), core/non-core sets, thresholding analysis, efficacy metrics. |
Notes
Original paper: Pacheco, M. P., Bintener, T., Ternes, D., Kulms, D., Haan, S., Letellier, E., & Sauter, T. (2019). Identifying and targeting cancer-specific metabolism with network-based drug target prediction. EBioMedicine, 43, 98-106.
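Example
The unpenalized_subsystem parameter of apply_rFASTCORMICS accepts one or more regular expressions that select the non-penalty set. The sketch below shows one way such a selection could work. The helper select_unpenalized and the reaction-to-subsystem mapping are hypothetical and not part of the library, and requiring the whole subsystem name to match the pattern is an assumption.

```python
import re
from typing import Dict, List, Union


def select_unpenalized(
    rxn_subsystems: Dict[str, str],
    unpenalized_subsystem: Union[str, List[str]] = "Transport.*",
) -> List[str]:
    """Return reaction IDs whose subsystem name matches any given pattern."""
    if isinstance(unpenalized_subsystem, str):
        patterns = [unpenalized_subsystem]
    else:
        patterns = list(unpenalized_subsystem)
    compiled = [re.compile(p) for p in patterns]
    return [
        rxn_id
        for rxn_id, subsystem in rxn_subsystems.items()
        # Assumption: the whole subsystem name must match the pattern.
        if any(c.fullmatch(subsystem) for c in compiled)
    ]


# Hypothetical reaction-to-subsystem mapping, for illustration only.
subsystems = {
    "R1": "Transport, mitochondrial",
    "R2": "Glycolysis",
    "R3": "Transport",
    "R4": "Exchange/demand reaction",
}
non_penalty = select_unpenalized(subsystems)
non_penalty_wide = select_unpenalized(subsystems, ["Transport.*", "Exchange.*"])
```

With the default pattern only the two transport reactions are selected; passing a list widens the non-penalty set to exchange reactions as well.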
apply_iMAT
apply_iMAT(
model,
data,
predefined_threshold,
threshold_kws: dict,
protected_rxns=None,
rxn_scaling_coefs=None,
eps=1e-06,
tol=1e-06,
use_gurobi=False,
) -> iMAT_Analysis
Apply the iMAT algorithm to generate a context-specific metabolic model.
iMAT (integrative Metabolic Analysis Tool) uses gene expression data to classify reactions into high-confidence (core) and low-confidence (non-core) sets. It then solves a mixed-integer linear programming (MILP) problem to find a flux distribution that maximizes activity through core reactions while minimizing activity through non-core reactions. Reactions with near-zero flux in the optimal solution are removed.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `model` | `Model` | The input genome-scale metabolic model. | required |
| `data` | `object` | An object containing gene expression data ( | required |
| `predefined_threshold` | `dict` or `analysis_types` | Strategy or dictionary defining thresholds ( | required |
| `threshold_kws` | `dict` | Additional keyword arguments for the thresholding function. | required |
| `protected_rxns` | `list[str]` | A list of reaction IDs that should always be treated as high-confidence (core) and potentially weighted higher in the objective. Defaults to None. | `None` |
| `rxn_scaling_coefs` | `dict[str, float]` | Dictionary mapping reaction IDs to scaling coefficients. Currently unused in the main logic but potentially used for tolerance adjustment. Defaults to None. | `None` |
| `eps` | `float` | Small flux value used in constraints to enforce activity through core reactions selected by the MILP. Defaults to 1e-6. | `1e-06` |
| `tol` | `float` | Flux tolerance threshold. Reactions with absolute flux below this value in the MILP solution are removed from the final model. Defaults to 1e-6. | `1e-06` |
| `use_gurobi` | `bool` | If True, use Gurobi-specific indicator constraints for potentially better performance. Requires Gurobi solver. Defaults to False. | `False` |

Returns:

| Type | Description |
|---|---|
| `iMAT_Analysis` | An object containing the results: - result_model (cobra.Model): The final context-specific model. - removed_rxn_ids (np.ndarray): IDs of removed reactions. - threshold_analysis (ThresholdAnalysis): Details of thresholding used. |
Notes
Based on the algorithm described in: Shlomi, T., Cabili, M. N., Herrgård, M. J., Palsson, B. Ø., & Ruppin, E. (2008). Network-based prediction of human tissue-specific metabolism. Nature biotechnology, 26(9), 1003-1010. The implementation uses binary indicator variables to control reaction activity.
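Example
The following sketch illustrates the two ideas described for apply_iMAT: scoring how well a flux distribution agrees with the core and non-core sets, and listing the reactions whose absolute flux falls below tol. It does not solve the MILP. The helper names and the flux values are hypothetical, and using eps as the activity cutoff in the agreement score is an assumption.

```python
from typing import Dict, Iterable, List


def agreement_score(
    fluxes: Dict[str, float],
    core: Iterable[str],
    non_core: Iterable[str],
    eps: float = 1e-6,
) -> int:
    """Count active core reactions plus inactive non-core reactions."""
    active_core = sum(1 for r in core if abs(fluxes.get(r, 0.0)) >= eps)
    inactive_non_core = sum(1 for r in non_core if abs(fluxes.get(r, 0.0)) < eps)
    return active_core + inactive_non_core


def near_zero_reactions(fluxes: Dict[str, float], tol: float = 1e-6) -> List[str]:
    """Return IDs of reactions whose absolute flux is below tol."""
    return [r for r, v in fluxes.items() if abs(v) < tol]


# Hypothetical flux distribution, for illustration only.
fluxes = {"R1": 2.5, "R2": 0.0, "R3": -1e-9, "R4": -0.3}
score = agreement_score(fluxes, core=["R1", "R2"], non_core=["R3", "R4"])
removed = near_zero_reactions(fluxes)
```

Here R1 is an active core reaction and R3 is an inactive non-core reaction, so the agreement score is 2, while R2 and R3 would be removed.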
apply_mCADRE
apply_mCADRE(
model,
data,
protected_rxns,
predefined_threshold=None,
threshold_kws: dict = None,
rxn_scaling_coefs: dict = None,
exp_cutoff: float = 0.9,
absent_value: float = 0,
absent_value_indicator: float = -1e-06,
tol=1e-06,
eta=0.333,
evidence_scores: Union[
Dict[str, Union[int, float]], Series
] = None,
salvage_check_tasks=None,
default_salv_test=False,
func_test_tasks=None,
required_met_ids=None,
default_func_test=False,
) -> mCADRE_Analysis
Apply the mCADRE algorithm to generate a context-specific metabolic model.
mCADRE (metabolic Context-specificity Assessed by Deterministic Reaction Evaluation) builds context-specific models by iteratively removing reactions based on expression data, network connectivity, and optional evidence scores, while ensuring the model can still perform essential metabolic functions (tasks).
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `model` | `Model` | The input genome-scale metabolic model. | required |
| `data` | `object` | An object containing gene expression data ( | required |
| `protected_rxns` | `list[str]` | A list of reaction IDs that should never be removed from the model. | required |
| `predefined_threshold` | `dict` or `analysis_types` | Strategy or dictionary defining thresholds ( | `None` |
| `threshold_kws` | `dict` | Additional keyword arguments for the thresholding function. Defaults to None. | `None` |
| `rxn_scaling_coefs` | `dict[str, float]` | Dictionary mapping reaction IDs to scaling coefficients, used to adjust consistency check tolerance. Defaults to None. | `None` |
| `exp_cutoff` | `float` | Expression score threshold (after mapping to [0, 1]) above which a reaction is considered part of the initial 'core' set. Defaults to 0.9. | `0.9` |
| `absent_value` | `float` | The raw score in | `0` |
| `absent_value_indicator` | `float` | The internal score assigned to absent reactions after mapping. Should be less than 0. Defaults to -1e-6. | `-1e-06` |
| `tol` | `float` | Tolerance used for consistency checks (e.g., FASTCC). Defaults to 1e-6. | `1e-06` |
| `eta` | `float` | Weighting factor used in the consistency check stopping criteria when evaluating removal of medium-confidence reactions. Represents the trade-off between removing non-core reactions and keeping core reactions. Defaults to 0.333. | `0.333` |
| `evidence_scores` | `dict[str, Union[int, float]]` or `Series` | Additional evidence scores for reactions (e.g., from literature, proteomics). Higher scores favor keeping the reaction. Defaults to None (all zero). | `None` |
| `salvage_check_tasks` | `TaskContainer` or `str` | Metabolic tasks (e.g., salvage pathways) that the final model must be able to perform. Can be a TaskContainer object or a path to a task file. Defaults to None. | `None` |
| `default_salv_test` | `bool` | If True, use predefined default salvage pathway tasks (Guanine -> GMP, Hypoxanthine -> IMP). Defaults to False. | `False` |
| `func_test_tasks` | `TaskContainer` or `str` | General metabolic function tasks that the final model must be able to perform. Defaults to None. | `None` |
| `required_met_ids` | `list[str]` | List of metabolite IDs that the model must be able to produce (used if | `None` |
| `default_func_test` | `bool` | If True and | `False` |

Returns:

| Type | Description |
|---|---|
| `mCADRE_Analysis` | An object containing the results: - result_model (cobra.Model): The final context-specific model. - removed_rxn_ids (np.ndarray): IDs of removed reactions. - core_rxn_ids (np.ndarray): IDs of reactions initially defined as core. - non_expressed_rxn_ids (np.ndarray): IDs of reactions initially defined as non-expressed. - score_df (pd.DataFrame): DataFrame with expression, connectivity, and evidence scores. - salvage_test_result (TaskAnalysis or None): Results of salvage pathway tests. - func_test_result (TaskAnalysis or None): Results of functional tests. - threshold_analysis (ThresholdAnalysis): Details of thresholding used. - algo_efficacy (float): Efficacy score (e.g., F1) comparing the final model against the initial core/non-core sets. |

Raises:

| Type | Description |
|---|---|
| `RuntimeError` | If the initial model fails any of the provided functional or salvage tests. |
Notes
Based on the algorithm described in: Wang, Y., Eddy, J. A., & Price, N. D. (2012). Reconstruction of genome-scale metabolic models for 126 human tissues using mCADRE. BMC systems biology, 6, 1-16.
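Example
The exp_cutoff and absent_value_indicator parameters of apply_mCADRE determine how reactions are initially grouped. The sketch below is a hypothetical helper, not library code. It assumes the scores are already mapped to [0, 1], that absent reactions carry a negative score, and that the cutoff comparison is strict.

```python
from typing import Dict, List, Tuple


def partition_by_expression(
    mapped_scores: Dict[str, float],
    exp_cutoff: float = 0.9,
) -> Tuple[List[str], List[str], List[str]]:
    """Split reactions into core, non-core, and absent groups."""
    core, non_core, absent = [], [], []
    for rxn_id, score in mapped_scores.items():
        if score < 0:
            # Assumption: a negative score marks an absent reaction.
            absent.append(rxn_id)
        elif score > exp_cutoff:
            # Assumption: only scores strictly above the cutoff are core.
            core.append(rxn_id)
        else:
            non_core.append(rxn_id)
    return core, non_core, absent


# Hypothetical mapped scores; R4 uses the default absent_value_indicator.
mapped = {"R1": 0.95, "R2": 0.9, "R3": 0.4, "R4": -1e-6}
core, non_core, absent = partition_by_expression(mapped)
```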
apply_INIT
apply_INIT(
model,
data,
predefined_threshold,
threshold_kws: dict,
protected_rxns=None,
eps=1e-06,
tol=1e-06,
weight_method: Literal[
"default", "threshold"
] = "threshold",
rxn_scaling_coefs: dict = None,
) -> INIT_Analysis
Apply the INIT algorithm to generate a context-specific metabolic model.
INIT (Integrative Network Inference for Tissues) uses expression data to assign weights to reactions. It then solves a mixed-integer linear programming (MILP) problem, similar to iMAT, to find a flux distribution that maximizes the sum of weights for active reactions. Reactions with near-zero flux in the optimal solution are removed.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `model` | `Model` | The input genome-scale metabolic model. | required |
| `data` | `object` | An object containing gene expression data ( | required |
| `predefined_threshold` | `dict` or `analysis_types` | Strategy or dictionary defining thresholds ( | required |
| `threshold_kws` | `dict` | Additional keyword arguments for the thresholding function if | required |
| `protected_rxns` | `list[str]` | A list of reaction IDs that should always be treated as core reactions and potentially assigned a high weight. Defaults to None. | `None` |
| `eps` | `float` | Small flux value used in constraints to enforce activity through core reactions selected by the MILP (inherited from iMAT constraints). Defaults to 1e-6. | `1e-06` |
| `tol` | `float` | Flux tolerance threshold. Reactions with absolute flux below this value in the MILP solution are removed from the final model. Defaults to 1e-6. | `1e-06` |
| `weight_method` | `'default'` or `'threshold'` | Method to calculate reaction weights from scores: - 'default': Uses 5 * log(score). - 'threshold': Uses linear interpolation based on | `'threshold'` |
| `rxn_scaling_coefs` | `dict[str, float]` | Dictionary mapping reaction IDs to scaling coefficients. Currently unused in the main logic but potentially used for tolerance adjustment. Defaults to None. | `None` |

Returns:

| Type | Description |
|---|---|
| `INIT_Analysis` | An object containing the results: - result_model (cobra.Model): The final context-specific model. - removed_rxn_ids (np.ndarray): IDs of removed reactions. - threshold_analysis (ThresholdAnalysis or None): Details of thresholding used if |
Notes
Based on the algorithm described in: Agren, R., Bordel, S., Mardinoglu, A., Pornputtapong, N., Nookaew, I., & Nielsen, J. (2012). Reconstruction of genome-scale active metabolic networks for 69 human cell types and 16 cancer types using INIT. PLoS computational biology, 8(5), e1002518. The implementation leverages the MILP formulation structure from iMAT.
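Example
The 'default' weight method of apply_INIT is documented as 5 * log(score). The sketch below evaluates that formula for a few hypothetical scores. The helper name is not part of the library; using the natural logarithm and skipping non-positive or NaN scores are assumptions.

```python
import math
from typing import Dict


def init_default_weights(rxn_scores: Dict[str, float]) -> Dict[str, float]:
    """Compute 5 * log(score) for every reaction with a positive score."""
    weights = {}
    for rxn_id, score in rxn_scores.items():
        if math.isnan(score) or score <= 0:
            # Assumption: the logarithm is undefined here, so no weight is set.
            continue
        weights[rxn_id] = 5 * math.log(score)
    return weights


# Hypothetical scores: above, equal to, and below 1, plus a zero score.
weights = init_default_weights({"R1": 1.0, "R2": math.e, "R3": 0.5, "R4": 0.0})
```

Scores above 1 give positive weights, which reward keeping the reaction active, while scores below 1 give negative weights, which penalize it.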
apply_MBA
apply_MBA(
model,
data=None,
predefined_threshold=None,
threshold_kws: dict = None,
protected_rxns=None,
rxn_scaling_coefs: dict = None,
medium_conf_rxn_ids=None,
high_conf_rxn_ids=None,
consistent_checking_method: str = "FASTCC",
tolerance: float = 1e-08,
epsilon: float = 0.33,
random_state: int = 42,
)
Apply the Model Building Algorithm (MBA) to generate a context-specific model.
MBA iteratively removes reactions with no confidence score ('no-confidence' set) based on consistency checks, while preserving high-confidence reactions and minimizing the removal of medium-confidence reactions.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `model` | `Model` | The input genome-scale metabolic model. | required |
| `data` | `object` | An object containing gene expression data ( | `None` |
| `predefined_threshold` | `dict` or `analysis_types` | Strategy or dictionary defining thresholds ( | `None` |
| `threshold_kws` | `dict` | Additional keyword arguments for the thresholding function when | `None` |
| `protected_rxns` | `list[str]` | A list of reaction IDs that should always be treated as high-confidence and never removed. Defaults to None. | `None` |
| `rxn_scaling_coefs` | `dict[str, float]` | Dictionary mapping reaction IDs to scaling coefficients, used to adjust consistency check tolerance. Defaults to None. | `None` |
| `medium_conf_rxn_ids` | `list[str]` | List of reaction IDs considered medium confidence. Used only if | `None` |
| `high_conf_rxn_ids` | `list[str]` | List of reaction IDs considered high confidence. Used only if | `None` |
| `consistent_checking_method` | `str` | Method used for consistency checks (e.g., 'FASTCC'). Defaults to "FASTCC". | `'FASTCC'` |
| `tolerance` | `float` | Tolerance used for consistency checks. Defaults to 1e-8. | `1e-08` |
| `epsilon` | `float` | Weighting factor used in the consistency check stopping criteria. Represents the maximum allowed ratio of removed medium-confidence reactions to removed no-confidence reactions during the removal check of a no-confidence reaction. Defaults to 0.33. | `0.33` |
| `random_state` | `int` | Seed for the random number generator used to shuffle the order of no-confidence reactions being tested for removal. Defaults to 42. | `42` |

Returns:

| Type | Description |
|---|---|
| `MBA_Analysis` | An object containing the results: - result_model (cobra.Model): The final context-specific model. - removed_rxn_ids (np.ndarray): IDs of removed reactions. - threshold_analysis (ThresholdAnalysis or None): Details of thresholding used if |

Raises:

| Type | Description |
|---|---|
| `AssertionError` | If |
Notes
Based on the algorithm described in: Jerby, L., Shlomi, T., & Ruppin, E. (2010). Computational reconstruction of tissue-specific metabolic models: application to human tissues. Molecular systems biology, 6(1), 401.
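Example
The epsilon and random_state parameters of apply_MBA control the removal check and the order in which no-confidence reactions are tested. The sketch below illustrates both under stated assumptions. The helper names are hypothetical, the candidate IDs are sorted before shuffling so the result does not depend on input order, and any loss of a high-confidence reaction is treated as grounds for rejection.

```python
import random
from typing import Iterable, List


def shuffled_candidates(no_conf_rxn_ids: Iterable[str], random_state: int = 42) -> List[str]:
    """Return the no-confidence reactions in a reproducible random order."""
    order = sorted(no_conf_rxn_ids)
    random.Random(random_state).shuffle(order)
    return order


def accept_removal(
    n_removed_medium: int,
    n_removed_no_conf: int,
    n_removed_high: int = 0,
    epsilon: float = 0.33,
) -> bool:
    """Decide whether removing a candidate reaction is acceptable."""
    if n_removed_high > 0 or n_removed_no_conf == 0:
        # Assumption: high-confidence reactions must always be preserved.
        return False
    return n_removed_medium / n_removed_no_conf <= epsilon


order_a = shuffled_candidates(["R3", "R1", "R2", "R4"])
order_b = shuffled_candidates(["R1", "R2", "R3", "R4"])
ok = accept_removal(n_removed_medium=1, n_removed_no_conf=4)
too_costly = accept_removal(n_removed_medium=2, n_removed_no_conf=4)
```

Removing one medium-confidence reaction alongside four no-confidence reactions gives a ratio of 0.25, which is within the default epsilon; a ratio of 0.5 is rejected.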
apply_GIMME
apply_GIMME(
model: Model,
rxn_expr_score: Dict[str, float],
high_exp: float,
protected_rxns=None,
obj_frac: float = 0.8,
remove_zero_fluxes: bool = False,
flux_threshold: float = 1e-06,
max_inconsistency_score=1000.0,
return_fluxes: bool = True,
keep_context: bool = False,
rxn_scaling_coefs: dict = None,
predefined_threshold=None,
)
Apply the GIMME algorithm to generate a context-specific metabolic model.
GIMME (Gene Inactivity Moderated by Metabolism and Expression) assumes that cellular metabolism aims to achieve a required metabolic functionality (defined by the model's objective function) with minimal deviation from a reference expression state. It minimizes the flux through reactions with expression below a threshold, subject to maintaining a certain level of the original objective function.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `model` | `Model` | The input genome-scale metabolic model with a defined objective function representing the required metabolic functionality. | required |
| `rxn_expr_score` | `dict[str, float]` | Dictionary mapping reaction IDs to their expression scores. NaN values are ignored. | required |
| `high_exp` | `float` | Expression score threshold. Reactions with scores below this threshold are penalized in the GIMME objective function. | required |
| `protected_rxns` | `list[str]` | List of reaction IDs that should not be penalized, even if their expression is below | `None` |
| `obj_frac` | `float` | Fraction of the original model's optimal objective value that must be maintained by the GIMME solution. Defaults to 0.8. | `0.8` |
| `remove_zero_fluxes` | `bool` | If True, create a | `False` |
| `flux_threshold` | `float` | Flux threshold used when | `1e-06` |
| `max_inconsistency_score` | `float` | Value to cap the penalty applied to low-expression reactions to handle potential numerical issues with very low scores. Defaults to 1e3. | `1000.0` |
| `return_fluxes` | `bool` | If True, include the GIMME flux distribution in the result object. Defaults to True. | `True` |
| `keep_context` | `bool` | If True, modify the input | `False` |
| `rxn_scaling_coefs` | `dict[str, float]` | Dictionary mapping reaction IDs to scaling coefficients, used to adjust objective weights and the removal | `None` |
| `predefined_threshold` | `any` | This parameter is currently ignored by GIMME. Defaults to None. | `None` |

Returns:

| Type | Description |
|---|---|
| `GIMMEAnalysis` | An object containing the results: - rxn_coefficients (dict): Dictionary of objective coefficients (penalties) applied to low-expression reactions. - rxn_scores (dict): The input reaction expression scores. - flux_result (pd.DataFrame or None): GIMME flux distribution if |
Notes
Based on the algorithm described in: Becker, S. A., & Palsson, B. Ø. (2008). Context-specific metabolic networks are consistent with experiments. PLoS computational biology, 4(5), e1000082. The objective function minimizes the sum of fluxes weighted by (high_exp - score) for reactions with score < high_exp.
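Example
The penalty weighting described in the notes above can be sketched in a few lines. The helpers gimme_penalties and inconsistency_score are hypothetical and not part of the library; they only illustrate the (high_exp - score) weighting, the cap set by max_inconsistency_score, and the exclusion of protected reactions and NaN scores. Summing absolute fluxes is an assumption about how reversible reactions are handled.

```python
import math
from typing import Dict, Iterable, Optional


def gimme_penalties(
    rxn_expr_score: Dict[str, float],
    high_exp: float,
    protected_rxns: Optional[Iterable[str]] = None,
    max_inconsistency_score: float = 1e3,
) -> Dict[str, float]:
    """Return the penalty weight of each reaction scoring below high_exp."""
    protected = set(protected_rxns or [])
    penalties = {}
    for rxn_id, score in rxn_expr_score.items():
        if rxn_id in protected or math.isnan(score):
            continue  # protected reactions and NaN scores carry no penalty
        if score < high_exp:
            # The penalty grows with the gap to the threshold and is capped.
            penalties[rxn_id] = min(high_exp - score, max_inconsistency_score)
    return penalties


def inconsistency_score(penalties: Dict[str, float], fluxes: Dict[str, float]) -> float:
    """Penalty-weighted sum of absolute fluxes, the quantity to minimize."""
    return sum(w * abs(fluxes.get(r, 0.0)) for r, w in penalties.items())


# Hypothetical scores, for illustration only.
scores = {"R1": 2.0, "R2": 8.0, "R3": float("nan"), "R4": 1.0, "R5": -5000.0}
penalties = gimme_penalties(scores, high_exp=5.0, protected_rxns=["R4"])
total = inconsistency_score(penalties, {"R1": -2.0, "R5": 0.5})
```

R2 scores above the threshold, R3 is NaN, and R4 is protected, so only R1 and R5 are penalized, with the R5 penalty capped at 1000.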
apply_RIPTiDe_pruning
apply_RIPTiDe_pruning(
model,
rxn_expr_score: Dict[str, float],
max_gw: float = None,
obj_frac: float = 0.8,
threshold: float = 1e-06,
protected_rxns=None,
max_inconsistency_score=1000.0,
rxn_scaling_coefs: Dict[str, float] = None,
**kwargs
)
Apply the pruning step of the RIPTiDe algorithm.
This step uses parsimonious Flux Balance Analysis (pFBA) with weights derived from reaction expression scores (or RALs - Reaction Activity Levels) to identify and remove low-flux reactions, creating a pruned, context-specific model.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `model` | `Model` | The input genome-scale metabolic model. | required |
| `rxn_expr_score` | `dict[str, float]` | Dictionary mapping reaction IDs to their expression scores (RALs). NaN values are ignored. Scores outside [-max_inconsistency_score, max_inconsistency_score] are capped. | required |
| `max_gw` | `float` | Maximum possible reaction expression score (RAL). If None, it's calculated as the maximum finite value in | `None` |
| `obj_frac` | `float` | Fraction of the optimal objective value to maintain when minimizing fluxes during pFBA. Defaults to 0.8. | `0.8` |
| `threshold` | `float` | Flux threshold below which reactions are considered inactive and removed. Adjusted by | `1e-06` |
| `protected_rxns` | `list[str]` | List of reaction IDs that should not be removed, even if their flux is below the threshold. Defaults to None. | `None` |
| `max_inconsistency_score` | `float` | Value to cap reaction scores at (positive and negative) to handle extreme outliers. Defaults to 1e3. | `1000.0` |
| `rxn_scaling_coefs` | `dict[str, float]` | Dictionary mapping reaction IDs to scaling coefficients. Used to adjust pFBA weights and the removal | `None` |
| `**kwargs` | | Additional keyword arguments (currently unused). | `{}` |

Returns:

| Type | Description |
|---|---|
| `RIPTiDePruningAnalysis` | An object containing the results: - result_model (cobra.Model): The pruned context-specific model. - removed_rxn_ids (list[str]): List of IDs of removed reactions. - obj_dict (dict[str, float]): Dictionary of weights used in pFBA. |

Raises:

| Type | Description |
|---|---|
| `ValueError` | If |
Notes
RIPTiDe (Reaction Inclusion by Parsimony and Transcript Distribution) aims to create context-specific models reflecting metabolic activity based on transcriptomic data. This pruning step is the first part of the procedure. Original paper: Jenior, M. L., et al. (2020). Transcriptome-guided parsimonious flux analysis improves predictions with metabolic networks in complex environments. PLoS computational biology, 16(4), e1007099.
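Example
The score handling described for apply_RIPTiDe_pruning (NaN values ignored, scores capped, max_gw derived when it is None) can be sketched as follows. The helper prepare_rals is hypothetical and not part of the library, and deriving max_gw from the capped scores rather than the raw ones is an assumption.

```python
import math
from typing import Dict, Optional, Tuple


def prepare_rals(
    rxn_expr_score: Dict[str, float],
    max_gw: Optional[float] = None,
    max_inconsistency_score: float = 1e3,
) -> Tuple[Dict[str, float], float]:
    """Drop NaN scores, cap the rest, and derive max_gw when it is missing."""
    capped = {
        rxn_id: max(-max_inconsistency_score, min(score, max_inconsistency_score))
        for rxn_id, score in rxn_expr_score.items()
        if not math.isnan(score)
    }
    if max_gw is None:
        # Assumption: max_gw is taken from the capped scores.
        max_gw = max(capped.values())
    return capped, max_gw


# Hypothetical RALs including NaN, an outlier, and an infinite value.
raw = {"R1": 12.0, "R2": float("nan"), "R3": 5e4, "R4": float("-inf")}
capped, max_gw = prepare_rals(raw)
```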
apply_RIPTiDe_sampling
apply_RIPTiDe_sampling(
model,
rxn_expr_score: Dict[str, float],
max_gw: float = None,
max_inconsistency_score: float = 1000.0,
obj_frac: float = 0.8,
sampling_obj_frac: float = 0.8,
do_sampling: bool = False,
solver: str = "gurobi",
sampling_method: str = "gapsplit",
protected_rxns: Optional[List[str]] = None,
protect_no_expr: bool = False,
sampling_n: int = 500,
keep_context: bool = False,
rxn_scaling_coefs: Dict[str, float] = None,
discard_inf_score=True,
thinning=1,
processes=1,
seed=None,
**kwargs
)
Apply the sampling step of the RIPTiDe algorithm or prepare for it.
This step uses reaction expression scores (RALs) to define an objective function maximizing flux through high-expression reactions. It can optionally perform flux sampling on the model constrained by this objective.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `model` | `Model` | The input metabolic model, typically the result of RIPTiDe pruning. | required |
| `rxn_expr_score` | `dict[str, float]` | Dictionary mapping reaction IDs to their expression scores (RALs). NaN values are ignored. Scores outside [-max_inconsistency_score, max_inconsistency_score] are capped unless | required |
| `max_gw` | `float` | Maximum possible reaction expression score (RAL). If None, it's calculated as the maximum finite value in | `None` |
| `max_inconsistency_score` | `float` | Value to cap reaction scores at (positive and negative) if | `1000.0` |
| `obj_frac` | `float` | Fraction of the optimal objective value (based on maximizing flux through high-RAL reactions) to use as a constraint if | `0.8` |
| `sampling_obj_frac` | `float` | Fraction of the optimal objective value to maintain during flux sampling (passed to the sampler). Defaults to 0.8. | `0.8` |
| `do_sampling` | `bool` | If True, perform flux sampling after setting up the objective and constraints. If False, only sets up the model context. Defaults to False. | `False` |
| `solver` | `str` | Solver to use for optimization and sampling (e.g., 'gurobi', 'cplex'). Defaults to "gurobi". | `'gurobi'` |
| `sampling_method` | `str` | Flux sampling algorithm to use ('achr', 'optgp', 'gapsplit'). Defaults to "gapsplit". | `'gapsplit'` |
| `protected_rxns` | `list[str]` | List of reaction IDs to assign the maximum weight in the objective, regardless of their RAL. Defaults to None. | `None` |
| `protect_no_expr` | `bool` | If True, assign maximum weight to reactions not present in | `False` |
| `sampling_n` | `int` | Number of flux samples to generate if | `500` |
| `keep_context` | `bool` | If True, modify the input | `False` |
| `rxn_scaling_coefs` | `dict[str, float]` | Dictionary mapping reaction IDs to scaling coefficients, used to adjust objective weights. Defaults to None (all coeffs 1). | `None` |
| `discard_inf_score` | `bool` | If True, treat infinite scores in | `True` |
| `thinning` | `int` | Thinning factor for flux sampling (passed to sampler). Defaults to 1. | `1` |
| `processes` | `int` | Number of parallel processes for flux sampling. Defaults to 1. | `1` |
| `seed` | `int` | Random seed for flux sampling. Defaults to None. | `None` |
| `**kwargs` | | Additional keyword arguments passed to the flux sampler. | `{}` |

Returns:

| Type | Description |
|---|---|
| `RIPTiDeSamplingAnalysis` | An object containing the results: - sampling_result (SamplingAnalysis or None): Results from flux sampling if |

Raises:

| Type | Description |
|---|---|
| `ValueError` | If |
Notes
This function sets up the model for RIPTiDe-based flux analysis or sampling. The objective function maximizes flux weighted by scaled RALs. See: Jenior, M. L., et al. (2020). Transcriptome-guided parsimonious flux analysis improves predictions with metabolic networks in complex environments. PLoS computational biology, 16(4), e1007099.
apply_EFlux
apply_EFlux(
model: Model,
rxn_expr_score: Dict[str, float],
max_ub: float = 1000,
min_lb: float = 1e-06,
min_score: float = -1000.0,
protected_rxns: Union[str, List[str], None] = None,
flux_threshold: float = 1e-06,
remove_zero_fluxes: bool = False,
return_fluxes: bool = True,
transform: Union[Callable, str] = exp_x,
rxn_scaling_coefs: Dict[str, float] = None,
predefined_threshold=None,
) -> EFluxAnalysis
Apply the E-Flux algorithm to constrain model fluxes based on expression.
E-Flux uses reaction expression scores (e.g., derived from transcriptomics) to set reaction bounds. Scores are typically transformed and then linearly scaled to map the expression range [min_exp, max_exp] to the flux range [min_lb, max_ub]. This enforces higher flux capacity for highly expressed reactions. Parsimonious FBA (pFBA) is then run on the constrained model.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| model | Model | The input genome-scale metabolic model. | required |
| rxn_expr_score | dict[str, float] | Dictionary mapping reaction IDs to their expression scores. NaN values are handled. Scores below | required |
| max_ub | float | The maximum flux bound assigned to the reaction(s) with the highest (transformed) expression score. Defaults to 1000. | 1000 |
| min_lb | float | The minimum flux bound assigned to the reaction(s) with the lowest (transformed) expression score. Defaults to 1e-6. | 1e-06 |
| min_score | float | Minimum expression score to consider; scores below this are capped at this value before transformation and scaling. Defaults to -1e3. | -1000.0 |
| protected_rxns | str or list[str] or None | Reaction ID(s) to exclude from bound constraints. Defaults to None. | None |
| flux_threshold | float | Flux threshold used when | 1e-06 |
| remove_zero_fluxes | bool | If True, remove reactions with pFBA flux below | False |
| return_fluxes | bool | If True, include the pFBA flux distribution in the result object. Defaults to True. | True |
| transform | callable or str | Function or name of a function (from | exp_x |
| rxn_scaling_coefs | dict[str, float] | Dictionary mapping reaction IDs to scaling coefficients. Applied after scaling expression to bounds (divides the calculated bound). Defaults to None (all coeffs 1). | None |
| predefined_threshold | any | This parameter is currently ignored by E-Flux. Defaults to None. | None |
Returns:
| Type | Description |
|---|---|
| EFluxAnalysis | An object containing the results: rxn_bounds (dict), dictionary of the final bounds applied to each reaction; rxn_scores (dict), the input reaction expression scores; flux_result (pd.DataFrame or None), pFBA flux distribution if |
Raises:
| Type | Description |
|---|---|
| AssertionError | If |
| ValueError | If the denominator used for scaling becomes non-finite (e.g., due to |
Notes
Based on the method described in: Colijn, C., Brandes, A., Zucker, J., Lun, D. S., Weiner, B., Farhat, M. R., ... & Galagan, J. E. (2009). Interpreting expression data with metabolic flux models: predicting Mycobacterium tuberculosis mycolic acid production. PLoS Computational Biology, 5(8), e1000489. Implementation details such as the transformation and scaling may differ from the original method. Exchange reactions are typically excluded from bound setting.
apply_SPOT ¶
apply_SPOT(
model: Model,
rxn_expr_score: Dict[str, float],
protected_rxns: Optional[Union[str, List[str]]] = None,
obj_frac: float = 0.1,
norm_ub: float = 10000.0,
remove_zero_fluxes: bool = False,
flux_threshold: float = 1e-06,
return_fluxes: bool = True,
keep_context: bool = False,
rxn_scaling_coefs: Optional[Dict[str, float]] = None,
predefined_threshold=None,
) -> SPOTAnalysis
Apply the SPOT algorithm to generate an expression-guided flux distribution.
SPOT (Simplified Pearson cOrrelation with Transcriptomic data) finds a flux distribution that maximises the correlation between reaction fluxes and gene-expression scores while keeping the model's metabolic objective (e.g. biomass) at or above a user-specified fraction of its FBA-optimal value.
The optimisation problem solved is:

$$
\begin{aligned}
\max_{v} \quad & \sum_i w_i \cdot v_i \\
\text{s.t.} \quad & f_{\text{FBA}} \geq \texttt{obj\_frac} \cdot f^{*}_{\text{FBA}} \\
& \sum_i \left( v_i^{+} + v_i^{-} \right) \leq \texttt{norm\_ub} \\
& v \in \text{FBA feasible region}
\end{aligned}
$$

where $w_i = \texttt{rxn\_expr\_score}[i] \cdot \texttt{rxn\_scaling\_coefs}[i]$, and each flux is split into non-negative forward and reverse parts, $v_i = v_i^{+} - v_i^{-}$.
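A minimal sketch of how the objective weights and the two scalar quantities in the problem above could be evaluated for a given flux vector. The helper names are hypothetical, missing scaling coefficients are assumed to default to 1.0, and no linear program is solved here: the sketch only computes the weighted sum and the L1 flux sum.

```python
def spot_weights(rxn_expr_score, rxn_scaling_coefs=None, protected_rxns=None):
    """Return w_i = score_i * coef_i, skipping protected reactions (hypothetical helper)."""
    coefs = rxn_scaling_coefs or {}
    if isinstance(protected_rxns, str):
        protected_rxns = [protected_rxns]
    protected = set(protected_rxns or [])
    return {
        rxn: score * coefs.get(rxn, 1.0)  # assumed default coefficient of 1.0
        for rxn, score in rxn_expr_score.items()
        if rxn not in protected
    }


def spot_objective(weights, fluxes):
    """Weighted flux sum: sum_i w_i * v_i."""
    return sum(w * fluxes.get(rxn, 0.0) for rxn, w in weights.items())


def l1_flux_sum(fluxes):
    """L1 norm of the fluxes; |v_i| equals v_i^+ + v_i^- for a minimal split."""
    return sum(abs(v) for v in fluxes.values())


scores = {"R1": 2.0, "R2": 0.5, "R3": 4.0}
fluxes = {"R1": 3.0, "R2": -2.0, "R3": 1.0}
weights = spot_weights(scores, rxn_scaling_coefs={"R2": 2.0}, protected_rxns="R3")
objective = spot_objective(weights, fluxes)   # 2.0 * 3.0 + (0.5 * 2.0) * (-2.0)
within_norm = l1_flux_sum(fluxes) <= 10000.0  # compare against norm_ub
```

Here the protected reaction `R3` still counts towards the L1 flux sum but contributes nothing to the weighted objective.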
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| model | Model | The input genome-scale metabolic model with a defined objective function representing the required metabolic functionality. | required |
| rxn_expr_score | dict[str, float] | Mapping of reaction IDs to expression scores. | required |
| protected_rxns | str or list[str] or None | Reaction IDs excluded from the SPOT objective (their expression scores do not contribute to the weighted sum). Defaults to None. | None |
| obj_frac | float | Fraction of the FBA-optimal objective value that must be maintained as a lower-bound constraint during SPOT optimisation. Set to 0 to omit the FBA constraint (free maximisation). Defaults to 0.1. | 0.1 |
| norm_ub | float | Upper bound for the L1 flux-sum constraint | 10000.0 |
| remove_zero_fluxes | bool | If | False |
| flux_threshold | float | Flux cutoff used when | 1e-06 |
| return_fluxes | bool | If | True |
| keep_context | bool | If | False |
| rxn_scaling_coefs | dict[str, float] or None | Per-reaction scaling coefficients that multiply the expression weights before forming the objective. Defaults to None (all 1.0). | None |
| predefined_threshold | any | Currently unused by SPOT; accepted for API consistency. Defaults to None. | None |
Returns:
| Type | Description |
|---|---|
| SPOTAnalysis | Object containing: |
Notes
Based on: Becker, S. A., & Palsson, B. Ø. (2008). Context-specific metabolic networks are consistent with experiments. PLoS computational biology, 4(5), e1000082. (SPOT is a variant of this family of methods.) The L1 norm constraint is implemented directly via the optlang API so that the function works with GLPK, CPLEX, and Gurobi without requiring any solver-specific imports.
Enzyme-constrained integration¶
apply_gecko_light ¶
apply_gecko_light(
model,
enzyme_data,
protein_abundance=None,
sigma=0.5,
f_factor=0.5,
ptot=0.5,
copy_model=True,
protected_rxns=None,
)
Apply simple kcat-based enzyme constraints (GECKO-light).
For every reaction in model that has associated kcat data, the upper bound is constrained to:

    new_ub = kcat [1/s] * abundance [mmol/gDW] * sigma * 3600

where the factor 3600 converts from per-second to per-hour to match typical COBRA flux units (mmol / gDW / h).
If absolute protein abundance is not provided, ptot * f_factor is used as a coarse fallback abundance scale so those parameters have a concrete effect.
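The bound formula above can be checked with a few lines of Python. This is a minimal sketch, not the code behind `apply_gecko_light`: the helper name `gecko_light_upper_bound` is hypothetical, and the example kcat and abundance values are made up for illustration.

```python
SECONDS_PER_HOUR = 3600


def gecko_light_upper_bound(kcat_per_s, abundance=None, sigma=0.5,
                            ptot=0.5, f_factor=0.5):
    """Upper bound in mmol/gDW/h from a kcat given in 1/s (hypothetical helper)."""
    if abundance is None:
        # Coarse fallback abundance scale described above: ptot * f_factor.
        abundance = ptot * f_factor
    return kcat_per_s * abundance * sigma * SECONDS_PER_HOUR


measured = gecko_light_upper_bound(10.0, abundance=1e-5)  # 10 * 1e-5 * 0.5 * 3600
fallback = gecko_light_upper_bound(10.0)                  # 10 * 0.25 * 0.5 * 3600
```

With a measured abundance of 1e-5 mmol/gDW the bound is 0.18 mmol/gDW/h, while the fallback scale gives 4500, which shows how loose the fallback constraint is by comparison.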
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| model | Model | The metabolic model to constrain. | required |
| enzyme_data | EnzymeData | Enzyme data aligned with the model (must have been | required |
| protein_abundance | ProteinAbundanceData | Protein abundance data. If | None |
| sigma | float | Average enzyme saturation factor (0 – 1). Default 0.5. | 0.5 |
| f_factor | float | Fraction of the proteome that is metabolic enzymes (0 to 1). Used as part of the fallback abundance scale when protein abundance is not provided. Default 0.5. | 0.5 |
| ptot | float | Total protein content in g / gDW. Used as part of the fallback abundance scale when protein abundance is not provided. Default 0.5. | 0.5 |
| copy_model | bool | If | True |
| protected_rxns | list of str | Reaction IDs whose bounds should not be modified. | None |
Returns:
| Type | Description |
|---|---|
| GECKOLightAnalysis | |
apply_gecko_full ¶
apply_gecko_full(
model,
enzyme_data,
protein_abundance=None,
sigma=0.5,
ptot=0.5,
f_factor=0.5,
copy_model=True,
protected_rxns=None,
)
Build a full enzyme-constrained model (ecModel).
The GECKO formulation constrains total enzyme usage through a shared protein pool. Each enzyme-catalysed reaction draws from this pool in proportion to MW / kcat.
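The pool idea can be illustrated with a small calculation. This is a sketch under stated assumptions rather than the code behind `apply_gecko_full`: the helper names are hypothetical, molecular weights are assumed to be in g/mmol, and the pool capacity is assumed to be `ptot * f_factor * sigma`, as in common GECKO formulations (the text above does not state it).

```python
SECONDS_PER_HOUR = 3600


def enzyme_pool_usage(fluxes, mw_g_per_mmol, kcat_per_s):
    """Protein mass (g/gDW) drawn from the pool: sum_i |v_i| * MW_i / kcat_i."""
    usage = 0.0
    for rxn, flux in fluxes.items():
        if rxn not in kcat_per_s or rxn not in mw_g_per_mmol:
            continue  # reactions without enzyme data do not draw from the pool
        kcat_per_h = kcat_per_s[rxn] * SECONDS_PER_HOUR
        usage += abs(flux) * mw_g_per_mmol[rxn] / kcat_per_h
    return usage


def pool_capacity(ptot=0.5, f_factor=0.5, sigma=0.5):
    """Assumed pool size in g/gDW: ptot * f_factor * sigma."""
    return ptot * f_factor * sigma


fluxes = {"R1": 10.0, "R2": 5.0}             # mmol/gDW/h
mw = {"R1": 50.0}                            # g/mmol, illustrative value
kcat = {"R1": 100.0}                         # 1/s, illustrative value
usage = enzyme_pool_usage(fluxes, mw, kcat)  # 10 * 50 / (100 * 3600)
feasible = usage <= pool_capacity()
```

Slow or heavy enzymes (large MW / kcat) consume more of the shared pool per unit flux, which is what couples otherwise independent reactions in an enzyme-constrained model.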
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| model | Model | The metabolic model. | required |
| enzyme_data | EnzymeData | Enzyme data aligned with the model. | required |
| protein_abundance | ProteinAbundanceData | Protein abundance data (currently used for logging only; the pool constraint implicitly limits usage). | None |
| sigma | float | Average enzyme saturation factor (0 – 1). | 0.5 |
| ptot | float | Total protein content in g / gDW. | 0.5 |
| f_factor | float | Fraction of the proteome that is metabolic enzymes (0 – 1). | 0.5 |
| copy_model | bool | Work on a deep-copy of model (default | True |
| protected_rxns | list of str | Reaction IDs whose bounds should not be modified. | None |
Returns:
| Type | Description |
|---|---|
| GECKOFullAnalysis | |
auto_parameterize ¶
auto_parameterize(
model,
enzyme_data: EnzymeData,
kcat_source: Literal[
"brenda", "sabio-rk", "manual"
] = "manual",
fill_missing: Literal[
"median", "geometric_mean", "dlkcat"
] = "median",
organism: str = "human",
metabolite_data=None,
device: str = "cpu",
) -> EnzymeData
Automated parameter collection and estimation pipeline.
Steps
- (Optional) Fetch kcat values from a database (BRENDA, SABIO-RK).
- Match to model reactions via EC numbers.
- Fill missing kcat values using the specified strategy.
- Optionally use DLKcat for prediction of remaining missing values.
- Return the enriched EnzymeData.
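The fill step for the "median" and "geometric_mean" strategies can be sketched with the standard library. This is an illustration only: the helper name `fill_missing_kcats` is hypothetical, kcat values are held in a plain dict rather than an EnzymeData object, and the fill value is assumed to be computed over all known values in the input.

```python
import statistics


def fill_missing_kcats(kcats, strategy="median"):
    """Replace None entries with the median or geometric mean of the known kcats."""
    known = [value for value in kcats.values() if value is not None]
    if not known:
        raise ValueError("No known kcat values to derive a fill value from.")
    if strategy == "median":
        fill_value = statistics.median(known)
    elif strategy == "geometric_mean":
        fill_value = statistics.geometric_mean(known)
    else:
        raise ValueError(f"Unsupported strategy: {strategy!r}")
    # Known values pass through unchanged; only missing entries are filled.
    return {rxn: fill_value if value is None else value
            for rxn, value in kcats.items()}


kcats = {"R1": 1.0, "R2": None, "R3": 100.0, "R4": 10.0}
by_median = fill_missing_kcats(kcats, "median")
by_geomean = fill_missing_kcats(kcats, "geometric_mean")
```

Because kcat values span orders of magnitude, the geometric mean is less sensitive to a few very fast enzymes than an arithmetic mean would be; in this small example both strategies fill `R2` with a value of 10.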
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| model | Model | The metabolic model. | required |
| enzyme_data | EnzymeData | Existing enzyme data (may have missing kcat values). | required |
| kcat_source | str | Source for kcat values: 'brenda', 'sabio-rk', or 'manual'. | 'manual' |
| fill_missing | str | Strategy to fill missing kcat values: 'median', 'geometric_mean', or 'dlkcat'. | 'median' |
| organism | str | Organism name (used for database queries). | 'human' |
| metabolite_data | MetaboliteData | Metabolite data with SMILES (required when fill_missing is | None |
| device | str | Device for DLKcat ( | 'cpu' |
Returns:
| Type | Description |
|---|---|
| EnzymeData | The enriched enzyme data with filled kcat values. |