Model

Bases: GEMComposite

A comprehensive container for metabolic models and associated data.

This class wraps a cobra.Model object and extends it with capabilities for managing and integrating various types of biological data, including gene expression, enzyme kinetics, metabolite concentrations, and medium compositions. It also facilitates task-based analysis and model consistency checks.

Parameters:

Name Type Description Default
name_tag str

A unique identifier for this model instance, used within pipeGEM.Group. Defaults to "Unnamed_model".

None
model Model

An existing cobra.Model to initialize the pipeGEM.Model with. If None, an empty cobra.Model is created.

None
gene_data_factor_df DataFrame

A DataFrame specifying how different gene datasets should be grouped or factored during aggregation (e.g., by condition, time point).

None
**kwargs

Additional key-value pairs to store as annotations for this model.

{}

Raises:

Type Description
ValueError

If the provided model is not a cobra.Model instance.
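
A minimal construction sketch, assuming pipeGEM is installed and that the class is importable as pipeGEM.Model (the name used in the parameter description above). The helper name build_model and the example annotation key are hypothetical and not part of the library.

```python
def build_model(cobra_model=None, name_tag="example_model", **annotations):
    """Wrap a cobra.Model in a pipeGEM Model (illustrative helper only)."""
    # Imported lazily so that defining this helper does not require pipeGEM.
    import pipeGEM as pg

    # Extra keyword arguments are stored as annotations, as described in the
    # parameter table above. Passing model=None yields an empty cobra.Model.
    return pg.Model(name_tag=name_tag, model=cobra_model, **annotations)
```

A call such as build_model(my_cobra_model, name_tag="liver", organism="human") would then store organism as an annotation, assuming my_cobra_model is a cobra.Model instance.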

annotation property

annotation: dict

dict: Arbitrary annotations associated with the model.

size property

size

int: The size of the model (always 1 for a single Model).

reaction_ids property

reaction_ids: List[str]

List[str]: A list of all reaction IDs in the model.

gene_ids property

gene_ids: List[str]

List[str]: A list of all gene IDs in the model.

metabolite_ids property

metabolite_ids: List[str]

List[str]: A list of all metabolite IDs in the model.

cobra_model property

cobra_model: Model

cobra.Model: The underlying COBRA model object.

subsystems property

subsystems: Dict[str, List[str]]

Dict[str, List[str]]: Reactions grouped by subsystem.

gene_data property

gene_data: Dict[str, GeneData]

Dict[str, GeneData]: Dictionary of associated gene data objects.

metabolite_data property

metabolite_data: Optional[MetaboliteData]

Optional[MetaboliteData]: Associated metabolite data object.

enzyme_data property

enzyme_data: Optional[EnzymeData]

Optional[EnzymeData]: Associated enzyme data object.

medium_data property

medium_data: Dict[str, MediumData]

Dict[str, MediumData]: Dictionary of associated medium data objects.

tasks property

tasks: Dict[str, TaskContainer]

Dict[str, TaskContainer]: Dictionary of associated task containers.

aggregated_gene_data property

aggregated_gene_data

GeneData: Aggregated gene data based on the factor DataFrame.

add_annotation

add_annotation(key, value)

Add or update an annotation.

get_rxn_info

get_rxn_info(attrs) -> pd.DataFrame

Get reaction information for specified attributes.

aggregate_gene_data

aggregate_gene_data(**kwargs)

Aggregate gene data using specified parameters.

copy

copy(
    copy_gene_data=False,
    copy_medium_data=False,
    copy_tasks=False,
    copy_merging_info=True,
)

Create a deep copy of this Model.

Parameters:

Name Type Description Default
copy_gene_data

Also copy the gene data in this Model

False
copy_medium_data

Also copy the medium data in this Model

False
copy_tasks

Also copy the tasks in this Model

False
copy_merging_info

Also copy the merged reaction information

True

Returns:

Name Type Description
copied_model Model

add_medium_data

add_medium_data(
    name,
    data: Union[MediumData, DataFrame],
    data_kwargs=None,
    **kwargs
) -> None

Add medium data to the model.

Parameters:

Name Type Description Default
name str

Name to assign to this medium dataset.

required
data Union[MediumData, DataFrame]

The medium data, either as a MediumData object or a DataFrame. If a DataFrame, it will be converted to MediumData.

required
data_kwargs dict

Keyword arguments for the MediumData constructor if data is a DataFrame.

None
**kwargs

Additional keyword arguments passed to the align method of MediumData.

{}

apply_medium

apply_medium(name, **kwargs)

Apply a defined medium composition to the model's exchange reactions.

add_gene_data

add_gene_data(
    name_or_prefix: str,
    data: Union[GeneData, DataFrame, Series, AnnData],
    data_kwargs: dict = None,
    **kwargs
) -> None

Add gene data to the internal dictionary of gene data.

Parameters:

Name Type Description Default
name_or_prefix str

The name or prefix of the gene data. If a prefix is provided, the column names in the pd.DataFrame will be prefixed with it. If an empty string is provided, the column names will not be modified.

required
data Union[GeneData, DataFrame, Series, AnnData]

The gene data to add to the internal dictionary. This can be a pd.DataFrame, pd.Series, anndata.AnnData, or GeneData object. If a pd.DataFrame is provided, then each column of the DataFrame will be converted into a GeneData object with a modified name based on the name_or_prefix argument. If a pd.Series is provided, then it will be converted into a GeneData object with the name provided by name_or_prefix. If a GeneData object is provided, then it will be added to the internal dictionary as-is.

required
data_kwargs dict

Additional keyword arguments to pass to the GeneData constructor when converting a pd.DataFrame or pd.Series into GeneData objects. The default value is None, which means no additional arguments are passed to the GeneData constructor. Ignored when the input data is already a GeneData.

None
**kwargs

Additional keyword arguments to pass to the align method of the GeneData object(s) after they have been added to the internal dictionary.

{}

Raises:

Type Description
ValueError

If the data argument is not a pd.DataFrame, pd.Series, anndata.AnnData, or GeneData object.

set_gene_data

set_gene_data(name, data, data_kwargs=None, **kwargs)

Replace an existing gene dataset.

add_tasks

add_tasks(name: str, tasks: TaskContainer)

Add a metabolic task container.

test_tasks

test_tasks(
    name, model_compartment_parenthesis="[{}]", **kwargs
)

Test the model's ability to perform defined metabolic tasks.

Parameters:

Name Type Description Default
name str

The name of the TaskContainer to use for testing.

required
model_compartment_parenthesis str

String format for compartment identifiers in the model, default "[{}]".

'[{}]'
**kwargs

Additional arguments passed to TaskHandler.test_tasks.

{}

Returns:

Type Description
TaskAnalysis

An object containing the results of the task analysis.

calc_ind_task_score

calc_ind_task_score(
    data_name: str,
    task_analysis: TaskAnalysis,
    all_na_indicator=-1,
    **kwargs
)

Calculate scores for individual tasks based on associated gene data.

Parameters:

Name Type Description Default
data_name str

Name of the GeneData object to use for scoring.

required
task_analysis TaskAnalysis

The TaskAnalysis result object containing task definitions and supporting reactions.

required
all_na_indicator numeric

Value to return if all genes associated with a task's reactions have NA scores. Default is -1.

-1
**kwargs

Additional arguments passed to GeneData.calc_rxn_score_stat.

{}

Returns:

Type Description
dict

A dictionary mapping task IDs to their calculated scores.

get_activated_tasks

get_activated_tasks(
    data_name,
    task_analysis: TaskAnalysis,
    all_na_indicator=-1,
    score_threshold=5 * np.log10(2),
    **kwargs
)

Identify tasks considered 'activated' based on gene data scores and task analysis results.

Parameters:

Name Type Description Default
data_name str

Name of the GeneData object to use for scoring.

required
task_analysis TaskAnalysis

The TaskAnalysis result object.

required
all_na_indicator numeric

Indicator value used in calc_ind_task_score. Default is -1.

-1
score_threshold float

Minimum score for a task to be considered activated. Default is 5*log10(2).

5 * log10(2)
**kwargs

Additional arguments passed to calc_ind_task_score.

{}

Returns:

Type Description
list

A list of task IDs considered activated.
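
To make the default threshold concrete, the sketch below computes 5*log10(2) (about 1.505) and filters a dictionary of task scores against it. It is an illustration of the thresholding idea only, not pipeGEM's implementation: the helper filter_activated, the example scores, and the use of a greater-than-or-equal comparison are assumptions.

```python
import math

# Default threshold from the signature above: 5 * log10(2), about 1.505.
DEFAULT_SCORE_THRESHOLD = 5 * math.log10(2)


def filter_activated(task_scores, score_threshold=DEFAULT_SCORE_THRESHOLD):
    """Return the task IDs whose score reaches the threshold (assumed >=)."""
    return [task_id for task_id, score in task_scores.items()
            if score >= score_threshold]


# Hypothetical scores; -1 stands for the all_na_indicator default.
scores = {"task_A": 2.3, "task_B": 0.7, "task_C": -1}
activated = filter_activated(scores)
```

With the default all_na_indicator of -1, a task whose genes all lack scores falls below any positive threshold and is therefore never reported as activated in this sketch.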

get_activated_task_sup_rxns

get_activated_task_sup_rxns(
    data_name: str,
    task_analysis: TaskAnalysis,
    score_threshold: float = 5 * np.log10(2),
    include_supp_rxns: bool = True,
    **kwargs
)

Get supporting reactions for tasks identified as 'activated'.

Parameters:

Name Type Description Default
data_name str

Name of the GeneData object to use for scoring.

required
task_analysis TaskAnalysis

The TaskAnalysis result object.

required
score_threshold float

Minimum score threshold used in get_activated_tasks. Default is 5*log10(2).

5 * log10(2)
include_supp_rxns bool

Whether to include supplementary reactions defined in the tasks. Default is True.

True
**kwargs

Additional arguments passed to get_activated_tasks.

{}

Returns:

Type Description
list

A list of unique reaction IDs supporting the activated tasks.

check_rxn_scales

check_rxn_scales(threshold=10000.0)

Check if reaction stoichiometric coefficients exceed a threshold.

check_model_scale

check_model_scale(method='geometric_mean', n_iter=10)

Check the numerical scale of the model's stoichiometric matrix.

Parameters:

Name Type Description Default
method str

Scaling method to use ('geometric_mean', etc.). Default is "geometric_mean".

'geometric_mean'
n_iter int

Number of iterations for the scaling algorithm. Default is 10.

10

Returns:

Type Description
ScalingResult

An object containing the results of the scaling analysis.
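
For readers unfamiliar with geometric-mean scaling, the following pure-Python sketch shows the general technique on a small matrix: rows and then columns are repeatedly divided by the square root of the product of their smallest and largest nonzero magnitudes. This is a generic illustration under that assumption and may differ from what pipeGEM does internally; all names here are hypothetical.

```python
import math


def _factor(values):
    """Scaling factor 1/sqrt(min*max) over the nonzero magnitudes of a row or column."""
    nonzero = [abs(v) for v in values if v != 0]
    if not nonzero:
        return 1.0
    return 1.0 / math.sqrt(min(nonzero) * max(nonzero))


def geometric_mean_scale(matrix, n_iter=10):
    """Return the scaled matrix plus the accumulated row and column factors."""
    a = [[float(v) for v in row] for row in matrix]
    n_rows, n_cols = len(a), len(a[0])
    row_factors = [1.0] * n_rows
    col_factors = [1.0] * n_cols
    for _ in range(n_iter):
        for i in range(n_rows):  # scale each row
            f = _factor(a[i])
            row_factors[i] *= f
            a[i] = [v * f for v in a[i]]
        for j in range(n_cols):  # then scale each column
            f = _factor([a[i][j] for i in range(n_rows)])
            col_factors[j] *= f
            for i in range(n_rows):
                a[i][j] *= f
    return a, row_factors, col_factors


def magnitude_ratio(matrix):
    """Ratio of the largest to the smallest nonzero magnitude in the matrix."""
    nonzero = [abs(v) for row in matrix for v in row if v != 0]
    return max(nonzero) / min(nonzero)


# Toy matrix whose entries span six orders of magnitude.
S = [[1.0, 1000.0], [0.001, 1.0]]
scaled, row_factors, col_factors = geometric_mean_scale(S)
```

Comparing magnitude_ratio(S) with magnitude_ratio(scaled) shows how much the spread of coefficient magnitudes shrinks, which is the property a scaling check is meant to assess.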

scale_model

scale_model(scaling_result)

Apply a previously calculated scaling to the model.

Parameters:

Name Type Description Default
scaling_result ScalingResult

The result object obtained from check_model_scale.

required

Returns:

Type Description
Model

The rescaled pipeGEM Model object.
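
The two scaling methods are designed to be used together: check_model_scale produces a ScalingResult, which scale_model then applies. A small sketch of that sequence follows, using only the documented signatures; the helper name is hypothetical and model is assumed to be a pipeGEM Model.

```python
def rescale_model(model, method="geometric_mean", n_iter=10):
    """Compute a scaling and apply it, returning both the new model and the result."""
    # Step 1: analyze the stoichiometric matrix and compute scaling factors.
    scaling_result = model.check_model_scale(method=method, n_iter=n_iter)

    # Step 2: apply the computed scaling to obtain a rescaled pipeGEM Model.
    rescaled = model.scale_model(scaling_result)

    # The scaling result is returned as well so it stays available afterwards.
    return rescaled, scaling_result
```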

check_consistency

check_consistency(
    method: str = "FASTCC", tol: float = 1e-06, **kwargs
)

Check the flux consistency of the model.

Parameters:

Name Type Description Default
method str

Consistency checking algorithm ('FASTCC', etc.). Default is "FASTCC".

'FASTCC'
tol float

Numerical tolerance for consistency checks. Default is 1e-6.

1e-06
**kwargs

Additional arguments passed to the consistency checker's analyze method.

{}

Returns:

Type Description
ConsistencyAnalysis

An object containing the results of the consistency check, including a consistent sub-model.

do_flux_analysis

do_flux_analysis(method, solver='gurobi', **kwargs)

Perform flux balance analysis (FBA) or its variants.

Parameters:

Name Type Description Default
method str

Flux analysis method ('FBA', 'pFBA', 'FVA', etc.).

required
solver str

LP solver to use ('gurobi', 'cplex', 'glpk', etc.). Default is "gurobi".

'gurobi'
**kwargs

Additional arguments passed to the flux analyzer's analyze method.

{}

Returns:

Type Description
FluxAnalysisResult

An object containing the results of the flux analysis.
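
Because the default solver is "gurobi", which requires a license, it can be convenient to pass a different solver explicitly. The sketch below runs pFBA with "glpk", one of the solver names listed above; the helper name is hypothetical, and whether a given solver is available depends on the local installation.

```python
def run_pfba(model, solver="glpk", **kwargs):
    """Run pFBA on a pipeGEM Model with an explicitly chosen solver (sketch)."""
    # 'pFBA' and 'glpk' are both values listed in the parameter table above.
    # Any extra keyword arguments are forwarded unchanged.
    return model.do_flux_analysis(method="pFBA", solver=solver, **kwargs)
```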

simulate_ko_genes

simulate_ko_genes(gene_ids, **kwargs)

Simulate gene knockouts by setting their associated reaction scores to zero.

Parameters:

Name Type Description Default
gene_ids list

List of gene IDs to knock out.

required
**kwargs

Additional arguments passed to GeneData.align.

{}

Returns:

Type Description
Series

Reaction scores reflecting the simulated knockouts.

do_ko_analysis

do_ko_analysis(
    method="single_KO", solver="gurobi", **kwargs
)

Perform gene knockout analysis.

Parameters:

Name Type Description Default
method str

Knockout analysis method ('single_KO', etc.). Default is "single_KO".

'single_KO'
solver str

LP solver to use. Default is "gurobi".

'gurobi'
**kwargs

Additional arguments passed to the knockout analyzer's analyze method.

{}

Returns:

Type Description
KOAnalysisResult

An object containing the results of the knockout analysis.

integrate_enzyme_data

integrate_enzyme_data(
    prot_abund_data_name=None, method="GECKOLight", **kwargs
)

Integrate enzyme data using GECKO formulations.

Parameters:

Name Type Description Default
prot_abund_data_name str

Name of the ProteinAbundanceData attached to this model. If None, protein abundance is not used (only kcat limits flux).

None
method str

GECKO method to use: "GECKOLight" or "GECKOFull".

'GECKOLight'
**kwargs

Additional keyword arguments passed to the integrator.

{}

Returns:

Type Description
GECKOLightAnalysis or GECKOFullAnalysis

integrate_gene_data

integrate_gene_data(
    data_name,
    integrator="GIMME",
    integrator_init_kwargs=None,
    rxn_scaling_coefs=None,
    predefined_threshold=None,
    protected_rxns=None,
    **kwargs
)

Integrate gene data with this model.

Parameters:

Name Type Description Default
data_name

Name of the gene data to be integrated with the model

required
integrator

Name of the integrator (algorithm) to use. Possible choices (for now): GIMME, CORDA, rFASTCORMICS, mCADRE, RIPTiDe, and Eflux.

'GIMME'
integrator_init_kwargs

Keyword arguments for initializing the integrator

None
rxn_scaling_coefs

Reaction scaling coefficients for the integrator, needed if the model was previously rescaled.

None
predefined_threshold

Either a threshold analysis object containing the required expression threshold, or a dict containing an expression threshold under the key exp_th and a non-expression threshold under the key non_exp_th.

None
protected_rxns

A list of protected reaction IDs.

None
**kwargs

Keyword arguments for integrating the data.

{}

Returns:

Name Type Description
integrating_result BaseAnalysis

Result object containing gene data-integrated model (context-specific model).
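
The dict form of predefined_threshold can be illustrated as follows. The keys exp_th and non_exp_th come from the parameter description above; the helper name and the numeric values in the usage note are hypothetical, and model is assumed to be a pipeGEM Model with the named gene data attached.

```python
def integrate_with_fixed_thresholds(model, data_name, exp_th, non_exp_th,
                                    integrator="GIMME", protected_rxns=None):
    """Integrate gene data using explicitly supplied thresholds (sketch)."""
    # Build the dict form of predefined_threshold with the documented keys.
    thresholds = {"exp_th": exp_th, "non_exp_th": non_exp_th}
    return model.integrate_gene_data(
        data_name=data_name,
        integrator=integrator,
        predefined_threshold=thresholds,
        protected_rxns=protected_rxns,
    )
```

For example, integrate_with_fixed_thresholds(model, "sample_1", exp_th=3.0, non_exp_th=1.0) would pass both thresholds in a single dict; how each integrator uses them is not specified here.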

save_model

save_model(file_name: str) -> None

Save the pipeGEM model and its annotations.

Saves the underlying cobra.Model to the specified file_name (e.g., 'model.json', 'model.xml'). In addition, the model annotations (including name_tag) are saved to a corresponding TOML file (e.g., 'model_annotations.toml') in the same directory. This is a workaround for now, since I/O functions for all file types have not been implemented yet.

Parameters:

Name Type Description Default
file_name str
required

Returns:

Type Description
None

load_model classmethod

load_model(file_name: str)

Load a pipeGEM model from a model file (JSON, SBML, MAT, ...) and a TOML file storing the model's metadata.

Parameters:

Name Type Description Default
file_name str

Model file name. The same directory should contain a TOML file with the same file name and a .toml suffix. For example, if a valid model file called 'model.json' is stored in a folder called 'folder', the files in the folder should be:

folder
|- model.json
|- model.toml
...

required

Returns:

Name Type Description
model Model
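
The file-naming convention described for load_model can be checked before loading. The sketch below derives the expected metadata path from a model file name using only the standard library; the helper name is hypothetical, and it assumes the convention stated above (same base name, .toml suffix).

```python
from pathlib import PurePosixPath


def expected_metadata_path(model_file):
    """Return the TOML path expected next to a model file (same base name)."""
    # with_suffix swaps the final extension, e.g. .json -> .toml.
    return PurePosixPath(model_file).with_suffix(".toml")


toml_file = expected_metadata_path("folder/model.json")
```

Verifying that this file exists before calling load_model gives a clearer error than a failed load when the metadata file is missing.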

update_merged_rxn

update_merged_rxn(merged_rxn)

Update internal state when a reaction is merged.

Stores the original objective coefficients if not already done, adds the merged reaction to the lookup table, and handles empty merged reactions.

Parameters:

Name Type Description Default
merged_rxn Reaction

The reaction object representing the merged reaction. It should have a merged_rxns attribute (dict mapping original reactions to coefficients).

required

get_merged_rxn

get_merged_rxn(rxn_id)

Retrieve the merged reaction object corresponding to an original reaction ID.

Parameters:

Name Type Description Default
rxn_id str

The ID of the original reaction before merging.

required

Returns:

Type Description
Reaction or None

The merged reaction object if the original reaction was merged, otherwise None.