Model¶

Bases: GEMComposite

A comprehensive container for metabolic models and associated data.

This class wraps a cobra.Model object and extends it with capabilities for managing and integrating various types of biological data, including gene expression, enzyme kinetics, metabolite concentrations, and medium compositions. It also facilitates task-based analysis and model consistency checks.

Parameters:

Name	Type	Description	Default
`name_tag`	`str`	A unique identifier for this model instance, used within `pipeGEM.Group`. If `None`, "Unnamed_model" is used.	`None`
`model`	`Model`	An existing `cobra.Model` to initialize the `pipeGEM.Model` with. If None, an empty `cobra.Model` is created.	`None`
`gene_data_factor_df`	`DataFrame`	A DataFrame specifying how different gene datasets should be grouped or factored during aggregation (e.g., by condition, time point).	`None`
`**kwargs`		Additional key-value pairs to store as annotations for this model.	`{}`

Raises:

Type	Description
`ValueError`	If the provided `model` is not a `cobra.Model` instance.

annotation `property` ¶

annotation: dict

dict: Arbitrary annotations associated with the model.

size `property` ¶

size

int: The size of the model (always 1 for a single Model).

reaction_ids `property` ¶

reaction_ids: List[str]

List[str]: A list of all reaction IDs in the model.

gene_ids `property` ¶

gene_ids: List[str]

List[str]: A list of all gene IDs in the model.

metabolite_ids `property` ¶

metabolite_ids: List[str]

List[str]: A list of all metabolite IDs in the model.

cobra_model `property` ¶

cobra_model: Model

cobra.Model: The underlying COBRA model object.

subsystems `property` ¶

subsystems: Dict[str, List[str]]

Dict[str, List[str]]: Reactions grouped by subsystem.

gene_data `property` ¶

gene_data: Dict[str, GeneData]

Dict[str, GeneData]: Dictionary of associated gene data objects.

metabolite_data `property` ¶

metabolite_data: Optional[MetaboliteData]

Optional[MetaboliteData]: Associated metabolite data object.

enzyme_data `property` ¶

enzyme_data: Optional[EnzymeData]

Optional[EnzymeData]: Associated enzyme data object.

medium_data `property` ¶

medium_data: Dict[str, MediumData]

Dict[str, MediumData]: Dictionary of associated medium data objects.

tasks `property` ¶

tasks: Dict[str, TaskContainer]

Dict[str, TaskContainer]: Dictionary of associated task containers.

aggregated_gene_data `property` ¶

aggregated_gene_data

GeneData: Aggregated gene data based on the factor DataFrame.

add_annotation ¶

add_annotation(key, value)

Add or update an annotation.

get_rxn_info ¶

get_rxn_info(attrs) -> pd.DataFrame

Get reaction information for specified attributes.

aggregate_gene_data ¶

aggregate_gene_data(**kwargs)

Aggregate gene data using specified parameters.

copy ¶

copy(
    copy_gene_data=False,
    copy_medium_data=False,
    copy_tasks=False,
    copy_merging_info=True,
)

Create a deep copy of this Model.

Parameters:

Name	Description	Default
`copy_gene_data`	Also copy the gene data in this Model	`False`
`copy_medium_data`	Also copy the medium data in this Model	`False`
`copy_tasks`	Also copy the tasks in this Model	`False`
`copy_merging_info`	Also copy the merged reaction information	`True`

Returns:

Name	Type	Description
`copied_model`	`Model`

add_medium_data ¶

add_medium_data(
    name,
    data: Union[MediumData, DataFrame],
    data_kwargs=None,
    **kwargs
) -> None

Add medium data to the model.

Parameters:

Name	Type	Description	Default
`name`	`str`	Name to assign to this medium dataset.	required
`data`	`Union[MediumData, DataFrame]`	The medium data, either as a MediumData object or a DataFrame. If a DataFrame, it will be converted to MediumData.	required
`data_kwargs`	`dict`	Keyword arguments for the MediumData constructor if `data` is a DataFrame.	`None`
`**kwargs`		Additional keyword arguments passed to the `align` method of MediumData.	`{}`

apply_medium ¶

apply_medium(name, **kwargs)

Apply a defined medium composition to the model's exchange reactions.

add_gene_data ¶

add_gene_data(
    name_or_prefix: str,
    data: Union[GeneData, DataFrame, Series, AnnData],
    data_kwargs: dict = None,
    **kwargs
) -> None

Add gene data to the internal dictionary of gene data.

Parameters:

Name	Type	Description	Default
`name_or_prefix`	`str`	The name or prefix of the gene data. If a pd.DataFrame is provided, this string is used as a prefix for the DataFrame's column names when naming the resulting gene data. If an empty string is provided, the column names are used unmodified.	required
`data`	`Union[GeneData, DataFrame, Series, AnnData]`	The gene data to add to the internal dictionary. This can be a pd.DataFrame, pd.Series, anndata.AnnData, or GeneData object. If a pd.DataFrame is provided, then each column of the DataFrame will be converted into a GeneData object with a modified name based on the name_or_prefix argument. If a pd.Series is provided, then it will be converted into a GeneData object with the name provided by name_or_prefix. If a GeneData object is provided, then it will be added to the internal dictionary as-is.	required
`data_kwargs`	`dict`	Additional keyword arguments to pass to the GeneData constructor when converting a pd.DataFrame or pd.Series into GeneData objects. The default value is None, which means no additional arguments are passed to the GeneData constructor. Ignored when the input data is already a GeneData.	`None`
`**kwargs`		Additional keyword arguments to pass to the align method of the GeneData object(s) after they have been added to the internal dictionary.	`{}`

Raises:

Type	Description
`ValueError`	If the data argument is not a pd.DataFrame, pd.Series, anndata.AnnData, or GeneData object.

set_gene_data ¶

set_gene_data(name, data, data_kwargs=None, **kwargs)

Replace an existing gene dataset.

add_tasks ¶

add_tasks(name: str, tasks: TaskContainer)

Add a metabolic task container.

test_tasks ¶

test_tasks(
    name, model_compartment_parenthesis="[{}]", **kwargs
)

Test the model's ability to perform defined metabolic tasks.

Parameters:

Name	Type	Description	Default
`name`	`str`	The name of the TaskContainer to use for testing.	required
`model_compartment_parenthesis`	`str`	String format for compartment identifiers in the model, default "[{}]".	`'[{}]'`
`**kwargs`		Additional arguments passed to `TaskHandler.test_tasks`.	`{}`

Returns:

Type	Description
`TaskAnalysis`	An object containing the results of the task analysis.
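
The `model_compartment_parenthesis` argument is an ordinary Python format string with a single `{}` placeholder for the compartment ID. The sketch below only illustrates how such a template renders. The helper `render_met_id` and the way the suffix is appended to a metabolite name are assumptions made for illustration, not pipeGEM internals.

```python
def render_met_id(met_name: str, compartment: str, template: str = "[{}]") -> str:
    """Append a compartment suffix rendered from a str.format template (hypothetical helper)."""
    return met_name + template.format(compartment)


# Default template, as documented for model_compartment_parenthesis.
bracket_style = render_met_id("glc", "c")
# A template that would suit models whose IDs end in an underscore plus compartment.
underscore_style = render_met_id("glc", "c", "_{}")
print(bracket_style, underscore_style)
```

If the metabolite IDs in a model use a different convention from the default, passing the matching template is presumably what keeps task metabolites and model metabolites aligned.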

calc_ind_task_score ¶

calc_ind_task_score(
    data_name: str,
    task_analysis: TaskAnalysis,
    all_na_indicator=-1,
    **kwargs
)

Calculate scores for individual tasks based on associated gene data.

Parameters:

Name	Type	Description	Default
`data_name`	`str`	Name of the GeneData object to use for scoring.	required
`task_analysis`	`TaskAnalysis`	The TaskAnalysis result object containing task definitions and supporting reactions.	required
`all_na_indicator`	`numeric`	Value to return if all genes associated with a task's reactions have NA scores. Default is -1.	`-1`
`**kwargs`		Additional arguments passed to `GeneData.calc_rxn_score_stat`.	`{}`

Returns:

Type	Description
`dict`	A dictionary mapping task IDs to their calculated scores.
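
The role of `all_na_indicator` can be pictured with a small stand-alone sketch. It assumes that a task score is the mean of the scores of its supporting reactions. The actual statistic is determined by `GeneData.calc_rxn_score_stat`, so the function `score_tasks` and the use of a mean are illustrative assumptions only.

```python
import math
from statistics import mean
from typing import Dict, List


def score_tasks(
    task_rxns: Dict[str, List[str]],
    rxn_scores: Dict[str, float],
    all_na_indicator: float = -1,
) -> Dict[str, float]:
    """Score each task from its supporting reactions (the mean is an assumed statistic)."""
    result = {}
    for task_id, rxn_ids in task_rxns.items():
        # Keep only reactions that have a usable (non-NaN) score.
        valid = [
            rxn_scores[r]
            for r in rxn_ids
            if r in rxn_scores and not math.isnan(rxn_scores[r])
        ]
        # Fall back to the indicator when no reaction carries a score.
        result[task_id] = mean(valid) if valid else all_na_indicator
    return result


task_rxns = {"task_1": ["R1", "R2"], "task_2": ["R3"]}
rxn_scores = {"R1": 2.0, "R2": 4.0, "R3": float("nan")}
scores = score_tasks(task_rxns, rxn_scores)
print(scores)
```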

get_activated_tasks ¶

get_activated_tasks(
    data_name,
    task_analysis: TaskAnalysis,
    all_na_indicator=-1,
    score_threshold=5 * np.log10(2),
    **kwargs
)

Identify tasks considered 'activated' based on gene data scores and task analysis results.

Parameters:

Name	Type	Description	Default
`data_name`	`str`	Name of the GeneData object to use for scoring.	required
`task_analysis`	`TaskAnalysis`	The TaskAnalysis result object.	required
`all_na_indicator`	`numeric`	Indicator value used in `calc_ind_task_score`. Default is -1.	`-1`
`score_threshold`	`float`	Minimum score for a task to be considered activated. Default is 5*log10(2).	`5 * log10(2)`
`**kwargs`		Additional arguments passed to `calc_ind_task_score`.	`{}`

Returns:

Type	Description
`list`	A list of task IDs considered activated.
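
As a rough illustration of the thresholding step, the sketch below filters a dictionary of task scores against the documented default of `5 * log10(2)` (about 1.505). The function `select_activated` is hypothetical, and the use of `>=` rather than `>` is an assumption. How pipeGEM treats tasks that carry `all_na_indicator` is not stated here, so the sketch simply compares every score to the threshold.

```python
import math
from typing import Dict, List

DEFAULT_THRESHOLD = 5 * math.log10(2)  # documented default, roughly 1.505


def select_activated(
    scores: Dict[str, float], score_threshold: float = DEFAULT_THRESHOLD
) -> List[str]:
    """Return task IDs whose score reaches the threshold (>= is an assumption)."""
    return [task_id for task_id, score in scores.items() if score >= score_threshold]


scores = {"task_1": 3.0, "task_2": -1, "task_3": 1.2}
activated = select_activated(scores)
print(activated)
```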

get_activated_task_sup_rxns ¶

get_activated_task_sup_rxns(
    data_name: str,
    task_analysis: TaskAnalysis,
    score_threshold: float = 5 * np.log10(2),
    include_supp_rxns: bool = True,
    **kwargs
)

Get supporting reactions for tasks identified as 'activated'.

Parameters:

Name	Type	Description	Default
`data_name`	`str`	Name of the GeneData object to use for scoring.	required
`task_analysis`	`TaskAnalysis`	The TaskAnalysis result object.	required
`score_threshold`	`float`	Minimum score threshold used in `get_activated_tasks`. Default is 5*log10(2).	`5 * log10(2)`
`include_supp_rxns`	`bool`	Whether to include supplementary reactions defined in the tasks. Default is True.	`True`
`**kwargs`		Additional arguments passed to `get_activated_tasks`.	`{}`

Returns:

Type	Description
`list`	A list of unique reaction IDs supporting the activated tasks.

check_rxn_scales ¶

check_rxn_scales(threshold=10000.0)

Check if reaction stoichiometric coefficients exceed a threshold.
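
A minimal sketch of this kind of check is shown below. It operates on a plain dictionary of stoichiometric coefficients rather than on a cobra model. The helper name `find_large_coef_rxns` and the choice to compare absolute values are assumptions for illustration.

```python
from typing import Dict, List


def find_large_coef_rxns(
    stoichiometry: Dict[str, Dict[str, float]], threshold: float = 10000.0
) -> List[str]:
    """List reactions with any |coefficient| above the threshold (hypothetical helper)."""
    return [
        rxn_id
        for rxn_id, coefs in stoichiometry.items()
        if any(abs(c) > threshold for c in coefs.values())
    ]


# Toy data: reaction ID -> {metabolite ID: stoichiometric coefficient}.
stoichiometry = {
    "R_growth": {"atp_c": -45000.0, "adp_c": 45000.0},
    "R_hex": {"glc_c": -1.0, "g6p_c": 1.0},
}
flagged = find_large_coef_rxns(stoichiometry)
print(flagged)
```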

check_model_scale ¶

check_model_scale(method='geometric_mean', n_iter=10)

Check the numerical scale of the model's stoichiometric matrix.

Parameters:

Name	Type	Description	Default
`method`	`str`	Scaling method to use ('geometric_mean', etc.). Default is "geometric_mean".	`'geometric_mean'`
`n_iter`	`int`	Number of iterations for the scaling algorithm. Default is 10.	`10`

Returns:

Type	Description
`ScalingResult`	An object containing the results of the scaling analysis.
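
For orientation, one common formulation of geometric-mean scaling alternately divides each row and each column by the square root of the product of its smallest and largest non-zero magnitudes, repeated for `n_iter` passes. The sketch below implements that textbook variant on a nested list. Whether pipeGEM's 'geometric_mean' method follows exactly this scheme is an assumption, and all function names are hypothetical.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]


def _factor(values: List[float]) -> float:
    """Return 1/sqrt(min*max) over non-zero magnitudes, or 1.0 for an all-zero line."""
    nonzero = [abs(v) for v in values if v != 0]
    if not nonzero:
        return 1.0
    return 1.0 / math.sqrt(min(nonzero) * max(nonzero))


def geometric_mean_scale(
    matrix: Matrix, n_iter: int = 10
) -> Tuple[Matrix, List[float], List[float]]:
    """Alternately rescale rows and columns; return the matrix and cumulative factors."""
    a = [row[:] for row in matrix]  # work on a copy
    n_rows, n_cols = len(a), len(a[0])
    row_f = [1.0] * n_rows
    col_f = [1.0] * n_cols
    for _ in range(n_iter):
        for i in range(n_rows):
            f = _factor(a[i])
            row_f[i] *= f
            a[i] = [v * f for v in a[i]]
        for j in range(n_cols):
            f = _factor([a[i][j] for i in range(n_rows)])
            col_f[j] *= f
            for i in range(n_rows):
                a[i][j] *= f
    return a, row_f, col_f


def magnitude_ratio(matrix: Matrix) -> float:
    """Ratio of the largest to the smallest non-zero magnitude."""
    nonzero = [abs(v) for row in matrix for v in row if v != 0]
    return max(nonzero) / min(nonzero)


S = [[1.0, -1000.0], [0.001, 1.0]]
scaled, row_factors, col_factors = geometric_mean_scale(S)
print(magnitude_ratio(S), magnitude_ratio(scaled))
```

The cumulative row and column factors play the role of the information a scaling result would need to carry so that the same scaling can later be applied or undone.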

scale_model ¶

scale_model(scaling_result)

Apply a previously calculated scaling to the model.

Parameters:

Name	Type	Description	Default
`scaling_result`	`ScalingResult`	The result object obtained from `check_model_scale`.	required

Returns:

Type	Description
`Model`	The rescaled pipeGEM Model object.

check_consistency ¶

check_consistency(
    method: str = "FASTCC", tol: float = 1e-06, **kwargs
)

Check the flux consistency of the model.

Parameters:

Name	Type	Description	Default
`method`	`str`	Consistency checking algorithm ('FASTCC', etc.). Default is "FASTCC".	`'FASTCC'`
`tol`	`float`	Numerical tolerance for consistency checks. Default is 1e-6.	`1e-06`
`**kwargs`		Additional arguments passed to the consistency checker's `analyze` method.	`{}`

Returns:

Type	Description
`ConsistencyAnalysis`	An object containing the results of the consistency check, including a consistent sub-model.

do_flux_analysis ¶

do_flux_analysis(method, solver='gurobi', **kwargs)

Perform flux balance analysis (FBA) or its variants.

Parameters:

Name	Type	Description	Default
`method`	`str`	Flux analysis method ('FBA', 'pFBA', 'FVA', etc.).	required
`solver`	`str`	LP solver to use ('gurobi', 'cplex', 'glpk', etc.). Default is "gurobi".	`'gurobi'`
`**kwargs`		Additional arguments passed to the flux analyzer's `analyze` method.	`{}`

Returns:

Type	Description
`FluxAnalysisResult`	An object containing the results of the flux analysis.

simulate_ko_genes ¶

simulate_ko_genes(gene_ids, **kwargs)

Simulate gene knockouts by setting their associated reaction scores to zero.

Parameters:

Name	Type	Description	Default
`gene_ids`	`list`	List of gene IDs to knock out.	required
`**kwargs`		Additional arguments passed to `GeneData.align`.	`{}`

Returns:

Type	Description
`Series`	Reaction scores reflecting the simulated knockouts.

do_ko_analysis ¶

do_ko_analysis(
    method="single_KO", solver="gurobi", **kwargs
)

Perform gene knockout analysis.

Parameters:

Name	Type	Description	Default
`method`	`str`	Knockout analysis method ('single_KO', etc.). Default is "single_KO".	`'single_KO'`
`solver`	`str`	LP solver to use. Default is "gurobi".	`'gurobi'`
`**kwargs`		Additional arguments passed to the knockout analyzer's `analyze` method.	`{}`

Returns:

Type	Description
`KOAnalysisResult`	An object containing the results of the knockout analysis.

integrate_enzyme_data ¶

integrate_enzyme_data(
    prot_abund_data_name=None, method="GECKOLight", **kwargs
)

Integrate enzyme data using GECKO formulations.

Parameters:

Name	Type	Description	Default
`prot_abund_data_name`	`str`	Name of the ProteinAbundanceData attached to this model. If `None`, protein abundance is not used (only kcat limits flux).	`None`
`method`	`str`	GECKO method to use: `"GECKOLight"` or `"GECKOFull"`.	`'GECKOLight'`
`**kwargs`		Additional keyword arguments passed to the integrator.	`{}`

Returns:

Type	Description
`GECKOLightAnalysis or GECKOFullAnalysis`

integrate_gene_data ¶

integrate_gene_data(
    data_name,
    integrator="GIMME",
    integrator_init_kwargs=None,
    rxn_scaling_coefs=None,
    predefined_threshold=None,
    protected_rxns=None,
    **kwargs
)

Integrate gene data with this model.

Parameters:

Name	Description	Default
`data_name`	Name of the gene data to be integrated with the model.	required
`integrator`	Name of the integrator (algorithm) to use. Possible choices for now: GIMME, CORDA, rFASTCORMICS, mCADRE, RIPTiDe, and Eflux.	`'GIMME'`
`integrator_init_kwargs`	Keyword arguments for initializing the integrator.	`None`
`rxn_scaling_coefs`	Reaction scaling coefficients for the integrator, if the model was previously rescaled.	`None`
`predefined_threshold`	A threshold analysis object containing the required expression thresholds, or a dict with an expression threshold under the key `exp_th` and a non-expression threshold under the key `non_exp_th`.	`None`
`protected_rxns`	A list of protected reaction IDs.	`None`
`**kwargs`	Keyword arguments for integrating the data.	`{}`

Returns:

Name	Type	Description
`integrating_result`	`BaseAnalysis`	Result object containing gene data-integrated model (context-specific model).
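
The dict form of `predefined_threshold` can be pictured as follows. The keys `exp_th` and `non_exp_th` come from the parameter description above. The numeric values, the function `classify_expression`, and the three-way labelling rule are assumptions for illustration, since each integrator may consume the thresholds differently.

```python
from typing import Dict

# Keys follow the documented dict form of `predefined_threshold`; values are made up.
predefined_threshold = {"exp_th": 5.0, "non_exp_th": 1.0}


def classify_expression(
    values: Dict[str, float], thresholds: Dict[str, float]
) -> Dict[str, str]:
    """Label genes as expressed, not expressed, or in between (illustrative rule)."""
    labels = {}
    for gene_id, value in values.items():
        if value >= thresholds["exp_th"]:
            labels[gene_id] = "expressed"
        elif value <= thresholds["non_exp_th"]:
            labels[gene_id] = "not_expressed"
        else:
            labels[gene_id] = "uncertain"
    return labels


labels = classify_expression({"g1": 7.2, "g2": 0.3, "g3": 2.5}, predefined_threshold)
print(labels)
```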

save_model ¶

save_model(file_name: str) -> None

Save the pipeGEM model and its annotations.

Saves the underlying cobra.Model to the specified file_name (e.g., 'model.json', 'model.xml'). Model annotations (including name_tag) are additionally saved to a corresponding TOML file (e.g., 'model_annotations.toml') in the same directory as the model file. This is a temporary workaround, because I/O functions have not yet been implemented for all file types.

Parameters:

Name	Type	Description	Default
`file_name`	`str`	File name of the model to save (e.g., 'model.json', 'model.xml').	required

Returns:

Type	Description
`None`

load_model `classmethod` ¶

load_model(file_name: str)

Load a pipeGEM model from a model file (JSON, SBML, MAT, etc.) and a TOML file storing the model's metadata.

Parameters:

Name	Type	Description	Default
`file_name`	`str`	Model file name. The same directory should contain a TOML file with the same file name and a .toml suffix. For example, if a valid model file called 'model.json' is stored in a folder called 'folder', the folder should contain: folder \|- model.json \|- model.toml ...	required

Returns:

Name	Type	Description
`model`	`Model`
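
Following the layout described for `file_name` above, the expected location of the metadata file can be derived from the model path with the standard library. The helper `companion_toml_path` is hypothetical and only mirrors the "same file name, .toml suffix" rule. The exact name pipeGEM looks for should be checked against the file that `save_model` actually writes.

```python
from pathlib import Path


def companion_toml_path(file_name: str) -> Path:
    """Return the .toml path that sits next to the model file (hypothetical helper)."""
    return Path(file_name).with_suffix(".toml")


toml_path = companion_toml_path("folder/model.json")
print(toml_path.name)
```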

update_merged_rxn ¶

update_merged_rxn(merged_rxn)

Update internal state when a reaction is merged.

Stores the original objective coefficients if not already done, adds the merged reaction to the lookup table, and handles empty merged reactions.

Parameters:

Name	Type	Description	Default
`merged_rxn`	`Reaction`	The reaction object representing the merged reaction. It should have a `merged_rxns` attribute (dict mapping original reactions to coefficients).	required

get_merged_rxn ¶

get_merged_rxn(rxn_id)

Retrieve the merged reaction object corresponding to an original reaction ID.

Parameters:

Name	Type	Description	Default
`rxn_id`	`str`	The ID of the original reaction before merging.	required

Returns:

Type	Description
`Reaction or None`	The merged reaction object if the original reaction was merged, otherwise None.
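
The relationship between `update_merged_rxn` and `get_merged_rxn` can be sketched as a lookup table keyed by original reaction ID. The classes below are toy stand-ins, not pipeGEM or cobra types. As a simplification, `merged_rxns` is keyed by reaction ID strings here, whereas the parameter description above refers to reaction objects. The handling of objective coefficients and empty merged reactions is omitted.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class MergedReaction:
    """Stand-in for a merged reaction; `merged_rxns` maps original IDs to coefficients."""
    id: str
    merged_rxns: Dict[str, float] = field(default_factory=dict)


class MergeLookup:
    """Toy lookup table from original reaction IDs to their merged reaction."""

    def __init__(self) -> None:
        self._table: Dict[str, MergedReaction] = {}

    def update_merged_rxn(self, merged_rxn: MergedReaction) -> None:
        # Register every original reaction under the merged reaction it now belongs to.
        for original_id in merged_rxn.merged_rxns:
            self._table[original_id] = merged_rxn

    def get_merged_rxn(self, rxn_id: str) -> Optional[MergedReaction]:
        # Returns None when the reaction was never merged.
        return self._table.get(rxn_id)


lookup = MergeLookup()
lookup.update_merged_rxn(MergedReaction("R1_R2", {"R1": 1.0, "R2": 2.0}))
found = lookup.get_merged_rxn("R2")
missing = lookup.get_merged_rxn("R9")
print(found.id, missing)
```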
