
Group

Bases: GEMComposite

A container for managing and comparing multiple pipeGEM.Model objects.

This class facilitates comparative analyses across a collection of metabolic models, such as comparing component numbers, calculating similarity indices (e.g., Jaccard), performing dimensionality reduction (PCA), and aggregating analysis results.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| group | Union[List[Model], Dict[str, Model], Dict[str, List[Model]], Dict[str, Dict[str, Model]]] | The collection of models to include in the group. Can be provided as: (1) a list of pipeGEM.Model objects; (2) a dictionary mapping desired name tags to cobra.Model objects; (3) a dictionary mapping subgroup names to lists of pipeGEM.Model objects; or (4) a dictionary mapping subgroup names to dictionaries that map model names to cobra.Model objects. | required |
| name_tag | str | An identifier for this group. If None, the group is named "Unnamed_group". | None |
| factors | DataFrame | A DataFrame of annotations for the models in the group. The index should correspond to model name tags; the columns are annotation keys. | None |
| **kwargs | | Additional annotations given as key-value pairs, where keys are annotation names and values are dictionaries mapping model name tags to annotation values (e.g., condition={'model1': 'control', 'model2': 'treated'}). | {} |

Raises:

| Type | Description |
|---|---|
| ValueError | If the input models have non-unique name tags or the input group format is invalid. |
| TypeError | If elements within the input group are not of the expected types. |
| KeyError | If annotation dictionaries or the factors DataFrame refer to model names not present in the group. |
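The four accepted input formats can be read as different spellings of one flat name-to-model mapping, with duplicate name tags rejected. The sketch below is a hypothetical illustration, not pipeGEM's implementation: StubModel and flatten_group_input are invented names, and recording the subgroup under a "group_name" key is an assumption suggested by the default group_by of compare().

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional


@dataclass
class StubModel:
    """Hypothetical stand-in for pipeGEM.Model; only the name tag is modeled."""
    name_tag: str


def flatten_group_input(group: Any) -> Dict[str, Dict[str, Any]]:
    """Flatten the accepted input formats into {name_tag: record}.

    Each record holds the model and the subgroup it came from (None when the
    input was not nested). Raises ValueError on duplicate name tags and
    TypeError on an unsupported container type.
    """
    records: Dict[str, Dict[str, Any]] = {}

    def add(name: str, model: Any, subgroup: Optional[str]) -> None:
        if name in records:
            raise ValueError(f"Duplicate model name tag: {name!r}")
        records[name] = {"model": model, "group_name": subgroup}

    if isinstance(group, list):
        # Format 1: a flat list; names come from each model's name_tag.
        for model in group:
            add(model.name_tag, model, None)
    elif isinstance(group, dict):
        for key, value in group.items():
            if isinstance(value, list):
                # Format 3: subgroup name -> list of models.
                for model in value:
                    add(model.name_tag, model, key)
            elif isinstance(value, dict):
                # Format 4: subgroup name -> {model name: model}.
                for name, model in value.items():
                    add(name, model, key)
            else:
                # Format 2: name tag -> single model.
                add(key, value, None)
    else:
        raise TypeError(f"Unsupported group type: {type(group).__name__}")
    return records
```

In formats 2 and 4 the dictionary key, not the model's own name_tag, supplies the name in this sketch.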

annotation property

annotation: DataFrame

pd.DataFrame: Combined annotations from models and the group level.

reaction_ids property

reaction_ids: List[str]

List[str]: A list of unique reaction IDs across all models in the group.

metabolite_ids property

metabolite_ids: List[str]

List[str]: A list of unique metabolite IDs across all models in the group.

gene_ids property

gene_ids: List[str]

List[str]: A list of unique gene IDs across all models in the group.

subsystems property

subsystems: Dict[str, set]

Dict[str, set]: Unique reaction IDs grouped by subsystem across all models.
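The reaction_ids and subsystems properties amount to a set union over all models, with subsystems additionally bucketing reaction IDs by their subsystem label. A minimal sketch, assuming each model is reduced to a hypothetical {reaction_id: subsystem} dictionary; the helper names are invented, and keeping first-seen order is a choice of this sketch, not documented pipeGEM behavior.

```python
from typing import Dict, List, Set

# Hypothetical per-model data: reaction ID -> subsystem label.
ModelRxns = Dict[str, str]


def union_reaction_ids(models: Dict[str, ModelRxns]) -> List[str]:
    """Unique reaction IDs across all models, in first-seen order."""
    seen: Dict[str, None] = {}
    for rxns in models.values():
        for rxn_id in rxns:
            seen.setdefault(rxn_id, None)
    return list(seen)


def group_by_subsystem(models: Dict[str, ModelRxns]) -> Dict[str, Set[str]]:
    """Unique reaction IDs grouped by subsystem across all models."""
    subsystems: Dict[str, Set[str]] = {}
    for rxns in models.values():
        for rxn_id, subsystem in rxns.items():
            subsystems.setdefault(subsystem, set()).add(rxn_id)
    return subsystems
```

A reaction annotated with different subsystems in different models appears under each of those subsystems here.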

gene_data property

gene_data: DataAggregation

DataAggregation: Aggregated gene data from all models in the group.

items

items()

Return an iterator over the group's (name_tag, model) items.

add_annotation

add_annotation(added, store_in_model=False)

Add annotations to the models in the group.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| added | Dict | A dictionary whose keys are annotation names and whose values are dictionaries mapping model name tags to annotation values. Example: {'condition': {'model1': 'A', 'model2': 'B'}} | required |
| store_in_model | bool | If True, add the annotations directly to the individual pipeGEM.Model objects. If False (default), store them at the Group level. | False |
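The added dictionary is keyed by annotation first and model second, whereas a per-model annotation table (like the one exposed by the annotation property) is its transpose. The sketch below shows that pivot with plain dictionaries and raises KeyError for unknown name tags, matching the class-level Raises entry; apply_annotations is a hypothetical helper, and how pipeGEM stores the result internally is not shown here.

```python
from typing import Any, Dict, Iterable


def apply_annotations(
    model_names: Iterable[str],
    added: Dict[str, Dict[str, Any]],
) -> Dict[str, Dict[str, Any]]:
    """Pivot {annotation: {model: value}} into {model: {annotation: value}}.

    Raises KeyError if an annotation refers to a model not in model_names.
    """
    table: Dict[str, Dict[str, Any]] = {name: {} for name in model_names}
    for key, per_model in added.items():
        for model_name, value in per_model.items():
            if model_name not in table:
                raise KeyError(f"Unknown model name tag: {model_name!r}")
            table[model_name][key] = value
    return table
```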

index

index(item, raise_err=True)

Get the numerical position of a model in the group's internal model dictionary, which preserves insertion order.

Note: Dictionary insertion order is guaranteed in Python 3.7+.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| item | str | The name tag of the model. | required |
| raise_err | bool | If True (default), raise a KeyError if the item is not found. If False, return None. | True |

Returns:

| Type | Description |
|---|---|
| int or None | The index of the model, or None if it is not found and raise_err is False. |

Raises:

| Type | Description |
|---|---|
| KeyError | If item is not found and raise_err is True. |
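Because dictionaries keep insertion order, a model's index is simply its position when iterating over the name tags. A minimal sketch of that lookup with the documented raise_err behavior; model_index is a hypothetical helper operating on a plain dictionary rather than on a Group.

```python
from typing import Dict, Optional


def model_index(models: Dict[str, object], item: str, raise_err: bool = True) -> Optional[int]:
    """Position of `item` in the insertion order of `models` (Python 3.7+)."""
    for position, name in enumerate(models):
        if name == item:
            return position
    if raise_err:
        raise KeyError(item)
    return None
```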

get_RAS

get_RAS(data_name, method='mean')

Calculate aggregated Reaction Activity Scores (RAS) across the group.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| data_name | str | The name of the gene data set within each model to use. This name is currently assumed to exist in every model; a missing data set may not be handled gracefully. | required |
| method | str | Aggregation method for RAS (e.g., 'mean', 'median'). | 'mean' |

Returns:

| Type | Description |
|---|---|
| Series or DataFrame | Aggregated RAS scores. The exact structure depends on the GeneData.aggregate implementation. |
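Conceptually, aggregating RAS across a group collects each reaction's score from every model and reduces those scores with the chosen method. The sketch below assumes hypothetical per-model {reaction_id: score} dictionaries and uses the standard statistics module; it is not pipeGEM's GeneData.aggregate, and aggregating a reaction only over the models that contain it is a choice of this sketch.

```python
import statistics
from typing import Callable, Dict, List

AGGREGATORS: Dict[str, Callable[[List[float]], float]] = {
    "mean": statistics.mean,
    "median": statistics.median,
}


def aggregate_ras(ras_by_model: Dict[str, Dict[str, float]], method: str = "mean") -> Dict[str, float]:
    """Reduce per-model RAS values to one score per reaction.

    Reactions missing from some models are aggregated over the models that have them.
    """
    if method not in AGGREGATORS:
        raise ValueError(f"Unsupported method: {method!r}")
    scores: Dict[str, List[float]] = {}
    for ras in ras_by_model.values():
        for rxn_id, score in ras.items():
            scores.setdefault(rxn_id, []).append(score)
    func = AGGREGATORS[method]
    return {rxn_id: func(values) for rxn_id, values in scores.items()}
```

The median is less sensitive than the mean to a single model with an extreme score, which is one reason to offer both.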

aggregate_models

aggregate_models(group_by)

Create new Group objects based on an annotation key.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| group_by | str | The annotation key to group models by. | required |

Returns:

| Type | Description |
|---|---|
| Dict[str, Group] | A dictionary whose keys are the unique values of the group_by annotation and whose values are new Group objects containing the corresponding models. If group_by is None, returns {self.name_tag: self}. |
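The split performed by aggregate_models can be pictured as bucketing model name tags by their value for one annotation key. This sketch returns lists of name tags instead of new Group objects; split_by_annotation and the annotation layout are hypothetical, and a model lacking the key raises KeyError here.

```python
from typing import Any, Dict, List, Optional


def split_by_annotation(
    annotations: Dict[str, Dict[str, Any]],
    group_by: Optional[str],
    group_name: str = "Unnamed_group",
) -> Dict[Any, List[str]]:
    """Map each unique annotation value to the model name tags that carry it."""
    if group_by is None:
        # Mirrors the documented {self.name_tag: self} case: one bucket with all models.
        return {group_name: list(annotations)}
    groups: Dict[Any, List[str]] = {}
    for name, annot in annotations.items():
        groups.setdefault(annot[group_by], []).append(name)
    return groups
```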

get_rxn_info

get_rxn_info(
    models: Optional[Union[str, list]] = None,
    attrs: list = None,
    drop_duplicates=True,
) -> pd.DataFrame

Get reaction information across specified models in the group.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| models | Union[str, list] | A single model name tag or a list of name tags to include. If None (default), all models in the group are included. | None |
| attrs | list | A list of reaction attributes to retrieve (e.g., ['name', 'subsystem', 'gene_reaction_rule']). If None, the attributes returned are determined by Model.get_rxn_info. | None |
| drop_duplicates | bool | If True (default), remove duplicate rows from the combined DataFrame. | True |

Returns:

| Type | Description |
|---|---|
| DataFrame | A DataFrame containing the requested reaction information, indexed by reaction ID. Reaction IDs may be duplicated if drop_duplicates is False. |
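Combining reaction information across models is a concatenation of per-model rows followed by optional de-duplication. The sketch below uses hypothetical (reaction_id, attribute, ...) tuples and an invented helper name; it treats the reaction ID as part of each row. Note that pandas DataFrame.drop_duplicates compares column values and ignores the index by default, so whether reaction IDs take part in de-duplication depends on how the real implementation calls it.

```python
from typing import Dict, Iterable, List, Optional, Tuple, Union

Row = Tuple[str, ...]  # (reaction_id, attr1, attr2, ...)


def combine_rxn_info(
    per_model_rows: Dict[str, List[Row]],
    models: Optional[Union[str, Iterable[str]]] = None,
    drop_duplicates: bool = True,
) -> List[Row]:
    """Concatenate per-model reaction rows, optionally removing exact duplicates."""
    if models is None:
        selected = list(per_model_rows)
    elif isinstance(models, str):
        selected = [models]  # a single name tag
    else:
        selected = list(models)
    combined: List[Row] = []
    for name in selected:
        combined.extend(per_model_rows[name])
    if drop_duplicates:
        # dict.fromkeys keeps the first occurrence and preserves order.
        combined = list(dict.fromkeys(combined))
    return combined
```

Rows that share a reaction ID but differ in some attribute (for example, a different gene_reaction_rule in two models) are not exact duplicates, so both survive even with drop_duplicates=True.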

rename

rename(name_tag, inplace=False)

Rename the group.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| name_tag | str | The new name tag for the group. | required |
| inplace | bool | If True, modify the current group's name tag directly. If False (default), return a new Group instance with the new name. | False |

Returns:

| Type | Description |
|---|---|
| Group or None | The renamed Group instance if inplace is False, otherwise None. |
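The inplace flag follows the common "mutate and return None, or return a modified copy" convention. A minimal sketch on a hypothetical NamedGroup class; it uses a shallow copy, so the renamed copy shares the model dictionary with the original. Whether pipeGEM copies shallowly or deeply is not stated here.

```python
import copy
from typing import Optional


class NamedGroup:
    """Hypothetical minimal stand-in for Group, showing only rename semantics."""

    def __init__(self, name_tag: str, models: dict):
        self.name_tag = name_tag
        self.models = models

    def rename(self, name_tag: str, inplace: bool = False) -> Optional["NamedGroup"]:
        if inplace:
            self.name_tag = name_tag
            return None
        # Shallow copy: the new instance shares the same models dict.
        renamed = copy.copy(self)
        renamed.name_tag = name_tag
        return renamed
```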

do_flux_analysis

do_flux_analysis(
    method: str,
    aggregate_method: str = "concat",
    solver: str = "gurobi",
    group_by: str = None,
    **kwargs
)

Perform flux analysis on the models contained in this group.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| method | str | The analysis to perform on the models. | required |
| aggregate_method | str | The method used to aggregate the flux results. | 'concat' |
| solver | str | The solver used for the analysis. | 'gurobi' |
| group_by | str | The annotation key used to determine the groups for aggregate_method. | None |
| kwargs | | Keyword arguments passed to model.do_flux_analysis(). | {} |

Returns:

| Name | Type | Description |
|---|---|---|
| flux_result | FluxAnalysis | The flux analysis result. |
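What the 'concat' aggregation produces is not described above. One plausible reading, stated here purely as an assumption, is stacking every model's flux values into long-format rows that record the model and, when group_by is set, that model's annotation value. The sketch below shows that reading with plain dictionaries; concat_flux_results and the row layout are hypothetical.

```python
from typing import Any, Dict, List, Optional


def concat_flux_results(
    fluxes: Dict[str, Dict[str, float]],
    annotations: Optional[Dict[str, Dict[str, Any]]] = None,
    group_by: Optional[str] = None,
) -> List[Dict[str, Any]]:
    """Stack per-model {reaction: flux} dicts into long-format rows.

    Each row records the model, reaction and flux and, if group_by is given,
    the model's group label taken from annotations.
    """
    if group_by is not None and annotations is None:
        raise ValueError("group_by requires annotations")
    rows: List[Dict[str, Any]] = []
    for model_name, flux in fluxes.items():
        for rxn_id, value in flux.items():
            row: Dict[str, Any] = {"model": model_name, "reaction": rxn_id, "flux": value}
            if group_by is not None:
                row[group_by] = annotations[model_name][group_by]
            rows.append(row)
    return rows
```

Long format keeps every per-model value, so later group-wise summaries (means, variances) remain possible.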

get_info

get_info(models=None, features=None) -> pd.DataFrame

Get an information table by traversing the object structure.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| models | | The name tags of the selected models. If None, all models are used. | None |
| features | | The features to collect during the traversal. | None |

Returns:

| Name | Type | Description |
|---|---|---|
| information_table | DataFrame | The resulting information table. |
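Building a table "by traversing the object structure" can be pictured as a recursive walk over nested groups that emits one row per model. In the hypothetical sketch below, nested dictionaries stand in for subgroups and any other object is treated as a model whose attributes are the features; collect_info, the row layout, and the use of getattr are all assumptions, not pipeGEM's traversal.

```python
from types import SimpleNamespace
from typing import Any, Dict, List, Optional, Tuple


def collect_info(
    node: Dict[str, Any],
    features: List[str],
    models: Optional[List[str]] = None,
    path: Tuple[str, ...] = (),
) -> List[Dict[str, Any]]:
    """Recursively walk {name: subgroup_dict_or_model} and collect features.

    Dicts are treated as nested subgroups; anything else is treated as a model.
    Each row holds the model name, its subgroup path, and the requested features
    (None when a model lacks a feature).
    """
    rows: List[Dict[str, Any]] = []
    for name, child in node.items():
        if isinstance(child, dict):
            # Nested subgroup: recurse and extend the path.
            rows.extend(collect_info(child, features, models, path + (name,)))
        elif models is None or name in models:
            row: Dict[str, Any] = {"model": name, "group_path": "/".join(path)}
            for feature in features:
                row[feature] = getattr(child, feature, None)
            rows.append(row)
    return rows


# Example: two models, one nested under a "ctrl" subgroup.
example_tree = {
    "ctrl": {"m1": SimpleNamespace(n_rxns=10, n_genes=5)},
    "m2": SimpleNamespace(n_rxns=12, n_genes=6),
}
```

Passing the resulting list of row dictionaries to a DataFrame constructor would give one row per model.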

compare

compare(
    models: Optional[Union[str, list, ndarray]] = None,
    group_by: Optional[str] = "group_name",
    method: Literal["jaccard", "PCA", "num"] = "jaccard",
    **kwargs
)

Compare models within the group based on their components.

This method provides different ways to compare the models contained within this group, or subsets/aggregations thereof. Comparisons can be based on component overlap (Jaccard index), component counts, or dimensionality reduction (PCA) of component presence/absence.
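The Jaccard index of two component sets A and B is |A ∩ B| / |A ∪ B|, ranging from 0 (no shared components) to 1 (identical sets). A minimal pairwise sketch over hypothetical {name: set_of_component_ids} data; treating two empty sets as identical (1.0) is a convention chosen here, and how pipeGEM forms the component set of an aggregated subgroup (for example, a union of its members) is not stated above.

```python
from itertools import combinations
from typing import Dict, Set, Tuple


def jaccard(a: Set[str], b: Set[str]) -> float:
    """|A ∩ B| / |A ∪ B|; defined here as 1.0 when both sets are empty."""
    union = a | b
    if not union:
        return 1.0
    return len(a & b) / len(union)


def pairwise_jaccard(components: Dict[str, Set[str]]) -> Dict[Tuple[str, str], float]:
    """Jaccard index for every unordered pair of models (or aggregated groups)."""
    return {
        (x, y): jaccard(components[x], components[y])
        for x, y in combinations(components, 2)
    }
```

For example, models sharing 2 of 4 distinct reactions score 0.5.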

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| models | str or list[str] or ndarray | Specifies which models from the group to include in the comparison. Can be a single model name tag, a list or array of name tags, or None to include all models in the current group. | None |
| group_by | str | An annotation key present in the group's annotation DataFrame. If provided, models are first aggregated into subgroups based on the unique values of this annotation, and the comparison (e.g., Jaccard index, PCA) is then performed between these subgroups. If None, the individual models specified by models (or all models, if models is None) are compared. The default, "group_name", refers to an annotation that may exist when groups are nested. | 'group_name' |
| method | Literal["jaccard", "PCA", "num"] | The comparison method to use: 'jaccard' calculates pairwise Jaccard similarity based on shared components (genes, reactions, metabolites; see _compare_components_jaccard); 'PCA' performs Principal Component Analysis on a matrix whose rows are models or groups and whose columns are components, encoded as presence/absence (see _compare_component_PCA); 'num' compares the number of components (genes, reactions, metabolites) across models or groups (see _compare_component_num). | 'jaccard' |
| **kwargs | | Additional keyword arguments passed to the specific comparison method (e.g., components for 'jaccard' and 'num', n_components for 'PCA'). | {} |

Returns:

| Type | Description |
|---|---|
| Union[ComponentComparisonAnalysis, ComponentNumberAnalysis, PCA_Analysis] | An analysis result object corresponding to the chosen method. |

Raises:

| Type | Description |
|---|---|
| ValueError | If an invalid method is specified. |
| KeyError | If group_by refers to an annotation key that is not present, or if models contains names not in the group. |

See Also

- _compare_components_jaccard: Calculates Jaccard similarity.
- _compare_component_num: Compares component counts.
- _compare_component_PCA: Performs PCA on component presence.
- aggregate_models: Aggregates models based on annotations.
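Both the 'num' and 'PCA' methods start from the components each model contains: 'num' counts them, and 'PCA' works on a binary presence/absence matrix with one row per model (or group) and one column per component. The sketch below builds both from hypothetical {name: set_of_component_ids} data; the helper names and the sorted column order are assumptions. The PCA step itself, for example with scikit-learn's sklearn.decomposition.PCA, is not shown.

```python
from typing import Dict, List, Set, Tuple


def presence_matrix(components: Dict[str, Set[str]]) -> Tuple[List[str], List[List[int]]]:
    """Rows = models in input order, columns = sorted union of components, entries 1/0."""
    columns = sorted(set().union(*components.values()))
    matrix = [[int(c in comps) for c in columns] for comps in components.values()]
    return columns, matrix


def component_counts(components: Dict[str, Set[str]]) -> Dict[str, int]:
    """Number of components per model, as a count-based comparison would report."""
    return {name: len(comps) for name, comps in components.items()}
```

Two models can have equal counts yet very different rows in the matrix, which is why count, overlap, and PCA comparisons answer different questions.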