Thresholds¶

ThresholdFinders ¶

ThresholdFinders()

Bases: ObjectFactory

Factory class for registering and creating threshold finding algorithms.

Initializes the ThresholdFinders factory.

ThresholdFinder ¶

ThresholdFinder()

Abstract base class for threshold finding algorithms.

Initializes the ThresholdFinder.

find_threshold ¶

find_threshold(**kwargs)

Abstract method to find expression thresholds.

Subclasses must implement this method.

Parameters:

Name	Type	Description	Default
`**kwargs`		Algorithm-specific parameters.	`{}`

Raises:

Type	Description
`NotImplementedError`	If the subclass does not implement this method.
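
The abstract-method contract and the factory lookup can be illustrated with a small standalone sketch. Every name below (`FinderSketch`, `MedianFinderSketch`, `registry`) is hypothetical and is not part of the documented API; the plain dictionary only stands in for whatever registration mechanism `ObjectFactory` provides.

```python
import numpy as np


class FinderSketch:
    """Hypothetical stand-in for an abstract threshold finder base class."""

    def find_threshold(self, **kwargs):
        # The base class has no algorithm of its own.
        raise NotImplementedError("subclasses must implement find_threshold")


class MedianFinderSketch(FinderSketch):
    """Hypothetical subclass that uses the median as the threshold."""

    def find_threshold(self, data, **kwargs):
        return float(np.median(np.asarray(data, dtype=float)))


# Hypothetical name-to-class registry standing in for a factory.
registry = {"median": MedianFinderSketch}

finder = registry["median"]()
threshold = finder.find_threshold(data=[1.0, 2.0, 3.0, 10.0])
```

Calling `find_threshold` on the base class raises `NotImplementedError`, while the subclass returns a value.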

DistributionBased ¶

DistributionBased()

Bases: ThresholdFinder

Base class for threshold finders based on data distribution analysis.

Initializes the DistributionBased finder.

find_threshold ¶

find_threshold(**kwargs)

Abstract method for distribution-based threshold finding.

Raises:

Type	Description
`NotImplementedError`	If the subclass does not implement this method.

gaussian_dist `staticmethod` ¶

gaussian_dist(
    x: ndarray, amp: float, cen: float, wid: float
) -> np.ndarray

Calculate the Gaussian function value.

Parameters:

Name	Type	Description	Default
`x`	`ndarray`	Input array.	required
`amp`	`float`	Amplitude (height) of the Gaussian peak.	required
`cen`	`float`	Center (mean) of the Gaussian distribution.	required
`wid`	`float`	Width (related to variance) of the Gaussian distribution.	required

Returns:

Type	Description
`ndarray`	Gaussian function values corresponding to x.
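
The exact scaling of `wid` is not stated here; it is only described as related to the variance. A minimal sketch, assuming the common form amp * exp(-(x - cen)^2 / (2 * wid^2)) in which `wid` acts as a standard deviation. The library's own parameterization may differ, and `gaussian_sketch` is a hypothetical name.

```python
import numpy as np


def gaussian_sketch(x, amp, cen, wid):
    """Gaussian curve; assumes wid plays the role of a standard deviation."""
    x = np.asarray(x, dtype=float)
    return amp * np.exp(-((x - cen) ** 2) / (2.0 * wid ** 2))


x = np.linspace(-3.0, 3.0, 7)
y = gaussian_sketch(x, amp=2.0, cen=0.0, wid=1.0)
```

Under this assumption the curve peaks at `amp` when `x == cen` and is symmetric around `cen`.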

bimodal_dist ¶

bimodal_dist(
    x: ndarray,
    A1: float,
    mu1: float,
    wid1: float,
    A2: float,
    mu2: float,
    wid2: float,
) -> np.ndarray

Calculate the sum of two Gaussian distributions (bimodal).

Parameters:

Name	Type	Description	Default
`x`	`ndarray`	Input array.	required
`A1`	`float`	Amplitude of the first Gaussian component.	required
`mu1`	`float`	Center (mean) of the first Gaussian component.	required
`wid1`	`float`	Width of the first Gaussian component.	required
`A2`	`float`	Amplitude of the second Gaussian component.	required
`mu2`	`float`	Center (mean) of the second Gaussian component.	required
`wid2`	`float`	Width of the second Gaussian component.	required

Returns:

Type	Description
`ndarray`	Sum of the two Gaussian functions evaluated at x.
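
A minimal sketch of the two-component sum, assuming each component has the form A * exp(-(x - mu)^2 / (2 * wid^2)). That width scaling is an assumption, and `bimodal_sketch` is a hypothetical name.

```python
import numpy as np


def bimodal_sketch(x, A1, mu1, wid1, A2, mu2, wid2):
    """Sum of two Gaussian curves evaluated at x."""
    x = np.asarray(x, dtype=float)
    first = A1 * np.exp(-((x - mu1) ** 2) / (2.0 * wid1 ** 2))
    second = A2 * np.exp(-((x - mu2) ** 2) / (2.0 * wid2 ** 2))
    return first + second


x = np.linspace(-6.0, 10.0, 1601)
y = bimodal_sketch(x, A1=1.0, mu1=-2.0, wid1=1.0, A2=0.5, mu2=5.0, wid2=1.0)

# Interior grid points that are strictly higher than both neighbours.
is_peak = (y[1:-1] > y[:-2]) & (y[1:-1] > y[2:])
peak_x = x[1:-1][is_peak]
```

With well-separated components, the local maxima of the sum sit close to `mu1` and `mu2`.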

get_init_for_bimodal ¶

get_init_for_bimodal(
    x: ndarray,
    y: ndarray,
    max_x: float,
    min_x: float,
    min_h_ratio: float = 1.5,
    max_w_ratio: float = 2,
    n_top: int = 100,
) -> Tuple[float, float]

Heuristically find initial guesses for the centers of two modes in a distribution.

Analyzes the second derivative of the distribution y with respect to x to find potential peaks (local minima in the second derivative) and selects the best pair based on height, width ratios, and a scoring function.

Parameters:

Name	Type	Description	Default
`x`	`ndarray`	The x-coordinates of the distribution data points. Assumed to be evenly spaced.	required
`y`	`ndarray`	The y-coordinates (density/frequency) of the distribution data points.	required
`max_x`	`float`	Maximum allowed x-value for candidate peaks.	required
`min_x`	`float`	Minimum allowed x-value for candidate peaks.	required
`min_h_ratio`	`float`	Minimum ratio relative to the global peak height for a candidate peak to be considered valid. Default is 1.5.	`1.5`
`max_w_ratio`	`float`	Maximum allowed ratio between the widths defined by the two candidate peaks relative to the overall data range. Default is 2.	`2`
`n_top`	`int`	Number of top candidate peaks (based on second derivative depth) to consider. Default is 100.	`100`

Returns:

Type	Description
`Tuple[float, float]`	A tuple containing the estimated x-coordinates (centers) of the two modes.
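
The core idea, locating modes as local minima of the second derivative, can be sketched as follows. This simplified version only keeps the two deepest minima inside `[min_x, max_x]`; it does not reproduce the height-ratio, width-ratio, or scoring steps, and `guess_two_modes` is a hypothetical name.

```python
import numpy as np


def guess_two_modes(x, y, min_x, max_x):
    """Return the x-positions of the two deepest second-derivative minima."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx = x[1] - x[0]  # assumes evenly spaced x
    d2 = np.gradient(np.gradient(y, dx), dx)

    # Interior points strictly lower than both neighbours.
    inner = d2[1:-1]
    is_min = (inner < d2[:-2]) & (inner < d2[2:])
    idx = np.flatnonzero(is_min) + 1

    # Keep only candidates inside the allowed x-range.
    idx = idx[(x[idx] >= min_x) & (x[idx] <= max_x)]
    if idx.size < 2:
        raise ValueError("fewer than two candidate peaks in range")

    # The most negative second derivative marks the sharpest peaks.
    best = idx[np.argsort(d2[idx])[:2]]
    low, high = np.sort(x[best])
    return float(low), float(high)


x = np.linspace(-6.0, 8.0, 1401)
y = (1.0 * np.exp(-((x + 2.0) ** 2) / (2.0 * 0.7 ** 2))
     + 0.6 * np.exp(-((x - 3.0) ** 2) / (2.0 * 1.0 ** 2)))
c1, c2 = guess_two_modes(x, y, min_x=-5.0, max_x=7.0)
```

Ranking by second-derivative depth favours narrow, tall peaks, which is why the full heuristic adds height and width checks on top.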

rFASTCORMICSThreshold ¶

rFASTCORMICSThreshold()

Bases: DistributionBased

Finds expression thresholds by fitting a bimodal Gaussian distribution.

This method assumes gene expression data often follows a bimodal distribution, representing 'off' and 'on' states. It fits two Gaussian curves to the data's Kernel Density Estimate (KDE) and uses the means of these curves as potential thresholds.

Initializes the rFASTCORMICSThreshold finder.

find_threshold ¶

find_threshold(
    data: Union[ndarray, Series, dict, list],
    cut_off: float = -np.inf,
    return_heuristic: bool = False,
    hard_x_lims: tuple = (0.05, 0.95),
    k_best: int = 3,
) -> rFASTCORMICSThresholdAnalysis

Finds expression and non-expression thresholds using bimodal Gaussian fitting.

Excludes data below cut_off, computes a KDE of the remaining values, fits a bimodal Gaussian to it, and returns the means of the two fitted Gaussians as potential thresholds.

Parameters:

Name	Type	Description	Default
`data`	`Union[ndarray, Series, dict, list]`	Input expression data (typically log-transformed).	required
`cut_off`	`float`	Value below which data points are excluded before analysis. Default is -infinity (no cutoff).	`-inf`
`return_heuristic`	`bool`	If True, returns the initial heuristic guesses for the means instead of the fitted means. Default is False.	`False`
`hard_x_lims`	`tuple`	Tuple defining the percentile range (e.g., (0.05, 0.95)) of the data to consider for fitting, restricting the search space for means. Default is (0.05, 0.95).	`(0.05, 0.95)`
`k_best`	`int`	Number of best-fitting parameter sets to return (ranked by p-score). Default is 3.	`3`

Returns:

Type	Description
`rFASTCORMICSThresholdAnalysis`	An analysis object containing the calculated thresholds, fitted curves, and intermediate data (x, y KDE values).
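
The filter, KDE, and fit sequence can be sketched with SciPy. This is an illustration under stated assumptions, not the library's implementation: the initial guesses come from the 25th and 75th percentiles rather than from `get_init_for_bimodal`, `hard_x_lims` and `k_best` are not modelled, the lower fitted mean is assumed to be the non-expression threshold, and `fit_two_modes` is a hypothetical name.

```python
import numpy as np
from scipy.optimize import curve_fit
from scipy.stats import gaussian_kde


def bimodal(x, A1, mu1, wid1, A2, mu2, wid2):
    """Sum of two Gaussian curves (width scaling is an assumption)."""
    g1 = A1 * np.exp(-((x - mu1) ** 2) / (2.0 * wid1 ** 2))
    g2 = A2 * np.exp(-((x - mu2) ** 2) / (2.0 * wid2 ** 2))
    return g1 + g2


def fit_two_modes(values, cut_off=-np.inf, n_grid=200):
    """Fit two Gaussians to the KDE and return (lower mean, upper mean)."""
    values = np.asarray(values, dtype=float)
    values = values[values >= cut_off]  # drop points below the cutoff

    # Evaluate the kernel density estimate on an evenly spaced grid.
    x = np.linspace(values.min(), values.max(), n_grid)
    y = gaussian_kde(values)(x)

    # Crude initial guesses: quartiles for the means, half the spread for widths.
    q25, q75 = np.percentile(values, [25, 75])
    spread = values.std() / 2.0
    p0 = [y.max(), q25, spread, y.max(), q75, spread]

    params, _ = curve_fit(bimodal, x, y, p0=p0, maxfev=10000)
    low, high = sorted((params[1], params[4]))
    return float(low), float(high)


rng = np.random.default_rng(0)
log_expr = np.concatenate([
    rng.normal(-2.0, 1.0, 1500),  # synthetic "off" genes
    rng.normal(4.0, 1.0, 1500),   # synthetic "on" genes
])
non_exp_th, exp_th = fit_two_modes(log_expr)
```

On this synthetic data the two fitted means land near the centers of the generating distributions.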

RankBased ¶

RankBased()

Bases: ThresholdFinder

Base class for threshold finders based on data ranking (e.g., percentiles).

Initializes the RankBased finder.

find_threshold ¶

find_threshold(**kwargs)

Abstract method for rank-based threshold finding.

Raises:

Type	Description
`NotImplementedError`	If the subclass does not implement this method.

PercentileThreshold ¶

PercentileThreshold()

Bases: RankBased

Finds thresholds based on simple percentiles of the data.

Initializes the PercentileThreshold finder.

find_threshold ¶

find_threshold(
    data: Union[ndarray, Series, dict, list],
    p: Union[int, float, List[int], List[float], ndarray],
    exp_p: Optional[Union[int, float]] = None,
    non_exp_p: Optional[Union[int, float]] = None,
    **kwargs
) -> PercentileThresholdAnalysis

Calculate thresholds based on specified percentiles of the input data.

Parameters:

Name	Type	Description	Default
`data`	`Union[ndarray, Series, dict, list]`	Input expression data.	required
`p`	`Union[int, float, List[int], List[float], ndarray]`	Percentile or sequence of percentiles to compute, which must be between 0 and 100 inclusive.	required
`exp_p`	`Optional[Union[int, float]]`	Percentile to compute the expression threshold. If None, the maximum value of the percentiles is returned as the exp_th.	`None`
`non_exp_p`	`Optional[Union[int, float]]`	Percentile to compute the non-expression threshold. If None, the minimum value of the percentiles is returned as the non_exp_th.	`None`
`**kwargs`		Additional keyword arguments passed to np.percentile().	`{}`

Returns:

Name	Type	Description
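`result`	`PercentileThresholdAnalysis`	Analysis object containing the computed expression (exp_th) and non-expression (non_exp_th) thresholds.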
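
The selection rule for `exp_p` and `non_exp_p` can be sketched directly with `np.percentile`. The helper name `percentile_thresholds` and its tuple return value are assumptions; the documented method returns a `PercentileThresholdAnalysis` object instead.

```python
import numpy as np


def percentile_thresholds(data, p, exp_p=None, non_exp_p=None, **kwargs):
    """Return (exp_th, non_exp_th) derived from percentiles of data."""
    if isinstance(data, dict):
        data = list(data.values())
    values = np.asarray(data, dtype=float)

    # Percentiles requested through p; a scalar p becomes a 1-element array.
    computed = np.atleast_1d(np.percentile(values, p, **kwargs))

    # Fall back to the largest/smallest computed percentile when unset.
    if exp_p is None:
        exp_th = computed.max()
    else:
        exp_th = np.percentile(values, exp_p, **kwargs)
    if non_exp_p is None:
        non_exp_th = computed.min()
    else:
        non_exp_th = np.percentile(values, non_exp_p, **kwargs)
    return float(exp_th), float(non_exp_th)


exp_th, non_exp_th = percentile_thresholds(np.arange(101), p=[10, 90])
exp_th_75, _ = percentile_thresholds(np.arange(101), p=[10, 90], exp_p=75)
```

Because the extra keyword arguments are forwarded unchanged, options such as `method` behave as they do in `np.percentile`.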

LocalThreshold ¶

LocalThreshold()

Bases: RankBased

Finds local expression thresholds based on percentiles within groups.

Calculates gene-specific thresholds for different sample groups based on a specified percentile p. Optionally applies global 'on'/'off' thresholds to override local thresholds for genes consistently high or low across a group.

Initializes the LocalThreshold finder.

find_threshold ¶

find_threshold(
    data: Union[ndarray, DataFrame],
    p: Union[int, float],
    global_on_p: Optional[Union[int, float]] = 90,
    global_off_p: Optional[Union[int, float]] = 10,
    global_on_th: Optional[Union[int, float]] = None,
    global_off_th: Optional[Union[int, float]] = None,
    groups: Optional[Union[Series, dict]] = None,
    genes: Optional[List[str]] = None,
    samples: Optional[List[str]] = None,
    **kwargs
) -> LocalThresholdAnalysis

Calculate local, gene-specific expression thresholds for sample groups.

Parameters:

Name	Type	Description	Default
`data`	`Union[ndarray, DataFrame]`	Input expression data (genes x samples). If ndarray, `genes` and `samples` arguments must be provided.	required
`p`	`Union[int, float]`	The percentile (0-100) used to calculate the local threshold for each gene within each group.	required
`global_on_p`	`Optional[Union[int, float]]`	Percentile (0-100) to determine the global 'on' threshold. If a gene's minimum expression within a group exceeds this percentile's value across the group, the global threshold overrides the local one. Used if `global_on_th` is None. Default is 90.	`90`
`global_off_p`	`Optional[Union[int, float]]`	Percentile (0-100) to determine the global 'off' threshold. If a gene's maximum expression within a group is below this percentile's value across the group, the global threshold overrides the local one. Used if `global_off_th` is None. Default is 10.	`10`
`global_on_th`	`Optional[Union[int, float]]`	Explicit value for the global 'on' threshold. Overrides `global_on_p`. Default is None.	`None`
`global_off_th`	`Optional[Union[int, float]]`	Explicit value for the global 'off' threshold. Overrides `global_off_p`. Default is None.	`None`
`groups`	`Optional[Union[Series, dict]]`	Mapping samples to groups. If dict, keys are group names, values are lists of sample names. If Series, index is sample names, values are group names. If None, all samples are treated as one group named 'exp_th'. Default is None.	`None`
`genes`	`Optional[List[str]]`	List of gene names corresponding to rows if `data` is an ndarray. Required if `data` is an ndarray. Default is None.	`None`
`samples`	`Optional[List[str]]`	List of sample names corresponding to columns if `data` is an ndarray. Required if `data` is an ndarray. Default is None.	`None`
`**kwargs`		Additional keyword arguments (currently unused).	`{}`

Returns:

Name	Type	Description
`local_threshold_analysis`	`LocalThresholdAnalysis`	Result object containing the local threshold DataFrame. The DataFrame has shape N_gene x N_group (one column per group when `groups` is specified) and holds the expression thresholds.
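
The per-gene, per-group calculation can be sketched with pandas. The override rule follows the parameter descriptions above, but for brevity the global thresholds are passed as explicit values rather than derived from `global_on_p` and `global_off_p`. The name `local_thresholds_sketch` and the plain DataFrame return value are assumptions.

```python
import numpy as np
import pandas as pd


def local_thresholds_sketch(data, groups, p,
                            global_on_th=None, global_off_th=None):
    """Return a genes x groups DataFrame of percentile-based thresholds."""
    columns = {}
    for name, members in groups.items():
        sub = data[members]  # genes x samples in this group
        th = pd.Series(np.percentile(sub.to_numpy(), p, axis=1),
                       index=data.index)
        if global_on_th is not None:
            # Genes always above the 'on' threshold take the global value.
            th.loc[sub.min(axis=1) > global_on_th] = global_on_th
        if global_off_th is not None:
            # Genes always below the 'off' threshold take the global value.
            th.loc[sub.max(axis=1) < global_off_th] = global_off_th
        columns[name] = th
    return pd.DataFrame(columns)


expr = pd.DataFrame(
    [[1.0, 3.0, 10.0, 20.0],
     [2.0, 4.0, 0.0, 8.0]],
    index=["g1", "g2"],
    columns=["s1", "s2", "s3", "s4"],
)
groups = {"A": ["s1", "s2"], "B": ["s3", "s4"]}

local = local_thresholds_sketch(expr, groups, p=50)
overridden = local_thresholds_sketch(expr, groups, p=50, global_on_th=9.0)
```

In the second call, gene `g1` exceeds 9.0 in every sample of group `B`, so its local threshold there is replaced by the global value.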
