Thresholds

ThresholdFinders

ThresholdFinders()

Bases: ObjectFactory

Factory class for registering and creating threshold finding algorithms.

Initializes the ThresholdFinders factory.

ThresholdFinder

ThresholdFinder()

Abstract base class for threshold finding algorithms.

Initializes the ThresholdFinder.

find_threshold

find_threshold(**kwargs)

Abstract method to find expression thresholds.

Subclasses must implement this method.

Parameters:

Name Type Description Default
**kwargs

Algorithm-specific parameters.

{}

Raises:

Type Description
NotImplementedError

If the subclass does not implement this method.
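
The abstract-method contract described above can be illustrated with a minimal sketch. The class names below (ThresholdFinderSketch, MedianThresholdSketch) are hypothetical stand-ins and not part of the library; the median rule is only a placeholder algorithm.

```python
class ThresholdFinderSketch:
    """Hypothetical stand-in for the abstract base class."""

    def find_threshold(self, **kwargs):
        # The base class only defines the interface.
        raise NotImplementedError("Subclasses must implement find_threshold().")


class MedianThresholdSketch(ThresholdFinderSketch):
    """Hypothetical subclass that uses the median as a placeholder threshold."""

    def find_threshold(self, data, **kwargs):
        ordered = sorted(data)
        mid = len(ordered) // 2
        if len(ordered) % 2:
            return ordered[mid]
        # Even number of values: average the two middle ones.
        return (ordered[mid - 1] + ordered[mid]) / 2


finder = MedianThresholdSketch()
threshold = finder.find_threshold(data=[1.0, 4.0, 2.0, 8.0])
```

Calling find_threshold on the base class itself raises NotImplementedError, which is the behavior the Raises table documents.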

DistributionBased

DistributionBased()

Bases: ThresholdFinder

Base class for threshold finders based on data distribution analysis.

Initializes the DistributionBased finder.

find_threshold

find_threshold(**kwargs)

Abstract method for distribution-based threshold finding.

Raises:

Type Description
NotImplementedError

If the subclass does not implement this method.

gaussian_dist staticmethod

gaussian_dist(
    x: ndarray, amp: float, cen: float, wid: float
) -> np.ndarray

Calculate the Gaussian function value.

Parameters:

Name Type Description Default
x ndarray

Input array.

required
amp float

Amplitude (height) of the Gaussian peak.

required
cen float

Center (mean) of the Gaussian distribution.

required
wid float

Width (related to variance) of the Gaussian distribution.

required

Returns:

Type Description
ndarray

Gaussian function values corresponding to x.
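
As a rough illustration of the parameters, the sketch below evaluates a Gaussian curve with NumPy. The function name gaussian_sketch is hypothetical, and the exact way wid enters the exponent is an assumption: this reference only says it is related to the variance, so the library's formula may differ by a constant factor.

```python
import numpy as np


def gaussian_sketch(x, amp, cen, wid):
    # Assumed form: amp * exp(-(x - cen)^2 / (2 * wid^2)).
    # The library's exact normalization of `wid` is not stated in this reference.
    x = np.asarray(x, dtype=float)
    return amp * np.exp(-((x - cen) ** 2) / (2.0 * wid ** 2))


x = np.linspace(-3.0, 3.0, 7)  # -3, -2, ..., 3
y = gaussian_sketch(x, amp=2.0, cen=0.0, wid=1.0)
```

Under this assumed form the curve peaks at x = cen with height amp and is symmetric around cen.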

bimodal_dist

bimodal_dist(
    x: ndarray,
    A1: float,
    mu1: float,
    wid1: float,
    A2: float,
    mu2: float,
    wid2: float,
) -> np.ndarray

Calculate the sum of two Gaussian distributions (bimodal).

Parameters:

Name Type Description Default
x ndarray

Input array.

required
A1 float

Amplitude of the first Gaussian component.

required
mu1 float

Center (mean) of the first Gaussian component.

required
wid1 float

Width of the first Gaussian component.

required
A2 float

Amplitude of the second Gaussian component.

required
mu2 float

Center (mean) of the second Gaussian component.

required
wid2 float

Width of the second Gaussian component.

required

Returns:

Type Description
ndarray

Sum of the two Gaussian functions evaluated at x.
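
A bimodal curve of this kind can be sketched as the sum of two single Gaussians. The helper names are hypothetical, and the Gaussian form (width entering as 2 * wid^2 in the exponent) is an assumption rather than the library's confirmed formula.

```python
import numpy as np


def gaussian_sketch(x, amp, cen, wid):
    # Assumed Gaussian form; the library's normalization may differ.
    return amp * np.exp(-((x - cen) ** 2) / (2.0 * wid ** 2))


def bimodal_sketch(x, A1, mu1, wid1, A2, mu2, wid2):
    # Sum of two Gaussian components, one per mode.
    return gaussian_sketch(x, A1, mu1, wid1) + gaussian_sketch(x, A2, mu2, wid2)


x = np.linspace(-5.0, 15.0, 201)  # step 0.1, so x[50] is 0 and x[150] is 10
y = bimodal_sketch(x, A1=1.0, mu1=0.0, wid1=1.0, A2=0.5, mu2=10.0, wid2=1.0)
```

When the two centers are far apart relative to the widths, the curve near each center is dominated by that component alone, so its height there is close to the corresponding amplitude.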

get_init_for_bimodal

get_init_for_bimodal(
    x: ndarray,
    y: ndarray,
    max_x: float,
    min_x: float,
    min_h_ratio: float = 1.5,
    max_w_ratio: float = 2,
    n_top: int = 100,
) -> Tuple[float, float]

Heuristically find initial guesses for the centers of two modes in a distribution.

Analyzes the second derivative of the distribution y with respect to x to find potential peaks (local minima in the second derivative) and selects the best pair based on height, width ratios, and a scoring function.

Parameters:

Name Type Description Default
x ndarray

The x-coordinates of the distribution data points. Assumed to be evenly spaced.

required
y ndarray

The y-coordinates (density/frequency) of the distribution data points.

required
max_x float

Maximum allowed x-value for candidate peaks.

required
min_x float

Minimum allowed x-value for candidate peaks.

required
min_h_ratio float

Minimum ratio relative to the global peak height for a candidate peak to be considered valid. Default is 1.5.

1.5
max_w_ratio float

Maximum allowed ratio between the widths defined by the two candidate peaks relative to the overall data range. Default is 2.

2
n_top int

Number of top candidate peaks (based on second derivative depth) to consider. Default is 100.

100

Returns:

Type Description
Tuple[float, float]

A tuple containing the estimated x-coordinates (centers) of the two modes.
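
The core idea (peaks appear as local minima of the second derivative) can be sketched as follows. This is a deliberately simplified illustration, not the library's heuristic: it ignores max_x, min_x, min_h_ratio, max_w_ratio, and n_top, uses no scoring function, and the name two_mode_guess_sketch is hypothetical.

```python
import numpy as np


def two_mode_guess_sketch(x, y):
    # Second derivative of y with respect to x (x assumed evenly spaced).
    d2 = np.gradient(np.gradient(y, x), x)
    # Interior local minima of the second derivative mark candidate peaks.
    interior = np.arange(1, len(x) - 1)
    is_min = (d2[interior] < d2[interior - 1]) & (d2[interior] < d2[interior + 1])
    candidates = interior[is_min]
    # Keep the two candidates with the most negative curvature.
    best_two = candidates[np.argsort(d2[candidates])[:2]]
    return tuple(sorted(float(v) for v in x[best_two]))


# Synthetic two-mode curve with centers at 0 and 8.
x = np.linspace(-5.0, 15.0, 401)
y = np.exp(-(x - 0.0) ** 2 / 2.0) + 0.6 * np.exp(-(x - 8.0) ** 2 / 2.0)
low, high = two_mode_guess_sketch(x, y)
```

The returned pair is meant only as a starting point for a subsequent curve fit, which is how the initial guesses are described above.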

rFASTCORMICSThreshold

rFASTCORMICSThreshold()

Bases: DistributionBased

Finds expression thresholds by fitting a bimodal Gaussian distribution.

This method assumes gene expression data often follows a bimodal distribution, representing 'off' and 'on' states. It fits two Gaussian curves to the data's Kernel Density Estimate (KDE) and uses the means of these curves as potential thresholds.

Initializes the rFASTCORMICSThreshold finder.

find_threshold

find_threshold(
    data: Union[ndarray, Series, dict, list],
    cut_off: float = -np.inf,
    return_heuristic: bool = False,
    hard_x_lims: tuple = (0.05, 0.95),
    k_best: int = 3,
) -> rFASTCORMICSThresholdAnalysis

Finds expression and non-expression thresholds using bimodal Gaussian fitting.

Filters data below cut_off, calculates KDE, fits a bimodal Gaussian, and returns the means of the fitted Gaussians as potential thresholds.

Parameters:

Name Type Description Default
data Union[ndarray, Series, dict, list]

Input expression data (typically log-transformed).

required
cut_off float

Value below which data points are excluded before analysis. Default is -infinity (no cutoff).

-inf
return_heuristic bool

If True, returns the initial heuristic guesses for the means instead of the fitted means. Default is False.

False
hard_x_lims tuple

Tuple defining the percentile range (e.g., (0.05, 0.95)) of the data to consider for fitting, restricting the search space for means. Default is (0.05, 0.95).

(0.05, 0.95)
k_best int

Number of best-fitting parameter sets to return (ranked by p-score). Default is 3.

3

Returns:

Type Description
rFASTCORMICSThresholdAnalysis

An analysis object containing the calculated thresholds, fitted curves, and intermediate data (x, y KDE values).
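
The pipeline described above (KDE, then a bimodal Gaussian fit, then reading off the two means) can be approximated with SciPy. This sketch does not call the library: it uses scipy.stats.gaussian_kde and scipy.optimize.curve_fit directly, hard-codes the initial guesses instead of deriving them heuristically, skips cut_off, hard_x_lims, and k_best, and assumes the lower mean plays the role of the non-expression threshold and the higher mean the expression threshold.

```python
import numpy as np
from scipy.optimize import curve_fit
from scipy.stats import gaussian_kde


def bimodal_sketch(x, A1, mu1, wid1, A2, mu2, wid2):
    # Sum of two Gaussians (assumed form; see the parameter tables above).
    g1 = A1 * np.exp(-((x - mu1) ** 2) / (2.0 * wid1 ** 2))
    g2 = A2 * np.exp(-((x - mu2) ** 2) / (2.0 * wid2 ** 2))
    return g1 + g2


rng = np.random.default_rng(0)
# Synthetic log-expression values: an "off" mode near 0 and an "on" mode near 6.
data = np.concatenate([rng.normal(0.0, 1.0, 2000), rng.normal(6.0, 1.0, 3000)])

# Kernel density estimate evaluated on an evenly spaced grid.
x = np.linspace(data.min(), data.max(), 400)
y = gaussian_kde(data)(x)

# Hard-coded initial guesses: amplitude, center, width for each component.
p0 = [y.max(), 0.5, 1.0, y.max(), 5.5, 1.0]
popt, _ = curve_fit(bimodal_sketch, x, y, p0=p0)

# Assumed naming: lower mean = non-expression, higher mean = expression.
non_exp_th, exp_th = sorted([popt[1], popt[4]])
```

On real data the quality of the fit depends heavily on the initial guesses, which is presumably why the library pairs the fit with a dedicated heuristic for them.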

RankBased

RankBased()

Bases: ThresholdFinder

Base class for threshold finders based on data ranking (e.g., percentiles).

Initializes the RankBased finder.

find_threshold

find_threshold(**kwargs)

Abstract method for rank-based threshold finding.

Raises:

Type Description
NotImplementedError

If the subclass does not implement this method.

PercentileThreshold

PercentileThreshold()

Bases: RankBased

Finds thresholds based on simple percentiles of the data.

Initializes the PercentileThreshold finder.

find_threshold

find_threshold(
    data: Union[ndarray, Series, dict, list],
    p: Union[int, float, List[int], List[float], ndarray],
    exp_p: Optional[Union[int, float]] = None,
    non_exp_p: Optional[Union[int, float]] = None,
    **kwargs
) -> PercentileThresholdAnalysis

Calculate thresholds based on specified percentiles of the input data.

Parameters:

Name Type Description Default
data Union[ndarray, Series, dict, list]

Input expression data.

required
p Union[int, float, List[int], List[float], ndarray]

Percentile or sequence of percentiles to compute, which must be between 0 and 100 inclusive.

required
exp_p Optional[Union[int, float]]

Percentile used to compute the expression threshold. If None, the largest of the values computed for p is returned as exp_th.

None
non_exp_p Optional[Union[int, float]]

Percentile used to compute the non-expression threshold. If None, the smallest of the values computed for p is returned as non_exp_th.

None
**kwargs

Additional keyword arguments passed to np.percentile().

{}

Returns:

Name Type Description
result PercentileThresholdAnalysis
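
The defaulting rules for exp_p and non_exp_p can be sketched with np.percentile alone. The function name percentile_thresholds_sketch is hypothetical, and it returns a plain tuple rather than a PercentileThresholdAnalysis object, whose attributes are not documented here.

```python
import numpy as np


def percentile_thresholds_sketch(data, p, exp_p=None, non_exp_p=None):
    values = np.asarray(data, dtype=float)
    # Percentile values for every requested p (scalars are promoted to 1-D).
    computed = np.percentile(values, np.atleast_1d(p))
    # Fall back to the largest / smallest computed value when no explicit
    # expression / non-expression percentile is given.
    exp_th = computed.max() if exp_p is None else np.percentile(values, exp_p)
    non_exp_th = computed.min() if non_exp_p is None else np.percentile(values, non_exp_p)
    return float(exp_th), float(non_exp_th)


data = np.arange(101)  # 0, 1, ..., 100, so the p-th percentile equals p
exp_th, non_exp_th = percentile_thresholds_sketch(data, p=[25, 75])
exp_th_90, _ = percentile_thresholds_sketch(data, p=[25, 75], exp_p=90)
```

With p=[25, 75] and no explicit overrides, the expression threshold comes from the 75th percentile and the non-expression threshold from the 25th.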

LocalThreshold

LocalThreshold()

Bases: RankBased

Finds local expression thresholds based on percentiles within groups.

Calculates gene-specific thresholds for different sample groups based on a specified percentile p. Optionally applies global 'on'/'off' thresholds to override local thresholds for genes consistently high or low across a group.

Initializes the LocalThreshold finder.

find_threshold

find_threshold(
    data: Union[ndarray, DataFrame],
    p: Union[int, float],
    global_on_p: Optional[Union[int, float]] = 90,
    global_off_p: Optional[Union[int, float]] = 10,
    global_on_th: Optional[Union[int, float]] = None,
    global_off_th: Optional[Union[int, float]] = None,
    groups: Optional[Union[Series, dict]] = None,
    genes: Optional[List[str]] = None,
    samples: Optional[List[str]] = None,
    **kwargs
) -> LocalThresholdAnalysis

Calculate local, gene-specific expression thresholds for sample groups.

Parameters:

Name Type Description Default
data Union[ndarray, DataFrame]

Input expression data (genes x samples). If ndarray, genes and samples arguments must be provided.

required
p Union[int, float]

The percentile (0-100) used to calculate the local threshold for each gene within each group.

required
global_on_p Optional[Union[int, float]]

Percentile (0-100) to determine the global 'on' threshold. If a gene's minimum expression within a group exceeds this percentile's value across the group, the global threshold overrides the local one. Used if global_on_th is None. Default is 90.

90
global_off_p Optional[Union[int, float]]

Percentile (0-100) to determine the global 'off' threshold. If a gene's maximum expression within a group is below this percentile's value across the group, the global threshold overrides the local one. Used if global_off_th is None. Default is 10.

10
global_on_th Optional[Union[int, float]]

Explicit value for the global 'on' threshold. Overrides global_on_p. Default is None.

None
global_off_th Optional[Union[int, float]]

Explicit value for the global 'off' threshold. Overrides global_off_p. Default is None.

None
groups Optional[Union[Series, dict]]

Mapping of samples to groups. If dict, keys are group names and values are lists of sample names. If Series, the index holds sample names and the values hold group names. If None, all samples are treated as one group named 'exp_th'. Default is None.

None
genes Optional[List[str]]

List of gene names corresponding to rows if data is an ndarray. Required if data is an ndarray. Default is None.

None
samples Optional[List[str]]

List of sample names corresponding to columns if data is an ndarray. Required if data is an ndarray. Default is None.

None
**kwargs

Additional keyword arguments (currently unused).

{}

Returns:

Name Type Description
local_threshold_analysis LocalThresholdAnalysis

The result object contains a local threshold DataFrame of shape N_gene x N_group (when the groups argument is specified) holding the expression thresholds.
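
One possible reading of the local-threshold rule is sketched below with NumPy only. It is an interpretation of the parameter descriptions, not the library's code: the function name is hypothetical, groups map to column indices instead of sample names, the global thresholds are taken over all values in a group, and the result is a dict of arrays rather than a LocalThresholdAnalysis object.

```python
import numpy as np


def local_thresholds_sketch(data, p, groups, global_on_p=90, global_off_p=10):
    """data: genes x samples array; groups: dict of group name -> column indices."""
    result = {}
    for name, cols in groups.items():
        block = data[:, cols]
        # Gene-specific threshold: p-th percentile across the group's samples.
        local = np.percentile(block, p, axis=1)
        # Global thresholds taken over every value in the group (assumption).
        on_th = np.percentile(block, global_on_p)
        off_th = np.percentile(block, global_off_p)
        # Genes that are always high or always low get the global value.
        local = np.where(block.min(axis=1) > on_th, on_th, local)
        local = np.where(block.max(axis=1) < off_th, off_th, local)
        result[name] = local
    return result


# 12 genes x 3 samples: one always-high gene, ten mid genes, one always-low gene.
data = np.vstack([
    np.full((1, 3), 100.0),
    np.tile([1.0, 2.0, 3.0], (10, 1)),
    np.zeros((1, 3)),
])
thresholds = local_thresholds_sketch(data, p=50, groups={"all": [0, 1, 2]})
```

In this toy input the always-high gene receives the global 'on' value, the always-low gene receives the global 'off' value, and the remaining genes keep their own median.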