Thresholds¶
ThresholdFinders ¶
ThresholdFinders()
Bases: ObjectFactory
Factory class for registering and creating threshold finding algorithms.
Initializes the ThresholdFinders factory.
ThresholdFinder ¶
ThresholdFinder()
Abstract base class for threshold finding algorithms.
Initializes the ThresholdFinder.
find_threshold ¶
find_threshold(**kwargs)
Abstract method to find expression thresholds.
Subclasses must implement this method.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `**kwargs` | | Algorithm-specific parameters. | `{}` |
Raises:
| Type | Description |
|---|---|
| `NotImplementedError` | If the subclass does not implement this method. |
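The contract described here (the base method raises `NotImplementedError` and subclasses override it) can be illustrated with a minimal sketch. The class names `BaseFinder` and `ConstantThreshold` are hypothetical stand-ins and are not part of the library.

```python
class BaseFinder:
    """Hypothetical stand-in for an abstract threshold finder."""

    def find_threshold(self, **kwargs):
        # The base class only defines the interface.
        raise NotImplementedError("Subclasses must implement find_threshold().")


class ConstantThreshold(BaseFinder):
    """Hypothetical subclass that returns a fixed threshold value."""

    def find_threshold(self, value=0.0, **kwargs):
        # Algorithm-specific parameters arrive as keyword arguments.
        return value
```

Calling `find_threshold()` on the base class raises, while the subclass returns whatever its own algorithm computes.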
DistributionBased ¶
DistributionBased()
Bases: ThresholdFinder
Base class for threshold finders based on data distribution analysis.
Initializes the DistributionBased finder.
find_threshold ¶
find_threshold(**kwargs)
Abstract method for distribution-based threshold finding.
Raises:
| Type | Description |
|---|---|
| `NotImplementedError` | |
gaussian_dist staticmethod ¶
gaussian_dist(
x: ndarray, amp: float, cen: float, wid: float
) -> np.ndarray
Calculate the Gaussian function value.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `x` | `ndarray` | Input array. | required |
| `amp` | `float` | Amplitude (height) of the Gaussian peak. | required |
| `cen` | `float` | Center (mean) of the Gaussian distribution. | required |
| `wid` | `float` | Width (related to variance) of the Gaussian distribution. | required |
Returns:
| Type | Description |
|---|---|
| `ndarray` | Gaussian function values corresponding to x. |
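The exact parameterization is not spelled out here, so the following is only a sketch under an assumption: `wid` is treated as the standard deviation, giving `amp * exp(-(x - cen)^2 / (2 * wid^2))`. The function name `gaussian` is illustrative, and the library's own formula may scale the width differently.

```python
import numpy as np


def gaussian(x, amp, cen, wid):
    # Assumed form: amp * exp(-(x - cen)^2 / (2 * wid^2)), with wid as the standard deviation.
    x = np.asarray(x, dtype=float)
    return amp * np.exp(-((x - cen) ** 2) / (2.0 * wid ** 2))


# Evaluate at the center and one width to either side.
y = gaussian(np.array([-1.0, 0.0, 1.0]), amp=2.0, cen=0.0, wid=1.0)
```

Under this assumption the curve equals `amp` at `x == cen` and is symmetric about the center.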
bimodal_dist ¶
bimodal_dist(
x: ndarray,
A1: float,
mu1: float,
wid1: float,
A2: float,
mu2: float,
wid2: float,
) -> np.ndarray
Calculate the sum of two Gaussian distributions (bimodal).
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `x` | `ndarray` | Input array. | required |
| `A1` | `float` | Amplitude of the first Gaussian component. | required |
| `mu1` | `float` | Center of the first Gaussian component. | required |
| `wid1` | `float` | Width of the first Gaussian component. | required |
| `A2` | `float` | Amplitude of the second Gaussian component. | required |
| `mu2` | `float` | Center of the second Gaussian component. | required |
| `wid2` | `float` | Width of the second Gaussian component. | required |
Returns:
| Type | Description |
|---|---|
| `ndarray` | Sum of the two Gaussian functions evaluated at x. |
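A bimodal curve of this kind is simply two Gaussian terms added together. The sketch below assumes each component has the form `amp * exp(-(x - cen)^2 / (2 * wid^2))`; the helper names `gaussian` and `bimodal` are illustrative and not taken from the library.

```python
import numpy as np


def gaussian(x, amp, cen, wid):
    # Assumed single-component form, with wid as the standard deviation.
    x = np.asarray(x, dtype=float)
    return amp * np.exp(-((x - cen) ** 2) / (2.0 * wid ** 2))


def bimodal(x, a1, mu1, wid1, a2, mu2, wid2):
    # Sum of two Gaussian components evaluated at the same x.
    return gaussian(x, a1, mu1, wid1) + gaussian(x, a2, mu2, wid2)


x = np.linspace(-5.0, 10.0, 301)
y = bimodal(x, 1.0, 0.0, 1.0, 0.5, 5.0, 1.0)
```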
get_init_for_bimodal ¶
get_init_for_bimodal(
x: ndarray,
y: ndarray,
max_x: float,
min_x: float,
min_h_ratio: float = 1.5,
max_w_ratio: float = 2,
n_top: int = 100,
) -> Tuple[float, float]
Heuristically find initial guesses for the centers of two modes in a distribution.
Analyzes the second derivative of the distribution y with respect to x
to find potential peaks (local minima in the second derivative) and selects
the best pair based on height, width ratios, and a scoring function.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `x` | `ndarray` | The x-coordinates of the distribution data points. Assumed to be evenly spaced. | required |
| `y` | `ndarray` | The y-coordinates (density/frequency) of the distribution data points. | required |
| `max_x` | `float` | Maximum allowed x-value for candidate peaks. | required |
| `min_x` | `float` | Minimum allowed x-value for candidate peaks. | required |
| `min_h_ratio` | `float` | Minimum ratio relative to the global peak height for a candidate peak to be considered valid. Default is 1.5. | `1.5` |
| `max_w_ratio` | `float` | Maximum allowed ratio between the widths defined by the two candidate peaks relative to the overall data range. Default is 2. | `2` |
| `n_top` | `int` | Number of top candidate peaks (based on second derivative depth) to consider. Default is 100. | `100` |
Returns:
| Type | Description |
|---|---|
| `Tuple[float, float]` | A tuple containing the estimated x-coordinates (centers) of the two modes. |
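The core idea, that peaks of a density show up as local minima of its second derivative, can be sketched as follows. This is a simplified illustration only: it keeps the two deepest minima inside `[min_x, max_x]` and omits the height-ratio filter, the width-ratio filter, and the scoring function described above. The function name is hypothetical.

```python
import numpy as np


def two_deepest_curvature_minima(x, y, min_x, max_x):
    dx = x[1] - x[0]  # x is assumed to be evenly spaced
    # Numerical second derivative of y with respect to x.
    d2 = np.gradient(np.gradient(y, dx), dx)
    # Interior points that are strict local minima of the second derivative.
    idx = np.where((d2[1:-1] < d2[:-2]) & (d2[1:-1] < d2[2:]))[0] + 1
    # Keep only candidates inside the allowed x-range.
    idx = idx[(x[idx] >= min_x) & (x[idx] <= max_x)]
    if idx.size < 2:
        raise ValueError("fewer than two candidate peaks found")
    # The two most negative second-derivative values mark the sharpest peaks.
    best = idx[np.argsort(d2[idx])[:2]]
    low, high = np.sort(x[best])
    return float(low), float(high)


# Synthetic density with modes at 0 and 6.
x = np.linspace(-4.0, 10.0, 701)
y = np.exp(-x ** 2 / 2) + 0.6 * np.exp(-(x - 6.0) ** 2 / 2)
centers = two_deepest_curvature_minima(x, y, min_x=-3.0, max_x=9.0)
```

On this synthetic curve the two returned centers land close to the true modes; real KDE curves are noisier, which is presumably why the documented method adds ratio checks and scoring on top.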
rFASTCORMICSThreshold ¶
rFASTCORMICSThreshold()
Bases: DistributionBased
Finds expression thresholds by fitting a bimodal Gaussian distribution.
This method assumes gene expression data often follows a bimodal distribution, representing 'off' and 'on' states. It fits two Gaussian curves to the data's Kernel Density Estimate (KDE) and uses the means of these curves as potential thresholds.
Initializes the rFASTCORMICSThreshold finder.
find_threshold ¶
find_threshold(
data: Union[ndarray, Series, dict, list],
cut_off: float = -np.inf,
return_heuristic: bool = False,
hard_x_lims: tuple = (0.05, 0.95),
k_best: int = 3,
) -> rFASTCORMICSThresholdAnalysis
Finds expression and non-expression thresholds using bimodal Gaussian fitting.
Filters data below cut_off, calculates KDE, fits a bimodal Gaussian,
and returns the means of the fitted Gaussians as potential thresholds.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `data` | `Union[ndarray, Series, dict, list]` | Input expression data (typically log-transformed). | required |
| `cut_off` | `float` | Value below which data points are excluded before analysis. Default is -infinity (no cutoff). | `-inf` |
| `return_heuristic` | `bool` | If True, returns the initial heuristic guesses for the means instead of the fitted means. Default is False. | `False` |
| `hard_x_lims` | `tuple` | Tuple defining the percentile range (e.g., (0.05, 0.95)) of the data to consider for fitting, restricting the search space for means. Default is (0.05, 0.95). | `(0.05, 0.95)` |
| `k_best` | `int` | Number of best-fitting parameter sets to return (ranked by p-score). Default is 3. | `3` |
Returns:
| Type | Description |
|---|---|
| `rFASTCORMICSThresholdAnalysis` | An analysis object containing the calculated thresholds, fitted curves, and intermediate data (x, y KDE values). |
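The filter, KDE, and fit steps can be sketched with SciPy as shown below. This is an illustration under stated assumptions, not the library's implementation: it uses `scipy.stats.gaussian_kde` and `scipy.optimize.curve_fit`, takes the initial means as an argument rather than deriving them heuristically, and omits `hard_x_lims`, `k_best`, and the p-score ranking. The names `bimodal` and `fit_two_modes` are hypothetical.

```python
import numpy as np
from scipy.optimize import curve_fit
from scipy.stats import gaussian_kde


def bimodal(x, a1, mu1, wid1, a2, mu2, wid2):
    # Sum of two Gaussians (assumed form, widths as standard deviations).
    g1 = a1 * np.exp(-((x - mu1) ** 2) / (2.0 * wid1 ** 2))
    g2 = a2 * np.exp(-((x - mu2) ** 2) / (2.0 * wid2 ** 2))
    return g1 + g2


def fit_two_modes(data, cut_off=-np.inf, init_means=(0.0, 1.0), n_grid=200):
    values = np.asarray(data, dtype=float)
    # Drop non-finite values and everything below the cutoff.
    values = values[np.isfinite(values) & (values >= cut_off)]
    # Evaluate the KDE on an evenly spaced grid over the data range.
    x = np.linspace(values.min(), values.max(), n_grid)
    y = gaussian_kde(values)(x)
    # Initial guess: both amplitudes at the KDE maximum, unit widths.
    p0 = [y.max(), init_means[0], 1.0, y.max(), init_means[1], 1.0]
    params, _ = curve_fit(bimodal, x, y, p0=p0, maxfev=10000)
    low_mean, high_mean = sorted((params[1], params[4]))
    return float(low_mean), float(high_mean)


# Synthetic bimodal sample with modes at 0 and 6.
rng = np.random.default_rng(0)
sample = np.concatenate([rng.normal(0.0, 1.0, 2000), rng.normal(6.0, 1.0, 2000)])
low_mean, high_mean = fit_two_modes(sample, init_means=(0.5, 5.5))
```

Treating the lower mean as the non-expression threshold and the higher mean as the expression threshold is a natural reading of the description, but that mapping is an assumption here.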
RankBased ¶
RankBased()
Bases: ThresholdFinder
Base class for threshold finders based on data ranking (e.g., percentiles).
Initializes the RankBased finder.
find_threshold ¶
find_threshold(**kwargs)
Abstract method for rank-based threshold finding.
Raises:
| Type | Description |
|---|---|
| `NotImplementedError` | |
PercentileThreshold ¶
PercentileThreshold()
Bases: RankBased
Finds thresholds based on simple percentiles of the data.
Initializes the PercentileThreshold finder.
find_threshold ¶
find_threshold(
data: Union[ndarray, Series, dict, list],
p: Union[int, float, List[int], List[float], ndarray],
exp_p: Optional[Union[int, float]] = None,
non_exp_p: Optional[Union[int, float]] = None,
**kwargs
) -> PercentileThresholdAnalysis
Calculate thresholds based on specified percentiles of the input data.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `data` | `Union[ndarray, Series, dict, list]` | | required |
| `p` | `Union[int, float, List[int], List[float], ndarray]` | Percentile or sequence of percentiles to compute. Each value must be between 0 and 100 inclusive. | required |
| `exp_p` | `Optional[Union[int, float]]` | Percentile used to compute the expression threshold. If None, the maximum of the computed percentile values is returned as exp_th. | `None` |
| `non_exp_p` | `Optional[Union[int, float]]` | Percentile used to compute the non-expression threshold. If None, the minimum of the computed percentile values is returned as non_exp_th. | `None` |
| `**kwargs` | | Keyword arguments passed to np.percentile(). | `{}` |
Returns:
| Name | Type | Description |
|---|---|---|
| `result` | `PercentileThresholdAnalysis` | |
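The percentile logic described in the parameter table can be sketched with `np.percentile`. The function name `percentile_thresholds` and its tuple return value are illustrative assumptions; the documented method returns a `PercentileThresholdAnalysis` object and also accepts Series and dict inputs, which this sketch does not handle.

```python
import numpy as np


def percentile_thresholds(data, p, exp_p=None, non_exp_p=None, **kwargs):
    values = np.asarray(data, dtype=float)
    # Percentile values for every requested p (scalar or sequence).
    cutoffs = np.atleast_1d(np.percentile(values, p, **kwargs))
    # Fall back to the largest/smallest computed value when no explicit percentile is given.
    exp_th = cutoffs.max() if exp_p is None else np.percentile(values, exp_p, **kwargs)
    non_exp_th = cutoffs.min() if non_exp_p is None else np.percentile(values, non_exp_p, **kwargs)
    return float(exp_th), float(non_exp_th)


exp_th, non_exp_th = percentile_thresholds(np.arange(101), p=[25, 75])
exp_th_90, _ = percentile_thresholds(np.arange(101), p=[25, 75], exp_p=90)
```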
LocalThreshold ¶
LocalThreshold()
Bases: RankBased
Finds local expression thresholds based on percentiles within groups.
Calculates gene-specific thresholds for different sample groups based on
a specified percentile p. Optionally applies global 'on'/'off' thresholds
to override local thresholds for genes consistently high or low across a group.
Initializes the LocalThreshold finder.
find_threshold ¶
find_threshold(
data: Union[ndarray, DataFrame],
p: Union[int, float],
global_on_p: Optional[Union[int, float]] = 90,
global_off_p: Optional[Union[int, float]] = 10,
global_on_th: Optional[Union[int, float]] = None,
global_off_th: Optional[Union[int, float]] = None,
groups: Optional[Union[Series, dict]] = None,
genes: Optional[List[str]] = None,
samples: Optional[List[str]] = None,
**kwargs
) -> LocalThresholdAnalysis
Calculate local, gene-specific expression thresholds for sample groups.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `data` | `Union[ndarray, DataFrame]` | Input expression data (genes x samples). If ndarray, | required |
| `p` | `Union[int, float]` | The percentile (0-100) used to calculate the local threshold for each gene within each group. | required |
| `global_on_p` | `Optional[Union[int, float]]` | Percentile (0-100) to determine the global 'on' threshold. If a gene's minimum expression within a group exceeds this percentile's value across the group, the global threshold overrides the local one. Used if | `90` |
| `global_off_p` | `Optional[Union[int, float]]` | Percentile (0-100) to determine the global 'off' threshold. If a gene's maximum expression within a group is below this percentile's value across the group, the global threshold overrides the local one. Used if | `10` |
| `global_on_th` | `Optional[Union[int, float]]` | Explicit value for the global 'on' threshold. Overrides | `None` |
| `global_off_th` | `Optional[Union[int, float]]` | Explicit value for the global 'off' threshold. Overrides | `None` |
| `groups` | `Optional[Union[Series, dict]]` | Mapping samples to groups. If dict, keys are group names, values are lists of sample names. If Series, index is sample names, values are group names. If None, all samples are treated as one group named 'exp_th'. Default is None. | `None` |
| `genes` | `Optional[List[str]]` | List of gene names corresponding to rows if | `None` |
| `samples` | `Optional[List[str]]` | List of sample names corresponding to columns if | `None` |
| `**kwargs` | | Additional keyword arguments (currently unused). | `{}` |
Returns:
| Name | Type | Description |
|---|---|---|
| `local_threshold_analysis` | `LocalThresholdAnalysis` | The result object contains a local threshold DataFrame of shape N_gene x N_group (if the groups argument is specified) holding the expression thresholds. |
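The override rule described for the global 'on' and 'off' thresholds can be sketched for a single group as follows. This is a simplified illustration with a hypothetical function name: it works on a plain genes x samples array, ignores `groups`, `genes`, and `samples`, and assumes that an explicit `global_on_th` or `global_off_th` takes precedence over the corresponding percentile.

```python
import numpy as np


def local_thresholds(expr, p, global_on_p=90, global_off_p=10,
                     global_on_th=None, global_off_th=None):
    # expr: 2-D array, rows are genes and columns are the samples of one group.
    expr = np.asarray(expr, dtype=float)
    # Local threshold: the p-th percentile of each gene across the group's samples.
    local = np.percentile(expr, p, axis=1)
    # Global thresholds: explicit values win (assumption), else percentiles over all values.
    if global_on_th is None:
        global_on_th = np.percentile(expr, global_on_p)
    if global_off_th is None:
        global_off_th = np.percentile(expr, global_off_p)
    th = local.copy()
    # Genes whose minimum exceeds the global 'on' value get the global 'on' threshold.
    th[expr.min(axis=1) > global_on_th] = global_on_th
    # Genes whose maximum stays below the global 'off' value get the global 'off' threshold.
    th[expr.max(axis=1) < global_off_th] = global_off_th
    return th


expr = np.array([
    [0.0, 0.1, 0.2],    # consistently low
    [1.0, 5.0, 9.0],
    [2.0, 4.0, 6.0],
    [9.5, 9.8, 10.0],   # consistently high
])
th = local_thresholds(expr, p=50, global_on_th=9.0, global_off_th=1.0)
```

In this example the first and last genes receive the global 'off' and 'on' values respectively, while the two middle genes keep their per-gene medians.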