Statistical Power Computation

ideal_genom.power_comp.calculate_power_binary(beta: ndarray, daf: ndarray, prevalence: float, ncase: int, ncontrol: int, sig_level: float = 5e-08, or_to_rr: bool = False) → ndarray

Calculate the statistical power of genetic association tests for a binary trait in a case-control study. Power is computed under an additive genetic model from the allele frequencies, effect sizes, disease prevalence, and sample sizes, and is the probability of detecting a true association at the specified significance level.

Parameters:

beta (np.ndarray) – Log odds ratios representing effect sizes of variants.
daf (np.ndarray) – Disease allele frequencies.
prevalence (float) – Disease prevalence in the population.
ncase (int) – Number of cases in the study.
ncontrol (int) – Number of controls in the study.
sig_level (float, optional) – Significance level threshold for declaring associations (default: 5e-8, standard for GWAS).
or_to_rr (bool, optional) – If True, converts odds ratios to relative risks using the disease prevalence (default: False).

Returns:

Array of statistical power values for each variant, representing the probability of detecting a true association at the specified significance level.

Return type:

np.ndarray

Notes

The function implements the power calculation for an additive genetic model using a normal approximation. When or_to_rr is True, odds ratios are converted to relative risks using the formula of Zhang and Yu (JAMA, 1998).
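For reference, the Zhang and Yu correction is RR = OR / ((1 - P0) + P0 * OR), where P0 is the incidence of the outcome in the unexposed group. The sketch below applies it element-wise with NumPy, assuming (as the or_to_rr description suggests) that the disease prevalence is plugged in for P0. `odds_ratio_to_relative_risk` is a hypothetical helper, not part of ideal_genom.

```python
import numpy as np


def odds_ratio_to_relative_risk(odds_ratio, p0):
    """Zhang and Yu (1998): RR = OR / ((1 - P0) + P0 * OR).

    p0 is the incidence in the unexposed group; here it is assumed to be
    the disease prevalence, as the or_to_rr parameter description suggests.
    """
    odds_ratio = np.asarray(odds_ratio, dtype=float)
    return odds_ratio / ((1 - p0) + p0 * odds_ratio)


# The same OR of 2 at a rare and at a common disease prevalence
print(odds_ratio_to_relative_risk(2.0, 0.01), odds_ratio_to_relative_risk(2.0, 0.3))
```

With a prevalence of 0.01, an OR of 2 corresponds to an RR of about 1.98, whereas at a prevalence of 0.3 the same OR corresponds to about 1.54. This is why the OR is a reasonable stand-in for the relative risk only when the disease is rare.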

References

Zhang J, Yu KF. What’s the Relative Risk? JAMA. 1998;280(19):1690-1691. doi:10.1001/jama.280.19.1690
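The Notes describe a normal-approximation power calculation under an additive model. The following is a minimal sketch of one common formulation, not the ideal_genom source. It assumes genotype relative risks of 1, GRR and 2*GRR - 1 for 0, 1 and 2 risk alleles, a reference-genotype penetrance scaled so that the population prevalence equals `prevalence`, and a two-sided allelic test that compares risk-allele frequencies in cases and controls. `power_binary_sketch` is a hypothetical name.

```python
import numpy as np
from scipy.stats import norm


def power_binary_sketch(beta, daf, prevalence, ncase, ncontrol,
                        sig_level=5e-8, or_to_rr=False):
    """Power of a two-sided allelic test under an assumed additive risk model."""
    beta = np.asarray(beta, dtype=float)
    daf = np.asarray(daf, dtype=float)
    odds_ratio = np.exp(beta)
    if or_to_rr:
        # Zhang and Yu (1998) correction, with prevalence standing in for P0
        grr = odds_ratio / ((1 - prevalence) + prevalence * odds_ratio)
    else:
        grr = odds_ratio  # OR used directly as the genotype relative risk

    # Relative risks for 2 and 1 risk alleles (reference genotype has risk 1)
    r_hom, r_het = 2 * grr - 1, grr
    f_hom, f_het, f_ref = daf**2, 2 * daf * (1 - daf), (1 - daf) ** 2

    # Reference penetrance scaled so the overall risk equals the prevalence
    mean_risk = r_hom * f_hom + r_het * f_het + f_ref
    base = prevalence / mean_risk  # assumes r_hom * base <= 1

    # Expected risk-allele frequency among cases and among controls
    p_case = (r_hom * f_hom + 0.5 * r_het * f_het) / mean_risk
    p_ctrl = ((1 - r_hom * base) * f_hom
              + 0.5 * (1 - r_het * base) * f_het) / (1 - prevalence)

    # Normal approximation; each group contributes 2N alleles
    se = np.sqrt(p_case * (1 - p_case) / (2 * ncase)
                 + p_ctrl * (1 - p_ctrl) / (2 * ncontrol))
    z = (p_case - p_ctrl) / se
    c = norm.isf(sig_level / 2)  # two-sided critical value
    return norm.sf(c - z) + norm.cdf(-c - z)


# Example: three odds ratios at a risk-allele frequency of 0.3
betas = np.log([1.1, 1.2, 1.3])
print(power_binary_sketch(betas, np.full(3, 0.3), prevalence=0.1,
                          ncase=5000, ncontrol=5000))
```

Setting beta to 0 makes the case and control allele frequencies equal, so the sketch returns exactly sig_level, a quick check that the test is calibrated under the null.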

ideal_genom.power_comp.calculate_power_quantitative(beta: ndarray, eaf: ndarray, sample_size: int, sig_level: float = 5e-08, variance: float = 1) → ndarray

Calculate the statistical power to detect genetic associations with a quantitative trait. Power is computed from the effect sizes (beta), effect allele frequencies (EAF), sample size, significance level, and trait variance.

Parameters:

beta (numpy.ndarray) – Array of effect sizes (regression coefficients).
eaf (numpy.ndarray) – Array of effect allele frequencies (between 0 and 1).
sample_size (int) – Number of individuals in the study.
sig_level (float, optional) – Significance threshold (default: 5e-8, typical for GWAS).
variance (float, optional) – Phenotypic variance of the trait (default: 1).

Returns:

Array of statistical power values, one per variant.

Return type:

numpy.ndarray

Notes

The calculation uses the non-central chi-square distribution to determine the probability of detecting a true association at the specified significance level. Power is calculated as 1 minus the cumulative distribution function of the non-central chi-square distribution with 1 degree of freedom, evaluated at the critical value of the central chi-square distribution (1 degree of freedom) that corresponds to sig_level.
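A minimal sketch of that calculation, assuming the commonly used non-centrality parameter NCP = N * 2p(1 - p) * beta^2 / variance for an additive model (the exact NCP used by ideal_genom is not stated here). `power_quantitative_sketch` is a hypothetical name.

```python
import numpy as np
from scipy.stats import chi2, ncx2


def power_quantitative_sketch(beta, eaf, sample_size, sig_level=5e-8, variance=1.0):
    """Power of a 1-df association test under an assumed NCP formula."""
    beta = np.asarray(beta, dtype=float)
    eaf = np.asarray(eaf, dtype=float)
    # Assumed non-centrality: N * 2p(1-p) * beta^2 / trait variance
    ncp = sample_size * 2 * eaf * (1 - eaf) * beta**2 / variance
    # Critical value of the central chi-square (1 df) at sig_level
    threshold = chi2.isf(sig_level, df=1)
    # Power = 1 - CDF of the non-central chi-square at that critical value
    return ncx2.sf(threshold, df=1, nc=ncp)


print(power_quantitative_sketch(np.array([0.02, 0.03, 0.04]), np.full(3, 0.3), 100_000))
```

Because a 1-df non-central chi-square variable is the square of a normal variable with mean sqrt(NCP) and unit variance, the same power can be written as norm.sf(z_c - sqrt(NCP)) + norm.cdf(-z_c - sqrt(NCP)), where z_c is the two-sided normal critical value for sig_level.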

ideal_genom.power_comp.get_beta_binary(prevalence: float, ncase: int, ncontrol: int, eaf_range: tuple = (0.0001, 0.5), beta_range: tuple = (0.0001, 5), t: float = 0, sig_level: float = 5e-08, n_matrix: int = 500, or_to_rr: bool = False)

Find combinations of effect allele frequencies (EAF) and effect sizes (beta) that achieve a specified statistical power for binary traits in genetic association studies.

Parameters:

prevalence (float) – Disease prevalence in the population.
ncase (int) – Number of cases in the study.
ncontrol (int) – Number of controls in the study.
eaf_range (tuple, optional) – Range of effect allele frequencies to consider (min, max). Default is (0.0001, 0.5).
beta_range (tuple, optional) – Range of effect sizes to consider (min, max). Default is (0.0001, 5).
t (float, optional) – Target power threshold. Combinations with power >= t will be returned. Default is 0.
sig_level (float, optional) – Significance level (alpha) for power calculation. Default is 5e-8 (genome-wide significance).
n_matrix (int, optional) – Resolution of the EAF-beta matrix. Default is 500.
or_to_rr (bool, optional) – If True, converts odds ratios to relative risks using the provided prevalence. If False, approximates genetic relative risk (GRR) using odds ratios. Default is False.

Returns:

DataFrame containing EAF and beta combinations that achieve the target power. The DataFrame has columns ‘eaf’ and ‘beta’ and is sorted by ‘eaf’ in descending order.

Return type:

pandas.DataFrame

Notes

If or_to_rr is False, the odds ratio is used as an approximation of the GRR; this approximation is close when the prevalence is below 10%. The function uses a greedy algorithm to find EAF-beta combinations with power >= t.

ideal_genom.power_comp.get_beta_quantitative(eaf_range: Tuple[float, float] = (1e-05, 0.5), beta_range: Tuple[float, float] = (1e-05, 5), t: float = 0, sample_size: int | None = None, sig_level: float = 5e-08, variance: float = 1, n_matrix: int = 500) → DataFrame

Find combinations of effect allele frequencies (EAF) and effect sizes (beta) that achieve a specified statistical power for quantitative traits.

Parameters:

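eaf_range (Tuple[float, float], optional) – Range of effect allele frequencies to consider (min, max). Default is (1e-05, 0.5).
beta_range (Tuple[float, float], optional) – Range of effect sizes to consider (min, max). Default is (1e-05, 5).
t (float, optional) – Power threshold (0 to 1) that returned combinations must meet. Default is 0.
sample_size (int or None, optional) – Number of individuals in the study. Default is None.
sig_level (float, optional) – Significance level (alpha) for power calculation. Default is 5e-8.
variance (float, optional) – Phenotypic variance of the trait. Default is 1.
n_matrix (int, optional) – Resolution of the EAF-beta grid used for the calculations. Default is 500.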

Returns:

DataFrame containing the EAF and beta combinations that meet the power threshold.

Return type:

pandas.DataFrame
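Because quantitative-trait power depends on beta and EAF only through the non-centrality parameter, the boundary of the returned region can also be found directly. The sketch below assumes NCP = N * 2p(1 - p) * beta^2 / variance and a linearly spaced grid. It solves once for the NCP that yields power t, converts that into the smallest qualifying beta at each EAF, and keeps the grid cells on or above that curve. This is an alternative to scanning every cell, not necessarily how ideal_genom computes the result; the function names are hypothetical, and the sketch requires an explicit sample_size and sig_level < t < 1.

```python
import numpy as np
import pandas as pd
from scipy.optimize import brentq
from scipy.stats import chi2, ncx2


def min_beta_quantitative(eaf, sample_size, t, sig_level=5e-8, variance=1.0):
    """Smallest beta reaching power t at each EAF (assumes sig_level < t < 1)."""
    threshold = chi2.isf(sig_level, df=1)
    # Power rises monotonically with the NCP, so a single root solve suffices
    ncp_needed = brentq(lambda nc: ncx2.sf(threshold, 1, nc) - t, 1e-9, 1e4)
    eaf = np.asarray(eaf, dtype=float)
    # Invert the assumed NCP = N * 2p(1-p) * beta^2 / variance for beta
    return np.sqrt(ncp_needed * variance / (sample_size * 2 * eaf * (1 - eaf)))


def get_beta_quantitative_sketch(sample_size, t, eaf_range=(1e-5, 0.5),
                                 beta_range=(1e-5, 5), sig_level=5e-8,
                                 variance=1.0, n_matrix=500):
    """Grid cells (eaf, beta) whose assumed power is at least t."""
    eafs = np.linspace(eaf_range[0], eaf_range[1], n_matrix)
    betas = np.linspace(beta_range[0], beta_range[1], n_matrix)
    beta_min = min_beta_quantitative(eafs, sample_size, t, sig_level, variance)
    eaf_grid, beta_grid = np.meshgrid(eafs, betas, indexing="ij")
    keep = beta_grid >= beta_min[:, None]  # cells on or above the power-t curve
    return pd.DataFrame({"eaf": eaf_grid[keep], "beta": beta_grid[keep]})


print(get_beta_quantitative_sketch(sample_size=50_000, t=0.8).head())
```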