GWAS with Generalized Linear Models (GLM)

This module provides a class for performing Genome-Wide Association Studies (GWAS) using a Generalized Linear Model (GLM) with PLINK2.

It includes methods for association analysis, obtaining top hits, and annotating SNPs with gene information.

class ideal_genom.gwas.gwas_glm.GWASfixed

Bases: object

Class for performing Genome-Wide Association Studies (GWAS) using a Generalized Linear Model (GLM) with PLINK2.

This class provides methods to perform association analysis, obtain top hits, and annotate SNPs with gene information.

input_path

Path to the input directory.

Type:

str

output_path

Path to the output directory.

Type:

str

input_name

Base name of the input PLINK files.

Type:

str

output_name

Base name for the output files.

Type:

str

recompute

Flag indicating whether to recompute the analysis.

Type:

bool

results_dir

Directory where the results will be saved.

Type:

str

Raises:
  • ValueError – If input_path, output_path, input_name, or output_name is not provided.

  • FileNotFoundError – If the specified input_path or output_path does not exist.

  • FileNotFoundError – If the required PLINK files (.bed, .bim, .fam) are not found in the input_path.

  • TypeError – If input_name or output_name is not a string, or if recompute is not a boolean.
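
The checks listed under Raises can be pictured with a small standalone sketch. The helper name check_gwas_inputs, the order of the checks, and the error messages are assumptions made for illustration; this is not the package's own code.

```python
from pathlib import Path


def check_gwas_inputs(input_path, input_name, output_path, output_name, recompute=True):
    """Hypothetical helper mirroring the documented constructor checks."""
    # ValueError: any of the four required arguments is missing or empty.
    if not all([input_path, input_name, output_path, output_name]):
        raise ValueError("input_path, input_name, output_path and output_name are required")
    # TypeError: names must be strings and recompute must be a boolean.
    if not isinstance(input_name, str) or not isinstance(output_name, str):
        raise TypeError("input_name and output_name must be strings")
    if not isinstance(recompute, bool):
        raise TypeError("recompute must be a boolean")
    # FileNotFoundError: both directories must already exist.
    for directory in (input_path, output_path):
        if not Path(directory).is_dir():
            raise FileNotFoundError(f"directory not found: {directory}")
    # FileNotFoundError: the PLINK binary fileset must be complete.
    for suffix in (".bed", ".bim", ".fam"):
        plink_file = Path(input_path) / f"{input_name}{suffix}"
        if not plink_file.is_file():
            raise FileNotFoundError(f"missing PLINK file: {plink_file}")
```

Running such a check before constructing the class surfaces a missing .bed, .bim, or .fam file early, before any PLINK2 call is attempted.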

__init__(input_path: str, input_name: str, output_path: str, output_name: str, recompute: bool = True) → None
annotate_top_hits(gtf_path: str | None = None, build: str = '38', anno_source: str = 'ensembl') → dict

Annotate top SNP hits from COJO analysis with gene information.

This method reads the COJO joint analysis results, extracts the top SNPs, and annotates them with gene information using the specified genome build and annotation source. The annotated results are saved to a TSV file.

Parameters:
  • gtf_path (Optional[str], default=None) – Path to the GTF (Gene Transfer Format) file for custom annotation. If None, the annotation will use default resources.

  • build (str, default='38') – Genome build version to use for annotation (for example, ‘38’ for GRCh38).

  • anno_source (str, default="ensembl") – Source of annotations to use (e.g., “ensembl”, “refseq”).

Returns:

A dictionary containing:
  • ‘pass’ – Boolean indicating whether the process completed successfully.

  • ‘step’ – The name of the step (‘annotate_hits’).

  • ‘output’ – Dictionary with output file paths.

Return type:

dict

Raises:

FileExistsError – If the COJO results file is not found in the results directory.

Notes

The annotated results are saved to ‘top_hits_annotated.tsv’ in the results directory.
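
A caller will usually want to inspect the returned dictionary before moving on. The sketch below assumes only the three documented keys (‘pass’, ‘step’, ‘output’); the helper report_step and the example value, including the inner key and path, are made up for illustration.

```python
def report_step(result: dict) -> str:
    """Summarize a step result dict with 'pass', 'step' and 'output' keys."""
    missing = {"pass", "step", "output"} - result.keys()
    if missing:
        raise KeyError(f"result is missing keys: {sorted(missing)}")
    status = "ok" if result["pass"] else "failed"
    return f"{result['step']}: {status}"


# Made-up example value; the inner key and the path are placeholders.
example = {
    "pass": True,
    "step": "annotate_hits",
    "output": {"annotated": "results/top_hits_annotated.tsv"},
}
print(report_step(example))
```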

fixed_model_association_analysis(maf: float = 0.01, mind: float = 0.1, hwe: float = 5e-06, ci: float = 0.95) → dict

Perform fixed model association analysis using PLINK2.

This method performs a fixed model association analysis on genomic data using PLINK2. It checks the validity of the input parameters, ensures necessary files exist, and executes the PLINK2 command to perform the analysis.

Parameters:
  • maf (float) – Minor allele frequency threshold. Must be between 0 and 0.5.

  • mind (float) – Individual missingness threshold. Must be between 0 and 1.

  • hwe (float) – P-value threshold for the Hardy-Weinberg equilibrium test. Must be between 0 and 1.

  • ci (float) – Confidence level for the reported confidence intervals (for example, 0.95 for 95% intervals). Must be between 0 and 1.

Returns:

A dictionary containing the status of the process, the step name, and the output directory.

Return type:

dict

Raises:
  • TypeError – If any of the input parameters are not of type float.

  • ValueError – If any of the input parameters are out of their respective valid ranges.

  • FileNotFoundError – If the required PCA file is not found.
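
The parameter checks and the resulting PLINK2 call can be sketched as follows. This is a minimal illustration, not the command the method actually runs: the helper build_glm_command, the inclusive bounds, the covar argument (standing in for the PCA file), and the choice of the hide-covar modifier are all assumptions. The PLINK2 flags used (--bfile, --maf, --mind, --hwe, --glm, --ci, --covar, --out) are standard.

```python
def build_glm_command(bfile, out, maf=0.01, mind=0.1, hwe=5e-6, ci=0.95, covar=None):
    """Hypothetical helper: validate thresholds, then build a PLINK2 argument list."""
    # Each entry pairs a value with its documented upper bound.
    ranges = {"maf": (maf, 0.5), "mind": (mind, 1.0), "hwe": (hwe, 1.0), "ci": (ci, 1.0)}
    for name, (value, upper) in ranges.items():
        if not isinstance(value, float):
            raise TypeError(f"{name} must be a float")
        # Assumption: bounds are treated as inclusive in this sketch.
        if not 0.0 <= value <= upper:
            raise ValueError(f"{name} must be between 0 and {upper}")
    command = [
        "plink2", "--bfile", bfile,
        "--maf", str(maf), "--mind", str(mind), "--hwe", str(hwe),
        "--glm", "hide-covar", "--ci", str(ci),
        "--out", out,
    ]
    if covar is not None:
        # Covariates such as principal components are passed through --covar.
        command += ["--covar", covar]
    return command


print(" ".join(build_glm_command("data/study", "results/study_glm")))
```

Returning an argument list rather than a shell string lets the command be handed to subprocess.run without shell quoting concerns.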

get_top_hits(maf: float = 0.01) → dict

Get the top hits from the GWAS results using GCTA conditional and joint analysis (COJO).

Parameters:

maf (float) – Minor allele frequency threshold. Must be a float between 0 and 0.5.

Returns:

A dictionary containing the process status, step name, and output directory.

Return type:

dict

Raises:
  • TypeError – If maf is not of type float.

  • ValueError – If maf is not between 0 and 0.5.

Notes

The method performs the following steps:
  1. Validates the type and range of the maf parameter.

  2. Computes the number of threads to use based on the available CPU cores.

  3. Loads the results of the association analysis and renames columns according to GCTA requirements.

  4. Prepares a .ma file with the necessary columns.

  5. If recompute is True, constructs and executes a GCTA command to perform conditional and joint analysis.

  6. Returns a dictionary with the process status, step name, and output directory.
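
Steps 2 to 5 can be sketched in plain Python. Everything here is illustrative: the input column names are assumed to follow PLINK2 --glm output with an allele frequency column included, the thread policy and the gcta64 binary name are assumptions, and the command is only built, never executed. The .ma header (SNP A1 A2 freq b se p N) and the GCTA flags are the standard COJO ones.

```python
import os

MA_HEADER = ["SNP", "A1", "A2", "freq", "b", "se", "p", "N"]


def glm_row_to_ma(row: dict) -> list:
    """Map one association row (assumed PLINK2-style column names) to .ma order."""
    # For a biallelic variant, the other allele is whichever of REF/ALT is not A1.
    other = row["ALT"] if row["A1"] == row["REF"] else row["REF"]
    return [row["ID"], row["A1"], other, row["A1_FREQ"],
            row["BETA"], row["SE"], row["P"], row["OBS_CT"]]


def write_ma(rows, path):
    """Write a whitespace-delimited .ma file with the COJO header."""
    with open(path, "w") as handle:
        handle.write(" ".join(MA_HEADER) + "\n")
        for row in rows:
            handle.write(" ".join(str(v) for v in glm_row_to_ma(row)) + "\n")


def build_cojo_command(bfile, ma_path, out, maf=0.01, threads=None):
    """Build (but do not run) a GCTA conditional and joint analysis command."""
    if threads is None:
        # Leave a couple of cores free; this policy is an assumption.
        threads = max(1, (os.cpu_count() or 1) - 2)
    return ["gcta64", "--bfile", bfile, "--maf", str(maf),
            "--cojo-file", ma_path, "--cojo-slct",
            "--thread-num", str(threads), "--out", out]
```

Skipping the command when recompute is False would then amount to reusing the existing COJO output in the results directory instead of calling GCTA again.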