Association Studies via Penalized Regression for Common and Rare Variants

Kristin Ayers


Abstract:

We have previously investigated a variety of penalized regression methods for genetic association analysis, as they offer an attractive alternative to single-marker testing with common variants. Analyzing markers together in a regression model lets the effect of each marker be assessed while accounting for the other markers. Penalized regression methods perform model selection by shrinking toward zero the regression coefficients of markers that have little apparent effect on the trait of interest, leaving a parsimonious subset of what we hope are the truly pertinent predictors. We compared several penalty functions, including the elastic net, ridge, lasso, MCP, and the NEG shrinkage prior, with standard single-locus analysis and simple forward stepwise regression in terms of both detection and localization. Results show that penalized methods outperform single-marker analysis when several causal variants are present.
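
As a rough illustration of how a penalty shrinks coefficients to exactly zero, the sketch below fits a lasso by cyclic coordinate descent on simulated genotypes. It is not the software used in the work described here: the simulated data, the causal marker positions (3 and 11), the effect sizes, and the function names are all assumptions chosen for the example.

```python
import numpy as np

def soft_threshold(z, t):
    """Shrink z toward zero by t; any |z| <= t becomes exactly zero."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def lasso_coordinate_descent(X, y, lam, n_iter=200):
    """Minimize (1/(2n)) * ||y - X b||^2 + lam * ||b||_1 (illustrative only)."""
    n, p = X.shape
    beta = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n      # per-marker scale x_j'x_j / n
    resid = y - X @ beta
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:           # skip monomorphic markers
                continue
            # covariance of marker j with the residual that excludes marker j
            rho = X[:, j] @ resid / n + col_sq[j] * beta[j]
            new_bj = soft_threshold(rho, lam) / col_sq[j]
            resid += X[:, j] * (beta[j] - new_bj)
            beta[j] = new_bj
    return beta

# Hypothetical data: 300 individuals, 20 markers coded 0/1/2, two causal markers.
rng = np.random.default_rng(0)
n, p = 300, 20
X = rng.binomial(2, 0.3, size=(n, p)).astype(float)
X -= X.mean(axis=0)                        # center genotypes
y = 1.0 * X[:, 3] - 0.8 * X[:, 11] + rng.normal(size=n)
y -= y.mean()                              # center trait

beta = lasso_coordinate_descent(X, y, lam=0.15)

# Above this threshold every coefficient is shrunk to zero.
lam_max = np.max(np.abs(X.T @ y)) / n
beta_null = lasso_coordinate_descent(X, y, lam=1.01 * lam_max)
```

Larger values of `lam` remove more markers from the model; in practice the value would be chosen by cross-validation or a similar criterion rather than fixed by hand as it is here.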

However, association studies are currently underpowered for very rare variants, and methods have been proposed for grouping or aggregating rare variants for testing. In this way, we can look for the effect of a combination of markers. Additionally, testing for association with multiple markers jointly, such as a haplotype, can capture untyped causal variants in weak linkage disequilibrium with nearby typed markers. We propose a sliding window approach that uses multi-marker genotypes as variables in penalized regression. We investigate a penalty with three separate components: (1) a group lasso that encourages all multi-marker genotypes in a gene to be included in or excluded from the model together, (2) an allele sharing penalty that encourages multi-marker genotypes with similar alleles to have similar coefficients, and (3) a penalty that shrinks the size of coefficients while performing model selection. We perform association analysis on the GAW 17 sequence data with a simulated quantitative trait, and compare our method to single-marker analysis and a gene-based group lasso. We hope that increased sample sizes and better sequencing technologies will make this and similar methods valuable options in future analyses.
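
To make the three-part penalty concrete, the sketch below evaluates one possible form of it for a given coefficient vector. The exact functional forms are not stated above, so everything here is an assumption: the group term is a standard group lasso weighted by the square root of group size, the allele sharing term is a weighted squared difference between coefficients within a gene (with the weight taken as the fraction of markers at which two multi-marker genotypes match, a crude proxy for allele sharing), and the third term is taken to be an L1 penalty. All names and the toy data are hypothetical.

```python
import numpy as np

def sharing_weights(genotypes, groups):
    """W[j, k] = fraction of markers at which genotypes j and k match (same gene only)."""
    G = np.asarray(genotypes)
    m = G.shape[0]
    W = np.zeros((m, m))
    for idx in groups.values():
        for a, j in enumerate(idx):
            for k in idx[a + 1:]:
                W[j, k] = W[k, j] = np.mean(G[j] == G[k])
    return W

def composite_penalty(beta, groups, W, lam_group, lam_share, lam_l1):
    """Sum of assumed group lasso, allele sharing, and L1 terms."""
    beta = np.asarray(beta, dtype=float)
    # (1) group lasso: Euclidean norm of each gene's coefficients
    group_term = sum(np.sqrt(len(idx)) * np.linalg.norm(beta[idx])
                     for idx in groups.values())
    # (2) allele sharing: similar genotypes are pushed toward similar coefficients;
    #     the factor 0.5 counts each unordered pair once
    diff = beta[:, None] - beta[None, :]
    share_term = 0.5 * np.sum(W * diff ** 2)
    # (3) sparsity: assumed here to be an L1 penalty
    l1_term = np.abs(beta).sum()
    return lam_group * group_term + lam_share * share_term + lam_l1 * l1_term

# Toy example: two-marker windows, four genotypes in geneA and two in geneB.
genotypes = [(0, 0), (0, 1), (1, 1), (2, 0), (0, 0), (1, 0)]
groups = {"geneA": [0, 1, 2, 3], "geneB": [4, 5]}
beta = np.array([0.5, 0.4, 0.0, 0.0, 0.0, 0.0])

W = sharing_weights(genotypes, groups)
total = composite_penalty(beta, groups, W, lam_group=1.0, lam_share=1.0, lam_l1=1.0)
```

A fitting procedure would add this penalty to the regression loss and minimize the sum; the three tuning parameters control how strongly each component acts and would need to be chosen jointly.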