Introduction to GWASpi

Motivation: Genome-wide Association Studies (GWAS) based on Single Nucleotide Polymorphism (SNP) arrays are the most widely used approach to detect loci associated with human traits. Because the available methods and software packages are complex, and each uses its own format requiring intricate data management workflows, the analysis of GWAS usually confronts scientists with steep learning curves. Indeed, the wide variety of tools makes the parsing and manipulation of data the most time-consuming and error-prone part of a study. To help solve these issues, we present GWASpi, a user-friendly, multi-platform desktop application for the management and analysis of GWAS data, with a novel approach to database technologies that makes the most of commonly available desktop hardware. GWASpi is a start-to-finish GWAS management application, from raw data to results, containing the most common analysis tools. As a result, GWASpi is easy to use, through both its Graphical User Interface and its Command Line Interface, and reduces by up to two orders of magnitude the time needed to perform the fundamental steps of a GWAS.

As standardized arrays from several manufacturers have become widespread, Genome-wide Association Studies (GWAS) have become the favoured method to detect loci associated with human hereditary traits, especially diseases. The number of studies based on these arrays published each year has increased steadily, from 3 in 2003 to 337 in 2010. In parallel, reference databases such as HapMap Phases II and III, HGDP and the 1000 Genomes Project are being made available for a wide range of analytical methods, such as controlling for geographic stratification of samples and imputation of non-observed SNPs. Many tested and reliable statistical methods are available to analyse GWAS data, and new approaches to obtain meaningful results are constantly being placed at the disposal of scientists. All of the existing tools, however, leave it to the user to tackle the jungle of formats and the bulk of raw data generated by GWAS.

In addition, learning how to apply the methods commonly used in GWAS takes significant time, extending an already lengthy and arduous data gathering phase. Thus, the steep learning curve and the burden of manipulating the raw data still make GWAS a costly endeavour for departments without Bioinformatics personnel, and often prove to be a persistent bottleneck right before publishing deadlines.

To help solve this problem and make GWAS achievable for smaller teams, and to speed up raw data management in a consistent, self-contained way for the general research community, the GWAS Pipeline (GWASpi) has been developed and made available.
