Best Usage

The following are recommendations on how best to use GWASpi. More sections will be added as we discuss case studies with our users and build up a body of knowledge on GWAS in general and on managing them with GWASpi in particular.

You can find further details on how to perform basic statistical analysis in a population-based case-control genetic association study in this Nature Protocols article.

Study Design

GWASpi is not intended to fix the study design ‘a posteriori’. A study must be well designed from the start if the ensuing statistical analysis is to yield dependable results.

Similarly, each genotyping platform and technology has its own caveats, and these should be respected. The platforms provide suitable quality control methods for achieving a consistent dataset. These methods often center on the genotyping process and the correct functioning of particular techniques (such as contrast checks, signal normalization, hybridization tests and so on). GWASpi is designed to take over from that point and perform second-stage quality controls, oriented towards ascertaining whether a dataset is fit for GWAS analysis.

Note that raw data genotyped on different platforms should not be joined until the complete suite of platform-level quality controls has been performed and the genotypes can be considered called with a sufficient degree of certainty.

Also, different platforms deliver genotype calls in different encodings. Translating from one encoding to another is sometimes simple (translating AB, 12 and 1234 to ACGT) and sometimes requires painstaking manipulations and checks (different builds and mappings, non-matching strand orientation of alleles…). Merging such genotypes before the necessary manipulations have been made will produce erroneous datasets. We are currently working on utilities, to be integrated in GWASpi, that would ease this problem, but until then care should be taken.
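To make the simple end of encoding translation concrete, here is a minimal Python sketch. It is not part of GWASpi, and all function names are hypothetical. It assumes the common numeric convention 1=A, 2=C, 3=G, 4=T with "0" for a missing call, and it assumes that the A and B alleles of each marker are taken from the platform's annotation, since they cannot be deduced from the AB genotype alone. Missing AB calls are left out of the sketch.

```python
# Assumed numeric convention: 1=A, 2=C, 3=G, 4=T; "0" marks a missing call.
NUMERIC_TO_NUCLEOTIDE = {"1": "A", "2": "C", "3": "G", "4": "T", "0": "0"}

# Complementary bases, used when two datasets report opposite strands.
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A", "0": "0"}


def translate_1234(genotype):
    """Translate a 1234-encoded genotype such as '13' into 'AG'."""
    try:
        return "".join(NUMERIC_TO_NUCLEOTIDE[allele] for allele in genotype)
    except KeyError as err:
        raise ValueError(f"unexpected allele code in {genotype!r}") from err


def translate_ab(genotype, allele_a, allele_b):
    """Translate an AB-encoded genotype using this marker's own alleles.

    allele_a and allele_b must come from the platform annotation for the
    specific marker (hypothetical inputs in this sketch).
    """
    mapping = {"A": allele_a, "B": allele_b}
    try:
        return "".join(mapping[allele] for allele in genotype)
    except KeyError as err:
        raise ValueError(f"unexpected allele code in {genotype!r}") from err


def flip_strand(genotype):
    """Return the genotype as read on the opposite strand."""
    return "".join(COMPLEMENT[allele] for allele in genotype)


print(translate_1234("13"))           # numeric codes to nucleotides
print(translate_ab("AB", "C", "T"))   # marker annotated with A=C, B=T
print(flip_strand("AG"))              # same genotype on the other strand
```

Note that a strand flip cannot be detected from the genotypes alone for A/T and C/G markers, because the complement of each allele is the other allele. This is one reason why the harder cases need the careful checks described above.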

Population effects

A common complication of GWAS is the occurrence of population structure in the chosen sample sets. The most common bias occurs when, in a case-control study, the case and control groups have different population structure, resulting in different allele frequencies in the two groups. This circumstance is well known to produce an excess of significance and spurious associations. As a consequence, population stratification is usually checked by means of Principal Component Analysis before the final association tests are performed.
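The effect can be illustrated with a small arithmetic sketch in Python. Every number below is made up for illustration. Two hypothetical populations carry an allele at different frequencies, and within each population cases and controls carry it at the same frequency, so there is no true association. Because the cases are drawn mostly from one population and the controls mostly from the other, the pooled groups still differ.

```python
# Hypothetical data, invented for this example only.
# name: (frequency of allele A, number of cases, number of controls)
POPULATIONS = {
    "pop1": (0.8, 800, 200),
    "pop2": (0.2, 200, 800),
}


def pooled_frequency(populations, group_index):
    """Allele A frequency in the pooled cases (index 0) or controls (index 1).

    Within a population, cases and controls share the same allele frequency,
    so any pooled difference comes from the sampling imbalance alone.
    """
    allele_a_copies = 0.0
    total_alleles = 0
    for frequency, *group_sizes in populations.values():
        n_individuals = group_sizes[group_index]
        allele_a_copies += frequency * 2 * n_individuals  # diploid samples
        total_alleles += 2 * n_individuals
    return allele_a_copies / total_alleles


case_freq = pooled_frequency(POPULATIONS, 0)
control_freq = pooled_frequency(POPULATIONS, 1)
print(f"cases:    {case_freq:.2f}")
print(f"controls: {control_freq:.2f}")
```

With these numbers the pooled allele frequency is 0.68 in cases and 0.32 in controls, a difference that an association test would report as a strong signal although it reflects only the sample composition.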

GWASpi does not yet implement a statistical analysis for detecting such an effect, so we recommend using freely available and well-documented packages such as Eigensoft’s smartPCA. Exporting data from GWASpi to the Eigensoft format is straightforward.