Using ancestry-informative markers to identify fine structure across 15 populations of European origin
Authors: L. M. Huckins1, V. Boraska1,2, C. Franklin1, J. Floyd1, Genetics Consortium of Anorexia Nervosa, Wellcome Trust Case Control Consortium, P. Sullivan3, D. Collier4, C. Bulik3, C. Tyler-Smith1, E. Zeggini1, I. Tachmazidou1;
1Wellcome Trust Sanger Institute, Hinxton, United Kingdom, 2University of Split, Split, Croatia, 3University of North Carolina, Chapel Hill, NC, United States, 4King's College, London, United Kingdom.
Abstract: The Wellcome Trust Case Control Consortium 3 anorexia nervosa genome-wide association scan includes 2,907 cases from 15 different populations of European origin (UK, Dutch, Swedish, Finnish, German, Austrian, Polish, Northern Italian, Southern Italian, Greek, USA, Canadian, Czech, French, Norwegian), genotyped on the Illumina 670K chip. This offers a unique opportunity to study genomic variation within and across these populations, and to establish genomic relationships with other publicly available populations of European ancestry. We have examined the allele frequency spectrum of common variants and compared genomic characteristics across these populations and with populations from the 1000 Genomes Project. Population structure in such studies is usually identified using only common variants with minor allele frequency (MAF) > 5%; we find that this may result in highly informative SNPs being discarded, and suggest that all SNPs with MAF > 1% should be used instead. We have established informative axes of variation via principal component analysis and highlight important features of the genetic structure of diverse European-descent populations (Novembre et al. 2008), some studied for the first time at this scale. We identified ancestry-informative markers using a method novel to the human genetics field, which may correct for sample-size bias in smaller population samples (following the bias-corrected entropy estimator proposed by Panzeri and Treves, 2007) and which allows for more efficient use of these SNPs. Finally, we investigated substructure within these 15 populations and identified SNPs that help capture hidden stratification.
This work can inform the design of trans-ethnic studies and the interpretation of their association results.
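Illustrative sketch (not the authors' implementation): the idea of scoring a SNP by an entropy-based informativeness measure with a correction for small samples can be illustrated as follows. The sketch assumes informativeness is taken as the mutual information between allele and population label, estimated from allele counts, and it swaps in a first-order Miller-Madow-style bias term in place of whatever estimator the study actually used. The function names `entropy_bits` and `informativeness` and the example counts are hypothetical.

```python
import numpy as np


def entropy_bits(counts):
    """Plug-in Shannon entropy (in bits) of a table of counts; empty bins are ignored."""
    counts = np.asarray(counts, dtype=float).ravel()
    p = counts[counts > 0] / counts.sum()
    return float(-(p * np.log2(p)).sum())


def informativeness(allele_counts, correct_bias=True):
    """Mutual information (bits) between population label and allele for one SNP.

    allele_counts: array of shape (n_populations, n_alleles) holding allele counts.
    Assumption: populations are weighted by their sample size, and the bias term
    is a first-order (Miller-Madow-style) correction, used here for illustration only.
    """
    counts = np.asarray(allele_counts, dtype=float)
    n = counts.sum()  # total number of sampled alleles

    # Plug-in estimate: I(P; A) = H(A) + H(P) - H(P, A)
    mi = (entropy_bits(counts.sum(axis=0))
          + entropy_bits(counts.sum(axis=1))
          - entropy_bits(counts))

    if correct_bias:
        # Each entropy is corrected by (occupied bins - 1) / (2 N ln 2) bits.
        m_alleles = np.count_nonzero(counts.sum(axis=0))
        m_pops = np.count_nonzero(counts.sum(axis=1))
        m_joint = np.count_nonzero(counts)
        mi += ((m_alleles - 1) + (m_pops - 1) - (m_joint - 1)) / (2 * n * np.log(2))
    return mi


# Hypothetical counts: two populations, two alleles, same proportions at two sample sizes.
small = np.array([[6, 4], [5, 5]])
large = small * 100

for name, table in (("small", small), ("large", large)):
    print(name, informativeness(table, correct_bias=False), informativeness(table))
```

With identical allele proportions, the uncorrected estimate is the same for both tables, while the correction lowers the score of the small sample more than that of the large one. This is the behaviour one would want when comparing SNPs across populations with unequal sample sizes.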