
Accuracy and Precision

The question to answer when developing an admixture test is how many ancestry informative markers (AIMs), and of what quality, are required to minimize the incidence of individuals whose maximum likelihood estimates (MLEs) show artificial or erroneous admixture. How frequently such individuals are encountered in each group, given a specific battery of AIMs, is best determined through simulations.

If we assume that the loci are unlinked (which they have been determined to be) and that mating between individuals of a given population is random, the multi-locus genotype frequencies in each population are determined from the allele frequencies and an equation known as the product rule. Using these allele frequencies, we simulate a population of 100,000 European, East Asian, African and Native American individuals, draw 10,000 from each population, and use the 71- and 175-marker AncestryByDNA tests (2.0 and 2.5, respectively) to calculate the BGA proportions for each simulated sample selected.
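As a rough illustration of the product rule and the sampling step, the following minimal sketch multiplies single-locus genotype frequencies across unlinked loci and draws simulated individuals under random mating. The three allele frequencies, the function names, and the reading that each group has its own population of 100,000 are assumptions made for illustration; they are not the AncestryByDNA marker panel, and the MLE calculation of BGA proportions is not shown.

```python
import random
from math import prod

# ASSUMPTION: hypothetical frequencies of one allele at three unlinked loci
# in a single population; illustrative values only.
ALLELE_FREQS = [0.9, 0.2, 0.6]

def genotype_frequency(p, copies):
    """Single-locus frequency of carrying 0, 1 or 2 copies of an allele
    with frequency p, assuming random mating."""
    q = 1.0 - p
    return {0: q * q, 1: 2.0 * p * q, 2: p * p}[copies]

def multilocus_frequency(freqs, genotype):
    """Product rule: multiply single-locus genotype frequencies across loci."""
    return prod(genotype_frequency(p, g) for p, g in zip(freqs, genotype))

def simulate_individual(freqs, rng):
    """Draw two alleles per locus independently and count copies at each locus."""
    return tuple((rng.random() < p) + (rng.random() < p) for p in freqs)

rng = random.Random(0)
population = [simulate_individual(ALLELE_FREQS, rng) for _ in range(100_000)]
sample = rng.sample(population, 10_000)

# Compare the product-rule expectation with the frequency seen in the sample.
target = (2, 0, 1)
expected = multilocus_frequency(ALLELE_FREQS, target)
observed = sum(ind == target for ind in sample) / len(sample)
print(f"expected {expected:.4f}, observed {observed:.4f}")
```

With a full marker panel the same idea applies, only with one allele frequency per AIM per population.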

With a perfect test, using discretely distributed alleles, each simulated individual would type as 100% affiliated with their own group. In the real world, using markers that are continuously distributed, there is a level of statistical noise. The purpose of the simulations is to determine what level of noise should be expected.

Below we show the average percentage affiliations for each type of simulated sample. The total admixture value is the sum of the average out-of-group admixture percentages (columns) for samples of a particular simulated population (rows). For example, for the 71-marker test, we calculate the average level of European, East Asian and Native American admixture in simulated Africans to be 0.96%, 0.1% and 0.87%, respectively, for a total of 1.93%.

AncestryByDNA 2.0 (71 AIMs)

Simulated population   AFR     EUR     EAS     NAM     Total   Total admixture
Africans               98.07   0.96    0.1     0.87    100     1.93
Europeans              0.08    95.63   2.25    2.04    100     4.37
East Asians            0.03    2.45    92.98   4.54    100     7.02
Native Americans       0.01    1.83    3.63    94.53   100     5.47
Avg.                                                           4.70

AncestryByDNA 2.5 (175 AIMs)

Simulated population   AFR     EUR     EAS     NAM     Total   Total admixture
Africans               98.21   0.93    0.71    0.15    100     1.79
Europeans              0.4     96.36   1.5     1.74    100     3.64
East Asians            0.08    1.43    95.48   3.01    100     4.52
Native Americans       0       1.16    2.08    96.76   100     3.24
Avg.                                                           3.30

From these tables one can see that the average level of artificial admixture using the 71-marker test is about 5%, and the level using the 175-marker test is lower, at about 3%. Looking at the European row in the 175-marker table, we can see that the average simulated European sample exhibits 1.5% East Asian ancestry and 1.74% Native American ancestry. This suggests that the average (but, of course, not every) real person who is 100% European will show 1.5% East Asian ancestry as statistical noise, and 3.64% non-European admixture in total. Some will show higher levels, and some will show 0%. These values can serve as a guide for interpreting a result. For example, if a European suspects a small amount of NAM admixture and obtains a reading of 1% using the 175-marker test, they can see that the average simulated European shows a level at least this high (1.74%), and so the data neither support nor refute the NAM admixture.
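The "total admixture" figures and their averages can be checked directly from the reported row values. The sketch below copies the numbers from the two simulation tables; the variable and function names are illustrative only.

```python
# Average percentage affiliations copied from the two simulation tables.
# Columns are ordered AFR, EUR, EAS, NAM, and rows follow the same order,
# so the diagonal entry of each row is the own-group affiliation.
TABLE_71 = [
    [98.07, 0.96, 0.10, 0.87],   # Africans
    [0.08, 95.63, 2.25, 2.04],   # Europeans
    [0.03, 2.45, 92.98, 4.54],   # East Asians
    [0.01, 1.83, 3.63, 94.53],   # Native Americans
]
TABLE_175 = [
    [98.21, 0.93, 0.71, 0.15],
    [0.40, 96.36, 1.50, 1.74],
    [0.08, 1.43, 95.48, 3.01],
    [0.00, 1.16, 2.08, 96.76],
]

def total_admixture(table):
    """Sum the out-of-group (off-diagonal) percentages of each row."""
    return [sum(v for j, v in enumerate(row) if j != i)
            for i, row in enumerate(table)]

def average_admixture(table):
    """Mean of the per-population total admixture values."""
    totals = total_admixture(table)
    return sum(totals) / len(totals)

for name, table in (("71-marker", TABLE_71), ("175-marker", TABLE_175)):
    totals = [round(t, 2) for t in total_admixture(table)]
    print(name, totals, round(average_admixture(table), 2))
```

The averages come out at roughly 4.70 and 3.30, matching the "about 5%" and "about 3%" figures quoted in the text.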

We can look at the simulation data in another way: by looking at the variation of results in simulated samples. How common is it for a simulated European (or other) individual to exhibit 15% or greater African (or other) ancestry? The results are shown below as fractions of the simulated samples.

71-marker test

>15%                   AFR       EUR       EAS       NAM       Avg. outside group
Africans               1         0.0012    0         0.0008    0.0007
Europeans              0.0006    1         0.0342    0.0306    0.0218
East Asians            0         0.029     1         0.1225    0.0505
Native Americans       0         0.0177    0.0873    1         0.035
Avg.                                                           0.027

175-marker test

>15%                   AFR       EUR       EAS        NAM      Avg. outside group
Africans               1         0.00048   0          0        0.00016
Europeans              0.00096   1         0.005268   0.0158   0.00734
East Asians            0         0.00165   1          0.050    0.01722
Native Americans       0         0.001     0.021      1        0.00733
Avg.                                                           0.0081

We see that about 2.7% of simulated samples show 15% or greater artificial admixture using the 71-marker test, and that less than 1% of simulated samples show 15% or greater artificial admixture using the 175-marker test. Reading across the European row for the 175-marker test, we see that 0.5% of simulated European individuals registered 15% or greater East Asian admixture. These values can also serve as a guide for interpreting a result. For example, if a European suspects NAM admixture and obtains a reading of 17% using the 175-marker test, this table shows that only about 1.6% of simulated Europeans exhibited NAM affiliation of 15% or more, and so there is about a 98.4% chance that this result indicates real NAM admixture. Of course, this is based on the average MLE, and a customer should also refer to the individualized confidence contours on their triangle plot.
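The "Avg. outside group" column and the complementary share of simulated samples that stayed below the 15% cutoff can be recomputed from the 175-marker fractions. This is a small sketch using the table values as given; the dictionary layout and function names are assumptions made for illustration.

```python
# Fractions of simulated samples showing more than 15% affiliation with each
# group, copied from the 175-marker table (rows: simulated population).
EXCEED_175 = {
    "Africans":         {"AFR": 1, "EUR": 0.00048, "EAS": 0, "NAM": 0},
    "Europeans":        {"AFR": 0.00096, "EUR": 1, "EAS": 0.005268, "NAM": 0.0158},
    "East Asians":      {"AFR": 0, "EUR": 0.00165, "EAS": 1, "NAM": 0.050},
    "Native Americans": {"AFR": 0, "EUR": 0.001, "EAS": 0.021, "NAM": 1},
}
OWN_GROUP = {
    "Africans": "AFR",
    "Europeans": "EUR",
    "East Asians": "EAS",
    "Native Americans": "NAM",
}

def outside_group_average(table, population):
    """Mean exceedance fraction over the three groups other than the own group."""
    own = OWN_GROUP[population]
    rates = [rate for group, rate in table[population].items() if group != own]
    return sum(rates) / len(rates)

def share_below_cutoff(table, population, group):
    """Share of simulated samples that did not exceed the 15% cutoff."""
    return 1 - table[population][group]

print(round(outside_group_average(EXCEED_175, "Europeans"), 5))
print(round(share_below_cutoff(EXCEED_175, "Europeans", "NAM"), 4))
```

The second figure is the share of simulated, unadmixed Europeans whose NAM reading stayed at or below 15%, which is the quantity the worked example in the text relies on.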

We can make tables like this for all of the possible values X>5%, X>10%, X>15% . . . all the way to X>50%. Every one of the 40,000 samples was affiliated with its own group at a level of 50% or greater. At the other extreme, X=5%, it is not uncommon for simulated samples to show affiliation with another group. From all of these tables, we can compute the rough percentage value necessary to conclude with 95% certainty that partial group affiliation means real affiliation and not statistical noise.
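One way to picture this step is to scan increasing cutoffs X until no more than 5% of the simulated, unadmixed samples reach X. The sketch below does this on synthetic noise; the exponential noise model with a 2% mean, the 0.5-point grid, and the function names are all assumptions for illustration and do not come from the AncestryByDNA simulations.

```python
import random

def exceedance_fraction(estimates, x):
    """Fraction of noise estimates (in percent) at or above the cutoff x."""
    return sum(e >= x for e in estimates) / len(estimates)

def confidence_threshold(estimates, confidence=0.95, step=0.5, max_x=50.0):
    """Smallest cutoff on the grid that at most (1 - confidence) of the
    noise estimates reach; None if no cutoff up to max_x qualifies."""
    x = 0.0
    while x <= max_x:
        if exceedance_fraction(estimates, x) <= 1.0 - confidence:
            return x
        x += step
    return None

rng = random.Random(42)
# ASSUMPTION: out-of-group admixture readings (in percent) for 10,000
# simulated homogeneous individuals, drawn here from an exponential
# distribution with mean 2% purely as a stand-in for real simulation output.
noise = [min(rng.expovariate(1 / 2.0), 100.0) for _ in range(10_000)]

for cutoff in (5, 10, 15):
    print(f"X>{cutoff}%: {exceedance_fraction(noise, cutoff):.4f}")
print("95% threshold:", confidence_threshold(noise))
```

Running the same scan with `confidence=0.90` would give a lower cutoff, which is the trade-off between 95% and 90% certainty.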

The tables below show the threshold of 95% confidence for each type of admixture in simulated individuals of homogeneous ancestry. A reading at or above X (where X is a % value in a cell of this table) means with 95% certainty that the reading is caused by affiliation with that group in the column, as opposed to the alternative, that the individual is really homogeneously affiliated with their group and the partial affiliation is the result of statistical noise. For example, using the 175 marker test, admixture greater than or equal to 10% Native American is required for an individual of polarized (i.e. mainly) European ancestry to conclude with 95% confidence that there really is Native American admixture as opposed to there being none (and no other admixture). Using the 71 marker test, one must see 12.5% NAM affiliation to conclude with 95% confidence that there really is Native American admixture as opposed to there really being none (and no other admixture).

Of course, a customer could get a reading of 8% Native American with the 71-marker test, which is below the 12.5% threshold, and it could still be true that there is Native American admixture. Such a person should type other members of their family. If the level of NAM increases going up the family tree, and if the level is above the threshold in some of the ancestors, it is probably a real indication of NAM admixture for the customer, even though the 8% is below the 95% confidence threshold. In fact, 8% falls near the 90% threshold (we don’t show the 90% threshold on the website), meaning that on its own (regardless of values in family members), the 8% is an indicator of NAM ancestry with 90% (not 95%) confidence.

Table SIMSUM175
Threshold of affiliation percentages for samples of polarized, binary affiliation, above which results indicate fractional affiliation with a p < 0.05, using the 175-marker admixture test.

Simulated population   AFR      EUR      EAS      NAM
Africans               <2.0%    7%       5%       <2.0%
Europeans              3.50%    <2.0%    9%       10%
East Asians            <2.0%    8%       <2.0%    12.50%
Native American        <2.0%    7.50%    11.50%   <2.0%

Table SIMSUM71

Simulated population   AFR      EUR      EAS      NAM
Africans               <2%      8%       <2%      <8%
Europeans              2%       <2%      13%      12.5%
East Asians            <2%      11.5%    <2%      17.50%
Native American        <2%      9%       17.5%    <2%

Of course, there is nothing magical about a 95% versus a 90% confidence level, and one might argue that genealogists rely on much lower levels of confidence when considering non-genetic data. The threshold values required to conclude a bona fide affiliation with 90% certainty are about two-thirds of those shown above.
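As a quick check of the two-thirds rule of thumb, the sketch below scales the 95% thresholds from the European row of Table SIMSUM71. The scaling is only the approximation stated in the text, not a published set of 90% thresholds, and the names are illustrative.

```python
# 95% thresholds (in percent) from the Europeans row of Table SIMSUM71,
# limited to the cells that give an exact value.
THRESHOLDS_95_EUROPEANS_71 = {"AFR": 2.0, "EAS": 13.0, "NAM": 12.5}

def approx_90_threshold(threshold_95):
    """Rule of thumb from the text: about two-thirds of the 95% threshold."""
    return threshold_95 * 2 / 3

for group, value in THRESHOLDS_95_EUROPEANS_71.items():
    print(group, round(approx_90_threshold(value), 1))
```

For NAM this gives roughly 8.3%, in line with the statement that an 8% reading with the 71-marker test falls near the 90% threshold.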

Results compared to the expectations of amateur genealogists

In 2003, Charles Kerchner, a customer and amateur genealogist, established a website through which customers could report their expected (from traditional lines of research) and observed (from AncestryByDNA 2.0) admixture proportions, along with explanations and comments on the nature of their evidence. Our calculations below are based on data taken from Charles Kerchner’s project as of Jan. 2004, and you can view his site at http://www.dnaprintlog.org.

From a sample of 108 genealogists, we are able to compare the expected and observed ancestry levels and note any interesting trends. As shown in the tables below, the average amateur genealogist expected 6.6% Native American admixture and got 5.1% using the 71-marker test (Table GENEXPOBS). Of the amateur genealogists who expected some Native American ancestry, 69% got a value from the 71-marker test that was within 10 percentage points of the level they expected (Table ADMIX10).

Table GENEXPOBS
The average fractional BGA affiliation expected by 108 amateur genealogists compared with the average this sample of genealogists actually obtained using the 71-AIM BGA admixture test (AncestryByDNA 2.0). Values are shown as proportions.

Expected versus observed levels (n=108)

           EUR     EAS     NAM     AFR
Expected   0.858   0.020   0.066   0.056
Observed   0.844   0.055   0.051   0.051

Table ADMIX10. Precision of MLE against genealogist expectations using the 71 AIM BGA test. (AncestryByDNA 2.0)

Proportion of genealogists for whom the admixture value was within 10 percentage points of the expected value

                   EUR            EAS            NAM            AFR
All samples        0.71 (n=108)   0.83 (n=108)   0.81 (n=108)   0.96 (n=108)
Expecting admix.   0.7 (n=105)    0.7 (n=10)     0.69 (n=64)    0.77 (n=17)

Though this shows that the typical individual gets a result within 10 percentage points of what was expected, not shown in these tables is the fact that the standard deviation (avg. difference from expectations) was relatively high, and so there were many individuals who got significantly different results than they expected. Part of the standard deviation is due to the anecdotal nature of some genealogical expectations; scanning over Mr. Kerchner’s site gives you an idea of the differences in quality of the expectations between individuals, and some are better than others. Some of the standard deviation is due to the test. Low levels of NAM or EAS admixture were commonly expected, many times at levels not much different from the threshold values shown above, suggesting that the test assigns low levels of EAS/NAM admixture differently than some genealogists expected. Though the simulations do not specifically address individuals with low levels of admixture (they address individuals of homogeneous ancestry), data from our labs show that the results for the former are not much different from those for the latter, and that the relatively high standard deviation is not due solely to statistical error inherent to the test. In other words, there may very well be an anthropological, sociological, psychological or other type of explanation.
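The comparison summarized in Tables GENEXPOBS and ADMIX10 can be sketched as follows: take each genealogist's expected and observed proportion for one group, count how many fall within 10 percentage points, and look at the spread of the differences. The six records below are invented for illustration and are not taken from Mr. Kerchner's project; the function names are likewise hypothetical.

```python
from statistics import mean, pstdev

# ASSUMPTION: hypothetical (expected, observed) NAM proportions for six
# genealogists, made up purely to illustrate the calculation.
records = [
    (0.00, 0.03),
    (0.125, 0.08),
    (0.25, 0.07),
    (0.0625, 0.00),
    (0.00, 0.14),
    (0.50, 0.46),
]

def fraction_within(records, tolerance=0.10, expecting_only=False):
    """Share of records whose observed value is within `tolerance` of the
    expected value; optionally restricted to those expecting some admixture."""
    rows = [(e, o) for e, o in records if e > 0] if expecting_only else list(records)
    hits = sum(abs(o - e) <= tolerance for e, o in rows)
    return hits / len(rows)

def difference_spread(records):
    """Mean and population standard deviation of observed minus expected."""
    diffs = [o - e for e, o in records]
    return mean(diffs), pstdev(diffs)

print("all samples:", round(fraction_within(records), 2))
print("expecting admixture:", round(fraction_within(records, expecting_only=True), 2))
print("mean and SD of differences:", difference_spread(records))
```

A high within-10-points share can coexist with a sizeable standard deviation, which is the situation described in the paragraph above.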