Accuracy and Precision
The question to answer when developing an admixture test
is how many AIMs, and what quality of AIMs, are required to minimize the incidence
of individuals whose MLEs show artificial or erroneous admixture. How frequently
such individuals are encountered in each group, given a specific battery of AIMs, is best
determined through simulations.
If we assume that the loci are unlinked (which they have been determined to
be) and that mating between individuals of a given population is random, the
multi-locus genotype frequencies in each population are determined from the
allele frequencies and an equation known as the product rule. Using these allele
frequencies, we simulate populations of 100,000 European, East Asian, African
and Native American individuals, draw 10,000 from each population and use the
71 and 175 marker AncestryByDNA™ tests (2.0 and 2.5, respectively) to calculate
the BGA proportions for each simulated sample selected.
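The product rule and the random-mating assumption described above can be sketched in a few lines of Python. This is a minimal illustration only: the three allele frequencies per group are hypothetical placeholders rather than values from the actual AIM panel, and the function names are invented for the example.

```python
import math
import random

# Hypothetical allele frequencies for three unlinked, biallelic markers.
# Illustrative numbers only; they are NOT taken from the real AIM panel.
ALLELE_FREQS = {
    "EUR": [0.90, 0.15, 0.60],
    "AFR": [0.10, 0.85, 0.20],
}

def genotype_prob(p, g):
    """Probability of carrying g copies (0, 1 or 2) of an allele with
    frequency p, assuming random mating within the population."""
    return math.comb(2, g) * p**g * (1 - p) ** (2 - g)

def multilocus_prob(freqs, genotype):
    """Product rule: multiply the per-locus probabilities of unlinked loci."""
    return math.prod(genotype_prob(p, g) for p, g in zip(freqs, genotype))

def simulate_individual(freqs, rng):
    """Draw two alleles independently at each locus and count the copies."""
    return [int(rng.random() < p) + int(rng.random() < p) for p in freqs]

rng = random.Random(0)  # fixed seed so the sketch is reproducible
sample = [simulate_individual(ALLELE_FREQS["EUR"], rng) for _ in range(10_000)]
```

Each simulated genotype would then be passed to the admixture estimator; that step is outside the scope of this sketch.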
With a perfect test, using discretely distributed alleles, each simulated
individual would type as 100% affiliated with their own group. In the real
world, using markers that are continuously distributed, there is a level of
statistical noise. The purpose of the simulations is to determine what level
of noise should be expected.
Below we show the average percentage affiliations for each type of simulated
sample. The total admixture average is the sum of the average admixture percentages
(columns) for samples of a particular simulated population (represented in
the rows). For example, for the 71-marker test, we calculate the average level
of European, East Asian and Native American admixture in simulated Africans
to be 0.96%, 0.1% and 0.87%, respectively.
AncestryByDNA™ 2.0 (71 AIMs)

| | AFR | EUR | EAS | NAM | Total | Total admixture Avg. |
|---|---|---|---|---|---|---|
| Africans | 98.07 | 0.96 | 0.1 | 0.87 | 100 | 1.93 |
| Europeans | 0.08 | 95.63 | 2.25 | 2.04 | 100 | 4.37 |
| East Asians | 0.03 | 2.45 | 92.98 | 4.54 | 100 | 7.02 |
| Native Americans | 0.01 | 1.83 | 3.63 | 94.53 | 100 | 5.47 |
| Avg | | | | | | 4.70 |

AncestryByDNA™ 2.5 (175 AIMs)

| | AFR | EUR | EAS | NAM | Total | Total admixture Avg. |
|---|---|---|---|---|---|---|
| Africans | 98.21 | 0.93 | 0.71 | 0.15 | 100 | 1.79 |
| Europeans | 0.4 | 96.36 | 1.5 | 1.74 | 100 | 3.64 |
| East Asians | 0.08 | 1.43 | 95.48 | 3.01 | 100 | 4.52 |
| Native Americans | 0 | 1.16 | 2.08 | 96.76 | 100 | 3.24 |
| Avg | | | | | | 3.30 |

From these tables one can see that the average level of artificial
admixture using the 71 marker test is about 5% (4.70%) and the level using the 175
marker test is lower, at about 3% (3.30%). Looking at the European row in the 175-marker
table, we can see that the average simulated European sample exhibits 1.5%
East Asian ancestry and 1.74% Native American ancestry. This suggests that
the average (but of course, not every) real person who is 100% European will
show 1.5% East Asian ancestry as statistical noise, and 3.64% non-European
admixture in total as statistical noise. Some will show higher levels, and
some will show 0%. One can use these values as a guide for interpreting a
result. For example, if a European suspects a small amount of NAM admixture
and obtains a reading of 1% using the 175 marker test, they can see that the
average simulated European shows a comparable level (1.74%), and so the data neither
support nor refute the NAM admixture.
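The "Total admixture Avg." column and the panel averages can be recomputed from the table rows. The sketch below copies the average affiliations from the two tables and sums the off-diagonal entries of each row; the variable and function names are illustrative only.

```python
GROUPS = ["AFR", "EUR", "EAS", "NAM"]

# Average percentage affiliations copied from the 71- and 175-marker tables.
AVG_71 = {
    "AFR": [98.07, 0.96, 0.10, 0.87],
    "EUR": [0.08, 95.63, 2.25, 2.04],
    "EAS": [0.03, 2.45, 92.98, 4.54],
    "NAM": [0.01, 1.83, 3.63, 94.53],
}
AVG_175 = {
    "AFR": [98.21, 0.93, 0.71, 0.15],
    "EUR": [0.40, 96.36, 1.50, 1.74],
    "EAS": [0.08, 1.43, 95.48, 3.01],
    "NAM": [0.00, 1.16, 2.08, 96.76],
}

def total_admixture(table):
    """Sum, for each simulated population, the average affiliation
    with every group other than its own."""
    totals = {}
    for pop, row in table.items():
        own = GROUPS.index(pop)
        totals[pop] = round(sum(v for i, v in enumerate(row) if i != own), 2)
    return totals

def panel_average(table):
    """Mean of the four per-population total admixture values."""
    totals = total_admixture(table)
    return sum(totals.values()) / len(totals)

print(total_admixture(AVG_71), panel_average(AVG_71))
print(total_admixture(AVG_175), panel_average(AVG_175))
```

The panel averages come out at roughly 4.70 and 3.30, in line with the "Avg" rows of the tables.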
We can look at the simulation data in another way, by looking at
the variation of results among simulated samples. How common is it that a simulated
European (or other) individual exhibits 15% or greater African (or other)
ancestry? The results are shown below as the fraction of simulated samples in
each row population that reach this level for each column group.
71 marker test

| >15% | AFR | EUR | EAS | NAM | Avg. Outside Group |
|---|---|---|---|---|---|
| Africans | 1 | 0.0012 | 0 | 0.0008 | 0.0007 |
| Europeans | 0.0006 | 1 | 0.0342 | 0.0306 | 0.0218 |
| East Asians | 0 | 0.029 | 1 | 0.1225 | 0.0505 |
| Native Americans | 0 | 0.0177 | 0.0873 | 1 | 0.035 |
| Avg. | | | | | 0.027 |

175 marker test

| >15% | AFR | EUR | EAS | NAM | Avg. Outside Group |
|---|---|---|---|---|---|
| Africans | 1 | 0.00048 | 0 | 0 | 0.00016 |
| Europeans | 0.00096 | 1 | 0.005268 | 0.0158 | 0.00734 |
| East Asians | 0 | 0.00165 | 1 | 0.050 | 0.01722 |
| Native Americans | 0 | 0.001 | 0.021 | 1 | 0.00733 |
| Avg. | | | | | 0.0081 |

We see that about 2.7% of simulated samples show 15%
or greater artificial admixture using the 71 marker test, and that less than
1% of simulated samples show 15% or greater artificial admixture using the
175 marker test. Reading across the European row for the 175-marker test,
we see that about 0.5% of simulated Europeans registered 15% or greater East
Asian admixture. One can also use these values as a guide for interpreting
a result. For example, if a European suspects NAM admixture
and obtains a reading of 17% using the 175 marker test, this table shows that
only about 1.6% of simulated Europeans exhibited NAM affiliation of 15% or greater, and so
based on the average MLE, there is about a 98.4% chance that this result indicates
real NAM admixture (of course this is based on the average MLE, and a customer
should also refer to the individualized confidence contours on their triangle
plot).
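The fractions in these tables are exceedance rates: the share of simulated samples whose reading for a group is at or above a cutoff. A minimal sketch follows. The toy readings are hypothetical and stand in for the simulated MLEs, which are not reproduced in this document; the European row is copied from the 175-marker table.

```python
def exceedance_rate(readings, cutoff):
    """Fraction of readings (in percent) that are at or above the cutoff."""
    return sum(r >= cutoff for r in readings) / len(readings)

# Hypothetical NAM readings (in %) for ten simulated Europeans.
toy_readings = [0.0, 1.2, 3.5, 16.0, 0.4, 2.2, 0.0, 18.5, 1.1, 0.9]
toy_rate = exceedance_rate(toy_readings, 15)

# European row of the 175-marker table: fraction of simulated Europeans
# showing 15% or greater affiliation with each outside group.
EUR_175 = {"AFR": 0.00096, "EAS": 0.005268, "NAM": 0.0158}

# "Avg. Outside Group" is the mean of the three outside-group fractions.
avg_outside = sum(EUR_175.values()) / len(EUR_175)

print(toy_rate, round(avg_outside, 5))
```

The recomputed row mean agrees with the 0.00734 listed for Europeans in the 175-marker table.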
We can make tables like this for all of the possible values X>5%, X>10%,
X>15% . . . all the way to X>50%. All 40,000 samples were completely
affiliated with their own group at levels of 50% or greater. Down to X=5%,
we see it is not uncommon for simulated samples to show affiliation with another
group at this level of X. From all of these tables, we can compute the rough
percentage value necessary to conclude with 95% certainty that partial group
affiliation means real affiliation and not statistical noise.
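One way to read such a threshold off simulation output is as an empirical percentile: the reading that 95% of homogeneous simulated samples do not exceed. The sketch below assumes that interpretation; the noise distribution is a hypothetical stand-in (a truncated Gaussian), not the actual simulated MLEs, and `noise_threshold` is an illustrative name.

```python
import math
import random

def noise_threshold(readings, confidence=0.95):
    """Smallest observed reading that at least `confidence` of all
    readings do not exceed (an empirical percentile)."""
    ordered = sorted(readings)
    # round() guards against floating-point fuzz before taking the ceiling
    k = max(0, math.ceil(round(confidence * len(ordered), 9)) - 1)
    return ordered[k]

# Hypothetical stand-in for the NAM readings (in %) of 10,000 simulated
# Europeans of homogeneous ancestry; the real distribution will differ.
rng = random.Random(1)
simulated_nam = [max(0.0, rng.gauss(1.0, 4.0)) for _ in range(10_000)]

cutoff_95 = noise_threshold(simulated_nam, 0.95)
cutoff_90 = noise_threshold(simulated_nam, 0.90)
print(cutoff_90, cutoff_95)
```

A reading above `cutoff_95` would be seen in fewer than 5% of homogeneous simulated samples, which is the sense in which a threshold separates likely admixture from statistical noise.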
The tables below show the threshold of 95% confidence for each type of admixture
in simulated individuals of homogeneous ancestry. A reading at or above X (where
X is a % value in a cell of this table) means with 95% certainty that the reading
is caused by affiliation with that group in the column, as opposed to the alternative,
that the individual is really homogeneously affiliated with their group and
the partial affiliation is the result of statistical noise. For example, using
the 175 marker test, admixture greater than or equal to 10% Native American
is required for an individual of polarized (i.e. mainly) European ancestry
to conclude with 95% confidence that there really is Native American admixture
as opposed to there being none (and no other admixture). Using the 71 marker
test, one must see 12.5% NAM affiliation to conclude with 95% confidence that
there really is Native American admixture as opposed to there really being
none (and no other admixture).
Of course, a customer could get a reading of 8% Native American with the 71
marker test, which is below the 12.5% threshold, and it could still be true
that there is Native American admixture. Such a person should type other members
of their family. If the level of NAM increases going up the family tree, and
if the level is above the threshold in some of the ancestors, it is probably
a real indication of NAM admixture for the customer, even though the 8% is
below the 95% confidence threshold. In fact, 8% falls near the 90% threshold
(we don’t show the 90% threshold on the website), not the 95% threshold,
meaning that on its own (regardless of values in family members), the 8% is
an indicator of NAM ancestry with 90% (not 95%) confidence.
Table SIMSUM175
Threshold of affiliation percentages for samples of polarized, binary affiliation,
above which results indicate fractional affiliation with p < 0.05, using
the 175-marker admixture test.

| | AFR | EUR | EAS | NAM |
|---|---|---|---|---|
| Africans | <2.0% | 7% | 5% | <2.0% |
| Europeans | 3.5% | <2.0% | 9% | 10% |
| East Asians | <2.0% | 8% | <2.0% | 12.5% |
| Native American | <2.0% | 7.5% | 11.5% | <2.0% |

Table SIMSUM71

| | AFR | EUR | EAS | NAM |
|---|---|---|---|---|
| Africans | <2% | 8% | <2% | <8% |
| Europeans | 2% | <2% | 13% | 12.5% |
| East Asians | <2% | 11.5% | <2% | 17.5% |
| Native American | <2% | 9% | 17.5% | <2% |

Of course, there is nothing magical about a 95% vs. a 90%
confidence level, and one might argue that genealogists rely on much lower
levels of confidence when considering non-genetic data. The threshold values required
to conclude a bona fide affiliation with 90% certainty are about two-thirds of those
shown above.
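As a quick worked check of the two-thirds rule of thumb, the sketch below applies it to the 95% thresholds for Native American readings in Europeans from Tables SIMSUM175 and SIMSUM71. The rule is only the approximation stated in the text, and the names are illustrative.

```python
# 95% thresholds (in %) for NAM readings in Europeans, copied from
# Tables SIMSUM175 and SIMSUM71.
NAM_95_IN_EUROPEANS = {"175-marker": 10.0, "71-marker": 12.5}

def approx_90_threshold(threshold_95):
    """Rule of thumb from the text: the 90% threshold is about
    two-thirds of the 95% threshold."""
    return threshold_95 * 2 / 3

approx_90 = {
    test: round(approx_90_threshold(t), 1)
    for test, t in NAM_95_IN_EUROPEANS.items()
}
print(approx_90)
```

For the 71-marker test this gives roughly 8.3%, which is consistent with the earlier remark that an 8% reading falls near the 90% threshold.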
Results compared to the expectations of amateur genealogists
In 2003, Charles Kerchner, a customer and amateur genealogist,
established a website through which customers could report their expected (from
traditional lines of research) and observed (from AncestryByDNA™ 2.0) admixture
proportions along with explanations and comments on the nature of their evidence.
Our calculations below are based on data taken from Charles Kerchner’s
project as of Jan. 2004; you can view his site at http://www.dnaprintlog.org.
From a sample of 108 genealogists, we are able to compare the expected and
observed ancestry levels and note any interesting trends. As shown in the tables
below, the average amateur genealogist expected 6.6% Native American admixture
and got 5.1% using the 71-marker test (Table GENEXPOBS). Of the 64 amateur
genealogists who expected some Native American ancestry, 69% got a
value from the 71 marker test that was within 10 percentage points of the level
they expected (Table ADMIX10).
Table GENEXPOBS
The average fractional BGA affiliation expected by 108 amateur
genealogists compared to the average this sample of genealogists
actually obtained using the 71 AIM BGA admixture test (AncestryByDNA™ 2.0).
Expected versus observed levels, n=108.

| | EUR | EAS | NAM | AFR |
|---|---|---|---|---|
| Expected | 0.858 | 0.020 | 0.066 | 0.056 |
| Observed | 0.844 | 0.055 | 0.051 | 0.051 |

Table ADMIX10. Precision of MLE against genealogist expectations
using the 71 AIM BGA test (AncestryByDNA™ 2.0).
Proportion of genealogists for whom the Admix value was within 10 percentage
points of the expected value.

| | EUR | EAS | NAM | AFR |
|---|---|---|---|---|
| All samples | 0.71 (n=108) | 0.83 (n=108) | 0.81 (n=108) | 0.96 (n=108) |
| Expecting admix. | 0.7 (n=105) | 0.7 (n=10) | 0.69 (n=64) | 0.77 (n=17) |

Though this shows that most individuals get a result
within 10 percentage points of expected, not shown in these tables is the fact
that the standard deviation of the differences from expectations was relatively
high, and so many individuals got results significantly different
from what they expected. Part of the standard deviation is due to the anecdotal
nature of some genealogical expectations; scanning over Mr. Kerchner’s
site gives an idea of the differences in quality of the expectations between
individuals: some are better than others. Some of the standard deviation
is due to the test. Low levels of NAM or EAS admixture were commonly expected, many
times at levels not much different from the threshold values shown above, suggesting
that the test assigns low levels of EAS/NAM admixture differently than some
genealogists expected. Though the simulations do not specifically
address individuals with low levels of admixture (they address individuals
of homogeneous ancestry), data from our labs show that the results for the former
are not much different from those for the latter, and that the relatively high standard
deviation is not due solely to statistical error inherent to the test. In
other words, there may very well be an anthropological, sociological, psychological
or other type of explanation.
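The two summary measures discussed in this section, the share of results within 10 percentage points of expectation and the standard deviation of the differences, can be computed as in the sketch below. The (expected, observed) pairs are hypothetical and are not taken from Mr. Kerchner's project data.

```python
import statistics

# Hypothetical (expected, observed) NAM percentages for eight genealogists.
PAIRS = [(0, 3), (12, 5), (6, 0), (25, 9), (0, 0), (3, 14), (50, 31), (6, 8)]

def fraction_within(pairs, tolerance=10.0):
    """Share of individuals whose observed value lies within `tolerance`
    percentage points of the value they expected."""
    return sum(abs(obs - exp) <= tolerance for exp, obs in pairs) / len(pairs)

def difference_sd(pairs):
    """Population standard deviation of (observed - expected)."""
    return statistics.pstdev([obs - exp for exp, obs in pairs])

print(fraction_within(PAIRS), difference_sd(PAIRS))
```

The share within tolerance can look reassuring while the spread of the differences is still large, which is the point made in the text above.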