Experiments
The following figures and information summarize many in-house
experiments with AncestryByDNA™ 2.0 and 2.5. The data are presented in
various formats, not all of which are identical to the results generated for
customers.
- Slide 1: How we measure AncestryByDNA™ 2.5 Results in Populations?
- Slide 2: AncestryByDNA™ 2.5 Results in African Parentals and Simulated Africans versus African Americans
- Slide 3: AncestryByDNA™ 2.5 Results in European Parentals and Simulated Europeans versus European Americans
- Slide 4: AncestryByDNA™ 2.5 Results in East Asian Parentals and Simulated East Asians
- Slide 5: AncestryByDNA™ 2.5 Results in Native American Parentals and Simulated Native Americans
- Slide 6: Parental Samples Used to Calibrate AncestryByDNA™ 2.5
Slide 1: How we measure AncestryByDNA™ 2.5 Results in Populations?

Here we show results obtained with the AncestryByDNA™ 2.0 test for entire populations.
It is helpful to observe the results in European
Americans, Hispanics, African Americans, etc. in order to identify trends in test results.
Before interpreting these plots, it helps to understand how to read them, as shown in
the example here. Four populations are represented at the vertices of the
plot: one at the center of the large equilateral triangle (in this example
it is NA, or Native American) and three at the tips of the triangle.
Many spots are plotted on the triangle plot, and each represents the
ancestry proportions of a different person. The closer a spot is to
a vertex (ancestry group), the greater that person's ancestral affiliation
with that group. The majority of the samples are very close to the center apex
and overlap each other. The outliers do not overlap with other samples and
therefore appear to be more numerous than they are. If one of these samples were 100% Native
American, it would plot exactly at the center of the large equilateral triangle,
at the NA position. If it were 50% Native American and 50% European, it
would plot halfway along the line connecting the Native American center to
the European vertex. The red spot represents an individual who is 50% Native
American, 35% East Asian, and 15% European.
To determine the percentages for any given spot on the triangle, use the following
method.
There are three isosceles triangles in each equilateral triangle. Draw a line
from one side to the vertex opposite that side to create an axis for reading
the percentage of XX ancestry, where XX is the group represented by the vertex. Use
a line parallel to the base (not necessarily perpendicular to the axis) to
project the spot onto this axis. The fractional distance from base to vertex
at the projected location is the percentage of ancestry.
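As a rough illustration of this reading method, the sketch below places a spot inside one of the three isosceles triangles from its ancestry proportions, then reads each percentage back as the fractional distance from base to vertex. The vertex coordinates and the function names `to_point` and `read_fraction` are assumptions made for the example; they are not the layout or code used to draw the actual plots.

```python
import math

# Assumed layout (illustrative only): three groups at the tips of an
# equilateral triangle and Native American (NA) at its center.
EU = (0.0, 0.0)
EA = (1.0, 0.0)
AF = (0.5, math.sqrt(3) / 2)
NA = (0.5, math.sqrt(3) / 6)


def to_point(proportions, vertices):
    """Place a spot as the proportion-weighted average of three vertices."""
    x = sum(p * v[0] for p, v in zip(proportions, vertices))
    y = sum(p * v[1] for p, v in zip(proportions, vertices))
    return (x, y)


def _cross(o, a, b):
    """Twice the signed area of triangle o-a-b."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def read_fraction(spot, vertex, base_a, base_b):
    """Fractional distance of `spot` from the base toward `vertex`.

    Sliding the spot parallel to the base does not change this value,
    which is why the projection line is drawn parallel to the base.
    """
    return _cross(base_a, base_b, spot) / _cross(base_a, base_b, vertex)


# The red spot from the example: 50% NA, 35% East Asian, 15% European.
spot = to_point((0.50, 0.35, 0.15), (NA, EA, EU))
na_pct = 100 * read_fraction(spot, NA, EA, EU)
ea_pct = 100 * read_fraction(spot, EA, EU, NA)
eu_pct = 100 * read_fraction(spot, EU, NA, EA)
print(round(na_pct, 1), round(ea_pct, 1), round(eu_pct, 1))
```

Reading the three axes of the same isosceles triangle recovers the three proportions that were used to place the spot.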
Slide 2: AncestryByDNA™ 2.5 Results in African Parentals and Simulated Africans versus African Americans
[Figures A, B, and C]
Here we see three plots of ancestry measurements for parental African,
simulated African, and African American samples.
The first is for parental African samples: the samples from
Western, sub-Saharan Africa that we used to find the markers we employ and to estimate
the allele frequencies of these markers. When you look at a triangle plot, you
should realize that 100 spots at the exact center of the plot appear the
same as 1, so all you can really appreciate here is the spread away
from the center, not the proportion of samples showing non-African admixture.
In Figure A, we see that the parental samples plot as being of relatively
homogeneous African ancestry, as expected.
Figure B shows simulated samples. While it is hard to collect 10,000 sub-Saharan
African samples, it is easy to simulate them. Since we know the allele
frequencies for the markers in the African population, we can use them to combine
marker sequences into combinations (i.e. simulated individuals), generating each
combination in proportion to its probability, to represent what we would expect
to find in a homogeneous Western, sub-Saharan African population. Determining
the ancestry proportions for these simulated individuals and plotting them
as we have in B gives us an indication of the statistical error inherent to
our assay, which arises because our marker alleles are continuously distributed
among the populations (i.e. in our analogy, the shops do not just sell items
of one color, but sell them in all colors
with a strong bias for one of the colors). We see from Figure B that this statistical
error is relatively low (about 0.34%), and similar to that observed for the
parentals.
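The simulation idea can be sketched as follows. This is a minimal illustration that assumes independent markers, two alleles per marker, and random mating; the allele frequencies and the name `simulate_individual` are made up for the example and are not the actual marker panel or the simulation code used for these figures.

```python
import random


def simulate_individual(allele_freqs, rng):
    """Draw one simulated genotype from a population's allele frequencies.

    Each marker gets two independent allele draws, so the value stored
    per marker is the count (0, 1, or 2) of the tracked allele.
    """
    return [
        (rng.random() < p) + (rng.random() < p)
        for p in allele_freqs
    ]


# Hypothetical frequencies of one allele at five markers in a single
# homogeneous population (illustrative values only).
freqs = [0.95, 0.10, 0.80, 0.02, 0.60]

rng = random.Random(0)  # fixed seed so the run is repeatable
population = [simulate_individual(freqs, rng) for _ in range(10_000)]

# The observed allele frequency at each marker should be close to the input.
observed = [
    sum(ind[m] for ind in population) / (2 * len(population))
    for m in range(len(freqs))
]
print(observed)
```

Estimating ancestry proportions for each simulated individual would then show how far the estimates scatter for a population that is homogeneous by construction.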
In contrast, when African Americans from North America are plotted (Figure C), we see
generally more admixture, which the previous two figures show is statistically
significant. Further, there is a strong bias towards the European vertex. In
other words, the average African American has considerable European ancestry
(19.6%), but Africans from Africa do not (only 0.17%). In
fact, we know from other researchers that much of this European ancestry came
from a directional flow from European males to African females (i.e. the
European signatures are present on the Y chromosome of many African American
males, but not so much in the mtDNA).
Slide 3: AncestryByDNA™ 2.5 Results in European Parentals and Simulated Europeans versus European Americans
[Figures A, B, and C]
Here we see three plots of ancestry measurements for parental European,
simulated European, and European American (i.e. “Caucasian”) samples.
The first is for parental European samples: the samples from
Europe and the United States that we used to find the markers we employ and to estimate
the allele frequencies of these markers. When you look at a triangle plot,
you should realize that 100 spots at the exact center of the plot appear
the same as 1, so all you can really appreciate here is the spread
away from the center, not the proportion of samples showing non-European admixture.
In Figure A, we see that the parental samples plot as being of relatively
homogeneous European ancestry, as expected.
Figure B shows simulated samples. While it is hard to collect 10,000 European
samples, it is easy to simulate them. Since we know the allele frequencies
for the markers in the European population, we can use them to combine marker
sequences into combinations (i.e. simulated individuals), generating each
combination in proportion to its probability, to represent what we would expect
to find in a homogeneous European population. Determining the ancestry proportions
for these simulated individuals and plotting them as we have in B gives us
an indication of the statistical error inherent to our assay, which arises because
our marker alleles are continuously distributed among the populations
(i.e. in our analogy, the shops do not
just sell items of one color, but sell them in all colors with a strong bias
for one of the colors). We see from Figure B that this statistical error is
relatively low for the European group, and similar to that observed for the
parentals. This level is about 3.6%; that is, the determination for the average
parental/simulated sample has 3.6% error, which is the same as saying
that the AncestryByDNA™ 2.5 test provides results with 3.6% error (AncestryByDNA™ 2.0 had an error of 4.3% for Europeans and European Americans).
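One way to read an average error figure like this is as the mean percentage of ancestry assigned to other groups in samples that are expected to be homogeneous. The sketch below assumes that definition, which is an interpretation and not a formula stated here; the estimates and the name `mean_off_target_percent` are invented for the example.

```python
def mean_off_target_percent(estimates, group):
    """Average percentage assigned to groups other than `group`."""
    off_target = [100.0 - est[group] for est in estimates]
    return sum(off_target) / len(off_target)


# Hypothetical ancestry estimates (percentages) for four simulated
# individuals drawn from a homogeneous European population.
estimates = [
    {"EU": 100.0, "NA": 0.0, "EA": 0.0, "AF": 0.0},
    {"EU": 96.0, "NA": 4.0, "EA": 0.0, "AF": 0.0},
    {"EU": 94.0, "NA": 2.0, "EA": 4.0, "AF": 0.0},
    {"EU": 98.0, "NA": 0.0, "EA": 0.0, "AF": 2.0},
]

error = mean_off_target_percent(estimates, "EU")
print(error)
```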
Figure C shows a number of European American individuals similar to the number
of simulated Europeans shown in Figure B, yet in this plot we see generally
more admixture (8.6%), which Figures A and B show is statistically significant. Further,
there is spread toward all three other groups on the plot, not just one of them
as for the African Americans, so the admixture is not as homogeneous as it
was for the African Americans. As you can see, many European Americans have
low levels of Native American, East Asian, or African ancestry.
Slide 4: AncestryByDNA™ 2.5 Results in East Asian Parentals and Simulated East Asians
[Figures A, B, and C]
In Figure A, we see that the parental East Asian samples plot as being of relatively
homogeneous East Asian ancestry, as expected. None of the samples was determined
to have fractional African ancestry.
Figure B shows simulated East Asian samples. While it is hard to collect 10,000
East Asian samples, it is easy to simulate them. Since we know the
allele frequencies for the markers in the East Asian population, we can use them
to combine marker sequences into combinations (i.e. simulated individuals),
generating each combination in proportion to its probability, to represent
what we would expect to find in a homogeneous East Asian population. Determining
the ancestry proportions for these simulated individuals and plotting them
as we have in B gives us an indication of the statistical error inherent to
our assay, which arises because our marker alleles are continuously distributed
among the populations (i.e. in our analogy, the shops do not just sell items
of one color, but sell them in all colors
with a strong bias for one of the colors).
We see from Figure B that this statistical error is relatively low for the
East Asian group, and similar to that observed for the parentals. This level
is about 4.4%; that is, the determination for the average parental/simulated
East Asian sample has 4.4% error, which is the same as saying that the
AncestryByDNA™ 2.5 test provides results for East Asians with 4.4% error. As you
can see, most of the error is along the East Asian/Native American axis (and
none of it is along the East Asian/African axis). East Asian/Native American
ancestry is more difficult to resolve than other pairs because these groups
shared relatively recent common ancestors. If East Asian/Native American ambiguity
is relevant for a customer's ancestry, it should be clearly represented by the
confidence contours of the customer's triangle plot, which stretch further along
the Native American/East Asian axis than along other axes.
Slide 5: AncestryByDNA™ 2.5 Results in Native American Parentals and Simulated Native Americans
[Figures B and C]
In Figure A, we see that the parental Native American samples (from isolated
regions of Southern Mexico) plot as being of relatively homogeneous Native American
ancestry, as expected. None of the samples was determined to have fractional
African ancestry.
Figure B shows simulated Native American samples. While it is hard to collect
10,000 Native American samples, it is easy to simulate them. Since
we know the allele frequencies for the markers in the Native American population,
we can use them to combine marker sequences into combinations (i.e. simulated
individuals), generating each combination in proportion to its probability,
to represent what we would expect to find in a homogeneous Native American
population. Determining the ancestry proportions for these simulated individuals
and plotting them as we have in B gives us an indication of the statistical
error inherent to our assay, which arises because our marker alleles are continuously
distributed among the populations (i.e. in our analogy, the shops do not
just sell items of one color, but sell them
in all colors with a strong bias for one of the colors).
We see from Figure B that this statistical error is relatively low for the
Native American group, and similar to that observed for the parentals. This
level is about 3.2%; that is, the determination for the average parental/simulated
Native American sample has 3.2% error, which is the same as saying that
the AncestryByDNA™ 2.5 test provides results for Native Americans with 3.2% error.
As you can see, most of the error is along the East Asian/Native American axis
(and none of it is along the Native American/African axis). East Asian/Native
American ancestry is more difficult to resolve than other pairs because these
groups shared relatively recent common ancestors. If East Asian/Native American
ambiguity is relevant for a customer's ancestry, it should be clearly represented
by the confidence contours of the customer's triangle plot, which stretch further
along the Native American/East Asian axis than along other axes.
Slide 6: Parental Samples Used to Calibrate AncestryByDNA™ 2.5
Since there is no such thing as a completely pure population, how is it that
we can accurately measure allele frequencies? In other words, how can we use
European Americans as a parental group if European Americans exhibit a low level
of systematic admixture?
We use a Bayesian methodology described by Pritchard et al., 2000 (“STRUCTURE”,
also see Pritchard et al., 2001 and Rosenberg et al., 2002) to identify individuals
within our parental group that do not exhibit homogeneous ancestry (in other
words, they are of detectable admixture and therefore not parental samples).
This particular method does not require a definition of structure a priori
to infer population structure and assign individual samples to groups. “It
assumes a model in which there are K populations (where K may be unknown),
each of which is characterized by a set of allele frequencies at each locus,
and assigns individuals … (probabilistically) to a population, or jointly
to two or more populations if their genotypes indicate that they are admixed”.
We use it to identify individuals in the parental set who are themselves
admixed. Once identified, these samples are eliminated, and the allele frequencies
are determined using only individuals of homogeneous affiliation to their parental
group. It is important to point out that STRUCTURE uses a different method
than we use to determine admixture proportions, but the results generally
agree with one another (though we developed our method to determine confidence
intervals as well).
Here, we show the STRUCTURE results for our parental samples: Western sub-Saharan
African (red), Native American (green), European (blue), and East Asian (yellow).
Samples with more than 2% admixture were eliminated prior to calculating allele
frequencies. How many samples does it take to acquire an accurate allele frequency?
Surprisingly few. We used about 60 (after elimination of admixed samples) for
each group, but the allele frequencies obtained for any subgroup of 25 are imperceptibly
different.
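The filtering step can be sketched as below. This is only an illustration of applying a 2% cutoff and then counting alleles; the sample records, the field names `admixture` and `genotype`, and both function names are hypothetical, and the admixture values would in practice come from a separate analysis such as STRUCTURE.

```python
def allele_frequencies(genotypes):
    """Per-marker frequency of the counted allele from 0/1/2 genotypes."""
    n_markers = len(genotypes[0])
    return [
        sum(g[m] for g in genotypes) / (2 * len(genotypes))
        for m in range(n_markers)
    ]


def parental_frequencies(samples, max_admixture=0.02):
    """Drop samples above the admixture cutoff, then estimate frequencies."""
    kept = [s["genotype"] for s in samples if s["admixture"] <= max_admixture]
    return allele_frequencies(kept)


# Hypothetical samples: `admixture` is the fraction of ancestry assigned
# outside the parental group; `genotype` holds allele counts per marker.
samples = [
    {"admixture": 0.00, "genotype": [2, 0, 1]},
    {"admixture": 0.01, "genotype": [2, 1, 1]},
    {"admixture": 0.15, "genotype": [0, 2, 0]},  # admixed, excluded
    {"admixture": 0.02, "genotype": [1, 0, 2]},
]

freqs = parental_frequencies(samples)
print(freqs)
```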
We have performed simulation studies that test the sensitivity of the test
results to systematic over- or underestimation of allele frequency. We simulated
an error in estimating allele frequencies that influenced only the measurement
of SNPs informative for a particular pair of groups. For example, of the 179
SNPs, 60 or so are particularly informative between European and Native American
ancestry, and each is informative to a varying extent (this is measured with
a value called the delta value). Adjusting the allele frequencies for just
these two groups, such that the delta value is diminished by 20% only in the
European/Native American dimension, resulted in a change in admixture proportions
for all groups in actual samples of less than 1% on average. We have performed
this simulation on all possible pairs of groups, with similar results, showing
that the test is relatively insensitive to systematic allele frequency estimation
errors such as those that may be generated in the parental sample selection
process. This is likely because we use so many markers of extreme delta value.
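To make the adjustment concrete, the sketch below assumes the delta value is the absolute difference in an allele's frequency between two groups (the usual definition, though it is not spelled out here) and shrinks it by 20% by moving both frequencies toward their midpoint. The midpoint rule, the frequencies, and the name `shrink_delta` are assumptions for illustration; how the frequencies were actually adjusted is not stated.

```python
def delta(p_a, p_b):
    """Delta value: absolute allele-frequency difference between two groups."""
    return abs(p_a - p_b)


def shrink_delta(p_a, p_b, reduction=0.20):
    """Move both frequencies toward their midpoint so delta drops by `reduction`."""
    mid = (p_a + p_b) / 2
    scale = 1.0 - reduction
    return (mid + (p_a - mid) * scale, mid + (p_b - mid) * scale)


# Hypothetical frequencies of one allele in two groups at three markers.
markers = [(0.90, 0.10), (0.75, 0.20), (0.05, 0.65)]

for p_eu, p_na in markers:
    q_eu, q_na = shrink_delta(p_eu, p_na)
    print(round(delta(p_eu, p_na), 3), "->", round(delta(q_eu, q_na), 3))
```

Re-estimating admixture proportions with the adjusted frequencies and comparing them with the original estimates would give a sensitivity measure of the kind described above.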
Pritchard JK, Stephens M, Donnelly P. Genetics. 2000 Jun;155(2):945-59.
Pritchard JK, Donnelly P. Theor Popul Biol. 2001 Nov;60(3):227-37.
Rosenberg et al. Science. 2002 Dec 20;298(5602):2381-5.