Experiments

The following figures and information summarize a number of in-house experiments with AncestryByDNA 2.0 and 2.5. The data are presented in various formats, not all of which are identical to the results generated for customers.

  • Slide 1: How we measure AncestryByDNA 2.5 Results in Populations?
  • Slide 2: AncestryByDNA 2.5 Results in African Parentals and Simulated Africans versus African Americans
  • Slide 3: AncestryByDNA 2.5 Results in European Parentals and Simulated Europeans versus European Americans
  • Slide 4: AncestryByDNA 2.5 Results in East Asian Parentals and Simulated East Asians
  • Slide 5: AncestryByDNA 2.5 Results in Native American Parentals and Simulated Native Americans
  • Slide 6: Parental Samples Used to Calibrate AncestryByDNA 2.5

Slide 1: How we measure AncestryByDNA 2.5 Results in Populations?



Here we show results obtained with the AncestryByDNA 2.0 test for entire populations.

It is helpful to observe the results in European Americans, Hispanics, African Americans, etc. in order to identify trends in test results. Before you can appreciate these plots, it helps to understand how to read them, as shown in the example here. Four populations are represented: one at the center of the large equilateral triangle (in this example it is NA, or Native American) and three at its vertices.

There are many spots plotted on the triangle plot, and each represents the ancestry proportions of a different human being. The closer a spot is to a vertex (ancestry group), the greater that person's ancestral affiliation with that group. The majority of the samples are very close to the center apex and overlap each other. The outliers do not overlap with other samples and therefore appear to be more numerous than they are. If one of these samples were 100% Native American, it would plot exactly at the center of the large equilateral triangle, at the NA position. If it were 50% Native American and 50% European, it would plot halfway down the line connecting the Native American center to the European vertex. The red spot represents an individual who is 50% Native American, 35% East Asian, and 15% European.

To determine the percentages for any given spot on the triangle use the following method.

There are three isosceles triangles in each equilateral triangle. Draw a line from one side to the vertex opposite that side, and you have created an axis for reading the percentage of XX ancestry, where XX is the group represented by that vertex. Use a line parallel to the base (not necessarily perpendicular to the axis) to project the spot onto this axis. The fractional distance from base to vertex at the projected location is the percentage of XX ancestry.
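The reading method above amounts to taking barycentric coordinates within one of the sub-triangles. The following is a minimal Python sketch of that idea, not the plotting code used for customer reports. It assumes a unit-sided equilateral triangle with Native American at its centroid, as in the example; the vertex placement and the names VERTICES, to_xy, and read_fraction are illustrative assumptions. A sample can only be placed unambiguously this way if its ancestry involves the three groups of a single sub-triangle.

```python
import math

# Assumed layout (illustrative only): unit-sided equilateral triangle,
# with the fourth group (NA) at its centroid, as in the example plot.
VERTICES = {
    "EU": (0.0, 0.0),
    "EA": (1.0, 0.0),
    "AF": (0.5, math.sqrt(3) / 2),
    "NA": (0.5, math.sqrt(3) / 6),
}


def to_xy(proportions):
    """Place a sample at the weighted average of its groups' vertices."""
    total = sum(proportions.values())
    x = sum(p * VERTICES[g][0] for g, p in proportions.items()) / total
    y = sum(p * VERTICES[g][1] for g, p in proportions.items()) / total
    return x, y


def signed_area(p, q, r):
    """Signed area of the triangle p, q, r."""
    return 0.5 * ((q[0] - p[0]) * (r[1] - p[1]) - (r[0] - p[0]) * (q[1] - p[1]))


def read_fraction(point, apex, base_a, base_b):
    """Fractional distance from the base (base_a to base_b) to the apex,
    found by projecting the point along a line parallel to the base."""
    a, b, c = VERTICES[apex], VERTICES[base_a], VERTICES[base_b]
    return signed_area(point, b, c) / signed_area(a, b, c)


# The red spot from the example: 50% NA, 35% East Asian, 15% European.
red_spot = to_xy({"NA": 0.50, "EA": 0.35, "EU": 0.15})
for group, base in (("NA", ("EA", "EU")), ("EA", ("EU", "NA")), ("EU", ("NA", "EA"))):
    print(group, round(100 * read_fraction(red_spot, group, *base)), "%")
```

Reading the red spot back against each of the three axes recovers the 50/35/15 split it was plotted from.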


Slide 2: AncestryByDNA 2.5 Results in African Parentals and Simulated Africans versus African Americans


Figure A

Figure B

Figure C

Here we see three plots of ancestry measurements in various sets of African and African American samples.

The first is for parental African samples: the samples from Western sub-Saharan Africa that we used to find the markers we employ and to estimate the allele frequencies of these markers. When you look at a triangle plot, keep in mind that 100 spots at the exact center of the plot appear the same as 1, so all you can really appreciate here is the spread away from the center, not the proportion of results showing non-African admixture.

In Figure A, we see that the parental samples plot as of relatively homogeneous African ancestry, as expected.

Figure B shows simulated samples. While it is hard to collect 10,000 sub-Saharan African samples, it is easy to simulate them. Since we know the allele frequencies for the markers in the African population, we can use them to combine marker sequences into combinations (i.e. simulated individuals) of all probabilities, in proportions based on these probabilities, representing what we would expect to find in a homogeneous Western sub-Saharan African population. Determining the ancestry proportions for these simulated individuals, and plotting them as we have in B, gives us an indication of the statistical error inherent to our assay, which arises because our marker alleles are continuously distributed among the populations (i.e. in our analogy, the shops do not sell items of just one color, but sell items of all colors with a strong bias toward one of them). We see from Figure B that this statistical error is relatively low (about 0.34%), and similar to that observed for the parentals.
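The simulation idea can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the procedure behind Figure B: it uses only two groups, made-up allele frequencies, markers drawn independently (Hardy-Weinberg proportions), and a grid-search maximum-likelihood estimate standing in for the actual admixture algorithm. The names freq_a, freq_b, simulate_individual, and estimate_ancestry are hypothetical.

```python
import math
import random

random.seed(0)
N_SNPS = 179  # marker count mentioned elsewhere on this page

# Made-up allele frequencies for two groups; NOT the real panel values.
freq_a = [random.uniform(0.75, 0.95) for _ in range(N_SNPS)]
freq_b = [random.uniform(0.05, 0.25) for _ in range(N_SNPS)]


def simulate_individual(freqs):
    """Draw a genotype (0, 1 or 2 copies of the allele) at each marker."""
    return [sum(random.random() < p for _ in range(2)) for p in freqs]


def log_likelihood(genotype, m):
    """Log-likelihood (up to a constant) given a fraction m of group-A ancestry."""
    total = 0.0
    for g, pa, pb in zip(genotype, freq_a, freq_b):
        p = m * pa + (1 - m) * pb  # expected allele frequency at this marker
        total += g * math.log(p) + (2 - g) * math.log(1 - p)
    return total


def estimate_ancestry(genotype, steps=100):
    """Grid-search maximum-likelihood estimate of the group-A fraction."""
    grid = [i / steps for i in range(steps + 1)]
    return max(grid, key=lambda m: log_likelihood(genotype, m))


# Simulate homogeneous group-A individuals and see how far the estimates
# stray from 100%; that spread plays the role of the statistical error.
estimates = [estimate_ancestry(simulate_individual(freq_a)) for _ in range(100)]
mean_error = 1 - sum(estimates) / len(estimates)
print(f"mean apparent non-A ancestry: {100 * mean_error:.2f}%")
```

Even though every simulated individual is drawn from a single group, some estimates fall short of 100% because the alleles are shared between groups, which is the effect the shop analogy describes.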

In contrast, when African Americans from North America are plotted (Figure C), we see generally more admixture, which the previous two figures show is statistically significant. Further, there is a tremendous bias toward the European vertex. In other words, the average African American has considerable European ancestry (19.6%), but Africans from Africa do not (only 0.17%). In fact, we know from other researchers that much of this European ancestry came from a directional gene flow from European males to African females (i.e. the European signatures are present on the Y chromosome of many African American males, but not so much in the mtDNA).


Slide 3: AncestryByDNA 2.5 Results in European Parentals and Simulated Europeans versus European Americans


Figure A

Figure B

Figure C

Here we see three plots for measurements of ancestry in various sets of European American (i.e. “Caucasian”) samples.

The first is for parental European samples: the samples from Europe and the United States that we used to find the markers we employ and to estimate the allele frequencies of these markers. When you look at a triangle plot, keep in mind that 100 spots at the exact center of the plot appear the same as 1, so all you can really appreciate here is the spread away from the center, not the proportion of results showing non-European admixture.

In Figure A, we see that the parental samples plot as of relatively homogeneous European ancestry, as expected.

Figure B shows simulated samples. While it is hard to collect 10,000 European samples, it is easy to simulate them. Since we know the allele frequencies for the markers in the European population, we can use them to combine marker sequences into combinations (i.e. simulated individuals) of all probabilities, in proportions based on these probabilities, representing what we would expect to find in a homogeneous European population. Determining the ancestry proportions for these simulated individuals, and plotting them as we have in B, gives us an indication of the statistical error inherent to our assay, which arises because our marker alleles are continuously distributed among the populations (i.e. in our analogy, the shops do not sell items of just one color, but sell items of all colors with a strong bias toward one of them). We see from Figure B that this statistical error is relatively low for the European group, and similar to that observed for the parentals. It is about 3.6%: the determination for the average parental or simulated sample has 3.6% error, which is the same as saying that the AncestryByDNA 2.5 test gives results with about 3.6% error for this group (AncestryByDNA 2.0 had an error of 4.3% for Europeans and European Americans).

Figure C shows about as many European American individuals as there are simulated Europeans in Figure B, yet in this plot we see generally more admixture (8.6%), which Figures A and B show is statistically significant. Further, there is spread toward all three other groups on the plot, not just one as for the African Americans, so the admixture is not as homogeneous as it was for the African Americans. As you can see, many European Americans have low levels of Native American, East Asian, or African ancestry.


Slide 4: AncestryByDNA 2.5 Results in East Asian Parentals and Simulated East Asians


Figure A

Figure B

Figure C

In Figure A, we see that the parental East Asian samples plot as of relatively homogeneous East Asian ancestry, as expected. None of the samples was determined to be of fractional African ancestry.

Figure B shows simulated East Asian samples. While it is hard to collect 10,000 East Asian samples, it is easy to simulate them. Since we know the allele frequencies for the markers in the East Asian population, we can use them to combine marker sequences into combinations (i.e. simulated individuals) of all probabilities, in proportions based on these probabilities, representing what we would expect to find in a homogeneous East Asian population. Determining the ancestry proportions for these simulated individuals, and plotting them as we have in B, gives us an indication of the statistical error inherent to our assay, which arises because our marker alleles are continuously distributed among the populations (i.e. in our analogy, the shops do not sell items of just one color, but sell items of all colors with a strong bias toward one of them).

We see from Figure B that this statistical error is relatively low for the East Asian group, and similar to that observed for the parentals. It is about 4.4%: the determination for the average parental or simulated East Asian sample has 4.4% error, which is the same as saying that the AncestryByDNA 2.5 test gives results for East Asians with about 4.4% error. As you can see, most of the error is along the East Asian/Native American axis (and none of it is along the East Asian/African axis). East Asian/Native American ancestry is more difficult to resolve than other pairs because these groups shared relatively recent common ancestors. If East Asian/Native American ambiguity is relevant to a customer's ancestry, it should be clearly represented by the confidence contours of the customer's triangle plot, which stretch further along the Native American/East Asian axis than along other axes.


Slide 5: AncestryByDNA 2.5 Results in Native American Parentals and Simulated Native Americans


Figure B

Figure C

In Figure A, we see that the parental Native American samples (from isolated regions of Southern Mexico) plot as of relatively homogeneous Native American ancestry, as expected. None of the samples was determined to be of fractional African ancestry.

Figure B shows simulated Native American samples. While it is hard to collect 10,000 Native American samples, it is easy to simulate these samples. Since we know the allele frequencies for the markers in the Native American population, we can use them to combine marker sequences into combinations (i.e. simulated individuals) of all probabilities, in proportions based on these probabilities, representing what we would expect to find in a homogeneous Native American population. Determining the ancestry proportions for these simulated individuals, and plotting them as we have in B, gives us an indication of the statistical error inherent to our assay due to the fact that our marker alleles are continuously distributed among the populations (i.e. in our analogy, this is represented by the fact that the shops do not just sell items of one color, but sell them of all colors with a strong bias for one of the colors).

We see from Figure B that this statistical error is relatively low for the Native American group, and similar to that observed for the parentals. It is about 3.2%: the determination for the average parental or simulated Native American sample has 3.2% error, which is the same as saying that the AncestryByDNA 2.5 test gives results for Native Americans with about 3.2% error. As you can see, most of the error is along the East Asian/Native American axis (and none of it is along the Native American/African axis). East Asian/Native American ancestry is more difficult to resolve than other pairs because these groups shared relatively recent common ancestors. If East Asian/Native American ambiguity is relevant to a customer's ancestry, it should be clearly represented by the confidence contours of the customer's triangle plot, which stretch further along the Native American/East Asian axis than along other axes.


Slide 6: Parental Samples Used to Calibrate AncestryByDNA 2.5

Since there is no such thing as a completely pure population, how is it that we can accurately measure allele frequencies? In other words, how can we use European Americans as a parental group if European Americans exhibit a low level of systematic admixture?

We use a Bayesian methodology described by Pritchard et al., 2000 (“STRUCTURE”; see also Pritchard et al., 2001 and Rosenberg et al., 2002) to identify individuals within our parental group that do not exhibit homogeneous ancestry (in other words, they show detectable admixture and are therefore not parental samples). This method infers population structure and assigns individual samples to groups without requiring the structure to be defined a priori. “It assumes a model in which there are K populations (where K may be unknown), each of which is characterized by a set of allele frequencies at each locus, and assigns individuals … (probabilistically) to a population, or jointly to two or more populations if their genotypes indicate that they are admixed”. We use it to identify individuals in the parental set who are themselves admixed. Once identified, these samples are eliminated, and the allele frequencies are determined using only individuals of homogeneous affiliation with their parental group. It is important to point out that STRUCTURE uses a different method than we use to determine admixture proportions, but the results generally agree with one another (though we developed our method to determine confidence intervals as well).

Here we show the STRUCTURE results for our parental samples: Western sub-Saharan African (red), Native American (green), European (blue), and East Asian (yellow). Samples with more than 2% admixture were eliminated prior to calculating allele frequencies. How many samples does it take to obtain an accurate allele frequency? Surprisingly few. We used about 60 (after elimination of admixed samples) for each group, but the allele frequencies obtained for any subgroup of 25 are imperceptibly different.
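The filter-then-estimate step can be illustrated with a short Python sketch. The record layout (an "ancestry" dictionary of STRUCTURE-style fractions and a "genotype" list of allele counts), the toy values, and the function names are hypothetical; only the 2% cutoff comes from the text above.

```python
ADMIXTURE_CUTOFF = 0.02  # samples with more than 2% admixture are dropped


def keep_parentals(samples, group):
    """Keep samples whose ancestry outside `group` does not exceed the cutoff."""
    return [s for s in samples
            if 1.0 - s["ancestry"].get(group, 0.0) <= ADMIXTURE_CUTOFF + 1e-9]


def allele_frequencies(samples):
    """Frequency of the counted allele at each marker (2 alleles per person)."""
    n_markers = len(samples[0]["genotype"])
    return [sum(s["genotype"][i] for s in samples) / (2 * len(samples))
            for i in range(n_markers)]


# Toy records: hypothetical ancestry fractions plus genotypes coded as
# copies (0, 1 or 2) of one allele at three markers.
samples = [
    {"ancestry": {"EU": 1.00}, "genotype": [2, 1, 0]},
    {"ancestry": {"EU": 0.99, "NA": 0.01}, "genotype": [2, 2, 0]},
    {"ancestry": {"EU": 0.90, "NA": 0.10}, "genotype": [0, 0, 2]},  # dropped
]
parentals = keep_parentals(samples, "EU")
print(len(parentals), allele_frequencies(parentals))  # 2 [1.0, 0.75, 0.0]
```

In this toy set the third sample, at 10% admixture, is excluded, and the frequencies are computed from the remaining two.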

We have performed simulation studies that test the sensitivity of the test results to systematic over- or underestimation of allele frequencies. We simulated an error in estimating allele frequencies that influenced only the measurement of SNPs informative for a particular pair of groups. For example, of the 179 SNPs, 60 or so are particularly informative between European and Native American ancestry, and each is informative to a varying extent (this is measured with a value called the delta value). Adjusting the allele frequencies for just these two groups, such that the delta value is diminished by 20% only in the European/Native American dimension, changed the admixture proportions for all groups in actual samples by less than 1% on average. We have performed this simulation on all possible pairs of groups, with similar results, showing that the test is relatively insensitive to systematic allele frequency estimation errors such as those that may be generated in the parental sample selection process. This is likely because we use so many markers of extreme delta value.
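As a small illustration of the perturbation described above, the Python sketch below takes delta to be the absolute difference in allele frequency between two groups (its usual meaning in admixture studies) and reduces it by 20%. How the frequencies were actually adjusted is not stated here; moving both frequencies symmetrically toward their midpoint is an assumption, as are the function names and the example frequencies.

```python
def delta(p_a, p_b):
    """Delta value: absolute allele-frequency difference between two groups."""
    return abs(p_a - p_b)


def shrink_delta(p_a, p_b, reduction=0.20):
    """Move both frequencies toward their midpoint so delta drops by `reduction`.

    Assumed adjustment scheme; the source does not say how delta was diminished.
    """
    mid = (p_a + p_b) / 2
    scale = 1 - reduction
    return mid + (p_a - mid) * scale, mid + (p_b - mid) * scale


# Hypothetical European / Native American frequencies at one SNP.
p_eu, p_na = 0.90, 0.10
q_eu, q_na = shrink_delta(p_eu, p_na)
print(round(delta(p_eu, p_na), 3), "->", round(delta(q_eu, q_na), 3))  # 0.8 -> 0.64
```

Re-running an ancestry estimate with the perturbed frequencies for one pair of groups, and comparing it with the original estimate, is the kind of sensitivity check the paragraph describes.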


Pritchard JK, Stephens M, Donnelly P. Genetics. 2000 Jun;155(2):945-59.

Pritchard JK, Donnelly P. Theor Popul Biol. 2001 Nov;60(3):227-37.

Rosenberg et al. Science. 2002 Dec 20;298(5602):2381-5.