AncestryByDNA™ - User Manual
Please click link to go to manual
What follows is the User Manual for the genomics test you
have purchased. To fully understand your results, we recommend that you read
this user manual from the beginning to the end. To view your data files you
will need the Adobe Acrobat Reader 5.0 Program available free of charge at
http://www.adobe.com/products/acrobat/readstep2.html.
1. How the test works?
To determine your ancestry,
we have extracted DNA from your buccal sample (mouth swab) the AncestryByDNA™ test to determine the sequence of your DNA at a large number of different positions.
The buccal sample you returned to us contained thousands of cells, and each
of your cells contains your DNA. Though we are all 99.9% identical at the level
of our DNA sequence, there are certain regions of each chromosome that are
different from person to person. These regions are called genetic markers or
Single Nucleotide Polymorphisms (SNPs), and it so happens that some small fraction
of these SNPs are also different amongst the world’s continental population
groups. These types of markers are best termed Ancestry Informative Markers
(AIMs) and they constitute less than 5% of our genetic material, which is
related to the very recent common origin of our species. To help you better
understand why this is the case, we recommend the book: The Great Human Diasporas;
The History of Diversity and Evolution, by Cavalli-Sforza. The positions
in your DNA we have sequenced are those we have discovered that are different
in this way, and they are spread across all of your chromosomes. That is
why it is called a "genomics" test " it is a survey of all
of your genetic material, which is known as your "genome". In other
words, we have sequenced markers from chromosome 1, 2, 3 … 22,. Until
AncestryByDNA™, genetics tests for genealogy or personal interest have been
restricted to just one chromosome, like the Y-chromosome or the mitochondrial
DNA. As such, these other tests offer information that is very different
from AncestryByDNA™. It is not that this information is incomplete or defective,
it’s simply different information and it is useful for other purposes.
Your DNA was derived from your mother and father, and theirs
was derived from their mothers and fathers, and theirs from their parents and
so on. You have 23 pairs of chromosomes. Your maternal copy of chromosome 1
could have been passed through your mother from your maternal grandmother OR
your maternal grandfather, but which one you received was randomly determined
at conception (you could not have received both). Most of the time this chromosome
1 copy that you receive from your mother is actually a chimeric chromosome
that includes parts from your grandfather and your grandmother. Recombination
is the process by which these chimeric chromosomes are created and occurs
at least once on each chromosome every time a new sperm or egg cell is made.
As such, although blending does not occur at the level of the gene (the unit
of trait expression) our chromosomes are mixed together and so our genomes
contain segments of DNA from all of our ancestors. In contrast, the mitochondrial
DNA or Y-chromosome test can only provide data on one single lineage of ancestors
each generation into the past. For example, 10 generations ago (year 1802
at 20 years per generation), a baby born today has 1024 ancestors. By measuring
your ancestral proportions using a genomics method, we are actually measuring
the average population affiliations of all of these 1024 ancestors. Since
random processes (recombination and independent assortment) at inception
determines the mixing and pairings you harbor, two offspring from a set
of parents may have different sets of chromosome pairs, and therefore different
ancestral proportions even though they were the product of the same male-female
union. For example, an employee of our company is a European male who married
a Hispanic woman and had three children of mixed descent. Each of the children
exhibits their own unique proportionality, which you can see by clicking
on case study.
If these two had an infinite number of children, the average would correspond
to that proportionality exactly between the mother and father, but each child
would deviate from this average by a unique and random amount.
GO BACK TO TOP
2. Statement on Race
Race is a defining issue of modern times in the US, Europe, and many other
parts of the world. The impact of the European colonial period that started
more than 500 years ago has set the tone for the interactions among diverse
populations of the world. Colonization, Genocide, Slavery, Legalized Segregation,
Apartheid, Jim Crow Laws and Concentration Camps are but a few of the atrocities
that are the history of our civilized world and every culture has its own
list to be ashamed of. Given the enormity of these events, their long-term
consequences will take generations to overcome. Modern conceptions of Race,
Racism, and Racialization are some of the fallout of these events.
Part of our mission at DNAPrint® is to work towards the abolition of these
misconceptions and the social injustice that result from Racism and Racialization.
In this light, we are dedicating considerable internal resources to education
regarding the different perspectives (Socio-cultural, Political, and Biological)
on race and the meaning of populations in light of genomic science and biomedical
research.
- Race is not a biological concept. There is not enough genetic differentiation
among human populations to consider them zoological races.
- Race is a social construct. This means that these classifications (black,
white, Hispanic, Jewish) are defined (and redefined) by the prevailing sociopolitical
structure.
- Race is often a great amalgamation of many diverse populations and ethnicities.
- Race is often ascribed only to the minority populations.
- In the US, any minority population ancestry is dominant and the
person is completely of the minority group (e.g. "the one-drop rule").
- Despite the veracity of points one and two above, since there is a correspondence
among broad racial categories and populations, the conclusion that there
are no average biological differences among any racially described groups
may not be true.
- Racism continues. "In some places, and for some people, overt racism
has given way to implicit racialization and "Colorblind Racism" a
term coined by Dr. Eduardo Bonilla-Silva (Stanford University)."
- Race should not be used as a surrogate for population. Doing so may lead
to over generalization and unfounded stereotypes. A population is the unit
of evolution and refers to a group of persons who generally select mates from
within the group.
- Being respectful is the first important step in not having a racialized
perspective.
- a) Each person is a human being first and foremost. It is disrespectful,
at any level, on the street, in the lab, or in the clinic, to consider
his or her population group first.
b) Populations should be described
(not defined) in precise language that members of the community would
use. We believe that the physical and cultural diversity of the world’s
peoples should be embraced as a valuable and even sacred resource.
Indeed, the genomic variation both within and among populations
is in many ways our Human Biodiversity and will provide important
clues as to the origins, our physiological construction, and the
possible futures of our fragile species.
|
GO BACK TO TOP
3. Understanding BioGeographical Ancestry
The
test provides a research grade estimation of a person’s BioGeographical
Ancestry. BioGeographical Ancestry (BGA) is a means of expressing the proportional
ancestry of a person that is devoid of the ethnic labels and the dichotomous
grouping of persons into racial categories. There are important uses of this
in epidemiological and complex diseases mapping research and in forensic
science. BGA estimates provide a description of a person in terms of ancestral
proportions that are based on the evolutionary and geographical history of
our species. Our recommended book, by one of the leaders in the field of
Evolutionary Anthropology, Dr. Luca Cavalli-Sforza (Stanford University),
details a broadly accepted model of human evolution. It is within this scientific
framework of human origins that the BGA estimation can be understood as a
description of a person’s placement on a Multi-Dimensional Continuum
of Ancestry.
GO BACK TO TOP
4. What the results mean?
Human beings migrated
out of central, sub-Saharan Africa some 200,000 years ago to inhabit various
regions of our globe. These migrants established founder groups that gave rise
to present-day Europeans, Native Americans, Africans, and East Asians. A map
of these human migration patterns can be found in the map.gif file on your
CDROM. If your heritage has been derived from more than one of these groups
the test results tell you what your mixture ratios are. If you do not have
recent admixture, the test identifies which groups you are part of, and confirms
that there is no evidence of recent admixture. It so happens that many people
from places such as Nigeria, Ireland and Japan are of relatively unmixed heritage
(African, European and East Asian, respectively – see
our website at www.AncestrybyDNA.com), but many people from places such the
United States, South East Asia or Latin America are admixed. For example,
Hispanics from Mexico or elsewhere in Central or South America were derived
from the colonial mixture of Europeans, Native Americans, with some proportion
of West African. Native Americans inhabited North and South America from
Alaska to Patagonia. If your great grandfather was a great Aztec warrior,
and of unmixed heritage, you will exhibit at least 12% Native American ancestry
on average. If your great-great grandmother were a sanguine Chinese philosopher
of unmixed heritage, you would be of at least 6% East Asian heritage on average.
Many people from Puerto Rico are heavily admixed – generally showing
significant Native American, African and European mixture. It is important
to point out that these results do not give you any information other than
your ancestral proportions. You should not use your results to make inferences
about your predisposition to respond to a particular drug, or develop a particular
disease.
It is notable that some regions of the world have more complicated histories
than other regions making the concept of ancestry more complicated and even
tedious. For much of our history as a species, we were more mobile than today.
The advent of agriculture, in at least four separate global regions about
10,000 years ago changed this for many people, but did not stop the process
of migration. Indeed, the largest migrations in human history started only
500 years ago with the European colonial period, the trade in enslaved West
Africans, and the colonization of the New World. However, prior to this time
and for millennia people have moved about and particular regions of the world
show traces of these migrations back and forth into and out of continental
and sub-continental regions. Some examples of such regions are East Africa,
North Africa, Central Asia, South Asia, and Insular Southeast Asia. Although
these populations are distinct groups today with languages, cuisines and
cultures that identify them as such, their genetic makeup reflects the long-term
history of migrations from more than one region.
GO BACK TO TOP
5. Your Genotypes and the genotypes.doc file
Your
sequences are provided to you in the genotypes.doc file. Each of the sequences
is comprised of two letters. There are two sequences because you received one
chromosome from your mother and another from your father. DNA is comprised
of nucleotide bases and there are four different types: Guanine (G), Adenine
(A), Thymine (T) and Cytosine (C). Most DNA sites do not differ among individuals,
but the sites we measure are variable in sequence from person to person, as
well as from one population to the next. It is these positions that we measure in
order to estimate your ancestry. Each site has only two possible letters -
they are called bi-allelic sites for this reason. Whereas one person may have
two copies of a "G" at one site (represented
as "G/G"), another may have a G and a C at this same site (represented
as "G/C"). The possible letters for one site may be G or C, but
for another they may be C or T, or A or G. Your string of letter pairs across
all sites measured are most likely quite unique to you, and in a way they
could represent a sort of a genetic bar code for your identification. However,
it is the information they give us about your ancestral heritage that is
of value here.
It so happens that these letters change slowly over time. When our ancestors
migrated to establish the founder groups that gave rise to today’s
continental population groups, the nature of their sequences at these sites
gradually drifted towards one or the other letter. By measuring the frequency
of each letter in each of the descendant ancestral groups of these splinter
groups, we can construct a sort of genetic map distinguishing each. We have
sequenced your DNA at these sites and made an estimate of the proportional
extent to which you share identity with each of four major population groups,
sub-Saharan African, European, East Asian, and Native American.
GO BACK TO TOP
6. Interpreting your results-Introduction
An
important concept to understand when interpreting your data is that even though
we can always derive a maximum likelihood estimate (MLE) of your ancestry from
your DNA, your answer is an estimate of percentages that are similarly likely,
not one specific set of percentages. The MLE is the most likely ancestry mix.
Other percentages are possible, though less likely, and these are shown in
terms of confidence contours on your triangle plot and in your bar graph. This
is because with a genetics test, ancestry can only be estimated in a statistical
sense, much like the track of a hurricane. Anyone who lives in the South East
US knows that it is impossible to project the track of a hurricane exactly,
but that the “cones” of most likely migration are extremely
accurate. The same is true with AncestryByDNA™ 2.0 and 2.5.
Your results are expressed in two separate graphical representations. The
first we will discuss is the Composite Triangle Plot and the second is the
Bar Graph.
GO BACK TO TOP
7. Your Triangle Plot and the triangle.doc file
We
have provided you with a graphical “snapshot” of your results
called a composite triangle plot. The red point on this plot is ca1,183,020led a
Maximum Likelihood Estimate (MLE), and only one is present on the plot. This
MLE represents the best estimate of your ancestral proportions. To read a
triangle plot, you drop a perpendicular line from the vertex (triangle point)
of each triangle to the triangle edge below it.

In the example above, the red dot is the MLE. We have dropped a line from
the Native American vertex to the line below, and this particular line serves
as a scale for the percentage of Native American ancestry – from 0%
at the base to 100% at the vertex (or tip). In this example, the individual
is about 15% Native American (indicated by the hash mark).

In the example above, we have created the scale line for each of the other
two vertices. As with the first figure, for each line the point or vertex
represents 100%, and the base near the line represents 0%. The arrows show
where the circle projects on to each of these lines. In addition to being
able to see that the person is of 15% Native American ancestry in the plot
above, we can see that the person is of 60% European Ancestry and 25% African
ancestry as well. You will notice that the three percentages must add up
to 100%.
The MLE we have plotted is a statistical estimate, and all statistical estimates
have confidence intervals. The MLE tells you what the best estimates of the
proportions are, but in reality, there is a small chance that they are something
else. The confidence intervals tell you what other values are also likely.
In order to know your proportions with 100% confidence, we would have to
perform the test for each region of the variable genome, which would make
the test very expensive. Since we have not, your results are statistical
estimates. We calculate and plot for you all of the estimates that are "2
fold, 5 fold and 10 fold less likely" than the MLE. The way it works
is that the MLE is the most likely estimate. Any point within the first contour
(inner most ring) is up to "2 fold less likely" than the MLE. The
farther from this point, the closer it is to "2 fold less likely".
Any point outside the first and within the second is from "2-5 fold
less likely" and any point within the last contour, but outside the
second contour is from "5-10 fold less likely Any point outside the
last contour is AT LEAST "10 fold less likely" and the farther
from the MLE, the greater the increase over "10 fold".
The greater the number of DNA positions we read, the closer these contour
lines come to the MLE point. On the triangle plot, the likelihood (probability)
that your true value is represented by a different point other than the MLE
decreases as you approach the red dot, where the probability is at its maximum
(hence, it is called the Maximum Likelihood Estimate or MLE). We could perform
the test so that the contour lines are very close to the MLE, however this
would require us to sequence a much larger collection of markers. To keep
the test affordable, we limit the survey to a reasonable number of markers that
are sufficient for you to know with good confidence what your proportions
are. The yellow circle (10 fold contour) is also referred to as the "one-log
interval" and is generally taken as a scientific level of confidence.
The composite triangle plot is read in the same manner as the equilateral
triangle previously reported. The composite triangle holds 4 smaller equilateral
triangles or sub-triangles, and the percentages for any point in each of
these triangles is read as described above. You have all possible 3-population
triangles and you can see the extension of the confidence intervals in all
possible dimensions. Each of the four equilateral triangles within the composite
triangle is called a sub-triangle and the percentage mixture corresponding
to any point within the sub-triangle is determined using the same scaling
method described in above for a single triangle plot. The most likely result
is plotted in the center sub-triangle of the composite triangle. Any point
within the composite triangle represents a mixture percentage for three groups
at a time, but the reason for expressing the data in this new triangle is
that all possible groups of three can be presented on a 2-dimensional sheet
of paper. The MLE is found at the center of the 2-fold confidence interval
marked by the black line. Points within each sub-triangle that fall within
the black line represent BGA proportions that are from 1.1 to 2 times less
likely to be the true value than the MLE (the greater the distance from the
MLE the greater the reduced likelihood). Points within each sub-triangle
that fall within the black and blue lines represent BGA proportions that
are from 2.1 to 5 times less likely to be the true value than the MLE (the
greater the distance from the MLE the greater the reduced likelihood). Points
within each sub-triangle that fall within the blue and yellow lines represent
BGA proportions that are up to 10 times less likely to be the true value
than the MLE (the greater the distance from the MLE the greater the reduced
likelihood).
In the example plot shown here, the individual is 92% European and 8% Native
American. For this individual, there is a noticeable indentation in the confidence
contours away from the African axis, showing that there is relatively little
chance that the individual has African ancestry. The confidence contours
in the Native American direction are broader, and indeed this individual
has Native American ancestry. The test is "saying" that there is
a chance this person has 16% Native American ancestry, or even 0% Native
American ancestry, rather than 8% but that its much less likely (at least
10 times less likely) than the 8% Native American answer. There is also fair
spread in the East Asian direction for this individual, indicating that it
is possible, though less likely, that this person has some level of East
Asian heritage.

Print this plot out and fold along the lines such that you connect all three
vertices of the composite triangle to make a pyramid. You will notice that
the confidence contour space is now symmetrical. In fact, this is always
the case. The confidence contour lines are on the surface of the pyramid
but they connect through the center of the pyramid to form a 3-dimensional
shape, which has symmetry, and you can imagine this connection by looking
at your pyramid. Points in the interior of the pyramid are points with 4-population
admixture. Points on the surface of the pyramid are points with 3-population
admixture. How likely it is that a 3-population or a 4-population point represents
the true admixture proportions depends on whether the point falls within
the yellow confidence contour, blue confidence contour or black confidence
contour, and whether the point is on the surface (3-population) or interior
of the pyramid (4-population). In this way, you can imagine the likelihood
of a variety of 4-population mixture results. In fact, people who are most
likely to be 4-population mixture would have discontinuous regions on the
2-dimensional representation of the pyramid. An example of such a person
is shown below:
Click here/image for larger image
However, you will notice that when the triangles are folded and the intervals
on the two sides are connected through the center of the resulting pyramid
(in your mind, looking at the pyramid), the three dimensional shape created
by the contour lines has a large amount of volume. In fact, for people
of 2-population or 3-population admixture, the ratio of the surface area
to volume of the 3-dimensional space is relatively high compared to people
of 4-population admixture, where there is considerably more volume to the
shape.
Other examples of triangle plots: (each of these individuals is mainly European)

GO BACK TO TOP
8. Your Bar Graph and the bar graph.doc file
The data is essentially the same as that shown in your triangle plot, it
is just shown in a different way.
To make the bar graph we plot the MLE values including the values within
the 2-fold confidence range, one group at a time.
For people of simple or polarized admixture, the bar graph is a useful presentation
because it provides a separate, objective view of your possible ancestry
percentages for any one particular group "one group at a time. Customers
seeking to confirm a great- great-grandparent of Native American ancestry
who have obtained an MLE showing 100% European find the bar graph useful
in understanding the statistical meaning of their results and how it may
still be possible (though less likely) that there is a small amount of Native
American ancestry. For people of more complex admixture, (i.e. 4-population
admixture, such as 30% European, 30% African, 20% Native American and 20%
East Asian, which might be obtained from a person with a Dominican father
and a Philippine mother), the MLE is computed using a more complex 4-D methodology.
Because the triangle plot projects the results in terms of most likely 3-population
mixture results, and because individuals of 4-way admixture are obviously
more complex, on a group-by-group basis, the bar graph is more informative
for the latter.
How is the bar graph generated? The bar graph is made by taking the MLE
values from the 3-D triangle plot method (for individuals of polarized ancestry)
or from the 4-D algorithm (for individuals of more complex, or even admixture).
The confidence ranges are established for each ancestry group by searching
the likelihood calculation of all 3-population or 4-population admixture
combinations and finding the highest and lowest values for each group that
fall within the 2-fold likelihood range (0.3 Log base 10) of the MLE.
Example: The bar graph below shows the results for a person of 2-population
admixture. This person’s most likely ancestry mix (MLE) is 55% Native
American, 45% European. There are confidence ranges extending above and below
the tip of each bar. These ranges show the other percentages this person
could be, though any percentage is up to 2-times less likely than the percentage
indicated by the blue bar. Though this person is most likely to be 55% Native
American, 45% European, they could also possibly be 65% Native American,
35% European though it is 2-times less likely than the MLE value of 55% Native
American, 45% European. You will notice that there are confidence bars for
the East Asian (EA) and African (AF) groups too. Thus, it is possible that
this person is 54% Native American, 44% European, 1% East Asian and 1% Western
sub-Saharan African, though again, it is significantly less likely than the
most likely estimate of 55% Native American, 45% European.
Click here/image for larger image
People of 1-population or 2-population admixture would most likely show
one or two blue bars, respectively. People with 3-population or 4-population
mixture would most likely show 3 or 4 bars, respectively. A person that shows
3 bars may still be a 4-way mix if there is a confidence range for that fourth
group, its simply less likely (2 times less likely) that they are a 4-population
mix incorporating a value for the fourth group within the range than the
indicated 3-population mix.
GO BACK TO TOP
9. Simple versus complex admixture
The composite triangle plot shows your Ancestry mix. To make the triangle
plot we assume that your pedigree since the time the major human populations
diverged from one another is no more complex than a 3-population mixture
of ancestries. The central triangle shows the most likely 3-population mix
(each vertex or point of the triangle is a group, and a triangle has 3 such
vertices/points). Even if your most likely ancestry mix is a function of
only 2 groups (i.e. your MLE falls on an edge of the triangle), for most
people, there will be confidence contours extending in the 3rd dimension
(toward the 3rd vertex) in this triangle, and the distance these intervals
will extend towards this vertex is greater than the distance they would extend
toward the vertex for the 4th group (this should be visible in one of the
three other triangles around the central triangle). In other words, your
DNA sequences optimally fit with one particular 3-population ancestry combination.
The particular point/MLE is the most likely outcome given your DNA sequences
and is indicated by the red point that is located within the 10-fold, 5-fold
and 2-fold confidence contours.
The bar graph will communicate approximately the same ratios as presented
in your composite triangle plot. However, the bar graph provides you a completely
different view of your results. For the bar graph, we calculate the likelihood
of ancestry affiliation using the triangle plot algorithm and plot the values
from your most likely triangle. If the value for another triangle is within
a log base 10 (within 10-fold) of the MLE from the best triangle, the level
of the 4th group is also plotted in the bar graph. The data is basically
the same as presented in the triangle plot. Your most likely estimate,
or "answer", is still the set of percentages provided in the results
table, and these percentages should be about the same as those provided in
the bar graph, but the view provided by the bar graph is different. The reason
we provide you with the bar graph is that it may be useful to some customers
because:
- Looking at ancestry percentages on a bar graph is easier to understand for some customers.
- The representation of ancestry levels within one log
of likelihood in terms of bars illustrates the equivalence of readings
that are similar in likelihood and that the answer is really not only
one simple set of percentages but a range of likely percentages with
one that happens to be the most likely. This is a concept many of our
customers have had trouble with in the past.
- Showing the fourth group if its likelihood score is
within 10-fold of the third allows for greater sensitivity in detecting
affiliation for some groups in some customers.
|
For example, a person who believes their great-great-grandfather was an
American Indian but who obtained an MLE result on their triangle plot of
100% European with confidence contours that extend towards the Native American
(and/or East Asian) axes only narrowly may find that in the bar graph the
2-fold confidence range is more substantial. This may lend credence to their
hypothesis.
Each point in the triangle plot is a specific and unique 3-population ancestry
mix. What if your ancestry is more complex " that is, individuals from
all four ancestry groups have contributed to your pedigree since the time
the major human populations diverged from one another? In this case, the
MLE from the triangle plot will be an oversimplification of your ancestry.
Instead we calculate your MLE using a more complex 4-way algorithm, but we
still provide you the triangle plot that carries information that is diagnostic
of 4-population admixture. For the individuals of 4-population admixture
the following apply:
- If your heritage is best explained as a function of 4
groups you will most likely see discontinuous (separated) confidence
contour regions on your composite triangle plot. In other words, you
will see contours in more than one of the 4 sub-triangles. After you
print out your plot and fold it up into a 3-D pyramid, you will see
that the confidence regions for the different triangles are now shapes
on faces of a pyramid that can be united within the pyramid to form a
3-dimensional shape or space. This space would be rather large compared
to that for individuals of homogeneous ancestry (their spaces are
normally confined to the tips of the pyramid, and the surface
area-to-volume ratio is higher for their spaces than yours). Points
within the 3-D space (within the pyramid) are likelihood estimates for
4-dimensional admixture. Your MLE is determined using the 4-dimensional
algorithm, and the percentage values should roughly correspond to the
center of this shape. Whether any other point within this space is
10-fold, 5-fold or 2-fold less likely than the MLE at the center is
determined by which of the 3 confidence contours they fall within (the
black contour which is the 2-fold likelihood space, the blue shape
which is the 5-fold likelihood space or the yellow shape which is the
10-fold likelihood space).
- If your heritage is best explained as a function of 4 groups you would see
four blue bars on your bar graph. Individuals of homogeneous ancestry, or 2
or 3-population admixture will always have confidence ranges (the lines above
and below the tip of the blue bar) for each of the 4 groups, though they may
be very small, but they never show 4 blue bars.
- If your heritage is best explained as a function of 4 groups, it will have
been determined using the 4-D algorithm and the bar graph will exhibit 4 blue
bars with corresponding confidence ranges. This is your most likely estimate
(MLE) of ancestry admixture.
|
If a 4-D algorithm works best for people of complex admixture then why don’t
we simply calculate the 4-population admixture percentages for each customer
whether they are of homogeneous, 2-population, 3-population or 4-population
admixture? Mainly because the 4-D algorithm is not as robust for individuals
with 3-population or simpler admixture; for these individuals the simultaneous
consideration of 4 groups produces too much statistical noise that would
impact the quality of the results by a couple of percent. We have concluded
from internal research and development that when the number of classification
groups increases from 3 to 4, a larger number of genetic "markers" is
needed to provide results of comparable confidence as evidenced by the comparison
of AncestryByDNA™ 2.5 (a 175 marker test) from the previous version of the
test, AncestryByDNA™ 2.0 (a 71 marker test).
GO BACK TO TOP
10. Your Data Table and the results.doc file
The composite triangle and bar graph are tools for you to visualize your
data. The results of the MLE are reported in the data table as in this example:
SAMPLEID : XXXXXXX
| ESTIMATE |
ANCESTRY |
| 90% |
European |
| 10% |
Native American |
| 0% |
Sub-Saharan African |
| 0% |
East Asian |
The MLE percentages of European, Sub-Saharan African, East Asian, and Native
American are shown. In the example above, the person was determined to be
90% European, 10% Native American, and the MLE is plotted at the position
of the triangle plot indicated by these proportions. The bar graph would
show solid blue bars at 90% for European and 10% Native American. Unlike
the table however, the triangle plot and bar graph show the confidence regions
around the MLE, which is, as we have mentioned, are very important for properly
interpreting your results.
The ancestral proportions we report to you are a function of probability,
and the MLE is the best estimate. Going back to the triangle, the further
from the point estimate on the triangle, the less likely that point represents
your true values. The farther from the red dot (the MLE) in any direction,
the lower the probability. Therefore, your plot is an estimate rather than
a precise fixed number, but it is the most likely estimate based on the data
produced from many regions of your genome.
Why is it that it is not possible to determine what your proportions are
exactly? When drawing conclusions from DNA one must use statistics. To do
this without using statistics, we would have to go back in time 200,000 years
and keep track of every one of your ancestors since the origin of our species
in Africa. Clearly, this is impossible. Since you inherited your DNA from
your ancestors, your ancestral proportions are written in your DNA, but this
information must be statistically inferred from the DNA sequence. If we measured
1,000 or 10,000 genetic sites in your DNA as opposed to the 200 or so we
measure today, your confidence intervals would be smaller but the MLE would
probably be very similar (in the triangle plot, the confidence ring would
shrink around the MLE point). The costs of measuring more genetic sites can
quickly become out of hand making the price to the consumer unrealistic.
Thus, in conclusion, the result of our test is your MLE. Though the MLE is
a statistical estimate, and there is a small chance your true proportions
are slightly different from that of the MLE, the MLE is the best estimate.
If you want to keep it simple, simply use this MLE when describing your genetic
heritage. An alternative to understanding more in terms of your family’s
heritage is to have more persons in your family (parents, grandparents, siblings)
take the standard AncestryByDNA™ test.
GO BACK TO TOP
11. Your Sequences and the sequences.doc file
In the sequences.doc file, we have provided you some markers so that you
can cut and paste them into various types of gene search engines. By doing
this, you can learn about:
- The human genome project
- How our human DNA sequences are similar to, and differ from those of other organisms.
- How cancer research is conducted through genetic research
- And other things
|
The sequences are just a few of those that we measure in order to determine
your ancestral proportions. For any genetic position we measure, the variant
sequence is flanked by invariant sequence (remember, 99.9% of our DNA is
identical from person to person!). For example, if we measured whether you
have a C or a T at position 25 in the following sequence:
GACCTACCATGATAAATTCCTAGG[C/T]GAGGAAGCTACTACACTGAGTTTAT
The site we actually sequence is indicated by the [C/T], and all of the
other letters represent DNA nucleotides that are invariant. That is to say,
they are the same from person to person. Since these invariant nucleotides
have no impact on the estimate of your ancestral proportions, there is no
reason to measure them. We provide you with this view of your sequences because
you can use them to have fun and learn more about genomics at the same time – here
is how.
The human genome project has spawned the development of myriad DNA databases,
and you use the sequences in a search through these databases. We have all
paid for the construction of these databases, and they are yours to use freely.
Their main function is as a tool for scientists to better understand why
it is that certain people are afflicted with diseases or respond to drugs
differently.
Many of the databases can be reached through the website of our partner,
Geospiza Corp. at http://www.geospiza.com/outreach/organelles/index.html.
Geospiza operates this site as a community service, and its main function
is to help educate the layperson. In addition to searching with your sequences,
you can click on links that will help you understand how the databases were
constructed, and what information each offers.
For a basic slide show introduction to genomics and DNA databases please
visit: http://www.geospiza.com/outreach/organelles/mutation/slide1.html
To screen your sequences against the human genome, all you need to do is
copy and paste your sequences into the database query forms provided through
the http://www.ncbi.nlm.nih.gov/ site (which can be accessed through the
http://www.geospiza.com/outreach/organelles/index.html site). Click on the "BLAST" selection
on the top menu and you have entered the human genome database search engine.
Select "Standard nucleotide-nucleotide BLAST", and then simply
paste your sequence (one sequence per query) into the Search box and click
BLAST! A progress page will appear, and you will need to click the "Format!" Button
to see your results. Learn more about the region of the chromosome corresponding
to this sequence by clicking on the blue links.
GO BACK TO TOP
12. The scientific foundation and validation of the test
The science behind the test has been published in the scientific literature
(for references, please see our web site at www.AncestrybyDNA.com). We have
determined the frequency of DNA sequence variants in the various human populations
(some of this data can be seen on the website), and by determining your sequence
for each; we can determine the probability that you identify with each group.
The test has been evaluated using a large number of people from a wide range
of ancestral groups, and the estimates correspond well to what is known from
anthropological and historical data. For example, Hispanics are known to
have arisen as an ethnic group from the blending of colonial Europeans with
Native Americans, and the hundreds of Hispanics we have tested align with
these two groups almost exclusively, as expected. As another example, though
most Nigerians plot as of unmixed African BioGeographical Ancestry (BGA),
African Americans plot more as a mixture between this group and Europeans,
which is also what would be expected from what we know about the admixture
between Africans and Europeans in the US. The method has also been validated
through pedigree challenge; when the BGA is determined from a mother and
father, that of their children should plot somewhere between the two. To
date, we have tested numerous family pedigrees, and the ancestral proportions
of offspring always plot somewhere among those of their parents. When outside
agencies blindly test the MLE estimates, they prove to be excellent estimates
of ancestral proportions.
When tested against known pedigrees, the AncestryByDNA™ 2.5 test performs
quite well. The data for individual A is presented below. His wife, individual
B, is Hispanic and she was determined to be of mostly Native American ancestry
but with some European and African heritage. This was also expected based
on what we know from anthropological origin of the Hispanics (which were
derived from the union of Spanish explorers, Native Americans, and West Africans
in Colonial Caribbean and Latin America). Each of the 3 children is plotted
roughly half way amongst both parents, as expected. None of the children
exhibit East Asian ancestry. The results of the children were consistent
with those of the parents, and the MLE’s are accurate estimates when
tested against what is known from biographical data.
Experiment: AncestryByDNA™ 2.5 Blind trials on samples from families.
Purpose: To determine how well the test results agree with expectations formed
from appreciation of a family pedigree.
Summary of Results:
Family 1
| Individual A |
EUROPEAN |
93 |
EAST-ASIAN |
0 |
NATIVE-AMERICAN |
7 |
| Individual B |
EUROPEAN |
7 |
AFRICAN |
22 |
NATIVE-AMERICAN |
71 |
| C1 |
EUROPEAN |
47 |
AFRICAN |
15 |
NATIVE-AMERICAN |
38 |
| C2 |
EUROPEAN |
60 |
AFRICAN |
2 |
NATIVE-AMERICAN |
38 |
| C3 |
EUROPEAN |
57 |
AFRICAN |
6 |
NATIVE-AMERICAN |
37 |
| S |
EUROPEAN |
86 |
EAST-ASIAN |
0 |
NATIVE-AMERICAN |
14 |
| M |
EUROPEAN |
81 |
EAST-ASIAN |
7 |
NATIVE-AMERICAN |
12 |
| F |
EUROPEAN |
92 |
EAST-ASIAN |
0 |
NATIVE-AMERICAN |
8 |
Family 1: The father is F and mother is M. Both exhibit some Native American
admixture (8% and 12%, respectively). Their children, Individual A and S
also both exhibit some Native American admixture (7% and 14%, respectively).
Due to the law of independent assortment (the test markers span 22 pairs of chromosomes),
these results are reasonable. For example, if one of the children measured
with 75% Native American, or 20% East- Asian, these results would be unreasonable.
Individual A married his wife, who is Mexican and they had three children,
C1, C2, C3 Each of these three children have 38%, 38% and 37% Native American
ancestry, respectively, with the balance EUROPEAN and African. Again, these
results are quite reasonable given the fact that the father was mostly EUROPEAN
with slight Native American admixture and the mother was Hispanic, and mostly
Native American with EUROPEAN and African admixture.
For more information on how MLE’s read for individuals of various
mixed populations, please see www.AncestrybyDNA.com (product info section).
You will find that the results are in good agreement with what is known from
the anthropological history of each population and that all of the evidence
we have to date suggests that AncestryByDNA™ test results are highly accurate.
GO BACK TO TOP
13. Accuracy and Precision - Simulations
The question to answer when developing an admixture test is how many AIMs,
and what quality of AIMs are required to minimize the incidence of individuals
with MLEs of artificial or erroneous admixture. How frequently they are encountered
for each group, given a specific battery of AIMs, is best determined through
simulations.
If we assume that the loci are unlinked (which they have been determined
to be) and the mating between individuals of a given population are random,
the multi-locus genotype frequencies in each population are determined from
the allele frequencies and an equation known as the product rule. Using these
allele frequencies, we simulate a population of 100,000 European, East Asian,
African and Native American individuals, draw 10,000 from each population
and use the 71 and 175 marker AncestryByDNA™ tests (2.0 and 2.5, respectively)
to calculate the BGA proportions for each simulated sample selected.
With a perfect test, using discretely distributed alleles, each simulated
individual would type as 100% affiliation with their own group. In the real
world, using markers that are continuously distributed, there is a level
of statistical noise. The purpose of the simulations is to define what that
level of noise should be expected.
Below we show the average percentage affiliations for each type of simulated
sample. The total admixture average is the sum of the average admixture percentages
(columns) for samples of a particular simulated population (represented in
the rows). For example, for the 71-marker test, we calculate the average
level of European, East Asian and Native American admixture in simulated
Africans to be 0.96%, 0.1% and 0.87%, respectively.
AncestryByDNA™ 2.0 (71 AIM's)
| |
AFR |
EUR |
EAS |
NAM |
Total |
Total admixture Avg. |
| Africans |
98.07 |
0.96 |
0.1 |
0.87 |
100 |
1.93 |
| Europeans |
0.08 |
95.63 |
2.25 |
2.04 |
100 |
4.37 |
| East Asians |
0.03 |
2.45 |
92.98 |
4.54 |
100 |
7.02 |
| Native Americans |
0.01 |
1.83
|
3.63 |
94.98 |
100 |
5.47 |
| |
Avg |
4.70 |
AncestryByDNA™ 2.5 (175 AIM’s)
| |
AFR |
EUR |
EAS |
NAM |
Total |
Total admixture Avg. |
| Africans |
98.21 |
0.93 |
0.71 |
0.15 |
100 |
1.79 |
| Europeans |
0.4 |
96.36 |
1.5 |
1.74 |
100 |
3.64 |
| East Asians |
0.08 |
1.43 |
95.48 |
3.01 |
100 |
4.52 |
| Native Americans |
0 |
1.16 |
2.08 |
96.76 |
100 |
3.24 |
| |
Avg |
3.30 |
From this table one can see that the average level of artificial admixture
using the 71 marker test is about 5% and the level using the 175 marker test
is lower, at about 3%. Looking at the European row in the 175-marker table,
we can see that the average simulated European sample exhibits 1.5% East
Asian ancestry and 1.74% Native American ancestry. This suggests that the
average (but of course, not every) real person who is 100% European will
show 1.5% East Asian ancestry as statistical noise, and 3.64% non-European
admixture in total as statistical noise. Some will show higher levels, and
some will show 0%. One can use these values as a guide for interpreting their
result. For example, if a European suspects a small amount of NAM admixture
using the 175 marker test and obtains a reading of 1%, they can see that
the average simulated European has the same level, and so the data does not
support (nor refute) the NAM admixture.
We can look at the simulation data in another way – by looking at
the variation of results in simulated samples. How common is it that a simulated
European (or other) individual exhibits 5% or greater African (or other)
ancestry? The results are shown below.
71 marker test
| |
AFR |
EUR |
EAS |
NAM |
Avg. Outside Group |
| >15% |
1 |
0.0012 |
0 |
0.0008 |
0.0007 |
| Africans |
0.0006 |
1 |
0.0342 |
0.0306 |
0.0218 |
| Europeans |
0 |
0.029 |
1 |
0.1225 |
0.0505 |
| Native Americans |
0 |
0.0177 |
0.0873 |
1 |
0.035 |
| |
Avg. = 0.027 |
175 marker test
| |
AFR |
EUR |
EAS |
NAM |
Avg. Outside Group |
| >15% |
1 |
0.0048 |
0 |
0 |
0.00016 |
| Africans |
0.00096 |
1 |
0.005268 |
0.0158 |
0.00734 |
| Europeans |
0 |
0.00165 |
1 |
0.050 |
0.01722 |
| Native Americans |
0 |
0.001 |
0.021 |
1 |
0.00733 |
| |
Avg. = 0.0081 |
We see that the about 2.7% of simulated samples show 15% or greater artificial
admixture using the 71 marker test, and that less than 1% of simulated samples
show 15% or greater artificial admixture using the 175 marker test. Reading
across the European row, for the 175-marker test, we see that .5% of European
individuals registered with 15% or greater East Asian admixture. One can
use these values also as a guide for interpreting their result. For example,
if a European suspects a small amount of NAM admixture using the 175 marker
test and obtains a reading of 17%, this table shows that only 1.5% of simulated
Europeans exhibited NAM affiliation this high, and so based on the average
MLE, there is about a 98.5% chance that this result indicates real NAM admixture
(of course this is based on the average MLE, and a customer should also refer
to their individualized confidence contours on their triangle plot).
We can make tables like this for all of the possible values X>5%, X>10%,
X>15% . . . all the way to X>50%. All 40,000 samples were completely
affiliated with their own group at levels of 50% or greater. Down to X=5%,
we see it is not uncommon for simulated samples to show affiliation with
another group at this level of X. From all of these tables, we can compute
the rough percentage value necessary to conclude with 95% certainty that
partial group affiliation means real affiliation and not statistical noise.
The tables below show the threshold of 95%
confidence for each type of admixture in simulated individuals of
homogeneous ancestry. A reading at or above X (where X is a % value in
a cell of this table) means with 95% certainty that the reading is
caused by affiliation with that group in the column, as opposed to the
alternative, that the individual is really homogeneously affiliated
with their group and the partial affiliation is the result of
statistical noise. For example, using the 175 marker test, admixture
greater than or equal to 10% Native American is required for an
individual of polarized (i.e. mainly) European ancestry to conclude
with 95% confidence that there really is Native American admixture as
opposed to there being none (and no other admixture). Using the
71 marker test, one must see 12.5% NAM affiliation to conclude with 95%
confidence that there really is Native American admixture as opposed to
there really being none (and no other admixture).
Of course, a customer could get a reading of 8% Native American with the
71 marker test, which is below the 12.5% level threshold, and it still be
true that there is Native American admixture. Such a person should type other
members of their family. If the level of NAM increases going up the family
tree, and if the level is above the threshold in some of the ancestors, it
is probably a real indication of NAM admixture for the customer, even though
the 8% is below the 95% confidence threshold. In fact, 8% falls near the
90% threshold (we don’t show the 90% threshold on the website), not
the 95% threshold, meaning that on its own (regardless of values in family
members), the 8% is an indicator of NAM ancestry with 90% (not 95%) confidence.
Table SIMSUM175 Threshold of affiliation percentages for samples of polarized,
binary affiliation, above which results indicate fractional affiliation with
a p < 0.05, using the 175-marker admixture test.
| |
AFR |
EUR |
EAS |
NAM |
| Africans |
<2.0% |
7% |
5% |
<2% |
| Europeans |
3.50% |
<3.0% |
9% |
10% |
| East Asians |
<2.0% |
8% |
<2.0% |
12.50% |
| Native Americans |
<2.0% |
7.50% |
11.50% |
<2.0% |
Table SIMSUM71 Threshold of affiliation percentages for samples of polarized,
binary affiliation, above which results indicate fractional affiliation with
a p < 0.05, using the 71 AIM admixture test.
| |
AFR |
EUR |
EAS |
NAM |
| Africans |
<2.0% |
8% |
2% |
<8% |
| Europeans |
2% |
<2% |
13% |
12.5% |
| East Asians |
<2.0% |
11.5% |
2% |
17.50% |
| Native Americans |
<<2.0% |
9% |
17.5% |
<2% |
Of course, there is nothing magical about a 95% vs. 90% confidence interval
- and one might argue that genealogists rely on much lower levels of confidence
considering non-genetic data. The threshold values required to conclude a
bona-fide affiliation with 90% certainty are about 2/3rds those shown above
Results compared to the expectations of amateur genealogists
In 2003, Charles Kerchner established a website through which customers
could report their expected (from traditional lines of research) and observed
(from AncestryByDNA™ 2.0) admixture proportions along with explanations and
comments on the nature of their evidence. The data below was taken from Charles
Kerchner’s project as of Jan. 2004, and you can view his site at http://www.dnaprintlog.org/.
From a sample of 108 genealogists, we are able to compare the expected and
observed ancestry levels and note any interesting trends. As shown in the
tables below, the average amateur genealogist expected 6.6% Native American
admixture and got 5.1% using the 71-marker test (Table GENEXPOBS). Of all
the amateur genealogists, 69% of those expecting Native American ancestry
got a value from the 71 marker test that was within 10% of the level they
expected (Table ADMIX10).
Table GENEXPOBS . The average percentage of fractional BGA affiliation expected
by 108 amateur genealogists compared to the average percentage this sample
of genealogist actually obtained using the 71 AIM BGA admixture test (AncestryByDNA™ 2.0)
Expected versus Observed levels n=108
| |
EUR |
EAS |
NAM |
AFR |
| Expected |
0.858 |
0.02 |
0.066 |
0.056 |
| Observed |
0.844 |
0.055 |
0.051 |
0.051 |
Table ADMIX10 Precision of MLE against genealogist expectations using the
71 AIM BGA test (AncestryByDNA™ 2.0)
Admix value within 10% expected value
| |
EUR |
EAS |
NAM |
AFR |
| All samples |
0.71 (n=108) |
0.83 (n=108) |
0.81(108) |
0.96 (n=108) |
| Expecting admix. |
0.7 (n=105) |
0.7 (n=10) |
0.69 (64) |
0.77 (n=17 ) |
Though this shows that the average individual gets what they expected, not
shown in these tables is the fact that the standard deviation (avg. difference
from expectations) was relatively high, and so there were many individuals
that got significantly different results than they expected. Part of the
standard deviation is due to the anecdotal nature of some genealogical expectations;
scanning over Mr. Kerchner’s site gives you an idea of the differences
in quality of the expectations between individuals – some are better
than others. Some of the standard deviation is due to the test. Low levels
of NAM or EAS admixture were commonly expected – many times at levels
not much different from the threshold values shown above, suggesting that
the test assigning low levels of EAS/NAM admixture differently than some
genealogists expected them to be assigned. Though the simulations do not
specifically address individuals with low levels of admixture (they address
individuals of homogeneous ancestry), data from our labs show that the data
for the former is not much different and that of the latter and that the
relatively high standard deviation is not due solely to statistical error
inherent to the test – in other words, there may very well be an anthropological,
sociological, psychological or other type of explanation.
GO BACK TO TOP
14. Test Accuracy
The genotypes (nucleotide letters)
we have determined for you are quite accurate. Because we use the latest genetic
reading equipment available, we routinely achieve a greater than 99% accuracy
for each site. If an accurate value was not obtained for you at a particular
site, you will see an "FL" instead
of your letters for that site. Having a few of these generally does not prevent
us from making a good ancestry estimate, but of course having too many would.
Some reasons you may have an "FL" for a site include
- A small region of your chromosome around this site is missing or is
of different sequence character than for most. This result is not uncommon
given the highly variable nature of the chromosomal positions we measure,
and it certainly does not imply you have any sort of defect in any way whatsoever
(in fact, it may be an indication of your uniqueness).
- We did not get enough DNA from your swab. Some markers are more sensitive
to this than others. If there are too many "FLs" for your read-out,
we will not be able to determine your ancestry proportions to a degree
of accuracy that we would like, and in this case we will have to asked
you to submit another sample for a second try.
|
Experiment: Repeated estimation from the same samples with AncestryByDNA™ 2.0(ABD).
Purpose: To determine how reproducible the results are by measuring the
proportions in the same individuals on different occasions.
Results:
| plate3-BD101-Data.INP |
EUROPEAN |
100 |
EAST-ASIAN |
0 |
NATIVE-AMERICAN |
0 |
| plate5-BD101-Data.INP |
EUROPEAN |
100 |
EAST-ASIAN |
0 |
NATIVE-AMERICAN |
0 |
| plate3-BD304-Data.INP |
EUROPEAN |
85 |
AFRICAN |
15 |
NATIVE-AMERICAN |
0 |
| plate5-BD304-Data.INP |
EUROPEAN |
86 |
AFRICAN |
14 |
NATIVE-AMERICAN |
0 |
| plate3-BD316-Data.INP |
EUROPEAN |
72 |
AFRICAN |
27 |
EAST-ASIAN |
1 |
| plate5-BD316-Data.INP |
EUROPEAN |
79 |
AFRICAN |
20 |
EAST-ASIAN |
1 |
| plate3-BD3162-Data.INP |
EUROPEAN |
100 |
EAST-ASIAN |
0 |
NATIVE-AMERICAN |
0 |
| plate5-BD3162-Data.INP |
EUROPEAN |
89 |
EAST-ASIAN |
5 |
NATIVE-AMERICAN |
6 |
| plate3-BD317-Data.INP |
EUROPEAN |
79 |
EAST-ASIAN |
21 |
NATIVE-AMERICAN |
0 |
| plate5-BD317-Data.INP |
EUROPEAN |
84 |
EAST-ASIAN |
16 |
NATIVE-AMERICAN |
0 |
Summary of Results:
Variation in percentages is a result of failed markers. This test shows a
5-6% variation for the absolute percentage in any one group. Since this
experiment we have been using 5 samples on each run as internal controls.
The average variation is 2-3% for these controls. Another group of 11 have
also been tested repeatedly, and these show an average 2-3% variation for
10 samples, and an average 5% variation for the other sample. Best estimate
from all of the data on repeated measurements is on order of 3-4% variations
for most determinations if the individual tested has failed markers. Thus,
if your profile came back as 96% EUROPEAN and 4% East-Asian, it is debatable
whether the 4% East Asian is significant and would also be addressed by
the confidence contours.
GO BACK TO TOP
15. Percentages and physical appearance:
We have noted that individuals exhibiting physical characteristics of a
population group generally have at least 30-35% identity with that group.
For example, persons with an 85% EUROPEAN and 15% African generally exhibit
few if any physical features characteristic of the African group, such as
darker skin. Why is this? The genes that determine physical appearance are
but a very small percentage of the total number of genes in the genome. Thus,
for all of these genes to have sequences characteristic of one group, the
person would need to be of relatively high proportions for that group. The
higher the percentage of African a person is, the more likely the areas of
the genome that determines physical appearance will be of African origin.
GO BACK TO TOP
16. Better understanding human genetics
First of all, let us admit that genetics is a complex subject. You do not
have to be a scientist to understand your results, and if you do not understand
your results from thoroughly reading all of the materials supplied to you,
you may want to do some basic reading on the internet.
The U.S. Department of Energy Human Genome Program is an excellent place
to begin the process of understanding Human genetics.
http://www.ornl.gov/hgmis/publicat/primer2001/index.html
For further reading, we recommend: The Great Human Diasporas; The History
of Diversity and Evolution, by Cavalli-Sforza. It is an informative tool
in understanding migration patterns and ancestral evolution.
People interested in discussing genealogy from DNA should visit this interactive
forum: http://archiver.rootsweb.com/th/index/GENEALOGY-DNA
GO BACK TO TOP
17. Glossary
Ancestry Informative Marker (AIM): AIMs are the subset of genetic markers
that are different in allele frequencies across the populations of the world.
Most polymorphisms is shared among all populations and for most loci the most
common allele is the same in each population.
Allele: Alternate sequences for a particular position in the genome. For
example, a common variation in the genome is for some forms of the sequence
to have Cytosine (C) while other forms have Thymine (T). Thus, since we
have two copies of each chromosome, there are three genotypes at this position
CC, CT, and TT.
Chromosome: The physical units of heredity: long linear strands of DNA.
Humans have 22 autosomal chromosome pairs, plus two sex chromosomes, X and
Y. Men have two copies of each autosome, 1, 2, …, 22, X, Y. Women
have two copies of each chromosome 1, 2, 3, …, 22, X, X. Each person
thus has a total of 46 chromosomes.
Genomics: The study of the complete compliment of genetic material in a
species.
Genome: All of the genetic material in a species. The human genome is approximately
3,300,000,000 base pairs in length.
Locus (pl. loci): The name for a physical position on the genome. Can either
refer to a large region such as a complete gene or a very specific region,
like a particular base pair position.
Polymorphism: The property of having more than one state or alternate sequence
at a particular position. The alternate states are called alleles.
Single Nucleotide Polymorphism (SNP;
pronounced snip): A precise base pair
position where different people are found to vary in sequence. Generally
two alternate alleles are found at a particular SNP. At least 2,000,000 SNPs
are now known and there may be over 30,000,000 in the human genome.
GO BACK TO TOP