MacGregor DNA Project January 2017 update
I used the 2015 update of this
blog to explore the various tests available. Since then there has been a
noticeable increase in the number of individuals taking the test known as ‘Family
Finder’, or something similar, rather than Y chromosome or mtDNA tests.
This is probably the result of quite aggressive marketing, by
Ancestry.com [it has a variety of website endings depending on where it is
based] in particular. That marketing has promoted the equal use of DNA testing for both
males and females and tied it into the submission of family trees, which individual
testers can use to identify the same family name(s) with others who have tested
and submitted their genealogies. What has perhaps been rather glossed over in
this is the fact, firstly, that DNA gets ‘lost’ over time – if it didn’t we
would have the DNA of billions of ancestors in our bodies, and, secondly, that
it is only a tiny portion of our DNA which is currently being examined for
genealogical purposes. In relation to the latter, if you took all the DNA out
of your approximately 100 trillion cells and stretched it out in a long line,
with the DNA in one cell being about 2 metres long, it would reach to the moon and
back roughly 260,000 times. By comparison, if you did a similar process to your
veins, arteries and capillaries they would measure about 100,000 kilometres (or
62,000+ miles), or roughly two and a half times the circumference of
the earth.
You
do not inherit exactly 25% of your autosomal DNA from each of your 4 grandparents.
This is because the DNA each parent passes on to you is a randomly recombined mix
of what they received from their own parents, not an equal share from each, and so
the further you go back in time the smaller the percentage inherited from any one
person in a particular generation becomes, and the more distant the ancestor is the more difficult it
becomes to identify what you received from that person. What then are the
chances of that same bit of DNA being preserved from a specific ancestor in
yourself and someone else? For example, if your name is, say, Smith, and your
male MacGregor ancestor lived 10 generations ago on your mother’s side it is really
not feasible with today’s technology to identify that DNA by looking at your
DNA today. The tests which are offered by Ancestry, Family Tree DNA etc. only
try to identify links to 5/6 generations back. The key thing to remember is
that if you and someone else have, say, people called Brown in your trees it
does not necessarily mean that you have a recent ancestor in common, or indeed
that you have any ancestor called Brown in common at all. For these tests of
connection to work properly, you, and the person you are comparing with, need
to have as much genealogical information
as possible on every ancestral line in your respective trees, going back 5
or 6 generations (as well as a significant shared portion of DNA). This
information can be displayed on a fan chart such as this one, which can be
viewed at:
Fig 1. fan chart example - Perry
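To put rough numbers on the point made above about autosomal DNA thinning out with each generation, here is a minimal sketch in Python. It only gives averages: it assumes every ancestor slot is filled by a different person (no cousin marriages), and the function names are my own invention. Because of random recombination, the real share from any one ancestor can be noticeably higher or lower, and for distant ancestors it can be nothing at all.

```python
def ancestors_in_generation(generations_back: int) -> int:
    """Number of ancestor 'slots' in a generation (1 = parents, 2 = grandparents)."""
    return 2 ** generations_back


def average_autosomal_share(generations_back: int) -> float:
    """Average fraction of autosomal DNA expected from ONE ancestor in that generation."""
    return 1 / ancestors_in_generation(generations_back)


# Grandparents, the 5/6 generation limit of the commercial tests, and the
# 10-generation MacGregor example from the text.
for g in (2, 5, 6, 10):
    print(f"{g:>2} generations back: {ancestors_in_generation(g):>5} ancestors, "
          f"about {average_autosomal_share(g):.3%} each on average")
```

At 10 generations there are 1,024 ancestor slots, so the average share from any one of them is under a tenth of one percent, which is why a single MacGregor ancestor at that distance is so hard to pick out.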
You and the person you are
comparing with both have to put all your family information into one of the
computer genealogy programs (like Family Tree Maker, Reunion etc.), save it as
a GEDCOM file, and upload it to whichever DNA company you have tested autosomal
DNA with. The possible links between the two family trees are then highlighted
in some way which allows a comparison of ancestry to be made in order to
explore if there is a family match on some surname. The fact that there is a
match on surname does not necessarily mean that it is the same family, only
that there is a surname in common. Clearly the more unusual the surname, the
more likely that the match will be with the same family. In my own case the
extensive GEDCOM file that I have made found a name match in another equally
detailed GEDCOM tree which linked to a common ancestor born towards the end of
the 18th century in the Volga German colonies in Russia, although in
fact this was just confirmation of a previously suspected connection found by
traditional genealogical research (I’ll come back to this later). Clearly autosomal
testing COULD be very valuable, despite these caveats, for females who cannot
take advantage of the Y chromosome test which only men can do since it relates
directly to father’s surname (unless of course a father, brother, or cousin who
has the name of interest will do the test on the female’s behalf).
There is another benefit to autosomal testing in that portions of a
person’s DNA can be compared with typical results from other ethnic groups. So
a person might find through this utility (sometimes called My Origins or similar) that they have,
for example, Mediterranean, African, or Native American ancestry as part of
their genetic makeup. This won’t be a link to a specific person (unless of
course this genealogy is already known or at least suspected) but is often of
interest when an individual wants to know the answer to the question ‘where did
I come from?’. This is explored a bit more later in this blog. What we
invariably find is that our most ancient ancestry is incredibly mixed and that
our own personal ethnic makeup links back to many different regions and races. The Economist, discussing the work of
the geneticist Luigi Luca Cavalli-Sforza, remarked that his work "challenges the
assumption that there are significant genetic differences between human races,
and indeed, the idea that 'race' has any useful biological meaning at all".
SNPs
I want to spend most of the rest of this
update considering the current and possible future significance of SNP testing.
An SNP, or single nucleotide polymorphism, is a DNA ‘event’ which functions a
bit like a marker in time. At the moment, I am here talking about SNPs which
occur in the male line (and are often relatable to surnames). As far as is
known, once an SNP occurs it does not re-mutate backwards. Because of this
fact, if we can identify specific SNPs, as a first step, and if we can then date
them, even approximately, we find ourselves on the way to constructing family
trees, not just in prehistory but, potentially, for SNP events which took place
within historic time. More than a decade ago Ken Nordtvedt and Dr John McEwan worked
on ways of grouping STR results [that is, results from the standard male Y
chromosome test] to show how some numerical patterns were constant within
defined groups. This was limited at the time by the fact that only 37 markers
were available in STR tests (whereas now one can test 67, 111 or even more). McEwan
identified 49 groupings, and the one he called R1bSTR-47 came to be known as
‘Scots’. This was the DNA profile based
on 37 markers that was identified as ‘Scots’ (Fig. 2):
Fig 2. The ‘Scots Modal’ and
‘MacGregor Modal’ compared
The lower figure was the modal
figure for the MacGregor group who claimed descent from Ian Cam (who is in the
record of obituaries written down by Sir James MacGregor Dean of Lismore as
having died in 1390). What was exciting at the time was the realisation that
with only three mutations different, the MacGregors seemed to be a group who
had mutated a little way, but not far, from the Scots group. Since then some
commentators and researchers, particularly Alistair Moffat and Jim Wilson in
their book The Scots: A Genetic Journey
have suggested that the MacGregors were actually Picts. Without wanting to go
into the arguments for or against this interpretation in any depth I did want
to ask: if these were the Picts then who are the Scots [this is an inversion of
the way the question used to be asked]? If you look at the various DNA discussion
boards you will see that many people disagree with their interpretation. It is
nonetheless important to repeat it here because the definition of Pict was
based, by Jim Wilson, on his interpretation of the SNP S530 which he found and
named when he was attached to ScotlandsDNA [the company now has various
versions of the same name such as IrelandsDNA, BritainsDNA and so on].
Following on from this, S530 was found to
be equivalent to L1335 [the confirmation came in 2012] and the search was on to
find what SNPs were more recent in time than L1335/S530. Four years ago not enough was known about SNPs
to attempt a time estimate as to when they occurred, but since then the group known
as YFull have given broad estimates, from their research, of possible dates for
each SNP. At this point I am concentrating on just the MacGregor male line but
later in this blog I will make reference to another line to demonstrate how SNP
testing is currently revolutionising our understanding of genetic/clan
genealogy going back into the past beyond the time of written genealogies. What
we still lack are more SNPs coming forward into the time of written records
although there are some SNPs which are beginning to divide up families into
smaller subgroups with the same surname, particularly when associated with STR
results.
At this point I repeat a chart which I
first included in the 2015 blog. The chart was derived by Jim Wilson from work
which he did in relation to clan DNA origins through the ScotlandsDNA company
(Fig. 3).
Fig. 3: The SNP tree of the Scottish clans as at 2014
This helpful map from Wikipedia shows how the clans relate to each other geographically (Fig. 4):
Fig 4. Scottish clan map (Wikipedia Commons)
Taking each of the SNPs from
S530/L1335 onwards, YFull have given the following time estimates:
L1335 (also known as [aka] S530): formed 4300ybp (years before present), time to most recent common ancestor 3600ybp
L1065 (aka CTS11722 or S749): formed 3600ybp, time to most recent common ancestor 1750ybp
S744: formed 1750ybp, time to most recent common ancestor 1750ybp
S691: formed 1750ybp, time to most recent common ancestor 1700ybp
S695: formed 1700ybp, time to most recent common ancestor 1550ybp [or, c. 320 AD to c. 470 AD]
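For anyone who finds 'years before present' awkward to read, the conversion to calendar dates can be sketched as below. The figures are the YFull estimates quoted in the list above. The reference year of 2017 is my assumption (the year of this update), since YFull's own definition of 'present' is not stated here, and the result is rounded to the nearest decade because the estimates are only approximate.

```python
REFERENCE_YEAR = 2017  # assumption: "present" taken as the year of this update

# (SNP, formed ybp, time to most recent common ancestor ybp), as quoted above
ESTIMATES = [
    ("L1335", 4300, 3600),
    ("L1065", 3600, 1750),
    ("S744", 1750, 1750),
    ("S691", 1750, 1700),
    ("S695", 1700, 1550),
]


def ybp_to_calendar(ybp: int, reference_year: int = REFERENCE_YEAR) -> str:
    """Convert years-before-present to an approximate AD/BC label (nearest decade)."""
    year = round(reference_year - ybp, -1)
    if year > 0:
        return f"c. {year} AD"
    # Zero or negative values are treated as BC; at this precision the
    # missing 'year zero' makes no practical difference.
    return f"c. {abs(year)} BC"


for snp, formed, tmrca in ESTIMATES:
    print(f"{snp}: formed {ybp_to_calendar(formed)}, "
          f"common ancestor {ybp_to_calendar(tmrca)}")
```

With these assumptions S695 comes out at c. 320 AD to c. 470 AD, matching the bracketed dates above.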
It is not clear from the above how
YFull distinguish the time frames of the later SNPs, as they seem to be suggesting
that they all arose around the same time. Other commentators on DNA
have speculated that there are approximately 90 years on average between SNP
mutations, but that assumes a
degree of regularity in the occurrence of mutations which we know is not the
case.
If we work in reverse from S690 [see Fig. 3 above], which is the defining mutation of the MacGregor bloodline and which, we
think, could have arisen as late as 1360 AD and perhaps as early as 1200 AD, then,
stepping back through the SNP sequence at roughly 90 years per SNP, S695 might have occurred
as late as 1270 AD and as early as 1110 AD.
Then S691, above S695, as late as 1180 AD or as early as 1020 AD. Ian Cam
MacGregor descendants will notice that there is no mention of S697 here. We
simply don’t know enough about this SNP or FGC17830 which seem to be typical of
the MacGregor main line to say anything definite about them at this point. Indeed,
we know little about S701,
FGC17831, S703, FGC17832, S27834, BY144, FGC17829, S27835, and BY143 which are currently
found as positive ONLY in the SNP results of MacGregor bloodline and in two
Buchanan men. (By the way, it is very interesting that almost all other
Buchanan men descend from the next level up on the ancestral tree as shown by
the results collected and displayed on the excellent site run by Alex
Williamson.) The section for descendants of SNP L1335 can be found on
Williamson’s site at:
Fig 5.
MacGregor/Buchanan section of Alex Williamson’s The Big Tree
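The back-calculation from S690 described above can be sketched in a few lines of Python. This is only an illustration of the arithmetic: the 90-year interval is the rough average quoted earlier, which we know mutations do not actually follow, and the function name is my own.

```python
AVERAGE_YEARS_PER_SNP = 90  # rough average quoted above; real intervals are irregular


def step_back(latest: int, earliest: int, steps: int = 1,
              interval: int = AVERAGE_YEARS_PER_SNP) -> tuple:
    """Shift a (latest, earliest) AD date range back by a number of SNP steps."""
    return latest - steps * interval, earliest - steps * interval


s690 = (1360, 1200)        # suggested range for S690, from the text
s695 = step_back(*s690)    # one step further back
s691 = step_back(*s695)    # and one more

print("S695:", s695)
print("S691:", s691)
```

Changing `interval` shows how sensitive the dates are to that one assumption, which is exactly why they should be treated with caution.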
If YFull are right in their estimates then it is
quite possible that these SNPs [from S701 onwards] fill the gap between c. 500 AD
and c. 1300 AD, implying that MacGregors remained a homogeneous group from 500 AD
onwards, BUT this seems highly unlikely, to say the least. Looking at
Williamson’s trees it is quite clear that there are many SNPs about which we
need more information. Williamson shows the following as the sequence of SNPs
from L1335 (and this same sequence is shown as part of the Scots Modal Panel
which can be ordered from YSEQ):
L1335/S530 > L1065 > Z16325 > S744 > S691 > S695 > S701
> S690
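To show how a sequence like this is used in practice, here is a small Python sketch that takes a set of SNP results and works out the deepest confirmed SNP on the path, plus the next untested one below it. The results in the example are hypothetical, the function names are my own, and this is not how FtDNA or YSEQ make their suggestions; it only illustrates the logic that a positive result further down implies positives above it, while a negative result closes off everything below.

```python
from typing import Dict, List, Optional

# Sequence from L1335 down to S690 as given above (oldest first).
SNP_PATH = ["L1335", "L1065", "Z16325", "S744", "S691", "S695", "S701", "S690"]


def deepest_confirmed(results: Dict[str, bool],
                      path: List[str] = SNP_PATH) -> Optional[str]:
    """Most recent SNP on the path tested positive, stopping at the first negative."""
    deepest = None
    for snp in path:
        result = results.get(snp)   # None means 'not tested'
        if result is False:
            break                   # nothing below a negative can be positive
        if result is True:
            deepest = snp
    return deepest


def next_to_test(results: Dict[str, bool],
                 path: List[str] = SNP_PATH) -> Optional[str]:
    """First untested SNP below the deepest confirmed one, if any."""
    deepest = deepest_confirmed(results, path)
    start = path.index(deepest) + 1 if deepest else 0
    for snp in path[start:]:
        if snp not in results:
            return snp
    return None


# Hypothetical results for one tester.
example = {"L1335": True, "S691": True, "S690": False}
print(deepest_confirmed(example))  # S691
print(next_to_test(example))       # S695
```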
Fig 6: Section of
the S691 descendant SNPs from YSEQ
This image, which is taken from the YSEQ website, shows
just how complex the clans’ family tree is becoming, and how it is changing as we learn
more and more about SNPs. It is clear that the relationships which exist
between Scottish clan groups will eventually be refined in such a way that the
traditional stories of clan origins will either be confirmed or refuted. So,
even though now we cannot say with absolute confidence what the exact sequence
of SNPs leading from L1335 to S690 is and cannot assign secure dates for their
first appearance it will only be a matter of time before the confidence levels
on all this become much clearer. It is remarkable how much has been achieved in
just four years.
For
comparison, I attach another small section from Alex Williamson’s Big Tree – in
this case for the McLeans (Fig. 7). We can see in this that the family groups
are beginning to divide up based on individual SNPs separating family groups. The
McLeans, like the MacGregors, have a small number of individuals who are quite
separate from the mainline, in having the SNP M222 which often has Irish
connections, or other SNPs which seem to indicate adoption of the name by
different families during the time of change from patronymics to surnames.
Fig 7: McLean
section from Williamson The Big Tree
What are patronymics?
The use
of patronymics versus surnames is not always well understood, so a brief
discussion here might be helpful. Years before surnames became common in
Scotland an individual might be referred to by his geographical location – so
in Clan Gregor we have John of Glenorchy who flourished in the 13th
century. It has always been believed that this John was one of the first
MacGregors, but without a surname we have no real way of knowing if he was
indeed a MacGregor – there’s always the possibility that he was an early Campbell.
After the middle of the 14th century, and in Britain following the
devastation wreaked by the Black Death, not only did individuals move around
rather more than before but individuals were less tied by servitude to
landowners. In Scotland individuals who
managed to acquire some land-holding capability began to use surnames to
identify family groupings, whereas, the common people around them would continue
to be identified by their family relationships in the formula ‘the son of
and grandson of’ (or ‘daughter of’ in the case of females) another male forename.
Thus, whereas the most important MacGregor might be known as Patrick MacGregor
of Glenstrae, the under tenants, who might, or might not, be distantly related
to him, would be known as, for example, Patrick McEwan VicConachie [son of
Ewan, grandson of Duncan]. As late as
the middle of the 18th century the rental documents of the
Menzies estates were still referring to males by their patronymics, but, by the
end of the century all tenants had acquired and been identified by fixed
surnames. These families might indeed be genetically MacGregors but they might
equally be genetically Menzies or Drummond, or they might never have been
genetically linked to one of the main families in that area at all. It is
because of this variety of means by which names were acquired that clans
include individuals who have a variety of genetic origins.
DNA testing companies and their products
Y chromosome tests:
It can
be very difficult for people new to DNA testing to work out what the best test
is for them. The answer depends on what question a person wants answered. If a
male wants to know about his surname and its origins, then the Y chromosome
test is in practice the only option. Very little is gained however by simply
doing 12 or 25 markers because comparisons with other people with the same
surname are only effective when comparing 37 or more markers. Many of the
programs which help to make comparisons with another individual’s marker scores
work best with 67 or 111 markers, and some do not work with fewer than 67
markers. The advantage to participants of an FtDNA (Family Tree DNA) Y
chromosome project is that the company has the largest publicly available
database of Y chromosome results for comparison. Surname projects allow direct
comparisons with others, but it is possible to keep results private and not
visible to the general public, and in any case there is no way to identify an
individual testee based on kit number alone.
One of
the results of Y chromosome testing is that an initial assignment of the
results to a haplogroup is made. For the majority of Western Europeans this will be R1b-M269 while most others will be I-M253
[originally I1], or I-M223 [originally I2] (both associated with Scandinavia, and originally from the Balkans),
and R-M512 (or R1a), whose origins lie
in the Steppes. The code after the hyphen names the SNP that defines a specific
group (for example, R1b-M269 is of Western Atlantic origin). Some companies,
however, such as 23andme, still classify individuals by alphabetic lettering.
Thus, in my own case 23andme labels my paternal line as R1b1b2a1a2f* ‘a
subgroup of R1b1b2’ which is a ‘subgroup of R1b1’. R1b1b2 is then described in the
accompanying explanation as:
· Age: 17,000 years
· Region: Europe
· Example Populations: Irish, Basques, British, French
· Highlight: R1b1b2 is the most common haplogroup in western Europe, with distinct branches in specific regions.
Hopefully the earlier part of this blog has shown
why this doesn’t say much about recent genetic connections.
For I1 [I-M253], for example, 23andme include
generalised results for some famous individuals, including Leo Tolstoy and
Warren Buffett. It’s unfortunate that some newspapers then choose to interpret
such results as ‘meaning’ that a named person is ‘related’ to Warren Buffett
(yes they are, but probably cousins at 10,000-20,000 years distance), or, even
worse, that a named person is a ‘descendant’ of the historical figure (and that is extremely unlikely unless it
happens to be Genghis Khan!!). In this way completely false stories about DNA
relationships spread.
The
origins and spread of these haplogroups are shown on the attached map found in
several forms on the internet:
Fig 8: Origin and
Spread of haplogroups R1a, R1b and I
What about Big Y or Full Genome testing?
Big Y
FtDNA advertise their Big Y as follows:
“Nearly 25,000 known SNPs, placing you deep on the haplotree.
10 Million base-pair coverage - more than any other Y-DNA test on the market.
Find SNPs that may be completely unique to you.
Explore your deep paternal ancestry
Help the community uncover new, undiscovered SNPs.
Use your newly discovered SNPs to help grow the haplotree”.
Whereas FGC (Full Genome Corporation) offer:
The “GenomeGuide, a whole genome test for ancestry
purposes, and Y Elite 2.1 a comprehensive test” of a person’s “Y chromosome. Y
Elite 2.1 determines those markers (i.e. SNPs and STRs) that are most useful”
for a person’s “paternal ancestry”.
Both these tests aim to locate SNPs on a male Y chromosome and may
include SNPs classified as ‘private’, meaning that at this point in time they
have only been found in a single or very limited number of individuals, and
their exact significance to the more general tree or to an individual’s
personal family tree has yet to be confirmed.
It will be clear from the
above product descriptions that FGC’s offer is more comprehensive (and they
have other versions which probe the Y chromosome even more thoroughly, but cost
considerably more). The essential problem lies in identifying which test, if
any, gives the most useful information. Some project administrators make
suggestions as to which more comprehensive test to take, or, they highlight
specific SNPs that an individual might choose, but, these usually build on
previous testing rather than being aimed at people starting to look at SNP
testing for the first time. A good starting point is to observe what SNPs
others in a group have already tested (FtDNA show these as ‘confirmed SNPs’ in
green). Individuals who don’t know where to start with SNP testing do need to
look for help from a project administrator regarding which SNP(s) to choose. If
we take M269 (for group R1b) for example, in many projects in FtDNA this will
show in red, meaning the SNP is predicted but unconfirmed. Normally the
prediction is correct. If starting from this point probably the best thing to
do, short of going straight to one of the two big tests mentioned above, is to have SNP L21 tested for positive or
negative. If a person is L21 positive and doesn’t want to go down the line
of Big Y or FGC testing then the next step, having looked at any confirmed
green entries for SNPs in the sheets of excel data for people lying nearby in
the grid, is to go for an L21 SNP Panel either with FtDNA or with YSEQ.com (but
using the latter will require a new registration and a new sample, although
their pricing is competitive). If SNP testing is done with FtDNA, their results
program will usually suggest what the next SNP tests might be. At a certain
point in testing it is definitely worth (if only financially) trying an
appropriate SNP Panel. For instance,
results in the STR Y chromosome tables for a surname project which lean towards
L1335 suggest that the L1335 panel would be a good SNP Panel to test. Both FtDNA and YSEQ
offer L1335 panels as well as individual SNPs (but doing one SNP at a time can
get expensive).
Just to re-emphasise: the
advantage of the more comprehensive tests is that ‘private’ SNPs are often identified. Sometimes these
are unique to an individual but sometimes they will be found in several
individuals and therefore they may well define a discrete family group from
within the historic period. However, in order to identify these as belonging to
more than one person, other people who seem to be closely related (when looking
at the other DNA male line results) need to test for the same ‘private’ SNPs.
Many surname groups are working to try to identify these ‘private’ SNPs for
family groups, both to advance genealogical links and to save participants
some money!
Health related issues
Most DNA testing companies do not give reports
which include information about health risks. Exceptionally, 23andme have
offered health related reports in the past but after difficulties in America
with the FDA they suspended these reports, but later reinstated some for the
non-American market. These tests do not have a genealogical component and
therefore will not be discussed further.
Ethnic mix
As mentioned earlier, FamilytreeDNA, through its MyOrigins report, ScotlandsDNA through Ancestry Painting, Ancestry.com through
the AncestryDNA test, and 23andme
through Ancestry Composition, all,
with some variations in reporting procedures, aim to give an individual a
‘picture’ of his or her ancestral connections with populations around the
world. Results naturally vary considerably from almost 100% European to real
mixtures of different ancestral backgrounds including American Indian, Far
Eastern, African and so on. Ancestry for example says that their DNA test ‘looks at a person's entire genome at
over 700,000 locations’ and covers ‘26 ethnic regions’. Ancestry.com claim to have ‘more
than 2 million people’ in their database and ‘the unique ability to connect
with Ancestry’s billions of historical records and millions of family trees’. For
further information on these tests and how they report see my 2015 blog on this
site.
Discovering ‘distant relatives’
The reference to Ancestry.com in the above
paragraph was deliberate. On the one hand the ability to contact other members
part of whose DNA is the same as one’s own is clearly attractive. This is
exactly what I was referring to in the opening paragraphs of this blog. The
difficulty is that Ancestry does not remind you to check that the information
you receive from others is actually accurate. Many a false genealogical
connection has been made through eagerness to get back as far as possible. What
many people do not realise is that the
written records on which genealogies are constructed can be missing for some
areas of the world. Even in Scotland the records for the counties in the
very north are missing for many localities before 1800 and almost universally
before 1750. Wars and carelessness, as well as the wide dispersal of the
populations in remote locations meant that children might well not ever be
baptised, or if they were, it was done
whenever the minister happened to be in the locality. However, it was the
parish clerk’s job to keep the records, not the minister’s, and the parish
clerk might be tending his cattle 20 or more miles away. The same cautionary statement holds true of
FtDNA’s Family Finder in that what
appears in an imported GEDCOM file only represents the family researcher’s work
and, as with all internet genealogy, needs to be checked for accuracy.
Fig 9: Screen grab
from Family Finder proposed matches
In this screen grab from Family Finder, for the sake of privacy and data protection, I have removed
the picture details of matches including the email of the individual whose
family includes an individual related to my own family. The match is Charlotta
Major; she is not an ancestor in my line, but her father Konrad, born in 1797,
is. This then is not an mtDNA link
(and in any case the person who is my match has not tested this, nor, being
female, could she test the Y chromosome); it is an autosomal link through a male
ancestor who is my mother’s great grandfather. I have, however, been unable to
identify any links with the other individuals listed as matches.
Which company then?
As I said earlier the choice of company depends
entirely on what question or questions you want answered:
Fig. 10: DNA testing company list
I have not drawn out trees based
on STR Y chromosome results this year [that is, those that appear in a chart
for people in, for example, an R1b group as having a number sequence like 13,
24, 14, 10, 11,14 etc.] since these results are too diverse and complex when
making a comparison between surname
groups in the project which now has over 1200 participants, and sometimes
even too variable within a surname project name subgroup [as for example Greer,
Grier, Grierson in the MacGregor Project]. In short, there are now too many
people in the project to do comparison charts that would have any real meaning.
Also, the amount of detail would be far too great to permit any links to be
seen. Because of this I repeat here my usual offer in relation to those who
have tested their Y chromosome through STR tests. If you wish me
to run a comparison with other participants, then please state the group or
individuals with whom you wish to be compared and I will make a personalised graph
for you and help you interpret the results. Please note though that it is only
feasible to compare like with like (i.e. 67 markers with 67, 37 with 37). As
usual my email address is richardmcgregor1ATyahoo.co.uk (substitute @ for AT).
Please contact me offline also for advice on SNP test choices. Could members of
the Ian Cam MacGregor group [the bloodline group] please note that the terminal
SNP for the group is currently S690 and we do not yet have any ‘private’ SNPs
to recommend, other than S696 and S698 which seem to be carried only by the
Glencarnock line, and may have arisen in the last 250-300 years. Apart from the
two known carriers of these SNPs other members of the Ian Cam group who have
tested these SNPs have found them to be negative.
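For those curious about what a like-with-like STR comparison involves, here is a minimal Python sketch. It assumes the simplest possible measure, adding up the step differences marker by marker, which is not necessarily how FtDNA or any other company calculates genetic distance. The marker values are made up for illustration (the first kit borrows the example sequence quoted above), and the function names are my own.

```python
from typing import List


def genetic_distance(kit_a: List[int], kit_b: List[int]) -> int:
    """Sum of step differences across markers; both kits must have the same markers."""
    if len(kit_a) != len(kit_b):
        raise ValueError("compare like with like: both kits need the same number of markers")
    return sum(abs(a - b) for a, b in zip(kit_a, kit_b))


def differing_markers(kit_a: List[int], kit_b: List[int]) -> int:
    """Number of markers on which the two kits disagree at all."""
    if len(kit_a) != len(kit_b):
        raise ValueError("compare like with like: both kits need the same number of markers")
    return sum(1 for a, b in zip(kit_a, kit_b) if a != b)


# Hypothetical six-marker extracts; a real comparison would use 37, 67 or 111 markers.
kit_a = [13, 24, 14, 10, 11, 14]
kit_b = [13, 25, 14, 10, 11, 12]

print(genetic_distance(kit_a, kit_b))   # 3
print(differing_markers(kit_a, kit_b))  # 2
```

The check on length is the point: a 37-marker result cannot sensibly be scored against a 67-marker one without first trimming both to the markers they have in common.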