Welcome to my MacGregor Project DNA blog for 2019. This year I am going to focus in part on some of the problems presented by DNA testing for genealogy and, in some cases, suggest not so much solutions as possible ways forward. I will be dealing, for the most part, with genetically homogeneous groups from the MacGregor Project: all of these groups contain some MacGregor surname participants, participants with related or sept-based surnames, and those whose surnames have no currently known MacGregor connection, other than perhaps by tradition. It is important to understand the difference between a related surname and a sept of a clan; I explored this issue in last year’s blog, so I won’t repeat it here. As for the processes of DNA testing for genealogy, the current Wikipedia entry is very helpful, and it can be accessed at:
I hope it is not too simplistic to say that one of the key ways in which DNA testing for genealogy will improve in the future is by more people doing higher-level tests. This is certainly true for the male Y chromosome (only males have the Y), since the potential of the Big Y test is still relatively unexplored and having more participants at this level of testing is essential for the future.
Having said that, I must give great credit here to all those who are working to develop analyses and interpretations, including, for the MacGregor project, Professor Neil McGregor in Australia. If you have already tested Big Y, you should send your results to Neil (neilmcgregorATtpg.com.au, substituting @ for AT) if you are from a MacGregor line (any spelling) or if the DNA evidence suggests that your surname was once MacGregor but changed as a result of Proscription (the banning of the name from 1603-60 and 1692-1774). In addition, you should send your results to Alex Williamson for inclusion in The Big Tree (see https://www.ytree.net/). This tree only covers branches under the SNP R-P312 and is therefore not applicable to other groups, such as the I groups, which are connected with the Scandinavian countries (and, by implication, the Vikings). However, Alex’s tree does deal with results which account for at least 70% of British and Irish Y-DNA, if not more. You can join the discussion of this at http://mcewanjc.org/scotsr1b.htm.
You will also find helpful pictorial analyses of population densities at https://www.eupedia.com/genetics/britain_ireland_dna.shtml.
Those of you most interested in the information on MacGregor SNPs should skip to the end of this article, where I discuss what progress has been made (this will be updated later in the year). I want to begin by dealing with two specific groups. In the analyses that follow you will need to have your kit number to hand, as I am not labelling the charts with surnames (with two exceptions) but will mention surnames in the text of my analysis. You can easily find the surname for each kit number by looking at the relevant subgroup in the project: www.familytreedna.com/public/macgregor. The first two groups are what I have termed Viking 1 and Viking 2: these are separate genetic lines generally associated with Denmark (Viking 1) and Norway (Viking 2). I should point out that I have only done analyses at the level of 37, 67 or 111 markers and have had to exclude 12 and 25 markers, as these are too general and not specific enough to generate discrete results.
The Viking Y chromosome groups
In the Viking 1 group, at the 37-marker level, we see a common point of origin (indicated by the star shape) in an individual who probably bore the SNP I-M253 (Figure 1).
From that individual many branches have come, some of which split to show a connection to a common ancestor more recent than the one who bore I-M253. In general, the closer together the splits are and the further out along a line they come, the more recent the shared ancestry, but this does not mean that everyone has the same surname. Compare this with the 67-marker chart (Figure 2), which has fewer lines since those who only did 37-marker testing have dropped off. 67 markers make links much more obvious, especially for splits at the ends of lines.
Fig. 2: Viking 1 group at 67 markers Y chromosome
In Figure 2, of the three results in the group that includes kit 101344, two are surnamed Skinner and are therefore closely related, but one is Young; it looks as though one or other surname was adopted quite late. At this point I should mention that the programmes I use (Dean McGee’s DNA Comparison Utility and Splitstree [details at the end]) group individuals by common characteristics that are not always evident when looking at the number sequences on the Project subgroup grid. In other words, a mutation or two early in the number sequence (particularly in the first 25 markers) can make the FtDNA project grid place a result away from others who might be more closely related, whereas the Splitstree charts are not affected by where in the sequence a mutation falls and tend to ignore smaller variations. As a second example, the group above which includes kit 377721 are all related and share the surname White (however spelt). The genealogies for these three kits connect Scotland, north-east England and Kent. N73491 has the surname Kellogg, associated with Essex (across the river Thames from Kent). If this is accurate, one interpretation is that there were two Viking brothers who went separate ways, one to north Britain and one south (or both south, and then one went north!), but that really is speculation.
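The charts above rest on comparing kits marker by marker. As a rough illustration only (not the method used by FtDNA, Dean McGee’s utility or Splitstree), the Python sketch below computes a simple step-wise genetic distance, the sum of the differences in repeat counts, for every pair of kits; a table of pairwise distances like this is the kind of input a network program can draw from. The kit names and marker values are invented, the haplotypes are shortened to eight markers purely to keep the example readable, and real tools may treat some markers (for instance multi-copy markers) differently.

```python
from itertools import combinations

# Invented 8-marker haplotypes (repeat counts); NOT real project data.
KITS = {
    "kit_A": [13, 24, 14, 10, 11, 14, 12, 12],
    "kit_B": [13, 24, 14, 10, 11, 14, 12, 13],
    "kit_C": [13, 25, 14, 10, 12, 14, 12, 14],
}


def genetic_distance(a, b):
    """Step-wise distance: sum of absolute repeat-count differences.

    A simplification: it ignores any special handling real tools apply
    to multi-copy or otherwise unusual markers.
    """
    if len(a) != len(b):
        raise ValueError("haplotypes must cover the same markers")
    return sum(abs(x - y) for x, y in zip(a, b))


def distance_matrix(kits):
    """Return {(kit1, kit2): distance} for every unordered pair of kits."""
    return {
        (k1, k2): genetic_distance(kits[k1], kits[k2])
        for k1, k2 in combinations(sorted(kits), 2)
    }


if __name__ == "__main__":
    for (k1, k2), d in distance_matrix(KITS).items():
        print(f"{k1} vs {k2}: {d}")
```

In this step-wise count a two-step difference at one marker counts as two mutations; other models would count it as one.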
Remembering that who decides to take a DNA test is entirely a matter of chance, it is interesting to note that the Viking 2 group (Norway) is less well represented than Viking 1. This could be explained by geography: the Danes primarily colonised the east side of Britain, while the Norse went to the north and west (the Scottish Islands, north Wales, east Ireland), where populations were relatively smaller.
When we look at the 37-marker chart for the Norse group (Figure 3) we see some very clear groupings related to families. At the 67-marker level (Figure 4) some of the results have dropped off because those participants only tested 37 markers. The radiating lines are longer (greater time depth), and what we see may represent two individuals rather than one as the origin in prehistoric times.
Fig. 3: Viking 2 (Norse) group 37 markers Y chromosome
Fig. 4: Viking 2 (Norse) group 67 markers Y chromosome
Interestingly, there are as many recent family groupings here as in the Viking 1 group, despite there being fewer participants. There is perhaps a suggestion here of a shared geographical origin. The group of four results which includes kit 22659 is consistently named King, whereas the group which contains kit 45658 has five different surnames while apparently being more closely related genetically: McGregor, McLean, McClister, Mills and White.
There is one question that participants with Scottish ancestry who find they have a Viking Y chromosome ask, and that is ‘Am I a Viking?’ The answer would be yes ONLY if most of your DNA could be traced back to the Scandinavian countries. As it is, most Scots whose ancestors stayed in Scotland into the 20th century would find that their male-line ancestors were largely from haplogroup R1b (related to the Scots Modal) and not I (Viking). It was only perhaps a thousand years ago that a Viking Y chromosome was ‘inserted’ into the predominantly Celtic clan environment, and the families who descended from that man went on living and reproducing in a Celtic clan environment. Other parts of the DNA, in particular the autosomal DNA, which I discuss further under the Family Finder section, show comparatively few ‘Viking’ connections. Yes, there are some, but most indigenous Scots’ DNA is ‘Celtic’. In a subsequent blog I will discuss the thorny questions of ‘Who were the Scots?’ versus ‘Who were the Picts?’, and whether we can tell the difference, as certain authors suggest we can (they say that some clans are Pictish). In the end it is a similar question to ‘Am I a Viking?’, but answering it properly would probably have needed DNA testing a thousand years ago!
The Irish Related group (note: I use both spellings, Grier and Greer, to show there is no distinction)
Turning now to the group which I have called Irish Related, we find another star shape at 37 markers (Figure 5), preserved at 67 markers (Figure 6), suggesting descent from one individual (who carried SNP R-M222), although this is less obvious when looking at 111 markers (Figure 7). The group which contains kit 862 is almost exclusively surnamed Greer, however spelt, and this group should also contain kits 333215 and N225557 (I will move these from their current place in the Scottish Irish Grier group in due course).
Fig. 5: Irish Related group 37 markers Y chromosome
Fig. 6: Irish Related group 67 markers Y chromosome
Fig. 7: Irish Related group 111 markers Y chromosome
I have deliberately included all three Irish Related charts to show that, while we can see relationships at the 37-marker level and to a lesser extent at 67 markers (without the 37-marker-only participants), among those who have tested at 111 markers the detail and closeness of the relationships disappear: only the group with kit 862 still has three related individuals, and all the other close matches have gone. The lines have lengthened, and this gives us a better understanding of how far back in time the family splits happened. Looking again at the group containing kit 862, it appears that this may represent two or even three brothers diverging at one point; this would need to be checked against genealogies, if they exist. I believe this split may have happened in the United States. When we look at the 111-marker group together with a chart estimating the number of years to the most recent common ancestor (Figure 8, using 75% probability and FtDNA’s stated mutation rate), we see that many of these individuals share a common ancestor who lived after the beginning of the second millennium (AD 1000, or about 1,019 years before present). This is interesting given that the chart covers several different surnames rather than a single surname.
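To show roughly how a time-to-common-ancestor figure at a stated probability can be produced, here is a minimal Python sketch. It is not FtDNA’s calculator (which uses its own marker-specific rates) or Dean McGee’s utility: it assumes a single average mutation rate per marker per generation, a flat prior over the number of generations, and mutations accumulating as a Poisson process along both lines of descent. The rate in the example is a placeholder, not a published figure. The function returns the smallest number of generations for which the estimated probability of a common ancestor reaches the chosen level (75% here, as in the charts), which is then converted to years using the 30-year generation used for the charts.

```python
import math

GENERATION_YEARS = 30  # average used for the charts in this blog


def poisson_pmf(k, lam):
    """Probability of exactly k mutations when lam are expected."""
    if lam == 0:
        return 1.0 if k == 0 else 0.0
    return math.exp(-lam) * lam ** k / math.factorial(k)


def tmrca_generations(distance, markers, rate_per_marker,
                      confidence=0.75, max_generations=200):
    """Smallest t with P(common ancestor within t generations) >= confidence.

    Assumes a flat prior over 0..max_generations and that both lines of
    descent accumulate mutations independently, so the expected number of
    differences after t generations is 2 * t * markers * rate_per_marker.
    """
    weights = [poisson_pmf(distance, 2 * t * markers * rate_per_marker)
               for t in range(max_generations + 1)]
    total = sum(weights)
    cumulative = 0.0
    for t, weight in enumerate(weights):
        cumulative += weight / total
        if cumulative >= confidence:
            return t
    return max_generations


if __name__ == "__main__":
    RATE = 0.004  # placeholder rate per marker per generation (assumption)
    for gd in (0, 3, 6):
        gens = tmrca_generations(gd, markers=111, rate_per_marker=RATE)
        print(f"GD {gd} at 111 markers: about {gens * GENERATION_YEARS} years")
```

More differences push the estimate further back and a lower confidence level brings it forward; in every case the result is only as good as the assumed mutation rate.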
Fig. 8: Irish Related group, 111 markers: Time to Most Recent Common Ancestor
At this point I should mention that there are three other distinct and unrelated groups of Grier/Greer/Grierson. One of these, which I have termed Scottish Irish, could be from Northern Ireland or south-west Scotland. The second has Viking genetic heritage (I have called it Grier Viking). This latter group descends from three or at most four separate individuals, whereas the Scottish Irish Griers (Figure 9) descend from more individuals, possibly as many as seven (there could well be more in reality, but it is quite a small test group).
Fig. 9: Grier Scottish Irish group 37 markers Y chromosome
L1065 SNP – the Scots Modal
If we apply the Splitstree chart-making program to those who have tested SNPs and who show that they all descend from the individual who had the L1065 SNP, we again see a star shape suggesting a common origin [YFull dates this SNP’s origin to approximately 1,750 years ago], but there are many separate lines emanating from it with few shared numerical sequences. This chart (Figure 10) does NOT include any of the MacGregor-related participant surnames that have tested L1065+, nor any of the other smaller surname groups within the project; it only draws out those identified in the L1065+, S691+ and S695+ MacGregor Project groups. At the end of this blog you will find all the surnames in the MacGregor Project that have the L1065+ SNP. My point here is to show just how diverse the descent from L1065 was, using just a few surnames. Alex Williamson’s Big Tree shows just how many individuals with that SNP there are (or rather, how many have actually tested; there must be many more) and how many surnames it led to (including the main MacGregor line). However, the key factor is that it is an SNP which originated in Scotland, and most of those who have tested with FtDNA’s Big Y or YFull and now appear on Alex’s tree show a genealogical origin in Scotland. See: https://www.ytree.net/DisplayTree.php?blockID=160.
Fig. 10: L1065-positive results in the MacGregor project (only groups L1065+, S691+ and S695+)
Two short specific cases: White; McGhie
What we see again and again in DNA genealogy is that there are multiple origins for surnames, especially descriptive ones. If we examine the chart (Figure 11) generated by those participants who have the surname White, we see multiple origins for the name. The significance of White for Clan Gregor is that it was a name associated with the clan (sometimes as an alias), as it was with many other clans. It is the English version of Ban or Bain, and indeed there is one participant with that name who is descended from Gregor, the name father of the clan. All those with the name White can be part of Clan Gregor if they wish, so the genetic connection is, to a degree, irrelevant.
Fig. 11: Diverse genetic origins of surname White
A different result is seen with the McGhies (however spelt). A large cluster of individuals, almost certainly all descended from the original emigrant to the United States, is found on the right of Figure 12, while there are two other unrelated and distant families, of McGee and Magee, on the left. This chart does not include the Magees (however spelt) who are assigned to the Irish Related group (they would show as another tangential line on this Figure). The connection of the MackGehees, who have a tradition of changing their name from MacGregor on arrival in America, is not clear. The reference to Iain Dubh (MacGregor) is a supposition.
Fig. 12: McGhie/MacGehee/McGee at 37 markers
I have concentrated on STR data, especially in Figure 11 above, because at the moment the best source of SNP analysis is the previously mentioned Alex Williamson’s Big Tree. To show how it can be used to expand on STR results, here is a good example which elaborates on two of the McGhie (however spelt) results in the right-hand group. The relevant Big Tree block is: https://www.ytree.net/DisplayTree.php?blockID=16. The two SNP results which relate to the McGhie results are those for Thomas_Ma1 and William_M1 in Figure 12, which lie within the large group. Both individuals have the SNP R-BY172925, which is downstream of SNP Z255, sometimes referred to as the Irish Sea Haplotype from its general geographical distribution. The DNA shows that this split occurred much earlier than the one which led to the Scots Modal L1065+, having occurred in prehistoric times.
The current problem in SNP testing with respect to families
SNP testing IS the way forward for DNA genealogy, but there is a particular issue in confidently assigning individuals to family groups, especially to the MacGregor ‘houses’ or discrete families (such as Roro). The problem lies with SNPs that are ‘no shows’, in other words those that have not given a result in the testing. In a recent communication Professor Neil McGregor elaborated on the essential problem this brings. When you consider that the Big Y test looks at about 2,000,000 SNPs [out of approximately 10,000,000 in humans], alongside an extensive STR analysis, any missing key data [that is, ‘no reads’] is a problem. Rather than summarise, I give an edited version of Neil’s email to me:
“I
certainly can separate every individual within the group. Every person has at
least 2-4 unique SNPs, but many have common SNPs with a small sub cluster of
people, thereby allocating them to a subfamily.
I only use the SNPs that have high levels of accuracy to do this, but this leaves me with a significant quandary. There are exactly 100 SNPs which have higher levels of no reads and which should be capable of subdividing the branches more accurately. Of these, ~20 are critical in dividing the clan participants into the final family clusters and sub clusters. I will be contacting familytreedna to request they address this issue in their data analysis.
The second major problem is that if you wish to compare the Big Y data you have to redownload every individual each time [you do it], as familytreedna keep adding SNPs; they also allocate some of the unique SNPs a new name (e.g. BY1123252) and drop the old position number, which prevents one from easily comparing the changes.
The third major problem is working out who has a back
mutation of a SNP which tends to make you place them in one group until one
sees the whole dataset. They may have one match with two separate individuals
but have multiple matches with others who do not match the common one. I
personally have one of those in my data. I do not believe I will be able to
separate them until the ‘no read’ SNP data can be clarified better.”
When Neil checked his own genetic data using both STR and SNP results, this is what he found: “My nearest match in the SNP data was allocated as a distant cousin in the STR data: 450 years to a common ancestor using 111-marker STR data and 630 years using 37-marker data”.
In summary, any allocation to individual families within a surname subgroup, especially the MacGregors, needs to be provisional until the ‘no read’ data issue is resolved, since these currently unavailable SNPs could well move an individual from one family subgroup to another. The update for the MacGregor SNP family groupings from Neil’s analysis is now added here [Fig. 13], but it is provisional pending the clarifications outlined above. It is quite likely that we will eventually suggest specific SNPs that individuals should have tested, but we are not yet at that point.
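Neil’s point about ‘no reads’ can be illustrated with a small hypothetical sketch. In the Python below, each family cluster is defined by a set of SNPs, and each kit’s results are recorded as ‘+’ (positive), ‘-’ (negative) or ‘?’ (no read, or not reported). A kit is only firmly allocated when exactly one cluster is fully confirmed and no other cluster remains open; a single no read on a defining SNP leaves the allocation provisional. The cluster and SNP names are placeholders, the clusters are assumed to be sibling branches rather than nested ones, and this is not Neil’s actual method.

```python
# Hypothetical cluster definitions: family cluster -> defining SNPs.
CLUSTERS = {
    "family_1": ["snp_a", "snp_b"],
    "family_2": ["snp_c"],
}


def classify(calls, defining_snps):
    """Compare one kit's calls with one cluster's defining SNPs."""
    states = [calls.get(snp, "?") for snp in defining_snps]  # missing = no read
    if "-" in states:
        return "excluded"
    if "?" in states:
        return "unresolved"
    return "consistent"


def allocate(calls, clusters):
    """Return (candidate clusters, status) for one kit."""
    results = {name: classify(calls, snps) for name, snps in clusters.items()}
    consistent = sorted(n for n, r in results.items() if r == "consistent")
    unresolved = sorted(n for n, r in results.items() if r == "unresolved")
    if len(consistent) == 1 and not unresolved:
        return consistent, "firm"
    if not consistent and not unresolved:
        return [], "unassigned"
    return consistent + unresolved, "provisional"


if __name__ == "__main__":
    print(allocate({"snp_a": "+", "snp_b": "+", "snp_c": "-"}, CLUSTERS))
    print(allocate({"snp_a": "+", "snp_b": "?", "snp_c": "?"}, CLUSTERS))
```

A back mutation, the third problem Neil describes, would show up in this scheme as a ‘-’ on a defining SNP and wrongly exclude the kit, which is one reason the whole dataset has to be examined rather than one kit at a time.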
The scope of Family Finder (or Ancestry DNA or similar test)
I have been asked several times, “Why do you not do a surname study grouping for Family Finder?” The simple answer is that it isn’t feasible. Consider this scenario: six generations back, every individual has 64 ancestors (the number doubles with each generation: 2, 4, 8, 16, 32, 64), who could carry 64 different surnames. Take 100 individuals and you might have to include 6,400 different surnames; take 1,000 individuals and you could have up to 64,000. What you need instead is a database that compares results with all the other results in a project, and that is exactly what the various companies’ databases do in order to find individuals whose genetic signature is similar to yours in some places. The more similar the DNA, the more likely it is that the second person is recently related. Each company then estimates what the actual relationship might be.
The first problem is that very few people can confidently say what the surnames of all 64 of those ancestors actually were (especially the earlier females). The second problem is that to make a comparison you need to have uploaded your family tree so that surnames can be compared with any other person’s. It really baffles me that someone would take the Family Finder test and not enter at least their ancestral names. So you might have a good match, but your match has entered no family detail at all: how can you compare and estimate the connection? The third problem is that you write to the email address and get no reply. Did the person see the email and choose not to answer? Did it go into their Junk or Spam folder so that they never saw it? Is the email address even current, or has it never been updated? Did the person take the test without ever having done any genealogy, just because it seemed like a good idea?
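The arithmetic behind the surname scenario can be checked with a few lines of Python. This is a minimal sketch under a worst-case assumption that no surname is repeated within or between family trees; in reality surnames recur and pedigree collapse (the same ancestor appearing more than once in a tree) reduces the count, so these figures are upper bounds.

```python
def ancestors_in_generation(generations_back):
    """Ancestor slots in one generation: 2 parents, 4 grandparents, ..."""
    return 2 ** generations_back


def max_surnames(testers, generations_back=6):
    """Upper bound on distinct surnames in that generation across all testers."""
    return testers * ancestors_in_generation(generations_back)


if __name__ == "__main__":
    print(ancestors_in_generation(6))          # 64 ancestors six generations back
    for testers in (100, 1000):
        print(testers, max_surnames(testers))  # 6400, then 64000
```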
What I am saying is that Family Finder requires active genealogical input and communication from everyone who takes the test. The material that others need in order to compare with you can only be there if you enter it. You don’t need a Project Admin to do this for you; in fact, only you know your genealogy, and a Project Admin almost never has specific information on your family tree. Even where you do find genealogical information entered, always remember that it may be full of assumptions and can contain inaccurate family links, which means you have to check its accuracy for yourself. Finally, remember that the predictions of possible relationships offered by the testing company are exactly that: predictions, not facts.
Mitochondrial DNA
If you have taken an mtDNA test with Family Tree DNA and are in the MacGregor Project, you will find from the Project menu that you can look at the female haplogroup groupings by choosing to view the mitochondrial results. You will notice that I have assigned everyone to their appropriate alphabetical group. However, mitochondrial haplogroups have subdivisions: haplogroup J, for example, has many (though nowhere near as many as haplogroup H). If your subgroup is, say, J1a1b1a, then you will most closely match someone with the same configuration. You should then look at the column with the DNA results, which will show whether you have an exact match or whether there are some differing mutations. Mutations suggest a common ancestor further back (to give time for the mutations to have taken place), and it may be hard to find the exact common ancestor, since very few genealogies have all the female surnames back much further than the mid-eighteenth century (if you are lucky!). You may be able to tie back to a geographical location rather than to an individual.
Remember that mtDNA is passed from a mother to all her children, but only daughters can pass it on. In that sense it works a bit like the Y chromosome, except that surnames change with every generation, so it is comparatively rare to be able to trace back to a specific ancestor in the past (though it was done in the case of Richard III, whose bones were dug up in a Leicester car park and identified through a line of mitochondrial descent to the present day).
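As a rough illustration of the comparison described above, the Python sketch below splits a haplogroup label such as J1a1b1a into its alternating letter and number segments, counts how many leading segments two labels share, and lists the mutations found in one result but not the other. A longer shared label is only a rough guide to closeness, the simple splitting rule is an assumption that will not handle every label FtDNA reports (some contain other characters), and the mutation names in the example are placeholders rather than real results.

```python
import re


def split_haplogroup(label):
    """Split e.g. 'J1a1b1a' into ['J', '1', 'a', '1', 'b', '1', 'a']."""
    return re.findall(r"[A-Z]+|\d+|[a-z]+", label)


def shared_depth(label_a, label_b):
    """Count the leading segments two haplogroup labels have in common."""
    depth = 0
    for seg_a, seg_b in zip(split_haplogroup(label_a), split_haplogroup(label_b)):
        if seg_a != seg_b:
            break
        depth += 1
    return depth


def mutation_differences(mine, theirs):
    """Mutations present in one result but not the other."""
    return set(mine) ^ set(theirs)


if __name__ == "__main__":
    print(shared_depth("J1a1b1a", "J1a1b1a"))          # 7: identical subgroup
    print(shared_depth("J1a1b1a", "J1a1b2"))           # 5: split below J1a1b
    print(mutation_differences({"m1", "m2"}, {"m1"}))  # {'m2'}
```

A match on both the full label and the mutation list points to a more recent shared maternal ancestor than a match on the label alone.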
Finally
As always, you can ask me to generate limited charts with your results along with others which look potentially related. If you could keep the number of individual results to about a dozen per query, that would be helpful. Also, don’t forget that this is for comparison of STR data, not SNP data, and that better results are obtained when comparing larger numbers of markers.
Again, I would repeat Neil’s encouragement to do Big Y if you can. Please ask me for further information if needed [richardmcgregor1ATyahoo.co.uk, substituting @ for AT].
Charts were constructed using Dean McGee’s DNA Comparison Utility at http://www.mymcgee.com/tools/yutility.html?mode=ftdna_mode, using a 75% level of confidence, Family Tree DNA’s mutation rate, an average of 30 years per generation and no modal results assigned (except where indicated). The graphic representations of phylogenetic networks are made with Splitstree:
D. H. Huson and D. Bryant, Application of Phylogenetic Networks in Evolutionary Studies, Mol. Biol. Evol., 23(2):254-267, 2006.
Surname participants in the MacGregor project who have SNP L1065+ (not everyone with the surname will have L1065+): MacGregor; Anderson; Bain; Lawrie; McLachlan; Tuttle; McFarlane/McFarlane; Buchanan; Miller; Laird; McDaniel; Adams; Duncan; Simpson; Stewart; Watkins; Davis; Gregory; King; Moore; Cain; Grier; Jamieson; Murchison; Cameron; Griffin; Napier; Doran; Bissett; Gillis; Ferguson; Allen; McViccar; McAfee; BradyPierce; Kincaid; (Mc)Whannell; Hanby; Brown; More-Gordon; Henderson; Eunson; Colborne; Peden; Robertson; Scott; Looper.