08 January 2019

MacGregor DNA Project Blog 2019



Welcome to my MacGregor Project DNA blog for 2019. This year I am going to focus in part on some of the problems presented by DNA testing for genealogy and, in some cases, suggest possible ways forward rather than definite solutions. For the most part I will be dealing with homogeneous groups from the MacGregor Project: all of these groups contain some MacGregor surname participants, participants with related or sept-based surnames, and those whose surnames have no currently known MacGregor connection, other than perhaps by tradition. It is important to understand the difference between a related surname and a sept of a clan; I explored this issue in last year’s blog, so I won’t repeat it here. As for the processes of DNA testing for genealogy, the current Wikipedia entry is very helpful, and it can be accessed at:
     I hope it is not too simplistic to say that one of the key ways in which DNA testing for genealogy will improve in the future is by more people doing higher-level tests. This is certainly true for the male Y chromosome (only males have the Y): the potential of the Big Y test is still relatively unexplored, and having more participants at this level of testing is essential for the future. Having said that, I must give great credit to all those who are working to develop analyses and interpretations including, for the MacGregor project, Professor Neil McGregor in Australia. If you have already tested Big Y then you should send your results to Neil (neilmcgregorATtpg.com.au, substituting @ for AT) if you are from a MacGregor line (any spelling), or if the DNA evidence suggests that your surname was once MacGregor but changed as a result of Proscription (the banning of the name from 1603-60 and 1692-1774). In addition, you should send your results to Alex Williamson for inclusion in The Big Tree (see https://www.ytree.net/). This tree is only for branches under the SNP R-P312 and is therefore not applicable to other groups such as the I groups, which are connected with the Scandinavian countries (and by implication, Vikings). However, Alex’s tree does deal with results which account for at least 70% of British and Irish DNA, if not more. You can join the discussion of this at http://mcewanjc.org/scotsr1b.htm. You will also find helpful pictorial analyses of population densities at https://www.eupedia.com/genetics/britain_ireland_dna.shtml.
      Those of you most interested in the information on MacGregor SNPs should skip to the end of this article, where I discuss what progress has been made (this will be updated later in the year). I want to begin by dealing with two specific groups. In the analyses that follow you will need to have your kit number to hand, as I am not labelling the charts with surnames (with two exceptions) but will mention surnames in the text of my analysis. You can easily find the surnames for each kit number by looking at the relevant subgroup in the project: www.familytreedna.com/public/macgregor. The first two groups are what I have termed Viking 1 and Viking 2: these are separate genetic lines generally associated with the area including Denmark (Viking 1) and Norway (Viking 2). I should point out that I have only done analyses at the level of 37, 67, or 111 markers and have had to exclude 12 and 25 markers, as these are not specific enough to generate discrete results.

The Viking Y chromosome groups

In the Viking 1 group at the 37-marker level we see a common point of origin (typified by the star shape) in an individual who probably bore the SNP I-M253 (Figure 1):

Fig. 1: Viking 1 group at 37 markers Y chromosome
From that individual there have come many branches, some of which split to show connection to a common ancestor more recent than the one that bore I-M253. In general, the closer together and further up a line the splits come, the more recent the shared ancestry is likely to be – but it does not mean that everyone has the same surname. Compare this with the 67-marker chart (Figure 2), which has fewer lines since those who only did 37-marker testing have dropped off. 67 markers make links much more obvious, especially for splits at the ends of lines.
Fig. 2: Viking 1 group at 67 markers Y chromosome
Of the three results in Figure 2 which include kit 101344, two are surnamed Skinner and are therefore closely related, but one is Young – it looks like one or other surname was adopted quite late. At this point I should mention that the programmes I use (Dean McGee’s DNA Comparison Utility and Splitstree [details at the end]) group individuals by common characteristics that are not always evident when looking at the number sequences on the Project subgroup grid. In other words, if there is a mutation or two earlier in the number sequence (particularly in the first 25 markers), the project grid at FtDNA will change the position of the results on the grid relative to others who might be more closely related, whereas the Splitstree charts tend to ignore smaller variations and early changes in the number of mutations. As a second example, the members of the group above which includes kit 377721 are all related and have the surname White (however spelt). The genealogies for these three kits connect Scotland, NE England and Kent. N73491 has the surname Kellogg, associated with Essex (across the river Thames from Kent). If this is accurate then one interpretation is that there were two Viking brothers who went separate ways, one to north Britain and one south (or both south, then one went north!) – but that really is speculation.
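To give a feel for what "grouping by common characteristics" can mean, here is a minimal sketch of one simple way of scoring how far apart two sets of STR results are: add up the differences in repeat counts, marker by marker. This is only an illustration and is not the method used by Dean McGee's utility or by Splitstree. The kit labels and repeat values are invented, and the function names are hypothetical; only the marker names (DYS393 and so on) are real STR markers.

```python
from itertools import combinations

def genetic_distance(kit_a, kit_b):
    """Sum of absolute differences in repeat counts over the markers both kits tested."""
    shared_markers = kit_a.keys() & kit_b.keys()
    return sum(abs(kit_a[m] - kit_b[m]) for m in shared_markers)

def pairwise_distances(kits):
    """Distance for every pair of kits, keyed by the (sorted) pair of kit labels."""
    return {
        (first, second): genetic_distance(kits[first], kits[second])
        for first, second in combinations(sorted(kits), 2)
    }

# Invented example values, for illustration only.
EXAMPLE_KITS = {
    "kit_A": {"DYS393": 13, "DYS390": 22, "DYS19": 14, "DYS391": 10},
    "kit_B": {"DYS393": 13, "DYS390": 23, "DYS19": 14, "DYS391": 10},
    "kit_C": {"DYS393": 14, "DYS390": 24, "DYS19": 15, "DYS391": 11},
}

if __name__ == "__main__":
    for pair, distance in pairwise_distances(EXAMPLE_KITS).items():
        print(pair, distance)
```

In this made-up example kit_A and kit_B differ on a single marker, so they would sit close together on a chart, while kit_C differs from both on every marker and would sit on a separate, longer line.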
       Remembering that who actually decides to test is entirely random, it is interesting to note that the Viking 2 group (Norway) is less represented than Viking 1. This could be explained by geography, in that the Danes primarily colonised the east side of Britain while the Norse went to the north and west (the Scottish Islands, north Wales, east Ireland), where populations were relatively smaller. When we look at the 37-marker chart for the Norse group (Figure 3) we see that there are some very clear groupings related to families. At the 67-marker level (Figure 4) some of the results have dropped off since those participants only did 37 markers. The radiating lines are longer (greater time depth) and what we see may represent two individuals rather than one as the origin in prehistoric times.


Fig. 3: Viking 2 (Norse) group 37 markers Y chromosome
Fig. 4: Viking 2 (Norse) group 67 markers Y chromosome
Interestingly, there are the same number of more recent family groupings as in the Viking 1 group, for fewer participants. There is perhaps a suggestion here of shared geographical origin. The group of 4 results which includes kit 22659 is consistently named King, whereas the group which contains kit 45658 has five different surnames that are apparently more closely related genetically: McGregor, McLean, McClister, Mills and White.
     There is one question which participants with Scottish ancestry who find they have a Viking Y chromosome ask, and that is ‘am I a Viking?’ The answer to this would be yes ONLY if most of your DNA could be tracked back to the Scandinavian countries. As it is, most Scots whose ancestors stayed in Scotland into the 20th century would find that their male ancestors were largely from haplogroup R1b (related to Scots modal) and not I (Viking). It was only that, perhaps a thousand years ago, a Viking Y chromosome was ‘inserted’ into the predominantly Celtic clan environment, and the families who descended from him would then continue living and reproducing in a Celtic/clan environment. Other parts of the DNA – the autosomal, which I discuss further under the Family Finder section – show comparatively few genes which have ‘Viking’ connections. Yes, there are some, but most indigenous Scots’ DNA is ‘Celtic’. In a subsequent blog I will discuss the thorny questions of ‘Who were the Scots?’ versus ‘Who were the Picts?’ and whether we can tell the difference, as certain authors suggest we can (and they say that some clans are Pictish). In the end it is a similar question to ‘Am I a Viking?’ but probably needed DNA testing a thousand years ago!
   
The Irish-Related group (note spelling both Grier and Greer to show there is no distinction)

If we turn now to the group which I have called Irish Related we find another star shape at 37 markers (Figure 5) which is preserved at 67 markers (Figure 6) suggesting descent from one individual (who carried SNP R-M222), although this is less obvious when looking at 111 markers (Figure 7). The group which contains kit 862 is almost exclusively surnamed Greer, however spelt, and this group should also contain kits 333215 and N225557 (I will move these from where they are in the Scottish Irish Grier group in due course).

Fig. 5: Irish Related group 37 markers Y chromosome

Fig. 6: Irish Related group 67 markers Y chromosome

Fig. 7: Irish Related group 111 markers Y chromosome
I have deliberately included all three Irish Related charts to show that while we can see relationships at the 37-marker level and, to a lesser extent, at 67 (without the 37-only participants), when we reach the number that have tested at 111 markers the detail and closeness of the relationships disappear – only the group with kit 862 still has 3 related individuals; all the other close matches have disappeared. We see that lines have lengthened, and this gives us a better understanding of how far back in time the family split happened. If we look again at the group containing 862, it appears that this may represent two or even three brothers diverging at a point: this would need to be checked with genealogies if they exist. I believe this split may have happened in the United States. When we look at the 111-marker group and include a chart which estimates the number of years to the Common Ancestor (using 75% probability on FtDNA’s stated mutation rate, Figure 8), we see that many of these individuals can trace a relationship back to after the beginning of the Second Millennium (1000 AD, or roughly 1,000 years before present), which is interesting given that the chart represents different surnames – it is not a single surname grid.
Fig. 8: Irish Related group – 111 markers Time to Most Recent Common Ancestor
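For anyone who wants to turn a figure from a chart like this into a rough calendar date, the arithmetic can be sketched as below. It uses the average of 30 years per generation given in the chart notes at the end of this blog; taking "present" as 2019 is my assumption for the sketch, and the function names are purely illustrative. It is only a conversion aid and says nothing about the uncertainty in the estimates themselves.

```python
YEARS_PER_GENERATION = 30  # average generation length given in the chart notes
REFERENCE_YEAR = 2019      # assumption: "present" is taken as the year of this blog

def generations_to_years(generations, years_per_generation=YEARS_PER_GENERATION):
    """Convert a number of generations into an approximate number of years."""
    return generations * years_per_generation

def years_ago_to_calendar_year(years_ago, reference_year=REFERENCE_YEAR):
    """Convert 'years before present' into an approximate calendar year (AD)."""
    return reference_year - years_ago

if __name__ == "__main__":
    for generations in (10, 20, 34):
        years = generations_to_years(generations)
        print(generations, "generations is about", years, "years, i.e. around",
              years_ago_to_calendar_year(years), "AD")
```

On these assumptions a common ancestor living around 1000 AD is a little over a thousand years, or about 34 generations, back.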
At this point I should mention that there are three other distinct and unrelated groups of Grier/Greer/Grierson. One of these Greer groups, which I have termed Scottish Irish, could be from Northern Ireland or South West Scotland. The second group has Viking genetic heritage (I have called the group Grier Viking). This latter group descends from 3 or at most 4 separate individuals, whereas the Scottish/Irish Griers (Figure 9) descend from more individuals, possibly as many as seven (there could well be more in reality, but it is quite a small test group).
Fig. 9: Grier Scottish Irish group 37 markers Y chromosome
In a case like this, in order to sort out the different family groups we need participants to undertake SNP testing – preferably Big Y (since testing one SNP at a time can soon mount up in cost and is a much more hit-and-miss approach). We do have an example of a SNP which has started to define different family groups, often tied to surname – although as a SNP it dates to before the adoption of surnames in Scotland. SNP L1065 has been called the Scots Modal.

L1065 SNP – the Scots Modal

If we apply the Splitstree chart-making program to those who have tested SNPs and who show that they all descend from the individual who had the L1065 SNP, we see again a star shape suggesting a common origin [YFull dates this SNP origin point to approximately 1750 years ago], but there are many separate lines emanating from it with few shared numerical sequences. This chart (Figure 10) does NOT include any of the MacGregor related participant surnames that have tested L1065, nor any of the other smaller surname groups within the project - it only draws out those identified in the L1065+, S691+ and S695+ MacGregor Project groups. At the end of this Blog you will find all the surnames in the MacGregor Project who have the L1065+ SNP. My point here is to show just how diverse the descent was from L1065 using just a few surnames. Alex Williamson’s Big Tree shows just how many individuals with that SNP there are (or rather those who have actually tested – there must be many, many more), and how many surnames it led to (including the main MacGregor line). However, the key factor is that it is an SNP which originated in Scotland, and most of those who have tested FtDNA’s Big Y or YFull’s DNA testing and now appear on Alex’s tree show a genealogical origin in Scotland. See: https://www.ytree.net/DisplayTree.php?blockID=160.


Fig. 10: L1065 positive results in the MacGregor project (only groups L1065+, S691+ and S695+)
Two short specific cases: White; McGhie

What we see again and again in DNA genealogy is that there are multiple origins for surnames, especially those which are descriptive. If we examine the chart (Figure 11) generated by those participants who have the surname White we see multiple origins for the name. The significance of White as related to Clan Gregor is that it was a name associated with the clan (sometimes as an alias), and indeed with many other clans. It is the English version of Ban or Bain, and indeed there is one participant with that name who is descended from Gregor, the name father of the clan. All those with the name White are part of Clan Gregor if they wish to be, so the genetic connection is, to a degree, irrelevant.
Fig. 11: Diverse genetic origins of surname White
A different result is seen with the McGhies (however spelt). A large cluster of individuals, almost certainly all descended from the original emigrant to the United States, is found to the right of Figure 12, while there are two other unrelated and distant families, of McGee and Magee, on the left. This chart does not include Magees (however spelt) who are assigned to the Irish related group (and would show as another tangential line on this Figure). The connection of the MackGehees, who have a tradition of change of name from MacGregor on arrival in America, is not clear. The reference to Iain Dubh (MacGregor) is a supposition.
Fig. 12: McGhie/MacGehee/McGee at 37 markers
I have concentrated on STR data, especially in Figure 11 above, because at the moment the best source of SNP analysis is the previously mentioned Big Tree of Alex Williamson. To show that it can be used to expand on STR results, there is a good example which elaborates two of the McGhie (however spelt) results located in the right-hand group. The relevant Big Tree Block is: https://www.ytree.net/DisplayTree.php?blockID=16. The two SNP results which relate to the McGhie results are for Thomas_Ma1 and William_M1 in Figure 12, which lie within the large group. Both individuals have the SNP R-BY172925, which is a descendant of SNP Z255, sometimes referred to as the Irish Sea Haplotype from its general geographical distribution. The DNA shows that this split occurred much earlier than that which led to the Scots Modal L1065+, having occurred in prehistoric time.

The current problem in SNP testing with respect to families

SNP testing IS the way forward for DNA genealogy, but there is a particular issue associated with confidently assigning individuals to family groups, and especially to the MacGregor ‘houses’ or discrete families (such as Roro etc). The problem resides with SNPs that are ‘no shows’ – in other words, SNPs that have not given a result in the testing. In a recent communication on this, Professor Neil McGregor elaborated on the essential problem this brings. When you consider that the Big Y DNA test looks at about 2,000,000 SNPs [out of approximately 10,000,000 in humans], allied to an extensive STR analysis, then any missing key data [that is, ‘no reads’] is a problem. Rather than summarise, I give an edited version of Neil’s email to me:

“I certainly can separate every individual within the group. Every person has at least 2-4 unique SNPs, but many have common SNPs with a small sub cluster of people, thereby allocating them to a subfamily. I only use the SNPs that have high levels of accuracy to do this, but this leaves me with a significant quandary. There are exactly 100 SNPs which have higher levels of no reads which should be capable of subdividing the branches more accurately. Of these ~20 are critical in dividing the clan participants into the final family clusters and sub clusters. I will be contacting familytreedna to request they address this issue in their data analysis.

The second major problem is that if you wish to compare the BIGY data you have to redownload every individual each time [you do it] as familytreedna keep adding SNPs and they also allocate some of the unique SNPs a new name (e.g. BY1123252) and they drop the old position number, so that prevents one from easily comparing the changes.

The third major problem is working out who has a back mutation of a SNP which tends to make you place them in one group until one sees the whole dataset. They may have one match with two separate individuals but have multiple matches with others who do not match the common one. I personally have one of those in my data. I do not believe I will be able to separate them until the ‘no read’ SNP data can be clarified better.”

When Neil checked his own genetic data using STR and SNP data this is what he found: “My nearest match in the SNP data was allocated as a distant cousin in the STR data 450 years using 111 STR data and 630 years to common ancestor using 37 [marker] data”.
    In summary, any allocations to individual families within a surname subgroup – especially the MacGregors – need to be provisional until the ‘no read’ data issue is resolved, since these currently unavailable SNPs could well change the allocation of an individual from one family subgroup to another. The update for the MacGregor SNP family groupings is now added here [Fig. 13] from Neil's analysis but is provisional pending clarifications as outlined above. It is quite likely that we will eventually suggest specific SNPs that individuals should have tested, but we are not yet at that point.

The scope of Family Finder (or Ancestry DNA or similar test)

I have been asked several times “why do you not do a surname study grouping for Family Finder?”. The simple answer is that it simply isn’t feasible. Consider this scenario: six generations back, every individual could have 64 different ancestors with 64 different surnames. Take 100 individuals: then you might have to include 6,400 different surnames. Take 1000 individuals and you could have up to 64,000 different surnames. What you then need is a database that compares results with all the other results in a project, and that is exactly what the various companies’ databases do in order to find individuals whose genetic signature is similar in some places to yours. The more similar the DNA, the more likely it is that the second person is recently related. Then, each company estimates what the actual relationship might be. The first problem is that very few people can confidently give the surname of every one of their 64 ancestors (especially the earlier females). The second problem is that to make a comparison you need to have uploaded your family tree, so that surnames can be compared with those of any other person. It really baffles me that someone would do the Family Finder test and not put in their ancestral names at least. So, you might have a good match, but your match has inserted no family detail at all – how can you compare and estimate the connection? The third problem is that you write to the email address and get no reply: did the person see the mail and choose not to answer; did the mail go into their Junk or Spam folder so that they never saw it; is the email address even current, or has it never been updated; did the person do the test but never do any genealogy – it just seemed like a good idea?
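The surname arithmetic is easy to check for yourself. The short sketch below doubles the number of ancestral lines for each generation back and multiplies by the number of people tested to give an upper limit on the surnames involved. The function names are my own, and the limit assumes that no two lines share a surname or an ancestor, which will never be quite true in practice.

```python
def ancestors_at_generation(generations_back):
    """Number of ancestral lines a person has a given number of generations back."""
    if generations_back < 0:
        raise ValueError("generations_back must be zero or more")
    return 2 ** generations_back

def max_surnames(individuals, generations_back=6):
    """Upper limit on distinct surnames, assuming no shared lines or surnames."""
    return individuals * ancestors_at_generation(generations_back)

if __name__ == "__main__":
    for individuals in (1, 100, 1000):
        print(individuals, "tested:", max_surnames(individuals), "possible surnames")
```

Each further generation back doubles the figure again, which is why a surname grouping quickly becomes unmanageable for an autosomal test.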
    What I am saying is that Family Finder requires active genealogical and communicative information from all who do that test. The material that others need in order to compare with you can only be there if you input it. You don’t need a Project Admin to do this for you – in fact, only you know your genealogy; a Project Admin almost always has no specific information on your family tree. Even if you find that genealogy information has been inputted, always remember that it may be full of assumptions and can have inaccurate family links, which means you have to check the accuracy for yourself. Finally, remember that the predictions offered by the testing company on possible relations are exactly that: predictions, not facts.

Fig. 13: Provisional MacGregor SNP family allocation 2019
Mitochondrial DNA

If you have taken an MtDNA test with Family Tree DNA and are in the MacGregor Project, you will find from the Project menu that you can look at the female haplogroup groupings by selecting to view the mitochondrial results. You will notice that I have assigned everyone to their appropriate alphabetical group. However, mitochondrial groups have subdivisions, so haplogroup J, for example, has many subdivisions (though not, by far, as many as haplogroup H). If your subgroup is, say, J1a1b1a then you will most closely match someone with the same configuration. You should then look at the column with the DNA results, which will show if you have an exact match or if there are some different mutations. Mutations suggest an ancestor further back (to give time for mutations to have taken place) and it may be hard to find the exact common ancestor, since very few genealogies have all the female surnames back much further than the mid eighteenth century (if you are lucky!). You may be able to tie back to a geographical location rather than to an individual. Remember that MtDNA is passed from a mother to her children, but only daughters can pass it on. In that sense it works a bit like the Y chromosome, except that surnames change with every generation and it is thus comparatively rare to be able to trace back to an ancestor in the past (though it was done in the case of Richard III, whose bones were dug up in a Leicester car park and were identified through a mitochondrial line of descent to the present day).

Finally

As always you can ask me to generate limited charts with your results along with others which look potentially related. If you could keep the number of individual results to about a dozen per query that would be helpful. Also, don’t forget that this is for comparison of STR data not SNP and better results are obtained when comparing larger numbers of markers.

Again, I would repeat Neil’s encouragement to do BigY if you can – please ask me for further information if needed [richardmcgregor1ATyahoo.co.uk, substituting @ for AT].

Charts were constructed using Dean McGee’s Utility at http://www.mymcgee.com/tools/yutility.html?mode=ftdna_mode, using a 75% level of confidence, Family Tree DNA’s mutation rate, an average of 30 years per generation and with no modal results assigned (except where indicated). The graphic representations of phylogenetic trees are made by Splitstree:
D. H. Huson and D. Bryant, Application of Phylogenetic Networks in Evolutionary Studies, Mol. Biol. Evol., 23(2):254-267, 2006.

Surname participants in the MacGregor project who have SNP L1065+ – not everyone with the surname will have L1065+: MacGregor; Anderson; Bain; Lawrie; McLachlan; Tuttle; McFarlane/McFarlane; Buchanan; Miller; Laird; McDaniel; Adams; Duncan; Simpson; Stewart; Watkins; Davis; Gregory; King; Moore; Cain; Grier; Jamieson; Murchison; Cameron; Griffin; Napier; Doran; Bissett; Gillis; Ferguson; Allen; McViccar; McAfee; BradyPierce; Kincaid; (Mc)Whannell; Hanby; Brown; More-Gordon; Henderson; Laird; Eunson; Colborne; Peden; Robertson; Scott; Looper.





















