10 January 2018

MacGregor DNA Project blog update 2018

This year’s blog will be slightly different from previous ones, in that I want to spend a portion of it explaining how genetic genealogy helps us understand intra-clan relationships. Before doing so I need to say exactly what a Scottish clan is, as there are some particular misconceptions concerning the exact nature of a clan in the 21st century. At the end, I go on to discuss how SNPs are helping to build up a better picture of family groupings within the main line MacGregor family tree after the mid 14th century.

What is a clan in Scottish usage?
Six hundred years ago this question was quite simple to answer. You were associated with a clan if you had been born with the name – in the MacGregors’ case that might be expressed as Gregor, Grigor, MacGregor, McGrigor, McGregor and a whole range of alternative spellings such as, for example, McGreagor (an attempt to render the Gaelic phonetically into English?). At that time, too, spelling had not been standardised, so one might find Mckgregor, M’gregor and so on. You were also a member of the clan if your name was an accepted variant, such as Grierson, Grier, or Greig/Grieg/Grig etc. These were considered to be shortened or anglicised versions of the main clan name. So Grier-son equals Gregor-son, and Grier is the same name without the ‘son’ on the end.
Whether or not these accepted names were genetically related to the main line was not the point, as a clan was a collection of related surnames. Members of the clan recognised the head of the main line as Chief (the Chief of the MacGregors, for example) and often, especially in the early days, relied on him for protection, or rather on his ability to pull a ‘federation’ of individuals together to ensure protection (usually armed) or to seek retribution on another group for some offence.
There were others associated with individual clans: people who belonged to individual septs. Sometimes the same name would appear in two or more lists of accepted septs of different clans – such is the case with the surname King, among others. Again, some descriptive words used as surnames were understood to have been borne by people associated with the clan, and these surnames are found in many clan lists: Bain (or Ban) or its anglicised equivalent White; Roy meaning Red; Dhu or Dow meaning dark or black, are some examples.
Finally, there were people who met none of these ‘qualifications’ but who lived on the land which was under the Chief’s influence as ‘part-takers’. Grant and Menzies rental documents of the 18th century reveal instances where individuals adopted the name of the local chief where formerly they were called only by their patronymics. A patronymic shows the genealogy of an individual back two generations – so my patronymic would be Richard McEwan VicPeter: there McEwan is not a surname but shows that my father was Ewan (not THAT Ewan). On a rental document you might find John McGregor VicPatrick, which means that John’s father was Gregor and his grandfather Patrick: if John were a very poor inhabitant – a cottar – it might be that he had lost the knowledge that he was a genetic MacGregor and so ended up taking the surname Grant. It’s unusual to find this situation among MacGregors because of their turbulent history, but it happens in other clans. Paradoxically, because the MacGregor name was proscribed [forbidden] for so long [1603-60 and 1693 to 1774], it seems many families held on to the knowledge that they were MacGregors despite having been forced to take other surnames. Some families never changed back to MacGregor when it was finally possible to do so, which is why in the DNA project we see individuals called Drummond, Stirling, Campbell etc. who are genetically MacGregors.

What is a clan in the 21st century?
To some the concept of clan in the present century is an irrelevance. We no longer need the Chief to protect us and we no longer live in defined communities of inter-related individuals offering support to each other in the same way. In the 21st century a clan is made up of people who value the bonds of kinship often promoted by Clan Societies, and they value the, sometimes surprising, connections that clan association brings. At its gathering in 2014 the Clan Gregor Society of Scotland had 11 nationalities represented, some of whom like the Philippine and South African contingents actually shared a recent common ancestor. Members of a clan still recognise the Chief as head of the clan, but, as I have explained, it is likely that relatively few share a common ancestor with him (or her). Most Clan Societies recognise that finding paper evidence of relationships becomes increasingly difficult before 1750 and therefore accept both female line connection (e.g. ‘great grandmother was a MacGregor’, or a King, or a Bain etc), or, that there is a tradition within a family of MacGregor clan connection.

What does genetic genealogy tell us?

1)    That people called by a certain surname are not necessarily descended from a common ancestor, although as we have found in the DNA project approximately 50% within a surname subgroup do share a common ancestor. (I will explain my subgrouping in a moment)
2)    That there are different groups of individuals called, say, MacGregor, each of which shares a common ancestor in the fairly recent past, but the connection of one of these groups to another is likely to lie thousands of years in the past, before surnames
3)    That there are multiple origins for surname groups
4)    That clan sept names have varied genetic origins and not just one single origin within the period of the existence of surnames (surnames become more common after the mid 14th century, particularly in England following the Black Death as there was more movement of labour to replace those lost to the plague)

The MacGregor DNA project at this date has 1427 members. In order to make the results easier to navigate I have divided them up into subgroups. The advantage to this is that within a subgroup it is easier to see individuals who are potentially related, in the more recent past, to each other, because they appear in the surname grid near to each other. It also shows up where there are multiple origins for surnames, and is particularly useful where some members of the subgroup are able to indicate an earliest known ancestor. In a good number of cases that ancestor will be shared with other members of the subgroup and some of the group may not have the genealogical information that others have and so find their link to the past. It can also suggest geographical location for the ancestor. The disadvantage is that members of different subgroups in the results grid cannot quickly compare their results with members of other subgroups, but in a Y chromosome study that is less of a problem. In any case everyone has access to a ‘Matches’ tab on their DNA results page and also everyone can use Ysearch and Mitosearch.org.
To further illustrate what can be done I will demonstrate with one subgroup. Regular readers of this blog know that I have offered to make any comparisons between individuals that are wanted. It would be helpful if this could normally be limited to 10-12 results for comparison. Also, please remember that I have to be able to compare like for like – so I cannot compare a 37 result with a 67 – I can only use the first 37 of the latter for the comparison.
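The like-for-like rule described above can be sketched in a few lines of Python. This is only an illustration, not the method used by any testing company: the function name and the shortened marker values are hypothetical, and the simple "sum of step differences" ignores the special handling that multi-copy markers usually receive.

```python
def genetic_distance(result_a, result_b):
    """Sum of absolute step differences over the markers both results share."""
    # Only the markers present in BOTH results are compared, so a
    # 67 marker result is cut back to its first 37 when set against a 37.
    shared = min(len(result_a), len(result_b))
    return sum(abs(a - b) for a, b in zip(result_a[:shared], result_b[:shared]))


# Hypothetical, shortened results: real ones have 37 or 67 values.
short_result = [13, 24, 14, 11, 11]
long_result = [13, 25, 14, 11, 12, 12, 12, 13]  # the extra markers are ignored

print(genetic_distance(short_result, long_result))
```

The extra markers in the longer result play no part in the comparison, which is why testing more markers only helps when both parties have done so.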

Example 1: Greig/Grieg/Gregg etc surname 37 marker result grid

The first example shows all the 37 marker results for this subgroup. No colours are present in this grid to show mutations because we are not trying to compare individuals to one ‘master’ result or even to an ‘average’ or modal result. It can be hard without colours to see how one individual is related to another but in general if results are closer together on the grid they would tend to be more closely related genetically (but just being close on the grid does not necessarily pick up results that are more related than others). That is why we use a graphics program to make comparing results more visual in what is known as a phylogenetic tree. Also the program used picks up similarities across results that might not be immediately obvious to the eye.

Example 2: Greig/Grieg/Gregg etc surname 37 marker phylogenetic tree

The second example therefore shows the chart in graphic form, generated by Splitstree (acknowledgement is given at the bottom of this blog). I have labelled them as the chart is labelled so perhaps the easiest way to make comparisons is to print off both examples and cross-refer between them. In this graphic representation some relationships become more obvious but there also are some surprises. The first, and most important, point to make is that we are seeing at least 11 distinct family groups who each shared a common ancestor many thousands of years ago long before surnames became common. So we see at least 11 different genetic origins for people called Greig, Grigg, Gragg and Gregor. In general, if results are closely clustered together on the graphic then they probably share a recent common ancestor (and by recent I mean since the acquisition of surnames).

1)    There are 7 individuals (kit 20673 is one) who share a common ancestor who could be the William Gregg born in 1616 or his immediate forebears - an early emigrant to the New World. This family have no spelling variations – always Gregg – so may have been literate from the earliest emigrant
2)    There is a group of 3 (kit 214992 is one), who are related and may have a connection to Tipperary in Ireland: these families are Gregg or Gragg
3)    There is another group of 3 (kit 239449 is one) - these connect to a common ancestor but there is no indication of who this might be in the genealogies submitted – these used spellings Grieg, Greig and Gregg
4)    Another group of 4 (kit 158127 is one) who seem to connect to Antrim in Ireland (using Gragg and Gregg)
5)    There is a group of 3 (kit 45360 is one) who connect to Edinburgh and Pathhead (a nearby village) (surnames are Greig and Gregg)
6)    There is a group of individuals who are genetically related and with geographical links to North East Scotland with very different versions of the name: Griggs, Greig and Grigor (9690 is one of these)
7)    A group of 2 (kit 6979 is one) using Gregg (no locations available)

All other individuals appear to belong to separate unrelated families although the distant possible connection between Gregor(y) 476609 and Charles Greig 585177 would be worth further investigation. Robert Gregor 239031 belongs to a completely different genetic haplogroup.
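The idea that closely clustered results probably share a recent ancestor can be illustrated with a very simple grouping routine. This is a rough sketch only and is not how Splitstree builds its networks: it merely links kits whose marker values differ by no more than a chosen number of steps. The kit labels, marker values and threshold are all hypothetical.

```python
def genetic_distance(a, b):
    """Sum of absolute differences, marker by marker."""
    return sum(abs(x - y) for x, y in zip(a, b))


def family_groups(results, max_distance):
    """Group kits that are linked, directly or via other kits, by a small distance."""
    unassigned = list(results)  # kit labels, in input order
    groups = []
    while unassigned:
        group = [unassigned.pop(0)]
        queue = list(group)
        while queue:
            current = queue.pop()
            # Kits not yet placed that sit close to the current kit.
            close = [k for k in unassigned
                     if genetic_distance(results[current], results[k]) <= max_distance]
            for kit in close:
                unassigned.remove(kit)
                group.append(kit)
                queue.append(kit)
        groups.append(sorted(group))
    return groups


# Hypothetical four-marker results for four imaginary kits.
results = {
    "kit_A": [13, 24, 14, 11],
    "kit_B": [13, 24, 14, 12],
    "kit_C": [13, 24, 15, 12],
    "kit_D": [12, 22, 15, 13],
}
print(family_groups(results, max_distance=1))
```

Here kit_C joins the first group through kit_B even though it is two steps from kit_A, which is the kind of indirect similarity that is hard to spot by eye on a grid.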

Example 3: Greig/Grieg/Gregg etc surname 67 marker grid

These examples [Examples 3 and 4] show the smaller group of individuals who have tested 67 markers (they are all in the 37 group discussed above). What we are interested in is whether the greater number of markers gives any further information on genetic connections. Since 50% of the sample shown in Examples 1 and 2 is now not present, the graphic representation is much sparser.

Example 4: Greig/Grieg/Gregg etc surname 67 marker phylogenetic tree

The problem that is immediately apparent [in Example 4] is that without a larger number of individuals testing to 67 (or more) markers, family groups do not break down significantly further. In Example 4 the only possible confirmation seen is that Kits 7489 Gregg, 214992 Gragg and 9690 Greig may share a common ancestor in the relatively distant past, but it is possible that all three share a geographical origin: North East Scotland, as suggested in point 6 above.
                                                            ***
By way of comparison I have used the same processes on the Gregory group, who also have diverse origins. What is particularly interesting with this group, given that membership of a DNA surname testing project is essentially based on random participation, is to see just how many individuals descend from the same ancestor. Given that there are forebears in common (for example Gideon Gregory, kits 58711 and 179683), this group probably had a common emigrant ancestor in the United States.

Example 5: Gregory surname 37 marker phylogenetic tree
Apart from that group of related individuals there are only 4 other, much smaller, groups whose members’ ancestry lies close to each other. The ‘earliest known ancestor’ given by each participant suggests a range of possible genetic origins (see the grid in Example 6).

Example 6: Gregory surname 37 marker grid

Dean McGee’s DNA Utility allows an estimate of time to most recent common ancestor.

Example 7: Gregory surname 37 marker Time to Most Recent Common Ancestor grid (partial)

This is only an estimate and the number of years suggested always depends on the confidence level chosen for the program – choosing 100% confidence would give a different result from choosing 10% confidence. In this example from the Gregory charts we see an estimate of the possible time to the shared ancestor for each individual in the group with each other person. In order to show a good number of results I have had to remove the labels from the top line of the grid, but they can simply be put in by hand, going from left to right along the top line in the same order as reading top to bottom. Notice, for example, that comparing the first two individuals, ‘Peter R. Gregory’ 275887 and Gregory 37140, suggests that they share a common ancestor 5220 years ago.
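To show why the confidence level changes the answer, here is a much simplified sketch. It is not the calculation used by the utility: it assumes that mutations accumulate as a Poisson process on both lines of descent, that every marker mutates at the same rate, and that there are no back-mutations. The mutation rate of 0.004 per marker per generation is an assumed round figure, and 30 years per generation is simply the average used for the charts in this blog.

```python
import math

YEARS_PER_GENERATION = 30  # average used for the charts in this blog


def tmrca_generations(distance, markers, rate, confidence):
    """Smallest number of generations g with P(TMRCA <= g) >= confidence.

    With a flat prior and a Poisson model, the time to the common ancestor
    follows a gamma distribution with shape distance + 1.
    """
    if not 0 < confidence < 1:
        raise ValueError("confidence must be strictly between 0 and 1")
    per_generation = 2 * markers * rate  # both lines of descent mutate
    g = 0
    while True:
        g += 1
        x = per_generation * g
        # Probability that the common ancestor lies further back than g.
        tail = sum(math.exp(-x) * x ** i / math.factorial(i)
                   for i in range(distance + 1))
        if 1 - tail >= confidence:
            return g


for confidence in (0.50, 0.75, 0.95):
    g = tmrca_generations(distance=4, markers=37, rate=0.004, confidence=confidence)
    print(f"{confidence:.0%}: within {g} generations (about {g * YEARS_PER_GENERATION} years)")
```

The higher the confidence demanded, the further back the estimate has to reach, and a confidence of exactly 100% has no finite answer under this model, which is why the sketch rejects it.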

All of these analyses benefit greatly – and benefit other genealogists – if testers indicate the name of their earliest known male ancestor with the surname – no matter how recent that might be.

If we now look at the phylogenetic tree created when we use only those kits that have tested to 67 marker level the only real difference is that some of the genetic distances seem to be clearer.

Example 8: Gregory DNA 67 marker grid
However, this particular program only separates by mutation – so if we look at the Gideon Gregory results again it looks like they come from different lines of the same family, rather than from the common ancestor Gideon. This is a limitation not of the program itself but of the inability to tell the program that two results come from the same ancestor. We have to remember that programs such as this were not designed primarily for family history but for comparing genetic markers in species, and not just the human species. After all, it would be next to impossible to say whether two turtles shared the same great great grandfather …

The development of SNP analysis

For several years now there has been an increasing focus on the testing and analysis of SNPs (single nucleotide polymorphisms). The difference between these and the more commonly tested STRs was given in my 2014 blog (opening paragraphs). To put it simply, SNPs are markers in time: as far as is known if these mutate they stay mutated in subsequent generations. What that means is that once enough SNPs are identified an element of dating can be applied to when the mutation happened. For family historians this fact is becoming hugely important. Dating SNPs to the time before surnames is of limited use to family historians but to have dates, even approximate, from the time after the adoption of surnames means that family surname groups can be split down into smaller and more recent family subgroups.  On the Greig and Gregor grids I show the SNP information which is assigned to each individual in the leftmost column. In most cases this is simply M269, a SNP that happened thousands of years ago. Some individuals have had some SNP testing done but very few people have had the ‘Big Y’ test done which takes results forward in time towards the present day, and identifies SNPs which may have happened between 500 and 800 years ago.

Greig SNPs
In the Greig grid confirmed SNPs are in green. M269 is too early in date to be considered so the only other SNPs to be taken into account are kit 476609 R-L1066; 363402 R-FGC10125; 259416 R-U152; 585177 R-Z253; 195430 R-U106; 295321 R-FGC5494; 404866 R-FGC37100; 110496 R-L21; B196295 R-ZP77; and 239031 T-M70.

Of these, R-U152, R-Z253, R-U106 and R-L21 are well known early SNPs which happened before surnames, sometimes by thousands of years, and most have further testing options available to bring the results further forward in time. FGC in the results indicates that the SNP was identified by Full Genomes Corporation (just as the other letters identify the source lab or individual who first identified the SNP in question). Of the other SNPs:

L1066 is more recent but still before surnames.

FGC10125 is the next SNP in the sequence for some individuals after L1065 [not the same as L1066], which is said to identify the Scots modal group [see http://www.ytree.net/DisplayTree.php?blockID=160]. Since L1065 roughly dates to 1750 years before the present, FGC10125 may have happened before surnames.

FGC5494 is European in origin but again is somewhat earlier than surnames, and has SNPs which descend from it towards the present time

FGC37100 is a descendant, or technically, ‘downstream’ of L151 – that SNP is again a much earlier one and found in England as well as other places.

ZP77 is the same as FGC6562 and is found in concentrations in Ireland and to a lesser degree in Scotland: it also has numerous downstream markers

Finally, T-M70 is a very early SNP with a distribution over southern Europe, the Middle East and East Africa. It is comparatively rare among tested individuals [see www.yfull.com/branch-info/T-M70/ which dates it to 16,000 years before the present].

Similar discussion on identified SNPs could be done for all the subgroups in the MacGregor DNA project. The Gregory group, for example, has the following identified SNPs (see example 6):
R-S16906; DF21; R-CTS7678; L48; Z343; R-P312; R-L1336; R-BY15955; R-S691

MacGregor DNA – current SNP analysis

The work which Neil McGregor in Australia has been doing in analysing MacGregor SNPs is not concerned with the earlier SNPs. We already knew from Jim Wilson’s work that most MacGregors in the main line group carried the SNPs S690+ and S697+, both probably dating from after 1200 AD (though no absolute dating is yet available). In his analyses Neil has begun to break down the test results of MacGregor participants into individual family groups – which the clan has known about for generations and which are referred to in older documents as the ‘houses’ or ‘sleik’ [of] Clan Gregor, the main ‘houses’ being Glenstrae, Roro, Gregor McIan (or Brackley), Dugall Keir and, more recently, ‘of Glencarnock’, the Chief’s line (Glencarnock is the area they held from the mid 18th century).

Neil’s current identification of family groups is given in Example 9:

Example 9: current predictions of SNPs associated with MacGregor family groups.

Neil’s email to me allowing this to be included in the blog suggests that MacGregors from the main line should now seriously consider doing the BigY DNA test (rather than FGC – Full Genomes Corporation – with whom we have also undertaken testing). He says:

“The best recommendation is that people get BigY as everybody seems to have between 3 and 8 separate SNPs which will allow them to be separated from everybody [else], other than from their own immediate family or first cousins. Some of them [those who have tested under BigY] appear to have a cluster of SNPs which appear to have mutated together and may represent one mutation. A mutation seems to be as low as once per generation through to once every 4-5 generations – seems related to the number of STR [the marker scores that participants start with] mutations as well.

The clan seems to be divided into two major clusters and this would appear to be early on. The section I am in has at least 3-5 sub-branches as does the other major group. The dividing SNP appears to be BY28714”.

Just to repeat that I can do comparisons of STR results for individuals – comparing with up to 10 to 12 others. I would repeat Neil’s encouragement to do BigY if you can – please ask me for further information if needed [richardmcgregor1ATyahoo.co.uk substituting @ for AT]

I have just had a comment from EMC which is worth repeating here in case folks miss it:
It is important to note that the results Neil has recently shown are also due to FTDNA reprocessing BigY kits under a new genome reference called HG38. Prior, under HG19, many of the SNP's used now were heretofore unknown. This SNP refinement is important.

Charts were constructed using Dean McGee’s Utility at http://www.mymcgee.com/tools/yutility.html?mode=ftdna_mode, using a 75% level of confidence, on Doug MacDonald’s mutation rate, an average of 30 years per generation and with no modal results assigned. The graphic representations of phylogenetic trees are made by Splitstree:

D. H. Huson and D. Bryant, Application of Phylogenetic Networks in Evolutionary Studies, Mol. Biol. Evol., 23(2):254-267, 2006

01 January 2017

MacGregor DNA Project January 2017 update

I used the 2015 update of this blog to explore the various tests available. Since then there has been a noticeable increase in the number of individuals taking the test known as ‘Family Finder’, or something similar, rather than Y chromosome or mtDNA tests. This has probably been a result of quite aggressive marketing, by Ancestry.com in particular [it has a variety of website endings depending on where it is based]. This has promoted the equal use of DNA testing for both males and females and tied it into the submission of family trees, which individual testers can use to identify the same family name(s) with others who have tested and submitted their genealogies. What has perhaps been rather glossed over in this is the fact, firstly, that DNA gets ‘lost’ over time (if it didn’t we would have the DNA of billions of ancestors in our bodies) and, secondly, that it is only a tiny portion of our DNA which is currently being examined for genealogical purposes. In relation to the latter, if you took all the DNA out of your approximately 100 trillion cells and stretched it out in a long line, with the DNA in one cell being about 2 metres long, it would reach to the moon and back hundreds of thousands of times. By comparison, if you did a similar process to your veins, arteries and capillaries they would measure about 100,000 kilometres (or 62,000+ miles), or roughly two and a half times the circumference of the earth.
     You do not inherit exactly 25% of your autosomal DNA from each of your 4 grandparents. This is because the DNA each parent passes on to you is a randomly recombined mixture of what they received from their own parents, and not in equal proportions. The further you go back in time, the smaller the percentage inherited from people in a particular generation becomes, and therefore the more distant the ancestor is, the more difficult it becomes to identify what you received from that person. What then are the chances of that same bit of DNA being preserved from a specific ancestor in yourself and someone else? For example, if your name is, say, Smith, and your male MacGregor ancestor lived 10 generations ago on your mother’s side, it is really not feasible with today’s technology to identify that DNA by looking at your DNA today. The tests which are offered by Ancestry, Family Tree DNA etc. only try to identify links to 5/6 generations back. The key thing to remember is that if you and someone else have, say, people called Brown in your trees it does not necessarily mean that you have a recent ancestor in common, or indeed that you have any ancestor called Brown in common at all. For these tests of connection to work properly, you, and the person you are comparing with, need to have as much genealogical information as possible on every ancestral line in your respective trees, going back 5 or 6 generations (as well as a significant shared portion of DNA). This information can be displayed on a fan chart such as this one, which can be viewed at:
Fig 1. fan chart example - Perry
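A little arithmetic shows why autosomal tests concentrate on the last 5 or 6 generations. The sketch below simply counts the ancestors in each generation and the average share of autosomal DNA that could come from any one of them. The function names are my own, the figures are averages only (the real share from a given ancestor varies because of recombination), and the count assumes no ancestor appears twice in the tree.

```python
def ancestors_in_generation(n):
    """Number of ancestors n generations back (1 = parents, 2 = grandparents)."""
    return 2 ** n


def average_share_percent(n):
    """Average percentage of autosomal DNA from one ancestor n generations back."""
    return 100 / 2 ** n


for n in (2, 5, 6, 10):
    print(n, ancestors_in_generation(n), round(average_share_percent(n), 3))
```

At 10 generations there are 1024 ancestral slots and the average contribution from any one of them is under a tenth of one percent, which is why a single MacGregor ancestor that far back cannot realistically be picked out.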

You and the person you are comparing with both have to put all your family information into one of the computer genealogy programs (like Family Tree Maker, Reunion etc.), save it as a GEDCOM file, and upload it to whichever DNA company you have tested autosomal DNA with. The possible links between the two family trees are then highlighted in some way which allows a comparison of ancestry to be made in order to explore if there is a family match on some surname. The fact that there is a match on surname does not necessarily mean that it is the same family, only that there is a surname in common. Clearly the more unusual the surname, the more likely that the match will be with the same family. In my own case the extensive GEDCOM file that I have made found a name match in another equally detailed GEDCOM tree which linked to a common ancestor born towards the end of the 18th century in the Volga German colonies in Russia, although in fact this was just confirmation of a previously suspected connection found by traditional genealogical research (I’ll come back to this later). Clearly autosomal testing COULD be very valuable, despite these caveats, for females who cannot take advantage of the Y chromosome test which only men can do since it relates directly to father’s surname (unless of course a father, brother, or cousin who has the name of interest will do the test on the female’s behalf).
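The surname matching described above can be sketched as follows. This is an illustration only and not what any DNA company actually runs: it relies on the GEDCOM convention that a surname is written between slashes on a NAME line, and the two tiny trees are invented.

```python
import re

# A GEDCOM individual's name looks like: 1 NAME John /MacGregor/
NAME_LINE = re.compile(r"^\s*1\s+NAME\s+[^/]*/([^/]*)/")


def surnames(gedcom_text):
    """Collect the surnames found on NAME lines, ignoring upper/lower case."""
    found = set()
    for line in gedcom_text.splitlines():
        match = NAME_LINE.match(line)
        if match and match.group(1).strip():
            found.add(match.group(1).strip().casefold())
    return found


def shared_surnames(tree_a, tree_b):
    """Surnames that appear in both trees, in alphabetical order."""
    return sorted(surnames(tree_a) & surnames(tree_b))


# Two invented trees for illustration.
tree_a = """0 @I1@ INDI
1 NAME John /MacGregor/
0 @I2@ INDI
1 NAME Mary /Brown/
"""
tree_b = """0 @I1@ INDI
1 NAME Peter /Brown/
0 @I2@ INDI
1 NAME Anna /Schmidt/
"""
print(shared_surnames(tree_a, tree_b))
```

A shared surname found this way is only a prompt for further research: as noted above, two Browns need not belong to the same family.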
     There is another benefit to autosomal testing in that portions of a person’s DNA can be compared with typical results from other ethnic groups. So, for example, a person might find through this utility (sometimes called My Origins or similar) that they have, for example, Mediterranean, African, or Native American ancestry as part of their genetic makeup. This won’t be a link to a specific person (unless of course this genealogy is already known or at least suspected) but is often of interest when an individual wants to know the answer to the question ‘where did I come from?’. This is explored a bit more later in this blog. What we invariably find is that our most ancient ancestry is incredibly mixed and that our own personal ethnic makeup links back to many different regions and races. The Economist, discussing the work of scientist and geneticist Luigi Luca Cavalli-Sforza, remarked that his work "challenges the assumption that there are significant genetic differences between human races, and indeed, the idea that 'race' has any useful biological meaning at all".

SNPs
     I want to spend most of the rest of this update considering the current and possible future significance of SNP testing. A SNP, or single nucleotide polymorphism, is a DNA ‘event’ which functions a bit like a marker in time. At the moment, I am here talking about SNPs which occur in the male line (and are often relatable to surnames). As far as is known, once an SNP occurs it does not re-mutate backwards. Because of this fact, if we can identify specific SNPs, as a first step, and if we can then date them, even approximately, we find ourselves on the way to constructing family trees, not just in prehistory but, potentially, for SNP events which took place within historic time. More than a decade ago Ken Nordtvedt and Dr John McEwan worked on ways of grouping STR results [that is, results from the standard male Y chromosome test] to show how some numerical patterns were constant within defined groups. This was limited at the time by the fact that only 37 markers were available in STR tests (whereas now one can test 67, 111 or even more). McEwan identified 49 groupings, and the one he called R1bSTR-47 came to be known as ‘Scots’. This was the DNA profile based on 37 markers that was identified as ‘Scots’ (Fig. 2):



 Fig 2.  The ‘Scots Modal’ and ‘MacGregor Modal’ compared
The lower figure was the modal figure for the MacGregor group who claimed descent from Ian Cam (who is in the record of obituaries written down by Sir James MacGregor Dean of Lismore as having died in 1390). What was exciting at the time was the realisation that with only three mutations different, the MacGregors seemed to be a group who had mutated a little way, but not far, from the Scots group. Since then some commentators and researchers, particularly Alistair Moffat and Jim Wilson in their book The Scots: A Genetic Journey have suggested that the MacGregors were actually Picts. Without wanting to go into the arguments for or against this interpretation in any depth I did want to ask: if these were the Picts then who are the Scots [this is an inversion of the way the question used to be asked]? If you look at the various DNA discussion boards you will see that many people disagree with their interpretation. It is nonetheless important to repeat it here because the definition of Pict was based, by Jim Wilson, on his interpretation of the SNP S530 which he found and named when he was attached to ScotlandsDNA [the company now has various versions of the same name such as IrelandsDNA, BritainsDNA and so on]. 
     Following on from this, S530 was found to be equivalent to L1335 [the confirmation came in 2012] and the search was on to find what SNPs were more recent in time than L1335/S530.  Four years ago not enough was known about SNPs to attempt a time estimate as to when they occurred, but since then the group known as YFull have given broad estimates, from their research, of possible dates for each SNP. At this point I am concentrating on just the MacGregor male line but later in this blog I will make reference to another line to demonstrate how SNP testing is currently revolutionising our understanding of genetic/clan genealogy going back into the past beyond the time of written genealogies. What we still lack are more SNPs coming forward into the time of written records although there are some SNPs which are beginning to divide up families into smaller subgroups with the same surname, particularly when associated with STR results.
     At this point I repeat a chart which I first included in the 2015 blog. The chart was derived by Jim Wilson from work which he did in relation to clan DNA origins through the ScotlandsDNA company (Fig. 3).


Fig. 3: The SNP tree of the Scottish clans as at 2014

This helpful map from Wikipedia shows how the clans relate to each other geographically (Fig. 4):



Fig 4.  Scottish_clan_map Wikipedia Commons.png

Taking each of the SNPs from S530/L1335 onwards, YFull have given the following time estimates:

L1335 (also known as [aka] S530) formed 4300ybp (years before present), time to most recent common ancestor 3600ybp
L1065 (aka CTS11722 or S749) formed 3600ybp, time to most recent common ancestor 1750ybp
S744 formed 1750ybp, time to most recent common ancestor 1750ybp
S691 formed 1750ybp, time to most recent common ancestor 1700ybp
S695 formed 1700ybp, time to most recent common ancestor 1550ybp [or, c.320 AD to c. 470 AD]
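The conversion from ‘years before present’ to calendar dates used in the last line of the list above can be sketched as follows. The reference year of 2020 is an assumption, chosen only because it reproduces the c. 320 AD and c. 470 AD figures given there. The function name is hypothetical, and the exact-looking BC dates should not be read as more precise than the estimates behind them.

```python
REFERENCE_YEAR = 2020  # assumption: makes 1700 ybp come out as c. 320 AD


def ybp_to_calendar(ybp, reference_year=REFERENCE_YEAR):
    """Turn a 'years before present' figure into an approximate AD/BC date."""
    year = reference_year - ybp
    if year > 0:
        return f"c. {year} AD"
    # There is no year 0, so a computed year of 0 corresponds to 1 BC.
    return f"c. {1 - year} BC"


for label, ybp in [("L1335 formed", 4300), ("S695 formed", 1700), ("S695 TMRCA", 1550)]:
    print(label, ybp_to_calendar(ybp))
```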

It is not clear from the above how YFull distinguish the time frames of the later SNPs, as they seem to be suggesting that they all arise around the same time. Other commentators on DNA have speculated that there are approximately 90 years on average between SNP DNA mutations, but that assumes a degree of regularity in the occurrence of mutations, which we know is not the case.
     If we work in reverse from S690 [see Fig. 3 above] which is the defining mutation of the MacGregor bloodline, and which, we think, could have arisen as late as 1360 AD and perhaps as early as 1200 AD, then, using the standard SNP sequence going backwards in time, S695 might have occurred as late as 1270 AD and as early as 1110 AD.  Then S691 above S695 as late as 1180 AD or as early as 1020 AD. Ian Cam MacGregor descendants will notice that there is no mention of S697 here. We simply don’t know enough about this SNP or FGC17830 which seem to be typical of the MacGregor main line to say anything definite about them at this point. Indeed, we know little about S701, FGC17831, S703, FGC17832, S27834, BY144, FGC17829, S27835, and BY143 which are currently found as positive ONLY in the SNP results of the MacGregor bloodline and in two Buchanan men. (By the way, it is very interesting that almost all other Buchanan men descend from the next level up on the ancestral tree as shown by the results collected and displayed on the excellent site run by Alex Williamson). The section for descendants of SNP L1335 can be found on Williamson’s site at: 

Fig 5. MacGregor/Buchanan section of Alex Williamson’s The Big Tree
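
The backward estimate from S690 described before Fig. 5 can be written out as a short calculation. This is a sketch under stated assumptions only: it uses a fixed gap of 90 years per SNP (the rough average mentioned earlier, which we know is not really regular) and follows just the three SNPs named in that paragraph. The function name step_back is purely illustrative.

```python
# Assumption: a fixed 90-year gap per SNP. This is only the rough average
# quoted by some commentators; real mutations do not occur so regularly.
YEARS_PER_SNP = 90


def step_back(latest_ad, earliest_ad, chain, gap=YEARS_PER_SNP):
    """Walk up a chain of SNPs, youngest first, one gap per step.

    Returns {snp: (latest AD, earliest AD)} for each SNP in the chain.
    """
    windows = {}
    for steps, snp in enumerate(chain):
        windows[snp] = (latest_ad - steps * gap, earliest_ad - steps * gap)
    return windows


# Youngest first, as in the text: S690, then S695, then S691.
chain = ["S690", "S695", "S691"]

# Starting window for S690 as suggested in the text: 1200 AD to 1360 AD.
windows = step_back(1360, 1200, chain)

for snp, (latest, earliest) in windows.items():
    print(f"{snp}: as late as {latest} AD, as early as {earliest} AD")
```

Changing YEARS_PER_SNP shows how sensitive these dates are to the assumed gap, which is precisely why they should be treated as rough guides rather than secure dates.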

If YFull are right in their estimates then it is quite possible that these SNPs [from S701 onwards] fill the gap between c. 500 AD and c. 1300 AD, implying that MacGregors remained a homogeneous group from 500 AD onwards, BUT this seems highly unlikely, to say the least. Looking at Williamson’s trees it is quite clear that there are many SNPs about which we need more information. Williamson shows the following as the sequence of SNPs from L1335 (and this same sequence is shown as part of the Scots Modal Panel which can be ordered from YSEQ):

L1335/S530 > L1065 > Z16325 > S744 > S691 > S695 > S701 > S690



Fig 6: Section of the S691 descendant SNPs from YSEQ

This image, which is taken from the YSEQ website, shows just how complex the clans’ family tree is becoming, and how it is changing as we learn more and more about SNPs. It is clear that the relationships which exist between Scottish clan groups will eventually be refined in such a way that the traditional stories of clan origins will either be confirmed or refuted. So, even though we cannot now say with absolute confidence what the exact sequence of SNPs leading from L1335 to S690 is, and cannot assign secure dates for their first appearance, it will only be a matter of time before all this becomes much clearer. It is remarkable how much has been achieved in just four years.
     For comparison, I attach another small section from Alex Williamson’s Big Tree – in this case for the McLeans (Fig. 7). We can see in this that the family groups are beginning to divide up based on individual SNPs separating family groups. The McLeans, like the MacGregors, have a small number of individuals who are quite separate from the mainline, in having the SNP M222 which often has Irish connections, or other SNPs which seem to indicate adoption of the name by different families during the time of change from patronymics to surnames.


Fig 7: McLean section from Williamson The Big Tree

What are patronymics?
     The use of patronymics versus surnames is not always well understood, so a brief discussion here might be helpful. Years before surnames became common in Scotland an individual might be referred to by his geographical location – so in Clan Gregor we have John of Glenorchy who flourished in the 13th century. It has always been believed that this John was one of the first MacGregors, but without a surname we have no real way of knowing if he was indeed a MacGregor – there’s always the possibility that he was an early Campbell. After the middle of the 14th century, and in Britain following the devastation wreaked by the Black Death, not only did individuals move around rather more than before but individuals were less tied by servitude to landowners. In Scotland individuals who managed to acquire some land-holding capability began to use surnames to identify family groupings, whereas, the common people around them would continue to be identified by their family relationships in the formula ‘the son of and grandson of’ (or ‘daughter of’ in the case of females) another male forename. Thus, whereas the most important MacGregor might be known as Patrick MacGregor of Glenstrae, the under tenants, who might, or might not, be distantly related to him, would be known as, for example, Patrick McEwan VicConachie [son of Ewan, grandson of Duncan]. As late as the middle of the 18th century the rental documents of the Menzies estates were still referring to males by their patronymics, but, by the end of the century all tenants had acquired and been identified by fixed surnames. These families might indeed be genetically MacGregors but they might equally be genetically Menzies or Drummond, or they might never have been genetically linked to one of the main families in that area at all. It is because of this variety of means by which names were acquired that clans include individuals who have a variety of genetic origins.

DNA testing companies and their products.
Y chromosome tests:
     It can be very difficult for people new to DNA testing to work out what the best test is for them. The answer depends on what question a person wants answered. If a male wants to know about his surname and its origins, then the Y chromosome test is in practice the only option. Very little is gained, however, by simply doing 12 or 25 markers, because comparisons with other people with the same surname are only effective when comparing 37 or more markers. Many of the programs which help to make comparisons with another individual’s marker scores work best with 67 or 111 markers, and some do not work with fewer than 67 markers. The advantage to participants of an FtDNA (Family Tree DNA) Y chromosome project is that the company has the largest publicly available database of Y chromosome results for comparison. Surname projects allow direct comparisons with others, but it is possible to keep results private and not visible to the general public; in any case, there is no way to identify an individual testee based on kit number alone.
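
To show why comparisons only make sense at the same marker level, here is a minimal sketch of a like-with-like comparison. It is a deliberately simplified step count and not the method used by FtDNA or by any comparison program: real tools treat some multi-copy markers differently, which this sketch ignores. The two sets of marker values are invented for illustration, and the function names are my own.

```python
# The standard FtDNA Y-STR panel sizes.
PANEL_SIZES = (12, 25, 37, 67, 111)


def common_panel(a, b):
    """Largest standard panel that both results cover in full, or None."""
    shared = min(len(a), len(b))
    fitting = [size for size in PANEL_SIZES if size <= shared]
    return max(fitting) if fitting else None


def step_distance(a, b):
    """Compare two results like with like.

    Both results are cut down to the largest panel they share, then the
    differences in repeat counts are summed marker by marker.
    Simplification: every marker is treated the same way.
    """
    panel = common_panel(a, b)
    if panel is None:
        raise ValueError("need at least 12 markers from each person")
    distance = sum(abs(x - y) for x, y in zip(a[:panel], b[:panel]))
    return panel, distance


# Invented repeat counts, 12 markers only for brevity. At this level a
# small distance says very little, which is the point made above.
person_a = [13, 24, 14, 10, 11, 14, 12, 12, 12, 13, 13, 29]
person_b = [13, 24, 14, 11, 11, 14, 12, 12, 11, 13, 13, 29]

panel, distance = step_distance(person_a, person_b)
print(f"Compared at {panel} markers: distance {distance}")
```

If one person has 67 markers and the other 37, the comparison falls back to 37 markers, which is what comparing like with like means in practice.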
     One of the results of Y chromosome testing is that an initial assignment of the results to a haplogroup is made. For the majority of Western Europeans this will be R1b-M269 while most others will be I-M253 [originally I1], or I-M223 [originally I2] (both associated with Scandinavia, and originally from the Balkans), and R-M512 (or R1a), whose origins lie in the Steppes. What the numbers indicate is an SNP that defines a specific group (for example, R1b-M269 is of Western Atlantic origin). Some companies, however, such as 23andme, still classify individuals by alphabetic lettering. Thus, in my own case 23andme labels my paternal line as R1b1b2a1a2f* ‘a subgroup of R1b1b2’ which is a ‘subgroup of R1b1’. R1b1b2 is then described in the accompanying explanation as:

·       Age: 17,000 years
·       Region: Europe
·       Example Populations: Irish, Basques, British, French
·       Highlight: R1b1b2 is the most common haplogroup in western Europe, with distinct branches in specific regions.

Hopefully the earlier part of this blog has shown why this doesn’t say much about recent genetic connections.
     For I1 [I-M253], for example, 23andme include generalised results for some famous individuals, including Leo Tolstoy and Warren Buffett. It’s unfortunate that some newspapers then choose to interpret such results as ‘meaning’ that a named person is ‘related’ to Warren Buffett (yes they are, but probably cousins at 10,000-20,000 years distance), or, even worse, that a named person is a ‘descendant’ of the historical figure (and that is extremely unlikely unless it happens to be Genghis Khan!!). In this way completely false stories about DNA relationships spread.
    The origins and spread of these haplogroups are shown on the attached map found in several forms on the internet:


Fig 8: Origin and Spread of haplogroups R1a, R1b and I

What about Big Y or Full Genome testing?
Big Y
FtDNA advertise their Big Y as follows:
“Nearly 25,000 known SNPs, placing you deep on the haplotree.
10 Million base-pair coverage - more than any other Y-DNA test on the market.
Find SNPs that may be completely unique to you.
Explore your deep paternal ancestry
Help the community uncover new, undiscovered SNPs.
Use your newly discovered SNPs to help grow the haplotree”.

Whereas FGC (Full Genome Corporation) offer:
The “GenomeGuide, a whole genome test for ancestry purposes, and Y Elite 2.1 a comprehensive test” of a person’s “Y chromosome. Y Elite 2.1 determines those markers (i.e. SNPs and STRs) that are most useful” for a person’s “paternal ancestry”.

Both these tests aim to locate SNPs on a male Y chromosome and may include SNPs classified as ‘private’, meaning that at this point in time they have only been found in a single or very limited number of individuals, and their exact significance to the more general tree or to an individual’s personal family tree has yet to be confirmed.
    It will be clear from the above product descriptions that FGC’s offer is more comprehensive (and they have other versions which probe the Y chromosome even more thoroughly, but cost considerably more). The essential problem lies in identifying which test, if any, gives the most useful information. Some project administrators make suggestions as to which more comprehensive test to take, or they highlight specific SNPs that an individual might choose, but these usually build on previous testing rather than being aimed at people starting to look at SNP testing for the first time. A good starting point is to observe what SNPs others in a group have already tested (FtDNA show these as ‘confirmed SNPs’ in green). Individuals who don’t know where to start with SNP testing do need to look for help from a project administrator regarding which SNP(s) to choose. If we take M269 (for group R1b) for example, in many projects in FtDNA this will show in red, meaning the SNP is predicted but unconfirmed. Normally the prediction is correct. If starting from this point probably the best thing to do, short of going straight to one of the two big tests mentioned above, is to have SNP L21 tested for positive or negative. If a person is L21 positive and doesn’t want to go down the line of Big Y or FGC testing then the next step, having looked at any confirmed green entries for SNPs in the sheets of Excel data for people lying nearby in the grid, is to go for an L21 SNP Panel either with FtDNA or with YSEQ.com (but using the latter will require a new registration and a new sample, although their pricing is competitive). If SNP testing is done with FtDNA, their results program will usually suggest what the next SNP tests might be. At a certain point in testing it is definitely worth (if only financially) trying an appropriate SNP Panel.
For instance, results in the STR Y chromosome tables for a surname project which lean towards L1335 suggest that that would be a good SNP Panel to test. Both FtDNA and YSEQ offer L1335 panels as well as individual SNPs (but doing one SNP at a time can get expensive).
     Just to re-emphasise: the advantage of the more comprehensive tests is that ‘private’ SNPs are often identified. Sometimes these are unique to an individual but sometimes they will be found in several individuals and therefore they may well define a discrete family group from within the historic period. However, in order to identify these as belonging to more than one person, other people who seem to be closely related (when looking at the other DNA male line results) need to test for the same ‘private’ SNPs. Many surname groups are working to try to identify these ‘private’ SNPs for family groups, both to advance genealogical links and to save participants some money!

Health related issues
Most DNA testing companies do not give reports which include information about health risks. Exceptionally, 23andme have offered health related reports in the past; after difficulties in America with the FDA they suspended these reports, but later reinstated some for the non-American market. These tests do not have a genealogical component and therefore will not be discussed further.

Ethnic mix
As mentioned earlier, FamilytreeDNA, through its MyOrigins report, ScotlandsDNA through Ancestry Painting, Ancestry.com through the AncestryDNA test, and 23andme through Ancestry Composition, all, with some variations in reporting procedures, aim to give an individual a ‘picture’ of his or her ancestral connections with populations around the world. Results naturally vary considerably from almost 100% European to real mixtures of different ancestral backgrounds including American Indian, Far Eastern, African and so on. Ancestry for example says that their DNA test ‘looks at a person's entire genome at over 700,000 locations’ and covers ‘26 ethnic regions’. Ancestry.com claim to have ‘more than 2 million people’ in their database and ‘the unique ability to connect with Ancestry’s billions of historical records and millions of family trees’. For further information on these tests and how they report see my 2015 blog on this site.

Discovering distant relatives
The reference to Ancestry.com in the above paragraph was deliberate. On the one hand the ability to contact other members part of whose DNA is the same as one’s own is clearly attractive. This is exactly what I was referring to in the opening paragraphs of this blog. The difficulty is that Ancestry does not remind you to check that the information you receive from others is actually accurate. Many a false genealogical connection has been made through eagerness to get back as far as possible. What many people do not realise is that the written records on which genealogies are constructed can be missing for some areas of the world. Even in Scotland the records for the counties in the very north are missing for many localities before 1800 and almost universally before 1750. Wars and carelessness, as well as the wide dispersal of the populations in remote locations meant that children might well not ever be baptised, or if they were, it was done whenever the minister happened to be in the locality. However, it was the parish clerk’s job to keep the records, not the minister’s, and the parish clerk might be tending his cattle 20 or more miles away.  The same cautionary statement holds true of FtDNA’s Family Finder in that what appears in an imported GEDCOM file only represents the family researcher’s work and, as with all internet genealogy, needs to be checked for accuracy.



Fig 9: Screen grab from Family Finder proposed matches

In this screen grab from Family Finder, for the sake of privacy and data protection, I have removed the picture details of matches including the email of the individual whose family includes an individual related to my own family. The match is Charlotta Major but she is not an ancestor in my line, but her father Konrad born in 1797 is. This then is not an MtDNA link (and in any case the person who is my match has not tested this, nor, being female, could she test the Y chromosome), it is an autosomal link with a male line which is my mother’s great grandfather. I have, however, been unable to identify any links with the other individuals listed as matches.

Which company then?
As I said earlier the choice of company depends entirely on what question or questions you want answered:


Fig. 10: DNA testing company list

I have not drawn out trees based on STR Y chromosome results this year [that is, those that appear in a chart for people in, for example, an R1b group as having a number sequence like 13, 24, 14, 10, 11,14 etc.] since these results are too diverse and complex when making a comparison between surname groups in the project which now has over 1200 participants, and sometimes even too variable within a surname project name subgroup [as for example Greer, Grier, Grierson in the MacGregor Project]. In short, there are now too many people in the project to do comparison charts that would have any real meaning. Also, the amount of detail would be far too great to permit any links to be seen. Because of this I repeat here my usual offer in relation to those who have tested their Y chromosome through STR tests. If you wish me to run a comparison with other participants, then please state the group or individuals with whom you wish to be compared and I will make a personalised graph for you and help you interpret the results. Please note though that it is only feasible to compare like with like (i.e. 67 markers with 67, 37 with 37). As usual my email address is richardmcgregor1ATyahoo.co.uk (substitute @ for AT). Please contact me offline also for advice on SNP test choices. Could members of the Ian Cam MacGregor group [the bloodline group] please note that the terminal SNP for the group is currently S690 and we do not yet have any ‘private’ SNPs to recommend, other than S696 and S698 which seem to be carried only by the Glencarnock line, and may have arisen in the last 250-300 years. Apart from the two known carriers of these SNPs other members of the Ian Cam group who have tested these SNPs have found them to be negative.