By Jonathan Blaney
In this post, Jonathan Blaney—editor of British History Online—explains the latest stage in BHO’s project recording the history of History PhDs in the UK and Ireland. Earlier this year we completed work to digitize, index and publish records of nearly 30,000 theses awarded between 1901 and 2014. These records, derived from historical data gathering exercises by the IHR, are now freely available on British History Online. In an earlier post we described this data and how it might be used to chart trends in History PhD research over the decades.
Here Jonathan explains his latest work to add links from thousands of doctorate listings in BHO to EThOS, the British Library’s online catalogue of all UK PhDs. As a result it’s now possible to browse BHO for theses in your field, link to the BL and from there (where available) click through to a digital copy of the full thesis.
British History Online has recently published two sets of listings: theses in history awarded between 1970 and 2014 (22,000 records), and an earlier set of theses awarded from 1901 to 1970 in the UK and Ireland (7,000 records). The listings are free to read on the website and the underlying data is also freely available to download if people want to use it to look for trends in the writing of history, or the history profession, over that period.
But what if you want to read one of the listed theses?
Many readers will know that the British Library runs an online service called EThOS, which lists doctoral theses awarded in the UK. Going back to the eighteenth century, this listing now numbers about 600,000 theses. At a minimum EThOS will let you discover the awarding institution, which should hold a copy of the thesis, but it frequently also offers a link to the thesis on an institutional repository, or its own copy for download.
If you know about EThOS then it’s relatively easy to go from British History Online to EThOS and manually search for the thesis you are interested in. But we decided to try to add links from our listing of a particular thesis to its listing on EThOS for the 22,000 theses in our 1970-2014 data set.
From the outset we knew this would not be very easy, because the title on our listing often varies from the title on EThOS. To the human eye these differences may not even register: it’s obviously the same thesis! But a computer program would compare two titles that are exactly the same except for an extra space in one, and conclude that there is no match. Computers are very literal. Our first attempts at matching literally but case-insensitively only gave us a success rate of about 10%.
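To see how brittle literal matching is, here is a minimal Python sketch. The function name and both titles are invented for illustration and are not taken from BHO or EThOS data.

```python
def literal_match(title_a: str, title_b: str) -> bool:
    """Compare two titles exactly, ignoring only letter case."""
    return title_a.casefold() == title_b.casefold()


# Invented titles: identical apart from case and one space before the colon.
bho_title = "British foreign policy: a study"
ethos_title = "British Foreign Policy : a study"

print(literal_match(bho_title, bho_title.upper()))  # case alone is tolerated
print(literal_match(bho_title, ethos_title))        # the extra space defeats it
```

The first comparison succeeds and the second fails, even though a human reader would treat all three strings as the same title.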
The problem is that if we relax the criteria for a match, using fuzzy matching of some kind, we run the risk of numerous false positives: matches to the wrong thesis on EThOS. There are roughly 30 theses on EThOS for every thesis in our 1970-2014 listing (about 600,000 against 22,000), so this is a very real possibility. We think adding the wrong link is worse than not having a link at all.
After a bit of trial and error we used these steps for our matching process:
- truncate each thesis title to 23 characters
- match case-insensitively
- match on the author surname as well, to cut down false positives
- make all punctuation optional, with an optional space before things like colons
This proved to be a reasonable compromise. We matched around 70% of the titles we were hoping to find on EThOS.
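The four steps listed above can be sketched in Python roughly as follows. This is an illustration and not the code we actually ran: the function names, the `surname` and `title` record fields, and the sample titles are all assumptions made for the example, and whitespace handling is one plausible reading of the rules.

```python
import re

TITLE_PREFIX_LENGTH = 23  # truncation length from the steps above


def build_title_pattern(bho_title: str) -> re.Pattern:
    """Build a case-insensitive regex from the first 23 characters of a title."""
    parts = []
    for char in bho_title[:TITLE_PREFIX_LENGTH]:
        if char.isspace():
            parts.append(r"\s+")  # tolerate runs of whitespace
        elif char.isalnum():
            parts.append(re.escape(char))
        else:
            # Punctuation is optional, with an optional space before it.
            parts.append(r"\s?(?:" + re.escape(char) + r")?")
    return re.compile("".join(parts), re.IGNORECASE)


def is_match(bho: dict, ethos: dict) -> bool:
    """Require the same surname and a matching title prefix."""
    if bho["surname"].casefold() != ethos["surname"].casefold():
        return False
    # match() anchors at the start, so only the opening of the title is tested.
    return build_title_pattern(bho["title"]).match(ethos["title"]) is not None


# Invented records, for illustration only.
bho = {"surname": "Jones", "title": "British Foreign Policy: the Balkans"}
same = {"surname": "JONES", "title": "British foreign policy : the Balkans"}
other_author = {"surname": "Smith", "title": "British foreign policy : the Balkans"}
other_thesis = {"surname": "Jones", "title": "British Foreign Policy and Trade"}

print(is_match(bho, same))          # matches despite case and spacing
print(is_match(bho, other_author))  # rejected on surname
print(is_match(bho, other_thesis))  # a false positive on the shared opening
```

The last comparison shows the trade-off: a short prefix plus a common surname will sometimes link two different theses.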
There were a few false positives, but each would very likely match a true positive as well, so we could manually check these by looking for BHO theses with multiple matches.
Take, for example, a thesis authored by Jones which begins:
British Foreign Policy …
That is already 23 characters and can easily match a different thesis authored by a Jones. We had to accept a small number of these as the price for sufficient correct matches, and the time taken to manually check theses with multiple links was quite small.
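Finding the theses that need a manual check is simple once the matches are stored as pairs. The sketch below assumes a list of (BHO id, EThOS id) pairs; the identifiers and the function name are hypothetical.

```python
from collections import defaultdict


def theses_needing_review(matches):
    """Return the BHO ids that matched more than one EThOS record.

    `matches` is an iterable of (bho_id, ethos_id) pairs.
    """
    ethos_ids_by_bho = defaultdict(set)
    for bho_id, ethos_id in matches:
        ethos_ids_by_bho[bho_id].add(ethos_id)
    # Only a BHO thesis with two or more distinct matches is ambiguous.
    return {
        bho_id: sorted(ethos_ids)
        for bho_id, ethos_ids in ethos_ids_by_bho.items()
        if len(ethos_ids) > 1
    }


# Hypothetical identifiers, for illustration only.
matches = [
    ("bho-1", "ethos-a"),
    ("bho-1", "ethos-b"),
    ("bho-2", "ethos-c"),
]
print(theses_needing_review(matches))  # {'bho-1': ['ethos-a', 'ethos-b']}
```

Only the ambiguous records go to a human checker, which is why the manual work stays small.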
If you look at our listings now you will see lots of links, but perhaps not 70% of the 22,000 theses in our listing. There are two more caveats here, to do with the scope of EThOS. EThOS does not list non-doctoral theses, such as an M.Phil., but BHO does: these will never match. Equally, BHO lists theses awarded in Ireland, but EThOS confines itself to theses awarded in the UK, so these too will never match. We think that 70% of matchable theses have been matched: that’s just short of 11,000 thesis records in total.
So if you don’t find an EThOS link on one of our PhD listings it’s still worth looking yourself: there’s a good chance you will find it on EThOS. If you do, please contact us and send us a link so we can add it.
Where we have identified a match and added a link it’s now possible to move swiftly to the full text of your chosen thesis where this is provided on EThOS. In the example below (from our list of theses awarded in 2011) you’ll see links to 5 of the 6 titles (the missing link here is a York MA thesis which isn’t included in EThOS).
EThOS records, especially for recent theses, typically provide an abstract, so you can get a sense of whether a thesis is useful for your research. Many BL records also provide follow-on links to a copy of the dissertation itself, as for Dr Alison Ronan’s 2010 Keele thesis, ‘A small vital flame: anti-war women’s networks in Manchester, 1914-18’.
Using BHO’s thesis records in tandem with EThOS makes for a really powerful tool. BHO’s indexing of its records enables you to search across thousands of History dissertations by a range of attributes, including (to use Sara Wolfson’s record from the listing above): by subject (British and Irish History); year of completion (2010); university (e.g. Durham); chronological coverage (‘1625-1669’); temporal and thematic ‘categories’ (‘Gender and Women’ / ’16th-17th century’); index terms (‘court’ / ‘aristocracy’ etc.); and supervisor/s (Natalie Mears and Toby Osborne).
British History Online therefore provides many opportunities for searching thousands of theses for what’s of interest to you (subject, chronology, supervisor and so on). Not surprisingly, this granularity of search isn’t available on EThOS. But once you’ve found something of interest via BHO, and there’s an EThOS link, it’s now just two clicks from discovering a thesis to reading the full text.
For more on this project see our earlier blog posts ‘30,000 PhD theses now available on British History Online’ (June 2020) and ‘BHO theses completed: making the data available’ (July 2020).
British History Online (BHO) is a collection of nearly 1300 volumes of primary and secondary content relating to British and Irish history, and histories of empire and the British world. BHO also provides access to 40,000 images and 10,000 tiles of historic maps of the British Isles.
Within BHO Premium you’ll also find 200 volumes of prime research content via institutional and personal subscription; trial subscriptions are available for institutions. BHO was founded by the Institute of Historical Research and the History of Parliament Trust in 2003. It’s since grown into an essential resource for teachers and researchers which is regularly updated with new content.
Jonathan Blaney is Editor of British History Online and Head of Digital Projects at the Institute of Historical Research.