The IHR Blog

Big Data


Announcing BUDDAH – Big UK Domain Data for the Arts and Humanities

We are delighted to have been awarded AHRC funding for a new research project, ‘Big UK Domain Data for the Arts and Humanities’. BUDDAH aims to transform the way in which researchers in the arts and humanities engage with the archived web, focusing on data derived from the UK web domain crawl for the period 1996–2013. Web archives are an increasingly important resource for arts and humanities researchers, yet we have neither the expertise nor the tools to use them effectively. Both the data itself, totalling approximately 65 terabytes and constituting many billions of words, and the process of collection are poorly understood, and it is possible to draw only the broadest of conclusions from current analysis.

A key objective of the project will be to develop a theoretical and methodological framework within which to study this data, which will be applicable to the much larger on-going UK domain crawl, as well as in other national contexts. Researchers will work with developers at the British Library to co-produce tools which will support their requirements, testing different methods and approaches. In addition, a major study of the history of UK web space from 1996 to 2013 will be complemented by a series of small research projects from a range of disciplines, for example contemporary history, literature, gender studies and material culture.

The project, one of 21 to be funded as part of the AHRC’s Big Data Projects call, is a collaboration between the Institute of Historical Research, University of London, the British Library, the Oxford Internet Institute and Aarhus University.

Digging into Linked Parliamentary Data

We were delighted to hear on 15 January that the IHR, along with the universities of Amsterdam and Toronto, King’s College London and the History of Parliament Trust, has been awarded funding by the international Digging into Data Challenge 2013. ‘Digging into Linked Parliamentary Data’ is one of fourteen projects which, over the next two years, will investigate how computational techniques can be applied to ‘big data’ in the humanities and social sciences.

Parliamentary proceedings reflect our history from centuries ago to the present day. They exist in a common format that has survived the test of time, and reflect any event of significance (through times of war and peace, of economic crisis and prosperity). With carefully curated proceedings becoming available in digital form in many countries, new research opportunities arise to analyse this data, on an unprecedented longitudinal scale, and across different nations, cultures and systems of political representation.

Focusing on the UK, Canada and the Netherlands, this project will deliver a common format for encoding parliamentary proceedings (with an initial focus on 1800–yesterday); a joint dataset covering all three jurisdictions; a workbench with a range of tools for the comparative, longitudinal study of parliamentary data; and substantive case studies focusing on migration, left/right ideological polarization and parliamentary language. We hope that comparative analysis of this kind, and the tools to support it, will inform a new approach to the history of parliamentary communication and discourse, and address new research questions.

A project website will be up and running in the next few weeks, so watch this space for more information!

Exploring the Domain Dark Archive

Bookings are open now for the first of a series of day workshops on the Domain Dark Archive, a comprehensive archive of websites from the UK web domain for the period 1996 to 2010. Our new JISC-funded AADDA project (Analytical Access to the Domain Dark Archive) is a joint venture with the University of Cambridge and the British Library, who curate the archive on behalf of the JISC. A summary of the project is available on the project blog.

What on earth would one do with such an enormous and varied dataset? That was the question I attempted to answer on the project blog, and the workshop for historians on May 24th is an opportunity to imagine collectively the questions historians might ask of the data. These workshops will provide the British Library with a crucial orientation in the process of designing the interface for the data, which is not yet publicly available. It is, then, a rare opportunity to shape the development of what may prove to be a transformative resource.

Details of the workshop for historians on May 24th in London, and of how to book a place, may be found here.