This post originally appeared on the Digging into Linked Parliamentary Data project blog, and is a guest post by one of the historians working on the project, Luke Blaxill.

The Dilipad project is exciting on the one hand because it will allow us to investigate ambitious research questions that our team of historians, social and political scientists, and computational linguists couldn’t otherwise address. But it’s also exciting precisely because it is such an interdisciplinary undertaking, one with the capacity to inspire methodological innovation. For me as a historian, it offers a unique opportunity not just to investigate new scholarly questions, but also to analyse historical texts in a new way.

We must remember that, in History, familiarity with corpus-driven content analysis and semantic approaches is minimal. Almost all historians of language use purely qualitative approaches (i.e. manual reading) and are unfamiliar even with basic word-counting and concordance techniques. Indeed, the very idea of ‘distant reading’ with computers, and of categorising ephemeral and context-sensitive political vocabulary and phrases into analytical groups, is highly controversial even for a single specific historical moment, let alone diachronically or transnationally over decades or even generations. The reasons for this situation in History are complex, but can reasonably be summarised as stemming from two major scholarly trends which have emerged in the last four decades. The first is the wide-scale abandonment of quantitative History after its perceived failures in the 1970s, and the migration of economic history away from the humanities. The second is the influence of post-structuralism from the mid-1980s, which encouraged historians of language to focus on close readings, and to shift from the macro to the micro, and from the top-down to the bottom-up. Political historians’ ambitions became centred on reconstructions of localised culture rather than on ontologies, cliometrics, model making, and broad theories. Unsurprisingly, computerised quantitative text analysis found few, if any, champions in this environment.

In the last five years, the online release of a plethora of machine-readable historical texts (among them Hansard) and the popularity of Google Ngram have reopened the debate on how, and how far, text analysis techniques developed in linguistics and the social and political sciences can benefit historical research. The Dilipad project is thus a potentially timely intervention, and presents a genuine opportunity to push the methodological envelope in History.

We aim to publish outputs which will appeal to a mainstream audience of historians who will have little familiarity with our methodologies, rather than to prioritise a narrower digital humanities audience. We will aim to make telling interventions in existing historical debates which could not be made using traditional research methods. With this in mind, we are pursuing a number of exciting topics using our roughly two centuries’ worth of Parliamentary data, including the language of gender, imperialism, and democracy. While future blog posts will expand upon all three areas in more detail, I offer a few thoughts below on the first.

The Parliamentary language of gender is a self-evidently interesting line of enquiry for a historical period in which the role of women in the political process in Great Britain, Canada, and the Netherlands was entirely transformed. There has been considerable recent historical interest in the impact of women on the language of politics, and in female rhetorical culture. The Dilipad project will examine differences in vocabulary between male and female speakers, such as in the kinds of topics raised, and also in discursive elements: hedging, modality, the use of personal pronouns, and other discourse markers, especially those which convey assertiveness and emotion. Alongside purely textual features, we will analyse how the position of women in parliament changed over time and between countries (how long they spoke, how frequently they were interrupted, the impact of their discourse on the rest of the debate, etc.).
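To give non-specialist readers a feel for what comparing discourse markers between groups of speakers might involve, here is a minimal Python sketch. It is not the project's actual method. The marker lists, the `marker_rates` function, and the two sample speeches are all hypothetical and purely illustrative, and any real study would need historically informed lexicons and far more data.

```python
import re
from collections import Counter

# Illustrative marker lists only (assumptions, not the project's lexicons).
HEDGES = {"perhaps", "possibly", "maybe", "somewhat", "rather"}
FIRST_PERSON = {"i", "me", "my", "we", "us", "our"}


def tokenize(text):
    """Lower-case the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def marker_rates(speeches, markers, per=1000):
    """Return occurrences of `markers` per `per` tokens for each speaker group.

    `speeches` is an iterable of (group, text) pairs; this shape is an
    assumption made for the sketch, not the format of the Dilipad data.
    """
    hits = Counter()
    totals = Counter()
    for group, text in speeches:
        tokens = tokenize(text)
        totals[group] += len(tokens)
        hits[group] += sum(1 for token in tokens if token in markers)
    return {group: per * hits[group] / totals[group]
            for group in totals if totals[group]}


# Invented sample sentences, not quotations from Hansard.
speeches = [
    ("female", "I think we should perhaps reconsider this clause."),
    ("male", "The House must reject the amendment."),
]

hedge_rates = marker_rates(speeches, HEDGES)
pronoun_rates = marker_rates(speeches, FIRST_PERSON)
```

Normalising by the number of tokens matters here because groups of speakers differ greatly in how much they said, so raw counts alone would mostly reflect who spoke more.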

A second area of great interest will be how women were presented and described in debate, both by men and by other women. This line of enquiry might present an opportunity to use sentiment analysis (which in itself would be methodologically significant) to shed light on positive or negative attitudes towards women in the respective political cultures of our three countries. We will analyse tone, and investigate what vocabulary and lexical formations tended to be most associated with women. We can also investigate whether the portrayal of women varied across political parties.
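One simple way to approach the question of which vocabulary was associated with women is to count the words that appear near a target term. The sketch below illustrates that idea only. The `collocates` function, the target and stopword lists, the window size, and the sample text are all assumptions made for illustration, not a description of the project's pipeline.

```python
import re
from collections import Counter

# Hypothetical target terms and a tiny stopword list, for illustration only.
TARGETS = {"woman", "women"}
STOPWORDS = {"the", "of", "a", "and", "to", "in", "are", "is"}


def collocates(text, targets=TARGETS, window=3):
    """Count words occurring within `window` tokens of any target term.

    Note that this naive version ignores sentence boundaries, so a window
    can reach into the neighbouring sentence.
    """
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter()
    for i, token in enumerate(tokens):
        if token not in targets:
            continue
        start = max(0, i - window)
        end = min(len(tokens), i + window + 1)
        for j in range(start, end):
            neighbour = tokens[j]
            if j != i and neighbour not in STOPWORDS and neighbour not in targets:
                counts[neighbour] += 1
    return counts


# Invented sample text, not a quotation from any parliamentary record.
sample = ("The honourable women of this country deserve the vote. "
          "Women are capable electors.")

counts = collocates(sample)
```

Raw neighbour counts like these are only a starting point: frequent words will dominate unless the counts are compared against how often each word appears in the corpus as a whole.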

More broadly, this historical analysis could help shed light on the wider impact of women on Parliamentary rhetorical culture. Was there a discernible ‘feminized language of politics’, and if so, where did it appear, and when? Similarly, was there any difference in Parliamentary behaviour between the sexes, with women contributing disproportionately more to debates on certain topics, and less to others? Finally, can we associate the introduction of new Parliamentary topics or forms of argument with the appearance of women speakers?

Insights in these areas – made possible only by linked ‘big data’ textual analysis – will undoubtedly be of great interest to historians, and will (we hope) demonstrate the practical utility of text mining and semantic methodologies in this field.