This is a post by team member Graeme Hirst of the University of Toronto:

Is it possible to tell what a politician’s ideology is, or what party they are a member of, just by looking at a list of the words that they do and don’t use?

You might expect that even when two politicians express completely contrary opinions on a topic, they would use much the same words, mostly just opinion-neutral words germane to the topic of discussion. But research on members of the U.S. Congress by Daniel Diermeier and colleagues has shown that extreme conservatives can be distinguished from extreme liberals just by their vocabulary. A key shibboleth is the word “gay”, preferred by liberals, whereas conservatives say “homosexual”. Many other, more subtle differences are apparent as well. However, these differences apply only at the edges, and they don’t discriminate the more moderate conservatives from the more moderate liberals.

We wondered whether this result could be replicated on members of the Canadian Parliament, where party discipline is more rigid than in the U.S. Congress. Indeed, we found that it’s pretty easy to separate Liberals from Conservatives just by their vocabulary, but with a big caveat! When we applied our method, derived from the Hansards of the Chrétien Liberal government, to Hansards of the Harper Conservative government, we got systematically wrong answers. A closer examination showed that the vocabulary differences that we found weren’t discriminating between Liberal and Conservative, but rather between government, using words of defence and felicitation, and opposition, using words of attack — regardless of which party is which.
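To make the confound concrete, here is a minimal sketch in Python. It is not the method from our paper: it swaps in a tiny hand-rolled naive Bayes classifier over word counts, and the "speeches" are invented toy sentences, not Hansard text. The function names (`train`, `predict`, `accuracy`) are hypothetical. The classifier is trained on an era in which the Liberals govern and then tested on an era in which the Conservatives govern.

```python
import math
from collections import Counter

def train(docs):
    """Count words per label; docs is a list of (text, label) pairs."""
    word_counts, doc_counts = {}, Counter()
    for text, label in docs:
        doc_counts[label] += 1
        word_counts.setdefault(label, Counter()).update(text.lower().split())
    return word_counts, doc_counts

def predict(model, text):
    """Multinomial naive Bayes with add-one smoothing over the training vocabulary."""
    word_counts, doc_counts = model
    vocab = set().union(*word_counts.values())
    total_docs = sum(doc_counts.values())
    best_label, best_score = None, -math.inf
    for label, counts in word_counts.items():
        score = math.log(doc_counts[label] / total_docs)
        denom = sum(counts.values()) + len(vocab)
        for word in text.lower().split():
            if word in vocab:  # ignore words never seen in training
                score += math.log((counts[word] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

def accuracy(model, docs):
    """Fraction of docs whose predicted label matches the true label."""
    return sum(predict(model, text) == label for text, label in docs) / len(docs)

# Invented toy data. Era A: Liberals in government, Conservatives in opposition.
era_a = [
    ("we are proud of this investment and congratulate the minister", "Liberal"),
    ("our government is pleased to support this excellent program", "Liberal"),
    ("the minister has failed and this scandal is a disgrace", "Conservative"),
    ("why will the government not admit its shameful waste", "Conservative"),
]
# Era B: the roles are swapped, so the Conservatives now defend and the Liberals attack.
era_b = [
    ("our government is proud to support this excellent investment", "Conservative"),
    ("this scandal is a disgrace and the minister has failed", "Liberal"),
]

model = train(era_a)
print("same-era accuracy: ", accuracy(model, era_a))
print("cross-era accuracy:", accuracy(model, era_b))
```

On this toy data the classifier labels every Era B speech with the wrong party, because the only signal it learned was the language of defence versus the language of attack. That is the same pattern, in miniature, as the systematically wrong answers described above.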

If that analysis is correct, then the confound should disappear if we look at parliamentary debates where there is no government and opposition per se. Using English-language data from the proceedings of the European Parliament, collected by our Dilipad colleague Maarten Marx, we found that we could indeed distinguish speakers of left-wing parties (marked by words such as “unions”, “equality”, and “gender”) from those of right-wing parties (“subsidiarity”, “competitiveness”, “Christian”) with a fairly high accuracy, and we could pick out the speaker’s exact party from among the five largest parties with an accuracy far greater than chance.

Our results cast doubt on the generality of the results of research that uses words as features in classifying the ideology of speech in legislative settings, and possibly in political speech more generally. Rather, the language of attack and defence, of government and opposition, may dominate and confound any sensitivity to ideology. So our next step, as part of the Dilipad project, will be to move beyond simple word lists to analyze syntax, discourse structure, and the structure of arguments in parliamentary speech.

The research, including our methods for text classification, is described in detail in our newly published paper, “Text to ideology or text to party status?” by Graeme Hirst, Yaroslav Riabinin, Jory Graham, Magali Boizot-Roche, and Colin Morris, in From Text to Political Positions: Text analysis across disciplines, edited by Bertie Kaal, Isa Maks, and Annemarie van Elfrinkhof (Amsterdam: John Benjamins, 2014).