This is the second in our series of final reports by the AADDA project researchers, posted with their permission. This one is by Saskia Huc-Hepher:

AADDA Testing Report: The French Community in London
by Saskia Huc-Hepher

1 – Methodology

The initial purpose of this research was two-fold: firstly, to use the geo-indexing tool to map out the areas of London with the greatest concentrations of French inhabitants on the basis of the post-codes associated with ‘French’ Web sites / spaces; and, secondly, to identify French community websites in the Domain Dark Archive (DDA) appropriate for subsequent multimodal analysis on the basis of their visual and textual meaning potentialities. The ultimate objective of the former was to triangulate the findings of additional empirical research conducted within the framework of my PhD, which sought to ascertain the actual numbers and hot-spots of the London French community, thereby serving to dispel the myth of an exclusively, or at least predominantly, South Kensington community. The aim of the latter was to scrutinise the visual landscape of the London French over the period of the DDA data set, as (re)presented through the images – still or moving, in parallel to the technological advances of the Internet – displayed on the French community websites found in the DDA. It was envisaged that this historical visual data would provide the study with greater temporal contextualisation and depth, and, using social semiotic theory, in particular multimodality, would allow meaning to be inferred and ethnographic conclusions drawn from the images, on such subjects as the community’s sense of belonging; how they perceive and conceive London and its inhabitants; how they (re)present and define their own identity through images; what elements of France and Frenchness they portray and promote; and whether any of these have changed over time.
Similarly, it was hoped that the geo-indexing analysis would be of historical value, determining whether or not there was any relationship between the areas most associated with the London French today and those districts favoured in previous waves of migration to the capital. 
The final objective of the DDA research proposed here was to use the image-tagging analytical tool to search, by a word or combination of words such as ‘French’ and ‘London’, for photographs or images only. This visual data could potentially serve to triangulate the findings of the geo-indexing investigation, in that the images and spaces associated with key words such as ‘London’, or specific areas within London, might coincide with the places and spaces identified as being particularly French through the geo-indexing process and/or historically. This micro-investigation was therefore to be binary in its objectives: visual data for ethnosemiotic analysis and geo-indexing data for triangulation of previous qualitative research.
The methodology outlined above was adopted on several occasions over the course of the AADDA project time-span: firstly in March 2013, later in August 2013 and September 2013, with a final trial, using the most functional interface and comprehensive data set, in October 2013. The results, at every stage, however, were disappointing. 

2 – Deep Search Data Testing

March 2013
The first trial session was carried out in the knowledge that at that point in time the DDA included only a random subset of the entire cohort of data, but one which was evenly spread over the archive in temporal terms. Therefore, in theory, trends, developments and patterns should have been identifiable, despite sentiment analysis and geographic options not being available at that stage. In practice, however, a number of basic search hurdles prevented any valuable findings from materialising. These included:

  • the lack of clarity regarding the need to click on the crawl date to access a website; choosing the website title would have been more intuitive. Such functionality was updated at the subsequent meeting (21/03/2013);
  • the lack of clarity regarding the purpose of the bar charts at the top of the page; they have since been removed;
  • the fact that not all web captures functioned at that time – e.g. Le Petit Parisien restaurant had no images and almost no text (but enabled me to do a current Google search for the website, only to find out that the restaurant – and website – is now closed; this is therefore an example of the potential historical worth of the DDA, had it been operating correctly, in allowing the analysis of obsolete Websites);
  • some websites cited in the list of ‘hits’ subsequently being found to be unavailable; the links to alternative sites proved to be useful, however;
  • time being wasted revisiting Websites which had already been scrutinised. Once a site has been viewed, it would be helpful and more time-efficient if the visited link appeared in a different colour (e.g. purple, cf. Google) from the others on the list;
  • the fact that search tools operated extremely slowly and the interface was not yet user-friendly. Speeds and appearance have since improved and the latter is no doubt a work in progress;
  • identical snapshots being listed under multiple crawl dates. Here, every separate date in the July (burka scandal) peak (as well as all the other dates in August and October 2008, the two snapshots available from 2009 and the single one from 2012) showed the same snapshot from The Guardian (12 July 2008). If the online material is unchanged in relation to another date, this should be immediately visible on the list of data (possibly via colour coding, as suggested for the pre-visited Web pages, or grouping by content & date);
  • the majority of search results not being particularly useful for my purposes; they were either not relevant (for instance displaying large numbers of Websites related to French tourism for English users) or not French-specific (that is, ‘Londres’ retrieved results in Portuguese, Spanish, etc., not French exclusively; while English search words retrieved sites aimed at Francophiles as opposed to Francophones);
  • phrase searching using the “double inverted commas” being equally disappointing (nothing of relevance was found following a search for “French community London”, or indeed ‘“French” and “community”’, trialled at a later stage); “French London” was therefore tested, resulting in a list of sites relating to French teachers & jobs in London.

Conversely, it was useful to have the ‘media’ / ‘pdf’ search options at the bottom of the screen, as this enabled access to images and audio ‘texts’ (of relevance to the multimodal methodological / theoretical approach taken in my research).
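By way of illustration, the suggestion above that unchanged snapshots be grouped by content and date could be approached along the following lines. This is only a minimal sketch and not a description of how the DDA interface works: the function name `group_identical_captures`, the (crawl date, content) input format and the sample data are all assumptions made for the example.

```python
import hashlib
from collections import defaultdict


def group_identical_captures(captures):
    """Group crawl dates whose captured content is byte-for-byte identical.

    `captures` is a hypothetical list of (crawl_date, content_bytes) pairs;
    the real archive's data format is not known here.
    """
    groups = defaultdict(list)
    for crawl_date, content in captures:
        # Identical content always yields the same digest, so the digest
        # serves as a grouping key for repeated snapshots.
        digest = hashlib.sha256(content).hexdigest()
        groups[digest].append(crawl_date)
    # One list of dates per distinct piece of content, in first-seen order.
    return [sorted(dates) for dates in groups.values()]


# Invented sample data: two crawl dates share the same content.
sample = [
    ("2008-07-12", b"<html>article text</html>"),
    ("2008-08-03", b"<html>article text</html>"),
    ("2012-01-15", b"<html>revised article</html>"),
]
for dates in group_identical_captures(sample):
    print(dates)
```

A results list built on such groups could show one entry per distinct snapshot, with the repeated crawl dates folded beneath it.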

Overall, the initial testing was found to be useful in assessing the lasting impact, or otherwise, of the French community on London, in a temporally comparative manner. That is, by identifying French restaurants/cafés/businesses through their retrospective on-line presence before submitting the titles to a live Google search at the time of testing, I was able to discover if such enterprises were growing, in decline or defunct. Whilst that limited use was of potential value to my research in assessing the lasting contribution of French businesses to London’s cultural and economic landscape, I was nevertheless acutely aware (given my curation of the London French Special Collection for the UK Web Archive) of the mass of relevant data – such as community websites and blogs – which had not been detected or listed as featuring in the DDA. It was hoped at the time that this was due to the incomplete and arbitrary state of the data set.

August 2013
This trial was more successful than the last as regards the speed and efficiency of the data search tools, despite there still being only a five per cent random, if temporally representative, sample of websites available. Somewhat paradoxically, those searches which pinpointed the early years of Internet use, namely 1996 and 1997, proved to be the most valuable. Several different searches were tested on this occasion, as follows:
a) A search for the terms “French community” was filtered by language, using the “French” option. This functionality was found to be extremely useful in reducing the large amount of irrelevant data to a more manageable subset. Again, by filtering further, this time by year (in this case 1996 and 1997), I was able to focus in on yet more pertinent Web pages. Thus, when I began to analyse the <Associations Françaises> site, I noted that the landing page directed the visitor to separate sites, one for French expatriates and one for Belgians. Not only are these sites an indication of the relative establishment of the said Francophone communities in the UK, each warranting an on-line home for the long list of associations set up in the country of residence, but the fact that a distinction is made between Belgian and Franco-French populations has implications regarding identity.
Using the same search terms, another site <Les Grenouilles Cablées>, harvested in 1996, proved worthy of an initial analysis. Firstly, the landing page pointed the visitor in the direction of three separate sub-sections: <Grenouilles du monde>, <Grenouilles des USA> and <Grenouilles de Californie>. These distinctions suggest that either the French expatriate community was more significant in the USA than elsewhere at that time (including London; this is no longer the case, a change perhaps related to the opening of European borders) or that US residents, including French ones, were earlier adopters of Internet technology than their UK counterparts. When examining the site more closely and entering the <Grenouilles du monde> space, it was telling that the first choice was then <Nouvelles de France> (before the hyperlink to Quebec), which suggests that this website is indeed aimed at the French expat diaspora worldwide, linked together by their shared affinity to France, and keen to maintain links with the homeland. Further, when choosing the French news link, the selection of newspapers available was a left-leaning one. Again, the possible implications of this are two-fold: either the political leanings of the newspapers featured are an indication of the papers’ social commitment, i.e. making information freely available to all, or they are an indication of the profile of the diaspora visiting on-line sites at that time, i.e. Libération and Charlie Hebdo both target a young, left-wing readership. If this is the case, it is thus a profile at odds with the predominantly right-wing (particularly at that time) expat community of the South Kensington stereotype, which serves to substantiate the hypothesis posited at the beginning of this report.
There are also hyperlinks to <Météo France> (suggestive of a need for a physical sense of proximity to the homeland, despite the geographical distance separating the community from it) and to <Les dernières nouvelles d’Alsace> and <Pariscope>, both of which could be indicative of a longing for insignificant local minutiae in the globalised age, made possible through the worldwide Web, as well as pointing towards greater emigration from eastern France (and Belgium, as confirmed by the first website) and the French capital than other geographical zones.
This site offers links to French audiovisual sites including radio and TV and, perhaps more importantly for my research, to two on-line fora, <French Talk> and <Francopolis>, which are evidence of the formation of both Internet and French communities (despite other empirical evidence suggesting that the French community per se does not exist or, if it does, exists in South Kensington alone). Finally, this website creator’s recommended sites are telling in terms of identity (just as a Blog would be today in its related networks), especially within the theoretical framework of Pierre Bourdieu’s Habitus, with the Vatican, Charlie Hebdo, the RATP (equivalent to TfL in London) and various French sports sites (football, Formula 1 and rugby) featuring among others.
Another site displayed following this search was the <Association des Francophones de Cranfield>, in which advice is provided on low-cost means of transport to France and Belgium. This in itself demonstrates that the target audience are medium- to long-term French residents of the UK, rather than short-term visitors, and that they have been attracted to England by its (Higher) education system – a point which, as incongruous as it may appear, is corroborated by the qualitative data gathered outside the AADDA project.

b) The second search undertaken in the August trial was “London French” by “content type”, notably “image”. This was highly disappointing and of little use given that the few images which were displayed related to French football or simply contained a set of codes, with no discernible image.

c) To counter the insufficiency of the image search above, a “format search” was instead chosen from the AADDA homepage. This was more successful in terms of number, with some 6,369 items listed for the “French London + format” search trialled, filtered by year (2006). However, given that the images were not tagged and stood in complete isolation, their usefulness was questionable, as many appeared to relate not to the French community in London, but to websites on French property or to university Webpages.

d) This search attempted to assess the value of the post-code filter, which initially was again rather disappointing. Given the lack of pertinence of the majority of the sites identified after the early years (1996, 1997), their related post-codes were of equal irrelevance. Furthermore, there were no apparent clusters of London websites, with many coming from outside London; no micro-geographical/demographic conclusions could therefore be drawn. A subsequent search (“French community” filtered by language and year), despite listing only one Website, revealed two potentially telling post-codes, N7 and NW5, for 2010, which could have been related to the forthcoming opening of a new French State school in Kentish Town (NW5) (but the insignificant numbers involved are again inconclusive).   
e) A search for “communauté française”, filtered by year (2001) and language (French) identified a Blog, which would have been of particular pertinence to my research. However, it transpired that the said Blog was the work of an English-speaker, practising their written French, rather than a French Londoner’s Blog. The lack of Blogs retrieved by the DDA search engine was perplexing, as many are known to me within the framework of my UK Web Archive Special Collection work. The question of whether this is due to the domains favoured by the London French Bloggers as hosts for their autobiographical logs is therefore worth consideration, and if so, the possibility of accessing them through the DDA should also be contemplated.

f) The same search as in item (e), this time written in and filtered by the English language for the year 2010, found only one Website, the <Ile aux enfants> school in North London. Despite the unexpected limitedness of the search results in this case, the “links to host” tool was telling, particularly in terms of “mapping the field” and Bourdieu’s “three-stage analysis” paradigm. That is, by scrutinising the – predominantly institutional – list of Websites linked to the <Ile aux enfants>, such as <ambafrance>, <assemblee-afe>, <bienvenuealondres> and <edufrance>, socio-cultural assessments were facilitated. Nevertheless, it was frustrating that these links to the host site were not functioning during the trial, directing the visitor back to the host page as opposed to opening the linked Webpage itself. It was not clear, therefore, whether their inclusion was exclusively for quantitative analysis (the number of visits was in brackets), as they were of no qualitative worth without access to the content of the linked Websites.  
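To make the post-code reasoning in item (d) more concrete, the clustering that the filter was hoped to reveal can be sketched as a simple count of post-code districts found in page text. This is an illustrative sketch only, not the AADDA geo-indexing tool: the function name `count_postcode_districts`, the simplified pattern and the sample texts are assumptions, and the pattern does not cover every valid UK post-code form.

```python
import re
from collections import Counter

# Simplified UK post-code pattern (an assumption, not the official format
# specification): outward code (e.g. NW5, SW7, EC1A), optional space,
# then the inward code (e.g. 2HB).
POSTCODE = re.compile(r"\b([A-Z]{1,2}\d[A-Z\d]?)\s?(\d[A-Z]{2})\b")


def count_postcode_districts(texts):
    """Count how often each post-code district (outward code) appears."""
    counts = Counter()
    for text in texts:
        for outward, _inward in POSTCODE.findall(text.upper()):
            counts[outward] += 1
    return counts


# Invented sample page texts, for illustration only.
pages = [
    "Ecole bilingue, Kentish Town, London NW5 2HB",
    "Contact: 1 Example Road, London N7 0AB or nw5 1aa",
]
print(count_postcode_districts(pages).most_common())
```

With only a handful of matching pages, as in the trial described above, counts of this kind remain too small to support demographic conclusions.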

September 2013
The most notable and satisfying difference between this trial and the preceding ones was that all the links to related Websites were at least partially, and in the great majority of cases completely, successful. This meant that the discovery of one website (from a long list of still relatively futile others), namely the “Londoscope” reference pages of the <>, proved to be invaluable through its hyperlinks, as opposed to the content of the site itself. Thus, several pertinent results were attained, as detailed below:
a) The appearance of London French social-networking-type pages, known as <Londoscope>, is perhaps indicative of the growing numbers of French Londoners seeking a physical sense of community by means of digital linking and dissemination mechanisms. Entries such as “Eglise protestante française de Londres: Soirée anti-stress” and the enumeration of French films on show at the Ciné Lumière and the NFT, together with other French cultural events at the Institute of Contemporary Arts, bear witness to the importance of French culture to London’s overall cultural capital and are also evidence of community belonging in practice.
b) The <Londoscope> pages from 2003 enabled the identification of a culturally and historically pertinent French amateur dramatics group which has been performing in London since 1929: Le Cercle dramatique français (CDF). My research into this amateur theatre company can now be taken forward in an effort to ascertain whether it is still in existence and, if so, its place in French community life today.
c) Another link on the same Website, from 2004, referred to the Francophone television channel TV5 celebrating its 20th anniversary and revealed some useful viewer figures, including it being watched in 167 million households in 2003, with some 56 million weekly viewers. This constitutes further evidence as to the impact of the French language and culture worldwide and potentially to the growing French diaspora.
d) The final finding of relevance during this trial session was the <Londoscope> link to the ADFE (Association Démocratique des Français à l’Etranger), created in 1980 ‘par des Français qui voulaient, pour les représenter, une association dynamique et correspondant aux nouvelles réalités de l’expatriation’ (i.e. by French people who sought representation through a dynamic association in tune with the new realities of expatriation). This quotation alone is of worth for a number of reasons. Firstly, the notion of ‘representation’ itself is key, as it raises the question of ‘representation to whom?’, which, reading further, it appears is to the French authorities. This in turn indicates that the need to be politically represented in France has its roots much further back historically than the election in 2012 of the first ever Député for French overseas residents implies, as well as demonstrating an unwillingness to integrate fully in the London socio-political scene and an attachment to the homeland. Similarly, the notion of “new realities” suggests a shift from an old form of migration to a new one, acting as a temporal forerunner to the massive wave of cross-Channel immigration which began in the early nineties and continues to this day. The term “dynamique” could also be seen to illustrate the London “pull factor” for French expats living in the capital; that is, many are arguably escaping the inertia and complacency of French institutions and mindsets in their decision to emigrate to London, as exemplified in other forms of empirical evidence gathered for this research. Here, therefore, the data gathered from a single Website in the DDA has served to triangulate several key findings in my PhD.

October 2013
Having exhausted most of the available search options during the previous trial sessions, this was the shortest and least enlightening of all. It was necessary, nonetheless, to conduct a final test with the most functional interface to date and a now complete data set. The search tools also provided an opportunity for sentiment analysis, unavailable in previous trials.
This experiment involved a phrase search for “London French community” combined with “English language” and “very negative” sentiment filters. No results were identified. When the French language was used and chosen as a filter, 240 matches were found, but these were of little relevance to my research given their pedagogical focus. One potentially valuable find for historians of the French presence in the UK was a Website on Augustine monks, in which the flight of monks from France during and after the French Revolution, and the creation of brotherhoods in York (1802), Bristol (1818) and Ealing (1897), where a Benedictine monastery was founded, were reported. However, in view of the contemporary emphasis of my research, this proved of little relevance, once again.
Further searches, using different phrases/words, content types and language/sentiment filters were also trialled, to no avail. Furthermore, it was disappointing to note that the post-code and media filters appeared to have been removed, or were not readily visible. 
Overall, if not the least successful of the trials conducted to date, this was the most frustrating, given the unfulfilled aspirations of working with the complete data set.

3 – Lessons Learnt

The lessons learnt from this exercise are as follows:

  • “Think small” – minimising one’s research objectives is perhaps the only way of navigating the enormity of the data.
  • Maximise material – as the deep search process is akin to searching for the proverbial needle in a haystack, any data identified as pertinent should be analysed immediately, or saved for subsequent analysis, due to the apparent randomness of the retrieval process.
  • Use big data for its quantitative value, but not for drawing representative conclusions or in an attempt to test large-scale hypotheses, due to the apparent fallibility of the findings. Therefore, restrict qualitative research to the micro-findings of those Web sites and Web pages found to be of value – albeit somewhat arbitrarily – and optimise this data for its comparative and preservation worth.

4 – Future Research and AADDA Recommendations

As regards my own research, I intend to explore the identity / Habitus evidence found in early Websites (1996 / 1997) in greater detail and compare it with contemporary Blogs to establish whether the same affiliations are present and the same sense of group, or otherwise, identity. These findings will also be compared and triangulated with the qualitative data gathered from one-to-one interviews with members of the contemporary French population in London. It is also possible that I will study sample historical Websites / Webpages alongside their contemporary equivalents, from a multimodal perspective, to gain an understanding of how technological constraints might influence the making of meaning to varying degrees over time.
It is unlikely that the post-code filter searches will be used to inform my research, given the weakness of the findings, but the process was worthwhile in its disproving of my theory, and some cautious, small-scale conclusions could be drawn from the associations with the NW5 district.
With respect to the AADDA project looking forward, the following recommendations have been tentatively made:

  • (colour?) coding to indicate both sites already visited and replica Webpages (identified repeatedly according to the sweep date)
  • inclusion of a Blog filter (in addition to the <>, <>, etc. domain filters)
  • “links to host” tool to open link in new window
  • retention of the post-code and media type (image, audio, video, etc.) filters, with tags / provenance if possible
  • user-friendly search “help” / search “tutorial” function as cursor hovers over certain fields (such as host links, domains, numbers, etc.) on the deep search landing page (giving particular advice regarding correct wording and punctuation, for example)

5 – Conclusion

The lasting impression, having carried out several trial sessions using the DDA data and its current search tools, is that the results can present islands of valuable resources within a sea of irrelevant material, but that the likelihood of finding them is dictated by chance rather than design. Throughout this testing process, I have pondered the reason for the seemingly arbitrary nature of my AADDA findings and for my failure to access a greater amount of material relevant to my research; that is, the question of whether my lack of technological expertise was the cause or whether such outcomes are inherent to searching this vast set of data has been recurrent and remains unanswered. Instructions offering clear guidelines on the best ways to use the archive and acknowledging its limitations would therefore be both helpful and reassuring to researchers.

Author: Jonathan Blaney

Originally published 23/10/13