Perhaps it’s the library geek in me, but I enjoy articles that describe how online searching is built and improved. One such piece is by Patrick Spedding, about searching Eighteenth Century Collections Online (ECCO). This collection is derived from The Eighteenth Century microfilm collection published by Gale's imprint Primary Source Microfilm, and it includes digitised facsimiles of 135,000 printed works, comprising more than 26 million pages.
Spedding outlines some of the problems with searching OCR-scanned material, problems made worse by eighteenth-century printing methods. He briefly compares ECCO with JSTOR, Google Books and the Internet Archive. He then illustrates the difficulties of searching ECCO with a case study: researching condoms. He is certainly well placed to conduct such research, being one of the editors of Eighteenth-Century British Erotica.
Spelling is the first problem: condom, quondam, condon, condum and cundum are all used in the literature. Publishers, conscious of censorship and obscenity, made searching even harder. They often used ellipsis, thus c—-m; substituted words such as armour, preservative and sheath; and resorted to periphrasis, such as “the new machine” or “cloathing worn in merryland”. The search is further complicated by the French town of Condom and (unfortunately) by there being a Bishop of Condom. By combining various search strategies, Spedding narrows an initial 536 results down to 31. However, none of these 31 is a genuine use of condom: one is a character name and others are OCR misreadings of condemn.
Using a different search strategy, looking for venereal disease and terms associated with the illness, he did manage to discover some new references to condoms. He readily admits that sexual material poses the particular problems outlined above, but he points out that even commonplace, non-sexual phrases can cause difficulties. He argues that the search limitations are structural and can only be overcome by the publishers.
Patrick Spedding, “The new machine”: Discovering the limits of ECCO, Eighteenth-Century Studies, 44.4, 2011, p. 437-453
Spedding refers to the following works, which may be of interest:
James May, Accessing the inclusiveness of searches in the Online Burney Newspaper Collection, The Eighteenth-Century Intelligencer, 23.2, 2009, p. 28-34
James May, Some problems in ECCO (and ESTC), The Eighteenth-Century Intelligencer, 23.1, 2009, p. 20-30
Nicholson Baker, Double fold: libraries and the assault on paper, New York, Random House, 2001
Image: Giacomo Casanova tests his condom for holes by inflating it.