Second Look at Google Books
Ryan James
University of Hawaii at Manoa
Legibility
Methodology
50 randomly selected books
–Random.org
–OED
2,500 pages examined for legibility errors
–First fifty pages, excluding prefatory material
Major errors = loss of information
Minor errors = difficult to read information
Summary of Results: Legibility
Number of Books                                  50
Number of Pages                                  2,500
Major Errors                                     15
% of Pages with Major Errors                     0.6
Minor Errors                                     9
% of Pages with Minor Errors                     0.36
% of Pages with Errors of Either Type            0.96
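The percentages on the summary slide can be re-derived from the raw counts. The sketch below is only an arithmetic check, not part of the original study; the function name `pct_of_pages` is made up for illustration, and it assumes each counted error falls on a separate page, which is what the slide's combined figure implies.

```python
# Counts taken from the summary slide.
pages_examined = 2500
major_errors = 15
minor_errors = 9


def pct_of_pages(error_count, total_pages):
    """Percentage of pages affected, assuming one counted error per page."""
    return round(100 * error_count / total_pages, 2)


major_pct = pct_of_pages(major_errors, pages_examined)
minor_pct = pct_of_pages(minor_errors, pages_examined)
# Combined figure: major and minor errors added together.
combined_pct = pct_of_pages(major_errors + minor_errors, pages_examined)

print(major_pct, minor_pct, combined_pct)
```

Under that assumption the three values match the slide: 0.6, 0.36, and 0.96.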
Major Error
Minor Error
Impact of Legibility Errors
Loss of information
Difficult to read information
Frustrated Users
OCR problems
Metadata Errors
Methodology
400 randomly selected records reviewed
–Random.org
–OED
1,600 metadata fields reviewed
–
Metadata Field      Errors Found    % of Errors
Publisher           83              41%
Author              48              24%
Publication Date    41              20%
Title               31              15%
Total               203             100%
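The "% of Errors" column is each field's share of the 203 errors found, rounded to the nearest whole percent. A minimal sketch of that calculation follows, using only the counts from the table; the variable names are illustrative.

```python
# Error counts per metadata field, as listed in the table.
errors_by_field = {
    "Publisher": 83,
    "Author": 48,
    "Publication Date": 41,
    "Title": 31,
}

total_errors = sum(errors_by_field.values())

# Share of all errors attributable to each field, as a whole-number percent.
share_by_field = {
    field: round(100 * count / total_errors)
    for field, count in errors_by_field.items()
}

for field, share in share_by_field.items():
    print(f"{field}: {errors_by_field[field]} errors, {share}% of all errors")
print(f"Total: {total_errors}")
```

The shares are percentages of the errors found, not of the fields reviewed, so they describe where errors concentrate rather than how likely a given field is to be wrong.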
Errors per Book
Expected: an error rate of around 1-12%, based on traditional library catalog error rates
Found: a 35.76% error rate (records having at least one error)
Types of Errors
Errors per Book
Example Search
Author: Edgar Allan Poe
Works published before 1809 (the year Poe was born)
Impact of Metadata Errors
Errors in search results
–results list order
–advanced searching
–Frustrated Users
Problems integrating with other information systems
Optical Character Recognition
Ham. To be, or not to be, that is the question,
Whether tis nobler in the minde to suffer
The flings and arrowes of outragious fortune,
Or to take Armes against a sea of troubles,
And by opposing, end them, to die to fleepe
Impact of OCR Errors
Keyword searching!
Frustrated Users
Frustrated Disabled Users
–text-to-speech technology
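The effect on keyword searching can be shown with the Hamlet passage from the OCR slide. The sketch below is a naive whole-word search written for illustration, not how Google Books indexes text; `keyword_hit` is a hypothetical helper, and the intended readings "slings" and "sleepe" are an assumption that the OCR misread a long s as an f.

```python
import re

# OCR output as shown on the slide, errors included.
ocr_text = (
    "Ham. To be, or not to be, that is the question, "
    "Whether tis nobler in the minde to suffer "
    "The flings and arrowes of outragious fortune, "
    "Or to take Armes against a sea of troubles, "
    "And by opposing, end them, to die to fleepe"
)


def keyword_hit(text, keyword):
    """Return True if keyword appears as a whole word, ignoring case."""
    words = re.findall(r"[a-z]+", text.lower())
    return keyword.lower() in words


# Assumed intended words versus what the OCR actually produced.
results = {
    term: keyword_hit(ocr_text, term)
    for term in ["slings", "sleepe", "flings", "fleepe"]
}
print(results)
```

A reader searching for the words as printed gets no match, while the misread forms "flings" and "fleepe" do match. Text-to-speech tools that read the OCR text aloud would pick up the same misread words.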
The Importance of Fleepe
Cross-contamination of errors
–errors in one Google product show up in other Google products
Ngram Viewer
Final thoughts
How significant are the errors found in Google Books?
Is it useful to patrons?
What role can Google Books play?
James, R. (2010). An Assessment of the legibility of Google Books. Journal of Access Services, 7(4), pre-pub version: 125/