The Orlando Project An Integrated History of Women’s Writing in the British Isles The University of Alberta The University of Guelph


2 The Orlando Project An Integrated History of Women’s Writing in the British Isles The University of Alberta The University of Guelph

3 The Orlando Project
• is producing the first scholarly history of women’s writing in the British Isles
• is in the 4th of 6 years as a Major Collaborative Research Initiative funded by SSHRC and the Universities of Alberta and Guelph
• is using SGML to mark up its own research rather than tagging pre-existing texts

4 SGML
• 3 DTDs that blend structural markup with tags that foreground the interpretive nature of our research
• 252 unique tags
• 114 unique attributes
• 640,000 element occurrences across all documents
• e.g. 22,459 uses of the tag

5 Why strive for consistency?
• Delivery needs / end-user needs
• Consistency of presentation
  – titles, quotations, foreign words
• Adequate search and retrieval
  – standard expressions for people, places, organizations, texts
• Chronological sorting
  – standard expressions for dates
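The link between standard date expressions and chronological sorting can be illustrated with a minimal Python sketch. It assumes a hypothetical `DATE` element that carries a standardized `VALUE` attribute in `YYYY-MM-DD` form next to the free-form prose date; the tag name, the attribute, and the sample events are illustrative assumptions, not taken from the Orlando DTDs.

```python
import re

# Hypothetical tagged snippets: each DATE element carries a standardized
# VALUE attribute (YYYY-MM-DD) next to the date as written in the prose.
events = [
    '<DATE VALUE="1847-10-16">16 October 1847</DATE> example event B',
    '<DATE VALUE="1792-01-01">early 1792</DATE> example event A',
    'sometime in the 1920s: example event D (no standardized date)',
    '<DATE VALUE="1929-10-24">24 Oct. 1929</DATE> example event C',
]

DATE_VALUE = re.compile(r'<DATE\s+VALUE="(\d{4}-\d{2}-\d{2})">')

def sort_key(snippet):
    """Sort by the standardized value; undated snippets sort last."""
    match = DATE_VALUE.search(snippet)
    return (match is None, match.group(1) if match else "")

chronological = sorted(events, key=sort_key)

for line in chronological:
    print(line)
```

Because every standardized value has the same fixed-width form, plain string comparison gives chronological order, whereas the prose dates ("early 1792", "24 Oct. 1929") could not be sorted without interpretation. Entries lacking a standardized value collect at the end, where they are easy to spot for cleanup.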

6 Tag Cleanup Pilot
• Step 1: Analyze data to locate common inconsistencies
• Step 2: Prioritize tag types for cleanup
• Step 3: Establish workflow protocols
• Step 4: Generate assignments
• Step 5: Update user documentation

7 Division of Labour
• Batch changes: fix errors that regular expressions can easily find and replace
• Undergraduates: fix problems that are predictable but not machine-processable, or that require minimal research
• GRAs and PDFs: fix problems that require an experienced tagger or that need further research
• Volume authors: act as consultants on research and practice issues
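A batch change of the kind described in the first bullet might look like the following Python sketch. The `TITLE` tag and the three cleanup rules are hypothetical examples chosen for illustration; they are not the project's actual rules.

```python
import re

# Hypothetical batch rules, applied in order: (pattern, replacement).
BATCH_RULES = [
    # Trim stray whitespace just inside the opening TITLE tag.
    (re.compile(r'<TITLE>\s+'), '<TITLE>'),
    # Move a trailing comma or semicolon from inside TITLE to outside.
    (re.compile(r'\s*([,;])</TITLE>'), r'</TITLE>\1'),
    # Trim stray whitespace just inside the closing TITLE tag.
    (re.compile(r'\s+</TITLE>'), '</TITLE>'),
]

def apply_batch_rules(text):
    """Apply every rule to the text; return the new text and a change count."""
    changes = 0
    for pattern, replacement in BATCH_RULES:
        text, n = pattern.subn(replacement, text)
        changes += n
    return text, changes

sample = 'Her novel <TITLE> Example Title ,</TITLE> appeared later.'
cleaned, count = apply_batch_rules(sample)
print(cleaned)
print(count, 'changes')
```

Returning a change count alongside the text makes it possible to log how many fixes each batch run made, which helps decide whether a problem is common enough to automate or rare enough to hand to a person.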

8 Sample Batch Changes

9 Assignments and the tools that make them happen
• SGML-aware full-text search engine: helps find “prosey” tags in context; helps find incorrect or missing attributes
• Document-wide statistics: reports odd sub-elements and content that varies from the norm
• Tag cleanup reporting: organizes “index” tags to make cleanup easier
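The document-wide statistics idea can be sketched in Python as follows. This is a simplified illustration, not the project's tool: it assumes uppercase tag names and explicit closing tags, it does not look inside nested elements, and the `ratio` threshold is an arbitrary assumption.

```python
import re
from collections import Counter, defaultdict
from statistics import mean

# Simplification: matches <TAG ...>content</TAG> with explicit end tags only.
ELEMENT = re.compile(r'<([A-Z]+)\b[^>]*>(.*?)</\1>', re.DOTALL)

def tag_statistics(text):
    """Count occurrences of each tag and collect its content lengths."""
    counts = Counter()
    lengths = defaultdict(list)
    for tag, content in ELEMENT.findall(text):
        counts[tag] += 1
        lengths[tag].append(len(content))
    return counts, lengths

def flag_long_content(text, ratio=2.5):
    """Return (tag, content) pairs whose length exceeds ratio x the tag's mean."""
    _, lengths = tag_statistics(text)
    averages = {tag: mean(values) for tag, values in lengths.items()}
    flagged = []
    for match in ELEMENT.finditer(text):
        tag, content = match.group(1), match.group(2)
        if len(content) > ratio * averages[tag]:
            flagged.append((tag, content))
    return flagged

sample = (
    '<NAME>Aaaaa</NAME> <NAME>Bbbbb</NAME> <NAME>Ccccc</NAME> '
    '<NAME>Ddddd</NAME> <NAME>' + 'x' * 100 + '</NAME>'
)
counts, _ = tag_statistics(sample)
outliers = flag_long_content(sample)
print(counts)
print(len(outliers), 'element(s) flagged')
```

A report like this does not decide whether long content is wrong; it only narrows the list of elements that a person needs to inspect.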

10 Conclusions
• Establish priorities early on
• Weigh priorities against needs, wants, and total available resources
• Divide tasks according to expertise
• Train “experts” to fix major tags
• Develop a good checking system
• Revisit workflow models regularly to make sure the workload is equal to the resources available

11 Reflections on Consistency
• Consistency: is it possible? Is it possible across projects?
• SGML: does it help?
  – “that's not a date”
  – “no, that's not a date, either”
  – “nope, still not a date”
  – “there you go: a date at last!”
  – “your text is getting pretty long”
  – “the text for this tag is usually not this long”
  – “you have exceeded the average length of text in this tag by 250%!”
  – “congratulations, you have created the world's longest tag!”

