Albert Bickford SIL International


1 Albert Bickford SIL International
Advanced Toolbox
June 29-July 2, 2010
First day setup: Monitor 1: Xplorer2 with Toolbox folder, Skype, Bomgar, PowerPoint. Monitor 2: Standard project open in Toolbox, ready to switch to ZpChi project; PowerPoint display.
Albert Bickford, SIL International
© 2010 J. Albert Bickford. May be freely copied for non-profit (educational, scientific, or humanitarian) use. 9/19/2018 3:26 AM

2 Course goals Help experienced users of Toolbox to use it more effectively Respond to needs of the class Share with each other what works Develop ability to help others Prerequisites: Previous familiarity with Toolbox (at least basic dictionary and text glossing) 9/19/2018 3:26 AM

3 Outline of topics
June 29 (Day 1): Introduction and getting acquainted; Toolbox vs. FLEx; Backup; Interlinear text
June 30 – July 2: Topics to be decided, based on class desires

4 Logistics Temporary course website: See “workremote.doc” on website for detailed contact information and software instructions Skype: audio, screen-sharing Bomgar Network Streaming: remote control 9/19/2018 3:26 AM

5 Logistics Local coordinator: TBA Work together and help each other
Consultation with me by appointment 9/19/2018 3:26 AM

6 Getting acquainted Please fill in “questionnaire.doc” from the website and send it to me. Discussion: What types of projects have you used Toolbox for? What are the main things you want to learn about it? If I go too fast, stop me and ask for further explanation.

7 Toolbox and FLEx Both developed within SIL
Toolbox started (as Shoebox) in the early 1990s FieldWorks FLEx started ca. 2005; intended to bring together the power of LinguaLinks and the ease of use of Toolbox Toolbox development continued in parallel with FLEx, to fill the gap until FLEx was mature enough to replace it We’ve reached that transition point 9/19/2018 3:26 AM

8 Toolbox and FLEx FieldWorks aims to provide features that would have been difficult to patch onto Toolbox: full support for non-Roman writing systems and Unicode; collaboration among teams of workers, not just one user per project; an integrated suite of tools for field research, with easy data-sharing between tools (dictionary, glossed texts, parser, grammar, discourse analysis, etc.).

9 Toolbox and FLEx Toolbox is nearing the end of its life-cycle
Usage will likely taper off over the next few years Most new users should start with FLEx (InField recommendation) Established users will need to decide: Continue with Toolbox (but plan for FLEx) Change to FLEx now (convert data or start over) Ongoing limited use of Toolbox for an indefinite period for some tasks (new technology never fully replaces the old) 9/19/2018 3:26 AM

10 Backup Make sure you have a spare copy of your data before experimenting on it in this workshop! Easiest: Just make a copy of the whole folder where the data is stored. How to find out where it is stored Store the spare copy in a safe place 9/19/2018 3:26 AM

11 Backup Backup strategy
How often? Answer: How much work are you willing to lose? Best: Make copies on an external device, preferably stored away from your computer Keep it simple and easy to recover; avoid compression and proprietary backup formats Automate it, e.g. Cobian backup ( 9/19/2018 3:26 AM
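The "copy the whole folder" approach can be automated with a few lines of script. The following is a minimal Python sketch, not a replacement for a real backup tool; the function name `backup_folder` and the example paths are illustrative assumptions, not part of Toolbox.

```python
import shutil
from datetime import datetime
from pathlib import Path


def backup_folder(data_dir, backup_root):
    """Copy the whole project folder to a new, timestamped folder.

    The copy is plain and uncompressed, so recovering a file is
    just a matter of copying it back.
    """
    data_dir = Path(data_dir)
    stamp = datetime.now().strftime("%Y-%m-%d_%H%M%S")
    target = Path(backup_root) / f"{data_dir.name}_{stamp}"
    shutil.copytree(data_dir, target)  # raises an error if target already exists
    return target


# Example call (hypothetical paths; adjust to where your data is stored):
# backup_folder(r"C:\My Toolbox Project", r"E:\Backups")
```

Pointing `backup_root` at an external drive follows the advice above about keeping copies away from your computer.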

12 Interlinear text Many different approaches
Distribution of three types of lexical information: dictionary, wordform inventory, morpheme glossary Automated vs. manual parsing Different transcription systems (practical vs. technical) Different encodings (legacy vs. Unicode) Glosses/translations at how many levels: sentence/clause, word, morpheme Glosses/translations in multiple languages Other fields: grammar, notes, etc. 9/19/2018 3:26 AM

13 Interlinear text (demo)
Standard Toolbox (e.g. Start New Project): Dictionary and morpheme glossary in one file Automated parsing Single transcription system (Unicode) 4+2 lines: Aligning: text, morphological parse, morpheme gloss, part-of-speech (word class) Free: free translation, notes 9/19/2018 3:26 AM

14 Interlinear text (demo)
Alternate approach (SIL-Mexico) 3 lexical files: dictionary, wordform inventory, morpheme glossary Manual parsing Practical and technical Glosses at word and morpheme level, plus free translation, and in both English and Spanish 9/19/2018 3:26 AM

15 Interlinear text: database structure
Structure of an interlinear text Metadata Units ID (reference) line, usually \ref One or more “bundles” of aligning lines (lines wrap automatically) Freeform fields: free translations, notes, etc. 9/19/2018 3:26 AM
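To make the structure concrete, here is a small Python sketch that reads standard-format text and splits it into metadata plus units. The sample data is a made-up placeholder, and the markers \id and \ft are assumptions for illustration (\ref, \tx, \mb, \ge and \ps are the ones named in these slides). Wrapped continuation lines are not handled.

```python
SAMPLE = r"""
\id Sample text (hypothetical)
\ref Sample 001
\tx aaa bbbcc
\mb aaa bbb -cc
\ge GLOSS1 GLOSS2 -SUFFIX
\ps n v -sfx
\ft Free translation of unit 1
\ref Sample 002
\tx aaa
\mb aaa
\ge GLOSS1
\ps n
\ft Free translation of unit 2
"""


def parse_units(text, unit_marker="ref"):
    """Split standard-format text into (metadata, units).

    Each unit is a list of (marker, value) pairs that starts
    with the reference line.
    """
    header, units, current = [], [], None
    for line in text.splitlines():
        if not line.startswith("\\"):
            continue  # blank or continuation line: ignored in this sketch
        marker, _, value = line[1:].partition(" ")
        field = (marker, value.strip())
        if marker == unit_marker:
            current = [field]          # a new unit begins at each \ref
            units.append(current)
        elif current is None:
            header.append(field)       # fields before the first \ref are metadata
        else:
            current.append(field)
    return header, units


header, units = parse_units(SAMPLE)
for unit in units:
    print(unit[0][1], [marker for marker, _ in unit[1:]])
```

Each unit here holds one bundle of aligning lines (\tx, \mb, \ge, \ps) followed by a freeform field (\ft).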

16 Interlinear text: database structure
Interlinear text model: Relationship of aligning lines to each other and to the lexical databases Where all this is controlled (Database Properties, Interlinear tab) (demo, using standard setup) 9/19/2018 3:26 AM

17 Interlinearizing: manual parsing
Most people shouldn’t try to get Toolbox to use automatic parsing! Toolbox parser not very robust and for most languages is more trouble than it is worth; few people succeed. Automatic parsing provides no permanent way to store the correct parse. If you use manual parsing, things will import more smoothly into FLEx. 9/19/2018 3:26 AM

18 Interlinearizing: manual parsing
Instead use a wordform inventory (a.k.a. a parsing database) Reliable parsing at the expense of some busy work (mostly at first) Permanent, organized record of the parse of each wordform Word glosses (for non-technical audience) Easier to use technical as well as practical orthography in the project Representation of extended senses and idiomatic phrases Stem and grammatical categories for each wordform (for searching) Etc. 9/19/2018 3:26 AM

19 Interlinear text: manual parsing
Every word is listed in the wordform inventory with its parse. We parse all words manually (by entering them in the wordform inventory) rather than attempting to get Toolbox to parse them automatically. (Demo: Std, then Mex) 9/19/2018 3:26 AM

21 --- End Day 1 --- 9/19/2018 3:26 AM

22 June 30 (Day 2) plan Feedback so far More on Toolbox and FLEx
Interlinear setup 2nd day setup: PowerPoint Xplorer2: Sample files Seri project, Bluefly PDF text Printed copies of slides to refer to while doing demos

23 Feedback Your experience so far
Successes Problems Questions Requests Individual consulting: make an appointment 9/19/2018 3:26 AM

25 Good reasons to stay with Toolbox (rather than FLEx)
Established project, comfortable with Toolbox, production mode, don’t need extra capabilities Older computers (if FLEx runs too slowly) Don’t have time or resources to convert to FLEx now (most people need help from conversion specialists, plus relearning time) Most colleagues still use Toolbox (mutual help) Specialized database that doesn’t fit FLEx (e.g. comparative dictionary) 9/19/2018 3:26 AM

27 Interlinear text: standard setup
Demo of actual steps in setting up a standard Toolbox project 9/19/2018 3:26 AM

28 Implementing manual parsing
Step 1: Make a new database type (Project, Database Types). Call it something like “wordform inventory”. Include at least the following markers: \wf = wordform, \mb = morpheme break, \dt = datestamp. You may also want to add a field for word glosses, but that can be added later.

29 Implementing manual parsing
Step 2: Make a new database, using the “wordform inventory” database type. Call it “wordform.db” or something similar. Set up the template for the wordform database: In a blank record, make sure you have all of the markers that you want to appear in every record. Use Database, Template to save that set of markers to the template. In the \wf field of that record, enter “#pattern for template” and leave the record in the database in case you need to change the template later.

30 Implementing manual parsing
Step 3: Make a new database type for texts that will use manual parsing. In Project, Database Types, copy the existing Text type to a new type; call it e.g. “TextWithManualParse”. Under Interlinear settings for the new type, find the Parse mapping from \tx to \mb. Click on “Lexicons”. Remove the dictionary from the list of databases to search. Instead, search for the parse in the wordform inventory. “Markers to find” should be \wf; “Marker to output” should be \mb. Choose OK. Under “If parse fails”, tell it to “insert into lexicon” and “Output failure mark”. Do not output original word or root guess. Disable word formulas (bottom of the box). Leave other settings alone.

31 Implementing manual parsing
Step 4: Make a new text database Use File, New to make a new text database. Call it “TextsManualParse.itx” or something similar Copy text into it (minimum: the \ref and \tx fields) and (re)gloss as normal 9/19/2018 3:26 AM
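What Steps 1-4 set up is essentially a lookup table from wordform to parse. The Python sketch below imitates that behavior outside Toolbox, purely to show the logic: it is not how Toolbox is implemented, the English sample records are invented, and using "***" as the failure mark is an assumption.

```python
INVENTORY = r"""
\wf cats
\mb cat -s
\dt 29/Jun/2010

\wf the
\mb the
\dt 29/Jun/2010
"""

FAILURE_MARK = "***"  # assumed failure mark for this sketch


def load_inventory(text):
    """Read \\wf / \\mb pairs from a wordform inventory in standard format."""
    inventory, wordform = {}, None
    for line in text.splitlines():
        marker, _, value = line.partition(" ")
        if marker == "\\wf":
            wordform = value.strip()
            inventory[wordform] = ""
        elif marker == "\\mb" and wordform is not None:
            inventory[wordform] = value.strip()
    return inventory


def parse_line(tx, inventory):
    """Look up each word of a \\tx line; never guess a parse."""
    parses = []
    for word in tx.split():
        parse = inventory.get(word, "")
        if not parse:
            # "Insert into lexicon": leave an empty stub to fill in by hand.
            inventory.setdefault(word, "")
            parse = FAILURE_MARK
        parses.append(parse)
    return parses


inventory = load_inventory(INVENTORY)
print(parse_line("the cats slept", inventory))
```

The unknown word gets a failure mark and an empty stub record, which is the "busy work" side of manual parsing: someone fills in the \mb field by hand, and after that the parse is stored permanently.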

32 Interlinear text: semi-automatic parsing
It is possible to combine the manual and automatic parsing. There are two ways to do so (which can be used separately or in combination): Approach 1: If parse fails, output original word. That way, no monomorphemic words need to be added to the wordform inventory. Approach 2: In the parse process, choose SH2-style parse. List the wordform inventory as the Parse database, and the main dictionary as the Lexicon. Toolbox will only parse the word if you don’t have a parse in the wordform inventory. If it parses something wrong, all you have to do is manually override the parsing by inserting an entry in the wordform inventory. 9/19/2018 3:26 AM

33 Interlinear text: semi-automatic parsing
Some warnings about this: You lose the advantage of easy import to FLEx later You can’t do word glossing (because you won’t have any place to store the word gloss for wordforms that are automatically parsed) I have no experience doing this, so I don’t know what pitfalls await you. Be ready to experiment, and be sure to back up your data before you try! 9/19/2018 3:26 AM
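The order of precedence in the semi-automatic setup can be sketched in Python as follows. This only illustrates the logic described above (manual parse first, automatic parse second, original word last); `toy_auto_parse` is a deliberately crude stand-in invented for the example and has nothing to do with the real Toolbox parser.

```python
def toy_auto_parse(word):
    """Stand-in for an automatic parser: split off a final -s, nothing else."""
    if word.endswith("s") and len(word) > 2:
        return word[:-1] + " -s"
    return None  # this toy parser has no analysis for the word


def semi_automatic_parse(word, inventory, auto_parse=toy_auto_parse):
    """Combine manual and automatic parsing.

    1. A parse stored in the wordform inventory always wins (Approach 2).
    2. Otherwise the automatic parser is tried.
    3. If that fails too, the original word is output (Approach 1).
    """
    if inventory.get(word):
        return inventory[word]
    guess = auto_parse(word)
    return guess if guess is not None else word


# "bus" would be wrongly split by the toy parser, so it is overridden by hand.
inventory = {"bus": "bus"}
for w in ["cats", "bus", "dog"]:
    print(w, "->", semi_automatic_parse(w, inventory))
```

The "bus" entry shows the manual override: one inventory record is enough to correct a wrong automatic parse.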

34 Interlinear text: Other fields in wordform inventory
Other fields to put in a wordform inventory (and optionally copy to the texts) Word-level glosses Translation equivalents without technical terms Extended senses of words Contextualized Meanings of fixed expressions Citation form Notes 9/19/2018 3:26 AM

35 Interlinear text: Adding fields to texts
How to add to the text model (Interlinear settings): In database properties for the text file type, add a new marker that can be used for the new aligning line. Also, change the interlinear settings so that the new marker is included. Be sure to position all word-level annotations before all morpheme-level annotations. The line that parses words into morphemes should be at the transition point. (If you can’t get things in the right order, close Toolbox and open the .typ file for the text files with a text editor and move lines around. Carefully! Make a backup first!)

36 Interlinear text: Adding fields to texts
Then, regloss the text. It may be necessary to delete all existing lines in order to get the new lines in the right place. In the process, you’ll probably encounter a lot of “Lookup failure” errors. Usually this means that there is in fact a record for the word, but there is a field missing so there’s nothing to copy back to the text. 9/19/2018 3:26 AM

38 Lexical databases How do we use the standard setup for a full dictionary? 9/19/2018 3:26 AM

39 Lexical databases Types of information included in a lexical database
Phonological, semantic, grammatical, sociolinguistic, anthropological, etc. For different audiences: Language communities, linguists, non-linguist outsiders Dictionary and text annotation (glossing) Sometimes the same information in multiple formats Can grow to include a hundred bits of information for each entry (# of fields per record) 9/19/2018 3:26 AM

40 Lexical databases The Multi-Dictionary Formatter (MDF) system
Set of fields and standard format markers for producing dictionaries of different types Software for converting it to formatted output (either within Toolbox or via Lexique Pro) Contains fields both for a published dictionary and for morpheme-glossing Tip: don’t use it for wordforms/parsing—do that in a separate database 9/19/2018 3:26 AM

41 Lexical databases Mature, widely-tested, applicable to a variety of different situations and product types A de facto “standard” Toolbox and LexiquePro are preconfigured for it Imports easily into FLEx 9/19/2018 3:26 AM

42 Lexical databases, templates
Suggested routine MDF fields to add to the template for the dictionary database (see MDFields19a.txt for further info) \lx, \ps \ge, \re, \xv, \xe \cf, \nt, \dt How to do it (demo) Add fields to a single typical record Save to the template Make a new record, call it “#pattern for template” Adjust as needed whenever needed, then save it to the template 9/19/2018 3:26 AM

43 Lexical databases, templates, mass editing
Changes to the template only affect new records To fix the old records, you have to edit them Mass editing in an external editor (demo) 9/19/2018 3:26 AM
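Mass editing can also be scripted. As one possible approach (not the method shown in the demo), the Python sketch below adds any missing template markers to every existing record. It assumes records start with \lx, passes through untouched any lines before the first record, and simply appends the missing markers at the end of each record. Run something like this only on a backup copy, with Toolbox closed.

```python
# Routine MDF fields suggested for the dictionary template.
TEMPLATE = ["lx", "ps", "ge", "re", "xv", "xe", "cf", "nt", "dt"]


def add_missing_fields(text, template=TEMPLATE, record_marker="lx"):
    """Return the database text with empty template fields added to old records."""
    header, records = [], []
    for line in text.splitlines():
        if line.startswith("\\" + record_marker + " "):
            records.append([line])           # a new record begins
        elif records:
            if line.strip():
                records[-1].append(line)     # field or continuation line
        else:
            header.append(line)              # anything before the first record

    out = list(header)
    for record in records:
        present = {l[1:].split(" ", 1)[0] for l in record if l.startswith("\\")}
        out.extend(record)
        out.extend("\\" + m for m in template if m not in present)
        out.append("")                       # blank line between records
    return "\n".join(out)


old = "\\lx cat\n\\ps n\n\\ge cat\n\n\\lx dog\n\\dt 29/Jun/2010\n"
print(add_missing_fields(old))
```

A fuller script would also put the new markers in template order rather than at the end; that is left out here to keep the sketch short.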

45 --- End Day 2 ---

46 July 1 (Day 3) plan Feedback so far Special characters and Unicode
Learning more about MDF LexiquePro Audio and video Wordlist and concordance Jumps 2nd day setup: PowerPoint Xplorer2: Sample files Seri project, Bluefly PDF text Printed copies of slides to refer to while doing demos

47 Feedback Your experience so far
Successes Problems Questions Requests Individual consulting: make an appointment What important topics haven’t we covered yet? 9/19/2018 3:26 AM

48 Unicode and language encodings
What do you need to know? General understanding of how special characters work on a computer? Font, keyboard, codepoint, encoding… What is Unicode, UTF-8 vs. UTF-16, composed vs. decomposed… “Legacy” vs. Unicode? Advantages/disadvantages Converting from legacy to Unicode Using Toolbox for each What fonts and keyboards are available? Setup language encodings in Toolbox? Sort orders, punctuation, case pairs Installing fonts, keyboards, Unicode Cautions about using Unicode with Toolbox? Troubleshooting specific problems? 9/19/2018 3:26 AM

49 Learning more about MDF
Marker properties in Toolbox: right-click on a marker (demo) MDFields19a.txt: accessible in Toolbox (demo) Making Dictionaries (MDF_2000.pdf) 9/19/2018 3:26 AM

50 LexiquePro Viewer for Toolbox dictionary files
Uses the same data files as Toolbox Has its own settings files, separate from Toolbox Good for distribution and creating formatted output (.doc, .htm) Can also be used to edit the file Changes made in either program are visible in the other WARNING: Don’t use at the same time as Toolbox! Limited capabilities compared to Toolbox 9/19/2018 3:26 AM

51 LexiquePro Setting up LexiquePro with a Toolbox MDF database (demo)
Settings files added by LexiquePro 9/19/2018 3:26 AM

53 Graphics, Sound, Video Both Toolbox and LexiquePro can have links to outside files \pc picture \sf Sound file \ff Video or other external file Tip: Keep them all in the same folder close to your data (e.g. subfolder “sup”) 9/19/2018 3:26 AM

55 Wordlist and concordance
(demo of wordlist and concordance) To work, you have to set up Text Corpora What files to be searched What fields in those files What fields to use for referencing (one word selected from each field) 9/19/2018 3:26 AM
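The three choices in a Text Corpus (which files, which fields to search, which field to use for references) map directly onto a simple script. The Python sketch below is only an illustration of the idea with placeholder data; it splits words on spaces and ignores punctuation, case and sort order, all of which Toolbox handles through the language encoding.

```python
from collections import Counter, defaultdict

TEXT = r"""
\ref T1 001
\tx aaa bbb aaa
\ref T1 002
\tx bbb ccc
"""


def wordlist_and_concordance(text, text_marker="tx", ref_marker="ref"):
    """Count each word in the chosen field and record where it occurs."""
    counts, where = Counter(), defaultdict(list)
    ref = None
    for line in text.splitlines():
        marker, _, value = line.partition(" ")
        if marker == "\\" + ref_marker:
            ref = value.strip()              # field used for referencing
        elif marker == "\\" + text_marker:
            for word in value.split():       # field being searched
                counts[word] += 1
                where[word].append(ref)
    return counts, dict(where)


counts, where = wordlist_and_concordance(TEXT)
for word in sorted(counts):
    print(word, counts[word], where[word])
```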

57 Jumps Right-click (Alt-J) on a word to jump to another database
Jumps from the selection if something is selected Otherwise takes a whole word Tip: All characters must be listed correctly in the sort order! Jump paths have to be set up first in Database Properties (demo, esp. with interlinear text) “Jump target”: open in an existing window (rather than a new window) 9/19/2018 3:26 AM

59 Range sets, data properties, consistency checking
Toolbox allows you to do anything with your data—that’s not always good Also provides tools that allow you to limit your creativity when appropriate, i.e. to set rules and find violations of them (demo) Range sets (Marker properties) Data properties (Marker properties) Data links (Database properties, Jump Path Properties) Consistency checking (Checks menu) Interlinear check (Checks menu) 9/19/2018 3:26 AM
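The idea behind a range set, "set a rule, then find the violations", can be sketched in a few lines of Python. This is an illustration of the concept only, not of Toolbox's own consistency check; the sample records and the allowed values are invented.

```python
DICT = r"""
\lx cat
\ps n
\lx run
\ps verb
"""


def check_range_set(text, marker, allowed, record_marker="lx"):
    """List every field whose value is not in the allowed set.

    Returns (line number, record name, offending value) tuples.
    """
    problems, record = [], None
    for number, line in enumerate(text.splitlines(), start=1):
        m, _, value = line.partition(" ")
        value = value.strip()
        if m == "\\" + record_marker:
            record = value
        elif m == "\\" + marker and value not in allowed:
            problems.append((number, record, value))
    return problems


for problem in check_range_set(DICT, "ps", {"n", "v", "adj"}):
    print(problem)
```

Here "verb" is reported because the rule only allows "v"; either the data or the range set then gets corrected.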

61 --- End Day 3 ---

62 July 2 (Day 4) plan Feedback so far Cautions about Unicode
Sort order setup and Find/Replace Linguistic issues Idioms Derivational morphology Names Plants and animals Morphophonemics Nonlinear morphology (incl. reduplication) Comparative dictionaries 2nd day setup: PowerPoint Xplorer2: Sample files Seri project, Bluefly PDF text Printed copies of slides to refer to while doing demos

63 Feedback Your experience so far
Successes Problems Questions Requests Individual consulting: make an appointment 9/19/2018 3:26 AM

65 Cautions about Unicode
Guide to help you decide when and how to switch to Unicode: All or none (don’t mix Unicode with legacy encodings) Don’t just check the Unicode box in Language Encodings—you need to convert the data files and settings files first Composite characters vs. separate diacritics 9/19/2018 3:26 AM
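The last point matters because a composite character and a base letter plus separate diacritic look identical on screen but are different data, so searches, sorting and lookups can silently fail. A short Python illustration using the standard `unicodedata` module (the helper name `same_text` is just for this example):

```python
import unicodedata

COMPOSED = "\u00C1"      # Á as a single codepoint
DECOMPOSED = "A\u0301"   # A followed by COMBINING ACUTE ACCENT


def same_text(a, b, form="NFC"):
    """Compare two strings after converting both to one normalization form."""
    return unicodedata.normalize(form, a) == unicodedata.normalize(form, b)


print(COMPOSED == DECOMPOSED)           # False: different codepoints
print(same_text(COMPOSED, DECOMPOSED))  # True: same character once normalized
print(len(COMPOSED), len(DECOMPOSED))   # 1 2
```

Whichever form you choose, the safest course is to use it consistently in all data files for a project.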

67 Sort order setup Editing a sort order (demo)
Implications for Find/Replace and Jumps 9/19/2018 3:26 AM

69 Linguistic problems in text glossing
Multi-word idioms and names Join the words together with _ on the text line Give one word gloss to the whole combination Separate them with spaces on the morpheme break line, and gloss each piece separately 9/19/2018 3:26 AM

70 Linguistic problems in text glossing
Derivational morphology: Suggest breaking off only the most productive and regular derivational morphology. Treat less productive derivational morphology as if it were part of the base of a verb, i.e., only break things down to the stem, not all the way to the root. Why? It makes texts much more readable. The stem is the lexical unit that is relevant to the context, not its internal structure. Full details of a derived word’s structure can be given in the lexicon.

71 Linguistic problems in text glossing
Names One word or more than one? Is the name analyzable into morphemes? Is there an equivalent name in the glossing language? 9/19/2018 3:26 AM

72 Linguistic problems in text glossing
Plants and animals In general, gloss with common names, e.g. genus labels Aim for the appropriate level of specificity (e.g. don’t use ‘bird’ for a specific type of bird) Specify the scientific name in a note, but only if the identification is done by an expert Be careful of names in English that are used for diverse organisms (e.g. badger, ironwood) 9/19/2018 3:26 AM

73 Linguistic problems in text glossing
Morphophonemics In the \mb field, suppress morphophonemic variation but retain suppletion, e.g. use a phonological underlying form If processes apply across word boundaries, 2 options for the \tx line: Don’t write the changes in \tx; explain them elsewhere Do write them, and include alternate forms in the wordform inventory so all variants receive the same word gloss and parse 9/19/2018 3:26 AM

74 Linguistic problems in text glossing
Nonlinear morphology Use abstract underlying forms, as if they were linear Use < > or some other convention to flag the nonlinearity Explain the facts in notes 9/19/2018 3:26 AM

76 Other uses for Toolbox Comparative dictionaries
Specialized analytical research Ethnologue Address list, administrative database 9/19/2018 3:26 AM

77 Comparative dictionaries
Organize either by cognate sets or a common gloss Separate fields for each language (demo) 9/19/2018 3:26 AM

79 Feedback Please fill in the course evaluation sheets that InField has provided. Please also send feedback on this workshop (especially suggestions for the future) to me at:

80 --- End Day 4 ---

82 ============================ Other topics
After this point in the file are topics that may be of interest to participants (left over from other workshops that I have given) but which we didn’t cover in class this time. 9/19/2018 3:26 AM

83 Interlinear text : Overview Procedure
Prepare text for glossing (import, break into units, number the units) Pre-load words/morphemes into the lexical database (if desired) Initial glossing Revise lexical database with new words/morphemes/glosses as needed, and regloss individual words 9/19/2018 3:26 AM

84 Interlinearizing: database structure
Special fields in MDF for glossing texts Glosses vs. definitions in lexicon \lx (lexeme) vs. \lc (citation form) fields 9/19/2018 3:26 AM

85 Interlinearizing: How to do it
Either put one text per file or one per record (multiple texts per file) If all in one file, use the existing shell “itx.db” If in separate files, start a new file with File, New and choose “EOPASInterlinear” as the database type. New files/records will contain blank metadata markers and a bunch of shell units 9/19/2018 3:26 AM

86 Interlinearizing: How to do it
Prepare text for glossing Type or paste text into the \tx marker(s). Two approaches Type or paste one sentence or clause per unit. Add markers for new units as needed. Paste the whole text into one marker, use Tools, Break/Number Text to setup for glossing. Delete excess units at the end. Add free translations and notes (inserting markers as needed) to each unit. 9/19/2018 3:26 AM

87 Interlinearizing: How to do it
Vital step: all symbols must be in appropriate language encoding If glossing isn’t working correctly (words being split up or not being found), look for symbols that aren’t yet listed in the sort order 9/19/2018 3:26 AM
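One way to hunt for such symbols is to compare the characters in the text lines against the characters you have listed in the sort order. The Python sketch below assumes you type the sort-order characters into the script yourself (it does not read Toolbox's settings files), and it checks single characters only, so digraphs are not handled.

```python
def unlisted_characters(text, sort_order_chars, marker="tx"):
    """Find characters in the given field that are not in the sort order.

    Returns {character: line number of its first occurrence}.
    """
    known = set(sort_order_chars) | {" "}
    found = {}
    for number, line in enumerate(text.splitlines(), start=1):
        m, _, value = line.partition(" ")
        if m != "\\" + marker:
            continue                      # only check the chosen field
        for ch in value:
            if ch not in known and ch not in found:
                found[ch] = number
    return found


text = "\\ref 001\n\\tx aba ŋa\n\\ft not checked ŋ\n"
print(unlisted_characters(text, "abn"))
```

Any character it reports is a candidate for the word-splitting and lookup problems described above.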

88 Interlinearizing: How to do it
ALT+I to add glosses. *** means the word isn’t in the lexical database yet, or there isn’t a gloss yet. Entering new glosses in the database: Jump to add new entries to glossing database Return From Jump to retry CTRL+R Reglossing with ALT+I 9/19/2018 3:26 AM

89 Interlinearizing: How to do it
Multiple glosses for one item List both glosses in lexicon, separated by semi-colons If homophones, make two entries in lexicon 9/19/2018 3:26 AM

90 Interlinearizing: How to do it
Revising Make changes first in the glossing databases, then use ALT+I to copy to texts Revise by reading through the glossing databases for consistency Revise by reading the texts, e.g. comparing glosses and free translations 9/19/2018 3:26 AM

91 Interlinearizing: How to do it
Verifying ALT+C How to do it Will work better if every word is listed in the wordform database, even those that don’t divide further into morphemes 9/19/2018 3:26 AM

92 Interlinear text EOPAS: EthnoER Online Presentation and Annotation System See Schroeter and Thieberger 2006: 9/19/2018 3:26 AM

93 Learning more about Toolbox
Toolbox Help system Toolbox Self-Training.doc (accessible from Start Menu on Windows) Toolbox website Discussion list 9/19/2018 3:26 AM

94 Adjusting settings Language encodings (demo)
Appearance: font, size, style, color Sort orders Primary order Secondary order Ignore characters Case pairs and punctuation 9/19/2018 3:26 AM

95 Adjusting settings Marker properties (demo)
Language encoding for a field Appearance of individual fields Name or description of a marker Caution: Be careful of changing the marker itself—you can easily make things not work right by doing so (and you may not discover that you’ve broken things until weeks or months later, when you’ve forgotten what you did). 9/19/2018 3:26 AM

96 Adjusting settings Margins and text-wrapping (demo)
Text wrapping is only semi-automatic Set margin based on width of window (Database menu) Reshape Automatically (Database, Auto Wrap) Single field (Database, Reshape or SHIFT+F5) Whole database (Database, Reshape Entire File) Suppress reshaping for certain markers (Marker Properties, Data Properties, No Word Wrap) 9/19/2018 3:26 AM

97 Adjusting settings: be patient…
More advanced adjustments to be covered later Adding new field markers to hold new types of information Making a whole new database type (e.g. list of inflected wordforms, comparative dictionary, bibliography) Setting up a new project, especially if it requires any customization (which most do) 9/19/2018 3:26 AM

98 Revising and refining data
Find and replace Sorting Filtering 9/19/2018 3:26 AM

99 Formatted output File, Print: straight image of what you see on screen (no reformatting except page breaks) Formatting interlinear text: no standard “off-the-shelf” solution, requires custom setup and programming 9/19/2018 3:26 AM

100 Formatted output MDF output to Microsoft Word RTF
Different types of output for different audiences Omitting records Omitting fields 9/19/2018 3:26 AM

101 Formatted output LexiquePro (http://www.lexiquepro.com/download.htm).
Viewer with limited editing capability (demo) Best to close Toolbox before using it (only allowed to edit with one program at a time). Can redistribute it bundled with your dictionary Export to Word RTF or to web page (HTML) 9/19/2018 3:26 AM

102 Formatted output XML: useful mainly for techies, but will make many things possible for the rest of us Toolbox can export in XML format How XML differs from SIL standard format (demo with ZpChi data) Tools for manipulating XML: editors, stylesheets, and XSLTransformations 9/19/2018 3:26 AM

103 Linguistic problems in text glossing
Adjusting text units: (demo in Seri) Splitting Combining Renumbering 9/19/2018 3:26 AM

104 Linguistic problems in text glossing
Homophonous morphemes (demo in Seri) Include separate records in the lexical database for homophones. Toolbox will offer a choice when it glosses. In a particular word, however, it is often clear which morpheme is involved, so there should be no need to choose. This can be specified as part of the parse, as a “forced gloss”.

105 Integration with other software
ELAN: import/export Transcriber: import XML: export RTF (Word document): export Lexique Pro (viewer and formatter, plus simple editing) 9/19/2018 3:26 AM

106 Structure behind the scenes (settings)
Database types Fields Marker Language used Other information Relationships between databases (“jumps”) Instructions for processing interlinear text 9/19/2018 3:26 AM

107 Structure behind the scenes (settings)
Language encodings (demo) Font and keyboard Sort order, digraphs, case equivalents Letters vs. punctuation Natural classes for phonological searching 9/19/2018 3:26 AM

108 Structure behind the scenes (settings)
Projects (workspace) (demo) Arrangement of windows on the screen What database is open in each window You can have more than one project for the same set of database files Multiple views: ZpChi duplicate window 2 records at once 9/19/2018 3:26 AM

110 Special characters Whenever possible, use Unicode fonts, rather than custom fonts designed for specific languages (see further discussion later). Unicode is a newer system that allows thousands of characters in a single font. The computer world is transitioning away from custom fonts for specific languages to this one common system that covers all languages. 9/19/2018 3:26 AM

111 Special characters Some common Unicode fonts that have many Latin characters for minority languages Charis SIL Doulos SIL On Windows Vista and later: Times New Roman, Arial, et al. Lucida Sans Unicode Arial Unicode MS 9/19/2018 3:26 AM

112 Special characters: Keyboarding
Use Character Map utility (Start, Run, “charmap.exe”) for keyboarding unless you have something better. Some people will be able to use one of the standard Windows keyboards, such as “US International”. Others can have a custom keyboard designed using Microsoft’s Keyboard Layout Creator or Tavultesoft Keyman. 9/19/2018 3:26 AM

113 Special characters: How to cope when you have less than the ideal
Use practical orthography when possible, not IPA or other phonetic transcription Reduces the number of special characters required Possibly modify it to make it more systematic (e.g. use k instead of c/qu to make morpheme shapes more consistent) If you need IPA too, you will need to use Character Map or have a custom keyboard designed for you. 9/19/2018 3:26 AM

114 Special characters: How to cope when you have less than the ideal
Substitute characters or digraphs, e.g. for ə, S for ʃ Use :o for ö, 'u for ú, etc. In other words, make do until you can get someone to set you up properly Once you have a good keyboarding system, you can do a search and replace to “correct” your makeshift transcriptions to the correct ones. 9/19/2018 3:26 AM
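The later "search and replace" step can be done in an editor, or with a short script like the Python sketch below. The substitution table uses the makeshift spellings mentioned above and is otherwise an assumption; adapt it to your own conventions. The replacement is restricted to the vernacular field (\tx here) so that glosses and translations containing an ordinary "S" are left alone.

```python
# Makeshift spelling -> intended character (adjust to your own conventions).
MAKESHIFT = {":o": "ö", "'u": "ú", "S": "ʃ"}


def fix_makeshift(text, table=MAKESHIFT):
    """Replace makeshift spellings, longest sequences first."""
    for old in sorted(table, key=len, reverse=True):
        text = text.replace(old, table[old])
    return text


def fix_field_lines(text, markers=("tx",), table=MAKESHIFT):
    """Apply the replacements only to lines that begin with the given markers."""
    out = []
    for line in text.splitlines():
        m, sep, value = line.partition(" ")
        if m.startswith("\\") and m[1:] in markers:
            line = m + sep + fix_makeshift(value, table)
        out.append(line)
    return "\n".join(out)


print(fix_field_lines("\\tx k:oS 'un\n\\ft She Sings"))
```

Try it on a backup copy first, since a makeshift symbol that also occurs as an ordinary letter in the vernacular data would be replaced too.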

115 Special characters: Using legacy custom fonts (pre-Unicode)
Viable option for now if you already have everything you need: all characters available in the custom font, and a functioning keyboarding system. Possibly the best option if the language community still uses the same system. Eventually you (and the community) will need to switch to Unicode.

116 Special characters: Using legacy custom fonts with Toolbox
Need to modify the standard Toolbox project setup. All language encodings must be adjusted: Not Unicode Use your custom fonts and keyboards Don't mix custom encodings with Unicode in the same project! All encodings should be set up for Unicode, or none of them should be. 9/19/2018 3:26 AM

118 Special characters “Special character”: anything other than what is normally printed on the keys of an English keyboard It’s an ethnocentric (linguocentric) definition, but it reflects the way computer technology developed and the problems that non-English characters can cause. 9/19/2018 3:26 AM

119 Special characters
Every character (special or ordinary) is represented internally as a number, called its “codepoint”. Example: abstract name LATIN CAPITAL LETTER A WITH ACUTE (Á); codepoint (hexadecimal) 00C1.
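You can check the codepoint and abstract name of any character yourself. A small Python illustration using the built-in `ord` and the standard `unicodedata` module (the helper name `describe` is just for this example):

```python
import unicodedata


def describe(ch):
    """Return the codepoint (hexadecimal) and abstract name of one character."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"


print(describe("Á"))
print(describe("A"))
```

The first line of output shows the same codepoint, 00C1, and the same abstract name given above.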

120 Special characters A font contains an image (the visible letter) used to display/print each codepoint. [Diagram: LATIN CAPITAL LETTER A WITH ACUTE, codepoint 00C1, displayed as Á in Times New Roman]

121 Special characters
Different fonts provide different images for a given character, but they are all recognizably the same character. [Diagram: LATIN CAPITAL LETTER A WITH ACUTE, codepoint 00C1, displayed as Á in Arial, Times New Roman, and Comic Sans MS]

122 Special characters
An electronic “keyboard” provides a way of typing the character. [Diagram: the US International keyboard produces LATIN CAPITAL LETTER A WITH ACUTE (codepoint 00C1) from the keystrokes ',A; the character is displayed as Á in Arial, Times New Roman, or Comic Sans MS]

123 Special characters
There can be more than one way to type the same character, depending on the keyboard used. [Diagram: ways of typing LATIN CAPITAL LETTER A WITH ACUTE (codepoint 00C1): A,CTRL+' (BU Keyboard); ',A (US International); ALT+0193 (Windows built-in); Character map (pick from a chart with the mouse). The character is displayed as Á in Arial, Times New Roman, or Comic Sans MS]

124 Special characters
Keyboard: maps keystrokes to codepoint. Font: maps codepoint to image. There are (potentially) many options for each. [Diagram: keyboards A,CTRL+' (BU Keyboard), ',A (US International), ALT+0193 (Windows built-in method), and Character map (pick from a chart with the mouse) all produce codepoint 00C1, LATIN CAPITAL LETTER A WITH ACUTE, which the fonts Arial, Times New Roman, and Comic Sans MS each display as Á]

125 Special characters Most important issue: what codepoint represents each character Secondary: how it is typed, what font is used These can be changed without disturbing the data, as long as they are designed with the same characters and codepoints in mind 9/19/2018 3:26 AM

126 Special characters Encoding: a system for representing a set of characters with codepoints If you change encodings, you must also change keyboards and fonts Wrong font: data is displayed incorrectly Wrong keyboard: what you type comes out wrong When we talk of custom fonts and keyboards, what is really significant is the encoding that underlies them, not the font or keyboard itself. 9/19/2018 3:26 AM
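The following Python snippet shows why the encoding is what really matters: the same stored byte is a different character depending on which encoding is assumed. It uses Python's built-in codecs "cp1252" (Windows ANSI, Western European) and "cp1251" (Windows Cyrillic) purely as familiar examples; the function name is illustrative.

```python
def interpretations(raw, encodings=("cp1252", "cp1251")):
    """Decode the same bytes under several encodings."""
    return {name: raw.decode(name) for name in encodings}


raw = bytes([0xC1])          # one stored byte, value C1 (hexadecimal)
print(interpretations(raw))  # a Latin letter in one encoding, a Cyrillic letter in the other

# In Unicode the character Á has one codepoint, however it is stored on disk:
print("Á".encode("utf-8"))      # two bytes in UTF-8
print("Á".encode("utf-16-le"))  # two bytes in UTF-16
```

This is the computer-level version of "wrong font: data is displayed incorrectly": nothing in the byte itself says which interpretation was intended.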

127 Special characters Common encodings Windows ANSI
about 220 characters used in major Western European languages standard in all Windows fonts from the start (ca. 1990) Standard encodings for particular languages (ISO standards) Cyrillic Japanese Arabic 9/19/2018 3:26 AM

128 Special characters Custom encodings for specific languages (“custom” or “legacy” fonts): about 220 custom characters, often based on Windows ANSI with some substitutions (a given codepoint represents a custom character rather than what it would normally represent in Windows ANSI)

129 Special characters Unicode
Over 100,000 characters already, with more to come—a little over a million possible Strong support by the entire computer industry Intended to handle all the world’s languages in one common system, without conflicts A genuine Unicode font might not have the character you need for a particular codepoint, but it will never have the wrong character 9/19/2018 3:26 AM

130 Special characters Quick guided tour of Unicode
(demo using Insert Symbol in PowerPoint) 9/19/2018 3:26 AM

131 Special characters Unicode is the only viable long-term choice
Large inventory of characters—practically anything a linguist would ever want Everyone worldwide can use the same system (no problems with data getting garbled by using the wrong fonts) Custom/legacy fonts may cease to work with future software 9/19/2018 3:26 AM

132 To make Unicode work Unicode-capable operating system and software
Windows 2000, XP, Vista Recent versions of Mac OS X and Linux Toolbox (NB: not Shoebox) and FLEx Most newer commercial software and much shareware/freeware See partial list at 9/19/2018 3:26 AM

133 To make Unicode work Unicode fonts that contain the characters you need Arial Unicode MS (with Microsoft Office, some versions don't include characters added to Unicode in the last few years) Lucida Sans Unicode (standard in Windows) Doulos SIL and Charis SIL ( standard fonts in Windows Vista Other sources, see 9/19/2018 3:26 AM

134 To make Unicode work A way to input the characters you need (see Character map utility (built-in to Windows) and Insert Symbol (built-in to Microsoft Office) Standard Windows keyboards, e.g. US International It is helpful to use them together with the Windows On-Screen Keyboard (Start, Programs, Accessories, Accessibility), which will show you how to type each character Custom Windows keyboards, made with Microsoft’s Keyboard Layout Creator (SIL has one for IPA characters) Tavultesoft Keyman ( 9/19/2018 3:26 AM

135 Legacy encodings vs. Unicode
Some people have older custom-encoded data that should be converted to Unicode To decide whether and when to change from legacy encodings to Unicode, see advice at For advice on how to proceed, see 9/19/2018 3:26 AM

136 Legacy encodings vs. Unicode
When copying from a document that uses custom fonts into one that uses Unicode (or vice versa), special characters may get garbled. If the problem is infrequent, just fix them manually. If the problem happens frequently, have a techie friend get the Unicode conversion tools available at and set them up for you to use when you need to convert data. Better yet: convert all your data at once and leave custom fonts in the past. 9/19/2018 3:26 AM

