IRCS Workshop on Linguistic Databases, 11-13 December 2001 EXMARaLDA Thomas Schmidt SFB 538 „Mehrsprachigkeit“ University of Hamburg.


1 IRCS Workshop on Linguistic Databases, December 2001 EXMARaLDA Thomas Schmidt SFB 538 „Mehrsprachigkeit“ University of Hamburg

2 Data Formats and Tools at the SFB
- 2200 transcriptions of spoken language (30 min recording each)
- Language acquisition data, interviews, expert discourse, classroom discourse, presentation discourse, interpreted discourse
- Languages: German, English, Swedish, Norwegian, Danish, French, Spanish, Portuguese, Turkish, Italian, Basque, Japanese, Chinese, Russian, Luganda
- 9 different data formats (dBase, syncWriter, HIAT-DOS, Verbmobil, ...)
- 3 different operating systems (Mac OS 9.x, Windows, Linux) + Mac OS X
- Research interests: phonetics, syntax, discourse, ...

3 Data Formats and Tools at the SFB
syncWriter: editor for interlinear text
- Mac OS 9.x and earlier
- outputs binary data

4 Data Formats and Tools at the SFB
HIAT-DOS: editor for HIAT transcription
- MS-DOS/Windows
- outputs text files

5 Data Formats and Tools at the SFB
dBase/Access/4th Dimension: utterance databases

6 Data Formats and Tools at the SFB
Verbmobil: 7-bit ASCII files

7 Database „Multilingualism“
Goals:
1. To have one common tool for accessing (querying) the data
   - Data must come in one format (AG)
   - Multilingual issues must be taken care of (Unicode)
   - Data format should be software-independent (XML)
   - Software should work across different operating systems (Java)
2. To have different tools reflecting the habits and needs of the different projects
   - different input methods (score, column, vertical notation)
   - different output methods (ditto)

8 Database „Multilingualism“
[Diagram labels: SyncWriter, HIAT-DOS, Verbmobil, ACCESS / dBase; ?; SQL database]

9 Database „Multilingualism“
[Diagram labels: SyncWriter, HIAT-DOS, Verbmobil, ACCESS / dBase; EXMARaLDA (Basic Transcription, Segmented Transcription, List Transcription); SQL database; Input / Editing Tools; Output / Visualization Tools]

10 „Traditional“ layout principles: Score notation („Partitur“)
MAX [v]: You keep interrupting | me, Tom
MAX [nv]: pointing at Tom
TOM [v]: Oh, I'm | sorry for that
TOM [nv]: smiling

11 „Traditional“ layout principles: Score notation („Partitur“)
MAX [v]: You keep interrupting | me, Tom
MAX [nv]: pointing at Tom
TOM [v]: Oh, I'm | sorry for that
TOM [nv]: smiling
Labels: Tiers

12 „Traditional“ layout principles: Score notation („Partitur“)
MAX [v]: You keep interrupting | me, Tom
MAX [nv]: pointing at Tom
TOM [v]: Oh, I'm | sorry for that
TOM [nv]: smiling
Labels: Tiers, Speakers, Categories

13 „Traditional“ layout principles: Score notation („Partitur“)
MAX [v]: You keep interrupting | me, Tom
MAX [nv]: pointing at Tom
TOM [v]: Oh, I'm | sorry for that
TOM [nv]: smiling
Labels: Tiers, Speakers, Categories, Timeline (0 1 2 3)

14 „Traditional“ layout principles: Score notation („Partitur“)
MAX [v]: You keep interrupting | me, Tom
MAX [nv]: pointing at Tom
TOM [v]: Oh, I'm | sorry for that
TOM [nv]: smiling
Labels: Tiers, Speakers, Categories, Timeline (0 1 2 3), Events

15 „Traditional“ layout principles
1. Score notation („Partitur“) → Basic Transcription
Elements: Tiers, Speakers, Categories, Timeline, Events
Example events: "You keep interrupting me, Tom." / "pointing at Tom"
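The five elements named on this slide (tiers, speakers, categories, timeline, events) can be sketched as a small data model. This is only an illustration, not the EXMARaLDA format itself: the class names, the `events_between` helper, and the exact timeline positions of the two non-verbal events are assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Event:
    start: int  # timeline point where the event begins
    end: int    # timeline point where the event ends
    text: str


@dataclass
class Tier:
    speaker: str   # e.g. "MAX"
    category: str  # e.g. "v" (verbal) or "nv" (non-verbal)
    events: List[Event] = field(default_factory=list)


@dataclass
class BasicTranscription:
    timeline: List[int]
    tiers: List[Tier]

    def events_between(self, start: int, end: int) -> List[Tuple[str, str, str]]:
        """Return (speaker, category, text) for every event overlapping [start, end)."""
        return [
            (tier.speaker, tier.category, event.text)
            for tier in self.tiers
            for event in tier.events
            if event.start < end and event.end > start
        ]


# The score example from the slides. The placement of the "nv" events
# on the timeline is assumed; the slides do not state it explicitly.
transcription = BasicTranscription(
    timeline=[0, 1, 2, 3],
    tiers=[
        Tier("MAX", "v", [Event(0, 1, "You keep interrupting"), Event(1, 2, "me, Tom.")]),
        Tier("MAX", "nv", [Event(1, 2, "pointing at Tom")]),
        Tier("TOM", "v", [Event(1, 2, "Oh, I'm"), Event(2, 3, "sorry for that.")]),
        Tier("TOM", "nv", [Event(2, 3, "smiling")]),
    ],
)

# Everything that happens between timeline points 1 and 2 (the overlap).
overlap = transcription.events_between(1, 2)
```

Because every event refers only to timeline points, the same object could in principle feed a score, column, or vertical rendering without changing the data.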

16 „Traditional“ layout principles
2. Column notation
MAX [v]: You keep interrupting me, Tom.
MAX [nv]: pointing at Tom
TOM [v]: Oh, I'm sorry for that.
TOM [nv]: smiling

17 „Traditional“ layout principles
2. Column notation → Basic Transcription
MAX [v]: You keep interrupting me, Tom.
MAX [nv]: pointing at Tom
TOM [v]: Oh, I'm sorry for that.
TOM [nv]: smiling
Elements: Tiers, Speakers, Categories, Timeline, Events

18 „Traditional“ layout principles
3. Vertical notation
MAX: You keep interrupting [me, Tom.] (pointing at Tom)
TOM: [Oh, I'm] sorry for that. (smiling)

19 „Traditional“ layout principles
3. Vertical notation
MAX: You keep interrupting [me, Tom.] (pointing at Tom)
TOM: [Oh, I'm] sorry for that. (smiling)
Elements: Tiers, Speakers, Categories, Timeline, Events

20 „Traditional“ layout principles
3. Vertical notation
MAX: You keep interrupting [me, Tom.] (pointing at Tom)
TOM: [Oh, I'm] sorry for that. (smiling)
Elements: Tiers, Speakers, Categories, Timeline, Events, Speaker-Turns
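One way to read the vertical notation is as a rendering derived from timed events: consecutive events of one speaker form a speaker turn, and events that overlap another speaker's event are put in square brackets. The sketch below illustrates that reading only. The function name `vertical_notation` and the tuple layout are hypothetical, and non-verbal events are left out for brevity.

```python
from typing import List, Tuple

# (speaker, start, end, text) for verbal events; start/end are timeline points.
VerbalEvent = Tuple[str, int, int, str]


def vertical_notation(events: List[VerbalEvent]) -> List[str]:
    """Group events into speaker turns and bracket overlapping stretches."""
    turns = []      # turns in order of appearance
    open_turn = {}  # speaker -> most recent turn of that speaker

    for speaker, start, end, text in sorted(events, key=lambda ev: (ev[1], ev[0])):
        # An event is "overlapped" if some other speaker's event shares time with it.
        overlapped = any(
            other != speaker and s < end and e > start
            for other, s, e, _ in events
        )
        chunk = f"[{text}]" if overlapped else text

        turn = open_turn.get(speaker)
        if turn is not None and turn["end"] == start:
            # The event continues the speaker's current turn.
            turn["chunks"].append(chunk)
            turn["end"] = end
        else:
            # A gap (or a first event) starts a new turn.
            turn = {"speaker": speaker, "end": end, "chunks": [chunk]}
            open_turn[speaker] = turn
            turns.append(turn)

    return [f'{t["speaker"]}: {" ".join(t["chunks"])}' for t in turns]


lines = vertical_notation([
    ("MAX", 0, 1, "You keep interrupting"),
    ("MAX", 1, 2, "me, Tom."),
    ("TOM", 1, 2, "Oh, I'm"),
    ("TOM", 2, 3, "sorry for that."),
])
```

With the example events, `lines` holds one line per speaker turn, with the simultaneous stretches "me, Tom." and "Oh, I'm" bracketed.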

21 Structure Of Annotated Data
Words: You | keep | interrupting | me, | Tom. / Oh, | I'm | sorry | for | that
Events (temporal structure)

22 Structure Of Annotated Data
Words: You | keep | interrupting | me, | Tom. / Oh, | I'm | sorry | for | that
Events (temporal structure)
Utterances (linguistic structure), with German translation: "Immer unterbrichst Du mich, Tom" / "Oh, das tut mir Leid."

23 Structure Of Annotated Data
Words: You | keep | interrupting | me, | Tom. / Oh, | I'm | sorry | for | that
Events (temporal structure)
Utterances (linguistic structure), with German translation: "Immer unterbrichst Du mich, Tom" / "Oh, das tut mir Leid."
Words (linguistic structure), part-of-speech tags: Pro V Vpart Pro PN . / Int Pro V Adj Prep Pro

24 Structure Of Annotated Data
[Figure: annotation graph. Node labels: a, b, 1, c, 2, d, e, 3. Arc labels for the first utterance: W: You, W: keep, W: interrupting, W: me, W: Tom; POS: pro, POS: v, POS: vpart, POS: pro, POS: pn; U: You keep interrupting me, Tom.; GER: Immer unterbrichst Du mich, Tom. Arc labels for the second utterance: W: Oh, W: I, W: 'm; POS: int, POS: pn, POS: v; U: Oh, I'm sorry for that.; GER: Oh, das tut mir Leid.]
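The graph on this slide can be approximated as a list of labeled arcs between nodes, in the spirit of annotation graphs (AG). The sketch below covers only Max's utterance and is an assumption-laden illustration: the `Arc` type, the `labels_along` helper, and the start node "0" (taken from the timeline 0 1 2 3 on the earlier slides) are not part of the transcript.

```python
from typing import List, NamedTuple


class Arc(NamedTuple):
    start: str  # node where the arc begins
    end: str    # node where the arc ends
    type: str   # "W" (word), "POS", "U" (utterance), "GER" (German translation)
    label: str


# Max's utterance only. Nodes "0", "1", "2" are timeline points;
# "a", "b", "c" are intermediate nodes without a temporal anchor.
arcs = [
    Arc("0", "a", "W", "You"),
    Arc("a", "b", "W", "keep"),
    Arc("b", "1", "W", "interrupting"),
    Arc("1", "c", "W", "me"),
    Arc("c", "2", "W", "Tom"),
    Arc("0", "a", "POS", "pro"),
    Arc("a", "b", "POS", "v"),
    Arc("b", "1", "POS", "vpart"),
    Arc("1", "c", "POS", "pro"),
    Arc("c", "2", "POS", "pn"),
    Arc("0", "2", "U", "You keep interrupting me, Tom."),
    Arc("0", "2", "GER", "Immer unterbrichst Du mich, Tom."),
]


def labels_along(arcs: List[Arc], arc_type: str, start: str, end: str) -> List[str]:
    """Follow arcs of one type from `start` to `end` and collect their labels.

    Assumes at most one arc of the given type leaves each node, which holds
    for a single speaker's utterance. Raises KeyError if no path exists.
    """
    next_arc = {arc.start: arc for arc in arcs if arc.type == arc_type}
    labels, node = [], start
    while node != end:
        arc = next_arc[node]
        labels.append(arc.label)
        node = arc.end
    return labels


# The utterance arc spans nodes "0" to "2"; recover the words and tags under it.
words = labels_along(arcs, "W", "0", "2")
tags = labels_along(arcs, "POS", "0", "2")
```

A query such as "which words does this utterance span" then reduces to walking the word arcs between the utterance's start and end nodes.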

