CLARIN-NL ISOcat workshop 2011 part 2 Ineke Schuurman Menzo Windhouwer.


1 CLARIN-NL ISOcat workshop 2011 part 2 Ineke Schuurman Menzo Windhouwer

2 Part A: Issues brought up by participants
– When (not) to adopt an existing DC
– What about (CLARIN) standards
– What to do with ‘flagged’ DCs
– Relation DCS – profile
– What should be included in ISOcat (level of detail, abbreviations, …)
– What about TEI, metadata, webservice?
– How to deal with larger amounts of data

3 Part B: ISOcat and CLARIN: Do’s and don’ts (version 0.1)
– Introduction and discussion

5 When (not) to adopt an existing DC
– It should ‘match’ the way you use a specific notion in your annotation scheme, application, …
– It should come with the same profile
– It should handle the same phenomenon: SpeakerID ≠ SingerID

6 Speaker vs Singer
String → Name → Person → Singer → Opera → Opera singer → Tenor → Tenor in La Bohème
The first is too generic, the last too specific; the others are candidates.
Note that SingerID and SpeakerID are siblings, whereas SingerID is a subclass of both Singer and ID (RELcat!)
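
The rule of thumb on this slide can be sketched in a few lines of Python. This is only an illustration: the list `chain` copies the chain shown on the slide, and the helper `candidates` is a hypothetical name, not part of ISOcat or RELcat.

```python
# Specificity chain copied from the slide, ordered from most generic
# to most specific.
chain = [
    "String", "Name", "Person", "Singer",
    "Opera", "Opera singer", "Tenor", "Tenor in La Bohème",
]

def candidates(chain):
    """Drop the most generic and the most specific notion (hypothetical helper)."""
    return chain[1:-1]

print(candidates(chain))
```

Which of the remaining candidates to adopt still depends on how the notion is used in your own scheme.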

8 Standards
Within ISOcat there are currently few or no standards.
Therefore CLARIN NL and VL will set up their own set of ‘standardized DCs’.
Ineke will be in charge (she will consult with others).

10 Flagged DCs
Never link to ‘deprecated’ DCs! (In case of doubt, consult Ineke or Menzo.)
In other cases the flags show whether the DC specification is correct from a technical point of view.
Note that only DCs with a green marking qualify for standardization.
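
As a minimal sketch, the two rules on this slide could be written as simple checks. The flag values "deprecated" and "green" come from the slide; representing flags as plain strings, and the function names `may_link` and `qualifies_for_standardization`, are assumptions made for illustration and do not reflect an actual ISOcat API.

```python
# Flags are modelled as plain strings (an assumption for this sketch).

def may_link(flag: str) -> bool:
    """A DC may be linked to unless it is flagged as deprecated."""
    return flag != "deprecated"

def qualifies_for_standardization(flag: str) -> bool:
    """Only DCs with a green marking qualify for standardization."""
    return flag == "green"

for flag in ("green", "deprecated"):
    print(flag, may_link(flag), qualifies_for_standardization(flag))
```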

12 DC/DCS and profile
Profiles are not added automatically; a DCS may contain elements with various profiles.
If the profile you need is not yet available, contact Menzo and Ineke.

14 What to include?
Cf. the slide on SingerID/SpeakerID.
In general: all linguistically meaningful notions mentioned in your schema, manual, definition (cf. Part B).
Abbreviations (PST for /past tense/) are to be entered as Data Element Name.

16 TEI, metadata, webservice
TEI: likely to be taken care of at a ‘higher level’; if not, YOU are to insert the TEI definitions you use.
Metadata: new in CMDI? In that case a definition in ISOcat is to be provided as well.
Webservice: to be taken care of in CMDI.

18 Larger amounts?
In such a case, contact Menzo Windhouwer (menzo.windhouwer@mpi.nl)

19 Part B: do’s & don’ts
Do’s:
– Create a DCS for your scheme (name project, annotation scheme, …)
– Provide a clear definition (short, to the point) for your scheme, application, …
– Take care not to leave concepts used in your definition undefined or vague
– Use appropriate vocabulary (per profile)
– Check ‘adopted’ DCs regularly until they are standardized!

20 Do’s (continued)
When creating a DC, fill out:
Justification: used in XYZ, part of tagset N
Language section
– Always an English language section
– Strong recommendation: sections for the object language(s) and for the working language of the manual
– Sections in the various languages should match (i.e. be more or less translations of each other)
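
The checklist on this slide can be turned into a small completeness check. The sketch below is illustrative only: the `DataCategory` class, its field names, the language codes, and the `check_dc` helper are hypothetical and are not the ISOcat data model. The definition text is a placeholder.

```python
from dataclasses import dataclass, field

@dataclass
class DataCategory:
    """Hypothetical, simplified stand-in for a DC record."""
    name: str
    justification: str = ""
    # Maps a language code (e.g. "en", "nl") to the definition text.
    language_sections: dict = field(default_factory=dict)

def check_dc(dc, recommended=()):
    """Return a list of problems, following the do's on the slide."""
    problems = []
    if not dc.justification.strip():
        problems.append("error: justification is empty")
    if "en" not in dc.language_sections:
        problems.append("error: English language section is missing")
    # Object and working languages are a strong recommendation only.
    for lang in recommended:
        if lang not in dc.language_sections:
            problems.append(f"warning: no language section for '{lang}'")
    return problems

dc = DataCategory(
    name="pastTense",
    justification="used in XYZ, part of tagset N",
    language_sections={"en": "placeholder definition"},
)
print(check_dc(dc, recommended=("nl",)))
```

Treating the English section as an error and the other sections as warnings mirrors the slide's distinction between "always" and "strong recommendation".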

21 Do’s (continued)
When creating a DC, fill out:
Example section
– Note that *negative* examples may be very helpful! (jongens ‘boys’, mannen ‘men’, but not: gelovigen ‘believers’ (is a form of ADJ))

22 Example sections
Suppose you want to illustrate a German phenomenon:
Example section in EN language section
– German example with translation in English
Example section in NL language section
– German example with translation in Dutch
Example section in EN linguistic section
– EN example
Example section in NL linguistic section
– NL example with translation in English

23 Don’ts
Confuse Language and Linguistic section
– The latter contains language-specific values for closed domains
Be (too) language specific in the definition
Mention the scheme in the definition
Use several definitions in one DC
Circular definitions
Rely on authority
Rely on standardized status
– The definition should fit YOUR scheme, etc.

24 End

