
1 Digital Italian: An overview of Italian corpora

2 A linguistic corpus: a body of texts or transcripts collected for linguistic purposes, which is computerized, representative of the variety studied, balanced, and annotated.

3 Annotation. Linguistic annotation can be either useful or restrictive. Extra-linguistic annotation is useful for sociolinguistic research.

4 Italian corpora: general or specialized; written or spoken; diachronic or synchronic.

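5 General corpora: written Italian
Corpus e lessico di frequenza dell'italiano scritto (COLFIS; "Corpus and frequency lexicon of written Italian")
Corpus di riferimento dell'italiano scritto / Corpus dinamico dell'italiano scritto (CORIS/CODIS; "Reference corpus / Dynamic corpus of written Italian")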

6 COLFIS – structure
COLFIS (over three and a half million words)
  Newspapers: Il Corriere della Sera, La Repubblica, La Stampa. Sections: economy, news of local interest, society, crime news, internal and external affairs, science, show business and sports.
  Periodicals: other, arts, science and technology, cars and boats, children and young people, home and hobbies, women's magazines, photo love stories, general information, society, radio and television, sport, travel and ecology.
  Books: other, arts, children, SF, detective and spy stories, hobbies and travel, classics, modern narrative, romance, essays, natural and exact sciences, human and social sciences, theatre and poetry.

7 CORIS/CODIS – structure
CORIS / CODIS (one hundred million words)
  Press: newspapers, periodicals, supplements (national or local; specialist or non-specialist; connotated or non-connotated)
  Fiction: novels, short stories (Italian or foreign; for adults or for children; crime, adventure, SF, women's literature)
  Academic prose: human sciences, natural sciences, physics, experimental sciences (books, reviews; scientific or popular; history, philosophy, arts, literary criticism, law, economy, biology, etc.)
  Legal and administrative prose: legal, bureaucratic, administrative (books, reviews)
  Miscellanea: books on religion, travel, cookery, hobbies, etc.
  Ephemera: letters, leaflets, instructions (private or public; printed or electronic form)

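8 General corpora: spoken Italian
Lessico di frequenza dell'italiano parlato (LIP; "Frequency lexicon of spoken Italian"), the basis of the Banca dati dell'italiano parlato (BADIP; "Spoken Italian database")
Archivio delle varietà dell'italiano parlato (AVIP; "Archive of varieties of spoken Italian")
LABLITA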

9 Spoken and written Italian: Corpora e lessici dell'italiano parlato e scritto (CLIPS; "Corpora and lexicons of spoken and written Italian")
CLIPS (the spoken corpus)
  Radio and television speech: entertainment, informative, cultural and educational broadcasts, commercials.
  Field recordings: map task dialogues and spot-the-difference games.
  Readings: by the speakers themselves or by professional dubbing actors.
  Telephone speech: conversations between a simulated tour operator and three hundred people.

10 Specialized corpora: Corpus di italiano televisivo (CIT; "Corpus of television Italian"); La Repubblica

11 CIT – structure
CIT: current affairs; entertainment (games, talk shows, variety shows); commercials; sports news; newscasts.
Subdivisions: commentaries, play-by-play; studio broadcast, on-field broadcast; text; text, slogans; studio broadcast, on-field broadcast; text, headlines, studio broadcast, on-field broadcast.

12 Corpus di italiano televisivo

13 La Repubblica – structure
La Repubblica
  Years: 1985–2000
  Genre: news, comment
  Topic: religion, culture, economics, education, news, politics, science, society, sport, weather, unclassified

14 La Repubblica

15 Thank you! Anne-Marie OBRETIN, MRes in European Languages and Cultures, University of Exeter, ao231@exeter.ac.uk