USP workshop Using the Corpógrafo Belinda Maia & Luís Sarmento PoloFLUP LINGUATECA.

1 USP workshop Using the Corpógrafo Belinda Maia & Luís Sarmento PoloFLUP LINGUATECA

2 First steps
Get a username and password. You will receive one automatically.

5 Working with the Corpógrafo
The Corpógrafo is a suite of integrated tools for INDIVIDUAL or GROUP research.
– All research is done ONLINE.
– Each username/password gives you a separate space on our server.
– At present, anyone can work with it, using 10 MB of space, for FREE.
– BUT you start with an empty space, plus the tools and a tutorial!

6 Help Files
– Introdução à utilização do Corpógrafo - um pequeno tutorial (Introduction to using the Corpógrafo: a short tutorial): a tutorial, to be translated into English, describing the whole process of terminology research using the Corpógrafo. Available in PDF.
– Corpógrafo Roadmap: in English and Portuguese, a panoramic view of the Corpógrafo and how it works. Available in PDF.
– The Corpógrafo in Easy Stages: in English and Portuguese, a user's guide to the Corpógrafo and FAQ. Available in PDF.
Also note: on the entry page there is a Glossary of terms and instructions (PT > EN).

7 File Manager
Area where each individual or group can:
– upload texts to their space on the server
– convert various text formats to .txt
– 'clean' them of unnecessary material
– check tokenization and sentence divisions
– register full information on source, domain and text type
– group (and re-group) texts into corpora
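
Checking tokenization and sentence divisions matters because automatic rules are fragile. As a rough illustration only (the Corpógrafo's own tokenizer is not described here), this minimal Python sketch splits cleaned text into sentences and tokens with two regular expressions; `split_sentences`, `tokenize` and both rules are hypothetical.

```python
import re

# Hypothetical rule: a sentence ends at ".", "!" or "?" followed by whitespace
# and an uppercase letter (including accented capitals such as "É").
SENTENCE_END = re.compile(r"(?<=[.!?])\s+(?=[A-ZÀ-Ý])")
# A token is a run of word characters (Unicode-aware, so "Corpógrafo" stays
# whole), optionally joined by internal hyphens or apostrophes, or any single
# character that is neither a word character nor whitespace (punctuation).
TOKEN = re.compile(r"\w+(?:[-'’]\w+)*|[^\w\s]")


def split_sentences(text: str) -> list[str]:
    """Split cleaned plain text into sentences."""
    text = " ".join(text.split())  # collapse line breaks and repeated spaces
    return [s for s in SENTENCE_END.split(text) if s]


def tokenize(sentence: str) -> list[str]:
    """Return the word and punctuation tokens of one sentence."""
    return TOKEN.findall(sentence)


sample = "O Corpógrafo é uma ferramenta online.\nPermite compilar corpora! Funciona bem."
for sentence in split_sentences(sample):
    print(tokenize(sentence))
```

A rule like this wrongly splits "O Sr. Silva chegou." after the abbreviation "Sr.", which is exactly the kind of error the File Manager check is meant to catch.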

8 File Manager
1. Files
> List Files on Server
> Add Files
> Add Files from URL (Experimental!)
2. Corpora
> List Corpora
> Compile New Corpus
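
Registering metadata for every file is what makes grouping and re-grouping texts cheap: a corpus can be defined as a selection of files by their metadata. The sketch below is hypothetical, with invented field names (`language`, `domain`, `text_type`, `source`) and made-up files; it does not reflect how the Corpógrafo stores files on the server.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TextFile:
    """One uploaded text plus the metadata registered for it (hypothetical fields)."""
    name: str
    language: str
    domain: str
    text_type: str
    source: str


def compile_corpus(files, **criteria):
    """Return the names of the files whose metadata match every criterion.

    A file can belong to any number of corpora, so re-grouping texts is
    simply another call with different criteria.
    """
    return [f.name for f in files
            if all(getattr(f, key) == value for key, value in criteria.items())]


# Made-up example files.
files = [
    TextFile("rocks_en.txt", "EN", "geology", "textbook", "source A"),
    TextFile("rochas_pt.txt", "PT", "geology", "article", "source B"),
    TextFile("soils_en.txt", "EN", "soil science", "article", "source C"),
]
corpora = {
    "geology": compile_corpus(files, domain="geology"),
    "english_articles": compile_corpus(files, language="EN", text_type="article"),
}
print(corpora)
```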

11 EXTEX: tool for converting file formats to .txt, available at http://poloclup.linguateca.pt/ferramentas
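
EXTEX's own conversion method is not described here. For HTML input only, a minimal sketch of the idea using Python's standard `html.parser` module might look like this: it keeps the visible text, drops `<script>` and `<style>` content, and turns block-level tags into line breaks. The names `TextExtractor` and `html_to_txt` are hypothetical.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect the visible text of an HTML page, skipping script and style content."""

    SKIP = {"script", "style"}
    BLOCK = {"p", "div", "br", "li", "tr", "h1", "h2", "h3", "h4", "h5", "h6"}

    def __init__(self):
        super().__init__()  # convert_charrefs=True by default, so "&amp;" arrives as "&"
        self.parts = []
        self.skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self.skip_depth += 1
        elif tag in self.BLOCK:
            self.parts.append("\n")

    def handle_endtag(self, tag):
        if tag in self.SKIP:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif tag in self.BLOCK:
            self.parts.append("\n")

    def handle_data(self, data):
        if not self.skip_depth:
            self.parts.append(data)


def html_to_txt(html: str) -> str:
    """Convert an HTML string to plain text, one block element per line."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    lines = "".join(parser.parts).splitlines()
    # Collapse runs of whitespace inside each line and drop empty lines.
    cleaned = (" ".join(line.split()) for line in lines)
    return "\n".join(line for line in cleaned if line)


page = ("<html><head><style>p {color: red}</style></head><body>"
        "<h1>Corpógrafo</h1><p>Tools &amp; corpora</p>"
        "<script>var x = 1;</script></body></html>")
print(html_to_txt(page))
```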

17 General corpus analysis
Corpora analysis area:
Concordancing tools for regular expressions
– at sentence level
– KWIC concordancing
– collocations
N-gram tool
– case-sensitive
– alphabetical or frequency ordering
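
A KWIC (key word in context) concordance shows each match of a pattern between its left and right context. The minimal Python sketch below applies a regular expression one sentence at a time, so contexts never cross sentence boundaries. The function names, the 30-character `width` and the `case_sensitive` switch are assumptions for illustration, not the Corpógrafo's interface.

```python
import re


def kwic(sentences, pattern, width=30, case_sensitive=False):
    """Return (left context, match, right context) for every regex match.

    The pattern is applied to one sentence at a time, so contexts never
    cross a sentence boundary; each context is cut to `width` characters.
    """
    regex = re.compile(pattern, 0 if case_sensitive else re.IGNORECASE)
    hits = []
    for sentence in sentences:
        for m in regex.finditer(sentence):
            left = sentence[max(0, m.start() - width):m.start()].strip()
            right = sentence[m.end():m.end() + width].strip()
            hits.append((left, m.group(0), right))
    return hits


def show(hits, width=30):
    """Print hits with the matches aligned in one column."""
    for left, key, right in hits:
        print(f"{left:>{width}} [{key}] {right}")


sentences = [
    "The term extraction tool lists candidate terms.",
    "Each Term is checked against its concordances.",
]
show(kwic(sentences, r"\bterms?\b"))
```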

22 Corpora + TDB
– Choose a corpus.
– Choose the related TDB (terminology database).
= All terms, examples and definitions extracted from the corpus are (semi-)automatically transferred to the TDB.
= All metadata on the texts in the corpus can be automatically transferred to the TDB.
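
The transfer step can be pictured as copying each extracted term, its example sentences and the metadata of the texts they came from into one database record. In this hypothetical sketch the TDB is a plain Python dictionary keyed by (term, language); `send_to_tdb` and the metadata fields are invented for illustration.

```python
def send_to_tdb(tdb, term, language, examples, text_metadata):
    """Add a term to the TDB, or update it, with its examples and source metadata.

    tdb           -- dict keyed by (term, language); a stand-in for the real database
    examples      -- list of (sentence, text_id) pairs taken from the corpus
    text_metadata -- dict mapping each text_id to the metadata registered for it
    """
    record = tdb.setdefault((term, language),
                            {"term": term, "language": language,
                             "examples": [], "sources": {}})
    for sentence, text_id in examples:
        if (sentence, text_id) not in record["examples"]:
            record["examples"].append((sentence, text_id))
        # Copy the source text's metadata along with the example.
        record["sources"][text_id] = text_metadata[text_id]
    return record


metadata = {"t1": {"author": "Author A", "domain": "geology", "text_type": "textbook"}}
tdb = {}
send_to_tdb(tdb, "sedimentary rock", "EN",
            [("Sandstone is a sedimentary rock.", "t1")], metadata)
print(tdb[("sedimentary rock", "EN")]["sources"]["t1"]["domain"])  # geology
```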

23 Term extraction
N-grams:
– unfiltered
– filtered with restrictions on the term, in PT, EN, FR, IT, ES, DE
– filtered with restrictions on the term and its context, in PT, EN, FR, IT, ES, DE
– singular and plural forms of a term can be combined
– terms already in the TDB can be left out of the list
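
One common way to apply "restrictions on the term" is to reject n-grams that begin or end with a stopword. The sketch below (English only, with a tiny hypothetical stopword list and a naive plural rule) counts bigram candidates, optionally folds plurals into singulars, and skips terms already in the TDB. It illustrates the options listed on the slide, not the Corpógrafo's actual filters.

```python
import re
from collections import Counter

# Hypothetical, tiny English stopword list; a real filter would need a much
# longer list for each supported language (PT, EN, FR, IT, ES, DE).
STOPWORDS = {"a", "an", "the", "of", "in", "on", "and", "or", "is", "are", "to"}


def singular(word):
    """Naive English plural folding ("rocks" -> "rock"); an assumption for the sketch."""
    return word[:-1] if word.endswith("s") and not word.endswith("ss") else word


def extract_terms(sentences, n=2, existing=frozenset(), merge_plurals=True):
    """Count n-gram term candidates, most frequent first."""
    counts = Counter()
    for sentence in sentences:
        words = re.findall(r"[^\W\d_]+", sentence.lower())  # letters only
        for i in range(len(words) - n + 1):
            gram = words[i:i + n]
            # Restriction on the term: it may not begin or end with a stopword.
            if gram[0] in STOPWORDS or gram[-1] in STOPWORDS:
                continue
            if merge_plurals:
                gram[-1] = singular(gram[-1])  # fold the plural of the final word
            term = " ".join(gram)
            if term not in existing:  # skip terms already in the TDB
                counts[term] += 1
    return counts.most_common()


sentences = [
    "Sandstone is a sedimentary rock.",
    "Sedimentary rocks form in layers.",
    "Limestone is another sedimentary rock.",
]
print(extract_terms(sentences)[0])  # ('sedimentary rock', 3)
```

Candidates such as "another sedimentary" still get through, which is why the next step checks each n-gram against its concordances.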

25 Term selection from n-grams
– Consult the list of n-grams.
– Check the term status of each n-gram via the underlying concordances.
– Check the sources.
– Send to the TDB.
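
Checking the sources of a candidate amounts to seeing which texts its concordance lines come from. In this hypothetical sketch, `evidence` groups the sentences containing a candidate by text identifier. Treating spread across several texts as stronger evidence of term status is a heuristic assumption, not a Corpógrafo rule.

```python
import re
from collections import defaultdict


def evidence(candidate, sentences):
    """Group the sentences that contain `candidate` by the text they come from.

    `sentences` is a list of (text_id, sentence) pairs from the corpus.
    """
    regex = re.compile(r"\b" + re.escape(candidate) + r"\b", re.IGNORECASE)
    by_source = defaultdict(list)
    for text_id, sentence in sentences:
        if regex.search(sentence):
            by_source[text_id].append(sentence)
    return dict(by_source)


corpus = [
    ("t1", "Sandstone is a sedimentary rock."),
    ("t2", "Most sedimentary rock forms in layers."),
    ("t2", "The rock was red."),
]
found = evidence("sedimentary rock", corpus)
# Heuristic (assumption): a candidate attested in several texts is a stronger term candidate.
print(len(found), "source texts")  # 2 source texts
```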

29 Search for definition candidates
– Already possible via the TDB
– Under development
– Research area for Master's (Mestrado) dissertations and research grant holders (bolseiros)
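
One widely used approach to finding definition candidates is to search for defining patterns such as "X is a ..." around the term. Since the Corpógrafo's own method is still under development, the following is only a sketch of that pattern-based idea, using three hypothetical English and Portuguese patterns.

```python
import re

# Hypothetical defining patterns; real definition extraction needs many more
# patterns per language and manual checking of every candidate.
PATTERNS = [
    r"\b{term}\s+is\s+(?:a|an|the)\s+\w+",    # EN: "X is a ..."
    r"\b{term}\s+is\s+defined\s+as\b",        # EN: "X is defined as ..."
    r"\b{term}\s+é\s+(?:um|uma|o|a)\s+\w+",   # PT: "X é um/uma ..."
]


def definition_candidates(term, sentences):
    """Return the sentences in which `term` appears inside a defining pattern."""
    regexes = [re.compile(p.format(term=re.escape(term)), re.IGNORECASE)
               for p in PATTERNS]
    return [s for s in sentences if any(r.search(s) for r in regexes)]


sentences = [
    "Sandstone is a sedimentary rock.",
    "Sandstone is common in deserts.",
    "O arenito é uma rocha sedimentar.",
]
print(definition_candidates("sandstone", sentences))  # ['Sandstone is a sedimentary rock.']
```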

30 TDB: Terminology database
Databases are designed to be multilingual:
– terms listed alphabetically, with a language tag
– general data
– morphological data
– source metadata: authors, texts, etc.
– definitions + search for candidates
– translation equivalents
– semantic relations
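
A multilingual TDB entry with the fields listed above could be modelled roughly as follows. This is a hypothetical Python sketch: the field names are assumptions, and the alphabetical listing uses plain code-point order, whereas a real listing would need language-aware collation for accented terms.

```python
from dataclasses import dataclass, field


@dataclass
class TermEntry:
    """One TDB entry; all field names are hypothetical."""
    term: str
    lang: str                                        # language tag, e.g. "PT", "EN"
    morphology: str = ""                             # morphological data
    general: dict = field(default_factory=dict)      # general data (domain, status...)
    sources: list = field(default_factory=list)      # source metadata: authors, texts...
    definitions: list = field(default_factory=list)
    equivalents: dict = field(default_factory=dict)  # language tag -> equivalent terms
    relations: dict = field(default_factory=dict)    # relation name -> related terms


def alphabetical(entries):
    """List the terms alphabetically, each followed by its language tag.

    casefold() ignores case, but accented initials still sort by code point.
    """
    ordered = sorted(entries, key=lambda e: (e.term.casefold(), e.lang))
    return [f"{e.term} [{e.lang}]" for e in ordered]


en = TermEntry("sedimentary rock", "EN", morphology="adj + noun",
               equivalents={"PT": ["rocha sedimentar"]},
               relations={"hyponyms": ["sandstone", "limestone"]})
pt = TermEntry("rocha sedimentar", "PT", morphology="noun + adj",
               equivalents={"EN": ["sedimentary rock"]})
print(alphabetical([en, pt]))  # ['rocha sedimentar [PT]', 'sedimentary rock [EN]']
```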

43 Future developments: general policy
– General testing and improvement
– Development of new ideas or functions, by matching researchers' needs with what we are able to provide
– Coordination of individual corpus projects into bigger projects, when possible or necessary

