
IAC (ACCESS INTERFACE CORPUS) DEVELOPED BY BARCELONA MEDIA & UNIVERSITAT POMPEU FABRA TONI BADIA (BARCELONA MEDIA - UNIVERSITAT POMPEU FABRA) JUDITH DOMINGO.


1 IAC (ACCESS INTERFACE CORPUS), DEVELOPED BY BARCELONA MEDIA & UNIVERSITAT POMPEU FABRA
Toni Badia (Barcelona Media - Universitat Pompeu Fabra), Judith Domingo (Barcelona Media), Carme Colominas (Universitat Pompeu Fabra)
UCCTS 2010 (Ormskirk)

2 IAC CORPORA USE: REQUIREMENTS
It's easy to build a corpus from the web, but difficult to search it. We need tools that allow frequency statistics, sorting of results, searching for linguistically annotated sequences, etc.

3 IAC CORPORA: SEARCHING METHODS
Concordance software (MonoConc, Concordance), databases, and corpus query systems (e.g. CQP, EMDROS). These are useful but tough to learn, and not well suited to training, since students spend too much time learning the query system.

4 IAC CORPORA: INTERFACES (SEARCHING METHODS)
Disadvantages: from the user's point of view, more than one interface has to be learned; a background in programming and interface design is needed (external resources); if different attribute types are added, the interface needs a new design and therefore new funding; usually more expensive than the other options.
Advantages: user-friendly; no training needed.

5 IAC (ACCESS INTERFACE CORPUS)
The Translation Department at UPF had many corpora, which were constantly changing and growing. To handle them, IAC was developed by Barcelona Media and UPF.
Goals: support for monolingual and aligned corpora; fast and easy creation of interfaces for corpora; one interface design for all the corpora.

6 IAC INTERFACES
Simple: Key Words Out of Context (KWOC). Advanced: Key Words In Context (KWIC). Statistics: KWIC and frequency-based results.
For corpus searching and indexing, IAC uses the Corpus WorkBench (CWB), developed by IMS Stuttgart.
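To make the KWIC view concrete, here is a minimal Python sketch of a keyword-in-context concordance whose hits are sorted by right context (sorting results being one of the requirements listed for corpus tools). It is a toy illustration over a plain token list, not how CWB or IAC implement it; `kwic` and `sort_by_right_context` are hypothetical names.

```python
from typing import List, Tuple

Hit = Tuple[str, str, str]  # (left context, keyword, right context)


def kwic(tokens: List[str], keyword: str, width: int = 3) -> List[Hit]:
    """Return one (left, keyword, right) triple per case-insensitive match,
    with up to `width` tokens of context on each side."""
    hits: List[Hit] = []
    for i, tok in enumerate(tokens):
        if tok.lower() == keyword.lower():
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            hits.append((left, tok, right))
    return hits


def sort_by_right_context(hits: List[Hit]) -> List[Hit]:
    """Sort concordance lines alphabetically by their right context."""
    return sorted(hits, key=lambda hit: hit[2].lower())


if __name__ == "__main__":
    text = "the boy buys pencils and the girl buys books for the boy".split()
    for left, kw, right in sort_by_right_context(kwic(text, "buys", width=2)):
        print(f"{left:>20} [{kw}] {right}")
```

Sorting on the right context groups hits by what follows the keyword, which is often how collocational patterns become visible in a concordance.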

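7 IAC CORPUS FORMAT
Tokens are stored in a verticalized (tabular) format, one token per line with its annotation in columns:

The      Det   sg
boy      Noun  sg
buys     Verb  sg
pencils  Noun  pl

XML is used for metadata and for structural data.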
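CWB indexes text in this kind of verticalized format, with XML-style tags marking structure and metadata. A toy Python reader for such a file might look as follows. The column order (word, POS, number), the `<text genre=...>` attribute and the `<s>` sentence tag are assumptions made for illustration, not IAC's actual schema.

```python
import re
from typing import Dict, List

# Assumed column order, following the slide example: word form, part of speech, number.
COLUMNS = ["word", "pos", "num"]
# Matches key="value" pairs inside an XML start tag.
ATTR_RE = re.compile(r'(\w+)="([^"]*)"')

# Hypothetical sample: one token per line, tab-separated columns,
# XML-style tags for text-level metadata (<text ...>) and sentences (<s>).
SAMPLE = """<text genre="economics">
<s>
The\tDet\tsg
boy\tNoun\tsg
buys\tVerb\tsg
pencils\tNoun\tpl
</s>
</text>"""


def parse_vertical(text: str) -> List[Dict[str, str]]:
    """Return one dict per token, carrying its columns, its sentence number
    and the metadata of the enclosing <text> element (prefixed with 'text_')."""
    tokens: List[Dict[str, str]] = []
    meta: Dict[str, str] = {}
    sentence = 0
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("<text"):
            meta = dict(ATTR_RE.findall(line))  # entering a text: store its metadata
        elif line.startswith("</text"):
            meta = {}  # leaving the text: forget its metadata
        elif line.startswith("<s"):
            sentence += 1  # a new sentence starts
        elif line.startswith("<"):
            continue  # other structural tags (e.g. </s>) are ignored here
        else:
            token = dict(zip(COLUMNS, line.split("\t")))
            token["sent"] = str(sentence)
            token.update({f"text_{key}": value for key, value in meta.items()})
            tokens.append(token)
    return tokens


if __name__ == "__main__":
    for tok in parse_vertical(SAMPLE):
        print(tok)
```

Attaching text-level metadata to every token is what later makes filters such as "only Economics texts" straightforward.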

8 IAC CORPORA: INSERTING A CORPUS INTO IAC
Upload the corpus (as a .txt file) to the server. Then design the search interface with a graphical tool (included in IAC), according to the corpus type and the linguistic annotation it contains.
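The idea of one interface design that adapts to each corpus's type and annotation can be sketched as deriving search-form fields from a short corpus description. Everything here (`CorpusDescription`, `build_search_form`, the widget names, the `target_language_query` field) is hypothetical and only illustrates the principle; it is not IAC's graphical tool.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class CorpusDescription:
    """Hypothetical description of an uploaded corpus."""
    name: str
    aligned: bool                      # monolingual (False) or aligned (True)
    token_attributes: List[str]        # annotation columns, e.g. ["word", "pos", "num"]
    metadata: Dict[str, List[str]] = field(default_factory=dict)  # field -> allowed values


def build_search_form(corpus: CorpusDescription) -> List[Dict[str, object]]:
    """Derive one form field per annotation column and metadata field."""
    form: List[Dict[str, object]] = []
    for attr in corpus.token_attributes:
        form.append({"field": attr, "widget": "text"})  # free-text query per attribute
    for key, values in corpus.metadata.items():
        form.append({"field": key, "widget": "select", "options": sorted(values)})
    if corpus.aligned:
        # Aligned corpora get an extra field to constrain the translation side.
        form.append({"field": "target_language_query", "widget": "text"})
    return form


if __name__ == "__main__":
    demo = CorpusDescription("demo", True, ["word", "pos"], {"genre": ["law", "economics"]})
    for entry in build_search_form(demo):
        print(entry)
```

Adding a new attribute type then means editing the description rather than redesigning the interface.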


10 IAC CONCLUSIONS
IAC is a flexible and powerful tool that goes beyond the limitations of current corpus interfaces. It is user-friendly, gives access to multiple corpora from the same platform, requires no external developer or programming background, and lets users create interfaces quickly and modify them easily.

11 Thank you!
judith.domingo@barcelonamedia.org
Temporary website: http://webconsultaiactemporal.barcelonamedia.org

12 SOME EXAMPLES…

13 ADVANCED SEARCH
To demonstrate the advanced search, we use an annotated corpus with translations. Let's look at examples of sequences of one or more words containing syntax errors.
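A search for "sequences of one or more words" with a given annotation amounts to finding maximal runs of consecutive tokens that share that annotation. The sketch below assumes a hypothetical token-level `error` attribute with the value `"syntax"`; the real annotation scheme of the corpus is not described in the slides. Note that two adjacent but distinct errors would be merged into a single span.

```python
from typing import Dict, List, Optional, Tuple


def error_spans(tokens: List[Dict[str, str]], error_type: str = "syntax") -> List[Tuple[int, int]]:
    """Return (start, end) pairs, end exclusive, for each maximal run of
    consecutive tokens whose hypothetical 'error' attribute equals error_type."""
    spans: List[Tuple[int, int]] = []
    start: Optional[int] = None
    for i, tok in enumerate(tokens):
        if tok.get("error") == error_type:
            if start is None:
                start = i  # a new run begins
        elif start is not None:
            spans.append((start, i))  # the run ended at the previous token
            start = None
    if start is not None:
        spans.append((start, len(tokens)))  # a run reaching the end of the input
    return spans


if __name__ == "__main__":
    words = ["she", "have", "went", "home", "and", "eat"]
    errors = ["", "syntax", "syntax", "", "", "syntax"]
    toks = [{"word": w, "error": e} for w, e in zip(words, errors)]
    for a, b in error_spans(toks):
        print(" ".join(t["word"] for t in toks[a:b]))
```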

14 ADVANCED SEARCH


16 ALIGNED CORPORA WITH METADATA
As an example of an aligned corpus, we use a Spanish > English corpus. The Spanish verb poder can be rendered in English as can, could, may or might. Our goal is to get examples of poder (verb) translated as may or might in Economics texts.
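The query combines three constraints: a metadata filter (domain), a source-side constraint (lemma poder tagged as a verb) and a target-side constraint (may or might). A minimal sketch over sentence-aligned pairs follows; the `AlignedPair` structure, the `"VERB"` tag and the `"economics"` label are hypothetical. It only checks co-occurrence within an aligned sentence pair, not word-level alignment, so it can over-report matches.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class AlignedPair:
    """Hypothetical aligned unit: Spanish source as (lemma, pos) pairs,
    English target as lower-cased word forms, plus text-level metadata."""
    source: List[Tuple[str, str]]
    target: List[str]
    domain: str


def poder_as_may_might(pairs: List[AlignedPair], domain: str = "economics") -> List[AlignedPair]:
    """Keep pairs from `domain` whose source contains the verb 'poder'
    and whose target contains 'may' or 'might'."""
    wanted = {"may", "might"}
    return [
        p for p in pairs
        if p.domain == domain
        and ("poder", "VERB") in p.source
        and not wanted.isdisjoint(p.target)
    ]


if __name__ == "__main__":
    pairs = [
        AlignedPair([("el", "DET"), ("precio", "NOUN"), ("poder", "VERB"), ("subir", "VERB")],
                    ["the", "price", "may", "rise"], "economics"),
        AlignedPair([("poder", "VERB"), ("venir", "VERB")], ["you", "can", "come"], "economics"),
        AlignedPair([("poder", "VERB"), ("llover", "VERB")], ["it", "might", "rain"], "weather"),
    ]
    for p in poder_as_may_might(pairs):
        print(" ".join(p.target))
```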

17 ALIGNED CORPORA WITH METADATA


19 STATISTICS
Statistics are useful for obtaining quantitative results about sequences. In this case, our goal is to obtain quantitative results for the prepositions that follow the verb pensar (to think) in Spanish.
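Counting the prepositions that follow pensar is a frequency distribution over the token immediately after each occurrence of the lemma. The sketch below assumes lemmatized, POS-tagged tokens as (word, lemma, pos) triples, with `"VERB"` and `"ADP"` (preposition) as hypothetical tag names; the sample sentences are invented for illustration.

```python
from collections import Counter
from typing import List, Tuple

Token = Tuple[str, str, str]  # (word, lemma, pos)

# Invented sample: "pienso en ti . piensa en eso . qué piensas de ella"
SAMPLE: List[Token] = [
    ("pienso", "pensar", "VERB"), ("en", "en", "ADP"), ("ti", "tú", "PRON"), (".", ".", "PUNCT"),
    ("piensa", "pensar", "VERB"), ("en", "en", "ADP"), ("eso", "eso", "PRON"), (".", ".", "PUNCT"),
    ("qué", "qué", "PRON"), ("piensas", "pensar", "VERB"), ("de", "de", "ADP"), ("ella", "ella", "PRON"),
]


def prepositions_after(tokens: List[Token], lemma: str = "pensar") -> Counter:
    """Count prepositions (assumed tag 'ADP') immediately following
    any verbal form of `lemma`."""
    counts: Counter = Counter()
    for (_, cur_lemma, cur_pos), (next_word, _, next_pos) in zip(tokens, tokens[1:]):
        if cur_lemma == lemma and cur_pos == "VERB" and next_pos == "ADP":
            counts[next_word.lower()] += 1
    return counts


if __name__ == "__main__":
    print(prepositions_after(SAMPLE).most_common())
```

Only the immediately following token is considered; a real query would likely also allow intervening adverbs or clitics.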

20 STATISTICS


