Adam Mickiewicz University in Poznań, Faculty of English (wa.amu.edu.pl)
Extracting neologisms from a corpus using NeoDet
Marta Grochocka

2 The development of a lexical item (Bauer 1983)
1. Nonce formation
   Neologism (Fischer 1998): a certain frequency over a certain period of time; distribution in different contexts and domains
2. Institutionalization
3. Lexicalization
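Fischer's (1998) criteria (frequency, spread over time, spread over contexts and domains) can be made operational for corpus work. The Python sketch below is only an illustration: the `Attestation` record, the function name and every threshold value are hypothetical and are not taken from the presentation or from NeoDet.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Attestation:
    """One corpus occurrence of a word form (hypothetical record type)."""
    year: int
    domain: str  # e.g. "politics", "entertainment"


def meets_fischer_criteria(attestations, min_count=10, min_years=2, min_domains=2):
    """Check frequency, spread over time and spread over domains.

    The thresholds are illustrative assumptions, not values from the study.
    """
    years = {a.year for a in attestations}
    domains = {a.domain for a in attestations}
    return (
        len(attestations) >= min_count      # a certain frequency
        and len(years) >= min_years         # over a certain period of time
        and len(domains) >= min_domains     # in different contexts and domains
    )


# Toy usage: 12 attestations spread over two years and two domains.
sample = [Attestation(2009, "politics")] * 6 + [Attestation(2010, "entertainment")] * 6
print(meets_fischer_criteria(sample))  # True
```

A word that passes such checks has moved beyond a one-off nonce formation; institutionalization and lexicalization would still need separate evidence.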

3 Types of neologisms
formal: a new word, including acronyms and affixes, e.g. PC, e-, -gate (Metcalf 2002)
syntactic: a new expression or grammatical construction
semantic: a new meaning of an already existing word
borrowing

4 Methodology
Aim of the study: to examine productive morphological processes in English by studying formal neologisms
PART 1: Formal classification
PART 2: Semantic classification

5 Neologism detector tool (NeoDet)
Functions:
1. compilation of the study corpus
2. neologism extraction based on the exclusion principle (a word form becomes a candidate only if none of the exclusion sources lists it)
3. neologism management

6 Neologism extraction process
Study corpus → Exclusion sources → Neologism candidates → Manual verification → Neologism management
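The extraction process can be sketched in a few lines of Python. This is a minimal sketch assuming the exclusion sources are available as in-memory sets of word forms and that manual verification can be represented by a callback; the names `extract_candidates`, `run_pipeline` and `verify` are hypothetical and do not describe NeoDet's actual implementation, which also queried online sources.

```python
import re
from typing import Callable, Iterable

# Word forms: letters, optionally joined by internal hyphens or apostrophes.
TOKEN = re.compile(r"[A-Za-z]+(?:[-'][A-Za-z]+)*")


def extract_candidates(texts: Iterable[str], exclusion_sources: Iterable[set[str]]) -> set[str]:
    """Exclusion principle: keep only word forms that no exclusion source lists."""
    known = {word.lower() for source in exclusion_sources for word in source}
    return {
        token.lower()
        for text in texts
        for token in TOKEN.findall(text)
        if token.lower() not in known
    }


def run_pipeline(
    texts: Iterable[str],
    exclusion_sources: Iterable[set[str]],
    verify: Callable[[str], bool],
) -> list[str]:
    """Study corpus -> exclusion -> candidates -> manual verification."""
    candidates = extract_candidates(texts, exclusion_sources)
    # `verify` stands in for the human judgement made on each candidate.
    return sorted(word for word in candidates if verify(word))


corpus = ["The Guardian says the Twitterati loved the retrotastic look."]
dictionary = {"the", "says", "loved", "look"}   # toy general dictionary
proper_names = {"Guardian"}                      # toy word list
print(run_pipeline(corpus, [dictionary, proper_names], verify=lambda w: True))
# ['retrotastic', 'twitterati']
```

Merging all exclusion sources into one lowercase set keeps each lookup constant-time; the cost is that case distinctions (e.g. a proper name versus a common noun) are lost.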

7 Study corpus: size and content
14.3 million words
newspaper articles and blogs published between 1st Jan and 26th Oct
daily broadsheets: The Daily Telegraph, The Times, The Guardian
tabloids: The Sun, The Daily Mail
almost 9,000 neologism candidates analyzed (out of ca. 73,000)
121 neologisms extracted (excluding borrowings)
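The figures on this slide imply a low yield from the candidate list. The short calculation below simply divides the reported numbers; because 9,000 and 73,000 are approximate ("almost", "ca."), the results are approximate too.

```python
# Figures as reported on the slide (approximate values where marked).
corpus_words = 14_300_000
candidates_total = 73_000      # "ca. 73,000"
candidates_analyzed = 9_000    # "almost 9,000"
neologisms = 121

share_analyzed = candidates_analyzed / candidates_total
yield_rate = neologisms / candidates_analyzed
candidates_per_million = candidates_total / (corpus_words / 1_000_000)

print(f"share of candidates analyzed: {share_analyzed:.1%}")          # 12.3%
print(f"neologisms per analyzed candidate: {yield_rate:.2%}")         # 1.34%
print(f"candidates per million words: {candidates_per_million:.0f}")  # 5105
```

Since "almost 9,000" is an upper bound on the number analyzed, the true yield per analyzed candidate is slightly above 1.34%, i.e. roughly one neologism for every 74 candidates checked by hand.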

8 Exclusion sources
Corpus: The British National Corpus
General dictionaries:
  Oxford Advanced Learner's Dictionary, 7th Edition, OALD7 (2005)
  Merriam-Webster's Collegiate Dictionary, 11th Edition, MW11 (2006)
  Macmillan English Dictionary, 2nd Edition, MEDAL2 (2007)
  Cambridge Advanced Learner's Dictionary, 3rd Edition, CALD3 (2008)
  Chambers 21st Century Dictionary, CH11 (2008)
Google:
  COBUILD
  Longman Dictionary of Contemporary English, 5th Edition, LDOCE5 (2009)
  Dictionary.com
Slang dictionaries:
  The Oxford Dictionary of New Words (1991)
  The Probert Encyclopaedia of Slang (2004)
  The Concise New Partridge Dictionary of Slang and Unconventional English (2007)
  The Dictionary of Contemporary Slang (2007)
Word lists: proper names, geographical names

9 Analysis of neologism candidates

10 Search engine

11 Neologism management 1

12 Neologism management 2

13 Neologism management 3

14 Formal classification of neologisms

15 Blends
Twitterati (Twitter + glitterati)
welectricity (wellingtons + electricity)
retrotastic (retro + fantastic)
girlicious (girl + delicious)
Frankenfish (Frankenstein + fish)
Obamarita (Obama + margarita)
Holohoax (Holocaust + hoax)
zeroflation (zero + inflation)
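A blend combines the beginning of one word with the end of another, often overlapping where the two share letters. The Python sketch below (a hypothetical helper, not part of NeoDet) lists every way a blend can be split into a non-empty prefix of the first source word and a non-empty suffix of the second; more than one split means the source words overlap inside the blend.

```python
def blend_splits(blend: str, source_a: str, source_b: str) -> list[tuple[str, str]]:
    """Return every split of `blend` into a non-empty prefix of `source_a`
    followed by a non-empty suffix of `source_b` (case-insensitive)."""
    w, a, b = blend.lower(), source_a.lower(), source_b.lower()
    splits = []
    for i in range(1, len(w)):
        left, right = w[:i], w[i:]
        if a.startswith(left) and b.endswith(right):
            splits.append((left, right))
    return splits


print(blend_splits("zeroflation", "zero", "inflation"))          # [('zero', 'flation')]
print(blend_splits("Obamarita", "Obama", "margarita"))            # [('obam', 'arita'), ('obama', 'rita')]
print(len(blend_splits("Twitterati", "Twitter", "glitterati")))   # 6
```

"Twitterati" admits six splits because "Twitter" and "glitterati" share the sequence "itter"; this overlap is typical of blends coined to be memorable.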

16 Semantic classification of neologisms

17 Semantic classification: examples
IT and communications technology: beatblogger, cyber-locker, datablog, Facebooker, gamification, iPad, to liveblog, to retweet
Business and finance: infocapitalism, micro-employment, zeroflation
Politics and current affairs: Af-Pak, Muslimist, Obamanomics
Entertainment: celebdom, fabby, lip-syncher, pet-set, retrotastic
Food and dieting: frankenfish, orthorexic

18 Problems
semantic and syntactic neologisms cannot be detected
alternative spellings, e.g. micro-blog, G & T
items provided as examples in the exclusion sources are not analyzed by NeoDet
failure of the online exclusion sources to respond to queries made by NeoDet
overrepresentation of the Entertainment and News sections in the study corpus
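One possible mitigation for the alternative-spelling problem is to compare candidates and exclusion entries after normalizing case, hyphens and spaces. This is an assumption for illustration, not something the presentation reports NeoDet doing; the helper names `spelling_key` and `drop_spelling_variants` are hypothetical.

```python
import re


def spelling_key(form: str) -> str:
    """Collapse case, hyphens and whitespace: 'micro-blog', 'micro blog', 'microblog' -> 'microblog'."""
    return re.sub(r"[-\s]+", "", form.lower())


def drop_spelling_variants(candidates, known_forms):
    """Remove candidates that differ from a known form only in case, hyphens or spacing."""
    known_keys = {spelling_key(form) for form in known_forms}
    return [c for c in candidates if spelling_key(c) not in known_keys]


print(drop_spelling_variants(["micro-blog", "G&T", "gamification"], ["microblog", "G & T"]))
# ['gamification']
```

Collapsing hyphens can also merge genuinely different forms (e.g. "re-cover" and "recover"), so such matches might be better flagged for manual verification than discarded outright.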

19 Conclusions
formal neologisms as indicators of productive word-formation processes
confirmation that affixation and compounding are the most popular methods of extending the lexicon
blends as an important source of neologisms, coined to be witty, amusing and memorable
the largest number of neologisms in the area of IT and communications technology

20 Thank you!