A Linguist’s Search Engine Philip Resnik University of Maryland JHU Conference on Spatial Language and Spatial Cognition September 18, 2003.



Acknowledgments
Collaborators
  –Christiane Fellbaum (Princeton)
  –Mari Broman Olsen (Microsoft)
Implementors
  –Aaron Elkiss
  –Rafi Khan, G. Craig Murray, Saurabh Khandelwal
Inspiration
  –Steve Abney, Chris Manning, Mitch Marcus
This work is supported by NSF ITR grant IIS

Facing Variability in Linguistic Data
Sapir: “Everyone knows that language is variable”
Chomsky: “[C]rucial evidence comes from marginal constructions; for the tests of analyses often come from pushing the syntax to its limits, seeing how constructions fare at the margins of acceptability.”
Student in linguistics talk (whispered to friend): “Does that sound ok to you?”

Traditional Linguistics with Naturally Occurring Data
Long tradition outside the generative mainstream (e.g., Oostdijk and de Haan, 1994)
Recent, labor-intensive efforts (e.g., Macfarland 1995)
Frequent back-of-napkin jottings

Grammars and Variability
Sapir (1921): “All grammars leak.”
Abney (1996): “[A]ttempting to eliminate unwanted readings... is like squeezing a balloon: every dispreference that is turned into an absolute constraint to eliminate undesired structures has the unfortunate side effect of eliminating the desired structure for some other sentence.”

Theoretical versus Empirical
Einstein (1940): Science is the attempt to make the chaotic diversity of our sense-experience correspond to a logically uniform system of thought [in which] experience must be correlated with the theoretical structure… What we call physics comprises that group of natural sciences which base their concepts on measurements… [emphasis added]

Where might data come from?
Text collection efforts (British and American national corpora, LDC Gigaword corpora, CHILDES, Switchboard, etc.)
The World Wide Web
Shallow annotated corpora, e.g. part-of-speech in the Brown Corpus of American English
Deeper annotations, e.g. Penn Treebank
Even deeper: PropBank, FrameNet

What about tools?
Concordancing, KWIC (e.g., Wordsmith)
Treebanks and tgrep
Gsearch (Corley et al.)
Do-it-yourself parsing and search
Manning (2003): “…it remains fair to say that these tools have not yet made the transition to the Ordinary Working Linguist without considerable computer skills.”
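
For readers unfamiliar with concordancing, a keyword-in-context (KWIC) display can be sketched in a few lines of Python. This is only an illustration of the general idea, not how Wordsmith or any tool named on the slide works; the function name `kwic`, the `width` parameter, and the sample text are assumptions made for the example.

```python
import re

def kwic(text, keyword, width=30):
    """Return one keyword-in-context line per whole-word match of keyword."""
    lines = []
    pattern = re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)
    for match in pattern.finditer(text):
        # Up to `width` characters of context on each side of the hit.
        left = text[max(0, match.start() - width):match.start()]
        right = text[match.end():match.end() + width]
        # Right-align the left context so the keywords line up in a column.
        lines.append(f"{left:>{width}} [{match.group(0)}] {right:<{width}}")
    return lines

# Hypothetical sample text, built from the "consider" examples in this talk.
sample = ("We consider Kim to be an acceptable candidate. "
          "Some speakers also consider Kim as quite acceptable.")

for line in kwic(sample, "consider"):
    print(line)
```

A display like this finds strings, not structures, which is one reason the slide lists treebank search tools alongside concordancers.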

A Web Search Tool for the Ordinary Working Linguist
Must have linguist-friendly “look and feel”
Must minimize learning/ramp-up time
Must permit real-time interaction
Must permit large-scale searches
Must allow search on linguistic criteria
Must be reliable
Must evolve with real use

If you build it, they will come…

Pollard and Sag (1994); discussion in Manning (2003)
  –(a) We consider Kim to be an acceptable candidate
  –(b) We consider Kim an acceptable candidate
  –(c) We consider Kim quite acceptable
  –(d) We consider Kim among the most acceptable candidates
  –(e) *We consider Kim as an acceptable candidate
  –(f) *We consider Kim as quite acceptable
  –(g) *We consider Kim as among the most acceptable candidates
  –(h) *We consider Kim as being among the most acceptable candidates
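
Checking whether the starred "consider ... as" patterns actually occur in text calls for a search over structure rather than strings. The sketch below is a minimal illustration of that idea in Python, assuming sentences have already been parsed into Penn-Treebank-style bracketings. The two bracketed trees are hand-written for the example, and the helper names (`parse_tree`, `subtrees`, `leaves`, `has_consider_as`) are hypothetical; none of this reflects the Linguist's Search Engine's own query format.

```python
def parse_tree(s):
    """Parse a bracketed string into nested (label, children) tuples; words are str."""
    tokens = s.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def read():
        nonlocal pos
        if tokens[pos] == "(":
            pos += 1                      # consume "("
            label = tokens[pos]
            pos += 1
            children = []
            while tokens[pos] != ")":
                children.append(read())
            pos += 1                      # consume ")"
            return (label, children)
        word = tokens[pos]
        pos += 1
        return word

    return read()

def subtrees(tree):
    """Yield every labeled node in the tree, top-down."""
    if isinstance(tree, tuple):
        yield tree
        for child in tree[1]:
            yield from subtrees(child)

def leaves(tree):
    """Return the words under a node, left to right."""
    if isinstance(tree, str):
        return [tree]
    return [w for child in tree[1] for w in leaves(child)]

def has_consider_as(tree):
    """True if some VP has a 'consider' verb child and a PP child beginning with 'as'."""
    for label, children in subtrees(tree):
        if label != "VP":
            continue
        nodes = [c for c in children if isinstance(c, tuple)]
        verb = any(c[0].startswith("VB") and leaves(c)[0].lower().startswith("consider")
                   for c in nodes)
        as_pp = any(c[0] == "PP" and leaves(c)[0].lower() == "as" for c in nodes)
        if verb and as_pp:
            return True
    return False

# Hand-written bracketings for examples (e) and (b); assumptions, not parser output.
tree_e = parse_tree("(S (NP (PRP We)) (VP (VBP consider) (NP (NNP Kim)) "
                    "(PP (IN as) (NP (DT an) (JJ acceptable) (NN candidate)))))")
tree_b = parse_tree("(S (NP (PRP We)) (VP (VBP consider) (S (NP (NNP Kim)) "
                    "(NP (DT an) (JJ acceptable) (NN candidate)))))")

print(has_consider_as(tree_e), has_consider_as(tree_b))
```

Run over a large parsed collection, a query of this kind would return attested candidates for (e) through (h) that a linguist could then inspect by hand.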

Constructions
The Xer the NP1 the Yer the NP2
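
A rough surface approximation of a search for this construction can be written as a regular expression. The Python sketch below is an assumption-laden illustration: the pattern treats any word ending in "-er" as a comparative, so it overgenerates (for example on "the other ... the water"), and a tool with part-of-speech tags or parses would do better. The names `COMPARATIVE`, `PATTERN`, and `find_correlatives` and the sample sentences are made up for the example.

```python
import re

# Crude stand-in for "comparative form": a few irregulars plus anything ending in -er.
COMPARATIVE = r"(?:more|less|fewer|better|worse|\w+er)"

# "the Xer ... the Yer ..." within a single sentence, optionally split by a comma or semicolon.
PATTERN = re.compile(
    rf"\bthe\s+{COMPARATIVE}\b[^.!?]*?[,;]?\s+\bthe\s+{COMPARATIVE}\b",
    re.IGNORECASE,
)

def find_correlatives(sentences):
    """Return the sentences that contain a candidate 'the Xer ... the Yer' pattern."""
    return [s for s in sentences if PATTERN.search(s)]

# Hypothetical test sentences.
sentences = [
    "The bigger the house, the higher the taxes.",
    "The house is near the park.",
    "The more you read, the less you forget.",
]

for hit in find_correlatives(sentences):
    print(hit)
```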

Overnight collection: 9pm-6am

Objections
Chomsky (1979): “You can also collect butterflies and make many observations. If you like butterflies, that’s fine; but such work must not be confounded with research, which is concerned to discover explanatory principles of some depth and fails if it does not do so.”

Manning (2003): “To go out on a limb for a moment, let me state my view: generative grammar has produced many explanatory hypotheses of considerable depth, but is increasingly failing because its hypotheses are disconnected from verifiable linguistic data... I would join Weinreich, Labov, and Herzog (1968, 99) in hoping that ‘a model of language which accommodates the facts of variable usage... leads to more adequate descriptions of linguistic competence.’”

Abney (1996): “The focus in computational linguistics has admittedly been on technology. But the same techniques promise progress on issues concerning the nature of language that have remained mysterious for so long. The time is ripe to apply them.”

Jackendoff: “[T]he reaction of some linguists to foundational discussion of the sort I engage in here is: ‘Do I and my students really have to think about this? I just want to be able to do good syntax (or phonology or whatever).’... Still, when you’re driving you don’t just look ten feet in front of the car.... [if] integration seems to call for alteration of the larger context, one should not shrink from the challenge.”

Thank you!