Language Processing: Humans & Computers. Lauren Kafka, Marina Hamoy. August 3, 2006. Psycholinguistics & Computational Linguistics
Psycholinguistics: The area of linguistics that is concerned with linguistic performance–how we use our linguistic competence–in speech (or sign) production and comprehension.
The Speech Chain: Brain-to-Brain Linking A spoken utterance starts as a message in the speaker’s brain/mind. The message is put into linguistic form and interpreted as articulation commands. It emerges as an acoustic signal. The signal is processed by the listener’s ear and sent to the brain/mind, where it is interpreted.
Comprehension One goal of psycholinguistics is to describe the processes people normally use in speaking and understanding language. Breakdowns in performance such as “tip-of-the-tongue” phenomena, speech errors, and failure to comprehend tricky sentences tell us a lot about how language is processed.
Can you think of any of your own? Examples: times when a word was on the tip of your tongue but you couldn’t think of it; speech errors (Hung go); failure to comprehend tricky sentences. http://www.zippyvideos.com/5589295543497276/time_out-1/original
Speech Sounds: Understanding Begins with Hearing Sound is produced whenever there is a disturbance in the position of air molecules. Acoustic phonetics is concerned only with speech sounds, all of which can be heard by the normal human ear.
Frequency, Pitch & Volume The speed of the variations of air pressure determines the fundamental frequency of sounds. This is perceived by the hearer as pitch. The magnitude, or intensity, of the variations determines the loudness of the sound.
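The relationship between the speed of the pressure variations (frequency, heard as pitch) and their magnitude (intensity, heard as loudness) can be illustrated with a synthetic tone. The following is a minimal sketch, not a speech-analysis tool: the function names, the 8000 Hz sample rate, and the use of zero crossings and root-mean-square (RMS) amplitude as stand-ins for fundamental frequency and intensity are all assumptions made for illustration.

```python
import math

def sine_wave(freq_hz, amplitude, sample_rate=8000, duration_s=1.0):
    """Return samples of a pure tone (a stand-in for an air-pressure signal)."""
    n = int(sample_rate * duration_s)
    return [amplitude * math.sin(2 * math.pi * freq_hz * i / sample_rate)
            for i in range(n)]

def estimate_frequency(samples, sample_rate=8000):
    """Estimate cycles per second by counting upward zero crossings.

    Each cycle crosses zero upward once, so the count approximates the
    number of cycles; it can be off by one at the edges of the signal.
    """
    crossings = sum(1 for a, b in zip(samples, samples[1:]) if a < 0 <= b)
    return crossings * sample_rate / len(samples)

def rms(samples):
    """Root-mean-square amplitude, a simple measure of intensity."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))

low_quiet = sine_wave(220, 0.2)
high_loud = sine_wave(440, 0.8)
print(estimate_frequency(low_quiet), rms(low_quiet))
print(estimate_frequency(high_loud), rms(high_loud))
```

The second tone has the faster variation (higher estimated frequency) and the larger variation (higher RMS), which is the distinction the slide draws between pitch and loudness.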
Speech Perception The speech signal can be broken into strings of phonemes, syllables, morphemes, words, and phrases.
Context & Lexical Access Whether we hear “night rate” or “nitrate” depends on context. The meaning of words depends on lexical access, or word recognition. Example: A sniggle blick is procking a slar. If you don’t recognize the words, you conclude that the sentence is nonsense.
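As a rough computational analogy to lexical access, a program can look each word up in a stored word list and report the ones it cannot find. This sketch is only an illustration: the tiny lexicon and the function name are hypothetical, and a set lookup is not a model of how human word recognition works.

```python
import re

def unrecognized_words(sentence, lexicon):
    """Return the words of the sentence that are not in the lexicon."""
    words = re.findall(r"[a-z']+", sentence.lower())
    return [w for w in words if w not in lexicon]

# Hypothetical toy lexicon; a real one would hold many thousands of entries.
TOY_LEXICON = {"a", "is", "the", "dog", "chasing", "cat"}

print(unrecognized_words("A sniggle blick is procking a slar.", TOY_LEXICON))
# prints ['sniggle', 'blick', 'procking', 'slar']
```

Only the function words are found, so the content of the sentence cannot be recovered even though its syntactic frame is familiar.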
Lexical Semantics Processing speech to get at the meaning of what is said requires syntactic analysis as well as knowledge of lexical semantics. Stress and intonation provide some clues to syntactic structure. Example: He lives in the white house. He lives in the White House. Loudness, pitch, and duration of syllables provide information about meaning.
Timing & Rhythm I vant to sock your blut. Ivan tsuckyour blut. Ted Koppel gave an address. Ted Koppel gave Ann a dress. Can you think of two sentences that include the same letters or sounds, but differ in timing, rhythm, and meaning?
Language Analysis & Computer Technology
Machine translation (MT)
– Between natural languages
– Analysis of authentic materials
Communication between people & computers
– Artificial intelligence (AI)
– World Wide Web (WWW)
Research in linguistic theories
Frequency Analysis
Corpus: ~1M words of spoken or written language data gathered for linguistic research or analysis
1) Frequency analysis
– SAE (spoken American English): the ten most frequent words make up 30% - and, the, to, that, of, a, I, you, it, & know
– WAE (written American English): the ten most frequent words make up 25% - the (7%), of, and, to, a, that, in, is, was, & he
– English prepositions: more frequent in WAE (except TO)
– Profane/taboo words: more frequent in SAE
– http://textalyser.net/
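A frequency analysis of this kind can be sketched in a few lines. The example below is a minimal illustration under stated assumptions: the tokenizer (lowercase letters and apostrophes only), the function names, and the sample sentence are hypothetical, and a real corpus study would use far more data and more careful tokenization.

```python
import re
from collections import Counter

def word_frequencies(text):
    """Count how often each word form occurs in the text."""
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(words)

def top_share(counts, n=10):
    """Return the n most frequent words and their share of all tokens."""
    total = sum(counts.values())
    top = counts.most_common(n)
    return top, sum(c for _, c in top) / total

counts = word_frequencies("The cat and the dog and the bird")
top, share = top_share(counts, n=2)
print(top, share)
# prints [('the', 3), ('and', 2)] 0.625
```

Run over a large corpus with `n=10`, the share returned by `top_share` is the kind of figure quoted on the slide for the ten most frequent words.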
Collocation Analysis Collocation: two or more words that customarily occur together. http://esl.about.com/library/vocabulary/blcollocation_1.htm
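One simple way to look for candidate collocations is to count how often each pair of adjacent words occurs. This is a sketch under assumptions, not a full collocation measure: the function name and sample text are hypothetical, sentence boundaries are ignored, and raw pair counts do not correct for how common each word is on its own.

```python
import re
from collections import Counter

def bigram_counts(text):
    """Count adjacent word pairs (bigrams) in the text."""
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(zip(words, words[1:]))

sample = "She had to make a decision. Make a decision now, not a decision later."
pairs = bigram_counts(sample)
print(pairs.most_common(2))
# prints [(('a', 'decision'), 3), (('make', 'a'), 2)]
```

Pairs that recur far more often than chance would predict are the ones worth inspecting as collocations.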
Data Mining Information extraction using keyword queries. Typical applications: customer profiling, fraud detection, credit risk analysis, promotion evaluation. Examples: “Norway to Wal-Mart: We don't want your shares” (pension-fund investing with a social consciousness); intelligence obtained by applying data mining to a database of French theses on the subject of Brazil.
Machine Translation “There's a message coming through, captain - TRANSLATION SOFTWARE, the science-fiction dream of a machine that understands any language, has taken a step closer to reality.” http://www.gutenberg.org/etext/6737 (free download of literature)
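A keyword query can be pictured as a filter that returns the records containing every requested term. The sketch below assumes a hypothetical in-memory collection of short texts and a hypothetical function name; real data-mining systems work on large databases and use indexing and statistical models rather than a linear scan.

```python
import re

def keyword_query(documents, keywords):
    """Return the ids of documents that contain all of the keywords."""
    wanted = {k.lower() for k in keywords}
    hits = []
    for doc_id, text in documents.items():
        tokens = set(re.findall(r"[a-z0-9']+", text.lower()))
        if wanted <= tokens:  # every keyword appears in this document
            hits.append(doc_id)
    return sorted(hits)

# Hypothetical records, invented for illustration only.
records = {
    "t1": "Thesis on the economy of Brazil",
    "t2": "Thesis on French literature",
    "t3": "Report on coffee exports from Brazil",
}
print(keyword_query(records, ["thesis", "Brazil"]))
# prints ['t1']
```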
Computational Phonetics & Phonology
Computers are programmed to produce synthetic speech by following a ‘recipe’ of electronic blending
Speech Recognition
Speech Synthesis
– TTS (text-to-speech) difficulties
– > 300 heteronyms: read [reed] & [red]
– Inconsistent spelling: tough, bough, cough, dough
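The two text-to-speech difficulties can be made concrete with a toy lookup. This is a sketch only: the table contents, the "present"/"past" hints, and the function name are hypothetical, and a real TTS system would need a full pronunciation lexicon plus syntactic analysis to choose among heteronym readings.

```python
# Heteronyms: one spelling, several pronunciations. The caller must supply
# a hint (here, tense) because spelling alone does not decide the reading.
HETERONYMS = {
    "read": {"present": "reed", "past": "red"},
}

# Inconsistent spelling: each -ough word rhymes with a different word.
OUGH_RHYMES = {"tough": "stuff", "bough": "cow", "cough": "off", "dough": "go"}

def pronounce(word, hint):
    """Pick the informal respelling of a heteronym for the given hint."""
    options = HETERONYMS[word.lower()]
    if hint not in options:
        raise ValueError(f"no pronunciation of {word!r} for hint {hint!r}")
    return options[hint]

print(pronounce("read", "present"))  # prints reed
print(pronounce("read", "past"))     # prints red
print(len(set(OUGH_RHYMES.values())))  # prints 4
```

Four words ending in the same four letters yield four different rhymes, so no single letter-to-sound rule for "ough" can be correct.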
Computational Morphology
Computers need to understand the interweaving of rules, exceptions, morphemes & word structure
Computer’s dictionary: morphological forms
– needs continual updating
Form predictability: impossible for compounding
– sky + box = skybox
Component morphemes
– Monomorphemic or not: [reZENT] or [Resent]
– Heteronyms: lead [leed] & [led]
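The interweaving of rules and exceptions can be sketched with English past-tense formation: a dictionary of stored irregular forms is consulted first, and general spelling rules apply only when no exception is listed. The function name and the short exception list are hypothetical, and the rules are deliberately incomplete (for example, consonant doubling as in "stopped" is not handled).

```python
# Stored exceptions: forms that no general rule predicts.
IRREGULAR_PAST = {"go": "went", "sing": "sang", "bring": "brought"}

def past_tense(verb):
    """Form a past tense: exceptions first, then simple spelling rules."""
    verb = verb.lower()
    if verb in IRREGULAR_PAST:
        return IRREGULAR_PAST[verb]
    if verb.endswith("e"):
        return verb + "d"            # bake -> baked
    if verb.endswith("y") and len(verb) > 1 and verb[-2] not in "aeiou":
        return verb[:-1] + "ied"     # try -> tried
    return verb + "ed"               # walk -> walked

for v in ["walk", "bake", "try", "play", "go"]:
    print(v, past_tense(v))
```

Every new irregular form or new word has to be added to the stored table by hand, which is one reason a computer's dictionary needs continual updating.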
Computational Syntax: ELIZA
ELIZA: an early human-machine communication program, written by Joseph Weizenbaum
– uses syntax (typed text) to simulate a psychiatric session
Circuit-Fix-It-Shop (NCSU & DU): programmed repair technician that uses speech
– Capable of understanding & speaking complex utterances
Computer parser
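The general idea behind an ELIZA-style program, matching a syntactic pattern in the user's typed input and echoing part of it back inside a canned reply, can be sketched as follows. This is not Weizenbaum's script: the three rules, the default reply, and the function name are hypothetical, and the sketch omits features such as swapping "my" for "your" in the echoed text.

```python
import re

# Each rule pairs a pattern with a reply template; {0} receives the
# text captured after the matched phrase.
RULES = [
    (re.compile(r"\bi am (.+)", re.IGNORECASE), "Why do you say you are {0}?"),
    (re.compile(r"\bi feel (.+)", re.IGNORECASE), "How long have you felt {0}?"),
    (re.compile(r"\bmy (.+)", re.IGNORECASE), "Tell me more about your {0}."),
]
DEFAULT_REPLY = "Please go on."

def respond(utterance):
    """Return the reply for the first rule that matches, else a default."""
    text = utterance.strip().rstrip(".!?")
    for pattern, template in RULES:
        match = pattern.search(text)
        if match:
            return template.format(match.group(1))
    return DEFAULT_REPLY

print(respond("I am sad."))      # prints Why do you say you are sad?
print(respond("I feel tired!"))  # prints How long have you felt tired?
print(respond("Hello"))          # prints Please go on.
```

The program recognizes surface patterns only; it has no representation of what "sad" or "tired" means, which is why such systems simulate rather than achieve understanding.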