Ch. 8 Language Processing: Humans and Computers

An Introduction to Language (9e, 2009) by Victoria Fromkin, Robert Rodman and Nina Hyams

Human Language Processing
Psycholinguistics focuses on linguistic performance in speech production and comprehension. We usually don’t have problems producing or understanding sentences in our language, and we do both without effort or awareness. Some grammatical sentences are difficult to understand (The horse raced past the barn fell.), and some ungrammatical sentences are easy to understand (*The baby seems sleeping.). This means that language processing is more than grammar alone: there are psychological mechanisms that work with the grammar to allow us to produce and comprehend language.

The Speech Signal
Speech sounds can be described by their acoustic (or physical) properties. The vibrations of our vocal cords cause variations of air pressure, and the sounds we produce can be described in terms of two properties. Fundamental frequency (pitch) is how fast the variations of air pressure occur. Intensity is the magnitude of the variations, which determines the loudness of a sound. The quality of a speech sound is determined by the shape of the vocal tract, because the shape affects how the sound waves travel. Spectrograms, or voiceprints, can be created by computers and are used to analyze speech sounds. Spectrograms indicate the intensity, formants (the strongest harmonics produced by the shape of the vocal tract during production), and pitch of speech sounds, and they demonstrate how different speech sounds have recognizably different acoustic properties.

Speech Perception and Comprehension
The “segmentation problem”: how do listeners carve up the continuous speech signal into meaningful units? Lexical access (or word recognition) is the process of searching your lexicon for phonological strings that correspond to words. Stress and intonation provide clues about structure. The “lack of invariance problem”: how do listeners recognize different speech sounds when they are used in different contexts and spoken by different people? Listeners can normalize their perceptions to account for rate of speech and speaker pitch differences.

Bottom-up and Top-down Models
Understanding language in real time is an impressive feat, and it involves a certain amount of guesswork. Many psycholinguists believe that language perception and comprehension involve both kinds of processing described here. Top-down processing proceeds from semantic and syntactic information to the lexical information from the sensory input. For example, listeners can predict that if a speaker says “the”, then an NP is coming. In experiments, listeners seem to make much use of top-down information. Bottom-up processing moves from the sensory phonetic input to phonemes, then morphemes, and so on up to semantic interpretation. For example, listeners wait to construct an NP until they hear “the” followed by a noun.

Lexical Access and Word Recognition
In order to discover more about lexical access or word recognition, psycholinguists have devised several experiments. Lexical decision experiments involve people deciding whether or not a string of letters or sounds is a word. Frequently used words such as “car” are responded to more quickly than infrequent words such as “fig”. This leads researchers to believe that frequent words are more easily accessed in the lexicon than infrequent words.

Lexical Access and Word Recognition
A lexical decision about the word “doctor” will be faster if it has been preceded by the word “nurse”. This effect is called semantic priming and could be due to semantically related words being stored in the same part of the lexicon. Lexical access experiments show that people retrieve all the meanings of a word. Naming tasks require subjects to read printed words aloud. The finding that people read regularly spelled words faster than irregularly spelled words shows that:
1. People either A) look for the string of letters in their lexicon and, if they find it, pronounce the stored representation for it, or B) if they don’t recognize it, sound it out based on linguistic knowledge.
2. The mind notices irregularity.

Syntactic Processing
Listeners need to build phrase structure representations of sentences as they hear them in order to understand the sentence. They must place each incoming word in a grammatical category and disambiguate messages. Garden path sentences are ones that require listeners to shift their analysis midway through the sentence. For example: After the child visited the doctor prescribed a course of injections. Readers will naturally put “the doctor” into the slot of direct object for the verb “visited”, but as they read on they must change their analysis and recognize “the doctor” as the subject of the main clause instead.

Syntactic Processing
The mind uses two principles in parsing sentences, and these principles lead people astray when they encounter garden path sentences. Minimal attachment: build the simplest structure consistent with the grammar of the language. Late closure: attach incoming material to the phrase that is currently being processed. Memory constraints prevent the easy comprehension of a sentence like: Jack built the house that the malt that the rat that the cat that the dog worried killed ate lay in. Performance constraints like this limit the number of sentences we are likely to create out of the infinite possibilities.

Syntactic Processing
Shadowing tasks involve subjects repeating what they hear as rapidly as possible. Most people can shadow with a delay of 500 to 800 milliseconds, but some people can shadow within one syllable (300 milliseconds behind). Fast shadowers correct speech errors even when told not to, and corrections are more likely to occur when the target word is predictable based on linguistic context. These experiments provide evidence for top-down processing and show how impressively fast listeners do grammatical analysis.

Speech Production: Planning Units
Although speech sounds are linearly ordered, slips of the tongue (including spoonerisms) reveal that speech is conceptualized before it is uttered. Intended: ad hoc. Actual: odd hack. The vowel sounds [æ] in the first word and [a] in the second word were reversed. This type of error reveals that the second word was already planned when the first was uttered. Interestingly, phonological errors primarily occur in content morphemes rather than function morphemes, and function morphemes are not interchanged like content morphemes.

Speech Production: Lexical Selection
Word substitutions are seldom random; we tend to accidentally replace a word with a semantically related word. Sometimes we produce a blend, which is part of one word and part of another:
splinters/blisters → splisters
edited/annotated → editated
Segments tend to stay in the same position in these blend errors.

Application and Misapplication of Rules
Sometimes speakers also make errors with morphological and syntactic rules. Rules may be applied to create possible but nonexistent words such as “ambigual”. Regular rules may accidentally be applied to irregular words, as in “swimmed”. In an error such as saying “a burly bird” instead of “an early bird”, the appropriate allomorph (“a” instead of “an”) is chosen even though the speaker did not intend to produce a word starting with a consonant. This tells us that the rule to choose “a” or “an” must apply after “early” was accidentally switched to “burly”.

Nonlinguistic Influences
Nonlinguistic factors can also contribute to speech production. Intended utterance: I’ve never heard of classes on Good Friday. Actual utterance: I’ve never heard of classes on April 9th. Good Friday was on April 9th that year, so even though “Good Friday” and “April 9th” have nothing in common phonologically or morphologically, the nonlinguistic association was enough to prompt such an error.

Computer Processing of Human Language
Computational linguistics is a subfield of linguistics and computer science that focuses on the interactions of human language and computers. It includes the analysis of written texts and spoken discourse, the translation of text and speech between languages, the use of human language for communication between computers and people, and the modeling and testing of linguistic theories.

Computational Phonetics and Phonology
Computational phonetics and phonology is concerned with processing speech. Speech recognition: analyzing speech and producing a transcription of it. Speech synthesis: creating an electronic simulation of speech to be “said” by the computer.

Speech Recognition
Many interactive phone systems and cell phones have small vocabularies that allow for a limited number of messages. These systems search the speech signal for anything resembling stored words. More advanced systems must be trained to a speaker’s particular voice and use phonotactics and statistical analysis to recognize speech. If a sound that could be [l] or [r] occurs after a word-initial [d], the computer knows that [r] is the correct sound, because English words do not begin with [dl]. If the computer cannot distinguish between [r] and [l] and therefore must decide between “rate” and “late” in “It’s too __”, it can either use knowledge of syntax to choose “late”, or use the statistical knowledge that “It’s too late” occurs much more often than “It’s too rate” and thus choose “late”. But people are much better than computers at filtering out irrelevant sounds and focusing on the voice of a single speaker (the cocktail party effect).
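
The statistical choice between “rate” and “late” can be sketched with word-pair counts. This is only a minimal illustration, not how any particular recognizer works: the tiny corpus and the names TOY_CORPUS, bigram_counts, and pick_candidate are invented for this example, and real systems rely on far larger corpora and richer models.

```python
from collections import Counter

# Toy corpus invented for illustration; not real usage data.
TOY_CORPUS = [
    "it's too late to go",
    "it's too late now",
    "it's never too late",
    "what is the going rate",
]

def bigram_counts(sentences):
    """Count adjacent word pairs across all sentences."""
    counts = Counter()
    for sentence in sentences:
        words = sentence.split()
        counts.update(zip(words, words[1:]))
    return counts

def pick_candidate(previous_word, candidates, counts):
    """Return the candidate that most often follows previous_word."""
    return max(candidates, key=lambda word: counts[(previous_word, word)])

counts = bigram_counts(TOY_CORPUS)
# The recognizer heard "too" and cannot tell [r] from [l] in the next word.
best = pick_candidate("too", ["rate", "late"], counts)
print(best)
```

Because “too late” occurs in the toy corpus and “too rate” never does, the sketch prefers “late”.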

Speech Synthesis
Speech sounds can be reduced to a small number of acoustic components that can be mixed together like a recipe, which is known as formant synthesis:
1. Start with a tone at the same frequency as vibrating vocal cords.
2. Emphasize the harmonics corresponding to the formants required for a particular sound.
3. Add hissing or buzzing for fricatives.
4. And so on.
Another approach is known as concatenative synthesis, which relies on recorded units from humans that are assembled to form the desired utterance.

Text-to-Speech
Text-to-speech programs convert input text into a phonetic representation (for formant synthesizers) or a representation of whatever units are to be combined (for concatenative synthesizers). Two problems with text-to-speech programs are:
1. Homographs that are pronounced differently. Complex structural knowledge is required to know whether to pronounce “read” as [rid] or [rɛd] in the following sentences: I have read the book. Which girl did the teacher have read the book?
2. Inconsistencies in spelling. Computers now have enough memory to store every word that is spelled similarly but pronounced differently, like “tough”, “bough”, “cough”, and “dough”. But new words are always being added to each language, and text-to-speech programs need rules for converting any new word into speech sounds.

Computational Morphology
Computers also need to understand morphology and be able to identify morphemes. One strategy would be to compile all the morphological forms of all a language’s words into a dictionary. But the dictionary would constantly be out of date as new words enter the language. And not all forms are predictable, so it would be impossible to predict a new compound like “podcast” or the plural of “fax”. Stemming is the process of detecting affixes and stripping them from roots to identify morphemes. For example, the computer would detect and strip the be- and the -ed from “befriended”, and then would identify those morphemes in its dictionary.
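
The stemming step can be sketched as follows. This is a minimal sketch under strong assumptions: the affix lists, the root dictionary, and the function name stem are hypothetical and chosen only so that “befriended” works, and a real stemmer would also need spelling rules and a much larger lexicon.

```python
# Hypothetical affix lists and root dictionary, chosen only for this example.
PREFIXES = ["be", "un", "re"]
SUFFIXES = ["ed", "ing", "s"]
ROOTS = {"friend", "walk", "do"}

def stem(word):
    """Try to split word into (prefix, root, suffix) using the toy lists.

    Returns None when no combination of affixes leaves a known root.
    """
    for prefix in [""] + PREFIXES:
        if not word.startswith(prefix):
            continue
        rest = word[len(prefix):]
        for suffix in [""] + SUFFIXES:
            if suffix and not rest.endswith(suffix):
                continue
            # Strip the suffix (if any) and look the remainder up as a root.
            root = rest[:len(rest) - len(suffix)]
            if root in ROOTS:
                return prefix, root, suffix
    return None

print(stem("befriended"))
```

A word whose root is missing from the toy dictionary, such as “fax”, returns None, which mirrors the point that stored forms alone cannot keep up with new words.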

Computational Syntax
Computers must also be able to determine syntactic structure. A parser is a program that uses a grammar to assign phrase structure to a string of words. A top-down parser proceeds by first consulting the grammar rules and then examining the input string to see if the first word could begin an S. A bottom-up parser looks at the input string first and then finds phrasal categories. When parsing a sentence, a parser may make faulty assumptions about syntactic categories or structures. The parser could then backtrack to properly parse the sentence. Or the parser may pursue all possible structures in parallel, in which case only the parses that finish are accepted as valid.
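
A top-down parser with backtracking can be sketched in a few lines. The grammar, the lexicon, and the function names below are assumptions made up for this illustration (they are not taken from the textbook), and the grammar deliberately avoids left-recursive rules, which this simple strategy cannot handle.

```python
# Toy phrase structure rules and lexicon, invented for this example.
GRAMMAR = {
    "S": [["NP", "VP"]],
    "NP": [["Det", "N"], ["Pro"]],
    "VP": [["V", "NP"], ["V"]],
}
LEXICON = {
    "the": "Det", "child": "N", "doctor": "N",
    "visited": "V", "slept": "V", "you": "Pro",
}

def parse(category, words, start):
    """Yield (tree, end) for every way category can cover words[start:end]."""
    if category not in GRAMMAR:
        # Lexical category: it must match the next input word.
        if start < len(words) and LEXICON.get(words[start]) == category:
            yield (category, words[start]), start + 1
        return
    # Phrasal category: consult the rules first (top-down), trying each
    # expansion in turn; a failed expansion is simply abandoned (backtracking).
    for expansion in GRAMMAR[category]:
        for children, end in match_sequence(expansion, words, start):
            yield (category, *children), end

def match_sequence(symbols, words, start):
    """Yield (subtrees, end) for each way the symbols match in order."""
    if not symbols:
        yield [], start
        return
    for tree, middle in parse(symbols[0], words, start):
        for rest, end in match_sequence(symbols[1:], words, middle):
            yield [tree] + rest, end

def parse_sentence(sentence):
    """Return the first parse of S that uses every word, or None."""
    words = sentence.lower().split()
    for tree, end in parse("S", words, 0):
        if end == len(words):
            return tree
    return None

print(parse_sentence("the child visited the doctor"))
```

For “you slept”, the sketch first tries VP as V followed by NP, fails because no words remain, and then falls back to the rule VP → V.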

Computational Syntax
We also want computers to be able to use PS rules to create new sentences. In order to create complex language, the computer must assign lexical items to the meanings to be expressed and then arrange these lexical items in order. In the top-down approach, the system begins with the highest level category (S) and then works down to the lexical items. In the bottom-up approach, the system begins with the lexical items and then combines them into larger and larger units.

Computational Syntax
A transition network composed of nodes (circles) and arcs (arrows) may be used to model syntactic processing. Example: You put up the switch. First, the computer uses a model to create the entire sentence. Then it must create the subject NP (you). Next it must determine the VP (put up the switch), and so on.

Computational Semantics
Computational semantics is concerned with 1) producing a semantic representation of the input in the computer and 2) producing natural language to represent meanings. To generate sentences, the computer must find words to represent the concepts to be conveyed. Then the syntactic rules will apply to these words. In order to achieve speech understanding, the computer must find meanings that fit the words and structures of the input.

Computational Semantics
The relationships between words can also be demonstrated with a network like those used for computational syntax. Such a model for “You put up the switch” means that “you” is to do the action of the verb (the agent) and “the switch” is to be acted upon by the verb (the theme). Alternatively, systems can use formal logic for semantic representations, such as PUT UP(YOU, THE SWITCH). The computer can then check truth values by checking whether the pair made up of YOU and THE SWITCH is included in the set of pairs that represents the meaning of PUT UP.
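
The truth-value check amounts to set membership. In this minimal sketch the dictionary MODEL, its contents, and the function is_true are hypothetical; they only illustrate the idea that a two-place predicate denotes a set of (agent, theme) pairs.

```python
# Hypothetical toy model: each predicate denotes a set of (agent, theme) pairs.
MODEL = {
    "PUT UP": {("YOU", "THE SWITCH"), ("MARY", "THE SHELF")},
}

def is_true(predicate, agent, theme, model=MODEL):
    """Return True when (agent, theme) is in the set denoted by predicate."""
    return (agent, theme) in model.get(predicate, set())

print(is_true("PUT UP", "YOU", "THE SWITCH"))
```

Because the pairs are ordered, swapping agent and theme gives a different pair, so PUT UP(THE SWITCH, YOU) is not true in this toy model.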

Computational Pragmatics
Computers use semantic and pragmatic knowledge to analyze structurally ambiguous sentences. Many natural language systems are equipped with some contextual and world knowledge. Computers must also engage in reference resolution, that is, determining when two expressions refer to the same object (for example, a pronoun and its antecedent). This requires grammatical knowledge and situational context.

Computational Sign Language
Linguists at Boston University are currently working on computer algorithms that will recognize sign language, just as spoken language can be recognized. The signer stands in front of a camera, and the computer recognizes the distinctive features of sign language, such as hand shape, movement, and orientation.

Computer Models of Grammar
Computers can be programmed to model the grammar of a language. This forces linguists to be explicit in formulating the rules of the grammar. If the program cannot generate a possible grammatical sentence, then there is an error in the grammar. If the program generates an ungrammatical sentence, then there is also an error in the grammar.

Frequency Analysis, Concordances, and Collocations
Computers can be used to do frequency analyses, which reveal the most common words in written (the, of, and, to, a, in, that, is, was, he) and spoken (I, and, the, to, that, you, it, of, a, know) American English. They can produce concordances, which specify the location of any particular word and its context. They can also do collocation analyses, which reveal the occurrences of two or more words within a short space of each other in a corpus and provide evidence that the presence of one word in a text affects the occurrence of other words.
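
The three analyses can be sketched on a single invented sentence. The sample text, the function names, and the window sizes are assumptions for illustration only; real corpus tools handle punctuation, much larger texts, and statistical significance.

```python
from collections import Counter

# Invented sample text; a real analysis would use a large corpus.
TEXT = "the cat sat on the mat and the dog sat on the rug"

def tokenize(text):
    """Split text into lowercase word tokens (no punctuation handling)."""
    return text.lower().split()

def frequencies(tokens):
    """Frequency analysis: count how often each word occurs."""
    return Counter(tokens)

def concordance(tokens, target, width=2):
    """Return (position, context) for each occurrence of target."""
    lines = []
    for i, token in enumerate(tokens):
        if token == target:
            context = tokens[max(0, i - width): i + width + 1]
            lines.append((i, " ".join(context)))
    return lines

def collocations(tokens, window=2):
    """Count unordered word pairs occurring within `window` tokens of each other."""
    pairs = Counter()
    for i, word in enumerate(tokens):
        for other in tokens[i + 1: i + 1 + window]:
            pairs[tuple(sorted((word, other)))] += 1
    return pairs

tokens = tokenize(TEXT)
print(frequencies(tokens).most_common(1))
print(concordance(tokens, "sat"))
print(collocations(tokens)[("on", "sat")])
```

Here the concordance reports each position of “sat” with two words of context on either side, and the collocation count shows that “sat” and “on” appear close together twice.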

Computational Lexicography
Computational linguists need more information about words and morphemes than just the meanings. The field of computational lexicography is concerned with making standard dictionaries and dictionaries specifically for computational linguists. These special dictionaries contain information about phonemic transcriptions, phonetic variants, syllabification, syntactic categories, and more.

Information Retrieval and Summarization
Information retrieval is the use of computers to locate and display data from possibly very large databases. Data mining is the term used for complex information retrieval. Summarization programs allow computers to eliminate redundancy and identify the most salient features of a body of information. These programs can reduce each article in a corpus of articles by a certain amount, provide just the topic sentence of each paragraph, or provide paragraphs based on a concept vector. A concept vector is a list of meaningful key words whose presence in a paragraph is a measure of the paragraph’s significance.
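
Selecting paragraphs with a concept vector can be sketched as a simple key-word count. The scoring rule (one point per key-word occurrence), the sample paragraphs, and the names score_paragraph and summarize are assumptions made for this illustration rather than a description of any actual summarization program.

```python
def score_paragraph(paragraph, concept_vector):
    """Count how many words of the paragraph appear in the concept vector."""
    words = [w.strip(".,;:!?").lower() for w in paragraph.split()]
    return sum(1 for w in words if w in concept_vector)

def summarize(paragraphs, concept_vector, keep=1):
    """Return the `keep` highest-scoring paragraphs, in their original order."""
    ranked = sorted(
        range(len(paragraphs)),
        key=lambda i: score_paragraph(paragraphs[i], concept_vector),
        reverse=True,
    )
    chosen = sorted(ranked[:keep])
    return [paragraphs[i] for i in chosen]

# Invented sample data for illustration.
PARAGRAPHS = [
    "The parser assigns phrase structure to a string of words.",
    "Lunch was served at noon.",
    "A grammar and a parser together determine syntax.",
]
CONCEPT_VECTOR = {"parser", "grammar", "syntax"}

print(summarize(PARAGRAPHS, CONCEPT_VECTOR, keep=1))
```

The third paragraph contains three key words, so it is the one kept when only a single paragraph is requested.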

Spell Checkers and Machine Translation
Spell checkers range in sophistication from mindless dictionary lookups to intelligent flagging of incorrect homonyms (your for you’re, bear for bare, etc.). The goal of automatic machine translation is to input a message in the source language and have it translated into the target language. Translation requires more than just replacing each source language word with a target language word. Humans encounter morphological, syntactic, idiomatic, and metaphorical challenges during translation, and these challenges are even greater for an electronic translator.

Computational Forensic Linguistics
Computational linguistics can be used in legal disputes regarding trademarks. A computer search proved that the bound morpheme Mc- is now used productively to mean “basic” or “inexpensive”. But a judge ruled that another company could not use Mc- for its product because consumers associated it too firmly with McDonald’s. Computational linguistics can also be used for the interpretation of legal terms. A court case hinged on the meaning of the word “visa”, and by searching a multimillion-word corpus, a computational linguist concluded that “visa” meant “a kind of permit to enter a country”, not “a permit to request permission to enter a country”. This finding affects laws surrounding international travel.

Speaker Identification
Speaker identification is the use of computers to assist in the task of ascertaining the identity of a speaker. Displays of wave forms (which show the amplitude changes of speech over time) and spectrograms (which show the frequencies of speech over time) can help provide evidence in cases needing speaker identification.

Speaker Identification
Consider the following bomb threat: “Good morning. There are three bombs to go off today at three pharmaceuticals in North Carolina. Please be aware. Advise your people or go to their funerals. Goodbye.” In this case, an African American man born and raised in North Carolina was arrested for making this threat. But a computational forensic linguist determined that the suspect was unlikely to be the caller and that the caller was probably not a native speaker of English. The caller inserted a vowel so that the pronunciation of “goodbye” sounded more like “good-a-bye”, which is not likely to be said by a native speaker. Unlike the caller, the suspect pronounced “goodbye” without a /d/ and with a monophthongized last vowel, as is typical of southern pronunciation.
