
1 Tasneem Ghnaimat

2 Language Model An abstract representation of a (natural) language; an approximation to real language. Assume we have a corpus, a (finite!) set of sentences, say, the last 3 years of a newspaper. Given this corpus, we want to estimate a probability over those sentences.

3 Language Model A language model has three parts: (1) a Vocabulary; (2) a way to define sentences using the vocabulary; and (3) a probability over the possible sentences.
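To make slides 2 and 3 concrete, here is a minimal Python sketch of the three parts: a vocabulary, sentences as word sequences, and a probability over sentences. The toy corpus and the unigram estimate are invented for illustration only; real language models use much larger corpora and smoothing.

```python
from collections import Counter

# (made-up) toy corpus standing in for "3 years of a newspaper"
corpus = [
    "the bank opened the account",
    "the teller read the note",
    "the bank closed early",
]

# (1) Vocabulary: the set of word types seen in the corpus.
vocabulary = {w for sentence in corpus for w in sentence.split()}

# (2) Sentences are sequences of vocabulary words; here we just tokenize.
def tokens(sentence):
    return sentence.split()

# (3) A probability over sentences: a crude unigram model that multiplies
#     relative word frequencies.
counts = Counter(w for s in corpus for w in tokens(s))
total = sum(counts.values())

def sentence_probability(sentence):
    p = 1.0
    for w in tokens(sentence):
        p *= counts[w] / total   # zero if the word is out of vocabulary
    return p

print(sentence_probability("the bank opened early"))
```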

4 Why is a Language Model Useful? Speech recognition, handwriting recognition, spelling correction, machine translation systems, optical character recognizers.

5 Handwriting Recognition Assume a note is given to a bank teller, which the teller reads as "I have a gub." An NLP system analyzes it as follows: gub is not a word; gun, gum, Gus, and gull are words, but gun has a higher probability in the context of a bank.

6 Spell Checker Collect a list of commonly substituted words: piece/peace, whether/weather, their/there, ... Example: "On Tuesday, the whether …" is corrected to "On Tuesday, the weather …"
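A minimal sketch of this idea: for each confusion set, keep the candidate that is most frequent after the preceding word. The bigram counts below are invented for illustration; in practice they would be estimated from a large corpus.

```python
from collections import Counter

# bigram counts from a (made-up) corpus
bigram_counts = Counter({
    ("the", "weather"): 120,
    ("the", "whether"): 2,
    ("or", "whether"): 45,
})

confusion_sets = [{"weather", "whether"}, {"piece", "peace"}, {"their", "there"}]

def correct(prev_word, word):
    """Replace `word` with the member of its confusion set that is
    most likely after `prev_word`, according to the bigram counts."""
    for conf in confusion_sets:
        if word in conf:
            return max(conf, key=lambda w: bigram_counts[(prev_word, w)])
    return word

print(correct("the", "whether"))   # -> 'weather'
```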

7 Different Models for Languages Markov Model: a probabilistic model that assumes we can predict the probability of some future unit without looking too far into the past. Each state has two probability distributions: the probability of generating a symbol and the probability of moving to a particular state. From one state, the Markov model generates a symbol and then moves to another state.
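A minimal sketch of such a model: each state carries an emission distribution over symbols and a transition distribution over next states. The states, symbols, and probabilities below are invented examples.

```python
import random

emissions = {
    "A": {"x": 0.7, "y": 0.3},
    "B": {"x": 0.1, "y": 0.9},
}
transitions = {
    "A": {"A": 0.6, "B": 0.4},
    "B": {"A": 0.5, "B": 0.5},
}

def sample(dist):
    # draw one key of `dist` with probability proportional to its value
    return random.choices(list(dist), weights=dist.values())[0]

def generate(start_state, length):
    state, output = start_state, []
    for _ in range(length):
        output.append(sample(emissions[state]))  # generate a symbol...
        state = sample(transitions[state])       # ...then move to a new state
    return output

print(generate("A", 5))
```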

8 Hidden Markov Model Called hidden because the state transitions are not observable (the input symbols don't determine the next state). HMM taggers require a lexicon and training text, and aim to build a language model automatically with little effort. For example, the word help will be tagged as a noun rather than a verb if it comes after an article, because the probability of a noun is much higher than that of a verb in this context.

9 Hidden Markov Model In an HMM, we know only the probabilistic function of the state sequence. [Diagram: a chain of hidden states S1 → S2 → … → Sn, each emitting an observation O1, O2, …, On.] The Oi nodes are called observed nodes; the Si nodes are called hidden nodes.
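The tagging example from slide 8 can be sketched with this structure: hidden states are part-of-speech tags, observations are words, and the chosen tag maximizes transition probability times emission probability. All numbers below are invented for illustration.

```python
# P(tag_i | tag_{i-1}): after an article (DET), a noun is much more likely than a verb.
transition = {("DET", "NOUN"): 0.6, ("DET", "VERB"): 0.1}

# P(word | tag): the word "help" can be emitted by either tag.
emission = {("NOUN", "help"): 0.002, ("VERB", "help"): 0.003}

def best_tag(prev_tag, word, candidates=("NOUN", "VERB")):
    """Pick the tag that maximizes P(tag | prev_tag) * P(word | tag)."""
    return max(candidates,
               key=lambda t: transition[(prev_tag, t)] * emission[(t, word)])

print(best_tag("DET", "help"))   # -> 'NOUN', as in the slide's example
```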

10 Hidden Markov Model Applications: speech recognition (hidden nodes are text words, observations are spoken words); part-of-speech tagging (hidden nodes are parts of speech, observations are words).

11 Language Analysis Morphology handles the formation of words using morphemes: the base form (stem), e.g., believe, and affixes (prefixes, suffixes, infixes), e.g., un-, -able, -ly.
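A minimal sketch of splitting a word into stem and affixes, using small hand-written prefix and suffix lists; this is a toy illustration of morphemes, not a real morphological analyzer.

```python
prefixes = ["un", "re"]
suffixes = ["able", "ly", "ing", "ed", "s"]

def analyze(word):
    """Return (prefix, stem, suffix) by greedily stripping known affixes."""
    prefix = next((p for p in prefixes if word.startswith(p)), "")
    rest = word[len(prefix):]
    suffix = next((s for s in suffixes if rest.endswith(s)), "")
    stem = rest[:len(rest) - len(suffix)] if suffix else rest
    return prefix, stem, suffix

print(analyze("unbelievable"))  # -> ('un', 'believ', 'able')
print(analyze("cleaning"))      # -> ('', 'clean', 'ing')
```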

12 Morphology Important for many tasks: machine translation, information retrieval, part-of-speech tagging.

13 Morphemes and Words Combine morphemes to create words. Inflection: combination of a word stem with a grammatical morpheme, keeping the same word class, e.g. clean (verb), clean-ing (verb). Derivation: combination of a word stem with a grammatical morpheme, resulting in a different word class, e.g. clean (verb), clean-ing (noun). Compounding: combination of multiple word stems, e.g. sun + shine → sunshine, base + ball → baseball. Cliticization: combination of a word stem with a clitic, joining different words from different syntactic categories, e.g. I've = I + have.

