
Topics in Linguistics ENG 331


1 Topics in Linguistics ENG 331
Dr. Rania Al-Sabbagh, Department of English, Faculty of Al-Alsun (Languages), Ain Shams University

2 What are Regular Expressions?
Regular expressions (RegEx or RegExp for short) are a powerful way to run complex searches. With RegEx you can, for instance, find acronyms, rhyming words, postal codes, phone numbers, spelling variations, and much more. Week 8

3 RegEx at Work Many corpus analysis tools support RegEx search, and AntConc is one of them. For illustration, we will use the Brown Corpus. The Brown University Standard Corpus of Present-Day American English (or just the Brown Corpus) was compiled in the 1960s. The corpus originally contained 1,023,374 tokens and 41,506 types sampled from 15 text categories, including press, religion, and non-fiction books. To get started, open AntConc and load your Brown Corpus text file. We are using a raw version of the corpus.

4 RegEx at Work: Finding Acronyms
For a computer, what are acronyms? They are all-caps words of two or more letters. How can we tell the computer to look for ‘all-caps words of two or more letters’? \b[A-Z]\b \b[A-Z]+\b \b[A-Z]*\b \b[A-Z]{2,}\b
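The four options can be compared outside AntConc with a short Python sketch. It assumes Python's re module and a made-up sample sentence (not Brown Corpus text). Python's re is case-sensitive by default, so [A-Z] matches capital letters only; AntConc's own case setting may behave differently.

```python
import re

# Made-up sample sentence (not from the Brown Corpus).
text = "The FBI and the UN met in a NATO hall. I said OK."

candidates = [r"\b[A-Z]\b", r"\b[A-Z]+\b", r"\b[A-Z]*\b", r"\b[A-Z]{2,}\b"]

# Collect every non-overlapping match for each candidate pattern.
results = {pattern: re.findall(pattern, text) for pattern in candidates}

for pattern, matches in results.items():
    print(pattern, matches)
```

Only \b[A-Z]{2,}\b returns exactly FBI, UN, NATO, and OK. \b[A-Z]+\b also picks up the pronoun I, \b[A-Z]\b finds only I, and \b[A-Z]*\b mixes the capitals with empty matches because * also allows zero letters.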

5 Quiz Write a RegEx to find: two-letter acronyms only; three-letter acronyms only; long acronyms of at least 4 letters.
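One possible set of answers (a sketch, not necessarily the intended key) replaces {2,} with the exact counts {2} and {3} and the open range {4,}. Tested in Python on another made-up sentence:

```python
import re

# Made-up sample sentence (not from the Brown Corpus).
text = "The UN, the FBI and NASA met UNESCO staff."

two_letter = re.findall(r"\b[A-Z]{2}\b", text)    # exactly 2 capitals
three_letter = re.findall(r"\b[A-Z]{3}\b", text)  # exactly 3 capitals
long_ones = re.findall(r"\b[A-Z]{4,}\b", text)    # 4 or more capitals

print(two_letter, three_letter, long_ones)
```

The word boundaries matter here: without \b, [A-Z]{2} would also match the first two letters of FBI, NASA, and UNESCO.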

6 RegEx at Work: Finding Verb Conjugations
How can we search for all the verb conjugations of begin (i.e., begin, begins, began, begun) in one step? \bbeg?n\b \bbeg.n\b \bbeg+n\b \bbeg*n\b \bbeg*ns?\b \bbeg.ns\b \bbeg.ns?\b
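A quick way to check the options is to run each one against the four target forms. In this Python sketch each form is tested as an isolated word with re.fullmatch, which simplifies searching running text in AntConc:

```python
import re

forms = ["begin", "begins", "began", "begun"]
options = [r"\bbeg?n\b", r"\bbeg.n\b", r"\bbeg+n\b", r"\bbeg*n\b",
           r"\bbeg*ns?\b", r"\bbeg.ns\b", r"\bbeg.ns?\b"]

# For each option, list the forms it matches as a whole word.
coverage = {
    option: [form for form in forms if re.fullmatch(option, form)]
    for option in options
}

for option, matched in coverage.items():
    print(f"{option:14} {matched}")
```

Only \bbeg.ns?\b covers all four forms: . stands for the one changing vowel (i, a, u) and s? makes the final s optional. \bbeg.n\b misses begins, and the options built on ?, +, and * apply those quantifiers to g rather than to the vowel, so they match none of the forms.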

7 Quiz Write a RegEx to find the verb conjugations of:
speak: speak, speaks, spoke, spoken; fly: fly, flies, flew, flown
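One possible answer, sketched in Python on made-up text: because these forms differ in more than one letter, grouping with () plus the either-or symbol | is simpler than . or ?. The helper whole_matches is hypothetical, written only for this example.

```python
import re

# Made-up sample text (not from the Brown Corpus).
text = ("I speak, she speaks; he spoke and has spoken. "
        "Birds fly, one flies, it flew, it has flown. The speaker flew home.")

speak = r"\bsp(eaks?|oken?)\b"
fly = r"\bfl(y|ies|ew|own)\b"

def whole_matches(pattern, text):
    # finditer + group(0) returns the full match; re.findall would return
    # only the part captured by the parentheses.
    return [m.group(0) for m in re.finditer(pattern, text)]

print(whole_matches(speak, text))
print(whole_matches(fly, text))
```

Note that speaker is not returned: after speak the pattern requires a word boundary, and speaker continues with e.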

8 RegEx at Work: Finding Spelling Variation
Which RegEx matches ‘colour’, ‘color’, ‘colours’, ‘colors’, ‘colouring’, and ‘coloring’? colou?rs?(ing)? colours?(ing)? colou?rs?ing? Can colorful, colorless, and colored be matched by the same RegEx? If not, how can we modify the RegEx to match them? Is there a more concise way to write your RegEx? colou?rs?(ing)?(less)?(ful)?(ed)? colou?r(s|ing|less|ful|ed)? OR colou?r\w*
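The answers above can be checked in Python with re.fullmatch on the word lists from the slide (covers is a hypothetical helper written for this sketch):

```python
import re

targets = ["colour", "color", "colours", "colors", "colouring", "coloring"]
extras = ["colorful", "colorless", "colored"]

options = [r"colou?rs?(ing)?", r"colours?(ing)?", r"colou?rs?ing?"]
extended = [r"colou?rs?(ing)?(less)?(ful)?(ed)?",
            r"colou?r(s|ing|less|ful|ed)?",
            r"colou?r\w*"]

def covers(pattern, words):
    """True if the pattern matches every word from start to end."""
    return all(re.fullmatch(pattern, w) for w in words)

for p in options:
    print(p, covers(p, targets))
for p in extended:
    print(p, covers(p, targets + extras))
```

Of the first three, only colou?rs?(ing)? matches all six words; colours?(ing)? has no optional u, and colou?rs?ing? requires the letters in. Among the extended patterns the trade-off is precision: colou?r\w* is the shortest but also accepts any longer word, such as colorimeter.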

9 Quiz Write a RegEx to find: puppy and puppies; behavior and behaviour
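One possible answer (a sketch, not the official key) reuses the techniques from the colour slide: alternation for the y/ies ending and an optional u.

```python
import re

puppy = r"\bpupp(y|ies)\b"     # y or ies after the shared stem pupp
behaviour = r"\bbehaviou?r\b"  # optional u, as in colou?r

words = ["puppy", "puppies", "pupil", "behavior", "behaviour", "behaviors"]
puppy_hits = [w for w in words if re.fullmatch(puppy, w)]
behaviour_hits = [w for w in words if re.fullmatch(behaviour, w)]
print(puppy_hits, behaviour_hits)
```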

10 RegEx at Work: Finding Affixes
Which RegEx matches all words ending with ‘ness’? \b[a-zA-Z]+ness\b \b[a-zA-Z]*ness\b Another way to do the same thing is \b\w+ness\b
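A Python sketch of the three patterns on a made-up sentence:

```python
import re

# Made-up sample sentence (not from the Brown Corpus).
text = "Kindness, darkness and business all end in ness."

patterns = [r"\b[a-zA-Z]+ness\b", r"\b[a-zA-Z]*ness\b", r"\b\w+ness\b"]
found = {p: re.findall(p, text) for p in patterns}
for p, words in found.items():
    print(p, words)
```

\b[a-zA-Z]*ness\b also returns the bare word ness, since * allows zero letters before the suffix. All three return business as well, a reminder that RegEx matches letter strings, not morphemes.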

11 Quiz Write a RegEx to find words starting with: ‘anti’, ‘un’.
Write a RegEx to find words ending with: ‘ation’, ‘ment’.
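One possible answer, sketched in Python on made-up text. As in the ness example, the patterns match letter strings, so words such as under, nation, and comment come back even though they are not formed with these affixes; anti-war is missed because the hyphen is not a word character.

```python
import re

# Made-up sample sentence (not from the Brown Corpus).
text = ("The anti-war antibodies were unusual under pressure; the nation "
        "needed information and movement, not comment.")

# One possible answer per quiz item; \w+ requires at least one more character.
patterns = {
    "anti-": r"\banti\w+\b",
    "un-": r"\bun\w+\b",
    "-ation": r"\b\w+ation\b",
    "-ment": r"\b\w+ment\b",
}
found = {label: re.findall(p, text) for label, p in patterns.items()}
for label, words in found.items():
    print(label, words)
```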

12 RegEx at Work: Finding Rhyming Words
Which RegEx finds words rhyming with ‘duck’? Check all that apply: \b[a-zA-Z]uck\b \b[a-zA-Z]+uck\b \b\w+uck\b Quiz Write a RegEx to find words rhyming with: ‘soon’, ‘clip’.
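Run in Python on a made-up sentence, the three options differ in how many letters they allow before uck:

```python
import re

# Made-up sample sentence (not from the Brown Corpus).
text = "A duck, a truck and some luck got stuck in the muck."

options = [r"\b[a-zA-Z]uck\b", r"\b[a-zA-Z]+uck\b", r"\b\w+uck\b"]
rhymes = {p: re.findall(p, text) for p in options}
for p, words in rhymes.items():
    print(p, words)
```

\b[a-zA-Z]uck\b allows exactly one letter before uck, so it misses truck and stuck; the two + versions allow one or more.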

13 RegEx at Work: Finding Specific Words
Which RegEx matches all words starting with ‘a’ in both upper and lower case? \b[aA]\w+\b \ba\w+\b \b[aA]\b Which RegEx matches ‘This’ at the beginning of sentences? .This \. \bThis\b Which RegEx matches ‘this’ at the end of a sentence? this. this\.
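A Python sketch of the first and third questions on made-up sentences. It also shows why the period must be escaped: an unescaped . matches any character. The pattern \b[aA]\w*\b is an extra variant added for comparison, not one of the slide's options.

```python
import re

# Made-up sample sentences (not from the Brown Corpus).
a_text = "An apple a day keeps Anna away."
this_text = "Ask about this. Is this thing on? Then this, then that."

a_patterns = [r"\b[aA]\w+\b", r"\ba\w+\b", r"\b[aA]\b", r"\b[aA]\w*\b"]
a_hits = {p: re.findall(p, a_text) for p in a_patterns}

# An unescaped . matches any character; \. matches only a period.
this_hits = {p: re.findall(p, this_text) for p in [r"this.", r"this\."]}

for p, hits in {**a_hits, **this_hits}.items():
    print(p, hits)
```

\b[aA]\w+\b needs at least one character after a/A, so the article a is left out; with \w* instead of \w+ it is included.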

14 RegEx at Work: Finding Numbers
Which RegEx matches all digits? \d \D Which RegEx returns years (e.g. 1999, 2010)? Check all that apply: [0-9]{4} [0-9][0-9][0-9][0-9] [0-9] \b[0-9]\b
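A Python sketch on a made-up sentence. The last pattern, \b[0-9]{4}\b, is an extra added for comparison and is not one of the slide's options.

```python
import re

# Made-up sample sentence (not from the Brown Corpus).
text = "In 1999 and 2010 they sold 12345 units in 7 stores."

patterns = [r"\d", r"[0-9]{4}", r"[0-9][0-9][0-9][0-9]", r"[0-9]",
            r"\b[0-9]\b", r"\b[0-9]{4}\b"]  # last one: extra, not a slide option
hits = {p: re.findall(p, text) for p in patterns}
for p, found in hits.items():
    print(p, found)
```

[0-9]{4} and [0-9][0-9][0-9][0-9] are equivalent, but without \b both also return 1234 from inside 12345, and a four-digit number is not necessarily a year. Adding word boundaries restricts matches to standalone four-digit numbers.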

15 RegEx at Work: Finding Punctuation Markers
Which RegEx matches all punctuation markers? [!?,.*()&”;’] \W+
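A Python sketch comparing the two options on a made-up string. \W+ returns runs of non-word characters, and spaces count as non-word characters too:

```python
import re

# Made-up sample string (not from the Brown Corpus).
text = "Wait, what?! (Yes.)"

punct_class = re.findall(r"[!?,.*()&”;’]", text)  # one listed mark at a time
non_word_runs = re.findall(r"\W+", text)           # runs of non-word characters

print(punct_class)
print(non_word_runs)
```

The character class returns only the marks it lists, so marks such as a hyphen or colon would be missed, while \W+ catches every mark but also glues spaces onto them.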

16 RegEx Cheat Sheet 1
Quantifiers
+ one or more
* zero or more
? zero or one
{n,m} at least n times and at most m times
{n,} at least n times
{,m} at most m times
{n} exactly n times
Ranges
[0-9] any digit from 0 to 9
[A-Z] any uppercase letter
[a-z] any lowercase letter
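The quantifiers can be tried on a made-up set of strings in Python. Python's re accepts the {,m} form, but not every regex engine does, so it is worth checking whether your tool supports it.

```python
import re

words = ["gl", "gol", "gool", "goool", "gooool"]

# Each quantifier applies to the o between g and l.
patterns = {
    "+": r"go+l",
    "*": r"go*l",
    "?": r"go?l",
    "{2,3}": r"go{2,3}l",
    "{2,}": r"go{2,}l",
    "{,2}": r"go{,2}l",
    "{2}": r"go{2}l",
}

matches = {q: [w for w in words if re.fullmatch(p, w)] for q, p in patterns.items()}
for q, hits in matches.items():
    print(f"{q:6} {hits}")
```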

17 RegEx Cheat Sheet 2
Grouping
() everything in between is treated as one unit
Characters
\w any word character (a letter, a digit, or the underscore)
\d any digit
\D any character except a digit
\W any non-word character (e.g. punctuation markers and spaces)
\s any whitespace character (space, tab, newline)
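A Python sketch of the character classes on a made-up string; each class is used with + so that runs of characters come back together:

```python
import re

# Made-up sample string (not from the Brown Corpus).
text = "Room 101, 3 beds_ok!"

classes = [r"\w+", r"\d+", r"\D+", r"\W+", r"\s+"]
runs = {c: re.findall(c, text) for c in classes}
for c, found in runs.items():
    print(c, found)
```

\w+ keeps beds_ok together because the underscore counts as a word character, and \W+ returns spaces as well as punctuation.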

18 RegEx Cheat Sheet 3
Boundaries
\b word boundary
Symbols
\ escape symbol: treats a special character literally (e.g. \. matches a period)
| the either-or symbol
Regular expressions make raw corpora more useful. What are other ways in which raw corpora can be useful?

