Presentation is loading. Please wait.

Presentation is loading. Please wait.

LANGUAGES Prepared by: Paridah Samsuri Dept. of Software Engineering

Similar presentations


Presentation on theme: "LANGUAGES Prepared by: Paridah Samsuri Dept. of Software Engineering"— Presentation transcript:

1 LANGUAGES Prepared by: Paridah Samsuri Dept. of Software Engineering
Faculty of Computer and Information Science

2 Finite Specification of Languages Regular Sets and Expressions
Contents: Strings and Languages Finite Specification of Languages Regular Sets and Expressions

3 Natural languages, computer languages, Mathematical languages
1. Strings and Languages Natural languages, computer languages, Mathematical languages A language is a set of strings over an alphabet. Syntax of the language: certain properties that must be satisfied by strings.

4 A String over a set X is a finite sequence of elements from X.
Strings and Alphabets A String over a set X is a finite sequence of elements from X. The set of elements are called alphabet of the language. Alphabet consists of a finite set of indivisible objects. The alphabet of a language is denoted ∑

5 Notations: use Greek letters  and  to designate alphabets.
Important : Strings of characters are fundamental building blocks in computer science. Alphabet over which the strings are defined, may vary with the application however, it is a finite set. Eg. 1 = (0, 1); 2 = (a, b, c, d); Notations: use Greek letters  and  to designate alphabets.

6 String over an alphabet is a finite sequence of symbols from that alphabet, written next to one another and not separated by commas. Example: Let  = {0,1} then strings over  are: 1010 and etc.

7 String Recursive Definition
Let  be an alphabet. The set of strings over  (written as *) is defined recursively as follows: Basis : * Recursive step: if w* and a Then wa* Closure: w* if and only if it can be obtained from basis element  by a finite number of applications of recursive step.

8 Length of String If w is a string over , the length of w, written |w| is the number of symbols that it contains. Example: |λ| = 0 |0| = 1 |1| = 1 |1010| = 4 |001101| = 6

9 The set of strings, *, includes:
length 0: λ length 1: a b c length 2: aa ab ac ba bb bc ca cb cc length 3: aaa aab aac aba abb abc aca acb acc baa bab bac bba bbb bbc bca bcb bcc caa cab cac cba cbb cbc cca ccb ccc

10 Concatenation of String
If we have string x of length m and string y of length n, the concatenation of x and y written xy is x1…xmy1…yn. Example: x = aba y = bbbab Then xy = ababbbab yx = bbbababa xyx = ababbbababa

11 Self Concatenation If we have string x, the concatenation of x and x is self concatenation Example: x0 =  x1 = x = aba x2 = xx = abaaba x3 = xxx = abaabaaba xk = xx…x : self-concatenated string k times

12 Definition of Concatenation
Let u, v  * . The concatenation of u and v, written as uv, is a binary operation on * defined as follows: Basis : length (v)=0, then v= and uv=u Recursive step: Let v be a string with length(v)=n>0. Then v=wa, and uv=(uw)a for some string w with length n-1 and a

13 Let u=ab, v=ca, w=bb. Then the concatenation of: uv=abca vw=cabb (uv)w=abcabb u(vw)=abcabb The result is independent of the order in which the operations are performed.

14 Substring String z is a substring of w if z appears consecutively within w (z occurs inside of w) Example: w = abbaaababb bba is a substring of w abab is a substring of w baba is NOT a substring of w

15 Prefix Prefix for string v is a substring u where :
if x = , for  = xuy Then  = uy (string v starts with substring u) Example: if w = abbaaababb a is a prefix of w abbaa is a prefix of w bba is NOT a prefix of w

16 Suffix Suffix for string v is a substring u where :
if y = , for  = xuy Then  = xu (string v ends with substring u) Example: if w = abbaaababb abb is a suffix of w babb is a suffix of w bab is NOT a suffix of w

17 Reverse Example: if w = a wR = a if w = abb wR = bba
The reverse of w, written wR or wr is the string obtained by writing w in the opposite order. Example: if w = a wR = a if w = abb wR = bba if w = aba wR = aba if w = abbcd wR = dcbba

18 Definition of Reverse Let u be a string in *
The reverse of u, written as uR is defined recursively as follows: Basis : length(u)=0 Then u=  and R =  Recursive step: if w* and a if length(u)=n>0, then u=wa and uR = awR for some string w with length n-1 and some a

19 Take 5..!

20 2. Finite Spec. of Languages
Languages can be defined using 2 ways: either presented as an alphabet and the exhaustive list of all valid words an alphabet and a set of rules defining the acceptable words.

21 PALINDROME language Definition of a new language PALINDROME over the alphabet :

22 PALINDROME if we begin listing the elements in PALINDROME, we find
which if we concatenate 2 words in PALINDROME, sometimes it can produce a new words which is also in PALINDROME but sometimes it doesn’t. (Talk about it later)

23 Kleene Closure / star Closure of the alphabet (Σ ) is a language in which any string of letters from an alphabet Σ* is a word. For example, if , then if ,then

24 Kleene star (cont.) We can think of the Kleene star as
an operation that makes an infinite language of strings of letter out of an alphabet. Infinite language = infinitely many words, each of finite length.

25 Definition of S* If S is a set of words, then by S*
we mean the set of all finite strings formed by concatenating words from S, where any word may be used as often as we like, and where the null string is also included.

26 Example If S = {aa b}, then S* = {λ plus any word composed of
factors of aa and b} = {λ plus all strings of a’s and b’s in which the a’s occur in even clumps} = {λ b aa bb aab baa bbb … }

27 Example If S = {a ab}, then
S* = {λ plus any word composed of factors of a and ab} = {λ plus all strings of a’s and b’s except those that start with b and those that contain a double b} = {λ a aa ab aaa aab aba … }

28 Proof a word in the S* To prove that a certain word is in the closure language S*, we must show how it can be written as a concatenate of words from the base set S. For example, to show abaab is in S*, we can factor it as (ab)(a)(ab) and these are in S, therefore, their concatenation is in S*. If there is only one way to factor the string, we say that the factoring is unique.

29 Example Consider the 2 languages S = {a b ab} and T = {a b bb}
both S* and T* are languages of all strings of a’s and b’s since any string of a’s and b’s can be factored into syllables of either (a) or (b), both of which are in S and T.

30 Positive closure + If we would like to refer to only the concatenation of some (not zero) strings from a set S, we use the notation + instead of *, for example: if , then

31 S+ if S is a set of strings not include λ, then S+ is the language S* without the word λ. If S is a language that does contain λ, then S+ = S*. S+ can contain λ only when S contains the word  initially.

32 S* and S** Theorem 1: for any set S of strings we have S* = S** Proof:
every word in S** is made up of factors from S*. Every factor from S* is made up of factors from S. Therefore, every word in S** is made up of factors from S. Therefore, every word in S** is also a word in S*. (is contained in or equal to)

33 Regular Expressions we can describe a language definition looked similar to

34 Regular Expressions we can guess the meaning of the languages, however it can be defined in a particular way that gets hard to guess. For example:

35 Another new method of language definition
We shall develop some new language-definition symbolism that will be much more precise than the … For example: consider the language L4 We can define it with closure Let Then for shorthand, we could have written

36 Simple expression By using Kleene star, we can have a simple expression instead of … In order to distinguish between x from alphabet of x from Kleene star x*, we will use bold face x* instead to make it different.

37 Language(x*) We can also define L4 as
Since x* is any string of x’s, L4 is then the set of all possible string of x’s of any length (including λ)

38 Example Suppose we wish to describe the language L over the alphabet
where “all words of the form one a follow by some number of b’s (maybe no b’s at all)” we may write

39 ab* means a(b*), not (ab)*
Parentheses are not letters in the alphabet of this language, so they can be used to indicate factoring without accidentally changing the words. Like the powers in algebra ab* means a(b*), not (ab)*

40 We can use the + notation and write
Lang.(xx*) vs. Lang.(x+) means ? We start each word of L1 by writing down an x and then we follow it with some string of x’s (which may be no more x’s at all.) We can use the + notation and write

41 L = (xx*) and L = (x+) The language L1 defined above can also be defined by any of these expressions: xx* x+ xx*x* x*xx* x+ x* x*x+ x*x*x*xx* Remember x* can always be λ

42 Example ab*a is the set of all string of a’s and b’s that have at least two letters, that begin and end with a’s, and that have nothing but b’s inside (if anything at all).

43 ba and aba are not in this Language
Example a*b* contains all the strings of a’s and b’s in which all the a’s (if any) come before all b’s (if any) notice that ba and aba are not in this Language

44 a*b* vs. (ab)* (ab)* can contain abab but a*b* can’t contain abab

45 A* = {x1x2x3 … xk | k  0 and each xi  A}
Regular Operations Let A and B be languages. We define the regular operations union, concatenation, and star as follows. Union : AB = {x|x  A or x  B} Concatenation : (simply no written) A  B = {xy |x  A and y  B} Star : A* = {x1x2x3 … xk | k  0 and each xi  A}

46 Union () Also written as x+y
xy where x and y are strings of characters from an alphabet means “either x or y” Also written as x+y

47 Example Consider the language T defined over the alphabet
all the words in T begin with an a or a c and then are followed by some number of b’s.

48 Finite language L We can define any finite language by our new expression. For example, consider a finite language L contains all the strings of a’s and b’s of length 3 exactly: The first letter can be either a or b. so do the 2nd and 3rd letter. L = language((ab) (ab) (ab))

49 Finite language (cont.)
or we can simply write shortly as L = language(ab)3 if we write (ab)*, it means the set of all possible strings of letters from the alphabet including the null string λ

50 a(ab)*b = a(arbitrary string)b
Examples If we write a(ab)* we can describe all words that begin with the letter a. If we would like to describe all words that begin with an a and end with b, we can define by the expression a(ab)*b = a(arbitrary string)b

51 Formal definition of regular expressions
The new definition we have talked about is claimed as “Regular Expression”. Languages which are able to be described by RE, are called “Regular Languages”. Not every languages are able to be described by RE. Regular languages may also be described by another fine definitions, besides the RE.

52 Regular Expression The symbols that appear in RE are
the letters of the alphabet  the symbol of null string  or λ parentheses ( ) star operator *  or + sign

53 Formal definition of regular expressions
Say that R is a regular expression if R is a for some a in the alphabet ,  or λ, , (R1R2), where R1 and R2 are regular expressions, (R1  R2), where R1 and R2 are regular expressions, or (R1*), where R1 is regular expression.

54 Regular Expressions’ rules
Rule 1: Every letter of  can be made into a regular expression Rule 2: If r1 and r2 are regular expressions, then so are (i) (r1) (ii) r1r2 or r1 r2 (iii) r1r2 (iv) r1* Rule 3: Nothing else is a regular expression.

55 Recursive definition of regular set
Basis : , { } and {a}, for every a are regular sets over . Recursive step: Assume X and Y are regular set over  , then the sets: XY, XY and X* are regular sets over  Closure: X is a regular set over  iff it can be obtained from basis elements by a finite number of applications of ecursive step.

56 Parentheses We use parentheses ( ) as an option to eliminate the ambiguity when we apply * or + to the expressions. For example: if r1 = aab then what is r1* ? Is the r1* = aa+b* or (aa+b)* ? They are both REs but very different. Ans. the later choice. In this case we should put the ( ) when we substitute aab to r1*

57 Null Language  or λ is the symbol of null string in regular expression.  is the symbol for “Null Language” Don’t confuse! R = λ represents the language containing a single string, the empty string.  {λ} R =  represents the language that doesn’t contain any strings.

58 Definitions If we let R be any regular expression, R = R :
Adding the empty language to any other language will not change it. R   = R : Adding the empty string to any other language will not change it. R may not equal to R e.g., if R = 0, the L(R) = {0} but L(R) ={0,} R   may not equal to R e.g., if R = 0, the L(R) = {0} but L(R  ) =

59 Example Let consider the language defined by (ab)*a(ab)*
What does it produce ? Ans. The language which is the set of all words over the alphabet  = {a,b} that have an a in somewhere. Only words which are not in this language are those that have only b’s and the word 

60 Union of two languages Those words which compose of only b’s are defined by the expression b*. (b* also includes the null string ) Therefore, the language of all strings over the alphabet  = {a,b} are all strings = (all strings with an a)  (all string without an a) (ab)* = (ab)*a (ab)*  b*

61 (ab)*a (ab)* a (ab)*
Example How can we describe the language of all words that have at least 2 a’s ? Ans (ab)*a (ab)* a (ab)* = (some beginning)(the first a)(some middle)(the second a)(some end) where the arbitrary parts can have as many a’s (or b’s) as they want.

62 Example Is there any other RE that can define the language with at least 2 a’s ? Ans. Yes. For example: b*ab*a(ab)* =(some beginning of b’s (if any))(the first a) (some middle of b’s)(the second a) (some end)

63 Equivalent expressions
(ab)*a (ab)*a (ab)* = b*ab*a(ab)* Both expressions are equivalent because they both describe the same item. We could write language ((ab)*a (ab)*a (ab)*) = language(b*ab*a(ab)*) = all words with at least two a’s = (ab)*ab*ab* = b*a(ab)* ab*

64 Example If we wanted all words with exactly 2 a’s, we could use the expression b*ab*ab* it can describes such words as aab, baba, bbbabbbab, … Question: Can it make the word aab ? Ans : Yes, by having the first and second b* = λ

65 Example How about the language with at least one a and at least one b ? (ab)*a (ab)*b (ab)* It can only produce words which an a precede ab. To produce words which have ab precede an a, we can describe by (ab)*b(ab)*a (ab)* Thus, the set of all words : (ab)*a (ab)*b (ab)* (ab)*b (ab)*a (ab)*

66 (ab)*a (ab)*b (ab)* (ab)*b (ab)*a (ab)*
Example (ab)*a (ab)*b (ab)* can produce all words with at least one a and at least one b, However, it doesn’t contain the words of the forms some b’s followed by some a’s. These exceptions are all defined by bb*aa* Thus, we have all strings over  = {a,b} (ab)*a (ab)*b (ab)* (ab)*b (ab)*a (ab)* = (ab)*a (ab)*b (ab)* bb*aa*

67 (ab)*a (ab)*b (ab)* bb*aa*
generates all words which have both a and b in them somewhere. Words which are not included in the above expression are words of all a’s, all b’s or  a*, b* Now, we have all words which can be generated above the alphabet (ab)* = (ab)*a (ab)*b (ab)* bb*aa*  a*  b*

68 Theory of Computer Science
THANK YOU


Download ppt "LANGUAGES Prepared by: Paridah Samsuri Dept. of Software Engineering"

Similar presentations


Ads by Google