Searching, Modifying, and Encoding Text. Parts: 1) Forming Regular Expressions 2) Encoding and Decoding.


1 Searching, Modifying, and Encoding Text

2 Parts: 1) Forming Regular Expressions 2) Encoding and Decoding

3 FORMING REGULAR EXPRESSIONS

4 ^([\w-\.]+)@((\[[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.)|(([\w-]+\.)+))([a-zA-Z]{2,4}|[0-9]{1,3})(\]?)$

5
    using System;
    using System.Text.RegularExpressions;
    namespace TestRegExp
    {
        class Class1
        {
            [STAThread]
            static void Main(string[] args)
            {
                // args[0] is the regular expression, args[1] is the input to test
                if (Regex.IsMatch(args[1], args[0]))
                    Console.WriteLine("Input matches regular expression.");
                else
                    Console.WriteLine("Input DOES NOT match regular expression.");
            }
        }
    }

6
    C:\>TestRegExp "^\d{5}$" 1234
    Input DOES NOT match regular expression.
    C:\>TestRegExp "^\d{5}$" 12345
    Input matches regular expression.
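The same five-digit check can be written without command-line arguments. This is a minimal sketch; the class name ZipCodeCheck and the method IsZipCode are hypothetical names chosen for illustration and do not come from the slides.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper: wraps the ^\d{5}$ pattern from the slide above.
public static class ZipCodeCheck
{
    // Returns true when the input consists of exactly five digits.
    public static bool IsZipCode(string input)
    {
        return Regex.IsMatch(input, @"^\d{5}$");
    }

    public static void Main()
    {
        Console.WriteLine(IsZipCode("1234"));   // False
        Console.WriteLine(IsZipCode("12345"));  // True
    }
}
```

The verbatim string (@"...") keeps the backslash in \d from being treated as a C# escape sequence.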

7 Regular Expression Language Elements: Examples
\d  Matches a digit character. Equivalent to “[0-9]” in ECMAScript-compliant mode; by default it matches any Unicode decimal digit.
\w  Matches any word character, including underscore. Equivalent to “[A-Za-z0-9_]” in ECMAScript-compliant mode.
[^char]  Matches any character except the specified ones.
\b  Specifies that the match must occur on a boundary between \w (alphanumeric) and \W (nonalphanumeric) characters. The match must occur on word boundaries, which are the first or last characters in words separated by any nonalphanumeric characters.
\number  Backreference. For example, (\w)\1 finds doubled word characters.
\k<name>  Named backreference. For example, (?<char>\w)\k<char> finds doubled word characters. The expression (?<43>\w)\43 does the same. You can use single quotes instead of angle brackets; for example, \k'char'.
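A short sketch of two of the elements above, the named backreference and the word boundary. The class and method names (BackreferenceDemo, FirstDoubled, HasWholeWord) are hypothetical and used only for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical demo class for \k<name> and \b.
public static class BackreferenceDemo
{
    // Returns the first doubled word character pair, or null if there is none.
    public static string FirstDoubled(string input)
    {
        Match m = Regex.Match(input, @"(?<char>\w)\k<char>");
        return m.Success ? m.Value : null;
    }

    // Returns true when 'word' occurs as a whole word, not as part of a longer word.
    public static bool HasWholeWord(string input, string word)
    {
        return Regex.IsMatch(input, @"\b" + Regex.Escape(word) + @"\b");
    }

    public static void Main()
    {
        Console.WriteLine(FirstDoubled("balloon"));             // ll
        Console.WriteLine(HasWholeWord("the cat sat", "cat"));  // True
        Console.WriteLine(HasWholeWord("concatenate", "cat"));  // False
    }
}
```

Regex.Escape is used so that any special characters in the search word are treated literally.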

8 String Start and End Reference
    Input    ^abc     abc$
    abc      true     true
    yzabc    false    true
    abcde    true     false

9 Wildcards
    Input    to*n     to+n
    ton      true     true
    tooon    true     true
    tn       true     false

10 Wildcards
    Input     to{3}n    to{1,3}n    to{3,}n
    tn        false     false       false
    ton       false     true        false
    tooon     true      true        true
    toooon    false     false       true

11 Wildcards
    Input    to?n     to.n
    tn       true     false
    ton      true     false
    tooon    false    false
    totn     false    true
    tojn     false    true

12 Wildcards
    Input    to[ro]n
    toon     true
    torn     true
    ton      false
    toron    false
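The true/false results in the wildcard tables appear to assume that the pattern must cover the entire input. Regex.IsMatch on its own also succeeds when the pattern is found anywhere inside the input, so the sketch below wraps each pattern in ^(?:...)$ first. The helper name WholeMatch.Matches is hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper: tests whether a pattern matches the whole input.
public static class WholeMatch
{
    public static bool Matches(string pattern, string input)
    {
        // The non-capturing group keeps the anchors outside any alternation.
        return Regex.IsMatch(input, "^(?:" + pattern + ")$");
    }

    public static void Main()
    {
        Console.WriteLine(Matches("to*n", "tn"));        // True
        Console.WriteLine(Matches("to+n", "tn"));        // False
        Console.WriteLine(Matches("to{1,3}n", "ton"));   // True
        Console.WriteLine(Matches("to?n", "totn"));      // False
        Console.WriteLine(Matches("to.n", "totn"));      // True

        // Without anchors, "tn" is found inside "totn".
        Console.WriteLine(Regex.IsMatch("totn", "to?n")); // True
    }
}
```

The last line shows why the anchors matter: the unanchored call reports a match for an input that the table lists as false.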

13 Groups
    Input              foo(loo){1,3}hoo    foo(loo|roo)hoo
    fooloohoo          true                true
    fooloolooloohoo    true                false
    foohoo             false               false
    foololohoo         false               false
    fooroohoo          false               true
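A small sketch of grouping with a quantifier and with alternation. It also shows one detail worth checking when writing alternations: an empty alternative, as in (loo|roo|), makes the whole group optional. The class name GroupDemo and its method are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical demo class for groups, quantified groups, and alternation.
public static class GroupDemo
{
    // Whole-input test, so partial matches do not count.
    public static bool Matches(string pattern, string input)
    {
        return Regex.IsMatch(input, "^(?:" + pattern + ")$");
    }

    public static void Main()
    {
        Console.WriteLine(Matches("foo(loo){1,3}hoo", "fooloolooloohoo")); // True
        Console.WriteLine(Matches("foo(loo){1,3}hoo", "foololohoo"));      // False
        Console.WriteLine(Matches("foo(loo|roo)hoo", "fooroohoo"));        // True
        Console.WriteLine(Matches("foo(loo|roo)hoo", "foohoo"));           // False

        // The trailing "|" adds an empty alternative, so the group may match nothing.
        Console.WriteLine(Matches("foo(loo|roo|)hoo", "foohoo"));          // True
    }
}
```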

14 How to Extract Matched Data
    string input = "Company Name: Contoso, Inc.";
    Match m = Regex.Match(input, @"Company Name: (.*$)");
    Console.WriteLine(m.Groups[1]); // Displays: Contoso, Inc.

15
    String Extension(String url)
    {
        // Named groups: "proto" captures the scheme, "port" the optional ":port" part
        Regex r = new Regex(@"^(?<proto>\w+)://[^/]+?(?<port>:\d+)?/", RegexOptions.Compiled);
        return r.Match(url).Result("${proto}${port}");
    }
    // Input:  http://www.contoso.com:8080/letters/readme.html
    // Output: http:8080
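Named groups can also be read one at a time through Match.Groups, which makes it possible to tell whether an optional group took part in the match. The sketch below reuses the URL pattern with the group names proto and port; the class and method names (UrlParts, Protocol, Port) are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper: reads named groups individually from a URL match.
public static class UrlParts
{
    private static readonly Regex UrlPattern =
        new Regex(@"^(?<proto>\w+)://[^/]+?(?<port>:\d+)?/");

    // Returns the scheme, or null when the URL does not match.
    public static string Protocol(string url)
    {
        Match m = UrlPattern.Match(url);
        return m.Success ? m.Groups["proto"].Value : null;
    }

    // Returns the port digits, or null when no port is present.
    public static string Port(string url)
    {
        Match m = UrlPattern.Match(url);
        Group port = m.Groups["port"];
        return port.Success ? port.Value.TrimStart(':') : null;
    }

    public static void Main()
    {
        string url = "http://www.contoso.com:8080/letters/readme.html";
        Console.WriteLine(Protocol(url)); // http
        Console.WriteLine(Port(url));     // 8080
    }
}
```

Checking Group.Success distinguishes "no port in the URL" from an empty capture.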

16 How to Replace Substrings Using Regular Expressions
    String MDYToDMY(String input)
    {
        // Capture month, day, and year by name, then emit them in day-month-year order
        return Regex.Replace(input,
            "\\b(?<month>\\d{1,2})/(?<day>\\d{1,2})/(?<year>\\d{2,4})\\b",
            "${day}-${month}-${year}");
    }
    // From: “Today is 03/06/09”
    // To: “Today is 06-03-09”
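When the replacement needs more than reordering, Regex.Replace also accepts a MatchEvaluator delegate that builds each replacement in code. The following sketch is an illustration under stated assumptions, not part of the slides: it converts month/day/year dates with a four-digit year to year-month-day and pads single digits. The names DateRewriter and MdyToIso are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper: rewrites M/D/YYYY dates as YYYY-MM-DD.
public static class DateRewriter
{
    public static string MdyToIso(string input)
    {
        return Regex.Replace(
            input,
            @"\b(?<month>\d{1,2})/(?<day>\d{1,2})/(?<year>\d{4})\b",
            // The lambda is converted to a MatchEvaluator and runs once per match.
            m => string.Format("{0}-{1}-{2}",
                m.Groups["year"].Value,
                m.Groups["month"].Value.PadLeft(2, '0'),
                m.Groups["day"].Value.PadLeft(2, '0')));
    }

    public static void Main()
    {
        Console.WriteLine(MdyToIso("Due 3/6/2009.")); // Due 2009-03-06.
    }
}
```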

17 Summary
■ Regular expressions enable you to determine whether text matches almost any type of format. Regular expressions support dozens of special characters and operators. The most commonly used are “^” to match the beginning of a string, “$” to match the end of a string, “?” to make a character optional, “.” to match any character, and “*” to match a repeated character.
■ To match data using a regular expression, create a pattern using groups to specify the data you need to extract, call Regex.Match to create a Match object, and then examine each of the items in the Match.Groups collection.
■ To reformat text data using a regular expression, call the static Regex.Replace method.

18 ENCODING AND DECODING

19
ASCII (American Standard Code for Information Interchange): values 0–127, sufficient for English-language communication.
ANSI/ISO (American National Standards Institute / International Organization for Standardization): values 128–255, national code pages.
E-mail:
    Content-Type: text/plain; charset=ISO-8859-1
    Content-Type: text/plain; charset="Windows-1251"
Web page:
“ISO-8859-1” corresponds to code page 28591, “Western European (ISO)”.
“Windows-1251” corresponds to code page 1251, which covers languages that use the Cyrillic alphabet, such as Russian and Bulgarian.
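The difference between the 0–127 range and a national code page can be seen by encoding one accented character. This is a minimal sketch that looks up code page 28591 (ISO-8859-1) by number; the class and method names are hypothetical. On .NET Core and later, code pages such as 1251 need an additional encoding provider to be registered, so only 28591 is used here.

```csharp
using System;
using System.Text;

// Hypothetical demo: the same character in ASCII and in ISO-8859-1.
public static class CodePageDemo
{
    public static byte[] ToLatin1(string text)
    {
        Encoding latin1 = Encoding.GetEncoding(28591); // ISO-8859-1
        return latin1.GetBytes(text);
    }

    public static byte[] ToAscii(string text)
    {
        // Characters outside 0-127 are replaced with '?' (0x3F) by default.
        return Encoding.ASCII.GetBytes(text);
    }

    public static void Main()
    {
        string text = "\u00E9"; // e with acute accent
        Console.WriteLine(ToLatin1(text)[0]); // 233
        Console.WriteLine(ToAscii(text)[0]);  // 63
    }
}
```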

20 www.unicode.org
Unicode is one large code page that covers everything.
Unicode encodings:
■ Unicode UTF-32 encoding
■ Unicode UTF-16 encoding
■ Unicode UTF-8 encoding
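The three Unicode encodings represent the same characters with different numbers of bytes. A short sketch using Encoding.GetByteCount to compare them; the class name UnicodeSizes and its method are hypothetical.

```csharp
using System;
using System.Text;

// Hypothetical demo: byte counts of one string in the three Unicode encodings.
public static class UnicodeSizes
{
    // Returns the byte counts as { UTF-8, UTF-16, UTF-32 }.
    public static int[] ByteCounts(string text)
    {
        return new int[]
        {
            Encoding.UTF8.GetByteCount(text),
            Encoding.Unicode.GetByteCount(text), // Encoding.Unicode is UTF-16 little endian
            Encoding.UTF32.GetByteCount(text)
        };
    }

    public static void Main()
    {
        int[] counts = ByteCounts("Hello");
        Console.WriteLine("UTF-8: {0}, UTF-16: {1}, UTF-32: {2}",
            counts[0], counts[1], counts[2]); // UTF-8: 5, UTF-16: 10, UTF-32: 20
    }
}
```

GetByteCount does not include a byte order mark, so the numbers reflect the characters only.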

21 Using the Encoding Class
    // Get Korean encoding
    Encoding e = Encoding.GetEncoding("Korean");
    // Convert the string to bytes in the Korean encoding
    byte[] encoded;
    encoded = e.GetBytes("Hello, world!");
    // Display the byte codes
    for (int i = 0; i < encoded.Length; i++)
        Console.WriteLine("Byte {0}: {1}", i, encoded[i]);
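To convert bytes from one encoding to another, the Encoding class offers the static Encoding.Convert method. A minimal sketch, assuming UTF-8 input that should become UTF-16; the names EncodingConversion and Utf8ToUtf16 are hypothetical.

```csharp
using System;
using System.Text;

// Hypothetical demo: converting a byte array between two encodings.
public static class EncodingConversion
{
    public static byte[] Utf8ToUtf16(byte[] utf8Bytes)
    {
        // Convert decodes with the source encoding and re-encodes with the destination.
        return Encoding.Convert(Encoding.UTF8, Encoding.Unicode, utf8Bytes);
    }

    public static void Main()
    {
        byte[] utf8 = Encoding.UTF8.GetBytes("Hello, world!");
        byte[] utf16 = Utf8ToUtf16(utf8);
        Console.WriteLine("{0} bytes -> {1} bytes", utf8.Length, utf16.Length); // 13 bytes -> 26 bytes
        Console.WriteLine(Encoding.Unicode.GetString(utf16));                   // Hello, world!
    }
}
```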

22 How to Examine Supported Code Pages
    EncodingInfo[] ei = Encoding.GetEncodings();
    foreach (EncodingInfo e in ei)
        Console.WriteLine("{0}: {1}, {2}", e.CodePage, e.Name, e.DisplayName);

23 How to Specify the Encoding Type when Writing a File
    // Note: Encoding.UTF7 is marked obsolete in .NET 5 and later
    StreamWriter swUtf7 = new StreamWriter("utf7.txt", false, Encoding.UTF7);
    swUtf7.WriteLine("Hello, World!");
    swUtf7.Close();
    StreamWriter swUtf8 = new StreamWriter("utf8.txt", false, Encoding.UTF8);
    swUtf8.WriteLine("Hello, World!");
    swUtf8.Close();
    StreamWriter swUtf16 = new StreamWriter("utf16.txt", false, Encoding.Unicode);
    swUtf16.WriteLine("Hello, World!");
    swUtf16.Close();
    StreamWriter swUtf32 = new StreamWriter("utf32.txt", false, Encoding.UTF32);
    swUtf32.WriteLine("Hello, World!");
    swUtf32.Close();

24 How to Specify the Encoding Type when Reading a File
    string fn = "file.txt";
    StreamWriter sw = new StreamWriter(fn, false, Encoding.UTF7);
    sw.WriteLine("Hello, World!");
    sw.Close();
    StreamReader sr = new StreamReader(fn, Encoding.UTF7);
    Console.WriteLine(sr.ReadToEnd());
    sr.Close();
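The write-then-read pattern can be wrapped in using blocks so the file is closed even if an exception occurs. This sketch uses a temporary file and takes the encoding as a parameter; the names EncodedFileDemo and RoundTrip are hypothetical.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical demo: writes text with a given encoding and reads it back.
public static class EncodedFileDemo
{
    public static string RoundTrip(string text, Encoding encoding)
    {
        string path = Path.GetTempFileName();
        try
        {
            // false = overwrite the file rather than append to it
            using (StreamWriter sw = new StreamWriter(path, false, encoding))
            {
                sw.Write(text);
            }
            using (StreamReader sr = new StreamReader(path, encoding))
            {
                return sr.ReadToEnd();
            }
        }
        finally
        {
            File.Delete(path); // clean up the temporary file
        }
    }

    public static void Main()
    {
        string text = "Gr\u00FC\u00DFe"; // contains non-ASCII characters
        Console.WriteLine(RoundTrip(text, Encoding.UTF8) == text);  // True
        Console.WriteLine(RoundTrip(text, Encoding.UTF32) == text); // True
    }
}
```

Passing the same encoding to the reader and the writer is what guarantees that the non-ASCII characters survive the round trip.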

25 Summary
■ Encoding standards map byte values to characters. ASCII is one of the oldest, most widespread encoding standards; however, it provides very limited support for non-English languages. Today, various Unicode encoding standards provide multilingual support.
■ The System.Text.Encoding class provides static methods for encoding and decoding text.
■ Call Encoding.GetEncodings to retrieve a list of supported code pages.
■ To specify the encoding type when writing a file, use an overloaded StreamWriter constructor that accepts an Encoding object.
■ You do not typically need to specify an encoding type when reading a file. However, you can specify an encoding type by using an overloaded StreamReader constructor that accepts an Encoding object.

26 Your Key Competences
■ Use regular expressions to determine whether a string matches a specific pattern.
■ Use regular expressions to extract data from a text file.
■ Use regular expressions to reformat text data.
■ Describe the importance of encoding, and list common encoding standards.
■ Use the Encoding class to specify encoding formats, and convert between encoding standards.
■ Programmatically determine which code pages the .NET Framework supports.
■ Create files using a specific encoding format.
■ Read files using unusual encoding formats.

27 Key Terms
■ code page
■ regular expression
■ Unicode

