
1 Tools for Text Review

2 Algorithms
The heart of computer science
Definition: A finite sequence of instructions with the properties that
- Each instruction is well-defined
- Each instruction can be completed in finite time
- The process terminates

3 Algorithms (2)
Examples:
- Linear search
- Binary search
- Boyer-Moore
- Shift table for Boyer-Moore
- Huffman code
- Grammatical rules

4 Algorithms (3)
Issues:
- Preprocessing
- Efficiency
- Notation

5 Abstract Data Types
Abstraction of common ideas or features of computer systems
Definition: A set of objects and a collection of operations on those objects

6 Abstract Data Types (2)
Examples:
- Strings of characters, with operations first, last, head, tail, concat, substr, match, in
- Trees (rooted, unordered), with operations insert node, delete node, localize
- Networks, with operations taken from a web browser: forward, back, home, go

7 ADT Tree
Definition: A set of nodes and a set of links connecting pairs of nodes such that
- No node is linked to itself (no loops)
- One node is designated as the root
- No pair of nodes is joined by more than one link (no superhighways)
- There is a unique path from any node to any other node (no cycles)
Shows hierarchy
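The tree conditions above can be checked mechanically. This is a minimal sketch, assuming links are undirected pairs and that the caller names the root; the helper name `is_rooted_tree` is hypothetical. It uses the standard fact that a connected graph on n nodes with exactly n - 1 links has no cycles, so every pair of nodes is joined by a unique path.

```python
def is_rooted_tree(nodes, links, root):
    """Hypothetical helper: check the slide's tree conditions for undirected links."""
    nodes = set(nodes)
    if root not in nodes:
        return False  # one node must be designated as the root
    neighbours = {n: [] for n in nodes}
    seen = set()
    for a, b in links:
        if a not in nodes or b not in nodes:
            return False  # every link must join two nodes of the set
        if a == b:
            return False  # no loops
        pair = frozenset((a, b))
        if pair in seen:
            return False  # no superhighways (duplicate links)
        seen.add(pair)
        neighbours[a].append(b)
        neighbours[b].append(a)
    # n nodes, n - 1 links, and connected  =>  no cycles, unique paths.
    if len(links) != len(nodes) - 1:
        return False
    reached, stack = {root}, [root]
    while stack:  # depth-first walk from the root
        for nxt in neighbours[stack.pop()]:
            if nxt not in reached:
                reached.add(nxt)
                stack.append(nxt)
    return reached == nodes

print(is_rooted_tree(["A", "B", "C", "D"], [("A", "B"), ("A", "C"), ("C", "D")], "A"))  # True
print(is_rooted_tree(["A", "B", "C"], [("A", "B"), ("B", "C"), ("C", "A")], "A"))       # False (cycle)
```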

8 ADT Network
Definition: A set of nodes and a set of links connecting pairs of nodes such that
- No node is linked to itself (no loops)
- Each link has a direction
- No pair of nodes is joined by more than one link in the same direction (no superhighways)
Sources and sinks
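A network can be stored as a list of directed (from, to) links. The sketch below, with hypothetical helper names, enforces the no-loop and no-superhighway rules, counts out-degree and in-degree for each node, and picks out sources (no incoming links) and sinks (no outgoing links). The four-page "web" is made-up example data.

```python
from collections import Counter

def degree_table(nodes, links):
    """Map each node to (out-degree, in-degree) for directed (from, to) links."""
    if any(a == b for a, b in links):
        raise ValueError("no node may be linked to itself")
    if len(set(links)) != len(links):
        raise ValueError("at most one link per direction between two nodes")
    out_deg = Counter(a for a, _ in links)
    in_deg = Counter(b for _, b in links)
    return {n: (out_deg[n], in_deg[n]) for n in nodes}

def sources_and_sinks(nodes, links):
    """Sources have no incoming links; sinks have no outgoing links."""
    table = degree_table(nodes, links)
    sources = sorted(n for n, (_, in_d) in table.items() if in_d == 0)
    sinks = sorted(n for n, (out_d, _) in table.items() if out_d == 0)
    return sources, sinks

pages = ["home", "a", "b", "c"]
links = [("home", "a"), ("home", "b"), ("a", "c"), ("b", "c")]
print(degree_table(pages, links))       # {'home': (2, 0), 'a': (1, 1), 'b': (1, 1), 'c': (0, 2)}
print(sources_and_sinks(pages, links))  # (['home'], ['c'])
```

Note that an isolated node (no links at all) counts as both a source and a sink under these definitions.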

9 ADT Binary Tree
Definition: A (rooted) tree with the properties that
- Each node has either 0, 1 or 2 child nodes
- The child nodes are ordered (usually called left and right)
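A binary tree node needs only a value and two ordered child slots. This sketch (class and function names are illustrative) also computes two of the measures from the next slide. It assumes height is counted in links, so a single node has height 0; some texts count nodes instead. The in-order walk shows why child order matters: swapping left and right changes the result.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Node:
    value: str
    left: Optional["Node"] = None   # left child (order matters)
    right: Optional["Node"] = None  # right child

def degree(node: Node) -> int:
    """Number of children: 0, 1 or 2."""
    return (node.left is not None) + (node.right is not None)

def height(node: Optional[Node]) -> int:
    """Longest root-to-leaf path counted in links; a single node has height 0."""
    if node is None:
        return -1
    return 1 + max(height(node.left), height(node.right))

def in_order(node: Optional[Node]) -> List[str]:
    """Left subtree, then the node, then the right subtree."""
    if node is None:
        return []
    return in_order(node.left) + [node.value] + in_order(node.right)

root = Node("B", Node("A"), Node("C", right=Node("D")))
print(degree(root), height(root), in_order(root))  # 2 2 ['A', 'B', 'C', 'D']
```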

10 Measures of ADTs
- Strings: length
- Trees: degree of a node, level of a node, height of the tree
- Networks: degree of a node = out-degree + in-degree
- Arrays: dimension, size
- In general: counts

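11 Data structures
Ways of storing information
- Array: an indexed set of values
- ASCII-coded character
  - Stored in 1 byte = 8 bits
  - 8 bits allow 2^8 = 256 possible values (standard ASCII defines 128 of them; the eighth bit leaves room for extended character sets)
  - Expressed in hexadecimal notation (two hex digits per byte)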
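To see the byte and hexadecimal notation in practice, the sketch below prints each character's code as two hex digits and as eight bits. It restricts itself to 7-bit ASCII, matching the 128 standard codes; the function names are illustrative.

```python
def ascii_hex(text: str) -> str:
    """Two hex digits per character, one character per byte."""
    codes = []
    for ch in text:
        code = ord(ch)
        if code > 0x7F:
            raise ValueError(f"{ch!r} is outside 7-bit ASCII")
        codes.append(format(code, "02X"))
    return " ".join(codes)

def ascii_bits(ch: str) -> str:
    """The same code written out as 8 bits."""
    return format(ord(ch), "08b")

print(ascii_hex("Hi!"))  # 48 69 21
print(ascii_bits("H"))   # 01001000
```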

12 Arrays
- Nonpositional binary digram array
- Positional binary digram array
- Boyer-Moore shift table
- ASCII code chart
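A shift table is preprocessing done once on the pattern so the search can skip ahead. The sketch below uses the Horspool simplification of Boyer-Moore (bad-character shifts only, keyed on the text character under the pattern's last position); full Boyer-Moore also uses a good-suffix table, which is not shown. A dict stands in for an array indexed by character, and characters missing from the table shift by the full pattern length.

```python
def shift_table(pattern: str) -> dict:
    """Distance from the last occurrence of each character (excluding the final
    position) to the end of the pattern."""
    m = len(pattern)
    table = {}
    for i, ch in enumerate(pattern[:-1]):
        table[ch] = m - 1 - i  # later occurrences overwrite earlier ones
    return table

def horspool_search(text: str, pattern: str) -> int:
    """Index of the first match, or -1. Compares right to left."""
    m, n = len(pattern), len(text)
    if m == 0:
        return 0
    table = shift_table(pattern)
    pos = 0
    while pos <= n - m:
        j = m - 1
        while j >= 0 and text[pos + j] == pattern[j]:
            j -= 1
        if j < 0:
            return pos
        pos += table.get(text[pos + m - 1], m)  # unseen character: jump by m
    return -1

print(shift_table("TOOTH"))                                # {'T': 1, 'O': 2}
print(horspool_search("TRUSTHARDTOOTHBRUSHES", "TOOTH"))   # 9
```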

13 Text Structure
- Characters: letters, digits, alphanumeric, white space, punctuation
- Words: with or without punctuation
- Lines
- Sentences
- Paragraphs
- Files
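The levels of text structure can be counted with simple rules. This is a rough sketch with hypothetical rules: a word is a run of letters, digits or apostrophes; a sentence ends at `.`, `!` or `?` followed by white space or the end of text; paragraphs are separated by blank lines. Real text (abbreviations, decimals, quotes) will break these heuristics.

```python
import re
import string

def text_structure(text: str) -> dict:
    """Naive counts at each level of text structure."""
    return {
        "letters": sum(ch.isalpha() for ch in text),
        "digits": sum(ch.isdigit() for ch in text),
        "white space": sum(ch.isspace() for ch in text),
        "punctuation": sum(ch in string.punctuation for ch in text),
        "words": len(re.findall(r"[A-Za-z0-9']+", text)),
        "lines": len(text.splitlines()),
        "sentences": len(re.findall(r"[.!?]+(?=\s|$)", text)),
        "paragraphs": len([p for p in re.split(r"\n\s*\n", text) if p.strip()]),
    }

sample = "Tools help.\nThey check text!\n\nA new paragraph starts here."
print(text_structure(sample))
```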

14 Tools for Text
- Searching
- Spell checking
- Grammar checking
- Displaying
- Encrypting
- Compressing

15 Searching
- Alphabet
- Set of strings
- Wildcard notation
  - * matches 0 or more characters
  - ? matches exactly one character
  - [ ] matches exactly one character from the finite set listed inside the brackets, e.g. [aeiou]
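Python's standard `fnmatch` module implements this wildcard notation (`*`, `?`, `[...]`) for whole-string matching, so it can serve as a quick illustration. `fnmatchcase` is the case-sensitive variant; the module also supports `[!...]` for "any character not in the set", which goes beyond the slide. The word list is made up.

```python
from fnmatch import fnmatchcase

def wildcard_search(words, pattern):
    """Keep the words that match the whole wildcard pattern."""
    return [w for w in words if fnmatchcase(w, pattern)]

words = ["act", "cart", "cat", "coat", "cot", "cut"]
print(wildcard_search(words, "c?t"))     # ['cat', 'cot', 'cut']
print(wildcard_search(words, "c*t"))     # ['cart', 'cat', 'coat', 'cot', 'cut']
print(wildcard_search(words, "c[ao]t"))  # ['cat', 'cot']
```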

16 Searching (2)
- Linear search
  - Ordered vs. unordered list
- Binary search
  - Efficiency compared to linear search
- Indexed search
  - Modeled on thumb tabs
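The efficiency difference can be made concrete by counting probes. In this sketch (names illustrative), both functions return the index found and the number of items examined. On a sorted list of 1000 items, finding the last one takes 1000 probes linearly but only 10 with binary search, in line with roughly log2(n) halvings.

```python
def linear_search(items, target):
    """Scan left to right; works on ordered or unordered lists."""
    probes = 0
    for i, item in enumerate(items):
        probes += 1
        if item == target:
            return i, probes
    return -1, probes

def binary_search(sorted_items, target):
    """Halve the search range each probe; requires an ordered list."""
    lo, hi = 0, len(sorted_items) - 1
    probes = 0
    while lo <= hi:
        mid = (lo + hi) // 2
        probes += 1
        if sorted_items[mid] == target:
            return mid, probes
        if sorted_items[mid] < target:
            lo = mid + 1
        else:
            hi = mid - 1
    return -1, probes

items = list(range(1000))
print(linear_search(items, 999))  # (999, 1000)
print(binary_search(items, 999))  # (999, 10)
```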

17 Spell Checking
- Detection
- Correction
- N-gram analysis
- Edit distance
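Detection and correction can each be sketched in a few lines. Here detection flags a word containing a digram (2-gram) never seen in the dictionary, and correction suggests dictionary words within a small Levenshtein edit distance (insertions, deletions, substitutions; transpositions count as two edits). The tiny dictionary and the `max_distance` default are assumptions for illustration.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance via dynamic programming, one row at a time."""
    prev = list(range(len(b) + 1))  # distance from "" to each prefix of b
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                # delete ca
                            curr[j - 1] + 1,            # insert cb
                            prev[j - 1] + (ca != cb)))  # substitute or match
        prev = curr
    return prev[-1]

def digrams(word: str) -> set:
    return {word[i:i + 2] for i in range(len(word) - 1)}

def has_unknown_digram(word: str, known: set) -> bool:
    """Detection: any digram absent from the dictionary suggests a misspelling."""
    return bool(digrams(word) - known)

def suggest(word, dictionary, max_distance=1):
    """Correction: dictionary words within max_distance, closest first."""
    scored = sorted((edit_distance(word, w), w) for w in dictionary)
    return [w for d, w in scored if d <= max_distance]

dictionary = ["tea", "ten", "the", "then", "toe"]
known = set().union(*(digrams(w) for w in dictionary))
print(has_unknown_digram("teh", known))  # True ("eh" never occurs)
print(suggest("teh", dictionary))        # ['tea', 'ten']
```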

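18 Grammar Checking
- You: right; Checker says: right (nothing flagged). Action: none
- You: right; Checker says: wrong (false alarm). Action: ignore
- You: wrong; Checker says: right (error missed). Action: a difficulty, since nothing prompts a correction
- You: wrong; Checker says: wrong (error flagged). Action: make correction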

19 Displaying
- Markup
  - HTML tag:
    - Generic identifier
    - Attribute: name = "value"
  - Shows hierarchy of content
- Fonts
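To see how tags expose the hierarchy of content, the sketch below uses Python's standard `html.parser.HTMLParser` to print an indented outline: each tag's generic identifier with its `name="value"` attributes, and the text nested inside. It assumes every element has an explicit end tag; void elements such as `<br>` would throw off the indentation. The sample markup is made up.

```python
from html.parser import HTMLParser

class OutlineParser(HTMLParser):
    """Record tags and text as an indented outline of the content hierarchy."""

    def __init__(self):
        super().__init__()
        self.depth = 0
        self.lines = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value); value is None for bare attributes
        parts = [name if value is None else f'{name}="{value}"' for name, value in attrs]
        self.lines.append("  " * self.depth + " ".join([tag] + parts))
        self.depth += 1

    def handle_endtag(self, tag):
        self.depth -= 1

    def handle_data(self, data):
        if data.strip():
            self.lines.append("  " * self.depth + repr(data.strip()))

parser = OutlineParser()
parser.feed('<body><h1 class="title">Tools</h1><p>Text <em>review</em></p></body>')
parser.close()
print("\n".join(parser.lines))
```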

20 Encrypting
- Character based
  - Shift
  - Monoalphabetic substitution (cryptograms)
  - Polyalphabetic substitution
- Numerically based
  - PGP: pretty good privacy
  - Public key encryption
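The three character-based schemes can be compared side by side. This sketch covers only the character-based branch: a shift cipher, a monoalphabetic substitution driven by a 26-letter cipher alphabet (a shift is the special case where that alphabet is rotated), and Vigenère as one example of polyalphabetic substitution. Function names are illustrative; non-letters pass through unchanged.

```python
import string

def shift_encrypt(text: str, key: int) -> str:
    """Shift each letter key places around the alphabet, keeping case."""
    out = []
    for ch in text:
        if ch in string.ascii_uppercase:
            out.append(chr((ord(ch) - ord("A") + key) % 26 + ord("A")))
        elif ch in string.ascii_lowercase:
            out.append(chr((ord(ch) - ord("a") + key) % 26 + ord("a")))
        else:
            out.append(ch)
    return "".join(out)

def substitution_encrypt(text: str, cipher_alphabet: str) -> str:
    """Monoalphabetic: one fixed replacement letter for each plain letter."""
    table = str.maketrans(string.ascii_lowercase, cipher_alphabet)
    return text.lower().translate(table)

def vigenere_encrypt(text: str, keyword: str) -> str:
    """Polyalphabetic: the shift changes with each letter, cycling through keyword."""
    shifts = [ord(k) - ord("a") for k in keyword.lower()]
    out, i = [], 0
    for ch in text.lower():
        if ch in string.ascii_lowercase:
            out.append(shift_encrypt(ch, shifts[i % len(shifts)]))
            i += 1
        else:
            out.append(ch)
    return "".join(out)

print(shift_encrypt("Attack at dawn", 3))              # Dwwdfn dw gdzq
rotated = string.ascii_lowercase[3:] + string.ascii_lowercase[:3]
print(substitution_encrypt("attack at dawn", rotated))  # dwwdfn dw gdzq
print(vigenere_encrypt("attackatdawn", "lemon"))        # lxfopvefrnhr
```

Decrypting a shift is the same operation with the negated key, for example `shift_encrypt(ciphertext, -3)`.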

21 Compressing
- Frequency-based
- Example: Huffman coding
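Huffman coding gives frequent characters short codes by repeatedly merging the two least frequent subtrees. This sketch keeps each subtree as a dict of partial codes and prefixes "0" or "1" on each merge; the counter is only a tie-breaker so the heap never compares dicts. Ties may produce different but equally short codes. For "abracadabra" the encoding takes 23 bits, versus 88 bits at one 8-bit byte per character.

```python
import heapq
from collections import Counter
from itertools import count

def huffman_code(text: str) -> dict:
    """Map each character to a prefix-free bit string."""
    freq = Counter(text)
    if len(freq) <= 1:
        return {ch: "0" for ch in freq}  # 0 or 1 distinct symbols
    order = count()  # tie-breaker so equal frequencies never compare dicts
    heap = [(f, next(order), {ch: ""}) for ch, f in freq.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        f1, _, low = heapq.heappop(heap)   # two least frequent subtrees
        f2, _, high = heapq.heappop(heap)
        merged = {ch: "0" + c for ch, c in low.items()}
        merged.update({ch: "1" + c for ch, c in high.items()})
        heapq.heappush(heap, (f1 + f2, next(order), merged))
    return heap[0][2]

text = "abracadabra"
code = huffman_code(text)
bits = "".join(code[ch] for ch in text)
print(len(bits), "bits vs", 8 * len(text), "bits in 8-bit ASCII")  # 23 bits vs 88 bits in 8-bit ASCII
```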

