Data Structures and Algorithms


1 Data Structures and Algorithms
Hongfei Yan Feb. 24, 2016

2 Data Structure and Algorithm Analysis
Data structures: methods of organizing large amounts of data.
Algorithm analysis: the estimation of the running time of algorithms.

3 Data Structure and Algorithm Analysis
A data structure is a particular way of storing and organizing data in a computer so that it can be used efficiently, e.g., a B-tree for a database or a hash table for a compiler.
Algorithm analysis is the determination of the amount of resources (such as time and storage) necessary to execute an algorithm. Usually, the efficiency or running time of an algorithm is stated as a function relating the input length to the number of steps (time complexity) or storage locations (space complexity).

4 Humanity's Published Works (Kelly, Kevin (May 14, 2006). “Scan This Book!”)
Estimated to include at least 32 million books, 750 million articles, 25 million songs, 500,000 movies, 500 million images, 3 million videos, TV shows and short films, and 100 billion web pages.
Superstar (超星) has scanned 1.3 million Chinese-language books from more than 200 libraries, roughly half of all books published in Chinese since 1949.
Kelly, Kevin (May 14, 2006). “Scan This Book!”. New York Times Magazine.
But of all the statistics in Kelly’s New York Times Magazine piece, it’s the China-related ones that impressed me the most: Like many other functions in our global economy…the real work has been happening far away, while we sleep. We are outsourcing the scanning of the universal library. Superstar, an entrepreneurial company based in Beijing, has scanned every book from 900 university libraries in China. [A May 14 correction says the actual number is 200: libraries of all kinds, not merely the university variety.] It has already digitized 1.3 million unique titles in Chinese, which it estimates is about half of all the books published in the Chinese language since 1949. It costs $30 to scan a book at Stanford but only $10 in China.

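5 The Google Books Project
Started in December 2004 with the goal of scanning books and magazines, using character-recognition software to identify the characters, words, sentences, and paragraphs of the text and so turn digitized images into machine-readable text.
April 2013: 30 million books scanned.
2010: an estimated 130 million books exist worldwide.
November 2008: 7 million books digitized.
2007: 1 million books digitized.
September 2007: “My Library” released.
Google has said that it can scan 3,000 books a day. By March 2007 Google had digitized 1 million books, at a cost of about US$5 million according to a New York Times estimate. On October 28, 2008, Google said that 7 million books were searchable through the service, including those scanned by its 20,000 publisher partners. Of these 7 million books, 1 million are available in “full preview” under agreements with publishers, 1 million are in the public domain, and the remaining 5 million are out of print or commercially unavailable.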

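6 ReCaptcha and the Reuse of Data (Luis von Ahn)
Users are asked to read and type two words taken from scanned text that optical character recognition software could not decipher. One of the words has already been identified by other users, so the system can judge from the input whether the person registering is human. The other is a new word that still needs to be identified. To ensure accuracy, the system sends the same unclear word to five different people and accepts it as correct only once all of them have entered it correctly.
Here the primary use of the data is to prove that the user is human, but it also serves a second purpose: deciphering unclear words in digitized text.
All of this brought von Ahn, a Guatemalan whose family runs a candy factory, considerable recognition. It enabled him to join Carnegie Mellon University to teach computer science after earning his doctorate, and at 27 he received a US$500,000 MacArthur Foundation “genius grant”. But when he realized that every day so many people were wasting 10 seconds typing these annoying letters, and that the large amount of information they produced was then simply thrown away, he did not feel very clever. So he began looking for a way to put that human computing power to better use. He came up with a successor, aptly named ReCaptcha, which replaces the original random letters with the two-word task described above. ReCaptcha's value was recognized: in 2009 Google acquired von Ahn's company and applied the technology to its book-scanning project.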

7 Slides borrowed from “Data-Rich Computing: Where It’s At”, Hadoop Summit 2008, Phillip B. Gibbons, Intel Research Pittsburgh

8 Open Data
47 datasets in 2009, 120,000 by March 2015, and 194,800 by February 2016.

9 The Digital Universe in 2020: Big Data, Bigger Digital Shadows, and Biggest Growth in the Far East (IDC & EMC, December 2012)
Source: The Digital Universe in 2020: Big Data, Bigger Digital Shadows, and Biggest Growth in the Far East (IDC & EMC, December 2012)
The Digital Universe. In December 2012, IDC and EMC estimated the size of the digital universe (that is, all the digital data created, replicated and consumed in that year) to be 2,837 exabytes (EB) and forecast this to grow to 40,000EB by 2020, a doubling time of roughly two years. One exabyte equals a thousand petabytes (PB), or a million terabytes (TB), or a billion gigabytes (GB). So by 2020, according to IDC and EMC, the digital universe will amount to over 5,200GB per person on the planet. In 2012 the US and Western Europe still accounted for over half (51%) of the digital universe, but by 2020 IDC and EMC estimate that 62 percent will be attributable to emerging markets, with China alone accounting for 21 percent.
Big data. Not all of the myriad streams of data generated by and about people (and, increasingly, things) in this digital universe will be actually or even potentially useful. According to IDC and EMC, some 33 percent of 2020's 40,000EB total (13,200EB) might be valuable if analysed. In 2012, the figure is 23 percent of the 2,837EB total (652EB), with only 3 percent (85EB) suitably tagged and just half a percent actually analysed. That still amounts to 14.185EB (14,185 petabytes, or 14.185 million terabytes): 'big data' in anyone's book, but a mere footprint on a vast and largely unexplored cosmos of information.

10 To promote the development of robust and reusable software
Take a consistent object-oriented viewpoint.
Data should be presented as being encapsulated with the methods that access and modify them.
Think of data objects as instances of an abstract data type (ADT), which includes a repertoire of methods for performing operations on data objects of this type.
There may be several different implementation strategies for a particular ADT; explore the relative pros and cons of these choices.

11 Desired outcomes
Have knowledge of the most common abstractions for data collections (e.g., stacks, queues, lists, trees, maps).
Understand algorithmic strategies for producing efficient realizations of common data structures.
Analyze algorithmic performance, both theoretically and experimentally, and recognize common trade-offs between competing strategies.
Wisely use existing data structures and algorithms found in modern programming language libraries.
Have experience working with concrete implementations for most foundational data structures and algorithms.
In support of the last goal, we present many example applications of data structures throughout the book, including the processing of file systems, matching of tags in structured formats such as HTML, simple cryptography, text frequency analysis, automated geometric layout, Huffman coding, DNA sequence alignment, and search engine indexing.

15 Prerequisites

16 Contents
01 Python Primer
02 Object-Oriented Programming
03 Algorithm Analysis
04 Recursion
05 Array-Based Sequences
06 Stacks, Queues, and Deques
07 Linked Lists
08 Trees
09 Priority Queues
10 Maps, Hash Tables, and Skip Lists
11 Search Trees
12 Sorting and Selection
13 Text Processing
14 Graph Algorithms
15 Memory Management and B-Trees

17 01 Python Primer
1.1 Python Overview
1.2 Objects in Python
1.3 Expressions, Operators, and Precedence
1.4 Control Flow
1.5 Functions
1.6 Simple Input and Output
1.7 Exception Handling
1.8 Iterators and Generators
1.9 Additional Python Conveniences
1.10 Scopes and Namespaces
1.11 Modules and the Import Statement

18 Python The Python programming language was originally developed by Guido van Rossum in the early 1990s. Python 2 was released in 2000 and Python 3 in 2008. Python is an interpreted language. Commands are executed through the Python interpreter. The interpreter receives a command, evaluates that command, and reports the result of the command. A programmer defines a series of commands in advance and saves those commands in a text file known as source code or a script. For Python, source code is conventionally stored in a file named with the .py suffix (e.g., demo.py).

19 An Example Program

20 Objects in Python Python is an object-oriented language and classes form the basis for all data types. Python’s built-in classes: the int class for integers, the float class for floating-point values, the str class for character strings.

21 Identifiers, Objects, and the Assignment Statement
The most important of all Python commands is an assignment statement: temperature = 98.6 This command establishes temperature as an identifier (also known as a name), and then associates it with the object expressed on the right-hand side of the equal sign, in this case a floating-point object with value 98.6.

22 Identifiers Identifiers in Python are case-sensitive, so temperature and Temperature are distinct names. Identifiers can be composed of almost any combination of letters, numerals, and underscore characters. An identifier cannot begin with a numeral, and there are 33 specially reserved words that cannot be used as identifiers (the exact count depends on the Python version; later Python 3 releases reserve a few more).
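The identifier rules above can be checked from within Python itself. The following is a minimal sketch; the helper name is_usable_identifier is hypothetical, while str.isidentifier and the keyword module are part of standard Python.

```python
import keyword

def is_usable_identifier(name):
    """Return True if name is lexically valid and not a reserved word."""
    return name.isidentifier() and not keyword.iskeyword(name)

# 'temperature' and 'Temperature' are both valid, and they are distinct names.
for name in ['temperature', 'Temperature', '_count2', '2fast', 'for']:
    print(name, is_usable_identifier(name))

# The number of reserved words depends on the interpreter version.
print(len(keyword.kwlist))
```

Printing len(keyword.kwlist) on the interpreter in use shows how many reserved words that particular version has.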

23 Types Python is a dynamically typed language, as there is no advance declaration associating an identifier with a particular data type. An identifier can be associated with any type of object, and it can later be reassigned to another object of the same (or different) type. Although an identifier has no declared type, the object to which it refers has a definite type. In our first example, the characters 98.6 are recognized as a floating-point literal, and thus the identifier temperature is associated with an instance of the float class having that value.

24 Objects The process of creating a new instance of a class is known as instantiation. To instantiate an object we usually invoke the constructor of a class: w = Widget() This assumes that the constructor does not require any parameters. If the constructor does require parameters, we might use a syntax such as w = Widget(a, b, c). Many of Python’s built-in classes support a literal form for designating new instances. For example, the command temperature = 98.6 results in the creation of a new instance of the float class.

25 Calling Methods Python supports traditional functions that are invoked with a syntax such as sorted(data), in which case data is a parameter sent to the function. Python’s classes may also define one or more methods (also known as member functions), which are invoked on a specific instance of a class using the dot (“.”) operator. For example, Python’s list class has a method named sort that can be invoked with a syntax such as data.sort(). This particular method rearranges the contents of the list so that they are sorted.
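The difference between the function call and the method call can be seen in a short sketch. The variable names are illustrative; sorted and list.sort are standard Python.

```python
data = [3, 1, 2]

# Function syntax: sorted(data) returns a new list and leaves data unchanged.
ordered = sorted(data)
print(ordered, data)

# Method syntax: data.sort() rearranges the list in place and returns None.
result = data.sort()
print(data, result)
```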

26 Built-In Classes A class is immutable if each object of that class has a fixed value upon instantiation that cannot subsequently be changed. For example, the float class is immutable.

27 The bool Class The bool class is used for logical (Boolean) values, and the only two instances of that class are expressed as the literals: True and False The default constructor, bool( ), returns False. Python allows the creation of a Boolean value from a nonboolean type using the syntax bool(foo) for value foo. The interpretation depends upon the type of the parameter. Numbers evaluate to False if zero, and True if nonzero. Sequences and other container types, such as strings and lists, evaluate to False if empty and True if nonempty.
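As a small illustration of these conversion rules, the sketch below applies bool to a few sample values chosen for this example.

```python
# Zero-like numbers and empty containers convert to False; everything else here is True.
values = [0, 7, 0.0, '', 'a', [], [0]]
truthiness = [bool(v) for v in values]
print(truthiness)

# With no argument, the constructor returns False.
default = bool()
print(default)
```

Note that [0] converts to True: the list is nonempty, even though its only element is zero.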

28 The int Class The int class is designed to represent integer values with arbitrary magnitude. Python automatically chooses the internal representation for an integer based upon the magnitude of its value. The integer constructor, int(), returns 0 by default. This constructor can also construct an integer value based upon an existing value of another type. For example, if f represents a floating-point value, the syntax int(f) produces the truncated value of f. For example, int(3.14) produces the value 3, while int(-3.9) produces the value -3. The constructor can also be used to parse a string that represents an integer. For example, the expression int('137') produces the integer value 137.
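The constructor behaviors described above can be tried directly. This is a minimal sketch with illustrative variable names.

```python
default = int()           # no argument: 0
truncated_up = int(3.14)  # fractional part dropped: 3
truncated_dn = int(-3.9)  # truncation is toward zero: -3
parsed = int('137')       # parse a string of digits: 137
big = 2 ** 100            # magnitude is not limited to a machine word

print(default, truncated_up, truncated_dn, parsed)
print(big)
```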

29 The float Class The float class is the floating-point type in Python.
The floating-point equivalent of an integral number, 2, can be expressed directly as 2.0. One other form of literal for floating-point values uses scientific notation. For example, the literal 6.022e23 represents the mathematical value 6.022×10^23. The constructor float() returns 0.0. When given a parameter, the constructor, float, returns the equivalent floating-point value: float(2) returns the floating-point value 2.0, and float('3.14') returns 3.14.

30 The list Class A list instance stores a sequence of objects, that is, a sequence of references (or pointers) to objects in the list. Elements of a list may be arbitrary objects (including the None object). Lists are array-based sequences and a list of length n has elements indexed from 0 to n−1 inclusive. Lists have the ability to dynamically expand and contract their capacities as needed. Python uses the characters [ ] as delimiters for a list literal. [] is an empty list. ['red', 'green', 'blue'] is a list containing three string instances. The list() constructor produces an empty list by default. The list constructor will accept any iterable parameter. list('hello') produces a list of individual characters, ['h', 'e', 'l', 'l', 'o'].
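A brief sketch of these list behaviors follows; the variable names and sample values are chosen for illustration.

```python
colors = ['red', 'green', 'blue']   # list literal
letters = list('hello')             # constructor accepts any iterable
empty = list()                      # empty list by default

colors.append('cyan')               # the list grows as needed
removed = colors.pop(0)             # and shrinks; removes and returns 'red'

# Valid indices run from 0 to len(colors) - 1.
first, last = colors[0], colors[len(colors) - 1]
print(letters, empty, removed, first, last)
```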

31 The tuple Class The tuple class provides an immutable (unchangeable) version of a sequence, which allows instances to have an internal representation that may be more streamlined than that of a list. Parentheses delimit a tuple. The empty tuple is () To express a tuple of length one as a literal, a comma must be placed after the element, but within the parentheses. For example, (17,) is a one-element tuple.
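The role of the trailing comma, and the immutability of tuples, can be illustrated with a short sketch (names are illustrative).

```python
single = (17,)       # one-element tuple
just_an_int = (17)   # without the comma this is a parenthesized int
empty = ()

print(type(single), type(just_an_int), len(empty))

# Tuples are immutable: item assignment raises TypeError.
try:
    single[0] = 18
    mutated = True
except TypeError:
    mutated = False
print(mutated)
```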

32 The str Class String literals can be enclosed in single quotes, as in 'hello', or double quotes, as in "hello". A string can also begin and end with three single or double quotes, which allows it to contain newlines.

33 The set Class Python’s set class represents a set, namely a collection of elements, without duplicates, and without an inherent order to those elements. Only instances of immutable types can be added to a Python set. Therefore, objects such as integers, floating-point numbers, and character strings are eligible to be elements of a set. The frozenset class is an immutable form of the set type itself. Python uses curly braces { and } as delimiters for a set, for example {17} or {'red', 'green', 'blue'}. The exception to this rule is that {} does not represent an empty set. Instead, the constructor set() returns an empty set.
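The following sketch illustrates the points above: duplicates collapse, {} is not a set, and a mutable object such as a list is rejected as an element. Variable names are illustrative.

```python
colors = {'red', 'green', 'blue', 'red'}   # the duplicate 'red' is stored once
empty_set = set()
braces = {}                                # this is an empty dict, not a set

print(len(colors), type(empty_set), type(braces))

# A frozenset is immutable, so it may itself be an element of a set.
nested = {frozenset(colors)}

# A list is mutable, so adding it as an element raises TypeError.
try:
    bad = {['a', 'list']}
    list_rejected = False
except TypeError:
    list_rejected = True
print(list_rejected)
```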

34 The dict Class Python’s dict class represents a dictionary, or mapping, from a set of distinct keys to associated values. Python implements a dict using an almost identical approach to that of a set, but with storage of the associated values. The literal form {} produces an empty dictionary. A nonempty dictionary is expressed using a comma-separated series of key:value pairs. For example, the dictionary {'ga': 'Irish', 'de': 'German'} maps 'ga' to 'Irish' and 'de' to 'German'. Alternatively, the constructor accepts a sequence of key-value pairs as a parameter, as in dict(pairs) with pairs = [('ga', 'Irish'), ('de', 'German')].
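Both ways of building the dictionary from the slide can be compared in a short sketch; the added 'fr' entry is an assumption made only for illustration.

```python
languages = {'ga': 'Irish', 'de': 'German'}   # literal form
pairs = [('ga', 'Irish'), ('de', 'German')]
from_pairs = dict(pairs)                      # constructor form

print(languages == from_pairs)
print(languages['ga'])

# Adding a new key-value pair (hypothetical extra entry).
languages['fr'] = 'French'
print(len(languages), 'fr' in languages)
```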

35 Expressions and Operators
Existing values can be combined into expressions using special symbols and keywords known as operators. The semantics of an operator depends upon the type of its operands. For example, when a and b are numbers, the syntax a + b indicates addition, while if a and b are strings, the operator + indicates concatenation.

36 Logical Operators Python supports the following keyword operators for Boolean values: not (unary negation), and (conditional and), or (conditional or). The and and or operators short-circuit, in that they do not evaluate the second operand if the result can be determined based on the value of the first operand.
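Short-circuiting is often used to guard an operation that would otherwise fail. A minimal sketch, with a hypothetical function name and sample values:

```python
def ratio_exceeds(numerator, denominator, limit):
    """Return True if numerator/denominator > limit, treating a zero denominator as False."""
    # If denominator is 0 the left operand is False, so the division never runs.
    return denominator != 0 and numerator / denominator > limit

print(ratio_exceeds(10, 2, 1))
print(ratio_exceeds(10, 0, 1))

# With or, the second operand is evaluated only when the first is falsy.
name = '' or 'anonymous'
print(name)
```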

37 Equality Operators Python supports the following operators to test two notions of equality: is (same identity), is not (different identity), == (equivalent), and != (not equivalent). The expression a is b evaluates to True precisely when identifiers a and b are aliases for the same object. The expression a == b tests a more general notion of equivalence.
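The two notions of equality can be told apart with two lists that have the same contents. A small sketch with illustrative names:

```python
a = [1, 2, 3]
b = a            # b is an alias for the same list object
c = [1, 2, 3]    # c is a distinct object with equal contents

same_object = a is b
distinct_object = a is c
equivalent = a == c
print(same_object, distinct_object, equivalent)

# A change made through one alias is visible through the other.
b.append(4)
print(a, c)
```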

38 Comparison Operators Data types may define a natural order via the following operators: < (less than), <= (less than or equal to), > (greater than), and >= (greater than or equal to). These operators have expected behavior for numeric types, and are defined lexicographically, and case-sensitively, for strings.

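39 Arithmetic Operators Python supports the following arithmetic operators: + (addition), - (subtraction), * (multiplication), / (true division), // (integer division), and % (the modulo operator). For addition, subtraction, and multiplication, if both operands have type int, then the result is an int; if one or both operands have type float, the result is a float. True division always produces a float. Integer division of two ints always produces an int, with the result rounded down to the floor of the quotient (which differs from truncation toward zero when the quotient is negative).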
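The result types and the rounding behavior of the two division operators can be checked with a few sample values chosen for this sketch.

```python
true_div = 7 / 2        # 3.5
exact_div = 6 / 3       # 2.0: true division yields a float even when exact
floor_pos = 7 // 2      # 3
floor_neg = -7 // 2     # -4: rounds toward negative infinity, not toward zero
remainder = -7 % 2      # 1: chosen so that (-7 // 2) * 2 + (-7 % 2) == -7
mixed = 7.0 // 2        # 3.0: a float operand gives a float result

print(true_div, exact_div, floor_pos, floor_neg, remainder, mixed)
```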

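40 Bitwise Operators Python provides the following bitwise operators for integers: ~ (bitwise complement), & (bitwise and), | (bitwise or), ^ (bitwise exclusive-or), << (shift bits left), and >> (shift bits right).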

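41 Sequence Operators Each of Python’s built-in sequence types (str, tuple, and list) supports the following operator syntaxes: s[j] (element at index j), s[start:stop] (slice including indices from start up to but not including stop), s[start:stop:step] (slice with a stride), s + t (concatenation), k * s (k repetitions of s), val in s (containment check), and val not in s (non-containment check).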

42 Sequence Comparisons Sequences define comparison operations based on lexicographic order, performing an element by element comparison until the first difference is found. For example, [5, 6, 9] < [5, 7] because of the entries at index 1.
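Lexicographic comparison can be explored with a few cases; the second and third comparisons below are additional examples chosen for this sketch.

```python
by_first_difference = [5, 6, 9] < [5, 7]   # decided at index 1, since 6 < 7
prefix_is_smaller = (1, 2) < (1, 2, 0)     # a proper prefix compares as smaller
case_sensitive = 'Zebra' < 'apple'         # uppercase letters order before lowercase

print(by_first_difference, prefix_is_smaller, case_sensitive)
```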

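43 Operators for Sets Sets and frozensets support the following operators: key in s and key not in s (containment checks), s1 == s2 and s1 != s2 (equivalence), s1 <= s2 and s1 < s2 (subset and proper subset), s1 >= s2 and s1 > s2 (superset and proper superset), s1 | s2 (union), s1 & s2 (intersection), s1 - s2 (difference), and s1 ^ s2 (symmetric difference).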

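44 Operators for Dictionaries
The supported operators for objects of type dict are as follows: d[key] (value associated with the given key), d[key] = value (set or reset the value associated with the key), del d[key] (remove the key and its value), key in d and key not in d (containment checks), and d1 == d2 and d1 != d2 (equivalence).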

45 Operator Precedence

46 Functions and Control Flow

47 Program Structure Common to all control structures, the colon character is used to delimit the beginning of a block of code that acts as a body for a control structure. If the body can be stated as a single executable statement, it can technically be placed on the same line, to the right of the colon. However, a body is more typically typeset as an indented block starting on the line following the colon. Python relies on the indentation level to designate the extent of that block of code, or any nested blocks of code within.
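The colon-and-indentation structure can be illustrated with a conditional. In this sketch the function name, thresholds, and labels are hypothetical.

```python
def describe(temperature):
    """Classify a temperature reading (illustrative thresholds)."""
    if temperature > 100.4:          # the colon opens the body
        label = 'fever'              # the indented block is the body
    elif temperature < 95.0:
        label = 'low'
    else:
        label = 'normal'
    return label                     # back at function level: outside the if

# A single-statement body may share a line with its header, though this is less common.
reading = 98.6
if reading > 0: print(describe(reading))
```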

48 Conditionals

49 Loops Python provides a while loop, a for loop, and an index-based form of the for loop.
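The three loop forms can be sketched as follows. The data and the tasks (summing, finding the index of the maximum) are illustrative choices, not the slide's original examples.

```python
data = [4, 8, 15]

# while loop: repeat as long as the condition holds.
j = 0
total_while = 0
while j < len(data):
    total_while += data[j]
    j += 1

# for loop: iterate directly over the elements.
total_for = 0
for value in data:
    total_for += value

# Index-based for loop: iterate over range(len(data)) when the index itself is needed.
big_index = 0
for j in range(len(data)):
    if data[j] > data[big_index]:
        big_index = j

print(total_while, total_for, big_index)
```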

50 Break and Continue Python supports a break statement that immediately terminates a while or for loop when executed within its body. Python also supports a continue statement that causes the current iteration of a loop body to stop, but with subsequent passes of the loop proceeding as expected.
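A minimal sketch of both statements, using two hypothetical helper functions:

```python
def first_negative(values):
    """Return the first negative value, or None if there is none."""
    found = None
    for v in values:
        if v < 0:
            found = v
            break          # leave the loop immediately
    return found

def sum_of_positives(values):
    """Sum only the positive values."""
    total = 0
    for v in values:
        if v <= 0:
            continue       # skip the rest of this pass; go on to the next value
        total += v
    return total

print(first_negative([3, -1, -7]), sum_of_positives([3, -1, 4, 0]))
```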

51 Functions Functions are defined using the keyword def.
This establishes a new identifier as the name of the function (count, in this example), and it establishes the number of parameters that it expects, which defines the function’s signature. The return statement returns the value for this function and terminates its processing.
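The transcript names the function count but does not preserve its code. The following is one plausible definition, assuming it counts how many times a target value occurs in a collection.

```python
def count(data, target):
    """Return the number of elements of data equal to target (assumed behavior)."""
    n = 0
    for item in data:
        if item == target:     # found a match
            n += 1
    return n                   # return ends the call and hands back the value

grades = ['A', 'B', 'A', 'C']
print(count(grades, 'A'))
```

The signature here has two parameters, data and target; a call must supply an argument for each.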

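52 Information Passing Parameter passing in Python follows the semantics of the standard assignment statement: when a function is called, each formal parameter is assigned to the object supplied as the corresponding actual argument, so the parameter becomes an alias for the caller's object.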
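One consequence of assignment-style passing is that mutating a parameter affects the caller's object, while reassigning the parameter does not. A sketch with hypothetical functions scale and reset:

```python
def scale(data, factor):
    """Multiply every element in place; the caller's list is changed."""
    for j in range(len(data)):
        data[j] *= factor

def reset(data):
    """Rebind the local name only; the caller's list is untouched."""
    data = []
    return data

grades = [1, 2, 3]
scale(grades, 2)      # inside the call, data is an alias for grades
after_scale = list(grades)
reset(grades)         # data = [] rebinds the parameter, not grades
print(after_scale, grades)
```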

53 Simple Output The built-in function, print, is used to generate standard output to the console. In its simplest form, it prints an arbitrary sequence of arguments, separated by spaces, and followed by a trailing newline character. For example, the command print('maroon', 5) outputs the string 'maroon 5\n'. A nonstring argument x will be displayed as str(x).

54 Simple Input The primary means for acquiring information from the user console is a built-in function named input. This function displays a prompt, if given as an optional parameter, and then waits until the user enters some sequence of characters followed by the return key. The return value of the function is the string of characters that were entered strictly before the return key. Such a string can immediately be converted to another type, for example by passing it to the int or float constructor.

55 A Simple Program Here is a simple program that does some input and output:
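The slide's original program is not preserved in the transcript. As a stand-in, here is a minimal sketch of input conversion and output; the function years_until and its milestone parameter are hypothetical, and the interactive input call is commented out so that the block runs without a console.

```python
def years_until(age_text, milestone=100):
    """Convert the text a user typed into an int and compute the years remaining."""
    age = int(age_text)          # input() always returns a str, so convert first
    return milestone - age

# Interactive use (uncomment to run at a console):
# reply = input('Enter your age in years: ')
# print('Years until 100:', years_until(reply))

# Non-interactive demonstration with a sample reply.
print('Years until 100:', years_until('42'))
print('maroon', 5)               # arguments are separated by a space
```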

56 Files Files are opened with a built-in function, open, that returns an object for the underlying file. For example, the command fp = open('sample.txt') attempts to open a file named sample.txt. The returned file object provides methods such as read, readline, readlines, write, and close.
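A short sketch of writing and then reading a file follows. It creates sample.txt inside a temporary directory (an assumption made so the example does not touch existing files) and uses only standard file-object methods.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, 'sample.txt')

    # Open for writing; the with statement closes the file automatically.
    with open(path, 'w') as fp:
        fp.write('first line\nsecond line\n')

    # Open for reading (the default mode) and use explicit method calls.
    fp = open(path)
    first = fp.readline()    # one line, including its trailing newline
    rest = fp.read()         # everything that remains
    fp.close()

print(repr(first), repr(rest))
```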

57 Exception Handling Exceptions are unexpected events that occur during the execution of a program. An exception might result from a logical error or an unanticipated situation. In Python, exceptions (also known as errors) are objects that are raised (or thrown) by code that encounters an unexpected circumstance. The Python interpreter can also raise an exception. A raised error may be caught by a surrounding context that “handles” the exception in an appropriate fashion. If uncaught, an exception causes the interpreter to stop executing the program and to report an appropriate message to the console.

58 Common Exceptions Python includes a rich hierarchy of exception classes that designate various categories of errors

59 Raising an Exception An exception is thrown by executing the raise statement, with an appropriate instance of an exception class as an argument that designates the problem. For example, if a function for computing a square root is sent a negative value as a parameter, it can raise an exception with a command such as raise ValueError('x cannot be negative').
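The square-root scenario can be sketched as follows. The function name checked_sqrt and the error messages are assumptions; math.sqrt and the built-in exception classes are standard.

```python
import math

def checked_sqrt(x):
    """Return the square root of x, raising an exception for unsuitable arguments."""
    if not isinstance(x, (int, float)):
        raise TypeError('x must be numeric')
    if x < 0:
        raise ValueError('x cannot be negative')
    return math.sqrt(x)

print(checked_sqrt(16))
```

Calling checked_sqrt(-1) stops the function at the raise statement; unless the caller handles the ValueError, the interpreter reports it and halts.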

60 Catching an Exception In Python, exceptions can be tested and caught using a try-except control structure. In this structure, the “try” block is the primary code to be executed. Although it may be a single command, it can more generally be a larger block of indented code. Following the try-block are one or more “except” cases, each with an identified error type and an indented block of code that should be executed if the designated error is raised within the try-block.
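A minimal sketch of the try-except structure, using division by zero as the designated error; the function name safe_ratio and its fallback value are hypothetical.

```python
def safe_ratio(x, y):
    """Return x / y, or None if y is zero."""
    try:
        ratio = x / y                      # primary code: a single command here
    except ZeroDivisionError:
        print('Cannot divide by zero.')    # runs only if that error was raised
        ratio = None
    return ratio

print(safe_ratio(6, 3))
print(safe_ratio(1, 0))
```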

61 Iterators Basic container types, such as list, tuple, and set, qualify as iterable types, which allows them to be used as an iterable object in a for loop. An iterator is an object that manages an iteration through a series of values. If variable, i, identifies an iterator object, then each call to the built-in function, next(i), produces a subsequent element from the underlying series, with a StopIteration exception raised to indicate that there are no further elements. An iterable is an object, obj, that produces an iterator via the syntax iter(obj).
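The iterator protocol can be driven by hand to see what a for loop does behind the scenes. A small sketch with illustrative names:

```python
data = [1, 2]          # a list is iterable
i = iter(data)         # iter produces an iterator over it

first = next(i)
second = next(i)

# A further call signals exhaustion by raising StopIteration.
try:
    next(i)
    exhausted = False
except StopIteration:
    exhausted = True

print(first, second, exhausted)
```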

62 Generators The most convenient technique for creating iterators in Python is through the use of generators. A generator is implemented with a syntax that is very similar to a function, but instead of returning values, a yield statement is executed to indicate each element of the series. For example, a generator for the factors of n:
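The slide's code is not preserved in the transcript. One straightforward way to write such a generator, testing every candidate from 1 to n, is sketched below.

```python
def factors(n):
    """Generate the factors of a positive integer n in increasing order."""
    for k in range(1, n + 1):
        if n % k == 0:
            yield k        # hand back one factor, then resume here on the next request

for f in factors(12):
    print(f)

print(list(factors(7)))
```

Because values are produced on demand, a caller that stops iterating early does not pay for computing the remaining factors.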

63 Conditional Expressions
Python supports a conditional expression syntax that can replace a simple control structure. The general syntax is an expression of the form expr1 if condition else expr2. This compound expression evaluates to expr1 if the condition is true, and otherwise evaluates to expr2.
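As an illustration, the sketch below computes an absolute value first with an if-else statement and then with a conditional expression; the variable names are illustrative.

```python
n = -5

# Traditional control structure.
if n >= 0:
    param = n
else:
    param = -n

# Equivalent conditional expression.
param_expr = n if n >= 0 else -n

# The expression can also appear directly as an argument.
print(param, param_expr, abs(n))
```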

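64 Comprehension Syntax A very common programming task is to produce one series of values based upon the processing of another series. Often, this task can be accomplished quite simply in Python using what is known as a comprehension syntax. The general form of a list comprehension is [expression for value in iterable if condition], which is equivalent to building the list with an explicit for loop that appends expression for each value satisfying condition.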
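The equivalence between a comprehension and an explicit loop can be seen in a short sketch; the squares example is chosen for illustration.

```python
n = 5

# Comprehension form.
squares = [k * k for k in range(1, n + 1)]

# The same result built with a traditional loop.
squares_loop = []
for k in range(1, n + 1):
    squares_loop.append(k * k)

# An optional condition filters the source values.
even_squares = [k * k for k in range(1, n + 1) if k % 2 == 0]

print(squares, squares_loop, even_squares)
```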

65 Packing If a series of comma-separated expressions are given in a larger context, they will be treated as a single tuple, even if no enclosing parentheses are provided. For example, consider the assignment data = 2, 4, 6, 8. This results in identifier, data, being assigned to the tuple (2, 4, 6, 8). This behavior is called automatic packing of a tuple.

66 Unpacking As a dual to the packing behavior, Python can automatically unpack a sequence, allowing one to assign a series of individual identifiers to the elements of the sequence. As an example, we can write a, b, c, d = range(7, 11). This has the effect of assigning a=7, b=8, c=9, and d=10.
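Packing and unpacking can be combined in a few common idioms. This sketch uses the built-in divmod as an example of a function that returns a tuple; the other names are illustrative.

```python
data = 2, 4, 6, 8                     # automatic packing into a tuple
a, b, c, d = range(7, 11)             # automatic unpacking of a sequence

quotient, remainder = divmod(17, 5)   # unpack a function's tuple result

x, y = 1, 2
x, y = y, x                           # pack the right side, unpack into the left: a swap

print(data, a, b, c, d, quotient, remainder, x, y)
```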

67 Modules Beyond the built-in definitions, the standard Python distribution includes perhaps tens of thousands of other values, functions, and classes that are organized in additional libraries, known as modules, that can be imported from within a program.

68 Existing Modules Some useful existing modules include the following:
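The slide's table of modules is not preserved in the transcript. As an illustration of the import syntax only, the sketch below uses two standard-library modules, math and collections, chosen for this example.

```python
import math                          # import the module, then qualify names with math.
from collections import Counter      # import one name directly

root = math.sqrt(16)
divisor = math.gcd(12, 18)
most_common = Counter('banana').most_common(1)

print(root, divisor, math.pi, most_common)
```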

