Characters, Strings, Basic Data Structures UC Santa Cruz CMPS 10 – Introduction to Computer Science


1 Characters, Strings, Basic Data Structures UC Santa Cruz CMPS 10 – Introduction to Computer Science www.soe.ucsc.edu/classes/cmps010/Spring11 ejw@cs.ucsc.edu 6 April 2011

2 UC SANTA CRUZ Class website  http://www.soe.ucsc.edu/classes/cmps010/Spring11/  Please write this down, and bookmark it  Holds:  Syllabus (including homework due dates)  Homework assignment descriptions  Description of course readings  Links to class lecture notes  The final exam is scheduled for Tuesday, June 7, 8am-11am  This class will have a final exam. Please plan on this.

3 UC SANTA CRUZ Tutoring available  Learning Support Services (LSS)  Has tutoring available for students in CMPS 10  Students meet in small groups, led by a tutor  Students are eligible for up to one hour of tutoring per week per course, and may sign up for tutoring at https://eop.sa.ucsc.edu/OTSS/tutorsignup/ beginning April 5th at 10:00am.  Brett Care (bcare@ucsc.edu) is the tutor that LSS has hired for CMPS 10

4 UC SANTA CRUZ Abstraction and Models  Converting the real world into data:  Create a model of the real world  Represent that model in data  How do you model the real world?  Involves a process called abstraction  Abstraction  Prerequisite: know your problem or application  Focus on aspects of the real world that are important to the problem  Add those elements to your model  Omit elements of the real world that aren’t relevant  Implies: the same real world scenario can be modeled in many ways, depending on the problem at hand  [Figure: physical world → model (via abstraction) → data inside computer (via representation)]

5 UC SANTA CRUZ Representing models as data  Most models can be represented using:  Basic data types  Integers  Floating point  Boolean  Characters  Strings  Basic data structures  Arrays  Lists  Stacks/Queues  Trees  Graphs

6 UC SANTA CRUZ Boolean  A boolean data type represents true or false  This is represented as a 1 (true) or a 0 (false)  How much space does a boolean require? It varies.  The minimum required space is 1 bit  However, typically a boolean is stored in an entire byte  8 bits, as in the C# language  Only one bit is used:  00000001 = true  00000000 = false  … or in an integer  16 or 32 bits, as in older versions of the C language (before C99), which lacked a standard boolean type  0000000000000001 = true (16 bits)  0000000000000000 = false (16 bits)
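
The point that a boolean needs only 1 bit, yet usually occupies a whole byte, can be illustrated with a minimal Python sketch. The helper names `pack_bits` and `unpack_bit` are hypothetical and not part of the lecture; they show one possible way to store eight true/false values in a single byte.

```python
def pack_bits(flags):
    """Pack up to 8 booleans into one byte; flags[0] becomes the lowest bit."""
    if len(flags) > 8:
        raise ValueError("a single byte holds at most 8 booleans")
    byte = 0
    for position, flag in enumerate(flags):
        if flag:
            byte |= 1 << position  # switch on the bit for this position
    return byte


def unpack_bit(byte, position):
    """Read back the boolean stored at the given bit position."""
    return ((byte >> position) & 1) == 1


packed = pack_bits([True, False, True, True, False, False, False, False])
print(format(packed, "08b"))   # the eight bits of the packed byte
print(unpack_bit(packed, 2))   # the third boolean that was stored
```

Packing saves space but costs extra work on every read and write, which is one reason many languages simply spend a full byte per boolean.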

7 UC SANTA CRUZ Character  A single letter, number, punctuation, symbol, etc.  Historically, in the US characters were represented using the US-ASCII code (uses 7 bits of an 8-bit byte)  This was superseded by ISO/IEC 8859  Provided support for special characters used in specific languages, along with accented characters  Examples: ß (German), ñ (Spanish), å (Swedish and other Nordic languages) and ő (Hungarian)  But, didn’t handle representation of ideographic languages  Led to development of many standards for this specific purpose, and for other languages not covered by ISO 8859  Historically a character of storage meant one byte (8 bits)  This still holds true in many discussions today  [Image: lead type (blog.davidcaputo.net/category/design/)]

8 UC SANTA CRUZ UNICODE  Today, the UNICODE standard is rapidly becoming the standard  Can represent every character in every human language with an alphabet  Contains more than 109,000 characters covering 93 scripts  Initial idea comes from Joe Becker and Mark Davis in 1987  Today, maintained by the UNICODE consortium  Each character has a unique numeric identifier (a code point) that fits in 32 bits  But, 32 bits per character is a lot of space  So, have multiple encodings  UTF-8: most popular, maximizes backward compatibility with US-ASCII, 8 bits (1 byte) for each US-ASCII character, more bytes for other scripts (variable width)  UTF-16: characters in the most common scripts use 16 bits, less common ones use 32 bits (variable width)  UTF-32/UCS4: each character uses 32 bits (4 bytes)  One of the great unsung achievements of computer science  [Image credit: www.macchiato.com/]
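
The trade-off between the three encodings can be checked with a short Python sketch that uses the built-in `str.encode` method. The helper name `encoded_sizes` and the choice of sample characters are assumptions made for illustration; the samples are written as escape codes for A, ñ, ő, the euro sign, and an emoji.

```python
def encoded_sizes(character):
    """Return the byte counts for UTF-8, UTF-16, and UTF-32, in that order."""
    return (
        len(character.encode("utf-8")),
        len(character.encode("utf-16-be")),  # "-be" avoids adding a byte-order mark
        len(character.encode("utf-32-be")),
    )


# A, n-tilde, o-double-acute, euro sign, grinning-face emoji
samples = ["A", "\u00f1", "\u0151", "\u20ac", "\U0001F600"]
for character in samples:
    utf8, utf16, utf32 = encoded_sizes(character)
    print(f"U+{ord(character):04X}: UTF-8={utf8}, UTF-16={utf16}, UTF-32={utf32} bytes")
```

US-ASCII characters stay at one byte in UTF-8, while UTF-32 always spends four bytes regardless of the character.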

9 UC SANTA CRUZ Strings  A string is a sequence of characters  “Hello, world!” is the most famous string.  Two main ways to represent:  A sequence of characters, ended with a 0 (null character): Hello, world!\0  A length, and then that many characters: 13 Hello, world!  Each character is represented according to some character encoding (UTF-8, UTF-16, US-ASCII, etc.)
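
The two string representations can be sketched in Python using `bytes` objects. The four function names below are hypothetical, US-ASCII is assumed as the character encoding, and the length is assumed to fit in a single leading byte (so at most 255 characters).

```python
def to_null_terminated(text):
    """Store the characters followed by a single 0 byte (the null character)."""
    return text.encode("ascii") + b"\x00"


def from_null_terminated(data):
    """Read characters up to, but not including, the first 0 byte."""
    end = data.index(0)
    return data[:end].decode("ascii")


def to_length_prefixed(text):
    """Store a one-byte length, then that many characters."""
    encoded = text.encode("ascii")
    if len(encoded) > 255:
        raise ValueError("length does not fit in one byte")
    return bytes([len(encoded)]) + encoded


def from_length_prefixed(data):
    """Read the length byte, then exactly that many characters."""
    length = data[0]
    return data[1:1 + length].decode("ascii")


greeting = "Hello, world!"
print(to_null_terminated(greeting))
print(to_length_prefixed(greeting)[0])  # the stored length
```

A null-terminated string has to be scanned to find its length, while a length-prefixed string knows its length immediately but is limited by the size of the length field.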

10 UC SANTA CRUZ Basic Data Structures

11 UC SANTA CRUZ Modeling and sets  When modeling the real world, there is a need to model sets of things  Can also think of this as a group of things, collection of things, etc.  Examples:  The temperature at my house measured every hour over a day  All of the songs in my music collection  All of the houses on a street  All of the people in my family  People standing in line at a restaurant  Frequently, these sets have a natural order  Temperature over a day:  First temperature reading at hour 0, then the second at hour 1, etc.  Houses on a street:  Order by house number

12 UC SANTA CRUZ Representing sets  Many different data structures have been developed to represent sets  Array  A set with fixed length  Elements can be added anywhere  Can go directly to any element  Lists  A set with variable length  Elements can be added anywhere  Need to search list for specific element  Stack/queue  A set with variable length  Elements can be added only at beginning (stack) or end (queue)  Can only retrieve element from beginning (stack/queue)  [Image credit: www.setgame.com]

13 UC SANTA CRUZ Arrays  Used to represent sets of fixed length  Also represents mathematical vectors and matrices of fixed size  Once set, cannot change the size of an array (biggest limitation)  But, this limitation permits fast lookup of values (biggest strength)  How do they work (1-dimensional)  Given an integer index can:  Retrieve an element of the array  array[index] → value  Set an element of the array  value → array[index]  Can have an array comprised of any basic data type  Array of integers, array of floats, array of strings, etc.

14 UC SANTA CRUZ Array example  Consider a set of temperature values at a location, with temperature readings taken once every hour  Have a total of 24 readings each day, and this won’t change  Model for one day of readings  A set of 24 ordered temperature readings  Representation  Use an array to represent the ordered set of 24 readings  Use a floating point number to represent each temperature reading temperature is array[24] of float

15 UC SANTA CRUZ Array Example: 24 hours of temperature  Typical use:  temperature[0] = 52.5  Sets the temperature value for hour 0 to 52.5  noon_temp = temperature[12]  The variable noon_temp takes the value of the temperature array at hour 12 (noon), 68.2  Array contents (hour: reading):  0: 52.5, 1: 52.0, 2: 51.7, 3: 51.2, 4: 50.8, 5: 50.1, 6: 49.8, 7: 51.6, 8: 55.7, 9: 57.2, 10: 61.4, 11: 65.8, 12: 68.2, 13: 70.4, 14: 72.5, 15: 72.9, 16: 72.1, 17: 70.3, 18: 68.3, 19: 61.8, 20: 58.0, 21: 56.4, 22: 54.3, 23: 52.6
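
The temperature example can be tried out with a minimal Python sketch. Python has no built-in fixed-length array, so a list of 24 floats that is never resized stands in for one here; that substitution is an assumption of this sketch, not something the slides prescribe.

```python
HOURS_PER_DAY = 24

# Create the "array" with one slot per hour, all starting at 0.0.
temperature = [0.0] * HOURS_PER_DAY

# The 24 hourly readings listed on the slide, hour 0 first.
readings = [
    52.5, 52.0, 51.7, 51.2, 50.8, 50.1,
    49.8, 51.6, 55.7, 57.2, 61.4, 65.8,
    68.2, 70.4, 72.5, 72.9, 72.1, 70.3,
    68.3, 61.8, 58.0, 56.4, 54.3, 52.6,
]

# Set each element by index: value -> array[index]
for hour, reading in enumerate(readings):
    temperature[hour] = reading

# Retrieve an element by index: array[index] -> value
noon_temp = temperature[12]
print(noon_temp)
```

Both the write and the read go straight to the slot named by the index, with no searching, which is the fast lookup the slides describe.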

16 UC SANTA CRUZ 2-dimensional arrays  It is also possible to have arrays with two or more dimensions  Represents tabular data, or matrices  In this case, have indices for the row and column of the data  Example:  Temperature readings for a year  temperature is array[365][24] of float  Noon_Jan_First_temp = temperature[0][12]  The temperature on January 1, at noon
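
A year of hourly readings can be sketched in Python as a list of 365 rows, each holding 24 floats. As with the one-dimensional case, nested lists are only a stand-in for a true fixed-size 2-dimensional array, and the value 68.2 stored below is an illustrative assumption.

```python
DAYS_PER_YEAR = 365
HOURS_PER_DAY = 24

# Build one independent row of 24 readings for each day.
# (Writing [[0.0] * 24] * 365 instead would make every row the same shared list.)
temperature = [[0.0] * HOURS_PER_DAY for _ in range(DAYS_PER_YEAR)]

# First index selects the day (row), second index selects the hour (column).
temperature[0][12] = 68.2
noon_jan_first_temp = temperature[0][12]
print(noon_jan_first_temp)
```

Day 0 is January 1 and hour 12 is noon, so `temperature[0][12]` matches the slide's example.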

17 UC SANTA CRUZ Array: pros and cons  Pros  Permits fast access to elements of the array  Array notation maps well to certain kinds of problems (mathematical matrices)  Cons  Array size is fixed, and cannot grow  In many situations, the amount of data is unknowable in advance  Example:  Your music collection. Can you predict how many songs you’ll acquire over your lifetime?  For this situation, would be better to have a representation that can grow or shrink over time

18 UC SANTA CRUZ List  Used to represent sets of variable length  It is possible to change the length of a list by adding and removing members  Lists are slower than arrays for looking up members  How do they work  List.Add(element)  Adds element to end of the list  List.Remove(element)  Searches list, and removes first member that matches element  List.Insert(position, element)  Adds element at specified position in list  Can have a list of any basic data type  List of integers, list of floats, list of strings, etc.

19 UC SANTA CRUZ List example  Consider a list of the titles of songs you own  This list will grow over time… and may shrink  Maybe you delete the Miley Cyrus in your collection?  Model  A set of song titles  Representation  Use a list to represent the set and a string to represent the title  Songtitles is List of string  (Songtitles is a list of strings)

20 UC SANTA CRUZ List example  Start with this list, called Songlist  0, “Poker face”  1, “Video killed the radio star”  2, “Rock star”  Add a song to the list  Songlist.Add(“Beat it”)  0, “Poker face”  1, “Video killed the radio star”  2, “Rock star”  3, “Beat it”  Songlist.Remove(“Rock star”)  0, “Poker face”  1, “Video killed the radio star”  2, “Beat it”  Songlist.Retrieve(1)  Gives the value, “Video killed the radio star”  Songlist.Insert(1, “Let’s Go”)  0, “Poker face”  1, “Let’s Go”  2, “Video killed the radio star”  3, “Beat it”
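
The same sequence of list operations can be replayed with Python's built-in `list` type. The mapping of the slide's Add, Remove, Retrieve, and Insert onto `append`, `remove`, indexing, and `insert` is an assumption of this sketch; the slides use generic pseudocode, not Python.

```python
songlist = ["Poker face", "Video killed the radio star", "Rock star"]

# Add: put a new song at the end of the list.
songlist.append("Beat it")

# Remove: search the list and delete the first matching title.
songlist.remove("Rock star")

# Retrieve: read the title at position 1.
retrieved = songlist[1]

# Insert: place a new title at position 1, shifting later titles down.
songlist.insert(1, "Let's Go")

for position, title in enumerate(songlist):
    print(position, title)
```

The list grows and shrinks as songs come and go, which a fixed-length array cannot do.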

21 UC SANTA CRUZ Linked List  The typical implementation of a list is as a linked list  Each element holds a pointer to the next element in the list  Can also have each element point to the next and previous element in the list (permits fast “previous item” capability)  [Figure: a singly linked list holding 0: 12, 1: 99, 2: 37 (en.wikipedia.org/wiki/Linked_list)]  [Figure: a doubly linked list holding 0: 12, 1: 99, 2: 37 (en.wikipedia.org/wiki/Linked_list)]
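
A singly linked list holding the values 12, 99, and 37 can be sketched in Python as follows. The class and method names (`Node`, `LinkedList`, `add`, `retrieve`, `to_list`) are hypothetical choices for this illustration, and Python object references play the role of pointers.

```python
class Node:
    """One element of the list: a value plus a reference to the next element."""

    def __init__(self, value):
        self.value = value
        self.next = None  # None marks the end of the list


class LinkedList:
    def __init__(self):
        self.head = None  # an empty list has no first element

    def add(self, value):
        """Add a value at the end by walking to the last node."""
        new_node = Node(value)
        if self.head is None:
            self.head = new_node
            return
        current = self.head
        while current.next is not None:
            current = current.next
        current.next = new_node

    def retrieve(self, position):
        """Follow 'next' references from the head until the position is reached."""
        current = self.head
        for _ in range(position):
            if current is None:
                break
            current = current.next
        if current is None:
            raise IndexError("position is past the end of the list")
        return current.value

    def to_list(self):
        """Collect all values in order, for display."""
        values = []
        current = self.head
        while current is not None:
            values.append(current.value)
            current = current.next
        return values


numbers = LinkedList()
for value in (12, 99, 37):
    numbers.add(value)
print(numbers.to_list())
print(numbers.retrieve(1))
```

Reaching position 1 requires stepping through the nodes before it, which is why lookups in a linked list are slower than indexing into an array.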

22 UC SANTA CRUZ History of Linked List  Linked lists emerged early in computing  Developed in 1955-1956 by Allen Newell, Cliff Shaw, and Herbert Simon while creating the language IPL  In 1958, the language LISP (List Processor) was developed at MIT by John McCarthy  Made lists (implemented as linked lists) a fundamental part of the language  Today, most major programming languages provide a built-in list data type, often with multiple variations

23 UC SANTA CRUZ List: Pros and Cons  Pros  Can handle a list of any length  Cons  Slower access than with arrays  Slower to add elements into a list  Notation for accessing elements not as convenient as arrays  Some languages allow use of array notation (e.g., list[index]) with lists

24 UC SANTA CRUZ Stack and Queue  Stacks and queues are used to represent ordered sets where elements are added and removed only at the ends  A stack gives access to the most recently added element (the “top”); a queue gives access to the least recently added element (the “front”)  Removing that element is called “pop” for stacks, “dequeue” for queues  With a stack, elements can only be added (push) to the top  With a queue, elements can only be added (enqueue) to the back  Stack: en.wikipedia.org/wiki/Stack_(data_structure)  Queue: en.wikipedia.org/wiki/Queue_(data_structure)

25 UC SANTA CRUZ Stack example  Consider a list of web page addresses (URLs) you have visited in your browser  If you hit the “back” button, you would like to go to the last page you visited  Model  A set of ordered web page URLs  Representation  Use a stack to represent the set, and a string to represent each URL  Visited_pages is stack of string  Visited_pages is a stack data structure where each element is a string  The Visited_pages structure ensures that the last element added is the first element removed (last-in, first-out, LIFO)

26 UC SANTA CRUZ Stack example  Assume the starting history is as follows and you’re at the page www.engadget.com  0: www.ucsc.edu  1: games.soe.ucsc.edu  2: www.acm.org  The current page (www.engadget.com) isn’t added until you go to a new page, at which point it becomes history  Now, you browse over to Slashdot (www.slashdot.com)  Visited_pages.push(“www.engadget.com”)  0: www.engadget.com  1: www.ucsc.edu  2: games.soe.ucsc.edu  3: www.acm.org  Then, you decide to hit the back button  Visited_pages.pop()  Returns: www.engadget.com  0: www.ucsc.edu  1: games.soe.ucsc.edu  2: www.acm.org  Browser reloads the returned page (www.engadget.com)
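
The browser history example can be replayed with a Python list used as a stack. This is a sketch under one stated assumption: the top of the stack is the end of the Python list (where `append` and `pop` operate), so the order is reversed relative to the slide, which numbers the top element as 0.

```python
# Bottom of the stack first, top of the stack last.
visited_pages = ["www.acm.org", "games.soe.ucsc.edu", "www.ucsc.edu"]

# Leaving www.engadget.com for a new page pushes it onto the history.
visited_pages.append("www.engadget.com")

# Hitting the back button pops the most recently added page (last in, first out).
previous_page = visited_pages.pop()
print(previous_page)

# The page now on top is the one visited before that.
print(visited_pages[-1])
```

Because both operations work on the same end of the list, the last page pushed is always the first one popped.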

27 UC SANTA CRUZ Queue example  Consider people waiting in line at the coffee cart  It might be nice if you could just give your name, and they would call you when it’s your turn  Model  An ordered set of customer names.  Representation  Use a queue to represent the set, and a string to represent the customer name  Coffee_line is queue of string  Coffee_line is a queue data structure where each element is a string  The queue data structure ensures no one will cut in line (first in line, first served, or first-in, first-out, FIFO)

28 UC SANTA CRUZ Queue example  Assume the line is currently:  0: “Ada Lovelace”  1: “Charles Babbage”  2: “Grace Hopper”  A new person, Alan Turing, comes to the end of the line  Coffee_line.enqueue(“Alan Turing”)  Line is now (Turing is added to the end of the queue)  0: “Ada Lovelace”  1: “Charles Babbage”  2: “Grace Hopper”  3: “Alan Turing”  Now the next person in line is served  Coffee_line.dequeue()  Returns “Ada Lovelace”  Line is now (Ada came from the front of the queue)  0: “Charles Babbage”  1: “Grace Hopper”  2: “Alan Turing”
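
The coffee line can be replayed in Python with `collections.deque` from the standard library. Using `append` for enqueue and `popleft` for dequeue is this sketch's choice of mapping; the slides themselves use generic pseudocode.

```python
from collections import deque

# Front of the line first, end of the line last.
coffee_line = deque(["Ada Lovelace", "Charles Babbage", "Grace Hopper"])

# Enqueue: a new customer joins at the end of the line.
coffee_line.append("Alan Turing")

# Dequeue: the customer at the front of the line is served (first in, first out).
served = coffee_line.popleft()
print(served)

for position, name in enumerate(coffee_line):
    print(position, name)
```

Elements enter at one end and leave from the other, so nobody can be served ahead of someone who arrived earlier.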

