Basic Data Structures Stack and Queue List Bucket and Hash.

Memorize the Data

Memorization is a basic function of computers. How efficiently we can use the data depends on the way (structure) in which they are memorized: the cost of memorizing, the cost of searching, and so on.
- write the data into a book as they come
- arrange the books on shelves
We should choose the way according to the purpose for which the data are stored. A way of memorizing is called a “data structure”.

The Way of Memory

Each unit of memory can keep one value.
- Write the data into a book as they come: allocate a block of memory of some size and write the data into it, starting from the unit with the smallest index (array, stack, queue, …)
- Arrange the books on shelves: give the memory units a structure by linking them with indices/pointers (list, heap, binary tree, bucket, hash, …)
Let us see them one by one.

Memory by Array

Question: we want to memorize data that come one by one. How do we store them?
Answer: write the latest one right after the current end.
However, memory has no function for recording the time at which a value was written. Even if it had one, finding the last value would not be easy, because there is a huge number of memory units. So we use one memory unit as a variable that keeps the position of the unit written last.

Stack

[Figure: an array and its counter]
The structure made of a pair of an “array” and a “counter” (holding the last position) is called a stack. The counter is called the “stack pointer”.

Delete a Value

Next, we consider reading and deleting the values written to the memory.
- Read any one value and delete it: read the last one (at the stack pointer) and decrease the stack pointer
- Delete the xxx-th value: copy the last value to position xxx and decrease the stack pointer (the order of the remaining values changes)
- Delete the values equal to xxx: a scan of the array is needed
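Below is a minimal C sketch of the last two deletion rules. It assumes the same array-plus-stack-pointer layout as the STACK structure in these slides. The function names STACK_delete_at and STACK_delete_value are hypothetical. Deleting by position is O(1) but moves the last value into the hole, so the order is not kept. Deleting by value needs a scan.

```c
/* Array plus stack pointer, as in the slides (h, end, t). */
typedef struct {
    int *h;   /* array for data */
    int end;  /* size of array */
    int t;    /* stack pointer: number of stored values */
} STACK;

/* Delete the i-th value (0-based) in O(1) time: copy the last value
   into position i and decrease the stack pointer. The order of the
   remaining values is NOT preserved. Returns 1 on a bad index, else 0. */
int STACK_delete_at(STACK *S, int i) {
    if (i < 0 || i >= S->t) return 1;  /* no such value */
    S->t--;
    S->h[i] = S->h[S->t];              /* the last value fills the hole */
    return 0;
}

/* Delete every value equal to a; this needs a scan of the array.
   Returns the number of deleted values. */
int STACK_delete_value(STACK *S, int a) {
    int i = 0, removed = 0;
    while (i < S->t) {
        if (S->h[i] == a) {
            STACK_delete_at(S, i);     /* slot i now holds the old last value: check it again */
            removed++;
        } else {
            i++;
        }
    }
    return removed;
}
```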

A Stack Subroutine

An implementation of the stack. The functions take the stack and the value.

    typedef struct {
        int *h;    // array for data
        int end;   // size of array
        int t;     // counter (stack pointer)
    } STACK;

    int STACK_push ( STACK *S, int a ){
        if ( S->t == S->end ) return (1);  // overflow error
        S->h[S->t] = a;
        S->t ++;
        return (0);
    }

    int STACK_pop ( STACK *S, int *a ){
        if ( S->t == 0 ) return (1);  // underflow error
        S->t --;
        *a = S->h[S->t];
        return (0);
    }

Examples of Usage

- Reversing a given string, e.g. ABCDEFGH
- The “undo” function of word processors, …
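The string-reversal example can be sketched in C as follows: push every character, then pop them all. The function name reverse_with_stack and the fixed capacity MAXLEN are assumptions of this sketch. Longer inputs are truncated to MAXLEN characters, and out must have room for the result plus the terminating '\0'.

```c
#include <string.h>

#define MAXLEN 256  /* assumed capacity of the stack */

/* Reverse a string with a stack: a char array plus a counter. */
void reverse_with_stack(const char *in, char *out) {
    char stack[MAXLEN];
    int t = 0;                           /* stack pointer */
    size_t n = strlen(in);
    for (size_t i = 0; i < n && t < MAXLEN; i++)
        stack[t++] = in[i];              /* push */
    int k = 0;
    while (t > 0)
        out[k++] = stack[--t];           /* pop: the last pushed comes out first */
    out[k] = '\0';
}
```

For example, reverse_with_stack("ABCDEFGH", out) leaves "HGFEDCBA" in out.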

Column: Stack without Overflow

A stack has a limit given by the array size. In some cases we do not know in advance how much data will be stacked. For example, we may read a file and memorize all the numbers in it; this is fine only if we may scan the file before the execution. When an overflow occurs, we make a new, larger stack and copy the old one into it. However, if we increase the size by only one, an overflow occurs at every insertion, which is wasteful.

Column: Stack without Overflow (2)

When we make a new stack, doubling the size is efficient.
- Even when overflows occur, the number of stack cells existing in memory at the same time is at most three times the number of values. During a copy, the old array of size s and the new one of size 2s coexist while holding s values.
- The total cost of copying is at most twice the current number of values. The copies cost 1 + 2 + 4 + … + s < 2s, where s, the size at the last copy, is at most the current number of values.
Hence there is no loss in terms of time complexity.
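The small C sketch below counts the element copies made while pushing m values, to illustrate the bound above. It assumes the stack starts with capacity 1. copies_when_growing is a hypothetical helper that only simulates the sizes and stores no data.

```c
/* Count the element copies made while pushing m values into a stack
   that starts with capacity 1 (an assumption of this sketch).
   doubling != 0: a full stack is replaced by one twice as large;
   doubling == 0: a full stack is replaced by one larger by one cell. */
long copies_when_growing(long m, int doubling) {
    long cap = 1, size = 0, copies = 0;
    for (long i = 0; i < m; i++) {
        if (size == cap) {               /* overflow: make a larger stack */
            copies += size;              /* copy the old contents over */
            cap = doubling ? cap * 2 : cap + 1;
        }
        size++;                          /* the push itself */
    }
    return copies;
}
```

For m = 1000, doubling performs 1 + 2 + 4 + … + 512 = 1023 copies, below 2m. Growing by one performs 1 + 2 + … + 999 = 499500 copies.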

Column: Stack without Overflow (3)

The code can be written as follows. It needs <stdlib.h> and assumes S->end > 0 at the start.

    int STACK_push ( STACK *S, int a ){
        if ( S->t == S->end ){  // full: move to an array twice as large
            int i, *h = malloc ( sizeof(int) * S->end * 2 );  // using realloc is easier
            if ( h == NULL ) return (1);  // out of memory
            for ( i=0 ; i<S->t ; i++ ) h[i] = S->h[i];
            free ( S->h );
            S->h = h;
            S->end *= 2;
        }
        S->h[S->t] = a;
        S->t ++;
        return (0);
    }

FILO

Read and delete one value, namely the last one.
- This is used, for example, when a user puts ★'s on the display and the computer deletes them, starting from the most recent one, when a button is pushed
- The value written last is read first
- Such a data structure is called FILO (First In Last Out), also known as LIFO (Last In First Out)

Queue; FIFO

In some cases we want to read first the value written first (FIFO: First In First Out).
- For example, deleting the ★'s in the order they were put
- Counter services follow this rule (customer = value)
Such a data structure is called a “queue”.

Counters for Queue

A queue needs a pointer indicating “the place of the value written first”.
- So we need two counters (pointers): one for the position to be read and one for the position to be written
- The position to be read is called the “head”, and the position to be written is called the “tail”

Overflow

A stack overflows when more values come than its size. A queue of size n overflows when the (n+1)-th value is inserted, even if many values have already been deleted.
- When the tail passes the end of the array, set it back to the beginning of the array; do the same for the head
- A real overflow occurs only when the tail would pass the head

Adjustment for Passing

When the tail catches up with the head, something happens: all cells hold values.
- This situation looks the same as the empty queue (head = tail)!
- We cannot distinguish the two
There are two ways out of this:
- prepare a flag (one bit) to distinguish them
- never write into the last free cell (the capacity of the queue becomes n-1)
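The first way out, a one-bit flag, can be sketched in C as follows so that all end cells are used. The FQUEUE type and its full field are assumptions of this sketch. h, end, s and t play the same roles as in the QUEUE structure of these slides.

```c
/* Circular queue that distinguishes "full" from "empty" with a flag. */
typedef struct {
    int *h;    /* array for data */
    int end;   /* size of array */
    int s;     /* head: position to be read */
    int t;     /* tail: position to be written */
    int full;  /* 1 if every cell holds a value (then s == t) */
} FQUEUE;

int FQUEUE_ins(FQUEUE *Q, int a) {
    if (Q->full) return 1;                   /* overflow */
    Q->h[Q->t] = a;
    Q->t = (Q->t + 1) % Q->end;
    if (Q->t == Q->s) Q->full = 1;           /* tail caught up with head */
    return 0;
}

int FQUEUE_ext(FQUEUE *Q, int *a) {
    if (Q->s == Q->t && !Q->full) return 1;  /* underflow: queue is empty */
    *a = Q->h[Q->s];
    Q->s = (Q->s + 1) % Q->end;
    Q->full = 0;                             /* at least one cell is free now */
    return 0;
}
```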

A Subroutine for Queue

An implementation of a queue follows. The functions take a queue structure and the value to be written. The code uses the second way out: the queue counts as full when only one free cell remains.

    typedef struct {
        int *h;    // array for data
        int end;   // size of array
        int s;     // counter for head
        int t;     // counter for tail
    } QUEUE;

    int QUEUE_ins ( QUEUE *Q, int a ){
        if ( ( Q->t +1 ) % Q->end == Q->s ) return (1);  // overflow error
        Q->h[Q->t] = a;
        Q->t = ( Q->t +1 ) % Q->end;
        return (0);
    }

    int QUEUE_ext ( QUEUE *Q, int *a ){
        if ( Q->s == Q->t ) return (1);  // underflow error
        *a = Q->h[Q->s];
        Q->s = ( Q->s +1 ) % Q->end;
        return (0);
    }

Example of Usages

- Input numbers one by one, and output five of them at once at certain points
- Draw the trajectory of the mouse cursor with a fixed length: store the locations of the cursor in a queue (e.g., 30 locations per second) and delete the ones older than the specified period
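Here is a C sketch of the mouse-trajectory example. It is a circular queue holding at most TRAIL_LEN positions, and adding a point to a full queue first drops the oldest one. TRAIL_LEN, POINT, TRAIL and the TRAIL_* functions are hypothetical. As in QUEUE_ins, one array cell is always left unused, so the array has TRAIL_LEN + 1 cells.

```c
#define TRAIL_LEN 30   /* assumed: e.g. one second of samples at 30 per second */

typedef struct { int x, y; } POINT;

typedef struct {
    POINT h[TRAIL_LEN + 1];  /* one cell stays unused to tell full from empty */
    int s, t;                /* head (oldest point) and tail (next write) */
} TRAIL;

/* Add a cursor position; when the queue is full, forget the oldest one. */
void TRAIL_add(TRAIL *Q, POINT p) {
    int next = (Q->t + 1) % (TRAIL_LEN + 1);
    if (next == Q->s)                           /* full: drop the oldest */
        Q->s = (Q->s + 1) % (TRAIL_LEN + 1);
    Q->h[Q->t] = p;
    Q->t = next;
}

/* Number of stored positions. */
int TRAIL_size(const TRAIL *Q) {
    return (Q->t - Q->s + TRAIL_LEN + 1) % (TRAIL_LEN + 1);
}

/* Oldest position still in the trail (call only when TRAIL_size > 0). */
POINT TRAIL_oldest(const TRAIL *Q) { return Q->h[Q->s]; }
```

Initialize with s = t = 0 and call TRAIL_add for every sampled position. The points from TRAIL_oldest onward, in queue order, form the trail to draw.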

List: Ins/Del with Keeping the Order

Arrays are simple and useful, but keeping the values in order is costly: inserting or deleting in the middle requires shifting all the values after that position. Can we gain an advantage in keeping the order, possibly losing some advantages on other functions?
- For example, random access (accessing the k-th element in constant time for any k) may be lost
Examples:
- customers in the line of a counter service, allowing cancellations and cutting into the line
- editing a document: inserting, deleting and moving words, sentences, sections, and even pictures, in a sequence of letters (and objects)

Idea: Simulate a Chain

A (real) chain is useful in such a situation, but finding the k-th ring is not easy.
- Simulate this structure in the computer
In a chain, the neighboring relations are fixed, but the places are not. So each ring (cell) of the chain can be located anywhere in the memory, as long as the adjacency is kept. Each cell has to store its neighbors (previous and next), so each cell has three values:
- the value
- the previous cell (position, or pointer)
- the next cell (position, or pointer)
[Figure: a chain of cells holding 1, 5, 7, 3]

Strategy for Insertion/Deletion

When we detach or insert a ring of a chain, we
- cut the relation to the neighbors (of the ring)
- for a cell, change the “adjacency relation” of its “neighbors”
For insertion, change the adjacency relation of the cells at the place of insertion; the inserted cell becomes their new neighbor. For deletion, connect the two neighbors of the removed cell directly to each other. Both can be done in constant time when the target cell is given, and neither changes the order of the other cells.

Structure Using Pointer

Define the following structure and allocate one block for each request.
- There is no limit on the size (length), whereas arrays have one
Note: the definition of LIST needs LIST itself, so we need the trick of the struct tag _LIST_:

    typedef struct _LIST_ {
        struct _LIST_ *prv;   // pointer to the previous cell
        struct _LIST_ *nxt;   // pointer to the next cell
        int h;                // value
    } LIST;

A definition without the tag does not compile, because LIST is not yet declared inside its own definition:

    typedef struct {
        LIST *prv;   // error: LIST is not declared yet
        LIST *nxt;
        int h;
    } LIST;

The head and tail of a list have to be kept in memory; otherwise the list is lost in the large memory space. A LIST structure can point to the head/tail.

Code (Initialization)

For initialization, prepare a LIST structure ● as the root of the list, and set its nxt and prv to itself to represent an empty list.

    void LIST_init ( LIST *L ){
        L->prv = L;
        L->nxt = L;
    }

After several cells are inserted into this empty list, the nxt/prv of ● point to the head/tail.

Insertion

Insertion takes (pointers to) the cell to be inserted and the cell just before the place of insertion. It changes the pointers of the cells at that place.

    void LIST_ins ( LIST *l, LIST *p ){  // insert l just after p
        p->nxt->prv = l;
        l->nxt = p->nxt;
        p->nxt = l;
        l->prv = p;
    }

Notice that the order of changing the pointers is crucial. With some bad orderings, the operation is not done correctly. For example, setting p->nxt = l first loses the pointer to the old next cell.

Deletion

To delete a given cell, connect its previous cell and its next cell by pointers.

    void LIST_del ( LIST *l ){
        l->nxt->prv = l->prv;
        l->prv->nxt = l->nxt;
    }

The pointers of l need not be modified, since it is out of the list. Further, if we want to put the cell back into the list later, its unmodified pointers immediately identify the place to restore it.

In Usual Textbooks

Usually, the prv of the head and the nxt of the tail are set to NULL.
- If nxt/prv is NULL, that is the end
This is theoretically beautiful, but bothersome for programming.
- Insertion/deletion at the ends needs exceptions
- Operations such as “insert before NULL” or “set the prv of NULL to X” are impossible, so we have to avoid them with several if statements that consider where the operation happens
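For comparison, here is a C sketch of the NULL-terminated style, where separate head/tail pointers hold the list. NLIST, NLIST_ROOT, NLIST_ins_after and NLIST_del are hypothetical names. Note the if statements that LIST_ins and LIST_del, which use a root cell ●, do not need.

```c
#include <stddef.h>

typedef struct _NLIST_ {
    struct _NLIST_ *prv, *nxt;   /* NULL at the ends */
    int h;                       /* value */
} NLIST;

typedef struct { NLIST *head, *tail; } NLIST_ROOT;

/* Insert l after p; p == NULL means "insert at the front". */
void NLIST_ins_after(NLIST_ROOT *R, NLIST *l, NLIST *p) {
    l->prv = p;
    l->nxt = (p != NULL) ? p->nxt : R->head;
    if (l->nxt != NULL) l->nxt->prv = l;   /* exception: l becomes the tail */
    else R->tail = l;
    if (p != NULL) p->nxt = l;             /* exception: l becomes the head */
    else R->head = l;
}

void NLIST_del(NLIST_ROOT *R, NLIST *l) {
    if (l->prv != NULL) l->prv->nxt = l->nxt;
    else R->head = l->nxt;                 /* exception: deleting the head */
    if (l->nxt != NULL) l->nxt->prv = l->prv;
    else R->tail = l->prv;                 /* exception: deleting the tail */
}
```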

Loop along a List

To trace a list, follow nxt repeatedly, starting from ●. In the code, root is a pointer to ●:

    LIST *p;
    int e;
    for ( p = root->nxt ; p != root ; p = p->nxt ){
        e = p->h;
        …
    }

To trace in the opposite direction, use prv.

Recover a Cell

A cell just removed, whose neighbors have not been operated on since, can be recovered by inserting it back at the position where it was.

    void LIST_recov ( LIST *l ){
        LIST_ins ( l, l->prv );
    }

The position is stored in its prv/nxt. In this way, we can recover all removed cells in the opposite order of their removal. Removed cells can be marked by setting prv := NULL. Their nxt still indicates the position, so the cell can be put back just before nxt.
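A self-contained C sketch of recovering cells in the opposite order of their removal. It repeats LIST_ins, LIST_del and LIST_recov from the slides with void return types. LIST_to_array is a hypothetical helper added only to inspect the list.

```c
typedef struct _LIST_ {
    struct _LIST_ *prv, *nxt;
    int h;
} LIST;

void LIST_init(LIST *L) { L->prv = L; L->nxt = L; }   /* empty list */

void LIST_ins(LIST *l, LIST *p) {   /* insert l just after p */
    p->nxt->prv = l; l->nxt = p->nxt; p->nxt = l; l->prv = p;
}

void LIST_del(LIST *l) {            /* l keeps its own pointers */
    l->nxt->prv = l->prv; l->prv->nxt = l->nxt;
}

void LIST_recov(LIST *l) { LIST_ins(l, l->prv); }

/* Copy the values from root->nxt up to (not including) root into out[];
   returns the number of cells. */
int LIST_to_array(LIST *root, int *out) {
    int n = 0;
    for (LIST *p = root->nxt; p != root; p = p->nxt) out[n++] = p->h;
    return n;
}
```

For instance, with the list 1 2 3 4, deleting 2 and then 3 leaves 1 4. Recovering 3 and then 2 restores 1 2 3 4.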

Usages of List

- Insert 1 to n into a list one by one, so that i is inserted right after j, chosen at random from 1…i-1 (this generates a random permutation in linear time)
- Jobs on a time scale: new jobs come, and some jobs are canceled
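A C sketch of the random-permutation example. It uses a list realized by arrays (cell 0 is the root ●, cell i holds value i), which gives constant-time access to cell j. One assumption differs slightly from the text: j is drawn from the root and 1…i-1, so that i can also be placed first. With i equally likely positions at each step, every permutation is equally likely, apart from the small bias of rand() % i. The name random_perm is hypothetical.

```c
#include <stdlib.h>

/* Write a random permutation of 1..n into perm (room for n ints).
   Returns 0 on success, 1 if memory allocation fails. */
int random_perm(int n, int *perm) {
    int *nxt = malloc(sizeof(int) * (n + 1));
    int *prv = malloc(sizeof(int) * (n + 1));
    if (nxt == NULL || prv == NULL) { free(nxt); free(prv); return 1; }
    nxt[0] = prv[0] = 0;                 /* empty list: the root points to itself */
    for (int i = 1; i <= n; i++) {
        int j = rand() % i;              /* 0 (root) or a cell among 1..i-1 */
        /* insert cell i right after cell j, in O(1) time */
        prv[nxt[j]] = i;
        nxt[i] = nxt[j];
        nxt[j] = i;
        prv[i] = j;
    }
    int k = 0;
    for (int p = nxt[0]; p != 0; p = nxt[p]) perm[k++] = p;  /* read the list */
    free(nxt);
    free(prv);
    return 0;
}
```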

A Sophisticated Usage

We have n pairs of values $(x_1, y_1), \ldots, (x_n, y_n)$. For each k = 1, …, n and each k' = 1, …, m, we want to know the $\lceil kk'/m \rceil$-th largest x value among the pairs whose y has rank at most k, that is, the k'/m quantile of x within those k pairs.
- A straightforward method takes $O(n^2 \log n)$ time (sorting once for each k)
- A sophisticated algorithm using a list takes only $O(n(m + \log n))$ time

A Sophisticated Usage (2)

- Make a list of the pairs sorted by x
- Store pointers to the $\lceil nk'/m \rceil$-th largest values for all k' = 1, …, m
- Make a list of the pairs sorted by y and trace it from the largest; at the same time, remove the currently visited pair from the first list
- Update the positions of ranks 1/m to m/m. This is done by shifting each position to the right or left, depending on where the removed cell lies, since we know the number of cells on its left and right sides
Intuitively, the complexity improves from $O(n^2 \log n)$ to $O(n(m + \log n))$, so it is about n times faster when m = O(log n).

Single Link List

If we are always given the previous cell for insertion/deletion, the cells do not need pointers to the previous cell, only to the next cell.
- Operations become limited, but memory use, programs and speed become more efficient
Examples:
- Making slides?
- Merging sorted sequences of numbers
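Merging sorted sequences needs only next-pointers, because cells are always appended after the current last cell. Below is a C sketch with the hypothetical names SLIST and SLIST_merge. A temporary dummy cell plays the role of the root, so the first append needs no special case.

```c
#include <stddef.h>

typedef struct _SLIST_ {
    struct _SLIST_ *nxt;   /* next cell; NULL ends the list */
    int h;                 /* value */
} SLIST;

/* Merge two lists sorted in increasing order into one sorted list by
   relinking the existing cells (no allocation). Returns the new head. */
SLIST *SLIST_merge(SLIST *a, SLIST *b) {
    SLIST dummy;                 /* temporary root: dummy.nxt is the head */
    SLIST *last = &dummy;        /* current last cell of the merged list */
    while (a != NULL && b != NULL) {
        if (a->h <= b->h) { last->nxt = a; a = a->nxt; }
        else              { last->nxt = b; b = b->nxt; }
        last = last->nxt;
    }
    last->nxt = (a != NULL) ? a : b;   /* append the remaining cells */
    return dummy.nxt;
}
```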

List Realized by Array

A list can be realized by arrays of cells instead of bothersome pointers, which bring memory-allocation issues, segmentation faults, and so on.
Advantage: cells have indices, so we can easily attach a weight or an extra value to every cell just by allocating an array.
Disadvantage: resizing an array is costly.
Many real-world applications need a fixed number of cells, so this is no disadvantage for them. In this case, all cells are stored in one structure:

    typedef struct {
        int *prv;  // index to previous
        int *nxt;  // index to next
        int *h;    // value
    } ALIST;

Example of Array List

The i-th cells of the arrays h, prv and nxt are the h, prv and nxt of cell i. Consider the first (or last) cell as the root of the list.
[Figure: arrays h, prv and nxt for cells 0 to 4, with cell 0 as the root ●]
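A C sketch of the basic operations on the array-based list, using the ALIST structure from the slides with cell 0 as the root ●. ALIST_alloc, ALIST_ins, ALIST_del and ALIST_free are hypothetical names. ALIST_ins and ALIST_del make the same pointer changes as LIST_ins and LIST_del, with indices in place of pointers.

```c
#include <stdlib.h>

typedef struct {
    int *prv;  /* index to previous */
    int *nxt;  /* index to next */
    int *h;    /* value */
} ALIST;

/* Allocate cells 0..n and make the list empty. Returns 0 on success.
   Call ALIST_free even on failure; free(NULL) is harmless. */
int ALIST_alloc(ALIST *L, int n) {
    L->prv = malloc(sizeof(int) * (n + 1));
    L->nxt = malloc(sizeof(int) * (n + 1));
    L->h   = malloc(sizeof(int) * (n + 1));
    if (!L->prv || !L->nxt || !L->h) return 1;
    L->prv[0] = L->nxt[0] = 0;   /* root points to itself: empty list */
    return 0;
}

/* Insert cell l just after cell p. */
void ALIST_ins(ALIST *L, int l, int p) {
    L->prv[L->nxt[p]] = l;
    L->nxt[l] = L->nxt[p];
    L->nxt[p] = l;
    L->prv[l] = p;
}

/* Remove cell l; its own prv/nxt are kept, as with LIST_del. */
void ALIST_del(ALIST *L, int l) {
    L->prv[L->nxt[l]] = L->prv[l];
    L->nxt[L->prv[l]] = L->nxt[l];
}

void ALIST_free(ALIST *L) { free(L->prv); free(L->nxt); free(L->h); }
```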

Bucket

Queues and lists are useful, but not so much for search.
- e.g., finding all values with one digit
Some structure could make search efficient. A simple case is classification by value, since we usually want to find values near a given key.

Idea of Bucket

We prepare one structure (array, list, etc.) for each value, and thereby the values are classified.
Ex) We are given numbers from 0 to 99 and classify them by their ten's digit into 10 buckets, 0 to 9.
Each structure is realized by a list, since we do not know the sizes of the structures until all the numbers have been input.
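A C sketch of the ten's-digit example, with each bucket kept as a singly linked list stored in arrays. head[d] and tail[d] are the first and last items of bucket d, nxt[i] links item i to the next item of the same bucket, and -1 marks the end. The name bucket_by_tens, the macro NB and the array layout are assumptions. Appending at the tail keeps the input order inside each bucket.

```c
#define NB 10   /* number of buckets: ten's digits 0..9 */

/* Classify the numbers a[0..n-1] (each in 0..99) by their ten's digit.
   head and tail need NB entries, nxt needs n entries. */
void bucket_by_tens(const int *a, int n, int *head, int *tail, int *nxt) {
    for (int d = 0; d < NB; d++) head[d] = tail[d] = -1;   /* all buckets empty */
    for (int i = 0; i < n; i++) {
        int d = a[i] / 10;             /* ten's digit */
        nxt[i] = -1;                   /* item i becomes the last of its bucket */
        if (head[d] == -1) head[d] = i;
        else nxt[tail[d]] = i;         /* append after the current last item */
        tail[d] = i;
    }
}
```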

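Usage of Bucket

- Sorting numbers by their ten's digit
- Transposing a sparse matrix. The rows A: 1, 9, 5; B: 1, 2; C: 1, 6, 7; D: 4, 5 list the column indices of their entries. Put each row name into the bucket of each of its columns; then bucket 1 holds A, B, C, bucket 5 holds A, D, and so on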
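The sparse-matrix transposition can be sketched in C as follows. It assumes a compressed-row layout: the entries of row r are col[row_start[r]] to col[row_start[r+1]-1]. The name transpose_buckets is hypothetical. The bucket lists live in arrays: head[c]/tail[c] per column, nxt[e] per entry, and -1 marks the end. Scanning the rows in order and dropping each entry into the bucket of its column gives, for every column, the rows containing it in increasing order.

```c
/* Bucket the entries of a sparse matrix by column.
   head, tail: ncols entries; nxt, row_of: one entry per matrix entry. */
void transpose_buckets(int nrows, int ncols, const int *row_start, const int *col,
                       int *head, int *tail, int *nxt, int *row_of) {
    for (int c = 0; c < ncols; c++) head[c] = tail[c] = -1;  /* empty buckets */
    for (int r = 0; r < nrows; r++) {
        for (int e = row_start[r]; e < row_start[r + 1]; e++) {
            int c = col[e];
            row_of[e] = r;             /* entry e belongs to row r */
            nxt[e] = -1;
            if (head[c] == -1) head[c] = e;
            else nxt[tail[c]] = e;     /* append: rows stay in increasing order */
            tail[c] = e;
        }
    }
}
```

With rows A to D numbered 0 to 3 as in the example, the bucket of column 1 lists rows 0, 1, 2 (A, B, C).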

Application: Radix Sort

Buckets can sort numbers by one digit in linear time. Using this, we repeatedly sort the numbers from the lowest digit to the highest, keeping the previous order among ties. We seem to need two sets of buckets, but one is enough: we insert all numbers into the buckets keeping their order, and re-sort by scanning the buckets.
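A C sketch of radix sort on non-negative ints, one decimal digit per pass. Each pass distributes the numbers into 10 buckets, appending at the tail so that ties keep their previous order, and then reads buckets 0 to 9 back. Base 10 and the name radix_sort are assumptions of this sketch. It takes O(d(n + 10)) time for numbers of d digits.

```c
#include <stdlib.h>

/* Sort a[0..n-1] (non-negative) in increasing order.
   Returns 0 on success, 1 if memory allocation fails. */
int radix_sort(int *a, int n) {
    int head[10], tail[10], max = 0;
    int *nxt = malloc(sizeof(int) * n), *tmp = malloc(sizeof(int) * n);
    if (n > 0 && (nxt == NULL || tmp == NULL)) { free(nxt); free(tmp); return 1; }
    for (int i = 0; i < n; i++) if (a[i] > max) max = a[i];
    for (long long div = 1; max / div > 0; div *= 10) {   /* one pass per digit */
        for (int d = 0; d < 10; d++) head[d] = tail[d] = -1;
        for (int i = 0; i < n; i++) {                     /* distribute */
            int d = (int)(a[i] / div % 10);
            nxt[i] = -1;
            if (head[d] == -1) head[d] = i; else nxt[tail[d]] = i;
            tail[d] = i;
        }
        int k = 0;                                        /* collect in bucket order */
        for (int d = 0; d < 10; d++)
            for (int i = head[d]; i != -1; i = nxt[i]) tmp[k++] = a[i];
        for (int i = 0; i < n; i++) a[i] = tmp[i];
    }
    free(nxt);
    free(tmp);
    return 0;
}
```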

Hash

Buckets are useful when the values are classified finely, but then we need many buckets.
- We need a huge memory; moreover, scanning the buckets takes a long time…
So is there a trade-off, such as high but not perfect accuracy with moderate memory? Further, we can restrict ourselves to just “find this value”, neither “larger than this” nor “between XXX and YYY”.

Idea of Hash

Consider string data. The question “Is string S inside this bucket?” would be answered quickly if we prepared buckets for all possible strings.
- This needs much memory space
However, we can usually assume that there are not many strings. So let us use the first two letters for the bucket classification.
- Two strings fall into the same bucket even if they differ from the third letter on; this reduces the memory for buckets
- If a bucket is empty, checking its contents is light; but if it contains many strings, the check involves a long scan, so the operation is heavy

Bucket for Strings

“Don't buckets accept only numbers?” Yes! So we have to convert a string to a number (index) and classify the strings according to the index.
Ex) 1: ABCABC, 2: ABBBBB, 3: CCCBBB
The first two letters are converted to a number. When the alphabet size is three, let A = 0, B = 1, C = 2 and read the two letters as a two-digit number in base 3. Then AB → 1 and CC → 8, so strings 1 and 2 go to bucket 1 and string 3 goes to bucket 8.
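The conversion above as a tiny C function: the first two letters are read as a two-digit base-3 number, giving 9 buckets (0 to 8). The name first_two_index is hypothetical, and the string is assumed to start with two letters from {A, B, C}.

```c
/* Bucket index of a string over {A, B, C}: its first two letters read
   as a two-digit base-3 number with A = 0, B = 1, C = 2. */
int first_two_index(const char *s) {
    return (s[0] - 'A') * 3 + (s[1] - 'A');
}
```

first_two_index("ABCABC") and first_two_index("ABBBBB") both return 1, and first_two_index("CCCBBB") returns 8.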

Bias of Distribution

Sometimes the first two letters are not uniformly distributed in the data, e.g., in English words (“st”, “th” and “re” are frequent).
- Some buckets get many strings and others get few…
So can we use a better mapping function instead of “the first two letters”? Such a function is called a hash function, and its value for a datum is called the hash key, or hash value.
- Further, considering real-world applications, similar data should have very different hash values
Ex) For x_1, x_2, x_3, … and B buckets, use $\left(x_1^1 + x_2^2 + x_3^3 + \cdots\right) \bmod B$ or $\left(((x_1+1)x_2+1)x_3 \cdots\right) \bmod B$
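Both example hash functions, sketched in C for a string whose characters (as unsigned char values) are x_1, x_2, x_3, …, with B buckets. The names hash_powers and hash_nested are hypothetical. Reducing modulo B after every addition and multiplication gives the same remainder as reducing the exact value once, and keeps the numbers small. B is assumed to be below 2^32, so products fit in unsigned long long. hash_powers computes each power by repeated multiplication, which is quadratic in the string length but keeps the sketch short.

```c
#include <stddef.h>

/* (x_1^1 + x_2^2 + x_3^3 + ...) mod B */
unsigned long long hash_powers(const char *s, unsigned long long B) {
    unsigned long long sum = 0;
    for (size_t i = 0; s[i] != '\0'; i++) {
        unsigned long long x = (unsigned char)s[i] % B, p = 1;
        for (size_t k = 0; k <= i; k++) p = p * x % B;   /* x^(i+1) mod B */
        sum = (sum + p) % B;
    }
    return sum;
}

/* (((x_1 + 1) x_2 + 1) x_3 ...) mod B */
unsigned long long hash_nested(const char *s, unsigned long long B) {
    if (s[0] == '\0') return 0;
    unsigned long long h = (unsigned char)s[0] % B;       /* x_1 */
    for (size_t i = 1; s[i] != '\0'; i++)
        h = (h + 1) % B * ((unsigned char)s[i] % B) % B;  /* (h + 1) * x_{i+1} */
    return h;
}
```

With B = 1000, "ab" (x_1 = 97, x_2 = 98) gives 97 + 98^2 = 9701 → 701 for the first function and (97 + 1) · 98 = 9604 → 604 for the second. Changing the last letter of "abc" to "abd" moves the second hash from 895 to 500.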

Determine the Size

How can we determine the size of a hash (the number of buckets)? Basically, every bucket should hold few values, in particular 1 or 2.
- We have n values to begin with, so O(n) buckets is acceptable
Then we can set the number of buckets to a constant multiple of n; in particular, there is no loss in space complexity. As with stacks, we double the size when the hash overflows, re-inserting the values into the new buckets.
- No loss in time complexity
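A C sketch of a chained hash table of ints. It doubles its bucket array whenever the number of values reaches the number of buckets, which keeps about one value per bucket on average. The names HASH, HCELL and HASH_*, and the hash key (the value as unsigned, modulo the number of buckets), are assumptions of this sketch. A real application would use a better hash function, such as those on the previous slide. Since the key depends on the number of buckets, growing moves every cell to its new bucket. HASH_init needs at least one bucket.

```c
#include <stdlib.h>

typedef struct _HCELL_ { struct _HCELL_ *nxt; int h; } HCELL;

typedef struct {
    HCELL **b;   /* bucket array: b[i] is a singly linked list */
    size_t nb;   /* number of buckets */
    size_t n;    /* number of stored values */
} HASH;

static size_t HASH_key(int a, size_t nb) { return (size_t)(unsigned)a % nb; }

int HASH_init(HASH *H, size_t nb) {          /* returns 0 on success */
    H->b = calloc(nb, sizeof(HCELL *));      /* all buckets empty */
    H->nb = nb;
    H->n = 0;
    return H->b == NULL;
}

/* Double the bucket array and move every cell to its new bucket. */
static int HASH_grow(HASH *H) {
    size_t nb = H->nb * 2;
    HCELL **b = calloc(nb, sizeof(HCELL *));
    if (b == NULL) return 1;
    for (size_t i = 0; i < H->nb; i++) {
        HCELL *c = H->b[i];
        while (c != NULL) {
            HCELL *next = c->nxt;
            size_t k = HASH_key(c->h, nb);
            c->nxt = b[k];                   /* push onto the new bucket */
            b[k] = c;
            c = next;
        }
    }
    free(H->b);
    H->b = b;
    H->nb = nb;
    return 0;
}

int HASH_find(const HASH *H, int a) {        /* 1 if a is stored, else 0 */
    for (HCELL *c = H->b[HASH_key(a, H->nb)]; c != NULL; c = c->nxt)
        if (c->h == a) return 1;
    return 0;
}

/* Insert a (duplicates are ignored). Returns 0 on success, 1 on failure. */
int HASH_ins(HASH *H, int a) {
    if (HASH_find(H, a)) return 0;
    if (H->n == H->nb && HASH_grow(H)) return 1;   /* full: double first */
    HCELL *c = malloc(sizeof(HCELL));
    if (c == NULL) return 1;
    size_t k = HASH_key(a, H->nb);
    c->h = a;
    c->nxt = H->b[k];
    H->b[k] = c;
    H->n++;
    return 0;
}

void HASH_free(HASH *H) {
    for (size_t i = 0; i < H->nb; i++) {
        HCELL *c = H->b[i];
        while (c != NULL) { HCELL *next = c->nxt; free(c); c = next; }
    }
    free(H->b);
}
```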

Summary

- Stack and queue: a combination of an array and counters, suited to sequentially arriving data
- List: stores the adjacency relation between data, so that the order of the data is kept efficiently under insertion and deletion
- Bucket: makes search easy by classifying the data by their values
- Hash: buckets with hash keys, keeping both classification accuracy and memory efficiency high
