This presentation is intended to be viewed in slideshow mode


1 Linked Lists

2  Linked Lists Introduction and Motivation
Linked Lists versus Linear Lists
Abstract Properties and Operations
Animation of a Dynamic Linear Structure Implemented as a Linked List
Building (Insertion Into) a Linked List: Simplest Case
Traversal and Traversal-Based Operations
Variations, Embellishments, and Elaborations
  Bi-directional Lists
  Circular Lists
  Headed Lists
Summary

3 Motivation: Both Real-World Engineering and Pedagogical
Linked lists are the simplest of the fundamental computer science objects known as dynamically linked structures.
They are dynamic in that their size can vary during execution as individual items are inserted into or removed from the list, like the list of students in CS315, which can and usually does change after the semester starts as some students add the course and others drop it.
Linked lists are extremely useful in and of themselves (e.g., the CS315 roster), and they provide the basis for even more interesting (both theoretically and practically) objects such as stacks, queues, and binary trees, the most beautiful data structures in the universe.
Implementation of even the simplest linked list operations requires careful thought, selection among multiple design alternatives, and excruciating care with the coding details: all skills that apprentice software engineers need to cultivate.

4 Linked Lists and Linear Lists
Linked lists are one of two common implementations of a slightly more abstract concept known as a linear list (a one-dimensional array is the other implementation).
Unlike, for example, binary trees, the abstract properties of linear lists are not, by themselves, that fascinating to me; what makes them worth our study here is their tremendous real-world utility and the importance of mastering the linked implementation alternatives that we'll use throughout CS315 (and that you'll often use professionally later).
But I promised in my introduction to this course that we would begin our discussion of every data structure this semester with its abstract properties, so I'll include them here for logical consistency, but I'll be brief (for me ;-).

5 Abstract Properties of Linear Lists
Illustration: George Washington → John Adams → Thomas Jefferson → … → George Bush → Barack Obama
Each item in the list except the first has a unique predecessor, and each except the last has a unique successor.
The uniqueness of the predecessor/successor relationship means that the list can be laid out as illustrated above, and is the reason that simple lists are often known as linear lists.
A list may or may not be empty but, in the abstract, there is no maximum size.
Note that an array does have a maximum size, the one you declare, and, as far as C is concerned, its size is constant.
This is a theme we'll see again and again: an implementation of an abstract data structure may induce properties or limitations that the abstract structure itself does not have.

6 Abstract Operations on Linear Lists
Traversal: visiting (e.g., printing out) each item in the list; it comes in two flavors, first-to-last and last-to-first.
Search or find: determining whether a given item is present.
Deletion: of a given item.
Note that insertion can't be defined without more information. There are many possible places where insertion could occur:
  In front of the first item
  Behind the last
  Somewhere in the middle (where?)
So a definition of "insertion" will require more knowledge about where and why; for linear lists, its behavior is not uniquely defined just from the linearity property.

7 Abstract Operations on Linear Lists (cont’d)
Any abstract data structure has certain operations and characteristics defined even in the abstract, regardless of implementation.
Whether or not a given application needs a given operation on some data structure is an engineering issue, not a theory-of-data-structures one.
As an engineer, you need to know what the properties and operations are so that you can pick an appropriate structure for your problem; it may have more than you need, but it had better not have less.

8 Abstract Operations on Linear Lists (cont’d)
For linear lists, the abstract properties and operations may seem pretty obvious, and hence this discussion mostly unnecessary.
But later this semester we'll be dealing with data structures whose properties and operations are much less obvious, and I want to start out doing things properly, i.e., defining our structures in the abstract before moving to questions of implementation.

9  Linked Lists Introduction and Motivation
Linked Lists Versus Linear Lists
Abstract Properties and Operations
Animation of a Dynamic Linear Structure Implemented as a Linked List
Building (Insertion Into) a Linked List: Simplest Case
Traversal and Traversal-Based Operations
Variations, Embellishments, and Elaborations

10 Example of a Dynamically Linked Data Structure
Suppose we want to write a program that works with a set of numbers whose cardinality varies widely. Perhaps a variable number of temporary data entry clerks input the numbers one at a time until they get tired and tell our program "no more"; we couldn't know in advance how many clerks we would have available on any given day, nor how many items any given clerk would actually enter.
If our code declared too big an array, most of the time we'd be wasting most of it.
But if we declared too small an array and filled it up, we couldn't keep going without major problems.
A dynamic data structure whose size can increase one cell at a time whenever we want it to is what we need.
A linked list implemented with pointers is an important technique for implementing such a dynamic data structure.

11 Dynamically Growing a Linked List
We'll need a pointer to the start of our list; it should be appropriately named. For this example, we'll be stunningly original and call it "start".
Initially, before the operator provides any input numbers, the list is empty.
A pointer value of NULL is a standard way to indicate the end of the list, so if start is NULL (pointing nowhere), the list is empty.
A NULL pointer is typically shown graphically by an arrow pointing to ground, using the electrical engineer's symbol for a ground.
When we get a number, we use malloc to dynamically obtain some storage space for it and link it into the list.
The next time we get a new number, we again get some new space for it and link it in to the end of the growing list.
And so on ...
Diagram: start → 18 → -2 → 461 → 23 → 314 → ground (NULL)
Note that the list elements are not simple integer cells but structures of some sort, since each must have space for a pointer as well as the integer.
Notice that the structures in the list have no names; they can only be accessed via some pointer. Names are part of the communication from the programmer to the compiler, and these objects are not compiled objects: they are created dynamically during program execution, when the compiler is long gone.

12  Linked Lists Building (Insertion Into) a Linked List: Simplest Case
Introduction and Motivation
Building (Insertion Into) a Linked List: Simplest Case
Traversal and Traversal-Based Operations
Variations, Embellishments, and Elaborations

13 Example of Building a Linked List
The previous animation was designed for simplicity of imagery, to get us started visualizing the concept of a linked list.
There, start pointed to the original (oldest) item in the list, with new items inserted to the right, at the far end of the list, after previous items; the list grew in the direction of its links.
Diagram: start → (older items) → (newer items)
Although the code for this version of insertion is simple enough, it's not actually the very simplest version to code. Insertion at the front of the list, with new items inserted before (to the left of) previous items, is slightly simpler, so to keep our sample code as simple as possible, that's what we'll look at.
Later, we'll also look at why and how we might insert in the middle of the list rather than at one of the ends.
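For comparison with the front-insertion code that follows, here is a minimal sketch of insertion at the far end of the list, as in the earlier animation. The function name appendAtEnd and its return-the-start convention are assumptions made for illustration; they are not from the slides.

```c
#include <stdlib.h>

typedef struct listItem {
    int someInteger;
    struct listItem *next;
} LIST_ITEM;

/* Hypothetical helper (not from the slides): append value at the far end.
   Returns the start of the list, which changes only if the list was empty. */
LIST_ITEM *appendAtEnd(LIST_ITEM *start, int value)
{
    LIST_ITEM *newItem = malloc(sizeof(LIST_ITEM));
    if (newItem == NULL)
        return start;              /* out of memory: leave the list unchanged */

    newItem->someInteger = value;
    newItem->next = NULL;          /* the new item becomes the end of the list */

    if (start == NULL)
        return newItem;            /* empty list: new item is also the start */

    LIST_ITEM *last = start;
    while (last->next != NULL)     /* walk to the current last item */
        last = last->next;
    last->next = newItem;          /* link the new item in after it */
    return start;
}
```

The extra walk to find the last item, plus the empty-list special case, is what makes this version slightly more work than inserting at the front.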

14 Animation for the Code Example We’re About to Do
start will always point to the most recent item in the list, with new items inserted in front (to the left) of older items.
And we'll color code the various components of our structure to help us tell them apart.
This is what we'll code (color legend: integer, pointer to our structure, start).

15 The Creation of a Linked List: The First Insertion
    typedef struct listItem { int someInteger; struct listItem *next; } LIST_ITEM;
    LIST_ITEM *start = NULL, *temp;
    while (???)
    {
        temp = malloc(sizeof(LIST_ITEM));
        printf("Enter your integer: ");
        scanf("%d", &(temp->someInteger));
        temp->next = start;
        start = temp;
    }

The typedef: This definition defines a recursive structure, also known as a self-referential structure, since each item of this type contains a pointer to another item of the same type. Here it's embedded within a typedef just to make the rest of the program a little easier to read. Look at the next line and think about how you'd write it if there were no typedef. That's one of the major uses for a typedef: brevity and improved comprehensibility.

The start pointer: Now that our typedef is set up, we need to declare a pointer to the start of the list, like the one we saw in the simpler animation earlier. Without the typedef, this line would be struct listItem *start = NULL; which is completely equivalent, except possibly for readability. Note that the name LIST_ITEM needn't have any relationship to the name listItem as far as the compiler is concerned. Stylistically, however, it seems a good idea to make them related somehow, and C is case sensitive, so all caps for typedefs is a common convention (but not a requirement). Also note that at this point we actually have a completely valid linked list; it's just empty at the moment.

The temp pointer: We'll also need a temporary pointer to hold the address returned by malloc for a new cell before it's linked in to the list. Note that we still need to put the * in front of the variable name temp to indicate that what we want to declare here is a pointer to a LIST_ITEM, not a LIST_ITEM itself, which, I'm sure you remember, is what we'd get if we left out the *.

The loop: Now let's look at the logic to add a new LIST_ITEM into the list. We'll do this inside of a while loop since we don't know how many items the user will want to add.

The malloc call: First, we need to get some new memory for our new cell with our old friend the malloc function. Note the use of sizeof(LIST_ITEM) to make sure that malloc obtains the right amount of space for us. The malloc will return the address of the new memory it just allocated to us. We need to store that address in a pointer variable somewhere or we'll never be able to use this new chunk of memory in the future. Although this very first time through the insertion loop we could store the address returned by malloc directly into start instead of temp, that won't work for any subsequent insertions. On the next slide, we'll see that the logic we're writing here is general enough that it actually works for all our insertions.

The printf and scanf calls: Now we'll ask the user to provide an integer value for our new list item. We want to store the user input in the integer cell that can be referenced as temp->someInteger. Note that the & is used normally to provide scanf with the address of the storage for the user's input. As it happens, we could have left out the parentheses and just written &temp->someInteger, but unless one remembers the operator precedence tables, better safe than sorry; and I think the version here is easier to read in any event, no?

temp->next = start; Since we want to insert our new LIST_ITEM in front of the current start of our list, we need to make our new one point to the item currently at the front, i.e., the one pointed to by start. At this point in our example there aren't any items in the list, so we just wind up setting temp->next to NULL (the current value of start), which is fine: it indicates that this item is the last item in the list, which it will be. When there's only one item in the list, as there is for now, it is both the start and the end of the list, no?

start = temp; This statement, by setting start equal to temp, makes start point to the same place as temp does, thus completing the insertion of the new LIST_ITEM into our list at the very front. Note that this statement and the one before have to be done in exactly the right order or the list will wind up looking rather strange!

Diagram: start → 37, with temp also pointing to 37

At this point we have completed the insertion of the first element into a previously empty list. The temp pointer actually still points into our list, too; but so what? We're done with it for now and won't refer to it again until we create a new item. Our list is the structure (set of items) pointed to by start; the existence and properties of that list are not affected by the existence of other "leftover" pointers pointing to any of those items. start, or whatever other name you choose in your code, always points to the complete list; temp is just that, a "working" variable used to help construct the list, but not part of the list itself. A more descriptive (i.e., better) name for it would have been newItemForInsertion, but I didn't have room for anything that long on these charts.

16 The Creation of a Linked List: More Insertions
    typedef struct listItem { int someInteger; struct listItem *next; } LIST_ITEM;
    LIST_ITEM *start = NULL, *temp;
    while (???)
    {
        temp = malloc(sizeof(LIST_ITEM));
        printf("Enter your integer: ");
        scanf("%d", &(temp->someInteger));
        temp->next = start;
        start = temp;
    }

This little while loop of five statements that we just developed (two of them are printf and scanf, which are really not about linked lists at all) is all it takes to build a linked list as long as we want, until our operator tells us to quit, for example, because it's 5PM.
Let's watch it at work a few more times in our loop:
  Make the new item point to the item currently at the front of the list, thus positioning the new item in front of the old start of the list.
  Reset the start pointer to point to the new item that is now at the front of the list.
  And so on …
Diagram: start → -2 → 23 → 5 → 37, with temp pointing to the most recently inserted item
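The loop above can be packaged as a compilable sketch. The slides leave the loop condition open (the ???), so the stopping rule used here, quitting when scanf fails to read an integer, is purely an assumption, as are the function names insertAtFront and buildList and the added check of malloc's return value.

```c
#include <stdio.h>
#include <stdlib.h>

typedef struct listItem {
    int someInteger;
    struct listItem *next;
} LIST_ITEM;

/* Hypothetical wrapper around the list statements from the loop body.
   Returns the new start of the list. */
LIST_ITEM *insertAtFront(LIST_ITEM *start, int value)
{
    LIST_ITEM *temp = malloc(sizeof(LIST_ITEM));
    if (temp == NULL)
        return start;          /* assumption: on failure, leave list as is */
    temp->someInteger = value;
    temp->next = start;        /* new item points at the old front ...     */
    start = temp;              /* ... and then becomes the new front       */
    return start;
}

/* Assumed stopping rule (the slides show "???"): stop at end of input
   or at anything that is not an integer. */
LIST_ITEM *buildList(void)
{
    LIST_ITEM *start = NULL;
    int value;

    printf("Enter your integer: ");
    while (scanf("%d", &value) == 1) {
        start = insertAtFront(start, value);
        printf("Enter your integer: ");
    }
    return start;
}
```

Reading into a local int first, rather than directly into temp->someInteger, is a small departure from the slide code; it lets the sketch avoid allocating a cell for input that turns out not to be a number.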

17  Linked Lists Traversal and Traversal-Based Operations
Introduction and Motivation
Building (Insertion Into) a Linked List: Simplest Case
Traversal and Traversal-Based Operations
  Traversal
  Search/Find
  Deletion
  Insertion into the middle (building an ordered list)
Variations, Embellishments, and Elaborations

18 Traversal: A Fundamental Operation on Linked Lists (and Linked Structures in General, For That Matter)
To traverse a data structure is to go through it one item at a time, "visiting" each item in turn. The purpose of the visit is application dependent; maybe, for example, we just want to add up all the values in the list, or maybe we're searching to find some specific value. For our example here, where we're concerned with the mechanics of traversal and not what we do on a visit, let's just print out the integer in each item as we visit it.

We'll need a traversal pointer to keep track of where we currently are in the list. We'll initialize it to the start of the list, since, as the name suggests, that's the usual starting point for doing much of anything with a linked list ;-)

Note that we do not want to use our start pointer as our traversal pointer. If start is the only pointer we have to the start of our list and we change it, as we're certainly going to do with our trvPtr, we'll lose forever all ability to access the actual start of our list. (In the animation, once start has been advanced past the first item: "We've lost access to -2 and can never get it back.") So we need a separate traversal pointer that we can change.

Here's the traversal code. It's pretty simple; let's step through it:

    LIST_ITEM *trvPtr = start;
    while (trvPtr != NULL)
    {
        printf("%d", trvPtr->someInteger);
        trvPtr = trvPtr->next;
    }

Since the traversal pointer is still pointing to a valid item, we're not done with the traversal yet, so let's visit the current item and then move on to the next one.

The statement trvPtr = trvPtr->next; has syntax and semantics that are very typical of those we use in navigating linked structures in general, so let's make sure we understand the details of how it works:
  Since this is an assignment statement, we're going to change the bit pattern that's stored in trvPtr. Since trvPtr is a pointer, that means we're changing what it points to.
  But first, to know what to store in trvPtr, we have to evaluate the expression trvPtr->next.
  Currently, trvPtr contains the address of (designates, or points to) the current structure.
  So the current value of the expression trvPtr->next is the bit pattern stored in the component named next of the structure currently pointed to by trvPtr.
  Since next is a pointer, what's stored there is an address, which we represent in these diagrams as an arrow, so the value of the expression trvPtr->next is this arrow.
  It is this address (arrow) that will be stored in trvPtr by the assignment statement we're currently executing.
  So the result of this statement is to advance trvPtr to the next item in our list.

As long as trvPtr still points to a valid item, we're still not done with the traversal. Once trvPtr is NULL, we've reached the end of the list and are therefore done with the traversal.

Diagram: start → -2 → 23 → 5 → 37, with trvPtr advancing along the list one item at a time
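As a variation on the printing example, the "add up all the values" visit mentioned above might be sketched as follows. The function name sumList is hypothetical; the traversal pointer and loop are the same as in the slide code.

```c
#include <stddef.h>

typedef struct listItem {
    int someInteger;
    struct listItem *next;
} LIST_ITEM;

/* Hypothetical example of a traversal whose "visit" adds up the values.
   start is passed by value, so the caller's start pointer is never changed. */
int sumList(const LIST_ITEM *start)
{
    int total = 0;
    const LIST_ITEM *trvPtr = start;      /* separate traversal pointer */

    while (trvPtr != NULL) {
        total += trvPtr->someInteger;     /* the "visit" */
        trvPtr = trvPtr->next;            /* advance to the next item */
    }
    return total;                         /* 0 for an empty list */
}
```

Only the body of the visit differs from the printing version; the initialization, the NULL test, and the advance step are unchanged.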

19 Reminder On Interpreting and Using the Graphics of Linked Structures
Diagram (logical view): start → -2 → 23 → 5 → 37
The arrows portray the logical connectivity, or topology, of our structures, which is all that we really care about.
The picture above emphasizes that this structure is a linear list (one item after another); other topologies are possible and are used for other purposes, and we'll look at some later this semester.
The actual layout in memory need not look anything at all like our logical view, so long as the pointers still provide the correct topology. Here, below, is a partial memory map showing how our linear list might actually be laid out in memory:
Diagram (memory map): cells holding 37, -2, 5, 23, and start, scattered among other memory (•••)

20 Traversal Is a Fundamental Operation
It is important all by itself, e.g., to print out the list of all students enrolled in CS315.
It is used at the beginning of several other key operations:
  Search/find
  Delete
  Insert-in-order

21 Search/Find E.g., Look Up a Phone Number Given a Last Name
Traverse the list, checking each item to see if it is the one being searched for, stopping either when we reach the searched-for item or when we complete the traversal and have no more items to check.
If we find what we were searching for, the search is said to be successful.
If the traversal completes without finding the desired data, the search is unsuccessful; the target item is not present in the list.
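The search just described can be sketched on the integer list used in the earlier slides. The function name findItem and the convention of returning NULL for an unsuccessful search are assumptions for illustration.

```c
#include <stddef.h>

typedef struct listItem {
    int someInteger;
    struct listItem *next;
} LIST_ITEM;

/* Hypothetical search: returns a pointer to the first item holding target,
   or NULL if the traversal completes without finding it. */
LIST_ITEM *findItem(LIST_ITEM *start, int target)
{
    LIST_ITEM *trvPtr = start;

    /* Stop on a match or when we run out of items. The NULL test must come
       first so that we never dereference a NULL pointer. */
    while (trvPtr != NULL && trvPtr->someInteger != target)
        trvPtr = trvPtr->next;

    return trvPtr;   /* non-NULL: successful; NULL: unsuccessful */
}
```

An empty list needs no special handling here: trvPtr starts out NULL, the loop body never runs, and the search is reported as unsuccessful.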

22 Delete a Specified Item
High-level pseudocode:
  Search for the item to be deleted using the standard search/traversal logic.
  If the search is unsuccessful, do nothing, or maybe report that the item to be deleted was not found.
  If the item to be deleted is found, adjust list pointers as necessary to remove it from the list.

There are some interesting issues here; let's look at an example: deleteItem(5);
Diagram: start → -2 → 23 → 5 → 37, with delPtr pointing to the item containing 5

Here's the situation after we have traversed the list and successfully found the item to be deleted. Let's say, for example, that it's the item containing 5 that we want to delete.
As far as the list itself is concerned, all we need to do to delete the item is to adjust one pointer. We know what to set it to:
    ??? = delPtr->next;
But how do we designate the cell that we want to set to delPtr->next? Nontrivial question: just exactly how do we designate the pointer we need to adjust?

Once that pointer has been adjusted, item 5 has been deleted from the list. Note that the item itself still exists; it's just not part of the linked list anymore: item 23 now points to item 37.

As a programming matter, we should save the address of this item in some pointer variable (it's currently in delPtr). Maybe we'd want to insert it into a different structure after we deleted it from this one, e.g., after a student flunks CS315, remove the student record from the list of current Computer Science majors and then insert it into a list of Sociology majors. At the very least, if we were really all done with this item, we'd want to return its storage to the system.
free is the standard library call that gives back memory obtained via malloc, e.g., free(delPtr); free is thus the opposite of malloc. Note that we are freeing up the memory that delPtr points to, not delPtr itself.
As a matter of good programming practice, everything obtained via malloc should eventually be explicitly returned via a free call, as part of a program's cleanup-before-shutdown processing if not before.
But since what we do with an item after deleting it from a structure is application dependent, we won't show this step in our code here; we're only concerned with the theory and practice of deletion.

23 Delete a Specified Item (cont’d) Adjusting the Pointers
Designating the pointer to the item to be deleted (bypassed) will require some additional work. There are two more or less obvious approaches.
One standard method involves the use of two traversal pointers, a leader and a trailer, where the trailer is always kept one item behind the leader so that when the leader finds the item to be deleted, the trailer's next pointer is the one that must be adjusted to actually delete the item pointed to by the leader.
Diagram: start → -2 → 23 → 5 → 37, with trail one item behind lead (for deleteItem(5), trail points to 23 and lead points to 5)

So here's the actual deletion code:
    trail->next = lead->next;
Note that there will be a special case to handle the deletion of the very first item in the list, e.g., deleteItem(-2):
    /* Traverse/search to find the item to be deleted */
    if (lead == start) /* It is the first item that is to be deleted */
        start = start->next;
    else /* "Normal" deletion logic */
        trail->next = lead->next;

Reminder: In terms of code development, do the general case first; in this case, that's when the item to be deleted is somewhere in the middle of the list.
Then figure out if you need special cases, i.e., places where the logic for the general case won't work. Typical special cases involve working at the ends of the list.
In this example, the general purpose logic works correctly to delete the last item but fails to delete the first item properly, so here we only needed one special case. Sometimes you'll need more, sometimes fewer; but when trying to figure out your algorithm, start with the general case.
And start by drawing/annotating pictures showing which pointers get adjusted when, and where they should point.
Only then, when you're sure you've got a workable algorithm, worry about translating it into code. That's known as separation of concerns, a more formal name for doing one thing at a time.
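Putting the leader/trailer search and the two deletion cases together gives something like the sketch below. The function name unlinkWithTrailer, the pointer-to-start parameter (used so the function can change the caller's start), and the choice to hand the unlinked item back to the caller are all assumptions; the slides show only the pointer adjustments.

```c
#include <stddef.h>

typedef struct listItem {
    int someInteger;
    struct listItem *next;
} LIST_ITEM;

/* Hypothetical leader/trailer deletion. Unlinks the first item holding value
   and returns its address (so the caller can reuse or free it), or returns
   NULL if the value is not present. *startPtr is updated when the first
   item is the one removed. */
LIST_ITEM *unlinkWithTrailer(LIST_ITEM **startPtr, int value)
{
    LIST_ITEM *lead = *startPtr;
    LIST_ITEM *trail = NULL;          /* always one item behind lead */

    while (lead != NULL && lead->someInteger != value) {
        trail = lead;
        lead = lead->next;
    }

    if (lead == NULL)
        return NULL;                  /* empty list or value not found */

    if (lead == *startPtr)
        *startPtr = lead->next;       /* special case: deleting the first item */
    else
        trail->next = lead->next;     /* "normal" deletion logic */

    lead->next = NULL;                /* detach the removed item completely */
    return lead;
}
```

Returning the unlinked item keeps its address from being lost, so the caller can decide whether to insert it elsewhere or release its storage.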

24 Another Way to Delete a Specified Item
An alternative to the two pointer (leader/trailer) method is to use only a single, trailing pointer, at the cost of slightly more complicated expressions in both the "find-the-item-to-be-deleted" logic and the actual deletion logic itself. Example: deleteItem(5);
Diagram: start → -2 → 23 → 5 → 37, with delPtr as the single traversal pointer

Here's our old traversal loop, which stops when delPtr points to the item to be deleted, leaving us no way to designate the pointer that needed to be adjusted to actually do the deletion:
    while (delPtr->someInteger != 5)
        delPtr = delPtr->next;
Here's the loop condition we now want to use, so that the loop stops with delPtr pointing to the item before the item to be deleted, whose next field is the one that will have to be adjusted to actually do the desired deletion:
    while ((delPtr->next)->someInteger != 5)
        delPtr = delPtr->next;

It's important to understand the meaning of expressions like this that have multiple -> operators:
  delPtr->next designates the next component of the item pointed to by delPtr.
  Since the next component is itself a pointer type, the value of the expression delPtr->next is an address/arrow that points to the next item of our list.
  Since this item, designated by delPtr->next, is itself a structure that has components, (delPtr->next)->someInteger designates its component named someInteger, so the value of the expression (delPtr->next)->someInteger is 5.
  Similarly, (delPtr->next)->next designates the next field of that same item.

So here's the pointer adjustment logic that does the actual deletion, by copying the value of (delPtr->next)->next into the cell delPtr->next:
    delPtr->next = (delPtr->next)->next;
And we have, as before, deleted item 5 from the list, although, as before, the item itself continues to exist.

Note that if the illustration here were the whole story, we never saved the address of the deleted cell anywhere (it's what used to be in delPtr->next), so now we can't access this cell anymore. We can't even give it back to the OS, since the free call requires the address of the cell to be freed up, and our code has no record of it anymore. Oops: we probably should have saved the old value of delPtr->next somewhere.

C programming note: C evaluates multiple -> operators from left to right, so the parentheses can be omitted from (delPtr->next)->someInteger. E.g., delPtr->next->next evaluates to what we want and, to me, is slightly easier to read; this is a rare case where knowing and taking advantage of the C operator precedence rules and leaving out unnecessary parentheses makes code more readable, not less.

25 Still Need to Check for Special Cases
As before, when we check for problems we'll see that the general case logic doesn't handle all possible cases. To make this general case look truly general, let's replace 5, the specific value used in the last example, with a more general valueToBeDeleted:
    delPtr = start;
    while ((delPtr->next)->someInteger != valueToBeDeleted)
        delPtr = delPtr->next;

One problem with this "single trailing pointer" approach is that since delPtr is initialized to start, the first item the traversal loop above actually examines for the valueToBeDeleted is the second item in the list, i.e., it starts looking at 23, not at -2. And there's no way to initialize delPtr to anything earlier than the start so that this code could check the first item in the list. The result is that this code can't find the item containing -2 and so can't delete it.

Here's the special case our general logic can't handle: deleting the first item in the list. Putting the traversal 'while' loop as one case of an enclosing 'if' statement solves the problem, and the same general case logic we used before correctly handles everything else:
    if (start->someInteger == valueToBeDeleted)
        start = start->next; /* Delete the very first item in the list */
    else /* Traverse to find the valueToBeDeleted */
    {
        delPtr = start;
        while ((delPtr->next)->someInteger != valueToBeDeleted)
            delPtr = delPtr->next;
        delPtr->next = (delPtr->next)->next; /* Do the deletion */
    }

Note that this time the general case traversal is inside an 'if' statement, whereas last time (separate leading and trailing pointers) the traversal logic was the same in all cases and only the actual deletion logic itself had the special case, which resulted in the 'if' statement following the traversal loop.
There's no general pattern to special cases; the only firm rule to follow is that one should develop the general case logic first, then figure out what the special cases are (they vary from problem to problem), then add logic for them wherever and however necessary.

Diagram: start → -2 → 23 → 5 → 37, with delPtr as the single trailing pointer

26 More Special Cases, Still Using Only One (Trailing) Traversal Pointer
We should also deal with the case that the valueToBeDeleted isn't in our list at all, which comes in two flavors:
  The list is empty.
  The list isn't empty but doesn't contain the valueToBeDeleted.
We didn't check for these cases in the two pointer (leading and trailing) code that we developed earlier; we should have. Maybe the same code can cover both cases here, maybe it can't; we'll have to check and see. In any event, let's look here at the complete code for deletion using only a single (trailing) traversal pointer:

    if (start == NULL) /* List is empty */
        printf("Can't delete from an empty list, dummy");
    else if (start->someInteger == valueToBeDeleted)
        start = start->next; /* Delete first item from list */
    else /* Traverse the list looking for the valueToBeDeleted */
    {
        delPtr = start; /* Initialize the traversal pointer */
        while ( (delPtr->next != NULL) &&
                (delPtr->next->someInteger != valueToBeDeleted) )
            delPtr = delPtr->next; /* Move to next item */
        if (delPtr->next == NULL) /* Fell off the end of the list */
            printf("Can't find %d in the list", valueToBeDeleted);
        else /* The traversal loop found the item to be deleted */
            delPtr->next = (delPtr->next)->next; /* Do the deletion */
    }

Walking through the code:
  The first test handles the empty list case.
  The second test is the special case for deleting the first item in the list.
  The search loop looks for the valueToBeDeleted.
  If delPtr->next is NULL afterward, the search loop completed without ever finding the valueToBeDeleted.
  Otherwise the search loop terminated when it found the valueToBeDeleted (that's 5 in the single trailing pointer example illustrated here, not 23, which is the item delPtr points to), and the last line of code deletes it.

Note that we need to add a check to our traversal/search loop to keep us from advancing beyond the end of the list and then trying to dereference a NULL pointer, which would cause our program to blow up. Also note that we are relying on the fact that the && operator is a short-circuit operator to keep our code from trying to evaluate delPtr->next->someInteger when delPtr->next is NULL.

The code above covers all the bases.
Diagram: start → -2 → 23 → 5 → 37, with delPtr pointing to 23 when the search for 5 stops
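The complete single-pointer code above still loses the address of the deleted cell, a problem the earlier slide pointed out. A minimal sketch of one way to address that follows, assuming the caller really is finished with the item and that every item was obtained from malloc; the function name deleteItem matches the slides' examples, but its signature, the 1/0 return value, and the local name doomed are assumptions.

```c
#include <stdlib.h>

typedef struct listItem {
    int someInteger;
    struct listItem *next;
} LIST_ITEM;

/* Hypothetical packaging of the single (trailing) pointer deletion that
   also saves the deleted item's address and frees it.
   Returns 1 if an item was deleted, 0 if the list was empty or the value
   was not found. Assumes every item was allocated with malloc. */
int deleteItem(LIST_ITEM **startPtr, int valueToBeDeleted)
{
    LIST_ITEM *doomed;                       /* saved address of the item */

    if (*startPtr == NULL)                   /* the empty list case */
        return 0;

    if ((*startPtr)->someInteger == valueToBeDeleted) {
        doomed = *startPtr;                  /* special case: first item */
        *startPtr = doomed->next;
        free(doomed);
        return 1;
    }

    LIST_ITEM *delPtr = *startPtr;           /* trailing traversal pointer */
    while (delPtr->next != NULL &&
           delPtr->next->someInteger != valueToBeDeleted)
        delPtr = delPtr->next;

    if (delPtr->next == NULL)                /* fell off the end of the list */
        return 0;

    doomed = delPtr->next;                   /* save before adjusting */
    delPtr->next = doomed->next;             /* do the deletion */
    free(doomed);                            /* give the storage back */
    return 1;
}
```

Saving the address in doomed before adjusting delPtr->next is the step the slide code omits; without it there is nothing left to pass to free.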

27  Linked Lists Traversal and Traversal-Based Operations
Introduction and Motivation
Building (Insertion Into) a Linked List: Simplest Case
Traversal and Traversal-Based Operations
  Traversal
  Search/Find
  Deletion
  Insertion into the middle (Building an Ordered List)
Variations, Embellishments, and Elaborations

28 Ordered Lists
In an ordered list, items are ordered on the basis of some data item, known as the key, as illustrated in the ascending list below.
Traversal, search, and deletion are the same as we saw before, but insertion is different. The earlier (simpler) insertion animations only inserted at one end of the list. Now, to keep the list in order, we'll usually need to insert into the middle, no?
Suppose we wish to insert a node with a key of 24 into the list below. How would you go about it? Here's some pseudocode:
  Traverse the list to find the place to insert the new item so as to preserve the ordering.
  Adjust the appropriate pointers to actually do the insertion.
We need to insert the new 24 node before the 30 one.
Diagram: start → 5 → 10 → 30 → 37, with newItemForInsertion pointing to a new node containing 24
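The pseudocode above might be turned into C along the following lines, reusing the single trailing pointer idea from deletion. This is a sketch under assumptions: the key is the someInteger field, the list is ascending, the new item has already been allocated and filled in, and the function name insertInOrder is hypothetical.

```c
#include <stddef.h>

typedef struct listItem {
    int someInteger;             /* used here as the ordering key */
    struct listItem *next;
} LIST_ITEM;

/* Hypothetical ordered insertion into an ascending list.
   Links newItemForInsertion into place and returns the start of the list,
   which changes only when the new item belongs at the very front. */
LIST_ITEM *insertInOrder(LIST_ITEM *start, LIST_ITEM *newItemForInsertion)
{
    /* Special case: empty list, or the new key belongs before the first item */
    if (start == NULL ||
        newItemForInsertion->someInteger <= start->someInteger) {
        newItemForInsertion->next = start;
        return newItemForInsertion;
    }

    /* General case: stop with trail on the item that should precede the
       new one, i.e., the last item whose key is smaller than the new key */
    LIST_ITEM *trail = start;
    while (trail->next != NULL &&
           trail->next->someInteger < newItemForInsertion->someInteger)
        trail = trail->next;

    newItemForInsertion->next = trail->next;  /* new item points at successor */
    trail->next = newItemForInsertion;        /* predecessor points at new item */
    return start;
}
```

With the list 5, 10, 30, 37 and a new key of 24, the loop stops with trail on 10, so the new node ends up between 10 and 30. As with front insertion, the two pointer assignments have to be done in this order.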

29  Linked Lists Variations, Embellishments, and Elaborations
Introduction and Motivation
Building (Insertion Into) a Linear Linked List
Traversal and Traversal-Based Operations
Variations, Embellishments, and Elaborations
  Bi-directional lists, including fascinating (or at least important ;-) sidebars on:
    Topology again
    l-values, r-values, and how compilers really process an assignment operator; all of which are necessary to understand expressions like a->b->c->d->e = a->b->c->d->e
  Circular Lists
  Headed Lists
Summary

30 Bi-Directional Lists
Just as a structure can have more than one integer or floating point component, if we find it useful for whatever problem we're trying to solve, it can certainly have more than one pointer:
    struct biDirectionalListItem { … struct biDirectionalListItem *previous, *next; };
Since all we're interested in is the list mechanics, I'm not going to bother making up other data items to be part of our list items.
So here's what a bidirectional list might look like:
Diagram: start pointing to a chain of items, each linked to both its predecessor and its successor
Note that here we used multiple (two, to be precise) pointers of the same type, since that's all we need for a bidirectional list. More complex problems (not simply bidirectional lists) might require multiple pointers of multiple types.
Later we'll look at other topologies: multi-linked structures that are not linear (a bi-directional list is still linear), such as orthogonal lists (for sparse matrices) or binary trees, the most beautiful data structures in the universe.

31 Benefits for Bi-Directional Lists
It makes deletion a lot cleaner (neither of our two previous deletion algorithms was exactly elegant, was it?). After we find the item to be deleted, we don’t need leading or trailing pointers to help with the deletion. (Figure: delete(D) on the list start → A, B, C, D, E, with delPtr marking the item to delete.) Insertion into an ordered list is simplified, too, much as deletion is. Traversal in reverse order becomes possible; with a one-directional list it isn’t possible without a great deal of difficulty. Is that important? It depends; some applications need to be able to traverse in both directions, some don’t. Your job, as an engineer, is to know the capabilities, plusses, and minuses of each of the tools/techniques in the standard armamentarium.

32 Sidebar: Programmers and Topologies
It’s up to you, the programmer, to ensure that you set up the pointers so that your lists have the desired topological properties, e.g., bidirectional linearity. If you wanted to, you could use the struct biDirectionalListItem that we defined earlier to make a linked thingy that looked like so: (Figure: a tangle of five items, A through E, reached from start.) I can’t imagine why anyone would want to create a monstrosity like that (although some of you will probably manage something like it the first few times you try to create a bidirectional linear list ;-) but each item in the monster thingy would still be a struct biDirectionalListItem; only the connection topology would be different. C provides features to support pointers and structures; lists are up to you. There are languages that directly support lists and basic list operations (LISP, for example), but neither C nor any of its descendants is among them: the topological properties of linked structures are up to the programmer.

33 Another Sidebar (Fairly Important): The Assignment Operator (or ‘=’ Sign)
The point of this digression is to make sure you know what really happens when you write things like delPtr->next->previous = delPtr->previous and delPtr->previous->next = delPtr->next, the two lines used to do the deletion from the bidirectional list. There’s nothing particularly new or tricky here, but it’s crucial to what we’re doing, so I want to go over the concepts pretty precisely. (Figure: Delete(D) on the list start → A, B, C, D, E, with delPtr marking D and the two assignments delPtr->next->previous = delPtr->previous and delPtr->previous->next = delPtr->next shown against the pointers they change.)
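Those two assignments can be wrapped into a complete deletion routine along the following lines. This is a sketch under stated assumptions: the function name deleteItem, the startPtr parameter, and the NULL checks for the first and last items are additions for illustration, and the data field someInteger is borrowed from the exam-question expression in this deck. The two lines from the slide appear unchanged inside it.

```c
#include <stdlib.h>

struct biDirectionalListItem {
    int someInteger;   /* illustrative data field */
    struct biDirectionalListItem *previous, *next;
};

/* Unlinks delPtr from a NULL-terminated bidirectional list and frees it.
   startPtr is the address of the list's start pointer, which must be
   updated when the first item is the one being deleted. */
void deleteItem(struct biDirectionalListItem **startPtr,
                struct biDirectionalListItem *delPtr)
{
    if (delPtr->previous != NULL)
        delPtr->previous->next = delPtr->next;      /* predecessor skips delPtr */
    else
        *startPtr = delPtr->next;                   /* delPtr was the first item */

    if (delPtr->next != NULL)
        delPtr->next->previous = delPtr->previous;  /* successor skips delPtr */

    free(delPtr);
}
```

No leading or trailing pointer is needed: everything the deletion touches is reachable from delPtr itself.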

34 Another Sidebar: The Assignment Operator In More Detail
delPtr->next->previous = delPtr->previous The compiler must do three things for an assignment (=) For the left side: Figure out the type of the value of the expression to the left of the = sign (e.g., delPtr->next->previous); it must be an address value (pointer type) or the expression won’t compile Generate the code to evaluate that expression in real time when the assignment is executed For the right side: Figure out the type of the value for the expression to the right of the = sign; it must be a type that’s legal for the address from the left-side expression to contain (e.g., delPtr->previous = *diameter won’t compile) Generate the code to store (assign) the computed value from the right hand expression in the memory location whose address will be computed by the code generated for the expression on the left side

35 The Problem with an Overly Simplistic View of the Assignment Operator (cont’d)
But given, for example, the declarations int x,y; if the assignment operation looks like x = y+2, the compiler would seem to have a problem: As far as we know so far, the type of a simple variable, like x, in an expression is the type the variable was declared as, int, in this case But when we look more formally at the details, as we just did, the assignment operator requires the expression on the left to be an address type, not an integer

36 The Solution So here’s the compiler’s real logic:
Ordinarily, the value of a simple variable name like x is what we expect, i.e., the value stored there, and the type of that value is the type the variable was declared as. But, if the next operator to the right is an assignment operator*, the value of the variable name is the address of the variable and the type of that value/expression is the correct pointer type, e.g., an integer pointer if x is an integer. * Not just =, but +=, -=, *=, /=, %=, &=, |=, ^=, <<=, and >>=; several of these are used rarely, if ever, but the compiler allows them and they are assignments.

37 The Exception in Evaluating the Assignment Operator (cont’d)
To continue being technically precise, a simple variable name is thus overloaded in C, as in most modern imperative languages: It can be evaluated by the compiler to one of two completely different values (bit patterns) depending on where it is in a larger expression To the immediate left of an assignment operator, its value is its address; elsewhere, it’s the bit pattern stored at that address (unless we explicitly use the & operator to ask for its address) The two different possible values for the same variable name are referred to as its l-value and its r-value The l- (or left) value being the address The r- (or right) value being the bit pattern stored there Which value the compiler chooses to use depends on where the variable name is, if it’s in an assignment statement
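A small C fragment may make the two readings of a variable name concrete. This is only an illustrative sketch; the function name lvalueDemo and the variable addressOfX are made up for the example.

```c
/* Shows the same name, x, used for its location and for its contents. */
int lvalueDemo(void)
{
    int x = 5, y = 3;
    int *addressOfX = &x;   /* & asks explicitly for the address of x */

    x = y + 2;              /* left of =: x names a place to store into;
                               right of =: y is read for its stored value, 3 */
    *addressOfX = x + 1;    /* right of =: x is read for its stored value, 5;
                               the store goes to the same cell via the pointer */
    return x;               /* not left of an assignment: the stored value */
}
```

The first assignment stores 5 into the cell named x; the second reads that 5 back, adds 1, and stores the result into the very same cell, this time reached through its explicit address.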

38 Sidebar: Assignment Evaluation (cont’d)
The evaluation of an expression like delPtr->next->previous = delPtr->previous has the same issue and the same resolution. Because delPtr->next->previous is to the left of the assignment operator, its value will be the address of the cell it designates, namely the address of the cell for the component named previous of the structure pointed to by delPtr->next. Because delPtr->previous is to the right of the = sign, its value will be the bit pattern stored in the cell it designates. Because both expressions are the same type, struct biDirectionalListItem *, or pointer to a struct biDirectionalListItem, the compiler is happy to generate the code to do the evaluations and make the assignment. (Figure: a bidirectional list holding 5, 10, and 15, with delPtr pointing into it.)

39 Last Sidebar: C is Being Nice to Us for a Change
delPtr->next->previous = delPtr->previous
The highlighted expression, above, is designating the component cell named previous in the structure pointed to by the highlighted arrows in this picture. Each -> operator here corresponds to one arrow that must be followed to evaluate (understand) the expression. If the expression is on the left of an = sign, as it is here, its value is the l-value (address) of this cell; but if the highlighted expression above were not to the left of an assignment operator, it would be evaluated to its r-value, which, since the type of a previous cell is a pointer type, is the address of some other cell. Note the lovely (and hardly coincidental) correspondence between the syntax of C and the pictures we need to understand the semantics. (Figure: a bidirectional list holding 5, 10, and 15, with delPtr pointing into it.)
Got it? OK, then how about this as a possible exam question: Given the picture, consider the expression delPtr->next->previous->previous->next->someInteger. Is it legal (will it compile)? If so, what is its r-value? I can’t imagine ever needing to write such an expression; usually we’re working “closer” to some named pointer (e.g., delPtr), but if we need to, C allows it and you now know how to decipher it (congratulations ;-)
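One way to check your reading of the long expression is to build the three-item list and let the compiler evaluate it. The sketch below assumes delPtr points at the middle item (the one holding 10); the transcript does not say where delPtr points in the picture, so treat that as an assumption. The function name evaluateChain is likewise made up.

```c
#include <stddef.h>

struct biDirectionalListItem {
    int someInteger;
    struct biDirectionalListItem *previous, *next;
};

/* Builds the list 5 <-> 10 <-> 15 in caller-provided storage and
   evaluates the exam-question expression, one arrow per -> operator. */
int evaluateChain(struct biDirectionalListItem items[3])
{
    items[0] = (struct biDirectionalListItem){ 5,  NULL,      &items[1] };
    items[1] = (struct biDirectionalListItem){ 10, &items[0], &items[2] };
    items[2] = (struct biDirectionalListItem){ 15, &items[1], NULL };

    struct biDirectionalListItem *delPtr = &items[1];   /* assumed position */

    /* next -> 15, previous -> 10, previous -> 5, next -> 10 */
    return delPtr->next->previous->previous->next->someInteger;
}
```

Under that assumption the chain of arrows ends back on the item delPtr started from, so the expression compiles and yields that item's someInteger.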

40 Linked Lists: Circular Lists
Introduction and Motivation
Building (Insertion Into) a Linear Linked List
Traversal and Traversal-Based Operations
Variations, Embellishments, and Elaborations
  Bi-directional Lists
  Circular Lists
  Headed Lists
Summary

41 Circular Lists
Not as ubiquitous as linear lists, but still quite useful. Often used in operating systems, which we’ll explore in a bit more detail in CS420, e.g.: round robin scheduling; contiguous memory management with a first fit allocation policy. Note that many authors (but not all) consider a circular list to be merely an implementation technique for a linear list, since each item still has a unique predecessor and a unique successor. The successor of an item is the item it points to. The predecessor of an item is the one that points to it. If, as shown here, the list is implemented uni-directionally, it won’t have pointers to predecessors, but that’s an implementation issue, not a topological one; i.e., the property that each node has a unique predecessor and a unique successor is the abstract, or topological, property, known as linearity, regardless of whether or not the underlying implementation is uni- or bi-directional (or circular). This is the general case illustrated here. There are two special cases, one pretty normal for linked lists in general, one not. (Figure: aPtrToTheList pointing into a circular list; an empty circular list; and a third picture captioned with the question that follows.) What’s this? And what line of code could you write to achieve it?
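A traversal of a uni-directional circular list has to stop when it comes back around to where it began, rather than at NULL. The following is a minimal sketch of that idea; the type name circularItem, the field value, and the function countItems are assumptions for illustration, and an empty list is assumed to be represented by a NULL aPtrToTheList.

```c
#include <stddef.h>

struct circularItem {          /* hypothetical item type */
    int value;
    struct circularItem *next;
};

/* Counts the items in a uni-directional circular list. */
int countItems(const struct circularItem *aPtrToTheList)
{
    if (aPtrToTheList == NULL)      /* special case: the empty list */
        return 0;

    int count = 0;
    const struct circularItem *travPtr = aPtrToTheList;
    do {
        count++;
        travPtr = travPtr->next;
    } while (travPtr != aPtrToTheList);   /* back where we started: done */
    return count;
}
```

The do-while form matters: a list with exactly one item has that item pointing at itself, so the loop body must run once before the stopping test is made.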

42 Linked Lists: Headed Lists
Introduction and Motivation
Building (Insertion Into) a Linear Linked List
Traversal and Traversal-Based Operations
Variations, Embellishments, and Elaborations
  Ordered lists
  Bi-directional lists
  Circular lists
  Headed lists
    Basic Concept
    Example: Sparse Matrices
Summary

43 Headed Lists A.k.a. Lists With Sentinels
A list header (a.k.a. sentinel node) is an item in a linked list that: is the first item in the list (the head of the list); may not be deleted; and has some special (application specific) semantics or meaning associated with some value in some field that identifies it as a sentinel (header) node. The examples will make this clearer, I hope; bear with me. Headed lists are very useful and very common; some problems cannot easily be solved any other way. We’ll see a particularly lovely and important example when we look at an implementation of sparse matrices via orthogonal lists, but let’s start with a simpler example.
For our simple example, here’s a list that is to contain only positive integers, but assume we also need to keep track of the length of the list for some reason. Rather than declaring a separate variable for length, store the length in a sentinel node, but make it negative so as to ensure that the header (sentinel) is not mistaken for an ordinary (positive) node. (Figure: a headed list holding 78, 52, 2, and 17, whose header stores -4; start appears twice, once as a pointer and once as a structure.) Note the difference: Here, start is a declared variable, but it is a simple pointer, not a structure, and the sentinel must be created dynamically and any deletion algorithm must be sure not to later delete it by accident. Whereas here, start is a declared structure: sometimes, the header is a declared node that starts the list, all other list nodes being created normally (i.e., dynamically, during execution).
Anyway, the motivation for all this stuff will, I hope, become much clearer when we look at orthogonal lists to represent sparse matrices, coming up next. It (sparse matrices) really is a pretty solution to an important problem. Remind me to bring in and show you a fairly good textbook (Gollmann, Computer Security), where the author, whom I otherwise mostly like, states that a certain implementation of a key data structure in computer security can’t be used in most cases because there’s no good, general solution to a problem with one aspect of the implementation. Bullshit; you’ll know how to solve that problem after a few more charts here.
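The positive-integers example can be sketched as follows, using the variant in which the sentinel is created dynamically. All names here (listItem, createHeadedList, insertAfterHeader, listLength) are hypothetical, and inserting directly after the header is just the simplest placement, not something the slide prescribes.

```c
#include <stdlib.h>

struct listItem {
    int value;               /* positive in ordinary nodes; -length in the header */
    struct listItem *next;
};

/* Creates an empty headed list. The header stores 0 (length zero), which
   still cannot be mistaken for an ordinary node, since those are positive. */
struct listItem *createHeadedList(void)
{
    struct listItem *header = malloc(sizeof *header);
    if (header != NULL) {
        header->value = 0;
        header->next = NULL;
    }
    return header;
}

/* Inserts a positive value right after the header and updates the stored
   length. Returns 1 on success, 0 on a non-positive value or failed malloc. */
int insertAfterHeader(struct listItem *header, int value)
{
    if (value <= 0)
        return 0;
    struct listItem *newItem = malloc(sizeof *newItem);
    if (newItem == NULL)
        return 0;
    newItem->value = value;
    newItem->next = header->next;
    header->next = newItem;
    header->value--;         /* the length is kept negated in the sentinel */
    return 1;
}

/* The length is read straight out of the sentinel; no traversal needed. */
int listLength(const struct listItem *header)
{
    return -header->value;
}
```

Because the header is always present, insertion never has to treat the empty list as a special case; that convenience is a large part of why headed lists are so common.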

44 Linked Lists: Example: Sparse Matrices
Introduction and Motivation
Building (Insertion Into) a Linear Linked List
Traversal and Traversal-Based Operations
Variations, Embellishments, and Elaborations
  Ordered Lists
  Bi-directional Lists
  Circular Lists
  Headed Lists
    Basic Concept
    Example: Sparse Matrices
Summary

45 Introduction and Motivation for Sparse Matrices
Although sparse matrices themselves are interesting and important objects, they don’t really belong here since they’re not linear lists But they are built from linear lists and what interests us here is that the lists must be headed or we can’t get this sparse matrix structure to do what we need to So looking at sparse matrices will give us a chance to see what drives the need for headed lists and how we work with them

46 A Multi-Linked Structures Example: Orthogonal Lists for Sparse Matrices
A sparse matrix is one where the majority of the entries in the matrix are 0 Economists, for example, might want to keep track of the extent to which changes in the price of one commodity, product, or service (CPS) are correlated with changes in others They prepare a complete list of CPS’s, possibly millions of entries long for a large national economy, then make a square matrix M, where each entry 0 ≤ mi,j ≤ 1 is the correlation coefficient between CPSi and CPSj But the vast majority of the mi,j are all 0; I mean, how much do you think the price of steel correlates with the price of, for example, bubble gum?

47 A Multi-Linked Structures Example: Orthogonal Lists for Sparse Matrices (cont’d)
The natural representation of a matrix in a programming language like C is obviously a 2-dimensional array. But if there were a million commodities, the array would have a trillion entries; if most of them were 0, that would be filling a lot of memory with zeroes; that seems wasteful. Very few modern systems will let you use a 10^6 x 10^6 array in any event. Even if such a declaration compiled, it would probably blow up in execution (that’s what happens on prclab).

48 A Multi-Linked Structures Example: Orthogonal Lists for Sparse Matrices (cont’d)
What we want is some other data structure (not an array) that just stores the non-zero elements of M To find the value of some mi,j, we search the structure; if the search is successful, we know the (non-zero) value of mi,j, if the search is unsuccessful, we know its value is 0 One implementation technique for sparse matrices involves what are called orthogonal lists

49 A Multi-Linked Structures Example: Orthogonal Lists for Sparse Matrices (cont’d)
Here’s a diagram of what each member of this sparse matrix structure* would look like: (Figure: one node with five cells: i, j, mi,j, nextInRowPtr, and nextInColPtr.) i and j are commodity numbers; mi,j is the correlation coefficient between commodity i and commodity j. Each item is a member of two separate lists: all the non-zero items in row i form a single ordered list, ordered by j (their column number), linked by their nextInRowPtr; all the non-zero items in column j form a single ordered list, ordered by i (their row number), linked by their nextInColPtr.
* The term structure is overloaded here. The sparse matrix is an example of the sort of theoretic object called a data structure that we study in computer science, particularly in CS315. As an implementation matter, each element of the sparse matrix will be a structure in the C programming sense. (Many other languages don’t use the word “structure” this way; Ada, for example, calls such things “records”.) Anyway, the figure, above, is a pictorial representation of the C structure that would comprise one element of a sparse matrix; the next slide illustrates how such elements fit together to make a sparse matrix.
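In C, the node in the diagram might be declared roughly like this. The pointer names nextInRowPtr and nextInColPtr come from the slide; the type name sparseMatrixItem, the field name m, and the choice of int and double are assumptions for illustration.

```c
#include <stddef.h>

struct sparseMatrixItem {                     /* hypothetical type name */
    int i;                                    /* row (commodity) number */
    int j;                                    /* column (commodity) number */
    double m;                                 /* the correlation coefficient m(i,j) */
    struct sparseMatrixItem *nextInRowPtr;    /* next non-zero item in row i */
    struct sparseMatrixItem *nextInColPtr;    /* next non-zero item in column j */
};
```

Each node carries its own coordinates because, unlike an array element, its position in memory says nothing about which row and column it belongs to.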

50 An Example of a Multi-Linked Structure: Orthogonal Lists for Sparse Matrices (cont’d)
So here’s an illustration of a (very!) sparse 1000 x 1000 matrix having only 7 non-zero elements in it: m152,326, m168,667, m168,801, m617,713, m882,667, m882,801, and m927,713. Not very realistic, but it makes the artwork here a lot easier and it will still show the key issues we’ll have to deal with. (Figure: the matrix reached from startOfTheMatrix, with column headers 326, 667, 713, 801, row headers 152, 168, 617, 882, 927, and elements m152,326 = 0.31, m168,667 = 0.62, m168,801 = 0.38, m617,713 = 0.22, m882,667 = 0.53, m882,801 = 0.17, m927,713 = 0.46; the node being inserted, m882,713, carries 0.05.)
Assume, for example, that you wish to know the value of m56,492. Since this data structure is not an array, you can’t simply ask C to retrieve it for you by writing m[56][492]. As discussed earlier, your code has to search for it. Where do you start? And what happens, for example, when you get to m152,326 or m927,713; where do you go next and how do you get there? What we’re really asking here, of course, is how do we traverse this beast? It’s not obvious.
The solution here is to add a column of row headers. The column is an ordered linked list, ordered by row #, linked by the nextInColPtr. A row of our sparse matrix will have a row header in this column if and only if the matrix has at least one non-zero value in the row. Now the search logic is easy: traverse the column of row headers by following the nextInColPtr; each time you move down to a new row, traverse it by following the nextInRowPtr before moving down to the next row.
OK, things are going pretty well so far; we can search this structure to find out if mi,j is in it. How about insertion? Let’s try inserting m882,713. Inserting the new node into the correct row list is easy enough: search the column of row headers to find the correct row number, 882. In the example here, row 882 already exists; but if that row header didn’t exist, it would mean that there were no non-zero elements in that row yet; so we’d create the new row by inserting a new node with row number 882 and column number of -1 into the ordered column of row headers. Once the correct row header is found or created, insert m882,713 into that row. Since the row lists are ordered by column number, the new node, m882,713, in this example, must be inserted between the nodes for m882,667 and m882,801.
The problem now is finding the column list for column 713, if it even exists, which in this example it does, but of course we won’t always be inserting into existing columns. If column 713 does exist, we need to find it so that we can find this node (the one the new node must follow in its column), so that we can insert our new node after it by adjusting the nextInColPtr pointer here and in our new node. So how do we find this node, or figure out that the entire column doesn’t even exist yet? One answer could be to search the entire matrix, which we can do now that we added the column of row headers, looking for entries that have the column number we want. But that seems awfully inefficient; there’s got to be a better way.
The answer is to add a row of column headers to our structure. Then, after we insert the new node into the correct row list by searching the column of row headers, we can then search the row of column headers for the correct column (713 in this case), creating a new column if necessary, and then insert the new node into its column list, column lists being ordered by row number, of course.
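The search can be sketched in C as below. This is a slightly tightened variant of the search logic described on the slide, not a transcription of it: because the row headers and the row lists are ordered, it goes straight to row i and then scans only that row. It assumes NULL-terminated (non-circular) lists, row headers that carry their row number in i, and the hypothetical names sparseMatrixItem, m, and lookUp.

```c
#include <stddef.h>

struct sparseMatrixItem {                     /* hypothetical type name */
    int i, j;                                 /* row and column numbers */
    double m;                                 /* the stored non-zero value */
    struct sparseMatrixItem *nextInRowPtr, *nextInColPtr;
};

/* Returns the value of m(i,j): the stored value if the node exists,
   and 0 if the search is unsuccessful. */
double lookUp(const struct sparseMatrixItem *startOfTheMatrix, int i, int j)
{
    /* Walk down the column of row headers, looking for row i. */
    const struct sparseMatrixItem *rowPtr = startOfTheMatrix->nextInColPtr;
    while (rowPtr != NULL && rowPtr->i < i)
        rowPtr = rowPtr->nextInColPtr;
    if (rowPtr == NULL || rowPtr->i != i)
        return 0.0;              /* no header: row i holds no non-zero values */

    /* Walk along row i, looking for column j. */
    const struct sparseMatrixItem *travPtr = rowPtr->nextInRowPtr;
    while (travPtr != NULL && travPtr->j < j)
        travPtr = travPtr->nextInRowPtr;
    return (travPtr != NULL && travPtr->j == j) ? travPtr->m : 0.0;
}
```

Looking up m56,492 in the example matrix fails at the first step, since the first row header is already 152, so the answer 0 comes back without visiting a single ordinary element.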

51 Here is the Complete Algorithm (Pseudocode) for the Insertion of mi,j
Create the node for mi,j
Row insertion:
  Search the column of row headers to find the header for row i
  If row i does not yet exist, create a header node for row i and insert it into the column of row headers
  Insert the mi,j node into row i
Column insertion:
  Search the row of column headers to find the header for column j
  If column j does not yet exist, create a header node for column j and insert it into the row of column headers
  Insert the mi,j node into column j
Note that in this particular problem (insertion into a sparse matrix), the order of insertion into row and column lists is irrelevant; you could do the column insertion first, followed by the row insertion, or the row insertion first followed by the column insertion; there’s no ultimate difference, since the two operations are independent of one another.
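The pseudocode translates into C along the following lines. Treat this as a sketch under several assumptions rather than the course's implementation: lists are NULL-terminated (not yet circular), headers are marked with -1 as in the figures, m(i,j) is not already present, and error handling is minimal (a header created just before a failed allocation is left in place). The names sparseMatrixItem, m, newItem, findOrCreateRow, findOrCreateCol, and insertItem are all hypothetical.

```c
#include <stdlib.h>

struct sparseMatrixItem {
    int i, j;
    double m;
    struct sparseMatrixItem *nextInRowPtr, *nextInColPtr;
};

/* Allocates a node with both links NULL; returns NULL if malloc fails. */
static struct sparseMatrixItem *newItem(int i, int j, double m)
{
    struct sparseMatrixItem *p = malloc(sizeof *p);
    if (p != NULL) {
        p->i = i; p->j = j; p->m = m;
        p->nextInRowPtr = p->nextInColPtr = NULL;
    }
    return p;
}

/* Finds the header for row i in the column of row headers, creating it if needed. */
static struct sparseMatrixItem *findOrCreateRow(struct sparseMatrixItem *start, int i)
{
    struct sparseMatrixItem *prev = start;
    while (prev->nextInColPtr != NULL && prev->nextInColPtr->i < i)
        prev = prev->nextInColPtr;
    if (prev->nextInColPtr != NULL && prev->nextInColPtr->i == i)
        return prev->nextInColPtr;
    struct sparseMatrixItem *header = newItem(i, -1, 0.0);   /* column # -1 marks a row header */
    if (header != NULL) {
        header->nextInColPtr = prev->nextInColPtr;
        prev->nextInColPtr = header;
    }
    return header;
}

/* Finds the header for column j in the row of column headers, creating it if needed. */
static struct sparseMatrixItem *findOrCreateCol(struct sparseMatrixItem *start, int j)
{
    struct sparseMatrixItem *prev = start;
    while (prev->nextInRowPtr != NULL && prev->nextInRowPtr->j < j)
        prev = prev->nextInRowPtr;
    if (prev->nextInRowPtr != NULL && prev->nextInRowPtr->j == j)
        return prev->nextInRowPtr;
    struct sparseMatrixItem *header = newItem(-1, j, 0.0);   /* row # -1 marks a column header */
    if (header != NULL) {
        header->nextInRowPtr = prev->nextInRowPtr;
        prev->nextInRowPtr = header;
    }
    return header;
}

/* Inserts m(i,j) into both its row list and its column list.
   Returns 1 on success, 0 if an allocation failed. */
int insertItem(struct sparseMatrixItem *start, int i, int j, double m)
{
    struct sparseMatrixItem *node = newItem(i, j, m);
    struct sparseMatrixItem *row = findOrCreateRow(start, i);
    struct sparseMatrixItem *col = findOrCreateCol(start, j);
    if (node == NULL || row == NULL || col == NULL) {
        free(node);
        return 0;
    }

    /* Row insertion: row lists are ordered by column number. */
    struct sparseMatrixItem *prev = row;
    while (prev->nextInRowPtr != NULL && prev->nextInRowPtr->j < j)
        prev = prev->nextInRowPtr;
    node->nextInRowPtr = prev->nextInRowPtr;
    prev->nextInRowPtr = node;

    /* Column insertion: column lists are ordered by row number. */
    prev = col;
    while (prev->nextInColPtr != NULL && prev->nextInColPtr->i < i)
        prev = prev->nextInColPtr;
    node->nextInColPtr = prev->nextInColPtr;
    prev->nextInColPtr = node;
    return 1;
}
```

The row and column halves of insertItem touch disjoint pointer fields of the new node, which is why their order does not matter.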

52 Sentinel Marks
Note that these header nodes are not elements of the sparse matrix from the mathematical standpoint. But to C, a node is a node is a node. The sentinel mark is some attribute of some value for some data item in the node that’s “special” and so can be used to identify a node as a sentinel or header rather than a node containing real data. Here, I used a negative column # for a row header and a negative row # for a column header. Alternatively, for this example matrix, I could have set the mi,j to some negative value for both row and column headers, since any valid correlation coefficient must be ≥ 0. For that matter, I could actually have used mi,j = 0 to mark a header, since no real element in this structure would have a 0 for mi,j; by definition, the only elements that are supposed to be here are those with non-zero mi,j values. In this example so far, our algorithms haven’t actually needed to identify sentinel nodes, but let’s complicate our life a little bit ;-) (Figure: the same matrix reached from startOfTheMatrix, with the header nodes pointed out as sentinels and the remaining nodes labeled the “real” sparse matrix elements.)

53 Sentinel Marks (cont’d)
Let’s look in slightly more detail at how the traversal algorithm for this structure would work. We’ll need a pointer to the current row being traversed, currentRowPtr, initialized to startOfTheMatrix->nextInColPtr, and a row traversal pointer, travPtr, initialized from currentRowPtr. And here’s a complete traversal algorithm: while (currentRowPtr != NULL) { travPtr = currentRowPtr->nextInRowPtr; while (travPtr != NULL) { visit(travPtr); travPtr = travPtr->nextInRowPtr; } currentRowPtr = currentRowPtr->nextInColPtr; } (Figure: the same matrix, showing currentRowPtr on a row header and travPtr moving along that row.)

54 Sentinel Marks (cont’d)
Now let’s do it with only a single traversal pointer, with no separate currentRowPtr to keep track of what row is being traversed. Starting from the startOfTheMatrix, we could certainly move travPtr down to a new row by setting travPtr = travPtr->nextInColPtr, and then traverse the new row as before by repeatedly setting travPtr = travPtr->nextInRowPtr until we reached the end (NULL). The problem comes at the end of a row: we need to get travPtr back here (to the row header) so we can move down to the next row by following the nextInColPtr; but we have no way to get back here.
No sweat; let’s just circularize the row lists! Now at the end of a row, the normal row traversal movement follows this nextInRowPtr pointer, taking travPtr right back to where we need it to be so that we can move down to the next row. Of course we recognize by the sentinel mark that we’ve just arrived at a row header rather than an ordinary sparse matrix element and so we must have just completed traversing a row and so it’s time to follow the nextInColPtr to get down to a new row rather than following the nextInRowPtr again and going into an infinite loop, circular lists being easily prone to that ;-)
We already saw the row major traversal: we traversed the column of row headers and traversed each new row we came to. To do a column major traversal, we’d traverse the row of column headers and traverse each new column as we came to it. Most implementations would probably circularize the column lists as well as the row lists, so that the matrix could be traversed in either row major order or column major order. That’s it; the implementation of a sparse matrix, a multi-linked structure, built out of orthogonal circular lists with headers with sentinel marks. Pretty slick, no? (Figure: the same matrix with circular row lists; the column headers carry -1 as their row number, the row headers carry -1 as their column number, and travPtr starts from startOfTheMatrix.)
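The single-pointer, row major traversal might look like this in C. It is a sketch under stated assumptions: each row list is circular (its last item points back at its row header), row headers carry the sentinel mark j < 0, and the column of row headers is still NULL-terminated. To keep the example self-contained, "visiting" an element just counts it; the names sparseMatrixItem, m, and countElements are hypothetical.

```c
#include <stddef.h>

struct sparseMatrixItem {
    int i, j;
    double m;
    struct sparseMatrixItem *nextInRowPtr, *nextInColPtr;
};

/* Row major traversal using only travPtr; returns how many ordinary
   (non-header) elements were visited. */
int countElements(const struct sparseMatrixItem *startOfTheMatrix)
{
    int visited = 0;
    const struct sparseMatrixItem *travPtr = startOfTheMatrix->nextInColPtr;

    while (travPtr != NULL) {                /* travPtr is on a row header */
        travPtr = travPtr->nextInRowPtr;     /* step off the header into the row */
        while (travPtr->j >= 0) {            /* no sentinel mark: a real element */
            visited++;                       /* stands in for visit(travPtr) */
            travPtr = travPtr->nextInRowPtr;
        }
        /* Sentinel mark seen: the circular link brought us back to the
           row header, so move down instead of going around again. */
        travPtr = travPtr->nextInColPtr;
    }
    return visited;
}
```

The test on the sentinel mark is what prevents the infinite loop: without it, following nextInRowPtr around a circular row would never end.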

55 Summary of the Sparse Matrix
The sparse matrix implementation we just saw built a multi-linked structure out of orthogonal circular headed lists. The sparse matrix is our first, though it surely will not be our last, example of a multi-linked structure, one where each element has more than one pointer component. This is a theme we will see over and over again in CS315: complex data structures being built up out of simpler data structures, which in turn are built out of simpler data structures, until we eventually get to something the language itself supports (pointers, in this example).

56 Summary of the Sparse Matrix (cont’d)
There are multi-linked structures like (yes!) binary trees where the linked elements are not organized into linear lists, but let’s leave that for another day (month, actually ;-) The reason I put this sparse matrix problem here rather than waiting until later in this course has less to do with multi-linked structures, although it is a great example, than that the sparse matrix is a pretty example (well, I think it’s pretty ;-) of an important real world problem that uses linked lists and that can’t be done without making those lists headed lists. The circularity was added mostly just to torture you, but it did eliminate the need for a separate named pointer to the current row; hardly worth the effort in this case, but under other circumstances, circularity can be more important.

57 Linked Lists: Summary
Introduction and Motivation
Building (Insertion Into) a Linear Linked List
Traversal and Traversal-Based Operations
Variations, Embellishments, and Elaborations
  Ordered Lists
  Bi-directional Lists
  Circular Lists
  Headed Lists
Summary

58 Linear List Variants and Embellishments: Summing It All Up
You can mix and match all the linked list variants discussed here as your application requires; they are completely independent and every possible combination – e.g., unordered circular uni-directional with a header, bidirectional with a header but not circular, ordered circular no header, etc, etc – has been used for some problem or other Your job, as always, is to know what techniques are available (that’s where Gollmann slipped up) and which are called for by what types of problems Sometimes it’s obvious – if you need to print out ordered lists both forward and backward, bi-directional lists are clearly the way to go Sometimes it’s not – which is why there’s more to good software engineering than merely being a good code-slinger

