Published by Hortense Stevens. Modified over 6 years ago.
1
Array of Structures

Contrast this with the corresponding parallel array example in the array chapter. Using an array of structures instead offers several advantages, including:
- simplified function parameter lists: only one array needs to be passed instead of several
- all relevant data is physically stored in a single data structure, eliminating concerns about keeping corresponding data straight via uniform index use

The syntax for member access here is straightforward if you simply think about it logically:
- Itinerary is the name of an array (of structures)
- Itinerary[Idx] is the name of an element of the array (which is a struct of type Trip)
- Itinerary[Idx].MPG is the name of a member of that struct variable.

In most applications of struct types, an array (or some other data structure) is used to organize a collection of individual struct variables. An array of structures provides an alternative to using a collection of parallel arrays:

[Figure: the Itinerary array with one Trip per cell (cells 0, 1, . . ., 4). Cell 0 holds Blacksburg, VA / Knoxville, TN / 244 / 205 / ??; the last cell shown holds Little Rock, AR / Texarkana, AR / 141 / 160.]

    const int MAXTRIPS = 500;
    Trip Itinerary[MAXTRIPS];

Of course, to refer to a member of one of the struct variables in the array we must combine array syntax with struct syntax. To set the first array element would require the syntax:

    Itinerary[0] = firstLeg;

To set the MPG member of the fifth struct variable would require the syntax:

    Itinerary[4].MPG = 52.88;
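The two assignments above can be tried in a small compilable sketch. The Trip layout here is an assumption made only for illustration (the notes define the real Trip type elsewhere), and storeLeg() is a hypothetical helper, not part of the course code.

```cpp
#include <cassert>
#include <string>
using std::string;

// Hypothetical Trip layout for illustration only; the notes define the
// real Trip type elsewhere.
struct Trip {
    string Origin;
    string Destination;
    int    Miles;
    double MPG;
};

const int MAXTRIPS = 500;

// Hypothetical helper: stores one complete Trip in cell Idx using a single
// aggregate assignment.
void storeLeg(Trip Itinerary[], int Idx, const Trip& leg) {
    assert(Idx >= 0 && Idx < MAXTRIPS);   // stay inside the array dimension
    Itinerary[Idx] = leg;                 // copies every member at once
}
```

After a call such as storeLeg(Itinerary, 0, firstLeg), a single member of any cell is still reached by combining the two syntaxes, for example Itinerary[4].MPG = 52.88.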
2
Designing with Structures
Of course, we're really trying to build up to the notion of an ADT; that's not formally introduced until CS 1704. Structured types allow the creation of specialized data types which better model the logical relationships among the data values being stored and manipulated. This allows the designer to consider the problem from a higher, more natural, level. For example, we may design the step "print a trip" and defer the details of that until a later refinement of the design. Of course, we may do that without structured types, but the use of structured types encourages designers to think at that level. Using a structured type may also promote code reuse if the type is suitable for inclusion in independent applications. The Trip type presented in these notes could be relevant to a number of different programs. Properly speaking, a data type is a collection of values and the operations that may be performed on them. For a struct type, most of these operations will be implemented in separate user-defined functions, and those may also be recyclable into other applications.
3
Application: the Trip Program Revisited
This is the basis for a fairly standard terminal CS 1044 project. There should be a complete formal specification on the course website by the time these notes are reached, and a formal design analysis, as well as the complete source code for a working implementation. The following slides will present portions of that implementation, along with some design discussion.

Consider implementing the simple trip program using an array of structures to organize the trip data. Clearly we may use an array of Trip variables, as defined earlier, to store the data:

    Origin             Destination        Miles   Time
    Blacksburg, VA     Knoxville, TN        244   3:25
    Knoxville, TN      Nashville, TN        178   2:35
    Nashville, TN      Memphis, TN          210   3:17
    Memphis, TN        Little Rock, AR      137   2:05
    Little Rock, AR    Texarkana, TX        141   2:10
    . . .

By using an array of structures we will be able to implement the same operations as with a collection of parallel arrays, but the data organization will be much simpler. Almost all of the functions will have much simpler parameter lists. We will consider portions of such an implementation here.
4
The Overall Organization
Note the simplicity of the design of main(). Essentially we have an input section, a calculation section, and an output section. Each of those delegates all details to one or more subsidiary functions. (This should be the same as when parallel arrays were used. The gain here is that there are a LOT fewer function parameters to keep track of.) main() declares only those variables that are needed to store data used in two or more of these subsidiary functions. We maintain a separation of calculation from input and output because that promotes clarity. It would be possible to delegate the calculations for the summary report to the output code, simplifying the structure of main() somewhat. IMO that's a poor tradeoff.

Since the data is all encapsulated in a single array of structures we achieve a very simple overall design:

    int main() {

        Trip Itinerary[MAXTRIPS];   // list of trip data
        int  numTrips = 0;          // number of trips reported.

        numTrips = readTripData(Itinerary);   // Read the given trip data.

        // Calculate the average MPH for each of the trips:
        calcAllMPH(Itinerary, numTrips);

        // Calculate the statistics for the summary report:
        int totalMiles, totalMinutes;
        double overallMPH;
        calcStats(Itinerary, numTrips, totalMiles, totalMinutes, overallMPH);

        // Write output file:
        writeReport(Itinerary, numTrips, totalMiles, totalMinutes, overallMPH);

        return 0;   // Terminate a successful execution.
    }
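The notes call calcStats() from main() but do not show its body here. The following is only a guess at what such a function might look like, inferred from the call: the reference parameters, the reduced Trip layout, and the value and type of MINPERHOUR are all assumptions.

```cpp
#include <cassert>

// Only the members this sketch needs; the notes' Trip type has more.
struct Trip {
    int Miles;
    int Minutes;
};

const double MINPERHOUR = 60.0;   // assumed value and type

// Hypothetical calcStats(): totals the miles and minutes over the used cells
// and derives the overall average speed. Results are returned through the
// three reference parameters.
void calcStats(const Trip Itinerary[], int numTrips,
               int& totalMiles, int& totalMinutes, double& overallMPH) {
    assert(numTrips >= 0);
    totalMiles   = 0;
    totalMinutes = 0;
    for (int Idx = 0; Idx < numTrips; Idx++) {
        totalMiles   += Itinerary[Idx].Miles;
        totalMinutes += Itinerary[Idx].Minutes;
    }
    // Avoid dividing by zero when no travel time was recorded.
    overallMPH = (totalMinutes > 0)
                     ? MINPERHOUR * totalMiles / totalMinutes
                     : 0.0;
}
```

Passing the totals by reference is what lets main() declare them once and hand the same variables on to the output function.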
5
Managing Structured Input
Note that, compared to the parallel array example earlier in the notes, the input manager function has only a single array parameter. We assume that the input file name is a global constant, rather than hard-wiring it locally. That makes modification somewhat easier. The call would resemble:

    numTrips = readTripData(Itinerary);

Logically, reading and storing the trip data is relatively straightforward, given our earlier discussions. However, the syntax for using an array of struct variables bears examination:

    int readTripData(Trip List[]) {

        int tripsRead = 0;                    // counter for trips
        ifstream In(dataFileName.c_str());    // Open the input file.

        // If the input file does not exist, this will detect that.
        // We handle that by printing an error message and stopping the program.
        if ( In.fail() ) {
            cout << "Data file not found: "        // Write an error message...
                 << dataFileName                   // including the file name.
                 << endl                           // Bang "return"
                 << "Exiting now..." << endl;      // Finish the message.
            exit(1);                               // Terminate a failed execution.
        }
        // continues on next slide . . .
6
Managing Structured Input
We’ve discussed the importance of "buffering" input data before committing it to a data structure already… this is a good place to remind them. The use of a "helper" function allows us to isolate (most of) the design logic that depends upon the specific formatting of the input data. Note the use of an aggregate assignment to copy the returned Trip variable into an array cell. This could be done member-by-member but that would require writing more code (for no reason), inviting the introduction of logical errors (such as switching origin and destination).

Here is the input management code. Note the use of a local Trip variable to "buffer" the input and the use of a helper function:

        // continued from preceding slide . . .
        In.ignore(INT_MAX, '\n');    // Skip over the two header lines in the
        In.ignore(INT_MAX, '\n');    // trip data input file.

        Trip nextTrip;
        nextTrip = readOneTrip(In);           // Try to read data for a trip.

        while ( In && (tripsRead < MAXTRIPS) ) {
            List[tripsRead] = nextTrip;       // Store the trip that was just read
            tripsRead++;                      // and count it.
            nextTrip = readOneTrip(In);       // Try to read data for another trip.
        }

        In.close();
        return tripsRead;    // Tell the caller how many trips were read.
    }
7
Reading the Structured Record
We’ve discussed the importance of "buffering" input data before committing it to a data structure already… this is a good place to remind them. The use of a "helper" function allows us to isolate (most of) the design logic that depends upon the specific formatting of the input data.

Here we read and return data for a single trip:

    Trip readOneTrip(ifstream& In) {

        Trip newTrip;                  // Trip variable to "bundle" the data
        int tripHours, tripMinutes;

        getline(In, newTrip.Origin, '\t');         // Read: the name of the trip origin.
        getline(In, newTrip.Destination, '\t');    // and the trip destination.
        In >> newTrip.Miles;                       // the length of trip in miles
        In >> tripHours;                           // hours field for trip time
        In.ignore(1, ':');                         // colon separator
        In >> tripMinutes;                         // minutes field for trip time

        newTrip.Minutes = convertHHMMtoMin(tripHours, tripMinutes);

        In.ignore(INT_MAX, '\n');      // Skip to beginning of next input line.

        return newTrip;    // Return a copy of the Trip variable to the caller.
    }
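The helper convertHHMMtoMin() is called above but not shown, so the version below is a guess at its behavior. To make the reading logic easy to try without a data file, this sketch also swaps the ifstream& parameter for istream&, which accepts a string stream as well as a file. The reduced Trip layout and the assumption that the mileage and time fields are separated by whitespace are illustrative only.

```cpp
#include <cassert>
#include <climits>
#include <istream>
#include <sstream>
#include <string>
using namespace std;

// Only the members this sketch needs.
struct Trip {
    string Origin;
    string Destination;
    int    Miles;
    int    Minutes;
};

// Hypothetical helper: the notes call convertHHMMtoMin() but do not show it.
int convertHHMMtoMin(int Hours, int Minutes) {
    assert(Hours >= 0 && Minutes >= 0);
    return 60 * Hours + Minutes;
}

// Same steps as readOneTrip() in the notes, but taking istream& so it can
// read from a string as well as from a file.
Trip readOneTrip(istream& In) {
    Trip newTrip = Trip();                 // zeroed, so a failed read leaves zeros
    int tripHours = 0, tripMinutes = 0;

    getline(In, newTrip.Origin, '\t');         // origin, up to the first tab
    getline(In, newTrip.Destination, '\t');    // destination, up to the next tab
    In >> newTrip.Miles;                       // miles
    In >> tripHours;                           // hours field
    In.ignore(1, ':');                         // colon separator
    In >> tripMinutes;                         // minutes field
    newTrip.Minutes = convertHHMMtoMin(tripHours, tripMinutes);

    In.ignore(INT_MAX, '\n');                  // move to the next input line
    return newTrip;
}
```

Once the input is exhausted, the call leaves the stream in a failed state, which is what the loop in readTripData() tests before storing the buffered Trip.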
8
Managing Updates to the Array
Point out the actual parameters in the call to the helper function calcMPH(). The helper function expects an integer distance and an integer time value: whether they are bundled within a structure on the calling side is of no interest to the helper function (note the formal parameters). There is, I suppose, ALMOST a weak "pattern" here in that the top-level function handles selecting the data elements (in some order) from the data structure and passing them to a helper function which understands the internal structure/logic of those elements. The point would be clearer if calcMPH() took an entire Trip as a parameter…

The input code does not update the Trip records to include the average speed. This function manages that. Note the use of another helper function.

    void calcAllMPH(Trip Itinerary[], int numTrips) {

        int Idx;
        for (Idx = 0; Idx < numTrips; Idx++) {
            Itinerary[Idx].MPH = calcMPH(Itinerary[Idx].Miles,
                                         Itinerary[Idx].Minutes);
        }
    }

As usual, we have a for loop to traverse the data array so we can process its cells one-by-one. The caller "extracts" the appropriate members of the current Trip variable and passes them to the helper function.

    double calcMPH(int Miles, int Minutes) {

        return (MINPERHOUR * double(Miles) / double(Minutes));
    }
9
Alternative Implementation
Now the aforementioned pattern is clear. This notion of separating the traversal logic from the processing logic is a fairly important one in later courses. Either approach will work, but this achieves a separation that promotes generality. The implementation of calcAllMPH() could be recycled to handle other updates to the list of Trip variables by simply renaming the function and calling a different helper.

The preceding implementation is correct but it requires both functions to incorporate information about the internal structure of a Trip variable. Here is a revision that makes the top-level function relatively unaware of those internals.

    void calcAllMPH(Trip Itinerary[], int numTrips) {

        int Idx;
        for (Idx = 0; Idx < numTrips; Idx++) {
            setMPH(Itinerary[Idx]);
        }
    }

Now the caller simply passes an entire data element of the array to its helper function. The caller has no need to "know" anything about the internal organization of a Trip variable. That's all encapsulated within the helper function now.

    void setMPH(Trip& T) {

        T.MPH = MINPERHOUR * double(T.Miles) / double(T.Minutes);
    }
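The remark about recycling the traversal for other updates can be taken one step further than the notes do: instead of renaming the function for each new helper, the helper can be passed in as a function pointer. This goes beyond what the slides present; updateAll() is a hypothetical name, and the reduced Trip layout and the value of MINPERHOUR are assumptions.

```cpp
#include <cassert>

// Only the members this sketch needs.
struct Trip {
    int    Miles;
    int    Minutes;
    double MPH;
};

const int MINPERHOUR = 60;   // assumed value of the notes' constant

// Processing logic: the only function that knows the internals of a Trip.
void setMPH(Trip& T) {
    assert(T.Minutes > 0);   // a zero travel time would divide by zero
    T.MPH = MINPERHOUR * double(T.Miles) / double(T.Minutes);
}

// Traversal logic: hypothetical generalization of calcAllMPH() that applies
// whatever helper it is given to each used cell of the array.
void updateAll(Trip Itinerary[], int numTrips, void (*update)(Trip&)) {
    for (int Idx = 0; Idx < numTrips; Idx++) {
        update(Itinerary[Idx]);
    }
}
```

A call such as updateAll(Itinerary, numTrips, setMPH) then does the work of calcAllMPH(), and a different update only requires a different helper.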
10
Writing the List to File
The body of the top-level function here is essentially isomorphic to the earlier one… that's the "traverse and hand off" pattern again. (That's not an "official" name… I'm sure there is one.) The array of structs is passed by constant reference because writing its contents should not modify it. The individual Trip variables are passed to the helper function by constant reference instead of by value primarily to avoid the expense of making a copy.

Compare the top-level function here to calcAllMPH(). See a "pattern"?

    void writeTripData(ofstream& Out, const Trip Itinerary[], int numTrips) {

        int Idx;
        for (Idx = 0; Idx < numTrips; Idx++) {
            writeOneTrip(Out, Itinerary[Idx]);
        }
    }

Note the two uses of pass by constant reference to prevent unnecessary modification of data.

    void writeOneTrip(ofstream& Out, const Trip& toPrint) {

        Out << left << setw(MAXNAMELENGTH + 1) << toPrint.Origin;
        Out << setw(MAXNAMELENGTH + 1) << toPrint.Destination;
        Out << right << setw( 7) << toPrint.Miles
            << setw(10) << toPrint.Minutes
            << setw( 8) << setprecision(1) << toPrint.MPH
            << endl;
    }
11
Hierarchical Organization
Hierarchical organization is often referred to as (a type of) aggregation. Using a struct hierarchy well promotes code reuse (or type reuse), and improves the logical organization of data. Once we get to classes in 1704 and especially 2704 the notion of hierarchy becomes much more powerful since the type will include its own operations.

In some cases a designer will plan a structured type with members that are also reasonably considered to be structured themselves. For example, we might reconsider the design of the Trip type to allow a more useful description of the origin and destination. Logically each of these is just a "place" which we might consider to be represented by a city name and a state abbreviation:

    struct Place {
        string City;
        string State;
    };

    struct Trip {
        Place  Origin;
        Place  Destination;
        int    Miles;
        int    Minutes;
        double MPH;
    };

This approach provides two custom data types, instead of one. Variables of type Place might well be useful in other applications. Hierarchical organization is often designed as a natural way to take advantage of already-existing custom types. Of course a hierarchical organization also requires slightly more complex syntax when accessing the members of a structured member.
12
Member Access in a Hierarchy
Again, it's simple if the syntax is read sensibly:
- toSmithsonian is the name of a Trip variable
- toSmithsonian.Origin is the name of a member of toSmithsonian
- toSmithsonian.Origin is of type Place, so it has members City and State
- toSmithsonian.Origin.City is the name of a string member of the Place variable

The usual syntax rules apply. It's just a matter of keeping straight exactly what the syntax means.

Given the type definitions from the previous slide:

    Trip toSmithsonian;
    toSmithsonian.Origin.City  = "Blacksburg";
    toSmithsonian.Origin.State = "VA";
    . . .
    toSmithsonian.Miles   = xxx;
    toSmithsonian.Minutes = xxx;

Alternatively we could use an aggregate assignment to set the Place members of our Trip variable:

    Place WashDC;
    WashDC.City  = "Washington";
    WashDC.State = "DC";

    Trip toSmithsonian;
    toSmithsonian.Destination = WashDC;
    . . .
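Both access styles can be exercised in a short compilable sketch. The Place and Trip definitions follow the previous slide; makePlace() and label() are hypothetical helpers added only to show that a Place is usable on its own.

```cpp
#include <cassert>
#include <string>
using std::string;

struct Place {
    string City;
    string State;
};

struct Trip {
    Place  Origin;
    Place  Destination;
    int    Miles;
    int    Minutes;
    double MPH;
};

// Hypothetical helper (not in the notes): builds a Place in one call.
Place makePlace(const string& City, const string& State) {
    assert(!City.empty());
    Place P;
    P.City  = City;
    P.State = State;
    return P;
}

// Hypothetical helper: formats a Place as "City, ST".
string label(const Place& P) {
    return P.City + ", " + P.State;
}
```

With these in place, toSmithsonian.Destination = makePlace("Washington", "DC") is an aggregate assignment of a whole Place, while toSmithsonian.Origin.City = "Blacksburg" sets one member two levels down.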
13
Searching

In many applications it is necessary to search a list of data elements for one (or more) that matches some specific criterion. For example:
- find the largest integer in a list of test scores
- find the location of the string "Fred" in a list of names
- find all Trip variables whose destination is "Jackson, WY"

There are several basic issues when searching:
- how do we recognize and keep track of matches to the search criteria?
- how do we terminate the search?
- what do we report? (value, location, yes/no, . . .)
- how do we recognize that there is no matching data element?
- what do we do if no matching data element is found?

Here we will only consider the problem of searching among the elements of an array.
14
Simple Array Search

Returning a dummy value to indicate failure is only a sensible option if there are reasonable values of the correct type that could not, logically, be data values. If you're storing test scores or areas of polygons or names of presidents then there are good candidates. If you're storing characters entered by a user at the keyboard or values of a polynomial function, good candidates are hard to find. Consider searching the given array for the values 5 and 41. Trace the logic, paying attention to the stop criteria and what information might be returned in each case. There's also the issue of what to do in the event that there are multiple "hits" on the given search criteria. Returning the first match and stopping is a common approach, but it's not always suitable. Point is: this is an issue that must be considered, handled, and documented.

When searching an array of simple data elements, we may resolve these questions easily:
- We recognize matches by comparing with the equality operator.
- We terminate the search if we find an element that equals the search target, or if we have rejected the last used cell in the array.
- We may report a match by providing the index or a copy of the element.
- We recognize that there is no matching data element if we reach the end of the used cells in the array without finding a match.
- If no matching element is found, we return an impossible index (e.g., -1) or a dummy value that cannot logically be a valid data element.

    A[0]  A[1]  A[2]  A[3]  A[4]  A[5]  A[6]  A[7]  A[8]  A[9] . . . A[13]
    13    7     43    5     3     19    2     23    29    ?? (unused)
15
Linear Search Function
Questions:
- should Size be the dimension or the usage of List[]?
- why doesn't the loop condition cause an access violation if the dimension of List[] is Size and there is no match?
- where does the declaration of MISSING go?
- what will this function return if there are multiple occurrences of Target in List[]?

An alternative implementation (my preference actually) would be:

    for (Scan = 0; Scan < Size; Scan++) {
        if ( Target == List[Scan] )
            return Scan;
    }
    return MISSING;

The simplest technique for searching a list is to just "walk" the list, examining each element in turn until you either find a match or reject the last used cell:

    const int MISSING = -1;

    int LinearSearch(const int List[], int Target, int Size) {

        int Scan = 0;    // begin search at first cell

        // Continue searching while there are more entries
        // to consider and the target value has not been
        // found:
        while ( ( Scan < Size ) && (List[Scan] != Target) )
            Scan++;

        if ( Scan < Size )       // Target was found
            return ( Scan );
        else
            return ( MISSING );  // Target was not found
    }
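Here is the for-loop variant from the notes wrapped into a complete function so it can be compiled and tried. The precondition check is an addition for this sketch, and the sample data suggested afterward assumes the nine values from the earlier array slide sit in cells 0 through 8.

```cpp
#include <cassert>

const int MISSING = -1;

// Linear search over the first Size cells of List[]. Size should be the
// usage (number of cells holding real data), not the dimension.
int LinearSearch(const int List[], int Target, int Size) {
    assert(Size >= 0);
    for (int Scan = 0; Scan < Size; Scan++) {
        if ( Target == List[Scan] )
            return Scan;      // report the first match and stop
    }
    return MISSING;           // every used cell was rejected
}
```

With A[] = {13, 7, 43, 5, 3, 19, 2, 23, 29} and a usage of 9, a search for 5 stops at index 3, and a search for 41 rejects all nine used cells and reports MISSING. Because the function returns as soon as it finds a match, repeated values always yield the lowest matching index.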
16
Application: a Simple Database
This is a fairly standard example of the terminal CS 1044 project. There should be a complete formal specification on the course website by the time these notes are reached, and a formal design analysis, as well as the complete source code for a working implementation. The following slides will present portions of that implementation, along with some design discussion.

Consider implementing an application that will store data for a list of trips and perform lookup operations on that list (e.g., report the distance or time for a trip given a specific origin and destination). Clearly we may use an array of Trip variables, as defined earlier, to store the data.

Assume that the application will use two input files, one containing the trip data as we've seen before, and a second input file that will contain the lookup commands the program is supposed to process, using the general syntax:

    <command string><tab><tab-separated command parameters>

For example:

    mileage<tab>Knoxville, TN<tab>Nashville, TN
    time<tab>Tucumcari, NM<tab>Albuquerque, NM
    neighbors<tab>Birmingham, AL
    . . .
17
Searching a List of Structured Data
The search function is deliberately designed (by ignoring trip direction) to be a bit less than trivial. An alternative would be to pass the origin and destination strings "naked". Wrapping them inside a Trip variable allows us to make the caller unaware of the internal structure of a Trip (and simplifies the parameter list, although not the implementation of the test). Is there anything wrong with the alternative test code below?

    if ( (toFind.Origin == dB[Idx].Origin ||
          toFind.Origin == dB[Idx].Destination)
         &&
         (toFind.Destination == dB[Idx].Destination ||
          toFind.Destination == dB[Idx].Origin) ) {
        . . .

In this case, the array elements are complex structures and the search function must access the appropriate member(s) to make the comparison:

    int reportMileage(const Trip& toFind, const Trip dB[], int numTrips) {

        int Idx;
        for (Idx = 0; Idx < numTrips; Idx++) {
            if ( (toFind.Origin == dB[Idx].Origin &&
                  toFind.Destination == dB[Idx].Destination)
                 ||
                 (toFind.Origin == dB[Idx].Destination &&
                  toFind.Destination == dB[Idx].Origin) ) {
                return dB[Idx].Miles;
            }
        }
        return MISSING;
    }

The designer has determined that the direction of the trip does not matter, which complicates the Boolean test for a match somewhat. How would the implementation change if direction did matter? Note the use of const in the parameter list. Is this good design? Why or why not?
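One way to explore the question about the alternative test is to put both tests side by side and feed them a query whose origin and destination are the same city. The sketch below does that; sameRoute() and looseMatch() are hypothetical names, and the reduced Trip layout is an assumption.

```cpp
#include <cassert>
#include <string>
using std::string;

// Only the members this sketch needs.
struct Trip {
    string Origin;
    string Destination;
    int    Miles;
};

const int MISSING = -1;

// The test used in reportMileage(): the two endpoints must pair up with the
// stored trip's endpoints, in either direction.
bool sameRoute(const Trip& toFind, const Trip& T) {
    return (toFind.Origin == T.Origin && toFind.Destination == T.Destination)
        || (toFind.Origin == T.Destination && toFind.Destination == T.Origin);
}

// The alternative test from the notes: each endpoint only has to appear
// somewhere in the stored trip.
bool looseMatch(const Trip& toFind, const Trip& T) {
    return (toFind.Origin == T.Origin || toFind.Origin == T.Destination)
        && (toFind.Destination == T.Destination || toFind.Destination == T.Origin);
}

int reportMileage(const Trip& toFind, const Trip dB[], int numTrips) {
    assert(numTrips >= 0);
    for (int Idx = 0; Idx < numTrips; Idx++) {
        if ( sameRoute(toFind, dB[Idx]) )
            return dB[Idx].Miles;
    }
    return MISSING;
}
```

For a stored trip from Knoxville, TN to Nashville, TN, a query from Knoxville, TN to Knoxville, TN satisfies looseMatch() but not sameRoute(): each endpoint appears somewhere in the stored trip, yet the two do not pair up with distinct ends.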
18
Binary Search

Binary search is often called bisection search. Overemphasize the fact that binary search is only guaranteed to work correctly on a sorted list. It doesn't hurt to trace the execution (later) on an unsorted list to demonstrate a false negative.

Linear search will always work, but there is an alternative that is, on average, much faster. The difficulty is that binary search may only be applied under the following:

Assumption: the list is sorted in ascending (or descending) order:

    List[0] <= List[1] <= . . . <= List[Size-1]

Binary search: Examine the middle element. If the target element is not found, determine to which side it falls. Divide that section in half and examine the middle element, etc, etc ...

[Figure: a sorted list with the examined middle element marked "?" and elements labeled x, y, z.]
19
Binary Search Function
It's very important to discuss (again) the need to be sure that the index values passed in are correct. Both wrt searching and sorting, remind the students continually of the importance of remembering the distinction between the dimension of the array and the usage (the number of cells holding real data). Discuss what would happen if a search or sort algorithm were applied using the dimension as the upper limit. This is a classic error, and students make it a LOT, even after being warned. You CANNOT emphasize the distinction too much.

An implementation of binary search is more complex logically than a linear search, but the performance gain is impressive:

    const int MISSING = -1;

    int BinSearch(const int List[], int Target, int Lo, int Hi) {

        int Mid;
        while ( Lo <= Hi ) {
            Mid = ( Lo + Hi ) / 2;             // find "middle" index
            if ( List[Mid] == Target )         // check for target
                return ( Mid );
            else if ( Target < List[Mid] )
                Hi = Mid - 1;                  // look in lower half
            else
                Lo = Mid + 1;                  // look in upper half
        }
        return ( MISSING );                    // Target not found
    }

Note this implementation allows the caller to search a specified portion of List[].
20
Binary Search Trace

Be careful with the termination condition on the second (unsuccessful) search. Consider performing a binary search for the value 2 using the array given on slide 2.

Suppose an array contains the values:

        1   2   3   4   5   6   7   8   9   10  11  12  13
    A   21  35  41  43  51  82  85  86  93  ??

Consider searching A[] for the value 86. What would the call look like? Trace the execution:

    Lo    Mid    Hi

Consider searching A[] for the value 50. What would the call look like? Trace the execution:

    Lo    Mid    Hi
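The two traces can be checked mechanically with a variant of BinSearch() that records every Mid index it examines. This is a sketch under a stated assumption: the nine legible values (21 through 93) are taken to occupy cells 0 through 8, which may not match the indexing on the original slide. BinSearchTraced() and the Probes parameter are hypothetical additions.

```cpp
#include <cassert>
#include <vector>

const int MISSING = -1;

// Same logic as BinSearch(), but each examined index is appended to Probes
// so the sequence of Mid values can be inspected afterward.
int BinSearchTraced(const int List[], int Target, int Lo, int Hi,
                    std::vector<int>& Probes) {
    assert(Lo >= 0);
    while ( Lo <= Hi ) {
        int Mid = ( Lo + Hi ) / 2;         // "middle" index of the current range
        Probes.push_back(Mid);
        if ( List[Mid] == Target )
            return Mid;
        else if ( Target < List[Mid] )
            Hi = Mid - 1;                  // look in lower half
        else
            Lo = Mid + 1;                  // look in upper half
    }
    return MISSING;                        // range is empty: Target not found
}
```

Under that assumption the call for 86 would be BinSearchTraced(A, 86, 0, 8, Probes), which examines indices 4, 6, and 7 and returns 7. The search for 50 examines indices 4, 1, 2, and 3; at that point Lo is 4 and Hi is 3, so the loop ends and MISSING is returned.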
21
Cost of Searching

For CS 1044 this is purely optional material… it does serve to illustrate the difference a good algorithm can make (ignoring the cost of building a sorted list for searching).

Suppose the array to be searched contains N elements. The cost of searching may be measured by the number of array elements that must be compared to Target. Using that measure:

                     Best Case    Worst Case    Average Case
    Linear Search    1            N             N/2
    Binary Search    1            log2 N        log2 N

To get an idea of how much cheaper binary search is, note that

    N = 1,024          log2 N = 10
    N = 1,048,576      log2 N = 20

Of course, binary search is only feasible if the array is sorted.
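The gap between the two rows of the table can be measured directly by counting how many elements each method examines during an unsuccessful search. The sketch below counts probes for a target larger than every element of a sorted list of N elements; the function names are hypothetical, and no actual array is needed because only the index arithmetic matters.

```cpp
#include <cassert>

// A linear search that fails must compare Target against every used cell.
long linearProbesForMiss(long N) {
    assert(N >= 0);
    return N;
}

// Counts the cells binary search examines when Target is larger than every
// element, so the search always continues in the upper half.
long binaryProbesForMiss(long N) {
    assert(N >= 0);
    long Lo = 0, Hi = N - 1, Probes = 0;
    while ( Lo <= Hi ) {
        long Mid = ( Lo + Hi ) / 2;
        Probes++;            // one element compared to Target
        Lo = Mid + 1;        // Target is larger: discard the lower half
    }
    return Probes;
}
```

For N equal to 2 to the 10th and 2 to the 20th the binary count comes out one more than log2 N, which is still tiny next to the N comparisons a failed linear search makes.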
22
Sorting

Many computer applications involve sorting the items in a list into some specified order. For example, we have seen that a list may be searched more efficiently if it is sorted.

To sort a group of items, the following relationships must be clearly defined over the items to be sorted:

    a < b        a > b        a = b

Ascending order: smallest ... largest
Descending order: largest ... smallest

When designing or choosing an algorithm for sorting, one goal is to minimize the amount of work necessary to sort the list of items. Generally the amount of work is measured by the number of comparisons of list elements and/or the number of swaps of list elements that are performed.
23
Decision Trees

To sort three items (a, b, c), how many comparisons must be made? Naively we may consider the question by drawing a diagram representing the decisions that must be made to arrive at a solution:

[Figure: decision tree whose root tests a < b, with true (T) and false (F) branches leading to further tests involving c and to the resulting order at each leaf.]

The decision tree above assumes that the three items (a, b, c) are unique.
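A decision tree for three distinct items needs a leaf for each of the 3! = 6 possible orderings. Two comparisons can distinguish at most 4 outcomes, so some inputs require a third. The sketch below is one possible set of decisions, not necessarily the tree drawn on the slide: it sorts three values and reports how many comparisons it made. sortThree() is a hypothetical name.

```cpp
#include <algorithm>   // std::swap
#include <cassert>

// Sorts a, b, c into ascending order and returns the number of element
// comparisons performed (2 or 3, depending on the input order).
int sortThree(int& a, int& b, int& c) {
    int Comparisons = 0;

    Comparisons++;
    if ( b < a ) std::swap(a, b);        // now a <= b

    Comparisons++;
    if ( c < b ) {
        std::swap(b, c);                 // now c holds the largest value
        Comparisons++;
        if ( b < a ) std::swap(a, b);    // place the two smaller values
    }

    assert(a <= b && b <= c);            // postcondition: ascending order
    return Comparisons;
}
```

Running it on all six orderings of three distinct values never uses more than three comparisons, and the already-sorted ordering is settled in two.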