String C and Data Structures Baojian Hua

1 String. C and Data Structures. Baojian Hua, bjhua@ustc.edu.cn

2 What's a String?
  - A string is a sequence of characters c0, c1, …, cn-1.
  - Every character ci (0 ≤ i < n) is taken from some character set (say, ASCII or Unicode).
  - Ex: "hello, world", "string1\tstring2\n"
  - Essentially, a string is a linear list, but with different operations.

3 Isn't String a "char *"?
  - C's convention for string representation:
    - C has no built-in string type.
    - Every string is a char array (char *) terminated with the character '\0'.
  - Operations are available in the standard library (see the library):
      char *strcpy (char *s, const char *ct);
      char *strcat (char *s, const char *ct);
      …
  - Operations are array-based and thus efficient.
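The convention above can be illustrated with a minimal sketch. The helper name build_greeting and its buffer-size check are assumptions made for this example, not part of the slides; only strcpy, strcat and strlen come from the standard library.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not from the slides): builds "hello, world" in a
   caller-supplied buffer of cap bytes. Returns the length of the result,
   or 0 if the buffer cannot hold the characters plus the '\0' terminator. */
size_t build_greeting(char *buf, size_t cap)
{
    const char *first = "hello";
    const char *second = ", world";
    size_t need = strlen(first) + strlen(second) + 1; /* +1 for '\0' */

    assert(buf != NULL);
    if (cap < need)
        return 0;

    strcpy(buf, first);  /* copies the characters and the trailing '\0' */
    strcat(buf, second); /* finds the '\0', appends there, re-terminates */
    return strlen(buf);  /* strlen scans until it meets '\0' */
}
```

Note that every call here has to walk the array to find '\0', which is one reason the length is worth caching in a richer representation.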

4 Problems with C String?
  Weaknesses of C's "char *" string:
  - Most strings are constants. See demo of C's "char *"… Why?
  - May be space-consuming. Why?
  - Operations may be too slow: strcmp (char *, char *);

5 Problems with C String?
  - Some operations are dangerous. Ex: strcpy ("ab", "1234")
  - This is a notorious source of bugs; it is the programmer's duty to prevent them.
  - Some viruses take advantage of this, e.g. the Morris worm in 1988, the world's first to spread widely. See demo for this…
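The call strcpy ("ab", "1234") goes wrong in two ways: the destination is a string literal, which must not be modified, and the source needs 5 bytes while the destination has only 3. A minimal sketch of a checked copy follows; the name copy_checked and its return convention are assumptions for illustration, not an API from the slides or the standard library.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not from the slides): copies src into dst, where dst
   has room for cap bytes including the '\0' terminator.
   Returns 1 on success; returns 0 and leaves dst untouched if src does not fit. */
int copy_checked(char *dst, size_t cap, const char *src)
{
    size_t need;

    assert(dst != NULL && src != NULL);
    need = strlen(src) + 1;  /* characters plus '\0' */
    if (need > cap)
        return 0;            /* refuse instead of overflowing */
    memcpy(dst, src, need);  /* copies the terminator as well */
    return 1;
}
```

With a writable three-byte array as the destination, copying "ab" succeeds and copying "1234" is rejected rather than writing past the end.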

6 "String_t" ADT
  We want an ADT "String_t" that:
  - hides the concrete representation of strings,
  - offers more flexible operations,
  - and cures security problems.
  To stay compatible with C, '\0' is reserved as the terminator.

7 Interface
    // in file "string.h"
    #ifndef STRING_H
    #define STRING_H

    #define T String_t
    typedef char *T; // CDT. Why?

    T String_new (char *s);
    int String_size (T s);
    int String_isEmpty (T s);
    int String_nth (T s, int n);
    T String_concat (T s1, T s2);

    #undef T
    #endif

8 Array-based Implementation
    // in file "string.c"
    #include "string.h"
    // Basic idea is to heap-allocate arrays
  (Figure: a heap-allocated array str with indices 0 to n, ending in '\0'.)

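9 Operations: "new"
    String_t String_new (char *s)
    {
      int len = strlen (s);
      String_t p = malloc ((len+1) * sizeof(*p));
      String_t q = p; // copy through a cursor so that p still points to the start
      while ((*q++ = *s++) != '\0')
        ;
      return p;
    }
  (Figure: p points to a new array with indices 0 to n, ending in '\0'.)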

10 Operations
    int String_size (String_t s)
    {
      return strlen (s);
    }

    int String_nth (String_t s, int n)
    {
      return s[n];
    }

    // Recall the definition
    typedef char *String_t;
    // do we really need these functions?

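11 Operations: "concat"
  (Figure: s1 and s2, each ending in '\0', are copied one after the other into a new array p that ends in a single '\0'.)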

12 Operations: "concat"
    String_t String_concat (String_t s1, String_t s2)
    {
      int n1 = strlen (s1);
      int n2 = strlen (s2);
      String_t p = malloc ((n1+n2+1) * sizeof(*p));
      // copy both s1 and s2 to p, leave to you
      …;
      return p;
    }
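One possible way to fill in the copying step left as an exercise is sketched below. It is an assumption about what the omitted code could look like, not the author's solution; the NULL checks and the use of memcpy are choices made for this example.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef char *String_t; /* same definition as in the interface */

/* Sketch: returns a newly allocated string holding s1 followed by s2,
   or NULL if allocation fails. Neither argument is modified. */
String_t String_concat(String_t s1, String_t s2)
{
    size_t n1, n2;
    String_t p;

    assert(s1 != NULL && s2 != NULL);
    n1 = strlen(s1);
    n2 = strlen(s2);
    p = malloc((n1 + n2 + 1) * sizeof(*p));
    if (p == NULL)
        return NULL;
    memcpy(p, s1, n1);          /* s1's characters, without its '\0' */
    memcpy(p + n1, s2, n2 + 1); /* s2's characters plus its '\0' */
    return p;
}
```

The caller owns the returned array and is expected to release it with free once it is no longer needed.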

13 Summary so far
  - The string representation discussed so far is functional style again:
    - functional == data never changes
    - we always make new data from older data
  - Java and ML also have functional strings.
  - But for some operations, a buffer may be used for efficiency.

14 Problem?
  - It may be too slow. Consider how to implement this: int strcmp (char *dst, char *src);
  - It may be too space-consuming. E.g., calling String_new ("hello"); twice generates two separate strings.
  - So we need a higher-level "string" to resolve these.

15 Interface
    // in file "str.h"
    #ifndef STR_H
    #define STR_H

    #define T Str_t
    typedef struct T *T; // ADT!

    T Str_new (char *s);
    int Str_size (T s);
    int Str_isEmpty (T s);
    int Str_nth (T s, int n);
    T Str_concat (T s1, T s2);

    #undef T
    #endif

16 Array-based Implementation
    // in file "str.c"
    #include "string.h"
    #include "str.h"

    struct Str_t
    {
      char *s;
      int size;
      int hashCode; // omitted for now
    };

    // maintain an internal
    // cache of all "str"!
    List_t allStrs = 0;
  (Figure: a Str_t record str with fields s and size; s points to an array with indices 0 to n, ending in '\0'.)

17 Operations: "new"
    Str_t Str_new (char *s)
    {
      // #1: search the "allStrs" list to try to find "s"
      // #2: if #1 succeeds, then return the result
      // #3: else cook a new node, put it into "allStrs"
      //     and return it
      return p;
    }
  (Figure: the list l, a chain of nodes, each with a data field and a next field.)

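18 Operations: "new"
    Str_t Str_new (char *s)
    {
      // we only write #3 here:
      Str_t temp = malloc (sizeof (*temp));
      temp->s = s; // the field declared in struct Str_t is "s"
      temp->size = strlen (s)+1;
      List_insertHead (allStrs, temp);
      return temp; // return the node just created
    }
  (Figure: the list l, a chain of nodes, each with a data field and a next field.)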

19 Operations: "equals"
    int Str_equals (Str_t s1, Str_t s2)
    {
      return s1==s2; // Fast!
    }
  (Figure: the list l, a chain of nodes, each with a data field and a next field.)
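Steps #1 to #3 of "new" and the pointer-based "equals" can be put together in a minimal, self-contained sketch. Several details are assumptions made for this example rather than the author's code: the List_t cache is replaced by a next link stored in each record, the characters are copied instead of sharing the caller's pointer, size excludes the '\0', and the hashCode field is left out.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef struct Str_t *Str_t;

struct Str_t {
    char *s;    /* private copy of the characters */
    int size;   /* cached length, excluding '\0' (assumption) */
    Str_t next; /* assumption: intrusive link standing in for List_t */
};

/* Internal cache of every Str_t created so far. */
static Str_t allStrs = NULL;

Str_t Str_new(const char *s)
{
    Str_t p;
    size_t len;

    assert(s != NULL);
    /* #1 and #2: search the cache; if found, return the existing record. */
    for (p = allStrs; p != NULL; p = p->next)
        if (strcmp(p->s, s) == 0)
            return p;

    /* #3: not found, so build a new record and put it at the head. */
    len = strlen(s);
    p = malloc(sizeof(*p));
    if (p == NULL)
        return NULL;
    p->s = malloc(len + 1);
    if (p->s == NULL) {
        free(p);
        return NULL;
    }
    memcpy(p->s, s, len + 1);
    p->size = (int)len;
    p->next = allStrs;
    allStrs = p;
    return p;
}

/* Equal contents always share one record, so comparing pointers suffices. */
int Str_equals(Str_t s1, Str_t s2)
{
    return s1 == s2;
}
```

The cost moves from comparison to creation: Str_new still does a linear search with strcmp, which is presumably what the hashCode field in the slides is meant to speed up later.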

20 Summary
  - For fast string comparison and lower memory usage, we modify the string data structure.
  - This is a technique called memoization (for strings, commonly known as interning).
  - Interplay between data structures and algorithms again.

