Strings
Outline
In this lesson, we will:
  Define strings
  Describe how to use character arrays for strings
  Look at:
    The length of strings
    Copying strings
    Concatenating strings
  Consider string operations, specifically distances
  Learn how to manipulate strings
  Look at other alphabets and Unicode
Strings
An array stores a list of values
  E.g., temperatures, voltages, positions, speeds, etc.
  Generally, each value has independent significance
An array of characters, however, has the following properties:
  The significance comes from how the characters are strung together:
    post pots spot stop tops opts spto
  The characters come from a small alphabet
If the characters of an array come from a fixed alphabet, the array is called a string of characters, or simply a string
  The alphabet for C++ strings is the set of all ASCII characters
  More inclusive strings use Unicode
The length of a string is the number of characters
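One structure that could be used for strings is as follows:
    #include <cstddef>

    struct string_t {
        char *string;          // dynamically allocated array of characters
        std::size_t length;    // number of characters currently stored
        std::size_t capacity;  // number of entries in the array
    };

    // Allocate an array of the given capacity; the string starts out empty
    void string_init( string_t &str, std::size_t cap ) {
        str.capacity = cap;
        str.string = new char[str.capacity];
        str.length = 0;
    }
The string itself may be shorter than the capacity of the array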
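To illustrate how the length can stay below the capacity, here is a minimal sketch built on the structure above. The helpers string_push_back and string_destroy are hypothetical names introduced only for this example; they do not appear in the slides.

```cpp
#include <cassert>
#include <cstddef>

struct string_t {
    char *string;
    std::size_t length;
    std::size_t capacity;
};

void string_init( string_t &str, std::size_t cap ) {
    str.capacity = cap;
    str.string = new char[str.capacity];
    str.length = 0;
}

// Hypothetical helper: append one character if there is still room.
// Returns false, leaving the string unchanged, when the array is full.
bool string_push_back( string_t &str, char ch ) {
    assert( str.string != nullptr );

    if ( str.length == str.capacity ) {
        return false;
    }

    str.string[str.length] = ch;
    ++str.length;
    return true;
}

// Hypothetical helper: release the array allocated by string_init
void string_destroy( string_t &str ) {
    delete[] str.string;
    str.string = nullptr;
    str.length = 0;
    str.capacity = 0;
}
```

With a capacity of 2, two calls to string_push_back succeed and a third returns false; after only one call, the length (1) is shorter than the capacity (2).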
String length
The strings built into the C++ language, however, are much simpler:
  They are an array of characters where the entry following the last character is the null character '\0' with a value of 0x00
We will prefix all identifiers that are pointers to strings with "s_"
We can determine the length of a string:
    #include <cstddef>

    std::size_t string_length( char *s_str ) {
        // Step through the array until we reach the null character
        for ( std::size_t k{0}; true; ++k ) {
            if ( s_str[k] == '\0' ) {
                return k;
            }
        }
    }
Important:
  'a' is a single character occupying 1 byte
  "a" is an array occupying 2 bytes
    The first entry is 'a' and the second is '\0'
  Oddly enough, "\0" is an array occupying 2 bytes
    Both entries are '\0'
Suppose we have an argument s_str:
  [Figure: the array s_str drawn as a row of cells indexed from 0 to 11; the string occupies entries 0 to 6, entry 7 holds '\0', and the remaining entries, marked ?, are unspecified]
  Any characters after the '\0' are ignored
  We initialize k to zero and step through the array s_str
  When we get to the null character, we return k, which is 7 here
Question: What happens if you forget to include a null character?
  The loop reads past the end of the array, which is undefined behavior: in practice, it continues until it happens to find a '\0' (0x00) or it causes a segmentation fault
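The difference between the size of the array and the length of the string can be checked directly. The sketch below is an illustration under two assumptions: string_length takes a char const * rather than the char * used in the slides, so that literals can be passed without a cast, and the contents of s_example are made up to match a seven-character string followed by unspecified entries.

```cpp
#include <cassert>
#include <cstddef>

// Same loop as in the slides, but taking char const * (an assumption)
std::size_t string_length( char const *s_str ) {
    assert( s_str != nullptr );

    for ( std::size_t k{0}; true; ++k ) {
        if ( s_str[k] == '\0' ) {
            return k;
        }
    }
}

// sizeof reports the number of bytes in the array, including the '\0'
static_assert( sizeof( 'a' ) == 1, "a character literal is one byte" );
static_assert( sizeof( "a" ) == 2, "'a' followed by the null character" );
static_assert( sizeof( "\0" ) == 2, "two null characters" );

// Assumed contents, for illustration only: seven characters, the null
// character at index 7, and four entries that are never looked at
char const s_example[12]{
    'E', 'C', 'E', ' ', '1', '5', '0', '\0', '?', '?', '?', '?'
};
```

Here string_length( s_example ) is 7 even though the array has 12 entries, and string_length( "\0" ) is 0 even though that literal occupies 2 bytes.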
String copying
If we want to copy a string, we must determine its length and copy its characters over
    char *string_copy( char *s_str ) {
        std::size_t length{string_length( s_str )};
        // One extra entry is needed for the null character
        char *s_result{new char[length + 1]};

        // Using <= copies the null character as well
        for ( std::size_t k{0}; k <= length; ++k ) {
            s_result[k] = s_str[k];
        }

        return s_result;
    }
Suppose we have an argument s_str:
  [Figure: the array s_str, indexed from 0 to 11, holding the three-character string "C++" followed by '\0']
  We find the length of the string: length is 3
  We allocate memory for a new array s_result of the appropriate capacity, here 4 entries indexed from 0 to 3
  We copy over the characters, including the '\0'
  We return the address of the new array
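A short sketch of what the copy buys us: the result lives in its own array, so changing it leaves the original untouched. As an assumption made for this example, both functions take char const * instead of the char * used in the slides.

```cpp
#include <cassert>
#include <cstddef>

std::size_t string_length( char const *s_str ) {
    assert( s_str != nullptr );

    for ( std::size_t k{0}; true; ++k ) {
        if ( s_str[k] == '\0' ) {
            return k;
        }
    }
}

// Returns a newly allocated copy; the caller must release it with delete[]
char *string_copy( char const *s_str ) {
    std::size_t length{string_length( s_str )};
    char *s_result{new char[length + 1]};

    // k <= length, so the null character is copied too
    for ( std::size_t k{0}; k <= length; ++k ) {
        s_result[k] = s_str[k];
    }

    return s_result;
}
```

Copying "C++" yields a separate four-entry array; overwriting its first entry with 'D' does not affect the original string.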
String concatenation
If we want to concatenate two strings, we must determine the lengths and then copy them to a dynamically allocated array
    char *string_concat( char *s_1, char *s_2 ) {
        std::size_t length_1{string_length( s_1 )};
        std::size_t length_2{string_length( s_2 )};
        char *s_result{new char[length_1 + length_2 + 1]};

        std::size_t k{0};

        // Copy the characters of s_1, excluding its '\0'
        for ( std::size_t i{0}; i < length_1; ++i, ++k ) {
            s_result[k] = s_1[i];
        }

        // Copy the characters of s_2, including its '\0'
        for ( std::size_t i{0}; i <= length_2; ++i, ++k ) {
            s_result[k] = s_2[i];
        }

        return s_result;
    }
Suppose we have two arguments, s_1 and s_2:
  [Figure: the array s_1 holding "Bjarne" followed by '\0', and the array s_2 holding "Stroustrup" followed by '\0']
  We determine their lengths: length_1 is 6 and length_2 is 10
  We allocate a new array s_result of 17 entries, indexed from 0 to 16
  We copy over the first six characters, those of s_1
  We copy over all characters, including '\0', from the second
    Notice we did not reset k prior to starting the second loop: we continue copying from where we left off
  We return the address of the new array
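The walk-through above can be checked with a small sketch. It assumes the same two functions as the slides, except that they take char const * so that string literals can be passed directly; that change is an assumption of this example.

```cpp
#include <cassert>
#include <cstddef>

std::size_t string_length( char const *s_str ) {
    assert( s_str != nullptr );

    for ( std::size_t k{0}; true; ++k ) {
        if ( s_str[k] == '\0' ) {
            return k;
        }
    }
}

// Returns a newly allocated array; the caller must release it with delete[]
char *string_concat( char const *s_1, char const *s_2 ) {
    std::size_t length_1{string_length( s_1 )};
    std::size_t length_2{string_length( s_2 )};
    char *s_result{new char[length_1 + length_2 + 1]};

    std::size_t k{0};

    // First string, without its null character
    for ( std::size_t i{0}; i < length_1; ++i, ++k ) {
        s_result[k] = s_1[i];
    }

    // Second string, with its null character; k is not reset
    for ( std::size_t i{0}; i <= length_2; ++i, ++k ) {
        s_result[k] = s_2[i];
    }

    return s_result;
}
```

Concatenating "Bjarne" and "Stroustrup" gives a string of length 16 whose null character sits at index 16, matching the 17-entry array in the walk-through.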
Operations on strings
A significant amount of work has gone into operations on strings:
  Extracting or finding substrings
  Describing or finding patterns
  Matching case or not
  Defining whitespace and finding only whole words
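Distances between strings
One important question is how similar, or how close, are two strings? Consider:
  "Et tu, Brute?"
  "t tu, Brute?"
  "Et ut, Brute?"
  "Et tu, Brune?"
The Levenshtein distance is defined as the minimum number of edits required to convert one string to another
One edit is defined as
  Inserting or removing a character
  Replacing a character
  Swapping two adjacent characters
Note: the classic Levenshtein distance counts only insertions, removals and replacements; when a swap of two adjacent characters also counts as one edit, as it does here, the distance is usually called the Damerau-Levenshtein distance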
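As a rough sketch of how such a distance can be computed, the function below implements the classic three-operation Levenshtein distance with the standard dynamic-programming recurrence. It is not taken from the slides: the name levenshtein is hypothetical, it uses std::strlen and std::vector from the standard library, and it does not treat a swap as a single edit, so a swap of two adjacent characters costs two edits here.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Classic Levenshtein distance: insertions, removals and replacements only
std::size_t levenshtein( char const *s_1, char const *s_2 ) {
    assert( s_1 != nullptr );
    assert( s_2 != nullptr );

    std::size_t const m{std::strlen( s_1 )};
    std::size_t const n{std::strlen( s_2 )};

    // prev[j] is the distance between the first i - 1 characters of s_1
    // and the first j characters of s_2; curr[j] is the same for i
    std::vector<std::size_t> prev( n + 1 );
    std::vector<std::size_t> curr( n + 1 );

    for ( std::size_t j{0}; j <= n; ++j ) {
        prev[j] = j;  // j insertions turn the empty string into a prefix
    }

    for ( std::size_t i{1}; i <= m; ++i ) {
        curr[0] = i;  // i removals turn a prefix into the empty string

        for ( std::size_t j{1}; j <= n; ++j ) {
            std::size_t cost{0};

            if ( s_1[i - 1] != s_2[j - 1] ) {
                cost = 1;
            }

            curr[j] = std::min( { prev[j] + 1,           // removal
                                  curr[j - 1] + 1,       // insertion
                                  prev[j - 1] + cost } );  // replacement
        }

        std::swap( prev, curr );
    }

    return prev[n];
}
```

Under this variant, "t tu, Brute?" and "Et tu, Brune?" are each at distance 1 from "Et tu, Brute?", while "Et ut, Brute?" is at distance 2 because the swap is counted as two replacements.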
For example, you could use the Levenshtein distance to determine which words to suggest in a spell checker
For example, “incomprehssible” is not a word, but it is:
  One edit away from “incompressible”:
    incomprehssible → incompressible
  Two edits away from “incomprehensible”:
    incomprehssible → incomprehesible → incomprehensible
  Recommend “incompressible” first…
What’s wrong with this picture?
  The distance is context insensitive
  Ideas cannot be incompressible, so suggest the second first…
Recall the properties of the Euclidean distance:
  dist( A, B ) ≥ 0
  dist( A, B ) = 0 if and only if A = B
  dist( A, B ) = dist( B, A )
  dist( A, B ) ≤ dist( A, C ) + dist( C, B )
All of these properties hold for the Levenshtein distance between strings
Editing strings
If we want to edit a string, it cannot be a literal string:
    #include <iostream>

    int main();

    int main() {
        char *s_str{(char *)"Hello world."};
        // Modifying a string literal is undefined behavior,
        // and it typically causes a segmentation fault
        s_str[11] = '!';

        return 0;
    }
Instead, we make a copy of the literal and edit the copy:
    #include <iostream>

    // string_copy(...) is the function defined earlier
    char *string_copy( char *s_str );
    int main();

    int main() {
        char *s_str{(char *)"Hello world."};
        std::cout << s_str << std::endl;

        char *s_copy{string_copy( s_str )};
        // Replace the character at index 11 with '!'
        s_copy[11] = '!';
        std::cout << s_copy << std::endl;

        delete[] s_copy;

        return 0;
    }
  Output
    Hello world.
    Hello world!
We can also swap two adjacent characters:
    #include <iostream>

    // string_copy(...) is the function defined earlier
    char *string_copy( char *s_str );
    int main();

    int main() {
        char *s_str{(char *)"Hello wrold."};
        std::cout << s_str << std::endl;

        char *s_copy{string_copy( s_str )};
        // Swap the characters at indices 7 and 8
        char ch{s_copy[8]};
        s_copy[8] = s_copy[7];
        s_copy[7] = ch;
        std::cout << s_copy << std::endl;

        delete[] s_copy;

        return 0;
    }
  Output
    Hello wrold.
    Hello world.
Erasing a character requires a for-loop:
    #include <cstddef>
    #include <iostream>

    // string_copy(...) is the function defined earlier
    char *string_copy( char *s_str );
    int main();

    int main() {
        char *s_str{(char *)"Hello world."};
        std::cout << s_str << std::endl;

        char *s_copy{string_copy( s_str )};
        // Remove the character at index 5 by shifting each
        // later character, including the '\0', down by one
        for ( std::size_t k{5}; s_copy[k] != '\0'; ++k ) {
            s_copy[k] = s_copy[k + 1];
        }
        std::cout << s_copy << std::endl;

        delete[] s_copy;

        return 0;
    }
  Output
    Hello world.
    Helloworld.
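The erasing loop can be wrapped in a function so that any index can be removed. This is a minimal sketch; string_erase is a hypothetical name, and it assumes the index is no larger than the length of the string.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: remove the character at the given index in place.
// Assumes index <= length of the string; a larger index would read past
// the null character, which is undefined behavior.
void string_erase( char *s_str, std::size_t index ) {
    assert( s_str != nullptr );

    // Shift each later character down by one; the final iteration
    // copies the null character, so the string shrinks by one
    for ( std::size_t k{index}; s_str[k] != '\0'; ++k ) {
        s_str[k] = s_str[k + 1];
    }
}
```

Because the array is edited in place, the argument must be a modifiable array such as a copy or a local char array, never a string literal. No reallocation is needed, since the result is shorter than the original.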
Inserting a character requires a for-loop, but it also requires that the array is sufficiently large…
  What happens if it is not?
Exercise: What do you have to do to add two exclamation marks to the end of the string "Hello world!"?
    char *s_str{(char *)"Hello world!"};
    std::cout << s_str << std::endl;

    char *s_copy{string_copy( s_str )};
  Output
    Hello world!
    Hello world!!!
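One possible approach, sketched here rather than given as the intended answer to the exercise, is to sidestep the capacity problem by allocating a new array that is one entry larger. The function string_insert is a hypothetical name, and both functions take char const *, which is an assumption of this example.

```cpp
#include <cassert>
#include <cstddef>

std::size_t string_length( char const *s_str ) {
    assert( s_str != nullptr );

    for ( std::size_t k{0}; true; ++k ) {
        if ( s_str[k] == '\0' ) {
            return k;
        }
    }
}

// Hypothetical helper: return a newly allocated string equal to s_str
// with ch inserted so that it ends up at the given index.
// Requires index <= length; the caller must release the result.
char *string_insert( char const *s_str, std::size_t index, char ch ) {
    std::size_t length{string_length( s_str )};
    assert( index <= length );

    // One entry for the new character and one for the null character
    char *s_result{new char[length + 2]};

    for ( std::size_t k{0}; k < index; ++k ) {
        s_result[k] = s_str[k];
    }

    s_result[index] = ch;

    // Copy the rest, including the null character, shifted up by one
    for ( std::size_t k{index}; k <= length; ++k ) {
        s_result[k + 1] = s_str[k];
    }

    return s_result;
}
```

Calling string_insert twice, first at index 12 and then at index 13, turns "Hello world!" into "Hello world!!!". Each call allocates a new array, so every intermediate result must be released with delete[].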
Strings in other alphabets
Other alphabets include:
  Morse code uses five characters:
    dot
    dash
    inter-character space
    inter-word space
    inter-sentence space
  Note: the letters “SOS” are sent as ··· --- ··· with inter-character spaces, while the mayday SOS is sent as ···---··· with no inter-character spaces
Western European alphabets often include additional characters on top of ASCII; however, Unicode allows for most alphabets
  German     ABCDEFGHIJKLMNOPQRSTUVWXYZÄÖÜß
  Swedish    ABCDEFGHIJKLMNOPQRSTUVWXYZÅÄÖ
  Italian    ABCDEFGHILMNOPQRSTUVZ
  Slovenian  ABCČDEFGHIJKLMNOPRSŠTUVZŽ
  Polish     AĄBCĆDEĘFGHIJKLŁMNŃOÓPQRSŚTUWXYZŹŻ
  Greek      ΑΒΓΔΕΖΗΘΙΚΛΜΝΞΟΠΡΣΤΥΦΧΨΩ
  Russian    АБВГДЕЁЖЗИЙКЛМНОПРСТУФХЦЧШЩЪЫЬЭЮЯ
  Persian    ا ب پ ت ث ج چ ح خ د ذ ر ز ژ س ش ص ض ط ظ ع غ ف ق ک گ ل م ن و ه ی
  Gurmukhi   ੳਅੲਸਹਕਖਗਘਙਚਛਜਝਞਟਠਡਢਣਤਥਦਧਨਪਫਬਭਮਯਰਲਵੜ
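One consequence for null-terminated char arrays is worth a sketch. The slides do not say how Unicode text is stored; assuming the common UTF-8 encoding, a character outside ASCII occupies more than one char, so counting array entries no longer counts characters. The helper utf8_code_points is a hypothetical name used only for this illustration.

```cpp
#include <cassert>
#include <cstddef>

// Counts entries of the array before the null character (bytes)
std::size_t string_length( char const *s_str ) {
    assert( s_str != nullptr );

    for ( std::size_t k{0}; true; ++k ) {
        if ( s_str[k] == '\0' ) {
            return k;
        }
    }
}

// Hypothetical helper: counts Unicode code points, assuming that the
// array holds valid UTF-8 (an assumption, not something the slides state)
std::size_t utf8_code_points( char const *s_str ) {
    assert( s_str != nullptr );

    std::size_t count{0};

    for ( std::size_t k{0}; s_str[k] != '\0'; ++k ) {
        // In UTF-8, continuation bytes have the bit pattern 10xxxxxx;
        // every other byte starts a new code point
        if ( (static_cast<unsigned char>( s_str[k] ) & 0xC0) != 0x80 ) {
            ++count;
        }
    }

    return count;
}

// The German word "Ärger": the letter Ä (U+00C4) is the two bytes
// 0xC3 0x84 in UTF-8; the literal is split so that the letters after
// the escape are not read as further hexadecimal digits
char const *s_word{"\xC3\x84" "rger"};
```

For s_word, string_length reports 6 while utf8_code_points reports 5; for a pure ASCII string the two agree.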
Strings in nature
Even better, deoxyribonucleic acid (DNA) is a string with a four-character alphabet:
  cytosine  C
  guanine   G
  adenine   A
  thymine   T
All the algorithms developed by computer scientists for analyzing and manipulating strings were immediately transferable to the analysis and manipulation of DNA
  This is one of the beauties of abstraction
Summary
Following this lesson, you now
  Know that strings are sequences of characters
    Those characters come from a fixed alphabet
  Know the most primitive means of storing strings are null-character-terminated arrays of char
  Understand how to:
    Calculate the length of a string
    Copy a string
    Concatenate two strings
  Know that the last two require dynamic memory allocation
  Understand string distances
  Are aware that
    Simple strings are limited to ASCII
    Other languages require Unicode
Colophon
These slides were prepared using the Georgia typeface. Mathematical equations use Times New Roman, and source code is presented using Consolas. The photographs of lilacs in bloom appearing on the title slide and accenting the top of each other slide were taken at the Royal Botanical Gardens on May 27, 2018 by Douglas Wilhelm Harder. Please see for more information.
Disclaimer
These slides are provided for the ECE 150 Fundamentals of Programming course taught at the University of Waterloo. The material in them reflects the authors’ best judgment in light of the information available to them at the time of preparation. Any reliance on these course slides by any party for any other purpose is the responsibility of such parties. The authors accept no responsibility for damages, if any, suffered by any party as a result of decisions made or actions based on these course slides for any other purpose than that for which they were intended.