Published by Dianna Neale. Modified about 1 year ago.

1
Types

2
Definition
  A type is a set V of values and a set O of operations on V.
  Examples from C++:
    The int type:
      V = {INT_MIN, ..., -1, 0, 1, ..., INT_MAX-1, INT_MAX}
      O = { >, +, -, *, /, %, =, ++, --, ... }
    The char type:
      V = {NUL, ..., '0', ..., '9', ..., 'A', ..., 'Z', ..., 'a', ..., 'z', DEL}
      O = { >, =, ++, --, isupper(), islower(), toupper(), ... }
    The string type:
      V = {"", "A", "B", "C", ..., "AA", "AB", "AC", ..., "AAA", ...}
      O = { >, +, +=, [], find(), substr(), ... }

3
Primitive Data Types
  Nearly all languages provide a set of primitive data types. E.g.
    Name   V                  C++      Ada        Lisp
    bool   false, true        bool     boolean    boole
    char   the set of chars   char     character  character
    int    the integers       int      integer    integer
    real   the reals          double   float      real

4
Integer
  Most common primitive data type
  Often several sizes are supported
  Usually supported directly by hardware
    Leftmost bit representing sign
    Python's long integer type is not supported directly by hardware
  Negative representations
    Sign-magnitude
    Two's complement
    Ones' complement

5
Floating Point
  Models real numbers, but the representations are often only approximations
    e.g. π or e would require infinite space on any system
  Worse, some simple base-10 fractions cannot be represented exactly in base-2
    e.g. 0.1 is 0.000110011001100... in binary (the pattern 0011 repeats forever)
  Arithmetic operations result in loss of accuracy
    Rounding and truncation errors
  Languages for scientific use support at least two representations: float and double

6
Floating-Point Representations IEEE floating-point standard 754

7
Decimal to Binary
  Conversion splits the integer and fractional parts and converts each separately to binary
    [worked example: digits not recoverable; the value is normalized to the form 1.f x 2^1]
  First 1 is implicit in the representation
  Exponent uses a bias of 127, that is, 127 is added to the true exponent
    [resulting bit pattern for the example: not recoverable]
  Note: zero is represented as all zeros

8
Boolean Types
  Perhaps the simplest of all
  Range of only two values: 0 (for false) or 1 (for true)
  Introduced in ALGOL 60, and included in most general-purpose languages since
  Could be implemented in a single bit, but often implemented using a byte

9
Character Types
  Stored as numeric encodings
  Most common encoding: ASCII
    Uses 0 to 127 to code 128 different characters
  ISO 8859-1 (Latin-1) is an 8-bit character code
    Allows 256 different characters
  Unicode began in 1991 as a 16-bit character code
    In 2000, a 32-bit version was also specified
  Java was the first widely used language to use Unicode
    JavaScript, Python, Perl and C# have followed

10
Character String Types Values consist of sequences of characters Design issues: Strings as character array, or primitive? Static or dynamic length? Common operations Assignment Catenation Substring reference Comparison Pattern Matching

11
Strings in various languages Historically, neither Fortran nor Algol 60 had support for strings Cobol had statically sized strings C, C++ strings are arrays of chars Inherently unsafe Ada strings are fixed size Fortran 95, Perl, Java, Python have String as a built-in type

12
String Length Options Static length Length of string is set when created, cannot be changed e.g. Python, Ruby Limited dynamic length Varying length up to a declared maximum Use end of string character e.g. C Dynamic length e.g. JavaScript, Perl, standard C++ library Advantage: maximum flexibility Disadvantage: overhead of dynamic storage allocation and deallocation

13
Creating New Types Given the fundamental types, new types can be created via type constructors. Each constructor has 3 components: The syntax used to denote that constructor; The set of elements produced by that constructor; and The operations associated with that constructor. Three constructors: Product, Function, and Kleene closure

14
Constructor 1: Product
  The product of two sets A and B is denoted A × B.
  The product constructor is the basis for aggregates.
    A × B consists of all ordered pairs (a, b): a ∈ A, b ∈ B.
    A × B × C consists of all ordered triples (a, b, c): a ∈ A, b ∈ B, c ∈ C.
    A × B × … × N consists of all ordered n-tuples (a, b, …, n): a ∈ A, b ∈ B, …, n ∈ N.
  Example: the set bool × char has 256 elements:
    { …, (true, 'A'), (false, 'A'), (true, 'B'), (false, 'B'), … }
  Operations associated with product are the projection operations:
    first, applied to an n-tuple (s1, s2, …, sn), returns s1.
    second, applied to an n-tuple (s1, s2, …, sn), returns s2.
    nth, applied to an n-tuple (s1, s2, …, sn), returns sn.

15
Product Example: C++ Structs
    struct Student { int id; double gpa; char gender; };
    Student aStudent;
  Formally, a Student consists of: int × real × char
  Formally, a particular Student:
    aStudent.id = 12345; aStudent.gpa = 3.75; aStudent.gender = 'F';
  is the 3-tuple: (12345, 3.75, 'F').
  The C++ "dot-operator" is a projection operation:
    cout << aStudent.id      // extract id
         << aStudent.gpa     // extract gpa
         << aStudent.gender  // extract gender
         << endl;

16
Constructor 2: Function
  The set of all functions from a set A to a set B is denoted (A) → B.
  The function constructor is the basis for subprograms.
  A particular function f mapping A to B is denoted f(A) → B.
  Examples:
    The set (char) → bool contains all functions that map char values into bool values, some C examples of which include:
      isupper('A') → true    islower('A') → false
      isalpha('A') → true    isdigit('A') → false
      isalnum('A') → true    isspace('A') → false
    The set (char) → char contains all functions that map char values into char values, some C examples of which include:
      tolower('A') → 'a'     toupper('a') → 'A'

17
Function and Product
  What does this set contain? (int × int) → int
    All functions that map pairs of integers into an integer.
  Examples?
    +((2, 3)) → 5    -((2, 3)) → -1
    *((2, 3)) → 6    /((2, 3)) → 0
  Suppose we define an aggregate named IntPair:
    struct IntPair { int a, b; };
  and then define a function named Add():
    int Add(IntPair ip) { return ip.a + ip.b; }
  Add() is a member of the set: (int × int) → int
  The function constructor lets us create new operations for a language.

18
Function Arity
  Product serves to denote an aggregate or an argument-list.
  What does this set contain? (int × int) → bool
    All functions that map pairs of integers into a boolean.
  Examples?
    <((2, 3)) → true      >((2, 3)) → false
    ==((2, 3)) → false    !=((2, 3)) → true
  Definition: The number of operands an operation requires is its arity.
    Operations with 1 operand are unary operations, with arity-1.
    Operations with 2 operands are binary operations, with arity-2.
    Operations with 3 operands are ternary operations, with arity-3.
    ...

19
Example: Ternary Operation
  The C/C++ conditional expression has the form:
    expr0 ? expr1 : expr2
  producing expr1 if expr0 is true, and producing expr2 if expr0 is false.
  Here is a simple minimum() function using it:
    int minimum(int first, int second) { return (first < second) ? first : second; }
  The C/C++ conditional expression is a ternary operation, which in this case is a member of the set:
    ?:(bool × int × int) → int

20
Operator Positioning
  Operators are also categorized by their position relative to their operands:
    Infix operators appear between their operands:   (2 + 3) * (4 - 2)
    Prefix operators appear before their operands:   * + 2 3 - 4 2
    Postfix operators appear after their operands:   2 3 + 4 2 - *
  Prefix, infix, and postfix notation are different conventions for the same thing; a language may choose any of them:
    C++ Expr    Category         Value          Lisp Expr       Category         Value
    x < y       binary, infix    true, false    (< x y)         binary, prefix   true, false
    …           binary, infix    23             ( … )           binary, prefix   23
    cout << x   binary, infix    cout           (princ x str)   binary, prefix   x
    ++x         unary, prefix    x+1            (incf x)        unary, prefix    x+1
    !flag       unary, prefix    neg. of flag   (not flag)      unary, prefix    neg. of flag
    x++         unary, postfix   x              None

21
Constructor III: Kleene Closure
  Kleene Closure is the basis for representing sequences.
  The Kleene Closure of a set A is denoted A*.
  The Kleene Closure of a set is the set of all tuples that can be formed using elements of that set.
  Example: The Kleene Closure of bool -- bool* -- is the infinite set:
    { (), (false), (true), (false, false), (false, true), (true, false), (true, true), (false, false, false), … }
  For a tuple t ∈ A*, the operations include:
    null(A*) → bool
      null(()) → true    null((true)) → false    null((false)) → false
    first(A*) → A
      first((true, false)) → true    first((false, true)) → false
    rest(A*) → A*
      rest((false, true, true)) → (true, true)    rest((true, true, false)) → (true, false)

22
Kleene Closure Example
  If char is the set of ASCII characters, what is char* ?
    The infinite set of all tuples formed from ASCII characters (AKA the set of all character strings).
  The C/C++ notation:
    "Hello"
  is just a different syntax for:
    ( 'H', 'e', 'l', 'l', 'o' )
  Thus, int* denotes a sequence (array, list, …) of integers:
    int intStaticArray[32];
    int * intDynamicArray = new int[n];
    vector<int> intVec;
    list<int> intList;
  real* denotes a sequence (array, list, …) of reals; and so on.

23
Sequence Operations
  Sequence operations can be built via null(), first(), and rest().
  In Lisp: first is called car; rest is called cdr.
  A subscript operation can be defined like this (pseudocode):
    int & operator[](int * a, int i) {
      if (i > 0) return operator[](rest(a), i-1);
      else return first(a);
    }
  An output operation can be defined like this (pseudocode):
    void print(ostream & out, int * a) {
      if ( !null(a) ) {
        out << first(a) << ' ';
        print(out, rest(a));
      }
    }

24
Practise Using Constructors
  Give formal descriptions for:
  The logical and operation (&&):
    How many operands does it take?        2
    What types are its operands?           bool, bool
    What type of value does it produce?    bool
    So && is a member of: (bool × bool) → bool
  The C++/STL substring operation ( str.substr(i,n) ):
    How many operands does it take?        3
    What types are its operands?           string, int, int
    What type of value does it produce?    string
    So substr() is a member of: (string × int × int) → string
  The logical negation operation (!):

25
Practise C++ record: struct Student { int myID; string myName; bool iAmFullTime; double myGPA; }; An accessor method: struct Student { int myID; int id() const ; string myName; bool iAmFullTime; double myGPA; }; How does this affect our Student description?

26
More Practise A “completely functional” class: class Student { public: Student(); Student(int, string, bool, double); int id() const; string name() const; bool fullTime() const; double gpa() const; void read(istream &); void print(ostream &) const; private: int myID; string myName; bool iAmFullTime; double myGPA; };

27
Summary of Constructors Product constructor allows us to add record types Record is an aggregate of values of unrestricted types, with each value being accessible via a name (i.e. projection) Kleene closure constructor allows us to add sequence types A sequence is an aggregate of values of the same type e.g. Arrays (adjacent memory), Lists (possibly non-adjacent) Function constructor allows us to add operations Using available operations e.g. projection, first(), rest()

28
Ordinal Types A type in which the range of possible values can be easily associated with the set of positive integers. e.g. integer, char, boolean Two user-defined ordinal types often supported Enumerations Subrange

29
Modeling Real-World Values
  Suppose we want to model the seven "ROY G BIV" colors.
  One approach:
    const int RED=0, ORANGE=1, YELLOW=2, GREEN=3, BLUE=4, INDIGO=5, VIOLET=6;
    int aColor = BLUE;
  This approach requires the human to map colors to integers.
  Instead:
    enum Color { RED, ORANGE, YELLOW, GREEN, BLUE, INDIGO, VIOLET };
    Color aColor = BLUE;
  An enumeration is a type whose values are explicitly listed.
  Most imperative languages support such enumerations…
    Ada: type Color is ( RED, ORANGE, YELLOW, GREEN, BLUE, INDIGO, VIOLET );
         aColor : Color := BLUE;

30
Enumerations: Compiler-Side
  An enumeration's values must be valid identifiers:
    … ::= enum identifier { … } ;
  and the compiler treats a declaration:
    enum NewType { id0, id1, id2, …, idN-1 };
  as being (approximately) equivalent to:
    const int id0 = 0, id1 = 1, id2 = 2, …, idN-1 = N-1;
  Thus, after processing
    enum Color { RED, ORANGE, YELLOW, GREEN, BLUE, INDIGO, VIOLET };
  so far as the compiler is concerned:
    RED==0 && ORANGE==1 && YELLOW==2 && … && VIOLET==6

31
Enumerations: User Side
  Enumerations thus provide an automatic means of mapping: (identifier) → int
  whose chief benefit is better program readability:
    enum ElementName { HYDROGEN, HELIUM, … };
    ElementName anElement;
    //...
    switch (anElement) {
      case HYDROGEN: atomicNumber = 1; break;
      case HELIUM: atomicNumber = 2; break;
      …
    }
  Enumerations allow real-world 'values' to be represented using real-world names, instead of (arbitrary) integers.

32
Enumerations and Smalltalk
  OO purists replace enums with class hierarchies:
    Color: Red, Orange, …, Indigo, Violet
    Element: Hydrogen, Helium, …, E112, E113
  This permits the creation of real-world objects:
    "Smalltalk" aColor := Blue new.
    "Smalltalk" anElement := Helium new.
  as opposed to real-world values provided by an enumeration.
  For this reason, "pure" OO languages like Smalltalk don't provide an enumeration mechanism.

33
Subrange
  A type whose values are a subset of an existing type
    -- Ada
    subtype TestScore is Integer range …;
    subtype CapitalLetter is Character range 'A'..'Z';
    type DaysOfWeek is (Sunday, Monday, Tuesday, Wednesday, Thursday, Friday, Saturday);
    subtype WeekDay is DaysOfWeek range Monday..Friday;
  If a subrange variable is declared:
    today : WeekDay;
  and assigned an invalid value:
    today := Saturday;
  then an exception occurs that, if not caught, halts the program.
  This is an essential feature for life-critical systems.

34
Array Types An array is an aggregate of homogeneous data elements in which an individual element is identified by its position in the aggregate, relative to the first element.

35
Array Design Issues What types are legal for subscripts? Are subscripting expressions in element references range checked? When are subscript ranges bound? When does allocation take place? What is the maximum number of subscripts? Can array objects be initialized? Are any kind of slices allowed?

36
Array Indexing
  Indexing (or subscripting) is a mapping from indices to elements
    array_name(index_value_list) → an element
  Index Syntax
    FORTRAN, PL/I, Ada use parentheses
      Ada explicitly uses parentheses to show uniformity between array references and function calls, because both are mappings
    Most other languages use brackets

37
Arrays Index (Subscript) Types FORTRAN, C: integer only Pascal: any ordinal type (integer, Boolean, char, enumeration) Ada: integer or enumeration (includes Boolean and char) Java: integer types only C, C++, Perl, and Fortran do not specify range checking Java, ML, C# specify range checking

38
Subscript Binding and Array Categories Three choices to make: Type of binding to subscript ranges Time of binding to storage Location of storage Static: subscript ranges are statically bound and storage allocation is static (before run-time) Advantage: efficiency (no dynamic allocation)

39
Subscript Binding and Array Categories (continued) Fixed stack-dynamic: subscript ranges are statically bound, but the allocation is done at declaration time (during execution) Advantage: space efficiency Stack-dynamic: subscript ranges are dynamically bound and the storage allocation is also dynamic (done at run-time) Advantage: flexibility (the size of an array need not be known until the array is to be used)

40
Subscript Binding and Array Categories (continued) Fixed heap-dynamic: similar to fixed stack-dynamic: storage binding is dynamic but fixed after allocation (i.e., binding is done when requested and storage is allocated from heap, not stack) Heap-dynamic: binding of subscript ranges and storage allocation is dynamic and can change any number of times Advantage: flexibility (arrays can grow or shrink during program execution)

41
Examples
  C and C++ arrays that include the static modifier are static
  C and C++ arrays without the static modifier are fixed stack-dynamic
  Ada arrays can be stack-dynamic
  C and C++ provide fixed heap-dynamic arrays
  C# includes a second array class, ArrayList, that provides heap-dynamic arrays
  Perl and JavaScript support heap-dynamic arrays

42
Array Initialization
  Some languages allow initialization at the time of storage allocation
  C, C++, Java, C# example
    int list [] = {4, 5, 7, 83};
  Character strings in C and C++
    char name [] = "freddie";
  Java initialization of String objects
    String[] names = {"Bob", "Jake", "Joe"};

43
More Examples
  Ada
    List : array (1..5) of Integer := (1, 3, 5, 7, 9);
    Bunch : array (1..5) of Integer := (1 => 17, 3 => 34, others => 0);
  Python
    [expression for iterate_var in array if condition]
    [x * x for x in range(12) if x % 3 == 0] evaluates to [0, 9, 36, 81]

44
Array Operations Ada allows array assignment but also catenation (&) Fortran provides elemental operations Operate between pairs of array elements For example, + operator between two arrays results in an array of the sums of the element pairs of the two arrays APL provides the most powerful array processing operations for vectors and matrixes as well as unary operators (for example, to reverse column elements)

45
Rectangular and Jagged Arrays A rectangular array is a multi-dimensioned array in which all of the rows have the same number of elements and all columns have the same number of elements myArray[3,7] A jagged matrix has rows with varying number of elements Possible when multi-dimensioned arrays actually appear as arrays of arrays myArray[3][7]

46
Slices A slice is some substructure of an array; nothing more than a referencing mechanism Slices are only useful in languages that have array operations

47
Slice Examples
  Fortran 95
    Integer, Dimension (10) :: Vector
    Integer, Dimension (3, 3) :: Mat
    Integer, Dimension (3, 3, 4) :: Cube
  Vector (3:6) is a four-element array

48
Slices Examples in Fortran 95

49
Implementation of Arrays
  Access function maps subscript expressions to an address in the array
  Access function for single-dimensioned arrays:
    address(list[k]) = address(list[lower_bound]) + ((k - lower_bound) * element_size)
  which can be rearranged so that the first term is a constant:
    (arrayBaseAddress - firstIndex × ElementSize) + k × ElementSize
  At Issue: There is an efficiency-vs-convenience tradeoff:
    Accesses to 0-relative arrays require two fewer operations:
      (arrayBaseAddress - 0 × ElementSize) + i × ElementSize = arrayBaseAddress + i × ElementSize
    Programmer-specified index values can be pretty convenient:
      type LetterCounter is array(CapitalLetter) of integer;
      type DailySales is array(WeekDay) of real;

50
Accessing Multi-dimensioned Arrays Two common ways: Row major order (by rows) – used in most languages column major order (by columns) – used in Fortran Efficiency issue: sequential memory accesses will be faster For each dimension of an array, one add and one multiply instruction are required for the access function.

51
Associative Arrays
  An associative array is an unordered collection of data elements that are indexed by an equal number of values called keys
    User-defined keys must be stored
  Design issues:
    What is the form of references to elements?

52
Associative Arrays in Perl
  Names begin with %; literals are delimited by parentheses
    %hi_temps = ("Mon" => 77, "Tue" => 79, "Wed" => 65, …);
  Subscripting is done using braces and keys
    $hi_temps{"Wed"} = 83;
  Elements can be removed with delete
    delete $hi_temps{"Tue"};
  Ideal when the data to be stored is paired, and not every element must be processed.

53
Type Systems
  A type system is a set of rules by which a language associates types with expressions.
  The system generates a type error when its rules do not permit a type to be associated with an expression.
  E.g.: Early Fortran versions had only integers and reals.
    Declarations not required: implicit typing of identifiers
      Identifiers beginning with I-N are integers; all others are reals.
      Literals with decimal points are real; others are integers.
    Type System Rule: If E1 and E2 are expressions of the same type T, then E1+E2, E1-E2, E1*E2, and E1/E2 produce a result of type T.
      I+N produces a value of type integer; X+Y produces a value of type real.
      Expressions like X+I (e.g., 0.5+1) or N-Y generate type errors.

54
Type System Formalism
  If f is a function from (S) → T, and s ∈ S, then f(s) ∈ T.
  Ada defines: +(int × int) → int and +(real × real) → real
  but neither +(real × int) → real nor +(int × real) → real
  so an int + int expression and a real + real expression are both valid;
  but neither a real + int nor an int + real expression is valid.
  Arithmetic expressions mixing reals and ints cause type errors.
  Ada's other arithmetic operators behave the same way.
  Why would Ada's designers choose such a type system?
    Ada is designed for building life-critical systems...
    Ada's type system is perhaps the most strict of any HLL.
    Ada compilers catch errors that slip by in other languages.

55
Coercion
  Ada is unusual in rejecting mixed-type arithmetic expressions; its goal is to prevent the unwanted loss of information.
  Most HLLs permit arithmetic types to be freely intermixed.
  To prevent information loss, such languages take an expression that mixes an int operand and a real operand,
  "expand" the "smaller" (int) operand to a real,
  and then perform the "larger" operation: +(real × real)
  The automatic conversion of an operand's type to prevent rejection by the type system is called a type coercion.
  Some languages describe this with the term promotion; others describe it as widening.

56
Overloading
  Formally: For any function f(D) → R:
    The set of all possible arguments (D) is the function's domain;
    The set of all possible results (R) is the function's range.
  An overloaded function is defined for more than one domain.
  +, -, *, / are overloaded in most HLLs
  Note: operators like +, -, *, ... are context-sensitive. In a + b:
    + means "perform integer addition" if a and b are integers;
    + means "perform real addition" if a and b are reals.
  Overloaded symbols have different meanings in different contexts.
  To process such operations, the compiler must check the context (operand types) and find a function whose domain matches.
  A type error occurs when no function definition has that domain.

57
Type Checking
  A type system enforces its rules by type checking:
    Analyzing the code, looking for type errors
    Only permitting programs without type errors to execute.
  A program with no type errors is described as type safe.
  Type checking is accomplished at two levels:
    1. Static checking: check for type errors at compile-time.
    2. Dynamic checking: check for type errors at run-time.
  Ada performs both static and dynamic checking, but the language is designed to maximize the number of errors that can be detected statically (i.e., by the compiler).

58
Static Checking Examples
  In C++ expressions of the form: x % y
    the compiler can look up the types of x and y (in a data structure called the symbol table) and reject the expression unless both are of an integer type.
  In C++ expressions of the form: sqrt(x)
    the symbol table contains both the type T of x and the domain-set D for which sqrt() is defined, allowing the compiler to reject the expression if T is not in D.
  Original C did not require that function prototypes contain parameter types, making it impossible for the compiler to type-check function calls (ANSI-C corrected this).

59
Dynamic Checking Examples
  Dynamic checking is checking for errors undetectable at compile-time by inserting checks before the code for an expression.
  Expression: x / y
    // without dynamic checks
      mov x, R0
      div R0, y
    // with dynamic checking
      mov x, R0
      mov y, R1
      cmp R1, #0
      be DivideByZero
      div R0, R1
  Expression: A[i]
    // without dynamic checks
      mov A, R0
      add R0, i
    // with dynamic checking
      mov A, R0
      mov i, R1
      cmp R1, firstIndex
      blt IndexTooLow
      cmp R1, lastIndex
      bgt IndexTooHigh
      add R0, R1
  Dynamic checking is time- and space-expensive...

60
Type Strength
  A language is strongly-typed if it has a strict type system.
  A language is weakly-typed if it has a loose type system.
  From this perspective, languages lie somewhere on a continuum, based on the strength of their type system:
    [diagram: a weaker-to-stronger axis showing Ada, C++, Smalltalk, Lisp, Java, and C (pre-ANSI); positions not recoverable]
  Language type systems have tended to get stronger as they evolve through different versions:
    Fortran-I, -II → Fortran-IV → Fortran-77 → Fortran-90
  The importance of type-strength has increased as the systems being built have increased in size, complexity, and importance.

61
Type Compatibility
  What determines if two types T1 and T2 are compatible (e.g., can T1 arguments be passed to T2 parameters)?
    typedef int IntArray[32];
    IntArray x, y;                  // Are x, y compatible?
    int x[32];
    void f(int y[32]);              // Are x, y compatible?
    struct Student  { int id; string name; };
    struct Employee { int id; string name; };
    Student stu; Employee emp;      // Are stu, emp compatible?
    struct Student  { int studentID; string studentName; };
    struct Employee { int empID; string empName; };
    Student stu; Employee emp;      // Are stu, emp compatible?

62
Equivalence
  Compatibility depends on whether a language views two types as equivalent.
  There are two broad categories of equivalence:
    Languages that use structural equivalence view two types as equivalent if they have the same memory structure.
    Languages that use name equivalence view two types as equivalent if they are declared using the same name.
  To illustrate, suppose that we have these declarations:
    struct Student  { int studentID; string studentName; };
    struct Employee { int empID; string empName; };
    Student stu; Employee emp;      // Are stu, emp equivalent?

63
Structural Equivalence (SE)
  Structural equivalence relies on three "rules":
  SE1: A type name is structurally equivalent to itself.
    Student stu1; Student stu2;
    Since their types have the same name, stu1 and stu2 are structurally equivalent.
  SE2: Two types formed by applying the same constructor to SE types are structurally equivalent.
    Student stu; Employee emp;
    Since both are members of (int × string), stu and emp are structurally equivalent.
  SE3: If one type is an alias of another, the two types are structurally equivalent.
    typedef Student Transfer;
    Student stu; Transfer trans;
    Since Transfer is an alias of Student, stu and trans are structurally equivalent.

64
Name Equivalence (NE)
  There are different varieties of name equivalence:
  Pure NE: To be equivalent, types must have the same name.
    -- Ada uses pure name equivalence
    type IntArray is array(1..32) of Integer;
    type IntList is array(1..32) of Integer;
    a1: IntArray;
    a2: IntList;
    a3: IntArray;
    a4: array(1..32) of Integer;
    a5: array(1..32) of Integer;
  Since a1 and a3 are declared with the same name, they are equivalent.
  a2's type has a name, but it is a different name from the others.
  a4 and a5's types have no name: anonymous types in Ada.
  If we declare:
    procedure Print(anArray : IntArray);
  then Ada's type system will only accept a1 and a3 as arguments.

65
Name Equivalence (ii)
  Transitive NE: A type name is equivalent to itself (pure NE), plus it can be declared equivalent to other type names.
    // C++ uses transitive name equivalence
    struct Student  { int idNumber; string name; };
    struct Employee { int idNumber; string name; };
    typedef Student Transfer;
    Student stu; Employee emp; Transfer trans;
  stu and trans are compatible; emp is not compatible with either.
  If we declare:
    void print(Student aStudent);
  then the type system will only accept stu or trans as arguments, but will reject emp as an argument.

66
Which is better?
  Consider type-checking on record arguments/parameters:
    Type-checking is much simpler under name equivalence, as the type-checker just has to do a single comparison (T1 == T2).
    Under structural equivalence, the type-checker must do an exhaustive field-by-field comparison (e.g., nested records??).
  Name equivalence encourages abstraction:
    NE encourages detail-hiding (ADT) by rejecting anonymous types:
      procedure Put(seq: Sequence);                   Any Sequence accepted.
      procedure Put(seq: array(1..32) of Integer);    Nothing accepted.
    SE discourages abstraction by accepting anonymous types.
  SE may permit programs to be written faster (abstraction takes time).
    Such programs may be harder to maintain; may be type-unsafe.
