© Kenneth C. Louden, 2003. Chapter 6 - Data Types. Programming Languages: Principles and Practice, 2nd Ed. Kenneth C. Louden.

1 © Kenneth C. Louden, 2003. Chapter 6 - Data Types. Programming Languages: Principles and Practice, 2nd Ed. Kenneth C. Louden

2 Introduction: The data type of an identifier is perhaps its most important attribute. When computed statically, it can improve these properties of a program:
– efficiency
– security
– correctness
– readability
When computed dynamically and attached to values rather than identifiers (à la Scheme), the data type can no longer provide most of these, but it can still provide:
– security

3 What is a data type? Some name like int, double, etc. that you have to write before a variable name. Just joking. But what do those names (int, double, etc.) really mean? They indicate which values can be stored in the location referred to by the name. Thus, the name int in Java stands for the set of integers, or more precisely for the finite subset of integers { x | x is an integer and -2147483648 ≤ x < 2147483648 }. So we could say: a data type is a set of values. Unfortunately, this ignores some basic properties.

4 What is a data type (2): Typically, we are interested not only in the actual values themselves, but also in what we can do with them, i.e. what operations we can apply to them. For example, +, -, *, /, % are operations on ints. The data type should provide exact information on what these operations are and how they act. Thus, a second, more complete definition is: a data type is a set of values, together with a set of operations on those values having certain properties. Data types are rarely specified with this kind of completeness; usually some assumptions are made about what operations are available and/or what they do.

5 How to define a data type?
– List all of its values (what are the operations here?):
    enum RGBColor {Red, Green, Blue}; // C++
    datatype RGBColor = Red|Green|Blue; (* ML *)
– Imitate mathematics (built-in types): double, int.
– Apply a type constructor to already existing types:
    type IntReal = int * real; (* ML built-in Cartesian product type constructor *)
    class IntReal {int x; double y;} // Java
– Type constructors can also imitate mathematical set operations (Cartesian product, union, sequence, function):
    union IntOrReal { int x; double y;}; // C++

6 Defining a data type (2): It is important to remember that type constructors and built-in types only imitate mathematics: there are always substantial differences (e.g. finiteness, and the kinds of operations that are available). It is also important to separate type constructors from value constructors: type constructors are functions from types to types; value constructors are functions from values to values. In Java, constructors are value constructors, while the class definition mechanism itself is a type constructor. Data types can be studied as mathematical objects: they are algebras. Algebraic ideas can be translated into language syntax as abstract data types (Chapter 9), sometimes called algebraic types.

7 Defining a data type (3): Many of the mechanisms for defining types fail to specify the precise operations available. Note that the Java class construct does (most of the time) specify the operations. Even when the operations are specified, the properties of these operations often are not. Sometimes the properties are put into comments (preconditions and postconditions). Types may or may not get explicit names:
– In Java, inner classes may be anonymous. Also, there is no way to give a specific array type a name except by wrapping it in a class definition.
– In ML, the built-in type constructors can be used without explicit naming.

8 Using data types in translation: Each type carries information about the size and structure of its data; this can be used to make code efficient. Types can be used to check whether the operations of the program make sense: type checking. During type checking, two basic issues arise:
– How to compare two types: type equivalence.
– How to construct a type that is not given explicitly: type inference.
Many different algorithms are available for both of these; collectively they are referred to as a type system. A language is strongly typed if its type system guarantees statically (as far as possible) that no data-corrupting errors can occur during execution.

9 Using data types (2): An additional property of a strongly typed language: all errors that cannot be checked statically (such as subscript out of bounds) generate runtime errors. Java and ML are strongly typed; Scheme and C are not. Unsafe programs: those with data errors. Legal programs: those accepted by the type system. In a strongly typed language all legal programs are safe (one could take this as a definition). Unfortunately, there may be many programs that are safe but illegal. A type system tries to maximize both flexibility and security, where flexibility means reducing the number of safe but illegal programs and reducing the amount of type information the programmer must supply.

10 Overview of Java types: Can be confusing because Java has two type systems, one static and one dynamic:
– The static system is based on the declared type.
– The dynamic system is based on actual class membership of objects during execution.
Built-in types in Java: boolean, byte, char, short, int, long, float, double.
Type constructors:
– Class
– Interface
– Array
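
The difference between the two systems can be seen in a few lines of Java. This is only a sketch: the class name StaticVsDynamicDemo and the helper dynamicTypeName are made up for illustration. The declared type decides what the compiler accepts; the object's class decides what a cast or getClass() reports at run time.

```java
class StaticVsDynamicDemo {
    // The declared (static) type of the parameter is Object; the dynamic
    // type is whatever class the object actually belongs to at run time.
    static String dynamicTypeName(Object o) {
        return o.getClass().getSimpleName();
    }

    public static void main(String[] args) {
        Object o = "hello";                       // static type: Object
        System.out.println(dynamicTypeName(o));   // dynamic type: String
        // o.length();  // rejected statically: Object has no length() method
        String s = (String) o;                    // the cast is checked dynamically
        System.out.println(s.length());
    }
}
```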

11 Simple types: No internal structure; sometimes called scalar types. Most predefined types are simple, but not all (java.lang types are classes, so not simple). There can also be user-defined simple types (e.g. enum, subrange, but none in Java). Many simple types are ordinal types (having an order with a first and last element): int and char are ordinal in Java, but boolean is not. Values of simple types are usually conditioned by hardware (Java tries for independence). Operations are implicit and may or may not conform to mathematical rules.
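
A short Java sketch of where the built-in int departs from the mathematical integers. The class and method names are illustrative only; the behavior shown (wraparound on overflow, division truncating toward zero) is what Java defines for int.

```java
class IntArithmeticDemo {
    static int successor(int x) { return x + 1; }   // wraps around at the top of the range
    static int half(int x) { return x / 2; }        // truncates toward zero

    public static void main(String[] args) {
        System.out.println(Integer.MIN_VALUE);             // -2147483648
        System.out.println(Integer.MAX_VALUE);             // 2147483647
        System.out.println(successor(Integer.MAX_VALUE));  // -2147483648
        System.out.println(half(7));                       // 3
        System.out.println(half(-7));                      // -3
        System.out.println(-7 % 2);                        // -1
    }
}
```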

12 Type constructors as set operations:
– Cartesian products
– Unions
– Subsets
– Arrays/functions
– Sequences/lists
– Recursive types
No intersection! (In general, types should not overlap.)

13 Cartesian products: Finite combinations of previously defined types. In mathematics, the components are selected by position. In most languages, the components are selected by name. ML has a very pure form of Cartesian product:
– ("a",2): string * int
– #1 ("a",2) returns "a".
Java classes are related but certainly not identical to Cartesian products: components are selected by name, and there are methods. C structs are closer than Java classes.

14 Unions: Values belong to one of a finite set of types. In mathematics, sets can overlap, so a value could be in more than one set. As noted previously, we don't really want values to be in different sets (non-empty intersection). Thus, unions are usually disjoint: a value can only be in one set at a time. This is true for C/C++; the following code prints garbage:
    union IntOrReal { int x; double y;} u;
    u.x = 1;
    cout << u.y << endl;
Note that in this code we force the compiler to think that a value is in the wrong set. This is dangerous and part of why C++ is not strongly typed.

15 Unions (2): Unions can also be discriminated, when the values are tagged with their set membership. If enforced, this makes unions type safe and would outlaw the previous C++ code. Thus, C/C++ unions are not discriminated. ML has discriminated unions:
    datatype IntOrReal = IsInt of int | IsReal of real;
The constructor names IsInt, IsReal are the tags/discriminants:
    > IsInt 2;
    val it = IsInt 2 : IntOrReal
In C++ we can apply a tag manually:
    struct IntOrReal { bool isInt; union {int x; double y;};};
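
What "enforced" could look like can be sketched in Java, the language used for all added examples here. This is an illustration rather than a translation of the C++ struct: the factory and accessor names are hypothetical, and unlike a C union the two fields do not share storage. The point is only that every read goes through a check of the tag.

```java
class IntOrReal {
    private final boolean isInt;   // the tag (discriminant)
    private final int x;
    private final double y;

    private IntOrReal(boolean isInt, int x, double y) {
        this.isInt = isInt;
        this.x = x;
        this.y = y;
    }

    static IntOrReal ofInt(int x) { return new IntOrReal(true, x, 0.0); }
    static IntOrReal ofReal(double y) { return new IntOrReal(false, 0, y); }

    boolean isInt() { return isInt; }

    int asInt() {
        if (!isInt) throw new IllegalStateException("value is a real, not an int");
        return x;
    }

    double asReal() {
        if (isInt) throw new IllegalStateException("value is an int, not a real");
        return y;
    }

    public static void main(String[] args) {
        IntOrReal u = IntOrReal.ofInt(1);
        System.out.println(u.asInt());   // 1
        try {
            u.asReal();                  // the analogue of reading u.y after writing u.x
        } catch (IllegalStateException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```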

16 Unions (3): Java doesn't have explicit unions. But does it have unions at all? Yes! Consider:
    public abstract class A {…};
    public class B extends A {…};
    public class C extends A {…};
Now A represents the union of B and C! Are these disjoint? Yes! Are they discriminated? Yes again! (Consider the instanceof operator.)
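
A minimal sketch of the instanceof point. The fields x and y and the method describe are invented for the example, and the classes are nested only to keep it in one file. Note that nothing here stops further subclasses of A from being added later, so the "union" is open-ended, hence the final fallback case.

```java
class UnionByInheritanceDemo {
    static abstract class A { }
    static class B extends A { final int x; B(int x) { this.x = x; } }
    static class C extends A { final double y; C(double y) { this.y = y; } }

    // instanceof reads the tag that every object carries: its run-time class.
    static String describe(A a) {
        if (a instanceof B) return "B holding " + ((B) a).x;
        if (a instanceof C) return "C holding " + ((C) a).y;
        return "some other subclass of A";
    }

    public static void main(String[] args) {
        System.out.println(describe(new B(2)));     // B holding 2
        System.out.println(describe(new C(3.5)));   // C holding 3.5
    }
}
```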

17 Subsets: A subset type may be an explicit subrange indication for values, either as a runtime check or as a separate type, as in Ada:
    -- runtime check that an int is between 0 and 9:
    subtype Digit1 is integer range 0..9;
    -- new type holding an integer between 0 and 9:
    type Digit2 is range 0..9;
It could also be a subtype: a type that implements all the operations of another type. Note that subsets are not always subtypes and subtypes are not always mathematical subsets. (Digit1 above is not really even a static type, despite the keyword.) C enums are subranges of int (no runtime check). Public inheritance is a form of subtyping.

18 Arrays and functions: An array in C or Java is like a function from a finite subrange 0..n-1 of the integers. In Java you can't give an explicit name to an array type, but you can in C/C++:
    typedef int IntArray[10];
More general function types are available in many languages (but not in Java):
    type IntFunc = int -> int; (* ML *)
    val inc:IntFunc = fn x => x+1; (* ML *)
    // a function constant in C/C++:
    int incfn(int x) { return x+1; }
    typedef int (*IntFunc)(int); // why *?
    // a function var, initialized to incfn:
    IntFunc inc = incfn;

19 Vectors, lists and sequences: Some languages also have vectors, which are like arrays, but often with more flexibility, especially dynamic resizability. Lists are similar to vectors, except they can only be accessed by counting down from the first element. Thus, the list type is really a recursive type:
    datatype 'a List = EmptyList | Cons of 'a * 'a List; (* ML *)
All functional languages that I am aware of have built-in lists. Sequences are also like arrays, except that they are typically (potentially) infinite. In this guise they are called streams. Many functional languages have built-in streams (Scheme but not ML).

20 Recursive types: A recursive type is a set that contains itself as an element, right? Wrong! Sets cannot in general contain themselves as elements (Russell's paradox: the set of all sets that do not contain themselves as elements). A recursive type would be better named a recursively defined type. Indeed, in math there are many sets that are recursively defined: the set of arithmetic expressions (recursive grammar), the set of integers, and in fact any set that is defined inductively (e.g., if x is an integer, then so is x+1).

21 Recursive types (2): The problem with a recursive type is that it is generally infinite, and values in the type can be arbitrarily large. As with virtually all situations with elements of unpredictable size (lists, arrays, calls), languages use indirection, or pointers, to deal with them. Many languages require the indirection to be explicit in a recursive type definition:
    struct IntList { int head; IntList* tail; }; // C++ pointer
Java of course has implicit object indirection:
    class IntList { int head; IntList tail; } // Java

22 Recursive types (3): Every recursively defined type must (like induction) have a base case and at least one recursive (or inductive) case. In many languages, particularly those with explicit indirection, the base case is implicitly the null pointer:
    IntList* list = 0; // C/C++
    IntList list = null; // Java
In other languages, the base case must be explicitly represented in the recursive definition:
    datatype IntList = Null | Struct of int * IntList; (* ML *)
The ML definition is much closer to the math definition of IntList as a set: IntList = {Null} ∪ Int × IntList
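
To make the role of the null base case concrete, here is a small Java sketch built on the IntList class from the slides. The constructor and the length and sum helpers are additions for illustration; each function handles the base case (null) first and then the recursive case.

```java
class IntList {
    final int head;
    final IntList tail;   // null plays the role of the base case

    IntList(int head, IntList tail) {
        this.head = head;
        this.tail = tail;
    }

    static int length(IntList list) {
        return list == null ? 0 : 1 + length(list.tail);
    }

    static int sum(IntList list) {
        return list == null ? 0 : list.head + sum(list.tail);
    }

    public static void main(String[] args) {
        IntList empty = null;
        IntList list = new IntList(1, new IntList(2, new IntList(3, null)));
        System.out.println(length(empty));  // 0
        System.out.println(length(list));   // 3
        System.out.println(sum(list));      // 6
    }
}
```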

23 Mathematics of recursive sets: The actual values in a recursively defined set must be computed from a recursive equation such as IntList = {Null} ∪ Int × IntList. This equation says that IntList is a fixed point (or fixpoint) of the function f(X) = {Null} ∪ Int × X. A least fixpoint solution is found as the union of partial solutions: IntList = {Null} ∪ Int × {Null} ∪ Int × Int × {Null} ∪ … Least fixpoint solutions occur also for recursive functions (we did not study this).
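
One common way to read "union of partial solutions" is as repeated application of f starting from the empty set. The iterates X_n below are notation introduced here, not taken from the slides, and nested pairs such as Int × (Int × {Null}) are written flat, as the slide does.

```latex
\begin{align*}
f(X) &= \{\mathrm{Null}\} \cup (\mathrm{Int} \times X) \\
X_0 &= \emptyset \\
X_1 &= f(X_0) = \{\mathrm{Null}\} \\
X_2 &= f(X_1) = \{\mathrm{Null}\} \cup (\mathrm{Int} \times \{\mathrm{Null}\}) \\
X_3 &= f(X_2) = \{\mathrm{Null}\} \cup (\mathrm{Int} \times \{\mathrm{Null}\})
        \cup (\mathrm{Int} \times \mathrm{Int} \times \{\mathrm{Null}\}) \\
\mathrm{IntList} &= \bigcup_{n \ge 0} X_n
\end{align*}
```

Each X_n contains exactly the lists with fewer than n elements, so the union contains every finite list and nothing else.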

24 Type structure of Java (figure not included in this transcript)

25 Type structure of C (figure not included in this transcript)

26 Type equivalence: Languages differ substantially over when their type checking algorithms consider two types to be the same. Historically, languages like Fortran and Algol used structural equivalence: two types are the same if they have the same structure. Thus, using Java syntax, if we define
    class A { int x; double y;}
    class B { int x; double y;}
then the sets that A and B represent are the same, and under structural equivalence A a = new B() would be accepted. Obviously, this is not Java's rule, nor is it C's. Structural equivalence is also difficult to verify for recursive types.

27 Type equivalence (2): Structural equivalence is reasonable for some built-in type constructors, especially non-recursive ones. For example, C uses structural equivalence for pointers, arrays, and functions. Even Java uses structural equivalence for arrays. ML also uses structural equivalence for types defined in a type declaration:
    type dollars = real;
    type cents = int;
    fun pennies (d:dollars):cents = round (d * 100.0);
    pennies (2.0:real); (* ok *)
Structural equivalence leaves unspecified whether the order in a structure matters, or the field names, or both (the usual choice).

28 Type equivalence (3): A strict alternative to structural equivalence is name equivalence: two types are the same if and only if they have the same name. Easy to implement, but depends on the ability to name. Without names, structural equivalence must be used. Java uses name equivalence for classes and interfaces, structural equivalence for arrays. ML uses name equivalence for types declared in a datatype declaration (which may be recursive):
    datatype Dollars = Dollars of real;
    (* now Dollars 2.0 is not the same as 2.0 *)
If naming can be mixed with construction, intermediate algorithms can be used.
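
Java's two rules can be observed side by side in a short sketch. The classes A and B mirror the earlier structural-equivalence example; the class EquivalenceDemo and the helpers isA and reassign are hypothetical names. The rejected assignment is left as a comment because it does not compile.

```java
class EquivalenceDemo {
    static class A { int x; double y; }
    static class B { int x; double y; }   // same structure as A, different name

    // Run-time counterpart of the static rule: an object is an A only if its
    // class is A (or a subclass); having the same fields is not enough.
    static boolean isA(Object o) { return o instanceof A; }

    // Array types are compared by element type only; the length is not part
    // of the type, so an int[5] can be assigned where an int[10] was stored.
    static int[] reassign(int[] other) {
        int[] result = new int[10];
        result = other;
        return result;
    }

    public static void main(String[] args) {
        // A a = new B();   // compile-time error: incompatible types
        System.out.println(isA(new A()));                   // true
        System.out.println(isA(new B()));                   // false
        System.out.println(reassign(new int[5]).length);    // 5
    }
}
```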

29 Type equivalence (4): Example in C: applying struct constructs a new type, applying typedef doesn't:
    struct A { char x; int y; };
    struct B { char x; int y; };
    typedef struct A C;
    typedef C* P;
    typedef struct A * R;
    typedef int S[10];
    typedef int T[5];
    typedef int Age;
    typedef int (*F)(int);
    typedef Age (*G)(Age);
Types struct A and C are equivalent, but they are not equivalent to struct B; types P and R are equivalent; types S and T are treated as equivalent wherever the array size is ignored (for example as parameter types, where both become int*); types int and Age are equivalent, as are function types F and G.

30 Type checking: Determining whether code uses legitimate operations according to its types. Involves both type inference and equivalence. Also involves applying often complex rules for relaxing exact type matching under certain circumstances, usually called type compatibility rules. Assignment compatibility refers to the compatibility rules governing assignments. Simple example: x = y / 2 + 3.5. Clearly x and y must be numeric. Can x be an int? Can x be a float? Can y be a long? What are the (implicit) types of the literals 2 and 3.5?

31 Type checking (2): Type checking using compatibility rules involves type conversion from one type to another, compatible type (see later slides). Type checking of assignments involves verifying that the left-hand side has a computable address, called an l-value, and that the value of the right-hand side (called an r-value) is capable of being stored at that address, or capable of conversion to a value that can be. OO languages have further assignment compatibility rules: assignment of a subclass object to a superclass variable is allowed; assignment of a superclass object to a subclass variable without a cast is not.

32 Type checking (3): Back to the previous example: x = y / 2 + 3.5; Suppose the type of x is float and the type of y is long. Does this statement type check? First determine the type of y / 2: 2 is implicitly an int. Then, since y is a long, 2 is converted automatically ("promoted") to a long, and the type of y / 2 is long. Now determine the type of the sum: the left operand is a long and the right operand is a double (implicitly). By the rules of Java, a long can be promoted to a double, so the result is a double. However, a double cannot be assigned to a float, so a type error occurs at that point.
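
The walk-through can be checked directly in Java. In this sketch (class and method names are illustrative) the rejected assignment is kept as a comment, since it does not compile; an explicit cast makes it legal.

```java
class MixedArithmeticDemo {
    static double evaluate(long y) {
        // y / 2     : long / int    -> 2 is promoted to long, result is long
        // ... + 3.5 : long + double -> the long is promoted, result is double
        return y / 2 + 3.5;
    }

    public static void main(String[] args) {
        long y = 7;
        // float x = y / 2 + 3.5;         // compile-time error: double to float may lose precision
        float x = (float) (y / 2 + 3.5);  // accepted with an explicit cast
        System.out.println(evaluate(y));  // 6.5
        System.out.println(x);            // 6.5
    }
}
```

Note that 7 / 2 is computed as a long division and yields 3, not 3.5, before the double addition takes place.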

33 Type conversion: Type conversion can be classified two ways:
– Does the conversion require written code?
– Does the internal representation change, or just the type?
The first classification has two categories:
– automatic or implicit conversion (no code)
– manual or explicit conversion (code must be written)
The second classification also has two categories:
– The value representation in memory changes
– The value representation in memory doesn't change, just the type
All four combinations of these can occur.

34 Type conversion (2): Implicit conversions in Java include numerical promotion and upcasting. In general, this may involve either a representation change (e.g. int to double), or simply changing the perceived type without a representation change (e.g. upcasting). Explicit conversions in Java also may or may not involve representation changes:
– Casts between object types (downcasts) do not involve bit changes, but numeric casts do change the representation.
– Applying a conversion function, e.g. Math.round, changes the representation.
Some languages outlaw automatic conversions and use conversion functions only (ML, Ada).
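
The combinations can be lined up in one Java sketch. The method names are invented to label each case; reference equality (==) is used to show that upcasts and downcasts leave the object itself untouched.

```java
class ConversionDemo {
    static double implicitWiden(int i) { return i; }                  // implicit, representation changes
    static Object implicitUpcast(String s) { return s; }              // implicit, same object
    static int explicitNarrow(double d) { return (int) d; }           // explicit cast, representation changes
    static String explicitDowncast(Object o) { return (String) o; }   // explicit cast, same object
    static long viaFunction(double d) { return Math.round(d); }       // conversion function

    public static void main(String[] args) {
        String s = "abc";
        System.out.println(implicitWiden(3));                           // 3.0
        System.out.println(implicitUpcast(s) == s);                     // true
        System.out.println(explicitNarrow(3.9));                        // 3
        System.out.println(explicitDowncast(implicitUpcast(s)) == s);   // true
        System.out.println(viaFunction(3.5));                           // 4
    }
}
```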

35 Polymorphic type checking: Hindley-Milner style uses type variables ('a, 'b, etc. in ML) and a process called unification (a version of pattern matching):
– Any type variable unifies with any type expression (and is instantiated to, i.e. identified with, that expression).
– Any two type constants (i.e., literals like int or double) unify only if they are the same type.
– Any two type constructions (i.e., applications of type constructors) unify if and only if they are applications of the same type constructor and all of their component types also (recursively) unify.
The type of an identifier is the most general type that can result from the application of type unification to its definition (sometimes called a most general unifier, or mgu).

36 Polymorphic type checking (2): Every use of an identifier that is polymorphically typed must involve a type that is a specialization of its most general type: a more restricted form that is compatible with the general type. For example, int -> int is a specialization of 'a -> 'a, and int -> real is a specialization of 'a -> 'b, but int -> real is not a specialization of 'a -> 'a. In an H-M type system, there is a further restriction on the use of polymorphic types, in that each use of a polymorphic argument in a function call must specialize to the same type (so-called let-bound polymorphism). Example: f (x, y, g) = (g x, g y) has ML type 'a * 'a * ('a -> 'b) -> 'b * 'b

37 Polymorphic type checking (3): Further problem in H-M type checking: what if the same type variable occurs in two different places in two type expressions that are to be unified? An infinite regress can occur! Consider the ML definition: fun f g = g f; The ML type checker assigns f the type 'a->'b and g the type 'a. Then the right-hand side says type 'a is actually a function type 'c->'b, and since f is the argument to g, 'c = 'a->'b. Thus 'a = 'c->'b = ('a->'b)->'b. But what is 'a? Trying to solve this equation for 'a leads to an infinite process. To prevent this, unification must implement the occur check to make sure it does not try to unify a type variable 'a with a type expression that contains 'a.
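
The unification rules and the occur check fit in a small program. The following is a sketch in Java, not the algorithm of any particular ML implementation: the representation (Var for type variables, Con for constructor applications, with type constants as constructors of no arguments), the bindings map, and all names are assumptions made for this example. When unify returns false, bindings made before the failure are left in place, so a fresh Unifier is used for each attempt.

```java
import java.util.HashMap;
import java.util.Map;

class Unifier {
    // A type is either a variable ('a) or a constructor applied to argument
    // types; type constants such as int are constructors with no arguments.
    static abstract class Type { }
    static final class Var extends Type {
        final String name;
        Var(String name) { this.name = name; }
    }
    static final class Con extends Type {
        final String name;
        final Type[] args;
        Con(String name, Type... args) { this.name = name; this.args = args; }
    }

    static Type fn(Type from, Type to) { return new Con("->", from, to); }

    final Map<String, Type> bindings = new HashMap<>();

    // Follow variable bindings until reaching an unbound variable or a constructor.
    Type resolve(Type t) {
        while (t instanceof Var && bindings.containsKey(((Var) t).name)) {
            t = bindings.get(((Var) t).name);
        }
        return t;
    }

    // Occur check: does variable v appear anywhere inside type t?
    boolean occurs(Var v, Type t) {
        t = resolve(t);
        if (t instanceof Var) return ((Var) t).name.equals(v.name);
        for (Type arg : ((Con) t).args) {
            if (occurs(v, arg)) return true;
        }
        return false;
    }

    // Returns true and extends the bindings if s and t unify, false otherwise.
    boolean unify(Type s, Type t) {
        s = resolve(s);
        t = resolve(t);
        if (s instanceof Var) {
            Var v = (Var) s;
            if (t instanceof Var && ((Var) t).name.equals(v.name)) return true;
            if (occurs(v, t)) return false;   // would create an infinite type
            bindings.put(v.name, t);
            return true;
        }
        if (t instanceof Var) return unify(t, s);
        Con a = (Con) s, b = (Con) t;
        if (!a.name.equals(b.name) || a.args.length != b.args.length) return false;
        for (int i = 0; i < a.args.length; i++) {
            if (!unify(a.args[i], b.args[i])) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        Type a = new Var("a"), b = new Var("b");
        Type intT = new Con("int"), realT = new Con("real");
        // 'a -> 'b unifies with int -> real
        System.out.println(new Unifier().unify(fn(a, b), fn(intT, realT)));   // true
        // 'a -> 'a does not unify with int -> real
        System.out.println(new Unifier().unify(fn(a, a), fn(intT, realT)));   // false
        // 'a against ('a -> 'b) -> 'b, the equation from fun f g = g f
        System.out.println(new Unifier().unify(a, fn(fn(a, b), b)));           // false
    }
}
```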

38 Extended H-M example: fun max (x,y,gt) = if gt(x,y) then x else y; Possible syntax tree with annotated type variables, using Greek letters and C-style notation for functions (tree figure not included in this transcript).

39 Extended H-M example (2): Add the type information gathered by the type checker at the call node (tree figure not included in this transcript).

40 Extended H-M example (3): Add the type information gathered by the type checker at the if node (tree figure not included in this transcript).

41 Extended H-M example (4): Finally, at the level of the root, unify the return type of the max function with the type of the body (the type of the if node), and the result is the most general type of the max function: α (*)(α, α, bool (*)(α, α)) Or, in ML notation: 'a * 'a * ('a * 'a -> bool) -> 'a
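
For comparison, the same shape of type can be written out by hand with Java generics. This is only an analogy: Java does not infer the signature as an H-M checker does, the programmer declares it, and the interface Greater is a hypothetical stand-in for the ML function type 'a * 'a -> bool.

```java
class MaxDemo {
    // Hypothetical stand-in for the ML function type 'a * 'a -> bool.
    interface Greater<T> {
        boolean test(T x, T y);
    }

    // Counterpart of: 'a * 'a * ('a * 'a -> bool) -> 'a
    static <T> T max(T x, T y, Greater<T> gt) {
        return gt.test(x, y) ? x : y;
    }

    public static void main(String[] args) {
        Greater<Integer> intGt = new Greater<Integer>() {
            public boolean test(Integer x, Integer y) { return x > y; }
        };
        Greater<String> longer = new Greater<String>() {
            public boolean test(String x, String y) { return x.length() > y.length(); }
        };
        System.out.println(max(3, 5, intGt));            // 5
        System.out.println(max("pear", "fig", longer));  // pear
    }
}
```

Each call specializes T to a single type (Integer, then String), just as each use of max in ML specializes 'a.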
