Chapter 6 Exploring Types and Equivalence

Prof. Steven A. Demurjian Computer Science & Engineering Department The University of Connecticut 371 Fairfield Way, Box U-255 Storrs, CT (860) 486–4818 (Office) (860) (CSE Office)

Review a Number of Other PPTs
- Org of Programming Languages, Cheng (Fall 2004): Sebesta chapter 6, condensed (easier to follow)
- Type Systems and Structures, Bermúdez
- Steve’s Types and Type Checking
- Type Systems, Doupé

CSE 452: Programming Languages: Data Types

High-level Programming Languages
Where are we? [Figure: language levels (Machine Language, Assembly Language, High-level Programming Languages: Imperative, Object Oriented, Functional, Logic); concepts: specification (syntax, semantics), variables (binding, scoping, types, …), statements (control, selection, assignment, …); implementation: compilation (lexical & syntax analysis); marked "You are here".]

Terminology
- Strong typing: the language prevents you from applying an operation to data on which it is not appropriate.
- Static typing: the compiler can do all the checking at compile time.
- Examples: Common Lisp is strongly typed, but not statically typed. Ada is statically typed. Pascal is almost statically typed. Java is strongly typed, with a non-trivial mix of things that can be checked statically and things that have to be checked dynamically.
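Python is not on the slide, but it is a convenient contrast: it is strongly typed like Common Lisp, yet not statically typed. A minimal sketch: the inappropriate operation is rejected, but only when it actually runs. The helper names are illustrative.

```python
def add(x, y):
    # No declared types: nothing is checked until the + actually executes.
    return x + y

def type_error_message(x, y):
    """Return the TypeError message for x + y, or None if the operation is allowed."""
    try:
        add(x, y)
    except TypeError as exc:
        return str(exc)
    return None

print(add(2, 3))                        # 5
print(type_error_message(2, "three"))   # unsupported operand type(s) for +: 'int' and 'str'
print(type_error_message(1, 2.5))       # None: int + float is an allowed mix
```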

Type System
Has rules for:
- Type equivalence (when are the types of two values the same?)
- Type compatibility (when can a value of type A be used in a context that expects type B?)
- Type inference (what is the type of an expression, given the types of the operands?)

Type compatibility/equivalence
- Compatibility: tells you what you can do
  - The more useful concept of the two
  - The two terms are often, erroneously, used interchangeably
- Equivalence: what are the important differences between type declarations?
  - Format does not matter: struct { int a, b; } is the same as struct { int a; int b; }

Equivalence: two approaches
Two types: name and structural equivalence
- Name equivalence: based on declarations
  - More commonly used in current practice
  - Strict name equivalence: types are equivalent only if they refer to the same declaration
  - Loose name equivalence: types are equivalent if they refer to the same outermost constructor (i.e., the same declaration after factoring out any type aliases)
- Structural equivalence: based on the meaning/semantics behind the declarations
  - Simple comparison of type descriptions
  - Substitute out all names; expand all the way to built-in types

Data Types
A data type defines a collection of data objects and a set of predefined operations on the objects
- e.g., type: integer; operations: +, -, *, /, %, ^
Evolution of data types
- Early days: all programming problems had to be modeled using only a few data types
  - FORTRAN I (1957) provides INTEGER, REAL, arrays
- Current practice: users can define abstract data types (representation + operations)

Character Types
Characters are stored in computers as numeric codings
- Traditionally, the 8-bit code ASCII, which uses the values 0 to 127 to code 128 different characters
- ISO 8859-1 is also an 8-bit character code, but allows 256 different characters
  - Used by Ada
- A 16-bit character set named Unicode
  - Includes the Cyrillic alphabet used in Serbia, and Thai digits
  - The first 128 characters are identical to ASCII
  - Used by Java and C#
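A quick way to see the numeric codings is to encode a few characters with Python's standard codecs (`ascii`, `latin-1` for ISO 8859-1, `utf-16-be` for 16-bit Unicode code units). This is only an illustration; the characters chosen are arbitrary.

```python
print(ord("A"))                   # 65: ASCII code for 'A'
print("A".encode("ascii"))        # b'A': one byte
print("é".encode("latin-1"))      # b'\xe9': ISO 8859-1 also uses codes 128..255
print("Ж".encode("utf-16-be"))    # b'\x04\x16': Cyrillic ZHE, U+0416, one 16-bit unit

# The first 128 Unicode code points coincide with ASCII
same_as_ascii = all(chr(i).encode("ascii") == chr(i).encode("utf-8") for i in range(128))
print(same_as_ascii)              # True
```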

Character String Types
Values consist of sequences of characters
Design issues:
- Is it a primitive type or just a special kind of character array?
- Is the length of objects static or dynamic?
Operations: assignment, comparison (=, >, etc.), catenation, substring reference, pattern matching
Examples:
- Pascal: not primitive; assignment and comparison only
- Fortran 90: somewhat primitive; operations include assignment, comparison, catenation, substring reference, and pattern matching

Character Strings Examples
- Ada: N := N1 & N2 (catenation); N(2..4) (substring reference)
- C and C++: not primitive; use char arrays and a library of functions that provide operations
- SNOBOL4 (a string manipulation language): primitive; many operations, including elaborate pattern matching
- Perl and JavaScript: patterns are defined in terms of regular expressions; a very powerful facility
- Java: String class (not arrays of char); String objects are immutable; StringBuffer is a class for changeable string objects
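For comparison, the same string operations in Python, whose `str` is a primitive, immutable type much like Java's `String`. A short sketch; note the index translation, since Ada substrings are 1-based and inclusive while Python slices are 0-based and end-exclusive.

```python
import re

n1, n2 = "Hello, ", "world"
n = n1 + n2                      # catenation, like Ada's N := N1 & N2
sub = n[1:4]                     # Ada N(2..4) becomes n[1:4] in Python
print(n, "|", sub)               # Hello, world | ell
print("apple" < "banana")        # lexicographic comparison: True

found = re.search(r"w\w+", n)    # pattern matching with a regular expression
print(found.group())             # world

try:
    n[0] = "J"                   # str, like Java's String, is immutable
except TypeError:
    print("strings are immutable")

parts = ["J", n[1:]]             # build a new string instead of mutating
print("".join(parts))            # Jello, world
```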

Character Strings
String length
- Static: FORTRAN 77, Ada, COBOL
  - e.g. (FORTRAN 90) CHARACTER (LEN = 15) NAME
- Limited dynamic length: C and C++
  - The actual length is indicated by a null character
- Dynamic: SNOBOL4, Perl, JavaScript
Evaluation (of character string types)
- Aid to writability
- As a primitive type with static length, they are inexpensive to provide
- Dynamic length is nice, but is it worth the expense?
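C's "limited dynamic length" can be modelled with a fixed-size byte buffer: storage is fixed at declaration, while the current length is found by scanning for the terminating null byte. This is a hypothetical sketch of the idea (`store` and `strlen` here are stand-ins, not the C library), assuming ASCII text.

```python
CAPACITY = 15   # like the C declaration  char name[15];

def store(text: str) -> bytearray:
    """Copy text into a fixed-size buffer and terminate it with a NUL byte."""
    data = text.encode("ascii")
    if len(data) >= CAPACITY:            # one byte must remain for the terminator
        raise ValueError("string does not fit")
    buf = bytearray(CAPACITY)            # every byte starts as 0 (NUL)
    buf[:len(data)] = data
    return buf

def strlen(buf: bytearray) -> int:
    """Actual length: count the bytes before the first NUL, as C's strlen does."""
    return buf.index(0)

name = store("Smith")
print(len(name), strlen(name))           # 15 5: fixed storage, shorter current length
```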

Ordinal Data Types
The range of possible values can be easily associated with the set of positive integers
Enumeration types
- The user enumerates all the possible values, which are symbolic constants
  enum days {Mon, Tue, Wed, Thu, Fri, Sat, Sun};
- Design issues:
  - Should a symbolic constant be allowed to be in more than one type definition?
  - Type checking: are enumerated types coerced to integer?
  - Are any other types coerced to an enumerated type?

Enumeration Data Types
Examples
- Pascal: constants cannot be reused; values can be used for array subscripts, for variables, and as case selectors; they can be compared
- Ada: constants can be reused (overloaded literals); disambiguate with context or with a qualified expression, type_name'(literal)
- C and C++: enumeration values are coerced into integers when put in an integer context
- Java: before Java 5.0 there was no enumeration type, only the Enumeration interface; enumerations could be implemented as classes:
  class colors { public static final int red = 0; public static final int blue = 1; }
  Java 5.0 added enum types.
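Python's standard `enum` module happens to offer both design choices from the slide: `Enum` members are not coerced to integers (closer to Pascal and Ada), while `IntEnum` members behave as integers (closer to C and C++). A minimal sketch with illustrative names:

```python
from enum import Enum, IntEnum

class Day(Enum):          # not coerced to integer
    MON = 0
    TUE = 1
    WED = 2

class CDay(IntEnum):      # usable wherever an int is expected, as in C
    MON = 0
    TUE = 1
    WED = 2

def try_add_one(value):
    try:
        return value + 1
    except TypeError:
        return "TypeError"

print(try_add_one(Day.TUE))    # TypeError
print(try_add_one(CDay.TUE))   # 2

# Like a Pascal array indexed by an enumeration type:
#   var hours: array[Day] of integer;
hours = {day: 0 for day in Day}
hours[Day.WED] = 8
```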

Subrange Data Types
An ordered contiguous subsequence of an ordinal type
- e.g., 0..MAXINT is a subrange of the integer type
Design issue: how can they be used?
Examples:
- Pascal: subrange types behave as their parent types; they can be used for variables and array indices
  type pos = 0 .. MAXINT;
- Ada: subtypes are not new types, just constrained existing types (so they are compatible); they can be used as in Pascal, plus as case constants
  subtype POS_TYPE is INTEGER range 0 .. INTEGER'LAST;
Evaluation: aid to readability; restricted ranges add error detection
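Python has no subrange types, but the behaviour of an Ada subtype can be sketched with a hypothetical `Subrange` helper: values stay plain integers (compatible with the parent type), and the range constraint is checked only when a value is assigned. `sys.maxsize` is an assumed stand-in for `INTEGER'LAST`.

```python
import sys

class Subrange:
    """Hypothetical helper: an int-valued type constrained to lo..hi (inclusive)."""

    def __init__(self, lo: int, hi: int):
        self.lo, self.hi = lo, hi

    def check(self, value: int) -> int:
        # The run-time range check an Ada compiler would insert on assignment.
        if not self.lo <= value <= self.hi:
            raise ValueError(f"{value} is outside {self.lo}..{self.hi}")
        return value

POS_TYPE = Subrange(0, sys.maxsize)   # stands in for 0 .. INTEGER'LAST

x = POS_TYPE.check(42)        # fine; x is an ordinary int
y = x + 1                     # compatible with the parent type
try:
    POS_TYPE.check(-1)        # Ada would raise Constraint_Error here
except ValueError as err:
    print(err)
```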

Arrays
Indexing is a mapping from indices to elements
  map(array_name, index_value_list) → an element
Index syntax
- FORTRAN, PL/I, Ada use parentheses: A(3)
- Most other languages use brackets: A[3]
Subscript types
- FORTRAN, C: integer only
- Pascal: any ordinal type (integer, boolean, char, enum)
- Ada: integer or enum (includes boolean and char)
- Java: integer types only

Arrays
Five categories of arrays (based on subscript binding and binding to storage)
- Static
- Fixed stack-dynamic
- Stack-dynamic
- Fixed heap-dynamic
- Heap-dynamic

Arrays
Static: range of subscripts and storage bindings are static
- e.g., FORTRAN 77, some arrays in Ada
- Arrays declared in C and C++ functions that include the static modifier are static
- Advantage: execution efficiency (no allocation or deallocation)
Fixed stack-dynamic: range of subscripts is statically bound, but storage is bound at elaboration time
- Elaboration time: when execution reaches the code to which the declaration is attached
- e.g., most Java locals, and C locals that are not static
- Advantage: space efficiency

Arrays
Stack-dynamic: range and storage are dynamic, but fixed from then on for the variable’s lifetime
- e.g., Ada declare blocks:
  declare STUFF : array (1..N) of FLOAT; begin ... end;
- Advantage: flexibility; the size need not be known until the array is about to be used

Arrays
Fixed heap-dynamic
- Binding of subscript ranges and storage are dynamic, but both are fixed after storage is allocated
- Binding is done when the user program requests it, rather than at elaboration time, and storage is allocated on the heap rather than the stack
- In Java, all arrays are objects (heap-dynamic)
- C# also provides fixed heap-dynamic arrays

Arrays
Heap-dynamic: subscript range and storage bindings are dynamic and not fixed
- e.g. (FORTRAN 90):
  INTEGER, ALLOCATABLE, ARRAY (:,:) :: MAT   (declares MAT to be a dynamic 2-dim array)
  ALLOCATE (MAT (10, NUMBER_OF_COLS))        (allocates MAT to have 10 rows and NUMBER_OF_COLS columns)
  DEALLOCATE (MAT)                           (deallocates MAT’s storage)
- Perl and JavaScript support heap-dynamic arrays
  - Arrays grow whenever assignments are made to elements beyond the last current element
  - Arrays are shrunk by assigning them to an empty array, e.g., Perl: @myArray = ( );
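Python lists are heap-dynamic too, but assigning past the end raises `IndexError` rather than growing the list. The hypothetical helper below sketches the Perl/JavaScript rule instead, using `None` as a stand-in for Perl's `undef`.

```python
def perl_style_store(arr: list, index: int, value, fill=None) -> None:
    """Assign arr[index] = value, growing arr first if index is past the end."""
    if index >= len(arr):
        arr.extend([fill] * (index + 1 - len(arr)))   # fill the gap, like undef
    arr[index] = value

my_array = [10, 20]
perl_style_store(my_array, 4, 50)
print(my_array)          # [10, 20, None, None, 50]
my_array = []            # shrink, like Perl's @myArray = ( );
print(len(my_array))     # 0
```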

Arrays
Number of subscripts (dimensions)
- FORTRAN I allowed up to three
- FORTRAN 77 allows up to seven
- Others: no limit
Array initialization
- Usually just a list of values that are put in the array in the order in which the array elements are stored in memory
Examples:
- FORTRAN uses the DATA statement:
  Integer List(3)
  Data List /0, 5, 5/
- C and C++: put the values in braces; let the compiler count them:
  int stuff [] = {2, 4, 6, 8};
- Ada: positions for the values can be specified:
  SCORE : array (1..14, 1..2) := (1 => (24, 10), 2 => (10, 7), 3 => (12, 30), others => (0, 0));
- Pascal does not allow array initialization

Arrays: Operations
Ada
- Assignment; the RHS can be an aggregate constant or an array name
- Catenation between single-dimensioned arrays
FORTRAN 95
- Includes a number of array operations called elementals because they are operations between pairs of array elements
- E.g., the add (+) operator between two arrays results in an array of the sums of the element pairs of the two arrays
Slices
- A slice is some substructure of an array
- FORTRAN 90:
  INTEGER MAT (1 : 4, 1 : 4)
  MAT(1 : 4, 1)   the first column
  MAT(2, 1 : 4)   the second row
- Ada: single-dimensioned arrays only, e.g., LIST(4..10)
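Python lists support one-dimensional slices natively; column slices and elemental operations take a comprehension. A sketch using a list of rows as a stand-in for FORTRAN's `MAT`, with the 1-based to 0-based index translation spelled out in the comments.

```python
# A 4x4 matrix as a list of rows, standing in for FORTRAN's INTEGER MAT(1:4, 1:4).
mat = [[1, 2, 3, 4],
       [5, 6, 7, 8],
       [9, 10, 11, 12],
       [13, 14, 15, 16]]

first_column = [row[0] for row in mat]   # FORTRAN MAT(1:4, 1)
second_row = mat[1][:]                   # FORTRAN MAT(2, 1:4), copied as a new list

# Elemental addition, like FORTRAN 95's + on two arrays
a = [1, 2, 3]
b = [10, 20, 30]
sums = [x + y for x, y in zip(a, b)]

# Ada-style LIST(4..10) on a 1-based array corresponds to lst[3:10]
lst = list(range(1, 13))                 # values 1..12 at positions 1..12
print(first_column, second_row, sums, lst[3:10])
```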

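Arrays
Implementation of arrays
- The access function maps subscript expressions to an address in the array
- Single-dimensioned array:
  $$\mathrm{address}(\mathit{list}[k]) = \mathrm{address}(\mathit{list}[\mathit{lower\_bound}]) + (k - \mathit{lower\_bound}) \cdot \mathit{element\_size}$$
  With a lower bound of 1 this becomes
  $$\mathrm{address}(\mathit{list}[k]) = \big(\mathrm{address}(\mathit{list}[1]) - \mathit{element\_size}\big) + k \cdot \mathit{element\_size}$$
  where the parenthesized part is a constant that can be computed once.
- Multi-dimensional arrays, e.g., the matrix
  $$\begin{pmatrix} 3 & 4 & 7 \\ 6 & 2 & 5 \\ 1 & 3 & 8 \end{pmatrix}$$
  - Row major order: 3, 4, 7, 6, 2, 5, 1, 3, 8
  - Column major order: 3, 6, 1, 4, 2, 3, 7, 5, 8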
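The access functions can be written out directly. A sketch, assuming byte addresses and a hypothetical base address; the two-dimensional versions generalize the one-dimensional formula by counting how many whole rows (or columns) precede the element. The last two lines regenerate the row-major and column-major orders listed above.

```python
def address_1d(base: int, k: int, lower_bound: int, element_size: int) -> int:
    """Access function for list[k]; base is the address of list[lower_bound]."""
    return base + (k - lower_bound) * element_size

def address_row_major(base, i, j, row_lb, col_lb, n_cols, element_size):
    """Address of a[i, j] when rows are stored one after another."""
    return base + ((i - row_lb) * n_cols + (j - col_lb)) * element_size

def address_col_major(base, i, j, row_lb, col_lb, n_rows, element_size):
    """Address of a[i, j] when columns are stored one after another."""
    return base + ((j - col_lb) * n_rows + (i - row_lb)) * element_size

matrix = [[3, 4, 7],
          [6, 2, 5],
          [1, 3, 8]]
row_major = [x for row in matrix for x in row]
col_major = [matrix[i][j] for j in range(3) for i in range(3)]
print(row_major)   # [3, 4, 7, 6, 2, 5, 1, 3, 8]
print(col_major)   # [3, 6, 1, 4, 2, 3, 7, 5, 8]
```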

Associative Arrays
An unordered collection of data elements that are indexed by an equal number of values called keys; also known as hashes
Design issues:
- What is the form of references to elements?
- Is the size static or dynamic?

Associative Arrays
Structure and operations in Perl
- Names begin with %
- Literals are delimited by parentheses:
  %hi_temps = ("Monday" => 77, "Tuesday" => 79,…);
- Subscripting is done using braces and keys, e.g.:
  $hi_temps{"Wednesday"} = 83;
- Elements can be removed with delete, e.g.:
  delete $hi_temps{"Tuesday"};
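Python's `dict` plays the role of a Perl hash. The same three operations, as a sketch using the slide's sample data; only the two entries shown on the slide are used, since the rest are elided there. Unlike the "unordered" definition above, Python dicts preserve insertion order (since Python 3.7).

```python
hi_temps = {"Monday": 77, "Tuesday": 79}   # literal
hi_temps["Wednesday"] = 83                 # like $hi_temps{"Wednesday"} = 83;
del hi_temps["Tuesday"]                    # like delete $hi_temps{"Tuesday"};
print(hi_temps)                            # {'Monday': 77, 'Wednesday': 83}
print("Tuesday" in hi_temps)               # False: membership test, like Perl's exists
```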

Records A (possibly heterogeneous) aggregate of data elements in which the individual elements are identified by names Design Issues: What is the form of references? What unit operations are defined?

Records
Record definition syntax
- COBOL uses level numbers to show nested records; others use recursive definitions
- COBOL:
  01 EMPLOYEE-RECORD.
     02 EMPLOYEE-NAME.
        05 FIRST  PICTURE IS X(20).
        05 MIDDLE PICTURE IS X(10).
        05 LAST   PICTURE IS X(20).
     02 HOURLY-RATE PICTURE IS 99V99.
- Level numbers (01, 02, 05) indicate their relative values in the hierarchical structure of the record
- The PICTURE clause shows the formats of the field storage locations
  - X(20): 20 alphanumeric characters
  - 99V99: four decimal digits with an implied decimal point in the middle

Records
Ada:
  type Employee_Name_Type is record
     First:  String (1..20);
     Middle: String (1..10);
     Last:   String (1..20);
  end record;
  type Employee_Record_Type is record
     Employee_Name: Employee_Name_Type;
     Hourly_Rate:   Float;
  end record;
  Employee_Record: Employee_Record_Type;

Records
References to record fields
- COBOL field references:
  field_name OF record_name_1 OF … OF record_name_n
  e.g., MIDDLE OF EMPLOYEE-NAME OF EMPLOYEE-RECORD
- Fully qualified references must include all intermediate record names
- Elliptical references allow leaving out record names as long as the reference is unambiguous; e.g., the following are equivalent:
  FIRST, FIRST OF EMPLOYEE-NAME, FIRST OF EMPLOYEE-RECORD

Records
Operations
- Assignment
  - Pascal, Ada, and C allow it if the types are identical
  - In Ada, the RHS can be an aggregate constant
- Initialization
  - Allowed in Ada, using an aggregate constant
- Comparison
  - In Ada, = and /=; one operand can be an aggregate constant
- MOVE CORRESPONDING
  - In COBOL, it moves all fields in the source record to fields with the same names in the destination record
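Python's `dataclasses` give a compact way to try these record operations: constructor-call initialization, qualified field references, whole-record copy, and field-by-field equality. `move_corresponding` and the `Payslip` record are hypothetical, written only to imitate COBOL's MOVE CORRESPONDING; the employee data is made up.

```python
from dataclasses import dataclass, fields, replace

@dataclass
class EmployeeName:
    first: str
    middle: str
    last: str

@dataclass
class EmployeeRecord:
    employee_name: EmployeeName
    hourly_rate: float

# Initialization, similar to an Ada aggregate
emp = EmployeeRecord(EmployeeName("Ada", "K", "Lovelace"), 25.50)
print(emp.employee_name.middle)   # fully qualified field reference: K

emp2 = replace(emp)               # assignment of a whole record (shallow copy)
print(emp2 == emp)                # comparison, field by field: True

@dataclass
class Payslip:                    # hypothetical destination record
    hourly_rate: float = 0.0
    hours: int = 0

def move_corresponding(src, dst) -> None:
    """Sketch of COBOL MOVE CORRESPONDING: copy every field whose name matches."""
    dst_names = {f.name for f in fields(dst)}
    for f in fields(src):
        if f.name in dst_names:
            setattr(dst, f.name, getattr(src, f.name))

slip = Payslip()
move_corresponding(emp, slip)
print(slip)                       # Payslip(hourly_rate=25.5, hours=0)
```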

Copyright © 2015 Pearson. All rights reserved.
Tuple Types
A tuple is a data type that is similar to a record, except that the elements are not named
- Used in Python, ML, and F# to allow functions to return multiple values
Python
- Closely related to its lists, but immutable
- Create with a tuple literal:
  myTuple = (3, 5.8, 'apple')
- Referenced with subscripts (which begin at 0)
- Catenation with +; a whole tuple can be deleted with del
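A short Python illustration of the points above: returning multiple values as a tuple, 0-based subscripts, catenation, and immutability. `min_max` is an illustrative helper.

```python
def min_max(values):
    """Return two results at once by packing them into a tuple."""
    return min(values), max(values)

lo, hi = min_max([4, 1, 9])      # tuple unpacking
my_tuple = (3, 5.8, "apple")
print(my_tuple[0])               # subscripts begin at 0: 3
longer = my_tuple + (True,)      # catenation builds a new tuple
try:
    my_tuple[0] = 4              # tuples are immutable
except TypeError:
    print("cannot assign to a tuple element")
```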

Tuple Types (continued)
ML
  val myTuple = (3, 5.8, "apple");
- Access as follows: #1(myTuple) is the first element
- A new tuple type can be defined:
  type intReal = int * real;
F#
  let tup = (3, 5, 7)
  let a, b, c = tup
- This assigns a tuple to a tuple pattern (a, b, c)

List Types
Lists in Lisp and Scheme are delimited by parentheses and use no commas
  (A B C D) and (A (B C) D)
Data and code have the same form
- As data, (A B C) is literally what it is
- As code, (A B C) is the function A applied to the parameters B and C
The interpreter needs to know which a list is, so if it is data, we quote it with an apostrophe
  '(A B C) is data

List Types (continued)
List operations in Scheme
- CAR returns the first element of its list parameter
  (CAR '(A B C)) returns A
- CDR returns the remainder of its list parameter after the first element has been removed
  (CDR '(A B C)) returns (B C)
- CONS puts its first parameter into its second parameter, a list, to make a new list
  (CONS 'A '(B C)) returns (A B C)
- LIST returns a new list of its parameters
  (LIST 'A 'B '(C D)) returns (A B (C D))
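The Scheme operations are easiest to understand in terms of cons cells. Below is a sketch of that model in Python: a cell is a `(head, tail)` pair and the empty list is `None`. This is an illustrative encoding, not how any particular Scheme stores lists; `to_python` exists only to print results.

```python
from typing import Any, Optional, Tuple

Cell = Optional[Tuple[Any, Any]]   # (head, tail); None is the empty list

def cons(head, tail: Cell) -> Cell:
    return (head, tail)

def car(cell: Cell):
    return cell[0]

def cdr(cell: Cell) -> Cell:
    return cell[1]

def make_list(*items) -> Cell:
    """Like Scheme's LIST: build a chain of cons cells from the arguments."""
    result = None
    for item in reversed(items):
        result = cons(item, result)
    return result

def to_python(cell: Cell) -> list:
    """Convert a cons chain back into a Python list, for printing."""
    out = []
    while cell is not None:
        out.append(car(cell))
        cell = cdr(cell)
    return out

abc = make_list("A", "B", "C")
print(car(abc))                                    # A
print(to_python(cdr(abc)))                         # ['B', 'C']
print(to_python(cons("A", make_list("B", "C"))))   # ['A', 'B', 'C']
```

ML's `::`, `hd` and `tl` correspond to `cons`, `car` and `cdr` in this model.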

List Operations in ML
- Lists are written in brackets and the elements are separated by commas
- List elements must be of the same type
- The Scheme CONS function is a binary operator in ML, ::
  3 :: [5, 7, 9] evaluates to [3, 5, 7, 9]
- The Scheme CAR and CDR functions are named hd and tl, respectively

F# Lists
- Like those of ML, except elements are separated by semicolons and hd and tl are methods of the List class
Python Lists
- The list data type also serves as Python’s arrays
- Unlike Scheme, Common Lisp, ML, and F#, Python’s lists are mutable
- Elements can be of any type
- Create a list with an assignment:
  myList = [3, 5.8, "grape"]

Python Lists (continued)
- List elements are referenced with subscripting, with indices beginning at zero:
  x = myList[1]   sets x to 5.8
- List elements can be deleted with del:
  del myList[1]
- List comprehensions, derived from set notation:
  [x * x for x in range(7) if x % 3 == 0]
  range(7) produces 0, 1, 2, 3, 4, 5, 6
  Constructed list: [0, 9, 36]
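The operations above, run together as a short check. The comprehension keeps the multiples of 3 from 0 to 6 and squares them.

```python
my_list = [3, 5.8, "grape"]
x = my_list[1]            # 5.8: indices begin at zero
del my_list[1]            # my_list is now [3, 'grape']
my_list[0] = 4            # lists are mutable: [4, 'grape']

squares = [x * x for x in range(7) if x % 3 == 0]
print(squares)            # [0, 9, 36]
```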

Haskell’s list comprehensions: the original
  [n * n | n <- [1..10]]
F#’s list comprehensions (here producing an array):
  let myArray = [| for i in 1 .. 5 -> (i * i) |]
Both C# and Java support lists through their generic heap-dynamic collection classes, List and ArrayList, respectively

Unions
A type whose variables are allowed to store different type values at different times during execution
Design issues for unions:
- What kind of type checking, if any, must be done?
- Should unions be integrated with records?
Examples:
- FORTRAN: with EQUIVALENCE; no type checking
- Pascal: both discriminated and nondiscriminated unions
  type intreal = record case tagg : Boolean of
                   true : (blint : integer);
                   false : (blreal : real);
                 end;
- Problem with Pascal’s design: type checking is ineffective

Unions Example (Pascal)…
Reasons why Pascal’s unions cannot be type checked effectively:
- The user can create inconsistent unions (because the tag can be individually assigned):
  var blurb : intreal;
      x : real;
  blurb.tagg := true;     { it is an integer }
  blurb.blint := 47;      { ok }
  blurb.tagg := false;    { it is a real }
  x := blurb.blreal;      { assigns an integer to a real }
- The tag is optional! Now, only the declaration and the second and last assignments are required to cause trouble

Unions Examples…
Ada: discriminated unions
- Reasons they are safer than Pascal:
  - The tag must be present
  - It is impossible for the user to create an inconsistent union (because the tag cannot be assigned by itself; all assignments to the union must include the tag value, because they are aggregate values)
C and C++: free unions (no tags)
- Not part of their records
- No type checking of references
Java has no unions (and, before Java 16, had no records either)
Evaluation: potentially unsafe in most languages (not Ada)
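Python's standard `ctypes` module can declare a genuine C-style free union, which makes the lack of checking visible: nothing stops a program from storing a real and reading the same bytes back as an integer. This sketch assumes the usual 32-bit IEEE 754 `float`; the field names mirror the Pascal example.

```python
import ctypes

class IntReal(ctypes.Union):
    """A C-style free union: both fields share the same 4 bytes, with no tag."""
    _fields_ = [("blint", ctypes.c_int32),
                ("blreal", ctypes.c_float)]

u = IntReal()
u.blreal = 1.0                   # store a real...
print(u.blint)                   # ...read the bits as an integer: 1065353216 (0x3F800000)
print(ctypes.sizeof(IntReal))    # 4: the fields overlap rather than being laid out in sequence
```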

Unions Types
A union is a type whose variables are allowed to store different type values at different times during execution
Design issue: should type checking be required?

Discriminated vs. Free Unions
- C and C++ provide union constructs in which there is no language support for type checking; the union in these languages is called a free union
- Type checking of unions requires that each union include a type indicator called a discriminant
  - Supported by ML, Haskell, and F#

Unions in F#
Defined with a type statement using OR (|):
  type intReal =
    | IntValue of int
    | RealValue of float;;
- intReal is the new type
- IntValue and RealValue are constructors
To create a value of type intReal:
  let ir1 = IntValue 17;;
  let ir2 = RealValue 3.4;;

Unions in F# (continued)
Accessing the value of a union is done with pattern matching:
  match pattern with
  | expression_list1 -> expression1
  | …
  | expression_listn -> expressionn
- The pattern can be any data type
- The expression list can have wild cards (_)

To display the type of the intReal union:
  let printType value =
    match value with
    | IntValue value -> printfn "int"
    | RealValue value -> printfn "float";;
If ir1 and ir2 are defined as previously,
  printType ir1 prints int
  printType ir2 prints float

Evaluation of Unions
- Free unions are unsafe
  - They do not allow type checking
- Java and C# do not support unions
  - Reflective of growing concerns for safety in programming languages

Sets
A type whose variables can store unordered collections of distinct values from some ordinal type
Design issue: what is the maximum number of elements in any set base type?
Examples:
- Pascal
  - No maximum size in the language definition (not portable; poor writability if the maximum is too small)
  - Operations: in, union (+), intersection (*), difference (-), =, <>, superset (>=), subset (<=)
- Ada does not include sets, but defines in as a set membership operator for all enumeration types
- Java includes a class for set operations

Sets
Evaluation
- If a language does not have sets, they must be simulated, either with enumerated types or with arrays
- Arrays are more flexible than sets, but have much slower set operations
Implementation
- Usually stored as bit strings, using logical operations for the set operations
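The bit-string implementation can be sketched with a Python integer as the bit string, assuming a base type of 1..10 in which bit (v - 1) marks membership of v. Each Pascal set operation then becomes a single logical operation.

```python
def make_set(*values: int) -> int:
    bits = 0
    for v in values:
        bits |= 1 << (v - 1)
    return bits

def members(bits: int) -> list:
    return [v for v in range(1, 11) if bits & (1 << (v - 1))]

a = make_set(1, 2, 3, 4)
b = make_set(3, 4, 5)

union = a | b                        # Pascal A + B
intersection = a & b                 # Pascal A * B
difference = a & ~b                  # Pascal A - B
is_member = bool(a & make_set(2))    # Pascal 2 in A
is_subset = (b & ~a) == 0            # Pascal B <= A

print(members(union), members(intersection), members(difference))
# [1, 2, 3, 4, 5] [3, 4] [1, 2]
```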

Pointers
A pointer type is a type in which the range of values consists of memory addresses and a special value, nil (or null)
Uses:
- Addressing flexibility
- Dynamic storage management
Design issues:
- What is the scope and lifetime of pointer variables?
- What is the lifetime of heap-dynamic variables?
- Are pointers restricted to pointing at a particular type?
- Are pointers used for dynamic storage management, indirect addressing, or both?
- Should a language support pointer types, reference types, or both?
Fundamental pointer operations:
- Assignment of an address to a pointer
- References (explicit versus implicit dereferencing)

Pointers
Problems with pointers:
- Dangling pointers (dangerous)
  - A pointer points to a heap-dynamic variable that has been deallocated
  - Creating one (with explicit deallocation):
    1. Allocate a heap-dynamic variable and set a pointer to point at it
    2. Set a second pointer to the value of the first pointer
    3. Deallocate the heap-dynamic variable, using the first pointer
- Lost heap-dynamic variables (wasteful)
  - A heap-dynamic variable that is no longer referenced by any program pointer
  - Creating one:
    1. Pointer p1 is set to point to a newly created heap-dynamic variable
    2. p1 is later set to point to another newly created heap-dynamic variable
  - The process of losing heap-dynamic variables is called memory leakage
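Both problems can be replayed on a toy heap. This is a hypothetical model in which addresses are integers and are never reused, so a dangling dereference is caught. A real allocator may hand the freed address to a new variable, in which case the dangling pointer silently reads someone else's data instead of failing.

```python
class ToyHeap:
    """A hypothetical heap model with explicit allocation and deallocation."""

    def __init__(self):
        self.cells = {}          # address -> value of live heap-dynamic variables
        self.next_address = 1    # addresses are never reused in this model

    def new(self, value) -> int:
        address = self.next_address
        self.next_address += 1
        self.cells[address] = value
        return address

    def dispose(self, address: int) -> None:
        del self.cells[address]

    def deref(self, address: int):
        if address not in self.cells:
            raise RuntimeError(f"dangling pointer: address {address} was deallocated")
        return self.cells[address]

    def lost(self, pointers) -> set:
        """Live cells that no pointer refers to: leaked heap-dynamic variables."""
        return set(self.cells) - set(pointers)

heap = ToyHeap()

# Dangling pointer, following steps 1-3
p = heap.new(42)
q = p                     # second pointer to the same variable
heap.dispose(p)           # deallocate through the first pointer
try:
    heap.deref(q)
except RuntimeError as err:
    print(err)

# Lost heap-dynamic variable
p1 = heap.new("first")
p1 = heap.new("second")   # the "first" cell is now unreachable
print(heap.lost([p1]))    # {2}
```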

Pointers
Examples:
Pascal: used for dynamic storage management only
- Explicit dereferencing (postfix ^)
- Dangling pointers are possible (dispose)
- Dangling objects are also possible
Ada: a little better than Pascal
- Some dangling pointers are disallowed because dynamic objects can be automatically deallocated at the end of the pointer's type scope
- All pointers are initialized to null
- Similar dangling object problem (but it rarely happens, because explicit deallocation is rarely done)

Pointers
Examples… C and C++
- Used for dynamic storage management and addressing
- Explicit dereferencing and address-of operator
- Can do address arithmetic in restricted forms
- Domain type need not be fixed (void *)
  float stuff[100];
  float *p;
  p = stuff;
  *(p+5) is equivalent to stuff[5] and p[5]
  *(p+i) is equivalent to stuff[i] and p[i]   (implicit scaling)
- void * can point to any type and can be type checked (it cannot be dereferenced)
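Implicit scaling can be observed from Python with the standard `ctypes` module, which exposes real C arrays, pointers and addresses. This sketch reproduces the slide's declarations: indexing through the pointer and adding 5 × sizeof(float) bytes to the base address reach the same element.

```python
import ctypes

stuff = (ctypes.c_float * 100)()                          # float stuff[100];
p = ctypes.cast(stuff, ctypes.POINTER(ctypes.c_float))    # float *p = stuff;
stuff[5] = 2.5

# p[5] and stuff[5] name the same element
print(p[5], stuff[5])                                     # 2.5 2.5

# Implicit scaling: p + 5 means base address + 5 * sizeof(float) bytes
base = ctypes.addressof(stuff)
addr_of_5 = base + 5 * ctypes.sizeof(ctypes.c_float)
print(ctypes.c_float.from_address(addr_of_5).value)       # 2.5
```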

Pointers
Examples… FORTRAN 90 pointers
- Can point to heap and non-heap variables
- Implicit dereferencing
- Pointers can only point to variables that have the TARGET attribute
- The TARGET attribute is assigned in the declaration, as in:
  INTEGER, TARGET :: NODE
- A special assignment operator is used for non-dereferenced references:
  REAL, POINTER :: ptr    (POINTER is an attribute)
  ptr => target           (where target is either a pointer or a non-pointer with the TARGET attribute)
  This sets ptr to have the same value as target

Pointers
Examples… C++ reference types
- Constant pointers that are implicitly dereferenced
- Used for parameters
  - Advantages of both pass-by-reference and pass-by-value
Java
- Only references
- No pointer arithmetic
- Can only point at objects (which are all on the heap)
- No explicit deallocator (garbage collection is used)
  - Means there can be no dangling references
- Dereferencing is always implicit

Pointers
Evaluation
- Dangling pointers and dangling objects are problems, as is heap management
- Pointers are like gotos: they widen the range of cells that can be accessed by a variable
- Pointers or references are necessary for dynamic data structures, so we can't design a language without them

Type Systems and Structures
Programming Language Principles, Lecture 22
Prepared by Manuel E. Bermúdez, Ph.D., Associate Professor, University of Florida

Type Systems (cont’d)
- Statically typed language: strongly typed, with enforcement occurring at compile time
  - Examples: ANSI C (more so than classic C), Pascal (almost; untagged variant records are the exception)
- Some (few) languages are completely untyped: Bliss, assembly language
- Dynamic (run-time) type checking: RPAL, Lisp, Scheme, Smalltalk
- Other languages (ML, Miranda, Haskell) are polymorphic, but use significant type inference at compile time

Type Definitions (cont’d)
Three approaches to describe types:
- Denotational: a type is a set of values (domain). An object has a type if its value is in the set.
- Constructive: a type is either atomic (int, float, bool, etc.) or is built (constructed) from atomic types, i.e., arrays, records, sets, etc.
- Abstraction: a type is an interface: a set of operations upon certain objects.

Classification of Types
Scalar (a.k.a. discrete, ordinal) types:
- Booleans: the terminology varies (bool, logical, truthvalue)
- Scalars sometimes come in several widths (short, int, long in C; float and double, too)
- Integers sometimes come "signed" and "unsigned"

Classification of Types (cont’d)
Enumerations:
  Pascal: type day = (yesterday, today, tomorrow)
- A newly defined type, so:
  var d: day;
  for d := today to tomorrow do ...
- Can also be used to index arrays:
  var profits: array[day] of real;
- In Pascal, enumeration is a full-fledged type.

C:
  enum day {yesterday, today, tomorrow};
is equivalent to:
  typedef int day;
  const day yesterday = 0, today = 1, tomorrow = 2;

Subrange types. Values are a contiguous subset of the base type values. The range imposes a type constraint. Pascal: type water_temp = ;

Type Equivalence Structural equivalence: Two types are equivalent if they contain the same components. Varies from one language to another.

Type Equivalence (cont’d)
Example:
  type r1 = record a, b: integer; end;
  type r2 = record b: integer; a: integer; end;
  var v1: r1; v2: r2;
  v1 := v2;
Are these types compatible? What if a and b are reversed? In most languages, no. In ML, yes.

Name equivalence: based on type definitions: usually same name. Assumption: named types are intended to be different. Alias types: definition of one type is the name of another. Question: Should aliased types be the same type?

In Modula-2:
  TYPE stack_element = INTEGER;
  MODULE stack;
  IMPORT stack_element;
  EXPORT push, pop;
  procedure push (e: stack_element);
  procedure pop ( ): stack_element;
The stack module cannot be reused for other types.

Strict name equivalence: aliased types are not equivalent.
- type a = b is considered both a declaration and a definition.
Loose name equivalence: aliased types are equivalent.
- type a = b is considered a declaration only; a and b share the definition.

In Ada: a compromise; the programmer can indicate that an alias is a subtype (compatible with its base type):
  subtype stack_element is integer;

Alternatively, in Ada an alias can be a derived type (not compatible with its parent type):
  type celsius is new Float;
  type fahrenh is new Float;
With stack_element declared as a subtype of integer, the stack is reusable with integers, and celsius is not compatible with fahrenh.
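A rough Python imitation of Ada's two kinds of alias, as a sketch only: a plain name binding acts like a subtype (fully interchangeable with `int`), while hypothetical wrapper classes act like derived types that share a representation but refuse to mix. Ada rejects the mixed addition at compile time; this sketch can only reject it at run time.

```python
# Subtype-like alias: just another name for int, freely interchangeable.
StackElement = int

class Temperature:
    """Hypothetical base for 'derived' types: same float representation,
    but values of different derived types do not mix."""

    def __init__(self, degrees: float):
        self.degrees = float(degrees)

    def __add__(self, other):
        if type(other) is not type(self):
            raise TypeError(f"cannot add {type(other).__name__} to {type(self).__name__}")
        return type(self)(self.degrees + other.degrees)

class Celsius(Temperature):
    pass

class Fahrenh(Temperature):
    pass

push_me: StackElement = 7          # an int works wherever a StackElement is expected
warm = Celsius(20) + Celsius(5)    # same derived type: allowed
try:
    Celsius(20) + Fahrenh(68)      # different derived types: rejected
except TypeError as err:
    print(err)                     # cannot add Fahrenh to Celsius
```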

Type Conversion and Casts
Many contexts in which types are expected: assignments, unary and binary operators, parameters. If types are different, programmer must convert the type (conversion or casting).

Three Situations
Types are structurally equivalent, but the language requires name equivalence. Conversion is trivial. Example (in C):
  typedef int number;
  typedef int quantity;
  number n;
  quantity m;
  n = m;

Three Situations (cont’d)
Different sets of values, but same representation. Example: subrange 3..7 of int. Generate run-time code to check for appropriate values (range check).

Three Situations (cont’d)
Different representations. Example (in C): int n; float x; n = x; Generate code to perform conversion at run-time.

Type Conversions
Ada: the name of the type is used as a pseudofunction.
  Example: n := Integer(r);
C, C++, Java: the name of the type is used as a prefix operator, in ()s.
  Example: n = (int) r;

Type Conversions (cont’d)
If conversion not supported in the language, convert to pointer, cast, and dereference (ack!): r = *((float *) &n); Re-interpret bits in n as a float.

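Type Conversions (cont’d)
OK in C, as long as:
- n has an address (this won't work with expressions),
- n and r occupy the same amount of storage, and
- the programmer doesn't expect run-time overflow checks!
(Strictly, modern C's aliasing rules make this access undefined behavior; copying the bytes with memcpy is the portable way to reinterpret them.)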
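The difference between converting a value and reinterpreting its bits can be shown with Python's standard `struct` module. This is a sketch assuming a 4-byte `int` and a 4-byte IEEE 754 `float`, as on common platforms: packing the integer's bytes and unpacking them as a float mimics `r = *((float *) &n)`.

```python
import struct

n = 1065353216                 # an int whose bit pattern happens to be 0x3F800000

converted = float(n)           # conversion: same value, new representation
# Reinterpretation: pack the int's 4 bytes, then read the same bytes as a float.
reinterpreted = struct.unpack("<f", struct.pack("<i", n))[0]

print(converted)               # 1065353216.0
print(reinterpreted)           # 1.0
```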

Type Compatibility and Coercions
Coercion: implicit conversion. Rules vary greatly from one language to another.

Type Compatibility and Coercions (cont’d)
Ada: types T and S are compatible (coercible) if either:
- T and S are equivalent,
- one is a subtype of the other (or both are subtypes of the same base type), or
- both are arrays (same number of elements, and same element type).
Pascal: same as Ada, but allows coercion from integer to real.

C: Many coercions allowed. General idea: convert to narrowest type that will accommodate both types. Promote char (or short int) to int, guaranteeing neither is char or short. If one operand is a floating type, convert the narrower one: float -> double -> long double

Note: this accommodates mixtures of integer and floating types. If neither type is a floating type, convert the narrower one: int-> unsigned int-> long int-> unsigned long int

Examples
  char c;               /* signed or unsigned -- implementation? */
  short int s;
  unsigned int u;
  int i;
  long int l;
  unsigned long int ul;
  float f;
  double d;
  long double ld;

Examples (cont’d)
  i + c;    /* c converted to int */
  i + s;    /* s converted to int */
  u + i;    /* i converted to unsigned int */
  l + u;    /* u converted to long int (if long can represent every unsigned int value; otherwise both become unsigned long int) */
  ul + l;   /* l converted to unsigned long int */
  f + ul;   /* ul converted to float */
  d + f;    /* f converted to double */
  ld + d;   /* d converted to long double */
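The `u + i` case is the one that most often surprises people. Python integers do not wrap, so this sketch models C's rule explicitly, assuming a 32-bit `unsigned int`: the `int` operand is converted to unsigned, which maps negative values modulo 2^32, and the sum wraps the same way.

```python
UINT_BITS = 32   # assumption: unsigned int is 32 bits

def to_unsigned(x: int, bits: int = UINT_BITS) -> int:
    return x % (1 << bits)

def c_add_unsigned_int(u: int, i: int) -> int:
    """u + i in C, with u unsigned int and i int: both end up unsigned int."""
    return to_unsigned(to_unsigned(u) + to_unsigned(i))

print(c_add_unsigned_int(1, -2))   # 4294967295, not -1
print(c_add_unsigned_int(5, -2))   # 3
```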

Conversion during assignment. usual arithmetic conversions don't apply. simply convert from type on the right, to type on the left.


Examples
  s = l;    /* l's low-order bits -> signed number */
  s = ul;   /* ditto */
  l = s;    /* s sign-extended to longer length */
  ul = s;   /* ditto, ul's high-bit affected? */
  s = c;    /* c extended (signed or not) to s's length, interpreted as signed */
  f = l;    /* l converted to float, precision lost */
  d = f;    /* f converted, no precision lost */
  f = d;    /* d converted, precision lost; result may be undefined */
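Two of these assignment effects can be reproduced in Python: truncation to a narrower signed integer, and loss of precision when a value is squeezed into a 4-byte float. The sketch assumes a 16-bit `short`, two's complement, and IEEE 754 single precision, all typical but not guaranteed by C.

```python
import struct

def to_signed(x: int, bits: int) -> int:
    """Keep the low-order `bits` bits of x and read them as two's complement."""
    x &= (1 << bits) - 1
    return x - (1 << bits) if x & (1 << (bits - 1)) else x

def to_float32(x: float) -> float:
    """Round-trip through a 4-byte IEEE float, as in f = l or f = d."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

print(to_signed(70000, 16))        # s = l with a 16-bit short: 4464
print(to_signed(-1, 64) == -1)     # l = s: sign extension preserves the value
print(to_float32(16777217))        # 16777216.0: float has a 24-bit significand
print(to_float32(0.1) == 0.1)      # False: f = d loses precision
```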

Type Inference Usually easy. Type of assignment is type of left-side.
Type of operation is (common) type of operands.

Type Inference (cont’d)
Not always easy. Pascal: type A: ; B: : var a: A; b: B; What is the type of a+b ? In Pascal, it's the base type (integer).

Ada: The type of the result would be an anonymous type The compiler would generate run-time checks for values out of bounds. Curbing unnecessary run-time checks is a major problem.

Pascal allows operations on sets: var A: set of 1..10; B: set of ; C: set of 1..15; i: 1..30; C := A + B * [1..5,i]; The type of the expression is set of integer (the base type). A range check is required when assigning to C.

Type safety in Java

Records (structs) and Variants (unions)
In Pascal,

Representation in Pascal:

Records (structs) and Variants (unions, cont’d)
Usage: var copper: element; copper.name := 'Cu';
A record can be "packed", filling in holes, but this forces the compiler to generate code that accesses fields using multi-instruction sequences (less efficient).

Packed Representation

Usage: element copper; strcpy(copper.name,"Cu");

Most languages allow assignment of one record to another, but if not, a "block_copy" routine can solve the problem. Most languages don't allow equality comparison. A "block_compare" routine might have problems with garbage in the holes.

Compilers often rearrange fields to reduce space:
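Holes, packing and field rearrangement can be measured with `ctypes.Structure`, which follows the platform's C layout rules. A sketch with hypothetical one-byte fields standing in for the element record; note that C compilers must keep fields in declaration order, so the reordering here is done by hand. Exact sizes depend on the platform's alignment, so only the packed size is fixed.

```python
import ctypes

class Element(ctypes.Structure):
    """char, int, char: alignment usually leaves holes after each char."""
    _fields_ = [("tag", ctypes.c_char),
                ("atomic_number", ctypes.c_int),
                ("metallic", ctypes.c_char)]

class PackedElement(ctypes.Structure):
    """Same fields, packed: no holes, but the int may be misaligned."""
    _pack_ = 1
    _fields_ = Element._fields_

class ReorderedElement(ctypes.Structure):
    """Fields rearranged (largest first) so the holes shrink."""
    _fields_ = [("atomic_number", ctypes.c_int),
                ("tag", ctypes.c_char),
                ("metallic", ctypes.c_char)]

for cls in (Element, PackedElement, ReorderedElement):
    print(cls.__name__, ctypes.sizeof(cls))
# With a 4-byte, 4-aligned int (typical): Element 12, PackedElement 6, ReorderedElement 8
```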

In Pascal,

Pascal with Statements
Introduce a nested scope, in which record fields are visible without the record name. Useful for deeply nested structures.
Example:
  with copper do begin
    name := 'Cu';
    atomic_number := 29;
    atomic_weight := 63.546;
    metallic := true;
  end;

Pascal with Statements (cont’d)
Problems with Pascal's with statement:
- Can only manipulate fields of ONE record, not two.
- Not a shortcut for copying fields from one record to another.
- Local names that match field names become inaccessible.
- Can be difficult to read, especially in long or deeply nested with statements.

Modula-2 allows aliases for complicated expressions:
  WITH e = copper DO BEGIN
    e.name := 'Cu';
    e.atomic_number := 29;
    e.atomic_weight := 63.546;
    e.metallic := true;
  END;

Can access more than one record at a time:
  WITH e = copper, f = iron DO e.metallic := f.metallic; END;

In Modula-3, the with statement goes further: WITH d = (...) DO IF d <> 0 THEN val := n/d ELSE val := 0;

C gets around this using the conditional expression: { double d = (...); val = (d ? n/d : 0); }

C has no need for a with statement; just use pointers:
  element *e = { ... };
  element *f = { ... };
  strcpy(e->name, f->name);
  e->atomic_number = f->atomic_number;
  e->atomic_weight = f->atomic_weight;
  e->metallic = f->metallic;

Variant Records Choice between alternative fields.
Only one is valid at any given time.

Example (Pascal)

Example (Pascal, cont’d)
"naturally_occuring" is the "tag", which indicates whether the element contains either a source and a prevalence, or a half_life.

Example (Pascal, cont’d)

Variant Records (cont’d)
In C, unions are not integrated with structs, so extra names are needed: element e; e.extra_fields.natural_info.source = 3; e.extra_fields.half_life = 3.5;
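A C-style declaration that would support those accesses might look like the following sketch (the field types are assumptions; only the names come from the slide). The tag is an ordinary field, so nothing stops code from reading the wrong union member:

```cpp
// Hypothetical C-style layout for the variant element record: the union is a
// separately named member, so every access goes through extra_fields.
struct element {
    char name[3];
    int  atomic_number;
    bool naturally_occuring;          // tag maintained by hand, never checked
    union {
        struct {
            int    source;
            double prevalence;
        } natural_info;               // meaningful when naturally_occuring
        double half_life;             // meaningful otherwise
    } extra_fields;
};
```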

In general, type safety is compromised: type tag = (is_int, is_real, is_bool); var irb: record case which: tag of is_int: (i:integer); is_real: (r:real); is_bool: (b:Boolean); end;

Usage: irb.which := is_real; irb.r := 3.0; irb.i := 7; (* run-time error *)

Changing the tag field should make all other fields in the variant uninitialized, but it's very expensive to keep track of at run-time. Most compilers won't catch this: irb.which := is_real; irb.r := 3.0; irb.which := is_int; writeln(irb.i); (* uninitialized, or worse, shares space with irb.r *)

Worse yet, the tag field is optional: type tag = (is_int, is_real, is_bool); var irb: record case tag of (* 'which' field is gone ! *) is_int: (i:integer); is_real: (r:real); is_bool: (b:Boolean); end;

There is no way to catch irb.r := 3.0; writeln(irb.i); The designers of Modula-3 dropped variant records for these safety reasons. Similarly, the designers of Java dropped the union of C and C++.
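For contrast, C++17's `std::variant` is a union whose tag is maintained by the library and checked on every access. This sketch (the alias `irb_t` and helper `int_read_rejected` are hypothetical) mirrors the `irb` record:

```cpp
#include <variant>

// The hidden tag of std::variant always agrees with the stored value.
using irb_t = std::variant<int, double, bool>;

// Returns true if reading the int alternative is rejected because the
// variant currently holds a different alternative.
bool int_read_rejected(const irb_t& irb) {
    try {
        (void)std::get<int>(irb);      // checked access: throws on mismatch
        return false;
    } catch (const std::bad_variant_access&) {
        return true;
    }
}
```

Assigning `irb = 7;` switches the stored value and its hidden tag together, which is exactly the guarantee the Pascal variant record cannot give.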

Variants in Ada Must have a tag (discriminant).
If tag changes, all fields in the variant must be changed, by assigning a whole record (A := B;), or assigning an aggregate.

Example (with discriminant default value)

Variants in Ada (cont’d)
Declaration can use the default: copper: element; Declaration can override the default: plutonium: element (false); americium: element (naturally_occuring => false);

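The type declaration may: provide a default; then a variable declared without a discriminant (like copper above) is unconstrained, and its tag may be changed by whole-record assignment. not provide a default; then every variable declaration must supply the discriminant, which constrains the variable, and the tag can never be changed.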

In short, discriminants are never uninitialized. In Ada, variants are required to appear at the end of the record. The compiler assigns a constant address to every field.

Variants in Modula-2 In Modula-2, this restriction is dropped. Usually, a fixed address is assigned to each field, leaving holes where variants differ in size.

Variants in Modula-2 (cont’d)

What are Main Issues in Type Checking?
A Higher Level View Type Equivalence: Conditions under which Types are the Same Tracking of Scoping – Nested Declarations Type Compatibility Conversion/casting, Nonconverting casts, Coercion Type Inference Determining the Type of a Complex Expression Reviewing Remaining Concepts of Note Overloading, Polymorphism, Generics From: Chapter 6 of Compilers: Principles, Techniques and Tools, Aho, et al., Addison-Wesley

Structural vs. Name Equivalence of Types
Two Types are “Structurally Equivalent” iff they are Equivalent Under the Following 3 Rules: SE1: A Type Name is Structurally Equivalent to Itself. SE2: T1 and T2 are Structurally Equivalent if they are Formed by Applying the Same Type Constructor to Structurally Equivalent Types. SE3: After a Type Declaration type n = T, the Type Name n is Structurally Equivalent to T. SE3 is “Name Equivalence”. What Do Programming Languages Use? C: All Three Rules. Pascal: Omits SE2 and Restricts SE3 so that a Type Name can only be Structurally Equivalent to Other Type Names.
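A rough C++ sketch of rules SE1 to SE3, assuming a toy type representation (all names are hypothetical): declared names are expanded through their declarations (SE3), constructed types are compared constructor by constructor (SE2), and identical names match at once (SE1). It assumes non-recursive types; a recursive type would make it recurse forever.

```cpp
#include <cstddef>
#include <map>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Toy type representation: either a name (built-in or declared) or a
// constructor such as "pointer" or "array" applied to argument types.
struct Type {
    std::string name;                         // non-empty for named types
    std::string ctor;                         // non-empty for constructed types
    std::vector<std::shared_ptr<Type>> args;  // constructor arguments
};
using TypePtr = std::shared_ptr<Type>;
using Decls = std::map<std::string, TypePtr>; // "type n = T" declarations

TypePtr named(const std::string& n) {
    return std::make_shared<Type>(Type{n, "", {}});
}
TypePtr make(const std::string& c, std::vector<TypePtr> a) {
    return std::make_shared<Type>(Type{"", c, std::move(a)});
}

// SE3: replace a declared name by its definition until a built-in name or a
// constructed type remains.
TypePtr expand(TypePtr t, const Decls& decls) {
    while (!t->name.empty()) {
        auto it = decls.find(t->name);
        if (it == decls.end()) break;         // not declared: treat as built-in
        t = it->second;
    }
    return t;
}

bool struct_equiv(TypePtr s, TypePtr t, const Decls& decls) {
    if (!s->name.empty() && s->name == t->name) return true;   // SE1
    s = expand(s, decls);                                        // SE3
    t = expand(t, decls);
    if (!s->name.empty() || !t->name.empty())
        return s->name == t->name;            // built-in names must match
    if (s->ctor != t->ctor || s->args.size() != t->args.size())
        return false;
    for (std::size_t i = 0; i < s->args.size(); ++i)            // SE2
        if (!struct_equiv(s->args[i], t->args[i], decls)) return false;
    return true;
}
```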

Type Equivalence Structural equivalence: equivalent if built in the same way (same parts, same order) Name equivalence: distinctly named types are always different Structural equivalence questions What parts constitute a structural difference? Storage: record fields, array size Naming of storage: field names, array indices Field order How to distinguish between intentional vs. incidental structural similarities? An argument for name equivalence: “They’re different because the programmer said so; if they’re

Type Equivalence Records and Arrays
Would record types with identical fields, but a different field order, be structurally equivalent? When are arrays with the same number of elements structurally equivalent? type PascalRec = record a : integer; b : integer end; val MLRec = { a = 1, b = 2 }; val OtherRec = { b = 2, a = 1 }; type str = array [1..10] of integer; type str = array [1..2 * 5] of integer; type str = array [0..9] of integer;

Consider Name Equivalence in Pascal
How are the Following Compared? type link = ^cell; var next : link; last : link; p : ^cell; q, r : ^cell; By Rules SE1, SE2, SE3, all are Equivalent! However: Some Implementations of Pascal: next, last – Equivalent; p, q, r – Equivalent. Other Implementations of Pascal: q, r – Equivalent. How is the Following Interpreted? type link = ^cell; np = ^cell; npr = ^cell; var next : link; last : link; p : np; q, r : npr;

What about Classes and Equivalence?
Are these SE1? SE2? Or SE3? What Does Java Require? public class person { private String lastname, firstname; private String loginID; private String password; }; public class user { private String lastname, firstname; private String loginID; private String password; };
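C++ answers the question the same way Java does: classes are compared by name. A sketch with the slide's two classes (member types simplified to `std::string`):

```cpp
#include <string>
#include <type_traits>

// Two classes with identical members, as in the person/user example.
class person {
    std::string lastname, firstname, loginID, password;
};
class user {
    std::string lastname, firstname, loginID, password;
};

// Identical structure does not make them the same type, so a person cannot
// be assigned to a user.
constexpr bool same_type  = std::is_same_v<person, user>;                 // false
constexpr bool assignable = std::is_assignable_v<user&, const person&>;   // false
```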

Alias Types and Name Equivalence
Alias types are types that consist purely of a different name for another type. Is an Integer assignable to a Stack_Element? To a Level? Can a Celsius and a Fahrenheit be assigned to each other? Strict name equivalence: aliased types are distinct. Loose name equivalence: aliased types are equivalent. Example aliases: TYPE Stack_Element = INTEGER; TYPE Level = INTEGER; TYPE Celsius = REAL; TYPE Fahrenheit = REAL; Ada allows additional explicit equivalence control: subtype Stack_Element is integer; type Celsius is new real; type Fahrenheit is new real;
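C++ offers both flavours, which makes for a compact illustration. In this sketch (type names follow the slide; `to_fahrenheit` is hypothetical), a `using` alias gives loose name equivalence, while wrapping the value in a distinct struct gives strict equivalence, roughly in the spirit of Ada's derived types:

```cpp
#include <type_traits>

// A using/typedef alias is just another name: loose name equivalence.
using Stack_Element = int;
using Level = int;

// A distinct wrapper struct is a new type: strict name equivalence.
struct Celsius    { double value; };
struct Fahrenheit { double value; };

// Converting between the distinct types must be written out explicitly.
Fahrenheit to_fahrenheit(Celsius c) {
    return Fahrenheit{c.value * 9.0 / 5.0 + 32.0};
}

constexpr bool aliases_same = std::is_same_v<Stack_Element, Level>;   // true
constexpr bool temps_same   = std::is_same_v<Celsius, Fahrenheit>;    // false
```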

Why is Degree of Type Equivalence Critical?
Governs how Software Engineers Develop Code… Why? SE2 Alone Doesn’t Promote Well Designed, Thought Out, Software … Why? Impacts on Team-Oriented Software Development… How? With SE2 Alone, Errors are Harder to Locate and Correct… Why? Increases Compilation Time with SE2 Alone … Why?

Type Conversion Certain contexts in certain languages may require exact matches with respect to types: aVar := anExpression value1 + value2 foo(arg1, arg2, arg3, … , argN) Type conversion seeks to follow these exact match rules while allowing programmers some flexibility in the values used Using structurally-equivalent types in a name-equivalent language Types whose value ranges may be distinct but intersect (e.g. subranges) Distinct types with sensible/meaningful corresponding values (e.g. integers and floats)

Type Conversion Refers to the Conversion Between Different Types to Carry out Some Action in a Program. Often Abused within a Programming Language (C). Typically Used in Arithmetic/Boolean Expressions: r := i + r; (Pascal) f := i + c; (C). Two Kinds of Conversion: Implicit: Automatically done by Compiler. Explicit: Type-Casts: Programmer Initiated (Ord, Chr, Trunc). If X is a real array, which works faster, and why? for I:=1 to N do X[I] := 1; for I:=1 to N do X[I] := 1.0; A Good Optimizing Compiler will Convert the Constant 1 to 1.0 at Compile Time in the 1st Option!
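The two kinds of conversion can be sketched in C++ (function names are hypothetical): the usual arithmetic conversions handle `i + r` implicitly, while `static_cast` is an explicit, programmer-initiated conversion that truncates toward zero like Pascal's `Trunc`.

```cpp
#include <type_traits>

// Implicit conversion: in i + r the int operand is converted to double.
double mixed_sum(int i, double r) { return i + r; }

// Explicit conversion: the fractional part is discarded toward zero.
int trunc_toward_zero(double r) { return static_cast<int>(r); }

static_assert(std::is_same_v<decltype(1 + 2.5), double>, "int + double is double");

// Storing the int literal 1 into a double element needs a conversion to 1.0;
// an optimizing compiler performs it once, at compile time.
void fill_ones(double* x, int n) {
    for (int i = 0; i < n; ++i) x[i] = 1;
}
```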

Type Casting Syntax
Ada: n : integer; r : real; ... r := real(n);
C/C++/Java: // Sample is specific to Java, but shares common syntax. Object n; String s; ... s = (String)n;
Some SQLs: -- Timestamp is a built-in data type; charField is a varchar (string) field of some table. select charField::timestamp from…

Non-Converting Type Casts
Type casts that explicitly preserve the internal bit-level representation of values Common in manipulating allocated blocks of memory Same block of memory may be viewed as arrays of characters, integers, or even records/structures Block of memory may be read from a file or other external source that is initially viewed as a “raw” set of bytes

Non-Converting Type Casts - Examples
Ada – Explicit Unchecked Conversion Subroutine: function cast_float_to_int is new unchecked_conversion(float, integer);
C/C++ (Not Java): Pointer Games: void *block; // Gets loaded up with some data from a file. Record *header = (Record *)block; // Record is a struct.
C++: explicit cast types static_cast, reinterpret_cast, dynamic_cast: int i = static_cast<int>(d); // Assume d is double; static_cast converts the value. Record *header = reinterpret_cast<Record *>(block); // Keeps the bits. Derived *dObj = dynamic_cast<Derived *>(baseObj); // Derived is a subclass of Base; checked at run time.
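The difference between a converting and a non-converting cast shows up clearly on a float. In this C++ sketch (function names are hypothetical), `static_cast` keeps the value while copying the bytes keeps the bit pattern; it assumes 32-bit IEEE 754 floats, as on common platforms. `std::memcpy` is used instead of a pointer cast because reading a float through an integer pointer is undefined behaviour in C++.

```cpp
#include <cstdint>
#include <cstring>

// Converting cast: the value is preserved as far as possible (1.75f -> 1).
int convert(float f) { return static_cast<int>(f); }

// Non-converting cast: the bit pattern is preserved.
std::uint32_t bits_of(float f) {
    static_assert(sizeof(float) == sizeof(std::uint32_t), "float must be 32 bits");
    std::uint32_t u;
    std::memcpy(&u, &f, sizeof u);   // copy raw bytes, no value conversion
    return u;
}
```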

Type Coercion Sometimes absolute type equivalence is too strict; type compatibility is sufficient. Type equivalence vs. type compatibility in Ada (strict): two types are compatible if the types are equivalent; or one type is a subtype of the other, or both are subtypes of the same base type; or the types are arrays with the same sizes and element types in each dimension. Pascal extends this slightly, also allowing: base and subrange types are cross-compatible; integers may be used where a real is expected. Type coercion is an implicit type conversion between compatible but not necessarily equivalent types.

Type Coercion Issues Sometimes viewed as a weakening of type security
Mixing of types without explicit indication of intent. Opposite end of the spectrum: C and Fortran Allow interchangeable use of numeric types. Fortran: arithmetic can be performed on entire arrays. C: arrays and pointers are roughly interchangeable. C++ Adds Programmer-Extensible Coercion Rules: class ctr { public: ctr(int i = 0, const char* x = "ctr") { n = i; strcpy(s, x); } ctr& operator++(int) { n++; return *this; } operator int() { return n; } // Coercion to int operator char*() { return s; } // Coercion to char * private: int n; char s[64]; };
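A compilable variant of the `ctr` class, assuming C++11 or later (string literals no longer convert to `char*`, so the coercion yields `const char*`, and a prefix `++` is used). It is a sketch, not the slide author's code:

```cpp
#include <cstring>

class ctr {
public:
    ctr(int i = 0, const char* x = "ctr") : n(i) {
        std::strncpy(s, x, sizeof s - 1);   // bounded copy of the label
        s[sizeof s - 1] = '\0';
    }
    ctr& operator++() { ++n; return *this; }     // prefix increment
    operator int() const { return n; }           // coercion ctr -> int
    operator const char*() const { return s; }   // coercion ctr -> C string
private:
    int  n;
    char s[64];
};
```

With `ctr c(5, "clicks"); ++c;`, the declaration `int count = c;` calls `operator int` and `const char* label = c;` calls `operator const char*`, with no cast written. Writing `std::cout << c` would, however, be ambiguous, since both coercions are viable.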

Type Inference Type inference refers to the process of determining the type of an arbitrarily complex expression. Generally not a huge issue — most of the time, the type for the result of a given operation or function is clearly known, and you just “build up” to the final type as you evaluate the expression. In languages where an assignment is also an expression, the convention is to have the “result” type be the type of the left-hand side. But there are occasional issues, specifically with subrange and composite types.

Examples of Type Inference
Subranges — in languages that can define types as subranges of base types (Ada, Pascal), type inference can be an issue: What should c’s type be? Easy answer: always go back to the base type (integer in this case) type Atype = 0..20; Btype = ; var a : Atype; b : Btype; c : ????; c := a + b;

What if the result of an expression is assigned to a subrange? a := 5 + b; (* a and b are defined on last slide *) The primary question is bounds checking — operations on subranges can certainly produce results that break away from their defined bounds Static checks: include code that infers the lowest and highest possible results from an expression Dynamic check: static checks are not always possible, so the last resort is to check the result at runtime
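A hedged C++ sketch of both ideas, with hypothetical names: `Subrange<Lo, Hi>` checks its bounds at run time on construction, `a + b` falls back to the base type `int`, and `sum_always_fits` performs the static range inference that would make the run-time check unnecessary. Since `Btype`'s range is elided on the slide, only `Atype` is modeled.

```cpp
#include <stdexcept>

// Hypothetical subrange type Lo..Hi over int, like Pascal's "0..20".
template <int Lo, int Hi>
class Subrange {
public:
    explicit Subrange(int v) : value_(check(v)) {}
    int value() const { return value_; }
private:
    static int check(int v) {             // dynamic check, done at run time
        if (v < Lo || v > Hi) throw std::out_of_range("value outside subrange");
        return v;
    }
    int value_;
};

// Result type of a + b: the base type int, so no bounds are enforced here.
template <int L1, int H1, int L2, int H2>
int operator+(Subrange<L1, H1> a, Subrange<L2, H2> b) {
    return a.value() + b.value();
}

// Static check: if every possible sum lies in Lo..Hi, no run-time check is needed.
template <int Lo, int Hi, int L1, int H1, int L2, int H2>
constexpr bool sum_always_fits() {
    return Lo <= L1 + L2 && H1 + H2 <= Hi;
}

using Atype = Subrange<0, 20>;
```

For example, with `Atype a(15), b(10);`, `int c = a + b;` is fine, but `Atype d(a + b);` throws `std::out_of_range`, because 25 lies outside 0..20.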

Composite types What is the type of operators on arrays? We know it’s an array, but what specifically? (particularly for languages where the index range is part of the array definition) Examples: Strings in languages where strings are exactly character arrays (Pascal, Ada)

Sets In languages that encode a base type with a set (e.g. set of integer), what is the “type” of unions, intersections, and differences of sets? Examples: Particularly tricky when a set is combined with a subrange Same as subrange handling: static checks are possible in some cases, but dynamic checks are not completely avoidable var A : set of 1..10; B : set of ; C : set of 1..15; i : 1..30; ... C := A + B * [1..5, i];
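Sets over a small base range map naturally onto bit vectors. In this C++ sketch (the names and the 0..30 universe are assumptions chosen to match `i : 1..30`), union is `|`, intersection is `&`, and `fits_in` is the dynamic check needed before assigning to `C : set of 1..15`:

```cpp
#include <bitset>

// One bit per possible element 0..30.
using IntSet = std::bitset<31>;

// The set [lo..hi].
IntSet range_set(int lo, int hi) {
    IntSet s;
    for (int k = lo; k <= hi; ++k) s.set(k);
    return s;
}

// Dynamic check for assigning to a variable declared "set of lo..hi":
// no member of the value may lie outside the target range.
bool fits_in(const IntSet& value, int lo, int hi) {
    return (value & ~range_set(lo, hi)).none();
}
```

With an arbitrary stand-in for B such as `range_set(3, 12)`, `range_set(1, 10) | (range_set(3, 12) & range_set(1, 5))` contains only 1..10, so `fits_in(..., 1, 15)` returns true; adding element 20 would make it false.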

Overloading The Same Symbol has Different Meanings in Different Contexts. Many Examples: + : int × int → int; + : real × real → real; + : set × set → set (union); + : string × string → string (concatenate); == (compares multiple types); … << with cout (outputs multiple types). Impacts on Conversion, since During Code Generation we must Choose the “Correct” Option based on Type

Overloading Coercion Requires the Need to Convert an Expression Before Generating Code: real := real * int – need to use real * real := real * int_to_real(int) – do the conversion. After Conversion, Code Generation can Occur. Overloading has Increased Attention with the Emergence of Object-Oriented Languages. C++ Allows User Defined Overloaded Definitions for +, -, *, etc. (Java does not allow operator overloading). In both, Programmer Definable Routines (e.g., SORT) can be Overloaded based on Type

Overloading A very handy mechanism Available in C++/Java/...
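What is it? Providing multiple definitions of the same function over different types. Example [C++] (illustrative: C++ requires an overloaded operator to have at least one operand of class or enumeration type, so the int/float versions are built in rather than user-declarable): int operator+(int a,int b); float operator+(float a,float b); float operator+(float a,int b); float operator+(int a,float b); Complex operator+(Complex a,Complex b); ....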
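To see resolution by argument type in code that compiles, this sketch uses an ordinary, hypothetical function name `add`; the compiler selects a definition from the static types of the arguments:

```cpp
#include <string>

int         add(int a, int b)                               { return a + b; }
double      add(double a, double b)                         { return a + b; }
double      add(double a, int b)                            { return a + b; }  // mixed case
std::string add(const std::string& a, const std::string& b) { return a + b; }  // concatenation
```

`add(2.5, 1)` picks the `(double, int)` version, an exact match; without it the call would be ambiguous between `(int, int)` and `(double, double)`, since each needs one conversion.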

Polymorphism Essential Concept: A Function is Polymorphic if it can be Utilized with Arguments/Parameters of More than 1 Type The EXACT, SAME, Piece of Code is being Utilized Why is Polymorphism Important? Consider a List of Items in C: struct item { int info; struct item *next; } Write a Length Function function LEN(list: *item) : integer; In theory – only need to access *next … However, in C, you can’t reuse this Length Function for Different Structures Write a Similar Version for Each Different Structure

Polymorphism Allows us to write Type Independent Code
Polymorphism is Supported in ML, a Strongly Typed Functional Programming Language: fun length (lptr) = if null (lptr) then 0 else length(tail(lptr)) + 1; Overloading is an Example of ad-hoc Polymorphism. Parametric Polymorphism Essentially has Type as a Parameter to Function: Stack Operations: Create (T: stack_type); Arithmetic Operations: Plus (X, Y: T: T is a type). This leads to Generics!
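A C++ template gives the same reuse as the ML `length` function in a statically typed imperative language. In this sketch (the `Node` type is hypothetical), `length` never touches `info`, so one source definition serves every element type; unlike the slide's "exact same piece of code", though, the compiler generates a separate instantiation per type.

```cpp
#include <cstddef>

// Generic singly linked node; T is the type parameter.
template <typename T>
struct Node {
    T     info;
    Node* next;
};

// Only follows next, so it works for any T.
template <typename T>
std::size_t length(const Node<T>* lptr) {
    return lptr == nullptr ? 0 : 1 + length(lptr->next);
}
```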

Generics Classic Programming Problem
Develop Stack or List Code in C that Applies to a Single Type (e.g., List of DeliItems) Need Same Code for ProduceItems, MeatItems, etc., Copy and Edit Original Code Minimal Reuse, Error Prone, Time Consuming What do Generics Offer? A Type-Parameterizable Class Abstracts Similar Behavior into a Common Interface Reusable and Consistent Class Illustrate via C++

A Generic Stack Class template <class T> class stack { private: T* st;
int top; int size; public: stack(int x) {st = new T[x]; top = 0; size = x;} stack() {st = new T[100]; top = 0; size = 100;} ~stack() {delete[] st;} void push(T entry) { st[top++] = entry;} }; int main() { stack<int> S1(10); // Creates int Stack with 10 Slots stack<char> S2; // Creates char Stack with 100 Slots stack<Item> ItemDB(10000); // Stack of Grocery Items }

A Generic Set Class template <class ItemType> class Set {
DLList<ItemType>* _members; // Set Template Uses Double-Linked List Template public: Set() { _members = new DLList<ItemType>(); } void Add(ItemType* member) {_members->queue(member);} void Remove(ItemType* member) {_members->remove(member);} int Member(ItemType* member) {return _members->isMember(member);} int NullSet(){return _members->isEmpty();} }; int main() { Set<int> IntSet; // Creates an Integer Set Set<Item> ItemDB; // Creates an Item Set }

Generics Imagine a code fragment that Computes the length of a list
Java-style class ListNode<T> { T data; ListNode<T> next; ListNode(T d, ListNode<T> n) { data = d; next = n; } int length() { if (next != null) return 1 + next.length(); else return 1; } } This does not refer to T at all; it can be implemented only once! So... What is the type of this method?

Benefits of Generics Strong Promotion of Reuse Develop Once, Use Often
stack and Set Examples Eliminate Errors and Inconsistencies Between Similar and Separate Classes Simplifies Maintenance Problems Focuses on Abstraction and Design Promotes Correctness and Consistent Program Behavior Once Designed/Developed, Reused with Consistent Results If Problems, Corrections in Single Location

Type Systems CSE 340 – Principles of Programming Languages Fall 2015 Adam Doupé Arizona State University

Type Compatibility Which assignments are allowed by the type system?
a = b; allowed with int a; float b;? With float a; int b;?

Type Inference Types of expressions or other constructs as a function of subexpression types a + b a int; b float Returns a float in C Error in ML a * b a string; b int Error in most languages Returns a string in Python
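These inference and compatibility questions can be put to a C++ compiler directly. A sketch with hypothetical names; `can_multiply` uses the common detection idiom built on `std::void_t` (C++17):

```cpp
#include <string>
#include <type_traits>
#include <utility>

// Result types inferred from operand types.
static_assert(std::is_same_v<decltype(1 + 2.0f), float>, "int + float yields float");
static_assert(std::is_same_v<decltype('a' + 1), int>, "char + int yields int");

// Type compatibility: C++ accepts int = float (the value is truncated).
constexpr bool int_from_float = std::is_assignable_v<int&, float>;

// Detects whether A * B is a valid expression.
template <typename A, typename B, typename = void>
struct can_multiply : std::false_type {};

template <typename A, typename B>
struct can_multiply<A, B, std::void_t<decltype(std::declval<A>() * std::declval<B>())>>
    : std::true_type {};

constexpr bool string_times_int = can_multiply<std::string, int>::value;  // false
constexpr bool int_times_float  = can_multiply<int, float>::value;        // true
```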

Type Compatibility Principally about type equivalence
How to determine if two types are equal? Type cm : integer; Type inch : integer; cm x; inch y; x = y?

Name Equivalence Types must have the exact same name to be equivalent
Type cm : integer; Type inch : integer; cm x; inch y; x = y? // ERROR

Name Equivalence a: array [0..4] of int; b: array [0..4] of int;
a = b? Not allowed under name equivalence

Name Equivalence a, b: array [0..4] of int; a = b?
Not allowed because array [0..4] of int is not named

Name Equivalence Type A: array [0..4] of int; a: A; b: A; a = b?
Allowed, because both a and b have the same name

Internal Name Equivalence
If the program interpreter gives the same internal name to two different variables, then they share the same type a, b: array [0..4] of int; c: array [0..4] of int; a = b? Yes, because interpreter/compiler gives the same internal name to a and b a = c? No, because interpreter/compiler gives different internal name to c than to a and b
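C++ behaves this way for unnamed structs, which gives a small demonstration (variable names follow the slide; `assign_demo` is hypothetical): each unnamed struct declaration introduces its own internal type, shared only by the variables declared with it.

```cpp
#include <type_traits>

// a and b share one unnamed struct type.
struct { int v[5]; } a, b;
// A textually identical declaration creates a different unnamed type.
struct { int v[5]; } c;

constexpr bool ab_same = std::is_same_v<decltype(a), decltype(b)>;   // true
constexpr bool ac_same = std::is_same_v<decltype(a), decltype(c)>;   // false

void assign_demo() {
    a = b;        // OK: same internal type
    // a = c;     // would not compile: different internal types
}
```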

Structural Equivalence
Same built-in types Pointers to structurally equivalent types Type cm : integer; Type inch : integer; cm x; inch y; x = y? // Allowed!

int* a; float* b; a = b? Not structurally equivalent, because int and float are not structurally equivalent

Determining struct structural equivalence Two structures st1 { x1: W1, x2: W2, …, xk: Wk } st2 { y1: Q1, y2: Q2, ..., yk: Qk } st1 and st2 are structurally equivalent iff W1 structurally equivalent to Q1 W2 structurally equivalent to Q2 ... Wk structurally equivalent to Qk

struct A { a: int, b: float } struct B { b: int, a: float } A foo; B bar; foo = bar?

struct A { a: int, b: float } struct B { b: float, a: int } A foo; B bar; foo = bar?

Determining array structural equivalence Two Arrays T1 = array range1 of t1 T2 = array range2 of t2 T1 and T2 are structurally equivalent iff: range1 and range2 have (1) the same number of dimensions and (2) the same number of entries in each dimension t1 and t2 are structurally equivalent

Determining function structural equivalence Two functions T1 = function of (t1, t2, t3, …, tk) returns t T2 = function of (v1, v2, v3, ..., vk) returns v T1 and T2 are structurally equivalent iff: For all i from 1 to k, ti is structurally equivalent to vi t is structurally equivalent to v

Determining Structural Equivalence
The goal is to determine, for every pair of types in the program, if they are structurally equivalent Seems fairly simple, just keep applying the previous 5 rules until the base case 1 or 2 is reached How to handle the following case: T1 = struct { a: int; p: pointer to T2; } T2 = struct { a: int; p: pointer to T1; } Applying the rules states that T1 is structurally equivalent to T2 iff pointer to T1 is structurally equivalent to pointer to T2, which is true if T1 is structurally equivalent to T2
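A common way out is to assume that a pair of types currently being compared is equivalent, and to look for a contradiction only elsewhere. A C++ sketch of that idea, assuming a hypothetical type table in which each named type is a constructor applied to component type names (field names are ignored, as in the struct rule above):

```cpp
#include <cstddef>
#include <map>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Each named type is a constructor ("struct", "pointer", "int", ...)
// applied to the names of its component types.
struct TypeDef {
    std::string ctor;
    std::vector<std::string> parts;
};
using Table = std::map<std::string, TypeDef>;
using Assumed = std::set<std::pair<std::string, std::string>>;

// Pairs already under comparison are assumed equivalent, which cuts the
// T1 -> pointer to T2 -> T2 -> pointer to T1 -> T1 cycle.
bool equiv(const std::string& s, const std::string& t,
           const Table& tab, Assumed& assumed) {
    if (s == t) return true;
    if (!assumed.insert({s, t}).second) return true;   // pair seen before
    const TypeDef& ds = tab.at(s);
    const TypeDef& dt = tab.at(t);
    if (ds.ctor != dt.ctor || ds.parts.size() != dt.parts.size()) return false;
    for (std::size_t i = 0; i < ds.parts.size(); ++i)
        if (!equiv(ds.parts[i], dt.parts[i], tab, assumed)) return false;
    return true;
}

bool struct_equiv(const std::string& s, const std::string& t, const Table& tab) {
    Assumed assumed;                   // fresh assumptions for each query
    return equiv(s, t, tab, assumed);
}
```

For the slide's example, with P1 = pointer to T1 and P2 = pointer to T2, comparing T1 with T2 records the pair, descends through the pointer fields, eventually meets the pair (T1, T2) again and stops with true.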
