Data.

Characteristics of data: location, data type, structure, size

Location: where the data is stored
Global/static: location set at compile time
Automatic variables: location set at runtime (offset from stack position)

Data type: defines acceptable values and their interpretation
char, integer, float, Boolean, pointer/reference

Structure: contents and layout
Primitive, array, record

Size: memory consumed
Fixed or variable
Bounds: lower and upper

Primitives: Integers
Usually a range of integer values determined by the number of bits used to store the value.
C: char, short, int, long, long long
Some languages support arbitrary-sized integers.
Java: BigInteger
Python: int (called long in Python 2)

Primitives: Integers
C supports signed and unsigned integers.
Java supports only signed integers (char, an unsigned 16-bit value, is the one exception).
This limitation is at the JVM level, so languages like Clojure and Scala have the same restriction.

Primitives: Integers
Most hardware represents signed integers using two's-complement notation.
Any integer whose most significant bit is a one is considered negative.
To negate a number, invert all of the bits and add one.
For example, in 8 bits, 11101111b is -17; inverting the bits gives 00010000b, and 00010000b + 1 = 00010001b, which is 17.
The advantage is that the same adder logic can be used for all integers.
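The invert-and-add-one rule can be sketched in C. This is only an illustration: the 8-bit width and the helper names negate_bits, as_signed, and twos_complement_demo are assumptions chosen for the example, not part of the slides.

```c
#include <assert.h>
#include <stdint.h>

/* Negate an 8-bit two's-complement value: invert every bit, then add one.
   The arithmetic is done on the unsigned bit pattern so that it wraps. */
uint8_t negate_bits(uint8_t bits) {
    return (uint8_t)(~bits + 1u);
}

/* Interpret an 8-bit pattern as a signed value: a set top bit means negative. */
int as_signed(uint8_t bits) {
    return (bits & 0x80u) ? (int)bits - 256 : (int)bits;
}

/* Usage: the bit pattern 11101111b (0xEF) is -17, and negating it gives 17. */
void twos_complement_demo(void) {
    assert(as_signed(0xEF) == -17);
    assert(negate_bits(0xEF) == 0x11);   /* 00010001b, which is 17 */
}
```

The same function negates in both directions, which reflects the point about shared adder logic: no special case is needed for negative inputs.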

Primitives: Floating Point
A means of approximating real numbers in a fixed amount of space.
IEEE 754 defines 16-, 32-, 64-, and 128-bit binary floats.

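Each value consists of a sign bit, a significand, and an exponent.
For normalized values, the significand is interpreted as having an assumed one as its most significant bit.
The granularity (the gap between adjacent representable values) is a function of the size of the value.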

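Take a 16-bit float as an example. It stores 10 significand bits, so for numbers from 256 up to 512 the fractional part has a granularity of 0.25, and the granularity doubles at each further power of two.
Consider 300: its nearest representable neighbors are 299.75 and 300.25.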
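The granularity figure can be checked with a little arithmetic. The sketch below is an assumption-laden illustration rather than a real half-precision implementation: granularity is a hypothetical helper that handles only whole-number magnitudes of at least 1 and ignores subnormals and the format's maximum value.

```c
#include <assert.h>

/* Gap between adjacent representable values near v, for a binary format
   that stores `stored_bits` significand bits (10 for a 16-bit float).
   The gap is 2^(floor(log2(v)) - stored_bits). */
double granularity(unsigned v, int stored_bits) {
    assert(v >= 1 && stored_bits >= 0);
    int exponent = 0;
    while ((v >> (exponent + 1)) != 0) {
        exponent++;                      /* exponent = floor(log2(v)) */
    }
    double step = 1.0;
    for (int i = exponent; i < stored_bits; i++) step /= 2.0;
    for (int i = stored_bits; i < exponent; i++) step *= 2.0;
    return step;
}
```

With these assumptions, granularity(300, 10) evaluates to 0.25 and granularity(600, 10) to 0.5, matching the doubling at each power of two.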

Primitives: Decimal
IBM mainframes provided efficient decimal operations.
PL/I and COBOL provided decimal primitives.
Binary-Coded Decimal (BCD) uses either eight bits or four bits per digit.
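As a rough illustration of the four-bits-per-digit form (often called packed BCD), the following C sketch packs two decimal digits into one byte. The helper names bcd_pack and bcd_unpack are hypothetical, and real mainframe formats also carry a sign, which is left out here.

```c
#include <assert.h>
#include <stdint.h>

/* Pack a value from 0 to 99 into one byte: tens digit in the high four
   bits, ones digit in the low four bits. */
uint8_t bcd_pack(unsigned value) {
    assert(value <= 99);
    return (uint8_t)(((value / 10) << 4) | (value % 10));
}

/* Recover the decimal value from a packed BCD byte. */
unsigned bcd_unpack(uint8_t bcd) {
    return (unsigned)(bcd >> 4) * 10u + (bcd & 0x0Fu);
}
```

A convenient property of this layout is that the hexadecimal form of the byte reads like the decimal number: 42 packs to 0x42.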

Primitives: Decimal
Advantage: accurate representation of money.
Binary floating-point representations can't represent 0.1 exactly.
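The effect is easy to observe. In the sketch below, integer cents stand in for a decimal type, which is an assumption made for the example since the C used here has no decimal primitive. The function names are hypothetical, and the floating-point behavior described assumes IEEE 754 doubles evaluated in double precision.

```c
#include <assert.h>

/* Add `count` payments of ten cents, keeping the total in binary
   floating-point dollars. Each 0.1 is already slightly inexact. */
double total_dollars_binary(int count) {
    assert(count >= 0);
    double total = 0.0;
    for (int i = 0; i < count; i++) {
        total += 0.1;
    }
    return total;
}

/* The same total kept as a whole number of cents, which is exact. */
long total_cents(int count) {
    assert(count >= 0);
    long total = 0;
    for (int i = 0; i < count; i++) {
        total += 10;
    }
    return total;
}
```

Under those assumptions, ten payments give exactly 100 cents, while the floating-point total is not equal to 1.0.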

Primitives: Character
Historically, an unsigned 8-bit value.
Some character sets and protocols only supported seven-bit characters, for example SMTP (Simple Mail Transfer Protocol).

Primitives: Character
Eight-bit characters are not adequate for representing all of the world's character sets.
Unicode provides code points for many languages.
It is better to think of a Unicode encoding as applying to the entire string.
The most popular encoding, UTF-8, uses a different number of bytes per code point depending on its value:
English (7-bit ASCII): one byte
Korean: three bytes
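The byte counts follow from the code point ranges that UTF-8 assigns to each length. A minimal sketch, with utf8_length as a hypothetical helper that does not reject surrogate code points:

```c
#include <assert.h>
#include <stdint.h>

/* Number of bytes UTF-8 uses to encode one Unicode code point. */
int utf8_length(uint32_t code_point) {
    assert(code_point <= 0x10FFFF);      /* highest Unicode code point */
    if (code_point < 0x80)    return 1;  /* 7-bit ASCII */
    if (code_point < 0x800)   return 2;
    if (code_point < 0x10000) return 3;  /* includes Korean Hangul syllables */
    return 4;
}
```

For example, 'A' (U+0041) needs one byte and the Hangul syllable U+D55C needs three.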

Primitives: Integer Subranges
Allows you to specify the minimum and maximum value of an integer.
Pascal provided this:
Type T = 0..51;
From a type theory perspective, these are a bit problematic. We usually like to think of integer types as closed under addition, but the sum of two variables of type T can be as large as 102, so it should be stored in a bigger type:
Type TT = 0..102;
In general, these types require runtime checks to be maintained.
This is a really simple version of a dependent type (about which we may say more later).
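C has no subrange types, so the runtime check that a Pascal compiler would insert has to be written by hand. The sketch below illustrates that check. The names T_MIN, T_MAX, in_subrange, and add_in_subrange are assumptions for the example.

```c
#include <assert.h>
#include <stdbool.h>

enum { T_MIN = 0, T_MAX = 51 };          /* mirrors Type T = 0..51 */

/* The runtime check: is the value a legal member of T? */
bool in_subrange(int value) {
    return value >= T_MIN && value <= T_MAX;
}

/* Add two T values. The sum is formed in plain int, which plays the role
   of the bigger type, and is stored only if it still fits in T. */
bool add_in_subrange(int a, int b, int *sum) {
    assert(in_subrange(a) && in_subrange(b));
    int wide = a + b;                    /* at most 102 */
    if (!in_subrange(wide)) {
        return false;                    /* T is not closed under addition */
    }
    *sum = wide;
    return true;
}
```

Adding 20 and 30 succeeds, while adding 30 and 30 is rejected because 60 falls outside 0..51.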

Primitives: Enumerations
A version of integer subranges that names each of the available values:
enum workdays { Monday, Tuesday, Wednesday, Thursday, Friday };
They are implemented as an integer "under the hood".
They are particularly useful for C's switch statement.
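Both points can be seen in a short sketch built on the enum from the slide. The functions days_until_friday and workday_name are hypothetical helpers added for illustration.

```c
#include <assert.h>

enum workdays { Monday, Tuesday, Wednesday, Thursday, Friday };

/* Enumerators are integers under the hood (Monday is 0, Friday is 4),
   so ordinary arithmetic works on them. */
int days_until_friday(enum workdays day) {
    assert(day >= Monday && day <= Friday);
    return Friday - day;
}

/* A switch with one case per enumerator. */
const char *workday_name(enum workdays day) {
    switch (day) {
    case Monday:    return "Monday";
    case Tuesday:   return "Tuesday";
    case Wednesday: return "Wednesday";
    case Thursday:  return "Thursday";
    case Friday:    return "Friday";
    }
    return "unknown";                    /* value outside the enumeration */
}
```

Many compilers can warn when a switch over an enum omits a case, which is part of why the two features pair well.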

Complex Data: strings, arrays, associative arrays, records, unions

Strings
An ordered collection of characters.
Options:
Mutable? C, C++: yes. Java, Python: no.
Size stored as metadata? C: no. C++: yes (std::string). Java: yes.

Strings in C
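A C string is a character array ended by a NUL byte, with no stored length. The sketch below illustrates the two properties listed for C: the length must be found by scanning, and the characters can be changed in place. The helper names scan_length and capitalize_first are assumptions for the example.

```c
#include <assert.h>
#include <stddef.h>

/* Find the length the way strlen does: walk until the terminating NUL.
   No size is stored, so this costs time proportional to the length. */
size_t scan_length(const char *s) {
    assert(s != NULL);
    size_t n = 0;
    while (s[n] != '\0') {
        n++;
    }
    return n;
}

/* Strings held in writable arrays are mutable: change the first
   character in place (ASCII lowercase letters only). */
void capitalize_first(char *s) {
    assert(s != NULL);
    if (s[0] >= 'a' && s[0] <= 'z') {
        s[0] = (char)(s[0] - 'a' + 'A');
    }
}
```

For char buf[] = "data", scan_length(buf) is 4 while sizeof buf is 5, because the array also holds the terminator.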

Strings in Java

Arrays
A collection of one or more data elements.
Dimensions may be fixed or dynamic.
Fixed dimensions may be known at compile time or at runtime.
In C99, a function may declare an array whose size is a function of the function's parameters.
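The C99 feature referred to here is the variable-length array. A minimal sketch follows, with sum_of_squares as a hypothetical function and an arbitrary cap on n to keep the stack allocation small. Note that C11 made variable-length arrays an optional feature, so this assumes a compiler that supports them.

```c
#include <assert.h>

/* The array's size is fixed when the function is entered, based on the
   parameter n, and the storage is released when the function returns. */
long sum_of_squares(int n) {
    assert(n > 0 && n <= 1000);
    long squares[n];                     /* variable-length array */
    for (int i = 0; i < n; i++) {
        squares[i] = (long)i * i;
    }
    long total = 0;
    for (int i = 0; i < n; i++) {
        total += squares[i];
    }
    return total;
}
```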

Dynamic Arrays
Grow as needed.
Two implementations:
Contiguous memory with resize: C++ std::vector
Segmented: C++ std::deque

Dynamic Arrays: std::vector
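The contiguous-with-resize strategy can be sketched in C. This is an illustration of the idea, not how any particular std::vector is implemented: the IntVector type, its function names, the starting capacity of 4, and the doubling growth factor are all assumptions.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct {
    int *items;        /* one contiguous block */
    size_t size;       /* elements in use */
    size_t capacity;   /* elements allocated */
} IntVector;

/* Append a value, growing the block when it is full.
   Returns 0 on success and -1 if memory could not be allocated. */
int int_vector_push(IntVector *v, int value) {
    assert(v != NULL);
    if (v->size == v->capacity) {
        size_t new_capacity = v->capacity ? v->capacity * 2 : 4;
        int *grown = realloc(v->items, new_capacity * sizeof *grown);
        if (grown == NULL) {
            return -1;                   /* old block is still valid */
        }
        v->items = grown;                /* elements may have moved */
        v->capacity = new_capacity;
    }
    v->items[v->size++] = value;
    return 0;
}

/* Release the block and reset the vector to empty. */
void int_vector_free(IntVector *v) {
    assert(v != NULL);
    free(v->items);
    v->items = NULL;
    v->size = v->capacity = 0;
}
```

Starting from IntVector v = {0}, ten pushes leave size at 10 and capacity at 16. Because a resize may move every element, pointers into the old block become invalid, which is the main trade-off against a segmented design.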

Dynamic Arrays: std::deque

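Multi-Dimensional Arrays
Guaranteed rectangular (solid).
In C, int a[2][3] is laid out in row-major order, with the last index varying fastest:
0,0 0,1 0,2 1,0 1,1 1,2
But in Fortran, the equivalent array is laid out in column-major order, with the first index varying fastest (shown with zero-based row,column indices for comparison):
0,0 1,0 0,1 1,1 0,2 1,2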
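The two layouts differ only in how an index pair is turned into a linear offset. A small sketch, with hypothetical helper names and zero-based indices in both cases:

```c
#include <assert.h>
#include <stddef.h>

/* C layout: elements of a row are adjacent, so the column index varies
   fastest as memory is walked. */
size_t row_major_offset(size_t row, size_t col, size_t rows, size_t cols) {
    assert(row < rows && col < cols);
    return row * cols + col;
}

/* Fortran layout: elements of a column are adjacent, so the row index
   varies fastest as memory is walked. */
size_t column_major_offset(size_t row, size_t col, size_t rows, size_t cols) {
    assert(row < rows && col < cols);
    return col * rows + row;
}
```

For a 2 by 3 array, element (1,0) sits at offset 3 in row-major order but at offset 1 in column-major order.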

Arrays of Arrays

Associative Array
Also called a collection of key-value pairs.
In principle any object can be used as a key, provided keys can be compared or hashed.

Associative Array
C++ provides two versions:
std::map: requires that keys provide a < (less-than) operator; typically implemented with a red-black tree.
std::unordered_map: requires that keys can be hashed and compared for equality; implemented with a hash table.

Record
A data structure composed of a fixed number of elements, which may be of different data types.
Each element is at a known offset from the beginning of the structure.

Record
In C, these are structs:
struct data {
    char a;
    int b;
    short c;
    float d;
    double e;
};
How much memory does this consume?
Nominally: 1 + 4 + 2 + 4 + 8 = 19 bytes.
But most architectures perform better on values that are aligned in memory according to their size, so the compiler inserts padding between members.
Actual size: 24 bytes!
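The padding can be inspected with sizeof and offsetof. The figures in the comments assume a typical 64-bit ABI with 4-byte int and float and an 8-byte, 8-byte-aligned double; other platforms may differ. The helper names nominal_size and padding_bytes are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

struct data {
    char a;      /* offset 0, followed by 3 bytes of padding */
    int b;       /* offset 4 */
    short c;     /* offset 8, followed by 2 bytes of padding */
    float d;     /* offset 12 */
    double e;    /* offset 16 */
};

/* Sum of the member sizes, ignoring alignment (19 on the assumed ABI). */
size_t nominal_size(void) {
    return sizeof(char) + sizeof(int) + sizeof(short)
         + sizeof(float) + sizeof(double);
}

/* Bytes the compiler added for alignment (5 on the assumed ABI). */
size_t padding_bytes(void) {
    assert(sizeof(struct data) >= nominal_size());
    return sizeof(struct data) - nominal_size();
}
```

Calling offsetof(struct data, b) and its siblings shows where each gap falls. Ordering members from largest to smallest is a common way to reduce the padding.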

Union types
In C, this is like a structure, but one whose elements overlap in memory:
union U {
    float floatVal;
    int intVal;
    char charVal;
};
It only consumes four bytes (the size of the largest member).
Given union U u; the elements are accessed as u.floatVal or u.intVal.
No type checking is done! You can write as an integer and read as a float!
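The lack of checking can be demonstrated by writing one member and reading another. This sketch assumes a 32-bit int and an IEEE 754 32-bit float, and float_bits_via_union is a hypothetical helper. Reading a different member than the one last written reinterprets the bytes in C, but C++ does not permit it.

```c
#include <assert.h>

union U {
    float floatVal;
    int intVal;
    char charVal;
};

/* Store a float, then read the same bytes back as an int. Nothing
   records which member was written, so the compiler cannot object. */
int float_bits_via_union(float f) {
    assert(sizeof(float) == sizeof(int));
    union U u;
    u.floatVal = f;
    return u.intVal;
}
```

Under those assumptions, float_bits_via_union(1.0f) returns 0x3F800000, the raw encoding of 1.0, rather than the integer 1.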

Algebraic Data Types
Available in languages like ML, Haskell, etc.
Based on building data types out of the operators + (sum: one alternative or another) and * (product: several fields together).
A record of name, age, and favorite color would be String * integer * color.

Algebraic Data Types
They are useful for building structures without resorting to the use of null pointers.
Consider a binary tree:
data BinTree:
  | leaf
  | node(value :: Number, left :: BinTree, right :: BinTree)
end
Every element in the binary tree must be either a node or a leaf.
The only way to access the value and left/right fields of a node version of a BinTree is a test that ensures that it actually is a node rather than a leaf.
No null-pointer exceptions! Ever!
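For contrast, the same leaf-or-node shape can be imitated in C with a tag field, though only by convention. The sketch below is an assumption-laden analogue: the Tag, BinTree, and tree_sum names are hypothetical, and unlike ML or Haskell nothing stops C code from reading a node field without checking the tag first.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { LEAF, NODE } Tag;

typedef struct BinTree {
    Tag tag;                             /* which alternative this is */
    /* The fields below are meaningful only when tag == NODE. */
    double value;
    const struct BinTree *left;
    const struct BinTree *right;
} BinTree;

/* Sum the values in a tree. The switch plays the role of the test that
   proves a tree is a node before its fields are touched. */
double tree_sum(const BinTree *t) {
    assert(t != NULL);
    switch (t->tag) {
    case LEAF:
        return 0.0;
    case NODE:
        return t->value + tree_sum(t->left) + tree_sum(t->right);
    }
    return 0.0;
}
```

Empty subtrees are represented by a real leaf value rather than a null pointer, so a well-formed tree can be walked without null checks. In C that remains a discipline the programmer must keep, whereas a language with algebraic data types enforces it.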
