Unit II: Matrices and models

Introduction - What is GSL?
The GNU Scientific Library (GSL) is a numerical library for C and C++ programmers. GSL provides a well-defined C language interface. GSL is thread-safe, i.e. its functions do not use static variables. Memory is always associated with objects and not with functions.

Compiling and Linking
The library header files are installed in their own gsl directory. Include statements should therefore carry a gsl/ directory prefix, like:
#include <gsl/gsl_math.h>
Compile: gcc -c example.c
Link: gcc example.o -lgsl -lgslcblas -lm

I - Naming conventions
Every function in GSL begins with gsl_, followed by the kind of object it works on, e.g. gsl_matrix_ or gsl_vector_.
The first argument of these functions is the object acted upon.

Apophenia
Apophenia is an open statistical library for working with data sets and statistical models. It provides functions on the same level as those of a typical stats package. Apophenia's functions begin with apop_. Major functions begin with the data type they act on, like apop_data_, apop_model_, etc.

II - Basic matrix and vector operations
A matrix is an array of numbers with dimensions M (rows) by N (columns). [Figure: a 3 by 6 matrix in which element (2,3) is 3.] A vector can be considered a 1 x M matrix.

The simplest operations on matrices are:
Element-by-element addition, multiplication, etc. The same can be done with vectors. The following slides list the functions GSL provides for these operations.

Vectors Vectors are defined by a gsl_vector structure which describes a slice of a block. Different vectors can be created which point to the same block. A vector slice is a set of equally-spaced elements of an area of memory.

Vectors
Operations with vectors:
Structure
Vector allocation
Accessing elements of vectors
Vector operations

Structure of a vector
The gsl_vector structure contains five components: the size, the stride, a pointer to the memory where the elements are stored (data), a pointer to the block owned by the vector (block), if any, and an ownership flag (owner).

The structure is very simple and looks like this:
typedef struct
{
  size_t size;
  size_t stride;
  double * data;
  gsl_block * block;
  int owner;
} gsl_vector;

Vector allocation
The functions for allocating and accessing vectors are defined in gsl_vector.h. The functions for allocating memory to a vector follow the style of malloc and free.
gsl_vector * gsl_vector_alloc (size_t n): This function creates a vector of length n.
gsl_vector * gsl_vector_calloc (size_t n): This function allocates memory for a vector of length n and initializes all the elements of the vector to zero.
void gsl_vector_free (gsl_vector * v): This function frees a previously allocated vector v.

Accessing vector elements
The functions for accessing the elements of a vector are:
double gsl_vector_get (const gsl_vector * v, size_t i): This function returns the i-th element of a vector v. If i lies outside the allowed range of 0 to n-1 then the error handler is invoked and 0 is returned.

void gsl_vector_set (gsl_vector * v, size_t i, double x): This function sets the value of the i-th element of a vector v to x. If i lies outside the allowed range of 0 to n-1 then the error handler is invoked.
double * gsl_vector_ptr (gsl_vector * v, size_t i)
const double * gsl_vector_const_ptr (const gsl_vector * v, size_t i)
These functions return a pointer to the i-th element of a vector v.

Vector operations
int gsl_vector_add (gsl_vector * a, const gsl_vector * b): This function adds the elements of vector b to the elements of vector a. The result ai + bi is stored in a and b remains unchanged. The two vectors must have the same length.
int gsl_vector_sub (gsl_vector * a, const gsl_vector * b): This function subtracts the elements of vector b from the elements of vector a. The result is stored in a.
int gsl_vector_mul (gsl_vector * a, const gsl_vector * b): This function multiplies the elements of vector a by the elements of vector b. The result is stored in a.
int gsl_vector_div (gsl_vector * a, const gsl_vector * b): This function divides the elements of vector a by the elements of vector b. The result is stored in a.
int gsl_vector_scale (gsl_vector * a, const double x): This function multiplies the elements of vector a by the constant factor x.
int gsl_vector_add_constant (gsl_vector * a, const double x): This function adds the constant value x to the elements of the vector a.

Example
#include <stdio.h>
#include <gsl/gsl_vector.h>

int main (void)
{
  int i;
  gsl_vector * v = gsl_vector_alloc (3);

  /* store 1.23, 2.23, 3.23 in elements 0 to 2 */
  for (i = 0; i < 3; i++)
    {
      gsl_vector_set (v, i, 1.23 + i);
    }

  /* read the elements back and print them */
  for (i = 0; i < 3; i++)
    {
      printf ("v_%d = %g\n", i, gsl_vector_get (v, i));
    }

  gsl_vector_free (v);
  return 0;
}

Output:
$ ./a.out
v_0 = 1.23
v_1 = 2.23
v_2 = 3.23

Matrices Matrices are defined by a gsl_matrix structure which describes a generalized slice of a block. Like a vector it represents a set of elements in an area of memory, but uses two indices instead of one.

The gsl_matrix structure contains six components:
the two dimensions of the matrix, a physical dimension, a pointer to the memory where the elements of the matrix are stored (data), a pointer to the block owned by the matrix (block), if any, and an ownership flag (owner). The physical dimension determines the memory layout and can differ from the matrix dimension to allow the use of submatrices.

gsl_matrix structure
typedef struct
{
  size_t size1;
  size_t size2;
  size_t tda;
  double * data;
  gsl_block * block;
  int owner;
} gsl_matrix;

Matrices are stored in row-major order, meaning that each row of elements forms a contiguous block in memory. The number of rows is size1. The range of valid row indices runs from 0 to size1-1. Similarly size2 is the number of columns. The range of valid column indices runs from 0 to size2-1. The physical row dimension tda, or trailing dimension, specifies the size of a row of the matrix as laid out in memory.

Example
In the following matrix size1 is 3, size2 is 4, and tda is 8. The physical memory layout of the matrix begins in the top left-hand corner and proceeds from left to right along each row in turn.
00 01 02 03 XX XX XX XX
10 11 12 13 XX XX XX XX
20 21 22 23 XX XX XX XX
Each entry ij marks the element in row i and column j; XX represents unused memory locations.
The functions for allocating and accessing matrices are defined in gsl_matrix.h.

Accessing matrix elements
The functions for accessing the elements of a matrix use the same range checking system as vectors.
double gsl_matrix_get (const gsl_matrix * m, size_t i, size_t j): This function returns the (i,j)-th element of a matrix m.
void gsl_matrix_set (gsl_matrix * m, size_t i, size_t j, double x): This function sets the value of the (i,j)-th element of a matrix m to x.
double * gsl_matrix_ptr (gsl_matrix * m, size_t i, size_t j)
const double * gsl_matrix_const_ptr (const gsl_matrix * m, size_t i, size_t j)
These functions return a pointer to the (i,j)-th element of a matrix m.

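Matrix operations
int gsl_matrix_add (gsl_matrix * a, const gsl_matrix * b): This function adds the elements of matrix b to the elements of matrix a. The result a(i,j) + b(i,j) is stored in a and b remains unchanged. The two matrices must have the same dimensions.
int gsl_matrix_sub (gsl_matrix * a, const gsl_matrix * b): This function subtracts the elements of matrix b from the elements of matrix a. The result is stored in a.
int gsl_matrix_mul_elements (gsl_matrix * a, const gsl_matrix * b): This function multiplies the elements of matrix a by the elements of matrix b, element by element. The result is stored in a.
int gsl_matrix_div_elements (gsl_matrix * a, const gsl_matrix * b): This function divides the elements of matrix a by the elements of matrix b, element by element. The result is stored in a.
int gsl_matrix_scale (gsl_matrix * a, const double x): This function multiplies the elements of matrix a by the constant factor x.
int gsl_matrix_add_constant (gsl_matrix * a, const double x): This function adds the constant value x to the elements of the matrix a.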

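Example
#include <stdio.h>
#include <gsl/gsl_matrix.h>

int main (void)
{
  int i, j;
  gsl_matrix * m = gsl_matrix_alloc (10, 3);

  /* fill every element; the left part of the value expression is garbled
     in the slide, and 0.23 + 100*i + j follows the GSL manual's example */
  for (i = 0; i < 10; i++)
    {
      for (j = 0; j < 3; j++)
        {
          gsl_matrix_set (m, i, j, 0.23 + 100*i + j);
        }
    }

  gsl_matrix_free (m);
  return 0;
}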

Apophenia

Uses of Apophenia
It can be used for simple stats-package-like fitting of models, where the user gathers data, cleans it, and runs a series of regressions.
The library can also serve as input to the design of other systems, like fitting a model and then using the fitted model to generate agents in your simulation, or designing hierarchical models built from simpler base models.

Workflow of a typical fitting-a-model project
1 Read the raw data into the database using apop_text_to_db.
2 Use SQL queries handled by apop_query to massage the data as needed.
3 Use apop_query_to_data to pull some of the data into an in-memory apop_data set.
4 Call a model estimation such as apop_estimate to fit the parameters.
5 Interrogate the returned estimate by dumping it to the screen with apop_model_print or by sending its parameters and variance-covariance matrices to additional tests, or send the model's output as the input to another model.

Apop_data The apop_data structure represents a data set
It joins together a gsl_vector, a gsl_matrix, an apop_name, and a table of strings. It can be used everywhere a gsl_matrix or a gsl_vector can be used.

Apop_data
The structure basically includes six parts:
a vector
a matrix
a grid of text elements
a vector of weights
names for everything: row names, a vector name, matrix column names, text names
a link to a second page of data

Ex: consider data for a weighted OLS regression. It includes an outcome variable in the vector, independent variables in the matrix and text grid, replicate weights, and column names in bold labeling the variables.
Example: [table not preserved in this transcript]

Apophenia will generally assume that one row across all of these elements describes a single observation or data point.
apop_data_get, apop_data_set, and apop_data_ptr are used to access individual elements.
These functions consider the vector to be the -1st column, so using the data set in the example, apop_data_get(sample_set, .row=0, .col=-1) == 1.

Reading data : can be done using apop_text_to_data or apop_text_to_db and then apop_query_to_data.
Subsets of data can be generated as required using  APOP_DATA_ROWS

Means of creating an apop_data set
apop_query_to_text
apop_query_to_data
apop_matrix_to_data
apop_vector_to_data
apop_data_alloc

apop_query_to_text
Dumps the results of a query into an array of strings.
Returns: An apop_data structure with the text element filled.
Arg/fmt: a printf-style SQL query.
If apop_opts.db_name_column matches a column of the output table, then that column is used for row names, and therefore will not be included in the text.

apop_query_to_data
Queries the database, and dumps the result into an apop_data set. If apop_opts.db_name_column is set (it defaults to being "row_names"), and the name of a column matches the name, then the row names are read from that column.
Returns: If no rows are returned, NULL; else an apop_data set with the data in place. Most data will be in the matrix element of the output.

apop_query_to_vector
Queries the database, and dumps the first column of the result into a gsl_vector. It uses apop_query_to_data internally, then throws away all but the first column of the matrix. If apop_opts.db_name_column is set, that column is ignored: it gets put into the names of the apop_data set, and is then thrown away when only the gsl_matrix part of that set is read. If the query returns zero rows of data or no columns, the function returns NULL.
Returns: A gsl_vector holding the first column of the returned matrix.

apop_matrix_to_data
Wraps an apop_data structure around an existing gsl_matrix. The matrix is not copied, but is pointed to by the new apop_data struct.
Parameters: (m) The existing matrix you'd like to turn into an apop_data structure.
Returns: The apop_data structure whose matrix pointer points to the input matrix. The rest of the struct is basically blank.

apop_vector_to_data
Wraps an apop_data structure around an existing gsl_vector. The vector is not copied, but is pointed to by the new apop_data struct.
Parameters: (v) The data vector.
Returns: An allocated, ready-to-use apop_data structure.

apop_data_alloc
Allocates an apop_data structure, to be filled with data.
Three arguments, like apop_data_alloc(2,3,4), give the vector size, matrix rows, and matrix columns. If the first argument is zero, you get a NULL vector.
Two arguments, apop_data_alloc(2,3), would allocate just a matrix, leaving the vector NULL.
One argument, apop_data_alloc(2), would allocate just a vector, leaving the matrix NULL.
Zero arguments, apop_data_alloc(), will produce a basically blank set, with out->matrix == out->vector == NULL.

Get, set and point

Set
apop_data_set(in, row, col, data) is much like the GSL's gsl_matrix_set(in->matrix, row, col, data), but with some differences:
The apop_data set has names, so we can get/set elements using those names. The versions that take a column/row name use apop_name_find for the search.
The apop_data set has both matrix and vector elements.

Set
For the versions that take a column number, column -1 is the vector element.
For the versions that take a column name, the vector is searched last: if the name is not found among the matrix columns but matches the vector name, column -1 is returned.
If you give both a .row and a .rowname, it goes with the name; similarly for .col and .colname.
The column (like all defaults) is zero unless stated otherwise, so apop_data_get(dataset, 1) gets item (1, 0) from the matrix element of dataset.

Structure
int apop_data_set (apop_data * data, const size_t row, const int col, const double val, const char * colname, const char * rowname, const char * page)

Example: set a data element.
E.g. the first three of the following calls all set row 3, column 8, of d to 5:
apop_data_set(d, 3, 8, 5);
apop_data_set(d, .row = 3, .col=8, .val=5);
apop_data_set(d, .row = 3, .colname="Column 8", .val=5);
apop_data_set(d, .row = 3, .colname="Column 8", 5); //invalid: the value doesn't follow the colname.

Parameters:
data: The data set. Must not be NULL.
row: The row number of the desired element. If rowname==NULL, default is zero.
col: The column number of the desired element. -1 indicates the vector. If colname==NULL, default is zero.
rowname: The row name of the desired element. If NULL, use the row number.
colname: The column name of the desired element. If NULL, use the column number.
page: The case-insensitive name of the page on which the element is found. If NULL, use the first page.
val: The value to give the point.
Returns: The value at the given location.

apop_data_get
double apop_data_get (const apop_data * data, const size_t row, const int col, const char * rowname, const char * colname, const char * page)
Returns the data element at the given point.

apop_data_ptr
double * apop_data_ptr (apop_data * data, const int row, const int col, const char * rowname, const char * colname, const char * page)
Gets a pointer to an element of an apop_data set.

All of these functions accept the designated initializers syntax for inputs, as well as plain positional arguments,
e.g. apop_text_to_db("infile.txt", "intable", 0, 1, NULL);

Forming partitioned matrices

The entire data set can be copied.
Two data matrices can be stacked one on top of the other (stack rows).
Two data matrices can be stacked one to the right of the other (stack columns).
Two data vectors can be stacked.
For this we use the apop_data_stack function.

apop_data_stack
apop_data * apop_data_stack (apop_data * m1, apop_data * m2, char posn, char inplace)
Puts the first data set either on top of or to the left of the second data set. By default the function returns a new data set, meaning that at the end of this function, until you apop_data_free() the original data sets, you will be taking up twice as much memory.

Parameters:
m1: the upper/leftmost data set (default = NULL)
m2: the second data set (default = NULL)
posn: If 'r', stack rows of m1's matrix above rows of m2's. If 'c', stack columns of m1's matrix to the left of m2's (default = 'r')
inplace: If 'i', 'y' or 1, use apop_matrix_realloc and apop_vector_realloc to modify m1 in place; otherwise, allocate a new data set, leaving m1 untouched. (default = 'n')
Returns: The stacked data, either in a new apop_data set or in m1.

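Shunting data: copying structures

[Table: methods of converting data between formats. Rows (FROM) and columns (TO) are: text file, DB table, double[], gsl_vector, gsl_matrix, apop_data. The cell entries are not preserved in this transcript.]

Methods of conversion: C - copying, F - function call, Q - querying, P - printing, V - views, S - subelements.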

C – copying
