13/05/2013 Gijs Dekkers Gaëtan de Menten Raphaël Desmet


1 LIAM2: Introduction and demo model
Presentation for the Prévoyance division (pôle Prévoyance) of the Caisse de Dépôt et de Gestion, Rabat, Morocco.
13/05/2013 Gijs Dekkers, Gaëtan de Menten, Raphaël Desmet

2 Introduction to LIAM2
Tool for the development of dynamic microsimulation models with dynamic cross-sectional ageing.
LIAM2 is not itself a microsimulation model (it is not the same thing as MIDAS).
Simulation framework that allows for comprehensive modelling and various simulation techniques.
Prospective / retrospective simulation.
Work in progress:
    Immigration
    Weights
    More sophisticated regressions and simulation techniques
    Speed optimisation
You get it for free!

3 How to get it
Check http://liam2.plan.be
This website contains:
    The LIAM2 executable.
    A synthetic dataset of 20,200 individuals grouped in 14,700 households, in HDF5 format.
    A small model containing:
        Fertility and mortality (aligned)
        Educational attainment level
        Some labour market characteristics
    Documentation: a LIAM2 user guide.
    A ready-to-use "bundle" of Notepad++ integrated with LIAM2 and the synthetic dataset.

4 Overview
Written in Python
    High-level open source language
    Efficient libraries, mostly written in C
Input
    Model description: text file (YAML)
    Alignment: CSV files
Internal data engine
    HDF5: file format and library for storing scientific data (meteorology, astronomy, …)
Output
    HDF5 file, CSV file on demand
    Interactive console

5 Model definition: the simulation file
Declare entities (= data)
    What is modelled? (person, household, enterprise, …)
    Entity characteristics
        fields: what do we know about an individual? What do we want to know? How can we store the data?
            flag: boolean (e.g. alive/dead, male/female)
            discrete/category: integer (e.g. single/married/divorced/…)
            continuous/value: float (e.g. income)
        links: interaction between entities
            same kind: who is the mother?
            different kinds: in what household does the person live?
Globals (= data)
    External time series, e.g. macroeconomic context

6 Model definition: the simulation file
Simulation (= model)
    processes: what happens to the entities in their lives? In what order?
    input: which input file to use?
    output: where is the output written?
    start_period: which period is the first simulated period?
    periods: how many periods do we want to simulate?

7 Simple simulation file
entities:
    person:
        fields:
            # period and id are implicit
            - age: int
            - gender: bool

        processes:
            age: age + 1
            isfemale: gender == True

simulation:
    processes:
        - person: [age, isfemale]
    input:
        file: base.h5
    output:
        file: output.h5
    start_period: 2002    # first simulated period
    periods: 20

8 Liam2 bundled with Notepad++
model/YAML Interactive console

9 LIAM2 bundled with Notepad++
Simulation file
    YAML format, .yml extension, syntax highlighting
    indentation (grouping, levels)
    colon, dash, brackets, double quotes, quotes, ...
    comment (#)
Console
    run: F6
    import: F5
    output
    interactive (history)

10 LIAM2 – demo model
First simulation
    simple entity
    simple functions
    first run
    some output

11 basic simulation setup
demo01.yml

12 Basic setup
Description of the data: entities
    fields:
        name
        type = bool (boolean), int (integer) or float
        initialdata (data from input or new data)
The model definition: processes
    model definition (transformations, regressions, alignment, ...)
Order of the processes: simulation
    database (input, output)
    what processes and when? (model order)
    start_period, number of periods

13 Basic simulation file
entities:
    person:
        fields:
            # period and id are implicit
            - age: int
            - gender: bool

        processes:
            age: age + 1

simulation:
    processes:
        - person: [age]
    input:
        file: base.h5
    output:
        file: output.h5
    start_period: 2002    # first simulated period
    periods: 20

14 Simple simulation (to run the file, press F6)
entities:
    person:
        fields:
            # period and id are implicit
            - age: int
            - gender: bool
            # fields not present in input
            - agegroup: {type: int, initialdata: false}

        processes:
            age: age + 1
            agegroup: 5 * trunc(age / 5)

simulation:
    processes:
        - person: [age, agegroup]
    input:
        file: simple2001.h5
    output:
        file: simulation.h5
    start_period: 2002
    periods: 2

15 Console output
Using simulation file: 'C:\usr\Liam2Suite\Synthetic\demo00.yml'
reading data from C:\usr\Liam2Suite\Synthetic\simple2001.h5 ...
 person ...
period 2002
- loading input data
  * person ... done (0 ms elapsed).
  -> individuals
- 1/2 age ... done (2 ms elapsed).
- 2/2 agegroup ... done (3 ms elapsed).
- storing period data
  * person ... done (2 ms elapsed).
period 2002 done (0.01 second elapsed).
period 2003
top 10 processes:
- agegroup: 0.01 second
- age: 3 ms
total for top 10 processes: 0.01 second

16 Output
Internal format = HDF5 file
Write to the console
    show(expr1[, expr2, …]): evaluates the expressions and shows the result
    dump(expr1[, expr2, …, filter, missing, header]): produces a table with the expressions given as arguments, evaluated over many (possibly all) individuals of the dataset
Write to CSV files
    csv(expr1[, expr2, …, suffix, fname, mode]): writes values to a CSV file
Pivot tables
    groupby(expr1[, expr2, …, filter=None, percent=False])

17 Some functions
Expressions
    Arithmetic operators: +, -, *, /, ** (exponent), % (modulo)
    Comparison operators: <, <=, ==, !=, >=, >
    Boolean operators: and, or, not
    Conditional expressions: if(condition, expression_if_true, expression_if_false)
Mathematical functions
    abs, log, exp, round, trunc, ...
Aggregate functions
    grpcount, grpsum, grpavg, grpstd, grpmax, grpmin
Temporal functions
    lag, value_for_period, duration, tavg, tsum
Random functions
    uniform, normal, randint

18 Simple simulation (to run the file, press F6)
entities:
    person:
        fields:
            # period and id are implicit
            - age: int
            - gender: bool
            # fields not present in input
            - agegroup: {type: int, initialdata: false}

        processes:
            age: age + 1
            agegroup: if(age < 50, 5 * trunc(age / 5), 10 * trunc(age / 10))
            # produces 2 csv files (one per period): "person_20xx.csv"
            # default name for csv-file = {entity}_{period}.csv
            dump_info: csv(dump(id, age, gender))
            show_demography: show(groupby(agegroup, gender))
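The agegroup expression and the groupby(agegroup, gender) pivot table above can be mimicked in plain Python. This is only a sketch of the logic, not LIAM2 code: the list of persons is made up for illustration, and a Counter stands in for the pivot table.

```python
import math
from collections import Counter


def agegroup(age):
    """5-year groups below 50, 10-year groups from 50 on (as in the slide)."""
    if age < 50:
        return 5 * math.trunc(age / 5)
    return 10 * math.trunc(age / 10)


# Hypothetical individuals; in LIAM2 these would come from the input HDF5 file.
persons = [
    {"id": 1, "age": 12, "gender": True},
    {"id": 2, "age": 14, "gender": False},
    {"id": 3, "age": 47, "gender": True},
    {"id": 4, "age": 58, "gender": True},
    {"id": 5, "age": 51, "gender": True},
]

# Count individuals per (agegroup, gender) cell, like a small pivot table.
table = Counter((agegroup(p["age"]), p["gender"]) for p in persons)

for (group, gender), count in sorted(table.items()):
    print(group, gender, count)
```

Ages 51 and 58 land in the same 10-year group (50), while 47 stays in the 5-year group starting at 45.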

19 …
simulation:
    processes:
        - person: [age, agegroup, dump_info, show_demography]
    input:
        file: simple2001.h5
    output:
        file: simulation.h5
    # first simulated period
    start_period: 2002
    periods: 2

20 Interactive console
Welcome to LIAM interactive console.
    help: print this help
    q[uit] or exit: quit the program
    entity [name]: set the current entity (this is required before any query)
    period [period]: set the current period (if not set, uses the last period simulated)
    fields [entity]: list the fields of that entity (or the current entity)
    show is implicit on all commands
>>> period 2002
current period set to 2002
>>> entity person
current entity set to person
>>> grpcount(gender)
10100
>>> grpcount(not gender)

21 Remarks
All output functions can be used both during the simulation and in the interactive console.
Some examples - show
    show(groupby(age, gender, filter=age<=10))
    show(grpcount(age >= 18))
    show(grpcount(not dead), grpavg(age, filter=not dead))
    show("Count:", grpcount(), "\nAverage age:", grpavg(age), "\nAge std dev:", grpstd(age))
Some examples - csv
    csv(grpavg(age))
    csv(period, grpavg(age), fname='avg_income.csv', mode='a')
Some examples - groupby
    groupby(trunc(age/10), gender)
    groupby(trunc(age/10), gender, percent=True)

22 links, init, procedures, choice demo02.yml

23 Links: model interaction
second entity (e.g. household)
links: interaction between entities (e.g. persons, households)
one2many (one household has many persons)

person:
    fields:
        # period and id are implicit
        - age: int
        - gender: bool
        ...
        - hh_id: int

household:
    fields:
        # period and id are implicit
        - nb_persons: int
        - nb_children: int
    links:
        persons: {type: one2many, target: person, field: hh_id}

24 Use the links: aggregate functions
entities:
    household:
        fields:
            # period and id are implicit
            - nb_persons: {type: int, initialdata: false}
            - nb_children: {type: int, initialdata: false}
        links:
            persons: {type: one2many, target: person, field: hh_id}
        processes:
            household_composition:
                - nb_persons: countlink(persons)
                - nb_children: countlink(persons, age < 18)

To use information stored in the linked entities, you have to use aggregate functions:
    countlink (e.g. countlink(persons) gives the number of persons in the household)
    sumlink (e.g. sumlink(persons, income) sums the incomes of all members of a household)
    avglink (e.g. avglink(persons, age) gives the average age of the members of a household)
    minlink, maxlink (e.g. minlink(persons, age) gives the age of the youngest member of the household)
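To make the one2many aggregates more concrete, here is a small Python sketch of what countlink, sumlink and minlink compute for one household. It is not how LIAM2 implements them; the data, the income field values and the helper names are assumptions made for illustration.

```python
# Hypothetical persons; hh_id is the field that carries the link.
persons = [
    {"id": 1, "hh_id": 10, "age": 40, "income": 2000.0},
    {"id": 2, "hh_id": 10, "age": 12, "income": 0.0},
    {"id": 3, "hh_id": 11, "age": 70, "income": 1200.0},
]


def linked(hh_id):
    """All persons whose hh_id points to the given household."""
    return [p for p in persons if p["hh_id"] == hh_id]


def countlink(hh_id, condition=lambda p: True):
    """Number of linked persons, optionally restricted by a condition."""
    return sum(1 for p in linked(hh_id) if condition(p))


def sumlink(hh_id, field):
    """Sum of a field over the linked persons."""
    return sum(p[field] for p in linked(hh_id))


def minlink(hh_id, field):
    """Smallest value of a field among the linked persons."""
    return min(p[field] for p in linked(hh_id))


nb_persons = countlink(10)
nb_children = countlink(10, lambda p: p["age"] < 18)
household_income = sumlink(10, "income")
youngest = minlink(10, "age")
print(nb_persons, nb_children, household_income, youngest)
```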

25 many2one and the “.”-function
entities:
    person:
        fields:
            - age: int
            - gender: bool
            # link fields
            - hh_id: int
        links:
            household: {type: many2one, target: household, field: hh_id}
        processes:
            # produces "person_20xx_info.csv"
            dump_info: csv(dump(id, age, gender, household.nb_persons), suffix='info')

many2one: links an item of the entity to one other item, either in the same entity (e.g. a person to their mother) or in another entity (e.g. a person to their household).
To access the value of a field of a linked item, use: link_name.field_name

[Table: sample of the resulting output, with columns id, age, gender, hh_id, household.nb_persons]

26 many2one and the “.”-function
person:
    fields:
        # period and id are implicit
        - age: int
        - gender: bool
        # link fields
        - mother_id: int
        - partner_id: int
        - hh_id: int
    links:
        mother: {type: many2one, target: person, field: mother_id}
        partner: {type: many2one, target: person, field: partner_id}
        household: {type: many2one, target: household, field: hh_id}
        children: {type: one2many, target: person, field: mother_id}

Some examples:
    mother.age
    mother.mother.age
    age - partner.age
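The dotted expressions above follow links from one item to another. The sketch below shows the idea in Python with a dictionary of persons keyed by id. The data, the helper names and the choice of -1 for an unset link are assumptions for illustration, not LIAM2 internals.

```python
UNSET = -1  # assumed marker for "no link"

# Hypothetical persons keyed by id.
persons = {
    1: {"age": 70, "mother_id": UNSET, "partner_id": UNSET},
    2: {"age": 45, "mother_id": 1, "partner_id": 3},
    3: {"age": 48, "mother_id": UNSET, "partner_id": 2},
    4: {"age": 20, "mother_id": 2, "partner_id": UNSET},
}


def follow(person_id, *links):
    """Follow a chain of many2one links; return None if a link is unset."""
    current = person_id
    for link in links:
        target = persons[current][link + "_id"]
        if target == UNSET:
            return None
        current = target
    return current


def linked_field(person_id, name, *links):
    """Value of a field on the item reached through the given links."""
    target = follow(person_id, *links)
    return None if target is None else persons[target][name]


mother_age = linked_field(4, "age", "mother")                  # mother.age
grandmother_age = linked_field(4, "age", "mother", "mother")   # mother.mother.age
age_gap = persons[2]["age"] - linked_field(2, "age", "partner")  # age - partner.age
print(mother_age, grandmother_age, age_gap)
```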

27 Simulation: init - processes
simulation:
    init:
        - household: [init_region_id, household_composition]
    processes:
        - household: [household_composition]
        - person: [ageing, dump_info]
    input:
        file: simple2001.h5
    output:
        file: simulation.h5
    # first simulated period
    start_period: 2002
    periods: 2

init: executes the processes in start_period - 1 (here 2001) to initialise the household variables
processes: executes in 2002, 2003
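The order of execution described above can be written out as a short Python sketch: init processes run once for start_period - 1, then the regular processes run for each simulated period. The schedule function is a hypothetical helper, not part of LIAM2.

```python
def schedule(start_period, periods, init, processes):
    """Return the (period, process) pairs in the order they would run."""
    # init processes run once, in the period just before the first simulated one
    steps = [(start_period - 1, name) for name in init]
    # regular processes run in every simulated period, in the listed order
    for period in range(start_period, start_period + periods):
        steps.extend((period, name) for name in processes)
    return steps


plan = schedule(
    start_period=2002,
    periods=2,
    init=["init_region_id", "household_composition"],
    processes=["household_composition", "ageing", "dump_info"],
)

for period, name in plan:
    print(period, name)
```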

28 Simulation: procedures – local variables
processes:
    ageing:
        - age: age + 1
        - juniors: 5 * trunc(age / 5)
        - plus50: 10 * trunc(age / 10)
        - agegroup: if(age < 50, juniors, plus50)
    dump_info: csv(dump(id, age, gender, hh_id, household.nb_persons, mother.age, partner.age), suffix='info')
    show_demography: show(groupby(agegroup, gender))

procedures
    single process (e.g. dump_info)
    multiple processes (e.g. ageing)
local variables
    temporary: only available within the procedure that defines them (here, ageing)
    not stored (e.g. juniors and plus50 in the ageing procedure)

29 Stochastic changes I: probabilistic simulation
entities:
    household:
        fields:
            # period and id are implicit
            - nb_persons: {type: int, initialdata: false}
            - nb_children: {type: int, initialdata: false}
            - region_id: {type: int, initialdata: false}
        links:
            persons: {type: one2many, target: person, field: hh_id}
        processes:
            init_region_id:
                - region_id: choice([0, 1, 2, 3], [0.1, 0.2, 0.3, 0.4])

choice
    region_id: 10% chance to get 0, 20% for 1, 30% for 2 and 40% for 3
    beware: the probabilities must sum to 100%
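The behaviour of choice can be approximated with Python's standard library. The sketch below is an illustration only: the function name is reused for readability, the check on the sum of probabilities mirrors the warning on the slide, and the seed and sample size are arbitrary.

```python
import random


def choice(values, probabilities, rng):
    """Draw one value with the given probabilities (which must sum to 1)."""
    if abs(sum(probabilities) - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    return rng.choices(values, weights=probabilities, k=1)[0]


rng = random.Random(12345)  # arbitrary seed, for a repeatable illustration
draws = [choice([0, 1, 2, 3], [0.1, 0.2, 0.3, 0.4], rng) for _ in range(10000)]

# Observed share of each region; should be close to the requested probabilities.
shares = {value: draws.count(value) / len(draws) for value in [0, 1, 2, 3]}
print(shares)
```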

30 regressions, macros, new, remove
demo03.yml

31 Stochastic changes II: behavioural equations
Logit:
    logit_regr(expr, filter=None, align=percentage)
    logit_regr(expr, filter=None, align='filename.csv')
Alignment:
    align(expr, [take=take_filter,] [leave=leave_filter,] fname='filename.csv')
Continuous (expr + normal(0, 1) * mult + error_var):
    cont_regr(expr, filter, mult, error_var)
Clipped continuous (always positive):
    clip_regr(expr, filter, mult, error_var)
Log continuous (exponential of continuous):
    log_regr(expr, filter, mult, error_var)
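The formulas behind these functions can be sketched for a single individual in Python. This is a simplified reading of the slide, not LIAM2's implementation: filters and alignment are left out, the unaligned logit is assumed to select an individual when a uniform draw falls below the logistic probability, and clipping is assumed to replace negative values by 0.

```python
import math
import random


def logit_prob(expr):
    """Logistic transform of the linear predictor."""
    return 1.0 / (1.0 + math.exp(-expr))


def logit_regr(expr, rng):
    """Assumed unaligned behaviour: True with probability logit_prob(expr)."""
    return rng.random() < logit_prob(expr)


def cont_regr(expr, mult, error_var, rng):
    """Continuous: expr + normal(0, 1) * mult + error_var (as on the slide)."""
    return expr + rng.gauss(0, 1) * mult + error_var


def clip_regr(expr, mult, error_var, rng):
    """Clipped continuous: negative results are replaced by 0 (assumption)."""
    return max(0.0, cont_regr(expr, mult, error_var, rng))


def log_regr(expr, mult, error_var, rng):
    """Log continuous: exponential of the continuous result."""
    return math.exp(cont_regr(expr, mult, error_var, rng))


rng = random.Random(2013)  # arbitrary seed
print(logit_prob(0.0), cont_regr(1.5, 0.2, 0.0, rng), log_regr(0.0, 0.1, 0.0, rng))
```

With expr = 0.0, as in the birth and death examples of the demo model, the unaligned probability is 0.5; it is the alignment file that then drives how many individuals are selected.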

32 logit + align example
processes:
    ageing:
        - age: age + 1
    birth:
        - to_give_birth: logit_regr(0.0, filter=not gender and (age >= 15) and (age <= 50), align='al_p_birth.csv')

logit_regr(expr, filter, align)
    expr
    filter: select individuals from the entity
    align: apply alignment using al_p_birth.csv

[Table: al_p_birth.csv, one row per age (15, 16, 17, ...) and one column per period (2002 to 2007)]
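One way to picture alignment is as a selection of the highest-scoring individuals until a target proportion is reached. The Python sketch below works under that assumption for a single age/period cell; the scores and the 40% target are invented, and the real procedure may differ in details such as rounding and tie handling.

```python
def align(scores, proportion):
    """Select the top-scoring individuals so that `proportion` of them is True."""
    n_take = round(proportion * len(scores))
    # indices of individuals, from highest to lowest score
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    chosen = set(ranked[:n_take])
    return [i in chosen for i in range(len(scores))]


# Hypothetical scores for five women of the same age in the same period.
scores = [0.2, 0.9, 0.5, 0.7, 0.1]
selected = align(scores, 0.4)  # hypothetical target: 40% give birth
print(selected)
```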

33 macros: easier to read, maintain
Without macros:

processes:
    ageing:
        - age: age + 1
    birth:
        - to_give_birth: logit_regr(0.0, filter=not gender and (age >= 15) and (age <= 50), align='al_p_birth.csv')

With macros:

person:
    fields:
        - age: int
        . . .
    macros:
        MALE: True
        FEMALE: False
        ISMALE: gender
        ISFEMALE: not gender
    processes:
        ageing:
            - age: age + 1
        birth:
            - to_give_birth: logit_regr(0.0, filter=ISFEMALE and (age >= 15) and (age <= 50), align='al_p_birth.csv')

macros
    defined at the entity level
    re-evaluated on each execution

34 Life cycle functions – new – create new entities
birth:
    - to_give_birth: logit_regr(0.0, filter=ISFEMALE and (age >= 15) and (age <= 50), align='al_p_birth.csv')
    - new('person', filter=to_give_birth,
          mother_id=id,
          hh_id=hh_id,
          age=0,
          partner_id=UNSET,
          civilstate=SINGLE,
          gender=choice([MALE, FEMALE], [0.51, 0.49]))

new
    entity name: what to create (the same entity or another one, e.g. a household on marriage)
    filter: who creates a new item
    set initial values for a selection of variables

35 Life cycle functions – remove – remove entities
death:
    - dead: if(ISMALE,
               logit_regr(0.0, align='al_p_dead_m.csv'),
               logit_regr(0.0, align='al_p_dead_f.csv'))
    - civilstate: if(partner.dead, WIDOW, civilstate)
    - partner_id: if(partner.dead, UNSET, partner_id)
    - show('Avg age of dead men', grpavg(age, filter=dead and ISMALE))
    - show('Avg age of dead women', grpavg(age, filter=dead and ISFEMALE))
    - show('Widows', grpsum(ISWIDOW))
    - remove(dead)

remove
    filter: who has to be removed?
    The item is removed from the entity set
        No data is available for that period and later
        Historical data is still accessible
    Links must be cleaned manually if necessary

36 Remove empty households
entities:
    household:
        fields:
            - nb_persons: {type: int, initialdata: false}
        links:
            persons: {type: one2many, target: person, field: hh_id}
        processes:
            household_composition:
                - nb_persons: countlink(persons)
                - nb_children: countlink(persons, age < 18)
            clean_empty: remove(nb_persons == 0)
. . .
simulation:
    processes:
        - person: [list of processes]
        - household: [household_composition, clean_empty]

37 Debugging possibilities
show and dump functions
    skip_shows: if set to True, disables all show() functions
interactive console
    period
    entity
    output: aggregate, groupby functions
breakpoint
    breakpoint()
    breakpoint(2021)
    step (or s)
    resume (or r)
random_seed
    fixes the random seed, so that several runs of a simulation use the same random numbers
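The effect of fixing the random seed can be shown with Python's own random module. This is an analogy for random_seed rather than LIAM2 code: the function, the probability and the seeds are made up for illustration.

```python
import random


def simulate_births(n_women, probability, seed):
    """Draw one True/False outcome per woman from a generator with a fixed seed."""
    rng = random.Random(seed)
    return [rng.random() < probability for _ in range(n_women)]


run_a = simulate_births(1000, 0.1, seed=42)
run_b = simulate_births(1000, 0.1, seed=42)  # same seed: identical outcomes
run_c = simulate_births(1000, 0.1, seed=43)  # different seed: different draws

print(run_a == run_b, run_a == run_c)
```

Two runs with the same seed produce the same sequence of draws, which makes it easier to tell whether a change in results comes from the model or from chance.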

38 matching, change links demo04.yml

39 Matching - aka Marriage market
Matches individuals from subset 1 with individuals from subset 2:
    Give each individual in subset 1 a particular order (orderby)
    For each individual in subset 1, in that order, compute the score of all (still unmatched) individuals in subset 2
    Take the best score

matching(set1filter=boolean_expr,
         set2filter=boolean_expr,
         orderby=difficult_match,
         score=coef1 * field1 + coef2 * other.field)
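The matching steps listed above amount to a greedy procedure, sketched here in Python. It assumes that individuals with the highest orderby value are handled first; the score function (closeness in age) and the data are hypothetical, chosen only to keep the example small.

```python
def matching(set1, set2, orderby, score):
    """Greedy matching: treat set1 in decreasing `orderby`, pick the best-scoring
    still-unmatched individual of set2 for each. Returns {id: partner_id}."""
    unmatched = list(set2)
    pairs = {}
    for individual in sorted(set1, key=orderby, reverse=True):
        if not unmatched:
            break  # nobody left to match with
        best = max(unmatched, key=lambda other: score(individual, other))
        pairs[individual["id"]] = best["id"]
        pairs[best["id"]] = individual["id"]
        unmatched.remove(best)
    return pairs


# Hypothetical candidates on the marriage market.
women = [{"id": 1, "age": 25}, {"id": 2, "age": 60}]
men = [{"id": 3, "age": 27}, {"id": 4, "age": 58}, {"id": 5, "age": 40}]

avg_male_age = sum(m["age"] for m in men) / len(men)

pairs = matching(
    set1=women,
    set2=men,
    # the further from the average male age, the harder to match
    orderby=lambda w: abs(w["age"] - avg_male_age),
    # hypothetical score: partners close in age score higher
    score=lambda w, m: -abs(w["age"] - m["age"]),
)
print(pairs)
```

Handling the hardest-to-match individuals first gives them the widest pool of candidates, which is the rationale behind the orderby argument.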

40 Marriage
marriage:
    - in_couple: ISMARRIED
    - to_couple: if((age >= 18) and (age <= 90) and not in_couple,
                    if(ISMALE,
                       logit_regr(0.0, align='al_p_mmkt_m.csv'),
                       logit_regr(0.0, align='al_p_mmkt_f.csv')),
                    False)
    - difficult_match: if(to_couple and ISFEMALE,
                          abs(age - grpavg(age, filter=to_couple and ISMALE)),
                          nan)
    - partner_id: if(to_couple,
                     matching(set1filter=ISFEMALE, set2filter=ISMALE,
                              score= * other.age * other.age ** 2 ...
                              orderby=difficult_match),
                     partner_id)
    - justcoupled: to_couple and (partner_id != UNSET)
    - civilstate: if(justcoupled, MARRIED, civilstate)

41 New links, change links
marriage:
    - in_couple: ISMARRIED
    ...
    - civilstate: if(justcoupled, MARRIED, civilstate)
    - newhousehold: new('household', filter=justcoupled and ISFEMALE,
                        region_id=choice([0, 1, 2, 3], [0.1, 0.2, 0.3, 0.4]))
    - hh_id: if(justcoupled,
                if(ISMALE, partner.newhousehold, newhousehold),
                hh_id)
    - csv(dump(id, age, gender, partner.id, partner.age, partner.gender, hh_id, filter=justcoupled), suffix='new_couples')

new link
    change the value of the linked field

42 break links, lag demo05.yml

43 Remove links
divorce:
    - agediff: if(ISFEMALE and ISMARRIED, age - partner.age, 0)
    # select females to divorce
    - divorce: logit_regr( * household.nb_children * dur_in_couple * agediff * agediff ** ,
                          filter=ISFEMALE and ISMARRIED and (dur_in_couple > 0),
                          align='al_p_divorce.csv')
    # break link to partner
    - to_divorce: divorce or partner.divorce
    - partner_id: if(to_divorce, UNSET, partner_id)
    - civilstate: if(to_divorce, DIVORCED, civilstate)
    - dur_in_couple: if(to_divorce, 0, dur_in_couple)
    # move out males
    - hh_id: if(ISMALE and to_divorce,
                new('household', region_id=household.region_id),
                hh_id)

44 globals, regr + align demo06.yml

45 1. Graduate people
ineducation:
    # unemployed if graduated
    - workstate: if(ISSTUDENT and
                    (((age >= 16) and IS_LOWER_SECONDARY_EDU) or
                     ((age >= 19) and IS_UPPER_SECONDARY_EDU) or
                     ((age >= 24) and IS_TERTIARY_EDU)),
                    UNEMPLOYED,
                    workstate)
    - show('num students', grpsum(ISSTUDENT))

46 2. Retire people
globals:
    periodic:
        - WEMRA: float

# retire
- workstate: if(ISMALE,
                if((age >= 65), RETIRED, workstate),
                if((age >= WEMRA), RETIRED, workstate))

globals
    variables that do not relate to any particular entity
    periodic globals can have a different value for each period
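A periodic global behaves like a lookup table indexed by period. The Python sketch below applies the retirement rule from the slide with such a table. The WEMRA values and the workstate codes are invented for illustration; only the rule itself (65 for men, WEMRA for women) comes from the slide.

```python
# Hypothetical values of the periodic global, one per period.
WEMRA = {2002: 62.0, 2003: 63.0, 2004: 64.0}

# Hypothetical workstate codes.
INWORK, RETIRED = 1, 4


def retire(person, period):
    """Return the new workstate: men retire at 65, women at WEMRA[period]."""
    threshold = 65 if person["is_male"] else WEMRA[period]
    return RETIRED if person["age"] >= threshold else person["workstate"]


woman = {"is_male": False, "age": 62, "workstate": INWORK}
man = {"is_male": True, "age": 64, "workstate": INWORK}

print(retire(woman, 2002), retire(woman, 2003), retire(man, 2002))
```

The same woman retires under the 2002 value but not under the 2003 value, which is the point of letting the global vary by period.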

47 3. Pick people … to work in 2002
inwork:
    - work_score: UNSET
    # men
    - work_score: if(ISMALE and (age > 15) and (age < 65) and ISINWORK,
                     logit_score( * age * age ** * age **3 * ISMARRIED ),
                     work_score)
    - work_score: if(ISMALE and (age > 15) and (age < 50) and ISUNEMPLOYED,
                     logit_score( * age * age ** * age **3 ),
                     work_score)
    # women
    # align on Number of Workers / Population by age class
    - work: if((age > 15) and (age < 65),
               if(ISMALE,
                  align(work_score, leave=ISSTUDENT or ISRETIRED, fname='al_p_inwork_m.csv'),
                  align(work_score, leave=ISSTUDENT or ISRETIRED, fname='al_p_inwork_f.csv')),
               False)
    - workstate: if(work, INWORK, workstate)
    - workstate: if(not work and lag(ISINWORK), -1, workstate)

48 4. Pick people … to be unemployed in 2002 + 5. Remain …
unemp_process:
    - unemp_score: -1
    - unemp_condition: (age > 15) and (age < 65) and not ISINWORK
    # Probability of being unemployed from being unemployed previously
    - unemp_score: if(unemp_condition and lag(ISUNEMPLOYED),
                      logit_score( * age * age ** ),
                      unemp_score)
    # Probability of being unemployed from being inwork previously
    - unemp_score: if(unemp_condition and lag(ISINWORK),
                      logit_score( * age * age ** ),
                      unemp_score)
    # Alignment of unemployment based on those not selected by inwork
    # [Number of new unemployed / (Population - Number of Workers)] by age
    # The condition below must correspond to the denominator above
    - unemp: if((age > 15) and (age < 65) and not ISINWORK,
                align(unemp_score, leave=ISSTUDENT or ISRETIRED, fname='al_p_unemployed.csv'),
                False)
    - workstate: if(unemp, UNEMPLOYED, workstate)
    - workstate: if((workstate == -1) and not unemp, OTHERINACTIVE, workstate)

49 import data demo_import.yml

50 Import data (to run the file, press F5)
# this is an "import" file. To use it press F5 in liam2 environment, or run
# the following command in a console:
# INSTALL_PATH\liam2 import demo_import.yml
output: simple2001.h5

entities:
    person:
        path: input\person.csv
        fields:
            # period and id are implicit
            - age: int
            - gender: bool
            - ...

    household:
        path: input\household.csv
        # if fields are not specified, they are all imported

51 Optional
globals:
    periodic:
        path: input\globals_transposed.csv
        transposed: true

entities:
    person:
        path: input\person.csv
        fields:
            - age: int
            - gender: bool
        # if you want to keep your csv files intact but you use different names
        # in your simulation than in the csv files, you can specify name changes
        # here. The format is: "newname: oldname"
        oldnames:
            gender: male
        # if you want to invert the value of some boolean fields (True -> False
        # and False -> True), add them to the "invert" list below.
        invert: [gender]
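The effect of oldnames and invert during import can be imitated with Python's csv module. This is a sketch of the transformation only, not of LIAM2's importer: the CSV content is invented, and the helper simply renames the column and flips the boolean.

```python
import csv
import io

# Hypothetical content of input\person.csv, where the column is called "male".
CSV_TEXT = "period,id,age,male\n2001,1,34,True\n2001,2,29,False\n"

OLDNAMES = {"gender": "male"}  # format: newname -> oldname
INVERT = ["gender"]            # boolean fields whose value must be flipped


def import_rows(text, oldnames, invert):
    """Rename columns according to `oldnames`, then invert the listed booleans."""
    rename = {old: new for new, old in oldnames.items()}
    rows = []
    for raw in csv.DictReader(io.StringIO(text)):
        row = {rename.get(key, key): value for key, value in raw.items()}
        for name in invert:
            row[name] = not (row[name] == "True")
        rows.append(row)
    return rows


rows = import_rows(CSV_TEXT, OLDNAMES, INVERT)
for row in rows:
    print(row["id"], row["age"], row["gender"])
```

After the import, the field is called gender and holds the opposite of the original male column, so the CSV file itself does not need to be edited.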

