A starting point for: Using simulation in parallel computing for faster sample size calculations in complex random effects models
Toni Price, University of Bristol

MLPowSim
Developed in a separate ESRC-funded project
Generates both MLwiN macro code and R language code for performing sample size calculations on multilevel models
Works for a selection of multilevel nested and crossed designs
Text-based interface
Uses C code to gather user input and generate output

Initial objective: Use MLPowSim as a basis and extend to support a broader range of models
  – Good starting point, but would benefit from an automated way of testing that generated code matches expected output (especially as new and more complex models are added)
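One way such a check on generated code could look is sketched below. It is only an illustration: `generate_r_code` is a hypothetical stand-in for a code generator (it is not MLPowSim's actual output), and `diff_lines` is an assumed helper that compares generated text with a stored expectation line by line.

```ruby
# Hypothetical stand-in for a code generator: returns R code as a string.
# The generated lines are illustrative, not what MLPowSim really emits.
def generate_r_code(n_sims:, seed:)
  <<~R
    set.seed(#{seed})
    n_sims <- #{n_sims}
  R
end

# Compare generated code with a stored expectation, line by line, and
# return a description of every line that differs (empty when they match).
def diff_lines(expected, actual)
  exp_lines = expected.lines(chomp: true)
  act_lines = actual.lines(chomp: true)
  count = [exp_lines.size, act_lines.size].max
  (0...count).each_with_object([]) do |i, diffs|
    next if exp_lines[i] == act_lines[i]
    diffs << "line #{i + 1}: expected #{exp_lines[i].inspect}, got #{act_lines[i].inspect}"
  end
end

# Stored expectation; in practice this might live in a file kept with the tests.
EXPECTED = <<~R
  set.seed(1)
  n_sims <- 1000
R

diffs = diff_lines(EXPECTED, generate_r_code(n_sims: 1000, seed: 1))
puts diffs.empty? ? "generated code matches expected output" : diffs
```

Keeping one stored expectation per model would let each newly added model bring its own test case.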

First step
Put into a cohesive framework:
Streamline duplicated code (e.g. for user input which is similar across different models)
  – Also improves code maintenance (e.g. bug fixes impacting fewer lines of code)
Improve input validation
  – Makes for a better user experience and reduces crashes
Automate testing of generated code and results
Add multiple user interfaces, e.g. command line / file input / web-based
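A minimal sketch of how validation could be written once and shared by every input route is given below. The rule table and the `validate` function are assumptions made for illustration; the parameter names are borrowed from the file input examples, and the specific rules are guesses rather than MLPowSim's real checks.

```ruby
# Hypothetical sketch: a single validation routine that the command-line,
# file and Web interfaces could all call. Rules here are illustrative only.

# Each rule maps a parameter name to [description, check].
RULES = {
  "sig_level" => ["must be a number strictly between 0 and 1",
                  ->(v) { v.is_a?(Numeric) && v > 0 && v < 1 }],
  "n_sims"    => ["must be a positive integer",
                  ->(v) { v.is_a?(Integer) && v > 0 }],
  "n_levels"  => ["must be a positive integer",
                  ->(v) { v.is_a?(Integer) && v > 0 }]
}.freeze

# Returns a list of human-readable error messages (empty when input is valid).
def validate(params)
  RULES.each_with_object([]) do |(name, (description, check)), errors|
    if !params.key?(name)
      errors << "#{name} is missing"
    elsif !check.call(params[name])
      errors << "#{name} #{description}"
    end
  end
end

puts validate({ "sig_level" => 2, "n_sims" => 1000 })
```

Collecting all messages instead of stopping at the first problem lets an interface report every issue in one pass.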

Ruby is …
Much like Python in a number of ways
Cross-platform
A good choice for metaprogramming
Excellent for text processing
… though in the end boils down to personal preference

… moving to Ruby
In the words of the official Ruby site (http://www.ruby-lang.org/en/), Ruby is:
"A dynamic, open source programming language with a focus on simplicity and productivity. It has an elegant syntax that is natural to read and easy to write."
(… I agree!)

Input methods
Command line
  – Current input method
File input
  – Useful during development
  – Facilitates automated testing
Web interface
  – Familiar mode of input
  – Easy to use

File input – Example for a 1-level model

    # Input params
    #
    # Example 1 (p. 8 in MLPowSim user manual)
    # MLwiN code output
    general:
      output_lang: mlwin
      rnd_num_seed: 1
      sig_level: 0.025
      n_sims: 1000
    model:
      n_levels: 1
      response_type: normal
      est_method: igls
      include_fixed_intercept: yes
      n_explanatory_vars: 0
    estimates:
      beta_0: -0.140
      sigma_sq_e: 1.051
    sample_size:
      level_1:
        low: 20
        hi: 600
        step: 20

File input – Example for a 2-level model

    # Input params
    #
    # Example 8 (p. 39 in MLPowSim user manual)
    # MLwiN code output
    general:
      output_lang: mlwin
      rnd_num_seed: 1
      sig_level: 0.025
      n_sims: 1000
    model:
      n_levels: 2
      is_balanced: yes
      structure: nested #=> nested | cross-classified
      response_type: normal
      est_method: igls
      include_fixed_intercept: yes
      include_random_intercept: yes
      n_explanatory_vars: 0
    estimates:
      beta_0: -0.177
      sigma_sq_u: 0.151
      sigma_sq_e: 0.916
    sample_size:
      level_2:
        low: 10
        hi: 50
        step: 10
      level_1:
        low: 10
        hi: 60
        step: 10
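The input files above resemble YAML. Assuming they are YAML (the slides do not say so), a file like this could be read with Ruby's standard library as sketched below. The nesting of the keys and the `sample_sizes` helper are assumptions; the expansion of low/hi/step follows the "400 to 600 in steps of 100" convention used later in the talk.

```ruby
require "yaml"

# Abbreviated copy of the 2-level example. The nesting is an assumption,
# since the original layout is not visible in the slide text.
INPUT = <<~YAML
  general:
    output_lang: mlwin
    sig_level: 0.025
    n_sims: 1000
  model:
    n_levels: 2
    structure: nested
  sample_size:
    level_2: { low: 10, hi: 50, step: 10 }
    level_1: { low: 10, hi: 60, step: 10 }
YAML

# Expand a low/hi/step specification into the list of sample sizes it covers.
def sample_sizes(spec)
  spec["low"].step(spec["hi"], spec["step"]).to_a
end

params  = YAML.safe_load(INPUT)
level_2 = sample_sizes(params["sample_size"]["level_2"])
level_1 = sample_sizes(params["sample_size"]["level_1"])

puts "Significance level: #{params['general']['sig_level']}"
puts "Level-2 sample sizes: #{level_2.inspect}"
puts "Level-1 sample sizes: #{level_1.inspect}"
```

Because the same parsed structure could also be built from command-line answers or a Web form, a file reader like this fits the goal of several interfaces feeding one code generator.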

Advantages of adding a Web interface
More accessible
  – No download required
  – Indexed by search engines
  – Cross-platform (Windows/Mac/Linux)
Up-to-date version available as soon as deployed
  – Centralised bug fixes
  – New features
No distribution overhead
Opportunity to collect usage information
  – E.g. model parameters
… aligned with e-Stat objectives

Disadvantages of Web interface
Constrained by browser functionality
Need to be online to use it
Needs hosting resources
… fine for code-generation app as it stands, but would be too resource-intensive to run simulations and model-fitting on server

[Demo of command-line and Web-based interfaces for MLPowSim]

Improving speed
Another, parallel (so to speak) objective is using parallelization to speed up run-time for generated power calculation code
Have taken an initial look at using capabilities of multi-core processors by executing more than one run simultaneously
Exploratory code makes use of Unix (Linux) forking to create sub-processes
This approach will not work on Windows (since Windows does not support forks)
  – Precludes possibility of using this approach for MLwiN
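The forking idea can be sketched in Ruby as follows. This is not the exploratory code from the talk (its language is not stated); `run_job` is a placeholder for one run of generated code, and `run_in_parallel` with its `max_procs` parameter is a hypothetical helper that keeps at most that many child processes alive at once.

```ruby
# Sketch only: run a batch of jobs with at most `max_procs` forked children.
# Unix-only, since Process.fork is not available on Windows.

# Placeholder for one run of generated code. A real job might instead launch
# an R script with system("Rscript", ...); no such script is assumed here.
def run_job(sample_size)
  sleep 0.1
  puts "finished run for sample size #{sample_size} in process #{Process.pid}"
end

# Returns the number of child processes that exited unsuccessfully.
def run_in_parallel(jobs, max_procs: 2)
  running  = 0
  failures = 0
  jobs.each do |job|
    if running >= max_procs
      # Block until any one child exits, which frees a slot.
      _pid, status = Process.wait2
      failures += 1 unless status.success?
      running -= 1
    end
    fork { run_job(job) }
    running += 1
  end
  # Reap the children that are still running.
  running.times do
    _pid, status = Process.wait2
    failures += 1 unless status.success?
  end
  failures
end

failed = run_in_parallel([400, 500, 600], max_procs: 2)
puts "#{failed} run(s) failed"
```

Each child is a separate process with its own memory, so results would have to come back through files or pipes; the sketch only reports exit status.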

Improving speed … contd.
For now, doing tests on R code in Linux
Initial results (very rough, just a starting point):
Model: 1-Level, Normal response, Fixed intercept, No explanatory variables
R code with sample sizes from 400 to 600 in steps of 100 (i.e. 400, 500, 600)

Run Number   Sequential – time elapsed (secs)   Up to 2 processes – time elapsed (secs)
01           17.68667507                        12.83575702
02           17.58543682                        13.25761509
03           17.81376505                        12.79697299
04           17.61011219                        12.83477187
05           18.78166199                        12.75112796
06           17.83562708                        13.84314704
07           21.42540908                        13.58644199
08           17.81408715                        13.47865105
09           22.32437897                        14.23557377
10           22.47374010                        13.32088089
11           22.37987995                        14.26538086
12           20.43303204                        13.26997709
13           17.97804618                        13.83806705
14           19.54052210                        12.82744908
15           17.61324906                        13.11621904
16           17.60317612                        13.17418599
17           17.74178410                        14.06648993
18           18.01628709                        12.95062184
19           17.85149908                        13.40029383
20           18.02299380                        12.85679889
21           17.92268395                        12.86912799

Improving speed … contd.
Summary
Number of runs: 20 (excluding 1st run)
Sequential time (ave): 18.94 secs
Forked time (ave): 13.34 secs
Percentage reduction: 29.58%
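The summary figures can be reproduced from the per-run timings (runs 02 to 21, with run 01 excluded). The sketch below assumes the percentage reduction is taken from the unrounded means, which is what yields 29.58; the variable names are illustrative.

```ruby
# Elapsed times in seconds for runs 02 to 21 (run 01 excluded), as reported.
sequential = [
  17.58543682, 17.81376505, 17.61011219, 18.78166199, 17.83562708,
  21.42540908, 17.81408715, 22.32437897, 22.47374010, 22.37987995,
  20.43303204, 17.97804618, 19.54052210, 17.61324906, 17.60317612,
  17.74178410, 18.01628709, 17.85149908, 18.02299380, 17.92268395
]
forked = [
  13.25761509, 12.79697299, 12.83477187, 12.75112796, 13.84314704,
  13.58644199, 13.47865105, 14.23557377, 13.32088089, 14.26538086,
  13.26997709, 13.83806705, 12.82744908, 13.11621904, 13.17418599,
  14.06648993, 12.95062184, 13.40029383, 12.85679889, 12.86912799
]

# Arithmetic mean of a list of numbers.
def mean(values)
  values.sum / values.size
end

seq_mean  = mean(sequential)
fork_mean = mean(forked)
# Reduction relative to the sequential mean, computed before rounding.
reduction = (seq_mean - fork_mean) / seq_mean * 100

puts format("Number of runs:        %d", sequential.size)
puts format("Sequential time (ave): %.2f secs", seq_mean)
puts format("Forked time (ave):     %.2f secs", fork_mean)
puts format("Percentage reduction:  %.2f%%", reduction)
```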

Where to from here?
… this is just a small start …
Extend MLPowSim to support more models
  – Add test cases for code generation to cope with more models
Add automated tests for verifying actual numerical output
Further develop Web interface
Continue investigating speed improvements through parallelization
