Douglas Jacobsen Bioinformatics Computing Consultant Genepool Modules Setting up your environment at NERSC - 1 - July 29, 2013.


1 Douglas Jacobsen Bioinformatics Computing Consultant Genepool Modules Setting up your environment at NERSC July 29, 2013

2 Topics
1. UNIX Environment Basics
2. Constructing a default environment, dotfiles
3. Introduction to Modules
4. Extension to Modules – ModulesReloaded
5. Using modules interactively
6. Using modules in a batch job
7. Constructing basic modules for your software
8. Constructing pipeline modules

3 Motivation for this training
– The most common tickets at NERSC are issues with environment settings
– /jgi/tools is being retired; old settings need to be changed!
– The modules system on genepool has been updated to ease the transition and future production work
– Example modulefiles are in: /global/projectb/shared/data/training/modules

4 The UNIX Environment
What is it? A key/value store for every process.
What does the UNIX environment do for you?
– Controls which programs you can easily run: PATH
  Many Linux systems have a default PATH of: PATH=/usr/local/bin:/usr/bin:/bin
– Sets up linking paths to allow your programs to run: LD_LIBRARY_PATH
– Controls how your programs run: MANPATH, PKG_CONFIG_PATH, PS1, OMPI_MCA_ras
Really, the environment is a way for you to communicate with your programs
– Useful convenience variables on the command line and in scripts: SCRATCH, NERSC_HOST, BOOST_ROOT

5 The UNIX Environment: The Rules
– Each process has its own environment
– Each process can manipulate its own environment but no others
– A child process inherits its parent’s environment
– A “login” shell reads special “dotfiles” which may reset parts of the environment
[Figure: process tree containing init, bash, ls, memtime, blastx, perl, /bin/sh, cat and sort, with the annotation $data = `cat $file | sort`]
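
The inheritance rules above can be checked directly from a shell. The following is a minimal sketch in bash; the variable names GREETING and LOCAL_ONLY are made up for the demonstration and are not part of the original slides.

```bash
# Hypothetical variable names, used only for this demonstration.
export GREETING="hello"        # exported: part of the environment, inherited by children
LOCAL_ONLY="not exported"      # plain shell variable: not inherited

# A child process sees the exported variable but not the unexported one.
child_view=$(bash -c 'echo "${GREETING:-unset}/${LOCAL_ONLY:-unset}"')
echo "child sees: $child_view"

# A child can change its own copy, but the parent is unaffected.
bash -c 'export GREETING="changed in child"'
echo "parent still has: $GREETING"
```

The same reasoning explains why a script cannot change the environment of the shell that launched it unless it is sourced rather than executed.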

6 Looking at the environment
$ env                  # dump the whole environment
$ echo $NERSC_HOST     # just see NERSC_HOST
$ echo $PATH           # view the compound variable PATH
$ env | grep MODULE    # just variables with ‘MODULE’
We’ll be looking at the environment a lot today; env and echo are two easy ways to interrogate the environment from either bash or tcsh.
What shell are you using? (Hint: check $SHELL)
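
Compound variables such as PATH are easier to read one entry per line. The sketch below, in bash, shows one way to do that; DEMO_MODULE_HINT is a made-up variable that exists only so the grep has something to find.

```bash
# DEMO_MODULE_HINT is a made-up variable so that the grep below has something to find.
export DEMO_MODULE_HINT="example"

# Count the environment entries for that variable (anchored at line start
# to avoid matching other variables that merely contain the name).
matches=$(env | grep -c '^DEMO_MODULE_HINT=')
echo "matching entries: $matches"

# Print a compound variable one entry per line; earlier entries are searched first.
printf '%s\n' "$PATH" | tr ':' '\n'

# The first entry is everything before the first colon.
first_entry=${PATH%%:*}
echo "first PATH entry: $first_entry"
```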

7 Changing the environment
bash (default on genepool):
export MYVAR="test"                # when writing, don’t use ‘$’
echo $MYVAR                        # when reading, use ‘$’
export PATH=$HOME/bin:$PATH        # prepend to your PATH
export MYVAR="${MYVAR}2"           # append ‘2’ to MYVAR
tcsh:
setenv MYVAR "test"
echo $MYVAR
setenv PATH $HOME/bin:$PATH
setenv MYVAR "${MYVAR}2"

8 NERSC Dotfiles – Your default Environment Pt 1
When you first log in (or a batch script runs), a login shell is executed
– A login shell is generated for every job; even if you transmit your environment, the login shell environment is overlaid on top of the transmitted environment
A login shell sources special files in your home directory, your dotfiles
bash users (files evaluated in this order):
– $HOME/.profile (read-only symlink, do not change)
– $HOME/.bash_profile.ext (user customizable)
– $HOME/.bashrc (read-only symlink, do not change)
– $HOME/.bashrc.ext (user customizable)
tcsh users (files evaluated in this order):
– $HOME/.tcshrc (read-only symlink, do not change)
– $HOME/.tcshrc.ext (user customizable)
– $HOME/.login (read-only symlink, do not change)
– $HOME/.login.ext (user customizable)
zsh and ksh execute some dotfiles, but NERSC support is being phased out
/bin/sh does not properly source the dotfiles (BEWARE!)

9 Using Software and the UNIX Environment
Providing large-scale installations of software for many different users on an HPC system presents a number of challenges:
– Different users need different software and use different shells
– Some users need specific versions, including older versions
– All users need to access the software quickly and easily from “everywhere” [network-mounted, non-standard paths]
– Providing a user interface for accessing that software can be challenging
Example: How would you use software installed in /usr/common/jgi/aligners/blast+/
Answer: add /usr/common/jgi/aligners/blast+/2.2.28/bin to PATH
– csh: setenv PATH /usr/common/jgi/aligners/blast+/2.2.28/bin:$PATH
– bash: export PATH=/usr/common/jgi/aligners/blast+/2.2.28/bin:$PATH
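
Why prepending matters can be seen with two versions of the same tool on PATH. This is a minimal sketch in bash, assuming a throwaway tool called mytool created in a temporary directory; neither the tool nor the directory layout comes from genepool.

```bash
# Build two throwaway "installations" of a hypothetical tool in a temporary directory.
demo_root=$(mktemp -d)
for version in 1.0 2.0; do
    mkdir -p "$demo_root/mytool/$version/bin"
    printf '#!/bin/sh\necho "mytool %s"\n' "$version" > "$demo_root/mytool/$version/bin/mytool"
    chmod +x "$demo_root/mytool/$version/bin/mytool"
done

# Prepend both versions; the directory prepended last is searched first.
export PATH="$demo_root/mytool/1.0/bin:$PATH"
export PATH="$demo_root/mytool/2.0/bin:$PATH"
hash -r                      # forget any cached command locations

which_one=$(command -v mytool)
reported=$(mytool)
echo "$which_one -> $reported"

rm -rf "$demo_root"          # clean up the temporary installations
```

Managing this ordering by hand for many packages and versions is the bookkeeping that the modules system takes over.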

10 What are Modules?
A “module” is something that can be loaded or unloaded dynamically into the environment.
– Modules have a name
– Modules have a version, and a module can have many versions
– Modules can have a default version
– To refer to the default version of a module, use the name alone, e.g. module load gcc
– To refer to a specific version of a module, use name/version, e.g. module load gcc/4.8.1

11 Modules Interactive Example
Basic Commands:
module load MODULE [MODULE …]      Load a module
module unload MODULE [MODULE …]    Remove a module
module list                        List all loaded modules
module show MODULE                 See module effects
module avail                       See all modules
module purge                       Remove all modules
Try the following:
– Load the default blast+ module
– Load the latest version of the hdf5 module (hint: not default)
– Unload the above modules but leave the rest intact
– What effects does the jgitools module have?
– What versions of RSeQC are available on genepool? (try using grep)
Why didn’t grep work for the last step?
– module avail | grep RSeQC won’t work
– module communicates with you on stderr (stdout is used internally)

12 module list
Currently Loaded Modulefiles: 1) modules 7) mysql/ ) nsg/ ) PrgEnv-gnu/4.6 3) uge/ ) perl/ ) jgitools/ ) readline/6.2 5) oracle_client/ ) python/ ) gcc/ ) usg-default-modules/1.4
module load blast+
module load hdf5/
module list
Currently Loaded Modulefiles: 1) modules 8) PrgEnv-gnu/4.6 2) nsg/ ) perl/ ) uge/ ) readline/6.2 4) jgitools/ ) python/ ) oracle_client/ ) usg-default-modules/1.4 6) gcc/ ) blast+/ ) mysql/ ) hdf5/
module unload blast+ hdf5
module list
Currently Loaded Modulefiles: 1) modules 7) mysql/ ) nsg/ ) PrgEnv-gnu/4.6 3) uge/ ) perl/ ) jgitools/ ) readline/6.2 5) oracle_client/ ) python/ ) gcc/ ) usg-default-modules/1.4
module -t avail 2>&1 | grep RSeQC
RSeQC/2.3.2
RSeQC/2.3.6(default)
More awkward in tcsh, but possible:
( module -t avail ) |& grep RSeQC
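
The stderr behaviour can be reproduced without the modules system. In the bash sketch below, fake_avail is a hypothetical stand-in for module avail that writes its listing to stderr only; the version strings are the ones shown on the slides.

```bash
# fake_avail is a stand-in for 'module avail': it writes its listing to stderr only.
fake_avail() {
    printf '%s\n' 'RSeQC/2.3.2' 'RSeQC/2.3.6(default)' 'blast+/2.2.26' >&2
}

# A plain pipe carries stdout only, so grep receives nothing.
# (stderr is discarded here just to keep the demonstration output quiet.)
missed=$(fake_avail 2>/dev/null | grep -c RSeQC || true)

# Redirecting stderr into stdout first lets grep see the listing.
found=$(fake_avail 2>&1 | grep -c RSeQC || true)

echo "without 2>&1: $missed match(es); with 2>&1: $found match(es)"
```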

13 Basic Modules Functionality
Modules manipulate the environment
– Loading can:
  Set an environment variable (possibly by replacing)
  Append (or prepend) to a compound environment variable
  Unset an environment variable
  *Can* execute a command (not recommended if the command changes the state of the system)
– ‘module unload’ reverses the effects of the ‘module load’
– Which effects of a module might be irreversible?
Answer:
– setenv won’t restore the environment to its original state
– Multiple modules calling ‘setenv’ or ‘unsetenv’ on the same variable might lead to an inconsistent state (those modules should conflict)
– Executing system calls which change system state (e.g. xhost) is not trivially reversible by unloading the module
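
The first answer above can be illustrated with a small simulation. This bash sketch does not call the modules system; sim_module_load and sim_module_unload are hypothetical functions that imitate a modulefile containing a setenv on a made-up variable, DEMO_EDITOR, under the assumption that unloading a setenv simply removes the variable.

```bash
# Illustration only: these functions imitate a modulefile containing 'setenv DEMO_EDITOR vim'.
export DEMO_EDITOR="emacs"                          # value the user had before loading

sim_module_load()   { export DEMO_EDITOR="vim"; }   # setenv replaces the old value
sim_module_unload() { unset DEMO_EDITOR; }          # unloading removes the variable

sim_module_load
sim_module_unload

after_unload=${DEMO_EDITOR:-(unset)}
echo "after load + unload: $after_unload"           # the original "emacs" is not restored
```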

14 Modules: conflicting and swapping
Some modules are incompatible
– E.g. both wublast and blast+ provide different blastn, blastx, etc. executables
– To prevent these modules from being simultaneously loaded, they conflict
module load wublast
module load blast+
blast+/2.2.26(25):ERROR:150: Module 'blast+/2.2.26' conflicts with the currently loaded module(s) 'wublast/ '
Most of the time, only a single version of a module should be loaded at a time:
– e.g., it doesn’t make sense to load more than one version of gcc
– Try:
module purge    ## cleans everything out
module load gcc
module load gcc/4.8.1
– Error? To change from gcc/4.6.3 (the default) to gcc/4.8.1 (the latest), swap!
module swap gcc gcc/
-or-
module swap gcc/

15 Setting up your own modules
Modules are described by modulefiles
– One version per modulefile, in a directory named for the module
– Collections of modules are found in $MODULEPATH
– Try looking at $MODULEPATH
– Add your own modules directory:
genepool$ mkdir $HOME/modules
genepool$ mkdir $HOME/modules/my_first_module
genepool$ module use $HOME/modules
– Try looking at $MODULEPATH again
genepool$ module avail my_first_module
– Why doesn’t it show up? No modulefiles installed yet… next slide
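
The directory layout described above (one directory per module, one file per version) can be sketched as follows. This bash example uses a temporary directory in place of $HOME/modules, and the modulefile body is a placeholder: /path/to/my_first_module is not a real installation.

```bash
# A temporary directory stands in for $HOME/modules so the sketch does not touch real files.
modules_dir=$(mktemp -d)
mkdir -p "$modules_dir/my_first_module"

# One file per version, named after the version. The contents are a placeholder:
# /path/to/my_first_module is not a real installation.
cat > "$modules_dir/my_first_module/1.0" <<'EOF'
#%Module1.0
set root /path/to/my_first_module/1.0
prepend-path PATH $root/bin
EOF

# Show the resulting layout: modules directory / module name / version file.
layout=$(cd "$modules_dir" && find . -type f | sort)
echo "$layout"
```

On a system where the module command is available, running module use on that directory and then module avail my_first_module should list version 1.0.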

16 Simple modulefile (TOO SIMPLE)
#%Module1.0
##
## Required internal variables
set name gcc
set version 4.6.3
set root /usr/common/usg/languages/$name/$version\_1
## List conflicting modules here
conflict $name
## Software-specific settings exported to user environment
prepend-path PATH $root/bin
prepend-path LD_LIBRARY_PATH $root/lib
prepend-path LD_LIBRARY_PATH $root/lib64
prepend-path PKG_CONFIG_PATH $root/lib/pkgconfig
setenv GCC_DIR $root
WARNING: This example is simplified, do not use in production on genepool. Refer to later ModulesReloaded examples.
Slide annotations:
– #%Module1.0: module identifier string (required)
– ## lines: comments
– set name, set version, set root: internal variables
– conflict $name: don’t load more than one gcc!
– prepend-path and setenv lines: the actual environment adjustments
Modulefiles are written in (somewhat overloaded) TCL.

17 Common Environment Variables in Modules
Modules for software packages commonly set:
– PATH
– LD_LIBRARY_PATH
– PYTHONPATH
– PERL5LIB
Every usg/jgi module for software also sets an environment variable pointing to the base of the distribution:
– E.g. BOOST_ROOT, PERL_DIR, PYTHON_DIR, GIT_PATH
Exercise:
– Load the python module first
– Use ‘module info’ to investigate the effects of: graphviz, RSeQC, smrtanalysis
– Are there commonalities? Differences?
Be VERY careful about manipulating these environment variables!!!

18 Modules have dependencies
– Python needs some of gcc’s libraries
– Perl needs some of gcc’s libraries
– Python also needs readline’s libraries
– For the python module to function, both the gcc and readline modules need to be loaded
– For the perl module to function, the gcc module needs to be loaded

19 Complexity of module dependencies on genepool
Highly inter-connected graph of dependencies
The most highly connected nodes: gcc, perl, python, oracle-jdk, openmpi
Many modules are disconnected from the network, possibly because:
– They are statically compiled
– They only rely on base-system functionality
– Their dependencies haven’t been modelled yet

20 ModulesReloaded
– Automatically checks and loads dependencies
– Automatically unloads orphaned dependencies
– Differentiates between user-loaded modules and auto-loaded modules when manipulating modules
– Does more extensive error checking: modules failing to load return exit status 1 (echo $?)
– Supports “variant” modules: single modulefiles for multiple installations of similar software
– Enables reporting of upcoming changes to the modules system
– Enhances logging capabilities of the modules system

21 ModulesReloaded AutoLoad/Unload
Exercise:
– Start by unloading all modules.
– Load the python module.
– Which modules were loaded?
– Next, load the perl module.
– Which modules are loaded now?
– Now, unload the python module.
– Check module list.
– Finally, unload the perl module.
– Check module list.
– Look at the details of the perl and python modules.

22 ModulesReloaded AutoLoad/Unload
Exercise (with answers):
– Start by unloading all modules. [module purge]
– Load the python module. [module load python]
– Which modules were loaded? [gcc, readline, python]
– Next, load the perl module. [module load perl]
– Which modules are loaded now? [gcc, readline, python, perl]
– Now, unload the python module. [module unload python]
– Check module list. [gcc, perl]
– Finally, unload the perl module. [module unload perl]
– Check module list. [None!]
– Look at the details of the perl and python modules. [module show perl; module show python]

23 ModulesReloaded AutoLoad/Unload In the previous exercise, you should have noticed that the perl and python modules each depended on the gcc module (among others). – The gcc module won’t get unloaded while another loaded module still depends on it
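
The bookkeeping behind this behaviour can be modelled in a few lines. The following bash sketch (bash 4 or later, for associative arrays) is a toy simulation and makes no claim about how ModulesReloaded is implemented; sim_load, sim_unload and sim_list are hypothetical names, and the dependency lists are taken from the earlier slide on dependencies.

```bash
# Toy model of the exercise above; it is not how ModulesReloaded is implemented.
# Dependency lists follow the slides: python needs gcc and readline, perl needs gcc.
declare -A DEPS=( [python]="gcc readline" [perl]="gcc" [gcc]="" [readline]="" )
declare -A LOADED=()    # module name -> "user" (requested) or "auto" (pulled in)

sim_load() {
    local mod=$1 dep
    for dep in ${DEPS[$mod]}; do
        # Auto-load a missing dependency and remember the user did not ask for it.
        if [[ -z ${LOADED[$dep]:-} ]]; then
            LOADED[$dep]="auto"
        fi
    done
    LOADED[$mod]="user"
}

sim_unload() {
    local mod=$1 cand other dep needed
    unset "LOADED[$mod]"
    # Single pass: drop auto-loaded modules that nothing still loaded depends on.
    for cand in "${!LOADED[@]}"; do
        if [[ ${LOADED[$cand]} != auto ]]; then
            continue
        fi
        needed=no
        for other in "${!LOADED[@]}"; do
            for dep in ${DEPS[$other]}; do
                if [[ $dep == "$cand" ]]; then
                    needed=yes
                fi
            done
        done
        if [[ $needed == no ]]; then
            unset "LOADED[$cand]"
        fi
    done
}

# Print the loaded modules, sorted, on one line.
sim_list() {
    local names
    names=$(printf '%s\n' "${!LOADED[@]}" | sort)
    echo $names
}

sim_load python;   after_python=$(sim_list)
sim_load perl;     after_perl=$(sim_list)
sim_unload python; after_unload_python=$(sim_list)
sim_unload perl;   after_unload_perl=$(sim_list)

echo "load python   : $after_python"
echo "load perl     : $after_perl"
echo "unload python : $after_unload_python"
echo "unload perl   : ${after_unload_perl:-(none)}"
```

Recording whether each module was requested by the user or pulled in automatically is what lets the unload step remove readline while keeping gcc for perl.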

24 ModulesReloaded User’s Choice!
Exercise:
– Load the default hmmer module
– Load the repeatmasker module
– Why did that just happen?
– ModulesReloaded tracks which modules the user directly requests (vs. those just loaded as dependencies), and won’t swap or remove them automatically.
– Unload hmmer, then try loading repeatmasker

25 ModulesReloaded Variants
Programming Environments are integrated sets of modules
– They attempt to provide a seamless and coherent build environment, regardless of compiler.
Exercise:
– Purge all your modules.
– Load ‘PrgEnv-gnu’
– Load ‘boost’
– Examine the BOOST_ROOT environment variable
– Swap to ‘PrgEnv-gnu/4.8’
– Examine the BOOST_ROOT environment variable again
https://www.nersc.gov/users/computational-systems/genepool/programming/

26 ModulesReloaded Variants
The ‘boost’ module is a ‘variant’ module
– When loaded, it detects which programming environment (PrgEnv) is loaded
– When the PrgEnv is swapped, the variant module is also reloaded
– A variant module cannot be loaded without its provider (e.g. boost cannot be loaded without some PrgEnv)
Earlier, we had to load python before we could interrogate RSeQC, because RSeQC is a variant on ‘python’ (instead of ‘PrgEnv’)

27 ModulesReloaded Variants
[Figure: diagram relating PrgEnv and compilers to software libraries (and deps). Legend: “Normal” module, PrgEnv-provider module, PrgEnv-client module, default module, non-default module]
Each programming environment provides the ‘PrgEnv’ attribute which is required by the libraries. The PrgEnv meta-modules conflict with each other, but the compilers do not.

28 ModulesReloaded DefaultChange
Changing default module versions may be disruptive to some users. To advertise the change, a warning is communicated by modules.
Example:
– The default version of blast+ is planned to be changed on August 6.
– Load the default blast+ module
– Unload the blast+ module
– Load blast+/ (which is the default)
module load blast+
WARNING: The default version of blast+ will be changing from to on 2013/08/06. Please try blast+/ Please contact with any questions.
module unload blast+
WARNING: The default version of blast+ will be changing from to on 2013/08/06. Please try blast+/ Please contact with any questions.
module load blast+/
The warning is only sent to users accessing the default without specifying a version

29 NERSC Dotfiles – Your default Environment Pt 2
Default modules are loaded in the .bashrc/.tcshrc files
– System files load ‘uge’, ‘nsg’, ‘jgitools’
  uge adds the scheduler
  jgitools puts /jgi/tools/bin into your PATH
– .bashrc loads ‘usg-default-modules’
– usg-default-modules autoloads: PrgEnv-gnu, perl, python, oracle-client, mysql
– Are any additional modules auto-loaded as prerequisites?
You can add your own ‘module load’ commands to .bashrc.ext / .tcshrc.ext
– Do this with care: modules added in the default environment become somewhat infectious

30 NERSC Dotfiles – Your default Environment Pt 2
What happens if a user does the following in their .bashrc.ext file?
module load smrtanalysis
export PERL5LIB=$HOME/perl
export LD_LIBRARY_PATH=/house/groupdirs/randd/lib:$LD_LIBRARY_PATH
– Is something wrong here?
– Answer: PERL5LIB shouldn’t be replaced. Replacing it invalidates the effects of the smrtanalysis module. Instead, use:
export PERL5LIB=$HOME/perl:$PERL5LIB
What about this:
export PATH=/jgi/tools/bin:$PATH
– Is there something wrong with this?
– Answer: The jgitools module is loaded very early in the environment and already implements this functionality. The many things in /jgi/tools/bin may override other settings you want.

31 NERSC Dotfiles – Your default Environment Pt 2
Best Practices:
– Do put your settings in a “genepool”-only section of .bashrc.ext / .tcshrc.ext
if [ "$NERSC_HOST" == "genepool" ]; then
…
fi
– Limit the number of modules you load by default; it can complicate handing off batch scripts later
– Do not replicate module functionality, i.e. don’t set environment variables with paths into /usr/common directly
– Only add to variables like PATH, LD_LIBRARY_PATH, PYTHONPATH, PERL5LIB, as these are commonly set by modules
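
A host-guarded section that only adds to compound variables might look like the bash sketch below. It is an illustration under assumptions rather than a recommended dotfile: the fallback assignment of NERSC_HOST exists only so the sketch runs outside NERSC, and the $HOME/perl and $HOME/bin directories are examples.

```bash
# On NERSC systems NERSC_HOST is set for you; the fallback below exists only so
# this sketch can run elsewhere.
: "${NERSC_HOST:=genepool}"

if [ "$NERSC_HOST" = "genepool" ]; then
    # Add to compound variables rather than replacing them.
    # ${VAR:+:$VAR} expands to ":$VAR" only when VAR is non-empty, which avoids
    # a stray trailing colon when the variable was previously unset.
    export PERL5LIB="$HOME/perl${PERL5LIB:+:$PERL5LIB}"
    export PATH="$HOME/bin:$PATH"
fi
```

Whatever a previously loaded module placed in PERL5LIB is kept, with the personal directory searched first.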

32 Using Modules in your Work

33 Using Modules Interactively Use modules precisely as we have been in the exercises Modules are great for interactive use!

34 Using Modules in Batch Scripts
#!/bin/bash -l
#$ -l ram.c=10G
#$ -l h_rt=8:00:00
set -e
module purge
module load PrgEnv-gnu/4.6
module load uge
module load blast+/
module load python/2.7.4
#…. Run your programs here ….
Slide annotations:
– #!/bin/bash -l: ensures the login environment is initialized
– #$ lines: UGE options
– set -e: kill the script if any command gives a non-zero exit status
– module purge, module load: clear all the modules, and then reload all needed modules by version

35 Using Modules in Batch Scripts
Using this approach:
– Your batch script will terminate if something goes wrong (non-zero exit status)
– No extraneous modules will be loaded, ensuring exactly the calculation you want to be run is run with no surprises
– Using the precise version numbers means your script will work even after new defaults are installed
– Purging the modules first will allow your script to work in other users’ hands without requiring anybody to change their dotfiles
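
The effect of set -e can be seen in isolation. This bash sketch runs a tiny script in a child shell; the false command is a stand-in for any program, or module load, that fails.

```bash
# Run a tiny script in a child bash with 'set -e'; the 'false' command stands in
# for any program that fails.
status=0
output=$(bash -c 'set -e; echo "step 1"; false; echo "step 2"') || status=$?

echo "output: $output"          # "step 2" is never printed
echo "exit status: $status"     # non-zero, so the scheduler can see the failure
```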

36 Using Modules in Production Pipelines
Consider creating a pipeline module
– e.g. jigsaw/5.1
– The pipeline module could be a pure ‘meta-module’ or point to its own relevant scripts (and still be a meta-module)
– A meta-module purely loads other modulefiles, e.g. PrgEnv-gnu
– A full-featured modulefile could:
  Load other modulefiles
  Add entries to PATH, PERL5LIB, other parts of the environment

37 Writing a meta-modulefile
A pure meta-module
#%Module1.0
##
## Required internal variables
set name MyPipeline
set version 1.0
## List conflicting modules here
set mod_conflict [list $name]
## List prerequisite modules here
set mod_prereq_autoload [list blast+/ mothur/ qiime/1.7.0]
set mod_prereq [list blast+/ mothur/ qiime/1.7.0]
## Source the common modules code-base
source /usr/common/usg/Modules/include/usgModInclude.tcl
## Software-specific settings exported to user environment
setenv MYPIPELINE_VER $version
Slide annotations:
– mod_conflict replaces the conflict keyword to trap and exit with status 1
– mod_prereq_autoload is the list of modules to autoload
– mod_prereq is the list of modules that must be loaded first. This sets up the automatic load/swap protections.
– usgModInclude.tcl is the ModulesReloaded include code. It should be included before any environment manipulations.

38 Writing a meta-modulefile
A full-featured pipeline module
#%Module1.0
##
## Required internal variables
set name MyPipeline
set version 1.0
set root {/path/to/my/group/stuff/$name/$version}
## List conflicting modules here
set mod_conflict [list $name]
## List prerequisite modules here
set mod_prereq_autoload [list blast+/ mothur/ qiime/1.7.0]
set mod_prereq [list blast+/ mothur/ qiime/1.7.0]
## Source the common modules code-base
source /usr/common/usg/Modules/include/usgModInclude.tcl
## Software-specific settings exported to user environment
setenv MYPIPELINE_VER $version
setenv MYPIPELINE_ROOT $root
prepend-path PATH $root/bin
Slide annotations:
– root should evaluate to the filesystem path for your pipeline. The braces instruct TCL to not evaluate it immediately. The include code will do the evaluation and perform additional error checking.
– Position all your environment manipulations after the include file.
– Do set an environment variable for the version and root of your pipeline.

39 Using Pipeline Modules in Batch Scripts
#!/bin/bash -l
#$ -l ram.c=10G
#$ -l h_rt=8:00:00
set -e
module purge
module load PrgEnv-gnu/4.6
module load python/2.7.4
module use /path/to/my/groups/modulefiles
module load MyPipeline/1.0
#…. Run your programs here ….
Slide annotations:
– #!/bin/bash -l: ensures the login environment is initialized
– #$ lines: UGE options
– set -e: kill the script if any command gives a non-zero exit status
– module purge, module load: clear all the modules, then load any needed variant-provider modules
– module use: add your modulefiles to MODULEPATH
– module load MyPipeline/1.0: load your pipeline module

40 Conclusion and Best Practices

41 Best Practices - Dotfiles
If you make changes to compound environment variables, make sure to only add to them
– PATH, LD_LIBRARY_PATH, PERL5LIB, PYTHONPATH (many more)
Do not replace modules functionality in your dotfiles:
– Don’t add /jgi/tools/bin to PATH
– Don’t add any absolute paths in /usr/common to your environment
Limit the number of default modules
– Large numbers of default modules complicate giving scripts to others (they need to change their default environment to run your script)
– Instead, set up convenience meta-modules or pipeline modules and load them as needed

42 Best Practices - Modules
Avoid embedding absolute paths in your scripts
– Instead use the environment variables set in your modules
– This reduces maintenance work on your script and centralizes the work in a single place: the modulefile
In production scripts, purge the modules and load them by version
– This ensures the script runs reproducibly
Unloading modules and re-loading is sometimes more reliable than swapping
– ModulesReloaded, for example, can’t unload orphaned dependencies when swapping:
module swap PrgEnv-gnu PrgEnv-intel
module swap PrgEnv-intel PrgEnv-gnu
– The above will leave the intel module loaded due to a bug in the underlying modules system (will investigate and fix in the future)
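
One way to rely on module-provided variables instead of absolute paths is to check for them before use. In this bash sketch, require_env is a hypothetical helper and not part of the modules system; MYPIPELINE_ROOT is the variable set by the example pipeline modulefile earlier in the talk.

```bash
# require_env is a hypothetical helper, not part of the modules system.
require_env() {
    local var_name=$1
    # ${!var_name} is bash indirect expansion: the value of the variable named by var_name.
    if [ -z "${!var_name:-}" ]; then
        echo "error: $var_name is not set; load the module that provides it" >&2
        return 1
    fi
}

# Example: refuse to continue unless the pipeline module has exported its root.
if require_env MYPIPELINE_ROOT; then
    echo "pipeline scripts live in $MYPIPELINE_ROOT/bin"
else
    echo "skipping: MYPIPELINE_ROOT is unavailable"
fi
```

If the installation moves, only the modulefile changes; scripts written this way keep working without edits.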

43 Best Practices - General Logout (and back in again) – Seriously, environments do not age like a fine wine – With consistent use of modules, however, they should be more stable

44 More Information
The NERSC website has a great deal of information about this:
– Genepool User Environment: environment/
– Running CGI Scripts with Modules: https://www.nersc.gov/users/computational-systems/genepool/user-environment/scriptenv-loading-modules-before-starting-a-script/
– Using modules within Python: https://www.nersc.gov/users/computational-systems/genepool/user-environment/working-with-modules-within-perl-and-python/
– ModulesReloaded: Coming soon…

45 EOF

46 National Energy Research Scientific Computing Center

