Atlas Status Update Chris Fuson
2 Atlas Update - Timeline
March 06, 2014 – Installed patches that targeted memory contention on the metadata server to address server-side performance problems
February 26, 2014 – Installed a patch to reduce the impact of close operations to address server-side metadata performance problems
February 10, 2014 – Titan's Lustre client rolled back to address client-side performance problems
January 28, 2014 – Titan's Lustre client upgraded to 2.4. Unmounted Widow.
January 10, 2014 – As the user load from this transition increased, we began to see problems with both the Lustre server and client (compute node) performance
January 07, 2014 – Widow[1-3] became read-only
December 05, 2013 – Atlas was mounted on all OLCF systems, announced, and opened for use
3 Atlas Update - Current
Following the March 06, 2014 change to reduce memory contention on the metadata server, we continue to see qualified improvements in the interaction with Atlas. Improvements have been substantial for several applications that were negatively affected before.
We encourage users to continue testing their application performance in light of these changes and to report their results.
We will continue to pursue the remaining issues, and will intentionally address them outside of the production environment so as to minimize further interruption to the Atlas file systems.
Your feedback is incredibly valuable. Please continue to report problems related to the file system, including any specific timings for I/O operations.
4 Atlas Update – Stripe Count Warning
Warning: Stripe Counts Greater than 160 Not Currently Supported
Warning: "-1" should NOT be used while setting up striping patterns
The 1.8 Lustre clients running on Titan do not support stripe counts greater than 160. Interaction from Titan (including 'lfs getstripe') with files that have a stripe count greater than 160 is problematic.
If 'lfs setstripe' was used to set the stripe of a directory or file and the stripe count was set to a value greater than 160 or '-1', you should reduce the stripe value.
  titan-ext3 1004> lfs setstripe -c -1 test.file
  titan-ext3 1005> lfs getstripe test.file | grep stripe_count
  lmm_stripe_count: 1008
  *** glibc detected *** lfs: munmap_chunk(): invalid pointer: 0x fed0 ***
Please note the stripe count is only an issue on Titan; the count is not an issue on Eos, Rhea, or the Data Transfer Nodes due to the more recent Lustre client version in use on those systems.
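The check described on this slide can be sketched as a small shell helper. This is only an illustration under stated assumptions: the function names `parse_stripe_count` and `stripe_count_ok` are hypothetical (they are not part of Lustre or any OLCF tooling), the parser assumes the `lmm_stripe_count: N` line format shown in the session above, and a count of 0 is assumed to mean "use the file system default" and is allowed through.

```shell
#!/usr/bin/env bash
# Hypothetical helpers; neither function is provided by Lustre.

MAX_STRIPE_COUNT=160   # limit stated on the slide for Titan's 1.8 client

# Read `lfs getstripe <file>` output on stdin and print the first
# lmm_stripe_count value found (format as shown in the session above).
parse_stripe_count() {
    awk '/lmm_stripe_count:/ { print $2; exit }'
}

# Return 0 if the count is acceptable on Titan, non-zero otherwise.
# -1 (stripe over all OSTs) and anything above 160 are rejected.
# Assumption: 0 means "file system default" and is accepted.
stripe_count_ok() {
    local count="$1"
    [ "$count" -ge 0 ] && [ "$count" -le "$MAX_STRIPE_COUNT" ]
}
```

A possible use, run from a system with the newer Lustre client (for example a Data Transfer Node) rather than from Titan: `count=$(lfs getstripe test.file | parse_stripe_count); stripe_count_ok "$count" || echo "reduce stripe count"`.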
5 Atlas Update – Reduce Stripe Count
Create new directory with reduced striping
Copy data into new directory
– cp for small data amounts
– dcp from the Data Transfer Nodes for larger amounts of data
  dtn04 115> mkdir NewDir
  dtn04 116> lfs setstripe -c 128 NewDir
  dtn04 117> cp test.file NewDir/.
  dtn04 118> lfs getstripe NewDir/test.file | grep stripe_count
  lmm_stripe_count: 128
  dtn04 119>
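The steps on this slide could be wrapped in one function, roughly as follows. This is a minimal sketch, not an OLCF-provided tool: `restripe_copy` is a hypothetical name, the `LFS` override variable is an assumption added so the function can be dry-run on a machine without Lustre, and the 1 to 160 range check simply encodes the stripe count warning from the previous slide. It uses `cp`, which the slide recommends only for small amounts of data; larger copies would use `dcp` from the Data Transfer Nodes instead.

```shell
#!/usr/bin/env bash
# Hypothetical helper; not part of Lustre or OLCF tooling.
# Usage: restripe_copy <stripe_count> <new_dir> <file>...
restripe_copy() {
    local count="$1" newdir="$2"
    shift 2

    # Accept only plain non-negative integers; this also rejects "-1".
    case "$count" in
        ''|*[!0-9]*)
            echo "stripe count must be a positive integer" >&2
            return 1 ;;
    esac
    # Enforce the limit from the stripe count warning.
    if [ "$count" -lt 1 ] || [ "$count" -gt 160 ]; then
        echo "stripe count must be between 1 and 160" >&2
        return 1
    fi

    # Create the directory, then set its striping so that files
    # copied into it inherit the reduced stripe count.
    mkdir -p "$newdir" || return 1
    # Assumption: LFS may be overridden (e.g. LFS=echo) for a dry run.
    "${LFS:-lfs}" setstripe -c "$count" "$newdir" || return 1

    # Copy the data; suitable for small amounts only.
    cp -- "$@" "$newdir"/
}
```

For example, `restripe_copy 128 NewDir test.file` mirrors the session above; the result can then be confirmed with `lfs getstripe NewDir/test.file | grep stripe_count`.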