
1 Cloud computing and federated storage Doug Benjamin Duke University

2 Currency of the Realm
How does a physics experiment determine that it is successful?
o Refereed papers published and cited by peers (impact factor)
o Students trained
Success is not measured in bytes delivered or events simulated. High energy physics computing is by and large a means to an end, not the end itself, very much like building and operating a detector subsystem. Both cloud computing and federated storage, when done properly, help to enhance the physics output.

3 Physics analysis workflow – iterations required (slide credit: Wouter Verkerke, NIKHEF)
NB: Typical life span of an analysis = 6-9 months. How often are analysis steps typically iterated, and why?
– Simulation sample production: 1-3 times. Samples can be increased in size, new generators become available, a new version of the simulation/reconstruction software is required by the collaboration, etc.
– Data sample production: 1-2 times. New data becomes available over time, the good-run list changes, or data is reprocessed with a new version of the reconstruction.
– Data analysis chain 1, 'ntuple making': 3-4 times. New input samples become available; changes/improvements to overlap removal algorithms (which must be executed before preselection).
– Data analysis chain 2, 'ntuple analysis': 20-100 times. This is the core of the physics analysis: many ideas on event selection algorithms and derived quantity construction need to be tested.
– Statistical analysis of data: O(100-1000) times. Run every time 'chain 2' is newly executed, plus more times to test new ideas on fitting, data modeling, etc.

4 Why cloud computing?
Clouds provide a mechanism to give the user a "private cluster" on shared resources.
To the user – he/she sees a cluster at their disposal.
To the organization – users run on shared resources, which hopefully levels out demand.
o Not always true – demand peaks ahead of conferences.
Don't the resources already exist?
o Yes, but…
o In the US, stimulus-funded clusters (more than 40 institutions got funding in 2009-2010) will be 5 years old in 2015.
o A solution for the future is needed.

5 Some issues for a cloud analysis cluster
The user workflow is such that users spin over their ntuples and derivatives many times.
Persistent storage is needed for intermediate output (think Dropbox); that intermediate output becomes input to further processing.
Files must be shareable amongst members of the same analysis group – which could be as small as two people or much larger.
It has to be as fast and as reliable as what people have now.

6 EC2 testing/configuration
March 2012 – tested a small xrootd storage cluster on EC2: 1 redirector and 3 data servers.
Used spot instances and on-demand instances, with either ephemeral storage or EBS storage, to set up the xrootd data servers.
Measured the rates of copying data into the EC2 appliance.
Work done in the US East region (Virginia); input data came from the US ATLAS federated storage system.
Work done over a month (remember this number).
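As an illustration of what a cluster of this shape involves, a minimal xrootd configuration for one redirector and several data servers is sketched below. The hostnames, exported path and data directory are placeholders, not the settings used in this test, and a real deployment also needs the matching cmsd service and security configuration.

    # minimal xrootd cluster config sketch (placeholder names, shared by all nodes)
    all.export /atlas
    xrd.port 1094
    all.manager redirector.example.com:3121     # placeholder redirector host

    if redirector.example.com
       all.role manager                         # this node redirects clients
    else
       all.role server                          # data servers hold the files
       oss.localroot /mnt/xrootd-data           # local storage mount point
    fi

Both xrootd and cmsd are started with the same file on every node; the if/else block selects the role by hostname.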

7 Write rates to EC2 ephemeral storage
The ephemeral storage (two partitions) was joined with LVM, so the OS/xrootd sees one ext4 partition.
Avg 16 MB/s per data server; with the 3 data servers, total avg ~45 MB/s write.
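The slide only states that the two ephemeral volumes were joined with LVM into a single ext4 filesystem; a generic recipe for that is roughly the following, where the device names and mount point are assumptions (the actual device names on an EC2 instance may differ).

    # join two ephemeral disks into one ext4 filesystem with LVM
    pvcreate /dev/xvdb /dev/xvdc                   # register both disks as physical volumes
    vgcreate xrootd_vg /dev/xvdb /dev/xvdc         # one volume group spanning both
    lvcreate -l 100%FREE -n data xrootd_vg         # one logical volume using all the space
    mkfs.ext4 /dev/xrootd_vg/data                  # single ext4 filesystem, as on the slide
    mkdir -p /mnt/xrootd-data
    mount /dev/xrootd_vg/data /mnt/xrootd-data     # the directory xrootd serves from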

8 EBS write rates
Single LVM ext4 partition on EBS.
Extreme copy (many-site transfer): 12 MB/s. Standard copy (site-to-site transfer): note the peak in the first bin, < 0.5 MB/s.
The xrootd federation can be used to load files into Amazon EC2 storage.
More work needs to be done to understand the lock-up with the frm_xfrd daemon.
Write rates: 16 MB/s avg per storage node with two ephemeral drives.
The reliability of xrootd extreme copy mode needs further study; it is worth the effort due to the performance gains and potential reliability gains (multi-site sources).
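For reference, loading a file into the cluster through the federation comes down to an xrdcp against the federation redirector. The hostnames and path below are placeholders, and the exact option that enables extreme (multi-source) copy depends on the client version, so the flags shown are assumptions rather than the ones used in this test.

    # standard copy: one source site, chosen via the federation redirector (placeholder hosts/paths)
    xrdcp root://fax-redirector.example.org//atlas/somedataset/file.root \
          root://ec2-redirector.example.com//atlas/somedataset/file.root

    # extreme copy: read the same file from multiple source sites in parallel;
    # the xrootd 3.x client used -x for this, newer clients offer --sources N (version dependent)
    xrdcp -x root://fax-redirector.example.org//atlas/somedataset/file.root \
          root://ec2-redirector.example.com//atlas/somedataset/file.root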

9 Conclusions from EC2 work
EC2 gave reasonable, though variable, write performance.
At the time, xrdcp in extreme mode was not functional due to a bug – subsequently fixed.
Costs were very surprising: ~$600 for several hundred GB of storage on the servers, instance running time, etc. On reflection, not surprising given the rates:
o EC2 rates (today): $0.10 per GB per month for standard EBS, $0.125 per GB per month for IO-enhanced EBS, $0.125 per GB per month for EBS snapshots.
o 1 TB -> $100/month to $125/month.
o For comparison, local group disk at over 10 sites in the US holds > 50 TB/site, most > 100 TB.

10 FutureGrid https://portal.futuregrid.org/
Analysis cluster project – see next slide.

11 Analysis cluster project https://portal.futuregrid.org/projects/257

12 FutureGrid setup
3 VMs (2 cores each), CentOS 5.7, xrootd 3.2.4.1, ROOT 5.34.1 compiled locally, 10 GB disk per machine (total space).
1 xrootd redirector, 1 xrootd data server, 1 xrootd client machine.
All at the University of Florida – Nimbus cloud site.

13 Data transfer in the cloud
xrdcp run on the data server, referencing the local redirector machine and the redirector at BNL.
Avg – 11.2 MB/s, RMS – 2.1 MB/s.

14 Data processing rate
Test program:
o ATLAS D3PD analysis – cut-flow job – Standard Model D3PD from 2011.
o TTreeCache with a learning phase.
o Only the branches used in the analysis are activated.
o ROOT 5.34.1.
Processing rate on a desktop at ANL with data over the LAN: ~1200 Hz.
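The slide lists the ingredients but not the code; a minimal ROOT 5.x style sketch of the same pattern is given below: TTreeCache with a learning phase, only the needed branches activated, input read over xrootd. The tree name, branch name and root:// URL are placeholders, not the actual D3PD layout or cut flow.

    // cutflow_sketch.C -- illustration only, not the actual ATLAS D3PD cut-flow code.
    // Tree name, branch name and the xrootd URL are placeholders.
    #include "TFile.h"
    #include "TTree.h"
    #include <iostream>

    void cutflow_sketch()
    {
       // Open the input directly over the federation (placeholder URL)
       TFile *f = TFile::Open("root://redirector.example.org//atlas/user/some_d3pd.root");
       if (!f || f->IsZombie()) { std::cerr << "could not open input\n"; return; }

       TTree *t = (TTree*)f->Get("physics");      // placeholder tree name

       // TTreeCache: 30 MB cache with a learning phase over the first 100 entries,
       // after which the needed baskets are prefetched in large reads.
       t->SetCacheSize(30 * 1024 * 1024);
       t->SetCacheLearnEntries(100);

       // Activate only the branches the analysis actually uses
       t->SetBranchStatus("*", 0);
       Int_t n_el = 0;
       t->SetBranchStatus("el_n", 1);             // placeholder branch
       t->SetBranchAddress("el_n", &n_el);

       Long64_t pass = 0, nentries = t->GetEntries();
       for (Long64_t i = 0; i < nentries; ++i) {
          t->GetEntry(i);
          if (n_el >= 1) ++pass;                  // trivial stand-in for the real cut flow
       }
       std::cout << pass << " / " << nentries << " events pass\n";
       f->Close();
    }

Run as a plain ROOT macro, e.g. root -l -b -q cutflow_sketch.C.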

15 Future activities
Continue the work using an OpenStack instance on FutureGrid – measure performance.
Extend the activity to an OpenStack instance at BNL.
Set up an OpenStack cloud at the ANL Tier 3, serving data over the LAN:
o Measure processing performance for analysis in a VM.
o Run the same analysis on the physical hardware at the hypervisor layer.
o This gives an apples-to-apples comparison.
Repeat the measurements at ANL using data from the federation.
In October – use virtualized resources at the Midwest Tier 2.

16 Some issues
Cloud middleware is an issue – many different varieties: EC2, OpenStack, Nimbus, CloudStack, Google Compute Engine… the Wild West.
Contextualization is a challenge – the life-cycle management of the VMs needs to be understood.
VM startup time will affect the user's throughput.
Most of the effort in my experiment (ATLAS) has focused on MC production in the cloud; the focus on chaotic analysis needs to be strengthened.

17 Conclusion
Cloud analysis activities have moved from vaporware last year to concrete activities this year:
o CMS processing on EC2
o ATLAS tests on EC2 and FutureGrid – soon private clouds at BNL and ANL
Federated storage with WAN data access is key to all of these activities.
Much more work remains to be done, with a variety of workflows and input files.
ATLAS has to migrate users to code optimized for this type of activity – a big challenge.

