
1 Management of User Requested Data in US ATLAS
Armen Vartapetian, University of Texas, Arlington
US ATLAS Distributed Facility Workshop, UC Santa Cruz, November 14, 2012

2 Outline
- User Analysis Output
- Central Deletion Service
- Victor
- USERDISK Cleanup
- Monitoring and Notifications
- DaTRI
- LOCALGROUPDISK Policy

3 Storing User Analysis Output
- User analysis output in the US is stored in the USERDISK of the site where the job ran
- Only US sites have USERDISKs; at non-US sites the output destination is SCRATCHDISK
- The US has a specific policy for USERDISK maintenance/cleanup, more relaxed and user-friendly than for SCRATCHDISK (details later)
- Both space tokens are temporary storage, but users can subscribe their data to different locations using the DaTRI request system (details later)
- The typical destination for user data moved by DaTRI requests is LOCALGROUPDISK or GROUPDISK for longer-term storage, or even SCRATCHDISK for further temporary storage
- Datasets in LOCALGROUPDISK or GROUPDISK have no limited lifetime by default, so these space tokens (unlike some others) are not cleaned up on a regular basis

4 Central Deletion Service
- Cleanup of all space tokens is carried out through the central deletion service
- The basic command to submit a dataset for deletion is: dq2-delete-replicas
- The command submits the dataset deletion to the central deletion service, which immediately places it in the queue
- The deletion service flow for datasets is: ToDelete -> Waiting -> Resolved -> Queued -> Deleted. It also shows the ToDelete -> Deleted progression for the file count and for the space, as well as any errors
- Currently the typical deletion rate for US sites is 2-4 Hz for T2s and 7-8 Hz for the T1
- The deletion rate can be tuned by adjusting site-specific parameters in the deletion service configuration file
- Load, bottlenecks, and other SRM issues can cause timeouts, reduce the deletion rate, and produce errors
- If a site has more than 100 errors in 4 hours, the ADCoS shifter must file a GGUS ticket
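The shifter rule above (more than 100 deletion errors within 4 hours means a GGUS ticket) can be sketched as a small check over timestamped errors. This is illustrative only; the function name and the data shape are assumptions, not part of the actual deletion service or shifter tooling:

```python
from datetime import datetime, timedelta

GGUS_ERROR_THRESHOLD = 100          # errors allowed within the window
GGUS_WINDOW = timedelta(hours=4)    # window from the US policy above

def needs_ggus_ticket(error_times, now):
    """True if the site logged more than 100 deletion errors in the last 4 hours.

    error_times: list of datetime objects, one per deletion error.
    """
    recent = [t for t in error_times if now - t <= GGUS_WINDOW]
    return len(recent) > GGUS_ERROR_THRESHOLD
```

For example, 120 errors spread over the last two hours would exceed the threshold and call for a ticket, while 200 errors that all happened ten hours ago would not.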

5 Cleanup Decision - Victor
- Daily monitoring of the space tokens, to detect low free space and trigger cleanup, is done by a system called Victor
- Victor handles only those space tokens which need regular cleanup
- It prepares a list of datasets to be sent to the central deletion service; a grace period of 1 day is applied
- SCRATCHDISK: cleanup is triggered when free space drops below 55%
- DATADISK: cleanup is triggered when free space is getting low. Only "secondary" datasets older than 15 days are selected for deletion, and dataset popularity is taken into account
  - for T2s, cleanup is triggered when free space drops below 15%
  - for the T1, cleanup is triggered when free space drops below 750 TB
- PRODDISK: cleanup is triggered when free space drops below 12 TB; only datasets older than 31 days. PandaMover files also need cleanup, which is done locally
- GROUPDISK: cleanup is defined by the person responsible for the group
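The per-token triggers above can be summarized as a single decision function. This is a hedged sketch of the policy as stated on the slide, not Victor's actual code; the function name, parameter names, and the reading of the thresholds as "free space drops below" are assumptions:

```python
def victor_triggers_cleanup(token, free_tb, total_tb=None, is_t1=False):
    """Decide whether Victor would trigger cleanup for a space token.

    free_tb/total_tb are in TB; percentage-based tokens need total_tb.
    Illustrative sketch of the slide's thresholds, not the real Victor.
    """
    if token == "SCRATCHDISK":
        return free_tb / total_tb < 0.55
    if token == "DATADISK":
        # The T1 uses an absolute threshold, T2s a fractional one
        return free_tb < 750 if is_t1 else free_tb / total_tb < 0.15
    if token == "PRODDISK":
        return free_tb < 12
    # GROUPDISK and other tokens: cleanup is decided by the group, not Victor
    return False
```

Note that the dataset-level selection (secondary datasets older than 15 days, popularity, the 1-day grace period) happens after this per-token trigger fires.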

6 USERDISK Cleanup
- USERDISK cleanup is done on average every 2 months
- We target datasets older than 2 months
- Targeted user datasets are matched with the dataset owner DN from the dq2 catalog, and dataset lists per DN are created
- A notification is sent to users about the upcoming cleanup of their datasets, with a link to the list and basic information on how to proceed if a dataset is still needed
- We maintain and use a list of DN-to-email-address associations, and regularly take care of missing/obsolete addresses
- After the notification, users have 10 days to save the data they need
- This cleanup procedure has been in use for the last 4 years
- Very smooth operation, no complaints, users happy
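The selection step above (pick datasets older than two months, group them per owner DN) can be sketched as follows. The function name and the tuple layout of the input records are illustrative assumptions; the real procedure queries the dq2 catalog:

```python
from datetime import datetime, timedelta
from collections import defaultdict

def userdisk_cleanup_candidates(datasets, now, max_age_days=60):
    """Group USERDISK datasets older than ~2 months by owner DN.

    datasets: iterable of (dataset_name, owner_dn, creation_datetime) tuples.
    Returns {owner_dn: [dataset_name, ...]} for datasets past the age cut.
    """
    per_dn = defaultdict(list)
    for name, owner_dn, created in datasets:
        if now - created > timedelta(days=max_age_days):
            per_dn[owner_dn].append(name)
    return dict(per_dn)
```

Each per-DN list is then what gets linked in the notification email to that user.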

7 USERDISK Cleanup Notification
- Question: is the user well informed about all available options to save the data targeted for deletion?
- Excerpt from the notification with the information for users:
  You are advised to save any dataset which is still of interest to your private storage area. You may also use your local group disk storage area xxx_LOCALGROUPDISK, if such an area has been defined. Please contact your local T1/T2/T3 person responsible for disk storage for further assistance. If the list contains datasets of common interest to a particular physics group, please contact that group's representative to move your datasets to the xxx_ATLASGROUPDISK area. If you are going to copy your dataset to xxx_LOCALGROUPDISK or xxx_ATLASGROUPDISK, please use the Subscription Request page: If you are going to copy your dataset to any private storage area (not known to the grid), please use dq2-get. See the link for help: https://twiki.cern.ch/twiki/bin/view/Atlas/DQ2ClientsHowTo
- This should cover all the practical options

8 Storage Monitoring, Notifications
- Storage monitoring from the DDM group:
- Drop-down menus provide other storage tables and plots, grouped by space tokens, clouds, etc.
- Notifications are sent with the list of space tokens running low on free space, and when any space token runs out of space (< 0.5 TB) and is blacklisted
- Notification thresholds:
  - T1 DATADISK < 10 TB
  - T2 DATADISK < 2 TB
  - PRODDISK < 20%
  - USERDISK < 10%
  - Others < 10 TB
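The threshold list above can be captured in one function. This is a sketch under stated assumptions: the function name is made up, the percentage thresholds are read as fractions of the token's total capacity, and the 0.5 TB blacklist check is applied before the per-token notification limits:

```python
def low_space_notification(token, tier, free_tb, total_tb):
    """Return "blacklist", "notify", or None for a space token's free space.

    Illustrative encoding of the notification thresholds on the slide.
    tier is "T1" or "T2"; free_tb/total_tb are in TB.
    """
    if free_tb < 0.5:
        return "blacklist"          # token is out of space and gets blacklisted
    if token == "DATADISK":
        ok = free_tb >= (10 if tier == "T1" else 2)
    elif token == "PRODDISK":
        ok = free_tb / total_tb >= 0.20
    elif token == "USERDISK":
        ok = free_tb / total_tb >= 0.10
    else:
        ok = free_tb >= 10          # "Others < 10 TB"
    return None if ok else "notify"
```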

9 DaTRI
- Data Transfer Request Interface (DaTRI): used to submit transfer requests; also provides monitoring of transfer status
- A request can be placed via the web interface, or automatically as the output destination of an analysis job
- All the links are available in the left bar of the Panda Monitor page, under the Datasets Distribution drop-down menu
- Users need to be registered with DaTRI. The registration link is on the main page, along with a link to check registration status. If you are not sure, also use the opportunity to check your certificate for the usatlas role
- For a DaTRI request on the web interface, you basically fill in the dataset pattern, the destination, and a justification for the transfer

10 DaTRI
- A submitted DaTRI request goes through the following states/stages: PENDING -> AWAITING_APPROVAL -> AWAITING_SUBSCRIPTION -> SUBSCRIBED -> TRANSFER -> DONE
- Once scheduled for approval, a request ID is assigned
- An error message is returned if the dataset pattern is incorrect, the dataset is empty, the destination site does not have enough space, the group quota at the destination site is exceeded, etc.
- Each cloud has DaTRI coordinators for manual approval; in the US these are Kaushik De and Armen Vartapetian
- Approval for GROUPDISKs is done by group representatives
- Requests are approved automatically if the total size is < 0.5 TB and the user has the usatlas role (a very common issue/problem)
- Monitoring also provides a link to the dashboard, as well as the replica status for each dataset
- Plan: provide functionality within the DaTRI web interface to upload a list/pattern of user datasets for deletion, to help users get rid of obsolete data
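The auto-approval rule above is simple enough to state as a predicate. The function name and parameters are illustrative, not part of DaTRI itself; the two conditions (total request size under 0.5 TB, user holds the usatlas role) come directly from the slide:

```python
def datri_auto_approved(total_size_tb, has_usatlas_role):
    """Auto-approval rule from the slide: a request skips manual approval
    only if it is under 0.5 TB AND the user has the usatlas role."""
    return total_size_tb < 0.5 and has_usatlas_role
```

The missing usatlas role is the "very common issue" noted above: even a tiny request falls back to manual approval without it.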

11 LOCALGROUPDISK Policy
- Intended as long-term storage for users
- Unpledged resource (the main concern at T1/T2)
- No ADC policy or recommendations for its management
- Central cleaning only for aborted and failed tasks
- The main issue is the absence of a usage and cleanup policy; because of that, there is a tendency to grow in size
- Usage tables for some of the US LOCALGROUPDISKs are in the backup slides
- A common trend: usually there are 2-3 super-users per site who occupy more than half of the space (there may be a group behind such a user). A dozen top users occupy more than 90% of the space, and many more users hold smaller shares
- A similar storage distribution can be seen in other clouds as well
- Part of this data may be more relevant to GROUPDISK or even DATADISK (i.e., it could be moved to pledged resources)

12 LOCALGROUPDISK Policy
- Some datasets have many replicas, some of them owned by the same top users. The situation will become unsustainable if the number of such top users grows over time
- Some datasets have only one replica, and a big chunk of that data has not been used for a while; a policy/path for their retirement should be put in place
- Popularity analysis may help identify datasets which may be obsolete and are candidates for retirement
- We may start with a soft space limit of 2-3 TB per user per site, and start asking questions when usage is above that
- Particularly for datasets not used for N months (1 year?): check whether the user still needs them
- An approval mechanism for sample transfers > N TB (10 TB?): centralized approval and decision on space allocation for big samples
- The LOCALGROUPDISK management policy is currently under discussion at the RAC
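The proposed soft limit could be applied as a simple per-site scan like the one below. Everything here is hypothetical: the policy is still under discussion at the RAC, and the function name, input shape, and the choice of 3 TB as the default cut are illustrative assumptions:

```python
def over_soft_limit(usage_by_dn, soft_limit_tb=3.0):
    """Flag users above the proposed per-site soft limit (2-3 TB).

    usage_by_dn: {user_dn: used_space_tb} for one site's LOCALGROUPDISK.
    Returns the subset of users whose usage exceeds the soft limit,
    i.e. the users who would "start getting questions" under the policy.
    """
    return {dn: tb for dn, tb in usage_by_dn.items() if tb > soft_limit_tb}
```

Against the backup-slide tables, a scan like this would flag the handful of top users per site while leaving the long tail of sub-2 TB users untouched.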

13 Armen VartapetianUS ATLAS Distributed Facility Workshop November 14, BACKUP

14 BNL LOCALGROUPDISK, used space 196 TB
User DN | Used Space (TB) | # of Datasets
/dc=org/dc=doegrids/ou=people/cn=david adams
/dc=org/dc=doegrids/ou=people/cn=anyes taffard
/dc=org/dc=doegrids/ou=people/cn=andrew haas
/dc=org/dc=doegrids/ou=people/cn=caleb lampen
/dc=org/dc=doegrids/ou=people/cn=shuwei ye
/c=ru/o=rdig/ou=users/ou=mephi.ru/cn=mikhail titov7380
/c=uk/o=escience/ou=manchester/l=hep/cn=john almond5777
/dc=org/dc=doegrids/ou=people/cn=jacob searcy
/dc=org/dc=doegrids/ou=people/cn=vivek jain
/dc=org/dc=doegrids/ou=people/cn=douglas benjamin
/dc=org/dc=doegrids/ou=people/cn=tarrade fabien
/dc=org/dc=doegrids/ou=people/cn=stephanie majewski
/dc=org/dc=doegrids/ou=people/cn=venkatesh kaushik
Total for Top 13 Users (used space > 2 TB, listed above): 176
Total for Remaining 35 Users (used space < 2 TB): 20
Total Used Space: 196

15 SLAC LOCALGROUPDISK, used space 355 TB
User DN | Used Space (TB) | # of Datasets
/dc=ch/dc=cern/ou=organic units/ou=users/cn=eifert
/dc=ch/dc=cern/ou=organic units/ou=users/cn=toshi68352
/dc=org/dc=doegrids/ou=people/cn=anyes taffard
/dc=org/dc=doegrids/ou=people/cn=brokk toggerson
/dc=org/dc=doegrids/ou=people/cn=andrew haas
/dc=org/dc=doegrids/ou=people/cn=steven andrew farrell
/dc=org/dc=doegrids/ou=people/cn=jason veatch
/dc=org/dc=doegrids/ou=people/cn=michael werth
/dc=org/dc=doegrids/ou=people/cn=bart clayton butler
/dc=org/dc=doegrids/ou=people/cn=alaettin serhan mete
/dc=org/dc=doegrids/ou=people/cn=david wilkins miller
/dc=org/dc=doegrids/ou=people/cn=robert w. gardner jr
/dc=org/dc=doegrids/ou=people/cn=venkatesh kaushik
/dc=org/dc=doegrids/ou=people/cn=maximilian swiatlowski
/dc=org/dc=doegrids/ou=people/cn=douglas benjamin
Total for Top 15 Users (used space > 2 TB, listed above): 343
Total for Remaining 19 Users (used space < 2 TB): 12
Total Used Space: 355

16 MWT2+ILLINOISHEP LOCALGROUPDISK, used space 302 TB
User DN | Used Space (TB) | # of Datasets
/dc=org/dc=doegrids/ou=people/cn=samuel meehan
/dc=org/dc=doegrids/ou=people/cn=david lesny
/dc=org/dc=doegrids/ou=people/cn=frederick luehring
/dc=org/dc=doegrids/ou=people/cn=anton kapliy
/dc=org/dc=doegrids/ou=people/cn=jordan scott webster
/c=uk/o=escience/ou=oxford/l=oesc/cn=maria fiascaris
/dc=org/dc=doegrids/ou=people/cn=antonio boveia
/dc=org/dc=doegrids/ou=people/cn=constantinos melachrinos
/c=ru/o=rdig/ou=users/ou=mephi.ru/cn=mikhail titov312
/dc=org/dc=doegrids/ou=people/cn=robert w. gardner jr
/dc=org/dc=doegrids/ou=people/cn=joseph tuggle
/dc=org/dc=doegrids/ou=people/cn=douglas benjamin
/dc=org/dc=doegrids/ou=people/cn=elizabeth jue hines
Total for Top 13 Users (used space > 2 TB, listed above): 291
Total for Remaining 20 Users (used space < 2 TB): 11
Total Used Space: 302

17 AGLT2 LOCALGROUPDISK, used space 238 TB
User DN | Used Space (TB) | # of Datasets
/dc=org/dc=doegrids/ou=people/cn=haijun yang
/dc=org/dc=doegrids/ou=people/cn=shawn mckee
/dc=ch/dc=cern/ou=organic units/ou=users/cn=lxu99
/c=il/o=iucc/ou=tau/cn=nir amram48
/dc=org/dc=doegrids/ou=people/cn=douglas benjamin
Total for Top 5 Users (used space > 2 TB, listed above): 229
Total for Remaining 18 Users (used space < 2 TB): 9
Total Used Space: 238

