Security aspects of the WLCG infrastructure: clients and services Maarten Litmaath CERN.


1 Security aspects of the WLCG infrastructure: clients and services
Maarten Litmaath, CERN

2 Outline
How it all should work
Proxies
Incoherence
Security model examples
Banning
Argus
Site authorization
Pilot jobs
Virtual machines and clouds
Data security
Other services
SSO, identity providers
Vulnerability aspects
This list probably is incomplete…
HEPiX 2009-10-29, LBNL

3 How it all should work (1)
Users and services have digital certificates signed by trusted certificate authorities (CAs)
  – Certificate lifetime usually is 1 year
Users are members of virtual organizations (VOs)
  – WLCG: alice, atlas, cms, lhcb, dteam, ops, …
  – Users need to re-sign AUP every year
  – Sites decide which VOs to support at which QoS
Services are rarely made members of a VO
  – It would be desirable to some extent
    A service could prove that it is trusted by the VO
    Now: rely on information system + filtering

4 How it all should work (2)
Users create short-lived proxies for grid access
Long-lived proxies are only found on MyProxy servers
Proxies are delegated to services as needed
  – Some services can retrieve or renew proxies via MyProxy
Services interpret proxies consistently
  – The same criteria are used by different services
  – User jobs and data are protected as needed
Services log security-related information consistently
Users can easily be banned as needed

5 Where we want to be

6 Where we are

7 Proxies (1)
Plain grid proxy
  – Usage: grid-proxy-init
  – Mapping can only be based on the DN
  – DNs in grid-mapfile harvested from VOMS servers
    Different subsets can be mapped differently
VOMS proxy
  – Usage:
    voms-proxy-init --voms vo
    voms-proxy-init --voms vo:/vo/group
    voms-proxy-init --voms vo:/vo/group/Role=role
  – Plain grid proxy + set of attributes signed by VOMS server
  – Attributes: groups and/or roles
  – Mapping can be based on attributes and/or the DN
    Attributes usually preferred
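The difference between DN-based and attribute-based mapping can be sketched in a few lines of Python. This is only an illustration, not the logic of any particular grid service: the function `map_proxy`, both mapping tables, the DN and the account names are hypothetical. The sketch prefers a match on a VOMS attribute (FQAN) and falls back on the DN, as a plain proxy carries no attributes.

```python
from typing import Optional, Sequence

# Hypothetical mapping tables; a real service reads these from site configuration.
FQAN_MAP = {
    "/atlas/Role=production": "atlasprd",
    "/atlas": "atlas001",
}
DN_MAP = {
    "/DC=ch/DC=cern/OU=Users/CN=jdoe": "atlas002",
}


def map_proxy(dn: str, fqans: Sequence[str] = ()) -> Optional[str]:
    """Return a local account for a proxy, or None if nothing matches."""
    # VOMS proxy: try the signed attributes first, in the order presented.
    for fqan in fqans:
        if fqan in FQAN_MAP:
            return FQAN_MAP[fqan]
    # Plain proxy (or no attribute matched): fall back on the DN.
    return DN_MAP.get(dn)
```

Calling `map_proxy` with only a DN models a plain grid proxy. Passing a list of FQANs models a VOMS proxy, where the first matching attribute decides the account.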

8 Proxies (2)
Proxy lifetime should be “short”
  – Cf. AFS/Kerberos token lifetime
  – Default 12 hours, 24 hours probably OK
  – Current practice: LHC experiments use multi-day proxies to avoid potential problems with proxy renewal
    CMS use 8-day proxies!
Long job needs proxy to be renewed before it expires
Long-lived proxies can be stored on a MyProxy server
  – Trusted services can retrieve or renew short-lived proxies
MyProxy server currently is a single point of failure
  – RFE: upload proxies to multiple servers, try all of them for downloading proxies as needed
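The requested enhancement (try several MyProxy servers when downloading a proxy) amounts to a simple failover loop. The Python sketch below shows that idea only. It does not use any real MyProxy client API: the `fetch` callable stands in for whatever client a service would use, and the function name `fetch_proxy` is an assumption made for illustration.

```python
from typing import Callable, Iterable


def fetch_proxy(servers: Iterable[str], fetch: Callable[[str], bytes]) -> bytes:
    """Try each server in turn and return the first proxy obtained.

    `fetch` is a hypothetical callable that contacts one server and either
    returns the proxy bytes or raises an exception.
    """
    errors = []
    for server in servers:
        try:
            return fetch(server)
        except Exception as exc:  # a real client would catch narrower errors
            errors.append(f"{server}: {exc}")
    # Every server failed: report all failures together.
    raise RuntimeError("no MyProxy server answered: " + "; ".join(errors))
```

With the proxy uploaded to every server in the list, a single unreachable server no longer blocks renewal. The loop fails only when all servers are down.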

9 Incoherence
Different services treat proxies differently
  – Libraries
  – Mapping
    Plain proxies
    VOMS proxies
  – Logging
  – Banning
    Not possible on certain services!
  – Testing/debugging/forensics tools
    Available for some scenarios on some services
Try finding two gLite services with the same security model!
  – OSG, ARC?

10 Security model examples
LCG Computing Element
  – VOMS mapping with fallback on plain proxy mapping
CREAM Computing Element
  – VOMS only
OSG Computing Element
  – GUMS: VOMS, DN
Disk Pool Manager
  – Virtual IDs
  – VOMS mapping and plain proxy mapping
dCache
  – gPlazma: GUMS, vo-role-map, …
Workload Management System
  – VOMS authZ by 2 different libraries: GridSite, LCMAPS
    But Condor-G engine only looks at the DN!

11 Banning
OSG have SAZ and GUMS, ARC have Charon
EGEE/gLite: LCAS library and SCAS/Argus services have banning plugins
  – Easy to ban a DN
  – LCG-CE, CREAM-CE, WMS
DPM/LFC virtual ID table will get banning flags
  – Currently only plain proxies can be fully banned
    By mapping them to non-existent accounts/VOs
  – VOMS proxies can be banned only from creating new files
Argus should make this consistent and easy
  – Also can import a grid-wide ban list

12 Argus
Argus is the long-term gLite authorization framework
It should give all gLite services a consistent authZ model
It allows for authZ decisions to be taken centrally per site
  – A single place to pull the plug
It can import remote policies
  – Regional, national, project-based, …
  – Give priority to local/national/… users
  – Banning of DNs, e.g. grid-wide
Policies can affect QoS for DNs or VOMS attributes
  – Preferences
  – Banning
Argus will be introduced gradually
  – It can coexist with legacy services
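The idea of a single, site-wide decision point that combines local rules with an imported ban list can be sketched as follows. This is not Argus code and does not use its policy language: the class `SitePolicy`, its fields and the "permit"/"deny" strings are assumptions made purely for illustration.

```python
from dataclasses import dataclass, field
from typing import Sequence, Set


@dataclass
class SitePolicy:
    """Hypothetical central decision point for one site."""
    local_banned_dns: Set[str] = field(default_factory=set)
    imported_banned_dns: Set[str] = field(default_factory=set)  # e.g. a grid-wide list
    permitted_fqans: Set[str] = field(default_factory=set)

    def decide(self, dn: str, fqans: Sequence[str] = ()) -> str:
        # A ban from any source overrides every permit rule.
        if dn in self.local_banned_dns or dn in self.imported_banned_dns:
            return "deny"
        # Otherwise permit if any presented VOMS attribute is accepted here.
        if any(fqan in self.permitted_fqans for fqan in fqans):
            return "permit"
        # Nothing matched: refuse by default.
        return "deny"
```

If every service at the site asks the same `decide` function, adding one DN to either ban set is enough to "pull the plug" everywhere at once.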

13 Site authorization
EGEE
  – SCAS
    Released to production early July for glexec on the WN
    Only deployed on the few sites that helped debugging glexec and its use by ATLAS and LHCb
  – Argus
    In certification
OSG
  – GUMS
  – SAZ
ARC
  – Charon
  – Argus support foreseen

14 Pilot jobs (1)
A pilot job checks and prepares the worker node environment for a real job, i.e. a task that it downloads from a central task queue
  – Late binding leads to good efficiency
A multi-user pilot job can pick up a task from any user in the VO
The task should run with its own associated proxy
  – Access services, store data etc. with the correct identity
It should run under an account corresponding to that proxy
  – Separate users as the CE head node would have done
  – Protect the pilot proxy against malicious payloads
A setuid root utility is needed to switch to the correct identity
  – Like “sudo” or Apache “suexec” → gLExec
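Late binding and the identity switch can be illustrated with a toy pilot in Python. Everything here is hypothetical: the task shape, the function `run_pilot` and the `run_as` callable, which merely stands in for the identity-switching step that the slide assigns to gLExec. No real gLExec interface is shown.

```python
from collections import deque
from typing import Callable, Deque, Optional, Tuple

Task = Tuple[str, str]  # (owner DN, payload command); hypothetical shape


def run_pilot(queue: Deque[Task],
              node_ok: Callable[[], bool],
              run_as: Callable[[str, str], int]) -> Optional[int]:
    """Late binding: a task is chosen only once the node is known to be usable."""
    if not node_ok():
        return None  # broken node: no user task is wasted on it
    if not queue:
        return None  # nothing to do
    owner_dn, payload = queue.popleft()
    # run_as stands in for the identity switch, so that the payload runs
    # under the task owner's account and not under the pilot's.
    return run_as(owner_dn, payload)
```

Because the task leaves the queue only after the environment check succeeds, a faulty worker node costs a pilot slot and no user job.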

15 Pilot jobs (2)
Each experiment has a pilot job framework
  – ALICE: AliEn
  – ATLAS: PanDA
  – CMS: glideinWMS, only used on OSG
  – LHCb: DIRAC
All examined by GDB Pilot Job Frameworks Review group
Current usage
  – Production managers run VO workload for many/all users
  – Individual users may be able to run their own jobs
Foreseen usage
  – Pilot jobs use glexec to run payload under user account
Problem: we have no production experience with glexec and there is little time left before the LHC starts

16 Virtual machines and clouds
Running each job in its own VM is desirable
  – Reduce security interference between jobs
    Shared software area and shared services remain
  – Local files left behind can be cleaned up completely
  – Implemented at some sites and becoming more popular
Shared SW area not needed when SW included in the image
  – Avoids Trojan horses and bottleneck
Complete images also are a natural fit for clouds
Some sites are experimenting with clouds

17 Data security (1)
Fine-grained security policies for data access are possible in principle
In practice there are only 2 levels of security today
  – Production managers are responsible for the vast majority of a VO’s data volume (99%)
  – Only they have write access to specific resources used in managing production data
    Reserved sub-trees in the catalog name space
    Reserved disk pools and tape access
  – All the remaining resources are group-writable
    By default writable for the whole VO!
Different groups in a VO can be shielded from each other
  – If they are mapped differently
  – This may require site admin intervention
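The two security levels described on this slide reduce to a small write-permission check. The Python sketch below is an assumption-laden illustration: the reserved path prefix, the FQAN strings and the function `may_write` are hypothetical, and real storage services enforce this through their own mapping and ACL mechanisms.

```python
from typing import Sequence

# Hypothetical names, chosen only for this example.
VO_FQAN = "/atlas"
PRODUCTION_FQAN = "/atlas/Role=production"
RESERVED_PREFIXES = ("/grid/atlas/prod/",)  # reserved sub-trees of the name space


def may_write(path: str, fqans: Sequence[str]) -> bool:
    """Level 1: reserved areas need the production role.
    Level 2: everything else is writable by any member of the VO."""
    if path.startswith(RESERVED_PREFIXES):
        return PRODUCTION_FQAN in fqans
    # Any attribute inside the VO grants write access to the remaining space.
    return any(f == VO_FQAN or f.startswith(VO_FQAN + "/") for f in fqans)
```

Shielding groups from each other would need a third branch that compares the group in the FQAN with an owner group on the path, which is the extra mapping work the slide says may require site admin intervention.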

18 Data security (2)
BeStMan
  – Classic grid-mapfile, GUMS
CASTOR
  – Classic grid-mapfile, insecure RFIO!!
dCache
  – gPlazma supports GUMS, vo-role-mapfile, …
DPM, LFC
  – Maps to virtual UIDs and GIDs (defined in DB)
  – Native VOMS support, fallback on classic grid-mapfile
    Lcgdm-mapfile to determine the VO for a plain grid proxy
    Grid-mapfile is needed by DPM GridFTP server
StoRM
  – Native VOMS support
  – Uses just-in-time ACLs to give access to data on cluster FS

19 Other services
Information system
  – Insecure LDAP
    Anyone can search for vulnerable hosts
    Information can be corrupted (DNS spoofing, MITM attack)
  – Any site can claim it supports any VO
    The VO can configure a filter to get rid of unwanted sites or run a private, static information system
  – Filters currently work only for Computing and Storage Elements
Monitoring
  – When secure, often viewable for any DN from a trusted CA
Accounting
  – Secure
  – Privacy
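Since any site can claim to support any VO, the VO-side filter mentioned above is essentially an allowlist applied to what the information system publishes. The Python sketch below shows the principle only. The record layout, the site names and the function `filter_sites` are hypothetical and do not reflect the actual information system schema.

```python
from typing import Dict, Iterable, List, Set


def filter_sites(published: Iterable[Dict], trusted_sites: Set[str], vo: str) -> List[Dict]:
    """Keep only entries that claim to support `vo` AND come from a site
    the VO has explicitly chosen to trust."""
    return [
        entry for entry in published
        if vo in entry["vos"] and entry["site"] in trusted_sites
    ]


# Hypothetical published records: a site may claim any VO it likes.
PUBLISHED = [
    {"site": "SITE-A", "vos": ["atlas", "cms"]},
    {"site": "ROGUE-SITE", "vos": ["atlas"]},
    {"site": "SITE-B", "vos": ["cms"]},
]
```

Here `filter_sites(PUBLISHED, {"SITE-A", "SITE-B"}, "atlas")` keeps only the `SITE-A` entry: the rogue site is dropped despite its claim, and `SITE-B` is dropped because it does not publish the VO.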

20 SSO, identity providers
SSO for services is popular
Identity providers
  – Kerberos
  – Shibboleth
  – …
Why should grid usage be excluded?
SSO identity can be translated into grid identity
  – FNAL Kerberos CA, SLCS
  – SWITCH SLCS
  – …

21 Vulnerability aspects
EGEE Grid Security Vulnerability Group has >70 open issues
  – The vast majority of them are deemed low risk
    …for now
A complete list of domains involved in WLCG could be used to configure service firewalls accordingly
  – Outbound client connections might also be constrained
Jobs/payloads should be signed by the user proxy
  – Close the door to “easy” injection of rogue jobs
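The firewall suggestion comes down to checking whether a peer's hostname falls inside a known list of domains. The Python sketch below illustrates that check only. The domain list is a made-up example, not an actual WLCG list, and a real firewall would normally work on IP ranges derived from such a list, not on hostnames.

```python
from typing import Iterable

# Hypothetical example list; the complete list of WLCG domains is not given in the talk.
ALLOWED_DOMAINS = ("example-lab.org", "example-university.edu")


def host_allowed(hostname: str, domains: Iterable[str] = ALLOWED_DOMAINS) -> bool:
    """True if `hostname` equals an allowed domain or is a subdomain of one."""
    host = hostname.lower().rstrip(".")
    for domain in domains:
        domain = domain.lower()
        # Require a dot boundary so that "evil-example-lab.org" does not match.
        if host == domain or host.endswith("." + domain):
            return True
    return False
```

The same predicate could be applied to outbound connections from clients, which the slide mentions as a further possible constraint.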

22 Conclusions
Security aspects of WLCG clients and services show a forest of libraries, configurations and features
  – A lot of legacy
More consistency and simplicity are highly desirable
Some important functionalities only implemented partially
  – Banning
  – Site-wide policies
  – Data protection
There are steady improvements and road maps
  – To get us out of the woods…
