Metadata Organization and Management for Globalization of Data Access with Michał Wrzeszcz, Krzysztof Trzepla, Rafał Słota, Konrad Zemek, Tomasz Lichoń,


1 Metadata Organization and Management for Globalization of Data Access with Onedata
Michał Wrzeszcz, Krzysztof Trzepla, Rafał Słota, Konrad Zemek, Tomasz Lichoń, Łukasz Opioła, Darin Nikolow, Łukasz Dutka, Renata Słota, Jacek Kitowski
ACC Cyfronet AGH; Department of Computer Science, AGH-UST
PPAM 2015, Krakow, Poland, September 6-9, 2015

2 Agenda
- Motivation
- Problems with Global Data Access
- Is a new tool needed?
- Onedata Design Assumptions
- Key Aspects of Data Access
- Global data organization
- Globally distributed metadata
- Results
- Conclusions

3 Motivation
- Scientific communities require global access that integrates independently managed resources.
- Metadata organization and management is key to making global access effective, simple and convenient.

4 Problems with Global Data Access
- Storage heterogeneity and delay/bandwidth issues.
- Manual transfer of data before and after computations.
- No integration of accounts: difficult access (security issues) and problematic data sharing.

5 Is a new tool needed?
iRODS, LFC, Dropbox, GoogleDrive, Globus Connect, Gluster, PanFS, BeeFS, Parrot

6 Onedata - Design Assumptions
- All organizations (providers) supporting a user have access to all data and metadata concerning that user.
- No central metadata server, for the sake of performance and availability.
- No replication of everything to everyone; data redundancy is managed optimally.
- Data access efficiency: minimal overhead when the data is close to the client, and efficient fragment access when the data is remote.

7 Onedata - Key Aspects of Data Access
- Global data organization: hides the complexity of data distribution from users; indicates which remote data should be observed by each organization.
- Globally distributed metadata: no trust between providers; caching vs. coherency.

8 Global data organization
- Easy management and sharing of data for users.
- Limits the metadata that each provider needs to know.

9 Global metadata distribution
- 3 metadata levels: metadata used to coordinate providers' cooperation; files metadata stored by each provider; current usage metadata.
- Usage optimization: lower level -> more frequent usage -> higher distribution; caching and aggregation of changes; pushing changes to caches.

10 Global metadata distribution: Level 1
- Supports cooperation (integration of user accounts).
- Provides information on which lower-level metadata should be synchronized with whom (spaces metadata).
- Stored by the Global Registry, a distributed application that works as a trusted mediator.
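The role of spaces metadata in deciding who synchronizes with whom can be sketched as below. This is only an illustration under stated assumptions: `SpaceRegistry`, `support` and `sync_peers` are hypothetical names invented for this sketch and are not Onedata's Global Registry API.

```python
from collections import defaultdict


class SpaceRegistry:
    """Hypothetical stand-in for Level 1 (spaces) metadata."""

    def __init__(self):
        # space name -> set of providers that support the space
        self._supporters = defaultdict(set)

    def support(self, space, provider):
        """Record that `provider` supports `space`."""
        self._supporters[space].add(provider)

    def sync_peers(self, space, provider):
        """Providers with which `provider` should exchange lower-level
        metadata for `space`. Empty if it does not support the space."""
        supporters = self._supporters.get(space, set())
        if provider not in supporters:
            return set()
        return supporters - {provider}


registry = SpaceRegistry()
registry.support("space-a", "provider-1")
registry.support("space-a", "provider-2")
registry.support("space-b", "provider-3")
print(registry.sync_peers("space-a", "provider-1"))
```

A provider that does not support a space gets an empty peer set, which mirrors the idea that each provider only needs to know a limited part of the metadata.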

11 Global metadata distribution: Level 2
- Files metadata.
- File parts location description.
- Stored by each provider that supports a particular space.
- Fast access to needed metadata.
- Limited number of synchronization operations.
- Propagation of changes on the basis of Level 1 metadata.
- Aggregation of changes.
- Automatic conflict resolution.
- Level 1 metadata caching.
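One way to picture the aggregation of changes is to merge overlapping or adjacent modified byte ranges of a file into fewer entries before they are propagated. The sketch below assumes changes are described as `(offset, size)` pairs; that representation and the function name `aggregate_ranges` are assumptions made for illustration, since the slides do not say how Onedata encodes file parts.

```python
def aggregate_ranges(changes):
    """Merge overlapping or adjacent (offset, size) byte ranges.

    Returns a sorted list of (offset, size) pairs covering the same bytes
    with as few entries as possible.
    """
    merged = []  # list of [start, end) pairs, kept sorted by start
    for start, size in sorted(changes):
        end = start + size
        if merged and start <= merged[-1][1]:
            # Overlaps or touches the previous range: extend it.
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [(s, e - s) for s, e in merged]


# Four reported writes collapse into two ranges to synchronize.
print(aggregate_ranges([(0, 10), (5, 10), (30, 5), (15, 5)]))
```

Fewer ranges mean fewer synchronization operations between the providers that support the space.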

12 Global metadata distribution: Level 3
- Metadata about current file usage: who should be notified about a file change, and where data is currently being modified.
- Stored by providers, cached by clients.
- First aggregation on the client side, second on the provider side.
- Updates Level 2 metadata.
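The two-stage aggregation (first at the client, then at the provider) can be sketched with two chained buffers. Everything here is hypothetical: the `Aggregator` class, the event-count thresholds and the bytes-written payload are assumptions chosen for the example, not the actual Onedata mechanism.

```python
class Aggregator:
    """Buffers per-file write sizes and flushes them to `sink` in batches."""

    def __init__(self, threshold, sink):
        self.threshold = threshold  # number of events that triggers a flush
        self.sink = sink            # callable receiving a {file_id: bytes} dict
        self.pending = {}
        self.count = 0

    def record(self, file_id, bytes_written):
        self.pending[file_id] = self.pending.get(file_id, 0) + bytes_written
        self.count += 1
        if self.count >= self.threshold:
            self.flush()

    def flush(self):
        if self.pending:
            batch, self.pending, self.count = self.pending, {}, 0
            self.sink(batch)


# Stage 2: the provider aggregates client batches before updating Level 2.
level2_updates = []
provider = Aggregator(threshold=2, sink=level2_updates.append)


def to_provider(batch):
    for file_id, total in batch.items():
        provider.record(file_id, total)


# Stage 1: the client aggregates individual write events.
client = Aggregator(threshold=3, sink=to_provider)
for _ in range(6):
    client.record("f1", 100)

print(level2_updates)
```

In this run six client-side write events end up as a single update, which illustrates why more frequent changes are kept at the lower level.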

13 Global metadata distribution: Sum up
- Caching and aggregation vs. time needed to gain global consistency.
- Balance set at the provider level (dynamic reconfiguration of clients).
- Locks for immediate consistency.
[Diagram: Global Registry holds Level 1; Provider 1 and Provider 2 each hold Level 2, Level 3 and a Level 1 cache; Client holds a Level 3 cache. More changes -> lower level -> more power.]

14 Results: Simplicity
- Easy organization of data.
- Global distribution hidden.
- Easy publishing of results.

15 Results: Cooperation

16 Results: Efficiency

17 Conclusions
- Data organization allows hiding global distribution from users while keeping providers' independence.
- Ready for global cooperation of users.
- Efficient enough for computations.
Onedata status
- Onedata v1 is installed in the production environment of ACC Cyfronet AGH.
- Onedata v2 is currently being tested by international organizations.

18 Thank you
Onedata homepage: http://www.onedata.org

