Grid Applications and Repositories Head, ANU Internet Futures, Lead, APAC Information Infrastructure Program, APAC Grid Services.


1 Grid Applications and Repositories Markus.Buchhorn@anu.edu.au Head, ANU Internet Futures, Lead, APAC Information Infrastructure Program, APAC Grid Services Architect, Grid Services Coordinator, GrangeNet

2 Overview
- Common and Uncommon Issues from Diverse “Grid” Application Areas
- E-Research activities
- Also relevant to education
- Large range of community ICT literacy
- Scholarly Input and Output
- Slice by issue, dice by application

3 The bigger context: e-Research + infrastructure
- The use of IT to enhance research
  - and education!
  - Access distributed resources transparently
  - Make data readily and appropriately available
  - Make collaboration easier
- Is it The Grid?
  - No, and yes – the Grid is a tool in the kit

4 What are the bits in eRI?
- Network Layer (Physical and Transmission)
- (Advanced) Communications Services Layer
- Applications, Grid, Middleware Services Layer
- Applications and Users…

5 What’s in that middle bit?
- Computing, Visualisation, Collaboration, Data, Instruments, Middleware
- (Advanced) Communications Services Layer
- Applications and Users…

6 A (local) data architecture
- A Repository
- Object Store: Files, DB, streams, instruments
- Metadata DB: Scientific, Management, Annotation, Preservation, Access, …
- Access Interface
- Presentation Interface
- Disk, Tape, HSM, RAM, …
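The split between an object store, a metadata DB and an access interface on this slide can be illustrated with a minimal sketch. The class name `Repository`, its methods and the sample records are all hypothetical, invented for illustration; a real repository would sit on disk, tape or HSM rather than in memory.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Repository:
    """Toy local repository: an object store plus a separate metadata DB."""

    # Object store: payloads keyed by object id (stand-in for files, DB rows, streams).
    objects: Dict[str, bytes] = field(default_factory=dict)
    # Metadata DB: one flat key/value record per object id.
    metadata: Dict[str, Dict[str, str]] = field(default_factory=dict)

    def ingest(self, object_id: str, payload: bytes, **meta: str) -> None:
        """Store the payload and its metadata record together."""
        self.objects[object_id] = payload
        self.metadata[object_id] = dict(meta)

    def search(self, **criteria: str) -> List[str]:
        """Access interface: ids of objects whose metadata matches every criterion."""
        return sorted(
            oid
            for oid, meta in self.metadata.items()
            if all(meta.get(key) == value for key, value in criteria.items())
        )

    def fetch(self, object_id: str) -> Optional[bytes]:
        """Access interface: return the stored payload, or None if unknown."""
        return self.objects.get(object_id)


# Hypothetical sample data, invented for illustration.
repo = Repository()
repo.ingest("survey-001", b"raw gravity readings", domain="geo", kind="scientific")
repo.ingest("corpus-007", b"transcribed speech", domain="ling", kind="scientific")
print(repo.search(domain="geo"))
```

Keeping the metadata records apart from the payloads means a search never has to touch the (possibly very large) objects themselves, which is one plausible reading of why the slide draws them as separate boxes.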

7
- It’s not just users
  - Other services act on users’ behalf, or each other’s
  - Must operate within the same frameworks and standards
[Diagram labels: Rep., IRP, Repository Federation, “Portal” or Federation interface, AAA Services, Metadata-flows, Users, Computing, Collaboration, Visualisation, Access protocols, Queries, Curation, AAA flows, Data Grids, Federated Repositories, Virtual Collections, …, proxy]
- This all applies even with a single repository

8 Application Areas - 1
- Geosciences
  - Minerals, oils and gases, tectonics
  - Govt, Surveys, Industry
  - Many data sources (spatial and physical) and simulations
- Bioinformatics
  - Genetics, proteomics, …
  - Public datasets, private queries, private annotations

9 Application Areas - 2
- High Energy Physics
  - Large expensive instruments, projects
  - Massive data, computation and simulation
- Earth Systems Sciences
  - Climate studies, oceanography
  - Massive remote sensing data set, large and complex simulations
- Astronomy
  - Big data, complex reduction process, big simulations, long-term research

10 Application Areas - 3
- Linguistics, Musicology
  - Archives of digitised cultural material
  - Complex analyses
- Social Science Data
  - Census, health, surveys, …
  - Complex data structures, qualitative data
- Archaeology
  - Digitised physical materials, spatial and chronological data

11 Application Areas - 4
- Financial
  - Many sources, SX, FX, news, …
  - Timeliness (low-delay, high-throughput) and long time scales are important
- Music, Arts, Sports
  - Performance, formal and practice
  - Education focus

12 Longevity
- Sustainability
  - Data formats
    - Descriptions, Compression, lifetimes
    - Simplex vs Complex (compound) objects
  - Software
    - Algorithms, implementations, Operating Systems
  - Versioning
    - Recalculation, interpretation, validation, derivatives
- Community valuation and quality
- Underlying infrastructure, technologies
  - Storage Facilities
  - Mirroring for protection – policy and technical issues
Application areas: Geo, Bio, ESS, Astro, Ling, SS, Arch, Fin, Mus.

13 Metadata
- Varied research schemas
  - 1 is nice, most have zero or five…
- Baseline DC
  - Almost non-existent…
- Provenance and processing
- Preservation, curation and valuation
- Subjective metadata, annotations
- Scientific description
  - Itself subjective, and contentious…
Application areas: Geo, Bio, ESS, Astro, Ling, SS, Arch, Fin, Mus.
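Reading “Baseline DC” as the Dublin Core element set (an assumption; the slide does not expand the abbreviation), a baseline check could be sketched as follows. The function name `missing_dc_elements` and the sample record are hypothetical, invented for illustration.

```python
from typing import Dict, List

# The 15 elements of the Dublin Core Metadata Element Set.
DC_ELEMENTS = (
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
)


def missing_dc_elements(record: Dict[str, str]) -> List[str]:
    """Return the Dublin Core elements that are absent or empty in a record."""
    return [name for name in DC_ELEMENTS if not record.get(name, "").strip()]


# Hypothetical record, invented for illustration.
record = {
    "title": "Coastal sediment cores",
    "creator": "Example Survey Team",
    "date": "2005",
    "rights": "",  # present but empty, so it still counts as missing
}
print(missing_dc_elements(record))
```

A report like this only measures coverage of the baseline; it says nothing about the domain-specific research schemas, provenance or annotations listed on the slide.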

14 Lifecycles
- Workflows for data to be
  - Acquired
  - Ingested, Curated
  - Delivered
- Vary over time as we learn things
- Vary over time as we value things
- Data needs to be reprocessed
  - How does that impact the existing stored data?
- Workflows themselves become part of the metadata and need to be stored and managed
Application areas: Geo, Bio, ESS, Astro, Ling, SS, Arch, Fin, Mus.
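One way to picture workflows becoming part of the metadata is to record, for each stored object, which workflow version produced it, so that a change in the workflow identifies what needs reprocessing. This is a minimal sketch under that assumption; the version labels, object ids and the function `needs_reprocessing` are hypothetical.

```python
from typing import Dict, List

# Hypothetical provenance metadata: object id -> version of the workflow that produced it.
provenance: Dict[str, str] = {
    "obs-2004-01": "reduce-v1",
    "obs-2004-02": "reduce-v1",
    "obs-2005-01": "reduce-v2",
}


def needs_reprocessing(records: Dict[str, str], current_version: str) -> List[str]:
    """List the objects whose recorded workflow version differs from the current one."""
    return sorted(oid for oid, version in records.items() if version != current_version)


print(needs_reprocessing(provenance, "reduce-v2"))
```

The sketch only flags candidates. Whether the older derived data is replaced or kept alongside the new version is the open question the slide raises.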

15 Data Movement
- Performance vs political requirements
  - Mirroring/Caching; federated repositories
  - Movement across policy boundaries
- Collision with authorisation
  - Some data cannot move from its host (in bulk)
- Appropriate Delivery needs
  - Remote/field access to data
  - Clients in a different ‘circle’
    - Bandwidth, compute, language, culture
- Movement Protocols
  - Access protocols and inter-repository protocols
  - One standard is great – ten are not
  - Resource discovery
Application areas: Geo, HEP, Ling, SS, Arch, Fin, Mus.
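The tension between performance (mirror the data) and policy (some data cannot leave its host in bulk) can be sketched as a simple planning rule. Everything here is an illustrative assumption: the `Dataset` fields, the site names and the three outcome labels are invented, not taken from the talk.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Dataset:
    name: str
    host: str                   # site that holds the authoritative copy
    bulk_export_allowed: bool   # policy flag: may the data leave its host in bulk?


def plan_access(dataset: Dataset, client_site: str) -> str:
    """Choose how a client at client_site should reach the dataset."""
    if dataset.host == client_site:
        return "local"          # no movement needed
    if dataset.bulk_export_allowed:
        return "mirror"         # copy or cache the data near the client
    return "remote-query"       # send the query to the data, return only results


# Hypothetical datasets, invented for illustration.
census = Dataset("census-unit-records", host="site-a", bulk_export_allowed=False)
seismic = Dataset("seismic-lines", host="site-a", bulk_export_allowed=True)
print(plan_access(census, "site-b"), plan_access(seismic, "site-b"))
```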

16 Rights
- Needs AAA to be working, to scale
  - Authentication, Authorisation and Accounting
  - Requires identities and roles and policies to be understood
- Privacy, Security
  - Personal information leakage
  - Anonymised and de-identified data needs to stay usable
- Ownership
  - Not always with the researcher
- Time-varying
  - Data sourced under old agreements
  - Rights vary by status of source
    - people die, agreements expire, …
Application areas: Geo, Bio, HEP, ESS, Astro, Ling, SS, Arch, Fin, Mus.
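Time-varying rights imply that an authorisation decision depends on the date as well as on the requester's role. The following is a minimal sketch of that idea, assuming a hypothetical `Agreement` record with a set of permitted roles and an optional expiry date; it is not a model of any particular AAA system.

```python
from dataclasses import dataclass
from datetime import date
from typing import FrozenSet, Optional


@dataclass(frozen=True)
class Agreement:
    allowed_roles: FrozenSet[str]
    expires: Optional[date] = None  # None means the agreement has no expiry


def may_access(role: str, agreement: Agreement, today: date) -> bool:
    """Authorise only if the agreement is still in force and covers the role."""
    if agreement.expires is not None and today > agreement.expires:
        return False
    return role in agreement.allowed_roles


# Hypothetical agreement, invented for illustration.
old_deposit = Agreement(frozenset({"researcher", "curator"}), expires=date(2005, 12, 31))
print(may_access("researcher", old_deposit, date(2005, 6, 1)))
print(may_access("researcher", old_deposit, date(2006, 1, 1)))
```

Passing `today` in explicitly, rather than reading the clock inside the function, keeps the check repeatable and makes it easy to ask what access would have looked like under an older agreement.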

17 Types
- Digital
- Non-Digital
  - Paintings, Objects, Manuscripts
- Semi-Digital
  - Books, texts, images, film
- Quantitative and Qualitative
  - Describing, searching and finding useful qualitative data is hard
Application areas: Ling, SS, Arch, Fin

18 Processing
- Data fusion
  - Single or multiple repositories
- Data slicing, latitudinal searches
  - Impacts technology choices
- Interfaces for non-humans
  - computing, collaboration, visualisation
Application areas: Geo, Bio, Chem, HEP, ESS, Astro, Ling, SS, Fin

19 Summary
- Common and Uncommon Issues from Diverse Application Areas
- One size (infrastructure) does not fit all (yet)
  - But 3-4 (40?) sizes may fit most (for now)
- Some domains have very different definitions of sustainability, rights issues, data movement, etc.
  - But many don’t…
- User and developer education is still needed

