Grid-Enabling Data: Sticking Plaster, Sellotape, & Chewing Gum? Colin C. Venters National Centre for e-Social Science University of Manchester
Terms of Reference
• Data: numbers, characters, and images which can be processed and transmitted by [humans] and [machines].
  – Unstructured.
  – Semi-structured.
  – Structured.
• Database Management System (DBMS): a suite of programs which manage the storage and retrieval of large structured sets of persistent data.
• Database: one or more large structured sets of persistent data and one component of a database management system.
• Federated databases: data integration using middleware.
What’s in a Grid?
• Computational Grids: high performance computing resources.
• Data Grids: access to heterogeneous datasets.
• Access Grid: advanced video conferencing-based collaborative environment.
• The Grid makes it possible to share heterogeneous, distributed resources over a network.
The Grid Metaphor
[Figure: Grid middleware linking visualization, workstations, mobile access, supercomputers and PC clusters, DBMS, sensors, experiments, and networks.]
Data Integration
• Unimpeded use of distributed, heterogeneous, autonomous data resources.
  – An integrated view of the data resources that allows users to interact with them as if they constituted a single, global, integrated data resource.
• Data integration fosters collaboration, one of the fundamental goals of e-research.
• Limited DBMS support for Grid integration.
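The idea of an integrated view can be illustrated with a minimal sketch. It assumes two small sources: an in-memory SQLite database standing in for a remote DBMS, and a CSV text standing in for a file held elsewhere. The table, column names, and figures are invented for illustration and do not come from the presentation.

```python
import csv
import io
import sqlite3

# Source 1: a relational database (an in-memory SQLite database stands in
# for a remote DBMS). Table and column names are hypothetical.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE population (ward TEXT, residents INTEGER)")
db.executemany(
    "INSERT INTO population VALUES (?, ?)",
    [("Ardwick", 1200), ("Hulme", 950)],  # made-up figures
)

# Source 2: a CSV file held elsewhere, with different column names.
CSV_TEXT = "area_name,unemployed\nArdwick,84\nHulme,57\n"


def read_population(conn):
    """Return {ward: residents} from the relational source."""
    return dict(conn.execute("SELECT ward, residents FROM population"))


def read_unemployment(text):
    """Return {ward: unemployed} from the CSV source."""
    reader = csv.DictReader(io.StringIO(text))
    return {row["area_name"]: int(row["unemployed"]) for row in reader}


def integrated_view(conn, text):
    """Join both sources on ward name: one combined record per ward."""
    population = read_population(conn)
    unemployment = read_unemployment(text)
    return [
        {"ward": ward, "residents": population[ward], "unemployed": unemployment[ward]}
        for ward in sorted(population.keys() & unemployment.keys())
    ]


view = integrated_view(db, CSV_TEXT)
for row in view:
    print(row)
```

The caller only sees `integrated_view`; where each column actually lives, and in what format, stays hidden behind the two reader functions.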
Grid-Enabling: Grid Middleware
• GridFTP
  – High-performance data transfer protocol.
• Storage Resource Broker (SRB)
  – Uniform interface to a virtual distributed data storage resource.
• Open Grid Services Architecture Data Access and Integration (OGSA-DAI)
  – Grid Data Service (GDS). Standard interface for database access.
  – Grid Data Service Factory (GDSF). Establishes a database service instance.
  – Database Access and Integration Service Group Registry (DAISGR). Identifies available database services.
  – OGSA-DQP. Distributed Query Processing, i.e. search across multiple databases.
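The division of labour between registry, factory, and data service can be sketched in plain Python. This is not the OGSA-DAI API: the class and method names below are hypothetical stand-ins, and an in-memory SQLite database plays the role of the underlying DBMS.

```python
import sqlite3


class DataService:
    """Stand-in for a Grid Data Service: a uniform query interface to one database."""

    def __init__(self, connection):
        self._connection = connection

    def perform(self, sql, params=()):
        return self._connection.execute(sql, params).fetchall()


class DataServiceFactory:
    """Stand-in for a Grid Data Service Factory: creates service instances on request."""

    def __init__(self, name, setup_statements):
        self.name = name
        self._setup_statements = setup_statements

    def create_service(self):
        connection = sqlite3.connect(":memory:")
        for statement in self._setup_statements:
            connection.execute(statement)
        return DataService(connection)


class Registry:
    """Stand-in for the service group registry: lets clients discover factories."""

    def __init__(self):
        self._factories = {}

    def register(self, factory):
        self._factories[factory.name] = factory

    def list_services(self):
        return sorted(self._factories)

    def lookup(self, name):
        return self._factories[name]


registry = Registry()
registry.register(
    DataServiceFactory(
        "census",  # hypothetical service name and data
        [
            "CREATE TABLE ward (name TEXT, residents INTEGER)",
            "INSERT INTO ward VALUES ('A', 100)",
            "INSERT INTO ward VALUES ('B', 250)",
        ],
    )
)

# Client flow: discover a service, ask its factory for an instance, query it.
available = registry.list_services()
service = registry.lookup(available[0]).create_service()
rows = service.perform("SELECT name, residents FROM ward WHERE residents > ?", (150,))
print(available, rows)
```

The client never connects to the database directly; it discovers a factory through the registry and works only with the service instance the factory returns.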
ConvertGrid
• ESRC pilot demonstrator project (PDP) in the e-Social Science Programme.
• Research problem: investigating complex research questions that require the combination of datasets from multiple sources.
• Data management:
  – Access to multiple datasets.
• Data fusion:
  – Multiple geo-referenced datasets, i.e. different target geographies, e.g. Wards, 1991 Postcode Sectors.
• Converts data sources with different native geographies to a common Target Geography.
  – CSV or XML format.
  – Results returned as a string or streams (FTP/HTTP/GridFTP).
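The slides do not describe how ConvertGrid performs the conversion. One common approach, assumed here purely for illustration, is a lookup table that assigns a share of each source zone's count to one or more target zones. The zone codes, shares, and output field names below are invented.

```python
import csv
import io
import xml.etree.ElementTree as ET
from collections import defaultdict

# Counts reported on the source geography (hypothetical zone codes and values).
SOURCE_COUNTS = {"S1": 100.0, "S2": 60.0}

# Lookup table: (source zone, target zone, share of the source zone's count
# assigned to that target zone). Shares for each source zone sum to 1.
LOOKUP = [
    ("S1", "T1", 0.75),
    ("S1", "T2", 0.25),
    ("S2", "T2", 1.0),
]


def convert(source_counts, lookup):
    """Apportion source-zone counts to target zones using the lookup shares."""
    target = defaultdict(float)
    for source_zone, target_zone, share in lookup:
        target[target_zone] += source_counts[source_zone] * share
    return dict(target)


def to_csv(target_counts):
    """Serialise the converted counts as CSV text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["zone", "count"])
    for zone in sorted(target_counts):
        writer.writerow([zone, target_counts[zone]])
    return buffer.getvalue()


def to_xml(target_counts):
    """Serialise the converted counts as an XML string."""
    root = ET.Element("results")
    for zone in sorted(target_counts):
        ET.SubElement(root, "zone", code=zone, count=str(target_counts[zone]))
    return ET.tostring(root, encoding="unicode")


converted = convert(SOURCE_COUNTS, LOOKUP)
print(to_csv(converted))
print(to_xml(converted))
```

Once every dataset has been converted to the same target zones, the datasets can be joined zone by zone regardless of the geography each was originally published on.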
Different Target Geographies
Challenges
• Scalability:
  – Performance and capacity requirements.
• Security:
  – Use of Grid Security Infrastructure (GSI) at the Grid service client level is a non-trivial problem.
• Heterogeneity:
  – Infrastructural.
  – Syntactic.
  – Semantic.
• Metadata:
  – Adds context to data, aiding identification, location, and interpretation.
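A small sketch can show how metadata helps with syntactic and semantic heterogeneity. It assumes two sources that name their fields differently (syntactic) and report income in different units (semantic); a metadata record per source says how to map each onto a shared schema. All source names, field names, and figures are hypothetical.

```python
# Metadata describing each source: which field holds the zone code, which
# holds the value, and how to scale the value to a common unit (pounds).
METADATA = {
    "survey_a": {"zone_field": "ward", "value_field": "income", "scale": 1},
    "survey_b": {"zone_field": "WARD_CD", "value_field": "inc_k", "scale": 1000},
}

# Raw records as each source delivers them (values arrive as strings).
RECORDS = {
    "survey_a": [{"ward": "W1", "income": "21500"}],
    "survey_b": [{"WARD_CD": "W2", "inc_k": "19.5"}],
}


def harmonise(source, record, metadata):
    """Map one source record onto a shared schema using that source's metadata."""
    description = metadata[source]
    return {
        "zone": record[description["zone_field"]],
        "income_pounds": float(record[description["value_field"]]) * description["scale"],
        "source": source,
    }


harmonised = [
    harmonise(source, record, METADATA)
    for source, records in RECORDS.items()
    for record in records
]
for row in harmonised:
    print(row)
```

Without the metadata, the value 19.5 from the second source would be indistinguishable from an income of 19.5 pounds; the metadata supplies the context needed to interpret it.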
Further Reading
• Watson, P. (2003). Databases and the Grid. In: Grid Computing: Making The Global Infrastructure a Reality, F. Berman, G. Fox, and A. J. G. Hey (eds.), Wiley, pp
• Cole, K. et al. (2003). Grid Enabling Quantitative Social Science Datasets: A Scoping Study. ESRC
• Atkinson, M. et al. (2004). Data Access, Integration, and Management. In Foster, I. and Kesselman, C. The Grid2: Blueprint for a New Computing Infrastructure, Elsevier, p
Acknowledgements
• ConvertGrid Team, University of Manchester
  – Keith Cole, Jon McLaren, Pascal Ekin, Linda Mason, Stephen Pickles, and Justin Hayes.
• Paul Watson, University of Newcastle
• Alvaro Fernandes, University of Manchester
• Mike Mineter, National e-Science Centre, University of Edinburgh