CONSOLIDATION OF METADATA IN FIELD OF ENVIRONMENTAL SCIENCES Evgeny Vyazilov, All-Russia Research Institute of Hydrometeorological Information – World Data Centre.

1 CONSOLIDATION OF METADATA IN FIELD OF ENVIRONMENTAL SCIENCES Evgeny Vyazilov, All-Russia Research Institute of Hydrometeorological Information – World Data Centre, Obninsk. E-mail: CITES-2005, Novosibirsk, March 19-23 2005 http://

2 Content
- Necessity of metadata creation
- A brief history of metadata development and characteristics of the best-known metadata systems for various metadata objects
- Metadata structure, using ESIMO as an example
- Documentary data as a source of metadata
- Problems of metadata development and use
- Metadata as a basis for monitoring the state of information resources in the environmental field
- Metadata aggregation
- Examples of output forms
- Prospects of metadata development
- Conclusion

3 The word “metadata” on the Internet
Search engine   Russian   English
Google          30100     10700000
Yandex          193084    120302
Yahoo           2790      4710000
MSN             43795     1632490
787 requests in the last month on Yandex

4 Characteristics of environmental data collection and processing systems
- Large data volumes: hundreds of Tb (tens of thousands of files)
- High rate of data set updating: up to 1 Tb per year
- Large number of data sources: tens of thousands of marine expeditions, more than 10 000 hydrometeorological stations and posts
- Small volume of output products, issued periodically
- Variety of requests: from simple requests for information about data to climatic assessments of the environment's impact on economic objects
- Variety of distribution forms (tables, diagrams, maps, directories), which requires a variety of software
- Need for on-line access to data
- Many stages of data processing: observation, collection, cataloguing, applied processing, ...

5 Definitions
Metadata are auxiliary information about data that helps in data processing.
- Semantic metadata: the logical characteristics of data. They indicate where the data are held and who made the observations, where and with what; when and how the data were obtained; on what medium and in what format they are stored; what software was used to enter them; how they were checked, etc.
- Syntactic metadata: information on where data are located on a network or disk. They describe the structure, allowable values, ways of representing them, relationships with other data, distribution and other data characteristics that help to access the data, interpret them correctly and use them.
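The distinction between the two kinds of metadata can be illustrated with a minimal Python sketch. The class names, field names, the archive path and the sample values are all hypothetical and are not taken from ESIMO or any other system named in the talk.

```python
from dataclasses import dataclass, field


@dataclass
class SemanticMetadata:
    """Logical characteristics: who observed, where, with what, when."""
    observer: str      # who made the observations
    platform: str      # where, or from which platform
    instrument: str    # with what the observations were made
    period: tuple      # (first_year, last_year) of observation
    quality_checks: list = field(default_factory=list)


@dataclass
class SyntacticMetadata:
    """Physical characteristics: where the data are and how to read them."""
    location: str      # network or disk address (hypothetical path)
    file_format: str
    allowed_ranges: dict = field(default_factory=dict)  # parameter -> (min, max)

    def is_valid(self, parameter: str, value: float) -> bool:
        # Check a value against the allowable range declared for the parameter.
        low, high = self.allowed_ranges[parameter]
        return low <= value <= high


semantic = SemanticMetadata(
    observer="Example Institute",
    platform="RV Example",
    instrument="CTD probe",
    period=(1990, 2000),
    quality_checks=["range check"],
)
syntactic = SyntacticMetadata(
    location="/archive/example/cruise_001.csv",
    file_format="CSV",
    allowed_ranges={"water_temperature": (-2.0, 40.0)},
)
```

In this sketch the semantic record answers questions a user would ask when searching, while the syntactic record carries what software needs in order to open and interpret the file.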

6 Necessity of metadata
- Metadata are needed not so much by the creators of databases as by those who use environmental data
- Distributed nature of data centres and platforms
- Variety of observation platforms, parameters, and methods and means of obtaining them
- Variety of methods of recording observations (hard copies, technical media)
- Existing paper catalogues, lists and information taken from technical media are already of little help in data search
- Metadata are the basis for the transition to paperless information processing technology (in future, "Data source – Decision Makers")
- Metadata make it possible to find one's way faster in a large flow of data about the natural environment
- In many cases data holders are not interested in having information from their databases used by departments, ministries, economic entities or the public

7 Requirements for metadata
- Metadata should help to answer the questions: what data exist, from where and when the data arrived in the repository, who the author is, when and by whom they were changed, and in what structure they are stored
- Various kinds of metadata are needed: information on the organizations collecting and storing data, data sets, exchange formats, processing software, etc.
- Fine-grained data selection requires metadata attributes that are not present in the original files, for example quantitative characteristics of data flows
- All metadata objects should be stored in one schema
- Metadata should make it possible to find the physical storage address of data from their logical characteristics
- The more complete the metadata base, the more effectively the data can be used
- Creating metadata bases should be a duty of every project, programme and expedition involved in obtaining data
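The requirement that metadata lead from the logical characteristics of data to their physical storage address can be sketched as a simple catalogue lookup. This is an illustrative Python sketch only: the catalogue entries, field names and addresses are invented for the example.

```python
# Hypothetical catalogue: each record pairs logical characteristics
# (parameter, region, year) with a physical storage address.
CATALOGUE = [
    {"parameter": "water_temperature", "region": "Barents Sea",
     "year": 1995, "address": "/archive/ocean/bs_1995_temp.dat"},
    {"parameter": "salinity", "region": "Barents Sea",
     "year": 1995, "address": "/archive/ocean/bs_1995_sal.dat"},
    {"parameter": "water_temperature", "region": "Black Sea",
     "year": 1998, "address": "/archive/ocean/blk_1998_temp.dat"},
]


def find_addresses(catalogue, **criteria):
    """Return the storage addresses of records matching every criterion."""
    return [
        record["address"]
        for record in catalogue
        if all(record.get(key) == value for key, value in criteria.items())
    ]


barents_temperature = find_addresses(
    CATALOGUE, parameter="water_temperature", region="Barents Sea"
)
all_temperature = find_addresses(CATALOGUE, parameter="water_temperature")
```

The user states only what is wanted; the address is resolved from the catalogue, so files can be moved without changing how users search.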

8 Features of metadata
- The volume of metadata is relatively large. For example, the databases describing the coverage of a given area by oceanographic observations are estimated at 1 Gb
- Costly input of information at the initial loading of metadata, followed by modification and repeated use over a fairly long period
- Relatively low update activity, in both frequency and volume
- Need to centralize general information about data and to decentralize local, detailed information about data

9 Documentary data as a source of metadata
- Formalized description of data sets and databases
- Description of data sources (organizations, observation platforms, projects)
- Description of the data storage format
- List of parameters
- Description of data checking methods
- Completeness of a file relative to the original medium or observation programme
- Descriptions of observation programmes (projects)
- Description of measurement methods and equipment used
- List of logical storage units (cruises, squares, geographical areas, etc.) with the number of observations indicated
- Description of software
- List of publications based on the data set
- Classifiers and codifiers used

10 Where are metadata created?
- Making observations: network, observation methods, methods for determining environmental parameters, measuring systems
- Means of making observations: RVs, stations, satellites, buoys
- Data collection: technology, transfer formats, description of the complete data sets transmitted
- Data accumulation: description of data sets, organizations (suppliers, owners, users), formats for data acquisition, storage and exchange, observation projects, RV cruises, information on parameters, codifiers, technologies, data control methods
- Interdepartmental and international data exchange: formats, description of complete data sets, observation projects and programmes
- Data storage and protection: processing technology, processing methods, data control and analysis, software, algorithms for calculating parameters, observation coverage
- Modelling: model, methods, formats of output data
- Data distribution: products (analyses, bulletins, monthly journals, year-books, directories, forecasts, Web sites), software and hardware
- Decision support: impacts of the environment on economic objects, ways of preventing those impacts

11 Structure of metadata
- Data sources:
  - information on organizations
  - information on all RVs and on working RVs
  - information on RV cruises
  - information on hydrometeorological stations
  - information on satellites
  - information on observation networks
  - information on experts
  - information on observation projects
  - information on hydrometeorological equipment
  - information on Web sites
- Information resources:
  - information on data sets and databases
  - Web resources
  - information products
- Data instances:
  - information on observations, profiles, terms
  - information on time series
  - information on grid data
  - information on text and graphic information
- Syntactic metadata:
  - descriptions of data formats
  - dictionaries and codifiers
  - information on models and software
  - dictionary of terms
  - list of abbreviations used
- Spatial metadata:
  - information on maps
  - information on shape-files
  - information on attribute data
  - …

12 Place and role of metadata in data management

13 Connections between information resource (IR) instance descriptions and metadata objects
- Within one metadata object: between various instances of the metadata object, for example, for an IR description, several projects of one programme
- Between different metadata objects: for example, the description of data sets should be accompanied by complete information on organizations, experts, platforms, formats and projects
- Between objects of one type in distributed storage: for example, information about IR maintained by different organizations

14 Metadata at various levels of data management
- Local: observation platform (a single organization). Detailed information is needed on data sets (databases), such as information on RV cruises and their status (in processing, on what medium, etc.) and on coverage for various parameters
- Regional: project, expedition (corporation). Information on each data set and on each unit of collection, accounting and data exchange (cruise, monthly data flow from a coastal station)
- National: information on organizations, data sets, processing software, formats and exchange at the country level, observation platforms, observation networks, etc.
- International: information on international agreements and on data sets submitted for international exchange, including information on cruises and stations, data exchange formats and processing software
- Every level of management has both reference information of a common class (information on data sets, data sources, formats, etc.) and information specific to that level

15 Metadata at various stages of processing
- Collection: as a rule, metadata are stored together with the original data
- Primary processing: information on observation platforms is extracted
- Analysis of information: various metadata objects appear
- Decision support: information on products and the rules for issuing them, as well as possible types of requests and solvable tasks

16 Metadata Standards
- ISO 19115 – international standard for geographic information metadata
- ISO 19139 – XML schema, extension of the ISO metadata schema
- ISO 3166 – geographical regions and countries
XML standards
- Dublin Core – description of Web resources and bibliographic information
- RDF – description of resources with complex, hierarchical connections
- LOM – description of educational resources (Learning Object Metadata)
- XMI – XML Metadata Interchange
- UDDI – Universal Description, Discovery and Integration
- WSDL – Web Services Description Language
- DCML – Data Center Markup Language Framework Specification
- SWEET – Semantic Web for Earth and Environmental Terminology
- OWL – Web Ontology Language
- ESML – Earth Science Markup Language
Other standards
- CDI – Sea Search project (EC)
- DIF – Directory Interchange Format (GCMD, NASA)
- Content Standard for Digital Geospatial Metadata – US Federal Geographic Data Committee
- EDMED – EC standard for marine data
- ROSCOP – IOC standard for cruise data
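As an illustration of one of the simpler standards in this list, the sketch below builds a small Dublin Core description of a data set with Python's standard `xml.etree.ElementTree` module. The namespace URI is the Dublin Core element set namespace; the wrapping `metadata` element, the helper function and all field values are assumptions made for the example, not part of any system described in the talk.

```python
import xml.etree.ElementTree as ET

# Namespace of the Dublin Core Metadata Element Set, version 1.1.
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)


def dublin_core_record(fields):
    """Build an XML element holding one dc:* child per supplied field."""
    root = ET.Element("metadata")  # hypothetical wrapper element
    for name, value in fields.items():
        child = ET.SubElement(root, f"{{{DC_NS}}}{name}")
        child.text = value
    return root


record = dublin_core_record({
    "title": "Example sea temperature profiles",   # illustrative values
    "creator": "Example Institute",
    "date": "2005",
    "format": "text/csv",
})
xml_text = ET.tostring(record, encoding="unicode")
```

Richer standards such as ISO 19115 carry far more structure (spatial extent, lineage, contacts), but the principle of a shared, named set of elements is the same.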

17 Effective data management requires aggregated metadata
- State of IR: aggregated characteristics of databases and their number
- State of observation networks: the equipment and measuring systems used are an important indicator of the quality of an observation network
- Characteristics of information flows: distribution of data over the main regions of the globe, indicating the data-supplying organizations
- Distribution of information by medium and by level of data collection and exchange
- Number of requests fulfilled, indicating trends in users' information needs, etc.

18 Characteristics of IR as a result of processing metadata bases
On the basis of metadata it is possible to run analytical queries and obtain aggregated characteristics, i.e. to analyse the arrival of data from various organizations. For example, one can obtain:
- The number of data sets by organization and region
- The number of RV cruises received in 1990–2000 from various countries and Russian organizations; the number of RV cruises received in 1990–2000 by kind of observation
- The number of stations by square, period, parameter, etc.
Such information supports data management with figures on the number of cruises and stations by area, department, etc.
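An analytical query of this kind can be sketched in a few lines of Python. The cruise records, vessel names and organizations below are invented purely for illustration; a real metadata base would hold them in database tables.

```python
from collections import Counter

# Hypothetical cruise metadata records.
CRUISES = [
    {"vessel": "RV A", "country": "Russia", "organization": "Org 1", "year": 1991},
    {"vessel": "RV B", "country": "Russia", "organization": "Org 2", "year": 1998},
    {"vessel": "RV C", "country": "Ukraine", "organization": "Org 3", "year": 1995},
    {"vessel": "RV A", "country": "Russia", "organization": "Org 1", "year": 2003},
]


def count_cruises(cruises, key, first_year, last_year):
    """Count cruises per value of `key` within an inclusive year range."""
    return Counter(
        cruise[key]
        for cruise in cruises
        if first_year <= cruise["year"] <= last_year
    )


by_country = count_cruises(CRUISES, "country", 1990, 2000)
by_organization = count_cruises(CRUISES, "organization", 1990, 2000)
```

The same function serves both groupings named on the slide (by country and by organization) because the grouping key is a parameter rather than being fixed in the query.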

19 Metadata aggregation
- Number of logical units of data (RVs, expeditions, observation platforms, structures, parameters) over the observation period for a geographical object or organization
- Information on data set, IR
- Information on data sources (observing networks, platforms, HMS, RVs, satellites, etc.)
- Information on observations at a point
- Number of observations per unit of data collection (stations, time) over the observation period
- Number of stations, profiles and levels for every parameter
Diagram label: Aggregation level

20 Characteristics of aggregated information at various stages of processing
- Making observations: number of information sources (RVs, coastal stations, buoys, etc.), volumes of information received from one source (urgent, daily, monthly, annual)
- Data sorting in the data centre: by time (daily, monthly, annual), by space (stations, cruises, territory)
- Data processing: volume of information processed, processing time
- Distribution of information: volume of output information; periodicity of presentation (day, week, ten-day period, month, quarter, year); spatial grouping of data (region, water area); sorting of data (by time, by space, by space and time simultaneously)

21 Metadata aggregation for various levels of management
- Higher organizations: general information on DB status, IR updating, portal visits
- Management: general information on DB status, IR updating, status of metadata bases, various IR
- Users: general information on the DB and subschemas
- Application programmers (application developers): general information on DB status and subschemas, detailed information on tables
- DB administrators, application developers: general information on the DB and subschemas, detailed information on tables, special information on rights and roles for subschemas, tables and parameters

22 IR monitoring
IR monitoring is the operational observation of data flows in order to prepare information and analytical materials and to support decisions on the development of IR, DBs and applications, with the aim of improving information support.
Basic idea: DB administrators should be able to obtain metadata on the quantitative characteristics of DBs and their updating at any moment and from any place.
Basis of monitoring: metadata and applications developed for obtaining information about the source databases.

23 Functions of IR monitoring
- Obtaining information on the contents and volumes of individual subschemas (objects) and tables at various degrees of aggregation
- Obtaining information on data coverage of individual areas for any kind of data and any individual parameter
- Visualization of any kind of metadata as tables according to search criteria
- Information and analytical tasks in IR management (estimating volumes, coverage of individual subject domains and geographical areas, forecasting IR development, etc.)
- Estimating visits to the portal and to individual IR
- Distribution of information and analytical materials
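The first function, reporting the volume of each table, can be sketched with Python's built-in `sqlite3` module. SQLite and its `sqlite_master` catalogue table are used here only because they make a self-contained example; the talk does not say which database system ESIMO uses, and the table names and rows are invented.

```python
import sqlite3


def table_volumes(connection):
    """Return {table_name: row_count} for every table in the database."""
    cursor = connection.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    )
    volumes = {}
    for (name,) in cursor.fetchall():
        # Table names come from the database's own catalogue, not user input.
        count = connection.execute(f'SELECT COUNT(*) FROM "{name}"').fetchone()[0]
        volumes[name] = count
    return volumes


# Hypothetical in-memory database standing in for a source database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cruises (id INTEGER PRIMARY KEY, vessel TEXT)")
conn.execute("CREATE TABLE stations (id INTEGER PRIMARY KEY, cruise_id INTEGER)")
conn.executemany("INSERT INTO cruises (vessel) VALUES (?)", [("RV A",), ("RV B",)])
conn.executemany("INSERT INTO stations (cruise_id) VALUES (?)", [(1,), (1,), (2,)])

volumes = table_volumes(conn)
```

Running such a summary on a schedule and storing the results would give the history of updating that the monitoring slides describe.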

24 Basic approach to creating the IR monitoring system
Maximum use of previously created applications, for example:
- Obtaining information about RV cruises
- Data coverage
- Working with the Unified Dictionary of Parameters (UDP)
- Metadata search
- Reference information on DB subschemas

25 History of the development of information systems for metadata
- 1969. RIHMI-WDC. Description of hydrometeorological data sets
- 1971. RIHMI-WDC, VNIIMORGEO. Metadata on expeditions, profiles, stations (multi-level data structure)
- 1977. IOC UNESCO (MEDI). Metadata on organizations, catalogue of stations, information on data sets held in various countries, detailed description of files
- 1984. RIHMI-WDC. Complex metadata (data sets, cruises, projects, coverage, etc.)
- 1987. WMO. INFOCLIMA. Bibliography and information on data sets
- 1997. RIHMI-WDC. Electronic directory
- 1998. UA, MGI. Metadata (projects, experts, information on cruises)
- 1999. RIHMI-WDC. ESIMO metadata system (Internet)
- 2002. EU. SeaSearch project. 4 metadata objects (data sets, cruises, projects, CDI)
- 2004. RIHMI-WDC. Pilot IOC project. XML schema for metadata

26 The best-known metadata systems
- CIESIN ( ) - Consortium for International Earth Science Information Network
- EDMED ( ) - information on data sets (4 thousand)
- GCMD - information on data sets (10 thousand)
- INFOTERRA ( ) - the Global Environmental Information Exchange Network
- Oceanic ( ) - information on the programmes of research vessels
- OceanPortal ( - 5 thousand web sites
- OceanExpert ( – information on more than 10 thousand experts
- ROSCOP ( ) - information on RV cruises under declared national programmes, about 10 thousand expeditions
- ACOD (RIHMI-WDC) - catalogue of cruises (33 thousand expeditions)
- EOSDIS (

27 Structure and connections of metadata objects (example from the field of oceanography)

28 Methods of metadata consolidation, so that metadata can be obtained from one source
- Develop a uniform metadata schema for all objects
- For each metadata object, create Java classes that can be used in any other metadata object (organizations, experts, parameters, etc.)
- Include the Java classes in the appropriate applications to create output forms for other metadata objects
- For each metadata object, provide a list of the addresses of web sites hosting external metadata systems
- Organize automated transfer of search criteria to other metadata systems
- Create a common list of data sets and DBs with references to their descriptions held on the sites of the data-owning organizations (NESDIS)
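The idea of one reusable class per metadata object can be sketched as follows. The sketch is written in Python for brevity, not in the Java the slide mentions, and every class, field and sample value is hypothetical. It shows a single `Organization` description being shared by a data set description and a cruise description instead of being duplicated in each.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Organization:
    """One shared description of an organization (hypothetical fields)."""
    code: str
    name: str


@dataclass
class DataSetDescription:
    title: str
    owner: Organization


@dataclass
class CruiseDescription:
    vessel: str
    operator: Organization


class OrganizationRegistry:
    """Single source of organization descriptions for all metadata objects."""

    def __init__(self):
        self._by_code = {}

    def get_or_add(self, code, name):
        # Return the stored description if the code is already known,
        # so that every metadata object refers to the same record.
        return self._by_code.setdefault(code, Organization(code, name))

    def __len__(self):
        return len(self._by_code)


registry = OrganizationRegistry()
dataset = DataSetDescription(
    "Example temperature profiles", registry.get_or_add("ORG1", "Example Institute")
)
cruise = CruiseDescription(
    "RV Example", registry.get_or_add("ORG1", "Example Institute")
)
```

Because both descriptions hold a reference to the same record, a correction to the organization's details needs to be made in one place only.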

29 Scheme of data search with the help of metadata

30 Examples of consolidation on the ESIMO web portal: links list, UDP, IR description, experts

31 Implementation of metadata bases
State of ESIMO metadata:
- IR descriptions: 600
- Instances: 15000 in the file system, several million in the DB
- Organizations participating in IR updating: 30; shipowners: 100
- Parameters: 600
- Metadata objects: 15, total amount more than 40 thousand
- Codifiers: 160

32 Forms of distribution
- Information on data sets
- Information on organizations
- Information on coastal stations
- Information on equipment

33 Forms of distribution (continued)
- Information on projects
- Information on links
- Electronic guide to ESIMO IR
- UDP

34 Metadata as a basis of IR monitoring on the ESIMO web portal: information on DB subschemas

35 Information on IR and portal visits

36 Applications for data and metadata aggregation: hydrometeorological ship data, meteorological buoys, BATHY and TESAC messages

37 Information on the number of IR instances

38 Use of metadata and results of IR monitoring
- Guide "Information resources on the state of the World Ocean"
- Portal visitors (selected information)
- Monitoring of DIRS (detailed information)
- Distribution by mailing list (DIRS updates and visits)
- Link analysis of the state of servers and communication channels
- Analysis of portal state (message by telephone)

39 Prospects of metadata development
- Metadata are the basis of the virtual data centre
- Eliminate the fragmentation of information systems and resources and transform them into a unified structure
- Create a system using the ideas of the virtual data centre and service-oriented architecture
- Develop a semantic network, i.e. create a uniform namespace for each subject domain, as the basis for describing the structure of data exchange between applications
- IR provider:
  - Has a catalogue of web services based on the UDDI, WSDL and SOAP standards for working with methods
  - Supports a conceptual XML schema of the information stored and processed by the virtual centre
  - Organizes interaction between web services (metadata exchange)
- IR authors: support web services, including metadata and codifiers
- Users: receive information on any metadata object at any moment from any point

40 Conclusion
- A number of information systems do not meet users' requirements for the completeness, availability and integrity of the metadata they contain
- In recent years new metadata objects have been created
- The basic shortcoming of many systems is the duplication of individual sections of descriptions in various metadata objects (information on organizations, experts, platforms, etc.)
- A uniform complex of technologically interconnected information systems with metadata is needed
- Metadata objects have been identified, which makes it possible to share responsibility for creating them, to create them systematically according to their importance and need, to consolidate them on various web portals and to organize comprehensive information support with data
- The approaches developed have sped up the integration and consolidation of metadata from various sources
- Metadata collection systems are currently centralized, but in the coming years, as services are created, it will be possible to speak of distributed metadata storage
- Metadata are widely used by DB administrators, and without metadata consolidation it is very difficult to organize effective operation

41 Thank you for your attention!
