1 © Keith G Jeffery, Anne G S Asserson IWIRCRIS Copenhagen 2006 20061109
Exploring the Cloud of Research Information Systematically
Keith G Jeffery, Director, IT & International Strategy, CCLRC, keith.g.jeffery@rl.ac.uk
Anne G S Asserson, Research Department, University of Bergen, anne.asserson@fa.uib.no

2 Structure
The Problem, Proposition, Requirement
Past Work, Analysis, Conclusion
Solution
Additional Aspects
Future Work
Conclusion

3 The Problem
[Diagram: the universe of information of relevance (university, funding agency, research chemist, doctor, synchrotron, PhD thesis, journal paper, presentation, conference, industry, entrepreneur, classification scheme, protein, DNA) feeding a ‘magic’ system that delivers "What I want".]

4 The Problem: The Universe of Relevant Information
Unstructured, semistructured
–Policy papers
–Research proposals (part)
–Publications
Structured
–Research proposals (part)
–Research information records (commonly metadata)
–Research datasets
Heterogeneity
–Character set
–Language
–Syntax
–Semantics
[Diagram labels: highly relevant, relevant, partially relevant]

5 Relationship
[Diagram: the usual relation. PERSON and ORGUNIT tables, with person and OrgUnit attributes, connected by primary key (PK) and foreign key (FK).]

6 Structured Approach
Linking relations expressing semantics
[Diagram: entities Project, OrgUnit and Person connected by the linking relations Pr-Pe, Pr-OU, Pe-Pe, Pe-OU and OU-OU.]
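The linking-relation idea can be sketched with an in-memory SQLite database. This is only an illustration, not the CERIF schema: the table names, the `role` column and the sample rows are assumptions chosen to show how a Pe-OU link can carry semantics that a plain foreign key cannot.

```python
import sqlite3


def build_demo() -> sqlite3.Connection:
    """Create hypothetical Person, OrgUnit and Pe-OU linking tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE person  (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
        CREATE TABLE orgunit (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
        -- Linking relation Pe-OU: the role column expresses the semantics
        -- of the link, so one person can relate to many units in many ways.
        CREATE TABLE person_orgunit (
            person_id  INTEGER NOT NULL REFERENCES person(id),
            orgunit_id INTEGER NOT NULL REFERENCES orgunit(id),
            role       TEXT NOT NULL,
            PRIMARY KEY (person_id, orgunit_id, role)
        );
        INSERT INTO person  VALUES (1, 'Professor Quoaxocoatl');
        INSERT INTO orgunit VALUES (1, 'Department of Chemistry'),
                                   (2, 'Department of Biology');
        INSERT INTO person_orgunit VALUES (1, 1, 'member'), (1, 2, 'member');
        """
    )
    return conn


def memberships(conn: sqlite3.Connection, person_name: str) -> list:
    """Return (org unit, role) pairs for one person via the linking relation."""
    cursor = conn.execute(
        """
        SELECT o.name, l.role
        FROM person AS p
        JOIN person_orgunit AS l ON l.person_id = p.id
        JOIN orgunit AS o ON o.id = l.orgunit_id
        WHERE p.name = ?
        ORDER BY o.name
        """,
        (person_name,),
    )
    return cursor.fetchall()
```

With a single foreign key on PERSON, the professor could belong to only one department; the linking table allows both memberships and leaves room for further roles.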

7 Research Text: un- or semi-structured text
The research project aims to discover the particular properties of the umouzo worm that cause it to provide local tequila with its characteristic taste. The method will involve chemical analysis of the worm at various stages of its lifecycle related to the tequila: its early life in its usual forest habitat, its middle life when collected and held in artificial conditions as stock before it is placed into the bottle of tequila and finally within the tequila environment at various stages of maturity of the tequila. The research will be conducted by Professor Quoaxocoatl and his Analytical Biochemistry team, belonging to the Departments of Chemistry and Biology at the University of Chicken Itza, Mexico using the latest spectroscopic analysis techniques. The beneficiaries of the research will be the Chicken Itza Tequila company which expect to improve the flavour by understanding the chemical changes in the worms. The research will be conducted following the ethical code for experimentation on animals. The funding requested is 1,000,000 US Dollars over 3 years.

8 Research Text: finding / classifying syntax
The research project aims to discover the particular properties of the umouzo worm that cause it to provide local tequila with its characteristic taste. The method will involve chemical analysis of the worm at various stages of its lifecycle related to the tequila: its early life in its usual forest habitat, its middle life when collected and held in artificial conditions as stock before it is placed into the bottle of tequila and finally within the tequila environment at various stages of maturity of the tequila. The research will be conducted by Professor Quoaxocoatl and his Analytical Biochemistry team, belonging to the Departments of Chemistry and Biology at the University of Chicken Itza, Mexico using the latest spectroscopic analysis techniques. The beneficiaries of the research will be the Chicken Itza Tequila company which expect to improve the flavour by understanding the chemical changes in the worms. The research will be conducted following the ethical code for experimentation on animals. The funding requested is 1,000,000 US Dollars over 3 years.

9 Parsing to a defined schema for research information
The research aims to discover the particular properties of the umouzo worm that cause it to provide local tequila with its characteristic taste. the method will involve chemical analysis of the worm at various stages of its lifecycle related to the tequila: its early life in its usual forest habitat, its middle life when collected and held in artificial conditions as stock before it is placed into the bottle of tequila and finally within the tequila environment at various stages of maturity of the tequila. The research will be conducted by Professor Quoaxocoatl And his Analytical Biochemistry team belonging to the Departments of Chemistry and Biology at the University of Chicken Itza, Mexico using the latest spectroscopic analysis techniques. The beneficiaries of the research will be the Chicken Itza Tequila company which expect to improve the flavour by understanding the chemical changes in the worms. The research will be conducted following the ethical code for experimentation on Animals. The funding requested is 1,000,000 US Dollars over 3 years.
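A rough sketch of what parsing such text into a defined schema might look like is given here. The `ProposalRecord` fields and the regular expressions are assumptions made for illustration; they are not the schema or the parsing technique used by the authors, and they only handle the phrasing of this one sample proposal.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class ProposalRecord:
    """Hypothetical target schema for a parsed research proposal."""
    investigator: Optional[str]
    org_unit: Optional[str]
    funding_amount: Optional[int]
    currency: Optional[str]
    duration_years: Optional[int]


# Excerpt of the sample proposal text from the slide.
TEXT = (
    "The research will be conducted by Professor Quoaxocoatl and his "
    "Analytical Biochemistry team, belonging to the Departments of "
    "Chemistry and Biology at the University of Chicken Itza, Mexico "
    "using the latest spectroscopic analysis techniques. "
    "The funding requested is 1,000,000 US Dollars over 3 years."
)


def parse_proposal(text: str) -> ProposalRecord:
    """Fill the schema from free text; fields stay None when not found."""
    person = re.search(r"conducted by (Professor \w+)", text)
    org = re.search(r"at the (University of [A-Z]\w*(?: [A-Z]\w*)*)", text)
    money = re.search(
        r"funding requested is ([\d,]+) (US Dollars) over (\d+) years", text
    )
    return ProposalRecord(
        investigator=person.group(1) if person else None,
        org_unit=org.group(1) if org else None,
        funding_amount=int(money.group(1).replace(",", "")) if money else None,
        currency=money.group(2) if money else None,
        duration_years=int(money.group(3)) if money else None,
    )
```

Once the values sit in named attributes, they can be validated, queried and aggregated, which the running text does not allow.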

10 Problem: How to build the ‘magic system’
(1) how to assure quality data
–so that results are accurate;
(2) how to assist the end-user in formulating the query correctly
–to obtain the expected results;
(3) how to structure the data to obtain the optimal response
–in terms of performance, recall and relevance, including taming heterogeneity.
[Diagram: ‘magic’ system delivering "What I want"]

11 Proposition
3 questions are related
Solution based on
–Structured data (and using it as metadata)
–Formal logic (reproducible)
–Knowledge engineering techniques
Exposed to end user with
–User-friendly assistant
–Graphic metaphors

12 Requirement
The end-user wants a homogeneous response to a query
–which may involve functional processing in addition to a simple retrieval
–response in a reasonable time
–with all relevant information (recall)
–some indication of relevance (how closely the answer matches the query).
Example: Total amount of funding by year by university spent on biomedical research
–For last 10 years
–By midday today
–Ensuring all universities are included
–With computed distance score from ‘biomedical’ in classification scheme
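Over structured data, the example query reduces to a short aggregate query. The sketch below uses SQLite with invented table names, column names and sample rows, all of which are assumptions; the distance score against the classification scheme is left out. The LEFT JOIN is what keeps universities with no matching funding in the result, which is the "all universities are included" requirement.

```python
import sqlite3


def build_funding_db() -> sqlite3.Connection:
    """Create a tiny hypothetical funding database."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE university (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
        CREATE TABLE funding (
            university_id INTEGER REFERENCES university(id),
            year    INTEGER,
            subject TEXT,
            amount  REAL
        );
        INSERT INTO university VALUES (1, 'Univ A'), (2, 'Univ B');
        INSERT INTO funding VALUES
            (1, 2005, 'biomedical', 100.0),
            (1, 2005, 'biomedical', 50.0),
            (1, 2006, 'physics', 70.0);
        """
    )
    return conn


def funding_totals(conn, subject: str, first_year: int, last_year: int) -> list:
    """Total funding by university and year; universities without funding
    appear once with year NULL and total 0."""
    cursor = conn.execute(
        """
        SELECT u.name, f.year, COALESCE(SUM(f.amount), 0) AS total
        FROM university AS u
        LEFT JOIN funding AS f
               ON f.university_id = u.id
              AND f.subject = ?
              AND f.year BETWEEN ? AND ?
        GROUP BY u.name, f.year
        ORDER BY u.name, f.year
        """,
        (subject, first_year, last_year),
    )
    return cursor.fetchall()
```

Putting the subject and year filters in the join condition rather than in a WHERE clause matters here: a WHERE filter would silently drop the universities that have no biomedical funding.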

13 Requirement: Data Quality
[Diagram: the real world, and decision-making based on the computer system]
Decision-making quality depends critically on the data quality
End-user presented with graphical representations of statistically-reduced or model-enhanced data

14 Requirement: User Query
“compare the research performance of my university against others across a range of metrics (products, patents, publications) over years”
How does an end-user express this in a structured query or find the (relevant, complete) information via Google?
–Are the years calendar or financial?
–Are the patents measured by number or value of licences granted?
–Only publications in peer-reviewed journals?
–Universities in which countries?

15 Requirement: Data Structure
Timeliness
–when required, usually immediately
Relevance
–Of results to query
Recall
–Completeness of results
All require structured data
Or structured metadata describing semi-structured or unstructured data
What is the SQL for ‘I want it now’?
How to match complex logic of query (with processing) to result set
How to know what is in the ‘universe’
For precise matching / measurement / calculation

17 Related Work
Generally relevant
–Information management, user interfaces, performance…
–Copious
Related directly to CRIS
–Limited and mainly in CRIS conferences

18 Related Work - analysis
integrating heterogeneous distributed databases of CRIS information including
–open access institutional repositories
–research datasets
–represented by structured metadata
into a homogeneous canonical form;
harvesting semistructured information from the web to create a structured metadata index
–with limited information in a structured form
–may be only term and URL which points to the original semistructured data in its native form;

19 Related Work: Conclusions
Unstructured or semistructured data is valuable only when indexed by structured metadata
Satisfactory responses to end-user queries are only produced when the data are structured, or indexed by structured metadata
Satisfactory response to end-user queries over heterogeneous distributed data requires knowledge-based techniques (and KB techniques rely on structured data)
Semistructured harvesting/browsing techniques require much human effort and so do not scale
Structured data-based techniques allow for further processing, automating and scaling, but more R&D is needed to make them work

20 Related Work: Conclusion
Semistructured data indexed
Satisfactory user query
Basis for advanced KB techniques
Scalability
Further processing

21 [Diagram: CERIF-CRIS alongside harvesting / browsing, each drawn over Publ text, Sci data, Proj Desc, CV and Org Desc. Captions: "Much human effort in selecting and integrating"; "Human effort spent on decision-making".]

23 Solution: Quality Structured Information: Data
Data quality can only be assured if the input or edit of data attribute values is validated and, if necessary, supported with explanation of valid values
To achieve this requires
–structured data, i.e. data arranged as attribute values in a structure to form information
–validation techniques applicable to each data attribute and relationships between attributes [GoGlJe93].
Validation relies on first order logic and Boolean algebra. This demands structured data (information).
For CRIS: CERIF
–data structured formally as information
–Attribute values in controlled vocabularies (domain ontologies)
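As a hedged illustration of attribute-level and cross-attribute validation, the sketch below checks a record against a controlled vocabulary and two Boolean rules, and returns explanations of valid values when a check fails. The `ProjectRecord` fields, the vocabulary contents and the rules are all assumptions; they are not taken from CERIF or from [GoGlJe93].

```python
from dataclasses import dataclass
from typing import List

# Hypothetical controlled vocabulary for one attribute.
CURRENCIES = {"EUR", "GBP", "NOK", "USD"}


@dataclass
class ProjectRecord:
    """Hypothetical structured project record."""
    title: str
    currency: str
    amount: float
    start_year: int
    end_year: int


def validate(record: ProjectRecord) -> List[str]:
    """Return explanations for every failed rule; empty list means valid."""
    errors = []
    # Attribute-level rule: value must come from the controlled vocabulary.
    if record.currency not in CURRENCIES:
        errors.append(
            "currency must be one of: " + ", ".join(sorted(CURRENCIES))
        )
    # Attribute-level rule: simple range check.
    if record.amount < 0:
        errors.append("amount must not be negative")
    # Rule on a relationship between two attributes.
    if record.end_year < record.start_year:
        errors.append("end_year must not precede start_year")
    return errors
```

Each rule is a Boolean predicate over named attributes, which is why this kind of checking needs the data to be structured in the first place.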

24 Solution: Quality Structured Information: Metadata
CERIF can act as metadata to other structured, semistructured and unstructured information
Examples: OA IRs and research datasets and software. Being extended to financial, project and personnel information
Ensures such associated datasets are validated, retrieved and interpreted in a structured and logical context.
[Diagram: repository, with CERIF as metadata pointing to an object, e.g. a document]

25 Solution: Expert Advisor and Query Assistant
Much past work on this area. Process is:
–Interact with user to discover not ‘what they say’ but ‘what they mean’
–Enhance the query appropriately for relevance, recall, performance
–Replay (in natural language) query to user for confirmation
–Execute query including taming heterogeneity
–Provide user with answer integrated (automatically according to user preference) in preferred character set, language, syntax, semantics, media / mode

26 Solution: Homogeneous View through Knowledge-Based Information Integration
Several known techniques for providing a homogeneous information (syntactic) & knowledge (semantic) view over heterogeneous data
–http://epubs.cclrc.ac.uk/work-details?w=33728
May be a stored view (data warehouse)
–Problem of currency
Or on-demand view
–Problem of performance
In any case requires
–Schema matching
–Query rewriting and distribution
–Answer conversion and integration
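The three required steps can be sketched in miniature as follows. This is a toy under stated assumptions, not one of the known techniques the slide refers to: the two sources, their field names, the units and the `SCHEMA_MAP` table are all invented, and query distribution across real remote systems is not shown.

```python
from typing import Callable, Dict, List, Tuple

# Two hypothetical sources describing the same kind of fact differently.
SOURCE_A = [{"inst": "Univ A", "yr": 2005, "grant_keur": 150}]
SOURCE_B = [{"university": "Univ B", "year": "2005", "amount_eur": 90000.0}]

# Schema matching: canonical field -> (source field, value converter).
SCHEMA_MAP: Dict[str, Dict[str, Tuple[str, Callable]]] = {
    "A": {
        "university": ("inst", str),
        "year": ("yr", int),
        "amount_eur": ("grant_keur", lambda v: float(v) * 1000),
    },
    "B": {
        "university": ("university", str),
        "year": ("year", int),
        "amount_eur": ("amount_eur", float),
    },
}


def rewrite_fields(source: str, canonical_fields: List[str]) -> List[str]:
    """Query rewriting: translate canonical field names to source names."""
    return [SCHEMA_MAP[source][name][0] for name in canonical_fields]


def to_canonical(source: str, record: dict) -> dict:
    """Answer conversion: rename fields and convert values."""
    return {
        target: convert(record[field])
        for target, (field, convert) in SCHEMA_MAP[source].items()
    }


def integrate() -> List[dict]:
    """Answer integration: one homogeneous list over both sources."""
    answers = [to_canonical("A", r) for r in SOURCE_A]
    answers += [to_canonical("B", r) for r in SOURCE_B]
    return answers
```

Running `integrate()` on demand corresponds to the on-demand view; storing its result would correspond to the data-warehouse view, with the currency problem that implies.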

27 Solution: Explanation
When user receives answer
–Even if translated to canonical (CERIF) form
It may not be understandable so need explanation
This is done using knowledge-based techniques
–The ‘reverse’ of query assistance / improvement
–Using domain ontologies to associate additional information to the answer to explain

29 Analysis, Modelling, Visualisation
Decision making today is complex
–Multiple parameters
–Huge volumes of data, sometimes structured as information
–Information of varying reliability
So typically the data is processed to represent in a succinct fashion (e.g. graph, map, video)
And computer models are used to produce predicted results for comparison with the recorded data
This cannot be done without structured information which is somehow made consistent

30 Push Technology
Push technology relies on a standard query which is executed when triggered by a (relevant) change in the observed database or an externally defined condition
Therefore it requires structured data (or structured metadata representing un- or semi-structured data)
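A minimal sketch of the push idea, assuming an in-process store rather than a real database: a standing query is registered once and re-executed whenever an inserted record is relevant to it. The `ObservedStore` class and its method names are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

Record = Dict[str, object]
Query = Callable[[Record], bool]
Callback = Callable[[List[Record]], None]


class ObservedStore:
    """Hypothetical observed database with standing (push) queries."""

    def __init__(self) -> None:
        self.records: List[Record] = []
        self._subscriptions: List[Tuple[Query, Callback]] = []

    def subscribe(self, query: Query, callback: Callback) -> None:
        """Register a standing query and where to push its results."""
        self._subscriptions.append((query, callback))

    def insert(self, record: Record) -> None:
        """Store a record, then re-run each standing query it is relevant to."""
        self.records.append(record)
        for query, callback in self._subscriptions:
            if query(record):  # the change is relevant to this query
                callback([r for r in self.records if query(r)])
```

The relevance test is a predicate over named attributes, so it only works when the records (or the metadata describing them) are structured.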

31 Now, recall the example query
“compare the research performance of my university against others across a range of metrics (products, patents, publications) over years”
And let us imagine how it would be handled by the proposed system environment

32 Utilising the Solution: The Researcher
The researcher would find
–an easy-to-use assisted interface;
–behind which integration of information from multiple sources would be performed automatically,
–even bringing into a homogeneous context semistructured information (notably publications and research datasets with associated software) via the structured metadata.
Thus the cloud of research information becomes a well-formed set of quality information.

33 Utilising the Solution: The Research Manager
The research manager in a funding organisation would find
–an easy-to-use assisted interface with the required information from multiple heterogeneous sources integrated.
–The interface provides appropriate functions for comparing the funding; the explanation engine provides background information on the way in which funding is calculated in each country.
–quality information structured in context and with appropriate explanation to assist in interpretation.
–interface assistance to formulate the query to ensure the correct year(s) and departments are selected and the count of publications is done correctly according to the criteria (e.g. against a list of acceptable peer-reviewed publication channels).

34 Utilising the Solution: The Innovative Entrepreneur
The innovative entrepreneur utilises the advanced interface
–to formulate a homogeneous query over the heterogeneous national sources of information.
–The query is improved by the inference engine and domain ontology to overcome the fact that terminology differs from country to country and she wishes to compare like with like in terms of patents, products and their generated wealth through licences or sales.
–More complex is to compare track records in wealth creation of research groups; a graphical representation of value against time is used with annotation to explain the basis of the reasoning leading to the inclusion or exclusion of certain research outputs.

36 Future Work
As indicated, more R&D required:
–Faster and more effective schema integration
–Generation of software to then automatically manage the distributed heterogeneous queries and the answer homogenisation
–Faster and more effective tools for building and maintaining domain ontologies (including consistency checking)
–Tools for managing multiple languages and character sets

38 Conclusion
The widespread use of web browsers gives the impression that information for decision makers is readily available. It is, but it is of varying quality and requires heavy human involvement to use. This does not scale. Structured data, or metadata representing un- or semi-structured data, is the only reliable foundation. Upon this, advanced CRIS systems for decision-making can be built.
Keith G Jeffery, Director, IT & International Strategy, CCLRC, keith.g.jeffery@rl.ac.uk
Anne G S Asserson, Research Department, University of Bergen, anne.asserson@fa.uib.no

40 Structured Approach
[Diagram: an entity represented by a relation or table, with labels for one instance, one attribute, another entity, and the relationship between them.]

