
1 The Keyword Aggregator web service: A tool and methodology for managing digital objects’ keywords
INFORMATION MANAGEMENT TECHNOLOGY, LAND & WATER
David Benn | Software Engineer, IMT Scientific Computing, CSIRO
1 December 2015
scikey.org

2 Agenda
2 | The Keyword Aggregator | David Benn
What problem are we solving?
–Finding suitable (science) keywords for publications
What have we built (to address the problem)?
–The Keyword Aggregator: web service, example widget, vocabularies, related tools
What remains to be done?

3 Collaboration
CSIRO Land & Water: Nick Car *, Simon Cox, Jonathon Yu
CSIRO Information Management Technology (IMT): David Benn
IMT Scientific Computing eResearch projects x 2
–1 day per week for 6 months

4 What problem are we solving?

5 Publication Keywords
Data publication, Software publication, Journal paper, Conference paper …

6 Publication Keywords

7 Publication Keywords: Controlled Vocabulary

8 Publication Keywords: Free Entry

9 What have we built? Keyword Aggregator

10 Aggregated Keyword Source ?

11 Folksonomies
lowest ranking

12 Design Goals
Fast keyword search, even with many vocabs.
Relevant search results
–Various search strategies may be needed, not just full-text search.
Allow for folksonomy-style use.
Web service and demo client (widget)
–For direct use or as a reference implementation (e.g. ZK vs jQuery).
Simple management of separate vocabularies.

13 Keyword Aggregator
[Diagram labels: Web Service; Vocab 1, Vocab 2, … Vocab n; Folksonomy]

14 Keyword Aggregator

15 REST API: http://scikey.org/api/search?keyword=bio&limit=1
{"head": {"vars": ["graph_name", "term", "sum", "text_value", "prefLabel"]},
 "results": {"bindings": [
   {"term": {"type": "uri", "value": "http://vocab.nerc.ac.uk/collection/P64/current/G023/"},
    "graph_name": {"type": "uri", "value": "http://scikey.org/ns/vocab/gcmd-v6"},
    "vocab_subject": {"xml:lang": "en", "type": "literal", "value": "science"},
    "text_value": {"xml:lang": "en", "type": "literal", "value": "EARTH SCIENCE > Agriculture > Animal Science > Animal Physiology and Biochemistry"},
    "sum": {"datatype": "http://www.w3.org/2001/XMLSchema#integer", "type": "typed-literal", "value": "20"},
    "vocab_title": {"xml:lang": "en", "type": "literal", "value": "GCMD Science Keywords V6"},
    "vocab_status": {"type": "uri", "value": "http://purl.org/linked-data/registry#statusSubmitted"},
    "prefLabel": {"xml:lang": "en", "type": "literal", "value": "EARTH SCIENCE > Agriculture > Animal Science > Animal Physiology and Biochemistry"}}
 ]}}
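A minimal sketch of how a client might build the search request shown on this slide and pull the matched terms out of the response. It assumes the response always follows the SPARQL-style JSON shape of the sample; the helper names `build_search_url` and `extract_terms` are hypothetical, and the network call itself is left out so the example runs offline.

```python
from urllib.parse import urlencode

BASE_URL = "http://scikey.org/api/search"  # endpoint as shown on the slide


def build_search_url(keyword, limit=10):
    """Build a search URL from the two query parameters seen on the slide."""
    return BASE_URL + "?" + urlencode({"keyword": keyword, "limit": limit})


def extract_terms(response):
    """Return (term URI, preferred label, vocabulary title) per result row.

    Assumes every binding has "term" and "prefLabel"; "vocab_title" is
    treated as optional because it is not listed in the sample's "vars".
    """
    rows = []
    for binding in response["results"]["bindings"]:
        rows.append((
            binding["term"]["value"],
            binding["prefLabel"]["value"],
            binding.get("vocab_title", {}).get("value"),
        ))
    return rows


# Trimmed copy of the sample response from the slide.
sample = {
    "head": {"vars": ["graph_name", "term", "sum", "text_value", "prefLabel"]},
    "results": {"bindings": [{
        "term": {"type": "uri",
                 "value": "http://vocab.nerc.ac.uk/collection/P64/current/G023/"},
        "vocab_title": {"type": "literal", "value": "GCMD Science Keywords V6"},
        "prefLabel": {"type": "literal",
                      "value": "EARTH SCIENCE > Agriculture > Animal Science > "
                               "Animal Physiology and Biochemistry"},
    }]},
}

print(build_search_url("bio", limit=1))
for uri, label, vocab in extract_terms(sample):
    print(vocab, "|", label, "|", uri)
```

A real client would fetch the URL and decode the body as JSON before passing it to `extract_terms`.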

16 Usage Stats: in relational database
KWAG|test|1440120084825|none|http://scikey.org/def/computational_fluid_dynamics_cfd
KWAG|test|1440121191964|none|http://scikey.org/def/computational_fluid_dynamic
KWAG|test|1440124763241|none|http://scikey.org/def/dynamic_coupled_food_web_model
KWAG|test|1440561254563|none|http://scikey.org/def/aggregate_data
KWAG|test|1440561254563|none|http://scikey.org/ns/vocab//def/Haplorhini
KWAG|test|1440561254563|none|http://scikey.org/def/drinking_water
KWAG|test|1440739579652|none|http://scikey.org/def/atmospheric_water_generator
KWAG|test|1440743276944|none|http://scikey.org/def/hydrography
KWAG|test|1440743352067|none|http://scikey.org/def/archydro
KWAG|test|1440745243860|none|http://scikey.org/def/taihu_lake
…
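A minimal sketch of reading usage records like those on this slide and counting how often each term was chosen. The slide does not name the columns, so the field names below (`source`, `user`, `selected_at`, `context`, `term_uri`) are assumptions, as is the reading of the third column as milliseconds since the Unix epoch, which does land in August 2015.

```python
from collections import Counter
from datetime import datetime, timezone
from typing import NamedTuple


class UsageRecord(NamedTuple):
    # Field names are guesses; the slide shows only the raw values.
    source: str
    user: str
    selected_at: datetime
    context: str
    term_uri: str


def parse_usage_line(line: str) -> UsageRecord:
    """Split one pipe-delimited record into its five columns."""
    source, user, millis, context, term_uri = line.strip().split("|", 4)
    # Assumption: the third column is epoch time in milliseconds.
    selected_at = datetime.fromtimestamp(int(millis) / 1000, tz=timezone.utc)
    return UsageRecord(source, user, selected_at, context, term_uri)


# Three of the records shown on the slide.
LINES = [
    "KWAG|test|1440561254563|none|http://scikey.org/def/aggregate_data",
    "KWAG|test|1440561254563|none|http://scikey.org/def/drinking_water",
    "KWAG|test|1440743276944|none|http://scikey.org/def/hydrography",
]

records = [parse_usage_line(line) for line in LINES]
term_counts = Counter(record.term_uri for record in records)

for term, count in term_counts.most_common():
    print(count, term)
```

Counts of this kind are one possible input to a popularity-based ranking.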

17 What have we built? Vocabularies

18 SKOS: http://www.w3.org/TR/skos-reference/

19 Vocabularies
MODSIM 2011, 2013 keyword analysis.

20 Vocabularies
GCMD: Global Change Master Directory science keywords

21 Vocabularies
Wikipedia Computer Science

22 Discoverable Vocabularies

23 Vocabulary Metadata

24 Vocab Metadata Generation

26 Vocab-of-vocabs concept
A “vocabulary-of-vocabularies” | Nicholas Car
A SKOS ConceptScheme can point its hasTopConcept property to a Concept outside itself.
Useful for broad vocabs where specialisations exist
–e.g. science keywords

27 Vocab-of-vocabs concept
Allow a single, integrated, vocab set to be used by search tools
No change to underlying vocabulary
[Diagram labels: Vocab 1: skos:ConceptScheme GCMD Terms; skos:Concept OCEANS; skos:Concept MARINE SEDIMENTS; skos:Concept TURBIDITY. Vocab 2: skos:ConceptScheme Turbidity Types; skos:Concept Turbidity Type 1; skos:Concept Turbidity Type 2. Link between the vocabs: hasTopConcept]
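A minimal sketch of the vocab-of-vocabs idea using plain (subject, predicate, object) tuples rather than an RDF library. It is one reading of the diagram, not the authors' implementation: the `example.org` URIs are hypothetical, and the `skos:broader` links from the turbidity types to TURBIDITY are an assumption added so that a search tool can walk across the two vocabularies. All new triples live in Vocab 2, so Vocab 1 is unchanged.

```python
SKOS = "http://www.w3.org/2004/02/skos/core#"
HAS_TOP_CONCEPT = SKOS + "hasTopConcept"
BROADER = SKOS + "broader"

# Hypothetical namespaces standing in for the two vocabularies.
GCMD = "http://example.org/gcmd/"
TURB = "http://example.org/turbidity-types/"

# Vocab 1: the broad vocabulary, left untouched.
vocab_1 = [
    (GCMD + "MARINE_SEDIMENTS", BROADER, GCMD + "OCEANS"),
    (GCMD + "TURBIDITY", BROADER, GCMD + "MARINE_SEDIMENTS"),
]

# Vocab 2: the specialisation. Its scheme names a concept from Vocab 1
# as its top concept (the hasTopConcept link in the diagram).
vocab_2 = [
    (TURB + "scheme", HAS_TOP_CONCEPT, GCMD + "TURBIDITY"),
    (TURB + "type1", BROADER, GCMD + "TURBIDITY"),  # assumed link
    (TURB + "type2", BROADER, GCMD + "TURBIDITY"),  # assumed link
]


def narrower_closure(triples, concept):
    """Return every concept reachable below `concept` via skos:broader."""
    children = {}
    for subject, predicate, obj in triples:
        if predicate == BROADER:
            children.setdefault(obj, []).append(subject)
    found, seen, stack = [], {concept}, [concept]
    while stack:
        current = stack.pop()
        for child in children.get(current, []):
            if child not in seen:
                seen.add(child)
                found.append(child)
                stack.append(child)
    return found


# Schemes whose top concept lives outside their own namespace.
external_links = [(s, o) for s, p, o in vocab_2
                  if p == HAS_TOP_CONCEPT and not o.startswith(TURB)]

print(external_links)
print(sorted(narrower_closure(vocab_1 + vocab_2, GCMD + "OCEANS")))
```

Searching the merged triples from OCEANS reaches the turbidity types, while searching Vocab 1 alone stops at TURBIDITY.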

28 Search Strategies Explored
Simple matching in vocab text elements
Weighted semantic
–Assign weights to text matches in different SKOS elements, e.g. skos:prefLabel, skos:altLabel, skos:definition, dc:description
Hierarchical
–Exploits explicit broader/narrower relationships present in some vocabs
Historical/popularity based
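A minimal sketch of the weighted strategy named on this slide: each SKOS element that contains the query text contributes a weight, and results are ranked by the total. The weight values, the `example.org` URIs and the sample concepts are hypothetical; the slides do not give the weights actually used.

```python
# Hypothetical weights: a match in a preferred label counts most.
FIELD_WEIGHTS = {
    "skos:prefLabel": 10,
    "skos:altLabel": 5,
    "skos:definition": 2,
    "dc:description": 1,
}


def score_concept(concept, query):
    """Sum the weights of every field with a case-insensitive match."""
    needle = query.lower()
    total = 0
    for field, weight in FIELD_WEIGHTS.items():
        values = concept.get(field, [])
        if any(needle in value.lower() for value in values):
            total += weight
    return total


def search(concepts, query, limit=10):
    """Return (score, uri) pairs for matching concepts, best first."""
    hits = [(score_concept(c, query), c["uri"]) for c in concepts]
    hits = [hit for hit in hits if hit[0] > 0]
    hits.sort(key=lambda hit: (-hit[0], hit[1]))  # ties broken by URI
    return hits[:limit]


# Made-up concepts for illustration only.
concepts = [
    {"uri": "http://example.org/def/hydrography",
     "skos:prefLabel": ["Hydrography"],
     "skos:definition": ["Measurement of bodies of water."]},
    {"uri": "http://example.org/def/drinking_water",
     "skos:prefLabel": ["Drinking water"],
     "skos:altLabel": ["Potable water"],
     "skos:definition": ["Water that is safe to drink."]},
    {"uri": "http://example.org/def/archydro",
     "skos:prefLabel": ["ArcHydro"]},
]

for score, uri in search(concepts, "water"):
    print(score, uri)
```

With these weights a match in a preferred label outranks one that appears only in a definition, which is one way to make results more relevant than plain full-text matching.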

29 Administration

30 What remains to be done?

31 Future Work: publication, improvements
Software publication in CSIRO Data Access Portal (in draft)
Search result “decoration” from JSON to enhance widget keyword selection.
Automate ingestion of arbitrary number of known vocabularies.
Streamline vocabulary submission process.
Mine arbitrary/federated metadata for vocabulary existence.
Inter-vocabulary individual term linking search strategy (e.g. skos:related).
Use of previously chosen keywords to inform search result prioritisation.
Scale search performance with large or many vocabularies.
Web-time hierarchical search: pre-compute, more resources, better graph engine.

32 Thank you
IMT Scientific Computing, CSIRO
David Benn, Software Engineer
t +61 8 8303 8512
e david.benn@csiro.au
w http://people-my.csiro.au/B/D/David-Benn.aspx

