ISKO 2010. Marianne Lykke, Royal School of Library and Information Science; Susan L. Price and Lois M. L. Delcambre, Portland State University. ISKO 2010 Conference.


Using semantic components to represent and search domain-specific documents: An evaluation of indexing accuracy and consistency
Marianne Lykke, Royal School of Library and Information Science
Susan L. Price and Lois M. L. Delcambre, Portland State University
ISKO 2010 Conference, Sapienza University of Rome, Faculty of Philosophy, February 2010

Agenda
Problem and motivation
Semantic component model
Research questions
Test design
Results
Conclusions

Problem and motivation
Challenges for information retrieval in domain-specific digital libraries:
Domain-specific libraries often contain large sets of similar documents about few topics
  o Important to be able to distinguish between topically similar documents
Domain experts often have specific information needs targeting a single right answer, specified by domain-specific facets
  o Important to be able to limit search to domain-specific dimensions
(e.g. Leckie et al., 1996; Fagin et al., 2003; Freund et al., 2005; Hearst et al., 2006)

Problem and motivation
Little time for information retrieval
  o Important that the relevant documents are ranked highly and retrieved by the first query
Distributed indexing, carried out by indexers with varying degrees of indexing competence
  o Important to address classical indexing problems: quality, exhaustivity, specificity, consistency
(e.g. Leckie et al., 1996; Fagin et al., 2003; Freund et al., 2005; Hearst et al., 2006)

Semantic component model
The semantic components model was developed to facilitate the formulation of specific, structured queries that cover the search topic exhaustively by domain-specific dimensions
Two-level model that divides a given collection into a set of document classes, each class with an associated set of semantic components
Based on the assumptions that
  o Domain experts know the document genres within a certain domain: content and structure (Dillon, 1991; Orlikowski & Yates, 1994; Bishop, 1999; Vaughan & Dillon, 2005)
  o Domain-specific document content and structure correspond to domain-specific information needs (Ely et al., 1999, 2000; Price, Delcambre, Nielsen, 2006)

HIO 2009, Marianne Lykke
[Figure: example document. Document class: Clinical method. Marked semantic components: SC: General information; SC: Practical information]

[Figure: example document. Document class: Clinical method. Marked labels: SC: General information; SC: Risk factors; After treatment]

Semantic component model
Document class: semantic components
Clinical problem: General information; Diagnosis; Referral; Treatment
Clinical unit: Function and specialty; Practical information; Referral; Staff and organization
Clinical method: General information; Practical information; Referral; Aftercare; Risks; Expected results
Drugs: General information; Practical information; Target group; Effect; Side effects
Services: General information; Practical information; Referral
Notice: General information; Practical information; Qualification
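The two-level structure on this slide (document classes, each with its own set of semantic components) can be written down as a simple mapping. The Python sketch below is only an illustration: the class and component names are taken from the slide, while the variable and function names are hypothetical and do not come from the authors' system.

```python
# Two-level model from the slide: document class -> its semantic components.
# The names SEMANTIC_COMPONENT_MODEL, components_for and classes_with are
# hypothetical; only the class and component labels come from the slide.
SEMANTIC_COMPONENT_MODEL = {
    "Clinical problem": ["General information", "Diagnosis", "Referral", "Treatment"],
    "Clinical unit": ["Function and specialty", "Practical information",
                      "Referral", "Staff and organization"],
    "Clinical method": ["General information", "Practical information", "Referral",
                        "Aftercare", "Risks", "Expected results"],
    "Drugs": ["General information", "Practical information", "Target group",
              "Effect", "Side effects"],
    "Services": ["General information", "Practical information", "Referral"],
    "Notice": ["General information", "Practical information", "Qualification"],
}


def components_for(document_class):
    """Return the semantic components defined for one document class."""
    return SEMANTIC_COMPONENT_MODEL[document_class]


def classes_with(component):
    """Return, in alphabetical order, the document classes that define a component."""
    return sorted(
        doc_class
        for doc_class, components in SEMANTIC_COMPONENT_MODEL.items()
        if component in components
    )
```

A lookup such as `classes_with("Referral")` shows that the same component name can recur in several document classes, which is why a component is always interpreted relative to its class.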


Case study
sundhed.dk: the Danish national health portal
Active since 2001, documents
Two main target groups: citizens and medical professionals
Combination of full-text indexing and controlled, assigned indexing:
  o ICPC, International Classification of Primary Care
  o ICD-10, International Classification of Diseases
  o Home-grown Citizens Thesaurus
Large and varied group of indexers
  o 5 regions
  o Up to 250 indexers per region
Specific target group: family doctors

Test design
Comparative, experimental indexing study
  o Baseline: keyword indexing (controlled and free terms)
  o Experimental: semantic component indexing
Test persons: 16 sundhed.dk indexers (convenience sample)
Indexing task: 12 sundhed.dk documents
  o 6 documents were indexed with semantic components (SC)
  o 6 documents were indexed with keywords
Random assignment of documents and indexing methods
Training session
Evaluation measures:
  o Accuracy
  o Consistency
  o Indexing time
  o Easiness

Research questions
Is semantic component indexing more accurate than keyword indexing compared to a reference standard?
Is semantic component indexing more consistent than keyword indexing?
Is semantic component indexing faster than keyword indexing?
Is semantic component indexing easier than keyword indexing?

Accuracy
[Table: per-document recall macroaverage and precision macroaverage (each given with a ± spread), for semantic component indexing and for keyword indexing. The numeric values did not survive extraction.]
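To make the accuracy measures concrete, here is a minimal Python sketch of recall and precision against a reference standard, macroaveraged over indexers. It rests on assumptions not stated on the slide: that each indexing is treated as a set of assigned items, and that the macroaverage for a document is the mean of the per-indexer scores. The function names and the example terms in the usage note are hypothetical.

```python
def recall_precision(assigned, reference):
    """Recall and precision of one indexer's assignments against a reference standard."""
    assigned, reference = set(assigned), set(reference)
    hits = len(assigned & reference)  # items the indexer got right
    recall = hits / len(reference) if reference else 0.0
    precision = hits / len(assigned) if assigned else 0.0
    return recall, precision


def macroaverage(indexings, reference):
    """Mean recall and mean precision over several indexers of the same document.

    Assumption: 'macroaverage' means averaging the per-indexer scores.
    """
    scores = [recall_precision(assigned, reference) for assigned in indexings]
    n = len(scores)
    mean_recall = sum(r for r, _ in scores) / n
    mean_precision = sum(p for _, p in scores) / n
    return mean_recall, mean_precision
```

For example, with a reference standard of four terms, an indexer who assigns two of them and nothing else has recall 0.5 and precision 1.0.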

Consistency
[Table: per-document consistency. Semantic components: mean K ± SD (of all semantic components in the document). Keywords: binary K (all vocabularies) and traditional consistency ± SD, where consistency = c / (a + b – c). The numeric values did not survive extraction.]
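The traditional consistency formula on this slide, c / (a + b – c), can be sketched in Python as follows. The slide does not define the symbols; the sketch assumes the usual reading, in which a and b are the numbers of terms assigned by two indexers and c is the number of terms they have in common. Averaging over all pairs of indexers, and treating two empty indexings as fully consistent, are further assumptions made here. The K statistics on the slide are not covered.

```python
from itertools import combinations
from statistics import mean


def pair_consistency(terms_a, terms_b):
    """Inter-indexer consistency c / (a + b - c) for two sets of assigned terms."""
    set_a, set_b = set(terms_a), set(terms_b)
    c = len(set_a & set_b)                 # terms both indexers assigned
    denominator = len(set_a) + len(set_b) - c
    # Assumption: two empty indexings count as perfectly consistent.
    return c / denominator if denominator else 1.0


def mean_pairwise_consistency(indexings):
    """Average the pairwise consistency over every pair of indexers."""
    return mean(pair_consistency(x, y) for x, y in combinations(indexings, 2))
```

The denominator is the size of the union of the two term sets, so the measure is 1 only when both indexers assign exactly the same terms and 0 when they share none.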

Time to index

Easiness

Conclusions
Varied accuracy for both indexing methods, but the data suggest that semantic component indexing might be more accurate
Indications that the feasibility and easiness of the two indexing methods are similar
Semantic component indexing may be a preferable alternative if no appropriate controlled vocabulary is available, because it takes little time to develop and is easy to customize to a specific document collection
Limitations:
  o Small sample and a single domain
  o Evaluation measures not directly comparable
Retrieval test shows an improvement in document ranking of 25.6% by nDCG (normalized Discounted Cumulative Gain)
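For readers unfamiliar with nDCG, the Python sketch below shows one common formulation: each graded relevance score is discounted by log2(rank + 1), and the sum is normalized by the score of the ideal ordering of the same documents. The slides do not say which nDCG variant or cutoff the retrieval test used, so this is an illustration of the measure, not a reconstruction of the reported experiment.

```python
import math


def dcg(gains):
    """Discounted cumulative gain of relevance scores listed in ranked order."""
    return sum(gain / math.log2(rank + 1) for rank, gain in enumerate(gains, start=1))


def ndcg(gains):
    """DCG divided by the DCG of the ideal (descending) ordering of the same scores."""
    ideal = dcg(sorted(gains, reverse=True))
    return dcg(gains) / ideal if ideal > 0 else 0.0
```

A ranking that places the most relevant documents first scores 1.0, and the score drops as relevant documents move down the list.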

Future research
Development of the model:
  o Simpler version
  o Mark-up by users (social tagging)
  o Automatic mark-up
  o Mark-up with XML
Larger-scale evaluation
Evaluation in other domains

References
Dillon, M. (1991). Readers' models of text structures: the case of academic articles. International Journal of Man-Machine Studies, – 925.
Ely, J., Osheroff, J., Ebell, M., Bergus, G., Levy, B., Chambliss, M. & Evans, E. (1999). Analysis of questions asked by family doctors regarding patient care. BMJ, 319 (7206), 358 – 361.
Ely, J., Osheroff, J., Gorman, P., Ebell, M., Bergus, G., Levy, B., Chambliss, M., Pifer, E. & Stavri, P. (2000). A taxonomy of generic clinical questions: classification study. BMJ, 321 (7278).
Fagin, R., Kumar, R., McCurley, K.S., Novak, J., Sivakumar, D., Tomlin, J.A. & Williamson, D.P. (2003). Searching the workplace web. In: Proceedings of the 12th International World Wide Web Conference (WWW 03), Budapest, Hungary, May 20-24.
Freund, L., Toms, E. & Waterhouse, J. (2005). Modeling the information behaviour of software engineers using a work-task framework. In: Grove, A. (ed.) ASIS&T 05 Proceedings of the 68th Annual Meeting, Charlotte, NC, October 28-November 2.
Hearst, M. & Plaunt, C. (1993). Subtopic structuring for full length document access. Proceedings of the ACM SIGIR Conference on Research and Development in Information Retrieval, 59 – 69.
Leckie, G.J., Pettigrew, K.E. & Sylvain, C. (1996). Modeling the information seeking of professionals. Library Quarterly, 66 (2).
Orlikowski, W.J. & Yates, J. (1994). Genre repertoire: the structuring of communicative practices in organizations. Administrative Science Quarterly, – 574.
Price, S., Delcambre, L. & Nielsen, M.L. (2006). Using semantic components to express questions against document collections. Proceedings of the International Workshop on Health Information and Knowledge Management (HIKM 2006), Arlington (VA).
Price, S., Nielsen, M.L., Delcambre, L. & Vedsted, P. (2007). Semantic components enhance retrieval of domain-specific documents. Proceedings of the ACM Sixteenth Conference on Information and Knowledge Management (CIKM), Lisboa, November 6 - 8, 2007.

[Figure: search interface. Callouts: "Search term"; "Search term should appear in specified semantic component"]

[Figure: search interface. Callout: "Semantic component should appear in document"]
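The two search conditions named on these slides (a search term must appear in a specified semantic component, or a semantic component must be present in the document) can be sketched as simple predicates. The document representation below, a dictionary mapping semantic component names to the text marked with them, is a hypothetical simplification, as are the function names; the slides do not describe how the actual search system stores or matches components.

```python
def term_in_component(document, component, term):
    """True if the term occurs (case-insensitively) in the text of the given component.

    `document` is assumed to be a dict: semantic component name -> marked text.
    """
    text = document.get(component, "")
    return term.lower() in text.lower()


def has_component(document, component):
    """True if the document contains non-empty text for the given component."""
    return bool(document.get(component, "").strip())


def search(documents, component, term=None):
    """Filter documents by component presence and, optionally, by a term inside it."""
    if term is None:
        return [d for d in documents if has_component(d, component)]
    return [d for d in documents if term_in_component(d, component, term)]
```

Restricting a term to one component is the stricter condition: every document that satisfies it also satisfies the presence condition for that component.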

Time to index
Time required for indexing documents
Columns: Indexing type; Total documents indexed (max = 96); Mean number of documents indexed per indexer (max = 6); Mean time (min:sec); Min time (min:sec); Max time (min:sec)
Semantic components: mean time :03; min time 00:24; max time 27:05
Keywords: mean time :56; min time 01:06; max time 31:26

Research team
General practice
  Peter Vedsted, MD, Ph.D., Research Unit for General Practice, Århus University
  Jens Rubak, MD, Praksis.dk, Region Midt
Information and computer science
  Lois Delcambre, Ph.D., Professor, Computer Science Department, Portland State University, USA
  Susan Price, MD, Ph.D. student, Computer Science Department, Portland State University, USA
  Marianne Lykke, Ph.D., Associate professor, Information Interaction and Information Architecture, Danmarks Biblioteksskole
sundhed.dk
  Vibeke Luk, Information specialist, sundhed.dk
  Frans la Cour, IT consultant, Autonomy
Supported by grants from the National Science Foundation, grant numbers , and , the National Library of Medicine Training Grant 5-T15-LM07088 and Kvalitetsudviklingsudvalget for Almen Praksis, Aarhus Amt