Math Information Retrieval Zhao Jin. Zhao Jin. Math Information Retrieval Examples: –Looking for formulas –Collect teaching resources –Keeping updated.

Slides:



Advertisements
Similar presentations
Automatic Timeline Generation from News Articles Josh Taylor and Jessica Jenkins.
Advertisements

Pseudo-Relevance Feedback For Multimedia Retrieval By Rong Yan, Alexander G. and Rong Jin Mwangi S. Kariuki
Jean-Eudes Ranvier 17/05/2015Planet Data - Madrid Trustworthiness assessment (on web pages) Task 3.3.
Explorations in Tag Suggestion and Query Expansion Jian Wang and Brian D. Davison Lehigh University, USA SSM 2008 (Workshop on Search in Social Media)
1 Question Answering in Biomedicine Student: Andreea Tutos Id: Supervisor: Diego Molla.
Presentation Outline  Project Aims  Introduction of Digital Video Library  Introduction of Our Work  Considerations and Approach  Design and Implementation.
6/16/20151 Recent Results in Automatic Web Resource Discovery Soumen Chakrabartiv Presentation by Cui Tao.
CS335 Principles of Multimedia Systems Content Based Media Retrieval Hao Jiang Computer Science Department Boston College Dec. 4, 2007.
Searching the Web II. The Web Why is it important: –“Free” ubiquitous information resource –Broad coverage of topics and perspectives –Becoming dominant.
© Tefko Saracevic, Rutgers University 1 EVALUATION in searching IR systems Digital libraries Reference sources Web sources.
FACT: A Learning Based Web Query Processing System Hongjun Lu, Yanlei Diao Hong Kong U. of Science & Technology Songting Chen, Zengping Tian Fudan University.
Searching The Web Search Engines are computer programs (variously called robots, crawlers, spiders, worms) that automatically visit Web sites and, starting.
Web queries classification Nguyen Viet Bang WING group meeting June 9 th 2006.
ReQuest (Validating Semantic Searches) Norman Piedade de Noronha 16 th July, 2004.
Presented by Zeehasham Rasheed
1 CS 502: Computing Methods for Digital Libraries Lecture 11 Information Retrieval I.
Overview of Search Engines
Object-Oriented Analysis and Design LECTURE 8: USER INTERFACE DESIGN.
SIEVE—Search Images Effectively through Visual Elimination Ying Liu, Dengsheng Zhang and Guojun Lu Gippsland School of Info Tech,
What difference a good tool? using Endeca for a faceted catalog Emily Lynema NCSU Libraries ACRL Delaware Valley Chapter Fall Program November 3, 2006.
Slide Image Retrieval: A Preliminary Study Guo Min Liew and Min-Yen Kan National University of Singapore Web IR / NLP Group (WING)
Evaluation David Kauchak cs160 Fall 2009 adapted from:
Research paper: Web Mining Research: A survey SIGKDD Explorations, June Volume 2, Issue 1 Author: R. Kosala and H. Blockeel.
Improving the Catalogue Interface using Endeca Tito Sierra NCSU Libraries.
Processing of large document collections Part 2 (Text categorization) Helena Ahonen-Myka Spring 2006.
Adaptive News Access Daniel Billsus Presented by Chirayu Wongchokprasitti.
Chapter 7 Web Content Mining Xxxxxx. Introduction Web-content mining techniques are used to discover useful information from content on the web – textual.
©2008 Srikanth Kallurkar, Quantum Leap Innovations, Inc. All rights reserved. Apollo – Automated Content Management System Srikanth Kallurkar Quantum Leap.
Iterative Readability Computation for Domain-Specific Resources By Jin Zhao and Min-Yen Kan 11/06/2010.
Improving Web Spam Classification using Rank-time Features September 25, 2008 TaeSeob,Yun KAIST DATABASE & MULTIMEDIA LAB.
Which of the two appears simple to you? 1 2.
Web Searching Basics Dr. Dania Bilal IS 530 Fall 2009.
UOS 1 Ontology Based Personalized Search Zhang Tao The University of Seoul.
ITIS 1210 Introduction to Web-Based Information Systems Chapter 27 How Internet Searching Works.
Search Engine By Bhupendra Ratha, Lecturer School of Library and Information Science Devi Ahilya University, Indore
Thanks to Bill Arms, Marti Hearst Documents. Last time Size of information –Continues to grow IR an old field, goes back to the ‘40s IR iterative process.
INTRODUCTION TO RESEARCH. Learning to become a researcher By the time you get to college, you will be expected to advance from: Information retrieval–
Topical Crawlers for Building Digital Library Collections Presenter: Qiaozhu Mei.
CROSSMARC Web Pages Collection: Crawling and Spidering Components Vangelis Karkaletsis Institute of Informatics & Telecommunications NCSR “Demokritos”
 Text Representation & Text Classification for Intelligent Information Retrieval Ning Yu School of Library and Information Science Indiana University.
NCSU Libraries Kristin Antelman NCSU Libraries June 24, 2006.
HYP Progress Update By Zhao Jin. Outline Background Progress Update.
Universit at Dortmund, LS VIII
XP New Perspectives on The Internet, Sixth Edition— Comprehensive Tutorial 3 1 Searching the Web Using Search Engines and Directories Effectively Tutorial.
Domain-Specific Iterative Readability Computation Jin Zhao 13/05/2011.
Binxing Jiao et. al (SIGIR ’10) Presenter : Lin, Yi-Jhen Advisor: Dr. Koh. Jia-ling Date: 2011/4/25 VISUAL SUMMARIZATION OF WEB PAGES.
Search Engine Architecture
Searching the web Enormous amount of information –In 1994, 100 thousand pages indexed –In 1997, 100 million pages indexed –In June, 2000, 500 million pages.
4 1 SEARCHING THE WEB Using Search Engines and Directories Effectively New Perspectives on THE INTERNET.
A Scalable Machine Learning Approach for Semi-Structured Named Entity Recognition Utku Irmak(Yahoo! Labs) Reiner Kraft(Yahoo! Inc.) WWW 2010(Information.
How Do We Find Information?. Key Questions  What are we looking for?  How do we find it?  Why is it difficult? “A prudent question is one-half of wisdom”
Medical Information Retrieval: eEvidence System By Zhao Jin Mar
Digital Libraries1 David Rashty. Digital Libraries2 “A library is an arsenal of liberty” Anonymous.
Enhancing Web Search by Promoting Multiple Search Engine Use Ryen W. W., Matthew R. Mikhail B. (Microsoft Research) Allison P. H (Rice University) SIGIR.
Search Engine using Web Mining COMS E Web Enhanced Information Mgmt Prof. Gail Kaiser Presented By: Rupal Shah (UNI: rrs2146)
What Does the User Really Want ? Relevance, Precision and Recall.
26/01/20161Gianluca Demartini Ranking Categories for Faceted Search Gianluca Demartini L3S Research Seminars Hannover, 09 June 2006.
Toward Semantic Search: RDFa based facet browser Jin Guang Zheng Tetherless World Constellation.
Divided Pretreatment to Targets and Intentions for Query Recommendation Reporter: Yangyang Kang /23.
Combining Text and Image Queries at ImageCLEF2005: A Corpus-Based Relevance-Feedback Approach Yih-Cheng Chang Department of Computer Science and Information.
Ph.D Milestones Jin Zhao 23 Nov Jin Zhao 23 Nov 2012 / 10 Timeline 2 WING, NUS Initial Exploration Topic Formulation.
Bringing Order to the Web : Automatically Categorizing Search Results Advisor : Dr. Hsu Graduate : Keng-Wei Chang Author : Hao Chen Susan Dumais.
University Of Seoul Ubiquitous Sensor Network Lab Query Dependent Pseudo-Relevance Feedback based on Wikipedia 전자전기컴퓨터공학 부 USN 연구실 G
© NCSR, Frascati, July 18-19, 2002 CROSSMARC big picture Domain-specific Web sites Domain-specific Spidering Domain Ontology XHTML pages WEB Focused Crawling.
Detecting Online Commercial Intention (OCI)
Thanks to Bill Arms, Marti Hearst
Information Retrieval
InfoTrac/PowerSearch Interface Enhancements
Introduction to Information Retrieval
Introduction to Search Engines
Presentation transcript:

Math Information Retrieval Zhao Jin

Zhao Jin. Math Information Retrieval Examples: –Looking for formulas –Collect teaching resources –Keeping updated on research development Generic search engines ineffective in such situations –Unaware of user needs and math expressions Why Math Information Retrieval?

Zhao Jin. Math Information Retrieval Linked Expression: a 2 +b 2 =c 2 Filter by > resource category, experience, specificity Tutorials, Slides, Problem Solution Set, Applets, Tools, Data, Algorithms Proof Definition… Tutorial Proof Goal: a user-centric and math-aware DL

Zhao Jin. Math Information Retrieval Introduction Literature Review Domain-specific Information Seeking Studies Current Math Resources Math Information Retrieval User Study Prototype Implementation Future Work Outline

Zhao Jin. Math Information Retrieval Domain Specific Information Seeking Studies Monograph to Online Resources Time-saving or relevant Usability and accessibility problems Current Math Resources Online Isolated Different degree of math awareness Literature Review Key requirements: Usefulness, Usability, and Accessibility Math Web SearchWolfram Function Site 1. Hamper Accessibility 2. Limited search capability and hard to judge usefulness

Zhao Jin. Math Information Retrieval Current Math Information Retrieval Research Expression Matching Text-based approaches –Notational Variation Problem: “a 2 +b 2 =c 2 ” ≠ “x 2 +y 2 =z 2 ” Non-text-based approach Query language Expressiveness vs. User-Friendliness Literature Review User-Friendliness Expressiveness LaTeX, MathML Text Keyword GUI, Natural Language, ASCII Assume an expression input from the user

Zhao Jin. Math Information Retrieval Whether the information needs of the users are satisfied by such resources What do the user really need? How do they perform information seeking? What are the difficulties encountered? Whether the current research focus is appropriate Do they really need/prefer expression search? Unanswered Issues Further Study Needed!

Zhao Jin. Math Information Retrieval Introduction Literature Review User Study Findings Desiderata in Math Information Retrieval Prototype Implementation Conclusion Outline

Zhao Jin. Math Information Retrieval Three Approaches Keyword Search / Browsing /Personal Contacts Trade-off between cost and benefit Expression Search Attractive but utility unknown Keyword search still popular and preferred The multi-faceted user needs Information-oriented / Format-oriented Specificity and Experience for filtering Domain and Intent as context Need to cater for specifically Findings from the User Study

Zhao Jin. Math Information Retrieval Multi-collection search Search through multiple collections on behalf of the user Enhance the usability and accessibility of collections Resource Categorization Automatically classify the materials according to the different facets of the user needs Return results that best suit the user needs Desiderata in Math Retrieval

Zhao Jin. Math Information Retrieval Introduction Literature Review User Study Prototype Implementation –Focus on Resource Categorization Future Work Outline

Zhao Jin. Math Information Retrieval Multi-collection Search Meta-search Offline indexing based on open source package Easier requirement to meet between the two Resource Categorization Domain-specific text categorization on webpages More interesting as a research topic Prototype Implementation Focus of the prototype is on Resource Categorization

Zhao Jin. Math Information Retrieval Entire page is not a suitable unit for categorization Vision-based Segmentation (VIPS) used Webpage Segmentation Definition Variation Toolbar

Zhao Jin. Math Information Retrieval Supervised Machine Learning Pipeline Labels Math related / non-math-related Features Word, Image, Formatting, Hyperlink, Layout, Context Machine Learner SVM Training/Testing Data Small corpus of webpages for 5 math topics Manually annotated Kappa-agreement: 0.87 Resource Categorization

Zhao Jin. Math Information Retrieval Average accuracy: 0.36 on F 1 Strength: separating math contents from the rest Weakness: identifying their exact type Feature Utility Text  competitive baseline Image  filter non-math information Formatting  identify section headings etc. Hyperlink  separate related concepts and resource from the rest Layout  improve precision at the cost of recall Context  not effective overall Evaluation

Zhao Jin. Math Information Retrieval Training Data Insufficient examples Skewed distributions Segmentation Over- or under-segmented Potential Sources of Error

Zhao Jin. Math Information Retrieval Introduction Literature Review User Study Prototype Implementation Future Work Outline

Zhao Jin. Math Information Retrieval Iterative Development Process Resource Categorization Prototype fielding Text-to-Expression Linking Resolve text keywords to expressions Reduce the need for expression input Help to solve the notational variation problem Fit well with the rest of the desiderata Extension to Medical Domain NUH evidence Project Future Work “Pythagorean Theorem”  : “a 2 +b 2 =c 2 ” & “x 2 +y 2 =z 2 ”

Zhao Jin. Math Information Retrieval To create a user-centric and math-aware digital library on math materials Two Desiderata: Multi-Collection Search, Resource Categorization Prototype classification accuracy of 0.36 F 1 Future Text-to-Expression Linking Thank you for listening Questions? Conclusion