Trends in Web Search and its relevance to Digital Libraries Min-Yen Kan Web IR NLP Group (WING) National University of Singapore.

1 Trends in Web Search and its relevance to Digital Libraries Min-Yen Kan Web IR NLP Group (WING) National University of Singapore

2 Tips on Web Searching
(Min-Yen Kan, WING@NUS; 26 Sep 2008; World Scientific Talk)
Visualize results, then come up with multiple queries
Use multiple search engines
Advanced Search
– inurl:, site:
– “Phrasal search”
But that’s just general search…
Federated resources / Niche search engines
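The advanced operators on this slide can be combined in a single query string. The following is a minimal sketch, not something from the talk: the helper name `build_query` and the example site and terms are hypothetical, and only the `site:`, `inurl:` and quoted-phrase operators come from the slide.

```python
def build_query(terms=(), phrase=None, site=None, inurl=None):
    """Compose a search query string from plain terms and advanced operators.

    build_query is a hypothetical helper for illustration; the operators
    themselves (quoted phrase, site:, inurl:) are the ones named on the slide.
    """
    parts = list(terms)
    if phrase:
        # Double quotes ask the engine for an exact phrasal match.
        parts.append(f'"{phrase}"')
    if site:
        # site: restricts results to one domain.
        parts.append(f"site:{site}")
    if inurl:
        # inurl: requires the word to appear in the result's URL.
        parts.append(f"inurl:{inurl}")
    return " ".join(parts)


# Example values are made up for illustration.
query = build_query(
    terms=["citation", "analysis"],
    phrase="digital libraries",
    site="nus.edu.sg",
    inurl="wing",
)
print(query)
```

Trying several variants of the same query (with and without the phrase, with different sites) is one way to follow the "multiple queries" tip.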

3 Site- and Task-specific resources
Site Prestige: Know what others think and do
– Google PageRank (Link structure), Alexa (Traffic)
– Google Trends / Insight (Queries)
Social Searching (Web 2.0): The voice of the reader / critic
– (Bookmarks / Tags) Del.icio.us, Citeulike.org, Bibsonomy.org
– (News) Digg / Slashdot
– (Blogs) Google Blog, Technorati
People Search: Finding public information on a person
– Spock (web), Zabasearch (US only)
– LinkedIn, Facebook
– Must validate your sources
http://labs.digg.com/arc/

4 Expert Search
Find people who will advocate on your behalf
What do they want?
Scholar:
– Active? → Check their recent articles
– Names common? → Define area of interest
– Compare against peers
– Download vs. citation counts
Patent search:
– Referenced by: (citation count; different from Scholar)
Identifying webfaced advocates:
– Blog search, PageRank
http://flickr.com/photos/phauly/
How do machines do it?
– Expert search task as benchmark test
– Download web pages to analyze
– Needed to deal with spam pages
– Used PageRank to assess prestige → Impact
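To make "used PageRank to assess prestige" concrete, here is a minimal sketch of PageRank by power iteration on a toy link graph. It is a textbook formulation, not the system described in the talk; the page names, the damping factor of 0.85 and the iteration count are all assumptions chosen for illustration.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Textbook PageRank by power iteration.

    links maps each page to the list of pages it links to.
    Pages with no outgoing links spread their rank evenly over all pages.
    """
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Every page starts each round with the "random jump" share.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Dangling page: distribute its rank uniformly.
                share = damping * rank[page] / n
                for target in pages:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical toy web: two blogs link to an expert's home page.
toy_links = {
    "blog_a": ["expert_home"],
    "blog_b": ["expert_home"],
    "expert_home": ["blog_a"],
    "spam_page": [],
}
ranks = pagerank(toy_links)
for page, score in sorted(ranks.items(), key=lambda item: -item[1]):
    print(f"{page}: {score:.3f}")
```

In this toy graph the page that others link to ends up with the highest score, which is the sense in which link structure stands in for prestige.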

5 Problem or opportunity?
Revenue from print continually declining
Students and researchers rely on internet
Researchers want archiving rights
– freedom of academic information
Characteristics:
Not zero-sum content
Distribution is now largely the role of search engines
→ Necessitates new role of publisher and new revenue model
– Will classic models work? Advertising, Subscription, Transactional & Bundling
– Variants? Versioning (Varian), Moving window (JSTOR)
http://flickr.com/photos/danielbroche/
The game has fundamentally changed

6 Forecasting
– Content is becoming free
– MIT / Stanford opening up textbooks
– Open access archiving
→ long term: content will not be primary revenue source
eBook revenue hasn’t held up its promise yet…
– Device gap: iPhone and nextGen devices
→ Revenue may be further down the pipe
+ Academic publishers
– Connect to libraries and federations at institution level
– Individual customers are secondary
Trusted source
– Expertise in copyediting, typesetting, project management, distribution, social networking
– Many individual web publishers rediscovering same problems
→ Consultancy model
→ Win-win partnerships with individual authors

7 Web Trends
Social Content
Wisdom of masses: Crowdsourcing
Rich Media
Open Source / Access
Paradigmatic change
– Classifieds → Craigslist
– POTS → Skype
– CD store → iTunes
– Publishers → ??
http://www.informationarchitects.jp/slash/iA_WebTrends_2007_2_1024_768.gif

8 Where is research going?
Search API usage
Browser as computer
Web page structure, mining text data
Modeling web users at tasks: Exploring / Fact-finding
Personalization, recommending
Social networks
Understanding opinion
Query and log analysis
http://flickr.com/photos/alisdair/
User centric / Server centric

9 Webfaced pop quiz – which is which?
Springer
American Statistical Society
World Scientific
courtesy: http://pagerank.si/
WING@NUS

10 Forecast: Know your strengths
Get advocates
Make it easy for individuals to urge their institution to buy your materials
Know who is accessing (not necessarily buying) your content
Content revenue will continue to decline
Find an economic model that works for you
Work as partners in content creation
Be savvy on trends
Be visible: do “white hat” Search Engine Optimization (SEO)
Make your abstracts indexable by others
+ Academic publishers
– Connect to libraries and federations at institution level
– Individual customers are secondary
Trusted source
– Expertise in copyediting, typesetting, project management, distribution, social networking
– Many individual web publishers rediscovering same problems
→ Consultancy model
→ Win-win partnerships with individual authors

11 Trends in Digital Libraries
Expanding types of information in search
Automated tools for DLs
Usability in E-books and online media
User modeling
Personalization, annotation and relation to other user tasks
http://flickr.com/photos/pathfinderlinden
>> WING @ NUS

12 Scholarly Digital Libraries
ForeCite: our scholarly DL
Data Cleaning
Slide and Document Alignment
Searching in the OPAC
Math Information Retrieval

13 ForeCite: Beyond the document as an item
A user-centric DL framework
Put author / reader functionality together
Tagging, correction, annotation and viewing
Automatic tools: keyphrases and sentence classification
For use on and offline, organizes local PDF files for you
Only need your web browser
Server / Client

14 Data Cleaning
Addresses
– Dongwon Lee, 110 E. Foster Ave. #410, State College, PA, 16802
– LEE Dong, 110 East Foster Avenue Apartment 410, Univ. Park, PA 16802-2343
Products
– Honda Fit vs. Honda Jazz
– Apple iPod Nano 4GB vs. 4GB iPod nano 4GB
Idea: use web as additional context for disambiguation and clustering
Placed 3rd in Web People Search Task (WEPS 2007)
Search results:
– “Jeffrey D. Ullman”: 384,000 pages; “Jeffrey D. Ullman” + “aho”: 174,000 pages (45%)
– “J. Ullman”: 124,000 pages; “J. Ullman” + “aho”: 41,000 pages (33%)
– “Shimon Ullman”: 27,300 pages; “Shimon Ullman” + “aho”: 66 pages (0%)
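The percentages on this slide are the share of a name's result pages that also mention the context word “aho”. The sketch below recomputes them from the page counts on the slide; the function name `cooccurrence_ratio` is hypothetical, and how the resulting ratios feed into disambiguation or clustering is not specified in the talk.

```python
def cooccurrence_ratio(hits_name, hits_name_with_context):
    """Fraction of a name's result pages that also contain the context term."""
    if hits_name <= 0:
        raise ValueError("hits_name must be positive")
    return hits_name_with_context / hits_name


# Page counts copied from the slide: (name alone, name + "aho").
page_counts = {
    "Jeffrey D. Ullman": (384_000, 174_000),
    "J. Ullman": (124_000, 41_000),
    "Shimon Ullman": (27_300, 66),
}

ratios = {
    name: cooccurrence_ratio(total, with_aho)
    for name, (total, with_aho) in page_counts.items()
}
for name, ratio in ratios.items():
    print(f"{name}: {ratio:.0%}")
```

The two high ratios suggest “J. Ullman” and “Jeffrey D. Ullman” share a web context (here, a co-author), while “Shimon Ullman” does not, which is the kind of signal the slide proposes using.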

15 Slides and their relationship to documents
Document in focus
Slides in focus

16 Searching in Libraries
http://linc.comp.nus.edu.sg

17 Symbolic Information Search
How do users want to search math materials?
Our answer: Text-to-Expression Linking
– Resolve text keywords to expressions
– e.g., “Pythagorean Theorem” → “$a^2 + b^2 = c^2$” or “$x^2 + y^2 = z^2$”
Reduce the need for expression input
Solves the notational variation problem
Not quite right…
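One very simple way to picture text-to-expression linking is a lookup from a normalized keyword to the known notational variants of an expression. The sketch below is an illustration under that assumption only: the table `EXPRESSION_INDEX` and both function names are hypothetical, and the talk does not say how the actual linking is done.

```python
# Hypothetical hand-built index: keyword -> notational variants of the expression.
EXPRESSION_INDEX = {
    "pythagorean theorem": ["a^2 + b^2 = c^2", "x^2 + y^2 = z^2"],
}


def normalize(keyword):
    """Lowercase the keyword and collapse runs of whitespace."""
    return " ".join(keyword.lower().split())


def link_text_to_expressions(keyword):
    """Return all known expression variants for a text keyword (empty if unknown)."""
    return list(EXPRESSION_INDEX.get(normalize(keyword), []))


matches = link_text_to_expressions("  Pythagorean   Theorem ")
print(matches)
```

Because the user types words rather than symbols, both variable namings are returned for the same keyword, which is how a text query sidesteps notational variation.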

18 Conclusions
Consider us your research WING!
Trade data and problems for solutions and interns
Meanwhile:
– Use better search strategies
– Practice white hat SEO
– Identify webfaced advocates

19 References
Kahin and Varian (2000) Internet Publishing and Beyond
Towle et al. (2007) Electronic Books in the 2003-2005 Period, Pub Res Q 23:95-104
Photo Credits
Flickr Creative Commons Search
Thanks to all of you for listening & my fellow WING group members

21 Abstract
I will present trends in current academic research on web search and digital libraries, and discuss their relevance to publishers and their economic model. With respect to the web, I will cover how search engines are starting to specialize and use click-through and ad data to improve relevance ranking. With respect to digital library research, I will discuss my group's research at NUS on advancing the state of the art in scholarly digital libraries. I will cover advances in how we deal with data cleaning issues, and with slide and equation retrieval and alignment.

