CS246 by John Cho1 CS246: Web Information Systems Junghoo John Cho Spring 2014
CS246 by John Cho2 Course Information Web page: http://oak.cs.ucla.edu/cs246/ Topic: Web information management Time: MW 2:00 -- 3:50 pm Place: Boelter Hall 5422 Instructor: Junghoo John Cho office: 3531H Boelter Hall email: firstname.lastname@example.org please use subject CS246: … office hours: Mon 1-2 pm.
CS246 by John Cho3 Who is this class for? Strong interest in research Interest in Web information systems Time commitment: Around 2-3 papers every week Typically one full day of paper reading One independent project Similar to paper writing In fact we read papers from past student projects! Or interesting application implementation
CS246 by John Cho4 Today's Topics Overview of the course topics Course logistics Paper reading assignments Class project
CS246 by John Cho5 Prerequisite Introductory database, e.g., CS143 e.g.: query? SQL? Basic algorithms and data structures Basic probability and statistics P(A|C), Bayes rule, … Design and implementation experience Basic C++ Quick test: Grab a sample paper See if you can read, understand and build it
CS246 by John Cho6 Tell Us About You Name Department & Program Before coming to UCLA Brief history at UCLA Technical/research interests Expectation from the class
CS246 by John Cho7 Information Galore Legacy database Plain text files Biblio server
CS246 by John Cho8 Central Problem How to manage/access information on the Web? Three major approaches Central indexing E.g., Web search engine Dynamic integration E.g., comparison shopping services Data extraction E.g., spamming companies
CS246 by John Cho9 Topic: Web Search (Central Indexing) Central Index
CS246 by John Cho10 Topic: Web Search (Central Indexing) Web: collection of passive HTML pages Find Web pages relevant to a query Traditional Information Retrieval: Web = collection of HTML pages HTML page = a bag of words More than that? Links, structure of the Web User access patterns HTML tags (markups)
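The "HTML page = a bag of words" view, and what it leaves out, can be illustrated with a minimal sketch. It uses only Python's standard library; the sample page, the class name `TextAndLinks`, and the function name `bag_of_words` are assumptions made for illustration, not course material. The sketch counts words in the page text and, separately, collects the link targets that a pure bag-of-words model would discard.

```python
from collections import Counter
from html.parser import HTMLParser
import re


class TextAndLinks(HTMLParser):
    """Collects text content and outgoing link targets from one HTML page."""

    def __init__(self):
        super().__init__()
        self.text_parts = []
        self.links = []

    def handle_starttag(self, tag, attrs):
        # Links are the extra signal beyond the words themselves.
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_data(self, data):
        self.text_parts.append(data)


def bag_of_words(html):
    """Return (word counts, link targets) for one page (hypothetical helper)."""
    parser = TextAndLinks()
    parser.feed(html)
    parser.close()
    words = re.findall(r"[a-z0-9]+", " ".join(parser.text_parts).lower())
    return Counter(words), parser.links


# Hypothetical sample page, for illustration only.
page = (
    "<html><body><h1>Web search</h1>"
    '<p>Search the Web. <a href="http://oak.cs.ucla.edu/cs246/">CS246</a></p>'
    "</body></html>"
)
bag, links = bag_of_words(page)
```

Here `bag` ignores word order and markup entirely, while `links` keeps the structural information that link-based ranking methods build on.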
CS246 by John Cho11 Topic: Dynamic Integration Cars.com Amazon.com Apartments.com 401carfinder.com
CS246 by John Cho12 Topic: Dynamic Integration Mediator Wrapper Source 1 Wrapper Source 2 Wrapper Source n
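The mediator/wrapper layout on this slide can be sketched roughly as follows. This is a toy under stated assumptions: the `Wrapper` and `Mediator` classes, the shared schema (`name`, `price`), and the in-memory source records are all hypothetical, and a real wrapper would query a live site rather than a Python list. Each wrapper maps its source's native fields to the shared schema; the mediator sends one query to every wrapper and merges the answers.

```python
class Wrapper:
    """Translates one source's native records into a shared schema."""

    def __init__(self, name, records, field_map):
        self.name = name
        self.records = records      # stand-in for a remote source
        self.field_map = field_map  # native field -> shared field

    def query(self, max_price):
        results = []
        for rec in self.records:
            # Rename native fields to the shared schema.
            item = {shared: rec[native] for native, shared in self.field_map.items()}
            item["source"] = self.name
            if item["price"] <= max_price:
                results.append(item)
        return results


class Mediator:
    """Forwards a query to every wrapper and merges the results."""

    def __init__(self, wrappers):
        self.wrappers = wrappers

    def query(self, max_price):
        merged = []
        for wrapper in self.wrappers:
            merged.extend(wrapper.query(max_price))
        return sorted(merged, key=lambda item: item["price"])


# Hypothetical sources with different native field names.
source1 = Wrapper(
    "source1",
    [{"model": "Civic", "cost": 8000}, {"model": "Accord", "cost": 12000}],
    {"model": "name", "cost": "price"},
)
source2 = Wrapper(
    "source2",
    [{"title": "Corolla", "price_usd": 7500}],
    {"title": "name", "price_usd": "price"},
)
results = Mediator([source1, source2]).query(max_price=9000)
```

The user sees one schema and one query interface; the per-source differences stay inside the wrappers.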
CS246 by John Cho13 Topic: Data Extraction WWW Structured data: Beatles $10, Madonna $20, NSync $20 How can we extract structured data from free text automatically?
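To make the target of data extraction concrete, here is a minimal sketch that pulls (name, price) pairs out of free text. The sample sentence and the hand-written regular expression are assumptions for illustration; a fixed pattern like this is exactly what automatic extraction tries to avoid writing by hand, so treat it as a picture of the desired output, not as a method from the course.

```python
import re

# Hypothetical free text, loosely based on the slide's example values.
text = "Beatles CD on sale for $10. Madonna album now $20! NSync: $20 only."

# A capitalized word, then any non-sentence-ending characters, then a dollar amount.
PATTERN = re.compile(r"([A-Z][A-Za-z]+)\b[^$.!?]*\$(\d+)")


def extract_prices(free_text):
    """Return a list of (name, price) tuples found in the text."""
    return [(name, int(price)) for name, price in PATTERN.findall(free_text)]


records = extract_prices(text)
```

The pattern breaks as soon as the wording changes (for example, a price written before the name), which is why learning such patterns automatically is the interesting problem.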
CS246 by John Cho14 Main Course Workload Paper reading Paper reading assignments Class discussion We mainly focus on central indexing Independent projects
CS246 by John Cho15 High-Level Goal Learn core ideas and techniques Some of the techniques can be useful for other fields Learn how to read papers Hopefully learn what it is like to do research Sometimes very frustrating but often very rewarding
CS246 by John Cho16 Paper Reading Why: Something that you will do all the time as a researcher Learn to be critical and communicate well Acquire knowledge to conduct research/project About 20 papers from Conferences: SIGMOD, VLDB, WWW, and … Before the class: Everyone: read and review the paper During the class: Instructor: present his own understanding and lead class discussion Everyone: participate!!!
CS246 by John Cho17 How to Get Papers From the class homepage http://oak.cs.ucla.edu/cs246/ Some of the materials are password protected User name: cs246 Password: papers Let me know if you have any problems
CS246 by John Cho18 How to Read Papers Understand the Big Picture What is the problem? Why is it important? Why is it difficult? What has this paper done? What others have done?
CS246 by John Cho19 Paper Reviews (1) Due by the preceding Sunday Submit through our Web submission interface on the class Web page Required components: at most 3 paragraphs Summary (1 paragraph): your own words This paper discusses how to optimize queries with... Comments/criticisms (1-2 paragraphs): the good & the bad It addresses a real problem and the solution is interesting … But I feel the experiments are not realistic because... Optional: questions, as many as you want Why do the authors assume that queries are independent?
CS246 by John Cho20 Paper Reviews (2) May skip 3 paper summaries without penalty Most reviews will get full score unless they are written extremely poorly
CS246 by John Cho21 Class Project Why: Work on a specific problem and learn to find a solution 40% of the class grade Team of up to 3 Topic: any problem related to the general problem Open style Rigorous study of a research problem or Any interesting system implementation
CS246 by John Cho22 Class Project Schedule Important Milestones Group formation: 4/09 (2nd week Wed) Project proposal: 4/20 (3rd week Sun) Project progress: 5/07 (6th week Wed) Final report: 5/21 (8th week Sun) Project presentation: 9th and 10th weeks You are responsible for staying on track Make appointments with instructor as needed
CS246 by John Cho23 Project: Please Remember Aim high but be realistic Expect to read at least 4-5 papers along the way Start early Don't do it right before the deadline There are always unexpected obstacles Some students could not finish in previous quarters Please, please start early You are responsible for staying on track
CS246 by John Cho24 Grading Midterm: 40% Paper reviews: 20% Project: 40%
CS246 by John Cho25 Announcements First review due Sunday 4/06 Three papers for classes 3 and 4 Graph structure in the Web The Anatomy of a Large-Scale Hypertextual … Authoritative sources in a hyperlinked environment