INF 141 COURSE SUMMARY Crista Lopes. Lecture Objective Know what you know.

1 INF 141 COURSE SUMMARY Crista Lopes

2 Lecture Objective Know what you know

3 Problem Space of this course  “Big Data”  How to  collect it  index it  search it for relevant information

4 Industry segment of this course  Search engines  Google  MS Bing  nameless others  Web information retrieval is big $$$

5 Technical content of this course  Engineering  Math

6 Lecture 2  Search engines history  Search & advertising on the Web  Web corpus

7 Lecture 3  Characteristics of the Web  duplication, linkage, spam  how big  rate of change  evolution  Characteristics of Web search Users  reasons for searching  characteristics of queries used  behavior towards results  the need behind the query

8 Lecture 4  Search Engine Optimization

9 Lectures 5 and 6  Web crawling  architecture of a crawling infrastructure  algorithms  constraints

10 Lectures 7, 8 and 9  Index construction  what index is  efficient data structures  efficient algorithms for constructing it
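
As a reminder of what an index is, here is a minimal sketch of an in-memory inverted index. It assumes whitespace tokenization and a tiny made-up corpus; the name `build_inverted_index` and the sample documents are illustrative, not taken from the lectures, and a real indexer would build the structure on disk in blocks.

```python
from collections import defaultdict

def build_inverted_index(docs):
    """Map each term to a sorted list of IDs of the documents containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        # Assumption: lowercase + whitespace split stands in for real tokenization.
        for term in text.lower().split():
            index[term].add(doc_id)
    # Sorted postings lists make later merging (e.g. for AND queries) cheap.
    return {term: sorted(ids) for term, ids in index.items()}

# Hypothetical three-document corpus.
docs = {1: "web search engines", 2: "web crawling", 3: "search index"}
index = build_inverted_index(docs)
```

Here `index["web"]` holds the postings list for the term "web", that is, the IDs of the documents that contain it.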

11 Lecture 10  Map Reduce  Index compression

12 Lecture 11  Retrieval  Boolean  zones  TF metrics
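
Boolean retrieval can be illustrated with an AND query, which amounts to intersecting two sorted postings lists. The sketch below assumes postings are sorted lists of integer document IDs; the function name `intersect` and the sample lists are illustrative only.

```python
def intersect(postings_a, postings_b):
    """Merge two sorted postings lists, keeping the IDs present in both."""
    result = []
    i = j = 0
    while i < len(postings_a) and j < len(postings_b):
        if postings_a[i] == postings_b[j]:
            result.append(postings_a[i])
            i += 1
            j += 1
        elif postings_a[i] < postings_b[j]:
            i += 1  # advance the list with the smaller current ID
        else:
            j += 1
    return result

# Hypothetical postings for two query terms.
matches = intersect([1, 2, 4, 7], [2, 3, 4, 8])
```

The merge visits each posting at most once, so its cost is linear in the combined length of the two lists.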

13 Lecture 12  Ranked Retrieval  weighting fields

14 Lecture 13  Better scoring  TF-IDF  Corpus-wide statistics
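
One common TF-IDF variant weights a term by its raw count in the document times log(N / df), where N is the corpus size and df is the number of documents containing the term. The sketch below assumes that variant and whitespace tokenization; other weighting schemes exist, and `tf_idf` and the sample corpus are illustrative names and data.

```python
import math
from collections import Counter

def tf_idf(docs):
    """Return one {term: weight} dict per document, using tf * log(N / df)."""
    n = len(docs)
    tokenized = [doc.lower().split() for doc in docs]
    # Document frequency: in how many documents does each term appear?
    df = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        tf = Counter(tokens)  # raw term counts within this document
        weights.append({t: tf[t] * math.log(n / df[t]) for t in tf})
    return weights

# Hypothetical three-document corpus.
weights = tf_idf(["web search", "web crawling", "web index search"])
```

A term that appears in every document, such as "web" here, gets weight zero, which is the corpus-wide statistic doing its job.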

15 Lecture 14  Vector Space model  Score by magnitude (Euclidean distance)  Score by angle (cosine distance)
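
The difference between the two scores shows up when documents differ in length. The sketch below assumes documents are plain lists of term weights and computes cosine similarity (the cosine of the angle between the vectors); the two sample vectors are made up for illustration.

```python
import math

def euclidean_distance(u, v):
    """Straight-line distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def cosine_similarity(u, v):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Hypothetical term-weight vectors: same proportions, ten times the length.
short_doc = [1.0, 2.0]
long_doc = [10.0, 20.0]

distance = euclidean_distance(short_doc, long_doc)
similarity = cosine_similarity(short_doc, long_doc)
```

The two vectors point in the same direction, so the angle-based score treats them as essentially identical, while the Euclidean distance between them is large purely because of document length.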

16 Lecture 15  Language statistics  Language processing  tokenizing  stemming  stopping  Link analysis  PageRank
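
PageRank can be sketched as a power iteration over the link graph. The version below is a simplified illustration, not necessarily the formulation used in the lecture: it assumes a damping factor of 0.85, a fixed number of iterations, that every link target is itself a key of `links`, and that pages without outlinks spread their rank evenly over all pages. The graph is made up.

```python
def pagerank(links, damping=0.85, iterations=50):
    """links maps each page to the list of pages it links to."""
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}  # start from a uniform distribution
    for _ in range(iterations):
        # Every page receives the "random jump" share first.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page, outlinks in links.items():
            # Assumption: a page with no outlinks spreads its rank over every page.
            targets = outlinks if outlinks else pages
            share = damping * rank[page] / len(targets)
            for target in targets:
                new_rank[target] += share
        rank = new_rank
    return rank

# Hypothetical three-page graph.
links = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
ranks = pagerank(links)
```

In this graph page "c" collects links from both other pages and ends up with the highest rank, and the ranks always sum to 1.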

17 Lecture 16  Hadoop

18 Lecture 17  Retrieval performance: precision and recall  Latent Semantic Analysis  Singular Value Decomposition
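
Precision is the fraction of retrieved documents that are relevant; recall is the fraction of relevant documents that were retrieved. A minimal sketch, assuming both sets are given as collections of document IDs (the IDs below are made up):

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for a set of retrieved document IDs."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)  # relevant documents that were retrieved
    # Assumption: report 0.0 rather than dividing by zero on empty sets.
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

# Hypothetical query result: 4 documents returned, 3 documents truly relevant.
precision, recall = precision_recall(retrieved=[1, 2, 3, 4], relevant=[2, 4, 5])
```

Two of the four returned documents are relevant, and two of the three relevant documents were found.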

19 Lecture 18  Retrieval on LSI  Use of Latent Dirichlet Allocation (LDA)

20 All together  Search engines history  Search & advertising on the Web  Web corpus  Characteristics of the Web  Characteristics of Web search Users  Search Engine Optimization  Web crawling  Index construction  Index compression  Map Reduce  Boolean retrieval  Parametric retrieval  Scored retrieval  TF-IDF and corpus-wide statistics  Language statistics  Language processing  Link analysis (PageRank)  Hadoop  Precision and Recall  LSI (and LDA)

21 Big Data jobs  plenty…  not just traditional search  making sense of data  search on Google

22 Where to go from here  Data mining  Machine learning
