Literature Search, IN3305 (September 18, 2015)


1 Literature Search, IN3305. September 18, 2015. http://www.pds.ewi.tudelft.nl/~iosup/Courses/2012_aiosup_lit_search.ppt Alexandru Iosup. Initial slides by Tomas Klos. Course manager: Peter van Nieuwenhuizen. Parallel and Distributed Systems Group, http://www.pds.ewi.tudelft.nl/

2 Literature Surveys: At the Core of Innovation. Given a problem (topic of interest), answer questions about it: What solutions exist? What is the most influential solution? What is the rate of innovation in the field? Do so by surveying (understanding, interpreting, and summarizing) the body of related (scientific) knowledge. Where and how can I innovate? IN3305’s study goal: “kennismaken met wetenschappelijke literatuur” (getting acquainted with scientific literature).

3 Innovation is a Vital Competitive Tool. Innovation = novel application of knowledge. Innovation favors small (but efficient) countries. High-tech companies tend to be more innovation-intensive. Source: Economist Intelligence Unit, A new ranking of the world’s most innovative countries, April 2009, http://graphics.eiu.com/PDF/Cisco_Innovation_Complete.pdf

4 What is Novel? The Overwhelming Growth of Knowledge. “When 12 men founded the Royal Society in 1660, it was possible for an educated person to encompass all of scientific knowledge. […] In the last 50 years, such has been the pace of scientific advance that even the best scientists cannot keep up with discoveries at frontiers outside their own field.” Tony Blair, PM Speech, May 2002. [Figure: number of publications, 1993–1997 vs. 1997–2001. Data: King, The scientific impact of nations, Nature ’04.]

5 The “Size” of a Research Topic. Grid Computing: billions of $ in research investment; 2,500 PhDs (my est.); over 15,000 scientific publications (my est.) in 15 years; several surveys of 100-200 articles each. Grid Scheduling: conferences: Grid, CCGrid, HPDC, SC, IPDPS, ICDCS, …; journals: TPDS, CCPE, FGCS, JoGC, … Peer-to-Peer Search Methods: survey of over 300 articles after 5 years of research.

6 How to Talk About Books You Haven’t Read. “There is more than one way not to read”: not opening the book. You cannot read everything. How many books can a librarian read? How many books can you read? Let’s estimate. Librarians can talk about every book in the library (every book out of millions) → There exists a system to (not) read.
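
The "Let’s estimate" step can be sketched as back-of-the-envelope arithmetic. This is only an illustration: the reading pace, the number of reading years, and the library size below are assumed values, not figures from the slides.

```python
def books_read(books_per_week: float, years: float) -> int:
    """Upper bound on books finished at a steady pace (52 weeks per year)."""
    return int(books_per_week * 52 * years)


def fraction_of_library(books_per_week: float, years: float, library_size: int) -> float:
    """Share of a library that one reader could cover at that pace."""
    return books_read(books_per_week, years) / library_size


# Assumed inputs: one book per week, 50 reading years, a library of one million books.
total = books_read(1, 50)
share = fraction_of_library(1, 50, 1_000_000)
print(total, share)
```

Even under generous assumptions the share stays tiny, which is the slide's point: a system for not reading is unavoidable.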

7 Outline: 1. From the IN3305 study goals: “kennismaken met wetenschappelijke literatuur” (getting acquainted with scientific literature). 2. To read or not to read? 3. What is “scientific literature”? (input and output) 4. Measuring and assessing quality. 5. Useful sites and tools. 6. On gaming the citation indices (unethical). 7. Conclusion.

8 Literature = input. Citations: place your work in context; give credit to previous work; support your arguments; show your marginal contribution; prevent plagiarism. Read what you cite! (prevent superfluous citing). This does NOT mean: “You should read everything” or “You cannot also read what you don’t cite”.

9 Literature = input. Sources: peer-reviewed. Textbook/monograph: for teaching and background; complete treatment of a topic; when citing a textbook, mention the chapter or page number. Journal article: more space, more detail, more thorough than a conference paper; sometimes old news at publication date (lag). Paper in edited volume: multiple papers, review of the state of the art; cite individual papers. Paper in conference proceedings: recent results; conference quality? publisher of proceedings?

10 Sources: not peer-reviewed. Working papers, preprints: up-to-date, spread ideas; “open access”; Computing Research Repository (CoRR), http://arxiv.org/corr/home. Websites. ‘Personal communication’.

11 Literature = output. Publish to conferences and journals. Peer review (for conferences, journals), (double-)blind: accept, with or without (major) revisions, or reject. Acceptance rate, e.g., 25% (not bad) (Nature: 10% of articles are reviewed). Time to print: up to 1.5 years for journals, 3-6 months for conferences. Measuring scientific output: “scientometrics”. Q: What do you think about this situation?

12 Quality? Reputation: ACM, IEEE, Springer, Elsevier, MIT/Princeton/Oxford/… University Press. SCIgen, An Automatic CS Paper Generator (http://pdos.csail.mit.edu/scigen/): accepted (non-reviewed) for the 2005 World Multi-Conference on Systemics, Cybernetics and Informatics (another one: an Elsevier journal!).

13 Scientometrics. Scientometrics: “measuring and analyzing science”. Bibliometrics: “study or measurement of texts and information”. Citation analysis: which papers cite a paper, and which papers does a paper cite? Authority of countries, research groups, individual authors, journals/conferences, individual papers. Q: What is a citation? “Publish or perish”: quality vs. quantity (“80% of all published papers are not cited”). Q: Conference or journal? Which conference or journal?

14 Citation Databases. Commercial: Science Citation Index (Web of Science, Institute for Scientific Information, ISI); Scopus (Elsevier). Free: Google Scholar (better coverage than ISI); CiteSeer (computer science); ArNetMiner (computer science); RePEc (economics). More: en.wikipedia.org/wiki/List_of_academic_databases_and_search_engines

15 Comparing Countries. Data: King, The scientific impact of nations, Nature ’04. [Figure axes: citation rate per paper (normalized); citation intensity = #citations / GDP.]

16 Comparing Groups or Individuals [1/3]. An idea: the Google PageRank principle. Web: network of sites, linking to each other. Science: network of papers, citing each other. [Figure: the World Wide Web’s links network and the academic citations network, over time.] Q: What do you think about this approach?
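
To make the PageRank idea concrete, here is a minimal sketch of power iteration on a tiny citation network. The paper names, the damping factor of 0.85, and the iteration count are illustrative assumptions; the slides do not prescribe an implementation.

```python
def pagerank(citations, damping=0.85, iterations=50):
    """Rank papers by the PageRank principle.

    `citations` maps each paper to the list of papers it cites.
    Papers that cite nothing spread their rank evenly over all papers.
    """
    papers = set(citations)
    for cited in citations.values():
        papers.update(cited)
    n = len(papers)
    rank = {p: 1.0 / n for p in papers}
    for _ in range(iterations):
        new_rank = {p: (1.0 - damping) / n for p in papers}
        for p in papers:
            out = citations.get(p, [])
            if out:
                share = damping * rank[p] / len(out)
                for q in out:
                    new_rank[q] += share
            else:
                share = damping * rank[p] / n
                for q in papers:
                    new_rank[q] += share
        rank = new_rank
    return rank


# Hypothetical network: B, C and D cite A; D also cites B.
rank = pagerank({"B": ["A"], "C": ["A"], "D": ["A", "B"]})
best = max(rank, key=rank.get)
print(best)
```

A citation from a highly ranked paper counts for more than one from an obscure paper, which is what separates this approach from plain citation counting.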

17 Comparing Groups or Individuals [2/3]. Journals: Journal Impact Factor. Personal: h-index (Hirsch, 2005): a scientist has index h if h of his/her N papers have at least h citations each, and the other (N − h) papers have no more than h citations each. g-index (Egghe, 2006): the highest number g such that the g most cited articles together have attracted at least $g^2$ citations. Extensions: e-index; group evaluation. Q: What about conferences? Q: Really, what is a citation? Q: (unethical) How to abuse citation indices?
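
The two definitions above can be sketched directly from a list of per-paper citation counts. The counts are made up for illustration, and capping g at the number of papers is an assumed convention, not something the slide states.

```python
def h_index(citations):
    """Largest h such that h papers have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for position, count in enumerate(ranked, start=1):
        if count >= position:
            h = position
        else:
            break
    return h


def g_index(citations):
    """Largest g such that the g most cited papers together have >= g*g citations.

    Assumption: g is capped at the number of papers.
    """
    ranked = sorted(citations, reverse=True)
    total = 0
    g = 0
    for position, count in enumerate(ranked, start=1):
        total += count
        if total >= position * position:
            g = position
    return g


# Hypothetical author with five papers.
counts = [10, 8, 5, 4, 3]
print(h_index(counts), g_index(counts))
```

The g-index is never smaller than the h-index, because it credits the surplus citations of the most cited papers.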

18 Journal Impact Factor (JIF). Many journals have no impact factor. JIF is the average number of citations in a given year to papers published in a journal in the 2 previous years. For journal x and the year 2010: $\mathrm{JIF}(x, 2010) = \frac{\text{number of citations in 2010 to papers in journal } x \text{ from the period 2008–2009}}{\text{total number of papers in journal } x \text{ in the period 2008–2009}}$. What does an average value mean?
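
The question "What does an average value mean?" can be illustrated with a small sketch. The per-paper citation counts are hypothetical and chosen so that one paper dominates; the function name is likewise only illustrative.

```python
from statistics import median


def journal_impact_factor(citations_per_paper):
    """Mean number of citations received in one year by the papers
    a journal published in the two previous years."""
    if not citations_per_paper:
        raise ValueError("the journal published no papers in the two-year window")
    return sum(citations_per_paper) / len(citations_per_paper)


# Hypothetical journal: 8 papers in the window, one of them a runaway hit.
counts = [0, 0, 0, 1, 1, 2, 3, 93]
jif = journal_impact_factor(counts)
typical = median(counts)
print(jif, typical)
```

Here the mean is far above what a typical paper in the journal receives, so a high JIF says little about any individual paper.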

19 Journal Impact Factors, 2004. [Figure: JIF vs. journal rank.] ≥1 citation/publication (last 2 years). Highest JIF ~30; very high JIF ≥15.

20 CS impact factors, 2005. [Figure: JIF vs. journal rank.] CS: highest JIF ~8; very high JIF ≥2. All: highest JIF ~30; very high JIF ≥15. Q: What do you think about this situation?

21 Comparing Groups or Individuals [3/3]. For Computer Science: Conference proceedings are to be preferred to journals. ISI Web of Science and Elsevier Scopus are not good impact indicators (poor, albeit improving, coverage). Google Scholar is a better impact indicator than ISI WoS and Elsevier Scopus; ArNetMiner is reasonable. DBLP is a good, selective source, but has no citation links. Expert knowledge is required to select the best topical conferences and journals (regardless of their acceptance ratios and impact factors). Q: Problems with this approach?

22 Outline: 1. From the IN3305 study goals: “kennismaken met wetenschappelijke literatuur” (getting acquainted with scientific literature). 2. To read or not to read? 3. What is “scientific literature”? (input and output) 4. Measuring and assessing quality. 5. Useful sites and tools. 6. On gaming the citation indices (unethical). 7. Conclusion.

23 Method To Find Sources. Browse: Google Scholar, http://scholar.google.com/; DBLP, http://dblp.uni-trier.de/; others: TU Delft library tools. Study an author using Publish or Perish. Look at author homepages. Follow links and citations (forward and backward).
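
Following citations forward and backward can be sketched as a breadth-first walk over a citation graph. The graph below is a hand-made toy: in practice the reference lists would come from a source such as Google Scholar or DBLP, and the function and paper names are hypothetical.

```python
from collections import deque


def snowball(seeds, references, max_hops=1):
    """Collect papers reachable from `seeds` within `max_hops` steps.

    `references` maps a paper to the papers it cites (backward links).
    Forward links ("cited by") are derived by inverting that mapping.
    Returns a dict of paper -> number of hops from the nearest seed.
    """
    cited_by = {}
    for paper, refs in references.items():
        for ref in refs:
            cited_by.setdefault(ref, set()).add(paper)

    found = {seed: 0 for seed in seeds}
    queue = deque(seeds)
    while queue:
        paper = queue.popleft()
        hops = found[paper]
        if hops == max_hops:
            continue
        neighbours = set(references.get(paper, ())) | cited_by.get(paper, set())
        for neighbour in neighbours:
            if neighbour not in found:
                found[neighbour] = hops + 1
                queue.append(neighbour)
    return found


# Toy graph: a survey cites p1 and p2, p1 cites a classic, a newer paper cites the survey.
references = {"survey": ["p1", "p2"], "p1": ["classic"], "new": ["survey"]}
one_hop = snowball(["survey"], references, max_hops=1)
two_hops = snowball(["survey"], references, max_hops=2)
print(sorted(one_hop), sorted(two_hops))
```

Limiting the number of hops keeps the candidate set small enough to screen by hand.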

24 Google Scholar: “cited by”; relevant authors; TU Delft SFX linking; import into BibTeX.

25 Google Scholar at Work

27 Google Scholar at Work. From home: use VPN!

29 DBLP: “lists more than one million articles” (April 2008). Indexes: authors; conferences; journals; series; subjects. Now also “Faceted search” and “CompleteSearch”.

30 DBLP at Work

31 DBLP at Work

34 TU Delft Library. Search: http://www.library.tudelft.nl/ws/search/ (e.g., “information by subject” -> computer science). TUlib, “how to find and use scientific information”: http://www.library.tudelft.nl/tulib/

35 Harzing’s Publish or Perish. Uses Google Scholar data. Calculates many indices: number of citations (also per year / article / author / …); Hirsch’s h-index; Zhang’s e-index (excess in h-index set); Egghe’s g-index; … Similar online tool: ArNetMiner.
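
As a rough sketch of the "excess in h-index set" idea: the e-index is the square root of the citations that the h most cited papers hold beyond the h² needed to reach index h. The code below is an independent illustration with made-up counts, not the Publish or Perish implementation.

```python
import math


def e_index(citations):
    """Square root of the excess citations in the h-core.

    The h-core is the set of the h most cited papers, where h is the h-index.
    """
    ranked = sorted(citations, reverse=True)
    # In a descending list, `count >= position` holds exactly for the first h papers.
    h = sum(1 for position, count in enumerate(ranked, start=1) if count >= position)
    excess = sum(ranked[:h]) - h * h
    return math.sqrt(excess)


# Hypothetical author: h = 3, the h-core holds 12 + 3 + 3 = 18 citations, excess = 9.
print(e_index([12, 3, 3]))
```

Two authors with the same h-index can differ widely in e-index when one of them has a few very highly cited papers.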

36 Publish or Perish (http://www.harzing.com/pop.htm)

37 Outline: 1. From the IN3305 study goals: “kennismaken met wetenschappelijke literatuur” (getting acquainted with scientific literature). 2. To read or not to read? 3. What is “scientific literature”? (input and output) 4. Measuring and assessing quality. 5. Useful sites and tools. 6. On gaming the citation indices (unethical). 7. Conclusion.

38 Unethical! How to Game the Citation System? [Figure: (part of) collaboration graph]

39 All authors with Erdős number 1. Note: The h-index was “invented” almost a decade after Erdős’s death.
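
An Erdős number is the shortest co-authorship distance to Erdős, so it can be sketched as a breadth-first search over a collaboration graph. The author lists below are invented placeholders, not real publication data.

```python
from collections import deque
from itertools import combinations


def collaboration_distances(papers, source):
    """Shortest co-authorship distance from `source` to every reachable author.

    `papers` is a list of author lists; two authors are linked if they
    share at least one paper.
    """
    coauthors = {}
    for authors in papers:
        for author in authors:
            coauthors.setdefault(author, set())
        for a, b in combinations(authors, 2):
            coauthors[a].add(b)
            coauthors[b].add(a)

    distance = {source: 0}
    queue = deque([source])
    while queue:
        author = queue.popleft()
        for peer in coauthors.get(author, ()):
            if peer not in distance:
                distance[peer] = distance[author] + 1
                queue.append(peer)
    return distance


# Placeholder data: A wrote with Erdős, B with A, C with B; D and E are unconnected.
papers = [["Erdős", "A"], ["A", "B"], ["B", "C"], ["D", "E"]]
numbers = collaboration_distances(papers, "Erdős")
print(numbers)
```

Authors missing from the result have no finite Erdős number in this data set.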

40 Collaboration Graph: Degree Distribution. [Figure: degree distribution, with Erdős marked]
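
A degree distribution counts how many authors have each number of distinct co-authors. The sketch below assumes the collaboration graph is given as a list of co-author pairs; the names are placeholders.

```python
from collections import Counter


def degree_distribution(edges):
    """Map each degree (number of distinct co-authors) to the number of authors with it."""
    neighbours = {}
    for a, b in edges:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)
    return Counter(len(peers) for peers in neighbours.values())


# Placeholder graph: one hub with three co-authors, two of whom also wrote together.
edges = [("hub", "a"), ("hub", "b"), ("hub", "c"), ("a", "b")]
distribution = degree_distribution(edges)
print(sorted(distribution.items()))
```

In real collaboration graphs most authors sit at low degrees while a few, such as Erdős, sit far out in the tail.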

41 Collaboration Graph: Connected Components Distribution. [Figure: distribution of connected component sizes, with the giant component marked]
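
Component sizes can be found with a plain graph traversal; the largest one plays the role of the giant component. This is a minimal sketch on a placeholder edge list, not an analysis of the actual collaboration data.

```python
def component_sizes(edges):
    """Sizes of the connected components of an undirected graph, largest first."""
    neighbours = {}
    for a, b in edges:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)

    seen = set()
    sizes = []
    for start in neighbours:
        if start in seen:
            continue
        # Depth-first traversal of the component containing `start`.
        stack = [start]
        seen.add(start)
        size = 0
        while stack:
            node = stack.pop()
            size += 1
            for peer in neighbours[node]:
                if peer not in seen:
                    seen.add(peer)
                    stack.append(peer)
        sizes.append(size)
    return sorted(sizes, reverse=True)


# Placeholder graph: a chain of four authors and a separate pair.
sizes = component_sizes([("a", "b"), ("b", "c"), ("c", "d"), ("x", "y")])
print(sizes)
```

Only authors inside the same component as Erdős have a finite Erdős number.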

42 Interested? Mark Newman answers: “who is the best connected scientist?” Other references: Erdős Number Project, http://www.oakland.edu/enp/; http://harveycohen.net/erdos/ -- Jerry Grossman and Smarty; Kevin Bacon Oracle (is Kevin Bacon the center of the Hollywood movie industry? or Sean Connery? or Christopher Lee?), http://oracleofbacon.org/

43 More on the (unethical) Gaming of the Citation Indices. Self-cite, self-cite, self-cite. Journals asking submitters to cite the journal’s papers. Program committee members and reviewers asking for their own work to be cited (when not necessary). Not citing old work because it’s old: “killing” old results now allows you to republish them later. Work on a popular topic: more people, more citations, more chances. (Google Scholar-only) Blog, Tweet, and FB daily about your papers. Ask your friends to re-post.
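
One simple defence against the first item is to count only citations from papers that share no author with the cited paper. The sketch below assumes author lists are available for every citing paper; the names are placeholders and the function is hypothetical.

```python
def external_citations(paper_authors, citing_author_lists):
    """Number of citing papers that share no author with the cited paper."""
    own = set(paper_authors)
    return sum(1 for authors in citing_author_lists if own.isdisjoint(authors))


# Placeholder data: four citing papers, two of them written by an author of the cited paper.
citing = [["A", "C"], ["D"], ["E", "F"], ["B"]]
count = external_citations(["A", "B"], citing)
print(count)
```

This filter removes direct self-citations only; citations traded within a group of friendly authors would need a clique-level analysis.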

44 How to Talk About Books You Haven’t Read. There exists a system to (not) read: 1. Know where to find sources. Trustworthy: DBLP, ACM DL, Google Scholar. Less trustworthy: CoRR, … 2. Know how to find good sources. Number of citations: Google Scholar and others. h-index: Publish or Perish (the program). Try to avoid or down-weight citation cliques. 3. Select from the good sources.

45 Questions?

