A Study of Highly Similar and Duplicate Citations via Computer-based Text Mining and Manual Verification

Presentation transcript:

A Study of Highly Similar and Duplicate Citations via Computer-based Text Mining and Manual Verification. Skip Garner reporting for the team: Mounir Errami, Tara Long, Angela George and Johnny Sun. We all thank ORI/NLM for support.

eTBLAST – electronic Text Basic Local Alignment and Similarity Tool. eTBLAST, a text analytics tool, is an alternative to PubMed and other text databases; it uses full-text similarity search rather than keyword search to get better results. SCIENCE VOL March 2009
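To make the contrast with keyword search concrete, here is a minimal sketch of ranking records by whole-text similarity. It is not eTBLAST's actual algorithm, which this slide does not describe; it assumes a plain bag-of-words cosine similarity, and the names `tokenize`, `cosine_similarity`, and `rank_by_similarity` are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def cosine_similarity(text_a, text_b):
    """Cosine similarity between the word-count vectors of two texts."""
    counts_a = Counter(tokenize(text_a))
    counts_b = Counter(tokenize(text_b))
    shared = counts_a.keys() & counts_b.keys()
    dot = sum(counts_a[w] * counts_b[w] for w in shared)
    norm_a = math.sqrt(sum(c * c for c in counts_a.values()))
    norm_b = math.sqrt(sum(c * c for c in counts_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_by_similarity(query, records):
    """Return (score, record_id) pairs, most similar record first.

    `records` maps a record id to its text (an assumed, illustrative format).
    """
    scored = [(cosine_similarity(query, text), rid) for rid, text in records.items()]
    return sorted(scored, reverse=True)
```

The whole pasted passage acts as the query, so a record can rank highly without the user having chosen any particular keyword.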

eTBLAST, a free on-line tool, has a simple Google-like interface. Select the database to search, paste your text in, and search. Enter your e-mail address if you want the results to also be sent to you. Access the resource at: etblast.org

eTBLAST results are linked to the full abstract and other tools, of value to researchers and clinicians while writing, reviewing or studying. Screenshot callouts: some related tools and private access areas for clients; some post-processors that analyze the returned ‘hits’; raw self-similarity score of the query; most similar record; raw similarity score between the query and this record.

Although eTBLAST was created as a tool to help us better access the literature (reference finding and such), it has been applied to study ethics, a rather unique direction for the lab.

Motives for Scientific Misconduct can drive inappropriate behavior. Funding and career pressures of the contemporary research environment. Inadequate institutional oversight. Inappropriate forms of collaborative arrangements between academic scientists and commercial firms. Inadequate training in the methods and traditions of science. The increasing scale and complexity of the research environment, leading to the erosion of peer review, mentorship, and educational processes in science. The possibility that misconduct in science is an expression of a broader social pattern of deviation from traditional norms. National Academy of Sciences 1992

Violations of Accepted Practices have been estimated by surveys.
Faking research data: 0.3%
Plagiarism: 1.4%
Multiple publications of the same data: 4.7%
Removing data: 6%
Inappropriate inclusion of authors: 10%
Changed a study design: 15%
Inadequate record keeping: 27.5%
Anonymous questionnaire, sent to 8,000 with 3,234 respondents (Martinson et al. 2005)

Similarity is obvious. If you are going to cheat, don’t be so lazy: change the title and abstract more.

Are duplicate abstracts representative of duplicate articles? 85% of the text in the duplicate is present in the original. The remaining 15% consists of 3 paragraphs taken verbatim from PMID , PMID Less than 1% of the text in the duplicate article cannot be found in a different article. The later article does not cite the earlier one; 50% of references are shared with the original. The article was originally published in an American journal and then republished 28 years later in a Thai journal by a different author. The later article was retracted after the questionnaire.
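Percentages like the ones in this case can be illustrated with a small sketch. The slide does not say how the figures were measured, so the following assumes the simplest possible approach: exact sentence matching after normalization for text reuse, and the later article's reference list as the denominator for reference sharing. All function names are hypothetical.

```python
import re


def split_sentences(text):
    """Naive sentence splitter: break after ., ! or ? followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def normalize(sentence):
    """Lowercase and drop punctuation so trivial edits do not hide a match."""
    return " ".join(re.findall(r"[a-z0-9]+", sentence.lower()))


def fraction_of_text_reused(later_text, earlier_text):
    """Share of the later text's words that sit in sentences found in the earlier text."""
    earlier = {normalize(s) for s in split_sentences(earlier_text)}
    total_words = 0
    reused_words = 0
    for sentence in split_sentences(later_text):
        norm = normalize(sentence)
        n_words = len(norm.split())
        total_words += n_words
        if norm in earlier:
            reused_words += n_words
    return reused_words / total_words if total_words else 0.0


def fraction_of_references_shared(later_refs, earlier_refs):
    """Share of the later article's references that also appear in the earlier one."""
    later = set(later_refs)
    if not later:
        return 0.0
    return len(later & set(earlier_refs)) / len(later)
```

Exact sentence matching understates reuse when sentences are lightly reworded, so a real analysis would likely need a fuzzier comparison.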

In the cut-and-paste internet age, is this behavior getting better or worse?

There are multiple offenders

A Case of Multiple Duplicate Publication – M Shahrudin

So, how do authors and editors respond to evidence of possible plagiarism? For duplicates with different authors (potentially plagiarized articles), once the full text is inspected and a high degree of similarity persists beyond the citation, we send an e-mail to all the stakeholders (authors and editors of both papers) asking questions like:
Were you aware of the later article?
Is there an explanation for the similarity?
Was the earlier article copyrighted?
Were permissions requested/given to re-use the material?
We also include a copy of both articles, with the later article annotated with areas of “high similarity”.

There were sufficient responses to make statistical observations
Unverified entries in Déjà vu with no overlapping authors: 7,947
Manually validated duplicates with no overlapping authors: 206
Average full text similarity: 86%
Average reference overlap: 73%
Pairs with at least one similar table/figure: 72%
Questionnaires sent: 162
Overall response rate: 90.8%
  –Authors of earlier article: 55.9%
  –Authors of later article: 39.8%
  –Editors of journal publishing earlier article: 59.1%
  –Editors of journal publishing later article: 9.4%
Average response delay following initial contact: 8.3 days
  –Authors of earlier article: 5.4 days
  –Authors of later article: 10.8 days
  –Editors of journal publishing earlier article: 8.7 days
  –Editors of journal publishing later article: 9.8 days
Total investigations initiated, including retractions: 90+
Retractions: 50
Average time from initial response to retraction decision: 20.8 days

There were surprises:
93% of authors were unaware that they had been duplicated
26% of duplicate authors denied wrongdoing
35% admitted and apologized
16% were from co-authors claiming no involvement in the writing of the manuscript
13% were not aware that they were ‘authors’

A sampling of responses from the stakeholders, with 100s more in the Science Supplement

Authors of earlier article:
“Imitation is the sincerest form of flattery?”
“[My] major concern is that false data will lead to changes in surgical practice regarding procedures.”

Authors of later article:
“I would like to offer my apology to the authors of the original paper for not seeking the permission for using some part of their paper. I was not aware of the fact I am required to take such permission.”
“There are probably only "x" amount of word combinations that could lead to "y" amount of statements. … I have no idea why the pieces are similar, except that I am sure I do not have a good enough memory and it is certainly not photographic, to have allowed me to have "copied" his piece.... I did in fact review it [the original article] for whatever journal it was published in.”
“It was a joke, a bad game, an unconscious bet between friends, ten years ago that such things could happened. I deeply regret.” [Author has 6 entries in Déjà vu, and is VP of the national ethics committee of his country]
“At that time we had 2 medical students writing our results up into paper format. They may have been somewhat keen on using the [original] paper as a model to write our paper - resulting in very similar text. This unfortunately may be the result of some inexperience in paper writing which myself and [other author] were not aware of at the time.”

Editors of journal publishing earlier article:
“It's my understanding that copying someone else's description virtually word-for-word, as these authors have done, is considered a compliment to the person whose words were copied.”

Editors of journal publishing later article:
“Believe me, the data in any paper is the responsibility of the authors and not the journal.”

Déjà vu access statistics confirm interest in publishing ethics and the impact of different media. Chart labels: Bioinformatics Paper; Nature Commentary; MSNBC; Nature News- Whole Paper Plagiarism; Nature News- Iran VP Retraction; Le Monde; Der Spiegel; Nature News- Harvard Retraction; ??; Science Paper ( , WOW)

And what are we analyzing next? We wish to quantify the amount of full text similarity within and across articles. This has been applied to a sampling of PubMed Central:
–Number of full text articles (PMID): 34,878
–Number of sections (subsections): 440,824
–Ave. num of (sub)sections per article: 12.6
Figure labels: introduction, methods, results/discussion, all sub-sections; hits; eTBLAST; section title (introduction, methods, results).
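The per-article average quoted on this slide follows directly from the two counts above it. A quick check, using a hypothetical helper name:

```python
def average_sections_per_article(num_sections, num_articles):
    """Mean number of (sub)sections per full-text article."""
    return num_sections / num_articles


# Counts taken from the slide: 440,824 (sub)sections across 34,878 articles.
average = average_sections_per_article(440_824, 34_878)
print(f"{average:.1f}")  # prints 12.6
```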

Within the same article, the introduction-discussion and results-discussion section pairs show the most similarity.

Similarity among different articles is dominated by similarity in methods sections

Altogether, our data has led us to ponder such questions as:
1) Are there too many journals?
2) Are there too many journals indexed in Medline (and therefore receive equal access in response to searches)?
3) What are the criteria for being indexed in Medline, or better yet, what would it take to be removed?
4) Are there too many review articles?
5) What is the value of authors assigning copyright to journals if journals do not enforce them?
6) Is the pressure to publish really distorting the real purpose of publication?
7) How has open access affected these behaviors?
8) Should a new class of publication be created called an ‘update’ where additional material can be contributed by an author to a previous publication, while still getting credit for the advance without having to restate and republish a large fraction of something previous?
9) What is the general response to all involved, and what actions are likely, justified, fair and relevant?
10) Are publication behaviors linked to other ethically questionable behaviors?
11) What constitutes a retraction?
and most important
12) How often does a clinician unknowingly base a patient diagnosis or therapy upon a plagiarized or otherwise questionable paper, and how does this affect patient care?

Thanks for your attention. We generate a lot of data; is anybody interested in a collaboration?

What we instruct our students about proper use of reference materials: Anytime reference materials (text, figures, tables), whether from books, manuscripts, magazines or web pages, are used or paraphrased in a new written work, they must be cited in a very obvious and relevant way. If any content in the materials (important phrases, sentences, larger sections of text) is used completely or in part, it should be enclosed in quotes and also cited in a very obvious and relevant way. In addition to these publication ethics norms, many sources also copyright the material (text, figures, tables). To directly use the material, permissions from those holding the copyright may also be required. For many professional journals, the threshold for requiring permission is when more than 250 words or any figures, images, or tables are used. There are no absolute standards established, but these are considered the minimum requirements for ethical writing and publication.

Method sections have the highest number of similar texts among all sections, using a similarity ratio of 0.4 as the threshold.
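The thresholding step can be sketched as follows. The slides do not define the similarity ratio, so this assumes it is the raw similarity score between the query and a record divided by the query's raw self-similarity score (the two scores shown in the eTBLAST results slide). The raw score here is a toy shared-word count, not eTBLAST's scoring, and all names are hypothetical.

```python
import re
from collections import Counter


def raw_score(query, record):
    """Toy raw similarity score: number of word occurrences the two texts share."""
    q = Counter(re.findall(r"[a-z]+", query.lower()))
    r = Counter(re.findall(r"[a-z]+", record.lower()))
    return sum((q & r).values())


def similarity_ratio(query, record):
    """Raw score against the record divided by the query's self-similarity score."""
    self_score = raw_score(query, query)
    return raw_score(query, record) / self_score if self_score else 0.0


def count_similar_by_section(pairs, threshold=0.4):
    """Count (section, query, record) triples whose ratio meets the threshold.

    Treating the threshold as inclusive (>=) is an assumption.
    """
    counts = Counter()
    for section, query, record in pairs:
        if similarity_ratio(query, record) >= threshold:
            counts[section] += 1
    return counts
```

Dividing by the self-similarity score puts long and short queries on the same 0 to 1 scale, which is what makes a single cutoff such as 0.4 usable across sections of different lengths.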