Who Uses the Online Tobacco Industry Documents? Martha Michel 1,2, M.S., Ph.D. Lisa Bero 1,2, Ph.D. 1 Graduate Group in Biological and Medical Informatics,

1 Who Uses the Online Tobacco Industry Documents?
Martha Michel (1,2), M.S., Ph.D.; Lisa Bero (1,2), Ph.D.
(1) Graduate Group in Biological and Medical Informatics, UCSF
(2) Center for Tobacco Control Research and Education, UCSF

2 What are the Tobacco Industry Documents?
As a result of the Master Settlement Agreement, millions of internal tobacco industry documents were released onto the Internet (legacy.library.ucsf.edu).
The documents contain memos, scientific reports, faxes, emails, budgets, etc.
The documents include information about scientific research, manufacturing, marketing, advertising and sales of cigarettes, and more.

4 Example of an Internal Tobacco Industry Document

5 Document Collections
Legacy document depository at UCSF
  5 million documents
  About 32 million pages and growing
  1.5 terabytes
Guildford document depository
  8 million British American Tobacco documents
  About 32-40 million pages
  UCSF has 13,000 documents which have been manually indexed
Industry websites: PM, Lorillard, B&W, RJR
Other collections: Tobacco Documents Online, CDC tobacco industry documents

8 Difficulties of searching the documents
No OCR available for searching the full text
Variations in spelling and problems when names suddenly change
Duplicates
Vast quantities of information
No or varied indexing
Unknown recall and low precision
Source: Malone RE, Balbach ED. Tobacco industry documents: treasure trove or quagmire? Tobacco Control 2000;9(3):334-8.
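
For readers unfamiliar with the terms, the following is a minimal sketch of how precision and recall are usually computed for a single query. It is not taken from the presentation: the function name `precision_recall` and the document IDs are hypothetical, and the sketch assumes a complete set of relevant documents is known. For a collection of millions of unindexed pages that set cannot be enumerated, which is one reason recall is described as unknown.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one query.

    retrieved: document IDs returned by the search engine
    relevant:  document IDs judged relevant (hypothetical gold set;
               in practice unknown for the tobacco collections)
    """
    retrieved = set(retrieved)
    relevant = set(relevant)
    hits = retrieved & relevant  # relevant documents that were actually found
    # Precision: fraction of retrieved documents that are relevant.
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    # Recall: fraction of relevant documents that were retrieved.
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Made-up IDs, for illustration only.
retrieved = ["doc1", "doc2", "doc3", "doc4", "doc5"]
relevant = ["doc2", "doc5", "doc9", "doc10"]
p, r = precision_recall(retrieved, relevant)
print(f"precision={p:.2f} recall={r:.2f}")  # precision=0.40 recall=0.50
```

Low precision means a searcher must read many irrelevant hits; unknown recall means there is no way to tell how many relevant documents were missed.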

9 Prior Studies of Who Searches
Different types of groups used the paper document depositories (e.g. lawyers, government officials, researchers, tobacco control advocates, people in health-related fields).
We still don’t know who uses the electronic documents or why they search.
We are currently conducting an online survey on the UCSF Legacy website to examine the use of the existing websites and the barriers to searching them.

10 Aim 1: Conduct Online Survey
Purpose of survey:
  Who uses the documents (demographics)
  Purposes for which documents are used
  Barriers to searching the documents
  Suggestions for improving the archives

11 Methods
Developed and designed the survey using Web Surveyor
Conducted a pilot test of the survey (N=14)
Launched surveys in November 2002
2 surveys, one on TCA and one on Legacy:
  Tobacco Control Archives (n=50)
  Legacy Tobacco Control Documents Collection (n=22)

12 Results from Tobacco Control Archives Survey (n=50)

13 Who Uses the documents?

29 Text under “other”
I would like more structure on how to work the music sight. (4)
some stuff
its okay
telling about schools
direct assistance
links to student lead orgainization programs against tobacco
more categorization
I found your site useful. I had to fill out a worksheet from Health class and move along wite the website, but there a few things I could not find. Maybe it was the worksheet not the site, but overall your site helped me out. Thank you
love it
Why do I have do this survey its slowing me down?
This site is cool I think
The site is wonderful and very usefull. I want to congratulate the authors for the wonderful job.
I'd like more if smuggling was in better view.

31 Results from Legacy Survey (n=22)

40 Text from Other response
full text
manipulation of saved sets: more bookbag features
ability to search within retrieved set
more 'popular documents' type of links.
OCR
It would be great to have a quick way to search only for ads like Philip Morris offers from its advertising archives search engine
don't know
first visit, I don't know yet
fix the bookbag problem
quick search box right at top of home page bypassing other pages.
Display list of other bates in a set from which the one comes.
Nested searching
Slightly larger font
Full Boolean search capacity; more than six search term limit; feedback on when user errors are syntactical (as PM gives); not having to toggle back and forth between long and short displays; master ID numbers in the display; OCR capacity--not only would it be fantastic to be able to search the text of the document, it would be invaluable to be able to cut and paste text from the documents into a word processor.
A better search engine....you seem to have more documents than Tobacco Documents Online, but when I use the same search terms your search engine tells me it doesn't find any....when TDO finds over 100!

43 Text from Other response
print them out; download to pc.
If there are useful documents in a search, I print out the list, then download and print out the useful documents, numbering them with the number on the list. I file them chronologically by theme.
by date
by topic and correspondence
First by theme (subject) and then by organization/corporation, and/or date.
first visit, I don't know yet
I e-mail them to my eudora account and search it when I want a citation.
I look for a doc at your site, then go to pmdocs.com or the like, type the bates, and pull the description. Then I type the first bates from master file, and this way I get the set of documents with context. Sometime, the same document is in a few different sets! Then I get back to you to download it, or to cross reference with other collections (say, TI).
I'm not sure I understand the difference between this question and the previous one. I use Endnote, if that's the question.
I wish I had a consistent way. Can you conduct a seminar showing suggestions?
Prefer to collect paper documents and arrange them in files that mirror the files they originally came from and/or dates and or events within a date range

44 Who Uses the documents?

49 Aim 2: Add the British American Tobacco documents to the Flamenco interface

50 The Tobacco Flamenco
13,000 British American Tobacco documents have been “Flamencoized”.
More documents are to be indexed as they arrive from Guildford, England and the industry websites.

51 Tobacco Thesaurus
The thesaurus terms are controlled terminology, described on the UCSF website by broader terms, narrower terms, and related terms.
We created a hierarchy with 834 terms based on these relations.
The parent-child relationships are 7 levels deep.
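
As an illustration of how a hierarchy can be derived from broader-term relations, here is a minimal sketch. It is not the authors' implementation: the terms, the `broader` pair list, and the helper names `build_children` and `depth` are all hypothetical, and the sketch assumes each term has at most one broader term and that the relations contain no cycles. Related-term links are ignored because they do not define parent-child structure.

```python
from collections import defaultdict

# Hypothetical (term, broader_term) pairs; None marks a top-level term.
# These are NOT terms from the actual UCSF thesaurus.
broader = [
    ("tobacco industry", None),
    ("marketing", "tobacco industry"),
    ("advertising", "marketing"),
    ("billboards", "advertising"),
    ("research", "tobacco industry"),
]


def build_children(pairs):
    """Map each parent term to the list of its narrower (child) terms."""
    children = defaultdict(list)
    for term, parent in pairs:
        children[parent].append(term)
    return children


def depth(children, root=None):
    """Return the number of levels below `root` (0 if it has no children)."""
    kids = children.get(root, [])
    if not kids:
        return 0
    return 1 + max(depth(children, kid) for kid in kids)


children = build_children(broader)
print(len(broader))     # number of terms: 5
print(depth(children))  # levels in the hierarchy: 4
```

Applied to the real thesaurus, the same two measurements would correspond to the 834 terms and 7 levels reported on the slide.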

53 Conclusions
Many barriers to searching the documents exist.
Current searches are characterized by unknown recall and low precision, whether using TCA, Legacy or TDO.
There are different searching profiles for the people who use TCA vs. Legacy.

54 Upcoming Goals
Collect more survey data and write up results
Create a Flamenco server at home
Eventually conduct a usability trial of the modified Tobacco Flamenco
Work on additional BATCo documents

