Building Global Science Collaboratories: VIVO 2012 Conference Workshop, August 22, 1:00 pm – 4:30 pm (DRAFT, August 13, 2012; slide date stamp 4/29/2015)


1 DRAFT: Building Global Science Collaboratories. VIVO 2012 Conference Workshop, August 22, 1:00 pm – 4:30 pm

2 Workshop Faculty
- Anil Srivastava, President, Open Health Systems Laboratory (OHSL), co-located at Johns Hopkins University Montgomery County Campus, Shady Grove, MD, USA
- Paul Courtney, Project Manager, Dana-Farber Cancer Institute, Boston, MA, USA
- Ajai Kumar, Hemant Darbari, Swati Mehta, and Vivek Koul, Centre for Development of Advanced Computing (C-DAC), Pune, India
- Rubayi Srivastava, Project Manager, Open Health Systems Laboratory (OHSL), CA, USA
- Juliusz Pukacki, Poznan Supercomputing and Networking Center (PSNC), Poznan, Poland

3 DRAFT Agenda
- 1:05 Anil: Introductions
- 1:15 Anil: Background and overview
- 1:50 Paul: Bootstrapping the global collaboratory (Methods)
- 2:30 – 2:45 Break
- 2:45 CDAC: Techniques and experiences in extracting data and transforming it for VIVO (Results)
- 3:30 Juliusz: Role of VIVO, Semantic Web, and Linked Open Data in advancing global science collaboration and enabling collaboratories (Discussion)
- 4:15 Anil: Future work/discussant
- 4:30 Workshop ends

4 Faculty “assignments”
- Anil
  - Provide context, history, mission, and vision of OHSL
  - Which programs and projects concern OHSL, and how VIVO fits into the portfolio
- Paul
  - Provide the vision for developing the Global Cancer Collaboratory and where it is going
  - How this effort is connected with other informatics initiatives; historical context of caBIG and the NCI-NCRI informatics collaborations
  - How is this different from simply putting up a VIVO instance at the OHSL campus in Shady Grove, MD? Incubating and nurturing connectivity across international boundaries requires a different “business model” than putting up an institutional VIVO site.
  - Aggregating information (early web model) and the role of imperfect data (and its relationship to Tim Berners-Lee’s Linked Data model)
- Rubayi
  - Challenges of providing project management support for an international program spanning 12+ time zones, including logistical and knowledge-management support across multiple platforms and differing levels of technological infrastructure
- CDAC
  - Technical challenges of obtaining the same required information from multiple sites
  - Examples of what was easily available at some sites, what was difficult, and how the challenges were addressed
- Juliusz
  - Semantic Web and Linked Data

5 Anil: Background
- Current Indo-US collaboration projects underway
- Indo-US Cancer Research Grid

6 Research Networking Systems (RNS)
“…support individual researchers’ efforts to form and maintain optimal collaborative relationships for conducting productive research within a specific context.” [1]
Criteria:
- Involve shared 2-way interests
- Ongoing, sporadic interaction
- Creation of joint work products
[1] Schleyer T, Butler BS, Song M, and Spallek H. 2012. Conceptualizing and advancing research networking systems. ACM Trans. Comput.-Hum. Interact. 19, 1, Article 2 (March 2012), 26 pages.

7 Research Networking Systems (RNS) [1]
Within institutions:
- VIVO
- Harvard Catalyst
- Stanford CAP
Across institutions:
- Distributed Interoperable Research Experts Collaboration Tool (DIRECT), a federated search tool that leverages the within-institution tools
- Research Gate
- Epernicus
- Academia.edu
- BioMed Experts (Elsevier)
- Elsevier SciVal® Experts
- Nature Network
[1] Schleyer T, Butler BS, Song M, and Spallek H. 2012. Conceptualizing and advancing research networking systems. ACM Trans. Comput.-Hum. Interact. 19, 1, Article 2 (March 2012), 26 pages.

8 Research Networking System Models

9 Global Cancer Collaboratory (GCC) as RNS
1. Support individual researchers’ efforts to form and maintain optimal collaborative relationships: GCC will use VIVO as a tool to capture and store researcher information aggregated from cancer centers in India and the United States.
2. For conducting productive research: GCC will be a repository for papers written and for presentations and workshops produced.
3. Within a specific context: GCC focuses on support of international collaborations in cancer research.

10 Global Cancer Collaboratory (GCC) as RNS

11 Information Aggregation in India

12 GCC Framework
- Socio-technical approach
- Bootstrap by starting as an information aggregator
- VIVO
  - Using a combination of manual and automated methods to pull in information from Indian cancer centers as well as from US cancer centers, as a matter of necessity
  - Imperfect data and missing data are expected
  - OHSL, in partnership with CDAC, has established a VIVO environment [http://cdac-ohsl-vivo.cdac.in/vivo] as a core piece of a Research Network System to serve both countries, with a view to fostering the creation of team science consortia
  - Discovered/developed tools to ease the process of information extraction from existing web sites
- Confluence wiki for document and mind sharing
- Other logistical efforts (Rubayi later)
- Awareness of cultural, organizational, and working style differences is critical

13 Model: Early Internet Portals

14 GCC Goals
- Demonstrate the efficacy of VIVO in providing an efficient means of discovering potential international collaboration partners.
- Develop criteria and a roadmap for researcher information to encourage institutional websites to be semantically compliant using shared ontologies.
- Establish metrics to assess the effectiveness of our methods.

15 GCC Activities
Tasks:
- Standardization of data and terminology across cancer centers
- Explore sources of data for researchers; lowest hanging fruit model
- Explore sources of publication data, with the IndMED and medIND repositories included
To date:
- Sent SugarCRM profiles to CDAC for ingestion into VIVO
- Semi-automatically and manually extracted data from cancer sites in India and the US
- Addressed legal concerns raised by our partners in India about web-scraping information from cancer center websites and repackaging it for this project

16 Explicit steps

17 How to link RNSs together?
- National network using Direct2Experts
- What about international networks?

18 GCC Future Work
To be done:
- Add in publications:
  - A PubMed search with cancer as MeSH Major Topic and India as place of publication [PL] over the last decade returns 4844 articles
  - Investigate use of the IndMED and medIND databases of publications in India
- Establish metrics to assess the effectiveness of our methods to:
  - Increase awareness of the potential for international collaboration
  - Increase awareness of the role of institutions in exposing researcher data that will benefit funding and research opportunities
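The PubMed search described on this slide can be sketched as a field-tagged query string. The sketch below only builds the search term and an NCBI E-utilities `esearch` URL; it does not contact the server. The year range 2002 to 2012 is an assumption standing in for "the last decade", and the helper names are hypothetical. The field tags `[MeSH Major Topic]`, `[PL]` (place of publication), and `[DP]` (date of publication) are standard PubMed search tags.

```python
from urllib.parse import urlencode

# NCBI E-utilities search endpoint.
ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_pubmed_term(topic, place, start_year, end_year):
    """Compose a PubMed search term from field-tagged clauses."""
    clauses = [
        f"{topic}[MeSH Major Topic]",    # topic must be a major MeSH topic
        f"{place}[PL]",                  # place of publication
        f"{start_year}:{end_year}[DP]",  # publication date range
    ]
    return " AND ".join(clauses)


def build_esearch_url(term):
    """Return an esearch URL that asks only for the number of hits."""
    params = {"db": "pubmed", "term": term, "rettype": "count"}
    return f"{ESEARCH}?{urlencode(params)}"


# Assumed date range; the slides say only "over the last decade".
term = build_pubmed_term("cancer", "India", 2002, 2012)
url = build_esearch_url(term)
print(term)
print(url)
```

Fetching the URL returns a small XML document with a count element; the number reported today will differ from the 4844 quoted in 2012 because the index keeps changing.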

19 Rubayi
- Logistical challenges
- Communication
- Collaboration tools

20 CDAC: Technical Challenges & Lessons Learned
- Data extraction and conditioning
- Ontology for each cancer center

21 DFCI: Full Name, Specialization, Department, Interests

22 Fred Hutch: Full Name, Designation/Appointment, Division, Interests, Phone, email, Fax
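Because each center exposes a different set of fields, one way to condition the data before ingest is a per-site mapping onto a shared record layout. The sketch below is illustrative only: the common field names, the decision to treat Department and Division as the same "unit" field, and the sample record are all assumptions, not the mapping CDAC used.

```python
# Shared record layout (hypothetical field names).
COMMON_FIELDS = (
    "full_name", "specialization", "designation", "unit",
    "interests", "phone", "email", "fax",
)

# Site label -> {label on the site: common field}.
# Treating Department and Division as one "unit" field is an assumption.
FIELD_MAPS = {
    "DFCI": {
        "Full Name": "full_name",
        "Specialization": "specialization",
        "Department": "unit",
        "Interests": "interests",
    },
    "Fred Hutch": {
        "Full Name": "full_name",
        "Designation/Appointment": "designation",
        "Division": "unit",
        "Interests": "interests",
        "Phone": "phone",
        "email": "email",
        "Fax": "fax",
    },
}


def normalize(site, record):
    """Map one scraped record onto the shared layout.

    Fields the site does not publish stay None, so missing data is
    explicit rather than silently dropped.
    """
    mapping = FIELD_MAPS[site]
    out = {field: None for field in COMMON_FIELDS}
    for label, value in record.items():
        key = mapping.get(label)
        if key is not None:
            out[key] = value.strip() or None
    out["source"] = site
    return out


# Made-up record for illustration; not taken from any site.
example = normalize("DFCI", {
    "Full Name": " Example Researcher ",
    "Department": "Example Department",
    "Interests": "",
})
print(example)
```

Keeping absent fields as `None` matches the framework's expectation that imperfect and missing data will occur.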

23 HCGOncology Cancer Center
Only one profile can be accessed at a time

24 Doctor Profile in HCGOncology Cancer Center. Fig: Doctor’s Profile in HCGOncology

25 Doctor Profile in HCGOncology Cancer Center. Fig: Doctor’s Profile in HCGOncology
Name: Dr Sanjay Mishra
Qualification: M.D. (RT)
Specialisation: Radiation Oncology
Location: Hubli
Data structure: class=“txtblue” is the label; class=“txtcont” is the content

26 Doctor Profile in HCGOncology Cancer Center
Name: Dr. N.K. Vinod; Qualification: AD, PDCCA; Specialization: Anesthesiologist; Location: Bangalore
Name: Dr. Prabha Seshachar; Qualification: MBBS, DA; Specialization: Anesthesiologist; Location: Bangalore
Name: Dr. H.C. Rajesh; Qualification: MD; Specialization: Anesthesiologist; Years of Experience: 16 yrs
Name: Dr. Gaurav Dwivedi; Qualification: MBBS, MD; Specialization: Anesthesiologist; Location: Delhi
Name: Dr. Kshirod Kumar Acharya; Qualification: MBBS, MS; Specialization: Anesthesiologist; Location: Cuttack
Name: Dr. Ganesh Nayak; Qualification: MS; Specialization: Cardio Thoracic Surgery; Location: Bangalore
Name: Dr. B C Bommaiah; Qualification: MD; Specialization: Cardiologist; Location: Bangalore
Name: Dr. Kshitish Ch. Mishra; Qualification: MBBS, MD; Specialization: Clinical Oncology; Location: Cuttack

27 Structure of Data for Profile in HCGOncology Cancer Center
Page source excerpt (markup stripped): phpThumb.php?src=uploads/doctors_images/4f840e5181aa2.png& Name: Dr Sanjay Mishra; Qualification: M.D. (RT); Specialisation: Radiation Oncology; Location: Hubli ...
- Data on the HCG Oncology site is stored in embedded tables.
- Every profile is on a separate page, so the structure of the data and pages is difficult to retrieve using DEiXTo.
- CDAC has developed an extraction tool to get the data from this site.
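The label/content convention noted for these pages (class `txtblue` for the label, class `txtcont` for the content) lends itself to a small event-driven parser. The following is a minimal sketch using only Python's standard `html.parser`; it is not CDAC's extraction tool, and the sample markup, including the use of `td` cells, is an assumption reconstructed from the slide's description.

```python
from html.parser import HTMLParser

# Void elements never get a closing tag, so they must not affect nesting depth.
VOID = {"br", "img", "hr", "input", "meta", "link"}


class ProfileParser(HTMLParser):
    """Pair each txtblue label with the txtcont value that follows it."""

    def __init__(self):
        super().__init__()
        self.fields = {}
        self._role = None   # "label", "content", or None when outside both
        self._depth = 0     # nesting depth inside the tracked element
        self._buf = []
        self._label = None  # most recent label still waiting for its value

    def handle_starttag(self, tag, attrs):
        if tag in VOID:
            return
        if self._role is not None:
            self._depth += 1
            return
        classes = (dict(attrs).get("class") or "").split()
        if "txtblue" in classes:
            self._role, self._depth, self._buf = "label", 1, []
        elif "txtcont" in classes:
            self._role, self._depth, self._buf = "content", 1, []

    def handle_endtag(self, tag):
        if tag in VOID or self._role is None:
            return
        self._depth -= 1
        if self._depth == 0:
            text = " ".join("".join(self._buf).split())
            if self._role == "label":
                self._label = text.rstrip(":").strip()
            elif self._label:
                self.fields[self._label] = text
                self._label = None
            self._role = None

    def handle_data(self, data):
        if self._role is not None:
            self._buf.append(data)


# Assumed markup; only the two class names come from the slides.
SAMPLE = """
<table>
  <tr><td class="txtblue">Name:</td><td class="txtcont">Dr Sanjay Mishra</td></tr>
  <tr><td class="txtblue">Qualification:</td><td class="txtcont">M.D. (RT)</td></tr>
  <tr><td class="txtblue">Specialisation:</td><td class="txtcont">Radiation<br> Oncology</td></tr>
  <tr><td class="txtblue">Location:</td><td class="txtcont"><b>Hubli</b></td></tr>
</table>
"""

parser = ProfileParser()
parser.feed(SAMPLE)
print(parser.fields)
```

Since each profile sits on its own page, a real run would feed one page at a time and collect the resulting dictionaries into rows.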

28 Researcher Profile in Dana Farber Cancer Institute. Fig: Researcher’s Profile in Dana Farber

29 Researcher Profile in Dana Farber Cancer Institute. Fig: Researcher’s Profile in Dana Farber
A
Gregory A. Abel, MD, MPH; Medical Oncologist, Hematologic Oncology; Clinical Interest: Leukemia, Myelodysplastic syndromes, Myeloproliferative disorders

30 Structure of Data for Profile in Dana Farber Cancer Center
="/directory/profile.asp?pgt=Gregory+A%2E+Abel%2C+MD%2C+MPH Gregory A. Abel, MD, MPH; Medical Oncologist, Hematologic Oncology; Clinical Interest: Leukemia, Myelodysplastic syndromes, Myeloproliferative disorders ...

31 Structure of Data for Profile in Dana Farber Cancer Center
A
Gregory A. Abel, MD, MPH; Medical Oncologist, Hematologic Oncology; Clinical Interest: Leukemia, Myelodysplastic syndromes, Myeloproliferative disorders
Janet L. Abrahm, MD; Palliative Medicine Physician, Palliative Care (Adult); Clinical Interests: Palliative medicine, Symptom management, End-of-life care

32 Observation on Structure of Profile Data Present in Dana Farber Cancer Center
- At Dana Farber Cancer Institute, profile data is present in a structured form, so it can be extracted with the DEiXTo tool.
- DEiXTo (or ΔEiXTo) is a powerful web data extraction tool based on the W3C Document Object Model (DOM). It allows users to create highly accurate “extraction rules” (wrappers) that describe what pieces of data to scrape from a website.

33 Structure of Profile Data Present in TATA Memorial Hospital
Kailash S. Sharma, M.D., D.A. (Anesthesiology)
Image: ../../images/anaesthesia/drsharma.jpg
Designation: Director Academics TMC
Area of Work: Anaesthesia
Special Interests: Difficult Airway Monitoring Cancer Pain
Email: rashmikailashsharma@yahoo.co.in
Phone No.: (+9122) 24177044

34 Observation on Structure of Profile Data Present in TATA Memorial Hospital
- Profile page data structure is not uniform.
- Profiles contain insufficient data.
- The format in which profiles are presented is not uniformly structured.
- Data was extracted manually.

35 References
1. http://www.dana-farber.org/
2. http://www.hcgoncology.com/
3. http://tmc.gov.in
4. http://en.wikipedia.org/wiki/Web_scraping
5. http://deixto.com/

36 Indo-US Cancer Collaboratory: A VIVO Pilot

37 Data Extraction from Website. Fig: Dana Farber Profiles. Data extracted from DFCI and represented in CSV format

38 Ontology Creation. Fig: Create New Ontology

39 Ontology Creation Success. Fig: New Ontology Created Successfully

40 Class Creation. Fig: Create New Class inside an Ontology

41 Create New Link to Super Class. Fig: Add a Super Class link to this class

42 Selection of Super Class

43 Create New Link to Super Class. Fig: Select a class as Super Class from the dropdown list

44 Create New Link to Super Class. Fig: Identify the Super Class link at cursor position

45 Create Data Property. Fig: Create a Data Property inside a class

46 Data Property Created. Fig: Data Property Created Successfully

47 Create Object Property. Fig: Create a New Object Property

48 Model Creation. Fig: Creation of Models to get the URIs

49 After Model Creation. Fig: Models Created Successfully

50 Convert CSV to RDF

51 Convert CSV to RDF. Fig: Convert CSV File to RDF

52 Ingested Data URIs
Diagram labels: Subject, Predicate, Object, Predicate, Object
Triple fragment (subject and predicate URIs not preserved in the transcript): a ; "Hematologic Oncology" ; "Edwin" ; "Edwin P. Alyea III,MD" ; "Stem cell/ bone marrow transplant, Leukemia" ; "Alyea" ; "P." ; "Dana Farber" ; "III,MD" ; "Medical Oncologist".
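The CSV-to-RDF step can be illustrated with a short standard-library sketch that turns each CSV row into one subject with one literal-valued triple per non-empty column, written as N-Triples. This is not the VIVO ingest tool shown in the screenshots. The two namespaces and the column names are hypothetical, because the real subject and predicate URIs did not survive in the transcript; only the literal values are taken from the slide.

```python
import csv
import io

# Hypothetical namespaces; the real URIs are not preserved in the slides.
BASE = "http://example.org/individual/"
VOCAB = "http://example.org/ontology#"


def escape_literal(value):
    """Escape characters that are not allowed raw in an N-Triples literal."""
    return (value.replace("\\", "\\\\")
                 .replace('"', '\\"')
                 .replace("\n", "\\n")
                 .replace("\r", "\\r"))


def csv_to_ntriples(text, id_prefix="n"):
    """Emit one triple per non-empty cell; each row becomes one subject.

    Column headers are used directly as local names, so they are assumed
    to be URI-safe (for example camelCase without spaces).
    """
    lines = []
    reader = csv.DictReader(io.StringIO(text))
    for index, row in enumerate(reader, start=1):
        subject = f"<{BASE}{id_prefix}{index}>"
        for column, value in row.items():
            if column is None:  # surplus cells beyond the header row
                continue
            value = (value or "").strip()
            if not value:       # missing data is simply not asserted
                continue
            lines.append(
                f'{subject} <{VOCAB}{column}> "{escape_literal(value)}" .'
            )
    return lines


# Column names are assumptions; the values come from slide 52.
SAMPLE = '''fullName,firstName,middleName,lastName,suffix,institution
"Edwin P. Alyea III,MD",Edwin,P.,Alyea,"III,MD",Dana Farber
'''

for triple in csv_to_ntriples(SAMPLE):
    print(triple)
```

Values that contain commas, such as the full name and the suffix, must be quoted in the CSV so that they stay in one cell.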

53 SPARQL Query. Fig: SPARQL Query to Construct data (query text not preserved in the transcript)
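The query on this slide is not recoverable, but the general shape of a CONSTRUCT query for this kind of ingest can be sketched: it reads the generic properties produced by the CSV conversion and rewrites them to ontology properties. The helper below only assembles the query text. The source namespace is hypothetical, and the FOAF and RDFS target terms are chosen for illustration rather than taken from the workshop.

```python
def build_construct_query(type_uri, required, optional):
    """Return a SPARQL CONSTRUCT query that renames ingest properties.

    required: (source_uri, target_uri) pair that every row must have.
    optional: dict of source_uri -> target_uri for columns that may be empty.
    """
    pairs = [required] + list(optional.items())
    template = [f"  ?s a <{type_uri}> ."]
    where = []
    for i, (source, target) in enumerate(pairs):
        template.append(f"  ?s <{target}> ?v{i} .")
        pattern = f"?s <{source}> ?v{i} ."
        # Only the first pattern is mandatory; the rest are OPTIONAL so a
        # missing value does not drop the whole profile from the result.
        where.append(f"  {pattern}" if i == 0 else f"  OPTIONAL {{ {pattern} }}")
    return "\n".join(["CONSTRUCT {", *template, "}", "WHERE {", *where, "}"])


SRC = "http://example.org/ontology#"  # hypothetical ingest namespace
FOAF = "http://xmlns.com/foaf/0.1/"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"

query = build_construct_query(
    FOAF + "Person",
    required=(SRC + "fullName", RDFS + "label"),
    optional={
        SRC + "firstName": FOAF + "firstName",
        SRC + "lastName": FOAF + "lastName",
    },
)
print(query)
```

In a CONSTRUCT template, a triple whose variable is unbound is skipped, so profiles with missing optional values still yield their remaining triples.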

54 Execute SPARQL Query. Fig: Execution of Constructed SPARQL Query

55 Execute SPARQL Query. Fig: SPARQL Query Executed Successfully

56 Upload RDF. Fig: Upload RDF File. Fig: RDF Successfully Uploaded

57 View Uploaded Profiles. Fig: Upload Profiles. Click to View Profile

58 Juliusz: Poznan Supercomputing and Networking Center
Semantic web and interoperability

59 Future work
Incorporate the medIND and IndMED biomedical journal databases, as well as PubMed, into VIVO.

