Building Global Science Collaboratories
VIVO 2012 Conference Workshop, August 22, 1:00 pm – 4:30 pm
4/29/2015 · DRAFT August 13, 2012
Workshop Faculty
- Anil Srivastava, President, Open Health Systems Laboratory (OHSL), co-located at Johns Hopkins University Montgomery County Campus, Shady Grove, MD, USA
- Paul Courtney, Project Manager, Dana-Farber Cancer Institute, Boston, MA, USA
- Ajai Kumar / Hemant Darbari / Swati Mehta / Vivek Koul, Centre for Development of Advanced Computing (C-DAC), Pune, India
- Rubayi Srivastava, Project Manager, Open Health Systems Laboratory (OHSL), CA, USA
- Juliusz Pukacki, Poznan Supercomputing and Networking Center (PSNC), Poznan, Poland
Agenda
1:05 Anil: Introductions
1:15 Anil: Background and Overview
1:50 Paul: Bootstrapping the global collaboratory (Methods)
2:30 – 2:45 Break
2:45 CDAC: Techniques and experiences in extracting data & transforming it for VIVO (Results)
3:30 Juliusz: Role of VIVO, Semantic Web and Linked Open Data in advancing global science collaboration & enabling collaboratories (Discussion)
4:15 Anil: Future work/discussant
4:30 Workshop ends
Faculty “assignments”
Anil
- Provide context, history, mission & vision of OHSL
- Which programs & projects concern OHSL, and how VIVO fits into the portfolio
Paul
- Provide the vision for developing the Global Cancer Collaboratory and where it is going
- How this effort is connected with other informatics initiatives; historical context of caBIG and the NCI-NCRI informatics collaborations
- How is this different from simply putting up a VIVO instance at the OHSL campus in Shady Grove, MD?
- Incubating and nurturing connectivity across international boundaries requires a different “business model” than putting up an institutional VIVO site.
- Aggregating information (early web model) and the role of imperfect data (and its relationship to Tim Berners-Lee’s Linked Data model)
Rubayi
- Challenges of providing project management support for an international program spanning 12+ time zones, including logistical and knowledge management support across multiple platforms and differing levels of technological infrastructure
CDAC
- Technical challenges of obtaining the same required information from multiple sites
- Examples of what was easily available at some sites, what was difficult, and how the challenges were addressed
Juliusz
- Semantic Web and Linked Data
Anil
- Background
- Current Indo-US collaboration projects underway
- Indo-US Cancer Research Grid
Research Networking Systems (RNS)
“…support individual researchers’ efforts to form and maintain optimal collaborative relationships for conducting productive research within a specific context.” (1)
Criteria:
- Involve shared 2-way interests
- Ongoing, sporadic interaction
- Creation of joint work products
(1) Schleyer T, Butler BS, Song M and Spallek H. 2012. Conceptualizing and advancing research networking systems. ACM Trans. Comput.-Hum. Interact. 19, 1, Article 2 (March 2012), 26 pages.
Research Networking Systems (RNS) (1)
Within institutions:
- VIVO
- Harvard Catalyst
- Stanford CAP
Across institutions:
- Distributed Interoperable Research Experts Collaboration Tool (DIRECT), a federated search tool that leverages the within-institution tools
- ResearchGate
- Epernicus
- Academia.edu
- BioMed Experts (Elsevier)
- Elsevier SciVal® Experts
- Nature Network
(1) Schleyer T, Butler BS, Song M and Spallek H. 2012. Conceptualizing and advancing research networking systems. ACM Trans. Comput.-Hum. Interact. 19, 1, Article 2 (March 2012), 26 pages.
Research Networking System Models
Global Cancer Collaboratory (GCC) as RNS
1. Support individual researchers’ efforts to form and maintain optimal collaborative relationships: GCC will use VIVO as a tool to capture and store researcher information aggregated from cancer centers in India and the United States.
2. For conducting productive research: GCC will be a repository for papers written and for presentations and workshops produced.
3. Within a specific context: GCC focuses on supporting international collaborations in cancer research.
Global Cancer Collaboratory (GCC) as RNS
Information Aggregation in India
GCC Framework
- Socio-technical approach
- Bootstrap by starting as an information aggregator
- VIVO
- Using a combination of manual and automated methods, as a matter of necessity, to pull in information from Indian cancer centers as well as from US cancer centers
- Imperfect data and missing data are expected
- OHSL, in partnership with CDAC, has established a VIVO environment [http://cdac-ohsl-vivo.cdac.in/vivo] as a core piece of a Research Networking System to serve both countries, with a view to fostering the creation of team science consortia.
- Discovered/developed tools to ease the process of extracting information from existing web sites
- Confluence wiki for document and mind sharing
- Other logistical efforts (Rubayi later)
- Awareness of cultural, organizational & working style differences is critical
Model: Early Internet Portals
GCC Goals
- Demonstrate the efficacy of VIVO as an efficient means of discovering potential international collaboration partners.
- Develop criteria & a roadmap for researcher information, to encourage institutional websites to be semantically compliant using shared ontologies.
- Establish metrics to assess the effectiveness of our methods.
GCC Activities
Tasks:
- Standardization of data and terminology across cancer centers
- Explore sources of data for researchers; lowest-hanging-fruit model
- Explore sources of publication data, including the IndMED and medIND repositories
To date:
- Sent SugarCRM profiles to CDAC for ingestion into VIVO
- Semi-automatically & manually extracted data from cancer sites in India and the US
- Addressed legal concerns raised by our partners in India about web-scraping information from cancer center websites and repackaging it for this project
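The ingestion step above can be pictured as turning each extracted profile into RDF statements that a VIVO instance could load. The following is only a minimal sketch under stated assumptions, not the pipeline CDAC used: the `http://example.org/...` URIs and the `GCC_NS` property namespace are hypothetical placeholders, and only `rdf:type`, `rdfs:label` and `foaf:Person` are standard vocabulary terms. Missing fields are simply skipped, in keeping with the expectation of imperfect data.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
FOAF_PERSON = "http://xmlns.com/foaf/0.1/Person"
# Hypothetical namespace for fields that have not been mapped to an ontology term.
GCC_NS = "http://example.org/gcc#"


def escape_literal(text):
    """Escape backslashes, double quotes and newlines for an N-Triples literal."""
    return text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def profile_to_ntriples(subject_uri, profile):
    """Serialise one profile dict as N-Triples, skipping empty or missing values."""
    lines = [f"<{subject_uri}> <{RDF_TYPE}> <{FOAF_PERSON}> ."]
    for field, value in profile.items():
        if not value:
            continue  # imperfect data: leave gaps rather than invent values
        predicate = RDFS_LABEL if field == "name" else GCC_NS + field
        lines.append(f'<{subject_uri}> <{predicate}> "{escape_literal(value)}" .')
    return "\n".join(lines)


# Field values taken from the HCG Oncology example in this deck; the URI is made up.
profile = {
    "name": "Dr Sanjay Mishra",
    "specialisation": "Radiation Oncology",
    "location": "Hubli",
    "email": None,
}
print(profile_to_ntriples("http://example.org/individual/n1", profile))
```

Plain literals under a placeholder namespace keep the sketch short; a real ingest would map each field to terms from the VIVO ontology instead.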
Explicit steps
How to link RNSs together?
- National network using Direct2Experts
- What about international networks?
GCC Future Work
To be done:
- Add in publications: a PubMed search with cancer as MeSH Major Topic and India as place of publication [PL] over the last decade returns 4844 articles
- Investigate use of the IndMED and medIND databases of publications in India
- Establish metrics to assess the effectiveness of our methods to:
  - Increase awareness of the potential for international collaboration
  - Increase awareness of the role of institutions in exposing researcher data that will benefit funding & research opportunities
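The publication search described above could be issued programmatically through the NCBI E-utilities `esearch` endpoint. The sketch below only builds the request URL and makes no network call. The exact query behind the slide is not given, so the term is an assumption: it uses the MeSH heading "Neoplasms" for "cancer" and an illustrative 2002 to 2012 date range for "the last decade", and it is not claimed to reproduce the 4844 figure.

```python
from urllib.parse import urlencode

ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"

# Assumed query: cancer as a MeSH Major Topic, India as place of publication,
# and an illustrative ten-year publication-date window.
TERM = "neoplasms[MeSH Major Topic] AND India[PL] AND 2002:2012[DP]"


def build_esearch_url(term, retmax=0):
    """Build a PubMed esearch URL; retmax=0 asks for the hit count without IDs."""
    params = {"db": "pubmed", "term": term, "retmax": retmax, "retmode": "json"}
    return ESEARCH + "?" + urlencode(params)


print(build_esearch_url(TERM))
```

Fetching the URL and reading the count from the response is left out so that the example stays runnable offline.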
Rubayi
- Logistical challenges
- Communication
- Collaboration tools
CDAC
Technical Challenges & Lessons Learned
- Data extraction and conditioning
- Ontology for each cancer center
DFCI
Full Name, Specialization, Department, Interests
Fred Hutch
Full Name, Designation/Appointment, Division, Interests, Phone, Email, Fax
HCGOncology Cancer Center
Only one profile can be accessed at a time
Doctor Profile in HCGOncology Cancer Center
Fig: Doctor’s Profile in HCGOncology
Doctor Profile in HCGOncology Cancer Center
Fig: Doctor’s Profile in HCGOncology
Name: Dr Sanjay Mishra
Qualification: M.D. (RT)
Specialisation: Radiation Oncology
Location: Hubli
Data structure: class="txtblue" is the label; class="txtcont" is the content
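The label/content convention noted above lends itself to a small parser that pairs each `txtblue` element with the `txtcont` element that follows it. This is a minimal sketch using Python's standard `html.parser`, not the extraction tool CDAC built. The table markup in `SAMPLE` is a hypothetical reconstruction (only the two class names and the field values come from the slide), and the parser assumes each cell holds plain text with no nested tags.

```python
from html.parser import HTMLParser


class LabelContentParser(HTMLParser):
    """Pair each class="txtblue" label with the next class="txtcont" content."""

    def __init__(self):
        super().__init__()
        self.fields = {}       # label -> content
        self._current = None   # "label", "content", or None when not capturing
        self._label = None     # most recent label still waiting for its content
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if "txtblue" in classes:
            self._current, self._buffer = "label", []
        elif "txtcont" in classes:
            self._current, self._buffer = "content", []

    def handle_data(self, data):
        if self._current:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if self._current is None:
            return
        text = " ".join("".join(self._buffer).split())  # collapse whitespace
        if self._current == "label":
            self._label = text.rstrip(":").strip()
        elif self._label:
            self.fields[self._label] = text
            self._label = None
        self._current = None


def parse_profile(html):
    parser = LabelContentParser()
    parser.feed(html)
    return parser.fields


# Hypothetical markup; the real page structure may differ.
SAMPLE = """
<table>
  <tr><td class="txtblue">Name:</td><td class="txtcont">Dr Sanjay Mishra</td></tr>
  <tr><td class="txtblue">Qualification:</td><td class="txtcont">M.D. (RT)</td></tr>
  <tr><td class="txtblue">Specialisation:</td><td class="txtcont">Radiation Oncology</td></tr>
  <tr><td class="txtblue">Location:</td><td class="txtcont">Hubli</td></tr>
</table>
"""

print(parse_profile(SAMPLE))
```

Because each profile sits on its own page, a loop over the profile URLs would call `parse_profile` once per page.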
Doctor Profile in HCGOncology Cancer Center
Name: Dr. N.K.Vinod; Qualification: AD, PDCCA; Specialization: Anesthesiologist; Location: Bangalore
Name: Dr. Prabha Seshachar; Qualification: MBBS, DA; Specialization: Anesthesiologist; Location: Bangalore
Name: Dr. H.C.Rajesh; Qualification: MD; Specialization: Anesthesiologist; Years of Experience: 16 yrs
Name: Dr. Gaurav Dwivedi; Qualification: MBBS, MD; Specialization: Anesthesiologist; Location: Delhi
Name: Dr. Kshirod Kumar Acharya; Qualification: MBBS, MS; Specialization: Anesthesiologist; Location: Cuttack
Name: Dr. Ganesh Nayak; Qualification: MS; Specialization: Cardio Thoracic Surgery; Location: Bangalore
Name: Dr. B C Bommaiah; Qualification: MD; Specialization: Cardiologist; Location: Bangalore
Name: Dr. Kshitish Ch. Mishra; Qualification: MBBS, MD; Specialization: Clinical Oncology; Location: Cuttack
Structure of Data for Profile in HCGOncology Cancer Center
phpThumb.php?src=uploads/doctors_images/4f840e5181aa2.png&
Name: Dr Sanjay Mishra
Qualification: M.D. (RT)
Specialisation: Radiation Oncology
Location: Hubli
……..
Data on the HCG Oncology site is held in embedded (nested) tables. Every profile is on a separate page, so the data and page structure are difficult to retrieve with DEiXTo. CDAC has developed an extraction tool to get the data from this site.
Researcher Profile in Dana-Farber Cancer Institute
Fig: Researcher’s Profile in Dana-Farber
Researcher Profile in Dana-Farber Cancer Institute
Fig: Researcher’s Profile in Dana-Farber
A
Gregory A. Abel, MD, MPH
Medical Oncologist, Hematologic Oncology
Clinical Interest: Leukemia, Myelodysplastic syndromes, Myeloproliferative disorders
Structure of Data for Profile in Dana-Farber Cancer Center
="/directory/profile.asp?pgt=Gregory+A%2E+Abel%2C+MD%2C+MPH
Gregory A. Abel, MD, MPH
Medical Oncologist, Hematologic Oncology
Clinical Interest: Leukemia, Myelodysplastic syndromes, Myeloproliferative disorders
....
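The profile link shown above carries the researcher's display name in its `pgt` query parameter, URL-encoded. As a small illustration (not part of the original workflow), the standard library can decode it. Splitting the decoded string on commas to separate the name from the degrees is an assumption about how these display names are formatted.

```python
from urllib.parse import urlsplit, parse_qs


def name_from_profile_link(href):
    """Return the decoded value of the pgt parameter, or None if it is absent."""
    values = parse_qs(urlsplit(href).query).get("pgt", [])
    return values[0] if values else None


def split_name_and_degrees(display_name):
    """Assume 'Name, DEGREE, DEGREE': the first comma-separated part is the name."""
    parts = [part.strip() for part in display_name.split(",")]
    return parts[0], parts[1:]


href = "/directory/profile.asp?pgt=Gregory+A%2E+Abel%2C+MD%2C+MPH"
display_name = name_from_profile_link(href)
print(display_name)                          # Gregory A. Abel, MD, MPH
print(split_name_and_degrees(display_name))  # ('Gregory A. Abel', ['MD', 'MPH'])
```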
Structure of Data for Profile in Dana-Farber Cancer Center
A
Gregory A. Abel, MD, MPH
Medical Oncologist, Hematologic Oncology
Clinical Interest: Leukemia, Myelodysplastic syndromes, Myeloproliferative disorders
Janet L. Abrahm, MD
Palliative Medicine Physician, Palliative Care (Adult)
Clinical Interests: Palliative medicine, Symptom management, End-of-life care
Observation on Structure of Profile Data Present in Dana-Farber Cancer Center
At the Dana-Farber Cancer Institute, profile data is organized in a structured form, so it can be extracted with the DEiXTo tool.
DEiXTo (or ΔEiXTo) is a powerful web data extraction tool that is based on the W3C Document Object Model (DOM). It allows users to create highly accurate “extraction rules” (wrappers) that describe what pieces of data to scrape from a website.
Structure of Profile Data Present in TATA Memorial Hospital
Kailash S. Sharma
M.D., D.A. (Anesthesiology)
../../images/anaesthesia/drsharma.jpg
Designation: Director Academics TMC
Area of Work: Anaesthesia
Special Interests: Difficult Airway Monitoring Cancer Pain
Email: email@example.com mailto:firstname.lastname@example.org
Phone No.: (+9122) 24177044
Observation on Structure of Profile Data Present in TATA Memorial Hospital
- The profile page data structure is not uniform.
- Profiles contain insufficient data.
- The format in which profiles are presented is not uniformly structured.
- Data was extracted manually.