Speeding Science Solutions for Data Curation from Microsoft (Research) Lee Dirks Director, Education & Scholarly Communication External Research Division.


1 Speeding Science Solutions for Data Curation from Microsoft (Research) Lee Dirks Director, Education & Scholarly Communication External Research Division Microsoft Corporation

2 Division within Microsoft Research focused on partnerships between academia, industry and government to advance computer science, education, and research in fields that rely heavily upon advanced computing
– Supporting groundbreaking research to help advance human potential and the wellbeing of our planet
– Developing advanced technologies and services to support every stage of the research process
Microsoft External Research is committed to interoperability and to providing open access, open tools, and open technology

3 Mission: Optimize and extend Microsoft software to meet the specific needs of the academic community
Our approach: Conduct applied projects to enhance academic productivity by evolving Microsoft's scholarly communication offerings
Microsoft External Research is uniquely positioned to drive this initiative across Microsoft

4 This work is licensed under a Creative Commons Attribution 3.0 United States License.
The Scholarly Communication Lifecycle
Stages: Data Collection, Research & Analysis; Authoring; Publication & Dissemination; Storage, Archiving & Preservation; Collaboration; Discoverability
Technologies shown: SharePoint, LiveMeeting, Office Live, Office OpenXML, XPS Format, SQL Server & Entity Framework, Rights Management, Data Protection Manager, Office 2010 (Word, PowerPoint, Excel, OneNote), Tablet PC/UMPC, Word, PowerPoint 2010, WPF & Silverlight, Sea Dragon / PhotoSynth / Deep Zoom, Excel 2010, Windows Server HPC, Astoria / Pop Fly, FAST, MSR Academic Search, Bookweb, SharePoint 2010

5 Interoperability is essential
– Actively lobby and drive for consensus around technical standards and standardized protocols proactively adopted by the community; enable broad community engagement
– Customers have told Microsoft that interoperability is OUR responsibility
Leverage existing community protocols, practices, guidelines, etc.
– Example: metadata conventions / taxonomies / ontologies, a traditional strength for libraries and a critical component in enabling Web 2.0
Optimize for data-driven research
– To both data (scientific) and to information (scholarly publications)
– Reproducible research + computational science
– Properly document / annotate scholarly output
Data preservation (and provenance) should be baseline
– Documentation of the data's provenance
– Preservation needs to be like accessibility features, i.e., assumed as required
Semantic knowledge discovery & social networking
– Harnessing collective intelligence must be a consideration, since accessing research is a core step in the life-cycle
Enable knowledge discovery
– Optimize for Web 2.0 scenarios and allow end-users/experts to find things more easily

6 Open Science, Open Access, Open Source, Open Data
In order to help catalyze and facilitate the growth of advanced CI, a critical component is the adoption of open access policy for data, publications and software.
NSF Advisory Committee on Cyberinfrastructure (ACCI)
Microsoft Interoperability Principles:
– Open Connections to Microsoft Products
– Support for Standards
– Data Portability
– Open Engagement

7 DataCite is an international consortium to establish easier access to scientific research data on the Internet, increase acceptance of research data as legitimate, citable contributions to the scientific record, and support data archiving that will permit results to be verified and repurposed for future study.
The Open Planets Foundation has been established to provide practical solutions and expertise in digital preservation, building on the €15 million investment made by the European Union and Planets consortium. OPF members benefit from the Planets results, new developments and the growing OPF community that includes experts at some of the most prestigious research, technology and memory institutions in Europe.
The Confederation of Open Access Repositories (COAR) is a not-for-profit association of repository initiatives launched in October. It aims to enhance the visibility and application of research outputs through global networks of Open Access digital repositories.
The Coalition for Networked Information (CNI) is an organization dedicated to supporting the transformative promise of networked information technology for the advancement of scholarly communication and the enrichment of intellectual productivity. Membership includes some 200 institutions representing higher education, publishing, network and telecommunications, information technology, and libraries and library organizations.
ICSTI, the International Council for Scientific and Technical Information, offers a unique forum for interaction between organizations that create, disseminate and use scientific and technical information. ICSTI's mission cuts across scientific and technical disciplines, as well as international borders, to give member organizations the benefit of a truly global community.
CrossRef is a not-for-profit membership association whose mission is to enable easy identification and use of trustworthy electronic content by promoting the cooperative development and application of a sustainable infrastructure. CrossRef's general purpose is to promote the development and cooperative use of new and innovative technologies to speed and facilitate scholarly research.

9 Source code and binary:
Services: Connects to GenePattern database
Data: Resulting data (and provenance) stored within Word document
Data: Control and execute query pipelines into GenePattern
Relationships: Inline graphics are synchronized to dataset

10 Intent: Insert Creative Commons licenses from within Office 2007
Relationships: License information stored as RDF XML within the OOXML document
Source code and binary:
Services: Integrates with Creative Commons Web API to create new licenses
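To make the idea of license information stored as RDF XML more concrete, here is a minimal Python sketch that builds a small RDF/XML license statement. It is an illustration only: the slide does not say how the add-in lays out its RDF or where in the OOXML package it is stored, so the function name `license_rdf`, the example work URI, and the choice of the Creative Commons `cc:license` property are all assumptions.

```python
import xml.etree.ElementTree as ET

# Standard RDF namespace and the Creative Commons rights-expression namespace.
RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
CC_NS = "http://creativecommons.org/ns#"


def license_rdf(work_uri, license_uri):
    """Return an RDF/XML string stating that work_uri is under license_uri.

    Hypothetical helper for illustration; not the add-in's actual format.
    """
    ET.register_namespace("rdf", RDF_NS)
    ET.register_namespace("cc", CC_NS)
    root = ET.Element(f"{{{RDF_NS}}}RDF")
    # One rdf:Description per licensed work, identified by rdf:about.
    description = ET.SubElement(
        root, f"{{{RDF_NS}}}Description", {f"{{{RDF_NS}}}about": work_uri}
    )
    # cc:license points at the license deed via rdf:resource.
    ET.SubElement(
        description, f"{{{CC_NS}}}license", {f"{{{RDF_NS}}}resource": license_uri}
    )
    return ET.tostring(root, encoding="unicode")


# Example: an assumed document URI with the license used by this deck.
rdf_xml = license_rdf(
    "urn:example:my-document",
    "http://creativecommons.org/licenses/by/3.0/us/",
)
print(rdf_xml)
```

A statement like this could travel with the document as an embedded XML part, which is one plausible reading of "stored as RDF XML within the document"; the actual packaging used by the add-in may differ.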

11 Phil Bourne, Lynn Fink, John Wilbanks
Source code and binary:
Relationships: Ontology browser
Intent: Term recognition & disambiguation
Services: Ontology download web service

12 Binary (version 2.0):
Relationships: ORE Resource Map creation
Structure: Read, convert, and author NLM XML documents
Structure: Client-side XML validation
Services: Repository deposit via SWORD
Relationships: Citation lookup and reference management

14 Peter Murray-Rust, Joe Townsend, Jim Downing
Available soon:
Relationships: Navigate and link referenced chemistry
Data: Semantics stored in Chemical Markup Language
Intent: Recognizes chemical dictionary and ontology terms
Author/edit 1D and 2D chemistry. Change chemical layout styles.
Intelligence: Verifies validity of authored chemistry

15 Author, Execute and Monitor Workflows
Available now:
Organize collection of individual workflow activities
Compose and modify workflows via drag & drop canvas
View data products, performance metrics, and provenance data

17 The Windows Azure platform offers a flexible, familiar environment for developers to create cloud applications and services. With Windows Azure, you can shorten your time to market and adapt as demand for your service grows. Windows Azure offers a platform that is easily implemented alongside your current environment.
Offerings:
– Windows Azure: operating system as an online service
– Microsoft SQL Azure: fully relational cloud database solution
– Windows Azure platform AppFabric: connects cloud services and on-premises applications
– Microsoft Codename Dallas: information marketplace for data and web services

18 Microsoft "Dallas" is a service allowing developers and information workers to easily discover, purchase, and manage premium data subscriptions in the Windows Azure platform.
– Dallas is an information marketplace that brings data, imagery, and real-time web services from leading commercial data providers and authoritative public data sources together into a single location, under a unified provisioning and billing framework.
– Dallas APIs allow developers and information workers to consume this premium content with virtually any platform, application or business workflow.

19 Excel Calculation Services (ECS) is the "engine" of Excel Services that loads the workbook, calculates in full fidelity with Microsoft Office Excel 2007, refreshes external data, and maintains sessions.
Excel Web Access (EWA) is a Web Part that displays and enables interaction with the Microsoft Office Excel workbook in a browser by using Dynamic HTML (DHTML) and JavaScript, without the need to download ActiveX controls on your client computer. It can be connected to other Web Parts on dashboards and other Web Part Pages.
Excel Web Services (EWS) is a Web service hosted in Microsoft Office SharePoint Services that provides several methods that a developer can use as an application programming interface (API) to build custom applications based on the Excel workbook.
More: http://msdn.microsoft.com/en-us/library/ms aspx

20 What is it?
– The Open Data Protocol (OData) is a Web protocol for querying and updating data that provides a way to unlock your data and free it from silos that exist in applications today. OData does this by applying and building upon Web technologies such as HTTP, Atom Publishing Protocol (AtomPub) and JSON to provide access to information from a variety of applications, services, and stores. The protocol emerged from experiences implementing AtomPub clients and servers in a variety of products over the past several years.
– OData is being used to expose and access information from a variety of sources including, but not limited to, relational databases, file systems, content management systems and traditional Web sites.
– OData is consistent with the way the Web works: it makes a deep commitment to URIs for resource identification and commits to an HTTP-based, uniform interface for interacting with those resources (just like the Web). This commitment to core Web principles allows OData to enable a new level of data integration and interoperability across a broad range of clients, servers, services, and tools.
– OData is released under the Open Specification Promise to allow anyone to freely interoperate with OData implementations.
Find out more
Contact: Pablo Castro
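Because OData identifies resources by URI and queries them over plain HTTP, a query is essentially a URL with a few `$`-prefixed system query options. The Python sketch below composes such a URL. It is illustrative only: the service root `https://example.org/service.svc`, the entity set `Datasets`, the `Year` property, and the helper name `build_odata_url` are assumptions, not part of any service mentioned in the deck. The options used (`$filter`, `$orderby`, `$top`, `$format`) are standard OData query options.

```python
from urllib.parse import quote, urlencode


def build_odata_url(service_root, entity_set, filter_expr=None,
                    orderby=None, top=None, fmt=None):
    """Compose an OData query URL from common system query options.

    Hypothetical helper for illustration; it only builds the URL and
    does not contact any service.
    """
    options = {}
    if filter_expr is not None:
        options["$filter"] = filter_expr
    if orderby is not None:
        options["$orderby"] = orderby
    if top is not None:
        options["$top"] = str(top)
    if fmt is not None:
        options["$format"] = fmt

    # The resource itself is addressed by URI: <service root>/<entity set>.
    url = service_root.rstrip("/") + "/" + quote(entity_set)
    if options:
        # Keep "$" literal in option names; percent-encode spaces in values.
        url += "?" + urlencode(options, safe="$", quote_via=quote)
    return url


# Example: the first five entries from 2009 onward, requested as JSON.
example_url = build_odata_url(
    "https://example.org/service.svc",
    "Datasets",
    filter_expr="Year ge 2009",
    top=5,
    fmt="json",
)
print(example_url)
```

Issuing an HTTP GET against a URL of this shape, with any HTTP client, is all a consumer needs in order to read data, which is the sense in which the protocol presents a uniform interface.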

21 The Open Government Data Initiative (OGDI) is a cloud-based collection of software assets that enables publicly available government data to be easily accessible. Using open standards and application programming interfaces (API), developers and government agencies can retrieve the data programmatically for use in new and innovative online applications, or mash-ups that can help:
– Improve citizen services
– Enhance collaboration between government agencies and private organizations
– Increase government transparency
OGDI promotes the use of this data by capturing and publishing reusable software assets, patterns, and practices. The data repository already holds over 60 different government datasets that are readily available for use in new applications, and is continuously updated with additional government datasets.

22 In partnership with the California Digital Library's Curation Center
– In collaboration with Tricia Cruse & John Kunze
– Part of DataONE (an NSF DataNet Project)
PROPOSED

23 Proposed functionality under consideration:
– Support for versioning, so that revision history and the original raw data can be easily protected and recovered,
– Standardized date/time stamps so that researchers can easily determine when the data were created and last updated,
– A workbook builder allowing researchers to select from globally shared standardized layouts for capturing data,
– Ability to export metadata in a standard format (e.g., a DataCite citation or an EML document that describes the dataset(s) in a workbook) so that researchers can readily share their data,
– Ability to select from a globally shared vocabulary of terms for data descriptions (e.g., column names), and as needed to add new terms to the globally shared vocabulary, to enable wide collaboration between researchers,
– Ability to import term descriptions from the shared vocabulary and annotate them locally to refine their definitions as used in the dataset,
– Speed bumps to discourage use of macros and customizations that would impede interoperation of data imported from Excel into other applications, and
– Ability to deposit data and metadata directly into a data archive to enable compliance with funding agency requirements to preserve and publish research data.
PROPOSED
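Two of the proposed features, standardized date/time stamps and metadata export as a citation, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the deck does not specify a timestamp format or a citation layout, so the sketch assumes ISO 8601 timestamps in UTC and a "Creator (Year): Title. Publisher. Identifier" layout modeled on DataCite-style citations. The class `DatasetMetadata`, the functions `iso_timestamp` and `format_citation`, and all example values (including the DOI) are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class DatasetMetadata:
    """Descriptive metadata a workbook might carry (hypothetical fields)."""
    creators: list
    publication_year: int
    title: str
    publisher: str
    identifier: str
    created: datetime
    updated: datetime


def iso_timestamp(moment):
    """Render a timezone-aware datetime as an ISO 8601 UTC timestamp."""
    return moment.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


def format_citation(meta):
    """Format metadata as 'Creator (Year): Title. Publisher. Identifier'."""
    creators = "; ".join(meta.creators)
    return (f"{creators} ({meta.publication_year}): {meta.title}. "
            f"{meta.publisher}. {meta.identifier}")


# Made-up example values, for illustration only.
meta = DatasetMetadata(
    creators=["Doe, J.", "Roe, R."],
    publication_year=2010,
    title="Example Soil Samples",
    publisher="Example Data Center",
    identifier="doi:10.1234/example",
    created=datetime(2010, 3, 1, 9, 30, tzinfo=timezone.utc),
    updated=datetime(2010, 4, 15, 16, 0, tzinfo=timezone.utc),
)

print(format_citation(meta))
print("created:", iso_timestamp(meta.created))
print("updated:", iso_timestamp(meta.updated))
```

Normalizing timestamps to UTC is a design choice made here so that "created" and "last updated" values compare unambiguously across collaborators in different time zones; a real add-in might instead preserve the local offset.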

24 Lee Dirks, Director, Education & Scholarly Communication, Microsoft External Research
URL –
Facebook: Scholarly Communication at Microsoft

