Where’s My Data? Using MetriDoc to manage data integration headaches Joe Zucca– Tommy Barker –


The Problem
- The request seems simple but the solution is complex
- Generally asked “who did / used x?”, which leads to other questions
  - Where’s the data?
  - What’s the grain of the answer?
- So how do we answer these questions?
  - If lucky, run a script / query against a database and generate a report
  - If not lucky, build an application to answer the question
- This is what MetriDoc is built for

Current Solution - Datafarm
Datafarm = Crontab + Perl + CGI = Spaghetti
[Diagram: Datafarm at the center, connected to the data sources Voyager, Blackboard, COUNTER, DLA logs, Gate Count, Ezproxy, Penn Community, and Borrow Direct, and to the applications App 1, App 2, App 3, ... App n]

Datafarm Shortcomings
- Maintainability issues
- Not shareable
- Not reusable

MetriDoc = Datafarm 2.0
- As our system grew, we began creating MetriDoc to address Datafarm’s problems
  - Needed a scheduler that was more sophisticated than cron
  - Needed languages that were more maintainable than Perl
  - Needed integration tools to simplify data gathering across disparate systems
- We built prototypes and services to help us evaluate technologies
- Received a grant from IMLS to speed up development
- Hired another programmer

MetriDoc Philosophy
- Keep it simple
  - Sometimes a script is all you need
  - Ease of use is more important than performance
- Don’t recreate the wheel
- 100% open source
- Sharable data

MetriDoc – How it Works
- MetriDoc’s core is built around database schemas
- A MetriDoc implementation consists of loading tables and normalized tables
  - Loading tables prime the repository
    - The user is responsible for populating these tables
  - Normalized tables are built from the data in the loading tables
    - MetriDoc takes care of this
- Conforming to similar schemas provides interesting possibilities
  - Sharing data is easy
  - Sharing a single repository is easy (think Amazon Web Services)
  - Easier to collaborate
- From a user’s perspective
  - MetriDoc has tools to get your data into the loading tables
  - But ultimately you just need to get it in there, so you can use whatever tools you like
- Use the MetriDoc tools to manage your integration needs
  - Useful for getting, transforming / resolving, moving and loading data
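The split between loading tables and normalized tables can be sketched roughly as follows. This is a minimal illustration and not MetriDoc's actual schema: the table names (`loading_ezproxy`, `norm_patron`, `norm_ezproxy`), the columns, and the sample rows are all assumptions, and Python with SQLite is used only for brevity instead of MetriDoc's Groovy.

```python
import sqlite3

# Hypothetical schema: table and column names are illustrative only,
# not taken from MetriDoc.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE loading_ezproxy (patron_id TEXT, url TEXT);
CREATE TABLE norm_patron (id INTEGER PRIMARY KEY, patron_id TEXT UNIQUE);
CREATE TABLE norm_ezproxy (
    patron_key INTEGER REFERENCES norm_patron(id),
    url TEXT
);
""")

# Step 1: the user fills the loading table by whatever means is convenient.
conn.executemany(
    "INSERT INTO loading_ezproxy VALUES (?, ?)",
    [
        ("jsmith", "http://example.org/a"),
        ("jsmith", "http://example.org/b"),
        ("mjones", "http://example.org/a"),
    ],
)

# Step 2: normalized tables are derived from the loading table.
# Each distinct patron is stored once and events refer to it by key.
conn.execute(
    "INSERT OR IGNORE INTO norm_patron (patron_id) "
    "SELECT DISTINCT patron_id FROM loading_ezproxy"
)
conn.execute("""
INSERT INTO norm_ezproxy (patron_key, url)
SELECT p.id, l.url
FROM loading_ezproxy AS l
JOIN norm_patron AS p ON p.patron_id = l.patron_id
""")

patron_count = conn.execute("SELECT COUNT(*) FROM norm_patron").fetchone()[0]
event_count = conn.execute("SELECT COUNT(*) FROM norm_ezproxy").fetchone()[0]
print(patron_count, event_count)
```

Because the second step reads only from the loading table, it does not matter which tool was used to fill that table in the first place.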

MetriDoc – Core Technologies
- JVM
  - Java is used for infrastructure
  - Groovy is the primary language
- Master Scheduler
  - Essentially the brains of MetriDoc
  - Using Hudson for now
- Integration Tooling
  - Tooling built on top of Apache Camel
  - Helps move data from one place to another
  - Really helpful for batch processing
- Resolutions / Transformation Tools
  - Patron anonymization, text normalization, resource id to title resolutions, etc.
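As a rough idea of what resolution / transformation steps such as patron anonymization and text normalization might look like, here is a small sketch. It is not MetriDoc's implementation: the function names, the keyed-hash approach, and the `SECRET_KEY` value are assumptions made for illustration, and the code is Python, not Groovy.

```python
import hashlib
import hmac
import unicodedata

# Hypothetical key; a real deployment would load a private value from config.
SECRET_KEY = b"replace-with-a-local-secret"


def anonymize_patron(patron_id: str, key: bytes = SECRET_KEY) -> str:
    """Replace a patron id with a keyed hash (an assumed technique).

    The same id always maps to the same token, so usage can still be
    counted per patron without storing the id itself.
    """
    return hmac.new(key, patron_id.encode("utf-8"), hashlib.sha256).hexdigest()


def normalize_text(text: str) -> str:
    """Fold case and Unicode variants, then collapse runs of whitespace."""
    folded = unicodedata.normalize("NFKC", text).casefold()
    return " ".join(folded.split())


token = anonymize_patron("jsmith")
title = normalize_text("  The  Lancet\n")
```

A keyed hash is used here in place of a plain hash so that tokens cannot be reversed by hashing a list of known patron ids without the key.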

The MetriDoc Solution
MetriDoc = Hudson + Java / Groovy + Apache Camel = Integration Nirvana
Step 1 – Fill the loading tables
[Diagram: Hudson runs the Load Ezproxy, Load Patron Info, and Load Counter jobs, which move data from Voyager, Ezproxy, and COUNTER into the Loading Tables]

Loading Tables
Raw Ezproxy log line (fields separated by ||):
||Philadelphia||PA||United States||Default+datasets+documents+pwp+vanwert||jsmith||[19/Jan/2011:00:01: ]||GET|| 10X%2329%23266%232&_version=1&md5=8e47306a7f3a7da8a6fe7b521a7a149b||302||0|| cine&volume=29&issue=2&date= &atitle=An+adjuvanted+pandemic+influenza+H1N1+vaccine+provides+early+and+long+term+protection+in+health+care+workers.& spage=266&sid=EBSCO:aph&pid=Madhun%2c+Abdullah+S.%3bAkselsen%2c+Per+Espen%3bSjursen%2c+Haakon%3bPedersen%2c+Gabriel%3bSvindland%2c+Signe%3bN%c3%b8stbakken%2c+Jane+Kristin%3bNilsen%2c+Mona%3bMohn%2c+Kristin%3bJul-Larsen%2c+%c3%85sne%3bSmith%2c+Ingrid%3bMajor%2c+Diane%3bWood%2c+John%3bCox%2c+Rebecca+J aph||Mozilla/5.0 (Windows; U; Windows NT 6.0; en-US; rv: ) Gecko/ Firefox/3.0.5 (.NET CLR )]||Re07OuEIyQo8X6w||UPennLibrary=AAAAAUkQ36AAAFTaAwO7Ag==; __utma= ; __utmc= ; __utmz= utmcsr=google|utmccn=(organic)|utmcmd=organic|utmctr=upenn; WRUID=0; __utmv= |1=User-Type=Current%20Students=1,; __utma= ; __utmc= ; __utmz= utmcsr=google|utmccn=(organic)|utmcmd=organic|utmctr=upenn%20blackboard; hp=/vanpelt/; __utma= ; __utmc= ; __utmz= utmcsr=library.upenn.edu|utmccn=(referral)|utmcmd=referral|utmcct=/biomed/; proxySessionID= ; ezproxy=Re07OuEIyQo8X6w; ARPT=MWPYIPS108CWYL; EHost2=sid=49d81d dbd-b94f- __utmb= ; __utmb= ; __utmb= ; ASPSESSIONIDCCAQQCRC=AHJAGJMDDPNIIMLMHBCPCHBL
Loading table columns: Patron_id | Patron_ip | url | Ref_url | Proxy_id | Ezproxy_id
Example row: jsmith http://www…
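Getting a log line like the one on this slide into a loading table mostly comes down to splitting it on the `||` delimiter. The sketch below is an assumption-laden illustration: the field order in `FIELDS` is guessed from the visible part of the sample line, the `+`-separated group list is an interpretation, and the sample line (including its timestamp and URL) is made up, not real data. It is written in Python for brevity.

```python
from urllib.parse import unquote_plus

# Hypothetical field order, inferred from the slide's sample line.
# A real Ezproxy LogFormat may differ.
FIELDS = [
    "city", "state", "country", "groups", "patron_id",
    "timestamp", "method", "url", "status", "bytes",
]


def parse_ezproxy_line(line: str) -> dict:
    """Split one '||'-delimited log line into named fields."""
    parts = line.rstrip("\n").split("||")
    if len(parts) < len(FIELDS):
        raise ValueError(f"expected at least {len(FIELDS)} fields, got {len(parts)}")
    record = dict(zip(FIELDS, parts))
    # Assumption: groups are '+'-separated, as form-encoded spaces.
    record["groups"] = unquote_plus(record["groups"]).split()
    record["status"] = int(record["status"])
    return record


# Made-up sample line in the same shape as the slide's example.
sample = (
    "Philadelphia||PA||United States||Default+datasets||jsmith||"
    "[19/Jan/2011:00:01:00 -0500]||GET||http://example.org/article||302||0"
)
rec = parse_ezproxy_line(sample)
print(rec["patron_id"], rec["status"], rec["groups"])
```

Rows parsed this way could then be inserted into the loading table as-is, leaving any cleanup to the normalization step.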

The MetriDoc Solution
MetriDoc = Hudson + Java / Groovy + Apache Camel = Integration Nirvana
Step 2 – Populate the normalized tables
[Diagram: Hudson runs the Normalize Ezproxy, Normalize Patron Info, and Normalize Counter jobs, which move data from the Loading Tables into the Repository]

Jenkins – Death to Cron
- Generally used for building software, but a fantastic cron replacement
- Can run arbitrary scripts locally and remotely
- Supports master / slave distribution model seamlessly
- Can be managed entirely via REST
- Extensible
- Helps with job dependencies
- It is simple and free
- Active community with a huge collection of plugins
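The job-dependency point is the main thing plain cron cannot express: a normalize job should only run after the load job it depends on. The sketch below illustrates that ordering idea using Python's standard `graphlib` module (Python 3.9+). It does not use any Jenkins or Hudson API, and the dependency map is an assumption built from the job names on the earlier slides.

```python
from graphlib import TopologicalSorter

# Hypothetical dependency map: each job lists the jobs that must finish first.
# The job names come from the slides; the dependencies are assumed.
JOB_DEPENDENCIES = {
    "Load Ezproxy": set(),
    "Load Patron Info": set(),
    "Load Counter": set(),
    "Normalize Patron Info": {"Load Patron Info"},
    "Normalize Ezproxy": {"Load Ezproxy", "Normalize Patron Info"},
    "Normalize Counter": {"Load Counter"},
}


def run_order(dependencies: dict) -> list:
    """Return one valid execution order that respects every dependency."""
    return list(TopologicalSorter(dependencies).static_order())


order = run_order(JOB_DEPENDENCIES)
for job in order:
    print(job)
```

With cron, the same effect has to be approximated by spacing out start times and hoping each upstream job finishes in time; a scheduler that knows the graph can start each job as soon as its prerequisites succeed.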

A Little Groovy

The Metridoc Job Framework

Metrics on the Cheap

Where we are….