Working With Large Datasets in Corporate Settings Ed Bassin www.profsoft-health.com.



Background: About ProfSoft
Medical/pharma claims analysis software
  – Main uses are provider profiling, quality analysis
14 clients range from 15K to 2.6M members
  – Databases from 900K to 110M claim lines
Compete with Fortune 100 companies by stressing content, task-appropriate technology
Stata is the core of our product
  – 25,000 lines of ado files
  – Stata do file generators

Challenges We Face
Managing that quantity of data
End-users are not statisticians
  – Want point-and-click tools
  – Do not understand complicated techniques
Stata is largely unknown at our clients. SAS is the standard “heavy duty” data package.
Integrating Stata with the technology of corporate America

Why We Chose Stata
Performance
Relative ease of programming
Chose for analytic capabilities, not UI
I knew it reasonably well

Interfacing with databases
Create_table ado reads Stata structure and writes appropriate SQL to build, load, and index tables
  – Write delimited text files with DBMS/Copy
  – Call native DBMS tools to load gigabytes of data
  – Support Oracle, Microsoft SQL Server, MySQL
Execsql ado calls native DBMS tools to run SQL scripts
Process is fast, easy, invisible
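
The idea behind Create_table (read a dataset's structure, emit matching SQL) can be sketched in Python. This is a minimal illustration, not the ado file itself: the type mapping, the function names, and the `claims` example are all assumptions, and real column types differ between Oracle, Microsoft SQL Server, and MySQL.

```python
# Hypothetical mapping from Stata storage types to generic SQL column types.
# Each DBMS has its own preferred type names; adjust per dialect.
STATA_TO_SQL = {
    "byte": "SMALLINT",
    "int": "SMALLINT",
    "long": "INTEGER",
    "float": "REAL",
    "double": "DOUBLE PRECISION",
}


def sql_type(stata_type):
    """Translate one Stata storage type (e.g. 'long', 'str12') to a SQL type."""
    if stata_type.startswith("str"):
        return "VARCHAR({})".format(int(stata_type[3:]))
    return STATA_TO_SQL[stata_type]


def create_table_sql(table, variables, index_on=()):
    """Build CREATE TABLE and CREATE INDEX statements from (name, type) pairs."""
    columns = ",\n".join(
        "    {} {}".format(name, sql_type(stype)) for name, stype in variables
    )
    statements = ["CREATE TABLE {} (\n{}\n);".format(table, columns)]
    for column in index_on:
        statements.append(
            "CREATE INDEX idx_{0}_{1} ON {0} ({1});".format(table, column)
        )
    return statements


# Illustrative dataset structure (variable names are made up).
claims = [("member_id", "long"), ("paid_amt", "double"), ("dx_code", "str6")]
for statement in create_table_sql("claims", claims, index_on=["member_id"]):
    print(statement)
```

Loading the data is left out on purpose: the slide describes writing delimited text and handing it to each vendor's native bulk loader, which is specific to the DBMS.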

Web-Based, Point-and-click Stata
Use PHP to write do files
PHP executes Stata, calls do file
Stata writes HTML and closes
PHP page displays output
End-user doesn’t know Stata in background
Process can be both synchronous and asynchronous
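
A rough sketch of the "write a do file, then run Stata on it" step, shown in Python rather than PHP. The function names, the `job.do` file name, and the generated Stata commands are illustrative assumptions; the slides do not show the real do file generators. The command line uses Stata's Unix-style batch flag (`-b`), and the sketch only builds the command without running it.

```python
import os
import tempfile


def write_do_file(dataset, by_var, value_var, html_path, directory):
    """Write a small do-file that averages value_var within by_var.

    The Stata commands are placeholders for whatever a real report needs.
    """
    lines = [
        'use "{}", clear'.format(dataset),
        "collapse (mean) {}, by({})".format(value_var, by_var),
        '* a site-specific ado would write the result as HTML to "{}"'.format(html_path),
    ]
    path = os.path.join(directory, "job.do")
    with open(path, "w") as handle:
        handle.write("\n".join(lines) + "\n")
    return path


def stata_command(do_path, executable="stata"):
    """Return the argument list for a Unix-style batch run of the do-file."""
    return [executable, "-b", "do", do_path]


with tempfile.TemporaryDirectory() as workdir:
    do_path = write_do_file(
        "claims.dta", "provider_id", "paid_amt", "report.html", workdir
    )
    with open(do_path) as handle:
        print(handle.read())
    print(stata_command(do_path))
```

A web handler could pass that list to `subprocess.run` and wait for the HTML (synchronous), or to `subprocess.Popen` and poll for the output file later (asynchronous).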

Integrating Stata with Excel
Excel is everyday app for our users
Use Excel web queries to get to Stata
  – Build URL through forms or user actions
  – Two ways of getting Stata output to Excel
Store Stata output in DBMS
Run Stata jobs through PHP
  – Create HTML table & return results to Excel
  – Excel manipulates & formats Stata output
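
The two halves of the web-query round trip, building a URL from user choices and returning results as an HTML table that Excel can import, might look like the following. This is a sketch under assumptions: the host name, the `run_report.php` endpoint, and the parameter names are hypothetical.

```python
from html import escape
from urllib.parse import urlencode


def build_query_url(base_url, params):
    """Append form choices to the report URL as a query string."""
    return base_url + "?" + urlencode(sorted(params.items()))


def html_table(header, rows):
    """Render rows as a plain HTML table, escaping cell contents."""
    head = "".join("<th>{}</th>".format(escape(str(h))) for h in header)
    body = "".join(
        "<tr>{}</tr>".format(
            "".join("<td>{}</td>".format(escape(str(cell))) for cell in row)
        )
        for row in rows
    )
    return "<table><tr>{}</tr>{}</table>".format(head, body)


# Hypothetical endpoint and parameters.
url = build_query_url(
    "http://intranet.example/run_report.php",
    {"report": "profile", "year": 2003},
)
print(url)
print(html_table(["provider", "mean_paid"], [["A & B", 12.5]]))
```

Keeping the returned table free of styling leaves formatting to Excel, which matches the division of labor described on the slide.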

What Works Well
Analytic flexibility
Performance
Calling Stata from a web server is easy
Getting Stata datasets to HTML
Integration with DBMS systems
Hiding Stata from end-users

Lessons Learned
Segment data as much as possible
  – Be prepared to write special programs to run routine statistical procedures
  – Stata statistical programs work with raw data, not aggregated data
If missing data is not an issue, write your own egen or collapse routines
Automate memory setting by examining structure of dataset you want to use
DBMS/Copy to handle reading/writing of large datasets
Version control with CVS
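
One way to automate the memory setting is to estimate the dataset's size from its row count and variable types, then generate the matching `set memory` line. The sketch below assumes the structure is already known; the byte widths are Stata's storage sizes for numeric and fixed-length string types, while the 1.5x headroom factor and the function names are arbitrary assumptions. Stata releases from 12 onward manage memory automatically, so this applies only to older versions.

```python
import math

# Bytes per value for Stata numeric storage types; strN uses N bytes.
TYPE_BYTES = {"byte": 1, "int": 2, "long": 4, "float": 4, "double": 8}


def var_width(stata_type):
    """Return the storage width in bytes of one Stata variable type."""
    if stata_type.startswith("str"):
        return int(stata_type[3:])
    return TYPE_BYTES[stata_type]


def memory_setting_mb(n_obs, var_types, headroom=1.5):
    """Estimate megabytes to allocate: rows x row width x headroom, rounded up."""
    row_bytes = sum(var_width(t) for t in var_types)
    needed = n_obs * row_bytes * headroom
    return max(1, math.ceil(needed / (1024 * 1024)))


# Illustrative structure: 900K claim lines with four variables.
mb = memory_setting_mb(900_000, ["long", "double", "str6", "int"])
print("set memory {}m".format(mb))
```

The generated line can be written at the top of a do file before the `use` command, so each job requests only what its dataset needs.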

Problems
Integration with other data formats
  – Infile, outfile are very slow for large datasets
  – DBMS/Analyst was not maintained for Stata 8
Limitations of merge command
Abbreviations drive us nuts
No IDE (integrated development environment)
Stata datasets aren’t indexed
Stata has no name in corporate America
Recruiting Stata programmers