Data Explorer - A data profiling tool

Agenda
• Introduction
• Existing System
• Limitations of Existing System
• Proposed Solution
• Project Scope
• Block Diagram
• Implementation
• Technology
• Hardware and Software Requirements
• Features and Benefits
• Future Enhancement

Introduction (1/2): Data Profiling
• Data profiling is the process of examining the data available in an existing data source (e.g. a database or a file) and collecting statistics and information about that data.
• Data profiling is an analysis of the candidate data sources for a data warehouse to clarify the structure, content, relationships and derivation rules of the data. Profiling helps to understand anomalies and to assess data quality, but also to discover, register, and assess enterprise metadata.
• The purpose of data profiling is both to validate metadata when it is available and to discover metadata when it is not.
• The result of the analysis is used both strategically, to determine the suitability of the candidate source systems and give the basis for an early go/no-go decision, and tactically, to identify problems for later solution design and to level sponsors' expectations.

Introduction (2/2): Purpose of Data Profiling
• Find out whether existing data can easily be used for other purposes.
• Improve the ability to search the data by tagging it with keywords or descriptions, or by assigning it to a category.
• Give metrics on data quality, including whether the data conforms to particular standards or patterns.
• Assess the risk involved in integrating data for new applications, including the challenges of joins.
• Assess whether metadata accurately describes the actual values in the source database.
• Understand data challenges early in any data-intensive project, so that late project surprises are avoided. Finding data problems late in the project can lead to delays and cost overruns.

Existing System
• Initially, data profiling was done by writing complicated SQL queries.
• This suits an analyst or user who knows how to write SQL queries.
• Many users do not know the proper syntax and format for writing SQL queries.
• To overcome this, data profiling tools were introduced.
• To some extent, these tools remove the need to write complex queries.
• However, the tools do not support all types of profiling activity.
• The user has to learn and understand how to use the tool.
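To illustrate the kind of hand-written query the first bullet refers to, here is a minimal sketch that profiles a single column. The table `customers`, its columns, and the sample rows are hypothetical, and Python's built-in sqlite3 module stands in for a database server, so the SQL is generic rather than specific to any product.

```python
import sqlite3

# Hypothetical sample data; an in-memory SQLite database stands in for a server.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER, city TEXT)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?)",
    [(1, "Pune"), (2, "Mumbai"), (3, None), (4, "Pune")],
)

# One query per column: total rows, NULL rows, and distinct non-NULL values.
manual_query = """
    SELECT COUNT(*)                                      AS total_rows,
           SUM(CASE WHEN city IS NULL THEN 1 ELSE 0 END) AS null_rows,
           COUNT(DISTINCT city)                          AS distinct_values
    FROM customers
"""
manual_result = conn.execute(manual_query).fetchone()
print(manual_result)
```

A query like this has to be repeated, with the column name changed, for every column of every table, which is the effort a profiling tool is meant to remove.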

Limitations of Existing System
SQL Queries
• Development time is high.
• The functionality must be understood before the queries can be developed.
• Results need to be exported to Excel or Notepad for analysis.
• Traditional approach
Existing Tools
• Complex user interface
• Limited functionality
• Setup and installation
• License cost
• Minimum server requirements

Proposed Solution
• Develop an application that performs all the types of profiling
• Easy-to-use interface
• Minimum system requirements
• Feature to export the profiling results to Excel
• Additional feature to indicate the data quality, i.e. a Data Quality Indicator
• Support for multiple databases such as Oracle 10g, Oracle 11g, MS SQL Server 2005, MS SQL Server 2008, MySQL etc.

Project Scope
• Keeping the timeline and other factors in mind, the project will currently support only MS SQL Server.
• The project will provide the following types of profiling:
  • Column Profiling
  • Empty Column Analysis
  • Null Rule Analysis
  • Constant Analysis
  • Frequency Analysis
  • Uniqueness Analysis
  • Primary/Composite Key Analysis

Architecture Diagram
[Figure: architecture diagram. Labels: Analysis Team, Management, Business Users, Data Explorer, Data Profiling, Central Metadata Repository, Capture Issues and Notes, MS SQL Server, Other Databases, Reporting]

Implementation
• The project will be implemented module by module.
• Each module will be developed and unit tested individually.
• After all the modules are complete and unit tested, they will be integrated and System Integration Testing will be performed.
• There will be separate modules for retrieving the databases from the server, retrieving the tables of a selected database, and retrieving the columns of a selected table.
• There will be a separate module for each type of profiling discussed.
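The table and column retrieval modules described above can be sketched as follows. This is only an illustration under stated assumptions: the function names `list_tables` and `list_columns` and the sample tables are hypothetical, and SQLite's `sqlite_master` catalog and `PRAGMA table_info` stand in for the server's metadata. On MS SQL Server the same information is exposed through views such as `INFORMATION_SCHEMA.TABLES` and `INFORMATION_SCHEMA.COLUMNS`.

```python
import sqlite3


def list_tables(conn):
    """Tables retrieval: return the names of the user tables."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    return [r[0] for r in rows]


def list_columns(conn, table):
    """Columns retrieval: return (column name, declared type) pairs."""
    # Quote the identifier because it is embedded in the statement text.
    quoted = '"' + table.replace('"', '""') + '"'
    # PRAGMA table_info rows are (cid, name, type, notnull, dflt_value, pk).
    rows = conn.execute(f"PRAGMA table_info({quoted})").fetchall()
    return [(r[1], r[2]) for r in rows]


# Hypothetical sample schema.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INTEGER, amount REAL)")
conn.execute("CREATE TABLE customers (id INTEGER, city TEXT)")

print(list_tables(conn))
print(list_columns(conn, "orders"))
```

Keeping each retrieval step in its own function mirrors the module-wise plan: each one can be unit tested against a small in-memory database before integration.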

Implementation - Profiling Details
• Column Profiling
  This will help in discovering the total number of records, the null percentage, the unique percentage, the minimum and maximum values in the column, the documented data type etc.
• Constant Analysis
  This will help in discovering the columns that have more than 0 and fewer than 4 distinct values.
• Null Rule Analysis
  This will help in finding all the columns in a table that contain 100% NULL values.
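The three analyses above can be sketched as follows. This is a minimal sketch, not the tool's implementation: the function names, the table `emp`, and its rows are hypothetical, sqlite3 stands in for MS SQL Server, and "unique percentage" is assumed here to mean distinct non-NULL values divided by total rows.

```python
import sqlite3


def q(name):
    """Quote an identifier so it can be embedded in SQL text."""
    return '"' + name.replace('"', '""') + '"'


def column_names(conn, table):
    return [r[1] for r in conn.execute(f"PRAGMA table_info({q(table)})")]


def profile_column(conn, table, column):
    """Column profiling: counts, NULL %, unique %, minimum and maximum."""
    c, t = q(column), q(table)
    total, nulls, distinct, lo, hi = conn.execute(
        f"SELECT COUNT(*), SUM(CASE WHEN {c} IS NULL THEN 1 ELSE 0 END), "
        f"COUNT(DISTINCT {c}), MIN({c}), MAX({c}) FROM {t}"
    ).fetchone()
    nulls = nulls or 0  # SUM yields NULL when the table is empty

    def pct(n):
        return 100.0 * n / total if total else 0.0

    return {"total": total, "nulls": nulls, "distinct": distinct,
            "null_pct": pct(nulls), "unique_pct": pct(distinct),
            "min": lo, "max": hi}


def constant_columns(conn, table):
    """Constant analysis: columns with more than 0 and fewer than 4 distinct values."""
    return [c for c in column_names(conn, table)
            if 0 < profile_column(conn, table, c)["distinct"] < 4]


def all_null_columns(conn, table):
    """Null rule analysis: columns in which every row is NULL."""
    result = []
    for c in column_names(conn, table):
        p = profile_column(conn, table, c)
        if p["total"] > 0 and p["nulls"] == p["total"]:
            result.append(c)
    return result


# Hypothetical sample data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (id INTEGER, status TEXT, fax TEXT)")
conn.executemany("INSERT INTO emp VALUES (?, ?, ?)",
                 [(1, "A", None), (2, "A", None), (3, "B", None), (4, "A", None)])

print(profile_column(conn, "emp", "id"))
print(constant_columns(conn, "emp"))
print(all_null_columns(conn, "emp"))
```

Note that `COUNT(DISTINCT ...)` ignores NULLs, so a fully NULL column has 0 distinct values and is reported by the null rule analysis rather than the constant analysis.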

Implementation - Profiling Details (continued)
• Unique Analysis
  This will help in finding all the columns in a table that are 100% unique.
• Primary Key / Composite Key Analysis
  This will help in finding the possible primary key or composite key columns, i.e. columns whose values, alone or in combination, are unique.
• Frequency Analysis
  This will help in finding the number of distinct values in a column and the number of times each value is repeated.
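A minimal sketch of these three analyses follows. The function names, the table `enrolment`, and its rows are hypothetical, sqlite3 stands in for MS SQL Server, and the key search is assumed to be a brute-force check of column combinations up to a small size, which is one possible approach rather than the tool's documented method.

```python
import sqlite3
from itertools import combinations


def q(name):
    """Quote an identifier so it can be embedded in SQL text."""
    return '"' + name.replace('"', '""') + '"'


def row_count(conn, table):
    return conn.execute(f"SELECT COUNT(*) FROM {q(table)}").fetchone()[0]


def unique_columns(conn, table, columns):
    """Unique analysis: columns whose values are all distinct and non-NULL."""
    total = row_count(conn, table)
    found = []
    for c in columns:
        distinct = conn.execute(
            f"SELECT COUNT(DISTINCT {q(c)}) FROM {q(table)}").fetchone()[0]
        if total and distinct == total:
            found.append(c)
    return found


def candidate_keys(conn, table, columns, max_size=2):
    """Key analysis: minimal column combinations with no duplicates or NULLs."""
    total = row_count(conn, table)
    keys = []
    for size in range(1, max_size + 1):
        for combo in combinations(columns, size):
            # A superset of an existing key is trivially unique; skip it.
            if any(set(k) <= set(combo) for k in keys):
                continue
            cols = ", ".join(q(c) for c in combo)
            not_null = " AND ".join(f"{q(c)} IS NOT NULL" for c in combo)
            distinct = conn.execute(
                f"SELECT COUNT(*) FROM "
                f"(SELECT DISTINCT {cols} FROM {q(table)} WHERE {not_null})"
            ).fetchone()[0]
            if total and distinct == total:
                keys.append(combo)
    return keys


def frequency(conn, table, column):
    """Frequency analysis: each distinct value with its occurrence count."""
    c = q(column)
    return conn.execute(
        f"SELECT {c}, COUNT(*) FROM {q(table)} "
        f"GROUP BY {c} ORDER BY COUNT(*) DESC, {c}"
    ).fetchall()


# Hypothetical sample data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE enrolment (student_id INTEGER, course TEXT, grade TEXT)")
conn.executemany("INSERT INTO enrolment VALUES (?, ?, ?)",
                 [(1, "DB", "A"), (1, "OS", "A"), (2, "DB", "A"), (3, "DB", "C")])
cols = ["student_id", "course", "grade"]

print(unique_columns(conn, "enrolment", cols))
print(candidate_keys(conn, "enrolment", cols))
print(frequency(conn, "enrolment", "grade"))
```

The number of combinations grows quickly with the number of columns, which is why this sketch caps the search with `max_size`.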

Technology
• Data Explorer will be developed on the .NET platform using C# as the coding language.
• .NET is Microsoft's platform for developing advanced and robust applications.
• .NET provides a wide range of library classes that ease the development effort, so more time can be spent on other activities.
• .NET is called a language-independent platform as it supports 4 native languages and 21 non-native languages.
• Native languages are those created by Microsoft, namely C#, VB.NET, J# and VC++.
• Non-native (non-Microsoft) languages supported include Perl, Ruby etc.

Hardware and Software Requirements (Data Explorer)
Hardware
• Intel Core 2 Duo processor or above
• 2 GB RAM
• 20 GB HDD
• Printer
• Router for Internet connection
Software
• Windows 2000 / Windows XP / Windows Vista / Windows 7
• Microsoft .NET Framework 3.5
• Microsoft Visual Studio 2008

Features
• Supports multiple databases like MS SQL Server, Oracle
• Different types of profiling:
  • Column Profiling
  • Constant Analysis
  • Unique Analysis
  • Null Rule Analysis
  • Frequency Analysis
  • Empty Column Analysis
  • Primary / Composite Key Analysis
• Quickly analyze and validate data issues

Benefits
• Quick discovery of data issues
• No more writing of queries to profile data
• Time efficient
• Shortens the implementation cycle of major projects
• Improves understanding of data for the users
• Helps discover business knowledge
• Improves data accuracy in corporate databases

Future Enhancement
• Data Explorer can be further extended to support unstructured or semi-structured data such as flat files and .csv files.
• It can also be extended to support other relational databases like MS Access, MySQL, Sybase etc.
• It can also be enhanced by including Data Quality reports on top of the Data Quality results.
• A mechanism can be added to store the profiling results so that they can be used or referred to later at any point in time.
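As a rough idea of what profiling a .csv file could look like, here is a minimal sketch using Python's standard csv module. The function name `profile_csv` and the sample content are hypothetical, and the sketch assumes that an empty field counts as a missing value, since a CSV file has no NULL of its own.

```python
import csv
import io


def profile_csv(text_stream):
    """Return per-column row, empty, and distinct counts for a CSV stream."""
    reader = csv.DictReader(text_stream)
    fields = reader.fieldnames or []  # no header means nothing to profile
    stats = {name: {"rows": 0, "empty": 0, "values": set()} for name in fields}
    for row in reader:
        for name in fields:
            value = row[name]
            stats[name]["rows"] += 1
            if value is None or value == "":
                # Assumption: an empty or absent field is treated as missing.
                stats[name]["empty"] += 1
            else:
                stats[name]["values"].add(value)
    return {name: {"rows": s["rows"], "empty": s["empty"],
                   "distinct": len(s["values"])}
            for name, s in stats.items()}


# Hypothetical sample file content held in memory.
sample = io.StringIO("id,city\n1,Pune\n2,\n3,Pune\n")
csv_profile = profile_csv(sample)
print(csv_profile)
```

All values read this way are strings, so minimum, maximum, and data type checks would need an extra type inference step that this sketch leaves out.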

Thank You