Visualization and analysis of clusters in large populations of fraud cases.

The fraud: Textiles, mainly from Asian countries, are declared at as little as one tenth of their real value, which evades ad valorem duties.
Declared value: $0.20 x 30,000 units = $6,000; duty at 12% = $720.
Actual value: $2.00 x 30,000 units = $60,000; duty at 12% = $7,200.
During the investigation, sources speak of as many as 40,000 containers.
[Figure: scanned original invoice]
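The duty arithmetic on this slide can be reproduced with a short sketch. The function name `ad_valorem_duty` is an assumption made for illustration; the unit prices, quantity and 12% rate are the figures quoted above.

```python
from decimal import Decimal

def ad_valorem_duty(unit_price, units, rate):
    """Duty owed on a consignment: customs value times the ad valorem rate."""
    return Decimal(unit_price) * units * Decimal(rate)

# Figures from the slide: 30,000 units at a 12% duty rate.
declared_duty = ad_valorem_duty("0.20", 30_000, "0.12")  # undervalued declaration
actual_duty = ad_valorem_duty("2.00", 30_000, "0.12")    # real value
evaded_duty = actual_duty - declared_duty

print(f"Declared duty: ${declared_duty:,.2f}")
print(f"Actual duty:   ${actual_duty:,.2f}")
print(f"Evaded duty:   ${evaded_duty:,.2f}")
```

`Decimal` is used here only to keep the currency arithmetic exact; declaring one tenth of the value leaves nine tenths of the duty unpaid.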

The task: Identify all potential cases of undervaluation in the 27 EU Member States using the declared average unit prices. Create an overview of clusters of cases in order to: identify shifts between EU Member States; evaluate the effectiveness of countermeasures taken by Member States; identify any organized behaviour across Member States.

The problem: Several statistical methods (e.g. the t-test) could identify these cases as outliers, but the fraud appears to be so widespread that it has biased the declared average unit prices over time.
Example: Unit prices for imported T-shirts, by EU-27 Member State.
Workbook: TableauPresentation.tbw / Worksheet: Unit-Prices EU27
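Why widespread fraud defeats outlier tests can be illustrated with a small sketch. It uses a plain z-score as a simple stand-in for the outlier tests mentioned above, and every price in it is synthetic and chosen only for illustration: when undervalued declarations are rare they sit far from the mean, but when they dominate they pull the mean toward themselves and no longer look unusual.

```python
from statistics import mean, pstdev

def z_score(value, population):
    """Distance of a value from the population mean, in standard deviations."""
    return (value - mean(population)) / pstdev(population)

# Synthetic unit prices (assumed, not from the data set):
# honest declarations around 2.0, undervalued declarations at 0.2.
honest_cycle = [1.8, 2.0, 2.2]
rare_fraud = honest_cycle * 32 + [0.2] * 4         # 4% of declarations undervalued
widespread_fraud = honest_cycle * 13 + [0.2] * 61  # 61% of declarations undervalued

z_rare = z_score(0.2, rare_fraud)
z_widespread = z_score(0.2, widespread_fraud)

print(f"z-score of an undervalued price, rare fraud:       {z_rare:.2f}")
print(f"z-score of an undervalued price, widespread fraud: {z_widespread:.2f}")
```

In the first population the undervalued price lies more than three standard deviations below the mean; in the second it lies within one, so a test based on the declared import prices alone would not flag it.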

Comparison between unit prices declared at export in China and at import in the Czech Republic.
Solution: Chinese export unit prices seem to be reliable.
Workbook: TableauPresentation.tbw / Worksheet: Comparison Unit Prices CN MS05

Description of the data set:
Contains import declaration data for four chapters of the Combined Nomenclature covering textile goods: 61, 62, 63 and 64.
Date aggregation level: day.
Starts from January 2007; last update May 2009.
More than 6M records.
Possibility of daily updates (monitoring function).
Contains: Product Codes, Customs Procedures, Member State, Third Country, Volume, Statistical Value and Average Unit Value.

Analytical approach:
Columns: Period | Cust. Proc. | Product Code | Origin | Destination | Volume | Statistical Value | Average Unit Price | Chinese Export Avg Minimum Value | Difference in PCT
Sample record: 31/01/ | | | CN | AT | 7968,78 | 3832,41 | 0,36 | 17,53 | 2,05
The 'lowest acceptable unit value' is calculated as the average minimum value minus the standard deviation over time (as declared at Chinese export). The difference between this value and the declared unit value is then calculated as a percentage.
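The calculation described above might be sketched as follows. This is one reading of the slide, not the presenters' implementation: it assumes the 'average minimum value' is the mean of per-period minimum export unit prices and that the standard deviation is the sample standard deviation of those minima. The function names and the list `export_minimums` are hypothetical. The percentage is taken as the declared value divided by the reference value, which is consistent with the sample record's last three figures (0,36, 17,53 and 2,05).

```python
from statistics import mean, stdev

def lowest_acceptable_unit_value(period_minimums):
    """Mean of the per-period minimum export unit prices minus their
    sample standard deviation (an assumed reading of the slide)."""
    return mean(period_minimums) - stdev(period_minimums)

def diff_pct(declared_unit_value, reference_value):
    """Declared import unit value as a percentage of the reference value."""
    return declared_unit_value / reference_value * 100

# Hypothetical minimum export unit prices for one product over five periods.
export_minimums = [2.1, 1.9, 2.0, 2.2, 1.8]
threshold = lowest_acceptable_unit_value(export_minimums)

# Check against the sample record: 0,36 declared versus 17,53 reference.
sample_pct = round(diff_pct(0.36, 17.53), 2)

print(f"Lowest acceptable unit value: {threshold:.4f}")
print(f"Sample record percentage: {sample_pct}")
```

A record whose percentage falls below 100 is declared below the reference value and becomes a candidate case of undervaluation.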

Visualization:
PERIOD is set to 'All Values'.
The shape of the marks is set to Country of Origin.
The colour of the marks represents import values as a percentage of export values: red to pale brown means below 100%; pale brown to green means above 100%.
Fields shown: Period, Product Codes.
Workbook: TableauPresentation.tbw / Worksheet: Sample Import Database

Visualization: Filter on DIFF_PCT (Difference in %) set to < 100%.
Fields shown: Period, Product Codes.
Workbook: TableauPresentation.tbw / Worksheet: Sample Import Database Filtered

Visualization: Overview of all Member States.
Question 1: Can we identify clusters of cases?

Question 2: Can we see shifts between Member States?
Event: Member States were informed regarding fraud cases.
Event: Mission to CZ; risk profile is introduced.
Event: Risk profile in DE is adapted.
Workbook: TableauPresentation.tbw / Worksheet: Shift MS05 – MS06
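Outside Tableau, a shift of this kind could be looked for by counting suspicious records per period and Member State. The sketch below is illustrative only: the records are invented, and the tuple layout (period, Member State, DIFF_PCT) is an assumption rather than the structure of the actual data set.

```python
from collections import Counter

# Hypothetical records: (period, member_state, diff_pct).
records = [
    ("2008-01", "CZ", 15.0), ("2008-01", "CZ", 22.0),
    ("2008-01", "CZ", 140.0), ("2008-01", "DE", 120.0),
    ("2008-02", "CZ", 130.0), ("2008-02", "DE", 18.0),
    ("2008-02", "DE", 25.0), ("2008-02", "DE", 110.0),
]

# Count records below 100%, i.e. declared below the reference value.
flagged = Counter(
    (period, member_state)
    for period, member_state, diff_pct in records
    if diff_pct < 100
)

for period in sorted({p for p, _, _ in records}):
    row = {ms: flagged[(period, ms)] for ms in ("CZ", "DE")}
    print(period, row)
```

In this invented example the flagged cases disappear from CZ and appear in DE in the following period, which is the pattern the worksheet is meant to expose around the listed events.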

Question 2: Can we see shifts between Member States? Workbook: TableauPresentation.tbw / Worksheet: Product611211

Question 3: Can we identify organized behaviour?
Overview of all Member States combined, by Product Code.
Workbook: TableauPresentation.tbw / Worksheet: Overview Total

Other findings: Member States differ in the distribution of unit values per volume, an effect of differently designed risk profiles.
Workbook: TableauPresentation.tbw / Worksheet: Price per MS all Products