Finding Patterns with a Rotten Core: Data Mining for Crime Series with Cores Tong Wang, Cynthia Rudin, Daniel Wagner, and Rich Sevieri.

Slides:



Advertisements
Similar presentations
Critical Reading Strategies: Overview of Research Process
Advertisements

Han-na Yang Trace Clustering in Process Mining M. Song, C.W. Gunther, and W.M.P. van der Aalst.
Section B: Exam Questions
Introduction Section An Overview. Focusing a Topic.
Data Mining Cluster Analysis: Advanced Concepts and Algorithms
Model of factors at play in the perpetration of violence Introducing an interactive model for understanding violence against women, violence against children.
Dynamic Bayesian Networks (DBNs)
Chapter 30 CRC Press: Forensic Science, James and Nordby, 2nd Edition 1# Forensic Science An Introduction to Scientific and Investigative Techniques Stuart.
Chapter 30 CRC Press: Forensic Science, James and Nordby, 2nd Edition1 Chapter 30 Serial Offenders: Linking Cases by Modus Operandi and Signature.
Unit 2: SMELLS Molecular Structure and Properties
Crime Section, Central Statistics Office..  The Crime Section would like to acknowledge the assistance provided by the Probation Service in this project.
Designing a Continuum of Learning to Assess Mathematical Practice NCSM April, 2011.
Copyright © 2015, 2011, 2008 Pearson Education, Inc. Chapter 5, Unit E, Slide 1 Statistical Reasoning 5.
APPLICATIONS OF DIFFERENTIATION 4. So far, we have been concerned with some particular aspects of curve sketching:  Domain, range, and symmetry (Chapter.
Ian Oldfield, Metropolitan Police Service (New Scotland Yard), Interactive Offender Profiling System (IOPS) Ian Oldfield – Metropolitan Police Service.
COMP 3009 Introduction to AI Dr Eleni Mangina
Competitor Analysis.
Police Technology Chapter Twelve
Single Motif Charles Yan Spring Single Motif.
Creative Writing Revision
Microarray Gene Expression Data Analysis A.Venkatesh CBBL Functional Genomics Chapter: 07.
SYSTEM ANALYSIS AND DESIGN
Chapter 13: Inference in Regression
Understanding Data Analytics and Data Mining Introduction.
Motif Discovery in Protein Sequences using Messy De Bruijn Graph Mehmet Dalkilic and Rupali Patwardhan.
Person-Specific Domain Adaptation with Applications to Heterogeneous Face Recognition (HFR) Presenter: Yao-Hung Tsai Dept. of Electrical Engineering, NTU.
1 Book Cover Here Chapter 20 BURGLARY Criminal Investigation: A Method for Reconstructing the Past, 7 th Edition Copyright © 2014, Elsevier Inc. All Rights.
Chapter 2 Modeling and Finding Abnormal Nodes. How to define abnormal nodes ? One plausible answer is : –A node is abnormal if there are no or very few.
1 1 Slide Introduction to Data Mining and Business Intelligence.
PREDICTING THE 2013 SAINT LOUIS CITY HOMICIDE RATE SPENCER SCHNEIDENBACH SHAILESH LANJEWAR XUN ZHOU BEN HOLTMAN.
Knowledge Discovery and Data Mining Evgueni Smirnov.
Machine Learning CSE 681 CH2 - Supervised Learning.
Chapter 2 Research in Abnormal Psychology. Slide 2 Research in Abnormal Psychology  Clinical researchers face certain challenges that make their investigations.
Observation & Analysis. Observation Field Research In the fields of social science, psychology and medicine, amongst others, observational study is an.
INTERACTIVE ANALYSIS OF COMPUTER CRIMES PRESENTED FOR CS-689 ON 10/12/2000 BY NAGAKALYANA ESKALA.
Cluster Management Scorecard FITT (Fostering Interregional Exchange in ICT Technology Transfer)
Foundation of Forensics Modus Operandi Corpus Delicti.
Authors: Marius Pasca and Benjamin Van Durme Presented by Bonan Min Weakly-Supervised Acquisition of Open- Domain Classes and Class Attributes from Web.
How to observe hand hygiene practices among health-care workers Observers should carefully read the “Hand Hygiene Technical Reference Manual” before undergoing.
CS Inductive Bias1 Inductive Bias: How to generalize on novel data.
Examining Subjective Understanding of Participants and Outside Observers’ in Adolescent Romantic Couples’ Interactions Joseph W. Dickson 1 Melinda S. Harper.
Abcd Pricing a product for the first time Jo Scott 19th May 2003 Warwick University.
Computational Approaches for Biomarker Discovery SubbaLakshmiswetha Patchamatla.
Phenomena in Problem Solving Insight Problems Functional Fixedness Using external representations.
FBI Method of Profiling Violent Serial Offenders
Criminal Profiling. Psychological Profiling Involves investigating the offender’s BehaviorMotivesBackground As a guide for what direction the Investigation.
DeepDive Model Dongfang Xu Ph.D student, School of Information, University of Arizona Dec 13, 2015.
THE SCIENTIFIC METHOD. THE SCIENTIFIC METHOD: is a process used to find answers to questions about the world around us is an organized series of steps.
CONFERENCE EVALUATION DATA ANALYSIS. DATA ANALYSIS  A credible amount of data has to be collected to allow for a substantial analysis  Information collected.
Robust Planning using Constraint Satisfaction Techniques Daniel Buettner and Berthe Y. Choueiry Constraint Systems Laboratory Department of Computer Science.
Algorithms For Solving History Sensitive Cascade in Diffusion Networks Research Proposal Georgi Smilyanov, Maksim Tsikhanovich Advisor Dr Yu Zhang Trinity.
Of An Expert System.  Introduction  What is AI?  Intelligent in Human & Machine? What is Expert System? How are Expert System used? Elements of ES.
Identifying “Best Bet” Web Search Results by Mining Past User Behavior Author: Eugene Agichtein, Zijian Zheng (Microsoft Research) Source: KDD2006 Reporter:
How to structure good history writing Always put an introduction which explains what you are going to talk about. Always put a conclusion which summarises.
Psychlotron.org.uk What does this crime scene tell you about the offender responsible?
Lesson Overview Lesson Overview What Is Science?.
Discuss how researchers analyze data obtained in observational research.
CJ 425 Crime Mapping Unit 6 Seminar “Patterns”. Outline Repeat Incidents Tactical Analysis – Definition – Information Used 7 types of Patterns Inductive/Deductive.
Machine Learning BY UZMA TUFAIL MCS : section (E) ROLL NO: /31/2016.
Cluster Analysis What is Cluster Analysis? Types of Data in Cluster Analysis A Categorization of Major Clustering Methods Partitioning Methods.
Data Mining – Introduction (contd…) Compiled By: Umair Yaqub Lecturer Govt. Murray College Sialkot.
SZRZ6014 Research Methodology Prepared by: Aminat Adebola Adeyemo Study of high-dimensional data for data integration.
Intelligent and Adaptive Systems Research Group A Novel Method of Estimating the Number of Clusters in a Dataset Reza Zafarani and Ali A. Ghorbani Faculty.
Preparing to teach OCR GCSE (9-1) Geography B (Geography for Enquiring Minds) Planning, constructing and introducing your new course.
The Bottom-Up Approach to Offender Profiling
9. Introduction to signal detection
Introduction to science
The scientific Method.
Style You need to demonstrate knowledge and understanding beyond undergraduate level and should also reach a level of scope and depth beyond that taught.
Homework Due: Friday 23rd March
Presentation transcript:

Finding Patterns with a Rotten Core: Data Mining for Crime Series with Cores Tong Wang, Cynthia Rudin, Daniel Wagner, and Rich Sevieri

Introduction One of the most important problems in crime analysis is that of crime series detection, or the detection of a set of crimes committed by the same individual or group. Criminals follow a modus operandi (M.O. 作案手法或模式 ) that characterizes their crime series; for instance, some criminals operate exclusively during the day, others work at night, some criminals target apartments for housebreaks, while others target single family houses.

Introduction Crime analysts need to identify these crime series within police databases at the same time as they identify the M.O. Currently, analysts identify crime series by hand: They manually search through the data using database queries trying to locate patterns, which can be very challenging and time-consuming.

Introduction From a computational perspective, crime series detection is a very difficult problem: It is a clustering problem with cluster-specific feature selection, where the set of features for the cluster is the M.O. of the criminal(s). One cannot know the M.O. of the series without determining which crimes were involved, and one cannot locate the set of crimes without knowing the M.O.—the M.O. and the set of crimes need to be determined simultaneously.

Introduction The problem of crime series detection is much more data intensive and computationally harder than hotspot prediction (when and where crimes will occur). To determine the M.O., we require very fine-grained detailed information about past crimes: where the offender(s) entered (front door, window, etc.); how they entered (pried doors 撬開門, forced doors 破門, unlocked doors ( 未上鎖的門 ), pushed in air conditioner, etc.), whether they ransacked the premise 洗劫房屋, the type of premise, etc.

Introduction This is high dimensional structured data, leading to a computationally challenging data mining problem of finding all crimes that are similar to each other in several ways.

Introduction The main hypothesis in this work is that most crime patterns have a core of crimes that exemplify the M.O. of the series. A core might be approximately three or four crimes that are very similar to each other in many different ways. According to our hypothesis that each crime series has a core, if we can locate all small cores of crime within the dataset, we have thus located pieces of most crime series.

Introduction Though we cannot determine the M.O. of a series before we see it, we can characterize parts of the M.O.s we expect generally—for instance, crimes in a series are often close in time and space. We define a pattern-general similarity ( 犯罪的一般模式相似度 ) that encodes factors that are generally common to most crime series. It is learned from past crime series; for instance, proximity in time and space highly contribute to the pattern-general similarity.

Introduction The pattern-general similarity induces a similarity graph over the set of crimes, where there is an edge between two crimes when they are similar according to the pattern-general similarity.

Introduction Crimes in a series also have pattern-specific similarity ( 犯罪的特定模 式相似度 ), where crimes are similar to each other if they all share the same M.O. The pattern-specific similarity and pattern-general similarity are used for detecting cores: A core must have both sufficiently large pattern- general similarity and pattern-specific similarity. Once the cores are found, we construct the full crime series by merging overlapping cores.

Introduction Our method is a general subspace clustering method that can be used beyond crime series, and for other completely different application domains. It is novel in that it considers both pattern-general and pattern- specific aspects of patterns, where ‘‘pattern-general’’ means that it is common to many crime series, and ‘‘pattern-specific’’ meaning aspects of a particular crime series.

Introduction Our method also can find subspace clusters whose subspace morphs dynamically over the cluster; an M.O. can change when an offender becomes more sophisticated, adapts his M.O. to avoid detection, or when his preferred method is not available (e.g., he prefers entry through unlocked rear doors but will push in the air conditioner if his preferred means of entry is not available).

Introduction Our three methodology sections follow the three main components of our method: learn a similarity graph, mine cores, and merge cores. We then show experiments where our method was tested on the full housebreak ( 入侵家戶 ) database from the Cambridge Police Department containing detailed information from thousands of crimes from over a decade. This method has been able to provide new insights into true patterns of crime committed in Cambridge.

Introduction One observation that has been revealed in our experiments is that computers do not have the same biases that humans do. Computers search through the database in a different way than a human would, and thus can work symbiotically ( 共生地 ) with analysts, showing them avenues to consider where they would not have normally ventured.

FIG. 7. The locations of crimes in the first series.

Details of a one crime series with 10 crimes. d is the number of defining features, where the defines features characterize the M.O. of the crime series.

Crimes 1 to 5 are well connected as a subset, and crimes 6 to 10 are well connected as another subset. From only the similarity graph, the two subsets do not seem very related except for a single edge between crimes {5, 6} connecting them; however, this is only the pattern-general part of the story.

We discover cores of size 3 with at least 6 defining features. Table 3 lists the cores and their defining features in this series, where a check mark means the feature is a defining feature and a circle means it is not. These cores show how the crimes are similar to each other in a pattern-specific way.

Crime Series #1 The defining feature set was chosen to include six features, which are geographic location, days apart, location of entry, the ransacked indicator, time of day, and day of the week. As these data were reconsidered by crime analysts, we found out that when these crimes were analyzed back in 2006–2007, they were viewed as two unrelated patterns, one at the end of 2006, crimes 1 to 5, and one at the beginning of 2007, crimes 6 to 10. The connection between these two subsets of crime is very subtle, and there is over a month gap between the two patterns, so it did not occur to the crime analysts to link them.

Crime Series #1 Their intuition agrees completely with the similarity graph, as the two subsets are weakly connected only by one edge; however, recall that this only describes the pattern-general similarity–what one would expect from generic pattern without considering a specific M.O. On examination of the cores, not only are they correlated in six features, but five of the cores (core indices 11, 12, 14, 15, 16) contain crimes from both of the subsets, which is strong evidence that the two subsets should be merged together.

Conclusion Our automated series detection methods are able to detect crime patterns within big data sets where a human could not. Detecting patterns among crime data is critical to effective policing. When a pattern is identified, police can implement effective strategies to prevent future crimes, solve past crimes, identify and capture offenders, and ensure offenders are investigated and prosecuted fully for the crimes for which they are responsible.