1
Community-based Security Informatics Research: The COPLINK Experience
Acknowledgement: NSF, CIA/ITIC, DHS, NIJ/DOJ, NLM/NIH, COPS, TPD, PPD, KCC
Hsinchun Chen, Ph.D., Director, COPLINK Center of Excellence, Artificial Intelligence Lab, Hoffman E-Commerce Lab, University of Arizona
2
Outline
- COPLINK Background and Research Framework
- COPLINK Connect and Detect: Community-based Research
- COPLINK STV, Agent, and Deception Detection Research
- COPLINK Visual Criminal Network Analysis Research
- From COPLINK to BorderSafe and Terrorism Research
3
Outline COPLINK Background and Research Framework
4
Introduction
- Concern about national security has increased significantly since the terrorist attacks of September 11, 2001.
- Intelligence agencies such as the CIA and FBI are actively collecting and analyzing information to investigate terrorists’ activities.
- Local law enforcement agencies have also become more alert to criminal activities in their own jurisdictions that may be relevant to national security.
5
COPLINK Progression
- 1990-present: NSF CISE funding (IIS, Digital Government, Digital Library, NSDL, ITR, IDM, CSS), NLM/NIH (medical informatics), DARPA
- NIJ COPLINK funding; Web-enabled data warehousing for law enforcement
- NIJ AGILE interoperability funding; information sharing
- NSF Digital Government funding; data/text mining, agents, and knowledge management; COPLINK Center
- NSF/CIA KDD funding; intelligence community
- DHS BorderSafe funding; NSF/CIA disease informatics (bioterrorism) funding; NSF ITR funding, terrorism portal
- Goal: A model and testbed for law enforcement and national security research
6
Crime Types
Type | Local Law Enforcement Level | National Security Level
Traffic Violations | Driving under the influence (DUI), fatal or personal injury, property damage, traffic accident, road rage | -
Sex Crime | Sexual offenses, sexual assault, child molesting | Organized prostitution
Theft | Robbery, burglary, larceny, motor vehicle theft, stolen property | Theft of national secrets or weapons
Fraud | Forgery and counterfeiting, frauds, embezzlement, identity deception | Transnational money laundering, identity fraud, transnational financial fraud
Arson | Arson on buildings, apartments |
Gang / drug offenses | Narcotic drug offenses (sales or possession) | Transnational drug trafficking
Violent Crime | Criminal homicide, armed robbery, aggravated assault, other assaults | Terrorism (bioterrorism, bombing, hijacking, etc.)
Cyber Crime | Internet frauds (e.g., credit card fraud, advance fee fraud, fraudulent Internet banking sites), illegal trading, network intrusion/hacking, virus spreading, netspionage, cyber-piracy, cyber-pornography, cyber-terrorism, theft of confidential information, hate crime
(Arrow label: increasing public influence)
7
The COPLINK Research Framework
8
Building the Science of Intelligence and Security Informatics
9
Outline COPLINK Testbed: Data Characteristics Information Sharing and Interoperability
10
Tucson PD Data Sources
- TPD Record Management System: Stores a wide range of information, from incident reports to warrants to pawn tickets, and from person descriptions to vehicles, weapons, and property items. Incident data goes back as early as 1983. Database: Litton PRC RMS31 on Oracle 7.3, Compaq OpenVMS
- TPD Mug Shot Database: Stores about 90,000 mug shots taken by the ID Department. Database: ImageWare on SQL Server 7.0, Windows NT 4.0 Server
- TPD Gang Database: Stores comprehensive information about 3,200 gang members: their activities, aliases, physical descriptions, vehicles, etc. Database: in-house Access 97, Windows NT 4.0 Server
11
Tucson PD RMS Documents
- Incident Reports: Report number, crime type, precinct, MOs, date and time.
- Pawn Tickets: Ticket number, date and time.
- Warrants: Warrant number, docket number, type and issue date.
- Field Interviews: FI number, type, precinct, date and time.
12
Tucson PD RMS Data Objects
- Person: True names, aliases, descriptions, addresses, IDs, marks and phone numbers.
- Organization: Name, address and phones.
- Vehicle: VIN, license plate, make, model, style, year and colors.
- Property: Serial number, type, make, model, size and colors.
- Weapon: Serial number, type, manufacturer, caliber and colors.
13
COPLINK Database: Tucson PD
14
COPLINK Documentation
Sample COPLINK ERD, Entity Relationship Diagram
15
COPLINK Documentation
COPLINK Data Dictionary: 217 Tables, 1000 attributes
16
COPLINK Data Formats
- Delimited ASCII text files
- SQL Server 2000 backup file
- SQL Server 2000 detached database
- Oracle 8i/9i dump file
- Oracle 8i/9i transportable tablespace
- DB2 UDB 7 backup file
TPD data available: 10/1/2002, PPD data: 2/1/2003
17
Information Management Challenges: Tucson PD Data Across all Crime Types
- Incident Reports: Report number, crime type, precinct, MOs, date and time.
- Pawn Tickets: Ticket number, date and time.
- Warrants: Warrant number, docket number, type and issue date.
- Field Interviews: FI number, type, precinct, date and time.
18
Information Management Challenges: Sample COPLINK Table
COPLINK Data Dictionary: 217 Tables, 1000 attributes
19
COPLINK Connect and Detect: Community-based Research
Outline COPLINK Connect and Detect: Community-based Research User-centered Design, Information Sharing, Information Retrieval, HCI, and Association Rule Mining
20
COPLINK Connect: Information Sharing
Consolidating & sharing information promotes problem solving and collaboration Records Management Systems (RMS) Gang Database Mugshots Database
21
COPLINK Connect Functionality
- Generic, common XML-based representation of criminal elements
- Data migration (batch and incremental) and mapping for all major databases and legacy systems
- Database independent: ODBC-compliant data warehouse
- Multi-layered Web-based architecture: database server, Web server, browser
- Powerful and flexible search tools for various reports, e.g., incidents, warrants, pawns, etc.
- Graphical browser-based interface for ease of use, training and maintenance
H. Chen, J. Schroeder, R. V. Hauck, L. Ridgeway, H. Atabakhsh, H. Gupta, C. Boarman, K. Rasmussen, and A. W. Clements, “COPLINK Connect: Information and Knowledge Management for Law Enforcement,” Decision Support Systems, Special Issue on Digital Government, 2003.
22
COPLINK Detect: Crime Analysis
Consolidated information enables targeted problem solving via powerful investigative criminal association analysis
23
COPLINK Detect Functionality
- Simple association rule mining applied to relationships among criminal elements
- Generic, common XML-based representation for criminal relationships
- Incremental data migration and association analysis on databases
- Support for powerful, multi-attribute queries using partial crime information
- Graphical browser-based interface for simple crime relationship analysis and case retrieval
H. Chen, D. Zeng, H. Atabakhsh, W. Wyzga, J. Schroeder, “COPLINK: Managing Law Enforcement Data and Knowledge,” Communications of the ACM, 2003.
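The slide does not say how association strength is computed. As a minimal sketch, assuming incidents are available as sets of entity identifiers (a hypothetical format) and that the standard support and confidence measures are used, co-occurrence mining over incident reports could look like this:

```python
from collections import Counter
from itertools import combinations


def association_strengths(incidents):
    """Return {(a, b): (support, confidence of a -> b)} for co-occurring entities.

    `incidents` is a list of sets of entity identifiers (assumed format).
    """
    entity_counts = Counter()
    pair_counts = Counter()
    for entities in incidents:
        entity_counts.update(entities)
        for a, b in combinations(sorted(entities), 2):
            # Record both directions because confidence is asymmetric.
            pair_counts[(a, b)] += 1
            pair_counts[(b, a)] += 1
    total = len(incidents)
    return {
        pair: (count / total, count / entity_counts[pair[0]])
        for pair, count in pair_counts.items()
    }


# Hypothetical incident data, for illustration only.
incidents = [
    {"person:Joe", "vehicle:ABC-123"},
    {"person:Joe", "person:Bob"},
    {"person:Joe", "vehicle:ABC-123", "person:Jane"},
    {"person:Bob"},
]
rules = association_strengths(incidents)
```

A query on partial crime information (for example, only a plate) can then rank the entities most strongly associated with it by confidence.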
24
COPLINK Detect 2.0/2.5
25
COPLINK Connect/Detect Deployment
Tucson, Phoenix (Arizona) Huntsville (Texas) Montgomery County (Maryland) Polk County/Des Moines (Iowa) Ann Arbor (Michigan) Boston (Massachusetts) Redmond (Washington) Henderson County (North Carolina) Shawnee County (Kansas) San Diego (CA) Pima County, Arizona DHS (Arizona) State of Alaska, Los Angeles (CA) Serving 20+ states, 300+ agencies, protecting 30M+ citizens
26
COPLINK STV, Deception Detection and Agent Research
Outline COPLINK STV, Deception Detection and Agent Research Visualization, HCI, Agent, Data Mining
27
COPLINK Spatial-Temporal Visualization: Timeline Tool
- Visualizes the chronologically ordered set of events associated with user-selected database entities
  - Events placed along horizontal axis
  - Entities placed along vertical axis
- Entities can be grouped together
  - Each row contains all events associated with the entities in a group
- Time-based zooming
  - User can zoom into a specific time interval for more detail, while hiding uninteresting portions of the timeline
28
COPLINK Spatial-Temporal Visualization: GeoMapping Tool
- Plots location of incident events within a selected time interval
  - Zooming/panning capabilities
  - User-selectable GIS layers
- Overview map
  - Provides context to the currently selected region
- Plot events over time
  - Plot events as they occur, using different color shadings to indicate when each occurred relative to other events
  - Plot events as they occur and remove them after they are over, using directed arrows to highlight movement from one event to the next in time
29
COPLINK Spatial-Temporal Visualization: Periodic Pattern Tool
- Reveals periodic patterns of incident occurrence
- Incident events are plotted continuously on a circular graph
  - Time period represented along circle (day, week, month, etc.)
  - Height from center indicates number of incidents that occurred at that specific time
- Customizable granularity (e.g., year, month, day, etc.)
- 3-sigma statistical significance line
  - Indicates unusually large or small number of occurrences at a specific time
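The binning and the 3-sigma line described above can be sketched as follows. This is an illustration under stated assumptions, not the tool's implementation: the function names are hypothetical, only hour-of-day and weekday periods are handled, and the population standard deviation across slots is assumed for the significance band.

```python
from collections import Counter
from datetime import datetime
from statistics import mean, pstdev


def periodic_counts(timestamps, period="hour"):
    """Count incidents per slot of a repeating period (hour of day or weekday)."""
    if period == "hour":
        slots, key = range(24), lambda t: t.hour
    elif period == "weekday":
        slots, key = range(7), lambda t: t.weekday()
    else:
        raise ValueError("period must be 'hour' or 'weekday'")
    counts = Counter(key(t) for t in timestamps)
    return [counts.get(slot, 0) for slot in slots]


def three_sigma_outliers(counts):
    """Return slot indices whose count lies outside mean +/- 3 standard deviations."""
    mu, sigma = mean(counts), pstdev(counts)
    upper, lower = mu + 3 * sigma, mu - 3 * sigma
    return [i for i, c in enumerate(counts) if c > upper or c < lower]


# Hypothetical data: one incident in every hour, plus a burst at 22:00.
timestamps = [datetime(2002, 1, 1, h) for h in range(24)]
timestamps += [datetime(2002, 1, 2, 22, m) for m in range(49)]
hourly = periodic_counts(timestamps, "hour")
unusual_hours = three_sigma_outliers(hourly)
```

On a circular plot, each count becomes the height from the center at its slot's angle, and the band becomes a circle at radius mean plus three standard deviations.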
30
COPLINK Data Mining Research
Deception Detection, a data mining approach
- “An agent must spell a suspect’s name exactly right, or the FBI computer will not recognize it. That can be particularly frustrating in cases such as the Sept. 11 probe, in which suspects have used multiple names and sometimes created identities by switching a few letters in their names.” – FBI
- FBI’s problem with 9/11 suspect names, e.g., “Majed M.GH Moqed,” “Majed Moqed,” and “Majed Mashaan Moqed,” and DOB, e.g., “ ” and “ ”
- A deception taxonomy was created based on criminal deceptions in law enforcement databases
- Patterns existed in criminal deceptions, e.g., SSN variations, name variations, etc.
- Phonetic and syntactic string comparators are adopted
- Promising initial testing result: 94% accuracy in deception detection
G. Wang, H. Chen, H. Atabakhsh, “Automatically Detecting Deceptive Criminal Identities,” Communications of the ACM, forthcoming, 2002.
31
A Taxonomy for Deceptions in Criminal Identity
32
A Taxonomy of Deceptions in Criminal Identity: Name Deception
- Either false first name or false last name (62.5%)
- Only the middle initial is changed (62.5%)
- Similar pronunciation but different spelling (42%)
- A completely false name (29.2%)
- Using abbreviated names or adding extra letters (29.2%)
- Leaving out the first name or last name (29.2%)
- Exchanging last name and first name (8%)
33
A Taxonomy of Deceptions in Criminal Identity: DOB, SSN, Residency
- DOB and ID (SSN) deception: In most cases, criminals only make minor changes in DOB and SSN, e.g.,
- Residency deception: 42% of the criminals in the collection deceived on address information. In most cases, only one portion of the address is changed slightly, e.g., the street number.
34
String Comparators
- Phonetic string comparator: the Russell Soundex code [Newcombe 1959] encodes a name as a prefix letter followed by a three-digit number, e.g., PEARCE and PIERCE are both coded as “P620”. However, phonetic matching is particularly poor at finding matches [Zobel and Dart 1996].
- Spelling string comparator [Jaro 1976; Winkler 1990]: compares spelling variations between two strings instead of phonetic codes
  - Limitation: common characters in both strings must be within half the length of the shorter string
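To make the phonetic code concrete, here is a minimal sketch of the commonly documented American Soundex rules (a prefix letter plus three digits, vowels dropped, adjacent duplicate codes collapsed, H and W not separating duplicates). The slide does not say which Soundex variant was used in the study, so that choice is an assumption.

```python
# Consonant groups of standard Soundex, mapped to digits 1-6.
_CODES = {}
for _digit, _letters in enumerate(["BFPV", "CGJKQSXZ", "DT", "L", "MN", "R"], start=1):
    for _letter in _letters:
        _CODES[_letter] = str(_digit)


def soundex(name):
    """Return the four-character Soundex code of `name` ('' if it has no letters)."""
    letters = [c for c in name.upper() if c.isalpha()]
    if not letters:
        return ""
    code = letters[0]
    prev = _CODES.get(letters[0], "")
    for c in letters[1:]:
        digit = _CODES.get(c, "")
        if digit and digit != prev:
            code += digit
        if c not in "HW":  # H and W do not separate letters with the same code
            prev = digit
    return (code + "000")[:4]  # pad or truncate to a letter plus three digits


print(soundex("PEARCE"), soundex("PIERCE"))  # both print as P620
```

Because the code keeps the first letter verbatim, two spellings that differ in their initial letter never collide, which is one reason a spelling comparator is used alongside it.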
35
Another Approximate String Matching Tool
- Agrep [Wu, Manber 1992]: A general string matching algorithm that can handle character variations of insertion, deletion, and substitution.
- The pattern is represented as a bit array. The computation only involves simple bit operations (RightShift) and logic operations (AND, OR) on bit arrays.
- Recurrence: $R^{d}_{j+1} = \mathrm{Rshift}[R^{d}_{j}] \;\mathrm{AND}\; S_c \;\mathrm{OR}\; \mathrm{Rshift}[R^{d-1}_{j} \;\mathrm{OR}\; R^{d-1}_{j+1}] \;\mathrm{OR}\; R^{d-1}_{j}$
- Agrep has been integrated into Unix and been in wide use since June 1991
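The bit-array recurrence above can be sketched with Python integers as bit arrays. This is an illustrative implementation of the shift-and idea with up to `max_errors` insertions, deletions, or substitutions, not the agrep source. The function name is hypothetical. Bit 0 stands for the first pattern character here, so the slide's Rshift becomes a left shift that sets bit 0.

```python
def fuzzy_find(text, pattern, max_errors):
    """Return the index in `text` where the first match of `pattern` with at
    most `max_errors` edits ends, or -1 if there is no such match."""
    if not pattern:
        raise ValueError("pattern must be non-empty")
    m = len(pattern)
    full = (1 << m) - 1
    accept = 1 << (m - 1)
    # S_c: bit i is set when pattern[i] == c.
    masks = {}
    for i, ch in enumerate(pattern):
        masks[ch] = masks.get(ch, 0) | (1 << i)
    # states[d] has bit i set when pattern[:i + 1] matches a suffix of the text
    # read so far with at most d errors. Before any text, d pattern characters
    # can be skipped at a cost of d errors.
    states = [((1 << d) - 1) & full for d in range(max_errors + 1)]
    for j, ch in enumerate(text):
        s_c = masks.get(ch, 0)
        prev_old = states[0]                       # R^{d-1}_j
        states[0] = ((states[0] << 1) | 1) & s_c
        prev_new = states[0]                       # R^{d-1}_{j+1}
        for d in range(1, max_errors + 1):
            old = states[d]
            states[d] = (
                (((old << 1) | 1) & s_c)           # exact match of this character
                | prev_old                         # extra character in the text
                | ((prev_old << 1) | 1)            # substituted character
                | ((prev_new << 1) | 1)            # pattern character missing from text
            ) & full
            prev_old, prev_new = old, states[d]
        if states[max_errors] & accept:
            return j
    return -1
```

For example, `fuzzy_find("xxMoqadxx", "Moqed", 1)` finds a match ending at the `d` of `Moqad`, while the same call with `max_errors=0` finds none.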
36
Algorithm Design
- Compare corresponding fields of each pair of records (disagreement): S_name, S_DOB, S_addr, and S_ID
- To capture different types of name deceptions,
- Calculate the Normalized Euclidean Distance for the overall dis-similarity between two records, i.e., Disagreement =
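The formula itself did not survive in this transcript, so the following is only a sketch of one plausible reading, with every detail an assumption: each field disagreement is taken as edit distance divided by the longer field length, and the normalization divides the sum of squared disagreements by the number of fields (four) before the square root. Field names and sample values are hypothetical.

```python
from math import sqrt


def edit_distance(a, b):
    """Levenshtein distance between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # delete ca
                            curr[j - 1] + 1,            # insert cb
                            prev[j - 1] + (ca != cb)))  # substitute or match
        prev = curr
    return prev[-1]


def field_disagreement(a, b):
    """Disagreement in [0, 1] between two field values (assumed measure)."""
    if not a and not b:
        return 0.0
    return edit_distance(a, b) / max(len(a), len(b))


FIELDS = ("name", "dob", "addr", "id")  # S_name, S_DOB, S_addr, S_ID


def record_disagreement(r1, r2):
    """Normalized Euclidean distance over the four field disagreements (assumed form)."""
    squares = [field_disagreement(r1[f], r2[f]) ** 2 for f in FIELDS]
    return sqrt(sum(squares) / len(FIELDS))


# Hypothetical records, for illustration only.
rec_a = {"name": "John Smith", "dob": "19700101", "addr": "100 Main St", "id": "123456789"}
rec_b = {"name": "Jon Smith", "dob": "19700110", "addr": "100 Main St", "id": "123456789"}
score = record_disagreement(rec_a, rec_b)
```

With this normalization the score stays between 0 (identical records) and 1 (every field entirely different), so a single threshold can be applied to any pair.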
37
Experimental Results (Training: 80 cases)
Table: Distance matrix, the distance value shows the degree of disagreement between each pair of records in the training data set.
38
Experimental Results (Training: 80 cases)
Table: Determining best threshold value (0.48)
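The table itself is not reproduced in this transcript. As a hedged sketch of how a threshold of this kind could be chosen, assume each training pair carries a disagreement score and a label saying whether the two records refer to the same person, and pick the candidate threshold with the highest training accuracy. The function name and the toy data are hypothetical.

```python
def best_threshold(scored_pairs, candidates):
    """Pick the candidate threshold that maximizes accuracy on labeled pairs.

    `scored_pairs` is a list of (disagreement, is_same_person) tuples. A pair
    is predicted to be the same person when disagreement <= threshold.
    """
    if not scored_pairs:
        raise ValueError("need at least one labeled pair")

    def accuracy(threshold):
        correct = sum((d <= threshold) == same for d, same in scored_pairs)
        return correct / len(scored_pairs)

    return max(candidates, key=accuracy)


# Toy labeled pairs, not the study's data.
training = [(0.1, True), (0.3, True), (0.45, True), (0.6, False), (0.9, False)]
chosen = best_threshold(training, [0.2, 0.4, 0.5, 0.7])
```

The chosen value is then held fixed and applied to a separate testing set, as the next slide describes.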
39
Experimental Results (Testing: 40 cases)
Table: Accuracy of deception detection when the best threshold value (0.48) is applied to the testing data set (40 records)
40
COPLINK Agent Research
COPLINK Agent: alert and collaboration in a wireless architecture
- Enhance police information timeliness, collaboration, mobility, and safety via a web-based wireless alerting system (under testing at TPD)
- Real-time alert of time-critical information from multiple databases, e.g., CAD (computer-aided dispatching) database, MVD
- Identify and inform officers/detectives who are working on similar cases
- Push time-critical information via wireless and personalized communications, i.e., web alert, e-mail, cell phone, and pager
41
COPLINK Agent: Wireless Alert and Collaboration
Allows Patrol Officers to enhance their community expertise Further promotes Officer safety through curbside knowledge Secure wireless access and alert: laptop, PDA, pager, cell phone Alert: 24-7 monitoring of time-critical information from different databases Collaboration: Automatically informing detectives working on similar cases
42
COPLINK Agent: Vehicle Search Form
Multi-DB Search Notification setting Alert Method
43
COPLINK Agent: Web and E-mail Collaboration Alerts
Web Alert; E-mail Alert
44
COPLINK Agent: Cell Phone and Pager Alert
Cell phone alert Pager alert with case number
45
Agent User Study and Result Summary
- Study Design: Case study method based on structured interviews, archival records analysis, and usability survey.
  - Used the QUIS (Questionnaire for User Interaction Satisfaction) survey instrument developed by the HCI Lab at the U. of Maryland.
  - 10 participants: crime analysts and detectives in several TPD units.
- Positive feedback on system effectiveness and efficiency:
  - Monitoring: “… the information I have received back was instrumental in making at least 2 felony cases that will be prosecuted on the federal level.”
  - Collaboration from CAD Alert: “… allowing us to respond to incidents we know are important that the field units perhaps don’t realize in a timely manner.”
  - Multi-database Search: “The Tucson City Court Search was helpful because I located one of my suspects on her court date.”
- High user satisfaction from QUIS survey items: Averaged 5.5 for 49 items on a 7-point Likert scale (7: most useful).
  - Strengths: Offers good investigative power; easy-to-read layout; potential for collaborative information sharing; CAD integration; high intention to use.
  - Weaknesses: Lack of help messages; difficult for inexperienced users; obscure user preference settings.
46
Arizona Daily Star, Jan 7, 2001
47
New York Times, Nov 2, 2002
48
Newsweek, March 3, 2003
49
Interacting with the LE Community
- User-centered design (2 officers assigned to project); frequent, focused, staged user studies (a user study team); quick prototyping and user feedback (quarterly)
- TPD user briefings: 30+ user groups and management demos/briefings (2 chiefs, 7 assistant chiefs)
- Arizona/regional partner briefings: 30+ regional partners demos/meetings; Phoenix, Pima, etc.
- Annual COPLINK Center research workshop, under NSF Digital Government Program
- National/regional NIJ/DOJ and LE meetings: 20+ LE IT meetings; International Association of Chiefs of Police (IACP) meetings
- Regional deployment and success: Arizona, TX, Iowa, Michigan, Boston, Alaska, CA, etc.
50
COPLINK Lessons Learned
- Know their pain and build something they can use. What street cops need.
- Build trust and know the culture. Security, policy, training, user acceptance (build a Living Lab)
- Early and consistent user involvement. 2 TPD officers, 7 asst. chiefs, 2 chiefs
- Create early and small successes. Detect/Connect, group to division and department
- Spread the success and solicit partners. Tucson, AZ, CA, TX, MA, Montgomery, MA, Alaska, etc.
- Understand funding agencies’ expectations. NIJ (tools), NSF (research)
- Development and research prioritization. Research (Ph.D.) after development (MS/BS); little cutting-edge research in the first two years
- Establish deployment partners. KCC, diff(operational system, research prototype) = $2M
- Work with university technology transfer office. Office of (preventing) technology transfer?
51
Outline COPLINK Visual Criminal Network Analysis (CNA) Research: BorderSafe, and Dark Web Terrorism Research
52
Research Approach Testbed and community grounded algorithm, toolkit and system research and development Advanced visual criminal network analysis and knowledge mapping research and technologies
53
BorderSafe: Research Objectives
- Participate in DHS BorderSafe IFE Experiment, in partnership with CNRI, ARJIS (SD), SDSC, TPD, and AZ DHS
- Develop (1) border-crosser and border-crossing vehicle analysis techniques, by (2) leveraging local law enforcement and local DHS data, and (3) using COPLINK crime analysis abilities
- Advance visual criminal network analysis (CNA) and knowledge mapping research and technologies (e.g., terrorism, terrorist, terrorized)
54
Current Capability: Criminal Network Analysis (LE and Intelligence Community)
- First generation: manual approach
  - Anacapa Chart (Harper & Harris, 1975)
- Second generation: graphics-based approach
  - Analyst’s Notebook, Netmap, Watson
  - COPLINK hyperbolic tree view, network view
- Third generation: structural analysis approach
55
Anacapa Chart (1st generation)
- Manually extract criminal associations from data files
- Construct an association matrix and draw a link chart based on the association matrix
Figure labels: Association Matrix; Link chart
56
Analyst’s Notebook, Netmap, Watson (2nd generation)
57
A 9/11 Terrorist Network: centrality, cliques, typology…
58
BorderSafe Visual Criminal Network Analysis (CNA) Design
J. Xu and H. Chen, “Criminal Network Analysis and Visualization: A Data Mining Perspective,” Communications of the ACM, 2004, forthcoming.
59
Visual CNA: Network Display
Nodes represent individual criminals labeled by their names Links represent relationships between criminals Adjust the slider to perform clustering and blockmodeling
60
Visual CNA: Subgroup Display
The reduced star structure found using blockmodeling Circles represent groups. The size of a circle is proportional to the number of group members. Each group is labeled by its leader’s name.
61
Visual CNA: Member Ranking
- The rankings of each group member in terms of centrality measures
  - The first one of each column is the leader, gatekeeper, and outlier, respectively
- The inner structure of a selected group
- Adjust the slider to do further blockmodeling
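The slide does not name the three centrality measures. A common choice in social network analysis, assumed here, is degree for the leader, betweenness for the gatekeeper, and lowest closeness for the outlier. The sketch below computes all three on a small hypothetical network, using breadth-first search and Brandes' algorithm for betweenness.

```python
from collections import deque


def degree(graph):
    return {v: len(nbrs) for v, nbrs in graph.items()}


def closeness(graph):
    """Reachable-node count divided by the sum of shortest-path distances."""
    result = {}
    for source in graph:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            v = queue.popleft()
            for w in graph[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
        total = sum(dist.values())
        result[source] = (len(dist) - 1) / total if total else 0.0
    return result


def betweenness(graph):
    """Brandes' algorithm for an unweighted, undirected graph."""
    score = dict.fromkeys(graph, 0.0)
    for s in graph:
        stack, preds = [], {v: [] for v in graph}
        sigma = dict.fromkeys(graph, 0)  # number of shortest paths from s
        sigma[s] = 1
        dist = {s: 0}
        queue = deque([s])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in graph[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = dict.fromkeys(graph, 0.0)
        while stack:  # accumulate dependencies from the farthest nodes inward
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    return {v: b / 2 for v, b in score.items()}  # each pair was counted twice


# Hypothetical network: a tight group around "L", a looser ring, and a bridge "G".
edges = [("L", "a"), ("L", "b"), ("L", "c"), ("a", "b"), ("a", "c"), ("b", "c"),
         ("L", "G"), ("G", "p"), ("p", "q"), ("q", "r"), ("r", "s"), ("s", "p")]
graph = {}
for u, v in edges:
    graph.setdefault(u, set()).add(v)
    graph.setdefault(v, set()).add(u)

deg, btw, clo = degree(graph), betweenness(graph), closeness(graph)
leader = max(deg, key=deg.get)
gatekeeper = max(btw, key=btw.get)
outlier = min(clo, key=clo.get)
```

In this toy network the best-connected member and the member bridging the two groups are different people, which is the distinction the ranking columns are meant to surface.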
62
Meth World: Subgroup Verification
- Subgroups detected have different characteristics: the subgroups found are consistent with the groups’ specializations or responsibilities in a network
  - Offenders who were responsible for stealing, counterfeiting, and cashing checks and providing money to other groups to carry out drug transactions
  - White gang members who were involved in assaults and murders
  - White gang members who were involved in crack cocaine
  - Drug dealers
63
Visual CNA: Network Structure
A chain structure found in a 60-member network using blockmodel analysis
64
Temporal CNA: The Evolution of Meth World
The network in Year: 1995, 1996, 1998, 1999, 2002
65
Cross-Jurisdictional CNA: The Extended Meth World (TPD & PPD)
Highlighted (red) nodes represent criminals who appear in both TPD and PPD databases Tucson Phoenix
66
Customs and Border Protection (CBP) Border Crossing Information
- CBP has provided the BorderSafe project with license plate numbers seen crossing the border. These can be integrated with local data to enhance the analysis.
- Video equipment automatically extracts license plate numbers of cars as they cross into or out of the country at the Douglas, AZ port of entry.
- 1,125,155 records: plate, state, date, time
- 226,207 distinct vehicles
- 209 days of information over an 18-month period
- 130,195 plates issued in AZ
- 5,546 plates issued in CA
- 90,466 plates issued in Mexico
67
Border Crossing Records and TPD
- Many of the vehicles found in the CBP data also show activity in the TPD database.
- The fact that a vehicle frequently crosses the border is of interest in criminal investigations.
- The TPD data provides a link between license plates and criminal activity networks.
- 8,300 distinct vehicles appear in both datasets
- 34,632 recorded crossings involve those vehicles
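The overlap counts above amount to a join between the two datasets on the vehicle key. A minimal sketch, assuming crossing records are (plate, state, date, time) tuples as listed on the previous slide and that TPD vehicles are available as a set of (plate, state) pairs; the function name and sample rows are hypothetical:

```python
from collections import Counter


def crossing_overlap(crossings, tpd_vehicles):
    """Return {(plate, state): crossing count} for vehicles found in both datasets."""
    per_vehicle = Counter((plate, state) for plate, state, _date, _time in crossings)
    return {v: n for v, n in per_vehicle.items() if v in tpd_vehicles}


# Hypothetical sample rows, for illustration only.
crossings = [
    ("ABC123", "AZ", "2004-01-01", "08:00"),
    ("ABC123", "AZ", "2004-01-03", "09:30"),
    ("XYZ789", "CA", "2004-01-02", "10:00"),
]
tpd_vehicles = {("ABC123", "AZ"), ("QQQ111", "AZ")}

shared = crossing_overlap(crossings, tpd_vehicles)
distinct_shared_vehicles = len(shared)        # vehicles appearing in both datasets
crossings_by_shared = sum(shared.values())    # recorded crossings by those vehicles
```

Keying on plate and state together avoids merging vehicles from different issuers that happen to share a plate number.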
68
A Vehicle to Watch?
This network contains 5 border crossing plates (outlined in red). The large green dots were confirmed to be criminals of significant interest.
Legend:
- Shape indicates object type: circles are people, rectangles are vehicles
- Color denotes activity history: gang related; violent crimes; narcotics crimes; violent & narcotics
- Larger size indicates higher levels of activity
- Border crossing plates are outlined in red
69
A Vehicle to Watch? (truncated version of previous network; names removed)
- Plate ABC-123
  - Crossed border 35 times.
  - No prior narcotics associations
- “Jane”
  - Associated with vehicle
  - No known narcotics activity
- “Joe”
  - Related with Jane and vehicle in ‘Suspicious Activity’ report
  - Some prior narcotics activity
- “Bob”
  - Related with Joe in narcotics
  - Involved in 11 narcotics incidents
  - Connected to a big narcotics network
People and vehicles previously never linked to narcotics can be identified using such networks to focus and support investigations.
70
From COPLINK to BorderSafe to Terrorism Knowledge Portal
- Terrorism: Identify key terrorism literature, resources, and experts (web portal, meta searching, citation network analysis, knowledge maps, expert finder)
- Terrorist: Understand how the terrorist groups are revealed on the web and how they use the web (Dark Web, web spidering and mining, back-link analysis, terrorist network analysis, multilingual entity and event extraction)
- Terrorized: Assist citizens and victims responding to terrorism (pattern-based chatbots, system assessment, victim consultation and resources, scalable anonymous robot assistance)
(In collaboration with Drs. Reid, Sageman and Levine and Sandia National Lab)
71
The Web and the Dark Web (diagram labels)
- Dark Web: Hate Groups | Racial Supremacy | Suicidal Attackers | Activists / Extremists | Anti-Government | …
- Information Sources: Terrorist Group Web sites, Search Engines, Terrorism databases, Government information, Personal Profile Search
- Collection Methods: Meta Searching, Downloading from Gov’t Web sites, Automatic Spidering, Back link search
- Filtering
- Data Storage: International Terrorism, Terrorism research information, Domestic Terrorism
72
Sageman’s Global Salafi Jihad (GSJ) Data
- Data collected and cross-validated from open sources regarding 172 GSJ members (Dr. Sageman, U. Penn)
- Background
  - From upper or middle class (3/4)
  - Average age is 26
  - Affiliation through friendship, kinship, discipleship, and worship
- Four clusters (based on geographical distribution): lieutenants and network structures
  - Central Staff: Osama bin Laden
  - Core Arabs: Khalid Sheikh Mohammed
  - Maghreb Arabs: Zain al Abidin Mohd Hussein
  - Indonesians: Abu Bakar Baasyir
73
Jihad CNA: “Combined” = “Link to GSJ” + “Operational” + “Family” (107 nodes)
Figure labels: scale-free network; cliques (one clique highlighted); hierarchical network; Osama bin Laden
74
Jihad CNA: 9/11 Hijackers in “Combined” Network Monitor New Open Sources
Mohamed Atta
75
Outline Developing the Science of Intelligence and Security Informatics (ISI)
76
Develop the Science of Intelligence and Security Informatics (ISI)
- ISI: The study of the use and development of advanced information technologies, systems, algorithms, and databases for national security related applications, through an integrated technological, organizational, and policy based approach.
  - Similar to “Biomedical Informatics” (information centric)
- National security is a long-term mission
  - Need to develop long-term research agenda and partnership (researchers, practitioners, policy makers, industries, law enforcement and intelligence professionals, etc.)
- A bottom-up, success-driven approach
  - From selected demonstration sites, to regional partnership, and then to national deployment. Build small successes first.
77
Building an ISI Community
- Federal funding priority: Building community-based “Living Labs”
- Many disparate LE, intelligence, and industry meetings (vendor driven)
- Academic ISI special issues: JASIST, DSS, and ACM TOIT, forthcoming, 2004
- IEEE Intelligence and Security Informatics Conference: Sponsored by NSF, NIJ, CIA, and DHS, 2003 (Tucson), 2004 (Tucson), 2005 (Atlanta), 2006 (San Diego), 2007 (NJ), 2008 (Taiwan)
78
For project information: http://ai.arizona.edu/COPLINK hchen@eller