Presentation on theme: "1 Sandia National Laboratories is a multi-program laboratory managed and operated by Sandia Corporation, a wholly owned subsidiary of Lockheed Martin Corporation,"— Presentation transcript:
1 Sandia National Laboratories is a multi-program laboratory managed and operated by Sandia Corporation, a wholly owned subsidiary of Lockheed Martin Corporation, for the U.S. Department of Energy’s National Nuclear Security Administration under contract DE-AC04-94AL85000. Data Sciences at Sandia National Laboratories Sept, 2014 Steven Castillo, Ph.D. ISR Systems Engineering and Decision Support Presentation to University of Illinois at Urbana-Champaign Approved for UUR: SAND2014-17938 PE
2 Motivation Slide Mission Context The analysis of data is a common point of success or failure We see exponential growth in the size, complexity and information density of data far exceeding available human analysis capabilities. This research challenge intersects most Sandia Mission Areas. Why Sandia? Envision, create the future Pathfinder, data-centric systems for high-consequence decision making: Sandia delivers sensors into several mission areas. Sandia has access to relevant data sets and analysts. We live the full data science cycle at the cutting edge as part of executing our broad national security missions. 4 1818 2 2424 3737 9 2 24 4 39
3 Contributors to current limitations: Only a small fraction of the data is ever examined by analysts. Workflows are labor intensive and devoid of effective computational tools. Systems do not exploit the relationship discovery potential of the data or identify meaningful, defensible trends and patterns.
4 4 Intelligence Information Data Signal Physics Increasing Value Decreasing Volume Analyst (2014) Analyst (2017) Sensors Increasing density Increasing data rates Increasing information density
5 Focus: High Consequence Decision Making Crosscutting Capabilities: Fundamental mathematics and science transformed into advanced mission capability Sandia World Class S&T: Research Foundations Supporting National Security Missions: Nuclear Weapons, Defense & Intelligence Human Analyst Centric – Transforming Information into Actionable Intelligence Discovery and Disambiguation Patterns of Life: anomaly detection Patterns of Life: predictive analytics Robust Data Analysis: incomplete, vulnerable and uncertain data Intelligent Data Collection: tasking of sensors, optimal data sets
6 Research Portfolio – An Investment in Next Generation Analytics for the Nation Current portfolio PANTHER Grand Challenge – Pixels to Intelligence: Pattern Analytics Counter Adversarial Data Analytics - internal research & development Adversarial Modeling - internal research & development DARPA sponsored graph modeling Customer sponsored streaming/graph algorithms Customer funded cyber defense Portfolio focus on innovative R&D Foster high-risk projects Promote R&D impact on mission stakeholders
7 Contrast National Security to Google GoogleNS Data to Decisions Data GatheringHighly Cooperative/dense, homogeneous (Android, Chrome, Google Searches, etc.) Adversarial Environments/sparse/diver se (can’t see everything all the time) Data Analytics Environment Large, homogeneous computing environment (cloud) Diverse architectures, available throughput, bandwidth Human AnalysisConsumer performing “buy” decisions Analyst extracting actionable intelligence Decision ConsequencesSmall – millions of consumers making decisions related to shopping High – tactical and strategic, UQ and validation are critical
8 Grand Challenges – Internal R&D Laboratory Directed Research & Development (LDRD) Sandia initiates 1-2 GC LDRDs/ yr Cross laboratory, interdisciplinary High risk, urgent needs, low TRL (technology readiness level) Three year effort Create a Lasting National Technical Capability Required to assemble highly engaged external advisory board Cooperative development with partners to shape technical vision
9 PANTHER Grand Challenge Pattern Analytics to Support High Performance Exploitation and Reasoning: Crosscutting Innovation in Geospatial-Temporal Analysis PANTHER R&D: Scalable, temporal and geospatial relationship discovery for robust, mission- relevant pattern analysis. How is PANTHER different? Develop new mathematical approaches to this problem with a deep understanding of human capabilities and information landscape. Challenge Problem Categories: Signature Search – enable searches for signatures that are difficult to discretize and subject to interruptions in space and time. Motion & Trajectory Analysis –discover patterns in motion datasets at multiple semantic scales under conditions of intermittent data.
10 Panther Team - Interdisciplinary PI: Kristina Czuchlewski, PhD PM: Bill Hart, PhD Team Leads: Jim Chow, Ph.D., Randy Brost, Ph.D., Laura McNamara, Ph.D. and David Stracuzzi, Ph.D.
11 R&D Challenge Questions Where are chemical processing plants? Where are active businesses? Did someone arrive in a car and enter a building? Which one(s)? Which aircraft flights are point-to-point? Are any aircraft flying search patterns? Are any aircraft flying search patterns over sensitive locations? Is there an activity surge? Is an aircraft flight pattern departing from normal? What might it do next? What can we infer given limited data? For any of the above, what is the result confidence? Signature Search Temporal Patterns Limited Data Uncertainty/Quality Trajectory Analysis
12 Current Paradigm New Paradigm State of the ArtSandia R&DPANTHER Grand Challenge ApproachSingle frame or target recognition Statistical, pixel- level processing for specific tactical missions Identify patterns/ relationships and activities over multiple target classes and extend to strategic missions. Achieve 10 7 reduction in data volume in space & time. SpatialSingle frameSpecific areas (10s of km 2 ) Achieve 10 4 single frame reduction in data presented. Query and exploit 100s of km 2 TemporalMinutes of data, real-time and offline. Weeks of information, available in minutes Pattern based query of up to 1 year’s worth of data. Available in minutes. ExampleATR, CCD, matched filters Statistical Normalized Coherence Object-based image analysis: features, entities, patterns and relationships. Ask questions that cannot currently be answered. Will it work? Limited, does not scale. Limited, scales for specific missions. Unproven. Scalable & extensible to other domains.
13 FY14 Highlights Unsupervised computational methods for detecting outliers (with no a priori knowledge) within a TB-scale database… in under 30 minutes. New geometric feature vector enables comparisons. Result: 44 seconds on our Netezza DB machine to compute a trajectory segmentation from points. Result: 20 minutes of wall-clock time to turn 1.2 billion points into 15 million trajectories. Geospatial feature extraction and classification for signature relationship and temporal trending searches. Pixel statistics extracted via new superpixel segmentation algorithm. Unique temporal attributes of coherent changes exploited for labeling.
14 FY14 Highlights, cont. New, efficient graph algorithm formulation developed to enable geospatial-temporal topological search complexity. Multi-source data search under one framework Representation and search under heterogeneous temporal conditions (ephemeral and activity). Complex sensor feature data “compressed” for analysis with pointers back to original sensor source. Intermittency/ interruption nodes implemented. Eye-tracking experiments are enabling visual cognition/ search models – and eventual user interface design(s). Pattern match quality, statistical and probabilistic approaches are under investigation for characterizing uncertainty in geospatial temporal graph representations. Bright Roof Shadow
15 GeoSpatial Semantic Graph Representations of Features & Activity Graph includes activity: @ t=1, the graph includes objects with location From t=1 to t=2, the graph encodes change Nodes for activity events. 1 Node attributes include time observed. No persistence expected. Spatial and temporal relationship edges. 1
16 Signature Search A signature (over space/ time) encodes a desired question. For example, “Where are buildings with nearby grass, pavement, and dirt? Graph template: Building Grass Pavement Dirt
17 Matches Graph search finds all matches to signature template. In this case all red nodes with adjacent green, grey, and tan nodes. New approach to graph search, polynomial time complexity Searches are saved
18 Advantages Efficient representations in time (only store change). Relationship, change, and temporal analysis over multiple times and heterogeneous spatial ensembles in the same query. Change detection Activity characterization Efficient search algorithms – computation not limited by brittle relational databases. Potential to take full advantage of graph topology search advances, enhanced by geospatial-temporal semantics. Feature-based analysis Multi-modality, in a single search representation. Sensor agnostic – PANTHER emphasizes SAR.
19 Extracting Statistical Distributions Using Superpixel Segmentation Superpixel Segmentation Divides image into compact regions containing pixels similar in spatial proximity and intensity Derives from large body of research in optical image processing community extended to SAR imagery PANTHER Approach Superpixel segmentation algorithms enable high quality segmentations and efficient execution De-noising advances enable utilization of novel image processing techniques. Superpixel Segmentation of SAR Image
20 Initial Automated Static Feature Extraction Result Exploit ~21 days worth of change via coherence Result: automatically extract paved roads
21 Did someone arrive by car and visit a building? Query algorithms must handle disconnections & interruptions Static and ephemeral features for time slice A:* Note: Based on hand-annotated primitive features. * Note: Hand-edited for explanation clarity. Data semantics: Building Fence Dirt Road Gravel Desert Shadow Low Vegetation Human Tracks Vehicle Tracks Vehicle B F DR G D S LV HT VT V
22 Example: Connected Signature Result from star-graph algorithm:* Correct Why wasn’t this found? Note: Based on hand-annotated primitive features. * Note: Hand-edited for explanation clarity.
23 Result of Disconnected Signature Search Result from star-graph algorithm:* Correct Note: Based on hand-annotated primitive features. * Note: Hand-edited for explanation clarity.
24 Diversity of Problems Power Plant Search Tank Complex Search Site Activity Analysis Construction Analysis Activity Analysis -- Interrupted Signature All of these were solved by the same code.
25 Discovery of Flight Patterns Collections of geometric descriptions can describe a trajectory. Extensions: impact of time. Forgot Something Avoid Holding Pattern Mapping
26 Big Feature Space Advantage Clustering, leading to unsupervised learning techniques Previous examples showed searching the space for a specific pattern or a specific volume of the feature space But, with clustering, the computer can group the different patterns in the feature space without knowing a priori what they are. Perhaps most importantly, many clustering algorithms specifically identify outliers in the feature space that correspond to odd behaviors Did not specify “find this,” only told routine to “make groups of similar flights.” This was one of many clusters that had distinctive shapes Forgot Something, Revisited
27 Discovery of Odd flights Clustering done based on geometric features Many clusters found, but what remains is… Represents approximately 700 out of a total of 50,000 flights from one day Note: we have ~5M points/day, ~1GB/day, currently >300GB
28 Better knowledge & deeper insights from Big Data in minutes, not months; over months, not hours or minutes; covering hundreds, not 10s of km 2 Better knowledge & deeper insights from Big Data in minutes, not months; over months, not hours or minutes; covering hundreds, not 10s of km 2
29 Technical Areas for Collaboration Graph Analytics Tensor Analysis Computational Geometry Digital Signal and Image Processing Machine Learning Scalable and Highly Distributed Architectures Human Performance and Cognitive Understanding Machine Feature Identification in Imagery (EO, SAR, LIDAR) Uncertainty Quantification and Propagation – from Sensor to Answer Robust Data Analysis Intelligent Data Collection
30 University Collaborations PhD Interns: Colorado State University, Utah State University Faculty have clearances and clear expertise in critical areas Students can gain clearances Year-round appointments give flexibility for Sandia work, university requirements Win-win-win: Sandia gains valuable technical support in critical challenge areas, Student gets a degree/publications/potential future employment, University and professor fulfill mission and gain a strong supporter. Critical Skills Master’s Program (CSMP): University of Illinois Similar to old One-Year-on-Campus-Program NDA’s: USU, CSU, UIUC (in progress)