Show suggestions and borderlines Hierarchical Clustering

Interactive Machine Learning at Scale with CHISSL
Dustin Arendt (1), Emily Grace (2), and Svitlana Volkova (2)
(1) Visual Analytics Group, (2) Data Science and Analytics Group

Structured Input
Boston dataset [1]. Task: housing price (regression). 14 hand-engineered features, visualized using radar glyphs. The distribution shown is the median housing price.

CHISSL: Computer-Human Interaction Semi-Supervised Learning

What?
CHISSL is an O(n) incremental transductive learning algorithm and user interface that lets a user rapidly organize unlabeled data into groups defined by example. The model learns the user's intent and makes recommendations, and it can be corrected in a feedback loop. The user can export the recommendations as labels to train the model.

How?
[Figure: client/server loop. Stages shown: representation, hierarchical clustering, parent pointer array, 1-NN classifier, show suggestions and borderlines, user provides label, train inductive model, export transduction.]

Text Input
VAST Challenge 2014 text dataset [2]. Task: text exploration by selecting keywords and coloring by topic. The distribution shown is the date of publication.

Tip of the Iceberg
Only a few carefully chosen representative instances are shown per group to avoid overwhelming the user.

Drag and Drop
Users train the model by dragging an instance to the group they feel it is most similar to.

Responsive
The model learns from the user's example after each interaction and re-predicts within milliseconds.

User Friendly
Users can choose what to group, how many groups there are, and what the groups mean.

ABOUT
Pacific Northwest National Laboratory
The Pacific Northwest National Laboratory, located in southeastern Washington State, is a U.S. Department of Energy Office of Science laboratory that solves complex problems in energy, national security, and the environment, and advances scientific frontiers in the chemical, biological, materials, environmental, and computational sciences.
The Laboratory employs nearly 5,000 staff members, has an annual budget in excess of $1 billion, and has been managed by Ohio-based Battelle since 1965.

For more information on the science you see here, please contact:
Dustin Arendt
Pacific Northwest National Laboratory
Richland, WA 99352
(509)

Sequence Input
VAST Challenge 2014 GPS dataset [2]. Task: patterns of life. An icon encodes the predominant activity (e.g., home, work, food) for each hour of one day. The distribution shown is the date the sequence occurred.

Dynamic Graph Input
VAST Challenge dataset [2]. Task: sub-network clustering. Instances are day × ego-network pairs, and the features are the degree distribution of the sub-network.

Why?
You can't crowdsource every problem: it requires domain expertise, experts are rare and their time is valuable, and the classes are not well-defined or known ahead of time.
CHISSL is better than active learning because the user chooses what to label, can refine the task on the fly, and can understand model performance in real time.

Next?
Evaluation against active learning: computational study and user study.
Applications: trajectories and time series; cybersecurity and insider threat.
Engineering: enrichment of streaming data.

PNNL-SA-#####
Datasets
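The "What?" and "How?" panels describe CHISSL as an O(n) transductive learner built around a hierarchical clustering stored as a parent pointer array plus a 1-NN classifier, but the poster does not spell out how user labels spread through that structure. The sketch below is one hypothetical way to get O(n) transduction from such an array, not CHISSL's actual code. It assumes `parent[i]` holds the index of node i's parent (or -1 for a root) and gives every node the label of its nearest user-labeled node, measured in tree hops, using a multi-source breadth-first search. The names `transduce`, `parent`, and `labels` are illustrative.

```python
from collections import deque
from typing import Dict, List, Optional


def transduce(parent: List[int], labels: Dict[int, str]) -> List[Optional[str]]:
    """Give every node the label of its nearest labeled node in the tree.

    Hypothetical sketch: parent[i] is node i's parent index, or -1 for a root.
    Distance is the number of tree edges (hops). Runs in O(n) time.
    Nodes in a subtree with no labeled node stay None.
    """
    n = len(parent)
    # Build an undirected adjacency list from the parent pointers.
    adjacency: List[List[int]] = [[] for _ in range(n)]
    for child, p in enumerate(parent):
        if p >= 0:
            adjacency[child].append(p)
            adjacency[p].append(child)

    result: List[Optional[str]] = [None] * n
    queue: deque = deque()
    # Seed the search with every user-labeled node (all at distance 0).
    for node, label in labels.items():
        result[node] = label
        queue.append(node)

    # Expand outward one hop at a time; the first label to reach a node
    # comes from one of its nearest labeled nodes (ties go to the earlier seed).
    while queue:
        node = queue.popleft()
        for neighbor in adjacency[node]:
            if result[neighbor] is None:
                result[neighbor] = result[node]
                queue.append(neighbor)
    return result
```

For example, with `parent = [-1, 0, 0, 1, 1, 2, 2]` and `labels = {3: "A", 6: "B"}`, the left subtree (nodes 1, 3, 4) becomes "A" and the right subtree (nodes 2, 5, 6) becomes "B". The root is two hops from each label and goes to "A" only because node 3 was seeded first. Every node and edge is touched a constant number of times, so re-running this after each new label stays linear in n.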

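The "Tip of the Iceberg" panel says only a few representative instances are shown per group, and the diagram mentions showing suggestions and "borderlines", but the poster does not say how either is chosen. As a hypothetical illustration only, the sketch below ranks each group's members by Euclidean distance to the group centroid, and treats as borderline the instances whose two nearest group centroids are almost equally far away. Both heuristics and all function names are assumptions.

```python
import math
from typing import Dict, List, Sequence

Vector = Sequence[float]


def centroids(data: List[Vector], groups: List[str]) -> Dict[str, List[float]]:
    """Mean feature vector of each group (hypothetical helper)."""
    sums: Dict[str, List[float]] = {}
    counts: Dict[str, int] = {}
    for x, g in zip(data, groups):
        acc = sums.setdefault(g, [0.0] * len(x))
        for d, value in enumerate(x):
            acc[d] += value
        counts[g] = counts.get(g, 0) + 1
    return {g: [s / counts[g] for s in acc] for g, acc in sums.items()}


def representatives(data: List[Vector], groups: List[str], k: int = 2) -> Dict[str, List[int]]:
    """Up to k instance indices per group, closest to that group's centroid first."""
    centers = centroids(data, groups)
    members: Dict[str, List[int]] = {}
    for i, g in enumerate(groups):
        members.setdefault(g, []).append(i)
    return {
        g: sorted(idx, key=lambda i: math.dist(data[i], centers[g]))[:k]
        for g, idx in members.items()
    }


def borderlines(data: List[Vector], groups: List[str], k: int = 2) -> List[int]:
    """Indices of the k instances with the smallest gap between their two nearest centroids.

    Assumes at least two groups exist.
    """
    centers = list(centroids(data, groups).values())
    margins = []
    for i, x in enumerate(data):
        dists = sorted(math.dist(x, c) for c in centers)
        margins.append((dists[1] - dists[0], i))
    return [i for _, i in sorted(margins)[:k]]
```

With one-dimensional data `[(0.0,), (1.0,), (2.0,), (10.0,), (11.0,), (12.0,)]` split into groups "a" and "b" of three each, the representatives are `{"a": [1, 0], "b": [4, 3]}` and the two borderline instances are 2 and 3, the points facing the other group. A centroid rule is only one plausible choice; a model with a real notion of confidence could rank suggestions differently.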

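The "Sequence Input" panel summarizes each GPS sequence as one icon per hour showing the predominant activity (home, work, food, and so on). A minimal sketch of that summarization, assuming the GPS fixes have already been mapped to `(hour, activity)` pairs for one day and that "predominant" means most frequent within the hour. The input format and the name `hourly_activity` are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Optional, Tuple

# Hypothetical input: (hour of day 0-23, activity label) pairs for one person
# on one day, after GPS fixes have already been mapped to activities.
Observation = Tuple[int, str]


def hourly_activity(observations: List[Observation]) -> List[Optional[str]]:
    """Most frequent activity for each of the 24 hours, or None for hours with no data."""
    per_hour: Dict[int, Counter] = {}
    for hour, activity in observations:
        if not 0 <= hour < 24:
            raise ValueError(f"hour out of range: {hour}")
        per_hour.setdefault(hour, Counter())[activity] += 1
    # Ties go to the activity seen first that hour
    # (Counter.most_common orders equal counts by first insertion).
    return [
        per_hour[h].most_common(1)[0][0] if h in per_hour else None
        for h in range(24)
    ]
```

The resulting 24-element list corresponds to one row of hourly icons, and it could also be one-hot encoded per hour if a numeric feature vector is needed.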

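The "Dynamic Graph Input" panel builds one instance per day × ego-network pair, with the degree distribution of the sub-network as its features. A minimal sketch, assuming the sub-network is the radius-1 ego network (the ego plus its direct neighbors, keeping only edges among them) and that degrees are binned into a fixed-length, normalized histogram so every instance has the same number of features. The function name and the `max_degree` cap are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

Edge = Tuple[str, str]


def ego_degree_distribution(edges: Iterable[Edge], ego: str, max_degree: int = 8) -> List[float]:
    """Normalized degree histogram of ego's radius-1 ego network for one day.

    Hypothetical sketch: `edges` are that day's undirected edges. Degrees of
    max_degree - 1 or more share the last bin so every vector has equal length.
    """
    neighbors: Dict[str, Set[str]] = defaultdict(set)
    for u, v in edges:
        if u != v:  # ignore self-loops
            neighbors[u].add(v)
            neighbors[v].add(u)

    # The ego network is the ego plus its direct neighbors.
    members: Set[str] = {ego} | neighbors[ego]
    histogram = [0.0] * max_degree
    for node in members:
        # Count only edges whose endpoints both lie inside the ego network.
        degree = len(neighbors[node] & members)
        histogram[min(degree, max_degree - 1)] += 1
    total = sum(histogram)  # always >= 1 because the ego is a member
    return [count / total for count in histogram]
```

For the edges a-b, a-c, b-c, c-d, d-e, the ego network of "c" is {a, b, c, d} with internal degrees 2, 2, 3, and 1, so with `max_degree=4` the feature vector is `[0.0, 0.25, 0.5, 0.25]`. Normalizing by the number of members keeps ego networks of different sizes comparable.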