Constructing an Anonymous Dataset From the Personal Digital Photo Libraries of Mac App Store Users JCDL 2013 Jesse P. Gozali, Min-Yen Kan, Hari Sundaram.

Presentation transcript:

Constructing an Anonymous Dataset From the Personal Digital Photo Libraries of Mac App Store Users
JCDL 2013
Jesse P. Gozali, Min-Yen Kan, Hari Sundaram
National University of Singapore, Arizona State University
Slides Available:

COLLECTING PERSONAL DATA FOR RESEARCH
Research on personal digital photo libraries needs access to real data.
The personal nature of such data, especially where photos are involved, makes accessing large datasets difficult, let alone creating a publicly available one.
Past research requiring such data has resorted to using the researchers' own photos or to soliciting volunteers with monetary remuneration.

CROWDSOURCING?
How can we reach out to a large number of potential volunteers?
Crowdsourcing platforms (e.g. Amazon Mechanical Turk) are useful for gathering human judgements, as long as precautions are taken (qualification tasks, verification questions, fake-data filtering).
However:
Annotations on the data must be made by the photo owners, not third-party evaluators, because of the semantic gap between the photos and the events they represent.
The motivation is monetary; MTurk participants may not be target users.

APP STORES
A solution: application stores.
Widely used for mobile applications (e.g. Android Marketplace, Apple's App Store), but also for desktop applications (Valve's Steam, Apple's Mac App Store, Microsoft's Windows Store).
Large user base with high download rates.
They help application developers manage the purchase, distribution, updating, and publicity of their applications.

DATASET CONSTRUCTION
We conducted a study using the Mac App Store (MAS) to alleviate the issues of cost and of reaching potential participants when constructing a dataset.
We published a photo browser application, Chaptrs ver. 2, on MAS and invited users to participate in the study (opt-in), expanding on our work presented in JCDL 2012.

CHAPTRS Photo Browser (ver. 2)

CONS IN USING MAS
The application needs to have a useful purpose for the user. Its main purpose cannot be collecting data; for us, the main purpose is a chapter-based photo browser. This is a necessary overhead, just like qualification tasks and verification questions in MTurk.
The application needs to undergo a review process, usually 1-2 weeks, but longer if complications arise (resubmission, appeals to the review board).

PROS IN USING MAS
Cost doesn't scale with the number of participants or the amount of data collected. Cost is attributed only to the 99 USD / year fee.
Cost is lower than reported by previous work with MTurk:
If we consider the 20,778 photo sets (473,772 photos) we collected in 60 days, cost is USD per photo.
If we consider the 60 photo sets (8,107 photos) with chapter boundary annotations, cost is USD per annotation.
Visibility is high: the total number of downloads in the 60 days of the study is 2,549 (42 per day).
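The per-photo and per-annotation figures did not survive in this transcript, and the sketch below does not try to reconstruct them. It only illustrates the kind of arithmetic behind the slide, under the assumption (not stated in the source) that the whole 99 USD annual fee is charged to the 60-day study. The variable names are illustrative.

```python
# Figures quoted on the slide.
ANNUAL_FEE_USD = 99.0
PHOTO_SETS = 20_778
PHOTOS = 473_772
DOWNLOADS = 2_549
STUDY_DAYS = 60

# Assumption: the full annual fee is attributed to the study period.
# The authors may have prorated or accounted for cost differently.
cost_per_photo = ANNUAL_FEE_USD / PHOTOS
cost_per_photo_set = ANNUAL_FEE_USD / PHOTO_SETS

# Download rate; this one can be checked against the slide's "42 per day".
downloads_per_day = DOWNLOADS / STUDY_DAYS

print(f"{cost_per_photo:.6f} USD per photo")
print(f"{cost_per_photo_set:.6f} USD per photo set")
print(f"{downloads_per_day:.1f} downloads per day")
```

Because the fee is fixed, both per-unit costs shrink as more data is collected, which is the point the slide makes about cost not scaling with participation.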

CHAPTRS DATASET
The dataset we constructed has anonymous photo features, corresponding to those used in our event photo stream segmentation algorithm: time gap, aperture diameter, log light (scene brightness), and an 8-bin color histogram.
It contains 20,778 photo sets (473,772 photos), including 60 photo sets (8,107 photos) with chapter boundaries annotated by the photos' authors.
The dataset can be expanded to include other anonymous photo features.
It is released as a publicly available dataset to further research in personal digital photo libraries.
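The slides do not define the eight histogram bins. One plausible reading, given the later mention of black, white, and six remaining colors, is that each RGB channel is split into a low and a high half, giving the eight corners of the RGB cube. The sketch below follows that assumption; the bin names, the threshold of 128, and the function name are all hypothetical and not taken from the dataset.

```python
# Assumed bin layout: one bit per channel (R, G, B), high half = 1.
BIN_NAMES = ["black", "blue", "green", "cyan",
             "red", "magenta", "yellow", "white"]

def eight_bin_histogram(pixels, threshold=128):
    """Return a normalized 8-bin color distribution.

    pixels: iterable of (r, g, b) tuples with values in 0..255.
    threshold: assumed split point between the low and high half.
    """
    counts = [0] * 8
    total = 0
    for r, g, b in pixels:
        index = ((r >= threshold) << 2) | ((g >= threshold) << 1) | (b >= threshold)
        counts[index] += 1
        total += 1
    if total == 0:
        return [0.0] * 8
    return [c / total for c in counts]

# Tiny example: one black, one white, one red, and one blue pixel.
example = eight_bin_histogram([(0, 0, 0), (255, 255, 255), (255, 0, 0), (0, 0, 255)])
for name, share in zip(BIN_NAMES, example):
    print(f"{name}: {share:.2f}")
```

A distribution like this carries no recoverable image content, which is consistent with the slide's description of the features as anonymous.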

COLOR DISTRIBUTION
The dataset has an 8-bin color distribution for each photo.
We clustered these distributions with k-means for k up to 9 and found that k=6 gave the optimal BIC score.
Clusters 1, 4, 5, and 6 show different ratios of white to black, while the ratios of the remaining 6 colors stay fairly constant.
Cluster 2 shows the representative color distribution for blue/cyan-colored photos.
Cluster 3 shows the representative color distribution for red/yellow-colored photos.
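The model-selection step can be sketched as follows. The slides do not say which BIC formulation or k-means initialization was used, so this example substitutes a common approximation, n * ln(SSE / n) + k * d * ln(n), where lower is better, together with a deterministic farthest-point initialization. Both choices and all function names are assumptions.

```python
import math

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, iters=50):
    """Plain Lloyd's algorithm with farthest-point initialization."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centers)))
    for _ in range(iters):
        groups = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda i: sq_dist(p, centers[i]))
            groups[nearest].append(p)
        # Empty clusters keep their previous center.
        new_centers = [
            tuple(sum(col) / len(g) for col in zip(*g)) if g else centers[i]
            for i, g in enumerate(groups)
        ]
        if new_centers == centers:
            break
        centers = new_centers
    sse = sum(min(sq_dist(p, c) for c in centers) for p in points)
    return centers, sse

def choose_k(points, max_k):
    """Score k = 1..max_k with an approximate BIC (assumed form) and pick the lowest."""
    n, d = len(points), len(points[0])
    scores = {}
    for k in range(1, max_k + 1):
        _, sse = kmeans(points, k)
        scores[k] = n * math.log(max(sse, 1e-12) / n) + k * d * math.log(n)
    return min(scores, key=scores.get), scores

# Toy data: two tight groups of 2-D points (the dataset itself uses 8-D histograms).
toy = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)]
best_k, bic = choose_k(toy, max_k=3)
print(best_k, bic)
```

For the real data, each point would be a photo's 8-bin distribution and `max_k` would be 9, matching the range described on the slide.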

PHOTO TAKING BURSTS
A photo-taking burst is a sequence of photos (more than one photo) taken in succession with an average time gap of t seconds.
To be reasonably referred to as a burst, t should be a small value. However, to be thorough, we identified bursts for t from 1.1 seconds to 96,000 seconds (about 26.7 hours).
Most bursts had an average time gap of 9.3 seconds, with ~3 photos on average.
The largest average number of photos per burst is 4 photos, at an average time gap of 1.1 seconds.
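The slide defines a burst by its average time gap but does not spell out how bursts were extracted. A minimal sketch, assuming a burst is a maximal run of consecutive photos whose individual gaps are all at most a chosen limit, could look like this. The function name and the per-gap threshold rule are assumptions; the input is a list of gaps between consecutive photos, since the dataset stores time gaps and not absolute timestamps.

```python
def find_bursts(gaps, max_gap):
    """Group consecutive photos into bursts.

    gaps: gaps[i] is the time in seconds between photo i and photo i + 1.
    max_gap: assumed upper limit on a gap inside a burst.
    Returns a list of (photo_count, average_gap) tuples.
    """
    bursts = []
    run = []
    for gap in gaps:
        if gap <= max_gap:
            run.append(gap)
            continue
        if run:
            # m gaps inside a run join m + 1 photos.
            bursts.append((len(run) + 1, sum(run) / len(run)))
        run = []
    if run:
        bursts.append((len(run) + 1, sum(run) / len(run)))
    return bursts

example_gaps = [1.0, 2.0, 300.0, 1.5, 4000.0, 0.5, 0.5, 0.5]
example_bursts = find_bursts(example_gaps, max_gap=10.0)
print(example_bursts)
```

Sweeping `max_gap` over a range of values and summarizing the resulting burst sizes would give statistics of the kind reported on the slide.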

LOG LIGHT (BRIGHTNESS)
The histogram of log light values (a measure of scene brightness) has two peaks and fits a two-component Gaussian mixture.
While we do not have access to the absolute timestamps of the photos, these peaks may correspond to daytime (left mixture) and night-time (right mixture) photos.
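Fitting a two-component Gaussian mixture to one-dimensional values is typically done with expectation-maximization (EM). The slides do not describe the fitting procedure, so the following is a generic EM sketch on synthetic numbers, not the authors' method; the initialization (means at the minimum and maximum value) and the variance floor are assumptions.

```python
import math

def normal_pdf(x, mu, var):
    return math.exp(-((x - mu) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)

def fit_two_gaussians(xs, iters=100, min_var=1e-6):
    """EM for a 1-D mixture of two Gaussians. Returns (weights, means, variances)."""
    n = len(xs)
    mean = sum(xs) / n
    var = max(sum((x - mean) ** 2 for x in xs) / n, min_var)
    mus = [min(xs), max(xs)]      # assumed initialization
    variances = [var, var]
    weights = [0.5, 0.5]
    for _ in range(iters):
        # E-step: responsibility of each component for each value.
        resp = []
        for x in xs:
            p = [weights[j] * normal_pdf(x, mus[j], variances[j]) for j in range(2)]
            total = sum(p) or 1e-300
            resp.append([pj / total for pj in p])
        # M-step: re-estimate weight, mean, and variance per component.
        for j in range(2):
            nj = max(sum(r[j] for r in resp), 1e-12)
            mus[j] = sum(r[j] * x for r, x in zip(resp, xs)) / nj
            variances[j] = max(
                sum(r[j] * (x - mus[j]) ** 2 for r, x in zip(resp, xs)) / nj, min_var
            )
            weights[j] = nj / n
    return weights, mus, variances

# Synthetic stand-in for log light values: two well-separated groups.
values = [-1.0, 0.0, 1.0] * 10 + [9.0, 10.0, 11.0] * 10
mix_weights, mix_means, mix_vars = fit_two_gaussians(values)
print(mix_weights, mix_means, mix_vars)
```

On real log light values the two fitted means would mark the two histogram peaks, and the weights would estimate the share of photos in each mode.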

CONCLUSION
First study on chapter-based photo organization
Unsupervised method for event photo stream segmentation, embedded into...
Released a freely available chapter-based photo browser
Released a publicly available dataset for photo organization research
Outlined a data collection method to reach personal digital photo libraries using the Mac App Store (MAS) as a distribution platform and released the dataset to the research community
Dataset Available: