Detecting Sexually Provocative Images


1 Detecting Sexually Provocative Images
Debashis Ganguly, Mohammad H. Mofrad, Adriana Kovashka
Department of Computer Science, University of Pittsburgh
Hi, I am Debashis Ganguly from the University of Pittsburgh. This is joint work with my colleague Mohammad and Professor Kovashka.

2 Real Life Challenges
There is an overwhelming amount of visual data on the Internet.
Parents may want to restrict the visual content their children can see.
Digital content administrators invest a lot of manual effort in classifying images into age-restricted categories.
The abundance of visual content on the Internet, and the easy access every user has to it, poses many challenges for content management. For example, if a parent wants to restrict what his or her child can see, web content needs to be tagged as offensive or not. Currently, automatic filtering is far from perfect and often lets offensive imagery into the “safe zone” results. As a result, web content owners spend a lot of manual effort classifying images into age-restricted categories.

3 Limitations of Existing Approaches
Existing approaches detect pornographic content based on the percentage of skin area exposed by the subjects in an image. Skin detection algorithms are among the most basic methods for filtering sexual content in images.
Researchers have been working on this problem since the early 2000s:
Jiao et al., “Detecting adult image using multiple features”, Info-tech and Info-net, 2001
Duan et al., “Adult image detection method based on skin color model and support vector machine”, Asian Conference on Computer Vision, 2002
Zheng et al., “Shape based adult image detection”, International Journal on Image and Graphics, 2006
Lee et al., “Naked image detection based on adaptive and extensible skin color model”, Pattern Recognition, 2007

4 Limitations of Existing Approaches (contd.)
Current methods cannot differentiate between pornographic content, a portrait, and a harmless body shot like the ones below.
Existing work pays little attention to the intention behind the image composition and to the goals of the photographic subject with respect to how the photo should be perceived. These methods cannot differentiate between “naked” subjects and “sexual” subjects. For example, they would fail to distinguish between a portrait like the “Mona Lisa”, a harmless photo of a bodybuilder, and a person posing with sexual intent. This is mostly because they rely on the ratio of skin extracted from the image.

5 Approach: Identifying Features
17 types of attributes, composed from:
Posture and gesture: posture, gesture with fingers, movement, head position, direction of body and face relative to camera, etc.
Facial expression: mouth open or closed, type of smile, biting lips, eyebrows, eyelids, looking direction
Scene context: outdoor scene, outdoor events, indoor scenes with props or with flat background
Skin exposure: fully clothed, bare bodied, private body parts exposed
5 types of moods and emotions: defensive, suggestive, playful, relaxed, upset
3 sexual intents: yes, maybe, no
We identify 17 different attributes based on posture, gesture, facial expression, scene context and skin exposure. Earlier work suggested that such attributes can help identify the intent behind an image. We also identify the moods and emotions of the subject in order to predict the sexual intent.

6 Hierarchical Framework
[Figure: hierarchy diagram. From bottom to top: automatically extracted features (Color/SIFT/HOG/FC6/FC7/FC8); attributes grouped into posture and gesture, facial expressions, skin exposure and image background; moods and emotions (D, S, P, R, U); sexual intent (Y, M, N).]
We propose a hierarchical framework. Low-level features such as CNN FC7 (extracted with Caffe) or SIFT (extracted with VLFeat) are first computed from the input images. A set of multi-class SVMs is then trained on these features to learn each attribute (remember we have 17 attributes), then from attributes to moods and emotions, and finally from moods to the global classes of sexual intent. Knowledge is propagated through the layers of the hierarchical framework:
1st level: SVMs are trained on automatically extracted features (e.g. CNN FC7, SIFT) to predict attributes.
2nd level: SVMs are trained on the 17 attributes to predict moods and emotions.
3rd level: SVMs are trained on the 5 moods and emotions to predict the global sexual intent.
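The three levels can be sketched in a few lines of plain Python. This is only an illustration of how predictions flow through the hierarchy, under several assumptions that are not in the slides: a tiny nearest-centroid classifier stands in for the multi-class SVMs, mood is treated as a single label per image, upper levels are trained on annotated labels and fed predicted labels at test time, and all names (`NearestCentroid`, `HierarchicalIntentModel`, the toy attributes) are hypothetical.

```python
from collections import defaultdict


class NearestCentroid:
    """Stand-in classifier (NOT an SVM): predicts the class with the closest mean vector."""

    def fit(self, X, y):
        sums, counts = {}, defaultdict(int)
        for x, label in zip(X, y):
            acc = sums.setdefault(label, [0.0] * len(x))
            for i, v in enumerate(x):
                acc[i] += v
            counts[label] += 1
        self.centroids = {c: [v / counts[c] for v in s] for c, s in sums.items()}
        return self

    def predict(self, X):
        def dist(a, b):
            return sum((p - q) ** 2 for p, q in zip(a, b))
        return [min(self.centroids, key=lambda c: dist(x, self.centroids[c])) for x in X]


def one_hot(labels, classes):
    """Encode each label as a 0/1 vector over the known classes."""
    return [[1.0 if lab == c else 0.0 for c in classes] for lab in labels]


class HierarchicalIntentModel:
    def fit(self, features, attributes, moods, intents):
        # Level 1: one classifier per attribute, trained on low-level features.
        self.attr_names = sorted(attributes)
        self.attr_classes = {n: sorted(set(attributes[n])) for n in self.attr_names}
        self.attr_models = {
            n: NearestCentroid().fit(features, attributes[n]) for n in self.attr_names
        }
        # Level 2: moods from attribute labels (assumption: annotated labels at training time).
        self.mood_classes = sorted(set(moods))
        self.mood_model = NearestCentroid().fit(self._encode_attrs(attributes), moods)
        # Level 3: sexual intent from mood labels.
        self.intent_model = NearestCentroid().fit(one_hot(moods, self.mood_classes), intents)
        return self

    def _encode_attrs(self, attributes):
        # Concatenate the one-hot encodings of all attributes, one row per image.
        n_images = len(next(iter(attributes.values())))
        rows = [[] for _ in range(n_images)]
        for name in self.attr_names:
            for row, vec in zip(rows, one_hot(attributes[name], self.attr_classes[name])):
                row.extend(vec)
        return rows

    def predict(self, features):
        # Predictions of each level become the inputs of the next level.
        attrs = {n: m.predict(features) for n, m in self.attr_models.items()}
        moods = self.mood_model.predict(self._encode_attrs(attrs))
        return self.intent_model.predict(one_hot(moods, self.mood_classes))


# Toy data, invented for illustration only (2-D "features", two attributes).
toy_features = [[0.0, 0.1], [0.1, 0.0], [1.0, 0.9], [0.9, 1.0]]
toy_attributes = {
    "skin_exposure": ["clothed", "clothed", "bare", "bare"],
    "smile": ["none", "none", "wide", "wide"],
}
toy_moods = ["relaxed", "relaxed", "suggestive", "suggestive"]
toy_intents = ["no", "no", "yes", "yes"]

model = HierarchicalIntentModel().fit(toy_features, toy_attributes, toy_moods, toy_intents)
print(model.predict([[0.05, 0.05], [0.95, 0.95]]))  # ['no', 'yes']
```

In a real setting each `NearestCentroid` would be replaced by a multi-class SVM and the 2-D toy vectors by FC7 or SIFT descriptors; the level-to-level wiring is the only part this sketch is meant to convey.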

7 Experiments: Dataset
1,146 celebrity images
203 Hollywood celebrities from people.com
892 images of female subjects and 254 images of male subjects
About 5.6 images per person
19 questions per image for annotation
Annotated on Amazon Mechanical Turk by majority voting of 3 annotators per image
70.5% annotator consensus
We introduce a new dataset of 1,146 images. The novelty of the dataset is in its annotations: we identified 19 relevant questions that can help develop insight into the sexual intent behind images.
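Majority voting over 3 annotators can be sketched as follows. The slides do not say how a three-way split (for example yes / maybe / no) is resolved, nor exactly how the 70.5% consensus figure is defined, so two things here are assumptions: ties return `None`, and `full_agreement_rate` is just one possible reading of "consensus". The function names and example votes are hypothetical.

```python
from collections import Counter


def majority_label(votes):
    """Return the label given by a strict majority of annotators, or None if there is none."""
    label, count = Counter(votes).most_common(1)[0]
    return label if count > len(votes) / 2 else None


def full_agreement_rate(all_votes):
    """Fraction of items on which every annotator gave the same label (one reading of 'consensus')."""
    agreed = sum(1 for votes in all_votes if len(set(votes)) == 1)
    return agreed / len(all_votes)


# Hypothetical answers to one question for three images, 3 annotators each.
example_votes = [
    ["yes", "yes", "no"],
    ["no", "no", "no"],
    ["yes", "maybe", "no"],  # three-way split: no strict majority
]
labels = [majority_label(v) for v in example_votes]
print(labels)  # ['yes', 'no', None]
print(full_agreement_rate(example_votes))
```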

8 Experiments: Baseline
Automatically extracted features
Low-level features: color histogram, SIFT, HOG using VLFeat
CaffeNet features: FC6, FC7, FC8 using Caffe
Direct model: a single level of classification, trained on automatically extracted features to predict sexual intent
Joo et al., “Visual persuasion: inferring communicative intents of images”, CVPR 2014: a subset of features, mapped based on relevance to our problem domain
This is a method for predicting the intent behind photographs of politicians.

9 Results: Overview
In general, the hierarchical framework has a higher F-measure than the corresponding direct model. FC7 performs best, ahead of the other low-level features and the baseline of Joo et al.
Precision = TP / (TP + FP)
Sensitivity (true positive rate) = Recall = TP / P = TP / (TP + FN)
F-measure = (2 * Precision * Recall) / (Precision + Recall)
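The three formulas on this slide translate directly into code. The sketch below assumes a binary setting (provocative vs. not); how the yes / maybe / no classes were combined for the reported scores is not stated in the slides. The function name and the counts in the usage line are hypothetical.

```python
def precision_recall_f(tp, fp, fn):
    """Compute precision, recall and F-measure from raw counts.

    Returns 0.0 for any ratio whose denominator would be zero.
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    total = precision + recall
    f_measure = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_measure


# Made-up counts: 8 true positives, 2 false positives, 8 false negatives.
precision, recall, f_measure = precision_recall_f(tp=8, fp=2, fn=8)
print(precision, recall, round(f_measure, 4))  # 0.8 0.5 0.6154
```

Note that true negatives do not appear anywhere in these formulas, which is why the F-measure can differ between two models even when their accuracy is similar.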

10 Results: Overview
In general, the hierarchical framework has accuracy comparable to the corresponding direct model.
Accuracy = (TP + TN) / (TP + TN + FP + FN)
The accuracy of the hierarchical framework is comparable to that of the direct approach (our rival), but at the same time it achieves a higher F-measure than the direct model, which shows the advantage of the hierarchical framework for the task of sexual content screening.

11 Results: Overview
In general, the hierarchical framework has higher sensitivity than the corresponding direct model. Higher sensitivity means fewer sexually provocative images are missed.
Sensitivity (true positive rate) = Recall = TP / P = TP / (TP + FN)

12 Results: Overview
In general, the hierarchical framework has lower specificity than the corresponding direct model. Lower specificity means more false positives, i.e. non-provocative images classified as sexually provocative. In a real-world application of this model, lower specificity requires a manual refinement step to pick the non-sexual content out of everything classified as sexual, which is less problematic than lower sensitivity.
Specificity (true negative rate) = TN / N = TN / (TN + FP)
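The trade-off described on this slide can be made concrete with two confusion matrices. The counts below are invented purely for illustration and are not results from the paper; `Confusion`, `model_a` and `model_b` are hypothetical names. They show how two models can share the same accuracy while one trades specificity for sensitivity.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Confusion:
    tp: int
    fp: int
    tn: int
    fn: int

    @property
    def accuracy(self):
        return (self.tp + self.tn) / (self.tp + self.tn + self.fp + self.fn)

    @property
    def sensitivity(self):
        # True positive rate: share of provocative images that are caught.
        return self.tp / (self.tp + self.fn)

    @property
    def specificity(self):
        # True negative rate: share of harmless images that are left alone.
        return self.tn / (self.tn + self.fp)

    @property
    def flagged(self):
        # Images sent to manual review (everything classified as provocative).
        return self.tp + self.fp

    @property
    def missed(self):
        # Provocative images that slip through unflagged.
        return self.fn


# Invented counts over 300 images (100 provocative, 200 harmless).
model_a = Confusion(tp=40, fp=20, tn=180, fn=60)  # lower sensitivity, higher specificity
model_b = Confusion(tp=70, fp=50, tn=150, fn=30)  # higher sensitivity, lower specificity

for name, m in (("A", model_a), ("B", model_b)):
    print(name, round(m.accuracy, 3), m.sensitivity, m.specificity, m.flagged, m.missed)
```

With these made-up numbers, model B sends twice as many images to manual review (120 vs. 60) but lets through half as many provocative ones (30 vs. 60), which is the kind of trade-off the slide argues is acceptable for content screening.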

13 Conclusion
Our method enables automated content classification based on the behaviors and intents of the portrayed subjects. Integrated with mobile apps, social media websites, and media streaming websites, it allows prompt intervention by human experts.
Please come see our poster for more info.

