1
The Web Media Verification Challenge
Olga Papadopoulou, Markos Zampoglou, Symeon Papadopoulos, Yiannis Kompatsiaris
1st International School on “Learning from Signals, Images, and Video”, Thessaloniki, July 2019
2
Social media as news source and misinformation
Verifiably false or misleading information
Created, presented and disseminated for economic gain or to intentionally deceive the public
May cause public harm
“The weaponisation of on-line fake news and disinformation poses a serious security threat to our societies” (EC)
“Fake news represents a danger to democracy” (Eurobarometer study)
3
Multimedia Knowledge and Social Media Analysis Lab (MKLab)
Personnel: 3 senior researchers, 20+ post-doc researchers, 40+ research assistants, 6 PhD candidates
Research projects: 20 H2020 (coordinating 4), 8 national projects
Industry: Infalia spin-off, contracts (e.g. Motorola UK-US)
Publications: 146 journal articles, 8 patents, 437 conference papers
Events: MMM 2019, IVMSP 2018, Internet Science 2017, ESSIR 2015
Open-source tools and datasets available
Research areas: computer vision, semantic technologies, social media and big data, IoT/sensors, brain-computer interfaces
Applications: media, culture, security, smart cities, eHealth
4
Media verification activities
Areas: image tampering, social media mining, video verification, deep fakes
Projects: WeVerify (ongoing), TENSOR (ongoing), InVID, REVEAL, SocialSensor
Results:
- Tools: Image Verification Assistant, Tweet Verification Assistant, Context Aggregation and Analysis
- Datasets: Fake Video Corpus, Tweet verification corpus, Wild Web tampered image dataset
5
The many faces of disinformation
6
Three Challenges
1. Tampered image detection: use image forensics output to spot digitally manipulated images
2. Contextual video verification: leverage video metadata to produce a credibility score for the input video
3. Verification-oriented comment detection: build a classifier to easily select comments that are useful for verification
7
#1 Tampered image detection
8
Tampered image detection
[Figure: example tampered image, ground truth mask, forensic results, forensic results (colorized)]
9
Forensic output fusion: challenges
Different output styles depending on the algorithm
Inconsistent performance: not all algorithms work in every case
Distracting results on non-detections
Lack of large-scale datasets
10
Datasets:
- Columbia Uncompressed Image Splicing Detection Evaluation Dataset: 180 tampered / 180 untampered
- 1st IFS-TC Forensics Challenge: 447 tampered / 447 untampered
- Realistic Tampering Dataset: 220 tampered / 220 untampered
Split:
- Training: 718 tampered / 718 untampered
- Test: 128 tampered / 128 untampered

Baseline performance:
             Precision  Recall  F1-score
Untampered   0.68       0.86    0.76
Tampered     0.81       0.59    0.68
Average      0.74       0.73    0.72
11
Baseline solution and ideas
Baseline pipeline (see the code sketch below):
- Threshold the forensic map (T = 190)
- Extract connected components (with morphological clean-up)
- Feature extraction: per-component area, height, width and perimeter for the largest components, number of components, image moments
- Build a fixed-length feature vector from the statistics of the 3 largest components
- Decision tree classifier
Ideas:
- Better features (keypoints, CNN activations)
- Better classifiers (end-to-end DCNNs)
- Others (link to object detection / semantic segmentation outputs)
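A minimal Python sketch of this baseline, assuming the forensic output is available as an 8-bit grayscale heatmap. The function names, the morphology kernel and the use of Hu moments as "image moments" are illustrative choices, not the exact published implementation.

```python
# Sketch of the baseline: threshold the forensic heatmap, clean it up,
# describe its largest connected components, and classify with a decision tree.
import cv2
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def heatmap_features(heatmap, threshold=190, top_k=3):
    """Threshold the forensic heatmap and describe its top_k largest blobs."""
    _, mask = cv2.threshold(heatmap, threshold, 255, cv2.THRESH_BINARY)
    # Morphological opening removes small spurious detections.
    kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (5, 5))
    mask = cv2.morphologyEx(mask, cv2.MORPH_OPEN, kernel)

    n_labels, labels, stats, _ = cv2.connectedComponentsWithStats(mask)
    # Skip label 0 (background); sort components by area, largest first.
    comps = sorted(range(1, n_labels),
                   key=lambda i: stats[i, cv2.CC_STAT_AREA], reverse=True)

    feats = [float(n_labels - 1)]          # number of connected components
    for i in comps[:top_k]:
        comp_mask = (labels == i).astype(np.uint8)
        contours, _ = cv2.findContours(comp_mask, cv2.RETR_EXTERNAL,
                                       cv2.CHAIN_APPROX_SIMPLE)
        perimeter = cv2.arcLength(contours[0], True) if contours else 0.0
        hu = cv2.HuMoments(cv2.moments(comp_mask)).flatten()   # 7 Hu moments
        feats += [stats[i, cv2.CC_STAT_AREA],
                  stats[i, cv2.CC_STAT_WIDTH],
                  stats[i, cv2.CC_STAT_HEIGHT],
                  perimeter, *hu]
    # Pad to a fixed length when fewer than top_k components are found.
    per_comp = 4 + 7
    feats += [0.0] * (1 + top_k * per_comp - len(feats))
    return np.array(feats, dtype=np.float32)

def train_baseline(X, y):
    """Fit the decision tree on stacked heatmap feature vectors."""
    return DecisionTreeClassifier(random_state=0).fit(X, y)

# Usage (forensic_maps and labels are assumed to be precomputed):
# X = np.stack([heatmap_features(m) for m in forensic_maps])
# clf = train_baseline(X, labels)
```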
12
#2 Contextual video verification
13
Contextual video verification
Main hypothesis: misleading (aka “fake”) videos are described and published in different ways than trustworthy ones
Approach: leverage the different signals of quality and credibility contained in the video metadata
14
Fake Video Corpus
Unique cases: 200 fake and 180 real videos from YouTube, Facebook and Twitter
Total number of videos in cascades: 2920 fake and 2090 real
[Figure: example fake video types — staged, tampered, reuse]
15
Baseline solution and ideas
Baseline (see the code sketch below): feature extraction from the video title and channel statistics, RBF SVM classifier

Features from the video title:
- Text length
- Number of words
- Contains question/exclamation marks
- Contains 1st/2nd/3rd person pronoun
- Number of uppercase characters
- Number of positive/negative sentiment words
- Number of slang words
- Has ‘:’ symbol
- Number of question/exclamation marks

Features from channel metadata:
- Channel view count
- Channel subscriber count
- Channel video count
- Channel comment count

Baseline performance:
Precision  Recall  F-score
0.63       0.93    0.75

Ideas — explore better text mining pipelines:
- ELMo: deep contextualized word representations
- LSTM and Bi-LSTM (process word sequences in both directions)
- GPT-2
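A hedged sketch of this baseline, assuming the title text and channel statistics have already been collected. The feature set is abridged (sentiment and slang counts are omitted), and the pronoun lists and channel field names are illustrative assumptions, not the exact published feature definitions.

```python
# Sketch: hand-crafted title + channel features fed to an RBF-kernel SVM.
import numpy as np
from sklearn.svm import SVC
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline

FIRST_PERSON = {"i", "me", "my", "we", "us", "our"}
SECOND_PERSON = {"you", "your", "yours"}
THIRD_PERSON = {"he", "she", "it", "they", "them", "his", "her", "their"}

def video_features(title, channel):
    """Hand-crafted features from the video title plus channel statistics."""
    words = title.lower().split()
    return np.array([
        len(title),                                # text length
        len(words),                                # number of words
        title.count("?"), title.count("!"),        # question / exclamation marks
        sum(w in FIRST_PERSON for w in words),     # 1st-person pronouns
        sum(w in SECOND_PERSON for w in words),    # 2nd-person pronouns
        sum(w in THIRD_PERSON for w in words),     # 3rd-person pronouns
        sum(c.isupper() for c in title),           # uppercase characters
        int(":" in title),                         # has ':' symbol
        channel["view_count"],                     # channel statistics
        channel["subscriber_count"],
        channel["video_count"],
        channel["comment_count"],
    ], dtype=np.float32)

def train_baseline(X, y):
    """Scale the features and fit an RBF-kernel SVM."""
    return make_pipeline(StandardScaler(), SVC(kernel="rbf")).fit(X, y)

# Usage (titles, channels and labels are assumed to be fetched beforehand):
# X = np.stack([video_features(t, c) for t, c in zip(titles, channels)])
# clf = train_baseline(X, labels)
```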
16
#3 Verification-oriented comment detection
17
Verification-oriented comment detection
Problem formulation:
- Comments often contain valuable pieces of information that can help an analyst verify or debunk a video
- Controversial videos attract thousands of comments: it is hard to separate useful comments from noise
Goal: build a classifier that assigns scores to video comments in proportion to their relevance for verification
18
Baseline solution and ideas
Baseline (see the code sketch below): filter comments by a predefined list of verification-oriented keywords, available in several languages (e.g. English, German, Greek, Arabic, Spanish, Farsi)
Example English keywords: lies, fake, wrong, lie, confirm, where, location, lying, false, incorrect, misleading, propaganda, liar
Ideas:
- Use the baseline solution as weak annotation and bootstrap a supervised classification scheme
- Consider additional features (e.g. comment length, contains link)
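A minimal sketch of the keyword-filter baseline, using the English keyword list from the slide; the tokenization, the hit-count scoring and the "flag anything with at least one hit" rule are illustrative assumptions. The resulting scores could also serve as the weak labels for the bootstrapped supervised classifier mentioned above.

```python
# Sketch: score each comment by the number of verification-oriented keywords it contains.
import re

VERIFICATION_KEYWORDS = {
    "lies", "fake", "wrong", "lie", "confirm", "where", "location",
    "lying", "false", "incorrect", "misleading", "propaganda", "liar",
}

def verification_score(comment: str) -> int:
    """Count keyword hits; comments with score > 0 are surfaced to the analyst."""
    tokens = re.findall(r"\w+", comment.lower())
    return sum(tok in VERIFICATION_KEYWORDS for tok in tokens)

comments = [
    "This is fake, the location is wrong",
    "Amazing video, love it!",
]
flagged = [(c, verification_score(c)) for c in comments if verification_score(c) > 0]
print(flagged)   # [('This is fake, the location is wrong', 3)]
```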
19
Let’s get it started: https://bit.ly/2XGnmS2
20
Thank you! Get in touch!