Encrypted Traffic Mining (TM) e.g. Leaks in Skype

Encrypted Traffic Mining (TM) e.g. Leaks in Skype Benoit DuPasquier, Stefan Burschka

Contents: Who, What (WTF), Why; short introduction to TM; engineering approach; TM signal analysis methods; results; questions.

Who: Since Feb 2011 @ Stefan, Francesco, Torben, Sebastian, Antonino, Sakir, Benoit, Antonio, Ulrich, Ernst, ..., Nur & Malcolm, Wurst, Fabian, Mischa, Noe, ﺤﺮﺐ, Antonio, Patrick, Hugo, Pascal, K-Pascal, Mehdi, Javier, Seili, Flo, Frederic, Markus, ..., Dago

What: Apollo projects. Network troubleshooting: NINA (automated network discovery and mapping), TRANALYZER (high-speed, high-volume traffic flow analyzer), TRAVIZ (graphical toolset for Tranalyzer). Operational picture: how do we understand multidimensional data? Automated protocol learning and state-machine reversing.

WTF is in it?

Traffic Mining: hidden knowledge. Listen | see, understand, find invariants → model. Applications: security (classification, decoding of encrypted traffic); network usage (VoIP, P2P traffic shaping, Skype detection); profiling & marketing (usage, performance and market indices); law enforcement and lawful interception (indication/evidence).

Encrypted content guessing. Traffic Mining for encrypted content guessing: SSH command guessing; IP tunnel content profiling; encrypted VoIP guessing, e.g. Skype.

If you plainly start listening to this, you end up with something like the following (header, then encrypted payload):

    22:06:51.410006 IP 193.5.230.58.3910 > 193.5.238.12.80: P 1499:1566(67) ack 2000 win 64126
        0x0000:  0000 0c07 ac0d 000f 1fcf 7c45 0800 4500  ..........|E..E.
        0x0010:  006b 9634 4000 8006 0e06 c105 e63a c105  .k.4@........:..
        0x0020:  ee0c 0f46 0050 1b03 ae44 faba ef9e 5018  ...F.P...D....P.
        0x0030:  fa7e 9c0a 0000 28d8 f103 e595 8451 ea09  .~....(......Q..
        0x0040:  ba2c 8e91 9139 55bf df8d 1e07 e701 7a09  .,...9U.......z.
        0x0050:  cf96 8f05 84c2 58a8 d66b d52b 0a56 e480  ......X..k.+.V..
        0x0060:  472d e34b 87d2 5c64 695a 580f f649 5385  G-.K..\diZX..IS.
        0x0070:  ea31 721f d699 f905 e7                   .1r......
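The payload is encrypted, but the headers in this dump still leak the packet length. The minimal Python sketch below decodes the first 48 bytes of the frame. It assumes an untagged Ethernet II frame carrying IPv4 and TCP, as in this capture. From those bytes it recovers the 67-byte payload length that tcpdump reports. The function name is illustrative only.

```python
import ipaddress

# First three rows (0x0000-0x002f) of the captured frame above.
FRAME_HEX = (
    "00000c07ac0d000f1fcf7c4508004500"  # 0x0000: Ethernet header + IPv4 start
    "006b9634400080060e06c105e63ac105"  # 0x0010: IPv4 length, proto, addresses
    "ee0c0f4600501b03ae44fabaef9e5018"  # 0x0020: TCP ports, seq, ack, offset
)

ETH_LEN = 14  # assumption: Ethernet II header without VLAN tag


def observable_fields(frame: bytes) -> dict:
    """Return the header fields a passive observer sees despite encryption."""
    ip = frame[ETH_LEN:]
    ihl = (ip[0] & 0x0F) * 4                    # IPv4 header length in bytes
    total_len = int.from_bytes(ip[2:4], "big")  # IPv4 total length field
    tcp = ip[ihl:]
    tcp_hdr_len = (tcp[12] >> 4) * 4            # TCP data offset in bytes
    return {
        "src": f"{ipaddress.IPv4Address(ip[12:16])}.{int.from_bytes(tcp[0:2], 'big')}",
        "dst": f"{ipaddress.IPv4Address(ip[16:20])}.{int.from_bytes(tcp[2:4], 'big')}",
        "payload_len": total_len - ihl - tcp_hdr_len,
    }


fields = observable_fields(bytes.fromhex(FRAME_HEX))
print(fields)
```

The packet length (and its timing) is exactly the side channel the rest of the talk exploits.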

So, what is the task? Distinguish … from … by listening. Cues: gaps in tracks; sound ~ packet length; packet fire rate (inter-packet distance).
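All of these cues are easy to compute from a capture. The Python sketch below assumes the trace has already been reduced to (timestamp in seconds, IP length in bytes) pairs for one direction of one flow. The function name and the toy values are hypothetical, not Skype measurements.

```python
from typing import List, Sequence, Tuple


def packet_features(trace: Sequence[Tuple[float, int]]) -> Tuple[List[int], List[float], float]:
    """Return packet lengths, inter-packet distances (s) and mean fire rate (packets/s)."""
    pkts = sorted(trace)  # make sure packets are in time order
    if not pkts:
        return [], [], 0.0
    lengths = [length for _, length in pkts]
    gaps = [t1 - t0 for (t0, _), (t1, _) in zip(pkts, pkts[1:])]
    span = pkts[-1][0] - pkts[0][0]
    rate = len(gaps) / span if span > 0 else 0.0
    return lengths, gaps, rate


# Toy trace, made-up values
lengths, gaps, rate = packet_features([(0.00, 120), (0.02, 134), (0.04, 97), (0.10, 60)])
print(lengths, [round(g, 3) for g in gaps], round(rate, 1))
```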

Why Skype? Google Talk, SIP/RTP, etc. were too easy (EPFL). At that time Skype used many undocumented codecs, including SILK. Challenge: constant packet flow, so there is no indication of speaker pauses. Feds: pedophile detection in encrypted VoIP.

TM exercise: see the features? [Figure: codec training traces; Burschka (Fischkopp), Linux; Dominic (student), Windows; ping, min l = 3, SN]

Hypotheses: a transfer function exists between the audio input and the observed IP packet lengths; the output is predictable; given the output, the input can be estimated.

Parameters influencing the IP output: basic signals (amplitude, frequency, noise, silence), phonemes, words, sentences.

Assumptions: everybody uses Skype; only the direct UDP communication mode is considered (the problem is already complicated enough); language: English.

Basic lab setup: MS Windows XP Pro Version 2002 SP3; Intel(R) Core(TM) 2 E6750 @ 2.66 GHz, 2.99 GHz; 2.00 GB RAM; Skype version 4.0.0.224; Skype's audio codec SILK; phoneme database from a voice recognition project with different speakers.

1. Engineering approach: influencing parameters. The audio codec is the invariant component. Other influences: Skype internals (cryptography, network layer); sound cards; the software used to feed voice into Skype; the software used to generate sounds.

Derive the Transfer Function

Example: Frequency sweep
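A frequency sweep like this one can be generated with Python's standard library and played into a call to probe the transfer function. The sketch below produces a linear chirp and writes it to a mono 16-bit WAV file. The sample rate, frequency range and amplitude are hypothetical choices, not the presentation's settings.

```python
import math
import struct
import wave
from typing import List


def linear_sweep(f0: float, f1: float, duration_s: float, rate: int = 16000,
                 amplitude: float = 0.5) -> List[float]:
    """Linear chirp from f0 to f1 Hz. The phase is the integral of the
    instantaneous frequency f(t) = f0 + k * t, with k = (f1 - f0) / duration."""
    n = int(duration_s * rate)
    k = (f1 - f0) / duration_s  # sweep rate in Hz per second
    return [amplitude * math.sin(2 * math.pi * (f0 * t + 0.5 * k * t * t))
            for t in (i / rate for i in range(n))]


def write_wav(path: str, samples: List[float], rate: int = 16000) -> None:
    """Write samples in [-1, 1] as mono 16-bit little-endian PCM."""
    with wave.open(path, "wb") as wf:
        wf.setnchannels(1)
        wf.setsampwidth(2)
        wf.setframerate(rate)
        wf.writeframes(b"".join(struct.pack("<h", int(s * 32767)) for s in samples))
```

For example, `write_wav("sweep.wav", linear_sweep(100.0, 8000.0, 10.0))` produces a 10 s sweep. The packet lengths observed while it plays then give one slice of the transfer function.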

Result: Skype transfer model. The packet generation process is desynchronized from the codec output: the codec and the IP layer run at unsynchronized speeds.

2. Mining approach. The engineering approach is inappropriate because the model is too complex, so the voice-to-packet generation process has to be learned. Find mappings (phonemes → words → sentences) that produce invariants.

Envelope: attack, comb, decay, sustain, release. Phoneme /ʒ/, e.g. in the word "pleasure". Find a homomorphism over the 44 phonemes. Commutativity: f(a * b) = f(b * a). Additivity: f(a * b) = f(a) * f(b).
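These two properties can be checked empirically on recorded traces. The Python sketch below assumes that * means concatenation on the input side. For a scalar packet feature f, it takes * on the output side to be plain addition. The function names and packet lengths are hypothetical toy values.

```python
from typing import Callable, Dict, Sequence

Feature = Callable[[Sequence[int]], float]


def relative_error(a: float, b: float) -> float:
    """Symmetric relative difference, guarded against division by zero."""
    return abs(a - b) / max(abs(a), abs(b), 1e-12)


def check_homomorphism(f: Feature, out_a: Sequence[int], out_b: Sequence[int],
                       out_ab: Sequence[int], out_ba: Sequence[int]) -> Dict[str, float]:
    """Compare feature f on the packet-length traces of phonemes a, b and of
    their concatenations ab and ba.
    additivity:    f(ab) ~ f(a) + f(b)
    commutativity: f(ab) ~ f(ba)"""
    return {
        "additivity_err": relative_error(f(out_ab), f(out_a) + f(out_b)),
        "commutativity_err": relative_error(f(out_ab), f(out_ba)),
    }


# Toy traces (IP packet lengths in bytes)
a, b = [80, 90, 85], [100, 110]
ab, ba = [80, 90, 95, 110, 105], [100, 110, 82, 88, 90]
print(check_homomorphism(len, a, b, ab, ba))  # signal length in packets
print(check_homomorphism(sum, a, b, ab, ba))  # total bytes
```

In this toy data, signal length (`len`) behaves homomorphically while total bytes (`sum`) does not. This pattern mirrors the result reported on the next slide, but it is illustrative only.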

Results: signal invariant analysis. There is no satisfying homomorphism except in signal length and silence/signal. Word construction is difficult due to phoneme overlapping. Noise/silence estimation and subtraction improves results considerably. The longer the sequence, the better the results → sentence detection.
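The slide does not say how noise and silence were estimated. The Python sketch below rests on one simple assumption. It takes the median packet length observed during a known silent stretch as the noise floor. It then subtracts that floor from the packet-length signal, clipping at zero, which is a packet-domain analogue of spectral subtraction. The function name and values are hypothetical.

```python
import statistics
from typing import List, Sequence


def subtract_noise_floor(lengths: Sequence[int], silence: Sequence[int]) -> List[float]:
    """Estimate the noise floor as the median packet length during silence
    and remove it from the signal; negative residues are clipped to zero."""
    floor = statistics.median(silence)
    return [max(0.0, float(length - floor)) for length in lengths]


# Toy example: packets seen while nobody speaks, then a short utterance
print(subtract_noise_floor([61, 95, 130, 60], [62, 60, 61, 63, 60]))
```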

Sentence signals: same sentences, similar output.

Different sentences, same speaker.

Signal differentiation: Dynamic Time Warping (DTW). A dynamic programming algorithm and predecessor of the HMM, mainly used for speech processing. It is suited to comparing sequences that vary in time or speed. Local distance: squared Euclidean distance. Similarity is visualized as a DTW map.
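Below is a minimal Python sketch of classic DTW with the squared Euclidean local cost named on the slide. It returns the accumulated cost, which is the last cell of the DTW map. It also returns the optimal warping path, recovered by backtracking. It works on 1-D sequences such as packet-length series. The presentation's own implementation is not shown, so this is only an illustration.

```python
from typing import List, Sequence, Tuple


def dtw(x: Sequence[float], y: Sequence[float]) -> Tuple[float, List[Tuple[int, int]]]:
    """Classic DTW with squared Euclidean local cost.
    Returns the accumulated cost and the optimal warping path as (i, j) pairs."""
    n, m = len(x), len(y)
    inf = float("inf")
    # D[i][j]: cost of the best alignment of x[:i] with y[:j] (the "DTW map")
    D = [[inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = (x[i - 1] - y[j - 1]) ** 2
            D[i][j] = cost + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    # Backtrack from the bottom-right cell, preferring the diagonal on ties
    path: List[Tuple[int, int]] = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        steps = {(i - 1, j - 1): D[i - 1][j - 1],
                 (i - 1, j): D[i - 1][j],
                 (i, j - 1): D[i][j - 1]}
        i, j = min(steps, key=steps.get)
    path.reverse()
    return D[n][m], path


cost, path = dtw([1, 2, 3], [1, 2, 2, 3])
print(cost, path)
```

A matching pair of sequences yields a low cost and a path that hugs the diagonal. A non-matching pair drives the cost up and bends the path away from it.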

Matching DTW map with optimal path; sentence: “Young children should avoid exposure to contagious diseases”.

Non-matching DTW map with path: “The fog prevented them from arriving on time” vs. “Young children should avoid exposure to contagious diseases”.

Results (speaker dependent): six recordings, i.e. permutations of three sentences. Nine target sentences with one model per sentence: 66% correct classification. Misclassification example: “I put the bomb in the train” vs. “I put the bomb in the bus”. Eight target sentences with several models per sentence: 83% correct guesses.
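One way to read "several models per sentence" is nearest-template classification. Each trace is scored with DTW against every stored model of every sentence, and the sentence with the lowest cost wins. The Python sketch below works under that assumption and uses a compact, row-by-row DTW. The labels and toy traces are hypothetical, not the presentation's data.

```python
from typing import Dict, List, Sequence


def dtw_cost(x: Sequence[float], y: Sequence[float]) -> float:
    """Accumulated DTW cost (squared Euclidean local distance), one row at a time."""
    inf = float("inf")
    prev = [0.0] + [inf] * len(y)  # row D[0][*]
    for xi in x:
        cur = [inf] * (len(y) + 1)  # D[i][0] = inf
        for j, yj in enumerate(y, start=1):
            cur[j] = (xi - yj) ** 2 + min(prev[j], cur[j - 1], prev[j - 1])
        prev = cur
    return prev[-1]


def classify(trace: Sequence[float], models: Dict[str, List[Sequence[float]]]) -> str:
    """Return the sentence label whose closest model has the lowest DTW cost.
    'One model per sentence' is the special case of single-element lists."""
    return min(models, key=lambda s: min(dtw_cost(trace, m) for m in models[s]))


# Hypothetical packet-length templates
models = {
    "train": [[60, 90, 120, 90], [62, 95, 118]],
    "bus": [[60, 140, 60]],
}
print(classify([61, 92, 119, 88], models))
```

Keeping several models per sentence lets the nearest template absorb some variation between recordings. This is one plausible reason for the accuracy gain reported above, though the slide does not state the mechanism.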

Noise & speaker resilience: the Kalman filter (1960s). A recursive linear filter, mainly used for radar or missile tracking problems. It estimates the state of a linear discrete-time dynamical system from a series of noisy measurements. If the system is non-linear, it is linearized with a first-order Taylor expansion, as in the extended Kalman filter. Process and measurement noise must be additive and Gaussian. Our case: k = 0; F, H, Q, R constant in time. © Greg Welch, Gary Bishop
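For reference, below are the standard discrete Kalman filter equations in the slide's notation. F is the state transition matrix and H the measurement matrix. Q and R are the process and measurement noise covariances, all constant in time. The model is written without a control input. That reading of the slide's "Our case" line is an assumption.

```latex
\begin{aligned}
&\text{Model:}   && x_k = F x_{k-1} + w_{k-1}, \quad w \sim \mathcal{N}(0, Q); \qquad
                    z_k = H x_k + v_k, \quad v \sim \mathcal{N}(0, R) \\
&\text{Predict:} && \hat{x}_k^- = F \hat{x}_{k-1}, \qquad
                    P_k^- = F P_{k-1} F^\top + Q \\
&\text{Update:}  && K_k = P_k^- H^\top \left( H P_k^- H^\top + R \right)^{-1} \\
&                && \hat{x}_k = \hat{x}_k^- + K_k \left( z_k - H \hat{x}_k^- \right), \qquad
                    P_k = \left( I - K_k H \right) P_k^-
\end{aligned}
```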

Kalman filter functionality: averaging estimator and predictor. Example: the positions of Alice and Bob are not known. Bob: at time t1 the plane is at position X. Alice: at time t2 the plane is at position Y. The Kalman filter predicts the next plane position: at time t3 the plane will be at position Z.

Example: constant line estimation. [Plot: estimation goal, data, Kalman filter estimation]
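The constant-estimation example can be sketched in Python as a scalar filter with F = H = 1. It follows the spirit of Welch and Bishop's "estimating a random constant" example. The true value, noise level, q and r below are hypothetical choices for illustration.

```python
import random
from typing import List


def kalman_constant(z: List[float], q: float = 1e-5, r: float = 0.01,
                    x0: float = 0.0, p0: float = 1.0) -> List[float]:
    """Scalar Kalman filter with F = H = 1: estimate a constant from noisy readings.
    q: process noise variance, r: measurement noise variance."""
    x, p, estimates = x0, p0, []
    for zk in z:
        # Predict: the state is assumed constant; only its uncertainty grows by q
        p = p + q
        # Update: blend prediction and measurement according to the Kalman gain
        k = p / (p + r)
        x = x + k * (zk - x)
        p = (1 - k) * p
        estimates.append(x)
    return estimates


random.seed(0)
truth = -0.377  # hypothetical constant to recover
z = [truth + random.gauss(0.0, 0.1) for _ in range(200)]  # noise std 0.1 -> r = 0.01
est = kalman_constant(z)
print(f"final estimate {est[-1]:.3f} vs truth {truth}")
```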

Kalman Model for one Sentence

Mitigation techniques: there is no perfect solution, so trade-offs between bandwidth consumption, computational power and information leakage are required. (1) Padding at the cryptographic layer: pad each packet to a bit-position length, e.g., 58 → 64 bytes; computationally acceptable. (2) Random payload at the network layer: a random payload of random size; requires a new header field; computationally expensive.
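The first mitigation can be sketched as rounding every packet length up to the next power of two. This is one reading of "bit-position length" that is consistent with 58 → 64. The random-payload variant simply appends a random number of random bytes. The function names and the 32-byte cap are hypothetical.

```python
import os
import random


def pad_to_power_of_two(n: int) -> int:
    """Smallest power of two >= n, e.g. 58 -> 64 (assumed meaning of 'bit position length')."""
    return 1 << (n - 1).bit_length() if n > 1 else 1


def pad_random(payload: bytes, max_extra: int = 32) -> bytes:
    """Append 0..max_extra random bytes. A real protocol would need a new header
    field carrying the pad length so the receiver can strip it (not modeled here)."""
    extra = random.randint(0, max_extra)
    return payload + os.urandom(extra)


for n in (58, 64, 65):
    print(n, "->", pad_to_power_of_two(n))
```

With power-of-two padding, every packet between 33 and 64 bytes looks the same on the wire. The cost is that packets just above a boundary nearly double in size, which is the bandwidth trade-off the slide mentions.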

Conclusions: detection of a sentence in Skype traces is possible. Quick & dirty: average accuracy greater than 60%; it can reach 83% under specific conditions. Kalman filter: speaker-independent models. Mitigation techniques: relatively easy. Invest more work → better results: see USA 2011.

Next: All IP Signal Processing

Questions / comments? “Science is a way of thinking much more than it is a body of knowledge.” (Carl Sagan) V0.57 · http://sourceforge.net/projects/tranalyzer/ · stefan.burschka@ruag.com