Cross-Modal (Visual-Auditory) Denoising
Dana Segev, Yoav Y. Schechner, Michael Elad
Technion – Israel Institute of Technology

Digits sequence; noisy digits sequence; denoised by the state-of-the-art algorithm of Cohen & Berdugo. Segev, Schechner, Elad, Cross-Modal Denoising

Can one modality be used to denoise another? Can video be used to denoise a soundtrack?

Noise: very intense, non-stationary, unknown, from an unseen source. A single microphone.

Input: video and very noisy audio (plotted against time in sec). Algorithm: cross-modal, example-based. Output: denoised audio, for human and machine hearing.

Training: example set. Input: test set.

Segment length: about one syllable (0.25 sec).
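As a minimal sketch of cutting a soundtrack into syllable-length units, assuming non-overlapping 0.25 sec segments and the 8000 Hz speech audio rate listed later in the deck (so 2000 samples per segment). The function name `split_segments` is hypothetical, and whether the authors overlapped segments is not stated on the slides.

```python
import numpy as np

def split_segments(audio, sr=8000, seg_sec=0.25):
    """Split 1-D audio into consecutive, non-overlapping segments (hypothetical helper).

    Trailing samples that do not fill a whole segment are dropped.
    """
    audio = np.asarray(audio, dtype=float)
    seg_len = int(round(sr * seg_sec))      # 2000 samples at 8000 Hz
    n_segs = len(audio) // seg_len
    return audio[: n_segs * seg_len].reshape(n_segs, seg_len)

segments = split_segments(np.zeros(8500))
print(segments.shape)  # (4, 2000)
```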

Xylophone

Xylophone sound

Examples...

Outline: cross-modal representation; generating multimodal features; cross-modal pattern recognition; rendering a denoised signal; learning feature statistics.

Input video, mapped to the video feature space; input audio, mapped to the audio feature space (time axis in sec).

Input audio-video, mapped to a joint audio-video feature space (time axis in sec).
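One simple way to form a joint audio-video feature space is to stack each segment's audio and video feature vectors side by side. The sketch below assumes plain concatenation with hypothetical per-modality weights `w_audio` and `w_video`; the slides do not say how the two modalities are combined or balanced.

```python
import numpy as np

def joint_features(audio_feats, video_feats, w_audio=1.0, w_video=1.0):
    """Concatenate per-segment audio and video features (a sketch, not the authors' code).

    audio_feats: (num_segments, d_audio); video_feats: (num_segments, d_video).
    w_audio / w_video are hypothetical weights balancing the two modalities.
    """
    audio_feats = np.asarray(audio_feats, dtype=float)
    video_feats = np.asarray(video_feats, dtype=float)
    if audio_feats.shape[0] != video_feats.shape[0]:
        raise ValueError("audio and video must have the same number of segments")
    return np.hstack([w_audio * audio_feats, w_video * video_feats])

joint = joint_features(np.ones((4, 13)), np.zeros((4, 10)))
print(joint.shape)  # (4, 23)
```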

Training audio-video: the audio-video examples in the feature space (time axis in sec).

Feature space

Nearest neighbor in the feature space
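The nearest-neighbor step can be sketched as a brute-force search over the training examples, using the ℓ2 distance measure the deck lists for both corpora. The helper name `nearest_example` is hypothetical, and the sketch ignores any speed-ups (such as search trees) the authors may have used.

```python
import numpy as np

def nearest_example(query, examples):
    """Index of the training example closest to `query` in the l2 norm.

    query: (d,) feature vector of one input segment.
    examples: (K, d) feature vectors of the K training example segments.
    """
    diffs = np.asarray(examples, dtype=float) - np.asarray(query, dtype=float)
    dists = np.linalg.norm(diffs, axis=1)   # l2 distance to every example
    return int(np.argmin(dists))

examples = np.array([[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]])
print(nearest_example([0.9, 1.2], examples))  # 1
```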

Examples...

Noisy audio; clean segment.

Noisy audio; clean segment; denoised.
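Rendering can be sketched as replacing each noisy input segment with the clean audio of the example chosen for it, then concatenating the results in time order. This is an assumption-laden sketch: the function name `render` is hypothetical, and plain concatenation may leave audible discontinuities at segment boundaries, which a short crossfade could soften; the slides do not say how boundaries were handled.

```python
import numpy as np

def render(chosen, example_audio):
    """Concatenate the clean audio of the chosen example segments (hypothetical helper).

    chosen: one example index per input segment, in time order.
    example_audio: list of 1-D arrays, the clean audio of each training example.
    """
    return np.concatenate([np.asarray(example_audio[k], dtype=float) for k in chosen])

clean = [np.full(4, 0.1), np.full(4, 0.2), np.full(4, 0.3)]
out = render([2, 0, 2], clean)
print(out.shape)  # (12,)
```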

Examples...

Examples... Input...

...Examples; Input

Bartender experiment

Outline: cross-modal representation; generating multimodal features; cross-modal pattern recognition (NN); rendering a denoised signal; learning feature statistics.

Feature space. For the k-th example segment:

Feature space with the syllable clusters bi, fif, ty, two, ar (e.g., the sequence bi-fif-ty-two). For the k-th example segment:

Counting transitions: a table of current cluster (rows) versus next cluster (columns) over the clusters bi, ty, fif, two, ar, filled in from consecutive segments of the feature-space example sequence bi, fif, ty, two, ar.

Syllable consecutive probability: the full table of transition counts between clusters (rows: current cluster; columns: next cluster; clusters bi, ty, fif, two, ar). The probability of a transition between clusters is estimated from the number of such consecutive pairs among the examples in the training set.
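A common way to turn such a count table into transition probabilities is to count consecutive cluster pairs in the training sequence and normalize each row, so that P[i, j] estimates the probability that cluster j follows cluster i. The sketch assumes exactly this row normalization (the slide's exact formula did not survive extraction), and the mapping of syllables to integers is hypothetical.

```python
import numpy as np

def transition_matrix(labels, num_clusters):
    """Estimate P[i, j] = Pr(next cluster j | current cluster i) by counting.

    labels: cluster index of each consecutive training segment, in time order.
    """
    counts = np.zeros((num_clusters, num_clusters))
    for cur, nxt in zip(labels[:-1], labels[1:]):
        counts[cur, nxt] += 1
    row_sums = counts.sum(axis=1, keepdims=True)
    # Rows of clusters never followed by anything stay all zeros.
    return np.divide(counts, row_sums, out=np.zeros_like(counts), where=row_sums > 0)

# Hypothetical labels: 0 = bi, 1 = fif, 2 = ty, 3 = two
P = transition_matrix([0, 1, 2, 3, 0, 2, 3], 4)
print(P[0])  # [0.  0.5 0.5 0. ]
```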

Hidden Markov Model: the syllable sequence bi, fif, ty, two, bi, with transition probabilities P and a time delay between states.

The same model (transition probabilities P, time delay; sequence bi, fif, ty, two, bi) with audio noise.

Hidden Markov Model: the syllable sequence bi, fif, ty, two, bi (transition probabilities P, time delay) plus audio noise.

Examples... Input...

...Examples; Input...

Input video

Input video; vector of indices.

A cost function composed of a data term and a regularization term.

A cost function composed of a data term and a regularization term; its minimizer is the optimal vector of indices.

A graph of nodes and edges over the examples and the input segments. Complexity of the search; complexity with dynamic programming.
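A Viterbi-style dynamic program is one standard way to minimize a cost of the form "data term + regularization term" over a vector of indices. In this sketch, each input segment t picks an example index k_t, the data term is a per-segment cost `data_cost[t, k]`, and the regularization term is a pairwise cost `trans_cost[i, j]` for example j following example i (for instance, a negative log transition probability). The cost definitions and the weight `lam` are assumptions, since the slides' formulas did not survive extraction. The loop runs in O(T·K²) time for T segments and K examples, instead of the K^T paths of exhaustive search.

```python
import numpy as np

def optimal_indices(data_cost, trans_cost, lam=1.0):
    """Minimize sum_t data_cost[t, k_t] + lam * sum_t trans_cost[k_{t-1}, k_t].

    data_cost: (T, K) cost of assigning example k to input segment t.
    trans_cost: (K, K) cost of example j following example i.
    Returns the optimal vector of example indices (length T).
    """
    data_cost = np.asarray(data_cost, dtype=float)
    trans_cost = np.asarray(trans_cost, dtype=float)
    T, K = data_cost.shape
    acc = data_cost[0].copy()              # best cost of a path ending at each example
    back = np.zeros((T, K), dtype=int)     # back-pointers to the best predecessor
    for t in range(1, T):
        total = acc[:, None] + lam * trans_cost        # rows: previous, cols: current
        back[t] = np.argmin(total, axis=0)
        acc = total[back[t], np.arange(K)] + data_cost[t]
    path = [int(np.argmin(acc))]
    for t in range(T - 1, 0, -1):                      # trace back-pointers
        path.append(int(back[t, path[-1]]))
    return path[::-1]

data = np.array([[0.0, 1.0], [0.6, 0.5], [0.0, 1.0]])
trans = np.array([[0.0, 1.0], [1.0, 0.0]])
print(optimal_indices(data, trans, lam=1.0))  # [0, 0, 0]
print(optimal_indices(data, trans, lam=0.0))  # [0, 1, 0]
```

With `lam=0` the result is the per-segment best match; a positive weight trades a slightly worse match at one segment for a more plausible sequence.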

Outline: cross-modal representation; generating multimodal features; cross-modal pattern recognition; rendering a denoised signal; learning feature statistics.

Features:
Requirements. Audio features: sensitivity to sound perception; dimension reduction. Visual features: focusing on the motion of interest; dimension reduction.
Speech features. Audio: MFCCs. Visual: DCT coefficients.
Music features. Audio: spectrogram of each segment. Visual: the spatial trajectory of a hitting rod.

MFCCs (Mel-Frequency Cepstral Coefficients): audio signal → signal spectrum → mel-frequency filter bank → log(·) → DCT → MFCCs.
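The MFCC pipeline above can be sketched for one audio frame in NumPy. The sample rate of 8000 Hz follows the speech setup listed later in the deck; the Hamming window, 26 triangular mel filters, 13 kept coefficients, and the unnormalized DCT-II are common choices assumed here, not details taken from the slides.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mfcc(frame, sr=8000, n_filters=26, n_coeffs=13):
    """MFCCs of one frame: spectrum -> mel filter bank -> log -> DCT (a sketch)."""
    frame = np.asarray(frame, dtype=float)
    n = len(frame)
    power = np.abs(np.fft.rfft(frame * np.hamming(n))) ** 2   # signal spectrum
    freqs = np.fft.rfftfreq(n, d=1.0 / sr)
    # Triangular filters whose edges are equally spaced on the mel scale.
    edges = mel_to_hz(np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_filters + 2))
    bank = np.zeros((n_filters, len(freqs)))
    for i in range(n_filters):
        lo, mid, hi = edges[i], edges[i + 1], edges[i + 2]
        rising = (freqs - lo) / (mid - lo)
        falling = (hi - freqs) / (hi - mid)
        bank[i] = np.clip(np.minimum(rising, falling), 0.0, None)
    log_energy = np.log(bank @ power + 1e-10)   # log(.) of mel-band energies
    # Unnormalized DCT-II of the log energies; keep the first n_coeffs terms.
    k = np.arange(n_coeffs)[:, None]
    m = np.arange(n_filters)[None, :]
    dct = np.cos(np.pi * k * (2 * m + 1) / (2 * n_filters))
    return dct @ log_energy

x = np.sin(2 * np.pi * 1000 * np.arange(256) / 8000)   # 1 kHz tone, 32 ms frame
print(mfcc(x).shape)  # (13,)
```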

Spectrogram of each segment: the xylophone signal, its spectrogram, and the spectrogram accumulated over each segment.
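A segment-level spectral feature can be sketched by computing a short-time magnitude spectrogram inside the segment and summing it over time. The sketch interprets "accumulation" as a sum over frames; the frame length, hop size, and Hann window are assumptions, and the segment must be at least one frame long.

```python
import numpy as np

def segment_spectrogram(segment, frame_len=256, hop=128):
    """Sum the magnitude spectrogram over all frames of one segment (a sketch)."""
    segment = np.asarray(segment, dtype=float)
    window = np.hanning(frame_len)
    starts = range(0, len(segment) - frame_len + 1, hop)
    frames = np.array([segment[s:s + frame_len] * window for s in starts])
    spec = np.abs(np.fft.rfft(frames, axis=1))   # (num_frames, frame_len // 2 + 1)
    return spec.sum(axis=0)                       # one accumulated spectrum

sr = 16000                                        # xylophone audio rate
x = np.sin(2 * np.pi * 2000 * np.arange(4000) / sr)   # 0.25 sec, 2 kHz tone
acc = segment_spectrogram(x)
print(acc.shape, int(np.argmax(acc)))  # (129,) 32
```

Bin 32 corresponds to 32 × 16000 / 256 = 2000 Hz, the frequency of the test tone.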

Speech: the given movie...

Speech: locking on the object of interest...

Speech: extracting global motion by tracking...

Speech: extracting features, the DCT coefficients that strongly represent the motion between frames.
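One way to obtain DCT coefficients that represent motion between frames is to take the 2-D DCT of the difference between consecutive (tracked, cropped) frames and keep the low-frequency block. The frame differencing, the orthonormal DCT-II, and the `keep=8` block size are assumptions for illustration; the slides do not specify which coefficients were kept.

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II matrix of size n x n."""
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    c = np.sqrt(2.0 / n) * np.cos(np.pi * k * (2 * i + 1) / (2 * n))
    c[0] /= np.sqrt(2.0)
    return c

def motion_dct_features(prev_frame, frame, keep=8):
    """Low-frequency 2-D DCT coefficients of the difference between two frames."""
    diff = np.asarray(frame, dtype=float) - np.asarray(prev_frame, dtype=float)
    rows, cols = diff.shape
    coeffs = dct_matrix(rows) @ diff @ dct_matrix(cols).T   # separable 2-D DCT-II
    return coeffs[:keep, :keep].ravel()                     # keep x keep block

f0 = np.zeros((32, 32))
f1 = f0.copy()
f1[10:14, 10:14] = 1.0          # a small patch that "moved" into view
feats = motion_dct_features(f0, f1)
print(feats.shape)  # (64,)
```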

Xylophone: the given movie...

Xylophone: locking on the object of interest...

Xylophone: extracting global motion by tracking (axes X, Y, Z)...

Xylophone: extracting features, the spatial coordinates (X, Y, Z) of the hitting rod.
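Because a 0.25 sec segment at 25 fps spans about 6 to 7 video frames, a fixed-length visual feature can be sketched by resampling the tracked X, Y, Z coordinates of the rod within a segment to a fixed number of time samples and flattening them. The resampling by linear interpolation and the hypothetical `n_samples=7` are assumptions, not the authors' documented procedure.

```python
import numpy as np

def trajectory_feature(xyz, n_samples=7):
    """Resample a tracked rod trajectory to a fixed length and flatten it.

    xyz: (num_frames, 3) tracked X, Y, Z coordinates within one segment.
    """
    xyz = np.asarray(xyz, dtype=float)
    t_old = np.linspace(0.0, 1.0, len(xyz))
    t_new = np.linspace(0.0, 1.0, n_samples)
    resampled = np.column_stack([np.interp(t_new, t_old, xyz[:, d]) for d in range(3)])
    return resampled.ravel()   # (n_samples * 3,) feature vector

traj = np.column_stack([np.arange(6.0), np.zeros(6), np.zeros(6)])
print(trajectory_feature(traj).shape)  # (21,)
```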

Speech: a corpus of a limited number of words and syllables (digits and bar beverages). Video rate 25 fps, audio rate 8000 Hz. K-means clustering, 350 clusters. Distance measure: ℓ2 norm.
Xylophone: a corpus of a limited number of sounds. Video rate 25 fps, audio rate 16000 Hz. Distance measure: ℓ2 norm.
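The k-means clustering step can be sketched with Lloyd's algorithm under the ℓ2 norm. The speech setup uses 350 clusters; the toy example below uses k = 2 only so the result is easy to check. Random initialization from data points, the iteration count, and the handling of empty clusters are assumptions of this sketch.

```python
import numpy as np

def kmeans(x, k, iters=50, seed=0):
    """Lloyd's k-means with the l2 norm; returns (centers, labels)."""
    x = np.asarray(x, dtype=float)
    rng = np.random.default_rng(seed)
    centers = x[rng.choice(len(x), size=k, replace=False)].copy()
    for _ in range(iters):
        # Squared l2 distance from every point to every center.
        d = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):        # leave empty clusters where they are
                centers[j] = x[labels == j].mean(axis=0)
    # Final assignment against the final centers.
    labels = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2).argmin(axis=1)
    return centers, labels

x = np.array([[0, 0], [0.1, 0], [0, 0.1], [10, 10], [10.1, 10], [10, 10.1]])
centers, labels = kmeans(x, k=2)
```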

Xylophone. Training duration: 103 sec; testing duration: 100 sec. Music from a song by GNR: SNR = 0.9. Xylophone melody: SNR = 1.

Speech, digits. Training duration: 60 sec; testing duration: 240 sec. Noisy vs. denoised; SNR = 0.07.

Speech, bartender. Training duration: 48 sec; testing duration: 350 sec. Noise: music from a song by Phil Collins (SNR = 0.59); male speech (SNR = 0.3); white Gaussian noise (SNR = 0.38).
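The reported SNR values are below or near 1, which suggests a linear power ratio rather than decibels. Assuming that reading, the sketch below computes SNR as the mean power of the clean signal over the mean power of the noise and converts it to dB; under this assumption, SNR = 0.07 is about -11.5 dB. The helper names are hypothetical.

```python
import numpy as np

def snr(clean, noise):
    """Linear SNR: mean power of the clean signal over mean power of the noise."""
    clean = np.asarray(clean, dtype=float)
    noise = np.asarray(noise, dtype=float)
    return np.mean(clean ** 2) / np.mean(noise ** 2)

def snr_db(ratio):
    """Convert a linear power ratio to decibels."""
    return 10.0 * np.log10(ratio)

print(snr(np.ones(100), 2.0 * np.ones(100)))  # 0.25
```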

Summary. Input: video and very noisy audio (plotted against time in sec). Algorithm: example-based, with a Hidden Markov Model. Output: denoised audio, for human and machine hearing.
