Cross-Modal (Visual-Auditory) Denoising Dana Segev Yoav Y. Schechner Michael Elad Technion – Israel Institute of Technology 1.

Presentation on theme: "Cross-Modal (Visual-Auditory) Denoising Dana Segev Yoav Y. Schechner Michael Elad Technion – Israel Institute of Technology 1."— Presentation transcript:

1

2 Cross-Modal (Visual-Auditory) Denoising Dana Segev Yoav Y. Schechner Michael Elad Technion – Israel Institute of Technology 1

3 2 Digits sequence Noisy digits sequence Denoised by state of the art algorithm of Cohen & Berdugo Segev, Schechner, Elad, Cross-Modal Denoising

4 Use one modality to denoise another? Use video to denoise a soundtrack? 3 Segev, Schechner, Elad, Cross-Modal Denoising

5 Noise: Very intense, Non-stationary, Unknown, Unseen source. Single microphone 4 Segev, Schechner, Elad, Cross-Modal Denoising

6 5 Input: video, very noisy audio, time (sec). Algorithm: Cross-modal, Example-Based. Output: denoised audio, for human and machine hearing. Segev, Schechner, Elad, Cross-Modal Denoising

7 6

8 7

9 8 Training Example set Input test set Segev, Schechner, Elad, Cross-Modal Denoising

10 9

11 10 ~syllable (0.25 sec) Segev, Schechner, Elad, Cross-Modal Denoising

12 11 Xylophone Segev, Schechner, Elad, Cross-Modal Denoising

13 12 Sound Xylophone Segev, Schechner, Elad, Cross-Modal Denoising

14 13... Examples Segev, Schechner, Elad, Cross-Modal Denoising

15 14... Examples Segev, Schechner, Elad, Cross-Modal Denoising

16 15... Examples Segev, Schechner, Elad, Cross-Modal Denoising

17 16... Examples Segev, Schechner, Elad, Cross-Modal Denoising

18 Cross-modal representation. 17 Generating multimodal features. Cross-modal pattern recognition. Rendering a denoised signal. Learning feature statistics. Segev, Schechner, Elad, Cross-Modal Denoising

19 18 Input video Video feature-space time (sec) Input audio Audio feature-space Segev, Schechner, Elad, Cross-Modal Denoising

20 19 Input audio-video time (sec) Audio-video feature-space Segev, Schechner, Elad, Cross-Modal Denoising

21 20 Training audio-video Audio-video examples feature-space time (sec) Segev, Schechner, Elad, Cross-Modal Denoising

22 21 Feature-space Segev, Schechner, Elad, Cross-Modal Denoising

23 22 Feature-space Segev, Schechner, Elad, Cross-Modal Denoising

24 23 Feature-space Segev, Schechner, Elad, Cross-Modal Denoising

25 24 Nearest Neighbor Feature-space Segev, Schechner, Elad, Cross-Modal Denoising

26 25 Nearest Neighbor Feature-space Segev, Schechner, Elad, Cross-Modal Denoising
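
The nearest-neighbor lookup shown on these slides can be sketched as follows. This is a minimal illustration, assuming each segment is already described by a fixed-length audio-video feature vector and that distance is the l2 norm mentioned later in the talk. The function names and the toy vectors are hypothetical, not taken from the presentation.

```python
import math

def l2_distance(a, b):
    """Euclidean (l2) distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def nearest_neighbor(query, examples):
    """Return the index of the example feature vector closest to `query`."""
    return min(range(len(examples)), key=lambda k: l2_distance(query, examples[k]))

# Hypothetical toy data: one feature vector per training segment.
example_features = [[0.0, 0.0, 1.0], [1.0, 1.0, 0.0], [0.5, 0.2, 0.9]]
input_feature = [0.9, 1.1, 0.1]

best = nearest_neighbor(input_feature, example_features)
print(best)
```

A linear scan like this costs one distance computation per example for every input segment, which is adequate for the short training sets quoted in the talk.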

27 26 Examples... Segev, Schechner, Elad, Cross-Modal Denoising

28 27 Examples... Segev, Schechner, Elad, Cross-Modal Denoising

29 28 Noisy audio Clean segment Segev, Schechner, Elad, Cross-Modal Denoising

30 29 Noisy audio Clean segment Denoised Segev, Schechner, Elad, Cross-Modal Denoising
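
Rendering the output from matched examples might look like the sketch below. It assumes the simplest possible scheme, plain concatenation of the clean audio stored with each matched example. The slides do not say how segment boundaries are handled, so any overlap or cross-fading is left out, and `render_denoised` is a hypothetical name.

```python
def render_denoised(matched_indices, clean_example_audio):
    """Concatenate the clean audio of the matched example for each input segment.

    matched_indices[t] is the example chosen for input segment t;
    clean_example_audio[k] is the list of clean samples of example k.
    """
    output = []
    for k in matched_indices:
        output.extend(clean_example_audio[k])
    return output

# Hypothetical toy data: two examples of two samples each.
clean_example_audio = [[0.1, 0.2], [0.3, 0.4]]
denoised = render_denoised([1, 0, 1], clean_example_audio)
print(denoised)
```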

31 Examples... 30 Segev, Schechner, Elad, Cross-Modal Denoising

32 31 Examples... Input... Segev, Schechner, Elad, Cross-Modal Denoising

33 32... Examples Input Segev, Schechner, Elad, Cross-Modal Denoising

34 33... Examples Input Segev, Schechner, Elad, Cross-Modal Denoising

35 34... Examples Input Segev, Schechner, Elad, Cross-Modal Denoising

36 Bartender experiment 35 Segev, Schechner, Elad, Cross-Modal Denoising

37 36... Examples Input Segev, Schechner, Elad, Cross-Modal Denoising

38 Cross-modal representation. 37 Generating multimodal features. Cross-modal pattern recognition (NN). Rendering a denoised signal. Learning feature statistics. Segev, Schechner, Elad, Cross-Modal Denoising

39 38 Feature-space Segev, Schechner, Elad, Cross-Modal Denoising

40 39 Feature-space For the k-th example segment: Segev, Schechner, Elad, Cross-Modal Denoising

41 40 Feature-space bi fif ty two ar bi -fif -ty-two For the k-th example segment: Segev, Schechner, Elad, Cross-Modal Denoising

42 41 Feature-space bi fif ty two ar [Table: transition counts, current cluster versus next cluster, over the clusters bi, ty, fif, two, ar; cell layout not recoverable from the transcript] Segev, Schechner, Elad, Cross-Modal Denoising

43 42 Syllable consecutive probability [Table: transition counts, current cluster versus next cluster, over the clusters bi, ty, fif, two, ar; cell layout not recoverable from the transcript] The probability for transition between clusters = (numerator not recoverable) / (Number of examples in training set) Segev, Schechner, Elad, Cross-Modal Denoising
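
Transition statistics of this kind can be estimated by counting consecutive cluster labels in the training sequence. The sketch below assumes each count is divided by the total number of consecutive pairs, which is one reading of the slide's denominator; normalizing per current cluster is a common alternative. The label sequence is made up for illustration.

```python
from collections import Counter

def transition_probabilities(cluster_sequence):
    """Estimate P(current -> next) from a sequence of cluster labels.

    Assumption: each pair count is divided by the total number of
    consecutive pairs in the training sequence.
    """
    pairs = list(zip(cluster_sequence, cluster_sequence[1:]))
    counts = Counter(pairs)
    total = len(pairs)
    return {pair: n / total for pair, n in counts.items()}

# Hypothetical training labels, one per syllable-length segment.
labels = ["bi", "fif", "ty", "two", "bi", "fif", "ty", "two", "ar"]
P = transition_probabilities(labels)
print(P[("bi", "fif")])
```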

44 43 Hidden Markov Model P Time delay bi fif ty two bi Segev, Schechner, Elad, Cross-Modal Denoising

45 44 P Time delay bi fif ty two bi Audio noise Segev, Schechner, Elad, Cross-Modal Denoising

46 45 Hidden Markov Model P Time delay bi fif ty two bi + Audio noise Segev, Schechner, Elad, Cross-Modal Denoising

47 46 Examples... Input... Segev, Schechner, Elad, Cross-Modal Denoising

48 47... Examples Input... Segev, Schechner, Elad, Cross-Modal Denoising

49 48... Examples Input... Segev, Schechner, Elad, Cross-Modal Denoising

50 49 Input video Segev, Schechner, Elad, Cross-Modal Denoising

51 50 Input video Segev, Schechner, Elad, Cross-Modal Denoising

52 51 Input video Vector of indices Segev, Schechner, Elad, Cross-Modal Denoising

53 52 Cost function: Regularization term, Data term Segev, Schechner, Elad, Cross-Modal Denoising

54 53 Cost function: Regularization term, Data term. Optimal vector of indices Segev, Schechner, Elad, Cross-Modal Denoising

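55 54 Examples Input... nodes, edges. Complexity: (expression not recoverable from the transcript). Complexity with Dynamic Programming: (expression not recoverable from the transcript) Segev, Schechner, Elad, Cross-Modal Denoising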
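
A dynamic-programming search for the vector of indices can be sketched as a Viterbi-style recursion. The exact data and regularization terms are not recoverable from the transcript, so this sketch assumes a generic form: `data_cost[t][k]` is the cost of assigning example k to input segment t (for instance a feature distance), and `transition_cost[j][k]` penalizes following example j with example k (for instance a negative log transition probability). Both tables, the `weight` parameter and the function name are hypothetical.

```python
def optimal_indices(data_cost, transition_cost, weight=1.0):
    """Minimise sum_t data_cost[t][k_t] + weight * sum_t transition_cost[k_{t-1}][k_t]."""
    num_steps = len(data_cost)
    num_examples = len(data_cost[0])
    best = list(data_cost[0])   # best[k]: lowest cost of any path ending at example k
    back = []                   # back[t-1][k]: best predecessor of example k at step t
    for t in range(1, num_steps):
        new_best, pointers = [], []
        for k in range(num_examples):
            j = min(range(num_examples),
                    key=lambda j: best[j] + weight * transition_cost[j][k])
            new_best.append(best[j] + weight * transition_cost[j][k] + data_cost[t][k])
            pointers.append(j)
        best = new_best
        back.append(pointers)
    # Trace the cheapest path backwards from the best final example.
    k = min(range(num_examples), key=lambda k: best[k])
    total = best[k]
    path = [k]
    for pointers in reversed(back):
        k = pointers[k]
        path.append(k)
    path.reverse()
    return path, total

# Hypothetical toy problem: 3 input segments, 2 candidate examples.
data_cost = [[0.1, 1.0], [0.6, 0.5], [1.0, 0.1]]
transition_cost = [[0.0, 1.0], [1.0, 0.0]]

path_smooth, cost_smooth = optimal_indices(data_cost, transition_cost, weight=1.0)
path_greedy, cost_greedy = optimal_indices(data_cost, transition_cost, weight=0.0)
print(path_smooth, path_greedy)
```

With `weight=0.0` the result reduces to an independent nearest-neighbor choice per segment; a positive weight lets the regularization term override a locally cheaper but inconsistent match. For T segments and K candidates this sketch runs in O(T·K²) time, compared with K^T candidate sequences for exhaustive search.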

56 55... Examples Input... Segev, Schechner, Elad, Cross-Modal Denoising

57 56... Examples Input... Segev, Schechner, Elad, Cross-Modal Denoising

58 57... Examples Input... Segev, Schechner, Elad, Cross-Modal Denoising

59 Cross-modal representation. 58 Generating multimodal features. Cross-modal pattern recognition. Rendering a denoised signal. Learning feature statistics. Segev, Schechner, Elad, Cross-Modal Denoising

60 59 Audio Features. Requirements: Sensitivity to sound perception; Dimension reduction. Speech Features: MFCCs. Music Features: Spectrogram of each segment. Visual Features. Requirements: Focusing on the motion of interest; Dimension reduction. Speech Features: DCT coefficients. Music Features: The spatial trajectory of a hitting rod. Segev, Schechner, Elad, Cross-Modal Denoising

61 60 MFCCs – Mel-frequency Cepstral Coefficients: Audio signal → Signal spectrum → Mel-frequency filter bank → log(·) → DCT → MFCCs Segev, Schechner, Elad, Cross-Modal Denoising
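
The MFCC chain named on this slide (spectrum, mel filter bank, log, DCT) can be sketched in plain Python as below. This is an illustrative sketch, not the authors' implementation: the frame length, the 20 triangular filters, the 12 coefficients and the common 2595·log10(1 + f/700) mel formula are all assumptions, and a naive DFT is used for brevity.

```python
import cmath
import math

def hz_to_mel(f):
    return 2595.0 * math.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def power_spectrum(frame):
    """Naive DFT power spectrum for bins 0..N/2 (fine for short frames)."""
    n = len(frame)
    spectrum = []
    for k in range(n // 2 + 1):
        s = sum(x * cmath.exp(-2j * math.pi * k * i / n) for i, x in enumerate(frame))
        spectrum.append(abs(s) ** 2 / n)
    return spectrum

def mel_filter_bank(num_filters, frame_len, sample_rate):
    """Triangular filters whose edges are equally spaced on the mel scale."""
    top = hz_to_mel(sample_rate / 2.0)
    edges_hz = [mel_to_hz(top * i / (num_filters + 1)) for i in range(num_filters + 2)]
    bin_hz = [k * sample_rate / frame_len for k in range(frame_len // 2 + 1)]
    bank = []
    for m in range(1, num_filters + 1):
        lo, mid, hi = edges_hz[m - 1], edges_hz[m], edges_hz[m + 1]
        row = []
        for f in bin_hz:
            if lo < f <= mid:
                row.append((f - lo) / (mid - lo))   # rising slope
            elif mid < f < hi:
                row.append((hi - f) / (hi - mid))   # falling slope
            else:
                row.append(0.0)
        bank.append(row)
    return bank

def dct2(values, num_coeffs):
    """Unnormalised DCT-II, keeping the first `num_coeffs` coefficients."""
    n = len(values)
    return [sum(v * math.cos(math.pi * c * (i + 0.5) / n) for i, v in enumerate(values))
            for c in range(num_coeffs)]

def mfcc(frame, sample_rate, num_filters=20, num_coeffs=12):
    spectrum = power_spectrum(frame)
    bank = mel_filter_bank(num_filters, len(frame), sample_rate)
    energies = [sum(w * p for w, p in zip(row, spectrum)) for row in bank]
    log_energies = [math.log(e + 1e-10) for e in energies]  # small offset avoids log(0)
    return dct2(log_energies, num_coeffs)

# Hypothetical input: a 256-sample frame of a 440 Hz tone at 8000 Hz.
frame = [math.sin(2 * math.pi * 440 * i / 8000) for i in range(256)]
coeffs = mfcc(frame, 8000)
print(len(coeffs))
```

Keeping only the first few DCT coefficients is what provides the dimension reduction listed among the audio feature requirements.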

62 61 Spectrogram of each segment Spectrogram Xylophone signal Spectrogram accumulation Segev, Schechner, Elad, Cross-Modal Denoising

63 The given movie 62... speech Segev, Schechner, Elad, Cross-Modal Denoising

64 Locking on the object of interest 63... speech Segev, Schechner, Elad, Cross-Modal Denoising

65 64... speech Extracting global motion by tracking Segev, Schechner, Elad, Cross-Modal Denoising

66 65... speech Extracting global motion by tracking Segev, Schechner, Elad, Cross-Modal Denoising

67 Extracting features 66 DCT coefficients which highly represent motion between frames speech Segev, Schechner, Elad, Cross-Modal Denoising

68 The given movie 67... Xylophone Segev, Schechner, Elad, Cross-Modal Denoising

69 Locking on the object of interest 68 Xylophone... Segev, Schechner, Elad, Cross-Modal Denoising

70 Extracting global motion by tracking 69 Xylophone... X Z Y Segev, Schechner, Elad, Cross-Modal Denoising

71 70 Xylophone... X Z Y Extracting global motion by tracking Segev, Schechner, Elad, Cross-Modal Denoising

72 Extracting features 71 Xylophone Hitting rod spatial coordinates X Y Z Segev, Schechner, Elad, Cross-Modal Denoising

73 72 Speech: A corpus of a limited number of words and syllables: digits and bar beverages. Video rate 25 fps, audio rate 8000 Hz. K-means clustering, 350 clusters. Distance measurement: l2 norm. Xylophone: A corpus of a limited number of sounds. Video rate 25 fps, audio rate 16000 Hz. Distance measurement: l2 norm. Segev, Schechner, Elad, Cross-Modal Denoising
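
The K-means clustering step under the l2 norm could be sketched with Lloyd's algorithm as below. The talk uses 350 clusters on real features; the toy points, the cluster count of 2, the fixed iteration count and the seeded random initialization here are assumptions made only to keep the example small and repeatable.

```python
import random

def squared_l2(a, b):
    """Squared Euclidean distance; same nearest centroid as the l2 norm."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def assign(points, centroids):
    """Label each point with the index of its nearest centroid."""
    return [min(range(len(centroids)), key=lambda c: squared_l2(p, centroids[c]))
            for p in points]

def kmeans(points, k, iterations=20, seed=0):
    """Lloyd's algorithm with random initial centroids drawn from the data."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(iterations):
        labels = assign(points, centroids)
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the previous centroid if a cluster is empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, assign(points, centroids)

# Hypothetical 2-D feature vectors forming two well-separated groups.
points = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
          [10.0, 10.0], [10.0, 11.0], [11.0, 10.0]]
centroids, labels = kmeans(points, k=2)
print(labels)
```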

74 73 Xylophone. Training duration: 103 sec. Testing duration: 100 sec. Music from song by GNR: SNR = 0.9. Xylophone Melody: SNR = 1. Segev, Schechner, Elad, Cross-Modal Denoising

75 74 Speech: Digits. Training duration: 60 sec. Testing duration: 240 sec. Noisy, Denoised. SNR = 0.07. Segev, Schechner, Elad, Cross-Modal Denoising

76 75 Speech: Bartender. Training duration: 48 sec. Testing duration: 350 sec. Music from song by Phil Collins, Male Speech, White Gaussian. SNR = 0.59, SNR = 0.3, SNR = 0.38. Segev, Schechner, Elad, Cross-Modal Denoising
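
To give a sense of what SNR values below 1 mean, the sketch below mixes noise into a clean signal at a chosen ratio. It assumes the quoted SNR figures are linear power ratios (signal power divided by noise power) rather than decibels; the transcript does not state the convention. The signals and function names are hypothetical.

```python
import math

def power(x):
    """Mean squared amplitude of a sample sequence."""
    return sum(v * v for v in x) / len(x)

def mix_at_snr(clean, noise, snr):
    """Scale `noise` so that power(clean) / power(scaled noise) equals `snr`, then add it."""
    gain = math.sqrt(power(clean) / (snr * power(noise)))
    return [c + gain * n for c, n in zip(clean, noise)]

# Hypothetical toy signals.
clean = [1.0, -1.0, 1.0, -1.0]
noise = [0.5, 0.5, -0.5, -0.5]

noisy = mix_at_snr(clean, noise, snr=0.07)
added = [y - c for y, c in zip(noisy, clean)]
print(power(clean) / power(added))
```

Under this assumption an SNR of 0.07 means the noise carries roughly 14 times the power of the signal.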

77 76 Input: video, very noisy audio, time (sec). Algorithm: Example-based, Hidden Markov Model. Output: denoised audio, for human and machine hearing. Segev, Schechner, Elad, Cross-Modal Denoising

