Published by Francine Weaver. Modified over 9 years ago.
1
VAD in CLUE
Andy Pepperell
2
Need for VAD (voice activity detection)
Want middle boxes to be able to switch video/audio without having to decode all audio
  – Not all MCUs are fully transcoding!
Want to be able to determine the “active” video for intra-room segment switching
VAD algorithms must be consistent in:
  – Categorization of audio media
  – Calculation of energy values (dB)
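Energy values are only comparable across providers if every provider computes them the same way. One possible definition is sketched below for 16-bit PCM frames. It follows the audio-level convention of RFC 6464 (level expressed as -dBov in the range 0 to 127, where 0 is loudest and 127 is silence). Taking the RMS over the whole frame and the function name `frame_level_dbov` are illustrative assumptions. The slides do not specify them.

```python
import math
from typing import Sequence

# Assumption: 16-bit signed PCM, so full scale is 2**15.
FULL_SCALE = 32768.0


def frame_level_dbov(samples: Sequence[int]) -> int:
    """Return the frame level as -dBov, clamped to 0..127 (0 = loudest).

    Hypothetical helper: RMS over the frame, converted to dB relative
    to full scale, then negated so the value is non-negative.
    """
    if not samples:
        return 127  # treat an empty frame as silence
    mean_square = sum(s * s for s in samples) / len(samples)
    if mean_square == 0:
        return 127  # digital silence: log10(0) is undefined
    rms = math.sqrt(mean_square)
    dbov = 20 * math.log10(rms / FULL_SCALE)  # <= 0 for in-range samples
    return max(0, min(127, round(-dbov)))


print(frame_level_dbov([16384, -16384] * 80))  # 6 (half amplitude is about -6 dBov)
```

Because the output is a small integer on a fixed scale, a middle box can compare levels from different providers directly without decoding and re-measuring the audio.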
3
Single vs multiple audio streams
Some consumers may receive a single, pre-mixed audio stream from the provider, whereas others may receive multiple separate streams in a linear array
  – Want parity between these two cases so that all consumers are equally capable
Single-speaker rooms should not be disadvantaged for segment switching
4
Capture set example

Media capture(s)   Description
VC0, VC1, VC2      3-camera view of room
VC3                1-camera view of room
AC0, AC1, AC2      3-microphone version of room audio
AC3                Single pre-mixed version of room audio

A consumer that chooses to receive and render AC3 should be able to switch between VC0, VC1 and VC2 just as well as one that chooses to receive AC0, AC1 and AC2 separately.
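The switching decision for the multi-stream consumer can be modelled directly from the capture set. The sketch below covers the consumer that receives AC0, AC1 and AC2 separately. It makes three assumptions: each microphone capture is spatially paired with the camera capture of the same index (AC0 with VC0, and so on), each stream reports a level in -dBov, and a lower value means louder. The pairing table and the function name are hypothetical and are not defined by CLUE.

```python
from typing import Dict

# Hypothetical pairing of each separate microphone capture with the
# camera segment covering the same part of the room.
AUDIO_TO_VIDEO: Dict[str, str] = {"AC0": "VC0", "AC1": "VC1", "AC2": "VC2"}


def active_video_capture(levels_neg_dbov: Dict[str, int]) -> str:
    """Pick the video capture paired with the loudest audio capture.

    Levels are -dBov (0 = loudest, 127 = silence), so the loudest
    capture has the smallest value. Unpaired captures (e.g. AC3) are ignored.
    """
    candidates = {ac: lvl for ac, lvl in levels_neg_dbov.items() if ac in AUDIO_TO_VIDEO}
    if not candidates:
        raise ValueError("no levels for any paired audio capture")
    loudest = min(candidates, key=candidates.__getitem__)
    return AUDIO_TO_VIDEO[loudest]


print(active_video_capture({"AC0": 20, "AC1": 45, "AC2": 60}))  # VC0
```

A consumer that renders only AC3 has no per-microphone levels to compare in this way. This is why the pre-mixed stream needs to carry extra VAD information for the two consumers to reach parity.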
5
Details
Idea is to include (potentially multiple) active-position information with the VAD, as well as an “overall” VAD for the audio stream
  – For example, if the leftmost of 3 segments is the “loudest”, the audio stream would indicate a specific audio VAD value at that active position
  – The centre of the left segment might be position 16 (out of 100)
Active positions could also be determined by other means, e.g. a button press
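The arithmetic behind "the centre of the left segment might be 16" is sketched below. The sketch assumes that active positions are integers on a 0 to 100 scale across the room width, that the N segments have equal width, and that levels use the -dBov convention (0 = loudest). These assumptions and all the names (`segment_centre`, `StreamVad`, and so on) are illustrative. The slides do not define them.

```python
from dataclasses import dataclass, field
from typing import List, Optional


def segment_centre(index: int, num_segments: int) -> int:
    """Position (0-100 scale) of the centre of segment `index`, rounded down."""
    if num_segments <= 0 or not 0 <= index < num_segments:
        raise ValueError("invalid segment index or count")
    # Integer form of (index + 0.5) * 100 / num_segments, floored.
    return (200 * index + 100) // (2 * num_segments)


def segment_for_position(position: int, num_segments: int) -> int:
    """Index of the equal-width segment containing `position` (0-100)."""
    if num_segments <= 0 or not 0 <= position <= 100:
        raise ValueError("invalid position or segment count")
    # Position 100 is the right-hand edge; clamp it into the last segment.
    return min(num_segments - 1, position * num_segments // 100)


@dataclass
class ActivePosition:
    position: int  # 0-100 across the room width (assumed scale)
    level: int     # -dBov, 0 = loudest (assumed encoding)


@dataclass
class StreamVad:
    overall_level: int  # "overall" VAD value for the whole stream
    positions: List[ActivePosition] = field(default_factory=list)


def active_segment(vad: StreamVad, num_segments: int) -> Optional[int]:
    """Segment holding the loudest active position, or None if none were sent."""
    if not vad.positions:
        return None
    loudest = min(vad.positions, key=lambda p: p.level)
    return segment_for_position(loudest.position, num_segments)


print([segment_centre(i, 3) for i in range(3)])                # [16, 50, 83]
print(active_segment(StreamVad(20, [ActivePosition(16, 20)]), 3))  # 0
```

Here 16 is 100/6 rounded down. A consumer that receives only the pre-mixed stream can therefore map the reported position back to the leftmost segment (VC0 in the capture set example) without separate microphone streams.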
6
Messaging implications
In the stream configure message from consumer to provider, the consumer should be able to specify VAD characteristics:
  – Algorithm for the provider to use, if a choice is available
  – Maximum number of active positions to include in the provider’s audio
Need to consider the security of VAD information
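The two VAD characteristics listed for the stream configure message can be sketched from the provider's side. In this sketch the consumer's requested algorithm is honoured only if the provider supports it, and the reported active positions are trimmed to the requested maximum, keeping the loudest ones. The field names, the fallback to the provider's first supported algorithm, and the algorithm names are all hypothetical. CLUE message syntax is not modelled here.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple


@dataclass
class VadConfig:
    """VAD characteristics a consumer might request (hypothetical fields)."""
    algorithm: Optional[str]   # preferred algorithm name, if any
    max_active_positions: int  # upper bound on positions per report


def choose_algorithm(requested: Optional[str], supported: Sequence[str]) -> str:
    """Use the consumer's choice if supported, else the provider's default (first entry)."""
    if not supported:
        raise ValueError("provider supports no VAD algorithms")
    if requested is not None and requested in supported:
        return requested
    return supported[0]


def limit_positions(positions: List[Tuple[int, int]], max_n: int) -> List[Tuple[int, int]]:
    """Keep at most max_n (position, level) pairs, loudest first (lowest -dBov)."""
    if max_n <= 0:
        return []
    return sorted(positions, key=lambda p: p[1])[:max_n]


cfg = VadConfig(algorithm="example-vad-2", max_active_positions=1)
print(choose_algorithm(cfg.algorithm, ["example-vad-1", "example-vad-2"]))     # example-vad-2
print(limit_positions([(16, 40), (50, 12), (83, 70)], cfg.max_active_positions))  # [(50, 12)]
```

Capping the number of positions also limits how much information about the room's speakers leaves the provider. This is one reason the security of VAD information deserves attention.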