
New stuff
- People were interested in more detailed spatial information about media captures
- Added area of capture and point of capture attributes
- Also addresses the multi-view use case
- Offers a more flexible way of associating audio with video
- Removed the “linear array” audio type; area of capture replaces it

Other topics to consider
Framework has these in an appendix, to be discussed:
- VAD (voice activity detection)
- Media source selection (e.g. from a roster)
- Composition and switching algorithms for audio and video

Composition/Switching Algorithms
- Framework has simple boolean attributes for indicating that a Media Capture is switched or composed.
- Is this enough? If not, what else do we need?
  - Another use case to make it clear?
  - More detailed indications about exactly how a capture is switched or composed?
  - Anything else?
- Interested people should propose specific additions to the framework

Attributes (extensibility)
Audio attributes
- Channel Format
  - Stereo
  - Mono
Video attributes
- Spatial scale
  - Image width
Media Capture attributes
- Purpose (role)
  - Main
  - Presentation
- Mixed – true/false
- Auto switched – true/false
- Area of Capture - ranges
- Point of Capture - point
- Area Scale - millimeters
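
To make the attribute list concrete, here is a minimal sketch of how the attributes could be held in memory. The class and field names (MediaCapture, AudioCapture, VideoCapture, capture_id and so on) are hypothetical and chosen only for illustration; the slides do not define a data model or encoding, and only the x range of the area of capture is modeled.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class MediaCapture:
    """Attributes common to every media capture (hypothetical names)."""
    capture_id: str
    purpose: str = "main"              # "main" or "presentation"
    mixed: bool = False                # Mixed: true/false
    auto_switched: bool = False        # Auto switched: true/false
    # Area of capture as an (xBegin, xEnd) range; y ranges omitted in this sketch.
    area_of_capture: Optional[Tuple[float, float]] = None
    # Point of capture as an (x, y) point.
    point_of_capture: Optional[Tuple[float, float]] = None
    area_scale: str = "millimeters"


@dataclass
class AudioCapture(MediaCapture):
    channel_format: str = "mono"       # "mono" or "stereo"


@dataclass
class VideoCapture(MediaCapture):
    image_width: Optional[float] = None  # spatial scale: image width


# Assumption: the switched, composed capture VC5 sets both boolean attributes.
vc5 = VideoCapture("VC5", mixed=True, auto_switched=True, area_of_capture=(0, 300))
ac0 = AudioCapture("AC0", channel_format="stereo", area_of_capture=(0, 300))
print(vc5)
print(ac0)
```

New attributes would be added as further fields, which is one simple way to read the extensibility goal.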

Capture Scene
[Figure: cameras facing people in one capture scene, shown three ways]
- Three cameras: VC0, VC1, VC2
- Two cameras, moved & zoomed out: VC3, VC4
- Switched (based on voice) with composed PiP: VC5

Capture Scene: area of capture and point of capture
[Figure: the capture scene along an x axis marked at x = 0, 100, 200, 300]
Area of capture:
- VC0: xBegin=0, xEnd=100
- VC1: xBegin=100, xEnd=200
- VC2: xBegin=200, xEnd=300
- VC3: xBegin=0, xEnd=150
- VC4: xBegin=150, xEnd=300
- VC5: xBegin=0, xEnd=300
Point of capture labels in the figure: x = 50, x = 150, x = 250, x = 150

Capture Set
Each alternative representation of a Capture Scene is a row in a Capture Set.
Capture Set Rows:
- (VC0, VC1, VC2): three cameras
- (VC3, VC4): two cameras, moved and zoomed out
- (VC5): switched (based on voice), composed PiP
- (AC0)
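
As an illustration of rows being alternative representations of the same scene, the sketch below stores the three video rows and checks that each one spans the same x extent. The area values are read from the area of capture slide, and the assignment of the 0 to 150 and 150 to 300 ranges to VC3 and VC4 is an assumption based on the order of the labels. The names areas, capture_set and row_extent are hypothetical.

```python
# Area of capture per video capture as (xBegin, xEnd).
areas = {
    "VC0": (0, 100), "VC1": (100, 200), "VC2": (200, 300),
    "VC3": (0, 150), "VC4": (150, 300),   # assumed mapping for VC3 and VC4
    "VC5": (0, 300),
}

# Each row is one alternative representation of the capture scene.
capture_set = [
    ("VC0", "VC1", "VC2"),   # three cameras
    ("VC3", "VC4"),          # two cameras, moved and zoomed out
    ("VC5",),                # switched (based on voice), composed PiP
]


def row_extent(row):
    """Return the (smallest xBegin, largest xEnd) covered by a row."""
    begins = [areas[capture][0] for capture in row]
    ends = [areas[capture][1] for capture in row]
    return (min(begins), max(ends))


for row in capture_set:
    print(row, row_extent(row))
```

With these values every row covers x = 0 to 300, which is what lets a receiver treat the rows as interchangeable views of one scene.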

Video Capture Adjacency
[Figure: two arrangements of cameras VC0 and VC1 facing people, each marked left and right, on an x axis marked at x = 0, 100, 200; further labels x = 50, x = 100, x = 150]
Capture Set: (VC0, VC1), plus other capture set rows

Example with Field of View 1
Field of view angle can be calculated from the area of capture and point of capture attributes.
[Figure: two cameras; x runs along a straight line, y is distance from camera; a marks the field of view angle]
- First capture: xBegin=0, xEnd=1346, point of capture = (673,0)
- Second capture: xBegin=1446, xEnd=2792, point of capture = (2119,0)
- yBegin=3000, yEnd=3000
Angle a = 2 * arctan((1346/2) / 3000) = 25.3°
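
The calculation on this slide can be sketched as a small function. This is an illustration only: the function name and parameters are hypothetical, and it assumes a flat area of capture at a single y distance (yBegin equal to yEnd), with x measured along a straight line as in this example.

```python
import math


def field_of_view_deg(x_begin, x_end, y_area, point_of_capture):
    """Horizontal field of view in degrees.

    Measures the angle at the point of capture between the lines to the
    two ends of the area of capture, which lies at distance y_area.
    """
    px, py = point_of_capture
    left = math.atan2(x_begin - px, y_area - py)
    right = math.atan2(x_end - px, y_area - py)
    return math.degrees(right - left)


first = field_of_view_deg(0, 1346, 3000, (673, 0))
second = field_of_view_deg(1446, 2792, 3000, (2119, 0))
print(round(first, 1), round(second, 1))   # 25.3 25.3
```

When the point of capture is centered on the area, this reduces to the slide's formula, 2 * arctan((width/2) / distance).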

Example with Field of View 2
[Figure: x runs along an arc, y is distance from camera; a marks the field of view angle]
Point of capture = (1396,0)
- xBegin=0, xEnd=1346, yBegin=3000, yEnd=3000
- xBegin=1446, xEnd=2792, yBegin=3000, yEnd=3000

Matching Audio with Video
- Same capture scene
- Video adjacency matches audio sound stage
- Rendering side uses Area of Capture attributes to match the audio with the video

Matching Audio with Video
[Figure: spatial extent of video and spatial extent of audio, left to right]
Spatial extent of video:
- VC0: x = 0 to 100
- VC1: x = 100 to 200
- VC2: x = 200 to 300
Spatial extent of audio, two alternatives:
- One stereo AC: x = 0 to 300
- Three mono ACs: x = 0 to 100, x = 100 to 200, x = 200 to 300
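
One way a rendering side might use these extents is to pair each audio capture with every video capture whose area of capture overlaps it. The sketch below is an assumption about how such matching could work, not a procedure defined by the slides; the function names and the audio capture identifiers AC0, AC1, AC2 for the mono case are hypothetical.

```python
def overlap(a, b):
    """Length of the overlap between two (xBegin, xEnd) ranges, 0 if none."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]))


def match_audio_to_video(audio, video):
    """Map each audio capture to the video captures it overlaps in x."""
    return {
        ac: [vc for vc, v_area in video.items() if overlap(a_area, v_area) > 0]
        for ac, a_area in audio.items()
    }


video = {"VC0": (0, 100), "VC1": (100, 200), "VC2": (200, 300)}
one_stereo = {"AC0": (0, 300)}
three_mono = {"AC0": (0, 100), "AC1": (100, 200), "AC2": (200, 300)}

print(match_audio_to_video(one_stereo, video))
print(match_audio_to_video(three_mono, video))
```

The stereo capture spans all three video captures, while each mono capture lines up with exactly one, so audio can be rendered near the matching display in either case.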

Supporting the use cases
3.1 point to point symmetric
- Different number of audio channels on each side
- Different number of video and audio channels
- Match the sound stage with video display
- Handle gaps/overlap between captures
- Audio levels match
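
For the gaps/overlap item, a receiver could compare the area of capture of neighbouring captures. The sketch below is a hypothetical helper, not something the slides specify; it uses the x ranges from the field of view example (0 to 1346 and 1446 to 2792), with the capture names "first" and "second" invented for illustration.

```python
def gaps_and_overlaps(areas):
    """Classify each pair of neighbouring captures, ordered by xBegin.

    Returns tuples of (left_id, right_id, kind, size), where kind is
    "gap", "overlap" or "adjacent".
    """
    ordered = sorted(areas.items(), key=lambda item: item[1][0])
    result = []
    for (left_id, (_, left_end)), (right_id, (right_begin, _)) in zip(ordered, ordered[1:]):
        delta = right_begin - left_end
        if delta > 0:
            kind = "gap"
        elif delta < 0:
            kind = "overlap"
        else:
            kind = "adjacent"
        result.append((left_id, right_id, kind, abs(delta)))
    return result


fov_example = {"first": (0, 1346), "second": (1446, 2792)}
three_cameras = {"VC0": (0, 100), "VC1": (100, 200), "VC2": (200, 300)}

print(gaps_and_overlaps(fov_example))
print(gaps_and_overlaps(three_cameras))
```

In the field of view example the two areas leave a gap of 100 units between them, while the three-camera row is exactly adjacent.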

Supporting the use cases
3.2 point to point asymmetric
- Send subset of available streams
- Allow some user choice
- Sender does composition into one stream
- Receiver does composition of multiple streams onto one display

Supporting the use cases
3.3 multipoint
- Site switching
- Segment switching
- Still need work on VAD
- Switch based on manual control
- Composing reduced image sizes (continuous presence)

Supporting the use cases
3.4 presentation
- Video/audio streams for presentation
- Multiple presentation streams
- BFCP-like control of multiple streams (not in CLUE scope?)
- Consistent placement of multiple streams at each site

Supporting the use cases
3.5 Heterogeneous systems
- Transcoding middlebox
- Single or multiple streams
- Different bit rates
- Different layout policies
- Not settled yet

Supporting the use cases
3.6 Multipoint education
- Multiple streams with different roles (different scenes)
- Placing video on correct screen
- Still need work on VAD
- Requesting a stream from a particular site

Supporting the use cases
3.7 Multipoint multiview
- Different views of same scene
- Assigning camera views to remote displays for best eye contact

Addressing requirements
- Summary of whether or not items from the requirements document are met
