H.264/AVC and its Extensions: How Close is this Family? Wednesday, 7 November, 2007
H.264/AVC is a state-of-the-art video coding standard that has set a new benchmark for video coding efficiency. The design offers a powerful set of coding tools and provisions for network-friendly representation of the video. Building on the success of the base specification, a number of extensions have recently been developed to meet the demands of various applications. For instance, professional applications require coding of higher bit depths and color sampling formats. A scalable video representation is useful for serving a diverse set of display and networking environments or for satisfying dynamic delivery constraints imposed during transmission. Then there is multiview video coding, which aims to enable 3D video and free-viewpoint video applications. In this talk, I will provide a brief overview of the new coding tools that have been introduced in the various extensions and summarize their performance. To understand the intimate associations among this family of tools, I will then analyze the conceptual, architectural and performance relationships among them. I will also speculate on the potential business impact of these extensions and highlight the market relationships that exist. We will find that although the application space is quite broad, the current family of coding tools is rather tight-knit. I will close this talk by identifying emerging opportunities and some possibilities for new extensions, some of which might bring this family even closer and others that appear to be more divergent.
From Picture Coding to Image Understanding: Finding the Object of Interest Thursday, 8 November, 2007
From the Discrete Cosine Transform to 3D model-based coding, the progress of picture coding goes hand in hand with the progress of image understanding. Among recent image understanding techniques, topic models have become a popular approach to object discovery, i.e., extracting the "object of interest" from a set of images in a completely unsupervised manner. In this talk, we will outline this approach and extend it from still images to video, using a novel spatial-temporal framework that models both the appearance and the motion of the object of interest. The spatial and temporal models are tightly integrated so that motion ambiguities can be resolved by appearance, and appearance ambiguities can be resolved by motion. This framework finds application in video retrieval (e.g., Google Video or YouTube), video surveillance, and of course, picture coding.
DCT, Wavelets and X-lets: The Quest for Image Representation, Approximation and Compression Thursday, 8 November, 2007
Expansion of signals in orthonormal bases is central to signal and image processing. From the KLT and its approximation, the DCT, basic transform coding has been very successful. Over the last 15 years or so, wavelets have appeared as a powerful alternative to the more traditional Fourier-like representations, having an impact, for example, on image coding standards such as JPEG2000. We first briefly review Fourier and wavelet bases, and address approximation-theoretic properties, in particular the interesting behavior of certain simple non-linear approximation (NLA) schemes for piecewise smooth signals. We extend this to compression schemes, indicating the basic difference between approximation and compression. We then move to the "real" problem, namely schemes suited for true two-dimensional signals, with objects having smooth one-dimensional singularities, or contours. We review recent constructions in this area, including curvelets, contourlets and directionlets, as well as signal-adaptive schemes. The challenge of constructing generic two-dimensional bases that have optimal approximation behavior is described, and the various proposals are contrasted. In particular, the proof that contourlets can achieve the optimal 1/M^2 NLA rate will be briefly outlined. We end by pointing out areas of current research. First, a challenge is certainly finding practical schemes. This entails dealing with finite-size data, as well as sampled and possibly noisy data. Only this will allow using new bases for "real" compression tasks. In addition, new types of imagery are starting to appear, such as plenoptic images, for which true multidimensional processing will be required. The applicability of directional analysis in such cases will be discussed. This talk is based on work done with a number of collaborators, in particular B. Beferull-Lozano (UValencia), M. Do (UIUC), P. L. Dragotti (Imperial), L. Sbaiz (EPFL), P. Vandewalle (EPFL) and V. Velisavljevic (DTelekom).
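The M-term non-linear approximation idea discussed in the abstract can be illustrated with a minimal sketch: instead of keeping a fixed subset of transform coefficients (linear approximation), keep the M largest-magnitude coefficients, wherever they fall. The orthonormal Haar transform and piecewise-constant test signal below are illustrative assumptions for the sketch, not the curvelet/contourlet constructions the talk is about.

```python
# Minimal sketch of M-term non-linear approximation (NLA) in an
# orthonormal basis, using a hand-rolled 1-D Haar wavelet transform
# and a piecewise-constant signal (illustrative assumptions).
import numpy as np

def haar_forward(x):
    """Orthonormal Haar transform of a length-2^k signal."""
    x = x.astype(float).copy()
    coeffs = []
    while len(x) > 1:
        avg = (x[0::2] + x[1::2]) / np.sqrt(2)   # coarse averages
        diff = (x[0::2] - x[1::2]) / np.sqrt(2)  # detail coefficients
        coeffs.append(diff)
        x = avg
    coeffs.append(x)                              # final scaling coefficient
    return np.concatenate(coeffs[::-1])

def haar_inverse(c):
    """Invert haar_forward (exact, up to floating-point rounding)."""
    x = c[:1].copy()
    pos = 1
    while pos < len(c):
        d = c[pos:pos + len(x)]
        up = np.empty(2 * len(x))
        up[0::2] = (x + d) / np.sqrt(2)
        up[1::2] = (x - d) / np.sqrt(2)
        x, pos = up, pos + len(d)
    return x

# Piecewise-constant signal: the wavelet basis concentrates its energy
# in the few detail coefficients straddling the jump.
n = 256
signal = np.where(np.arange(n) < 100, 1.0, -0.5)
c = haar_forward(signal)

# Non-linear approximation: keep only the M largest-magnitude coefficients.
M = 16
keep = np.argsort(np.abs(c))[-M:]
c_nla = np.zeros_like(c)
c_nla[keep] = c[keep]
err = np.linalg.norm(signal - haar_inverse(c_nla))  # essentially zero here
```

For this signal only a handful of coefficients are nonzero, so the 16-term NLA is essentially exact; a fixed (linear) choice of 16 coefficients would not be. The abstract's point is that for 2-D signals with smooth contours, separable wavelets lose this advantage, motivating curvelets and contourlets.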
Efficient Representation of Sound Images: Recent Developments in Parametric Coding of Spatial Audio Friday, 9 November, 2007
As with pictures, humans speak of a "sound image" when they try to characterize an acoustic scene with salient spatial aspects. This talk will review the basic aspects of stereophonic/multi-channel audio that determine the perceived sound image and will outline how these aspects can be represented efficiently. One of the most remarkable innovations in this context was the recent development of the "Spatial Audio Coding" (SAC) approach. By exploiting human perception of spatial sound, such coding schemes are capable of transmitting high-quality surround sound at bitrates that until now have been used to carry traditional two-channel stereo audio. The talk will outline the underlying ideas and describe the architecture of the recently finalized "MPEG Surround" specification. Equipped with a set of attractive capabilities, the technology enables the introduction of surround sound into existing distribution infrastructures while retaining full compatibility with mono or stereo receivers. Finally, an outlook is provided on a next generation of technology envisaged for standardization within ISO/MPEG, allowing for bit-efficient and backward-compatible coding of several sound objects.
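The parametric principle behind spatial audio coding can be sketched in a few lines: transmit a single downmix plus a handful of compact spatial cues, and re-synthesize the spatial image at the decoder. The block size and the single inter-channel level difference cue below are illustrative assumptions for a stereo-to-mono toy example; the actual MPEG Surround tool set (multi-channel, multiple cue types, time-frequency processing) is far richer.

```python
# Toy sketch of parametric spatial audio coding (illustrative assumptions):
# encode stereo as a mono downmix plus one inter-channel level difference
# (ILD) cue per block; the decoder re-pans the downmix from the cues.
import numpy as np

BLOCK = 256  # illustrative block size

def encode(left, right):
    """Downmix to mono and extract one ILD cue (in dB) per block."""
    downmix = (left + right) / 2
    cues = []
    for i in range(0, len(left), BLOCK):
        el = np.sum(left[i:i + BLOCK] ** 2) + 1e-12   # left-channel energy
        er = np.sum(right[i:i + BLOCK] ** 2) + 1e-12  # right-channel energy
        cues.append(10 * np.log10(el / er))           # level difference in dB
    return downmix, np.array(cues)

def decode(downmix, cues):
    """Re-distribute the mono downmix so each block matches its ILD cue."""
    left = np.empty_like(downmix)
    right = np.empty_like(downmix)
    for j, i in enumerate(range(0, len(downmix), BLOCK)):
        ratio = 10 ** (cues[j] / 10)            # energy ratio E_L / E_R
        gl = np.sqrt(2 * ratio / (1 + ratio))   # left gain
        gr = np.sqrt(2 / (1 + ratio))           # right gain (gl^2 + gr^2 = 2)
        left[i:i + BLOCK] = gl * downmix[i:i + BLOCK]
        right[i:i + BLOCK] = gr * downmix[i:i + BLOCK]
    return left, right

# Example: a sine panned toward the left (right channel at quarter amplitude).
t = np.arange(1024)
left_in = np.sin(0.1 * t)
right_in = 0.25 * left_in
dm, cues = encode(left_in, right_in)
left_out, right_out = decode(dm, cues)
```

The point of the sketch is the bitrate arithmetic: the cues amount to a few values per block, so the side information is tiny compared with a second full audio channel, which is what lets surround sound ride on a stereo-rate (or here, mono-rate) downmix.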