H.264/AVC and its Extensions: How Close is this Family? Wednesday, 7 November, 2007
H.264/AVC is a state-of-the-art video coding standard that has set a new benchmark for video coding efficiency. The design offers a powerful set of coding tools and provisions for network-friendly representation of the video. Building on the success of the base specification, a number of extensions have recently been developed to meet the demands of various applications. For instance, professional applications require coding of higher bit depths and color sampling formats. A scalable video representation is useful for serving a diverse set of display and networking environments or for satisfying dynamic delivery constraints imposed during transmission. Then there is multiview video coding, which aims to enable 3D video and free-viewpoint video applications. In this talk, I will provide a brief overview of the new coding tools that have been introduced in the various extensions and summarize their performance. To understand the intimate associations among this family of tools, I will then analyze the conceptual, architectural and performance relationships among them. I will also speculate on the potential business impact of these extensions and highlight the market relationships that exist. We will find that although the application space is quite broad, the current family of coding tools is rather tight-knit. I will close this talk by identifying emerging opportunities and some possibilities for new extensions, some of which might bring this family even closer and others that appear to be more divergent.
From Picture Coding to Image Understanding: Finding the Object of Interest Thursday, 8 November, 2007
From the Discrete Cosine Transform to 3D model-based coding, the progress of picture coding goes hand in hand with the progress of image understanding. Among recent image understanding techniques, topic models have become a popular approach to object discovery, i.e., extracting the "object of interest" from a set of images in a completely unsupervised manner. In this talk, we will outline this approach and extend it from still images to video, using a novel spatial-temporal framework that models both the appearance and the motion of the object of interest. The spatial and temporal models are tightly integrated so that motion ambiguities can be resolved by appearance, and appearance ambiguities can be resolved by motion. This framework finds application in video retrieval (e.g., Google Video or YouTube), video surveillance, and of course, picture coding.
DCT, Wavelets and X-lets: The Quest for Image Representation, Approximation and Compression Thursday, 8 November, 2007
Expansion of signals in orthonormal bases is central to signal and image processing. From the KLT and its approximation, the DCT, basic transform coding has been very successful. Over the last 15 years or so, wavelets have appeared as a powerful alternative to the more traditional Fourier-like representations, having an impact, for example, on image coding standards such as JPEG2000. We first briefly review Fourier and wavelet bases, and address approximation-theoretic properties, in particular the interesting behavior of certain simple non-linear approximation (NLA) schemes for piecewise smooth signals. We extend this to compression schemes, indicating the basic difference between approximation and compression. We then move to the "real" problem, namely schemes suited for true two-dimensional signals, with objects having smooth one-dimensional singularities, or contours. We review recent constructions in this area, including curvelets, contourlets and directionlets, as well as signal-adaptive schemes. The challenge of constructing generic two-dimensional bases that have optimal approximation behavior is described, and the various proposals are contrasted. In particular, the proof that contourlets can achieve the optimal 1/M^2 NLA rate will be briefly outlined. We end by pointing out areas of current research. First, a challenge is certainly finding practical schemes. This entails dealing with finite-size data, as well as sampled and possibly noisy data. Only this will allow using new bases for "real" compression tasks. In addition, new types of imagery are starting to appear, such as plenoptic images, for which true multidimensional processing will be required. The applicability of directional analysis in such cases will be discussed. This talk is based on work done with a number of collaborators, in particular B. Beferull-Lozano (UValencia), M. Do (UIUC), P. L. Dragotti (Imperial), L. Sbaiz (EPFL), P. Vandewalle (EPFL) and V. Velisavljevic (DTelekom).
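The M-term non-linear approximation idea discussed in the abstract can be illustrated with a minimal sketch: instead of keeping a fixed subset of transform coefficients (linear approximation), keep the M largest-magnitude coefficients, wherever they fall. The orthonormal Haar transform and piecewise-constant test signal below are illustrative assumptions for the sketch, not the curvelet/contourlet constructions the talk is about.

```python
# Minimal sketch of M-term non-linear approximation (NLA) in an
# orthonormal basis, using a hand-rolled 1-D Haar wavelet transform
# and a piecewise-constant signal (illustrative assumptions).
import numpy as np

def haar_forward(x):
    """Orthonormal Haar transform of a length-2^k signal."""
    x = x.astype(float).copy()
    coeffs = []
    while len(x) > 1:
        avg = (x[0::2] + x[1::2]) / np.sqrt(2)   # coarse averages
        diff = (x[0::2] - x[1::2]) / np.sqrt(2)  # detail coefficients
        coeffs.append(diff)
        x = avg
    coeffs.append(x)                              # final scaling coefficient
    return np.concatenate(coeffs[::-1])

def haar_inverse(c):
    """Invert haar_forward (exact, up to floating-point rounding)."""
    x = c[:1].copy()
    pos = 1
    while pos < len(c):
        d = c[pos:pos + len(x)]
        up = np.empty(2 * len(x))
        up[0::2] = (x + d) / np.sqrt(2)
        up[1::2] = (x - d) / np.sqrt(2)
        x, pos = up, pos + len(d)
    return x

# Piecewise-constant signal: the wavelet basis concentrates its energy
# in the few detail coefficients straddling the jump.
n = 256
signal = np.where(np.arange(n) < 100, 1.0, -0.5)
c = haar_forward(signal)

# Non-linear approximation: keep only the M largest-magnitude coefficients.
M = 16
keep = np.argsort(np.abs(c))[-M:]
c_nla = np.zeros_like(c)
c_nla[keep] = c[keep]
err = np.linalg.norm(signal - haar_inverse(c_nla))  # essentially zero here
```

For this signal only a handful of coefficients are nonzero, so the 16-term NLA is essentially exact; a fixed (linear) choice of 16 coefficients would not be. The abstract's point is that for 2-D signals with smooth contours, separable wavelets lose this advantage, motivating curvelets and contourlets.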
Efficient Representation of Sound Images: Recent Developments in Parametric Coding of Spatial Audio Friday, 9 November, 2007
As with pictures, humans speak of a "sound image" when they try to characterize an acoustic scene with salient spatial aspects. This talk will review the basic aspects of stereophonic/multi-channel audio that determine the perceived sound image and will outline how these aspects can be represented efficiently. One of the most remarkable innovations in this context was the recent development of the "Spatial Audio Coding" (SAC) approach. By exploiting human perception of spatial sound, such coding schemes are capable of transmitting high-quality surround sound at bitrates that until now have been used to carry traditional two-channel stereo audio. The talk will outline the underlying ideas and describe the architecture of the recently finalized "MPEG Surround" specification. Equipped with a set of attractive capabilities, the technology enables the introduction of surround sound into existing distribution infrastructures while retaining full compatibility with mono or stereo receivers. Finally, an outlook is provided on a next generation of technology envisaged for standardization within ISO/MPEG, allowing for bit-efficient and backward-compatible coding of several sound objects.
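The parametric principle behind spatial audio coding can be sketched in a few lines: transmit a single downmix plus a handful of compact spatial cues, and re-synthesize the spatial image at the decoder. The block size and the single inter-channel level difference cue below are illustrative assumptions for a stereo-to-mono toy example; the actual MPEG Surround tool set (multi-channel, multiple cue types, time-frequency processing) is far richer.

```python
# Toy sketch of parametric spatial audio coding (illustrative assumptions):
# encode stereo as a mono downmix plus one inter-channel level difference
# (ILD) cue per block; the decoder re-pans the downmix from the cues.
import numpy as np

BLOCK = 256  # illustrative block size

def encode(left, right):
    """Downmix to mono and extract one ILD cue (in dB) per block."""
    downmix = (left + right) / 2
    cues = []
    for i in range(0, len(left), BLOCK):
        el = np.sum(left[i:i + BLOCK] ** 2) + 1e-12   # left-channel energy
        er = np.sum(right[i:i + BLOCK] ** 2) + 1e-12  # right-channel energy
        cues.append(10 * np.log10(el / er))           # level difference in dB
    return downmix, np.array(cues)

def decode(downmix, cues):
    """Re-distribute the mono downmix so each block matches its ILD cue."""
    left = np.empty_like(downmix)
    right = np.empty_like(downmix)
    for j, i in enumerate(range(0, len(downmix), BLOCK)):
        ratio = 10 ** (cues[j] / 10)            # energy ratio E_L / E_R
        gl = np.sqrt(2 * ratio / (1 + ratio))   # left gain
        gr = np.sqrt(2 / (1 + ratio))           # right gain (gl^2 + gr^2 = 2)
        left[i:i + BLOCK] = gl * downmix[i:i + BLOCK]
        right[i:i + BLOCK] = gr * downmix[i:i + BLOCK]
    return left, right

# Example: a sine panned toward the left (right channel at quarter amplitude).
t = np.arange(1024)
left_in = np.sin(0.1 * t)
right_in = 0.25 * left_in
dm, cues = encode(left_in, right_in)
left_out, right_out = decode(dm, cues)
```

The point of the sketch is the bitrate arithmetic: the cues amount to a few values per block, so the side information is tiny compared with a second full audio channel, which is what lets surround sound ride on a stereo-rate (or here, mono-rate) downmix.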