
AUDIO AND VIDEO COMMUNICATION (COMUNICAÇÃO DE ÁUDIO E VÍDEO)

INSTITUTO SUPERIOR TÉCNICO

Year 2012/2013 – 1st Semester, Responsible: Prof. Fernando Pereira

1st Exam – 9th January 2013, 8am (Wednesday)

MEEC: The marks should be published on the CAV web page before 2pm on 11th January (Friday), and the exam checking session will take place on 11th January (Friday) at 5pm in room LT4.

The exam is 3 hours long. Answer all the questions in detail, including all the computations performed, and justify your answers well.

Don’t get ‘trapped’ by any question; move on to another question and return to it later. Good luck!

I (0.5 + 0.5 + 1.0 + 1.0 = 3.0 val.)

Consider a facsimile transmission using the READ coding method at 3200 bit/s for pages with 1000 lines, each line with 1728 samples. Consider also that, on average, 75% of the samples in each line are white.

Assume that

1. the unidimensionally coded lines have an average compression factor of 15 for the black runs and 25 for the white runs

2. the bidimensionally coded lines have an average compression factor of 22 for the black runs and 30 for the white runs

a) How many bits do a unidimensionally and a bidimensionally coded line spend on average? (R: 80.64 and 62.84 bit/line)

b) If the k parameter of the READ method is 5, what is the average period (in bits) at which decoding synchronization is recovered? (R: 332 bit)
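The arithmetic behind a) and b) can be checked with a short sketch. It only uses the data in the statement: 75% of the 1728 samples are white, compression factors apply per colour, and with parameter k one unidimensionally coded line is followed by k − 1 bidimensionally coded lines, so synchronization is recovered once per group of k lines.

```python
SAMPLES_PER_LINE = 1728
WHITE_FRACTION = 0.75

def bits_per_line(cf_black: float, cf_white: float) -> float:
    """Average bits for one coded line, given the black- and white-run compression factors."""
    white = SAMPLES_PER_LINE * WHITE_FRACTION        # 1296 white samples
    black = SAMPLES_PER_LINE * (1 - WHITE_FRACTION)  # 432 black samples
    return black / cf_black + white / cf_white

bits_1d = bits_per_line(cf_black=15, cf_white=25)  # unidimensionally coded line
bits_2d = bits_per_line(cf_black=22, cf_white=30)  # bidimensionally coded line

def sync_period_bits(k: int) -> float:
    """Bits between two 1-D coded lines: one 1-D line plus (k - 1) 2-D lines."""
    return bits_1d + (k - 1) * bits_2d

print(f"{bits_1d:.2f} {bits_2d:.2f}")  # 80.64 62.84
print(round(sync_period_bits(5)))      # 332
```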

c) Provide a formula for the global compression factor (of a full page) as a function of the parameter k only.
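One possible way to reach the formula in c), sketched under the assumption that the number of lines per page (1000) is a multiple of k. The page is then a sequence of identical groups of one 1-D line and k − 1 2-D lines, so the number of lines cancels and CF(k) = 1728·k / (80.64 + (k − 1)·62.84). The constants below are the results of a); the function name is ours.

```python
SAMPLES_PER_LINE = 1728
BITS_1D = 432 / 15 + 1296 / 25   # 80.64 bit/line (result of a)
BITS_2D = 432 / 22 + 1296 / 30   # ~62.84 bit/line (result of a)

def global_compression_factor(k: int) -> float:
    """Page compression factor for READ with parameter k.

    Assumes the lines per page are a multiple of k, so the page is made of
    identical groups of 1 one-dimensional + (k - 1) two-dimensional lines.
    """
    raw_bits = k * SAMPLES_PER_LINE                 # uncoded bits in one group
    coded_bits = BITS_1D + (k - 1) * BITS_2D        # coded bits in one group
    return raw_bits / coded_bits

for k in (1, 2, 4, 5, 8):
    print(k, round(global_compression_factor(k), 2))
```

Since a 2-D line costs fewer bits than a 1-D line, CF(k) grows with k, from 1728/80.64 for k = 1 towards the limit 1728/62.84 for very large k, at the price of less frequent resynchronization.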

d) Why does bilevel image coding typically follow a lossless approach?

II (0.5 + 0.5 + 0.5 + 1.0 + 1.0 = 3.5 val.)

Consider the JPEG standard to code photographic images with a 576×720 luminance resolution, 4:2:2 colour subsampling and 8 bit/sample.

a) How many more luminance blocks than chrominance blocks exist in this type of image? (R: the same number)

b) Determine the average number of bits per pixel (considering both the luminance and the chrominances) that are spent when coding this type of image with a global compression factor (for the luminance and the chrominances together) of 20. (R: 0.8 bit/pixel)

c) Determine the total number of bits that have to be spent to code an image if, on average, 4 DCT coefficients are coded per block and each coefficient costs, on average, 3 bits for the luminance and 2 bits for the chrominances; additionally, consider that the EOB (End of Block) word costs 2 bits. (R: 155 520 bit)
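The block counts and bit budgets of a) to c) follow from a few lines of arithmetic, sketched below with the statement's data only. With 4:2:2, each chrominance keeps the full height but half the width, so the two chrominances together have exactly as many 8×8 blocks as the luminance.

```python
WIDTH, HEIGHT = 720, 576   # luminance resolution
BLOCK = 8                  # JPEG blocks are 8x8 samples

luma_blocks = (WIDTH // BLOCK) * (HEIGHT // BLOCK)               # 90 * 72 = 6480
chroma_blocks = 2 * ((WIDTH // 2) // BLOCK) * (HEIGHT // BLOCK)  # 2 * 45 * 72 = 6480

# b) 4:2:2 at 8 bit/sample: 8 bit (Y) + 2 * 8 / 2 bit (Cb, Cr) = 16 bit/pixel before coding
raw_bpp = 8 + 2 * 8 / 2
coded_bpp = raw_bpp / 20

# c) 4 coded coefficients per block plus the EOB word
EOB_BITS = 2
luma_bits = luma_blocks * (4 * 3 + EOB_BITS)      # 14 bit per luminance block
chroma_bits = chroma_blocks * (4 * 2 + EOB_BITS)  # 10 bit per chrominance block

print(luma_blocks, chroma_blocks, coded_bpp, luma_bits + chroma_bits)
# 6480 6480 0.8 155520
```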

d) Why is it reasonable to say that the DCT representation involves a frequency interpretation of the image content?
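A small numerical illustration of the frequency interpretation in d): each DCT coefficient measures how much of a cosine of a given frequency is present in the block. The sketch uses an orthonormal 1-D DCT-II (JPEG applies this transform separably along the rows and columns of each 8×8 block); the two input signals are hypothetical. A uniform block puts all its energy in coefficient 0 (DC), while a block that alternates sample by sample concentrates its energy in the highest-frequency coefficient.

```python
import math

def dct_1d(x):
    """Orthonormal 1-D DCT-II of the sequence x."""
    n = len(x)
    out = []
    for k in range(n):
        scale = math.sqrt(1 / n) if k == 0 else math.sqrt(2 / n)
        out.append(scale * sum(x[i] * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                               for i in range(n)))
    return out

flat = dct_1d([100] * 8)                             # no variation at all
alternating = dct_1d([(-1) ** i for i in range(8)])  # fastest possible variation

print([round(c, 3) for c in flat])   # only coefficient 0 (DC) is non-zero
print(max(range(8), key=lambda k: abs(alternating[k])))  # 7, the highest frequency
```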

e) Explain a mechanism that allows exploiting some of the redundancy between neighboring blocks in JPEG coding. (R: prediction of the DC coefficient from the left neighboring block)
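The DC prediction mentioned in the answer to e) can be sketched as differential coding: the DC of each block is predicted by the DC of the previously coded block of the same component (its left neighbor within a row of blocks) and only the difference is entropy coded. The DC values below are hypothetical; the predictor starts at 0, as at the beginning of a JPEG scan.

```python
def dc_differences(dc_values, initial_predictor=0):
    """Replace each DC value by its difference to the previous block's DC."""
    diffs, prediction = [], initial_predictor
    for dc in dc_values:
        diffs.append(dc - prediction)
        prediction = dc
    return diffs

def dc_reconstruct(diffs, initial_predictor=0):
    """Decoder side: accumulate the differences to recover the DC values."""
    values, prediction = [], initial_predictor
    for d in diffs:
        prediction += d
        values.append(prediction)
    return values

dcs = [812, 815, 810, 809, 820]  # hypothetical DC values of consecutive blocks
diffs = dc_differences(dcs)
print(diffs)  # [812, 3, -5, -1, 11]
```

Because neighboring blocks tend to have similar average luminance, the differences are usually small and get short codewords.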

III (0.5 + 0.5 + 1.0 + 1.0 + 0.5 = 3.5 val.)

Consider a videotelephony communication using Recommendation ITU-T H.261. The video sequence is coded with CIF spatial resolution at a frame rate of 10 Hz and transmitted over a constant bitrate channel of 64 kbit/s. The bits for each coded image are generated uniformly during the time between the acquisition of that image and the acquisition of the next one.

Knowing that the first image has used 9600 bits, the second image 16000 bits, and the third image 4800 bits, determine:

a) Considering that a constant bitrate channel is used, which architectural element allows the encoder to spend very different numbers of bits per frame? (R: encoder output buffer)

b) The time instants at which the receiver obtains all bits for the second and third images. (R: 400 and 475 ms)

c) The minimum size of the encoder output buffer so that all the bits above are transmitted without problems. (R: 12800 bit)

d) The initial visualization delay associated with the system defined in c), justifying the formula used. (R: 300 ms)

e) The maximum number of bits that the 5th image may spend. (R: 14400 bit)
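The answers to b) to e) can be reproduced with a fluid model of the encoder buffer, sketched below. Assumptions (ours): image i is acquired at t = (i − 1)·T and its bits are generated uniformly in [(i − 1)T, iT], the channel sends at 64 kbit/s whenever the buffer is not empty, and propagation delay is neglected. Occupancy is piecewise linear, so it peaks at frame boundaries. The delay in d) is T + B/R: the last bit of an image is generated at most T after acquisition starts and then waits at most B/R in a buffer of B bits. For e), the statement does not say what the 4th image spends; the published 14 400 bit is reproduced by assuming the 4th image is skipped (0 bits), which we flag as an assumption.

```python
T_MS = 100  # frame period in ms (10 Hz)
R = 64      # channel rate in bit/ms (64 kbit/s)

def cumulative(bits):
    """S[i] = total bits generated by the end of frame i, with S[0] = 0."""
    s = [0]
    for b in bits:
        s.append(s[-1] + b)
    return s

def last_bit_arrival(bits, i):
    """Time (ms) at which the last bit of frame i (1-based) leaves the buffer."""
    s = cumulative(bits)
    return max(j * T_MS + (s[i] - s[j]) / R for j in range(i + 1))

def buffer_at_frame_end(bits, i):
    """Encoder buffer occupancy (bit) at t = i * T."""
    s = cumulative(bits)
    return max(s[i] - s[j] - R * (i - j) * T_MS for j in range(i + 1))

bits = [9600, 16000, 4800]
print(last_bit_arrival(bits, 2), last_bit_arrival(bits, 3))  # 400.0 475.0
b_min = max(buffer_at_frame_end(bits, i) for i in range(1, len(bits) + 1))
print(b_min)            # 12800
print(T_MS + b_min / R)  # 300.0
# e) ASSUMPTION (not in the statement): the 4th image is skipped and spends 0 bits.
occupancy_4 = buffer_at_frame_end(bits + [0], 4)  # 4800
print(b_min - occupancy_4 + R * T_MS)             # 14400
```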

IV (0.5 + 1 + 0.5 + 1 = 3 val.)

Consider the MPEG-1 and MPEG-2 Audio standards.

a) Determine the coding rate for stereo audio content with a 22 kHz bandwidth and the usual number of bits per sample if coded with a Layer 2 codec to reach CD transparent quality. (R: 176 kbit/s)
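A sketch of the computation in a). Sampling at twice the bandwidth with 16 bit/sample in two channels gives the PCM rate. The Layer 2 compression factor is not in the statement: we assume 8, which is the value that reproduces the published 176 kbit/s.

```python
bandwidth_hz = 22_000
sampling_rate = 2 * bandwidth_hz   # Nyquist: 44 kHz
bits_per_sample = 16               # usual CD resolution
channels = 2                       # stereo

pcm_rate = sampling_rate * bits_per_sample * channels  # 1 408 000 bit/s
compression_factor = 8  # ASSUMPTION: Layer 2 factor for CD transparent quality

print(pcm_rate / compression_factor / 1000, "kbit/s")  # 176.0 kbit/s
```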

b) What are the two main ways in which perceptual audio masking contributes to reducing the bitrate when coding the audio signal?

c) Why does the Layer 3 codec use the (M)DCT with an overlapping window? (R: to reduce block effects)
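The overlapping window in c) can be illustrated with a sine window, assuming the 36-sample window of Layer 3 long blocks (18 new samples per block, 50% overlap). The check below verifies the Princen-Bradley condition, which is what lets the time-domain aliasing of consecutive MDCT blocks cancel out in the overlap-add. Smooth tapering at the block ends means quantization errors fade in and out instead of stopping abruptly at block boundaries.

```python
import math

N = 18  # new samples per block; each window spans 2N = 36 samples (assumed long block)
window = [math.sin(math.pi * (n + 0.5) / (2 * N)) for n in range(2 * N)]

# Consecutive windows overlap by N samples. Princen-Bradley condition:
# w[n]^2 + w[n + N]^2 = 1 over the overlap region.
worst_error = max(abs(window[n] ** 2 + window[n + N] ** 2 - 1) for n in range(N))
print(worst_error < 1e-12)  # True

# The window tapers towards 0 at both ends instead of cutting the signal abruptly.
print(window[0] < 0.05 and window[-1] < 0.05)  # True
```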

d) Why is it reasonable to say that the Layer 3 codec has a hybrid time/frequency coding structure?

V (0.5 + 1.2 + 0.5 + 0.8 = 3 val.)

Suppose that you are contacted by a company to design a digital storage system for video clips. The company requires some editing flexibility and needs to store the largest possible number of 5-minute clips on a disk with a capacity of 1 TByte (10¹² bytes). The maximum access speed to the disk is 50 Mbit/s. The clips have HDTV resolution with the following characteristics: 1920×1152 (Y), 4:4:4, 8 bit/sample at 25 Hz.

a) Assuming that you have available a JPEG coding solution that provides the required video quality with average compression factors of 40 and 45 for the luminance and the chrominances, respectively, determine the maximum access time for an image, knowing that the compression factors for critical frames are 25% lower than average. (R: 32.768 ms)
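A sketch of the computation in a): every JPEG frame is decodable on its own, so the access time is the time to read one critical frame (compression factors multiplied by 0.75) at 50 Mbit/s. With 4:4:4, each chrominance has as many samples as the luminance.

```python
Y_SAMPLES = 1920 * 1152  # luminance samples per frame
BITS_PER_SAMPLE = 8
ACCESS_RATE = 50e6       # disk access speed in bit/s

def frame_bits(cf_y, cf_c, critical=True):
    """Coded bits of one 4:4:4 frame; critical frames have compression factors 25 % lower."""
    factor = 0.75 if critical else 1.0
    luma = Y_SAMPLES * BITS_PER_SAMPLE / (cf_y * factor)
    chroma = 2 * Y_SAMPLES * BITS_PER_SAMPLE / (cf_c * factor)
    return luma + chroma

jpeg_bits = frame_bits(40, 45)
jpeg_access_ms = 1000 * jpeg_bits / ACCESS_RATE
print(jpeg_bits, jpeg_access_ms)  # 1638400.0 32.768
```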

b) Assuming now that you have available an MPEG-2 Video coding solution, with N=15 and M=3, that provides the required video quality with the following average compression factors:

· I frames: 30 and 35 for the luminance and chrominances, respectively

· P frames: 40 and 50 for the luminance and chrominances, respectively

· B frames: 50 and 60 for the luminance and chrominances, respectively

Determine the maximum access time for an image knowing that the compression factors for critical frames are again 25% lower than average. (R: 233 ms)
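A sketch for b). The statement gives only the result, so the worst case below is our reading that reproduces 233 ms: with the GOP I B B P B B P B B P B B P B B (display order), one of the last two B frames of an open GOP needs the I frame, the four P frames and the next GOP's I frame before it can be decoded, i.e. 2 I + 4 P + 1 B frames must be read, all at the critical compression factors.

```python
Y_SAMPLES = 1920 * 1152
BITS_PER_SAMPLE = 8
ACCESS_RATE = 50e6  # bit/s

def frame_bits(cf_y, cf_c):
    """Coded bits of one critical 4:4:4 frame (compression factors 25 % below average)."""
    luma = Y_SAMPLES * BITS_PER_SAMPLE / (0.75 * cf_y)
    chroma = 2 * Y_SAMPLES * BITS_PER_SAMPLE / (0.75 * cf_c)
    return luma + chroma

i_bits = frame_bits(30, 35)
p_bits = frame_bits(40, 50)
b_bits = frame_bits(50, 60)

# ASSUMED worst case: last B frames of an open GOP (2 I + 4 P + 1 B frames to read)
worst_bits = 2 * i_bits + 4 * p_bits + 1 * b_bits
mpeg2_access_ms = 1000 * worst_bits / ACCESS_RATE
print(round(mpeg2_access_ms))  # 233
```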

c) Determine, with justification, which coding solution you would propose to your client if a maximum random access time of 50 ms is required, together with maximizing the number of clips stored on the disk. (R: JPEG)

d) How many full video clips would you be able to store on the disk with the proposed solution? (R: 868 full clips)
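A sketch of d): storage depends on the average (not critical) JPEG compression factors, and only complete 5-minute clips count, hence the integer division.

```python
Y_SAMPLES = 1920 * 1152
FRAMES_PER_CLIP = 25 * 5 * 60  # 5 minutes at 25 Hz = 7500 frames
DISK_BYTES = 10 ** 12

# Average compression factors: 40 for Y, 45 for each chrominance (4:4:4, 8 bit/sample)
avg_frame_bits = Y_SAMPLES * 8 / 40 + 2 * Y_SAMPLES * 8 / 45  # 1 228 800 bit
clip_bytes = avg_frame_bits * FRAMES_PER_CLIP / 8             # 1.152e9 byte per clip
full_clips = int(DISK_BYTES // clip_bytes)
print(full_clips)  # 868
```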

VI (1.0 + 1.0 + 0.5 + 0.5 + 1.0 = 4.0 val.)

Consider a DVB system for the transmission of digital TV.

a) What does using hierarchical B frames in H.264/AVC mean? What is the main difference with respect to classical B frames as used in MPEG-2 Video? Give a practical example.
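One common hierarchical arrangement, the dyadic one, can be sketched as follows for a GOP of 8 frames between two key pictures. Each B frame is bi-predicted from the two frames bounding its interval, and those frames may themselves be B frames, whereas in MPEG-2 Video B frames are never used as references. The function name and the depth-first coding order are our choices; other valid orders exist.

```python
def hierarchical_b(first, last, level=1, out=None):
    """Dyadic hierarchical B frames between key pictures `first` and `last`.

    Returns (frame, temporal level, reference 0, reference 1) in one possible
    coding order; frames are identified by their display index.
    """
    if out is None:
        out = []
    if last - first < 2:
        return out
    mid = (first + last) // 2
    out.append((mid, level, first, last))
    hierarchical_b(first, mid, level + 1, out)
    hierarchical_b(mid, last, level + 1, out)
    return out

# Key pictures 0 and 8 (level 0) are coded first, then the B hierarchy.
for frame, level, ref0, ref1 in hierarchical_b(0, 8):
    print(f"B{frame}: level {level}, references {ref0} and {ref1}")
```

A practical consequence: the level-3 frames (1, 3, 5, 7) are never referenced, so dropping them halves the frame rate without breaking decoding (temporal scalability).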

b) What type of 3D video coding format would you choose to provide users with stereo TV channels with minimum impact on the transmission and coding chain? What is the main implication for each stereo view with respect to the previous single 2D view? (R: frame-compatible format)
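Side-by-side packing is one frame-compatible arrangement (top-and-bottom is another). The sketch below packs two views into a frame of the original 2D size by keeping every other column of each view, which makes the implication visible: each view keeps only half of its horizontal resolution. The views are small hypothetical arrays, and plain decimation is used for simplicity (a real system would low-pass filter first).

```python
def side_by_side(left, right):
    """Pack two equally sized views into one frame of the same size (half width each)."""
    packed = []
    for row_l, row_r in zip(left, right):
        packed.append(row_l[::2] + row_r[::2])  # keep even columns of each view
    return packed

# Hypothetical 2 x 8 views; the packed frame has the size of a single 2D view.
left = [[10 * r + c for c in range(8)] for r in range(2)]
right = [[100 + 10 * r + c for c in range(8)] for r in range(2)]
frame = side_by_side(left, right)

print(len(frame), len(frame[0]))  # 2 8
print(frame[0])                   # [0, 2, 4, 6, 100, 102, 104, 106]
```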

c) Explain why it is possible to transmit H.264/AVC coded content in an MPEG-2 Systems enabled TV transmission chain.
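A sketch supporting c): the 4-byte MPEG-2 Transport Stream packet header contains nothing that depends on the codec of the payload. The elementary stream, packetized in PES packets, is identified by its PID, and the codec is only signalled by the stream_type in the PMT (0x02 for MPEG-2 Video, 0x1B for H.264/AVC in ISO/IEC 13818-1). The packet below is hypothetical.

```python
TS_PACKET_SIZE = 188
SYNC_BYTE = 0x47
STREAM_TYPES = {0x02: "MPEG-2 Video", 0x1B: "H.264/AVC video"}  # signalled in the PMT

def parse_ts_header(packet: bytes) -> dict:
    """Decode the codec-independent 4-byte MPEG-2 Transport Stream header."""
    if len(packet) != TS_PACKET_SIZE or packet[0] != SYNC_BYTE:
        raise ValueError("not a TS packet")
    return {
        "payload_unit_start": bool(packet[1] & 0x40),
        "pid": ((packet[1] & 0x1F) << 8) | packet[2],
        "adaptation_field_control": (packet[3] >> 4) & 0x03,
        "continuity_counter": packet[3] & 0x0F,
    }

# Hypothetical packet: PID 0x0100, start of a PES packet, payload only, counter 5
pkt = bytes([0x47, 0x41, 0x00, 0x15]) + bytes(184)
print(parse_ts_header(pkt))
print(STREAM_TYPES[0x1B])  # H.264/AVC video
```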

d) Regarding channel coding in DVB-x2, explain which parameters may be used to tune the delay and the error correction power, and how. (R: block length and coding rate)
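A rough sketch of the two knobs in d), using the DVB-S2 LDPC FEC frame lengths (64 800 bit normal, 16 200 bit short, also used in DVB-T2 and DVB-C2). A longer block corrects errors better but must be fully received before decoding, so it adds delay. A lower coding rate adds redundancy, giving more correction power at the cost of useful bitrate. The information-bit count n·rate is exact for normal frames but only approximate for short ones, and the 40 Mbit/s coded bitrate is a hypothetical value.

```python
N_NORMAL, N_SHORT = 64_800, 16_200  # DVB-S2 LDPC FECFRAME lengths (bit)

def fec_frame(n_ldpc, code_rate, coded_bit_rate):
    """Approximate information bits per FEC frame and the time (ms) to send the frame."""
    info_bits = n_ldpc * code_rate
    duration_ms = 1000 * n_ldpc / coded_bit_rate
    return info_bits, duration_ms

CODED_RATE = 40e6  # HYPOTHETICAL coded bitrate in bit/s
for n in (N_NORMAL, N_SHORT):
    for rate in (0.5, 0.75):
        info, ms = fec_frame(n, rate, CODED_RATE)
        print(f"n={n} rate={rate}: ~{info:.0f} info bits, {ms:.3f} ms per frame")
```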

e) In DVB-S2, the set of allowed modulations evolved in two different directions to improve the modulation efficiency with respect to the single modulation (QPSK) specified in DVB-S. What are these directions? Which of them may be more critical for applications requiring higher reliability? (R: increasing the number of phases and the number of amplitudes)
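A sketch of the trade-off behind the first direction in e): on a unit-amplitude circle, M-PSK carries log2(M) bit/symbol but the minimum distance between neighboring points is 2·sin(π/M), so going from QPSK (DVB-S) to 8PSK (DVB-S2) gains one bit per symbol while bringing points closer, i.e. less noise margin. The amplitude direction (16APSK and 32APSK in DVB-S2) adds rings of points and is not modelled here.

```python
import math

def psk_min_distance(m):
    """Minimum Euclidean distance between neighboring M-PSK points on a unit circle."""
    return 2 * math.sin(math.pi / m)

for m, name in ((4, "QPSK (DVB-S)"), (8, "8PSK (DVB-S2)")):
    print(name, int(math.log2(m)), "bit/symbol, d_min =", round(psk_min_distance(m), 3))
```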