Video Quality Metrics
Digital images are subject to a wide variety of distortions during acquisition, compression, storage, transmission and reproduction, any of which may degrade image quality. To maintain an acceptable level of quality control, video quality metrics are applied to judge the quality of the image. Ideally, the assessment process would supply metrics that predict image quality automatically. In practice, subjective tests are carried out in which human experts view the video and rate its quality, with the ratings expressed as a Mean Opinion Score (MOS). This is time-consuming and expensive. Objective tests, by contrast, can be carried out automatically: objective quality metrics predict visual quality by comparing signals, using information known about the human visual system.
This essay compares the performance of two objective video quality metrics, PSNR and SSIM. Two video clips are used for this purpose. Each clip is compressed with the MPEG-4 codec to output rates of 128Kbps and 1Mbps, which are suitable for low-speed and high-speed internet connections respectively.
FFmpeg, a computer programme that can record, convert and stream audio-visual content in many formats, was used to convert the files to mp4. The command line is as follows:
ffmpeg -i coastguard_cif.avi -vcodec mpeg4 -b 1Mb -bf 1 -an -psnr -vstats coastbig.mp4
The options are: -i, the input filename; -vcodec, the codec to be used; -b, the output bit rate; -bf, the number of B frames to be included between the P frames; -an, no audio; -vstats, produce statistics; -psnr, include the PSNR details.
The MSU Video Quality Measurement Tool, version 2.0, was used to produce the SSIM data.
The following tables give details of the two video clips. Each table covers the original clip and the output of the two MPEG-4 compressions performed on it.
Coastguard.avi
Property | Original file (coastguard_cif.avi) | Compressed with MPEG-4 for slow Internet connection | Compressed with MPEG-4 for fast Internet connection |
File size | 45.6 MB | 351 KB | 1.506 MB |
Frame size (pixels) | 352 x 288 | 352 x 288 | 352 x 288 |
Bit rate | 30421Kbps | 145Kbps | 914Kbps |
Frame rate | 25 fps | 25 fps | 25 fps |
Length | 12 secs | 12 secs | 12 secs |
Compression factor | 1 | 125 | 29 |
Frame size (bytes) | 152105 | 72 to 27018 | 113 to 27018 |
Recognition | - | Significant blocking, particularly when there is activity in the scene | Very little variance from original |
News.avi
Property | Original file (news_cif.avi) | Compressed with MPEG-4 for slow Internet connection | Compressed with MPEG-4 for fast Internet connection |
File size | 45.6 MB | 339 KB | 1.532 MB |
Frame size (pixels) | 352 x 288 | 352 x 288 | 352 x 288 |
Bit rate | 30421Kbps | 197Kbps | 1007Kbps |
Frame rate | 25 fps | 25 fps | 25 fps |
Length | 12 secs | 12 secs | 12 secs |
Compression factor | 1 | 130 | 29 |
Frame size (bytes) | 152105 | 43 to 12970 | 166 to 24865 |
Recognition | - | Significant blocking, particularly when the dancers are active | Very little variance from original |
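The figures quoted for the original files are related by simple arithmetic, which the following sketch checks. It assumes that 1 Kbps is 1000 bits per second and that 1 MB is 1,000,000 bytes; the function names are illustrative only and do not come from any tool used in this essay.

```python
# Figures taken from the tables above (uncompressed CIF clip).
BYTES_PER_FRAME = 152105   # size of one uncompressed frame, in bytes
FRAME_RATE = 25            # frames per second
LENGTH_SECONDS = 12        # clip length in seconds


def raw_bit_rate_kbps(bytes_per_frame, frame_rate):
    """Bit rate in Kbps, assuming 1 Kbps = 1000 bits per second."""
    return bytes_per_frame * 8 * frame_rate / 1000


def raw_file_size_mb(bytes_per_frame, frame_rate, seconds):
    """File size in MB, assuming 1 MB = 1,000,000 bytes."""
    return bytes_per_frame * frame_rate * seconds / 1_000_000


print(raw_bit_rate_kbps(BYTES_PER_FRAME, FRAME_RATE))                          # 30421.0
print(round(raw_file_size_mb(BYTES_PER_FRAME, FRAME_RATE, LENGTH_SECONDS), 1))  # 45.6
```

Both results agree with the bit rate and file size listed for the original clips.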
Peak signal-to-noise ratio (PSNR) is frequently used as a measure of the distortion introduced by MPEG compression. For colour video, it is typically acceptable to compute PSNR on the luminance component only. However, PSNR is not based on signal differences as processed by the human visual system. Typical PSNR values for lossy compression are between 30 and 50 dB; the higher the value, the better the quality.
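As an illustration of how a PSNR value is obtained, the sketch below uses the standard definition, PSNR = 10 log10(peak^2 / MSE), on 8-bit luminance samples. It is written in plain Python for readability and is not the routine used by FFmpeg; the sample values are invented for the example and are not taken from the clips.

```python
import math


def psnr(reference, distorted, peak=255):
    """PSNR in dB between two equal-length sequences of luminance samples."""
    if len(reference) != len(distorted) or not reference:
        raise ValueError("frames must be non-empty and the same size")
    # Mean squared error between the original and the decoded samples.
    mse = sum((r - d) ** 2 for r, d in zip(reference, distorted)) / len(reference)
    if mse == 0:
        return math.inf  # identical frames: no distortion at all
    return 10 * math.log10(peak ** 2 / mse)


# Toy luminance (Y) samples, not data from the essay's clips.
original = [100, 120, 140, 160]
decoded = [105, 115, 145, 155]   # every sample is off by 5, so MSE = 25

print(round(psnr(original, decoded), 2))  # 34.15
```

An error of 5 grey levels on every sample gives about 34 dB, which falls inside the typical range for lossy compression quoted above.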
The metric used in this part of the assignment is the peak signal-to-reconstructed-image measure, referred to as PSNR. The Mean Opinion Score (MOS) is used to describe the subjective viewing comparison. The following is a brief comment on each video clip with regard to the subjective and objective quality measures.
Coastguard output at quality of 128Kbps
The maximum value of PSNR in this sequence is 41.41 dB and the minimum is 25.19 dB, with an average of 27.35 dB. On subjective analysis of the clip, significant blocking takes place (MOS = 1) between frames 25 and 80. In this section of the clip there is a scene in which two boats are in motion, which in turn causes movement in the water. However, this is not reflected in the PSNR (Figure 1), whose value in these frames does not fall below 32 dB, which is well above the average for the clip.
The SSIM values for this clip (Figure 2) range from a minimum of 0.21164 to a maximum of 0.96969, with an average of 0.649144 over the 300 frames. The SSIM has its lowest values in frames 50 to 80, which coincides with the part of the clip where the two boats are moving on the water. The SSIM graph clearly indicates a significant quality issue at this point in the video, with some of the frames in this range labelled as bad frames.
Figure 4: SSIM for coastguard.avi compressed using MPEG-4 to output rate of 1Mbps
Coastguard output at 1Mbps
The maximum value of PSNR in this sequence is 41.43 dB and the minimum is 31.5 dB, with an average of 34.43 dB. This higher average demonstrates the superior quality of the output at these compression settings. Whereas the highly compressed video discussed above has a wide range between its maximum and minimum PSNR, the range here is narrower, and the minimum is well above the average for the previous clip. In this case the PSNR results correlate with the subjective measure, with the MOS not lower than 4 at any point.
Figure 6: SSIM for news.avi compressed using MPEG-4 to output rate of 128Kbps
Figure 8: SSIM for news.avi compressed using MPEG-4 to output rate of 1Mbps
News output at 1Mbps
The maximum value of PSNR in this sequence is 45.25 dB and the minimum is 39.87 dB, with an average of 43.89 dB. This small range reflects the high quality of this compression, which causes little or no interruption to the viewer and has a MOS of 5. The SSIM values for this clip range from a minimum of 0.83389 to a maximum of 0.98662, with an average of 0.956755 over the 299 frames. The SSIM values dip in three areas, as mentioned previously (frames 89, 148 and 239). Because SSIM incorporates structural, luminance and contrast information, a change of a large part of the background from dark to light registers as a significant difference in the SSIM model, whereas subjectively it would not, as a change in the background is perceived to be a normal event.
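To show how luminance, contrast and structure enter the index, the sketch below evaluates the standard SSIM formula once over a whole set of samples. This is a simplification made for illustration: practical implementations, presumably including the MSU tool, compute the index over small local windows and average the results. The constants C1 = (0.01 x 255)^2 and C2 = (0.03 x 255)^2 are the usual defaults for 8-bit data, and the sample values are invented.

```python
def global_ssim(x, y, peak=255):
    """Single-window SSIM over two equal-length sequences of luminance samples."""
    n = len(x)
    c1 = (0.01 * peak) ** 2   # stabilises the luminance term
    c2 = (0.03 * peak) ** 2   # stabilises the contrast/structure term
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    var_x = sum((a - mean_x) ** 2 for a in x) / n
    var_y = sum((b - mean_y) ** 2 for b in y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n
    luminance = (2 * mean_x * mean_y + c1) / (mean_x ** 2 + mean_y ** 2 + c1)
    contrast_structure = (2 * cov + c2) / (var_x + var_y + c2)
    return luminance * contrast_structure


# Toy samples: the same pattern on a dark and on a light background.
dark = [20, 30, 20, 30]
light = [v + 150 for v in dark]

print(global_ssim(dark, dark))          # identical signals score 1.0
print(global_ssim(dark, light) < 0.5)   # True: the brightness shift alone lowers the index
```

In this toy case the structure of the two signals is identical, so the whole drop in the index comes from the luminance term.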
Group of Pictures
Each video sequence is subdivided into groups of pictures (GOPs), and each GOP is subdivided into frames, each of type I, P, B or D. I frames are intra frames; as they are compressed solely in a spatial manner, they have higher PSNR and SSIM values. P frames are predicted and have lower PSNR and SSIM values, and B frames are bi-directionally predicted and have the lowest values. Each GOP starts with an I frame, so the periodic peaks on the graphs represent the I frames; counting the frames between the peaks gives a GOP length of between 10 and 15 frames. The GOP length is constant within a file and, in this instance, approximately the same for every clip. This is contrary to good practice: for low bit rate streaming, the visual quality of the output can be considerably improved by reducing the frame rate and the GOP size.
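Counting the frames between peaks can also be done programmatically. The sketch below is one possible approach, not the method used to produce the figures: it treats a frame as a peak when its PSNR exceeds both neighbours by at least min_rise dB. The threshold of 1 dB and the synthetic PSNR series are assumptions chosen for the example.

```python
def find_peaks(values, min_rise=1.0):
    """Indices of frames whose value exceeds both neighbours by at least min_rise."""
    return [
        i
        for i in range(1, len(values) - 1)
        if values[i] - values[i - 1] >= min_rise
        and values[i] - values[i + 1] >= min_rise
    ]


def gop_lengths(values, min_rise=1.0):
    """Distances, in frames, between successive peaks (assumed to be I frames)."""
    peaks = find_peaks(values, min_rise)
    return [later - earlier for earlier, later in zip(peaks, peaks[1:])]


# Synthetic per-frame PSNR: an I frame every 12 frames at 40 dB,
# with quality drifting slowly downwards inside each GOP.
series = [40.0 if i % 12 == 0 else 32.0 - 0.1 * (i % 12) for i in range(48)]

print(gop_lengths(series))  # [12, 12]
```

The first frame has no left-hand neighbour and so is never reported as a peak; this does not affect the spacing between the later peaks.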
The relationship between increased compression and reduced PSNR is demonstrated in Figure 10. In the output compressed to 128Kbps, most frames are compressed above 95:1 and all frames above 90:1, and the majority of the PSNR values are in the range 25 to 35, with equivalent MOS values of poor to good. The 1Mbps output has a more diverse range of compression ratios, from 80:1 to 98:1, and has PSNR values in the range 32 to 43, with MOS values of good to excellent.
Figure 12: PSNR vs compression ratio for news.avi compressed using MPEG-4
The PSNR values for news.avi compressed to 1Mbps are contained within a small range of 42 to 45, whereas the compression ratio is spread over 82:1 to 99.9:1. This type of video lends itself well to compression, as the majority of the frame content stays constant from frame to frame. The compression rate for the majority of frames in the clip compressed at 128Kbps is in the region of 99:1, with only the I frames compressed at a lower rate (90:1 to 96:1), as evidenced by Figure 9.