OS/2 Multimedia Chapter – Video

by Maria Ingold

X.4 VIDEO

Video, like other initially analog media, is becoming increasingly digital. Movies are now available on digital media such as laserdisks, and CD-I (Compact Disk-Interactive) provides digital video in the home environment. Digital video is used on computers for teleconferencing, television viewing, educational programs, and entertainment such as movies, games, and animation.

X.4.1 Characteristics of Motion Video

Digital video can be created from uncompressed, or raw, video, but the file sizes needed to store these raw movies are massive. A raw movie is typically used when the clarity of each individual frame is more important than the size of the recording, or when the video is only going to be displayed rather than stored. This makes raw video ideal for accurately capturing single images or for monitoring video input. Because of the sheer amount of information and processing power needed to digitally represent a raw movie, the trend has been away from raw movies toward ones that use various digital video compression and decompression algorithms.

All motion video has several characteristics that define its appearance: the movie dimensions, the frame rate, the number of colors, and the compression/decompression algorithms used to record and play back the movie. These characteristics are limited both by the hardware and software used to capture each frame and by the hardware and software on which the video is played back.

When a movie is recorded, each video image must first be captured from the input source, such as a laserdisk player, video cassette player, or camcorder. Capturing is typically done with a framegrabber card, which samples each frame of the motion video and stores it in an image format such as RGB565 or YUV411. The RGB565 format defines each pixel in the image with 5 bits of red, 6 bits of green, and 5 bits of blue. This allows each pixel to be stored in 16 bits and takes advantage of the fact that human eyes are more sensitive to variations in green than to small changes between reds or blues. YUV411 stores the video images as luminance (Y) and chrominance (UV). Luminance is the brightness, or the light and dark values, in an image; chrominance carries the color. Because the eye is more sensitive to black and white variations than it is to color, only one quarter as much information needs to be stored for each of the chrominance values. Each image the card grabs from the video source is then stored either uncompressed or in a compressed format using various encoding algorithms. The sketch below illustrates the two capture formats.
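
To make the two formats concrete, here is a minimal C sketch (illustrative code only, not part of OS/2 multimedia; the helper name pack_rgb565 is hypothetical). It packs 8-bit color components into a 16-bit RGB565 pel and compares the per-frame storage of RGB565 against YUV411 for a 320 x 240 image.

#include <stdio.h>
#include <stdint.h>

/* Pack 8-bit R, G, B components into a 16-bit RGB565 pel:
   5 bits of red, 6 bits of green, 5 bits of blue. Green keeps the
   extra bit because the eye is most sensitive to green variation. */
static uint16_t pack_rgb565(uint8_t r, uint8_t g, uint8_t b)
{
    return (uint16_t)(((r >> 3) << 11) | ((g >> 2) << 5) | (b >> 3));
}

int main(void)
{
    printf("RGB565 pel: 0x%04X\n", pack_rgb565(200, 120, 40));

    /* Per-frame storage for a 320 x 240 capture. RGB565 stores
       2 bytes per pel; YUV411 stores a full-resolution Y plane plus
       U and V planes holding a quarter as much data each. */
    long pels = 320L * 240L;
    printf("RGB565 frame: %ld bytes\n", pels * 2);
    printf("YUV411 frame: %ld bytes\n", pels + 2 * (pels / 4));
    return 0;
}

Compiled and run, this reports 153,600 bytes for the RGB565 frame against 115,200 bytes for YUV411, showing why chrominance subsampling is attractive even before any compression is applied.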

The dimensions of a movie are its frame area (width x height) in pels. The maximum size of the movie is determined by the framegrabber card, and the size of the captured frame is typically specified as a ratio of that maximum. For example, if a card can capture at a maximum size of 640 x 480, specifying a ratio of 2:1 would make the frame dimensions of the movie 320 x 240.

The frame rate expresses the number of frames per second (FPS) that are to be captured. To maintain the illusion of fluid motion, 24 to 30 frames per second are typically required; this is the range provided by television and animation (30 FPS) and by motion picture film (24 FPS). Depending on the processor speed of the recording system, the encoding algorithm used, the movie dimensions, and whether hardware assistance is provided, the frame rate of digital motion video can range anywhere from 8 to 30 frames per second. Fifteen FPS generally provides an acceptable compromise between file size and movie smoothness. The actual decompression rate depends on the processing power of the playback system, the complexity of the compression algorithm, and whether hardware-assisted playback is available. If the video includes an audio track and the hardware is incapable of maintaining the desired frame rate, frames are dropped to prevent the audio track from becoming unsynchronized. Frames rather than audio are dropped because the human ear is more sensitive to degradation in audio than the eye is to missing imagery; the brain can usually interpolate between images and fill in enough of the missing information for the imagery to remain understandable.
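
The frame-dropping decision can be sketched in a few lines of C. This is an illustrative model only; the names (VideoClock, frames_to_drop) are hypothetical, and it assumes the audio clock is the master timing source, as described above.

#include <stdio.h>

/* The audio clock is authoritative: any video frame whose
   presentation time has already passed is skipped rather than
   decoded late, keeping picture and sound synchronized. */
typedef struct {
    double frame_period;   /* seconds per frame, e.g. 1.0 / 15 */
    long   next_frame;     /* index of the next frame to show  */
} VideoClock;

static long frames_to_drop(const VideoClock *vc, double audio_time)
{
    /* The frame that should be on screen right now. */
    long due = (long)(audio_time / vc->frame_period);
    return (due > vc->next_frame) ? (due - vc->next_frame) : 0;
}

int main(void)
{
    VideoClock vc = { 1.0 / 15.0, 30 };  /* 15 FPS movie, frame 30 next */
    /* Decoding fell behind: 2.5 seconds of audio have already played,
       so frame 37 is due and 7 frames must be skipped to resync. */
    printf("drop %ld frames\n", frames_to_drop(&vc, 2.5));
    return 0;
}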

The maximum number of colors that can be captured from the input video source is determined by the capabilities of the framegrabber card and the compression algorithm. The color space is then converted to match the color capabilities of the playback system. For example, a video captured at 16 bits per pel (65,536 colors) could be played back at 256 colors on a system that supports only 8-bit color.
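
One simple way to perform such a conversion is to quantize each 16-bit pel down to a fixed 3-3-2 palette (8 levels each of red and green, 4 of blue), as in the hypothetical C sketch below. Production playback systems typically do better, dithering or matching against an optimized 256-entry palette, but the bit manipulation is representative.

#include <stdio.h>
#include <stdint.h>

/* Reduce a 16-bit RGB565 pel to an 8-bit 3-3-2 value by keeping
   only the most significant bits of each component. */
static uint8_t rgb565_to_332(uint16_t p)
{
    uint8_t r = (p >> 11) & 0x1F;   /* 5 bits of red   */
    uint8_t g = (p >> 5)  & 0x3F;   /* 6 bits of green */
    uint8_t b =  p        & 0x1F;   /* 5 bits of blue  */
    return (uint8_t)(((r >> 2) << 5) | ((g >> 3) << 2) | (b >> 3));
}

int main(void)
{
    printf("0x%04X -> 0x%02X\n", 0xCBC5, rgb565_to_332(0xCBC5));
    return 0;
}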

The data rate indicates how much bandwidth is required to play back a movie. This value is given in KB per second (KBS). As an example, a 320 x 240 movie with 16-bit color (65,536 colors) recorded at 15 frames per second produces a raw movie file that requires a data rate of 2,250 KB per second:

(320 x 240) pels/frame x 15 frames/sec x 16 bits/pel x 1 byte/8 bits x 1 KB/1,024 bytes = 2,250 KB/sec

If the associated audio was recorded at 11.025 kHz, 16-bit, mono, this would add another 21.5 KB per second, bringing the aggregate data rate to an impractical 2,271.5 KB per second. To play back the video at the average data rate of a single-spin CD ROM – 150 KBS – would require a 15:1 compression ratio:

2,250 KB/sec + 21.5 KB/sec = 2,271.5 KB/sec;   2,271.5 KB/sec ÷ 150 KB/sec ≈ 15
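
The same arithmetic can be verified in a few lines of C (illustrative only):

#include <stdio.h>

/* Raw data rate of a 320 x 240, 16-bit, 15 FPS movie plus
   11.025 kHz, 16-bit, mono audio, measured against the 150 KB
   per second delivered by a single-spin CD ROM. */
int main(void)
{
    double video_kbs = 320.0 * 240.0 * 15.0 * (16.0 / 8.0) / 1024.0;
    double audio_kbs = 11025.0 * (16.0 / 8.0) / 1024.0;   /* mono */
    double total_kbs = video_kbs + audio_kbs;

    printf("video: %.1f KBS\n", video_kbs);                    /* 2250.0 */
    printf("audio: %.1f KBS\n", audio_kbs);                    /*   21.5 */
    printf("compression needed: %.0f:1\n", total_kbs / 150.0); /*   15:1 */
    return 0;
}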

A data rate of 150 KBS means that a single-spin CD ROM can deliver an average of only 150 KB of data per second. Similarly, a double-spin CD ROM supports an average of 300 KB of data per second, and a quad-spin CD ROM can provide 600 KB of data per second. If the movie's data rate either exceeds or falls short of the capability of its storage device, the playback subsystem has to compensate to maintain audio and video synchronization.

Hardware assistance for compression during recording or decompression during playback is known as hardware motion video (HMV). With faster microprocessors, more memory, faster CD ROMs, advanced graphics capabilities, and better video compression and decompression (CODEC) algorithms, more of the actual recording and playback capabilities of digital video can be done in software. This type of video is known as software motion video (SMV) or software-only video.

While hardware-driven algorithms typically produce a higher quality image, the additional hardware required for recording and playback makes HMV less affordable than software-only video, although with the advent of video cards with on-board Moving Picture Experts Group (MPEG) decompressors, this quality/cost gap is narrowing. The only hardware necessary for SMV is the framegrabber card used to capture images from the input source. Additionally, hardware algorithms often have to be rewritten as better hardware becomes available, whereas most software algorithms are scalable to new hardware platforms. This scalability is attained by paying attention to the characteristics that are used in creating the compressed movie, as well as to those employed in its playback. Once again, these characteristics are the movie size, the frame rate, the number of colors, the data rate, and the compression algorithm.

X.4.2 Compressors and Decompressors

Selecting a compression type involves choosing not only the compressor/decompressor (CODEC), but also whether an asymmetric or symmetric algorithm is used. An asymmetric, or off-line, algorithm takes a raw or compressed movie file, or input from a frame-stepped device (such as a laserdisk player), and spends longer than the average frame time to compress each frame. By spending extra time in the compression stage, better image quality and compression ratios can be achieved, so asymmetric compression is more frequently used for professional-level motion video. Sometimes, as with live video sources such as cameras, symmetric or real-time compression is the only option; the data is compressed as it is digitized. While symmetric compression is faster, the frame size, frame rate, or video quality may have to be lowered to compensate for the large bandwidth requirements, especially in software-only video.

As with other forms of multimedia compression, video compression can be lossless or lossy. Lossless compression stores the video information in such a way that no data or detail is lost; the decompressed video is identical to the original source. This is useful for storing computer video animation, since large areas often contain pels of exactly the same color. Because the image is computer generated, a black background, for example, is one where the RGB values have all been explicitly set to zero. In digitized video the input is analog, so a black background is actually composed of a grouping of dark colors that merely appear to be black. Lossless algorithms are generally unsuitable for software motion video because they cannot attain high enough compression ratios, or low enough data rates, for playback from a hard disk or a CD ROM. The run-length encoder sketched below shows why.
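
Run-length encoding is about the simplest possible lossless scheme and makes the point concrete (illustrative C, not any particular CODEC): runs of identical pels collapse to (count, value) pairs, which pays off handsomely for computer-generated imagery and hardly at all for digitized video, where adjacent "black" pels rarely match exactly.

#include <stdio.h>
#include <stdint.h>
#include <stddef.h>

/* Encode each run of identical pels as a (count, value) byte pair.
   Decoding reproduces the input exactly, so no detail is lost. */
static size_t rle_encode(const uint8_t *in, size_t n, uint8_t *out)
{
    size_t o = 0, i = 0;
    while (i < n) {
        uint8_t v = in[i];
        size_t run = 1;
        while (i + run < n && in[i + run] == v && run < 255)
            run++;
        out[o++] = (uint8_t)run;
        out[o++] = v;
        i += run;
    }
    return o;   /* number of bytes written */
}

int main(void)
{
    /* A scanline from a computer-generated image: long runs. */
    uint8_t line[] = {0, 0, 0, 0, 0, 0, 7, 7, 3, 3, 3, 3};
    uint8_t packed[2 * sizeof line];
    printf("12 bytes -> %u bytes\n",
           (unsigned)rle_encode(line, sizeof line, packed));  /* 6 */
    return 0;
}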

Lossy compression algorithms modify the input image by making some generalizations, and then apply certain “tricks” during its recreation to make it appear similar enough to the original to be acceptable. This produces a respectable facsimile of the original video and provides a high compression ratio. Lossy compression takes advantage of spatial and temporal compression. Spatial, or intraframe, compression attempts to eliminate redundant information within a frame, without regard to information stored in any prior frames; such a frame is called an I-frame, or key frame. Temporal, or interframe, compression stores only the pixel changes between the current and previous frame, and frames compressed in this manner are referred to as delta frames. A video sequence typically starts with a key frame followed by a specified number of deltas, followed by another key frame, and so on. The starting key frame provides a “snapshot” base for the first series of deltas. The other key frames are interjected at regular intervals to calibrate the deltas and to provide seek and scan points. Additional key frames are included when there is a significant change in the movie, such as a scene transition. The sketch below illustrates how a delta frame is built.
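
The delta-frame idea reduces to recording (position, new value) pairs for the pels that changed since the previous frame, as in this hypothetical C sketch:

#include <stdio.h>
#include <stdint.h>
#include <stddef.h>

typedef struct { size_t offset; uint8_t value; } Delta;

/* Compare the current frame against the previous one and record
   only the pels that changed. Returns the number of changes. */
static size_t make_delta(const uint8_t *prev, const uint8_t *cur,
                         size_t n, Delta *out)
{
    size_t count = 0, i;
    for (i = 0; i < n; i++)
        if (cur[i] != prev[i]) {
            out[count].offset = i;
            out[count].value  = cur[i];
            count++;
        }
    return count;
}

int main(void)
{
    uint8_t key[]  = {9, 9, 9, 9, 9, 9};   /* key frame ("snapshot") */
    uint8_t next[] = {9, 9, 4, 9, 9, 5};   /* only two pels changed  */
    Delta d[6];
    printf("delta frame stores %u of 6 pels\n",
           (unsigned)make_delta(key, next, 6, d));
    return 0;
}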

Video with a great deal of motion or scene changes, such as a techno music video that flashes new scenes on every drum beat, requires many additional key frames, thereby decreasing the degree of compression that can be achieved. A video of a person talking would require very few additional key frames because the only change would be the movement of the lips and the eyes. As a result, low-motion video can attain a higher degree of compression.
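
A compressor can decide when to insert one of these additional key frames with a heuristic like the following sketch. The 25 percent threshold is an arbitrary example chosen for illustration, not a figure from Ultimotion, Indeo, or any other CODEC: once enough pels have changed, a delta frame would be nearly as large as a key frame, so a key frame is emitted instead.

#include <stdio.h>
#include <stdint.h>
#include <stddef.h>

/* Emit a key frame when more than a quarter of the pels changed
   since the previous frame (e.g. at a scene transition). */
static int needs_key_frame(const uint8_t *prev, const uint8_t *cur,
                           size_t n)
{
    size_t changed = 0, i;
    for (i = 0; i < n; i++)
        if (cur[i] != prev[i])
            changed++;
    return changed * 4 > n;
}

int main(void)
{
    uint8_t a[] = {1, 1, 1, 1};
    uint8_t b[] = {2, 2, 1, 1};   /* half the pels changed: cut scene */
    printf("key frame needed: %d\n", needs_key_frame(a, b, 4));
    return 0;
}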

A CODEC implements the compression and decompression algorithms for a particular standard. OS/2 multimedia supports the IBM Ultimotion™ and Intel Indeo™ 2.1, 3.1, and 3.2 standards for software motion video. Support is also provided for MPEG hardware motion video and for Autodesk FLI/FLC animation. See Table X.5. Additional CODECs, such as QuickTime, can easily be installed into the OS/2 multimedia system by providing the CODEC along with an install script.

CODEC Name                    File Extension
Autodesk FLI/FLC Animation    .FLI and .FLC
IBM Ultimotion                .AVI
Intel Indeo 2.1, 3.1, 3.2     .AVI
MPEG Hardware Interface       .MPG

Table X.5 OS/2 Multimedia CODECs