OS/2 Multimedia Chapter – Audio

by Maria Ingold


Audio provides a more natural method of human-computer interaction through dictation, text-to-speech, and speech commands, and through the clarification or replacement of written material. The ear perceives audio as sound waves, which are changes in air pressure.

Figure X.1 shows how a sound wave is defined by its amplitude and its wavelength, or frequency. The amplitude can be thought of as the volume: the amount of atmospheric pressure displaced by the sound wave. The amplitude of a sound wave decreases over time, and a decrease in amplitude signifies a decrease in volume. The wavelength is the distance sound travels in one complete cycle of pressure change; one cycle, as measured by the wavelength, is the distance between the peaks, or the valleys, of two adjacent waves. The time it takes to complete one full cycle is called the period. The frequency is the number of cycles that occur in one second, and is measured in Hertz (Hz). Because sound can have very high frequencies, frequency may also be specified in kilohertz (kHz), thousands of cycles per second, or megahertz (MHz), millions of cycles per second. The frequency range of human hearing is about 20 Hz to 20,000 Hz (20 kHz).
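These relationships can be sketched numerically. The speed-of-sound constant below (roughly 343 meters per second in air at room temperature) is an assumption for illustration, not a figure from the text:

```python
# Relationships among frequency, period, and wavelength.
# SPEED_OF_SOUND is an assumed constant (~343 m/s in air at 20 degrees C).
SPEED_OF_SOUND = 343.0  # meters per second

def period(frequency_hz):
    """Time for one complete cycle, in seconds."""
    return 1.0 / frequency_hz

def wavelength(frequency_hz, speed=SPEED_OF_SOUND):
    """Distance sound travels during one cycle, in meters."""
    return speed / frequency_hz

# The limits of human hearing, 20 Hz to 20 kHz:
print(period(20))         # 0.05 seconds per cycle
print(wavelength(20))     # 17.15 meters
print(wavelength(20000))  # 0.01715 meters, about 1.7 cm
```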


Figure X.1 Sound Wave

X.2.1 Analog Audio

A sound wave, like a sine wave, is continuous in nature. A sound wave composed of a continuum of values, rather than a collection of separate values, is referred to as analog audio. When analog recording methods are used to record sound waves, the stored audio is also represented in a continuous manner. The oscillating grooves on a record album, for example, are a direct analog encoding of the original sound. Encoding is achieved by imitating the volume of a sound wave with the magnitude of an electrical signal: the volume is raised and lowered by respectively increasing and decreasing the voltage of the signal. The rate at which the voltage varies reflects the frequency, and faster variation represents a higher frequency.

Analog audio is stored on analog media such as records or cassette tapes, and analog devices such as microphones and speakers are used for recording and listening. Analog technology has problems with noise and distortion, and the audio quality of the storage media degrades with use. Degradation can be as simple as needle wear on a record album, or the oxide breakdown of audio tape. Digital technology and media such as the compact disc (CD) and the digital audio tape (DAT) resolve the data degradation problems and, in the case of the CD, eliminate some of the storage degradation problems as well. These digital formats store audio as sequences of 0's and 1's, so the sound remains consistent and no new noise or distortion is introduced.

X.2.2 Digital Audio

Since continuous analog data cannot be interpreted by a computer, the analog sound wave has to be converted to a series of discrete data values. This discrete representation of the analog sound wave is known as digital audio. Conversions between analog audio and digital audio are required whenever humans and computers exchange sound.

Digital recording creates a digital waveform by discretely sampling the amplitude of an analog sound wave at periodic intervals. This sampling process is referred to as an Analog-to-Digital Conversion (ADC). When a person speaks into a microphone attached to a sound card in a computer, the sound card uses its internal analog-to-digital conversion hardware to transform the analog voice information into discrete values. Playback of a digital sound requires converting the digital waveform back to an analog waveform. This is achieved through a process called Digital-to-Analog Conversion (DAC). Both the sound card in a computer and the CD player in a home entertainment system use digital-to-analog conversion hardware to convert their digital audio into the analog audio that is perceptible to humans.

The following process takes place when a sound card uses its analog-to-digital conversion hardware to record someone speaking into a microphone. First, the microphone converts the sound wave of the voice into an electronic analog signal. The ADC hardware on the sound card then measures the analog input signal’s voltage at periodic intervals. This measurement at discrete time intervals is referred to as sampling. The samples are converted into a format that the computer can use and are stored for later processing or playback.
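The sampling step can be sketched in a few lines of code. The sine-wave input, the rates, and the 8-bit quantization below are illustrative assumptions, not details from the text:

```python
import math

def sample_wave(frequency_hz, sample_rate_hz, duration_s, bits=8):
    """Simulate ADC sampling: measure a sine wave's amplitude at
    periodic intervals and quantize each measurement to `bits` bits."""
    levels = 2 ** bits
    samples = []
    count = int(sample_rate_hz * duration_s)
    for i in range(count):
        t = i / sample_rate_hz  # time at which this sample is taken
        amplitude = math.sin(2 * math.pi * frequency_hz * t)  # -1.0 .. 1.0
        # Map the continuous amplitude onto one of 2**bits discrete levels.
        quantized = round((amplitude + 1.0) / 2.0 * (levels - 1))
        samples.append(quantized)
    return samples

# One hundredth of a second of a 440 Hz tone at the 11.025 kHz "voice" rate:
samples = sample_wave(440, 11025, 0.01)
print(len(samples))  # 110 discrete values stand in for the continuous wave
```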

When a sound is played back from a computer or from a CD player to a set of speakers or headphones, the digital-to-analog conversion hardware recreates the analog waveform by generating an electrical signal voltage equivalent to the magnitude of each digital sample. The voltage is maintained for the duration of the sample. This creates a “staircase” signal. Running the signal through a low-pass filter smooths out the voltage curve into a continuous analog waveform. It should be noted that the reconstructed waveform is not an exact reproduction of the original analog wave, because the DAC hardware has to estimate the values between each pair of samples. This estimation is one reason some audiophiles argue that LPs reproduce audio more faithfully.
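The “staircase” signal described above can be sketched as a zero-order hold: each sample's voltage is held until the next sample arrives. The `points_per_sample` parameter is a hypothetical stand-in for the length of time a voltage is held:

```python
def staircase(samples, points_per_sample=4):
    """Zero-order hold: repeat each sample's value for its whole sample
    period, producing the 'staircase' signal a DAC generates.
    (A real DAC's low-pass filter would then smooth these steps.)"""
    signal = []
    for value in samples:
        signal.extend([value] * points_per_sample)
    return signal

print(staircase([0, 3, 1], points_per_sample=2))  # [0, 0, 3, 3, 1, 1]
```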

A higher sampling rate and a more precise sampling resolution help to produce more accurate waveforms. The sampling rate is the number of samples per second taken from the analog sound wave. The greater the number of samples per second, the more closely the digital waveform can approximate the analog sound wave. The sampling rate is usually expressed in kilohertz (kHz), or thousands of samples per second. Typical sampling rates are 11.025 kHz, 22.05 kHz, and 44.1 kHz, which correspond respectively to the quality needed for voice, music, and CD audio.

At least 44.1 kHz, or 44,100 samples per second, is needed to reproduce the 20,000 Hz that defines the upper frequency of human hearing. This is because the highest frequency that can be represented must be less than half the sampling rate. At a sampling rate of 44.1 kHz, the theoretical upper frequency limit is therefore 22.05 kHz. To prevent a digital audio error called aliasing, frequencies above this limit must be filtered out before sampling; because real filters roll off gradually, the practical upper frequency limit is set at about 90% of the theoretical limit, which brings 22.05 kHz down to roughly 20 kHz. The usable frequency response is thus somewhat less than one-half the sampling rate. For example, the highest frequency that can be achieved at 11.025 kHz is approximately 5,000 Hz, which is adequate for reproducing voice.
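This arithmetic can be sketched directly, assuming the 90% rule of thumb used in the text:

```python
def practical_upper_frequency(sample_rate_hz, margin=0.90):
    """The theoretical (Nyquist) limit is half the sampling rate; the
    practical limit is about 90% of that, leaving room for the
    anti-aliasing filter's roll-off. The 90% margin is a rule of thumb."""
    return (sample_rate_hz / 2) * margin

print(round(practical_upper_frequency(44100)))  # 19845 -> about 20 kHz
print(round(practical_upper_frequency(11025)))  # 4961  -> about 5 kHz
```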

Sample resolution is given as the number of bits per sample. Samples are typically 8 or 16 bits, with 16 bits providing better audio capture and playback quality. This is because 8-bit audio can represent only 2^8, or 256, different volume (amplitude) levels, while 16-bit audio can represent 2^16, or 65,536, different volume levels for each sample stored.
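A quick check of the arithmetic:

```python
def amplitude_levels(bits):
    """Number of distinct volume (amplitude) levels that a sample of
    `bits` bits can represent."""
    return 2 ** bits

print(amplitude_levels(8))   # 256
print(amplitude_levels(16))  # 65536
```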

Another factor in digital audio quality is the number of channels. One channel is referred to as mono, and two channels as stereo. Stereo output provides a left and a right channel. A single speaker can play the two tracks mixed together, two speakers can play them in stereo, or each speaker can play a different mono track.
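Stereo digital audio is commonly stored with the two channels' samples interleaved (left, right, left, right, ...). That layout is a common convention assumed here for illustration, not something the text specifies:

```python
def interleave(left, right):
    """Combine a left and a right channel into one interleaved stereo
    stream: L0, R0, L1, R1, ... Each left/right pair is one frame."""
    frames = []
    for l_sample, r_sample in zip(left, right):
        frames.extend([l_sample, r_sample])
    return frames

print(interleave([1, 2, 3], [9, 8, 7]))  # [1, 9, 2, 8, 3, 7]
```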

Channels            Sampling Rate   Sampling     One Second    One Minute
(1=mono, 2=stereo)  (kHz)           Resolution   Sample (KB)   Sample (MB)
1                   11.025          8-bit        10.77         0.63
1                   11.025          16-bit       21.5          1.26
2                   11.025          8-bit        21.5          1.26
2                   11.025          16-bit       43            2.5
1                   22.05           8-bit        21.5          1.26
1                   22.05           16-bit       43            2.5
2                   22.05           8-bit        43            2.5
2                   22.05           16-bit       86            5
1                   44.1            8-bit        43            2.5
1                   44.1            16-bit       86            5
2                   44.1            8-bit        86            5
2                   44.1            16-bit       172           10

Table X.1 Sample size for one second and for one minute

Table X.1 shows that as sound fidelity increases, so do the data rates and storage requirements. CD quality audio has two channels, a 16-bit sampling resolution, and 44,100 samples per second. Multiplying this out results in a data rate of 172 kilobytes (KB) per second, or 10 megabytes (MB) of storage per minute.

2 channels × 44,100 samples/second × 16 bits/sample × (1 byte / 8 bits) × (1 KB / 1024 bytes) ≈ 172 KB/second
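The same arithmetic can be expressed as a small calculator that reproduces the entries in Table X.1:

```python
def data_rate_kb_per_second(channels, sample_rate_hz, bits):
    """Uncompressed PCM data rate, in kilobytes per second."""
    bytes_per_second = channels * sample_rate_hz * (bits // 8)
    return bytes_per_second / 1024

def storage_mb_per_minute(channels, sample_rate_hz, bits):
    """Uncompressed PCM storage requirement, in megabytes per minute."""
    return data_rate_kb_per_second(channels, sample_rate_hz, bits) * 60 / 1024

# CD quality: 2 channels, 44.1 kHz, 16-bit samples
print(round(data_rate_kb_per_second(2, 44100, 16)))    # 172 KB/second
print(round(storage_mb_per_minute(2, 44100, 16)))      # 10 MB/minute
# Voice quality: 1 channel, 11.025 kHz, 8-bit samples
print(round(data_rate_kb_per_second(1, 11025, 8), 2))  # 10.77 KB/second
```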

Table X.1 brings up an interesting consideration. Whenever the sampling rate is held constant, recording in mono with 16-bit sampling resolution produces the same file size as recording in stereo with 8-bit sampling resolution. In that case, the better audio quality is achieved by the mono file with the 16-bit sampling resolution.

These large sizes introduce problems with both data storage and data throughput. The storage problem can be resolved by using a larger hard disk, storing the data on CD-ROM or on a server, or compressing the audio. Guaranteeing the data rate, however, requires more than a fast hard disk or a fast CPU: the entire path from storage to the audio hardware must sustain the required throughput.

X.2.3 Audio Compression

Decreasing the amount of data space needed to reproduce the original analog audio requires audio compression. A compression algorithm such as Adaptive Differential Pulse Code Modulation (ADPCM) uses a different encoding method for storing the sampled data, while a format such as MIDI achieves its compactness by representing audio in an entirely different way. Pulse Code Modulation (PCM) is the digital waveform format encoded by the analog-to-digital conversion hardware; the Analog-to-Digital Converter assigns binary PCM values to the amplitude samples. However, the 172 KB per second data rate for CD quality can make PCM an impractical storage alternative.

A preferable and commonly used compression format for storing digital audio is ADPCM, which stores the difference between an estimate of the next sample, predicted from previous values, and the actual sample value as determined by PCM encoding. Since this difference is typically small, far fewer bits are needed to record it than to store the actual sampled magnitude. Using ADPCM instead of PCM can reduce storage requirements by a factor of about 4 to 1. Because the difference is quantized, ADPCM is a lossy compression: some fidelity is lost, although the loss is usually hard to hear. Lossy algorithms typically take advantage of the fact that the human ear has more tolerance of distortion in loud sounds, and less tolerance of noise in soft sounds, to store audio that appears to have less overall noise. A lossless compression, by contrast, allows the digital original to be recreated exactly.
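The differencing idea behind ADPCM can be illustrated with a simplified sketch. Unlike real ADPCM, this version stores exact differences (so it is lossless) and performs no prediction or adaptive quantization; it only shows why differences need fewer bits than raw samples:

```python
def delta_encode(samples):
    """Store the first sample, then only the difference between each
    sample and its predecessor. Adjacent audio samples are usually close
    in value, so the differences are small numbers needing fewer bits."""
    if not samples:
        return []
    deltas = [samples[0]]
    for previous, current in zip(samples, samples[1:]):
        deltas.append(current - previous)
    return deltas

def delta_decode(deltas):
    """Rebuild the original samples by accumulating the differences."""
    samples = []
    running = 0
    for delta in deltas:
        running += delta
        samples.append(running)
    return samples

waveform = [100, 102, 105, 104, 101]
encoded = delta_encode(waveform)
print(encoded)  # [100, 2, 3, -1, -3] -- small values after the first
assert delta_decode(encoded) == waveform
```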

X.2.4 MIDI

MIDI, or Musical Instrument Digital Interface, is another encoding format, used to transfer music digitally between synthesizers, computers, and recording and playback equipment. MIDI does not store digital audio as a waveform, but as a set of instructions. An example would be “Play Middle C on the piano loudly for one second.” This encoding takes 3 bytes to begin the sound and another 3 bytes to end it, so only 6 bytes are needed to record that one-second note. Even a dense arrangement typically requires only on the order of 12 KB per minute, which is far more efficient than the 10 MB per minute required by PCM CD quality audio.
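The “Play Middle C” instruction maps onto real MIDI channel messages. The sketch below uses the standard Note On (0x90) and Note Off (0x80) status bytes for channel 1 and MIDI note number 60 for Middle C:

```python
NOTE_ON, NOTE_OFF = 0x90, 0x80  # MIDI status bytes for channel 1
MIDDLE_C = 60                   # MIDI note number for Middle C

def note_on(note, velocity):
    """3-byte Note On message: status byte, note number, velocity
    (how hard the key is struck, i.e. how loud the note is)."""
    return bytes([NOTE_ON, note, velocity])

def note_off(note):
    """3-byte Note Off message that ends the sound."""
    return bytes([NOTE_OFF, note, 0])

# "Play Middle C loudly": 3 bytes to start the note, 3 bytes to end it.
message = note_on(MIDDLE_C, 127) + note_off(MIDDLE_C)
print(len(message))  # 6 bytes for the whole note
```

(Timing information and the instrument assignment are carried separately in a MIDI file, so this sketch covers only the note messages themselves.)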

Editing MIDI audio is simple, since the information is stored as a sequence of instructions rather than as a waveform. Creating the initial audio requires a synthesizer to generate the music, and playing back a MIDI file requires a synthesizer built into the audio card; this means the PC speaker driver cannot play back a MIDI file. Since MIDI files contain information about notes rather than sounds, the actual sounds have to be generated. The sounds are generated using either FM (frequency modulation) synthesis or wavetable synthesis. FM synthesis creates rather artificial sounds using a computer algorithm. A more realistic method, used by newer audio cards and sampling synthesizers, is wavetable synthesis, which maps each note to a digital sample of the actual instrument.

The playback of MIDI files may sound different depending on what hardware is used. This is because the hardware may use a slightly different algorithm to create the sound or the instrument may be mapped to a different sample. On the other hand, when a waveform is played back, it produces a similar sound regardless of the underlying hardware.

X.2.5 OS/2 Multimedia Audio Formats

There are a variety of compressed and uncompressed audio formats. The OS/2 Multimedia subsystem provides support for the most popular digital wave audio formats and MIDI; for a complete list, see Table X.2. Support for additional audio formats can easily be added and used in both existing and new applications: the developer creates the new format support via I/O procedures (IOProcs), as we will see in section X.8.2 on Multimedia I/O (MMIO).

Audio Format                                               File Extension
Audio Interchange File Format (AIFF) Wave Digital Audio    .AIF
Amiga IFF Format                                           .IFF
Creative Labs Inc. Voice Audio                             .VOC
IBM Audio Visual Connection (AVC) ADPCM Digital Audio      ._AU with ._AD
RIFF Waveform Digital Audio                                .WAV
Unix (NeXT/SUN) SND Format                                 .AU, .SND
Musical Instrument Digital Interface (MIDI)                .MID

Table X.2 OS/2 Multimedia Audio Formats