TOOLS OF TECHNOLOGY

Hari V Sahasrabuddhe

Indian Institute of Technology

Powai, Mumbai 400076

Abstract

Digital sound recordings and powerful desktop computers have given us new tools with which to explore music. Basic ideas of signals, digital sampling and processing, and of the representation of a musical interval on the cents scale are covered. Results of my observations on a CD-quality digital recording of a tanpura are presented. Two very interesting intervals are prominent in the recording I studied. The sound of the tanpura deserves further exploration to advance our understanding of it. Pointers to further source material are included.

Introduction

Digital sound recordings and powerful desktop computers have given us new tools with which to explore music. Today’s PC has a reasonably good sound card which can digitize sound coming from a microphone, a cassette player and so on. Public domain software is available for transferring sound files from an audio CD to the computer’s hard disk, for example the MusicMatch Jukebox [Musicmatch 2004]. A powerful sound editor capable of most operations I am reporting here is similarly available as freeware [Audacity 2004].

There are reports in the literature of studies made on sounds of Indian music. For example, R. Sengupta et al. have used digital sound processing tools to assess tanpuras [Sengupta, R. et al. 2003]. Modak has used older electronic instruments (signal generator and oscilloscope) to examine the sound of a tanpura [Modak, H. V. 1997]. Paritosh Pandya has analysed a digital recording of a tanpura [Pandya, P. 2004]. Such results can easily be used, duplicated or added to by anyone with access to a PC and a microphone. I therefore wish to present here some basic ideas for anyone desiring to do so. There are useful references for those who wish to delve deeper.

Signals, digital sampling and processing

Analog and digital representation of sound

The sounds we hear are fluctuations in air pressure. When we capture them using a microphone, they are converted to fluctuations of another quantity, an electrical voltage. We say that the fluctuating voltage produced by the microphone is an analog representation of the sound, because its fluctuations are analogous or similar to the fluctuations of air pressure in the sound. If I were to obtain, somehow, a graph of air pressure against time on graph paper, as in Figure 1, that would be another analog representation of the sound. But if I were next to measure the height of the curve at regularly spaced points along the time axis and write these numbers in a table, that would be a digital representation of the same sound. Although the table of numbers looks very different from the graph, I could reconstruct the graph from it, and in that sense the table is a representation of the sound. The representation of sound on a CD is essentially such a table of numbers. An analog-to-digital converter is an electronic device which converts an analog quantity, such as the voltage produced by a microphone, into a digital representation of that quantity. This process of conversion is called sampling, because we note the values at only certain points on the graph, not everywhere.

Figure 1: graph of hypothetical sound wave
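
The idea of turning a curve into a table of numbers can be sketched in a few lines of Python. This is only an illustration: the 440 Hz sine standing in for the pressure curve and the names digitize and pressure are my assumptions, not part of any recording discussed here.

```python
import math

def digitize(wave, sample_rate_hz, duration_s, bits=16):
    """Sample wave(t) (values in -1..1) and round each value to a signed integer."""
    full_scale = 2 ** (bits - 1) - 1            # 32767 for 16 bits
    count = round(sample_rate_hz * duration_s)  # number of sample points
    return [round(wave(n / sample_rate_hz) * full_scale) for n in range(count)]

def pressure(t):
    """Hypothetical sound wave: a pure 440 Hz tone of unit amplitude."""
    return math.sin(2 * math.pi * 440 * t)

# One millisecond of sound at the CD sampling rate becomes a short table.
table = digitize(pressure, 44100, 0.001)
print(len(table))   # 44
print(table[0])     # 0
```

Each entry of the table is one "height of the curve", stored as a whole number between -32767 and 32767.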

The digital representation of sound on a CD is sampled every 1/44100 of a second, or “at 44100 Hertz (Hz for short)”, which means the same thing. A fundamental result in sampling states that only waves with frequency less than half the sampling rate can be correctly reconstructed from a set of sample values (also known as sampled signal). Furthermore, waves with higher frequencies are not merely lost, they are wrongly represented! For example, when a 300 Hz wave is sampled at 500 Hz, in the sampled signal it appears to be a 500 – 300 = 200 Hz wave! To prevent such accidents, an analog signal must be first filtered to remove any components with too high a frequency.
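
The 300 Hz example can be checked numerically. The following is a minimal sketch (the function names are mine): it samples a 300 Hz cosine and a 200 Hz cosine at 500 Hz and confirms that the two tables of samples are indistinguishable.

```python
import math

def sample_cosine(freq_hz, rate_hz, count):
    """Return `count` samples of a unit cosine of freq_hz taken at rate_hz."""
    return [math.cos(2 * math.pi * freq_hz * n / rate_hz) for n in range(count)]

def alias_frequency(freq_hz, rate_hz):
    """Apparent frequency after sampling, folded into the range 0..rate/2."""
    folded = freq_hz % rate_hz
    return min(folded, rate_hz - folded)

high = sample_cosine(300, 500, 50)
low = sample_cosine(200, 500, 50)
max_difference = max(abs(a - b) for a, b in zip(high, low))

print(alias_frequency(300, 500))   # 200
print(max_difference < 1e-9)       # True
```

Once the samples are taken, nothing in the table can tell the two waves apart, which is why the filtering has to happen before sampling.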

Discrete Fourier Transform (DFT) and Fast Fourier Transform (FFT)

In a DFT, a digitally sampled sound or other signal (which we now know is just a table of numbers) is converted into another set of numbers. In the original set each number represents the value of the changing quantity, such as air pressure, at one point in time, whereas in the new set each number represents how strongly the numbers of the first set fluctuate at a particular rate. Each of the new numbers depends on all of the old ones. Thus there are many calculations to make: a DFT of 1000 numbers would typically involve about 1,000,000 multiplications. However, many of these calculations are actually repeated. The FFT algorithm uses this fact to produce the same 1000 numbers in far fewer multiplications (about 10,000). DFT and FFT therefore give the same result, produced with different efficiency.
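
To make the comparison concrete, here is a small sketch in Python of a direct DFT and a textbook recursive radix-2 FFT. It is an illustration under my own assumptions (a 64-point test signal, a power-of-two length), not the routine used by any particular sound editor, and the operation counts at the end are order-of-magnitude figures only.

```python
import cmath
import math

def dft(samples):
    """Direct DFT: every output depends on every input (about N*N multiplications)."""
    n = len(samples)
    return [sum(samples[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]

def fft(samples):
    """Recursive radix-2 FFT; the length must be a power of two."""
    n = len(samples)
    if n == 1:
        return list(samples)
    even = fft(samples[0::2])   # transform of the even-numbered samples
    odd = fft(samples[1::2])    # transform of the odd-numbered samples
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(-2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle            # each half-size result is reused twice
        out[k + n // 2] = even[k] - twiddle
    return out

# Test signal: exactly 3 cycles of a sine wave in 64 samples.
signal = [math.sin(2 * math.pi * 3 * t / 64) for t in range(64)]
slow, fast = dft(signal), fft(signal)
largest_gap = max(abs(a - b) for a, b in zip(slow, fast))
strongest_bin = max(range(32), key=lambda k: abs(fast[k]))

n = 1024
direct_count = n * n                 # order of N squared
fft_count = n * int(math.log2(n))    # order of N log2 N

print(largest_gap < 1e-9)            # True
print(strongest_bin)                 # 3
print(direct_count, fft_count)       # 1048576 10240
```

The two functions agree to within rounding error, and the strongest output is the one for 3 cycles per 64 samples, as expected.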

The smart reader will have observed that there is another type of sampling going on here. Since the DFT/FFT is a finite set of numbers, surely it cannot measure fluctuations at every possible frequency, only at certain sample frequencies. Indeed, an FFT of, say, 1/5 second of sound will have its frequency sample points spaced at intervals of 5 Hz. This is important for us to remember, because it places a limit on how accurate frequency measurements based on an FFT can be. For details see, for example, the Matlab manual [UTA 2004].
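
The spacing of the frequency sample points follows directly from the length of the analysed sound. A minimal sketch, assuming the CD sampling rate of 44100 Hz (the function name is mine):

```python
def bin_spacing_hz(sample_rate_hz, num_samples):
    """Distance in Hz between neighbouring frequency points of an FFT."""
    return sample_rate_hz / num_samples

rate = 44100
num_samples = rate // 5                      # one fifth of a second: 8820 samples
spacing = bin_spacing_hz(rate, num_samples)
first_bins = [k * spacing for k in range(4)]

print(spacing)      # 5.0
print(first_bins)   # [0.0, 5.0, 10.0, 15.0]
```

The spacing is simply the reciprocal of the duration, so a shorter piece of sound gives a coarser frequency scale.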

Harmonic-to-noise ratio (HNR)

The human voice contains some smoothly undulating (harmonic) components as well as some apparently random component. This is because of the way our body produces speech. Initially a very noisy, hiss-like sound is created in the throat, but later the resonances (ringing) in the nose-mouth cavity make it rounder. Through malfunction or misuse of the equipment, a pathological speaker may produce a very different mixture of harmonic (ringing) and noisy parts, with too much of one and too little of the other. Therefore, by measuring the harmonic-to-noise ratio (HNR) in a speech signal, an expert may be able to identify the exact problem with a bad speaking voice and suggest ways of improving it. This is in fact how HNR measurement is commonly used.

Musician’s palette of pitches

Our perception of pitch is logarithmic. For example, just as a tone at 400 Hz appears one octave higher than another at 200 Hz, a tone at 600 Hz appears one octave higher than another at 300 Hz. In other words, it is the ratio between the two frequencies which we perceive as distance, not the absolute numerical difference. The ratio between frequencies of two consecutive notes on a keyboard tuned according to the equal temperament system is the twelfth root of 2, and this distance is treated as 100 cents. We can calculate the number of cents between two given frequencies f1 (lower) and f2 (higher) as 1200*log(f2/f1)/log(2). The base for the two logarithms must be the same. This calculation can give us a feel for how far apart two competitors for the position of a sur are. For example, Shuddha re (9/8) sits at 204 cents, 4 cents above its equal temperament counterpart, whereas Komal re (16/15) is 12 cents higher than equal temperament at 112 cents.
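
The cents formula is easy to try out. The sketch below is a direct transcription of it into Python (the function name cents is mine) and reproduces the figures quoted above.

```python
import math

def cents(f1, f2):
    """Interval from f1 (lower) up to f2 (higher) in cents."""
    return 1200 * math.log(f2 / f1) / math.log(2)

print(round(cents(200, 400)))          # 1200, one octave
print(round(cents(300, 600)))          # 1200, also one octave
print(round(cents(1, 2 ** (1 / 12))))  # 100, one equal-tempered step
print(round(cents(1, 9 / 8)))          # 204, Shuddha re
print(round(cents(1, 16 / 15)))        # 112, Komal re
```

Only the ratio f2/f1 enters the calculation, so the same function works whether it is given frequencies in Hz or plain ratios relative to sa.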

In the light of these observations, how serious is the limitation of electronic media in representing frequencies accurately? Say we calculate an FFT of one swar in a taan, lasting for 1/8 second. Then the frequency points in the FFT are going to be 8 Hz apart from one another. At a Madhya sa of, say, 156 Hz (Kali 2), that spacing amounts to 87 cents, or almost one patti! Of course, there are more accurate ways of measuring the frequencies of signals. I can filter the signal so that it looks smooth, and then count the number of samples between two successive “upward zero-crossings” (where one sample is negative and the next is positive). Given the sampling rate, that number can give me an accurate reading of the frequency of the filtered signal. At the sampling rate of 44100 Hz, one cycle of tar sa of kali panch at 466 Hz will span about 95 sample points (44100/466 is roughly 94.6). So if I just count a whole number of samples and base my frequency measurement on that, it could be off by as much as 1 in 95, which is 18 cents!
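
Both limits can be reproduced with a short calculation. The sketch below is an illustration under my own assumptions: the test tone is a clean synthetic 466 Hz sine (so no filtering is needed), one tenth of a second long, with an arbitrary starting phase.

```python
import math

def cents(f1, f2):
    """Interval from f1 up to f2 in cents."""
    return 1200 * math.log2(f2 / f1)

# FFT of a 1/8 second note: frequency points are 8 Hz apart.
fft_step_cents = cents(156, 156 + 8)

# Zero-crossing count on a synthetic 466 Hz tone sampled at 44100 Hz.
rate, true_freq = 44100, 466
samples = [math.sin(2 * math.pi * true_freq * n / rate + 0.3)
           for n in range(rate // 10)]
# Index of the first non-negative sample after each negative one.
crossings = [n + 1 for n in range(len(samples) - 1)
             if samples[n] < 0 <= samples[n + 1]]
# Distinct whole-sample cycle lengths that occur in the tone.
periods = sorted({b - a for a, b in zip(crossings, crossings[1:])})
estimates = [rate / p for p in periods]   # frequency implied by each count

count_step_cents = cents(95, 96)          # effect of miscounting by one sample

print(round(fft_step_cents))     # 87
print(periods)                   # [94, 95]
print(round(count_step_cents))   # 18
```

Because the true cycle length is not a whole number of samples, the count alternates between 94 and 95, and the two implied frequencies (about 469 Hz and 464 Hz) straddle the true value.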

Take another example: the typical wow rating of a good cassette deck is 0.15%. What is a wow rating? It is the amount by which the speed of the tape, and therefore any pitch reproduced by the tape, may fluctuate even when the deck is in perfect working condition. A fluctuation of 0.15% corresponds to 2.6 cents. That is hardly noticeable for most of us, except when we listen to a tape that has gone through 5 generations of copying, because then the change could be as large as 13 cents. And, of course, which receiver of a fifth copy can be certain that all the decks involved in copying were good and in perfect order? Conclusion: don’t judge an artist from a tape recording unless you are sure of the pedigree of the recording.
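
The conversion from a percentage speed change to cents uses the same logarithmic formula. A minimal sketch (the function name is mine; the five-generation figure assumes the worst case, in which every deck errs in the same direction):

```python
import math

def percent_to_cents(percent):
    """Pitch shift in cents caused by a speed change of `percent` per cent."""
    return 1200 * math.log2(1 + percent / 100)

one_deck = percent_to_cents(0.15)
five_generations = 5 * one_deck   # cents add up when speed errors compound

print(round(one_deck, 1))         # 2.6
print(round(five_generations))    # 13
```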

My experiments with sound files

Input recording

I took a recording of an acoustic tanpura tuned to A (first string: pancham), made through a typical general-purpose dynamic microphone and sampled at 16 bits, 44100 Hz (Soundfile 1), to see what ingredients I could find in the sound. Analysis was done using Coolpro 1.1 at 16-bit resolution. (Coolpro has since been renamed Adobe Audition [Adobe 2004].)

Expectations

The sound of a tanpura originates from four vibrating strings and is amplified by transferring the vibrations to a thin wooden sheet backed by a gourd through the bridge. Consequently one can expect the sound to be mostly a mixture of sinusoidal waves of four fundamental frequencies and their integer multiples (harmonics or upper partials). To the extent to which the tanpura is well-tuned, one expects all observed partials to be multiples of just two fundamental frequencies – sa and pa (which should in turn be in a ratio 2:3).

Findings

A large number of harmonics of the sa and pa frequencies were indeed found to be present. Interestingly, the perceived fundamental kharaj frequency (about 110 Hz) is itself very weak indeed in the recording. This could be due to a limitation of the microphone, but I doubt that is the case.

The 4th to 7th and 9th harmonics of sa are all strong, within 6 dB of each other. Somehow the 8th is a little weaker. On the other hand the 40th, 46th and 48th harmonics are nearly as strong as the 8th! The list is given in Table 1. I shall be glad to share the longer, complete set of observations with those interested.

Harmonic number	Relative strength in dB, sign omitted (21 means -21 dB; larger is weaker)
3	21
4	10
5	13
6	13
7	15
8	19
9	15.5
12	21
16	19.5
24	20.5
34	21
40	20
46	19.5
48	21

Table 1: prominent harmonics of sa

Some of the above harmonics (numbers which are multiples of three) are also harmonics of the pancham string, which may explain why many of them are relatively strong. The third and ninth harmonics of pa, which are not harmonics of sa, are also quite strong at 17 and 18.5 dB on the same scale. Other odd harmonics of pa (which are therefore not harmonics of sa) are relatively weaker. Once again, the complete list is available on request.
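
Which harmonics the two strings share follows from the 2:3 ratio alone. The sketch below is a small check in Python using exact fractions; it assumes a perfectly tuned pa at exactly 3/2 of sa, and the function names are mine.

```python
from fractions import Fraction

SA = Fraction(1)        # sa taken as the reference frequency
PA = Fraction(3, 2)     # pa assumed to be exactly 3/2 of sa

def shared_with_pa(sa_harmonic):
    """Return the pa harmonic number with the same frequency, or None."""
    multiple = sa_harmonic * SA / PA
    return int(multiple) if multiple.denominator == 1 else None

def is_harmonic_of_sa(pa_harmonic):
    """True when this harmonic of pa is also a whole multiple of sa."""
    return (pa_harmonic * PA / SA).denominator == 1

# sa harmonic -> matching pa harmonic, for the first twelve harmonics of sa
shared = {n: shared_with_pa(n) for n in range(1, 13) if shared_with_pa(n)}
# harmonics of pa that do not coincide with any harmonic of sa
pa_only = [m for m in range(1, 10) if not is_harmonic_of_sa(m)]

print(shared)    # {3: 2, 6: 4, 9: 6, 12: 8}
print(pa_only)   # [1, 3, 5, 7, 9]
```

Every third harmonic of sa coincides with an even harmonic of pa, while the odd harmonics of pa fall between the harmonics of sa.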

Some of the other variations have plausible explanations. In general even harmonics appear stronger than odd harmonics, and many doubly even harmonics are even stronger. However, there are some harmonics which are anomalously strong or weak. Why is the 46th harmonic so strong, while the 42nd is 3.5 dB weaker in spite of being simultaneously the 28th harmonic of pa, and the 50th harmonic is a full 12 dB weaker? Does the answer lie in an idiosyncrasy of the mechanics of the instrument, or of the microphone, or is it an artifact of the digital filter? Further investigation is needed.

Two interesting harmonics

I have heard it said that all surs can be heard from a well-tuned tanpura. I do not know exactly what the statement means, but I want to draw your attention to two apparent asurs that are present. In the unmodified recording they do not seem to sound objectionable. But listen to the modified recordings, in which I have slightly amplified those frequencies. The first is the ninth harmonic of pa, which is the famous 27/16 dha. It can be clearly heard in Soundfile 2, in which it is enhanced by 9 dB. Even more interesting is the seventh harmonic of sa, which is a rather strange komal ni (969 cents). It is clearly audible in Soundfile 3, in which it has been enhanced by only 6 dB! How is it that it does not bother anyone? Perhaps some day we will discover that.
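
The positions of these two harmonics within the octave can be worked out exactly. A minimal sketch, assuming as before that pa is exactly 3/2 of sa (the function names are mine):

```python
import math
from fractions import Fraction

def fold_into_octave(ratio):
    """Halve or double a frequency ratio until it lies between 1 and 2."""
    ratio = Fraction(ratio)
    while ratio >= 2:
        ratio /= 2
    while ratio < 1:
        ratio *= 2
    return ratio

def cents_above_sa(ratio):
    """Distance of a frequency ratio above sa in cents."""
    return 1200 * math.log2(float(ratio))

ninth_of_pa = fold_into_octave(9 * Fraction(3, 2))   # ninth harmonic of pa
seventh_of_sa = fold_into_octave(7)                  # seventh harmonic of sa

print(ninth_of_pa, round(cents_above_sa(ninth_of_pa)))       # 27/16 906
print(seventh_of_sa, round(cents_above_sa(seventh_of_sa)))   # 7/4 969
```

For comparison, the equal-tempered dha and komal ni sit at 900 and 1000 cents, so the 27/16 dha is about 6 cents sharp of its keyboard counterpart while the 7/4 komal ni is about 31 cents flat of its own.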

Further reading

There are many good books on Digital Signal Processing which cover the basic ground, for example Proakis and Manolakis [Proakis, J. and Manolakis, D. 1997] and Oppenheim and Schafer [Oppenheim, A. and Schafer, R. 1994]. For those studying the singing voice, [Titze, I. 2004] is a series of articles in simple language.

References

Adobe 2004: Adobe Corporation web page. http://www.adobe.com/products/audition/main.html

Audacity 2004: Audacity homepage http://audacity.sourceforge.net/

Modak, H. V. 1997: “Physics of Tanpura: Some Investigations” presented at the workshop on Tanpura, India Music Forum, 6/7/1997 and available here: http://www.indiamusicforum.com/seminar/tan/tan03.htm

Musicmatch 2004: Musicmatch web page: http://www.musicmatch.com/

Oppenheim, A. and Schafer, R. 1994: Discrete Time Signal Processing, New Delhi: Prentice-Hall (India), 1994.

Pandya, P. 2004: “Beyond Swayambhu Gandhar: An Analysis of Perceived Tanpura Notes” available here: http://www.tcs.tifr.res.in/%7Epandya/music/index.html

Proakis, J. and Manolakis, D. 1997: Digital Signal Processing - Principles, Algorithms and Applications, New Delhi: Prentice-Hall (India), 1997.

Sengupta, R. et al. 2003: “Acoustic Cues for the Timbral Goodness of Tanpura”, Report NSA2003-57, ITC Sangeet Research Academy, Kolkata, 2003.

Titze, I. 2004: Science for Singers. Available here: http://www.ncvs.org/ncvs/info/singers/colmenu.html

UTA 2004: University of Texas web pages for the Matlab manual. Referenced page available here: http://www.utexas.edu/math/Matlab/Manual/tec6.2.html