TOOLS OF TECHNOLOGY
Hari V Sahasrabuddhe
Indian Institute of Technology
Powai, Mumbai 400076
Digital sound recordings and powerful desktop computers have given us new tools with which to explore music. Basic ideas of signals, digital sampling and processing, and of the representation of a musical interval on the cents scale are covered. Results of my observations on a CD-quality digital recording of a tanpura are presented. Two very interesting intervals are prominent in the recording I studied. The sound of the tanpura deserves further exploration to advance our understanding of it. Pointers to further source material are included.
Digital sound recordings and powerful desktop computers have given us new tools with which to explore music. Today’s PC has a reasonably good sound card which can digitize sound coming from a microphone, a cassette player and so on. Public domain software is available for transferring sound files from an audio CD to the computer’s hard disk, for example the MusicMatch Jukebox [Musicmatch 2004]. A powerful sound editor capable of most of the operations I am reporting here is similarly available as freeware [Audacity 2004]. There are reports in the literature of studies made on sounds of Indian music. For example, R. Sengupta et al. have used digital sound processing tools to assess tanpuras [Sengupta, R. et al. 2003]. Modak has used older electronic instruments (signal generator and oscilloscope) to examine the sound of a tanpura [Modak, H. V. 1997]. Paritosh Pandya has analysed a digital recording of a tanpura [Pandya, P. 2004]. Such results can easily be used, duplicated or added to by anyone with access to a PC and a microphone. Therefore I wish to present here some basic ideas for anyone desiring to do so. There are useful references for those who wish to delve deeper.
Analog and digital representation of sound
The sounds we hear are fluctuations in air pressure. When we capture them using a microphone, they get converted to fluctuations of another quantity, an electrical voltage. We say that the fluctuating voltage produced by the microphone is an analog representation of the sound, because its fluctuations are analogous or similar to the fluctuations of air pressure in the sound. If I were somehow to plot air pressure against time on graph paper, as in Figure 1, that would be another analog representation of the sound. But if I were next to read off the height of the curve at each vertical grid line of the graph and write these numbers in a table, that would be a digital representation of the same sound. It is a representation because, although the table of numbers looks very different, from such a table I could reconstruct the original graph. The representation of sound on a CD is essentially such a table of numbers. An analog-to-digital converter is an electronic device which converts an analog quantity, such as the voltage produced by a microphone, into a digital representation of that quantity. This process of conversion is called sampling, because we note the values at only certain points on the graph, not everywhere.
Figure 1: graph of hypothetical sound wave
The digital representation of sound on a CD is sampled every 1/44100 of a second, or “at 44100 Hertz (Hz for short)”, which means the same thing. A fundamental result in sampling states that only waves with frequency less than half the sampling rate can be correctly reconstructed from a set of sample values (also known as a sampled signal). Furthermore, waves with higher frequencies are not merely lost, they are wrongly represented! For example, when a 300 Hz wave is sampled at 500 Hz, in the sampled signal it appears to be a 500 – 300 = 200 Hz wave! To prevent such accidents, an analog signal must first be filtered to remove any components with too high a frequency.
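The 300 Hz example can be checked numerically. The following is only a small sketch using the Python standard library; the function names are my own and are not taken from any of the tools mentioned in this article.

```python
import math

def sample_sine(freq_hz, rate_hz, n):
    """Return n samples of a unit sine wave of freq_hz taken at rate_hz."""
    return [math.sin(2 * math.pi * freq_hz * k / rate_hz) for k in range(n)]

def alias_frequency(freq_hz, rate_hz):
    """Apparent frequency of a freq_hz wave after sampling at rate_hz."""
    folded = freq_hz % rate_hz
    return min(folded, rate_hz - folded)

high = sample_sine(300, 500, 50)   # too fast for a 500 Hz sampling rate
low = sample_sine(200, 500, 50)    # safely below half the sampling rate

# Sample by sample, the 300 Hz wave equals the 200 Hz wave turned upside down,
# so the two tables of numbers describe a wave of the same frequency.
max_diff = max(abs(h + l) for h, l in zip(high, low))

print(alias_frequency(300, 500))
print(max_diff)
```

The difference printed at the end is zero apart from rounding error, which is why no later processing can tell the two waves apart once they have been sampled.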
In the discrete Fourier transform (DFT), a digitally sampled sound or other signal (which we now know is just a table of numbers) is converted into another set of numbers. In the original set each number represents the value of the changing quantity, such as air pressure, at one point of time, whereas in the new set each number represents how the numbers of the first set fluctuate at a particular rate. Each of the new numbers depends on all of the old ones. Thus, there are many calculations to make: a DFT of 1000 numbers would typically involve about 1,000,000 multiplications. However, many of these calculations are actually repeated. The fast Fourier transform (FFT) algorithm uses this fact to produce the same 1000 numbers in far fewer multiplications (about 10,000). Therefore, DFT and FFT give the same result, produced with different efficiency.
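To make the comparison concrete, here is a minimal sketch in plain Python of a direct DFT next to a textbook recursive radix-2 FFT. It is an illustration under my own assumptions (a 16-sample test signal, input length a power of two), not the routine used by any particular sound editor.

```python
import cmath
import math

def dft(x):
    """Direct DFT: every output number depends on every input number."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]

def fft(x):
    """Recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft(x[0::2])   # transform of the even-numbered samples
    odd = fft(x[1::2])    # transform of the odd-numbered samples
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle           # each half-size result is reused
        out[k + n // 2] = even[k] - twiddle  # for two of the outputs
    return out

# Test signal: exactly 3 cycles of a sine wave in 16 samples.
signal = [math.sin(2 * math.pi * 3 * t / 16) for t in range(16)]
slow = dft(signal)
fast = fft(signal)

max_error = max(abs(a - b) for a, b in zip(slow, fast))
peak_bin = max(range(8), key=lambda k: abs(fast[k]))

# Rough multiplication counts for 1024 samples.
direct_count = 1024 * 1024                 # about a million
fft_count = 1024 * int(math.log2(1024))    # about ten thousand
print(max_error, peak_bin, direct_count, fft_count)
```

Both routines return the same numbers up to rounding, and the largest value sits at the entry for 3 cycles, as expected for this test signal.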
The smart reader will have observed that there is another type of sampling going on here. Since the DFT/FFT is a finite set of numbers, surely it cannot measure fluctuations at every possible frequency, only at certain sample frequencies. Indeed, an FFT of, say, 1/5 second of sound will have its frequency sample points spaced at intervals of 5 Hz. This is important for us to remember, because it places a limit on how accurate frequency measurements based on FFT can be. For details see, for example, the Matlab manual [UTA 2004].
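The spacing of the frequency sample points follows directly from the length of the clip. The short sketch below assumes a 44100 Hz sampling rate and a hypothetical 442 Hz tone; both helper functions are illustrative names of my own.

```python
def bin_spacing_hz(rate_hz, duration_s):
    """Spacing between FFT frequency points for a clip of duration_s seconds."""
    n_samples = round(rate_hz * duration_s)  # whole number of samples in the clip
    return rate_hz / n_samples

def nearest_bin_hz(freq_hz, spacing_hz):
    """Frequency of the FFT sample point closest to freq_hz."""
    return round(freq_hz / spacing_hz) * spacing_hz

spacing = bin_spacing_hz(44100, 1 / 5)      # 1/5 second of CD-quality sound
reported = nearest_bin_hz(442.0, spacing)   # where a 442 Hz tone would peak
print(spacing, reported)
```

With points 5 Hz apart, the strongest FFT value for a 442 Hz tone falls at the 440 Hz point, so a reading taken from the peak alone is 2 Hz off.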
Harmonic-to-noise ratio (HNR)
The human voice contains some smoothly undulating (harmonic) components as well as an apparently random component. This is because of the way our body produces speech. Initially a very noisy, hiss-like sound is created in the throat, but later the resonances (ringing) in the nose-mouth cavity make it rounder. Through malfunction or misuse of the equipment, a pathological speaker may produce a very different mixture of harmonic (ringing) and noisy parts, with too much of one and too little of the other. Therefore, by measuring the harmonic-to-noise ratio (HNR) in a speech signal, an expert may be able to identify the exact problem with a bad speaking voice and suggest ways of improving it. This is in fact how HNR measurement is commonly used.
Our perception of pitch is logarithmic. For example, just as a tone at 400 Hz appears one octave higher than another at 200 Hz, a tone at 600 Hz appears one octave higher than another at 300 Hz. In other words, it is the ratio between the two frequencies which we perceive as distance, not the absolute numerical difference. The ratio between the frequencies of two consecutive notes on a keyboard tuned according to the equal temperament system is the twelfth root of 2, and this distance is treated as 100 cents. We can calculate the number of cents between two given frequencies f1 (lower) and f2 (higher) as 1200*log(f2/f1)/log(2). The base for the two logarithms must be the same. This calculation can give us a feel for how far apart two competitors for the position of a sur are. For example, Shuddha re (9/8) sits at 204 cents, 4 cents above its equal temperament counterpart, whereas Komal re (16/15) sits at 112 cents, 12 cents above its equal temperament counterpart.
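The cents formula is easy to try out. This is a small sketch of the calculation described above; the function name and the choice of example values are mine.

```python
import math

def cents(f1, f2):
    """Distance in cents from frequency f1 (lower) up to f2 (higher)."""
    return 1200 * math.log(f2 / f1) / math.log(2)

octave_a = cents(200, 400)           # one octave
octave_b = cents(300, 600)           # also one octave, although 300 Hz apart
semitone = cents(1, 2 ** (1 / 12))   # one equal-tempered keyboard step
shuddha_re = cents(1, 9 / 8)         # close to 204
komal_re = cents(1, 16 / 15)         # close to 112

print(round(octave_a), round(octave_b), round(semitone))
print(round(shuddha_re, 1), round(komal_re, 1))
```

Only the ratio f2/f1 matters, so the same function works whether the arguments are actual frequencies in Hz or a ratio written against 1.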
In the light of these observations, how serious is the limitation of electronic media in representing frequencies accurately? Say we calculate an FFT of one swar in a taan, lasting for 1/8 second. Then the frequency points in the FFT are going to be 8 Hz apart from one another. At Madhya sa of, say, 156 Hz (Kali 2), that amounts to 87 cents, or almost one patti! Of course, there are more accurate ways of measuring the frequencies of signals. I can filter the signal so that it looks smooth, and then count the number of samples between two successive “upward zero-crossings” (where one sample is negative and the next is positive). Given the sampling rate, that number can give me an accurate reading of the frequency of the filtered signal. At the sampling rate of 44100 Hz, one cycle of tar sa of kali panch at 466 Hz will span about 95 sample points. So if I just count an integer number and base my frequency measurement on that, it could be off by as much as 1 in 95, which is 18 cents!
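Both limits can be reproduced with a few lines of Python. The sketch below uses a synthetic, already smooth 466 Hz sine wave in place of a filtered recording; the starting phase of 1.0 radian and all function names are assumptions made for illustration.

```python
import math

def cents(f1, f2):
    """Distance in cents from frequency f1 up to f2."""
    return 1200 * math.log(f2 / f1) / math.log(2)

def upward_zero_crossings(samples):
    """Indices k where samples[k] is negative and samples[k + 1] is not."""
    return [k for k in range(len(samples) - 1)
            if samples[k] < 0 <= samples[k + 1]]

def frequency_from_crossings(samples, rate_hz):
    """Estimate frequency from the whole number of samples in one cycle."""
    crossings = upward_zero_crossings(samples)
    period_samples = crossings[1] - crossings[0]
    return rate_hz / period_samples

rate = 44100
tone = [math.sin(2 * math.pi * 466 * k / rate + 1.0) for k in range(400)]

estimate = frequency_from_crossings(tone, rate)
period = rate / estimate               # whole samples counted in one cycle
error_cents = abs(cents(estimate, 466))

fft_step_cents = cents(156, 156 + 8)   # 8 Hz spacing near 156 Hz
count_step_cents = cents(95, 96)       # a miscount of 1 sample in 95

print(period, round(estimate, 1), round(error_cents, 1))
print(round(fft_step_cents), round(count_step_cents))
```

The true cycle length is not a whole number of samples, so the count comes out as one of the two neighbouring integers and the estimate misses 466 Hz by a few cents. Averaging the count over many cycles would be one way to reduce this error.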
Take another example: the typical wow rating of a good cassette deck is 0.15%. What is a wow rating? It is the amount by which the speed of the tape, and therefore any pitch reproduced by the tape, may fluctuate even when the deck is in perfect working condition. A fluctuation of 0.15% is 2.6 cents. That is hardly noticeable for most of us, except when we listen to a tape that has gone through 5 generations of copying, because then the change could be as large as 13 cents. And, of course, which receiver of a fifth copy can be certain that all the decks involved in copying were good and in perfect order? Conclusion: don’t judge an artist from a tape recording unless you are sure of the pedigree of the recording.
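The wow figures follow from the same cents calculation. In this sketch the five-generation figure assumes the worst case, in which the speed errors of all five copies add up in the same direction; the function name is my own.

```python
import math

def cents_from_speed_error(fraction):
    """Pitch shift in cents caused by a relative tape-speed error."""
    return 1200 * math.log2(1 + fraction)

one_deck = cents_from_speed_error(0.0015)   # wow rating of 0.15%
five_generations = 5 * one_deck             # worst case: errors all add up

print(round(one_deck, 1), round(five_generations))
```

Because cents are logarithmic, successive speed errors simply add, which is why five copies give five times the shift of one.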
Input recording
I took a recording of an acoustic tanpura tuned to A (first string: pancham) through a typical general-purpose dynamic microphone, sampled at 16 bits and 44100 Hz (Soundfile 1), to see what ingredients I could find in the sound. Analysis was done using Coolpro 1.1 in 16-bit resolution. (Coolpro has since been renamed Adobe Audition [Adobe 2004].)
Expectations
The sound of a tanpura originates from four vibrating strings and is amplified by transferring the vibrations to a thin wooden sheet backed by a gourd through the bridge. Consequently one can expect the sound to be mostly a mixture of sinusoidal waves of four fundamental frequencies and their integer multiples (harmonics or upper partials). To the extent to which the tanpura is well-tuned, one expects all observed partials to be multiples of just two fundamental frequencies – sa and pa (which should in turn be in a ratio 2:3).
Findings
A large number of harmonics of the sa and pa frequencies were indeed found to be present. Interestingly, the perceived fundamental kharaj frequency (about 110 Hz) is itself very weak in the recording. This could be due to a limitation of the microphone, but I doubt that is the case.
The 4th to 7th and 9th harmonics of sa are all strong, within 6 dB of each other. Somehow the 8th is a little weaker. On the other hand the 40th, 46th and 48th harmonics are nearly as strong as the 8th! The list is given in Table 1. I shall be glad to share the longer complete set of observations with those interested.
Harmonic number | Relative strength, in negative dB (larger = weaker)
3 | 21
4 | 10
5 | 13
6 | 13
7 | 15
8 | 19
9 | 15.5
12 | 21
16 | 19.5
24 | 20.5
34 | 21
40 | 20
46 | 19.5
48 | 21
Table 1: prominent harmonics of sa
Some of the above harmonics (those whose numbers are multiples of three) are also harmonics of the pancham string, which may explain why many of them are relatively strong. The third and ninth harmonics of pa, which are not harmonics of sa, are also quite strong at 17 and 18.5 dB on the same scale. Other odd harmonics of pa (which, being odd, are likewise not harmonics of sa) are relatively weaker. Once again, the complete list is available on request.
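Which harmonics the two strings share can be worked out with exact fractions. The sketch below assumes that pa lies exactly 3/2 above the sa whose harmonics are numbered in Table 1, which is the well-tuned 2:3 case; the function names are illustrative.

```python
from fractions import Fraction

PA_OVER_SA = Fraction(3, 2)   # assumed tuning: pa is exactly 3/2 of sa

def sa_harmonic_as_pa_harmonic(n):
    """Return m if the nth harmonic of sa is the mth harmonic of pa, else None."""
    m = Fraction(n) / PA_OVER_SA
    return int(m) if m.denominator == 1 else None

def pa_harmonic_as_sa_harmonic(m):
    """Return n if the mth harmonic of pa is the nth harmonic of sa, else None."""
    n = m * PA_OVER_SA
    return int(n) if n.denominator == 1 else None

table_harmonics = [3, 4, 5, 6, 7, 8, 9, 12, 16, 24, 34, 40, 46, 48]
shared = [n for n in table_harmonics
          if sa_harmonic_as_pa_harmonic(n) is not None]

print(shared)
print(pa_harmonic_as_sa_harmonic(3), pa_harmonic_as_sa_harmonic(9))
```

Under this assumption the shared entries of Table 1 are exactly the multiples of three, and every odd harmonic of pa falls between two harmonics of sa.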
Some of the other variations have plausible explanations. In general even harmonics appear stronger than odd harmonics, and many doubly even harmonics are stronger still. However, there are some harmonics which are anomalously strong or weak. Why is the 46th harmonic so strong, while the 42nd is 3.5 dB weaker in spite of being simultaneously the 28th harmonic of pa, and the 50th harmonic is a full 12 dB weaker? Does the answer lie in an idiosyncrasy of the mechanics of the instrument, or of the microphone, or is it an artifact of the digital filter? Further investigation is needed.
Two interesting harmonics
I have heard it said that all surs can be heard from a well-tuned tanpura. I do not know exactly what the statement means, but I want to draw your attention to two apparent asurs that are present. In the unmodified recording they do not seem to sound objectionable. But hear the modified recordings in which I have slightly amplified those frequencies. The first is the ninth harmonic of pa, which is the famous 27/16 dha. It can be clearly heard in Soundfile 2, in which it is enhanced by 9 dB. Even more interesting is the case of the seventh harmonic of sa, which is a rather strange komal ni (969 cents). It is clearly audible in Soundfile 3, in which it has been enhanced by only 6 dB! How is it that it does not bother anyone? Perhaps some day we will discover that.
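The two intervals can be derived by bringing each harmonic down into the octave above sa. This is a minimal sketch which assumes pa is exactly 3/2 of sa; the helper names are my own.

```python
import math
from fractions import Fraction

def fold_into_octave(ratio):
    """Halve or double a frequency ratio until it lies between 1 and 2."""
    ratio = Fraction(ratio)
    while ratio >= 2:
        ratio /= 2
    while ratio < 1:
        ratio *= 2
    return ratio

def cents(ratio):
    """Size of a frequency ratio in cents."""
    return 1200 * math.log2(float(ratio))

dha = fold_into_octave(9 * Fraction(3, 2))   # ninth harmonic of pa
ni = fold_into_octave(7)                     # seventh harmonic of sa

print(dha, round(cents(dha)))
print(ni, round(cents(ni)))
```

On the equal temperament keyboard dha is at 900 cents and komal ni at 1000 cents, so the first interval is about 6 cents sharp of its keyboard counterpart while the second is about 31 cents flat.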
Further reading
There are many good books on Digital Signal Processing which cover the basic ground, for example Proakis and Manolakis [Proakis, J. and Manolakis, D. 1997] and Oppenheim and Schafer [Oppenheim, A. and Schafer, R. 1994]. For those studying the singing voice, [Titze, I. 2004] is a series of articles in simple language.
Adobe 2004: Adobe Corporation web page. http://www.adobe.com/products/audition/main.html
Audacity 2004: Audacity homepage. http://audacity.sourceforge.net/
Modak, H. V. 1997: “Physics of Tanpura: Some Investigations”, presented at the workshop on Tanpura, India Music Forum, 6/7/1997. Available here: http://www.indiamusicforum.com/seminar/tan/tan03.htm
Musicmatch 2004: Musicmatch web page. http://www.musicmatch.com/
Oppenheim, A. and Schafer, R. 1994: Discrete Time Signal Processing, New Delhi: Prentice-Hall (India), 1994.
Pandya, P. 2004: “Beyond Swayambhu Gandhar: An Analysis of Perceived Tanpura Notes”. Available here: http://www.tcs.tifr.res.in/%7Epandya/music/index.html
Proakis, J. and Manolakis, D. 1997: Digital Signal Processing - Principles, Algorithms and Applications, New Delhi: Prentice-Hall (India), 1997.
Sengupta, R. et al. 2003: “Acoustic Cues for the Timbral Goodness of Tanpura”, Report NSA2003-57, ITC Sangeet Research Academy, Kolkata, 2003.
Titze, I. 2004: Science for Singers. Available here: http://www.ncvs.org/ncvs/info/singers/colmenu.html
UTA 2004: University of Texas web pages for the Matlab manual. Referenced page available here: http://www.utexas.edu/math/Matlab/Manual/tec6.2.html