A Java Applet to Calculate Virtual Pitch
Jeff Jensen October 2008
- Introduction: What is virtual pitch?
- How to use the applet
- Applet and source code
- Examples of output and some quirky behavior
- Technical discussion of how the algorithm works
- Ideas for further work
- A Crash Course in Acoustics (work in progress)
- References
- Contact me
Introduction: What is virtual pitch?
Virtual Pitch is a concept originated by Prof. Ernst Terhardt of the
Technische Universität München in 1969-1970.
Basically it is an extension of the fundamental bass
of Rameau and the residue pitch of Schouten. In very basic terms, it
deals with how a collection of distinct pitches (called a complex tone) is
perceived by the human auditory system to fuse together into a "single" tone. The
simplest example would be plucking a single guitar string, say the top string
tuned to e, with frequency ν. It has harmonics
2ν, 3ν, 4ν, ..., and these are all distinct and audible
pitches, but we perceive the sound as a single pitch e. Now if the
string is not perfectly flexible (think of a steel piano string), then when it vibrates
in segments, the frequencies are not exactly integer multiples of the fundamental
ν; instead they may be 2.0001ν, 3.00006ν, 4.00002ν, ...
(and now they would better be termed partials rather than harmonics). But
we still tend to perceive a single pitch e, provided the partials are not
too far off their ideal values.
Now consider several different guitar strings played at the same time.
There is still a tendency for the human auditory system to try to fuse
these sounds together, if possible (this is a well-established experimental fact).
Pluck the chord
c4-e4-g4 (the
subscripts are the standard numbering for what octave the notes are in; the
4th octave is from middle c up to b). People will "hear" notes that are
not actually present, such as c3 and c2.
These tones are virtual pitches in Terhardt's terminology; by
contrast a tone that actually is present as a vibration in the air, like
c4, is called a spectral pitch.
There is a little more to the definition than that. Following the discussion on
Terhardt's web site ([Terhardt *] in the references), a spectral pitch is
directly the effect of sound waves in the air exciting a place in the cochlea of
the ear. Virtual pitch is a phenomenon of processing by the brain. Thus in
the example above with the note e and waves in the air of frequencies
ν, 2ν, 3ν, 4ν, ..., the ear picks up a
spectral pitch of ν, and at the same time the brain processes the
upper harmonic sequence 2ν, 3ν, 4ν, ... and
detects a virtual pitch of ν!
The perception of virtual pitch is a bit delicate, however.
There is some variability from listener to listener.
It depends on the loudness (sound pressure level, SPL) of the sounds,
and the mixture of the partials. A complex of sounds almost never fuses into a
single pitch; instead we get a spectrum, and which pitch from the spectrum
is most prominent can depend on the musical context. This is still an active area
of research in the psychological acoustics community.
How to use the applet
This is a conversion of Terhardt's original C code into Java,
with a graphical user interface,
and an additional level of processing to allow the user to input note names or
just intonation fractions, rather than raw frequencies. The algorithm itself is
completely independent of any tuning or temperament; it works internally
only with frequencies and loudness values. For musical convenience, the interface
of this applet lets you input note names in 12-equal temperament and get the
results back in terms of 12-equal temperament note names, if you like.
- [Frequencies box]
The text box functions like a command line.
There are several formats that you can use to input the base frequencies:
- You can input frequencies (in Hz) as numbers
by starting with the letter n as the tag that identifies what type of
input follows:
n 440 554 660
- Or, specifically for musical applications, you can input the letter
names of notes on the piano keyboard, by starting with the letter l
l A4 C#5 E5
(note that you need to specify the octave number. Middle C is denoted C4, and
everything from middle C up to the B above it then has a "4" suffix.)
Note also that the note name letters must be capitalized. The symbol for
"flat" is a lowercase b.
- You can also input Just Intonation fractions.
First you have to set the base frequency. To express the same A major chord
as in the previous two examples, we need a slightly more complicated identifying
tag: the number 1 (not a lowercase "L"!), which must be set equal to a frequency:
1=440 1 5/4 3/2
Don't put any spaces in the segment "1=440". This part says that the number
"1" corresponds to 440 Hz; then 5/4 corresponds to 550 Hz, and so on. Note that
"1" does not have to be in the chord: C4 has frequency 264 Hz, and we can express
E minor with respect to this as:
1=264 5/4 3/2 15/8
(A small sketch of how these input lines might be parsed appears after this list of controls.)
- Warning: At present you can only enter 18 or so distinct values in
this box.
- [SPL box]
Here we input the loudness (in decibels) of the frequencies listed above.
For convenience, you don't have to list them all. For example, if you have 5
frequencies in the upper box, you would normally list, in order, 5 loudness values
here. But if you enter only 3 (or 1), all the remaining ones are set to the last
value given.
- [masking]
As an experimental feature, you can turn masking off. It is on in Terhardt's original algorithm.
- [upper harmonics]
This is purely a convenience feature. To save you having to type in upper harmonics
in the Freq box, this will automatically add them on. Real musical instruments
produce tones with upper harmonics.
- [pitch shifts]
Probably you will want to leave this turned off, for music theoretic purposes. It is
a psychoacoustical fact that people perceive a slight shift in pitch, depending on
the loudness of the tone, the shape of the amplitude spectrum, and the presence of
other tones which may produce masking. Turning this option on tends to produce results that
diverge from traditional music theory, and thus may not really be applicable in the context of a concert,
for example.
- [weight]
The weight values can be expressed as their raw numbers, or expressed as a percentage of
the sum of all such values.
- [identify to pitch classes]
By default, we calculate the weights of all the distinct pitches in all octaves. However, it
may be desirable to group all the notes "E" together, for example, without regard to octave.
This feature only works for named notes, like E or C#; a pitch that is in between the standard ones
is still expressed as something like "??(3)".
- [Temperament for output]
This refers to how note letter names are assigned to the output frequencies.
- [Calculate button]
When you have made all your input settings, click on this. The applet automatically
resets itself, so you can keep on changing the inputs and keep on calculating.
Sometimes the applet image gets distorted after scrolling; if this happens, just
click on your browser's refresh button.
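To make the input formats above concrete, here is a minimal sketch in Java (the applet's language). It is not the applet's actual parser; the class and method names are hypothetical, and note names are interpreted in 12-equal temperament with A4 = 440 Hz. It also illustrates the SPL-box rule of repeating the last loudness value.

import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class FreqInputSketch {

    // Semitone names within an octave, C = 0 ... B = 11
    static final List<String> NAMES =
            Arrays.asList("C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B");

    // "n 440 554 660"   -> raw frequencies in Hz
    // "l A4 C#5 E5"     -> note names (flat written as lowercase b, e.g. Eb4)
    // "1=440 1 5/4 3/2" -> Just Intonation fractions relative to a base frequency
    static List<Double> parse(String line) {
        String[] tok = line.trim().split("\\s+");
        List<Double> freqs = new ArrayList<>();
        if (tok[0].equals("n")) {
            for (int i = 1; i < tok.length; i++) freqs.add(Double.parseDouble(tok[i]));
        } else if (tok[0].equals("l")) {
            for (int i = 1; i < tok.length; i++) freqs.add(noteToHz(tok[i]));
        } else if (tok[0].startsWith("1=")) {
            double base = Double.parseDouble(tok[0].substring(2));
            for (int i = 1; i < tok.length; i++) {
                String[] frac = tok[i].split("/");
                double ratio = (frac.length == 2)
                        ? Double.parseDouble(frac[0]) / Double.parseDouble(frac[1])
                        : Double.parseDouble(frac[0]);
                freqs.add(base * ratio);
            }
        }
        return freqs;
    }

    // "C#5" or "Eb4" -> frequency in 12-equal temperament, A4 = 440 Hz
    static double noteToHz(String name) {
        int pos = 1;                                   // where the octave digits start
        int semitone = NAMES.indexOf(name.substring(0, 1));
        if (name.charAt(1) == '#') { semitone += 1; pos = 2; }
        else if (name.charAt(1) == 'b') { semitone -= 1; pos = 2; }
        int octave = Integer.parseInt(name.substring(pos));
        int midi = (octave + 1) * 12 + semitone;       // MIDI note number, A4 = 69
        return 440.0 * Math.pow(2.0, (midi - 69) / 12.0);
    }

    // SPL box rule: if fewer SPL values than frequencies, repeat the last one
    static double[] padSpl(double[] spl, int count) {
        double[] out = new double[count];
        for (int k = 0; k < count; k++) out[k] = spl[Math.min(k, spl.length - 1)];
        return out;
    }

    public static void main(String[] args) {
        System.out.println(parse("l A4 C#5 E5"));     // approx. [440.0, 554.37, 659.26]
        System.out.println(parse("1=440 1 5/4 3/2")); // [440.0, 550.0, 660.0]
    }
}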
Applet and source code
Examples and explanation of output
Let's run with the default input and see what happens:
Input settings
Box name | Value | Remarks
Freqs (Hz): | n 440 550 660 |
SPLs (dB): | 70 | equivalent to 70 70 70
masking: | on | active if box is checked
upper harmonics: | No upper harmonics |
pitch shifts: | no pitch shifts |
weight: | % | the weight values are expressed as a percentage of the sum of all of them
pitch classes: | Not pitch classes | this feature adds together the weights of all the notes that have names
output reference: | 12-equal temperament | the standard
Output of algorithm
Note name | Weight | Freq | Pitch type | Remarks
A4 | 0.395 | 440.0 Hz | | The pitch with the highest weight value is A in the 4th octave with frequency 440 Hz. The psychoacoustical meaning of a raw weight value is unclear, as Terhardt points out in his paper (referenced below).
A2 | 0.242 | 110.0 Hz | Virtual | This pitch is not present in the original sounds.
A3 | 0.223 | 220.0 Hz | Virtual |
??(5) | 0.216 | 550.0 Hz | | The ?? is because the frequency 550 Hz is not a close enough match to the equal temperament value of C#5 = 554.37 Hz.
E5 | 0.188 | 660.0 Hz | |
A1 | 0.121 | 55.0 Hz | Virtual |
What I call the Rameau root is the greatest common factor of all
the input frequencies (with a small margin for error built in).
Things to Puzzle over:
- n 300 350 400 450 500 produces only one output: 300
- l C#4 D4 D#4 gives no output.
- l C4 C4 C4 should give root C4! [Perhaps it is just that the algorithm intends
all the same pitches to be collected together at the outset.]
- Adding upper harmonics sometimes turns spectral pitches into virtual ones. But this
is not surprising because, as was discussed in the introduction, both types of
pitch perception are active at the same time, and one will produce a stronger
signal than the other.
The solution to these anomalies may be to be able to turn off masking and to
be able to adjust the weighting of the various parameters in Terhardt's algorithm
(described below).
Note also that we do not get exactly the same results as Terhardt published in
1982 for this same triad A4 - C#5 - E5:
Output of algorithm from [Terhardt, Stoll, Seewann 1982a] (Table 1 p.677)
Note name | Weight | Freq | Pitch type
A4 | 1.41 | 440 Hz | Virtual
A3 | 1.09 | 220 Hz | Virtual
A2 | 0.59 | 110 Hz | Virtual
D3 | 0.52 | 293.3 Hz | Virtual
E6 | 0.35 | 1320 Hz | Spectral
F2 | 0.28 | 87.3 Hz | Virtual
The explanation for this is probably the unknown mix of partials; Terhardt
recorded the
chord being played on a piano and analyzed that, but didn't publish the spectra,
except to say the sound was 70 dB. (I might also remark here that the mix of partials
in a piano tone is quite complicated and constantly changing as the tones are sustained).
It is also possible that Terhardt's algorithm changed slightly from 1982 to 1994, which
is the date of the code I got.
Description of the algorithm
The raw C language source code is available from Prof. Richard Parncutt's website:
http://www-gewi.uni-graz.at/staff/parncutt/ptp2svpCode.html
(see also the references at the end).
Here, Terhardt's original code is reincarnated in the Java classes
PartTonePattern, SpectralPitchPattern, VirtualPitchPattern, and CombinedPitchPattern.
The algorithm takes a set of R input frequencies, in Hz
{ f0, f1, ..., fR-1 }
and R sound pressure level (SPL) values, in dB.
Once input, these are sorted and put into the Part Tone Pattern object.
The Spectral Pitch Pattern creates a set of weights for each frequency
{ w0, w1, ..., wR-1 }.
It does masking and computation of pitch shifts, based on experimentally
measured acoustical parameters.
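As a rough picture of the data being passed around (a hypothetical class, not the applet's actual PartTonePattern or SpectralPitchPattern), the input can be thought of as an array of frequency/SPL pairs kept sorted by frequency, each of which later receives a spectral pitch weight:

import java.util.Arrays;
import java.util.Comparator;

// Hypothetical sketch of the data the algorithm works on: an array of
// part tones (frequency + SPL), sorted by frequency, each of which later
// receives a spectral pitch weight. Not the applet's actual classes.
class PartTone {
    double freqHz;   // frequency in Hz
    double splDb;    // sound pressure level in dB
    double weight;   // spectral pitch weight, filled in later by the analysis

    PartTone(double freqHz, double splDb) {
        this.freqHz = freqHz;
        this.splDb = splDb;
    }

    static PartTone[] build(double[] freqs, double[] spls) {
        PartTone[] tones = new PartTone[freqs.length];
        for (int i = 0; i < freqs.length; i++) {
            tones[i] = new PartTone(freqs[i], spls[i]);
        }
        Arrays.sort(tones, Comparator.comparingDouble((PartTone t) -> t.freqHz));
        return tones;
    }
}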
The Virtual Pitch Pattern is where the virtual pitches are calculated.
Here are the criteria that Terhardt says he uses to give a numerical weight
to the virtual pitch candidates:
- The number of relevant spectral components which provide the
same (or nearly the same) virtual pitch. The weight should increase with the
number of components. This doesn't seem to be present in the code, however!
[See the function sortIntoVP( )].
- The spectral pitch weights of the relevant components: components with higher
spectral weight should contribute a higher virtual weight to the candidate.
This is accounted for in the formula for Cij.
- The weight should decrease as the subharmonic number gets higher (and
thus the subharmonic more distant from the actual frequencies present).
This is accounted for in the formula for Cij.
- The weight should increase with the accuracy of the subharmonic
coincidences, attaining a maximum value for perfect matching.
This is accounted for in the formula for Cij, specifically in
the factor (1 - γ/δ).
One idea for a future enhancement would be to allow the user to alter the amount of
importance given to each of these items.
The heart of the algorithm is the function subCoincidence():
- Loop i over the input frequencies { f0, f1, ..., fR-1 }
- Get fi
- Loop over the subharmonics 1/m·fi for m = 1, ..., M = 12 (the number of virtual pitch candidates we allow)
- Loop j over all the other input frequencies fj, j ≠ i
- We seek the nth harmonic of 1/m·fi that most closely matches fj. [See the expression for n below this list]
- Does n·( 1/m·fi ) ≈ fj ? [See the expression for γ below]
- If yes, compute the weight (coincidence coefficient Cij) [formula below]. If no, set this Cij = 0.
- Sum these Cij to get the total weight Wi,m (called vpw in the code).
- If the virtual pitch candidate 1/m·fi is a good enough match to the other fj's to get a big weight, put it into the VPP array (sorted by descending weights).
- Next i
[Note that we only care about how the harmonics of the sub-harmonics of
fi match the fundamentals fj; we don't try
to match upper harmonics of a given fj ].
Since formatting equations in HTML is problematic at best, here are some of the
expressions for the above quantities written separately from the list:
n = Int[ fj / ( 1/m·fi ) ]
The expression for γ, what Terhardt calls the degree of inharmonicity,
measures the amount by which 1/m·fi misses being a subharmonic of fj.
We want
| n·( 1/m·fi ) - fj | < δ·fj   [ 8% of fj ],
and γ is this mismatch expressed as a fraction of fj:
γi,j,m = γ = | n·( 1/m·fi ) - fj | / fj
We require γ < δ = 0.08 for the 2 frequencies to be considered a
"match". In this case, we compute Cij, which Terhardt calls the
Coincidence coefficient. (It is a geometrical mean of the weights, but I don't
fully understand its justification):
C( 1/m·fi , fj ) = Cij =
[ Wi·Wj ]½ · (1 - γ/δ)   if γ ≤ δ and n ≤ 20
0   if γ > δ or n > 20
where Wi and Wj are the spectral pitch weights of the two components entering the match (the bracketed product being the geometric mean referred to above).
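As a check on these expressions, take the default example: fi = 440 Hz with m = 4 gives the candidate 110 Hz. For fj = 550 Hz, n = Int(550/110) = 5 and 5·110 = 550 exactly, so γ = 0; for fj = 660 Hz, n = 6 and 6·110 = 660, again γ = 0. In both cases the factor (1 - γ/δ) takes its maximum value 1, so the 110 Hz candidate (A2) collects the full coincidence weight from both of the other input tones, which is why it appears so prominently as a virtual pitch in the output table above.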
Note that we do not use the shifted frequencies to do the subharmonic
matching! The shifts are only accounted for later in forming the Combined
Pitch Pattern.
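The following is a simplified Java sketch of the loop and the expressions above. It is not Terhardt's actual subCoincidence() routine: the contents of the coincidence coefficient are only approximated here (geometric mean of the two spectral weights times the (1 - γ/δ) factor), the constants M = 12, n ≤ 20, and δ = 0.08 are taken from the description above, and everything else (masking, pitch shifts, sorting into the VPP array) is omitted.

// A simplified, hypothetical illustration of the subharmonic-coincidence
// loop outlined above. NOT Terhardt's subCoincidence() routine.
public class SubCoincidenceSketch {

    static final int M_MAX = 12;       // number of subharmonics (virtual pitch candidates)
    static final int N_MAX = 20;       // highest harmonic considered in a match
    static final double DELTA = 0.08;  // allowed degree of inharmonicity (8%)

    /**
     * For each input frequency f[i] and each subharmonic f[i]/m, sum the
     * coincidence coefficients Cij over the other input frequencies f[j].
     * w[i] is the spectral pitch weight attached to f[i].
     * Returns vpw[i][m-1] = total weight of the candidate f[i]/m.
     */
    static double[][] virtualPitchWeights(double[] f, double[] w) {
        int R = f.length;                         // R input frequencies, as in the text
        double[][] vpw = new double[R][M_MAX];
        for (int i = 0; i < R; i++) {
            for (int m = 1; m <= M_MAX; m++) {
                double candidate = f[i] / m;      // subharmonic candidate
                double total = 0.0;
                for (int j = 0; j < R; j++) {
                    if (j == i) continue;
                    int n = (int) (f[j] / candidate);  // Int[ fj / (1/m·fi) ]
                    if (n < 1 || n > N_MAX) continue;
                    double gamma = Math.abs(n * candidate - f[j]) / f[j];
                    if (gamma < DELTA) {
                        // coincidence coefficient: geometric mean of the two
                        // spectral weights, scaled by the closeness of the match
                        double cij = Math.sqrt(w[i] * w[j]) * (1.0 - gamma / DELTA);
                        total += cij;
                    }
                }
                vpw[i][m - 1] = total;
            }
        }
        return vpw;
    }

    public static void main(String[] args) {
        double[] f = {440.0, 550.0, 660.0};   // the default A major example
        double[] w = {1.0, 1.0, 1.0};         // placeholder spectral weights
        double[][] vpw = virtualPitchWeights(f, w);
        // vpw[0][3] is the weight of the 110 Hz candidate (440/4)
        System.out.println("weight of 110 Hz candidate: " + vpw[0][3]);
    }
}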
The Rameau root is a theoretical construction; this pitch is not necessarily
perceived by listeners. It is the greatest common divisor of all the
input frequencies, with a little fudge factor built in.
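As an illustration of the idea (not necessarily how the applet computes it), a "greatest common divisor with a fudge factor" can be found by trying successive subdivisions of the lowest input frequency; the 3% tolerance below is an assumed value.

// A small sketch of one way to compute a GCD-with-tolerance of the input
// frequencies; an illustration only, not necessarily the applet's method.
public class RameauRootSketch {

    static final double TOLERANCE = 0.03;  // assumed fudge factor (3%)

    static double approximateGcd(double[] freqs) {
        double min = Double.MAX_VALUE;
        for (double f : freqs) min = Math.min(min, f);
        // Try candidates min/1, min/2, min/3, ... and take the first that
        // (approximately) divides all the inputs.
        for (int k = 1; k <= 100; k++) {
            double g = min / k;
            boolean dividesAll = true;
            for (double f : freqs) {
                double ratio = f / g;
                double nearest = Math.round(ratio);
                if (Math.abs(ratio - nearest) / nearest > TOLERANCE) {
                    dividesAll = false;
                    break;
                }
            }
            if (dividesAll) return g;
        }
        return 0.0; // no common "root" found within 100 subdivisions
    }

    public static void main(String[] args) {
        // For 440, 550, 660 Hz this returns 110 Hz (A2).
        System.out.println(approximateGcd(new double[]{440.0, 550.0, 660.0}));
    }
}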
Things I have added that were not in Terhardt's original C code:
- Of course, the graphical interface
- Rameau root
- I disabled the cutoff of virtual pitches at 500 Hz in the subroutine
truVP.
Ideas for further work:
- Mathematically, view the virtual pitch algorithm as a map
(Spectra) → (Spectra).
We can ask: Does this map have any fixed points? Can we invert the map to find a
spectrum that maps to a single tone?
- Understand the meaning of the pitch weights. In [Terhardt, Stoll, Seewann 1982b]
p.687, various possible interpretations are given as "pitch strength",
"salience", and "probability of perceiving a particular pitch".
- Express the various pitch weights as percentages of their total, rather than
just a raw number.
References
- [Parncutt *]
Prof. Richard Parncutt's web site at University of Graz, Austria
http://www-gewi.uni-graz.at/staff/parncutt/ In particular, look under
the section Computer programs for source code and documentation.
- [Terhardt 1974]
Ernst Terhardt.
Pitch, consonance, and harmony.
Journal of the Acoustical Society of America 55(5), 1974, p.1061-1069.
- [Terhardt 1979]
Ernst Terhardt.
Calculating Virtual Pitch.
Hearing Research 1, 1979, p.155-182.
- [Terhardt, Stoll, Seewann 1982a]
Ernst Terhardt, Gerhard Stoll, and Manfred Seewann.
Pitch of complex signals according to virtual-pitch theory:
Tests, examples, and predictions.
Journal of the Acoustical Society of America 71(3), March 1982, p.671-678.
- [Terhardt, Stoll, Seewann 1982b]
Ernst Terhardt, Gerhard Stoll, and Manfred Seewann.
Algorithm for extraction of pitch and pitch salience from complex tonal signals.
Journal of the Acoustical Society of America 71(3), March 1982, p.679-687.
- [Terhardt *]
Ernst Terhardt's web site at the Technische Universität München:
www.mmk.ei.tum.de/persons/ter.html
Contact me
Send me email:
jjensen14@hotmail.com Advisory: messages
with keywords typical of spam in the subject line (including "!" as in "Get out of
debt!") get automatically discarded before I see them.