A Java Applet to Calculate Virtual Pitch
Jeff Jensen October 2008
- Introduction: What is virtual pitch?
- How to use the applet
- Applet and source code
- Examples of output and some quirky behavior
- Technical discussion of how the algorithm works
- Ideas for further work
- A Crash Course in Acoustics (work in progress)
- References
- Contact me
Introduction: What is virtual pitch?
Virtual Pitch is a concept originated by Prof. Ernst Terhardt of the
Technische Universität München in 1969-1970.
Basically it is an extension of the fundamental bass
of Rameau and the residue pitch of Schouten. In very basic terms, it
deals with how a collection of distinct pitches (called a complex tone) is
perceived by the human auditory system to fuse together into a "single" tone. The
simplest example would be plucking a single guitar string, say the top string
tuned to e, with frequency ν. It has harmonics
2ν, 3ν, 4ν, ..., and these are all distinct and audible
pitches, but we perceive the sound as a single pitch e. Now if the
string is not perfectly flexible (think of a steel piano string), then when it vibrates
in segments, the frequencies are not exactly integer multiples of the fundamental
ν; instead they may be 2.0001ν, 3.00006ν, 4.00002ν, ...
(and now they would better be termed partials rather than harmonics). But
we still tend to perceive a single pitch e, provided the partials are not
too far off their ideal values.
Now consider several different guitar strings played at the same time.
There is still a tendency for the human auditory system to try to fuse
these sounds together, if possible (this is a well-established experimental fact).
Pluck the chord
c4-e4-g4 (the
subscripts are the standard numbering for what octave the notes are in; the
4th octave is from middle c up to b). People will "hear" notes that are
not actually present, such as c3 and c2.
These tones are virtual pitches in Terhardt's terminology; by
contrast a tone that actually is present as a vibration in the air, like
c4, is called a spectral pitch.
There is a little more to the definition than that. Following the discussion on
Terhardt's web site ([Terhardt *] in the references), a spectral pitch is
directly the effect of sound waves in the air exciting a place in the cochlea of
the ear. Virtual pitch is a phenomenon of processing by the brain. Thus in
the example above with the note e and waves in the air of frequencies
ν, 2ν, 3ν, 4ν, ..., the ear picks up a
spectral pitch of ν, and at the same time the brain processes the
upper harmonic sequence 2ν, 3ν, 4ν, ... and
detects a virtual pitch of ν!
The perception of virtual pitch is a bit delicate, however.
There is some variability from listener to listener.
It depends on the loudness (sound pressure level, SPL) of the sounds,
and the mixture of the partials. A complex of sounds almost never fuses into a
single pitch; instead we get a spectrum, and which pitch from the spectrum
is most prominent can depend on the musical context. This is still an active area
of research in the psychological acoustics community.
How to use the applet
This is a conversion of Terhardt's original C code into Java,
with a graphical user interface,
and an additional level of processing to allow the user to input note names or
just intonation fractions, rather than raw frequencies. The algorithm itself is
completely independent of any tuning or temperament; it works internally
only with frequencies and loudness values. For musical convenience, the interface
of this applet lets you input note names in 12-equal temperament and get the
results back in terms of 12-equal temperament note names, if you like.
- [Frequencies box]
The text box functions like a command line.
There are several formats that you can use to input the base frequencies:
- You can input frequencies (in Hz) as numbers
by starting with the letter n as the tag that identifies what type of
input follows:
n 440 554 660
- Or, specifically for musical applications, you can input the letter
names of notes on the piano keyboard, by starting with the letter l
l A4 C#5 E5
(note that you need to specify the octave number. Middle C is denoted C4, and
everything from middle C up to the B above it then has a "4" suffix.)
Note also that the note name letters must be capitalized. The symbol for
"flat" is a lowercase b.
- You can also input Just Intonation fractions.
First you have to set the base frequency. To express the same A major chord
as in the previous two examples, we need a slightly more complicated identifying
tag: the number 1 (not a lowercase "L"!), which must be set equal to a frequency:
1=440 1 5/4 3/2
Don't put any spaces in the segment "1=440". This part says that the number
"1" corresponds to 440 Hz; then 5/4 corresponds to 550 Hz, and so on. Note that
"1" does not have to be in the chord: C4 has frequency 264 Hz, and we can express
E minor with respect to this as:
1=264 5/4 3/2 15/8
(A small sketch of how these input lines might be parsed appears after this list of controls.)
- Warning: At present you can only enter 18 or so distinct values in
this box.
- [SPL box]
Here we input the loudness (in decibels) of the frequencies listed above.
For convenience, you don't have to list them all. For example, if you have 5
frequencies in the upper box, you would normally list, in order, 5 loudness values
here. But if you enter only 3 (or 1), all the remaining ones are set to the last
value given.
- [masking]
As an experimental feature, you can turn masking off. It is on in Terhardt's original algorithm.
- [upper harmonics]
This is purely a convenience feature. To save you having to type in upper harmonics
in the Freq box, this will automatically add them on. Real musical instruments
produce tones with upper harmonics.
- [pitch shifts]
Probably you will want to leave this turned off, for music theoretic purposes. It is
a psychoacoustical fact that people perceive a slight shift in pitch, depending on
the loudness of the tone, the shape of the amplitude spectrum, and the presence of
other tones which may produce masking. Turning this option on tends to produce results that
diverge from traditional music theory, and thus may not really be applicable in the context of a concert,
for example.
- [weight]
The weight values can be expressed as their raw numbers, or expressed as a percentage of
the sum of all such values.
- [identify to pitch classes]
By default, we calculate the weights of all the distinct pitches in all octaves. However, it
may be desirable to group all the notes "E" together, for example, without regard to octave.
This feature only works for named notes, like E or C#; a pitch that is in between the standard ones
is still expressed as something like "??(3)".
- [Temperament for output]
This refers to how note letter names are assigned to the output frequencies.
- [Calculate button]
When you have made all your input settings, click on this. The applet automatically
resets itself, so you can keep on changing the inputs and keep on calculating.
Sometimes the applet image gets distorted after scrolling; if this happens, just
click on your browser's refresh button.
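To make the input formats above concrete, here is a minimal sketch in Java (the applet's language). It is not the applet's actual parser; the class and method names are hypothetical, and note names are interpreted in 12-equal temperament with A4 = 440 Hz. It also illustrates the SPL-box rule of repeating the last loudness value.

import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class FreqInputSketch {

    // Semitone names within an octave, C = 0 ... B = 11
    static final List<String> NAMES =
            Arrays.asList("C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B");

    // "n 440 554 660"   -> raw frequencies in Hz
    // "l A4 C#5 E5"     -> note names (flat written as lowercase b, e.g. Eb4)
    // "1=440 1 5/4 3/2" -> Just Intonation fractions relative to a base frequency
    static List<Double> parse(String line) {
        String[] tok = line.trim().split("\\s+");
        List<Double> freqs = new ArrayList<>();
        if (tok[0].equals("n")) {
            for (int i = 1; i < tok.length; i++) freqs.add(Double.parseDouble(tok[i]));
        } else if (tok[0].equals("l")) {
            for (int i = 1; i < tok.length; i++) freqs.add(noteToHz(tok[i]));
        } else if (tok[0].startsWith("1=")) {
            double base = Double.parseDouble(tok[0].substring(2));
            for (int i = 1; i < tok.length; i++) {
                String[] frac = tok[i].split("/");
                double ratio = (frac.length == 2)
                        ? Double.parseDouble(frac[0]) / Double.parseDouble(frac[1])
                        : Double.parseDouble(frac[0]);
                freqs.add(base * ratio);
            }
        }
        return freqs;
    }

    // "C#5" or "Eb4" -> frequency in 12-equal temperament, A4 = 440 Hz
    static double noteToHz(String name) {
        int pos = 1;                                   // where the octave digits start
        int semitone = NAMES.indexOf(name.substring(0, 1));
        if (name.charAt(1) == '#') { semitone += 1; pos = 2; }
        else if (name.charAt(1) == 'b') { semitone -= 1; pos = 2; }
        int octave = Integer.parseInt(name.substring(pos));
        int midi = (octave + 1) * 12 + semitone;       // MIDI note number, A4 = 69
        return 440.0 * Math.pow(2.0, (midi - 69) / 12.0);
    }

    // SPL box rule: if fewer SPL values than frequencies, repeat the last one
    static double[] padSpl(double[] spl, int count) {
        double[] out = new double[count];
        for (int k = 0; k < count; k++) out[k] = spl[Math.min(k, spl.length - 1)];
        return out;
    }

    public static void main(String[] args) {
        System.out.println(parse("l A4 C#5 E5"));     // approx. [440.0, 554.37, 659.26]
        System.out.println(parse("1=440 1 5/4 3/2")); // [440.0, 550.0, 660.0]
    }
}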
Applet and source code
Examples and explanation of output
Let's run with the default input and see what happens:
Input settings
Box name | Value | Remarks
Freqs (Hz): | n 440 550 660 |
SPLs (dB): | 70 | equivalent to 70 70 70
masking: | on | active if box is checked
upper harmonics: | No upper harmonics |
pitch shifts: | no pitch shifts |
weight: | % | the weight values are expressed as a percentage of the sum of all of them
pitch classes: | Not pitch classes | this feature adds together the weights of all the notes that have names
output reference: | 12-equal temperament | the standard
Output of algorithm
Note name | Weight | Freq | Pitch type | Remarks
A4 | 0.395 | 440.0 Hz | | The pitch with the highest weight value is A in the 4th octave with frequency 440 Hz. The psychoacoustical meaning of a raw weight value is unclear, as Terhardt points out in his paper (referenced below).
A2 | 0.242 | 110.0 Hz | Virtual | This pitch is not present in the original sounds.
A3 | 0.223 | 220.0 Hz | Virtual |
??(5) | 0.216 | 550.0 Hz | | The ?? is because the frequency 550 Hz is not a close enough match to the equal temperament value of C#5 = 554.37 Hz.
E5 | 0.188 | 660.0 Hz | |
A1 | 0.121 | 55.0 Hz | Virtual |
What I call the Rameau root is the greatest common factor of all
the input frequencies (with a small margin for error built in).
Things to Puzzle over:
- n 300 350 400 450 500 produces only one output: 300
- l C#4 D4 D#4 gives no output.
- l C4 C4 C4 should give root C4! [Perhaps it is just that the algorithm intends
all the same pitches to be collected together at the outset.]
- Adding upper harmonics sometimes turns spectral pitches into virtual ones. But this
is not surprising because, as was discussed in the introduction, both types of
pitch perception are active at the same time, and one will produce a stronger
signal than the other.
The solution to these anomalies may be to be able to turn off masking and to
be able to adjust the weighting of the various parameters in Terhardt's algorithm
(described below).
Note also that we do not get exactly the same results as Terhardt published in
1982 for this same triad A4 - C#5 - E5:
Output of algorithm from [Terhardt, Stoll, Seewann 1982a] (Table 1 p.677)
Note name | Weight | Freq | Pitch type
A4 | 1.41 | 440 Hz | Virtual
A3 | 1.09 | 220 Hz | Virtual
A2 | 0.59 | 110 Hz | Virtual
D3 | 0.52 | 293.3 Hz | Virtual
E6 | 0.35 | 1320 Hz | Spectral
F2 | 0.28 | 87.3 Hz | Virtual
The explanation for this is probably the unknown mix of partials; Terhardt
recorded the
chord being played on a piano and analyzed that, but didn't publish the spectra,
except to say the sound was 70 dB. (I might also remark here that the mix of partials
in a piano tone is quite complicated and constantly changing as the tones are sustained).
It is also possible that Terhardt's algorithm changed slightly from 1982 to 1994, which
is the date of the code I got.
Description of the algorithm
The raw C language source code is available from Prof. Richard Parncutt's website:
http://www-gewi.uni-graz.at/staff/parncutt/ptp2svpCode.html
(see also the references at the end).
Here, Terhardt's original code is reincarnated in the Java classes
PartTonePattern, SpectralPitchPattern, VirtualPitchPattern, and CombinedPitchPattern.
The algorithm takes a set of R input frequencies, in Hz
{ f0, f1, ..., fR-1 }
and R sound pressure level (SPL) values, in dB.
Once input, these are sorted and put into the Part Tone Pattern object.
The Spectral Pitch Pattern creates a set of weights for each frequency
{ w0, w1, ..., wR-1 }.
It does masking and computation of pitch shifts, based on experimentally
measured acoustical parameters.
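As a rough picture of the data being passed around (a hypothetical class, not the applet's actual PartTonePattern or SpectralPitchPattern), the input can be thought of as an array of frequency/SPL pairs kept sorted by frequency, each of which later receives a spectral pitch weight:

import java.util.Arrays;
import java.util.Comparator;

// Hypothetical sketch of the data the algorithm works on: an array of
// part tones (frequency + SPL), sorted by frequency, each of which later
// receives a spectral pitch weight. Not the applet's actual classes.
class PartTone {
    double freqHz;   // frequency in Hz
    double splDb;    // sound pressure level in dB
    double weight;   // spectral pitch weight, filled in later by the analysis

    PartTone(double freqHz, double splDb) {
        this.freqHz = freqHz;
        this.splDb = splDb;
    }

    static PartTone[] build(double[] freqs, double[] spls) {
        PartTone[] tones = new PartTone[freqs.length];
        for (int i = 0; i < freqs.length; i++) {
            tones[i] = new PartTone(freqs[i], spls[i]);
        }
        Arrays.sort(tones, Comparator.comparingDouble((PartTone t) -> t.freqHz));
        return tones;
    }
}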
The Virtual Pitch Pattern is where the virtual pitches are calculated.
Here are the criteria that Terhardt says he uses to give a numerical weight
to the virtual pitch candidates:
- The number of relevant spectral components which provide the
same (or nearly the same) virtual pitch. The weight should increase with the
number of components. This doesn't seem to be present in the code, however!
[See the function sortIntoVP( )].
- The spectral pitch weights of the relevant components: components with higher
spectral weight should contribute a higher virtual weight to the candidate.
This is accounted for in the formula for Cij.
- The weight should decrease as the subharmonic number gets higher (and
thus the subharmonic more distant from the actual frequencies present).
This is accounted for in the formula for Cij.
- The weight should increase with the accuracy of the subharmonic
coincidences, attaining a maximum value for perfect matching.
This is accounted for in the formula for Cij, specifically in
the factor (1 - γ/δ).
One idea for a future enhancement would be to allow the user to alter the amount of
importance given to each of these items.
The heart of the algorithm is the function subCoincidence():
- Loop i over the input frequencies { f0, f1, ..., fR-1 }
- Get fi
- Loop over the subharmonics 1/m·fi for m = 1, ..., M = 12 (the number of virtual pitch candidates we allow)
- Loop j over all the other input frequencies fj, j ≠ i
- We seek the nth harmonic of 1/m·fi that most closely matches fj. [See the expression for n below this list]
- Does n·( 1/m·fi ) ≈ fj ? [See the expression for γ below]
- If yes, compute the weight (coincidence coefficient Cij) [formula below]. If no, set this Cij = 0.
- Sum these Cij to get the total weight Wi,m (called vpw in the code).
- If the virtual pitch candidate 1/m·fi is a good enough match to the other fj's to get a big weight, put it into the VPP array (sorted by descending weights).
- Next i
[Note that we only care about how the harmonics of the sub-harmonics of
fi match the fundamentals fj; we don't try
to match upper harmonics of a given fj ].
Since formatting equations in HTML is problematic at best, here are some of the
expressions for the above quantities written separately from the list:
n = Int[ fj / ( 1/m·fi ) ]
The expression for γ, what Terhardt calls the degree of inharmonicity,
measures the amount by which 1/m·fi misses being a subharmonic of fj.
We want
| n·( 1/m·fi ) - fj | < δ·fj   [ 8% of fj ],
and γ is this mismatch expressed as a fraction of fj:
γi,j,m = γ = | n·( 1/m·fi ) - fj | / fj
We require γ < δ = 0.08 for the 2 frequencies to be considered a
"match". In this case, we compute Cij, which Terhardt calls the
Coincidence coefficient. (It is a geometrical mean of the weights, but I don't
fully understand its justification):
C( 1/m·fi , fj ) = Cij =
[ Wi·Wj ]½ · (1 - γ/δ)   if γ ≤ δ and n ≤ 20
0   if γ > δ or n > 20
where Wi and Wj are the spectral pitch weights of the two components entering the match (the bracketed product being the geometric mean referred to above).
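As a check on these expressions, take the default example: fi = 440 Hz with m = 4 gives the candidate 110 Hz. For fj = 550 Hz, n = Int(550/110) = 5 and 5·110 = 550 exactly, so γ = 0; for fj = 660 Hz, n = 6 and 6·110 = 660, again γ = 0. In both cases the factor (1 - γ/δ) takes its maximum value 1, so the 110 Hz candidate (A2) collects the full coincidence weight from both of the other input tones, which is why it appears so prominently as a virtual pitch in the output table above.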
Note that we do not use the shifted frequencies to do the subharmonic
matching! The shifts are only accounted for later in forming the Combined
Pitch Pattern.
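The following is a simplified Java sketch of the loop and the expressions above. It is not Terhardt's actual subCoincidence() routine: the contents of the coincidence coefficient are only approximated here (geometric mean of the two spectral weights times the (1 - γ/δ) factor), the constants M = 12, n ≤ 20, and δ = 0.08 are taken from the description above, and everything else (masking, pitch shifts, sorting into the VPP array) is omitted.

// A simplified, hypothetical illustration of the subharmonic-coincidence
// loop outlined above. NOT Terhardt's subCoincidence() routine.
public class SubCoincidenceSketch {

    static final int M_MAX = 12;       // number of subharmonics (virtual pitch candidates)
    static final int N_MAX = 20;       // highest harmonic considered in a match
    static final double DELTA = 0.08;  // allowed degree of inharmonicity (8%)

    /**
     * For each input frequency f[i] and each subharmonic f[i]/m, sum the
     * coincidence coefficients Cij over the other input frequencies f[j].
     * w[i] is the spectral pitch weight attached to f[i].
     * Returns vpw[i][m-1] = total weight of the candidate f[i]/m.
     */
    static double[][] virtualPitchWeights(double[] f, double[] w) {
        int R = f.length;                         // R input frequencies, as in the text
        double[][] vpw = new double[R][M_MAX];
        for (int i = 0; i < R; i++) {
            for (int m = 1; m <= M_MAX; m++) {
                double candidate = f[i] / m;      // subharmonic candidate
                double total = 0.0;
                for (int j = 0; j < R; j++) {
                    if (j == i) continue;
                    int n = (int) (f[j] / candidate);  // Int[ fj / (1/m·fi) ]
                    if (n < 1 || n > N_MAX) continue;
                    double gamma = Math.abs(n * candidate - f[j]) / f[j];
                    if (gamma < DELTA) {
                        // coincidence coefficient: geometric mean of the two
                        // spectral weights, scaled by the closeness of the match
                        double cij = Math.sqrt(w[i] * w[j]) * (1.0 - gamma / DELTA);
                        total += cij;
                    }
                }
                vpw[i][m - 1] = total;
            }
        }
        return vpw;
    }

    public static void main(String[] args) {
        double[] f = {440.0, 550.0, 660.0};   // the default A major example
        double[] w = {1.0, 1.0, 1.0};         // placeholder spectral weights
        double[][] vpw = virtualPitchWeights(f, w);
        // vpw[0][3] is the weight of the 110 Hz candidate (440/4)
        System.out.println("weight of 110 Hz candidate: " + vpw[0][3]);
    }
}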
The Rameau root is a theoretical construction; this pitch is not necessarily
perceived by listeners. It is the greatest common divisor of all the
input frequencies, with a little fudge factor built in.
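As an illustration of the idea (not necessarily how the applet computes it), a "greatest common divisor with a fudge factor" can be found by trying successive subdivisions of the lowest input frequency; the 3% tolerance below is an assumed value.

// A small sketch of one way to compute a GCD-with-tolerance of the input
// frequencies; an illustration only, not necessarily the applet's method.
public class RameauRootSketch {

    static final double TOLERANCE = 0.03;  // assumed fudge factor (3%)

    static double approximateGcd(double[] freqs) {
        double min = Double.MAX_VALUE;
        for (double f : freqs) min = Math.min(min, f);
        // Try candidates min/1, min/2, min/3, ... and take the first that
        // (approximately) divides all the inputs.
        for (int k = 1; k <= 100; k++) {
            double g = min / k;
            boolean dividesAll = true;
            for (double f : freqs) {
                double ratio = f / g;
                double nearest = Math.round(ratio);
                if (Math.abs(ratio - nearest) / nearest > TOLERANCE) {
                    dividesAll = false;
                    break;
                }
            }
            if (dividesAll) return g;
        }
        return 0.0; // no common "root" found within 100 subdivisions
    }

    public static void main(String[] args) {
        // For 440, 550, 660 Hz this returns 110 Hz (A2).
        System.out.println(approximateGcd(new double[]{440.0, 550.0, 660.0}));
    }
}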
Things I have added that were not in Terhardt's original C code:
- Of course, the graphical interface
- Rameau root
- I disabled the cutoff of virtual pitches at 500 Hz in the subroutine
truVP.
Ideas for further work:
- Mathematically, view the virtual pitch algorithm as a map
(Spectra) → (Spectra).
We can ask: Does this map have any fixed points? Can we invert the map to find a
spectrum that maps to a single tone?
- Understand the meaning of the pitch weights. In [Terhardt, Stoll, Seewann 1982b]
p.687, various possible interpretations are given as "pitch strength",
"salience", and "probability of perceiving a particular pitch".
- Express the various pitch weights as percentages of their total, rather than
just a raw number.
References
- [Parncutt *]
Prof. Richard Parncutt's web site at University of Graz, Austria
http://www-gewi.uni-graz.at/staff/parncutt/ In particular, look under
the section Computer programs for source code and documentation.
- [Terhardt 1974]
Ernst Terhardt.
Pitch, consonance, and harmony.
Journal of the Acoustical Society of America 55(5), 1974, p.1061-1069.
- [Terhardt 1979]
Ernst Terhardt.
Calculating Virtual Pitch.
Hearing Research 1, 1979, p.155-182.
- [Terhardt, Stoll, Seewann 1982a]
Ernst Terhardt, Gerhard Stoll, and Manfred Seewann.
Pitch of complex signals according to virtual-pitch theory:
Tests, examples, and predictions.
Journal of the Acoustical Society of America 71(3), March 1982, p.671-678.
- [Terhardt, Stoll, Seewann 1982b]
Ernst Terhardt, Gerhard Stoll, and Manfred Seewann.
Algorithm for extraction of pitch and pitch salience from complex tonal signals.
Journal of the Acoustical Society of America 71(3), March 1982, p.679-687.
- [Terhardt *]
Ernst Terhardt's web site at the Technische Universität München:
www.mmk.ei.tum.de/persons/ter.html
Contact me
Send me email:
jjensen14@hotmail.com Advisory: messages
with keywords typical of spam in the subject line (including "!" as in "Get out of
debt!") get automatically discarded before I see them.