The Francinstien Revisited

- further thoughts on the Francinstien as a head-shadow compensator

 

by Richard Brice

 

Spatial Hearing

 

Sound localisation in humans is remarkably acute. As well as being able to judge the direction of sounds within a few degrees, experimental subjects are sometimes able to estimate the distance of a sound source as well. Consider the situation shown below,

 

Figure 1 - Duplex hearing theory

 

in which an experimental subject is presented with a source of sound located at some distance from the side of the head. The two most important cues the brain uses to determine the direction of a sound are due to the physical nature of sound and its propagation through the atmosphere and around solid objects. We can make two reliable observations:

 

a) at high frequencies, the relative loudness of a sound at the two ears is different, since the nearer ear receives a louder signal than the remote ear, and

 

b) at all frequencies, there is a delay between the sound reaching the near ear and the further ear.

 

It can be demonstrated that both effects aid the nervous system in its judgement as to the location of a sound source: At high frequencies, the head casts an effective acoustic "shadow" which acts like a low-pass filter and attenuates high frequencies arriving at the far ear, thus enabling the nervous system to make use of interaural intensity differences to determine direction. At low frequencies, sound diffracts and bends around the head to reach the far ear virtually unimpeded. So, in the absence of intensity-type directional cues, the nervous system compares the relative delay of the signals at each ear. This effect is termed interaural delay difference. In the case of steady-state sounds or pure-tones, the low-frequency delay manifests itself as a phase difference between the signals arriving at either ear. The idea that sound localisation is based upon interaural time differences at low frequencies and interaural intensity differences at high frequencies has been called Duplex theory and it originates with Lord Rayleigh at the turn of the twentieth century.

 

Binaural Techniques

 

The simplest (and in some ways the best) stereo system was invented in 1881, when Monsieur Clement Ader placed two microphones about eight inches apart (the average distance between the ears) on stage at the Paris Opera where a concert was being performed. He relayed these signals over telephone lines to two telephone ear pieces at the Paris Exhibition of Electricity. The amazed listeners were able to hear, by holding one ear piece to each ear, a remarkably lifelike impression that they too were sat in the Opera audience. This was the first public demonstration of binaural stereophony, the word binaural being derived from the Latin for two ears.

 

The techniques of binaural stereophony, little different from this original, have been exploited many times in the century since the first demonstration. However, psycho-physicists and audiologists have gradually realised that considerable improvements can be made to the simple spaced microphone system by encapsulating the two microphones in a synthetic head and torso. The illusion is strengthened still more if the dummy head is provided with artificial auricles (external ears or pinnae). The binaural stereophonic illusion is improved by the addition of an artificial head and torso and external ears because it is now known that sound interacts with these structures before entering the ear canal. If, in a recording, microphones can be arranged to interact with similar features, the illusion is greatly improved in terms of realism and accuracy when the signals are relayed over headphones. This is because headphones sit right over the ears and thus do not interact with the listener's anatomy on the final playback.

 

Binaural audio is theoretically capable of recreating perfect, accurate sound-fields, apparently reproducing sounds from all directions at the ears of the listener. The system requires only two discrete recording channels, so it is efficient in engineering terms too. However, some important limitations should be noted. Firstly, experiments using mouldings of subjects' pinnae have consistently shown that subjects are far better at judging the direction of sounds when utilising their "own" pinnae than when listening with another person's external ear mouldings. It seems that via childhood experience we learn to listen with our "own ears". This would not be such a depressing limitation were it not for the fact that every person's pinnae are as unique as their fingerprints. Secondly, and most damning of all, there appears to be a very real commercial drawback imposed by the system's method of signal presentation over headphones. Music listening is both a shared activity and a process which is shared with other activities, and headphones prevent both.

 

 

 

 

Crosstalk-cancellation

 

The desire to re-create spatial sound-fields without headphones has been recognised since the very earliest experiments with stereophony. However, if the signals from a dummy head recording are replayed over two loudspeakers placed in the conventional stereophonic listening arrangement (as shown below),

 

Figure 2 - Crosstalk signals in loudspeaker stereo

 

 

the results are very disappointing. The reason for this is the two unwanted crosstalk signals: the signal emanating from the right loudspeaker which reaches the left ear; and the signal emanating from the left loudspeaker which reaches the right ear as shown. These signals result in a failure to reproduce the correct interaural time delay cues at low frequencies. Several researchers have proposed and constructed systems in which complementary cancelling signals were fed to the speakers to cancel these crosstalk signals (Roland RSS System & Thorn EMI Sensaura). Unfortunately, to work well, the system required that the listener held one, very precise position, a situation which invalidated the convenience and companionship of loudspeaker listening.

 

Two-Loudspeaker Stereophony

 

Two loudspeaker stereophony has restricted ambitions compared with binaural stereo. When listening to music on a two-channel, loudspeaker, stereo audio system, a sound "image" is spread out in the space between the two loudspeakers. The reproduced image thus has some characteristics in common with the way the same music is heard in real-life - that is, with individual instruments or voices (known as phantom images) each occupying, to a greater or lesser extent, a particular and distinct position in space. However, insofar as this process is concerned with creating and re-creating a "sound-event", it is limited in that the image is flat and occupies only the space bounded by the loudspeakers. Nevertheless, the system has proved popular and endured for fifty years as the staple presentation of audio. Only in the last ten or so years have multi-channel systems become a reality, which aim to produce artificial sound-fields which surround or "immerse" the listener.

Two different techniques are used in the production of most stereo records and CDs. The first is a system that was invented in 1931 by Alan Blumlein - a British genius working for EMI. This is by far the most commonly employed system and is based on encoding phantom image positions by means of inter-channel amplitude differences. The second system is much rarer, and is based on encoding inter-channel time differences between the stereo channels. Only Blumlein’s technique is described below.

 

Blumlein's (intensity derived) system

 

Blumlein was well aware of Duplex spatial-hearing theory and gives a good précis in his patent application (1933). He therefore expected that the high-frequency inter-aural intensity cues and low-frequency inter-aural delay cues would be formed differently.

 

Low frequency directional hearing

 

Figure 3 illustrates a real sound source auditioned in real life. The two ears of the listener are spaced distance h apart. The sound source is placed so that its direction is θ to the straight-ahead position. The sound will travel further to the right ear than to the left. If v is the velocity of sound in air, the time interval between the arrivals of the sound at the two ears will be,

 

            (h sin θ) / v

 

Because h is small compared with the distance from the source, there will be a phase difference,

 

 

 

 

 

 

Φ = (ωh sin θ) / v

 

where ω is 2π times the frequency of the sound wave.
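The two expressions above can be evaluated numerically. The following is a minimal sketch only: the function names are illustrative, the speed of sound (343 m/s) and the ear spacing (0.2 m, roughly the figure quoted elsewhere in this article) are assumed values, and ω is taken to be 2π times the frequency.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed value for air at about 20 degrees C
EAR_SPACING = 0.2       # m, assumed ear spacing h


def interaural_delay(theta_deg, h=EAR_SPACING, v=SPEED_OF_SOUND):
    """Time interval between arrivals at the two ears: (h sin theta) / v."""
    return h * math.sin(math.radians(theta_deg)) / v


def interaural_phase(theta_deg, freq_hz, h=EAR_SPACING, v=SPEED_OF_SOUND):
    """Phase difference (omega h sin theta) / v in radians, omega = 2 pi f."""
    omega = 2.0 * math.pi * freq_hz
    return omega * interaural_delay(theta_deg, h, v)


if __name__ == "__main__":
    for angle in (0, 15, 30, 45, 90):
        delay_us = interaural_delay(angle) * 1e6
        phase_deg = math.degrees(interaural_phase(angle, 200.0))
        print(f"theta={angle:2d} deg  delay={delay_us:6.1f} us  "
              f"phase at 200 Hz={phase_deg:5.1f} deg")
```

Because the phase is simply the delay multiplied by ω, doubling the frequency doubles the phase difference for the same source direction.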

 

If a recording and reproduction system can be designed which exactly recreates, by means of the correct sound pressures at the ears of the listener, the original time differences of arrival, the listener will experience a virtual sound source at angle θ.

 

 

Figure 4 – phase differences due to inter-channel intensity ratios

 

It may be demonstrated, by geometrical reasoning, that the phase difference at the ears of a listener seated in relation to loudspeakers disposed as in figure 4 is,

 

            Φ δ = [(L - R) / (L + R)] . [(ω h sin ψ) / v ] ................................ (A)

 

This demonstrates that any given phase shift may be derived at the listener's ears by means of the appropriate ratio of in-phase signals (L and R) fed to the loudspeakers[1].
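Equation (A) can be sketched in a few lines of code. This is an illustration under stated assumptions rather than anything taken from the original papers: the function names are hypothetical, h = 0.2 m and v = 343 m/s are assumed values, and ψ is read here as the angle of each loudspeaker from the median plane. The second function divides the phase by ω to give the equivalent time delay mentioned in footnote [1].

```python
import math


def equivalent_delay(left, right, psi_deg, h=0.2, v=343.0):
    """Equation (A) divided by omega: a frequency-independent delay in seconds.

    left and right are the in-phase loudspeaker signal amplitudes (L and R).
    h (ear spacing, m) and v (speed of sound, m/s) are assumed values.
    """
    ratio = (left - right) / (left + right)
    return ratio * h * math.sin(math.radians(psi_deg)) / v


def ear_phase_from_gains(left, right, freq_hz, psi_deg, h=0.2, v=343.0):
    """Equation (A): phase difference at the ears in radians."""
    omega = 2.0 * math.pi * freq_hz
    return omega * equivalent_delay(left, right, psi_deg, h, v)


if __name__ == "__main__":
    # Equal drive gives no phase difference; left-only drive gives the
    # same delay as a real source in the direction of the left loudspeaker.
    for left, right in ((1.0, 1.0), (1.0, 0.5), (1.0, 0.0)):
        delay_us = equivalent_delay(left, right, 30.0) * 1e6
        print(f"L={left} R={right} delay={delay_us:.1f} us")
```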

 

But how might the appropriate information be captured to create the appropriate inter-channel amplitude ratio? One answer is to “steer” sound sources into a particular position using a ratio-metric potentiometer designed progressively to attenuate one channel whilst progressively strengthening the other as the knob is rotated; the input being shared equally between both channels when the knob is in its centre (12 o'clock) position. Such a control is referred to as a panoramic potentiometer or pan-pot for short.
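As a rough illustration of such a control, the sketch below uses a sine/cosine gain law. That law is an assumption chosen for the example (real pan-pots follow various laws), and the function names and the -1 to +1 position scale are hypothetical. It shows the input shared equally at the centre position and the inter-channel ratio (L - R) / (L + R) changing smoothly as the control is turned.

```python
import math


def pan_gains(position):
    """Return (left, right) gains for a pan position.

    position runs from -1.0 (hard left) through 0.0 (centre)
    to +1.0 (hard right). A sine/cosine law is assumed.
    """
    if not -1.0 <= position <= 1.0:
        raise ValueError("position must lie between -1.0 and +1.0")
    angle = (position + 1.0) * math.pi / 4.0  # sweeps 0 .. pi/2
    return math.cos(angle), math.sin(angle)


def intensity_ratio(left, right):
    """Inter-channel ratio (L - R) / (L + R) used in equation (A)."""
    return (left - right) / (left + right)


if __name__ == "__main__":
    for pos in (-1.0, -0.5, 0.0, 0.5, 1.0):
        left, right = pan_gains(pos)
        print(f"pos={pos:+.1f}  L={left:.3f}  R={right:.3f}  "
              f"ratio={intensity_ratio(left, right):+.3f}")
```

With this particular law the ratio works out as the tangent of an angle that varies linearly with knob position, which is the same form as the crossed figure-of-eight result derived later in this article.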

 

 

This technique is the norm for the vast majority of stereo recordings, in which each instrumentalist or vocalist is close-miked and the result is a "mix" of all the instruments combined together electrically inside the audio mixer. Using this technique, it is one of the duties of the sound engineer performing the stereo mix to construct a stereo picture or image. This is done with a combination of artificial reverberation and of "steering" the apparent position of each instrumentalist within the stereo picture by means of the pan-pot. All pan-pots encode stereo information by inter-channel intensity differences alone; they can therefore be regarded as a pure version of Blumlein's intensity-derived stereo system.

 

For the capturing of real sound-fields something more subtle is required. Clark et al. (1958), describing the commercialisation of Blumlein's EMI stereo system thirty years after Blumlein's original patent was written, show how the sound-field may be sampled so as to recreate the appropriate phase-shifts at the listener's ears. Clark and his team opted for a coincident stereo microphone technique based on crossed figure-of-eight (velocity) microphones.

 

Given that the output of a velocity microphone follows a cosine law as shown below,

 

 

the microphone voltages derived from a horizontal crossed pair, placed together, angled 90 degrees apart and arranged so that each microphone has its maximum response at 45 degrees to the median plane, will be,

 

                        EL = k. sin ( 45° + θt )

                        ER = k. sin ( 45° - θt )

 

where θt is the true angle of the recorded sound source from the median plane.

 

From which it may be derived that,

 

                        ( EL - ER ) / ( EL + ER ) = tan θt ............................ (B)
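The intermediate steps are not given in the text. One way to arrive at equation (B), using only the standard sum-to-product identities, is sketched below.

```latex
\begin{align*}
E_L - E_R &= k\left[\sin(45^\circ + \theta_t) - \sin(45^\circ - \theta_t)\right]
           = 2k\cos 45^\circ \,\sin\theta_t \\
E_L + E_R &= k\left[\sin(45^\circ + \theta_t) + \sin(45^\circ - \theta_t)\right]
           = 2k\sin 45^\circ \,\cos\theta_t \\
\frac{E_L - E_R}{E_L + E_R} &= \frac{\cos 45^\circ}{\sin 45^\circ}\,
           \frac{\sin\theta_t}{\cos\theta_t} = \tan\theta_t
\end{align*}
```

The sensitivity constant k cancels, and the last step uses the fact that sin 45° and cos 45° are equal.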

 

Encode - decode

 

Equations A and B may be combined together to produce a simple expression for the reproduction of an entire encode-decode chain. It is,

 

 

                        sin θa = tan θt . sin ψ     ............................................ (C)

 

 

 

where θa is the apparent angle of reproduced sound.
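Equation (C) is easy to explore numerically. The sketch below is illustrative only: the function name is hypothetical and ψ is again read as the angle of each loudspeaker from the median plane (30 degrees for loudspeakers subtending 60 degrees at the listener).

```python
import math


def apparent_angle(true_angle_deg, psi_deg):
    """Solve equation (C), sin(theta_a) = tan(theta_t) * sin(psi), for theta_a.

    Angles are in degrees. Raises ValueError when the right-hand side
    exceeds 1 in magnitude, where equation (C) has no real solution.
    """
    s = math.tan(math.radians(true_angle_deg)) * math.sin(math.radians(psi_deg))
    if abs(s) > 1.0:
        raise ValueError("no real apparent angle for these inputs")
    return math.degrees(math.asin(s))


if __name__ == "__main__":
    for psi in (30.0, 45.0):
        print(f"psi = {psi} degrees")
        for true_angle in (0, 15, 30, 45):
            theta_a = apparent_angle(true_angle, psi)
            print(f"  true {true_angle:2d} deg -> apparent {theta_a:5.1f} deg")
```

For a source at the edge of the captured stage (θt = 45 degrees) tan θt is 1, so the apparent angle equals ψ itself: 30 degrees for the 60 degree loudspeaker arrangement and 45 degrees for loudspeakers at ± 45 degrees.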

 

 

Clark et al. plotted θt against θa for various values of ψ. This graph is reproduced in the figure above. The curves represent the perceived angle (y axis) versus the captured angle (x axis) for an encode system with the signals captured from perpendicular, crossed figure-of-eights and replayed over loudspeakers which subtend 60 degrees at the listening position (this represents the "classic" stereo system).

 

As you can see, the stage is cramped, in that the original, captured 90 degrees is compressed to the reproduced 60 degrees. However, the scaling is fairly linear. Interestingly, the case for ± 45 degree loudspeakers is plotted too in the figure. This illustrates that a 90 degree soundstage may be accurately produced by such a system. It is worth noting that equation C illustrates that a perfect illusion may be created (at least at frequencies below 700Hz) by the EMI system.

 

HF imaging

 

Unfortunately neither Blumlein himself, nor the post-war team of Clark, Dutton and Vanderlyn, was able to offer such a straightforward theoretical analysis for HF imaging. They pointed out that at frequencies above about 700Hz (a frequency which has a wavelength approximately equal to the dimensions of the human head) a system based on phase-difference will become ambiguous. Above this watershed frequency, they assumed that amplitude differences at the ears accounted for the perception of stereo. Given that the absorption and diffraction of the head was very difficult to model (especially in the computer-less world of 1958!), any relationship had to be derived empirically. They found by experiment that the perceived angle for frequencies above 700Hz was greater than that for low frequencies for a given inter-channel ratio. Unable to posit a reason why, they chose to process the microphone signals prior to recording in order to allow for this exaggerated stereo effect at HF. They calculated, experimented and proved that if the ratio of (L - R)/(L + R) was reduced by 70% at HF, the image could be brought into line with the LF image. This they did by deriving a sum (L + R) and difference (L - R) channel and inserting a low-pass filter into the difference channel. They invented a matrix and filter circuit to accomplish this and they referred to this technique as stereo shuffling. Below is an illustration of their practical circuit and its implementation in the difference channel.
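The sum-and-difference idea can be sketched digitally. The code below is not a model of the EMI circuit: it is a hypothetical illustration in which a one-pole low-pass filter splits the difference channel into low and high bands and only the high band is scaled. The function name, the smoothing coefficient and the hf_difference_gain parameter are all assumptions made for the example, and no particular gain value is implied by the text above.

```python
def shuffle(left, right, hf_difference_gain, smoothing=0.1):
    """Reduce the difference channel at high frequencies.

    left, right: equal-length sequences of samples.
    hf_difference_gain: factor applied to the high-frequency part of
        (L - R); 1.0 leaves the signal unchanged (illustrative parameter).
    smoothing: one-pole low-pass coefficient between 0 and 1; a
        hypothetical stand-in for the crossover frequency.
    """
    out_left, out_right = [], []
    diff_low = 0.0
    for l, r in zip(left, right):
        total = l + r                              # sum channel (L + R)
        diff = l - r                               # difference channel (L - R)
        diff_low += smoothing * (diff - diff_low)  # low-pass filtered difference
        diff_high = diff - diff_low                # remaining high-frequency part
        shaped = diff_low + hf_difference_gain * diff_high
        out_left.append(0.5 * (total + shaped))    # re-matrix to left
        out_right.append(0.5 * (total - shaped))   # re-matrix to right
    return out_left, out_right


if __name__ == "__main__":
    # A rapidly alternating, fully left/right opposed signal is narrowed.
    left = [1.0, -1.0] * 8
    right = [-1.0, 1.0] * 8
    new_left, new_right = shuffle(left, right, hf_difference_gain=0.5)
    print(new_left[-1] - new_right[-1], "versus original", left[-1] - right[-1])
```

The sum channel is untouched, so a centrally placed (mono) source passes through unchanged; only the width of the high-frequency image is altered.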

 

 

 

The loss of the shuffler

 

Unfortunately, the EMI "shuffler" circuit was found to introduce distortion and tonal colouring and was eventually abandoned. And that left the stereo system "broken" - a situation which has effectively lasted until today. The important qualitative fact to appreciate is that, for a given inter-channel intensity difference, the direction of the perceived auditory event is further from a central point between the loudspeakers when a high-frequency signal is reproduced than when a low frequency is reproduced. Since music is itself a wideband signal, when two loudspeakers reproduce a stereo image from an inter-channel intensity derived stereo music signal, the high frequency components of each instrument or voice will subtend a greater angle than will the low-frequency components. We might say the image was "blurred" with respect to frequency - an effect similar to chromatic aberration in optics. Importantly, this limitation is as true of close-miked, pan-pot stereo recordings (the vast majority of the recorded catalogue) as it is of coincident-microphone stereo recordings.

 

Improving Image Sharpness by means of Inter-channel Crosstalk

 

A new, simple and non-distorting technique which may be used to narrow the HF stereo image and re-map it to coincide with the LF image was introduced by Brice (1997, 1998). The technique involves deliberate crosstalk being introduced between the stereo channels at frequencies above approximately 700Hz. The technique was commercially exploited in the FRANCINSTIEN range of stereophonic image enhancement systems developed by Perfect Pitch Music Ltd. Commercial units were produced both for use in hi-fi systems and for recording studios.
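To see why deliberate crosstalk narrows the image, suppose (purely as an illustrative assumption, not a description of the actual circuit) that a fraction c of each channel, with 0 ≤ c < 1, is added in phase to the other channel at high frequencies. The sum and difference signals then scale differently:

```latex
\begin{align*}
L' &= L + cR, \qquad R' = R + cL \\
L' - R' &= (1 - c)(L - R) \\
L' + R' &= (1 + c)(L + R) \\
\frac{L' - R'}{L' + R'} &= \frac{1 - c}{1 + c}\cdot\frac{L - R}{L + R}
\end{align*}
```

Under that assumption the inter-channel ratio (L - R)/(L + R) is reduced by the factor (1 - c)/(1 + c) wherever the crosstalk is applied, which is the same kind of high-frequency reduction in the difference-to-sum ratio that the shuffler aimed for.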

 

The hi-fi version of the Francinstien

 

 

Head-shadow compensator

 

Philip Vanderlyn, who worked with Blumlein, wrote after his retirement from EMI (in 1979, some twenty years after the joint "Stereosonic" paper) about experiments made after that paper's publication which demonstrated that it was possible to derive a theoretical law for high-frequency stereo images based on transient sounds. To do this, he described a simple - but remarkable - experiment in which the listener in a standard, anechoic stereo experiment was replaced by two omnidirectional microphones, spaced apart by a distance equivalent to the human interaural spacing (approximately 20cm) but with no baffle interposed between them. The unmodified signals from these microphones were then sent - via amplifiers - to headphones, worn by the listener in a different room. In simplistic terms, the microphones thereby "replaced" the listener's ears but not her head!

 

 

Unremarkably, LF imaging was possible using this set-up and this bore close resemblance to the results obtained by the listener at the same position as the microphones in the anechoic room. Much more surprisingly, it was found that it was possible to pan HF transient sounds (random noise filtered above 2kHz, modulated by a pulse with rapid onset and slow decay), in exactly the same way as low frequency sounds; that's to say, by means of simple amplitude differences between the loudspeakers. Vanderlyn advanced a putative mechanism for this based on the ear and brain system integrating the cues arriving at each ear over a period of time longer than the interaural transit time as shown below.

 

 

 

Whatever the mechanism, as Vanderlyn says himself in his 1979 article,

 

" The results ... are sufficiently clear cut to show that an effective interaural time difference cue is produced when loudspeakers are driven with high frequency transient signals, in phase but of different amplitude, without the benefit of head shadowing effects."

 

Even more amazingly, it was found that the apparent position of HF transient signals panned in this way correlated perfectly with the law for low frequencies given above (equation C). Vanderlyn next experimented with a set-up in which a dummy head was interposed between the microphones, to mimic (roughly) the HF shadowing effect of the listener's head. The experiment above was repeated and - this time - it demonstrated that the apparent HF image was always further from the median plane for a given inter-channel amplitude ratio: exactly the same effect as noted in the earlier experiments in 1957.

 

To summarise Vanderlyn's remarkable results, he found that amplitude-difference ("panned") stereo, not only produces reliable and accurate low-frequency time-delay effects at the ears of the listener, but it provides high-frequency transient time-delay cues as well. These latter cues are actually all the brain needs to decode an apparent position which accords perfectly with the law for LF cues (the experiment which mimics the "headless listener" proved this). His second experiment demonstrates that, far from enhancing the stereo effect, the shadowing effect of the head at high-frequencies supplements the time-delay cues and causes the stereo image to be exaggerated at high-frequencies; a ubiquitous finding amongst all researchers in this field. The fact that amplitude difference at the two ears is by no means so precise an indicator of direction as time difference, being subject to perturbations due to local objects or wax in the ears, possibly explains why experimenters find so much variation in their HF image results under different conditions and with different experimental subjects.

 

Looked at in this way, we can see that the stereo "shuffler" circuit derived back in 1957 and the FRANCINSTIEN are both "head-shadow compensators", introducing a reduction in the difference signal at high frequencies sufficient to "cancel out" the shadowing effect of the head and its tendency to exaggerate high-frequency directional cues.

 

 

 

References

 

Blumlein, A. (1933) British Patent 394,325 June 14th

Clark, Dutton and Vanderlyn (1958)  The Stereosonic recording and reproducing system: a two-channel system for domestic tape records JAES 6,2, pp102-117

Brice, R. (1997) Multimedia and Virtual Reality Engineering. Newnes

Brice, R. (1998) Music Engineering. Newnes

Vanderlyn P. (1979) Auditory cues in stereophony Wireless World, September

 

 



[1] Note that, if phase angles are directly proportional to frequency, as they are here, this is equivalent to a time delay.