‘Don’t be alarmed,’ said a voice.
‘None of your ventriloquising me,’ said the tramp, rising sharply to his feet.
‘Where are yer? Alarmed indeed!’
‘Don’t be alarmed,’ repeated the voice.
‘You’ll be alarmed in a minute, you silly fool… Where are yer? are you buried?’
H.G. Wells, The Invisible Man
The city talking
Our cities can finally talk to us. They speak from the walls and ceilings of buildings, from station platforms and corporate offices, airport and computer terminals. Elevator cars tell us where we are (Fourteenth floor, doors opening), supermarket checkouts tell us where to go (Cashier number nine please), transit passageways and corridors remind us not to leave anything behind (Unattended luggage will be removed and destroyed). New voice technologies can now extend human speech to any part of the modern city, as if they were bringing it to life as a giant talking machine.
It’s true that we rarely hear this city of audible signs uttering more than a few carefully chosen words at a time (Stand clear of the closing doors please). But those perfectly precise sentences combine the textual authority of a typographical inscription with the vocal familiarity of someone right there talking alongside you (This train terminates at Ealing Broadway). At times they can activate the space around us with the ethereal force of a light being turned on (This is a security announcement!).
The voices of audio-signage also mediate a current stage in the convergence of language and the built environment; a vocal infrastructure that will shape not only our interaction with public spaces but with each other. The presence of even a recorded voice can still suggest an age before the eras of print and mass-media, at the same time as it seems to usher in a fully-automated world populated by electronic devices. This is the future and past of human language, compressed inside the text-to-speech software and MP3 voice files that greet us and speed us through the routing systems of the city.
And if one aim of the talking city is to help keep us moving, part and parcel with the urban flow of goods and capital, it does so by utilising the dramatic effects of the voice. Instant messages break into the air like short localized radio bulletins, as if the city had started to narrate itself, or had simply become a vast theatrical soundstage. Comparing the notion of dramaturgy – the technical construction of a play or novel – with our interaction with the world around us, sociologist Erving Goffman once described how we learn to look for and respond to cues, often in the speech of others. Audio-signs seem to convey Goffman’s stage metaphors literally, prompting us through a scripted public space as if a theatrical quality were somehow fundamental to the messages they impart.
With its discursive dustbins (Berlin), and chatty cigarette machines (Tokyo), not to mention celebrity voice announcements on the subway (Moscow) or in taxis (New York), the talking urban infrastructure already serves as a kind of special effect in the mise en scène of the modern city. Safety messages and warnings were made doubly dramatic when a cast of famous voices (the singer Eartha Kitt, former Mayor Rudy Giuliani, Elmo from Sesame Street) advised passengers to buckle up in the back of New York’s cabs. In a survey to find a new voice for London Underground’s public address system, the most popular was reported to be a vocal impression of Marilyn Monroe.
Certainly the voices of audio-signage recall the modern scripted characters of both typography and cinema. Like typefaces, they convey a sense of identity – stern or friendly, youthful or elderly, male or female. We can also recognize in their tones the lush advertising sirens of Blade Runner and Minority Report, or the authoritarian chill in the speech of the Wizard of Oz.
Cinema may even provide the closest analysis in terms of the construction of audio-sign voices, as well as their impact on us and the places in which we hear them. I’m thinking mainly of the work of film critic Michel Chion, who, as part of his interest in cinema sound, pays particular attention to the use of the voice in establishing movie protagonists that are heard but never seen. Chion describes all those “mysterious and talkative characters hidden behind curtains, in rooms or hideouts, which the sound film has given us; as well as the innumerable voice characters: robots, computers, ghosts…” Some of them, The Invisible Man, for example, have bodies they prefer not to reveal. Others, like HAL, the computer system in Stanley Kubrick’s 2001: A Space Odyssey, no longer seem to need a human body. Each of these disembodied voices is brought to cinematic life through the force of what Chion calls an “acousmatic presence”.
In film and video, the acousmêtre is a kind of voice creature – always connected to what we see, but at the same time dissolving any distinction between on and off-screen. This celluloid phantom can be compared to the ambiguous physical identity of an audio announcement on a subway platform – both inside and outside the material surfaces of the city. It inhabits the air as a presence that seems to be only a voice, even gathering force from the fact that we don’t really know where it comes from or quite what its limits are.
The word acousmatic may have its origins in Greek literature; Chion, though, is interested in more recent uses – in cinema, obviously, but also in the case of sound-projection for other kinds of public space. In the 1950s, musician and writer Pierre Schaeffer used the term to refer to “sounds one hears without seeing their cause of origin.” Following Schaeffer, the idea of an “acousmatic music” was developed by several composers, particularly François Bayle who, in the 1970s, spoke of the art of projecting recorded sounds into halls and enclosed urban areas.
And as anyone currently waiting in an elevator or on a subway platform knows, these forms of sound and voice projection have made their way into everyday life in the embedded guise of voice chips and recorded announcements. Combining the language of visual signs with the dramatic spectral presence of an invisible speaker, the spoken forms of public information may have introduced a new acoustic dimension to the city, an acousmatic space.
Station to station
Airports and railway stations must have been the first places to use talking signs. These were the kind of public spaces that were already broadcasting fixed schedules and information, flight departures, rail destinations etc., by amplified microphone or tannoy systems. And since the early 1970s, “Mind the gap” – a recorded announcement advising Underground passengers to take care when stepping from the platform into a train – has been a sonic landmark for anyone visiting London.
The famous warning sign is still clearly intoned today, although it has aged in a way that seems to intensify both its ghostly and its mechanical qualities. As a kind of facsimile of a post-war BBC voice, its authority seems shaken, partly eroded by bad public address systems, if not simply through being rattled over the years by thousands of passing trains, the single phrase echoing repetitiously along station platforms like a low-budget robot. But if the technology that produces it and the voice that speaks it seem equally outdated, the automated announcement has become synonymous with the city of London, cherished as if it were an endangered creature or a listed building – or the strange sonorous hybrid of both.
Mind the Gap features the voice of a sound engineer named Peter Lodge. London Transport doesn’t seem to know much about him or the recording, although there is a story that Lodge used his own voice in order to keep production costs down. Most early examples of audio-signage will be similarly populated by the voices of those willing or available at the time. Travellers to Vienna quickly get to know the polite, nasal voice of Franz Kaida, a former head of transport security who has called out the stops on Viennese trams and underground trains for more than twenty years.
If Peter Lodge personifies the technical construction of an audio-sign voice, Kaida’s background may be further emblematic of a history of this kind of signage – part of an urban-authoritarian infrastructure connecting the time signals and observational calls of the sixteenth-century night watch (an early form of police force) to the modern-day technologies of security and surveillance.
In our age of marketing and public relations, today’s announcements are most likely to be spoken by radio presenters and professional voice-over artists – in some cases having an even more direct association with power and authority. New York subway-car announcements feature presenters from Bloomberg radio, the station owned by the city’s current mayor, Michael Bloomberg. In a situation that brilliantly parodied such formal connections between figures of speech and authority, a group of Viennese activists once nominated the voice of Franz Kaida to stand in a government election against discredited Austrian politician Kurt Waldheim.
In terms of the production and distribution of the voice, audio-signage has moved beyond sudden one-line directives. Peter Lodge’s “Mind the gap” is currently being upgraded, with new voices and more extensive vocal information distributed throughout the various lines of the London Underground. Emma Clarke, one of the new announcers (hers is the voice of the Victoria, Bakerloo and Central lines), is an award-winning voice-over specialist for radio ads, corporate presentations and telephone systems. It was Clarke’s emphatically breathy recordings that were reported as belonging to Marilyn Monroe. These were presented alongside other styles, apparently ranging from “newsreader” to “schoolmistress”, with London Underground finally opting for a sort of Thames Valley version of Hollywood-style vocal seduction.
Another key difference between the new Underground voices and that of Peter Lodge is that while Mind the Gap remains literally stationary, the voice of Emma Clarke rides the trains with us, remapping London as a spoken network of place names and connections. If graphic signs conventionally require a large enough fixed surface and enough room to be seen, almost any kind of space is available as a place to post a sound file or a recorded message. This flexibility is enabling audio technology to have further impact on the already vague provenance of the disembodied voice.
In China I came across an extreme example of the omnipresent capability of talking signs, with a mobile voice that could emerge anywhere throughout the entire public transport environment. The friendly voice of the Shanghai subway seems to jump from train to platform, to ticket-hall, to street, with the persistence of a running commentary. Pointing out safety features and directions, she also, in a discreet form of advertising, casually recommends bars, restaurants, and department stores – as if she were only a mere breath away from addressing the passengers by name.
Back at the less garrulous end of the talking-sign spectrum, Spanish poet José Ángel Valente described his early encounters with the old Mind The Gap recording, heard during visits to London, as a lesson in the conciseness of language. It might also serve as a demonstration of the mysterious technical power of human speech. It wouldn’t be out of place in J. L. Austin’s concept of performative language, from his famous study of the speech act, “How to Do Things with Words.” Even when many of the conditions associated with human communication are missing, that part of speech that Austin terms the illocutionary act – the clear intention of the utterance – still carries the meaningful force of a statement, allowing it to preside over the social context in which it is heard. Clearly a speaker doesn’t have to understand their own words in order to do things with them, or to coerce others into doing, as any electronic audio-sign will tell you.
The assertive voice of Peter Lodge is one of the most recognisable in London, yet few of us would know whose voice it is, or have any sense whatsoever of that person. Somehow though, there’s a contradiction between the need for these voices to be familiar – famous if possible, ideally (or perhaps perversely) seductive and authoritarian at the same time – and yet materially and technically invisible, even anonymous. The voice, as ever, is able to shift between public and private circumstances.
But even in a purely practical sense there remains a certain obscurity around who it is that produces these spoken forms – ranging, professionally, from recording-studio engineers to artificial-life programmers.
An example of what software developers refer to as extreme natural language processing is already available through certain telephone-operated systems. This is voice recognition, able to respond to a caller’s enquiries. According to one company, customers can “speak freely and have a dialogue that is more like a natural interface, so you feel like you’re talking to a real person.” U.S. railroad company Amtrak uses an automated voice recognition system provided by SpeechWorks International, a prominent voice software developer. Passengers can book tickets and arrange schedules by talking to Julie, “the Voice of Amtrak” – apparently “America’s favourite automated speech personality.”
Julie and other “natural language” voices are programmed attempts at complex vocal interaction, as yet untested in open public space. Other devices still in the developmental stage include ways to distribute the voice and, in some cases, to deliver it by force. HyperSonic Sound, an alarmingly powerful form of directional sound projection, has been compared to the way a laser beam directs light. HSS transmits a column of ultrasound waves that are only converted back to a human-audible frequency at the point of reaching the listener’s ears. In true acousmatic tradition, the voice is said to arrive inside your head as if from nowhere.
From crowds to solitary individuals, a city is a collection of voices, many of them present to us via the apparatus of telephone, radio, television and computer. These are technologies that have extended the representational spaces of the city and continue to disrupt what we think we know about the voice – about who it belongs to, where it comes from, and who it speaks for. Audio-signs are a powerful addition to this voice-world, acknowledging the voice as a part of the bio-technological fabric of the city.
Peter Lodge’s fading command already seems to issue from a previous age. In the British horror movie Death Line (Gary Sherman, 1972) a version of the famous phrase, losing grammatical sense through mindless repetition, threatens at the same time to transform itself into an entire underworld language system. This is not an example of Chion’s acousmêtre, but it does suggest a wry comment on the implications of acousmatic space and social interaction. The descendant of Victorian tunnel workers trapped by a cave-in survives in the labyrinth of tunnels beneath the London Underground. His crude attempt at speech is based entirely on three words picked up from the distorted sound of the recorded subway announcements – in this case a very garbled mind the doors!
Radio, phonograph and telephone are acousmatic media by definition, although, as Chion points out, with each of these (unlike cinema) it isn’t possible to play with devices of “showing, partially showing and not showing”. Even so, the allocation of voices for radio is an important precedent for the typographical basis of audio-signage. Working in New York in the 1960s, Tony Schwartz, a sound-obsessed advertising designer, built his reputation on a typographically nuanced use of the human voice in many famous advertising and political campaigns for radio and television.
Inspired by Marshall McLuhan – whose thoughts on the relationship between speech and technology were part of a broader discussion of the struggle between our visual and acoustic sensibilities – Schwartz began to engage with what he recognized as the emerging electronically-generated properties and social conditions of sound. These were part of a shifting relationship to broadcast sounds, in the age of the transistor radio. Schwartz, anticipating electronic devices a few generations ahead – our cell-phones and MP3-players – described new acoustic spaces as being “more like something we wear or sit in than a physical area in which we move.”
In his book The Responsive Chord, Schwartz describes how the new electronic environment was producing its own forms of “auditory acoustic” space – characterized by the way that sound is no longer contained within architectural space, but somehow seems to contain it. The space in which the audio sign, the acousmêtre, might be pitched “has no front or back, no above or below, no past or future… no linear directionality.” Messages, transmitted into public places using carefully modulated voices, are present everywhere, even if they apparently come “from nowhere”; and the space they end up creating seems to exist only “for the current fleeting moment”.
But if Schwartz is an early example of a designer involved in the production of voice-characters for projection into public space, millions of people without any professional involvement with sound also experience the world as a sequence of primarily acoustic, or acousmatic, encounters. In an incredible book called Touching the Rock, John Hull, a university lecturer who developed cataracts from an early age, provides a vivid account of the experience of finally losing his sight and coming to terms with the sonic environment. Assessing the way that what we hear and how we hear it places us “within a world,” Hull describes what he calls “the range of contact points between myself and something created by sound…” There is less need to seek them out, he says of sounds – “My eyes, if they could see, would be darting here and there” – because they have the ability to find you.
Hull responds to the auditory qualities of speech in the way that a sighted person might analyse readable forms. On the capacity of the voice to reveal aspects of one’s identity, he runs through a checklist of possibilities: “Is the voice intelligent? Is it colourful? Is there light and shade? Is there melody, humour, gracefulness, accuracy? On the other hand, is the voice lazy? Is it sloppy and careless? Is the range of vocabulary poor and used without precision and sensitivity? These are things which matter to me now.”
Beyond whatever the messages sound like or what they say, a defining feature of acousmatic space is that the spoken words seem to come from nowhere – their infrastructural origins are always carefully concealed. The “speaker”, as voice and source of origin, remains invisible.
Writing in a voice that made me think more of Ralph Ellison’s Invisible Man (“not one of your Hollywood-movie ectoplasms”) than H.G. Wells’s, John Hull, no longer visible to himself, imagines that he has become invisible: “To what extent is loss of the image of the face connected with loss of the image of the self? Is this one of the reasons why I often feel I am a mere spirit, a ghost, a memory?” Like Michel Chion’s acousmêtre, he cannot visually distinguish himself from the technology that contains him: “Other people have become disembodied voices, speaking out of nowhere, going into nowhere. Am I like this too, now that I have lost my body?”
Perhaps, when we hear the electronic facsimiles of audio signage, we also feel a part of ourselves dissolving into the built environment, becoming a spirit, a ghost, an erasable memory. Or, ironically, even a robot. The voice, meanwhile, once thought of as uniquely human, is now celebrated for its almost-perfected hybrid status, a new benchmark in our relationship with technology.
We can now conjure up a kind of surrogate workforce apparently from nowhere. They remain out of sight and untouchable, as if somehow unaccountable. And yet they are always there, talking and walking us through those acousmatic spaces of the city. In an elevator on the fourteenth floor (The doors are now closing...); in the carriage of an underground train (Change here for connecting lines...); moving through an automated checkout in the supermarket (Have a nice day).