Transcript:
JIM RUBERTO: When we do our jobs well, we create something that can have an emotional impact. Honestly, in my background role in the entire thing, if I can play my part well enough and leave behind something that has the potential to evoke the emotions that the performers intended, then that’s everything.
00:31 ZACHARY PATTEN: In their founder’s statement, Peter and Cathy Halstead wrote, “There are moments in music where we are, figuratively, struck by lightning, by a phenomenon beyond the music itself, where suddenly the gap between sound and emotion is bridged.” That special bridge not only comes from the live performances in the Olivier Music Barn, but from the high-resolution recordings of those incredible events.
00:59 At Tippet Rise, there’s a team of people who are passionate about music and recording, and among them is Assistant Audio Engineer, Jim Ruberto. The recordings are a product of a rather mysterious term called post-production that usually takes place in a sound-isolated room with lots of daunting, and usually blinking, equipment.
01:22 Post-production is broken into four steps: denoising, editing, mixing, and mastering. With each step, the engineers carefully shape and sculpt the smallest elements of the recorded music. Jim is responsible for the first step, called denoising, the process of removing many hundreds of unintentional sounds from the recording.
01:44 With the acoustics of the Olivier Music Barn, modern audio technology, and the vision of bridging musical sound with human emotion, we hope to make accessible this alchemy of music and ethos. To understand more about the denoising step, we’ll need to start at the end of a concert in this episode of the Tippet Rise podcast.
02:21 JR: When we finish recording a concert, the output of that is nine channels of audio, minimum, and each one of those channels is a microphone situated in the concert hall, which captures a certain perspective. The microphones are all grouped together. There’s an array of three microphones at the front for left, center, and right. There’s a stereo pair behind them pointed towards the back of the room to capture sound for the surround channels. Then, there’s a set of four microphones up above everything to capture what we call height information, and that’s particularly special here in the Olivier Music Barn due to the jewel box construction feature. Usually, I’m the first person to touch a project after it’s been recorded.
03:11 ZP: If you’ve attended a concert at Tippet Rise or have watched online, you might’ve noticed the microphone array Jim just described, informally called “the bird.” Additional microphones are commonly added to the nine for greater proximal detail - like a spot mic on the cello in the music we just heard by Astor Piazzolla, performed by the Gryphon Trio. These microphones are what capture all of the sound information for the team’s collaborative workflow.
Gryphon Trio performs Astor Piazzolla’s “Muerte del Angel” in the Olivier Music Barn.
Photo by Erik Petersen
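As a rough picture of the setup Jim and Zachary describe, the channel layout might be jotted down as a simple data structure. This is only an illustrative sketch in Python; the channel names and grouping are our own labels, not the team’s actual session configuration.

```python
# An illustrative sketch of the nine-channel layout described above, plus
# optional spot mics. All names here are hypothetical labels.
MIC_ARRAY = {
    "front":    ["left", "center", "right"],      # L/C/R array at the front
    "surround": ["rear_left", "rear_right"],      # stereo pair aimed at the back
    "height":   ["height_1", "height_2", "height_3", "height_4"],  # mics above
}
SPOT_MICS = ["cello_spot"]                        # added per program, e.g. on the cello

channel_count = sum(len(mics) for mics in MIC_ARRAY.values()) + len(SPOT_MICS)
print(channel_count)                              # 9 array channels + any spot mics
```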
03:41 JR: From recording time to finished product it goes through several phases of workflow. We’ve made the workflow pretty agile so we can jump back and forth between projects.
03:52 ZP: The workflow is slightly different for recording sessions, but here we’ll be talking about live performance recordings. And for those projects, denoising is the important first step.
04:03 JR: It’s important to do it first because everything else falls out of what you hear the first time you listen to something. There are some psychological processes there.
04:16 ZP: You might remember hearing a favorite piece for the first time, and that performer’s interpretation became the constant to which you inadvertently compared all other performances of that same piece. Well, it’s also true for post-production work in that you don’t want to introduce the bias of hearing something in a singular way. You also don’t want to send noisy files down the workflow chain. While keeping an open mind, there’s a whole host of noises that Jim removes.
04:44 JR: The sort of events that happen that I’ll want to remove are creaks, creaky floors. You know, a hundred people sitting in a concert hall holding their breath is pretty noisy by itself. When a hundred people in a concert hall are in rapt attention and enjoying a concert, they’re still making noise. We’re humans, we don’t sit still for an hour anymore. The big things that happen most often are coughs and sneezes. When somebody coughs, it’s a broad range of frequencies from the bass frequencies all the way up to the super high treble frequencies.
05:27 ZP: When discussing music, we often refer to musical pitches as letters, like the A, C, and E of an A-minor triad. Jim uses this term frequency, which can also describe pitches, but in numeric form. Frequency is the measure of how fast a periodic signal repeats, and it is expressed in cycles per second, or Hz. You’ve probably heard of A440: in one second, there are four hundred and forty cycles, and this is a typical frequency to which an orchestra tunes. In fact, frequency covers a lot more than can be played by an orchestra. On average, our ears hear between 20 Hz and 20,000 Hz (20 kHz). Take a listen to our audible frequency range. Depending on your playback device, you may not be able to hear the full range.
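To make “cycles per second” concrete, here is a minimal sketch that synthesizes one second of that A440 tuning pitch as a pure sine wave, assuming NumPy and Python’s standard wave module; the output filename is just an example.

```python
# Synthesize one second of concert A (440 Hz) and write it to a WAV file.
import wave
import numpy as np

SAMPLE_RATE = 44100                               # samples per second
FREQ_HZ = 440.0                                   # 440 cycles per second
t = np.arange(SAMPLE_RATE) / SAMPLE_RATE          # one second of time points
tone = 0.5 * np.sin(2 * np.pi * FREQ_HZ * t)      # a pure sine wave at 440 Hz

with wave.open("a440.wav", "wb") as f:
    f.setnchannels(1)                             # mono
    f.setsampwidth(2)                             # 16-bit samples
    f.setframerate(SAMPLE_RATE)
    f.writeframes((tone * 32767).astype(np.int16).tobytes())
```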
06:50 JR: When I talk about frequency range, what I mean is basically low, middle, and high, for the purposes of this conversation. The lower frequencies will be the bass frequencies. You know, bass instruments, the left hand on the piano, perhaps a cello. The middle frequencies are the human voice, and basically most of the instruments. A lot of the action happens in the higher frequency range. That’s where the sibilant range is - the “ess” sounds that you hear when people are talking, or the very, very high notes on a violin or a piano. There’s an entire overtone series: when you play a note on any instrument, it’s creating upper harmonics that occupy that high frequency range.
07:38 ZP: So we have musical pitches written as letters, frequencies, or a number of cycles per second expressed in Hz, and then there are overtones, which can be thought of as a combination of a letter, the fundamental, with its related numeric frequencies above it.
07:56 JR: It starts with the physics of the sound being produced by an instrument. When you play a middle C on a piano, I’d say about fifty-one percent of what you’re hearing is middle C. The rest of what you’re hearing is the next C up, and the next C up above that, and also the fifth of the scale, along with the other overtones in the series. Just over half of what you’re hearing is what’s called the fundamental frequency, and the rest of what you hear is all overtones. That’s what makes a particular instrument sound like that particular instrument. It’s the kind of fingerprint of that overtone series.
08:44 ZP: You can probably sing along in tune with that pitch, C, which you likely hear as one note. However, if you were to sit at the piano, in a quiet room, and listen as the note moves through its envelope, from the attack through the sustain and decay, you’ll hear other, higher, notes begin to appear.
09:02 In 1822, the French mathematician and physicist Joseph Fourier showed that some functions could be written as an infinite sum of harmonics. For our purposes, we can say that our piano note C can also be written as a sum of harmonics, or overtones, those quiet, higher notes. This is called a spectrum, and there’s an image of this located within the transcript of this episode.
Overtone Spectrum of the note C4, played on the piano.
Image by Zachary Patten
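In symbols, Fourier’s result says that a periodic signal with period T can be written as a sum of sines and cosines at whole-number multiples of the fundamental frequency:

$$ f(t) = \frac{a_0}{2} + \sum_{n=1}^{\infty} \left[ a_n \cos\!\left( \frac{2\pi n t}{T} \right) + b_n \sin\!\left( \frac{2\pi n t}{T} \right) \right] $$

For our piano note, the n = 1 term is the fundamental and each higher n is an overtone; the relative sizes of the coefficients are what the spectrum image displays.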
09:30 In the 1960s, an algorithm called the Fast Fourier Transform was developed to compute that collection of harmonics quickly. It basically takes a sound and slices it into its sine wave parts. Since a sine wave doesn’t have any overtones of its own - it’s simply a fundamental - those parts can be added back in the proper ratios and loudnesses to recreate that piano C.
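Here is a minimal sketch of that idea, assuming NumPy and SciPy: build a piano-like middle C from a few sine wave partials with made-up amplitudes, then let the FFT report where the energy sits.

```python
# Build a piano-like C4 from a few partials, then recover them with the FFT.
import numpy as np
from scipy.signal import find_peaks

SAMPLE_RATE = 44100
C4 = 261.63                                     # middle C, in Hz
partials = {1: 1.0, 2: 0.5, 3: 0.3, 4: 0.2}     # harmonic number -> made-up level

t = np.arange(SAMPLE_RATE) / SAMPLE_RATE        # one second of time points
signal = sum(a * np.sin(2 * np.pi * n * C4 * t) for n, a in partials.items())

# Window to reduce spectral leakage, then take the magnitude spectrum.
spectrum = np.abs(np.fft.rfft(signal * np.hanning(len(signal))))
freqs = np.fft.rfftfreq(len(signal), d=1 / SAMPLE_RATE)

peaks, _ = find_peaks(spectrum, height=spectrum.max() * 0.05)
print(freqs[peaks])   # ~[262., 523., 785., 1047.]: the fundamental plus overtones
```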
09:58 It sounds almost identical when all the sine waves are stacked on top of each other, like a deck of cards, but listen to what happens if you fan out the cards, so that you can see more surface area of each one.
10:24 Like Jim mentioned, the majority of the sound comes from its fundamental, but every time you hear any note or any noise or any sound, you’re hearing that vertical collection of overtones, too. Imagine all the notes in a big piano chord - every note contributes its own collection of overtones. And this is where Jim’s denoising gets pretty complex.
10:49 JR: When I need to remove or massage an offending sound that is in the exact same frequency as the sound I want to keep, it gets very difficult. It’s also easier to clean up when the material that I don’t want to alter is in a different frequency range. Say a pianist is playing in a low range of the keyboard and then somebody drops their keys on the floor. There’s going to be a little bit of a thump to it, but it’s going to be mostly high, jangly, harmonics that those keys generate. And I can just about disappear that without even touching the frequencies that make up the thing that we’re trying to preserve.
11:30 ZP: Our hearing is so sensitive that, with some critical listening, we can usually tell if any range of these frequencies has been altered. So when the overtones of notes and noises sound together and the microphone records all of it, Jim has to tread lightly in order to remove only the noise.
11:51 JR: My approach to denoising is to use a very light touch and to leave very natural sounds behind. If something sounds synthetic or altered, I’ll find another solution. My philosophy is “First, do no harm.” I would rather leave an offending sound in a concert than create some synthetic sound to mask it. I would rather have a real cough than a fake, synthesized digital blob in the middle of this beautiful, wonderfully captured event.
12:31 ZP: Beyond the notes and noises in a concert, the sound of the room is a factor in Jim’s work. Even if you were to stand inside an empty concert hall, you’d still hear the room’s ambient sound, and by the way, the microphones record that, too.
12:46 JR: There’s a difference between noise as just the normal ambient sound that you hear in the world and these unintentional events that we’re calling noises. I’m talking about removing unintentional sounds and calling them noises, which is a pretty good, if colloquial, way to use the term noise. It makes me think of Gordon Hempton and his assertion that silence is not the absence of noise; it’s the presence of all the natural sounds that define our environment. Standing in an anechoic chamber, where there is no noise at all, is a jarring experience, one we never have in our normal lives.
13:31 ZP: In 1951 composer and philosopher John Cage visited Harvard University’s anechoic chamber. As the story goes, he entered expecting to hear complete silence, but heard two sounds, one high and one low. When he later asked the engineer about these two sounds, Cage was informed that the high sound was his nervous system and the low was his blood circulation. This realization of the impossibility of silence led to his emancipation of noise.
14:04 JR: I think John Cage would take great issue with the notion of denoising a recording of a concert. In his philosophy, all of the incidental sounds that happen in that performance are a part of that performance. And, if you were to transcribe that performance, you would put them in the score.
14:20 ZP: Consider Cage’s quote from a lecture in 1937: “Wherever we are, what we hear is mostly noise. When we ignore it, it disturbs us. When we listen to it, we find it fascinating. The sound of a truck at 50 MPH. Static between stations. Rain. We want to capture and control these sounds, to use them, not as sound effects, but as musical instruments.” Not only did Cage find noise fascinating, but also the way music has an effect on the self, a concept dating back to the Greek ethos of music - the power to influence the listener’s emotions and behavior, and indeed their morals. Written in 1948, Cage’s solo piano work “In a Landscape,” performed here by Pedja Muzijevic, had the purpose, Cage says, of quieting the mind to invite divine influence.
Pedja Muzijevic, just after performing John Cage’s “In A Landscape,” in the Olivier Music Barn.
Photo by Erik Petersen
15:40 JR: With “In A Landscape,” it’s such a special piece of music to me, and it’s such a sparse and repetitive piece of music. It’s a piece of music that can really envelop you if you let it. In terms of preparing a recording for somebody to, hopefully, have that experience while they listen to it, I feel it is a valid activity and a valuable activity to carefully do this clean-up.
16:10 ZP: It’s an interesting position to remove noises from perhaps the greatest advocate of noise, but this brings back Peter and Cathy’s founding purpose, “to bridge sound and emotion,” to really surrender to the ethos of music, and let it cast a spell over you.
16:29 JR: The overall purpose of removing those sounds is that they’re distracting. My threshold is this: I sit and listen to the music as a music lover and appreciator. I put myself in a different mindset and, you know, try to let the music cast a spell over me like I know it would were I sitting in the front row at the concert. And when a sound happens that breaks the spell, then I know I need to do something with it. All of these notions of “spell” are really amplified.
17:09 ZP: From our founder’s statement, to the performers, through Jim’s denoising and all of the post-production steps, and ultimately to the listener, it’s this catalyst of spell, the bridge between sound and emotion, that we hope to nurture.
17:26 JR: You know, as a great lover of music, I’m engaged in the process as a listener. There’s this somewhat unconscious evaluation process that I go through. I have trouble describing it because it does somewhat happen in the background without my knowledge. When I engage in the process as a listener and a lover of music, it’s very clear which elements are going to have a negative impact on a listener.
17:52 ZP: A great example happened in a 2018 concert in the Olivier Music Barn. Emma Resmini performed an incredible work for solo flute called “The Great Train Race” by composer Ian Clarke.
18:06 JR: Listening through the concert the first time and letting the music, event, and context just kind of sink in and wash over me, with this concert in particular, there was an event during a very quiet part of the concert. It sounds like somebody dropped a program book on the floor, perfectly perpendicular to the floor, so it made a big thump. I remember being there at the concert at the time, knowing that I was going to have to deal with that at some point in the future. Flash forward to months later: I load up this project on my audio workstation, and lo and behold, it’s still there.
18:47 ZP: The nature of Jim’s work maybe invites a visual art analogy: like how a conservator restores a painting - maybe an old varnish has darkened, or maybe the painting has been defaced, like when Mark Rothko’s Black on Maroon was vandalized with graffiti ink in 2012. Of course, the book drop was unintentional, but conservators have to go through the challenging process of figuring out the chemical makeup of the materials, the paint, and the graffiti, while trying not to remove or harm anything original. Jim also has to investigate the complexities of the original sound as well as the noises he wants to remove, and they’re both within the canvas backdrop of the room’s ambient sound.
19:36 JR: My process at first is to understand the sound. The offending sound in this case was just a big thump - it was a book hitting the floor, which really covers almost the entire frequency range.
19:52 ZP: Here’s where we want to combine that audible frequency range we heard with the spectral view, but with much more depth and resolution. Jim is able to see and manipulate the sounds of the music, the noises, and the sound of the room.
Audio spectrum with noise.
Image by Jim Ruberto
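A view like the one in that image can be approximated with a short-time Fourier transform. Here is a minimal sketch, assuming SciPy and Matplotlib and a hypothetical input file, not the team’s actual tools:

```python
# Plot a spectral view: time runs left to right, frequency bottom to top,
# and brightness indicates loudness. "concert.wav" is a hypothetical file.
import matplotlib.pyplot as plt
import numpy as np
from scipy.io import wavfile
from scipy.signal import stft

rate, audio = wavfile.read("concert.wav")
if audio.ndim > 1:
    audio = audio.mean(axis=1)                    # fold multichannel audio to mono

freqs, times, Z = stft(audio, fs=rate, nperseg=4096)
magnitude_db = 20 * np.log10(np.abs(Z) + 1e-10)   # loudness in decibels

plt.pcolormesh(times, freqs, magnitude_db, shading="auto")
plt.xlabel("Time (s)")
plt.ylabel("Frequency (Hz)")
plt.show()
```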
20:08 JR: If you’re looking at the images on the Tippet Rise website, you can see what looks like a big blob that goes from the bottom of the screen to the top of the screen. The top of the screen is the very high frequencies, and the bottom of the screen are the very low bass frequencies. Before and after that event, you can see the spectral makeup of the flute sounds that she was playing. You think of a flute as an instrument that has almost exclusively high frequencies, but that’s just not the case.
20:45 ZP: Here’s the musical selection directly from the microphones, before Jim worked on it. If you listen closely, you can hear some of the creaks he mentioned earlier, and the book drop as well.
21:14 JR: Particularly with Emma’s playing and the way she was playing that piece, there are actually very low frequency components to the staccato notes that she was playing. When this audio event happened, some of those frequencies were all mixed together. The components of the flute sound that I want to preserve are completely masked by the noise from a book falling on the floor. The array of tools that I use range from a simple copy-paste to a very advanced AI-driven synthesis where you indicate what’s happening before and what’s happening after the sound that you want to remove. It does a lot of analysis to kind of interpolate an average and actually synthesize something believable between the beginning and the end of what you want to replace. That is not without its problems. It will leave behind digital artifacts, and depending on how exposed the sound is, it can be very noticeable and that’s not what we want. In an attempt to remove or mask the offending sound, what I don’t want to do is create some other distracting, unnatural sound. And it really is a judgement call.
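The tools Jim describes are commercial and far more sophisticated, but the underlying idea of interpolating between the material before and after an event can be sketched crudely with a short-time Fourier transform. Everything below (the filename, the event times, the simple linear blend) is a made-up illustration, not the actual tool.

```python
# Crudely repair a noise event by interpolating STFT magnitudes between
# the clean frames on either side of it. All specifics are hypothetical.
import numpy as np
from scipy.io import wavfile
from scipy.signal import stft, istft

rate, audio = wavfile.read("concert.wav")
if audio.ndim > 1:
    audio = audio.mean(axis=1)                    # fold to mono for simplicity

freqs, times, Z = stft(audio.astype(float), fs=rate, nperseg=2048)

bad = (times >= 12.3) & (times <= 12.5)           # frames covering the thump
idx = np.where(bad)[0]
left, right = idx[0] - 1, idx[-1] + 1             # nearest clean frames

# Blend magnitudes linearly across the gap, keeping each frame's own phase
# so the result stays loosely anchored to the room sound.
for k, i in enumerate(idx, start=1):
    w = k / (len(idx) + 1)
    mag = (1 - w) * np.abs(Z[:, left]) + w * np.abs(Z[:, right])
    Z[:, i] = mag * np.exp(1j * np.angle(Z[:, i]))

_, repaired = istft(Z, fs=rate, nperseg=2048)
```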
22:36 ZP: Here’s an interesting technical point for why we would rather remove noises than leave them in, as a natural part of the performance. And, it has to do with enhanced sensitivity of microphones as well as the audio playback system.
22:51 JR: If you were in the room that day, you experienced that sound along with the music, however, the way that it gets captured can somewhat enhance some of the things that we don’t want. When I listen to the recording of the book hitting the floor, the low frequency component of it is exaggerated. Listening to it on a stereo system, that book hitting the floor would activate the listener’s subwoofer and create this very huge low frequency sound that really isn’t authentic to what was in the room.
23:30 ZP: This brings up another important difference between a live performance and the recording of that performance, and it’s really about context.
23:40 JR: As someone who is attending the concert in the concert hall, you probably have a program book in your lap during the concert, and when that event happened and the program book hit the floor, anybody in the room was probably a little bit surprised by it, but immediately understood what happened and moved on. Fortunately, it really works in our favor that our cognitive systems work really hard to try to make sense of what we see and hear. Now, if you’re a listener at home listening to a reproduction of this concert, you don’t have that context and if you heard that sound, it would be more jarring to you than if you were at the concert. You wouldn’t really have a context to understand it.
24:24 ZP: Once Jim understands the complexity of the sound as well as the context, it’s time to begin the removal process.
24:31 JR: When it’s time to address this problem, I have a multitude of tools at my disposal. My favorite technique that yields the best sounding result is - do nothing. In this case, that wasn’t an option. That’s always my first choice, though. Leave it there, don’t touch it, it’s the most natural sounding result. That did not work in this case. So, I turned to the technological tools. One of the great tools is the visualization of it that’s on my screen. If you’re looking at the image of this on the Tippet Rise website, you can read from left to right, and you can see the notes that Emma is playing. They are fairly staccato and the bright parts are where it’s the loudest. And you can see that all of the sounds have a bright part at the left and then they kind of fade out going to the right. That is the sound resonating in the room and dying away.
25:33 ZP: If the piano overtone spectrum looks like a sketch, then the view Jim has is surely a painting. There’s so much information here: time is represented on the x-axis and the frequency range on the y. Each vertical slash is a note that Emma played, and you can see the horizontal lines above the bright fundamental, indicating that note’s overtones. Anything that made a sound in the Olivier Music Barn, Jim can see on his screen - including the sound of the room.
26:06 JR: With the high-frequency sound in the room, which is really even higher in frequency than the instrument itself, it’s just ambient sound in the room. And, it can be very convincing and very transparent to just copy and paste. If you’ve ever used a word processor, you know what I’m talking about. With the spectral editor, I can copy a second and a half of room tone that happened three or four seconds before this event, paste it over the event, and that can conceal the sound in a very effective way, but also in a very authentic way. It’s not like I’ve injected something new; I’ve just kind of rearranged time a little bit. Now once we arrive at the frequencies of the instrument itself, it gets a little more complicated. The tools can, to a certain extent, actually tell what’s a musical note and what is just noise. And, it does its best to separate the two. That does leave digital artifacts sometimes, and there are half a dozen different settings for each tool to really tweak and really approach it with some nuance.
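In the same short-time Fourier picture, that copy-and-paste might look something like the sketch below: take frames of clean room tone from a few seconds earlier and paste only their upper band over the event. The 4 kHz cutoff, the times, and the filename are all hypothetical.

```python
# Paste the high band of earlier room tone over a noise event.
import numpy as np
from scipy.io import wavfile
from scipy.signal import stft, istft

rate, audio = wavfile.read("concert.wav")
if audio.ndim > 1:
    audio = audio.mean(axis=1)                    # fold to mono for simplicity

freqs, times, Z = stft(audio.astype(float), fs=rate, nperseg=2048)

high = freqs >= 4000.0                            # the band to replace
event = np.where((times >= 12.3) & (times <= 12.5))[0]   # frames with the noise
source = np.where((times >= 8.0) & (times <= 8.2))[0]    # clean room tone earlier

n = min(len(event), len(source))
Z[np.ix_(high, event[:n])] = Z[np.ix_(high, source[:n])]  # the "paste"

_, cleaned = istft(Z, fs=rate, nperseg=2048)
```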
27:28 ZP: Like Cage said, what we mostly hear is ambient noise, but we generally ignore it. What really gets our attention are the sounds that are in the middle of our listening range. Perhaps our most familiar sound is that of the human voice. Is it a coincidence that our voices are exactly in that middle zone? And because we are more conscious of this zone, Jim has to be very careful, like an art conservator slowly scraping away a painting’s old varnish with a razor blade - one millimeter, or one overtone, at a time.
28:06 JR: And then approaching the mid-range sounds, that’s where it starts to get complicated. Looking at the visualization, you can see that the book drop happened just a little bit after the beginning of that note. If I use a synthesis tool, it will leave behind artifacts that we’ll very readily hear. If I use a clean-up tool, it will also leave behind artifacts. One other challenge is that when that book hit the floor, it made a transient sound, but that sound bounced around the room for about a second and a half. And that activated reflections in the room for a second and a half, up to two seconds. And, the entire sound needs to be removed. I feel like there are some corollaries to physical processes with this: you know you can’t unscramble an egg, and that’s what I feel like I’m trying to do sometimes.
29:03 ZP: Hopefully, this is showing just how nuanced this process can be: between the fundamental sounds of the notes and the noise, all of their many respective overtones, the ambient sound of a room with over a hundred people, and the two-second reverberations of all of these sounds coalescing, which, without the noise, is really the magic of music. But there’s another level of detail to be explored.
29:29 JR: Moving on to the low frequency component, it’s very interesting in this particular piece of music. With the technique Emma is using, she’s playing these staccato notes, and every time she plays a note, there is actually a fair amount of low frequency content; you can hear that there’s almost a little thump at the beginning of each note. If I just erased the low frequency components of the book drop, then on the note she was playing right then, you know, the bottom would fall out and it would sound incredibly unnatural.
30:09 ZP: The performer’s articulation and musicality help cast that spell we’ve talked about, and Emma’s performance is enchanting. Remember this composition is called “The Great Train Race.” Take another listen to the audio selection, this time with a little more context.
30:56 A compositional feature is the connection to the sound of a train. Large locomotives historically have multiple horns, called chimes, which produce different notes. When sounded together, they form a chord. And, if you see a cross section of a toy train whistle, you’ll see multiple resonating chambers, which also allow the toy to produce a chord.
31:22 Along with her articulation, Emma is using a technique called a multiphonic. Like the toy train whistle, flute multiphonics are possible when the player splits their airstream and uses a forked, or vented, fingering so that it sounds both the fundamental and an upper harmonic. That’s why some of those notes have two bright spots at their base - she’s playing two notes at the same time.
31:58 Jim has to make sure the sound quality matches the nuances of Emma’s performance, and when Emma plays two notes simultaneously, it doubles the number of overtones Jim has to account for. All of these subtle features help make musical performances that much more special, and as Jim said earlier, leaving or removing any of these moments is really a judgement call.
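As a toy illustration of that doubled bookkeeping, the sketch below sums two fundamentals, each carrying its own overtone series, the way a multiphonic stacks them. The pitches and levels are illustrative only, not Emma’s actual notes.

```python
# Two simultaneous notes mean two overtone series to account for.
import numpy as np

SAMPLE_RATE = 44100
t = np.arange(SAMPLE_RATE) / SAMPLE_RATE          # one second of time points

def harmonic_series(f0, levels):
    """Sum a fundamental f0 and its overtones at the given relative levels."""
    return sum(a * np.sin(2 * np.pi * (n + 1) * f0 * t)
               for n, a in enumerate(levels))

low = harmonic_series(440.00, [1.0, 0.4, 0.2])    # first note plus two overtones
high = harmonic_series(587.33, [0.8, 0.3, 0.15])  # second note plus two overtones
multiphonic = low + high                          # six partials to track, not three
```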
32:22 JR: These judgements that I have to make, where I have to be pretty careful is in assessing the intention. You know, I said that I hope to remove unintentional sounds, and you wouldn’t be wrong to ask, well, who am I to judge that? Fortunately, I have a pretty good toolset for making those judgements. The score of the piece is a primary reference. I also often refer to the film of the concert, particularly if it’s new music or if it’s a piece that I’m not as familiar with, that helps me understand the context of the performance. And also, I was very likely at the performance.
32:58 ZP: Here is the finished selection once Jim completes all of the steps we’ve talked about. Now, we can hear Emma’s wonderfully smooth accelerando without any noisy bumps in the road. An image of the clean audio spectrum is also located in the transcript.
Denoised audio spectrum.
Image by Jim Ruberto
33:44 JR: I felt emboldened to make a little more aggressive move to preserve the intention, if not necessarily the exact performance. And in this case, I resorted to copying and pasting, not the entire note, but a small amount of the frequency range from a note that she had just played prior. If that had happened right at the beginning of the note, I wouldn’t have been able to use that note at all. The way that I was able to edit this and have it sound convincing is that the very beginning of the note - the attack of the note - is actually the note that she played, and I really only had to manipulate the sustain of the note, which works to my benefit in quite a few different ways. Technically, it’s easier to work with. Cognitively, if something is wrong with the beginning of the note, we’ll really notice. As an audio engineer, one of the real maxims that we follow is that we trust our ears more than we trust our eyes. These visualization tools are very powerful, but they can also be a little bit deceptive. I could erase that sound from the screen and still hear it. If I work on the screen and feel like I’ve sufficiently removed the sound and then I listen back, there can still be components of the sound that I hear, but aren’t visible on the screen anymore. It’s mostly because they’re all mixed together.
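In rough time-domain terms, the keep-the-attack idea might look like the sketch below; Jim’s real edit was band-limited inside the spectral editor, and the sample positions and fade length here are hypothetical.

```python
# Keep a note's real attack, then crossfade into sustain material copied
# from a similar earlier note. All positions and lengths are hypothetical.
import numpy as np

def replace_sustain(audio, attack_end, donor_start, length, fade=441):
    """Keep audio up to attack_end, then crossfade into donor material."""
    out = audio.copy()
    donor = audio[donor_start:donor_start + length]
    ramp = np.linspace(0.0, 1.0, fade)            # ~10 ms fade at 44.1 kHz
    out[attack_end:attack_end + fade] = (
        (1 - ramp) * audio[attack_end:attack_end + fade] + ramp * donor[:fade]
    )
    out[attack_end + fade:attack_end + length] = donor[fade:]
    return out
```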
35:41 I feel like I do want to really, you know, really put this in relief: while audio engineering is a highly technical job, it’s also a very human occupation, in that we’re trying to evoke human emotions by playing sounds into your ears. What I’m trying to preserve is the intention of the artist and of the performance. The reality of it is that Emma played the piece very well, and I wanted to preserve that moment. And, it’s almost like being on an archeological dig where I have this fragile artifact that I want to preserve, but I have to get all of this hard, awful stuff off it without damaging it, and I think that’s a perfect metaphor for the work.
36:31 ZP: An archeologist uses reason, intuition, and techniques in their search to find something that can be life-changing. And Jim carefully removes these noises to find that treasure of spell. But in both cases, there are no directions and no shortcuts to discovery.
36:50 JR: There’s no roadmap. The tools that we use are very powerful and have many different options and modules for how you can recover sounds that have been masked over and some of them are algorithmic, sometimes it’s a simple copy and paste, and anything in between. I’ve always thought about the denoising process as a reductive process, but it’s more of a massaging, blending and mixing. When I’ve done my job well, you can’t tell that I’ve done anything.
37:23 ZP: Audio engineers are an essential part of making that bridge between sound and emotion, and to translate the performer’s intention through the technology. Some of the heroic work they do, you’d have to hear to believe.
37:37 JR: Sometimes it does feel kind of heroic when you’re faced with just something that seems like it can’t be fixed. And then you eventually find a way to fix it. I don’t like using the word “fix.” Not because it’s aesthetically wrong or undersells it, but because it’s just kind of a shortcut to describe something more complex. I am talking about something philosophical when I object to the word “fix.” And, I think we are back in the realm of John Cage believing the incidental sounds that happen during a performance are a part of that performance, they are an indelible part of that performance.
38:13 ZP: One of the great things about the live performance, and something dearly missed, is experiencing those emotions together, noises and all. There’s an unquantifiable magic in the communal sharing of music. But recordings can live beyond the day of the performance, and audio engineers help to preserve a moment that can continually be discovered and enjoyed over and over again.
38:41 JR: Honestly, the lofty and mostly unachievable goal is to recreate the concert experience in the listener’s head. What audio engineers do is really theater of the mind in a lot of ways. In terms of my fulfillment in doing this work, it sure is lovely to work on work that I really care about. I think it’s very important work that we’re doing here in creating this archive of the experiences here at Tippet Rise. It’s such an asset for the future. It can even take on a deeper sense of importance that we’re preserving these moments and creating these recordings that transport us in time and space.
It can even take on a deeper sense of importance that we’re preserving these moments and creating these recordings that transport us in time and space.
Photo by Erik Petersen