Emerging Technology: Computation, Sensors, and Audio Synthesis
Recall the music technology of 50 years ago, during the British Invasion of 1964. The Beatles, the Who, and the Rolling Stones began to use feedback from electric guitar amplifiers. In academia, mainframes took hours to compute one minute of acoustic signal, which was then played back from a tape recorder. Going beyond tape to so-called live electronics, Karlheinz Stockhausen had just begun to experiment with microphones, sine wave generators, and ring modulators.
But nowadays, laptop orchestras are everywhere. If you extend Moore’s Law from Stockhausen’s four-diode ring modulator through today’s billion-transistor laptops, 50 years from now an everyday musical instrument should have more transistors than the entire planet had 5 years ago. An instrument more powerful than all Big Four music labels combined could create an entire genre of music as easily as a Casiotone creates a bleep. (Granted, this overlooks implementation details such as power consumption and heat dissipation. But it will be several decades before such hardware is even invented. These technicalities will be solved.)
What about sensors? They have not advanced as startlingly as the microprocessor. Indeed,
what they sense — position, pressure, light, EEGs — has hardly grown in half a century. But how they sense has advanced, in size, cost, power consumption, and speed. Smartphones, where every milliwatt and every cubic millimeter counts, include sensors for temperature, barometric pressure, humidity, tilt, and of course GPS position. Compare that to the 1964 predecessor of GPS, TRANSIT, which was too heavy for a man to lift, took thousands of times longer to report your location, and was 40 times less accurate.
The sophistication of sensors has also advanced. For example, some image sensor chips in mobile phones report when they detect a smile. (This is not merely software: this is in the chip itself.) Also, combining colocated measurements, called sensor fusion, yields what Stockhausen would have called magic but what we call commonplace: a photograph that is not merely geotagged but also tagged by content such as email addresses of human faces, or websites of storefronts and visible landmarks. Wilder magic happens when the sensors are online, such as pointing a smartphone’s image sensor at a barcode on a supermarket shelf to learn the item’s price in nearby stores.
Sensor fusion can also increase noise immunity and decrease latency. For instance, when measuring the pitch of a plucked string, we can fuse conventional pitch-tracking software with a sensor that measures where the string contacts the fingerboard. At low pitches, the software by itself is too slow, because it must wait for a full period or two of the waveform. But that’s exactly when the contact sensor is fast and precise.
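A minimal sketch of this fusion, in Python. The fret-sensor interface, the two-period threshold, and the autocorrelation tracker here are illustrative assumptions, not any particular instrument’s design:

```python
import numpy as np

def fret_pitch(open_hz, fret):
    """Instant but coarse pitch from a fingerboard contact sensor,
    assuming equal temperament (one semitone per fret)."""
    return open_hz * 2 ** (fret / 12)

def autocorr_pitch(signal, sr):
    """Slow but precise pitch from autocorrelation; needs at least
    a full period or two of waveform to find the repeating lag."""
    ac = np.correlate(signal, signal, mode="full")[len(signal) - 1:]
    d = np.diff(ac)
    start = np.argmax(d > 0)              # skip the falling edge at lag 0
    period = start + np.argmax(ac[start:])  # first (tallest) peak = period
    return sr / period

def fused_pitch(open_hz, fret, signal, sr, min_periods=2):
    """Report the fast fret estimate until enough waveform has arrived,
    then switch to the precise autocorrelation estimate."""
    coarse = fret_pitch(open_hz, fret)
    if len(signal) < min_periods * sr / coarse:
        return coarse                      # too early: trust the contact sensor
    return autocorr_pitch(signal, sr)
```

With only a few milliseconds of audio, the fused estimate is the fret sensor’s; once two periods have arrived, it becomes the audio tracker’s, which also hears bends and detuning the fret sensor cannot.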
Sensor fusion can also increase sensitivity. Eulerian video magnification amplifies a video signal’s otherwise invisible variations of color or motion, such as respiratory motion, the reddening of skin with each heartbeat, or even (as before) the vibration of a guitar string. Fusing a dozen video cameras yields a motion-capture system that tracks the positions of hundreds of points with submillimeter accuracy throughout a large room, thousands of times per second. Fusing several microphones or radiotelescopes into a beamforming array gives them instant, precise aiming. Finally, combining a microphone with clever software yields a sensor for speech — what we usually call speech recognition.
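Delay-and-sum, the simplest beamforming scheme, shows why that aiming is instant: steering is just a change of per-channel delays, with no mechanical motion. A sketch for a linear array along the x axis (array geometry, sample rate, and speed of sound here are illustrative):

```python
import numpy as np

def delay_and_sum(signals, mic_x, angle_deg, sr, c=343.0):
    """Steer a linear microphone array: advance each channel by the
    arrival delay of a plane wave from angle_deg, so that wave adds
    in phase while sound from other directions partially cancels."""
    delays = np.asarray(mic_x) * np.sin(np.radians(angle_deg)) / c
    out = np.zeros(signals.shape[1])
    for sig, d in zip(signals, delays):
        out += np.roll(sig, -int(round(d * sr)))  # undo that mic's delay
    return out / len(signals)
```

Re-aiming the array is just calling this with a different `angle_deg`; a real implementation would use fractional-sample (interpolating) delays rather than `np.roll`.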
Fifty years hence, we can imagine sensors that are ubiquitous and practically uncountable; cognoscenti call this utility fog. In today’s language, a safe generalization is that you will measure anything you can name, as accurately as you want, as fast as you want, under any conditions.
As far as audio synthesis algorithms go, much of the history of computer music consists of clever tricks to extract ever more interesting sounds from only a few — or a few million — transistors: tricks such as filtered broadband noise, frequency modulation, or plucked-string simulation. But such optimizations are pointless when you have a brain the size of a planet. Brute-force additive synthesis of individual sine waves is easy. So is brute-force simulation of a plucked string, all the way down to the molecular bonds that determine the plectrum’s stiffness and the string’s inertia.
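Brute-force additive synthesis really is just a sum of explicitly computed sine oscillators. The harmonic recipe below — amplitudes falling off as 1/n, a crude stand-in for a plucked string’s spectrum — is only illustrative:

```python
import numpy as np

def additive(partials, dur, sr=44100):
    """Brute-force additive synthesis: one explicit sine oscillator
    per partial.  partials is a list of (frequency_hz, amplitude) pairs."""
    t = np.arange(int(dur * sr)) / sr
    return sum(a * np.sin(2 * np.pi * f * t) for f, a in partials)

# A crude plucked-string-like tone: harmonics of 220 Hz, 1/n amplitudes.
tone = additive([(220 * n, 1 / n) for n in range(1, 20)], dur=1.0)
```

No recursion, no filters, no physical insight — just arithmetic, one sine per partial, affordable once transistors are no longer scarce.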
When we summarize all this, the language becomes theological. We get a musical instrument that (within its domain) is omniscient, omnicognizant, and omnipotent. It observes all that can be observed, analyzes these observations completely, and from those conclusions then produces whatever sound is optimal for a particular purpose.
What this means for musical instruments is hard enough to assimilate and ponder. But what such prodigious sensing and computation means for human culture, no one can predict in detail: uploaded minds, computronium (converting all matter into computers), the blurring of human and machine (palely foreshadowed by mobile social media), and immortality. These are aspects of what some call the Singularity, the point in history when change becomes so rapid that language before then cannot even describe it. How we then shall make, share, understand, and enjoy music must remain a mystery.
Still, this undoubtedly highfalutin’ talk informs the XD of today. Occasionally taking the long view, either the half century since the British Invasion or the eons of the pipe organ, escapes the rut of the past week’s RSS feeds, the past year’s product launches. When confronted with the Accordion of Armageddon, even the most far-out creatives must concede that their imaginations could be wilder.
Now, let’s rewind those 50 years, to consider nuts-and-bolts details of some unobtanium-free designs that require only a few billion transistors.