Meeting held Wednesday, May 11, 2016, 7:30 pm, at Microsoft Research, Building 99
Dr. Ivan Tashev, the architect behind the audio technologies of many Microsoft products (and a PNW Committee member), spoke on Spatial Audio at the May 2016 meeting of the PNW Section. With the recent release of the Microsoft HoloLens augmented reality device, it was a timely opportunity to learn how sounds can be manipulated to appear in 3-D space. The meeting took place at Microsoft Research's building on one of Microsoft's major campuses in Redmond, WA. Around 58 attendees (26 or so of them members) were at the meeting, including AES Executive Director Bob Moses.
Dr. Tashev received his Master's degree in Electronic Engineering (1984) and PhD in Computer Science (1990) from the Technical University of Sofia, Bulgaria, where he was also an Assistant Professor. He joined Microsoft in 1998. Currently Dr. Tashev is a Partner Architect and leads the Audio and Acoustics Research Group at Microsoft Research Labs in Redmond, WA. He has published four books and more than 70 papers, and holds 30 U.S. patents. Dr. Tashev created the audio processing technologies incorporated in Windows, the Microsoft Auto Platform, and the RoundTable device. He served as the audio architect for Microsoft Kinect for Xbox and for HoloLens. Dr. Tashev is also an Affiliate Professor in the Department of Electrical Engineering at the University of Washington in Seattle, WA.

Dr. Tashev began by describing spatial audio as the set of techniques for making a listener perceive sound as coming from any desired direction, with applications in music, movies, gaming, and virtual reality devices. He went through some history of multi-channel and spatial audio, from mono through stereo, 5.1, and beyond. These systems, however, are channel-based and usually planar, with speakers at roughly the same height around the listener (there are exceptions, of course). Newer systems such as Dolby Atmos show clear progress toward making sounds appear to come from any direction.

A discussion of channel-based sound field rendering was next. Speaker arrays can be steered to control sound directionality very effectively, and the pros and cons of channel-based systems (versus object-based) were noted. Another method is to use headphones. The history and problems of binaural audio were mentioned; to create a really convincing 3-D sound field with headphones, the renderer must apply the Head-Related Transfer Function (HRTF), which accounts for the physical characteristics of a person's head, and must also track the head's movement.

He then compared augmented reality, where computer-generated sensory input works together with some aspect of the real, physical world, with virtual reality, where the environment is entirely synthesized and simulates the user's presence. He showed components of the devices, products now on the market, and some of their uses.

Dr. Tashev then discussed object-based rendering of spatial audio. Starting from the known characteristics of human vision, hearing acuity, and localization ability, one can determine what is needed to achieve a good spatial impression. The HRTF is especially important: a personalized HRTF works far better than a generic one. He described many details of obtaining personalized HRTFs, including a fast, cost-effective "Eigen-face" method utilizing the Microsoft Kinect device. Suitable headphones with tracking hardware already on the market were shown.

Ivan talked next about computing and rendering 3-D audio with the object-based method. A sound "scene" takes into account the desired sound objects, the positions from which they should appear to come (and how to achieve that), the user's HRTF, head-movement data, and so on, with the rendering updated perhaps 50 times per second (a small illustrative sketch of such an update step follows below).

An extended break was held, with attendees trying on a HoloLens after having their own HRTFs captured. They could then experience an augmented reality 3-D audio scene. Assisting with this was one of Ivan's co-researchers, David Johnston, whose name should be familiar to anyone who has used Cool Edit, the program he wrote. Door prizes were also awarded.
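Returning to the object-based rendering loop Dr. Tashev outlined: the following is a minimal, illustrative Python sketch of one update step, convolving a sound object with a head-related impulse response (HRIR) pair chosen from its direction relative to the tracked head. The hrir_for_direction() lookup, block size, and update rate here are assumptions for illustration only, not the HoloLens implementation; a real renderer would interpolate measured, personalized HRIRs and use overlap-add convolution.

```python
# Minimal sketch of one object-based binaural rendering step.
# hrir_for_direction() is a hypothetical placeholder, not a real API.
import numpy as np

FS = 48_000        # sample rate (Hz)
BLOCK = 960        # 20 ms blocks, i.e. roughly 50 scene updates per second
HRIR_LEN = 256     # length of each head-related impulse response, in samples

def hrir_for_direction(azimuth_deg: float):
    """Placeholder HRIR lookup: fakes interaural time and level differences
    so the sketch runs; a real renderer interpolates measured HRIRs."""
    s = np.clip(np.sin(np.deg2rad(azimuth_deg)), -1.0, 1.0)
    itd = int(abs(s) * 30)                       # crude interaural delay, in samples
    l_gain = np.sqrt((1.0 - s) / 2.0)            # constant-power level difference
    r_gain = np.sqrt((1.0 + s) / 2.0)
    left, right = np.zeros(HRIR_LEN), np.zeros(HRIR_LEN)
    if azimuth_deg >= 0:                         # source to the right: left ear hears it later
        left[itd], right[0] = l_gain, r_gain
    else:                                        # source to the left: right ear hears it later
        left[0], right[itd] = l_gain, r_gain
    return left, right

def render_block(mono: np.ndarray, source_az: float, head_yaw: float) -> np.ndarray:
    """Render one block of a sound object: choose the HRIR pair from the
    object's direction *relative to the tracked head*, then convolve."""
    relative_az = source_az - head_yaw           # head tracking enters here
    left, right = hrir_for_direction(relative_az)
    return np.stack([np.convolve(mono, left)[:len(mono)],
                     np.convolve(mono, right)[:len(mono)]], axis=1)

# Toy usage: a 1 kHz tone fixed at 45 degrees while the head turns; the
# source stays put in the world because the relative azimuth is updated.
t = np.arange(BLOCK) / FS
tone = 0.1 * np.sin(2 * np.pi * 1000.0 * t)
for yaw in (0.0, 15.0, 30.0):                    # successive head-tracker readings
    stereo = render_block(tone, source_az=45.0, head_yaw=yaw)
```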
Ivan's crystal ball says that channel-based approaches have shown their limits. Parametric modal methods are being scrutinized. Joint object and modal decompositions will be commonplace, and device independence will be important.
Reported by Gary Louie, PNW Section Secretary