Sony / SonyCSL

Sony R&D - AI x Audio: From Research to Production

Throughout its 70-year history, from the era of tape recorders and transistor radios onward, Sony has sought to provide consumers with audio products of exceptional quality.

We are also focused on using our speech processing, image processing, communication, mechatronics & motion control, semiconductor, sensing, and AI technologies to research and develop products, services and entertainment that enable customers to feel Sony’s exciting “KANDO” power of emotional connection.

We are holding the following sponsor events at ISMIR 2020 to introduce these technologies.

Industry Poster

  1. 8A Thursday Oct 15th @ 17:35-18:55 UTC
  2. 8B Friday Oct 16th @ 6:05-7:25 UTC

Meetup with Industry Session

  1. Session A: Wednesday Oct 14th @ 14:30 UTC
  2. Session B: Thursday Oct 15th @ 2:30 UTC

1. Research

Recent Works

  • D3Net: Densely connected multidilated DenseNet for music source separation, by N. Takahashi and Y. Mitsufuji
  • All for One and One for All: Improving Music Separation by Bridging Networks, by R. Sawata, S. Uhlich, S. Takahashi, and Y. Mitsufuji
  • Adversarial Attacks on Audio Source Separation, by N. Takahashi, S. Inoue, and Y. Mitsufuji

More publications can be found here.

2. Audio Products & Services Utilizing AI Technologies

Karaoke Application

Sony’s real-time music separation has been successfully deployed on mobile devices as an in-app karaoke module. A demo video is available at the link below. Note that the vocal volume can be adjusted freely, and the residual can be used to guide the pitch of the karaoke singer. This technology is licensed to several digital service providers, e.g., Line Music Japan/Taiwan.

* Special thanks to Line Music Japan for the creation of this demonstration video.

360 Reality Audio

360 Reality Audio is a new music experience that uses Sony’s object-based spatial audio technology. Individual sounds such as vocals, chorus, piano, guitar, bass and even sounds of the live audience can be placed in a 360 spherical sound field, giving artists and creators a new way to express their creativity. Listeners can be immersed in a field of sound exactly as intended by artists and creators.

Optimization based on personal ear data uses Sony’s original estimation algorithm, which relies on machine learning. We analyze the listener’s hearing characteristics by estimating the 3D shape of the ear from a photo of the ear taken through the “Sony | Headphones Connect” app.

Noise Cancelling Headphones

Adaptive Sound Control automatically adjusts to whatever you do. The Sony | Headphones Connect app offers Adaptive Sound Control, a smart function that automatically detects what you’re up to - such as traveling, walking, or waiting - then adjusts ambient sound settings to suit the situation. You can also customize the settings to your preferences.

*As of June 1, 2020. Ambient noise reduction according to research by Sony Corporation, measured using JEITA-compliant guidelines, in the truly wireless noise-canceling headphones market.

3. Recruiting Information

If you are interested in working with us, please click here for open job and internship positions!

Sony CSL - AI-Powered Music Production


1. Sony CSL Music Team

Created in 1996, the music team at Sony CSL works on innovative music production technologies. Researchers in the team focus on two domains: digital signal processing and music generation with deep learning. The team has a rich history of publication at ISMIR. Last year, we (co-)authored four papers:

  • Controlling Symbolic Music Generation based on Concept Learning from Domain Knowledge, by T. Akama (CSL Tokyo)
  • Auto-adaptive Resonance Equalization using Dilated Residual Networks, by M. Grachten and E. Deruty
  • Learning to Traverse Latent Spaces for Musical Score Inpainting, by A. Pati, A. Lerch and G. Hadjeres
  • (Best Paper Award) Learning Complex Basis Functions for Invariant Representations of Audio, by S. Lattner, M. Dorfler and A. Arzt

This year, we have 2 papers at ISMIR:

  • DrumGAN: Synthesis of Drum Sounds With Timbral Feature Conditioning Using Generative Adversarial Networks, by J. Nistal, S. Lattner, and G. Richard
  • Connective Fusion: Learning Transformational Joining of Sequences with Application to Melody Creation, by T. Akama

2. CSL and Artists - Interview with producer Donn Healy


Artistic collaborations are an important part of CSL’s work. For us, creativity is a key value. We work with musicians and music producers and integrate our technology into their creative processes. For the last few years, people have been worried that AI would replace musicians. At CSL, we want music production to remain human-centered. Interaction is therefore an everyday keyword, whether it is between an artist and a machine or between musicians and scientists.

Music is a human obsession, and so is technological progress. The two have always gone hand in hand. To paraphrase Robert Moog, at CSL, our ambition is simply to “build stuff that musicians want to use”. We detailed this approach at this year’s ICASSP Sony workshop, The sound of AI.

For ISMIR 2020, we want to provide an insight into the journey of an artist discovering AI. Donn Healy, who has been working with us, has devoted himself to exploring the use of AI in music, producing 12 tracks using our technology.

He also deconstructs one of the tracks he produced and discusses the different tools he used.

3. Publicly Available Resources

As mentioned above, we provide artists with prototypes that encapsulate our technologies. We have three categories of prototypes:

  • Compositional tools that operate on symbolic music or directly on audio.
  • AI-based synthesizers and sound design tools.
  • Mixing and mastering tools that operate on the signal.

Even though the user interfaces and the plugins cannot be shared publicly, we usually publish papers about the underlying models.

  • Markov chain-based models
  • Gated autoencoder-based models
    • DrumNet: A drum track generator using learned patterns of rhythmic interaction.
    • New BassNet: A generator of bass guitar tracks with learned interactive control.
  • Variational autoencoder-based models
  • GAN-based models
    • DrumGAN: A drum sound synthesizer with high-level control.
  • Digital signal processing & neural networks
    • ResonanceEQ: A plugin that targets resonances, to enhance or smooth them.

4. Follow us!

@SonyCSLMusic @SonyCSLMusic @SonyCSLMusicTeam Sony CSL Music Team