Speech, Audio & Music Intelligence Research

Welcome to the ByteDance booth! We are the SAMI (Speech, Audio & Music Intelligence) team at the ByteDance AI Research lab. Over the last 2+ years, we have been working on many exciting research projects from our London, California, and Beijing/Shanghai offices. Topics include music information retrieval (MIR), intelligent music creation and production, speech analysis and synthesis, multi-modal understanding, audio understanding, and more.


We’re hiring research scientists, research interns, and software developers at multiple locations.

More positions will be listed soon. For details, please chat with us in our Slack channel #sponsor-bytedance!


Our papers at ISMIR 2020

Papers at other venues


Piano transcription | pdf | code

Using the transcription results, we can recreate high-quality piano tracks.
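A transcription system typically outputs note events such as (onset, offset, MIDI pitch); recreating audio from them can be sketched with a toy sine-tone renderer. This is only an illustration of the transcribe-then-resynthesize idea — `render_notes` is our own hypothetical name, not the project's API, and a real system would use a piano synthesizer rather than sine waves.

```python
import numpy as np

def render_notes(notes, fs=16000, dur=None):
    """Render (onset_s, offset_s, midi_pitch) note events as sine tones,
    a toy stand-in for a high-quality piano synthesizer."""
    if dur is None:
        dur = max(offset for _, offset, _ in notes)
    audio = np.zeros(int(dur * fs))
    for onset, offset, pitch in notes:
        freq = 440.0 * 2 ** ((pitch - 69) / 12)  # MIDI pitch number to Hz
        t = np.arange(int((offset - onset) * fs)) / fs
        tone = 0.2 * np.sin(2 * np.pi * freq * t)
        start = int(onset * fs)
        audio[start:start + len(tone)] += tone
    return audio
```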

Sound event detection | pdf | code

The system captures what’s happening in the world by listening 👂

Music source separation

Enjoy our source separation technology! Vocals 🎙, drums 🥁, and bass 🎸 - or piano 🎹 vs. violin 🎻

Speech enhancement with weakly labelled data | pdf

Our system enhances speech by suppressing all other signals, and it is trained using only weakly labelled audio data.

Audio source separation | pdf

Our system separates a target audio source from a mixture, such as noisy sports broadcast content or nature recordings.


Nose-to-music is a gamified music video creation experience: follow the notes with your, ahem, NOSE 👃!