Speech, Audio & Music Intelligence Research

Welcome to the ByteDance booth! We’re the SAMI (Speech, Audio & Music Intelligence) team at the ByteDance AI Research lab. Over the last 2+ years, we’ve been working on many exciting research projects from our London, California, and Beijing/Shanghai offices. Topics include MIR, intelligent music creation and production, speech analysis and synthesis, multi-modal understanding, audio understanding, and more.

Hiring

We’re hiring research scientists, research interns, and software developers at multiple locations.

More positions will be listed soon. Please chat with us in our Slack channel #sponsor-bytedance for more details!

Publications

Our papers at ISMIR 2020

The Freesound Loop Dataset and Annotation Tool (Slack channel: #poster-2-16-ramires)
Neural Loop Combiner: Neural Network Models for Assessing the Compatibility of Loops (Slack channel: #poster-3-13-chen)
Human-AI Co-creation in Songwriting (Slack channel: #poster-5-11-huang)
Deep Composer Classification Using Symbolic Representation

Papers at other venues

Demo

Piano transcription | pdf | code

Using the transcribed result, we can recreate high-quality piano tracks.

Sound event detection | pdf | code

The system captures what’s happening in the world by listening 👂

demo

Music source separation

Enjoy our source separation technology! Separate vocals 🎙, drums 🥁, and bass 🎸, or piano 🎹 vs. violin 🎻

Speech enhancement with weakly labelled data | pdf

Enhance the speech signal by suppressing other signals, using only weakly labelled audio data.

Audio source separation | pdf

Our system separates a target audio source from a mixture of audio, for example from a noisy sports broadcast or from nature sounds.

Nose-to-music

Nose-to-music gamifies music video creation: follow the notes with your, ahem, NOSE 👃!