Speech, Audio & Music Intelligence Research
Welcome to the ByteDance booth! We’re the SAMI (Speech, Audio & Music Intelligence) team at the ByteDance AI Research lab. Over the last 2+ years, we’ve been working on many exciting research projects from our London, California, and Beijing/Shanghai offices. Topics include music information retrieval (MIR), intelligent music creation and production, speech analysis and synthesis, multi-modal understanding, and audio understanding.
We’re hiring research scientists, research interns, and software developers at multiple locations:
- Software Engineer in Music/Audio Signal Processing in Mountain View, California, US
- Research Scientist in Speech & Audio in Mountain View, California, US
- Software Engineer, Real-time Audio C++ in London, UK
More positions will be listed soon. Please chat with us in our Slack channel #sponsor-bytedance for more details!
Our papers at ISMIR 2020
- The Freesound Loop Dataset and Annotation Tool (Slack channel:
- Neural Loop Combiner: Neural Network Models for Assessing the Compatibility of Loops
- Human-AI Co-creation in Songwriting (Slack channel:
- Deep Composer Classification Using Symbolic Representation
Papers at other venues
- Noise Robust TTS for Low Resource Speakers using Pre-trained Model and Speech Enhancement
- ByteSing: A Chinese Singing Voice Synthesis System using Duration Allocated Encoder-Decoder Acoustic Models and WaveRNN Vocoders
- Source Separation with Weakly Labelled Data: An Approach to Computational Auditory Scene Analysis
- Adversarial Feature Learning and Unsupervised Clustering based Speech Synthesis for Found Data with Acoustic and Textual Noise
- Xiaomingbot: A Multilingual Robot News Reporter
Using the transcription results, we can recreate high-quality piano tracks.
The system captures what’s happening in the world by listening 👂
Music source separation
Enjoy our source separation technology! Vocals 🎙, drums 🥁, and bass 🎸 - or, piano 🎹 vs violin 🎻
Speech enhancement with weakly labelled data | pdf
Enhance the speech signal by suppressing other signals, using only weakly labelled audio data for training.
Audio source separation | pdf
Our system separates a target audio source from a mixture of audio, such as noisy sports broadcast content or nature sounds.
Nose-to-music is a gamified music video creation experience: follow the notes with your, ahem, NOSE 👃!