Every month, the joint laboratory invites external speakers to take part in seminars for its partners.
Qiuqiang Kong (The Chinese University of Hong Kong): “Music Transcription, Understanding, and Generation”
Abstract: Research at the intersection of artificial intelligence (AI) and music has been a popular area in recent years. AI and music have valuable applications in both academia and industry, including music recommendation, production, and automatic generation. Music transcription is a fundamental task in AI-based music research; it aims at transcribing audio recordings into symbolic representations, and it is similar to the automatic speech recognition task in speech processing. We proposed a high-resolution piano transcription system that transcribes both piano notes and sustain pedals. Onset, offset, and velocity information are all transcribed. As of 2020, the system achieved state-of-the-art piano transcription results. Using this piano transcription system, we created GiantMIDI-Piano, the largest symbolic piano MIDI dataset in the world. GiantMIDI-Piano is designed to be a standard dataset that encourages research on AI and music worldwide. We also introduce recent work on music understanding and generation with large language models (LLMs).
Bio: Qiuqiang Kong is currently an assistant professor in the EE department of the Chinese University of Hong Kong. He received his Ph.D. degree from the University of Surrey, Guildford, UK, in 2019. Following his Ph.D., he joined ByteDance as a research scientist. His research topics include the classification, detection, separation, and generation of general sounds and music. He was listed among the top 2% of scientists in 2021 in the “Updated science-wide author databases of standardized citation indicators”. He is known for developing pretrained audio neural networks (PANNs) for audio tagging and was awarded the IEEE SPS Young Author Best Paper Award in 2023. He won the Detection and Classification of Acoustic Scenes and Events (DCASE) challenge in 2017. He is also known for transcribing GiantMIDI-Piano, the largest piano MIDI dataset in the world. He has co-authored over 50 papers in journals and conferences, including IEEE/ACM Transactions on Audio, Speech, and Language Processing (TASLP), ICASSP, INTERSPEECH, IJCAI, DCASE, EUSIPCO, and LVA-ICA. As of September 2023, his work has been cited 3156 times, and he has an h-index of 28. He has been a frequent reviewer for well-known international journals and conferences, including TASLP, TMM, SPL, TKDD, JASM, EURASIP, Neurocomputing, Neural Networks, ISMIR, and CSMT. He assisted with organizing LVA-ICA 2018 in Guildford, UK, and the DCASE 2018 Workshop in Woking, UK. He serves as a co-editor for the journal Frontiers in Signal Processing.