
Hello World! 👋 I'm Huang Xie (谢晃)

✨ About Me

I'm a PhD student specializing in Machine Learning and Signal Processing, with a particular focus on Audio-Language Learning and Audio Information Retrieval. My research involves contrastive learning, zero-shot learning, multimodal learning, language-based audio retrieval, and audio classification.

🔥 Research Interests

  • Audio-Language Learning focuses on building systems that connect audio signals with natural language, enabling seamless interpretation and interaction across these modalities. It involves techniques that align audio features with textual representations, allowing for tasks such as audio captioning, language-based audio retrieval, and audio question answering.
  • Audio Information Retrieval is the process of analyzing and retrieving information from audio content, such as music, speech, or environmental sounds. It encompasses tasks like sound classification, music recommendation, and similarity-based retrieval, facilitating the organization, retrieval, and utilization of audio data in industries ranging from entertainment to security.
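
As a rough illustration of language-based audio retrieval, the sketch below ranks audio clips by the cosine similarity between a text-query embedding and each clip embedding. It is a toy example, not code from any of the publications or repositories listed here: the function names, file names, and three-dimensional embeddings are all hypothetical. In a real system the vectors would come from trained audio and text encoders.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_audio_clips(text_embedding, audio_embeddings):
    """Return clip names ordered from most to least similar to the text query."""
    scores = {
        name: cosine_similarity(text_embedding, embedding)
        for name, embedding in audio_embeddings.items()
    }
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical, hand-written embeddings standing in for encoder outputs.
audio_embeddings = {
    "dog_bark.wav": [0.9, 0.1, 0.0],
    "rain.wav": [0.1, 0.9, 0.2],
    "engine.wav": [0.0, 0.2, 0.9],
}

# Stands in for an encoded caption such as "a dog is barking".
query_embedding = [0.8, 0.2, 0.1]

ranking = rank_audio_clips(query_embedding, audio_embeddings)
print(ranking)
```

With these toy vectors the clip whose embedding points in nearly the same direction as the query is ranked first. Contrastive training aims to produce exactly this geometry, pulling matching audio-text pairs together and pushing mismatched pairs apart.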

🎯 Tech and Interests

📚 Publications

  • 📃 H. Xie, K. Khorrami, O. Räsänen, and T. Virtanen, "Text-based Audio Retrieval by Learning from Similarities between Audio Captions," Accepted by IEEE Signal Processing Letters. 🔥🔥🔥
  • 📃 H. Xie, K. Khorrami, O. Räsänen, and T. Virtanen, "Integrating Continuous and Binary Relevances in Audio-Text Relevance Learning," in Proc. Detect. Classif. Acoust. Scenes Events Work. (DCASE), 2024, pp. 201-205. arXiv
  • 📃 H. Xie, K. Khorrami, O. Räsänen, and T. Virtanen, "Crowdsourcing and Evaluating Text-Based Audio Retrieval Relevances," in Proc. Detect. Classif. Acoust. Scenes Events Work. (DCASE), 2023, pp. 226-230. arXiv
  • 📃 H. Xie, O. Räsänen, and T. Virtanen, "On Negative Sampling for Contrastive Audio-Text Retrieval," in Proc. Int. Conf. Acoustic., Speech and Signal Process. (ICASSP), 2023, pp. 1-5. arXiv
  • 📃 H. Xie, S. Lipping, and T. Virtanen, "Language-based Audio Retrieval Task in DCASE 2022 Challenge," in Proc. Detect. Classif. Acoust. Scenes Events Work. (DCASE), 2022, pp. 216-220. arXiv
  • 📃 H. Xie, O. Räsänen, K. Drossos, and T. Virtanen, "Unsupervised Audio-Caption Aligning Learns Correspondences Between Individual Sound Events and Textual Phrases," in Proc. Int. Conf. Acoustic., Speech and Signal Process. (ICASSP), 2022, pp. 8867-8871. arXiv
  • 📃 H. Xie, O. Räsänen, and T. Virtanen, "Zero-Shot Audio Classification with Factored Linear and Nonlinear Acoustic-Semantic Projections," in Proc. Int. Conf. Acoustic., Speech and Signal Process. (ICASSP), 2021, pp. 326-330. arXiv
  • 📃 H. Xie and T. Virtanen, "Zero-Shot Audio Classification via Semantic Embeddings," in IEEE/ACM Trans. Audio Speech Lang. Process., vol. 29, pp. 1233-1242, 2021. arXiv
  • 📃 H. Xie and T. Virtanen, "Zero-Shot Audio Classification Based on Class Label Embeddings," in Proc. Work. Appl. Signal Process. Audio and Acoustic. (WASPAA), 2019, pp. 264-267. arXiv

🏆 Activities

  • 🧑‍💻 Task coordinator for Language-based Audio Retrieval in DCASE Challenge 2024 (Task 8).
  • 🧑‍💻 Task coordinator for Automated Audio Captioning and Language-based Audio Retrieval in DCASE Challenge 2023 (Task 6).
  • 🧑‍💻 Task coordinator for Automated Audio Captioning and Language-based Audio Retrieval in DCASE Challenge 2022 (Task 6).

💬 Connect with Me

Pinned

  1. contrastive-negative-sampling

    Source code for negative sampling for contrastive audio-text retrieval (ICASSP 2023)

    Python 3

  2. audio-caption-aligning

    Source code for audio-caption aligning (ICASSP 2022)

    Python

  3. dcase2023-audio-retrieval

    Baseline system for Language-based Audio Retrieval (Task 6B) in DCASE 2023 Challenge

    Python 8 2

  4. dcase2022-audio-retrieval

    Baseline system for Language-based Audio Retrieval (Task 6B) in DCASE 2022 Challenge

    Python 7 1

  5. retrieval-relevance-crowdsourcing

    Data and instructions for crowdsourcing text-based audio retrieval relevance

    HTML

  6. audiocaps-dl

    Python program to download AudioCaps from YouTube

    Python 1