I am a fourth-year Ph.D. student in Computer Science at the Institute of Computing Technology, Chinese Academy of Sciences, supervised by Prof. Jiafeng Guo. My research focuses on information retrieval, in particular the recently emerging paradigm of generative retrieval. In generative retrieval, all information about the corpus is encoded in the parameters of a single consolidated model. The model autoregressively generates the identifiers of relevant documents for a query, whereas dense retrieval methods match queries against pre-indexed documents. Generative retrieval enables end-to-end optimization, faster inference, and reduced storage costs. My specific research explores corpus modeling theory for generative retrieval, relevance learning mechanisms for complex search scenarios, and practical applications.

🔥 News

  • Nov. 2024:  🎉🎉 A research paper was accepted at the 31st ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD 2025).
  • Nov. 2024:  🎉🎉 I received the “National Scholarship” (国家奖学金).
  • Sep. 2024:  🎉🎉 A research paper was accepted at the 38th Annual Conference on Neural Information Processing Systems (NeurIPS 2024) as a Spotlight.
  • May 2024:  🎉🎉 A research paper was accepted at the 62nd Annual Meeting of the Association for Computational Linguistics (ACL 2024).
  • Apr. 2024:  🎉🎉 A research paper was accepted at ACM Transactions on Information Systems (TOIS).
  • Dec. 2023:  🎉🎉 I received the “President of Institute of Computing Technology’s Prize Scholarship” (中国科学院计算所所长优秀奖).

📝 Publications

[KDD 2025] Generative Retrieval for Book Search
31st ACM SIGKDD Conference on Knowledge Discovery and Data Mining (CCF-A)
Yubao Tang, Ruqing Zhang, Jiafeng Guo, Maarten de Rijke, Shihao Liu, Shuaiqiang Wang, Dawei Yin, Xueqi Cheng

[NeurIPS 2024, Spotlight] Generative Retrieval Meets Multi-Graded Relevance
The 38th Annual Conference on Neural Information Processing Systems (CCF-A)
Yubao Tang, Ruqing Zhang, Jiafeng Guo, Maarten de Rijke, Wei Chen, Xueqi Cheng

[ACL 2024] Bootstrapped Pre-training with Dynamic Identifier Prediction for Generative Retrieval [PDF] [Poster] [Slides]
The 62nd Annual Meeting of the Association for Computational Linguistics (CCF-A)
Yubao Tang, Ruqing Zhang, Jiafeng Guo, Maarten de Rijke, Yixing Fan, Xueqi Cheng

[TOIS] Listwise Generative Retrieval Models via a Sequential Learning Process [PDF]
ACM Transactions on Information Systems (CCF-A)
Yubao Tang, Ruqing Zhang, Jiafeng Guo, Maarten de Rijke, Wei Chen, Xueqi Cheng

[KDD 2023] Semantic-Enhanced Differentiable Search Index Inspired by Learning Strategies [PDF] [Slides]
Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (CCF-A)
Yubao Tang, Ruqing Zhang, Jiafeng Guo, Jiangui Chen, Zuowei Zhu, Shuaiqiang Wang, Dawei Yin, Xueqi Cheng

[CCL 2024] 生成式信息检索前沿进展与挑战 (Frontier Advances and Challenges in Generative Information Retrieval)
Proceedings of the 22nd Chinese National Conference on Computational Linguistics
Yixing Fan, Yubao Tang, Jiangui Chen, Ruqing Zhang, Jiafeng Guo

🎙️ Tutorials

[SIGIR 2024] Recent Advances in Generative Information Retrieval [Website]
Yubao Tang, Ruqing Zhang, Zhaochun Ren, Jiafeng Guo, Maarten de Rijke

[TheWebConf 2024] Recent Advances in Generative Information Retrieval [Website]
Yubao Tang, Ruqing Zhang, Weiwei Sun, Zhaochun Ren, Jiafeng Guo, Maarten de Rijke

[ECIR 2024] Recent Advances in Generative Information Retrieval [Website]
Yubao Tang, Ruqing Zhang, Zhaochun Ren, Jiafeng Guo, Maarten de Rijke

[SIGIR-AP 2023] Recent Advances in Generative Information Retrieval [Website]
Yubao Tang, Ruqing Zhang, Jiafeng Guo, Maarten de Rijke

📝 Industry Applications

[2024-Present] Book search
Books contain complex, multi-faceted information. To address the resulting challenges, we designed a generative retrieval (GR) framework for book search that combines data augmentation with outline-oriented book encoding. It outperforms the state-of-the-art GR baseline by 9.8% in MRR@20 on the Baidu dataset.

[2023] Official site retrieval at Baidu search
The official site retrieval task aims to understand query intent regarding official sites operated by administrative units and to guide the search engine to recall the relevant official sites. We designed a generative retrieval method, Semantic-Enhanced DSI (Semantic-Enhanced Differentiable Search Index Inspired by Learning Strategies), and applied it to this task, where it significantly outperformed the existing baseline of the Baidu search engine by 8.83% in Recall@20 on the online dataset.

📝 Projects

[2024] Co-training framework for Retrieval-augmented Generation and Generation-augmented Retrieval
We design the framework as a cooperative two-player game, leveraging the complementarity between retrieval-augmented generation (RAG) and generation-augmented retrieval (GAR) to progressively improve RAG performance. Additionally, we address privacy concerns, particularly the leakage of Personally Identifiable Information (PII), by identifying and neutralizing the neurons in the generator responsible for PII while preserving model performance.

[2023] GoMate: RAG Framework within Reliable input, Trusted output [Link]
GoMate is a comprehensive knowledge-based QA application built on large language models (LLMs). It consists primarily of a query understanding module, a retrieval module, an LLM module for retrieval enhancement, a reference tracing module, an answer generation module, and a user privacy protection module. Users might inadvertently disclose private information while interacting with GoMate. To address this problem, we identify and mitigate user-side privacy leaks by abstracting user input into text that is semantically similar but less specific and safer.

🎖 Honors and Awards

  • 2024: 🎖 National Scholarship (国家奖学金)
  • 2023: 🎖 President of Institute of Computing Technology’s Prize Scholarship (中国科学院计算所所长优秀奖)
  • 2021, 2022, and 2023: 🎖 Merit Student, University of Chinese Academy of Sciences (中国科学院大学三好学生)

📖 Education

  • Sep. 2021 - Present, Ph.D. Candidate, University of Chinese Academy of Sciences
  • Sep. 2018 - Jun. 2021, Master, University of Chinese Academy of Sciences
  • Sep. 2014 - Jun. 2018, Undergraduate, Sichuan University

🎙️ Talks

  • Sep. 2024, “Generative Information Retrieval: Learning and Application Research” at Baidu Inc.
  • Jan. 2024, “Recent Advances in Generative Information Retrieval” at Baidu Inc. [Slides]
  • Nov. 2023, “Semantic-Enhanced Differentiable Search Index” at Kuaishou Inc. [Slides]

📝 Internship

  • May 2023 - Present, Baidu search

💗 Academic Service

  • Reviewer: Transactions on Information Systems, Information Processing and Management, Gen-IR@SIGIR24, Gen-IR@SIGIR23, WI-IAT 2023
  • Program Committee: CCIR, Gen-IR@SIGIR24, Gen-IR@SIGIR23
  • Volunteer: SIGIR-AP 2023