SIGIR2024 Paper List

Paper	Authors	Affiliations	Abstract	Chinese Translation	Code	Citations
Optimization Methods for Personalizing Large Language Models through Retrieval Augmentation	Alireza Salemi, Surya Kallumadi, Hamed Zamani	University of Massachusetts Amherst; Lowe's Companies, Inc.	This paper studies retrieval-augmented approaches for personalizing large language models (LLMs), which potentially have a substantial impact on various applications and domains. We propose the first attempt to optimize the retrieval models that deliver a limited number of personal documents to large language models for the purpose of personalized generation. We develop two optimization algorithms that solicit feedback from the downstream personalized generation tasks for retrieval optimization–one based on reinforcement learning whose reward function is defined using any arbitrary metric for personalized generation and another based on knowledge distillation from the downstream LLM to the retrieval model. This paper also introduces a pre- and post-generation retriever selection model that decides what retriever to choose for each LLM input. Extensive experiments on diverse tasks from the language model personalization (LaMP) benchmark reveal statistically significant improvements in six out of seven datasets.	本文研究了个性化大型语言模型(LLM)的检索增强方法，这些方法可能对各种应用和领域产生重大影响。我们首次尝试优化检索模型，将有限数量的个人文档提供给大型语言模型，以实现个性化生成。我们开发了两个优化算法，从下游的个性化生成任务中寻求反馈进行检索优化-一个基于强化学习，其奖励函数是定义使用任意指标的个性化生成和另一个基于知识提取从下游 LLM 到检索模型。本文还介绍了一个前生成和后生成的检索器选择模型，该模型决定检索器为每个 LLM 输入选择什么。从语言模型个性化(LaMP)基准对不同任务的广泛实验显示，七个数据集中有六个在统计学上有显著的改善。	code	4
On Generative Agents in Recommendation	An Zhang, Yuxin Chen, Leheng Sheng, Xiang Wang, TatSeng Chua	National University of Singapore; Tsinghua University; Recommendation System, Large Language Model	Recommender systems are the cornerstone of today's information dissemination, yet a disconnect between offline metrics and online performance greatly hinders their development. Addressing this challenge, we envision a recommendation simulator, capitalizing on recent breakthroughs in human-level intelligence exhibited by Large Language Models (LLMs). We propose Agent4Rec, a user simulator in recommendation, leveraging LLM-empowered generative agents equipped with user profile, memory, and actions modules specifically tailored for the recommender system. In particular, these agents' profile modules are initialized using real-world datasets (e.g. MovieLens, Steam, Amazon-Book), capturing users' unique tastes and social traits; memory modules log both factual and emotional memories and are integrated with an emotion-driven reflection mechanism; action modules support a wide variety of behaviors, spanning both taste-driven and emotion-driven actions. Each agent interacts with personalized recommender models in a page-by-page manner, relying on a pre-implemented collaborative filtering-based recommendation algorithm. We delve into both the capabilities and limitations of Agent4Rec, aiming to explore an essential research question: “To what extent can LLM-empowered generative agents faithfully simulate the behavior of real, autonomous humans in recommender systems?” Extensive and multi-faceted evaluations of Agent4Rec highlight both the alignment and deviation between agents and user-personalized preferences. Beyond mere performance comparison, we explore insightful experiments, such as emulating the filter bubble effect and discovering the underlying causal relationships in recommendation tasks. Our codes are available at https://github.com/LehengTHU/Agent4Rec.	
推荐系统是当今信息传播的基石，然而离线指标和在线性能之间的脱节严重阻碍了它们的发展。为了应对这一挑战，我们设想了一个推荐模拟器，利用大型语言模型(LLM)在人类智力水平方面的最新突破。我们推荐 Agent4Rec，一个用户模拟器，利用 LLM 授权的生成代理，配备用户配置文件、内存和专门为推荐系统量身定制的操作模块。特别是，这些代理的个人资料模块是使用真实世界的数据集(如 MovieLens，Stream，Amazon-Book)初始化的，捕捉用户独特的品味和社会特征; 记忆模块记录事实和情感记忆，并与情感驱动的反射机制相结合; 行动模块支持各种各样的行为，跨越品味驱动和情感驱动的行为。每个代理以逐页的方式与个性化推荐模型进行交互，依赖于先前实现的基于协同过滤的推荐算法。探讨 Agent4Rec 的能力和局限性，旨在探索一个基本的研究问题: “ LLM 授权的生成代理能在多大程度上忠实地模拟真实的、自主的人类推荐系统的行为?”Agent4Rechight 的广泛和多方面的评估突出了代理和用户个性化偏好之间的一致性和偏差。除了单纯的性能比较，我们还探索了一些有洞察力的实验，比如模拟过滤泡效应和发现推荐任务中潜在的因果关系。我们的密码可以在 https://github.com/lehengthu/agent4rec 找到。	code	4
C-Pack: Packed Resources For General Chinese Embeddings	Shitao Xiao, Zheng Liu, Peitian Zhang, Niklas Muennighoff, Defu Lian, JianYun Nie	University of Montreal, Montreal, Canada; UTSC, Hefei, China; Beijing Academy of AI, Beijing, China; Renmin University of China, Beijing, China; HuggingFace, Beijing, China	We introduce C-Pack, a package of resources that significantly advances the field of general text embeddings for Chinese. C-Pack includes three critical resources. 1) C-MTP is a massive training dataset for text embedding, which is based on the curation of vast unlabeled corpora and the integration of high-quality labeled corpora. 2) C-MTEB is a comprehensive benchmark for Chinese text embeddings covering 6 tasks and 35 datasets. 3) BGE is a family of embedding models covering multiple sizes. Our models outperform all prior Chinese text embeddings on C-MTEB by more than +10% upon the time of the release. We also integrate and optimize the entire suite of training methods for BGE. Along with our resources on general Chinese embedding, we release our data and models for English text embeddings. The English models also achieve state-of-the-art performance on the MTEB benchmark; meanwhile, our released English data is 2 times larger than the Chinese data. Both Chinese and English datasets are the largest public release of training data for text embeddings. All these resources are made publicly available at https://github.com/FlagOpen/FlagEmbedding.	我们介绍了 C-Pack，这是一个资源包，它极大地推动了中文通用文本嵌入领域的发展。C-Pack 包括三个关键资源。1) C-MTP 是一个大规模的文本嵌入训练数据集，它建立在对大量未标记语料库的管理和对高质量标记语料库的集成的基础之上。2) C-MTEB 是一个涵盖6个任务和35个数据集的中文文本嵌入的综合基准。3) BGE 是一个多尺度嵌入模型家族。我们的模型比 C-MTEB 之前的所有中文文本嵌入在发布时的表现都要好10% 以上。我们还整合和优化了 BGE 的整套培训方法。随着我们的一般中文嵌入资源，我们发布了我们的数据和英文文本嵌入模型。英语模型在 MTEB 基准上也达到了最先进的性能，同时，我们发布的英语数据是中文数据的2倍。中文和英文数据集是最大的文本嵌入训练数据的公开发布。所有这些资源都可以在 https://github.com/flagopen/flagembedding 上公开获得。	code	3
Large Language Models can Accurately Predict Searcher Preferences	Paul Thomas, Seth Spielman, Nick Craswell, Bhaskar Mitra	Microsoft	Much of the evaluation and tuning of a search system relies on relevance labels---annotations that say whether a document is useful for a given search and searcher. Ideally these come from real searchers, but it is hard to collect this data at scale, so typical experiments rely on third-party labellers who may or may not produce accurate annotations. Label quality is managed with ongoing auditing, training, and monitoring. We discuss an alternative approach. We take careful feedback from real searchers and use this to select a large language model (LLM), and prompt, that agrees with this feedback; the LLM can then produce labels at scale. Our experiments show LLMs are as accurate as human labellers and as useful for finding the best systems and hardest queries. LLM performance varies with prompt features, but also varies unpredictably with simple paraphrases. This unpredictability reinforces the need for high-quality "gold" labels.	搜索系统的大部分评估和调整都依赖于相关标签——说明文档是否对给定的搜索和搜索者有用的注释。理想情况下，这些数据来自真正的搜索者，但是很难大规模地收集这些数据，所以典型的实验依赖于第三方标注者，他们可能会或可能不会产生准确的注释。通过持续的审核、培训和监控来管理标签质量。我们讨论另一种方法。我们从真正的搜索者那里获得仔细的反馈，并使用它来选择一个大型语言模型(LLM) ，并提示符合这个反馈; LLM 然后可以按比例生成标签。我们的实验表明 LLM 和人工标记器一样精确，对于寻找最好的系统和最难的查询也同样有用。LLM 的性能随着提示特征的不同而不同，但也随着简单的转述而不可预测地变化。这种不可预测性加强了对高质量“黄金”标签的需求。	code	3
Data-efficient Fine-tuning for LLM-based Recommendation	Xinyu Lin, Wenjie Wang, Yongqi Li, Shuo Yang, Fuli Feng, Yinwei Wei, TatSeng Chua	University of Technology Sydney; University of Science and Technology of China; National University of Singapore; Monash University; The Hong Kong Polytechnic University	Leveraging Large Language Models (LLMs) for recommendation has recently garnered considerable attention, where fine-tuning plays a key role in LLMs' adaptation. However, the cost of fine-tuning LLMs on rapidly expanding recommendation data limits their practical application. To address this challenge, few-shot fine-tuning offers a promising approach to quickly adapt LLMs to new recommendation data. We propose the task of data pruning for efficient LLM-based recommendation, aimed at identifying representative samples tailored for LLMs' few-shot fine-tuning. While coreset selection is closely related to the proposed task, existing coreset selection methods often rely on suboptimal heuristic metrics or entail costly optimization on large-scale recommendation data. To tackle these issues, we introduce two objectives for the data pruning task in the context of LLM-based recommendation: 1) high accuracy aims to identify the influential samples that can lead to high overall performance; and 2) high efficiency underlines the low costs of the data pruning process. To pursue the two objectives, we propose a novel data pruning method based on two scores, i.e., influence score and effort score, to efficiently identify the influential samples. Particularly, the influence score is introduced to accurately estimate the influence of sample removal on the overall performance. To achieve low costs of the data pruning process, we use a small-sized surrogate model to replace LLMs to obtain the influence score. Considering the potential gap between the surrogate model and LLMs, we further propose an effort score to prioritize some hard samples specifically for LLMs. 
Empirical results on three real-world datasets validate the effectiveness of our proposed method. In particular, the proposed method uses only 2% samples to surpass the full data fine-tuning, reducing time costs by 97%.	最近，利用大型语言模型(LLM)进行推荐引起了相当大的关注，其中微调在 LLM 的适应过程中起着关键作用。然而，在快速扩展的推荐数据上微调 LLM 的成本限制了它们的实际应用。为了应对这一挑战，少量微调提供了一种有希望的方法来快速适应新的推荐数据 LLM。我们提出了基于 LLM 的有效数据剪枝推荐的任务，旨在识别具有代表性的样本，为 LLM 的少镜头微调定制。虽然协同复位选择与所提出的任务密切相关，但现有的协同复位选择方法往往依赖于次优的启发式度量，或者需要对大规模推荐数据进行代价高昂的优化。为了解决这些问题，我们在基于 LLM 的推荐的背景下为数据修剪任务引入了两个目标: 1)高精度旨在确定可以导致高总体性能的有影响的样本; 2)高效率突出了数据修剪过程的低成本。为了实现这两个目标，我们提出了一种新的基于两个分数的数据修剪方法，即影响分数和努力分数，以有效地识别有影响力的样本。特别地，引入影响分数来准确估计样本去除对整体性能的影响。为了降低数据剪枝过程的成本，我们使用一个小型的代理模型来代替 LLM 来获得影响分数。考虑到代理模型和 LLM 之间的潜在差距，我们进一步提出了一个努力分数，优先考虑一些硬样本，特别是 LLM。在三个实际数据集上的实验结果验证了该方法的有效性。特别是，提出的方法只使用2个微调，减少了97个时间成本	code	2
The Power of Noise: Redefining Retrieval for RAG Systems	Florin Cuconasu, Giovanni Trappolini, Federico Siciliano, Simone Filice, Cesare Campagnano, Yoelle Maarek, Nicola Tonellotto, Fabrizio Silvestri	Technology Innovation Institute; Sapienza University of Rome; University of Pisa	Retrieval-Augmented Generation (RAG) systems represent a significant advancement over traditional Large Language Models (LLMs). RAG systems enhance their generation ability by incorporating external data retrieved through an Information Retrieval (IR) phase, overcoming the limitations of standard LLMs, which are restricted to their pre-trained knowledge and limited context window. Most research in this area has predominantly concentrated on the generative aspect of LLMs within RAG systems. Our study fills this gap by thoroughly and critically analyzing the influence of IR components on RAG systems. This paper analyzes which characteristics a retriever should possess for an effective RAG's prompt formulation, focusing on the type of documents that should be retrieved. We evaluate various elements, such as the relevance of the documents to the prompt, their position, and the number included in the context. Our findings reveal, among other insights, that including irrelevant documents can unexpectedly enhance performance by more than 30%, contradicting our initial assumption of diminished quality. These findings call for developing specialized approaches tailored to the specific demands of integrating retrieval with language generation models and pave the way for future research. These results underscore the need for developing specialized strategies to integrate retrieval with language generation models, thereby laying the groundwork for future research in this field.	
检索增强生成(RAG)系统代表了对传统大语言模型(LLM)的重大进步。RAG 系统通过合并通过信息检索(IR)阶段检索的外部数据，克服了标准 LLM 的局限性，提高了系统的生成能力，标准 LLM 局限于其预先训练的知识和有限的上下文窗口。这一领域的大多数研究主要集中在 RAG 系统中 LLM 的生成方面。我们的研究通过彻底和严格地分析红外成分对 RAG 系统的影响来填补这一空白。本文分析了搜索引擎应该具备哪些特征才能有效地快速制定 RAG，重点分析了应该检索的文档类型。我们评估各种因素，例如文件与提示的相关性、它们的位置以及包含在上下文中的数量。我们的研究结果显示，除了其他见解外，包括不相关的文档可以意外地提高超过30个我们最初假设的质量下降的性能。这些研究结果呼吁发展专门的方法，以适应整合检索与语言生成模型的具体要求，并为未来的研究铺平道路。这些结果强调需要制定专门的策略来整合检索和语言生成模型，从而为该领域的未来研究奠定基础。	code	2
LLaRA: Large Language-Recommendation Assistant	Jiayi Liao, Sihang Li, Zhengyi Yang, Jiancan Wu, Yancheng Yuan, Xiang Wang, Xiangnan He	University of Science and Technology of China; The Hong Kong Polytechnic University	Sequential recommendation aims to predict users' next interaction with items based on their past engagement sequence. Recently, the advent of Large Language Models (LLMs) has sparked interest in leveraging them for sequential recommendation, viewing it as language modeling. Previous studies represent items within LLMs' input prompts as either ID indices or textual metadata. However, these approaches often fail to either encapsulate comprehensive world knowledge or exhibit sufficient behavioral understanding. To combine the complementary strengths of conventional recommenders in capturing behavioral patterns of users and LLMs in encoding world knowledge about items, we introduce Large Language-Recommendation Assistant (LLaRA). Specifically, it uses a novel hybrid prompting method that integrates ID-based item embeddings learned by traditional recommendation models with textual item features. Treating the "sequential behaviors of users" as a distinct modality beyond texts, we employ a projector to align the traditional recommender's ID embeddings with the LLM's input space. Moreover, rather than directly exposing the hybrid prompt to LLMs, a curriculum learning strategy is adopted to gradually ramp up training complexity. Initially, we warm up the LLM using text-only prompts, which better suit its inherent language modeling ability. Subsequently, we progressively transition to the hybrid prompts, training the model to seamlessly incorporate the behavioral knowledge from the traditional sequential recommender into the LLM. Empirical results validate the effectiveness of our proposed framework. Codes are available at https://github.com/ljy0ustc/LLaRA.	
顺序推荐的目的是根据用户过去的参与顺序来预测用户下一次与商品的交互。最近，大型语言模型(LLM)的出现引起了人们对利用它们进行顺序推荐的兴趣，并将其视为语言建模。以前的研究将 LLM 输入提示符中的代表项表示为 ID 索引或文本元数据。然而，这些方法往往不能封装全面的世界知识或表现出足够的行为理解。为了结合传统推荐系统在捕捉用户行为模式方面的互补优势，以及在编码关于项目的世界知识方面的 LLM，我们引入了大型语言推荐助手(LLaRA)。具体来说，它采用了一种新的混合提示方法，将传统推荐模型中基于 ID 的项目嵌入技术与文本项目特征相结合。将“用户的顺序行为”作为文本之外的一种独特模式，我们使用一个投影仪将传统推荐器的 IDembedding 与 LLM 的输入空间对齐。此外，采用课程学习策略来逐渐增加训练的复杂性，而不是直接将混合提示暴露给 LLM。最初，我们使用纯文本提示对 LLM 进行预热，这更适合其固有的语言建模能力。随后，我们逐步过渡到混合提示，训练模型无缝地将来自传统的顺序推荐的行为知识合并到 LLM 中。实证结果验证了我们提出的框架的有效性。代码可通过 https:// github.com/ljy0ustc/llara 查询。	code	2
Let Me Do It For You: Towards LLM Empowered Recommendation via Tool Learning	Yuyue Zhao, Jiancan Wu, Xiang Wang, Wei Tang, Dingxian Wang, Maarten de Rijke	University of Science and Technology of China, University of Amsterdam; University of Science and Technology of China School of Data Science; University of Amsterdam; University of Technology Sydney; University of Science and Technology of China	Conventional recommender systems (RSs) face challenges in precisely capturing users' fine-grained preferences. Large language models (LLMs) have shown capabilities in commonsense reasoning and leveraging external tools that may help address these challenges. However, existing LLM-based RSs suffer from hallucinations, misalignment between the semantic space of items and the behavior space of users, or overly simplistic control strategies (e.g., whether to rank or directly present existing results). To bridge these gaps, we introduce ToolRec, a framework for LLM-empowered recommendations via tool learning that uses LLMs as surrogate users, thereby guiding the recommendation process and invoking external tools to generate a recommendation list that aligns closely with users' nuanced preferences. We formulate the recommendation process as a process aimed at exploring user interests in attribute granularity. The process factors in the nuances of the context and user preferences. The LLM then invokes external tools based on a user's attribute instructions and probes different segments of the item pool. We consider two types of attribute-oriented tools: rank tools and retrieval tools. Through the integration of LLMs, ToolRec enables conventional recommender systems to become external tools with a natural language interface. Extensive experiments verify the effectiveness of ToolRec, particularly in scenarios that are rich in semantic content.	
传统的推荐系统(RS)面临着精确捕捉用户细粒度偏好的挑战。大型语言模型(LLM)具有常识推理和利用外部工具的能力，这些工具可能有助于解决这些挑战。然而，现有的基于 LLM 的 RSS 存在幻觉，项目的语义空间和用户的行为空间之间的不一致，或者过于简单的控制策略(例如，是否排名或直接呈现现有的结果)。为了弥合这些差距，我们引入了 ToolRec，这是一个通过使用 LLM 作为替代用户的工具学习来提供 LLM 授权推荐的框架，从而指导推荐过程并调用外部工具来生成一个与用户微妙偏好密切相关的推荐列表。我们将推荐过程描述为一个在属性粒度上探索用户兴趣的过程。这个过程影响到上下文和用户偏好的细微差别。然后 LLM 根据用户的属性指令调用外部工具，并探测项目池的不同部分。我们考虑两种面向属性的工具: 排名工具和检索工具。通过对 LLM 的集成，ToolRec 使传统的推荐系统成为具有自然语言界面的外部工具。大量的实验验证了 ToolRec 的有效性，特别是在语义内容丰富的场景中。	code	2
Evaluating Retrieval Quality in Retrieval-Augmented Generation	Alireza Salemi, Hamed Zamani	University of Massachusetts Amherst	Evaluating retrieval-augmented generation (RAG) presents challenges, particularly for retrieval models within these systems. Traditional end-to-end evaluation methods are computationally expensive. Furthermore, evaluation of the retrieval model's performance based on query-document relevance labels shows a small correlation with the RAG system's downstream performance. We propose a novel evaluation approach, eRAG, where each document in the retrieval list is individually utilized by the large language model within the RAG system. The output generated for each document is then evaluated based on the downstream task ground truth labels. In this manner, the downstream performance for each document serves as its relevance label. We employ various downstream task metrics to obtain document-level annotations and aggregate them using set-based or ranking metrics. Extensive experiments on a wide range of datasets demonstrate that eRAG achieves a higher correlation with downstream RAG performance compared to baseline methods, with improvements in Kendall's τ correlation ranging from 0.168 to 0.494. Additionally, eRAG offers significant computational advantages, improving runtime and consuming up to 50 times less GPU memory than end-to-end evaluation.	评估检索增强生成(RAG)提出了挑战，特别是在这些系统中的检索模型。传统的端到端评价方法计算量很大。此外，基于查询文档相关标签的检索模型性能评估与 RAG 系统的下游性能相关性较小。我们提出了一种新的评估方法，eRAG，其中检索列表中的每个文档都由 RAG 系统中的大型语言模型单独使用。然后根据下游任务地面真相标签对每个文档生成的输出进行评估。以这种方式，每个文档的下游性能作为其相关标签。我们使用各种下游任务度量来获取文档级注释，并使用基于集合或排名度量来聚合它们。在广泛的数据集上进行的大量实验表明，与基线方法相比，eRAG 与下游 RAG 性能具有更高的相关性，Kendall 的 τ 相关性从0.168到0.494不等。此外，eRAG 提供了显著的计算优势，改善了运行时，并且比端到端计算消耗了多达50倍的 GPU 内存。	code	2
Stochastic RAG: End-to-End Retrieval-Augmented Generation through Expected Utility Maximization	Hamed Zamani, Michael Bendersky	Google; University of Massachusetts Amherst	This paper introduces Stochastic RAG--a novel approach for end-to-end optimization of retrieval-augmented generation (RAG) models that relaxes the simplifying assumptions of marginalization and document independence, made in most prior work. Stochastic RAG casts the retrieval process in RAG as a stochastic sampling without replacement process. Through this formulation, we employ straight-through Gumbel-top-k that provides a differentiable approximation for sampling without replacement and enables effective end-to-end optimization for RAG. We conduct extensive experiments on seven diverse datasets on a wide range of tasks, from open-domain question answering to fact verification to slot-filling for relation extraction and to dialogue systems. By applying this optimization method to a recent and effective RAG model, we advance state-of-the-art results on six out of seven datasets.	本文介绍了随机RAG——一种用于检索增强生成（RAG）模型端到端优化的创新方法，该方法放松了大多数先前工作中所做的边缘化和文档独立性的简化假设。随机RAG将RAG中的检索过程视为一个无放回的随机抽样过程。通过这一表述，我们采用了直接通过Gumbel-top-k方法，该方法为无放回抽样提供了一个可微分的近似，并实现了对RAG的有效端到端优化。我们在七个多样化的数据集上进行了广泛的实验，涵盖了从开放领域问答、事实验证、关系抽取的槽填充到对话系统等一系列任务。通过将这种优化方法应用于一个最新且有效的RAG模型，我们在七个数据集中的六个上取得了最先进的结果。	code	2
What do Users Really Ask Large Language Models? An Initial Log Analysis of Google Bard Interactions in the Wild	Johanne R. Trippas, Sara Fahad Dawood Al Lawati, Joel Mackenzie, Luke Gallagher				code	2
GraphGPT: Graph Instruction Tuning for Large Language Models	Jiabin Tang, Yuhao Yang, Wei Wei, Lei Shi, Lixin Su, Suqi Cheng, Dawei Yin, Chao Huang				code	2
UniSAR: Modeling User Transition Behaviors between Search and Recommendation	Teng Shi, Zihua Si, Jun Xu, Xiao Zhang, Xiaoxue Zang, Kai Zheng, Dewei Leng, Yanan Niu, Yang Song	Renmin University of China Gaoling School of Artificial Intelligence; Kuaishou Technology Co., Ltd.	Nowadays, many platforms provide users with both search and recommendation services as important tools for accessing information. The phenomenon has led to a correlation between user search and recommendation behaviors, providing an opportunity to model user interests in a fine-grained way. Existing approaches either model user search and recommendation behaviors separately or overlook the different transitions between user search and recommendation behaviors. In this paper, we propose a framework named UniSAR that effectively models the different types of fine-grained behavior transitions for providing users a Unified Search And Recommendation service. Specifically, UniSAR models the user transition behaviors between search and recommendation through three steps: extraction, alignment, and fusion, which are respectively implemented by transformers equipped with pre-defined masks, contrastive learning that aligns the extracted fine-grained user transitions, and cross-attentions that fuse different transitions. To provide users with a unified service, the learned representations are fed into the downstream search and recommendation models. Joint learning on both search and recommendation data is employed to utilize the knowledge and enhance each other. Experimental results on two public datasets demonstrated the effectiveness of UniSAR in terms of enhancing both search and recommendation simultaneously. The experimental analysis further validates that UniSAR enhances the results by successfully modeling the user transition behaviors between search and recommendation.	
目前，许多平台为用户提供搜索和推荐服务，作为获取信息的重要工具。这种现象导致了用户搜索和推荐行为之间的相关性，为用户兴趣的细粒度建模提供了机会。现有的方法或者分别模拟用户搜索和推荐行为，或者忽略用户搜索和推荐行为之间的不同转换。在本文中，我们提出一个名为 UniSAR 的框架，有效地模拟不同类型的细粒度行为转换，为用户提供一个统一的搜索和推荐服务。具体而言，UniSAR 通过三个步骤对搜索和推荐之间的用户转换行为进行建模: 提取、对齐和融合，这些步骤分别由配备预定义掩码的变压器实现，对比学习将提取的细粒度用户转换对齐，交叉注意融合不同的转换。为了向用户提供统一的服务，学习表示被反馈到下游搜索和推荐模型中。对搜索数据和推荐数据进行联合学习，以利用知识并相互增强。在两个公共数据集上的实验结果证明了 UniSAR 在同时提高搜索和推荐能力方面的有效性。实验分析进一步验证了 UniSAR 通过成功地模拟用户在搜索和推荐之间的转换行为，提高了结果的准确性。	code	1
Poisoning Decentralized Collaborative Recommender System and Its Countermeasures	Ruiqi Zheng, Liang Qu, Tong Chen, Kai Zheng, Yuhui Shi, Hongzhi Yin	The University of Queensland; University of Electronic Science and Technology of China; Southern University of Science and Technology; The University of Queensland School of Electrical Engineering and Computer Science	To make room for privacy and efficiency, the deployment of many recommender systems is experiencing a shift from central servers to personal devices, where the federated recommender systems (FedRecs) and decentralized collaborative recommender systems (DecRecs) are arguably the two most representative paradigms. While both leverage knowledge (e.g., gradients) sharing to facilitate learning local models, FedRecs rely on a central server to coordinate the optimization process, yet in DecRecs, the knowledge sharing directly happens between clients. Knowledge sharing also opens a backdoor for model poisoning attacks, where adversaries disguise themselves as benign clients and disseminate polluted knowledge to achieve malicious goals like promoting an item's exposure rate. Although research on such poisoning attacks provides valuable insights into finding security loopholes and corresponding countermeasures, existing attacks mostly focus on FedRecs, and are either inapplicable or ineffective for DecRecs. Compared with FedRecs where the tampered information can be universally distributed to all clients once uploaded to the cloud, each adversary in DecRecs can only communicate with neighbor clients of a small size, confining its impact to a limited range. To fill the gap, we present a novel attack method named Poisoning with Adaptive Malicious Neighbors (PAMN). With item promotion in top-K recommendation as the attack objective, PAMN effectively boosts target items' ranks with several adversaries that emulate benign clients and transfers adaptively crafted gradients conditioned on each adversary's neighbors. 
Moreover, with the vulnerabilities of DecRecs uncovered, a dedicated defensive mechanism based on user-level gradient clipping with sparsified updating is proposed. Extensive experiments demonstrate the effectiveness of the poisoning attack and the robustness of our defensive mechanism.	为了给隐私和效率腾出空间，许多推荐系统的部署正在经历从中央服务器到个人设备的转变，其中联邦推荐系统(FedRecs)和分散式协作推荐系统(DecRecs)可以说是两个最具代表性的范例。虽然两者都利用知识共享(例如，梯度)来促进本地模型的学习，FedRecs 依赖于一个中央服务器来协调优化过程，但在 DecRecs 中，知识共享直接发生在客户之间。知识共享还为模型中毒攻击打开了一个后门，在这种攻击中，对手把自己伪装成良性的客户，传播受污染的知识，以达到恶意目的，比如提高项目的曝光率。尽管对这类中毒攻击的研究为发现安全漏洞和相应的对策提供了有价值的见解，但现有的攻击主要集中在 FedRecs 上，对 DecRecs 要么不适用，要么无效。与 FedRecs 相比，DecRecs 中的每个对手只能与小规模的邻居客户机通信，将其影响限制在有限的范围内。为了填补这一空白，我们提出了一种新的攻击方法，称为自适应恶意邻居中毒(PAMN)。通过在 top-K 推荐中的物品推广作为攻击目标，PAMN 可以有效地提高目标物品的等级，其中有几个对手可以模仿良性客户，并根据每个对手的邻居传输自适应的精心制作的渐变。此外，针对 DecRecs 的漏洞，提出了一种基于用户级梯度裁剪和稀疏更新的专用防御机制。大量的实验证明了中毒攻击的有效性和我们防御机制的鲁棒性。	code	1
Resources for Combining Teaching and Research in Information Retrieval Coursework	Maik Fröbe, Harrisen Scells, Theresa Elstner, Christopher Akiki, Lukas Gienapp, Jan Heinrich Reimer, Sean MacAvaney, Benno Stein, Matthias Hagen, Martin Potthast	Institute for Computer Science, Friedrich-Schiller-Universität Jena, Jena, Germany; University of Glasgow, Glasgow, United Kingdom; Bauhaus-Universität Weimar, Weimar, Germany; Informatik, Leipzig University, Leipzig, Germany; Leipzig University, Leipzig, Germany; University of Kassel, hessian.AI, and ScaDS.AI, Kassel, Germany; Friedrich-Schiller-Universität Jena, Jena, Germany	The first International Workshop on Open Web Search (WOWS) was held on Thursday, March 28th, at ECIR 2024 in Glasgow, UK. The full-day workshop had two calls for contributions: the first call aimed at scientific contributions to building, operating, and evaluating search engines cooperatively and the cooperative use of the web as a resource for researchers and innovators. The second call for implementations of retrieval components aimed to gain practical experience with joint, cooperative evaluation of search engines and their components. In total, 2 papers were accepted for the first call, and 11 software components were submitted for the second. The workshop ended with breakout sessions on how the OpenWebSearch.eu project can incorporate collaborative evaluations and a hub of search engines.	首届开放网络搜索(WOWS)国际研讨会于3月28日(星期四)在英国格拉斯哥的 ECIR 2024上举行。为期一天的讲习班有两项要求作出贡献的呼吁: 第一项呼吁旨在对合作建立、运营和评价搜索引擎作出科学贡献，以及合作利用网络作为研究人员和创新者的资源。第二个实施检索组件的呼吁旨在通过联合、合作评估搜索引擎及其组件获得实际经验。第一次调用共接受2篇论文，第二次调用提交了11个软件组件。研讨会最后分组讨论了 openwebsearch.eu 项目如何将协作评估和搜索引擎中心结合起来。	code	1
Leveraging LLMs for Unsupervised Dense Retriever Ranking	Ekaterina Khramtsova, Shengyao Zhuang, Mahsa Baktashmotlagh, Guido Zuccon		This paper introduces a novel unsupervised technique that utilizes large language models (LLMs) to determine the most suitable dense retriever for a specific test(target) corpus. Selecting the appropriate dense retriever is vital for numerous IR applications that employ these retrievers, trained on public datasets, to encode or conduct searches within a new private target corpus. The effectiveness of a dense retriever can significantly diminish when applied to a target corpus that diverges in domain or task from the original training set. The problem becomes more pronounced in cases where the target corpus is unlabeled, e.g. in zero-shot scenarios, rendering direct evaluation of the model's effectiveness on the target corpus unattainable. Therefore, the unsupervised selection of an optimally pre-trained dense retriever, especially under conditions of domain shift, emerges as a critical challenge. Existing methodologies for ranking dense retrievers fall short in addressing these domain shift scenarios. To tackle this, our method capitalizes on LLMs to create pseudo-relevant queries, labels, and reference lists by analyzing a subset of documents from the target corpus. This allows for the ranking of dense retrievers based on their performance with these pseudo-relevant signals. Significantly, this strategy is the first to depend exclusively on the target corpus data, removing the necessity for training data and test labels. We assessed the effectiveness of our approach by compiling a comprehensive pool of cutting-edge dense retrievers and comparing our method against traditional dense retriever selection benchmarks. The findings reveal that our proposed solution surpasses the existing benchmarks in both the selection and ranking of dense retrievers.	
本文介绍了一种新的无监督检索技术，该技术利用大语言模型(LLM)来确定特定测试(目标)语料库中最适合的密集检索器。选择合适的密集检索器对于许多 IR 应用程序至关重要，这些应用程序使用这些检索器，在公共数据集上进行培训，以便在新的私有目标语料库中进行编码或搜索。密集检索器的有效性可以显著降低时，应用于目标语料库的领域或任务偏离原来的训练集。在目标语料没有标记的情况下，这个问题变得更加明显，例如，在零射击情况下，无法直接评估模型对目标语料的有效性。因此，无监督选择最佳预训练密集检索，特别是在领域移动的条件下，出现了一个关键的挑战。现有的密集检索器排名方法在解决这些领域转移场景方面存在不足。为了解决这个问题，我们的方法利用 LLM 通过分析目标语料库中的文档子集来创建伪相关查询、标签和引用列表。这允许根据这些伪相关信号的性能对密集检索器进行排名。值得注意的是，这个策略是第一个完全依赖于目标语料库数据的策略，它消除了培训数据和测试标签的必要性。我们评估了我们的方法的有效性，编制了一个全面的前沿密集检索器库，并将我们的方法与传统的密集检索器选择基准进行了比较。研究结果表明，我们提出的解决方案在密集检索器的选择和排序方面都超过了现有的基准。	code	1
Large Language Models for Intent-Driven Session Recommendations	Zhu Sun, Hongyang Liu, Xinghua Qu, Kaidong Feng, Yan Wang, Yew Soon Ong	Macquarie University; A*STAR Centre for Frontier AI Research and Nanyang Technological University; Agency for Science, Technology and Research; Shanda Group AI Lab; Yanshan University	Intent-aware session recommendation (ISR) is pivotal in discerning user intents within sessions for precise predictions. Traditional approaches, however, face limitations due to their presumption of a uniform number of intents across all sessions. This assumption overlooks the dynamic nature of user sessions, where the number and type of intentions can significantly vary. In addition, these methods typically operate in latent spaces, thus hindering the model's transparency. Addressing these challenges, we introduce a novel ISR approach, utilizing the advanced reasoning capabilities of large language models (LLMs). First, this approach begins by generating an initial prompt that guides LLMs to predict the next item in a session, based on the varied intents manifested in user sessions. Then, to refine this process, we introduce an innovative prompt optimization mechanism that iteratively self-reflects and adjusts prompts. Furthermore, our prompt selection module, built upon the LLMs' broad adaptability, swiftly selects the most optimized prompts across diverse domains. This new paradigm empowers LLMs to discern diverse user intents at a semantic level, leading to more accurate and interpretable session recommendations. Our extensive experiments on three real-world datasets demonstrate the effectiveness of our method, marking a significant advancement in ISR systems.	
意图感知会话推荐(ISR)在识别会话中的用户意图以进行精确预测方面非常关键。然而，传统的方法由于假定所有会话的意图数量一致而面临局限性。这个假设忽略了用户会话的动态特性，其中意图的数量和类型可能有很大的不同。此外，这些方法通常在潜在的空间操作，从而阻碍了模型的透明度。针对这些挑战，我们引入了一种新的 ISR 方法，利用大型语言模型(LLM)的高级推理能力。首先，这种方法首先生成一个初始提示，指导 LLM 根据用户会话中显示的不同意图预测会话中的下一个项目。然后，为了完善这个过程，我们引入了一个创新的提示优化机制，它可以迭代地自我反映和调整提示。此外，我们的提示选择模块，建立在 LLM 的广泛适应性，迅速选择最优化的提示跨不同的领域。这种新的范式使 LLM 能够在语义层次上识别不同的用户意图，从而产生更加准确和可解释的会话建议。我们在三个实际数据集上的广泛实验证明了我们方法的有效性，标志着 ISR 系统的重大进步。	code	1
Scalable Community Search over Large-scale Graphs based on Graph Transformer	Yuxiang Wang, Xiaoxuan Gou, Xiaoliang Xu, Yuxia Geng, Xiangyu Ke, Tianxing Wu, Zhiyuan Yu, Runhuai Chen, Xiangying Wu	Hangzhou Dianzi University, Zhejiang, China; Southeast University, Jiangsu, China; Zhejiang University, Zhejiang, China; Hangzhou Dianzi University, Hangzhou, China	Given a graph G and a query node q, community search (CS) aims to find a structurally cohesive subgraph from G that contains q. CS is widely used in many real-world applications, such as online recommendation and expert finding. Recently, the rise of learning-based CS methods has garnered extensive research interests, showcasing the promising potential of neural solutions. However, there remains room for optimization: (1) They initialize node features via classical methods, e.g., one-hot, random, and position encoding, which may fall short in capturing valuable community cohesiveness-related features. (2) The reliance on GCN or GCN-like models poses challenges in scaling to large graphs. (3) Existing methods do not adapt well to dynamic graphs, often requiring retraining from scratch. To handle this, we present CSFormer, a scalable CS based on Graph Transformer. First, we present a novel l-hop neighborhood community vector based on n-order h-index to represent each node's community features, generating a sequence of feature vectors by varying the neighborhood scope l. Then, we build a Transformer backbone to learn a good graph embedding that carries rich community features, based on which we perform a prediction-filtering-based online CS to efficiently return a community of q. We extend CSFormer to dynamic graphs and various community models. Extensive experiments on seven real-world graphs show our solution's superiority on effectiveness, e.g., we attain an average improvement of 20.6% in F1-score compared to the latest competitors.	
给定一个图 G 和一个查询节点 q，社区搜索(Community Search，CS)旨在从 G 中找到一个包含 q 的结构内聚子图。CS 广泛应用于许多现实应用，例如在线推荐和专家发现。最近，基于学习的 CS 方法的兴起引起了广泛的研究兴趣，展示了神经解决方案的潜力。然而，仍有优化的空间: (1)它们通过传统的方法(如独热编码、随机编码和位置编码)初始化节点特征，这可能无法捕获有价值的社区内聚性相关特征。(2)对 GCN 或类似 GCN 的模型的依赖对扩展到大图提出了挑战。(3)现有方法不能很好地适应动态图，往往需要从头开始重新训练。为了处理这些问题，我们提出了 CSFormer，一种基于 Graph Transformer 的可扩展 CS 方法。首先，我们提出了一种新的基于 n 阶 h 指数的 l-hop 邻域社区向量来表示每个节点的社区特征，通过改变邻域范围 l 生成一系列的特征向量。然后，我们构建了一个 Transformer 主干来学习一个携带丰富社区特征的良好的图嵌入，在此基础上，我们进行基于预测-过滤的在线 CS 来高效地返回 q 的社区。我们将 CSFormer 扩展到动态图和各种社区模型。在七个现实世界图上的大量实验表明了我们的解决方案在有效性方面的优势，例如，我们在 F1得分上比最新的竞争对手平均提高了20.6% 。	code	1
LLM-Ensemble: Optimal Large Language Model Ensemble Method for E-commerce Product Attribute Value Extraction	Chenhao Fang, Xiaohan Li, Zezhong Fan, Jianpeng Xu, Kaushiki Nag, Evren Körpeoglu, Sushant Kumar, Kannan Achan	Walmart Global Tech; University of Wisconsin-Madison	Product attribute value extraction is a pivotal component in Natural Language Processing (NLP) and the contemporary e-commerce industry. The provision of precise product attribute values is fundamental in ensuring high-quality recommendations and enhancing customer satisfaction. The recently emerging Large Language Models (LLMs) have demonstrated state-of-the-art performance in numerous attribute extraction tasks, without the need for domain-specific training data. Nevertheless, varying strengths and weaknesses are exhibited by different LLMs due to the diversity in data, architectures, and hyperparameters. This variation makes them complementary to each other, with no single LLM dominating all others. Considering the diverse strengths and weaknesses of LLMs, it becomes necessary to develop an ensemble method that leverages their complementary potentials. In this paper, we propose a novel algorithm called LLM-ensemble to ensemble different LLMs' outputs for attribute value extraction. We iteratively learn the weights for different LLMs to aggregate the labels with weights to predict the final attribute value. Not only can our proposed method be proven theoretically optimal, but it also ensures efficient computation, fast convergence, and safe deployment. We have also conducted extensive experiments with various state-of-the-art LLMs, including Llama2-13B, Llama2-70B, PaLM-2, GPT-3.5, and GPT-4, on Walmart's internal data. 
Our offline metrics demonstrate that the LLM-ensemble method outperforms all the state-of-the-art single LLMs on Walmart's internal dataset. This method has been launched in several production models, leading to improved Gross Merchandise Volume (GMV), Click-Through Rate (CTR), Conversion Rate (CVR), and Add-to-Cart Rate (ATC).	产品属性值抽取是自然语言处理(NLP)和当代电子商务产业的关键组成部分。提供精确的产品属性值是确保高质量推荐和提高客户满意度的基础。最近出现的大型语言模型(LLM)已经在众多属性提取任务中展示了最先进的性能，而不需要特定于领域的训练数据。然而，由于数据、体系结构和超参数的多样性，不同的 LLM 表现出不同的优缺点。这种差异使它们相互补充，没有单一的 LLM 能够全面优于其他所有模型。考虑到 LLM 的不同优缺点，有必要发展一种利用它们互补潜力的集成方法。本文提出了一种新的 LLM-ensemble 算法，用于集成不同 LLM 的输出，从而实现属性值的提取。我们迭代学习不同 LLM 的权重，并按权重聚合标签来预测最终的属性值。该方法不仅可以在理论上证明是最优的，而且可以保证高效的计算、快速的收敛和安全的部署。我们还基于沃尔玛的内部数据，对各种最先进的 LLM 进行了广泛的实验，包括 Llama2-13B、 Llama2-70B、 PaLM-2、 GPT-3.5和 GPT-4。我们的离线指标表明，在沃尔玛的内部数据集上，LLM-ensemble 方法优于所有最先进的单一 LLM。这种方法已经在多个生产模型中上线，从而改善了商品交易总额(GMV)、点击率(CTR)、转化率(CVR)和加购率(ATC)。	code	1
Question Suggestion for Conversational Shopping Assistants Using Product Metadata	Nikhita Vedula, Oleg Rokhlenko, Shervin Malmasi	Amazon	Digital assistants have become ubiquitous in e-commerce applications, following the recent advancements in Information Retrieval (IR), Natural Language Processing (NLP) and Generative Artificial Intelligence (AI). However, customers are often unsure or unaware of how to effectively converse with these assistants to meet their shopping needs. In this work, we emphasize the importance of providing customers a fast, easy to use, and natural way to interact with conversational shopping assistants. We propose a framework that employs Large Language Models (LLMs) to automatically generate contextual, useful, answerable, fluent and diverse questions about products, via in-context learning and supervised fine-tuning. Recommending these questions to customers as helpful suggestions or hints to both start and continue a conversation can result in a smoother and faster shopping experience with reduced conversation overhead and friction. We perform extensive offline evaluations, and discuss in detail about potential customer impact, and the type, length and latency of our generated product questions if incorporated into a real-world shopping assistant.	随着信息检索(IR)、自然语言处理(NLP)和生成人工智能(AI)的最新进展，数字助理已经成为电子商务应用程序中无处不在的一部分。然而，顾客往往不确定或不知道如何有效地与这些助理交谈，以满足他们的购物需求。在这项工作中，我们强调的重要性，为客户提供一个快速，易于使用，自然的方式与会话购物助理互动。我们提出了一个框架，使用大语言模型(LLM)自动生成上下文，有用的，可回答的，流畅的和多样化的产品问题，通过上下文内学习和监督微调。将这些问题推荐给顾客，作为开始和继续谈话的有用建议或提示，可以使购物体验更加顺畅和快捷，减少谈话开销和摩擦。我们进行广泛的离线评估，并详细讨论潜在的客户影响，以及我们生成的产品问题的类型，长度和延迟，如果纳入一个现实世界的购物助理。	code	1
A Setwise Approach for Effective and Highly Efficient Zero-shot Ranking with Large Language Models	Shengyao Zhuang, Honglei Zhuang, Bevan Koopman, Guido Zuccon	The University of Queensland; Google Research; CSIRO	We propose a novel zero-shot document ranking approach based on Large Language Models (LLMs): the Setwise prompting approach. Our approach complements existing prompting approaches for LLM-based zero-shot ranking: Pointwise, Pairwise, and Listwise. Through the first-of-its-kind comparative evaluation within a consistent experimental framework and considering factors like model size, token consumption, latency, among others, we show that existing approaches are inherently characterised by trade-offs between effectiveness and efficiency. We find that while Pointwise approaches score high on efficiency, they suffer from poor effectiveness. Conversely, Pairwise approaches demonstrate superior effectiveness but incur high computational overhead. Our Setwise approach, instead, reduces the number of LLM inferences and the amount of prompt token consumption during the ranking procedure, compared to previous methods. This significantly improves the efficiency of LLM-based zero-shot ranking, while also retaining high zero-shot ranking effectiveness. We make our code and results publicly available at .	提出了一种新的基于大语言模型(LLM)的零拍文档排序方法: Setwise 提示方法。我们的方法补充了现有的基于 LLM 的零拍排名的提示方法: Pointwise、 Pairwise 和 Listwise。通过在一致的实验框架内进行首次比较评估，并考虑模型大小、令牌消耗、延迟等因素，我们发现现有方法的内在特征是在效率和效益之间进行权衡。我们发现，虽然 Pointwise 方法在效率上得分较高，但它们的效率较低。相反，成对方法显示出更好的有效性，但是会产生较高的计算开销。相反，与以前的方法相比，我们的 Setwise 方法减少了 LLM 推断的数量和排序过程中的提示令牌消耗量。这显著提高了基于 LLM 的零拍排序的效率，同时也保持了较高的零拍排序效率。我们将我们的代码和结果公开地提供给。	code	1
Ranked List Truncation for Large Language Model-based Re-Ranking	Chuan Meng, Negar Arabzadeh, Arian Askari, Mohammad Aliannejadi, Maarten de Rijke	University of Amsterdam; Leiden University; University of Waterloo	We study ranked list truncation (RLT) from a novel "retrieve-then-re-rank" perspective, where we optimize re-ranking by truncating the retrieved list (i.e., trim re-ranking candidates). RLT is crucial for re-ranking as it can improve re-ranking efficiency by sending variable-length candidate lists to a re-ranker on a per-query basis. It also has the potential to improve re-ranking effectiveness. Despite its importance, there is limited research into applying RLT methods to this new perspective. To address this research gap, we reproduce existing RLT methods in the context of re-ranking, especially newly emerged large language model (LLM)-based re-ranking. In particular, we examine to what extent established findings on RLT for retrieval are generalizable to the "retrieve-then-re-rank" setup from three perspectives: (i) assessing RLT methods in the context of LLM-based re-ranking with lexical first-stage retrieval, (ii) investigating the impact of different types of first-stage retrievers on RLT methods, and (iii) investigating the impact of different types of re-rankers on RLT methods. We perform experiments on the TREC 2019 and 2020 deep learning tracks, investigating 8 RLT methods for pipelines involving 3 retrievers and 2 re-rankers. We reach new insights into RLT methods in the context of re-ranking.	
我们从一个新的“检索-然后重新排名”的角度研究排名列表截断(RLT) ，其中我们通过截断检索列表(即，修剪重新排名的候选项)来优化重新排名。RLT 对重新排序至关重要，因为它可以按查询向重排序器发送可变长度的候选列表，从而提高重新排序的效率。它还具有提高重新排名有效性的潜力。尽管它很重要，但将 RLT 方法应用于这一新视角的研究仍然有限。为了解决这一研究差距，我们在重新排序的背景下重现了现有的 RLT 方法，特别是新出现的基于大语言模型(LLM)的重新排序方法。具体而言，我们从三个角度研究了 RLT 检索的既定发现在多大程度上可以推广到“检索然后重新排名”的设置: (i)在基于 LLM 的重新排名与词汇第一阶段检索的背景下评估 RLT 方法，(ii)调查不同类型的第一阶段检索器对 RLT 方法的影响，以及(iii)调查不同类型的重排序器对 RLT 方法的影响。我们在 TREC 2019年和2020年的深度学习赛道上进行了实验，研究了8种 RLT 方法，涉及由3个检索器和2个重排序器组成的管道。在重新排名的背景下，我们对 RLT 方法有了新的认识。	code	1
Fine-grained Textual Inversion Network for Zero-Shot Composed Image Retrieval	Haoqiang Lin, Haokun Wen, Xuemeng Song, Meng Liu, Yupeng Hu, Liqiang Nie	Harbin Institute of Technology (Shenzhen), Shenzhen, Guangdong, China; Shandong University, Jinan, Shandong, China; Shandong Jianzhu University, Jinan, Shandong, China; Shandong University, Qingdao, Shandong, China	Composed Image Retrieval (CIR) allows users to search target images with a multimodal query, comprising a reference image and a modification text that describes the user's modification demand over the reference image. Nevertheless, due to the expensive labor cost of training data annotation, recent researchers have shifted to the challenging task of zero-shot CIR (ZS-CIR), which targets fulfilling CIR without annotated triplets. The pioneer ZS-CIR studies focus on converting the CIR task into a standard text-to-image retrieval task by pre-training a textual inversion network that can map a given image into a single pseudo-word token. Despite their significant progress, their coarse-grained textual inversion may be insufficient to capture the full content of the image accurately. To overcome this issue, in this work, we propose a novel Fine-grained Textual Inversion Network for ZS-CIR, named FTI4CIR. In particular, FTI4CIR comprises two main components: fine-grained pseudo-word token mapping and tri-wise caption-based semantic regularization. The former maps the image into a subject-oriented pseudo-word token and several attribute-oriented pseudo-word tokens to comprehensively express the image in the textual form, while the latter works on jointly aligning the fine-grained pseudo-word tokens to the real-word token embedding space based on a BLIP-generated image caption template. Extensive experiments conducted on three benchmark datasets demonstrate the superiority of our proposed method.	
复合图像检索(CIR)允许用户通过多模态查询搜索目标图像，包括参考图像和描述用户对参考图像的修改要求的修改文本。然而，由于培训数据注释的昂贵人工成本，最近的研究人员已经转向具有挑战性的任务零射击 CIR (ZS-CIR) ，其目标是实现没有注释三联体的 CIR。先驱的 ZS-CIR 研究集中在通过预训练文本反演网络将给定的图像映射为单个伪单词标记，将 CIR 任务转换为标准的文本-图像检索任务。尽管取得了显著的进展，但粗粒度的文本反演可能不足以准确地捕捉图像的全部内容。为了克服这一问题，本文提出了一种新的 ZS-CIR 细粒度文本反演网络 FTI4CIR。具体而言，FTI4CIR 包括两个主要组成部分: 细粒度伪词标记映射和基于三分字幕的语义正则化。前者将图像映射为一个面向主题的伪词标记和若干个面向属性的伪词标记，以文本形式综合表示图像，后者基于 BLIP 生成的图像标题模板，将细粒度的伪词标记与实词标记嵌入空间联合对齐。在三个基准数据集上进行的大量实验表明了该方法的优越性。	code	1
Denoising Diffusion Recommender Model	Jujia Zhao, Wenjie Wang, Yiyan Xu, Teng Sun, Fuli Feng, TatSeng Chua	University of Science and Technology of China School of Data Science; National University of Singapore School of Computing; Leiden University; University of Science and Technology of China; National University of Singapore; Shandong University	Recommender systems often grapple with noisy implicit feedback. Most studies alleviate the noise issues from data cleaning perspective such as data resampling and reweighting, but they are constrained by heuristic assumptions. Another denoising avenue is from model perspective, which proactively injects noises into user-item interactions and enhances the intrinsic denoising ability of models. However, this kind of denoising process poses significant challenges to the recommender model's representation capacity to capture noise patterns. To address this issue, we propose Denoising Diffusion Recommender Model (DDRM), which leverages multi-step denoising process of diffusion models to robustify user and item embeddings from any recommender models. DDRM injects controlled Gaussian noises in the forward process and iteratively removes noises in the reverse denoising process, thereby improving embedding robustness against noisy feedback. To achieve this target, the key lies in offering appropriate guidance to steer the reverse denoising process and providing a proper starting point to start the forward-reverse process during inference. In particular, we propose a dedicated denoising module that encodes collaborative information as denoising guidance. Besides, in the inference stage, DDRM utilizes the average embeddings of users' historically liked items as the starting point rather than using pure noise since pure noise lacks personalization, which increases the difficulty of the denoising process. Extensive experiments on three datasets with three representative backend recommender models demonstrate the effectiveness of DDRM.	
推荐系统经常与含噪隐式反馈作斗争。大多数研究从数据清理的角度缓解噪声问题，如数据重采样和重新加权，但它们受到启发式假设的约束。另一种去噪方法是从模型的角度出发，主动地将噪声注入到用户-项目的交互中，提高模型的内在去噪能力。然而，这种去噪过程对推荐模型捕获噪声模式的表示能力提出了严峻的挑战。为了解决这一问题，我们提出了去噪扩散推荐模型(DDRM) ，它利用扩散模型的多步去噪过程来增强任何推荐模型中用户和项目嵌入的鲁棒性。DDRM 在正向过程中注入受控的高斯噪声，在反向去噪过程中迭代去除噪声，从而提高了嵌入对噪声反馈的鲁棒性。要实现这一目标，关键在于提供适当的指导来引导反向去噪过程，并在推理过程中提供适当的起点来启动正向-反向过程。特别是，我们提出了一个专用的去噪模块，将协作信息编码为去噪指导。此外，在推理阶段，DDRM 利用用户历史喜好项的平均嵌入作为起点，而非纯噪声，因为纯噪声缺乏个性化，会增加去噪过程的难度。在三个数据集上使用三个具有代表性的后端推荐模型进行的大量实验证明了 DDRM 的有效性。	code	1
Systematic Evaluation of Neural Retrieval Models on the Touché 2020 Argument Retrieval Subset of BEIR	Nandan Thakur, Luiz Bonifacio, Maik Fröbe, Alexander Bondarenko, Ehsan Kamalloo, Martin Potthast, Matthias Hagen, Jimmy Lin	UNICAMP, University of Waterloo; Friedrich-Schiller-Universität Jena; Leipzig University and ScaDS.AI; University of Waterloo	The zero-shot effectiveness of neural retrieval models is often evaluated on the BEIR benchmark – a combination of different IR evaluation datasets. Interestingly, previous studies found that particularly on the BEIR subset Touché 2020, an argument retrieval task, neural retrieval models are considerably less effective than BM25. Still, so far, no further investigation has been conducted on what makes argument retrieval so "special". To more deeply analyze the respective potential limits of neural retrieval models, we run a reproducibility study on the Touché 2020 data. In our study, we focus on two experiments: (i) a black-box evaluation (i.e., no model retraining), incorporating a theoretical exploration using retrieval axioms, and (ii) a data denoising evaluation involving post-hoc relevance judgments. Our black-box evaluation reveals an inherent bias of neural models towards retrieving short passages from the Touché 2020 data, and we also find that quite a few of the neural models' results are unjudged in the Touché 2020 data. As many of the short Touché passages are not argumentative and thus non-relevant per se, and as the missing judgments complicate fair comparison, we denoise the Touché 2020 data by excluding very short passages (less than 20 words) and by augmenting the unjudged data with post-hoc judgments following the Touché guidelines. On the denoised data, the effectiveness of the neural models improves by up to 0.52 in nDCG@10, but BM25 is still more effective. Our code and the augmented Touché 2020 dataset are available at .	
神经检索模型的零点效应通常是在 BEIR 基准上进行评估的，BEIR 基准是不同的 IR 评估数据集的组合。有趣的是，以往的研究发现，特别是在 BEIR 子集 Touché 2020(一个论点检索任务)上，神经检索模型的有效性明显低于 BM25。尽管如此，到目前为止，还没有进一步的调查，使论点检索如此“特殊”。为了更深入地分析神经检索模型各自的潜在局限性，我们对 Touché 2020数据进行了重复性研究。在我们的研究中，我们侧重于两个实验: (i)黑盒评估(即，没有模型再训练) ，结合使用检索公理的理论探索，和(ii)涉及事后相关性判断的数据去噪评估。我们的黑匣子评估揭示了神经模型对从 Touché 2020数据中检索短文本的固有偏见，并且我们还发现相当多的神经模型的结果在 Touché 2020数据中是未经判断的。由于许多简短的 Touché 段落并不具有争议性，因此本身并不相关，并且由于缺失的判断使公平比较复杂化，我们通过排除非常短的段落(少于20个单词)以及按照 Touché 指南用事后判断增加未经判断的数据来降低 Touché 2020数据的噪声。在去噪数据中，神经模型的有效性在 nDCG@10中提高了0.52，但 BM25仍然更有效。我们的代码和增强的 Touché 2020数据集可以在。	code	1
Generative Retrieval as Multi-Vector Dense Retrieval	Shiguang Wu, Wenda Wei, Mengqi Zhang, Zhumin Chen, Jun Ma, Zhaochun Ren, Maarten de Rijke, Pengjie Ren	Shandong University; University of Amsterdam; Leiden University	Generative retrieval generates identifiers of relevant documents in an end-to-end manner using a sequence-to-sequence architecture for a given query. The relation between generative retrieval and other retrieval methods, especially those based on matching within dense retrieval models, is not yet fully comprehended. Prior work has demonstrated that generative retrieval with atomic identifiers is equivalent to single-vector dense retrieval. Accordingly, generative retrieval exhibits behavior analogous to hierarchical search within a tree index in dense retrieval when using hierarchical semantic identifiers. However, prior work focuses solely on the retrieval stage without considering the deep interactions within the decoder of generative retrieval. In this paper, we fill this gap by demonstrating that generative retrieval and multi-vector dense retrieval share the same framework for measuring the relevance to a query of a document. Specifically, we examine the attention layer and prediction head of generative retrieval, revealing that generative retrieval can be understood as a special case of multi-vector dense retrieval. Both methods compute relevance as a sum of products of query and document vectors and an alignment matrix. We then explore how generative retrieval applies this framework, employing distinct strategies for computing document token vectors and the alignment matrix. We have conducted experiments to verify our conclusions and show that both paradigms exhibit commonalities of term matching in their alignment matrix.	
生成检索使用给定查询的序列到序列体系结构以端到端的方式生成相关文档的标识符。生成检索与其他检索方法之间的关系，特别是那些基于密集检索模型中匹配的检索方法之间的关系，还没有得到充分的理解。先前的工作已经证明，具有原子标识符的生成检索等价于单向量密集检索。相应地，在使用层次语义标识符进行密集检索时，生成检索表现出类似于树索引层次检索的行为。然而，先前的工作仅仅集中在检索阶段，而没有考虑生成检索解码器内部的深层交互作用。本文通过证明生成检索和多向量密集检索在测量文档查询的相关性方面具有相同的框架来填补这一空白。具体来说，我们考察了生成检索的注意层和预测头，发现生成检索可以理解为多向量密集检索的一个特例。这两种方法都将相关性计算为查询、文档向量和对齐矩阵的乘积之和。然后，我们探讨生成检索如何应用这个框架，使用不同的策略计算文档令牌向量和对齐矩阵。我们已经进行了实验来验证我们的结论，并表明这两种范例在它们的对齐矩阵中表现出术语匹配的共性。	code	1
A Workbench for Autograding Retrieve/Generate Systems	Laura Dietz	University of New Hampshire	This resource paper addresses the challenge of evaluating Information Retrieval (IR) systems in the era of autoregressive Large Language Models (LLMs). Traditional methods relying on passage-level judgments are no longer effective due to the diversity of responses generated by LLM-based systems. We provide a workbench to explore several alternative evaluation approaches to judge the relevance of a system's response that incorporate LLMs: 1. Asking an LLM whether the response is relevant; 2. Asking the LLM which set of nuggets (i.e., relevant key facts) is covered in the response; 3. Asking the LLM to answer a set of exam questions with the response. This workbench aims to facilitate the development of new, reusable test collections. Researchers can manually refine sets of nuggets and exam questions, observing their impact on system evaluation and leaderboard rankings. Resource available at https://github.com/TREMA-UNH/rubric-grading-workbench	本资源文件阐述了在自回归大语言模型(LLM)时代评估信息检索(IR)系统所面临的挑战。由于基于 LLM 的系统所产生的响应的多样性，传统的依赖于通道级判断的方法已不再有效。我们提供了一个工作台来探索几种替代的评估方法，以判断一个系统的响应的相关性，包括 LLM: 1。询问 LLM 的响应是否相关; 2。询问 LLM 回答中包含了哪些重要信息(即相关的关键事实) ; 3。要求 LLM 使用响应来回答一组考试问题。这个工作台旨在促进新的、可重用的测试集合的开发。研究人员可以手动完善成套的金块和考试问题，观察它们对系统评估和排行榜的影响。Https://github.com/trema-unh/rubric-grading-workbench 可提供的资源	code	1
Evaluating Generative Ad Hoc Information Retrieval	Lukas Gienapp, Harrisen Scells, Niklas Deckers, Janek Bevendorff, Shuai Wang, Johannes Kiesel, Shahbaz Syed, Maik Fröbe, Guido Zuccon, Benno Stein, Matthias Hagen, Martin Potthast	; Leipzig University & ScaDS.AI, Leipzig, Germany; Bauhaus-Universität Weimar, Weimar, Germany; University of Kassel & hessian.AI, Kassel, Germany; Leipzig University, Leipzig, Germany; Friedrich-Schiller-Universität Jena, Jena, Germany	Recent advances in large language models have enabled the development of viable generative information retrieval systems. A generative retrieval system returns a grounded generated text in response to an information need instead of the traditional document ranking. Quantifying the utility of these types of responses is essential for evaluating generative retrieval systems. As the established evaluation methodology for ranking-based ad hoc retrieval may seem unsuitable for generative retrieval, new approaches for reliable, repeatable, and reproducible experimentation are required. In this paper, we survey the relevant information retrieval and natural language processing literature, identify search tasks and system architectures in generative retrieval, develop a corresponding user model, and study its operationalization. This theoretical analysis provides a foundation and new insights for the evaluation of generative ad hoc retrieval systems.	大型语言模型的最新进展使得可行的生成信息检索系统的开发成为可能。一个生成检索系统返回一个接地生成的文本，以响应信息需求，而不是传统的文档排序。量化这些类型的响应的效用对于评估生成检索系统是必不可少的。由于已建立的基于排序的特别检索的评估方法似乎不适合于生成性检索，因此需要可靠的、可重复的和可重现的实验的新方法。在本文中，我们调查了相关的信息检索和自然语言处理文献，确定了生成检索中的搜索任务和系统架构，开发了相应的用户模型，并研究了它的操作主义。这一理论分析为生成式自组织检索系统的评估提供了基础和新的视角。	code	1
Embark on DenseQuest: A System for Selecting the Best Dense Retriever for a Custom Collection	Ekaterina Khramtsova, Teerapong Leelanupab, Shengyao Zhuang, Mahsa Baktashmotlagh, Guido Zuccon	The University of Queensland; University of Queensland	In this demo we present a web-based application for selecting an effective pre-trained dense retriever to use on a private collection. Our system, DenseQuest, provides unsupervised selection and ranking capabilities to predict the best dense retriever among a pool of available dense retrievers, tailored to an uploaded target collection. DenseQuest implements a number of existing approaches, including a recent, highly effective method powered by Large Language Models (LLMs), which requires neither queries nor relevance judgments. The system is designed to be intuitive and easy to use for those information retrieval engineers and researchers who need to identify a general-purpose dense retrieval model to encode or search a new private target collection. Our demonstration illustrates conceptual architecture and the different use case scenarios of the system implemented on the cloud, enabling universal access and use. DenseQuest is available at https://densequest.ielab.io.	在本演示中，我们展示了一个基于网络的应用程序，用于在私有数据集上选择有效的预训练密集检索器。我们的系统DenseQuest提供了无监督的选择和排序功能，以预测在一组可用密集检索器中，最适合上传目标集合的最佳密集检索器。DenseQuest实现了多种现有方法，包括一种由大型语言模型（LLMs）驱动的新近高效方法，该方法既不需要查询也不需要相关性判断。该系统设计得直观且易于使用，适用于那些需要为新的私有目标集合编码或搜索而识别通用密集检索模型的信息检索工程师和研究人员。我们的演示展示了系统的概念架构以及在云上实现的不同的使用案例场景，实现了普遍的访问和使用。DenseQuest可通过https://densequest.ielab.io访问。	code	1
QuanTemp: A real-world open-domain benchmark for fact-checking numerical claims	Venktesh V, Abhijit Anand, Avishek Anand, Vinay Setty	TU Delft; L3S Research Institute; University of Stavanger	With the growth of misinformation on the web, automated fact checking has garnered immense interest for detecting growing misinformation and disinformation. Current systems have made significant advancements in handling synthetic claims sourced from Wikipedia, and noteworthy progress has been achieved in addressing real-world claims that are verified by fact-checking organizations as well. We compile and release QuanTemp, a diverse, multi-domain dataset focused exclusively on numerical claims, encompassing comparative, statistical, interval, and temporal aspects, with detailed metadata and an accompanying evidence collection. This addresses the challenge of verifying real-world numerical claims, which are complex and often lack precise information, a gap not filled by existing works that mainly focus on synthetic claims. We evaluate and quantify these gaps in existing solutions for the task of verifying numerical claims. We also evaluate claim decomposition based methods and numerical understanding based natural language inference (NLI) models, and our best baseline achieves a macro-F1 of 58.32. This demonstrates that QuanTemp serves as a challenging evaluation set for numerical claim verification.	随着网络上错误信息的增多，自动化事实核查因其在检测不断增长的虚假信息和误导信息方面的潜力而引起了极大的关注。当前的系统在处理源自维基百科的合成声明方面取得了显著进展，同时在解决由事实核查组织验证的真实世界声明方面也取得了值得注意的进步。我们编制并发布了QuanTemp，这是一个专注于数值声明的多领域、多样化的数据集，涵盖了比较性、统计性、区间性和时间性方面，并附有详细的元数据和相应的证据集合。这一数据集解决了验证真实世界数值声明的挑战，这些声明通常复杂且缺乏精确信息，而现有的工作主要集中在合成声明上，未能填补这一空白。我们评估并量化了现有解决方案在验证数值声明任务中的差距。我们还评估了基于声明分解的方法和基于数值理解的自然语言推理（NLI）模型，我们的最佳基线模型的宏F1得分为58.32。这表明QuanTemp可作为数值声明验证的具有挑战性的评估集。	code	1
Instruction-based Hypergraph Pretraining	Mingdai Yang, Zhiwei Liu, Liangwei Yang, Xiaolong Liu, Chen Wang, Hao Peng, Philip S. Yu				code	1
Characterizing Information Seeking Processes with Multiple Physiological Signals	Kaixin Ji, Danula Hettiachchi, Flora D. Salim, Falk Scholer, Damiano Spina				code	1
Resources for Brewing BEIR: Reproducible Reference Models and Statistical Analyses	Ehsan Kamalloo, Nandan Thakur, Carlos Lassance, Xueguang Ma, JhengHong Yang, Jimmy Lin	Naver Labs Europe; University of Waterloo	BEIR is a benchmark dataset originally designed for zero-shot evaluation of retrieval models across 18 different domain/task combinations. In recent years, we have witnessed the growing popularity of models based on representation learning, which naturally begs the question: How effective are these models when presented with queries and documents that differ from the training data? While BEIR was designed to answer this question, our work addresses two shortcomings that prevent the benchmark from achieving its full potential: First, the sophistication of modern neural methods and the complexity of current software infrastructure create barriers to entry for newcomers. To this end, we provide reproducible reference implementations that cover learned dense and sparse models. Second, comparisons on BEIR are performed by reducing scores from heterogeneous datasets into a single average that is difficult to interpret. To remedy this, we present meta-analyses focusing on effect sizes across datasets that are able to accurately quantify model differences. By addressing both shortcomings, our work facilitates future explorations in a range of interesting research questions.	BEIR是一个基准数据集，最初设计用于对跨18个不同领域/任务组合的检索模型进行零样本评估。近年来，我们见证了基于表示学习的模型日益流行，这自然引出了一个问题：当面对与训练数据不同的查询和文档时，这些模型的效果如何？尽管BEIR旨在回答这个问题，但我们的工作针对该基准存在的两个缺陷，这两个缺陷阻碍了其充分发挥潜力：首先，现代神经方法的复杂性和当前软件基础设施的复杂性为新进入者设置了门槛。为此，我们提供了涵盖学习型密集和稀疏模型的可重现参考实现。其次，在BEIR上的比较是通过将来自异构数据集的分数简化为一个难以解释的单一平均值来进行的。为了解决这个问题，我们提出了专注于跨数据集效应大小的元分析，这些分析能够准确量化模型差异。通过解决这两个缺陷，我们的工作有助于未来在各种有趣的研究问题上的探索。	code	1
On the Evaluation of Machine-Generated Reports	James Mayfield, Eugene Yang, Dawn J. Lawrie, Sean MacAvaney, Paul McNamee, Douglas W. Oard, Luca Soldaini, Ian Soboroff, Orion Weller, Efsun Selin Kayi, Kate Sanders, Marc Mason, Noah Hibbler	University of Maryland; University of Glasgow; Allen Institute for AI; NIST; Johns Hopkins University	Large Language Models (LLMs) have enabled new ways to satisfy information needs. Although great strides have been made in applying them to settings like document ranking and short-form text generation, they still struggle to compose complete, accurate, and verifiable long-form reports. Reports with these qualities are necessary to satisfy the complex, nuanced, or multi-faceted information needs of users. In this perspective paper, we draw together opinions from industry and academia, and from a variety of related research areas, to present our vision for automatic report generation, and, critically, a flexible framework by which such reports can be evaluated. In contrast with other summarization tasks, automatic report generation starts with a detailed description of an information need, stating the necessary background, requirements, and scope of the report. Further, the generated reports should be complete, accurate, and verifiable. These qualities, which are desirable, if not required, in many analytic report-writing settings, require rethinking how to build and evaluate systems that exhibit these qualities. To foster new efforts in building these systems, we present an evaluation framework that draws on ideas found in various evaluations. To test completeness and accuracy, the framework uses nuggets of information, expressed as questions and answers, that need to be part of any high-quality generated report. Additionally, evaluation of citations that map claims made in the report to their source documents ensures verifiability.	
大型语言模型（LLMs）已开辟了满足信息需求的新途径。尽管在将它们应用于文档排序和短文本生成等领域取得了显著进展，但它们在撰写完整、准确且可验证的长篇报告方面仍面临挑战。这些特性的报告对于满足用户复杂、微妙或多方面的信息需求是必要的。在这篇观点论文中，我们汇集了来自行业和学术界以及各种相关研究领域的意见，提出了我们对自动报告生成的愿景，并——关键的是——提出了一个灵活的评估框架，用于评估这些报告。与其它摘要任务不同，自动报告生成始于对信息需求的详细描述，明确必要的背景、要求和报告范围。此外，生成的报告应具备完整性、准确性和可验证性。这些特性，虽然在许多分析报告撰写场景中是可取的，甚至是必需的，但需要重新思考如何构建和评估展现这些特性的系统。为了促进构建这些系统的新努力，我们提出了一个评估框架，借鉴了各种评估中的理念。为了测试完整性和准确性，该框架使用了信息片段，这些片段以问答形式表达，需要成为任何高质量生成报告的一部分。此外，对报告中引用的评估，即将报告中的主张与其来源文档对应起来，确保了可验证性。	code	1
Are Large Language Models Good at Utility Judgments?	Hengran Zhang, Ruqing Zhang, Jiafeng Guo, Maarten de Rijke, Yixing Fan, Xueqi Cheng				code	1
CWRCzech: 100M Query-Document Czech Click Dataset and Its Application to Web Relevance Ranking	Josef Vonásek, Milan Straka, Rostislav Krc, Lenka Lasonová, Ekaterina Egorova, Jana Straková, Jakub Náplava	Institute of Formal and Applied Linguistics, Charles University; Seznam.cz	We present CWRCzech, Click Web Ranking dataset for Czech, a 100M query-document Czech click dataset for relevance ranking with user behavior data collected from search engine logs of Seznam.cz. To the best of our knowledge, CWRCzech is the largest click dataset with raw text published so far. It provides document positions in the search results as well as information about user behavior: 27.6M clicked documents and 10.8M dwell times. In addition, we also publish a manually annotated Czech test for the relevance task, containing nearly 50k query-document pairs, each annotated by at least 2 annotators. Finally, we analyze how the user behavior data improve relevance ranking and show that models trained on data automatically harnessed at sufficient scale can surpass the performance of models trained on human annotated data. CWRCzech is published under an academic non-commercial license and is available to the research community at https://github.com/seznam/CWRCzech.	我们为捷克提供了一个100M 的查询文档捷克点击数据集，用于从 Seznam.cz 的搜索引擎日志中收集的用户行为数据进行相关性排名。据我们所知，CWR 捷克是迄今为止发布原始文本的最大的点击数据集。它提供了搜索结果中的文档位置以及关于用户行为的信息: 27.6 M 的单击文档和10.8 M 的停留时间。此外，我们还为相关任务发布了一个手动注释的捷克测试，包含近50k 个查询-文档对，每个查询-文档对至少由2个注释者进行注释。最后，我们分析了用户行为数据如何提高相关性排名，并表明在足够大的规模上自动利用数据训练的模型可以超过在人类注释数据上训练的模型的性能。捷克语研究中心以学术非商业许可证的形式发表论文，研究团体可以在 https://github.com/seznam/CWRCzech 获得该论文。	code	0
A Unified Search and Recommendation Framework Based on Multi-Scenario Learning for Ranking in E-commerce	Jinhan Liu, Qiyu Chen, Junjie Xu, Junjie Li, Baoli Li, Sulong Xu	JD; JD or JD.com	Search and recommendation (S&R) are the two most important scenarios in e-commerce. The majority of users typically interact with products in S&R scenarios, indicating the need and potential for joint modeling. Traditional multi-scenario models use shared parameters to learn the similarity of multiple tasks, and task-specific parameters to learn the divergence of individual tasks. This coarse-grained modeling approach does not effectively capture the differences between S&R scenarios. Furthermore, this approach does not sufficiently exploit the information across the global label space. These issues can result in the suboptimal performance of multi-scenario models in handling both S&R scenarios. To address these issues, we propose an effective and universal framework for Unified Search and Recommendation (USR), designed with S&R Views User Interest Extractor Layer (IE) and S&R Views Feature Generator Layer (FG) to separately generate user interests and scenario-agnostic feature representations for S&R. Next, we introduce a Global Label Space Multi-Task Layer (GLMT) that uses global labels as supervised signals of auxiliary tasks and jointly models the main task and auxiliary tasks using conditional probability. Extensive experimental evaluations on real-world industrial datasets show that USR can be applied to various multi-scenario models and significantly improve their performance. Online A/B testing also indicates substantial performance gains across multiple metrics. Currently, USR has been successfully deployed in the 7Fresh App.	
搜索和推荐(S&R)是电子商务中最重要的两种场景。大多数用户通常在 S&R 场景中与产品交互，这表明了联合建模的需要和潜力。传统的多场景模型使用共享参数来学习多个任务的相似性，使用特定任务的参数来学习单个任务的差异性。这种粗粒度建模方法不能有效地捕获 S&R 场景之间的差异。此外，这种方法不能充分利用全局标签空间中的信息。这些问题可能导致多场景模型在处理两个 S&R 场景时的次优性能。为了解决这些问题，我们提出了一个有效和通用的统一搜索和推荐(USR)框架，该框架使用 S&R 视图用户兴趣提取层(IE)和 S&R 视图特征生成层(FG)分别生成用户兴趣和场景无关的特征表示。接下来，我们引入了一个全局标签空间多任务层(GLMT) ，它使用全局标签作为辅助任务的监督信号，并使用条件概率联合建模主要任务和辅助任务。对现实世界工业数据集的大量实验评估表明，USR 可以应用于各种多场景模型，并显著提高其性能。在线 A/B 测试还表明跨多个指标的性能显著提高。目前，USR 已经成功地部署在7Fresh 应用程序中。	code	0
Improving Embedding-Based Retrieval in Friend Recommendation with ANN Query Expansion	Pau PerngHwa Kung, Zihao Fan, Tong Zhao, Yozen Liu, Zhixin Lai, Jiahui Shi, Yan Wu, Jun Yu, Neil Shah, Ganesh Venkataraman	Snap	Embedding-based retrieval in graph-based recommendation has shown great improvements over traditional graph walk retrieval methods, and has been adopted in large-scale industry applications such as friend recommendations [16]. However, it is not without its challenges: retraining graph embeddings frequently due to changing data is slow and costly, and producing high recall of approximate nearest neighbor search (ANN) on such embeddings is challenging due to the power law distribution of the indexed users. In this work, we address these issues by introducing a simple query expansion method in ANN, called FriendSeedSelection, where for each node query, we construct a set of 1-hop embeddings and run ANN search. We highlight that our approach does not require any model-level tuning, and is inferred from the data at test-time. This design choice effectively enables our recommendation system to adapt to the changing graph distribution without frequent heavy model retraining. We also discuss how we design our system to efficiently construct such queries online to support 10k+ QPS. For friend recommendation, our method improves recall and yields an 11% relative gain in the friend reciprocated communication metric, and it now serves over 800 million monthly active users at Snapchat.	在基于图的推荐中，基于嵌入的检索相比传统的图游走检索方法已显示出巨大改进，并且已经被好友推荐等大规模工业应用所采用[16]。然而，这并非没有挑战: 由于数据的变化而频繁地重新训练图嵌入是缓慢和昂贵的，并且由于索引用户的幂律分布，在这种嵌入上实现高召回率的近似最近邻搜索(ANN)是具有挑战性的。在这项工作中，我们通过在 ANN 中引入一个简单的查询扩展方法 FriendSeedSelection 来解决这些问题: 对于每个节点查询，我们构造一组1跳嵌入并运行 ANN 搜索。我们强调我们的方法不需要任何模型级别的调优，并且是从测试时的数据中推断出来的。这种设计选择有效地使我们的推荐系统能够适应变化的图分布，而不需要频繁的重量级模型再训练。我们还讨论了如何设计我们的系统，以便在线高效地构建此类查询，支持10k+ QPS。对于好友推荐，我们的方法提升了召回率，并使好友互惠通信指标相对提升了11%，现在为 Snapchat 超过8亿的月活跃用户提供服务。	code	0
Doing Personal LAPS: LLM-Augmented Dialogue Construction for Personalized Multi-Session Conversational Search	Hideaki Joko, Shubham Chatterjee, Andrew Ramsay, Arjen P. de Vries, Jeff Dalton, Faegheh Hasibi	Radboud University; University of Edinburgh; University of Glasgow	The future of conversational agents will provide users with personalized information responses. However, a significant challenge in developing models is the lack of large-scale dialogue datasets that span multiple sessions and reflect real-world user preferences. Previous approaches rely on experts in a wizard-of-oz setup that is difficult to scale, particularly for personalized tasks. Our method, LAPS, addresses this by using large language models (LLMs) to guide a single human worker in generating personalized dialogues. This method has proven to speed up the creation process and improve quality. LAPS can collect large-scale, human-written, multi-session, and multi-domain conversations, including extracting user preferences. When compared to existing datasets, LAPS-produced conversations are as natural and diverse as expert-created ones, which stays in contrast with fully synthetic methods. The collected dataset is suited to train preference extraction and personalized response generation. Our results show that responses generated explicitly using extracted preferences better match user's actual preferences, highlighting the value of using extracted preferences over simple dialogue history. Overall, LAPS introduces a new method to leverage LLMs to create realistic personalized conversational data more efficiently and effectively than previous methods.	
会话代理的未来将为用户提供个性化的信息响应。然而，在开发模型方面的一个重大挑战是缺乏跨越多个会议并反映真实世界用户偏好的大规模对话数据集。以前的方法依赖于难以伸缩的绿色向导设置中的专家，特别是对于个性化任务。我们的方法 LAPS 通过使用大型语言模型(LLM)来指导单个人类工作者生成个性化对话来解决这个问题。这种方法已被证明可以加快创作过程，提高质量。LAPS 可以收集大规模、人工编写、多会话和多域会话，包括提取用户首选项。与现有的数据集相比，LAPS 产生的对话和专家创建的对话一样自然和多样化，这与完全合成的方法形成了对比。采集的数据集适合于训练偏好提取和个性化响应生成。我们的研究结果表明，明确使用提取的偏好生成的响应更好地匹配用户的实际偏好，突出了使用提取的偏好的价值超过简单的对话历史。总的来说，LAPS 引入了一种新的方法，利用 LLM 创建真实的个性化会话数据，比以前的方法更有效率和效果。	code	0
Towards a Search Engine for Machines: Unified Ranking for Multiple Retrieval-Augmented Large Language Models	Alireza Salemi, Hamed Zamani	University of Massachusetts Amherst	This paper introduces uRAG–a framework with a unified retrieval engine that serves multiple downstream retrieval-augmented generation (RAG) systems. Each RAG system consumes the retrieval results for a unique purpose, such as open-domain question answering, fact verification, entity linking, and relation extraction. We introduce a generic training guideline that standardizes the communication between the search engine and the downstream RAG systems that engage in optimizing the retrieval model. This lays the groundwork for us to build a large-scale experimentation ecosystem consisting of 18 RAG systems that engage in training and 18 unknown RAG systems that use the uRAG as the new users of the search engine. Using this experimentation ecosystem, we answer a number of fundamental research questions that improve our understanding of promises and challenges in developing search engines for machines.	本文介绍了一个统一检索引擎的框架 uRAG，它可以为多个下游检索增强生成(RAG)系统提供服务。每个 RAG 系统都为一个独特的目的使用检索结果，例如开放域问题回答、事实验证、实体链接和关系提取。我们引入了一个通用的培训指导方针，它标准化了搜索引擎与下游 RAG 系统之间的通信，这些 RAG 系统参与优化检索模型。这为我们建立一个大规模实验生态系统奠定了基础，该系统包括18个参与培训的 RAG 系统和18个使用 uRAG 作为搜索引擎新用户的未知 RAG 系统。利用这个实验生态系统，我们回答了许多基础研究问题，这些问题提高了我们对开发机器搜索引擎的承诺和挑战的理解。	code	0
Sequential Recommendation with Collaborative Explanation via Mutual Information Maximization	Yi Yu, Kazunari Sugiyama, Adam Jatowt	University of Innsbruck; Kyoto University; Osaka Seikei University	Current research on explaining sequential recommendations lacks reliable benchmarks and quantitative metrics, making it difficult to compare explanation performance between different models. In this work, we propose a new explanation type, namely, collaborative explanation, into sequential recommendation, allowing a unified approach for modeling user actions and assessing the performance of both recommendation and explanation. We accomplish this by framing the problem as a joint sequential prediction task, which takes a sequence of user's past item-explanation pairs and predicts the next item along with its associated explanation. We propose a pipeline that comprises data preparation and a model adaptation framework called Sequential recommendation with Collaborative Explanation (SCE). This framework can be flexibly applied to any sequential recommendation model for this problem. Furthermore, to address the issue of inconsistency between item and explanation representations when learning both sub-tasks, we propose Sequential recommendation with Collaborative Explanation via Mutual Information Maximization (SCEMIM). Our extensive experiments demonstrate that: (i) SCE framework is effective in enabling sequential models to make recommendations and provide accurate explanations. (ii) Importantly, SCEMIM enhances the consistency between recommendations and explanations, leading to further improvements in the performance of both sub-tasks.	
目前关于解释顺序推荐的研究缺乏可靠的基准和量化指标，因此难以比较不同模型之间的解释性能。在这项工作中，我们提出了一个新的解释类型，即协作解释，到顺序推荐，允许一个统一的方法来建模用户的行为和评估两者的性能的推荐和解释。我们通过将问题框架为一个联合的顺序预测任务来完成这个任务，该任务采用用户过去的项目解释对的序列，并预测下一个项目及其相关的解释。我们提出了一个流水线，包括数据准备和模型适应框架称为顺序推荐与协作解释(SCE)。该框架可以灵活地应用于该问题的任何顺序推荐模型。此外，为了解决两个子任务学习过程中项目表征与解释表征不一致的问题，本文提出了基于互信息最大化的协同解释的序贯推荐方法。我们的大量实验表明: (i) SCE 框架能够有效地使序贯模型提出建议并提供准确的解释。(ii)重要的是，SCEMIM 加强了建议和解释之间的一致性，从而进一步改善了这两个子任务的表现。	code	0
A Learning-to-Rank Formulation of Clustering-Based Approximate Nearest Neighbor Search	Thomas Vecchiato, Claudio Lucchese, Franco Maria Nardini, Sebastian Bruch	Pinecone; ISTI-CNR; Ca' Foscari University of Venice	A critical piece of the modern information retrieval puzzle is approximate nearest neighbor search. Its objective is to return a set of k data points that are closest to a query point, with its accuracy measured by the proportion of exact nearest neighbors captured in the returned set. One popular approach to this question is clustering: The indexing algorithm partitions data points into non-overlapping subsets and represents each partition by a point such as its centroid. The query processing algorithm first identifies the nearest clusters – a process known as routing – then performs a nearest neighbor search over those clusters only. In this work, we make a simple observation: The routing function solves a ranking problem. Its quality can therefore be assessed with a ranking metric, making the function amenable to learning-to-rank. Interestingly, ground-truth is often freely available: Given a query distribution in a top-k configuration, the ground-truth is the set of clusters that contain the exact top-k vectors. We develop this insight and apply it to Maximum Inner Product Search (MIPS). As we demonstrate empirically on various datasets, learning a simple linear function consistently improves the accuracy of clustering-based MIPS.	现代信息检索难题的一个关键部分是近似最近邻搜索。它的目标是返回一组最接近查询点的 k 个数据点，其精度由返回集中捕获的精确最近邻点的比例来衡量。解决这个问题的一种流行方法是聚类: 索引算法将数据分割成不重叠的子集，并用一个点(如其质心)表示每个分区。查询处理算法首先识别最近的集群——一个称为路由的过程——然后仅对这些集群执行最近邻搜索。在这项工作中，我们做了一个简单的观察: 路由函数解决了一个排序问题。因此，它的质量可以评估与排名度量，使功能适合学习到排名。有趣的是，地面真相通常是免费提供的: 给定 top-k 配置中的查询分布，地面真相是包含精确 top-k 向量的集合。我们开发了这种洞察力，并将其应用于最大内部产品搜索(MIPS)。正如我们在各种数据集上的经验证明，学习一个简单的线性函数可以持续地提高基于聚类的 MIPS 的准确性。	code	0
A Surprisingly Simple yet Effective Multi-Query Rewriting Method for Conversational Passage Retrieval	Ivica Kostric, Krisztian Balog	University of Stavanger; University of Stavanger & Google Research	Conversational passage retrieval is challenging as it often requires the resolution of references to previous utterances and needs to deal with the complexities of natural language, such as coreference and ellipsis. To address these challenges, pre-trained sequence-to-sequence neural query rewriters are commonly used to generate a single de-contextualized query based on conversation history. Previous research shows that combining multiple query rewrites for the same user utterance has a positive effect on retrieval performance. We propose the use of a neural query rewriter to generate multiple queries and show how to integrate those queries in the passage retrieval pipeline efficiently. The main strength of our approach lies in its simplicity: it leverages how the beam search algorithm works and can produce multiple query rewrites at no additional cost. Our contributions further include devising ways to utilize multi-query rewrites in both sparse and dense first-pass retrieval. We demonstrate that applying our approach on top of a standard passage retrieval pipeline delivers state-of-the-art performance without sacrificing efficiency.	会话短文检索是一个具有挑战性的问题，因为它往往需要解决对以前话语的引用，并需要处理自然语言的复杂性，如共引和省略。为了应对这些挑战，预先训练的序列到序列神经查询重写器通常用于生成基于会话历史的单个去上下文化查询。以往的研究表明，对同一用户语句进行多次查询重写对检索性能有积极的影响。我们提出使用神经查询重写器来生成多个查询，并说明如何有效地将这些查询集成到文章检索流水线中。我们方法的主要优点在于它的简单性: 它利用了束搜索算法的工作方式，并且可以在不增加成本的情况下产生多个查询重写。我们的贡献还包括设计在稀疏和密集首通检索中利用多查询重写的方法。我们演示了将我们的方法应用于标准通道检索流水线之上，可以在不牺牲效率的情况下提供最先进的性能。	code	0
Memory-Efficient Deep Recommender Systems using Approximate Rotary Compositional Embedding	Dongning Ma, Xun Jiao	Villanova University ECE; Villanova University Electrical and Computer Engineering	Embedding tables in deep recommender systems (DRS) process categorical data, which can be memory-intensive due to the high feature cardinality. In this paper, we propose Approximate Rotary Compositional Embedding (ARCE), which intentionally trades off performance to aggressively reduce the size of the embedding tables. Specifically, ARCE uses compositional embedding to split large embedding tables into smaller compositions and replaces index look-ups with vector rotations. To regain the performance loss of this trade-off, ARCE features an input approximation where one index is mapped into multiple indices, creating a larger space for a potential increased learning capability. Experimental results show that using ARCE can reduce the memory overhead of embedding tables in DRS by more than 1000x with less than 3% performance loss, highlighting the potential of using ARCE for less memory intensive DRS designs. We open-source ARCE at https://github.com/VU-DETAIL/arce.	深度推荐系统(DRS)中嵌入表处理分类数据，由于特征基数高，可能会占用大量内存。在本文中，我们提出了近似旋转组合嵌入(ARCE) ，它有意地牺牲性能以积极地减少嵌入表的大小。具体来说，ARCE 使用组合嵌入将大型嵌入表拆分为较小的组合，并用向量旋转替换索引查找。为了重新获得这种折衷的性能损失，ARCE 采用了一种输入近似，其中一个索引映射到多个索引中，为潜在的增强的学习能力创造了更大的空间。实验结果表明，使用 ARCE 可以将 DRS 中嵌入表的内存开销减少1000倍以上，性能损失小于3% ，突出了使用 ARCE 进行内存密集型 DRS 设计的潜力。我们开源的 ARCE https://github.com/vu-detail/ARCE。	code	0
Retrieval-Augmented Conversational Recommendation with Prompt-based Semi-Structured Natural Language State Tracking	Sara Kemper, Justin Cui, Kai Dicarlantonio, Kathy Lin, Danjie Tang, Anton Korikov, Scott Sanner	University of Toronto; University of Waterloo	Conversational recommendation (ConvRec) systems must understand rich and diverse natural language (NL) expressions of user preferences and intents, often communicated in an indirect manner (e.g., "I'm watching my weight"). Such complex utterances make retrieving relevant items challenging, especially if only using often incomplete or out-of-date metadata. Fortunately, many domains feature rich item reviews that cover standard metadata categories and offer complex opinions that might match a user's interests (e.g., "classy joint for a date"). However, only recently have large language models (LLMs) let us unlock the commonsense connections between user preference utterances and complex language in user-generated reviews. Further, LLMs enable novel paradigms for semi-structured dialogue state tracking, complex intent and preference understanding, and generating recommendations, explanations, and question answers. We thus introduce a novel technology RA-Rec, a Retrieval-Augmented, LLM-driven dialogue state tracking system for ConvRec, showcased with a video, open source GitHub repository, and interactive Google Colab notebook.	会话推荐系统必须理解用户偏好和意图的丰富多样的自然语言(NL)表达，通常以间接的方式进行沟通(例如，“我在减肥”)。这种复杂的语句使得检索相关项目变得具有挑战性，特别是如果仅仅使用不完整或过时的元数据。幸运的是，许多域名都有丰富的项目评论，涵盖标准的元数据类别，并提供可能符合用户兴趣的复杂意见(例如，“优雅的约会联合”)。然而，直到最近才有了大型语言模型(LLM) ，让我们能够在用户生成的评论中解开用户偏好话语和复杂语言之间的常识性联系。此外，LLM 为半结构化对话状态跟踪、复杂意图和偏好理解以及生成建议、解释和问题答案提供了新的范例。因此，我们引入了一种新技术 RA-Rec，这是一种用于 ConvRec 的恢复增强的 LLM 驱动的对话状态跟踪系统，通过一个视频、开源 GitHub 仓库和交互式 Google Colab 笔记本进行了展示。	code	0
LLMGR: Large Language Model-based Generative Retrieval in Alipay Search	Chen Wei, Yixin Ji, Zeyuan Chen, Jia Xu, Zhongyi Liu	Soochow University School of Computer Science & Technology; Ant Group; Ant Group Search Recommendation Technology Department	The search system aims to help users quickly find items according to queries they enter, which includes the retrieval and ranking modules. Traditional retrieval is a multi-stage process, including indexing and sorting, which cannot be optimized end-to-end. With the real data about mini-apps in the Alipay search, we find that many complex queries fail to display the relevant mini-apps, seriously threatening users' search experience. To address the challenges, we propose a Large Language Model-based Generative Retrieval (LLMGR) approach for retrieving mini-app candidates. The information of the mini-apps is encoded into the large model, and the title of the mini-app is directly generated. Through the online A/B test in Alipay search, LLMGR as a supplementary source has statistically significant improvements in the Click-Through Rate (CTR) of the search system compared to traditional methods. In this paper, we have deployed a novel retrieval method for the Alipay search system and demonstrated that generative retrieval methods based on LLM can improve the performance of search system, particularly for complex queries, which have an average increase of 0.2% in CTR.	该搜索系统旨在帮助用户根据输入的查询快速查找项目，其中包括检索和排序模块。传统的检索是一个多阶段的过程，包括索引和排序，不能实现端到端的优化。通过对支付宝搜索中迷你应用的真实数据进行分析，我们发现许多复杂的查询都无法显示相关的迷你应用，严重威胁了用户的搜索体验。为了应对这些挑战，我们提出了一种基于大语言模型的生成检索(LLMGR)方法来检索迷你应用程序候选者。迷你应用程序的信息被编码到大模型中，并直接生成迷你应用程序的标题。通过支付宝搜索中的在线 A/B 测试，作为补充来源的 LLMGR 在统计学上显著改善了搜索系统的点进率(ctrr) ，而不是传统方法。本文针对支付宝搜索系统提出了一种新的检索方法，并证明了基于 LLM 的生成式检索方法可以提高搜索系统的性能，尤其是对于平均点击率提高0.2% 的复杂查询。	code	0
Optimizing E-commerce Search: Toward a Generalizable and Rank-Consistent Pre-Ranking Model	Enqiang Xu, Yiming Qiu, Junyang Bai, Ping Zhang, Dadong Miao, Songlin Wang, Guoyu Tang, Lin Liu, Mingming Li	JD.com	In large e-commerce platforms, search systems are typically composed of a series of modules, including recall, pre-ranking, and ranking phases. The pre-ranking phase, serving as a lightweight module, is crucial for filtering out the bulk of products in advance for the downstream ranking module. Industrial efforts on optimizing the pre-ranking model have predominantly focused on enhancing ranking consistency, model structure, and generalization towards long-tail items. Beyond these optimizations, meeting the system performance requirements presents a significant challenge. Contrasting with existing industry works, we propose a novel method: a Generalizable and RAnk-ConsistEnt Pre-Ranking Model (GRACE), which achieves: 1) Ranking consistency by introducing multiple binary classification tasks that predict whether a product is within the top-k results as estimated by the ranking model, which facilitates the addition of learning objectives on common point-wise ranking models; 2) Generalizability through contrastive learning of representation for all products by pre-training on a subset of ranking product embeddings; 3) Ease of implementation in feature construction and online deployment. Our extensive experiments demonstrate significant improvements in both offline metrics and online A/B test: a 0.75 increase in CVR.	在大型电子商务平台中，搜索系统通常由一系列模块组成，包括召回、预排序和排序阶段。预排序阶段作为一个轻量级模块，对于提前过滤掉下游排序模块的大部分产品至关重要。优化预排序模型的工业努力主要集中在增强排序一致性、模型结构和对长尾项目的推广。除了这些优化之外，满足系统性能需求也是一个重大的挑战。与现有的行业工作相比，我们提出了一种新的方法: 一个一般化和排名一致的预排名模型(GRACE) ，它实现了: 1)排名一致性通过引入多个二进制分类任务，预测一个产品是否在由排名模型估计的前 k 结果之内，这有助于增加学习目标的共同点明智的排名模型; 2)通过对比学习的表示对所有产品的一个排名产品嵌入子集的预训练的一般化; 3)易于实施的功能构建和在线部署。我们的大量实验表明，在离线指标和在线 A/B 测试方面都有显著改善: CVR 增加了0.75。	code	0
A Preference-oriented Diversity Model Based on Mutual-information in Re-ranking for E-commerce Search	Huimu Wang, Mingming Li, Dadong Miao, Songlin Wang, Guoyu Tang, Lin Liu, Sulong Xu, Jinghe Hu	JD.com; JD or JD.com	Re-ranking is a process of rearranging ranking list to more effectively meet user demands by accounting for the interrelationships between items. Existing methods predominantly enhance the precision of search results, often at the expense of diversity, leading to outcomes that may not fulfill the varied needs of users. Conversely, methods designed to promote diversity might compromise the precision of the results, failing to satisfy the users' requirements for accuracy. To alleviate the above problems, this paper proposes a Preference-oriented Diversity Model Based on Mutual-information (PODM-MI), which consider both accuracy and diversity in the re-ranking process. Specifically, PODM-MI adopts Multidimensional Gaussian distributions based on variational inference to capture users' diversity preferences with uncertainty. Then we maximize the mutual information between the diversity preferences of the users and the candidate items using the maximum variational inference lower bound to enhance their correlations. Subsequently, we derive a utility matrix based on the correlations, enabling the adaptive ranking of items in line with user preferences and establishing a balance between the aforementioned objectives. Experimental results on real-world online e-commerce systems demonstrate the significant improvements of PODM-MI, and we have successfully deployed PODM-MI on an e-commerce search platform.	
重新排序是通过考虑项目之间的相互关系来重新安排排序列表以更有效地满足用户需求的过程。现有的方法主要是提高搜索结果的精确度，往往以牺牲多样性为代价，导致结果可能无法满足用户的不同需求。相反，旨在促进多样性的方法可能会损害结果的精确性，不能满足用户对精确性的要求。针对上述问题，本文提出了一种基于互信息的偏好导向多样性模型(PODM-MI) ，该模型在重排序过程中同时考虑了准确性和多样性。具体来说，PODM-MI 采用基于变分推理的多维高斯分布来捕获具有不确定性的用户多样性偏好。然后利用最大变分推理下界，最大化用户多样性偏好与候选项之间的相互信息，以增强它们之间的相关性。随后，我们推导出一个基于相关性的效用矩阵，使项目的自适应排序符合用户偏好，并建立上述目标之间的平衡。在实际的在线电子商务系统上的实验结果表明，PODM-MI 算法得到了显著的改进，并成功地在电子商务搜索平台上部署了 PODM-MI 算法。	code	0
Query Performance Prediction for Conversational Search and Beyond	Chuan Meng	University of Amsterdam	Query performance prediction (QPP) is a key task in information retrieval (IR) [1]. The QPP task is to estimate the retrieval quality of a search system for a query without human relevance judgments. In summary, I aim to solve 4 limitations identified in previous QPP studies: I have published 3 papers that address 3 of these limitations, while the remaining one is the focus of my future work. While extensively explored for traditional ad-hoc search, QPP for conversational search (CS) [4] has been little studied. I have identified limitation 1 in previous QPP studies: There is a lack of a comprehensive investigation into how well existing QPP methods designed for ad-hoc search perform in the context of CS. To fill this research gap, I have conducted a comprehensive reproducibility study [5], where I examined various QPP methods that were designed for ad-hoc search in the CS setting. I have made the code and data publicly available on https://github.com/ChuanMeng/QPP4CS. Moreover, I have identified limitation 2 in previous studies on QPP for CS: There is a lack of research in investigating and leveraging the CS-specific features that do not exist in ad-hoc search to improve QPP quality for CS. I have authored a paper to fill this research gap [3]. Specifically, my empirical analysis indicates a correlation between query rewriting quality in CS and the actual retrieval quality. Based on this finding, I have proposed a perplexity-based pre-retrieval QPP framework (PPL-QPP) for CS, which integrates query rewriting quality into existing QPP methods. Experimental results show that PPL-QPP improves QPP quality. Beyond the scope of QPP for CS, I have identified drawbacks in general QPP methods. 
Existing QPP methods typically return a single scalar value that indicates the retrieval quality, which results in two issues: (i) relying on a single value to represent different IR metrics leads to a "one size fits all" issue, and (ii) a single value constrains the interpretability of QPP. Thus, I have identified limitation 3: there is a shortage of QPP methods that are capable of effectively predicting various IR evaluation metrics while maintaining interpretability. To address the limitation, I have proposed a QPP framework using automatically generated relevance judgments (QPP-GenRE); it decomposes QPP into independent subtasks of judging the relevance of each item in a ranked list to a given query [6]. QPP-GenRE enables the prediction of any IR metric using generated relevance judgments as pseudo-labels, and enables the interpretation of predicted IR metrics based on generated judgments. I have fine-tuned an open-source large language model (LLM) for judging relevance. Experimental results show that QPP-GenRE achieves state-of-the-art QPP quality; my fine-tuned LLM demonstrates a high relevance judgment agreement with human assessors. I have made the code and data publicly available on https://github.com/ChuanMeng/QPP-GenRE. As part of my future work, I plan to solve limitation 4: No study has explored the application of QPP in retrieval-augmented generation (RAG) to predict when not to rely on low-quality retrieved items that have the potential to hurt RAG's text generation.	
查询性能预测(QPP)是信息检索(IR)[1]中的一项关键任务。QPP 任务是在没有人类相关性判断的情况下，对搜索系统针对某一查询的检索质量进行评估。总之，我的目标是解决在以前的 QPP 研究中发现的4个局限性: 我已经发表了3篇论文，解决了其中的3个局限性，而其余的一个是我未来工作的重点。虽然对传统的即席(ad-hoc)搜索进行了广泛的研究，但是对会话搜索(CS)[4]的 QPP 研究却很少。我已经在以前的 QPP 研究中确定了局限性1: 缺乏一个全面的调查，以了解为即席搜索设计的现有 QPP 方法在 CS 的情况下表现如何。为了填补这个研究空白，我进行了一个全面的可复现性研究[5] ，其中我在 CS 设置中检查了各种为即席搜索设计的 QPP 方法。我已经把代码和数据公布在 https://github.com/ChuanMeng/QPP4CS 上了。此外，我已经在以前的 CS QPP 研究中确定了局限性2: 缺乏研究调查和利用 CS 特定的特征，这些特征在即席搜索中不存在，以提高 CS 的 QPP 质量。我已经写了一篇论文来填补这个研究空白[3]。具体来说，我的实证分析表明了 CS 中查询重写质量与实际检索质量之间的相关性。基于这一发现，我提出了一个基于困惑度的检索前 QPP 框架(PPL-QPP) ，该框架将查询重写质量与现有的 QPP 方法相结合。实验结果表明，PPL-QPP 提高了 QPP 的质量。除了 CS 的 QPP 范围，我已经确定了一般 QPP 方法的缺点。现有的 QPP 方法通常返回指示检索质量的单个标量值，这导致两个问题: (i)依赖于单个值来表示不同的 IR 指标导致“一种尺寸适合所有”问题，以及(ii)单个值限制了 QPP 的可解释性。因此，我已经确定了局限性3: 缺乏能够有效预测各种 IR 评估指标同时保持可解释性的 QPP 方法。为了解决这个局限性，我提出了一个使用自动生成的相关性判断的 QPP 框架(QPP-GenRE) ; 它将 QPP 分解为独立的子任务，判断排序列表中的每个项目与给定查询的相关性[6]。QPP-GenRE 能够使用生成的相关性判断作为伪标签来预测任何 IR 度量，并且能够基于生成的判断来解释预测的 IR 度量。我已经微调了一个用于判断相关性的开源大型语言模型(LLM)。实验结果表明，QPP-GenRE 实现了最先进的 QPP 质量，我的微调 LLM 与人类评估者的相关性判断一致性很高。我已经把代码和数据公布在 https://github.com/ChuanMeng/QPP-GenRE 上了。作为我未来工作的一部分，我计划解决局限性4: 还没有研究探索 QPP 在检索增强生成(RAG)中的应用，以预测何时不依赖于有可能损害 RAG 文本生成的低质量检索项。	code	0
Unbiased Learning to Rank Meets Reality: Lessons from Baidu's Large-Scale Search Dataset	Philipp Hager, Romain Deffayet, JeanMichel Renders, Onno Zoeter, Maarten de Rijke	Naver Labs Europe; University of Amsterdam; Booking.com	Unbiased learning-to-rank (ULTR) is a well-established framework for learning from user clicks, which are often biased by the ranker collecting the data. While theoretically justified and extensively tested in simulation, ULTR techniques lack empirical validation, especially on modern search engines. The dataset released for the WSDM Cup 2023, collected from Baidu's search engine, offers a rare opportunity to assess the real-world performance of prominent ULTR techniques. Despite multiple submissions during the WSDM Cup 2023 and the subsequent NTCIR ULTRE-2 task, it remains unclear whether the observed improvements stem from applying ULTR or other learning techniques. We revisit and extend the available experiments. We find that unbiased learning-to-rank techniques do not bring clear performance improvements, especially compared to the stark differences brought by the choice of ranking loss and query-document features. Our experiments reveal that ULTR robustly improves click prediction. However, these gains in click prediction do not translate to enhanced ranking performance on expert relevance annotations, implying that conclusions strongly depend on how success is measured in this benchmark.	无偏学习排名(ULTR)是一个从用户点击中学习的成熟的框架，用户点击往往受到排名收集数据的影响。ULTR 技术虽然在理论上得到了验证，并在仿真中得到了广泛的测试，但缺乏经验验证，特别是在现代搜索引擎上。从百度搜索引擎收集的2023年 WSDM 杯的数据集提供了一个难得的机会来评估突出的 ULTR 技术在现实世界中的表现。尽管在2023年 WSDM 杯和随后的 NTCIR ULTRE-2任务期间提交了多份申请，但目前尚不清楚观察到的改善是否源于应用 ULTR 或其他学习技术。我们重新审视并扩展现有的实验。我们发现，无偏见的学习排序技术并不能带来明显的性能改善，尤其是与排序丢失和查询文档特性的选择所带来的明显差异相比。我们的实验表明，ULTR 强有力地改善点击预测。然而，在点击预测方面取得的这些进展并不能转化为专家相关性注释排名表现的提高，这意味着结论在很大程度上取决于如何在这一基准中衡量成功。	code	0
CMCLRec: Cross-modal Contrastive Learning for User Cold-start Sequential Recommendation	Xiaolong Xu, Hongsheng Dong, Lianyong Qi, Xuyun Zhang, Haolong Xiang, Xiaoyu Xia, Yanwei Xu, Wanchun Dou	Nanjing University; Macquarie University; College of Intelligence and Computing, Tianjin University; China University of Petroleum; Nanjing University of Information Science and Technology; RMIT University	Sequential recommendation models generate embeddings for items through the analysis of historical user-item interactions and utilize the acquired embeddings to predict user preferences. Despite being effective in revealing personalized preferences for users, these models heavily rely on user-item interactions. However, due to the lack of interaction information, new users face challenges when utilizing sequential recommendation models for predictions, which is recognized as the cold-start problem. Recent studies, while addressing this problem within specific structures, often neglect the compatibility with existing sequential recommendation models, making seamless integration into existing models unfeasible. To address this challenge, we propose CMCLRec, a Cross-Modal Contrastive Learning framework for user cold-start RECommendation. This approach aims to solve the user cold-start problem by customizing inputs for cold-start users that align with the requirements of sequential recommendation models in a cross-modal manner. Specifically, CMCLRec adopts cross-modal contrastive learning to construct a mapping from user features to user-item interactions based on warm user data. It then generates a simulated behavior sequence for each cold-start user in turn for recommendation purposes. In this way, CMCLRec is theoretically compatible with any extant sequential recommendation model. 
Comprehensive experiments conducted on real-world datasets substantiate that, compared with state-of-the-art baseline models, CMCLRec markedly enhances the performance of conventional sequential recommendation models, particularly for cold-start users.	序贯推荐模型通过分析历史上的用户-项目交互，生成项目的嵌入，并利用获得的嵌入来预测用户偏好。尽管这些模型能够有效地向用户展示个性化偏好，但它们严重依赖于用户项目交互。然而，由于缺乏交互信息，新用户在使用顺序推荐模型进行预测时面临着挑战，这被认为是冷启动问题。最近的研究虽然在特定的结构内解决了这个问题，但往往忽视了与现有顺序推荐模型的兼容性，使得与现有模型的无缝集成变得不可行。为了应对这一挑战，我们提出了 CMCLRec，一个用于用户冷启动推荐的跨模态对比学习框架。这种方法旨在解决用户冷启动问题，为冷启动用户定制输入，以跨模式的方式符合顺序推荐模型的要求。具体来说，CMCLRec 采用跨模态对比学习方法，构建了基于暖用户数据的用户特征到用户项交互的映射关系。然后，它为每个冷启动用户依次生成一个模拟的行为序列，用于推荐目的。这样，CMCLRec 在理论上与任何现存的顺序推荐模型兼容。在真实世界数据集上进行的综合实验证实，与最先进的基线模型相比，CMCLRec 显著提高了传统顺序推荐模型的性能，特别是对于冷启动用户。	code	0
Sequential Recommendation for Optimizing Both Immediate Feedback and Long-term Retention	Ziru Liu, Shuchang Liu, Zijian Zhang, Qingpeng Cai, Xiangyu Zhao, Kesen Zhao, Lantao Hu, Peng Jiang, Kun Gai	City University of Hong Kong; Unaffiliated; Kuaishou Technology Strategy Algorithm Department; City University of Hong Kong School of Data Science; Kuaishou Technology	In the landscape of Recommender System (RS) applications, reinforcement learning (RL) has recently emerged as a powerful tool, primarily due to its proficiency in optimizing long-term rewards. Nevertheless, it suffers from instability in the learning process, stemming from the intricate interactions among bootstrapping, off-policy training, and function approximation. Moreover, in multi-reward recommendation scenarios, designing a proper reward setting that reconciles the inner dynamics of various tasks is quite intricate. In response to these challenges, we introduce DT4IER, an advanced decision transformer-based recommendation model that is engineered to not only elevate the effectiveness of recommendations but also to achieve a harmonious balance between immediate user engagement and long-term retention. The DT4IER applies an innovative multi-reward design that adeptly balances short and long-term rewards with user-specific attributes, which serve to enhance the contextual richness of the reward sequence ensuring a more informed and personalized recommendation process. To enhance its predictive capabilities, DT4IER incorporates a high-dimensional encoder, skillfully designed to identify and leverage the intricate interrelations across diverse tasks. Furthermore, we integrate a contrastive learning approach within the action embedding predictions, a strategy that significantly boosts the model's overall performance. 
Experiments on three real-world datasets demonstrate the effectiveness of DT4IER against state-of-the-art Sequential Recommender Systems (SRSs) and Multi-Task Learning (MTL) models in terms of both prediction accuracy and effectiveness in specific tasks. The source code is accessible online to facilitate replication.	在推荐系统(RS)应用领域，强化学习(RL)最近已经成为一种强大的工具，这主要是由于它在优化长期回报方面的熟练程度。尽管如此，由于自举(bootstrapping)、离策略(off-policy)训练和函数逼近之间错综复杂的相互作用，它在学习过程中存在不稳定性。此外，在多奖励推荐场景中，设计一个适当的奖励设置来协调各种任务的内部动态是相当复杂的。为了应对这些挑战，我们引入了 DT4IER，这是一种基于决策转换器的高级推荐模型，不仅旨在提高推荐的有效性，而且还旨在实现即时用户参与和长期留存之间的和谐平衡。DT4IER 采用了一种创新的多奖励设计，能够巧妙地平衡短期和长期奖励与用户特定属性之间的关系，这有助于增强奖励序列的上下文丰富性，确保推荐过程更加知情和个性化。为了增强其预测能力，DT4IER 采用了高维编码器，巧妙地设计识别和利用不同任务之间错综复杂的相互关系。此外，我们在动作嵌入预测中整合了一种对比学习方法，这种策略显著地提高了模型的整体性能。在三个实际数据集上的实验证明了 DT4IER 对最先进的顺序推荐系统(SRS)和多任务学习(MTL)模型在特定任务的预测准确性和有效性方面的有效性。可以联机访问源代码，以便于复现。	code	0
Invisible Relevance Bias: Text-Image Retrieval Models Prefer AI-Generated Images	Shicheng Xu, Danyang Hou, Liang Pang, Jingcheng Deng, Jun Xu, Huawei Shen, Xueqi Cheng	Institute of Computing Technology, Chinese Academy of Sciences; Gaoling School of Artificial Intelligence, Renmin University of China	With the advancement of generation models, AI-generated content (AIGC) is becoming more realistic, flooding the Internet. A recent study suggests that this phenomenon causes source bias in text retrieval for web search. Specifically, neural retrieval models tend to rank generated texts higher than human-written texts. In this paper, we extend the study of this bias to cross-modal retrieval. Firstly, we successfully construct a suitable benchmark to explore the existence of the bias. Subsequent extensive experiments on this benchmark reveal that AI-generated images introduce an invisible relevance bias to text-image retrieval models. Specifically, our experiments show that text-image retrieval models tend to rank the AI-generated images higher than the real images, even though the AI-generated images do not exhibit more visually relevant features to the query than real images. This invisible relevance bias is prevalent across retrieval models with varying training data and architectures. Furthermore, our subsequent exploration reveals that the inclusion of AI-generated images in the training data of the retrieval models exacerbates the invisible relevance bias. The above phenomenon triggers a vicious cycle, which makes the invisible relevance bias become more and more serious. To elucidate the potential causes of invisible relevance and address the aforementioned issues, we introduce an effective training method aimed at alleviating the invisible relevance bias. 
Subsequently, we apply our proposed debiasing method to retroactively identify the causes of invisible relevance, revealing that the AI-generated images induce the image encoder to embed additional information into their representation. This information exhibits a certain consistency across generated images with different semantics and can make the retriever estimate a higher relevance score.	随着生成模型的进步，人工智能生成的内容(AIGC)正变得越来越现实，充斥着互联网。最近的一项研究表明，这种现象造成源偏见的文本检索的网络搜索。具体来说，神经检索模型对生成文本的排名往往高于人写文本。在本文中，我们将这种偏差的研究扩展到跨模态检索。首先，我们成功地构建了一个合适的基准来研究这种偏差的存在。随后在这个基准上进行的大量实验表明，人工智能生成的图像给文本图像检索模型带来了不可见的相关性偏差。具体来说，我们的实验表明，文本图像检索模型对人工智能生成的图像的排序往往高于真实图像，即使人工智能生成的图像并没有表现出更多的视觉相关特征的查询比真实图像。这种看不见的相关性偏差在具有不同训练数据和结构的检索模型中普遍存在。此外，我们随后的研究表明，在检索模型的训练数据中包含人工智能生成的图像加剧了不可见的相关性偏差。上述现象引发了一个恶性循环，使得无形的关联偏差越来越严重。为了阐明隐性相关产生的潜在原因并解决上述问题，我们引入了一种有效的训练方法来缓解隐性相关偏差。随后，我们应用我们提出的去偏方法来追溯识别不可见相关性的原因，揭示了人工智能生成的图像诱导图像编码器嵌入额外的信息到他们的表示。这些信息在生成的具有不同语义的图像之间表现出一定的一致性，并且可以使检索器估计出更高的相关性得分。	code	0
Fair Sequential Recommendation without User Demographics	Huimin Zeng, Zhankui He, Zhenrui Yue, Julian J. McAuley, Dong Wang	University of Illinois Urbana-Champaign; University of Illinois at Urbana-Champaign; University of California, San Diego	Much existing literature on fair recommendation (i.e., group fairness) leverages users' demographic attributes (e.g., gender) to develop fair recommendation methods. However, in real-world scenarios, due to privacy concerns and convenience considerations, users may not be willing to share their demographic information with the system, which limits the application of many existing methods. Moreover, sequential recommendation (SR) models achieve state-of-the-art performance compared to traditional collaborative filtering (CF) recommenders, and can represent users solely using user-item interactions (user-free). This leaves a wrong impression that SR models are free from group unfairness by design. In this work, we explore a critical question: how can we build a fair sequential recommendation system without even knowing user demographics? To address this problem, we propose Agnostic FairSeqRec (A-FSR): a model-agnostic and demographic-agnostic debiasing framework for sequential recommendation without requiring users' demographic attributes. Firstly, A-FSR reduces the correlation between the potential stereotypical patterns in the input sequences and final recommendations via Dirichlet neighbor smoothing. Secondly, A-FSR estimates an under-represented group of sequences via a gradient-based heuristic, and implicitly moves training focus towards the under-represented group by minimizing a distributionally robust optimization (DRO) based objective. Results on real-world datasets show that A-FSR achieves significant improvements on group fairness in sequential recommendation, while outperforming other state-of-the-art baselines.	
关于公平推荐(即群体公平)的许多现有文献利用用户的人口统计特征(如性别)来发展公平推荐方法。然而，在现实世界的情况下，由于隐私问题和方便的考虑，用户可能不愿意与系统共享他们的人口统计信息，这限制了许多现有方法的应用。此外，序贯推荐(SR)模型与传统的协同过滤推荐(CF)模型相比，可以实现最先进的性能，并且可以完全使用用户项交互(用户自由)来代表用户。这就给人留下了一个错误的印象，认为 SR 模型在设计上不存在群体不公平。在这项工作中，我们探讨了一个关键问题: 我们如何建立一个公平的顺序推荐系统，甚至不知道用户的人口统计？为了解决这个问题，我们提出了不可知的 FairSeqRec (A-FSR) : 一个不需要用户人口统计属性的模型不可知和人口统计不可知的连续推荐消偏框架。首先，A-FSR 通过 Dirichlet 邻域平滑降低了输入序列中潜在的常规模式与最终推荐值之间的相关性。其次，A-FSR 通过基于梯度的启发式算法估计一组未被充分表示的序列，并通过最小化基于分布鲁棒优化(DRO)的目标隐式地将训练焦点移向未被充分表示的序列。实际数据集的结果表明，A-FSR 在顺序推荐方面取得了显著的改善，同时优于其他最先进的基线。	code	0
Negative Sampling Techniques for Dense Passage Retrieval in a Multilingual Setting	Thilina Chaturanga Rajapakse, Andrew Yates, Maarten de Rijke	University of Amsterdam	The bi-encoder transformer architecture has become popular in open-domain retrieval, surpassing traditional sparse retrieval methods. Using hard negatives during training can improve the effectiveness of dense retrievers, and various techniques have been proposed to generate these hard negatives. We investigate the effectiveness of multiple negative sampling methods based on lexical methods (BM25), clustering, and periodically updated dense indices. We examine techniques that were introduced for finding hard negatives in a monolingual setting and reproduce them in a multilingual setting. We discover a gap amongst these techniques that we fill by proposing a novel clustered training method. Specifically, we focus on monolingual retrieval using multilingual dense retrievers across a broad set of diverse languages. We find that negative sampling based on BM25 negatives is surprisingly effective in an in-distribution setting, but this finding does not generalize to out-of-distribution and zero-shot settings, where the newly proposed method achieves the best results. We conclude with recommendations on which negative sampling methods may be the most effective given different multilingual retrieval scenarios.	双编码器 Transformer 结构已经成为开放域检索中的热点，超越了传统的稀疏检索方法。在训练过程中使用难负样本可以提高密集检索器的效果，人们提出了各种技术来产生这些难负样本。我们研究了基于词汇方法(BM25)、聚类和周期性更新密集索引的多种负采样方法的有效性。我们研究了在单语环境下寻找难负样本的技术，并在多语环境下重现这些技术。我们通过提出一种新的聚类训练方法来填补这些技术之间的空白。具体来说，我们的重点是使用多语言密集检索器跨多种语言的单语言检索。我们发现基于 BM25 负样本的负采样在分布内环境中有惊人的效果，但是这一发现并没有推广到分布外环境和零样本环境中，在这两种环境中，新提出的方法取得了最好的效果。最后，我们给出了在不同的多语言检索场景下，哪种负采样方法可能是最有效的建议。	code	0
M3oE: Multi-Domain Multi-Task Mixture-of Experts Recommendation Framework	Zijian Zhang, Shuchang Liu, Jiaao Yu, Qingpeng Cai, Xiangyu Zhao, Chunxu Zhang, Ziru Liu, Qidong Liu, Hongwei Zhao, Lantao Hu, Peng Jiang, Kun Gai		Multi-domain recommendation and multi-task recommendation have demonstrated their effectiveness in leveraging common information from different domains and objectives for comprehensive user modeling. Nonetheless, the practical recommendation usually faces multiple domains and tasks simultaneously, which cannot be well-addressed by current methods. To this end, we introduce M3oE, an adaptive multi-domain multi-task mixture-of-experts recommendation framework. M3oE integrates multi-domain information, maps knowledge across domains and tasks, and optimizes multiple objectives. We leverage three mixture-of-experts modules to learn common, domain-aspect, and task-aspect user preferences respectively to address the complex dependencies among multiple domains and tasks in a disentangled manner. Additionally, we design a two-level fusion mechanism for precise control over feature extraction and fusion across diverse domains and tasks. The framework's adaptability is further enhanced by applying AutoML technique, which allows dynamic structure optimization. To the best of the authors' knowledge, our M3oE is the first effort to solve multi-domain multi-task recommendation self-adaptively. Extensive experiments on two benchmark datasets against diverse baselines demonstrate M3oE's superior performance. The implementation code is available to ensure reproducibility.	多领域推荐和多任务推荐在利用来自不同领域和目标的公共信息进行全面的用户建模方面展示了它们的有效性。尽管如此，实际的推荐通常同时面对多个领域和任务，而这些领域和任务不能被当前的方法很好地处理。为此，我们介绍了一个自适应的多领域多任务混合专家推荐框架 M3oE。M3oE 集成了多领域信息，映射了跨领域和任务的知识，并优化了多个目标。我们利用三个专家混合模块分别学习通用、领域方面和任务方面的用户偏好，以解决多个领域和任务之间的复杂依赖关系。此外，我们设计了一个两级融合机制，用于精确控制不同领域和任务的特征提取和融合。通过应用 AutoML 技术，进一步提高了框架的适应性，实现了动态结构优化。据作者所知，我们的 M3oE 首次尝试自适应地解决多领域多任务推荐问题。针对不同基线的两个基准数据集的大量实验证明了 M3oE 的优越性能。实现代码可用于确保可重复性。	code	0
NFARec: A Negative Feedback-Aware Recommender Model	Xinfeng Wang, Fumiyo Fukumoto, Jin Cui, Yoshimi Suzuki, Dongjin Yu	School of Computer Science and Technology, Hangzhou Dianzi University; Graduate Faculty of Interdisciplinary Research, University of Yamanashi; Faculty of Engineering, Integrated Graduate School of Medicine, Engineering, and Agricultural Sciences	Graph neural network (GNN)-based models have been extensively studied for recommendations, as they can extract high-order collaborative signals accurately which is required for high-quality recommender systems. However, they neglect the valuable information gained through negative feedback in two aspects: (1) different users might hold opposite feedback on the same item, which hampers optimal information propagation in GNNs, and (2) even when an item vastly deviates from users' preferences, they might still choose it and provide a negative rating. In this paper, we propose a negative feedback-aware recommender model (NFARec) that maximizes the leverage of negative feedback. To transfer information to multi-hop neighbors along an optimal path effectively, NFARec adopts a feedback-aware correlation that guides hypergraph convolutions (HGCs) to learn users' structural representations. Moreover, NFARec incorporates an auxiliary task - predicting the feedback sentiment polarity (i.e., positive or negative) of the next interaction - based on the Transformer Hawkes Process. The task is beneficial for understanding users by learning the sentiment expressed in their previous sequential feedback patterns and predicting future interactions. Extensive experiments demonstrate that NFARec outperforms competitive baselines. Our source code and data are released at https://github.com/WangXFng/NFARec.	
基于图神经网络(GNN)的推荐系统模型能够准确地提取高阶协同信号，是高质量推荐系统所必需的。然而，他们忽视了通过负面反馈获得的有价值的信息在两个方面: (1)不同的用户可能对同一个项目持有相反的反馈，这阻碍了最佳信息在 GNN 中的传播，和(2)即使一个项目大大偏离用户的喜好，他们仍然可能选择它，并提供一个负面评价。在本文中，我们提出了一个负反馈感知的推荐模型(NFARec) ，最大限度地利用负反馈。NFARec 采用反馈感知关联算法，引导超图卷积(HGC)学习用户的结构表示，有效地将信息沿着最优路径传递给多跳邻居。此外，NFARec 还包含了一个辅助任务——预测下一次交互的反馈情绪极性(即正极或负极)——基于 Transformer 霍克斯过程。这项任务有利于了解用户的情绪表达在他们以前的顺序反馈模式和预测未来的交互。大量的实验表明，NFARec 的表现优于竞争基线。我们的源代码和数据在 https://github.com/WangXFng/NFARec 公布。	code	0
Modeling User Fatigue for Sequential Recommendation	Nian Li, Xin Ban, Cheng Ling, Chen Gao, Lantao Hu, Peng Jiang, Kun Gai, Yong Li, Qingmin Liao	Kuaishou Inc.; Shenzhen International Graduate School, Tsinghua University; Department of Electronic Engineering, Tsinghua University; Independent; Tsinghua University	Recommender systems filter out information that meets user interests. However, users may be tired of the recommendations that are too similar to the content they have been exposed to in a short historical period, which is the so-called user fatigue. Despite the significance for a better user experience, user fatigue is seldom explored by existing recommenders. In fact, there are three main challenges to be addressed for modeling user fatigue, including what features support it, how it influences user interests, and how its explicit signals are obtained. In this paper, we propose to model user Fatigue in interest learning for sequential Recommendations (FRec). To address the first challenge, based on a multi-interest framework, we connect the target item with historical items and construct an interest-aware similarity matrix as features to support fatigue modeling. Regarding the second challenge, built upon feature cross, we propose a fatigue-enhanced multi-interest fusion to capture long-term interest. In addition, we develop a fatigue-gated recurrent unit for short-term interest learning, with temporal fatigue representations as important inputs for constructing update and reset gates. For the last challenge, we propose a novel sequence augmentation to obtain explicit fatigue signals for contrastive learning. We conduct extensive experiments on real-world datasets, including two public datasets and one large-scale industrial dataset. Experimental results show that FRec can improve AUC and GAUC up to 0.026 and 0.019 compared with state-of-the-art models, respectively. 
Moreover, large-scale online experiments demonstrate the effectiveness of FRec for fatigue reduction. Our codes are released at https://github.com/tsinghua-fib-lab/SIGIR24-FRec.	推荐系统过滤出符合用户兴趣的信息。然而，用户可能会厌倦那些与他们在很短的历史时期内接触到的内容过于相似的推荐，这就是所谓的用户疲劳。尽管这对于更好的用户体验意义重大，但是现有的推荐者很少探讨用户疲劳问题。实际上，建立用户疲劳模型需要解决三个主要问题，包括哪些特性支持用户疲劳，它如何影响用户兴趣，以及如何获得用户疲劳的显性信号。在本文中，我们提出了模型用户疲劳的兴趣学习顺序推荐(FRec)。为了解决第一个问题，我们基于一个多兴趣框架，将目标项目与历史项目连接起来，构造一个感兴趣的相似矩阵作为特征来支持疲劳建模。针对第二个挑战，建立在特征交叉的基础上，我们提出了一种疲劳增强的多兴趣融合来捕获长期兴趣。此外，我们开发了一个用于短期兴趣学习的疲劳门控循环单元，以时间疲劳表示作为构造更新门和复位门的重要输入。针对最后一个挑战，我们提出了一种新的序列增强方法，用于获得用于对比学习的显式疲劳信号。我们对真实世界的数据集进行了广泛的实验，包括两个公共数据集和一个大规模的工业数据集。实验结果表明，与现有模型相比，FRec 可以提高 AUC 和 GAUC，分别达到0.026和0.019。此外，大规模的在线实验证明了 FRec 在缓解疲劳方面的有效性。我们的代码在 https://github.com/tsinghua-fib-lab/SIGIR24-FRec 公布。	code	0
DDPO: Direct Dual Propensity Optimization for Post-Click Conversion Rate Estimation	Hongzu Su, Lichao Meng, Lei Zhu, Ke Lu, Jingjing Li	University of Electronic Science and Technology of China; Tongji University	In online advertising, the sample selection bias problem is a major cause of inaccurate conversion rate estimates. Current mainstream solutions only perform causality-based optimization in the click space since the conversion labels in the non-click space are absent. However, optimization for unclicked samples is equally essential because the non-click space contains more samples and user characteristics than the click space. To exploit the unclicked samples, we propose a Direct Dual Propensity Optimization (DDPO) framework to optimize the model directly in impression space with both clicked and unclicked samples. In this framework, we specifically design a click propensity network and a conversion propensity network. The click propensity network is dedicated to ensuring that optimization in the click space is unbiased. The conversion propensity network is designed to generate pseudo-conversion labels for unclicked samples, thus overcoming the challenge of absent labels in non-click space. With these two propensity networks, we are able to perform causality-based optimization in both click space and non-click space. In addition, to strengthen the causal relationship, we design two causal transfer modules for the conversion rate prediction model with the attention mechanism. The proposed framework is evaluated on five real-world public datasets and one private Tencent advertising dataset. Experimental results verify that our method is able to improve the prediction performance significantly. For instance, our method outperforms the previous state-of-the-art method by 7.0% in terms of the Area Under the Curve on the Ali-CCP dataset.	
在网络广告中，样本选择偏差问题是导致转化率估计不准确的主要原因。当前的主流解决方案只在点击空间中执行基于因果关系的优化，因为非点击空间中没有转换标签。然而，对未点击样本的优化同样重要，因为非点击空间比点击空间包含更多的样本和用户特征。为了利用未点击样本，我们提出了一个直接双倾向优化(DDPO)框架，直接在印象空间中对点击样本和未点击样本进行优化。在这个框架中，我们具体设计了一个点击倾向网络和一个转换倾向网络。点击倾向网络致力于确保点击空间的优化是无偏的。转换倾向网络的设计目的是为未点击样本生成伪转换标签，从而克服非点击空间中标签缺失的困难。有了这两个倾向网络，我们就能够在点击空间和非点击空间进行基于因果关系的优化。此外，为了加强因果关系，我们设计了两个具有注意机制的因果传递模块用于转化率预测模型。建议的框架是根据五个真实世界的公共数据集和一个私人腾讯广告数据集进行评估的。实验结果表明，该方法能够显著提高预测性能。例如，在 Ali-CCP 数据集的曲线下面积方面，我们的方法比以前最先进的方法高出7.0% 。	code	0
A Generic Behavior-Aware Data Augmentation Framework for Sequential Recommendation	Jing Xiao, Weike Pan, Zhong Ming	Shenzhen University	Multi-behavior sequential recommendation (MBSR), which models multi-behavior sequentiality and heterogeneity to better learn users' multifaceted intentions, has achieved remarkable success. Though effective, the performance of these approaches may be limited due to the sparsity inherent in real-world data. Existing data augmentation methods in recommender systems focus solely on a single type of behavior, overlooking the variations in expressing user preferences via different types of behaviors. During the augmentation of samples, it is easy to introduce excessive disturbance or noise, which may mislead the next-item recommendation. To address this limitation, we propose a novel generic framework called multi-behavior data augmentation for sequential recommendation (MBASR). Specifically, we design three behavior-aware data augmentation operations to construct rich training samples. Each augmentation operation takes into account the correlations between behaviors and aligns with the users' behavior patterns. In addition, we introduce a position-based sampling strategy that can effectively reduce the perturbation brought by the augmentation operations to the original data. Note that our model is data-oriented and can thus be embedded in different downstream MBSR models, so the overall framework is generic. Extensive experiments on three real-world datasets demonstrate the effectiveness of our MBASR and its applicability to a wide variety of mainstream MBSR models. Our source code is available at https://github.com/XiaoJing-C/MBASR.	
多行为顺序推荐(MBSR)模型对多行为顺序性和异构性进行建模，以更好地了解用户的多方面意图，已取得了显著的成功。尽管这些方法有效，但由于真实世界数据中固有的稀疏性，它们的性能可能会受到限制。推荐系统中现有的数据增强方法只关注单一类型的行为，忽略了通过不同类型的行为表达用户偏好的差异。在样本的增大过程中，容易引入过多的干扰或噪声，从而误导下一项的推荐。为了解决这个问题，我们提出了一种新的通用框架，称为序贯推荐的多行为数据增强(MBASR)。具体来说，我们设计了三个行为感知的数据增强操作来构造丰富的训练样本。每个增强操作都考虑到行为之间的相关性，并与用户的行为模式保持一致。此外，我们还引入了一种基于位置的采样策略，可以有效地减少增广操作对原始数据的干扰。注意，我们的模型是面向数据的，因此可以嵌入到不同的下游 MBSR 模型中，所以总体框架是通用的。在三个实际数据集上的大量实验证明了我们的 MBASR 的有效性及其对各种主流 MBSR 模型的适用性。我们的源代码可以在 https://github.com/XiaoJing-C/MBASR 找到。	code	0
FineRec: Exploring Fine-grained Sequential Recommendation	Xiaokun Zhang, Bo Xu, Youlin Wu, Yuan Zhong, Hongfei Lin, Fenglong Ma	Dalian University of Technology; Pennsylvania State University	Sequential recommendation is dedicated to offering items of interest for users based on their history behaviors. The attribute-opinion pairs, expressed by users in their reviews for items, provide the potentials to capture user preferences and item characteristics at a fine-grained level. To this end, we propose a novel framework FineRec that explores the attribute-opinion pairs of reviews to finely handle sequential recommendation. Specifically, we utilize a large language model to extract attribute-opinion pairs from reviews. For each attribute, a unique attribute-specific user-opinion-item graph is created, where corresponding opinions serve as the edges linking heterogeneous user and item nodes. Afterwards, we devise a diversity-aware convolution operation to aggregate information within the graphs, enabling attribute-specific user and item representation learning. Ultimately, we present an interaction-driven fusion mechanism to integrate attribute-specific user/item representations across all attributes for generating recommendations. Extensive experiments conducted on several real-world datasets demonstrate the superiority of our FineRec over existing state-of-the-art methods. Further analysis also verifies the effectiveness of our fine-grained manner in handling the task.	序列推荐致力于根据用户的历史行为为他们提供感兴趣的项目。由用户在项目评论中表达的属性-意见对提供了在细粒度水平上捕获用户偏好和项目特征的潜力。为此，我们提出了一个新的框架 FineRec，探索评论的属性-意见对，以精细处理顺序推荐。具体来说，我们利用一个大型的语言模型来从评论中提取属性-意见对。对于每个属性，创建一个惟一的特定于属性的用户意见项图，其中相应的意见作为连接异构用户和项目节点的边。然后，我们设计一个多样性感知的卷积运算来聚集图中的信息，使特定属性的用户和项目表示学习。最后，我们提出了一种交互驱动的融合机制，用于跨所有属性集成特定于属性的用户/项表示，以生成建议。在几个真实世界数据集上进行的大量实验证明了我们的 FineRec 相对于现有最先进的方法的优越性。进一步的分析还验证了我们处理任务的细粒度方式的有效性。	code	0
ReFer: Retrieval-Enhanced Vertical Federated Recommendation for Full Set User Benefit	Wenjie Li, Zhongren Wang, Jinpeng Wang, Shutao Xia, Jile Zhu, Mingjian Chen, Jiangke Fan, Jia Cheng, Jun Lei	Tsinghua University; Meituan	As an emerging privacy-preserving approach to leveraging cross-platform user interactions, vertical federated learning (VFL) has been increasingly applied in recommender systems. However, vanilla VFL is only applicable to overlapped users, ignoring potential universal interest patterns hidden among non-overlapped users and suffers from limited user group benefits, which hinders its application in real-world recommenders. In this paper, we extend the traditional vertical federated recommendation problem (VFR) to a more realistic Fully-Vertical federated recommendation setting (Fully-VFR) which aims to utilize all available data and serve full user groups. To tackle challenges in implementing Fully-VFR, we propose a Retrieval-enhanced Vertical Federated recommender (ReFer), a groundbreaking initiative that explores retrieval-enhanced machine learning approaches in VFL. Specifically, we establish a general "retrieval-and-utilization" algorithm to enhance the quality of representations across all parties. We design a flexible federated retrieval augmentation (RA) mechanism for VFL: (i) Cross-RA to complement field missing and (ii) Local-RA to promote mutual understanding between user groups. We conduct extensive experiments on both public and industry datasets. Results on both sequential and non-sequential CTR prediction tasks demonstrate that our method achieves significant performance improvements over baselines and is beneficial for all user groups.	
作为一种新兴的利用跨平台用户交互的隐私保护方法，垂直联邦学习(VFL)在推荐系统中得到了越来越多的应用。然而，普通的 VFL 只适用于重叠用户，忽略了隐藏在非重叠用户之间的潜在通用兴趣模式，并且受到用户组好处的限制，这阻碍了它在实际推荐中的应用。本文将传统的垂直联邦推荐问题(VFR)扩展到一个更加现实的全垂直联邦推荐设置(Fully-VFR) ，其目的是利用所有可用的数据，为全用户组提供服务。为了解决实施 Fully-VFR 的挑战，我们提出了一个检索增强垂直联邦推荐器(ReFer) ，一个突破性的倡议，探索检索增强机器学习方法在 VFL 中的应用。具体来说，我们建立了一个通用的“检索和利用”算法，以提高所有各方的表示质量。我们设计了一个灵活的 VFL 联邦检索增强(RA)机制: (i)交叉 RA 来补充字段缺失; (ii)本地 RA 来促进用户组之间的相互理解。我们在公共和行业数据集上进行广泛的实验。在顺序和非顺序 CTR 预测任务中的结果表明，我们的方法比基线性能有了显著的提高，并且对所有用户组都有利。	code	0
Pacer and Runner: Cooperative Learning Framework between Single- and Cross-Domain Sequential Recommendation	Chung Park, Taesan Kim, Hyungjun Yoon, Junui Hong, Yelim Yu, Mincheol Cho, Minsung Choi, Jaegul Choo	SK Telecom; Korea Advanced Institute of Science and Technology; SK Telecom / KAIST	Cross-Domain Sequential Recommendation (CDSR) improves recommendation performance by utilizing information from multiple domains, which contrasts with Single-Domain Sequential Recommendation (SDSR) that relies on a historical interaction within a specific domain. However, CDSR may underperform compared to the SDSR approach in certain domains due to negative transfer, which occurs when there is a lack of relation between domains or different levels of data sparsity. To address the issue of negative transfer, our proposed CDSR model estimates the degree of negative transfer of each domain and adaptively assigns it as a weight factor to the prediction loss, to control gradient flows through domains with significant negative transfer. To this end, our model compares the performance of a model trained on multiple domains (CDSR) with a model trained solely on the specific domain (SDSR) to evaluate the negative transfer of each domain using our asymmetric cooperative network. In addition, to facilitate the transfer of valuable cues between the SDSR and CDSR tasks, we developed an auxiliary loss that maximizes the mutual information between the representation pairs from both tasks on a per-domain basis. This cooperative learning between SDSR and CDSR tasks is similar to the collaborative dynamics between pacers and runners in a marathon. Our model outperformed numerous previous works in extensive experiments on two real-world industrial datasets across ten service domains. 
We also have deployed our model in the recommendation system of our personal assistant app service, resulting in a 21.4% increase in click-through rate compared to existing models, which is valuable to real-world business.	跨域序列推荐(CDSR)通过利用来自多个域的信息来提高推荐性能，这与依赖于特定域内的历史交互的单域序列推荐(SDSR)形成了鲜明的对比。然而，CDSR 方法在某些领域的表现可能不如 SDSR 方法，这是由于负迁移，这种负迁移发生在领域之间缺乏联系或不同层次的数据稀疏时。为了解决负迁移问题，我们提出的 CDSR 模型估计每个域的负迁移程度，并自适应地将其作为预测损失的权重因子，以控制梯度流通过具有显著负迁移的域。为此，我们的模型比较了在多域(CDSR)训练的模型和单独在特定域(SDSR)训练的模型的性能，以评估使用我们的非对称合作网络的每个域的负迁移。此外，为了促进 SDSR 和 CDSR 任务之间有价值线索的传递，我们开发了一个辅助损失模型，该模型在每个领域的基础上最大化两个任务表征对之间的相互信息。SDSR 和 CDSR 任务之间的协作学习类似于马拉松中配速员和跑步者之间的协作动力学。我们的模型在十个服务领域的两个实际工业数据集上进行了广泛的实验，其性能优于以前的许多工作。我们也在个人助理应用程序服务的推荐系统中使用了我们的模型，与现有模型相比，点进率增加了21.4% ，这对于现实世界的商业来说是很有价值的。	code	0
Aiming at the Target: Filter Collaborative Information for Cross-Domain Recommendation	Hanyu Li, Weizhi Ma, Peijie Sun, Jiayu Li, Cunxiang Yin, Yancheng He, Guoqiang Xu, Min Zhang, Shaoping Ma	Tencent; Tsinghua University	Cross-domain recommender (CDR) systems aim to enhance the performance of the target domain by utilizing data from other related domains. However, irrelevant information from the source domain may instead degrade target domain performance, which is known as the negative transfer problem. There have been some attempts to address this problem, mostly by designing adaptive representations for overlapped users. Whereas, representation adaptions solely rely on the expressive capacity of the CDR model, lacking explicit constraint to filter the irrelevant source-domain collaborative information for the target domain. In this paper, we propose a novel Collaborative information regularized User Transformation (CUT) framework to tackle the negative transfer problem by directly filtering users' collaborative information. In CUT, user similarity in the target domain is adopted as a constraint for user transformation learning to filter the user collaborative information from the source domain. CUT first learns user similarity relationships from the target domain. Then, source-target information transfer is guided by the user similarity, where we design a user transformation layer to learn target-domain user representations and a contrastive loss to supervise the user collaborative information transferred. The results show significant performance improvement of CUT compared with SOTA single and cross-domain methods. Further analysis of the target-domain results illustrates that CUT can effectively alleviate the negative transfer problem.	
跨域推荐(CDR)系统旨在通过利用其他相关域的数据来提高目标域的性能。然而，来自源域的不相关信息反而会降低目标域的性能，这就是所谓的负迁移问题。已经有一些尝试来解决这个问题，主要是通过为重叠用户设计自适应表示。然而，表示适配仅仅依赖于 CDR 模型的表达能力，缺乏明确的约束来过滤不相关的源域协同信息。本文提出了一种新的协同信息规范化用户转换(CUT)框架，通过直接过滤用户的协同信息来解决负迁移问题。在 CUT 中，采用目标域中的用户相似度作为用户转换学习的约束条件，对源域中的用户协作信息进行过滤。CUT 首先从目标域学习用户相似性关系。然后，以用户相似性为指导，设计了用户转换层来学习目标域用户表示，并通过对比度损失来监督用户协同信息的传输。结果表明，与 SOTA 单域和跨域方法相比，CUT 的性能有了显著的提高。对目标域结果的进一步分析表明，CUT 可以有效地缓解负迁移问题。	code	0
On the Negative Perception of Cross-domain Recommendations and Explanations	Denis Kotkov, Alan Medlar, Yang Liu, Dorota Glowacka	University of Helsinki	Recommender systems typically operate within a single domain, for example, recommending books based on users' reading habits. If such data is unavailable, it may be possible to make cross-domain recommendations and recommend books based on user preferences from another domain, such as movies. However, despite considerable research on cross-domain recommendations, no studies have investigated their impact on users' behavioural intentions or system perceptions compared to single-domain recommendations. Similarly, while single-domain explanations have been shown to improve users' perceptions of recommendations, there are no comparable studies for the cross-domain case. In this article, we present a between-subject study (N=237) of users' behavioural intentions and perceptions of book recommendations. The study was designed to disentangle the effects of whether recommendations were single- or cross-domain from whether explanations were present or not. Our results show that cross-domain recommendations have lower trust and interest than single-domain recommendations, regardless of their quality. While these negative effects can be ameliorated by cross-domain explanations, they are still perceived as inferior to single-domain recommendations without explanations. Last, we show that explanations decrease interest in the single-domain case, but increase perceived transparency and scrutability in both single- and cross-domain recommendations. Our findings offer valuable insights into the impact of recommendation provenance on user experience and could inform the future development of cross-domain recommender systems.	
推荐系统通常在单一领域内运作，例如，根据用户的阅读习惯推荐书籍。如果这样的数据是不可用的，它可能会作出跨领域的建议，并推荐书籍的基础上用户喜好从另一个领域，如电影。然而，尽管对跨领域建议进行了大量的研究，但没有研究调查它们对用户行为意图或系统感知的影响，与单领域建议相比。同样，虽然单一领域的解释已被证明可以改善用户对推荐的看法，但是对于跨领域的案例没有可比较的研究。在这篇文章中，我们提出了一个主题间的研究(N = 237)用户的行为意图和感知的书籍推荐。这项研究的目的是将建议是单一还是跨领域的影响与解释是否存在区分开来。我们的研究结果表明，无论其质量如何，跨域建议比单域建议具有更低的信任度和兴趣。虽然这些负面影响可以通过跨领域的解释得到改善，但它们仍然被认为不如没有解释的单领域建议。最后，我们表明，解释降低兴趣的单一领域的情况下，但增加感知的透明度和审查在单一和跨领域的建议。我们的研究结果为推荐来源对用户体验的影响提供了有价值的见解，并且可以为跨域推荐系统的未来发展提供信息。	code	0
Multi-Domain Sequential Recommendation via Domain Space Learning	Junyoung Hwang, Hyunjun Ju, SeongKu Kang, Sanghwan Jang, Hwanjo Yu	University of Illinois Urbana-Champaign; 42dot; Pohang University of Science and Technology	This paper explores Multi-Domain Sequential Recommendation (MDSR), an advancement of Multi-Domain Recommendation that incorporates sequential context. A recent MDSR approach exploits domain-specific sequences, decoupled from mixed-domain histories, to model domain-specific sequential preference, and uses mixed-domain histories to model domain-shared sequential preference. However, the approach faces challenges in accurately obtaining domain-specific sequential preferences in the target domain, especially when users only occasionally engage with it. In such cases, the history of users in the target domain is limited or not recent, leading the sequential recommender system to capture inaccurate domain-specific sequential preferences. To address this limitation, this paper introduces Multi-Domain Sequential Recommendation via Domain Space Learning (MDSR-DSL). Our approach utilizes cross-domain items to supplement missing sequential context in domain-specific sequences. It involves creating a "domain space" to maintain and utilize the unique characteristics of each domain and a domain-to-domain adaptation mechanism to transform item representations across domain spaces. To validate the effectiveness of MDSR-DSL, this paper extensively compares it with state-of-the-art MD(S)R methods and provides detailed analyses.	多域顺序推荐(MDSR)是结合顺序上下文的多域推荐的一种进步。最近的 MDSR 方法利用领域特定的序列，从混合领域历史解耦，建模领域特定的顺序偏好，并使用混合领域历史建模领域共享的顺序偏好。然而，该方法在准确获取目标域中特定于领域的顺序首选项时面临挑战，特别是当用户只是偶尔使用它时。在这种情况下，目标域的用户历史是有限的或不是最近的，导致顺序推荐系统捕获不准确的领域特定的顺序首选项。针对这一局限性，本文引入了基于领域空间学习的多领域序贯推荐(MDSR-DSL)。我们的方法利用跨领域的项目来补充领域特定序列中缺少的顺序上下文。它包括创建一个“域空间”来维护和利用每个域的独特特征，以及一个域到域的适应机制来跨域空间转换项表示。为了验证 MDSR-DSL 的有效性，本文将其与最新的 MD (S) R 方法进行了广泛的比较，并给出了详细的分析。	code	0
Behavior Alignment: A New Perspective of Evaluating LLM-based Conversational Recommendation Systems	Dayu Yang, Fumian Chen, Hui Fang	University of Delaware	Large Language Models (LLMs) have demonstrated great potential in Conversational Recommender Systems (CRS). However, the application of LLMs to CRS has exposed a notable discrepancy in behavior between LLM-based CRS and human recommenders: LLMs often appear inflexible and passive, frequently rushing to complete the recommendation task without sufficient inquiry. This behavior discrepancy can lead to decreased accuracy in recommendations and lower user satisfaction. Despite its importance, existing studies in CRS lack a study about how to measure such behavior discrepancy. To fill this gap, we propose Behavior Alignment, a new evaluation metric to measure how well the recommendation strategies made by an LLM-based CRS are consistent with human recommenders'. Our experiment results show that the new metric is better aligned with human preferences and can better differentiate how systems perform than existing evaluation metrics. As Behavior Alignment requires explicit and costly human annotations on the recommendation strategies, we also propose a classification-based method to implicitly measure the Behavior Alignment based on the responses. The evaluation results confirm the robustness of the method.	大语言模型(LLM)在会话推荐系统(CRS)中显示出巨大的潜力。然而，LLM 在 CRS 中的应用暴露了基于 LLM 的 CRS 和人类推荐者之间显着的行为差异: LLM 往往显得不灵活和被动，经常在没有充分询问的情况下匆忙完成推荐任务。这种行为差异可能导致推荐的准确性下降和用户满意度降低。尽管 CRS 具有重要意义，但是现有的研究缺乏如何测量这种行为差异的研究。为了填补这个空白，我们提出了行为校准，一个新的评估指标，以衡量如何以 LLM 为基础的 CRS 的推荐策略是一致的人类推荐者的。我们的实验结果表明，与现有的评估指标相比，新的指标更符合人类的偏好，能够更好地区分系统的执行情况。由于行为对齐需要对推荐策略进行明确而昂贵的人工注释，我们还提出了一种基于分类的方法来隐式地度量基于响应的行为对齐。评价结果证实了该方法的鲁棒性。	code	0
Bi-Objective Negative Sampling for Sensitivity-Aware Search	Jack McKechnie, Graham McDonald, Craig Macdonald	University of Glasgow	Cross-encoders leverage fine-grained interactions between documents and queries for effective relevance ranking. Such ranking models are typically trained to satisfy the single objective of providing relevant information to the users. However, not all information should be made available. For example, documents containing sensitive information, such as personal or confidential information, should not be returned in the search results. Sensitivity-aware search (SAS) aims to develop retrieval models that can satisfy two objectives, namely: (1) providing the user with relevant search results, while (2) ensuring that no documents that contain sensitive information are included in the ranking. In this work, we propose three novel negative sampling strategies that enable cross-encoders to be trained to satisfy the bi-objective task of SAS. Additionally, we investigate and compare with filtering sensitive documents in ranking pipelines. Our experiments on a collection labelled for sensitivity show that our proposed negative sampling strategies lead to a ~37% increase in terms of cost-sensitive nDCG (nCSDCG) for SAS.	交叉编码器利用文档和查询之间的细粒度交互来进行有效的相关性排序。这种排名模型通常经过训练，以满足向用户提供相关信息的单一目标。然而，并非所有的信息都应该提供。例如，包含敏感信息(如个人或机密信息)的文档不应在搜索结果中返回。敏感性搜索(SAS)旨在开发能够满足两个目标的检索模型，即: (1)为用户提供相关的搜索结果，同时(2)确保没有包含敏感信息的文档被包含在排名中。在这项工作中，我们提出了三种新颖的负采样策略，使交叉编码器的训练，以满足 SAS 的双目标任务。此外，我们还研究和比较了在排序管道中过滤敏感文档的方法。我们对标记为敏感性的集合的实验表明，我们提出的阴性采样策略导致 SAS 的成本敏感性 nDCG (nCSDCG)增加约37% 。	code	0
Relevance Feedback Method For Patent Searching Using Vector Subspaces	Sebastian Björkqvist	IPRally Technologies Oy	Searching for novelty-destroying prior art is an important part of patent application drafting and invalidation. The task is challenging due to the detailed information needed to determine whether a document is novelty-destroying or simply closely related, resulting in the original search results not always being fully on target. Allowing the user to provide feedback on the relevance of the initial search results and iterating on the search may thus improve the results significantly. We present a relevance feedback method based on computing the affine vector subspace spanned by the relevant document vectors. The method can be used with any dense retrieval system, and we demonstrate its effectiveness in improving recall in prior art searches. We compare the subspace-based method to the Rocchio algorithm and show that the method is less sensitive to changes in hyperparameters when the number of relevant documents increases.	查找毁新技术是专利申请起草和失效的重要组成部分。这项任务具有挑战性，因为确定一份文件是否具有新颖性或仅仅是密切相关所需的详细信息，导致原始搜索结果并不总是完全符合目标。因此，允许用户就初始搜索结果的相关性提供反馈并对搜索进行迭代，可以大大改进搜索结果。我们提出了一种基于计算相关文档向量所跨越的仿射向量子空间的关联反馈方法。该方法可以应用于任何密集检索系统，并证明了该方法在提高现有技术检索中的召回率方面的有效性。我们比较了基于子空间的方法和 Rocchio 算法，发现当相关文档数量增加时，该方法对超参数的变化不太敏感。	code	0
Efficient Inverted Indexes for Approximate Retrieval over Learned Sparse Representations	Sebastian Bruch, Franco Maria Nardini, Cosimo Rulli, Rossano Venturini	Pinecone; Dipartimento di Informatica, Università di Pisa; ISTI-CNR	Learned sparse representations form an attractive class of contextual embeddings for text retrieval. That is so because they are effective models of relevance and are interpretable by design. Despite their apparent compatibility with inverted indexes, however, retrieval over sparse embeddings remains challenging. That is due to the distributional differences between learned embeddings and term frequency-based lexical models of relevance such as BM25. Recognizing this challenge, a great deal of research has gone into, among other things, designing retrieval algorithms tailored to the properties of learned sparse representations, including approximate retrieval systems. In fact, this task featured prominently in the latest BigANN Challenge at NeurIPS 2023, where approximate algorithms were evaluated on a large benchmark dataset by throughput and recall. In this work, we propose a novel organization of the inverted index that enables fast yet effective approximate retrieval over learned sparse embeddings. Our approach organizes inverted lists into geometrically-cohesive blocks, each equipped with a summary vector. During query processing, we quickly determine if a block must be evaluated using the summaries. As we show experimentally, single-threaded query processing using our method, Seismic, reaches sub-millisecond per-query latency on various sparse embeddings of the MS MARCO dataset while maintaining high recall. Our results indicate that Seismic is one to two orders of magnitude faster than state-of-the-art inverted index-based solutions and further outperforms the winning (graph-based) submissions to the BigANN Challenge by a significant margin.	
学习的稀疏表示形成了一类有吸引力的文本检索上下文嵌入。之所以如此，是因为它们是有效的相关性模型，可以通过设计加以解释。然而，尽管它们与反向索引具有明显的兼容性，但是通过稀疏嵌入进行检索仍然具有挑战性。这是由于学习嵌入和基于词汇频率的关联词汇模型(如 BM25)之间的分布差异造成的。认识到这一挑战，大量的研究已经进入，除其他事项外，设计检索算法适合于学习稀疏表示的属性，包括近似检索系统。事实上，这项任务在 NeurIPS 2023最新的 BigANN 挑战中占有显著地位，在这个挑战中，通过吞吐量和召回率对大型基准数据集上的近似算法进行了评估。在这项工作中，我们提出了一种新的组织倒排索引，使快速而有效的近似检索学习稀疏嵌入。我们的方法将倒排的列表组织成具有几何内聚性的块，每个块配备一个汇总向量。在查询处理过程中，我们快速确定是否必须使用摘要计算块。正如我们的实验表明，使用我们的方法，地震，单线程查询处理达到亚毫秒每查询延迟各种稀疏嵌入的 MS MARCO 数据集，同时保持高召回率。我们的研究结果表明，地震数量级比最先进的基于倒排索引的解决方案快一到两倍，并进一步优于 BigANN 挑战赛的获胜者(基于图表的)。	code	0
Can We Trust Recommender System Fairness Evaluation? The Role of Fairness and Relevance	Theresia Veronika Rampisela, Tuukka Ruotsalo, Maria Maistro, Christina Lioma	University of Copenhagen	Relevance and fairness are two major objectives of recommender systems (RSs). Recent work proposes measures of RS fairness that are either independent from relevance (fairness-only) or conditioned on relevance (joint measures). While fairness-only measures have been studied extensively, we look into whether joint measures can be trusted. We collect all joint evaluation measures of RS relevance and fairness, and ask: How much do they agree with each other? To what extent do they agree with relevance/fairness measures? How sensitive are they to changes in rank position, or to increasingly fair and relevant recommendations? We empirically study for the first time the behaviour of these measures across 4 real-world datasets and 4 recommenders. We find that most of these measures: i) correlate weakly with one another and even contradict each other at times; ii) are less sensitive to rank position changes than relevance- and fairness-only measures, meaning that they are less granular than traditional RS measures; and iii) tend to compress scores at the low end of their range, meaning that they are not very expressive. We counter the above limitations with a set of guidelines on the appropriate usage of such measures, i.e., they should be used with caution due to their tendency to contradict each other and of having a very small empirical range.	相关性和公平性是推荐系统的两个主要目标。最近的研究提出了 RS 公平性的测量方法，这些测量方法要么独立于相关性(仅仅是公平性) ，要么以相关性(联合测量)为条件。虽然只有公平的措施已经得到了广泛的研究，但是我们研究的是联合措施是否可以信任。我们收集了所有 RS 相关性和公平性的联合评价指标，并问: 它们之间有多大程度的一致性？它们在多大程度上同意相关性/公平性措施？他们对职位的变化，或者对越来越公平和相关的建议有多敏感？我们首次实证研究了这些措施的行为在4个真实世界的数据集和4个推荐。我们发现这些测量中的大多数: i)彼此之间相关性很弱，有时甚至相互矛盾; ii)对排名位置变化的敏感性低于相关性和公平性测量，这意味着它们比传统的 RS 测量粒度更小; iii)倾向于压缩其范围的低端分数，这意味着它们不是非常具有表现力。针对上述限制，我们制定了一套关于适当使用此类措施的指导方针，即应谨慎使用这些措施，因为它们往往相互矛盾，而且经验范围很小。	code	0
Sequential Recommendation with Latent Relations based on Large Language Model	Shenghao Yang, Weizhi Ma, Peijie Sun, Qingyao Ai, Yiqun Liu, Mingchen Cai, Min Zhang	Tsinghua University; Meituan	Sequential recommender systems predict items that may interest users by modeling their preferences based on historical interactions. Traditional sequential recommendation methods rely on capturing implicit collaborative filtering signals among items. Recent relation-aware sequential recommendation models have achieved promising performance by explicitly incorporating item relations into the modeling of user historical sequences, where most relations are extracted from knowledge graphs. However, existing methods rely on manually predefined relations and suffer from the sparsity issue, limiting the generalization ability in diverse scenarios with varied item relations. In this paper, we propose a novel relation-aware sequential recommendation framework with Latent Relation Discovery (LRD). Different from previous relation-aware models that rely on predefined rules, we propose to leverage the Large Language Model (LLM) to provide new types of relations and connections between items. The motivation is that LLM contains abundant world knowledge, which can be adopted to mine latent relations of items for recommendation. Specifically, inspired by the fact that humans can describe relations between items using natural language, LRD harnesses the LLM that has demonstrated human-like knowledge to obtain language knowledge representations of items. These representations are fed into a latent relation discovery module based on the discrete state variational autoencoder (DVAE). Then the self-supervised relation discovery tasks and recommendation tasks are jointly optimized. Experimental results on multiple public datasets demonstrate that our proposed latent relation discovery method can be incorporated into existing relation-aware sequential recommendation models and significantly improves performance. Further analysis experiments indicate the effectiveness and reliability of the discovered latent relations.	顺序推荐系统通过基于历史交互对用户偏好进行建模来预测用户可能感兴趣的项目。传统的顺序推荐方法依赖于捕捉项目之间隐含的协同过滤信号。最近的关系感知序列推荐模型已经取得了良好的性能，明确地结合项目关系到用户历史序列的建模，其中大多数关系是从知识图提取。然而，现有的方法依赖于人工预定义的关系，并且存在稀疏性问题，限制了在不同项目关系的不同场景中的泛化能力。本文提出了一种新的基于潜在关系发现(LRD)的关系感知序列推荐框架。与以前依赖于预定义规则的关系感知模型不同，我们建议利用大语言模型(LLM)来提供新类型的关系和项之间的连接。其动机是 LLM 包含了丰富的世界知识，可以用来挖掘推荐项目的潜在关系。具体来说，受到人类可以使用自然语言描述项目之间关系的启发，LRD 利用已经证明类似于人类的知识的 LLM 来获得项目的语言知识表示。这些表示被反馈到基于离散状态变分自动编码器(DVAE)的潜在关系发现模块中。然后对自监督关系发现任务和推荐任务进行联合优化。在多个公共数据集上的实验结果表明，本文提出的潜在关系发现方法可以与现有的关系感知顺序推荐模型相结合，从而显著提高推荐性能。进一步的分析实验表明了所发现的潜在关系的有效性和可靠性。	code	0
Enhancing Sequential Recommenders with Augmented Knowledge from Aligned Large Language Models	Yankun Ren, Zhongde Chen, Xinxing Yang, Longfei Li, Cong Jiang, Lei Cheng, Bo Zhang, Linjian Mo, Jun Zhou	Ant Group	Recommender systems are widely used in various online platforms. In the context of sequential recommendation, it is essential to accurately capture the chronological patterns in user activities to generate relevant recommendations. Conventional ID-based sequential recommenders have shown promise but lack comprehensive real-world knowledge about items, limiting their effectiveness. Recent advancements in Large Language Models (LLMs) offer the potential to bridge this gap by leveraging the extensive real-world knowledge encapsulated in LLMs. However, integrating LLMs into sequential recommender systems comes with its own challenges, including inadequate representation of sequential behavior patterns and long inference latency. In this paper, we propose SeRALM (Enhancing Sequential Recommenders with Augmented Knowledge from Aligned Large Language Models) to address these challenges. SeRALM integrates LLMs with conventional ID-based sequential recommenders for sequential recommendation tasks. We combine text-format knowledge generated by LLMs with item IDs and feed this enriched data into ID-based recommenders, benefitting from the strengths of both paradigms. Moreover, we develop a theoretically underpinned alignment training method to refine LLMs' generation using feedback from ID-based recommenders for better knowledge augmentation. We also present an asynchronous technique to expedite the alignment training process. Experimental results on public benchmarks demonstrate that SeRALM significantly improves the performances of ID-based sequential recommenders. Further, a series of ablation studies and analyses corroborate SeRALM's proficiency in steering LLMs to generate more pertinent and advantageous knowledge across diverse scenarios.	
推荐系统广泛应用于各种在线平台。在顺序推荐的背景下，准确地捕获用户活动中的顺序模式以生成相关的推荐是至关重要的。传统的基于 ID 的顺序推荐已经显示出希望，但是缺乏关于项目的全面的现实世界知识，限制了它们的有效性。大型语言模型(LLM)中的最新进展提供了通过利用 LLM 中封装的广泛的现实世界知识来弥补这一差距的潜力。然而，将 LLM 集成到顺序推荐系统中也有其自身的挑战，包括顺序行为模式的不充分表示和长的推理延迟。在这篇论文中，我们提出了 SeRALM (利用对齐大型语言模型的增强知识来增强顺序推荐器)来解决这些挑战。SeRALM 将 LLM 与传统的基于 ID 的顺序推荐器集成在一起，用于顺序推荐任务。我们将 LLM 生成的文本格式知识与项目 ID 结合起来，并将这些丰富的数据提供给基于 ID 的推荐程序，这两种范例的优势使我们受益匪浅。此外，我们开发了一个理论上支持的对齐训练方法来细化 LLM 的生成，使用基于 ID 的推荐者的反馈来更好地增强知识。我们还提出了一种异步技术，以加快对准训练过程。对公共基准测试的实验结果表明，SeRALM 显著提升了基于 ID 的顺序推荐器的性能。此外，一系列的消融研究和分析证实了 SeRALM 在指导 LLM 方面的能力，以便在不同的情况下产生更相关和更有利的知识。	code	0
Adaptive Fair Representation Learning for Personalized Fairness in Recommendations via Information Alignment	Xinyu Zhu, Lilin Zhang, Ning Yang	Sichuan University	Personalized fairness in recommendations has been attracting increasing attention from researchers. The existing works often treat a fairness requirement, represented as a collection of sensitive attributes, as a hyper-parameter, and pursue extreme fairness by completely removing information of sensitive attributes from the learned fair embedding, which suffer from two challenges: huge training cost incurred by the explosion of attribute combinations, and the suboptimal trade-off between fairness and accuracy. In this paper, we propose a novel Adaptive Fair Representation Learning (AFRL) model, which achieves a real personalized fairness due to its advantage of training only one model to adaptively serve different fairness requirements during inference phase. Particularly, AFRL treats fairness requirements as inputs and can learn an attribute-specific embedding for each attribute from the unfair user embedding, which endows AFRL with the adaptability during inference phase to determine the non-sensitive attributes under the guidance of the user's unique fairness requirement. To achieve a better trade-off between fairness and accuracy in recommendations, AFRL conducts a novel Information Alignment to exactly preserve discriminative information of non-sensitive attributes and incorporate a debiased collaborative embedding into the fair embedding to capture attribute-independent collaborative signals, without loss of fairness. Finally, the extensive experiments conducted on real datasets together with the sound theoretical analysis demonstrate the superiority of AFRL.	
推荐的个性化公平性越来越受到研究者的关注。现有的公平需求表示为一组敏感属性，是一个超参数，通过从学习公平嵌入中完全去除敏感属性的信息来追求极端公平，这种方法面临着两个挑战: 属性组合爆炸所带来的巨大训练成本，以及公平性和准确性之间的次优权衡。本文提出了一种新的自适应公平表示学习(AFRL)模型，该模型由于在推理阶段只训练一个模型来适应不同的公平需求，从而实现了真正的个性化公平。特别地，AFRL 将公平性要求视为输入，可以从不公平的用户嵌入中学习每个属性的特定属性嵌入，从而赋予 AFRL 在推理阶段在用户唯一公平性要求指导下确定非敏感属性的适应性。为了在推荐的公平性和准确性之间取得更好的平衡，AFRL 进行了一种新的信息对齐，以精确地保留非敏感属性的区分信息，并在公平嵌入中加入去偏见的协作嵌入，以捕获与属性无关的协作信号，而不会损失公平性。最后，在实际数据集上进行了广泛的实验，结合可靠的理论分析，验证了 AFRL 的优越性。	code	0
MealRec+: A Meal Recommendation Dataset with Meal-Course Affiliation for Personalization and Healthiness	Ming Li, Lin Li, Xiaohui Tao, Jimmy Xiangji Huang	York University; University of Southern Queensland; Wuhan University of Technology	Meal recommendation, as a typical health-related recommendation task, contains complex relationships between users, courses, and meals. Among them, meal-course affiliation associates user-meal and user-course interactions. However, an extensive literature review demonstrates that there is a lack of publicly available meal recommendation datasets including meal-course affiliation. Meal recommendation research has been constrained in exploring the impact of cooperation between two levels of interaction on personalization and healthiness. To pave the way for meal recommendation research, we introduce a new benchmark dataset called MealRec+. Due to constraints related to user health privacy and meal scenario characteristics, the collection of data that includes both meal-course affiliation and two levels of interactions is impeded. Therefore, a simulation method is adopted to derive meal-course affiliation and user-meal interaction from the user's dining sessions simulated based on user-course interaction data. Then, two well-known nutritional standards are used to calculate the healthiness scores of meals. Moreover, we experiment with several baseline models, including separate and cooperative interaction learning methods. Our experiment demonstrates that having the two levels of interaction cooperate in appropriate ways is beneficial for meal recommendations. Furthermore, in response to the less healthy recommendation phenomenon found in the experiment, we explore methods to enhance the healthiness of meal recommendations. The dataset is available on GitHub (https://github.com/WUT-IDEA/MealRecPlus).	
膳食推荐作为一项典型的与健康相关的推荐任务，包含用户、课程和膳食之间的复杂关系。其中，用餐过程的联系将用户-用餐和用户-过程的交互联系起来。然而，一个广泛的文献回顾表明，有缺乏公开可用的膳食推荐数据集，包括膳食过程的联系。饮食推荐研究在探讨两个互动水平之间的合作对个性化和健康的影响方面受到了限制。为了为膳食推荐研究铺平道路，我们引入了一个新的基准数据集 MealRec ^ + 。由于与用户健康隐私和用餐场景特征有关的限制，收集包括用餐过程关联和两个层次的互动的数据受到阻碍。为此，采用一种仿真方法，从基于用户-过程交互数据的用户用餐会话模拟中，推导出用户-过程关联关系和用户-用餐交互关系。然后，使用两个众所周知的营养标准来计算膳食的健康评分。此外，我们还实验了几个基线模型，包括分离式和合作式交互学习方法。我们的实验表明，以适当的方式协调两个层次的互动对于推荐用餐是有益的。此外，针对实验中发现的不太健康的推荐现象，我们探讨了提高膳食推荐健康性的方法。该数据集可在 GitHub ( https://GitHub.com/wut-idea/mealrecplus )上获得。	code	0
IISAN: Efficiently Adapting Multimodal Representation for Sequential Recommendation with Decoupled PEFT	Junchen Fu, Xuri Ge, Xin Xin, Alexandros Karatzoglou, Ioannis Arapakis, Jie Wang, Joemon M. Jose	University of Glasgow; University of Glasgow School of Computing Science; Telefonica Research; Shandong University; Amazon	Multimodal foundation models are transformative in sequential recommender systems, leveraging powerful representation learning capabilities. While Parameter-efficient Fine-tuning (PEFT) is commonly used to adapt foundation models for recommendation tasks, most research prioritizes parameter efficiency, often overlooking critical factors like GPU memory efficiency and training speed. Addressing this gap, our paper introduces IISAN (Intra- and Inter-modal Side Adapted Network for Multimodal Representation), a simple plug-and-play architecture using a Decoupled PEFT structure and exploiting both intra- and inter-modal adaptation. IISAN matches the performance of full fine-tuning (FFT) and state-of-the-art PEFT. More importantly, it significantly reduces GPU memory usage, from 47GB to just 3GB for multimodal sequential recommendation tasks. Additionally, it reduces training time per epoch from 443s to 22s compared to FFT. This is also a notable improvement over the Adapter and LoRA, which require 37-39 GB GPU memory and 350-380 seconds per epoch for training. Furthermore, we propose a new composite efficiency metric, TPME (Training-time, Parameter, and GPU Memory Efficiency) to alleviate the prevalent misconception that "parameter efficiency represents overall efficiency". TPME provides more comprehensive insights into practical efficiency comparisons between different methods. Besides, we give an accessible efficiency analysis of all PEFT and FFT approaches, which demonstrates the superiority of IISAN. We release our codes and other materials at https://github.com/GAIR-Lab/IISAN.	
多模态基础模型在顺序推荐系统中具有变革性，利用了强大的表示学习能力。虽然参数有效微调(PEFT)通常用于为推荐任务调整基础模型，但大多数研究优先考虑参数有效性，往往忽略了 GPU 内存效率和训练速度等关键因素。针对这一差距，本文介绍了 IISAN (Intra-and Inter-modal Side Adapted Network for Multimodal Reform) ，这是一个简单的即插即用的结构，采用了解耦 PEFT 结构，同时利用了模式内和模式间的自适应。IISAN 匹配全微调(FFT)和最先进的 PEFT 的性能。更重要的是，它显著降低了 GPU 内存使用量——对于多通道顺序推荐任务，从47GB 降至仅3GB。此外，与 FFT 相比，它将每个历元的训练时间从443秒提高到22秒。与 Adapter 和 LoRA 相比，这也是一个显著的改进，后者需要37-39 GB 的 GPU 内存和350-380秒每个纪元的训练时间。此外，我们提出了一个新的组合效率度量，TPME (训练时间，参数和 GPU 内存效率) ，以缓解流行的误解“参数效率代表整体效率”。TPME 为不同方法之间的实际效率比较提供了更全面的见解。此外，我们还对所有 PEFT 和 FFT 方法进行了有效性分析，从而验证了 IISAN 方法的优越性。我们在 https://github.com/gair-lab/iisan 公布我们的代码和其他材料。	code	0
FeB4RAG: Evaluating Federated Search in the Context of Retrieval Augmented Generation	Shuai Wang, Ekaterina Khramtsova, Shengyao Zhuang, Guido Zuccon	CSIRO; The University of Queensland School of Information Technology and Electrical Engineering; The University of Queensland ITEE	Federated search systems aggregate results from multiple search engines, selecting appropriate sources to enhance result quality and align with user intent. With the increasing uptake of Retrieval-Augmented Generation (RAG) pipelines, federated search can play a pivotal role in sourcing relevant information across heterogeneous data sources to generate informed responses. However, existing datasets, such as those developed in the past TREC FedWeb tracks, predate the RAG paradigm shift and lack representation of modern information retrieval challenges. To bridge this gap, we present FeB4RAG, a novel dataset specifically designed for federated search within RAG frameworks. This dataset, derived from 16 sub-collections of the widely used benchmarking collection, includes 790 information requests (akin to conversational queries) tailored for chatbot applications, along with top results returned by each resource and associated LLM-derived relevance judgements. Additionally, to support the need for this collection, we demonstrate the impact on response generation of a high quality federated search system for RAG compared to a naive approach to federated search. We do so by comparing answers generated through the RAG pipeline through a qualitative side-by-side comparison. Our collection fosters and supports the development and evaluation of new federated search methods, especially in the context of RAG pipelines.	
联邦搜索系统聚合来自多个搜索引擎的结果，选择适当的来源，以提高结果质量，并与用户意图保持一致。随着检索增强生成(RAG)流水线的日益普及，联邦搜索在跨异构数据源获取相关信息以产生知情响应方面可以发挥关键作用。然而，现有的数据集，比如在过去 TREC 联邦网络跟踪中开发的数据集，早于 RAG 范式转变，缺乏对现代信息检索挑战的描述。为了弥补这一差距，我们提出了 FeB4RAG，一个专门为 RAG 框架内的联邦搜索而设计的新型数据集。该数据集来源于广泛使用的基准测试集合的16个子集，包括为聊天机器人应用程序量身定制的790个信息请求(类似于对话查询) ，以及每个资源返回的最高结果和相关的 LLM 衍生的相关性判断。此外，为了支持对这个集合的需求，我们演示了 RAG 的高质量联邦搜索系统与联邦搜索的简单方法相比对响应生成的影响。我们通过定性的并行比较来比较通过 RAG 管道产生的答案。我们的集合支持开发和评估新的联邦搜索方法，特别是在 RAG 管道上下文中。	code	0
Dynamic Demonstration Retrieval and Cognitive Understanding for Emotional Support Conversation	Zhe Xu, Daoyuan Chen, Jiayi Kuang, Zihao Yi, Yaliang Li, Ying Shen	Alibaba Group; Sun Yat-sen University; Alibaba group	Emotional Support Conversation (ESC) systems are pivotal in providing empathetic interactions, aiding users through negative emotional states by understanding and addressing their unique experiences. In this paper, we tackle two key challenges in ESC: enhancing contextually relevant and empathetic response generation through dynamic demonstration retrieval, and advancing cognitive understanding to grasp implicit mental states comprehensively. We introduce Dynamic Demonstration Retrieval and Cognitive-Aspect Situation Understanding (), a novel approach that synergizes these elements to improve the quality of support provided in ESCs. By leveraging in-context learning and persona information, we introduce an innovative retrieval mechanism that selects informative and personalized demonstration pairs. We also propose a cognitive understanding module that utilizes four cognitive relationships from the ATOMIC knowledge source to deepen situational awareness of help-seekers' mental states. Our supportive decoder integrates information from diverse knowledge sources, underpinning response generation that is both empathetic and cognitively aware. The effectiveness of is demonstrated through extensive automatic and human evaluations, revealing substantial improvements over numerous state-of-the-art models, with up to 13.79% enhancement in overall performance of ten metrics. Our codes are available for public access to facilitate further research and development.	
情绪支持对话(ESC)系统是提供移情互动的关键，帮助用户通过理解和处理他们独特的经验的消极情绪状态。本文研究了 ESC 中的两个关键问题: 通过动态实证检索来提高情境相关性和同理心反应的产生; 通过提高认知理解来全面掌握内隐心理状态。我们介绍了动态演示检索和认知方面情境理解() ，一种新的方法，协同这些要素，以提高质量的支持提供在胚胎干细胞。通过利用上下文学习和人物角色信息，我们引入了一种创新的检索机制，选择信息丰富和个性化的演示对。我们还提出了一个认知理解模块，该模块利用来自 ATOMIC 知识源的四种认知关系来加深求助者心理状态的情势察觉。我们的支持性解码器整合了来自不同知识来源的信息，支持同理心和认知意识的反应生成。其有效性通过广泛的自动和人工评估得到了证实，显示出在许多最先进的模型上有了实质性的改进，10个指标的总体性能提高了13.79% 。我们的守则可供公众查阅，以促进进一步的研究和发展。	code	0
Broadening the View: Demonstration-augmented Prompt Learning for Conversational Recommendation	Huy Dao, Yang Deng, Dung D. Le, Lizi Liao	College of Engineering and Computer Science, VinUniversity; National University of Singapore; Singapore Management University	Conversational Recommender Systems (CRSs) leverage natural language dialogues to provide tailored recommendations. Traditional methods in this field primarily focus on extracting user preferences from isolated dialogues. It often yields responses with a limited perspective, confined to the scope of individual conversations. Recognizing the potential in collective dialogue examples, our research proposes an expanded approach for CRS models, utilizing selective analogues from dialogue histories and responses to enrich both generation and recommendation processes. This introduces significant research challenges, including: (1) How to secure high-quality collections of recommendation dialogue exemplars? (2) How to effectively leverage these exemplars to enhance CRS models? To tackle these challenges, we introduce a novel Demonstration-enhanced Conversational Recommender System (DCRS), which aims to strengthen its understanding on the given dialogue contexts by retrieving and learning from demonstrations. In particular, we first propose a knowledge-aware contrastive learning method that adeptly taps into the mentioned entities and the dialogue's contextual essence for pretraining the demonstration retriever. Subsequently, we further develop two adaptive demonstration-augmented prompt learning approaches, involving contextualized prompt learning and knowledge-enriched prompt learning, to bridge the gap between the retrieved demonstrations and the two end tasks of CRS, i.e., response generation and item recommendation, respectively. Rigorous evaluations on two established benchmark datasets underscore DCRS's superior performance over existing CRS methods in both item recommendation and response generation.	
会话推荐系统(CRS)利用自然语言对话提供量身定制的推荐。该领域的传统方法主要侧重于从孤立对话中提取用户首选项。它常常产生一个有限的视角的回应，局限于个人对话的范围。认识到集体对话实例的潜力，我们的研究提出了一种扩展的 CRS 模型方法，利用对话历史和回应中的选择性类比来丰富生成和推荐过程。这引入了重大的研究挑战，包括: (1)如何保证高质量的推荐对话样本集？(2)如何有效地利用这些范例来增强 CRS 模型？为了应对这些挑战，我们引入了一个新颖的示范增强会话推荐系统(dCRS) ，目的是通过检索和学习示范来加强对特定对话背景的理解。特别地，我们首先提出了一种知识感知的对比学习方法，该方法能够很好地利用上述实体和对话的语境本质来预先训练示范检索器。随后，我们进一步开发了两种适应性示范增强的及时学习方法，包括上下文化的及时学习和知识丰富的及时学习，以弥合检索的示范和 CRS 的两个最终任务之间的差距，即响应生成和项目推荐。对两个已建立的基准数据集的严格评估强调了 DCRS 在项目推荐和响应生成方面优于现有 CRS 方法的性能。	code	0
ProCIS: A Benchmark for Proactive Retrieval in Conversations	Chris Samarinas, Hamed Zamani	University of Massachusetts Amherst	The field of conversational information seeking, which is rapidly gaining interest in both academia and industry, is changing how we interact with search engines through natural language interactions. Existing datasets and methods are mostly evaluating reactive conversational information seeking systems that solely provide response to every query from the user. We identify a gap in building and evaluating proactive conversational information seeking systems that can monitor a multi-party human conversation and proactively engage in the conversation at an opportune moment by retrieving useful resources and suggestions. In this paper, we introduce a large-scale dataset for proactive document retrieval that consists of over 2.8 million conversations. We conduct crowdsourcing experiments to obtain high-quality and relatively complete relevance judgments through depth-k pooling. We also collect annotations related to the parts of the conversation that are related to each document, enabling us to evaluate proactive retrieval systems. We introduce normalized proactive discounted cumulative gain (npDCG) for evaluating these systems, and further provide benchmark results for a wide range of models, including a novel model we developed for this task. We believe that the developed dataset, called ProCIS, paves the path towards developing proactive conversational information seeking systems.	会话信息搜索领域正在迅速引起学术界和工业界的兴趣，它正在改变我们通过自然语言交互与搜索引擎进行互动的方式。现有的数据集和方法主要是评估反应式会话信息搜索系统，这种系统只对用户的每个查询提供响应。我们发现在建立和评估积极主动的会话信息搜索系统方面存在差距，这种系统可以监控多方的人类会话，并通过检索有用的资源和建议，在适当的时候积极主动地参与会话。在这篇文章中，我们介绍了一个大型的主动文献检索数据集，包括超过280万次对话。我们进行众包实验，以获得高质量和相对完整的相关性判断通过深度 k 池。我们还收集与会话中与每个文档相关的部分相关的注释，使我们能够评估主动检索系统。我们引入标准化的前瞻性折扣累积增益(npDCG)来评估这些系统，并进一步提供基准结果的范围广泛的模型，包括一个新的模型，我们开发的这项任务。我们相信，所开发的数据集，称为 ProCIS，铺平了发展前瞻性会话信息搜索系统的道路。	code	0
An Empirical Analysis on Multi-turn Conversational Recommender Systems	Lu Zhang, Chen Li, Yu Lei, Zhu Sun, Guanfeng Liu	Macquarie University; Agency for Science, Technology and Research, Singapore; Yanshan University; Chengdu University of Information Technology	The rise of conversational recommender systems (CRSs) brings the evolution of the recommendation paradigm, which enables users to interact with the system and achieve dynamic recommendations. As one essential branch, multi-turn CRSs, built on the user simulator paradigm, have attracted great attention due to their powerful ability to accomplish recommendations without real dialogue resources. Recent multi-turn CRS models, equipped with various delicately designed components (e.g., conversation module), achieve state-of-the-art (SOTA) performance. We, for the first time, propose a comprehensive experimental evaluation for existing SOTA multi-turn CRSs to investigate three research questions: (1) reproducibility - are the designed components beneficial to target multi-turn CRSs? (2) scenario-specific adaptability - how do these components perform in various scenarios? and (3) generality - can the effective components from the target CRS be effectively transferred to other multi-turn CRSs? To answer these questions, we design and conduct experiments under different settings, including carefully selected SOTA baselines, components of CRSs, datasets, and evaluation metrics, thus providing an experimental aspect overview of multi-turn CRSs. As a result, we derive several significant insights whereby effective guidelines are provided for future multi-turn CRS model designs across diverse scenarios.	
会话推荐系统(CRS)的兴起带来了推荐范式的演变，使得用户能够与系统进行交互，实现动态推荐。作为一个重要的分支，建立在用户模拟器范式之上的多回合 CRS 由于其在没有真实对话资源的情况下完成推荐的强大能力而引起了人们的极大关注。最近的多回转 CRS 模型，配备了各种精心设计的组件(例如，会话模块) ，实现了最先进的(SOTA)性能。我们首次对现有的 SOTA 多回转 CRS 进行了全面的实验评价，以探讨三个研究问题: (1)可重复性——所设计的部件是否有利于靶向多回转 CRS？(2)特定场景的适应性——这些组件在各种场景中如何执行？(3)通用性——目标 CRS 的有效部件能否有效地转移到其他多回路 CRS 上？为了回答这些问题，我们在不同的设置下设计和进行实验，包括精心选择的 SOTA 基线，CRS 的组成部分，数据集和评估指标，从而提供多回合 CRS 的实验方面的概述。因此，我们得出了几个重要的见解，从而为未来多回合 CRS 模型设计提供了有效的指导方针，跨越不同的情景。	code	0
SM-RS: Single- and Multi-Objective Recommendations with Contextual Impressions and Beyond-Accuracy Propensity Scores	Patrik Dokoupil, Ladislav Peska, Ludovico Boratto	University of Cagliari; Faculty of Mathematics and Physics, Charles University, Prague, Czechia	Recommender systems (RS) rely on interaction data between users and items to generate effective results. Historically, RS aimed to deliver the most consistent (i.e., accurate) items to the trained user profiles. However, the attention towards additional (beyond-accuracy) quality criteria has increased tremendously in recent years. Both the research and applied models are being optimized for diversity, novelty, or fairness, to name a few. Naturally, the proper functioning of such optimization methods depends on the knowledge of users' propensities towards interacting with recommendations having certain quality criteria. However, so far, no dataset that captures such propensities exists. To bridge this research gap, we present SM-RS (single-objective + multi-objective recommendations dataset) that links users' self-declared propensity toward relevance, novelty, and diversity criteria with impressions and corresponding item selections. After presenting the dataset's collection procedure and basic statistics, we propose three tasks that are rarely available to conduct using existing RS datasets: impressions-aware click prediction, users' propensity scores prediction, and construction of recommendations proportional to the users' propensity scores. For each task, we also provide detailed evaluation procedures and competitive baselines. The dataset is available at https://osf.io/hkzje/.	
推荐系统(RS)依赖于用户和项目之间的交互数据来生成有效的结果。从历史上看，RS 的目标是向训练有素的用户配置文件提供最一致(即准确)的条目。然而，对于额外的(超精确度)质量标准的关注在最近几年已经大大增加。研究和应用模型都在为多样性、新颖性或公平性而进行优化。当然，这种优化方法的正确功能取决于用户对具有某些质量标准的建议的交互倾向的了解。然而，到目前为止，还没有数据集能够捕捉到这种倾向。为了弥合这一研究差距，我们提出了 SM-RS (单目标 + 多目标推荐数据集) ，将用户自我声明的相关性，新颖性和多样性标准与印象和相应的项目选择联系起来。在介绍了数据集的收集过程和基本统计数据之后，我们提出了三个使用现有 RS 数据集很少可用的任务: 印象感知的点击预测，用户倾向得分预测，以及与用户倾向得分成比例的建议的构建。对于每项任务，我们还提供了详细的评估程序和竞争基线。数据集可在 https://osf.io/hkzje/下载。	code	0
To Search or to Recommend: Predicting Open-App Motivation with Neural Hawkes Process	Zhongxiang Sun, Zihua Si, Xiao Zhang, Xiaoxue Zang, Yang Song, Hongteng Xu, Jun Xu	Renmin University of China Gaoling School of Artificial Intelligence; Kuaishou Technology Co., Ltd. Recommendation; Kuaishou Technology Co., Ltd.; Renmin University of China; Renmin University of China Gaoling School of Artificial Intelligence	Incorporating Search and Recommendation (S&R) services within a singular application is prevalent in online platforms, leading to a new task termed open-app motivation prediction, which aims to predict whether users initiate the application with the specific intent of information searching, or to explore recommended content for entertainment. Studies have shown that predicting users' motivation to open an app can help to improve user engagement and enhance performance in various downstream tasks. However, accurately predicting open-app motivation is not trivial, as it is influenced by user-specific factors, search queries, clicked items, as well as their temporal occurrences. Furthermore, these activities occur sequentially and exhibit intricate temporal dependencies. Inspired by the success of the Neural Hawkes Process (NHP) in modeling temporal dependencies in sequences, this paper proposes a novel neural Hawkes process model to capture the temporal dependencies between historical user browsing and querying actions. The model, referred to as Neural Hawkes Process-based Open-App Motivation prediction model (NHP-OAM), employs a hierarchical transformer and a novel intensity function to encode multiple factors, and an open-app motivation prediction layer to integrate time and user-specific information for predicting users' open-app motivations. To demonstrate the superiority of our NHP-OAM model and construct a benchmark for the Open-App Motivation Prediction task, we not only extend the public S&R dataset ZhihuRec but also construct a new real-world Open-App Motivation Dataset (OAMD).
Experiments on these two datasets validate NHP-OAM's superiority over baseline models. Further downstream application experiments demonstrate NHP-OAM's effectiveness in predicting users' Open-App Motivation, highlighting the immense application value of NHP-OAM.	将搜索和推荐(S&R)服务整合到一个单一的应用程序中在在线平台中非常普遍，这导致了一个新的任务，即开放应用程序动机预测，其目的是预测用户是以信息搜索的特定意图启动应用程序，还是为娱乐探索推荐的内容。研究表明，预测用户打开应用程序的动机有助于提高用户参与度，并提高各种下游任务的性能。然而，准确预测开放应用程序的动机并非易事，因为它受到用户特定因素、搜索查询、点击项以及它们的时间出现的影响。此外，这些活动按顺序发生并表现出复杂的时间依赖性。受到神经霍克斯过程(NHP)在序列时间依赖性建模方面的成功启发，提出了一种新的神经霍克斯过程模型来捕捉历史用户浏览和查询操作之间的时间依赖性。该模型被称为基于神经霍克斯过程的开放应用动机预测模型(NHP-OAM) ，采用分层变换器和新颖的强度函数对多个因素进行编码，并使用开放应用动机预测层整合时间和用户特定信息来预测用户的开放应用动机。为了证明我们的 NHP-OAM 模型的优越性，构建开放应用动机预测任务的基准，我们不仅扩展了公共 S&R 数据集 ZhihuRec，而且构建了一个新的现实世界的开放应用动机数据集(OAMD)。在这两个数据集上的实验验证了 NHP-OAM 算法相对于基线模型的优越性。进一步的下游应用实验证明了 NHP-OAM 在预测用户开放应用动机方面的有效性，突出了 NHP-OAM 的巨大应用价值。	code	0
Counterfactual Ranking Evaluation with Flexible Click Models	Alexander Buchholz, Ben London, Giuseppe Di Benedetto, Jan Malte Lichtenberg, Yannik Stein, Thorsten Joachims	Amazon Music, Berlin, Germany; Amazon Music, Seattle, WA, USA; Amazon Music, Ithaca, NY, NY, USA	Evaluating a new ranking policy using data logged by a previously deployed policy requires a counterfactual (off-policy) estimator that corrects for presentation and selection biases. Some estimators (e.g., the position-based model) perform this correction by making strong assumptions about user behavior, which can lead to high bias if the assumptions are not met. Other estimators (e.g., the item-position model) rely on randomization to avoid these assumptions, but they often suffer from high variance. In this paper, we develop a new counterfactual estimator, called Interpol, that provides a tunable trade-off in the assumptions it makes, thus providing a novel ability to optimize the bias-variance trade-off. We analyze the bias of our estimator, both theoretically and empirically, and show that it achieves lower error than both the position-based model and the item-position model, on both synthetic and real datasets. This improvement in accuracy not only benefits offline evaluation of ranking policies, we also find that Interpol improves learning of new ranking policies when used as the training objective for learning-to-rank.	使用先前部署的策略记录的数据来评估新的排序策略需要一个反事实(非策略)估计器来纠正表示和选择偏差。一些估计量(例如，基于位置的模型)通过对用户行为做出强有力的假设来执行这种修正，如果假设不能满足，就会导致高偏差。其他估计量(例如，项目位置模型)依赖于随机化来避免这些假设，但是它们经常受到高方差的影响。在本文中，我们发展了一个新的反事实估计器，称为 Interpol，它在所做的假设上提供了一个可调的权衡，从而提供了一个新的能力，优化偏差-方差权衡。从理论和实证两个方面分析了估计器的偏差，结果表明，无论是在合成数据集上还是在实际数据集上，该估计器都比基于位置的模型和项目位置的模型具有更低的误差。这种准确性的提高不仅有利于排序策略的离线评估，我们还发现 Interpol 在用作学习排序的训练目标时改善了新排序策略的学习效果。	code	0
Deep Pattern Network for Click-Through Rate Prediction	Hengyu Zhang, Junwei Pan, Dapeng Liu, Jie Jiang, Xiu Li	Tencent; Tsinghua University; Tsinghua University	Click-through rate (CTR) prediction tasks play a pivotal role in real-world applications, particularly in recommendation systems and online advertising. A significant research branch in this domain focuses on user behavior modeling. Current research predominantly centers on modeling co-occurrence relationships between the target item and items previously interacted with by users in their historical data. However, this focus neglects the intricate modeling of user behavior patterns. In reality, the abundance of user interaction records encompasses diverse behavior patterns, indicative of a spectrum of habitual paradigms. These patterns harbor substantial potential to significantly enhance CTR prediction performance. To harness the informational potential within user behavior patterns, we extend Target Attention (TA) to Target Pattern Attention (TPA) to model pattern-level dependencies. Furthermore, three critical challenges demand attention: the inclusion of unrelated items within behavior patterns, data sparsity in behavior patterns, and computational complexity arising from numerous patterns. To address these challenges, we introduce the Deep Pattern Network (DPN), designed to comprehensively leverage information from user behavior patterns. DPN efficiently retrieves target-related user behavior patterns using a target-aware attention mechanism. Additionally, it contributes to refining user behavior patterns through a pre-training paradigm based on self-supervised learning while promoting dependency learning within sparse patterns. Our comprehensive experiments, conducted across three public datasets, substantiate the superior performance and broad compatibility of DPN.	
点击率(CTR)预测任务在现实世界的应用程序中扮演着关键角色，特别是在推荐系统和在线广告中。该领域的一个重要研究分支是用户行为建模。目前的研究主要集中在对目标项目与用户历史数据中先前交互过的项目之间的共现关系进行建模。然而，这种关注忽略了用户行为模式的复杂建模。实际上，用户交互记录的丰富性包含了不同的行为模式，表明了一系列的习惯范式。这些模式具有显著提高 CTR 预测性能的巨大潜力。为了利用用户行为模式中的信息潜力，我们将目标注意力(TA)扩展到目标模式注意力(TPA) ，以建立模式级别的依赖关系。此外，三个关键的挑战需要注意: 在行为模式中包含不相关的项目，行为模式中的数据稀疏，以及由许多模式引起的计算复杂性。为了应对这些挑战，我们引入了深度模式网络(Deep Pattern Network，DPN) ，它旨在全面利用来自用户行为模式的信息。DPN 使用目标感知注意机制有效地检索与目标相关的用户行为模式。此外，它有助于细化用户的行为模式，通过预训练范式的基础上自我监督学习，同时促进稀疏模式的依赖性学习。我们在三个公共数据集上进行的全面实验证实了 DPN 的优越性能和广泛的兼容性。	code	0
AFDGCF: Adaptive Feature De-correlation Graph Collaborative Filtering for Recommendations	Wei Wu, Chao Wang, Dazhong Shen, Chuan Qin, Liyi Chen, Hui Xiong	HKUST Fok Ying Tung Research Institute, The Hong Kong University of Science and Technology (Guangzhou); Shanghai Artificial Intelligence Laboratory; University of Science and Technology of China; The Hong Kong University of Science and Technology (Guangzhou); BOSS Zhipin	Collaborative filtering methods based on graph neural networks (GNNs) have witnessed significant success in recommender systems (RS), capitalizing on their ability to capture collaborative signals within intricate user-item relationships via message-passing mechanisms. However, these GNN-based RS inadvertently introduce excess linear correlation between user and item embeddings, contradicting the goal of providing personalized recommendations. While existing research predominantly ascribes this flaw to the over-smoothing problem, this paper underscores the critical, often overlooked role of the over-correlation issue in diminishing the effectiveness of GNN representations and subsequent recommendation performance. Up to now, the over-correlation issue remains unexplored in RS. Meanwhile, how to mitigate the impact of over-correlation while preserving collaborative filtering signals is a significant challenge. To this end, this paper aims to address the aforementioned gap by undertaking a comprehensive study of the over-correlation issue in graph collaborative filtering models. Firstly, we present empirical evidence to demonstrate the widespread prevalence of over-correlation in these models.
Subsequently, we dive into a theoretical analysis which establishes a pivotal connection between the over-correlation and over-smoothing issues. Leveraging these insights, we introduce the Adaptive Feature De-correlation Graph Collaborative Filtering (AFDGCF) framework, which dynamically applies correlation penalties to the feature dimensions of the representation matrix, effectively alleviating both over-correlation and over-smoothing issues. The efficacy of the proposed framework is corroborated through extensive experiments conducted with four representative graph collaborative filtering models across four publicly available datasets.	基于图形神经网络(GNN)的协同过滤方法在推荐系统(RS)中取得了巨大的成功，利用了它们通过消息传递机制在复杂的用户-项目关系中捕获协作信号的能力。然而，这些基于 GNN 的 RS 无意中在用户和项目嵌入之间引入了过多的线性相关性，与提供个性化推荐的目标相矛盾。虽然现有的研究主要把这个缺陷归因于过度平滑问题，但本文强调了过度相关问题在降低 GNN 表示的有效性和随后的推荐性能方面所起的关键但往往被忽视的作用。到目前为止，过度相关性问题在 RS 中仍然没有得到探讨。同时，如何在保留协同过滤信号的同时减轻过度相关的影响是一个重大挑战。为此，本文旨在通过对图形协同过滤模型中的过度相关问题进行全面研究来弥补上述差距。首先，我们提出的经验证据表明，在这些模型中过度相关的广泛流行。随后，我们深入进行了理论分析，建立了过度相关和过度平滑问题之间的关键联系。利用这些见解，我们引入了自适应特征去相关图协同过滤(AFDGCF)框架，该框架动态地将相关惩罚应用于表示矩阵的特征维度，有效地缓解了过度相关和过度平滑的问题。该框架的有效性通过四个具有代表性的图形协同过滤模型在四个公开数据集上进行的广泛实验得到了证实。	code	0
TransGNN: Harnessing the Collaborative Power of Transformers and Graph Neural Networks for Recommender Systems	Peiyan Zhang, Yuchen Yan, Xi Zhang, Chaozhuo Li, Senzhang Wang, Feiran Huang, Sunghun Kim	Hong Kong University of Science and Technology; Peking University School of Intelligence Science and Technology; Fuzhou University Interdisciplinary Institute for Medical Engineering; Jinan University; Microsoft Research Asia; Central South University	Graph Neural Networks (GNNs) have emerged as promising solutions for collaborative filtering (CF) through the modeling of user-item interaction graphs. The nucleus of existing GNN-based recommender systems involves recursive message passing along user-item interaction edges to refine encoded embeddings. Despite their demonstrated effectiveness, current GNN-based methods encounter challenges of limited receptive fields and the presence of noisy "interest-irrelevant" connections. In contrast, Transformer-based methods excel in aggregating information adaptively and globally. Nevertheless, their application to large-scale interaction graphs is hindered by inherent complexities and challenges in capturing intricate, entangled structural information. In this paper, we propose TransGNN, a novel model that integrates Transformer and GNN layers in an alternating fashion to mutually enhance their capabilities. Specifically, TransGNN leverages Transformer layers to broaden the receptive field and disentangle information aggregation from edges, which aggregates information from more relevant nodes, thereby enhancing the message passing of GNNs. Additionally, to capture graph structure information effectively, positional encoding is meticulously designed and integrated into GNN layers to encode such structural knowledge into node attributes, thus enhancing the Transformer's performance on graphs.
Efficiency considerations are also alleviated by proposing the sampling of the most relevant nodes for the Transformer, along with two efficient sample update strategies to reduce complexity. Furthermore, theoretical analysis demonstrates that TransGNN offers increased expressiveness compared to GNNs, with only a marginal increase in linear complexity. Extensive experiments on five public datasets validate the effectiveness and efficiency of TransGNN.	图形神经网络(GNN)通过对用户项交互图的建模，成为协同过滤(CF)的有效解决方案。现有的基于 GNN 的推荐系统的核心是沿用户项交互边缘进行递归消息传递，以细化编码后的嵌入。尽管已经证明了这些方法的有效性，但是目前基于 GNN 的方法遇到了接受域有限和存在噪声“兴趣无关”连接的挑战。相比之下，基于 Transformer 的方法擅长自适应和全局聚合信息。然而，在获取错综复杂、纠缠不清的结构信息方面，它们在大规模交互图中的应用受到了固有的复杂性和挑战性的阻碍。在本文中，我们提出了 TransGNN，一个新颖的模型，以交替的方式集成 Transformer 和 GNN 层，以相互增强他们的能力。具体来说，TransGNN 利用 Transformer 层来扩展接受域，并将信息聚合与边解耦，从而从更相关的节点聚合信息，从而增强 GNN 的消息传递。此外，为了有效地获取图结构信息，位置编码被精心设计并集成到 GNN 层中，将这些结构知识编码到节点属性中，从而提高了 Transformer 在图上的性能。通过提出对 Transformer 最相关节点的抽样，以及两种有效的样本更新策略来降低复杂性，也减轻了效率方面的考虑。此外，理论分析表明，与 GNN 相比，TransGNN 提供了更高的表达能力，只是略微增加了线性复杂度。通过对五个公共数据集的大量实验验证了 TransGNN 的有效性和高效性。	code	0
Lightweight Embeddings for Graph Collaborative Filtering	Xurong Liang, Tong Chen, Lizhen Cui, Yang Wang, Meng Wang, Hongzhi Yin	Shandong University; The University of Queensland School of Electrical Engineering and Computer Science; Hefei University of Technology	Graph neural networks (GNNs) are currently one of the most performant collaborative filtering methods. Meanwhile, owing to the use of an embedding table to represent each user/item as a distinct vector, GNN-based recommenders have inherited the long-standing defect of parameter inefficiency. As a common practice for scalable embeddings, parameter sharing enables the use of fewer embedding vectors (i.e., meta-embeddings). When assigning meta-embeddings, most existing methods are a heuristically designed, predefined mapping from each user's/item's ID to the corresponding meta-embedding indexes, thus simplifying the optimization problem into learning only the meta-embeddings. However, in the context of GNN-based collaborative filtering, such a fixed mapping omits the semantic correlations between entities that are evident in the user-item interaction graph, leading to suboptimal recommendation performance. To this end, we propose Lightweight Embeddings for Graph Collaborative Filtering (LEGCF), a parameter-efficient embedding framework dedicated to GNN-based recommenders. LEGCF innovatively introduces an assignment matrix as an extra learnable component on top of meta-embeddings. To jointly optimize these two heavily entangled components, aside from learning the meta-embeddings by minimizing the recommendation loss, LEGCF further performs efficient assignment update by enforcing a novel semantic similarity constraint and finding its closed-form solution based on matrix pseudo-inverse. The meta-embeddings and assignment matrix are alternately updated, where the latter is sparsified on the fly to ensure negligible storage overhead.
Extensive experiments on three benchmark datasets have verified LEGCF's smallest trade-off between size and performance, with consistent accuracy gain over state-of-the-art baselines. The codebase of LEGCF is available in https://github.com/xurong-liang/LEGCF.	图神经网络(GNN)是目前性能最好的协同过滤方法之一。同时，由于使用嵌入表将每个用户/项目表示为一个独立的向量，基于 GNN 的推荐器继承了长期以来参数低效的缺陷。作为可伸缩嵌入的常见实践，参数共享使得可以使用更少的嵌入向量(即元嵌入)。当分配元嵌入时，大多数现有的方法都是启发式设计的，预定义的从每个用户/项目的 ID 到相应元嵌入索引的映射，从而将优化问题简化为只学习元嵌入。然而，在基于 GNN 的协同过滤中，这种固定的映射忽略了在用户-项目交互图中显而易见的实体之间的语义相关性，导致了次优的推荐性能。为此，我们提出了图形协同过滤的轻量级嵌入(LEGCF) ，这是一个专门针对基于 GNN 的推荐程序的参数高效嵌入框架。LEGCF 在元嵌入的基础上创新地引入了分配矩阵作为额外的可学习组件。为了联合优化这两个高度纠缠的组件，LEGCF 除了通过最小化推荐损失来学习元嵌入之外，还通过强制执行一种新的语义相似性约束并基于矩阵伪逆寻找其封闭形式解来进一步执行有效的分配更新。元嵌入和分配矩阵交替更新，其中后者被动态稀疏化，以确保存储开销可以忽略不计。在三个基准数据集上的大量实验已经证实了 LEGCF 在大小和性能之间的最小权衡，在最先进的基准上有一致的准确性增益。LEGCF 的代码库见 https://github.com/xurong-liang/LEGCF。	code	0
Graded Relevance Scoring of Written Essays with Dense Retrieval	Salam Albatarni, Sohaila Eltanbouly, Tamer Elsayed	Qatar University Computer Science and Engineering Department	Automated Essay Scoring automates the grading process of essays, providing a great advantage for improving the writing proficiency of students. While holistic essay scoring research is prevalent, a noticeable gap exists in scoring essays for specific quality traits. In this work, we focus on the relevance trait, which measures the ability of the student to stay on-topic throughout the entire essay. We propose a novel approach for graded relevance scoring of written essays that employs dense retrieval encoders. Dense representations of essays at different relevance levels then form clusters in the embeddings space, such that their centroids are potentially separate enough to effectively represent their relevance levels. We hence use the simple 1-Nearest-Neighbor classification over those centroids to determine the relevance level of an unseen essay. As an effective unsupervised dense encoder, we leverage Contriever, which is pre-trained with contrastive learning and demonstrated comparable performance to supervised dense retrieval models. We tested our approach on both task-specific (i.e., training and testing on same task) and cross-task (i.e., testing on unseen task) scenarios using the widely used ASAP++ dataset. Our method establishes a new state-of-the-art performance in the task-specific scenario, while its extension for the cross-task scenario exhibited a performance that is on par with the state-of-the-art model for that scenario.
We also analyzed the performance of our approach in a more practical few-shot scenario, showing that it can significantly reduce the labeling cost while sacrificing only 10	自动化论文评分自动化了论文的评分过程，为提高学生的写作水平提供了巨大的优势。虽然整体论文评分研究是普遍存在的，但在针对具体质量特征的论文评分方面存在明显的差距。在这项工作中，我们关注的是相关性特征，它衡量的是学生在整篇文章中始终保持主题的能力。我们提出了一种新的书面文章分级相关性评分方法，使用密集检索编码器。随后，在嵌入空间中，不同相关性水平的文章的密集表示形成聚类，这样它们的质心可能足够分离，以有效地表示它们的相关性水平。因此，我们使用这些质心上的简单1-最近邻分类来确定一篇未见过的文章的相关性水平。作为一个有效的无监督密集编码器，我们利用 Contriever，它通过对比学习进行预训练，并显示了与有监督密集检索模型相当的性能。使用广泛使用的 ASAP++ 数据集，对我们的方法进行了任务特定(即同一任务的训练和测试)和跨任务(即未知任务的测试)场景的测试。我们的方法在任务特定场景中建立了一种新的最先进的表现，而它在跨任务场景中的扩展表现出了与该场景的最先进模型相当的表现。我们还分析了我们的方法在一个更实用的少样本场景中的性能，表明它可以显著降低标注成本，同时只牺牲10	code	0
Axiomatic Causal Interventions for Reverse Engineering Relevance Computation in Neural Retrieval Models	Catherine Chen, Jack Merullo, Carsten Eickhoff	Brown University; University of Tübingen	Neural models have demonstrated remarkable performance across diverse rankingtasks. However, the processes and internal mechanisms along which theydetermine relevance are still largely unknown. Existing approaches foranalyzing neural ranker behavior with respect to IR properties rely either onassessing overall model behavior or employing probing methods that may offer anincomplete understanding of causal mechanisms. To provide a more granularunderstanding of internal model decision-making processes, we propose the useof causal interventions to reverse engineer neural rankers, and demonstrate howmechanistic interpretability methods can be used to isolate componentssatisfying term-frequency axioms within a ranking model. We identify a group ofattention heads that detect duplicate tokens in earlier layers of the model,then communicate with downstream heads to compute overall document relevance.More generally, we propose that this style of mechanistic analysis opens upavenues for reverse engineering the processes neural retrieval models use tocompute relevance. This work aims to initiate granular interpretability effortsthat will not only benefit retrieval model development and training, butultimately ensure safer deployment of these models.	神经模型在不同的排序任务中表现出了显著的性能。然而，它们决定相关性的过程和内部机制在很大程度上仍然是未知的。现有的方法分析神经排序行为方面的 IR 特性依赖于评估整体模型行为或使用探测方法，可能提供不完整的因果机制的理解。为了提供对内部模型决策过程的更细粒度的理解，我们提出使用因果干预来逆向工程神经排序器，并演示如何使用机械可解释性方法在排序模型中分离满足术语频率公理的组件。我们识别出一组注意头，它们检测模型早期层中的重复标记，然后与下游头通信以计算整个文档的相关性。更一般地说，我们认为这种类型的机械分析为神经检索模型用于计算相关性的过程开辟了逆向工程。这项工作的目的是启动细粒度的可解释性工作，这不仅有利于检索模型的开发和培训，而且最终确保这些模型的更安全的部署。	code	0
Optimizing Learning-to-Rank Models for Ex-Post Fair Relevance	Sruthi Gorantla, Eshaan Bhansali, Amit Deshpande, Anand Louis	Indian Institute of Science Bangalore; Indian Institute of Science; Microsoft; University of Wisconsin-Madison	Learning-to-rank (LTR) models rank items based on specific features, aiming to maximize ranking utility by prioritizing highly relevant items. However, optimizing only for ranking utility can lead to representational harm and may fail to address implicit bias in relevance scores. Prior studies introduced algorithms to train stochastic ranking models, such as the Plackett-Luce ranking model, that maximize expected ranking utility while achieving fairness in expectation (ex-ante fairness). Still, every sampled ranking may not satisfy group fairness (ex-post fairness). Post-processing methods ensure ex-post fairness; however, the LTR model lacks awareness of this step, creating a mismatch between the objective function the LTR model optimizes and the one it is supposed to optimize. In this paper, we first propose a novel objective where the relevance (or the expected ranking utility) is computed over only those rankings that satisfy given representation constraints for groups of items. We call this the ex-post fair relevance. We then give a framework for training Group-Fair LTR models to maximize our proposed ranking objective. Leveraging an efficient sampler for ex-post group-fair rankings and efficient algorithms to train the Plackett-Luce LTR model, we demonstrate their use in training the Group-Fair Plackett-Luce model in our framework. Experiments on MovieLens and Kiva datasets reveal improved fairness and relevance with our group-fair Plackett-Luce model compared to post-processing. In scenarios with implicit bias, our algorithm generally outperforms existing LTR baselines in both fairness and relevance.	
学习排序(LTR)模型根据特定的特征对项目进行排序，目的是通过对高度相关的项目进行优先排序，使排序效用最大化。然而，仅仅为了排名效用而优化可能会导致代表性损害，并且可能无法解决相关性得分中的隐性偏差。先前的研究介绍了训练随机排名模型的算法，例如 Plackett-Luce 排名模型，这种算法在实现预期公平(事前公平)的同时最大化预期排名效用。尽管如此，每个抽样排名可能不满足组公平性(事后公平性)。后处理方法确保了事后公平性，但是 LTR 模型缺乏对这一步骤的意识，使得 LTR 模型优化的目标函数与其应该优化的目标函数不匹配。在本文中，我们首先提出一个新的目标，其中的相关性(或期望的排名效用)是计算只有那些满足给定的表示约束的项目组的排名。我们称之为事后公平相关。然后，我们给出了一个训练集团公平的长期资产负债率模型的框架，以最大限度地提出我们的排名目标。我们利用一个有效的后集团公平排名采样器和有效的算法来训练 Plackett-Luce LTR 模型，在我们的框架中演示了它们在训练集团公平 Plackett-Luce 模型中的应用。在 MovieLens 和 Kiva 数据集上进行的实验表明，与后处理相比，我们的组公平 Plackett-Luce 模型提高了公平性和相关性。在存在隐性偏差的情况下，我们的算法通常在公平性和相关性方面都优于现有的 LTR 基线。	code	0
Scaling Sequential Recommendation Models with Transformers	Pablo Zivic, Hernán Ceferino Vázquez, Jorge Sánchez	Mercado Libre Inc., Córdoba, Argentina; Mercado Libre Inc., Buenos Aires, Argentina	Modeling user preferences has been mainly addressed by looking at users' interaction history with the different elements available in the system. Tailoring content to individual preferences based on historical data is the main goal of sequential recommendation. The nature of the problem, as well as the good performance observed across various domains, has motivated the use of the transformer architecture, which has proven effective in leveraging increasingly larger amounts of training data when accompanied by an increase in the number of model parameters. This scaling behavior has brought a great deal of attention, as it provides valuable guidance in the design and training of even larger models. Taking inspiration from the scaling laws observed in training large language models, we explore similar principles for sequential recommendation. Addressing scalability in this context requires special considerations as some particularities of the problem depart from the language modeling case. These particularities originate in the nature of the content catalogs, which are significantly larger than the vocabularies used for language and might change over time. In our case, we start from a well-known transformer-based model from the literature and make two crucial modifications. First, we pivot from the traditional representation of catalog items as trainable embeddings to representations computed with a trainable feature extractor, making the parameter count independent of the number of items in the catalog. Second, we propose a contrastive learning formulation that provides us with a better representation of the catalog diversity. We demonstrate that, under this setting, we can train our models effectively on increasingly larger datasets under a common experimental setup. 
We use the full Amazon Product Data dataset, which has only been partially explored in other studies, and reveal scaling behaviors similar to those found in language models. Compute-optimal training is possible but requires a careful analysis of the compute-performance trade-offs specific to the application. We also show that performance scaling translates to downstream tasks by fine-tuning larger pre-trained models on smaller task-specific domains. Our approach and findings provide a strategic roadmap for model training and deployment in real high-dimensional preference spaces, facilitating better training and inference efficiency. We hope this paper bridges the gap between the potential of transformers and the intrinsic complexities of high-dimensional sequential recommendation in real-world recommender systems. Code and models can be found at https://github.com/mercadolibre/srt.	建模用户偏好主要通过查看用户的交互历史和系统中可用的不同元素来解决。根据历史数据为个人偏好定制内容是顺序推荐的主要目标。这一问题的性质以及在各个领域观察到的良好性能促使使用了变压器结构，事实证明，在模型参数数量增加的同时，变压器结构有效地利用了越来越多的培训数据。这种缩放行为引起了广泛的关注，因为它为更大型模型的设计和训练提供了有价值的指导。从训练大型语言模型中观察到的比例规律中获得启发，我们探索了顺序推荐的相似原则。在这种情况下处理可伸缩性需要特别的考虑，因为问题的一些特殊性与语言建模情况不同。这些特殊性源于内容目录的性质，它们明显大于用于语言的词汇表，并且可能随着时间的推移而变化。在我们的例子中，我们从文献中的一个著名的基于变压器的模型开始，并进行了两个关键的修改。首先，我们将目录项的传统表示从可训练的嵌入转向使用可训练的特征提取器计算的表示，使得参数计数独立于目录项的数量。其次，我们提出了一个对比学习公式，为我们提供了一个更好的表示目录多样性。我们证明，在这种设置下，我们可以在一个通用的实验设置下，在越来越大的数据集上有效地训练我们的模型。我们使用完整的亚马逊产品数据集，这只是在其他研究中部分探索，并揭示了类似于在语言模型中发现的缩放行为。计算优化训练是可能的，但是需要对特定于应用程序的计算性能权衡进行仔细分析。我们还表明，通过在较小的特定于任务的领域上微调较大的预先训练的模型，性能伸缩转化为下游任务。我们的方法和研究结果提供了一个战略路线图的模型训练和部署在真正的高维偏好空间，促进更好的训练和推理效率。我们希望本文能够弥补现实推荐系统中变压器的潜力和高维顺序推荐的内在复杂性之间的差距。代码和模型可以在 https://github.com/mercadolibre/srt 找到。	code	0
SelfGNN: Self-Supervised Graph Neural Networks for Sequential Recommendation	Yuxi Liu, Lianghao Xia, Chao Huang	; University of Hong Kong	Sequential recommendation effectively addresses information overload by modeling users' temporal and sequential interaction patterns. To overcome the limitations of supervision signals, recent approaches have adopted self-supervised learning techniques in recommender systems. However, there are still two critical challenges that remain unsolved. Firstly, existing sequential models primarily focus on long-term modeling of individual interaction sequences, overlooking the valuable short-term collaborative relationships among the behaviors of different users. Secondly, real-world data often contain noise, particularly in users' short-term behaviors, which can arise from temporary intents or misclicks. Such noise negatively impacts the accuracy of both graph and sequence models, further complicating the modeling process. To address these challenges, we propose a novel framework called Self-Supervised Graph Neural Network (SelfGNN) for sequential recommendation. The SelfGNN framework encodes short-term graphs based on time intervals and utilizes Graph Neural Networks (GNNs) to learn short-term collaborative relationships. It captures long-term user and item representations at multiple granularity levels through interval fusion and dynamic behavior modeling. Importantly, our personalized self-augmented learning structure enhances model robustness by mitigating noise in short-term graphs based on long-term user interests and personal stability. Extensive experiments conducted on four real-world datasets demonstrate that SelfGNN outperforms various state-of-the-art baselines. Our model implementation codes are available at https://github.com/HKUDS/SelfGNN.	
顺序推荐通过建模用户的时间和顺序交互模式有效地解决了信息超载问题。为了克服监督信号的局限性，最近的方法在推荐系统中采用了自监督学习技术。然而，仍然有两个关键的挑战没有得到解决。首先，现有的序贯模型主要侧重于个体交互序列的长期建模，忽视了不同用户行为之间有价值的短期协作关系。其次，真实世界的数据往往包含噪音，特别是在用户的短期行为，这可能是由于临时意图或错误点击。这种噪声对图模型和序列模型的精度都有负面影响，使得建模过程更加复杂。为了应对这些挑战，我们提出了一个新的框架，称为自我监督图神经网络(SelfGNN)的顺序推荐。SelfGNN 框架根据时间间隔对短期图形进行编码，并利用图形神经网络(GNN)学习短期协作关系。它通过区间融合和动态行为建模，在多粒度级别捕获长期用户和项目表示。重要的是，我们的个性化自增强学习结构通过减少基于长期用户兴趣和个人稳定性的短期图中的噪声来增强模型鲁棒性。在四个真实世界数据集上进行的大量实验表明，SelfGNN 的性能优于各种最先进的基线。我们的模型实现代码可以通过 https://github.com/HKUDS/SelfGNN 获得。	code	0
Revisit Targeted Model Poisoning on Federated Recommendation: Optimize via Multi-objective Transport	Jiajie Su, Chaochao Chen, Weiming Liu, Zibin Lin, Shuheng Shen, Weiqiang Wang, Xiaolin Zheng	Zhejiang University, Hangzhou, China; Ant Group, Hangzhou, China	Federated Recommendation (FedRec) is popularly investigated in personalized recommenders for preserving user privacy. However, due to the distributed training paradigm, FedRec is vulnerable to model poisoning attacks. In this paper, we focus on the targeted model poisoning attack against FedRec, which aims at effectively attacking the FedRec via uploading poisoned gradients to raise the exposure ratio of a multi-target item set. Previous attack methods excel with fewer target items but suffer performance decline as the amount of target items increases, which reveals two perennially neglected issues: (i) The simple promotion of prediction scores without considering intrinsic collaborations between users and items is ineffective in multi-target cases. (ii) Target items are heterogeneous, which requires discriminative attacking users and strategies for different targets. To address the issues, we propose a novel Heterogeneous Multi-target Transfer Attack framework named HMTA which consists of two stages, i.e., (1) diverse user agent generation and (2) optimal multi-target transport attack. The former stage leverages collaboration-aware manifold learning to extract latent associations among users and items, and develops a differentiable contrastive sorting to generate user agents from both difficulty and diversity scale. The latter stage conducts poisoning in a fine-grained and distinguishing way, which first completes distribution mapping from target items to generated user agents and then achieves a hybrid multi-target attack. Extensive experiments on benchmark datasets demonstrate the effectiveness of HMTA.	
为了保护用户隐私，联邦推荐(FedRec)在个性化推荐中得到了广泛的应用。然而，由于分布式训练范例，FedRec 容易受到模型中毒攻击。本文研究了针对 FedRec 的目标模型中毒攻击，目的是通过上传中毒梯度来有效地攻击 FedRec，以提高多目标项目集的暴露率。以往的攻击方法优于较少的目标项目，但随着目标项目数量的增加性能下降，这揭示了两个长期被忽视的问题: (i)在多目标情况下，不考虑用户和项目之间的内在协作的简单预测得分的提升是无效的。(ii)目标项是异构的，需要针对不同目标的区分性攻击用户和策略。针对这一问题，提出了一种新的异构多目标传输攻击框架 HMTA，该框架由两个阶段组成，即(1)多用户代理生成阶段和(2)最优多目标传输攻击阶段。前一阶段利用协作感知流形学习来提取用户和项目之间的潜在关联，并开发了一种可微对比排序方法来从难度和多样性尺度生成用户代理。后一阶段采用细粒度和区分的方式进行中毒，首先完成目标项到生成用户代理的分布映射，然后实现混合多目标攻击。在基准数据集上的大量实验证明了 HMTA 算法的有效性。	code	0
LoRec: Combating Poisons with Large Language Model for Robust Sequential Recommendation	Kaike Zhang, Qi Cao, Yunfan Wu, Fei Sun, Huawei Shen, Xueqi Cheng	Institute of Computing Technology, CAS	Sequential recommender systems stand out for their ability to capture users' dynamic interests and the patterns of item transitions. However, the inherent openness of sequential recommender systems renders them vulnerable to poisoning attacks, where fraudsters are injected into the training data to manipulate learned patterns. Traditional defense methods predominantly depend on predefined assumptions or rules extracted from specific known attacks, limiting their generalizability to unknown attacks. To solve the above problems, considering the rich open-world knowledge encapsulated in Large Language Models (LLMs), we attempt to introduce LLMs into defense methods to broaden the knowledge beyond limited known attacks. We propose LoRec, an innovative framework that employs LLM-Enhanced Calibration to strengthen the robustness of sequential Recommender systems against poisoning attacks. LoRec integrates an LLM-enhanced CalibraTor (LCT) that refines the training process of sequential recommender systems with knowledge derived from LLMs, applying a user-wise reweighting to diminish the impact of attacks. Incorporating LLMs' open-world knowledge, the LCT effectively converts the limited, specific priors or rules into a more general pattern of fraudsters, offering improved defenses against poisons. Our comprehensive experiments validate that LoRec, as a general framework, significantly strengthens the robustness of sequential recommender systems.	
顺序推荐系统因其捕捉用户动态兴趣和项目转换模式的能力而脱颖而出。然而，顺序推荐系统固有的开放性使它们容易受到中毒攻击，欺诈者被注入培训数据以操纵学到的模式。传统的防御方法主要依赖于预定义的假设或从特定的已知攻击中提取的规则，限制了它们对未知攻击的普遍性。为了解决上述问题，考虑到大语言模型(LLM)中包含的丰富的开放世界知识，我们尝试将 LLM 引入防御方法中，以扩展已知攻击范围之外的知识。我们提出 LoRec，一个创新的框架，使用 LLM 增强校准，以加强顺序推荐系统对中毒攻击的健壮性。LoRec 集成了一个 LLM 增强的 CalibraTor (LCT) ，它利用从 LLM 获得的知识完善了顺序推荐系统的训练过程，应用了用户明智的重新加权来减少攻击的影响。结合 LLM 的开放世界知识，LCT 有效地将有限的、特定的前科或规则转化为更普遍的欺诈者模式，提供了针对有毒物质的更好的防御。我们的综合实验验证了 LoRec 作为一个通用框架，显著增强了顺序推荐系统的鲁棒性。	code	0
Treatment Effect Estimation for User Interest Exploration on Recommender Systems	Jiaju Chen, Wenjie Wang, Chongming Gao, Peng Wu, Jianxiong Wei, Qingsong Hua	Beijing Technology and Business University; National University of Singapore; University of Science and Technology of China; Meituan	Recommender systems learn personalized user preferences from user feedback like clicks. However, user feedback is usually biased towards partially observed interests, leaving many users' hidden interests unexplored. Existing approaches typically mitigate the bias, increase recommendation diversity, or use bandit algorithms to balance exploration-exploitation trade-offs. Nevertheless, they fail to consider the potential rewards of recommending different categories of items and lack the global scheduling of allocating top-N recommendations to categories, leading to suboptimal exploration. In this work, we propose an Uplift model-based Recommender (UpliftRec) framework, which regards top-N recommendation as a treatment optimization problem. UpliftRec estimates the treatment effects, i.e., the click-through rate (CTR) under different category exposure ratios, by using observational user feedback. UpliftRec calculates group-level treatment effects to discover users' hidden interests with high CTR rewards and leverages inverse propensity weighting to alleviate confounder bias. Thereafter, UpliftRec adopts a dynamic programming method to calculate the optimal treatment for overall CTR maximization. We implement UpliftRec on different backend models and conduct extensive experiments on three datasets. The empirical results validate the effectiveness of UpliftRec in discovering users' hidden interests while achieving superior recommendation accuracy.	
推荐系统通过用户反馈(如点击)学习个性化用户偏好。然而，用户反馈通常偏向于部分观察到的兴趣，使得许多用户的隐藏兴趣未被探索。现有的方法通常会减轻偏差，增加推荐多样性，或者使用盗贼算法来平衡勘探与开发的权衡。然而，他们没有考虑推荐不同类别项目的潜在回报，缺乏将前 N 项推荐分配给类别的全球时间表，导致次优探索。在这项工作中，我们提出了一个基于 UpliftRec (UpliftRec)模型的推荐框架，它将 top-N 推荐作为一个治疗最佳化问题。UpliftRec 通过观察用户反馈来估计治疗效果，即不同类别暴露比例下的点进率。UpliftRec 计算组级治疗效果，以发现用户的高点击率奖励隐藏的兴趣，并利用反倾向加权，以减轻混杂偏见。然后，UpliftRec 采用动态规划方法来计算总体 CTR 最大化的最优处理。我们在不同的后端模型上实现 UpliftRec，并在三个数据集上进行了广泛的实验。实证结果验证了 UpliftRec 在发现用户隐藏兴趣的同时达到更高的推荐准确率的有效性。	code	0
Disentangling Instructive Information from Ranked Multiple Candidates for Multi-Document Scientific Summarization	Pancheng Wang, Shasha Li, Dong Li, Kehan Long, Jintao Tang, Ting Wang	National University of Defense Technology	Automatically condensing multiple topic-related scientific papers into a succinct and concise summary is referred to as Multi-Document Scientific Summarization (MDSS). Currently, while commonly used abstractive MDSS methods can generate flexible and coherent summaries, the difficulty in handling global information and the lack of guidance during decoding still make it challenging to generate better summaries. To alleviate these two shortcomings, this paper introduces summary candidates into MDSS, utilizing the global information of the document set and additional guidance from the summary candidates to guide the decoding process. Our insights are twofold: Firstly, summary candidates can provide instructive information from both positive and negative perspectives, and secondly, selecting higher-quality candidates from multiple options contributes to producing better summaries. Drawing on the insights, we propose a summary candidates fusion framework – Disentangling Instructive information from Ranked candidates (DIR) for MDSS. Specifically, DIR first uses a specialized pairwise comparison method towards multiple candidates to pick out those of higher quality. Then DIR disentangles the instructive information of summary candidates into positive and negative latent variables with Conditional Variational Autoencoder. These variables are further incorporated into the decoder to guide generation. We evaluate our approach with three different types of Transformer-based models and three different types of candidates, and consistently observe noticeable performance improvements according to automatic and human evaluation. More analyses further demonstrate the effectiveness of our model in handling global information and enhancing decoding controllability.	
将多篇与主题相关的科学论文自动压缩成简洁、简洁的摘要，称为多文档科学摘要(MDSS)。目前，虽然常用的抽象 MDSS 方法可以生成灵活和连贯的摘要，但全球信息处理的困难和解码过程中缺乏指导仍然使得生成更好的摘要具有挑战性。为了克服这两个缺点，本文将摘要候选集引入 MDSS，利用文档集的全局信息和摘要候选集的额外指导来指导解码过程。我们的见解是双重的: 首先，总结候选人可以从正面和负面的角度提供有益的信息，其次，从多个选项中选择更高质量的候选人有助于产生更好的总结。在此基础上，我们提出了一个综合候选人融合框架——从排名候选人(DIR)中分离指导性信息，用于 MDSS。具体来说，DIR 首先对多个候选人使用专门的成对比较方法来挑选那些质量较高的候选人。然后用条件变分自动编码器将总结候选人的指导信息分解为正变量和负变量。这些变量进一步合并到解码器中以指导生成。我们评估我们的方法与三个不同类型的变压器为基础的模型和三个不同类型的候选人，并一致地观察显着的性能改善，根据自动和人工评估。更多的分析进一步证明了该模型在处理全局信息和提高译码可控性方面的有效性。	code	0
Boosting Conversational Question Answering with Fine-Grained Retrieval-Augmentation and Self-Check	Linhao Ye, Zhikai Lei, Jianghao Yin, Qin Chen, Jie Zhou, Liang He	East China Normal University	Retrieval-Augmented Generation (RAG) aims to generate more reliable and accurate responses, by augmenting large language models (LLMs) with the external vast and dynamic knowledge. Most previous work focuses on using RAG for single-round question answering, while how to adapt RAG to the complex conversational setting wherein the question is interdependent on the preceding context is not well studied. In this paper, we propose a conversation-level RAG approach, which incorporates fine-grained retrieval augmentation and self-check for conversational question answering (CQA). In particular, our approach consists of three components, namely conversational question refiner, fine-grained retriever and self-check based response generator, which work collaboratively for question understanding and relevant information acquisition in conversational settings. Extensive experiments demonstrate the great advantages of our approach over the state-of-the-art baselines. Moreover, we also release a Chinese CQA dataset with new features including reformulated question, extracted keyword, retrieved paragraphs and their helpfulness, which facilitates further researches in RAG enhanced CQA.	检索增强生成(RAG)旨在通过扩充大型语言模型(LLM)和外部海量动态知识来产生更可靠、更准确的响应。以往的研究主要集中在 RAG 在单轮问答中的应用，而如何将 RAG 应用到复杂的会话环境中，使问题相互依赖于前面的上下文，这方面的研究还不多。在本文中，我们提出了一种会话级的 RAG 方法，该方法结合了细粒度检索增强和会话问题回答(CQA)的自我检查。特别是，我们的方法由三个部分组成，即会话问题细化，细粒度检索和基于自我检查的响应生成器，它们协同工作的问题理解和相关信息的获取在会话环境中。大量的实验证明了我们的方法相对于最先进的基线的巨大优势。此外，我们还发布了一个中文 CQA 数据集，该数据集具有重构问题、提取关键词、检索段落及其有用性等新特征，为进一步研究 RAG 增强 CQA 提供了方便。	code	0
Can Query Expansion Improve Generalization of Strong Cross-Encoder Rankers?	Minghan Li, Honglei Zhuang, Kai Hui, Zhen Qin, Jimmy Lin, Rolf Jagerman, Xuanhui Wang, Michael Bendersky	Google Research; University of Waterloo	Query expansion has been widely used to improve the search results of first-stage retrievers, yet its influence on second-stage, cross-encoder rankers remains under-explored. A recent work of Weller et al. [44] shows that current expansion techniques benefit weaker models such as DPR and BM25 but harm stronger rankers such as MonoT5. In this paper, we re-examine this conclusion and raise the following question: Can query expansion improve generalization of strong cross-encoder rankers? To answer this question, we first apply popular query expansion methods to state-of-the-art cross-encoder rankers and verify the deteriorated zero-shot performance. We identify two vital steps for cross-encoders in the experiment: high-quality keyword generation and minimal-disruptive query modification. We show that it is possible to improve the generalization of a strong neural ranker, by prompt engineering and aggregating the ranking results of each expanded query via fusion. Specifically, we first call an instruction-following language model to generate keywords through a reasoning chain. Leveraging self-consistency and reciprocal rank weighting, we further combine the ranking results of each expanded query dynamically. Experiments on BEIR and TREC Deep Learning 2019/2020 show that the nDCG@10 scores of both MonoT5 and RankT5 following these steps are improved, which points out a direction for applying query expansion to strong cross-encoder rankers.	
查询扩展已被广泛用于改善第一阶段检索者的搜索结果，但其对第二阶段交叉编码者的影响尚未得到充分研究。Weller 等人最近的一项工作[44]表明，目前的扩展技术有利于较弱的模型，如 DPR 和 BM25，但损害较强的排名，如 MonoT5。在本文中，我们重新审视这个结论，并提出以下问题: 查询扩展能否改善强交叉编码器排序器的泛化？为了回答这个问题，我们首先将流行的查询扩展方法应用于最先进的交叉编码器，并验证恶化的零射击性能。在实验中，我们确定了交叉编码器的两个重要步骤: 高质量的关键字生成和最小干扰的查询修改。我们表明，通过提示工程和聚合每个扩展查询的排序结果，提高一个强神经排序器的泛化是可能的。具体来说，我们首先调用一个指令遵循语言模型来通过一个推理链生成关键字。利用自相容性和相互排序加权，进一步动态组合每个扩展查询的排序结果。BEIR 和 TREC Deep Learning2019/2020的实验表明，遵循这些步骤的 MonoT5和 RankT5的 nDCG@10分数得到了改善，这为将查询扩展应用于强交叉编码器排序器指明了方向。	code	0
EASE-DR: Enhanced Sentence Embeddings for Dense Retrieval	Xixi Zhou, Yang Gao, Xin Jie, Xiaoxu Cai, Jiajun Bu, Haishuai Wang	Zhejiang University, Hangzhou, China	Recent neural information retrieval models using dense text representations generated by pre-trained models commonly face two issues. First, a pre-trained model (e.g., BERT) usually truncates a long document before giving its representation, which may cause the loss of some important semantic information. Second, although pre-training models like BERT have been widely used in generating sentence embeddings, a substantial body of literature has shown that the pre-training models often represent sentence embeddings in a homogeneous and narrow space, known as the problem of representation anisotropy, which hurts the quality of dense vector retrieval. In this paper, we split the query and the document in information retrieval into two sets of natural sentences and generate their sentence embeddings with BERT, the most popular pre-trained model. Before aggregating the sentence embeddings to get the entire embedding representations of the input query and document, to alleviate the usual representation degeneration problem of sentence embeddings from BERT, we sample the variational auto-encoder's latent space distribution to obtain isotropic sentence embeddings and utilize supervised contrastive learning to uniform the distribution of these sentence embeddings in the representation space. Our proposed model undergoes training optimization for both the query and the document in the abovementioned aspects. Our model performs well in evaluating three extensively researched neural information retrieval datasets.	
最近的神经信息检索模型使用由预先训练的模型产生的密集文本表示，通常面临两个问题。首先，预先训练的模型(例如 BERT)在给出表示之前通常会截断一个长文档，这可能会导致一些重要语义信息的丢失。其次，尽管像 BERT 这样的预训练模型已经被广泛应用于生成句子嵌入，但是大量的文献表明，预训练模型往往代表同质和狭窄空间中的句子嵌入，即所谓的表示各向异性问题，这损害了密集向量检索的质量。在本文中，我们将查询和信息检索中的文档分成两组自然句子，并使用 BERT (最流行的预训练模型)生成它们的句子嵌入。在对输入查询和文档的句子嵌入进行聚合得到完整的嵌入表示之前，为了缓解 BERT 中常见的句子嵌入表示退化问题，采用变分自动编码器的潜空间分布来获得各向同性的句子嵌入，并利用监督对比学习来统一这些句子嵌入在表示空间中的分布。我们提出的模型在上述方面对查询和文档进行了训练优化。我们的模型在评估三个广泛研究的神经信息检索数据集方面表现良好。	code	0
Explainable Uncertainty Attribution for Sequential Recommendation	Carles Balsells Rodas, Fan Yang, Zhishen Huang, Yan Gao	Amazon.com Inc, Seattle, WA, USA; Imperial College London, London, United Kingdom	Sequential recommendation systems suggest products based on users' historical behaviours. The inherent sparsity of user-item interactions in a vast product space often leads to unreliable recommendations. Recent research addresses this challenge by leveraging auxiliary product relations to mitigate recommendation uncertainty, and quantifying uncertainty in recommendation scores to modify the candidates selection. However, such approaches may not be efficient due to the requirement of additional side information or providing suboptimal recommendations. To enhance sequential recommendation performance by leveraging uncertainty information, we introduce Explainable Uncertainty Attribution (ExUA). We employ gradient-based saliency attribution to identify sources of uncertainty stemming from sequential interactions. Experimental findings on Amazon and MovieLens datasets demonstrate ExUA's effectiveness in identifying interactions that induce uncertainty, resulting in a 6%+ improvement in NDCG@20 scores when the uncertainty information is integrated into a post-hoc training phase.	连续推荐系统根据用户的历史行为推荐产品。在广阔的产品空间中，用户项交互的固有稀疏性常常导致不可靠的建议。最近的研究通过利用辅助产品关系来减少推荐的不确定性，并量化推荐分数的不确定性来修改候选人的选择，从而解决了这一挑战。然而，由于需要额外的辅助信息或提供次优的建议，这种方法可能不是有效的。为了利用不确定性信息提高序贯推荐的性能，我们引入了可解释的不确定性归因(ExUA)。我们使用基于梯度的显著性归因来识别源于序列相互作用的不确定性。Amazon 和 MovieLens 数据集上的实验结果证明了 ExUA 在识别诱导不确定性的相互作用方面的有效性，当不确定性信息被整合到事后训练阶段时，导致 NDCG@20分数提高6% + 。	code	0
FedUD: Exploiting Unaligned Data for Cross-Platform Federated Click-Through Rate Prediction	Wentao Ouyang, Rui Dong, Ri Tao, Xiangzheng Liu	Alibaba Group, Beijing, China	Click-through rate (CTR) prediction plays an important role in online advertising platforms. Most existing methods use data from the advertising platform itself for CTR prediction. As user behaviors also exist on many other platforms, e.g., media platforms, it is beneficial to further exploit such complementary information for better modeling user interest and for improving CTR prediction performance. However, due to privacy concerns, data from different platforms cannot be uploaded to a server for centralized model training. Vertical federated learning (VFL) provides a possible solution which is able to keep the raw data on respective participating parties and learn a collaborative model in a privacy-preserving way. However, traditional VFL methods only utilize aligned data with common keys across parties, which strongly restricts their application scope. In this paper, we propose FedUD, which is able to exploit unaligned data, in addition to aligned data, for more accurate federated CTR prediction. FedUD contains two steps. In the first step, FedUD utilizes aligned data across parties like traditional VFL, but it additionally includes a knowledge distillation module. This module distills useful knowledge from the guest party's high-level representations and guides the learning of a representation transfer network. In the second step, FedUD applies the learned knowledge to enrich the representations of the host party's unaligned data such that both aligned and unaligned data can contribute to federated model training. Experiments on two real-world datasets demonstrate the superior performance of FedUD for federated CTR prediction.	
点进率预测在在线广告平台中扮演着重要的角色。大多数现有的方法使用来自广告平台本身的数据进行点击率预测。由于用户行为也存在于许多其他平台上，如媒体平台，因此进一步利用这些互补信息有利于更好地建立用户兴趣模型和提高 CTR 预测性能。然而，由于隐私问题，不同平台的数据不能上传到服务器进行集中的模型培训。垂直联邦学习(VFL)提供了一种可能的解决方案，它能够保存各参与方的原始数据，并以保护隐私的方式学习协作模型。然而，传统的 VFL 方法只利用跨各方公共密钥的对齐数据，这严重限制了它们的应用范围。在本文中，我们提出了 FedUD，它能够利用未对齐的数据，除了对齐的数据，更准确的联邦点击率预测。FedUD 包含两个步骤。在第一个步骤中，FedUD 利用传统 VFL 等各方之间的对齐数据，但是它还包括一个知识提取模块。该模块从客方的高层次表示中提取有用的知识，并指导表示传递网络的学习。在第二步中，FedUD 应用所学到的知识来丰富主机方的未对齐数据的表示，这样对齐和未对齐的数据都可以有助于联邦模型的训练。在两个实际数据集上的实验表明，FedUD 在联邦 CTR 预测方面具有优越的性能。	code	0
Generalizable Tip-of-the-Tongue Retrieval with LLM Re-ranking	Luís Borges, Rohan Jha, Jamie Callan, Bruno Martins	Instituto Superior Técnico and INESC-ID, Lisbon, Portugal; The University of Texas at Austin, Austin, Texas, USA; Carnegie Mellon University, Pittsburgh, Pennsylvania, USA	Tip-of-the-Tongue (ToT) retrieval is challenging for search engines because the queries are usually natural-language, verbose, and contain uncertain and inaccurate information. This paper studies the generalization capabilities of existing retrieval methods with ToT queries in multiple domains. We curate a multi-domain dataset and evaluate the effectiveness of recall-oriented first-stage retrieval methods across the different domains, considering in-domain, out-of-domain, and multi-domain training settings. We further explore the use of a Large Language Model (LLM), i.e. GPT-4, for zero-shot re-ranking in various ToT domains, relying solely on the item titles. Results show that multi-domain training enhances recall, and that LLMs are strong zero-shot re-rankers, especially for popular items, outperforming direct GPT-4 prompting without first-stage retrieval. Datasets and code can be found on GitHub https://github.com/LuisPB7/TipTongue	舌尖检索(ToT)对于搜索引擎来说是一个挑战，因为查询通常是自然语言的，冗长的，并且包含不确定和不准确的信息。本文研究了现有的多领域 ToT 查询检索方法的泛化能力。我们策划一个多领域的数据集，并评估面向召回的第一阶段检索方法在不同领域的有效性，考虑域内，域外和多领域的训练设置。我们进一步探索了大型语言模型(LLM)的使用，即 GPT-4，用于在各种 ToT 域中重新排序，仅仅依赖于项目标题。结果表明，多领域训练有助于提高记忆力，LLM 具有较强的零击重排能力，尤其是对于热门项目，其表现优于没有第一阶段提取的直接 GPT-4提示。数据集和代码可以在 gitHub https://GitHub.com/luispb7/tiptongue 上找到	code	0
Grasping Both Query Relevance and Essential Content for Query-focused Summarization	Ye Xiong, Hidetaka Kamigaito, Soichiro Murakami, Peinan Zhang, Hiroya Takamura, Manabu Okumura	Tokyo Institute of Technology, Tokyo, Japan; CyberAgent, Inc., Tokyo, Japan	Numerous effective methods have been developed to improve query-focused summarization (QFS) performance, e.g., pre-trained model-based and query-answer relevance-based methods. However, these methods still suffer from missing or redundant information due to the inability to capture and effectively utilize the interrelationship between the query and the source document, as well as between the source document and its generated summary, resulting in the summary being unable to answer the query or containing additional unrequired information. To mitigate this problem, we propose an end-to-end hierarchical two-stage summarization model, that first predicts essential content, and then generates a summary by emphasizing the predicted important sentences while maintaining separate encodings for the query and the source, so that it can comprehend not only the query itself but also the essential information in the source. We evaluated the proposed model on two QFS datasets, and the results indicated its overall effectiveness and that of each component.	为了提高查询聚焦摘要(QFS)的性能，已经开发了许多有效的方法，例如基于预训练模型的方法和基于查询-回答相关性的方法。然而，由于无法捕获和有效利用查询与源文档之间以及源文档与其生成的摘要之间的相互关系，这些方法仍然存在信息缺失或多余的问题，导致摘要无法回答查询或包含额外的不必要信息。为了解决这个问题，我们提出了一种端到端的层次化两阶段摘要模型，该模型首先对重要内容进行预测，然后通过强调预测的重要句子来生成摘要，同时对查询和源代码保持单独的编码，从而不仅能够理解查询本身，而且能够理解源代码中的重要信息。我们在两个 QFS 数据集上对所提出的模型进行了评估，结果表明了该模型的整体有效性和每个组件的有效性。	code	0
MoME: Mixture-of-Masked-Experts for Efficient Multi-Task Recommendation	Jiahui Xu, Lu Sun, Dengji Zhao	ShanghaiTech University, Shanghai, China	Multi-task learning techniques have attracted great attention in recommendation systems because they can meet the needs of modeling multiple perspectives simultaneously and improve recommendation performance. As promising multi-task recommendation system models, Mixture-of-Experts (MoE) and related methods use an ensemble of expert sub-networks to improve generalization and have achieved significant success in practical applications. However, they still face key challenges in efficient parameter sharing and resource utilization, especially when they are applied to real-world datasets and resource-constrained devices. In this paper, we propose a novel framework called Mixture-of-Masked-Experts (MoME) to address the challenges. Unlike MoE, expert sub-networks in MoME are extracted from an identical over-parameterized base network by learning binary masks. It utilizes a binary mask learning mechanism composed of neuron-level model masking and weight-level expert masking to achieve coarse-grained base model pruning and fine-grained expert pruning, respectively. Compared to existing MoE-based models, MoME achieves efficient parameter sharing and requires significantly less sub-network storage since it actually only trains a base network and a mixture of partially overlapped binary expert masks. Experimental results on real-world datasets demonstrate the superior performance of MoME in terms of recommendation accuracy and computational efficiency. Our code is available at https://github.com/Xjh0327/MoME.	
多任务学习技术能够满足同时建立多视角模型的需要，提高推荐系统的性能，因而受到推荐系统的广泛关注。专家混合推荐系统作为一种有前途的多任务推荐系统模型，利用专家子网络集成技术提高推荐系统的泛化能力，在实际应用中取得了显著的成功。然而，它们在有效的参数共享和资源利用方面仍然面临着关键的挑战，特别是当它们应用于真实世界的数据集和资源受限的设备时。在本文中，我们提出了一个新的框架，称为蒙版专家混合(MoME) ，以解决这一挑战。与 MoE 不同的是，MoME 中的专家子网络是通过学习二进制掩码从相同的过参数化基网络中提取出来的。该方法利用神经元级模型掩蔽和权重级专家掩蔽组成的二元掩蔽学习机制，分别实现了粗粒度基模型和细粒度专家模型的修剪。与现有的基于 MoE 的模型相比，MoME 实现了有效的参数共享，并且需要的子网存储量明显减少，因为它实际上只训练一个基本网络和部分重叠的二进制专家掩码的混合。在实际数据集上的实验结果表明，MoME 在推荐精度和计算效率方面具有优越的性能。我们的代码可以在 https://github.com/Xjh0327/MoME 找到。	code	0
Multi-Layer Ranking with Large Language Models for News Source Recommendation	Wenjia Zhang, Lin Gui, Rob Procter, Yulan He	The University of Warwick; University of Warwick; King's College London	To seek reliable information sources for news events, we introduce a novel task of expert recommendation, which aims to identify trustworthy sources based on their previously quoted statements. To achieve this, we built a novel dataset, called NewsQuote, consisting of 23,571 quote-speaker pairs sourced from a collection of news articles. We formulate the recommendation task as the retrieval of experts based on their likelihood of being associated with a given query. We also propose a multi-layer ranking framework employing Large Language Models to improve the recommendation performance. Our results show that employing an in-context learning based LLM ranker and a multi-layer ranking-based filter significantly improve both the predictive quality and behavioural quality of the recommender system.	为了为新闻事件寻找可靠的信息来源，本文提出了一种新颖的专家推荐任务，该任务的目的是根据新闻事件的可靠信息来源的先前引用的陈述来确定其可靠性。为了实现这一点，我们建立了一个名为 NewsQuote 的新颖数据集，其中包含23,571对引用说话者，这些引用说话者来自一系列新闻文章。我们根据专家与特定查询相关联的可能性来制定推荐任务。我们还提出了一个使用大型语言模型的多层次排序框架，以提高推荐性能。我们的研究结果表明，使用基于上下文学习的 LLM 排名器和基于多层次排名的过滤器可以显著提高推荐系统的预测质量和行为质量。	code	0
Neural Click Models for Recommender Systems	Mikhail Shirokikh, Ilya Shenbin, Anton Alekseev, Anna Volodkevich, Alexey Vasilev, Andrey V. Savchenko, Sergey I. Nikolenko	Sber AI Lab; Sber AI Lab, Moscow, Russian Federation; PDMI RAS & St. Petersburg University, St. Petersburg, Russian Federation; St. Petersburg State University, St. Petersburg, Russian Federation; PDMI RAS, St. Petersburg, Russian Federation; Steklov Mathematical Institute, St. Petersburg	We develop and evaluate neural architectures to model the user behavior in recommender systems (RS) inspired by click models for Web search but going beyond standard click models. Proposed architectures include recurrent networks, Transformer-based models that alleviate the quadratic complexity of self-attention, adversarial and hierarchical architectures. Our models outperform baselines on the ContentWise and RL4RS datasets and can be used in RS simulators to model user response for RS evaluation and pretraining.	我们开发和评估神经结构，以模拟推荐系统(RS)中的用户行为，该系统受到 Web 搜索的点击模型的启发，但超越了标准的点击模型。建议的体系结构包括循环网络，基于变压器的模型，减轻自我注意的二次复杂性，对手和分层体系结构。我们的模型优于 ContentWise 和 RL4RS 数据集的基线，可以用于 RS 模拟器，为 RS 评估和预训练建立用户响应模型。	code	0
SpherE: Expressive and Interpretable Knowledge Graph Embedding for Set Retrieval	Zihao Li, Yuyi Ao, Jingrui He	University of Illinois at Urbana-Champaign	Knowledge graphs (KGs), which store an extensive number of relational facts (head, relation, tail), serve various applications. While many downstream tasks highly rely on the expressive modeling and predictive embedding of KGs, most of the current KG representation learning methods, where each entity is embedded as a vector in the Euclidean space and each relation is embedded as a transformation, follow an entity ranking protocol. On one hand, such an embedding design cannot capture many-to-many relations. On the other hand, in many retrieval cases, the users wish to get an exact set of answers without any ranking, especially when the results are expected to be precise, e.g., which genes cause an illness. Such scenarios are commonly referred to as "set retrieval". This work presents a pioneering study on the KG set retrieval problem. We show that the set retrieval highly depends on expressive modeling of many-to-many relations, and propose a new KG embedding model SpherE to address this problem. SpherE is based on rotational embedding methods, but each entity is embedded as a sphere instead of a vector. While inheriting the high interpretability of rotational-based models, our SpherE can more expressively model one-to-many, many-to-one, and many-to-many relations. Through extensive experiments, we show that our SpherE can well address the set retrieval problem while still having a good predictive ability to infer missing facts. The code is available at https://github.com/Violet24K/SpherE.	
知识图(KGs)存储了大量的关系事实(头部、关系、尾部) ，服务于各种应用。虽然许多下游任务高度依赖于 KG 的表达式建模和预测嵌入，但目前大多数 KG 表示学习方法都遵循实体排序协议，其中每个实体嵌入在欧几里德空间中作为一个向量，每个关系嵌入作为变换。一方面，这样的嵌入设计不能捕获多对多的关系。另一方面，在许多检索案例中，用户希望在没有任何排名的情况下得到一组精确的答案，特别是当结果被期望是精确的时候，例如，哪些基因导致疾病。这样的场景通常被称为“集合检索”。本文对 KG 集合检索问题进行了开创性的研究。我们证明了集合检索高度依赖于多对多关系的表达式建模，并提出了一种新的 KG 嵌入模型 SpherE 来解决这一问题。SpherE 基于旋转嵌入方法，但每个实体被嵌入为一个球面而不是一个向量。在继承了基于旋转的模型的高可解释性的同时，我们的 SpherE 可以更有表现力地建立一对多、多对一和多对多的关系模型。通过大量的实验表明，我们的 SpherE 能够很好地解决集合检索问题，同时仍然具有很好的预测能力来推断丢失的事实。代码可以在 https://github.com/Violet24K/SpherE 找到。	code	0
CoSearchAgent: A Lightweight Collaborative Search Agent with Large Language Models	Peiyuan Gong, Jiamian Li, Jiaxin Mao	Renmin University of China Gaoling School of Artificial Intelligence	Collaborative search supports multiple users working together to accomplish a specific search task. Research has found that designing lightweight collaborative search plugins within instant messaging platforms aligns better with users' collaborative habits. However, due to the complexity of multi-user interaction scenarios, it is challenging to implement a fully functioning lightweight collaborative search system. Therefore, previous studies on lightweight collaborative search had to rely on the Wizard of Oz paradigm. In recent years, large language models (LLMs) have been demonstrated to interact naturally with users and achieve complex information-seeking tasks through LLM-based agents. Hence, to better support the research in collaborative search, in this demo, we propose CoSearchAgent, a lightweight collaborative search agent powered by LLMs. CoSearchAgent is designed as a Slack plugin that can support collaborative search during multi-party conversations on this platform. Equipped with the capacity to understand the queries and context in multi-user conversations and the ability to search the Web for relevant information via APIs, CoSearchAgent can respond to user queries with answers grounded on the relevant search results. It can also ask clarifying questions when the information needs are unclear. The proposed CoSearchAgent is highly flexible and would be useful for supporting further research on collaborative search. The code and demo video are accessible.	
协同搜索支持多个用户协同工作来完成特定的搜索任务。研究发现，在即时通讯平台上设计轻量级/协作式搜索插件更符合用户的协作习惯。然而，由于多用户交互场景的复杂性，实现一个全功能的轻量级协同搜索系统是一个挑战。因此，之前对轻量级协作搜索的研究必须依赖于绿野仙踪范式。近年来，大型语言模型(LLM)已被证明可以与用户进行自然交互，并通过基于 LLM 的代理实现复杂的信息搜索任务。因此，为了更好地支持协作搜索的研究，在本演示中，我们提出了 CoSearchAgent，一个由 LLM 支持的轻量级协作搜索代理。CoSearchAgent 被设计成一个 Slack 插件，可以在这个平台上支持多方对话的协作搜索。CoSearchAgent 具有理解多用户对话中的查询和上下文的能力，并能够通过 API 在网上搜索相关信息，CoSearchAgent 可以根据相关搜索结果回答用户的查询。当信息需求不明确时，它也可以提出澄清问题。提出的 CoSearchAgent 具有高度灵活性，将有助于支持协作研究的进一步研究。代码和演示视频是可访问的。	code	0
MeMemo: On-device Retrieval Augmentation for Private and Personalized Text Generation	Zijie J. Wang, Duen Horng Chau	Georgia Tech	Retrieval-augmented text generation (RAG) addresses the common limitations of large language models (LLMs), such as hallucination, by retrieving information from an updatable external knowledge base. However, existing approaches often require dedicated backend servers for data storage and retrieval, thereby limiting their applicability in use cases that require strict data privacy, such as personal finance, education, and medicine. To address the pressing need for client-side dense retrieval, we introduce MeMemo, the first open-source JavaScript toolkit that adapts the state-of-the-art approximate nearest neighbor search technique HNSW to browser environments. Developed with modern and native Web technologies, such as IndexedDB and Web Workers, our toolkit leverages client-side hardware capabilities to enable researchers and developers to efficiently search through millions of high-dimensional vectors in the browser. MeMemo enables exciting new design and research opportunities, such as private and personalized content creation and interactive prototyping, as demonstrated in our example application RAG Playground. Reflecting on our work, we discuss the opportunities and challenges for on-device dense retrieval. MeMemo is available at https://github.com/poloclub/mememo.	检索增强型文本生成(RAG)通过从可更新的外部知识库中检索信息，解决了大型语言模型(LLM)的常见局限性，如幻觉。然而，现有的方法通常需要专用的后端服务器来存储和检索数据，从而限制了它们在需要严格数据隐私的用例中的适用性，例如个人理财、教育和医疗。为了满足客户端密集检索的迫切需求，我们引入了 MeMemo，这是第一个开源 JavaScript 工具包，它将最先进的近似最近邻搜索技术 HNSW 应用于浏览器环境。我们的工具包利用 IndexedDB 和 Web Workers 等现代和本地 Web 技术开发，利用客户端硬件能力，使研究人员和开发人员能够在浏览器中有效地搜索数百万个高维向量。MeMemo 提供了令人兴奋的新设计和研究机会，如私人和个性化的内容创建和交互式原型制作，如我们的示例应用程序 RAG Playground 所示。回顾我们的工作，我们讨论了设备上密集检索的机会和挑战。备忘录可在 https://github.com/poloclub/MeMemo 下载。	code	0
Monitoring the Evolution of Behavioural Embeddings in Social Media Recommendation	Srijan Saket, Olivier Jeunen, Md. Danish Kalim	ShareChat; Sharechat	Emerging short-video platforms like TikTok, Instagram Reels, and ShareChat present unique challenges for recommender systems, primarily originating from a continuous stream of new content. ShareChat alone receives approximately 2 million pieces of fresh content daily, complicating efforts to assess quality, learn effective latent representations, and accurately match content with the appropriate user base, especially given limited user feedback. Embedding-based approaches are a popular choice for industrial recommender systems because they can learn low-dimensional representations of items, leading to effective recommendation that can easily scale to millions of items and users. Our work characterizes the evolution of such embeddings in short-video recommendation systems, comparing the effect of batch and real-time updates to content embeddings. We investigate how embeddings change with subsequent updates, explore the relationship between embeddings and popularity bias, and highlight their impact on user engagement metrics. Our study unveils the contrast in the number of interactions needed to achieve mature embeddings in a batch learning setup versus a real-time one, identifies the point of highest information updates, and explores the distribution of ℓ_2-norms across the two competing learning modes. Utilizing a production system deployed on a large-scale short-video app with over 180 million users, our findings offer insights into designing effective recommendation systems and enhancing user satisfaction and engagement in short-video applications.	
新兴的短视频平台，如 TikTok、 Instagram Reels 和 ShareChat.com 对推荐系统提出了独特的挑战，这些平台主要来源于源源不断的新内容。ShareChat 每天接收大约200万条新内容，这使得评估质量、学习有效的潜在表现形式以及将内容与合适的用户群准确匹配的工作变得复杂，特别是在用户反馈有限的情况下。基于嵌入的方法是工业推荐系统的一个流行选择，因为它们可以学习项目的低维表示，导致有效的推荐，可以轻松地扩展到数百万个项目和用户。我们的工作描述了这种嵌入在短视频推荐系统中的演变，比较了批量和实时更新对内容嵌入的影响。我们调查嵌入如何随着后续更新而改变，探索嵌入与流行偏差之间的关系，并强调它们对用户参与度量的影响。我们的研究揭示了在批量学习设置中实现成熟嵌入与实时嵌入所需的交互数量的对比，确定了最高信息更新的点，并探索了在两种竞争学习模式中 l _ 2-规范的分布。我们的研究结果利用一个部署在拥有超过1.8亿用户的大规模短视频应用程序上的生产系统，为设计有效的推荐系统、提高用户满意度和参与短视频应用程序提供了见解。	code	0
Embedding Based Deduplication in E-commerce AutoComplete	Shaodan Zhai, Yuwei Chen, Yixue Li	Coupang Inc., Mountain View, CA, USA; Coupang Inc., Mountain View, USA	Query AutoComplete (QAC) is an important feature in e-commerce search engines, aimed at enhancing user experience by offering relevant query suggestions. However, these suggestions often include semantically duplicate entries derived from user logs. While the existing literature has made significant progress in query similarity learning for e-commerce applications, the specific challenge of query deduplication has received less attention. To address this issue, this paper presents a new industry-scale framework for QAC deduplication at Coupang, utilizing diverse data augmentation techniques to enhance deduplication accuracy effectively. Our results reveal that this approach substantially outperforms existing query similarity methods, providing valuable insights into the utility of various pre-trained models and data augmentation strategies. Online A/B testing further validates the significant impact of our deduplication framework on improving the e-commerce search experience, highlighting the importance of addressing semantic duplicates in QAC suggestions and offering a practical solution with proven effectiveness in a live e-commerce environment.	查询自动完成(Query AutoComplete，QAC)是电子商务搜索引擎的一个重要特性，旨在通过提供相关的查询建议来增强用户体验。但是，这些建议通常包括来自用户日志的语义重复条目。虽然现有的文献在电子商务应用中的查询相似性学习方面取得了显著的进展，但是查询重复数据删除这一具体挑战却没有得到足够的重视。为了解决这个问题，本文提出了一个新的行业规模的质保局在 Coupang 重复数据删除框架，利用不同的数据增强技术，以有效地提高重复数据删除的准确性。我们的研究结果表明，这种方法大大优于现有的查询相似性方法，为各种预先训练的模型和数据增强策略的实用性提供了有价值的见解。在线 A/B 测试进一步验证了我们的重复数据删除框架对改善电子商务搜索体验的重要影响，突出了解决质量保证委员会建议中的语义重复的重要性，并提供了一个实用的解决方案，在实时电子商务环境中被证明是有效的。	code	0
Using and Evaluating Quantum Computing for Information Retrieval and Recommender Systems	Maurizio Ferrari Dacrema, Andrea Pasin, Paolo Cremonesi, Nicola Ferro	Università degli Studi di Padova, Padova, Italy; Politecnico di Milano, Milano, Italy	The field of Quantum Computing (QC) has gained significant popularity in recent years, due to its potential to provide benefits in terms of efficiency and effectiveness when employed to solve certain computationally intensive tasks. In both Information Retrieval (IR) and Recommender Systems (RS) we are required to build methods that apply complex processing on large and heterogeneous datasets; it is therefore natural to wonder whether QC could also be applied to boost their performance. The tutorial aims to provide first an introduction to QC for an audience that is not familiar with the technology, then to show how to apply the QC paradigm of Quantum Annealing (QA) to solve practical problems that are currently faced by IR and RS systems. During the tutorial, participants will be provided with the fundamentals required to understand QC and to apply it in practice by using a real D-Wave quantum annealer through APIs.	近年来，量子计算(QC)因其在解决某些计算密集型任务时在效率和有效性方面的潜力而广受欢迎。在信息检索(IR)和推荐系统(RS)中，我们都被要求建立在大型异构数据集上应用复杂处理的方法，因此很自然地想知道是否也可以应用量子计算来提高它们的性能。本教程旨在首先为不熟悉这项技术的观众介绍量子计算，然后展示如何应用量子计算中的量子退火(QA)范式来解决当前 IR 和 RS 系统所面临的实际问题。在本教程期间，参与者将获得必要的基本知识，以理解量子计算，并通过 API 使用真实的 D-Wave 量子退火器将其应用于实践。	code	0
Reinforcing Long-Term Performance in Recommender Systems with User-Oriented Exploration Policy	Changshuo Zhang, Sirui Chen, Xiao Zhang, Sunhao Dai, Weijie Yu, Jun Xu	University of International Business and Economics School of Information Technology and Management; Gaoling School of AI, Renmin University of China, Beijing, China; University of Illinois at Urbana-Champaign, Champaign, USA	Reinforcement learning (RL) has gained popularity in recommender systems for improving long-term performance by effectively exploring users' interests. However, modern recommender systems face the challenge of different user behavioral patterns among millions of items, making exploration more difficult. For example, users with varying activity levels require different exploration intensities. Unfortunately, previous studies often overlook this aspect and apply a uniform exploration strategy to all users, which ultimately hampers long-term user experiences. To tackle these challenges, we propose User-Oriented Exploration Policy (UOEP), a novel approach that enables fine-grained exploration among user groups. We first construct a distributional critic that allows policy optimization based on varying quantile levels of cumulative reward feedback from users, representing user groups with different activity levels. Using this critic as a guide, we design a population of distinct actors dedicated to effective and fine-grained exploration within their respective user groups. To simultaneously enhance diversity and stability during the exploration process, we also introduce a population-level diversity regularization term and a supervision module. Experimental results on public recommendation datasets validate the effectiveness of our approach, as it outperforms all other baselines in terms of long-term performance. Moreover, further analyses reveal the benefits of our approach, including improved performance for low-activity users and increased fairness among users.	
在推荐系统中，强化学习(RL)通过有效地探索用户的兴趣来改善长期表现，这种做法已经越来越受欢迎。然而，现代推荐系统面临着数以百万计的项目中不同的用户行为模式的挑战，使得探索变得更加困难。例如，活动级别不同的用户需要不同的探索强度。不幸的是，以往的研究往往忽视了这一方面，对所有用户采用统一的探索策略，这最终阻碍了长期的用户体验。为了应对这些挑战，我们提出了面向用户的探索策略(UOEP) ，这是一种新颖的方法，能够在用户组之间进行细粒度的探索。我们首先构造一个分布式批评，允许基于来自用户的累积奖励反馈的不同分位数级别的策略优化，代表具有不同活动级别的用户组。以这个批评家为指导，我们设计了一组不同的参与者，致力于在他们各自的用户群中进行有效和细粒度的探索。为了在勘探过程中同时增强多样性和稳定性，我们还引入了一个种群级多样性正则化项和一个监督模块。对公共推荐数据集的实验结果验证了我们方法的有效性，因为它在长期性能方面优于所有其他基线。此外，进一步的分析揭示了我们的方法的好处，包括改善低活动用户的性能和增加用户之间的公平性。	code	0
Unsupervised Cross-Domain Image Retrieval with Semantic-Attended Mixture-of-Experts	Kai Wang, Jiayang Liu, Xing Xu, Jingkuan Song, Xin Liu, Heng Tao Shen	College of Electronic and Information Engineering, Tongji University, Shanghai, China; University of Electronic Science and Technology of China; College of Computer Science and Technology, Huaqiao University, Xiamen, China	Unsupervised cross-domain image retrieval is designed to facilitate the retrieval between images in different domains in an unsupervised way. Without the guidance of labels, both intra-domain semantic learning and inter-domain semantic alignment pose significant challenges to the model's learning process. The resolution of these challenges relies on the accurate capture of domain-invariant semantic features by the model. Based on this consideration, we propose our Semantic-Attended Mixture of Experts (SA-MoE) model. Leveraging the proficiency of MoE network in capturing visual features, we enhance the model's focus on semantically relevant features through a series of strategies. We first utilize the self-attention mechanism of Vision Transformer to adaptively collect information with different weights on instances from different domains. In addition, we introduce contextual semantic association metrics to more accurately measure the semantic relatedness between instances. By utilizing the association metrics, secondary clustering is performed in the feature space to reinforce semantic relationships. Finally, we employ the metrics for information selection on the fused data to remove the semantic noise. We conduct extensive experiments on three widely used datasets. The consistent comparison results with existing methods indicate that our model possesses the state-of-the-art performance.	
无监督跨域图像检索是为了方便不同域间图像的无监督检索而设计的。在没有标签指导的情况下，域内语义学习和域间语义对齐都给模型的学习过程带来了巨大的挑战。这些挑战的解决依赖于模型对领域不变语义特征的准确捕获。基于这一考虑，我们提出了语义参与的专家混合模型(SA-MoE)。利用 MoE 网络捕获视觉特征的能力，我们通过一系列策略提高了模型对语义相关特征的关注度。首先利用视觉变压器的自注意机制对不同领域的实例进行不同权重的信息自适应采集。此外，我们引入上下文语义关联度量来更准确地度量实例之间的语义关联。利用关联度量，在特征空间中进行二次聚类以增强语义关系。最后，利用融合数据的信息选择度量去除语义噪声。我们在三个广泛使用的数据集上进行了广泛的实验。与现有方法的一致性比较结果表明，我们的模型具有最先进的性能。	code	0
Multilingual Meta-Distillation Alignment for Semantic Retrieval	Meryem M'hamdi, Jonathan May, Franck Dernoncourt, Trung Bui, Seunghyun Yoon	Adobe Research, Seattle, WA, USA; Microsoft & University of Southern California, Redmond, WA, USA; Adobe Research, San Jose, CA, USA; University of Southern California, Los Angeles, CA, USA	Multilingual semantic retrieval involves retrieving semantically relevant content to a query irrespective of the language. Compared to monolingual and bilingual semantic retrieval, multilingual semantic retrieval requires a stronger alignment approach to pull the contents to be retrieved close to the representation of their corresponding queries, no matter their language combinations. Traditionally, this is achieved through more supervision in the form of multilingual parallel resources, which are expensive to obtain, especially for low-resource languages. In this work, on top of an optimization-based Model-Agnostic Meta-Learner (MAML), we propose a data-efficient meta-distillation approach, MAML-Align, specifically for low-resource multilingual semantic retrieval. Our approach simulates a gradual feedback loop from monolingual to bilingual and from bilingual to multilingual semantic retrieval. We systematically compare multilingual meta-distillation learning to different baselines and conduct ablation studies on the role of different sampling approaches in the meta-task construction. We show that MAML-Align's gradual feedback loop boosts the generalization to different languages, including zero-shot ones, better than naive fine-tuning and vanilla MAML.	多语言语义检索指不论语言如何，检索与查询语义相关的内容。与单语言和双语语义检索相比，多语言语义检索需要一种更强的对齐方法来将要检索的内容拉近其相应查询的表示，而不管它们的语言组合如何。传统上，这是通过以多语言平行资源的形式进行更多的监督来实现的，这些资源获取成本高昂，特别是对于资源较少的语言。本文在基于优化的模型不可知元学习器(MAML)的基础上，提出了一种数据高效的元蒸馏方法 MAML-Align，专门用于低资源的多语言语义检索。我们的方法模拟了一个从单语到双语，从双语到多语的语义检索的渐进反馈循环。我们系统地将多语言元蒸馏学习与不同基线进行比较，并对不同抽样方法在元任务构建中的作用进行了消融研究。我们展示了 MAML-Align 的渐进反馈循环提高了对不同语言(包括零样本语言)的泛化能力，效果优于朴素微调和普通的 MAML。	code	0
Dataset and Models for Item Recommendation Using Multi-Modal User Interactions	Simone Borg Bruun, Krisztian Balog, Maria Maistro	Dr. Scient.; Tenure Track Assistant Professor; PhD	While recommender systems with multi-modal item representations (image, audio, and text) have been widely explored, learning recommendations from multi-modal user interactions (e.g., clicks and speech) remains an open problem. We study the case of multi-modal user interactions in a setting where users engage with a service provider through multiple channels (website and call center). In such cases, incomplete modalities naturally occur, since not all users interact through all the available channels. To address these challenges, we publish a real-world dataset that allows progress in this under-researched area. We further present and benchmark various methods for leveraging multi-modal user interactions for item recommendations, and propose a novel approach that specifically deals with missing modalities by mapping user interactions to a common feature space. Our analysis reveals important interactions between the different modalities and that a frequently occurring modality can enhance learning from a less frequent one.	虽然具有多模态项目表示(图像、音频和文本)的推荐系统已经得到了广泛的探索，但是从多模态用户交互(例如点击和语音)中学习推荐仍然是一个尚未解决的问题。我们研究了用户通过多个渠道(网站和呼叫中心)与服务提供商交互的场景下的多模态用户交互。在这种情况下，不完整的模态自然会出现，因为并非所有用户都通过所有可用的渠道进行交互。为了应对这些挑战，我们发布了一个真实世界的数据集，允许在这个研究不足的领域取得进展。我们进一步介绍并基准测试了利用多模态用户交互进行项目推荐的各种方法，并提出了一种新方法，通过将用户交互映射到一个共同的特征空间来专门处理缺失的模态。我们的分析揭示了不同模态之间的重要相互作用，并表明一个频繁出现的模态可以增强对一个不太频繁的模态的学习。	code	0
Behavior-Contextualized Item Preference Modeling for Multi-Behavior Recommendation	Mingshi Yan, Fan Liu, Jing Sun, Fuming Sun, Zhiyong Cheng, Yahong Han	Hefei University of Technology; National University of Singapore; Dalian Minzu University; Tianjin University	In recommender systems, multi-behavior methods have demonstrated their effectiveness in mitigating issues like data sparsity, a common challenge in traditional single-behavior recommendation approaches. These methods typically infer user preferences from various auxiliary behaviors and apply them to the target behavior for recommendations. However, this direct transfer can introduce noise to the target behavior in recommendation, due to variations in user attention across different behaviors. To address this issue, this paper introduces a novel approach, Behavior-Contextualized Item Preference Modeling (BCIPM), for multi-behavior recommendation. Our proposed Behavior-Contextualized Item Preference Network discerns and learns users' specific item preferences within each behavior. It then considers only those preferences relevant to the target behavior for final recommendations, significantly reducing noise from auxiliary behaviors. These auxiliary behaviors are utilized solely for training the network parameters, thereby refining the learning process without compromising the accuracy of the target behavior recommendations. To further enhance the effectiveness of BCIPM, we adopt a strategy of pre-training the initial embeddings. This step is crucial for enriching the item-aware preferences, particularly in scenarios where data related to the target behavior is sparse. Comprehensive experiments conducted on four real-world datasets demonstrate BCIPM's superior performance compared to several leading state-of-the-art models, validating the robustness and efficiency of our proposed approach.	
在推荐系统中，多行为方法已经证明了它们在缓解诸如数据稀疏等问题上的有效性，这是传统的单行为推荐方法所面临的共同挑战。这些方法通常从各种辅助行为推断出用户偏好，并将其应用于目标行为以获得推荐。然而，由于不同行为的用户注意力的差异，这种直接转移会给推荐中的目标行为带来噪声。为了解决这个问题，本文提出了一种新的方法，行为上下文项目偏好建模(BCIPM) ，用于多行为推荐。我们提出的上下文化项目偏好网络识别和学习用户的特定项目偏好在每个行为。然后它只考虑那些与目标行为相关的偏好作为最终建议，显著减少辅助行为的噪音。这些辅助行为仅用于训练网络参数，从而在不影响目标行为建议准确性的前提下完善学习过程。为了进一步提高 BCIPM 的有效性，我们采用了预先训练初始嵌入的策略。这一步对于丰富项目感知偏好是至关重要的，特别是在与目标行为相关的数据稀少的情况下。在四个真实世界数据集上进行的综合实验表明，BCIPM 的性能优于几个领先的最先进的模型，验证了我们提出的方法的鲁棒性和有效性。	code	0
Exploring the Individuality and Collectivity of Intents behind Interactions for Graph Collaborative Filtering	Yi Zhang, Lei Sang, Yiwen Zhang	Anhui University	Intent modeling has attracted widespread attention in recommender systems. As the core motivation behind user selection of items, intent is crucial for elucidating recommendation results. The current mainstream modeling method is to abstract the intent into unknowable but learnable shared or non-shared parameters. Despite considerable progress, we argue that it still confronts the following challenges: firstly, these methods only capture the coarse-grained aspects of intent, ignoring the fact that user-item interactions will be affected by collective and individual factors (e.g., a user may choose a movie because of its high box office or because of his own unique preferences); secondly, modeling believable intent is severely hampered by implicit feedback, which is incredibly sparse and devoid of true semantics. To address these challenges, we propose a novel recommendation framework designated as Bilateral Intent-guided Graph Collaborative Filtering (BIGCF). Specifically, we take a closer look at user-item interactions from a causal perspective and put forth the concepts of individual intent, which signifies private preferences, and collective intent, which denotes overall awareness. To counter the sparsity of implicit feedback, the feature distributions of users and items are encoded via a Gaussian-based graph generation strategy, and we implement the recommendation process through bilateral intent-guided graph reconstruction re-sampling. Finally, we propose graph contrastive regularization for both interaction and intent spaces to uniformize users, items, intents, and interactions in a self-supervised and non-augmented paradigm. Experimental results on three real-world datasets demonstrate the effectiveness of BIGCF compared with existing solutions.	
意图建模在推荐系统中引起了广泛的关注。作为用户选择项目背后的核心动机，意图对于阐明推荐结果至关重要。目前主流的建模方法是将意图抽象为不可知但可学的共享或非共享参数。尽管取得了相当大的进步，我们认为它仍然面临以下挑战: 首先，这些方法只捕获意图的粗粒度方面，忽略了用户-项目交互会受到集体和个人因素的影响(例如，用户可能会选择一部电影，因为它的高票房或因为他自己的独特偏好) ; 其次，隐式反馈极其稀疏且缺乏真正的语义，严重阻碍了对可信意图的建模。为了应对这些挑战，我们提出了一个新的推荐框架，命名为双边意图引导的图协同过滤(BIGCF)。具体来说，我们从因果关系的角度来看待用户-项目的交互，提出了个人意图的概念——表示私人偏好——和集体意图——表示整体意识。针对隐式反馈的稀疏性，采用基于高斯的图生成策略对用户和项目的特征分布进行编码，并通过双边意图引导的图重构重采样实现推荐过程。最后，我们提出了交互空间和意图空间的图对比正则化，在自监督和非增强的范式中统一用户、项目、意图和交互。在三个实际数据集上的实验结果表明了 BIGCF 方法与现有方法相比的有效性。	code	0
Content-based Graph Reconstruction for Cold-start Item Recommendation	Jinri Kim, Eungi Kim, Kwangeun Yeo, Yujin Jeon, Chanwoo Kim, Sewon Lee, Joonseok Lee	Seoul National Univ., Seoul, Republic of Korea	Graph convolutions have been successfully applied to recommendation systems, utilizing high-order collaborative signals present in the user-item interaction graph. This idea, however, has not been applicable to the cold-start items, since cold nodes are isolated in the graph and thus do not take advantage of information exchange from neighboring nodes. Recently, there have been a few attempts to utilize graph convolutions on item-item or user-user attribute graphs to capture high-order collaborative signals for cold-start cases, but these approaches are still limited in that the item-item or user-user graph falls short in capturing the dynamics of user-item interactions, as their edges are constructed based on arbitrary and heuristic attribute similarity. In this paper, we introduce Content-based Graph Reconstruction for Cold-start item recommendation (CGRC), employing a masked graph autoencoder structure and multimodal contents to directly incorporate interaction-based high-order connectivity, applicable even in cold-start scenarios. To address the cold-start items directly on the interaction graph, our approach trains the model to reconstruct plausible user-item interactions from masked edges of randomly chosen cold items, simulating fresh items without connection to users. This strategy enables the model to infer potential edges for unseen cold-start nodes. Extensive experiments on real-world datasets demonstrate the superiority of our model.	
图卷积已成功应用于推荐系统，利用高阶协作信号存在于用户项交互图中。然而，这种思想并不适用于冷启动项，因为冷节点在图中是孤立的，因此不利用相邻节点之间的信息交换。近年来，利用项目卷积或用户-用户属性图来获取冷启动情况下的高阶协作信号的方法已经有了一些尝试，但这些方法仍然受到项目卷积或用户-用户图在获取用户-项目交互动态方面的局限，因为它们的边是基于任意和启发式属性相似性构造的。本文介绍了基于内容的图重构技术在冷启动项目推荐中的应用，该技术采用屏蔽图自动编码器结构和多模态内容直接结合基于交互的高阶连通性，适用于冷启动项目推荐。为了直接处理交互图上的冷启动项目，我们的方法训练模型从随机选择的冷启动项目的掩盖边缘重建合理的用户-项目交互，模拟与用户没有联系的新鲜项目。该策略使模型能够推断出未知冷启动节点的潜在边。在实际数据集上的大量实验证明了该模型的优越性。	code	0
Unbiased Learning-to-Rank Needs Unconfounded Propensity Estimation	Dan Luo, Lixin Zou, Qingyao Ai, Zhiyu Chen, Chenliang Li, Dawei Yin, Brian D. Davison	Tsinghua University, Beijing, China; Lehigh University, Bethlehem, PA, USA; Baidu Inc., Beijing, China; Wuhan University, Wuhan, China; Amazon.com, Inc., Seattle, WA, USA	The logs of the use of a search engine provide sufficient data to train a better ranker. However, it is well known that such implicit feedback reflects biases, and in particular a presentation bias that favors higher-ranked results. Unbiased Learning-to-Rank (ULTR) methods attempt to optimize performance by jointly modeling this bias along with the ranker so that the bias can be removed. Such methods have been shown to provide theoretical soundness, and promise superior performance and low deployment costs. However, existing ULTR methods don't recognize that query-document relevance is a confounder -- it affects both the likelihood of a result being clicked because of relevance and the likelihood of the result being ranked high by the base ranker. Moreover, the performance guarantees of existing ULTR methods assume the use of a weak ranker -- one that does a poor job of ranking documents based on relevance to a query. In practice, of course, commercial search engines use highly tuned rankers, and desire to improve upon them using the implicit judgments in search logs. This results in a significant correlation between position and relevance, which leads existing ULTR methods to overestimate click propensities in highly ranked results, reducing ULTR's effectiveness. This paper is the first to demonstrate the problem of propensity overestimation by ULTR algorithms, based on a causal analysis. We develop a new learning objective based on a backdoor adjustment. In addition, we introduce the Logging-Policy-aware Propensity (LPP) model that can jointly learn LPP and a more accurate ranker. 
We extensively test our approach on two public benchmark tasks and show that our proposal is effective, practical and significantly outperforms the state of the art.	使用搜索引擎的日志提供了足够的数据来训练一个更好的排名。然而，众所周知，这种隐性反馈反映了偏见，特别是偏向于排名较高的结果的表示偏见。无偏学习排序(ULTR)方法试图通过将这种偏差与排序器联合建模来优化性能，从而消除这种偏差。这样的方法已被证明提供了理论上的可靠性，并承诺优越的性能和低部署成本。然而，现有的 ULTR 方法没有认识到查询文档的相关性是一个混杂因素——它影响因为相关性而被点击的结果的可能性和基础排名结果被排名高的可能性。此外，现有 ULTR 方法的性能保证假设使用了一个弱排名器——一个根据与查询的相关性对文档进行排名的工作做得很糟糕的排名器。当然，在实践中，商业搜索引擎使用高度调整的排名，并希望利用搜索日志中隐含的判断来改进它们。这导致了位置和相关性之间的显著相关性，这导致现有的 ULTR 方法高估了高排名结果中的点击倾向，降低了 ULTR 的有效性。本文首次在分析因果关系的基础上，论证了 ULTR 算法存在的倾向高估问题。我们开发了一个新的学习目标的基础上后门调整。此外，我们还介绍了日志策略感知倾向(LPP)模型，该模型可以联合学习 LPP 和一个更准确的排名。我们在两个公共基准任务上广泛测试了我们的方法，并表明我们的建议是有效的、实用的，而且明显优于最先进的水平。	code	0
Scenario-Adaptive Fine-Grained Personalization Network: Tailoring User Behavior Representation to the Scenario Context	Moyu Zhang, Yongxiang Tang, Jinxin Hu, Yu Zhang	Unaffiliated; Lazada Group	Existing methods often adjust representations adaptively only after aggregating user behavior sequences. This coarse-grained approach to re-weighting the entire user sequence hampers the model's ability to accurately model the user interest migration across different scenarios. To enhance the model's capacity to capture user interests from historical behavior sequences in each scenario, we develop a ranking framework named the Scenario-Adaptive Fine-Grained Personalization Network (SFPNet), which designs a kind of fine-grained method for multi-scenario personalized recommendations. Specifically, SFPNet comprises a series of blocks named as Scenario-Tailoring Block, stacked sequentially. Each block initially deploys a parameter personalization unit to integrate scenario information at a coarse-grained level by redefining fundamental features. Subsequently, we consolidate scenario-adaptively adjusted feature representations to serve as context information. By employing residual connection, we incorporate this context into the representation of each historical behavior, allowing for context-aware fine-grained customization of the behavior representations at the scenario-level, which in turn supports scenario-aware user interest modeling.	现有的方法通常只在聚合用户行为序列之后才自适应地调整表示。这种重新加权整个用户序列的粗粒度方法阻碍了模型在不同场景间精确建模用户兴趣迁移的能力。为了提高模型从每个场景中的历史行为序列中获取用户兴趣的能力，提出了一种基于场景-自适应细粒度个性化网络(Scenario-AdaptiveFine-GrainedPersonalization Network，SFPNet)的排序框架，该框架设计了一种用于多场景个性化推荐的细粒度方法。具体来说，SFPNet 包含一系列名为 Scenario-Tailoring Block 的块，按顺序堆叠。每个块最初部署一个参数个性化单元，通过重新定义基本特性在粗粒度级别集成场景信息。随后，我们整合场景-自适应调整的特征表示作为上下文信息。通过使用剩余连接，我们将这个上下文整合到每个历史行为的表示中，允许在场景级别对行为表示进行上下文感知的细粒度定制，这反过来又支持场景感知的用户兴趣建模。	code	0
EulerFormer: Sequential User Behavior Modeling with Complex Vector Attention	Zhen Tian, Wayne Xin Zhao, Changwang Zhang, Xin Zhao, Zhongrui Ma, JiRong Wen	Renmin University of China; Huawei	To capture user preference, transformer models have been widely applied to model sequential user behavior data. The core of transformer architecture lies in the self-attention mechanism, which computes the pairwise attention scores in a sequence. Due to the permutation-equivariant nature, positional encoding is used to enhance the attention between token representations. In this setting, the pairwise attention scores can be derived by both semantic difference and positional difference. However, prior studies often model the two kinds of difference measurements in different ways, which potentially limits the expressive capacity of sequence modeling. To address this issue, this paper proposes a novel transformer variant with complex vector attention, named EulerFormer, which provides a unified theoretical framework to formulate both semantic difference and positional difference. The EulerFormer involves two key technical improvements. First, it employs a new transformation function for efficiently transforming the sequence tokens into polar-form complex vectors using Euler's formula, enabling the unified modeling of both semantic and positional information in a complex rotation form. Secondly, it develops a differential rotation mechanism, where the semantic rotation angles can be controlled by an adaptation function, enabling the adaptive integration of the semantic and positional information according to the semantic contexts. Furthermore, a phase contrastive learning task is proposed to improve the anisotropy of contextual representations in EulerFormer. Our theoretical framework possesses a high degree of completeness and generality. It is more robust to semantic variations and possesses superior theoretical properties in principle. 
Extensive experiments conducted on four public datasets demonstrate the effectiveness and efficiency of our approach.	为了捕获用户偏好，Transformer 模型已被广泛应用于建模序列化的用户行为数据。Transformer 架构的核心是自注意机制，它计算一个序列的成对注意得分。由于置换等变的特性，位置编码被用来增强标记表示之间的注意力。在这种情况下，成对的注意分数可以通过语义差异和位置差异得出。然而，先前的研究往往以不同的方式对这两种差异测量进行建模，这可能限制了序列建模的表达能力。为了解决这一问题，本文提出了一种新的具有复向量注意力的 Transformer 变体，称为 EulerFormer，它提供了一个统一的理论框架来描述语义差异和位置差异。EulerFormer 包括两个关键的技术改进。首先，利用欧拉公式有效地将序列标记转换为极坐标形式的复向量，使得语义和位置信息以复数旋转形式统一建模成为可能。其次，提出了一种差分旋转机制，该机制通过自适应函数控制语义旋转角度，实现了语义和位置信息根据语义上下文的自适应集成。此外，本文还提出了一个相位对比学习任务来改善 EulerFormer 中上下文表示的各向异性。我们的理论框架具有较高的完整性和普遍性。它对语义变异具有更强的鲁棒性，在原则上具有更优越的理论性质。在四个公共数据集上进行的大量实验证明了我们方法的有效性和效率。	code	0
Breaking the Length Barrier: LLM-Enhanced CTR Prediction in Long Textual User Behaviors	Binzong Geng, Zhaoxin Huan, Xiaolu Zhang, Yong He, Liang Zhang, Fajie Yuan, Jun Zhou, Linjian Mo	Westlake University; Ant Group	With the rise of large language models (LLMs), recent works have leveraged LLMs to improve the performance of click-through rate (CTR) prediction. However, we argue that a critical obstacle remains in deploying LLMs for practical use: the efficiency of LLMs when processing long textual user behaviors. As user sequences grow longer, the current efficiency of LLMs is inadequate for training on billions of users and items. To break through the efficiency barrier of LLMs, we propose Behavior Aggregated Hierarchical Encoding (BAHE) to enhance the efficiency of LLM-based CTR modeling. Specifically, BAHE proposes a novel hierarchical architecture that decouples the encoding of user behaviors from inter-behavior interactions. Firstly, to prevent computational redundancy from repeated encoding of identical user behaviors, BAHE employs the LLM's pre-trained shallow layers to extract embeddings of the most granular, atomic user behaviors from extensive user sequences and stores them in the offline database. Subsequently, the deeper, trainable layers of the LLM facilitate intricate inter-behavior interactions, thereby generating comprehensive user embeddings. This separation allows the learning of high-level user representations to be independent of low-level behavior encoding, significantly reducing computational complexity. 
Finally, these refined user embeddings, in conjunction with correspondingly processed item embeddings, are incorporated into the CTR model to compute the CTR scores. Extensive experimental results show that BAHE reduces training time and memory by five times for CTR models using LLMs, especially with longer user sequences. BAHE has been deployed in a real-world system, allowing for daily updates of 50 million CTR data on 8 A100 GPUs, making LLMs practical for industrial CTR prediction.	随着大型语言模型(LLM)的兴起，最近的工作已经利用 LLM 来提高点击率(CTR)预测的性能。然而，我们认为在实际应用中部署 LLM 仍然存在一个关键的障碍: 当处理长文本用户行为时 LLM 的效率。随着用户序列的增长，LLM 目前的效率不足以对数十亿用户和项目进行训练。为了突破 LLM 的效率障碍，我们提出了行为聚合层次编码(BAHE)来提高基于 LLM 的 CTR 建模的效率。具体来说，BAHE 提出了一种新的层次结构，将用户行为的编码与行为间的交互分离开来。首先，为了防止对相同用户行为重复编码造成的计算冗余，BAHE 使用 LLM 预先训练的浅层从大量的用户序列中提取最细粒度的原子用户行为的嵌入，并将它们存储在离线数据库中。随后，LLM 的更深层、可训练的层促进了复杂的行为间交互，从而产生了全面的用户嵌入。这种分离使得高级用户表示的学习独立于低级行为编码，显著降低了计算复杂度。最后，这些改进的用户嵌入，结合相应的处理过的项目嵌入，被合并到 CTR 模型中来计算 CTR 分数。大量的实验结果表明，BAHE 将使用 LLM 的 CTR 模型的训练时间和内存占用降低了5倍，特别是对于更长的用户序列。 BAHE 已经部署在现实世界的系统中，允许在8个 A100 GPU 上每天更新5000万条 CTR 数据，使 LLM 用于工业 CTR 预测变得实用。	code	0
Multi-intent-aware Session-based Recommendation	Minjin Choi, Hyeyoung Kim, Hyunsouk Cho, Jongwuk Lee	Sungkyunkwan University; Sungkyunkwan University Artificial intelligence; Ajou University	Session-based recommendation (SBR) aims to predict the following item a user will interact with during an ongoing session. Most existing SBR models focus on designing sophisticated neural-based encoders to learn a session representation, capturing the relationship among session items. However, they tend to focus on the last item, neglecting diverse user intents that may exist within a session. This limitation leads to significant performance drops, especially for longer sessions. To address this issue, we propose a novel SBR model, called Multi-intent-aware Session-based Recommendation Model (MiaSRec). It adopts frequency embedding vectors indicating the item frequency in session to enhance the information about repeated items. MiaSRec represents various user intents by deriving multiple session representations centered on each item and dynamically selecting the important ones. Extensive experimental results show that MiaSRec outperforms existing state-of-the-art SBR models on six datasets, particularly those with longer average session length, achieving up to 6.27 https://github.com/jin530/MiaSRec.	基于会话的推荐(SBR)旨在预测用户将在正在进行的会话期间与以下项目进行交互。大多数现有的 SBR 模型侧重于设计复杂的基于神经的编码器来学习会话表示，捕获会话项之间的关系。然而，他们倾向于关注最后一项，忽略了会话中可能存在的不同用户意图。这种限制会导致显著的性能下降，特别是对于较长的会话。为了解决这个问题，我们提出了一种新的 SBR 模型，称为多意图感知的基于会话的推荐模型(MiaSRec)。采用频率嵌入向量表示会话中项目的频率，增强重复项目的信息。MiaSRec 通过派生以每个项为中心的多个会话表示并动态选择重要的会话表示来表示各种用户意图。大量的实验结果表明，MiaSRec 在6个数据集上优于现有的最先进的 SBR 模型，特别是那些平均会话长度更长的模型，达到了6.27的 https://github.com/jin530/MiaSRec。	code	0
PLAID SHIRTTT for Large-Scale Streaming Dense Retrieval	Dawn J. Lawrie, Efsun Selin Kayi, Eugene Yang, James Mayfield, Douglas W. Oard	Johns Hopkins University HLTCOE; University of Maryland	PLAID, an efficient implementation of the ColBERT late interaction bi-encoder using pretrained language models for ranking, consistently achieves state-of-the-art performance in monolingual, cross-language, and multilingual retrieval. PLAID differs from ColBERT by assigning terms to clusters and representing those terms as cluster centroids plus compressed residual vectors. While PLAID is effective in batch experiments, its performance degrades in streaming settings where documents arrive over time because representations of new tokens may be poorly modeled by the earlier tokens used to select cluster centroids. PLAID Streaming Hierarchical Indexing that Runs on Terabytes of Temporal Text (PLAID SHIRTTT) addresses this concern using multi-phase incremental indexing based on hierarchical sharding. Experiments on ClueWeb09 and the multilingual NeuCLIR collection demonstrate the effectiveness of this approach both for the largest collection indexed to date by the ColBERT architecture and in the multilingual setting, respectively.	PLAID 是 ColBERT 后期交互双编码器的有效实现，使用预先训练的语言模型进行排名，始终在单语言、跨语言和多语言检索方面取得最佳性能。PLAID 不同于 ColBERT，它将术语分配给聚类，并将这些术语表示为聚类质心加上压缩的残差向量。虽然 PLAID 在批处理实验中是有效的，但在文档随时间陆续到达的流式场景中，其性能会下降，因为用于选择聚类质心的早期令牌可能无法很好地建模新令牌的表示。基于 TB 级时态文本的 PLAID 流式分层索引(PLAID SHIRTTT)使用基于分层分片的多阶段增量索引解决了这个问题。在 ClueWeb09和多语言 NeuCLIR 集合上的实验分别证明了这种方法对 ColBERT 架构迄今为止索引的最大集合和多语言设置的有效性。	code	0
Towards Ethical Item Ranking: A Paradigm Shift from User-Centric to Item-Centric Approaches	Guilherme Ramos, Mirko Marras, Ludovico Boratto	University of Cagliari, Cagliari, Italy; Instituto de Telecomunicações, and Instituto Superior Técnico, ULisboa, Lisbon, Portugal	Ranking systems are instrumental in shaping user experiences by determining the relevance and order of presented items. However, current approaches, particularly those revolving around user-centric reputation scoring, raise ethical concerns associated with scoring individuals. To counter such issues, in this paper, we introduce a novel item ranking system approach that strategically transitions its emphasis from scoring users to calculating item rankings relying exclusively on items' ratings information, to achieve the same objective. Experiments on three datasets show that our approach achieves higher effectiveness and efficiency than state-of-the-art baselines. Furthermore, the resulting rankings are more robust to spam and resistant to bribery, contributing to a novel and ethically sound direction for item ranking systems.	排名系统有助于通过确定呈现项目的相关性和顺序来塑造用户体验。然而，目前的方法，特别是那些围绕以用户为中心的声誉评分的方法，提出了与个人评分相关的伦理问题。为了解决这些问题，本文提出了一种新的项目排名系统方法，该方法从对用户进行评分转变为完全依靠项目的评分信息来计算项目排名，以达到同样的目的。在三个数据集上的实验表明，该方法比最先进的基线方法具有更高的效率和效果。此外，由此产生的排名更强大的垃圾邮件和抵抗贿赂，有助于一个新颖和道德上健全的项目排名系统的方向。	code	0
A Large-scale Offer Alignment Model for Partitioning Filtering and Matching Product Offers	Wenyu Huang, André Melo, Jeff Z. Pan	Huawei Technologies R&D, Edinburgh, United Kingdom; The University of Edinburgh, Edinburgh, United Kingdom	Offer alignment is a key step in a product knowledge graph construction pipeline. It aims to align retailer offers of the same product for better coverage of product details. With the rapid development of online shopping services, the offer alignment task is applied in ever larger datasets. This work aims to build an offer alignment system that can efficiently be used in large-scale offer data. The key components of this system include: 1) common offer encoders for encoding text offer data into representations; 2) trainable LSH partitioning module to divide similar offers into small blocks; 3) lightweight sophisticated late-interactions for efficient filtering and scoring of offer alignment candidate pairs. We evaluate the system on public WDC offer alignment dataset, as well as DBLP-Scholar and DBLP-ACM.	报价对齐是产品知识图构建流程中的一个关键步骤。它旨在使零售商提供的同一产品，以更好地覆盖产品的细节。随着网上购物服务的迅速发展，报价对齐任务在越来越大的数据集中得到了广泛的应用。本文旨在建立一个能够有效应用于大规模报价数据的报价对齐系统。该系统的关键组成部分包括: 1)用于编码文本的通用报价编码器将数据提供到表示中; 2)可训练的 LSH 分区模块将类似的报价划分为小块; 3)轻量级复杂的后期交互，以便对报价对齐候选对进行有效的过滤和评分。我们在公共 WDC 提供的比对数据集上以及 DBLP-Scholar 和 DBLP-ACM 上对该系统进行了评估。	code	0
Interest Clock: Time Perception in Real-Time Streaming Recommendation System	Yongchun Zhu, Jingwu Chen, Ling Chen, Yitan Li, Feng Zhang, Zuotao Liu	ByteDance	User preferences follow a dynamic pattern over a day, e.g., at 8 am, a user might prefer to read news, while at 8 pm, they might prefer to watch movies. Time modeling aims to enable recommendation systems to perceive time changes to capture users' dynamic preferences over time, which is an important and challenging problem in recommendation systems. Especially, streaming recommendation systems in the industry, with only available samples of the current moment, present greater challenges for time modeling. There is still a lack of effective time modeling methods for streaming recommendation systems. In this paper, we propose an effective and universal method Interest Clock to perceive time information in recommendation systems. Interest Clock first encodes users' time-aware preferences into a clock (hour-level personalized features) and then uses Gaussian distribution to smooth and aggregate them into the final interest clock embedding according to the current time for the final prediction. By arming base models with Interest Clock, we conduct online A/B tests, obtaining +0.509 duration respectively. Besides, the extended offline experiments show improvements as well. Interest Clock has been deployed on Douyin Music App.	用户偏好在一天中遵循一种动态模式，例如，在早上8点，用户可能更喜欢阅读新闻，而在晚上8点，他们可能更喜欢看电影。时间建模的目的是使推荐系统能够感知时间的变化，捕获用户随时间变化的动态偏好，这是推荐系统中的一个重要而又具有挑战性的问题。特别是，业界的流式推荐系统，只有当前时刻的可用样本，对时间建模提出了更大的挑战。针对流媒体推荐系统中时间建模方法的不足，本文提出了一种有效而通用的时间建模方法——兴趣时钟法。兴趣时钟首先将用户的时间感知偏好编码成一个时钟(小时级别的个性化功能) ，然后使用正态分布来平滑和聚合它们，根据当前的时间嵌入到最终的兴趣时钟中，进行最终的预测。通过使用兴趣时钟武装基本模型，我们进行在线 A/B 测试，分别获得 + 0.509的持续时间。此外，延长的离线实验也有所改善。兴趣时钟已经在抖音音乐应用程序上部署。	code	0
Unsupervised Large Language Model Alignment for Information Retrieval via Contrastive Feedback	Qian Dong, Yiding Liu, Qingyao Ai, Zhijing Wu, Haitao Li, Yiqun Liu, Shuaiqiang Wang, Dawei Yin, Shaoping Ma	; Beijing Institute of Technology; Baidu Inc.; Baidu Inc. Search Science; Tsinghua University Department of Computer Science and Technology; Tsinghua University Computer Science and Technology; Tsinghua University	Large language models (LLMs) have demonstrated remarkable capabilities across various research domains, including the field of Information Retrieval (IR). However, the responses generated by off-the-shelf LLMs tend to be generic, i.e., cannot capture the distinctiveness of each document with similar content. This limits the performance of LLMs in IR because finding and distinguishing relevant documents from substantial similar documents is a typical problem in many IR tasks. To address this issue, we propose an unsupervised alignment method, namely Reinforcement Learning from Contrastive Feedback (RLCF), empowering LLMs to generate both high-quality and context-specific responses. Our approach constructs unsupervised contrastive feedback signals based on similar document groups, and adopts a reward function, named group-wise reciprocal rank, to optimize LLMs within a standard Proximal Policy Optimization. We conduct extensive experiments to evaluate the effectiveness of RLCF on LLMs built with different languages and parameter sizes on multiple downstream IR applications. RLCF significantly outperforms existing alignment methods, and RLCF-optimized LLMs demonstrate considerable improvement in generating responses with distinctiveness.	
大型语言模型(LLM)已经在各个研究领域展示了卓越的能力，包括信息检索领域(IR)。然而，现成 LLM 生成的响应往往是通用的，即不能捕获具有相似内容的每个文档的独特性。这限制了 LLM 在 IR 中的性能，因为在许多 IR 任务中，查找和区分相关文档和大量类似文档是一个典型的问题。为了解决这个问题，我们提出了一种无监督的对齐方法，即从对比反馈(强化学习) ，授权 LLM 生成高质量和上下文特定的反应。该方法基于相似文档组构造无监督对比反馈信号，并采用一种称为组-智力互惠秩的奖励函数，在标准的近似策略优化中优化 LLM。我们进行了广泛的实验来评估 RLCF 在不同语言构建的 LLM 上的有效性，以及在多下游 IR 应用中的参数大小。RLCF 明显优于现有的比对方法，并且 RLCF 优化的 LLM 显示出相当大的改进，产生独特的响应。	code	0
Amazon-KG: A Knowledge Graph Enhanced Cross-Domain Recommendation Dataset	Yuhan Wang, Qing Xie, Mengzi Tang, Lin Li, Jingling Yuan, Yongjian Liu	Wuhan University Of Technology; Wuhan University of Technology	Cross-domain recommendation (CDR) aims to utilize the information from relevant domains to guide the recommendation task in the target domain, and shows great potential in alleviating the data sparsity and cold-start problems of recommender systems. Most existing methods utilize the interaction information (e.g., ratings and clicks) or consider auxiliary information (e.g., tags and comments) to analyze the users' cross-domain preferences, but such kinds of information ignore the intrinsic semantic relationship of different domains. In order to effectively explore the inter-domain correlations, encyclopedic knowledge graphs (KG) involving different domains are highly desired in cross-domain recommendation tasks because they contain general information covering various domains with structured data format. However, there are few datasets containing KG information for CDR tasks, so in order to enrich the available data resource, we build a KG-enhanced cross-domain recommendation dataset, named Amazon-KG, based on the widely used Amazon dataset for CDR and the well-known KG DBpedia. In this work, we analyze the potential of KG applying in cross-domain recommendations, and describe the construction process of our dataset in detail. Finally, we perform quantitative statistical analysis on the dataset. We believe that datasets like Amazon-KG contribute to the development of knowledge-aware cross-domain recommender systems. Our dataset has been released at https://github.com/WangYuhan-0520/Amazon-KG-v2.0-dataset.	
跨域推荐技术是利用相关领域的信息来指导目标领域的推荐任务，在缓解推荐系统的数据稀疏性和冷启动问题方面显示出巨大的潜力。大多数现有的方法利用交互信息(如评分和点击)或考虑辅助信息(如标签和评论)来分析用户的跨域偏好，但这类信息忽视了不同域的内在语义关系。为了有效地探索域间相关性，涉及不同领域的百科知识图(KG)是跨领域推荐任务中非常需要的，因为它包含了覆盖不同领域的结构化数据格式的一般信息。然而，包含用于 CDR 任务的 KG 信息的数据集很少，因此为了丰富可用的数据资源，我们基于广泛使用的用于 CDR 的 Amazon 数据集和著名的 KG DBpedia，构建了一个 KG 增强的跨域推荐数据集，命名为 Amazon-KG。在这项工作中，我们分析了 KG 在跨领域推荐中的应用潜力，并详细描述了我们的数据集的构建过程。最后，对数据集进行定量统计分析。我们相信像 Amazon-KG 这样的数据集有助于开发知识感知的跨领域推荐系统。我们的数据 https://github.com/wangyuhan-0520/amazon-kg-v2.0-dataset 已经发布了。	code	0
Contrast then Memorize: Semantic Neighbor Retrieval-Enhanced Inductive Multimodal Knowledge Graph Completion	Yu Zhao, Ying Zhang, Baohang Zhou, Xinying Qian, Kehui Song, Xiangrui Cai	Nankai University College of Computer Science, VCIP, TMCC, TBI Center; Tiangong University School of Software	A large number of studies have emerged for Multimodal Knowledge Graph Completion (MKGC) to predict the missing links in MKGs. However, fewer studies have been proposed to study the inductive MKGC (IMKGC) involving emerging entities unseen during training. Existing inductive approaches focus on learning textual entity representations, which neglect rich semantic information in visual modality. Moreover, they focus on aggregating structural neighbors from existing KGs, which of emerging entities are usually limited. However, the semantic neighbors are decoupled from the topology linkage and usually imply the true target entity. In this paper, we propose the IMKGC task and a semantic neighbor retrieval-enhanced IMKGC framework CMR, where the contrast brings the helpful semantic neighbors close, and then the memorize supports semantic neighbor retrieval to enhance inference. Specifically, we first propose a unified cross-modal contrastive learning to simultaneously capture the textual-visual and textual-textual correlations of query-entity pairs in a unified representation space. The contrastive learning increases the similarity of positive query-entity pairs, therefore making the representations of helpful semantic neighbors close. Then, we explicitly memorize the knowledge representations to support the semantic neighbor retrieval. At test time, we retrieve the nearest semantic neighbors and interpolate them to the query-entity similarity distribution to augment the final prediction. Extensive experiments validate the effectiveness of CMR on three inductive MKGC datasets. Codes are available at https://github.com/OreOZhao/CMR.	
大量的研究已经出现的多模态知识图完成(MKGC) ，以预测缺失的环节，在 MKG。然而，很少有研究提出研究诱导性 MKGC (IMKGC)涉及新兴实体在培训期间看不见。现有的归纳法侧重于学习文本实体表征，忽略了视觉形态的丰富语义信息。此外，他们的重点是聚集结构性邻居从现有的幼儿园，这些新兴实体通常是有限的。然而，语义邻居与拓扑链接是解耦的，通常意味着真正的目标实体。在本文中，我们提出了 IMKGC 任务和一个语义邻居检索增强的 IMKGC 框架 CMR，其中对比度使有用的语义邻居更加接近，然后记忆支持语义邻居检索增强推理。具体来说，我们首先提出一种统一的跨模态对比学习方法，在统一的表示空间中同时捕获查询实体对的文本-视觉和文本-文本关联。对比学习增加了正向查询实体对的相似性，从而使有用语义邻居的表示更加紧密。然后，我们显式地记忆知识表示，以支持语义邻居检索。在测试时，我们检索最近的语义邻居，并将它们插值到查询实体的相似度分布中，以增强最终的预测。大量的实验验证了 CMR 在三个归纳 MKGC 数据集上的有效性。密码可在 https://github.com/oreozhao/cmr 索取。	code	0
The Treatment of Ties in Rank-Biased Overlap	Matteo Corsi, Julián Urbano	Delft University of Technology; TU Delft	Rank-Biased Overlap (RBO) is a similarity measure for indefinite rankings: it is top-weighted, and can be computed when only a prefix of the rankings is known or when they have only some items in common. It is widely used for instance to analyze differences between search engines by comparing the rankings of documents they retrieve for the same queries. In these situations, though, it is very frequent to find tied documents that have the same score. Unfortunately, the treatment of ties in RBO remains superficial and incomplete, in the sense that it is not clear how to calculate it from the ranking prefixes only. In addition, the existing way of dealing with ties is very different from the one traditionally followed in the field of Statistics, most notably found in rank correlation coefficients such as Kendall's and Spearman's. In this paper we propose a generalized formulation for RBO to handle ties, thanks to which we complete the original definitions by showing how to perform prefix evaluation. We also use it to fully develop two variants that align with the ones found in the Statistics literature: one when there is a reference ranking to compare to, and one when there is not. Overall, these three variants provide researchers with flexibility when comparing rankings with RBO, by clearly determining what ties mean, and how they should be treated. Finally, using both synthetic and TREC data, we demonstrate the use of these new tie-aware RBO measures. We show that the scores may differ substantially from the original tie-unaware RBO measure, where ties had to be broken at random or by arbitrary criteria such as by document ID. Overall, these results evidence the need for a proper account of ties in rank similarity measures such as RBO.	
排名偏差重叠(RBO)是对不确定排名的一种相似度量: 它是最权重的，可以在只知道排名的前缀或者只有一些共同点的情况下计算出来。它被广泛用于分析搜索引擎之间的差异，例如通过比较它们为相同的查询检索的文档的排名。但是，在这些情况下，经常会发现具有相同分数的绑定文档。不幸的是，RBO 中的关系处理仍然是肤浅和不完整的，因为不清楚如何仅仅从排名前缀来计算它。此外，现有的处理关系的方法与统计学领域中传统的方法有很大的不同，最明显的是发现级别相关系数，如肯德尔的和斯皮尔曼的。在本文中，我们提出了一个处理关系的广义 RBO 公式，由此我们完成了原来的定义，通过展示如何执行前缀评价。我们也使用它来完全开发两个变体，与在统计文献中发现的变体一致: 一个当有一个参考排名进行比较时，一个当没有时。总的来说，这三个变量为研究人员提供了灵活性，当比较排名与 RBO，通过清楚地确定什么是联系意味着，以及他们应该如何处理。最后，使用综合数据和 TREC 数据，我们演示了这些新的领带感知 RBO 措施的使用。我们表明，分数可能大不相同的原始意识 RBO 措施，其中关系必须打破随机或任意标准，如文件 ID。总的来说，这些结果证明了在 RBO 等等级相似性度量中需要适当考虑关系。	code	0
What Matters in a Measure? A Perspective from Large-Scale Search Evaluation	Paul Thomas, Gabriella Kazai, Nick Craswell, Seth Spielman	Microsoft, Adelaide, Australia; Microsoft, Boulder, CO, USA; Microsoft, Seattle, WA, USA; Amazon, London, United Kingdom	Information retrieval (IR) has a large literature on evaluation, dating back decades and forming a central part of the research culture. The largest proportion of this literature discusses techniques to turn a sequence of relevance labels into a single number, reflecting the system's performance: precision or cumulative gain, for example, or dozens of alternatives. Those techniques-metrics-are themselves evaluated, commonly by reference to sensitivity and validity. In our experience measuring search in industrial settings, a measurement regime needs many other qualities to be practical. For example, we must also consider how much a metric costs; how robust it is to the happenstance of sampling; whether it is debuggable; and what activities are incentivised when a metric is taken as a goal. In this perspective paper we discuss what makes a search metric successful in large-scale settings, including factors which are not often canvassed in IR research but which are important in "real-world" use. We illustrate this with examples, including from industrial settings, and offer suggestions for metrics as part of a working system.	信息检索(IR)有大量关于评估的文献，可以追溯到几十年前，是研究文化的核心部分。文献中最大的部分讨论了将一系列相关标签转化为单个数字的技术，这些技术反映了系统的性能: 例如，精度或累积增益，或者几十种替代方案。这些技术-度量-本身是评估，通常参照敏感性和有效性。根据我们在工业环境中测量搜索的经验，测量制度需要许多其他的质量才能实用。例如，我们还必须考虑一个度量标准的成本有多高; 它对抽样的偶然性有多强大; 它是否可调试; 以及当一个度量标准被作为一个目标时，激励什么活动。在这篇透视文章中，我们讨论了是什么使得搜索度量在大规模环境中成功，包括那些在 IR 研究中不经常被提及但在“现实世界”使用中非常重要的因素。我们用示例(包括来自工业设置的示例)来说明这一点，并提供作为工作系统一部分的指标建议。	code	0
CaDRec: Contextualized and Debiased Recommender Model	Xinfeng Wang, Fumiyo Fukumoto, Jin Cui, Yoshimi Suzuki, Jiyi Li, Dongjin Yu	School of Computer Science and Technology, Hangzhou Dianzi University; Graduate Faculty of Interdisciplinary Research, University of Yamanashi; Faculty of Engineering, Integrated Graduate School of Medicine, Engineering, and Agricultural Sciences	Recommender models aimed at mining users' behavioral patterns have raised great attention as one of the essential applications in daily life. Recent work on graph neural networks (GNNs) or debiasing methods has attained remarkable gains. However, they still suffer from (1) over-smoothing node embeddings caused by recursive convolutions with GNNs, and (2) the skewed distribution of interactions due to popularity and user-individual biases. This paper proposes a contextualized and debiased recommender model (CaDRec). To overcome the over-smoothing issue, we explore a novel hypergraph convolution operator that can select effective neighbors during convolution by introducing both structural context and sequential context. To tackle the skewed distribution, we propose two strategies for disentangling interactions: (1) modeling individual biases to learn unbiased item embeddings, and (2) incorporating item popularity with positional encoding. Moreover, we mathematically show that the imbalance of the gradients to update item embeddings exacerbates the popularity bias, thus adopting regularization and weighting schemes as solutions. Extensive experiments on four datasets demonstrate the superiority of the CaDRec against state-of-the-art (SOTA) methods. Our source code and data are released at https://github.com/WangXFng/CaDRec.	
以挖掘用户行为模式为目标的推荐模型作为其在日常生活中的重要应用之一，引起了人们的广泛关注。近年来研究的图形神经网络(GNN)或消偏方法取得了显著的进展。然而，他们仍然遭受(1)过度平滑的节点嵌入造成的递归卷积与 GNN，(2)由于流行和用户个人偏见的交互分布偏斜。本文提出了一种情境化和无偏的推荐模型(CaDRec)。为了克服过平滑问题，我们通过引入结构上下文和序列上下文，提出了一种新的超图卷积算子，它可以在卷积过程中选择有效的邻居。为了解决偏态分布问题，我们提出了两种分离交互作用的策略: (1)建立个体偏好模型来学习无偏项嵌入，(2)将项流行性与位置编码相结合。此外，我们从数学上证明了更新项目嵌入的梯度不平衡加剧了流行偏差，因此采用正则化和加权方案作为解决方案。在四个数据集上的大量实验证明了 CaDRec 与最先进的(SOTA)方法相比的优越性。我们的源代码和数据已经在 https://github.com/wangxfng/cadrec 公布了。	code	0
Going Beyond Popularity and Positivity Bias: Correcting for Multifactorial Bias in Recommender Systems	Jin Huang, Harrie Oosterhuis, Masoud Mansoury, Herke van Hoof, Maarten de Rijke	Radboud University; University of Amsterdam	Two typical forms of bias in user interaction data with recommender systems (RSs) are popularity bias and positivity bias, which manifest themselves as the over-representation of interactions with popular items or items that users prefer, respectively. Debiasing methods aim to mitigate the effect of selection bias on the evaluation and optimization of RSs. However, existing debiasing methods only consider single-factor forms of bias, e.g., only the item (popularity) or only the rating value (positivity). This is in stark contrast with the real world where user selections are generally affected by multiple factors at once. In this work, we consider multifactorial selection bias in RSs. Our focus is on selection bias affected by both item and rating value factors, which is a generalization and combination of popularity and positivity bias. While the concept of multifactorial bias is intuitive, it brings a severe practical challenge as it requires substantially more data for accurate bias estimation. As a solution, we propose smoothing and alternating gradient descent techniques to reduce variance and improve the robustness of its optimization. Our experimental results reveal that, with our proposed techniques, multifactorial bias corrections are more effective and robust than single-factor counterparts on real-world and synthetic datasets.	用户与推荐系统交互数据中两种典型的偏差形式是流行偏差和正向偏差，它们分别表现为与流行项目或用户喜欢的项目交互的过度表现。去偏方法旨在减轻选择性偏差对 RS 评价和优化的影响。然而，现有的去偏方法只考虑单因素形式的偏差，例如，只有项目(受欢迎程度)或只有评分值(积极性)。这与现实世界形成了鲜明的对比，在现实世界中，用户的选择通常同时受到多种因素的影响。在这项工作中，我们考虑了多因素选择偏差。我们的重点是选择偏差影响项目和评价价值因素，这是一个普遍性和积极性偏差的综合。虽然多因素偏差的概念是直观的，但它带来了严峻的实际挑战，因为它需要大量的数据进行准确的偏差估计。作为解决方案，我们提出了平滑和交替梯度下降技术，以减少方差和提高其优化的鲁棒性。我们的实验结果表明，我们提出的技术，多因素偏差修正是更有效的和鲁棒性的单因素对应的真实世界和合成数据集。	code	0
Configurable Fairness for New Item Recommendation Considering Entry Time of Items	Huizhong Guo, Dongxia Wang, Zhu Sun, Haonan Zhang, Jinfeng Li, Jie Zhang	Nanyang Technological University, Singapore, Singapore; Alibaba Group, Hangzhou, China; Zhejiang University, Hangzhou, China; Singapore University of Technology and Design, Singapore, Singapore; Zhejiang University, Hangzhou, Zhejiang, China, China	Recommender systems tend to excessively expose longer-standing items, resulting in significant unfairness to new items with little interaction records, despite they may possess potential to attract considerable amount of users. The existing fairness-based solutions do not specifically consider the exposure fairness of new items, for which a systematic definition also lacks, discouraging the promotion of new items or contents. In this work, we introduce a multi-degree new-item exposure fairness definition, which considers item entry-time, and also is configurable regarding different fairness requirements. We then propose a configurable new-item fairness-aware framework named CNIF, which employs two-stage training where fairness degrees are incorporated for guidance. Extensive experiments on multiple popular datasets and backbone models demonstrate that CNIF can effectively enhance fairness of the existing models regarding the exposure resources of new items (including the brand-new items with no interaction). Specifically, CNIF demonstrates a substantial advancement with a 65.59% improvement in fairness metric and a noteworthy 9.97% improvement in recommendation accuracy compared to backbone models on the KuaiRec dataset. In comparison to various fairness-based solutions, it stands out by achieving the best trade-off between fairness and recommendation accuracy, surpassing the best baseline by 14.20%.	
推荐系统往往过分暴露存放时间较长的项目，导致对交互记录较少的新项目的严重不公平，尽管这些项目可能具有吸引大量用户的潜力。现有的基于公平的解决方案没有特别考虑新项目的曝光公平性，对此也缺乏系统的定义，从而阻碍了新项目或内容的推广。在本文中，我们引入了一个多级新项目曝光公平性的定义，它考虑了项目进入时间，并且可以根据不同的公平性需求进行配置。然后，我们提出了一个可配置的新项目公平感知框架 CNIF，该框架采用两阶段的训练，其中包含公平度作为指导。在多个流行数据集和骨干模型上的大量实验表明，CNIF 能够有效地提高现有模型对新项目(包括没有交互的全新项目)曝光资源的公平性。具体而言，与 KuaiRec 数据集上的主干模型相比，CNIF 显示出实质性的进步，公平性指标提高了65.59% ，推荐准确率提高了9.97% 。与各种基于公平的解决方案相比，它在公平性和推荐准确性之间取得了最佳的平衡，比最佳基线高出14.20% 。	code	0
Generative Retrieval via Term Set Generation	Peitian Zhang, Zheng Liu, Yujia Zhou, Zhicheng Dou, Fangchao Liu, Zhao Cao	Beijing Academy of Artificial Intelligence NLP; Renmin University of China Gaoling School of Artificial Intelligence; Huawei Poisson Lab	Recently, generative retrieval emerges as a promising alternative to traditional retrieval paradigms. It assigns each document a unique identifier, known as DocID, and employs a generative model to directly generate the relevant DocID for the input query. A common choice for DocID is one or several natural language sequences, e.g. the title or n-grams, so that the pre-trained knowledge of the generative model can be utilized. However, a sequence is generated token by token, where only the most likely candidates are kept and the rest are pruned at each decoding step, thus, retrieval fails if any token within the relevant DocID is falsely pruned. What's worse, during decoding, the model can only perceive preceding tokens in DocID while being blind to subsequent ones, hence is prone to make such errors. To address this problem, we present a novel framework for generative retrieval, dubbed Term-Set Generation (TSGen). Instead of sequences, we use a set of terms as DocID, which are automatically selected to concisely summarize the document's semantics and distinguish it from others. On top of the term-set DocID, we propose a permutation-invariant decoding algorithm, with which the term set can be generated in any permutation yet will always lead to the corresponding document. Remarkably, TSGen perceives all valid terms rather than only the preceding ones at each decoding step. Given the constant decoding space, it can make more reliable decisions due to the broader perspective. TSGen is also resilient to errors: the relevant DocID will not be pruned as long as the decoded term belongs to it. Lastly, we design an iterative optimization procedure to incentivize the model to generate the relevant term set in its favorable permutation. 
We conduct extensive experiments on popular benchmarks, which validate the effectiveness, the generalizability, the scalability, and the efficiency of TSGen.	近年来，生成检索成为传统检索模式的一个有前途的替代方案。它为每个文档分配一个唯一标识符，称为 DocID，并使用一个生成模型直接为输入查询生成相关的 DocID。DocID 的一个常见选择是一个或几个自然语言序列，例如标题或 n-gram，这样就可以利用生成模型的预先训练的知识。然而，一个序列是按令牌生成的，其中只保留最有可能的候选者，其余的在每个解码步骤中被删除，因此，如果相关 DocID 中的任何令牌被错误地删除，则检索失败。更糟糕的是，在解码过程中，模型只能感知 DocID 中的前一个标记，而不能感知后一个标记，因此容易出现这样的错误。为了解决这个问题，我们提出了一种新的生成检索框架，称为术语设置生成(Term-SetGeneration，TSGen)。我们不使用序列，而是使用一组术语作为 DocID，这些术语被自动选择以简明地总结文档的语义并与其他术语区分开来。在术语集 DocID 的基础上，提出了一种不变译码算法，该算法可以在任意置换情况下生成术语集，并且始终生成相应的文档。值得注意的是，TSGen 在每个解码步骤中感知所有有效的术语，而不仅仅是前面的术语。给定不变的解码空间，它可以做出更可靠的决定，由于更广泛的角度。TSGen 对错误也很有弹性: 只要解码的术语属于它，相关的 DocID 就不会被删除。最后，我们设计了一个迭代优化过程来激励模型以其有利的排列生成相关的条件集。我们在流行的基准上进行了广泛的实验，验证了 TSGen 的有效性、通用性、可扩展性和效率。	code	0
MIRROR: A Multi-View Reciprocal Recommender System for Online Recruitment	Zhi Zheng, Xiao Hu, Shanshan Gao, Hengshu Zhu, Hui Xiong	Career Science Lab, Boss Zhipin, Beijing, China; BOSS Zhipin, Beijing, China; University of Science and Technology of China; The Hong Kong University of Science and Technology (Guangzhou); Career Science Lab, BOSS Zhipin, Beijing, China	Reciprocal Recommender Systems (RRSs) which aim to satisfy the preferences of both service providers and seekers simultaneously has attracted significant research interest in recent years. Existing studies on RRSs mainly focus on modeling the bilateral interactions between the users on both sides to capture the user preferences. However, due to the presence of exposure bias, modeling user preferences solely based on bilateral interactions often lacks precision. Additionally, in RRSs, users may exhibit varying preferences when acting in different roles, and how to effectively model users from multiple perspectives remains a substantial problem. To solve the above challenges, in this paper, we propose a novel MultI-view Reciprocal Recommender system for Online Recruitment (MIRROR). Specifically, we first propose to model the users from three different views, respectively search, active, and passive views, and we further design several Transformer-based sequential models to capture the user representation corresponding to each view. Then, we propose to divide the bilateral matching process into three stages, respectively apply, reply, and match, and a multi-stage output layer is designed based on the above multi-view modeling results. To train our MIRROR model, we first design a multi-task learning loss based on the multi-stage output results. Moreover, to bridge the semantic gap between search queries and user behaviors, we additionally design a supplementary task for next-query prediction. 
Finally, we conduct both offline experiments on five real-world datasets and online A/B tests, and the experiment results clearly validate the effectiveness of our MIRROR model compared with several state-of-the-art baseline methods.	旨在同时满足服务提供者和寻求者偏好的互惠推荐系统(RRS)近年来引起了人们的广泛研究兴趣。现有的关于 RRS 的研究主要集中在对双方用户之间的双边交互进行建模，以获取用户的偏好。然而，由于暴露偏差的存在，仅仅基于双边交互的用户偏好建模往往缺乏精度。此外，在 RRS 中，用户在扮演不同角色时可能表现出不同的偏好，如何从多个角度有效地为用户建模仍然是一个实质性问题。为了解决上述挑战，在本文中，我们提出了一个新颖的多视图在线招聘互惠推荐系统(mIRROR)。具体来说，我们首先提出从搜索视图、主动视图和被动视图三个不同的视图对用户进行建模，并进一步设计了几个基于 Transformer 的序列模型来捕获每个视图对应的用户表示。然后，我们提出将双边匹配过程分为应用、回复和匹配三个阶段，并根据上述多视图建模结果设计了一个多阶段输出层。为了训练我们的 MIRROR 模型，我们首先设计了一个基于多阶段输出结果的多任务学习丢失模型。此外，为了缩小搜索查询和用户行为之间的语义差距，我们还设计了一个用于下一次查询预测的补充任务。最后，我们在五个实际数据集上进行了离线实验和在线 A/B 测试，实验结果清楚地验证了我们的 MIRROR 模型与几种最先进的基线方法相比的有效性。	code	0
Who To Align With: Feedback-Oriented Multi-Modal Alignment in Recommendation Systems	Yang Li, Qi'ao Zhao, Chen Lin, Jinsong Su, Zhilin Zhang	Amazon, Seattle, Washington, USA; Institute of Artificial Intelligence, Xiamen University, Xiamen, China; School of Informatics, Xiamen University, Xiamen, China	Multi-modal Recommendation Systems (MRSs) utilize diverse modalities, such as image and text, to enrich item representations and enhance recommendation accuracy. Current MRSs overlook the large misalignment between multi-modal content features and ID embeddings. While bidirectional alignment between visual and textual modalities has been extensively studied in large multi-modal models, this study suggests that multi-modal alignment in MRSs should be in a one-way direction. A plug-and-play framework is presented, called FEedback-orienTed mulTi-modal aLignmEnt (FETTLE). FETTLE contains three novel solutions: (1) it automatically determines item-level alignment direction between each pair of modalities based on estimated user feedback; (2) it coordinates the alignment directions among multiple modalities; (3) it implements cluster-level alignment from both user and item perspectives for more stable alignments. Extensive experiments on three real datasets demonstrate that FETTLE significantly improves various backbone models. Conventional collaborative filtering models are improved by 24.79%-62.79%, and recent MRSs are improved by 5.91% - 20.11%.	多模态推荐系统(MRS)利用多种模式，如图像和文本，以丰富项目表示和提高推荐的准确性。当前的 MRS 忽视了多模态内容特征和 ID 嵌入之间的大错位。虽然视觉模式和文本模式之间的双向对齐已经在大型多模式模型中得到了广泛的研究，但是本研究认为 MRS 中的多模式对齐应该是单向的。提出了一种即插即用的框架，称为面向反馈的多模态 aLignmEnt (FETTLE)。FETTLE 包含三种新颖的解决方案: (1)基于用户反馈的估计自动确定每对模式之间的项目级对齐方向; (2)协调多个模式之间的对齐方向; (3)从用户和项目的角度实现簇级对齐，以获得更稳定的对齐。在三个实际数据集上的大量实验表明，FETTLE 显著改善了各种骨干模型。传统的协同过滤模型改善了24.79% -62.79% ，最近的 MRS 改善了5.91% -20.11% 。	code	0
EEG-SVRec: An EEG Dataset with User Multidimensional Affective Engagement Labels in Short Video Recommendation	Shaorun Zhang, Zhiyu He, Ziyi Ye, Peijie Sun, Qingyao Ai, Min Zhang, Yiqun Liu	Tsinghua University; Tsinghua University Computer Science and Technology; Tsinghua University Department of Computer Science and Technology	In recent years, short video platforms have gained widespread popularity, making the quality of video recommendations crucial for retaining users. Existing recommendation systems primarily rely on behavioral data, which faces limitations when inferring user preferences due to issues such as data sparsity and noise from accidental interactions or personal habits. To address these challenges and provide a more comprehensive understanding of user affective experience and cognitive activity, we propose EEG-SVRec, the first EEG dataset with User Multidimensional Affective Engagement Labels in Short Video Recommendation. The study involves 30 participants and collects 3,657 interactions, offering a rich dataset that can be used for a deeper exploration of user preference and cognitive activity. By incorporating self-assessment techniques and real-time, low-cost EEG signals, we offer a more detailed understanding user affective experiences (valence, arousal, immersion, interest, visual and auditory) and the cognitive mechanisms behind their behavior. We establish benchmarks for rating prediction by the recommendation algorithm, showing significant improvement with the inclusion of EEG signals. Furthermore, we demonstrate the potential of this dataset in gaining insights into the affective experience and cognitive activity behind user behaviors in recommender systems. This work presents a novel perspective for enhancing short video recommendation by leveraging the rich information contained in EEG signals and multidimensional affective engagement scores, paving the way for future research in short video recommendation systems.	
近年来，短视频平台得到了广泛的普及，这使得视频推荐的质量对于留住用户至关重要。现有的推荐系统主要依赖于行为数据，由于数据稀疏和意外交互或个人习惯造成的噪音等问题，在推断用户偏好时，行为数据面临限制。为了解决这些问题，提供对用户情感体验和认知活动的更全面的理解，我们提出了 EEG-SVRec，这是第一个在短视频推荐中使用用户多维情感参与标签的 EEG 数据集。这项研究涉及30名参与者，收集了3657次互动，提供了丰富的数据集，可用于更深入地探索用户偏好和认知活动。通过结合自我评估技术和实时、低成本的脑电信号，我们提供了一个更详细的理解用户情感体验(效价，唤醒，沉浸，兴趣，视觉和听觉)和他们行为背后的认知机制。我们通过推荐算法建立了评分预测的基准，在包含脑电信号的情况下有了显著的改善。此外，我们证明了这个数据集的潜力，在获得深入的情感体验和认知活动背后的用户行为的推荐系统。这项工作提出了一个新的视角来提高短视频推荐利用丰富的信息包含在脑电信号和多维情感参与分数，为未来的研究铺平了道路的短视频推荐系统。	code	0
Multimodality Invariant Learning for Multimedia-Based New Item Recommendation	Haoyue Bai, Le Wu, Min Hou, Miaomiao Cai, Zhuangzhuang He, Yuyang Zhou, Richang Hong, Meng Wang	Academy of Cyber; Hefei University of Technology	Multimedia-based recommendation provides personalized item suggestions by learning the content preferences of users. With the proliferation of digital devices and APPs, a huge number of new items are created rapidly over time. How to quickly provide recommendations for new items at the inference time is challenging. What's worse, real-world items exhibit varying degrees of modality missing (e.g., many short videos are uploaded without text descriptions). Though many efforts have been devoted to multimedia-based recommendations, they either could not deal with new multimedia items or assumed the modality completeness in the modeling process. In this paper, we highlight the necessity of tackling the modality missing issue for new item recommendation. We argue that users' inherent content preference is stable and better kept invariant to arbitrary modality missing environments. Therefore, we approach this problem from a novel perspective of invariant learning. However, how to construct environments from finite user behavior training data to generalize any modality missing is challenging. To tackle this issue, we propose a novel Multimodality Invariant Learning reCommendation (a.k.a. MILK) framework. Specifically, MILK first designs a cross-modality alignment module to keep semantic consistency from pretrained multimedia item features. After that, MILK designs multi-modal heterogeneous environments with cyclic mixup to augment training data, in order to mimic any modality missing for invariant user preference learning. Extensive experiments on three real datasets verify the superiority of our proposed framework. The code is available at https://github.com/HaoyueBai98/MILK.	
基于多媒体的推荐通过学习用户的内容偏好来提供个性化的项目建议。随着数字设备和应用程序的普及，随着时间的推移，大量的新项目被迅速创造出来。如何在推理时间快速提供新项目的建议是具有挑战性的。更糟糕的是，现实世界的项目表现出不同程度的模式缺失(例如，许多短视频上传没有文本描述)。虽然许多努力已致力于基于多媒体的建议，他们要么不能处理新的多媒体项目或承担的模式在建模过程中的完整性。本文强调了解决新项目推荐模式缺失问题的必要性。我们认为用户固有的内容偏好是稳定的，并且能够更好地保持对任意模态缺失环境的不变性。因此，我们从一个新的角度来探讨这个问题的不变学习。然而，如何从有限的用户行为训练数据中构建环境来推广任何模态缺失是一个挑战。为了解决这一问题，我们提出了一种新的多模态不变学习推荐(MILK)框架。具体来说，MILK 首先设计跨模态对齐模块，以保持语义的一致性从预训练的多媒体项目特征。然后，MILK 设计多模态混合异构环境来增加训练数据，以模拟不变用户偏好学习中缺失的任何模态。在三个实际数据集上的大量实验验证了我们提出的框架的优越性。密码可在 https://github.com/haoyuebai98/milk 下载。	code	0
Semi-supervised Prototype Semantic Association Learning for Robust Cross-modal Retrieval	Junsheng Wang, Tiantian Gong, Yan Yan	Nanjing University of Aeronautics and Astronautics, Nanjing, China; Nanjing University of Science and Technology, Nanjing, China; Illinois Institute of Technology, Chicago, IL, USA	Semi-supervised cross-modal retrieval (SS-CMR) aims at learning modality invariance and semantic discrimination from labeled data and unlabeled data, which is crucial for practical applications in the real-world. The key to essentially addressing the SS-CMR task is to solve the semantic association and modality heterogeneity problems. To address these issues, in this paper, we propose a novel semi-supervised cross-modal retrieval method, namely Semi-supervised Prototype Semantic Association Learning (SPAL) for robust cross-modal retrieval. To be specific, we employ shared semantic prototypes to associate labeled and unlabeled data over both modalities to minimize intra-class and maximize inter-class variations, thereby improving discriminative representations on unlabeled data. What is more important is that we propose a novel pseudo-label guided contrastive learning to refine cross-modal representation consistency in the common space, which leverages pseudo-label semantic graph information to constrain cross-modal consistent representations. Meanwhile, multi-modal data inevitably suffers from the cost and difficulty of data collection, resulting in the incomplete multimodal data problem. Thus, to strengthen the robustness of the SS-CMR, we propose a novel prototype propagation method for incomplete data to reconstruct completion representations which preserves the semantic consistency. Extensive evaluations using several baseline methods across four benchmark datasets demonstrate the effectiveness of our method.	
半监督跨模态检索(SS-CMR)是从标记数据和未标记数据中学习模态不变性和语义识别的一种检索方法，对于实际应用具有重要意义。从根本上解决 SS-CMR 任务的关键是解决语义关联和情态异质性问题。为了解决这些问题，本文提出了一种新的半监督跨模态检索方法，即半监督原型语义关联学习(SPAL)。具体来说，我们使用共享的语义原型将标记和未标记的数据关联到两种模式上，以最小化类内变量和最大化类间变量，从而改进对未标记数据的区分表示。更重要的是，我们提出了一种新的伪标签引导的对比学习方法来提炼公共空间中的跨模态表示一致性，该方法利用伪标签语义图信息来约束跨模态一致性表示。同时，多模态数据不可避免地受到数据采集成本和难度的影响，导致了多模态数据的不完全性问题。因此，为了增强 SS-CMR 的鲁棒性，我们提出了一种新的不完全数据的原型传播方法来重构完备表示，从而保持了语义的一致性。使用四个基准数据集中的几个基线方法进行广泛的评估，证明了我们方法的有效性。	code	0
Hypergraph Convolutional Network for User-Oriented Fairness in Recommender Systems	Zhongxuan Han, Chaochao Chen, Xiaolin Zheng, Li Zhang, Yuyuan Li	Computer Science and Technology, Zhejiang University, Hangzhou, China; Polytechnic Institute, Zhejiang University, Hangzhou, China	The service system involves multiple stakeholders, making it crucial to ensure fairness. In this paper, we take the example of a typical service system, the recommender system, to investigate how to identify and tackle fairness issues within the service system. Recommender systems often exhibit bias towards a small user group, resulting in pronounced unfairness in recommendation performance, specifically the User-Oriented Fairness (UOF) issue. Existing research on UOF faces limitations in addressing two pivotal challenges: CH1: Current methods fall short in addressing the root cause of the UOF issue, stemming from an unfair training process between advantaged and disadvantaged users. CH2: Current methods struggle to unveil compelling correlations among users in sparse datasets. In this paper, we propose a novel Hypergraph Convolutional Network for User-Oriented Fairness, namely HyperUOF, to address the aforementioned challenges. HyperUOF serves as a versatile framework applicable to various backbone recommendation models for achieving UOF. To address CH1, HyperUOF employs an in-processing method that enhances the training process of disadvantaged users during model training. To address CH2, HyperUOF incorporates a hypergraph-based approach, proven effective in sparse datasets, to explore high-order correlations among users. We conduct extensive experiments on three real-world datasets based on four backbone recommendation models to prove the effectiveness of our proposed HyperUOF.	
服务体系涉及多个利益相关者，保证服务的公平性至关重要。本文以一个典型的服务系统——推荐系统为例，探讨如何识别和处理服务系统内的公平问题。推荐系统往往表现出对小用户群体的偏见，导致推荐性能的明显不公平，特别是面向用户的公平性(UOF)问题。现有的 UOF 研究在解决两个关键挑战方面面临着局限性: CH1: 由于优势用户和劣势用户之间的不公平培训过程，目前的方法在解决 UOF 问题的根本原因方面存在缺陷。CH2: 目前的方法很难揭示稀疏数据集中用户之间令人信服的相关性。在本文中，我们提出了一种新的面向用户公平的 Hypergraph 卷积网络，即 HyperUOF，以解决上述挑战。HyperUOF 作为一个多功能的框架，适用于实现 UOF 的各种骨干推荐模型。为了解决 CH1问题，HyperUOF 采用了内处理的方法，在模型训练过程中增强了弱势用户的训练过程。为了解决 CH2问题，HyperUOF 采用了一种基于超图的方法来探索用户之间的高阶相关性，这种方法在稀疏数据集中被证明是有效的。为了验证所提出的 HyperUOF 算法的有效性，我们基于四个骨干推荐模型对三个实际数据集进行了广泛的实验。	code	0
Hierarchical Semantics Alignment for 3D Human Motion Retrieval	Yang Yang, Haoyu Shi, Huaiwen Zhang	Inner Mongolia University, Hohhot, China	Text to 3D human Motion Retrieval (TMR) is a challenging task in information retrieval, aiming to query relevant motion sequences with the natural language description. The conventional approach for TMR is to represent the data instances as point embeddings for alignment. However, in real-world scenarios, multiple motions often co-occur and superimpose on a single avatar. Simply aggregating text and motion sequences into a single global embedding may be inadequate for capturing the intricate semantics of superimposing motions. In addition, most of the motion variations occur locally and subtly, which further presents considerable challenges in precisely aligning motion sequences with their corresponding text. To address the aforementioned challenges, we propose a novel Hierarchical Semantics Alignment (HSA) framework for text-to-3D human motion retrieval. Beyond global alignment, we propose the Probabilistic-based Distribution Alignment (PDA) and a Descriptors-based Fine-grained Alignment (DFA) to achieve precise semantic matching. Specifically, the PDA encodes the text and motion sequences into multidimensional probabilistic distributions, effectively capturing the semantics of superimposing motions. By optimizing the problem of probabilistic distribution alignment, PDA achieves a precise match between superimposing motions and their corresponding text. The DFA first adopts a fine-grained feature gating by selectively filtering to the significant and representative local representations and meanwhile excluding the interferences of meaningless features. Then we adaptively assign local representations from text and motion into a set of cross-modal local aggregated descriptors, enabling local comparison and interaction between fine-grained text and motion features. 
Extensive experiments on two widely used benchmark datasets, HumanML3D and KIT-ML, demonstrate the effectiveness of the proposed method. It significantly outperforms existing state-of-the-art retrieval methods, achieving Rsum improvements of 24.74% on HumanML3D and 23.08% on KIT-ML.	文本到三维人体运动检索(tMR)是一个具有挑战性的任务在信息检索，旨在查询相关的运动序列与自然语言描述。TMR 的传统方法是将数据实例表示为点嵌入以进行对准。然而，在现实世界的情况下，多个运动往往同时发生，并叠加在一个单一的化身。简单地将文本和运动序列聚合成一个单一的全局嵌入可能不足以捕获叠加运动的复杂语义。此外，大多数运动变化发生在局部和微妙，这进一步提出了相当大的挑战，精确对齐运动序列与其相应的文本。针对上述挑战，我们提出了一种新的文本到三维人体运动检索的层次语义对齐(HSA)框架。在全局对齐的基础上，提出了基于概率的分布对齐(PDA)和基于描述符的细粒度对齐(DFA)来实现精确的语义匹配。具体来说，PDA 将文本和运动序列编码成多维概率分布，有效地捕获了运动叠加的语义。通过优化概率分布对齐问题，PDA 实现了叠加运动与对应文本的精确匹配。DFA 首先采用细粒度特征门控，对有意义和有代表性的局部表征进行选择性滤波，同时排除无意义特征的干扰。然后，我们自适应地从文本和运动中分配局部表示到一组跨模态局部聚合描述符中，从而实现细粒度文本和运动特征之间的局部比较和交互。在两个广泛使用的基准数据集 HumanML3D 和 KIT-ML 上的大量实验证明了该方法的有效性。它明显优于现有的最先进的检索方法，在 HumanML3D 上实现了24.74% 的 Rsum 改进，在 KIT-ML 上实现了23.08% 的 Rsum 改进。	code	0
A Large Scale Test Corpus for Semantic Table Search	Aristotelis Leventidis, Martin Pekár Christensen, Matteo Lissandrini, Laura Di Rocco, Katja Hose, Renée J. Miller	University of Verona & Aalborg University, Verona, Italy; Aalborg University, Aalborg, Denmark; Technische Universität Wien & Aalborg University, Vienna, Austria; Northeastern University, Boston, MA, USA	Table search aims to answer a query with a ranked list of tables. Unfortunately, current test corpora have focused mostly on needle-in-the-haystack tasks, where only a few tables are expected to exactly match the query intent. Instead, table search tasks often arise in response to the need for retrieving new datasets or augmenting existing ones, e.g., for data augmentation within data science or machine learning pipelines. Existing table repositories and benchmarks are limited in their ability to test retrieval methods for table search tasks. Thus, to close this gap, we introduce a novel dataset for query-by-example Semantic Table Search. This novel dataset consists of two snapshots of the large-scale Wikipedia tables collection from 2013 and 2019 with two important additions: (1) a page and topic aware ground truth relevance judgment and (2) a large-scale DBpedia entity linking annotation. Moreover, we generate a novel set of entity-centric queries that allows testing existing methods under a novel search scenario: semantic exploratory search. The resulting resource consists of 9,296 novel queries, 610,553 query-table relevance annotations, and 238,038 entity-linked tables from the 2013 snapshot. Similarly, on the 2019 snapshot, the resource consists of 2,560 queries, 958,214 relevance annotations, and 457,714 total tables. This makes our resource the largest annotated table-search corpus to date (97 times more queries and 956 times more annotated tables than any existing benchmark). 
We perform a user study among domain experts and prove that these annotators agree with the automatically generated relevance annotations. As a result, we can re-evaluate some basic assumptions behind existing table search approaches identifying their shortcomings along with promising novel research directions.	表搜索旨在使用表的排序列表来回答查询。不幸的是，目前的测试语料库主要集中在大海捞针的任务上，其中只有少数表格能够精确匹配查询意图。相反，表搜索任务经常出现在需要检索新数据集或扩充现有数据集的时候，例如，在数据科学或机器学习管道中进行数据扩充。现有的表存储库和基准在测试表搜索任务的检索方法方面受到限制。因此，为了弥补这一差距，我们引入了一个新的数据集查询的例子语义表搜索。这个新颖的数据集包括2013年和2019年大规模维基百科表收集的两个快照，其中有两个重要的补充: (1)一个页面和主题感知的基本事实相关性判断和(2)一个大规模的 DBpedia 实体链接注释。此外，我们还生成了一组新的以实体为中心的查询，允许在一个新的搜索场景下测试现有的方法: 语义探索搜索。得到的资源包括来自2013年快照的9,296个新查询、610,553个查询表相关注释和238,038个实体链接表。类似地，在2019年的快照中，资源由2,560个查询、958,214个相关注释和457,714个总表组成。这使我们的资源成为迄今为止最大的带注释的表搜索语料库(比任何现有基准多97倍的查询和956倍的带注释的表)。我们对领域专家进行了用户研究，证明了这些注释者与自动生成的关联注释是一致的。因此，我们可以重新评估现有表搜索方法背后的一些基本假设，识别它们的缺点，以及有希望的新的研究方向。	code	0
JDivPS: A Diversified Product Search Dataset	Zhirui Deng, Zhicheng Dou, Yutao Zhu, Xubo Qin, Pengchao Cheng, Jiangxu Wu, Hao Wang	JD.com, Inc., Beijing, China; Gaoling School of Artificial Intelligence, Renmin University of China, Beijing, China	The diversification of product search aims to offer diverse products to satisfy different user intents. Existing diversified product search approaches mainly relied on datasets sourced from online platforms. However, these datasets often present challenges due to their restricted public access and the absence of manually labeled user intents. Such limitations may lead to irreproducible experimental results and unreliable conclusions, restricting the development of this field. To address these problems, this paper introduces a novel dataset JDivPS for diversified product search. To the best of our knowledge, JDivPS is the first publicly accessible dataset with human-annotated user intents. The dataset is collected from JD.com, a major Chinese e-commerce platform. It includes 10,000 queries, around 1,680,000 unique products, and an average of 10 human-labeled user intents for each query. We have extensively evaluated several diversified ranking models using the JDivPS dataset. The results of these models are recorded and presented, serving as a valuable benchmark for future research. More details about the dataset can be found in https://github.com/DengZhirui/JDivPS.	产品搜索的多样化旨在提供多样化的产品，以满足不同的用户意图。现有的多样化产品搜索方法主要依赖于来自在线平台的数据集。然而，由于这些数据集的公开访问受到限制，而且缺乏手动标记的用户意图，因此往往会带来挑战。这些局限性可能导致不可重复的实验结果和不可靠的结论，限制了该领域的发展。为了解决这些问题，本文提出了一种用于多样化产品搜索的新数据集 JDivPS。据我们所知，JDivPS 是第一个具有人工注释用户意图的公开可访问数据集。该数据集是从中国主要的电子商务平台京东(JD.com)收集的。它包括10,000个查询，大约1,680,000个独特的产品，以及每个查询平均10个人工标记的用户意图。我们使用 JDivPS 数据集广泛地评估了几种不同的排名模型。这些模型的结果被记录和呈现，作为未来研究的一个有价值的基准。有关数据集的详细资料，可参阅 https://github.com/dengzhirui/jdivps。	code	0
An E-Commerce Dataset Revealing Variations during Sales	Jianfu Zhang, Qingtao Yu, Yizhou Chen, Guoliang Zhou, Yawen Liu, Yawei Sun, Chen Liang, Guangda Huzhang, Yabo Ni, Anxiang Zeng, Han Yu	Nanyang Technological University, Singapore, Singapore; Shanghai Jiao Tong University, Shanghai, China; Shopee Pte. Ltd., Singapore, Singapore	Since the development of artificial intelligence technology, E-Commerce has gradually become one of the world's largest commercial markets. Within this domain, sales events, which are based on sociological mechanisms, play a significant role. E-Commerce platforms frequently offer sales and promotions to encourage users to purchase items, leading to significant changes in live environments. Learning-To-Rank (LTR) is a crucial component of E-Commerce search and recommendations, and substantial efforts have been devoted to this area. However, existing methods often assume an independent and identically distributed data setting, which does not account for the evolving distribution of online systems beyond online finetuning strategies. This limitation can lead to inaccurate predictions of user behaviors during sales events, resulting in significant loss of revenue. In addition, models must readjust themselves once sales have concluded in order to eliminate any effects caused by the sales events, leading to further regret. To address these limitations, we introduce a long-term E-Commerce search data set specifically designed to incubate LTR algorithms during such sales events, with the objective of advancing the capabilities of E-Commerce search engines. Our investigation focuses on typical industry practices and aims to identify potential solutions to address these challenges.	
自人工智能技术发展以来，电子商务已逐渐成为世界上最大的商业市场之一。在这个领域中，基于社会学机制的销售事件起着重要的作用。电子商务平台经常提供销售和促销，以鼓励用户购买物品，导致生活环境的重大变化。学习排名(LTR)是电子商务搜索和推荐的一个重要组成部分，在这个领域已经投入了大量的努力。然而，现有的方法往往假定一个独立的和同样分布的数据集，这没有考虑到在线系统不断演变的分布，超出了在线微调策略。这种限制可能导致在销售活动期间对用户行为的不准确预测，从而导致收入的显著损失。此外，模型必须调整自己一旦销售结束，以消除任何影响所造成的销售事件，导致进一步的遗憾。为了解决这些局限性，我们引入了一个长期的电子商务搜索数据集，专门设计用于在此类销售活动期间孵化 LTR 算法，目的是提高电子商务搜索引擎的能力。我们的调查集中在典型的行业实践，旨在确定潜在的解决方案，以应对这些挑战。	code	0
Exploring Multi-Scenario Multi-Modal CTR Prediction with a Large Scale Dataset	Zhaoxin Huan, Ke Ding, Ang Li, Xiaolu Zhang, Xu Min, Yong He, Liang Zhang, Jun Zhou, Linjian Mo, Jinjie Gu, Zhongyi Liu, Wenliang Zhong, Guannan Zhang, Chenliang Li, Fajie Yuan	Ant Group, Beijing, China; Wuhan University, Wuhan, China; Westlake University, Hangzhou, China; Ant Group, Hangzhou, China	Click-through rate (CTR) prediction plays a crucial role in recommendation systems, with significant impact on user experience and platform revenue generation. Despite the various public CTR datasets available due to increasing interest from both academia and industry, these datasets have limitations. They cover a limited range of scenarios and predominantly focus on ID-based features, neglecting the vital role of multi-modal features for effective multi-scenario CTR prediction. Moreover, their scale is modest compared to real-world industrial datasets, hindering robust and comprehensive evaluation of complex models. To address these challenges, we introduce a large-scale Multi-Scenario Multi-Modal CTR dataset named AntM2C, built from real industrial data from Alipay. This dataset offers an impressive breadth and depth of information, covering CTR data from four diverse business scenarios, including advertisements, consumer coupons, mini-programs, and videos. Unlike existing datasets, AntM2C provides not only ID-based features but also five textual features and one image feature for both users and items, supporting more delicate multi-modal CTR prediction. AntM2C is also substantially larger than existing datasets, comprising 100 million CTR data. This scale allows for robust and comprehensive evaluation and comparison of CTR prediction models. We employ AntM2C to construct several typical CTR tasks, including multi-scenario modeling, item and user cold-start modeling, and multi-modal modeling. 
Initial experiments and comparisons with baseline methods have shown that AntM2C presents both new challenges and opportunities for CTR models, with the potential to significantly advance CTR research. The AntM2C dataset is available at https://www.atecup.cn/OfficalDataSet.	点进率预测在推荐系统中起着至关重要的作用，对用户体验和平台收入产生重大影响。尽管由于学术界和工业界的兴趣日益增长，各种公共 CTR 数据集可用，但这些数据集有其局限性。它们涵盖的场景范围有限，主要侧重于基于身份的特征，忽视了多模态特征在有效的多场景 CTR 预测中的重要作用。此外，与现实世界的工业数据集相比，它们的规模较小，妨碍了对复杂模型的稳健和全面的评估。为了应对这些挑战，我们引入了一个名为 AntM2C 的大规模多场景多模式 CTR 数据集，该数据集是从支付宝的真实工业数据构建的。这个数据集提供了令人印象深刻的广度和深度的信息，涵盖了来自四个不同商业场景的点击率数据，包括广告、消费者优惠券、迷你程序和视频。与现有的数据集不同，AntM2C 不仅提供基于 ID 的特征，而且还为用户和项目提供五个文本特征和一个图像特征，支持更精细的多模态 CTR 预测。AntM2C 也远远大于现有的数据集，包括1亿个点击率数据。这个尺度允许对 CTR 预测模型进行稳健和全面的评估和比较。我们使用 AntM2C 构建了几个典型的 CTR 任务，包括多场景建模、项目和用户冷启动建模以及多模态建模。最初的实验和与基线方法的比较表明，AntM2C 为 CTR 模型提出了新的挑战和机遇，有可能显著推进 CTR 研究。AntM2C 数据集可在 https://www.atecup.cn/OfficalDataSet 下载。	code	0
Dimension Importance Estimation for Dense Information Retrieval	Guglielmo Faggioli, Nicola Ferro, Raffaele Perego, Nicola Tonellotto	University of Padova, Padua, Italy; University of Padova, Padova, Italy; CNR, Pisa, Italy; University of Pisa, Pisa, Italy	Recent advances in Information Retrieval have shown the effectiveness of embedding queries and documents in a latent high-dimensional space to compute their similarity. While operating on such high-dimensional spaces is effective, in this paper, we hypothesize that we can improve the retrieval performance by adequately moving to a query-dependent subspace. More in detail, we formulate the Manifold Clustering (MC) Hypothesis: projecting queries and documents onto a subspace of the original representation space can improve retrieval effectiveness. To empirically validate our hypothesis, we define a novel class of Dimension IMportance Estimators (DIME). Such models aim to determine how much each dimension of a high-dimensional representation contributes to the quality of the final ranking and provide an empirical method to select a subset of dimensions where to project the query and the documents. To support our hypothesis, we propose an oracle DIME, capable of effectively selecting dimensions and almost doubling the retrieval performance. To show the practical applicability of our approach, we then propose a set of DIMEs that do not require any oracular piece of information to estimate the importance of dimensions. These estimators allow us to carry out a dimensionality selection that enables performance improvements of up to +11.5% (moving from 0.675 to 0.752 nDCG@10) compared to the baseline methods using all dimensions. Finally, we show that, with simple and realistic active feedback, such as the user's interaction with a single relevant document, we can design a highly effective DIME, allowing us to outperform the baseline by up to +0.224 nDCG@10 points (+58.6%, moving from 0.384 to 0.608).	
信息检索的最新进展表明，将查询和文档嵌入到一个潜在的高维空间来计算它们的相似性是有效的。虽然在这样的高维空间上操作是有效的，但本文假设我们可以通过适当地移动到查询相关的子空间来提高检索性能。更详细地说，我们提出了流形聚类(MC)假说: 将查询和文档投影到原始表示空间的子空间中可以提高检索效率。为了从经验上验证我们的假设，我们定义了一类新的维度重要性估计(DIME)。这些模型旨在确定高维表示的每个维度在多大程度上有助于最终排名的质量，并提供一种经验方法来选择一个维度子集，以投射查询和文档。为了支持我们的假设，我们提出了一个 Oracle DIME，它能够有效地选择维度，并且检索性能几乎翻倍。为了展示我们的方法的实际适用性，我们然后提出一组 DIME，它们不需要任何预言信息来估计维度的重要性。这些估计值使我们能够进行维度选择，与使用所有维度的基线方法相比，性能提高高达11.5% (从0.675移动到0.752 nDCG@10)。最后，我们表明，通过简单和现实的主动反馈，例如用户与单个相关文档的交互，我们可以设计一个高效的 DIME，使我们的表现优于基线最多 + 0.224 nDCG@10分(+ 58.6% ，从0.384移动到0.608)。	code	0
Large Language Models for Next Point-of-Interest Recommendation	Peibo Li, Maarten de Rijke, Hao Xue, Shuang Ao, Yang Song, Flora D. Salim	University of New South Wales; University of Amsterdam; The University of New South Wales	The next Point of Interest (POI) recommendation task is to predict users' immediate next POI visit given their historical data. Location-Based Social Network (LBSN) data, which is often used for the next POI recommendation task, comes with challenges. One frequently disregarded challenge is how to effectively use the abundant contextual information present in LBSN data. Previous methods are limited by their numerical nature and fail to address this challenge. In this paper, we propose a framework that uses pretrained Large Language Models (LLMs) to tackle this challenge. Our framework allows us to preserve heterogeneous LBSN data in its original format, hence avoiding the loss of contextual information. Furthermore, our framework is capable of comprehending the inherent meaning of contextual information due to the inclusion of commonsense knowledge. In experiments, we test our framework on three real-world LBSN datasets. Our results show that the proposed framework outperforms the state-of-the-art models in all three datasets. Our analysis demonstrates the effectiveness of the proposed framework in using contextual information as well as alleviating the commonly encountered cold-start and short trajectory problems.	下一个兴趣点(POI)推荐任务是根据用户的历史数据预测他们的下一次 POI 访问。基于位置的社交网络(LBSN)数据通常用于下一个 POI 推荐任务，但它也带来了挑战。一个经常被忽视的挑战是如何有效地利用 LBSN 数据中存在的大量上下文信息。以前的方法受到其数字性质的限制，无法解决这一挑战。在本文中，我们提出了一个框架，使用预先训练的大型语言模型(LLM)来应对这一挑战。我们的框架允许我们以原始格式保留异构 LBSN 数据，从而避免了上下文信息的丢失。此外，由于包含了常识性知识，我们的框架能够理解上下文信息的内在含义。在实验中，我们在三个实际的 LBSN 数据集上测试我们的框架。我们的结果表明，所提出的框架在所有三个数据集中都优于最先进的模型。我们的分析证明了该框架在利用上下文信息以及缓解常见的冷启动和短轨道问题方面的有效性。	code	0
The Impact of Group Membership Bias on the Quality and Fairness of Exposure in Ranking	Ali Vardasbi, Maarten de Rijke, Fernando Diaz, Mostafa Dehghani	Google DeepMind; Spotify; University of Amsterdam; Carnegie Mellon University	When learning to rank from user interactions, search and recommender systems must address biases in user behavior to provide a high-quality ranking. One type of bias that has recently been studied in the ranking literature is when sensitive attributes, such as gender, have an impact on a user's judgment about an item's utility. For example, in a search for an expertise area, some users may be biased towards clicking on male candidates over female candidates. We call this type of bias group membership bias. Increasingly, we seek rankings that are fair to individuals and sensitive groups. Merit-based fairness measures rely on the estimated utility of the items. With group membership bias, the utility of the sensitive groups is under-estimated, hence, without correcting for this bias, a supposedly fair ranking is not truly fair. In this paper, first, we analyze the impact of group membership bias on ranking quality as well as merit-based fairness metrics and show that group membership bias can hurt both ranking and fairness. Then, we provide a correction method for group bias that is based on the assumption that the utility score of items in different groups comes from the same distribution. This assumption has two potential issues of sparsity and equality-instead-of-equity; we use an amortized approach to address these. We show that our correction method can consistently compensate for the negative impact of group membership bias on ranking quality and fairness metrics.	
当学习从用户交互中进行排名时，搜索和推荐系统必须解决用户行为中的偏见，以提供高质量的排名。排名文献中最近研究的一种偏见类型是，当敏感属性，如性别，对用户对一个项目的效用的判断产生影响时。例如，在搜索专业领域时，一些用户可能会偏向于点击男性候选人而不是女性候选人。我们把这种类型的偏见称为群体成员偏见。我们越来越多地寻求对个人和敏感群体公平的排名。基于绩效的公平性衡量依赖于项目的估计效用。由于群体成员偏差，敏感群体的效用被低估，因此，如果不对这种偏差进行修正，所谓的公平排名就不是真正的公平。本文首先分析了群体成员偏差对排名质量的影响，以及基于绩效的公平性指标，指出群体成员偏差会损害排名和公平性。在此基础上，我们提出了一种群体偏差的校正方法，该方法基于不同群体项目的效用得分来自同一分布的假设。这个假设有两个潜在的问题，即稀缺性和公平性，而不是公平性; 我们使用分摊方法来解决这些问题。我们表明，我们的修正方法可以一致地补偿群体成员偏见对排名质量和公平性指标的负面影响。	code	0
Grand: A Fast and Accurate Graph Retrieval Framework via Knowledge Distillation	Lin Lan, Pinghui Wang, Rui Shi, Tingqing Liu, Juxiang Zeng, Feiyang Sun, Yang Ren, Jing Tao, Xiaohong Guan	Xi'an Jiaotong University; GaussDB, Huawei Technologies Co Ltd, Shenzhen, China; MOE KLINNS Lab, Xi'an Jiaotong University, Xi'an, China	Graph retrieval aims to find the most similar graphs in a graph database given a query graph, which is a fundamental problem with many real-world applications in chemical engineering, code analysis, etc. To date, existing neural graph retrieval methods generally fall into two categories: Embedding Based Paradigm (Ebp) and Matching Based Paradigm (Mbp). The Ebp models learn an individual vectorial representation for each graph and the retrieval process can be accelerated by pre-computing these representations. The Mbp models learn a neural matching function to compare graphs on a pair-by-pair basis, in which the fine-grained pairwise comparison leads to higher retrieval accuracy but severely degrades retrieval efficiency. In this paper, to combine the advantage of Ebp in retrieval efficiency with that of Mbp in retrieval accuracy, we propose a novel Graph RetrievAl framework via KNowledge Distillation, namely GRAND. The key point is to leverage the idea of knowledge distillation to transfer the fine-grained graph comparison knowledge from an Mbp model to an Ebp model, such that the Ebp model can generate better graph representations and thus yield higher retrieval accuracy. At the same time, we can still pre-compute and index the improved graph representations to retain the retrieval speed of Ebp. Towards this end, we propose to perform knowledge distillation from three perspectives: score, node, and subgraph levels. In addition, we propose to perform mutual two-way knowledge transfer between Mbp and Ebp, such that Mbp and Ebp complement and benefit each other. 
Extensive experiments on three real-world datasets show that GRAND improves the performance of Ebp by a large margin and the improvement is consistent for different combinations of Ebp and Mbp models. For example, GRAND achieves performance gains of mostly more than 10% and up to 16.88% in terms of Recall@K on different datasets.	图形检索的目的是在给定查询图的图库中找到最相似的图，这是化学工程、代码分析等现实应用中的一个基本问题。迄今为止，现有的神经图检索方法一般分为两类: 基于嵌入的范式(Ebp)和基于匹配的范式(Mbp)。Ebp 模型学习每个图的单个向量表示，并且可以通过预计算这些表示来加速检索过程。Mbp 模型学习了一种神经匹配功能，可以在一对一对的基础上对图进行比较，在这种情况下，细粒度的成对比较导致更高的检索准确性，但严重降低了检索效率。本文结合 Ebp 在检索效率方面的优势和 Mbp 在检索精度方面的优势，提出了一种基于知识提取的图形检索 Al 框架 GRAND。其关键是利用知识精馏的思想，将 Mbp 模型中的细粒度图形比较知识转化为 Ebp 模型，使 Ebp 模型能够生成更好的图形表示，从而提高检索精度。同时，我们还可以对改进后的图表示进行预计算和索引，以保持 Ebp 的检索速度。为此，我们提出从三个角度进行知识提取: 得分、节点和子图层次。此外，我们提出在 Mbp 和 Ebp 之间进行双向知识转移，使 Mbp 和 Ebp 相互补充，相互受益。在三个实际数据集上的大量实验表明，GRAND 算法大大提高了 Ebp 算法的性能，而且对于 Ebp 和 Mbp 模型的不同组合，GRAND 算法的性能改善是一致的。例如，GRAND 在不同的数据集上通过 Recall@K 实现了大多超过10% 和高达16.88% 的性能增益。	code	0
Unmasking Privacy: A Reproduction and Evaluation Study of Obfuscation-based Perturbation Techniques for Collaborative Filtering	Alex Martinez, Mihnea Tufis, Ludovico Boratto	University of Cagliari, Cagliari, Italy; Eurecat, Technology Centre of Catalonia & Universitat de Barcelona, Barcelona, Catalunya, Spain; Eurecat, Technology Centre of Catalonia, Barcelona, Catalunya, Spain	Recommender systems (RecSys) solve personalisation problems and therefore heavily rely on personal data - demographics, user preferences, user interactions - each bearing important privacy risks. It is also widely accepted that in RecSys performance and privacy are at odds, with the increase of one resulting in the decrease of the other. Among the diverse approaches in privacy enhancing technologies (PET) for RecSys, perturbation stands out for its simplicity and computational efficiency. It involves adding noise to sensitive data, thus hiding its real value from an untrusted actor. We reproduce and test a set of four randomization-based perturbation techniques developed by Batmaz and Polat \cite{batmaz2016randomization} for privacy preserving collaborative filtering. While the framework presents great advantages - low computational requirements, several useful privacy-enhancing parameters - the supporting paper lacks conclusions drawn from empirical evaluation. We address this shortcoming by proposing - in absence of an implementation by the authors - our own implementation of the obfuscation framework. We then develop an evaluation framework to test the main assumption of the reference paper - that RecSys privacy and performance are competing goals. We extend this study to understand how much we can enhance privacy, within reasonable losses of the RecSys performance. 
We reproduce and test the framework for the more realistic scenario where only implicit feedback is available, using two well-known datasets (MovieLens-1M and Last.fm-1K), and several state-of-the-art recommendation algorithms (NCF and LightGCN from the Microsoft Recommenders public repository).	推荐系统(RecSys)解决个性化问题，因此严重依赖于个人数据——人口统计数据、用户偏好、用户交互——每个都有重要的隐私风险。人们普遍认为，在 RecSys 系统中，性能和隐私是不一致的，一个的增加会导致另一个的减少。在 RecSys 隐私增强技术(PET)的众多方法中，摄动以其简单性和计算效率而著称。它包括向敏感数据添加噪音，从而向不受信任的参与者隐藏其真实值。我们重现并测试了 Batmaz 和 Polat citebatmaz2016为保护隐私而开发的一组基于随机化的扰动技术，这些技术都是用于保护隐私的协同过滤。虽然该框架具有很大的优势——低计算需求，几个有用的隐私增强参数——但是支持文件缺乏从实证评估中得出的结论。我们通过提出——在作者没有实现的情况下——我们自己的模糊框架的实现来解决这个缺陷。然后，我们开发了一个评估框架来检验参考文件的主要假设—— RecSys 的隐私和性能是相互竞争的目标。我们扩展这项研究，以了解我们可以提高多少隐私，在合理损失的 RecSys 性能。我们使用两个众所周知的数据集(MovieLens-1M 和 Last.fm-1K)和几个最先进的推荐算法(来自 Microsoft 推荐公共存储库的 NCF 和 LightGCN) ，重现和测试框架以实现更现实的场景，其中只有隐式反馈是可用的。	code	0
GPT4Rec: Graph Prompt Tuning for Streaming Recommendation	Peiyan Zhang, Yuchen Yan, Xi Zhang, Liying Kang, Chaozhuo Li, Feiran Huang, Senzhang Wang, Sunghun Kim	Hong Kong University of Science and Technology; Peking University School of Intelligence Science and Technology; Fuzhou University Interdisciplinary Institute for Medical Engineering; Jinan University; Microsoft Research Asia; Hong Kong Polytechnic University; Central South University	In the realm of personalized recommender systems, the challenge of adapting to evolving user preferences and the continuous influx of new users and items is paramount. Conventional models, typically reliant on a static training-test approach, struggle to keep pace with these dynamic demands. Streaming recommendation, particularly through continual graph learning, has emerged as a novel solution. However, existing methods in this area either rely on historical data replay, which is increasingly impractical due to stringent data privacy regulations; or are unable to effectively address the over-stability issue; or depend on model-isolation and expansion strategies. To tackle these difficulties, we present GPT4Rec, a Graph Prompt Tuning method for streaming Recommendation. Given the evolving user-item interaction graph, GPT4Rec first disentangles the graph patterns into multiple views. After isolating specific interaction patterns and relationships in different views, GPT4Rec utilizes lightweight graph prompts to efficiently guide the model across varying interaction patterns within the user-item graph. Firstly, node-level prompts are employed to instruct the model to adapt to changes in the attributes or properties of individual nodes within the graph. Secondly, structure-level prompts guide the model in adapting to broader patterns of connectivity and relationships within the graph. Finally, view-level prompts are innovatively designed to facilitate the aggregation of information from multiple disentangled views. 
These prompt designs allow GPT4Rec to synthesize a comprehensive understanding of the graph, ensuring that all vital aspects of the user-item interactions are considered and effectively integrated. Experiments on four diverse real-world datasets demonstrate the effectiveness and efficiency of our proposal.	在个性化推荐系统领域，适应不断变化的用户偏好和不断涌入的新用户和项目的挑战至关重要。传统的模型，通常依赖于静态的训练-测试方法，很难跟上这些动态的需求。流式推荐，特别是通过持续的图学习，已经成为一种新的解决方案。然而，这一领域的现有方法要么依赖于历史数据重放，由于严格的数据隐私条例，这越来越不切实际; 要么无法有效地解决过度稳定的问题; 要么依赖于模型隔离和扩展策略。为了解决这些问题，我们提出了 GPT4Rec，一种用于流推荐的图形提示调优方法。考虑到不断发展的用户项交互图，GPT4Rec 首先将图形模式分解为多个视图。在不同视图中隔离特定的交互模式和关系之后，GPT4Rec 利用轻量级图形提示有效地指导模型跨用户项图中的各种交互模式。首先，采用节点级提示来指导模型适应图中各个节点的属性或属性的变化。其次，结构级提示指导模型适应图中更广泛的连接和关系模式。最后，视图级提示被创新性地设计，以促进来自多个分离视图的信息的聚合。这些迅速的设计使 GPT4Rec 能够综合对图表的全面理解，确保用户-项目交互的所有重要方面都得到考虑和有效集成。在四个不同的真实世界数据集上的实验证明了我们方案的有效性和高效性。	code	0
I3: Intent-Introspective Retrieval Conditioned on Instructions	Kaihang Pan, Juncheng Li, Wenjie Wang, Hao Fei, Hongye Song, Wei Ji, Jun Lin, Xiaozhong Liu, TatSeng Chua, Siliang Tang	DAMO Academy, Alibaba Group; Zhejiang University; Worcester Polytechnic Institute; National University of Singapore	Recent studies indicate that dense retrieval models struggle to perform well on a wide variety of retrieval tasks that lack dedicated training data, as different retrieval tasks often entail distinct search intents. To address this challenge, in this work we leverage instructions to flexibly describe retrieval intents and introduce I3, a unified retrieval system that performs Intent-Introspective retrieval across various tasks, conditioned on Instructions without any task-specific training. I3 innovatively incorporates a pluggable introspector in a parameter-isolated manner to comprehend specific retrieval intents by jointly reasoning over the input query and instruction, and seamlessly integrates the introspected intent into the original retrieval model for intent-aware retrieval. Furthermore, we propose progressively-pruned intent learning. It utilizes extensive LLM-generated data to train I3 phase-by-phase, embodying two key designs: progressive structure pruning and drawback extrapolation-based data refinement. Extensive experiments show that in the BEIR benchmark, I3 significantly outperforms baseline methods designed with task-specific retrievers, achieving state-of-the-art zero-shot performance without any task-specific tuning.	最近的研究表明，密集检索模型难以在缺乏专门训练数据的各种检索任务中表现良好，因为不同的检索任务往往需要不同的搜索意图。为了应对这一挑战，在这项工作中，我们利用指令来灵活地描述检索对象，并引入 I3，一个统一的检索系统，可以在不同任务间执行意图-内省检索，不需要任何特定任务的训练，只需要指令。I3创新地以参数隔离的方式整合了可插入的内省器，通过对输入查询和指令的联合推理来理解特定的检索意图，并且无缝地将内省意图整合到原始的检索模型中用于意图感知检索。此外，我们提出逐步修剪意向学习。它利用大量 LLM 生成的数据对 I3进行逐步训练，包含两个关键设计: 渐进式结构修剪和基于缺陷外推的数据细化。大量的实验表明，在 BEIR 基准测试中，I3明显优于设计有任务特定检索器的基准方法，在没有任何任务特定调整的情况下，实现了最先进的零射击性能。	code	0
Disentangling ID and Modality Effects for Session-based Recommendation	Xiaokun Zhang, Bo Xu, Zhaochun Ren, Xiaochen Wang, Hongfei Lin, Fenglong Ma	Dalian University of Technology; Leiden University; Pennsylvania State University	Session-based recommendation aims to predict intents of anonymous users based on their limited behaviors. Modeling user behaviors involves two distinct rationales: co-occurrence patterns reflected by item IDs, and fine-grained preferences represented by item modalities (e.g., text and images). However, existing methods typically entangle these causes, leading to their failure in achieving accurate and explainable recommendations. To this end, we propose a novel framework DIMO to disentangle the effects of ID and modality in the task. At the item level, we introduce a co-occurrence representation schema to explicitly incorporate co-occurrence patterns into ID representations. Simultaneously, DIMO aligns different modalities into a unified semantic space to represent them uniformly. At the session level, we present a multi-view self-supervised disentanglement, including proxy mechanism and counterfactual inference, to disentangle ID and modality effects without supervised signals. Leveraging these disentangled causes, DIMO provides recommendations via causal inference and further creates two templates for generating explanations. Extensive experiments on multiple real-world datasets demonstrate the consistent superiority of DIMO over existing methods. Further analysis also confirms DIMO's effectiveness in generating explanations.	基于会话的推荐旨在根据匿名用户的有限行为来预测用户的意图。建模用户行为涉及两种不同的情况: 由项目 ID 反映的共现模式，以及由项目模式(例如，文本和图像)表示的细粒度偏好。然而，现有的方法通常纠缠这些原因，导致他们无法实现准确和可解释的建议。为此，我们提出了一个新的框架 DIMO，以解决身份和情态在任务中的影响。在项目层面，我们引入了一个共现表示模式，以显式地将共现模式纳入身份表示。同时，DIMO 将不同的模式统一到一个统一的语义空间中，以统一地表示它们。在会话层面，我们提出了一种多视角的自监督解纠缠算法，包括代理机制和反事实推理，用于在没有监督信号的情况下解纠缠 ID 和模态效应。利用这些分离的原因，DIMO 通过因果推理提供建议，并进一步创建两个模板来产生解释。在多个真实世界数据集上的大量实验证明了 DIMO 相对于现有方法的一致优越性。进一步的分析也证实了 DIMO 在产生解释方面的有效性。	code	0
Large Language Models are Learnable Planners for Long-Term Recommendation	Wentao Shi, Xiangnan He, Yang Zhang, Chongming Gao, Xinyue Li, Jizhi Zhang, Qifan Wang, Fuli Feng	Meta; University of Science and Technology of China	Planning for both immediate and long-term benefits becomes increasingly important in recommendation. Existing methods apply Reinforcement Learning (RL) to learn planning capacity by maximizing cumulative reward for long-term recommendation. However, the scarcity of recommendation data presents challenges such as instability and susceptibility to overfitting when training RL models from scratch, resulting in sub-optimal performance. In this light, we propose to leverage the remarkable planning capabilities over sparse data of Large Language Models (LLMs) for long-term recommendation. The key to achieving the target lies in formulating a guidance plan following principles of enhancing long-term engagement and grounding the plan to effective and executable actions in a personalized manner. To this end, we propose a Bi-level Learnable LLM Planner framework, which consists of a set of LLM instances and breaks down the learning process into macro-learning and micro-learning to learn macro-level guidance and micro-level personalized recommendation policies, respectively. Extensive experiments validate that the framework facilitates the planning ability of LLMs for long-term recommendation. Our code and data can be found at https://github.com/jizhi-zhang/BiLLP.	计划短期和长期的利益变得越来越重要的建议。现有的方法通过最大化长期推荐的累积回报来应用强化学习(RL)来学习计划能力。然而，推荐数据的稀缺性提出了挑战，如不稳定性和容易过拟合时，从头开始训练 RL 模型，导致次优性能。在这种情况下，我们建议利用大语言模型(LLM)稀疏数据的卓越规划能力来进行长期推荐。实现目标的关键在于制定指导计划，遵循加强长期参与和以个性化方式采取有效和可执行行动的原则。为此，我们提出了一个双层可学习的 LLM 规划框架，该框架由一组 LLM 实例组成，将学习过程分解为宏观学习和微观学习，分别学习宏观层面的指导和微观层面的个性化推荐策略。大量的实验验证了该框架有利于 LLM 的长期推荐规划能力。我们的代码和数据可以在 https://github.com/jizhi-zhang/billp 找到。	code	0
Identifiability of Cross-Domain Recommendation via Causal Subspace Disentanglement	Jing Du, Zesheng Ye, Bin Guo, Zhiwen Yu, Lina Yao	Northwestern Polytechnical University, Xi'an, China; CSIRO's Data 61 & The University of New South Wales, Sydney, Australia; The University of New South Wales, Sydney, Australia; Macquarie University, Sydney, Australia	Cross-Domain Recommendation (CDR) seeks to enable effective knowledge transfer across domains. Most existing works rely on either representation alignment or transformation bridges, but they come with shortcomings regarding identifiability of domain-shared and domain-specific latent factors. Specifically, while CDR describes user representations as a joint distribution over two domains, these methods fail to account for its joint identifiability as they primarily fixate on the marginal distribution within a particular domain. Such a failure may overlook the conditionality between two domains and how it contributes to latent factor disentanglement, leading to negative transfer when domains are weakly correlated. In this study, we explore what should and should not be transferred in cross-domain user representations from a causality perspective. We propose a Hierarchical causal subspace disentanglement approach to explore the Joint IDentifiability of cross-domain joint distribution, termed HJID, to preserve domain-specific behaviors from domain-shared factors. HJID abides by the feature hierarchy and divides user representations into generic shallow subspace and domain-oriented deep subspaces. We first encode the generic pattern in the shallow subspace by minimizing the Maximum Mean Discrepancy of initial layer activation. Then, to dissect how domain-oriented latent factors are encoded in deeper layers activation, we construct a cross-domain causality-based data generation graph, which identifies cross-domain consistent and domain-specific components, adhering to the Minimal Change principle. 
This allows HJID to maintain stability whilst discovering unique factors for different domains, all within a generative framework of invertible transformations that guarantee the joint identifiability. With experiments on real-world datasets, we show that HJID outperforms SOTA methods on both strong- and weak-correlation CDR tasks.	跨领域推荐 ~ (CDR)旨在实现跨领域的有效知识转移。现有的工作大多依赖于表示对齐或转换桥梁，但在领域共享和领域特定潜在因素的可识别性方面存在缺陷。具体来说，虽然 CDR 将用户表示描述为两个域的联合分布，但这些方法无法解释其联合可识别性，因为它们主要集中在特定域内的边缘分布。这种失效可能会忽略两个领域之间的条件性，以及它如何促进潜在因素的解纠缠，导致负迁移时，领域是弱相关的。在这项研究中，我们从因果关系的角度探讨什么应该和不应该在跨域用户表示中传递。我们提出了一种分层因果子空间解纠缠方法来探索跨域联合分布的联合可识别性，称为 HJID，以保留领域特定的行为从领域共享因素。HJID 遵循特征层次结构，将用户表示划分为一般的浅子空间和面向领域的深子空间。我们首先通过最小化初始层激活的最大平均差来编码浅子空间中的通用模式。然后，为了剖析面向领域的潜在因素是如何在深层激活中编码的，我们构建了一个基于跨领域因果关系的数据生成图，该图根据最小变化原则识别跨领域一致性和特定领域的组件。这允许 HJID 保持稳定性，同时发现不同领域的独特因素，所有这些都在一个可逆转换的生成框架内，保证了联合可识别性。通过对实际数据集的实验，我们发现 HJID 方法在强相关和弱相关 CDR 任务上都优于 SOTA 方法。	code	0
DeCoCDR: Deployable Cloud-Device Collaboration for Cross-Domain Recommendation	Yu Li, Yi Zhang, Zimu Zhou, Qiang Li	Algorithm team, WeSure Inc., Shenzhen, China; School of Data Science, City University of Hong Kong, Hong Kong, Hong Kong; College of Computer Science and Technology, Jilin University, Changchun, China	Cross-domain recommendation (CDR) is a widely used methodology in recommender systems to combat data sparsity. It leverages user data across different domains or platforms for providing personalized recommendations. Traditional CDR assumes user preferences and behavior data can be shared freely among cloud and users, which is now impractical due to strict restrictions of data privacy. In this paper, we propose a Deployment-friendly Cloud-Device Collaboration framework for Cross-Domain Recommendation (DeCoCDR). It splits CDR into a two-stage recommendation model through cloud-device collaborations, i.e., item-recall on cloud and item re-ranking on device. This design enables effective CDR while preserving data privacy for both the cloud and the device. Extensive offline and online experiments are conducted to validate the effectiveness of DeCoCDR. In offline experiments, DeCoCDR outperformed the state-of-the-arts in three large datasets. While in real-world deployment, DeCoCDR improved the conversion rate by 45.3% compared with the baseline.	跨域推荐(CDR)是推荐系统中应用最为广泛的一种数据稀疏处理方法。它利用跨不同领域或平台的用户数据来提供个性化的推荐。传统的 CDR 假设用户偏好和行为数据可以在云和用户之间自由共享，但是由于数据隐私的严格限制，这种假设是不切实际的。在本文中，我们提出了一个面向跨域推荐的部署友好的云设备协作框架(DeCoCDR)。它通过云设备协作将 CDR 划分为两个阶段的推荐模型，即云上的项目召回和设备上的项目重新排序。这种设计支持有效的 CDR，同时保护云和设备的数据隐私。为了验证 DeCoCDR 的有效性，进行了大量的离线和在线实验。在离线实验中，DeCoCDR 在三个大型数据集中的表现超过了最先进的水平。在实际部署中，与基线相比，DeCoCDR 将转换率提高了45.3% 。	code	0
Mutual Information-based Preference Disentangling and Transferring for Non-overlapped Multi-target Cross-domain Recommendations	Zhi Li, Daichi Amagata, Yihong Zhang, Takahiro Hara, Shuichiro Haruta, Kei Yonekawa, Mori Kurokawa	Osaka University, Suita, Osaka, Japan; KDDI Research, Inc., Fujimino, Saitama, Japan	Building high-quality recommender systems is challenging for new services and small companies, because of their sparse interactions. Cross-domain recommendations (CDRs) alleviate this issue by transferring knowledge from data in external domains. However, most existing CDRs leverage data from only a single external domain and serve only two domains. CDRs serving multiple domains require domain-shared entities (i.e., users and items) to transfer knowledge, which significantly limits their applications due to the hardness and privacy concerns of finding such entities. We therefore focus on a more general scenario, non-overlapped multi-target CDRs (NO-MTCDRs), which require no domain-shared entities and serve multiple domains. Existing methods require domain-shared users to learn user preferences and cannot work on NO-MTCDRs. We hence propose MITrans, a novel mutual information-based (MI-based) preference disentangling and transferring framework to improve recommendations for all domains. MITrans effectively leverages knowledge from multiple domains as well as learning both domain-shared and domain-specific preferences without using domain-shared users. In MITrans, we devise two novel MI constraints to disentangle domain-shared and domain-specific preferences. Moreover, we introduce a module that fuses domain-shared preferences in different domains and combines them with domain-specific preferences to improve recommendations. Our experimental results on two real-world datasets demonstrate the superiority of MITrans in terms of recommendation quality and application range against state-of-the-art overlapped and non-overlapped CDRs.	
构建高质量的推荐系统对于新服务和小公司来说是一个挑战，因为它们的交互很少。跨领域建议(CDR)通过从外部领域的数据中传输知识来缓解这一问题。然而，大多数现有的 CDR 仅利用来自单个外部域的数据，并且仅服务于两个域。服务于多个域的 CDR 需要域共享实体(即用户和项目)来传递知识，由于寻找这些实体的难度和隐私问题，这极大地限制了它们的应用程序。因此，我们关注于一个更一般的场景，非重叠多目标 CDR (NO-MTCDR) ，它不需要域共享实体并服务于多个域。现有的方法要求域共享用户学习用户首选项，并且不能在 NO-MTCDR 上工作。因此，我们提出了一种新的基于互信息(MI-based)的偏好分离和传递框架 MITAN，以改善所有领域的建议。MTrans 有效地利用了来自多个领域的知识，并且在不使用领域共享用户的情况下学习了领域共享和领域特定的偏好。在 MTrans 中，我们设计了两个新的 MI 约束来区分领域共享和领域特定的偏好。此外，我们还引入了一个模块，该模块融合了不同领域中的域共享首选项，并将它们与特定领域的首选项结合起来，以改进建议。我们在两个实际数据集上的实验结果表明，MTrans 在推荐质量和应用范围方面优于最先进的重叠和非重叠 CDR。	code	0
LeCaRDv2: A Large-Scale Chinese Legal Case Retrieval Dataset	Haitao Li, Yunqiu Shao, Yueyue Wu, Qingyao Ai, Yixiao Ma, Yiqun Liu	Tsinghua University	As an important component of intelligent legal systems, legal case retrieval plays a critical role in ensuring judicial justice and fairness. However, the development of legal case retrieval technologies in the Chinese legal system is restricted by three problems in existing datasets: limited data size, narrow definitions of legal relevance, and naive candidate pooling strategies used in data sampling. To alleviate these issues, we introduce LeCaRDv2, a large-scale Legal Case Retrieval Dataset (version 2). It consists of 800 queries and 55,192 candidates extracted from 4.3 million criminal case documents. To the best of our knowledge, LeCaRDv2 is one of the largest Chinese legal case retrieval datasets, providing extensive coverage of criminal charges. Additionally, we enrich the existing relevance criteria by considering three key aspects: characterization, penalty, and procedure. These comprehensive criteria enrich the dataset and may provide a more holistic perspective. Furthermore, we propose a two-level candidate set pooling strategy that effectively identifies potential candidates for each query case. It's important to note that all cases in the dataset have been annotated by multiple legal experts specializing in criminal law. Their expertise ensures the accuracy and reliability of the annotations. We evaluate several state-of-the-art retrieval models on LeCaRDv2, demonstrating that there is still significant room for improvement in legal case retrieval. The details of LeCaRDv2 can be found at the anonymous website https://github.com/anonymous1113243/LeCaRDv2.	
法律案件检索作为智能法律系统的重要组成部分，对于确保司法公正和公平起着至关重要的作用。然而，中国法律体系中法律案例检索技术的发展受到现有数据集存在的三个问题的制约: 数据规模有限、法律相关性定义狭窄以及数据抽样中采用的候选人汇集策略过于简单。为了缓解这些问题，我们引入了 LeCaRDv2，这是一个大型的法律案例检索数据集(版本2)。它包括从430万份刑事案件文件中提取的800个查询和55,192个候选者。据我们所知，LeCaRDv2是中国最大的法律案件检索数据集之一，提供了广泛的刑事指控。此外，我们还考虑了角色塑造、惩罚和程序三个关键因素，丰富了现有的相关标准。这个全面的标准丰富了数据集，并且可以提供一个更全面的视角。此外，我们提出了一个两级候选集合池策略，有效地识别每个查询用例的潜在候选者。值得注意的是，数据集中的所有案例都由多位专门研究刑法的法律专家进行了注释。他们的专业知识确保了注释的准确性和可靠性。我们评估了 LeCaRDv2的几个最先进的检索模型，表明在法律案件检索方面仍有很大的改进空间。有关 Lecardv2的详情，可浏览该匿名网站的 https://github.com/anonymous1113243/LeCaRDv2。	code	0
Behavior Pattern Mining-based Multi-Behavior Recommendation	Haojie Li, Zhiyong Cheng, Xu Yu, Jinhuan Liu, Guanfeng Liu, Junwei Du	School of Computing, Macquarie University; College of Computer Science and Technology, China University of Petroleum; Shandong Artificial Intelligence Institute, Qilu University of Technology (Shandong Academy of Sciences); College of Data Science, Qingdao University of Science and Technology	Multi-behavior recommendation systems enhance effectiveness by leveraging auxiliary behaviors (such as page views and favorites) to address the limitations of traditional models that depend solely on sparse target behaviors like purchases. Existing approaches to multi-behavior recommendations typically follow one of two strategies: some derive initial node representations from individual behavior subgraphs before integrating them for a comprehensive profile, while others interpret multi-behavior data as a heterogeneous graph, applying graph neural networks to achieve a unified node representation. However, these methods do not adequately explore the intricate patterns of behavior among users and items. To bridge this gap, we introduce a novel algorithm called Behavior Pattern mining-based Multi-behavior Recommendation (BPMR). Our method extensively investigates the diverse interaction patterns between users and items, utilizing these patterns as features for making recommendations. We employ a Bayesian approach to streamline the recommendation process, effectively circumventing the challenges posed by graph neural network algorithms, such as the inability to accurately capture user preferences due to over-smoothing. Our experimental evaluation on three real-world datasets demonstrates that BPMR significantly outperforms existing state-of-the-art algorithms, showing an average improvement of 268.29 in NDCG@10 metrics. The code of our BPMR is openly accessible for use and further research at https://github.com/rookitkitlee/BPMR.	
多行为推荐系统通过利用辅助行为(如页面浏览和收藏夹)来解决传统模型的局限性，这些传统模型仅仅依赖于稀疏的目标行为(如购买) ，从而提高了有效性。现有的多行为推荐方法通常遵循以下两种策略之一: 一些方法从个体行为子图中获得初始节点表示，然后将它们整合为一个综合概况; 另一些方法将多行为数据解释为异构图，应用图神经网络实现统一的节点表示。然而，这些方法并没有充分探索用户和项之间复杂的行为模式。为了弥补这一差距，我们引入了一种新的算法，称为基于行为模式挖掘的多行为推荐(BPMR)。我们的方法广泛地调查了用户和项目之间不同的交互模式，利用这些模式作为提出建议的特征。我们使用贝叶斯方法来简化推荐过程，有效地规避了图形神经网络算法带来的挑战，例如由于过度平滑而无法准确地捕获用户偏好。我们对三个真实世界数据集的实验评估表明，BPMR 显著优于现有的最先进的算法，显示 NDCG@10指标的平均改进为268.29。我们的《业务流程核证机关守则》已开放供 https://github.com/rookitkitlee/BPMR 使用和进一步研究。	code	0
Dense Retrieval with Continuous Explicit Feedback for Systematic Review Screening Prioritisation	Xinyu Mao, Shengyao Zhuang, Bevan Koopman, Guido Zuccon	The University of Queensland; CSIRO	The goal of screening prioritisation in systematic reviews is to identify relevant documents with high recall and rank them in early positions for review. This saves reviewing effort if paired with a stopping criterion, and speeds up review completion if performed alongside downstream tasks. Recent studies have shown that neural models have good potential on this task, but their time-consuming fine-tuning and inference discourage their widespread use for screening prioritisation. In this paper, we propose an alternative approach that still relies on neural models, but leverages dense representations and relevance feedback to enhance screening prioritisation, without the need for costly model fine-tuning and inference. This method exploits continuous relevance feedback from reviewers during document screening to efficiently update the dense query representation, which is then applied to rank the remaining documents to be screened. We evaluate this approach across the CLEF TAR datasets for this task. Results suggest that the investigated dense query-driven approach is more efficient than directly using neural models and shows promising effectiveness compared to previous methods developed on the considered datasets. Our code is available at https://github.com/ielab/dense-screening-feedback.	在系统评价中筛选优先级的目标是确定具有高召回率的相关文档，并将其排在早期位置以供审查。如果与停止标准配对，则可以节省审查工作，如果与下游任务一起执行，则可以加快审查完成。最近的研究表明，神经模型在这项任务上有很好的潜力，但是它们耗费时间的微调和推理阻碍了它们在筛选优先级方面的广泛应用。在本文中，我们提出了一种替代方法，这种方法仍然依赖于神经模型，但是利用密集的表示和关联反馈来增强筛选的优先级，而不需要昂贵的模型微调和推断。这种方法利用审阅者在文件筛选过程中的连续关联反馈，有效地更新密集的查询表示，然后应用于对待筛选的其余文件进行排序。我们通过 CLEF TAR 数据集评估这种方法。结果表明，所研究的密集查询驱动的方法比直接使用神经模型更有效，并显示出有希望的有效性比以往的方法开发考虑数据集。我们的代码可以在 https://github.com/ielab/dense-screening-feedback 找到。	code	0
Cross-reconstructed Augmentation for Dual-target Cross-domain Recommendation	Qingyang Mao, Qi Liu, Zhi Li, Likang Wu, Bing Lv, Zheng Zhang	College of Management and Economics, Tianjin University, Tianjin, China; University of Science and Technology of China State Key Laboratory of Cognitive Intelligence;Hefei Comprehensive National Science Center Institute of Artificial Intelligence; University of Science and Technology of China State Key Laboratory of Cognitive Intelligence; Shenzhen International Graduate School, Tsinghua University, Shenzhen, China	To alleviate the long-standing data sparsity issue in recommender systems, numerous studies in cross-domain recommendation (CDR) have been conducted to facilitate information transfer processes across domains. In recent years, dual-target CDR has been introduced to gain mutual improvements between two domains through more general bidirectional transfer rather than traditional one-way transit. Existing methods in dual-target CDR focus primarily on designing powerful encoders to learn representative cross-domain information, without tackling the fundamental issue of interaction data shortage. In this paper, we present CrossAug, a novel data augmentation approach to leverage interactions more efficiently in two domains. Specifically, we propose intra-domain and inter-domain augmentations based on cross-reconstructed representations in terms of sampled records. To reduce the harm of domain shift, we project domain-shared representations in two domains into a joint space with Householder transformations and apply center alignments. All these modules boost the utilization of interactions with little influence from negative transfer. Extensive experimental results over public datasets demonstrate the effectiveness of CrossAug and its components in dual-target CDR.	
为了缓解推荐系统中长期存在的数据稀疏问题，人们对跨域推荐进行了大量的研究，以促进跨域的信息传递过程。近年来，双目标 CDR 被引入，通过更一般的双向传输而不是传统的单向传输，实现了两个领域之间的相互改进。现有的双目标 CDR 方法主要集中在设计强大的编码器来学习有代表性的跨域信息，而没有解决交互数据短缺的根本问题。在本文中，我们提出 CrossAug，一种新的数据增强方法，以更有效地利用交互作用在两个领域。具体来说，我们提出了域内和域间增强的基础上的交叉重建表示的抽样记录。为了减少域移位的危害，我们将两个域中的域共享表示投影到一个带有 Householder 变换的联合空间中，并应用中心对齐。所有这些模块都提高了交互作用的利用率，而负迁移的影响很小。在公共数据集上的大量实验结果证明了 CrossAug 及其组件在双目标 CDR 中的有效性。	code	0
Distillation for Multilingual Information Retrieval	Eugene Yang, Dawn J. Lawrie, James Mayfield	HLT/COE; Johns Hopkins University; Human Language Technology Center of Excellence, Johns Hopkins University	Recent work in cross-language information retrieval (CLIR), where queries and documents are in different languages, has shown the benefit of the Translate-Distill framework that trains a cross-language neural dual-encoder model using translation and distillation. However, Translate-Distill only supports a single document language. Multilingual information retrieval (MLIR), which ranks a multilingual document collection, is harder to train than CLIR because the model must assign comparable relevance scores to documents in different languages. This work extends Translate-Distill and proposes Multilingual Translate-Distill (MTD) for MLIR. We show that ColBERT-X models trained with MTD outperform their counterparts trained with Multilingual Translate-Train, which is the previous state-of-the-art training approach, by5robust to the way languages are mixed in training batches. Our implementation is available on GitHub.	最近在跨语检索(CLIR)领域的工作显示了翻译-提取框架的好处，该框架通过翻译和提取训练了一个跨语言的神经双编码器模型。但是，Translate-Distill 只支持一种文档语言。对多语言文档集进行排序的多语言信息检索(mLIR)比 CLIR 更难训练，因为该模型必须为不同语言的文档分配可比较的相关度分数。这项工作扩展了 MLIR 的翻译-蒸馏和提出的多语言翻译-蒸馏(MTD)。我们表明，使用 MTD 训练的 ColBERT-X 模型优于使用多语言翻译训练(这是以前的最先进的训练方法)训练的对应模型，因为它对训练批次中语言的混合方式具有鲁棒性。我们的实现可以在 GitHub 上找到。	code	0
Estimating the Hessian Matrix of Ranking Objectives for Stochastic Learning to Rank with Gradient Boosted Trees	Jingwei Kang, Maarten de Rijke, Harrie Oosterhuis	Radboud University; University of Amsterdam	Stochastic learning to rank (LTR) is a recent branch in the LTR field that concerns the optimization of probabilistic ranking models. Their probabilistic behavior enables certain ranking qualities that are impossible with deterministic models. For example, they can increase the diversity of displayed documents, increase fairness of exposure over documents, and better balance exploitation and exploration through randomization. A core difficulty in LTR is gradient estimation; for this reason, existing stochastic LTR methods have been limited to differentiable ranking models (e.g., neural networks). This is in stark contrast with the general field of LTR where Gradient Boosted Decision Trees (GBDTs) have long been considered the state-of-the-art. In this work, we address this gap by introducing the first stochastic LTR method for GBDTs. Our main contribution is a novel estimator for the second-order derivatives, i.e., the Hessian matrix, which is a requirement for effective GBDTs. To efficiently compute both the first and second-order derivatives simultaneously, we incorporate our estimator into the existing PL-Rank framework, which was originally designed for first-order derivatives only. Our experimental results indicate that stochastic LTR without the Hessian has extremely poor performance, whilst the performance is competitive with the current state-of-the-art with our estimated Hessian. Thus, through the contribution of our novel Hessian estimation method, we have successfully introduced GBDTs to stochastic LTR.	
随机学习排序(LTR)是 LTR 领域的一个新兴分支，主要研究概率排序模型的优化问题。他们的概率行为使得某些排名质量是不可能的确定性模型。例如，它们可以增加显示文档的多样性，增加文档曝光的公平性，以及通过随机化更好地平衡开发和探索。LTR 的一个核心难点是梯度估计，由于这个原因，现有的随机 LTR 方法仅限于可微排序模型(如神经网络)。这与长期以来一直被认为是最先进的梯度增强决策树(GBDTs)的 LTR 的一般领域形成了鲜明的对比。在这项工作中，我们通过引入第一个随机 LTR- 方法来解决这个差距。我们的主要贡献是一个新的估计器的二阶导数，即，黑森矩阵，这是一个需要有效的 GBDTs。为了同时有效地计算一阶导数和二阶导数，我们将估计量合并到现有的 PL 秩框架中，这个框架最初是为一阶导数而设计的。我们的实验结果表明，随机 LTR 没有黑森有非常差的性能，而性能是竞争性的当前国家的最先进的，我们估计黑森。因此，通过我们新的 Hessian 估计方法的贡献，我们成功地将 GBDTs 引入到随机 LTR 中。	code	0
Information Diffusion Prediction via Cascade-Retrieved In-context Learning	Ting Zhong, Jienan Zhang, Zhangtao Cheng, Fan Zhou, Xueqin Chen	University of Electronic Science and Technology of China, Chengdu, Sichuan, China; Delft University of Technology Faculty of Civil Engineering and Geosciences; University of Electronic Science and Technology of China; University of Electronic Science and Technology of China, Chengdu, China	Information diffusion prediction, which aims to infer the infected behavior of individual users during information spread, is critical for understanding the dynamics of information propagation and users' influence on online social media. To date, existing methods either focus on capturing limited contextual information from a single cascade, overlooking the potentially complex dependencies across different cascades, or they are committed to improving model performance by using intricate technologies to extract additional features as supplements to user representations, neglecting the drift of model performance across different platforms. To address these limitations, we propose a novel framework called CARE (CAscade-REtrieved In-Context Learning) inspired by the concept of in-context learning in LLMs. Specifically, CARE first constructs a prompts pool derived from historical cascades, then utilizes ranking-based search engine techniques to retrieve prompts with similar patterns based on the query. Moreover, CARE also introduces two augmentation strategies alongside social relationship enhancement to enrich the input context. Finally, the transformed query-cascade representation from a GPT-type architecture is projected to obtain the prediction. Experiments on real-world datasets from various platforms show that CARE outperforms state-of-the-art baselines in terms of effectiveness and robustness in information diffusion prediction.	
信息扩散预测旨在推断个体用户在信息传播过程中的感染行为，对于理解信息传播动态和用户对网络社交媒体的影响至关重要。迄今为止，现有的方法要么侧重于从单个级联捕获有限的上下文信息，忽视不同级联之间潜在的复杂依赖性，要么致力于通过使用复杂的技术提取额外的特征作为用户表示的补充来改善模型性能，忽略了不同平台之间模型性能的漂移。为了解决这些局限性，我们提出了一个新的框架，称为 CARE (级联检索在上下文学习)的启发在 LLM 中的上下文学习的概念。具体来说，CARE 首先构造一个源自历史级联的提示池，然后利用基于排序的搜索引擎技术根据查询检索具有类似模式的提示。此外，CARE 在增强社会关系的同时，还引入了两种增强策略来丰富输入语境。最后，从一个 GPT 类型的体系结构转换的查询级联表示进行投影，以获得预测。对来自不同平台的真实世界数据集的实验表明，CARE 在信息扩散预测的有效性和鲁棒性方面优于最先进的基线。	code	0
Masked Graph Transformer for Large-Scale Recommendation	Huiyuan Chen, Zhe Xu, ChinChia Michael Yeh, Vivian Lai, Yan Zheng, Minghua Xu, Hanghang Tong	University of Illinois Urbana-Champaign; Visa Inc	Graph Transformers have garnered significant attention for learning graph-structured data, thanks to their superb ability to capture long-range dependencies among nodes. However, the quadratic space and time complexity hinders the scalability of Graph Transformers, particularly for large-scale recommendation. Here we propose an efficient Masked Graph Transformer, named MGFormer, capable of capturing all-pair interactions among nodes with a linear complexity. To achieve this, we treat all user/item nodes as independent tokens, enhance them with positional embeddings, and feed them into a kernelized attention module. Additionally, we incorporate learnable relative degree information to appropriately reweigh the attentions. Experimental results show the superior performance of our MGFormer, even with a single attention layer.	图形变换器已经获得了学习图形结构化数据的重要关注，由于他们的卓越的能力捕获节点之间的长距离依赖。然而，二次空间和时间的复杂性阻碍了图形变换器的可扩展性，特别是对于大规模推荐。在这里，我们提出了一个有效的屏蔽图形转换器，命名为 MGForm，能够捕获所有对节点之间的交互具有线性复杂度。为了实现这一点，我们将所有用户/项目节点视为独立的令牌，使用位置嵌入来增强它们，并将它们提供给内核化的注意模块。此外，我们结合可学习的相对程度信息，以适当地重新权衡注意力。实验结果表明，即使只有一个注意层，我们设计的 MG 变换器仍具有优越的性能。	code	0
Modeling Domains as Distributions with Uncertainty for Cross-Domain Recommendation	Xianghui Zhu, Mengqun Jin, Hengyu Zhang, Chang Meng, Daoxin Zhang, Xiu Li	Shenzhen International Graduate School, Tsinghua University, Shenzhen, China; Xiaohongshu Inc., Shanghai, China	In the field of dual-target Cross-Domain Recommendation (DTCDR), improving the performance in both the information sparse domain and rich domain has been a mainstream research trend. However, prior embedding-based methods are insufficient to adequately describe the dynamics of user actions and items across domains. Moreover, previous efforts frequently lacked a comprehensive investigation of the entire domain distributions. This paper proposes a novel framework entitled Wasserstein Cross-Domain Recommendation (WCDR) that captures uncertainty in Wasserstein space to address above challenges. In this framework, we abstract user/item actions as Elliptical Gaussian distributions and divide them into local-intrinsic and global-domain parts. To further model the domain diversity, we adopt shared-specific pattern for global-domain distributions and present Masked Domain-aware Sub-distribution Aggregation (MDSA) module to produce informative and diversified global-domain distributions, which incorporates attention-based aggregation method and masking strategy that alleviates negative transfer issues. Extensive experiments on two public datasets and one business dataset are conducted. Experimental results demonstrate the superiority of WCDR over state-of-the-art methods.	在双目标跨域推荐(DTCDR)领域，提高信息稀疏域和富域的性能已成为主流研究趋势。但是，先前的基于嵌入的方法不足以充分描述跨领域的用户操作和项的动态。此外，以前的努力经常缺乏对整个领域分布的全面调查。本文提出了一个新的框架，名为沃瑟斯坦跨域建议(WCDR) ，捕捉 Wasserstein 空间的不确定性，以解决上述挑战。在这个框架中，我们将用户/项目行为抽象为椭圆高斯分布，并将其分为局部固有行为和全局域行为。为了进一步对领域多样性进行建模，我们对全局领域分布采用了共享特定模式，并提出了掩蔽领域感知子分布聚合(MDSA)模块，以生成信息丰富且多样化的全局领域分布，该模块结合了基于注意的聚合方法和掩蔽策略，减轻了负迁移问题。对两个公共数据集和一个业务数据集进行了广泛的实验。实验结果表明，WCDR 方法优于目前最先进的方法。	code	0
SCM4SR: Structural Causal Model-based Data Augmentation for Robust Session-based Recommendation	Muskan Gupta, Priyanka Gupta, Jyoti Narwariya, Lovekesh Vig, Gautam Shroff	TCS Research, Delhi, India	With mounting privacy concerns, and movement towards a cookie-less internet, session-based recommendation (SR) models are gaining increasing popularity. The goal of SR models is to recommend top-K items to a user by utilizing information from past actions within a session. Many deep neural networks (DNN) based SR have been proposed in the literature, however, they experience performance declines in practice due to inherent biases (e.g., popularity bias) present in training data. To alleviate this, we propose an underlying neural-network (NN) based Structural Causal Model (SCM) which comprises an evolving user behavior (simulator) and recommendation model. The causal relations between the two sub-models and variables at consecutive timesteps are defined by a sequence of structural equations, whose parameters are learned using logged data. The learned SCM enables the simulation of a user's response on a counterfactual list of recommended items (slate). For this, we intervene on recommendation slates with counterfactual slates and simulate the user's response through learned SCM thereby generating counterfactual sessions to augment the training data. Through extensive empirical evaluation on simulated and real-world datasets, we show that the augmented data mitigates the impact of sparse training data and improves the performance of the SR models.	随着越来越多的隐私问题，以及向无 cookie 互联网的转变，基于会话的推荐(SR)模型越来越受欢迎。SR 模型的目标是通过利用会话中过去操作的信息向用户推荐 top-K 条目。文献中提出了许多基于深层神经网络(DNN)的 SR 方法，然而，由于训练数据中存在固有的偏差(如流行偏差) ，它们在实际应用中的性能下降。为了解决这一问题，我们提出了一种基于神经网络(NN)的结构性因果模型(SCM) ，该模型包括演化的用户行为模拟器(模拟器)和推荐模型。这两个子模型和变量之间的因果关系在连续的时间步长定义了一系列的结构方程，其参数是利用测井数据学习。学到的 SCM 可以模拟用户对推荐项目(板岩)的反事实列表的响应。为此，我们使用反事实模板介入推荐平台，并通过学习 SCM 模拟用户的反应，从而产生反事实会话来增加训练数据。通过对模拟数据集和真实数据集的大量实证评估，我们发现增强数据减轻了稀疏训练数据的影响，提高了 SR 模型的性能。	code	0
USimAgent: Large Language Models for Simulating Search Users	Erhan Zhang, Xingzhu Wang, Peiyuan Gong, Yankai Lin, Jiaxin Mao	Renmin University of China	Due to the advantages in the cost-efficiency and reproducibility, user simulation has become a promising solution to the user-centric evaluation of information retrieval systems. Nonetheless, accurately simulating user search behaviors has long been a challenge, because users' actions in search are highly complex and driven by intricate cognitive processes such as learning, reasoning, and planning. Recently, Large Language Models (LLMs) have demonstrated remarkable potential in simulating human-level intelligence and have been used in building autonomous agents for various tasks. However, the potential of using LLMs in simulating search behaviors has not yet been fully explored. In this paper, we introduce an LLM-based user search behavior simulator, USimAgent. The proposed simulator can simulate users' querying, clicking, and stopping behaviors during search, and thus is capable of generating complete search sessions for specific search tasks. Empirical investigation on a real user behavior dataset shows that the proposed simulator outperforms existing methods in query generation and is comparable to traditional methods in predicting user clicks and stopping behaviors. These results not only validate the effectiveness of using LLMs for user simulation but also shed light on the development of more robust and generic user simulators.	由于用户仿真具有成本低、重复性好等优点，已成为信息检索系统以用户为中心评估的一种有前途的解决方案。尽管如此，精确模拟用户搜索行为长期以来一直是一个挑战，因为用户在搜索中的行为非常复杂，并受到复杂的认知过程(如学习、推理和计划)的驱动。近年来，大语言模型(LLM)在模拟人类智能水平方面显示出了巨大的潜力，并被用于为各种任务构建自主代理。然而，在模拟搜索行为中使用 LLM 的潜力还没有被充分探索。本文介绍了一个基于 LLM 的用户搜索行为模拟器 USimAgent。该模拟器可以模拟用户在搜索过程中的查询、点击和停止行为，从而能够为特定的搜索任务生成完整的搜索会话。对真实用户行为数据集的实验研究表明，该模拟器在查询生成方面优于现有方法，在预测用户点击和停止行为方面与传统方法具有可比性。研究结果不仅验证了 LLM 用于用户仿真的有效性，而且为开发更加健壮和通用的用户仿真器提供了参考。	code	0
ECAT: A Entire space Continual and Adaptive Transfer Learning Framework for Cross-Domain Recommendation	Chaoqun Hou, Yuanhang Zhou, Yi Cao, Tong Liu	Alibaba Group	In industrial recommendation systems, there are several mini-apps designed to meet the diverse interests and needs of users. The sample space of them is merely a small subset of the entire space, making it challenging to train an efficient model. In recent years, there have been many excellent studies related to cross-domain recommendation aimed at mitigating the problem of data sparsity. However, few of them have simultaneously considered the adaptability of both sample and representation continual transfer setting to the target task. To overcome the above issue, we propose a Entire space Continual and Adaptive Transfer learning framework called ECAT which includes two core components: First, as for sample transfer, we propose a two-stage method that realizes a coarse-to-fine process. Specifically, we perform an initial selection through a graph-guided method, followed by a fine-grained selection using domain adaptation method. Second, we propose an adaptive knowledge distillation method for continually transferring the representations from a model that is well-trained on the entire space dataset. ECAT enables full utilization of the entire space samples and representations under the supervision of the target task, while avoiding negative migration. 
Comprehensive experiments on real-world industrial datasets from Taobao show that ECAT advances state-of-the-art performance on offline metrics, and brings +13.6	在工业推荐系统中，有几个微型应用程序可以满足用户的不同兴趣和需求。它们的样本空间只是整个空间的一个小子集，这使得训练一个有效的模型变得非常困难。近年来，针对数据稀疏问题的跨域推荐已经有了很多优秀的研究成果。然而，很少有人同时考虑样本和表征连续迁移设置对目标任务的适应性。为了克服上述问题，我们提出了一个称为 ECAT 的全空间连续自适应传输学习框架，该框架包括两个核心部分: 首先，对于样本传输，我们提出了一个两阶段的方法，实现了从粗到精的过程。具体来说，我们通过图形引导的方法执行初始选择，然后使用领域自适应方法进行细粒度选择。其次，我们提出了一种自适应知识精馏方法，用于连续地从一个对整个空间数据集训练有素的模型中传递表示。ECAT 能够在目标任务的监督下充分利用整个空间样本和表示，同时避免负迁移。对淘宝网真实工业数据集的全面实验表明，ECAT 在离线指标方面提高了最先进的性能，带来了 + 13.6	code	0
Minimizing Live Experiments in Recommender Systems: User Simulation to Evaluate Preference Elicitation Policies	ChihWei Hsu, Martin Mladenov, Ofer Meshi, James Pine, Hubert Pham, Shane Li, Xujian Liang, Anton Polishko, Li Yang, Ben Scheetz, Craig Boutilier	Google Research, Mountain View, CA, USA; YouTube, New York, NY, USA; Google, Mountain View, CA, USA	Evaluation of policies in recommender systems typically involves A/B live experiments on real users to assess a new policy's impact on relevant metrics. This "gold standard" comes at a high cost, however, in terms of cycle time, user cost, and potential user retention. In developing policies for onboarding users, these costs can be especially problematic, since onboarding occurs only once. In this work, we describe a simulation methodology used to augment (and reduce) the use of live experiments. We illustrate its deployment for the evaluation of preference elicitation algorithms used to onboard new users of the YouTube Music platform. By developing counterfactually robust user behavior models, and a simulation service that couples such models with production infrastructure, we can test new algorithms in a way that reliably predicts their performance on key metrics when deployed live.	推荐系统中的策略评估通常包括对真实用户进行 A/B 现场实验，以评估新策略对相关指标的影响。然而，这种“黄金标准”在周期时间、用户成本和潜在用户保留方面成本很高。在为新入职用户制定政策时，这些成本尤其成问题，因为新入职只发生一次。在这项工作中，我们描述了一种模拟方法，用于增加(和减少)现场实验的使用。我们举例说明它的部署，用于评估用于 YouTube 音乐平台上的新用户的偏好启发算法。通过开发反事实的健壮用户行为模型，以及将这些模型与生产基础设施结合起来的仿真服务，我们可以测试新算法，从而可靠地预测它们在实时部署时的关键指标性能。	code	0
A Semantic Search Engine for Helping Patients Find Doctors and Locations in a Large Healthcare Organization	Mayank Kejriwal, Hamid Haidarian, MinHsueh Chiu, Andy Xiang, Deep Shrestha, Faizan Javed	Kaiser Permanente Digital, Oakland, CA, USA; University of Southern California, Marina del Rey, CA, USA; University of Southern California, Marina Del Rey, CA, USA	Efficiently finding doctors and locations (FDL) is an important search problem for patients in the healthcare domain, for which traditional information retrieval (IR) methods tend to be sub-optimal. This paper introduces and defines FDL as an important healthcare industry-specific problem in IR. We then propose a semantic search engine as a robust solution to FDL in Kaiser Permanente (KP), a large healthcare organization with 12 million members. Our solution meets practical needs of data security and privacy, scalability, cost-effectiveness, backward compatibility with existing indexes and search infrastructure, and interpretability of outputs for patients. It uses a concept-rich ontology to model raw data from multiple sources as entities, relations, and attributes in a knowledge graph that is stored and indexed in an industry-scale graph database. We evaluate the solution on a real patient-query log and demonstrate its practical utility. The system has been implemented and deployed live to KP customers.	对于医疗领域的患者来说，有效地找到医生和位置(FDL)是一个重要的搜索问题，而传统的信息检索(IR)方法往往不是最佳方法。本文介绍并定义了 FDL 作为医疗保健行业特有的一个重要问题。然后，我们提出了一个语义搜索引擎作为一个健壮的解决方案的 FDL 在凯泽永久(金伯利) ，一个大型医疗保健组织的12万成员。我们的解决方案满足了数据安全和隐私、可扩展性、成本效益、与现有索引和搜索基础设施的向下兼容以及病人输出结果的可解释性的实际需求。它使用概念丰富的本体将来自多个来源的原始数据建模为知识图中的实体、关系和属性，该知识图存储在工业规模的图形数据库中并进行索引。我们在一个真实的病人查询日志上评估了这个解决方案，并演示了它的实用性。该系统已经实施，并部署到金伯利进程客户现场。	code	0
Clinical Trial Retrieval via Multi-grained Similarity Learning	Junyu Luo, Cheng Qian, Lucas Glass, Fenglong Ma	IQVIA, Philadelphia, PA, USA; The Pennsylvania State University, University Park, USA; IQVIA, Chicago, USA	Clinical trial analysis is one of the main business directions and services in IQVIA, and reviewing past similar studies is one of the most critical steps before starting a commercial clinical trial. The current review process is manual and time-consuming, requiring a clinical trial analyst to manually search through an extensive clinical trial database and then review all candidate studies. Therefore, it is of great interest to develop an automatic retrieval algorithm to select similar studies by giving new study information. To achieve this goal, we propose a novel group-based trial similarity learning network named GTSLNet, consisting of two kinds of similarity learning modules. The pair-wise section-level similarity learning module aims to compare the query trial and the candidate trial from the abstract semantic level via the proposed section transformer. Meanwhile, a word-level similarity learning module uses the word similarity matrix to capture the low-level similarity information. Additionally, an aggregation module combines these similarities. To address potential false negatives and noisy data, we introduce a variance-regularized group distance loss function. Experiment results show that the proposed GTSLNet significantly and consistently outperforms state-of-the-art baselines.	临床试验分析是 IQVIA 主要的商业方向和服务之一，在开始商业临床试验之前，回顾过去的类似研究是最关键的步骤之一。目前的审查过程是手工和耗时的，需要一个临床试验分析人员手工搜索通过一个广泛的临床试验数据库，然后审查所有候选研究。因此，开发一种自动检索算法，通过提供新的研究信息来选择相似的研究，具有重要的意义。为了实现这一目标，我们提出了一种新的基于组的试验相似性学习网络 GTSLNet，该网络由两种相似性学习模块组成。两节级相似性学习模块通过提出的节变换，从抽象语义层面对查询试验和候选试验进行比较。同时，一个词级相似度学习模块利用词相似矩阵来获取低级相似度信息。此外，聚合模块将这些相似性组合在一起。为了处理潜在的假阴性和噪声数据，我们引入了一个方差正则化的群距离损失函数。实验结果表明，所提出的 GTSLNet 性能明显优于最先进的基线。	code	0
Search under Uncertainty: Cognitive Biases and Heuristics: A Tutorial on Testing, Mitigating and Accounting for Cognitive Biases in Search Experiments	Jiqun Liu, Leif Azzopardi	University of Strathclyde, Glasgow, United Kingdom; The University of Oklahoma, Norman, OK, USA	Understanding how people interact with search interfaces is core to the field of Interactive Information Retrieval (IIR). While various models have been proposed (e.g., Belkin's ASK, Berry picking, Everyday-life information seeking, Information foraging theory, Economic theory, etc.), they have largely ignored the impact of cognitive biases on search behaviour and performance. A growing body of empirical work exploring how people's cognitive biases influence search and judgments has led to the development of new models of search that draw upon Behavioural Economics and Psychology. This full-day tutorial will provide a starting point for researchers seeking to learn more about information seeking, search and retrieval under uncertainty. The tutorial will be structured into three parts. First, we will introduce the biases and heuristics program put forward by Tversky and Kahneman [60] (1974), which assumes that people are not always rational. The second part of the tutorial will provide an overview of the types and space of biases in search [5, 40], before doing a deep dive into several specific examples and the impact of biases on different types of decisions (e.g., health/medical, financial). The third part will focus on a discussion of the practical implications regarding the design and evaluation of human-centered IR systems in the light of cognitive biases, where participants will undertake some hands-on exercises.	
了解人们如何与搜索界面互动是交互式信息检索(IIR)领域的核心。虽然已经提出了各种各样的模型(例如，Belkin 的 ASK，Berry 拣选，日常生活中的信息搜索，信息搜索理论，经济学理论等) ，但是他们很大程度上忽略了认知偏差对搜索行为和性能的影响。越来越多的实证研究探索人们的认知偏见如何影响搜索和判断，导致了新的搜索模式的发展，这些模式借鉴了行为经济学和心理学。这一整天的教程将提供一个起点，研究人员寻求了解更多的信息搜索，搜索和检索在不确定性下。本教程将分为三个部分。首先，我们将介绍 Tversky 和 Kahneman 60提出的假设人们不总是理性的偏见和启发式程序。本教程的第二部分将提供一个关于搜索中偏见的类型和空间的概述[5,40] ，然后深入研究几个具体的例子，以及偏见对不同类型决策(例如，健康/医疗，金融)的影响。第三部分将着重讨论基于认知偏差的以人为中心的信息检索系统的设计和评估的实际意义——参与者将进行一些实践练习。	code	0
TRAD: Enhancing LLM Agents with Step-Wise Thought Retrieval and Aligned Decision	Ruiwen Zhou, Yingxuan Yang, Muning Wen, Ying Wen, Wenhao Wang, Chunling Xi, Guoqiang Xu, Yong Yu, Weinan Zhang	Shanghai Jiao Tong University; China Pacific Insurance Company	Numerous large language model (LLM) agents have been built for different tasks like web navigation and online shopping due to LLM's wide knowledge and text-understanding ability. Among these works, many of them utilize in-context examples to achieve generalization without the need for fine-tuning, while few of them have considered the problem of how to select and effectively utilize these examples. Recently, methods based on trajectory-level retrieval with task meta-data and using trajectories as in-context examples have been proposed to improve the agent's overall performance in some sequential decision making tasks. However, these methods can be problematic due to plausible examples retrieved without task-specific state transition dynamics and long input with plenty of irrelevant context. In this paper, we propose a novel framework (TRAD) to address these issues. TRAD first conducts Thought Retrieval, achieving step-level demonstration selection via thought matching, leading to more helpful demonstrations and less irrelevant input noise. Then, TRAD introduces Aligned Decision, complementing retrieved demonstration steps with their previous or subsequent steps, which enables tolerance for imperfect thought and provides a choice for balance between more context and less noise. Extensive experiments on ALFWorld and Mind2Web benchmarks show that TRAD not only outperforms state-of-the-art models but also effectively helps in reducing noise and promoting generalization. Furthermore, TRAD has been deployed in real-world scenarios of a global business insurance company and improves the success rate of robotic process automation.	
由于大语言模型(LLM)具有广泛的知识和文本理解能力，人们已经为不同的任务建立了大量的大语言模型(LLM)代理，如网络导航和在线购物。在这些工作中，许多人利用上下文中的例子来实现泛化而不需要进行微调，而很少有人考虑如何选择和有效利用这些例子的问题。近年来，人们提出了基于任务元数据的轨迹级检索方法，并将轨迹作为上下文实例来提高代理在一些连续决策任务中的整体性能。然而，这些方法可能是有问题的，因为没有任务特定的状态转换动态检索似乎合理的例子和大量不相关的上下文的长输入。在本文中，我们提出了一个新的框架(TRAD)来解决这些问题。TRAD 首先进行思维检索，通过思维匹配实现阶段性的演示选择，使演示更有帮助，输入噪音更少。然后，TRAD 引入了对齐决策，将检索到的示范步骤与之前或之后的步骤相互补充，这使得对不完美思想的容忍度成为可能，并提供了一个在更多上下文和更少噪音之间取得平衡的选择。在 ALFWorld 和 Mind2Web 基准上的大量实验表明，TRAD 不仅优于最先进的模型，而且有效地帮助降低噪音和促进推广。此外，TRAD 已经部署在一个全球商业保险公司的现实世界场景中，并提高了机器人过程自动化的成功率。	code	0
Representation Learning and Information Retrieval	Yiming Yang		How to best represent words, documents, queries, entities, relations, and other variables in information retrieval (IR) and related applications has been a fundamental research question for decades. Early IR systems relied on the independence assumptions about words and documents for simplicity and scalability, which were clearly sub-optimal from a semantic point of view. The rapid development of deep neural networks in the past decade has revolutionized the representation learning technologies for contextualized word embedding and graph-enhanced document embedding, leading to the new era of dense IR. This talk highlights such impactful shifts in representation learning for IR and related areas, the new challenges coming along and the remedies, including our recent work in large-scale dense IR [1, 9], in graph-based reasoning for knowledge-enhanced predictions [10], in self-refinement of large language models (LLMs) with retrieval augmented generation (RAG)[2,7] and iterative feedback [3,4], in principle-driven self-alignment of LLMs with minimum human supervision [6], etc. More generally, the power of such deep learning goes beyond IR enhancements, e.g., for significantly improving the state-of-the-art solvers for NP-Complete problems in classical computer science [5,8].	如何最好地表示单词、文档、查询、实体、关系以及其他变量在信息检索(IR)和相关应用中一直是几十年来的一个基础研究问题。早期的 IR 系统依赖于关于单词和文档的独立性假设，以实现简单性和可伸缩性，从语义角度来看，这显然是次优的。近十年来深层神经网络的迅速发展，使得上下文化词语嵌入和图增强文档嵌入的表示学习技术发生了革命性的变化，进入了密集红外的新时代。这个演讲强调了 IR 和相关领域表示学习的这种有影响力的转变，新的挑战和补救措施，包括我们最近在大规模稠密 IR [1,9] ，基于图形的知识增强预测推理[10] ，大型语言模型(LLM)的自我完善(RAG)[2,7]和迭代反馈[3,4] ，在原则驱动的 LLM 自我调整与最小人类监督[6]等。更一般地说，这种深度学习的力量超越了 IR 增强，例如，显着提高了经典计算机科学中 NP 完全问题的最先进的解决方案[5,8]。	code	0
"In-Context Learning" or: How I learned to stop worrying and love "Applied Information Retrieval"	Andrew Parry, Debasis Ganguly, Manish Chandra	University of Glasgow Computing Science; University of Glasgow School of Computing; University of Glasgow	With the increasing ability of large language models (LLMs), in-context learning (ICL) has evolved as a new paradigm for natural language processing (NLP), where instead of fine-tuning the parameters of an LLM specific to a downstream task with labeled examples, a small number of such examples is appended to a prompt instruction for controlling the decoder's generation process. ICL, thus, is conceptually similar to a non-parametric approach, such as k-NN, where the prediction for each instance essentially depends on the local topology, i.e., on a localised set of similar instances and their labels (called few-shot examples). This suggests that a test instance in ICL is analogous to a query in IR, and similar examples in ICL retrieved from a training set relate to a set of documents retrieved from a collection in IR. While standard unsupervised ranking models can be used to retrieve these few-shot examples from a training set, the effectiveness of the examples can potentially be improved by re-defining the notion of relevance specific to its utility for the downstream task, i.e., considering an example to be relevant if including it in the prompt instruction leads to a correct prediction. With this task-specific notion of relevance, it is possible to train a supervised ranking model (e.g., a bi-encoder or cross-encoder), which potentially learns to optimally select the few-shot examples. We believe that the recent advances in neural rankers can potentially find a use case for this task of optimally choosing examples for more effective downstream ICL predictions.	
随着大语言模型(LLM)能力的不断提高，上下文内学习(in-context learning，ICL)已经成为自然语言处理(NLP)的一种新范式。因此，ICL 在概念上类似于非参数方法，例如 k-NN，其中每个实例的预测基本上依赖于局部拓扑，即一组局部化的相似实例及其标签(称为极少数例子)。这表明 ICL 中的测试实例类似于 IR 中的查询，并且从训练集中检索的 ICL 中的类似实例与从 IR 中的集合中检索的一组文档有关。虽然标准的无监督排序模型可以用于从训练集中检索这些几个镜头的例子，但是这些例子的有效性可以通过重新定义特定于下游任务的适用性的相关性概念来提高，即，如果在提示指令中包含一个相关的例子，则会导致正确的预测。有了这个任务特定的相关性概念，就有可能训练一个有监督的排名模型(例如，一个双编码器或交叉编码器) ，它可能学会最佳地选择少数镜头的例子。我们相信，最近在神经排序的进展可能会找到这个任务的最佳选择更有效的下游 ICL 预测例子的用例。	code	0
LDRE: LLM-based Divergent Reasoning and Ensemble for Zero-Shot Composed Image Retrieval	Zhenyu Yang, Dizhan Xue, Shengsheng Qian, Weiming Dong, Changsheng Xu	State Key Laboratory of Multimodal Artificial Intelligence Systems	Zero-Shot Composed Image Retrieval (ZS-CIR) has garnered increasing interest in recent years, which aims to retrieve a target image based on a query composed of a reference image and a modification text without training samples. Specifically, the modification text describes the distinction between the two images. To conduct ZS-CIR, the prevailing methods employ pre-trained image-to-text models to transform the query image and text into a single text, which is then projected into the common feature space by CLIP to retrieve the target image. However, these methods neglect that ZS-CIR is a typical fuzzy retrieval task, where the semantics of the target image are not strictly defined by the query image and text. To overcome this limitation, this paper proposes a training-free LLM-based Divergent Reasoning and Ensemble (LDRE) method for ZS-CIR to capture diverse possible semantics of the composed result. Firstly, we employ a pre-trained captioning model to generate dense captions for the reference image, focusing on different semantic perspectives of the reference image. Then, we prompt Large Language Models (LLMs) to conduct divergent compositional reasoning based on the dense captions and modification text, deriving divergent edited captions that cover the possible semantics of the composed target. Finally, we design a divergent caption ensemble to obtain the ensemble caption feature weighted by semantic relevance scores, which is subsequently utilized to retrieve the target image in the CLIP feature space. Extensive experiments on three public datasets demonstrate that our proposed LDRE achieves the new state-of-the-art performance.	
近年来，零拍摄合成图像检索(ZS-CIR)越来越受到人们的关注，其目的是基于一个由参考图像和修改文本组成的查询来检索目标图像，而不需要训练样本。具体来说，修改文本描述了这两个图像之间的区别。为了实现 ZS-CIR，常用的方法是使用预先训练好的图文模型将查询图像和文本转换为单个文本，然后通过 CLIP 投影到公共特征空间中检索目标图像。然而，这些方法忽略了 ZS-CIR 是一个典型的模糊检索任务，其中目标图像的语义没有严格地由查询图像和文本来定义。为了克服这一局限性，本文提出了一种基于无训练 LLM 的 ZS-CIR 发散推理与集成(LDRE)方法，用于捕获组合结果的多种可能语义。首先，我们使用一个预先训练的字幕模型为参考图像生成密集的字幕，重点关注参考图像的不同语义视角。然后，我们提示大语言模型(LLM)进行发散的组合推理的基础上密集的标题和修改文本，导出发散的编辑标题，涵盖可能的语义组合的目标。最后，我们设计了一个发散字幕集合来获得以语义相关分数为权重的字幕集合特征，然后利用该特征在 CLIP 特征空间中检索目标图像。在三个公共数据集上的大量实验表明，我们提出的 LDRE 实现了新的最先进的性能。	code	0
EditKG: Editing Knowledge Graph for Recommendation	Gu Tang, Xiaoying Gan, Jinghe Wang, Bin Lu, Lyuwen Wu, Luoyi Fu, Chenghu Zhou	Shanghai Jiao Tong University, Shanghai, China; Chinese Academy of Sciences, Beijing, China	With the enrichment of user-item interactions, Graph Neural Networks (GNNs) are widely used in recommender systems to alleviate information overload. Nevertheless, they still suffer from the cold-start issue. Knowledge Graphs (KGs), providing external information, have been extensively applied in GNN-based methods to mitigate this issue. However, current KG-aware recommendation methods suffer from the knowledge imbalance problem caused by incompleteness of existing KGs. This imbalance is reflected by the long-tail phenomenon of item attributes, i.e., unpopular items usually lack more attributes compared to popular items. To tackle this problem, we propose a novel framework called EditKG: Editing Knowledge Graph for Recommendation, to balance attribute distribution of items via editing KGs. EditKG consists of two key designs: Knowledge Generator and Knowledge Deleter. Knowledge Generator generates attributes for items by exploring their mutual information correlations and semantic correlations. Knowledge Deleter removes the task-irrelevant item attributes according to the parameterized task relevance score, while dropping the spurious item attributes through aligning the attribute scores. Extensive experiments on three benchmark datasets demonstrate that EditKG significantly outperforms state-of-the-art methods, and achieves 8.98% average improvement. The implementations are available at https://github.com/gutang-97/2024SIGIR-EditKG.	
随着用户-项目交互的丰富，图形神经网络(GNN)被广泛应用于推荐系统以减轻信息超载。尽管如此，他们仍然受到冷启动问题的困扰。提供外部信息的知识图(KG)已广泛应用于基于 GNN 的方法中，以缓解这一问题。但是，现有的幼儿园知识推荐方法存在不完备性导致的知识不平衡问题。这种不平衡反映在项目属性的长尾现象上，也就是说，不受欢迎的项目通常比受欢迎的项目缺乏更多的属性。为了解决这个问题，我们提出了一个新的框架，即 EditKG: Editing Knowledge Graph for  推荐知识图，通过编辑 KG 来平衡项目的属性分布。EditKG 由两个关键设计组成: 知识生成器和知识删除器。知识生成器通过探索项目之间的信息相关性和语义相关性来生成项目的属性。知识删除器根据参数化的任务相关性得分删除与任务无关的项目属性，同时通过对齐属性得分删除虚假的项目属性。对三个基准数据集的大量实验表明，EditKG 的性能明显优于最先进的方法，平均改进率达到8.98% 。有关实施方案可于 https://github.com/gutang-97/2024sigir-editkg 下载。	code	0
GUITAR: Gradient Pruning toward Fast Neural Ranking	Weijie Zhao, Shulong Tan, Ping Li	Baidu Research USA; Rochester Institute of Technology; VecML	With the continuous popularity of deep learning and representation learning, fast vector search becomes a vital task in various ranking/retrieval based applications, say recommendation, ads ranking and question answering. Neural network based ranking is widely adopted due to its powerful capacity in modeling complex relationships, such as between users and items, questions and answers. However, it is usually exploited in offline or re-ranking manners for it is time-consuming in computations. Online neural network ranking, so-called fast neural ranking, is considered challenging because neural network measures are usually non-convex and asymmetric. Traditional Approximate Nearest Neighbor (ANN) search, which usually focuses on metric ranking measures, is not applicable to these advanced measures. In this paper, we introduce a novel graph searching framework to accelerate the searching in the fast neural ranking problem. The proposed graph searching algorithm is bi-level: we first construct a probable candidate set; then we only evaluate the neural network measure over the probable candidate set instead of evaluating the neural network over all neighbors. Specifically, we propose a gradient-based algorithm that approximates the rank of the neural network matching score to construct the probable candidate set; and we present an angle-based heuristic procedure to adaptively identify the proper size of the probable candidate set. Empirical results on public data confirm the effectiveness of our proposed algorithms.	
随着深度学习和表示学习的不断普及，快速向量搜索成为各种基于排序/检索的应用程序(如推荐、广告排序和问题回答)的重要任务。基于神经网络的排序因其对复杂关系(如用户与项目、问题与答案之间的关系)建模能力强而被广泛采用。然而，它通常是利用离线或重新排序的方式，因为它是耗时的计算。在线神经网络排序-所谓的快速神经排序-被认为是具有挑战性的，因为神经网络测量通常是非凸和不对称的。传统的近似最近邻(ANN)搜索通常侧重于度量排序度量，不适用于这些高级度量。本文提出了一种新的图搜索框架，以加快快速神经排序问题的搜索速度。提出的图搜索算法是双层的: 首先构造一个可能的候选集，然后对可能的候选集进行神经网络测度评估，而不是对所有邻居进行神经网络测度评估。具体来说，我们提出了一个基于梯度的算法，它近似于神经网络匹配得分的秩来构造可能的候选集; 并且我们提出了一个基于角度的启发式过程来自适应地识别可能的候选集的适当大小。公开数据的实验结果证实了算法的有效性。	code	0
Revisiting Document Expansion and Filtering for Effective First-Stage Retrieval	Watheq Mansour, Shengyao Zhuang, Guido Zuccon, Joel Mackenzie	CSIRO, Brisbane, Australia; The University of Queensland, Brisbane, Australia	Document expansion is a technique that aims to reduce the likelihood of term mismatch by augmenting documents with related terms or queries. Doc2Query minus minus (Doc2Query--) represents an extension to the expansion process that uses a neural model to identify and remove expansions that may not be relevant to the given document, thereby increasing the quality of the ranking while simultaneously reducing the amount of augmented data. In this work, we conduct a detailed reproducibility study of Doc2Query-- to better understand the trade-offs inherent to document expansion and filtering mechanisms. After successfully reproducing the best-performing method from the Doc2Query-- family, we show that filtering actually harms recall-based metrics on various test collections. Next, we explore whether the two-stage "generate-then-filter" process can be replaced with a single generation phase via reinforcement learning. Finally, we extend our experimentation to learned sparse retrieval models and demonstrate that filtering is not helpful when term weights can be learned. Overall, our work provides a deeper understanding of the behaviour and characteristics of common document expansion mechanisms, and paves the way for developing more efficient yet effective augmentation models.	文档扩展是一种技术，旨在通过增加文档中的相关术语或查询来减少术语不匹配的可能性。Doc2Query 减号(Doc2Query -)表示扩展过程的一个扩展，它使用一个神经模型来识别和删除可能与给定文档无关的扩展，从而提高排名的质量，同时减少增强数据的数量。在这项工作中，我们对 Doc2Query 进行了详细的可重复性研究，以更好地理解文档扩展和过滤机制所固有的权衡。在成功复制了 Doc2Query 家族中性能最好的方法之后，我们展示了过滤实际上损害了各种测试集合中基于召回的度量。接下来，我们将探讨是否可以通过强化学习将两阶段的“生成-然后过滤”过程替换为单一生成阶段。最后，我们将实验扩展到学习的稀疏检索模型，并证明当词权可以学习时，过滤是没有帮助的。总的来说，我们的工作使人们对共同文件扩展机制的行为和特点有了更深入的了解，并为开发更有效率、更有效力的扩展模型铺平了道路。	code	0
Simple but Effective Raw-Data Level Multimodal Fusion for Composed Image Retrieval	Haokun Wen, Xuemeng Song, Xiaolin Chen, Yinwei Wei, Liqiang Nie, TatSeng Chua	Shandong University School of Software, Joint SDU-NTU Centre for Artificial Intelligence Research; Harbin Institute of Technology (Shenzhen) School of Computer Science and Technology; National University of Singapore; Monash University; Shandong University; Harbin Institute of Technology (Shenzhen)	Composed image retrieval (CIR) aims to retrieve the target image based on a multimodal query, i.e., a reference image paired with corresponding modification text. Recent CIR studies leverage vision-language pre-trained (VLP) methods as the feature extraction backbone, and perform nonlinear feature-level multimodal query fusion to retrieve the target image. Despite the promising performance, we argue that their nonlinear feature-level multimodal fusion may lead to the fused feature deviating from the original embedding space, potentially hurting the retrieval performance. To address this issue, in this work, we propose shifting the multimodal fusion from the feature level to the raw-data level to fully exploit the VLP model's multimodal encoding and cross-modal alignment abilities. In particular, we introduce a Dual Query Unification-based Composed Image Retrieval framework (DQU-CIR), whose backbone simply involves a VLP model's image encoder and a text encoder. Specifically, DQU-CIR first employs two training-free query unification components: text-oriented query unification and vision-oriented query unification, to derive a unified textual and visual query based on the raw data of the multimodal query, respectively. The unified textual query is derived by concatenating the modification text with the extracted reference image's textual description, while the unified visual query is created by writing the key modification words onto the reference image. Ultimately, to address diverse search intentions, DQU-CIR linearly combines the features of the two unified queries encoded by the VLP model to retrieve the target image. Extensive experiments on four real-world datasets validate the effectiveness of our proposed method.	复合图像检索(CIR)是一种基于多模态查询的目标图像检索方法。近年来，CIR 研究利用视觉语言预训练(VLP)方法作为特征提取骨干，进行非线性特征级多模态查询融合来检索目标图像。尽管它们具有良好的性能，但是它们的非线性特征级多模融合可能导致融合特征偏离原始嵌入空间，从而影响检索性能。为了解决这一问题，本文提出将多模态融合从特征层次转移到原始数据层次，以充分发挥 VLP 模型的多模态编码和跨模态对齐能力。特别地，我们介绍了一个基于双查询统一的复合图像检索框架(DQU-CIR) ，它的主干包括 VLP 模型的图像编码器和文本编码器。具体来说，DQU-CIR 首先使用两个无需训练的查询统一组件: 面向文本的查询统一和面向视觉的查询统一，分别基于多模态查询的原始数据得到一个统一的文本查询和可视化查询。通过将修改后的文本与提取出的参考图像的文本描述连接起来，得到统一的文本查询; 通过将关键修改词写入参考图像，生成统一的可视化查询。最终，为了解决多样化研究的意图，DQU-CIR 将 VLP 模型编码的两个统一查询的特征线性地结合起来以检索目标图像。通过对四个实际数据集的大量实验验证了本文方法的有效性。	code	0
Browsing and Searching Metadata of TREC	Timo Breuer, Ellen M. Voorhees, Ian Soboroff	National Institute of Standards and Technology, Gaithersburg, MD, USA	Information Retrieval (IR) research is deeply rooted in experimentation and evaluation, and the Text REtrieval Conference (TREC) has been playing a central role in making that possible since its inauguration in 1992. TREC's mission centers around providing the infrastructure and resources to make IR evaluations possible at scale. Over the years, a plethora of different retrieval problems were addressed, culminating in data artifacts that remained as valuable and useful tools for the IR community. Even though the data are largely available from TREC's website, there is currently no resource that facilitates a cohesive way to obtain metadata information about the run file - the IR community's de-facto standard data format for storing rankings of system-oriented IR experiments. To this end, the work at hand introduces a software suite that facilitates access to metadata of experimental resources, resulting from over 30 years of IR experiments and evaluations at TREC. With a particular focus on the run files, the paper motivates the requirements for better access to TREC metadata and details the concepts, the resources, the corresponding implementations, and possible use cases. More specifically, we contribute a web interface to browse former TREC submissions. Besides, we provide the underlying metadatabase and a corresponding RESTful interface for more principled and structured queries about the TREC metadata.	
信息检索研究深深植根于实验和评估，而文本检索会议(TREC)自1992年成立以来，一直在实现这一目标方面发挥着核心作用。TREC 的任务主要是提供基础设施和资源，使大规模的 IR 评估成为可能。多年来，大量不同的检索问题得到了解决，最终形成了数据工件，这些工件仍然是 IR 社区宝贵而有用的工具。尽管这些数据大部分可以从 TREC 的网站上获得，但是目前还没有资源能够帮助我们获得关于运行文件的元数据信息——这是 IR 社区用于存储面向系统的 IR 实验排名的事实上的标准数据格式。为此，手头的工作介绍了一个软件套件，促进访问实验资源的元数据，这是在 TREC 超过30年的 IR 实验和评估的结果。本文特别关注运行文件，激发了更好地访问 TREC 元数据的需求，并详细介绍了概念、资源、相应的实现和可能的用例。更具体地说，我们贡献了一个网络界面来浏览以前的 TREC 提交。此外，我们还提供了底层的元数据库和相应的 RESTful 接口，用于针对 TREC 元数据的更原则化和结构化的查询。	code	0
ACORDAR 2.0: A Test Collection for Ad Hoc Dataset Retrieval with Densely Pooled Datasets and Question-Style Queries	Qiaosheng Chen, Weiqing Luo, Zixian Huang, Tengteng Lin, Xiaxia Wang, Ahmet Soylu, Basil Ell, Baifan Zhou, Evgeny Kharlamov, Gong Cheng	University of Oxford, Oxford, United Kingdom; OsloMet - Oslo Metropolitan University, Oslo, Norway; Bielefeld University & University of Oslo, Bielefeld, Germany; OsloMet - Oslo Metropolitan University & University of Oslo, Oslo, Norway; State Key Laboratory for Novel Software Technology, Nanjing University, Nanjing, Jiangsu, China; Bosch Center for Artificial Intelligence & University of Oslo, Renningen, Germany	Dataset search, or more specifically, ad hoc dataset retrieval which is a trending specialized IR task, has received increasing attention in both academia and industry. While methods and systems continue evolving, existing test collections for this task exhibit shortcomings, particularly suffering from lexical bias in pooling and limited to keyword-style queries for evaluation. To address these limitations, in this paper, we construct ACORDAR 2.0, a new test collection for this task which is also the largest to date. To reduce lexical bias in pooling, we adapt dense retrieval models to large structured data, using them to find an extended set of semantically relevant datasets to be annotated. To diversify query forms, we employ a large language model to rewrite keyword queries into high-quality question-style queries. We use the test collection to evaluate popular sparse and dense retrieval models to establish a baseline for future studies. The test collection and source code are publicly available.	
数据集搜索，或者更具体地说，即特定数据集检索，作为一种趋势性的专业信息检索任务，已经受到学术界和工业界越来越多的关注。虽然方法和系统仍在不断发展，但现有的测试集合显示出缺陷，特别是在池中存在词法偏差，并且仅限于用于评估的关键字样式查询。为了解决这些局限性，在本文中，我们构建了 ACORDAR 2.0，一个新的测试集合来完成这个任务，这也是迄今为止最大的一个测试集合。为了减少池中的词汇偏差，我们将密集检索模型适用于大型结构化数据，使用它们来寻找一组扩展的语义相关数据集来进行注释。为了使查询表单多样化，我们使用了一个大型语言模型来将关键字查询重写为高质量的问题样式查询。我们使用测试集来评估流行的稀疏和密集的检索模型，为未来的研究建立基线。测试集合和源代码是公开可用的。	code	0
Reinforcement Learning-based Recommender Systems with Large Language Models for State Reward and Action Modeling	Jie Wang, Alexandros Karatzoglou, Ioannis Arapakis, Joemon M. Jose	Telefonica Research; Google; University of Glasgow	Reinforcement Learning (RL)-based recommender systems have demonstrated promising performance in meeting user expectations by learning to make accurate next-item recommendations from historical user-item interactions. However, existing offline RL-based sequential recommendation methods face the challenge of obtaining effective user feedback from the environment. Effectively modeling the user state and shaping an appropriate reward for recommendation remains a challenge. In this paper, we leverage language understanding capabilities and adapt large language models (LLMs) as an environment (LE) to enhance RL-based recommenders. The LE is learned from a subset of user-item interaction data, thus reducing the need for large training data, and can synthesise user feedback for offline data by: (i) acting as a state model that produces high quality states that enrich the user representation, and (ii) functioning as a reward model to accurately capture nuanced user preferences on actions. Moreover, the LE allows to generate positive actions that augment the limited offline training data. We propose a LE Augmentation (LEA) method to further improve recommendation performance by optimising jointly the supervised component and the RL policy, using the augmented actions and historical user signals. We use LEA, the state and reward models in conjunction with state-of-the-art RL recommenders and report experimental results on two publicly available datasets.	
基于强化学习(rL)的推荐系统通过学习从历史的用户-项目交互中做出准确的下一项推荐，在满足用户期望方面展示了良好的性能。然而，现有的基于离线 RL 的顺序推荐方法面临着从环境中获得有效用户反馈的挑战。有效地建模用户状态并为推荐建立适当的奖励仍然是一个挑战。在本文中，我们利用语言理解能力并将大型语言模型(LLM)作为一个环境(LE)来增强基于 RL 的推荐器。LE 是从用户项目交互数据的子集中学习的，从而减少了对大量训练数据的需求，并且可以通过以下方式综合用户对离线数据的反馈: (i)作为产生丰富用户表示的高质量状态的状态模型，以及(ii)作为奖励模型来准确捕获细微差别的用户行动偏好。此外，LE 允许产生积极的行动，以增加有限的离线训练数据。提出了一种基于增强动作和历史用户信号的 LEA 方法，通过联合优化被监控组件和 RL 策略，进一步提高推荐性能。我们使用 LEA，状态和奖励模型与最先进的 RL 推荐程序结合使用，并在两个公开可用的数据集上报告实验结果。	code	0
OpenP5: An Open-Source Platform for Developing, Training, and Evaluating LLM-based Recommender Systems	Shuyuan Xu, Wenyue Hua, Yongfeng Zhang	Rutgers University	In recent years, the integration of Large Language Models (LLMs) into recommender systems has garnered interest among both practitioners and researchers. Despite this interest, the field is still emerging, and the lack of open-source R&D platforms may impede the exploration of LLM-based recommendations. This paper introduces OpenP5, an open-source platform designed as a resource to facilitate the development, training, and evaluation of LLM-based generative recommender systems for research purposes. The platform is implemented using encoder-decoder LLMs (e.g., T5) and decoder-only LLMs (e.g., Llama-2) across 10 widely recognized public datasets, catering to two fundamental recommendation tasks: sequential and straightforward recommendations. Recognizing the crucial role of item IDs in LLM-based recommendations, we have also incorporated three item indexing methods within the OpenP5 platform: random indexing, sequential indexing and collaborative indexing. Built on the Transformers library, the platform facilitates easy customization of LLM-based recommendations for users. OpenP5 boasts a range of features including extensible data processing, task-centric optimization, comprehensive datasets and checkpoints, efficient acceleration, and standardized evaluations, making it a valuable tool for the implementation and evaluation of LLM-based recommender systems. The open-source code and pre-trained checkpoints for the OpenP5 library are publicly available at https://github.com/agiresearch/OpenP5.	
近年来，将大语言模型(LLM)集成到推荐系统中引起了从业者和研究者的兴趣。尽管有这样的兴趣，该领域仍然在兴起，缺乏开源的研发平台可能会阻碍对基于 LLM 的建议的探索。本文介绍了 OpenP5，这是一个开源平台，设计了一个资源，用于促进基于 LLM 的生成式推荐系统的开发、培训和评估，以供研究之用。该平台使用编码器-解码器 LLM (例如 T5)和解码器-纯 LLM (例如 Llama-2)在10个广泛认可的公共数据集中实现，满足两个基本的推荐任务: 顺序和直接的推荐。认识到项目 ID 在基于 LLM 的推荐中的关键作用，我们还在 OpenP5平台中引入了三种项目索引方法: 随机索引、顺序索引和协作索引。该平台建立在变形金刚库的基础上，便于用户轻松定制基于 LLM 的推荐。OpenP5拥有一系列功能，包括可扩展的数据处理、以任务为中心的优化、全面的数据集和检查点、高效的加速和标准化的评估，使其成为基于 LLM 的推荐系统的实现和评估的有价值的工具。Openp5库的开源代码和经过预先训练的检查点可以通过 https:// github.com/agiresearch/openp5公开获得。	code	0
Fair Recommendations with Limited Sensitive Attributes: A Distributionally Robust Optimization Approach	Tianhao Shi, Yang Zhang, Jizhi Zhang, Fuli Feng, Xiangnan He	University of Science and Technology of China	As recommender systems are indispensable in various domains such as job searching and e-commerce, providing equitable recommendations to users with different sensitive attributes becomes an imperative requirement. Prior approaches for enhancing fairness in recommender systems presume the availability of all sensitive attributes, which can be difficult to obtain due to privacy concerns or inadequate means of capturing these attributes. In practice, the efficacy of these approaches is limited, pushing us to investigate ways of promoting fairness with limited sensitive attribute information. Toward this goal, it is important to reconstruct missing sensitive attributes. Nevertheless, reconstruction errors are inevitable due to the complexity of real-world sensitive attribute reconstruction problems and legal regulations. Thus, we pursue fair learning methods that are robust to reconstruction errors. To this end, we propose Distributionally Robust Fair Optimization (DRFO), which minimizes the worst-case unfairness over all potential probability distributions of missing sensitive attributes instead of the reconstructed one to account for the impact of the reconstruction errors. We provide theoretical and empirical evidence to demonstrate that our method can effectively ensure fairness in recommender systems when only limited sensitive attributes are accessible.	
由于推荐系统在求职搜索和电子商务等各个领域都是不可或缺的，因此向具有不同敏感特征的用户提供公平的推荐成为一项必要的要求。提高推荐系统公平性的先验方法假定所有敏感属性的可用性，由于隐私问题或捕获这些属性的手段不足，这些属性可能难以获得。在实践中，这些方法的效果是有限的，促使我们研究的方法，以促进公平与有限的敏感属性信息。为了实现这个目标，重新构建缺失的敏感属性非常重要。然而，由于现实世界敏感属性重构问题和法律规定的复杂性，重构错误是不可避免的。因此，我们追求对重构错误具有鲁棒性的公平学习方法。为此，我们提出了分布式鲁棒公平优化(DRFO)方法，该方法将缺失敏感属性的所有潜在概率分布的最坏情况不公平性最小化，而不是通过重构来考虑重构错误的影响。我们提供了理论和经验证明来证明我们的方法可以有效地确保推荐系统的公平性，当只有有限的敏感属性可以访问时。	code	0
Large Language Models and Future of Information Retrieval: Opportunities and Challenges	ChengXiang Zhai	University of Illinois at Urbana-Champaign, Urbana, IL, USA	Recent years have seen great success of large language models (LLMs) in performing many natural language processing tasks with impressive performance, including tasks that directly serve users such as question answering and text summarization. They open up unprecedented opportunities for transforming information retrieval (IR) research and applications. However, concerns such as hallucination undermine their trustworthiness, limiting their actual utility when deployed in real-world applications, especially high-stake applications where trust is vital. How can we both exploit the strengths of LLMs and mitigate any risk caused by their weaknesses when applying LLMs to IR? What are the best opportunities for us to apply LLMs to IR? What are the major challenges that we will need to address in the future to fully exploit such opportunities? Given the anticipated growth of LLMs, what will future information retrieval systems look like? Will LLMs eventually replace an IR system? In this perspective paper, we examine these questions and provide provisional answers to them. We argue that LLMs will not be able to replace search engines, and future LLMs would need to learn how to use a search engine so that they can interact with a search engine on behalf of users. We conclude with a set of promising future research directions in applying LLMs to IR.	近年来，大型语言模型(LLM)在执行许多自然语言处理任务方面取得了巨大的成功，其性能令人印象深刻，包括直接为用户服务的任务，如问题回答和文本摘要。它们为信息检索研究和应用的转型开辟了前所未有的机遇。然而，幻觉等担忧会破坏它们的可信度，限制了它们在实际应用程序中的实际效用，特别是在信任至关重要的高风险应用程序中。在将 LLM 应用于 IR 时，我们如何既利用 LLM 的优势，又减少由于它们的弱点而造成的风险？我们将 LLM 应用于 IR 的最佳机会是什么？为了充分利用这些机会，我们今后需要应对哪些重大挑战？考虑到 LLM 的预期增长，未来的信息检索系统会是什么样子？LLM 最终会取代 IR 系统吗？在这篇观点文章中，我们考察了这些问题并提供了初步的答案。我们认为 LLM 将无法取代搜索引擎，未来的 LLM 将需要学习如何使用搜索引擎，以便能够代表用户与搜索引擎交互。最后，我们提出了一系列有希望的将 LLM 应用于 IR 的研究方向。	code	0
Planning Ahead in Generative Retrieval: Guiding Autoregressive Generation through Simultaneous Decoding	Hansi Zeng, Chen Luo, Hamed Zamani	University of Massachusetts Amherst; Amazon	This paper introduces PAG-a novel optimization and decoding approach that guides autoregressive generation of document identifiers in generative retrieval models through simultaneous decoding. To this aim, PAG constructs a set-based and sequential identifier for each document. Motivated by the bag-of-words assumption in information retrieval, the set-based identifier is built on lexical tokens. The sequential identifier, on the other hand, is obtained via quantizing relevance-based representations of documents. Extensive experiments on MSMARCO and TREC Deep Learning Track data reveal that PAG outperforms the state-of-the-art generative retrieval model by a large margin (e.g., 15.6% MRR improvements on MS MARCO), while achieving 22x speed up in terms of query latency.	本文介绍了一种新的优化和解码方法 PAG，它通过同时解码引导生成检索模型中文档标识符的自回归生成。为此，PAG 为每个文档构造一个基于集合的顺序标识符。由于信息检索中的词袋假设，基于集合的标识符建立在词汇标记之上。序列标识符则是通过量化基于相关性的文档表示来获得的。对 MSMARCO 和 TREC 深度学习跟踪数据的广泛实验表明，PAG 比最先进的生成检索模型有很大优势(例如，对 MS MARCO 的 MRR 改进为15.6%) ，同时在查询延迟方面提高了22倍的速度。	code	0
Course Recommender Systems Need to Consider the Job Market	Jibril Frej, Anna Dai, Syrielle Montariol, Antoine Bosselut, Tanja Käser	EPFL	Current course recommender systems primarily leverage learner-course interactions, course content, learner preferences, and supplementary course details like instructor, institution, ratings, and reviews, to make their recommendation. However, these systems often overlook a critical aspect: the evolving skill demand of the job market. This paper focuses on the perspective of academic researchers, working in collaboration with the industry, aiming to develop a course recommender system that incorporates job market skill demands. In light of the job market's rapid changes and the current state of research in course recommender systems, we outline essential properties for course recommender systems to address these demands effectively, including explainable, sequential, unsupervised, and aligned with the job market and user's goals. Our discussion extends to the challenges and research questions this objective entails, including unsupervised skill extraction from job listings, course descriptions, and resumes, as well as predicting recommendations that align with learner objectives and the job market and designing metrics to evaluate this alignment. Furthermore, we introduce an initial system that addresses some existing limitations of course recommender systems using large Language Models (LLMs) for skill extraction and Reinforcement Learning (RL) for alignment with the job market. We provide empirical results using open-source data to demonstrate its effectiveness.	
当前的课程推荐系统主要利用学习者-课程互动、课程内容、学习者偏好和补充课程细节(如教师、机构、评分和评论)来进行推荐。然而，这些系统往往忽略了一个关键方面: 就业市场不断变化的技能需求。本文聚焦于学术研究人员的视角，与行业合作，旨在开发一个包含就业市场技能需求的课程推荐系统。鉴于就业市场的快速变化和研究课程推荐系统的现状，我们概述了课程/推荐系统有效满足这些需求的基本属性，包括可解释的、有序的、无监督的、与就业市场和用户目标一致的。我们的讨论延伸到这一目标所涉及的挑战和研究问题，包括从工作列表中无监督地提取技能，课程描述和简历，以及预测建议，与学习者的目标和就业市场和设计指标来评估这种一致性。此外，我们还介绍了一个初始系统，该系统解决了课程推荐系统的一些现有局限性，它使用大型语言模型(LLM)来提取技能，使用强化学习(RL)来与就业市场保持一致。我们使用开源数据提供了实证结果来证明其有效性。	code	0
Leave No Patient Behind: Enhancing Medication Recommendation for Rare Disease Patients	Zihao Zhao, Yi Jing, Fuli Feng, Jiancan Wu, Chongming Gao, Xiangnan He	University of Science and Technology of China	Medication recommendation systems have gained significant attention in healthcare as a means of providing tailored and effective drug combinations based on patients' clinical information. However, existing approaches often suffer from fairness issues, as recommendations tend to be more accurate for patients with common diseases compared to those with rare conditions. In this paper, we propose a novel model called Robust and Accurate REcommendations for Medication (RAREMed), which leverages the pretrain-finetune learning paradigm to enhance accuracy for rare diseases. RAREMed employs a transformer encoder with a unified input sequence approach to capture complex relationships among disease and procedure codes. Additionally, it introduces two self-supervised pre-training tasks, namely Sequence Matching Prediction (SMP) and Self Reconstruction (SR), to learn specialized medication needs and interrelations among clinical codes. Experimental results on two real-world datasets demonstrate that RAREMed provides accurate drug sets for both rare and common disease patients, thereby mitigating unfairness in medication recommendation systems.	药物推荐系统作为一种根据患者临床信息提供量身定制的有效药物组合的手段，在医疗保健领域引起了广泛的关注。然而，现有的方法往往存在公平性问题，因为与罕见疾病患者相比，针对常见疾病患者的推荐往往更为准确。在本文中，我们提出了一个新的模型，称为健壮和准确的推荐用药(RAREMed) ，它利用预训练微调学习范式，以提高罕见疾病的推荐准确性。RAREMed 使用了一个具有统一输入序列方法的 Transformer 编码器来捕获疾病编码和手术编码之间的复杂关系。此外，还引入了序列匹配预测(SMP)和自我重构(SR)两个自监督的预训练任务，以学习专门的用药需求和临床编码之间的相互关系。在两个真实世界数据集上的实验结果表明，RAREMed 为罕见和常见疾病患者提供了准确的药物组合，从而减轻了药物推荐系统中的不公平性。	code	0
MIND Your Language: A Multilingual Dataset for Cross-lingual News Recommendation	Andreea Iana, Goran Glavas, Heiko Paulheim	University of Würzburg; University of Mannheim	Digital news platforms use news recommenders as the main instrument to cater to the individual information needs of readers. Despite an increasingly language-diverse online community, in which many Internet users consume news in multiple languages, the majority of news recommendation focuses on major, resource-rich languages, and English in particular. Moreover, nearly all news recommendation efforts assume monolingual news consumption, whereas more and more users tend to consume information in at least two languages. Accordingly, the existing body of work on news recommendation suffers from a lack of publicly available multilingual benchmarks that would catalyze development of news recommenders effective in multilingual settings and for low-resource languages. Aiming to fill this gap, we introduce xMIND, an open, multilingual news recommendation dataset derived from the English MIND dataset using machine translation, covering a set of 14 linguistically and geographically diverse languages, with digital footprints of varying sizes. Using xMIND, we systematically benchmark several state-of-the-art content-based neural news recommenders (NNRs) in both zero-shot (ZS-XLT) and few-shot (FS-XLT) cross-lingual transfer scenarios, considering both monolingual and bilingual news consumption patterns. Our findings reveal that (i) current NNRs, even when based on a multilingual language model, suffer from substantial performance losses under ZS-XLT and that (ii) inclusion of target-language data in FS-XLT training has limited benefits, particularly when combined with a bilingual news consumption. Our findings thus warrant a broader research effort in multilingual and cross-lingual news recommendation. The xMIND dataset is available at https://github.com/andreeaiana/xMIND.	
数字新闻平台以新闻推荐为主要工具，满足读者的个性化信息需求。尽管网络社区的语言越来越多样化，许多互联网用户使用多种语言阅读新闻，但大多数新闻推荐都集中在主要的、资源丰富的语言上，尤其是英语。此外，几乎所有的新闻推荐都假定消费者只使用一种语言，而越来越多的用户倾向于使用至少两种语言的信息。因此，关于新闻推荐的现有工作缺乏公开提供的多语种基准，这些基准将促进在多语种环境和低资源语言中有效的新闻推荐器的发展。为了填补这个空白，我们引入了 xMIND，一个开放的多语言新闻推荐数据集，它由英语 MIND 数据集通过机器翻译得到，涵盖了14种在语言和地理上各不相同的语言，数字足迹大小不一。使用 xMIND，我们系统地对几个最先进的基于内容的神经新闻推荐器(NNR)在零样本(ZS-XLT)和少样本(FS-XLT)跨语言迁移场景下进行基准测试，同时考虑单语和双语新闻消费模式。我们的研究结果显示，(i)目前的 NNR，即使基于多语言模型，在 ZS-XLT 下也遭受了显着的性能损失，并且(ii)在 FS-XLT 训练中包含目标语言数据的益处有限，特别是与双语新闻消费相结合时。因此，我们的研究结果表明，多语言和跨语言的新闻推荐值得更广泛的研究工作。xMIND 数据集可在 https://github.com/andreeaiana/xMIND 下载。	code	0
Steering Large Language Models for Cross-lingual Information Retrieval	Ping Guo, Yubing Ren, Yue Hu, Yanan Cao, Yunpeng Li, Heyan Huang	Institute of Information Engineering, Chinese Academy of Sciences; Beijing Institute of Technology, Beijing, China	In today's digital age, accessing information across language barriers poses a significant challenge, with conventional search systems often struggling to interpret and retrieve multilingual content accurately. Addressing this issue, our study introduces a novel integration of applying Large Language Models (LLMs) as Cross-lingual Readers in information retrieval systems, specifically targeting the complexities of cross-lingual information retrieval (CLIR). We present an innovative approach: Activation Steered Multilingual Retrieval (ASMR) that employs "steering activations", a method to adjust and direct the LLM's focus, enhancing its ability to understand user queries and generate accurate, language-coherent responses. ASMR adeptly combines a Multilingual Dense Passage Retrieval (mDPR) system with an LLM, overcoming the limitations of traditional search engines in handling diverse linguistic inputs. This approach is particularly effective in managing the nuances and intricacies inherent in various languages. Rigorous testing on established benchmarks such as XOR-TyDi QA and MKQA demonstrates that ASMR not only meets but surpasses existing standards in CLIR, achieving state-of-the-art performance. The results of our research hold significant implications for understanding the inherent features of how LLMs understand and generate natural languages, offering an attempt towards more inclusive, effective, and linguistically diverse information access on a global scale.	
在当今的数字时代，跨越语言障碍获取信息构成了重大挑战，传统的搜索系统往往难以准确地解释和检索多语种内容。针对这个问题，我们的研究介绍了一个新的整合应用大语言模型(LLMs)作为跨语言读者在信息检索系统，特别是针对跨语言信息检索(CLIR)的复杂性。我们提出了一个创新的方法: 激活导向多语言检索(ASMR) ，采用“导向激活”-一种方法来调整和指导 LLM 的重点-增强其能力，理解用户的查询，并产生准确的，语言一致的反应。ASMR 巧妙地将多语言密集通道检索(mDPR)系统与 LLM 相结合，克服了传统搜索引擎在处理多种语言输入时的局限性。这种方法在管理各种语言固有的细微差别和复杂性方面特别有效。对 XOR-tyDi QA 和 MKQA 等既定基准的严格测试表明，ASMR 不仅符合而且超越了 CLIR 现有的标准，实现了最先进的性能。我们的研究结果对于理解 LLM 理解和生成自然语言的内在特征具有重要意义，为在全球范围内获取更具包容性、有效性和语言多样性的信息提供了一种尝试。	code	0
DAC: Quantized Optimal Transport Reward-based Reinforcement Learning Approach to Detoxify Query Auto-Completion	Aishwarya Maheswaran, Kaushal Kumar Maurya, Manish Gupta, Maunendra Sankar Desarkar	Indian Institute of Technology Hyderabad, Hyderabad, India; Microsoft Corporation, Hyderabad, India	Modern Query Auto-Completion (QAC) systems utilize natural language generation (NLG) using large language models (LLM) to achieve remarkable performance. However, these systems are prone to generating biased and toxic completions due to inherent learning biases. Existing detoxification approaches exhibit two key limitations: (1) They primarily focus on mitigating toxicity for grammatically well-formed long sentences but struggle to adapt to the QAC task, where queries are short and structurally different (include spelling errors, do not follow grammatical rules and have relatively flexible word order). (2) These approaches often view detoxification through a binary lens where all text labeled as toxic is undesirable, and non-toxic is considered desirable. To address these limitations, we propose DAC, an intuitive and efficient reinforcement learning-based model to detoxify QAC. With DAC, we introduce an additional perspective of considering the third query class of addressable toxicity. These queries can encompass implicit toxicity, subjective toxicity, or non-toxic queries containing toxic words. We incorporate this three-class query behavior perspective into the proposed model through quantized optimal transport to learn distinctions and generate truly non-toxic completions. We evaluate toxicity levels in the generated completions by DAC across two real-world QAC datasets (Bing and AOL) using two classifiers: a publicly available generic classifier (Detoxify) and a search query-specific classifier, which we develop (TClassify). 
We find that DAC consistently outperforms all existing baselines on the Bing dataset and achieves competitive performance on the AOL dataset for query detoxification. We make the code publicly available.	现代查询自动完成(QAC)系统利用自然语言生成(NLG)和大语言模型(LLM)来实现显著的查询性能。然而，由于固有的学习偏差，这些系统容易产生有偏见和有毒的完成。现有的排毒方法表现出两个关键的局限性: (1)它们主要集中在缓解语法结构良好的长句的毒性，但难以适应 QAC 任务，其中查询是短暂的和结构不同的(包括拼写错误，不遵循语法规则，并具有相对灵活的语序)。(2)这些方法通常通过一个二元透镜来看待解毒问题，其中所有标记为有毒的文本都是不可取的，而无毒的文本则被认为是可取的。为了解决这些局限性，我们提出 DAC，一个直观和有效的强化学习为基础的模型来解毒 QAC。使用 DAC，我们引入了考虑可寻址毒性的第三个查询类的另一个视角。这些查询可以包含隐性毒性、主观毒性或包含毒性词的无毒性查询。我们通过量化的最优传输将这三类查询行为视角整合到提出的模型中，以学习区别并产生真正的无毒完成。我们使用两个分类器(公开可用的通用分类器(Detoxify)和我们开发的搜索查询特定分类器(TClassify))来评估 DAC 在两个实际 QAC 数据集(Bing 和 AOL)中生成的完成中的毒性水平。我们发现 DAC 始终优于 Bing 数据集上所有现有的基线，并在 AOL 数据集上实现了具有竞争力的查询解毒性能。我们公开代码。	code	0
IM-RAG: Multi-Round Retrieval-Augmented Generation Through Learning Inner Monologues	Diji Yang, Jinmeng Rao, Kezhen Chen, Xiaoyuan Guo, Yawen Zhang, Jie Yang, Yi Zhang	University of California Santa Cruz; Mineral.ai	Although the Retrieval-Augmented Generation (RAG) paradigms can use external knowledge to enhance and ground the outputs of Large Language Models (LLMs) to mitigate generative hallucinations and static knowledge base problems, they still suffer from limited flexibility in adopting Information Retrieval (IR) systems with varying capabilities, constrained interpretability during the multi-round retrieval process, and a lack of end-to-end optimization. To address these challenges, we propose a novel LLM-centric approach, IM-RAG, that integrates IR systems with LLMs to support multi-round RAG through learning Inner Monologues (IM, i.e., the human inner voice that narrates one's thoughts). During the IM process, the LLM serves as the core reasoning model (i.e., Reasoner) to either propose queries to collect more information via the Retriever or to provide a final answer based on the conversational context. We also introduce a Refiner that improves the outputs from the Retriever, effectively bridging the gap between the Reasoner and IR modules with varying capabilities and fostering multi-round communications. The entire IM process is optimized via Reinforcement Learning (RL) where a Progress Tracker is incorporated to provide mid-step rewards, and the answer prediction is further separately optimized via Supervised Fine-Tuning (SFT). We conduct extensive experiments with the HotPotQA dataset, a popular benchmark for retrieval-based, multi-step question-answering. The results show that our approach achieves state-of-the-art (SOTA) performance while providing high flexibility in integrating IR modules as well as strong interpretability exhibited in the learned inner monologues.	
虽然检索增强生成(Retrieval-Augmented Generation，RAG)范式可以使用外部知识来增强和巩固大型语言模型(Large Language Model，LLM)的输出，以减少生成幻觉和静态知识库问题，但是它们在采用具有不同功能的信息检索系统方面的灵活性仍然有限，在多轮检索过程中的可解释性受到限制，以及缺乏端到端优化。为了应对这些挑战，我们提出了一种新的以 LLM 为中心的方法，IM-RAG，它将 IR 系统与 LLM 集成在一起，通过学习内部独白(IM，也就是讲述一个人思想的人类内心声音)来支持多轮 RAG。在 IM 过程中，LLM 作为核心推理模型(即推理器)提出查询以通过检索器收集更多信息，或者根据会话上下文提供最终答案。我们还引入了一个精炼器(Refiner)来改善检索器的输出，有效地弥合推理器与能力各异的 IR 模块之间的差距，并促进多轮通信。整个 IM 过程通过强化学习(RL)进行优化，其中包含一个进度跟踪器来提供中间步骤的奖励，而答案预测则通过监督微调(SFT)进一步单独优化。我们对 HotPotQA 数据集进行了广泛的实验，这是一个基于检索的多步骤问答的流行基准。结果表明，我们的方法实现了最先进的(SOTA)性能，同时在集成 IR 模块方面提供了高度的灵活性，并在学习到的内部独白中表现出很强的可解释性。	code	0
Towards Human-centered Proactive Conversational Agents	Yang Deng, Lizi Liao, Zhonghua Zheng, Grace Hui Yang, TatSeng Chua	Georgetown University; National University of Singapore; Harbin Institute of Technology, Shenzhen; Singapore Management University	Recent research on proactive conversational agents (PCAs) mainly focuses on improving the system's capabilities in anticipating and planning action sequences to accomplish tasks and achieve goals before users articulate their requests. This perspectives paper highlights the importance of moving towards building human-centered PCAs that emphasize human needs and expectations, and that considers ethical and social implications of these agents, rather than solely focusing on technological capabilities. The distinction between a proactive and a reactive system lies in the proactive system's initiative-taking nature. Without thoughtful design, proactive systems risk being perceived as intrusive by human users. We address the issue by establishing a new taxonomy concerning three key dimensions of human-centered PCAs, namely Intelligence, Adaptivity, and Civility. We discuss potential research opportunities and challenges based on this new taxonomy upon the five stages of PCA system construction. This perspectives paper lays a foundation for the emerging area of conversational information retrieval research and paves the way towards advancing human-centered proactive conversational systems.	主动会话代理(PCA)的研究主要集中在提高系统预测和规划动作序列的能力，以便在用户提出要求之前完成任务和实现目标。这篇观点论文强调了建立以人为中心的 PCA 的重要性，这种 PCA 强调人的需求和期望，并考虑到这些代理的伦理和社会影响，而不是仅仅关注技术能力。主动系统与被动系统的区别在于主动系统的主动性。如果没有经过深思熟虑的设计，主动系统就有被人类用户认为是侵入性的风险。我们通过建立一个关于以人为中心的 PCA 的三个关键维度的新分类来解决这个问题，即智力、适应性和文明性。我们基于这一新分类，围绕 PCA 系统构建的五个阶段讨论潜在的研究机会和挑战。这篇观点论文为会话信息检索研究的新兴领域奠定了基础，并为推进以人为中心的主动会话系统铺平了道路。	code	0
TREC iKAT 2023: A Test Collection for Evaluating Conversational and Interactive Knowledge Assistants	Mohammad Aliannejadi, Zahra Abbasiantaeb, Shubham Chatterjee, Jeffrey Dalton, Leif Azzopardi	University of Edinburgh; University of Amsterdam; University of Strathclyde	Conversational information seeking has evolved rapidly in the last few years with the development of Large Language Models (LLMs), providing the basis for interpreting and responding in a naturalistic manner to user requests. The extended TREC Interactive Knowledge Assistance Track (iKAT) collection aims to enable researchers to test and evaluate their Conversational Search Agents (CSA). The collection contains a set of 36 personalized dialogues over 20 different topics each coupled with a Personal Text Knowledge Base (PTKB) that defines the bespoke user personas. A total of 344 turns with approximately 26,000 passages are provided as assessments on relevance, as well as additional assessments on generated responses over four key dimensions: relevance, completeness, groundedness, and naturalness. The collection challenges CSA to efficiently navigate diverse personal contexts, elicit pertinent persona information, and employ context for relevant conversations. The integration of a PTKB and the emphasis on decisional search tasks contribute to the uniqueness of this test collection, making it an essential benchmark for advancing research in conversational and interactive knowledge assistants.	近年来，随着大型语言模型(LLM)的发展，会话信息搜索得到了迅速的发展，为用户的自然解释和响应提供了基础。扩展的 TREC 交互式知识援助跟踪(iKAT)集旨在使研究人员能够测试和评估他们的对话搜索代理(CSA)。该集合包含20个不同主题的36个个性化对话集，每个对话集还有一个定义定制用户角色的个人文本知识库(Personal Text Knowledge Base，PTKB)。总共提供了344个回合，大约26,000个段落，作为相关性评估，以及对四个关键维度(相关性，完整性，基础性和自然性)产生的反应的额外评估。这个系列向 CSA 提出了挑战，要求它能够有效地浏览不同的个人背景，获取相关的人物信息，并为相关的对话使用背景。PTKB 的集成和对决策搜索任务的强调有助于该测试集的独特性，使其成为推进会话和交互式知识助手研究的必要基准。	code	0
UGNCL: Uncertainty-Guided Noisy Correspondence Learning for Efficient Cross-Modal Matching	Quanxing Zha, Xin Liu, Yiuming Cheung, Xing Xu, Nannan Wang, Jianjia Cao	Xidian University; University of Electronic Science and Technology of China; Hong Kong Baptist University; Huaqiao University	Cross-modal matching has recently gained significant popularity to facilitate retrieval across multi-modal data, and existing works are highly relied on an implicit assumption that the training data pairs are perfectly aligned. However, such an ideal assumption is extremely impossible due to the inevitably mismatched data pairs, a.k.a. noisy correspondence, which can wrongly enforce the mismatched data to be similar and thus induces the performance degradation. Although some recent methods have attempted to address this problem, they still face two challenging issues: 1) unreliable data division for training inefficiency and 2) unstable prediction for matching failure. To address these problems, we propose an efficient Uncertainty-Guided Noisy Correspondence Learning (UGNCL) framework to achieve noise-robust cross-modal matching. Specifically, a novel Uncertainty Guided Division (UGD) algorithm is reliably designed leverage the potential benefits of derived uncertainty to divide the data into clean, noisy and hard partitions, which can effortlessly mitigate the impact of easily-determined noisy pairs. Meanwhile, an efficient Trusted Robust Loss (TRL) is explicitly designed to recast the soft margins, calibrated by confident yet error soft correspondence labels, for the data pairs in the hard partition through the uncertainty, leading to increase/decrease the importance of matched/mismatched pairs and further alleviate the impact of noisy pairs for robustness improvement. Extensive experiments conducted on three public datasets highlight the superiorities of the proposed framework, and show its competitive performance compared with the state-of-the-arts. 
The code is available at https://github.com/qxzha/UGNCL.	跨模态匹配近年来在多模态数据检索领域得到了广泛的应用，现有的研究大多依赖于训练数据对完美对齐的假设。然而，这种理想的假设是极其不可能的，因为不可避免的不匹配的数据对，也就是噪声对应，会错误地强迫不匹配的数据相似，从而导致性能下降。虽然最近的一些方法已经尝试解决这个问题，但仍然面临两个挑战: 1)不可靠的数据划分训练效率低下和2)不稳定的预测匹配失败。为了解决这些问题，我们提出了一种有效的不确定引导噪声对应学习(UGNCL)框架来实现噪声鲁棒的跨模态匹配。具体地说，一种新的不确定性引导分割(UGD)算法是可靠地设计的，利用导出的不确定性的潜在好处，将数据划分为干净的、有噪声的和硬的分区，这可以毫不费力地减轻容易确定的有噪声对的影响。同时，设计了一种有效的可信鲁棒损失(TRL)算法，通过不确定性重铸硬分区数据对的软边界，使得匹配/不匹配对的重要性增加/减小，进一步减轻噪声对鲁棒性改善的影响。在三个公共数据集上进行的大量实验突出了该框架的优越性，并显示了其与最新技术相比的竞争性能。代码可在 https://github.com/qxzha/UGNCL 查阅。	code	0
DHMAE: A Disentangled Hypergraph Masked Autoencoder for Group Recommendation	Yingqi Zhao, Haiwei Zhang, Qijie Bai, Changli Nie, Xiaojie Yuan	Nankai University	Group recommendation aims to suggest items to a group of users that are suitable for the group. Although some existing powerful deep learning models have achieved improved performance, various aspects remain unexplored: (1) Most existing models using contrastive learning tend to rely on high-quality data augmentation which requires precise contrastive view generation; (2) There is multifaceted natural noise in group recommendation, and additional noise is introduced during data augmentation; (3) Most existing hypergraph neural network-based models over-entangle the information of members and items, ignoring their unique characteristics. In light of this, we propose a highly effective Disentangled Hypergraph Masked Auto Encoder-enhanced method for group recommendation (DHMAE), combining a disentangled hypergraph neural network with a graph masked autoencoder. This approach creates self-supervised signals without data augmentation by masking the features of some nodes and hyperedges and then reconstructing them. For the noise problem, we design a masking strategy that relies on pre-computed degree-sensitive probabilities for the process of masking features. Furthermore, we propose a disentangled hypergraph neural network for group recommendation scenarios to extract common messages of members and items and disentangle them during the convolution process. Extensive experiments demonstrate that our method significantly outperforms state-of-the-art models and effectively addresses the noise issue.	
群组推荐旨在向一组用户推荐适合该群组的项目。虽然现有的一些强大的深度学习模型已经取得了改善的性能，但是各个方面仍然没有得到探索: (1)大多数使用对比学习的现有模型倾向于依赖于高质量的数据增强，这需要精确的对比视图生成; (2)在群体推荐中存在多方面的自然噪声，并且在数据增强过程中引入了额外的噪声; (3)大多数现有的基于超图神经网络的模型过度纠缠成员和项目的信息，忽略了它们的独特特征。鉴于此，我们提出了一种高效的用于群组推荐的解缠超图掩码自动编码器增强方法(DHMAE)，该方法将解缠超图神经网络与图掩码自动编码器相结合。该方法通过掩盖某些节点和超边的特征，然后对其进行重构，在不进行数据增强的情况下产生自监督信号。针对噪声问题，我们设计了一种基于预先计算的度敏感概率的掩蔽策略。在此基础上，提出了一种用于群体推荐场景的解缠超图神经网络，用于在卷积过程中提取成员和项目的共同信息并进行解缠。广泛的实验表明，我们的方法显着优于最先进的模型，并有效地解决噪声问题。	code	0
Are We Really Achieving Better Beyond-Accuracy Performance in Next Basket Recommendation?	Ming Li, Yuanna Liu, Sami Jullien, Mozhdeh Ariannezhad, Andrew Yates, Mohammad Aliannejadi, Maarten de Rijke	University of Amsterdam; Universiteit van Amsterdam	Next basket recommendation (NBR) is a special type of sequential recommendation that is increasingly receiving attention. So far, most NBR studies have focused on optimizing the accuracy of the recommendation, whereas optimizing for beyond-accuracy metrics, e.g., item fairness and diversity, remains largely unexplored. Recent studies into NBR have found a substantial performance difference between recommending repeat items and explore items. Repeat items contribute most of the users' perceived accuracy compared with explore items. Informed by these findings, we identify a potential "short-cut" to optimize for beyond-accuracy metrics while maintaining high accuracy. To leverage and verify the existence of such short-cuts, we propose a plug-and-play two-step repetition-exploration (TREx) framework that treats repeat items and explores items separately, where we design a simple yet highly effective repetition module to ensure high accuracy, while two exploration modules target optimizing only beyond-accuracy metrics. Experiments are performed on two widely-used datasets w.r.t. a range of beyond-accuracy metrics, viz. five fairness metrics and three diversity metrics. Our experimental results verify the effectiveness of TREx. Prima facie, this appears to be good news: we can achieve high accuracy and improved beyond-accuracy metrics at the same time. However, we argue that the real-world value of our algorithmic solution, TREx, is likely to be limited and reflect on the reasonableness of the evaluation setup. 
We end up challenging existing evaluation paradigms, particularly in the context of beyond-accuracy metrics, and provide insights for researchers to navigate potential pitfalls and determine reasonable metrics to consider when optimizing for accuracy and beyond-accuracy metrics.	下一个篮子推荐(NBR)是一种特殊类型的顺序推荐，越来越受到关注。到目前为止，大多数 NBR 研究的重点是优化推荐的准确性，而对于超准确度指标的优化，例如，项目公平性和多样性仍然在很大程度上没有被探索。最近对 NBR 的研究发现，在推荐重复项目和探索项目之间存在显著的性能差异。与探索项目相比，重复项目贡献了大部分用户的感知准确性。根据这些发现，我们确定了一个潜在的“捷径”来优化超精度度量，同时保持高精度。为了利用和验证这种捷径的存在，我们提出了即插即用的两步重复探索(TREx)框架，它分别处理重复项目和探索项目，其中我们设计了一个简单而高效的重复模块以确保高精度，而两个探索模块的目标仅仅是优化超越精度的度量。实验在两个广泛使用的数据集上针对一系列超精度指标进行，即五个公平性指标和三个多样性指标。实验结果验证了 TREx 的有效性。初步看来，这似乎是个好消息: 我们可以同时实现高精度和改进超精度度量。然而，我们认为，我们的算法解决方案 TREx 的现实价值可能是有限的，并反思了评估设置的合理性。我们最终挑战现有的评估范式，特别是在超精确度指标的背景下，并为研究人员提供洞察力，以规避潜在的陷阱，并确定在优化准确性和超精确度指标时应考虑的合理指标。	code	0
AutoDCS: Automated Decision Chain Selection in Deep Recommender Systems	Dugang Liu, Shenxian Xian, Yuhao Wu, Chaohua Yang, Xing Tang, Xiuqiang He, Zhong Ming	Tencent; FiT,Tencent; College of Computer Science and Software Engineering, Shenzhen University; Shenzhen University	Multi-behavior recommender systems (MBRS) have been commonly deployed on real-world industrial platforms for their superior advantages in understanding user preferences and mitigating data sparsity. However, the cascade graph modeling paradigm adopted in mainstream MBRS usually assumes that users will refer to all types of behavioral knowledge they have when making decisions about target behaviors, i.e., use all types of behavioral interactions indiscriminately when modeling and predicting target behaviors for each user. We call this a full decision chain constraint and argue that it may be too strict by ignoring that different types of behavioral knowledge have varying importance for different users. In this paper, we propose a novel automated decision chain selection (AutoDCS) framework to relax this constraint, which can consider each user's unique decision dependencies and select a reasonable set of behavioral knowledge to activate for the prediction of target behavior. Specifically, AutoDCS first integrates some existing MBRS methods in a base cascade module to obtain a set of behavior-aware embeddings. Then, a bilateral matching gating mechanism is used to select an exclusive set of behaviors for the current user-item pair to form a decision chain, and the corresponding behavior-augmented embeddings are selectively activated. Subsequently, AutoDCS combines the behavior-augmented and original behavior-aware embeddings to predict the target behavior. Finally, we evaluate AutoDCS and demonstrate its effectiveness through experiments over four public multi-behavior benchmarks.	
多行为推荐系统(MBRS)因其在理解用户偏好和减少数据稀疏性方面的优越性而广泛应用于现实世界的工业平台。然而，主流 MBRS 采用的级联图建模范式通常假定用户在决策目标行为时会参考他们所拥有的所有类型的行为知识，即在为每个用户建模和预测目标行为时不加区分地使用所有类型的行为交互。我们称之为完全决策链约束，并认为它可能过于严格，忽略了不同类型的行为知识对不同的用户有不同的重要性。本文提出了一种新的自动决策链选择框架(AutoDCS) ，该框架可以考虑每个用户独特的决策依赖关系，并选择一组合理的行为知识来激活目标行为的预测。具体来说，AutoDCS 首先将一些现有的 MBRS 方法集成到一个基本级联模块中，以获得一组行为感知的嵌入。然后，利用双边匹配门控机制为当前的用户项对选择一组排他的行为，形成决策链，并选择性地激活相应的行为增强嵌入。随后，AutoDCS 将行为增强嵌入和原始行为感知嵌入相结合，对目标行为进行预测。最后，我们通过四个公共多行为基准测试对 AutoDCS 进行了评估并验证了其有效性。	code	0
EasyRL4Rec: An Easy-to-use Library for Reinforcement Learning Based Recommender Systems	Yuanqing Yu, Chongming Gao, Jiawei Chen, Heng Tang, Yuefeng Sun, Qian Chen, Weizhi Ma, Min Zhang	Zhejiang University; University of Science and Technology of China; Tsinghua University	Reinforcement Learning (RL)-Based Recommender Systems (RSs) have gained rising attention for their potential to enhance long-term user engagement. However, research in this field faces challenges, including the lack of user-friendly frameworks, inconsistent evaluation metrics, and difficulties in reproducing existing studies. To tackle these issues, we introduce EasyRL4Rec, an easy-to-use code library designed specifically for RL-based RSs. This library provides lightweight and diverse RL environments based on five public datasets and includes core modules with rich options, simplifying model development. It provides unified evaluation standards focusing on long-term outcomes and offers tailored designs for state modeling and action representation for recommendation scenarios. Furthermore, we share our findings from insightful experiments with current methods. EasyRL4Rec seeks to facilitate the model development and experimental process in the domain of RL-based RSs. The library is available for public use.	基于强化学习的推荐系统(RSs)因其增强长期用户参与度的潜力而受到越来越多的关注。然而，该领域的研究面临着挑战，包括缺乏用户友好的框架，不一致的评估指标，以及难以复制现有的研究。为了解决这些问题，我们介绍了 EasyRL4Rec，这是一个专门为基于 RL 的 RSs 设计的易于使用的代码库。该库基于五个公共数据集提供轻量级和多样化的 RL 环境，并包括具有丰富选项的核心模块，从而简化了模型开发。它提供了注重长期结果的统一评估标准，并为推荐场景的状态建模和行动表示提供了量身定制的设计。此外，我们分享了对现有方法进行深入实验所得到的发现。EasyRL4Rec 致力于促进基于 RL 的 RSs 领域的模型开发和实验过程。该库可供公众使用。	code	0
Explainability for Transparent Conversational Information-Seeking	Weronika Lajewska, Damiano Spina, Johanne Trippas, Krisztian Balog	University of Stavanger; RMIT University	The increasing reliance on digital information necessitates advancements in conversational search systems, particularly in terms of information transparency. While prior research in conversational information-seeking has concentrated on improving retrieval techniques, the challenge remains in generating responses useful from a user perspective. This study explores different methods of explaining the responses, hypothesizing that transparency about the source of the information, system confidence, and limitations can enhance users' ability to objectively assess the response. By exploring transparency across explanation type, quality, and presentation mode, this research aims to bridge the gap between system-generated responses and responses verifiable by the user. We design a user study to answer questions concerning the impact of (1) the quality of explanations enhancing the response on its usefulness and (2) ways of presenting explanations to users. The analysis of the collected data reveals lower user ratings for noisy explanations, although these scores seem insensitive to the quality of the response. Inconclusive results on the explanations presentation format suggest that it may not be a critical factor in this setting.	对数字信息的日益依赖使得会话搜索系统的发展成为必然，特别是在信息透明度方面。虽然以前对会话信息搜索的研究集中在提高检索技术，但是挑战仍然是从用户的角度产生有用的反应。本研究探讨了解释响应的不同方法，假设信息来源的透明度、系统信心和局限性可以提高用户客观评估响应的能力。通过探索解释类型、质量和表达模式之间的透明度，本研究旨在弥合系统生成的响应和用户可验证的响应之间的差距。我们设计了一个用户研究来回答以下问题: (1)解释的质量提高了回答的有用性; (2)向用户提供解释的方式。对收集到的数据进行分析后发现，尽管这些分数似乎对回复的质量不敏感，但用户对噪音解释的评分较低。关于解释说明格式的不确定结果表明，它可能不是这种情况下的一个关键因素。	code	0
Evaluating Search System Explainability with Psychometrics and Crowdsourcing	Catherine Chen, Carsten Eickhoff	Brown University; University of Tübingen	As information retrieval (IR) systems, such as search engines and conversational agents, become ubiquitous in various domains, the need for transparent and explainable systems grows to ensure accountability, fairness, and unbiased results. Despite recent advances in explainable AI and IR techniques, there is no consensus on the definition of explainability. Existing approaches often treat it as a singular notion, disregarding the multidimensional definition postulated in the literature. In this paper, we use psychometrics and crowdsourcing to identify human-centered factors of explainability in Web search systems and introduce SSE (Search System Explainability), an evaluation metric for explainable IR (XIR) search systems. In a crowdsourced user study, we demonstrate SSE's ability to distinguish between explainable and non-explainable systems, showing that systems with higher scores indeed indicate greater interpretability. We hope that aside from these concrete contributions to XIR, this line of work will serve as a blueprint for similar explainability evaluation efforts in other domains of machine learning and natural language processing.	随着信息检索(IR)系统，如搜索引擎和会话代理，在各个领域变得无处不在，对透明和可解释的系统的需求增长，以确保问责制，公平性和无偏见的结果。尽管可解释性 AI 和 IR 技术最近取得了一些进展，但是对于可解释性的定义还没有达成共识。现有的方法往往把它作为一个单一的概念，无视文献中假定的多维定义。在本文中，我们使用心理测量学和众包来识别网络搜索系统中以人为中心的可解释性因素，并介绍了 SSE (Search System Explainability) ，一个可解释的 IR (XIR)搜索系统的评估指标。在众包用户研究中，我们证明了 SSE 区分可解释和不可解释系统的能力，表明分数较高的系统确实表明更大的可解释性。我们希望，除了这些对 XIR 的具体贡献之外，这一系列工作将为机器学习和自然语言处理的其他领域中类似的可解释性评估工作提供蓝图。	code	0
Enhancing Dataset Search with Compact Data Snippets	Qiaosheng Chen, Jiageng Chen, Xiao Zhou, Gong Cheng	Nanjing University	In light of the growing availability and significance of open data, the problem of dataset search has attracted great attention in the field of information retrieval. Nevertheless, current metadata-based approaches have revealed shortcomings due to the low quality and availability of dataset metadata, while the magnitude and heterogeneity of actual data hindered the development of content-based solutions. To address these challenges, we propose to convert different formats of structured data into a unified form, from which we extract a compact data snippet that indicates the relevance of the whole data. Thanks to its compactness, we feed it into a dense reranker to improve search accuracy. We also convert it back to the original format to be presented for assisting users in relevance judgment. The effectiveness of our approach has been demonstrated by extensive experiments on two test collections for dataset search.	随着开放数据的日益普及和重要性的提高，数据集搜索引起了信息检索领域的广泛关注。尽管如此，目前基于元数据的方法暴露了数据集元数据质量和可用性低的缺点，而实际数据的规模和异质性阻碍了基于内容的解决方案的开发。为了应对这些挑战，我们建议将不同格式的结构化数据转换成统一的形式，从中提取一个表明整个数据相关性的紧凑数据片段。由于它的紧凑性，我们把它输入到一个密集的重新排序，以提高搜索的准确性。我们还将其转换回原来的格式，以协助用户进行相关性判断。我们的方法的有效性已经得到了广泛的实验证明的两个测试收集的数据集搜索。	code	0
When MOE Meets LLMs: Parameter Efficient Fine-tuning for Multi-task Medical Applications	Qidong Liu, Xian Wu, Xiangyu Zhao, Yuanshao Zhu, Derong Xu, Feng Tian, Yefeng Zheng	Southern University of Science and Technology, City University of Hong Kong; Xi'an Jiaotong University, City University of Hong Kong; City University of Hong Kong; Xi'an Jiaotong University; Tencent; University of Science and Technology of China, City University of Hong Kong	The recent surge in Large Language Models (LLMs) has garnered significant attention across numerous fields. Fine-tuning is often required to fit general LLMs for a specific domain, like the web-based healthcare system. However, two problems arise during fine-tuning LLMs for medical applications. One is the task variety problem, which involves distinct tasks in real-world medical scenarios. The variety often leads to sub-optimal fine-tuning for data imbalance and seesaw problems. Besides, the large amount of parameters in LLMs leads to huge time and computation consumption by fine-tuning. To address these two problems, we propose a novel parameter efficient fine-tuning framework for multi-task medical applications, dubbed as MOELoRA. The designed framework aims to absorb both the benefits of mixture-of-expert (MOE) for multi-task learning and low-rank adaptation (LoRA) for parameter efficient fine-tuning. For unifying MOE and LoRA, we devise multiple experts as the trainable parameters, where each expert consists of a pair of low-rank matrices to retain the small size of trainable parameters. Then, a task-motivated gate function for all MOELoRA layers is proposed, which can control the contributions of each expert and produce distinct parameters for various tasks. We conduct experiments on a multi-task medical dataset, indicating MOELoRA outperforms the existing parameter efficient fine-tuning methods. The code is available online.	
最近大型语言模型(LLM)的兴起已经引起了众多领域的广泛关注。通常需要进行微调，以适应特定领域的常规 LLM，比如基于 Web 的医疗保健系统。然而，在微调 LLM 用于医疗应用时会出现两个问题。其中之一是任务多样性问题，它涉及到现实世界医学场景中不同的任务。这种多样性常常导致数据平衡和跷跷板问题的次优微调。此外，LLMS 中的大量参数导致了大量的时间和计算量的消耗。为了解决这两个问题，我们提出了一种新的参数高效微调框架公式多任务医疗应用，称为 MOELoRA。所设计的框架旨在吸收混合专家(MOE)在多任务学习和低秩自适应(LoRA)在参数有效微调方面的优点。结合 MOE 和 LoRA，我们设计了多个专家作为可训练参数，其中每个专家由一对低秩矩阵组成，以保留可训练参数的小尺寸。然后，提出了一种适用于所有 MOELoRA 层的任务驱动门函数，它可以控制每个专家的贡献，并为不同的任务产生不同的参数。在多任务医学数据集上进行了实验，结果表明 MOELoRA 优于现有的参数有效微调方法。代码可以在线获得。	code	0
OEHR: An Orthopedic Electronic Health Record Dataset	Yibo Xie, Kaifan Wang, Jiawei Zheng, Feiyan Liu, Xiaoli Wang, Guofeng Huang	School of Informatics, Xiamen University, Xiamen, China; School of Informatics, Institute of AI, Xiamen University, Xiamen, China; Xiamen University, Affiliated Southeast Hospital, Zhangzhou, China	During the past decades, healthcare institutions continually amassed clinical data that is not intended to support research. Despite the increasing number of publicly available electronic health record (EHR) datasets, it is difficult to find publicly available datasets in Orthopedics that can be used to compare and evaluate downstream tasks. This paper presents OEHR, a healthcare benchmark dataset in Orthopedics, sourced from the EHR of real hospitals. Information available includes patient measurements, diagnoses, treatments, clinical notes, and medical images. OEHR is intended to support clinical research. To evaluate the quality of OEHR, we conduct extensive experiments by implementing state-of-the-art methods for performing downstream tasks. The results show that OEHR serves as a valuable extension to existing publicly available EHR datasets. The dataset is available at http://47.94.174.82/.	在过去的几十年里，医疗机构不断地收集临床数据，而这些数据并不是用来支持研究的。尽管公开可用的电子健康记录(EHR)数据集数量不断增加，但很难在骨科中找到可用于比较和评估下游任务的公开可用数据集。本文介绍了 OEHR，一个骨科医疗基准数据集，来源于实际医院的 EHR。可获得的信息包括患者测量、诊断、治疗、临床记录和医学图像。OEHR 旨在支持临床研究。为了评估 OEHR 的质量，我们进行了广泛的实验，采用了最先进的方法来执行下游任务。结果表明，OEHR 作为一个有价值的扩展，现有的公开可用的 EHR 数据集。数据集可在 http://47.94.174.82/下载。	code	0
SIGformer: Sign-aware Graph Transformer for Recommendation	Sirui Chen, Jiawei Chen, Sheng Zhou, Bohao Wang, Shen Han, Chanfei Su, Yuqing Yuan, Can Wang	Huazhong Agricultural University; Zhejiang University; OPPO Co Ltd	In recommender systems, most graph-based methods focus on positive user feedback, while overlooking the valuable negative feedback. Integrating both positive and negative feedback to form a signed graph can lead to a more comprehensive understanding of user preferences. However, the existing efforts to incorporate both types of feedback are sparse and face two main limitations: 1) They process positive and negative feedback separately, which fails to holistically leverage the collaborative information within the signed graph; 2) They rely on MLPs or GNNs for information extraction from negative feedback, which may not be effective. To overcome these limitations, we introduce SIGformer, a new method that employs the transformer architecture to sign-aware graph-based recommendation. SIGformer incorporates two innovative positional encodings that capture the spectral properties and path patterns of the signed graph, enabling the full exploitation of the entire graph. Our extensive experiments across five real-world datasets demonstrate the superiority of SIGformer over state-of-the-art methods. The code is available at https://github.com/StupidThree/SIGformer.	在推荐系统中，大多数基于图表的方法侧重于积极的用户反馈，而忽略了有价值的消极反馈。将正反馈和负反馈结合起来形成一个有符号的图形可以更全面地理解用户偏好。然而，现有的将这两种类型的反馈结合起来的努力是稀疏的，并且面临两个主要的限制: 1)他们分别处理正反馈和负反馈，这不能全面地利用签名图表中的协作信息; 2)他们依赖 MLP 或 GNN 从负反馈中获得信息抽取，这可能不是有效的。为了克服这些局限性，我们引入了 SIGformer，这是一种使用转换器结构来实现基于符号感知图的推荐的新方法。 SIGformer 包含了两种创新的位置编码，它们捕获了符号图的光谱特性和路径模式，从而实现了对整个图的充分利用。我们在五个真实世界数据集上的广泛实验证明了 SIGformer 相对于最先进方法的优越性。该代码可以在 https://github.com/StupidThree/SIGformer 上获得。	code	0
Scaling Laws For Dense Retrieval	Yan Fang, Jingtao Zhan, Qingyao Ai, Jiaxin Mao, Weihang Su, Jia Chen, Yiqun Liu	Renmin University of China; Xiaohongshu Inc; Tsinghua University	Scaling up neural models has yielded significant advancements in a wide array of tasks, particularly in language generation. Previous studies have found that the performance of neural models frequently adheres to predictable scaling laws, correlated with factors such as training set size and model size. This insight is invaluable, especially as large-scale experiments grow increasingly resource-intensive. Yet, such scaling law has not been fully explored in dense retrieval due to the discrete nature of retrieval metrics and complex relationships between training data and model sizes in retrieval tasks. In this study, we investigate whether the performance of dense retrieval models follows the scaling law as other neural models. We propose to use contrastive log-likelihood as the evaluation metric and conduct extensive experiments with dense retrieval models implemented with different numbers of parameters and trained with different amounts of annotated data. Results indicate that, under our settings, the performance of dense retrieval models follows a precise power-law scaling related to the model size and the number of annotations. Additionally, we examine scaling with prevalent data augmentation methods to assess the impact of annotation quality, and apply the scaling law to find the best resource allocation strategy under a budget constraint. We believe that these insights will significantly contribute to understanding the scaling effect of dense retrieval models and offer meaningful guidance for future research endeavors.	
放大神经模型已经在大量任务中取得了重大进展，特别是在语言生成方面。以往的研究发现，神经模型的性能往往遵循可预测的标度律，与训练集大小和模型大小等因素相关。这种洞察力是非常宝贵的，特别是在大规模实验日益增长的资源密集型的情况下。然而，由于检索度量的离散性以及检索任务中训练数据和模型大小之间的复杂关系，这种尺度规律在密集检索中还没有得到充分的研究。在这项研究中，我们调查是否密集检索模型的性能遵循标度律作为其他神经模型。我们建议使用对比日志似然作为评估指标，并进行广泛的实验与密集的检索模型实施与不同数量的参数和训练与不同数量的注释数据。结果表明，在我们的设置下，密集检索模型的性能遵循与模型大小和注释数量相关的精确幂律尺度。此外，我们还研究了使用流行的数据增强方法进行缩放来评估注释质量的影响，并应用缩放定律来寻找预算线下的最佳资源分配策略。我们相信，这些见解将显着有助于理解密集检索模型的缩放效应，并为未来的研究工作提供有意义的指导。	code	0
Diffusion Models for Generative Outfit Recommendation	Yiyan Xu, Wenjie Wang, Fuli Feng, Yunshan Ma, Jizhi Zhang, Xiangnan He	National University of Singapore; University of Science and Technology of China	Outfit Recommendation (OR) in the fashion domain has evolved through two stages: Pre-defined Outfit Recommendation and Personalized Outfit Composition. However, both stages are constrained by existing fashion products, limiting their effectiveness in addressing users' diverse fashion needs. Recently, the advent of AI-generated content provides the opportunity for OR to transcend these limitations, showcasing the potential for personalized outfit generation and recommendation. To this end, we introduce a novel task called Generative Outfit Recommendation (GOR), aiming to generate a set of fashion images and compose them into a visually compatible outfit tailored to specific users. The key objectives of GOR lie in the high fidelity, compatibility, and personalization of generated outfits. To achieve these, we propose a generative outfit recommender model named DiFashion, which empowers exceptional diffusion models to accomplish the parallel generation of multiple fashion images. To ensure three objectives, we design three kinds of conditions to guide the parallel generation process and adopt Classifier-Free-Guidance to enhance the alignment between the generated images and conditions. We apply DiFashion on both personalized Fill-In-The-Blank and GOR tasks and conduct extensive experiments on iFashion and Polyvore-U datasets. The quantitative and human-involved qualitative evaluation demonstrate the superiority of DiFashion over competitive baselines.	
服装推荐(OR)在时尚领域经历了两个阶段: 预先定义的服装推荐和个性化的服装组合。然而，这两个阶段都受到现有时尚产品的限制，限制了它们满足用户不同时尚需求的有效性。最近，人工智能生成内容的出现为 OR 提供了超越这些限制的机会，展示了个性化服装生成和推荐的潜力。为此，我们引入了一个新颖的任务，称为生成性出行推荐(GOR) ，旨在生成一组时尚图像，并将它们组成一个视觉兼容的服装定制给特定的用户。GOR 的关键目标在于高保真度、兼容性和生成服务的个性化。为了实现这些目标，我们提出了一个名为 DiFashion 的生成式服装推荐模型，它授权异常扩散模型来完成多个时尚图像的并行生成。为了保证三个目标，我们设计了三种条件来指导并行生成过程，并采用无分类器导引来增强生成的图像与条件之间的对齐。我们将 DiFashion 应用于个性化的填空和 GOR 任务，并在 iFashion 和 Polyvore-U 数据集上进行了广泛的实验。定量和人为参与的定性评价证明了 DiFashion 过度竞争基线的优越性。	code	0
Collaborative Filtering Based on Diffusion Models: Unveiling the Potential of High-Order Connectivity	Yu Hou, JinDuk Park, WonYong Shin	Yonsei University	A recent study has shown that diffusion models are well-suited for modeling the generative process of user-item interactions in recommender systems due to their denoising nature. However, existing diffusion model-based recommender systems do not explicitly leverage high-order connectivities that contain crucial collaborative signals for accurate recommendations. Addressing this gap, we propose CF-Diff, a new diffusion model-based collaborative filtering (CF) method, which is capable of making full use of collaborative signals along with multi-hop neighbors. Specifically, the forward-diffusion process adds random noise to user-item interactions, while the reverse-denoising process accommodates our own learning model, named cross-attention-guided multi-hop autoencoder (CAM-AE), to gradually recover the original user-item interactions. CAM-AE consists of two core modules: 1) the attention-aided AE module, responsible for precisely learning latent representations of user-item interactions while preserving the model's complexity at manageable levels, and 2) the multi-hop cross-attention module, which judiciously harnesses high-order connectivity information to capture enhanced collaborative signals. Through comprehensive experiments on three real-world datasets, we demonstrate that CF-Diff is (a) Superior: outperforming benchmark recommendation methods, achieving remarkable gains up to 7.29%, (b) Theoretically-validated: reducing computations while ensuring that the embeddings generated by our model closely approximate those from the original cross-attention, and (c) Scalable: proving the computational efficiency that scales linearly with the number of users or items.	
最近的一项研究表明，由于扩散模型的去噪特性，它非常适合于模拟推荐系统中用户-项目交互的生成过程。然而，现有的基于扩散模型的推荐系统并没有明确地利用包含关键协作信号的高阶连接来获得准确的推荐。为了解决这一问题，我们提出了基于扩散模型的协同过滤(CF)方法 CF-Diff，该方法能够充分利用协作信号和多跳邻居。具体而言，前向扩散过程在用户项目交互中加入随机噪声，而反向去噪过程适应我们自己的学习模型，即交叉注意引导的多跳自动编码器(CAM-AE) ，以逐渐恢复原始的用户项目交互。 CAM-AE 由两个核心模块组成: 1)注意辅助 AE 模块，负责精确学习用户项目交互的潜在表征，同时将模型的复杂性保持在可管理的水平; 2)多跳交叉注意模块，明智地利用高阶连通性信息捕获增强的协作信号。通过对三个真实世界数据集的全面实验，我们证明 CF-Diff 是(a)优越的: 表现优于基准推荐方法，达到显着的收益高达7.29理论验证: 减少计算，同时确保我们的模型生成的嵌入接近那些来自原始交叉注意力，和(c)可伸缩: 证明计算效率与用户或项目的数量成线性关系。	code	0
Graph Signal Diffusion Model for Collaborative Filtering	Yunqin Zhu, Chao Wang, Qi Zhang, Hui Xiong	University of Science and Technology of China; Shanghai AI Laboratory; The Hong Kong University of Science and Technology	Collaborative filtering is a critical technique in recommender systems. Among various methods, an increasingly popular paradigm is to reconstruct user-item interactions based on the historical observations. This can be viewed as a conditional generative task, where recently developed diffusion model demonstrates great potential. However, existing studies on diffusion models lack effective solutions for modeling implicit feedback data. Particularly, the isotropic nature of the standard diffusion process fails to account for the heterogeneous dependencies among items, leading to a misalignment with the graphical structure of the interaction space. Meanwhile, random noise destroys personalized information in interaction vectors, causing difficulty in reverse reconstruction. In this paper, we make novel adaptations of diffusion model and propose Graph Signal Diffusion Model for Collaborative Filtering (named GiffCF). To better represent the high-dimensional and sparse distribution of implicit feedback, we define a generalized form of denoising diffusion using heat equation on the item-item similarity graph. Our forward process smooths interaction signals with an advanced family of graph filters. Hence, instead of losing information, it involves item-item similarities as beneficial prior knowledge for recommendation. 
To reconstruct high-quality interactions, our reverse process iteratively refines and sharpens preference signals in a deterministic manner, where the update direction is conditioned on the user history and computed from a carefully designed two-stage denoiser. Finally, through extensive experiments, we show that GiffCF effectively leverages the advantages of both diffusion model and graph signal processing, and achieves state-of-the-art performance on three benchmark datasets.	协同过滤是推荐系统中的一项关键技术。在各种方法中，一个日益流行的范例是基于历史观察重建用户-项目交互。这可以看作是条件生成任务，其中最近开发的扩散模型显示了巨大的潜力。然而，现有的扩散模型松弛有效解的研究隐式反馈数据建模。特别是，标准扩散过程的各向同性性质没有考虑到项目之间的非均匀依赖性，导致与相互作用空间的图形结构不一致。同时，随机噪声破坏交互矢量中的个性化信息，给反向重建带来困难。在本文中，我们对扩散模型进行了新的改进，并提出了协同过滤的图形信号扩散模型(GiffCF)。为了更好地表示隐式反馈的高维稀疏分布，我们在项目-项目相似图上利用热方程定义了一种广义形式的去噪扩散。我们的正向处理平滑交互信号与一个先进的图形滤波器家族。因此，它不仅不会丢失信息，而且涉及项目项相似性作为推荐的有益先验知识。为了重建高质量的交互，我们的反向过程以确定性方式迭代地细化和锐化偏好信号，其中更新方向以用户历史为条件，并从仔细设计的两阶段去噪器计算。最后，通过大量的实验表明，GiffCF 有效地利用了扩散模型和图形信号处理的优势，在三个基准数据集上实现了最先进的性能。	code	0
Multi-granular Adversarial Attacks against Black-box Neural Ranking Models	YuAn Liu, Ruqing Zhang, Jiafeng Guo, Maarten de Rijke, Yixing Fan, Xueqi Cheng	Institute of Computing Technology, Chinese Academy of Sciences; University of Amsterdam	Adversarial ranking attacks have gained increasing attention due to their success in probing vulnerabilities, and, hence, enhancing the robustness, of neural ranking models. Conventional attack methods employ perturbations at a single granularity, e.g., word-level or sentence-level, to a target document. However, limiting perturbations to a single level of granularity may reduce the flexibility of creating adversarial examples, thereby diminishing the potential threat of the attack. Therefore, we focus on generating high-quality adversarial examples by incorporating multi-granular perturbations. Achieving this objective involves tackling a combinatorial explosion problem, which requires identifying an optimal combination of perturbations across all possible levels of granularity, positions, and textual pieces. To address this challenge, we transform the multi-granular adversarial attack into a sequential decision-making process, where perturbations in the next attack step are influenced by the perturbed document in the current attack step. Since the attack process can only access the final state without direct intermediate signals, we use reinforcement learning to perform multi-granular attacks. During the reinforcement learning process, two agents work cooperatively to identify multi-granular vulnerabilities as attack targets and organize perturbation candidates into a final perturbation sequence. Experimental results show that our attack method surpasses prevailing baselines in both attack effectiveness and imperceptibility.	
由于对抗性排序攻击能够成功地探测漏洞，从而增强了神经排序模型的鲁棒性，因此受到了越来越多的关注。传统的攻击方法采用单一粒度的扰动，例如，单词级或句子级的攻击目标文档。然而，将扰动限制在单一粒度级别可能会降低创建敌对示例的灵活性，从而减少攻击的潜在威胁。因此，我们的重点是通过结合多粒度扰动生成高质量的对抗性例子。实现这个目标需要解决一个组合爆炸问题，这需要在所有可能的粒度、位置和文本块级别上识别出扰动的最佳组合。为了应对这一挑战，我们将多粒度对抗性攻击转化为一个顺序决策过程，在这个过程中，下一个攻击步骤中的扰动会受到当前攻击步骤中受到干扰的文档的影响。由于攻击过程只能在没有直接中间信号的情况下访问最终状态，因此我们使用强化学习来执行多粒度攻击。在强化学习过程中，两个代理协同工作，将多粒度漏洞识别为攻击目标，并将扰动候选者组织成最终的扰动序列。实验结果表明，我们的攻击方法在攻击效果和不可感知性方面都优于现有的基准。	code	0
Optimal Transport Enhanced Cross-City Site Recommendation	Xinhang Li, Xiangyu Zhao, Zihao Wang, Yang Duan, Yong Zhang, Chunxiao Xing	City University of Hong Kong; HKUST; Tsinghua University; UIUC	Site recommendation, which aims at predicting the optimal location for brands to open new branches, has demonstrated an important role in assisting decision-making in modern business. In contrast to traditional recommender systems that can benefit from extensive information, site recommendation starkly suffers from extremely limited information and thus leads to unsatisfactory performance. Therefore, existing site recommendation methods primarily focus on several specific name brands and heavily rely on fine-grained human-crafted features to avoid the data sparsity problem. However, such solutions are not able to fulfill the demand for rapid development in modern business. Therefore, we aim to alleviate the data sparsity problem by effectively utilizing data across multiple cities and thereby propose a novel Optimal Transport enhanced Cross-city (OTC) framework for site recommendation. Specifically, OTC leverages optimal transport (OT) on the learned embeddings of brands and regions separately to project the brands and regions from the source city to the target city. Then, the projected embeddings of brands and regions are utilized to obtain the inference recommendation in the target city. By integrating the original recommendation and the inference recommendations from multiple cities, OTC is able to achieve enhanced recommendation results. The experimental results on the real-world OpenSiteRec dataset, encompassing thousands of brands and regions across four metropolises, demonstrate the effectiveness of our proposed OTC in further improving the performance of site recommendation models.	
网站推荐，旨在预测最佳位置的品牌开设新的分支机构，已经显示了在协助决策的重要作用，在现代企业。与可以从大量信息中受益的传统推荐系统不同，网站推荐严重受限于极其有限的信息，从而导致不能令人满意的性能。因此，现有的网站推荐方法主要集中在几个特定的品牌，并严重依赖于细粒度的人工精心制作的功能，以避免数据稀疏问题。然而，这样的解决方案并不能满足现代企业快速发展的需要。因此，我们的目标是通过有效地利用跨多个城市的数据来缓解数据稀疏性问题，从而提出一种新的最优交通增强跨城市(OTC)的网站推荐框架。具体来说，OTC 利用最优运输(OT)对品牌和区域的学习嵌入，分别从源头城市向目标城市投射品牌和区域。然后，利用品牌和区域的投影嵌入，得到目标城市的推理推荐。通过整合来自多个城市的原始推荐和推理推荐，OTC 能够获得更好的推荐结果。在现实世界 OpenSiteRec 数据集上的实验结果，涵盖了四个大都市的数千个品牌和地区，证明了我们提出的 OTC 在进一步提高网站推荐模型性能方面的有效性。	code	0
Disentangled Contrastive Hypergraph Learning for Next POI Recommendation	Yantong Lai, Yijun Su, Lingwei Wei, Tianqi He, Haitao Wang, Gaode Chen, Daren Zha, Qiang Liu, Xingxing Wang	JD iCity, JD Technology; Institute of Information Engineering, Chinese Academy of Sciences; Meituan	Next point-of-interest (POI) recommendation has been a prominent and trending task to provide next suitable POI suggestions for users. Most existing sequential-based and graph neural network-based methods have explored various approaches to modeling user visiting behaviors and have achieved considerable performances. However, two key issues have received less attention: i) Most previous studies have ignored the fact that user preferences are diverse and constantly changing in terms of various aspects, leading to entangled and suboptimal user representations. ii) Many existing methods have inadequately modeled the crucial cooperative associations between different aspects, hindering the ability to capture complementary recommendation effects during the learning process. To tackle these challenges, we propose a novel framework Disentangled Contrastive Hypergraph Learning (DCHL) for next POI recommendation. Specifically, we design a multi-view disentangled hypergraph learning component to disentangle intrinsic aspects among collaborative, transitional and geographical views with adjusted hypergraph convolutional networks. Additionally, we propose an adaptive fusion method to integrate multi-view information automatically. Finally, cross-view contrastive learning is employed to capture cooperative associations among views and reinforce the quality of user and POI representations based on self-discrimination. Extensive experiments on three real-world datasets validate the superiority of our proposal over various state-of-the-arts. To facilitate future research, our code is available at https://github.com/icmpnorequest/SIGIR2024_DCHL.	
下一个感兴趣的点(POI)建议已经成为一个突出和趋势性的任务，为用户提供下一个合适的 POI 建议。现有的基于序列的和基于图神经网络的方法已经探索了各种用户访问行为的建模方法，并取得了可观的性能。然而，有两个关键问题受到的关注较少: i)大多数以前的研究忽略了这样一个事实，即用户偏好是多样的，并在各个方面不断变化，导致纠缠和次优的用户表示。(2)许多现有的方法对不同方面之间的关键合作关系建模不足，影响了学习过程中获取互补推荐效应的能力。为了应对这些挑战，我们提出了一个新的框架，对比超图学习(DCHL)的下一个 POI 建议。具体来说，我们设计了一个多视图分离超图学习组件，用于分离协作视图、过渡视图和地理视图之间的内在关系，并对超图卷积网络进行了调整。此外，本文还提出了一种自适应融合方法来实现多视点信息的自动融合。最后，利用跨视角对比学习来捕捉视图之间的合作关联，提高基于自我歧视的用户和 POI 表示的质量。在三个真实世界数据集上的大量实验验证了我们的方案相对于各种最新技术的优越性。为方便日后进行研究，我们的代码已上载至 https://github.com/icmpnorequest/sigir2024_dchl。	code	0
CLLP: Contrastive Learning Framework Based on Latent Preferences for Next POI Recommendation	Hongli Zhou, Zhihao Jia, Haiyang Zhu, Zhizheng Zhang	Southeast University School of Computer Science and Engineering	Next Point-Of-Interest (POI) recommendation plays an important role in various location-based services. Its main objective is to predict the users' next interested POI based on their previous check-in information. Most existing studies view the next POI recommendation as a sequence prediction problem but pay little attention to the fine-grained latent preferences of users, neglecting the diversity of user motivations on visiting the POIs. In this paper, we propose a contrastive learning framework based on latent preferences (CLLP) for next POI recommendation, which models the latent preference distributions of users at each POI and then yields disentangled latent preference representations. Specifically, we leverage the cross-local and global spatio-temporal contexts to learn POI representations for dynamically modeling user preferences. We also design a novel distillation strategy to make full use of the collaborative signals from other users for representation optimization. Then, we disentangle multiple latent preferences in POI representations using predefined preference prototypes, while leveraging preference-level contrastive learning to encourage independence of different latent preferences by improving the quality of latent preference representation space. Meanwhile, we employ a multi-task training strategy to jointly optimize all parameters. Experimental results on two real-world datasets show that CLLP achieves the state-of-the-art performance and significantly outperforms all existing solutions. Further investigations demonstrate the robustness of CLLP against sparse and noisy data.	
下一个兴趣点(POI)推荐在各种基于位置的服务中起着重要作用。它的主要目标是根据用户以前的签入信息预测用户下一个感兴趣的 POI。现有的研究大多将下一个 POI 推荐视为一个序列预测问题，而忽视了用户访问 POI 的细粒度潜在偏好，忽视了用户访问 POI 动机的多样性。本文提出了一个基于潜在偏好的对比学习框架(CLLP) ，用于下一个 POI 推荐，该框架对每个 POI 用户的潜在偏好分布进行建模，然后产生解纠缠的潜在偏好表示。具体来说，我们利用跨局部和全局的时空上下文来学习用于动态建模用户偏好的 POI 表示。并设计了一种新的精馏策略，充分利用其他用户的协同信号进行表示优化。然后，利用预定义的偏好原型对 POI 表示中的多个潜在偏好进行分离，同时利用偏好水平对比学习通过提高潜在偏好表示空间的质量来鼓励不同潜在偏好的独立性。同时，采用多任务训练策略，对各参数进行联合优化。在两个实际数据集上的实验结果表明，CLLP 算法的性能达到了最高水平，明显优于现有的所有解决方案。进一步的研究证明了 CLLP 算法对稀疏和噪声数据的鲁棒性。	code	0
OpenSiteRec: An Open Dataset for Site Recommendation	Xinhang Li, Xiangyu Zhao, Yejing Wang, Yu Liu, Chong Chen, Cheng Long, Yong Zhang, Chunxiao Xing		As a representative information retrieval task, site recommendation, which aims at predicting the optimal sites for a brand or an institution to open new branches in an automatic data-driven way, is beneficial and crucial for brand development in modern business. However, there is no publicly available dataset so far and most existing approaches are limited to an extremely small scope of brands, which seriously hinders the research on site recommendation. Therefore, we collect, construct and release an open comprehensive dataset, namely OpenSiteRec, to facilitate and promote the research on site recommendation. Specifically, OpenSiteRec leverages a heterogeneous graph schema to represent various types of real-world entities and relations in four international metropolises. To evaluate the performance of the existing general methods on the site recommendation task, we conduct benchmarking experiments of several representative recommendation models on OpenSiteRec. Furthermore, we also highlight the potential application directions to demonstrate the wide applicability of OpenSiteRec. We believe that our OpenSiteRec dataset is significant and anticipated to encourage the development of advanced methods for site recommendation. OpenSiteRec is available online at https://OpenSiteRec.github.io/.	网站推荐作为一项具有代表性的信息检索任务，旨在预测一个品牌或机构以自动数据驱动的方式开设新分支机构的最佳网站，对于现代商业中的品牌发展是有益的，也是至关重要的。然而，到目前为止还没有公开的数据集，大多数现有的方法仅限于极小范围的品牌，这严重阻碍了对网站推荐的研究。因此，我们收集、构建和发布一个开放的综合数据集，即 OpenSiteRec，以促进和推动网站推荐的研究。具体来说，OpenSiteRec 利用异构图模式来表示四个国际大都市中各种类型的现实世界实体和关系。为了评估现有的一般方法在站点推荐任务中的性能，我们在 OpenSiteRec 上对几种有代表性的推荐模型进行了基准测试实验。此外，我们还强调了潜在的应用程序方向，以展示 OpenSiteRec 的广泛适用性。我们相信我们的 OpenSiteRec 数据集是重要的，并且预计将鼓励开发用于网站推荐的高级方法。OpenSiteRec 可于网上 https://OpenSiteRec.github.io/下载。	code	0
Fairness-Aware Exposure Allocation via Adaptive Reranking	Thomas Jänich, Graham McDonald, Iadh Ounis	University of Glasgow	In the first stage of a re-ranking pipeline, an inexpensive ranking model is typically deployed to retrieve a set of documents that are highly likely to be relevant to the user's query. The retrieved documents are then re-ranked by a more effective but expensive ranking model, e.g., a deep neural ranker such as BERT. However, in such a standard pipeline, no new documents are typically discovered after the first stage retrieval. Hence, the amount of exposure that a particular group of documents - e.g., documents from a particular demographic category - can receive is limited by the number of documents that are retrieved in the first stage retrieval. Indeed, if too few documents from a group are retrieved in the first stage retrieval, ensuring that the group receives a fair amount of exposure to the user may become infeasible. Therefore, it is useful to identify more documents from underrepresented groups that are potentially relevant to the query during the re-ranking stage. In this work, we investigate how deploying adaptive re-ranking, which enables the discovery of additional potentially relevant documents in the re-ranking stage, can improve the exposure that a given group of documents receives in the final ranking. We propose six adaptive re-ranking policies that can discover documents from underrepresented groups to increase the disadvantaged groups' exposure in the final ranking. Our experiments on the TREC 2021 and 2022 Fair Ranking Track test collections show that our policies consistently improve the fairness of the exposure distribution in the final ranking, compared to standard adaptive re-ranking approaches, resulting in increases of up to ~13% in Attention Weighted Ranked Fairness (AWRF). 
Moreover, our best performing policy, Policy 6, consistently maintains and frequently increases the utility of the search results in terms of nDCG.	在重新排序管道的第一阶段，通常会部署一个廉价的排序模型来检索一组很可能与用户的查询相关的文档。然后，检索到的文档通过一个更有效但代价更高的排序模型进行重新排序，例如，一个像 BERT 这样的深度神经排序器。但是，在这样的标准管道中，在第一阶段检索之后通常不会发现新文档。因此，一组特定文件(例如，来自特定人口类别的文件)能够接触的数量受到在第一阶段检索中检索到的文件数量的限制。实际上，如果在第一阶段检索时检索到的来自某个组的文档太少，那么确保该组获得相当数量的用户暴露就可能变得不可行。因此，在重新排序阶段，从代表性不足的群体中识别出更多可能与查询相关的文档是有用的。在这项工作中，我们研究如何部署自适应重新排序，这使得发现额外的潜在相关的文件在重新排序阶段，可以提高曝光，一组文件接收到的最终排名。我们提出了六个适应性重新排序的政策，可以发现文件来自代表性不足的群体，以增加弱势群体的曝光在最终的排名。我们在 TREC 2021和2022公平排名跟踪测试集合上的实验表明，与标准的自适应重新排名方法相比，我们的政策持续改善了最终排名中暴露分布的公平性，导致注意力加权公平性(AWRF)增加了约13% 。此外，我们表现最好的策略，策略6，始终保持并频繁增加搜索结果在 nDCG 方面的效用。	code	0
A Taxation Perspective for Fair Re-ranking	Chen Xu, Xiaopeng Ye, Wenjie Wang, Liang Pang, Jun Xu, TatSeng Chua	Institute of Computing Technology; National University of Singapore; Renmin University of China	Fair re-ranking aims to redistribute ranking slots among items more equitably to ensure responsibility and ethics. The exploration of redistribution problems has a long history in economics, offering valuable insights for conceptualizing fair re-ranking as a taxation process. Such a formulation provides us with a fresh perspective to re-examine fair re-ranking and inspire the development of new methods. From a taxation perspective, we theoretically demonstrate that most previous fair re-ranking methods can be reformulated as an item-level tax policy. Ideally, a good tax policy should be effective and conveniently controllable to adjust ranking resources. However, both empirical and theoretical analyses indicate that the previous item-level tax policy cannot meet two ideal controllable requirements: (1) continuity, ensuring minor changes in tax rates result in small accuracy and fairness shifts; (2) controllability over accuracy loss, ensuring precise estimation of the accuracy loss under a specific tax rate. To overcome these challenges, we introduce a new fair re-ranking method named Tax-rank, which levies taxes based on the difference in utility between two items. Then, we efficiently optimize such an objective by utilizing the Sinkhorn algorithm in optimal transport. Upon a comprehensive analysis, our model Tax-rank offers a superior tax policy for fair re-ranking, theoretically demonstrating both continuity and controllability over accuracy loss. Experimental results show that Tax-rank outperforms all state-of-the-art baselines in terms of effectiveness and efficiency on recommendation and advertising tasks.	
公平重新排名的目的是在项目之间更公平地重新分配排名位置，以确保责任和道德。再分配问题的探索在经济学中有着悠久的历史，为将公平再分配概念化为一个税收过程提供了有价值的见解。这样的表述为我们重新审视公平重排提供了新的视角，也启发了新方法的发展。从税收的角度，我们从理论上证明了大多数以前的公平重新排序方法可以重新制定为项目级的税收政策。理想情况下，一个好的税收政策应该是有效的，方便的，可控的，以调整等级资源。然而，实证和理论分析都表明，以往的项目级税收政策不能满足两个理想的可控要求: (1)连续性，确保税率的微小变化导致小的准确性和公平性转移; (2)准确性损失的可控性，确保在特定税率下准确估计准确性损失。为了克服这些挑战，我们引入了一种新的公平重新排序方法——税级法，该方法根据两个项目之间的效用差异来征税。然后，利用 Sinkhorn 算法对该目标进行有效的优化。在综合分析的基础上，我们的税级模型为公平重排提供了一个优越的税收政策，从理论上证明了精度损失的连续性和可控性。实验结果表明，就推荐和广告任务的有效性和效率而言，税收排名优于所有最先进的基线。	code	0
A Dual-Embedding Based DQN for Worker Recruitment in Spatial Crowdsourcing with Social Network	Yucen Gao, Wei Liu, Jianxiong Guo, Xiaofeng Gao, Guihai Chen	Shanghai Jiao Tong University; Peking University; Beijing Normal University	Spatial Crowdsourcing (SC) is a promising service that incentives workers to finish location-based tasks with high quality by providing rewards. Worker recruitment is a core issue in SC, for which most state-of-the-art algorithms focus on designing incentive mechanisms based on the existing SC worker pool. However, they may fail when the number of SC workers is not enough, especially for the new SC platforms. In recent years, social networks have been found to be helpful for worker recruitment by selecting seed workers to spread the task information so as to inspire more social users to participate, but how to select seed workers remains a challenge. Existing methods typically require numerous iterative searches leading to inefficiency in facing the big picture and failing to cope with dynamic environments. In the paper, we formulate the Effective Coverage Maximization (ECM) problem. We prove that the ECM problem is NP-hard and propose a novel worker recruitment method combined with the dual-embedding and Rainbow Deep Q-network (DQN), which is called DQNSelector. The dual-embedding extracts long-range social influence information from the social network and near-range coverage quality information from the geographic information map using the inner-product method and our proposed efficient Path Increment Iterative Calculation (PIIC) algorithm respectively. We then combine the dual embedding to design a Rainbow DQN-based reinforcement learning model so as to select seed workers. Extensive experiments and ablation studies based on real-world datasets verify the superiority of DQNSelector.	
空间众包(SC)是一项很有前途的服务，它通过提供奖励激励员工高质量地完成基于位置的任务。员工招聘是供应链管理的核心问题，目前大多数最先进的算法都是在现有供应链员工库的基础上设计激励机制。然而，当供应链工作者的数量不足时，尤其是对于新的供应链平台而言，供应链管理可能会失败。近年来，人们发现社会网络有助于种子工人的招聘，通过选择种子工人来传播任务信息，以激励更多的社会用户参与，但如何选择种子工人仍然是一个挑战。现有的方法通常需要大量的迭代搜索，这会导致在面对大局和不能处理动态环境时效率低下。本文研究了有效覆盖最大化问题。我们证明了 ECM 问题是 NP 难的，并提出了一种结合双嵌入和彩虹深 Q 网络(DQN)的新的工人招聘方法，称为 DQNSelector。双嵌入算法分别采用内积法和有效的路径增量迭代计算(PIIC)算法从社交网络中提取远程社会影响信息，从地理信息图中提取近程覆盖质量信息。然后，我们结合双嵌入设计了一个基于彩虹 DQN 的强化学习模型，以便选择种子工人。基于实际数据集的大量实验和烧蚀研究验证了 DQNSelector 的优越性。	code	0
Efficient Community Search Based on Relaxed k-Truss Index	Xiaoqin Xie, Shuangyuan Liu, Jiaqi Zhang, Shuai Han, Wei Wang, Wu Yang	Harbin Engineering University College of Computer Science and Technology	Communities are prevalent in large graphs such as social networks, protein networks, etc. Community search aims to find a cohesive subgraph that contains the query nodes. Existing community search algorithms often adopt community models to find target communities, and k-truss model is a popularly used one that provides structural constraints. However, the structural constraints presented by k-truss is so tight that the searching algorithm often can not find the target communities. There always exist some subgraphs that may not conform to k-truss structure but do have cohesive characteristics to meet users' personalized requirements. Moreover, the k-truss based community search algorithms can not meet users' real-time demands on large graphs. To address the above problems, this paper proposes the relaxed k-truss community search problem for the first time. Then we construct a relaxed k-truss index, which can help to find cohesive communities in linear time and provide flexible searching for nested communities. We also design an index maintenance algorithm to dynamically update the index. Furthermore, a community search algorithm based on the relaxed k-truss index is presented. Extensive experimental results on real datasets prove the effectiveness and efficiency of our model and algorithms.	社区在社会网络、蛋白质网络等大型图表中普遍存在。社区搜索的目的是找到一个包含查询节点的内聚子图。现有的社区搜索算法往往采用社区模型来寻找目标社区，而 k- 桁架模型是一种常用的提供结构约束的模型。然而，由于 k- 桁架所表示的结构约束过于严格，搜索算法往往无法找到目标群落。总是存在一些子图，这些子图可能不符合 k- 桁架结构，但具有内聚特性，以满足用户的个性化需求。此外，基于 k- 桁架的社区搜索算法不能满足用户对大图的实时性要求。针对上述问题，本文首次提出了松弛 k 桁架群体搜索问题。然后构造一个松弛 k- 桁架索引，它可以帮助在线性时间内找到内聚群落，并为嵌套群落提供灵活的搜索。设计了索引维护算法，实现了索引的动态更新。在此基础上，提出了一种基于松弛 k- 桁架索引的社区搜索算法。在实际数据集上的大量实验结果证明了该模型和算法的有效性和高效性。	code	0
Untargeted Adversarial Attack on Knowledge Graph Embeddings	Tianzhe Zhao, Jiaoyan Chen, Yanchi Ru, Qika Lin, Yuxia Geng, Jun Liu	School of Computer Science, Hangzhou Dianzi University; Xi'an Jiaotong University; School of Computer Science and Technology, Xi'an Jiaotong University; National University of Singapore; Department of Computer Science, The University of Manchester	Knowledge graph embedding (KGE) methods have achieved great success in handling various knowledge graph (KG) downstream tasks. However, KGE methods may learn biased representations on low-quality KGs that are prevalent in the real world. Some recent studies propose adversarial attacks to investigate the vulnerabilities of KGE methods, but their attackers are target-oriented with the KGE method and the target triples to predict are given in advance, which lacks practicability. In this work, we explore untargeted attacks with the aim of reducing the global performances of KGE methods over a set of unknown test triples and conducting systematic analyses on KGE robustness. Considering logic rules can effectively summarize the global structure of a KG, we develop rule-based attack strategies to enhance the attack efficiency. In particular, we consider adversarial deletion which learns rules, applying the rules to score triple importance and delete important triples, and adversarial addition which corrupts the learned rules and applies them for negative triples as perturbations. Extensive experiments on two datasets over three representative classes of KGE methods demonstrate the effectiveness of our proposed untargeted attacks in diminishing the link prediction results. And we also find that different KGE methods exhibit different robustness to untargeted attacks. For example, the robustness of methods engaged with graph neural networks and logic rules depends on the density of the graph. But rule-based methods like NCRL are easily affected by adversarial addition attacks to capture negative rules.	知识图嵌入(KGE)方法在处理各种知识图(KG)下游任务方面取得了巨大的成功。然而，KGE 方法可能会学习在现实世界中普遍存在的低质量幼儿园的偏见表示。最近的一些研究提出用对抗性攻击来研究 KGE 方法的脆弱性，但是 KGE 方法的攻击者是面向目标的，而且提前给出了预测的目标三元组，缺乏实用性。在这项工作中，我们探讨非目标攻击的目的是降低全局性能的 KGE 方法在一组未知的三元组和进行系统的 KGE 鲁棒性分析。考虑到逻辑规则可以有效地概括 KG 的全局结构，我们开发了基于规则的攻击策略来提高攻击效率。特别地，我们考虑了学习规则的对抗性删除，应用规则来确定重要性并删除重要的三元组，以及对抗性加法，它破坏了学习规则并将其应用于负的三元组扰动。在三种典型的 KGE 方法上对两个数据集进行了大量的实验，证明了我们提出的非目标攻击在减少链路预测结果方面的有效性。我们还发现不同的 KGE 方法对非目标攻击具有不同的鲁棒性。例如，使用图神经网络和逻辑规则的方法的鲁棒性取决于图的密度。但是像 NCRL 这样基于规则的方法很容易受到对抗性加法攻击的影响，以捕获负面规则	code	0
Drop your Decoder: Pre-training with Bag-of-Word Prediction for Dense Passage Retrieval	Guangyuan Ma, Xing Wu, Zijia Lin, Songlin Hu	Tsinghua University; Chinese Academy of Sciences; Institute of Information Engineering, Chinese Academy of Sciences	Masked auto-encoder pre-training has emerged as a prevalent technique for initializing and enhancing dense retrieval systems. It generally utilizes additional Transformer decoder blocks to provide sustainable supervision signals and compress contextual information into dense representations. However, the underlying reasons for the effectiveness of such a pre-training technique remain unclear. The usage of additional Transformer-based decoders also incurs significant computational costs. In this study, we aim to shed light on this issue by revealing that masked auto-encoder (MAE) pre-training with enhanced decoding significantly improves the term coverage of input tokens in dense representations, compared to vanilla BERT checkpoints. Building upon this observation, we propose a modification to the traditional MAE by replacing the decoder of a masked auto-encoder with a completely simplified Bag-of-Word prediction task. This modification enables the efficient compression of lexical signals into dense representations through unsupervised pre-training. Remarkably, our proposed method achieves state-of-the-art retrieval performance on several large-scale retrieval benchmarks without requiring any additional parameters, which provides a 67auto-encoder pre-training with enhanced decoding.	掩模自动编码器预训练已经成为初始化和增强密集检索系统的一种流行技术。它通常使用额外的变压器解码器块来提供可持续的监督信号，并将上下文信息压缩成密集的表示。然而，这种预培训技术有效性的根本原因仍然不清楚。使用额外的基于变压器的解码器也会带来巨大的计算成本。在这项研究中，我们的目标是通过揭示掩码自动编码器(MAE)预训练与增强的解码显着提高输入标记在密集表示中的术语覆盖率来阐明这个问题，与普通的 BERT 检查点相比。在此基础上，我们提出了一种改进的传统 MAE 方法，用一个完全简化的“字包”预测任务来代替掩码自动编码器的解码器。这种修改使得词汇信号能够通过无监督的预训练有效地压缩成密集的表示。值得注意的是，我们提出的方法在不需要任何额外参数的情况下，在几个大规模检索基准上实现了最先进的检索性能，它提供了一个具有增强解码的67自动编码器预训练。	code	0
M2-RAAP: A Multi-Modal Recipe for Advancing Adaptation-based Pre-training towards Effective and Efficient Zero-shot Video-text Retrieval	Xingning Dong, Zipeng Feng, Chunluan Zhou, Xuzheng Yu, Ming Yang, Qingpei Guo	Ant Group Multimodal Learning; Ant Group; Ant Group Zhifubao	We present a Multi-Modal Recipe for Advancing Adaptation-based Pre-training towards effective and efficient zero-shot video-text retrieval, dubbed M2-RAAP. Upon popular image-text models like CLIP, most current adaptation-based video-text pre-training methods are confronted by three major issues, i.e., noisy data corpus, time-consuming pre-training, and limited performance gain. Towards this end, we conduct a comprehensive study including four critical steps in video-text pre-training. Specifically, we investigate 1) data filtering and refinement, 2) video input type selection, 3) temporal modeling, and 4) video feature enhancement. We then summarize this empirical study into the M2-RAAP recipe, where our technical contributions lie in 1) the data filtering and text re-writing pipeline resulting in 1M high-quality bilingual video-text pairs, 2) the replacement of video inputs with key-frames to accelerate pre-training, and 3) the Auxiliary-Caption-Guided (ACG) strategy to enhance video features. We conduct extensive experiments by adapting three image-text foundation models on two refined video-text datasets from different languages, validating the robustness and reproducibility of M2-RAAP for adaptation-based pre-training. Results demonstrate that M2-RAAP yields superior performance with significantly reduced data (-90establishing a new SOTA on four English zero-shot retrieval datasets and two Chinese ones. We are preparing our refined bilingual data annotations and codebase, which will be available at https://github.com/alipay/Ant-Multi-Modal-Framework/tree/main/prj/M2_RAAP.	
我们提出了一个推进适应性的多模态配方，基于有效和高效的零拍视频文本检索预训练，称为 M2-RAAP。在 CLIP 等流行的图像文本预训练模型中，目前大多数基于自适应的视频文本预训练方法都面临着三个主要问题: 噪声数据库、耗时的预训练以及有限的性能增益。为此，我们进行了一个全面的研究，包括四个关键步骤的视频文本预训练。具体来说，我们研究1)数据过滤和细化，2)视频输入类型选择，3)时间建模，和4)视频特征增强。然后，我们将这一实证研究总结为 M2-RAAP 配方，其中我们的技术贡献在于: 1)数据过滤和文本重写流水线，导致1M 高质量的双语视频文本对; 2)用关键帧替换视频输入以加速预训练; 3)辅助字幕引导(ACG)策略以增强视频特性。通过在两个不同语言的视频文本数据集上调整三个基于图像文本的模型，验证了 M2-RAAP 在基于自适应的预训练中的鲁棒性和可重复性。结果表明，M2-RAAP 算法在显著减少数据量(- 90的情况下，对4个英文零镜头检索数据集和2个中文零镜头检索数据集建立了新的 SOTA。我们正在准备我们精心制作的双语数据注释和代码库，可通过 https:// github.com/alipay/ant-multi-modal-framework/tree/main/prj/m2_raap 查阅。	code	0
CaLa: Complementary Association Learning for Augmenting Composed Image Retrieval	Xintong Jiang, Yaxiong Wang, Mengjian Li, Yujiao Wu, Bingwen Hu, Xueming Qian	Hefei University of Technology; Xi'an Jiaotong University School of Software; Xi'an Jiaotong University; Zhejiang Lab; Anhui University of Technology; CSIRO	Composed Image Retrieval (CIR) involves searching for target images based on an image-text pair query. While current methods treat this as a query-target matching problem, we argue that CIR triplets contain additional associations beyond this primary relation. In our paper, we identify two new relations within triplets, treating each triplet as a graph node. Firstly, we introduce the concept of text-bridged image alignment, where the query text serves as a bridge between the query image and the target image. We propose a hinge-based cross-attention mechanism to incorporate this relation into network learning. Secondly, we explore complementary text reasoning, considering CIR as a form of cross-modal retrieval where two images compose to reason about complementary text. To integrate these perspectives effectively, we design a twin attention-based compositor. By combining these complementary associations with the explicit query pair-target image relation, we establish a comprehensive set of constraints for CIR. Our framework, CaLa (Complementary Association Learning for Augmenting Composed Image Retrieval), leverages these insights. We evaluate CaLa on CIRR and FashionIQ benchmarks with multiple backbones, demonstrating its superiority in composed image retrieval.	
复合图像检索(CIR)涉及到基于图像-文本对查询的目标图像搜索。虽然目前的方法将其视为一个查询-目标匹配问题，但我们认为 CIR 三联包含了超出这个主要关系的其他关联。在本文中，我们确定了两个新的关系在三元组，治疗每个三元组作为一个图节点。首先，我们引入文本桥接图像对齐的概念，其中查询文本作为查询图像和目标图像之间的桥梁。我们提出了一种基于铰链的交叉注意机制，将这种关系纳入网络学习。其次，我们探讨了互补文本推理，考虑到 CIR 作为一种跨模态检索的形式，其中两个图像组成的互补文本的推理。为了有效地整合这些视角，我们设计了一个基于双注意的排序器。通过将这些互补关联与显式查询对目标图像关系相结合，建立了一套完整的 CIR 约束条件。我们的框架 CaLa (增强合成图像检索的互补关联学习)利用了这些见解。我们在 CIRR 和 FashionIQ 基准上对 CaLa 进行了多骨干评价，证明了其在合成图像检索中的优越性。	code	0
CFIR: Fast and Effective Long-Text To Image Retrieval for Large Corpora	Zijun Long, Xuri Ge, Richard McCreadie, Joemon M. Jose	University of Glasgow	Text-to-image retrieval aims to find the relevant images based on a text query, which is important in various use-cases, such as digital libraries, e-commerce, and multimedia databases. Although Multimodal Large Language Models (MLLMs) demonstrate state-of-the-art performance, they exhibit limitations in handling large-scale, diverse, and ambiguous real-world needs of retrieval, due to the computation cost and the injective embeddings they produce. This paper presents a two-stage Coarse-to-Fine Index-shared Retrieval (CFIR) framework, designed for fast and effective large-scale long-text to image retrieval. The first stage, Entity-based Ranking (ER), adapts to long-text query ambiguity by employing a multiple-queries-to-multiple-targets paradigm, facilitating candidate filtering for the next stage. The second stage, Summary-based Re-ranking (SR), refines these rankings using summarized queries. We also propose a specialized Decoupling-BEiT-3 encoder, optimized for handling ambiguous user needs and both stages, which also enhances computational efficiency through vector-based similarity inference. Evaluation on the AToMiC dataset reveals that CFIR surpasses existing MLLMs by up to 11.06Recall@1000, while reducing training and retrieval times by 68.75respectively. We will release our code to facilitate future research at https://github.com/longkukuhi/CFIR.	文本图像检索是基于文本查询的相关图像检索，在数字图书馆、电子商务、多媒体数据库等领域具有重要意义。虽然多模态大语言模型(MLLM)展示了最先进的性能，但由于计算成本和它们产生的内射嵌入，它们在处理大规模、多样化和模糊的现实世界检索需求方面表现出局限性。本文提出了一个两阶段的粗细索引共享检索(CFIR)框架，用于快速有效的大规模长文本图像检索。第一阶段，基于实体的排名(ER) ，适应长文本查询模糊性通过采用多查询多目标范例，促进候选人过滤的下一阶段。第二个阶段，Summary-basedRe-rank (SR) ，使用汇总查询改进这些排名。我们还提出了一个专门的解耦 -BEiT-3编码器，优化处理模糊的用户需求和两个阶段，这也提高了计算效率通过向量相似性推理。对 AToMiC 数据集的评估显示，CFIR 超越现有的 MLLM 多达11.06 Recall@1000，同时分别减少了68.75的训练和检索时间。我们将发布我们的代码，以促进未来的研究 https:// github.com/longkukuhi/cfir。	code	0
CaseLink: Inductive Graph Learning for Legal Case Retrieval	Yanran Tang, Ruihong Qiu, Hongzhi Yin, Xue Li, Zi Huang	The University of Queensland; The University of Queensland School of Electrical Engineering and Computer Science	In case law, the precedents are the relevant cases that are used to support the decisions made by the judges and the opinions of lawyers towards a given case. This relevance is referred to as the case-to-case reference relation. To efficiently find relevant cases from a large case pool, retrieval tools are widely used by legal practitioners. Existing legal case retrieval models mainly work by comparing the text representations of individual cases. Although they obtain a decent retrieval accuracy, the intrinsic case connectivity relationships among cases have not been well exploited for case encoding, therefore limiting the further improvement of retrieval performance. In a case pool, there are three types of case connectivity relationships: the case reference relationship, the case semantic relationship, and the case legal charge relationship. Due to the inductive manner in the task of legal case retrieval, using case reference as input is not applicable for testing. Thus, in this paper, a CaseLink model based on inductive graph learning is proposed to utilise the intrinsic case connectivity for legal case retrieval, a novel Global Case Graph is incorporated to represent both the case semantic relationship and the case legal charge relationship. A novel contrastive objective with a regularisation on the degree of case nodes is proposed to leverage the information carried by the case reference relationship to optimise the model. Extensive experiments have been conducted on two benchmark datasets, which demonstrate the state-of-the-art performance of CaseLink. The code has been released on https://github.com/yanran-tang/CaseLink.	
在判例法中，判例是用来支持法官作出的裁决和律师对判例的意见的相关案例。这种相关性被称为案例对案例的参考关系。为了有效地从大量案例中找到相关案例，检索工具被法律从业者广泛使用。现有的法律案例检索模型主要是通过比较个案的文本表示来实现的。虽然它们获得了不错的检索准确性，但是案例之间的内在案例连通性关系还没有被很好地用于案例编码，因此限制了检索性能的进一步提高。在案例池中，有三种类型的案例连接关系: 案例引用关系、案例语义关系和案例法律费用关系。由于法律案例检索任务中的归纳方式，以案例参考为输入不适用于检验。为此，本文提出了一种基于归纳图学习的案例链接模型，该模型利用内在的案例连通性进行法律案例检索，并引入了一个新的全局案例图来表示案例语义关系和案例法律指控关系。提出了一种新的对比目标，利用案例参考关系所携带的信息对模型进行优化。在两个基准数据集上进行了大量的实验，这些实验证明了 CaseLink 的最新性能。密码已经在 https://github.com/yanran-tang/caselink 上发布了。	code	0
Explicitly Integrating Judgment Prediction with Legal Document Retrieval: A Law-Guided Generative Approach	Weicong Qin, Zelin Cao, Weijie Yu, Zihua Si, Sirui Chen, Jun Xu	University of International Business and Economics School of Information Technology and Management; University of Illinois at Urbana-Champaign; Renmin University of China Gaoling School of Artificial Intelligence	Legal document retrieval and judgment prediction are crucial tasks in intelligent legal systems. In practice, determining whether two documents share the same judgments is essential for establishing their relevance in legal retrieval. However, existing legal retrieval studies either ignore the vital role of judgment prediction or rely on implicit training objectives, expecting a proper alignment of legal documents in vector space based on their judgments. Neither approach provides explicit evidence of judgment consistency for relevance modeling, leading to inaccuracies and a lack of transparency in retrieval. To address this issue, we propose a law-guided method, namely GEAR, within the generative retrieval framework. GEAR explicitly integrates judgment prediction with legal document retrieval in a sequence-to-sequence manner. Experiments on two Chinese legal case retrieval datasets show the superiority of GEAR over state-of-the-art methods while maintaining competitive judgment prediction performance. Moreover, we validate its robustness across languages and domains on a French statutory article retrieval dataset.	在智能法律系统中，法律文献检索和判决预测是至关重要的任务。在实践中，确定两份文件是否具有相同的判决对于确定它们在法律检索中的相关性至关重要。然而，现有的法律检索研究要么忽视了判断预测的重要作用，要么依赖于隐含的训练目标，期望法律文献在基于判断的向量空间中进行适当的对齐。这两种方法都没有为相关性建模提供明确的判断一致性证据，导致不准确和缺乏透明度检索。为了解决这个问题，我们提出了一个法律引导的方法，即 GEAR，在生成检索框架内。GEAR 明确地将判断预测与法律文献检索按顺序整合在一起。在两个中文案例检索数据集上的实验表明，在保持竞争性判断预测性能的同时，GEAR 方法优于现有的方法。此外，我们验证了它的鲁棒性跨语言和领域的法国法定文章检索数据集。	code	0
A Persona-Infused Cross-Task Graph Network for Multimodal Emotion Recognition with Emotion Shift Detection in Conversations	Geng Tu, Feng Xiong, Bin Liang, Ruifeng Xu	The Chinese University of Hong Kong, Hong Kong, China; Harbin Institute of Technology, Shenzhen, China	Recent research in Multimodal Emotion Recognition in Conversations (MERC) focuses on multimodal fusion and modeling speaker-sensitive context. In addition to contextual information, personality traits also affect emotional perception. However, current MERC methods solely consider the personality influence of speakers, neglecting speaker-addressee interaction patterns. Additionally, the bottleneck problem of Emotion Shift (ES), where consecutive utterances by the same speaker exhibit different emotions has been long neglected in MERC. Early ES research fails to distinguish diverse shift patterns and simply introduces whether shifts occur as knowledge into the MERC model without considering the complementary nature of the two tasks. Based on this, we propose a Persona-infused Cross-task Graph Network (PCGNet). It first models the speaker-addressee interactive relationships by the persona-infused refinement network. Then, it learns the auxiliary task of ES Detection and the main task of MERC using cross-task connections to capture correlations across two tasks. Finally, we introduce shift-aware contrastive learning to discern diverse shift patterns. Experimental results demonstrate that PCGNet outperforms state-of-the-art methods on two widely used datasets.	会话中多模态情绪识别(MERC)的研究主要集中在多模态融合和建模说话人敏感语境。除了上下文信息，人格特质也影响情感知觉。然而，目前的 MERC 方法只考虑说话人的个性影响，而忽略了说话人与收件人之间的交互模式。此外，情绪转移的瓶颈问题，即同一说话人的连续话语表现出不同的情绪，长期以来一直被人们所忽视。早期的 ES 研究未能区分不同的转移模式，只是简单地将转移是否作为知识发生引入 MERC 模型，而没有考虑两个任务的互补性。在此基础上，提出了一种基于人格的跨任务图形网络(PCGNet)。它首先通过人格注入精化网络建立了说话人与受话人之间的交互关系模型。然后，学习 ES 检测的辅助任务和 MERC 的主要任务，利用跨任务连接捕获两个任务之间的相关性。最后，我们引入移位意识对比学习来识别不同的移位模式。实验结果表明，PCGNet 在两个广泛使用的数据集上优于最先进的方法。	code	0
Analyzing and Mitigating Repetitions in Trip Recommendation	Wenzheng Shu, Kangqi Xu, Wenxin Tai, Ting Zhong, Yong Wang, Fan Zhou	University of Electronic Science and Technology of China; Hong Kong University of Science and Technology	Trip recommendation has emerged as a highly sought-after service over the past decade. Although current studies significantly understand human intention consistency, they struggle with undesired repetitive outcomes that need resolution. We make two pivotal discoveries using statistical analyses and experimental designs: (1) The occurrence of repetitions is intricately linked to the models and decoding strategies. (2) During training and decoding, adding perturbations to logits can reduce repetition. Motivated by these observations, we introduce AR-Trip (Anti Repetition for Trip Recommendation), which incorporates a cycle-aware predictor comprising three mechanisms to avoid duplicate Points-of-Interest (POIs) and demonstrates their effectiveness in alleviating repetition. Experiments on four public datasets illustrate that AR-Trip successfully mitigates repetition issues while enhancing precision.	在过去的十年里，旅行推荐已经成为一项非常受欢迎的服务。虽然目前的研究显着了解人类意图的一致性，他们挣扎与不希望的重复结果，需要解决。通过统计分析和实验设计，我们得到了两个关键的发现: (1)重复的发生与模型和解码策略有着密切的联系。(2)在训练和解码过程中，对 logit 加扰动可以减少重复。受这些观察的启发，我们引入了 AR-Trip (反重复行程推荐) ，其中包含一个周期感知预测器，由三个机制组成，以避免重复感兴趣点(POI) ，并证明其在减轻重复方面的有效性。在四个公共数据集上的实验表明，AR-Trip 在提高精度的同时成功地缓解了重复问题。	code	0
Cluster-based Partial Dense Retrieval Fused with Sparse Text Retrieval	Yingrui Yang, Parker Carlson, Shanxiu He, Yifan Qiao, Tao Yang	University of California, Santa Barbara; University of California at Santa Barbara	Previous work has demonstrated the potential to combine document rankings from dense and sparse retrievers for higher relevance effectiveness. This paper proposes a cluster-based partial dense retrieval scheme guided by sparse retrieval results to optimize fusion between dense and sparse retrieval at a low space and CPU-time cost while retaining a competitive relevance. This scheme exploits the overlap of sparse retrieval results and document embedding clusters, and judiciously selects a limited number of clusters to probabilistically guarantee the inclusion of top sparse results. This paper provides an evaluation of this scheme on its in-domain and zero-shot retrieval performance for the MS MARCO and BEIR datasets.	以前的工作已经证明了将密集和稀疏检索器的文档排名结合起来以提高相关性效率的潜力。提出了一种基于聚类的部分密集检索方案，该方案以稀疏检索结果为指导，在保持竞争相关性的同时，以较低的空间和 CPU 时间成本优化密集和稀疏检索之间的融合。该方案利用了稀疏检索结果和文档嵌入聚类的重叠性，并且明智地选择了有限数量的聚类，从概率上保证了顶部稀疏结果的包含。本文对该方案在 MS MARCO 和 BEIR 数据集上的域内检索性能和零镜头检索性能进行了评价。	code	0
Contextualization with SPLADE for High Recall Retrieval	Eugene Yang	Human Language Technology Center of Excellence, Johns Hopkins University	High Recall Retrieval (HRR), such as eDiscovery and medical systematic review, is a search problem that optimizes the cost of retrieving most relevant documents in a given collection. Iterative approaches, such as iterative relevance feedback and uncertainty sampling, are shown to be effective under various operational scenarios. Despite neural models demonstrating success in other text-related tasks, linear models such as logistic regression, in general, are still more effective and efficient in HRR since the model is trained and retrieves documents from the same fixed collection. In this work, we leverage SPLADE, an efficient retrieval model that transforms documents into contextualized sparse vectors, for HRR. Our approach combines the best of both worlds, leveraging both the contextualization from pretrained language models and the efficiency of linear models. It reduces 10in two HRR evaluation collections under a one-phase review workflow with a target recall of 80available at https://github.com/eugene-yang/LSR-for-TAR.	高召回检索(HRR) ，如 eDiscovery 和 Medical Systematicreview，是一个优化检索给定集合中大多数相关文档的成本的搜索问题。迭代方法，如迭代相关性反馈和不确定性抽样，被证明是有效的各种操作场景。尽管神经模型在其他与文本相关的任务中取得了成功，但是线性模型，例如 Logit模型模型、通用模型，在 HRR 中仍然更加有效，因为该模型从相同的固定集合中检索文档。在这项工作中，我们利用 SPLADE，一个有效的检索模型，转换文档到上下文稀疏向量，为 HRR。我们的方法结合了两者的优点，既利用了预先训练的语言模型的上下文化，又利用了线性模型的效率。它通过一个单阶段的评审工作流程减少了十分之二的人力资源评估收集，目标召回 https://github.com/eugene-yang/lsr-for-tar 为80个。	code	0
Convex Feature Embedding for Face and Voice Association	Jiwoo Kang, Taewan Kim, YoungHo Park	Dongduk Women's University; Sookmyung Women's University	Face-and-voice association learning poses significant challenges in the field of deep learning. In this paper, we propose a straightforward yet effective approach for cross-modal feature embedding, specifically targeting the correlation between facial and voice association. Previous studies have examined cross-modal association tasks in order to establish the relationship between voice clips and facial images. Previous studies have examined the issue of cross-modal discrimination; however, they have not adequately recognized the importance of managing the heterogeneity in inter-modal features between audio and video. As a result, there is a significant prevalence of false positives and false negatives. To address the issue, the proposed method learns the embeddings of cross-modal features by introducing an additional feature that bridges the gap between these features. This facilitates the embedding of voice and face features belonging to the same individual within a convex hull. Through the utilization of cross-modal feature learning, cross-modal attention particularly reduces inter-class variance, resulting in a notable enhancement of the clustering power. We comprehensively evaluated our approach on cross-modal verification, matching, and retrieval tasks using the large-scale VoxCeleb dataset. Extensive experimental results demonstrate that the proposed method achieves notable improvements over existing state-of-the-art methods.	面语关联学习是深度学习领域的一个重要研究课题。本文提出了一种简单而有效的跨模态特征嵌入方法，特别针对人脸和语音之间的相关性。以往的研究已经检验了跨模式联想任务，以建立语音剪辑和面部图像之间的关系。以前的研究审查了多式联运歧视问题; 但是，它们没有充分认识到管理音频和视频之间多式联运特征异质性的重要性。因此，伪阳性的流行率相当高。为了解决这个问题，本文提出的方法通过引入一个附加的特征来学习交叉模态特征的嵌入，从而弥补这些特征之间的差距。这有利于在凸壳内嵌入属于同一个人的声音和面部特征。通过利用交叉模态特征学习，交叉模态注意特别地减少了类间方差，使聚类能力显著提高。我们使用大规模的 VoxCeleb 数据集对我们的跨模式验证、匹配和检索任务方法进行了全面的评估。大量的实验结果表明，与现有的最新方法相比，该方法取得了显著的改进。	code	0
Enhancing Criminal Case Matching through Diverse Legal Factors	Jie Zhao, Ziyu Guan, Wei Zhao, Yue Jiang	Xidian University	Criminal case matching endeavors to determine the relevance between different criminal cases. Conventional methods predict the relevance solely based on instance-level semantic features and neglect the diverse legal factors (LFs), which are associated with diverse court judgments. Consequently, comprehensively representing a criminal case remains a challenge for these approaches. Moreover, extracting and utilizing these LFs for criminal case matching face two challenges: (1) the manual annotations of LFs rely heavily on specialized legal knowledge; (2) overlaps among LFs may potentially harm the model's performance. In this paper, we propose a two-stage framework named Diverse Legal Factor-enhanced Criminal Case Matching (DLF-CCM). Firstly, DLF-CCM employs a multi-task learning framework to pre-train an LF extraction network on a large-scale legal judgment prediction dataset. In stage two, DLF-CCM introduces an LF de-redundancy module to learn shared LF and exclusive LFs. Moreover, an entropy-weighted fusion strategy is introduced to dynamically fuse the multiple relevance generated by all LFs. Experimental results validate the effectiveness of DLF-CCM and show its significant improvements over competitive baselines. Code: https://github.com/jiezhao6/DLF-CCM.	刑事案件匹配试图确定不同刑事案件之间的相关性。传统的方法仅仅基于实例层次的语义特征来预测相关性，而忽略了与不同法院判决相关的多种法律因素。因此，综合表述一个刑事案件仍然是这些方法的挑战。此外，提取和利用这些逻辑框架进行刑事案件匹配面临两个挑战: (1)逻辑框架的人工注释严重依赖于专门的法律知识; (2)逻辑框架之间的重叠可能会损害模型的性能。在本文中，我们提出了一个两阶段的框架，即多元法律因素增强刑事案件匹配(DLF-CCM)。首先，DLF-CCM 采用多任务学习框架，在大规模法律判决预测数据集上预训练 LF 抽取网络。在第二阶段，DLF-CCM 引入了一个 LF 去冗余模块来学习共享 LF 和排他 LF。此外，引入熵权融合策略，动态融合所有 LFs 产生的多重相关性。实验结果验证了 DLF-CCM 算法的有效性，并显示了其对过度竞争基线的显著改善。密码: https://github.com/jiezhao6/dlf-ccm。	code	0
Faster Learned Sparse Retrieval with Block-Max Pruning	Antonio Mallia, Torsten Suel, Nicola Tonellotto	Pinecone; New York University; University of Pisa	Learned sparse retrieval systems aim to combine the effectiveness of contextualized language models with the scalability of conventional data structures such as inverted indexes. Nevertheless, the indexes generated by these systems exhibit significant deviations from the ones that use traditional retrieval models, leading to a discrepancy in the performance of existing query optimizations that were specifically developed for traditional structures. These disparities arise from structural variations in query and document statistics, including sub-word tokenization, leading to longer queries, smaller vocabularies, and different score distributions within posting lists. This paper introduces Block-Max Pruning (BMP), an innovative dynamic pruning strategy tailored for indexes arising in learned sparse retrieval environments. BMP employs a block filtering mechanism to divide the document space into small, consecutive document ranges, which are then aggregated and sorted on the fly, and fully processed only as necessary, guided by a defined safe early termination criterion or based on approximate retrieval requirements. Through rigorous experimentation, we show that BMP substantially outperforms existing dynamic pruning strategies, offering unparalleled efficiency in safe retrieval contexts and improved tradeoffs between precision and efficiency in approximate retrieval tasks.	学习型稀疏检索系统旨在将上下文化语言模型的有效性与倒排索引等传统数据结构的可扩展性结合起来。然而，这些系统生成的索引与使用传统检索模型的索引存在显著差异，导致现有查询优化的性能差异，这些优化是专门为传统结构开发的。这些差异源于查询和文档统计(包括子词标记)中的结构变化，导致查询时间更长，词汇量更小，以及发布列表中的分数分布不同。本文介绍了块最大剪枝(Block-Max Pruning，BMP) ，一种适用于学习型稀疏检索环境中索引的创新动态剪枝策略。 BMP 采用块过滤机制，将文档空间划分为小的、连续的文档范围，然后对这些文档进行动态聚合和排序，只有在必要时才进行全面处理，采用定义的安全提前终止标准或基于近似检索要求。通过严格的实验，我们表明 BMP 大大优于现有的动态修剪策略，在安全检索上下文中提供了无与伦比的效率，并改善了近似检索任务中精度和效率之间的权衡。	code	0
Fine-Tuning LLaMA for Multi-Stage Text Retrieval	Xueguang Ma, Liang Wang, Nan Yang, Furu Wei, Jimmy Lin	Microsoft Research Asia; University of Waterloo	While large language models (LLMs) have shown impressive NLP capabilities, existing IR applications mainly focus on prompting LLMs to generate query expansions or generating permutations for listwise reranking. In this study, we leverage LLMs directly to serve as components in the widely used multi-stage text ranking pipeline. Specifically, we fine-tune the open-source LLaMA-2 model as a dense retriever (repLLaMA) and a pointwise reranker (rankLLaMA). This is performed for both passage and document retrieval tasks using the MS MARCO training data. Our study shows that finetuned LLM retrieval models outperform smaller models. They are more effective and exhibit greater generalizability, requiring only a straightforward training strategy. Moreover, our pipeline allows for the fine-tuning of LLMs at each stage of a multi-stage retrieval pipeline. This demonstrates the strong potential for optimizing LLMs to enhance a variety of retrieval tasks. Furthermore, as LLMs are naturally pre-trained with longer contexts, they can directly represent longer documents. This eliminates the need for heuristic segmenting and pooling strategies to rank long documents. On the MS MARCO and BEIR datasets, our repLLaMA-rankLLaMA pipeline demonstrates a high level of effectiveness.	虽然大型语言模型(LLM)已经显示出令人印象深刻的 NLP 功能，但现有的 IR 应用程序主要侧重于提示 LLM 生成查询扩展或生成列表重新排序的排列。在这项研究中，我们利用 LLM 直接作为组件在广泛使用的多阶段文本排序流水线。具体来说，我们将开源 LLaMA-2模型微调为密集检索器(repLLaMA)和点态重分类器(rankLLaMA)。这是在使用微软 MARCO 训练数据的通道和文献检索任务中进行的。我们的研究表明，微调 LLM 检索模型优于较小的模型。它们更加有效，表现出更大的普遍性，只需要一个简单的训练策略。此外，我们的管道允许在多级检索管道的每个阶段对 LLM 进行微调。这表明优化 LLM 以增强各种检索任务的强大潜力。此外，由于 LLM 自然预先训练了较长的上下文，因此它们可以直接表示较长的文档。这消除了启发式分段和池策略对长文档进行排序的需要。在 MS MARCO 和 BEIR 数据集上，我们的 repLLaMA-rankLLaMA 管道显示了高水平的有效性。	code	0
Graph Diffusive Self-Supervised Learning for Social Recommendation	Jiuqiang Li, Hongjun Wang	Southwest Jiaotong University	Social recommendation aims at augmenting user-item interaction relationships and boosting recommendation quality by leveraging social information. Recently, self-supervised learning (SSL) has gained widespread adoption for social recommender. However, most existing methods exhibit poor robustness when faced with sparse user behavior data and are susceptible to inevitable social noise. To overcome the aforementioned limitations, we introduce a new Graph Diffusive Self-Supervised Learning (GDSSL) paradigm for social recommendation. Our approach involves the introduction of a guided social graph diffusion model that can adaptively mitigate the impact of social relation noise commonly found in real-world scenarios. This model progressively introduces random noise to the initial social graph and then iteratively restores it to recover the original structure. Additionally, to enhance robustness against noise and sparsity, we propose graph diffusive self-supervised learning, which utilizes the denoised social relation graph generated by our diffusion model for contrastive learning. The extensive experimental outcomes consistently indicate that our proposed GDSSL outmatches existing advanced solutions in social recommendation.	社会化推荐旨在通过利用社会化信息增强用户-项目的交互关系，提高推荐质量。近年来，自我监督学习(SSL)在社交推荐中得到了广泛的应用。然而，大多数现有的方法在面对稀疏的用户行为数据时表现出较差的鲁棒性，并且容易受到不可避免的社会噪声的影响。为了克服上述限制，我们引入了一个新的图扩散自我监督学习(GDSSL)范式用于社会推荐。我们的方法包括引入一个引导的社会图扩散模型，可以自适应地减轻在现实世界中常见的社会关系噪音的影响。该模型将随机噪声逐步引入到初始社会图中，然后迭代恢复它以恢复原始结构。此外，为了增强对噪声和稀疏性的鲁棒性，我们提出了图扩散自监督学习，它利用我们的扩散模型生成的去噪社会关系图进行对比学习。广泛的实验结果一致表明，我们提出的 GDSSL 在社会推荐方面优于现有的先进解决方案。	code	0
Improving In-Context Learning via Sequentially Selection and Preference Alignment for Few-Shot Aspect-Based Sentiment Analysis	Qianlong Wang, Keyang Ding, Xuan Luo, Ruifeng Xu	Harbin Institute of Technology, Shenzhen	In this paper, we leverage in-context learning (ICL) paradigm to handle few-shot aspect-based sentiment analysis (ABSA). Previous works first rank candidate examples by some metrics and then independently retrieve examples similar to test samples. However, their effectiveness may be discounted because of two limitations: in-context example redundancy and example preference misalignment between retriever and LLM. To alleviate them, we propose a novel framework that sequentially retrieves in-context examples. It not only considers which example is useful for the test sample but also prevents its information from being duplicated by already retrieved examples. Subsequently, we exploit the rewards of LLMs on retrieved in-context examples to optimize parameters for bridging preference gaps. Experiments on four ABSA datasets show that our framework is significantly superior to previous works.	本文中，我们利用上下文学习（ICL）范式来处理少样本基于方面的情感分析（ABSA）。先前的研究首先通过某些指标对候选示例进行排序，然后独立检索与测试样本相似的示例。然而，由于上下文示例冗余和检索器与语言模型（LLM）之间的示例偏好不一致，其有效性可能受到影响。为了缓解这些问题，我们提出了一种新颖的框架，该框架顺序检索上下文示例。它不仅考虑哪个示例对测试样本有用，还防止其信息被已检索的示例重复。随后，我们利用LLM在检索到的上下文示例上的奖励来优化参数，以弥合偏好差距。在四个ABSA数据集上的实验表明，我们的框架明显优于先前的研究。	code	0
Language Fairness in Multilingual Information Retrieval	Eugene Yang, Thomas Jänich, James Mayfield, Dawn J. Lawrie	HLTCOE; Johns Hopkins University; University of Glasgow; Human Language Technology Center of Excellence, Johns Hopkins University	Multilingual information retrieval (MLIR) considers the problem of ranking documents in several languages for a query expressed in a language that may differ from any of those languages. Recent work has observed that approaches such as combining ranked lists representing a single document language each or using multilingual pretrained language models demonstrate a preference for one language over others. This results in systematic unfair treatment of documents in different languages. This work proposes a language fairness metric to evaluate whether documents across different languages are fairly ranked through statistical equivalence testing using the Kruskal-Wallis test. In contrast to most prior work in group fairness, we do not consider any language to be an unprotected group. Thus our proposed measure, PEER (Probability of Equal Expected Rank), is the first fairness metric specifically designed to capture the language fairness of MLIR systems. We demonstrate the behavior of PEER on artificial ranked lists. We also evaluate real MLIR systems on two publicly available benchmarks and show that the PEER scores align with prior analytical findings on MLIR fairness. Our implementation is compatible with ir-measures and is available at http://github.com/hltcoe/peer_measure.	多语言信息检索（MLIR）关注的是在查询语言可能不同于任何文档语言的情况下，对多种语言的文档进行排序的问题。最近的研究发现，一些方法，如将每种单一文档语言的排序列表组合起来，或使用多语言预训练语言模型，往往对某一种语言表现出偏好，从而导致对不同语言文档的系统性不公平对待。本文提出了一种语言公平性度量标准，通过使用Kruskal-Wallis检验的统计等效性测试，评估不同语言文档的排序是否公平。与大多数以往的群体公平性研究不同，我们不将任何语言视为不受保护的群体。因此，我们提出的度量标准PEER（Equal Expected Rank的概率）是首个专门设计用于捕捉MLIR系统语言公平性的公平性度量标准。我们在人工排序列表上展示了PEER的行为，并在两个公开可用的基准上评估了实际的MLIR系统，结果显示PEER得分与之前对MLIR公平性的分析发现相一致。我们的实现与ir-measures兼容，并可在http://github.com/hltcoe/peer_measure获取。	code	0
Large Language Models Based Stemming for Information Retrieval: Promises, Pitfalls and Failures	Shuai Wang, Shengyao Zhuang, Guido Zuccon	The University of Queensland; CSIRO	Text stemming is a natural language processing technique that is used to reduce words to their base form, also known as the root form. In Information Retrieval (IR), stemming is used in keyword-based matching pipelines to normalise text before indexing and query processing to improve subsequent matching between document and query keywords. The use of stemming has been shown to often improve the effectiveness of keyword-matching models such as BM25. However, traditional stemming methods, focusing solely on individual terms, overlook the richness of contextual information. Recognizing this gap, in this paper, we investigate the promising idea of using large language models (LLMs) to stem words by leveraging its capability of context understanding. With this respect, we identify three avenues, each characterised by different trade-offs in terms of computational cost, effectiveness and robustness: (1) use LLMs to stem the vocabulary for a collection, i.e., the set of unique words that appear in the collection (vocabulary stemming), (2) use LLMs to stem each document separately (contextual stemming), and (3) use LLMs to extract from each document entities that should not be stemmed, then use vocabulary stemming to stem the rest of the terms (entity-based contextual stemming). Through a series of empirical experiments, we compare the use of LLMs for stemming with that of traditional lexical stemmers such as Porter and Krovetz for English text. We find that while vocabulary stemming and contextual stemming fail to achieve higher effectiveness than traditional stemmers, entity-based contextual stemming can achieve a higher effectiveness than using Porter stemmer alone, under specific conditions. Code and results are made available at https://github.com/ielab/SIGIR-2024-LLM-Stemming.	文本词干提取是一种自然语言处理技术，用于将单词缩减为其基本形式，即词根形式。在信息检索（IR）中，词干提取用于基于关键词的匹配流程中，通过在索引和查询处理之前规范化文本，以改进文档与查询关键词之间的后续匹配。研究表明，词干提取通常能提高如BM25等关键词匹配模型的有效性。然而，传统的词干提取方法仅关注单个词汇，忽略了上下文信息的丰富性。认识到这一差距，本文探讨了利用大型语言模型（LLMs）进行词干提取的有前景的想法，通过利用其理解上下文的能力。就此而言，我们确定了三条途径，每条途径在计算成本、有效性和鲁棒性方面都有不同的权衡：（1）使用LLMs对集合的词汇进行词干提取，即出现在集合中的唯一单词集合（词汇词干提取），（2）使用LLMs对每个文档单独进行词干提取（上下文词干提取），（3）使用LLMs从每个文档中提取不应进行词干提取的实体，然后使用词汇词干提取对其余词汇进行词干提取（基于实体的上下文词干提取）。通过一系列实证实验，我们将使用LLMs进行词干提取与传统的词汇词干提取器（如Porter和Krovetz）在英语文本中的应用进行了比较。我们发现，尽管词汇词干提取和上下文词干提取未能比传统词干提取器实现更高的有效性，但在特定条件下，基于实体的上下文词干提取能够比单独使用Porter词干提取器实现更高的有效性。代码和结果已在https://github.com/ielab/SIGIR-2024-LLM-Stemming公开。	code	0
MACA: Memory-aided Coarse-to-fine Alignment for Text-based Person Search	Liangxu Su, Rong Quan, Zhiyuan Qi, Jie Qin	Nanjing University of Aeronautics and Astronautics	Text-based person search (TBPS) aims to search for the target person in the full image through textual descriptions. The key to addressing this task is to effectively perform cross-modality alignment between text and images. In this paper, we propose a novel TBPS framework, named Memory-Aided Coarse-to-fine Alignment (MACA), to learn an accurate and reliable alignment between the two modalities. Firstly, we introduce a proposal-based alignment module, which performs contrastive learning to accurately align the textual modality with different pedestrian proposals at a coarse-grained level. Secondly, for the fine-grained alignment, we propose an attribute-based alignment module to mitigate unreliable features by aligning text-wise details with image-wise global features. Moreover, we introduce an intuitive memory bank strategy to supplement useful negative samples for more effective contrastive learning, improving the convergence and generalization ability of the model based on the learned discriminative features. Extensive experiments on CUHK-SYSU-TBPS and PRW-TBPS demonstrate the superiority of MACA over state-of-the-art approaches. The code is available at https://github.com/suliangxu/MACA.	基于文本的人物搜索（TBPS）旨在通过文本描述在完整图像中搜索目标人物。解决这一任务的关键在于有效实现文本与图像之间的跨模态对齐。本文提出了一种新颖的TBPS框架，名为“记忆辅助的由粗到细对齐”（MACA），以学习两种模态之间准确可靠的对齐。首先，我们引入了一个基于提议的对齐模块，通过对比学习在粗粒度层面准确对齐文本模态与不同的行人提议。其次，为了实现细粒度对齐，我们提出了一个基于属性的对齐模块，通过将文本细节与图像全局特征对齐来缓解不可靠特征。此外，我们引入了一种直观的记忆库策略，以补充有用的负样本，从而实现更有效的对比学习，基于学习到的判别特征提升模型的收敛性和泛化能力。在CUHK-SYSU-TBPS和PRW-TBPS数据集上的广泛实验证明了MACA优于现有最先进方法的优越性。代码可在https://github.com/suliangxu/MACA获取。	code	0
Negative as Positive: Enhancing Out-of-distribution Generalization for Graph Contrastive Learning	Zixu Wang, Bingbing Xu, Yige Yuan, Huawei Shen, Xueqi Cheng	Institute of Computing Technology, Chinese Academy of Sciences; University of Chinese Academy of Sciences; Institute of Computing Technology, Chinese Academy of Sciences	Graph contrastive learning (GCL), standing as the dominant paradigm in the realm of graph pre-training, has yielded considerable progress. Nonetheless, its capacity for out-of-distribution (OOD) generalization has been relatively underexplored. In this work, we point out that the traditional optimization of InfoNCE in GCL restricts the cross-domain pairs only to be negative samples, which inevitably enlarges the distribution gap between different domains. This violates the requirement of domain invariance under OOD scenario and consequently impairs the model's OOD generalization performance. To address this issue, we propose a novel strategy "Negative as Positive", where the most semantically similar cross-domain negative pairs are treated as positive during GCL. Our experimental results, spanning a wide array of datasets, confirm that this method substantially improves the OOD generalization performance of GCL.	图对比学习（Graph Contrastive Learning, GCL）作为图预训练领域的主导范式，已经取得了显著的进展。然而，其在分布外（Out-of-Distribution, OOD）泛化能力方面的潜力尚未得到充分探索。本文指出，GCL中传统的InfoNCE优化方法限制了跨域对仅作为负样本，这不可避免地扩大了不同域之间的分布差距。这种做法违背了OOD场景下域不变性的要求，从而损害了模型的OOD泛化性能。为解决这一问题，我们提出了一种名为“负样本视为正样本”的新策略，在GCL过程中将语义上最相似的跨域负样本对视为正样本。我们的实验结果跨越多个数据集，证实了这种方法显著提升了GCL的OOD泛化性能。	code	0
On Backbones and Training Regimes for Dense Retrieval in African Languages	Akintunde Oladipo, Mofetoluwa Adeyemi, Jimmy Lin	University of Waterloo	The effectiveness of dense retrieval models trained with multilingual language models as backbones has been demonstrated in multilingual and cross-lingual information retrieval contexts. The optimal choice of a backbone model for a given retrieval task is dependent on the target retrieval domain as well as the pre-training domain of available language models and their generalization capabilities, the availability of relevance judgements, etc. In this work, we study the impact of these factors on retrieval effectiveness for African languages using three multilingual benchmark datasets: Mr. TyDi, MIRACL, and the newly released CIRAL dataset. We compare the effectiveness of mBERT as a backbone for dense retrieval models against multilingual language models such as AfriBERTa and AfroXLMR, which are specialized for African languages. Furthermore, we examine the impact of different training regimes on the effectiveness of dense retrieval in different domains for African languages. Our findings show that the pre-training domain of the backbone LM plays a huge role in retrieval effectiveness, especially in the absence of retrieval training data. Code artifacts are available at https://github.com/castorini/afridpr_backbones.	多语言语言模型作为骨干训练的密集检索模型的有效性已经在多语言和跨语言信息检索环境中得到了验证。对于给定的检索任务，最佳的骨干模型选择依赖于目标检索领域、可用语言模型的预训练领域及其泛化能力、相关性判断的可用性等因素。在本研究中，我们使用三个多语言基准数据集——Mr. TyDi、MIRACL和最新发布的CIRAL数据集，研究了这些因素对非洲语言检索效果的影响。我们比较了mBERT作为密集检索模型骨干的有效性与专门针对非洲语言的多语言语言模型如AfriBERTa和AfroXLMR的有效性。此外，我们还探讨了不同的训练机制对非洲语言在不同领域密集检索效果的影响。我们的研究结果表明，骨干语言模型的预训练领域在检索效果中起着重要作用，特别是在缺乏检索训练数据的情况下。代码资源可在https://github.com/castorini/afridpr_backbones获取。	code	0
Predicting Micro-video Popularity via Multi-modal Retrieval Augmentation	Ting Zhong, Jian Lang, Yifan Zhang, Zhangtao Cheng, Kunpeng Zhang, Fan Zhou	University of Electronic Science and Technology of China; University of Maryland, College Park	Accurately predicting the popularity of micro-videos is crucial for real-world applications such as recommender systems and identifying viral marketing opportunities. Existing methods often focus on limited cross-modal information within individual micro-videos, overlooking the potential advantages of exploiting vast repository of past videos. We present MMRA, a multi-modal retrieval-augmented popularity prediction model that enhances prediction accuracy using relevant retrieved information. MMRA first retrieves relevant instances from a multi-modal memory bank, aligning video and text through transformation mechanisms involving a vision model and a text-based retriever. Additionally, a multi-modal interaction network is carefully designed to jointly capture cross-modal correlations within the target video and extract informative knowledge through retrieved instances, ultimately enhancing the prediction. Extensive experiments conducted on the real-world micro-video dataset demonstrate the superiority of MMRA when compared to state-of-the-art models. The code and data are available at https://github.com/ICDM-UESTC/MMRA.	准确预测微视频的受欢迎程度对于推荐系统及识别病毒式营销机会等实际应用至关重要。现有方法通常仅关注单个微视频内的有限跨模态信息，忽视了利用大量过往视频库的潜在优势。我们提出了MMRA，一种多模态检索增强的流行度预测模型，通过使用相关检索信息来提高预测准确性。MMRA首先从多模态记忆库中检索相关实例，通过视觉模型和基于文本的检索器参与的转换机制来对齐视频和文本。此外，精心设计了一个多模态交互网络，以共同捕捉目标视频内的跨模态关联，并通过检索到的实例提取信息性知识，从而最终提升预测效果。在真实世界微视频数据集上进行的广泛实验表明，相较于最先进的模型，MMRA具有优越性。代码和数据可在https://github.com/ICDM-UESTC/MMRA获取。	code	0
Searching for Physical Documents in Archival Repositories	Tokinori Suzuki, Douglas W. Oard, Emi Ishita, Yoichi Tomiura	University of Maryland; Kyushu University	Early retrieval systems were used to search physical media (e.g., paper) using manually created metadata. Modern ranked retrieval techniques are far more capable, but they require that content be either born digital or digitized. For physical content, searching metadata remains the state of the art. This paper seeks to change that, using a textual-edge graph neural network to learn relations between items from available metadata and from any content that has been digitized. Results show that substantial improvement over the best prior method can be achieved.	早期的检索系统用于通过手动创建的元数据搜索物理媒介（如纸质文档）。现代的排序检索技术则更为强大，但它们要求内容要么是数字化原生的，要么已经被数字化。对于物理内容，搜索元数据仍然是当前的技术水平。本文旨在改变这一现状，通过使用文本边缘图神经网络来学习从可用元数据和已数字化内容中提取的项目之间的关系。结果表明，与之前最好的方法相比，可以实现显著的改进。	code	0
Self-Explainable Next POI Recommendation	Kai Yang, Yi Yang, Qiang Gao, Ting Zhong, Yong Wang, Fan Zhou	University of Electronic Science and Technology of China; Hong Kong University of Science and Technology; Southwestern University of Finance and Economics	Point-of-Interest (POI) recommendation involves predicting users' next preferred POI and is becoming increasingly significant in location-based social networks. However, users are often reluctant to trust recommended results due to the lack of transparency in these systems. While recent work on explaining recommender systems has gained attention, prevailing methods only provide post-hoc explanations based on results or rudimentary explanations according to attention scores. Such limitations hinder reliability and applicability in risk-sensitive scenarios. Inspired by the information theory, we propose a self-explainable framework with an ante-hoc view called ExNext for next POI recommendation aimed at overcoming these limitations. Specifically, we endow self-explainability to POI recommender systems through compact representation learning using a variational information bottleneck approach. The learned representation further improves accuracy by reducing redundancy behind massive spatial-temporal trajectories, which, in turn, boosts the recommendation performance. Experiments on three real-world datasets show significant improvements in both model explainability and recommendation performance.	兴趣点（POI）推荐涉及预测用户下一个偏好的POI，在基于位置的社交网络中变得越来越重要。然而，由于这些系统缺乏透明性，用户通常不愿意信任推荐结果。尽管最近关于解释推荐系统的工作引起了关注，但现有方法仅提供基于结果的事后解释或根据注意力分数的基本解释。这些局限性阻碍了在风险敏感场景中的可靠性和适用性。受信息论启发，我们提出了一种名为ExNext的自我解释框架，旨在克服这些局限性，从先验视角出发进行下一个POI推荐。具体而言，我们通过使用变分信息瓶颈方法进行紧凑表示学习，赋予POI推荐系统自我解释能力。所学到的表示通过减少大量时空轨迹背后的冗余，进一步提高了准确性，从而提升了推荐性能。在三个真实世界数据集上的实验表明，模型解释性和推荐性能均显著提升。	code	0
Synthetic Test Collections for Retrieval Evaluation	Hossein A. Rahmani, Nick Craswell, Emine Yilmaz, Bhaskar Mitra, Daniel Campos	Snowflake; Microsoft; University College London	Test collections play a vital role in evaluation of information retrieval (IR) systems. Obtaining a diverse set of user queries for test collection construction can be challenging, and acquiring relevance judgments, which indicate the appropriateness of retrieved documents to a query, is often costly and resource-intensive. Generating synthetic datasets using Large Language Models (LLMs) has recently gained significant attention in various applications. In IR, while previous work exploited the capabilities of LLMs to generate synthetic queries or documents to augment training data and improve the performance of ranking models, using LLMs for constructing synthetic test collections is relatively unexplored. Previous studies demonstrate that LLMs have the potential to generate synthetic relevance judgments for use in the evaluation of IR systems. In this paper, we comprehensively investigate whether it is possible to use LLMs to construct fully synthetic test collections by generating not only synthetic judgments but also synthetic queries. In particular, we analyse whether it is possible to construct reliable synthetic test collections and the potential risks of bias such test collections may exhibit towards LLM-based models. Our experiments indicate that using LLMs it is possible to construct synthetic test collections that can reliably be used for retrieval evaluation.	测试集合在评估信息检索（IR）系统中扮演着至关重要的角色。为构建测试集合获取多样化用户查询可能颇具挑战性，而获取相关性判断，即指示检索文档与查询的适当性，通常成本高昂且资源密集。近年来，利用大型语言模型（LLMs）生成合成数据集在多个应用领域引起了显著关注。在信息检索领域，尽管先前的工作利用了LLMs的能力来生成合成查询或文档以增强训练数据并提升排序模型的性能，但使用LLMs构建合成测试集合的研究相对较少。以往的研究表明，LLMs具有生成用于评估IR系统合成相关性判断的潜力。本文全面探讨了是否可能利用LLMs构建完全合成的测试集合，不仅生成合成判断，还包括合成查询。特别地，我们分析了构建可靠的合成测试集合的可能性及其可能对基于LLM的模型展示出的偏见风险。我们的实验结果表明，使用LLMs可以构建能够可靠用于检索评估的合成测试集合。	code	0
SPLATE: Sparse Late Interaction Retrieval	Thibault Formal, Stéphane Clinchant, Hervé Déjean, Carlos Lassance	Cohere; Naver Labs Europe	The late interaction paradigm introduced with ColBERT stands out in the neural Information Retrieval space, offering a compelling effectiveness-efficiency trade-off across many benchmarks. Efficient late interaction retrieval is based on an optimized multi-step strategy, where an approximate search first identifies a set of candidate documents to re-rank exactly. In this work, we introduce SPLATE, a simple and lightweight adaptation of the ColBERTv2 model which learns an “MLM adapter”, mapping its frozen token embeddings to a sparse vocabulary space with a partially learned SPLADE module. This allows us to perform the candidate generation step in late interaction pipelines with traditional sparse retrieval techniques, making it particularly appealing for running ColBERT in CPU environments. Our SPLATE ColBERTv2 pipeline achieves the same effectiveness as the PLAID ColBERTv2 engine by re-ranking 50 documents that can be retrieved under 10ms.	ColBERT引入的晚期交互范式在神经信息检索领域中脱颖而出，在多个基准测试中提供了引人注目的效果与效率权衡。高效的晚期交互检索基于一种优化的多步策略，首先通过近似搜索识别一组候选文档，然后进行精确重排序。在这项工作中，我们提出了SPLATE，这是对ColBERTv2模型的一个简单而轻量级的适应，它学习一个“MLM适配器”，将冻结的令牌嵌入映射到一个由部分学习的SPLADE模块构成的稀疏词汇空间。这使得我们能够在晚期交互管道中使用传统的稀疏检索技术执行候选生成步骤，特别适合在CPU环境中运行ColBERT。我们的SPLATE ColBERTv2管道通过重排序50个可以在10毫秒内检索到的文档，实现了与PLAID ColBERTv2引擎相同的效果。	code	0
The Surprising Effectiveness of Rankers trained on Expanded Queries	Abhijit Anand, Venktesh V, Vinay Setty, Avishek Anand	TU Delft; L3S Research Institute; University of Stavanger	A significant challenge in text-ranking systems is handling hard queries that form the tail end of the query distribution. Difficulty may arise due to the presence of uncommon, underspecified, or incomplete queries. In this work, we improve the ranking performance of hard or difficult queries while maintaining the performance of other queries. Firstly, we do LLM-based query enrichment for training queries using relevant documents. Next, a specialized ranker is fine-tuned only on the enriched hard queries instead of the original queries. We combine the relevance scores from the specialized ranker and the base ranker, along with a query performance score estimated for each query. Our approach departs from existing methods that usually employ a single ranker for all queries, which is biased towards easy queries, which form the majority of the query distribution. In our extensive experiments on the DL-Hard dataset, we find that a principled query performance based scoring method using base and specialized ranker offers a significant improvement of up to 48.4% on the document ranking task and up to 25% on the passage ranking task compared to the baseline performance of using original queries, even outperforming the SOTA model.	文本排序系统中的一个重要挑战是如何处理构成查询分布尾部的困难查询。困难的出现可能是由于存在不常见、未明确指定或不完整的查询。在这项工作中，我们提升了困难查询的排序性能，同时保持了其他查询的性能。首先，我们使用相关文档对训练查询进行基于大语言模型（LLM）的查询丰富。接着，我们仅在丰富后的困难查询上微调了一个专门的排序器，而非原始查询。我们将来自专门排序器和基础排序器的相关性分数结合起来，并结合每个查询的查询性能分数进行评估。我们的方法不同于现有通常为所有查询使用单一排序器的方法，这些方法偏向于占查询分布大多数的简单查询。在我们对DL-Hard数据集的广泛实验中，我们发现，基于查询性能的评分方法结合基础和专门排序器，在文档排序任务上实现了高达48.4%的显著改进，在段落排序任务上实现了高达25%的改进，相比于使用原始查询的基线性能，甚至超过了当前最先进的模型。	code	0
Turbo-CF: Matrix Decomposition-Free Graph Filtering for Fast Recommendation	JinDuk Park, YongMin Shin, WonYong Shin	Yonsei University	A series of graph filtering (GF)-based collaborative filtering (CF) showcases state-of-the-art performance on the recommendation accuracy by using a low-pass filter (LPF) without a training process. However, conventional GF-based CF approaches mostly perform matrix decomposition on the item-item similarity graph to realize the ideal LPF, which results in a non-trivial computational cost and thus makes them less practical in scenarios where rapid recommendations are essential. In this paper, we propose Turbo-CF, a GF-based CF method that is both training-free and matrix decomposition-free. Turbo-CF employs a polynomial graph filter to circumvent the issue of expensive matrix decompositions, enabling us to make full use of modern computer hardware components (i.e., GPU). Specifically, Turbo-CF first constructs an item-item similarity graph whose edge weights are effectively regulated. Then, our own polynomial LPFs are designed to retain only low-frequency signals without explicit matrix decompositions. We demonstrate that Turbo-CF is extremely fast yet accurate, achieving a runtime of less than 1 second on real-world benchmark datasets while achieving recommendation accuracies comparable to best competitors.	一系列基于图滤波（Graph Filtering, GF）的协同过滤（Collaborative Filtering, CF）方法通过使用无训练过程的低通滤波器（Low-Pass Filter, LPF）展示了最先进的推荐准确性。然而，传统的基于GF的CF方法大多通过对物品-物品相似度图进行矩阵分解来实现理想的LPF，这导致计算成本高昂，从而使得这些方法在需要快速推荐的情况下不太实用。本文提出了一种名为Turbo-CF的基于GF的CF方法，该方法既无需训练也无需矩阵分解。Turbo-CF采用多项式图滤波器来绕过高成本的矩阵分解问题，使我们能够充分利用现代计算机硬件组件（如GPU）。具体而言，Turbo-CF首先构建了一个边权重得到有效调节的物品-物品相似度图，然后设计了自有的多项式LPF来仅保留低频信号，而无需显式的矩阵分解。我们证明，Turbo-CF在速度极快的同时仍保持高准确性，在真实世界基准数据集上的运行时间不到1秒，同时达到了与最佳竞争对手相当的推荐准确性。	code	0
Unifying Graph Retrieval and Prompt Tuning for Graph-Grounded Text Classification	Le Dai, Yu Yin, Enhong Chen, Hui Xiong	University of Science and Technology of China Department of Computer Science and Technology; The Hong Kong University of Science and Technology (Guangzhou); University of Science and Technology of China; University of Science and Technology of China School of Data Science	Text classification has long been researched as a fundamental problem in information retrieval. Since text data are frequently connected with graph structures, this opens new possibilities for more accurate and explainable classification. One common approach to this graph-text integration is to consider text as graph attributes and utilize GNNs to conduct a node classification task. While both text and graph data are modeled, GNNs treat text in a rather coarse-grained way, have limitations in preserving the detailed structures of a graph, and are less robust to graph sparsity. In this paper, we propose to take an alternative perspective instead, viewing the graph as the context of texts, inspired by retrieval augmented generation. We propose a novel framework called Graph Retrieval Prompt Tuning (GRPT), consisting of a Graph Retrieval Module and a Prompt Tuning Module integrated with graph context. For graph retrieval, two retrieval strategies are designed to retrieve node context and path context, preserving both node proximity and detailed connectivity patterns. Extensive experiments on four real-world datasets show the effectiveness of our framework in both standard supervised and sparse settings.	长期以来，文本分类一直被视为信息检索中的一个基础问题。由于文本数据经常与图结构相关联，这为更准确和可解释的分类带来了新的可能性。一种常见的图-文整合方法是将文本视为图的属性，并利用图神经网络（GNN）进行节点分类任务。然而，尽管文本和图数据都被建模，GNN在处理文本时较为粗粒度，难以保留图的详细结构，并且在图的稀疏性方面表现不够稳健。本文中，我们提出了一种替代视角，即将图视为文本的上下文，这一灵感来源于检索增强生成。我们提出了一种名为图检索提示调优（Graph Retrieval Prompt Tuning, GRPT）的新框架，该框架包括一个图检索模块和一个与图上下文集成的提示调优模块。对于图检索，我们设计了两种检索策略来检索节点上下文和路径上下文，既保留了节点接近性，又保留了详细的连接模式。在四个真实世界数据集上的广泛实验表明，我们的框架在标准监督和稀疏设置下均表现出色。	code	0
Weighted KL-Divergence for Document Ranking Model Refinement	Yingrui Yang, Yifan Qiao, Shanxiu He, Tao Yang		Transformer-based retrieval and reranking models for text document search are often refined through knowledge distillation together with contrastive learning. A tight distribution matching between the teacher and student models can be hard as over-calibration may degrade training effectiveness when a teacher does not perform well. This paper contrastively reweights KL divergence terms to prioritize the alignment between a student and a teacher model for proper separation of positive and negative documents. This paper analyzes and evaluates the proposed loss function on the MS MARCO and BEIR datasets to demonstrate its effectiveness in improving the relevance of tested student models.	基于Transformer的文本文档检索与重排序模型通常通过知识蒸馏与对比学习进行优化。当教师模型表现不佳时，过度校准可能导致训练效果下降，从而使得教师模型与学生模型之间的紧密分布匹配变得困难。本文通过对比重加权KL散度项，优先考虑学生模型与教师模型之间的对齐，以实现正负文档的适当分离。本文在MS MARCO和BEIR数据集上分析并评估了所提出的损失函数，证明了其在提高测试学生模型相关性方面的有效性。	code	0
Using Large Language Models for Math Information Retrieval	Behrooz Mansouri, Reihaneh Maarefdoust	University of Southern Maine	Large language models, such as Orca-2, have demonstrated notable problem-solving abilities in mathematics. However, their potential to enhance math information retrieval remains largely unexplored. This paper investigates the use of two large language models, LLaMA-2 and Orca-2 for three tasks in math information retrieval. First, the study explores the use of these models for relevance assessment, evaluating the relevance of answers to math questions. Then, the application of these models for math data augmentation is studied. Using the existing math information retrieval test collection, ARQMath, answers of different relevance degrees are generated for each topic. These answers are then used for fine-tuning a cross-encoder re-ranker and are compared against fine-tuning with answers that are manually labeled. Finally, the use of these models for ranking candidate answers to math questions is explored. The experimental results indicate that, while these models may not be effective for relevance assessment and ranking tasks, Orca-2 can be a valuable resource for math data augmentation.	大型语言模型，如Orca-2，在数学问题的解决能力上已显示出显著的效果。然而，它们在提升数学信息检索方面的潜力仍未得到充分探索。本文研究了使用两个大型语言模型——LLaMA-2和Orca-2——在数学信息检索中的三个任务。首先，研究探讨了这些模型用于相关性评估的情况，评估了回答数学问题的答案的相关性。接着，研究了这些模型在数学数据增强中的应用。利用现有的数学信息检索测试集合ARQMath，为每个主题生成了不同相关程度的答案。然后，这些答案被用于微调一个交叉编码器重新排序器，并与使用手动标记的答案进行微调的结果进行比较。最后，探讨了这些模型在为数学问题排序候选答案中的应用。实验结果表明，尽管这些模型在相关性评估和排序任务中可能效果不佳，但Orca-2在数学数据增强方面可以成为一个宝贵的资源。	code	0
A Question-Answering Assistant over Personal Knowledge Graph	Lingyuan Liu, Huifang Du, Xiaolian Zhang, Mengying Guo, Haofen Wang, Meng Wang	Southeast University Southeast University-Monash University Joint Graduate School; Huawei Technologies Co. Ltd.; Tongji University	We develop a Personal Knowledge Graph Question-Answering (PKGQA) assistant, seamlessly integrating information from multiple mobile applications into a unified and user-friendly query interface to offer users convenient information retrieval and personalized knowledge services. Based on a fine-grained schema customized for PKG, the PKGQA system in this paper comprises Symbolic Semantic Parsing, Frequently Asked Question (FAQ) Semantic Matching, and Neural Semantic Parsing modules, which are designed to take into account both accuracy and efficiency. The PKGQA system achieves high accuracy on the constructed dataset and demonstrates good performance in answering complex questions. Our system is implemented through an Android application, which is shown in https://youtu.be/p732U5KPEq4.	我们开发了一个个人知识图谱问答（PKGQA）助手，该助手能够无缝整合来自多个移动应用程序的信息，并将其集成到一个统一且用户友好的查询界面中，为用户提供便捷的信息检索和个性化知识服务。基于为PKG定制的细粒度模式，本文中的PKGQA系统包括符号语义解析、常见问题（FAQ）语义匹配和神经语义解析模块，这些模块设计时兼顾了准确性和效率。PKGQA系统在构建的数据集上实现了高准确性，并在回答复杂问题方面表现出色。我们的系统通过一个Android应用程序实现，展示视频可在https://youtu.be/p732U5KPEq4查看。	code	0
ConvLogRecaller: Real-Time Conversational Lifelog Recaller	YuanChi Lee, AnZi Yen, HenHsen Huang, HsinHsi Chen	Institute of Information Science, Academia Sinica; National Taiwan University; National Yang Ming Chiao Tung University	The popularization of networks fosters the convenience of communication. People can easily share their life experiences and thoughts with relatives and friends via instant messaging software. As time passes, individuals may forget certain details of life events, leading to difficulties in effectively communicating with others. The propensity of individuals to forget or mix up life events highlights the importance of services aimed at retrieving information about past experiences. This paper presents a conversational information recall system, ConvLogRecaller, which proactively supports real-time memory recall assistance during online conversations. Given a conversation of the user with others, ConvLogRecaller suggests a message if the user forgets the details of the life experiences. The services provided by our system can avoid hesitations or memory lapses that might hinder the efficiency of a conversation.	网络的普及促进了沟通的便利性。人们可以通过即时通讯软件轻松地与亲友分享生活经历和思想。然而，随着时间的推移，个人可能会忘记生活事件的某些细节，从而导致与他人有效沟通的困难。人们倾向于忘记或混淆生活事件的倾向突显了检索过去经历信息服务的重要性。本文介绍了一个对话式信息回忆系统ConvLogRecaller，该系统在在线对话中主动支持实时记忆回忆辅助。在用户与其他人进行对话时，如果用户忘记了生活经历的细节，ConvLogRecaller会提供一条建议消息。本系统提供的服务可以避免因犹豫或记忆缺失而可能阻碍对话效率的情况。	code	0
CLIP-Branches: Interactive Fine-Tuning for Text-Image Retrieval	Christian Lülf, Denis Mayr Lima Martins, Marcos Antonio Vaz Salles, Yongluan Zhou, Fabian Gieseke	University of Copenhagen; Independent Researcher; University of Münster	The advent of text-image models, most notably CLIP, has significantly transformed the landscape of information retrieval. These models enable the fusion of various modalities, such as text and images. One significant outcome of CLIP is its capability to allow users to search for images using text as a query, as well as vice versa. This is achieved via a joint embedding of images and text data that can, for instance, be used to search for similar items. Despite efficient query processing techniques such as approximate nearest neighbor search, the results may lack precision and completeness. We introduce CLIP-Branches, a novel text-image search engine built upon the CLIP architecture. Our approach enhances traditional text-image search engines by incorporating an interactive fine-tuning phase, which allows the user to further concretize the search query by iteratively defining positive and negative examples. Our framework involves training a classification model given the additional user feedback and essentially outputs all positively classified instances of the entire data catalog. By building upon recent techniques, this inference phase, however, is not implemented by scanning the entire data catalog, but by employing efficient index structures pre-built for the data. Our results show that the fine-tuned results can improve the initial search outputs in terms of relevance and accuracy while maintaining swift response times.	文本-图像模型的出现，尤其是CLIP，极大地改变了信息检索的格局。这些模型能够融合多种模态，如文本和图像。CLIP的一个重要成果是它允许用户使用文本作为查询来搜索图像，反之亦然。这是通过图像和文本数据的联合嵌入实现的，例如，可以用于搜索相似的项目。尽管有近似最近邻搜索等高效的查询处理技术，但结果可能缺乏精确性和完整性。我们引入了CLIP-Branches，这是一种基于CLIP架构的新型文本-图像搜索引擎。我们的方法通过引入交互式微调阶段来增强传统的文本-图像搜索引擎，该阶段允许用户通过迭代定义正例和负例来进一步具体化搜索查询。我们的框架涉及在给定额外用户反馈的情况下训练分类模型，并基本上输出整个数据目录中所有正分类的实例。通过借鉴最近的技术，这一推理阶段并非通过扫描整个数据目录来实现，而是通过为数据预建的高效索引结构来实现。我们的结果表明，微调后的结果可以在保持快速响应时间的同时，提高初始搜索输出的相关性和准确性。	code	0
Img2Loc: Revisiting Image Geolocalization using Multi-modality Foundation Models and Image-based Retrieval-Augmented Generation	Zhongliang Zhou, Jielu Zhang, Zihan Guan, Mengxuan Hu, Ni Lao, Lan Mu, Sheng Li, Gengchen Mai	Google Inc; University of Virginia; University of Georgia	Geolocating precise locations from images presents a challenging problem in computer vision and information retrieval. Traditional methods typically employ either classification (dividing the Earth's surface into grid cells and classifying images accordingly) or retrieval (identifying locations by matching images with a database of image-location pairs). However, classification-based approaches are limited by the cell size and cannot yield precise predictions, while retrieval-based systems usually suffer from poor search quality and inadequate coverage of the global landscape at varied scales and aggregation levels. To overcome these drawbacks, we present Img2Loc, a novel system that redefines image geolocalization as a text generation task. This is achieved using cutting-edge large multi-modality models (LMMs) like GPT-4V or LLaVA with retrieval-augmented generation. Img2Loc first employs CLIP-based representations to generate an image-based coordinate query database. It then uniquely combines query results with the image itself, forming elaborate prompts customized for LMMs. When tested on benchmark datasets such as Im2GPS3k and YFCC4k, Img2Loc not only surpasses the performance of previous state-of-the-art models but does so without any model training. A video demonstration of the system can be accessed via this link: https://drive.google.com/file/d/16A6Amc7AyUoKHRH3_WBRToRC13sn7tU/view?usp=sharing	从图像中定位精确位置在计算机视觉和信息检索领域是一个具有挑战性的问题。传统方法通常采用分类法——将地球表面划分为网格单元并对图像进行相应分类，或检索法——通过将图像与图像-位置对数据库匹配来识别位置。然而，基于分类的方法受限于单元格大小，无法产生精确预测，而基于检索的系统通常搜索质量较差，无法在全球范围内以不同尺度和聚合级别充分覆盖地理景观。为了克服这些缺点，我们提出了Img2Loc，这是一个将图像地理定位重新定义为文本生成任务的新系统。该系统利用GPT-4V或LLaVA等前沿的大型多模态模型（LMMs），结合增强的生成检索技术实现这一目标。Img2Loc首先采用基于CLIP的表示方法生成基于图像的坐标查询数据库。然后，它独特地将查询结果与图像本身相结合，形成专为LMMs定制的复杂提示。在Im2GPS3k和YFCC4k等基准数据集上的测试表明，Img2Loc不仅超越了之前最先进模型的性能，而且无需任何模型训练即可实现这一效果。系统的演示视频可通过以下链接访问：https://drive.google.com/file/d/16A6Amc7AyUoKHRH3_WBRToRC13sn7tU/view?usp=sharing。	code	0
JPEC: A Novel Graph Neural Network for Competitor Retrieval in Financial Knowledge Graphs	Wanying Ding, Manoj Cherukumalli, Santosh Chikoti, Vinay K. Chaudhri	JPMorgan Chase & Co.	Knowledge graphs have gained popularity for their ability to organize and analyze complex data effectively. When combined with graph embedding techniques, such as graph neural networks (GNNs), knowledge graphs become a potent tool in providing valuable insights. This study explores the application of graph embedding in identifying competitors from a financial knowledge graph. Existing state-of-the-art (SOTA) models face challenges due to the unique attributes of our knowledge graph, including directed and undirected relationships, attributed nodes, and minimal annotated competitor connections. To address these challenges, we propose a novel graph embedding model, JPEC (JPMorgan Proximity Embedding for Competitor Detection), which utilizes a graph neural network to learn from both first-order and second-order node proximity together with vital features for competitor retrieval. JPEC outperformed most existing models in extensive experiments, showcasing its effectiveness in competitor retrieval.	知识图谱因其有效组织和分析复杂数据的能力而受到广泛关注。当与图嵌入技术（如图神经网络（GNN））结合时，知识图谱成为提供宝贵见解的强大工具。本研究探讨了图嵌入在从金融知识图谱中识别竞争对手的应用。现有的最先进（SOTA）模型面临挑战，因为我们的知识图谱具有独特的属性，包括有向和无向关系、属性节点以及极少量的标注竞争对手连接。为应对这些挑战，我们提出了一种新颖的图嵌入模型——JPEC（摩根大通竞争对手检测的邻近嵌入），该模型利用图神经网络从一阶和二阶节点邻近性以及竞争对手检索的关键特征中学习。在广泛的实验中，JPEC的表现优于大多数现有模型，展示了其在竞争对手检索中的有效性。	code	0
MACRec: A Multi-Agent Collaboration Framework for Recommendation	Zhefan Wang, Yuanqing Yu, Wendi Zheng, Weizhi Ma, Min Zhang	Tsinghua University Department of Computer Science and Technology; Tsinghua University Institute for AI Industry Research	LLM-based agents have gained considerable attention for their decision-making skills and ability to handle complex tasks. Recognizing the current gap in leveraging agent capabilities for multi-agent collaboration in recommendation systems, we introduce MACRec, a novel framework designed to enhance recommendation systems through multi-agent collaboration. Unlike existing work on using agents for user/item simulation, we aim to deploy multi-agents to tackle recommendation tasks directly. In our framework, recommendation tasks are addressed through the collaborative efforts of various specialized agents, including Manager, User/Item Analyst, Reflector, Searcher, and Task Interpreter, with different working flows. Furthermore, we provide application examples of how developers can easily use MACRec on various recommendation tasks, including rating prediction, sequential recommendation, conversational recommendation, and explanation generation of recommendation results. The framework and demonstration video are publicly available at https://github.com/wzf2000/MACRec.	基于大型语言模型（LLM）的代理因其决策能力和处理复杂任务的能力而受到广泛关注。认识到目前利用代理能力进行推荐系统中多代理协作的不足，我们提出了MACRec，这是一个通过多代理协作来增强推荐系统的新框架。与现有的使用代理进行用户/项目模拟的工作不同，我们的目标是将多代理直接部署来解决推荐任务。在我们的框架中，推荐任务通过各种专门代理的协作努力来解决，这些代理包括管理者、用户/项目分析师、反思者、搜索者和任务解释者，它们具有不同的工作流程。此外，我们提供了开发人员如何轻松地在各种推荐任务中使用MACRec的应用示例，包括评分预测、序列推荐、对话推荐和推荐结果解释生成。该框架和演示视频已在https://github.com/wzf2000/MACRec公开发布。	code	0
ModelGalaxy: A Versatile Model Retrieval Platform	Wenling Zhang, Yixiao Li, Zhaotian Li, Hailong Sun, Xiang Gao, Xudong Liu	buaa	With the growing number of available machine learning models and the emergence of model-sharing platforms, model reuse has become a significant approach to harnessing the power of artificial intelligence. One of the key issues to realizing model reuse resides in efficiently and accurately finding the target models that meet user needs from a model repository. However, the existing popular model-sharing platforms (e.g., Hugging Face) mainly support model retrieval based on model name matching and task filtering. If not familiar with the platform or specific models, users may suffer from low retrieval efficiency and a less user-friendly interaction experience. To address these issues, we have developed ModelGalaxy, a versatile model retrieval platform supporting multiple model retrieval methods, including keyword-based search, dataset-based search, and user-task-centric search. Moreover, ModelGalaxy leverages the power of large language models to provide users with easily retrieving and using models. Our source code is available at https://github.com/zwl906711886/ModelGalaxy.	随着可用机器学习模型数量的增加以及模型共享平台的兴起，模型复用已成为利用人工智能力量的重要途径之一。实现模型复用的关键问题之一在于如何从模型库中高效且准确地找到满足用户需求的模型。然而，现有的主流模型共享平台（如Hugging Face）主要支持基于模型名称匹配和任务过滤的模型检索方式。对于不熟悉平台或特定模型的用户来说，可能会面临检索效率低下和交互体验不佳的问题。为了解决这些问题，我们开发了ModelGalaxy，这是一个支持多种模型检索方法的多功能模型检索平台，包括基于关键词的搜索、基于数据集的搜索以及以用户任务为中心的搜索。此外，ModelGalaxy还利用大型语言模型的能力，为用户提供便捷的模型检索和使用体验。我们的源代码可在https://github.com/zwl906711886/ModelGalaxy获取。	code	0
RAG-Ex: A Generic Framework for Explaining Retrieval Augmented Generation	Viju Sudhi, Sinchana Ramakanth Bhat, Max Rudat, Roman Teucher	Fraunhofer IAIS	Owing to their size and complexity, large language models (LLMs) hardly explain why they generate a response. This effectively reduces the trust and confidence of end users in LLM-based applications, including Retrieval Augmented Generation (RAG) for Question Answering (QA) tasks. In this work, we introduce RAG-Ex, a model- and language-agnostic explanation framework that presents approximate explanations to the users revealing why the LLMs possibly generated a piece of text as a response, given the user input. Our framework is compatible with both open-source and proprietary LLMs. We report the significance scores of the approximated explanations from our generic explainer in both English and German QA tasks and also study their correlation with the downstream performance of LLMs. In the extensive user studies, our explainer yields an F1-score of 76.9% against the end user annotations and attains almost on-par performance with model-intrinsic approaches.	由于其规模和复杂性，大型语言模型（LLMs）几乎不解释它们为何生成某个响应。这实际上降低了终端用户对基于LLM的应用程序（包括用于问答任务的检索增强生成（RAG））的信任和信心。在这项工作中，我们介绍了RAG-Ex，这是一个模型和语言无关的解释框架，它向用户展示近似的解释，揭示LLMs在给定用户输入的情况下，为何可能生成一段文本作为响应。我们的框架兼容开源和专有的LLMs。我们在英语和德语问答任务中报告了我们通用解释器的近似解释的显著性分数，并研究了它们与LLMs下游性能的相关性。在广泛的用户研究中，我们的解释器在与终端用户注释的对比中达到了76.9%的F1分数，并且几乎与模型内在方法表现相当。	code	0
ResumeFlow: An LLM-facilitated Pipeline for Personalized Resume Generation and Refinement	Saurabh Bhausaheb Zinjad, Amrita Bhattacharjee, Amey Bhilegaonkar, Huan Liu	Arizona State University	Crafting the ideal, job-specific resume is a challenging task for many job applicants, especially for early-career applicants. While it is highly recommended that applicants tailor their resume to the specific role they are applying for, manually tailoring resumes to job descriptions and role-specific requirements is often (1) extremely time-consuming, and (2) prone to human errors. Furthermore, performing such a tailoring step at scale while applying to several roles may result in a lack of quality of the edited resumes. To tackle this problem, in this demo paper, we propose ResumeFlow: a Large Language Model (LLM) aided tool that enables an end user to simply provide their detailed resume and the desired job posting, and obtain a personalized resume specifically tailored to that specific job posting in the matter of a few seconds. Our proposed pipeline leverages the language understanding and information extraction capabilities of state-of-the-art LLMs such as OpenAI's GPT-4 and Google's Gemini, in order to (1) extract details from a job description, (2) extract role-specific details from the user-provided resume, and then (3) use these to refine and generate a role-specific resume for the user. Our easy-to-use tool leverages the user-chosen LLM in a completely off-the-shelf manner, thus requiring no fine-tuning. We demonstrate the effectiveness of our tool via a video demo and propose novel task-specific evaluation metrics to control for alignment and hallucination. Our tool is available at https://job-aligned-resume.streamlit.app.	为特定职位打造理想的简历对许多求职者来说是一项艰巨的任务，尤其是对于初入职场的求职者。尽管强烈建议求职者根据所申请的具体职位定制简历，但手动根据职位描述和特定职位要求调整简历通常（1）极其耗时，并且（2）容易出现人为错误。此外，在申请多个职位时大规模进行此类定制步骤可能会导致编辑后的简历质量下降。为了解决这一问题，本文展示了一种名为ResumeFlow的工具：这是一个借助大型语言模型（LLM）的工具，使终端用户只需提供详细的个人简历和目标职位招聘信息，便能在几秒钟内获得一份专门针对该职位的个性化简历。我们提出的流程利用了如OpenAI的GPT-4和Google的Gemini等最先进LLM的语言理解和信息提取能力，以（1）从职位描述中提取细节，（2）从用户提供的简历中提取职位相关细节，然后（3）利用这些信息来优化并生成一份针对特定职位的简历。我们的工具易于使用，完全以即插即用的方式利用用户选择的LLM，无需任何微调。我们通过视频演示展示了该工具的有效性，并提出了新的任务特定评估指标来控制对齐和幻觉现象。该工具可访问 https://job-aligned-resume.streamlit.app。	code	0
ScholarNodes: Applying Content-based Filtering to Recommend Interdisciplinary Communities within Scholarly Social Networks	Md Asaduzzaman Noor, Jason A. Clark, John W. Sheppard	Montana State University; Montana State University, Bozeman, MT, USA; Montana State University Library and Information Science	Detecting communities within dynamic academic social networks and connecting these community detection findings to search and retrieval interfaces presents a multifaceted challenge. We explore an information retrieval method that integrates both partition-based and similarity-based network analysis to identify and recommend communities within content-based datasets. Our prototype "ScholarNodes" web interface bridges the gap between community detection algorithms (Louvain, K-means, Spectral clustering) and the BM25 (Best Matching 25) ranking algorithm within a cohesive user interface. From free-text keyword queries, ScholarNodes recommends collaborations, identifies local and external researcher networks, and visualizes an interdisciplinarity graph for individual researchers using the OpenAlex dataset, a global collection of academic papers and authors. Beyond the specific information retrieval use case, we discuss the broader applicability of the methods to generic social network analysis, community detection, and recommender systems. Additionally, we delve into the technical aspects of generating topical terms, community alignment techniques, and interface design considerations for integrating community detection algorithms into a search experience.	识别动态学术社交网络中的社区，并将这些社区检测结果与搜索和检索界面相连接，是一项多层面的挑战。我们探索了一种信息检索方法，该方法结合了基于分区和基于相似性的网络分析，以在基于内容的语料库中识别和推荐社区。我们的原型“ScholarNodes”网络界面弥合了社区检测算法（Louvain、K-means、谱聚类）与BM25（最佳匹配25）排序算法之间的鸿沟，构建了一个连贯的用户界面。通过自由文本关键词查询，ScholarNodes推荐合作机会，识别本地和外部研究者网络，并利用OpenAlex数据集（一个全球性的学术论文和作者集合）为个别研究者绘制跨学科图谱。除了特定的信息检索应用场景，我们还讨论了这些方法在通用社交网络分析、社区检测和推荐系统中的广泛适用性。此外，我们深入探讨了生成主题术语、社区对齐技术以及将社区检测算法整合到搜索体验中的界面设计考虑等技术细节。	code	0
Synthetic Query Generation using Large Language Models for Virtual Assistants	Sonal Sannigrahi, Thiago FragaSilva, Youssef Oualil, Christophe Van Gysel	Apple; Instituto Superior Técnico	Virtual Assistants (VAs) are important Information Retrieval platforms that help users accomplish various tasks through spoken commands. The speech recognition system (speech-to-text) uses query priors, trained solely on text, to distinguish between phonetically confusing alternatives. Hence, the generation of synthetic queries that are similar to existing VA usage can greatly improve upon the VA's abilities-especially for use-cases that do not (yet) occur in paired audio/text data. In this paper, we provide a preliminary exploration of the use of Large Language Models (LLMs) to generate synthetic queries that are complementary to template-based methods. We investigate whether the methods (a) generate queries that are similar to randomly sampled, representative, and anonymized user queries from a popular VA, and (b) whether the generated queries are specific. We find that LLMs generate more verbose queries, compared to template-based methods, and reference aspects specific to the entity. The generated queries are similar to VA user queries, and are specific enough to retrieve the relevant entity. We conclude that queries generated by LLMs and templates are complementary.	虚拟助手（VAs）是重要的信息检索平台，通过语音指令帮助用户完成各种任务。语音识别系统（语音转文字）使用仅基于文本训练的查询先验来区分语音上易混淆的替代选项。因此，生成与现有VA使用情况相似的合成查询可以极大地提升VA的能力，特别是对于那些在配对的音频/文本数据中尚未出现的用例。本文初步探讨了使用大型语言模型（LLMs）生成与基于模板的生成方法互补的合成查询。我们研究了这些方法是否（a）生成的查询与从流行VA中随机抽样、具有代表性且匿名的用户查询相似，以及（b）生成的查询是否具有特异性。我们发现，与基于模板的方法相比，LLMs生成的查询更为冗长，并引用了与实体相关的特定方面。生成的查询与VA用户查询相似，并且足够具体以检索相关实体。我们得出结论，LLMs和模板生成的查询是互补的。	code	0
A Study on Unsupervised Question and Answer Generation for Legal Information Retrieval and Precedents Understanding	Johny Moreira, Altigran S. da Silva, Edleno Silva de Moura, Leandro Bezerra Marinho	Universidade Federal do Amazonas; Universidade Federal de Campina Grande	Traditional retrieval systems are hardly adequate for Legal Research, mainly because only returning the documents related to a given query is usually insufficient. Legal documents are extensive, and we posit that generating questions about them and detecting the answers provided by these documents help the Legal Research journey. This paper presents a pipeline that relates Legal Questions with documents answering them. We align features generated by Large Language Models with traditional clustering methods to find convergent and divergent answers to the same legal matter. We performed a case study with 50 legal documents on the Brazilian judiciary system. Our pipeline found convergent and divergent answers to 23 major legal questions regarding the case law for daily fines in Civil Procedural Law. The pipeline manual evaluation shows it managed to group diverse similar answers to the same question with an average precision of 0.85. It also managed to detect two divergent legal matters with an average F1 Score of 0.94.	传统的检索系统在法律研究中往往不够充分，主要原因是仅返回与给定查询相关的文档通常是不够的。法律文件篇幅浩繁，我们假设，为这些文件生成问题并检测这些文件所提供的答案，有助于法律研究的过程。本文提出了一种将法律问题与回答这些问题的文档相关联的流程。我们将大型语言模型生成的特征与传统的聚类方法相结合，以找到对同一法律问题的趋同和分歧答案。我们进行了案例研究，使用了50份关于巴西司法系统的法律文件。我们的流程发现了23个主要法律问题，这些问题涉及民事诉讼法中每日罚款的判例法，既有趋同答案也有分歧答案。该流程的手动评估显示，它成功地将多种相似的答案归类到同一问题下，平均精确度为0.85。同时，它还能够检测到两个分歧的法律事项，平均F1得分为0.94。	code	0
Reflections on the Coding Ability of LLMs for Analyzing Market Research Surveys	Shi Zong, Santosh Kolagati, Amit Chaudhary, Josh Seltzer, Jimmy Lin	Nexxt Intelligence; University of Waterloo	The remarkable success of large language models (LLMs) has drawn people's great interest in their deployment in specific domains and downstream applications. In this paper, we present the first systematic study of applying large language models (in our case, GPT-3.5 and GPT-4) for the automatic coding (multi-class classification) problem in market research. Our experimental results show that large language models could achieve a macro F1 score of over 0.5 for all our collected real-world market research datasets in a zero-shot setting. We also provide in-depth analyses of the errors made by the large language models. We hope this study sheds light on the lessons we learn and the open challenges large language models have when adapting to a specific market research domain.	大型语言模型（LLMs）的显著成功引起了人们对其在特定领域和下游应用中部署的极大兴趣。本文首次系统性地研究了将大型语言模型（在我们的案例中是GPT-3.5和GPT-4）应用于市场研究中的自动编码（多类分类）问题。我们的实验结果表明，在零样本设置下，大型语言模型可以在我们收集的所有真实市场研究数据集上实现超过0.5的宏F1分数。我们还深入分析了大型语言模型所犯的错误。我们希望这项研究能够揭示我们在学习过程中获得的启示以及大型语言模型在适应特定市场研究领域时面临的开放挑战。	code	0
Retrieval-Augmented Generation with Knowledge Graphs for Customer Service Question Answering	Zhentao Xu, Mark Jerome Cruz, Matthew Guevara, Tie Wang, Manasi Deshpande, Xiaofeng Wang, Zheng Li	LinkedIn Corporation Senior Machine Learning Engineer; LinkedIn Corporation	In customer service technical support, swiftly and accurately retrieving relevant past issues is critical for efficiently resolving customer inquiries. The conventional retrieval methods in retrieval-augmented generation (RAG) for large language models (LLMs) treat a large corpus of past issue tracking tickets as plain text, ignoring the crucial intra-issue structure and inter-issue relations, which limits performance. We introduce a novel customer service question-answering method that amalgamates RAG with a knowledge graph (KG). Our method constructs a KG from historical issues for use in retrieval, retaining the intra-issue structure and inter-issue relations. During the question-answering phase, our method parses consumer queries and retrieves related sub-graphs from the KG to generate answers. This integration of a KG not only improves retrieval accuracy by preserving customer service structure information but also enhances answering quality by mitigating the effects of text segmentation. Empirical assessments on our benchmark datasets, utilizing key retrieval (MRR, Recall@K, NDCG@K) and text generation (BLEU, ROUGE, METEOR) metrics, reveal that our method outperforms the baseline by 77.6% in MRR and by 0.32 in BLEU. Our method has been deployed within LinkedIn's customer service team for approximately six months and has reduced the median per-issue resolution time by 28.6%.	在客户服务技术支持中，快速准确地检索相关历史问题对于高效解决客户咨询至关重要。传统的检索增强生成（RAG）方法在大语言模型（LLM）中将大量历史问题跟踪票据视为纯文本处理，忽略了问题内部的结构和问题间的关系，这限制了性能。我们提出了一种结合了RAG和知识图谱（KG）的新型客户服务问答方法。我们的方法从历史问题中构建KG用于检索，保留了问题内部的结构和问题间的关系。在问答阶段，我们的方法解析消费者查询并从KG中检索相关子图以生成答案。这种KG的整合不仅通过保留客户服务结构信息提高了检索准确性，还通过减轻文本分割的影响提升了回答质量。在我们基准数据集上的实证评估，使用关键检索（MRR、Recall@K、NDCG@K）和文本生成（BLEU、ROUGE、METEOR）指标，结果显示我们的方法在MRR上比基线高出77.6%，在BLEU上高出0.32。我们的方法已在LinkedIn的客户服务团队中部署了大约六个月，并将每个问题的解决时间中位数减少了28.6%。	code	0
Striking the Right Chord: A Comprehensive Approach to Amazon Music Search Spell Correction	Siddharth Sharma, Shiyun Yang, Ajinkya Walimbe, Tarun Sharma, Joaquin Delgado	Amazon Inc	Music and media search spell correction is distinct as it involves named entities like artist, album and podcast names, keywords from track titles and catchy phrases from lyrics. Users often mix artist names and keywords from track title or lyrics making spell correction highly contextual. Data drift in search queries caused during calendar event days or a newly released music album, brings a unique challenge of quickly adapting to new data points. Scalability of the solution is an essential requirement as the Music catalog is extremely large. In this work, we build a multi-stage framework for spell correction solution for music, media and named entity heavy search engines. We offer contextual spelling suggestions using a generative text transformer model and a mechanism to rapidly adapt to data drift as well as different market needs by using parameter efficient based fine tuning techniques. Furthermore, using a reinforcement learning approach our spell correction system can learn from a user's implicit and explicit feedback in real-time. Some key components of this system are being used in search at Amazon Music and showing significant improvements in customer engagement rate and other relevant metrics.	音乐和媒体搜索的拼写校正有其独特性，因为它涉及艺术家、专辑和播客名称等命名实体，以及来自曲目标题和歌词的关键词和吸引人的短语。用户常常混淆艺术家名称和曲目标题或歌词中的关键词，使得拼写校正高度依赖上下文。搜索查询中的数据漂移，尤其是在日历事件日或新音乐专辑发布时，带来了快速适应新数据点的独特挑战。解决方案的可扩展性是一个基本要求，因为音乐目录极其庞大。在这项工作中，我们构建了一个多阶段的框架，用于音乐、媒体和命名实体密集型搜索引擎的拼写校正解决方案。我们使用生成式文本转换器模型提供上下文拼写建议，并通过基于参数高效微调技术，快速适应数据漂移和不同市场的需求。此外，通过强化学习方法，我们的拼写校正系统能够实时从用户的隐式和显式反馈中学习。该系统的一些关键组件已在亚马逊音乐的搜索中使用，并显著提高了客户参与率和其他相关指标。	code	0
SLH-BIA: Short-Long Hawkes Process for Buy It Again Recommendations at Scale	Rankyung Park, Amit Pande, David Relyea, Pushkar Chennu, Prathyusha Kanmanth Reddy	Target Corporation	Buy It Again (BIA) recommendations are a crucial component in enhancing the customer experience and site engagement for retailers. In this paper, we build a short (S) and long (L) term Hawkes (H) process for each item and use it to obtain BIA recommendations for each customer. The challenges of deploying into a production environment including model scalability, an evolving item catalog, and real-time inference are discussed along with solutions such as model compression, frequency-based item filtering, training data sampling, data parallelization, parallel execution and microservice-based real-time recommendations. We significantly reduced model training time from roughly 250 hours to about 3 hours by applying the solutions, while serving real-time inference with less than 70ms latency. We compare our BIA model against state-of-the-art baselines using three publicly available datasets and provide results from A/B tests with millions of live customers. On 3 public datasets, our model outperforms SOTA baseline models in recall and NDCG metrics by around 85% and 10%, respectively, and in live A/B testing it exhibited more than 30% increase in click-through rate and roughly 30% revenue increase compared to other state of the art models.	"再次购买"（Buy It Again, BIA）推荐是增强零售商客户体验和网站参与度的关键组成部分。本文中，我们为每个商品构建了短期（S）和长期（L）的Hawkes（H）过程，并利用其为每位客户获取BIA推荐。我们讨论了部署到生产环境中的挑战，包括模型可扩展性、不断演变的商品目录以及实时推理，并提出了相应的解决方案，如模型压缩、基于频率的商品过滤、训练数据采样、数据并行化、并行执行以及基于微服务的实时推荐。通过应用这些解决方案，我们将模型训练时间从大约250小时显著减少到约3小时，同时实现了低于70毫秒的实时推理延迟。我们使用三个公开数据集将我们的BIA模型与最先进的基线模型进行了比较，并提供了与数百万实际客户进行的A/B测试结果。在三个公开数据集上，我们的模型在召回率和NDCG指标上分别优于最先进的基线模型约85%和10%，在实际A/B测试中，点击率提高了超过30%，收入增加了约30%，相较于其他最先进的模型。	code	0
Are Embeddings Enough? SIRIP Panel on the Future of Embeddings in Industry IR Systems	Jon Degenhardt, Tracy Holloway King	eBay; Adobe	The IR community as a whole is considering whether search and recommendations can move entirely to embedding-based technologies. This SIRIP panel discusses the future of embedding-based technologies in industry search given its broad range of document types, its specific query types, its performance requirements, and the features that accompany search. The panel comprises long-time industry experts and academics with industry ties. The panelists vary as to whether they believe that the industry in practice will move entirely to embeddings or will remain a hybrid domain.	整个信息检索（IR）社区正在探讨搜索和推荐是否可以完全转向基于嵌入（embedding-based）的技术。本次SIRIP专题讨论会聚焦于嵌入技术在工业搜索中的未来，考虑到其广泛的文档类型、特定的查询类型、性能需求以及伴随搜索的特征。讨论会邀请了长期从事工业界的专家学者，这些专家与学术界有着紧密联系。与会者对于工业界是否会完全转向嵌入技术，还是保持混合模式，持有不同观点。	code	0
Large Language Model Powered Agents for Information Retrieval	An Zhang, Yang Deng, Yankai Lin, Xu Chen, JiRong Wen, TatSeng Chua	Natl Univ Singapore, Singapore, Singapore; Renmin Univ China, Beijing, Peoples R China	The vital goal of information retrieval today extends beyond merely connecting users with relevant information they search for. It also aims to enrich the diversity, personalization, and interactivity of that connection, ensuring the information retrieval process is as seamless, beneficial, and supportive as possible in the global digital era. Current information retrieval systems often encounter challenges like a constrained understanding of queries, static and inflexible responses, limited personalization, and restricted interactivity. With the advent of large language models (LLMs), there's a transformative paradigm shift as we integrate LLM-powered agents into these systems. These agents bring forth crucial human capabilities like memory and planning to make them behave like humans in completing various tasks, effectively enhancing user engagement and offering tailored interactions. In this tutorial, we delve into the cutting-edge techniques of LLM-powered agents across various information retrieval fields, such as search engines, social networks, recommender systems, and conversational assistants. We will also explore the prevailing challenges in seamlessly incorporating these agents and hint at prospective research avenues that can revolutionize the way of information retrieval.	当今信息检索的重要目标不仅限于将用户与他们搜索的相关信息连接起来，更在于丰富这种连接的多样性、个性化和互动性，确保在全球数字化时代中，信息检索过程尽可能无缝、有益和支持性。目前的信息检索系统常常面临一些挑战，如对查询理解的局限性、静态且不灵活的响应、有限的个性化以及受限的互动性。随着大型语言模型（LLMs）的出现，我们正在经历一个变革性的范式转变，即将LLM赋能的代理整合到这些系统中。这些代理带来了关键的人类能力，如记忆和规划，使它们能够在完成各种任务时表现得像人类一样，从而有效提升用户参与度并提供定制化的互动。在本教程中，我们将深入探讨LLM赋能代理在各个信息检索领域的尖端技术，包括搜索引擎、社交网络、推荐系统和对话助手。我们还将探讨无缝整合这些代理所面临的当前挑战，并暗示可能的研究方向，这些方向有望彻底改变信息检索的方式。	code	0
High Recall Retrieval Via Technology-Assisted Review	Lenora Gray, David D. Lewis, Jeremy Pickens, Eugene Yang	Redgrave Data, Chantilly, VA 20151 USA; Johns Hopkins Univ, HLTCOE, Baltimore, MD USA	High Recall Retrieval (HRR) tasks, including eDiscovery in the law, systematic literature reviews, and sunshine law requests, focus on efficiently prioritizing relevant documents for human review. Technology-assisted review (TAR) refers to iterative human-in-the-loop workflows that combine human review with IR and AI techniques to minimize both time and manual effort while maximizing recall. This full-day tutorial provides a comprehensive introduction to TAR. The morning session presents an overview of the key technologies and workflow designs used, the basics of practical evaluation methods, and the social and ethical implications of TAR deployment. The afternoon session provides more technical depth on the implications of TAR workflows for supervised learning algorithm design, how generative AI can be applied in TAR, more sophisticated statistical evaluation techniques, and a wide range of open research questions.	高召回率检索（HRR）任务，包括法律领域的电子发现、系统性文献综述以及阳光法案请求，都致力于高效地优先处理相关文档以供人工审查。技术辅助审查（TAR）指的是结合了人工审查与信息检索（IR）和人工智能（AI）技术的迭代式人机协作工作流程，旨在最小化时间和人力成本的同时最大化召回率。本全天教程全面介绍了TAR。上午的课程概述了关键技术和工作流程设计，介绍了实际评估方法的基础知识，并探讨了TAR部署的社会和伦理影响。下午的课程则深入探讨了TAR工作流程对监督学习算法设计的影响、生成式AI在TAR中的应用、更复杂的统计评估技术，以及一系列开放的研究问题。	code	0
Large Language Models for Recommendation: Past, Present, and Future	Keqin Bao, Jizhi Zhang, Xinyu Lin, Yang Zhang, Wenjie Wang, Fuli Feng	Natl Univ Singapore, Singapore, Singapore; Univ Sci & Technol China, Hefei, Peoples R China	Large language models (LLMs) have significantly influenced recommender systems, spurring interest across academia and industry in leveraging LLMs for recommendation tasks. This includes using LLMs for generative item retrieval and ranking, and developing versatile LLMs for various recommendation tasks, potentially leading to a paradigm shift in the field of recommender systems. This tutorial aims to demystify the Large Language Model for Recommendation (LLM4Rec) by reviewing its evolution and delving into cutting-edge research. We will explore how LLMs enhance recommender systems in terms of architecture, learning paradigms, and functionalities such as conversational abilities, generalization, planning, and content generation. The tutorial will shed light on the challenges and open problems in this burgeoning field, including trustworthiness, efficiency, online training, and evaluation of LLM4Rec. We will conclude by summarizing key learnings from existing studies and outlining potential avenues for future research, with the goal of equipping the audience with a comprehensive understanding of LLM4Rec and inspiring further exploration in this transformative domain.	大型语言模型（LLMs）已显著影响了推荐系统，激发了学术界和工业界利用LLMs进行推荐任务的兴趣。这包括使用LLMs进行生成式项目检索和排序，以及开发多功能的LLMs以应对各种推荐任务，这可能引领推荐系统领域发生范式转变。本教程旨在通过回顾其发展历程并深入探讨前沿研究，揭开大型语言模型在推荐系统（LLM4Rec）中的神秘面纱。我们将探讨LLMs如何从架构、学习范式和功能（如对话能力、泛化能力、规划和内容生成）等方面增强推荐系统。教程还将揭示这一新兴领域中的挑战和开放问题，包括可信度、效率、在线训练以及LLM4Rec的评估。最后，我们将总结现有研究的关键发现，并概述未来研究的潜在方向，旨在为听众提供对LLM4Rec的全面理解，并激发在这一变革性领域的进一步探索。	code	0
Recent Advances in Generative Information Retrieval	Yubao Tang, Ruqing Zhang, Zhaochun Ren, Jiafeng Guo, Maarten de Rijke	Leiden Univ, Leiden, Netherlands; Univ Amsterdam, Amsterdam, Netherlands; Chinese Acad Sci, CAS Key Lab Network Data Sci & Technol, ICT, Beijing, Peoples R China	Generative retrieval (GR) has become a highly active area of information retrieval (IR) that has witnessed significant growth recently. Compared to the traditional “index-retrieve-then-rank” pipeline, the GR paradigm aims to consolidate all information within a corpus into a single model. Typically, a sequence-to-sequence model is trained to directly map a query to its relevant document identifiers (i.e., docids). This tutorial offers an introduction to the core concepts of the GR paradigm and a comprehensive overview of recent advances in its foundations and applications. We start by providing preliminary information covering foundational aspects and problem formulations of GR. Then, our focus shifts towards recent progress in docid design, training approaches, inference strategies, and the applications of GR. We end by outlining remaining challenges and issuing a call for future GR research. This tutorial is intended to be beneficial to both researchers and industry practitioners interested in developing novel GR solutions or applying them in real-world scenarios.	生成式检索（GR）已成为信息检索（IR）领域中一个高度活跃且近期取得显著发展的研究方向。与传统的“索引-检索-然后排序”流程不同，GR范式旨在将语料库中的所有信息整合到一个单一模型中。通常，一个序列到序列的模型被训练用于直接将查询映射到相关的文档标识符（即docids）。本教程旨在介绍GR范式的核心概念，并全面概述其在基础理论和应用方面的最新进展。首先，我们将提供关于GR基础方面和问题表述的初步信息。接着，我们将重点转向docid设计、训练方法、推理策略以及GR应用方面的最新进展。最后，我们将概述当前存在的挑战，并呼吁未来在GR研究方面的努力。本教程旨在对那些有兴趣开发新型GR解决方案或在实际场景中应用GR的研究人员和行业从业者有所裨益。	code	0
Robust Information Retrieval	YuAn Liu, Ruqing Zhang, Jiafeng Guo, Maarten de Rijke	Computer Engineering Department,Eskişehir Technical University,Eskişehir,Turkey; Computer Engineering Department,Muğla Sıtkı Koçman University,Muğla,Turkey	A typical information retrieval (IR) system applies a single retrieval strategy to every information need of users. However, the results of past IR experiments show that a particular retrieval strategy is in general good at fulfilling some types of information needs while failing to fulfil other types, i.e., there is high variation in retrieval effectiveness across information needs. On the other hand, the same results also show that an information need that a particular retrieval strategy failed to fulfil could be fulfilled by one of the other existing retrieval strategies. The challenge here is therefore to determine in advance what retrieval strategy should be applied to which information need. This challenge is related to the robustness of IR systems in retrieval effectiveness. For an IR system, robustness can be defined as fulfilling every information need of users with an acceptable level of satisfaction. Maintaining robustness in retrieval effectiveness is a long-standing challenge and in this article we propose a simple but powerful method as a remedy. The method is a selective approach to index term weighting and for any given query (i.e., information need) it predicts the "best" term weighting model amongst a set of alternatives, on the basis of the frequency distributions of query terms on a target document collection. To predict the best term weighting model, the method uses the Chi-square statistic, the statistic of the Chi-square goodness-of-fit test. The results of the experiments, performed using the official query sets of the TREC Web track and the Million Query track, reveal in general that the frequency distributions of query terms provide relevant information on the retrieval effectiveness of term weighting models. In particular, the results show that the selective approach proposed in this article is, on average, more effective and more robust than the most effective single term weighting model.	典型的信息检索（IR）系统对用户的每一种信息需求都采用单一的检索策略。然而，过去的IR实验结果表明，特定的检索策略通常擅长满足某些类型的信息需求，而对其他类型的信息需求则表现不佳，即检索效果在不同信息需求间存在较大差异。另一方面，同样的结果也显示，某一特定检索策略未能满足的信息需求，可能可以通过其他现有的检索策略来满足。因此，这里的挑战在于预先确定应将哪种检索策略应用于哪种信息需求。这一挑战与IR系统在检索效果上的鲁棒性相关。对于一个IR系统而言，鲁棒性可以定义为以用户可接受的满意程度满足每一种信息需求。维持检索效果的鲁棒性是一个长期存在的挑战，本文中我们提出了一种简单但强大的方法作为补救措施。该方法是一种选择性的索引词权重分配方法，对于任何给定的查询（即信息需求），它基于目标文档集合中查询词的频率分布，从一组备选方案中预测出“最佳”的词权重模型。为了预测最佳的词权重模型，该方法使用了卡方统计量，即卡方拟合优度检验的统计量。通过使用TREC Web轨道和百万查询轨道的官方查询集进行的实验结果表明，查询词的频率分布通常能够提供关于词权重模型检索效果的相关信息。特别是，结果显示，本文提出的选择性方法在平均效果和鲁棒性方面，均优于最有效的单一词权重模型。	code	0
IR-RAG @ SIGIR24: Information Retrieval's Role in RAG Systems	Fabio Petroni, Federico Siciliano, Fabrizio Silvestri, Giovanni Trappolini	Sapienza Univ Rome, Rome, Italy; Samaya AI, London, England	In recent years, Retrieval Augmented Generation (RAG) systems have emerged as a pivotal component in the field of artificial intelligence, gaining significant attention and importance across various domains. These systems, which combine the strengths of information retrieval and generative models, have shown promise in enhancing the capabilities and performance of machine learning applications. However, despite their growing prominence, RAG systems are not without their limitations and continue to be in need of exploration and improvement. This workshop seeks to focus on the critical aspect of information retrieval and its integral role within RAG frameworks. We argue that current efforts have undervalued the role of Information Retrieval (IR) in the RAG and have concentrated their attention on the generative part. As the cornerstone of these systems, IR's effectiveness dramatically influences the overall performance and outcomes of RAG models. We call for papers that will seek to revisit and emphasize the fundamental principles underpinning RAG systems. At the end of the workshop, we aim to have a clearer understanding of how robust information retrieval mechanisms can significantly enhance the capabilities of RAG systems. The workshop will serve as a platform for experts, researchers, and practitioners. We intend to foster discussions, share insights, and encourage research that underscores the vital role of Information Retrieval in the future of generative systems.	
近年来，检索增强生成（RAG）系统在人工智能领域崭露头角，成为跨多个领域备受关注和重视的关键组成部分。这些系统结合了信息检索与生成模型的优势，展现出提升机器学习应用能力和性能的潜力。然而，尽管其日益重要，RAG系统仍存在局限性，亟需进一步探索和改进。本次研讨会旨在聚焦信息检索这一关键方面及其在RAG框架中的核心作用。我们主张，当前的研究低估了信息检索（IR）在RAG中的作用，并将重点过多地放在生成部分。作为这些系统的基石，IR的有效性极大地影响着RAG模型的整体性能和结果。我们呼吁提交论文，重新审视并强调支撑RAG系统的基本原理。研讨会结束时，我们期望能更清晰地理解稳健的信息检索机制如何显著增强RAG系统的能力。研讨会将作为专家、研究人员和从业者的交流平台，旨在促进讨论、分享见解，并鼓励强调信息检索在未来生成系统中关键作用的研究。	code	0
A Predictive Framework for Query Reformulation	Reyhaneh Goli	The University of Melbourne	Web search services are widely employed for various purposes. After identifying information needs, users attempt to articulate them in web queries that express their intentions. Then, they submit these queries to the chosen search engine with the hope of obtaining relevant results to meet their needs. In some cases, users may not immediately find precisely what they are seeking, prompting them to rewrite the query to obtain a greater number of relevant results or results that are perhaps more related to their intent. While significant work has been done on developing features such as query auto-completion, query suggestion, and query recommendation, the majority of these efforts were based on query co-occurrence or query similarity by clustering them or constructing query flow graphs to capture query connections. These approaches operate under the assumption that frequently observed follow-up queries are more likely to be submitted by users [1, 2, 4]. In this research, we investigate user query reformulation behavior. To achieve this, we will utilize the Trip Click dataset, a large-scale collection of user click data within the context of a health web search engine [3]. The log data from 2018 to 2020 will be considered, comprising 1,803,493 records representing the clicks that occurred across 527,749 sessions. Specifically, the focus will be on the impact of user interactions with the search result page when forming subsequent queries.	网络搜索服务被广泛应用于各种目的。在识别信息需求后，用户尝试将这些需求表述为网络查询，以表达他们的意图。然后，他们将这些查询提交给所选的搜索引擎，期望获得相关的结果以满足他们的需求。在某些情况下，用户可能无法立即找到他们所寻求的确切内容，从而促使他们重写查询以获取更多相关结果或更符合其意图的结果。尽管在开发查询自动完成、查询建议和查询推荐等功能方面已经取得了显著进展，但大多数这些努力都是基于查询共现或查询相似性，通过聚类或构建查询流图来捕捉查询之间的关联。这些方法的前提是，频繁观察到的后续查询更可能被用户提交[1, 2, 4]。在本研究中，我们探讨了用户查询重构行为。为此，我们将利用Trip Click数据集，这是一个大规模的健康网络搜索引擎用户点击数据集合[3]。我们将考虑2018年至2020年的日志数据，包含1,803,493条记录，代表在527,749个会话中发生的点击。具体而言，我们将重点研究用户与搜索结果页面交互对形成后续查询的影响。	code	0
Multimodal Representation and Retrieval [MRR 2024]	Xinliang Zhu, Arnab Dhua, Douglas Gray, I. Zeki Yalniz, Tan Yu, Mohamed Elhoseiny, Bryan Plummer	Nvidia, Santa Clara, CA USA; Meta, Menlo Pk, CA USA; Amazon, Palo Alto, CA 94303 USA; Boston Univ, Boston, MA USA; King Abdullah Univ Sci & Technol, Thuwal, Saudi Arabia	Multimodal data is available in many applications like e-commerce production listings, social media posts and short videos. However, existing algorithms dealing with those types of data still focus on uni-modal representation learning by vision-language alignment and cross-modal retrieval. In this workshop, we target to bring a new retrieval problem where both queries and documents are multimodal. With the popularity of vision language modeling, large language models (LLMs), retrieval augmented generation (RAG), and multimodal LLM, we see a lot of new opportunities for multimodal representation and retrieval tasks. This event will be a comprehensive half-day workshop focusing on the subject of multimodal representation and retrieval. The agenda includes keynote speeches, oral presentations, and an interactive panel discussion.	多模态数据在电子商务产品列表、社交媒体帖子和短视频等许多应用中都存在。然而，现有的处理这些数据的算法仍然主要关注通过视觉-语言对齐和跨模态检索的单模态表示学习。在本次研讨会中，我们旨在引入一个新的检索问题，即查询和文档都是多模态的。随着视觉语言建模、大型语言模型（LLMs）、检索增强生成（RAG）和多模态LLM的普及，我们看到多模态表示和检索任务中涌现出许多新的机会。此次活动将是一个全面的多模态表示和检索主题的半天研讨会。议程包括主题演讲、口头报告和互动小组讨论。	code	0
Axiomatic Guidance for Efficient and Controlled Neural Search	Andrew Parry	University of Glasgow	Pre-trained language models based on the transformer architecture provide solutions to general ad-hoc search tasks, ranging from news search to question-answering, vastly outperforming statistical approaches in terms of both precision and recall. These models operate over "semantics", removing the need for bespoke features based on proprietary data (e.g., interaction logs). In doing so, this paradigm may lead to further adoption of the idealised "end-to-end" retrieval system as an elegant and powerful search solution. However, outside of sanitised benchmarks, these models present exploitable and untrustworthy biases, relinquishing any control over inference due to their black-box nature. Such biases threaten the viability of neural models in production. Without greater control over model output, stakeholders could raise concerns hindering the adoption of effective and efficient search. Today, feature-based search systems are still performant relative to state-of-the-art neural search and can adapt to a changing corpus and the needs of system stakeholders. As agency over information access is further reduced via emerging paradigms such as Retrieval-Augmented-Generation, we must retain control over the output of a search system. We consider that bias in neural search systems is an artefact of the training and underlying mechanisms of current pre-trained models but is not present in statistical models. Features such as statistical models are principled and arbitrarily controllable; these features can adapt to a corpus and meet the demands of a given search task. Conversely, the output of a current neural system can only be changed by post hoc constraints or by re-training the underlying model.
We posit that by allowing external features to influence the semantic interactions within neural search at inference time, we can not only allow control over system output but reduce the need to model corpus-specific priors, which can instead be modelled by external features, allowing for greater generalisation and training efficiency gains. We aim to reduce the complexity of neural ranker training and inference, applying classical IR principles and systems that align with such principles as a generalisable process as opposed to the ad-hoc constraint of prior work. Such an approach can reduce the need for larger models whilst improving generalisation. Axiomatic signals can guide and control neural ranking models to reduce spurious factors in semantic relevance estimation by compensating for the frozen priors of neural systems whilst still operating over flexible latent space. Given the biases observed in current systems, this may satiate the concerns of multiple stakeholders, leading to broader adoption of the paradigm.	基于Transformer架构的预训练语言模型为各种即席搜索任务（从新闻搜索到问答）提供了解决方案，在精确度和召回率方面远超统计方法。这些模型在“语义”层面上运作，消除了对基于专有数据（如交互日志）的定制特征的需求。这种范式可能会进一步促进理想化的“端到端”检索系统作为优雅而强大的搜索解决方案的采用。然而，在非规范化的基准测试之外，这些模型存在可利用且不可信的偏见，由于其黑箱特性，放弃了任何对推理的控制。这些偏见威胁到神经模型在实际生产中的可行性。如果没有对模型输出的更大控制，利益相关者可能会提出担忧，阻碍有效和高效搜索的采用。目前，基于特征的搜索系统相对于最先进的神经搜索仍然表现出色，并且能够适应不断变化的语料库和系统利益相关者的需求。随着信息访问权通过诸如检索增强生成等新兴范式进一步减少，我们必须保留对搜索系统输出的控制。我们认为，神经搜索系统中的偏见是当前预训练模型训练及其底层机制的产物，但并非存在于统计模型中。统计模型的特征具有原则性且可任意控制；这些特征能够适应语料库并满足特定搜索任务的需求。相反，当前神经系统的输出只能通过事后约束或重新训练底层模型来改变。我们提出，通过允许外部特征在推理时影响神经搜索中的语义交互，我们不仅可以控制系统输出，还可以减少对语料库特定先验知识的建模需求，这些先验知识可以由外部特征建模，从而实现更大的泛化性和训练效率提升。我们的目标是降低神经排序器训练和推理的复杂性，应用与经典信息检索原则相一致的系统作为可泛化的过程，而不是先前工作的即席约束。这种方法可以在减少对更大模型的需求的同时提高泛化性。公理信号可以指导和控制神经排序模型，通过补偿神经系统的冻结先验来减少语义相关性估计中的虚假因素，同时仍操作在灵活的潜在空间上。鉴于当前系统中观察到的偏见，这可能会缓解多个利益相关者的担忧，从而促进该范式的更广泛采用。	code	0
Personalized Large Language Models through Parameter Efficient Fine-Tuning Techniques	Marco Braga	University of Milano-Bicocca	Personalization of the search experience according to the users and their context is an important topic in Information Retrieval (IR), studied by the research community for a long time. The IR field has witnessed a transformation with the recent availability of pre-trained Large Language Models. Typically, personalization requires the model to incorporate user-specific information, through the definition of an appropriate prompting or injecting user knowledge into the model and then fine-tuning it. However, using prompting, we do not know where and how much the model is personalizing the output. Furthermore, fine-tuning such systems is computationally expensive: since they are characterized by billions of parameters, the fine-tuning process has introduced profound computational challenges. For these reasons, we propose a novel approach that combines personalization and Parameter Efficient Fine-Tuning methods.	根据用户及其上下文个性化搜索体验是信息检索（IR）领域的一个重要课题，长期以来一直受到研究社区的关注。随着预训练大型语言模型的出现，IR领域经历了重大变革。通常，个性化要求模型整合用户特定的信息，通过定义适当的提示或向模型注入用户知识，然后进行微调。然而，使用提示方法时，我们无法确定模型在何处以及在多大程度上对输出进行了个性化处理。此外，微调这类系统在计算上非常昂贵：由于它们具有数十亿个参数，微调过程引入了巨大的计算挑战。因此，我们提出了一种结合个性化与参数高效微调方法的新方法。	code	0
Towards a Framework for Legal Case Retrieval	Tebo LeburuDingalo	University of Botswana	Legal case reports detail the main points of a decided case and the findings and decisions of the court. The reports are a fundamental source for case law, which requires judges to align their rulings with previous judicial decisions on similar cases [1]. Timely and reliable access to case reports is thus of critical importance to legal practitioners working on a current case, and to laymen interested in the outcome of cases. However, ensuring effective retrieval of previous case reports is still proving a challenge, even with the use of retrieval technologies already proven effective in other Information Retrieval (IR) domains. This has been attributed to factors such as the lack of structure and the length of case report documents, and of the queries formulated to represent an ongoing case for which the reports are being sought [4]. To address these factors we propose an IR framework that focuses on infusing structure into the documents and queries through the identification of legal rhetorical roles such as arguments and facts in the text. Furthermore, we aim to explore the use of selected groupings of these rhetorical roles as representations for the documents and queries. The benefit of using selected content is illustrated in recent research where, for instance, segments of documents such as abstracts, case headers, specific paragraphs, and sentences have been used to build effective legal IR systems. We thus hypothesize that we can attain markedly improved performance when we build a case retrieval system using only a section of a case report or a query, such as arguments or facts. However, in contrast to these studies we posit that utilizing rhetorical role information to extract content will lead to more effective representations that can enhance the performance of case retrieval systems.
The proposed framework will consist of a set of components needed to process both query and case report text to first infuse structure, then extract effective representative content, and finally perform retrieval. To aid the development of the framework, several empirical investigations will be conducted on publicly accessible datasets and on a self-curated test collection derived from Botswana legal case reports. Key research questions to assist in our investigation are as follows: RQ1: Can we successfully detect the implicit elements of a legal text reflecting rhetorical roles significant to legal case documents? RQ2: In comparison to human formulated queries, do whole case queries give better performance? RQ3: Can we improve retrieval performance by only retaining textual units representing specific rhetorical roles from an entire query text (current case)? RQ4: Does indexing only textual units representing specific rhetorical roles from prior case documents improve retrieval performance? RQ5: Do the selected approaches result in performance improvement for our local corpus in terms of precision, recall and user satisfaction? Some preliminary work has been done and published towards investigating the viability of using summaries in legal case retrieval and the identification of rhetorical roles in case documents. We submitted results of a system that utilized expanded summarized queries for an AILA precedent retrieval task competition that outperformed other submissions [2]. Furthermore, our approach that utilized TagCrowd for summarization performed well on a task of statute retrieval [5]. Towards the feasibility of rhetorically labelling legal text we experimented with the fastText classifier for an AILA organized task. While our methods did not attain state-of-the-art, they gave insights into the performance of the different roles and factors that can affect performance in the task [3].
	法律案件报告详细记录了已决案件的要点、法院的判决和决定。这些报告是判例法的基础来源，判例法要求法官在判决时与先前类似案件的司法决定保持一致[1]。因此，法律从业者在处理当前案件时，以及对案件结果感兴趣的非专业人士，及时且可靠地获取案件报告至关重要。然而，即使使用在其他信息检索（IR）领域已证明有效的检索技术，确保有效检索先前的案件报告仍然是一个挑战。这归因于案件报告文档缺乏结构、篇幅冗长以及用于表示正在进行的案件的查询不够精确等因素[4]。为了应对这些因素，我们提出了一种信息检索框架，该框架通过识别文本中的法律修辞角色（如论点和事实）来为文档和查询注入结构。此外，我们旨在探索使用这些修辞角色的选定组合作为文档和查询的表示形式。最近的研究表明，使用文档的某些部分（如摘要、案件标题、特定段落和句子）可以构建有效的法律信息检索系统。因此，我们假设，仅使用案件报告或查询的一部分（如论点或事实）构建案件检索系统，可以显著提升性能。然而，与这些研究不同，我们认为利用修辞角色信息提取内容将产生更有效的表示，从而增强案件检索系统的性能。所提出的框架将包括一系列组件，用于处理查询和案件报告文本，首先注入结构，提取有效的代表性内容，最后执行检索。为了辅助框架的开发，将在公开可用的数据集和从博茨瓦纳法律案件报告中自选的测试集合上进行多项实证调查。关键的研究问题如下：我们能否成功检测反映法律案件报告重要修辞角色的法律文本的隐含元素？与人工制定的查询相比，整体案件查询是否能提供更好的性能？我们能否通过仅保留整个查询文本（当前案件）中代表特定修辞角色的文本单元来提高检索性能？仅索引先前案件文档中代表特定修辞角色的文本单元是否能提高检索性能？所选方法是否能提高我们本地语料库的性能，包括精确度、召回率和用户满意度？已经进行了一些初步工作，并发表了关于在法律案件检索中使用摘要的可行性以及识别案件文档中修辞角色的研究。我们提交了一个利用扩展摘要查询的系统结果，该系统在AILA先例检索任务竞赛中表现优于其他提交[2]。此外，我们使用TagCrowd进行摘要的方法在法令检索任务中表现良好[5]。为了验证法律文本修辞标注的可行性，我们使用fastText分类器进行了AILA组织的任务实验。尽管我们的方法未达到最先进水平，但它们提供了关于不同角色性能以及可能影响任务性能的因素的洞察[3]。	code	0
Rethinking the Evaluation of Dialogue Systems: Effects of User Feedback on Crowdworkers and LLMs	Clemencia Siro, Mohammad Aliannejadi, Maarten de Rijke	University of Amsterdam	In ad-hoc retrieval, evaluation relies heavily on user actions, including implicit feedback. In a conversational setting such signals are usually unavailable due to the nature of the interactions, and, instead, the evaluation often relies on crowdsourced evaluation labels. The role of user feedback in annotators' assessment of turns in a conversational perception has been little studied. We focus on how the evaluation of task-oriented dialogue systems (TDSs) is affected by considering user feedback, explicit or implicit, as provided through the follow-up utterance of a turn being evaluated. We explore and compare two methodologies for assessing TDSs: one includes the user's follow-up utterance and one without. We use both crowdworkers and large language models (LLMs) as annotators to assess system responses across four aspects: relevance, usefulness, interestingness, and explanation quality. Our findings indicate that there is a distinct difference in ratings assigned by both annotator groups in the two setups, indicating user feedback does influence system evaluation. Workers are more susceptible to user feedback on usefulness and interestingness compared to LLMs on interestingness and relevance. User feedback leads to a more personalized assessment of usefulness by workers, aligning closely with the user's explicit feedback. Additionally, in cases of ambiguous or complex user requests, user feedback improves agreement among crowdworkers. These findings emphasize the significance of user feedback in refining system evaluations and suggest the potential for automated feedback integration in future research. We publicly release the annotated data to foster research in this area.	
在临时检索中，评估严重依赖于用户行为，包括隐式反馈。在对话环境中，由于互动的性质，这些信号通常不可用，因此评估往往依赖于众包的评估标签。用户反馈在标注者对对话感知中轮次的评估中的作用尚未得到充分研究。我们关注的是，在考虑用户反馈（无论是显式还是隐式）作为被评估轮次的后续话语时，任务导向对话系统（TDSs）的评估如何受到影响。我们探讨并比较了两种评估TDSs的方法：一种包括用户的后续话语，另一种不包括。我们使用众包工作者和大语言模型（LLMs）作为标注者，从相关性、有用性、有趣性和解释质量四个方面评估系统响应。我们的研究结果表明，在这两种设置中，两组标注者给出的评分存在显著差异，表明用户反馈确实影响了系统评估。在有用性和有趣性方面，工作者比LLMs在有趣性和相关性方面更容易受到用户反馈的影响。用户反馈导致工作者对有用性的评估更具个性化，与用户的显式反馈紧密一致。此外，在用户请求模糊或复杂的情况下，用户反馈提高了众包工作者之间的一致性。这些发现强调了用户反馈在改进系统评估中的重要性，并暗示了未来研究中自动化反馈整合的潜力。我们公开发布了标注数据，以促进该领域的研究。	code	0
General-Purpose User Modeling with Behavioral Logs: A Snapchat Case Study	Qixiang Fang, Zhihan Zhou, Francesco Barbieri, Yozen Liu, Leonardo Neves, Dong Nguyen, Daniel L. Oberski, Maarten W. Bos, Ron Dotsch	Snap Inc.; Utrecht University; Utrecht University and University Medical Center Utrecht; Northwestern University	Learning general-purpose user representations based on user behavioral logs is an increasingly popular user modeling approach. It benefits from easily available, privacy-friendly yet expressive data, and does not require extensive re-tuning of the upstream user model for different downstream tasks. While this approach has shown promise in search engines and e-commerce applications, its fit for instant messaging platforms, a cornerstone of modern digital communication, remains largely uncharted. We explore this research gap using Snapchat data as a case study. Specifically, we implement a Transformer-based user model with customized training objectives and show that the model can produce high-quality user representations across a broad range of evaluation tasks, among which we introduce three new downstream tasks that concern pivotal topics in user research: user safety, engagement and churn. We also tackle the challenge of efficient extrapolation of long sequences at inference time, by applying a novel positional encoding method.	基于用户行为日志学习通用用户表征是一种日益流行的用户建模方法。这种方法得益于易于获取、隐私友好且表达丰富的数据，并且不需要为不同的下游任务对上游用户模型进行广泛的重新调整。尽管这种方法在搜索引擎和电子商务应用中显示出潜力，但其适用于即时通讯平台——现代数字通信的基石——的情况仍大多未被探索。我们利用Snapchat数据作为案例研究，探讨了这一研究空白。具体而言，我们实现了一个基于Transformer的用户模型，并采用了定制的训练目标，展示了该模型能够在广泛的评估任务中生成高质量的用户表征，其中包括我们引入的三个新的下游任务，这些任务关注用户研究中的关键主题：用户安全、参与度和流失。此外，我们还通过应用一种新颖的位置编码方法，解决了推理时高效外推长序列的挑战。	code	0
Neural Passage Quality Estimation for Static Pruning	Xuejun Chang, Debabrata Mishra, Craig Macdonald, Sean MacAvaney	University of Glasgow	Neural networks, especially those that use large, pre-trained language models, have improved search engines in various ways. Most prominently, they can estimate the relevance of a passage or document to a user's query. In this work, we depart from this direction by exploring whether neural networks can effectively predict which of a document's passages are unlikely to be relevant to any query submitted to the search engine. We refer to this query-agnostic estimation of passage relevance as a passage's quality. We find that our novel methods for estimating passage quality allow passage corpora to be pruned considerably while maintaining statistically equivalent effectiveness; our best methods can consistently prune >25% of passages in a corpus, across various retrieval pipelines. Such substantial pruning reduces the operating costs of neural search engines in terms of computing resources, power usage, and carbon footprint, both when processing queries (thanks to a smaller index size) and when indexing (lightweight models can prune low-quality passages prior to the costly dense or learned sparse encoding step). This work sets the stage for developing more advanced neural "learning-what-to-index" methods.	神经网络，尤其是那些使用大规模预训练语言模型的网络，已经在多个方面改进了搜索引擎。最显著的是，它们能够估计一段文本或文档与用户查询的相关性。在这项工作中，我们偏离了这一方向，转而探讨神经网络是否能够有效预测文档中的哪些段落不太可能与搜索引擎接收的任何查询相关。我们将这种与查询无关的段落相关性估计称为段落的质量。我们发现，我们新颖的段落质量估计方法能够在保持统计等效性的同时显著精简段落语料库；我们最优的方法能够在各种检索管道中持续精简超过25%的段落。这种实质性的精简减少了神经搜索引擎在计算资源、电力消耗和碳足迹方面的运营成本——无论是在处理查询时（由于索引规模较小）还是在索引时（轻量级模型可以在昂贵的密集或学习型稀疏编码步骤之前修剪低质量段落）。这项工作为开发更先进的神经“学习索引内容”方法奠定了基础。	code	0
COMI: COrrect and MItigate Shortcut Learning Behavior in Deep Neural Networks	Lili Zhao, Qi Liu, Linan Yue, Wei Chen, Liyi Chen, Ruijun Sun, Chao Song	University of Science and Technology of China; OPPO	Deep Neural Networks (DNNs), despite their notable progress across information retrieval tasks, encounter the issues of shortcut learning and struggle with poor generalization due to their reliance on spurious correlations between features and labels. Current research mainly mitigates shortcut learning behavior using augmentation and distillation techniques, but these methods could be laborious and introduce unwarranted biases. To tackle these, in this paper, we propose COMI, a novel method to COrrect and MItigate shortcut learning behavior. Inspired by the ways students solve shortcuts in educational scenarios, we aim to reduce the model's reliance on shortcuts and enhance its ability to extract underlying information integrated with standard Empirical Risk Minimization (ERM). Specifically, we first design the Correct Habit (CoHa) strategy to retrieve the top-k challenging samples for priority training, which encourages the model to rely less on shortcuts in early training. Then, to extract more meaningful underlying information, the information derived from ERM is separated into task-relevant and task-irrelevant information; the former serves as the primary basis for model predictions, while the latter is considered non-essential. However, within task-relevant information, certain potential shortcuts contribute to overconfident predictions. To mitigate this, we design the Deep Mitigation (DeMi) network with a shortcut margin loss to adaptively control the feature weights of shortcuts and eliminate their influence. Besides, to counteract the unknown shortcut tokens issue in NLP, we adopt the locally interpretable module LIME to help recognize shortcut tokens.
Finally, extensive experiments conducted on NLP and CV tasks demonstrate the effectiveness of COMI, which can perform well on both IID and OOD samples.	深度神经网络（DNNs）虽然在信息检索任务中取得了显著进展，但由于其依赖于特征与标签之间的虚假相关性，面临着捷径学习问题，并难以实现良好的泛化。当前的研究主要通过增强和蒸馏技术来缓解捷径学习行为，但这些方法可能既费力又引入不必要的偏见。为了解决这些问题，本文提出了一种名为COMI的新方法，用于纠正和缓解捷径学习行为。受学生在教育场景中解决捷径的方式启发，我们的目标是减少模型对捷径的依赖，并增强其提取深层信息的能力，同时结合标准的经验风险最小化（ERM）。具体而言，我们首先设计了“纠正习惯”（CoHa）策略，以优先训练最具挑战性的样本，从而在早期训练阶段鼓励模型减少对捷径的依赖。接着，为了提取更有意义的深层信息，我们将从ERM中获得的信息分为任务相关和任务无关信息，前者作为模型预测的主要依据，而后者则被视为非必要的。然而，在任务相关信息中，某些潜在的捷径可能导致过度自信的预测。为此，我们设计了“深度缓解”（DeMi）网络，并结合捷径边际损失，以自适应地控制捷径特征的权重并消除其影响。此外，为了应对自然语言处理（NLP）中未知的捷径词问题，我们采用了局部可解释模块LIME来帮助识别这些捷径词。最后，在NLP和计算机视觉（CV）任务上的广泛实验证明了COMI的有效性，该方法在独立同分布（IID）和非独立同分布（OOD）样本上均表现出色。	code	0
LLM-enhanced Cascaded Multi-level Learning on Temporal Heterogeneous Graphs	Fengyi Wang, Guanghui Zhu, Chunfeng Yuan, Yihua Huang	Nanjing University State Key Laboratory for Novel Software Technology	Learning on temporal heterogeneous graphs (THGs) has attracted substantial attention in applications of information retrieval. Such graphs are ubiquitous in real-world domains like recommender systems and social networks. However, the spatial heterogeneity, rich semantic information, and intricate evolution patterns of THGs make it still difficult to generate high-quality embeddings for graph nodes. In this paper, we focus on two valuable and understudied issues related to THG learning: (a) How to capture the specific evolutionary characteristics of diverse temporal heterogeneous graphs? (b) Due to the heterogeneous nature of the graph, how to capture the unique temporal patterns of different node types? We explore these questions and present our solution by proposing a new method named CasMLN (Cascaded Multi-level Learning Network) for THG learning. Through the multi-level learning structure and aggregation methods specifically designed for different levels, we obtain information of multiple levels and fuse them to improve embedding generation. Additionally, we pioneer the use of large language models (LLMs) in the THG field. By leveraging the universality and powerful capabilities of LLMs, our method introduces LLM-based external knowledge to effectively capture the implicit nature of graphs and node types, which helps to enhance type- and graph-level representations. We evaluate our method on several real-world THG datasets for different downstream tasks. Extensive experimental results show that CasMLN outperforms the state-of-the-art baselines in both accuracy and efficiency.	
学习时间异构图（THG）在信息检索应用中引起了广泛关注。这类图在推荐系统和社交网络等现实领域中无处不在。然而，THG的空间异构性、丰富的语义信息和复杂的演化模式使得生成高质量的图节点嵌入仍然具有挑战性。本文重点研究了与THG学习相关的两个有价值且研究不足的问题：（a）如何捕捉不同时间异构图的特定演化特征？（b）由于图的异构性，如何捕捉不同节点类型的独特时间模式？我们探讨了这些问题，并通过提出一种名为CasMLN（级联多层次学习网络）的新方法来解决THG学习问题。通过多层次学习结构和为不同层次专门设计的聚合方法，我们获取了多层次的信息并将其融合，以改进嵌入生成。此外，我们开创性地在THG领域中使用大型语言模型（LLMs）。通过利用LLMs的通用性和强大能力，我们的方法引入了基于LLM的外部知识，有效地捕捉图和节点类型的隐含性质，从而有助于增强类型和图级别的表示。我们在多个真实世界的THG数据集上评估了我们的方法，用于不同的下游任务。广泛的实验结果表明，CasMLN在准确性和效率方面均优于最先进的基线。	code	0
Self-Improving Teacher Cultivates Better Student: Distillation Calibration for Multimodal Large Language Models	Xinwei Li, Li Lin, Shuai Wang, Chen Qian	Southeast University; Southeast university; Tsinghua University	Multimodal content generation, which leverages visual information to enhance the comprehension of cross-modal understanding, plays a critical role in Multimodal Information Retrieval. With the development of large language models (LLMs), recent research has adopted visual instruction tuning to inject the knowledge of LLMs into downstream multimodal tasks. The high complexity and great demand for resources urge researchers to study efficient distillation solutions to transfer the knowledge from pre-trained multimodal models (teachers) to more compact student models. However, the instruction tuning for knowledge distillation in multimodal LLMs is resource-intensive and capability-restricted. The comprehension of students is highly reliant on the teacher models. To address this issue, we propose a novel Multimodal Distillation Calibration framework (MmDC). The main idea is to generate high-quality training instances that challenge student models to comprehend and prompt the teacher to calibrate the knowledge transferred to students, ultimately cultivating a better student model in downstream tasks. This framework comprises two stages: (1) multimodal alignment and (2) knowledge distillation calibration. In the first stage, parameter-efficient fine-tuning is used to enhance feature alignment between different modalities. In the second stage, we develop a calibration strategy to assess the student model's capability and generate high-quality instances to calibrate knowledge distillation from teacher to student. The experiments on diverse datasets show that our framework efficiently improves the student model's capabilities.
Our 7B-size student model, after three iterations of distillation calibration, outperforms the current state-of-the-art LLaVA-13B model on the ScienceQA and LLaVA Test datasets and also exceeds other strong baselines in a zero-shot setting.	多模态内容生成，通过利用视觉信息来增强跨模态理解，在多模态信息检索中发挥着关键作用。随着大型语言模型（LLMs）的发展，近期研究采用了视觉指令微调来将LLMs的知识注入下游多模态任务。高复杂性和对资源的大量需求促使研究人员研究高效的蒸馏解决方案，将知识从预训练的多模态模型（教师模型）转移到更紧凑的学生模型中。然而，多模态LLMs中的指令微调知识蒸馏资源密集且能力受限，学生模型的理解高度依赖教师模型。为解决这一问题，我们提出了一种新颖的多模态蒸馏校准框架（MmDC）。其主要思想是生成高质量的训练实例，挑战学生模型以理解并促使教师模型校准传递给学生的知识，最终培养出在下游任务中表现更佳的学生模型。该框架包括两个阶段：（1）多模态对齐和（2）知识蒸馏校准。在第一阶段，采用参数高效微调来增强不同模态之间的特征对齐。在第二阶段，我们开发了一种校准策略，评估学生模型的能力并生成高质量实例，以校准从教师到学生的知识蒸馏。在多个数据集上的实验表明，我们的框架有效提升了学生模型的能力。经过三轮蒸馏校准后，我们的7B规模学生模型在ScienceQA和LLaVA测试数据集上超越了当前最先进的LLaVA-13B模型，并在零样本设置下也优于其他强基线模型。	code	0
Deep Automated Mechanism Design for Integrating Ad Auction and Allocation in Feed	Xuejian Li, Ze Wang, Bingqi Zhu, Fei He, Yongkang Wang, Xingxing Wang	Meituan	E-commerce platforms usually present an ordered list, mixed with several organic items and an advertisement, in response to each user's page view request. This list, the outcome of ad auction and allocation processes, directly impacts the platform's ad revenue and gross merchandise volume (GMV). Specifically, the ad auction determines which ad is displayed and the corresponding payment, while the ad allocation decides the display positions of the advertisement and organic items. The prevalent methods of segregating the ad auction and allocation into two distinct stages face two problems: 1) Ad auction does not consider externalities, such as the influence of actual display position and context on ad Click-Through Rate (CTR); 2) The ad allocation, which utilizes the auction-winning ad's payment to determine the display position dynamically, fails to maintain incentive compatibility (IC) for the advertisement. For instance, in the auction stage employing the traditional Generalized Second Price (GSP), even if the winning ad increases its bid, its payment remains unchanged. This implies that the advertisement cannot secure a better position and thus loses the opportunity to achieve higher utility in the subsequent ad allocation stage. Previous research often focused on one of the two stages, neglecting the two-stage problem, which may result in suboptimal outcomes. Therefore, this paper proposes a deep automated mechanism that integrates ad auction and allocation, ensuring both IC and Individual Rationality (IR) in the presence of externalities while maximizing revenue and GMV. The mechanism takes candidate ads and the ordered list of organic items as input. For each candidate ad, several candidate allocations are generated by inserting the ad in different positions of the ordered list of organic items. 
For each candidate allocation, a list-wise model takes the entire allocation as input and outputs the predicted result for each ad and organic item to model the global externalities. Finally, an automated auction mechanism, modeled by deep neural networks, is executed to select the optimal allocation. Consequently, this mechanism simultaneously decides the ranking, payment, and display position of the ad. Furthermore, the proposed mechanism results in higher revenue and GMV than state-of-the-art baselines in offline experiments and online A/B tests.	电子商务平台通常会针对每个用户的页面浏览请求，提供一个有序列表，其中混合了多个有机商品和一个广告。这个列表是广告拍卖和分配过程的结果，直接影响到平台的广告收入和总商品交易额（GMV）。具体来说，广告拍卖决定了展示哪个广告及其相应的支付，而广告分配则决定了广告和有机商品的展示位置。目前将广告拍卖和分配分隔为两个独立阶段的方法面临两个问题：1) 广告拍卖未考虑外部性，如实际展示位置和上下文对广告点击率（CTR）的影响；2) 广告分配使用拍卖胜出的广告支付来动态决定展示位置，无法维持广告的激励兼容性（IC）。例如，在使用传统广义第二价格（GSP）的拍卖阶段，即使胜出的广告提高其出价，其支付仍保持不变，这意味着广告无法获得更好的位置，从而在后续的广告分配阶段失去实现更高效用的机会。以往的研究往往只关注这两个阶段中的一个，忽视了两阶段问题，可能导致次优结果。因此，本文提出了一种深度自动化机制，将广告拍卖和分配整合在一起，在存在外部性的情况下确保IC和个体理性（IR），同时最大化收入和GMV。该机制以候选广告和有机商品的有序列表为输入。对于每个候选广告，通过将广告插入有机商品有序列表的不同位置，生成多个候选分配。对于每个候选分配，一个列表级模型以整个分配为输入，并输出每个广告和有机商品的预测结果，以模拟全局外部性。最后，通过深度神经网络建模的自动化拍卖机制执行，选择最优分配。因此，该机制同时决定了广告的排序、支付和展示位置。此外，在离线实验和在线A/B测试中，所提出的机制在收入和GMV方面均优于最先进的基线。	code	0
TGOnline: Enhancing Temporal Graph Learning with Adaptive Online Meta-Learning	Ruijie Wang, Jingyuan Huang, Yutong Zhang, Jinyang Li, Yufeng Wang, Wanyu Zhao, Shengzhong Liu, Charith Mendis, Tarek F. Abdelzaher	University of Illinois Urbana-Champaign; Zhejiang University; Stanford University; Shanghai Jiao Tong University	Temporal graphs, depicting time-evolving node connections through temporal edges, are extensively utilized in domains where temporal connection patterns are essential, such as recommender systems, financial networks, healthcare, and sensor networks. Despite recent advancements in temporal graph representation learning, performance degradation occurs with periodic collections of new temporal edges, owing to their dynamic nature and newly emerging information. This paper investigates online representation learning on temporal graphs, aiming for efficient updates of temporal models to sustain predictive performance during deployment. Unlike costly retraining or exclusive fine-tuning susceptible to catastrophic forgetting, our approach aims to distill information from previous model parameters and adapt it to newly gathered data. To this end, we propose TGOnline, an adaptive online meta-learning framework, tackling two key challenges. First, to distill valuable knowledge from complex temporal parameters, we establish an optimization objective that determines new parameters, either by leveraging global ones or by placing greater reliance on new data, where global parameters are meta-trained across various data collection periods to enhance temporal generalization. Second, to accelerate the online distillation process, we introduce an edge reduction mechanism that skips new edges lacking additional information and a node deduplication mechanism to prevent redundant computation within training batches on new data. 
Extensive experiments on four real-world temporal graphs demonstrate the effectiveness and efficiency of TGOnline for online representation learning, outperforming 18 state-of-the-art baselines. Notably, TGOnline not only outperforms the commonly utilized retraining strategy but also achieves a significant speedup of ~30x.	描述时间演化节点连接的时态图在多个领域中得到广泛应用，这些领域中时间连接模式至关重要，如推荐系统、金融网络、医疗保健和传感器网络。尽管时态图表示学习的最新进展取得了一些成果，但由于其动态特性和新信息的不断涌现，随着新时态边的周期性收集，性能下降问题依然存在。本文探讨了时态图上的在线表示学习，旨在实现时态模型的高效更新，以在部署期间维持预测性能。与昂贵的重新训练或易受灾难性遗忘影响的专用微调不同，我们的方法旨在从前模型参数中提取信息，并将其适应于新收集的数据。为此，我们提出了TGOnline，一种自适应的在线元学习框架，解决了两个关键挑战。首先，为了从复杂的时态参数中提取有价值的知识，我们建立了一个优化目标，用于确定新参数，既可以利用全局参数，也可以更依赖于新数据，其中全局参数在不同数据收集周期进行元训练，以增强时态泛化能力。其次，为了加速在线提取过程，我们引入了一种边减少机制，跳过缺乏附加信息的新边，以及一种节点去重机制，以防止在新数据训练批次中的冗余计算。在四个真实世界的时态图上的广泛实验证明了TGOnline在在线表示学习中的有效性和高效性，超越了18种最先进的基线方法。值得注意的是，TGOnline不仅优于常用的重新训练策略，还实现了约30倍的显著加速。	code	0
Intent Distribution based Bipartite Graph Representation Learning	Haojie Li, Wei Wei, Guanfeng Liu, Jinhuan Liu, Feng Jiang, Junwei Du	School of Information Science & Technology, Qingdao University of Science and Technology; Macquarie University; College of Data Science, Qingdao University of Science and Technology	Bipartite graph representation learning embeds users and items into a low-dimensional latent space based on observed interactions. Previous studies mainly fall into two categories: one reconstructs the structural relations of the graph through the representations of nodes, while the other aggregates neighboring node information using graph neural networks. However, existing methods only explore the local structural information of nodes during the learning process. This makes it difficult to represent the macroscopic structural information and leaves it easily affected by data sparsity and noise. To address this issue, we propose the Intent Distribution based Bipartite graph Representation learning (IDBR) model, which explicitly integrates node intent distribution information into the representation learning process. Specifically, we obtain node intent distributions through clustering and design an intent distribution based graph convolution neural network to generate node representations. Compared to traditional methods, we expand the scope of node representations, enabling us to obtain more comprehensive representations of global intent. When constructing the intent distributions, we effectively alleviated the issues of data sparsity and noise. Additionally, we enrich the representations of nodes by integrating potential neighboring nodes from both structural and semantic dimensions. Experiments on the link prediction and recommendation tasks illustrate that the proposed approach outperforms existing state-of-the-art methods. The code of IDBR is available at https://github.com/rookitkitlee/IDBR.	
二部图表示学习通过观察到的交互作用将用户和物品嵌入到一个低维潜在空间中。先前的研究主要分为两类：一类是通过节点的表示重建图的结构关系，另一类是使用图神经网络聚合相邻节点的信息。然而，现有方法在表示学习过程中仅探索了节点的局部结构信息。这使得宏观结构信息的表示变得困难，并容易受到数据稀疏性和噪声的影响。为了解决这一问题，我们提出了基于意图分布的二部图表示学习（IDBR）模型，该模型明确地将节点意图分布信息整合到表示学习过程中。具体来说，我们通过聚类获得节点意图分布，并设计了一种基于意图分布的图卷积神经网络来生成节点表示。与传统方法相比，我们扩展了节点表示的范围，从而能够获得更全面的全局意图表示。在构建意图分布时，我们有效地缓解了数据稀疏性和噪声问题。此外，我们通过整合来自结构和语义维度的潜在相邻节点来丰富节点的表示。在链接预测和推荐任务上的实验表明，所提出的方法优于现有的最先进方法。IDBR的代码可在https://github.com/rookitkitlee/IDBR获取。	code	0
MTMS: Multi-teacher Multi-stage Knowledge Distillation for Reasoning-Based Machine Reading Comprehension	Zhuo Zhao, Zhiwen Xie, Guangyou Zhou, Jimmy Xiangji Huang	gyzhoumail.ccnu.edu.cn; zhuomails.ccnu.edu.cn; xiezhiwenwhu.edu.cn; jhuangyorku.ca	As the field of machine reading comprehension (MRC) continues to evolve, it is unlocking enormous potential for its practical application. However, the currently well-performing models predominantly rely on massive pre-trained language models with at least several hundred million or even over one hundred billion parameters. These complex models not only require immense computational power but also extensive storage, presenting challenges for resource-limited environments such as online education. Current research indicates that specific capabilities of larger models can be transferred to smaller models through knowledge distillation. However, prior to our work, there were no small models specifically designed for MRC tasks with complex reasoning abilities. In light of this, we present a novel multi-teacher multi-stage distillation approach, MTMS. It facilitates the easier deployment of reasoning-based MRC tasks on resource-constrained devices, thereby enabling effective applications. In this method, we design a multi-teacher distillation framework that includes both a logical teacher and a semantic teacher. This framework allows MTMS to simultaneously extract features from different perspectives of the text, mitigating the limitations inherent in single-teacher information representations. Furthermore, we introduce a multi-stage contrastive learning strategy. Through this strategy, the student model can progressively align with the teacher models, effectively bridging the gap between them. Extensive experimental outcomes on two inference-based datasets from real-world scenarios demonstrate that MTMS requires nearly 10 times fewer parameters than the teacher model while achieving competitive performance.	
随着机器阅读理解（MRC）领域的不断发展，其在实际应用中的潜力正在被不断解锁。然而，目前表现优异的模型主要依赖于大规模预训练语言模型，这些模型至少拥有数亿甚至超过千亿级别的参数。这些复杂的模型不仅需要巨大的计算能力，还需要大量的存储空间，这对于在线教育等资源受限的环境构成了挑战。现有研究表明，大型模型的特定能力可以通过知识蒸馏转移到小型模型上。然而，在我们开展工作之前，尚未有专门为具有复杂推理能力的MRC任务设计的小型模型。鉴于此，我们提出了一种新颖的多教师多阶段蒸馏方法，即MTMS。该方法有助于在资源受限的设备上更轻松地部署基于推理的MRC任务，从而实现有效的应用。在该方法中，我们设计了一个包含逻辑教师和语义教师的多教师蒸馏框架。该框架使MTMS能够同时从文本的不同角度提取特征，缓解了单教师信息表示的固有限制。此外，我们引入了一种多阶段对比学习策略。通过这一策略，学生模型可以逐步与教师模型对齐，有效地缩小了它们之间的差距。在两个来自真实场景的基于推理的数据集上的广泛实验结果表明，MTMS所需的参数数量几乎是教师模型参数量的十分之一，同时达到了相当的性能水平。	code	0
Exploring the Trade-Off within Visual Information for MultiModal Sentence Summarization	Minghuan Yuan, Shiyao Cui, Xinghua Zhang, Shicheng Wang, Hongbo Xu, Tingwen Liu	Institute of Information Engineering, Chinese Academy of Sciences	MultiModal Sentence Summarization (MMSS) aims to generate a brief summary based on the given source sentence and its associated image. Previous studies on MMSS have achieved success by either selecting the task-relevant visual information or filtering out the task-irrelevant visual information to help the textual modality to generate the summary. However, enhancing from a single perspective usually introduces over-preservation or over-compression problems. To tackle these issues, we resort to Information Bottleneck (IB), which seeks to find a maximally compressed mapping of the input information that preserves as much information about the target as possible. Specifically, we propose a novel method, T(3), which adopts IB to balance the Trade-off between Task-relevant and Task-irrelevant visual information through the variational inference framework. In this way, the task-irrelevant visual information is compressed to the utmost while the task-relevant visual information is maximally retained. With the holistic perspective, the generated summary could maintain as many key elements as possible while discarding the unnecessary ones as far as possible. Extensive experiments on the representative MMSS dataset demonstrate the superiority of our proposed method. Our code is available at https://github.com/YuanMinghuan/T3.	多模态句子摘要（MMSS）旨在根据给定的源句子和相关图像生成简要摘要。以往的MMSS研究通过选择与任务相关的视觉信息或过滤掉与任务无关的视觉信息来帮助文本模态生成摘要，取得了成功。然而，从单一角度进行增强通常会引入过度保留或过度压缩的问题。为了解决这些问题，我们采用了信息瓶颈（IB）方法，该方法旨在找到输入信息的最大压缩映射，同时尽可能多地保留关于目标的信息。具体而言，我们提出了一种新方法T(3)，该方法通过变分推断框架采用IB来平衡与任务相关和与任务无关的视觉信息之间的权衡。通过这种方式，与任务无关的视觉信息被最大限度地压缩，而与任务相关的视觉信息则被最大程度地保留。从整体角度来看，生成的摘要能够尽可能多地保留关键元素，同时尽可能多地剔除不必要的内容。在代表性的MMSS数据集上的广泛实验证明了我们提出的方法的优越性。我们的代码可在https://github.com/YuanMinghuan/T3获取。	code	0
ChroniclingAmericaQA: A Large-scale Question Answering Dataset based on Historical American Newspaper Pages	Bhawna Piryani, Jamshid Mozafari, Adam Jatowt	University of Innsbruck	Question answering (QA) and Machine Reading Comprehension (MRC) tasks have significantly advanced in recent years due to the rapid development of deep learning techniques and, more recently, large language models. At the same time, many benchmark datasets have become available for QA and MRC tasks. However, most existing large-scale benchmark datasets have been created predominantly using synchronous document collections like Wikipedia or the Web. Archival document collections, such as historical newspapers, contain valuable information from the past that is still not widely used to train large language models. To further contribute to advancing QA and MRC tasks and to overcome the limitation of previous datasets, we introduce ChroniclingAmericaQA, a large-scale temporal QA dataset with 487K question-answer pairs created based on the historical newspaper collection Chronicling America. Our dataset is constructed from a subset of the Chronicling America newspaper collection spanning 120 years. One of the significant challenges for utilizing digitized historical newspaper collections is the low quality of OCR text. Therefore, to enable realistic testing of QA models, our dataset can be used in three different ways: answering questions from raw and noisy content, answering questions from a cleaner, corrected version of the content, as well as answering questions from scanned images of newspaper pages. This and the fact that ChroniclingAmericaQA spans the longest time period among available QA datasets make it quite a unique and useful resource.	
近年来，由于深度学习技术的迅速发展和大规模语言模型的出现，问答（QA）和机器阅读理解（MRC）任务取得了显著进展。与此同时，许多用于QA和MRC任务的基准数据集也相继问世。然而，大多数现有的大规模基准数据集主要基于同步文档集合（如维基百科或网络资源）创建。档案文档集合，如历史报纸，包含了大量有价值的历史信息，但这些信息尚未被广泛用于训练大规模语言模型。为了进一步推动QA和MRC任务的发展，并克服以往数据集的局限性，我们推出了ChroniclingAmericaQA，这是一个基于历史报纸集合Chronicling America创建的包含48.7万个问答对的大规模时间性QA数据集。我们的数据集构建自Chronicling America报纸集合的一个子集，跨越了120年的时间。利用数字化历史报纸集合的一个主要挑战是OCR文本的质量较低。因此，为了实现对QA模型的实际测试，我们的数据集可以以三种不同的方式使用：从原始且包含噪声的内容中回答问题，从经过清理和校正的内容版本中回答问题，以及从报纸页面的扫描图像中回答问题。此外，ChroniclingAmericaQA所涵盖的时间跨度在现有的QA数据集中是最长的，这使得它成为一个非常独特且有用的资源。	code	0
BRB-KMeans: Enhancing Binary Data Clustering for Binary Product Quantization	Suwon Lee, SangMin Choi	Gyeongsang National University	In Binary Product Quantization (BPQ), where product quantization is applied to binary data, the traditional k-majority method is used for clustering, with centroids determined based on Hamming distance and majority vote for each bit. However, this approach often leads to a degradation in clustering quality, negatively impacting BPQ's performance. To address these challenges, we introduce Binary-to-Real-and-Back K-Means (BRB-KMeans), a novel method that initially transforms binary data into real-valued vectors, performs k-means clustering on these vectors, and then converts the generated centroids back into binary data. This innovative approach significantly enhances clustering quality by leveraging the high clustering quality of k-means in the real-valued vector space, thereby facilitating future quantization for binary data. Through extensive experiments, we demonstrate that BRB-KMeans significantly enhances clustering quality and overall BPQ performance, notably outperforming traditional methods.	在二进制乘积量化（BPQ）中，当乘积量化应用于二进制数据时，传统的方法使用k-多数方法进行聚类，其中质心是基于汉明距离和每个位的多数投票来确定的。然而，这种方法通常会导致聚类质量下降，从而负面影响BPQ的性能。为了应对这些挑战，我们提出了一种新的方法——二进制到实数再返回K均值（BRB-KMeans），该方法首先将二进制数据转换为实值向量，然后在这些向量上执行k均值聚类，最后将生成的质心转换回二进制数据。这种创新方法通过利用实值向量空间中k均值聚类的高质量，显著提高了聚类质量，从而为二进制数据的未来量化提供了便利。通过广泛的实验，我们证明了BRB-KMeans显著提升了聚类质量和整体BPQ性能，明显优于传统方法。	code	0
Distance Sampling-based Paraphraser Leveraging ChatGPT for Text Data Manipulation	Yoori Oh, Yoseob Han, Kyogu Lee	Seoul National University; Soongsil University	There has been growing interest in audio-language retrieval research, where the objective is to establish the correlation between audio and text modalities. However, most audio-text paired datasets often lack rich expression of the text data compared to the audio samples. One of the significant challenges facing audio-text datasets is the presence of similar or identical captions despite different audio samples. Therefore, under many-to-one mapping conditions, audio-text datasets lead to poor performance of retrieval tasks. In this paper, we propose a novel approach to tackle the data imbalance problem in audio-language retrieval task. To overcome the limitation, we introduce a method that employs a distance sampling-based paraphraser leveraging ChatGPT, utilizing distance function to generate a controllable distribution of manipulated text data. For a set of sentences with the same context, the distance is used to calculate a degree of manipulation for any two sentences, and ChatGPT's few-shot prompting is performed using a text cluster with a similar distance defined by the Jaccard similarity. Therefore, ChatGPT, when applied to few-shot prompting with text clusters, can adjust the diversity of the manipulated text based on the distance. The proposed approach is shown to significantly enhance performance in audio-text retrieval, outperforming conventional text augmentation techniques.	近年来，音频-语言检索研究引起了越来越多的关注，其目标在于建立音频与文本模态之间的关联。然而，大多数音频-文本配对数据集中的文本数据相较于音频样本，往往缺乏丰富的表达。音频-文本数据集面临的一个重大挑战是，尽管音频样本不同，但可能存在相似或相同的字幕。因此，在多对一映射的情况下，音频-文本数据集会导致检索任务的性能不佳。本文提出了一种新颖的方法来解决音频-语言检索任务中的数据不平衡问题。为了克服这一限制，我们引入了一种基于距离采样的释义生成方法，该方法利用ChatGPT，通过距离函数生成可控的被操纵文本数据的分布。对于一组具有相同上下文的句子，距离被用来计算任意两个句子之间的操纵程度，并且使用基于Jaccard相似性定义的相似距离的文本簇进行ChatGPT的少样本提示。因此，当ChatGPT应用于带有文本簇的少样本提示时，可以根据距离调整被操纵文本的多样性。所提出的方法在音频-文本检索中的性能显著提升，超越了传统的文本增强技术。	code	0
Fake News Detection via Multi-scale Semantic Alignment and Cross-modal Attention	Jiandong Wang, Hongguang Zhang, Chun Liu, Xiongjun Yang				code	0
Label Hierarchical Structure-Aware Multi-Label Few-Shot Intent Detection via Prompt Tuning	Xiaotong Zhang, Xinyi Li, Han Liu, Xinyue Liu, Xianchao Zhang	Dalian University of Technology	Multi-label intent detection aims to recognize multiple user intents behind dialogue utterances. The diversity of user utterances and the scarcity of training data motivate multi-label few-shot intent detection. However, existing methods ignore the hybrid of verb and noun within an intent, which is essential to identify the user intent. In this paper, we propose a label hierarchical structure-aware method for multi-label few-shot intent detection via prompt tuning (LHS). Firstly, for the support data, we concatenate the original utterance with the label description generated by GPT-4 to obtain the utterance-level representation. Then we construct a multi-label hierarchical structure-aware prompt model to learn the label hierarchical information. To learn more discriminative class prototypes, we devise a prototypical contrastive learning method to pull the utterances close to their corresponding intent labels and away from other intent labels. Extensive experiments on two datasets demonstrate the superiority of our method.	多标签意图检测旨在识别对话话语背后的多个用户意图。用户话语的多样性和训练数据的稀缺性促使了多标签少样本意图检测的发展。然而，现有方法忽视了意图中动词和名词的混合，这对于识别用户意图至关重要。本文中，我们提出了一种通过提示调优（LHS）的标签层次结构感知方法，用于多标签少样本意图检测。首先，对于支持数据，我们将原始话语与由GPT-4生成的标签描述连接起来，以获得话语级别的表示。接着，我们构建了一个多标签层次结构感知的提示模型，以学习标签层次信息。为了学习更具区分性的类别原型，我们设计了一种原型对比学习方法，将话语拉近其对应的意图标签，并远离其他意图标签。在两个数据集上的广泛实验证明了我们方法的优越性。	code	0
MKV: Mapping Key Semantics into Vectors for Rumor Detection	Yang Li, Liguang Liu, Jiacai Guo, LapKei Lee, Fu Lee Wang, Zhenguo Yang	Hong Kong Metropolitan University; Guangdong University of Technology	The cross-attention mechanism has been widely employed in the multimodal rumor detection task, which is computation-intensive and suffers from the restricted modal receptive field. In this paper, we propose a multimodal rumor detection model (MKV), which maps multimodal key semantics with discrimination into feature vectors for rumor detection. More specifically, MKV extracts high-dimensional features for each modality separately by the Multimodal Feature Extractor (MFE). The mapping mechanism learns a low-dimensional mapping scheme (Map) and key semantics (Key) with discrimination from the different modal features respectively. Subsequently, the Map and Key jointly construct a state matrix (State) containing all possible permutations of modalities. In particular, a max pooling operation is performed on State and produces a feature vector (Vector). The mapping mechanism is able to incrementally learn the discriminative semantics in a stacking manner. Vectors from the stacking process are leveraged in the Rumor Detection module (RD). Extensive experiments on two public datasets show that MKV achieves state-of-the-art performance.	交叉注意力机制已广泛应用于多模态谣言检测任务，但该任务计算密集且受限于模态接收场。本文提出了一种多模态谣言检测模型（MKV），该模型将多模态关键语义映射为特征向量以进行谣言检测。具体而言，MKV通过多模态特征提取器（MFE）分别提取每种模态的高维特征。映射机制分别从不同模态特征中学习低维映射方案（Map）和具有区分性的关键语义（Key）。随后，Map和Key共同构建包含所有可能模态排列的状态矩阵（State）。特别地，对State执行最大池化操作，生成特征向量（Vector）。映射机制通过堆叠方式逐步学习区分性语义。堆叠过程中的向量被用于谣言检测模块（RD）。在两个公开数据集上的广泛实验表明，MKV达到了最先进的性能。	code	0
PAG-LLM: Paraphrase and Aggregate with Large Language Models for Minimizing Intent Classification Errors	Vikas Yadav, Zheng Tang, Vijay Srinivasan				code	0
Self-Referential Review: Exploring the Impact of Self-Reference Effect in Review	Kyusik Kim, Hyungwoo Song, Bongwon Suh				code	0
Unbiased Validation of Technology-Assisted Review for eDiscovery	Gordon V. Cormack, Maura R. Grossman, Andrew Harbison, Tom O'Halloran, Bronagh McManus				code	0
Homogeneous-listing-augmented Self-supervised Multimodal Product Title Refinement	Jiaqi Deng, Kaize Shi, Huan Huo, Dingxian Wang, Guandong Xu				code	0
GATS: Generative Audience Targeting System for Online Advertising	Cong Jiang, Zhongde Chen, Bo Zhang, Yankun Ren, Xin Dong, Lei Cheng, Xinxing Yang, Longfei Li, Jun Zhou, Linjian Mo				code	0
ScienceDirect Topic Pages: A Knowledge Base of Scientific Concepts Across Various Science Domains	Artemis Çapari, Hosein Azarbonyad, Georgios Tsatsaronis, Zubair Afzal, Judson Dunham				code	0
GOLF: Goal-Oriented Long-term liFe tasks supported by human-AI collaboration	Ben Wang				code	0
CorpusLM: Towards a Unified Language Model on Corpus for Knowledge-Intensive Tasks	Xiaoxi Li, Zhicheng Dou, Yujia Zhou, Fangchao Liu				code	0
Transformer-based Reasoning for Learning Evolutionary Chain of Events on Temporal Knowledge Graph	Zhiyu Fang, ShuaiLong Lei, Xiaobin Zhu, Chun Yang, ShiXue Zhang, XuCheng Yin, Jingyan Qin				code	0
NativE: Multi-modal Knowledge Graph Completion in the Wild	Yichi Zhang, Zhuo Chen, Lingbing Guo, Yajing Xu, Binbin Hu, Ziqi Liu, Wen Zhang, Huajun Chen	Zhejiang University; Ant Group; Zhejiang University-Ant Group Joint Laboratory of Knowledge Graph	Multi-modal knowledge graph completion (MMKGC) aims to automatically discover the unobserved factual knowledge from a given multi-modal knowledge graph by collaboratively modeling the triple structure and multi-modal information from entities. However, real-world MMKGs present challenges due to their diverse and imbalanced nature, which means that the modality information can span various types (e.g., image, text, numeric, audio, video) but its distribution among entities is uneven, leading to missing modalities for certain entities. Existing works usually focus on common modalities like image and text while neglecting the imbalanced distribution phenomenon of modal information. To address these issues, we propose a comprehensive framework NativE to achieve MMKGC in the wild. NativE proposes a relation-guided dual adaptive fusion module that enables adaptive fusion for any modalities and employs a collaborative modality adversarial training framework to augment the imbalanced modality information. We construct a new benchmark called WildKGC with five datasets to evaluate our method. The empirical results compared with 21 recent baselines confirm the superiority of our method, consistently achieving state-of-the-art performance across different datasets and various scenarios while keeping efficient and generalizable. 
Our code and data are released at https://github.com/zjukg/NATIVE	多模态知识图谱补全（MMKGC）旨在通过协同建模实体的三元组结构和多模态信息，自动发现给定多模态知识图谱中未观察到的事实知识。然而，现实世界中的多模态知识图谱由于其多样性和不均衡性而面临挑战，这意味着模态信息可以涵盖多种类型（如图像、文本、数值、音频、视频），但其分布在实体间是不均匀的，导致某些实体缺失模态信息。现有的工作通常关注图像和文本等常见模态，而忽视了模态信息的不均衡分布现象。为了解决这些问题，我们提出了一个全面的框架NativE，以实现真实环境中的多模态知识图谱补全。NativE提出了一种关系引导的双重自适应融合模块，该模块能够对任何模态进行自适应融合，并采用协同模态对抗训练框架来增强不均衡的模态信息。我们构建了一个名为WildKGC的新基准，包含五个数据集，用于评估我们的方法。与21个最近的基线方法进行比较的实证结果证实了我们的方法的优越性，在不同数据集和各种场景下始终保持高效和可推广性，并取得了最先进的性能。我们的代码和数据已在https://github.com/zjukg/NATIVE发布。	code	0
MetaHKG: Meta Hyperbolic Learning for Few-shot Temporal Reasoning	Ruijie Wang, Yutong Zhang, Jinyang Li, Shengzhong Liu, Dachun Sun, Tianchen Wang, Tianshi Wang, Yizhuo Chen, Denizhan Kara, Tarek F. Abdelzaher				code	0
YAGO 4.5: A Large and Clean Knowledge Base with a Rich Taxonomy	Fabian M. Suchanek, Mehwish Alam, Thomas Bonald, Lihu Chen, PierreHenri Paris, Jules Soria				code	0
Uncontextualized significance considered dangerous	Nicola Ferro, Mark Sanderson				code	0
CIRAL: A Test Collection for CLIR Evaluations in African Languages	Mofetoluwa Adeyemi, Akintunde Oladipo, Xinyu Zhang, David AlfonsoHermelo, Mehdi Rezagholizadeh, Boxing Chen, AbdulHakeem Omotayo, Idris Abdulmumin, Naome A. Etori, Toyib Babatunde Musa, Samuel Fanijo, Oluwabusayo Olufunke Awoyomi, Saheed Abdullahi Salahudeen, Labaran Adamu Mohammed, Daud Olamide Abolade, Falalu Ibrahim Lawan, Maryam Sabo Abubakar, Ruqayya Nasir Iro, Amina Abubakar Imam, Shafie Abdi Mohamed, Hanad Mohamud Mohamed, Tunde Oluwaseyi Ajayi, Jimmy Lin				code	0
IDGenRec: LLM-RecSys Alignment with Textual ID Learning	Juntao Tan, Shuyuan Xu, Wenyue Hua, Yingqiang Ge, Zelong Li, Yongfeng Zhang				code	0
Enhanced Packed Marker with Entity Information for Aspect Sentiment Triplet Extraction	You Li, Xupeng Zeng, Yixiao Zeng, Yuming Lin				code	0
Exogenous and Endogenous Data Augmentation for Low-Resource Complex Named Entity Recognition	Xinghua Zhang, Gaode Chen, Shiyao Cui, Jiawei Sheng, Tingwen Liu, Hongbo Xu				code	0
ACE-2005-PT: Corpus for Event Extraction in Portuguese	Luís Filipe Cunha, Purificação Silvano, Ricardo Campos, Alípio Jorge	FLUP-University of Porto; University of Beira Interior; FCUP-University of Porto	Event extraction is an NLP task that commonly involves identifying the central word (trigger) for an event and its associated arguments in text. ACE-2005 is widely recognised as the standard corpus in this field. While other corpora, like PropBank, primarily focus on annotating predicate-argument structure, ACE-2005 provides comprehensive information about the overall event structure and semantics. However, its limited language coverage restricts its usability. This paper introduces ACE-2005-PT, a corpus created by translating ACE-2005 into Portuguese, with European and Brazilian variants. To speed up the process of obtaining ACE-2005-PT, we rely on automatic translators. This, however, poses some challenges related to automatically identifying the correct alignments between multi-word annotations in the original text and in the corresponding translated sentence. To achieve this, we developed an alignment pipeline that incorporates several alignment techniques: lemmatization, fuzzy matching, synonym matching, multiple translations and a BERT-based word aligner. To measure the alignment effectiveness, a subset of annotations from the ACE-2005-PT corpus was manually aligned by a linguist expert. This subset was then compared against our pipeline results which achieved exact and relaxed match scores of 70.55% and 87.55% respectively. As a result, we successfully generated a Portuguese version of the ACE-2005 corpus, which has been accepted for publication by LDC.	
事件提取是自然语言处理（NLP）中的一项任务，通常涉及识别文本中事件的核心词（触发词）及其相关参数。ACE-2005 在这一领域被广泛认可为标准语料库。尽管其他语料库，如 PropBank，主要关注谓词-参数结构的标注，但 ACE-2005 提供了关于事件整体结构和语义的全面信息。然而，其有限的语言覆盖范围限制了其可用性。本文介绍了 ACE-2005-PT，这是一个通过将 ACE-2005 翻译成葡萄牙语（包括欧洲和巴西变体）而创建的语料库。为了加快获取 ACE-2005-PT 的过程，我们依赖于自动翻译工具。然而，这带来了一些挑战，即在原文和相应的翻译句子中自动识别多词标注之间的正确对齐。为此，我们开发了一个对齐流程，该流程结合了多种对齐技术：词形还原、模糊匹配、同义词匹配、多重翻译以及基于 BERT 的词对齐工具。为了衡量对齐效果，我们请语言学专家手动对齐了 ACE-2005-PT 语料库的一个子集。然后将该子集与我们的流程结果进行比较，分别获得了 70.55% 的精确匹配分数和 87.55% 的宽松匹配分数。因此，我们成功生成了葡萄牙语版本的 ACE-2005 语料库，该语料库已被 LDC 接受出版。	code	0
Universal Adversarial Perturbations for Vision-Language Pre-trained Models	PengFei Zhang, Zi Huang, Guangdong Bai		Vision-Language Pre-training (VLP) models have exhibited unprecedented capability in many applications by taking full advantage of the multimodal alignment. However, previous studies have shown they are vulnerable to maliciously crafted adversarial samples. Despite recent success, these methods are generally instance-specific and require generating perturbations for each input sample. In this paper, we reveal that VLP models are also vulnerable to the instance-agnostic universal adversarial perturbation (UAP). Specifically, we design a novel Contrastive-training Perturbation Generator with Cross-modal conditions (C-PGC) to achieve the attack. In light that the pivotal multimodal alignment is achieved through the advanced contrastive learning technique, we devise to turn this powerful weapon against themselves, i.e., employ a malicious version of contrastive learning to train the C-PGC based on our carefully crafted positive and negative image-text pairs for essentially destroying the alignment relationship learned by VLP models. Besides, C-PGC fully utilizes the characteristics of Vision-and-Language (V+L) scenarios by incorporating both unimodal and cross-modal information as effective guidance. Extensive experiments show that C-PGC successfully forces adversarial samples to move away from their original area in the VLP model's feature space, thus essentially enhancing attacks across various victim models and V+L tasks. The GitHub repository is available at https://github.com/ffhibnese/CPGC_VLP_Universal_Attacks.	
视觉-语言预训练（VLP）模型通过充分利用多模态对齐，在许多应用中展现了前所未有的能力。然而，先前的研究表明，这些模型对恶意设计的对抗样本表现出脆弱性。尽管近期在这方面取得了成功，但这些方法通常是针对特定实例的，需要为每个输入样本生成扰动。本文揭示了VLP模型同样容易受到实例无关的通用对抗扰动（UAP）的影响。为此，我们设计了一种新颖的对比训练扰动生成器，名为跨模态条件对比训练扰动生成器（C-PGC），以实现攻击。鉴于关键的多模态对齐是通过先进的对比学习技术实现的，我们设计了一种恶意版本的对比学习，利用精心构建的正负图像-文本对来训练C-PGC，从而从根本上破坏VLP模型所学习到的对齐关系。此外，C-PGC充分利用了视觉与语言（V+L）场景的特性，将单模态和跨模态信息作为有效指导。大量实验表明，C-PGC成功地迫使对抗样本在VLP模型的特征空间中远离其原始区域，从而在各种受害模型和V+L任务中实质上增强了攻击效果。相关代码已发布在GitHub上，地址为https://github.com/ffhibnese/CPGC_VLP_Universal_Attacks。	code	0
Adaptive In-Context Learning with Large Language Models for Bundle Generation	Zhu Sun, Kaidong Feng, Jie Yang, Xinghua Qu, Hui Fang, YewSoon Ong, Wenyuan Liu				code	0
Beyond Accuracy: Investigating Error Types in GPT-4 Responses to USMLE Questions	Soumyadeep Roy, Aparup Khatua, Fatemeh Ghoochani, Uwe Hadler, Wolfgang Nejdl, Niloy Ganguly	School of Information, University of Michigan; L3S Research Center; Indian Institute of Technology Kharagpur	GPT-4 demonstrates high accuracy in medical QA tasks, leading with an accuracy of 86.70%, followed by Med-PaLM 2 at 86.50%. However, around 14% of errors remain. Additionally, current works use GPT-4 to only predict the correct option without providing any explanation and thus do not provide any insight into the thinking process and reasoning used by GPT-4 or other LLMs. Therefore, we introduce a new domain-specific error taxonomy derived from collaboration with medical students. Our GPT-4 USMLE Error (G4UE) dataset comprises 4153 GPT-4 correct responses and 919 incorrect responses to the United States Medical Licensing Examination (USMLE) respectively. These responses are quite long (258 words on average), containing detailed explanations from GPT-4 justifying the selected option. We then launch a large-scale annotation study using the Potato annotation platform and recruit 44 medical experts through Prolific, a well-known crowdsourcing platform. We annotated 300 out of these 919 incorrect data points at a granular level for different classes and created a multi-label span to identify the reasons behind the error. In our annotated dataset, a substantial portion of GPT-4's incorrect responses is categorized as a "Reasonable response by GPT-4," by annotators. This sheds light on the challenge of discerning explanations that may lead to incorrect options, even among trained medical professionals. We also provide medical concepts and medical semantic predications extracted using the SemRep tool for every data point. We believe that it will aid in evaluating the ability of LLMs to answer complex medical questions. 
We make the resources available at https://github.com/roysoumya/usmle-gpt4-error-taxonomy .	GPT-4在医学问答任务中表现出色，准确率高达86.70%，领先于Med-PaLM 2的86.50%。然而，仍有约14%的错误存在。此外，当前的研究仅利用GPT-4预测正确选项，并未提供任何解释，因此无法洞察GPT-4或其他大型语言模型（LLMs）的思维过程和推理机制。为此，我们与医学生合作，引入了一种新的领域特定错误分类法。我们的GPT-4 USMLE错误（G4UE）数据集包括4153条GPT-4正确回答和919条错误回答，这些回答均来自美国医学执照考试（USMLE），且每条回答平均长度为258词，包含了GPT-4对所选选项的详细解释。随后，我们通过Potato标注平台启动了一项大规模的标注研究，并从知名众包平台Prolific招募了44位医学专家。我们对这919条错误数据中的300条进行了细粒度的标注，创建了多标签跨度以识别错误原因。在我们的标注数据集中，相当一部分GPT-4的错误回答被标注者归类为“GPT-4的合理响应”。这揭示了即使对于训练有素的医学专业人员，区分可能导致错误选项的解释也具有挑战性。我们还为每个数据点提供了使用SemRep工具提取的医学概念和医学语义预测。我们相信，这将有助于评估LLMs回答复杂医学问题的能力。相关资源已发布在https://github.com/roysoumya/usmle-gpt4-error-taxonomy。	code	0
SuicidEmoji: Derived Emoji Dataset and Tasks for Suicide-Related Social Content	Tianlin Zhang, Kailai Yang, Shaoxiong Ji, Boyang Liu, Qianqian Xie, Sophia Ananiadou				code	0
LADy 💃: A Benchmark Toolkit for Latent Aspect Detection Enriched with Backtranslation Augmentation	Farinam Hemmatizadeh, Christine Wong, Alice Yu, Hossein Fani				code	0
A Reproducibility Study of PLAID	Sean MacAvaney, Nicola Tonellotto	University of Pisa; University of Glasgow	The PLAID (Performance-optimized Late Interaction Driver) algorithm for ColBERTv2 uses clustered term representations to retrieve and progressively prune documents for final (exact) document scoring. In this paper, we reproduce and fill in missing gaps from the original work. By studying the parameters PLAID introduces, we find that its Pareto frontier is formed of a careful balance among its three parameters; deviations beyond the suggested settings can substantially increase latency without necessarily improving its effectiveness. We then compare PLAID with an important baseline missing from the paper: re-ranking a lexical system. We find that applying ColBERTv2 as a re-ranker atop an initial pool of BM25 results provides better efficiency-effectiveness trade-offs in low-latency settings. However, re-ranking cannot reach peak effectiveness at higher latency settings due to limitations in recall of lexical matching and provides a poor approximation of an exhaustive ColBERTv2 search. We find that recently proposed modifications to re-ranking that pull in the neighbors of top-scoring documents overcome this limitation, providing a Pareto frontier across all operational points for ColBERTv2 when evaluated using a well-annotated dataset. Curious about why re-ranking methods are highly competitive with PLAID, we analyze the token representation clusters PLAID uses for retrieval and find that most clusters are predominantly aligned with a single token and vice versa. Given the competitive trade-offs that re-ranking baselines exhibit, this work highlights the importance of carefully selecting pertinent baselines when evaluating the efficiency of retrieval engines.	PLAID（Performance-optimized Late Interaction Driver）算法用于ColBERTv2，利用聚类的术语表示来检索并逐步剪枝文档，以进行最终（精确）的文档评分。本文中，我们重现并填补了原始工作中的缺失部分。通过研究PLAID引入的参数，我们发现其帕累托前沿是由其三个参数之间的精心平衡形成的；超出建议设置的偏差可能会显著增加延迟，而不一定提高其有效性。随后，我们将PLAID与论文中缺失的一个重要基线进行比较：重排序一个词汇系统。我们发现，在低延迟设置下，将ColBERTv2作为初始BM25结果池之上的重排序器，提供了更好的效率-有效性权衡。然而，由于词汇匹配的召回限制，重排序在高延迟设置下无法达到峰值有效性，并且无法很好地近似于全面的ColBERTv2搜索。我们发现，最近提出的修改重排序方法，即引入高分文档的邻居，克服了这一限制，在使用良好注释的数据集评估时，为ColBERTv2在所有操作点上提供了帕累托前沿。由于对重排序方法为何与PLAID高度竞争感到好奇，我们分析了PLAID用于检索的标记表示簇，发现大多数簇主要与单个标记对齐，反之亦然。鉴于重排序基线展现出的竞争性权衡，这项工作强调了在评估检索引擎效率时，精心选择相关基线的重要性。	code	0
Bootstrap Deep Metric for Seed Expansion in Attributed Networks	Chunquan Liang, Yifan Wang, Qiankun Chen, Xinyuan Feng, Luyue Wang, Mei Li, Hongming Zhang	Northwest A&F University College of Information Engineering	Seed expansion tasks play an important role in various network applications such as recommendation systems, social network analysis, and bioinformatics. Given a network and a small group of examples as seeds, these tasks involve identifying additional members of interest from the same community. While most existing expansion methods focus on defining a fixed metric function based on the network structure alone, they often overlook the rich content associated with nodes in attributed networks. In this paper, we bridge the gap by learning a deep metric that takes into account both the network structure and node attributes, and by utilizing the recent advanced graph neural networks as encoding functions. The key challenge lies in the extreme scarcity of given positive examples (i.e., the seed nodes) in real-world applications and the absence of negatives (i.e., non-members of the target community). We introduce Bootstrap Deep Metric (BDM), a graph deep metric learning framework for seed expansion problems. BDM utilizes previous versions of representations to generate anchors for positive and unlabeled nodes, and learns enhanced node representations by minimizing the metric losses on both positive and unlabeled nodes. It eliminates the need for negative nodes, while producing closely aligned representations for members of target community and uniformly distributed representations for non-members, which effectively aid in selecting expansion nodes. Experimental results on real-life datasets show that our BDM not only substantially outperforms state-of-the-art approaches but also remarkably surpasses fully labeled classification models in most cases. Codes are available at https://github.com/wangyfnwsuaf/bdm.	种子扩展任务在推荐系统、社交网络分析和生物信息学等各种网络应用中扮演着重要角色。给定一个网络和一小部分示例作为种子，这些任务涉及从同一社区中识别出其他感兴趣的成员。尽管大多数现有的扩展方法侧重于基于网络结构定义一个固定的度量函数，但它们往往忽略了属性网络中与节点相关联的丰富内容。在本文中，我们通过学习一个深度度量函数来弥合这一差距，该函数同时考虑了网络结构和节点属性，并利用最新的图神经网络作为编码函数。关键挑战在于现实应用中给定的正例（即种子节点）极度稀缺，且不存在负例（即目标社区的非成员）。我们提出了Bootstrap深度度量（BDM），这是一个用于种子扩展问题的图深度度量学习框架。BDM利用先前版本的表示来为正例和未标记节点生成锚点，并通过最小化正例和未标记节点上的度量损失来学习增强的节点表示。它消除了对负节点的需求，同时为目标社区成员生成紧密对齐的表示，并为非成员生成均匀分布的表示，从而有效地辅助扩展节点的选择。在真实数据集上的实验结果表明，我们的BDM不仅显著优于最先进的方法，而且在大多数情况下还显著超越了完全标记的分类模型。代码可在https://github.com/wangyfnwsuaf/bdm获取。	code	0
Improving the Accuracy of Locally Differentially Private Community Detection by Order-consistent Data Perturbation	Taolin Guo, Shunshun Peng, Zhejian Zhang, Mengmeng Yang, Kwok-Yan Lam				code	0
CIQA: A Coding Inspired Question Answering Model	Mousa Arraf, Kira Radinsky				code	0
Let Me Show You Step by Step: An Interpretable Graph Routing Network for Knowledge-based Visual Question Answering	Duokang Wang, Linmei Hu, Rui Hao, Yingxia Shao, Xin Lv, Liqiang Nie, Juanzi Li				code	0
Flexible and Adaptable Summarization via Expertise Separation	Xiuying Chen, Mingzhe Li, Shen Gao, Xin Cheng, Qingqing Zhu, Rui Yan, Xin Gao, Xiangliang Zhang				code	0
ArabicaQA: A Comprehensive Dataset for Arabic Question Answering	Abdelrahman Abdallah, Mahmoud SalahEldin Kasem, Mahmoud Abdalla, Mohamed Mahmoud, Mohamed Elkasaby, Yasser Elbendary, Adam Jatowt				code	0
TriviaHG: A Dataset for Automatic Hint Generation from Factoid Questions	Jamshid Mozafari, Anubhav Jangra, Adam Jatowt				code	0
Capability-aware Prompt Reformulation Learning for Text-to-Image Generation	Jingtao Zhan, Qingyao Ai, Yiqun Liu, Jia Chen, Shaoping Ma				code	0
Short Video Ordering via Position Decoding and Successor Prediction	Shiping Ge, Qiang Chen, Zhiwei Jiang, Yafeng Yin, Ziyao Chen, Qing Gu				code	0
Event Grounded Criminal Court View Generation with Cooperative (Large) Language Models	Linan Yue, Qi Liu, Lili Zhao, Li Wang, Weibo Gao, Yanqing An				code	0
Legal Statute Identification: A Case Study using State-of-the-Art Datasets and Methods	Shounak Paul, Rajas Bhatt, Pawan Goyal, Saptarshi Ghosh				code	0
CivilSum: A Dataset for Abstractive Summarization of Indian Court Decisions	Manuj Malik, Zheng Zhao, Marcio Fonseca, Shrisha Rao, Shay B. Cohen				code	0
Analyzing Fusion Methods Using the Condorcet Rule	Liron Tyomkin, Oren Kurland				code	0
Can LLMs Master Math? Investigating Large Language Models on Math Stack Exchange	Ankit Satpute, Noah Gießing, André Greiner-Petter, Moritz Schubotz, Olaf Teschke, Akiko Aizawa, Bela Gipp				code	0
Combining Large Language Models and Crowdsourcing for Hybrid Human-AI Misinformation Detection	Xia Zeng, David La Barbera, Kevin Roitero, Arkaitz Zubiaga, Stefano Mizzaro				code	0
Counterfactual Augmentation for Robust Authorship Representation Learning	Hieu Man, Thien Huu Nguyen				code	0
Enhancing Task Performance in Continual Instruction Fine-tuning Through Format Uniformity	Xiaoyu Tan, Leijun Cheng, Xihe Qiu, Shaojie Shi, Yuan Cheng, Wei Chu, Yinghui Xu, Yuan Qi				code	0
From Text to Context: An Entailment Approach for News Stakeholder Classification	Alapan Kuila, Sudeshna Sarkar				code	0
Graph Reasoning Enhanced Language Models for Text-to-SQL	Zheng Gong, Ying Sun				code	0
IdmGAE: Importance-Inspired Dynamic Masking for Graph Autoencoders	Ge Chen, Yulan Hu, Sheng Ouyang, Zhirui Yang, Yong Liu, Cuicui Luo				code	0
Inferring Climate Change Stances from Multimodal Tweets	Nan Bai, Ricardo da Silva Torres, Anna Fensel, Tamara Metze, Art Dewulf				code	0
Instruction-Guided Bullet Point Summarization of Long Financial Earnings Call Transcripts	Subhendu Khatuya, Koushiki Sinha, Niloy Ganguly, Saptarshi Ghosh, Pawan Goyal				code	0
Modeling Scholarly Collaboration and Temporal Dynamics in Citation Networks for Impact Prediction	Pengwei Yan, Yangyang Kang, Zhuoren Jiang, Kaisong Song, Tianqianjin Lin, Changlong Sun, Xiaozhong Liu				code	0
Multi-view Mixed Attention for Contrastive Learning on Hypergraphs	Jongsoo Lee, DongKyu Chae				code	0
Old IR Methods Meet RAG	Oz Huly, Idan Pogrebinsky, David Carmel, Oren Kurland, Yoelle Maarek				code	0
Prediction of the Realisation of an Information Need: An EEG Study	Niall McGuire, Yashar Moshfeghi				code	0
R-ODE: Ricci Curvature Tells When You Will be Informed	Li Sun, Jingbin Hu, Mengjie Li, Hao Peng				code	0
PromptLink: Leveraging Large Language Models for Cross-Source Biomedical Concept Linking	Yuzhang Xie, Jiaying Lu, Joyce Ho, Fadi B. Nahab, Xiao Hu, Carl Yang				code	0
ReCODE: Modeling Repeat Consumption with Neural ODE	Sunhao Dai, Changle Qu, Sirui Chen, Xiao Zhang, Jun Xu				code	0
RLStop: A Reinforcement Learning Stopping Method for TAR	Reem Bin Hezam, Mark Stevenson				code	0
Timeline Summarization in the Era of LLMs	Daivik Sojitra, Raghav Jain, Sriparna Saha, Adam Jatowt, Manish Gupta				code	0
TouchUp-G: Improving Feature Representation through Graph-Centric Finetuning	Jing Zhu, Xiang Song, Vassilis N. Ioannidis, Danai Koutra, Christos Faloutsos				code	0
An Integrated Data Processing Framework for Pretraining Foundation Models	Yiding Sun, Feng Wang, Yutao Zhu, Wayne Xin Zhao, Jiaxin Mao				code	0
Detecting and Explaining Emotions in Video Advertisements	Joachim Vanneste, Manisha Verma, Debasis Ganguly				code	0
FactCheck Editor: Multilingual Text Editor with End-to-End fact-checking	Vinay Setty				code	0
Shadowfax: Harnessing Textual Knowledge Base Population	Maxime Prieur, Cédric du Mouza, Guillaume Gadek, Bruno Grilhères				code	0
SynDy: Synthetic Dynamic Dataset Generation Framework for Misinformation Tasks	Michael Shliselberg, Ashkan Kazemi, Scott A. Hale, Shiri Dori-Hacohen				code	0
TextData: Save What You Know and Find What You Don't	Kevin Ros, Kedar Takwane, Ashwin Patil, Rakshana Jayaprakash, ChengXiang Zhai				code	0
Towards Robust QA Evaluation via Open LLMs	Ehsan Kamalloo, Shivani Upadhyay, Jimmy Lin				code	0
Truth-O-Meter: Handling Multiple Inconsistent Sources Repairing LLM Hallucinations	Boris Galitsky, Anton Chernyavskiy, Dmitry I. Ilvovsky				code	0
"Ask Me Anything": How Comcast Uses LLMs to Assist Agents in Real Time	Scott Rome, Tianwen Chen, Raphael Tang, Luwei Zhou, Ferhan Ture				code	0
unKR: A Python Library for Uncertain Knowledge Graph Reasoning by Representation Learning	Jingting Wang, Tianxing Wu, Shilin Chen, Yunchang Liu, Shutong Zhu, Wei Li, Jingyi Xu, Guilin Qi				code	0
A Field Guide to Automatic Evaluation of LLM-Generated Summaries	Tempest A. van Schaik, Brittany Pugh				code	0
Surprising Efficacy of Fine-Tuned Transformers for Fact-Checking over Larger Language Models	Vinay Setty				code	0
Enhancing Baidu Multimodal Advertisement with Chinese Text-to-Image Generation via Bilingual Alignment and Caption Synthesis	Kang Zhao, Xinyu Zhao, Zhipeng Jin, Yi Yang, Wen Tao, Cong Han, Shuanglong Li, Lin Liu				code	0
Misinformation Mitigation Praxis: Lessons Learned and Future Directions from Co·Insights	Scott A. Hale, Kiran Garimella, Shiri Dori-Hacohen				code	0
Graph-Based Audience Expansion Model for Marketing Campaigns	Md. Mostafizur Rahman, Daisuke Kikuta, Yu Hirate, Toyotaro Suzumura				code	0
Empowering Large Language Models: Tool Learning for Real-World Interaction	Hongru Wang, Yujia Qin, Yankai Lin, Jeff Z. Pan, Kam-Fai Wong				code	0
Large Language Models for Tabular Data: Progresses and Future Directions	Haoyu Dong, Zhiruo Wang				code	0
Preventing and Detecting Misinformation Generated by Large Language Models	Aiwei Liu, Qiang Sheng, Xuming Hu				code	0
LLM4Eval: Large Language Model for Evaluation in IR	Hossein A. Rahmani, Clemencia Siro, Mohammad Aliannejadi, Nick Craswell, Charles L. A. Clarke, Guglielmo Faggioli, Bhaskar Mitra, Paul Thomas, Emine Yilmaz				code	0
Machine Generated Explanations and Their Evaluation	Edward Richards				code	0
Leveraging LLMs for Detecting and Modeling the Propagation of Misinformation in Social Networks	Payel Santra				code	0
Mosaicing Prevention in Declassification	Nathaniel Rollings				code	0
