RAG for LLMs: Translation and Interpretation of "Retrieval-Augmented Generation for Large Language Models: A Survey"

處女座的程序猿 · 2024-01-15 · Published in Shanghai

Overview: This paper surveys and analyzes Retrieval-Augmented Generation (RAG) for large language models.

Background pain points:

>> Large language models (LLMs) struggle with knowledge-intensive tasks and with questions whose answers lie outside their training data, producing incorrect or outdated information.

>> LLMs often require customized training to fit different application scenarios, which is difficult for developers and researchers.

Core idea and solution of RAG: RAG retrieves information from external knowledge bases and integrates it into the LLM's input context, strengthening the model's ability to handle knowledge-intensive tasks and produce more accurate answers.

Development trends of RAG:

>> From Naive RAG to Advanced RAG to Modular RAG, the framework structure is continuously refined.

>> Retrieval, generation, and augmentation modules are combined into a complete pipeline.

>> Augmentation draws on diverse sources, including structured data, unstructured data, and LLM-generated content.

>> Iterative, recursive, and adaptive retrieval methods are explored to optimize the retrieval process.

>> RAG is applied and integrated with customized training, combining multiple ways of optimizing LLMs.

Advantages of RAG:

>> New external knowledge can be integrated into the model without retraining the LLM, making it easier to respond to changing requirements.

>> Backed by external knowledge bases, the LLM's answers become more accurate and relevant, better addressing knowledge-intensive questions.

>> RAG framework performance keeps improving, and the approach extends to multimodal information such as images and speech.

In summary, by effectively combining LLMs with external knowledge, RAG compensates for LLMs' knowledge gaps while preserving their strengths, offering a sound path for deploying LLMs in production environments.


"Retrieval-Augmented Generation for Large Language Models: A Survey": Translation and Interpretation

Link

Paper: https://arxiv.org/abs/2312.10997

Date

January 5, 2024

Authors

Yunfan Gao1, Yun Xiong2, Xinyu Gao2, Kangxiang Jia2, Jinliu Pan2, Yuxi Bi3, Yi Dai1, Jiawei Sun1, Qianyu Guo4, Meng Wang3 and Haofen Wang1,3

Tongji University, Fudan University

Abstract

Large Language Models (LLMs) demonstrate significant capabilities but face challenges such as hallucination, outdated knowledge, and non-transparent, untraceable reasoning processes. Retrieval-Augmented Generation (RAG) has emerged as a promising solution by incorporating knowledge from external databases. This enhances the accuracy and credibility of the models, particularly for knowledge-intensive tasks, and allows for continuous knowledge updates and integration of domain-specific information. RAG synergistically merges LLMs' intrinsic knowledge with the vast, dynamic repositories of external databases. This comprehensive review paper offers a detailed examination of the progression of RAG paradigms, encompassing the Naive RAG, the Advanced RAG, and the Modular RAG. It meticulously scrutinizes the tripartite foundation of RAG frameworks, which includes the retrieval, the generation, and the augmentation techniques. The paper highlights the state-of-the-art technologies embedded in each of these critical components, providing a profound understanding of the advancements in RAG systems. Furthermore, this paper introduces the metrics and benchmarks for assessing RAG models, along with the most up-to-date evaluation framework. In conclusion, the paper delineates prospective avenues for research, including the identification of challenges, the expansion of multi-modalities, and the progression of the RAG infrastructure and its ecosystem.

1 Introduction

Large language models (LLMs) such as the GPT series [Brown et al., 2020, OpenAI, 2023] and the LLama series [Touvron et al., 2023], along with other models like Gemini [Google, 2023], have achieved remarkable success in natural language processing, demonstrating superior performance on various benchmarks including SuperGLUE [Wang et al., 2019], MMLU [Hendrycks et al., 2020], and BIG-bench [Srivastava et al., 2022]. Despite these advancements, LLMs exhibit notable limitations, particularly in handling domain-specific or highly specialized queries [Kandpal et al., 2023]. A common issue is the generation of incorrect information, or "hallucinations" [Zhang et al., 2023b], especially when queries extend beyond the model's training data or necessitate up-to-date information. These shortcomings underscore the impracticality of deploying LLMs as black-box solutions in real-world production environments without additional safeguards. One promising approach to mitigate these limitations is Retrieval-Augmented Generation (RAG), which integrates external data retrieval into the generative process, thereby enhancing the model's ability to provide accurate and relevant responses.

RAG, introduced by Lewis et al. [Lewis et al., 2020] in mid-2020, stands as a paradigm within the realm of LLMs, enhancing generative tasks. Specifically, RAG involves an initial retrieval step where the LLMs query an external data source to obtain relevant information before proceeding to answer questions or generate text. This process not only informs the subsequent generation phase but also ensures that the responses are grounded in retrieved evidence, thereby significantly enhancing the accuracy and relevance of the output. The dynamic retrieval of information from knowledge bases during the inference phase allows RAG to address issues such as the generation of factually incorrect content, commonly referred to as "hallucinations." The integration of RAG into LLMs has seen rapid adoption and has become a pivotal technology in refining the capabilities of chatbots and rendering LLMs more viable for practical applications.

The evolutionary trajectory of RAG unfolds across four distinctive phases, as illustrated in Figure 1. In its inception in 2017, aligned with the emergence of the Transformer architecture, the primary thrust was on assimilating additional knowledge through Pre-Training Models (PTM) to augment language models. This epoch witnessed RAG's foundational efforts predominantly directed at optimizing pre-training methodologies.

Following this initial phase, a period of relative dormancy ensued before the advent of ChatGPT, during which there was minimal advancement in related research for RAG. The subsequent arrival of ChatGPT marked a pivotal moment in the trajectory, propelling LLMs into the forefront. The community's focal point shifted towards harnessing the capabilities of LLMs to attain heightened controllability and address evolving requirements. Consequently, the lion's share of RAG endeavors concentrated on inference, with a minority dedicated to fine-tuning processes. As LLM capabilities continued to advance, especially with the introduction of GPT-4, the landscape of RAG technology underwent a significant transformation. The emphasis evolved into a hybrid approach, combining the strengths of RAG and fine-tuning, alongside a dedicated minority continuing the focus on optimizing pre-training methodologies.

Despite the rapid growth of RAG research, there has been a lack of systematic consolidation and abstraction in the field, which poses challenges in understanding the comprehensive landscape of RAG advancements. This survey aims to outline the entire RAG process and encompass the current and future directions of RAG research, by providing a thorough examination of retrieval augmentation in LLMs.

Therefore, this paper aims to comprehensively summarize and organize the technical principles, developmental history, content, and, in particular, the relevant methods and applications after the emergence of LLMs, as well as the evaluation methods and application scenarios of RAG. It seeks to provide a comprehensive overview and analysis of existing RAG technologies and offer conclusions and prospects for future development methods. This survey intends to furnish readers and practitioners with a thorough and systematic comprehension of large models and RAG, elucidate the progression and key technologies of retrieval augmentation, clarify the merits and limitations of various technologies along with their suitable contexts, and forecast potential future developments.

Our contributions are as follows:

>>We present a thorough and systematic review of the state-of-the-art RAG, delineating its evolution through paradigms including naive RAG, advanced RAG, and modular RAG. This review contextualizes the broader scope of RAG research within the landscape of LLMs.

>>We identify and discuss the central technologies integral to the RAG process, specifically focusing on the aspects of "Retrieval", "Generator" and "Augmentation", and delve into their synergies, elucidating how these components intricately collaborate to form a cohesive and effective RAG framework.

>>We construct a thorough evaluation framework for RAG, outlining the evaluation objectives and metrics. Our comparative analysis clarifies the strengths and weaknesses of RAG compared to fine-tuning from various perspectives. Additionally, we anticipate future directions for RAG, emphasizing potential enhancements to tackle current challenges, expansions into multi-modal settings, and the development of its ecosystem.

The paper unfolds as follows: Sections 2 and 3 define RAG and detail its developmental process. Sections 4 through 6 explore the core components ("Retrieval", "Generation", and "Augmentation"), highlighting diverse embedded technologies. Section 7 focuses on RAG's evaluation system. Section 8 compares RAG with other LLM optimization methods and suggests potential directions for its evolution. The paper concludes in Section 9.

2 Definition

The definition of RAG can be summarized from its workflow. Figure 2 depicts a typical RAG application workflow. In this scenario, a user asks ChatGPT about a recent high-profile event (i.e., the abrupt dismissal and reinstatement of OpenAI's CEO) which generated considerable public discourse. ChatGPT, as the most renowned and widely utilized LLM, is constrained by its pretraining data and lacks knowledge of recent events. RAG addresses this gap by retrieving up-to-date document excerpts from external knowledge bases. In this instance, it procures a selection of news articles pertinent to the inquiry. These articles, alongside the initial question, are then amalgamated into an enriched prompt that enables ChatGPT to synthesize an informed response. This example illustrates the RAG process, demonstrating its capability to enhance the model's responses with real-time information retrieval.
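
To make the "enriched prompt" concrete, here is a minimal sketch of how the retrieved excerpts and the original question might be amalgamated; the template wording and the build_augmented_prompt helper are illustrative assumptions, not something prescribed by the paper.

```python
def build_augmented_prompt(question: str, retrieved_chunks: list[str]) -> str:
    """Amalgamate retrieved excerpts and the user question into one prompt."""
    context = "\n\n".join(
        f"[Document {i + 1}]\n{chunk}" for i, chunk in enumerate(retrieved_chunks)
    )
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )

# The Figure 2 scenario: news excerpts retrieved for a question about recent events.
chunks = ["News excerpt about the dismissal...", "News excerpt about the reinstatement..."]
prompt = build_augmented_prompt("Why was OpenAI's CEO dismissed and then reinstated?", chunks)
```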

Technologically, RAG has been enriched through various innovative approaches addressing pivotal questions such as "what to retrieve", "when to retrieve", and "how to use the retrieved information". For "what to retrieve", research has progressed from simple token [Khandelwal et al., 2019] and entity retrieval [Nishikawa et al., 2022] to more complex structures like chunks [Ram et al., 2023] and knowledge graphs [Kang et al., 2023], with studies focusing on the granularity of retrieval and the level of data structuring. Coarse granularity brings more information but with lower precision. Retrieving structured text provides more information while sacrificing efficiency. The question of "when to retrieve" has led to strategies ranging from single [Wang et al., 2023e, Shi et al., 2023] to adaptive [Jiang et al., 2023b, Huang et al., 2023] and multiple retrieval [Izacard et al., 2022] methods. A high frequency of retrieval brings more information but lower efficiency. As for "how to use" the retrieved data, integration techniques have been developed across various levels of the model architecture, including the input [Khattab et al., 2022], intermediate [Borgeaud et al., 2022], and output layers [Liang et al., 2023]. Although the intermediate and output layers are more effective, they introduce the need for training and suffer from lower efficiency.

RAG is a paradigm that enhances LLMs by integrating external knowledge bases. It employs a synergistic approach, combining information retrieval mechanisms and In-Context Learning (ICL) to bolster the LLM's performance. In this framework, a query initiated by a user prompts the retrieval of pertinent information via search algorithms. This information is then woven into the LLM's prompts, providing additional context for the generation process. RAG's key advantage lies in its obviation of the need for retraining of LLMs for task-specific applications. Developers can instead append an external knowledge repository, enriching the input and thereby refining the model's output precision. RAG has become one of the most popular architectures in LLMs' systems, due to its high practicality and low barrier to entry, with many conversational products being built almost entirely on RAG.

The RAG workflow comprises three key steps. First, the corpus is partitioned into discrete chunks, upon which vector indices are constructed utilizing an encoder model. Second, RAG identifies and retrieves chunks based on their vector similarity to the query and indexed chunks. Finally, the model synthesizes a response conditioned on the contextual information gleaned from the retrieved chunks. These steps form the fundamental framework of the RAG process, underpinning its information retrieval and context-aware generation capabilities. Next, we will provide an introduction to the RAG research framework.
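
The three steps can be condensed into a short sketch, assuming the sentence-transformers library as the encoder; the model name, the two-chunk corpus, and the final llm() call are placeholders rather than choices made by the survey.

```python
import numpy as np
from sentence_transformers import SentenceTransformer

encoder = SentenceTransformer("all-MiniLM-L6-v2")  # assumed encoder choice

# Step 1: partition the corpus into chunks and build a vector index.
chunks = [
    "OpenAI's board abruptly dismissed its CEO...",
    "Days later, the CEO was reinstated...",
]
index = encoder.encode(chunks, normalize_embeddings=True)  # (n_chunks, dim)

# Step 2: embed the query and retrieve the chunks most similar to it.
query = "What happened to OpenAI's CEO?"
q_vec = encoder.encode([query], normalize_embeddings=True)[0]
scores = index @ q_vec                 # cosine similarity (vectors are unit norm)
top_k = np.argsort(-scores)[:2]
context = "\n".join(chunks[i] for i in top_k)

# Step 3: condition the generator on the retrieved context.
prompt = f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"
# answer = llm(prompt)  # llm() stands in for any completion call
```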

3 RAG Framework

The RAG research paradigm is continuously evolving, and this section primarily delineates its progression. We categorize it into three types: Naive RAG, Advanced RAG, and Modular RAG. While RAG approaches were cost-effective and surpassed the performance of the native LLM, they also exhibited several limitations. The development of Advanced RAG and Modular RAG was a response to these specific shortcomings in Naive RAG.

3.1 Naive RAG

The Naive RAG research paradigm represents the earliest methodology, which gained prominence shortly after the widespread adoption of ChatGPT. The Naive RAG follows a traditional process that includes indexing, retrieval, and generation. It is also characterized as a "Retrieve-Read" framework [Ma et al., 2023a].

Indexing

The indexing process is a crucial initial step in data preparation that occurs offline and involves several stages. It begins with data indexing, where original data is cleansed and extracted, and various file formats such as PDF, HTML, Word, and Markdown are converted into standardized plain text. In order to fit within the context limitations of language models, this text is then segmented into smaller, more manageable chunks in a process known as chunking. These chunks are subsequently transformed into vector representations through an embedding model, chosen for its balance between inference efficiency and model size. This facilitates similarity comparisons during the retrieval phase. Finally, an index is created to store these text chunks and their vector embeddings as key-value pairs, which allows for efficient and scalable search capabilities.
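
As a rough illustration of the chunking stage, the fixed-size chunker below slides over the cleaned plain text with some overlap; the 512-character size and 64-character overlap are illustrative defaults, not values from the paper.

```python
def chunk_text(text: str, chunk_size: int = 512, overlap: int = 64) -> list[str]:
    """Split cleaned plain text into fixed-size chunks with some shared context."""
    chunks, start = [], 0
    while start < len(text):
        chunks.append(text[start : start + chunk_size])
        start += chunk_size - overlap  # slide forward, keeping an overlap
    return chunks

# Each chunk would then be embedded and stored as a (chunk, vector) key-value pair.
```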

Retrieval

Upon receipt of a user query, the system employs the same encoding model utilized during the indexing phase to transcode the input into a vector representation. It then proceeds to compute the similarity scores between the query vector and the vectorized chunks within the indexed corpus. The system prioritizes and retrieves the top K chunks that demonstrate the greatest similarity to the query. These chunks are subsequently used as the expanded contextual basis for addressing the user's request.

Generation

The posed query and selected documents are synthesized into a coherent prompt to which a large language model is tasked with formulating a response. The model's approach to answering may vary depending on task-specific criteria, allowing it to either draw upon its inherent parametric knowledge or restrict its responses to the information contained within the provided documents. In cases of ongoing dialogues, any existing conversational history can be integrated into the prompt, enabling the model to engage in multi-turn dialogue interactions effectively.

Drawbacks in Naive RAG

Naive RAG faces significant challenges in three key areas: “Retrieval,” “Generation,” and “Augmentation”.

Retrieval quality poses diverse challenges, including low precision, leading to misaligned retrieved chunks and potential issues like hallucination or mid-air drop. Low recall also occurs, resulting in the failure to retrieve all relevant chunks, thereby hindering the LLMs' ability to craft comprehensive responses. Outdated information further compounds the problem, potentially yielding inaccurate retrieval results.

Response generation quality presents a hallucination challenge, where the model generates answers not grounded in the provided context, as well as issues of irrelevant context and potential toxicity or bias in the model's output.

The augmentation process presents its own challenges in effectively integrating context from retrieved passages with the current generation task, potentially leading to disjointed or incoherent output. Redundancy and repetition are also concerns, especially when multiple retrieved passages contain similar information, resulting in repetitive content in the generated response.

Discerning the importance and relevance of multiple retrieved passages to the generation task is another challenge, requiring the proper balance of each passage's value. Additionally, reconciling differences in writing styles and tones to ensure consistency in the output is crucial.

Lastly, there’s a risk of generation models overly depend-ing on augmented information, potentially resulting in out-puts that merely reiterate the retrieved content without pro-viding new value or synthesized information.

3.2 Advanced RAG

Advanced RAG has been developed with targeted enhancements to address the shortcomings of Naive RAG. In terms of retrieval quality, Advanced RAG implements pre-retrieval and post-retrieval strategies. To address the indexing challenges experienced by Naive RAG, Advanced RAG has refined its indexing approach using techniques such as sliding window, fine-grained segmentation, and metadata. It has also introduced various methods to optimize the retrieval process [ILIN, 2023].

Pre-Retrieval Process

Optimizing Data Indexing. The goal of optimizing data indexing is to enhance the quality of the content being indexed. This involves five primary strategies: enhancing data granularity, optimizing index structures, adding metadata, alignment optimization, and mixed retrieval.

Enhancing data granularity aims to elevate text standardization, consistency, factual accuracy, and rich context to improve the RAG system's performance. This includes removing irrelevant information, dispelling ambiguity in entities and terms, confirming factual accuracy, maintaining context, and updating outdated documents.

Optimizing index structures involves adjusting the size of chunks to capture relevant context, querying across multiple index paths, and incorporating information from the graph structure to capture relevant context by leveraging relationships between nodes in a graph data index.

Adding metadata information involves integrating referenced metadata, such as dates and purposes, into chunks for filtering purposes, and incorporating metadata like chapters and subsections of references to improve retrieval efficiency.
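
A small sketch of what metadata-based filtering might look like in practice; the document schema (a date and section field per chunk) is a hypothetical example, not one defined by the paper.

```python
from datetime import date

# Hypothetical chunk records carrying metadata alongside the text.
docs = [
    {"text": "Q3 earnings summary...", "date": date(2023, 10, 1), "section": "3.2"},
    {"text": "Q1 earnings summary...", "date": date(2023, 1, 5), "section": "1.1"},
]

def filter_by_date(records, newer_than):
    """Narrow the candidate pool by metadata before vector similarity is applied."""
    return [r for r in records if r["date"] > newer_than]

candidates = filter_by_date(docs, date(2023, 6, 1))  # keeps only the Q3 record
```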

Alignment optimization addresses alignment issues and disparities between documents by introducing "hypothetical questions" [Li et al., 2023d] into documents.

Retrieval

During the retrieval stage, the primary focus is on identifying the appropriate context by calculating the similarity between the query and chunks. The embedding model is central to this process. In the advanced RAG, there is potential for optimization of the embedding models.

Fine-tuning Embedding. Fine-tuning embedding models significantly impacts the relevance of retrieved content in RAG systems. This process involves customizing embedding models to enhance retrieval relevance in domain-specific contexts, especially for professional domains dealing with evolving or rare terms. The BGE embedding model [BAAI, 2023], such as BGE-large-EN developed by BAAI, is an example of a high-performance embedding model that can be fine-tuned to optimize retrieval relevance. Training data for fine-tuning can be generated using language models like GPT-3.5-turbo to formulate questions grounded on document chunks, which are then used as fine-tuning pairs.
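
A sketch of that synthetic-pair construction, assuming a generic ask_llm() completion call (e.g., backed by GPT-3.5-turbo); the prompt wording and helper name are ours.

```python
def build_finetuning_pairs(chunks, ask_llm):
    """Have an LLM draft one question per chunk; (question, chunk) becomes a pair."""
    pairs = []
    for chunk in chunks:
        question = ask_llm(
            "Write one question that the following passage answers:\n\n" + chunk
        )
        pairs.append((question, chunk))  # (query, positive passage)
    return pairs
```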

Dynamic Embedding adapts to the context in which words are used, unlike static embedding, which uses a single vector for each word [Karpukhin et al., 2020]. For example, in transformer models like BERT, the same word can have varied embeddings depending on surrounding words. OpenAI's embeddings-ada-02 model, built upon the principles of LLMs like GPT, is a sophisticated dynamic embedding model that captures contextual understanding. However, it may not exhibit the same sensitivity to context as the latest full-size language models like GPT-4.

Post-Retrieval Process

After retrieving valuable context from the database, it is essential to merge it with the query as an input into LLMs while addressing challenges posed by context window limits. Simply presenting all relevant documents to the LLM at once may exceed the context window limit, introduce noise, and hinder the focus on crucial information. Additional processing of the retrieved content is necessary to address these issues.

Re-Ranking. Re-ranking the retrieved information to relocate the most relevant content to the edges of the prompt is a key strategy. This concept has been implemented in frameworks such as LlamaIndex, LangChain, and Haystack [Blagojevi, 2023]. For example, DiversityRanker prioritizes reordering based on document diversity, while LostInTheMiddleRanker alternates placing the best document at the beginning and end of the context window. Additionally, approaches like cohereAI rerank [Cohere, 2023], bge-rerank, and LongLLMLingua [Jiang et al., 2023a] recalculate the semantic similarity between relevant text and the query, addressing the challenge of interpreting vector-based simulated searches for semantic similarity.
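
The LostInTheMiddleRanker idea can be sketched in a few lines: given documents ranked best-first, interleave them so the strongest ones end up at the edges of the context window. This is our paraphrase of the behavior, not Haystack's actual implementation.

```python
def lost_in_the_middle_order(ranked_docs):
    """Re-lay-out best-first documents so the strongest sit at the context edges."""
    front, back = [], []
    for i, doc in enumerate(ranked_docs):
        (front if i % 2 == 0 else back).append(doc)
    return front + back[::-1]

# ranked ["A", "B", "C", "D", "E"] -> ["A", "C", "E", "D", "B"]:
# the top two documents end up first and last, the weakest near the middle.
```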

Prompt Compression. Research indicates that noise in retrieved documents adversely affects RAG performance. In post-processing, the emphasis lies in compressing irrelevant context, highlighting pivotal paragraphs, and reducing the overall context length. Approaches such as Selective Context and LLMLingua [Litman et al., 2020, Anderson et al., 2022] utilize small language models to calculate prompt mutual information or perplexity, estimating element importance. Recomp [Xu et al., 2023a] addresses this by training compressors at different granularities, while Long Context [Xu et al., 2023b] and "Walking in the Memory Maze" [Chen et al., 2023a] design summarization techniques to enhance the LLM's key information perception, particularly in dealing with extensive contexts.
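
A crude proxy for this perplexity-based filtering, using GPT-2 as the small scoring model: sentences the small LM finds highly predictable (low perplexity) are assumed to carry little new information and are dropped first. This is a simplification of what Selective Context and LLMLingua do, not their actual algorithms.

```python
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tok = GPT2TokenizerFast.from_pretrained("gpt2")
lm = GPT2LMHeadModel.from_pretrained("gpt2").eval()

def perplexity(sentence: str) -> float:
    """Score a sentence with the small LM's mean token cross-entropy."""
    ids = tok(sentence, return_tensors="pt").input_ids
    with torch.no_grad():
        loss = lm(ids, labels=ids).loss
    return torch.exp(loss).item()

def compress(sentences: list[str], keep_ratio: float = 0.5) -> list[str]:
    """Keep the most 'surprising' sentences, preserving their original order."""
    by_score = sorted(sentences, key=perplexity, reverse=True)
    kept = set(by_score[: max(1, int(len(sentences) * keep_ratio))])
    return [s for s in sentences if s in kept]
```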

3.3 Modular RAG

The modular RAG structure diverges from the traditional Naive RAG framework, providing greater versatility and flexibility. It integrates various methods to enhance functional modules, such as incorporating a search module for similarity retrieval and applying a fine-tuning approach in the retriever [Lin et al., 2023]. Restructured RAG modules [Yu et al., 2022] and iterative methodologies like [Shao et al., 2023] have been developed to address specific issues. The modular RAG paradigm is increasingly becoming the norm in the RAG domain, allowing for either a serialized pipeline or an end-to-end training approach across multiple modules. The comparison of the three RAG paradigms is depicted in Figure 3. However, Modular RAG is not standalone. Advanced RAG is a specialized form of modular RAG, and further, Naive RAG itself is a special case of Advanced RAG. The relationship among the three paradigms is one of inheritance and development.

New Modules

Search Module. In contrast to the similarity retrieval in Naive/Advanced RAG, the Search Module is tailored to specific scenarios and incorporates direct searches on additional corpora. This integration is achieved using code generated by the LLM, query languages such as SQL or Cypher, and other custom tools. The data sources for these searches can include search engines, text data, tabular data, and knowledge graphs [Wang et al., 2023d].

Memory Module. This module harnesses the memory capabilities of the LLM to guide retrieval. The approach involves identifying memories most similar to the current input. Selfmem [Cheng et al., 2023b] utilizes a retrieval-enhanced generator to create an unbounded memory pool iteratively, combining the "original question" and "dual question". By employing a retrieval-enhanced generative model that uses its own outputs to improve itself, the text becomes more aligned with the data distribution during the reasoning process. Consequently, the model's own outputs are utilized instead of the training data [Wang et al., 2022a].

Fusion. RAG-Fusion [Raudaschl, 2023] enhances traditional search systems by addressing their limitations through a multi-query approach that expands user queries into multiple, diverse perspectives using an LLM. This approach not only captures the explicit information users seek but also uncovers deeper, transformative knowledge. The fusion process involves parallel vector searches of both original and expanded queries, intelligent re-ranking to optimize results, and pairing the best outcomes with new queries. This sophisticated method ensures search results that align closely with both the explicit and implicit intentions of the user, leading to more insightful and relevant information discovery.
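
The re-ranking step in RAG-Fusion is commonly realized with reciprocal rank fusion, sketched below; the k=60 smoothing constant is a conventional default, and the document IDs are toy data.

```python
def reciprocal_rank_fusion(result_lists, k=60):
    """Merge ranked lists from several query variants into one fused ranking."""
    scores = {}
    for results in result_lists:            # one ranked list per query variant
        for rank, doc_id in enumerate(results):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank + 1)
    return sorted(scores, key=scores.get, reverse=True)

# e.g., three LLM-generated variants of one user query, each searched separately:
fused = reciprocal_rank_fusion([["d1", "d2"], ["d2", "d3"], ["d2", "d1"]])
# "d2" rises to the top because it ranks highly across the variants
```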

Routing. The RAG system's retrieval process utilizes diverse sources, differing in domain, language, and format, which can be either alternated or merged based on the situation [Li et al., 2023b]. Query routing decides the subsequent action for a user's query, with options ranging from summarization, searching specific databases, or merging different pathways into a single response. The query router also chooses the appropriate data store for the query, which may include various sources like vector stores, graph databases, or relational databases, or a hierarchy of indices, for instance a summary index and a document block vector index for multi-document storage. The query router's decision-making is predefined and executed via LLM calls, which direct the query to the chosen index.
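
A minimal sketch of such a router, assuming a generic route_llm() completion call and a hypothetical three-store setup:

```python
# Hypothetical route descriptions; real systems would list their own stores.
ROUTES = {
    "vector_store": "semantic questions over unstructured documents",
    "graph_db": "questions about entities and their relationships",
    "sql_db": "aggregations over structured tables",
}

def route_query(query, route_llm):
    """Let an LLM pick one predefined data store for the incoming query."""
    menu = "\n".join(f"- {name}: {desc}" for name, desc in ROUTES.items())
    choice = route_llm(
        f"Pick the best data store for this query.\nStores:\n{menu}\n"
        f"Query: {query}\nAnswer with the store name only."
    ).strip()
    return choice if choice in ROUTES else "vector_store"  # safe fallback
```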

Predict. This module addresses the common issues of redundancy and noise in retrieved content. Instead of directly retrieving from a data source, it utilizes the LLM to generate the necessary context [Yu et al., 2022]. The content produced by the LLM is more likely to contain pertinent information compared to that obtained through direct retrieval.

Task Adapter. This module focuses on adapting RAG to a variety of downstream tasks. UPRISE automates the retrieval of prompts for zero-shot task inputs from a pre-constructed data pool, thereby enhancing universality across tasks and models [Cheng et al., 2023a]. Meanwhile, PROMPTAGATOR [Dai et al., 2022] utilizes the LLM as a few-shot query generator and, based on the generated data, creates task-specific retrievers. By leveraging the generalization capability of LLMs, it enables the development of task-specific end-to-end retrievers with minimal examples.

New Patterns

The organizational structure of Modular RAG is highly adaptable, allowing for the substitution or rearrangement of modules within the RAG process to suit specific problem contexts.

Naive RAG and Advanced RAG can both be considered as being composed of some fixed modules. As illustrated in Figure 3, Naive RAG primarily consists of the "Retrieve" and "Read" modules. A typical pattern of Advanced RAG builds upon the foundation of Naive RAG by adding "Rewrite" and "Rerank" modules. However, on the whole, modular RAG enjoys greater diversity and flexibility.

Current research primarily explores two organizational paradigms. The first involves adding or replacing modules, while the second focuses on adjusting the organizational flow between modules. This flexibility enables tailoring the RAG process to effectively address a wide array of tasks.

Adding or Replacing Modules. The strategy of introducing or substituting modules involves maintaining the core structure of the Retrieval-Read process while integrating additional modules to enhance specific functionalities. The RRR model [Ma et al., 2023a] introduces the Rewrite-Retrieve-Read process, utilizing the LLM performance as a reinforcement learning incentive for a rewriting module. This enables the rewriter to fine-tune retrieval queries, thereby improving the downstream task performance of the reader.

Similarly, modules can be selectively swapped in methodologies like Generate-Read [Yu et al., 2022], where the LLM's generation module takes the place of the retrieval module. The Recite-Read approach [Sun et al., 2022] transforms external retrieval into retrieval from model weights, requiring the LLM to initially memorize task-specific information and subsequently produce output capable of handling knowledge-intensive natural language processing tasks.

Adjusting the Flow between Modules. In the realm of module flow adjustment, there is a focus on enhancing the interaction between language models and retrieval models. DSP [Khattab et al., 2022] introduces the Demonstrate-Search-Predict framework, treating the context learning system as an explicit program rather than a final task prompt, leading to more effective handling of knowledge-intensive tasks. The ITER-RETGEN [Shao et al., 2023] approach utilizes generated content to guide retrieval, iteratively implementing "retrieval-enhanced generation" and "generation-enhanced retrieval" within a Retrieve-Read-Retrieve-Read flow. This method demonstrates an innovative way of using one module's output to improve the functionality of another.

Optimizing the RAG Pipeline

The optimization of the retrieval process aims to enhance the efficiency and quality of information in RAG systems. Current research focuses on integrating diverse search technologies, refining retrieval steps, incorporating cognitive backtracking, implementing versatile query strategies, and leveraging embedding similarity. These efforts collectively strive to achieve a balance between retrieval efficiency and the depth of contextual information in RAG systems.

Hybrid Search Exploration. The RAG system optimizes its performance by intelligently integrating various techniques, including keyword-based search, semantic search, and vector search. This approach leverages the unique strengths of each method to accommodate diverse query types and information needs, ensuring consistent retrieval of highly relevant and context-rich information. The use of hybrid search serves as a robust supplement to retrieval strategies, thereby enhancing the overall efficacy of the RAG pipeline.

Recursive Retrieval and Query Engine. Recursive retrieval involves acquiring smaller chunks during the initial retrieval phase to capture key semantic meanings. Subsequently, larger chunks containing more contextual information are provided to the LLM in later stages of the process. This two-step retrieval method helps to strike a balance between efficiency and the delivery of contextually rich responses.
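
A sketch of this small-to-big expansion, assuming a child_index.search() interface and a parent_of mapping built at indexing time (both hypothetical):

```python
def small_to_big(query_vec, child_index, parent_of, top_k=3):
    """Search precise small chunks, then hand their larger parents to the LLM."""
    child_ids = child_index.search(query_vec, top_k)  # assumed search interface
    parents = {parent_of[cid] for cid in child_ids}   # dedupe shared parents
    return list(parents)                              # larger, context-rich chunks
```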

The StepBack-prompt approach encourages the LLM to move away from specific instances and engage in reasoning around broader concepts and principles [Zheng et al., 2023]. Experimental results demonstrate a significant performance increase in various challenging, inference-based tasks when backward prompts are used, highlighting their natural adaptability to the RAG process. These retrieval-enhancing steps can be applied both in generating responses to backward prompts and in the final question-answering process.
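
An illustrative step-back prompt; the wording is ours, loosely following the idea in [Zheng et al., 2023]:

```python
question = "If the temperature doubles at constant volume, what happens to pressure?"
step_back_prompt = (
    f"Here is a question: {question}\n"
    "Before answering, restate it as a more general question about the "
    "underlying concept or principle, suitable for retrieval."
)
# The generalized query (e.g., one about the ideal gas law) drives retrieval;
# the retrieved passages then ground the answer to the original question.
```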

Sub-Queries. Depending on the scenario, various query strategies can be employed, such as using query engines provided by frameworks like LlamaIndex, leveraging tree queries, utilizing vector queries, or executing simple sequential querying of chunks.

Hypothetical Document Embeddings. HyDE operates on the belief that the answers generated might be closer in the embedding space than a direct query. Using the LLM, HyDE creates a hypothetical document (answer) in response to a query, embeds this document, and uses the resulting embedding to retrieve real documents similar to the hypothetical one. Instead of seeking embedding similarity based on the query, this approach focuses on the embedding similarity from one answer to another [Gao et al., 2022]. However, it might not consistently produce desirable outcomes, especially when the language model is unfamiliar with the subject matter, potentially leading to more instances with errors.
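
A sketch of the HyDE flow, with ask_llm(), embed(), and index.search() as placeholder interfaces rather than a specific library's API:

```python
def hyde_retrieve(query, ask_llm, embed, index, top_k=5):
    """Retrieve with the embedding of a generated answer instead of the query."""
    hypothetical = ask_llm(f"Write a short passage that answers: {query}")
    doc_vec = embed(hypothetical)        # answer-space embedding, not query-space
    return index.search(doc_vec, top_k)  # real documents near the hypothetical one
```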

4 Retrieval

In the context of RAG, it is crucial to efficiently retrieve relevant documents from the data source. However, creating a proficient retriever presents significant challenges. This section delves into three fundamental questions: 1) How can we achieve accurate semantic representations? 2) What methods can align the semantic spaces of queries and documents? 3) How can the retriever's output be aligned with the preferences of the Large Language Model?

4.1 Enhancing Semantic Representations

In RAG, the semantic space is essential as it involves the multidimensional mapping of queries and documents. Retrieval accuracy in this semantic space significantly impacts RAG outcomes. This section will present two methods for building accurate semantic spaces.

Chunk Optimization

When managing external documents, the initial step involves breaking them down into smaller chunks to extract fine-grained features, which are then embedded to represent their semantics. However, embedding overly large or excessively small text chunks may lead to sub-optimal outcomes. Therefore, identifying the optimal chunk size for documents within the corpus is crucial to ensuring the accuracy and relevance of the retrieved results.

Choosing an appropriate chunking strategy requires careful consideration of several vital factors, such as the nature of the indexed content, the embedding model and its optimal block size, the expected length and complexity of user queries, and the specific application's utilization of the retrieved results. For instance, the selection of a chunking model should be based on the content's length, whether it is longer or shorter. Additionally, different embedding models demonstrate distinct performance characteristics at varying block sizes. For example, sentence-transformer performs better with single sentences, while text-embedding-ada-002 excels with blocks containing 256 or 512 tokens.

選擇適當(dāng)?shù)姆謮K策略需要仔細(xì)考慮幾個(gè)重要因素,例如索引內(nèi)容的性質(zhì)、嵌入模型及其最優(yōu)塊大小、用戶(hù)查詢(xún)的預(yù)期長(zhǎng)度和復(fù)雜性,以及特定應(yīng)用程序?qū)z索結(jié)果的利用。例如,分塊模型的選擇應(yīng)該基于內(nèi)容的長(zhǎng)度——是長(zhǎng)還是短。此外,不同的嵌入模型在不同塊大小下表現(xiàn)出不同的性能特征。例如,句子轉(zhuǎn)換器在處理單個(gè)句子時(shí)表現(xiàn)更好,而text-embedding-ada-002在處理包含256或512個(gè)令牌的塊時(shí)表現(xiàn)出色。

Additionally, factors like the length and complexity of user input questions, and the specific needs of the application (e.g., semantic search or question answering), have an effect on the choice of a chunking strategy. This choice can be directly influenced by the token limits of the selected LLMs, requiring adjustments to the block size. In reality, getting precise query results involves flexibly applying different chunking strategies. There is no one-size-fits-all "best" strategy, only the most appropriate one for a particular context.

Current research in RAG explores various block optimization techniques aimed at improving both retrieval efficiency and accuracy. One such approach involves the use of sliding window technology, enabling layered retrieval by merging globally related information across multiple retrieval processes. Another strategy, known as the "small2big" method, utilizes small text blocks during the initial search phase and subsequently provides larger related text blocks to the language model for processing.

The abstract embedding technique prioritizes top K retrieval based on document abstracts (or summaries), offering a comprehensive understanding of the entire document context. Additionally, the metadata filtering technique leverages document metadata to enhance the filtering process. An innovative approach, the graph indexing technique, transforms entities and relationships into nodes and connections, significantly improving relevance, particularly in the context of multi-hop problems.

The combination of these diverse methods has led to notable advancements, resulting in enhanced retrieval outcomes and improved performance for RAG.

Fine-tuning Embedding Models

Once the appropriate size of chunks is determined, the next crucial step involves embedding these chunks and the query into the semantic space using an embedding model. The effectiveness of the embedding is critical as it impacts the model's ability to represent the corpus. Recent research has introduced prominent embedding models such as AngIE, Voyage, BGE, etc. [Li and Li, 2023, VoyageAI, 2023, BAAI, 2023]. These models have undergone pre-training on extensive corpora. However, their capability to accurately capture domain-specific information may be limited when applied to specialized domains.

Moreover, task-specific fine-tuning of embedding models is essential to ensure that the model comprehends the user query in terms of content relevance. A model without fine-tuning may not adequately address the requirements of a specific task. Consequently, fine-tuning an embedding model becomes crucial for downstream applications. There are two primary paradigms in embedding fine-tuning methods.

Domain Knowledge Fine-tuning. To ensure that an embedding model accurately captures domain-specific information, it is imperative to utilize domain-specific datasets for fine-tuning. This process diverges from standard language model fine-tuning, chiefly in the nature of the datasets involved. Typically, the dataset for embedding model fine-tuning encompasses three principal elements: queries, a corpus, and relevant documents. The model employs these queries to identify pertinent documents within the corpus. The efficacy of the model is then gauged based on its ability to retrieve these relevant documents in response to the queries. The dataset construction, model fine-tuning, and evaluation phases each present distinct challenges. LlamaIndex [Liu, 2023] introduces a suite of pivotal classes and functions designed to enhance the embedding model fine-tuning workflow, thereby simplifying these intricate processes. By curating a corpus infused with domain knowledge and leveraging the methodologies offered, one can adeptly fine-tune an embedding model to align closely with the specific requirements of the target domain.
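
A minimal fine-tuning sketch under this (query, relevant document) dataset structure, using the sentence-transformers library with in-batch negatives rather than LlamaIndex's helpers; the model name and hyperparameters are illustrative.

```python
from torch.utils.data import DataLoader
from sentence_transformers import SentenceTransformer, InputExample, losses

model = SentenceTransformer("BAAI/bge-small-en")  # assumed base model

# (query, relevant document) pairs, e.g., from the LLM-generated questions above.
train_examples = [
    InputExample(texts=[query, relevant_doc])
    for query, relevant_doc in [("what is RAG?", "RAG combines retrieval with ...")]
]
loader = DataLoader(train_examples, shuffle=True, batch_size=16)
# Other pairs in the same batch act as negatives for each query.
loss = losses.MultipleNegativesRankingLoss(model)
model.fit(train_objectives=[(loader, loss)], epochs=1)
```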

Fine-tuning for Downstream Tasks. Fine-tuning embedding models for downstream tasks is a critical step in enhancing model performance. In the realm of utilizing RAG for these tasks, innovative methods have emerged to fine-tune embedding models by harnessing the capabilities of LLMs. For example, PROMPTAGATOR [Dai et al., 2022] utilizes the LLM as a few-shot query generator to create task-specific retrievers, addressing challenges in supervised fine-tuning, particularly in data-scarce domains. Another approach, LLM-Embedder [Zhang et al., 2023a], exploits LLMs to generate reward signals for data across multiple downstream tasks. The retriever is fine-tuned with two types of supervised signals: hard labels for the dataset and soft rewards from the LLMs. This dual-signal approach fosters a more effective fine-tuning process, tailoring the embedding model to diverse downstream applications.
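
A schematic of how the two supervision signals could be combined into a single objective is sketched below; the interpolation weight alpha and temperature tau are assumptions on our part, not values from the LLM-Embedder paper.

import torch
import torch.nn.functional as F

def dual_signal_loss(sim, pos_idx, llm_rewards, alpha=0.5, tau=0.05):
    # sim:         (B, K) retriever similarity scores over K candidates
    # pos_idx:     (B,)   index of the hard-labeled positive passage
    # llm_rewards: (B, K) soft reward the LLM assigns to each passage
    hard = F.cross_entropy(sim / tau, pos_idx)   # hard-label signal
    soft_target = F.softmax(llm_rewards / tau, dim=-1)
    soft = F.kl_div(F.log_softmax(sim / tau, dim=-1), soft_target,
                    reduction="batchmean")       # soft-reward distillation
    return alpha * hard + (1 - alpha) * soft     # weighting is an assumption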

While these methods improve semantic representation by incorporating domain knowledge and task-specific fine-tuning, retrievers may not always exhibit optimal compatibility with certain LLMs. To address this, some researchers have explored direct supervision of the fine-tuning process using feedback from LLMs. This direct supervision seeks to align the retriever more closely with the LLM, thereby improving performance on downstream tasks. A more comprehensive discussion on this topic is presented in Section 4.3.

4.2 Aligning Queries and Documents

In the context of RAG applications, retrievers may utilize a single embedding model for encoding both the query and the documents, or employ separate models for each. Additionally, the user's original query may suffer from imprecise phrasing and a lack of semantic information. Therefore, it is crucial to align the semantic space of the user's query with that of the documents. This section introduces two fundamental techniques aimed at achieving this alignment.

Query Rewriting

Query rewriting is a fundamental approach for aligning the semantics of a query and a document. Methods such as Query2Doc and ITER-RETGEN leverage LLMs to create a pseudo-document by combining the original query with additional guidance [Wang et al., 2023c, Shao et al., 2023]. HyDE constructs query vectors using textual cues to generate a "hypothetical" document capturing essential patterns [Gao et al., 2022]. RRR introduces a framework that reverses the traditional retrieve-then-read order, focusing on query rewriting [Ma et al., 2023a]. STEP-BACK PROMPTING enables LLMs to perform abstract reasoning and retrieval based on high-level concepts [Zheng et al., 2023]. Additionally, the multi-query retrieval method utilizes LLMs to generate and execute multiple search queries simultaneously, which is advantageous for addressing complex problems comprising multiple sub-problems.
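
A minimal sketch of the HyDE idea follows; llm_complete, embed, and index are hypothetical helpers standing in for any LLM, embedding model, and vector index, so this illustrates the technique rather than the paper's exact implementation.

def hyde_search(query, llm_complete, embed, index, k=5):
    # 1. Ask the LLM for a hypothetical document that answers the query.
    hypothetical = llm_complete(
        f"Write a short passage that plausibly answers: {query}")
    # 2. Embed the hypothetical document instead of the raw query.
    vector = embed(hypothetical)
    # 3. Retrieve real documents whose embeddings are close to it.
    return index.search(vector, k)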

Embedding Transformation

Beyond broad strategies such as query rewriting, there exist more granular techniques specifically designed for embedding transformations. LlamaIndex [Liu, 2023] exemplifies this by introducing an adapter module that can be integrated after the query encoder. This adapter facilitates fine-tuning, thereby optimizing the representation of query embeddings to map them into a latent space that is more closely aligned with the intended tasks.
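
A minimal sketch of such an adapter is shown below: a small trainable network applied on top of a frozen query encoder. The residual two-layer architecture is our assumption for illustration, not LlamaIndex's exact implementation.

import torch
import torch.nn as nn

class QueryAdapter(nn.Module):
    # A small trainable module placed after a frozen query encoder; only
    # the adapter's parameters are updated during fine-tuning.
    def __init__(self, dim: int):
        super().__init__()
        self.proj = nn.Sequential(
            nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim))

    def forward(self, query_emb: torch.Tensor) -> torch.Tensor:
        # The residual connection keeps the original embedding reachable.
        return query_emb + self.proj(query_emb)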

The challenge of aligning queries with structured external documents, particularly when addressing the incongruity between structured and unstructured data, is addressed by SANTA [Li et al., 2023d]. It enhances the retriever's sensitivity to structured information through two pre-training strategies: first, by leveraging the intrinsic alignment between structured and unstructured data to inform contrastive learning in a structure-aware pre-training scheme; and second, by implementing Masked Entity Prediction. The latter utilizes an entity-centric masking strategy that encourages language models to predict and fill in the masked entities, thereby fostering a deeper understanding of structured data.

4.3 Aligning Retriever and LLM

In the RAG pipeline, enhancing the retrieval hit rate through various techniques may not necessarily improve the final outcome, as the retrieved documents may not align with the specific requirements of the LLMs. Therefore, this section introduces two methods aimed at aligning the retriever outputs with the preferences of the LLMs.

Fine-tuning Retrievers

Several studies utilize feedback signals from LLMs to refine retrieval models. For instance, AAR [Yu et al., 2023b] introduces supervisory signals for a pre-trained retriever using an encoder-decoder architecture. This is achieved by identifying the LM's preferred documents through FiD cross-attention scores. Subsequently, the retriever undergoes fine-tuning with hard negative sampling and a standard cross-entropy loss. Ultimately, the refined retriever can be directly applied to enhance unseen target LMs, resulting in improved performance on the target task. Additionally, it is suggested that LLMs may have a preference for focusing on readable rather than information-rich documents.

REPLUG [Shi et al., 2023] utilizes a retriever and an LLM to calculate the probability distributions of the retrieved documents and then performs supervised training by computing the KL divergence. This straightforward and effective training method enhances the performance of the retrieval model by using the LM as the supervisory signal, eliminating the need for specific cross-attention mechanisms.
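
The supervision signal can be written compactly as a KL term between the retriever's document distribution and the distribution implied by the LM's likelihoods. The sketch below is schematic; the temperature values are assumptions, not REPLUG's reported hyperparameters.

import torch
import torch.nn.functional as F

def replug_style_loss(sim_scores, lm_log_likelihood, gamma=0.1, beta=0.1):
    # sim_scores:        (B, K) retriever scores for K retrieved documents
    # lm_log_likelihood: (B, K) log P_LM(ground-truth output | doc_k, query)
    log_p = F.log_softmax(sim_scores / gamma, dim=-1)        # retriever dist
    log_q = F.log_softmax(lm_log_likelihood / beta, dim=-1)  # LM-derived dist
    # KL(P_retriever || Q_LM); detach so only the retriever gets gradients.
    kl = (log_p.exp() * (log_p - log_q.detach())).sum(dim=-1)
    return kl.mean()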

UPRISE [Cheng et al., 2023a] also employs frozen LLMs to fine-tune the prompt retriever. Both the LLM and the retriever take prompt-input pairs as inputs and utilize the scores provided by the LLM to supervise the retriever's training, effectively treating the LLM as a dataset labeler. In addition, Atlas [Izacard et al., 2022] proposes four methods of supervised fine-tuning for embedding models:

>>Attention Distillation. This approach employs cross-attention scores generated by the LLM during output to distill the model’s knowledge.

>>EMDR2. By using the Expectation-Maximization algorithm, this method trains the model with retrieved documents as latent variables (see the objective sketched after this list).

>>Perplexity Distillation. This method directly trains the model using the perplexity of generated tokens as an indicator.

>>LOOP. This method presents a novel loss function based on the impact of document deletion on LLM prediction, offering an efficient training strategy to better adapt the model to specific tasks.
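
For reference, the EMDR2 objective from the list above treats the K retrieved documents z_1, ..., z_K as latent variables and maximizes the marginal likelihood; the formulation below uses our notation and may differ in detail from the paper's exact loss:

L_EMDR2(θ, φ) = − log Σ_{k=1}^{K} P_θ(y | x, z_k) · P_φ(z_k | x)

where x is the input, y the target output, P_φ the retriever's distribution over documents, and P_θ the language model.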

These approaches aim to improve the synergy between the retriever and the LLM, leading to enhanced retrieval performance and more accurate responses to user inquiries.

Adapters

Fine-tuning models may present challenges, such as integrating functionality through an API or addressing constraints arising from limited local computational resources. Consequently, some approaches opt to incorporate an external adapter to aid in alignment.

PRCA trains the adapter through a context extraction phase and a reward-driven phase. The retriever's output is then optimized using a token-based autoregressive strategy [Yang et al., 2023b]. The token filtering approach employs cross-attention scores to efficiently filter tokens, selecting only the highest-scoring input tokens [Berchansky et al., 2023]. RECOMP introduces both extractive and generative compressors for summary generation. These compressors either select relevant sentences or synthesize document information, creating summaries tailored to multi-document queries [Xu et al., 2023a].

Furthermore, PKG introduces an innovative method for integrating knowledge into white-box models via directive fine-tuning [Luo et al., 2023]. In this approach, the retriever module is directly substituted to generate relevant documents according to a query. This method helps address the difficulties encountered during the fine-tuning process and enhances model performance.

5 Generation

A crucial component of RAG is its generator, which is responsible for converting retrieved information into coherent and fluent text. Unlike traditional language models, RAG's generator sets itself apart by improving accuracy and relevance via the incorporation of retrieved data. In RAG, the generator's input encompasses not only typical contextual information but also relevant text segments obtained through the retriever. This comprehensive input enables the generator to gain a deep understanding of the question's context, resulting in more informative and contextually relevant responses.

Furthermore, the generator is guided by the retrieved text to ensure coherence between the generated content and the obtained information. The diverse input data has led to targeted efforts during the generation phase, all aimed at refining the adaptation of the large model to the input data derived from queries and documents. In the following subsections, we will introduce the generator by delving into aspects of post-retrieval processing and fine-tuning.

5.1 Post-retrieval with Frozen LLM

In the realm of untunable LLMs, many studies rely on well-established models like GPT-4 [OpenAI, 2023] to harness their comprehensive internal knowledge for systematically synthesizing retrieved information from various documents.

However, challenges persist with these large models, including limitations on context length and susceptibility to redundant information. To tackle these issues, certain research endeavors have turned their focus to post-retrieval processing.

Post-retrieval processing involves treating, filtering, or optimizing the relevant information retrieved by the retriever from a large document database. Its main goal is to enhance the quality of retrieval results, aligning them more closely with user needs or subsequent tasks. It can be viewed as a reprocessing of the documents obtained during the retrieval phase. Common operations in post-retrieval processing typically include information compression and result reranking.

Information Compression

The retriever excels at retrieving relevant information from a vast knowledge base, but managing the substantial amount of information within retrieved documents is a challenge. Ongoing research aims to extend the context length of large language models to tackle this issue. However, current large models still struggle with context limitations. Therefore, there are scenarios where condensing information becomes necessary. Information condensation is significant for reducing noise, addressing context length restrictions, and enhancing generation effects.

PRCA tackled this issue by training an information extractor [Yang et al., 2023b]. In the context extraction phase, given an input text S_input, it is capable of producing an output sequence C_extracted that represents the condensed context of the input document. The training process is designed to minimize the difference between C_extracted and the ground-truth context C_truth.
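
One plausible way to formalize this objective (our notation; the paper may define the discrepancy differently) is a token-level negative log-likelihood of the ground-truth context:

L_extract(θ) = − Σ_{t=1}^{|C_truth|} log P_θ(C_truth,t | S_input, C_truth,<t)

where C_truth,t denotes the t-th token of the ground-truth condensed context.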

Similarly, RECOMP adopts a comparable approach by training an information condenser using contrastive learning [Xu et al., 2023a]. Each training data point consists of one positive sample and five negative samples, and the encoder is trained with a contrastive loss throughout this process [Karpukhin et al., 2020].
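
With one positive c+ and five negatives c_1−, ..., c_5− per query x, this corresponds to a standard InfoNCE-style contrastive loss; the temperature τ below is an assumption on our part:

L = − log [ exp(sim(x, c+)/τ) / ( exp(sim(x, c+)/τ) + Σ_{j=1}^{5} exp(sim(x, c_j−)/τ) ) ]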

Another study has taken a different approach, aiming to reduce the number of documents in order to improve the accuracy of the model's answers. The study by [Ma et al., 2023b] proposes the "Filter-Reranker" paradigm, which combines the strengths of LLMs and Small Language Models (SLMs). In this paradigm, SLMs serve as filters, while LLMs function as reordering agents. The research shows that instructing LLMs to rearrange challenging samples identified by SLMs leads to significant improvements in various Information Extraction (IE) tasks.

Reranking

The re-ranking model is pivotal in optimizing the document set retrieved from the retriever. Language models often face performance declines when additional context is introduced, and re-ranking effectively addresses this issue. The core concept involves rearranging document records to prioritize the most relevant items at the top, thereby limiting the total number of documents. This not only resolves the challenge of context window expansion during retrieval but also enhances retrieval efficiency and responsiveness.
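
As one concrete illustration (a sketch, not the survey's prescribed tooling), a cross-encoder can jointly score each (query, passage) pair and keep only the top-k passages; the model choice here is an assumption.

from sentence_transformers import CrossEncoder

reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")

def rerank(query, passages, top_k=5):
    # Score each (query, passage) pair jointly, then keep the best k.
    scores = reranker.predict([(query, p) for p in passages])
    ranked = sorted(zip(passages, scores), key=lambda x: x[1], reverse=True)
    return [p for p, _ in ranked[:top_k]]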

The re-ranking model assumes a dual role throughout the information retrieval process, functioning as both an optimizer and a refiner. It provides more effective and accurate input for subsequent language model processing [Zhuang et al., 2023].

Contextual compression is incorporated into the reranking process to offer more precise retrieval information. This method entails reducing the content of individual documents and filtering out entire documents, with the ultimate goal of presenting the most relevant information in the search results for a more focused and accurate display of pertinent content.

5.2 Fine-tuning LLM for RAG

Optimizing the generator within the RAG model is a critical aspect of its architecture. The generator's role is to take the retrieved information and produce relevant text, forming the final output of the model. The optimization of the generator aims to ensure that the generated text is both natural and effectively leverages the retrieved documents to better meet the user's query needs.

In standard LLM generation tasks, the input typically consists of a query. RAG stands out by incorporating into the input not only the query but also the various documents (structured/unstructured) retrieved by the retriever. This additional information can significantly influence the model's understanding, particularly for smaller models. In such cases, fine-tuning the model to adapt to the input of both the query and the retrieved documents becomes crucial. Before the input is presented to the fine-tuned model, post-retrieval processing usually occurs for the documents retrieved by the retriever. It is essential to note that the fine-tuning method for the generator in RAG aligns with the general fine-tuning approach for LLMs. In the following, we will briefly describe some representative works involving data (formatted/unformatted) and optimization functions.

General Optimization Process

As part of the general optimization process, the training data typically consists of input-output pairs, aiming to train the model to produce the output y given the input x. In the work of Self-Mem [Cheng et al., 2023b], a traditional training process is employed: given the input x, relevant documents z are retrieved (selecting the Top-1 document in the paper), and after integrating (x, z), the model generates the output y. The paper utilizes two common paradigms for fine-tuning, namely the Joint-Encoder and the Dual-Encoder [Arora et al., 2023, Wang et al., 2022b, Lewis et al., 2020, Xia et al., 2019, Cai et al., 2021, Cheng et al., 2022].

In the Joint-Encoder paradigm, a standard encoder-decoder model is used. Here, the encoder first encodes the input, and the decoder, through attention mechanisms, combines the encoded results to generate tokens in an autoregressive manner. In the Dual-Encoder paradigm, by contrast, the system sets up two independent encoders, which encode the input (query, context) and the document, respectively. The resulting outputs undergo bidirectional cross-attention processing by the decoder in sequence. Both architectures utilize the Transformer [Vaswani et al., 2017] as the foundational block and optimize with a negative log-likelihood loss.
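
In both paradigms, the training objective is the standard sequence-level negative log-likelihood (written here in our notation): given input x and retrieved document(s) z, the model is trained to produce the target y token by token:

L_NLL(θ) = − Σ_{t=1}^{|y|} log P_θ(y_t | y_<t, x, z)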

Utilizing Contrastive Learning

In the phase of preparing training data for language models, interaction pairs of input and output are usually created. This traditional method can lead to "exposure bias," where the model is trained only on individual, correct output examples, thus restricting its exposure to the range of possible outputs. This limitation can hinder the model's real-world performance by causing it to overfit to the particular examples in the training set, thereby reducing its ability to generalize across various contexts.

To mitigate exposure bias, SURGE [Kang et al., 2023] proposes the use of graph-text contrastive learning. This method includes a contrastive learning objective that prompts the model to produce a range of plausible and coherent responses, expanding beyond the instances encountered in the training data. This approach is crucial in reducing overfitting and strengthening the model's ability to generalize.

For retrieval tasks that engage with structured data, the SANTA framework [Li et al., 2023d] implements a tripartite training regimen to effectively encapsulate both structural and semantic nuances. The initial phase focuses on the retriever, where contrastive learning is harnessed to refine the query and document embeddings.

Subsequently, the generator’s preliminary training stage employs contrastive learning to align the structured data with its unstructured document descriptions. In a further stage of generator training, the model acknowledges the critical role of entity semantics in the representation learning of textual data for retrieval, as highlighted by [Sciavolino et al., 2021, Zhang et al., 2019]. This process commences with the identi-fication of entities within the structured data, followed by the application of masks over these entities within the generator’s input data, thus setting the stage for the model to anticipate and predict these masked elements.

The training regimen progresses with the model learning to reconstruct the masked entities by leveraging contextual information. This exercise cultivates the model's comprehension of the textual data's structural semantics and facilitates the alignment of pertinent entities within the structured data. The overarching optimization goal is to train the language model to accurately restore the obscured spans, thereby enriching its understanding of entity semantics [Ye et al., 2020].

6 Augmentation in RAG

This section is structured around three key aspects: the augmentation stage, sources of augmentation data, and the augmentation process. These facets elucidate the critical technologies pivotal to RAG's development. A taxonomy of RAG's core components is presented in Figure 4.

6.1 RAG in Augmentation Stages

RAG, a knowledge-intensive endeavor, incorporates a variety of technical methodologies across the pre-training, fine-tuning, and inference stages of language model training.

Pre-training Stage

During the pre-training stage, researchers have investigated methods to bolster PTMs for open-domain QA through retrieval-based strategies. The REALM model adopts a structured, interpretable method for knowledge embedding, framing pre-training and fine-tuning as a retrieve-then-predict workflow within the masked language model (MLM) framework [Arora et al., 2023].

RETRO [Borgeaud et al., 2022] leverages retrieval augmentation for large-scale pre-training from scratch, achieving a reduction in model parameters while surpassing standard GPT models in terms of perplexity. RETRO distinguishes itself with an additional encoder designed to process features of entities retrieved from an external knowledge base, building on the foundational structure of GPT models.

Atlas [Izacard et al., 2022] also incorporates a retrieval mechanism into the T5 architecture [Raffel et al., 2020] in both the pre-training and fine-tuning stages. It uses a pre-trained T5 to initialize the encoder-decoder language model and a pre-trained Contriever for the dense retriever, improving its efficiency for complex language modeling tasks.

Furthermore, COG [Lan et al., 2022] introduces a novel text generation methodology that emulates copying text fragments from pre-existing collections. Utilizing efficient vector search tools, COG computes and indexes contextually meaningful representations of text fragments, demonstrating superior performance in domains such as question answering and domain adaptation when compared to RETRO.

The advent of scaling laws has catalyzed the growth of model parameters, propelling autoregressive models into the mainstream. Researchers are expanding the RAG approach to pre-trained larger models, with RETRO++ exemplifying this trend by scaling up the model parameters while preserving or enhancing performance [Wang et al., 2023b].

Empirical evidence underscores marked improvements in text generation quality, factual accuracy, reduced toxicity, and downstream task proficiency, especially in knowledge-intensive applications like open-domain QA. These results imply that integrating retrieval mechanisms into the pre-training of autoregressive language models constitutes a promising avenue, marrying sophisticated retrieval techniques with expansive language models to yield more precise and efficient language generation.

The benefits of augmented pre-training include a robust foundational model that outperforms standard GPT models in perplexity, text generation quality, and task-specific performance, all while utilizing fewer parameters. This method is particularly adept at handling knowledge-intensive tasks and facilitates the development of domain-specific models through training on specialized corpora.

Nonetheless, this approach faces challenges such as the necessity for extensive pre-training datasets and resources, as well as diminished update frequencies as model sizes increase. Despite these hurdles, the approach offers significant advantages in model resilience. Once trained, retrieval-enhanced models can operate independently of external libraries, enhancing generation speed and operational efficiency. The potential gains identified render this methodology a compelling subject for ongoing investigation and innovation in artificial intelligence and machine learning.

Fine-tuning Stage

RAG and fine-tuning are powerful tools for enhancing LLMs, and combining the two can meet the needs of more specific scenarios. On one hand, fine-tuning allows for the retrieval of documents with a unique style, achieving better semantic expression and aligning the differences between queries and documents. This ensures that the output of the retriever is more aptly suited to the scenario at hand. On the other hand, fine-tuning can fulfill the generation need for stylized and targeted adjustments. Furthermore, fine-tuning can also be used to align the retriever and generator for improved model synergy.

The main goal of fine-tuning the retriever is to improve the quality of semantic representations, achieved by directly fine-tuning the embedding model on a corpus [Liu, 2023]. By aligning the retriever's capabilities with the preferences of the LLMs through feedback signals, both can be better coordinated [Yu et al., 2023b, Izacard et al., 2022, Yang et al., 2023b, Shi et al., 2023]. Fine-tuning the retriever for specific downstream tasks can lead to improved adaptability. The introduction of task-agnostic fine-tuning aims to enhance the retriever's versatility in multi-task scenarios [Cheng et al., 2023a].

Fine-tuning the generator can result in outputs that are more stylized and customized. On one hand, it allows for specialized adaptation to different input data formats, for example, fine-tuning LLMs to fit the structure of knowledge graphs [Kang et al., 2023], the structure of text pairs [Kang et al., 2023, Cheng et al., 2023b], and other specific structures [Li et al., 2023d]. On the other hand, by constructing directive datasets, one can require LLMs to generate content in specific formats. For instance, in adaptive or iterative retrieval scenarios, LLMs are fine-tuned to generate content that helps determine the timing of the next step of action [Jiang et al., 2023b, Asai et al., 2023].

By synergistically fine-tuning both the retriever and the generator, we can enhance the model's generalization capabilities and avoid the overfitting that may arise from training them separately. However, joint fine-tuning also leads to increased resource consumption. RA-DIT [Lin et al., 2023] presents a lightweight, dual-instruction tuning framework that can effectively add retrieval capabilities to any LLM. The retrieval-enhanced directive fine-tuning updates the LLM, guiding it to make more efficient use of the retrieved information and to disregard distracting content.

Despite its advantages, fine-tuning has limitations, including the need for specialized datasets for RAG fine-tuning and the requirement for significant computational resources. However, this stage allows for customizing models to specific needs and data formats, potentially reducing resource usage compared to the pre-training phase while still being able to fine-tune the model's output style.

In summary, the fine-tuning stage is essential for the adaptation of RAG models to specific tasks, enabling the refinement of both retrievers and generators. This stage enhances the model's versatility and adaptability to various tasks, despite the challenges presented by resource and dataset requirements. The strategic fine-tuning of RAG models is therefore a critical component in the development of efficient and effective retrieval-augmented systems.

Inference Stage

The inference stage in RAG models is crucial, as it involves extensive integration with LLMs. Traditional RAG approaches, also known as Naive RAG, involve incorporating retrieval content at this stage to guide the generation process.

To overcome the limitations of Naive RAG, advanced techniques introduce more contextually rich information during inference. The DSP framework [Khattab et al., 2022] utilizes a sophisticated exchange of natural language text between frozen LMs and retrieval models (RMs), enriching the context and thereby improving generation outcomes. The PKG [Luo et al., 2023] method equips LLMs with a knowledge-guided module that allows for the retrieval of pertinent information without modifying the LMs' parameters, enabling more complex task execution. CREA-ICL [Li et al., 2023b] employs synchronous retrieval of cross-lingual knowledge to enhance context, while RECITE [Sun et al., 2022] generates context by sampling paragraphs directly from LLMs.

Further refinement of the RAG process during inference is seen in approaches that cater to tasks necessitating multi-step reasoning. ITRG [Feng et al., 2023] iteratively retrieves information to identify the correct reasoning paths, thereby improving task adaptability. ITER-RETGEN [Shao et al., 2023] follows an iterative strategy, merging retrieval and generation in a cyclical process that alternates between "retrieval-enhanced generation" and "generation-enhanced retrieval". For non-knowledge-intensive (NKI) tasks, PGRA [Guo et al., 2023] proposes a two-stage framework, starting with a task-agnostic retriever followed by a prompt-guided reranker to select and prioritize evidence. In contrast, IRCOT [Trivedi et al., 2022] combines RAG with Chain of Thought (CoT) methodologies, alternating CoT-guided retrievals with retrieval-informed CoT processes, significantly boosting GPT-3's performance across various question-answering tasks.

In essence, these inference-stage enhancements provide lightweight, cost-effective alternatives that leverage the capabilities of pre-trained models without necessitating further training. The principal advantage is maintaining static LLM parameters while supplying contextually relevant information to meet specific task demands. Nevertheless, this approach is not without limitations, as it requires meticulous data processing and optimization, and is bound by the foundational model's intrinsic capabilities. To address diverse task requirements effectively, this method is often paired with procedural optimization techniques such as step-wise reasoning, iterative retrieval, and adaptive retrieval strategies.

6.2 Augmentation Source

The effectiveness of RAG models is heavily impacted by the selection of data sources for augmentation. Different levels and dimensions of knowledge require distinct processing techniques. They are categorized as unstructured data, structured data, and content generated by LLMs. The technology tree of representative RAG research with different augmentation aspects is depicted in Figure 5. The leaves, colored in three different shades, represent enhancements using various types of data: unstructured data, structured data, and content generated by LLMs. The diagram clearly shows that initially, augmentation was mainly achieved through unstructured data, such as pure text. This approach later expanded to include the use of structured data (e.g., knowledge graphs) for further improvement. More recently, there has been a growing trend in research that utilizes content generated by the LLMs themselves for retrieval and augmentation purposes.

Augmented with Unstructured Data

Unstructured text is gathered from corpora, such as prompt data for fine-tuning large models [Cheng et al., 2023a] and cross-lingual data [Li et al., 2023b]. Retrieval units vary from tokens (e.g., kNN-LM [Khandelwal et al., 2019]) to phrases (e.g., NPM, COG [Lee et al., 2020, Lan et al., 2022]) and document paragraphs, with finer granularities offering precision at the cost of increased retrieval complexity.

FLARE [Jiang et al., 2023b] introduces an active retrieval approach, triggered by the LM's generation of low-probability words. It creates a temporary sentence for document retrieval, then regenerates the sentence with the retrieved context to predict subsequent sentences. RETRO uses the previous chunk to retrieve the nearest neighbors at the chunk level; combined with the previous chunk's context, this guides the generation of the next chunk. To preserve causality, the generation of the next chunk C_i only utilizes the nearest neighbors of the previous chunk, N(C_{i-1}), and not N(C_i).

Augmented with Structured Data

Structured data, such as knowledge graphs (KGs), provide high-quality context and mitigate model hallucinations. RET-LLM [Modarressi et al., 2023] constructs a knowledge graph memory from past dialogues for future reference. SUGRE [Kang et al., 2023] employs Graph Neural Networks (GNNs) to encode relevant KG subgraphs, ensuring consistency between retrieved facts and generated text through multi-modal contrastive learning. KnowledGPT [Wang et al., 2023d] generates KB search queries and stores knowledge in a personalized base, enhancing the RAG model's knowledge richness and contextuality.

LLMs-Generated Content in RAG

Addressing the limitations of external auxiliary information in RAG, some research has focused on exploiting LLMs' internal knowledge. SKR [Wang et al., 2023e] classifies questions as known or unknown, applying retrieval enhancement selectively. GenRead [Yu et al., 2022] replaces the retriever with an LLM generator, finding that LLM-generated contexts often contain more accurate answers due to better alignment with the pre-training objectives of causal language modeling. Selfmem [Cheng et al., 2023b] iteratively creates an unbounded memory pool with a retrieval-enhanced generator, using a memory selector to choose outputs that serve as dual problems to the original question, thus self-enhancing the generative model.

These methodologies underscore the breadth of innovative data source utilization in RAG, striving to improve model performance and task effectiveness.

6.3 Augmentation Process

In the domain of RAG, the standard practice often involves a singular retrieval step followed by generation, which can lead to inefficiencies. A notable issue, termed the "lost in the middle" phenomenon, arises when a single retrieval yields redundant content that may dilute or contradict essential information, thereby degrading the generation quality [Liu et al., 2023a]. Furthermore, such singular retrieval is typically insufficient for complex problems demanding multi-step reasoning, as it provides a limited scope of information [Yoran et al., 2023].

As illustrated in Figure 5, to circumvent these challenges, contemporary research has proposed methods for refining the retrieval process: iterative retrieval, recursive retrieval, and adaptive retrieval. Iterative retrieval allows the model to engage in multiple retrieval cycles, enhancing the depth and relevance of the information obtained. Recursive retrieval is a process in which the results of one retrieval operation are used as the input for the subsequent retrieval. It helps to delve deeper into relevant information, particularly when dealing with complex or multi-step queries. Recursive retrieval is often used in scenarios where a gradual approach is needed to converge on a final answer, such as in academic research, legal case analysis, or certain types of data mining tasks. Adaptive retrieval, on the other hand, offers a dynamic adjustment mechanism, tailoring the retrieval process to the specific demands of varying tasks and contexts.

Iterative Retrieval

Iterative retrieval in RAG models is a process where documents are repeatedly collected based on the initial query and the text generated thus far, providing a more comprehensive knowledge base for LLMs [Borgeaud et al., 2022, Arora et al., 2023]. This approach has been shown to enhance the robustness of subsequent answer generation by offering additional contextual references through multiple retrieval iterations. However, it may suffer from semantic discontinuity and the accumulation of irrelevant information, as it typically relies on a sequence of n tokens to demarcate the boundaries between generated text and retrieved documents.

To address specific data scenarios, recursive retrieval and multi-hop retrieval techniques are utilized. Recursive retrieval involves a structured index to process and retrieve data in a hierarchical manner, which may include summarizing sections of a document or a lengthy PDF before performing a retrieval based on this summary. Subsequently, a secondary retrieval within the document refines the search, embodying the recursive nature of the process. In contrast, multi-hop retrieval is designed to delve deeper into graph-structured data sources, extracting interconnected information [Li et al., 2023c].

Additionally, some methodologies integrate the steps of retrieval and generation. ITER-RETGEN [Shao et al., 2023] employs a synergistic approach that leverages "retrieval-enhanced generation" alongside "generation-enhanced retrieval" for tasks that necessitate the reproduction of specific information. The model harnesses the content required to address the input task as a contextual basis for retrieving pertinent knowledge, which in turn facilitates the generation of improved responses in subsequent iterations.

Recursive Retrieval

Recursive retrieval is often used in information retrieval and NLP to improve the depth and relevance of search results. The process involves iteratively refining search queries based on the results obtained from previous searches. Recursive retrieval aims to enhance the search experience by gradually converging on the most pertinent information through a feedback loop. IRCoT [Trivedi et al., 2022] uses chain-of-thought to guide the retrieval process and refines the CoT with the obtained retrieval results. ToC [Kim et al., 2023] creates a clarification tree that systematically optimizes the ambiguous parts of the query. It can be particularly useful in complex search scenarios where the user's needs are not entirely clear from the outset or where the information sought is highly specialized or nuanced. The recursive nature of the process allows for continuous learning and adaptation to the user's requirements, often resulting in improved satisfaction with the search outcomes.

Adaptive Retrieval

Adaptive retrieval methods, exemplified by FLARE and Self-RAG [Jiang et al., 2023b, Asai et al., 2023], refine the RAG framework by enabling LLMs to actively determine the optimal moments and content for retrieval, thus enhancing the efficiency and relevance of the information sourced.

These methods are part of a broader trend wherein LLMs employ active judgment in their operations, as seen in model agents like AutoGPT, Toolformer, and Graph-Toolformer [Yang et al., 2023c, Schick et al., 2023, Zhang, 2023]. Graph-Toolformer, for instance, divides its retrieval process into distinct steps in which LLMs proactively use retrievers, apply Self-Ask techniques, and employ few-shot prompts to initiate search queries. This proactive stance allows LLMs to decide when to search for necessary information, akin to how an agent utilizes tools.

WebGPT [Nakano et al., 2021] integrates a reinforcement learning framework to train the GPT-3 model to autonomously use a search engine during text generation. It navigates this process using special tokens that facilitate actions such as search engine queries, browsing results, and citing references, thereby expanding GPT-3's capabilities through the use of external search engines.

FLARE automates the timing of retrieval by monitoring the confidence of the generation process, as indicated by the probability of generated terms [Jiang et al., 2023b]. When the probability falls below a certain threshold, the retrieval system is activated to collect relevant information, thus optimizing the retrieval cycle.

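The trigger logic can be sketched roughly as below; the `generate_sentence` and `retrieve` callables, the threshold value, and the regeneration recipe are illustrative assumptions rather than the paper’s exact formulation:

```python
from typing import Callable, List, Tuple

def flare_generate(question: str,
                   generate_sentence: Callable[[str], Tuple[str, List[float]]],
                   retrieve: Callable[[str], List[str]],
                   prob_threshold: float = 0.6,
                   max_sentences: int = 10) -> str:
    """Active retrieval: regenerate any low-confidence sentence after
    fetching evidence, using the tentative sentence as the query."""
    context = question
    sentences: List[str] = []
    for _ in range(max_sentences):
        # The LM returns the next sentence plus its per-token probabilities.
        sentence, token_probs = generate_sentence(context)
        if not sentence:  # the LM signalled completion
            break
        if token_probs and min(token_probs) < prob_threshold:
            # Low confidence: retrieve with the tentative sentence,
            # then regenerate it with the evidence prepended.
            evidence = "\n".join(retrieve(sentence))
            sentence, _ = generate_sentence(f"{evidence}\n\n{context}")
        sentences.append(sentence)
        context = f"{context} {sentence}"
    return " ".join(sentences)
```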

Self-RAG [Asai et al., 2023] introduces “reflection tokens” that allow the model to introspect its outputs. These tokens come in two varieties: “retrieve” and “critic”. The model autonomously decides when to activate retrieval, or alternatively, a predefined threshold may trigger the process. During retrieval, the generator conducts a fragment-level beam search across multiple paragraphs to derive the most coherent sequence. Critic scores are used to update the subdivision scores, with the flexibility to adjust these weights during inference, tailoring the model’s behavior. Self-RAG’s design obviates the need for additional classifiers or reliance on Natural Language Inference (NLI) models, thus streamlining the decision-making process for when to engage retrieval mechanisms and improving the model’s autonomous judgment capabilities in generating accurate responses.

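The critic-weighted selection step might look like the following simplified sketch; the aspect names, score ranges, and additive combination are assumptions made for illustration, not Self-RAG’s exact scoring rule:

```python
from typing import Dict, List

def rank_segments(candidates: List[Dict],
                  critic_weights: Dict[str, float]) -> List[Dict]:
    """Rank candidate continuations by LM log-probability plus a
    weighted sum of critic scores; the weights can be adjusted at
    inference time to steer behavior without retraining."""
    def score(c: Dict) -> float:
        critic = sum(critic_weights.get(aspect, 0.0) * value
                     for aspect, value in c["critic_scores"].items())
        return c["lm_logprob"] + critic

    return sorted(candidates, key=score, reverse=True)

# Example: weight faithfulness to the retrieved passage above relevance.
best = rank_segments(
    [{"text": "A", "lm_logprob": -1.2,
      "critic_scores": {"is_supported": 0.9, "is_relevant": 0.4}},
     {"text": "B", "lm_logprob": -0.8,
      "critic_scores": {"is_supported": 0.2, "is_relevant": 0.9}}],
    critic_weights={"is_supported": 2.0, "is_relevant": 1.0},
)[0]  # candidate "A" wins: its stronger support outweighs its lower fluency
```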

LLM optimization has received significant attention due to its increasing prevalence. Techniques such as prompt engineering, Fine-Tuning (FT), and RAG each have distinct characteristics, visually represented in Figure 6. While prompt engineering leverages a model’s inherent capabilities, optimizing LLMs often requires the application of both RAG and FT methods. The choice between RAG and FT should be based on the specific requirements of the scenario and the inherent properties of each approach. A detailed comparison of RAG and FT is presented in Table 1.

6.4 RAG vs Fine-Tuning

RAG is like giving a model a textbook for tailored information retrieval, perfect for specific queries. On the other hand, FT is like a student internalizing knowledge over time, better for replicating specific structures, styles, or formats. FT can improve model performance and efficiency by reinforcing base model knowledge, adjusting outputs, and teaching complex instructions. However, it is not as good for integrating new knowledge or rapidly iterating new use cases.

The two methods, RAG and FT, are not mutually exclusive and can be complementary, augmenting a model’s capabilities at different levels. In some cases, their combined use may yield optimal performance. The optimization process involving RAG and FT can necessitate multiple iterations to achieve satisfactory results.

7 RAG Evaluation

The rapid advancement and growing adoption of RAG in the field of Natural Language Processing (NLP) have propelled the evaluation of RAG models to the forefront of research in the LLM community. The primary objective of this evaluation is to comprehend and optimize the performance of RAG models across diverse application scenarios.

Historically, assessments of RAG models have centered on their execution in specific downstream tasks. These evaluations employ established metrics suited to the tasks at hand. For instance, question answering evaluations might rely on EM and F1 scores [Wang et al., 2023a, Shi et al., 2023, Feng et al., 2023, Ma et al., 2023a], whereas fact-checking tasks often hinge on accuracy as the primary metric [Lewis et al., 2020, Izacard et al., 2022, Shao et al., 2023]. Tools like RALLE, designed for the automatic evaluation of RAG applications, similarly base their assessments on these task-specific metrics [Hoshi et al., 2023]. Despite this, there is a notable paucity of research dedicated to evaluating the distinct characteristics of RAG models, with only a handful of related studies.

The following section shifts the focus from task-specific evaluation methods and metrics to provide a synthesis of the existing literature based on their unique attributes. This exploration covers the objectives of RAG evaluation, the aspects along which these models are assessed, and the benchmarks and tools available for such evaluations. The aim is to offer a comprehensive overview of RAG model evaluation, outlining the methodologies that specifically address the unique aspects of these advanced generative systems.

7.1 Evaluation Targets

The assessment of RAG models mainly revolves around two key components: the retrieval and generation modules. This division ensures a thorough evaluation of both the quality of context provided and the quality of content produced.

Retrieval Quality

Evaluating retrieval quality is crucial for determining the effectiveness of the context sourced by the retriever component. Standard metrics from the domains of search engines, recommendation systems, and information retrieval systems are employed to measure the performance of the RAG retrieval module. Metrics such as Hit Rate, Mean Reciprocal Rank (MRR), and Normalized Discounted Cumulative Gain (NDCG) are commonly utilized for this purpose [Liu, 2023, Nguyen, 2023].

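For concreteness, minimal implementations of these three metrics under binary relevance (a common simplification; graded-relevance variants of NDCG also exist) could look like this:

```python
import math
from typing import List, Set

def hit_rate(ranked: List[str], relevant: Set[str], k: int = 10) -> float:
    """1.0 if any relevant document appears in the top-k, else 0.0."""
    return float(any(doc in relevant for doc in ranked[:k]))

def mrr(ranked: List[str], relevant: Set[str]) -> float:
    """Reciprocal rank of the first relevant document (0.0 if none)."""
    for rank, doc in enumerate(ranked, start=1):
        if doc in relevant:
            return 1.0 / rank
    return 0.0

def ndcg(ranked: List[str], relevant: Set[str], k: int = 10) -> float:
    """Binary-relevance NDCG@k: DCG of the ranking over the ideal DCG."""
    dcg = sum(1.0 / math.log2(rank + 1)
              for rank, doc in enumerate(ranked[:k], start=1)
              if doc in relevant)
    ideal = sum(1.0 / math.log2(rank + 1)
                for rank in range(1, min(len(relevant), k) + 1))
    return dcg / ideal if ideal > 0 else 0.0
```

In a typical setup these are averaged over a query set, with `ranked` coming from the retriever under test and `relevant` from gold annotations.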

Generation Quality

The assessment of generation quality centers on the generator’s capacity to synthesize coherent and relevant answers from the retrieved context. This evaluation can be categorized based on the content’s objectives: unlabeled and labeled content. For unlabeled content, the evaluation encompasses the faithfulness, relevance, and non-harmfulness of the generated answers. In contrast, for labeled content, the focus is on the accuracy of the information produced by the model [Liu, 2023]. Additionally, both retrieval and generation quality assessments can be conducted through manual or automatic evaluation methods [Liu, 2023, Lan et al., 2022, Leng et al., 2023].

7.2 Evaluation Aspects

Contemporary evaluation practices for RAG models emphasize three primary quality scores and four essential abilities, which collectively inform the evaluation of the two principal targets of the RAG model: retrieval and generation.

Quality Scores

Quality scores include context relevance, answer faithfulness, and answer relevance. These scores assess the RAG model’s efficiency from various angles throughout the information retrieval and generation process [Es et al., 2023, Saad-Falcon et al., 2023, Jarvis and Allard, 2023].

Context Relevance evaluates the precision and specificity of the retrieved context, ensuring relevance and minimizing processing costs associated with extraneous content.

Answer Faithfulness ensures that the generated answers remain true to the retrieved context, maintaining consistency and avoiding contradictions.

Answer Relevance requires that the generated answers are directly pertinent to the posed questions, effectively addressing the core inquiry.

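Tools such as RAGAS operationalize these scores with LLM judges. The following is a framework-agnostic sketch of the answer-faithfulness score in that spirit; the prompt wording, the `judge` callable, and the expected output format are all illustrative assumptions:

```python
from typing import Callable, List

FAITHFULNESS_PROMPT = """\
Context:
{context}

Answer:
{answer}

List each claim made in the answer and label it SUPPORTED or
UNSUPPORTED using only the context. End with "score: <supported>/<total>"."""

def faithfulness_score(contexts: List[str], answer: str,
                       judge: Callable[[str], str]) -> float:
    """Fraction of answer claims the judge finds supported by the
    retrieved context (an illustrative LLM-as-judge recipe)."""
    verdict = judge(FAITHFULNESS_PROMPT.format(
        context="\n".join(contexts), answer=answer))
    # Assumes the judge complies and ends with e.g. "score: 3/4".
    supported, total = verdict.rsplit("score:", 1)[-1].strip().split("/")
    return int(supported) / int(total)
```

Context relevance and answer relevance can be scored with analogous judge prompts that compare the retrieved passages, or the answer, against the question instead.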

Required Abilities

RAG evaluation also encompasses four abilities indicative of its adaptability and efficiency: noise robustness, negative rejection, information integration, and counterfactual robustness [Chen et al., 2023b, Liu et al., 2023b]. These abilities are critical for the model’s performance under various challenges and complex scenarios, impacting the quality scores.

Noise Robustness appraises the model’s capability to manage noise documents that are question-related but lack substantive information.

Negative Rejection assesses the model’s discernment in refraining from responding when the retrieved documents do not contain the necessary knowledge to answer a question.

Information Integration evaluates the model’s proficiency in synthesizing information from multiple documents to address complex questions.

Counterfactual Robustness tests the model’s ability to recognize and disregard known inaccuracies within documents, even when instructed about potential misinformation.

Context relevance and noise robustness are important for evaluating the quality of retrieval, while answer faithfulness, answer relevance, negative rejection, information integration, and counterfactual robustness are important for evaluating the quality of generation.

The specific metrics for each evaluation aspect are summarized in Table 2. It is essential to recognize that these metrics, derived from related work, are traditional measures and do not yet represent a mature or standardized approach for quantifying RAG evaluation aspects. Custom metrics tailored to the nuances of RAG models, though not included here, have also been developed in some evaluation studies.

7.3 Evaluation Benchmarks and Tools

This section delineates the evaluation framework for RAG models, comprising benchmark tests and automated evaluation tools. These instruments furnish quantitative metrics that not only gauge RAG model performance but also enhance comprehension of the model’s capabilities across various evaluation aspects. Prominent benchmarks such as RGB and RECALL [Chen et al., 2023b, Liu et al., 2023b] focus on appraising the essential abilities of RAG models. Concurrently, state-of-the-art automated tools like RAGAS [Es et al., 2023], ARES [Saad-Falcon et al., 2023], and TruLens employ LLMs to adjudicate the quality scores. These tools and benchmarks collectively form a robust framework for the systematic evaluation of RAG models, as summarized in Table 3.

8 Future Prospects

This section explores three future prospects for RAG: future challenges, modality expansion, and the RAG ecosystem.

8.1 Future Challenges of RAG

Despite the considerable progress in RAG technology, several challenges persist that warrant in-depth research:

Context Length. RAG’s efficacy is limited by the context window size of Large Language Models (LLMs). Balancing the trade-off between a window that is too short, risking insufficient information, and one that is too long, risking information dilution, is crucial. With ongoing efforts to expand LLM context windows to virtually unlimited sizes, the adaptation of RAG to these changes presents a significant research question [Xu et al., 2023c, Packer et al., 2023, Xiao et al., 2023].

Robustness. The presence of noise or contradictory information during retrieval can detrimentally affect RAG’s output quality. This situation is figuratively referred to as “misinformation can be worse than no information at all”. Improving RAG’s resistance to such adversarial or counterfactual inputs is gaining research momentum and has become a key performance metric [Yu et al., 2023a, Glass et al., 2021, Baek et al., 2023].

Hybrid Approaches (RAG+FT). Combining RAG with fine-tuning is emerging as a leading strategy. Determining the optimal way to integrate RAG and fine-tuning (sequential, alternating, or end-to-end joint training) and how to harness both parameterized and non-parameterized advantages are areas ripe for exploration [Lin et al., 2023].

Expanding LLM Roles. Beyond generating final answers, LLMs are leveraged for retrieval and evaluation within RAG frameworks. Identifying ways to further unlock LLMs’ potential in RAG systems is a growing research direction.

Scaling Laws. While scaling laws [Kaplan et al., 2020] are established for LLMs, their applicability to RAG remains uncertain. Initial studies [Wang et al., 2023b] have begun to address this, yet the parameter count in RAG models still lags behind that of LLMs. The possibility of an Inverse Scaling Law, where smaller models outperform larger ones, is particularly intriguing and merits further investigation.

Production-Ready RAG. RAG’s practicality and alignment with engineering requirements have facilitated its adoption. However, enhancing retrieval efficiency, improving document recall in large knowledge bases, and ensuring data security (such as preventing inadvertent disclosure of document sources or metadata by LLMs) are critical engineering challenges that remain to be addressed [Alon et al., 2022].

8.2 Modality Extension of RAG

RAG has transcended its initial text-based question-answering confines, embracing a diverse array of modal data. This expansion has spawned innovative multimodal models that integrate RAG concepts across various domains:

Image. RA-CM3 [Yasunaga et al., 2022] stands as a pioneering multimodal model that both retrieves and generates text and images. BLIP-2 [Li et al., 2023a] leverages frozen image encoders alongside LLMs for efficient visual language pre-training, enabling zero-shot image-to-text conversions. The “Visualize Before You Write” method [Zhu et al., 2022] employs image generation to steer the LM’s text generation, showing promise in open-ended text generation tasks.

Audio and Video. The GSS method retrieves and stitches together audio clips to convert machine-translated data into speech-translated data [Zhao et al., 2022]. UEOP marks a significant advancement in end-to-end automatic speech recognition by incorporating external, offline strategies for voice-to-text conversion [Chan et al., 2023]. Additionally, KNN-based attention fusion leverages audio embeddings and semantically related text embeddings to refine ASR, thereby accelerating domain adaptation. Vid2Seq augments language models with specialized temporal markers, facilitating the prediction of event boundaries and textual descriptions within a unified output sequence [Yang et al., 2023a].

Code. RBPS [Nashid et al., 2023] excels in small-scale learning tasks by retrieving code examples that align with developers’ objectives through encoding and frequency analysis. This approach has demonstrated efficacy in tasks such as test assertion generation and program repair. For structured knowledge, the CoK method [Li et al., 2023c] first extracts facts pertinent to the input query from a knowledge graph, then integrates these facts as hints within the input, enhancing performance in knowledge graph question-answering tasks.

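A toy sketch of the CoK-style fact injection; the triple representation, entity matching, and prompt layout are simplifying assumptions of ours:

```python
from typing import List, Tuple

Triple = Tuple[str, str, str]  # (subject, relation, object)

def build_cok_prompt(question: str,
                     kg: List[Triple],
                     question_entities: List[str],
                     max_facts: int = 5) -> str:
    """Pull facts whose subject or object matches an entity mentioned
    in the question, then prepend them to the input as hints."""
    facts = [f"{s} {r} {o}." for s, r, o in kg
             if s in question_entities or o in question_entities]
    hints = "\n".join(facts[:max_facts]) or "(no relevant facts found)"
    return f"Facts:\n{hints}\n\nQuestion: {question}\nAnswer:"
```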

8.3 Ecosystem of RAG

Downstream Tasks and Evaluation

RAG has shown considerable promise in enriching language models with the capacity to handle intricate queries and produce detailed responses by leveraging extensive knowledge bases. Empirical evidence suggests that RAG excels in a variety of downstream tasks, including open-ended question answering and fact verification. The integration of RAG not only bolsters the precision and relevance of responses but also their diversity and depth.

The scalability and versatility of RAG across multiple domains warrant further investigation, particularly in specialized fields such as medicine, law, and education. In these areas, RAG could potentially reduce training costs and enhance performance compared to traditional fine-tuning approaches in professional domain knowledge question answering.

Concurrently, refining the evaluation framework for RAG is essential to maximize its efficacy and utility across different tasks. This entails the development of nuanced metrics and assessment tools that can gauge aspects such as contextual relevance, creativity of content, and non-maleficence.

Furthermore, improving the interpretability of RAG-driven models continues to be a key goal. Doing so would allow users to understand the reasoning behind the responses generated by the model, thereby promoting trust and transparency in the use of RAG applications.

Technical Stack

The development of the RAG ecosystem is greatly impacted by the progression of its technical stack. Key tools like LangChain and LLamaIndex have quickly gained popularity with the emergence of ChatGPT, providing extensive RAG-related APIs and becoming essential in the realm of LLMs.

Emerging technical stacks, while not as feature-rich as LangChain and LLamaIndex, distinguish themselves with specialized offerings. For instance, Flowise AI prioritizes a low-code approach, enabling users to deploy AI applications, including RAG, through a user-friendly drag-and-drop interface. Other technologies like HayStack, Meltano, and Cohere Coral are also gaining attention for their unique contributions to the field.

In addition to AI-focused providers, traditional software and cloud service providers are expanding their offerings to include RAG-centric services. Verba from Weaviate is designed for personal assistant applications, while Amazon’s Kendra provides an intelligent enterprise search service, allowing users to navigate through various content repositories using built-in connectors. During the evolution of the RAG technology landscape, there has been a clear divergence towards different specializations, such as: 1) Customization, tailoring RAG to meet specific requirements; 2) Simplification, making RAG easier to use and thereby reducing the initial learning curve; 3) Specialization, refining RAG to serve production environments more effectively.

The mutual growth of RAG models and their technical stack is evident; technological advancements consistently establish new standards for the existing infrastructure. In turn, enhancements to the technical stack drive the evolution of RAG capabilities. The RAG toolkit is converging into a foundational technical stack, laying the groundwork for advanced enterprise applications. However, the concept of a fully integrated, comprehensive platform remains on the horizon, pending further innovation and development.

9 Conclusion

The summary of this paper, as depicted in Figure 7, highlights RAG’s significant advancement in enhancing the capabilities of LLMs through the integration of parameterized knowledge from language models with extensive non-parameterized data from external knowledge bases. Our survey illustrates the evolution of RAG technologies and their impact on knowledge-intensive tasks. Our analysis delineates three developmental paradigms within the RAG framework: Naive, Advanced, and Modular RAG, each marking a progressive enhancement over its predecessors. The Advanced RAG paradigm extends beyond the Naive approach by incorporating sophisticated architectural elements, including query rewriting, chunk reranking, and prompt summarization. These innovations have led to a more nuanced and modular architecture that enhances both the performance and the interpretability of LLMs. RAG’s technical integration with other AI methodologies, such as fine-tuning and reinforcement learning, has further expanded its capabilities. In content retrieval, a hybrid methodology that leverages both structured and unstructured data sources is emerging as a trend, providing a more enriched retrieval process. Cutting-edge research within the RAG framework is exploring novel concepts such as self-retrieval from LLMs and the dynamic timing of information retrieval.

Despite the strides made in RAG technology, research opportunities abound in improving its robustness and its ability to manage extended contexts. RAG’s application scope is also widening into multimodal domains, adapting its principles to interpret and process diverse data forms such as images, videos, and code. This expansion underscores RAG’s significant practical implications for AI deployment, attracting interest from both academic and industrial sectors. The growing ecosystem of RAG is underscored by an increase in RAG-centric AI applications and the ongoing development of supportive tools. However, as RAG’s application landscape expands, there is an imperative need to refine evaluation methodologies to keep pace with its evolution. Ensuring that performance assessments remain accurate and representative is crucial for capturing the full extent of RAG’s contributions to the AI research and development community.
