How LLMs can reason over your data
Retrieval Augmented Generation (RAG) is a method for supplementing large language models (LLMs) with relevant contextual information that they can use for reasoning. It allows the responses an LLM generates to be tailored to your data without modifying the underlying model.
This white paper describes what RAG is and how it can be used to provide a personalized experience when using an LLM with your data.
LLMs are trained on a specific collection of documents as it existed at a point in time. For publicly available LLMs, such as OpenAI’s GPT-4, this collection typically contains large amounts of publicly available data from the Web. However, an LLM is not aware of anything that occurred outside of the data it was trained on: a model trained in 2022 will not be aware of data created in 2023. Similarly, because publicly accessible LLMs like OpenAI’s GPT-4 (which ChatGPT uses) and Meta’s Llama 3 are trained on public data, they are not aware of the contents of private collections of data and cannot provide answers or insight into information that was absent from their training data. For these LLMs to be used with data in private document collections, they need some way of being exposed to this data.
One way to make LLMs aware of your data is through fine-tuning. Fine-tuning is not specific to LLMs; it refers to the process of taking a machine learning model that has been trained for one task and modifying it through additional training so that it performs well on a different but related task. For instance, a model trained to identify cats and dogs in pictures could be fine-tuned to also identify rabbits. The key idea behind fine-tuning is that the original model has foundational knowledge, acquired by training on a very large amount of data, and that this knowledge can be leveraged to perform specialized tasks the original model was not trained for. By building on its existing knowledge, the fine-tuned model should perform better on these downstream tasks than a new machine learning model trained from scratch.
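As a minimal sketch of this idea, the snippet below takes an image classifier pretrained on ImageNet, freezes its foundational layers, and swaps in a new classification head for three classes (cats, dogs, and rabbits, per the example above). The choice of resnet18 and the class count are illustrative assumptions:

# Transfer-learning sketch: reuse a pretrained image model for a new task.
# Assumption: resnet18 stands in for "a model trained to identify cats and
# dogs"; only the new head is trained, the foundational layers are frozen.
import torch
import torchvision

model = torchvision.models.resnet18(weights="IMAGENET1K_V1")
for param in model.parameters():
    param.requires_grad = False  # keep the pretrained knowledge fixed

# Replace the final layer with a head for the new, related task.
model.fc = torch.nn.Linear(model.fc.in_features, 3)  # cats, dogs, rabbits
# Training would now update only model.fc on the new labeled images.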
Fine-tuning for LLMs refers to taking a general-purpose LLM, such as Llama 3, and specializing it through additional training on specific data, prompts, and instructions, for example via reinforcement learning from human feedback (Bai et al., 2022).
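As a hedged sketch of what this looks like in practice, the example below continues training a small open model on a private text corpus with the Hugging Face transformers library. The base model ("distilgpt2"), the file name "private_docs.txt", and the hyperparameters are placeholder assumptions, not a prescription:

# Minimal fine-tuning sketch: adapt a small causal LM to private text.
# Assumptions: "distilgpt2" as a stand-in base model and "private_docs.txt"
# as a placeholder corpus; swap in your own model and data.
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)
from datasets import load_dataset

tokenizer = AutoTokenizer.from_pretrained("distilgpt2")
tokenizer.pad_token = tokenizer.eos_token  # GPT-2 has no pad token
model = AutoModelForCausalLM.from_pretrained("distilgpt2")

# Load the private corpus and tokenize it into training examples.
dataset = load_dataset("text", data_files={"train": "private_docs.txt"})
tokenized = dataset["train"].map(
    lambda batch: tokenizer(batch["text"], truncation=True, max_length=512),
    batched=True, remove_columns=["text"])

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="finetuned", num_train_epochs=1),
    train_dataset=tokenized,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False))
trainer.train()  # the resulting model now reflects the private data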
While fine-tuning allows for the creation of an LLM that is aware of a specific collection of data, there are several challenges to fine-tuning LLMs:

- Cost and complexity: fine-tuning a large model requires significant compute, curated training data, and machine learning expertise.
- Keeping up to date: a fine-tuned model reflects the data at the time of training, so it must be fine-tuned again whenever the underlying data changes.
- Catastrophic forgetting: additional training can degrade the general capabilities the model learned originally (Ramasesh et al., 2021).
- Limited availability: many proprietary models offer no fine-tuning to their users, or only under restricted conditions (1).
Retrieval Augmented Generation provides an alternative way to make LLMs aware of your data without the need to fine-tune.
(1) Azure OpenAI provides fine-tuning capabilities for a limited subset of OpenAI models under certain conditions.
LLMs are very powerful when it comes to reasoning over information that they are provided with. They can combine their internal knowledge with user-provided context to generate responses that are relevant given that context. Retrieval Augmented Generation (RAG) is a method for providing LLMs with relevant context to assist them in generating a response. It is an alternative to fine-tuning where, instead of fine-tuning an LLM ahead of time, you provide the LLM with relevant information at the moment it is needed. The LLM then leverages this information to provide domain-specific responses. This has several benefits over traditional fine-tuning:

- No additional training: the underlying model is used as-is, avoiding the cost and complexity of fine-tuning.
- Always current: when your data changes, you update the search index rather than retraining the model.
- Broad applicability: RAG works with any LLM, including proprietary models that do not expose fine-tuning.
- Relevance on demand: only the data relevant to the question at hand is supplied to the model, at query time.
The figure below shows an overview of the steps involved in Retrieval Augmented Generation and how it can be used to work with your data. RAG makes extensive use of semantic search and vector databases, and proceeds in four steps:

1. Indexing: your documents are split into chunks (e.g., pages or paragraphs), each chunk is converted into an embedding vector, and the vectors are stored in a vector database.
2. Retrieval: when a user asks a question, the question is embedded in the same way and the vector database is searched for the chunks most semantically similar to it.
3. Augmentation: the most relevant chunks are assembled, together with the user’s question, into a prompt for the LLM.
4. Generation: the LLM combines the retrieved context with its internal knowledge to generate a response.
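To make these steps concrete, below is a minimal end-to-end sketch of steps 1 through 3 in Python. It uses the sentence-transformers library for embeddings and a brute-force cosine-similarity search in place of a real vector database; the embedding model name, the sample documents, and the question are illustrative assumptions:

# Minimal RAG sketch: index, retrieve, and build an augmented prompt.
# Assumptions: "all-MiniLM-L6-v2" as an example embedding model; a real
# system would use a vector database instead of brute-force numpy search.
import numpy as np
from sentence_transformers import SentenceTransformer

documents = [  # placeholder private document chunks
    "The fund's Q3 net asset value rose 4.2% quarter over quarter.",
    "Redemption requests must be submitted 30 days before quarter end.",
    "The portfolio is overweight in industrials and utilities.",
]

embedder = SentenceTransformer("all-MiniLM-L6-v2")
index = embedder.encode(documents, normalize_embeddings=True)  # step 1: index

def retrieve(question: str, k: int = 2) -> list[str]:
    """Step 2: embed the question and return the k most similar chunks."""
    q = embedder.encode([question], normalize_embeddings=True)
    scores = index @ q[0]          # cosine similarity (vectors are normalized)
    top = np.argsort(-scores)[:k]
    return [documents[i] for i in top]

question = "How did the fund perform last quarter?"
context = "\n".join(retrieve(question))
prompt = ("Answer using only the context below.\n\n"
          f"Context:\n{context}\n\nQuestion: {question}")  # step 3: augment
print(prompt)  # step 4 would pass this prompt to the LLM of your choice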
As the description above suggests, the quality of RAG output is strongly influenced by the power of the semantic search system and the context it can provide to the LLM. It is important to carefully consider the granularity at which data is indexed in the vector store, e.g., at the document, page, paragraph, or table level. How the semantic search system ranks content also matters, since you want the most relevant content to be provided to the LLM as context.
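For example, indexing at the paragraph level might start with a splitter like the sketch below. The blank-line splitting rule and the minimum-length threshold are simplifying assumptions; production systems often need layout-aware chunking that respects tables, headers, and page boundaries:

# Paragraph-level chunking sketch for building a vector store index.
# Assumption: paragraphs are separated by blank lines; the min_chars
# threshold (an arbitrary choice) drops fragments too short to be useful.
import re

def paragraph_chunks(text: str, min_chars: int = 40) -> list[str]:
    """Split text on blank lines and keep paragraphs worth indexing."""
    parts = re.split(r"\n\s*\n", text)
    return [p.strip() for p in parts if len(p.strip()) >= min_chars]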
One might wonder whether it makes sense to provide an LLM with as much context as possible, so that it has the most information available for reasoning. While this may seem intuitive, there are several reasons why it may be preferable to limit the context provided to an LLM:

- Context windows are finite: every LLM can process only a limited number of tokens (2) per request, and the context must fit within that window alongside the question and the response.
- Cost and latency: most hosted LLMs charge per token, and longer prompts take longer to process.
- Response quality: LLMs can struggle to use information buried in the middle of long contexts, so irrelevant context can crowd out the content that matters (Liu et al., 2023).
(2) Tokens are the units that LLMs process. They may be words or subwords and are created using a tokenizer.
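One practical consequence is that a retrieval pipeline can enforce a token budget on the context it assembles. The sketch below counts tokens with OpenAI’s tiktoken tokenizer; the budget value and the greedy keep-in-rank-order policy are illustrative assumptions:

# Context-budget sketch: keep only the retrieved chunks that fit within
# a fixed token budget. The 1000-token budget is an arbitrary assumption.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")  # encoding used by GPT-4 models

def fit_context(ranked_chunks: list[str], budget: int = 1000) -> list[str]:
    """Greedily keep the highest-ranked chunks that fit in the budget."""
    kept, used = [], 0
    for chunk in ranked_chunks:  # assumed sorted from most to least relevant
        cost = len(enc.encode(chunk))
        if used + cost > budget:
            break
        kept.append(chunk)
        used += cost
    return kept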
Retrieval Augmented Generation provides a means for personalizing the performance of LLMs based on custom data that they were not trained with. It can be used to help LLMs generate better responses at a lower cost. At Alkymi, we use Retrieval Augmented Generation to power our generative AI products, such as the Answer Tool and Document Chat, available through Alkymi Alpha. RAG allows our customers to enjoy a personalized LLM experience based on their relevant data without the overhead and expense of traditional fine-tuning.
Kyle Williams
Kyle Williams is a Data Scientist Manager at Alkymi. He has over 10 years of experience using data science, machine learning, and NLP to build products used by millions of users in diverse domains, such as finance, healthcare, academia, and productivity. He received his Ph.D. from The Pennsylvania State University and has published over 50 peer-reviewed papers in NLP and information retrieval.
Bai, Y., Jones, A., Ndousse, K., Askell, A., Chen, A., DasSarma, N., ... & Kaplan, J. (2022). Training a helpful and harmless assistant with reinforcement learning from human feedback. arXiv preprint arXiv:2204.05862.
Liu, N. F., Lin, K., Hewitt, J., Paranjape, A., Bevilacqua, M., Petroni, F., & Liang, P. (2023). Lost in the middle: How language models use long contexts. arXiv preprint arXiv:2307.03172.
Ramasesh, V. V., Lewkowycz, A., & Dyer, E. (2021, October). Effect of scale on catastrophic forgetting in neural networks. In International Conference on Learning Representations.
Interested in learning how Alkymi can help you go from unstructured data to instantly actionable insights? Schedule a personalized product demo with our team today!