Retrieval-Augmented Generation (RAG) significantly boosts language models by integrating external information retrieval. Standard RAG, while improving response relevance, often falters in complex retrieval situations. This article examines the shortcomings of basic RAG and presents advanced methods to improve accuracy and efficiency.
Consider a simple scenario: retrieving relevant information from several documents. Our dataset includes:
<code>main_document_text = """
Morning Routine (5:30 AM - 9:00 AM)
✅ Wake Up Early - Aim for 6-8 hours of sleep to feel well-rested.
✅ Hydrate First - Drink a glass of water to rehydrate your body.
✅ Morning Stretch or Light Exercise - Do 5-10 minutes of stretching or a short workout to activate your body.
✅ Mindfulness or Meditation - Spend 5-10 minutes practicing mindfulness or deep breathing.
✅ Healthy Breakfast - Eat a balanced meal with protein, healthy fats, and fiber.
✅ Plan Your Day - Set goals, review your schedule, and prioritize tasks.
...
"""</code>
A basic RAG system, when queried with something like "what should I do to stay healthy and productive?", may struggle to consistently retrieve the primary document when unrelated documents share similar wording.
To improve retrieval accuracy and simplify query processing, we introduce helper functions. These functions handle tasks such as querying the ChatGPT API, calculating document embeddings, and determining similarity scores. This creates a more efficient RAG pipeline.
Here are the helper functions:
<code># **Imports**
import os
import json
import openai
import numpy as np
from scipy.spatial.distance import cosine
from google.colab import userdata

# Set up the OpenAI API key and client (the helpers below rely on `client`)
os.environ["OPENAI_API_KEY"] = userdata.get('AiTeam')
client = openai.OpenAI()</code>
<code>def query_chatgpt(prompt, model="gpt-4o", response_format=openai.NOT_GIVEN):
    try:
        response = client.chat.completions.create(
            model=model,
            messages=[{"role": "user", "content": prompt}],
            temperature=0.0,  # Deterministic output; raise for more creativity
            response_format=response_format
        )
        return response.choices[0].message.content.strip()
    except Exception as e:
        return f"Error: {e}"</code>
<code>def get_embedding(text, model="text-embedding-3-large"):  # or "text-embedding-ada-002"
    """Fetches the embedding for a given text using OpenAI's API."""
    response = client.embeddings.create(
        input=[text],
        model=model
    )
    return response.data[0].embedding</code>
<code>def compute_similarity_metrics(embed1, embed2):
    """Computes the cosine similarity between two embeddings."""
    cosine_sim = 1 - cosine(embed1, embed2)  # scipy's cosine() returns a distance
    return cosine_sim</code>
<code>def fetch_similar_docs(query, docs, threshold=0.55, top=1):
    """Returns the top documents whose similarity to the query exceeds the threshold."""
    query_em = get_embedding(query)
    data = []
    for d in docs:
        # Compute the similarity between the document and the query
        similarity_results = compute_similarity_metrics(d["embedding"], query_em)
        if similarity_results >= threshold:
            data.append({"id": d["id"], "ref_doc": d.get("ref_doc", ""), "score": similarity_results})
    # Sort by score, highest first, and keep only the top results
    sorted_data = sorted(data, key=lambda x: x["score"], reverse=True)
    return sorted_data[:min(top, len(sorted_data))]</code>
We test the basic RAG using predefined queries to assess its ability to retrieve the most relevant document based on semantic similarity. This highlights its limitations.
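The test code below assumes a `docs` collection with precomputed embeddings. The article doesn't show its construction; a minimal sketch might look like this, where the second, distractor document is a hypothetical stand-in for the similar-but-unrelated documents mentioned earlier:

<code># A minimal sketch of building the document collection. "doc2" is a
# hypothetical distractor that shares surface vocabulary with the query.
raw_docs = [
    {"id": "doc1", "ref_doc": main_document_text},
    {"id": "doc2", "ref_doc": "Team productivity tips: keep meetings short, batch similar tasks, and review goals weekly."},
]

# Precompute one embedding per document
docs = [{**d, "embedding": get_embedding(d["ref_doc"])} for d in raw_docs]</code>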
<code>"""# **Testing Vanilla RAG**""" query = "what should I do to stay healthy and productive?" r = fetch_similar_docs(query, docs) print("query = ", query) print("documents = ", r) query = "what are the best practices to stay healthy and productive ?" r = fetch_similar_docs(query, docs) print("query = ", query) print("documents = ", r)</code>
To improve retrieval, we introduce functions that generate structured information to enrich both the document collection and incoming queries.
Three key enhancements are implemented:
First, FAQ generation: creating FAQs from the document expands the range of queries it can match. The FAQs are generated once and stored, enriching the search space without recurring costs.
<code>def generate_faq(text):
    prompt = f'''
    given the following text: """{text}"""
    Ask relevant simple atomic questions ONLY (don't answer them) to cover all subjects covered by the text.
    Return the result as a json list example [q1, q2, q3...]
    '''
    return query_chatgpt(prompt, response_format={"type": "json_object"})</code>
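The generated questions are then embedded and stored alongside the original document. A hedged sketch of that step follows; the exact JSON shape the model returns can vary, so the parsing here is an assumption:

<code># Enrich the collection with one entry per FAQ question.
# JSON mode returns an object; we assume the question list sits in its
# first (or only) value.
faq_json = json.loads(generate_faq(main_document_text))
questions = next(iter(faq_json.values())) if isinstance(faq_json, dict) else faq_json

for i, q in enumerate(questions):
    docs.append({
        "id": f"doc1_faq_{i}",
        "ref_doc": main_document_text,    # retrieval resolves back to the source document
        "embedding": get_embedding(q),    # but matching happens on the question's embedding
    })</code>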
Second, overview generation: a concise summary captures the document's core ideas in everyday terms, improving retrieval for broad queries. The overview's embedding is added to the document collection.
<code>def generate_overview(text):
    prompt = f'''
    given the following text: """{text}"""
    Generate an abstract for it that tells in maximum 3 lines what it is about and use high level terms that will capture the main points.
    Use terms and words that will be most likely used by the average person.
    '''
    return query_chatgpt(prompt)</code>
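As with the FAQs, the overview is embedded once and appended to the collection. A minimal sketch:

<code># Add the overview as another searchable entry for the same document.
overview = generate_overview(main_document_text)
docs.append({
    "id": "doc1_overview",
    "ref_doc": main_document_text,        # retrieval still resolves to the full document
    "embedding": get_embedding(overview),
})</code>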
Third, query decomposition: broad queries are broken down into smaller, more precise sub-queries. Each sub-query is compared against the enhanced document collection (original document, FAQs, and overview), and the results are merged for improved relevance.
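The article's decomposition code is not reproduced here; a plausible sketch, reusing the same JSON-mode pattern as `generate_faq` (the function name and prompt wording are assumptions):

<code># A hypothetical sketch of query decomposition, not the article's exact code.
def decompose_query(query):
    prompt = f'''
    given the following query: """{query}"""
    Break it down into smaller, atomic sub-queries that each cover one aspect of the original question.
    Return the result as a json list example [q1, q2, q3...]
    '''
    return query_chatgpt(prompt, response_format={"type": "json_object"})</code>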
Re-running the initial queries with these enhancements shows significant improvement. Query decomposition generates multiple sub-queries, leading to successful retrieval from both the FAQs and the original document.
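A hedged sketch of the enhanced test loop, assuming the `decompose_query` helper above and the same JSON-parsing caveat as before:

<code># Enhanced RAG test: decompose the query, retrieve per sub-query,
# and merge results, keeping the best score per document id.
query = "what should I do to stay healthy and productive?"
decomposition = json.loads(decompose_query(query))
sub_queries = next(iter(decomposition.values())) if isinstance(decomposition, dict) else decomposition

merged = {}
for sq in sub_queries:
    for hit in fetch_similar_docs(sq, docs):
        if hit["id"] not in merged or hit["score"] > merged[hit["id"]]["score"]:
            merged[hit["id"]] = hit

print("query = ", query)
print("documents = ", sorted(merged.values(), key=lambda x: x["score"], reverse=True))</code>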
For reference, the FAQ generation step produces output along these lines (illustrative only; actual model output will vary):
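<code>{
  "questions": [
    "What time should you wake up?",
    "How much sleep should you aim for?",
    "What should you drink first in the morning?",
    "How long should the morning stretch or light exercise last?",
    "How long should you practice mindfulness or meditation?",
    "What should a healthy breakfast include?",
    "What does planning your day involve?"
  ]
}</code>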
While preprocessing (generating FAQs, overviews, and embeddings) adds an upfront cost, it is paid once per document. It spares you the ongoing costs of a poorly optimized RAG system: frustrated users and higher query costs from retrieving irrelevant information. For high-volume systems, preprocessing is a worthwhile investment.
Combining document preprocessing (FAQs and overviews) with query decomposition creates a more intelligent RAG system that balances accuracy and cost-effectiveness. This enhances retrieval quality, reduces irrelevant results, and improves the user experience. Future research can explore further optimizations like dynamic thresholding and reinforcement learning for query refinement.