Retrieval-Augmented Generation (RAG) significantly boosts language models by integrating external information retrieval. Standard RAG, while improving response relevance, often falters in complex retrieval situations. This article examines the shortcomings of basic RAG and presents advanced methods to improve accuracy and efficiency.
Consider a simple scenario: retrieving relevant information from several documents. Our dataset includes:
<code>main_document_text = """
Morning Routine (5:30 AM - 9:00 AM)
✅ Wake Up Early - Aim for 6-8 hours of sleep to feel well-rested.
✅ Hydrate First - Drink a glass of water to rehydrate your body.
✅ Morning Stretch or Light Exercise - Do 5-10 minutes of stretching or a short workout to activate your body.
✅ Mindfulness or Meditation - Spend 5-10 minutes practicing mindfulness or deep breathing.
✅ Healthy Breakfast - Eat a balanced meal with protein, healthy fats, and fiber.
✅ Plan Your Day - Set goals, review your schedule, and prioritize tasks.
...
"""</code>
A basic RAG system, when queried with something like "what should I do to stay healthy and productive?", may struggle to consistently retrieve the primary document when unrelated documents share similar wording.
To improve retrieval accuracy and simplify query processing, we introduce helper functions. These functions handle tasks such as querying the ChatGPT API, calculating document embeddings, and determining similarity scores. This creates a more efficient RAG pipeline.
Here are the helper functions:
<code># **Imports**
import os
import json
import openai
import numpy as np
from scipy.spatial.distance import cosine
from google.colab import userdata

# Set up the OpenAI API key and client (the helpers below rely on `client`)
os.environ["OPENAI_API_KEY"] = userdata.get('AiTeam')
client = openai.OpenAI()</code>
<code>def query_chatgpt(prompt, model="gpt-4o", response_format=openai.NOT_GIVEN):
    try:
        response = client.chat.completions.create(
            model=model,
            messages=[{"role": "user", "content": prompt}],
            temperature=0.0,  # Deterministic output; raise for more creativity
            response_format=response_format
        )
        return response.choices[0].message.content.strip()
    except Exception as e:
        return f"Error: {e}"</code>
<code>def get_embedding(text, model="text-embedding-3-large"):  # or "text-embedding-ada-002"
    """Fetches the embedding for a given text using OpenAI's API."""
    response = client.embeddings.create(
        input=[text],
        model=model
    )
    return response.data[0].embedding</code>
<code>def compute_similarity_metrics(embed1, embed2):
    """Computes the cosine similarity between two embeddings."""
    cosine_sim = 1 - cosine(embed1, embed2)  # scipy's cosine() returns a distance
    return cosine_sim</code>
<code>def fetch_similar_docs(query, docs, threshold=0.55, top=1):
    """Returns the top documents whose similarity to the query exceeds the threshold."""
    query_em = get_embedding(query)
    data = []
    for d in docs:
        # Compute the similarity between the document and the query
        similarity_results = compute_similarity_metrics(d["embedding"], query_em)
        if similarity_results >= threshold:
            data.append({"id": d["id"], "ref_doc": d.get("ref_doc", ""), "score": similarity_results})
    # Sort by score, highest first, and keep only the top results
    sorted_data = sorted(data, key=lambda x: x["score"], reverse=True)
    return sorted_data[:min(top, len(sorted_data))]</code>
We test the basic RAG using predefined queries to assess its ability to retrieve the most relevant document based on semantic similarity. This highlights its limitations.
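The test code below assumes a `docs` collection with precomputed embeddings. The article doesn't show its construction; a minimal sketch might look like this, where the second, distractor document is a hypothetical stand-in for the similar-but-unrelated documents mentioned earlier:

<code># A minimal sketch of building the document collection. "doc2" is a
# hypothetical distractor that shares surface vocabulary with the query.
raw_docs = [
    {"id": "doc1", "ref_doc": main_document_text},
    {"id": "doc2", "ref_doc": "Team productivity tips: keep meetings short, batch similar tasks, and review goals weekly."},
]

# Precompute one embedding per document
docs = [{**d, "embedding": get_embedding(d["ref_doc"])} for d in raw_docs]</code>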
<code>"""# **Testing Vanilla RAG**""" query = "what should I do to stay healthy and productive?" r = fetch_similar_docs(query, docs) print("query = ", query) print("documents = ", r) query = "what are the best practices to stay healthy and productive ?" r = fetch_similar_docs(query, docs) print("query = ", query) print("documents = ", r)</code>
To improve retrieval, we introduce functions that generate structured information to enrich both the document collection and incoming queries.
Three key enhancements are implemented:
First, FAQ generation: creating FAQs from the document expands the range of queries it can match. The FAQs are generated once and stored, enriching the search space without recurring costs.
<code>def generate_faq(text):
    prompt = f'''
    given the following text: """{text}"""
    Ask relevant simple atomic questions ONLY (don't answer them) to cover all subjects covered by the text.
    Return the result as a json list example [q1, q2, q3...]
    '''
    return query_chatgpt(prompt, response_format={"type": "json_object"})</code>
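The generated questions are then embedded and stored alongside the original document. A hedged sketch of that step follows; the exact JSON shape the model returns can vary, so the parsing here is an assumption:

<code># Enrich the collection with one entry per FAQ question.
# JSON mode returns an object; we assume the question list sits in its
# first (or only) value.
faq_json = json.loads(generate_faq(main_document_text))
questions = next(iter(faq_json.values())) if isinstance(faq_json, dict) else faq_json

for i, q in enumerate(questions):
    docs.append({
        "id": f"doc1_faq_{i}",
        "ref_doc": main_document_text,    # retrieval resolves back to the source document
        "embedding": get_embedding(q),    # but matching happens on the question's embedding
    })</code>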
Second, overview generation: a concise summary captures the document's core ideas in everyday terms, improving retrieval for broad queries. The overview's embedding is added to the document collection.
<code>def generate_overview(text):
    prompt = f'''
    given the following text: """{text}"""
    Generate an abstract for it that tells in maximum 3 lines what it is about and use high level terms that will capture the main points.
    Use terms and words that will be most likely used by the average person.
    '''
    return query_chatgpt(prompt)</code>
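As with the FAQs, the overview is embedded once and appended to the collection. A minimal sketch:

<code># Add the overview as another searchable entry for the same document.
overview = generate_overview(main_document_text)
docs.append({
    "id": "doc1_overview",
    "ref_doc": main_document_text,        # retrieval still resolves to the full document
    "embedding": get_embedding(overview),
})</code>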
Third, query decomposition: broad queries are broken down into smaller, more precise sub-queries. Each sub-query is compared against the enhanced document collection (original document, FAQs, and overview), and the results are merged for improved relevance.
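The article's decomposition code is not reproduced here; a plausible sketch, reusing the same JSON-mode pattern as `generate_faq` (the function name and prompt wording are assumptions):

<code># A hypothetical sketch of query decomposition, not the article's exact code.
def decompose_query(query):
    prompt = f'''
    given the following query: """{query}"""
    Break it down into smaller, atomic sub-queries that each cover one aspect of the original question.
    Return the result as a json list example [q1, q2, q3...]
    '''
    return query_chatgpt(prompt, response_format={"type": "json_object"})</code>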
Re-running the initial queries with these enhancements shows significant improvement. Query decomposition generates multiple sub-queries, leading to successful retrieval from both the FAQs and the original document.
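A hedged sketch of the enhanced test loop, assuming the `decompose_query` helper above and the same JSON-parsing caveat as before:

<code># Enhanced RAG test: decompose the query, retrieve per sub-query,
# and merge results, keeping the best score per document id.
query = "what should I do to stay healthy and productive?"
decomposition = json.loads(decompose_query(query))
sub_queries = next(iter(decomposition.values())) if isinstance(decomposition, dict) else decomposition

merged = {}
for sq in sub_queries:
    for hit in fetch_similar_docs(sq, docs):
        if hit["id"] not in merged or hit["score"] > merged[hit["id"]]["score"]:
            merged[hit["id"]] = hit

print("query = ", query)
print("documents = ", sorted(merged.values(), key=lambda x: x["score"], reverse=True))</code>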
For reference, the FAQ generation step produces output along these lines (illustrative only; actual model output will vary):
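<code>{
  "questions": [
    "What time should you wake up?",
    "How much sleep should you aim for?",
    "What should you drink first in the morning?",
    "How long should the morning stretch or light exercise last?",
    "How long should you practice mindfulness or meditation?",
    "What should a healthy breakfast include?",
    "What does planning your day involve?"
  ]
}</code>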
While preprocessing (generating FAQs, overviews, and embeddings) adds an upfront cost, it is paid once per document. It spares you the ongoing costs of a poorly optimized RAG system: frustrated users and higher query costs from retrieving irrelevant information. For high-volume systems, preprocessing is a worthwhile investment.
Combining document preprocessing (FAQs and overviews) with query decomposition creates a more intelligent RAG system that balances accuracy and cost-effectiveness. This enhances retrieval quality, reduces irrelevant results, and improves the user experience. Future research can explore further optimizations like dynamic thresholding and reinforcement learning for query refinement.