Author: Tobi.A
When working with large repositories, keeping up with pull requests (PRs), especially those containing thousands of lines of code, can be a real challenge. Whether it's understanding the impact of a specific change or navigating through a massive update, PR reviews can quickly become overwhelming. To tackle this, I set out to build a project that would let me quickly and effectively understand the changes in these large PRs.
Using Retrieval-Augmented Generation (RAG) combined with Langtrace's observability tooling, I developed "Chat with Repo(PRs)", a tool designed to simplify the process of reviewing large PRs. I also logged and compared the performance of Llama 3.1 against GPT-4o. Through this project, I explored how these models handle code explanation and summarization, and which offers the best balance of speed and accuracy for this use case.
All of the code used in this blog can be found here.
Before diving into the details, here is an overview of the key tools used in this project:
LLM Serving: OpenAI, Groq, and Ollama
Embedding Model: SentenceTransformers
Vector Database: FAISS
LLM Observability: Langtrace
The Chat with Repo(PRs) system implements a straightforward RAG architecture for PR analysis. It begins by ingesting PR data through GitHub's API, chunking large files to stay within token limits. These chunks are vectorized with SentenceTransformers, creating dense embeddings that capture the semantics of the code. A FAISS index enables sublinear-time similarity search over these embeddings. Queries go through the same embedding process, enabling semantic matching against the code index. The retrieved chunks form a dynamic context for the chosen LLM (via OpenAI, Groq, or Ollama), which then performs contextual reasoning. This approach leverages both the efficiency of vector search and the generative capabilities of LLMs, allowing nuanced code understanding that adapts to varying PR complexity. Finally, the Langtrace integration provides fine-grained observability into the embedding and LLM operations, offering insight into performance bottlenecks and potential optimizations in the RAG pipeline. Let's dive into its key components.
The chunking process in this system is designed to break large pull requests down into manageable, context-rich pieces. The core of this process is implemented in the IngestionService class, specifically in the chunk_large_file and create_chunks_from_patch methods.
When a PR is ingested, each file's patch is processed individually. The chunk_large_file method is responsible for splitting large files:
def chunk_large_file(self, file_patch: str, chunk_size: int = config.CHUNK_SIZE) -> List[str]:
    lines = file_patch.split('\n')
    chunks = []
    current_chunk = []
    current_chunk_size = 0

    for line in lines:
        line_size = len(line)
        if current_chunk_size + line_size > chunk_size and current_chunk:
            chunks.append('\n'.join(current_chunk))
            current_chunk = []
            current_chunk_size = 0
        current_chunk.append(line)
        current_chunk_size += line_size

    if current_chunk:
        chunks.append('\n'.join(current_chunk))

    return chunks
This method splits a file according to the configured chunk size, ensuring that no chunk exceeds the limit. It is a line-based approach that tries to keep logical units of code together as much as possible within the size constraint.
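To make the behavior concrete, here is a small hypothetical usage sketch. The sample patch text, the 120-character chunk size, and the assumed import path and no-argument constructor for IngestionService are all illustrative and may not match the actual project setup:

from ingestion_service import IngestionService  # assumed import path, for illustration

# A tiny made-up unified-diff patch to feed through the chunker
sample_patch = (
    "@@ -1,4 +1,6 @@\n"
    "-def greet(name):\n"
    "+def greet(name: str) -> str:\n"
    "+    \"\"\"Return a friendly greeting.\"\"\"\n"
    "     return \"Hello, \" + name\n"
)

service = IngestionService()  # assumed no-arg constructor
chunks = service.chunk_large_file(sample_patch, chunk_size=120)

for i, chunk in enumerate(chunks, start=1):
    print(f"--- chunk {i} ({len(chunk)} chars) ---")
    print(chunk)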
Once a file has been split into chunks, the create_chunks_from_patch method processes each one. This method enriches every chunk with contextual information:
def create_chunks_from_patch(self, repo_info, pr_info, file_info, repo_explanation, pr_explanation):
    code_blocks = self.chunk_large_file(file_info['patch'])

    # File-level explanation shared by every chunk of this file
    # (generated here; the exact prompt wording is illustrative)
    file_explanation = self.generate_safe_explanation(
        f"Explain the changes in the file {file_info['filename']}"
    )

    chunks = []
    for i, block in enumerate(code_blocks):
        # Per-chunk explanation produced by the LLM service
        chunk_explanation = self.generate_safe_explanation(
            f"Explain this part of the code and its changes: {block}"
        )

        chunk = {
            "code": block,
            "explanations": {
                "repository": repo_explanation,
                "pull_request": pr_explanation,
                "file": file_explanation,
                "code": chunk_explanation
            },
            "metadata": {
                "repo": repo_info["name"],
                "pr_number": pr_info["number"],
                "file": file_info["filename"],
                "chunk_number": i + 1,
                "total_chunks": len(code_blocks),
                "timestamp": pr_info["updated_at"]
            }
        }
        chunks.append(chunk)

    return chunks
It uses the LLM service to generate an explanation for each code block.
It attaches metadata, including the repository name, PR number, file name, chunk number, and timestamp.
It includes broader context, such as the repository and pull request explanations.
This approach ensures that each chunk is not just a piece of code, but a rich, context-aware unit.
The EmbeddingService class handles embedding creation and similarity search:
1. Embedding Creation:
For each chunk, we create an embedding using SentenceTransformer:
text_to_embed = self.get_full_context(chunk)
embedding = self.model.encode([text_to_embed])[0]
The embedding combines the code content with the code explanation, file explanation, PR explanation, and repository explanation.
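As a rough illustration of what that combined text could look like, here is a plausible standalone sketch of get_full_context; the field labels and their ordering are assumptions rather than the project's exact implementation:

def get_full_context(chunk: dict) -> str:
    # Concatenate the layered explanations with the raw code so the embedding
    # captures both semantics and structure (labels and order are illustrative).
    explanations = chunk["explanations"]
    return "\n\n".join([
        f"Repository: {explanations['repository']}",
        f"Pull request: {explanations['pull_request']}",
        f"File: {explanations['file']}",
        f"Code explanation: {explanations['code']}",
        f"Code:\n{chunk['code']}",
    ])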
2. Indexing:
We use FAISS to index these embeddings:
self.index.add(np.array([embedding]))
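For context, below is a minimal sketch of how the embedding model and FAISS index might be set up and populated outside the class. The model name (all-MiniLM-L6-v2) and the flat L2 index are assumptions, not necessarily what the project configures:

import faiss
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")  # assumed model choice
index = faiss.IndexFlatL2(model.get_sentence_embedding_dimension())

# Embed each enriched chunk and add it to the index, keeping a parallel list
# so FAISS row ids can later be mapped back to the original chunks.
indexed_chunks = []
for chunk in chunks:  # the enriched chunks produced by the ingestion step
    embedding = model.encode([get_full_context(chunk)])
    index.add(np.array(embedding, dtype="float32"))
    indexed_chunks.append(chunk)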
3. Query Processing:
When a question is asked, we create an embedding for the query and perform a similarity search:
query_vector = self.model.encode([query])
D, I = self.index.search(query_vector, k)
4. Chunk Selection:
The system selects the top k chunks (default 3) with the highest similarity scores.
This captures both code structure and semantic meaning, allowing for relevant chunk retrieval even when queries don't exactly match code syntax. FAISS enables efficient similarity computations, making it quick to find relevant chunks in large repositories.
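Building on the indexing sketch above, chunk selection can be expressed roughly as follows; the helper name retrieve_chunks and the sample question are hypothetical:

def retrieve_chunks(query: str, k: int = 3) -> list:
    # Embed the question with the same model and return the k nearest chunks.
    query_vector = model.encode([query])
    distances, indices = index.search(np.array(query_vector, dtype="float32"), k)
    return [indexed_chunks[i] for i in indices[0] if i != -1]

top_chunks = retrieve_chunks("What changed in the ingestion service?")
# top_chunks now forms the context passed to the chosen LLM.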
To ensure comprehensive observability and performance monitoring, we've integrated Langtrace into our "Chat with Repo(PRs)" application. Langtrace provides real-time tracing, evaluations, and metrics for our LLM interactions, vector database operations, and overall application performance.
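Wiring Langtrace in is lightweight. The sketch below follows the pattern shown in the public langtrace-python-sdk documentation, where a single init call instruments supported LLM and vector database clients; the API key placeholder is illustrative:

from langtrace_python_sdk import langtrace

# One-time initialization, done before any instrumented clients are created.
langtrace.init(api_key="<YOUR_LANGTRACE_API_KEY>")

# From this point on, calls made through supported libraries (OpenAI, Groq,
# FAISS wrappers, and others) are traced automatically and show up in the
# Langtrace dashboard.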
To explore how open-source models compare to their closed-source counterparts in handling large PRs, I conducted a comparative analysis between Llama 3.1b (open-source) and GPT-4o (closed-source). The test case involved a significant update to Langtrace's repository, with over 2,300 additions, nearly 200 deletions, 250 commits, and changes across 47 files. My goal was to quickly understand these specific changes and assess how each model performs in code review tasks.
Methodology:
I posed a set of technical questions related to the pull request (PR).
Both models were provided with the same code snippets and contextual information. Their responses were evaluated based on:
Code Understanding:
Knowledge of Frameworks:
Architectural Insights:
Handling Uncertainty:
Technical Detail vs. Broader Context:
Below are examples of questions posed to both models, the expected output, and their respective answers:
While GPT-4o remains stronger in broader architectural insights, Llama 3.1b's rapid progress and versatility in code comprehension make it a powerful option for code review. Open-source models are catching up quickly, and as they continue to improve, they could play a significant role in democratizing AI-assisted software development. The ability to tailor and integrate these models into specific development workflows could soon make them indispensable tools for reviewing, debugging, and managing large codebases.
We'd love to hear your thoughts! Join our community on Discord or reach out at support@langtrace.ai to share your experiences, insights, and suggestions. Together, we can continue advancing observability in LLM development and beyond.
Happy tracing!
Useful Resources
Getting Started with Langtrace https://docs.langtrace.ai/introduction
Langtrace Twitter(X) https://x.com/langtrace_ai
Langtrace Linkedin https://www.linkedin.com/company/langtrace/about/
Langtrace Website https://langtrace.ai/
Langtrace Discord https://discord.langtrace.ai/
Langtrace Github https://github.com/Scale3-Labs/langtrace