Chat With Repos(PRs) Using Llama 3.1B

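Author: Tobi.A

Introduction:

When working with large repositories, keeping up with pull requests (PRs), especially those containing thousands of lines of code, can be a real challenge. Whether you are trying to understand the impact of a specific change or navigate a large update, PR reviews can quickly become overwhelming. To address this, I set out to build a project that would let me understand the changes in these large PRs quickly and efficiently.

Using retrieval-augmented generation (RAG) combined with Langtrace's observability tools, I developed "Chat with Repo(PRs)", a tool aimed at simplifying the review process for large PRs. I also documented and compared the performance of Llama 3.1B and GPT-4o. Through this project, I explored how these models handle code explanation and summarization, and which one offers the best balance of speed and accuracy for this use case.

All the code used in this blog can be found here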

Before diving into the details, here is an overview of the key tools used in this project:
LLM services:

OpenAI API
Groq API
Ollama (for local LLMs)

Embedding model:

SentenceTransformers (specifically "all-mpnet-base-v2")

Vector database:

FAISS (Facebook AI Similarity Search)

LLM observability:

Langtrace for end-to-end tracing and metrics

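How Chat with Repo works:

The Chat with Repo(PRs) system implements a simple RAG architecture for PR analysis. It first fetches PR data through GitHub's API and chunks large files to stay within token limits. The chunks are vectorized with SentenceTransformers, producing dense embeddings that capture the semantics of the code. A FAISS index then supports fast similarity search over these embeddings (sub-linear time when an approximate index type is used). Queries go through the same embedding process, which enables semantic matching against the code index. The retrieved chunks form a dynamic context for the selected LLM (via OpenAI, Groq, or Ollama), which then performs contextual reasoning. This approach combines the efficiency of vector search with the generative capabilities of LLMs, allowing nuanced code understanding that adapts to PRs of varying complexity. Finally, the Langtrace integration provides fine-grained observability into embedding and LLM operations, offering insight into performance bottlenecks and potential optimizations in the RAG pipeline. Let's dive into its key components.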
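
As a rough illustration of the step where retrieved chunks become the LLM's context, here is a minimal sketch. The build_prompt helper and the prompt wording are assumptions made for illustration, not code from the project; the chunk fields follow the chunk dictionary shown in the chunking section.

```python
from typing import Dict, List


def build_prompt(question: str, retrieved_chunks: List[Dict]) -> str:
    """Assemble an LLM prompt from a question and retrieved PR chunks.

    Hypothetical helper: the project's actual prompt format is not shown
    in the post, so the layout below is only an assumption.
    """
    sections = []
    for chunk in retrieved_chunks:
        meta = chunk["metadata"]
        # Label each chunk so the model can tell which file it came from.
        header = (
            f"File: {meta['file']} "
            f"(chunk {meta['chunk_number']}/{meta['total_chunks']})"
        )
        sections.append(f"{header}\n{chunk['code']}")

    context = "\n\n---\n\n".join(sections)
    return (
        "Answer the question using only the pull request context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )
```

The resulting string can be sent as the user message to whichever backend is selected (OpenAI, Groq, or Ollama).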

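Chunking process:

The chunking process in this system is designed to break large pull requests into manageable, context-rich pieces. The core of this process is implemented in the IngestionService class, specifically in the chunk_large_file and create_chunks_from_patch methods.
When a PR is ingested, each file's patch is processed individually. The chunk_large_file method is responsible for splitting large files: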

# Requires: from typing import List
def chunk_large_file(self, file_patch: str, chunk_size: int = config.CHUNK_SIZE) -> List[str]:
    # Split on line boundaries so that no line is cut in half.
    lines = file_patch.split('\n')
    chunks = []
    current_chunk = []
    current_chunk_size = 0

    for line in lines:
        line_size = len(line)
        # Close the current chunk once adding this line would exceed the limit.
        if current_chunk_size + line_size > chunk_size and current_chunk:
            chunks.append('\n'.join(current_chunk))
            current_chunk = []
            current_chunk_size = 0
        current_chunk.append(line)
        current_chunk_size += line_size

    # Flush whatever is left as the final chunk.
    if current_chunk:
        chunks.append('\n'.join(current_chunk))

    return chunks

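This method splits the file according to the configured chunk size. It is a line-based approach: lines are never cut in half, so logical units of code stay together as far as the size limit allows. A chunk exceeds the limit only when a single line is itself longer than the chunk size.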
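
To see the behavior on a small input, here is a standalone sketch of the same line-based logic. The function name chunk_patch and the sample patch are made up for illustration, and the chunk size is passed in directly instead of being read from config.

```python
from typing import List


def chunk_patch(patch: str, chunk_size: int) -> List[str]:
    """Standalone version of the line-based chunking logic (illustrative)."""
    chunks, current, size = [], [], 0
    for line in patch.split("\n"):
        # Start a new chunk when this line would push the total past the limit.
        if size + len(line) > chunk_size and current:
            chunks.append("\n".join(current))
            current, size = [], 0
        current.append(line)
        size += len(line)
    if current:
        chunks.append("\n".join(current))
    return chunks


sample_patch = "\n".join(
    ["+line one", "+line two", "-old line", "+a much longer added line here"]
)
for chunk in chunk_patch(sample_patch, chunk_size=20):
    print(repr(chunk))
```

With a limit of 20 characters, the first two lines share a chunk, the third gets its own, and the 30-character last line is kept whole even though it is over the limit. Newline characters are not counted toward the size.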
Once the file has been split, the create_chunks_from_patch method processes each block and enriches it with contextual information:

def create_chunks_from_patch(self, repo_info, pr_info, file_info, repo_explanation, pr_explanation):
    # Split this file's patch into size-limited blocks.
    code_blocks = self.chunk_large_file(file_info['patch'])
    chunks = []

    # Assumption: the original snippet uses file_explanation without defining it.
    # It is generated here with a hypothetical prompt, mirroring the chunk-level call.
    file_explanation = self.generate_safe_explanation(f"Explain the changes made to the file {file_info['filename']}")

    for i, block in enumerate(code_blocks):
        chunk_explanation = self.generate_safe_explanation(f"Explain this part of the code and its changes: {block}")

        chunk = {
            "code": block,
            "explanations": {
                "repository": repo_explanation,
                "pull_request": pr_explanation,
                "file": file_explanation,
                "code": chunk_explanation
            },
            "metadata": {
                "repo": repo_info["name"],
                "pr_number": pr_info["number"],
                "file": file_info["filename"],
                "chunk_number": i + 1,
                "total_chunks": len(code_blocks),
                "timestamp": pr_info["updated_at"]
            }
        }
        chunks.append(chunk)

    return chunks

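The method uses the LLM service to generate an explanation for each code block.
It attaches metadata, including the repository name, PR number, file name, chunk number, and timestamp.
It includes broader context, such as the repository and pull request explanations.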
This approach ensures that each chunk is not just a slice of code but a rich, context-aware unit. Each chunk includes:

The actual code changes
An explanation of those changes
File-level context
PR-level context
Repository-level context
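
One way to turn such a chunk into a single string for embedding is sketched below. The build_embedding_text helper and its labels are assumptions for illustration; the project's own helper for this step is not shown in the post.

```python
from typing import Dict


def build_embedding_text(chunk: Dict) -> str:
    """Flatten a chunk dictionary into one string (hypothetical helper).

    The ordering and the labels are illustrative choices: broad context
    first, then the specific explanation, then the code itself.
    """
    explanations = chunk["explanations"]
    parts = [
        f"Repository: {explanations['repository']}",
        f"Pull request: {explanations['pull_request']}",
        f"File: {explanations['file']}",
        f"Code explanation: {explanations['code']}",
        f"Code:\n{chunk['code']}",
    ]
    return "\n\n".join(parts)
```

Putting the natural-language explanations next to the code is what lets a plain-English question land near the right chunk in embedding space.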

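Embedding and similarity search:

The EmbeddingService class handles embedding creation and similarity search:
1. Embedding creation:
For each chunk, we create an embedding with SentenceTransformer: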

text_to_embed = self.get_full_context(chunk)
embedding = self.model.encode([text_to_embed])[0]

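The embedding combines the code content, code explanation, file explanation, PR explanation, and repository explanation.
2. Indexing:
We use FAISS to index these embeddings: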

self.index.add(np.array([embedding]))

3. Query processing:
When a question is asked, we create an embedding for the query and run a similarity search:

query_vector = self.model.encode([query])

D, I = self.index.search(query_vector, k)

4. Chunk Selection:
The system selects the top k chunks (default 3) with the highest similarity scores.
Because each embedding covers both the code and its natural-language explanations, retrieval captures code structure as well as semantic meaning, so relevant chunks can be found even when a query does not match the code syntax exactly. FAISS keeps the similarity computations efficient, making it quick to find relevant chunks in large repositories.
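
To make the retrieval step concrete without any dependencies, here is a brute-force top-k search in plain Python. This is a stand-in for illustration, not the FAISS index the project uses: it scores every stored vector by cosine similarity, which is an assumed metric, since the post does not say which FAISS index type or distance is configured.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(
    query: Sequence[float], vectors: List[Sequence[float]], k: int = 3
) -> List[Tuple[int, float]]:
    """Return (index, score) pairs for the k most similar stored vectors."""
    scored = [(i, cosine_similarity(query, v)) for i, v in enumerate(vectors)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy 2-D "embeddings"; real ones from all-mpnet-base-v2 have 768 dimensions.
stored = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [-1.0, 0.0]]
print(top_k([1.0, 0.1], stored, k=2))
```

The returned indices play the same role as the I array from index.search: they point back to the stored chunks that should be passed to the LLM.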

Langtrace Integration:

To ensure comprehensive observability and performance monitoring, we've integrated Langtrace into our "Chat with Repo(PRs)" application. Langtrace provides real-time tracing, evaluations, and metrics for our LLM interactions, vector database operations, and overall application performance.
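
A minimal sketch of wiring in Langtrace, assuming the langtrace-python-sdk package is installed. langtrace.init(api_key=...) is the SDK's entry point; the init_tracing wrapper and the choice to read the key from an environment variable named LANGTRACE_API_KEY are illustrative assumptions, not taken from the project.

```python
import os

try:
    # Provided by the langtrace-python-sdk package.
    from langtrace_python_sdk import langtrace
except ImportError:
    # Tracing stays optional when the SDK is not installed.
    langtrace = None


def init_tracing() -> bool:
    """Initialize Langtrace if possible; return True when tracing is enabled.

    Hypothetical wrapper: it skips initialization when the SDK or the
    API key is missing, so the app still runs without observability.
    """
    api_key = os.environ.get("LANGTRACE_API_KEY")
    if langtrace is None or not api_key:
        return False
    langtrace.init(api_key=api_key)
    return True
```

Calling init_tracing() once at startup, before the LLM client libraries are imported, gives the SDK the chance to instrument them.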

Model Performance Evaluation: Llama 3.1 70b Open-Source vs. GPT-4o Closed-Source LLMs in Large-Scale Code Review:

To explore how open-source models compare to their closed-source counterparts in handling large PRs, I conducted a comparative analysis between Llama 3.1b (open-source) and GPT-4o (closed-source). The test case involved a significant update to the Langtrace repository, with over 2,300 additions, nearly 200 deletions, 250 commits, and changes across 47 files. My goal was to quickly understand these specific changes and assess how each model performs in code review tasks.
Methodology:
I posed a set of technical questions related to the pull request (PR), covering:

Specific code change explanations
Broader architectural impacts
Potential performance issues
Compatibility concerns

Both models were provided with the same code snippets and contextual information. Their responses were evaluated based on:

Technical accuracy
Depth of understanding
Ability to infer broader system impacts

Key Findings:

Code Understanding:

Llama 3.1b performed well in understanding individual code changes, especially with workflow updates and React component changes.
GPT-4o had a slight edge in connecting changes to the overall system architecture, such as identifying the ripple effect of modifying API routes on Prisma schema updates.

Knowledge of Frameworks:

Both models demonstrated strong understanding of frameworks like React, Next.js, and Prisma.
Llama 3.1b's versatility is impressive, particularly in web development knowledge, showing that open-source models are closing the gap on specialized domain expertise.

Architectural Insights:

GPT-4o excelled in predicting the broader implications of local changes, such as how adjustments to token-counting methods could affect the entire application.
Llama 3.1b, while precise in explaining immediate code impacts, was less adept at extrapolating these changes to system-wide consequences.

Handling Uncertainty:

Both models appropriately acknowledged uncertainty when presented with incomplete data, which is crucial for reliable code review.
Llama 3.1b's ability to express uncertainty highlights the progress open-source models have made in mimicking sophisticated reasoning.

Technical Detail vs. Broader Context:

Llama 3.1b provided highly focused and technically accurate explanations for specific code changes.
GPT-4o offered broader system context, though sometimes at the expense of missing finer technical details.

Question Comparison:

Below are examples of questions posed to both models, the expected output, and their respective answers:

[Image: comparison of the questions, the expected output, and each model's answers]

Conclusion:

While GPT-4o remains stronger in broader architectural insights, Llama 3.1b's rapid progress and versatility in code comprehension make it a powerful option for code review. Open-source models are catching up quickly, and as they continue to improve, they could play a significant role in democratizing AI-assisted software development. The ability to tailor and integrate these models into specific development workflows could soon make them indispensable tools for reviewing, debugging, and managing large codebases.

We'd love to hear your thoughts! Join our community on Discord or reach out at support@langtrace.ai to share your experiences, insights, and suggestions. Together, we can continue advancing observability in LLM development and beyond.

Happy tracing!

Useful resources
Getting started with Langtrace https://docs.langtrace.ai/introduction
Langtrace Twitter (X) https://x.com/langtrace_ai
Langtrace LinkedIn https://www.linkedin.com/company/langtrace/about/
Langtrace website https://langtrace.ai/
Langtrace Discord https://discord.langtrace.ai/
Langtrace GitHub https://github.com/Scale3-Labs/langtrace
