首頁 後端開發 Python教學 建立一個簡單的圖查詢引擎

建立一個簡單的圖查詢引擎

Aug 21, 2024 pm 10:47 PM

在過去的 2 篇部落格中,我們了解如何安裝 neo4j 並將資料載入其中。在這篇部落格中,我們將了解如何建立一個簡單的圖形查詢引擎來回答我們的問題,但從 Neo4j 檢索資料。

Building A Simple Graph Query Engine

第 1 步:建立 CYPHER 查詢

  • 要建立密碼查詢,我們需要向 GPT 提供架構資訊、屬性資訊以及我們的問題。使用此元資料 GPT 將為我們提供查詢。

  • 我已經建立了提示,為每個使用者輸入回傳 3 個查詢

  1. 正規表示式 - 此查詢將具有正規表示式模式來符合 graphDB 中的資料
  2. 編輯相似度 - 此查詢將使用閾值分數大於 0.5 的編輯相似度來匹配並從圖形資料庫中獲取資料。
  3. 基於嵌入的匹配 - 我們已經將嵌入推送到我們的資料庫中,因此此查詢將使用使用者查詢的嵌入,並使用餘弦相似度的分數重新排序完整列表。也許這也可以改進以返回前 5 名。
class GraphQueryEngine:
    def __init__(self):
        self.client = OpenAI(api_key="")
        self.url = "bolt://localhost:7687"
        self.auth = ("neo4j", "neo4j@123")

    def get_response(self, user_input):
        """Used to get cypher queries from user input"""
        completion = self.client.beta.chat.completions.parse(
            model="gpt-4o-2024-08-06",
            messages=[
                {"role": "system",
                 "content": "You are an expert in generating Cypher queries for a Neo4j database. Your task is to understand the input and generate only Cypher read queries. Do not return anything other than the Cypher queries, as the returned result will be executed directly in the database."},
                {"role": "user",
                 "content": f"""
                 Schema Information:
                 NODES: Product_type - Contains the distinct types of products such as headphones/mobiles/laptops/washing machines, Product_details - Contains products within a product_type for example apple, samsung within mobiles, DELL within laptops 
                 NODE PROPERTIES: In node Product_type there are name(name of the product type - String), embedding(embedding of the name), and in node Product_details there are name(name of the product - string), price(price of the product - integer), description(description of the product), product_release_date(when product was release on - date), available_stock(stock left - integer), review_rating(product review - float) 
                 DIRECTION OF RELATIONSHIPS: Node Product_type is connected to node Product_details using relationship CONTAINS

                 Based on the schema, generate three read-only Cypher queries related to Product_type (e.g., chairs, headphones, fridge) or Product_details (e.g., name, description) or combination of both. Ensure that product category uses Product_type and product name/ price 

                 Query 1: Use regular expressions (avoid 'contains') - Exclude the 'embedding' property from the result.
                 Query 2: Use `apoc.text.levenshteinSimilarity > 0.5` - Exclude the 'embedding' property from the result.
                 Query 3: Use `gds.similarity.cosine()` to reorder nodes based on similarity scores. The query must include a `%s` placeholder for embedding input but exclude the 'embedding' property in the result.

                 Generate targeted queries using relationships only when necessary. The embedding property should only be used in the logic and must not appear in the query results.

                 Strictly return only the Cypher queries with no embeddings. The returned result will be executed directly in the database.

                 {user_input}
                 """},
            ],
        )

        response = completion.choices[0].message.content

        completion = self.client.beta.chat.completions.parse(
            model="gpt-4o-2024-08-06",
            messages=[
                {"role": "system",
                 "content": "You are an expert in parsing generating Cypher queries."},
                {"role": "user",
                 "content": f"""Use this input - {response} and parse and return only the cypher queries from the input, ensure that in the cypher query if it returns embeddings then remove the embeddings alone from the query"""},
            ],
            response_format=CypherQuery,
        )
        event = completion.choices[0].message.parsed
        cypher_queries = event.cypher_queries
        print("################################## CYPHER QUERIES ######################################")
        for query in cypher_queries:
            print(query)
        return cypher_queries

 

第 2 步 - 在第三個查詢中填入嵌入

  • 第三個查詢使用 gds.similarity.cosine() 因此我們將使用者查詢轉換為嵌入並將其填入第三個查詢中
    def populate_embedding_in_query(self, user_input, cypher_queries):
        """Used to add embeddings of the user input in the 3rd query"""
        model = "text-embedding-3-small"
        user_input = user_input.replace("\n", " ")
        embeddings = self.client.embeddings.create(input=[user_input], model=model).data[0].embedding
        cypher_queries[2] = cypher_queries[2] % embeddings
        return cypher_queries

 

第 3 步 - 查詢資料庫

  • 使用準備好的密碼查詢查詢資料庫
    def execute_read_query(self, query):
        """Execute the cypher query"""
        results = []

        with GraphDatabase.driver(self.url, auth=self.auth) as driver:
            with driver.session() as session:
                try:
                    result = session.run(query)
                    # Collect the result from the read query
                    records = [record.data() for record in result]
                    if records:
                        results.append(records)
                except Exception as error:
                    print(f"Error in executing query")

        return results

    def fetch_data(self, cypher_queries):
        """Return the fetched data from DB post formatting"""
        results = None
        for idx in range(len(cypher_queries)):
            try:
                results = self.execute_read_query(cypher_queries[idx])
                if results:
                    if idx == len(cypher_queries) - 1:
                        results = results[0][:10]
                    break
            except Exception:
                pass
        return results

 

第 4 步 - 增強世代

  • 使用取得的資料命中GPT,使用增強生成技術,借助增強資訊產生使用者查詢的回應
    def get_final_response(self, user_input, fetched_data):
        """Augumented generation using data fetched from DB"""
        completion = self.client.beta.chat.completions.parse(
            model="gpt-4o-2024-08-06",
            messages=[
                {"role": "system",
                 "content": "You are a chatbot for an ecommerce website, you help users to identify their desired products"},
                {"role": "user", "content": f"""User query - {user_input}
                Use the below metadata to answer my query
                {fetched_data}     
            """},
            ],
        )

        response = completion.choices[0].message.content
        return response

 

完整程式碼

from openai import OpenAI
from pydantic import BaseModel
from typing import List
from neo4j import GraphDatabase


class CypherQuery(BaseModel):
    cypher_queries: List[str]


class GraphQueryEngine:
    def __init__(self):
        self.client = OpenAI(api_key="")
        self.url = "bolt://localhost:7687"
        self.auth = ("neo4j", "neo4j@123")

    def populate_embedding_in_query(self, user_input, cypher_queries):
        """Used to add embeddings of the user input in the 3rd query"""
        model = "text-embedding-3-small"
        user_input = user_input.replace("\n", " ")
        embeddings = self.client.embeddings.create(input=[user_input], model=model).data[0].embedding
        cypher_queries[2] = cypher_queries[2] % embeddings
        return cypher_queries

    def execute_read_query(self, query):
        """Execute the cypher query"""
        results = []

        with GraphDatabase.driver(self.url, auth=self.auth) as driver:
            with driver.session() as session:
                try:
                    result = session.run(query)
                    # Collect the result from the read query
                    records = [record.data() for record in result]
                    if records:
                        results.append(records)
                except Exception as error:
                    print(f"Error in executing query")

        return results

    def get_response(self, user_input):
        """Used to get cypher queries from user input"""
        completion = self.client.beta.chat.completions.parse(
            model="gpt-4o-2024-08-06",
            messages=[
                {"role": "system",
                 "content": "You are an expert in generating Cypher queries for a Neo4j database. Your task is to understand the input and generate only Cypher read queries. Do not return anything other than the Cypher queries, as the returned result will be executed directly in the database."},
                {"role": "user",
                 "content": f"""
                 Schema Information:
                 NODES: Product_type - Contains the distinct types of products such as headphones/mobiles/laptops/washing machines, Product_details - Contains products within a product_type for example apple, samsung within mobiles, DELL within laptops 
                 NODE PROPERTIES: In node Product_type there are name(name of the product type - String), embedding(embedding of the name), and in node Product_details there are name(name of the product - string), price(price of the product - integer), description(description of the product), product_release_date(when product was release on - date), available_stock(stock left - integer), review_rating(product review - float) 
                 DIRECTION OF RELATIONSHIPS: Node Product_type is connected to node Product_details using relationship CONTAINS

                 Based on the schema, generate three read-only Cypher queries related to Product_type (e.g., chairs, headphones, fridge) or Product_details (e.g., name, description) or combination of both. Ensure that product category uses Product_type and product name/ price 

                 Query 1: Use regular expressions (avoid 'contains') - Exclude the 'embedding' property from the result.
                 Query 2: Use `apoc.text.levenshteinSimilarity > 0.5` - Exclude the 'embedding' property from the result.
                 Query 3: Use `gds.similarity.cosine()` to reorder nodes based on similarity scores. The query must include a `%s` placeholder for embedding input but exclude the 'embedding' property in the result.

                 Generate targeted queries using relationships only when necessary. The embedding property should only be used in the logic and must not appear in the query results.

                 Strictly return only the Cypher queries with no embeddings. The returned result will be executed directly in the database.

                 {user_input}
                 """},
            ],
        )

        response = completion.choices[0].message.content

        completion = self.client.beta.chat.completions.parse(
            model="gpt-4o-2024-08-06",
            messages=[
                {"role": "system",
                 "content": "You are an expert in parsing generating Cypher queries."},
                {"role": "user",
                 "content": f"""Use this input - {response} and parse and return only the cypher queries from the input, ensure that in the cypher query if it returns embeddings then remove the embeddings alone from the query"""},
            ],
            response_format=CypherQuery,
        )
        event = completion.choices[0].message.parsed
        cypher_queries = event.cypher_queries
        print("################################## CYPHER QUERIES ######################################")
        for query in cypher_queries:
            print(query)
        return cypher_queries

    def get_final_response(self, user_input, fetched_data):
        """Augumented generation using data fetched from DB"""
        completion = self.client.beta.chat.completions.parse(
            model="gpt-4o-2024-08-06",
            messages=[
                {"role": "system",
                 "content": "You are a chatbot for an ecommerce website, you help users to identify their desired products"},
                {"role": "user", "content": f"""User query - {user_input}
                Use the below metadata to answer my query
                {fetched_data}     
            """},
            ],
        )

        response = completion.choices[0].message.content
        return response

    def fetch_data(self, cypher_queries):
        """Return the fetched data from DB post formatting"""
        results = None
        for idx in range(len(cypher_queries)):
            try:
                results = self.execute_read_query(cypher_queries[idx])
                if results:
                    if idx == len(cypher_queries) - 1:
                        results = results[0][:10]
                    break
            except Exception:
                pass
        return results

 

讓我們嘗試一下

user_input = input("Enter your question : ")
query_engine = GraphQueryEngine()
cypher_queries = query_engine.get_response(user_input)
cypher_queries = query_engine.populate_embedding_in_query(user_input, cypher_queries)
fetched_data = query_engine.fetch_data(cypher_queries)
response = query_engine.get_final_response(user_input, fetched_data)

 

輸出

Building A Simple Graph Query Engine

Building A Simple Graph Query Engine

在下一篇部落格中,我們將建立一個簡單的 FastAPI 應用程序,將此設定公開為 API。

 
希望有幫助...!!!

 
LinkedIn - https://www.linkedin.com/in/praveenr2998/
Github - https://github.com/praveenr2998/Creating-Lightweight-RAG-Systems-With-Graphs/blob/main/fastapi_app/query_engine.py

以上是建立一個簡單的圖查詢引擎的詳細內容。更多資訊請關注PHP中文網其他相關文章!

本網站聲明
本文內容由網友自願投稿,版權歸原作者所有。本站不承擔相應的法律責任。如發現涉嫌抄襲或侵權的內容,請聯絡admin@php.cn

熱AI工具

Undress AI Tool

Undress AI Tool

免費脫衣圖片

Undresser.AI Undress

Undresser.AI Undress

人工智慧驅動的應用程序,用於創建逼真的裸體照片

AI Clothes Remover

AI Clothes Remover

用於從照片中去除衣服的線上人工智慧工具。

Stock Market GPT

Stock Market GPT

人工智慧支援投資研究,做出更明智的決策

熱工具

記事本++7.3.1

記事本++7.3.1

好用且免費的程式碼編輯器

SublimeText3漢化版

SublimeText3漢化版

中文版,非常好用

禪工作室 13.0.1

禪工作室 13.0.1

強大的PHP整合開發環境

Dreamweaver CS6

Dreamweaver CS6

視覺化網頁開發工具

SublimeText3 Mac版

SublimeText3 Mac版

神級程式碼編輯軟體(SublimeText3)

如何使用Python自動化從Excel到Web表單的數據輸入? 如何使用Python自動化從Excel到Web表單的數據輸入? Aug 12, 2025 am 02:39 AM

使用Python自動化將Excel數據填入網頁表單的方法是:先用pandas讀取Excel數據,再用Selenium控制瀏覽器自動填寫並提交表單;具體步驟包括安裝pandas、openpyxl和Selenium庫,下載對應瀏覽器驅動,用pandas讀取data.xlsx文件中的Name、Email、Phone等字段,通過Selenium啟動瀏覽器打開目標網頁,定位表單元素並逐行填入數據,使用WebDriverWait處理動態加載內容,添加異常處理和延遲確保穩定性,最後提交表單並循環處理所有數據行

Python中的類方法是什麼 Python中的類方法是什麼 Aug 21, 2025 am 04:12 AM

ClassmethodsinPythonareboundtotheclassandnottoinstances,allowingthemtobecalledwithoutcreatinganobject.1.Theyaredefinedusingthe@classmethoddecoratorandtakeclsasthefirstparameter,referringtotheclassitself.2.Theycanaccessclassvariablesandarecommonlyused

HDF5 數據集名稱與組名稱衝突:解決方案與最佳實踐 HDF5 數據集名稱與組名稱衝突:解決方案與最佳實踐 Aug 23, 2025 pm 01:15 PM

本文針對使用 h5py 庫操作 HDF5 文件時,數據集名稱與組名稱衝突的問題,提供詳細的解決方案和最佳實踐。文章將深入分析衝突產生的原因,並提供代碼示例,展示如何有效地避免和解決此類問題,確保 HDF5 文件的正確讀寫。通過本文,讀者將能夠更好地理解 HDF5 文件結構,並編寫更健壯的 h5py 代碼。

如何處理不適合內存的Python中的大型數據集? 如何處理不適合內存的Python中的大型數據集? Aug 14, 2025 pm 01:00 PM

當Python中處理超出內存的大型數據集時,不能一次性加載到RAM中,而應採用分塊處理、磁盤存儲或流式處理等策略;可通過Pandas的chunksize參數分塊讀取CSV文件並逐塊處理,使用Dask實現類似Pandas語法的並行化和任務調度以支持大內存數據操作,編寫生成器函數逐行讀取文本文件減少內存佔用,利用Parquet列式存儲格式結合PyArrow高效讀取特定列或行組,使用NumPy的memmap對大型數值數組進行內存映射以按需訪問數據片段,或將數據存入SQLite或DuckDB等輕量級數據

python asyncio隊列示例 python asyncio隊列示例 Aug 21, 2025 am 02:13 AM

asyncio.Queue是用於異步任務間安全通信的隊列工具,1.生產者通過awaitqueue.put(item)添加數據,消費者用awaitqueue.get()獲取數據;2.每處理完一項需調用queue.task_done(),以便queue.join()等待所有任務完成;3.使用None作為結束信號通知消費者停止;4.多個消費者時,需發送多個結束信號或在取消任務前確保所有任務已處理完畢;5.隊列支持設置maxsize限制容量,put和get操作自動掛起不阻塞事件循環,程序最終通過canc

如何使用Python進行股票市場分析和預測? 如何使用Python進行股票市場分析和預測? Aug 11, 2025 pm 06:56 PM

Python可以用於股票市場分析與預測,答案是肯定的,通過使用yfinance等庫獲取數據,利用pandas進行數據清洗和特徵工程,結合matplotlib或seaborn進行可視化分析,再運用ARIMA、隨機森林、XGBoost或LSTM等模型構建預測系統,並通過回測評估性能,最終可藉助Flask或FastAPI部署應用,但需注意市場預測的不確定性、過擬合風險及交易成本影響,成功依賴於數據質量、模型設計和合理預期。

如何使用Python中的RE模塊使用正則表達式? 如何使用Python中的RE模塊使用正則表達式? Aug 22, 2025 am 07:07 AM

正則表達式在Python中通過re模塊實現,用於搜索、匹配和操作字符串。 1.使用re.search()在整個字符串中查找第一個匹配項,re.match()僅在字符串開頭匹配;2.用括號()捕獲匹配的子組,可命名以提高可讀性;3.re.findall()返回所有非重疊匹配的列表,re.finditer()返回匹配對象的迭代器;4.re.sub()替換匹配的文本,支持函數動態替換;5.常用模式包括\d、\w、\s等,可使用re.IGNORECASE、re.MULTILINE、re.DOTALL、re

如何將命令行的參數傳遞給Python中的腳本 如何將命令行的參數傳遞給Python中的腳本 Aug 20, 2025 pm 01:50 PM

Usesys.argvforsimpleargumentaccess,whereargumentsaremanuallyhandledandnoautomaticvalidationorhelpisprovided.2.Useargparseforrobustinterfaces,asitsupportsautomatichelp,typechecking,optionalarguments,anddefaultvalues.3.argparseisrecommendedforcomplexsc

See all articles