LangChain을 사용하여 IRIS SQL에 텍스트 보내기-파이썬 튜토리얼-php.cn

Text to IRIS SQL with LangChain

LangChain 프레임워크, IRIS 벡터 검색 및 LLM을 사용하여 사용자 프롬프트에서 IRIS 호환 SQL을 생성하는 방법에 대한 실험입니다.

이 글은 이 노트를 바탕으로 작성되었습니다. OpenExchange에서 이 애플리케이션을 사용하면 바로 사용할 수 있는 환경에서 실행할 수 있습니다.

설정

먼저 필요한 라이브러리를 설치해야 합니다.

으아아아

다음으로 필수 모듈을 가져오고 환경을 설정합니다.

으아아아

SQLiteCache를 사용하여 LLM 호출을 캐시합니다.

으아아아

IRIS 데이터베이스 연결 매개변수 설정:

으아아아

OpenAI API 키가 환경에 아직 설정되지 않은 경우 사용자에게 이를 입력하라는 메시지를 표시합니다.

으아아아

IRIS 데이터베이스에 대한 연결 문자열을 생성합니다.

으아아아

IRIS 데이터베이스 연결 설정:

으아아아

시스템 프롬프트에 대한 컨텍스트 정보를 담을 사전을 준비하세요.

으아아아

프롬프트 생성

사용자 입력을 IRIS 데이터베이스와 호환되는 SQL 쿼리로 변환하려면 언어 모델에 대한 효과적인 프롬프트를 만들어야 합니다. SQL 쿼리 생성에 대한 기본 지침을 제공하는 초기 프롬프트부터 시작합니다. 이 템플릿은 MSSQL에 대한 LangChain의 기본 프롬프트에서 파생되었으며 IRIS 데이터베이스에 맞게 사용자 정의되었습니다.

으아아아

이 기본 프롬프트는 IRIS 데이터베이스에 대한 특정 지침을 통해 SQL 전문가 역할을 하도록 언어 모델(LLM)을 구성합니다. 다음으로, 환각을 피하기 위해 데이터베이스 스키마에 대한 정보가 포함된 보조 프롬프트를 제공합니다.

으아아아

LLM 응답의 정확성을 높이기 위해 Few-Shot Prompting이라는 기술을 사용합니다. 여기에는 LLM에 몇 가지 예를 제시하는 것이 포함됩니다.

으아아아

몇 장의 예시를 위한 템플릿을 정의합니다.

으아아아

Few-Shot 템플릿을 사용하여 사용자 프롬프트를 구축합니다.

으아아아

마지막으로 최종 메시지를 만들기 위한 모든 프롬프트를 작성합니다.

으아아아

이 프롬프트에는 example_value, input, table_info 및 top_k 변수가 필요합니다.

프롬프트의 구성은 다음과 같습니다.

으아아아

프롬프트가 LLM으로 전송되는 방식을 시각화하기 위해 필수 변수에 대한 자리 표시자 값을 사용할 수 있습니다.

으아아아 으아아아

이제 필요한 변수를 제공하여 이 프롬프트를 LLM에 보낼 준비가 되었습니다. 준비가 되면 다음 단계로 넘어가세요.

테이블 정보 제공

정확한 SQL 쿼리를 생성하려면 데이터베이스 테이블에 대한 자세한 정보를 언어 모델(LLM)에 제공해야 합니다. 이 정보가 없으면 LLM은 그럴듯해 보이지만 환각으로 인해 잘못된 쿼리를 생성할 수 있습니다. 따라서 첫 번째 단계는 IRIS 데이터베이스에서 테이블 정의를 검색하는 함수를 만드는 것입니다.

테이블 정의를 검색하는 기능

다음 함수는 INFORMATION_SCHEMA를 쿼리하여 지정된 스키마에 대한 테이블 정의를 가져옵니다. 특정 테이블이 제공되면 해당 테이블에 대한 정의를 검색합니다. 그렇지 않으면 스키마의 모든 테이블에 대한 정의를 검색합니다.

으아아아

스키마에 대한 테이블 정의 검색

이 예에서는 여기에서 사용할 수 있는 항공 스키마를 사용합니다.

으아아아

이 함수는 항공 스키마의 모든 테이블에 대해 CREATE TABLE 문을 반환합니다.

으아아아

이러한 테이블 정의를 사용하여 LLM 프롬프트에 통합하는 다음 단계로 진행할 수 있습니다. 이를 통해 LLM은 SQL 쿼리를 생성할 때 데이터베이스 스키마에 대한 정확하고 포괄적인 정보를 확보할 수 있습니다.

가장 관련성이 높은 테이블 선택

데이터베이스, 특히 대규모 데이터베이스 작업 시 프롬프트의 모든 테이블에 대해 DDL(데이터 정의 언어)을 보내는 것은 비현실적일 수 있습니다. 이 접근 방식은 소규모 데이터베이스에 적합할 수 있지만 실제 데이터베이스에는 수백 또는 수천 개의 테이블이 포함되어 있어 모든 테이블을 처리하는 것이 비효율적입니다.

또한 언어 모델이 SQL 쿼리를 효과적으로 생성하기 위해 데이터베이스의 모든 테이블을 인식해야 할 가능성은 거의 없습니다. 이 문제를 해결하기 위해 의미론적 검색 기능을 활용하여 사용자 쿼리를 기반으로 가장 관련성이 높은 테이블만 선택할 수 있습니다.

접근하다

우리는 IRIS 벡터 검색과 함께 의미론적 검색을 사용하여 이를 달성합니다. 이 방법은 SQL 요소 식별자(예: 테이블, 필드, 키)에 의미 있는 이름이 있는 경우 가장 효과적입니다. 식별자가 임의 코드인 경우 대신 데이터 사전 사용을 고려해 보세요.

단계

테이블 정보 검색

먼저 테이블 정의를 pandas DataFrame으로 추출합니다.

# Retrieve table definitions into a pandas DataFrame table_def = get_table_definitions_array(cnx=cnx, schema='Aviation') table_df = pd.DataFrame(data=table_def, columns=["col_def"]) table_df["id"] = table_df.index + 1 table_df

로그인 후 복사

The DataFrame (table_df) will look something like this:

	col_def	id
0	CREATE TABLE Aviation.Aircraft (\n Event bigi...	1
1	CREATE TABLE Aviation.Crew (\n Aircraft varch...	2
2	CREATE TABLE Aviation.Event (\n ID bigint NOT...	3

Split Definitions into Documents

Next, split the table definitions into Langchain Documents. This step is crucial for handling large chunks of text and extracting text embeddings:

loader = DataFrameLoader(table_df, page_content_column="col_def") documents = loader.load() text_splitter = CharacterTextSplitter(chunk_size=400, chunk_overlap=20, separator="\n") tables_docs = text_splitter.split_documents(documents) tables_docs

로그인 후 복사

The resulting tables_docs list contains split documents with metadata, like so:

[Document(metadata={'id': 1}, page_content='CREATE TABLE Aviation.Aircraft (\n Event bigint NOT NULL,\n ID varchar NOT NULL,\n ...'), Document(metadata={'id': 2}, page_content='CREATE TABLE Aviation.Crew (\n Aircraft varchar NOT NULL,\n ID varchar NOT NULL,\n ...'), Document(metadata={'id': 3}, page_content='CREATE TABLE Aviation.Event (\n ID bigint NOT NULL DEFAULT $i(^Aviation.EventD),\n ...')]

로그인 후 복사

Extract Embeddings and Store in IRIS

Now, use the IRISVector class from langchain-iris to extract embedding vectors and store them:

tables_vector_store = IRISVector.from_documents( embedding=OpenAIEmbeddings(), documents=tables_docs, connection_string=iris_conn_str, collection_name="sql_tables", pre_delete_collection=True )

로그인 후 복사

Note: The pre_delete_collection flag is set to True for demonstration purposes to ensure a fresh collection in each test run. In a production environment, this flag should generally be set to False.

Find Relevant Documents

With the table embeddings stored, you can now query for relevant tables based on user input:

input_query = "List the first 2 manufacturers" relevant_tables_docs = tables_vector_store.similarity_search(input_query, k=3) relevant_tables_docs

로그인 후 복사

For example, querying for manufacturers might return:

[Document(metadata={'id': 1}, page_content='GearType varchar,\n LastInspectionDate timestamp,\n ...'), Document(metadata={'id': 1}, page_content='AircraftModel varchar,\n AircraftRegistrationClass varchar,\n ...'), Document(metadata={'id': 3}, page_content='LocationSiteZipCode varchar,\n LocationState varchar,\n ...')]

로그인 후 복사

From the metadata, you can see that only table ID 1 (Aviation.Aircraft) is relevant, which aligns with the query.

Handling Edge Cases

While this approach is generally effective, it may not always be perfect. For instance, querying for crash sites might also return less relevant tables:

input_query = "List the top 10 most crash sites" relevant_tables_docs = tables_vector_store.similarity_search(input_query, k=3) relevant_tables_docs

로그인 후 복사

Results might include:

[Document(metadata={'id': 3}, page_content='LocationSiteZipCode varchar,\n LocationState varchar,\n ...'), Document(metadata={'id': 3}, page_content='InjuriesGroundSerious integer,\n InjuriesHighest varchar,\n ...'), Document(metadata={'id': 1}, page_content='CREATE TABLE Aviation.Aircraft (\n Event bigint NOT NULL,\n ID varchar NOT NULL,\n ...')]

로그인 후 복사

Despite retrieving the correct Aviation.Event table twice, the Aviation.Aircraft table may also appear, which could be improved with additional filtering or thresholding. This is beyond the scope of this example and will be left for future implementations.

Define a Function to Retrieve Relevant Tables

To automate this process, define a function to filter and return the relevant tables based on user input:

def get_relevant_tables(user_input, tables_vector_store, table_df): relevant_tables_docs = tables_vector_store.similarity_search(user_input) relevant_tables_docs_indices = [x.metadata["id"] for x in relevant_tables_docs] indices = table_df["id"].isin(relevant_tables_docs_indices) relevant_tables_array = [x for x in table_df[indices]["col_def"]] return relevant_tables_array

로그인 후 복사

This function will help in efficiently retrieving only the relevant tables to send to the LLM, reducing the prompt length and improving overall query performance.

Selecting the Most Relevant Examples (Few-Shot Prompting)

When working with language models (LLMs), providing them with relevant examples helps ensure accurate and contextually appropriate responses. These examples, referred to as "few-shot" examples, guide the LLM in understanding the structure and context of the queries it should handle.

In our case, we need to populate the examples_value variable with a diverse set of SQL queries that cover a broad spectrum of IRIS SQL syntax and the tables available in the database. This helps prevent the LLM from generating incorrect or irrelevant queries.

Defining Example Queries

Below is a list of example queries designed to illustrate various SQL operations:

examples = [ {"input": "List all aircrafts.", "query": "SELECT * FROM Aviation.Aircraft"}, {"input": "Find all incidents for the aircraft with ID 'N12345'.", "query": "SELECT * FROM Aviation.Event WHERE EventId IN (SELECT EventId FROM Aviation.Aircraft WHERE ID = 'N12345')"}, {"input": "List all incidents in the 'Commercial' operation type.", "query": "SELECT * FROM Aviation.Event WHERE EventId IN (SELECT EventId FROM Aviation.Aircraft WHERE OperationType = 'Commercial')"}, {"input": "Find the total number of incidents.", "query": "SELECT COUNT(*) FROM Aviation.Event"}, {"input": "List all incidents that occurred in 'Canada'.", "query": "SELECT * FROM Aviation.Event WHERE LocationCountry = 'Canada'"}, {"input": "How many incidents are associated with the aircraft with AircraftKey 5?", "query": "SELECT COUNT(*) FROM Aviation.Aircraft WHERE AircraftKey = 5"}, {"input": "Find the total number of distinct aircrafts involved in incidents.", "query": "SELECT COUNT(DISTINCT AircraftKey) FROM Aviation.Aircraft"}, {"input": "List all incidents that occurred after 5 PM.", "query": "SELECT * FROM Aviation.Event WHERE EventTime > 1700"}, {"input": "Who are the top 5 operators by the number of incidents?", "query": "SELECT TOP 5 OperatorName, COUNT(*) AS IncidentCount FROM Aviation.Aircraft GROUP BY OperatorName ORDER BY IncidentCount DESC"}, {"input": "Which incidents occurred in the year 2020?", "query": "SELECT * FROM Aviation.Event WHERE YEAR(EventDate) = '2020'"}, {"input": "What was the month with most events in the year 2020?", "query": "SELECT TOP 1 MONTH(EventDate) EventMonth, COUNT(*) EventCount FROM Aviation.Event WHERE YEAR(EventDate) = '2020' GROUP BY MONTH(EventDate) ORDER BY EventCount DESC"}, {"input": "How many crew members were involved in incidents?", "query": "SELECT COUNT(*) FROM Aviation.Crew"}, {"input": "List all incidents with detailed aircraft information for incidents that occurred in the year 2012.", "query": "SELECT e.EventId, e.EventDate, a.AircraftManufacturer, a.AircraftModel, a.AircraftCategory FROM Aviation.Event e JOIN Aviation.Aircraft a ON e.EventId = a.EventId WHERE Year(e.EventDate) = 2012"}, {"input": "Find all incidents where there were more than 5 injuries and include the aircraft manufacturer and model.", "query": "SELECT e.EventId, e.InjuriesTotal, a.AircraftManufacturer, a.AircraftModel FROM Aviation.Event e JOIN Aviation.Aircraft a ON e.EventId = a.EventId WHERE e.InjuriesTotal > 5"}, {"input": "List all crew members involved in incidents with serious injuries, along with the incident date and location.", "query": "SELECT c.CrewNumber AS 'Crew Number', c.Age, c.Sex AS Gender, e.EventDate AS 'Event Date', e.LocationCity AS 'Location City', e.LocationState AS 'Location State' FROM Aviation.Crew c JOIN Aviation.Event e ON c.EventId = e.EventId WHERE c.Injury = 'Serious'"} ]

로그인 후 복사

Selecting Relevant Examples

Given the ever-expanding list of examples, it’s impractical to provide the LLM with all of them. Instead, we use IRIS Vector Search along with the SemanticSimilarityExampleSelector class to identify the most relevant examples based on user prompts.

Define the Example Selector:

example_selector = SemanticSimilarityExampleSelector.from_examples( examples, OpenAIEmbeddings(), IRISVector, k=5, input_keys=["input"], connection_string=iris_conn_str, collection_name="sql_samples", pre_delete_collection=True )

로그인 후 복사

Note: The pre_delete_collection flag is used here for demonstration purposes to ensure a fresh collection in each test run. In a production environment, this flag should be set to False to avoid unnecessary deletions.

Query the Selector:

To find the most relevant examples for a given input, use the selector as follows:

input_query = "Find all events in 2010 informing the Event Id and date, location city and state, aircraft manufacturer and model." relevant_examples = example_selector.select_examples({"input": input_query})

로그인 후 복사

The results might look like this:

[{'input': 'List all incidents with detailed aircraft information for incidents that occurred in the year 2012.', 'query': 'SELECT e.EventId, e.EventDate, a.AircraftManufacturer, a.AircraftModel, a.AircraftCategory FROM Aviation.Event e JOIN Aviation.Aircraft a ON e.EventId = a.EventId WHERE Year(e.EventDate) = 2012'}, {'input': "Find all incidents for the aircraft with ID 'N12345'.", 'query': "SELECT * FROM Aviation.Event WHERE EventId IN (SELECT EventId FROM Aviation.Aircraft WHERE ID = 'N12345')"}, {'input': 'Find all incidents where there were more than 5 injuries and include the aircraft manufacturer and model.', 'query': 'SELECT e.EventId, e.InjuriesTotal, a.AircraftManufacturer, a.AircraftModel FROM Aviation.Event e JOIN Aviation.Aircraft a ON e.EventId = a.EventId WHERE e.InjuriesTotal > 5'}, {'input': 'List all aircrafts.', 'query': 'SELECT * FROM Aviation.Aircraft'}, {'input': 'Find the total number of distinct aircrafts involved in incidents.', 'query': 'SELECT COUNT(DISTINCT AircraftKey) FROM Aviation.Aircraft'}]

로그인 후 복사

If you specifically need examples related to quantities, you can query the selector accordingly:

input_query = "What is the number of incidents involving Boeing aircraft." quantity_examples = example_selector.select_examples({"input": input_query})

로그인 후 복사

The output may be:

[{'input': 'How many incidents are associated with the aircraft with AircraftKey 5?', 'query': 'SELECT COUNT(*) FROM Aviation.Aircraft WHERE AircraftKey = 5'}, {'input': 'Find the total number of distinct aircrafts involved in incidents.', 'query': 'SELECT COUNT(DISTINCT AircraftKey) FROM Aviation.Aircraft'}, {'input': 'How many crew members were involved in incidents?', 'query': 'SELECT COUNT(*) FROM Aviation.Crew'}, {'input': 'Find all incidents where there were more than 5 injuries and include the aircraft manufacturer and model.', 'query': 'SELECT e.EventId, e.InjuriesTotal, a.AircraftManufacturer, a.AircraftModel FROM Aviation.Event e JOIN Aviation.Aircraft a ON e.EventId = a.EventId WHERE e.InjuriesTotal > 5'}, {'input': 'List all incidents with detailed aircraft information for incidents that occurred in the year 2012.', 'query': 'SELECT e.EventId, e.EventDate, a.AircraftManufacturer, a.AircraftModel, a.AircraftCategory FROM Aviation.Event e JOIN Aviation.Aircraft a ON e.EventId = a.EventId WHERE Year(e.EventDate) = 2012'}]

로그인 후 복사

This output includes examples that specifically address counting and quantities.

Future Considerations

While the SemanticSimilarityExampleSelector is powerful, it’s important to note that not all selected examples may be perfect. Future improvements may involve adding filters or thresholds to exclude less relevant results, ensuring that only the most appropriate examples are provided to the LLM.

Accuracy Test

To assess the performance of the prompt and SQL query generation, we need to set up and run a series of tests. The goal is to evaluate how well the LLM generates SQL queries based on user inputs, with and without the use of example-based few shots.

Function to Generate SQL Queries

We start by defining a function that uses the LLM to generate SQL queries based on the provided context, prompt, user input, and other parameters:

def get_sql_from_text(context, prompt, user_input, use_few_shots, tables_vector_store, table_df, example_selector=None, example_prompt=None): relevant_tables = get_relevant_tables(user_input, tables_vector_store, table_df) context["table_info"] = "\n\n".join(relevant_tables) examples = example_selector.select_examples({"input": user_input}) if example_selector else [] context["examples_value"] = "\n\n".join([ example_prompt.invoke(x).to_string() for x in examples ]) model = ChatOpenAI(model="gpt-3.5-turbo", temperature=0) output_parser = StrOutputParser() chain_model = prompt | model | output_parser response = chain_model.invoke({ "top_k": context["top_k"], "table_info": context["table_info"], "examples_value": context["examples_value"], "input": user_input }) return response

로그인 후 복사

Execute the Prompt

Test the prompt with and without examples:

# Prompt execution **with** few shots input = "Find all events in 2010 informing the Event Id and date, location city and state, aircraft manufacturer and model." response_with_few_shots = get_sql_from_text( context, prompt, user_input=input, use_few_shots=True, tables_vector_store=tables_vector_store, table_df=table_df, example_selector=example_selector, example_prompt=example_prompt, ) print(response_with_few_shots)

로그인 후 복사

SELECT e.EventId, e.EventDate, e.LocationCity, e.LocationState, a.AircraftManufacturer, a.AircraftModel FROM Aviation.Event e JOIN Aviation.Aircraft a ON e.EventId = a.EventId WHERE Year(e.EventDate) = 2010

로그인 후 복사

# Prompt execution **without** few shots input = "Find all events in 2010 informing the Event Id and date, location city and state, aircraft manufacturer and model." response_with_no_few_shots = get_sql_from_text( context, prompt, user_input=input, use_few_shots=False, tables_vector_store=tables_vector_store, table_df=table_df, ) print(response_with_no_few_shots)

로그인 후 복사

SELECT TOP 3 "EventId", "EventDate", "LocationCity", "LocationState", "AircraftManufacturer", "AircraftModel" FROM Aviation.Event e JOIN Aviation.Aircraft a ON e.ID = a.Event WHERE e.EventDate >= '2010-01-01' AND e.EventDate < '2011-01-01'

로그인 후 복사

Utility Functions for Testing

To test the generated SQL queries, we define some utility functions:

def execute_sql_query(cnx, query): try: cursor = cnx.cursor() cursor.execute(query) rows = cursor.fetchall() return rows except: print('Error running query:') print(query) print('-'*80) return None def sql_result_equals(cnx, query, expected): rows = execute_sql_query(cnx, query) result = [set(row._asdict().values()) for row in rows or []] if result != expected and rows is not None: print('Result not as expected for query:') print(query) print('-'*80) return result == expected

로그인 후 복사

# SQL test for prompt **with** few shots print("SQL is OK" if not execute_sql_query(cnx, response_with_few_shots) is None else "SQL is not OK")

로그인 후 복사

SQL is OK

로그인 후 복사

# SQL test for prompt **without** few shots print("SQL is OK" if not execute_sql_query(cnx, response_with_no_few_shots) is None else "SQL is not OK")

로그인 후 복사

error on running query: SELECT TOP 3 "EventId", "EventDate", "LocationCity", "LocationState", "AircraftManufacturer", "AircraftModel" FROM Aviation.Event e JOIN Aviation.Aircraft a ON e.ID = a.Event WHERE e.EventDate >= '2010-01-01' AND e.EventDate < '2011-01-01' -------------------------------------------------------------------------------- SQL is not OK

로그인 후 복사

Define and Execute Tests

Define a set of test cases and run them:

tests = [{ "input": "What were the top 3 years with the most recorded events?", "expected": [{128, 2003}, {122, 2007}, {117, 2005}] },{ "input": "How many incidents involving Boeing aircraft.", "expected": [{5}] },{ "input": "How many incidents that resulted in fatalities.", "expected": [{237}] },{ "input": "List event Id and date and, crew number, age and gender for incidents that occurred in 2013.", "expected": [{1, datetime.datetime(2013, 3, 4, 11, 6), '20130305X71252', 59, 'M'}, {1, datetime.datetime(2013, 1, 1, 15, 0), '20130101X94035', 32, 'M'}, {2, datetime.datetime(2013, 1, 1, 15, 0), '20130101X94035', 35, 'M'}, {1, datetime.datetime(2013, 1, 12, 15, 0), '20130113X42535', 25, 'M'}, {2, datetime.datetime(2013, 1, 12, 15, 0), '20130113X42535', 34, 'M'}, {1, datetime.datetime(2013, 2, 1, 15, 0), '20130203X53401', 29, 'M'}, {1, datetime.datetime(2013, 2, 15, 15, 0), '20130218X70747', 27, 'M'}, {1, datetime.datetime(2013, 3, 2, 15, 0), '20130303X21011', 49, 'M'}, {1, datetime.datetime(2013, 3, 23, 13, 52), '20130326X85150', 'M', None}] },{ "input": "Find the total number of incidents that occurred in the United States.", "expected": [{1178}] },{ "input": "List all incidents latitude and longitude coordinates with more than 5 injuries that occurred in 2010.", "expected": [{-78.76833333333333, 43.25277777777778}] },{ "input": "Find all incidents in 2010 informing the Event Id and date, location city and state, aircraft manufacturer and model.", "expected": [ {datetime.datetime(2010, 5, 20, 13, 43), '20100520X60222', 'CIRRUS DESIGN CORP', 'Farmingdale', 'New York', 'SR22'}, {datetime.datetime(2010, 4, 11, 15, 0), '20100411X73253', 'CZECH AIRCRAFT WORKS SPOL SRO', 'Millbrook', 'New York', 'SPORTCRUISER'}, {'108', datetime.datetime(2010, 1, 9, 12, 55), '20100111X41106', 'Bayport', 'New York', 'STINSON'}, {datetime.datetime(2010, 8, 1, 14, 20), '20100801X85218', 'A185F', 'CESSNA', 'New York', 'Newfane'} ] }]

로그인 후 복사

Accuracy Evaluation

Run the tests and calculate the accuracy:

def execute_tests(cnx, context, prompt, use_few_shots, tables_vector_store, table_df, example_selector, example_prompt): tests_generated_sql = [(x, get_sql_from_text( context, prompt, user_input=x['input'], use_few_shots=use_few_shots, tables_vector_store=tables_vector_store, table_df=table_df, example_selector=example_selector if use_few_shots else None, example_prompt=example_prompt if use_few_shots else None, )) for x in deepcopy(tests)] tests_sql_executions = [(x[0], sql_result_equals(cnx, x[1], x[0]['expected'])) for x in tests_generated_sql] accuracy = sum(1 for i in tests_sql_executions if i[1] == True) / len(tests_sql_executions) print(f'Accuracy: {accuracy}') print('-'*80)

로그인 후 복사

Results

# Accuracy tests for prompts executed **without** few shots use_few_shots = False execute_tests( cnx, context, prompt, use_few_shots, tables_vector_store, table_df, example_selector, example_prompt )

로그인 후 복사

error on running query: SELECT "EventDate", COUNT("EventId") as "TotalEvents" FROM Aviation.Event GROUP BY "EventDate" ORDER BY "TotalEvents" DESC TOP 3; -------------------------------------------------------------------------------- error on running query: SELECT "EventId", "EventDate", "C"."CrewNumber", "C"."Age", "C"."Sex" FROM "Aviation.Event" AS "E" JOIN "Aviation.Crew" AS "C" ON "E"."ID" = "C"."EventId" WHERE "E"."EventDate" >= '2013-01-01' AND "E"."EventDate" < '2014-01-01' -------------------------------------------------------------------------------- result not expected for query: SELECT TOP 3 "e"."EventId", "e"."EventDate", "e"."LocationCity", "e"."LocationState", "a"."AircraftManufacturer", "a"."AircraftModel" FROM "Aviation"."Event" AS "e" JOIN "Aviation"."Aircraft" AS "a" ON "e"."ID" = "a"."Event" WHERE "e"."EventDate" >= '2010-01-01' AND "e"."EventDate" < '2011-01-01' -------------------------------------------------------------------------------- accuracy: 0.5714285714285714 --------------------------------------------------------------------------------

로그인 후 복사

# Accuracy tests for prompts executed **with** few shots use_few_shots = True execute_tests( cnx, context, prompt, use_few_shots, tables_vector_store, table_df, example_selector, example_prompt )

로그인 후 복사

error on running query: SELECT e.EventId, e.EventDate, e.LocationCity, e.LocationState, a.AircraftManufacturer, a.AircraftModel FROM Aviation.Event e JOIN Aviation.Aircraft a ON e.EventId = a.EventId WHERE Year(e.EventDate) = 2010 TOP 3 -------------------------------------------------------------------------------- accuracy: 0.8571428571428571 --------------------------------------------------------------------------------

로그인 후 복사

Conclusion

The accuracy of SQL queries generated with examples (few shots) is approximately 49% higher compared to those generated without examples (85% vs. 57%).