30 lines of Python code can call the ChatGPT API to summarize the main content of a paper


PHPz

Released: 2023-04-04 12:05:06


Reading papers is one of our daily tasks, and there are far too many of them. How can we read and summarize them quickly? Since ChatGPT appeared, many paper-reading services have become available. In fact, using the ChatGPT API is very simple: we can build our own application locally with only 30 lines of Python code.


The steps to summarize the paper using Python and the ChatGPT API are simple:

Install PyPDF2 for PDF processing and the OpenAI package for the GPT-3.5-turbo interface.
Use PyPDF2 to open and read PDF files.
Traverse each page in the PDF document and extract text.
Use GPT-3.5-turbo to generate summaries for each page's text.
Merge summaries and save the final summary text to a file.
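The steps above can be sketched as one small function. This is only a minimal sketch, not the article's code: `summarize_pages` and `fake_summarize` are hypothetical names, and the summarizer is passed in as a plain callable so the flow can be tried without a PDF or an API key.

```python
from typing import Callable, Iterable


def summarize_pages(page_texts: Iterable[str], summarize: Callable[[str], str]) -> str:
    """Summarize each page and join the per-page summaries with newlines."""
    summaries = []
    for text in page_texts:
        text = text.strip()
        if not text:
            # Skip pages with no extractable text (for example scanned images).
            continue
        summaries.append(summarize(text))
    return "\n".join(summaries)


def fake_summarize(text: str) -> str:
    """Stand-in for the API call: keep only the first five words."""
    return " ".join(text.split()[:5])


pages = ["First page of the paper with some text.", "   ", "Second page."]
print(summarize_pages(pages, fake_summarize))
```

In real use, the callable would wrap the GPT-3.5-turbo request and the page texts would come from PyPDF2.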

import os
import PyPDF2
import openai

pdf_summary_text = ""

Parse the PDF:

pdf_file_path = "./pdfs/paper.pdf"
pdf_file = open(pdf_file_path, 'rb')
pdf_reader = PyPDF2.PdfReader(pdf_file)

Get the text of each page:

for page_num in range(len(pdf_reader.pages)):
    page_text = pdf_reader.pages[page_num].extract_text().lower()

Use OpenAI's API to summarize the page (this is still inside the loop):

    response = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=[
            {"role": "system", "content": "You are a helpful research assistant."},
            {"role": "user", "content": f"Summarize this: {page_text}"},
        ],
    )
    page_summary = response["choices"][0]["message"]["content"]
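GPT-3.5-turbo accepts only a limited amount of text per request, so a very dense page could in principle be too long to send in one piece. The sketch below shows one possible way to split text into smaller pieces first. The function name `split_into_chunks` and the default of 1500 words are assumptions for illustration, and a word count is only a rough proxy for the model's token count.

```python
from typing import List


def split_into_chunks(text: str, max_words: int = 1500) -> List[str]:
    """Split text into pieces of at most max_words whitespace-separated words."""
    if max_words <= 0:
        raise ValueError("max_words must be positive")
    words = text.split()
    return [
        " ".join(words[i:i + max_words])
        for i in range(0, len(words), max_words)
    ]


chunks = split_into_chunks("one two three four five", max_words=2)
print(chunks)  # ['one two', 'three four', 'five']
```

Each chunk could then be summarized separately and the partial summaries joined, in the same way the per-page summaries are merged.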

Merge the summaries and write the result to a file:

    pdf_summary_text += page_summary + "\n"  # still inside the loop

pdf_summary_file = pdf_file_path.replace(os.path.splitext(pdf_file_path)[1], "_summary.txt")
with open(pdf_summary_file, "w") as file:
    file.write(pdf_summary_text)
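One detail worth noting: `str.replace` substitutes every occurrence of the extension string, so a path such as "./my.pdf.files/paper.pdf" would be changed in the directory name as well. A minimal alternative, using a hypothetical helper named `summary_path_for`, is to build the name from the root returned by `os.path.splitext`:

```python
import os


def summary_path_for(pdf_path: str) -> str:
    """Return the PDF path with its extension swapped for '_summary.txt'."""
    root, _ext = os.path.splitext(pdf_path)
    return root + "_summary.txt"


print(summary_path_for("./pdfs/paper.pdf"))          # ./pdfs/paper_summary.txt
print(summary_path_for("./my.pdf.files/paper.pdf"))  # ./my.pdf.files/paper_summary.txt
```

For the simple path used in this article both approaches give the same file name.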

Finally, close the PDF file to release its resources:

pdf_file.close()

The complete code is as follows:

import os
import PyPDF2
import openai

# Note: openai.ChatCompletion is the interface of openai package versions before 1.0.
# The API key is read from the OPENAI_API_KEY environment variable, or set openai.api_key.

# Here I assume you are in a Jupyter Notebook and download the paper directly from the URL
!curl -o paper.pdf "https://arxiv.org/pdf/2301.00810v3.pdf?utm_source=pocket_saves"

# Set the string that will contain the summary
pdf_summary_text = ""
# Path of the PDF file
pdf_file_path = "paper.pdf"
# Open the PDF file and read it using PyPDF2
pdf_file = open(pdf_file_path, 'rb')
pdf_reader = PyPDF2.PdfReader(pdf_file)
# Loop through all the pages in the PDF file
for page_num in range(len(pdf_reader.pages)):
    # Extract the text from the page
    page_text = pdf_reader.pages[page_num].extract_text().lower()

    # Ask GPT-3.5-turbo to summarize the page
    response = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=[
            {"role": "system", "content": "You are a helpful research assistant."},
            {"role": "user", "content": f"Summarize this: {page_text}"},
        ],
    )
    page_summary = response["choices"][0]["message"]["content"]
    # Append this page's summary to the running summary
    pdf_summary_text += page_summary + "\n"

# Write the merged summary next to the PDF
pdf_summary_file = pdf_file_path.replace(os.path.splitext(pdf_file_path)[1], "_summary.txt")
with open(pdf_summary_file, "w") as file:
    file.write(pdf_summary_text)

pdf_file.close()

# Print the saved summary
with open(pdf_summary_file, "r") as file:
    print(file.read())
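The `openai.ChatCompletion` interface used above belongs to versions of the `openai` package before 1.0. For readers on a newer version, the following is a rough sketch of how the same request might look with the client object introduced in 1.0. The helper names `build_messages` and `summarize_page` are hypothetical, and the sketch assumes the API key is available in the `OPENAI_API_KEY` environment variable.

```python
from typing import Dict, List


def build_messages(page_text: str) -> List[Dict[str, str]]:
    """Build the same system/user message pair used in the article."""
    return [
        {"role": "system", "content": "You are a helpful research assistant."},
        {"role": "user", "content": f"Summarize this: {page_text}"},
    ]


def summarize_page(page_text: str, model: str = "gpt-3.5-turbo") -> str:
    """Summarize one page with the openai>=1.0 client (makes a network call)."""
    # Imported lazily so build_messages can be used without the package installed.
    from openai import OpenAI

    client = OpenAI()  # reads the OPENAI_API_KEY environment variable
    response = client.chat.completions.create(
        model=model,
        messages=build_messages(page_text),
    )
    return response.choices[0].message.content
```

Inside the page loop, `page_summary = summarize_page(page_text)` would then take the place of the `openai.ChatCompletion.create` call.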

There are two things to note:

1. OpenAI's free API quota is limited. Summarizing a paper with this method costs roughly 0.2-0.5 US dollars, depending on the length of the paper.

2. I haven't tested the GPT-4 API because I haven't applied for access yet, and it is far more expensive (about 20 times the price). I don't think it is worth it, but you could try uploading the paper's charts to see whether that gives better results (I'm not sure).
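To get a feel for how the cost in the first note scales with paper length, a back-of-the-envelope estimate can help. The sketch below is only an approximation: the function name `estimate_cost_usd` is hypothetical, the ratio of about four characters per token is a rough heuristic, the price passed in is a placeholder rather than a quoted rate, and the tokens of the generated summaries are not counted.

```python
def estimate_cost_usd(num_chars: int, usd_per_1k_tokens: float,
                      chars_per_token: float = 4.0) -> float:
    """Rough input-side cost: characters -> tokens -> price per 1K tokens."""
    tokens = num_chars / chars_per_token
    return tokens / 1000 * usd_per_1k_tokens


# Placeholder values: a 60,000-character paper at an assumed 0.002 USD per 1K tokens.
cost = estimate_cost_usd(60_000, usd_per_1k_tokens=0.002)
print(f"{cost:.2f} USD")
```

The actual bill depends on the current price list and on how many output tokens the model produces.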
