Reading papers is part of our daily work, and there are far too many of them. How can we read and summarize them quickly? Since ChatGPT appeared, many paper-reading services have become available. In fact, the ChatGPT API is very simple to use: we can build our own application locally with only about 30 lines of Python code.
The steps to summarize a paper with Python and the ChatGPT API are simple:
import os
import PyPDF2
import openai

# Set your OpenAI API key here, or export OPENAI_API_KEY before running
openai.api_key = os.getenv("OPENAI_API_KEY")

# The string that will hold the merged summary
pdf_summary_text = ""
Parse the PDF:
pdf_file_path = "./pdfs/paper.pdf"
pdf_file = open(pdf_file_path, 'rb')
pdf_reader = PyPDF2.PdfReader(pdf_file)
Get the text of each page:
for page_num in range(len(pdf_reader.pages)):
    page_text = pdf_reader.pages[page_num].extract_text().lower()
Use OpenAI's API to summarize each page:
response = openai.ChatCompletion.create(
model="gpt-3.5-turbo",
messages=[
{"role": "system", "content": "You are a helpful research assistant."},
{" role": "user", "content": f"Summarize this: {page_text}"},
],
)
page_summary = response["choices"][0]["message"]["content"]
Merge the page summaries and write them to a file:
pdf_summary_text += page_summary + "\n"
pdf_summary_file = pdf_file_path.replace(os.path.splitext(pdf_file_path)[1], "_summary.txt")
with open(pdf_summary_file, "w") as file:
    file.write(pdf_summary_text)
Done. Finally, close the PDF file to release its memory:
pdf_file.close()
The complete code is as follows:
import os
import PyPDF2
import openai

# Set your OpenAI API key here, or export OPENAI_API_KEY before running
openai.api_key = os.getenv("OPENAI_API_KEY")
# Here I assume you are in a Jupyter Notebook and download the paper directly from its URL
!curl -o paper.pdf https://arxiv.org/pdf/2301.00810v3.pdf?utm_source=pocket_saves
# Set the string that will contain the summary
pdf_summary_text = ""
# Open the PDF file
pdf_file_path = "paper.pdf"
# Read the PDF file using PyPDF2
pdf_file = open(pdf_file_path, 'rb')
pdf_reader = PyPDF2.PdfReader(pdf_file)
# Loop through all the pages in the PDF file
for page_num in range(len(pdf_reader.pages)):
    # Extract the text from the page
    page_text = pdf_reader.pages[page_num].extract_text().lower()

    # Ask the model to summarize the page
    response = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=[
            {"role": "system", "content": "You are a helpful research assistant."},
            {"role": "user", "content": f"Summarize this: {page_text}"},
        ],
    )

    # Accumulate each page's summary
    page_summary = response["choices"][0]["message"]["content"]
    pdf_summary_text += page_summary + "\n"
pdf_summary_file = pdf_file_path.replace(os.path.splitext(pdf_file_path)[1], "_summary.txt")
with open(pdf_summary_file, "w ") as file:
file.write(pdf_summary_text)
pdf_file.close()
with open(pdf_summary_file, "r") as file:
print(file.read())
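One practical wrinkle with this loop: it sends one request per page, so on the free tier it can be interrupted by rate limits partway through a long paper (see the notes below). Here is a minimal, hedged sketch of a retry wrapper around the same call; the helper name summarize_page_with_retry and the backoff values are my own choices, not part of the original script:

import time
import openai

def summarize_page_with_retry(page_text, retries=3):
    # Retry the same ChatCompletion call used above, backing off on rate-limit errors
    for attempt in range(retries):
        try:
            response = openai.ChatCompletion.create(
                model="gpt-3.5-turbo",
                messages=[
                    {"role": "system", "content": "You are a helpful research assistant."},
                    {"role": "user", "content": f"Summarize this: {page_text}"},
                ],
            )
            return response["choices"][0]["message"]["content"]
        except openai.error.RateLimitError:
            time.sleep(2 ** attempt)  # wait 1s, 2s, 4s, ... before retrying
    raise RuntimeError("Rate limit retries exhausted")

Inside the page loop you would then call page_summary = summarize_page_with_retry(page_text) instead of calling openai.ChatCompletion.create directly.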
There are two things to note:
1. OpenAI's free API quota is limited, and summarizing a paper this way costs roughly $0.2-0.5, depending on the paper's length; the sketch after this list shows one way to estimate the cost before running the script.
2. I haven't tested the GPT-4 API because I haven't been granted access yet, and it is far more expensive (about 20 times the price). I don't think it's worth it, but you could try giving it the paper's figures as well to see whether the results improve (I'm not sure they will).
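For the cost in the first note, you can get a rough estimate before making any API calls. The sketch below is an approximation under assumptions of my own: it uses the common ~4 characters per token rule of thumb, guesses a fixed summary length per page, and takes the per-1K-token price as a parameter (check OpenAI's pricing page for the current rate; the same helper works for GPT-4 by plugging in its higher price):

import PyPDF2

def estimate_summary_cost(pdf_path, price_per_1k_tokens, summary_tokens_per_page=200):
    # Rough order-of-magnitude estimate: ~4 characters per token for English text,
    # plus an assumed fixed number of output tokens for each page summary
    reader = PyPDF2.PdfReader(pdf_path)
    total_tokens = 0
    for page in reader.pages:
        text = page.extract_text() or ""
        total_tokens += len(text) / 4 + summary_tokens_per_page
    return total_tokens / 1000 * price_per_1k_tokens

# Example with an assumed price of $0.002 per 1K tokens; substitute the current rate
print(f"Estimated cost: ${estimate_summary_cost('paper.pdf', price_per_1k_tokens=0.002):.2f}")

The result is only a ballpark figure; the actual charge depends on the real tokenizer and on whatever OpenAI's pricing is when you run it.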