Are you a Harry Potter fan who wants everything about the Harry Potter universe at your fingertips? Or do you simply want to impress your friends with a cool chart of how the different characters in Harry Potter come together? Look no further than knowledge graphs.
This guide will show you how to get a knowledge graph up and running in Neo4j with just your laptop and your favourite book.
According to Wikipedia:
A knowledge graph is a knowledge base that uses a graph-structured data model or topology to represent and operate on data.
In terms of hardware, all you need is a computer, preferably one with an Nvidia graphics card. To be fully self-sufficient, I will go with a local LLM setup, but you could just as easily use the OpenAI API for the same purpose.
You will need the following:
- Ollama, to serve a local LLM
- Neo4j, as the graph database
- Python, with the LangChain packages
As I am coding on Ubuntu 24.04 in WSL2, where GPU workloads pass through to Docker easily, I am running Ollama as a Docker container. This is as simple as first installing the Nvidia Container Toolkit, and then running the following:
docker run -d --gpus=all -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama
If you do not have an Nvidia GPU, you can run a CPU-only Ollama with the following command:
docker run -d -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama
Once that is done, you can pull your favourite LLM into Ollama. The list of models available on Ollama is here. For example, to pull qwen2.5, run the following command:
docker exec -it ollama ollama run qwen2.5
And you are done with Ollama!
You will first want to create a Python virtual environment, so that any packages you install, or any configuration changes you make, are restricted to the environment instead of being applied globally. The following command creates a virtual environment called harry-potter-rag:
python -m venv harry-potter-rag
You can then activate the virtual environment using the following command:
source harry-potter-rag/bin/activate
Next, use pip to install the relevant packages, mainly from LangChain:
pip install --upgrade --quiet langchain langchain-community langchain-openai langchain-experimental neo4j
We will set up Neo4j as a Docker container. For ease of setting up with specific configurations, we use Docker Compose. Simply copy the following into a file called docker-compose.yaml, then run docker-compose up -d in the same directory to bring up Neo4j.
This setup also persists data, logs and plugins in local folders, i.e. ./data, ./logs and ./plugins.
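A minimal docker-compose.yaml sketch is shown below. The password, image tag and APOC settings are assumptions, so adjust them to your own setup:

```yaml
services:
  neo4j:
    image: neo4j:5
    ports:
      - "7474:7474"   # HTTP (Neo4j browser)
      - "7687:7687"   # Bolt (driver connections)
    environment:
      - NEO4J_AUTH=neo4j/password
      - NEO4J_PLUGINS=["apoc"]
      - NEO4J_apoc_import_file_enabled=true
    volumes:
      - ./data:/data
      - ./logs:/logs
      - ./plugins:/plugins
      - ./import:/var/lib/neo4j/import
```

The APOC plugin and the import volume are only needed later, when we import a graphml file.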
We can now start building the Knowledge Graph in Jupyter Notebook! We first set up an Ollama LLM instance using the following:
Next, we connect our LLM to Neo4J:
Now, it is time to grab your favourite Harry Potter text, or any favourite book, and we will use LangChain to split the text into chunks. Chunking is a strategy to break a long text into parts; we can then send each part to the LLM to convert it into nodes and edges, and insert each chunk's nodes and edges into Neo4j. Just a quick primer: nodes are the circles you see on a graph, and each edge joins two nodes together.
The code also prints the first chunk for a quick preview of what the chunks look like.
Now, it is time to let our GPU do the heavy lifting and convert our text into a knowledge graph! Before we dive into the entire book, let us experiment with prompts to better guide the LLM in returning a graph the way we want.
Prompts are essentially examples of what we expect, or instructions of what we want to appear in the response. In the context of knowledge graphs, we can instruct the LLM to only extract persons and organisations as nodes, and to only accept certain types of relationships given the entities. For example, we can allow the relationship of spouse to only happen between a person and another person, and not between a person and an organisation.
We can now employ the LLMGraphTransformer on the first chunk of text to see how the graph could turn out. This is a good chance for us to tweak the prompt until the result is to our liking.
The following example expects nodes which could be a Person or Organization, and allowed_relationships specifies the types of relationships that are allowed. In order to allow the LLM to capture the variety of the original text, I also set strict_mode to False, so that relationships or entities not defined below can also be captured. If you instead set strict_mode to True, entities and relationships that do not comply with what is allowed may either be dropped or forced into the allowed set (which may be inaccurate).
After you are satisfied with your prompt, it is time to ingest the chunks into the knowledge graph. Note that the try/except block explicitly handles any response that could not be properly inserted into Neo4j -- the code is designed so that any error is logged but does not stop the loop from converting subsequent chunks.
The loop above took me about 46 minutes to ingest Harry Potter and the Philosopher's Stone, Harry Potter and the Chamber of Secrets, and Harry Potter and the Prisoner of Azkaban, and I ended up with 4868 unique nodes! A quick preview is available below. You can see that the graph is really crowded, and it is hard to distinguish who is related to whom, and in what way.
We can now leverage Cypher queries to look at, say, Dumbledore!
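A query along these lines should work; the Person label and the id property reflect how LLMGraphTransformer typically stores entities, but check your own graph's schema:

```cypher
// Fetch the Dumbledore node; CONTAINS also matches "Albus Dumbledore"
MATCH (p:Person)
WHERE p.id CONTAINS "Dumbledore"
RETURN p
```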
OK, now we get just Dumbledore himself. Let's see how he is related to Harry Potter.
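Something along these lines (again, the label and property names are assumptions):

```cypher
// Any direct relationship between the two characters
MATCH (h:Person)-[r]-(d:Person)
WHERE h.id CONTAINS "Harry" AND d.id CONTAINS "Dumbledore"
RETURN h, r, d
```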
OK, now we are interested in what Harry and Dumbledore have said to each other.
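One way to sketch this is to find the source chunks that mention both characters; the Document label and MENTIONS relationship come from add_graph_documents(include_source=True):

```cypher
// Source chunks that mention both characters
MATCH (doc:Document)-[:MENTIONS]->(h:Person),
      (doc)-[:MENTIONS]->(d:Person)
WHERE h.id CONTAINS "Harry" AND d.id CONTAINS "Dumbledore"
RETURN doc.text
```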
We can see that the graph is still quite confusing, with many documents to go through to find what we are looking for. The modelling of documents as nodes is not ideal, and further work could be done on the LLMGraphTransformer to make the graph more intuitive to use.
You can see how easy it is to set up a Knowledge Graph on your own local computer, without even needing to connect to the internet.
The GitHub repo, which also contains the entire knowledge graph of the Harry Potter universe, is available here.
To import the harry_potter.graphml file into Neo4j, copy the graphml file into Neo4j's /import folder, and run the following in the Neo4j browser:
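Assuming the APOC plugin is installed and file import is enabled (apoc.import.file.enabled=true), something like:

```cypher
CALL apoc.import.graphml("harry_potter.graphml", {readLabels: true})
```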