
Answer questions about your documents with OpenAI and Pinecone

13.11.2023 | 12 minutes of reading time

In recent years, large language models (LLMs) have made remarkable progress in interacting with humans, showcasing their ability to answer a wide array of questions. Trained on publicly accessible internet content, these models have broad knowledge across many different topics. However, they are limited to information that was available to them during training and thus fail to answer any questions about specific content from your personal documents.

In this article, we will look at how to overcome this limitation by combining OpenAI's chat completion model with a Pinecone vector database. We will first cover the general approach and then go into detail and implement an intelligent question answering system in Python, using the APIs of both OpenAI and Pinecone, which enables the LLM to provide useful responses to questions about personal documents.

Approach

The core idea of the approach is to use OpenAI's chat completion model to answer questions about our documents. To do so, we create a prompt that includes the question and the documents and asks the model to answer the question based on the text contents of these documents. With this naive approach we face the obstacle that the prompt we can provide as input to the chat completion model is limited in length, while we might have a large number of documents whose combined contents exceed this limit. Hence, we first have to filter the documents to find the ones most relevant to the question, in order to reduce the length of the prompt below the limit.

To find the documents that are relevant to a question, we make use of text embeddings. Text embeddings are high-dimensional numerical vectors that represent the meaning of a text in such a way that semantically related texts are close to each other in the vector space. We can use an embedding model to embed all our documents, resulting in a vector for each document. While different embedding models are available, we will use an embedding model provided by OpenAI via the API.
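To make this notion of closeness concrete, here is a minimal sketch that embeds three sentences with the OpenAI embedding model used later in this article and compares them with cosine similarity. It assumes the OpenAI client is already configured with an API key, as shown in the implementation section below; the example sentences are purely illustrative.

import numpy as np
import openai.embeddings_utils

def cosine_similarity(a, b):
    # Cosine similarity of two vectors: values close to 1 indicate similar meaning.
    a, b = np.array(a), np.array(b)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

v1 = openai.embeddings_utils.get_embedding('The cat sat on the mat.', engine='text-embedding-ada-002')
v2 = openai.embeddings_utils.get_embedding('A kitten is resting on the rug.', engine='text-embedding-ada-002')
v3 = openai.embeddings_utils.get_embedding('Quarterly revenue grew by five percent.', engine='text-embedding-ada-002')

# The two sentences about cats should typically yield a higher similarity
# than the unrelated sentence about revenue.
print(cosine_similarity(v1, v2))
print(cosine_similarity(v1, v3))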

The resulting vectors are then stored in a vector database. Vector databases are designed to store and efficiently query large numbers of vectors. A query finds the (approximate) nearest neighbors of a given query vector in a database index, using a configurable distance metric. In this case, we will use the vector database provided by Pinecone, which is a managed vector database service.

Document embedding process: The document is embedded using OpenAI and the resulting vector is stored in the Pinecone database

With a Pinecone index filled with our embedded documents, we can now ask questions about their contents. To do so, we first embed the question using the same embedding model that we used for the documents. This results in a vector representation of the question, which should be close to the vectors of semantically related documents that could provide the information required to answer the question.

By querying the Pinecone index with the embedding vector of the question, we retrieve the nearest document vectors in the database. We load the texts of the found documents and combine them with the question text into a prompt for the chat completion model. Then, we pass the prompt to the chat completion model, which returns an answer to our question based on the texts of the relevant documents.

Query process: the query is embedded and the nearest document vector in Pinecone is used to build the prompt for OpenAI

Implementation

We will look at an implementation of the question answering approach in a small Python script. You can find the entire code for the demo on GitHub.

Set up OpenAI and Pinecone

We first need to create accounts and API keys for OpenAI and Pinecone in the respective developer consoles. Assuming the API keys are stored in environment variables, we can initialize the OpenAI and Pinecone clients in our Python script.

import os
import openai
import pinecone

openai.api_key = os.getenv("OPENAI_API_KEY")
pinecone.init(api_key=os.getenv("PINECONE_API_KEY"), environment='gcp-starter')

Create the Pinecone index

We can create a Pinecone index either in the Pinecone console or programmatically using the client. Here, we will do the latter to create an index named document-search-index. The index can be configured with regard to different parameters, most notably the dimensions and the metric. The dimensions specify the size of the vectors that we will store in the index. As our embedding model will use vectors of size 1,536, we set the dimensions accordingly. For the metric we have the choice between cosine, dotproduct and euclidean. As the OpenAI documentation recommends using cosine similarity, we will use the cosine metric.

pinecone_index_name = 'document-search-index'
pinecone.create_index(pinecone_index_name, metric='cosine', dimension=1536)

We can also configure the index with regard to the number of pods and the pod type. However, in the free tier we are limited to a single pod and pod type. The Pinecone documentation explains in more detail how the index can be configured.
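On a paid plan, such a configuration could look like the following sketch; the pod type p1.x1 is just one of the pod sizing options listed in the Pinecone documentation and serves only as an illustration.

# Alternative to the create_index call above, only available on paid plans:
pinecone.create_index(
    'document-search-index',
    dimension=1536,
    metric='cosine',
    pods=1,
    pod_type='p1.x1'
)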

Embed and store your documents in the Pinecone index

Now that we have created the Pinecone index, we can embed and store our documents in the index. First, we need to load our documents from the disk. In this case, we assume that the documents are stored in a directory named data. The documents are loaded from the directory and returned as a list of dicts, each consisting of the title (i.e. the filename without the extension) and the content.

import os

def load_documents():
    documents = []
    documents_path = 'data'
    for filename in os.listdir(documents_path):
        file_path = os.path.join(documents_path, filename)
        with open(file_path, 'r', encoding='utf-8') as file:
            content = file.read()
        documents.append({'title': filename.split('.')[0], 'content': content})
    return documents

Next, we need a function to embed the content of a document using OpenAI's embedding model. The OpenAI client offers an endpoint for that, which allows us to specify an embedding model. We use the model text-embedding-ada-002, which is recommended by OpenAI at the time of writing this article. The model generates embedding vectors of size 1,536.

import openai.embeddings_utils

def get_embedding_vector_from_openai(text):
    return openai.embeddings_utils.get_embedding(text, engine='text-embedding-ada-002')

With the documents and the embedding function, we are now able to fill our Pinecone index with the embedded documents. The upsert method of the Pinecone client expects a list of vectors with id, values (i.e. the actual vector), and metadata. The id is a unique identifier for each vector in the index and can be used to query a particular vector. As we won't need this in our use case, we simply set a random value as id. The metadata can be any additional information that we want to store together with the vector. In this case, we store the title of the document as metadata.

import time
import uuid

def fill_pinecone_index(documents):
    index = pinecone.Index(pinecone_index_name)
    for doc in documents:
        try:
            embedding_vector = get_embedding_vector_from_openai(doc['content'])
            data = pinecone.Vector(
                id=str(uuid.uuid4()),
                values=embedding_vector,
                metadata={'title': doc['title']}
            )
            index.upsert([data])
            print('Embedded and inserted document with title ' + doc['title'])
            time.sleep(1)
        except Exception:
            print('Could not embed and insert document with title ' + doc['title'])

documents = load_documents()
fill_pinecone_index(documents)

You may notice that we have added a time.sleep(1) after each embedding and insertion. This is to avoid a rate limit error from OpenAI, which only allows a certain number of tokens to be embedded per minute. Further, the embedding model we use is currently limited to texts of up to 8,191 input tokens, which may not be enough for all documents in data. In that case, we simply skip the embedding and insertion of these documents, so not all of our documents will end up as vectors in the index. If you have large documents with a lot of text, you may want to consider splitting them into smaller chunks and embedding those individually, as sketched below.
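Such a chunking step is not part of the demo, but a minimal sketch could look like this, using the tiktoken library with the cl100k_base encoding (the encoding used by text-embedding-ada-002); the chunk size of 6,000 tokens is an arbitrary choice for illustration.

import tiktoken

def split_into_chunks(content, max_tokens=6000):
    # Tokenize the text with the same encoding that text-embedding-ada-002 uses.
    encoding = tiktoken.get_encoding('cl100k_base')
    tokens = encoding.encode(content)
    # Slice the token list into pieces of at most max_tokens and decode each back to text.
    return [encoding.decode(tokens[i:i + max_tokens]) for i in range(0, len(tokens), max_tokens)]

Each chunk could then be embedded and upserted as its own vector, for example with the document title and a chunk number stored as metadata.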

Answer questions about the documents

To answer questions about our documents, we first find the relevant ones by querying the Pinecone index and then combine these documents with the question into a prompt for the OpenAI chat completion endpoint, asking the model to answer the question based on the given text.

To retrieve the relevant documents, we simply embed the question using the same model that we used to embed the documents. Then, we query the index with this embedding vector, which will retrieve the top k similar vectors in the index. We set k to 1 in this case, as we only answer the question based on a single document. You may want to use a larger value for k to enable the system to take multiple documents into account, if that is required for your use case. We fetch the title of the document from the metadata, which will enable us to retrieve the document from the disk.

def query_pinecone_index(query):
    index = pinecone.Index(pinecone_index_name)
    query_embedding_vector = get_embedding_vector_from_openai(query)
    response = index.query(
        vector=query_embedding_vector,
        top_k=1,
        include_metadata=True
    )
    return response['matches'][0]['metadata']['title']
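If you want the system to take multiple documents into account, a variation of this function could query with a larger top_k and return the titles of all matches, for example:

def query_pinecone_index_multi(query, top_k=3):
    # Sketch: return the titles of the top_k most similar documents instead of only one.
    index = pinecone.Index(pinecone_index_name)
    query_embedding_vector = get_embedding_vector_from_openai(query)
    response = index.query(
        vector=query_embedding_vector,
        top_k=top_k,
        include_metadata=True
    )
    return [match['metadata']['title'] for match in response['matches']]

The contents of all returned documents would then have to fit into the prompt together, so the token limit of the chat completion model needs to be kept in mind. For the rest of this article, we stick to a single document.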

We use the title of the document to retrieve the document content from the disk:

def load_document_content(title):
    documents_path = 'data'
    file_path = os.path.join(documents_path, title + '.txt')
    with open(file_path, 'r', encoding='utf-8') as file:
        content = file.read()
    return content

We further implement a helper function to combine the document and the question into a prompt for the chat completion model:

def create_prompt(question, document_content):
    return 'You are given a document and a question. Your task is to answer the question based on the document.\n\n' \
           'Document:\n\n' \
           f'{document_content}\n\n' \
           f'Question: {question}'

Finally, we can use the OpenAI client to ask the chat completion model to answer the question based on the document. We set the model to gpt-3.5-turbo-16k. This is not the state-of-the-art model, but it is currently cheaper than the different variants of gpt-4 and should be sufficient for this use case. The 16k version of the model allows for up to 16,385 tokens, so we can include long texts in the prompt. We pass a list of messages to the chat completion model, which contains the conversation up to this point. As we start a new conversation, our list consists of a single user message with our prompt as content. The model returns a list of completion choices, which could contain more than one completion if specified in the request. As we did not specify a value, it defaults to a single completion. We extract the message content of the completion, which contains the answer to our prompt.

def get_answer_from_openai(question):
    relevant_document_title = query_pinecone_index(question)
    print(f'Relevant document title: {relevant_document_title}')
    document_content = load_document_content(relevant_document_title)
    prompt = create_prompt(question, document_content)
    print(f'Prompt:\n\n{prompt}\n\n')
    completion = openai.ChatCompletion.create(
        model='gpt-3.5-turbo-16k',
        messages=[{
            'role': 'user',
            'content': prompt
        }]
    )
    return completion.choices[0].message.content

question = input('Enter a question: ')
answer = get_answer_from_openai(question)
print(answer)

Examples

Now we can ask questions about information from our documents and retrieve an answer from OpenAI. Using around 800 Wikipedia articles about different topics as our example documents, we try the following question:

What role does the president play in the political system of Angola?

The Pinecone index yields the vector of the document Politics of Angola as most similar to the embedded query. Using this document in our prompt enables OpenAI to answer the question correctly:

The president in the political system of Angola holds almost absolute power. They are the head of state and head of government, as well as the leader of the winning party or coalition. The president appoints and dismisses members of the government, members of various courts, the Governor and Vice-Governors of the Nacional Angolan Bank, the General-Attorney and their deputies, the Governors of the provinces, and many other key positions in the government, military, police, intelligence, and security organs. The president is also responsible for defining the policy of the country and has the power to promulgate laws and make edicts. However, the president is not directly involved in making laws.

While this is already impressive, it has to be mentioned that we cheated a little, as we used Wikipedia articles as our documents for testing the system. As the OpenAI model was trained on publicly available internet content, it is likely that it has seen this exact article and would have been able to answer the question anyway, even without receiving the document as part of the input prompt. Hence, we will have a look at what happens when we ask about a topic that has not been seen by the OpenAI model before.

To this end, I made up an article about the fictional chemist Jacob Miller, who was born in 1850, discovered a chemical element in 1886, which he named Jacobium, received the Nobel Prize for his findings in 1901, and died in 1932. You can find the article in the data directory of the GitHub repository. The document was embedded along with all other documents and inserted into the Pinecone index. Now, let's have a look at what happens when we ask about Jacob Miller:

In which year did the discoverer of the chemical element Jacobium win the nobel prize?

As we would expect, the Pinecone index yields the vector of the document Jacob Miller (Chemist) as most similar to the embedded query. Using this document in the prompt enables the chat completion model to provide the correct answer to the question:

The discoverer of the chemical element Jacobium won the Nobel Prize in Chemistry in the year 1901.

Conclusion

As demonstrated in this article, we can combine large language models and vector databases to build intelligent question answering systems that are able to answer questions about specific content from our documents. With managed services like OpenAI and Pinecone, we can easily build such systems without having to worry about training our own models or setting up and maintaining our own vector database.

As a disclaimer, I want to note that data privacy should be considered when using the approach presented in this article to build a question answering system on documents that contain sensitive information. While OpenAI claims that they do not train their models on inputs provided via the API, you might still want to consider alternatives when working with private or enterprise data. Instead of using OpenAI, you could use one of the many open-source LLMs and host it yourself, which naturally comes with significantly greater expenses.

Further, Pinecone is not the only vector database service available. An interesting alternative is the open-source database solution Chroma, which comes with its own embedding model. Instead of inserting and querying embedding vectors, Chroma allows us to work with the texts directly, as sketched below, rendering the use of OpenAI's embedding model unnecessary. Another alternative is Faiss, an open-source library for efficient GPU-accelerated vector similarity search. Compared to Pinecone, Faiss lacks the ability to store vectors, as it is only a vector index, not a vector database. Both Chroma and Faiss require you to host them yourself, which makes them somewhat less convenient to use than the managed Pinecone database.
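As an illustration of what working with texts directly could look like, here is a minimal sketch using the Chroma client; the collection name and example texts are made up.

import chromadb

# Create an in-memory Chroma instance and a collection that uses Chroma's default embedding model.
client = chromadb.Client()
collection = client.create_collection('documents')

# Add raw texts; Chroma embeds them internally.
collection.add(
    ids=['doc-1', 'doc-2'],
    documents=['Text of the first document ...', 'Text of the second document ...'],
    metadatas=[{'title': 'First document'}, {'title': 'Second document'}]
)

# Query with a raw text as well; the most similar document is returned.
results = collection.query(query_texts=['Which document talks about ...?'], n_results=1)
print(results['documents'][0][0])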
