Retrieval-Augmented Generation with Postgres Vector DB and LlamaIndex

Zhimin Wen
6 min read · Sep 27
Image by Willfried Wende from Pixabay

To allow a Large Language Model to deal with data that was not available during the pre-training stage, or with private data, one common technique that avoids retraining the model is Retrieval-Augmented Generation (RAG). In essence, this technique retrieves the relevant data and supplies it to the LLM so that it can generate the result.

The text data can be represented as embeddings and saved into a vector database. When required, the embeddings most similar to the question's embedding can be retrieved from the database with a vector search algorithm. The retrieved results, together with the original question, form a relevant context from which the LLM can infer its answer.
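The retrieve-then-generate flow above can be sketched in a few lines of plain Python. This is only an illustration: a bag-of-words term-frequency vector stands in for a real model-generated embedding, and an in-memory list stands in for the vector database; the documents and question are made up for the example.

```python
# Toy illustration of the retrieve-then-generate (RAG) flow.
# Real systems use model-generated embeddings and a vector database
# such as Postgres with pgvector; here a term-frequency vector and an
# in-memory list stand in for both.
import math
from collections import Counter

def embed(text: str) -> Counter:
    """Stand-in 'embedding': term-frequency counts of lowercased words."""
    return Counter(text.lower().replace("?", "").split())

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

documents = [
    "CloudNativePG is a Kubernetes operator for Postgres",
    "LlamaIndex is a data framework for LLM applications",
    "pgvector adds vector similarity search to Postgres",
]
# The "vector database": each document stored alongside its embedding.
store = [(doc, embed(doc)) for doc in documents]

question = "Which extension gives Postgres vector search?"
q_vec = embed(question)

# Vector search: rank the stored documents by similarity to the question.
top = max(store, key=lambda item: cosine(q_vec, item[1]))[0]

# The retrieved text plus the original question form the context
# that would be sent to the LLM.
prompt = f"Context: {top}\n\nQuestion: {question}\nAnswer:"
print(top)
```

A production setup replaces `embed` with calls to an embedding model and `store` with a pgvector-backed table, but the ranking and prompt-assembly steps are the same.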

Let's explore RAG with the Postgres vector DB. We will implement it using the LlamaIndex framework without excessive magic, such as condensing the whole pipeline into four lines of code; instead, we will drill down a bit deeper to understand how a RAG system works.

Install the Postgres Vector DB on OpenShift

Install the Postgres DB with the CloudNativePG operator.

oc apply -f

A namespace cnpg-system will be created, and a deployment of cnpg-controller-manager is supposed to be running. However, because the pod lacks a Security Context Constraint (SCC) that allows it to run as any UID and to set the seccompProfiles, it is blocked. Create the following SCC and apply it,

apiVersion: security.openshift.io/v1
kind: SecurityContextConstraints
metadata:
  name: anyuid-seccomp
allowHostDirVolumePlugin: false
allowHostIPC: false
allowHostNetwork: false
allowHostPID: false
allowHostPorts: false
allowPrivilegeEscalation: true
allowPrivilegedContainer: false
allowedCapabilities: null
defaultAddCapabilities: null
fsGroup:
  type: RunAsAny
readOnlyRootFilesystem: false
runAsUser:
  type: RunAsAny
seLinuxContext:
  type: MustRunAs
supplementalGroups:
  type: RunAsAny
volumes:
- configMap
- downwardAPI
- emptyDir
- ephemeral
- persistentVolumeClaim
- projected
- secret
seccompProfiles:
- runtime/default
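Applying the SCC alone is not enough; it must also be granted to the service account the operator pods run under. A sketch of the two steps, assuming the manifest above is saved as scc.yaml and the service account is named cnpg-manager (verify the actual name in your cluster):

```shell
# Apply the SCC defined above (filename scc.yaml is an assumption).
oc apply -f scc.yaml

# Grant the SCC to the operator's service account.
# The name cnpg-manager is an assumption; check with: oc get sa -n cnpg-system
oc adm policy add-scc-to-user anyuid-seccomp -z cnpg-manager -n cnpg-system

# Restart the blocked deployment so the pod is re-admitted under the new SCC.
oc rollout restart deployment cnpg-controller-manager -n cnpg-system
```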