Retrieval-Augmented Generation with Postgres Vector DB and LlamaIndex
To let a Large Language Model work with data that was not available during its pre-training stage, or with private data, a common technique that avoids retraining the model is Retrieval-Augmented Generation (RAG). In essence, this technique retrieves the relevant data and supplies it to the LLM so the model can generate its result.
The text data is converted into embeddings and saved in a vector database. When a question comes in, embeddings similar to the question's embedding are retrieved from the database using a vector search algorithm. The retrieved text, together with the original question, forms a relevant context from which the LLM can infer its answer.
Let's explore RAG with a Postgres vector DB. We will implement it using the LlamaIndex framework without excessive magic, such as condensing everything into four lines of code; instead, we will drill down a bit deeper to understand how a RAG system works.
Install the Postgres Vector DB on OpenShift
Install the Postgres DB with the CloudNativePG operator.
oc apply -f https://raw.githubusercontent.com/cloudnative-pg/cloudnative-pg/release-1.20/releases/cnpg-1.20.2.yaml
A namespace cnpg-system will be created, and a deployment of cnpg-controller-manager is supposed to be running. But because it lacks a Security Context Constraint that allows running as an arbitrary uid and lacks the required seccompProfiles, the pod is blocked. Create the following SCC and apply it: