Since Meta released the open-source large language model Llama 2, the community's efforts have largely removed the barrier for developers and ordinary users to access an LLM, the so-called democratisation of LLMs.
Let's explore running an LLM on a KVM-based VM.
Create a VM on a host with a 2.3 GHz AMD processor (no GPU available), 24 vCPUs and 64 GB of memory, running Ubuntu 22.04 Jammy Jellyfish.
Install the base build tools:
sudo apt update -y
sudo apt install -y build-essential
The Engine to Run LLM
Thanks to the open-source community, especially the llama.cpp project, it's now possible to run a quantised LLM (in the current GGUF format) even without a GPU.
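Quantisation is what makes CPU-only inference practical: weights are stored as small integers plus a per-block scale instead of 16- or 32-bit floats. The sketch below shows the core idea only; it is not llama.cpp's actual Q5_K_M scheme, which uses super-blocks with separate scales and minimums.

```python
import numpy as np

def quantize_block(x, bits=5):
    # Symmetric per-block quantisation: the largest magnitude maps to the top level.
    levels = 2 ** (bits - 1) - 1          # 5 bits -> integer range [-15, 15]
    scale = np.abs(x).max() / levels
    q = np.round(x / scale).astype(np.int8)
    return q, scale

def dequantize_block(q, scale):
    # Recover approximate float weights from the stored integers.
    return q.astype(np.float32) * scale

x = np.random.randn(32).astype(np.float32)  # one block of "weights"
q, s = quantize_block(x)
x_hat = dequantize_block(q, s)
print("max reconstruction error:", np.abs(x - x_hat).max())
```

The reconstruction error is bounded by half the scale, which is why 4- and 5-bit models stay usable while shrinking to a fraction of the original file size.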
Let's build it.
git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
make -j
The build also produces a sample program, main, which lets you test the LLM as a command-line tool. Let's install it system-wide as llm.
sudo cp main /usr/local/bin/llm
The LLM Model
We have the command-line engine; where do we get the quantised model files? TheBloke, "purveyor of fine local LLMs for your fun and profit", has provided many GGUF models on huggingface.co.
Let's download a top performer, the WizardCoder model based on Code Llama, going a bit greedy with the 34B variant.
curl -LO https://huggingface.co/TheBloke/WizardCoder-Python-34B-V1.0-GGUF/resolve/main/wizardcoder-python-34b-v1.0.Q5_K_M.gguf
It will take a while, as the file is huge: about 23 GB.
Test it with Command Line
Run the model with the following command:
llm -m ~/models/wizardcoder-python-34b-v1.0.Q5_K_M.gguf \
  -t 24 -i
We run the model interactively (-i), using all 24 CPU cores (-t 24). Ask it to generate some Python code:
> Write a python code to calculate Fibonacci numbers
To calculate the fibonacci sequence, we can use recursion. The nth Fibonacci…
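The model's answer continues along the lines of the naive recursion it mentions. An illustrative sketch of such a completion (not the model's verbatim output) might be:

```python
def fibonacci(n):
    # Naive recursion, as the model suggests; fine for small n,
    # but exponential in time without memoisation.
    if n < 2:
        return n
    return fibonacci(n - 1) + fibonacci(n - 2)

print([fibonacci(i) for i in range(10)])  # → [0, 1, 1, 2, 3, 5, 8, 13, 21, 34]
```

Even on CPU-only hardware, the 34B model produces working code like this, just at a noticeably slower token rate than a GPU setup would.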