A private llama, to go!
Ollama, PrivateGPT, and even my very own local llama project are all very popular because they run LLMs locally on consumer-grade hardware and keep your chats and data private…but what about when you’re on the go?
In this tutorial I’ll show you how to create your own Ollama chatbot you can take anywhere!
As I progressed in my learning journey, I started contemplating different projects and resources I had encountered. It occurred to me: why not merge them all to create my own personalized chatbot that I could access anytime, anywhere on my mobile device?
Those separate resources I am referring to are Ollama, Runpod, and Enchanted.
Ollama, as you have most likely seen and heard, allows you to serve LLMs on consumer-grade hardware.
Runpod provides cloud GPUs at a low cost for all sorts of workloads. Check them out here: https://runpod.io?ref=1n4hk44x
Enchanted is a really cool open-source project that gives iOS users a beautiful mobile UI for chatting with your Ollama LLM.
With brief definitions out of the way, let's get started with Runpod.
The first step I took to get this accomplished was navigating to https://www.runpod.io/console/gpu-cloud and selecting the GPU I wanted.
Next, under templates, I searched for and added the VS Code Server template.
Lastly, I added the env vars from step 1 of this tutorial: https://docs.runpod.io/tutorials/pods/run-ollama and deployed my pod!
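For reference, those variables tell Ollama to listen on all interfaces and (optionally) keep your pulled models on the pod’s volume. The exact names and values come from the linked Runpod doc; the lines below are just an illustrative sketch of what mine looked like, with the models path being an assumption on my part:
# Illustrative only -- follow step 1 of the Runpod tutorial above for the exact values.
# OLLAMA_HOST makes the Ollama server listen on 0.0.0.0 instead of just localhost.
OLLAMA_HOST=0.0.0.0
# OLLAMA_MODELS (assumed path) keeps pulled models on the pod's persistent /workspace volume.
OLLAMA_MODELS=/workspace/ollama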
Runpod deployment selections
Runpod makes it stupidly simple to connect to your VS Code server using GitHub auth. Simply open the URL shown in the container startup logs (visible from the Runpod UI) and enter the code it gives you. Voilà, you can now navigate back to VS Code and interact with your pod. This brings me to our next step: setting up Ollama.
The first thing to do is run the command below to install Ollama and start the service.
(curl -fsSL https://ollama.com/install.sh | sh && ollama serve > ollama.log 2>&1) &
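If you want to double-check that the server actually came up before moving on, a quick curl against Ollama’s default port (11434) should answer with a short status message, and the log file we redirected to is there if anything looks off:
# Sanity check: the Ollama server replies with "Ollama is running" on its root path.
curl http://localhost:11434
# Peek at the service log we redirected above if the curl doesn't respond.
tail -n 20 ollama.log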
Once installed, you will want to run:
ollama run your_model_name
Ollama will begin pulling your model’s image, and then you can start your chat in the terminal to check that everything is working okay.
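As a concrete example, here is what that looks like with llama3 as a stand-in model name (pick whatever model you like from the Ollama library), plus an optional request against the HTTP API, which is the same interface Enchanted will talk to later:
# Pull the model and open an interactive chat in the terminal (type /bye to exit).
ollama run llama3
# Optional: hit the HTTP API directly to confirm it answers outside the interactive chat.
curl http://localhost:11434/api/generate -d '{
  "model": "llama3",
  "prompt": "Say hello in one sentence.",
  "stream": false
}'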
Now for the final, and most difficult, step for me to figure out 😅
The failure:
I began with ngrok and ran through its setup commands, but was met with this error:
ERR_NGROK_121
Message
Your PRODUCT version "VERSION" is too old. The minimum supported agent version for your account is "MINVERSION". Please update to a newer version with ngrok update
However, when I ran ngrok update, it said I had the latest version…weird.
The success!
On to npx localtunnel!
Now we will use npx to run localtunnel, which will allow our Ollama server to be reached from anywhere. Just execute the following commands:
# Install nvm
curl -o- https://raw.githubusercontent.com/nvm-sh/nvm/v0.39.1/install.sh | bash
# Set NVM_DIR environment variable
export NVM_DIR="$HOME/.nvm"
# Load nvm script
[ -s "$NVM_DIR/nvm.sh" ] && \. "$NVM_DIR/nvm.sh"
# Install and use Node.js version 12 using nvm
nvm install 12
nvm use 12
# Install npx globally
npm install -g npx
# Use npx to start localtunnel
npx localtunnel --port 11434
This will output a URL in your terminal. Save it for later!
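If you want to confirm the tunnel actually works before reaching for your phone, you can hit the Ollama API through the public URL from any machine. The subdomain below is made up; substitute whatever localtunnel printed for you:
# List the models your server knows about, but through the public tunnel URL this time.
curl https://your-subdomain.loca.lt/api/tags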
There’s one last thing left to do before we can start chatting anywhere with our Ollama LLM.
Navigate to the App Store and install Enchanted LLM. Once installed, go to the app’s settings and enter the URL we saved earlier, the one localtunnel printed in your VS Code server terminal.
Now you are all set to begin chatting with your very own Ollama LLM, with blazing-fast inference.