So I decided to make a Discord bot that uses a locally running Large Language Model (LLM). In this case, I used gemma2:9b, a lightweight, low-parameter LLM that can run with limited VRAM (by contrast, an unquantized 70B model like Llama 70B needs on the order of 120 GB of VRAM to run at a reasonable speed). I installed it using Ollama. Once you’ve downloaded and installed Ollama, pull the model you want from your terminal with a command like this:
ollama pull gemma2:9b
Then run it locally with:
ollama run gemma2
Now you can run the model locally and send it messages:
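Before wiring up Discord, you can sanity-check the HTTP API directly from Python. This is a minimal sketch, assuming Ollama is serving on its default port (11434) and using the non-streaming mode of the /api/generate endpoint; the ask_ollama helper name is my own:

```python
import requests

OLLAMA_API_URL = "http://localhost:11434/api/generate"  # 11434 is Ollama's default port

def build_payload(prompt, model="gemma2"):
    # stream=False asks Ollama for a single JSON object
    # instead of a stream of newline-delimited chunks
    return {"model": model, "prompt": prompt, "stream": False}

def ask_ollama(prompt):
    # Returns the model's reply, or None if the server isn't reachable
    try:
        resp = requests.post(OLLAMA_API_URL, json=build_payload(prompt), timeout=60)
        resp.raise_for_status()
        return resp.json()["response"]
    except requests.RequestException:
        return None
```

If ask_ollama("Say hello") prints a greeting, the model is up and you can move on to the bot.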
Once you’ve done that, it’s time to set up Discord. Go to Discord Applications and log in. Once there, click the “New Application” button.
Click on Bot in the left tab and set up the following options. Name your bot; mine is LocalGemma2Bot.
Turn on OAuth2 Code Grant.
Then set Message Content Intent to True:
Now go to OAuth2 in the left tab, and under URL Generator select “bot”.
Under Bot Permissions select Send Messages and Read Message History.
Copy the generated URL at the bottom and go to that webpage. Now you can add the bot to a server:
You’ll also want to take note of the bot token on the Bot page in the left tab and save it for later. Now you’re ready to write the Python code. Create a bot.py file; here is the code for the default Ollama gemma2 settings:
import os
import json

import discord
import requests
from dotenv import load_dotenv

load_dotenv()

intents = discord.Intents.default()
intents.message_content = True  # required to read message text
client = discord.Client(intents=intents)

DISCORD_TOKEN = os.getenv('DISCORD_TOKEN')
OLLAMA_API_URL = os.getenv('OLLAMA_API_URL')

@client.event
async def on_message(message):
    # Ignore the bot's own messages
    if message.author == client.user:
        return
    if message.content.startswith('!ask '):
        query = message.content[5:]
        payload = {
            'model': 'gemma2',
            'prompt': query
        }
        response = requests.post(OLLAMA_API_URL, json=payload)
        if response.status_code == 200:
            try:
                # Ollama streams one JSON object per line; parse each
                # line and stitch the partial responses back together
                json_objects = response.text.split('\n')
                responses = []
                for obj in json_objects:
                    if obj.strip():
                        parsed_obj = json.loads(obj)
                        responses.append(parsed_obj['response'])
                final_response = ''.join(responses)
                await message.channel.send(final_response)
            except ValueError:
                await message.channel.send("Sorry, I couldn't parse the response from the LLM.")
        else:
            await message.channel.send("Sorry, I couldn't get a response from the LLM.")

client.run(DISCORD_TOKEN)
This script sends your query from Discord to your local machine, where the local LLM returns a series of token responses; the script combines and cleans those JSON chunks into one message and sends it back to the channel.
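To make that parsing step concrete, here is a standalone illustration using a fabricated two-line sample of the newline-delimited JSON Ollama streams back (real responses carry extra metadata fields alongside "response" and "done"):

```python
import json

# Fabricated example of Ollama's streamed output: one JSON object
# per line, each carrying a fragment of the generated text
raw = (
    '{"response": "Hel", "done": false}\n'
    '{"response": "lo!", "done": true}\n'
)

responses = []
for line in raw.split("\n"):
    if line.strip():  # skip the trailing blank line
        responses.append(json.loads(line)["response"])

final_response = "".join(responses)
print(final_response)  # -> Hello!
```

Joining the fragments in order reconstructs the full reply, which is exactly what the bot sends to the channel.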
For proper security, set up a .env file. Grab your Discord bot token from the Discord webpage under the Bot tab and add it to your .env:
DISCORD_TOKEN=YOUR_DISCORD_TOKEN_HERE
Also add the OLLAMA_API_URL variable to the .env file. Ollama serves on port 11434 by default, so use that unless you’ve configured a different port:
OLLAMA_API_URL=http://localhost:YOUR_PORT_NUMBER_HERE/api/generate
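bot.py already calls load_dotenv() to pull these values in. If you want the URL lookup to be a bit more forgiving, os.getenv accepts a default value, so you could fall back to Ollama's default port (11434) when the variable is unset:

```python
import os

# If OLLAMA_API_URL isn't set in the environment (or .env),
# fall back to Ollama's default endpoint on port 11434
OLLAMA_API_URL = os.getenv(
    "OLLAMA_API_URL",
    "http://localhost:11434/api/generate",
)
```

This way the bot still works out of the box against a stock Ollama install.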
Once you’ve created the .env file and bot.py, create a virtual environment and install the dependencies:
pip install requests discord.py python-dotenv
Now you’re ready to run your bot from the terminal with python bot.py. Use !ask MESSAGE_HERE in Discord to query the bot.