DiscoSpeech


A Discord bot that listens to voice chat, transcribes speech using Whisper, generates responses using Ollama, and speaks them back using ElevenLabs TTS (local Bark TTS support is planned).

🚀 Features

  • Real-time voice transcription using OpenAI's Whisper
  • AI-powered responses using Ollama (local LLM)
  • Text-to-speech responses using ElevenLabs (cloud), with local Bark TTS planned
  • Automatic audio cleanup and management
  • Configurable logging system
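The README does not describe how automatic audio cleanup works. One simple approach is age-based deletion of temporary audio files, sketched below. This is an assumption, not DiscoSpeech's implementation: the function name cleanup_audio, the one-hour threshold and the suffix list are hypothetical.

```python
import time
from pathlib import Path


def cleanup_audio(directory, max_age_seconds=3600.0, suffixes=(".wav", ".mp3"), now=None):
    """Delete audio files in `directory` older than `max_age_seconds`.

    Hypothetical helper: threshold and suffixes are illustrative defaults,
    not DiscoSpeech settings. Returns the sorted names of removed files.
    """
    folder = Path(directory)
    if not folder.is_dir():
        return []
    current = time.time() if now is None else now
    removed = []
    for path in folder.iterdir():
        # Only touch regular files with an audio suffix; leave everything else alone.
        if path.is_file() and path.suffix.lower() in suffixes:
            if current - path.stat().st_mtime > max_age_seconds:
                path.unlink()
                removed.append(path.name)
    return sorted(removed)
```

For example, calling cleanup_audio("temp") periodically would keep the temp/ folder from growing without bound.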

Limits

  • Not yet group-chat friendly
  • Cannot serve multiple servers at once
  • No local TTS model option yet

Prerequisites

  • Python 3.8+
  • FFmpeg (for audio processing)
  • Ollama installed and running locally
  • Discord Bot Token
  • ElevenLabs API Key and Voice ID
  • CUDA-compatible GPU recommended (for faster transcription)
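Before installing, a quick offline check of the prerequisites above can save some debugging. The sketch below is not part of the repository; the function name missing_prerequisites is hypothetical, and it only checks the Python version and whether the ffmpeg and ollama executables are on PATH (tokens, API keys and GPU availability are not checked).

```python
import shutil
import sys


def missing_prerequisites(which=shutil.which, version_info=sys.version_info):
    """Return a list of problems found in the local environment.

    Hypothetical helper: checks Python >= 3.8 and the presence of the
    ffmpeg and ollama executables on PATH. `which` and `version_info`
    are injectable so the check can be tested.
    """
    problems = []
    if tuple(version_info[:2]) < (3, 8):
        problems.append("Python 3.8+ is required")
    for tool in ("ffmpeg", "ollama"):
        if which(tool) is None:
            problems.append(f"{tool} not found on PATH")
    return problems


if __name__ == "__main__":
    for problem in missing_prerequisites():
        print("missing:", problem)
```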

Installation

  1. Clone the repository from https://github.com/ajr-dev/discospeech
  2. Install the required Python packages:
pip install -r requirements.txt
  3. Copy config.example.json to config.json and fill in your credentials:
{
    "discord_token": "YOUR_DISCORD_BOT_TOKEN",
    "elevenlabs_api_key": "YOUR_ELEVENLABS_API_KEY",
    "voice_id": "YOUR_ELEVENLABS_VOICE_ID",
    "ollama_host": "http://localhost:11434",
    "ollama_model": "llama3.1:latest",
    "cleanup_responses": false
}
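A common mistake is leaving a YOUR_... placeholder in config.json. The sketch below loads the file with the keys shown above, fills the optional keys with the example values and rejects placeholders. It is illustrative only: whether the bot itself applies defaults is not stated in this README, and the names load_config, REQUIRED_KEYS and DEFAULTS are hypothetical.

```python
import json

# Keys taken from the example config.json in this README.
REQUIRED_KEYS = ("discord_token", "elevenlabs_api_key", "voice_id")
# Assumed fallbacks: these simply mirror the example values above.
DEFAULTS = {
    "ollama_host": "http://localhost:11434",
    "ollama_model": "llama3.1:latest",
    "cleanup_responses": False,
}
PLACEHOLDER_PREFIX = "YOUR_"


def load_config(path="config.json"):
    """Load config.json, apply defaults, and reject missing or placeholder credentials."""
    with open(path, encoding="utf-8") as fh:
        data = json.load(fh)
    config = {**DEFAULTS, **data}
    problems = [
        key for key in REQUIRED_KEYS
        if not isinstance(config.get(key), str)
        or not config[key]
        or config[key].startswith(PLACEHOLDER_PREFIX)
    ]
    if problems:
        raise ValueError("config.json is missing real values for: " + ", ".join(problems))
    return config
```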

Configuration

Discord Bot Setup

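  1. Go to the Discord Developer Portal
  2. Create a new application
  3. Add a bot to your application
  4. Under Privileged Gateway Intents, enable Message Content Intent so the bot can read the !join and !leave commands (the voice states intent it needs for voice channels is not privileged, so it has no toggle in the portal)
  5. Copy the bot token into the discord_token field of config.json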
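The voice-state and message intents from step 4 are bits in the integer a bot sends when it connects to the Discord gateway. The sketch below shows how they combine and which of them are privileged (and therefore need the portal toggle). The bit values come from Discord's gateway documentation; the exact set of intents DiscoSpeech requests is an assumption. In discord.py the same result is usually obtained with intents = discord.Intents.default() followed by intents.message_content = True.

```python
# Gateway intent bits as documented by Discord.
GUILDS = 1 << 0
GUILD_MEMBERS = 1 << 1       # privileged
GUILD_VOICE_STATES = 1 << 7  # needed to track who is in voice channels
GUILD_PRESENCES = 1 << 8     # privileged
GUILD_MESSAGES = 1 << 9
MESSAGE_CONTENT = 1 << 15    # privileged: needed to read "!join" text

PRIVILEGED_MASK = GUILD_MEMBERS | GUILD_PRESENCES | MESSAGE_CONTENT


def intents_value(*flags):
    """Combine intent flags into the integer sent in the gateway IDENTIFY payload."""
    value = 0
    for flag in flags:
        value |= flag
    return value


def privileged_bits(value):
    """Return the subset of `value` that must be enabled in the Developer Portal."""
    return value & PRIVILEGED_MASK
```

For a voice bot with text commands, intents_value(GUILDS, GUILD_VOICE_STATES, GUILD_MESSAGES, MESSAGE_CONTENT) gives 33409, of which only MESSAGE_CONTENT is privileged.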

ElevenLabs Setup

  1. Create an account at ElevenLabs
  2. Get your API key from the profile settings
  3. Choose a voice and copy its ID
  4. Add them to config.json as elevenlabs_api_key and voice_id
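To see how the API key and voice ID are used, the sketch below builds (without sending) a request to ElevenLabs' text-to-speech endpoint, which authenticates with the xi-api-key header and returns encoded audio (typically MP3). It uses only the standard library and is not DiscoSpeech's code: the function name build_tts_request and the default model_id are assumptions, so pick a model your account supports.

```python
import json
import urllib.request

ELEVENLABS_TTS_URL = "https://api.elevenlabs.io/v1/text-to-speech/{voice_id}"


def build_tts_request(api_key, voice_id, text, model_id="eleven_multilingual_v2"):
    """Build (but do not send) an ElevenLabs text-to-speech POST request.

    Hypothetical helper; model_id is an assumed default.
    """
    body = json.dumps({"text": text, "model_id": model_id}).encode("utf-8")
    return urllib.request.Request(
        ELEVENLABS_TTS_URL.format(voice_id=voice_id),
        data=body,
        headers={
            "xi-api-key": api_key,
            "Content-Type": "application/json",
            "Accept": "audio/mpeg",
        },
        method="POST",
    )
```

Sending it with urllib.request.urlopen(req) and reading the response body would yield the audio bytes to play back in the voice channel.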

Ollama Setup

  1. Install Ollama from ollama.ai
  2. Pull your preferred model and set ollama_model in config.json to its name, for example:
ollama pull mistral
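The ollama_host and ollama_model values end up in requests to Ollama's local REST API. The sketch below builds a non-streaming request to the /api/generate endpoint and extracts the reply from the "response" field. Whether DiscoSpeech uses /api/generate or /api/chat is not stated here, so treat the helper names build_generate_request and extract_reply as hypothetical.

```python
import json
import urllib.request


def build_generate_request(host, model, prompt):
    """Build a non-streaming POST request to Ollama's /api/generate endpoint."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        host.rstrip("/") + "/api/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def extract_reply(raw_body):
    """Return the generated text from a non-streaming /api/generate response body."""
    return json.loads(raw_body)["response"].strip()
```

With Ollama running, urllib.request.urlopen(build_generate_request("http://localhost:11434", "mistral", "Hello")).read() returns the JSON body that extract_reply parses.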

Commands

  • !join - Bot joins your current voice channel
  • !leave - Bot leaves the voice channel
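Both commands use a "!" prefix. If the bot is built on discord.py's commands extension (an assumption), this parsing is handled by commands.Bot(command_prefix="!"); the standalone sketch below only illustrates the rule. The name parse_command is hypothetical, and matching is case-sensitive, as discord.py commands are by default.

```python
COMMAND_PREFIX = "!"


def parse_command(message, known=("join", "leave")):
    """Return the command name for messages like '!join', or None otherwise."""
    text = message.strip()
    if not text.startswith(COMMAND_PREFIX):
        return None
    parts = text[len(COMMAND_PREFIX):].split()
    if not parts:
        return None  # a bare "!" is not a command
    name = parts[0]
    return name if name in known else None
```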

💻 Usage

  1. Start the Ollama service (for example by running ollama serve, unless it already runs in the background)
  2. Run the bot:
python main.py
  3. Invite the bot to your Discord server
  4. Join a voice channel
  5. Use !join to make the bot join
  6. The bot will:
    • Listen to voice chat
    • Transcribe speech in real time
    • Generate responses using Ollama
    • Speak responses using ElevenLabs TTS (local Bark TTS is planned)
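The listen, transcribe, respond and speak steps listed above form a simple per-utterance pipeline. The sketch below shows that flow with each stage injected as a plain function, so Whisper, Ollama and ElevenLabs clients could be swapped for stubs. It is an illustration, not the structure of bot/services/; VoicePipeline and handle_utterance are hypothetical names.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class VoicePipeline:
    """One speech-in, speech-out turn with pluggable stages (hypothetical)."""

    transcribe: Callable[[bytes], str]   # e.g. a Whisper wrapper
    generate: Callable[[str], str]       # e.g. an Ollama client
    synthesize: Callable[[str], bytes]   # e.g. an ElevenLabs client

    def handle_utterance(self, audio: bytes) -> Optional[bytes]:
        text = self.transcribe(audio).strip()
        if not text:  # silence or noise: nothing to answer
            return None
        reply = self.generate(text).strip()
        if not reply:
            return None
        return self.synthesize(reply)
```

Keeping the stages injectable makes it possible to test the turn logic without a GPU, a running Ollama instance or an ElevenLabs key.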

Project Structure

  • bot/ - Main bot module
    • services/ - Core services (audio, TTS, LLM)
    • voice/ - Voice processing components
    • utils/ - Utility functions
  • temp/ - Temporary audio files
  • responses/ - Generated audio responses
  • logs/ - Application logs
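If you prefer to create the runtime folders up front (whether the bot creates them itself is not stated here), a small helper like the one below works. ensure_runtime_dirs is a hypothetical name.

```python
from pathlib import Path

# Runtime directories listed under Project Structure.
RUNTIME_DIRS = ("temp", "responses", "logs")


def ensure_runtime_dirs(base="."):
    """Create temp/, responses/ and logs/ under `base`; return the ones newly created."""
    created = []
    for name in RUNTIME_DIRS:
        path = Path(base) / name
        if not path.is_dir():
            path.mkdir(parents=True, exist_ok=True)
            created.append(name)
    return created
```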

Logging

Logs are written to logs/bot.log and rotated automatically when the file reaches 10 MB, keeping up to 5 backups.
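This rotation policy maps directly onto the standard library's logging.handlers.RotatingFileHandler. The sketch below is one way to configure it, not necessarily how DiscoSpeech does; it assumes "10MB" means 10 * 1024 * 1024 bytes, and the logger name "discospeech" and function name setup_logging are hypothetical.

```python
import logging
from logging.handlers import RotatingFileHandler
from pathlib import Path


def setup_logging(log_path="logs/bot.log", max_bytes=10 * 1024 * 1024, backups=5):
    """Log to a file that rotates at `max_bytes`, keeping `backups` old copies.

    Calling this twice would attach a second handler; call it once at startup.
    """
    Path(log_path).parent.mkdir(parents=True, exist_ok=True)
    handler = RotatingFileHandler(
        log_path, maxBytes=max_bytes, backupCount=backups, encoding="utf-8"
    )
    handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(name)s: %(message)s"))
    logger = logging.getLogger("discospeech")
    logger.setLevel(logging.INFO)
    logger.addHandler(handler)
    return logger
```

Rotated files are named bot.log.1 through bot.log.5, with the oldest discarded once the limit is reached.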

Citation

If you use this repository or its data in a downstream project, please consider citing it:

@misc{discospeech,
  author = {AJR},
  title = {DiscoSpeech: Realistic discord voice chat AI},
  year = {2025},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/ajr-dev/discospeech}},
}


License

MIT License

🙇 Acknowledgements

DiscoSpeech couldn't have been built without the help of great software already available. Thank you!

🤗 Contributors

This is a community project; special thanks to our contributors! 🤗
