ESP32

ESP32-S3 AI Chatbot: Build a Voice Assistant for $51

Step-by-step guide to building a local AI voice assistant using the LAFVIN ESP32-S3 kit and Tailscale. Private, offline, and fully customizable.

[UNIT_ACQUISITION_REQUIRED]
πŸ›’ BUY ON AMAZON _>

ESP32-S3 AI Chatbot: Build a Voice Assistant for $51

The era of centralized smart speakers is over. Amazon and Google listen to every word, store your data, and charge you for the privilege. In this transmission, we demonstrate how to build a fully autonomous, private, offline-capable voice assistant using the LAFVIN ESP32-S3 kit and your own local LLM.

Total cost: $51. Zero subscriptions. Zero eavesdropping.


[WHAT_YOU_WILL_BUILD]

A voice-activated AI assistant that:

  • Wakes on your custom wake word (β€œHey Dragon,” β€œHey Nelson,” or whatever you choose)
  • Streams your voice to a local LLM running on your own hardware (RTX 3060, Mini PC, or laptop)
  • Speaks responses through the onboard speaker or your existing Alexa Echo
  • Displays conversation text and animated eyes on the 2” TFT screen
  • Lives entirely on your Tailscale mesh network β€” no cloud, no subscriptions

[SPECIFICATIONS]

ComponentDetail
BrainESP32-S3 Xtensa LX7 dual-core @ 240MHz
Memory512KB SRAM + 8MB PSRAM + 16MB Flash
Audio InI2S MEMS microphone (INMP441)
Audio OutMAX98357A I2S amplifier + 3W speaker
Display2.0” TFT ST7789 240Γ—240 IPS
Connectivity2.4GHz Wi-Fi + Bluetooth 5 LE
GPIO45 programmable pins (for sensors, LEDs, servos)
Wake WordESP-SR offline wake word engine (no internet needed)

[HARDWARE_ACQUISITION]

πŸ›’ The Core Kit

The LAFVIN ESP32-S3 AI Chatbot Kit is the heart of this build. It arrives pre-assembled β€” no soldering required. The modular design means you snap the display, mic, and speaker boards together in under 5 minutes.

What’s in the box:

  • ESP32-S3 control board (16MB Flash, 8MB PSRAM)
  • 2.0” TFT-SPI ST7789 display module
  • I2S digital microphone module (INMP441)
  • I2S amplifier module (MAX98357A) with 3W speaker
  • USB-C cable for flashing and power
  • Standoffs and screws for modular assembly

[DEPLOYMENT_SEQUENCE]

Step 1: Assemble the Hardware (5 minutes)

The kit is modular. Follow the pin markings β€” each module only connects one way.

ESP32-S3 Board
β”œβ”€β”€ Display  β†’ SPI header (labeled "TFT")
β”œβ”€β”€ Mic      β†’ I2S header (labeled "I2S IN")
β”œβ”€β”€ Speaker  β†’ I2S header (labeled "I2S OUT")
└── USB-C    β†’ Power + Programming

Snap together. No soldering. Done.


Step 2: Flash the Firmware

We’ll use XiaoZhi-ESP32 β€” the open-source firmware with 26,000+ GitHub stars that powers this exact kit. It supports both DeepSeek and OpenAI APIs out of the box, plus custom WebSocket endpoints for local LLMs.

2.1 Install esptool

pip install esptool

2.2 Download the firmware

git clone https://github.com/78/xiaozhi-esp32.git
cd xiaozhi-esp32

2.3 Flash to the ESP32-S3

Connect the USB-C cable. On Windows (from WSL):

esptool.py --chip esp32s3 --port /dev/ttyUSB0 erase_flash
esptool.py --chip esp32s3 --port /dev/ttyUSB0 write_flash -z 0x0 build/bootloader/bootloader.bin 0x10000 build/partition_table/partition-table.bin 0x20000 build/xiaozhi.bin

On Windows directly (PowerShell):

esptool.py --chip esp32s3 --port COM3 erase_flash
esptool.py --chip esp32s3 --port COM3 write_flash -z 0x0 .\build\bootloader\bootloader.bin 0x10000 .\build\partition_table\partition-table.bin 0x20000 .\build\xiaozhi.bin

Step 3: Configure Wi-Fi + Tailscale

After flashing, the ESP32-S3 creates a hotspot. Connect to it from your phone or laptop:

  1. Join Wi-Fi network: XiaoZhi-Config
  2. Open http://192.168.4.1 in a browser
  3. Enter your home Wi-Fi credentials
  4. Enter your LLM server’s Tailscale IP (e.g., 100.xxx.xxx.xxx)
  5. Set the WebSocket port (default: 8080)

The ESP32 will reboot and connect to your network.


Step 4: Set Up the Local LLM Server

On your NukBox or any machine with a GPU:

Option A: Ollama (Simplest)

curl -fsSL https://ollama.ai/install.sh | sh
ollama pull llama3:8b
ollama pull qwen2.5:7b

Option B: vLLM (Production)

pip install vllm
vllm serve mistralai/Mistral-7B-Instruct-v0.3 --host 0.0.0.0 --port 8000

Step 5: Install Speech-to-Text + Text-to-Speech

On the LLM server, run this bridge:

# STT: Whisper
pip install faster-whisper
# TTS: Piper (offline, fast)
pip install piper-tts

Or use the XiaoClaw firmware which bundles STT/TTS directly on the ESP32:

git clone https://github.com/beancookie/xiaoclaw.git

Step 6: Wire It All Together with n8n

Create an n8n workflow (or a simple Python WebSocket server) that:

ESP32 Voice Input β†’ WebSocket β†’ n8n/Webhook
    β†’ Whisper STT (audio β†’ text)
    β†’ Ollama/vLLM (text β†’ response)
    β†’ Piper TTS (response β†’ audio)
    β†’ WebSocket back to ESP32
    β†’ Speaker plays response

This keeps your voice data air-gapped β€” it never leaves your home network.


Step 7: Test the Wake Word

Say your wake word. The ESP-SR engine runs entirely on the ESP32-S3 β€” no internet, no cloud. It fires instantly.

First test: β€œHey Dragon, what’s the weather?” Response: β€œI don’t have internet access, but your RTX 3060 is at 42Β°C and you have 3 unread emails.”


[ADVANCED_MODS]

Connect to Alexa as a Speaker

The ESP32’s MAX98357A amplifier outputs via I2S. To use your Alexa Echo as the speaker:

  1. Option A: ESP32 I2S β†’ 3.5mm aux jack β†’ Alexa line-in port
  2. Option B: ESP32 Bluetooth β†’ pair to Alexa as Bluetooth speaker
  3. Option C: Use the onboard 3W speaker for a fully self-contained unit

3D Printed Chassis

Design a custom enclosure on your Bambu A1:

  • Cutout for the 2” TFT display (visible through the shell)
  • Mic port aligned with the INMP441
  • Speaker grille for sound passthrough
  • Ventilation slots for the ESP32 (enclosed PLA needs airflow)

Add LED Eyes

Wire WS2812B NeoPixels to GPIO 48. Program animated eye patterns that β€œblink” while the LLM thinks.


<PartsList parts={[ { name: β€œLAFVIN ESP32-S3 AI Chatbot Kit”, link: β€œhttps://amzn.to/lafvin-esp32-s3-kit”, price: β€œ$51.03” }, { name: β€œN100 Mini PC (LLM Server)”, link: β€œhttps://amzn.to/beelink-s12-pro”, price: β€œ$169.00” }, { name: β€œUSB-C Data Cable”, link: β€œhttps://amzn.to/usbc-data-cable”, price: β€œ$7.99” }, { name: β€œWS2812B NeoPixel Ring (optional)”, link: β€œhttps://amzn.to/neopixel-ring”, price: β€œ$9.99” }, { name: β€œ3.5mm Aux Cable (Alexa mod)”, link: β€œhttps://amzn.to/aux-cable”, price: β€œ$5.99” } ]} />


<RelatedTools tools={[ { title: β€œXiaoZhi-ESP32”, link: β€œhttps://github.com/78/xiaozhi-esp32”, description: β€œ26K+ star open-source firmware for ESP32 voice chat. Pre-loaded MCP support.” }, { title: β€œXiaoClaw”, link: β€œhttps://github.com/beancookie/xiaoclaw”, description: β€œLocal AI Agent firmware with tool calling, memory, and autonomous task execution.” }, { title: β€œOllama”, link: β€œhttps://ollama.ai”, description: β€œRun Llama 3, Qwen 2.5, Mistral locally with one command.” }, { title: β€œTailscale”, link: β€œhttps://tailscale.com”, description: β€œMesh VPN connecting your ESP32, server, and phone on one private network.” }, { title: β€œn8n”, link: β€œhttps://n8n.io”, description: β€œWorkflow automation to orchestrate STT β†’ LLM β†’ TTS pipelines.” } ]} />


[TRouBLeSHooTiNg]

SymptomFix
ESP32 won’t flashHold BOOT button while plugging in USB-C
No audio inputCheck I2S pin mapping in config (default: WS=4, SCK=5, SD=6)
LLM connection refusedVerify Tailscale IP and that firewall allows port 8080
Wake word not triggeringFlash ESP-SR model appropriate for your language (en/cn)
Choppy audio over LTEForce Tailscale DERP relay mode for UDP-blocked networks

[NEXT_STEPS]

Now that your voice assistant is live, extend it:

  1. [Add tool calling] β€” Let it control smart home devices, check system stats, or trigger n8n workflows
  2. [3D print a dragon chassis] β€” See our Dragon Smart Speaker build
  3. [Run fully offline] β€” Switch to XiaoClaw firmware for local LLM inference directly on ESP32-S3
  4. [Build a cyberdeck terminal] β€” See our NerdDeck build guide

Transmission complete. Your voice assistant is no longer a corporate listening device. It’s yours.