ESP32-S3 AI Chatbot: Build a Voice Assistant for $51
Step-by-step guide to building a local AI voice assistant using the LAFVIN ESP32-S3 kit and Tailscale. Private, offline, and fully customizable.
ESP32-S3 AI Chatbot: Build a Voice Assistant for $51
The era of centralized smart speakers is over. Amazon and Google listen to every word, store your data, and charge you for the privilege. In this transmission, we demonstrate how to build a fully autonomous, private, offline-capable voice assistant using the LAFVIN ESP32-S3 kit and your own local LLM.
Total cost: $51. Zero subscriptions. Zero eavesdropping.
[WHAT_YOU_WILL_BUILD]
A voice-activated AI assistant that:
- Wakes on your custom wake word (βHey Dragon,β βHey Nelson,β or whatever you choose)
- Streams your voice to a local LLM running on your own hardware (RTX 3060, Mini PC, or laptop)
- Speaks responses through the onboard speaker or your existing Alexa Echo
- Displays conversation text and animated eyes on the 2β TFT screen
- Lives entirely on your Tailscale mesh network β no cloud, no subscriptions
[SPECIFICATIONS]
| Component | Detail |
|---|---|
| Brain | ESP32-S3 Xtensa LX7 dual-core @ 240MHz |
| Memory | 512KB SRAM + 8MB PSRAM + 16MB Flash |
| Audio In | I2S MEMS microphone (INMP441) |
| Audio Out | MAX98357A I2S amplifier + 3W speaker |
| Display | 2.0β TFT ST7789 240Γ240 IPS |
| Connectivity | 2.4GHz Wi-Fi + Bluetooth 5 LE |
| GPIO | 45 programmable pins (for sensors, LEDs, servos) |
| Wake Word | ESP-SR offline wake word engine (no internet needed) |
[HARDWARE_ACQUISITION]
π The Core Kit
The LAFVIN ESP32-S3 AI Chatbot Kit is the heart of this build. It arrives pre-assembled β no soldering required. The modular design means you snap the display, mic, and speaker boards together in under 5 minutes.
Whatβs in the box:
- ESP32-S3 control board (16MB Flash, 8MB PSRAM)
- 2.0β TFT-SPI ST7789 display module
- I2S digital microphone module (INMP441)
- I2S amplifier module (MAX98357A) with 3W speaker
- USB-C cable for flashing and power
- Standoffs and screws for modular assembly
[DEPLOYMENT_SEQUENCE]
Step 1: Assemble the Hardware (5 minutes)
The kit is modular. Follow the pin markings β each module only connects one way.
ESP32-S3 Board
βββ Display β SPI header (labeled "TFT")
βββ Mic β I2S header (labeled "I2S IN")
βββ Speaker β I2S header (labeled "I2S OUT")
βββ USB-C β Power + Programming
Snap together. No soldering. Done.
Step 2: Flash the Firmware
Weβll use XiaoZhi-ESP32 β the open-source firmware with 26,000+ GitHub stars that powers this exact kit. It supports both DeepSeek and OpenAI APIs out of the box, plus custom WebSocket endpoints for local LLMs.
2.1 Install esptool
pip install esptool
2.2 Download the firmware
git clone https://github.com/78/xiaozhi-esp32.git
cd xiaozhi-esp32
2.3 Flash to the ESP32-S3
Connect the USB-C cable. On Windows (from WSL):
esptool.py --chip esp32s3 --port /dev/ttyUSB0 erase_flash
esptool.py --chip esp32s3 --port /dev/ttyUSB0 write_flash -z 0x0 build/bootloader/bootloader.bin 0x10000 build/partition_table/partition-table.bin 0x20000 build/xiaozhi.bin
On Windows directly (PowerShell):
esptool.py --chip esp32s3 --port COM3 erase_flash
esptool.py --chip esp32s3 --port COM3 write_flash -z 0x0 .\build\bootloader\bootloader.bin 0x10000 .\build\partition_table\partition-table.bin 0x20000 .\build\xiaozhi.bin
Step 3: Configure Wi-Fi + Tailscale
After flashing, the ESP32-S3 creates a hotspot. Connect to it from your phone or laptop:
- Join Wi-Fi network:
XiaoZhi-Config - Open
http://192.168.4.1in a browser - Enter your home Wi-Fi credentials
- Enter your LLM serverβs Tailscale IP (e.g.,
100.xxx.xxx.xxx) - Set the WebSocket port (default:
8080)
The ESP32 will reboot and connect to your network.
Step 4: Set Up the Local LLM Server
On your NukBox or any machine with a GPU:
Option A: Ollama (Simplest)
curl -fsSL https://ollama.ai/install.sh | sh
ollama pull llama3:8b
ollama pull qwen2.5:7b
Option B: vLLM (Production)
pip install vllm
vllm serve mistralai/Mistral-7B-Instruct-v0.3 --host 0.0.0.0 --port 8000
Step 5: Install Speech-to-Text + Text-to-Speech
On the LLM server, run this bridge:
# STT: Whisper
pip install faster-whisper
# TTS: Piper (offline, fast)
pip install piper-tts
Or use the XiaoClaw firmware which bundles STT/TTS directly on the ESP32:
git clone https://github.com/beancookie/xiaoclaw.git
Step 6: Wire It All Together with n8n
Create an n8n workflow (or a simple Python WebSocket server) that:
ESP32 Voice Input β WebSocket β n8n/Webhook
β Whisper STT (audio β text)
β Ollama/vLLM (text β response)
β Piper TTS (response β audio)
β WebSocket back to ESP32
β Speaker plays response
This keeps your voice data air-gapped β it never leaves your home network.
Step 7: Test the Wake Word
Say your wake word. The ESP-SR engine runs entirely on the ESP32-S3 β no internet, no cloud. It fires instantly.
First test: βHey Dragon, whatβs the weather?β Response: βI donβt have internet access, but your RTX 3060 is at 42Β°C and you have 3 unread emails.β
[ADVANCED_MODS]
Connect to Alexa as a Speaker
The ESP32βs MAX98357A amplifier outputs via I2S. To use your Alexa Echo as the speaker:
- Option A: ESP32 I2S β 3.5mm aux jack β Alexa line-in port
- Option B: ESP32 Bluetooth β pair to Alexa as Bluetooth speaker
- Option C: Use the onboard 3W speaker for a fully self-contained unit
3D Printed Chassis
Design a custom enclosure on your Bambu A1:
- Cutout for the 2β TFT display (visible through the shell)
- Mic port aligned with the INMP441
- Speaker grille for sound passthrough
- Ventilation slots for the ESP32 (enclosed PLA needs airflow)
Add LED Eyes
Wire WS2812B NeoPixels to GPIO 48. Program animated eye patterns that βblinkβ while the LLM thinks.
<PartsList parts={[ { name: βLAFVIN ESP32-S3 AI Chatbot Kitβ, link: βhttps://amzn.to/lafvin-esp32-s3-kitβ, price: β$51.03β }, { name: βN100 Mini PC (LLM Server)β, link: βhttps://amzn.to/beelink-s12-proβ, price: β$169.00β }, { name: βUSB-C Data Cableβ, link: βhttps://amzn.to/usbc-data-cableβ, price: β$7.99β }, { name: βWS2812B NeoPixel Ring (optional)β, link: βhttps://amzn.to/neopixel-ringβ, price: β$9.99β }, { name: β3.5mm Aux Cable (Alexa mod)β, link: βhttps://amzn.to/aux-cableβ, price: β$5.99β } ]} />
<RelatedTools tools={[ { title: βXiaoZhi-ESP32β, link: βhttps://github.com/78/xiaozhi-esp32β, description: β26K+ star open-source firmware for ESP32 voice chat. Pre-loaded MCP support.β }, { title: βXiaoClawβ, link: βhttps://github.com/beancookie/xiaoclawβ, description: βLocal AI Agent firmware with tool calling, memory, and autonomous task execution.β }, { title: βOllamaβ, link: βhttps://ollama.aiβ, description: βRun Llama 3, Qwen 2.5, Mistral locally with one command.β }, { title: βTailscaleβ, link: βhttps://tailscale.comβ, description: βMesh VPN connecting your ESP32, server, and phone on one private network.β }, { title: βn8nβ, link: βhttps://n8n.ioβ, description: βWorkflow automation to orchestrate STT β LLM β TTS pipelines.β } ]} />
[TRouBLeSHooTiNg]
| Symptom | Fix |
|---|---|
| ESP32 wonβt flash | Hold BOOT button while plugging in USB-C |
| No audio input | Check I2S pin mapping in config (default: WS=4, SCK=5, SD=6) |
| LLM connection refused | Verify Tailscale IP and that firewall allows port 8080 |
| Wake word not triggering | Flash ESP-SR model appropriate for your language (en/cn) |
| Choppy audio over LTE | Force Tailscale DERP relay mode for UDP-blocked networks |
[NEXT_STEPS]
Now that your voice assistant is live, extend it:
- [Add tool calling] β Let it control smart home devices, check system stats, or trigger n8n workflows
- [3D print a dragon chassis] β See our Dragon Smart Speaker build
- [Run fully offline] β Switch to XiaoClaw firmware for local LLM inference directly on ESP32-S3
- [Build a cyberdeck terminal] β See our NerdDeck build guide
Transmission complete. Your voice assistant is no longer a corporate listening device. Itβs yours.