Transientica
Beatbox into a mic, play a rhythm game. No MIDI controller, no drum pad - just your mouth.
Gameplay demo - beatboxing against beatmapped songs
From Prototype to Game
This started as a school research project in Python. Beatbox into a mic, classify the sound with SVM, send results to Unity over OSC. It worked, barely. Two processes running, 180ms latency at first, got it down to 45ms after weeks of optimization. But it was still a prototype, not a game.
So I rebuilt the whole thing in Unity from scratch. Threw out Python, OSC, scikit-learn, everything. Wrote the FFT, feature extraction, and classifier in plain C#. No external ML frameworks, no dependencies I don't control. Switched from SVM to KNN because it trains instantly and I can retrain between rounds without the player waiting.
Now it's an actual rhythm game. Notes scroll toward you, you beatbox the right sound at the right time, the game judges your timing. Combos, scoring, grades. The classifier trains on your voice specifically - 10 samples per sound, takes a few seconds, and it learns how you do a kick vs how I do a kick.
How It Works
The Audio Pipeline
Mic captures at 44.1 kHz into a lock-free ring buffer. When the onset detector picks up a hit (RMS threshold + derivative gate, adaptive noise floor from the first 40 buffers), it runs a 512-sample FFT and pulls out 7 spectral features. KNN classifies the sound. Onsets get backdated by ~5.8ms - half a 512-sample window at 44.1 kHz - to account for the analysis window. All timing runs on AudioSettings.dspTime, not frame time. You need sub-frame precision for this to feel right.
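The onset gate above can be sketched roughly like this. This is illustrative Python, not the game's C# - the class name, the 1.5x derivative factor, and the 2.5x ratio over the noise floor are assumptions, not the shipped values:

```python
import math

class OnsetDetector:
    def __init__(self, calibration_buffers=40, ratio=2.5):
        self.noise_floor = 0.0
        self.seen = 0
        self.calibration_buffers = calibration_buffers
        self.ratio = ratio        # RMS must exceed noise_floor * ratio (assumed factor)
        self.prev_rms = 0.0

    def rms(self, buf):
        return math.sqrt(sum(s * s for s in buf) / len(buf))

    def process(self, buf):
        """Return True when this buffer looks like the start of a hit."""
        r = self.rms(buf)
        if self.seen < self.calibration_buffers:
            # adaptive noise floor: average RMS of the first N buffers
            self.noise_floor += r / self.calibration_buffers
            self.seen += 1
            self.prev_rms = r
            return False
        rising = r > self.prev_rms * 1.5   # derivative gate: energy must jump, not just be loud
        loud = r > self.noise_floor * self.ratio
        self.prev_rms = r
        return loud and rising
```

The two conditions together are what keep sustained room noise from firing the detector: a loud fan passes the threshold but never the derivative gate.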
Training
You record 10+ examples of each sound - kick, snare, hi-hat. The game runs onset detection and feature extraction on each one automatically. KNN stores the samples, computes normalization bounds, and does leave-one-out cross-validation to check if the model is any good. If accuracy is above 70%, you're set. If not, record cleaner samples. The whole thing saves to a JSON file and loads next time you open the game. If there's no trained model at all, a heuristic fallback uses spectral rules so the game is still playable.
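The leave-one-out check works like this: classify each training sample against all the others and count how often the label agrees. A minimal Python sketch (the game is C#; the function name and the 1-NN simplification are mine, for brevity):

```python
def loo_accuracy(samples, k=1):
    """samples: list of (feature_vector, label). Returns fraction classified correctly."""
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

    correct = 0
    for i, (vec, label) in enumerate(samples):
        rest = samples[:i] + samples[i + 1:]        # hold out sample i
        neighbors = sorted(rest, key=lambda s: dist(s[0], vec))[:k]
        votes = {}
        for _, l in neighbors:
            votes[l] = votes.get(l, 0) + 1
        if max(votes, key=votes.get) == label:
            correct += 1
    return correct / len(samples)

# ready = loo_accuracy(training_samples) >= 0.70   # the 70% gate described above
```

Because KNN "training" is just storing samples, this validation is nearly free - no model has to be refit for each held-out sample.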
Scoring
Each hit gets matched to the nearest note. ±60ms is Perfect (300 pts), ±130ms is Great (200), ±200ms is Good (100), anything else is a Miss. Combo multiplier goes from 1x to 8x as you chain hits. Letter grades from F to S. There's a latency calibration step: the game plays clicks at 100 BPM, you beatbox on beat, it measures the difference and stores your offset. You can also fine-tune it manually by ±1ms.
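The timing judgement reduces to a few window comparisons. A Python sketch for illustration (the game is C#; the windows and point values come from the text, the function shape is an assumption):

```python
def judge(hit_time, note_time, offset=0.0):
    """offset: the player's calibrated latency, in seconds, subtracted from the hit."""
    delta = abs((hit_time - offset) - note_time)
    if delta <= 0.060:
        return "Perfect", 300
    if delta <= 0.130:
        return "Great", 200
    if delta <= 0.200:
        return "Good", 100
    return "Miss", 0
```

Applying the calibration offset here, before the window check, is why a player with 50ms of mic latency can still hit Perfects without consciously playing early.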
What Changed From the Prototype
Killed the Python Backend
Why: The original had Python doing audio analysis and Unity running the game - two separate processes talking over OSC on localhost. It worked, but it was fragile. If Python crashed, the game went deaf. Debugging across two runtimes was painful. Building a standalone .exe meant shipping a Python install alongside Unity.
Now: Everything is C#, everything is in Unity. The FFT, feature extraction, KNN - all written from scratch. No Python, no OSC, no external ML packages. One process, one build, one thing to debug.
SVM → KNN
Why: SVM was great for accuracy in the prototype (92% with scikit-learn). But retraining took seconds, needed a Python call, and the model format didn't play well with Unity serialization. I needed something that trains instantly and is trivial to implement in C#.
Now: KNN with K=5 and inverse-distance weighting. Training is just "store the samples" - it's instant. Cross-validation runs in milliseconds. The whole trained model is a JSON file with feature vectors and normalization bounds. Easy to save, easy to load, easy to debug.
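Inverse-distance weighting means the closest of the 5 neighbors counts most in the vote. A Python sketch of that classification step (illustrative, not the game's C#; the epsilon guard against division by zero is an assumption):

```python
def classify(query, samples, k=5):
    """samples: list of (feature_vector, label); vectors assumed already normalized."""
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

    nearest = sorted(samples, key=lambda s: dist(s[0], query))[:k]
    votes = {}
    for vec, label in nearest:
        # weight each neighbor's vote by 1/distance, so near matches dominate
        votes[label] = votes.get(label, 0.0) + 1.0 / (dist(vec, query) + 1e-6)
    return max(votes, key=votes.get)
```

The weighting matters with only 10 samples per class: a query sitting right on top of one kick sample shouldn't be outvoted by three mediocre snare matches.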
Made It an Actual Game
Before: The prototype was a tech demo. You could beatbox and see it classified. There was a basic rhythm element but no real game loop - no scoring, no progression, no reason to keep playing.
Now: Beatmapped songs, scrolling notes with approach animations, combo system, letter grades, high score tracking. Full game flow: menu, training, song select, gameplay, results. Object pooling on notes (20 per type) so nothing allocates during gameplay. All UI built at runtime in code - no prefabs, no TextMeshPro, just legacy Text and a color theme class.
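The note pool idea, sketched in Python for illustration - in Unity the pool would recycle GameObjects by toggling SetActive rather than allocating; the class name, dict stand-ins, and overflow behavior here are assumptions:

```python
class NotePool:
    def __init__(self, size=20):
        # preallocate every note up front so gameplay never allocates
        self.free = [{"active": False} for _ in range(size)]
        self.active = []

    def spawn(self):
        # reuse a pooled note; fall back to allocating only if the pool is exhausted
        note = self.free.pop() if self.free else {"active": False}
        note["active"] = True
        self.active.append(note)
        return note

    def despawn(self, note):
        note["active"] = False
        self.active.remove(note)
        self.free.append(note)
```

The payoff in Unity is avoiding garbage collection spikes mid-song - a GC pause during a note stream reads as a timing glitch to the player.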
Original Prototype (2025)
The Python + Unity version. SVM classification, OSC messaging, 45ms latency after optimization. This is what the current build replaced.
Prototype demo - Python backend, Unity frontend, OSC between them