Transientica

Beatbox into a mic, play a rhythm game. No MIDI controller, no drum pad - just your mouth.

Role Solo Developer
Status In Development
Engine Unity 6 + C#
Year 2025 - present
Unity 6 C# KNN FFT URP

Gameplay demo - beatboxing against beatmapped songs

From Prototype to Game

This started as a school research project in Python. Beatbox into a mic, classify the sound with SVM, send results to Unity over OSC. It worked, barely. Two processes running, 180ms latency at first, got it down to 45ms after weeks of optimization. But it was still a prototype, not a game.

So I rebuilt the Unity version from scratch. Threw out Python, OSC, and scikit-learn. Wrote the FFT, feature extraction, and classifier in plain C#. No external ML frameworks. Switched from SVM to KNN because it trains fast enough to retrain between rounds without the player waiting.

Now it's an actual rhythm game. Notes scroll toward you, you beatbox the right sound at the right time, the game judges your timing. Combos, scoring, grades. The classifier trains on your voice specifically - 10 samples per sound, takes a few seconds, and it learns how you do a kick vs how I do a kick.

How It Works

The Audio Pipeline

Mic captures at 44.1 kHz into a lock-free ring buffer. When the onset detector picks up a hit (RMS threshold + derivative gate, adaptive noise floor from the first 40 buffers), it runs a 512-sample FFT and pulls out 7 spectral features. KNN classifies the sound. Onsets get backdated by ~5.8ms to account for the analysis window. All timing runs on AudioSettings.dspTime, not frame time. You need sub-frame precision for this to feel right.

Training

You record 10+ examples of each sound - kick, snare, hi-hat. The game runs onset detection and feature extraction on each one automatically. KNN stores the samples, computes normalization bounds, and does leave-one-out cross-validation to check if the model is any good. If accuracy is above 70%, you're set. If not, record cleaner samples. The model saves to a JSON file and loads next time you open the game. If there's no trained model at all, a heuristic fallback uses spectral rules so the game is still playable.

Scoring

Each hit gets matched to the nearest note. ±60ms is Perfect (300 pts), ±130ms is Great (200), ±200ms is Good (100), anything else is a Miss. Combo multiplier goes from 1x to 8x as you chain hits. Letter grades from F to S. There's a latency calibration step: the game plays clicks at 100 BPM, you beatbox on beat, it measures the difference and stores your offset. You can also fine-tune it manually by ±1ms.

What Changed From the Prototype

Killed the Python Backend

Why

The original had Python doing audio analysis and Unity running the game - two separate processes talking over OSC on localhost. It worked but it was fragile. If Python crashed, the game went deaf. Debugging across two runtimes was painful. Building a standalone .exe meant shipping a Python install alongside Unity.

Now

The rebuild is C# inside Unity. The FFT, feature extraction, KNN - all written from scratch. No Python, no OSC, no external ML packages. One process, one build, one thing to debug.

SVM → KNN

Why

SVM was great for accuracy in the prototype (92% with scikit-learn). But retraining took seconds, needed a Python call, and the model format didn't play well with Unity serialization. I needed something fast enough to retrain between rounds and simple to implement in C#.

Now

KNN with K=5 and inverse-distance weighting. Training mostly stores the samples, so it runs fast enough to retrain between rounds. Cross-validation runs in milliseconds. The trained model is a JSON file with feature vectors and normalization bounds. Easy to save, easy to load, easy to debug.

Made It an Actual Game

Before

The prototype was a tech demo. You could beatbox and see it classified. There was a basic rhythm element but no real game loop - no scoring, no progression, no reason to keep playing.

Now

Beatmapped songs, scrolling notes with approach animations, combo system, letter grades, high score tracking. Full game flow: menu, training, song select, gameplay, results. Object pooling on notes (20 per type) so note spawning does not allocate during gameplay. All UI built at runtime in code - no prefabs, no TextMeshPro, just legacy Text and a color theme class.

Original Prototype (2025)

The Python + Unity version. SVM classification, OSC messaging, 45ms latency after optimization. This is what the current build replaced.

Prototype demo - Python backend, Unity frontend, OSC between them

Audio notes

Original Prototype (2025). The Python + Unity version. SVM classification, OSC messaging, 45ms latency after optimization. This is what the current build replaced.