🎮 Research Project

Transientica: AudioLab

A beatbox-controlled rhythm game built on real-time ML classification, spectral-flux onset detection, and low-latency OSC messaging. It replaces button presses with vocal percussion.

🎤 Real-time ML Classification
⚡ 45ms Median Latency
🐍 Python + Unity
🎵 Beatbox Recognition
📊 Spectral Flux Analysis
🎯 82% Hit Accuracy
Gameplay Demo

See It in Action

Performance Metrics

Real-time Performance Results

45ms Median Latency
64ms 90th-Percentile Latency
82.1% Hit Accuracy
80.2% F1 Score
Project Overview

Revolutionary Vocal Percussion Interface

Transientica replaces traditional button presses with beatboxing in rhythm games. Using real-time audio analysis, machine learning classification, and OSC networking, the system achieves a 45ms median end-to-end latency while maintaining 82% hit accuracy for vocal percussion gameplay.

This research explores whether beatboxing alone can satisfy the tight timing requirements of competitive rhythm games, combining spectral-flux onset detection with user-trained RBF-SVM classification for kick, snare, and hi-hat sounds. The system uses a Python backend for audio processing and a Unity frontend for gameplay, connected via OSC messaging.

Technology Stack

Technical Implementation

🎤

Audio Processing

Audio is sampled at 44.1 kHz and processed in 2048-sample blocks. Spectral-flux onset detection triggers MFCC feature extraction, producing 39-dimensional vectors for real-time classification.
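A minimal Python sketch of this stage, using NumPy and librosa. The flux threshold value and the 13-MFCC + delta + delta-delta layout of the 39-dimensional vector are assumptions, not confirmed details of the project:

```python
import numpy as np
import librosa

SR = 44100      # 44.1 kHz sampling rate
BLOCK = 2048    # samples per block

class OnsetDetector:
    """Streaming spectral-flux onset detector over fixed-size audio blocks."""

    def __init__(self, threshold=2.0):   # threshold is a hypothetical tuning value
        self.prev_mag = None
        self.threshold = threshold

    def process(self, block):
        mag = np.abs(np.fft.rfft(block * np.hanning(len(block))))
        if self.prev_mag is None:
            self.prev_mag = mag
            return False
        # Half-wave rectified spectral flux: sum of positive bin-wise increases.
        flux = np.sum(np.maximum(mag - self.prev_mag, 0.0))
        self.prev_mag = mag
        return flux > self.threshold

def extract_features(block, sr=SR):
    """13 MFCCs plus deltas and delta-deltas, averaged over the block -> 39 dims."""
    y = block.astype(np.float32)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13, n_fft=2048, hop_length=512)
    d1 = librosa.feature.delta(mfcc, width=3)
    d2 = librosa.feature.delta(mfcc, width=3, order=2)
    return np.concatenate([mfcc, d1, d2]).mean(axis=1)   # shape (39,)
```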

🧠

Machine Learning

User-trained RBF-SVM classifier (C=10, γ=0.01) for beatbox sound recognition. Achieves 0.16ms inference time with 80%+ accuracy on kick, snare, and hi-hat sounds.
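A sketch of the classifier in scikit-learn with the stated hyperparameters; the StandardScaler step is an assumption, and `X` stands for a matrix of the 39-dimensional vectors described above:

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

def train_classifier(X, y):
    """X: (n_samples, 39) feature matrix; y: labels such as "kick"/"snare"/"hihat"."""
    clf = make_pipeline(
        StandardScaler(),                       # feature scaling (an assumption)
        SVC(kernel="rbf", C=10, gamma=0.01),    # hyperparameters from the description
    )
    clf.fit(X, y)
    return clf

# At runtime, classify the vector extracted at each detected onset:
# label = clf.predict(feature_vector.reshape(1, -1))[0]
```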

🔗

OSC Communication

Low-latency UDP messaging between the Python backend and the Unity frontend. Timestamped events enable precise latency measurement and synchronization.
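A minimal sender using python-osc; the address, port, and message layout are assumptions, and the Unity side would listen with a matching OSC receiver:

```python
import time
from pythonosc.udp_client import SimpleUDPClient

client = SimpleUDPClient("127.0.0.1", 9000)   # Unity listener address/port (assumed)

def send_hit(label: str, detected_at: float):
    """Send a classified hit with its detection timestamp so the receiver can
    log per-event latency against its own clock."""
    client.send_message("/transientica/hit", [label, detected_at])

# Example: after the classifier fires
# send_hit("kick", time.perf_counter())
```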

🎮

Game Engine

Unity-based rhythm game with JSON beatmaps, hit detection windows (±70ms), and real-time feedback systems for responsive vocal percussion gameplay.
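The game logic itself lives in Unity (C#), but the beatmap format and the ±70ms judgment can be sketched language-agnostically. The Python below is illustrative only, with assumed field names (`time`, `sound`):

```python
import json

HIT_WINDOW = 0.070   # ±70 ms, matching the hit window described above

def load_beatmap(path):
    """Beatmap JSON, e.g. {"bpm": 120, "notes": [{"time": 1.0, "sound": "kick"}, ...]}."""
    with open(path) as f:
        return json.load(f)["notes"]

def judge_hit(hit_time, hit_label, notes):
    """Return the matching unjudged note within the hit window, else None (a miss)."""
    for note in notes:
        if note.get("judged"):
            continue
        if note["sound"] == hit_label and abs(hit_time - note["time"]) <= HIT_WINDOW:
            note["judged"] = True
            return note
    return None
```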

Research Impact

Key Contributions

Low-latency Vocal Interface

Achieved a 45ms median end-to-end latency (64ms at the 90th percentile), suitable for competitive rhythm gaming and demonstrating that vocal percussion can match traditional button-based input methods.

Real-time Beatbox Classification

User-trained machine learning models with 80%+ accuracy on kick, snare, and hi-hat sounds. Personalized training adapts to individual vocal techniques and styles.

OSC-based Architecture

Modular system design enabling integration with DAWs, lighting systems, and haptic feedback devices. Opens possibilities for expanded creative applications beyond gaming.

Personalized Training Approach

Local ML training avoids privacy concerns and cloud latency issues. Users can train models on their own vocal characteristics for optimal recognition performance.
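As a rough sketch of that local workflow (function and file names are hypothetical), a training session could fit the RBF-SVM on the user's recorded feature vectors and persist it to disk without any network access:

```python
import numpy as np
import joblib
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

def train_session(samples, model_path="beatbox_svm_user.joblib"):
    """samples: dict mapping "kick"/"snare"/"hihat" to lists of 39-dim feature
    vectors recorded from this user's microphone."""
    X = np.vstack([v for vecs in samples.values() for v in vecs])
    y = np.array([label for label, vecs in samples.items() for _ in vecs])
    clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=10, gamma=0.01))
    clf.fit(X, y)
    joblib.dump(clf, model_path)   # stored locally; nothing leaves the machine
    return clf
```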