A research-oriented implementation combining Transformer-based symbolic planning, diffusion-based audio synthesis, and preference alignment — demonstrating an end-to-end pipeline for generating MIDI and audio outputs with a modular deep learning architecture.
Implementation of the hybrid three-layer AI music generation framework from the paper "AI in Music Generation", exactly as proposed.
src/models/symbolic_planner.py
src/models/audio_renderer.py
src/models/alignment.py

Five stages from raw data to generated music. Small configs for single-GPU prototyping; full configs scale to 8×A100.
Runs the project's hybrid prototype pipeline end-to-end — Symbolic Planning → Audio Rendering → DPO Alignment — generating a brand-new audio clip from your prompt, entirely in the browser. No server required.
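The DPO Alignment stage named above refers to Direct Preference Optimization. As a rough illustration of the idea, the following is a minimal sketch of the standard per-pair DPO objective in plain Python. It is not taken from src/models/alignment.py; the function name `dpo_loss`, its parameter names, and the default `beta` are assumptions made for illustration only.

```python
import math


def dpo_loss(policy_logp_chosen, policy_logp_rejected,
             ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """Per-pair DPO loss: -log(sigmoid(beta * margin)).

    Each argument is the summed log-probability of one clip (for example,
    a token sequence) under the trainable policy or the frozen reference
    model. All names here are hypothetical.
    """
    # How much more the policy prefers the chosen clip over the rejected
    # one, measured relative to the reference model.
    margin = ((policy_logp_chosen - ref_logp_chosen)
              - (policy_logp_rejected - ref_logp_rejected))
    z = beta * margin
    # -log(sigmoid(z)) equals softplus(-z); this form is numerically stable.
    return max(-z, 0.0) + math.log1p(math.exp(-abs(z)))


if __name__ == "__main__":
    # The policy already favors the chosen clip, so the loss is below log(2).
    print(dpo_loss(-1.0, -3.0, -2.0, -2.0))
```

When the policy and reference agree exactly, the margin is zero and the loss equals log(2); the loss falls as the policy shifts probability toward the preferred clip.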
Four prototype-scale outputs from ./outputs/, provided as .wav files, demonstrating cross-genre generation.
Four curated datasets covering pop, classical, and mixed-genre music, plus evaluation metadata.
Note: The provided outputs represent sample results generated using a prototype implementation. Due to computational constraints, full-scale training was not performed, and the outputs are included to demonstrate the system pipeline and generation capability.
--skip_neural_render for FluidSynth fallback.

Two commands from clone to generated music. The first run takes about 5–10 minutes for environment setup.
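The --skip_neural_render flag noted above switches rendering to FluidSynth. One way such a switch could be wired up is sketched below. Only the flag name comes from this page; the helper names (`build_parser`, `choose_renderer`, `fluidsynth_command`), the file paths, and the default sample rate are hypothetical, and the project's actual entry point may differ.

```python
import argparse


def build_parser():
    # Hypothetical CLI; only the flag name is taken from the project page.
    parser = argparse.ArgumentParser(description="Sketch of a generation CLI")
    parser.add_argument(
        "--skip_neural_render",
        action="store_true",
        help="render MIDI with FluidSynth instead of the neural renderer",
    )
    return parser


def choose_renderer(args):
    # Fall back to FluidSynth only when the flag is set.
    return "fluidsynth" if args.skip_neural_render else "neural"


def fluidsynth_command(midi_path, wav_path, soundfont_path, sample_rate=44100):
    # Typical non-interactive FluidSynth invocation that writes a WAV file.
    # Assumes the fluidsynth binary and a SoundFont (.sf2) are installed.
    return ["fluidsynth", "-ni", soundfont_path, midi_path,
            "-F", wav_path, "-r", str(sample_rate)]


if __name__ == "__main__":
    args = build_parser().parse_args(["--skip_neural_render"])
    if choose_renderer(args) == "fluidsynth":
        # Placeholder paths; pass this list to subprocess.run to render.
        print(fluidsynth_command("plan.mid", "out.wav", "soundfont.sf2"))
```

Keeping the renderer choice in one small function makes the fallback easy to check without a GPU or an audio toolchain installed.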