Latent Phrase-Aware Generative Modeling for Expressive Symbolic Audio Synthesis
- Authors
Apeksha Bhuekar
- Keywords:
- Generative AI, Symbolic Music Synthesis, Compact Tokenization, Phrase-Aware Latent Alignment, Sequence-Level Regularization, Controllable Generation
- Abstract
Generating expressive symbolic music is challenging: a good generator must capture long-range musical structure and fine-grained performance features at the same time. Traditional sequence-based methods typically focus on pitch and timing while offering limited support for expressive techniques such as bends, slides, vibrato, and dynamic articulation. In this paper, we propose a generative framework that combines a compact tokenization scheme with a phrase-aware latent alignment mechanism to improve the quality and controllability of symbolic audio synthesis. The tokenization scheme represents both basic musical events and expressive performance attributes with a small vocabulary, substantially reducing computational cost without loss of semantic information. Phrase-level latent representations are injected into transformer attention through a KL-divergence-based bias, allowing the model to learn structural dependencies across musical phrases of variable length. A multi-objective optimization framework that combines sequence-level regularization with a repetition-aware loss further improves generation quality by suppressing redundant expressive patterns. Experiments on a guitar tablature dataset show that our model outperforms established transformer-based baselines in perplexity, diversity, speed, and expressiveness. These findings demonstrate that the proposed framework achieves coherence, expressiveness, and computational efficiency in symbolic music generation.
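The abstract does not specify how the KL-divergence-based bias enters attention. The sketch below is one plausible reading, offered for illustration only. It assumes the following: each token carries a diagonal-Gaussian phrase latent (mean and variance), and the bias for query i attending to key j is -λ·KL(q_i || q_j). This bias is added to the scaled dot-product logits before the softmax, so tokens whose phrase latents diverge attend to each other less. The names `gaussian_kl`, `phrase_biased_attention`, and `lam` are hypothetical. The sketch uses a single head and no masking, and it is written in pure Python for readability. It is not the author's implementation.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def gaussian_kl(mu_p: Vector, var_p: Vector, mu_q: Vector, var_q: Vector) -> float:
    """KL(p || q) between two diagonal Gaussians, summed over dimensions."""
    kl = 0.0
    for mp, vp, mq, vq in zip(mu_p, var_p, mu_q, var_q):
        kl += 0.5 * (math.log(vq / vp) + (vp + (mp - mq) ** 2) / vq - 1.0)
    return kl


def softmax(row: List[float]) -> List[float]:
    """Numerically stable softmax over one row of logits."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def phrase_biased_attention(
    queries: Sequence[Vector],
    keys: Sequence[Vector],
    values: Sequence[Vector],
    phrase_mu: Sequence[Vector],
    phrase_var: Sequence[Vector],
    lam: float = 1.0,  # hypothetical bias strength
) -> Tuple[List[List[float]], List[List[float]]]:
    """Single-head attention whose logits get an additive -lam * KL phrase bias.

    Assumption: token i's phrase latent is N(phrase_mu[i], diag(phrase_var[i])).
    Returns (outputs, attention_weights).
    """
    n = len(queries)
    scale = 1.0 / math.sqrt(len(queries[0]))
    outputs, weights = [], []
    for i in range(n):
        logits = []
        for j in range(n):
            dot = sum(a * b for a, b in zip(queries[i], keys[j])) * scale
            bias = -lam * gaussian_kl(phrase_mu[i], phrase_var[i],
                                      phrase_mu[j], phrase_var[j])
            logits.append(dot + bias)
        w = softmax(logits)
        weights.append(w)
        # Weighted sum of value vectors for query i.
        outputs.append([sum(w[j] * values[j][k] for j in range(n))
                        for k in range(len(values[0]))])
    return outputs, weights


if __name__ == "__main__":
    # Three tokens with identical queries/keys: tokens 0 and 1 share a phrase
    # latent, token 2 belongs to a different phrase.
    q = [[1.0, 0.0]] * 3
    v = [[1.0], [2.0], [3.0]]
    mu = [[0.0], [0.0], [2.0]]
    var = [[1.0], [1.0], [1.0]]
    _, w = phrase_biased_attention(q, q, v, mu, var, lam=1.0)
    print([round(x, 3) for x in w[0]])
```

In the usage example, the content logits are identical, so the bias alone decides the weights. Token 0 down-weights token 2, whose phrase latent is two standard deviations away (KL = 2). Setting `lam=0.0` recovers uniform attention. KL divergence is asymmetric, so the resulting bias matrix is generally not symmetric. A symmetric divergence would be a drop-in alternative if that property were wanted.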
- Published
- 2026-06-29
- Issue
- Vol. 1 No. 3 (2026)
- Section
- Articles
- License
Copyright (c) 2026 International Journal of Intelligent Systems and Data Science

This work is licensed under a Creative Commons Attribution 4.0 International License.
