Latent Phrase-Aware Generative Modeling for Expressive Symbolic Audio Synthesis

Apeksha Bhuekar

doi:10.67231/8pgacf28

Keywords:

Generative AI, Symbolic Music Synthesis, Compact Tokenization, Phrase-Aware Latent Alignment, Sequence-Level Regularization, Controllable Generation

Abstract

Generating expressive symbolic music is difficult because a generator must capture long-range musical structure and fine-grained performance features at the same time. Traditional sequence-based methods typically focus on pitch and timing and offer limited support for expressive techniques such as bends, slides, vibrato, and dynamic articulation. In this paper, we propose a generative framework that combines a compact tokenization scheme with a phrase-aware latent alignment mechanism to improve the quality and controllability of symbolic audio synthesis. The tokenization scheme represents both basic musical events and expressive performance attributes with a limited vocabulary, yielding substantial computational savings without semantic loss. Phrase-level latent representations are injected into the transformer attention through a KL-divergence-based bias, so that the structural dependencies of variable-length musical phrases can be learned. A multi-objective optimization framework that combines sequence-level regularization with a repetition-aware loss improves generation quality by reducing redundant expressive patterns. In experiments on a guitar tablature dataset, our model outperforms established transformer-based baselines in perplexity, diversity, speed, and expressiveness. These findings indicate that the proposed framework generates coherent, expressive symbolic music with high computational efficiency.
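
The abstract does not give the form of the KL-divergence-based attention bias, so the following is only an illustrative sketch of one possible reading, not the paper's implementation. It assumes that each phrase carries a diagonal Gaussian latent, that the bias between two token positions is the negative KL divergence between the latents of their phrases, and that this bias is added to the raw attention scores before the softmax. The names `gaussian_kl`, `biased_attention`, `phrase_ids`, `phrase_latents`, and `strength` are hypothetical, and causal masking is omitted for brevity.

```python
import math


def gaussian_kl(mu_p, var_p, mu_q, var_q):
    """KL(p || q) for two diagonal Gaussians given as lists of means and variances."""
    total = 0.0
    for mp, vp, mq, vq in zip(mu_p, var_p, mu_q, var_q):
        total += 0.5 * (math.log(vq / vp) + (vp + (mp - mq) ** 2) / vq - 1.0)
    return total


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    peak = max(row)
    exps = [math.exp(x - peak) for x in row]
    norm = sum(exps)
    return [e / norm for e in exps]


def biased_attention(scores, phrase_ids, phrase_latents, strength=1.0):
    """Add an assumed KL-based phrase bias to raw attention scores, then normalize rows.

    scores:         n x n raw attention scores (query i, key j)
    phrase_ids:     phrase index of each token position
    phrase_latents: phrase index -> (means, variances) of that phrase's latent
    strength:       hypothetical scale factor for the bias
    """
    n = len(scores)
    weights = []
    for i in range(n):
        mu_i, var_i = phrase_latents[phrase_ids[i]]
        row = []
        for j in range(n):
            mu_j, var_j = phrase_latents[phrase_ids[j]]
            # Larger divergence between phrase latents -> more negative bias.
            bias = -strength * gaussian_kl(mu_i, var_i, mu_j, var_j)
            row.append(scores[i][j] + bias)
        weights.append(softmax(row))
    return weights


# Toy input: four tokens in two phrases of two tokens each, with uniform raw scores.
phrase_ids = [0, 0, 1, 1]
phrase_latents = {
    0: ([0.0, 0.0], [1.0, 1.0]),
    1: ([2.0, 0.0], [1.0, 1.0]),
}
scores = [[0.0] * 4 for _ in range(4)]
weights = biased_attention(scores, phrase_ids, phrase_latents)
```

With uniform raw scores, the bias alone shifts attention toward tokens whose phrase latent is close to that of the query token, which is the qualitative behavior a phrase-aware bias is meant to encourage. Because the bias is computed per phrase pair rather than per position, phrases of different lengths need no special handling in this sketch.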

Published

2026-06-29

Issue

Vol. 1 No. 3 (2026)

Section

Articles

License

This work is licensed under a Creative Commons Attribution 4.0 International License.