Latent Phrase-Aware Generative Modeling for Expressive Symbolic Audio Synthesis

Authors
  • Apeksha Bhuekar


Keywords:
Generative AI, Symbolic Music Synthesis, Compact Tokenization, Phrase-Aware Latent Alignment, Sequence-Level Regularization, Controllable Generation
Abstract

Generating expressive symbolic music is difficult because a generator must capture long-range musical structure and fine-grained performance features at the same time. Traditional sequence-based methods typically focus on pitch and timing information while offering limited support for expressive techniques such as bends, slides, vibrato, and dynamic articulation. In this paper, we propose a generative framework that combines a compact tokenization scheme with a phrase-aware latent alignment mechanism to improve the quality and controllability of symbolic audio synthesis. The tokenization scheme represents both basic musical events and expressive performance attributes with a limited vocabulary, yielding substantial computational savings without loss of semantic content. Phrase-level latent representations are injected into the transformer attention through a KL-divergence-based bias, so that the model can learn structural dependencies across variable-length musical phrases. A multi-objective optimization framework, which combines sequence-level regularization with a repetition-aware loss, improves generation quality by reducing redundant expressive patterns. In experiments on a guitar tablature dataset, our model outperforms established transformer-based baselines in perplexity, diversity, speed, and expressiveness. These findings indicate that the proposed framework generates coherent and expressive symbolic music while remaining computationally efficient.
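
The abstract does not give the exact form of the KL-divergence-based attention bias, so the following is only a minimal sketch of one plausible reading, not the authors' implementation. It assumes each phrase has a diagonal-Gaussian latent, and that the attention logit between two tokens is reduced by a hypothetical weight `lam` times the KL divergence between the latents of their phrases. All names here (`kl_diag_gauss`, `phrase_biased_attention`, `lam`, the toy latents) are illustrative assumptions, and causal masking is omitted for brevity.

```python
import math

def kl_diag_gauss(mu_p, var_p, mu_q, var_q):
    """KL(p || q) between two diagonal Gaussians given as mean/variance lists."""
    return 0.5 * sum(
        math.log(vq / vp) + (vp + (mp - mq) ** 2) / vq - 1.0
        for mp, vp, mq, vq in zip(mu_p, var_p, mu_q, var_q)
    )

def softmax(row):
    """Numerically stable softmax over a list of logits."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]

def phrase_biased_attention(scores, phrase_ids, phrase_latents, lam=1.0):
    """Add a KL-based bias to raw attention logits, then normalize each row.

    scores:         n x n raw attention logits (query i, key j)
    phrase_ids:     phrase index of each token (assumed given)
    phrase_latents: phrase index -> (mean list, variance list)
    lam:            assumed bias strength; not taken from the paper
    """
    weights = []
    for i, row in enumerate(scores):
        mu_i, var_i = phrase_latents[phrase_ids[i]]
        biased = []
        for j, s in enumerate(row):
            mu_j, var_j = phrase_latents[phrase_ids[j]]
            # Tokens in the same phrase share a latent, so their KL is 0
            # and the logit is unchanged; dissimilar phrases are penalized.
            biased.append(s - lam * kl_diag_gauss(mu_i, var_i, mu_j, var_j))
        weights.append(softmax(biased))
    return weights

# Toy example: four tokens, two phrases, uniform raw logits.
scores = [[0.0] * 4 for _ in range(4)]
phrase_ids = [0, 0, 1, 1]
phrase_latents = {
    0: ([0.0, 0.0], [1.0, 1.0]),
    1: ([2.0, 0.0], [1.0, 1.0]),
}
weights = phrase_biased_attention(scores, phrase_ids, phrase_latents, lam=1.0)
```

With uniform raw logits, each token in this toy setup attends more strongly to tokens of its own phrase than to tokens of the other phrase, which is the qualitative effect a phrase-aware bias is meant to produce. In a real model the bias would be added per attention head before the softmax, and the latents would be learned rather than fixed.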

Published
2026-06-29
Section
Articles
License

Copyright (c) 2026 International Journal of Intelligent Systems and Data Science

This work is licensed under a Creative Commons Attribution 4.0 International License.