ResampleCraft: Modular Pipelines for Robust Learning on Skewed Data
- Authors
- Monisha Rengaraj
- Keywords
- Imbalanced Classification, Data Resampling Techniques, Synthetic Data Generation, Ensemble Learning, Pipeline-based ML Frameworks
- Abstract
Modern classifiers often perform poorly when target classes are heavily skewed. This paper presents a modular toolkit that rebalances training data through complementary strategies and plugs directly into standard machine-learning workflows. The toolkit offers four families of operations: reducing majority-class examples, synthesizing informative minority-class instances, hybrid methods that clean up noisy examples near the class boundary, and ensembles built over balanced subsets. All four are exposed through a consistent, pipeline-friendly API, so they compose seamlessly with preprocessing steps and estimators. Design choices emphasize reproducibility and engineering quality, including comprehensive tests, documentation, and interfaces that follow common ML conventions to minimize integration overhead. Usage examples illustrate end-to-end application to imbalanced classification and report metrics beyond accuracy that reflect the costs of errors under skew, helping practitioners choose the strategy that best fits their model's behavior and the geometry of their data. By unifying resampling, cleaning, and ensembling under a single interface, the toolkit makes imbalance handling a first-class step of the ML pipeline rather than an ad-hoc afterthought.
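The abstract does not show the toolkit's interface, so the following is only a minimal, dependency-free sketch of the underlying ideas, not ResampleCraft's API. The helper names `random_undersample`, `smote_like_oversample`, and `balanced_accuracy` are hypothetical. The sketch covers two of the four operation families (random undersampling of the majority class, and SMOTE-style synthesis that places new minority points on segments toward nearby minority neighbours) plus one metric beyond accuracy, balanced accuracy (the mean of per-class recall).

```python
import random
from collections import Counter

# Hypothetical, dependency-free helpers for illustration; NOT ResampleCraft's API.
# Samples are tuples of floats; labels are hashable, sortable values.


def random_undersample(X, y, seed=0):
    """Randomly keep, for every class, only as many samples as the smallest class has."""
    rng = random.Random(seed)
    by_class = {}
    for xi, yi in zip(X, y):
        by_class.setdefault(yi, []).append(xi)
    target = min(len(v) for v in by_class.values())
    X_out, y_out = [], []
    for label in sorted(by_class):
        for xi in rng.sample(by_class[label], target):
            X_out.append(xi)
            y_out.append(label)
    return X_out, y_out


def _sq_dist(a, b):
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def smote_like_oversample(X, y, minority, k=3, seed=0):
    """Append synthetic minority samples until the minority matches the largest class.

    SMOTE-style: each synthetic point lies on the segment between a random minority
    sample and one of its k nearest minority neighbours.
    """
    rng = random.Random(seed)
    pts = [xi for xi, yi in zip(X, y) if yi == minority]
    if len(pts) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    n_needed = max(Counter(y).values()) - len(pts)
    X_out, y_out = list(X), list(y)
    for _ in range(n_needed):
        i = rng.randrange(len(pts))
        base = pts[i]
        others = [p for j, p in enumerate(pts) if j != i]
        neighbours = sorted(others, key=lambda p: _sq_dist(base, p))[:k]
        nb = rng.choice(neighbours)
        gap = rng.random()  # uniform in [0, 1)
        X_out.append(tuple(b + gap * (n - b) for b, n in zip(base, nb)))
        y_out.append(minority)
    return X_out, y_out


def balanced_accuracy(y_true, y_pred):
    """Mean per-class recall; unlike plain accuracy, not dominated by the majority class."""
    recalls = []
    for label in sorted(set(y_true)):
        idx = [i for i, t in enumerate(y_true) if t == label]
        hits = sum(1 for i in idx if y_pred[i] == label)
        recalls.append(hits / len(idx))
    return sum(recalls) / len(recalls)


if __name__ == "__main__":
    # Toy 9:3 imbalanced dataset (made up for illustration).
    X = [(float(i), 0.0) for i in range(9)] + [(0.0, 5.0), (1.0, 6.0), (2.0, 5.5)]
    y = [0] * 9 + [1] * 3
    X_u, y_u = random_undersample(X, y)
    print(dict(Counter(y_u)))  # {0: 3, 1: 3}
    X_o, y_o = smote_like_oversample(X, y, minority=1, k=2)
    print(dict(Counter(y_o)))  # {0: 9, 1: 9}
    always_majority = [0] * len(y)
    acc = sum(p == t for p, t in zip(always_majority, y)) / len(y)
    print(acc, balanced_accuracy(y, always_majority))  # 0.75 0.5
```

In the toy run, a classifier that always predicts the majority class reaches 0.75 accuracy but only 0.5 balanced accuracy, which is the kind of gap that skew-aware metrics expose. In a real pipeline, resampling should be applied only to the training data, never to the data used for evaluation.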
- Published
- 2026-04-25
- Issue
- Vol. 1 No. 1 (2026)
- Section
- Articles
- License
Copyright (c) 2026 International Journal of Intelligent Systems and Data Science

This work is licensed under a Creative Commons Attribution 4.0 International License.
