
Evaluating Data Balancing Techniques and Feature Selection for Improving Classification Accuracy in URL Access

Authors
  • Santosh Kumar


Keywords:
URL access classification, Enterprise security, Rough Set Theory (RST), Malicious URL detection, REP Tree, classification accuracy
Abstract

This study examines the impact of data balancing techniques and feature selection on the classification accuracy of URL access requests in corporate environments. Undersampling and oversampling were evaluated using J48, Random Forest, and REP Tree classifiers, with oversampling achieving better performance by preserving important data patterns. Random Forest and REP Tree showed the highest classification accuracy, while redundant patterns were removed to streamline the dataset without significantly affecting performance. The study also applied Rough Set Theory (RST) for feature selection, reducing attributes from 12 to 9, which decreased computational time while maintaining or slightly improving accuracy. Robustness testing, conducted by excluding URL and IP address features, reduced accuracy but still produced practical rules for identifying malicious or unauthorized access requests. The proposed framework supports dynamic, context-aware URL access classification beyond static whitelist/blacklist methods, improving efficiency and reliability in enterprise security systems. Future work will focus on real-world validation and advanced classification and feature extraction techniques.
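The two balancing strategies compared in the abstract can be illustrated with a minimal sketch. The abstract does not say which oversampling or undersampling variant the study used, so plain random duplication and random removal are assumed here; the function names and the toy records are hypothetical and are not taken from the paper.

```python
import random
from collections import Counter


def _group_by_class(rows, labels):
    """Collect rows under their class label, preserving first-seen order."""
    by_class = {}
    for row, label in zip(rows, labels):
        by_class.setdefault(label, []).append(row)
    return by_class


def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen minority rows until every class matches the largest.

    Assumption: simple random duplication, not a synthetic method.
    """
    rng = random.Random(seed)
    by_class = _group_by_class(rows, labels)
    target = max(len(members) for members in by_class.values())
    out_rows, out_labels = [], []
    for label, members in by_class.items():
        extra = [rng.choice(members) for _ in range(target - len(members))]
        out_rows.extend(members + extra)
        out_labels.extend([label] * target)
    return out_rows, out_labels


def random_undersample(rows, labels, seed=0):
    """Keep a random subset of each class, sized to the smallest class."""
    rng = random.Random(seed)
    by_class = _group_by_class(rows, labels)
    target = min(len(members) for members in by_class.values())
    out_rows, out_labels = [], []
    for label, members in by_class.items():
        out_rows.extend(rng.sample(members, target))
        out_labels.extend([label] * target)
    return out_rows, out_labels


# Hypothetical toy records: (request_hour, payload_kb), labelled allow/deny.
rows = [(9, 12), (10, 8), (11, 15), (14, 9), (16, 11), (2, 140)]
labels = ["allow"] * 5 + ["deny"]

over_rows, over_labels = random_oversample(rows, labels)
under_rows, under_labels = random_undersample(rows, labels)

print(Counter(over_labels))   # both classes grown to the majority size
print(Counter(under_labels))  # both classes cut to the minority size
```

In this sketch oversampling keeps every original record and only adds copies, whereas undersampling discards majority-class records. That difference is one plausible reading of the abstract's remark that oversampling preserves important data patterns.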

Volume 1 Issue 2
Published
2026-06-01
Section
Articles
License

Copyright (c) 2026 International Journal of Intelligent Systems and Data Science


This work is licensed under a Creative Commons Attribution 4.0 International License.