Evaluating Data Balancing Techniques and Feature Selection for Improving Classification Accuracy in URL Access

Santosh Kumar

doi:10.67231/ffzs6x80

Keywords:

URL access classification, Enterprise security, Rough Set Theory (RST), Malicious URL detection, REP Tree, classification accuracy

Abstract

This study examines how data balancing techniques and feature selection affect the classification accuracy of URL access requests in corporate environments. Undersampling and oversampling were evaluated with the J48, Random Forest, and REP Tree classifiers. Oversampling performed better because it preserves data patterns that undersampling discards. Random Forest and REP Tree achieved the highest classification accuracy, and removing redundant patterns streamlined the dataset without significantly affecting performance. The study also applied Rough Set Theory (RST) for feature selection, reducing the attribute set from 12 to 9. This lowered computational time while maintaining or slightly improving accuracy. Robustness testing, conducted by excluding the URL and IP address features, reduced accuracy but still produced practical rules for identifying malicious or unauthorized access requests. The proposed framework supports dynamic, context-aware URL access classification that goes beyond static whitelist/blacklist methods, improving the efficiency and reliability of enterprise security systems. Future work will focus on real-world validation and on more advanced classification and feature extraction techniques.
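The abstract does not say how the balancing or the Rough Set reduction was implemented. The Python sketch below illustrates the general techniques under stated assumptions. For balancing, it uses plain random oversampling and undersampling, which is only one common variant; the study may have used a different one. For feature selection, it uses a greedy QuickReduct-style search that adds attributes until the rough-set dependency degree equals that of the full attribute set. All function names and the toy data are hypothetical and are not taken from the study.

```python
import random
from collections import defaultdict


def _group_by_class(rows, labels):
    """Map each class label to the list of rows carrying it."""
    by_class = defaultdict(list)
    for row, label in zip(rows, labels):
        by_class[label].append(row)
    return by_class


def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen rows of the smaller classes until every
    class is as large as the largest one; no original row is dropped."""
    rng = random.Random(seed)
    by_class = _group_by_class(rows, labels)
    target = max(len(members) for members in by_class.values())
    out_rows, out_labels = [], []
    for label, members in by_class.items():
        extra = [rng.choice(members) for _ in range(target - len(members))]
        out_rows.extend(members + extra)
        out_labels.extend([label] * target)
    return out_rows, out_labels


def random_undersample(rows, labels, seed=0):
    """Keep a random subset of each class so that every class is as small
    as the smallest one; the discarded rows are lost to training."""
    rng = random.Random(seed)
    by_class = _group_by_class(rows, labels)
    target = min(len(members) for members in by_class.values())
    out_rows, out_labels = [], []
    for label, members in by_class.items():
        out_rows.extend(rng.sample(members, target))
        out_labels.extend([label] * target)
    return out_rows, out_labels


def dependency(rows, labels, attrs):
    """Rough-set dependency degree gamma_B(D) = |POS_B(D)| / |U|: the share of
    objects whose equivalence class under the attribute indices `attrs`
    contains only one decision label."""
    blocks = defaultdict(list)
    for row, label in zip(rows, labels):
        blocks[tuple(row[a] for a in attrs)].append(label)
    positive = sum(len(ls) for ls in blocks.values() if len(set(ls)) == 1)
    return positive / len(rows)


def quick_reduct(rows, labels, n_attrs):
    """Greedy QuickReduct-style search: repeatedly add the attribute that most
    increases the dependency degree until it matches the full attribute set.
    The result preserves dependency but is not guaranteed to be minimal."""
    full = dependency(rows, labels, list(range(n_attrs)))
    reduct = []
    current = dependency(rows, labels, reduct)
    while current < full:
        best_attr, best_gain = None, current
        for a in range(n_attrs):
            if a not in reduct:
                gain = dependency(rows, labels, reduct + [a])
                if gain > best_gain:
                    best_attr, best_gain = a, gain
        if best_attr is None:  # no remaining attribute raises gamma on its own
            break
        reduct.append(best_attr)
        current = best_gain
    return sorted(reduct)


if __name__ == "__main__":
    # Hypothetical toy requests: (department, time band, content category).
    rows = [
        ("hr", "day", "news"), ("hr", "night", "social"),
        ("it", "day", "news"), ("it", "night", "file"),
        ("it", "day", "social"), ("hr", "night", "news"),
        ("hr", "day", "file"),
    ]
    labels = ["allow", "block", "allow", "allow", "allow", "block", "allow"]

    over_rows, _ = random_oversample(rows, labels)
    under_rows, _ = random_undersample(rows, labels)
    print(len(over_rows), len(under_rows))  # 10 4
    print(quick_reduct(rows, labels, 3))    # [0, 1]: department + time band
```

On this toy set, undersampling keeps only 4 of the 7 original requests, while oversampling retains all of them. This illustrates why undersampling can discard informative patterns. In the study's setting, the reduct would be computed over its 12 attributes rather than this three-attribute example.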

Published

2026-06-01

Issue

Vol. 1 No. 2 (2026)

Section

Articles

License

This work is licensed under a Creative Commons Attribution 4.0 International License.