Machine Learning Algorithms for Detecting Malicious Network Traffic

In this project, I evaluated how well machine learning algorithms classify network traffic as malicious or benign. Using the HIKARI-2021 dataset, which combines real-time benign traffic with synthetic attack data, I implemented and compared two algorithms: Logistic Regression and Random Forest.

Key Aspects of the Project:

  • Utilized a dataset with 555,278 records and 88 features

  • Conducted thorough data preprocessing and exploratory data analysis

  • Implemented Logistic Regression and Random Forest models

  • Addressed class imbalance issues through up-sampling techniques

  • Evaluated model performance using accuracy, classification reports, and confusion matrices
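
As a rough illustration of the steps listed above, the following Python sketch up-samples the minority class, trains both models, and computes the three evaluation outputs with scikit-learn. It is not the project's actual code: the data is a synthetic stand-in generated with make_classification, the helper upsample_minority and all hyperparameters are hypothetical, and up-sampling only the training split is an assumption that may differ from what was done in the project.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.utils import resample

# Synthetic stand-in for the real dataset (assumption): about 5% of the
# samples carry label 1 ("malicious"), the rest label 0 ("benign").
X, y = make_classification(
    n_samples=4000, n_features=20, weights=[0.95, 0.05], random_state=0
)

# Hold out a stratified test set so both classes appear in it.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)


def upsample_minority(X, y, random_state=0):
    """Hypothetical helper for binary labels: resample the minority class
    with replacement until it matches the majority class in size."""
    classes, counts = np.unique(y, return_counts=True)
    minority = classes[np.argmin(counts)]
    is_minority = y == minority
    X_up, y_up = resample(
        X[is_minority],
        y[is_minority],
        replace=True,
        n_samples=int(counts.max()),
        random_state=random_state,
    )
    X_bal = np.vstack([X[~is_minority], X_up])
    y_bal = np.concatenate([y[~is_minority], y_up])
    return X_bal, y_bal


# Balance the training split only; the test split keeps its original ratio.
X_bal, y_bal = upsample_minority(X_train, y_train)

# Hyperparameters below are illustrative defaults, not the project's settings.
models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=0),
}

results = {}
for name, model in models.items():
    model.fit(X_bal, y_bal)
    y_pred = model.predict(X_test)
    results[name] = {
        "accuracy": accuracy_score(y_test, y_pred),
        "confusion": confusion_matrix(y_test, y_pred),
    }
    print(name, "accuracy:", results[name]["accuracy"])
    print(classification_report(y_test, y_pred, target_names=["benign", "malicious"]))
    print(results[name]["confusion"])
```

Leaving the test split untouched keeps the reported metrics representative of the original class ratio, at the cost of a smaller minority sample to evaluate on.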

Results:

  • Initial models, trained on the imbalanced data, showed high overall accuracy but struggled to detect malicious traffic

  • After addressing class imbalance:

    • Logistic Regression achieved 55.94% accuracy

    • Random Forest improved significantly, reaching 93.59% accuracy with high precision and recall for both benign and malicious traffic
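
To see why high overall accuracy can coexist with poor detection of malicious traffic, consider a small worked example in Python. The numbers are hypothetical and not taken from the project: 950 benign and 50 malicious samples, scored against a trivial classifier that labels everything as benign.

```python
import numpy as np
from sklearn.metrics import accuracy_score, recall_score

# Hypothetical ground truth: 950 benign (0) and 50 malicious (1) samples.
y_true = np.array([0] * 950 + [1] * 50)

# A degenerate classifier that always predicts "benign".
y_pred = np.zeros_like(y_true)

acc = accuracy_score(y_true, y_pred)              # 950 / 1000 correct
rec = recall_score(y_true, y_pred, pos_label=1)   # 0 of 50 malicious found

print(f"accuracy = {acc:.2f}")          # accuracy = 0.95
print(f"malicious recall = {rec:.2f}")  # malicious recall = 0.00
```

Accuracy alone rewards the majority class here, which is why per-class precision and recall from the classification report are the more informative numbers on imbalanced traffic data.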

This project demonstrates the potential of machine learning in cybersecurity applications, particularly in enhancing network security through accurate detection of malicious network activity. It also highlights the importance of addressing data imbalance in real-world datasets to improve model performance.