This project explores a network traffic dataset (UNSW-NB15) with the goal of identifying and characterising malicious activity. Using R, the analysis follows the classical analytics pipeline:
- Descriptive analysis to summarise traffic properties such as protocol types, flow duration, byte volumes and packet counts, and to visualise differences between normal and attack traffic.
- Diagnostic testing, including chi-square and non‑parametric tests, to determine whether observed patterns (e.g. protocol versus duration, byte ratios, packet counts, connection states) are statistically associated with attacks.
- Predictive modelling with logistic regression and decision trees to estimate the probability that a given flow is an attack based on selected features, evaluated using ROC curves, AUC and confusion matrices.
- Prescriptive rules derived from the predictive results and thresholding (e.g. byte or packet volume thresholds, probability cut‑offs) to flag suspicious flows and guide security responses.
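
The four stages above can be sketched end to end in base R. This is only an illustration under stated assumptions, not the project's actual code: the data are synthetic rather than loaded from UNSW-NB15, the column names (`proto`, `dur`, `sbytes`, `label`) are chosen to resemble the dataset's fields, the 0.5 cut-off is arbitrary, and the decision tree is left out so that no extra packages are needed.

```r
# Minimal sketch on SYNTHETIC data. Column names, distributions and the
# cut-off are assumptions for illustration, not values from the project.
set.seed(42)
n <- 2000
label <- rbinom(n, 1, 0.3)  # 1 = attack, 0 = normal
flows <- data.frame(
  label  = label,
  proto  = factor(ifelse(runif(n) < ifelse(label == 1, 0.7, 0.4), "tcp", "udp")),
  dur    = rexp(n, rate = ifelse(label == 1, 0.5, 2)),
  sbytes = rlnorm(n, meanlog = ifelse(label == 1, 8, 6), sdlog = 1)
)

# Descriptive: median duration and source bytes per class
desc <- aggregate(cbind(dur, sbytes) ~ label, data = flows, FUN = median)

# Diagnostic: chi-square test (protocol vs. label) and
# Wilcoxon rank-sum test (source bytes vs. label)
chi <- chisq.test(table(flows$proto, flows$label))
wil <- wilcox.test(sbytes ~ label, data = flows)

# Predictive: logistic regression fitted on a 70/30 train/test split
train_idx <- sample(n, size = 0.7 * n)
train <- flows[train_idx, ]
test  <- flows[-train_idx, ]
fit   <- glm(label ~ proto + dur + log(sbytes), data = train, family = binomial)
p_hat <- predict(fit, newdata = test, type = "response")

# AUC computed from ranks (Mann-Whitney formulation), base R only
auc <- function(score, y) {
  r  <- rank(score)
  n1 <- sum(y == 1)
  n0 <- sum(y == 0)
  (sum(r[y == 1]) - n1 * (n1 + 1) / 2) / (n1 * n0)
}
auc_test <- auc(p_hat, test$label)

# Prescriptive: flag flows whose predicted probability reaches the cut-off
cutoff   <- 0.5  # hypothetical threshold
flagged  <- factor(as.integer(p_hat >= cutoff), levels = 0:1)
conf_mat <- table(predicted = flagged, actual = factor(test$label, levels = 0:1))
```

Printing `desc`, `chi`, `wil`, `auc_test` and `conf_mat` shows the output of each stage. In practice the cut-off would be chosen from the ROC curve according to how costly missed attacks are relative to false alarms.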

Additional features and risk matrices are developed to support understanding and decision‑making. The overall aim is to demonstrate how data science techniques can be applied to network security, transforming raw flow-level records into actionable insights.