The Neural Malware Detector is a machine learning application that classifies .exe files as benign or malicious by analyzing their string features. The models were trained on samples from VirusTotal. To improve accuracy and reduce false positives, I’ve implemented a dual-model approach that uses both a Random Forest and a Recurrent Neural Network (RNN) built with TensorFlow.
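How the two models' predictions are combined is not specified here. As a minimal sketch, assuming each model outputs a probability that a file is malicious, one conservative option is to flag a file only when both models agree, which trades some recall for fewer false positives. The function name `combine_verdicts` and the default threshold of 0.5 are hypothetical and not taken from the project.

```python
def combine_verdicts(rf_prob: float, rnn_prob: float, threshold: float = 0.5) -> str:
    """Label a file 'malicious' only when both model scores reach the threshold.

    rf_prob and rnn_prob are assumed to be probabilities in [0, 1] produced by
    the Random Forest and the RNN respectively (an assumption for this sketch).
    """
    for prob in (rf_prob, rnn_prob):
        if not 0.0 <= prob <= 1.0:
            raise ValueError(f"probability out of range: {prob}")

    # Requiring agreement lowers false positives at the cost of missing
    # samples that only one model detects.
    both_flag = rf_prob >= threshold and rnn_prob >= threshold
    return "malicious" if both_flag else "benign"
```

With this rule, a file scored 0.9 by one model and 0.2 by the other is reported as benign; averaging the two scores would be a less strict alternative.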
- File Upload Interface: Users can upload files through a web interface.
- Real-time Malware Detection: The application processes the uploaded file and predicts whether it's malicious or benign.
- Responsive Web Design: The UI is responsive and user-friendly, with a dark, malware-inspired theme.
- Python 3.x
- Flask
- pip (Python package manager)
- Google Cloud SDK (for GCP deployment)
- Azure CLI (for Azure deployment)
- Clone the repository:
  - `git clone https://github.com/ashfaaq98/neural-malware-detector.git`
  - `cd neural-malware-detector`
- Create a virtual environment:
  - `python3 -m venv venv`
  - `source venv/bin/activate` (on Windows use `venv\Scripts\activate`)
- Install dependencies:
  - `pip install -r requirements.txt`
- Set up environment variables:
  - Create a `.env` file in the project root and set your Flask app secret key and other environment variables as needed.
- Run the application locally:
  - `python run.py`
- Access the application:
  - Open your browser and go to http://127.0.0.1:5000.
- Upload a file:
  - Use the provided interface to upload a file for malware detection.
  - View the result on the results page.
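For the environment-variable step in the setup above, the sketch below shows one way the secret key could be read at startup using only the standard library. It is a simplified stand-in for a package such as python-dotenv, not the project's actual loading code. The variable name `SECRET_KEY` and the helper names `load_env_file` and `get_secret_key` are assumptions.

```python
import os
from pathlib import Path


def load_env_file(path: str = ".env") -> dict[str, str]:
    """Parse simple KEY=VALUE lines from a .env file, skipping blanks and comments."""
    values: dict[str, str] = {}
    env_path = Path(path)
    if not env_path.is_file():
        return values  # a missing file simply yields no values

    for raw_line in env_path.read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        values[key.strip()] = value.strip().strip("'\"")
    return values


def get_secret_key(path: str = ".env") -> str:
    """Return the secret key, preferring a real environment variable over the file."""
    # SECRET_KEY is an assumed variable name for this sketch.
    key = os.environ.get("SECRET_KEY") or load_env_file(path).get("SECRET_KEY")
    if not key:
        raise RuntimeError("SECRET_KEY is not set in the environment or the .env file")
    return key
```

Keeping the key out of the source tree this way means the `.env` file should also stay out of version control.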
The Neural Malware Detector uses two machine learning models to classify files as malicious or benign. The models were trained in Google Colab with the following steps:
- Data Preparation:
  - Collect and preprocess the dataset of malware and benign files.
- Feature Extraction:
  - Extract string-based features from each file.
- Model Training:
  - Train a neural network model on the extracted features using TensorFlow.
- Model Evaluation:
  - Evaluate the model's performance using appropriate metrics (e.g., accuracy, precision, recall).
- Save the Model:
  - Save the trained model to a file for later use in the Flask application.
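The feature extraction step is not described in detail. As a rough sketch, string features can be obtained by pulling runs of printable ASCII characters out of a file's raw bytes (similar in spirit to the Unix `strings` tool) and then mapping them to a fixed-length count vector. The minimum length of 4, the hashing scheme, and the names `extract_strings` and `hash_features` are assumptions for illustration, not the project's actual pipeline.

```python
import re
import zlib


def extract_strings(data: bytes, min_len: int = 4) -> list[str]:
    """Return runs of at least min_len printable ASCII bytes found in data."""
    # Bytes 0x20-0x7e cover the printable ASCII range (space through tilde).
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    return [match.decode("ascii") for match in pattern.findall(data)]


def hash_features(strings: list[str], n_buckets: int = 16) -> list[int]:
    """Map a variable number of strings to a fixed-length count vector.

    Each string is hashed with CRC32 (deterministic across runs) and counted
    in one of n_buckets slots, so every file yields a vector of equal size.
    """
    vector = [0] * n_buckets
    for text in strings:
        bucket = zlib.crc32(text.encode("ascii")) % n_buckets
        vector[bucket] += 1
    return vector


# Example on a small synthetic byte string, not a real executable.
sample = b"\x00\x01MZ\x90\x00kernel32.dll\x00\xffab\x00LoadLibraryA\x00"
found = extract_strings(sample)
features = hash_features(found)
```

Here `found` contains only the two runs long enough to pass the length filter, and `features` is a 16-element vector that a classifier could consume.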
- Training: The Random Forest model was trained using string features extracted from a dataset of malware and benign files. The model was tuned for optimal performance and evaluated using cross-validation.
- ROC Curve: Below is the ROC curve for the Random Forest model, showing its ability to distinguish between malicious and benign files.
- Training: A deep learning model was built using Keras, with multiple layers to capture complex patterns in the data. The model was trained using the same dataset as the Random Forest model, with additional tuning for network architecture.
- ROC Curve: Below is the ROC curve for the Keras model, illustrating its performance in classification tasks.
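For readers who want to reproduce this kind of plot for either model, the points on a ROC curve can be computed as sketched below. This is a plain-Python illustration of the technique, assuming labels of 1 for malicious and 0 for benign and a score where higher means more likely malicious. In practice, scikit-learn's `roc_curve` and `roc_auc_score` do the same job. The labels and scores in the example are synthetic and are not results from this project.

```python
def roc_points(labels: list[int], scores: list[float]) -> list[tuple[float, float]]:
    """Return (false positive rate, true positive rate) points for a ROC curve.

    The decision threshold is swept from the highest score to the lowest;
    samples with tied scores are added together so they share one point.
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("both classes must be present")

    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    points = [(0.0, 0.0)]
    true_pos = false_pos = 0
    i = 0
    while i < len(ranked):
        current = ranked[i][0]
        # Consume every sample that shares the current score.
        while i < len(ranked) and ranked[i][0] == current:
            if ranked[i][1] == 1:
                true_pos += 1
            else:
                false_pos += 1
            i += 1
        points.append((false_pos / negatives, true_pos / positives))
    return points


def auc(points: list[tuple[float, float]]) -> float:
    """Area under the curve using the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


# Synthetic example data, for illustration only.
toy_labels = [1, 1, 0, 1, 0, 0]
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
toy_curve = roc_points(toy_labels, toy_scores)
toy_auc = auc(toy_curve)
```

An AUC of 1.0 corresponds to a perfect ranking of malicious above benign files, while 0.5 is what random scoring would give.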
This project is licensed under the MIT License.
Contributions are welcome! Please open an issue or submit a pull request for any improvements or bug fixes.
For any questions or feedback, please contact [email protected]




