NetworkSecurity

A FastAPI-powered machine learning project for network security and phishing detection. The repository includes data ingestion from MongoDB, validation, transformation, model training, and a prediction API.

Features

  • Data ingestion from MongoDB collection network_data
  • Schema-driven data validation and drift checks
  • Data transformation with KNN imputation and standard scaling
  • Model training with multiple classifiers and selection of the best model
  • FastAPI endpoints for training and prediction
  • Artifact tracking with MLflow and DagsHub (optional)
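
The transformation step listed above (KNN imputation followed by standard scaling) can be sketched with scikit-learn. This is a minimal illustration, not the repository's actual transformer: the function name build_preprocessor, the n_neighbors value, and the toy array are assumptions made for the example.

```python
import numpy as np
from sklearn.impute import KNNImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler


def build_preprocessor(n_neighbors: int = 3) -> Pipeline:
    """Hypothetical helper: fill missing values with KNN, then standardize."""
    return Pipeline(
        steps=[
            # Replace each NaN using the values of the nearest rows.
            ("imputer", KNNImputer(n_neighbors=n_neighbors)),
            # Rescale every column to zero mean and unit variance.
            ("scaler", StandardScaler()),
        ]
    )


# Toy feature matrix with two missing entries (illustrative data only).
X = np.array(
    [
        [1.0, 2.0],
        [np.nan, 4.0],
        [3.0, 6.0],
        [5.0, np.nan],
    ]
)

transformed = build_preprocessor(n_neighbors=2).fit_transform(X)
```

Chaining both steps in one Pipeline means the same fitted object can be saved once and reused at prediction time, which is the usual reason for saving a preprocessor next to the model.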

Requirements

  • Python 3.10 or newer (for example, 3.12)
  • MongoDB connection string available in .env
  • Dependencies listed in requirements.txt

Environment

Create a .env file in the repository root with:

MONGO_DB_URL="<your-mongodb-connection-string>"

The app also accepts MONGODB_URL_KEY, but MONGO_DB_URL is preferred.
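
One way to read the connection string with that preference order is sketched below. The helper name resolve_mongo_url is hypothetical, and the exact lookup logic in app.py may differ; the sketch only assumes that MONGO_DB_URL wins when both variables are set and that python-dotenv is optional.

```python
import os
from typing import Mapping, Optional

try:
    # python-dotenv is optional; without it only real environment variables are used.
    from dotenv import load_dotenv
except ImportError:
    load_dotenv = None

if load_dotenv is not None:
    # Reads key=value pairs from a .env file, if one is found, into os.environ.
    load_dotenv()


def resolve_mongo_url(env: Mapping[str, str] = os.environ) -> Optional[str]:
    """Hypothetical helper: prefer MONGO_DB_URL, fall back to MONGODB_URL_KEY."""
    return env.get("MONGO_DB_URL") or env.get("MONGODB_URL_KEY")
```

Passing the mapping as an argument keeps the lookup easy to test without touching the real environment.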

Installation

git clone https://github.com/dpk45deepak/networksecurity.git
cd networksecurity
py -m pip install -r requirements.txt
py -m pip install python-multipart

If you use a dedicated virtual environment, activate it first. The py launcher is Windows-specific; on Linux or macOS, use python3 in place of py.

Run the API

Start the FastAPI server with:

py -m uvicorn app:app --reload

Then visit:

  • http://127.0.0.1:8000/docs for the interactive Swagger UI
  • http://127.0.0.1:8000/train to run the training pipeline

API Endpoints

/train (GET)

Triggers the training pipeline to ingest data, validate it, transform it, and train the model. The model and preprocessor are saved under final_model/.

/predict (POST)

Accepts a CSV file upload and returns predictions rendered as HTML.

Request field:

  • file: CSV file containing the input data
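
A client call to /predict could look roughly like the sketch below, which builds the multipart upload with the standard library only. The function name build_predict_request and the default filename are assumptions; the field name file and the local URL come from this README.

```python
import uuid
from urllib import request


def build_predict_request(
    csv_bytes: bytes,
    filename: str = "input.csv",
    url: str = "http://127.0.0.1:8000/predict",
) -> request.Request:
    """Hypothetical helper: wrap CSV bytes in a multipart/form-data POST."""
    boundary = uuid.uuid4().hex
    # One form part named "file", matching the request field described above.
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        "Content-Type: text/csv\r\n"
        "\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return request.Request(
        url,
        data=head + csv_bytes + tail,
        method="POST",
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
    )
```

With the server running, passing the returned object to urllib.request.urlopen and reading the response body should yield the HTML described above. The function only builds the request, so it can be inspected without a live server.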

Local Data Ingestion

If you need to load sample CSV data into MongoDB, use push_data.py:

py push_data.py

This script expects the dataset file at Network_Data/phisingData.csv and writes to MongoDB database networksecurity, collection network_data.
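
The general shape of such an upload (CSV rows converted to documents, then a bulk insert) is sketched below. This is not the contents of push_data.py: the helper names are hypothetical, and the collection argument is assumed to be a pymongo collection or any object with a compatible insert_many method.

```python
import csv
import io
from typing import Any, Dict, Iterable, List


def csv_to_records(csv_text: str) -> List[Dict[str, Any]]:
    """Turn CSV text into one dict per row, keyed by the header names."""
    # csv.DictReader yields every value as a string; cast types separately if needed.
    return [dict(row) for row in csv.DictReader(io.StringIO(csv_text))]


def push_records(collection: Any, records: Iterable[Dict[str, Any]]) -> int:
    """Insert the records in bulk and return how many were written."""
    documents = list(records)
    if not documents:
        # insert_many rejects an empty list in pymongo, so skip the call.
        return 0
    result = collection.insert_many(documents)
    return len(result.inserted_ids)


# With pymongo installed, the target collection named above would be obtained as:
#   from pymongo import MongoClient
#   collection = MongoClient(mongo_url)["networksecurity"]["network_data"]
```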

Project Structure

  • app.py - FastAPI application entrypoint
  • main.py - local training pipeline runner
  • push_data.py - MongoDB data upload helper
  • networksecurity/ - main package
    • components/ - pipeline components for ingestion, validation, transformation, training
    • constant/ - reusable constants and schema path
    • entity/ - config and artifact dataclasses
    • exception/ - custom exception handling
    • logging/ - logging setup
    • pipeline/ - full training pipeline orchestration
    • utils/ - utilities for serialization and model evaluation
  • final_model/ - saved model and preprocessor artifacts
  • Artifacts/ - experiment artifacts and pipeline outputs
  • data_schema/schema.yaml - expected schema definition
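
As a rough illustration of what a schema check against data_schema/schema.yaml involves, the sketch below compares a CSV header with a list of expected column names. The layout of schema.yaml is not shown in this README, so the expected columns are passed in as a plain list; the function names are hypothetical.

```python
import csv
import io
from typing import List, Sequence, Tuple


def read_header(csv_text: str) -> List[str]:
    """Return the first row of the CSV text, or an empty list if there is none."""
    reader = csv.reader(io.StringIO(csv_text))
    return next(reader, [])


def compare_columns(
    expected: Sequence[str], actual: Sequence[str]
) -> Tuple[List[str], List[str]]:
    """Return (missing, unexpected) column names, preserving input order."""
    missing = [name for name in expected if name not in actual]
    unexpected = [name for name in actual if name not in expected]
    return missing, unexpected
```

A validation step would typically fail the run when either list is non-empty. Drift checks need a statistical comparison between datasets and are not covered by this sketch.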

Notes

  • The prediction endpoint relies on final_model/preprocessor.pkl and final_model/model.pkl.
  • If you see errors about missing multipart support, install python-multipart.
  • app.py loads environment variables using python-dotenv when available.
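
The sketch below shows one plausible way to load the two artifacts and run a prediction. It assumes the .pkl files were written with Python's pickle module and that the loaded objects follow the scikit-learn convention of transform and predict methods; neither assumption is confirmed by this README, and the helper names are hypothetical.

```python
import pickle
from pathlib import Path
from typing import Any, Tuple


def load_artifacts(model_dir: str = "final_model") -> Tuple[Any, Any]:
    """Load (preprocessor, model) from the given directory."""
    base = Path(model_dir)
    # Only unpickle files you trust: pickle can execute code while loading.
    with open(base / "preprocessor.pkl", "rb") as handle:
        preprocessor = pickle.load(handle)
    with open(base / "model.pkl", "rb") as handle:
        model = pickle.load(handle)
    return preprocessor, model


def predict(preprocessor: Any, model: Any, rows: Any) -> Any:
    """Apply the preprocessor first, then ask the model for predictions."""
    return model.predict(preprocessor.transform(rows))
```

If either file is missing, open raises FileNotFoundError, which usually means the training pipeline has not been run yet.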

Troubleshooting

  • Ensure .env exists and defines MONGO_DB_URL
  • Ensure MongoDB is reachable from your environment
  • Install dependencies from requirements.txt
  • If training fails, inspect artifact directories under Artifacts/

License

This project is provided as-is for educational and development purposes.

About

A production-grade network security machine learning project that implements data ingestion, validation, transformation, model training, and evaluation, with MongoDB integration, a FastAPI service, Docker packaging, and MLflow-based experiment tracking.
