A FastAPI-powered machine learning project for network security and phishing detection. The repository includes data ingestion from MongoDB, validation, transformation, model training, and a prediction API.
- Data ingestion from MongoDB collection network_data
- Schema-driven data validation and drift checks
- Data transformation with KNN imputation and standard scaling
- Model training with multiple classifiers and selection of the best model
- FastAPI endpoints for training and prediction
- Artifact tracking with MLflow and DagsHub (optional)
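The transformation step listed above (KNN imputation followed by standard scaling) can be sketched with scikit-learn as follows. This is a minimal illustration, not the repository's actual code: the `n_neighbors=3` setting and the toy matrix are assumptions chosen for the example.

```python
import numpy as np
from sklearn.impute import KNNImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# n_neighbors=3 is an illustrative value, not taken from the repository.
preprocessor = Pipeline(
    steps=[
        ("imputer", KNNImputer(n_neighbors=3)),  # fill NaNs from the nearest rows
        ("scaler", StandardScaler()),            # zero mean, unit variance per column
    ]
)

# Toy feature matrix with two missing values (made-up data).
X = np.array([
    [1.0, 0.0, -1.0],
    [1.0, np.nan, 1.0],
    [-1.0, 1.0, 0.0],
    [0.0, 1.0, np.nan],
])

X_ready = preprocessor.fit_transform(X)
```

Fitting the imputer and scaler inside one pipeline keeps them together, so the same fitted object can be saved once and reused at prediction time.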
- Python 3.10+ / 3.12
- MongoDB connection string available in .env
- Dependencies listed in requirements.txt
Create a .env file in the repository root with:
MONGO_DB_URL="<your-mongodb-connection-string>"
The app also accepts MONGODB_URL_KEY, but MONGO_DB_URL is preferred.
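One way the preference between the two variable names could be resolved is sketched below. `resolve_mongo_url` is a hypothetical helper written for this example, not necessarily how app.py does it, and it assumes the values from .env have already been loaded into the environment.

```python
import os


def resolve_mongo_url(env=None):
    """Return the MongoDB connection string, preferring MONGO_DB_URL."""
    env = os.environ if env is None else env
    # Check the preferred name first, then the accepted fallback.
    for key in ("MONGO_DB_URL", "MONGODB_URL_KEY"):
        value = env.get(key)
        if value:
            return value
    raise RuntimeError("Set MONGO_DB_URL (or MONGODB_URL_KEY) in .env")
```

Passing a plain dict as `env` makes the lookup easy to check without touching the real environment.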
git clone https://github.com/dpk45deepak/networksecurity.git
py -m pip install -r requirements.txt
py -m pip install python-multipart
If you use a dedicated virtual environment, activate it first.
Start the FastAPI server with:
py -m uvicorn app:app --reload
Then visit:
- http://127.0.0.1:8000/docs for the interactive Swagger UI
- http://127.0.0.1:8000/train to run the training pipeline
The /train endpoint triggers the training pipeline: it ingests data, validates it, transforms it, and trains the model. The model and preprocessor are saved under final_model/.
The prediction endpoint accepts a CSV file upload and returns predictions rendered as HTML.
Request field:
- file: CSV file containing the input data
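A client-side upload to the prediction endpoint might look like the sketch below, which uses only the standard library. The `/predict` path, the file name, and the CSV columns are assumptions made for illustration (check the Swagger UI at /docs for the real route); only the `file` field name comes from the description above.

```python
import urllib.request
import uuid


def build_csv_upload(url, filename, csv_bytes):
    """Build a multipart/form-data POST request carrying one CSV in the `file` field."""
    boundary = uuid.uuid4().hex
    body = b"\r\n".join([
        f"--{boundary}".encode(),
        f'Content-Disposition: form-data; name="file"; filename="{filename}"'.encode(),
        b"Content-Type: text/csv",
        b"",
        csv_bytes,
        f"--{boundary}--".encode(),
        b"",
    ])
    headers = {"Content-Type": f"multipart/form-data; boundary={boundary}"}
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


# The /predict path is an assumption; confirm the actual route in /docs.
request = build_csv_upload(
    "http://127.0.0.1:8000/predict", "sample.csv", b"col_a,col_b\n1,0\n"
)
# With the server running, send it and read the HTML response:
# html = urllib.request.urlopen(request).read().decode()
```

The request is built separately from sending it, so the payload can be inspected before the server is involved.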
If you need to load sample CSV data into MongoDB, use push_data.py:
py push_data.py
This script expects the dataset file at Network_Data/phisingData.csv and writes to MongoDB database networksecurity, collection network_data.
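The general shape of such an upload, converting CSV rows into documents and inserting them, can be sketched as below. This is an assumption about the approach, not a copy of push_data.py; `csv_to_records` is a hypothetical helper and the two column names are placeholders.

```python
import csv
import io


def csv_to_records(csv_text):
    """Turn CSV text into a list of dicts, one per row, ready for insert_many."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [dict(row) for row in reader]


# Placeholder columns; the real dataset defines its own.
records = csv_to_records("url_length,label\n54,1\n23,-1\n")
# Values stay as strings here; cast them if the collection should hold numbers.

# With pymongo installed and MONGO_DB_URL set, the upload step could then be:
# import os
# from pymongo import MongoClient
# client = MongoClient(os.environ["MONGO_DB_URL"])
# client["networksecurity"]["network_data"].insert_many(records)
```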
- app.py - FastAPI application entrypoint
- main.py - local training pipeline runner
- push_data.py - MongoDB data upload helper
- networksecurity/ - main package
  - components/ - pipeline components for ingestion, validation, transformation, training
  - constant/ - reusable constants and schema path
  - entity/ - config and artifact dataclasses
  - exception/ - custom exception handling
  - logging/ - logging setup
  - pipeline/ - full training pipeline orchestration
  - utils/ - utilities for serialization and model evaluation
- final_model/ - saved model and preprocessor artifacts
- Artifacts/ - experiment artifacts and pipeline outputs
- data_schema/schema.yaml - expected schema definition
- The prediction endpoint relies on final_model/preprocessor.pkl and final_model/model.pkl.
- If you see errors about missing multipart support, install python-multipart.
- app.py loads environment variables using python-dotenv when available.
- Ensure .env exists and defines MONGO_DB_URL
- Ensure MongoDB is reachable from your environment
- Install dependencies from requirements.txt
- If training fails, inspect artifact directories under Artifacts/
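Some of the checks above can be automated with a small script. The sketch below is a hypothetical helper (`preflight` is not part of the repository) that looks for the .env file, the MONGO_DB_URL entry, and the two saved artifacts the prediction endpoint needs. It does not test whether MongoDB is reachable.

```python
from pathlib import Path


def preflight(root="."):
    """Return a list of human-readable problems found under the repository root."""
    root = Path(root)
    problems = []
    env_file = root / ".env"
    if not env_file.is_file():
        problems.append(".env is missing")
    elif "MONGO_DB_URL" not in env_file.read_text():
        # Simple substring check; it does not parse the file.
        problems.append(".env does not define MONGO_DB_URL")
    for name in ("preprocessor.pkl", "model.pkl"):
        if not (root / "final_model" / name).is_file():
            problems.append(f"final_model/{name} is missing (run training first)")
    return problems
```

Run from the repository root, `preflight()` returns an empty list when nothing is flagged.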
This project is provided as-is for educational and development purposes.