LSTM based dual-layered model to classify phishing sites through HTML content and TLS certificate analysis

arulthileeban/PhishCCA

PhishCCA - Classifying phishing sites through content and certificate analysis

Phishing has been one of the most commonly used attack vectors of the last decade. Although various countermeasures have been proposed, many of them have significant flaws, and those built on machine learning models often fall short. Two such flaws are the lack of appropriate data for training the model and the high false positive rate caused by the similarity in structure and content between the target site and the phishing site.

This paper tackles some of these flaws with the proposed system, PhishCCA, a two-layered classification system that first identifies which target site (e.g., PayPal) a page corresponds to and then classifies the page as phishing or benign. This is achieved by developing an HTML content-based classifier, which assigns HTML pages to target sites, and a TLS certificate-based classifier, which labels the website as phishing or benign. Although the integrated system has not been completely built, the HTML classifier achieves an accuracy of 77% and the TLS classifier achieves an accuracy of 98% (with the random forest model; see Results), which shows the promise of this technique.
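Since the integrated system has not been built, the following is only a minimal sketch of how the two layers described above could be chained. The names `classify_site`, `Verdict`, `toy_content_classifier`, and `toy_cert_classifier` are hypothetical and do not come from the repository, and the two toy classifiers use arbitrary placeholder rules that stand in for the trained models in the notebooks.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Verdict:
    target: str        # target site predicted by the content layer
    is_phishing: bool  # decision made by the certificate layer


def classify_site(
    html: str,
    certificate: dict,
    content_classifier: Callable[[str], str],
    cert_classifier: Callable[[dict], bool],
) -> Verdict:
    """Run the two layers in sequence: content first, certificate second."""
    # Layer 1: map the HTML page to the target site it corresponds to.
    target = content_classifier(html)
    # Layer 2: decide phishing vs. benign from the TLS certificate.
    is_phishing = cert_classifier(certificate)
    return Verdict(target=target, is_phishing=is_phishing)


# Placeholder stand-ins (assumptions, NOT the repository's models).
def toy_content_classifier(html: str) -> str:
    # Arbitrary keyword rule in place of the HTML LSTM classifier.
    return "paypal" if "paypal" in html.lower() else "other"


def toy_cert_classifier(certificate: dict) -> bool:
    # Arbitrary rule in place of the certificate classifier:
    # flag certificates with a short validity period.
    return certificate.get("validity_days", 0) < 100


verdict = classify_site(
    "<title>PayPal login</title>",
    {"validity_days": 90},
    toy_content_classifier,
    toy_cert_classifier,
)
print(verdict)
```

Passing the classifiers in as callables keeps the sketch independent of how each model is loaded, so either layer could be swapped for a trained model's prediction function.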

Model

Files

  • Certificate Classifier
    • Data
    • Cert_LSTM.ipynb - LSTM model for certificate-based classification
    • Cert_RF.ipynb - Random forest model for certificate-based classification
  • Content Classifier
    • Dataset
    • HTML_Classification.ipynb - LSTM model for HTML content-based classification

Results

| Model | Accuracy | Precision | Recall |
|---|---|---|---|
| Cert-Random Forest | 0.98 | 0.98 | 0.998 |
| Cert-LSTM | 0.76 | 0.78 | 0.72 |
| Content-LSTM | 0.77 | 0.81 | 0.72 |
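For readers unfamiliar with the three metrics reported above, here is a small sketch of their standard binary definitions. The function name `binary_metrics` and the sample labels are illustrative assumptions. The README does not say how the notebooks compute these scores, nor how precision and recall are averaged for the multi-class content classifier.

```python
from typing import Sequence, Tuple


def binary_metrics(
    y_true: Sequence[int], y_pred: Sequence[int]
) -> Tuple[float, float, float]:
    """Return (accuracy, precision, recall), treating label 1 as phishing."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives

    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total if total else 0.0
    # Precision: of the sites flagged as phishing, how many really were.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # Recall: of the real phishing sites, how many were flagged.
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return accuracy, precision, recall


# Made-up labels for illustration only (not data from the repository).
sample_true = [1, 1, 1, 0, 0, 0, 1, 0]
sample_pred = [1, 1, 0, 0, 0, 1, 1, 0]
print(binary_metrics(sample_true, sample_pred))
```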
