GitHub - Nilesh1989/Gopher: A repository with source code for the detection of malicious domain.

Machine learning based malicious domain detection

Overview

A domain can be used for malicious purposes such as:

Malware, Virus or Trojan delivery
Phishing
Spam mails
Malicious Ad Campaigns (Malvertising)
Command and Control (C2)
DGA (Domain Generation Algorithms)
Data Exfiltration etc.

Our idea was to develop open source code that detects malicious domains using machine learning. We use Scikit-learn, a free machine learning library for the Python programming language.

Note that we detect malicious domains, not malicious URLs, because our focus is on protecting victims from attackers. The reason is that 90% of attacks are carried out using the domain alone, so by detecting malicious domains rather than malicious URLs we are in effect stopping 90% of attacks.

Problems

Many repositories on GitHub detect malicious URLs, phishing domains, or DGA domains, but each type of attack has its own separate solution.
These attacks share most of their behaviours, yet they are still handled by different solutions.
The repositories are not kept up to date.

So we decided to consolidate these behaviours into a single problem and develop one prediction model for detecting malicious domains. That way we do not have to rely on different solutions or maintain different models.

Dependencies

The requirements.txt file lists the dependencies needed to run this project. Install them with the pip install -r requirements.txt command.

Quick Start

To-Do

Feature List

URL length
Host length
Number of dots
Host ranking in city
Host ranking in country
URL average token length
Host average token length
Path average token length
URL token count (Considering words as a token)
Host token count
Path token count
URL largest token length
Host largest token length
Path largest token length
IP address presence
ASN number
Safe browsing
Domain age
Number of subdomains
Is IDN (Internationalized Domain Name)
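
The lexical features in this list can be computed from the URL string alone. The following is a minimal sketch using only the Python standard library, not the code from this repository. The function names, the dictionary keys, the rule that a token is a run of letters and digits, and the naive subdomain count are all assumptions made for illustration. Features that need external lookups (host ranking, ASN, Safe Browsing, domain age) are left out.

```python
import ipaddress
import re
from urllib.parse import urlsplit

# Assumption: a "token" is a run of letters/digits; the repository may split differently.
TOKEN_SPLIT = re.compile(r"[^A-Za-z0-9]+")


def token_stats(text):
    """Return token count, average token length and largest token length."""
    lengths = [len(t) for t in TOKEN_SPLIT.split(text) if t]
    if not lengths:
        return {"count": 0, "avg_len": 0.0, "max_len": 0}
    return {
        "count": len(lengths),
        "avg_len": sum(lengths) / len(lengths),
        "max_len": max(lengths),
    }


def is_ip(host):
    """True if the host is a literal IPv4 or IPv6 address."""
    try:
        ipaddress.ip_address(host)
        return True
    except ValueError:
        return False


def lexical_features(url):
    """Hypothetical helper: lexical features for one URL or bare domain."""
    # Add a scheme when missing so urlsplit can locate the host part.
    parts = urlsplit(url if "://" in url else "http://" + url)
    host = parts.hostname or ""
    labels = host.split(".") if host else []
    host_is_ip = is_ip(host)

    features = {
        "url_length": len(url),
        "host_length": len(host),
        "num_dots": url.count("."),
        "ip_address_present": host_is_ip,
        # Naive assumption: the last two labels are the registered domain.
        # This is wrong for suffixes such as co.uk.
        "num_subdomains": 0 if host_is_ip else max(len(labels) - 2, 0),
        # IDN hosts are either punycode labels (xn--) or contain non-ASCII text.
        "is_idn": any(l.startswith("xn--") for l in labels) or not host.isascii(),
    }
    for name, text in (("url", url), ("host", host), ("path", parts.path)):
        stats = token_stats(text)
        features[name + "_token_count"] = stats["count"]
        features[name + "_avg_token_length"] = stats["avg_len"]
        features[name + "_largest_token_length"] = stats["max_len"]
    return features


if __name__ == "__main__":
    for key, value in lexical_features(
        "http://login.secure.example.com/account/update.php"
    ).items():
        print(key, value)
```

A dictionary per URL maps directly onto one row of a feature table, which is the shape Scikit-learn estimators expect after conversion to a numeric array.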

To-Do

Will add more machine learning models
Will add whether the domain comes from dynamic DNS as a feature
Will add shortened URL as a feature
Will add number of special characters (- and _) as a feature
Will add website contents as a feature
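
Two of the planned features are simple enough to sketch ahead of time. This is only an illustration of how they might look, not code from this repository. The function names are hypothetical, and the shortener list is a small assumed sample that a real implementation would need to extend.

```python
from urllib.parse import urlsplit

# Assumption: a tiny illustrative sample of URL shortener hosts, not a complete list.
KNOWN_SHORTENERS = {"bit.ly", "tinyurl.com", "t.co"}


def special_char_count(host):
    """Count the '-' and '_' characters in a host name."""
    return host.count("-") + host.count("_")


def is_shortened(url):
    """True if the URL's host is one of the known shortener hosts."""
    parts = urlsplit(url if "://" in url else "http://" + url)
    return (parts.hostname or "") in KNOWN_SHORTENERS


if __name__ == "__main__":
    print(special_char_count("my-bank_login.example.com"))
    print(is_shortened("https://bit.ly/abc"))
```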

Results

 Testing Accuracy :: 94.67%
 Confusion Matrix :: [102, 4]
                     [5, 58]
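
The reported accuracy can be checked against the confusion matrix: the diagonal entries are the correctly classified samples, so accuracy is their sum divided by the total number of test samples. A small sketch of that arithmetic follows. The function name is an assumption, and no claim is made here about which row corresponds to the malicious class.

```python
# Confusion matrix as reported in the results above.
CONFUSION = [[102, 4], [5, 58]]


def accuracy_from_confusion(matrix):
    """Accuracy = correctly classified samples (diagonal) / all samples."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total


if __name__ == "__main__":
    # (102 + 58) / (102 + 4 + 5 + 58) = 160 / 169
    print(f"{accuracy_from_confusion(CONFUSION):.2%}")
```

This gives 160 / 169, which matches the 94.67% testing accuracy.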

Contributing

Feel free to fork and submit pull requests in development.

References

Research paper by Doyen Sahoo, Chenghao Liu, Steven C.H. Hoi
Source code on github by @vaseem-khan
Phishing Domain Detection with Machine Learning
