Web Crawlers for Data Labelling of Malicious Domain Detection & IP Reputation Evaluation

Last update: Nov 05, 2021


Overview


This repository provides two web crawlers: one labels domain names using the McAfee API (https://www.trustedsource.org/sources/index.pl), and the other labels IP reputation using the TALOS API (https://talosintelligence.com/).

Requirements

BeautifulSoup

Usage

Descriptions of the demonstration code are as follows.

To label the categories of a set of domains, put the domain list in 'data/domain_list.txt' and run 'demo_domain_label.py'. For each domain, the program labels (1) the category (e.g., Malicious Sites- Parked Domain) and (2) the risk level (e.g., High Risk) using the McAfee API, and saves the results in 'res/domain_labels.txt'. If the program repeatedly outputs "-Retry-", stop it and wait for a while. When you restart it, the program automatically skips the domains that are already labeled and continues with the remaining ones.
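
The resume behaviour described above can be sketched as follows. This is a minimal illustration, not the code in 'demo_domain_label.py': it assumes (hypothetically) that each line of the results file starts with the labeled item followed by a tab, and the helper names `load_labeled` and `remaining_items` are made up for this example.

```python
import os


def load_labeled(path):
    """Return the set of items already present in a results file.

    Assumes (hypothetically) one result per line, with the item
    (domain or IP) as the first tab-separated field.
    """
    done = set()
    if not os.path.exists(path):
        return done  # nothing labeled yet
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:
                done.add(line.split("\t", 1)[0])
    return done


def remaining_items(input_path, output_path):
    """List items from input_path not yet in output_path, in input order."""
    done = load_labeled(output_path)
    todo = []
    with open(input_path, encoding="utf-8") as f:
        for line in f:
            item = line.strip()
            if item and item not in done:
                todo.append(item)
                done.add(item)  # also skip duplicates within the input list
    return todo
```

Under the assumed format, `remaining_items('data/domain_list.txt', 'res/domain_labels.txt')` would return only the domains that still need a label, so a restarted run picks up where the previous one stopped.
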
To label the reputation of a set of IP addresses, put the IP list in 'data/IP_list.txt' and run 'demo_IP_label.py'. For each IP address, the program labels (1) the email reputation and (2) the web reputation (with three levels: Poor, Neutral, and Good), and saves the results in 'res/IP_labels.txt'. If the program repeatedly outputs "None", stop it and wait for a while. When you restart it, the program automatically skips the IPs that are already labeled and continues with the remaining ones.
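
Instead of stopping the program by hand whenever it keeps printing "None", a crawler could wait and retry with a growing delay. The sketch below shows exponential backoff under stated assumptions: `lookup` is a hypothetical callable standing in for one request to the reputation service and returning None on failure, and the 30-second starting delay is an arbitrary choice, not a value known to suit the TALOS rate limits.

```python
import time


def lookup_with_backoff(lookup, item, max_tries=5, base_delay=30.0, sleep=time.sleep):
    """Call lookup(item) until it returns a non-None label.

    `lookup` is a hypothetical stand-in for one crawler request; it
    should return None when the service refuses or rate-limits the
    request. The wait doubles after every failure (30 s, 60 s, ...).
    Returns None if every try fails.
    """
    delay = base_delay
    for attempt in range(max_tries):
        result = lookup(item)
        if result is not None:
            return result
        if attempt < max_tries - 1:  # no point waiting after the last try
            sleep(delay)
            delay *= 2
    return None
```

Passing `sleep` as a parameter lets the waiting be replaced in tests, and doubling the delay keeps the number of requests low while the service is refusing them.
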
An example domain name list (with 21,820 effective second-level domains) and an example IP list (with 67,751 IP addresses) are provided in 'data/examples/example_domain_list.txt' and 'data/examples/example_IP_list.txt', respectively. The corresponding labeling results are saved in 'res/examples/example_domain_labels.txt' and 'res/examples/example_IP_labels.txt', respectively.
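
To get a quick overview of a labeled results file, such as the example IP labels, one could tally the values in a single column. The tab-separated layout assumed below is hypothetical; check the actual files in 'res/examples/' and adjust the column index before relying on the counts.

```python
from collections import Counter


def label_counts(path, column):
    """Count the values in one tab-separated column of a results file.

    The layout `item<TAB>label1<TAB>label2` is an assumption for
    illustration; `column` is a 0-based field index.
    """
    counts = Counter()
    with open(path, encoding="utf-8") as f:
        for line in f:
            if not line.strip():
                continue  # skip blank lines
            fields = line.rstrip("\r\n").split("\t")
            if len(fields) > column:  # ignore rows missing this field
                counts[fields[column]] += 1
    return counts
```
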

If you have questions regarding this repository, you can contact the author via [[email protected]].
