A simple website crawler that asks the user for a website address, crawls it, and extracts specific data from it.

Last update: Jan 10, 2022

Overview

Website-Crawler-Python

This is a simple website crawler that asks the user for a website address, crawls it, and extracts specific data from it. After receiving the address, it finds the links on that page and asks the user for a crawling depth, which must fall within the number of links found.
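One reading of the depth prompt is that the value selects how many of the discovered links are followed. The sketch below illustrates that reading only; the function name `select_links` and the exact bounds are assumptions, not taken from the project.

```python
def select_links(links, depth):
    """Return the links to follow, assuming `depth` counts links to visit.

    Hypothetical helper: the project may interpret depth differently.
    """
    # Reject values outside the range offered to the user.
    if not 0 <= depth <= len(links):
        raise ValueError(f"depth must be between 0 and {len(links)}")
    # Follow the links in the order they were discovered.
    return links[:depth]
```

With three discovered links and a depth of 2, only the first two links would be crawled.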

The crawler takes three inputs:

A website address
An integer value for the crawling depth
A user-specified regular expression for finding user-specific data
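The three inputs could be validated before any crawling starts. This is a minimal sketch using only the standard library; the function name `parse_inputs` and the validation rules (http or https only, non-negative depth) are assumptions for illustration.

```python
import re
from urllib.parse import urlparse


def parse_inputs(address, depth_text, pattern_text):
    """Validate the three inputs and return (address, depth, compiled pattern)."""
    parts = urlparse(address)
    # Assumption: only absolute http(s) addresses are accepted.
    if parts.scheme not in ("http", "https") or not parts.netloc:
        raise ValueError("address must be an absolute http(s) URL")

    depth = int(depth_text)  # raises ValueError for non-integer text
    if depth < 0:
        raise ValueError("depth must not be negative")

    pattern = re.compile(pattern_text)  # raises re.error for an invalid pattern
    return address, depth, pattern
```

Strings read with `input()` can be passed in directly, and the compiled pattern can then be applied to the downloaded HTML with `pattern.findall(...)`.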

General tasks:

Finds all Norwegian mobile numbers and saves them to a text file.
Finds all sub-links inside the given website and saves them to a text file.
Saves the website's raw HTML code to a text file.
Finds all email addresses and saves them to a text file.
Finds all comments used in the website and saves them to a text file.
Finds the five most used words and prints them to the terminal.
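Several of the tasks above come down to pattern matching on the raw HTML. The sketch below is illustrative only: the patterns are assumptions (mobile numbers are taken to be 8 digits starting with 4 or 9, optionally prefixed with +47 or 0047), and links are collected with the standard library's `html.parser` instead of BeautifulSoup so the example has no third-party dependency. The names `extract` and `save_lines` are hypothetical.

```python
import re
from html.parser import HTMLParser

# Assumed pattern: 8 digits starting with 4 or 9, optional +47 / 0047 prefix.
MOBILE_RE = re.compile(r"(?<!\d)(?:(?:\+|00)47[ ]?)?([49]\d{7})(?!\d)")
# Deliberately simple email pattern; it does not cover every valid address.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
# HTML comments, including ones that span several lines.
COMMENT_RE = re.compile(r"<!--(.*?)-->", re.DOTALL)


class LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def extract(html):
    """Return mobile numbers, emails, comments and links found in `html`."""
    collector = LinkCollector()
    collector.feed(html)
    return {
        "mobiles": MOBILE_RE.findall(html),
        "emails": EMAIL_RE.findall(html),
        "comments": [c.strip() for c in COMMENT_RE.findall(html)],
        "links": collector.links,
    }


def save_lines(path, items):
    """Write one item per line to a text file."""
    with open(path, "w", encoding="utf-8") as fh:
        fh.write("\n".join(items))
```

Each list in the returned dictionary can be written to its own text file with `save_lines`, and the raw HTML can be saved the same way as a single item.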

This is a Python-based project that relies on the following libraries:

RegEx (Python's re module)
Urllib3
BeautifulSoup 4
Counter from collections
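As an illustration of how Counter could serve the "five most used words" task, here is a minimal sketch. The function name `top_words` and the tokenization rule (runs of letters, lowercased) are assumptions; whether the project strips HTML tags before counting is not stated.

```python
import re
from collections import Counter


def top_words(text, n=5):
    """Return the `n` most common words in `text` as (word, count) pairs."""
    # Runs of letters only; this also keeps Norwegian letters such as æ, ø, å.
    words = re.findall(r"[^\W\d_]+", text.lower())
    return Counter(words).most_common(n)
```

Calling `top_words(page_text)` on the visible text of a page would give the five pairs to print to the terminal.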

Owner

Faisal Ahmed
