Crawler in Python 3.7, 3.8, 3.9, PyPy3

Last update: Mar 12, 2022

Overview

Description

A web crawler written in Python 3. (Supports the major Python releases 3.6, 3.7, and 3.8.)

Installation and Use

Setup VirtualEnv

which python3  # this will output the path of your python3
# now set up a python3 virtualenv
mkvirtualenv crawl3 -p $(which python3)

workon crawl3
python main.py -d5 http://gotchacode.com  # -d5 means crawl to a depth of 5
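
To illustrate what a depth limit such as -d5 means, here is a minimal sketch of a depth-limited, breadth-first crawl. It is not the project's actual code: the crawl function, its get_links parameter, and the in-memory SITE mapping are all hypothetical, and it assumes that "depth" counts the number of link hops from the start URL.

```python
from collections import deque
from typing import Callable, Iterable, List

# Hypothetical in-memory "site": each page maps to the links it contains.
SITE = {"a": ["b", "c"], "b": ["d"], "c": [], "d": ["e"], "e": []}


def crawl(start_url: str, max_depth: int,
          get_links: Callable[[str], Iterable[str]]) -> List[str]:
    """Return every URL reachable within max_depth hops of start_url."""
    seen = {start_url}
    found = [start_url]
    queue = deque([(start_url, 0)])
    while queue:
        url, depth = queue.popleft()
        if depth >= max_depth:
            continue  # pages at the depth limit are recorded but not followed
        for link in get_links(url):
            if link not in seen:
                seen.add(link)
                found.append(link)
                queue.append((link, depth + 1))
    return found


if __name__ == "__main__":
    # "e" is three hops away from "a", so a depth of 2 does not reach it.
    print(crawl("a", 2, lambda url: SITE.get(url, [])))
```

Passing the link-fetching function as an argument keeps the sketch runnable without network access; a real crawler would download and parse each page instead.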

Results:

And the output is:

100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 50/50 [00:00<00:00, 29200.11it/s]
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 9/9 [00:00<00:00, 22563.50it/s]
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 9/9 [00:00<00:00, 21375.28it/s]
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 10/10 [00:00<00:00, 22227.37it/s]
CRAWLER STARTED:
https://vinitkumar.me, will crawl upto depth 2
https://vinitkumar.me/
http://changer.nl
https://twitter.com/vinitkme
https://vinitkumar.me/about
https://vinitkumar.github.io/vinit_kumar.pdf
https://vinitkumar.me/values
https://github.com/vinitkumar
https://vinitkumar.me/2013-03-24-life-has-changed/
https://vinitkumar.me/2013-03-24-my-javascript-love/
https://vinitkumar.me/2013-03-27-twitter-like-app-in-nodejs/
http://twitter.com/vinitkme
https://vinitkumar.me/2013-04-07-first-flight-and-vacation-after-months/
====================================================================================================
Crawler Statistics
====================================================================================================
No of links Found: 12
No of followed:     3
Found all links after 0.54s

Issues

Create an issue here if you encounter a bug: create-issue

Comments

The following changes were made in this PR:
The code now uses async and await so that coroutines run concurrently. Because this is a crawler, which spends most of its time waiting on the network, async is a natural fit.
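
As a rough illustration of the async and await approach, the sketch below fetches several pages concurrently with asyncio.gather. It is an assumption-laden example, not the PR's code: fetch_links, crawl_level, and the PAGES mapping are hypothetical, and the network is simulated with asyncio.sleep.

```python
import asyncio
from typing import Dict, List

# Hypothetical in-memory pages standing in for real HTTP responses.
PAGES: Dict[str, List[str]] = {
    "/": ["/about", "/values"],
    "/about": ["/values", "/contact"],
}


async def fetch_links(url: str) -> List[str]:
    """Pretend to download a page and return the links found on it."""
    await asyncio.sleep(0.01)  # simulated network latency
    return PAGES.get(url, [])


async def crawl_level(urls: List[str]) -> List[str]:
    """Fetch all urls concurrently; return the unique links in first-seen order."""
    results = await asyncio.gather(*(fetch_links(u) for u in urls))
    found: List[str] = []
    for links in results:  # gather returns results in the order of its inputs
        for link in links:
            if link not in found:
                found.append(link)
    return found


if __name__ == "__main__":
    print(asyncio.run(crawl_level(["/", "/about"])))
```

The coroutines here run concurrently on a single thread rather than in parallel on several cores; for I/O-bound work such as crawling, that is usually where the time savings come from.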

The following steps were taken:

All the print statements are now replaced with loggers.

Some methods are further refactored to enhance readability.

The version is bumped.

The code is refactored so that, in case of an error, it fails early and fails fast.
Opened by vinitkumar
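
Two of the steps above, logging instead of print and failing fast, can be sketched together as follows. This is only an illustration under assumptions: the logger name "crawler" and the validate_url helper are hypothetical and do not come from the project.

```python
import logging
from urllib.parse import urlparse

logger = logging.getLogger("crawler")  # hypothetical logger name


def validate_url(url: str) -> str:
    """Return url unchanged, or raise at once if it cannot be crawled."""
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        # Fail fast: reject bad input before any crawling work starts.
        logger.error("Rejecting invalid URL: %s", url)
        raise ValueError(f"invalid URL: {url!r}")
    logger.info("Accepted URL: %s", url)
    return url


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    validate_url("https://vinitkumar.me")
```

Unlike print, a logger lets the caller choose the verbosity and destination of messages without touching the crawler code.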

Releases (v1.0.0)

v1.0.0 (Apr 11, 2015)

This release ports pycrawler to Python 3. Enjoy!

Owner

Vinit Kumar
