a high-performance, lightweight and human friendly serving engine for scrapy

Related tags

Web Crawlingscrapy-x
Overview

scrapy-x (X)

a distributed, scalable and lightweight environment for deploying and running scrapy spiders/projects with no-hassle on commodity hardware, also it is compatible with scrapyd /schedule.json and /daemonstatus.json.

Installation

$ pip install -U git+git://github.com/speakol-ads/scrapy-x.git

Usage

let's assume that you have a project called TestCrawler

  • cd to TestCrawler
  • run scrapy x
  • that is all!

Default Settings

it utilizes your default project settings.py file

# whether to enable debug mode or not
X_DEBUG = True

# the default queue name that the system will use
# actually it will be used as a prefix for its internal
# queues, currently there is only one queue called `X_QUEUE_NAME + '.BACKLOG'`
# which holds all jobs that should be crawled.
X_QUEUE_NAME = 'SCRAPY_X_QUEUE'

# the queue workers
# by default it uses the cpu cores count
# try to adjust it based on your resources & needs
X_QUEUE_WORKERS_COUNT = os.cpu_count()

# the webserver workers count
# the workers count required from uvicorn to spwan
# defaults to the available cpu count
# try to adjust it based on your resources & needs
X_SERVER_WORKERS_COUNT = os.cpu_count()

# the port the http server should listen on
X_SERVER_LISTEN_PORT = 6800

# the host used by the http server to listen on
X_SERVER_LISTEN_HOST = '0.0.0.0'

# whether to enable access log or not
X_ENABLE_ACCESS_LOG = True

# redis host
X_REDIS_HOST = 'localhost'

# redis port
X_REDIS_PORT = 6379

# redis db
X_REDIS_DB = 0

# redis password
X_REDIS_PASSWORD = ''

# the maximum allowed wait time for a running task
# it will be killed after that time.
X_TASK_TIMEOUT = 25

Available Endpoints

as well scrapyd core endpoints like (schedule.json, daemonstatus.json), you have the following too:

GET /

returns some info about the engine like the available spiders and backlog queue length

GET|POST /run/{spider_name}

execute the specified spider in {spider_name} and wait for it to return its result, P.S: any query param and json post data will be passed to the spider as argument -a key=value

GET|POST /enqueue/{spider_name}

adding the specified spider in {spider_name} to the backlog to be executed later, P.S: any query param and json post data will be used as spider argument

Technologies Used

Author

I'm Mohamed, a software engineer who enjoys writing code in his free time, I'm speaking python, php, go, rust and js

My Similar Projects

P.S: star the project if you liked it ^_^

Owner
Speakol Ads
Speakol Ads
Web3 Pancakeswap Sniper bot written in python3

Pancakeswap_BSC_Sniper_Bot Web3 Pancakeswap Sniper bot written in python3, Please note the license conditions! The first Binance Smart Chain sniper bo

Treading-Tigers 295 Dec 31, 2022
A scalable frontier for web crawlers

Frontera Overview Frontera is a web crawling framework consisting of crawl frontier, and distribution/scaling primitives, allowing to build a large sc

Scrapinghub 1.2k Jan 02, 2023
Python script that reads Aliexpress offers urls from a Excel filename (.csv) and post then in a Telegram channel using a bot

Aliexpress to telegram post Python script that reads Aliexpress offers urls from a Excel filename (.csv) and post then in a Telegram channel using a b

Fernando 6 Dec 06, 2022
Scrap-mtg-top-8 - A top 8 mtg scraper using python

Scrap-mtg-top-8 - A top 8 mtg scraper using python

1 Jan 24, 2022
Python script to check if there is any differences in responses of an application when the request comes from a search engine's crawler.

crawlersuseragents This Python script can be used to check if there is any differences in responses of an application when the request comes from a se

Podalirius 13 Dec 27, 2022
A multithreaded tool for searching and downloading images from popular search engines. It is straightforward to set up and run!

🕳️ CygnusX1 Code by Trong-Dat Ngo. Overviews 🕳️ CygnusX1 is a multithreaded tool 🛠️ , used to search and download images from popular search engine

DatNgo 32 Dec 31, 2022
Web scraper for Zillow

Zillow-Scraper Instructions All terminal commands are highlighted. Make sure you first have python 3 installed. You can check this by running "python

Ali Rastegar 1 Nov 23, 2021
A web scraper which checks price of a product regularly and sends price alerts by email if price reduces.

Amazon-Web-Scarper Created a web scraper using simple functions to check price of a product on amazon (can be duplicated to check price at other marke

Swaroop Todankar 1 Jan 17, 2022
Goblyn is a Python tool focused to enumeration and capture of website files metadata.

Goblyn Metadata Enumeration What's Goblyn? Goblyn is a tool focused to enumeration and capture of website files metadata. How it works? Goblyn will se

Gustavo 46 Nov 22, 2022
A Web Scraper built with beautiful soup, that fetches udemy course information. Get udemy course information and convert it to json, csv or xml file

Udemy Scraper A Web Scraper built with beautiful soup, that fetches udemy course information. Installation Virtual Environment Firstly, it is recommen

Aditya Gupta 15 May 17, 2022
Scrapes the Sun Life of Canada Philippines web site for historical prices of their investment funds and then saves them as CSV files.

slocpi-scraper Sun Life of Canada Philippines Inc Investment Funds Scraper Install dependencies pip install -r requirements.txt Usage General format:

Daryl Yu 2 Jan 07, 2022
Searching info from Google using Python Scrapy

Python-Search-Engine-Scrapy || Python-爬虫-索引/利用爬虫获取谷歌信息**/ Searching info from Google using Python Scrapy /* 利用 PYTHON 爬虫获取天气信息,以及城市信息和资料**/ translatio

HONGVVENG 1 Jan 06, 2022
Scrape data on SpaceX: Capsules, Rockets, Cores, Roadsters, SpaceX Info

SpaceX Sofware I developed software to scrape data on SpaceX: Capsules, Rockets, Cores, Roadsters, SpaceX Info to use the software you need Python a

Maxence Rémy 16 Aug 02, 2022
Raspi-scraper is a configurable python webscraper that checks raspberry pi stocks from verified sellers

Raspi-scraper is a configurable python webscraper that checks raspberry pi stocks from verified sellers.

Louie Cai 13 Oct 15, 2022
Scrapy-soccer-games - Scraping information about soccer games from a few websites

scrapy-soccer-games Esse projeto tem por finalidade pegar informação de tabela d

Caio Alves 2 Jul 20, 2022
Pythonic Crawling / Scraping Framework based on Non Blocking I/O operations.

Pythonic Crawling / Scraping Framework Built on Eventlet Features High Speed WebCrawler built on Eventlet. Supports relational databases engines like

Juan Manuel Garcia 173 Dec 05, 2022
HappyScrapper - Google news web scrapper with python

HappyScrapper ~ Google news web scrapper INSTALLATION ♦ Clone the repository ♦ O

Jhon Aguiar 0 Nov 07, 2022
UdemyBot - A Simple Udemy Free Courses Scrapper

UdemyBot - A Simple Udemy Free Courses Scrapper

Gautam Kumar 112 Nov 12, 2022
哔哩哔哩爬取器:以个人为中心

Open Bilibili Crawer 哔哩哔哩是一个信息非常丰富的社交平台,我们基于此构造社交网络。在该网络中,节点包括用户(up主),以及视频、专栏等创作产物;关系包括:用户之间,包括关注关系(following/follower),回复关系(评论区),转发关系(对视频or动态转发);用户对创

Boshen Shi 3 Oct 21, 2021
Scraping and visualising India's real-time COVID-19 data from the MOHFW dataset.

COVID19-WEB-SCRAPER Open Source Tech Lab - Project [SEMESTER IV] OSTL Assignments OSTL Assignments - 1 OSTL Assignments - 2 Project COVID19 India Data

AMEY THAKUR 8 Apr 28, 2022