A high-performance, lightweight and human-friendly serving engine for scrapy

Last update: Mar 01, 2022

Overview

scrapy-x (X)

A distributed, scalable and lightweight environment for deploying and running scrapy spiders/projects hassle-free on commodity hardware. It is also compatible with the scrapyd /schedule.json and /daemonstatus.json endpoints.

Installation

$ pip install -U git+https://github.com/speakol-ads/scrapy-x.git

Usage

Let's assume that you have a project called TestCrawler:

cd into TestCrawler
run scrapy x
That is all!

Default Settings

It reads its configuration from your project's settings.py file. The defaults are:

import os  # needed for os.cpu_count() below

# whether to enable debug mode or not
X_DEBUG = True

# the default queue name that the system will use
# actually it will be used as a prefix for its internal
# queues, currently there is only one queue called `X_QUEUE_NAME + '.BACKLOG'`
# which holds all jobs that should be crawled.
X_QUEUE_NAME = 'SCRAPY_X_QUEUE'

# the queue workers
# by default it uses the cpu cores count
# try to adjust it based on your resources & needs
X_QUEUE_WORKERS_COUNT = os.cpu_count()

# the webserver workers count
# the number of workers uvicorn is asked to spawn
# defaults to the available cpu count
# try to adjust it based on your resources & needs
X_SERVER_WORKERS_COUNT = os.cpu_count()

# the port the http server should listen on
X_SERVER_LISTEN_PORT = 6800

# the host used by the http server to listen on
X_SERVER_LISTEN_HOST = '0.0.0.0'

# whether to enable access log or not
X_ENABLE_ACCESS_LOG = True

# redis host
X_REDIS_HOST = 'localhost'

# redis port
X_REDIS_PORT = 6379

# redis db
X_REDIS_DB = 0

# redis password
X_REDIS_PASSWORD = ''

# the maximum allowed wait time for a running task
# it will be killed after that time.
X_TASK_TIMEOUT = 25
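The comments on X_QUEUE_NAME say the value is used as a prefix, with the backlog queue named `X_QUEUE_NAME + '.BACKLOG'`. A minimal sketch of that naming rule follows. The helper `backlog_queue_name` is hypothetical and not part of scrapy-x; it only illustrates which name to look for in redis when you change the prefix.

```python
# Hypothetical helper (not part of scrapy-x): derive the backlog queue name
# from the X_QUEUE_NAME prefix, following the rule in the settings comments.
def backlog_queue_name(prefix: str = "SCRAPY_X_QUEUE") -> str:
    return prefix + ".BACKLOG"


# With the default prefix and with a custom one set in settings.py.
print(backlog_queue_name())
print(backlog_queue_name("MY_CRAWLER_QUEUE"))
```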

Available Endpoints

In addition to the core scrapyd endpoints (schedule.json, daemonstatus.json), the following endpoints are available:

GET /

Returns some info about the engine, such as the available spiders and the backlog queue length.

GET|POST /run/{spider_name}

Executes the spider named {spider_name} and waits for it to return its result. P.S: any query params and JSON POST data are passed to the spider as arguments, as with -a key=value.
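A minimal sketch of calling this endpoint from Python with a GET request, assuming the server is reachable on localhost at the default port 6800 from the settings above. The function names, the spider name `quotes` and the `category` argument are hypothetical. The response format is not documented here, so the body is returned as raw text.

```python
from urllib.parse import quote, urlencode
from urllib.request import urlopen


def build_run_url(spider_name, base_url="http://localhost:6800", **spider_args):
    """Build the /run/{spider_name} URL, passing spider arguments as query params."""
    url = f"{base_url.rstrip('/')}/run/{quote(spider_name, safe='')}"
    if spider_args:
        url += "?" + urlencode(spider_args)
    return url


def run_spider(spider_name, timeout=30, **spider_args):
    """Send the GET request and return the raw response body as text.

    `timeout` is the client-side socket timeout in seconds (an assumption,
    chosen only for illustration).
    """
    with urlopen(build_run_url(spider_name, **spider_args), timeout=timeout) as resp:
        return resp.read().decode("utf-8")
```

For example, `run_spider("quotes", category="books")` requests `/run/quotes?category=books`, which corresponds to passing `-a category=books` to the spider.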

GET|POST /enqueue/{spider_name}

Adds the spider named {spider_name} to the backlog to be executed later. P.S: any query params and JSON POST data are used as spider arguments.
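A minimal sketch of enqueueing a spider with a JSON POST body, again assuming the server is reachable on localhost at the default port 6800. The function names, the spider name `quotes` and the argument keys are hypothetical, and the response body is returned as raw text because its format is not documented here.

```python
import json
from urllib.parse import quote
from urllib.request import Request, urlopen


def build_enqueue_request(spider_name, spider_args=None,
                          base_url="http://localhost:6800"):
    """Build a POST request for /enqueue/{spider_name} with a JSON body."""
    url = f"{base_url.rstrip('/')}/enqueue/{quote(spider_name, safe='')}"
    body = json.dumps(spider_args or {}).encode("utf-8")
    return Request(
        url,
        data=body,
        method="POST",
        headers={"Content-Type": "application/json"},
    )


def enqueue_spider(spider_name, spider_args=None, timeout=10):
    """Send the request and return the raw response body as text."""
    request = build_enqueue_request(spider_name, spider_args)
    with urlopen(request, timeout=timeout) as resp:
        return resp.read().decode("utf-8")
```

Calling `enqueue_spider("quotes", {"category": "books"})` returns once the server has responded to the request, and the spider itself runs later from the backlog.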

Technologies Used

Author

I'm Mohamed, a software engineer who enjoys writing code in his free time. I speak Python, PHP, Go, Rust and JS.

My Similar Projects

P.S: star the project if you liked it ^_^

Owner

Speakol Ads
