A simple, configurable and expandable combined shop scraper to minimize the costs of ordering several items

Last update: Dec 13, 2021

combined-shop-scraper

A simple, configurable and expandable combined shop scraper to minimize the costs of ordering several items.

Features

Define an input file components.json listing the components to scrape and their source URLs
Find the cheapest order combination, including shipping costs
Get a price alarm when a single component drops below a defined price
Easily extend to new shops (basic scraping know-how required). Basic support for notebooksbilliger, cyberport and future-x is included by default

Usage

JSON file definition

The input JSON file is named components.json by default and must be located in the same folder as scraper.py. Its basic structure is:

{
  "component1": {
    "alarm_price": 260,
    "quantity": 1,
    "urls": [
      "https://www.someshop.com/component1",
      "https://www.someshop.com/component1-alternative",
      "https://www.anothershop.com/component1-alternative"]
  },
  "component2": {
    "urls": [
      "https://www.someshop.com/component2",
      "https://www.anothershop.com/component2",
      "https://www.onemoreshop.com/component2"]
  }
}

The component name and at least one URL are mandatory. You can list several URLs from the same shop for one component, for example when the shop offers alternative products that would serve equally well. The quantity of each component defaults to 1, and the alarm price is optional.
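The rules above can be sketched as a small loader. This is only an illustration, not the code in scraper.py: the function names `load_components` and `alarm_triggered` are hypothetical, and the alarm check assumes that "below a defined price" means strictly lower than `alarm_price`.

```python
import json


def load_components(text):
    """Parse the JSON text and fill in the documented defaults (hypothetical helper)."""
    raw = json.loads(text)
    components = {}
    for name, spec in raw.items():
        urls = spec.get("urls", [])
        if not urls:
            # At least one URL per component is mandatory.
            raise ValueError(f"component {name!r} needs at least one url")
        components[name] = {
            "urls": list(urls),
            "quantity": spec.get("quantity", 1),     # defaults to 1
            "alarm_price": spec.get("alarm_price"),  # optional, None if absent
        }
    return components


def alarm_triggered(component, price):
    """Return True if an alarm price is set and the given price is below it."""
    limit = component["alarm_price"]
    return limit is not None and price < limit


example = """
{
  "component1": {"alarm_price": 260, "quantity": 2,
                 "urls": ["https://www.someshop.com/component1"]},
  "component2": {"urls": ["https://www.anothershop.com/component2"]}
}
"""
components = load_components(example)
print(components["component2"]["quantity"])
print(alarm_triggered(components["component1"], 249.0))
```

A component without a "urls" entry is rejected here with a ValueError; how the actual script reacts to such input is not documented.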

Execution

Run scraper.py from within its folder so that it can find components.json. The script prints an overview of the order combination that minimizes the overall cost. It runs once and exits; it does not keep tracking prices in the background. As usual with scraping, be gentle and fair, and do not abuse this program.
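To illustrate what "ideal order" can mean once shipping is included, here is a minimal brute-force sketch. It is not taken from scraper.py: the function `cheapest_order`, the data layout and the assumption that each shop charges one flat shipping fee per order are all hypothetical.

```python
from itertools import product


def cheapest_order(offers, quantities, shipping):
    """Find the cheapest assignment of components to shops (hypothetical helper).

    offers:     {component: {shop: unit_price}}
    quantities: {component: quantity}, missing entries count as 1
    shipping:   {shop: flat shipping cost, charged once per shop used}
    """
    names = list(offers)
    best_cost, best_plan = None, None
    # Try every way of picking one shop per component.
    for choice in product(*(offers[n].items() for n in names)):
        items = sum(price * quantities.get(n, 1)
                    for n, (_, price) in zip(names, choice))
        shops_used = {shop for shop, _ in choice}
        total = items + sum(shipping[s] for s in shops_used)
        if best_cost is None or total < best_cost:
            best_cost = total
            best_plan = {n: shop for n, (shop, _) in zip(names, choice)}
    return best_cost, best_plan


offers = {
    "cpu": {"shopA": 200.0, "shopB": 195.0},
    "ram": {"shopA": 80.0, "shopB": 90.0},
}
cost, plan = cheapest_order(offers, {"ram": 2}, {"shopA": 5.0, "shopB": 8.0})
print(cost, plan)
```

In this made-up data the CPU is cheaper at shopB, yet ordering everything from shopA wins because only one shipping fee is paid. Exhaustive search grows exponentially with the number of components, so it is only suitable for small orders.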

Addition of new shops

If you want to add a new shop, you need to edit the file shops.py and:

Register the significant part of the shop URL in the method Shop._get_shops_dict and define a new class (a child of Shop) for it
Implement the methods _process_soup and get_shipping_cost for the new class. Use the existing classes as a reference for the data you need to scrape.
Add your new urls to the input file!
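The steps above might look roughly like the following sketch. Only the names Shop, _get_shops_dict, _process_soup and get_shipping_cost come from this README; the base class shown here is a stand-in, and the method signatures, the `for_url` and `parse_price` helpers, the `exampleshop` key, the `span.price` selector and the flat shipping rate are assumptions. Check shops.py for the real interface.

```python
class Shop:
    """Stand-in for the base class in shops.py; the real class may differ."""

    def __init__(self, url):
        self.url = url
        self.price = None

    @staticmethod
    def _get_shops_dict():
        # Maps the significant part of a shop URL to its class.
        return {"exampleshop": ExampleShop}

    @classmethod
    def for_url(cls, url):
        # Hypothetical helper: pick the class whose key occurs in the URL.
        for key, shop_cls in cls._get_shops_dict().items():
            if key in url:
                return shop_cls(url)
        raise ValueError(f"no shop registered for {url}")


def parse_price(text):
    """Convert a German-style price such as '1.299,00 €' to a float."""
    cleaned = text.replace("€", "").replace("\xa0", "").strip()
    return float(cleaned.replace(".", "").replace(",", "."))


class ExampleShop(Shop):
    def _process_soup(self, soup):
        # Assumes soup is a BeautifulSoup object and that the price sits
        # in <span class="price">; both are assumptions about the page.
        tag = soup.find("span", class_="price")
        self.price = parse_price(tag.get_text())

    def get_shipping_cost(self):
        return 4.99  # assumed flat rate for this made-up shop


shop = Shop.for_url("https://www.exampleshop.com/component1")
print(type(shop).__name__, shop.get_shipping_cost())
```

Keeping the price parsing in a separate function makes it easy to test without fetching a page.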

License

See the LICENSE for license details.
