A web scraper for nomadlist.com, made to avoid website restrictions.

Last update: Nov 24, 2022

Gypsylist

gypsylist.py is a web scraper for nomadlist.com, made to avoid website restrictions.

nomadlist.com is a website with a lot of information for digital nomads, helping location-independent remote workers find the best places to live and work. Unfortunately, most of this content is restricted unless you are a member of the website.

This script doesn't cover all of the information retrievable from the website; it's just an entry point for evaluating the site without signing up.

Installation

Before using gypsylist, you have to install its requirements:

pip3 install -r requirements.txt

Additionally, since selenium is a dependency, you also have to set up the browser driver. To install it, see: https://www.selenium.dev/documentation/webdriver/getting_started/install_drivers/.

Now you should be ready to run the script.

Usage

To use gypsylist, first browse the nomadlist.com website and apply the filters you need for your research. Then copy the URL path (everything after the domain name) from your browser's address bar.

Pass that path to gypsylist to scrape:

./gypsylist.py --path "safe-places-for-remote-workers-to-live?sort=cost_for_nomad_in_usd&order=asc" --emoji

The expected result looks like this:

#1
🏙️  city: Lisbon
🌎 country: Portugal
⭐️ overall: 4/5
💵 cost: 4/5
📡 internet: 5/5
😀 fun: 5/5
👮 safety: 4/5

...

#440
🏙️  city: Zurich
🌎 country: Switzerland
⭐️ overall: 3/5
💵 cost: 1/5
📡 internet: 5/5
😀 fun: 4/5
👮 safety: 4/5

#441
🏙️  city: Leiden
🌎 country: Netherlands
⭐️ overall: 3/5
💵 cost: 1/5
📡 internet: 5/5
😀 fun: 4/5
👮 safety: 4/5

#442
🏙️  city: Honolulu, Hawaii
🌎 country: United States
⭐️ overall: 4/5
💵 cost: 1/5
📡 internet: 5/5
😀 fun: 5/5
👮 safety: 4/5

#443
🏙️  city: Lake Tahoe, CA
🌎 country: United States
⭐️ overall: 3/5
💵 cost: 1/5
📡 internet: 5/5
😀 fun: 4/5
👮 safety: 4/5

(Always remember --emoji.) Have fun!
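If you would rather not trim the address by hand, the value for --path can be derived from the full URL. The following is a minimal sketch using only the Python standard library; the helper name path_from_url is hypothetical and is not part of gypsylist.py. It assumes the script expects the path without a leading slash and with the query string attached, as in the command above.

```python
from urllib.parse import urlsplit


def path_from_url(url: str) -> str:
    """Return the part of a URL after the domain, shaped like the --path value.

    Hypothetical helper for illustration; gypsylist.py does not ship it.
    """
    parts = urlsplit(url)
    # Drop the leading slash so the result matches the example command.
    path = parts.path.lstrip("/")
    # Keep the query string, since it carries the sort and filter settings.
    return f"{path}?{parts.query}" if parts.query else path


if __name__ == "__main__":
    full_url = (
        "https://nomadlist.com/safe-places-for-remote-workers-to-live"
        "?sort=cost_for_nomad_in_usd&order=asc"
    )
    print(path_from_url(full_url))
```

The printed string can then be quoted and passed to --path exactly as in the usage example.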

Known Issues

This is not what you would call "well-written code" (sorry, Gods of programming). As a result, there are several code smells and bugs that have not been reviewed, due to the short time I dedicated to writing the script.

When you use the --headless / -H parameter to run the browser in headless mode, you will retrieve only the first page of content from the website.

Owner

Alessio Greggi
