A tool for scraping and organizing data from NewsBank API searches

Last update: Jun 17, 2021

nbscraper

Overview

This simple tool automates the process of copying, pasting, and organizing data from NewsBank API searches. Currently, nbscrape only searches print sources in the USA.

Requirements

Access to NewsBank (e.g. via your institution's library)
Python 3

Basic Usage

Call the nbscrape function.
- Arguments include "search", "date_from", and "date_to".
The output is a pandas DataFrame containing all available metadata for each source.
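
The call described above might be prepared as in the following minimal sketch. Only the argument names "search", "date_from", and "date_to" come from this README. The helper build_nbscrape_args is hypothetical, and the ISO date format (YYYY-MM-DD) is an assumption, since the README does not state which format nbscrape expects. The import path for nbscrape is also not given here, so the actual call is left as a comment.

```python
from datetime import date


def build_nbscrape_args(search: str, date_from: date, date_to: date) -> dict:
    """Hypothetical helper: validate inputs and return keyword arguments
    named as in the README ("search", "date_from", "date_to").

    Assumption: dates are passed as ISO strings (YYYY-MM-DD). The README
    does not document the expected date format.
    """
    if not search.strip():
        raise ValueError("search must be a non-empty string")
    if date_from > date_to:
        raise ValueError("date_from must not be later than date_to")
    return {
        "search": search.strip(),
        "date_from": date_from.isoformat(),
        "date_to": date_to.isoformat(),
    }


# Example search terms and date range, chosen only for illustration.
kwargs = build_nbscrape_args(
    "city council budget", date(2020, 1, 1), date(2020, 12, 31)
)
print(kwargs)

# With nbscraper available (import path not stated in this README):
# df = nbscrape(**kwargs)  # per the README, the result is a pandas DataFrame
```

Checking the date range before the call avoids sending a search that cannot return results, which fits the recommendation below to run the tool only once the final search terms are settled.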

Disclaimer

This tool is to be used in compliance with terms of service outlined by your institution and NewsBank. As such, it is suggested that you use this tool for research purposes only, once you have settled on your final search terms. This is not an exploratory tool. The purpose of nbscraper is to alleviate the tedium of having to click through 50 pages one by one and to manually save sources' metadata.
