A Python package that can be used to download post and comment data from Reddit.

Overview

Reddit Data Collector

Reddit Data Collector is a Python package that allows a user to collect post and comment data from Reddit. It is built on top of the Python module PRAW, which stands for "The Python Reddit API Wrapper". It aims to make it very simple for a user to collect data from Reddit for further analysis (e.g. Natural Language Processing), without having to learn the inner workings of PRAW or the Reddit API.

The main functionalities provided by the package currently include:

  1. Ability to collect a sample of post data and comment data from Reddit by simply providing the subreddit names that you wish to collect data from.

  2. Ability to convert that data into a pandas DataFrame in order to inspect it and save it for further use.

  3. Ability to seamlessly update an existing .csv file of sample data previously collected with the package by adding newly collected sample data to it.
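
The update step in item 3 can be pictured with a minimal sketch. This is not the package's own implementation and does not call its API. It uses only Python's standard csv module, and the function name update_csv, the key column "id", and the row layout are all assumptions made for illustration. It merges newly collected rows into an existing .csv file and skips any row whose identifier is already present.

```python
import csv


def update_csv(path, new_rows, key="id"):
    """Add rows from new_rows to the CSV at path, skipping duplicate keys.

    Hypothetical helper for illustration only: `key` names an assumed
    unique identifier column, and each new row is a dict with the same
    columns as the existing file.
    """
    # Read the existing file, keeping its column order for the rewrite.
    with open(path, newline="", encoding="utf-8") as f:
        reader = csv.DictReader(f)
        fieldnames = reader.fieldnames
        rows = list(reader)

    # Track identifiers already stored so repeated samples are not duplicated.
    seen = {row[key] for row in rows}
    added = 0
    for row in new_rows:
        if row[key] not in seen:
            rows.append(row)
            seen.add(row[key])
            added += 1

    # Write the combined data back to the same file.
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)

    return added
```

Calling update_csv("posts.csv", new_rows) on a file with an "id" column returns the number of rows that were actually added. Whether the package itself removes duplicates this way is not stated in this README.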

It is currently maintained by Nico Van den Hooff.

Installation

Dependencies

Reddit Data Collector requires Python and:

  • pandas (>=1.3.5)
  • praw (>=7.5.0)
  • tqdm (>=4.62.3)

User installation

The recommended way to install Reddit Data Collector is using pip:

pip install reddit-data-collector

How to Use Reddit Data Collector

Please see the examples directory for step by step instructions on how to use Reddit Data Collector.

Development

Important links

Source code

You can check the latest sources with the command:

git clone https://github.com/nicovandenhooff/reddit-data-collector.git

Contributing

To learn more about making a contribution to Reddit Data Collector, please see the contributing file.

Potential Ideas for Contribution

  • Add ability to collect images from Reddit posts that contain them.
  • Add author information to post and comment data. The Reddit API is currently inconsistent with suspended and deleted author data, so this functionality has not been built in yet.

Testing

After installation, you can launch the test suite, which is contained in tests/tests.py. Note that you need pytest >= 6.2.5 installed. Launch the test suite by following these steps from the project's root directory:

  1. Open tests.py with the following command:
open tests/tests.py

  2. Comment out lines 24 to 30, and change the values in DataCollector() on line 32 to your Reddit credentials.

  3. Run the following command:
pytest tests/tests.py

Project History

The project was started in January 2022 by Nico Van den Hooff as a side project while he was completing the UBC Master of Data Science Project. Nico wanted to obtain a sample of posts and comments from Reddit, but noticed that while PRAW existed and provided seamless access to Reddit's API, there was no package available that allowed for a simple method to collect this data.

Inspiration

Certain sections of this README file were inspired by the scikit-learn README.


Releases(v1.1.0)
  • v1.1.0(Mar 14, 2022)

    [1.1.0] - 2022-03-13

    Changed

    • Changed get_data in reddit_data_collector.py to return pandas DataFrame by default
    • Updated tests for the above
  • v1.0.2(Jan 15, 2022)

    [1.0.2] - 2022-01-14

    Fixed

    • Updated _check_subreddit_exists in reddit_data_collector.py to check both names as .lower()
    • Updated tests for the above

    Changed

    • Updated README to include instructions on coverage tests
  • v1.0.1(Jan 12, 2022)

    [1.0.1] - 2022-01-12

    Fixed

    • Fixed the spelling of the separate argument in the to_pandas function of reddit_data_collector.io.py, which was previously spelt seperate

    Changed

    • Update example use and move to /examples
    • Update PyPi link in docs to working link
    • Add new potential ideas for contribution
  • v1.0.0(Jan 7, 2022)

Owner
Nico Van den Hooff
UBC Master of Data Science