Tools for collecting social media data around focal events

Overview

Social Media Focal Events

The focalevents codebase provides tools for organizing data collected around focal events on social media.

It is often difficult to organize data from multiple API queries. For example, we may collect tweets when a hashtag starts trending by using Twitter’s filter stream. Later, we may make a separate query to the search endpoint to backfill our stream with what we missed before we started it, or update it with tweets that occurred since we stopped it. We may also want to get reply threads, quote tweets, or user timelines based on the tweets we collected. All of these queries are related to a common focal event—the hashtag—but they require several separate calls to the API. It is easy for these multiple queries to result in many disjoint files, making it difficult to organize, merge, update, backfill, and preprocess them quickly and reliably.

To address these issues, focalevents can be used to organize social media focal event data collected from Twitter’s v2 API using academic credentials and PostgreSQL. It is easy to do any of the following with the tools here:

  • Query Twitter’s full archive or filter stream for focal event data
  • Backfill and update those queries with additional data
  • Collect conversation threads and quote tweets of focal event tweets
  • Retrieve full user timelines for any user tweeting during a focal event

All of these functionalities are easy, single line commands, rather than long multi-line scripts, as are typically needed to read IDs, query the API, output data, and merge it with existing data. This allows researchers to design more complex studies of social media data, and spend more time focusing on data analysis, rather than data storage and maintenance.

Installation and Documentation

The repository's code can be downloaded directly from Github, or cloned using git:

git clone https://github.com/ryanjgallagher/focalevents

See the full documentation for more information about installing, configuring, and using the focalevent tools.

A Note

The code here is written and maintained by a single person. First and foremost, it has been designed to help them manage their own data and create replicable pipelines. They are sharing it in the hope that it may help others who have similar workflows and are interested in organizing their Twitter data according to focal events using PostgreSQL.

Requests for enhancements or additions to the code will likely be declined if the author does not anticipate using them in their own research. It is highly unlikely that the code will ever be adapted to work with databases other than PostgreSQL. Further, general problems with database setup or conflicts with pre-existing database structures are beyond the scope of this project and will not be addressed.

Owner
Ryan Gallagher
Network science PhD student merging networks and NLP for computational social science
Ryan Gallagher
Mpis-ex7 - Implementation of tasks 1, 2, 3 for Metody Probabilistyczne i Statystyka Lista 7

Implementations of task 1, 2 and 3 from here Author: Maciej Bazela Index: 261743 Each task was implemented in Python 3. I've used Cython to speed up e

Maciej Bazela 1 Feb 27, 2022
Backtest framework based on DAGs

MultitaskQueue It's a simple framework based on three composed concepts: Task: A task is the smaller unit of execution or simple a node in the DAG, ev

4 Dec 09, 2021
A Python3 script to decode an encoded VBScript file, often seen with a .vbe file extension

vbe-decoder.py Decode one or multiple encoded VBScript files, often seen with a .vbe file extension. Usage usage: vbe-decoder.py [-h] [-o output] file

John Hammond 147 Nov 15, 2022
Frappe app for authentication, can be used with FrappeVue-AdminLTE

Frappeauth App Frappe app for authentication, can be used with FrappeVue-AdminLTE

Anthony C. Emmanuel 9 Apr 13, 2022
CaskDB is a disk-based, embedded, persistent, key-value store based on the Riak's bitcask paper, written in Python.

CaskDB - Disk based Log Structured Hash Table Store CaskDB is a disk-based, embedded, persistent, key-value store based on the Riak's bitcask paper, w

886 Dec 27, 2022
Recreate the joys of Office Assistant from the comfort of the Python interpreter

Recreate the joys of Office Assistant from the comfort of the Python interpreter.

Louis Sven Goulet 3 May 21, 2022
A tool to build reproducible wheels for you Python project or for all of your dependencies

asaman: Amra Saman (আমরা সমান) This is a tool to build reproducible wheels for your Python project or for all of your dependencies. What this means is

Kushal Das 14 Aug 05, 2022
An AddOn storing wireguard configuration

Wireguard Database Connector Overview Development Status: 0.1.7 (alpha) First of all, I'd like to thank Jared McKnight for wireguard who inspired me t

Markus Neubauer 3 Dec 30, 2021
Predicting Global Crop Yield for World Hunger

Crop Yield And Global Famine - The fifth project I created during my time at General Assembly. I completed this project with three other classmates in the span of three weeks. Most of my work was dir

Adam Muhammad Klesc 2 Jun 19, 2022
Python client library for the Databento API

Databento Python Library The Databento Python client library provides access to the Databento API for both live and historical data, from applications

Databento, Inc. 35 Dec 24, 2022
Scientific Programming: A Crash Course

Scientific Programming: A Crash Course Welcome to the Scientific Programming course. My name is Jon Carr and I am a postdoc in Davide Crepaldi's lab.

Jon Carr 1 Feb 17, 2022
Zotero references script (and app)

A little script (and PyInstaller build) for a very specific, somewhat hack-ish purpose: managing and exporting project references with Zotero and its API.

Marius Rödder 0 Dec 05, 2021
JSEngine is a simple wrapper of Javascript engines.

JSEngine This is a simple wrapper of Javascript engines, it wraps the Javascript interpreter for Python use. There are two ways to call interpreters,

11 Dec 18, 2022
Simple module with some functions such as generate password (get_random_string)

Simple module with some functions such as generate password (get_random_string), fix unicode strings, size converter, dynamic console, read/write speed checker, etc.

Dmitry 2 Dec 03, 2022
Oblique Strategies for Python

Oblique Strategies for Python

Łukasz Langa 3 Feb 17, 2022
Get a list of the top-10 rejected libraries in your WhiteSource inventory

WhiteSource Top 10 Rejected Libraries Generate a spreadsheet listing the 10 most common libraries in your WhiteSource inventory that were rejected by

WhiteSource-PS-tools 10 Mar 23, 2022
Gaia: a chrome extension that curates environmental news of a company

Gaia - Gaia: Your Environment News Curator Call for Code 2021 Gaia: a chrome extension that curates environmental news of a company Explore the docs »

4 Mar 19, 2022
Airbrake Python

airbrake-python Note. Python 3.4+ are advised to use new Airbrake Python notifier which supports async API and code hunks. Python 2.7 users should con

Airbrake 51 Dec 22, 2022
Yandex Media Browser

Браузер медиа для плагина Yandex Station Включайте музыку, плейлисты и радио на Яндекс.Станции из Home Assistant! Скриншот Корневой раздел: Библиотека

Alexander Ryazanov 35 Dec 19, 2022
The learning agent learns firstly approaching to the football and then kicking the football to the target position

Football Court This project utilized Pytorch and Tensorflow so that the learning agent learns firstly approaching to the football and then kicking the

1 Nov 19, 2021