I³ Tracker for Essential Open Innovation Datasets

Overview

I³ Tracker for Essential Open Innovation Datasets

This repository is set up to track, version, and contribute updates to the I³ Essential Open Innovation Dataset Index, which consists of lists of datasets and tools relevant to Innovation Data. This index may be collaboratively edited, either by making edits to markdown files contained in this repository, or editing metadata in the Google Sheet.

The repository checks the Google Sheet for changes every 5min (and will update the site if there are any), and will also re-build the site automatically when somebody makes an edit via git. The site is generated from markdown files in this repository using the static site generator Jekyll.

Add/edit a Dataset using Git

Each record in the index has a corresponding markdown file (auto-generated) in the folder datasets/. These files contain the basic metadata associated with the record in the frontmatter, and also allow more long-form information, such as details of queries, images, and other written information, to be added. Both of these things are editable.

When a markdown file is added to the datasets/ folder, a GitHub action publishes the metadata in the frontmatter to the Google Sheet, and to the archive csv, to keep the records up to date. This script calls various metadata scrapers to automatically pull information like permalinks, citation information, and versioning. Once the file has been successfully committed, a second action will run to refresh the state of the website to reflect the edits.

Contribution Steps

  1. fork the repository, create a markdown in the folder 'datasets' based on the template file
  2. add as much metadata as you like, and create a pull request in this repository
  3. all being well, this should automatically merge. if not, you can check the GitHub actions log, or open an issue. (make sure it's in the correct folder, and has a .md file extension before doing so)

If the dataset is hosted on a platform with parseable citation metadata (Dataverse, Zenodo, ICPSR, and major university repositories are examples of these), then the tool will automatically pull most of the data associated with the dataset -- fields that will auto-fill are indicated by a comment. If the dataset is hosted on e.g. a personal site, then you might want to include some more information -- but ultimately, only a title and URL is really necessary. However you fill out your dataset, a uuid and timestamp will be generated for it automatically; these aren't fields you need to include (hence not included in the template).

The reason we've done this is to save you from copy-pasting a lot of information from existing repositories, and to make it easier for you to curate more useful and harder-to-scrape metadata -- such as the timeframes of datasets, links to code and documentation, and datasets that might be built on top of it but don't use an easy-to-parse citation. So definitely prioritise these fields!

If you're unsure of how to make a pull request, github has some good guides to doing this. You can also just make an edit to the Google Sheet, which will have an equivalent effect.

If there's a piece of metadata you think we should collect but don't, please add it to the frontmatter of the markdown files you contribute. (Nothing will break!) Then open an issue mentioning the new field, so that we can discuss adding it to the repository officially too.

To contribute a new dataset via pull request, please use the template file datasets/_template.md as a reference:

---
title: #required
url: #required
doi: #scrapeable
citation: #scrapeable
description: #scrapeable
timeframe:
documentation:
error_metrics:
code:
versioning: #scrapeable
terms_of_use: #scrapeable
tags:
references:
---


body text. info about `queries`, links and images goes here :)

Collections

The site also indexes collections, which are pages containing thematic information about datasets, tools and resources. These are housed in the folder collections/. The collection intro.md is an example -- this particular collection is also rendered on the front page of the site.

In the same manner as datasets, collection files can be added or edited using pull requests, where the repository is forked, and additions or edits to the collections can be made. The collections are not currently tracked via Google Sheets, and so may only be edited via git.

To create a new collection, the collection template may be copied to use as a reference:

---
title:
author:
tags:
---

Collections are a way to list resources around a theme, relevant to a research agenda or set of papers, or as an introduction to various aspects of the field. They are formatted in markdown:

To list a dataset that's in the index, use a relative link, e.g.

```markdown
[local dataset name](/datasets/dataset_shortname)

Dataset shortnames can be found either by looking at the urls directly, or through the 'shortnames' column of the Google Sheet.

Index

A versioned .csv file containing the index may be accessed in the folder index_archive. If you'd like to browse and query either sheet, you can do so using Github's Flat Data tool here. The Github Action that pulls the sheet is based on Dolthub's Gsheets-to-csv action.

Screenshot 2021-07-13 at 13 35 49

The FLARE team's open-source library to disassemble Common Intermediate Language (CIL) instructions.

dncil is a Common Intermediate Language (CIL) disassembly library written in Python that supports parsing the header, instructions, and exception hand

MANDIANT 95 Jan 08, 2023
Taichi is a parallel programming language for high-performance numerical computations.

Taichi is a parallel programming language for high-performance numerical computations.

Taichi Developers 22k Jan 04, 2023
pythonOS: An operating system kernel made in python and assembly

pythonOS An operating system kernel made in python and assembly Wait what? It uses a custom compiler called snek that implements a part of python3.9 (

Abbix 69 Dec 23, 2022
IPython: Productive Interactive Computing

IPython: Productive Interactive Computing Overview Welcome to IPython. Our full documentation is available on ipython.readthedocs.io and contains info

IPython 15.6k Dec 31, 2022
Originally used during Marketplace.tf's open period, this program was used to get the profit of items bought with keys and sold for dollars.

Originally used during Marketplace.tf's open period, this program was used to get the profit of items bought with keys and sold for dollars. Practically useless for me now, but can be used as an exam

BoggoTV 1 Dec 11, 2021
Calculate the efficient frontier

关于 代码主要参考Fábio Neves的文章,你可以在他的文章中找到一些细节性的解释

Wyman Lin 104 Nov 11, 2022
RestMapper takes the pain out of integrating with RESTful APIs.

python-restmapper RestMapper takes the pain out of integrating with RESTful APIs. It removes all of the complexity with writing API-specific code, and

Lionheart Software 8 Oct 31, 2020
Some usefull scripts for the Nastran's 145 solution (Flutter Analysis) using the pyNastran package.

nastran-aero-flutter This project is intended to analyse the Supersonic Panel Flutter using the NASTRAN software. The project uses the pyNastran and t

zuckberj 11 Nov 16, 2022
Simple Python-based web application to allow UGM students to fill their QR presence list without having another device in hand.

Praesentia Praesentia is a simple Python-based web application to allow UGM students to fill their QR presence list without having another device in h

loncat 20 Sep 29, 2022
easy_sbatch - Batch submitting Slurm jobs with script templates

easy_sbatch - Batch submitting Slurm jobs with script templates

Wei Shen 13 Oct 11, 2022
Tool that adds githuh profile views to ur acc

Tool that adds githuh profile views to ur acc

Lamp 2 Nov 28, 2021
A python package for batch import of resume attachments to be parsed in HrFlow.

HrFlow Importer Description A python package for batch import of resume attachments to be parsed in HrFlow. hrflow-importer is an open-source project

HrFlow.ai (ex: Riminder.net) 3 Nov 15, 2022
To check my COVID-19 vaccine appointment, I wrote an infinite loop that sends me a Whatsapp message hourly using Twilio and Selenium. It works on my Raspberry Pi computer.

COVID-19_vaccine_appointment To check my COVID-19 vaccine appointment, I wrote an infinite loop that sends me a Whatsapp message hourly using Twilio a

Ayyuce Demirbas 24 Dec 17, 2022
一个Graia-Saya的插件仓库

一个Graia-Saya的插件仓库 这是一个存储基于 Graia-Saya 的插件的仓库 如果您有这类项目

ZAPHAKIEL 111 Oct 24, 2022
Convert Photoshop curves (acv) to xmp presets for Lightroom

acv2xmp Convert Photoshop curves (acv) to Lightroom preset (xmp) acv2xmp.py Basic command prompt that relies on standard library only and can be used

5 Feb 06, 2022
Your missing PO formatter and linter

pofmt Your missing PO formatter and linter Features Wrap msgid and msgstr with a constant max width. Can act as a pre-commit hook. Display lint errors

Frost Ming 5 Mar 22, 2022
A jokes python module

Made with Python3 (C) @FayasNoushad Copyright permission under MIT License License - https://github.com/FayasNoushad/Jokes/blob/main/LICENSE Deploy

Fayas Noushad 3 Nov 28, 2021
A script to add issues to a project in Github based on label or status.

Add Github Issues to Project (Beta) A python script to move Github issues to a next-gen (beta) Github Project Getting Started These instructions will

Kate Donaldson 3 Jan 16, 2022
A simple chatbot that I made for school project

Chatbot: Python A simple chatbot that I made for school Project. Tho this chatbot is dumb sometimes, but it's not too bad lol. Check it Out! FAQ How t

Prashant 2 Nov 13, 2021
A gamey, snakey esoteric programming language

Snak Snak is an esolang based on the classic snake game. Installation You will need python3. To use the visualizer, you will need the curses module. T

David Rutter 3 Oct 10, 2022