Info for The Great DataTas plot-a-thon

Overview

The Great DataTas plot-a-thon

Datatas is organising a Data Visualisation competition: The Great DataTas plot-a-thon We will be using Tidy Tuesday data from 2020-09-22 for our competition. We have chosen a non-scientific dataset for participants to create beautiful plots, so anyone can enter our competition.

Our rules for entry are:

  • Entries should have code and final figure available in a public GitHub repository (contact us if you need help with this).
  • Entries can use any programming language.
  • Entries must have a Creative Commons license because we will share them via our social media and GitHub repo.
  • Editing software is allowed but discouraged.
  • External supporting data is allowed and encouraged.
  • Entries must be submitted by the end of day of Sunday, November 21.
    Optional:
  • Provide your twitter handle.

The criteria for choosing the winner are:

  • Three independent judges from EchoView, BOM and UTAS/CLEX will be judging the entries.
  • Figures must be reproducible.
  • Scripts must be easy to read and understand.
  • Figures must convey main message clearly.
  • Scripts must be developed by person submitting the entry, no copying.

Picture of the Coveted Trophy

Event Timing

We will be having an info session on Wednesday, November 10th to explain all details of this competition. Join us in person at IMAS or online to find out how you can participate.

  • Competition starts at 4pm November 10th following the DataTas event
  • Entries are due midnight (11:59 pm) on November 21st, giving you 10 days to put together your entry. During those ten days, we encourage you to work with others and share ideas, but your final entry must be your own work.
  • The winners will be announced at our final event of the year, on Thursday, November 25th at 2pm in the IMAS Aurora Lecture Theatre. Only one person can win the DataTas trophy and be crowned the plot-a-thon champion!

About the Data (From The Tidy Tuesday Page)

Picture of the Himalayas from Wikipedia

Himalayan Climbing Expeditions

The data this week comes from The Himalayan Database.

The Himalayan Database is a compilation of records for all expeditions that have climbed in the Nepal Himalaya. The database is based on the expedition archives of Elizabeth Hawley, a longtime journalist based in Kathmandu, and it is supplemented by information gathered from books, alpine journals and correspondence with Himalayan climbers.

The data cover all expeditions from 1905 through Spring 2019 to more than 465 significant peaks in Nepal. Also included are expeditions to both sides of border peaks such as Everest, Cho Oyu, Makalu and Kangchenjunga as well as to some smaller border peaks. Data on expeditions to trekking peaks are included for early attempts, first ascents and major accidents.

h/t to Alex Cookson for sharing and cleaning this data!

This blog post by Alex Cookson explores the data in greater detail.

I don't want to underplay that there are some positives and some awful negatives for native Sherpa climbers. One-third of Everest deaths are Sherpa Climbers.

Also National Geographic has 5 Ways to help the Sherpas of Everest.

Get the data here

# Get the Data

# Read in with tidytuesdayR package 
# Install from CRAN via: install.packages("tidytuesdayR")
# This loads the readme and all the datasets for the week of interest

# Either ISO-8601 date or year/week works!

tuesdata <- tidytuesdayR::tt_load('2020-09-22')
tuesdata <- tidytuesdayR::tt_load(2020, week = 39)

climbers <- tuesdata$climbers

# Or read in the data manually

members <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2020/2020-09-22/members.csv')
expeditions <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2020/2020-09-22/expeditions.csv')
peaks <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2020/2020-09-22/peaks.csv')

For other languages, see the additional files in this repository.

Data Dictionary

peaks.csv

variable class description
peak_id character Unique identifier for peak
peak_name character Common name of peak
peak_alternative_name character Alternative name of peak (for example, the "Mount Everest" is "Sagarmatha" in Nepalese)
height_metres double Height of peak in metres
climbing_status character Whether the peak has been climbed
first_ascent_year double Year of first successful ascent, if applicable
first_ascent_country character Country name(s) of expedition members part of the first ascent. Can have multiple values if members were from different countries. Country name is as of date of ascent (for example, "W Germany" for ascents before 1990).
first_ascent_expedition_id character Unique identifier for expedition. Can be linked to expeditions or members tables.

expeditions.csv

variable class description
expedition_id character Unique identifier for expedition. Can be linked to peaks or members tables.
peak_id character Unique identifier for peak. Can be linked to peaks table.
peak_name character Common name for peak
year double Year of expedition
season character Season of expedition (Spring, Summer, etc.)
basecamp_date date Date of expedition arrival at basecamp
highpoint_date date Date of expedition summiting the peak for the first time or, if peak wasn't reached, date of reaching its highpoint
termination_date date Date the expedition was terminated
termination_reason character Primary reason the expedition was terminated. There are two possibilities for a successful expeditions, depending on whether the main peak or a sub-peak was summitted.
highpoint_metres double Elevation highpoint of the expedition
members double Number of expedition members. For expeditions in Nepal, this is usually the number of foreigners listed on the expedition permit. For expeditions in China, this is usually the number of non-hired members.
member_deaths double Number of expeditions members who died
hired_staff double Number of hired staff who went above basecamp
hired_staff_deaths double Number of hired staff who died
oxygen_used logical Whether oxygen was used by at least one member of the expedition
trekking_agency character Name of the trekking agency

members.csv

variable class description
expedition_id character Unique identifier for expedition. Can be linked to peaks or members tables.
member_id character Unique identifier for the person. This is not consistent across expeditions, so you cannot use a single member_id to look up all expeditions a person was part of.
peak_id character Unique identifier for peak. Can be linked to peaks table.
peak_name character Common name for peak
year double Year of expedition
season character Season of expedition (Spring, Summer, etc.)
sex character Sex of the person
age double Age of the person. Depending on the best available data, this could be as of the summit date, the date of death, or the date of arrival at basecamp.
citizenship character Citizenship of the person
expedition_role character Role of the person on the expedition
hired logical Whether the person was hired by the expedition
highpoint_metres double Elevation highpoint of the person
success logical Whether the person was successful in summitting a main peak or sub-peak, depending on the goal of expedition
solo logical Whether the person attempted a solo ascent
oxygen_used logical Whether the person used oxygen
died logical Whether the person died
death_cause character Primary cause of death
death_height_metres double Height at which the person died
injured logical Whether the person was injured
injury_type character Primary cause of injury
injury_height_metres double Height at which the injury occurred

Cleaning Script (R Only, see files in repo for Python and R.)

# Libraries
library(tidyverse)
library(janitor)


# Peaks
peaks <- read_csv("./himalayan-expeditions/raw/peaks.csv") %>%
  transmute(
    peak_id = PEAKID,
    peak_name = PKNAME,
    peak_alternative_name = PKNAME2,
    height_metres = HEIGHTM,
    climbing_status = PSTATUS,
    first_ascent_year = PYEAR,
    first_ascent_country = PCOUNTRY,
    first_ascent_expedition_id = PEXPID
  ) %>%
  mutate(
    climbing_status = case_when(
      climbing_status == 0 ~ "Unknown",
      climbing_status == 1 ~ "Unclimbed",
      climbing_status == 2 ~ "Climbed"
    )
  )

# Create small dataframe of peak names to join to other dataframes
peak_names <- peaks %>%
  select(peak_id, peak_name)

# Expeditions
expeditions <- read_csv("./himalayan-expeditions/raw/exped.csv") %>%
  left_join(peak_names, by = c("PEAKID" = "peak_id")) %>%
  transmute(
    expedition_id = EXPID,
    peak_id = PEAKID,
    peak_name,
    year = YEAR,
    season = SEASON,
    basecamp_date = BCDATE,
    highpoint_date = SMTDATE,
    termination_date = TERMDATE,
    termination_reason = TERMREASON,
    # Highpoint of 0 is most likely missing value
    highpoint_metres = ifelse(HIGHPOINT == 0, NA, HIGHPOINT),
    members = TOTMEMBERS,
    member_deaths = MDEATHS,
    hired_staff = TOTHIRED,
    hired_staff_deaths = HDEATHS,
    oxygen_used = O2USED,
    trekking_agency = AGENCY
  ) %>%
  mutate(
    termination_reason = case_when(
      termination_reason == 0 ~ "Unknown",
      termination_reason == 1 ~ "Success (main peak)",
      termination_reason == 2 ~ "Success (subpeak)",
      termination_reason == 3 ~ "Success (claimed)",
      termination_reason == 4 ~ "Bad weather (storms, high winds)",
      termination_reason == 5 ~ "Bad conditions (deep snow, avalanching, falling ice, or rock)",
      termination_reason == 6 ~ "Accident (death or serious injury)",
      termination_reason == 7 ~ "Illness, AMS, exhaustion, or frostbite",
      termination_reason == 8 ~ "Lack (or loss) of supplies or equipment",
      termination_reason == 9 ~ "Lack of time",
      termination_reason == 10 ~ "Route technically too difficult, lack of experience, strength, or motivation",
      termination_reason == 11 ~ "Did not reach base camp",
      termination_reason == 12 ~ "Did not attempt climb",
      termination_reason == 13 ~ "Attempt rumoured",
      termination_reason == 14 ~ "Other"
    ),
    season = case_when(
      season == 0 ~ "Unknown",
      season == 1 ~ "Spring",
      season == 2 ~ "Summer",
      season == 3 ~ "Autumn",
      season == 4 ~ "Winter"
    )
  )

members <-
  read_csv("./himalayan-expeditions/raw/members.csv", guess_max = 100000) %>%
  left_join(peak_names, by = c("PEAKID" = "peak_id")) %>%
  transmute(
    expedition_id = EXPID,
    member_id = paste(EXPID, MEMBID, sep = "-"),
    peak_id = PEAKID,
    peak_name,
    year = MYEAR,
    season = MSEASON,
    sex = SEX,
    age = CALCAGE,
    citizenship = CITIZEN,
    expedition_role = STATUS,
    hired = HIRED,
    # Highpoint of 0 is most likely missing value
    highpoint_metres = ifelse(MPERHIGHPT == 0, NA, MPERHIGHPT),
    success = MSUCCESS,
    solo = MSOLO,
    oxygen_used = MO2USED,
    died = DEATH,
    death_cause = DEATHTYPE,
    # Height of 0 is most likely missing value
    death_height_metres = ifelse(DEATHHGTM == 0, NA, DEATHHGTM),
    injured = INJURY,
    injury_type = INJURYTYPE,
    # Height of 0 is most likely missing value
    injury_height_metres = ifelse(INJURYHGTM == 0, NA, INJURYHGTM)
  ) %>%
  mutate(
    season = case_when(
      season == 0 ~ "Unknown",
      season == 1 ~ "Spring",
      season == 2 ~ "Summer",
      season == 3 ~ "Autumn",
      season == 4 ~ "Winter"
    ),
    age = ifelse(age == 0, NA, age),
    death_cause = case_when(
      death_cause == 0 ~ "Unspecified",
      death_cause == 1 ~ "AMS",
      death_cause == 2 ~ "Exhaustion",
      death_cause == 3 ~ "Exposure / frostbite",
      death_cause == 4 ~ "Fall",
      death_cause == 5 ~ "Crevasse",
      death_cause == 6 ~ "Icefall collapse",
      death_cause == 7 ~ "Avalanche",
      death_cause == 8 ~ "Falling rock / ice",
      death_cause == 9 ~ "Disappearance (unexplained)",
      death_cause == 10 ~ "Illness (non-AMS)",
      death_cause == 11 ~ "Other",
      death_cause == 12 ~ "Unknown"
    ),
    injury_type = case_when(
      injury_type == 0 ~ "Unspecified",
      injury_type == 1 ~ "AMS",
      injury_type == 2 ~ "Exhaustion",
      injury_type == 3 ~ "Exposure / frostbite",
      injury_type == 4 ~ "Fall",
      injury_type == 5 ~ "Crevasse",
      injury_type == 6 ~ "Icefall collapse",
      injury_type == 7 ~ "Avalanche",
      injury_type == 8 ~ "Falling rock / ice",
      injury_type == 9 ~ "Disappearance (unexplained)",
      injury_type == 10 ~ "Illness (non-AMS)",
      injury_type == 11 ~ "Other",
      injury_type == 12 ~ "Unknown"
    ),
    death_cause = ifelse(died, death_cause, NA_character_),
    death_height_metres = ifelse(died, death_height_metres, NA),
    injury_type = ifelse(injured, injury_type, NA_character_),
    injury_height_metres = ifelse(injured, injury_height_metres, NA)
  )


### Write to CSV
write_csv(expeditions, "./himalayan-expeditions/expeditions.csv")
write_csv(members, "./himalayan-expeditions/members.csv")
write_csv(peaks, "./himalayan-expeditions/peaks.csv")

D-Analyst : High Performance Visualization Tool

D-Analyst : High Performance Visualization Tool D-Analyst is a high performance data visualization built with python and based on OpenGL. It allows to

4 Apr 14, 2022
Project coded in Python using Pandas to look at changes in chase% for batters facing a pitcher first time through the order vs. thrid time

Project coded in Python using Pandas to look at changes in chase% for batters facing a pitcher first time through the order vs. thrid time

Jason Kraynak 1 Jan 07, 2022
A python-generated website for visualizing the novel coronavirus (COVID-19) data for Greece.

COVID-19-Greece A python-generated website for visualizing the novel coronavirus (COVID-19) data for Greece. Data sources Data provided by Johns Hopki

Isabelle Viktoria Maciohsek 23 Jan 03, 2023
A set of useful perceptually uniform colormaps for plotting scientific data

Colorcet: Collection of perceptually uniform colormaps Build Status Coverage Latest dev release Latest release Docs What is it? Colorcet is a collecti

HoloViz 590 Dec 31, 2022
Data aggregated from the reports found at the MCPS COVID Dashboard into a set of visualizations.

Montgomery County Public Schools COVID-19 Visualizer Contents About this project Data Support this project About this project Data All data we use can

James 3 Jan 19, 2022
Productivity Tools for Plotly + Pandas

Cufflinks This library binds the power of plotly with the flexibility of pandas for easy plotting. This library is available on https://github.com/san

Jorge Santos 2.7k Dec 30, 2022
Movie recommendation using RASA, TigerGraph

Demo run: The below video will highlight the runtime of this setup and some sample real-time conversations using the power of RASA + TigerGraph, Steps

Sudha Vijayakumar 3 Sep 10, 2022
Typical: Fast, simple, & correct data-validation using Python 3 typing.

typical: Python's Typing Toolkit Introduction Typical is a library devoted to runtime analysis, inference, validation, and enforcement of Python types

Sean 171 Jan 02, 2023
Type-safe YAML parser and validator.

StrictYAML StrictYAML is a type-safe YAML parser that parses and validates a restricted subset of the YAML specification. Priorities: Beautiful API Re

Colm O'Connor 1.2k Jan 04, 2023
Statistical data visualization using matplotlib

seaborn: statistical data visualization Seaborn is a Python visualization library based on matplotlib. It provides a high-level interface for drawing

Michael Waskom 10.2k Dec 30, 2022
A tool for creating Toontown-style nametags in Panda3D

Toontown-Nametag Toontown-Nametag is a tool for creating Toontown Online/Toontown Rewritten-style nametags in Panda3D. It contains a function, createN

BoggoTV 2 Dec 23, 2021
A Python Library for Self Organizing Map (SOM)

SOMPY A Python Library for Self Organizing Map (SOM) As much as possible, the structure of SOM is similar to somtoolbox in Matlab. It has the followin

Vahid Moosavi 497 Dec 29, 2022
股票行情实时数据接口-A股,完全免费的沪深证券股票数据-中国股市,python最简封装的API接口

股票行情实时数据接口-A股,完全免费的沪深证券股票数据-中国股市,python最简封装的API接口,包含日线,历史K线,分时线,分钟线,全部实时采集,系统包括新浪腾讯双数据核心采集获取,自动故障切换,STOCK数据格式成DataFrame格式,可用来查询研究量化分析,股票程序自动化交易系统.为量化研究者在数据获取方面极大地减轻工作量,更加专注于策略和模型的研究与实现。

dev 572 Jan 08, 2023
Simple python implementation with matplotlib to manually fit MIST isochrones to Gaia DR2 color-magnitude diagrams

Simple python implementation with matplotlib to manually fit MIST isochrones to Gaia DR2 color-magnitude diagrams

Karl Jaehnig 7 Oct 22, 2022
python partial dependence plot toolbox

PDPbox python partial dependence plot toolbox Motivation This repository is inspired by ICEbox. The goal is to visualize the impact of certain feature

Li Jiangchun 723 Jan 07, 2023
Minimal Ethereum fee data viewer for the terminal, contained in a single python script.

Minimal Ethereum fee data viewer for the terminal, contained in a single python script. Connects to your node and displays some metrics in real-time.

48 Dec 05, 2022
阴阳师后台全平台(使用网易 MuMu 模拟器)辅助。支持御魂,觉醒,御灵,结界突破,秘闻副本,地域鬼王。

阴阳师后台全平台辅助 Python 版本:Python 3.8.3 模拟器:网易 MuMu | 雷电模拟器 模拟器分辨率:1024*576 显卡渲染模式:兼容(OpenGL) 兼容 Windows 系统和 MacOS 系统 思路: 利用 adb 截图后,使用 opencv 找图找色,模拟点击。使用

简讯 27 Jul 09, 2022
The implementation of the paper "HIST: A Graph-based Framework for Stock Trend Forecasting via Mining Concept-Oriented Shared Information".

The HIST framework for stock trend forecasting The implementation of the paper "HIST: A Graph-based Framework for Stock Trend Forecasting via Mining C

Wentao Xu 111 Jan 03, 2023
📊📈 Serves up Pandas dataframes via the Django REST Framework for use in client-side (i.e. d3.js) visualizations and offline analysis (e.g. Excel)

📊📈 Serves up Pandas dataframes via the Django REST Framework for use in client-side (i.e. d3.js) visualizations and offline analysis (e.g. Excel)

wq framework 1.2k Jan 01, 2023