This is the solution for 2nd rank in Kaggle competition: Feedback Prize - Evaluating Student Writing.

Overview

Feedback Prize - Evaluating Student Writing

This is the solution for 2nd rank in Kaggle competition: Feedback Prize - Evaluating Student Writing. The competition can be found here: https://www.kaggle.com/competitions/feedback-prize-2021/

Datasets required

Use this command to convert roberta-large to LSG

$ python convert_roberta_checkpoint.py \
                        --initial_model roberta-large \
                        --model_name lsg-roberta-large \
                        --max_sequence_length 1536

Follow following instructions to manually add fast tokenizer to transformer library:

# The following is necessary if you want to use the fast tokenizer for deberta v2 or v3
# This must be done before importing transformers
import shutil
from pathlib import Path

# Path to installed transformer library
transformers_path = Path("/opt/conda/lib/python3.7/site-packages/transformers")

input_dir = Path("../input/deberta-v2-3-fast-tokenizer")

convert_file = input_dir / "convert_slow_tokenizer.py"
conversion_path = transformers_path/convert_file.name

if conversion_path.exists():
    conversion_path.unlink()

shutil.copy(convert_file, transformers_path)
deberta_v2_path = transformers_path / "models" / "deberta_v2"

for filename in ['tokenization_deberta_v2.py', 'tokenization_deberta_v2_fast.py']:
    filepath = deberta_v2_path/filename
    if filepath.exists():
        filepath.unlink()

    shutil.copy(input_dir/filename, filepath)

After this ../input directory should look something like this.

.
├── input
│   ├── feedback-prize-2021
│   │   ├── train/
│   │   ├── test/
│   │   ├── sample_submission.csv
│   │   └── train.csv
│   ├── lsg-roberta-large
│   │   ├── config.json
│   │   ├── merges.txt
│   │   ├── modeling.py
│   │   ├── pytorch_model.bin
│   │   ├── special_tokens_map.json
│   │   ├── tokenizer.json
│   │   ├── tokenizer_config.json
│   │   └── vocab.json
│   ├── deberta-v2-3-fast-tokenizer
│   │   ├── convert_slow_tokenizer.py
│   │   ├── deberta__init__.py
│   │   ├── tokenization_auto.py
│   │   ├── tokenization_deberta_v2.py
│   │   ├── tokenization_deberta_v2_fast.py
│   │   └── transformers__init__.py
│   └── feedbackgroupshufflesplit1337
│       └── groupshufflesplit_1337.p

or you can change the DATA_BASE_DIR in SETTINGS.json to download the files in your desired location.

Models and Training

  • Deberta large, Deberta xlarge, Deberta v2 xlarge, Deberta v3 large, Funnel transformer large and BigBird are trained using trainer.py

Example:

$ python trainer.py --fold 0 --pretrained_model google/bigbird-roberta-large

where pretrained_model can be microsoft/deberta-large, microsoft/deberta-xlarge, microsoft/deberta-v2-xlarge, microsoft/deberta-v3-large, funnel-transformer/large or google/bigbird-roberta-large

  • Deberta large with LSTM head and jaccard loss is trained using debertabilstm_trainer.py

Example:

$ python debertabilstm_trainer.py --fold 0
  • Longformer large with LSTM head is trained using longformerwithbilstm_trainer.py

Example:

$ python longformerwithbilstm_trainer.py --fold 0
  • LSG Roberta is trained with lsgroberta_trainer.py

Example:

$ python lsgroberta_trainer.py --fold 0
  • YOSO is trained with yoso_trainer.py

Example:

$ python yoso_trainer.py --fold 0

Inference

After training all the models, the outputs were pushed to Kaggle Datasets.

And the final inference kernel can be found here: https://www.kaggle.com/code/cdeotte/2nd-place-solution-cv741-public727-private740?scriptVersionId=90301836

Solution writeup: https://www.kaggle.com/competitions/feedback-prize-2021/discussion/313389

Owner
Udbhav Bamba
Deep Learning || Computer Vision || Machine Learning
Udbhav Bamba
PSTR: End-to-End One-Step Person Search With Transformers (CVPR2022)

PSTR (CVPR2022) This code is an official implementation of "PSTR: End-to-End One-Step Person Search With Transformers (CVPR2022)". End-to-end one-step

Jiale Cao 28 Dec 13, 2022
FMA: A Dataset For Music Analysis

FMA: A Dataset For Music Analysis Michaël Defferrard, Kirell Benzi, Pierre Vandergheynst, Xavier Bresson. International Society for Music Information

Michaël Defferrard 1.8k Dec 29, 2022
Deep Ensembling with No Overhead for either Training or Testing: The All-Round Blessings of Dynamic Sparsity

[ICLR 2022] Deep Ensembling with No Overhead for either Training or Testing: The All-Round Blessings of Dynamic Sparsity by Shiwei Liu, Tianlong Chen, Zahra Atashgahi, Xiaohan Chen, Ghada Sokar, Elen

VITA 18 Dec 31, 2022
PFLD pytorch Implementation

PFLD-pytorch Implementation of PFLD A Practical Facial Landmark Detector by pytorch. 1. install requirements pip3 install -r requirements.txt 2. Datas

zhaozhichao 669 Jan 02, 2023
A general framework for deep learning experiments under PyTorch based on pytorch-lightning

torchx Torchx is a general framework for deep learning experiments under PyTorch based on pytorch-lightning. TODO list gan-like training wrapper text

Yingtian Liu 6 Mar 17, 2022
Multispectral Object Detection with Yolov5

Multispectral-Object-Detection Intro Official Code for Cross-Modality Fusion Transformer for Multispectral Object Detection. Multispectral Object Dete

Richard Fang 121 Jan 01, 2023
MvtecAD unsupervised Anomaly Detection

MvtecAD unsupervised Anomaly Detection This respository is the unofficial implementations of DFR: Deep Feature Reconstruction for Unsupervised Anomaly

0 Feb 25, 2022
An algorithm that handles large-scale aerial photo co-registration, based on SURF, RANSAC and PyTorch autograd.

An algorithm that handles large-scale aerial photo co-registration, based on SURF, RANSAC and PyTorch autograd.

Luna Yue Huang 41 Oct 29, 2022
Motion planning environment for Sampling-based Planners

Sampling-Based Motion Planners' Testing Environment Sampling-based motion planners' testing environment (sbp-env) is a full feature framework to quick

Soraxas 23 Aug 23, 2022
Paddle Graph Learning (PGL) is an efficient and flexible graph learning framework based on PaddlePaddle

DOC | Quick Start | 中文 Breaking News !! 🔥 🔥 🔥 OGB-LSC KDD CUP 2021 winners announced!! (2021.06.17) Super excited to announce our PGL team won TWO

1.5k Jan 06, 2023
Keras implementation of "One pixel attack for fooling deep neural networks" using differential evolution on Cifar10 and ImageNet

One Pixel Attack How simple is it to cause a deep neural network to misclassify an image if an attacker is only allowed to modify the color of one pix

Dan Kondratyuk 1.2k Dec 26, 2022
PolyphonicFormer: Unified Query Learning for Depth-aware Video Panoptic Segmentation

PolyphonicFormer: Unified Query Learning for Depth-aware Video Panoptic Segmentation Winner method of the ICCV-2021 SemKITTI-DVPS Challenge. [arxiv] [

Yuan Haobo 38 Jan 03, 2023
Aerial Single-View Depth Completion with Image-Guided Uncertainty Estimation (RA-L/ICRA 2020)

Aerial Depth Completion This work is described in the letter "Aerial Single-View Depth Completion with Image-Guided Uncertainty Estimation", by Lucas

ETHZ V4RL 70 Dec 22, 2022
[ICCV 2021] Self-supervised Monocular Depth Estimation for All Day Images using Domain Separation

ADDS-DepthNet This is the official implementation of the paper Self-supervised Monocular Depth Estimation for All Day Images using Domain Separation I

LIU_LINA 52 Nov 24, 2022
Tutorial: Introduction to Graph Machine Learning, with Jupyter notebooks

GraphMLTutorialNLDL22 Tutorial NLDL22: Introduction to Graph Machine Learning, with Jupyter notebooks This tutorial takes place during the conference

UiT Machine Learning Group 3 Jan 10, 2022
1st place solution in CCF BDCI 2021 ULSEG challenge

1st place solution in CCF BDCI 2021 ULSEG challenge This is the source code of the 1st place solution for ultrasound image angioma segmentation task (

Chenxu Peng 30 Nov 22, 2022
ProFuzzBench - A Benchmark for Stateful Protocol Fuzzing

ProFuzzBench - A Benchmark for Stateful Protocol Fuzzing ProFuzzBench is a benchmark for stateful fuzzing of network protocols. It includes a suite of

155 Jan 08, 2023
Proof-Of-Concept Piano-Drums Music AI Model/Implementation

Rock Piano "When all is one and one is all, that's what it is to be a rock and not to roll." ---Led Zeppelin, "Stairway To Heaven" Proof-Of-Concept Pi

Alex 4 Nov 28, 2021
MTA:SA Server Configer.

MTAConfiger MTA:SA Server Configer. Hi 👋 , I'm Alireza A Python Developer Boy 🔭 I’m currently working on my C# projects 🌱 I’m currently Learning CS

3 Jun 07, 2022
DeepAL: Deep Active Learning in Python

DeepAL: Deep Active Learning in Python Python implementations of the following active learning algorithms: Random Sampling Least Confidence [1] Margin

Kuan-Hao Huang 583 Jan 03, 2023