GIANT

Code and data for paper "GIANT: Scalable Creation of a Web-scale Ontology"

https://arxiv.org/pdf/2004.02118.pdf

Please cite our paper if this project is helpful to your work or research, thanks.

How to run

Download files: Stanford CoreNLP (https://stanfordnlp.github.io/CoreNLP/download.html) and the Chinese word embedding (https://ai.tencent.com/ailab/nlp/embedding.html). For the word embedding, see the note at the bottom.

Revise paths and put files in the appropriate locations. File paths are defined in common/constants.py, so go to that file and change the paths according to your own setup. Do the same for the paths defined in some of the other source files.
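
Before launching a run, it can help to check that every configured path actually exists. The snippet below is only a sketch of such a check: the dictionary keys and placeholder paths are hypothetical and are not the names used in common/constants.py, so substitute the real constants from that file.

```python
from pathlib import Path


def find_missing_paths(paths):
    """Return the names whose configured path does not exist on disk."""
    return [name for name, location in paths.items() if not Path(location).exists()]


# Hypothetical entries for illustration only; the real names and values
# are whatever you set in common/constants.py.
EXAMPLE_PATHS = {
    "corenlp_dir": "/path/to/stanford-corenlp",
    "embedding_file": "/path/to/chinese_word_embedding.txt",
}

if __name__ == "__main__":
    for name in find_missing_paths(EXAMPLE_PATHS):
        print(f"missing: {name} -> {EXAMPLE_PATHS[name]}")
```
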
Test run

python3 GIANT_main.py \
    --data_type concept \
    --train_file "../../../../Datasets/original/concept/concepts.json" \
    --emb_tags \
    --task_output_dims 2 \
    --tasks "phrase" \
    --edge_types_list "seq" "dep" "contain" "synonym" \
    --d_model 32 \
    --layers 3 \
    --num_bases 5 \
    --epochs 10 \
    --mode train \
    --debug
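
If you prefer to launch the test run from a script, the same command can be assembled as an argument list. This is a minimal sketch that only reuses the flags shown above; the helper name build_train_command and its parameters are made up for illustration and are not part of the repository.

```python
import shlex


def build_train_command(train_file, epochs=10, debug=True):
    """Assemble the test-run command as an argv list (hypothetical helper)."""
    cmd = [
        "python3", "GIANT_main.py",
        "--data_type", "concept",
        "--train_file", train_file,
        "--emb_tags",
        "--task_output_dims", "2",
        "--tasks", "phrase",
        "--edge_types_list", "seq", "dep", "contain", "synonym",
        "--d_model", "32",
        "--layers", "3",
        "--num_bases", "5",
        "--epochs", str(epochs),
        "--mode", "train",
    ]
    if debug:
        cmd.append("--debug")
    return cmd


if __name__ == "__main__":
    cmd = build_train_command("../../../../Datasets/original/concept/concepts.json")
    # Print a shell-safe version of the command for inspection.
    print(shlex.join(cmd))
    # To actually start training, run from the directory containing GIANT_main.py:
    #   import subprocess
    #   subprocess.run(cmd, check=True)
```

Passing a list rather than a single string avoids shell quoting problems with the multi-valued --edge_types_list argument.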

Note: adding --processed_emb to the command above prevents the word embeddings from being re-processed, which is time consuming. In this case you also do not need to download the Chinese word embedding file, which is quite big. In our experience, adding word embeddings as part of the node features is not very helpful for our tasks, so it is safe to ignore the word embedding features in your experiments. If you do not use word embeddings, you may need to revise data_loader.py to avoid some runtime errors. You can still try to improve results with word embeddings.
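
For readers unfamiliar with the idea of skipping re-processing, the general pattern is to cache the processed result on disk and reload it on later runs. The sketch below illustrates that pattern only; it is an assumption about how such a flag could work, not the code behind --processed_emb, and load_or_process, the cache path, and process_fn are all hypothetical names.

```python
import pickle
from pathlib import Path


def load_or_process(cache_path, process_fn):
    """Load a cached result if present; otherwise compute it and cache it.

    Hypothetical helper: process_fn stands in for whatever expensive
    embedding preprocessing you want to avoid repeating.
    """
    cache_path = Path(cache_path)
    if cache_path.exists():
        with cache_path.open("rb") as f:
            return pickle.load(f)
    result = process_fn()
    with cache_path.open("wb") as f:
        pickle.dump(result, f)
    return result
```

The first call runs process_fn and writes the cache file; later calls with the same path read the file and never call process_fn.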


Owner

Excalibur
