This code is the implementation of the paper "Coherence-Based Distributed Document Representation Learning for Scientific Documents".

Overview

Introduction

This code is the implementation of the paper "Coherence-Based Distributed Document Representation Learning for Scientific Documents".

If you find this code useful, please cite the following paper:

@article{tan2022coherence,
  title = {Coherence-Based Distributed Document Representation Learning for Scientific Documents},
  author = {Tan, Shicheng and Zhao, Shu and Zhang, Yanping},
  journal = {arXiv},
  year = {2022},
  type = {Journal Article}
}

Run

  1. Installation environment (ref. requirements.txt)
  2. Download data: Link: https://pan.baidu.com/s/1EEJk0_P55Ov5ReXsmyVZPA Password: rkh0
  3. python _av_CTE.py

信息检索数据运行指南

  1. 数据处理(4个文件):使用“...data helper-IR.py”获取3份数据,原始数据处理暂存文件、原始数据处理暂存文件的语料、构建的数据集,然后使用“_aj_get dataset corpus.py”获得 构建的数据集的语料
  2. 词向量训练(4个文件):使用“_ak_get word embedding.py”训练第一步的2个语料得到2个词表和2个词向量文件,glove需要去除后缀名“.txt”
  3. 运行5次“_al_em-avg.py”得到5个结果,avg-word2vec、avg-word2vec(globe)、avg-glove、avg-glove(globe)、random embedding
  4. 运行“_ac_tf-idf.py”得到一个距离矩阵和1个结果,矩阵用于CTE方法
  5. LDA、doc2vec、BM25、LSI、GPT2、XLNet、GPT、Transformer-XL、XLM 对应文件各运行一次得到9个结果
  6. 运行“_ah_WMD.py”4次得到4个结果,WMD-word2vec、WMD-word2vec(globe)、WMD-glove、WMD-glove(globe)
  7. 运行“_at_BERT.py”2次得到2个结果,BERT-Large uncased、BERT-Large uncased(wwm)
  8. 运行“_at_ELMo.py”2次得到2个结果,ELMo-Original(5.5B)、ELMo-Original(5.5B,级联)
  9. 运行“_av_CET.py”13次得到13个结果,基于 random embedding 等13种基础词向量
Owner
tsc
Artificial general intelligence
tsc
Progressive Coordinate Transforms for Monocular 3D Object Detection

Progressive Coordinate Transforms for Monocular 3D Object Detection This repository is the official implementation of PCT. Introduction In this paper,

58 Nov 06, 2022
Code for ICCV 2021 paper "Distilling Holistic Knowledge with Graph Neural Networks"

HKD Code for ICCV 2021 paper "Distilling Holistic Knowledge with Graph Neural Networks" cifia-100 result The implementation of compared methods are ba

Wang Yucheng 30 Dec 18, 2022
ULMFiT for Genomic Sequence Data

Genomic ULMFiT This is an implementation of ULMFiT for genomics classification using Pytorch and Fastai. The model architecture used is based on the A

Karl 276 Dec 12, 2022
Open-source implementation of Google Vizier for hyper parameters tuning

Advisor Introduction Advisor is the hyper parameters tuning system for black box optimization. It is the open-source implementation of Google Vizier w

tobe 1.5k Jan 04, 2023
(AAAI2022) Style Mixing and Patchwise Prototypical Matching for One-Shot Unsupervised Domain Adaptive Semantic Segmentation

SM-PPM This is a Pytorch implementation of our paper "Style Mixing and Patchwise Prototypical Matching for One-Shot Unsupervised Domain Adaptive Seman

W-zx-Y 10 Dec 07, 2022
Safe Local Motion Planning with Self-Supervised Freespace Forecasting, CVPR 2021

Safe Local Motion Planning with Self-Supervised Freespace Forecasting By Peiyun Hu, Aaron Huang, John Dolan, David Held, and Deva Ramanan Citing us Yo

Peiyun Hu 90 Dec 01, 2022
CROSS-LINGUAL ABILITY OF MULTILINGUAL BERT: AN EMPIRICAL STUDY

M-BERT-Study CROSS-LINGUAL ABILITY OF MULTILINGUAL BERT: AN EMPIRICAL STUDY Motivation Multilingual BERT (M-BERT) has shown surprising cross lingual a

CogComp 1 Feb 28, 2022
Tensor-Based Quantum Machine Learning

TensorLy_Quantum TensorLy-Quantum is a Python library for Tensor-Based Quantum Machine Learning that builds on top of TensorLy and PyTorch. Website: h

TensorLy 85 Dec 03, 2022
This repository contains an implementation of ConvMixer for the ICLR 2022 submission "Patches Are All You Need?".

Patches Are All You Need? 🤷 This repository contains an implementation of ConvMixer for the ICLR 2022 submission "Patches Are All You Need?". Code ov

ICLR 2022 Author 934 Dec 30, 2022
Gapmm2: gapped alignment using minimap2 (align transcripts to genome)

gapmm2: gapped alignment using minimap2 This tool is a wrapper for minimap2 to r

Jon Palmer 2 Jan 27, 2022
Bonnet: An Open-Source Training and Deployment Framework for Semantic Segmentation in Robotics.

Bonnet: An Open-Source Training and Deployment Framework for Semantic Segmentation in Robotics. By Andres Milioto @ University of Bonn. (for the new P

Photogrammetry & Robotics Bonn 314 Dec 30, 2022
Flexible Networks for Learning Physical Dynamics of Deformable Objects (2021)

Flexible Networks for Learning Physical Dynamics of Deformable Objects (2021) By Jinhyung Park, Dohae Lee, In-Kwon Lee from Yonsei University (Seoul,

Jinhyung Park 0 Jan 09, 2022
Alignment Attention Fusion framework for Few-Shot Object Detection

AAF framework Framework generalities This repository contains the code of the AAF framework proposed in this paper. The main idea behind this work is

Pierre Le Jeune 20 Dec 16, 2022
Dimension Reduced Turbulent Flow Data From Deep Vector Quantizers

Dimension Reduced Turbulent Flow Data From Deep Vector Quantizers This is an implementation of A Physics-Informed Vector Quantized Autoencoder for Dat

DreamSoul 3 Sep 12, 2022
This program writes christmas wish programmatically. It is using turtle as a pen pointer draw christmas trees and stars.

Introduction This is a simple program is written in python and turtle library. The objective of this program is to wish merry Christmas programmatical

Gunarakulan Gunaretnam 1 Dec 25, 2021
CowHerd is a partially-observed reinforcement learning environment

CowHerd is a partially-observed reinforcement learning environment, where the player walks around an area and is rewarded for milking cows. The cows try to escape and the player can place fences to h

Danijar Hafner 6 Mar 06, 2022
Source code and dataset of the paper "Contrastive Adaptive Propagation Graph Neural Networks forEfficient Graph Learning"

CAPGNN Source code and dataset of the paper "Contrastive Adaptive Propagation Graph Neural Networks forEfficient Graph Learning" Paper URL: https://ar

1 Mar 12, 2022
Self-supervised Multi-modal Hybrid Fusion Network for Brain Tumor Segmentation

JBHI-Pytorch This repository contains a reference implementation of the algorithms described in our paper "Self-supervised Multi-modal Hybrid Fusion N

FeiyiFANG 5 Dec 13, 2021
UmlsBERT: Clinical Domain Knowledge Augmentation of Contextual Embeddings Using the Unified Medical Language System Metathesaurus

UmlsBERT: Clinical Domain Knowledge Augmentation of Contextual Embeddings Using the Unified Medical Language System Metathesaurus General info This is

71 Oct 25, 2022
I3-master-layout - Simple master and stack layout script

Simple master and stack layout script | ------ | ----- | | | | | Ma

Tobias S 18 Dec 05, 2022