open-information-extraction-system, build open-knowledge-graph(SPO, subject-predicate-object) by pyltp(version==3.4.0)

Last update: Nov 02, 2022

Overview

Open-Information-Extraction-System

中文开放信息抽取系统, open-information-extraction-system, build open-knowledge-graph(SPO, subject-predicate-object) by pyltp(version==3.4.0)

码源分析

基于LTP依存句法分析(DP, dependency parsing)的中文开放信息抽取系统(rule-based)。

增加并列关系、左附加关系、右附加关系等(递归实现);
这里的依存句法分析只适合简单短句，过长句子、口语化句子dp效果不好会很影响下游抽取。

结果展示(部分)

{
    "ques": "郑州是那个省的",
    "answer": [
        "河南"
    ],
    "desc": "郑州是河南省省会城市，周边有洛阳、开封、新郑、新密、许昌等城市",
    "SPO": [
        [
            "郑州",
            "是",
            "那个省"
        ]
    ]
},
{
    "ques": "格林童话《灰姑娘》中,灰姑娘参加舞会时所做的车是由哪种植物变成的?",
    "answer": [
        "南瓜"
    ],
    "desc": "这时，有一位仙女出现了，帮助她摇身一变成为高贵的千金小姐，并将老鼠变成马夫，南瓜变成马车，又变了一套漂亮的衣服和一双水晶（玻璃）鞋给灰姑娘穿上。",
    "SPO": [
        [
            "灰姑娘",
            "参加",
            "舞会"
        ],
        [
            "灰姑娘",
            "参加",
            "舞会"
        ],
        [
            "做车",
            "是",
            "变成"
        ]
    ]
 },
 {
    "ques": "中国农历的哪个节气有着北方吃饺子、南方吃汤圆的习俗?",
    "answer": [
        "冬至"
    ],
    "desc": "在冬至节，中国北方有冬至日吃饺子的习俗，南方某些地方有冬至日吃汤圆、粉糍粑的习俗，传说在汉朝的医圣张仲景体念家乡乡民在寒冬中工作的辛苦，在冬至那天利用羊肉等祛寒的药材包在面皮中，作成耳朵的样子，给乡民们治病补身，这个药方的名字...",
    "SPO": [
        [
            "中国农历哪个节气",
            "有着",
            "吃饺子习俗"
        ],
        [
            "北方",
            "吃",
            "饺子"
        ],
        [
            "南方",
            "吃",
            "汤圆"
        ]
    ]
}

open-information-extraction-system, build open-knowledge-graph(SPO, subject-predicate-object) by pyltp(version==3.4.0)

Related tags

Overview

Open-Information-Extraction-System

码源分析

结果展示(部分)

资源&依赖

Owner

The official implementation of "BERT is to NLP what AlexNet is to CV: Can Pre-Trained Language Models Identify Analogies?, ACL 2021 main conference"

使用pytorch+transformers复现了SimCSE论文中的有监督训练和无监督训练方法

A2T: Towards Improving Adversarial Training of NLP Models (EMNLP 2021 Findings)

PyTranslator é simultaneamente um editor e tradutor de texto com diversos recursos e interface feito com coração e 100% em Python

🦅 Pretrained BigBird Model for Korean (up to 4096 tokens)

MILES is a multilingual text simplifier inspired by LSBert - A BERT-based lexical simplification approach proposed in 2018. Unlike LSBert, MILES uses the bert-base-multilingual-uncased model, as well as simple language-agnostic approaches to complex word identification (CWI) and candidate ranking.

fastai ulmfit - Pretraining the Language Model, Fine-Tuning and training a Classifier

[NeurIPS 2021] Code for Learning Signal-Agnostic Manifolds of Neural Fields

Search for documents in a domain through Google. The objective is to extract metadata

A machine learning model for analyzing text for user sentiment and determine whether its a positive, neutral, or negative review.

Script and models for clustering LAION-400m CLIP embeddings.

Comprehensive-E2E-TTS - PyTorch Implementation

TextFlint is a multilingual robustness evaluation platform for natural language processing tasks,

LeBenchmark: a reproducible framework for assessing SSL from speech

Yes it's true :broken_heart:

Legal text retrieval for python

Creating an LSTM model to generate music

Adversarial Examples for Extreme Multilabel Text Classification

Module for automatic summarization of text documents and HTML pages.

Code-autocomplete, a code completion plugin for Python