Open-CyKG: An Open Cyber Threat Intelligence Knowledge Graph

Last update: Jan 05, 2023


Model Description

Open-CyKG is a framework that is constructed using an attention-based neural Open Information Extraction (OIE) model to extract valuable cyber threat information from unstructured Advanced Persistent Threat (APT) reports. More specifically, we first identify relevant entities by developing a neural cybersecurity Named Entity Recognizer (NER) that aids in labeling relation triples generated by the OIE model. Afterwards, the extracted structured data is canonicalized to build the KG by employing fusion techniques using word embeddings.
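The canonicalization step can be illustrated with a minimal sketch. This is not the authors' implementation: the toy 3-dimensional vectors, the `canonicalize` helper, the similarity threshold, and the sample triples are all assumptions made for illustration, and a simple greedy cosine-similarity grouping stands in for whatever fusion technique the notebooks actually use. The idea is only that mentions with similar embeddings are merged under one canonical name before the triples enter the KG.

```python
import math

# Toy vectors standing in for real word embeddings (illustrative assumption).
toy_embeddings = {
    "trojan": [0.90, 0.10, 0.00],
    "trojan horse": [0.88, 0.15, 0.02],
    "c2 server": [0.05, 0.90, 0.20],
    "command and control server": [0.08, 0.87, 0.25],
    "keylogger": [0.10, 0.20, 0.95],
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def canonicalize(embeddings, threshold=0.95):
    """Greedily map each mention to the first representative it is similar to.

    A mention that matches no existing representative becomes a new one.
    The threshold is a hypothetical value chosen for the toy vectors.
    """
    representatives = []
    mapping = {}
    for mention, vec in embeddings.items():
        for rep in representatives:
            if cosine(vec, embeddings[rep]) >= threshold:
                mapping[mention] = rep
                break
        else:
            representatives.append(mention)
            mapping[mention] = mention
    return mapping


canonical = canonicalize(toy_embeddings)

# Hypothetical extracted triples; after canonicalization they collapse into one.
raw_triples = [
    ("trojan horse", "contacts", "command and control server"),
    ("trojan", "contacts", "c2 server"),
]
canonical_triples = sorted(
    {(canonical[s], p, canonical[o]) for s, p, o in raw_triples}
)
print(canonical_triples)
```

With real embeddings the threshold would need tuning, and the greedy order-dependent grouping could be swapped for a proper clustering method.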

Datasets

OIE dataset: Malware DB
NER dataset: Microsoft Security Bulletins (MSB) and Cyber Threat Intelligence reports (CTI)

For dataset files, please refer to the appropriate reference in the paper.

Code:

Dependencies

Compatible with Python 3.x
Dependencies can be installed as specified in Block 1 in the respective notebooks.
All the code was implemented on Google Colab using a GPU. Please ensure that you are using the version specified in the "Installation and Drives" block.
Make sure to adapt the code based on your dataset and choice of word embeddings.
To utilize CRF in the NER model using Keras, please make sure to:

-- Use TensorFlow version and Keras version:

-- In tensorflow_backend.py and Optimizer.py, write the following 2 lines, then restart the runtime:
```
import tensorflow.compat.v1 as tf
tf.disable_v2_behavior()
```

For more details on how the exact process was carried out and the final hyper-parameters used, please refer to the Open-CyKG paper.

Citing:

Please cite Open-CyKG if you use any of this material in your work.

I. Sarhan and M. Spruit, Open-CyKG: An Open Cyber Threat Intelligence Knowledge Graph, Knowledge-Based Systems (2021), doi: https://doi.org/10.1016/j.knosys.2021.107524.

@article{SARHAN2021107524,
title = {Open-CyKG: An Open Cyber Threat Intelligence Knowledge Graph},
journal = {Knowledge-Based Systems},
volume = {233},
pages = {107524},
year = {2021},
issn = {0950-7051},
doi = {https://doi.org/10.1016/j.knosys.2021.107524},
url = {https://www.sciencedirect.com/science/article/pii/S0950705121007863},
author = {Injy Sarhan and Marco Spruit},
keywords = {Cyber Threat Intelligence, Knowledge Graph, Named Entity Recognition, Open Information Extraction, Attention network},
abstract = {Instant analysis of cybersecurity reports is a fundamental challenge for security experts as an immeasurable amount of cyber information is generated on a daily basis, which necessitates automated information extraction tools to facilitate querying and retrieval of data. Hence, we present Open-CyKG: an Open Cyber Threat Intelligence (CTI) Knowledge Graph (KG) framework that is constructed using an attention-based neural Open Information Extraction (OIE) model to extract valuable cyber threat information from unstructured Advanced Persistent Threat (APT) reports. More specifically, we first identify relevant entities by developing a neural cybersecurity Named Entity Recognizer (NER) that aids in labeling relation triples generated by the OIE model. Afterwards, the extracted structured data is canonicalized to build the KG by employing fusion techniques using word embeddings. As a result, security professionals can execute queries to retrieve valuable information from the Open-CyKG framework. Experimental results demonstrate that our proposed components that build up Open-CyKG outperform state-of-the-art models. Our implementation of Open-CyKG is publicly available at https://github.com/IS5882/Open-CyKG.}
}

Implementation References:

Contextualized word embeddings: Flair's word embedding documentation, and the Hugging Face list of all pretrained models: https://huggingface.co/transformers/v2.3.0/pretrained_models.html
Functions in blocks 3 and 9 are originally referenced from the work of Stanovsky et al., with some modifications to the functions. Please refer to and cite their work: Stanovsky, Gabriel, et al. "Supervised open information extraction." Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers). 2018.
OIE implements Bahdanau attention (https://arxiv.org/pdf/1409.0473.pdf), following a Towards Data Science blog post.
NER reference blog
Knowledge Graph fusion is motivated by the work of CESI: Vashishth, Shikhar, Prince Jain, and Partha Talukdar. "Cesi: Canonicalizing open knowledge bases using embeddings and side information." Proceedings of the 2018 World Wide Web Conference. 2018.
Neo4J was used for Knowledge Graph visualization.
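As a rough sketch of how extracted triples could be prepared for loading into Neo4j, the snippet below builds a parameterized Cypher `MERGE` query and the matching parameter dictionaries. The node label `Entity`, the relationship type `RELATION`, the `triples_to_params` helper, and the sample triples are hypothetical and not taken from the Open-CyKG notebooks. No database connection is made here.

```python
# Hypothetical sketch: the label `Entity`, the type `RELATION`, and the sample
# triples below are illustrative assumptions, not part of Open-CyKG.

# Relationship types cannot be passed as Cypher parameters, so the predicate
# is stored as a property on a generic relationship instead.
MERGE_TRIPLE_QUERY = (
    "MERGE (s:Entity {name: $subj}) "
    "MERGE (o:Entity {name: $obj}) "
    "MERGE (s)-[:RELATION {predicate: $pred}]->(o)"
)


def triples_to_params(triples):
    """Turn (subject, predicate, object) tuples into Cypher parameter dicts.

    Strings are stripped and lowercased, and exact duplicates after that
    normalization are skipped.
    """
    params = []
    seen = set()
    for subj, pred, obj in triples:
        key = (subj.strip().lower(), pred.strip().lower(), obj.strip().lower())
        if key in seen:
            continue
        seen.add(key)
        params.append({"subj": key[0], "pred": key[1], "obj": key[2]})
    return params


example_triples = [
    ("ExampleMalware", "connects to", "command and control server"),
    ("examplemalware", "Connects To", "command and control server"),  # duplicate
    ("ExampleMalware", "drops", "keylogger"),
]

cypher_params = triples_to_params(example_triples)
for params in cypher_params:
    print(MERGE_TRIPLE_QUERY, params)
```

Each parameter dictionary could then be passed, together with the query string, to a Neo4j driver session. Using `MERGE` rather than `CREATE` keeps repeated loads from duplicating nodes and edges.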

Please cite the appropriate reference(s) in your work.


Owner

Injy Sarhan
