Hidden Markov Models in Python, with scikit-learn like API

Last update: Jan 03, 2023

Overview

hmmlearn

hmmlearn is a set of algorithms for unsupervised learning and inference of Hidden Markov Models. For supervised learning of HMMs and similar models, see seqlearn.

Note: This package is under limited-maintenance mode.

Important links

Official source code repo: https://github.com/hmmlearn/hmmlearn
HTML documentation (stable release): https://hmmlearn.readthedocs.org/en/stable
HTML documentation (development version): https://hmmlearn.readthedocs.org/en/latest

Dependencies

The required dependencies to use hmmlearn are:

Python >= 3.5
NumPy >= 1.10
scikit-learn >= 0.16

You also need Matplotlib >= 1.1.1 to run the examples and pytest >= 2.6.0 to run the tests.

Installation

Requires a C compiler and Python headers.

To install from PyPI:

pip install --upgrade --user hmmlearn

To install from the repo:

pip install --user git+https://github.com/hmmlearn/hmmlearn

Comments

Memory error: HMM for MFCC features

I am trying to build an audio vocabulary from MFCC features by applying an HMM. The MFCC features cover 10 speakers and I need 50 states per speaker, so I used N = 500 states. This throws a MemoryError, but it works fine with N = 100 states.

Is the MemoryError caused by the machine's limited resources, or by improper initialization?

Here is my code:

import numpy as np
from hmmlearn import hmm
import librosa
import matplotlib.pyplot as plt

def getMFCC(episode):
    # getPathToGroundtruth is the poster's own helper (not shown)
    filename = getPathToGroundtruth(episode)
    y, sr = librosa.load(filename)  # y: audio time series, sr: sampling rate
    data = librosa.feature.mfcc(y=y, sr=sr)
    return data

def hmm_init(n, data):  # n = number of states, data = MFCC feature matrix
    states = []
    model = hmm.GaussianHMM(n_components=n, covariance_type="full")
    model.transmat_ = np.ones((n, n)) / n
    model.startprob_ = np.ones(n) / n
    fit = model.fit(data.T)
    z = fit.decode(data.T, algorithm='viterbi')[1]
    states.append(z)
    return states

data = getMFCC(1)  # MFCC features as a NumPy array of shape [20 x 56829]
N = 500
D = len(data)  # number of features
states = hmm_init(N, data)

In [23]: run Final_hmm.py
---------------------------------------------------------------------------
MemoryError                               Traceback (most recent call last)
/home/elancheliyan/Final_hmm.py in <module>()
     73 D= len(data)
     74 
---> 75 states = hmm_init(N,data)
     76 states.dump("states")
     77 

/home/elancheliyan/Final_hmm.py in hmm_init(n, data)
     57     model.startprob_ = np.ones(N) / N
     58 
---> 59     fit = model.fit(data.T)
     60 
     61     z=fit.decode(data.T,algorithm='viterbi')[1]

/cal/homes/elancheliyan/.local/lib/python3.5/site-packages/hmmlearn-0.2.1-py3.5-linux-x86_64.egg/hmmlearn/base.py in fit(self, X, lengths)
    434                 self._accumulate_sufficient_statistics(
    435                     stats, X[i:j], framelogprob, posteriors, fwdlattice,
--> 436                     bwdlattice)
    437 
    438             # XXX must be before convergence check, because otherwise

/cal/homes/elancheliyan/.local/lib/python3.5/site-packages/hmmlearn-0.2.1-py3.5-linux-x86_64.egg/hmmlearn/hmm.py in _accumulate_sufficient_statistics(self, stats, obs, framelogprob, posteriors, fwdlattice, bwdlattice)
    221                                           posteriors, fwdlattice, bwdlattice):
    222         super(GaussianHMM, self)._accumulate_sufficient_statistics(
--> 223             stats, obs, framelogprob, posteriors, fwdlattice, bwdlattice)
    224 
    225         if 'm' in self.params or 'c' in self.params:

/cal/homes/elancheliyan/.local/lib/python3.5/site-packages/hmmlearn-0.2.1-py3.5-linux-x86_64.egg/hmmlearn/base.py in _accumulate_sufficient_statistics(self, stats, X, framelogprob, posteriors, fwdlattice, bwdlattice)
    620                 return
    621 
--> 622             lneta = np.zeros((n_samples - 1, n_components, n_components))
    623             _hmmc._compute_lneta(n_samples, n_components, fwdlattice,
    624                                  log_mask_zero(self.transmat_),

MemoryError:
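
The traceback points at the allocation of lneta, an array of shape (n_samples - 1, n_components, n_components). A rough estimate of its size may help answer the question. The sketch below assumes 8-byte float64 entries (the default for np.zeros) and the 56829 frames mentioned in the post; the helper name is hypothetical.

```python
def lneta_bytes(n_samples, n_components, itemsize=8):
    """Bytes needed for an array of shape (n_samples - 1, n_components, n_components)."""
    return (n_samples - 1) * n_components * n_components * itemsize

for n_states in (100, 500):
    gigabytes = lneta_bytes(56829, n_states) / 1e9
    print(f"N = {n_states}: about {gigabytes:.1f} GB")
# N = 100: about 4.5 GB
# N = 500: about 113.7 GB
```

Under these assumptions this single array grows with the square of the number of states, so the failure at N = 500 looks more like a resource limit than an initialization problem.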

opened by epratheeban 25

GMM -> GaussianMixture

In sklearn GMM was replaced by GaussianMixture. See https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/mixture/gmm.py:

class GMM(_GMMBase):
    """
    Legacy Gaussian Mixture Model

    .. deprecated:: 0.18
        This class will be removed in 0.20.
        Use :class:`sklearn.mixture.GaussianMixture` instead.
    """

However, hmmlearn still uses the old version. A pull request is needed to upgrade hmmlearn to work with the newer API.

opened by chanansh 24

Variational inference

@anntzer Leaving an early draft of incorporating Variational Inference training of HMMs so I may receive feedback before I keep going.

Some Notes:

I derive from BaseHMM, and am able to reuse most of it, with a few exceptions.

VariationalGaussianHMM is still incomplete - only Full Covariance is supported.

Tests are lacking.

Up Next:

Finish the different covariance types for Gaussian

Add Mixture of Gaussian Emissions

opened by blckmaxima 23

reduce memory consumption during GMMHMM multi sequence fits

Hi, today I learned about your package, started to use it, faced the memory problem, and came up with a PR that fixes it.

I've extended the lengths option with an additional meaning, currently for GMMHMM only. Curious users will find a way to extend my implementation to other models as well.

This also partially addresses the comment left in https://github.com/hmmlearn/hmmlearn/commit/08dee6640483cda232f7d2fcc7935d4008f4d368:

https://github.com/hmmlearn/hmmlearn/blob/0562ca65756ffb60da836eeeb1845e61767c705b/lib/hmmlearn/hmm.py#L918-L922

I got rid of the unnecessary 'centered' arrays in the stats dict. If you don't want to store the post_comp_mix matrices in the stats, the logic of computing intermediate variables - c_n and c_d for the covariance - should be moved from the _do_mstep to _accumulate_sufficient_statistics function. Since this is my first PR, I've decided not to rummage through your code a lot. In either case, this should be considered in a separate PR, if you will.

Best, Danylo

opened by dizcza 22

simple multinomial example

Hi there!

Using the latest master of hmmlearn, I tried running a simple MultinomialHMM example (code below) that results in the following error:

File "build/bdist.macosx-10.5-x86_64/egg/hmmlearn/base.py", line 307, in decode
ValueError: could not broadcast input array from shape (6) into shape (1)

Could you please tell me what I am doing wrong? My expectation is that applying Viterbi should give me the most probable hidden sequence. However, passing a list of observations doesn't work, whereas passing a single value does.

Thanks!

Vlad

from __future__ import division
import numpy as np
from hmmlearn import hmm

states = ["Rainy", "Sunny"]
n_states = len(states)

observations = ["walk", "shop", "clean"]
n_observations = len(observations)

start_probability = np.array([0.6, 0.4])

transition_probability = np.array([
  [0.7, 0.3],
  [0.4, 0.6]
])

emission_probability = np.array([
  [0.1, 0.4, 0.5],
  [0.6, 0.3, 0.1]
])

model = hmm.MultinomialHMM(n_components=n_states)
model.startprob=start_probability
model.transmat=transition_probability
model.emissionprob=emission_probability

# predict a sequence of hidden states based on visible states
bob_says = [0, 2, 1, 1, 2, 0]
model = model.fit(bob_says)
logprob, alice_hears = model.decode(bob_says, algorithm="viterbi")
print "Bob says:", ", ".join(map(lambda x: observations[x], bob_says))
print "Alice hears:", ", ".join(map(lambda x: states[x], alice_hears))
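
For comparison, the most probable hidden sequence that Viterbi decoding should give for these parameters can be worked out without hmmlearn. The sketch below is a plain-Python illustration, not hmmlearn's implementation: the viterbi function is a hypothetical helper, and it assumes every probability is strictly positive so that the logarithms are defined.

```python
import math

states = ["Rainy", "Sunny"]
start_probability = [0.6, 0.4]
transition_probability = [[0.7, 0.3], [0.4, 0.6]]
emission_probability = [[0.1, 0.4, 0.5], [0.6, 0.3, 0.1]]

def viterbi(obs, startprob, transmat, emissionprob):
    """Return (log-probability, most probable state path) for a discrete HMM."""
    n_states = len(startprob)
    # Work in log space to avoid underflow on long sequences.
    score = [math.log(startprob[s]) + math.log(emissionprob[s][obs[0]])
             for s in range(n_states)]
    backpointers = []
    for o in obs[1:]:
        prev, score, pointers = score, [], []
        for s in range(n_states):
            # Best predecessor of state s at this time step.
            best = max(range(n_states),
                       key=lambda p: prev[p] + math.log(transmat[p][s]))
            pointers.append(best)
            score.append(prev[best] + math.log(transmat[best][s])
                         + math.log(emissionprob[s][o]))
        backpointers.append(pointers)
    # Backtrack from the best final state.
    last = max(range(n_states), key=lambda s: score[s])
    path = [last]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return score[last], path

bob_says = [0, 2, 1, 1, 2, 0]
logprob, alice_hears = viterbi(bob_says, start_probability,
                               transition_probability, emission_probability)
print("Alice hears:", ", ".join(states[s] for s in alice_hears))
# Alice hears: Sunny, Rainy, Rainy, Rainy, Rainy, Sunny
```

When reproducing this with hmmlearn itself, two things are worth checking: observations are expected as a 2-D array of shape (n_samples, n_features), which here means a column vector, and the model parameters carry a trailing underscore (startprob_, transmat_, emissionprob_).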

opened by ambushed 22

ImportError: cannot import name hmm

Hi,

I used the hmm module from sklearn and tried to replace it with the hmmlearn module. Unfortunately, I could not import it in my notebook.

from hmmlearn import hmm
---------------------------------------------------------------------------
ImportError                               Traceback (most recent call last)
<ipython-input-7-8b8c029fb053> in <module>()
----> 1 from hmmlearn import hmm

ImportError: cannot import name hmm

I first tried pip-3.3 install git+https://github.com/hmmlearn/hmmlearn.git

As this didn't work, I cloned the project and ran setup.py (with Python 3.3), but I still get an import error.

If I try to import

import hmmlearn.hmm

I get another error

ImportError                               Traceback (most recent call last)
<ipython-input-8-8dbb2cfe75b2> in <module>()
----> 1 import hmmlearn.hmm

/home/ipython/python/lib/python3.3/site-packages/hmmlearn/hmm.py in <module>()
     22 from sklearn import cluster
     23 
---> 24 from .utils.fixes import log_multivariate_normal_density
     25 
     26 from . import _hmmc

ImportError: No module named 'hmmlearn.utils'

What did I do wrong?

Cheers, Evelyn

opened by metterlein 22

gcc error when installing with pip install

I get hmmlearn/_hmmc.c:239:28: fatal error: numpy/npy_math.h: No such file or directory, yet the installation seems to finish successfully.

requirements.txt file:

click==6.7
cython==0.25.2
joblib==0.11
numpy==1.12.1
pandas==0.19.2
python-speech-features==0.5
scikit-learn==0.18.1
scipy==0.19.0
hmmlearn==0.2.0

Running setup.py bdist_wheel for hmmlearn: started
  Running setup.py bdist_wheel for hmmlearn: finished with status 'error'
  Complete output from command /opt/conda/bin/python -u -c "import setuptools, tokenize;__file__='/tmp/pip-build-8l6nu2n1/hmmlearn/setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" bdist_wheel -d /tmp/tmpi_45qjtvpip-wheel- --python-tag cp36:
  running bdist_wheel
  running build
  running build_py
  creating build
  creating build/lib.linux-x86_64-3.6
  creating build/lib.linux-x86_64-3.6/hmmlearn
  copying hmmlearn/hmm.py -> build/lib.linux-x86_64-3.6/hmmlearn
  copying hmmlearn/utils.py -> build/lib.linux-x86_64-3.6/hmmlearn
  copying hmmlearn/base.py -> build/lib.linux-x86_64-3.6/hmmlearn
  copying hmmlearn/__init__.py -> build/lib.linux-x86_64-3.6/hmmlearn
  creating build/lib.linux-x86_64-3.6/hmmlearn/tests
  copying hmmlearn/tests/test_utils.py -> build/lib.linux-x86_64-3.6/hmmlearn/tests
  copying hmmlearn/tests/test_gaussian_hmm.py -> build/lib.linux-x86_64-3.6/hmmlearn/tests
  copying hmmlearn/tests/test_gmm_hmm.py -> build/lib.linux-x86_64-3.6/hmmlearn/tests
  copying hmmlearn/tests/test_multinomial_hmm.py -> build/lib.linux-x86_64-3.6/hmmlearn/tests
  copying hmmlearn/tests/test_base.py -> build/lib.linux-x86_64-3.6/hmmlearn/tests
  copying hmmlearn/tests/__init__.py -> build/lib.linux-x86_64-3.6/hmmlearn/tests
  running build_ext
  building 'hmmlearn._hmmc' extension
  creating build/temp.linux-x86_64-3.6
  creating build/temp.linux-x86_64-3.6/hmmlearn
  gcc -pthread -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -I/opt/conda/include/python3.6m -c hmmlearn/_hmmc.c -o build/temp.linux-x86_64-3.6/hmmlearn/_hmmc.o -O3
  hmmlearn/_hmmc.c:239:28: fatal error: numpy/npy_math.h: No such file or directory
   #include "numpy/npy_math.h"
                              ^
  compilation terminated.
  error: command 'gcc' failed with exit status 1
  
  ----------------------------------------
  Failed building wheel for hmmlearn
  Running setup.py clean for hmmlearn
Successfully built python-speech-features
Failed to build hmmlearn
Installing collected packages: click, cython, joblib, numpy, pytz, python-dateutil, pandas, python-speech-features, scikit-learn, scipy, hmmlearn
  Running setup.py install for hmmlearn: started
    Running setup.py install for hmmlearn: finished with status 'done'
**Successfully installed** click-6.7 cython-0.25.2 **hmmlearn-0.2.0** joblib-0.11 numpy-1.12.1 pandas-0.19.2 python-dateutil-2.6.0 python-speech-features-0.5 pytz-2017.2 scikit-learn-0.18.1 scipy-0.19.0

needs-info

opened by chananshgong 21

probability would approach to 0 after several EM iterations

When I used GaussianHMM().fit() to train an HMM, I got a RuntimeWarning: divide by zero encountered in log. I then found that the start probability approaches 0 after several EM iterations. How can I prevent the probability from approaching 0?
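
A start probability of exactly 0 leads to log(0), which would explain the warning. One common remedy, sketched here in plain Python rather than with hmmlearn's internals, is to add pseudo-counts before normalizing in the M-step. The function name and the counts are hypothetical. hmmlearn's models accept startprob_prior and transmat_prior arguments that play a similar role, though the exact update the library uses should be checked against its documentation.

```python
def update_startprob(expected_counts, prior=1.0):
    """Re-estimate start probabilities from expected counts with a Dirichlet-style prior.

    prior = 1.0 adds nothing (plain maximum likelihood); prior > 1.0 adds
    (prior - 1) pseudo-counts to every state, keeping probabilities above zero.
    """
    smoothed = [max(count + prior - 1.0, 0.0) for count in expected_counts]
    total = sum(smoothed)
    return [value / total for value in smoothed]

counts = [3.0, 0.0, 1.0]  # hypothetical expected start counts from the E-step
print(update_startprob(counts))             # the second state collapses to exactly 0
print(update_startprob(counts, prior=1.5))  # every state keeps some probability mass
```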

opened by LinZzzzzzzzz 21

ImportError: DLL load failed: The specified module could not be found.

My OS is Windows 7 x64. Visual Studio 2015, Visual Studio 2013, and Python 3.5 x64 (via Anaconda) are installed. hmmlearn installed successfully, which I verified with:

>>> import hmmlearn
>>> hmmlearn.__version__
'0.2.0'

This is the latest version of hmmlearn. However, running

>>> from hmmlearn import hmm

produces the following error:

C:\Anaconda3_64\python.exe E:/pycharm/plot_hmm_stock_analysis/hmm_stock_analysis.py
Traceback (most recent call last):
  File "E:/pycharm/plot_hmm_stock_analysis/hmm_stock_analysis.py", line 17, in <module>
    from hmmlearn import hmm
  File "C:\Anaconda3_64\lib\site-packages\hmmlearn-0.2.0-py3.5-win-amd64.egg\hmmlearn\hmm.py", line 14, in <module>
    from sklearn import cluster
  File "C:\Anaconda3_64\lib\site-packages\sklearn\__init__.py", line 57, in <module>
    from .base import clone
  File "C:\Anaconda3_64\lib\site-packages\sklearn\base.py", line 11, in <module>
    from .utils.fixes import signature
  File "C:\Anaconda3_64\lib\site-packages\sklearn\utils\__init__.py", line 11, in <module>
    from .validation import (as_float_array,
  File "C:\Anaconda3_64\lib\site-packages\sklearn\utils\validation.py", line 16, in <module>
    from ..utils.fixes import signature
  File "C:\Anaconda3_64\lib\site-packages\sklearn\utils\fixes.py", line 324, in <module>
    from scipy.sparse.linalg import lsqr as sparse_lsqr
  File "C:\Anaconda3_64\lib\site-packages\scipy\sparse\linalg\__init__.py", line 109, in <module>
    from .isolve import *
  File "C:\Anaconda3_64\lib\site-packages\scipy\sparse\linalg\isolve\__init__.py", line 6, in <module>
    from .iterative import *
  File "C:\Anaconda3_64\lib\site-packages\scipy\sparse\linalg\isolve\iterative.py", line 7, in <module>
    from . import _iterative
ImportError: DLL load failed: The specified module could not be found.

Why does this happen, and how can I fix it?

By the way, running "pip freeze" in cmd lists hmmlearn at version 0.2.0, but "conda list" does not show hmmlearn at all.

opened by genliu777 18

GMMHMM models training not converging (?)

Hi all, I am having a problem when trying to fit multiple GMMHMM models for a classification problem: emotion recognition from speech samples. Basically, the models often don't converge. Even though the monitor reports True for convergence when printed, I can see in the history that the likelihood is not strictly increasing. Actually, it decreases at some point and the training procedure stops.

Here, I report only the procedure for training one of the models (I should have seven, each one trained with a different training set). The data loaded are attached: data_training.npy.zip

from hmmlearn import hmm
import numpy as np

data = np.load('data_training.npy', allow_pickle=True)
hmm = hmm.GMMHMM(n_components=2, n_mix=2, n_iter=1000, covariance_type="diag", verbose=True)
X_sequence_concat = np.concatenate(data)
lengths = []
for el in data:
    lengths.append(len(el))
hmm.fit(X_sequence_concat, np.array(lengths))
print("Is the HMM training converged? " + str(hmm.monitor_.converged))

In my actual implementation I have to do this for seven different models and sometimes I get this problem and sometimes I don't, as you can see from the results reported below:

Can you please help me? I'm really struggling with this and I can't find a possible cause of the problem.

Thanks in advance!

bug

opened by giorgiolbt 16

[ENH, MRG] Add PoissonHMM

Adds a PoissonHMM with an example.

I think this is somewhat close so if you have time, a review would be great @anntzer @blckmaxima.

I'm not sure whether there is a standard reference we could compare against, like the Wikipedia example for MultinomialHMM, or whether that's necessary.

opened by alexrockhill 15

Allow to modify kmeans default params at model creation

I am requesting a new feature

Everywhere in the code, KMeans cluster initialization uses the KMeans default parameters (except n_clusters):

n_init=10, max_iter=300, tol=1e-4, verbose=0, random_state=None, copy_x=True, algorithm="lloyd"

...

main_kmeans = cluster.KMeans(n_clusters=nc, random_state=self.random_state)

or

kmeans = cluster.KMeans(n_clusters=self.n_components, random_state=self.random_state)

I got great improvements in my particular case (many very noisy datasets) by modifying the KMeans cluster initialization:

kmeans = cluster.KMeans(n_clusters=self.n_components, n_init=100, random_state=self.random_state, tol=1e-6)

So it would be great to be able to pass KMeans parameters when instantiating the model, for instance:

hmm = hmm.GaussianHMM(n_components, ..., kmeans_params={'n_init': xxx, 'max_iter': yyy, 'tol': zzz})

The n_init parameter for k-means++ is quite important in some cases.

thx

opened by tlunati 1

Add Method to get n_params and AIC/BIC for GaussianHMM

Reference Issues/PRs

None

What does this implement/fix? Explain your changes.

Adds the methods _n_parameters, bic, and aic to the GaussianHMM class.
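
As an illustration of what such methods compute, here is a minimal sketch using the conventional free-parameter count for a Gaussian HMM and the textbook AIC and BIC formulas. The function names are hypothetical and the counting convention is an assumption; it may differ from what the PR implements.

```python
import math

def gaussian_hmm_n_parameters(n_components, n_features, covariance_type="diag"):
    """Count free parameters of a Gaussian HMM (conventional choice, an assumption)."""
    n, d = n_components, n_features
    covariance_params = {
        "spherical": n,
        "diag": n * d,
        "full": n * d * (d + 1) // 2,
        "tied": d * (d + 1) // 2,
    }[covariance_type]
    start_params = n - 1             # start probabilities sum to 1
    transition_params = n * (n - 1)  # each transition row sums to 1
    mean_params = n * d
    return start_params + transition_params + mean_params + covariance_params

def aic(log_likelihood, n_parameters):
    """Akaike information criterion: lower is better."""
    return -2.0 * log_likelihood + 2.0 * n_parameters

def bic(log_likelihood, n_parameters, n_samples):
    """Bayesian information criterion: lower is better."""
    return -2.0 * log_likelihood + n_parameters * math.log(n_samples)

k = gaussian_hmm_n_parameters(n_components=2, n_features=3, covariance_type="diag")
print(k, aic(-100.0, k), bic(-100.0, k, n_samples=1000))
```

The log-likelihood of -100.0 and the sample count are made-up inputs; in practice they would come from scoring a fitted model on the data.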

Essentially, I copied the implementation from sklearn's Gaussian mixture model.

Any other comments?

I haven't fully implemented the methods for the other classes, e.g. GMMHMM etc. but it's in the works.

opened by richy1996 3

TypeError

import numpy as np
import matplotlib.pyplot as plt
from hmmlearn import hmm

# make our generative model with two components, a fair die and a
# loaded die
gen_model = hmm.MultinomialHMM(n_components=2, random_state=99)

# the first state is the fair die so let's start there so no one
# catches on right away
gen_model.startprob_ = np.array([1.0, 0.0])

# now let's say that we sneak the loaded die in:
# here, we have a 95% chance to continue using the fair die and a 5%
# chance to switch to the loaded die
# when we enter the loaded die state, we have a 90% chance of staying
# in that state and a 10% chance of leaving
gen_model.transmat_ = np.array([[0.95, 0.05],
                                [0.1, 0.9]])

# now let's set the emission means:
# the first state is a fair die with equal probabilities and the
# second is loaded by being biased toward rolling a six
gen_model.emissionprob_ = \
    np.array([[1 / 6, 1 / 6, 1 / 6, 1 / 6, 1 / 6, 1 / 6],
              [1 / 10, 1 / 10, 1 / 10, 1 / 10, 1 / 10, 1 / 2]])

# simulate the loaded dice rolls
rolls, gen_states = gen_model.sample(30000)

# plot states over time, let's just look at the first rolls for clarity
fig, ax = plt.subplots()
ax.plot(gen_states[:500])
ax.set_title('States over time')
ax.set_xlabel('Time (# of rolls)')
ax.set_ylabel('State')
fig.show()

# plot rolls for the fair and loaded states
fig, ax = plt.subplots()
ax.hist(rolls[gen_states == 0], label='fair', alpha=0.5,
        bins=np.arange(7) - 0.5, density=True)
ax.hist(rolls[gen_states == 1], label='loaded', alpha=0.5,
        bins=np.arange(7) - 0.5, density=True)
ax.set_title('Roll probabilities by state')
ax.set_xlabel('Roll')
ax.set_ylabel('Probability')
ax.legend()
fig.show()

MultinomialHMM has undergone major changes. The previous version was implementing CategoricalHMM (a special case of MultinomialHMM). This new implementation follows the standard definition for a Multinomial distribution, e.g. as in https://en.wikipedia.org/wiki/Multinomial_distribution

See these issues for details:
https://github.com/hmmlearn/hmmlearn/issues/335
https://github.com/hmmlearn/hmmlearn/issues/340

TypeError                                 Traceback (most recent call last)
<ipython-input-9-4c6e1e68a7c1> in <module>()
     22 
     23 # simulate the loaded dice rolls
---> 24 rolls, gen_states = gen_model.sample(30000)
     25 
     26 # plot states over time, let's just look at the first rolls for clarity

3 frames
/root/.local/lib/python3.7/site-packages/hmmlearn/base.py in sample(self, n_samples, random_state, currstate)
    461         state_sequence = [currstate]
    462         X = [self._generate_sample_from_state(
--> 463             currstate, random_state=random_state)]
    464 
    465         for t in range(n_samples - 1):

/root/.local/lib/python3.7/site-packages/hmmlearn/hmm.py in _generate_sample_from_state(self, state, random_state)
    481         sample = multinomial.rvs(
    482             n=self.n_trials, p=self.emissionprob_[state, :],
--> 483             size=1, random_state=self.random_state)
    484         return sample.squeeze(0)  # shape (1, nf) -> (nf,)
    485 

/usr/local/lib/python3.7/dist-packages/scipy/stats/_multivariate.py in rvs(self, n, p, size, random_state)
   3216         %(_doc_callparams_note)s
   3217         """
-> 3218         n, p, npcond = self._process_parameters(n, p)
   3219         random_state = self._get_random_state(random_state)
   3220         return random_state.multinomial(n, p, size)

/usr/local/lib/python3.7/dist-packages/scipy/stats/_multivariate.py in _process_parameters(self, n, p)
   3016         pcond |= np.any(p > 1, axis=-1)
   3017 
-> 3018         n = np.array(n, dtype=np.int, copy=True)
   3019 
   3020         # true for bad n

TypeError: int() argument must be a string, a bytes-like object or a number, not 'NoneType'
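
The traceback shows scipy receiving n=self.n_trials as None, and the notice above says the previous behaviour now corresponds to CategoricalHMM. Independently of which class is used, the generative process the example intends can be sketched in plain Python. This illustrates categorical sampling with the parameters from the post and is not hmmlearn's sampling code; the function name and seed handling are hypothetical.

```python
import random

def sample_dice_hmm(n_samples, seed=99):
    """Sample (rolls, states) from the fair/loaded die HMM, one face per step."""
    rng = random.Random(seed)
    startprob = [1.0, 0.0]                     # always start with the fair die
    transmat = [[0.95, 0.05], [0.10, 0.90]]
    emissionprob = [[1 / 6] * 6,               # fair die
                    [1 / 10] * 5 + [1 / 2]]    # loaded die, biased toward a six
    faces = list(range(6))
    rolls, states = [], []
    state = rng.choices([0, 1], weights=startprob)[0]
    for _ in range(n_samples):
        states.append(state)
        rolls.append(rng.choices(faces, weights=emissionprob[state])[0])
        state = rng.choices([0, 1], weights=transmat[state])[0]
    return rolls, states

rolls, gen_states = sample_dice_hmm(1000)
print(rolls[:10], gen_states[:10])
```

With hmmlearn itself, the options suggested by the notice would be to switch the example to CategoricalHMM or to give MultinomialHMM an explicit n_trials; which one applies depends on the installed version.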

opened by skr3178 4

Floating Point Errors to clear during calls to log()

I am requesting a new feature

This is a feature request to track the question: When computing log of probabilities in _src/hmmc.cpp, what floating point errors do we want to clear for?

History:

In lib/hmmlearn/utils.py:log_mask_zero, we ignore errors related to division by zero.

A pull request to move this to C++ copied this behavior.

However, we may want to copy this code from numpy and clear more errors.
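
To make the behaviour under discussion concrete, here is a pure-Python analogue of a zero-masking log. It is only a sketch of the idea described for log_mask_zero, not the library's code (which operates on NumPy arrays), and the function name is hypothetical.

```python
import math

def log_with_zero_masked(values):
    """Elementwise natural log that maps exactly 0 to -inf instead of failing."""
    return [float("-inf") if v == 0 else math.log(v) for v in values]

print(log_with_zero_masked([0.5, 0.0, 1.0]))
```

Only the zero case is masked here. A negative input still raises a ValueError from math.log, which mirrors the open question of which other floating-point conditions should be cleared.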
opened by blckmaxima 0

Releases(0.2.5)

Owner

GitHub Repository http://hmmlearn.readthedocs.org
