A web-based application for quick, scalable, and automated hyperparameter tuning and stacked ensembling in Python.

Last update: Nov 17, 2022

Overview

Xcessiv

Xcessiv is a tool to help you create the biggest, craziest, and most excessive stacked ensembles you can think of.

Stacked ensembles are simple in theory. You combine the predictions of smaller models and feed those into another model. However, in practice, implementing them can be a major headache.

Xcessiv holds your hand through all the implementation details of creating and optimizing stacked ensembles so you're free to fully define only the things you care about.

The Xcessiv process

Define your base learners and performance metrics

Keep track of hundreds of different model-hyperparameter combinations

Effortlessly choose your base learners and create an ensemble with the click of a button

Features

Fully define your data source, cross-validation process, relevant metrics, and base learners with Python code
Any model following the Scikit-learn API can be used as a base learner
Task queue based architecture lets you take full advantage of multiple cores and embarrassingly parallel hyperparameter searches
Direct integration with TPOT for automated pipeline construction
Automated hyperparameter search through Bayesian optimization
Easy management and comparison of hundreds of different model-hyperparameter combinations
Automatic saving of generated secondary meta-features
Stacked ensemble creation in a few clicks
Automated ensemble construction through greedy forward model selection
Export your stacked ensemble as a standalone Python file to support multiple levels of stacking
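
The greedy forward model selection named in the feature list can be sketched roughly as follows. This is only an illustration under simplifying assumptions: candidates are combined by plain averaging and scored with mean squared error, whereas Xcessiv feeds meta-features into a secondary learner. The function names and the toy predictions are hypothetical and are not taken from Xcessiv's code.

```python
def mse(y_true, y_pred):
    """Mean squared error between two equal-length sequences."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def average(prediction_lists):
    """Element-wise mean of several prediction lists."""
    n = len(prediction_lists)
    return [sum(col) / n for col in zip(*prediction_lists)]


def greedy_forward_selection(candidates, y_true, max_models=None):
    """Greedily add the candidate that most reduces the ensemble's MSE.

    candidates maps a (hypothetical) model name to its held-out predictions.
    Selection stops as soon as no remaining candidate improves the score.
    """
    selected, best_score = [], float("inf")
    remaining = dict(candidates)
    while remaining and (max_models is None or len(selected) < max_models):
        # Score the current ensemble plus each remaining candidate in turn.
        trial_scores = {
            name: mse(y_true, average([candidates[s] for s in selected] + [preds]))
            for name, preds in remaining.items()
        }
        name = min(trial_scores, key=trial_scores.get)
        if trial_scores[name] >= best_score:
            break  # no candidate improves the ensemble
        selected.append(name)
        best_score = trial_scores[name]
        del remaining[name]
    return selected, best_score


# Toy data: two biased models cancel each other out, the third is just noisy.
y = [1.0, 2.0, 3.0, 4.0]
models = {
    "too_high": [1.5, 2.5, 3.5, 4.5],
    "too_low": [0.5, 1.5, 2.5, 3.5],
    "noisy": [3.0, 0.0, 5.0, 2.0],
}
chosen, score = greedy_forward_selection(models, y)
print(chosen, score)
```

In this toy run the two biased models are selected together because their errors cancel, and the noisy model is rejected because adding it would make the averaged ensemble worse.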

Installation and Documentation

You can find installation instructions and detailed documentation at http://xcessiv.readthedocs.io.

FAQ

Where does Xcessiv fit in the machine learning process?

Xcessiv fits in the model building part of the process after data preparation and feature engineering. At this point, there is no universally acknowledged way of determining which algorithm will work best for a particular dataset (see No Free Lunch Theorem), and while heuristic optimization methods do exist, things often break down into trial and error as you try to find the best model-hyperparameter combinations.

Stacking is an almost surefire method to improve performance beyond that of any single model. However, the complexity of a proper implementation often makes it impractical outside of Kaggle competitions. Xcessiv aims to make the construction of stacked ensembles as painless as possible and to lower the barrier to entry.

I don't care about fancy stacked ensembles and what not, should I still use Xcessiv?

Absolutely! Even without the ensembling functionality, the sheer amount of utility provided by keeping track of the performance of hundreds, and even thousands of ML models and hyperparameter combinations is a huge boon.

How does Xcessiv generate meta-features for stacking?

You can choose whether to generate meta-features through cross-validation (stacked generalization) or with a holdout set (blending). You can read about these two methods and a lot more about stacked ensembles in the Kaggle Ensembling Guide. It's a great article and provides most of the inspiration for this project.
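
The difference between the two approaches can be sketched in a few lines. This is a minimal illustration, not Xcessiv's implementation: the "base learner" is a hypothetical toy that always predicts the mean of its training targets, the folds are contiguous and unshuffled, and all function names are made up for the example.

```python
def fit_mean(train_y):
    """Toy base learner: always predicts the mean of the training targets."""
    mean = sum(train_y) / len(train_y)
    return lambda n_rows: [mean] * n_rows


def kfold_indices(n_rows, n_folds):
    """Yield (train_idx, test_idx) pairs for contiguous, unshuffled folds."""
    fold_sizes = [n_rows // n_folds + (1 if i < n_rows % n_folds else 0)
                  for i in range(n_folds)]
    start = 0
    for size in fold_sizes:
        test_idx = list(range(start, start + size))
        train_idx = [i for i in range(n_rows) if i < start or i >= start + size]
        yield train_idx, test_idx
        start += size


def stacked_generalization_meta_features(y, n_folds):
    """Cross-validation: every row gets an out-of-fold prediction."""
    meta = [None] * len(y)
    for train_idx, test_idx in kfold_indices(len(y), n_folds):
        predict = fit_mean([y[i] for i in train_idx])
        for i, p in zip(test_idx, predict(len(test_idx))):
            meta[i] = p
    return meta


def blending_meta_features(y, holdout_size):
    """Holdout: train once, predict only the held-out rows at the end."""
    predict = fit_mean(y[:-holdout_size])
    return predict(holdout_size)


y = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
cv_meta = stacked_generalization_meta_features(y, n_folds=3)
blend_meta = blending_meta_features(y, holdout_size=2)
print(cv_meta)     # one meta-feature per row
print(blend_meta)  # meta-features for the holdout rows only
```

The trade-off visible here is that cross-validation produces a meta-feature for every row at the cost of fitting the base learner once per fold, while blending fits once but leaves the secondary learner with only the holdout rows to train on.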

Contributing

Xcessiv is in its very early stages and needs the open-source community to guide it along.

There are many ways to contribute to Xcessiv. You could report a bug, suggest a feature, submit a pull request, improve the documentation, and more.

If you would like to contribute something, please visit our Contributor Guidelines.

Project Status

Xcessiv is currently in alpha and is unstable. Future versions are not guaranteed to be backwards-compatible with current project files.

Comments

Can't Use

Sorry for what is no doubt a stupid question:

I've started Redis via redis-server. It says it's running on port 6379. Then I run xcessiv, but it takes me to a page that's not found: "The requested URL was not found on the server. If you entered the URL manually please check your spelling and try again." Any idea what I can do? I'm really eager to use Xcessiv.

opened by xnmp 6
Automated ensembling techniques

Working for a while with Xcessiv, I feel there's a need for some way to automate the selection of base learners in an ensemble. I'm unaware of existing techniques for this, so if anyone has any suggestions or could point me towards relevant literature, it would be greatly appreciated.
enhancement

opened by reiinakano 5
Added more of the sklearn regressors to the presets

Added the large majority of the more popular regressors of sklearn. I am aware that a few may be missing. Also, I tidied the code slightly and split the regressors and classifiers into two sections.

opened by enisnazif 4
Memory management

First of all, thanks! I find this project fascinating. My question is about how you handle memory across multiple processes. By default, Python creates a copy of the data per process, which is prohibitive for large datasets.

How did you manage this problem?

opened by alvarouc 3
Move .gitignore to project root and add Python ignores

I think the best practice for .gitignore is to have a single .gitignore file at the root of the project so I moved the .gitignore that was in xcessiv/ui (I think it was generated by create-react-scripts) to the project root and added some Python ignore lines.

opened by menglewis 3
Added Leave One Out Crossvalidation to cvsetting.py

Added Leave One Out Cross validation as part of #15

I'm keen to finish implementing all of the cv / metrics within sklearn, just wanted to make sure I was doing it right since this is my first pull request!

opened by enisnazif 2
XGBRegressor model stuck in queued status

I tried to run a regression model on the Zillow data from Kaggle, available here: https://www.kaggle.com/c/zillow-prize-1/data

Here is a gist of my dataset extractor, the XGBRegressor setup, and the exception that was the last thing left in the console: https://gist.github.com/jef5ez/a9b0650293f343682a58b0f0500f3332

I selected the shuffle split for both cross-validation settings and added MSE as the learner metric. The base learner seems to verify fine on the Boston housing data. After hitting finalize and selecting a single base learner, a row shows up below but is stuck in the Queued status.

python 3.5.2 xcessiv (0.2.2) xgboost (0.6a2)

opened by jef5ez 2
The _BasePipeline in exported Python script should be _BaseComposition

Since scikit-learn 0.19.x, the base class for Pipeline has changed to _BaseComposition. https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/pipeline.py When using the generated code for training, it raises a name-not-found error on newer versions of sklearn. At the moment, an easy workaround is to manually change the two instances of _BasePipeline in the generated script.

opened by Mithrillion 1
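
The manual workaround described in this issue could also be scripted. The sketch below is an assumption-laden illustration: the `generated` string is a hypothetical stand-in for an exported script (its class name and layout are made up), and `patch_generated_script` is a helper invented for this example.

```python
import re


def patch_generated_script(source):
    """Replace whole-word occurrences of _BasePipeline with _BaseComposition."""
    return re.sub(r"\b_BasePipeline\b", "_BaseComposition", source)


# Hypothetical stand-in for the two occurrences mentioned in the issue.
generated = (
    "from sklearn.pipeline import _BasePipeline\n"
    "class ExportedEnsemble(_BasePipeline):\n"
    "    pass\n"
)
patched = patch_generated_script(generated)
print(patched)
```

To patch a real exported file, the same function could be applied to the file's contents after reading them and before writing them back.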

Issues with TfidfVectorizer

Hey, great tool.

I have a problem, though, when I try to use a TfidfVectorizer for text classification. When I create a Single Base Learner I get the error:

ValueError: all the input array dimensions except for the concatenation axis must match exactly

The type of the X variable is a numpy.ndarray, but if I don't convert X to an array, I get this error message instead:

TypeError: Singleton array array(<92820x194 sparse matrix of type '<class 'numpy.float64'>' with 92820 stored elements in Compressed Sparse Row format>, dtype=object) cannot be considered a valid collection.

I chose the preset learner setting scikit-learn Random Forest as the Base Learner Type.

import os
import numpy as np
import pandas as pd
import pickle
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.ensemble import RandomForestClassifier

def extract_main_dataset():
    # pandas data frame with the columns Classification, FeatureVector
    # ie:
    # 0, 'This is the feature vector'
    # 1, 'This is another feature vector' 
    # 2, 'This is yet another feature vector' 
    # 1, 'This is the last feature vector example' 
    with open('feature_vector.pik', 'rb') as rf:
        feature_vector = pickle.load(rf)

    y = np.array(feature_vector.Classification.values)
    title_rf_vectorizer = TfidfVectorizer(ngram_range=(2, 9),
                                          sublinear_tf=True,
                                          use_idf=True,
                                          strip_accents='ascii')

    title_rf_classifier = RandomForestClassifier(n_estimators=100, n_jobs=8)
    X = title_rf_vectorizer.fit_transform(feature_vector["Classification"]).toarray()
    return X, y

opened by bbowler86 1

Valid values for metric to optimise in Bayesian optimisation?

Is there a list of valid values for metric_to_optimise in Bayesian Optimisation?

I am using sklearn mean_squared_regression for my base learner, but when I enter that into the Bayesian Optimisation menu under metric_to_optimise I get:

assert module.metric_to_optimize in automated_run.base_learner_origin.metric_generators
AssertionError
question
opened by Data-drone 1
'dict_keys' object does not support indexing

On lines 306 and 309 of views.py, trying to index a dictionary keys object will fail on Python 3 and result in a server error. The fix is simple: change all occurrences of

base_learner_origin.validation_results.keys()[0]

to

list(base_learner_origin.validation_results.keys())[0]

opened by KhaledSharif 1
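
The behaviour behind this report can be reproduced in isolation. In the sketch below, `validation_results` is a hypothetical stand-in dictionary, not Xcessiv's actual data; the snippet shows the failing call on Python 3, the fix proposed above, and an equivalent alternative that avoids building a list.

```python
# Hypothetical stand-in for base_learner_origin.validation_results.
validation_results = {"Accuracy": 0.93, "F1": 0.88}

# On Python 3, dict.keys() returns a view object that cannot be indexed.
try:
    validation_results.keys()[0]
except TypeError as exc:
    error_message = str(exc)
    print(error_message)

# Fix proposed in the issue: materialize the keys into a list first.
first_via_list = list(validation_results.keys())[0]

# Equivalent alternative: take the first key straight from the iterator.
first_via_iter = next(iter(validation_results))

print(first_via_list, first_via_iter)
```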
redis.exceptions.DataError at xcessiv launch

Hello, When I try to launch xcessiv I get an error:

Traceback (most recent call last):
  File "/PATH_TO/anaconda3/bin/xcessiv", line 10, in <module>
    sys.exit(main())
  File "/PATH_TO/anaconda3/lib/python3.7/site-packages/xcessiv/scripts/runapp.py", line 51, in main
    redis_conn.get(None)  # will throw exception if Redis is unavailable
  File "/PATH_TO/anaconda3/lib/python3.7/site-packages/redis/client.py", line 1264, in get
    return self.execute_command('GET', name)
  File "/PATH_TO/anaconda3/lib/python3.7/site-packages/redis/client.py", line 774, in execute_command
    connection.send_command(*args)
  File "/PATH_TO/anaconda3/lib/python3.7/site-packages/redis/connection.py", line 620, in send_command
    self.send_packed_command(self.pack_command(*args))
  File "/PATH_TO/anaconda3/lib/python3.7/site-packages/redis/connection.py", line 663, in pack_command
    for arg in imap(self.encoder.encode, args):
  File "/PATH_TO/anaconda3/lib/python3.7/site-packages/redis/connection.py", line 125, in encode
    "byte, string or number first." % typename)
redis.exceptions.DataError: Invalid input of type: 'NoneType'. Convert to a byte, string or number first.

Previously I had to change from gevent.wsgi import WSGIServer to from gevent.pywsgi import WSGIServer as indicated in this issue

My server is responding when I do redis-cli ping

I am on Ubuntu 18.04, with python 3.7.3 and redis 5.0.5

Do you have an idea to fix this? Thanks!

opened by AlexCoul 0
How to import homemade modules in Xcessiv?

I'm trying to import a homemade module named preprocessing_115v (filename preprocessing_115v.py) into the main data extraction source code, but it can't be found:

import preprocessing_115v  # <-- where do I store the preprocessing_115v.py file for it to load here?

def extract_main_dataset():
    import pandas as pd
    df = pd.read_csv('./data.csv', sep=',', header=None)
    X = df.values
    labels = pd.read_csv('./labelsnum.csv', sep=',', header=None)
    y = labels.values
    y = y[:, 0]
    return X, y

Amazing program by the way :-)

opened by fcoppey 0
xcessiv server

Hi. This project looks very cool, but I am having some problems with the setup. I am running this in a container (my own), and I can't get the server to show up. From inside the container I can see the server running: ps shows xcessiv running and curl localhost:1994 gives me some HTML from xcessiv. From outside the container, however, there's nothing.

I suppose that's down to the server.py file which I have now changed to this:

from __future__ import absolute_import, print_function, division, unicode_literals
from gevent.wsgi import WSGIServer
# import webbrowser


def launch(app):
    http_server = WSGIServer(('0.0.0.0', app.config['XCESSIV_PORT']), app)
    # webbrowser.open_new('http://localhost:' + str(app.config['XCESSIV_PORT']))
    http_server.serve_forever()

I have changed the WSGIServer setup to be open to outside connection (I suppose that's what I changed), but it's still not showing up.

Feedback appreciated. I'd like to try this out. Thanks!
opened by benman1 0
Feature Request - Backup .db file

I got an error something to the effect of "Error with JSON "N" at position 8345", presumably caused by my manually editing the code for one of the base learners. Once I got this error however, none of the base learners in my project would load. I resolved it by manually deleting the base learner I had been editing from the .db file. I'll post the specifics if I can recreate it, but I'm wondering if it might be prudent to have some kind of db backup/"Last Known Good Configuration"?

opened by Tahlor 0
Fix issue #63 no module named wsgi

In the file server.py,

from gevent.wsgi import WSGIServer

has to be changed to:

from gevent.pywsgi import WSGIServer

http://www.gevent.org/api/gevent.pywsgi.html
opened by KhaledTo 0
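
If the same code has to run against both newer and older gevent installations, the import change above could be written as a fallback. This is a sketch of that idea rather than the change that was merged; setting WSGIServer to None when gevent is missing is an assumption added so the snippet runs anywhere.

```python
try:
    # Location of WSGIServer in current gevent releases.
    from gevent.pywsgi import WSGIServer
except ImportError:
    try:
        # Older gevent releases that still shipped gevent.wsgi.
        from gevent.wsgi import WSGIServer
    except ImportError:
        # gevent is not installed at all (assumption for this sketch).
        WSGIServer = None

print("WSGIServer available:", WSGIServer is not None)
```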
ImportError: No module named wsgi

  File "/usr/local/Cellar/python/2.7.14/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/xcessiv/server.py", line 2, in <module>
    from gevent.wsgi import WSGIServer
ImportError: No module named wsgi

opened by xialeizhou 2

Releases(v0.5.1)

v0.5.1(Aug 21, 2017)
Bugfix:

#51 fixes #50, a bug resulting from changes in scikit-learn v0.19

New features:

Docker-compose file for easier docker deployment by @marcelmaatkamp

v0.5.0(Jun 23, 2017)
Features

#43 Automated ensembling is finally here! Greedy forward model selection introduced

Warning: Project files from versions before v0.5.0 will not work with v0.5.0
v0.4.0(Jun 15, 2017)
Features

#35 by @menglewis

#37 Added TPOT integration

Warning: v0.4 Project files are not compatible with earlier versions
v0.3.8(Jun 7, 2017)
Features

#29 #30 Removed append original checkbox and added Identity Transformer preset base learner instead

v0.3.7(Jun 7, 2017)
Features

#27 More estimators (regressors) by @enisnazif

#28 "Export ensemble as Python package" changed to "Export ensemble as Python file". Also added a shortcut for directly exporting a stacked ensemble as a base learner setup. Awesome!

v0.3.6(Jun 6, 2017)
Features

#23 More preset cross-validators by @ryanliwag

#24 Added preset metrics median absolute error, R2 score, and explained variance score

#26 Added functionality that stores previous parameter searches. More of a user experience fix.

v0.3.5(Jun 4, 2017)
New Feature:

#22 Added ability to export a stacked ensemble as a Python package so you can use it on different data.

#19 by @jef5ez adds Mean Absolute Error as preset metric

Docs:

#22 Added docs for using exported stacked ensemble Python package

#20 Added docs for using TPOT with Xcessiv

v0.3.4(Jun 2, 2017)

Hotfix for 0.3.0. Ended up at 0.3.4 because of PyPI problems.
v0.3.0(Jun 2, 2017)
Major feature addition

#18 - Added an experimental Bayesian optimization search alongside grid search and random search to allow a bit of automation for hyperparameter tuning.

v0.2.5(May 31, 2017)

Hotfix for Python 3 users.
v0.2.3(May 30, 2017)
#16 Added a few new preset learners and metrics

v0.2.2(May 29, 2017)

Hotfix for setuptools.
v0.2.1(May 29, 2017)
Features

Dockerfile added

Startup script now raises explicit error for Windows OS.

Documentation

Documentation updated for Dockerfile

v0.2.0(May 28, 2017)
Added features

#13

Breaking Change

Project folders created with Xcessiv<0.2.0 will not work on Xcessiv>=0.2.0. This is due to #13

v0.1.6(May 28, 2017)
#12

Brand new and more flexible way of defining cross-validation and meta-feature generation method

v0.1.5(May 26, 2017)
UI fix: when deleting a parent component (e.g. a base learner origin), make sure its child components (base learners and stacked ensembles) are refreshed.

Major feature change: #11

v0.1.4(May 25, 2017)
#9

#10

v0.1.3(May 24, 2017)
#8

v0.1.2(May 24, 2017)
Hotfix for #6 pointed out by @KhaledSharif

v0.1.1(May 23, 2017)


Owner

Reiichiro Nakano

I like working on awesome things with awesome people!

GitHub Repository http://xcessiv.readthedocs.io
