monolish: MONOlithic Liner equation Solvers for Highly-parallel architecture

Overview

monolish: MONOlithic LIner equation Solvers for Highly-parallel architecture

monolish is a linear equation solver library that monolithically fuses variable data type, matrix structures, matrix data format, vendor specific data transfer APIs, and vendor specific numerical algebra libraries.


monolish let developer forget about:

  • Performance tuning
  • Processor differences which execute library (Intel CPU, NVIDIA GPU, AMD CPU, ARM CPU, NEC SX-Aurora TSUBASA, etc.)
  • Vendor specific data transfer APIs (host RAM to Device RAM)
  • Finding bottlenecks and performance benchmarks
  • The argument data type of matrix/vector operations
  • Matrix structures and storage formats
  • Cumbersome package dependency

License

Copyright 2021 RICOS Co. Ltd.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

Comments
  • It seems that the `Dense(M, N, min, max)` constructor is not completely random.

    It seems that the `Dense(M, N, min, max)` constructor is not completely random.

    Running the following simple program

    #include <iostream>
    #include <monolish_blas.hpp>
    
    int main() {
      monolish::matrix::Dense<double>x(2, 3, 0.0, 10.0);
      x.print_all();
      return 0;
    }
    
    $ g++ -O3 main.cpp -o main.out -lmonolish_cpu
    

    will produce results like this.

    [email protected]:/$ ./main.out
    1 1 5.27196
    1 2 2.82358 <--
    1 3 2.13893 <--
    2 1 9.72054
    2 2 2.82358 <--
    2 3 2.13893 <--
    [email protected]:/$ ./main.out
    1 1 5.3061
    1 2 9.75236
    1 3 7.15652
    2 1 5.28961
    2 2 2.05967
    2 3 0.59838
    [email protected]:/$ ./main.out
    1 1 9.33149 <--
    1 2 4.75639 <--
    1 3 8.71093 <--
    2 1 9.33149 <--
    2 2 4.75639 <--
    2 3 8.71093 <--
    

    The arrows (<--) indicate that the number is repeating.

    This is probably due to that the pseudo-random number generator does not split well when it is parallelized by OpenMP.

    https://github.com/ricosjp/monolish/blob/1b89942e869b7d0acd2d82b4c47baeba2fbdf3e6/src/utils/dense_constructor.cpp#L120-L127

    This may happen not only with Dense, but also with random constructors of other data structures.

    I tested this on docker image ghcr.io/ricosjp/monolish/mkl:0.14.1.

    opened by lotz84 5
  • impl. transpose matvec, matmul

    impl. transpose matvec, matmul

    I want to give modern and intuitive transposition information. But I have no idea how to implement it easily.

    First, we create the following function as a prototype

    matmul(A,B,C) // C=AB
    matmul_TNN(A, B, C); // C=A^TB
    matvec(A,x,y); // y = Ax
    matvec_T(A, x, y); // y=A^Tx
    

    This interface is not beautiful. However, it has the following advantages

    • It does not affect other functions.
    • Easy to trace with logger
    • Simple to implement FFI in the future.
    • When beautiful ideas appear in the future, these functions can be implemented wrapping it.
    opened by t-hishinuma 2
  • try -fopenmp-cuda-mode flag

    try -fopenmp-cuda-mode flag

    memo:

    Clang supports two data-sharing models for Cuda devices: Generic and Cuda modes. The default mode is Generic. Cuda mode can give an additional performance and can be activated using the -fopenmp-cuda-mode flag. In Generic mode all local variables that can be shared in the parallel regions are stored in the global memory. In Cuda mode local variables are not shared between the threads and it is user responsibility to share the required data between the threads in the parallel regions.

    https://clang.llvm.org/docs/OpenMPSupport.html#basic-support-for-cuda-devices

    opened by t-hishinuma 2
  • Reserch the effect of the level information of the performance of cusparse ILU precondition

    Reserch the effect of the level information of the performance of cusparse ILU precondition

    The level information may not improve the performance but spend extra time doing analysis. For example, a tridiagonal matrix has no parallelism. In this case, CUSPARSE_SOLVE_POLICY_NO_LEVEL performs better than CUSPARSE_SOLVE_POLICY_USE_LEVEL. If the user has an iterative solver, the best approach is to do csrsv2_analysis() with CUSPARSE_SOLVE_POLICY_USE_LEVEL once. Then do csrsv2_solve() with CUSPARSE_SOLVE_POLICY_NO_LEVEL in the first run and with CUSPARSE_SOLVE_POLICY_USE_LEVEL in the second run, picking faster one to perform the remaining iterations.

    https://docs.nvidia.com/cuda/cusparse/index.html#csric02

    opened by t-hishinuma 2
  •  ignoring return value in test

    ignoring return value in test

    matrix_transpose.cpp:60:3: warning: ignoring return value of 'monolish::matrix::COO<Float>& monolish::matrix::COO<Float>::transpose() [with Float = double]', declared with attribute nodiscard [-Wunused-result]
       60 |   A.transpose();
    
    opened by t-hishinuma 2
  • Automatic deploy at release

    Automatic deploy at release

    impl. in github actions

    • [x] generate Doxyben (need to chenge version name)
    • [x] generate deb file
    • [x] generate monolish docker

    need to get version number...

    opened by t-hishinuma 2
  • write how to install nvidia-docker

    write how to install nvidia-docker

    distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
    curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add -
    curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | sudo tee > /etc/apt/sources.list.d/nvidia-docker.list
    sudo apt update -y
    sudo apt install -y nvidia-docker2
    sudo systemctl restart docker
    
    opened by t-hishinuma 1
  • Resolve conflict of libmonolish-cpu and libmonolish-nvidia-gpu deb package

    Resolve conflict of libmonolish-cpu and libmonolish-nvidia-gpu deb package

    What conflicts?

    libomp.so is contained in both package

    How to resolve?

    • (a) Use libomp5-12 distributed by ubuntu
    • (b) Create another package of libomp in allgebra stage (libomp-allgebra)
    opened by termoshtt 1
  • resolve curse of type name in src/

    resolve curse of type name in src/

    In src/, int and size_t are written. When change the class or function declarations in include/, I don't want to rewrite src/. Use auto or decltype() to remove them.

    opened by t-hishinuma 1
  • LLVM OpenMP Offloading can be installed by apt?

    LLVM OpenMP Offloading can be installed by apt?

    docker run -it --gpus all -v $PWD:/work nvidia/cuda:11.7.0-devel-ubuntu22.04
    ==
    apt update -y
    apt install -y git intel-mkl cmake ninja-build ccache clang clang-tools libomp-14-dev gcc gfortran
    git config --global --add safe.directory /work
    cd /work; make gpu
    

    pass??

    opened by t-hishinuma 0
  • cusparse IC / ILU functions is deprecated

    cusparse IC / ILU functions is deprecated

    but, sample code of cusparse is not updated

    https://docs.nvidia.com/cuda/cusparse/index.html#csric02

    I dont like trial and error, so wait for the sample code to be updated.

    opened by t-hishinuma 0
Releases(0.17.0)
Owner
RICOS Co. Ltd.
株式会社科学計算総合研究所 / Research Institute for Computational Science Co. Ltd.
RICOS Co. Ltd.
This jupyter notebook project was completed by me and my friend using the dataset from Kaggle

ARM This jupyter notebook project was completed by me and my friend using the dataset from Kaggle. The world Happiness 2017, which ranks 155 countries

1 Jan 23, 2022
Scikit learn library models to account for data and concept drift.

liquid_scikit_learn Scikit learn library models to account for data and concept drift. This python library focuses on solving data drift and concept d

7 Nov 18, 2021
Python implementation of the rulefit algorithm

RuleFit Implementation of a rule based prediction algorithm based on the rulefit algorithm from Friedman and Popescu (PDF) The algorithm can be used f

Christoph Molnar 326 Jan 02, 2023
CrayLabs and user contibuted examples of using SmartSim for various simulation and machine learning applications.

SmartSim Example Zoo This repository contains CrayLabs and user contibuted examples of using SmartSim for various simulation and machine learning appl

Cray Labs 14 Mar 30, 2022
BigDL: Distributed Deep Learning Framework for Apache Spark

BigDL: Distributed Deep Learning on Apache Spark What is BigDL? BigDL is a distributed deep learning library for Apache Spark; with BigDL, users can w

4.1k Jan 09, 2023
A Python package for time series classification

pyts: a Python package for time series classification pyts is a Python package for time series classification. It aims to make time series classificat

Johann Faouzi 1.4k Jan 01, 2023
Python Research Framework

Python Research Framework

EleutherAI 106 Dec 13, 2022
CVXPY is a Python-embedded modeling language for convex optimization problems.

CVXPY The CVXPY documentation is at cvxpy.org. We are building a CVXPY community on Discord. Join the conversation! For issues and long-form discussio

4.3k Jan 08, 2023
Can a machine learning project be implemented to estimate the salaries of baseball players whose salary information and career statistics for 1986 are shared?

END TO END MACHINE LEARNING PROJECT ON HITTERS DATASET Can a machine learning project be implemented to estimate the salaries of baseball players whos

Pinar Oner 7 Dec 18, 2021
Automated machine learning: Review of the state-of-the-art and opportunities for healthcare

Automated machine learning: Review of the state-of-the-art and opportunities for healthcare

42 Dec 23, 2022
Feature-engine is a Python library with multiple transformers to engineer and select features for use in machine learning models.

Feature-engine is a Python library with multiple transformers to engineer and select features for use in machine learning models. Feature-engine's transformers follow scikit-learn's functionality wit

Soledad Galli 33 Dec 27, 2022
A Python implementation of FastDTW

fastdtw Python implementation of FastDTW [1], which is an approximate Dynamic Time Warping (DTW) algorithm that provides optimal or near-optimal align

tanitter 651 Jan 04, 2023
GroundSeg Clustering Optimized Kdtree

ground seg and clustering based on kitti velodyne data, and a additional optimized kdtree for knn and radius nn search

2 Dec 02, 2021
QML: A Python Toolkit for Quantum Machine Learning

QML is a Python2/3-compatible toolkit for representation learning of properties of molecules and solids.

176 Dec 09, 2022
A GitHub action that suggests type annotations for Python using machine learning.

Typilus: Suggest Python Type Annotations A GitHub action that suggests type annotations for Python using machine learning. This action makes suggestio

40 Sep 18, 2022
A simple and lightweight genetic algorithm for optimization of any machine learning model

geneticml This package contains a simple and lightweight genetic algorithm for optimization of any machine learning model. Installation Use pip to ins

Allan Barcelos 8 Aug 10, 2022
About Solve CTF offline disconnection problem - based on python3's small crawler

About Solve CTF offline disconnection problem - based on python3's small crawler, support keyword search and local map bed establishment, currently support Jianshu, xianzhi,anquanke,freebuf,seebug

天河 32 Oct 25, 2022
Traingenerator 🧙 A web app to generate template code for machine learning ✨

Traingenerator 🧙 A web app to generate template code for machine learning ✨ 🎉 Traingenerator is now live! 🎉

Johannes Rieke 1.2k Jan 07, 2023
A concept I came up which ditches the idea of "layers" in a neural network.

Dynet A concept I came up which ditches the idea of "layers" in a neural network. Install Copy Dynet.py to your project. Run the example Install matpl

Anik Patel 4 Dec 05, 2021
This is a Cricket Score Predictor that predicts the first innings score of a T20 Cricket match using Machine Learning

This is a Cricket Score Predictor that predicts the first innings score of a T20 Cricket match using Machine Learning. It is a Web Application.

Developer Junaid 3 Aug 04, 2022