A set of functions and analysis classes for solvation structure analysis

Last update: Nov 24, 2022

Overview

SolvationAnalysis

The macroscopic behavior of a liquid is determined by its microscopic structure. For ionic systems, such as batteries and many enzymes, the solvation environment surrounding ions is especially important. By studying the solvation of interesting materials, scientists can better understand, engineer, and design new technologies. The aim of this project is to implement a robust and cohesive set of methods for solvation analysis that would be widely useful in both biomolecular and battery electrolyte simulations. The core of the solvation module will be a set of functions for easily working with ionic solvation shells. Building on that core functionality, the module will implement several methods for analyzing ion pairing, ion speciation, residence times, and shell association and dissociation.

Main development by @orioncohen, with mentorship from @richardjgowers, @IAlibay, and @hmacdope.

Acknowledgements

Project based on the Computational Molecular Science Python Cookiecutter version 1.5.

Comments

analysis functions
Description

This PR will establish the analysis functions for the solvation module. Linked to issues #11, #21, and #28.

Todos

Notable points that this PR has either accomplished or will accomplish.

[x] create ion speciation class

[x] create coordination number class

[x] create solvation god class

[x] update documentation

[x] finalize data structure

[x] complete implementation

[x] finalize in-line documentation

Status

[x] Ready to go

core high-priority
opened by orionarcher 34
Parse RDF's and find minima
Description

Provide a brief description of the PR's purpose here.

Todos

[x] add test data

[x] functionality to interpolate rdfs

[x] functionality to find minima

[x] refine minima finding

[x] solidify unit testing
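The interpolate-then-find-minima steps in the list above can be sketched in plain Python. This is only an illustration, not the package's implementation: the helper names interpolate_rdf and first_minimum_after_peak, the choice of linear interpolation, and the toy RDF values are all assumptions made for this example.

```python
def interpolate_rdf(bins, rdf, factor=4):
    """Linearly interpolate an RDF onto a grid `factor` times finer (hypothetical helper)."""
    fine_bins, fine_rdf = [], []
    for i in range(len(bins) - 1):
        for k in range(factor):
            t = k / factor
            fine_bins.append(bins[i] + t * (bins[i + 1] - bins[i]))
            fine_rdf.append(rdf[i] + t * (rdf[i + 1] - rdf[i]))
    # Close the grid with the final original point.
    fine_bins.append(bins[-1])
    fine_rdf.append(rdf[-1])
    return fine_bins, fine_rdf


def first_minimum_after_peak(bins, rdf):
    """Return the radius of the first local minimum after the global maximum, or None."""
    peak = max(range(len(rdf)), key=rdf.__getitem__)
    for i in range(peak + 1, len(rdf) - 1):
        if rdf[i] <= rdf[i - 1] and rdf[i] < rdf[i + 1]:
            return bins[i]
    return None


# Toy RDF (made-up numbers): first peak at r = 2.0, first minimum at r = 3.0.
bins = [1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0]
rdf = [0.0, 1.2, 3.0, 1.1, 0.4, 0.9, 1.0]

fine_bins, fine_rdf = interpolate_rdf(bins, rdf)
cutoff = first_minimum_after_peak(fine_bins, fine_rdf)
print(cutoff)
```

A minimum found this way could serve as a solvation shell cutoff radius. Noisy RDFs would likely need smoothing before the search, which this sketch does not attempt.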

Status

[x] Ready to go

core testing high-priority
opened by orionarcher 20
Basic testing with Pytest
Description

Set up basic testing for current implemented functions.

Todos

Notable points that this PR has either accomplished or will accomplish.

[x] get_radial_shell testing

[x] get_closest_n_mol testing

[x] slightly rework get_closest_n_mol signature

Status

[x] Ready to go

core testing
opened by orionarcher 17
tutorials
Description

Create a convenient tutorial for users to learn solvation_analysis

Todos

Notable points that this PR has either accomplished or will accomplish.

[x] jupyter tutorial for users

[x] set up jupyter edit requests

Status

[x] Ready to go

documentation high-priority
opened by orionarcher 12
Decide hierarchy of analysis classes/methods

Some of our proposed functionality follows on directly from other functionality and some does not.

This means we need to discuss how to compartmentalise functionality into classes and methods, many of which will subclass AnalysisBase.

I thought I would make this issue as an open discussion area for how we want to compartmentalise/hierarchicalize our functionality. I think the best way to proceed is if @orioncohen can lay out how he sees it and then @richardjgowers, @IAlibay, and any @MDAnalysis/gsoc-mentors can weigh in. We may also need to meet to discuss this. :)
question core high-priority

opened by hmacdope 11
add visualization tutorial
Description

Since nglview is causing issues and I don't want to get bogged down before the release, I wanted to put this in a separate thread.

Todos

Notable points that this PR has either accomplished or will accomplish.

[x] tutorial introduction

[x] fix nglview issues

[x] finish tutorial

Questions

[x] Why is nglview showing too many bonds on molecules?

Status

[x] Ready to go

bug documentation enhancement
opened by orionarcher 10
Initial functions for solvation selection + documentation + testing
Description

Implements basic functionality for solvation selection. Currently implements get_atom_group, get_n_shells, get_closest_n_mol, and get_radial_shell. Addresses issue #5.
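To illustrate the two selection ideas named above (a radial shell and the closest n molecules), here is a minimal sketch using plain coordinate tuples. It does not reproduce the signatures of get_radial_shell or get_closest_n_mol, which this page does not show. The names radial_shell and closest_n, the toy coordinates, and the absence of periodic boundary handling are assumptions for the example.

```python
import math


def radial_shell(center, positions, radius):
    """Indices of all points within `radius` of `center` (no periodic boundaries)."""
    return [i for i, p in enumerate(positions) if math.dist(center, p) <= radius]


def closest_n(center, positions, n):
    """Indices of the `n` points nearest to `center`, nearest first."""
    order = sorted(range(len(positions)), key=lambda i: math.dist(center, positions[i]))
    return order[:n]


# Hypothetical ion at the origin and four solvent positions.
ion = (0.0, 0.0, 0.0)
solvent = [(1.0, 0.0, 0.0), (0.0, 3.0, 0.0), (0.0, 0.0, 2.0), (5.0, 5.0, 5.0)]

shell = radial_shell(ion, solvent, radius=2.5)
nearest = closest_n(ion, solvent, n=3)
print(shell, nearest)
```

The radial selection returns a variable number of neighbors for a fixed distance, while the closest-n selection returns a fixed number of neighbors at whatever distance they happen to be.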

Todos

Notable points that this PR has either accomplished or will accomplish.

[x] implement basic functionality for solvation selection

[x] documentation for functions

[x] unit tests for basic functionality

Questions

[x] what doc style should be used?

Status

[x] ready to go
opened by orionarcher 10
Support for multi atom solutes
Description

This PR was split off from PR #70 to better separate quality-of-life changes from API changes. See that PR for some history and discussion.

This is intended to be a major PR to handle multi-atom solutes. It relates to issues #47, #66, and #58.

I would like to propose the following outline of new functionality. In this description, I will focus on the outward facing API. I'll use the somewhat trivial case of water as an example.

Solution will be renamed to Solute. All references to solute in the current documentation will be renamed to solvated atom or solvated atoms. I think this better captures what the Solute class really is, especially as we expand to multi-atom Solutes.

The default initializer for Solute will take only a single atom per residue. It will not support multiple identical atoms on a residue. This will be handled by the more general case. As a result, instantiating a Solute for a single atom remains the same.

water = u.select_atoms(...)
water_O = u.select_atoms(...)
water_O_solute = Solute(water_O, {"water": water})

Additional initializers will be added to instantiate a Solute; these will support multi-atom solutes.

The first will allow the user to stitch together multiple solutes to create a new solute.

water_H1 = u.select_atoms(...)
water_H2 = u.select_atoms(...)
solute_O = Solute(water_O, {"water": water})
solute_H1 = Solute(water_H1, {"water": water})
solute_H2 = Solute(water_H2, {"water": water})
multi_atom_solute = Solute.from_solutes([solute_O, solute_H1, solute_H2]) # maybe this should be a dict?

The second will allow users to simply instantiate a solute from an entire residue (or part of a residue). There may be technical challenges here so this behavior is not guaranteed.

multi_atom_solute = Solute.from_residue(water, {"water": water})

To support this, the solvation_data dataframe will have two additional columns added, a "residue" column and a "solute_name" column. All analysis classes will be refactored to operate on the "residue" column rather than the "solvated_atom" column. This will make no difference for single-atom solutes but will allow the analysis classes to generalize easily. I'm not completely sure the "solute_name" column is necessary, but it would be convenient to have.

When a multi-atom solute is created, all of the solvation_data dataframes from each constituent single-atom solute will be merged together. The "residue" column will group together solvated atoms on the same residue such that the analysis classes can operate on the whole solute. The API for accessing the analysis classes will be identical.

multi_atom_solute.coordination_number["water"] # valid property
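The merge-and-group idea can be sketched with plain lists of dicts standing in for the solvation_data dataframes. This is an illustration under stated assumptions rather than the proposed implementation. Only the "residue" and "solvated_atom" column names come from the description above. The "frame", "solvent_name", and "solvent_id" fields, both helper functions, and the choice to count a solvent molecule once per residue even when it coordinates two atoms are hypothetical.

```python
from collections import defaultdict

# Each row: one solvated atom coordinated by one solvent molecule in one frame.
solute_O_rows = [
    {"frame": 0, "residue": 1, "solvated_atom": 10, "solvent_name": "water", "solvent_id": 7},
    {"frame": 0, "residue": 1, "solvated_atom": 10, "solvent_name": "water", "solvent_id": 8},
]
solute_H1_rows = [
    {"frame": 0, "residue": 1, "solvated_atom": 11, "solvent_name": "water", "solvent_id": 8},
    {"frame": 0, "residue": 1, "solvated_atom": 11, "solvent_name": "water", "solvent_id": 9},
]


def merge_solvation_data(*tables):
    """Concatenate the per-atom tables into one table (hypothetical helper)."""
    merged = []
    for table in tables:
        merged.extend(table)
    return merged


def coordination_by_residue(rows):
    """Count distinct solvent molecules per (frame, residue, solvent_name)."""
    shells = defaultdict(set)
    for row in rows:
        key = (row["frame"], row["residue"], row["solvent_name"])
        shells[key].add(row["solvent_id"])
    return {key: len(ids) for key, ids in shells.items()}


merged = merge_solvation_data(solute_O_rows, solute_H1_rows)
counts = coordination_by_residue(merged)
print(counts)
```

Grouping on the residue rather than on the solvated atom is what lets the same counting logic serve both single-atom and multi-atom solutes. Solvent 8 touches both the O and the H1 atom here but contributes once to the residue's shell.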

We will retain all of the single atom Solutes as a property of the multi-atom Solute. This would amount to a rough doubling of the memory footprint, but it would make follow up analysis easier. I'm a bit torn here and there may be a better way.

>>> print(water.atoms) # what should this be called?
>>> [solute_O, solute_H1, solute_H2] # maybe this should be a dict?

For a single atom solute the atoms list would still be present but the data within would be identical to the solvation_data of the solute itself. Single atom solutes are now just a special case of multi-atom solutes.

water_O_solute.atoms[0].solvation_data = water_O_solute.solvation_data

I'm sure there are many things I am not considering that will come up later, but as a start, I think this plan will allow the package to be generalized with maximum code reuse. I'd love feedback or suggestions on any aspect of the outline above.

Todos

Notable points that this PR has either accomplished or will accomplish.

[x] make solutions composable so that multi-atom solutes can be constructed systematically

[x] put guardrails in place to prevent misuse

[ ] bare minimum rewrite of the documentation

Status

[ ] Ready to go

enhancement core
opened by orionarcher 8
Residence times and solute-solvent network calculations
Description

This PR adds an analysis module to calculate residence times and an analysis module to calculate solute-solvent networks.

Todos

Notable points that this PR has either accomplished or will accomplish.

[x] Residence module

[x] Change exponential fit in residence time calculations to 1/e decay point

[x] Networking module

[x] Residence testing

[x] Networking testing

[x] Residence documentation

[x] Networking documentation

[x] Add diluent_composition analysis to Pairing class

[x] Add module selection kwarg to Solution

[x] Add from_solution class method to all analysis classes

[x] Residence and Networking tutorial

[x] add citations for Networking and Residence codes

[x] improve documentation for Residence time

[x] specify caveats and difference between both implementations
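The list above mentions replacing an exponential fit with the 1/e decay point. One way to read such a point off an autocovariance function is sketched below. It is not the module's code: the function names, the normalization by the number of overlapping samples, and the toy presence series are assumptions for the example.

```python
import math


def autocovariance(series):
    """Autocovariance at every lag, normalized so that lag 0 equals 1.

    Assumes the series is not constant; a constant series has zero variance.
    """
    n = len(series)
    mean = sum(series) / n
    centered = [x - mean for x in series]
    acov = []
    for lag in range(n):
        total = sum(centered[t] * centered[t + lag] for t in range(n - lag))
        acov.append(total / (n - lag))
    return [value / acov[0] for value in acov]


def residence_time_cutoff(acf, step=1.0):
    """Time of the first lag at which the ACF falls to or below 1/e, or None."""
    threshold = 1 / math.e
    for lag, value in enumerate(acf):
        if value <= threshold:
            return lag * step
    return None


# Toy series: 1 while a solvent is in the shell, 0 once it has left.
presence = [1, 1, 1, 1, 0, 0, 0, 0]
acf = autocovariance(presence)
tau = residence_time_cutoff(acf, step=1.0)
print(tau)
```

Unlike an exponential fit, the cutoff approach needs no optimizer and cannot fail to converge, but its resolution is limited to the frame spacing passed as step.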

Status

The PR is nearly finished; the main outstanding issues are improved documentation, citations, and tutorials.
enhancement
opened by orionarcher 8
Formalize Roadmap in Projects tab

We should decide on a model for a release schedule. Do we wish to match the MDA core idea and do quarterly releases (@IAlibay is that right?)? Or do we just want to release with major functionality changes, i.e. at your discretion, @orioncohen?

If people want updated functionality and bugfixes, they can always clone the current main branch and install with pip install -e .

Raising issue as food for thought.
release

opened by hmacdope 8
Enhancements to support composable Solutions and self-solvating solutes
UPDATE:

This PR now implements a number of quality of life changes and solves issue #31. The proposed multi-atom solute changes will be implemented in another PR. See PR #72 for the new changes!

Todos

Notable points that this PR has either accomplished or will accomplish.

[x] add new testing data with a multi-atom solute

[x] remove foolish internal numbering scheme of solvated_atoms

[x] allow solutes to also act as solvents

[x] replace all column name strings with variables stored in a column_names.py file

Status

[x] Ready to go

Description

This is intended to be a major PR to handle multi-atom solutes. It relates to issues #47, #31, #66, and #58.

I would like to propose the following outline of new functionality. In this description, I will focus on the outward facing API. I'll use the somewhat trivial case of water as an example.

Solution will be renamed to Solute. All references to solute in the current documentation will be renamed to solvated atom or solvated atoms. I think this better captures what the Solute class really is, especially as we expand to multi-atom Solutes.

The default initializer for Solute will take only a single atom per residue. It will not support multiple identical atoms on a residue. This will be handled by the more general case. As a result, instantiating a Solute for a single atom remains the same. (note that I have already fixed the case with self-solvation identified in issue #31)

water = u.select_atoms(...)
water_O = u.select_atoms(...)
water_O_solute = Solute(water_O, {"water": water})

Additional initializers will be added to instantiate a Solute; these will support multi-atom solutes.

The first will allow the user to stitch together multiple solutes to create a new solute.

water_H1 = u.select_atoms(...)
water_H2 = u.select_atoms(...)
solute_O = Solute(water_O, {"water": water})
solute_H1 = Solute(water_H1, {"water": water})
solute_H2 = Solute(water_H2, {"water": water})
multi_atom_solute = Solute.from_solutes([solute_O, solute_H1, solute_H2]) # maybe this should be a dict?

The second will allow users to simply instantiate a solute from an entire residue (or part of a residue). There may be technical challenges here so this behavior is not guaranteed.

multi_atom_solute = Solute.from_residue(water, {"water": water})

To support this, the solvation_data dataframe will have two additional columns added, a "residue" column and a "solute_name" column. All analysis classes will be refactored to operate on the "residue" column rather than the "solvated_atom" column. This will make no difference for single-atom solutes but will allow the analysis classes to generalize easily. I'm not completely sure the "solute_name" column is necessary, but it would be convenient to have.

When a multi-atom solute is created, all of the solvation_data dataframes from each constituent single-atom solute will be merged together. The "residue" column will group together solvated atoms on the same residue such that the analysis classes can operate on the whole solute. The API for accessing the analysis classes will be identical.

multi_atom_solute.coordination_number["water"] # valid property

We will retain all of the single atom Solutes as a property of the multi-atom Solute. This would amount to a rough doubling of the memory footprint, but it would make follow up analysis easier. I'm a bit torn here and there may be a better way.

>>> print(water.atoms) # what should this be called?
>>> [solute_O, solute_H1, solute_H2] # maybe this should be a dict?

For a single atom solute the atoms list would still be present but the data within would be identical to the solvation_data of the solute itself. Single atom solutes are now just a special case of multi-atom solutes.

water_O_solute.atoms[0].solvation_data = water_O_solute.solvation_data

I'm sure there are many things I am not considering that will come up later, but as a start, I think this plan will allow the package to be generalized with maximum code reuse. I'd love feedback or suggestions on any aspect of the outline above.

Todos

Notable points that this PR has either accomplished or will accomplish.

[x] add new testing data with a multi-atom solute

[x] remove foolish internal numbering scheme of solvated_atoms

[x] allow solutes to also act as solvents

[x] replace all column name strings with variables stored in a column_names.py file

[ ] make solutions composable so that multi-atom solutes can be constructed systematically

[ ] put guardrails in place to prevent misuse

Status

[ ] Ready to go

enhancement core high-priority
opened by orionarcher 7
Concatenation Issue for Residence Time Calculation

I am using Residence.from_solute(solute) to calculate the residence time between the trimers' N and water's O. It always reports the same concatenation issue, which seems to be caused by the calculation of the auto-covariance (Fig 1). However, when I calculate the residence time between the anion and water, this problem does not occur. I have added the solvation_data of the anion solution (Fig 2) and the trimer solution (Fig 3) for your reference.
Fig 1 Fig 2
Fig 3

opened by SophiaRuan 0
Styling and consistency of documentation could be improved
This is split off from PR #78 to address two points: the consistency and the formatting of the documentation.

Todos:

[ ] closely read over all documentation for errors

[ ] make sure all syntax and code styling is consistent

[ ] enhance aesthetic styling of documentation

documentation
opened by orionarcher 0
Work towards MDAKit integration

Now that MDAKits are live, we should work towards registering solvation-analysis as an MDAKit!

See the blog post for more info.

@orionarcher I would be interested to know if you would prefer to just go for it or wait for 0.2.0 (and some conda packages).

AFAIK solvation-analysis already meets all the requirements listed in the white paper.

opened by hmacdope 2
Solvation plots
Description

Provide a brief description of the PR's purpose here.

Todos

Notable points that this PR has either accomplished or will accomplish.

[ ] TODO 1

Questions

[ ] Question1

Status

[ ] Ready to go
opened by laurlee 1
Create a `save_data` method for solution

This method should dump the core solvation statistics to a Python dict. It will not contain enough information to reconstitute the Solution.

Brought up in issue #52.
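As a rough sketch of what such a method might do, the function below copies summary statistics into a plain dict and optionally writes them out as JSON. Everything here is hypothetical: the standalone save_data function, its path parameter, the key names, and the numbers are invented for illustration and do not reflect the Solution class.

```python
import json


def save_data(statistics, path=None):
    """Copy summary statistics into a plain dict and optionally write them as JSON."""
    data = {name: dict(values) for name, values in statistics.items()}
    if path is not None:
        with open(path, "w") as handle:
            json.dump(data, handle, indent=2)
    return data


# Made-up statistics standing in for the output of the analysis classes.
stats = {
    "coordination_numbers": {"water": 4.2, "anion": 0.6},
    "pairing": {"water": 0.98, "anion": 0.41},
}
saved = save_data(stats)
print(json.dumps(saved, sort_keys=True))
```

Because only aggregate numbers are stored, the result is small and easy to share, which matches the stated limitation that the Solution cannot be rebuilt from it.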

opened by orionarcher 0

Releases(v0.1.4)

v0.1.4(Jun 28, 2022)

Residence and Networking modules added to analysis_library.py
v0.1.3(Mar 17, 2022)

Added two new tutorials and fixed an outstanding bug.

This release will also mint a Zenodo DOI.
v0.1.2-beta(Sep 23, 2021)
New functionality:

co-occurrence matrix calculation and plotting

identify types of coordinating atoms

find percentage of free solvent

Bug fixes:

all AtomGroup.ids and ResidueGroup.resids changed to AtomGroup.ix and ResidueGroup.resindices

v0.1.2(Sep 23, 2021)
New functionality:

co-occurrence matrix calculation and plotting

identify types of coordinating atoms

find percentage of free solvent

Bug fixes:

all AtomGroup.ids and ResidueGroup.resids changed to AtomGroup.ix and ResidueGroup.resindices

v0.1.1-alpha(Aug 19, 2021)

First release to pip to conclude GSoC.
v0.1.1(Aug 19, 2021)

First release on pip.

Owner

MDAnalysis

MDAnalysis is an object-oriented Python library to analyze molecular dynamics trajectories.

GitHub Repository https://solvation-analysis.readthedocs.io/en/latest/
