First attempt: panedr and panedrlite #42

Merged
merged 26 commits into from
Jul 6, 2022
Changes from 19 commits
Commits
f0cfe69
First attempt: panedr and panedrlite
BFedder Jun 26, 2022
839ae56
Attempt to fix CI
BFedder Jun 26, 2022
1d61a8d
reattempt CI
BFedder Jun 26, 2022
a937b42
Attempt: Fix pytest/codecov
BFedder Jun 26, 2022
c02e1f7
Reattempt: Codecov
BFedder Jun 26, 2022
e0bca63
blahMerge branch 'panedrlite' of https://github.com/BFedder/panedr in…
BFedder Jun 26, 2022
a9a9887
fixed git merge file weirdness
BFedder Jun 27, 2022
0af1018
comment out pip install panedr
BFedder Jun 27, 2022
233927c
Update gh-ci.yaml
BFedder Jun 27, 2022
1cbb327
fix pbr versioning in panedrlite
BFedder Jun 27, 2022
10a5727
Merge branch 'master' of https://github.com/MDAnalysis/panedr into MD…
BFedder Jun 27, 2022
7a5806c
Merge branch 'MDAnalysis-master' into panedrlite
BFedder Jun 27, 2022
bf89e03
addressing IAlibay's reviewer comments
BFedder Jun 27, 2022
9632971
merge #33
BFedder Jun 29, 2022
55d5731
panedrlite now mentioned in README
BFedder Jun 29, 2022
85ac62d
fix reST syntax
BFedder Jun 29, 2022
58fe71c
actually fix reST syntax
BFedder Jun 29, 2022
8c14842
Added docstrings and type hints
BFedder Jun 29, 2022
8cc1914
remove pandas type hint for now
BFedder Jun 29, 2022
0f405d9
fixed type hints for older python versions
BFedder Jul 1, 2022
c4c9453
added type hint for edr_to_df
BFedder Jul 3, 2022
f9302be
reverted edr_to_df type hints
BFedder Jul 4, 2022
ff328c3
panedrlite now supports import panedr
BFedder Jul 5, 2022
b61cf65
fixed CI links to panedrlite
BFedder Jul 5, 2022
86d17ef
Fixed CI? changed import in panedr/__init__
BFedder Jul 5, 2022
a04eaf0
reverted to panedrlite
BFedder Jul 5, 2022
14 changes: 12 additions & 2 deletions .github/workflows/gh-ci.yaml
@@ -56,11 +56,21 @@ jobs:

- name: install package
run: |
python -m pip install -v .
python -m pip install -v ./panedrlite
python -m pip install -v ./panedr

- name: test imports
# Exit the git repo in order for pbr to stop auto-picking up version info
# from the local git data
working-directory: ../
run: |
python -Ic "from panedrlite import edr_to_dict"
python -Ic "from panedr import edr_to_df"


- name: run unit tests
run: |
pytest -n 2 -v --cov=panedr --cov-report=xml --color=yes ./tests
pytest -n 2 -v --cov=panedrlite/panedrlite --cov-report=xml --color=yes ./tests

- name: codecov
uses: codecov/codecov-action@v3
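The "test imports" step in this workflow can be reproduced locally. The following is a minimal sketch and not part of the PR: it runs an import check in a child interpreter started with `-I` (isolated mode, which keeps the current directory and the user site-packages off `sys.path`) from a chosen working directory. The helper name `can_import` is hypothetical, and the stdlib module `json` stands in for `panedr`/`panedrlite` so the snippet runs without either package installed.

```python
import subprocess
import sys
import tempfile


def can_import(statement, cwd):
    """Run ``statement`` in an isolated child interpreter started in ``cwd``.

    Returns True when the child exits with status 0.
    """
    result = subprocess.run(
        [sys.executable, "-I", "-c", statement],
        cwd=cwd,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
    )
    return result.returncode == 0


# Use a scratch directory so nothing from a source checkout is picked up.
with tempfile.TemporaryDirectory() as scratch:
    ok = can_import("from json import loads", scratch)
    missing = can_import("import module_that_does_not_exist_xyz", scratch)
```

Replacing the stand-in statement with `from panedrlite import edr_to_dict` would mirror the CI check, assuming the package is installed in the active environment.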
37 changes: 27 additions & 10 deletions README.rst
@@ -3,12 +3,23 @@ Panedr

|Build Status| |cov|

Panedr reads a `Gromacs EDR`_ binary energy XDR file and returns its content
as a pandas_ dataframe. The library exposes one function—the ``edr_to_df``
function—that gets the path to an EDR file and returns a pandas
dataframe.
Panedr reads a `Gromacs EDR`_ binary energy XDR file and returns its content
as a pandas_ dataframe. The library exposes three functions:

``panedr`` is compatible with Python 3.6 and greater.
- the ``edr_to_df`` function, which gets the path to an EDR file and returns a
pandas DataFrame,

- the ``edr_to_dict`` function, which returns a dictionary of NumPy arrays instead
of a pandas DataFrame

- and the ``read_edr`` function, which is called by the other two functions to
do the actual reading of EDR files. It returns a tuple of lists.

``panedr`` is compatible with Python 3.6 and greater. It comes in two flavours:
``panedr`` and ``panedrlite``. These two packages are identical, but installing
panedr automatically installs pandas as well. This automatic installation of pandas
does not happen in panedrlite, making it useful for downstream integrators trying
to limit additional dependencies.
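
Because pandas is only pulled in by the `panedr` flavour, downstream code that depends on `panedrlite` may want to check for pandas before asking for a DataFrame. A minimal sketch, using the hypothetical helper names `has_module` and `preferred_reader` (neither is part of panedr):

```python
import importlib.util


def has_module(name):
    """Return True if the top-level module ``name`` can be found."""
    return importlib.util.find_spec(name) is not None


def preferred_reader():
    """Pick the name of the reader function that will work here.

    ``edr_to_df`` needs pandas; ``edr_to_dict`` only needs NumPy.
    """
    return "edr_to_df" if has_module("pandas") else "edr_to_dict"


reader_name = preferred_reader()
```

The check only looks the module up and does not import it, so it stays cheap when pandas is present.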

Example
-------
@@ -20,22 +31,28 @@ Example
# Read the EDR file
path = 'ener.edr'
df = panedr.edr_to_df(path)
dic = panedr.edr_to_dict(path)

# The `verbose` optional parameter can be set to True to display the
# progress on stderr
df = panedr.edr_to_df(path, verbose=True)
dic = panedr.edr_to_dict(path, verbose=True)

# Get the average pressure after the first 10 ns
pressure_avg = df[u'Pressure'][df[u'Time'] > 10000].mean()
pressure_avg = df['Pressure'][df['Time'] > 10000].mean()
pressure_avg = dic['Pressure'][dic['Time'] > 10000].mean()


Install
-------

Install the package with ``pip``:
All code is found in panedrlite. Installing panedr installs panedrlite and pandas
in one go, to make all functions available to the user out-of-the-box.
Install panedr or panedrlite with ``pip``:

.. code:: bash

pip install panedr
pip install panedrlite

If you are using `conda`_ and `conda-forge`_, you can install with

@@ -48,7 +65,7 @@ Tests

The ``panedr`` repository contains a series of tests. If you downloaded or
cloned the code from the repository, you can run the tests. To do so,
install pytest`_, and, in the directory of the
install `pytest`_, and, in the directory of the
panedr source code, run:

.. code:: bash
@@ -84,7 +101,7 @@ Public License version 2.1 as Gromacs.
.. |Build Status| image:: https://github.com/MDAnalysis/panedr/actions/workflows/gh-ci.yaml/badge.svg
:alt: Github Actions Build Status
:target: https://github.com/MDAnalysis/panedr/actions/workflows/gh-ci.yaml

.. |cov| image:: https://codecov.io/gh/MDAnalysis/panedr/branch/master/graph/badge.svg
:alt: Coverage Status
:target: https://codecov.io/gh/MDAnalysis/panedr
2 changes: 1 addition & 1 deletion panedr/__init__.py → panedr/panedr/__init__.py
@@ -4,4 +4,4 @@
__version__ = pbr.version.VersionInfo('panedr').release_string()
del pbr

from .panedr import *
from panedrlite import *
orbeckst marked this conversation as resolved.
3 changes: 3 additions & 0 deletions panedr/requirements.txt
@@ -0,0 +1,3 @@
panedrlite[pandas]
numpy>=1.19.0
pbr
2 changes: 1 addition & 1 deletion setup.cfg → panedr/setup.cfg
@@ -5,7 +5,7 @@ author_email = [email protected]
summary = Read and manipulate Gromacs energy files
license = LGPL
description_file =
README.rst
../README.rst
long_description_content_type = text/x-rst
home_page = https://github.com/MDAnalysis/panedr
python_requires = >=3.6
File renamed without changes.
7 changes: 7 additions & 0 deletions panedrlite/panedrlite/__init__.py
@@ -0,0 +1,7 @@
# -*- coding: utf-8 -*-

import pbr.version
__version__ = pbr.version.VersionInfo('panedrlite').release_string()
del pbr

from .panedr import *
orbeckst marked this conversation as resolved.
88 changes: 81 additions & 7 deletions panedr/panedr.py → panedrlite/panedrlite/panedr.py
@@ -28,11 +28,19 @@
The ``panedr`` library allows reading and manipulating the content of Gromacs
energy files (.edr files) in Python.

The current version of ``panedr`` tries to be in par with Gromacs 5.1.1 when
it comes to read EDR files.
The current version of ``panedr`` tries to be on par with Gromacs 5.1.1 and
newer when it comes to reading EDR files.

So far, only one function is exposed by the library : the :fun:`edr_to_df`
function that returns a pandas ``DataFrame`` from an EDR file.
The library exposes the following functions:

- the :func:`read_edr` function parses an EDR file and returns the energy terms
in a nested list

- the :func:`edr_to_df` function turns the nested list created by
:func:`read_edr` into a pandas ``DataFrame``

- the :func:`edr_to_dict` function turns the nested list created by
:func:`read_edr` into a dictionary that maps term names to NumPy arrays

.. autofunction:: edr_to_df
"""
@@ -403,7 +411,37 @@ def is_frame_magic(data):
return magic == -7777777


def read_edr(path, verbose=False):
all_energies_type = list[list[float]]
Member

Personal opinion but I prefer type hints to be in full at the call site where possible.

I get that here they are too long and complicated to be readable and am happy with your choice but just where possible.

all_names_type = list[str]
times_type = list[float]
read_edr_return_type = tuple[all_energies_type, all_names_type, times_type]
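
Note that subscripting the built-in `list` and `tuple` types, as these aliases do, only works at runtime from Python 3.9 on, while `setup.cfg` declares `python_requires = >=3.6`. Below is a sketch of equivalent aliases built on the `typing` module, which also evaluate on older interpreters. This is one possible spelling, not necessarily the one adopted later in the PR.

```python
from typing import List, Tuple

# Same shapes as the aliases in the diff, spelled with typing generics.
all_energies_type = List[List[float]]
all_names_type = List[str]
times_type = List[float]
read_edr_return_type = Tuple[all_energies_type, all_names_type, times_type]
```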


def read_edr(path: str, verbose: bool = False) -> read_edr_return_type:
"""Parse EDR files and make contents available in Python

:func:`read_edr` does the actual reading of EDR files. It is called by
:func:`edr_to_df` and :func:`edr_to_dict` to provide the file contents.
Under the hood, it is using :class:`xdrlib.Unpacker` to access the binary
EDR file.

Parameters
----------
path : str
path to EDR file to be read
verbose : bool
Optionally show verbose output while reading the file

Returns
-------
all_energies: list[list[float]]
A nested list containing the energy values for each frame found in the EDR
file
all_names: list[str]
A list containing the names of the energy terms found in the file
times: list[float]
A list containing the time of each step/frame.
"""
begin = time.time()
edr_file = EDRFile(str(path))
all_energies = []
@@ -428,11 +466,30 @@ def read_edr(path, verbose=False):
end='', file=sys.stderr)
print('\n{} frame read in {:.2f} seconds'.format(ifr, end - begin),
file=sys.stderr)

return all_energies, all_names, times



def edr_to_df(path: str, verbose: bool = False):
Member

Suggested change
def edr_to_df(path: str, verbose: bool = False):
def edr_to_df(path: str, verbose: bool = False) -> pd.DataFrame:

Collaborator Author

This is a bit complicated because pandas is an optional dependency. If pandas is not installed, the module can't be imported if the type hint is present. I could put the function definition into a try-except statement, but then I would lose the custom ImportError message. What's the best thing to do here?

Member

My 2 cents - sometimes you just have to take the loss. Type hints are just that, over-engineering a solution for an optional import for a method that's ~ 3 lines of code probably isn't worth it.

Member

Collaborator Author

I took one of the suggestions from there and it seems to work :)

Collaborator Author

Spoke too soon. I didn't fully think this through, and the solution I tried yesterday won't work, I'm afraid. I think we'll have to hold back on annotating the return type of edr_to_df for now, but the function name and docstring are pretty self-explanatory, and mypy wasn't working with that anyway yet.

mypy error message: "error: Skipping analyzing "pandas": module is installed, but missing library stubs or py.typed marker
panedrlite/panedrlite/panedr.py:59: note: See https://mypy.readthedocs.io/en/stable/running_mypy.html#missing-imports"

Member

I would say don't worry about it. We lived without typed code for Python 0-3.whatever, so I'm sure we'll be fine without it. Sorry for leading you down the garden path.
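
For readers landing on this thread: one widely used pattern for annotating with an optional dependency is to import it only under `typing.TYPE_CHECKING` and to quote the annotation. The sketch below is illustrative and is not claimed to be the approach tried in this PR; `columns_to_df` is a hypothetical stand-in for `edr_to_df`. The module still imports without pandas, and the custom `ImportError` message is kept. It does not address the missing-stubs message from mypy quoted in the thread.

```python
from typing import TYPE_CHECKING, Dict, List

if TYPE_CHECKING:
    # Only evaluated by static type checkers, never at runtime,
    # so a missing pandas does not break importing this module.
    import pandas as pd


def columns_to_df(columns: Dict[str, List[float]]) -> "pd.DataFrame":
    """Hypothetical helper: build a DataFrame from named columns."""
    try:
        import pandas
    except ImportError:
        raise ImportError(
            "pandas is required for columns_to_df; "
            "install pandas or the full panedr package."
        )
    return pandas.DataFrame(columns)
```

The quoted return annotation is stored as a plain string, so nothing named `pd` has to exist when the function is defined.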

"""Calls :func:`read_edr` and packs its return values into a DataFrame

This function has a pandas dependency. Installing panedrlite instead of
panedr will not automatically install pandas. If you want to use this
function, please install pandas or consider installing panedr instead.

Parameters
----------
path : str
path to EDR file to be read
verbose : bool
Optionally show verbose output while reading the file

Returns
-------
df: pandas.DataFrame
:class:`pandas.DataFrame()` object that holds all energy terms found in
the EDR file.
"""
try:
import pandas
except ImportError:
@@ -445,7 +502,24 @@ def edr_to_df(path: str, verbose: bool = False):
return df


def edr_to_dict(path: str, verbose: bool = False):
def edr_to_dict(path: str, verbose: bool = False) -> dict[str, np.ndarray]:
"""Calls :func:`read_edr` and packs its return values into a dictionary

The returned dictionary's keys are the names of the energy terms present in
the EDR file, the values are the time-series energy data for those terms.

Parameters
----------
path : str
path to EDR file to be read
verbose : bool
Optionally show verbose output while reading the file

Returns
-------
energy_dict: dict[str, np.ndarray]
dictionary that holds all energy terms found in the EDR file.
"""
all_energies, all_names, times = read_edr(path, verbose=verbose)
energy_dict = {}
for idx, name in enumerate(all_names):
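
The rest of the function body is collapsed in this view. As a rough illustration of the idea only, the sketch below turns per-frame rows into per-term columns using plain lists. It assumes every row holds one value per entry in the names list, in the same order. `rows_to_columns` is a hypothetical helper, and the real function returns NumPy arrays rather than lists.

```python
def rows_to_columns(rows, names):
    """Map each name to the list of its values across all rows."""
    columns = {}
    for idx, name in enumerate(names):
        columns[name] = [row[idx] for row in rows]
    return columns


# Made-up numbers for illustration, not taken from a real EDR file.
names = ["Time", "Pressure"]
rows = [[0.0, 1.5], [10.0, 0.5], [20.0, 2.5]]
columns = rows_to_columns(rows, names)

# Average pressure over frames with Time > 5, analogous to the README example.
selected = [p for t, p in zip(columns["Time"], columns["Pressure"]) if t > 5]
pressure_avg = sum(selected) / len(selected)
```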
File renamed without changes.
31 changes: 31 additions & 0 deletions panedrlite/setup.cfg
@@ -0,0 +1,31 @@
[metadata]
name = panedrlite
author = Jonathan Barnoud
author_email = [email protected]
summary = Read and manipulate Gromacs energy files
license = LGPL
description_file =
../README.rst
long_description_content_type = text/x-rst
home_page = https://github.com/MDAnalysis/panedr
python_requires = >=3.6
classifier =
Development Status :: 4 - Beta
Intended Audience :: Developers
Topic :: Scientific/Engineering :: Bio-Informatics
Topic :: Scientific/Engineering :: Chemistry
Topic :: Scientific/Engineering :: Physics
License :: OSI Approved :: GNU Lesser General Public License v2 or later (LGPLv2+)
Programming Language :: Python :: 3.6
Programming Language :: Python :: 3.7
Programming Language :: Python :: 3.8
Programming Language :: Python :: 3.9
Programming Language :: Python :: 3.10
Operating System :: OS Independent

[extras]
test =
six
pytest
pandas =
pandas
6 changes: 6 additions & 0 deletions panedrlite/setup.py
@@ -0,0 +1,6 @@
from setuptools import setup

setup(
setup_requires=['pbr'],
pbr=True,
)
23 changes: 3 additions & 20 deletions tests/test_edr.py
@@ -14,28 +14,11 @@
import contextlib
import numpy
import pandas
import panedr
import panedrlite as panedr
import re

# On python 2, cStringIO is a faster version of StringIO. It may not be
# available on implementations other than Cpython, though. Therefore, we may
# have to fail back on StringIO if cStriongIO is not available.
# On python 3, the StringIO object is not part of the StringIO module anymore.
# It becomes part of the io module.
try:
from cStringIO import StringIO
except ImportError:
try:
from StringIO import StringIO
except ImportError:
from io import StringIO

from io import StringIO
from collections import namedtuple
try:
from pathlib import Path
except ImportError:
# Python 2 requires the pathlib2 backport of pathlib
from pathlib2 import Path
from pathlib import Path

# Constants for XVG parsing
COMMENT_PATTERN = re.compile(r'\s*[@#%&/]')