First attempt: panedr and panedrlite (MDAnalysis#42)
PR MDAnalysis#42
## Work done in this PR
* Creates the panedr and panedrlite packages
* panedrlite is panedr without pandas
* panedr imports panedrlite and pandas
* update CI accordingly
* fix some type hint issues
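
The split described above rests on treating pandas as an optional dependency: the core reader works without it, and only the DataFrame conversion needs it at call time. A minimal sketch of that pattern follows. The helper name `import_optional` and its parameters are hypothetical and are not part of panedr or panedrlite; the committed code inlines a `try`/`except ImportError` inside `edr_to_df` instead.

```python
import importlib


def import_optional(module_name, feature, install_hint):
    """Import an optional dependency, or fail with an actionable message.

    Hypothetical helper for illustration only; it is not part of
    panedr or panedrlite.
    """
    try:
        return importlib.import_module(module_name)
    except ImportError as err:
        raise ImportError(
            f"{module_name} is required to use {feature}. "
            f"Try installing it, e.g.: {install_hint}"
        ) from err


# A standard-library module is always importable.
json_module = import_optional("json", "the JSON demo", "n/a")

# A missing module produces the explanatory error instead of a bare failure.
try:
    import_optional("surely_not_installed_pkg", "edr_to_df()",
                    "python -m pip install pandas")
except ImportError as err:
    message = str(err)
```

Deferring the import to the function that needs it keeps `import panedrlite` cheap and lets the remaining functions work in environments where pandas is absent.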
BFedder authored and ezavod committed Jul 9, 2022
1 parent 8061b6c commit 8e516d0
Showing 12 changed files with 183 additions and 42 deletions.
14 changes: 12 additions & 2 deletions .github/workflows/gh-ci.yaml
@@ -56,11 +56,21 @@ jobs:
- name: install package
run: |
python -m pip install -v .
python -m pip install -v ./panedrlite
python -m pip install -v ./panedr
- name: test imports
# Exit the git repo in order for pbr to stop auto-picking up version info
# from the local git data
working-directory: ../
run: |
python -Ic "from panedrlite import edr_to_dict"
python -Ic "from panedr import edr_to_df"
- name: run unit tests
run: |
pytest -n 2 -v --cov=panedr --cov-report=xml --color=yes ./tests
pytest -n 2 -v --cov=panedrlite/panedrlite --cov-report=xml --color=yes ./tests
- name: codecov
uses: codecov/codecov-action@v3
46 changes: 37 additions & 9 deletions README.rst
@@ -3,10 +3,17 @@ Panedr

|Build Status| |cov|

Panedr reads a `Gromacs EDR`_ binary energy XDR file and returns its content
as a pandas_ dataframe. The library exposes one function—the ``edr_to_df``
function—that gets the path to an EDR file and returns a pandas
dataframe.
Panedr reads a `Gromacs EDR`_ binary energy XDR file and returns its content
as a pandas_ dataframe. The library exposes three functions:

- the ``edr_to_df`` function, which gets the path to an EDR file and returns a
pandas DataFrame,

- the ``edr_to_dict`` function, which returns a dictionary of NumPy arrays instead
of a pandas DataFrame

- and the ``read_edr`` function, which is called by the other two functions to
do the actual reading of EDR files. It returns a tuple of lists.

``panedr`` is compatible with Python 3.6 and greater.

@@ -20,23 +27,27 @@ Example
# Read the EDR file
path = 'ener.edr'
df = panedr.edr_to_df(path)
dic = panedr.edr_to_dict(path)
# The `verbose` optional parameter can be set to True to display the
# progress on stderr
df = panedr.edr_to_df(path, verbose=True)
dic = panedr.edr_to_dict(path, verbose=True)
# Get the average pressure after the first 10 ns
pressure_avg = df[u'Pressure'][df[u'Time'] > 10000].mean()
pressure_avg = df['Pressure'][df['Time'] > 10000].mean()
pressure_avg = dic['Pressure'][dic['Time'] > 10000].mean()
Install
-------

Install the package with ``pip``:
Install panedr with ``pip``:

.. code:: bash
pip install panedr
If you are using `conda`_ and `conda-forge`_, you can install with

.. code:: bash
@@ -48,13 +59,30 @@ Tests

The ``panedr`` repository contains a series of tests. If you downloaded or
cloned the code from the repository, you can run the tests. To do so,
install pytest`_, and, in the directory of the
install `pytest`_, and, in the directory of the
panedr source code, run:

.. code:: bash
pytest -v tests
panedrlite
----------
Under the hood, panedr is just a metapackage that installs panedrlite and
the requirements for all functions, notably including pandas. To avoid requiring
pandas in downstream applications, panedrlite is available for installation as
well. It provides all functionality, except that pandas is not automatically
installed as a dependency, and therefore :func:`edr_to_df` will not work
out-of-the-box unless pandas is installed manually. `panedrlite` also uses the panedr
namespace, so `import panedr` works.


.. code:: bash
pip install panedrlite
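
To illustrate what working without pandas looks like, here is a minimal sketch of packing per-frame energy rows into a dictionary of NumPy arrays and then filtering by time, mirroring the README example. The function name `pack_terms` and the data values are hypothetical, and the sketch assumes that time is stored as a term named ``Time``, as the README indexing suggests; it is not the committed implementation of ``edr_to_dict``.

```python
import numpy as np


def pack_terms(all_energies, all_names):
    """Map each term name to a 1-D array of its values across frames.

    Hypothetical helper for illustration; assumes at least one frame.
    """
    table = np.asarray(all_energies, dtype=float)  # shape: (n_frames, n_terms)
    return {name: table[:, idx] for idx, name in enumerate(all_names)}


# Synthetic stand-in for data read from an EDR file (values are made up).
all_names = ["Time", "Pressure"]
all_energies = [
    [0.0, 1.0],
    [5000.0, 3.0],
    [15000.0, 5.0],
    [20000.0, 7.0],
]

energy_dict = pack_terms(all_energies, all_names)

# Boolean mask selecting frames after the first 10 ns (time in ps).
late = energy_dict["Time"] > 10000
pressure_avg = energy_dict["Pressure"][late].mean()
```

Because every value is a plain NumPy array, boolean masking works the same way as it does on DataFrame columns, so simple analyses do not need pandas.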
License
-------

@@ -84,7 +112,7 @@ Public License version 2.1 as Gromacs.
.. |Build Status| image:: https://github.com/MDAnalysis/panedr/actions/workflows/gh-ci.yaml/badge.svg
:alt: Github Actions Build Status
:target: https://github.com/MDAnalysis/panedr/actions/workflows/gh-ci.yaml

.. |cov| image:: https://codecov.io/gh/MDAnalysis/panedr/branch/master/graph/badge.svg
:alt: Coverage Status
:target: https://codecov.io/gh/MDAnalysis/panedr
2 changes: 1 addition & 1 deletion panedr/__init__.py → panedr/panedr/__init__.py
@@ -4,4 +4,4 @@
__version__ = pbr.version.VersionInfo('panedr').release_string()
del pbr

from .panedr import *
from panedrlite import edr_to_dict, edr_to_df, read_edr
3 changes: 3 additions & 0 deletions panedr/requirements.txt
@@ -0,0 +1,3 @@
panedrlite[pandas]
numpy>=1.19.0
pbr
2 changes: 1 addition & 1 deletion setup.cfg → panedr/setup.cfg
@@ -5,7 +5,7 @@ author_email = [email protected]
summary = Read and manipulate Gromacs energy files
license = LGPL
description_file =
README.rst
../README.rst
long_description_content_type = text/x-rst
home_page = https://github.com/MDAnalysis/panedr
python_requires = >=3.6
File renamed without changes.
7 changes: 7 additions & 0 deletions panedrlite/panedrlite/__init__.py
@@ -0,0 +1,7 @@
# -*- coding: utf-8 -*-

import pbr.version
__version__ = pbr.version.VersionInfo('panedrlite').release_string()
del pbr

from .panedr import edr_to_df, edr_to_dict, read_edr
93 changes: 83 additions & 10 deletions panedr/panedr.py → panedrlite/panedrlite/panedr.py
@@ -28,11 +28,19 @@
The ``panedr`` library allows to read and manipulate the content of Gromacs
energy file (.edr files) in python.
The current version of ``panedr`` tries to be in par with Gromacs 5.1.1 when
it comes to read EDR files.
The current version of ``panedr`` tries to be on par with Gromacs 5.1.1 and
newer when it comes to reading EDR files.
So far, only one function is exposed by the library : the :fun:`edr_to_df`
function that returns a pandas ``DataFrame`` from an EDR file.
The library exposes the following functions:
- the :func:`read_edr` function parses an EDR file and returns the energy terms
in a nested list
- the :func:`edr_to_df` function turns the nested list created by
:func:`read_edr` into a pandas ``DataFrame``
- the :func:`edr_to_dict` function turns the nested list created by
:func:`read_edr` into a dictionary that maps term names to numpy arrays
.. autofunction:: edr_to_df
"""
@@ -46,6 +54,7 @@
import itertools
import time
import numpy as np
from typing import List, Tuple, Dict


#Index for the IDs of additional blocks in the energy file.
@@ -78,7 +87,6 @@

__all__ = ['edr_to_df', 'edr_to_dict', 'read_edr']


class EDRFile(object):
def __init__(self, path):
with open(path, 'rb') as infile:
@@ -403,7 +411,37 @@ def is_frame_magic(data):
return magic == -7777777


def read_edr(path, verbose=False):
all_energies_type = List[List[float]]
all_names_type = List[str]
times_type = List[float]
read_edr_return_type = Tuple[all_energies_type, all_names_type, times_type]


def read_edr(path: str, verbose: bool = False) -> read_edr_return_type:
"""Parse EDR files and make contents available in Python
:func:`read_edr` does the actual reading of EDR files. It is called by
:func:`edr_to_df` and :func:`edr_to_dict` to provide the file contents.
Under the hood, it is using :class:`xdrlib.Unpacker` to access the binary
EDR file.
Parameters
----------
path : str
path to EDR file to be read
verbose : bool
Optionally show verbose output while reading the file
Returns
-------
all_energies: list[list[float]]
A nested list containing the energy values for each frame found in the EDR
file
all_names: list[str]
A list containing the names of the energy terms found in the file
times: list[float]
A list containing the time of each step/frame.
"""
begin = time.time()
edr_file = EDRFile(str(path))
all_energies = []
@@ -428,24 +466,59 @@ def read_edr(path, verbose=False):
end='', file=sys.stderr)
print('\n{} frame read in {:.2f} seconds'.format(ifr, end - begin),
file=sys.stderr)

return all_energies, all_names, times


def edr_to_df(path: str, verbose: bool = False):
"""Calls :func:`read_edr` and packs its return values into a DataFrame
This function has a pandas dependency. Installing panedrlite instead of
panedr will not automatically install pandas. If you want to use this
function, please install pandas or consider installing panedr instead.
Parameters
----------
path : str
path to EDR file to be read
verbose : bool
Optionally show verbose output while reading the file
Returns
-------
df: pandas.DataFrame
:class:`pandas.DataFrame()` object that holds all energy terms found in
the EDR file.
"""
try:
import pandas
import pandas as pd
except ImportError:
raise ImportError("""ERROR --- pandas was not found!
pandas is required to use the `.edr_to_df()`
functionality. Try installing it using pip, e.g.:
python -m pip install pandas""")
all_energies, all_names, times = read_edr(path, verbose=verbose)
df = pandas.DataFrame(all_energies, columns=all_names, index=times)
df = pd.DataFrame(all_energies, columns=all_names, index=times)
return df


def edr_to_dict(path: str, verbose: bool = False):
def edr_to_dict(path: str, verbose: bool = False) -> Dict[str, np.ndarray]:
"""Calls :func:`read_edr` and packs its return values into a dictionary
The returned dictionary's keys are the names of the energy terms present in
the EDR file, the values are the time-series energy data for those terms.
Parameters
----------
path : str
path to EDR file to be read
verbose : bool
Optionally show verbose output while reading the file
Returns
-------
energy_dict: dict[str, np.ndarray]
dictionary that holds all energy terms found in the EDR file.
"""
all_energies, all_names, times = read_edr(path, verbose=verbose)
energy_dict = {}
for idx, name in enumerate(all_names):
File renamed without changes.
31 changes: 31 additions & 0 deletions panedrlite/setup.cfg
@@ -0,0 +1,31 @@
[metadata]
name = panedrlite
author = Jonathan Barnoud
author_email = [email protected]
summary = Read and manipulate Gromacs energy files
license = LGPL
description_file =
../README.rst
long_description_content_type = text/x-rst
home_page = https://github.com/MDAnalysis/panedr
python_requires = >=3.6
classifier =
Development Status :: 4 - Beta
Intended Audience :: Developers
Topic :: Scientific/Engineering :: Bio-Informatics
Topic :: Scientific/Engineering :: Chemistry
Topic :: Scientific/Engineering :: Physics
License :: OSI Approved :: GNU Lesser General Public License v2 or later (LGPLv2+)
Programming Language :: Python :: 3.6
Programming Language :: Python :: 3.7
Programming Language :: Python :: 3.8
Programming Language :: Python :: 3.9
Programming Language :: Python :: 3.10
Operating System :: OS Independent

[extras]
test =
six
pytest
pandas =
pandas
6 changes: 6 additions & 0 deletions panedrlite/setup.py
@@ -0,0 +1,6 @@
from setuptools import setup

setup(name="panedr",
setup_requires=['pbr'],
pbr=True,
)
21 changes: 2 additions & 19 deletions tests/test_edr.py
@@ -16,26 +16,9 @@
import pandas
import panedr
import re

# On python 2, cStringIO is a faster version of StringIO. It may not be
# available on implementations other than Cpython, though. Therefore, we may
# have to fail back on StringIO if cStriongIO is not available.
# On python 3, the StringIO object is not part of the StringIO module anymore.
# It becomes part of the io module.
try:
from cStringIO import StringIO
except ImportError:
try:
from StringIO import StringIO
except ImportError:
from io import StringIO

from io import StringIO
from collections import namedtuple
try:
from pathlib import Path
except ImportError:
# Python 2 requires the pathlib2 backport of pathlib
from pathlib2 import Path
from pathlib import Path

# Constants for XVG parsing
COMMENT_PATTERN = re.compile(r'\s*[@#%&/]')