First attempt: panedr and panedrlite (MDAnalysis#42)

PR MDAnalysis#42

## Work done in this PR

* Creates the panedr and panedrlite packages
* panedrlite is panedr without pandas
* panedr imports panedrlite and pandas
* Updates CI accordingly
* Fixes some type hint issues
ezavod · Jul 9, 2022 · 8e516d0
1 parent 8061b6c
commit 8e516d0

Showing 12 changed files with 183 additions and 42 deletions.
diff --git a/.github/workflows/gh-ci.yaml b/.github/workflows/gh-ci.yaml
@@ -56,11 +56,21 @@ jobs:
 
     - name: install package
       run: |
-        python -m pip install -v .
+        python -m pip install -v ./panedrlite
+        python -m pip install -v ./panedr
+
+    - name: test imports
+      # Exit the git repo in order for pbr to stop auto-picking up version info
+      # from the local git data
+      working-directory: ../
+      run: |
+        python -Ic "from panedrlite import edr_to_dict"
+        python -Ic "from panedr import edr_to_df"
+
 
     - name: run unit tests
       run: |
-        pytest -n 2 -v --cov=panedr --cov-report=xml --color=yes ./tests
+        pytest -n 2 -v --cov=panedrlite/panedrlite --cov-report=xml --color=yes ./tests
 
     - name: codecov
       uses: codecov/codecov-action@v3

diff --git a/README.rst b/README.rst
@@ -3,10 +3,17 @@ Panedr
 
 |Build Status| |cov|
 
-Panedr reads a `Gromacs EDR`_ binary energy XDR file and returns its content 
-as a pandas_ dataframe. The library exposes one function—the ``edr_to_df``
-function—that gets the path to an EDR file and returns a pandas
-dataframe.
+Panedr reads a `Gromacs EDR`_ binary energy XDR file and returns its content
+as a pandas_ dataframe. The library exposes three functions:
+
+- the ``edr_to_df`` function, which gets the path to an EDR file and returns a
+  pandas DataFrame,
+
+- the ``edr_to_dict`` function, which returns a dictionary of NumPy arrays instead
+  of a pandas DataFrame,
+
+- and the ``read_edr`` function, which is called by the other two functions to
+  do the actual reading of EDR files. It returns a tuple of lists.
 
 ``panedr`` is compatible with Python 3.6 and greater.
 
@@ -20,23 +27,27 @@ Example
     # Read the EDR file
     path = 'ener.edr'
     df = panedr.edr_to_df(path)
+    dic = panedr.edr_to_dict(path)
 
     # The `verbose` optional parameter can be set to True to display the
     # progress on stderr
     df = panedr.edr_to_df(path, verbose=True)
+    dic = panedr.edr_to_dict(path, verbose=True)
 
     # Get the average pressure after the first 10 ns
-    pressure_avg = df[u'Pressure'][df[u'Time'] > 10000].mean()
+    pressure_avg = df['Pressure'][df['Time'] > 10000].mean()
+    pressure_avg = dic['Pressure'][dic['Time'] > 10000].mean()
+
 
 Install
 -------
-
-Install the package with ``pip``:
+Install panedr with ``pip``:
 
 .. code:: bash
 
     pip install panedr
 
+
 If you are using `conda`_ and `conda-forge`_, you can install with
 
 .. code:: bash
@@ -48,13 +59,30 @@ Tests
 
 The ``panedr`` repository contains a series of tests. If you downloaded or
 cloned the code from the repository, you can run the tests. To do so,
-install pytest`_, and, in the directory of the
+install `pytest`_, and, in the directory of the
 panedr source code, run:
 
 .. code:: bash
 
     pytest -v tests
 
+
+panedrlite
+----------
+Under the hood, panedr is just a metapackage that installs panedrlite and
+the requirements for all functions, notably including pandas. To avoid requiring
+pandas in downstream applications, panedrlite is available for installation as
+well. It provides all functionality, except that pandas is not automatically
+installed as a dependency, and therefore :func:`edr_to_df` will not work
+out-of-the-box unless pandas is installed manually. ``panedrlite`` also uses the
+panedr namespace, so ``import panedr`` works.
+
+
+.. code:: bash
+
+    pip install panedrlite
+
+
 License
 -------
 
@@ -84,7 +112,7 @@ Public License version 2.1 as Gromacs.
 .. |Build Status| image:: https://github.com/MDAnalysis/panedr/actions/workflows/gh-ci.yaml/badge.svg
    :alt: Github Actions Build Status
    :target: https://github.com/MDAnalysis/panedr/actions/workflows/gh-ci.yaml
-   
+
 .. |cov|   image:: https://codecov.io/gh/MDAnalysis/panedr/branch/master/graph/badge.svg
    :alt: Coverage Status
    :target: https://codecov.io/gh/MDAnalysis/panedr

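A minimal sketch of the optional-dependency pattern that the README's panedrlite section relies on: pandas is imported only when it is needed, and a missing install produces an actionable error. The helper name `load_optional` and its `feature` parameter are hypothetical and are not part of panedr or panedrlite; the commit itself does this inline with a try/except inside `edr_to_df`.

```python
import importlib


def load_optional(module_name: str, feature: str):
    """Import an optional dependency, or fail with an install hint.

    Hypothetical helper for illustration only; not part of panedr.
    """
    try:
        return importlib.import_module(module_name)
    except ImportError as err:
        # Re-raise with a message that tells the user how to fix the problem.
        raise ImportError(
            f"{module_name} is required to use {feature}. "
            f"Try installing it, e.g.: python -m pip install {module_name}"
        ) from err


# A module from the standard library is always importable.
json_module = load_optional("json", "the demo")
```

Deferring the import to call time is what would let a pandas-free install import the package and still use the functions that do not need pandas.
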
diff --git a/panedr/__init__.py → panedr/panedr/__init__.py b/panedr/__init__.py → panedr/panedr/__init__.py
@@ -4,4 +4,4 @@
 __version__ = pbr.version.VersionInfo('panedr').release_string()
 del pbr
 
-from .panedr import *
+from panedrlite import edr_to_dict, edr_to_df, read_edr
diff --git a/panedr/requirements.txt b/panedr/requirements.txt
@@ -0,0 +1,3 @@
+panedrlite[pandas]
+numpy>=1.19.0
+pbr
diff --git a/setup.cfg → panedr/setup.cfg b/setup.cfg → panedr/setup.cfg
@@ -5,7 +5,7 @@ author_email = [email protected]
 summary = Read and manipulate Gromacs energy files
 license = LGPL
 description_file =
-    README.rst
+    ../README.rst
 long_description_content_type = text/x-rst
 home_page = https://github.com/MDAnalysis/panedr
 python_requires = >=3.6

diff --git a/setup.py → panedr/setup.py b/setup.py → panedr/setup.py
diff --git a/panedrlite/panedrlite/__init__.py b/panedrlite/panedrlite/__init__.py
@@ -0,0 +1,7 @@
+# -*- coding: utf-8 -*-
+
+import pbr.version
+__version__ = pbr.version.VersionInfo('panedrlite').release_string()
+del pbr
+
+from .panedr import edr_to_df, edr_to_dict, read_edr
diff --git a/panedr/panedr.py → panedrlite/panedrlite/panedr.py b/panedr/panedr.py → panedrlite/panedrlite/panedr.py
@@ -28,11 +28,19 @@
 The ``panedr`` library allows to read and manipulate the content of Gromacs
 energy file (.edr files) in python.
 
-The current version of ``panedr`` tries to be in par with Gromacs 5.1.1 when
-it comes to read EDR files.
+The current version of ``panedr`` tries to be on par with Gromacs 5.1.1 and
+newer when it comes to reading EDR files.
 
-So far, only one function is exposed by the library : the :fun:`edr_to_df`
-function that returns a pandas ``DataFrame`` from an EDR file.
+The library exposes the following functions:
+
+- the :func:`read_edr` function parses an EDR file and returns the energy terms
+  in a nested list
+
+- the :func:`edr_to_df` function turns the nested list created by
+  :func:`read_edr` into a pandas ``DataFrame``
+
+- the :func:`edr_to_dict` function turns the nested list created by
+  :func:`read_edr` into a dictionary that maps term names to NumPy arrays
 
 .. autofunction:: edr_to_df
 """
@@ -46,6 +54,7 @@
 import itertools
 import time
 import numpy as np
+from typing import List, Tuple, Dict
 
 
 #Index for the IDs of additional blocks in the energy file.
@@ -78,7 +87,6 @@
 
 __all__ = ['edr_to_df', 'edr_to_dict', 'read_edr']
 
-
 class EDRFile(object):
     def __init__(self, path):
         with open(path, 'rb') as infile:
@@ -403,7 +411,37 @@ def is_frame_magic(data):
     return magic == -7777777
 
 
-def read_edr(path, verbose=False):
+all_energies_type = List[List[float]]
+all_names_type = List[str]
+times_type = List[float]
+read_edr_return_type = Tuple[all_energies_type, all_names_type, times_type]
+
+
+def read_edr(path: str, verbose: bool = False) -> read_edr_return_type:
+    """Parse EDR files and make contents available in Python
+
+    :func:`read_edr` does the actual reading of EDR files. It is called by
+    :func:`edr_to_df` and :func:`edr_to_dict` to provide the file contents.
+    Under the hood, it uses :class:`xdrlib.Unpacker` to access the binary
+    EDR file.
+
+    Parameters
+    ----------
+    path : str
+        path to EDR file to be read
+    verbose : bool
+        Optionally show verbose output while reading the file
+
+    Returns
+    -------
+    all_energies: list[list[float]]
+        A nested list containing the energy values for each frame found in the EDR
+        file
+    all_names: list[str]
+        A list containing the names of the energy terms found in the file
+    times: list[float]
+        A list containing the time of each step/frame.
+    """
     begin = time.time()
     edr_file = EDRFile(str(path))
     all_energies = []
@@ -428,24 +466,59 @@ def read_edr(path, verbose=False):
               end='', file=sys.stderr)
         print('\n{} frame read in {:.2f} seconds'.format(ifr, end - begin),
               file=sys.stderr)
-
     return all_energies, all_names, times
 
 
 def edr_to_df(path: str, verbose: bool = False):
+    """Calls :func:`read_edr` and packs its return values into a DataFrame
+
+    This function has a pandas dependency. Installing panedrlite instead of
+    panedr will not automatically install pandas. If you want to use this
+    function, please install pandas or consider installing panedr instead.
+
+    Parameters
+    ----------
+    path : str
+        path to EDR file to be read
+    verbose : bool
+        Optionally show verbose output while reading the file
+
+    Returns
+    -------
+    df: pandas.DataFrame
+        :class:`pandas.DataFrame` object that holds all energy terms found in
+        the EDR file.
+    """
     try:
-        import pandas
+        import pandas as pd
     except ImportError:
         raise ImportError("""ERROR --- pandas was not found!
                           pandas is required to use the `.edr_to_df()`
                           functionality. Try installing it using pip, e.g.:
                           python -m pip install pandas""")
     all_energies, all_names, times = read_edr(path, verbose=verbose)
-    df = pandas.DataFrame(all_energies, columns=all_names, index=times)
+    df = pd.DataFrame(all_energies, columns=all_names, index=times)
     return df
 
 
-def edr_to_dict(path: str, verbose: bool = False):
+def edr_to_dict(path: str, verbose: bool = False) -> Dict[str, np.ndarray]:
+    """Calls :func:`read_edr` and packs its return values into a dictionary
+
+    The returned dictionary's keys are the names of the energy terms present in
+    the EDR file, the values are the time-series energy data for those terms.
+
+    Parameters
+    ----------
+    path : str
+        path to EDR file to be read
+    verbose : bool
+        Optionally show verbose output while reading the file
+
+    Returns
+    -------
+    energy_dict: dict[str, np.ndarray]
+        dictionary that holds all energy terms found in the EDR file.
+    """
     all_energies, all_names, times = read_edr(path, verbose=verbose)
     energy_dict = {}
     for idx, name in enumerate(all_names):

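The docstrings above describe `read_edr` as returning frame-major data: one inner list per frame, plus the term names. Below is a sketch of how such rows can be regrouped into one series per term, assuming each row holds one value per name in `all_names` (as the `DataFrame(all_energies, columns=all_names, ...)` call implies). `pack_columns` is a hypothetical helper that uses plain lists and made-up values; the real `edr_to_dict` returns NumPy arrays, and its body is only partly visible in this diff.

```python
from typing import Dict, List


def pack_columns(all_energies: List[List[float]],
                 all_names: List[str]) -> Dict[str, List[float]]:
    """Regroup frame-major rows into one list per energy term.

    Hypothetical illustration; not the panedrlite implementation.
    """
    # zip(*rows) transposes the rows, yielding one tuple per column.
    columns = zip(*all_energies)
    return {name: list(column) for name, column in zip(all_names, columns)}


# Made-up data: three frames with two terms each.
names = ["Time", "Pressure"]
energies = [[0.0, 1.0], [10.0, 3.0], [20.0, 5.0]]
energy_dict = pack_columns(energies, names)

# Average pressure over the frames with Time > 5.0.
late = [p for t, p in zip(energy_dict["Time"], energy_dict["Pressure"]) if t > 5.0]
mean_late = sum(late) / len(late)
```

With NumPy arrays as values, the same selection could be written with a boolean mask, as in the README example.
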
diff --git a/requirements.txt → panedrlite/requirements.txt b/requirements.txt → panedrlite/requirements.txt
diff --git a/panedrlite/setup.cfg b/panedrlite/setup.cfg
@@ -0,0 +1,31 @@
+[metadata]
+name = panedrlite
+author = Jonathan Barnoud
+author_email = [email protected]
+summary = Read and manipulate Gromacs energy files
+license = LGPL
+description_file =
+    ../README.rst
+long_description_content_type = text/x-rst
+home_page = https://github.com/MDAnalysis/panedr
+python_requires = >=3.6
+classifier =
+    Development Status :: 4 - Beta
+    Intended Audience :: Developers
+    Topic :: Scientific/Engineering :: Bio-Informatics
+    Topic :: Scientific/Engineering :: Chemistry
+    Topic :: Scientific/Engineering :: Physics
+    License :: OSI Approved :: GNU Lesser General Public License v2 or later (LGPLv2+)
+    Programming Language :: Python :: 3.6
+    Programming Language :: Python :: 3.7
+    Programming Language :: Python :: 3.8
+    Programming Language :: Python :: 3.9
+    Programming Language :: Python :: 3.10
+    Operating System :: OS Independent
+
+[extras]
+test =
+    six
+    pytest
+pandas =
+    pandas
diff --git a/panedrlite/setup.py b/panedrlite/setup.py
@@ -0,0 +1,6 @@
+from setuptools import setup
+
+setup(name="panedr",
+    setup_requires=['pbr'],
+    pbr=True,
+)
diff --git a/tests/test_edr.py b/tests/test_edr.py
@@ -16,26 +16,9 @@
 import pandas
 import panedr
 import re
-
-# On python 2, cStringIO is a faster version of StringIO. It may not be
-# available on implementations other than Cpython, though. Therefore, we may
-# have to fail back on StringIO if cStriongIO is not available.
-# On python 3, the StringIO object is not part of the StringIO module anymore.
-# It becomes part of the io module.
-try:
-    from cStringIO import StringIO
-except ImportError:
-    try:
-        from StringIO import StringIO
-    except ImportError:
-        from io import StringIO
-
+from io import StringIO
 from collections import namedtuple
-try:
-    from pathlib import Path
-except ImportError:
-    # Python 2 requires the pathlib2 backport of pathlib
-    from pathlib2 import Path
+from pathlib import Path
 
 # Constants for XVG parsing
 COMMENT_PATTERN = re.compile(r'\s*[@#%&/]')