Merge pull request #136 from fraunhoferportugal/dev
Dev
Showing 20 changed files with 1,408 additions and 1,572 deletions.
@@ -184,11 +184,7 @@
"source": [
"# Feature Extraction\n",
"\n",
"Through **Feature Extraction** methodologies, the data is translated into a feature vector containing information about the signal properties of each window. These properties can be classified according to their domain as Time, Frequency and Statistical features and allow the signal to be characterised compactly, enhancing its characteristics. These features will be used as input to the machine learning classifier; thus, the chosen set of features can strongly influence the classification output.\n",
"\n",
"The features to extract are defined in the [google sheet](https://docs.google.com/spreadsheets/d/13u7L_5IX3XxFuq_SnbOZF1dXQfcBB0wR3PXhvevhPYA/edit?usp=sharing). Save a copy on your local drive and share it with [email protected].\n",
"\n",
"**Change your google sheet file name and the googleSheet_name variable to your name so both have the same name.**"
"Through **Feature Extraction** methodologies, the data is translated into a feature vector containing information about the signal properties of each window. These properties can be classified according to their domain as Time, Frequency and Statistical features and allow the signal to be characterised compactly, enhancing its characteristics. These features will be used as input to the machine learning classifier; thus, the chosen set of features can strongly influence the classification output."
]
},
{
@@ -279,9 +275,10 @@
],
"source": [
"#@title Feature Extraction\n",
"googleSheet_name = \"Features_dev\"\n",
"# Extract excel info\n",
"cfg_file = tsfel.extract_sheet(googleSheet_name)\n",
"cfg_file = tsfel.get_features_by_domain() # All features \n",
"# cfg_file = tsfel.get_features_by_domain('statistical') # Only statistical features\n",
"# cfg_file = tsfel.get_features_by_domain('temporal') # Only temporal features\n",
"# cfg_file = tsfel.get_features_by_domain('spectral') # Only spectral features\n",
"\n",
"# Get features\n",
"X_train = tsfel.time_series_features_extractor(cfg_file, x_train_sig, fs=fs)\n",
@@ -461,7 +458,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.8.5"
"version": "3.6.9"
}
},
"nbformat": 4,
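The hunks above replace the Google Sheet-based configuration (tsfel.extract_sheet) with tsfel.get_features_by_domain. A minimal, self-contained sketch of the updated flow follows; the synthetic signal, sampling frequency, and window length are illustrative values rather than the notebook's actual data.

import numpy as np
import tsfel

fs = 100  # illustrative sampling frequency (Hz)
signal = np.sin(2 * np.pi * 2 * np.arange(5 * fs) / fs)  # 5 s synthetic sine wave

# Feature configuration: all domains, or restrict to 'statistical', 'temporal' or 'spectral'
cfg_file = tsfel.get_features_by_domain()

# One row per 1-second window (window_size in samples), one column per extracted feature
X = tsfel.time_series_features_extractor(cfg_file, signal, fs=fs, window_size=fs)
print(X.shape)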
@@ -0,0 +1 @@
pre-commit==2.17.0
@@ -0,0 +1 @@
Sphinx >= 1.8.5
@@ -0,0 +1,2 @@
nose==1.3.7
pytest==7.1.1
@@ -1,30 +1,43 @@
import setuptools
from pathlib import Path

ROOT = Path(__file__).parent

with open("README.md", "r") as fh:
    long_description = fh.read()

with open('requirements.txt', 'r') as f:
    install_reqs = [
        s for s in [
            line.strip(' \n') for line in f
        ] if not s.startswith('#') and s != ''
    ]

def find_requirements(filename):
    with (ROOT / "requirements" / filename).open() as f:
        return [
            s
            for s in [line.strip(" \n") for line in f]
            if not s.startswith("#") and s != ""
        ]


install_reqs = find_requirements("requirements.txt")
docs_require = find_requirements("requirements-docs.txt")

setuptools.setup(
    name="tsfel",
    version="0.1.5",
    version="0.1.6",
    author="Fraunhofer Portugal",
    description="Library for time series feature extraction",
    long_description=long_description,
    long_description_content_type="text/markdown",
    url="https://github.com/fraunhoferportugal/tsfel/",
    package_data={'tsfel': ['feature_extraction/features.json', 'utils/client_secret.json']},
    package_data={
        "tsfel": ["feature_extraction/features.json"]
    },
    packages=setuptools.find_packages(),
    classifiers=[
        "Programming Language :: Python :: 3",
        "License :: OSI Approved :: BSD License",
        "Operating System :: OS Independent",
    ],
    install_requires=install_reqs
    install_requires=install_reqs,
    extras_require={
        "docs": docs_require,
    },
)
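For reference on the packaging change above: documentation dependencies now live in an optional extra resolved from requirements/requirements-docs.txt (installable as pip install tsfel[docs]), while install_requires keeps only the core requirements.txt entries. The sketch below, using a hypothetical temporary directory rather than the repository layout, shows what the find_requirements-style filtering returns.

from pathlib import Path
import tempfile

def find_requirements_from(root, filename):
    # Same filtering as setup.py: drop comment lines and blank lines.
    with (root / "requirements" / filename).open() as f:
        return [s for s in (line.strip(" \n") for line in f) if not s.startswith("#") and s != ""]

with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "requirements").mkdir()
    (root / "requirements" / "requirements-docs.txt").write_text("# docs build deps\n\nSphinx >= 1.8.5\n")
    print(find_requirements_from(root, "requirements-docs.txt"))  # ['Sphinx >= 1.8.5']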
@@ -1,66 +1,98 @@
import os
import json
import glob
import tsfel
# import tsfel
import pandas as pd
from pathlib import Path

from tsfel.feature_extraction.features_settings import get_features_by_domain, get_features_by_tag
from tsfel.feature_extraction.calc_features import time_series_features_extractor, dataset_features_extractor
from tsfel.utils.signal_processing import merge_time_series, signal_window_splitter
from tsfel.utils.add_personal_features import add_feature_json

# Example of user preprocess sensor data
def pre_process(sensor_data):
    if 'Accelerometer' in sensor_data:
        sensor_data['Accelerometer'].iloc[:, 1] = sensor_data['Accelerometer'].iloc[:, 1] * 0
    if "Accelerometer" in sensor_data:
        sensor_data["Accelerometer"].iloc[:, 1] = sensor_data["Accelerometer"].iloc[:, 1] * 10
    return sensor_data


# DATASET DIR
main_directory = "tests" + os.sep + "tests_tools" + os.sep + "test_dataset" + os.sep
main_directory = os.path.join("tests", "tests_tools", "test_dataset", "")

# JSON DIR
tsfel_path_json = tsfel.__path__[0] + os.sep + 'feature_extraction' + os.sep + 'features.json'
personal_path_json = 'tests' + os.sep + 'tests_tools' + os.sep + 'test_features.json'
personal_features_path = 'tests' + os.sep + "tests_tools" + os.sep + "test_personal_features.py"
# tsfel_path_json = tsfel.__path__[0] + os.sep + "feature_extraction" + os.sep + "features.json"
personal_path_json = os.path.join("tests", "tests_tools", "test_features.json")
personal_features_path = os.path.join("tests", "tests_tools", "test_personal_features.py")

# DEFAULT PARAM for testing
time_unit = 1e9  # seconds
resample_rate = 30  # resample sampling frequency
window_size = 100  # number of points
overlap = 0  # varies between 0 and 1
search_criteria = ['Accelerometer.txt', 'Gyroscope.txt']
output_directory = str(Path.home()) + os.sep + 'Documents' + os.sep + 'tsfel_output' + os.sep
search_criteria = ["Accelerometer.txt", "Gyroscope.txt"]
output_directory = str(Path.home()) + os.sep + "Documents" + os.sep + "tsfel_output" + os.sep
sensor_data = {}
key = 'Accelerometer'
key = "Accelerometer"

folders = [f for f in glob.glob(main_directory + "**/", recursive=True)]
sensor_data[key] = pd.read_csv(folders[-1] + key + '.txt', header=None)
sensor_data[key] = pd.read_csv(folders[-1] + key + ".txt", header=None)

# add personal feature
# tsfel.add_feature_json(personal_features_path, personal_path_json)

# Features Dictionary
settings0 = json.load(open(tsfel_path_json))
settings1 = json.load(open(personal_path_json))
settings2 = tsfel.get_features_by_domain('statistical')
settings3 = tsfel.get_features_by_domain('temporal')
settings4 = tsfel.get_features_by_domain('spectral')
settings5 = tsfel.get_features_by_domain()
# settings6 = tsfel.extract_sheet('Features')
# settings7 = tsfel.extract_sheet('Features_test', path_json=personal_path_json)
settings8 = tsfel.get_features_by_tag('inertial')
settings10 = tsfel.get_features_by_tag()
# settings0 = json.load(open(tsfel_path_json))
settings2 = get_features_by_domain("statistical")
settings3 = get_features_by_domain("temporal")
settings4 = get_features_by_domain("spectral")
settings5 = get_features_by_domain()
settings8 = get_features_by_tag("inertial")
settings10 = get_features_by_tag()

# Signal processing
data_new = tsfel.merge_time_series(sensor_data, resample_rate, time_unit)
windows = tsfel.signal_window_splitter(data_new, window_size, overlap)
data_new = merge_time_series(sensor_data, resample_rate, time_unit)
data_new = data_new[data_new.columns[:-1]]
windows = signal_window_splitter(data_new, window_size, overlap)

n_jobs = 1
# time_series_features_extractor
features0 = tsfel.time_series_features_extractor(settings4, windows, fs=resample_rate)
features1 = tsfel.time_series_features_extractor(settings2, data_new, fs=resample_rate, window_size=70, overlap=0.5)
features2 = tsfel.time_series_features_extractor(settings3, windows, fs=resample_rate)

# multi windows and multi axis
# input: list
features0 = time_series_features_extractor(settings5, windows, fs=resample_rate, n_jobs=n_jobs)

# multiple windows and single axis
# input: np.array
features1 = time_series_features_extractor(settings5, data_new.values[:, 0], fs=resample_rate, n_jobs=n_jobs, window_size=window_size, overlap=overlap)
# input: pd.series
features2 = time_series_features_extractor(settings5, data_new.iloc[:, 0], fs=resample_rate, n_jobs=n_jobs, window_size=window_size, overlap=overlap)

# single window and multi axis
# input: pd.DataFrame
features3 = time_series_features_extractor(settings5, data_new, fs=resample_rate, n_jobs=n_jobs)
# input: np.array
features4 = time_series_features_extractor(settings4, data_new.values, fs=resample_rate, n_jobs=n_jobs)

# single window and single axis
# input: pd.Series
features5 = time_series_features_extractor(settings2, data_new.iloc[:, 0], fs=resample_rate, n_jobs=n_jobs)
# input: np.array
features6 = time_series_features_extractor(settings4, data_new.values[:, 0], fs=resample_rate, n_jobs=n_jobs)

# personal features
settings1 = json.load(open(personal_path_json))
add_feature_json(personal_features_path, personal_path_json)
features7 = time_series_features_extractor(settings1, data_new.values[:, 0], fs=resample_rate, n_jobs=n_jobs, features_path=personal_features_path)


# Dataset features extractor
data = tsfel.dataset_features_extractor(main_directory, settings1, search_criteria=search_criteria, time_unit=time_unit,
                                        resample_rate=resample_rate, window_size=window_size, overlap=overlap,
                                        pre_process=pre_process, output_directory=output_directory,
                                        features_path=personal_features_path)
print('-----------------------------------OK-----------------------------------')
data = dataset_features_extractor(
    main_directory,
    settings4,
    search_criteria=search_criteria,
    time_unit=time_unit,
    resample_rate=resample_rate,
    window_size=window_size,
    overlap=overlap,
    pre_process=pre_process,
    output_directory=output_directory,
)
print("-----------------------------------OK-----------------------------------")