Releases: equinor/gordo
0.29.0
Release 0.28.0 of gordo-components
New release of gordo-components!
Small changes:
- All dependencies are updated, including pandas (0.24.2->0.25.0)
- Fix issue where IROC reader used 1 thread by default (#409)
- Add exponential retries to influx forwarder (#413)
- Filter bad data (code 0) from the datalake (#423)
- Wrapper enabling use of standard scikit-learn scorers (#427)
Major change:
Change all our Keras neural networks to take an explicit y instead of using the passed (and possibly scaled) X as the target.
This gives more freedom in several ways:
- It allows training towards an un-scaled y with a scaled X, or having them scaled in different ways.
- It allows y and X to be different sets of tags. The target y can be a subset of X, or even a completely different set of tags.
- It follows the standard scikit-learn pattern, making it easier to use e.g. standard scikit-learn scorers (more about this below).
But it also involves some changes in the model definitions to get the same behavior as before.
Change in model format:
Previous model definition:
model:
  sklearn.pipeline.Pipeline:
    steps:
      - sklearn.preprocessing.data.MinMaxScaler
      - gordo_components.model.models.KerasLSTMAutoEncoder:
          kind: lstm_hourglass
          lookback_window: 10
New model definition:
model:
  gordo_components.model.anomaly.diff.DiffBasedAnomalyDetector:
    base_estimator:
      sklearn.compose.TransformedTargetRegressor:
        transformer: sklearn.preprocessing.data.MinMaxScaler
        regressor:
          sklearn.pipeline.Pipeline:
            steps:
              - sklearn.preprocessing.data.MinMaxScaler
              - gordo_components.model.models.KerasLSTMAutoEncoder:
                  kind: lstm_hourglass
                  lookback_window: 10
Explanation:
The first class, gordo_components.model.anomaly.diff.DiffBasedAnomalyDetector, takes a base estimator as a parameter and provides a new method, anomaly, in addition to any methods the base_estimator already has (like fit and predict). In the case of DiffBasedAnomalyDetector, the call to anomaly(X, y) is implemented by calling predict on the base_estimator, scaling the output, scaling the passed y, calculating the absolute value of the differences, and then calculating the norm. The output of anomaly(X, y) is a multi-level dataframe with the original input and output of the base estimator, in addition to the per-sensor errors (absolute differences) and the total error score. The major difference from before is that the error calculation is now an explicit class which can be used in e.g. notebooks, instead of existing as a function in the server class.
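To make the new workflow concrete, below is a minimal sketch of constructing the anomaly detector directly in Python. It assumes that the constructor takes the same base_estimator keyword shown in the YAML above, and that fit and anomaly accept time-indexed pandas DataFrames; the exact input requirements may differ, so treat this as an illustration rather than a reference.

# Hedged sketch: build the detector directly instead of via a config file.
import numpy
import pandas
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler
from gordo_components.model.models import KerasLSTMAutoEncoder
from gordo_components.model.anomaly.diff import DiffBasedAnomalyDetector

base_estimator = Pipeline(
    steps=[
        ("scaler", MinMaxScaler()),
        ("model", KerasLSTMAutoEncoder(kind="lstm_hourglass", lookback_window=10)),
    ]
)
detector = DiffBasedAnomalyDetector(base_estimator=base_estimator)

index = pandas.date_range("2019-01-01", periods=100, freq="10T")
X = pandas.DataFrame(
    numpy.random.rand(100, 4),
    index=index,
    columns=["tag-1", "tag-2", "tag-3", "tag-4"],
)
y = X[["tag-1", "tag-2"]]  # the target may be a subset of the input tags

detector.fit(X, y)
# Multi-level dataframe: model input/output, per-sensor errors and total error
anomalies = detector.anomaly(X, y)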
The second new class in the config above is sklearn.compose.TransformedTargetRegressor. This is a standard scikit-learn class which scales the target y before the model is fitted, and then inverse-scales the output of the base_estimator when predict is called. This class is needed if you want the Keras network to train towards a scaled y, as it did before; if you do not want this, you can omit the sklearn.compose.TransformedTargetRegressor.
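As a standalone illustration of what sklearn.compose.TransformedTargetRegressor does, here is a small sketch using a plain scikit-learn regressor in place of a gordo model; everything in it is standard scikit-learn, and the regressor choice is just for illustration.

# TransformedTargetRegressor: y is scaled by the transformer before fitting,
# and predictions are inverse-transformed back to the original y scale.
import numpy
from sklearn.compose import TransformedTargetRegressor
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import MinMaxScaler

X = numpy.random.rand(100, 4)
y = 1000 * numpy.random.rand(100, 2)  # target on a much larger scale than X

model = TransformedTargetRegressor(
    regressor=LinearRegression(),
    transformer=MinMaxScaler(),
)
model.fit(X, y)                 # the regressor is fitted on y scaled to [0, 1]
predictions = model.predict(X)  # returned on the original scale of y

Omitting the TransformedTargetRegressor simply means the regressor is fitted against the raw y, which is exactly the choice described above.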
Using scikit-learn scorers
It is now possible to use standard scikit-learn scorers with a simple wrapper.
Example:
from gordo_components import serializer
import yaml
import numpy
from sklearn.metrics import r2_score

config = yaml.safe_load(
    """
    sklearn.pipeline.Pipeline:
      steps:
        - sklearn.preprocessing.data.MinMaxScaler
        - gordo_components.model.models.KerasLSTMAutoEncoder:
            kind: lstm_hourglass
            lookback_window: 10
            epochs: 20
    """
)
model = serializer.pipeline_from_definition(config)

X = numpy.random.rand(100, 10)
y = numpy.random.rand(100, 10)
model.fit(X, y)

# This will fail, since the output and the target are of different lengths
# (the LSTM's lookback window shortens the output):
# r2_score(X, model.predict(X))

# The fix:
from gordo_components.model.utils import metric_wrapper
metric_wrapper(r2_score)(X, model.predict(X))
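Continuing from the example above, the wrapped metric should in principle also combine with scikit-learn's make_scorer and model-selection utilities; whether this works end-to-end with the Keras models is an assumption on our part, so treat the following as a sketch:

# Sketch only: turn the wrapped metric into a scorer and cross-validate.
from sklearn.metrics import make_scorer
from sklearn.model_selection import cross_val_score

scorer = make_scorer(metric_wrapper(r2_score))
scores = cross_val_score(model, X, y, cv=3, scoring=scorer)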
Release 0.27.0 of gordo-components
Release 0.26.1 of gordo-components
Release 0.26 of gordo-components
- Pass keyword arguments onto Keras compile, allowing more flexibility
- Add Gullfaks A as new asset
- Add "infinity" imputer
- Add pushing of "latest" tag for docker images, making it easier to always test latest build of master
- Optimize ML server post data processing, speeding it up
- Add pytest-benchmark
Release 0.25 of gordo-components
- Change default keras activation functions to tanh (#346)
- Server: Log timings and return as header (#345)
- Added PERA (Peregrino) as new asset
- Allow TimeseriesDataset to take and output target tags (#327)
- Support sklearn.multioutput.MultiOutputRegressor (#321)
- Add output activation function for feedforward NN as a parameter (#352)
Release 0.24 of gordo-components
- More robust and scalable watchman - using K8S updates
- Fix a bug that made the automatic client fail on IROC projects if train_start was not UTC
- Add multithreaded download of NCS data from datalake
- Support dry-run mode on the ncs_provider load_series
Release 0.23.0 of gordo-components
- Support building models without scoring/cross val (#326)
- Fix issue where the serializer drops parameters to keras (#333)
- Refactor ML Server into modular model views (#288)
- Rename auto encoders .transform() -> .predict() (#288)
- Upgrade scikit-learn ~=0.21
Note: This depends on a compatible version of gordo-infrastructure, the soon-to-be-released 0.24.0