This repository serves as a benchmark suite for shape-constrained regression algorithms, which can integrate prior knowledge about the expected shape of the prediction function.
Our benchmark suite builds upon established benchmarks for symbolic regression and extends them in functionality. We provide python code for generating training data for experiments, a set of validation datasets for result comparison, and the ability to check if a function adheres to the specified constraints.
Additionally, we provide the code and platform to compare to other shape-constrained regression algorithms by hosting up-to-date comparison charts and linking to other algorithms and their respective publications (if available).
See our examples
folder for more code snippets and usage examples. There we detail the data sampling methodology and show how to use our constraint checking interface. This README only highlights the key features.
Go here for more examples.
from SCRBenchmark import FEYNMAN_SRSD_HARD,HARD_NOISE_LEVELS,HARD_SAMPLE_SIZES
from SCRBenchmark import BenchmarkSuite
#creates one folder per equation under the parent folder
# each equation folder contains the info file as json
# and the data files for each configuration as csv
BenchmarkSuite.create_hard_instances(target_folder = './data',
Equations=FEYNMAN_SRSD_HARD,
sample_sizes=HARD_SAMPLE_SIZES,
noise_levels=HARD_NOISE_LEVELS)
Go here for more examples.
import SCRBenchmark.SRSDFeynman as srsdf
from SCRBenchmark import Benchmark
ICh6Eq20 = Benchmark(srsdf.FeynmanICh6Eq20)
(training, test) = ICh6Eq20.create_dataset(sample_size=1000, patience= 10, noise_level = 0)
Go here for more detail.
import SCRBenchmark.SRSDFeynman as srsdf
from SCRBenchmark import Benchmark
ICh6Eq20 = Benchmark(srsdf.FeynmanICh6Eq20)
# Raw: exp(-(theta / sigma) ** 2 / 2) / (sqrt(2 * pi) * sigma)
# is monotonic increasing over theta in -inf. <= theta <= 0
# is monotonic decreasing over theta in 0 <= theta <= inf.
#prints true as no constraint is violated
print(ICh6Eq20.check_constraints("-(theta*theta)",use_display_names=True))
#prints false as the decreasing constraint for 0 <= theta <= inf. is violated
print(ICh6Eq20.check_constraints("theta",use_display_names=True))
#prints false as the increasing constraint for -inf. <= theta <= 0 is violated
print(ICh6Eq20.check_constraints("-theta",use_display_names=True))
#prints false as both constraints for x0 (theta) are violated
print(ICh6Eq20.check_constraints("(theta*theta)",use_display_names=True))
We will continuously report on results using benchmark set. Currently, only results using shape-constrained polynomial regression (SCPR) are available. We run this algorithm with three different settings to showcase how comparison of algorithms will look like in the future. The detailed results of SCPR are published in our separate SCR-Benchmark SCPR result repository. The methodology and experimental setup are described in our publication.
Image shows the validation
Approach | Description | URL |
---|---|---|
SCPR | shape-constrained polynomial regression | SCPR result repository |
If the python package is hosted on a repository, you can install directly using:
pip install git+https://github.com/florianBachinger/SCR-Benchmarks.git
(you may need to run pip
with root permission: sudo pip install git+https://github.com/florianBachinger/SCR-Benchmarks.git
)
Then import the package:
import SCRBenchmark
Cite this work:
will be added in future
We enhance the work of Matsubara et al. by determining the shape constraints that describe the expected function shapes of the benchmark formulas. This knowledge can be used to enforce the behavior of trained prediction models and to e.g., improve extrapolation behavior.
Credit to Matsubara et al. for reviewing and adapting the formulas and variable ranges of Udrescu et al. to reasonable sampling values:
@article{Matsubara2022,
title={Rethinking Symbolic Regression Datasets and Benchmarks for Scientific Discovery},
author={Matsubara, Yoshitomo and Chiba, Naoya and Igarashi, Ryo and Tatsunori, Taniai and Ushiku, Yoshitaka},
journal={arXiv preprint arXiv:2206.10540},
year={2022}
}
Credit to Udrescu et al. for the initial benchmark set:
@article{Udrescu2020,
title={AI Feynman: A physics-inspired method for symbolic regression},
author={Udrescu, Silviu-Marian and Tegmark, Max},
journal={Science Advances},
volume={6},
number={16},
pages={eaay2631},
year={2020},
publisher={American Association for the Advancement of Science}
}