Easy-to-use Bayesian optimization library for either closed-loop or user-driven (manual) optimization of either known or unknown objective functions. Built on PyTorch (GPyTorch) and BoTorch, with proprietary extensions.
A short primer on Bayesian optimization is provided in this section.
- Handles continuous, integer and categorical covariates.
- Optimization of either known or unknown functions. This allows for optimization of e.g. real-world experiments without requiring that a model of the system be defined a priori.
- Simple interface with focus on ease of use: only a few lines of code are required for full Bayesian optimization.
- Erroneous observations of either covariates or response can be overwritten during optimization.
- Well-documented code with detailed end-to-end examples of use, see examples.
- Optimization can start from scratch or repurpose existing data.
- Multivariate covariates, univariate system response: It is assumed that input covariates (the independent variables) can be either multivariate or univariate, while the system response (the dependent variable) is only univariate.
- Optimizing across continuous, integer and categorical covariates: Problems can depend on any of these types of variables, in any combination. Special attention is given to the implementation of integer and categorical variables, which are handled via the method of Garrido-Merchán and Hernández-Lobato (E.C. Garrido-Merchán and D. Hernández-Lobato, Neurocomputing, see References).
- System-generated or manual input: Observations of covariates and responses during optimization can be provided either programmatically or manually via prompt input.
- Optimizes known and unknown response functions: Both cases where the response function can be formulated mathematically and cases where the response can only be measured (e.g. a real-life experiment) can be optimized.
- Observed covariates can vary from the proposed covariates: The optimization routine at each iteration proposes new covariate data points to investigate, but there is no requirement that this is also the observed data point. At each iteration step, proposed covariates, observed covariates and observed response are 3 separate entities. This means that noisy or unexpected measurement points remain fully useful (they do not introduce any errors), even if they vary a lot from the proposed covariate data points.
- Data stored in class instance: Data for proposed covariate data points, observed covariates and observed responses is stored in the instantiated class object.
- Data format and type validation: Input data is validated at each iteration.
- Observations of covariates and response can be overridden during execution: If an observation of either covariates or response seems incorrect, the framework allows overriding the previous observation.
- Consistency in number of covariates and observations: It is assumed that there is consistency in the number of observations of covariates and responses: at each step a new covariate data point is proposed, before observations of covariates and response for this iteration are reported (specifically the number of proposed data points cannot exceed the number of observed covariates by more than 1, and the number of observed covariates also cannot exceed the number of observed responses by more than 1). If additional data is provided for either observed covariates or observed response, this will override the last provided data.
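The consistency rule above can be pictured with a small bookkeeping sketch. The class `IterationLog` and its method names are hypothetical and not part of greattunes; the sketch only mimics the counting logic described above, and raising an error on a premature proposal is an assumption made here for illustration.

```python
class IterationLog:
    """Hypothetical bookkeeping of proposed and observed data, one entry per iteration."""

    def __init__(self):
        self.proposed = []
        self.covars = []
        self.responses = []

    def propose(self, point):
        # proposals may lead observed covariates by at most 1 (assumption: raise otherwise)
        if len(self.proposed) > len(self.covars):
            raise ValueError("report covariates for the open proposal first")
        self.proposed.append(point)

    def report_covars(self, point):
        if not self.proposed:
            raise ValueError("nothing has been proposed yet")
        if len(self.covars) < len(self.proposed):
            self.covars.append(point)   # first report in this iteration
        else:
            self.covars[-1] = point     # repeated report overrides the last one

    def report_response(self, value):
        if not self.covars:
            raise ValueError("no covariates have been reported yet")
        if len(self.responses) < len(self.covars):
            self.responses.append(value)  # first report in this iteration
        else:
            self.responses[-1] = value    # repeated report overrides the last one


log = IterationLog()
log.propose([0.50])
log.report_covars([0.52])   # the observed point may differ from the proposed one
log.report_response(1.0)
log.report_response(1.3)    # second report in the same iteration overrides the first
```

After these calls the log holds one proposal, one observed covariate point and one response (the overridden value).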
The library is available on https://pypi.org/, so to install simply run
pip install greattunes
You can also download the library source code and install it from there.
Installing `torch` dependencies manually is not always required. Unfortunately, in some cases the `torch` libraries have to be installed outside the normal bulk `pip install -r requirements.txt`. First try to install directly via steps 1-3 in Install library below, and only install `torch` libraries manually if direct installation fails.
To find the right installation command for `torch`, use this link to determine the details and add it as a separate command in the GitHub Actions YAML. As an example, the following is the install command on my local system (an `Ubuntu`-based system with `pip` and without `CUDA` access)
pip install torch==1.6.0+cpu torchvision==0.7.0+cpu -f https://download.pytorch.org/whl/torch_stable.html
Currently the code is not available on any repo servers except the private GitHub account. The best way to install the code (after adding `torch` and `torchvision`) is to follow this series of steps.
1. Upgrade local versions of packaging libraries
pip install --upgrade setuptools wheel
2. Clone this repo
3. Do local installation
python -m pip install https://github.com/svedel/greattunes/
Step 3 will install by running `greattunes/setup.py` locally and installing. This step can also be broken into two, which might make debugging easier
python3 https://github.com/svedel/greattunes/ setup.py bdist_wheel
python -m pip install https://github.com/svedel/greattunes/dist/greattunes-<version>-py3-none-any.whl
where `<version>` is the latest version in normal `python` format of `MAJOR.MINOR[.MICRO]` (check the `/dist` folder to see which one to pick).
All capabilities of the framework are described below.
For readers wanting to skip directly to working with the framework, a number of examples of how to use the framework end-to-end are included as Jupyter notebooks in examples.
Solving an optimization problem consists of two steps in this framework:
- Define the input variables (covariates), the surrogate model type and the acquisition function. Also define the response function if this is known
- Optimize based on closed-loop or iterative interface
Here's a simple illustration of how to do this for a known function `f`.
The critical things to define in this step are
- The number of covariates. Upper and lower limits must be provided for each covariate to constrain the search space, and an initial guess must be provided for each as well. Works for both univariate and multivariate covariate structures.
- The type of surrogate model. The model will be fitted at each step of the optimization.
- The type of acquisition function. This will also be fitted at each step of the optimization.
# import library
from greattunes import TuneSession
# === Step 1: define the input ===
# specify covariate. For each covariate of the model provide best guess of starting point together with upper and lower
# limit
x_start = 0.5 # initial guess
x_min = 0 # lower limit
x_max = 1 # upper limit
covars = [(x_start, x_min, x_max)]
# initialize the class
cls = TuneSession(covars=covars, model="SingleTaskGP", acq_func="ExpectedImprovement")
In order to optimize, we must first describe which function we want to do this for. The framework works both when this function can be formulated mathematically and when it can only be sampled (e.g. through experiments) but cannot be formulated. For an illustration of the latter see Example 2 under examples.
Here we will work with a known objective function to optimize
# === Step 2: solve the problem ===
# univariate function to optimize
import numpy as np
def f(x):
    return -(6 * x - 2) ** 2 * np.sin(12 * x - 4)
Beware that the number of covariates (including their range) specified by `covars` under Step 1 must comply with the functional dependence of the objective function (`x` in the case above).
We are now ready to solve the problem. We will run for `max_iter` = 20 iterations.
# run the auto-method
max_iter = 20
cls.auto(response_samp_func=f, max_iter=max_iter)
Had we worked with an objective function `f` which could not be formulated explicitly, the right entrypoint would have been to use the `.ask`-`.tell` methods instead of `.auto`.
The following key attributes are stored for each optimization as part of the instantiated class. These primary data structures for users are stored in `pandas` dataframes in pretty format. `current_best` and `best_predicted` are methods which print their output to the prompt.
Attribute/method | Comments |
---|---|
`x_data` | All observed covariates, one row per observation. If no names have been added to the covariates they take the names "covar0", "covar1", ... . Dimensions `num_observations` X `num_covariates`. |
`y_data` | All observed responses corresponding to the covariate points (rows) in `x_data`. Dimensions `num_observations` X 1. |
`best_response` | Best observed response value during optimization run, including current iteration. Dimensions `num_observations` X 1. |
`covars_best_response` | Observed covariates for best response value during optimization run, i.e. each row in `covars_best_response` generated the same row in `best_response`. Dimensions `num_observations` X `num_covariates`. |
`current_best()` | Returns the best observed response of the objective up to the current iteration. |
`best_predicted()` | Best predicted response from the surrogate model. Calculated for the mean model as well as for the lower confidence region (e.g. mean minus one standard deviation) of the full model. For both cases also returns the covariates resulting in the maximum. |
In the backend the framework makes use of different data structures based on the `tensor` structure from `torch`, which also handles one-hot encoding of categorical variables. The key backend attributes are listed in the table below.
Attribute | Comments |
---|---|
`train_X` | All observed covariates with dimensions `num_observations` X `num_covariates`. Backend equivalent to `x_data`. |
`proposed_X` | All proposed covariate datapoints to investigate, with dimensions `num_observations` X `num_covariates`. |
`train_Y` | All observed responses corresponding to the covariate points in `train_X`. Dimensions `num_observations` X 1. Backend equivalent to `y_data`. |
`best_response_value` | Best observed response value during optimization run, including current iteration. Dimensions `num_observations` X 1. Backend equivalent to `best_response`. |
`covars_best_response_value` | Observed covariates for best response value during optimization run, i.e. each row in `covars_best_response_value` generated the same row in `best_response_value`. Dimensions `num_observations` X `num_covariates`. Backend equivalent to `covars_best_response`. |
The user must detail which covariates the framework can adjust in order to optimize (maximize/minimize) the response. This is a mandatory part of class initialization and is set via the `covars` input variable; without any knowledge of the covariates, the framework cannot proceed to optimization. Here's an example for a problem with two covariates
covars = [(0.5, 0, 1), (2,1,4)] # each tuple defines one covariate; the tuple entries are (initial guess, min, max)
# initialize the class
cls = TuneSession(covars=covars, ...)
This is also illustrated for a single-variable situation in Step 1: Define the problem above.
The following three types of covariates are supported.
- Continuous: Variables which can take any numerical value, i.e. can take values which include decimals. The data type of a continuous variable will be among the `float` types. Typical examples of continuous covariates are weights in a model and time thresholds (imagine a case where total runtime was a parameter).
- Integer: Variables which can only take integer values; the data types of these variables will be among the `int` types. Special consideration must be taken during optimization because these variables can only update in discrete steps, resulting in step changes of the response. Examples of integer covariates include the number of layers in a neural network and the number of eggs in a recipe.
- Categorical: Variables that can take different discrete values which, contrary to integers, do not even have any internal relation in terms of size. An example is a variable which can take the values {`green`, `blue`, `red`}, where there clearly is no direct numerical relationship between the potential values; in contrast, a numerical relationship does exist for integer variables (e.g. 5 is bigger than 2). In addition to the color example above, another example of a categorical variable is one which determines the make of a car (e.g. taking values `volvo`, `lincoln`, `fiat` etc.).
The framework follows the method of Garrido-Merchán and Hernández-Lobato (see References) to integrate the different types of covariates and bring them to a form that is consistent with using continuous Gaussian processes to drive the optimization. Briefly, the method relies on adding a transformation of variables in the correlation (kernel) function of the Gaussian processes with the following properties: integer covariates are rounded to the nearest integer, and categorical variables are one-hot encoded. For each categorical variable, only the one-hot entry with the highest numerical value is carried forward in each round, by setting that entry to 1 and all other entries of the same variable to 0.
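As a rough illustration of the transformation just described, the sketch below rounds integer entries and collapses each one-hot group to its largest entry. The function `transform_covars` and the `kinds` layout are hypothetical helpers for this example only; they are not the greattunes implementation, which applies the transformation inside the kernel.

```python
def transform_covars(values, kinds):
    """Round integer entries and collapse each one-hot group to its largest entry.

    values: flat list of floats as seen by a continuous optimizer
    kinds:  list of ("cont", 1), ("int", 1) or ("cat", n_options), in the same order
    """
    out, i = [], 0
    for kind, width in kinds:
        chunk = values[i:i + width]
        if kind == "int":
            # integer covariate: snap to the nearest integer
            out.append(float(round(chunk[0])))
        elif kind == "cat":
            # categorical covariate: the largest one-hot entry wins
            winner = max(range(width), key=lambda j: chunk[j])
            out.extend(1.0 if j == winner else 0.0 for j in range(width))
        else:
            # continuous covariate: passed through unchanged
            out.append(chunk[0])
        i += width
    return out


# one continuous, one integer and one 3-option categorical covariate
x = [0.37, 2.6, 0.2, 0.7, 0.1]
kinds = [("cont", 1), ("int", 1), ("cat", 3)]
print(transform_covars(x, kinds))  # → [0.37, 3.0, 0.0, 1.0, 0.0]
```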
Two approaches to defining covariates in framework: working with named covariates and setting data types
Two ways are offered to provide covariate details to the framework: the simple way, which assigns names to covariates and infers their data types from the provided data in `covars` (used so far), and an elaborate way, which allows for naming covariates and gives more control to specify data types. In either case, the information is given to the framework via the `covars` input variable.
Each covariate is defined by a tuple, and the order of the tuples defines the order of the covariates. The same order must be used later if covariates are manually reported via the `.tell` method.
Covariate data type is critical because it impacts how the covariate is handled during the optimization. In this simple approach, data types are inferred from the provided data in `covars` as indicated by the table below.
Data type | How to report | Example | Comments |
---|---|---|---|
Integer | `(<initial_guess>, <parameter_minimum>, <parameter_maximum>)` | `(2, 0, 5)` | All tuple entries must be of data type `int` for the covariate to be taken as integer. |
Continuous | `(<initial_guess>, <parameter_minimum>, <parameter_maximum>)` | `(2.0, -1.2, 2.5)` | Only one tuple entry has to be a `float` for the covariate to be set to continuous. |
Categorical | `(<initial_guess>, <option_1>, <option_2>, ...)` | `("volvo", "fiat", "aston martin", "ford", "toyota")` | Covariate is taken as categorical if any entry has data type `str`. There must be at least one other option than `<initial_guess>`, but otherwise there is no limit to the number of entries. |
Here's an example of how to use the simple approach to define the `covars` variable to communicate covariates of different data types. This `covars` could be used to initialize a class instance
covars = [
(1, 0, 2), # will be taken as INTEGER (type: int)
(1.0, 0.0, 2.0), # will be taken as CONTINUOUS (type: float)
(1, 0, 2.0), # will be taken as CONTINUOUS (type: float)
("red", "green", "blue", "yellow"), # will be taken as CATEGORICAL (type: str)
("volvo", "chevrolet", "ford"), # will be taken as CATEGORICAL (type: str)
("sunny", "cloudy"), # will be taken as CATEGORICAL (type: str)
]
Covariates are assigned names behind the scenes of the type `covar1`, `covar2` etc., with numbers added in the order in which the variable is processed from the `covars` list of tuples during class initialization (beware that this order may not be preserved). Covariate names are visible as the column names in the `x_data` attribute.
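The inference rules from the table above can be sketched in a few lines. `infer_covar_type` is a hypothetical helper written for this illustration; it follows the stated rules (any `str` gives categorical, otherwise any `float` gives continuous, otherwise all `int` gives integer) and is not the library's own code.

```python
def infer_covar_type(entry):
    """Infer the covariate type from a tuple as described in the table above."""
    if any(isinstance(v, str) for v in entry):
        return str    # categorical: at least one string entry
    if any(isinstance(v, float) for v in entry):
        return float  # continuous: at least one float entry
    if all(isinstance(v, int) for v in entry):
        return int    # integer: every entry is an int
    raise TypeError(f"cannot infer covariate type from {entry!r}")


examples = [(1, 0, 2), (1.0, 0.0, 2.0), (1, 0, 2.0), ("red", "green", "blue")]
inferred = [infer_covar_type(e).__name__ for e in examples]
print(inferred)  # → ['int', 'float', 'float', 'str']
```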
This approach requires a bit more detail to be provided, but also offers much more flexibility.
In this approach, all covariates are defined in a dictionary which is fed via the `covars` parameter, and each covariate is defined by its own dictionary nested within the outer dictionary specifying all covariates. An example, which will be elaborated further in the following, is given below for 3 covariates to make this concrete
covars = {
'variable1': # type: integer
{
'guess': 1,
'min': 0,
'max': 2,
'type': int,
},
'variable2': # type: continuous (float)
{
'guess': 12.2,
'min': -3.4,
'max': 30.8,
'type': float,
},
'variable3': # type: categorical (str)
{
'guess': 'red',
'options': {'red', 'blue', 'green'},
'type': str,
}
}
Each nested dictionary gives the details of an individual covariate, and the key of each nested dictionary is used as the name of that covariate.
Covariate names: Anything that's permissible as a `python` string is a valid covariate name. These names are used throughout the framework (will be inherited into `x_data`).
Specifying data type: The variable `type` indicates the type of the covariate. The framework uses the following types
- `int`: integer covariate
- `float`: continuous covariate
- `str`: categorical covariate

Beware that the data type (and not a string) is used to define the type (i.e. use e.g. `str` not `'str'` to indicate a categorical variable).
Required information for each covariate: Requirements vary with the covariate data type. The following is required for each type of covariate
- Integer (`'type': int`): Required fields are `guess`, `min` and `max` (all single entries of type `int`), as well as `type` (must be `int` to specify integer).
- Continuous (`'type': float`): Required fields are `guess`, `min` and `max` (all single entries of types `int` or `float`), as well as `type` (must be `float` to specify continuous).
- Categorical (`'type': str`): Required fields are `guess` (a single entry of type `str`), `options` (a set of `str`, one for each option the covariate can take; it must also include the element in `guess`) and `type` (must be `str` to specify categorical).
The example above shows 3 covariates but the framework can handle any number of covariates. Simply adjust the number of nested dictionaries to meet the need (and use appropriate naming and covariate specification for your application).
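Before passing a dictionary like this to the framework it can help to check it against the requirements listed above. The sketch below is a hypothetical pre-check (`check_covars` is not a greattunes function); the rule that `guess` must lie between `min` and `max` is an assumption of this sketch, while the other checks follow the field requirements stated above.

```python
# required fields per covariate type, as listed above
REQUIRED = {
    int: {"guess", "min", "max", "type"},
    float: {"guess", "min", "max", "type"},
    str: {"guess", "options", "type"},
}


def check_covars(covars):
    """Return a list of human-readable problems; an empty list means no problem was found."""
    problems = []
    for name, spec in covars.items():
        kind = spec.get("type")
        if kind not in REQUIRED:
            problems.append(f"{name}: 'type' must be int, float or str (the type, not a string)")
            continue
        missing = REQUIRED[kind] - set(spec)
        if missing:
            problems.append(f"{name}: missing fields {sorted(missing)}")
            continue
        if kind is str:
            if spec["guess"] not in spec["options"]:
                problems.append(f"{name}: 'guess' must be one of 'options'")
        elif not spec["min"] <= spec["guess"] <= spec["max"]:
            # assumption of this sketch: the initial guess lies inside the range
            problems.append(f"{name}: 'guess' must lie between 'min' and 'max'")
    return problems


good = {
    "variable1": {"guess": 1, "min": 0, "max": 2, "type": int},
    "variable3": {"guess": "red", "options": {"red", "blue", "green"}, "type": str},
}
bad = {"variable3": {"guess": "red", "options": {"blue"}, "type": "str"}}
print(check_covars(good))  # → []
print(len(check_covars(bad)))  # → 1
```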
Multivariate covariates are set via the (mandatory) `covars` parameter during class initialization. Each covariate is given as a 3-tuple of parameters `(<initial_guess>, <parameter_minimum>, <parameter_maximum>)` (the order matters!), with `covars` being a list of these tuples. As an example, for a case with 3 covariates, the `covars` parameter would be
covars = [(1, 0, 4.4), (5.2, 1.5, 7.0), (4, 2.2, 5.1)]
The order of the covariates matters since the framework does not work with named covariates. Hence, the parameter defined by the first tuple in `covars` will always have to be reported as the first covariate when iterating during optimization, the second covariate will be initialized by the second tuple in `covars` etc.
Observations of multivariate covariates are specified as columns in the `train_X` attribute (format: `torch.tensor`), with observations added as rows. As an example, the initial guess for the three covariates defined by `covars` above would be
train_X = torch.tensor([[1, 5.2, 4]], dtype=torch.double)
If historical data for pairs of covariates and response is available for your system, this can be added during initialization. In this case the optimization framework will have a better starting position and will likely converge more quickly.
Historical data is added during class initialization. The number of observations (rows) of covariates and response must match. Historical training data is added during class instantiation via the arguments `train_X=<>` and `train_Y=<>`, as illustrated below for the following cases
- Multiple observations of multivariate system
- Single observation of univariate system
- Single observation of multivariate system
# import
import torch
from greattunes import TuneSession
### ------ Case 1 - multiple observations (multivariate) ------ ###
# set range of data
covars = [(1, 0, 4.4), (5.2, 1.5, 7.0), (4, 2.2, 5.1)]
# define initial data
X = torch.tensor([[1, 2, 3],[3, 4.4, 5]], dtype=torch.double)
Y = torch.tensor([[33],[37.8]], dtype=torch.double)
# initialize class
cls = TuneSession(covars=covars,train_X=X, train_Y=Y)
### ------ Case 2 - single observation (univariate) ------ ###
# set range of data
covars = [(1, 0, 4.4)]
# define initial data
X = torch.tensor([[1]], dtype=torch.double)
Y = torch.tensor([[33]], dtype=torch.double)
# initialize class
cls = TuneSession(covars=covars,train_X=X, train_Y=Y)
### ------ Case 3 - single observation (multivariate) ------ ###
# set range of data
covars = [(1, 0, 4.4), (5.2, 1.5, 7.0), (4, 2.2, 5.1)]
# define initial data
X = torch.tensor([[1, 2, 3]], dtype=torch.double)
Y = torch.tensor([[33]], dtype=torch.double)
# initialize class
cls = TuneSession(covars=covars,train_X=X, train_Y=Y)
Starting from a few randomly sampled datapoints typically improves the convergence of the optimization because it makes it less likely that the algorithm locks onto a local maximum without consideration for an unknown global one. Furthermore, in the absence of historical data, random sampling is the best way to start.
Random initialization is enabled via the parameter `random_start` during initialization and can be applied whether or not historical data has been added (default is `random_start = True`).
# import
import torch
from greattunes import TuneSession
### ------ Case 1 - No historical data ------ ###
# set range of data
covars = [(1, 0, 4.4), (5.2, 1.5, 7.0), (4, 2.2, 5.1)]
# initialize class (no historical data is passed in this case)
cls = TuneSession(covars=covars, random_start=True)
### ------ Case 2 - With historical data ------ ###
# set range of data
covars = [(1, 0, 4.4), (5.2, 1.5, 7.0), (4, 2.2, 5.1)]
# define initial data
X = torch.tensor([[1, 2, 3],[3, 4.4, 5]], dtype=torch.double)
Y = torch.tensor([[33],[37.8]], dtype=torch.double)
# initialize class
cls = TuneSession(covars=covars,train_X=X, train_Y=Y, random_start=True)
Number of random datapoints: The number of random datapoints to be sampled is set via the kwarg num_initial_random
during initialization. This defaults to the closest integer to
Sampling method: Two sampling methods are available:
- `random`: Fully random sampling within the whole hypercube specified by `covars`.
- `latin_hcs`: Latin hypercube sampling within the hypercube specified by `covars`.

The sampling method is determined by the kwarg `random_sampling_method` during class initialization.
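To make the difference between the two sampling methods concrete, here is a minimal, standard-library sketch of Latin hypercube sampling for continuous covariates. The function `latin_hypercube` is written for this illustration and is not the routine used inside greattunes: each covariate range is split into `n_samples` equal strata, one point is drawn per stratum, and the strata are shuffled independently per covariate.

```python
import random


def latin_hypercube(bounds, n_samples, seed=None):
    """Draw n_samples points with exactly one point per stratum in every dimension.

    bounds: list of (min, max) tuples, one per continuous covariate
    """
    rng = random.Random(seed)
    columns = []
    for lo, hi in bounds:
        width = (hi - lo) / n_samples
        # one uniformly drawn value inside each of the n_samples strata
        col = [lo + (k + rng.random()) * width for k in range(n_samples)]
        # shuffle so strata are paired randomly across dimensions
        rng.shuffle(col)
        columns.append(col)
    return [list(point) for point in zip(*columns)]


bounds = [(0.0, 4.4), (1.5, 7.0), (2.2, 5.1)]
points = latin_hypercube(bounds, n_samples=5, seed=0)
```

Fully random sampling would instead draw every coordinate uniformly over the whole range, which can leave parts of the range unsampled when only a few points are drawn.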
Just like random initialization helps with convergence, best practice also prescribes adding randomly sampled points during the optimization run.
This is easily done within this framework. The parameter `random_step_cadence` determines the cadence between randomly sampled datapoints (in between points sampled via Bayesian optimization).
The following kernels for the Gaussian process surrogate model are implemented. Model type and listed parameters are provided as input to class initialization, i.e. during initialization of `TuneSession`.
Model name | Parameters | Comments |
---|---|---|
`"SingleTaskGP"` | N/A | A single-task exact kernel for Gaussian process regression. Follow this link for more details. |
`"FixedNoiseGP"` | `train_Yvar` | A single-task exact kernel for Gaussian process regression assuming a fixed noise level. Follow this link for more details. |
`"HeteroskedasticSingleTaskGP"` | `train_Yvar` | A single-task exact kernel for Gaussian process regression using a heteroskedastic noise model. Follow this link for more details. |
`"SimpleCustomMaternGP"` | `nu` | A custom Matérn kernel with parameter `nu` (a float). For more details on Matérn kernels see wiki page, and see the source code for the model in `greattunes\custom_models`. |
These acquisition functions are currently available. Parameters (if any) are provided during initialization of the `TuneSession` class instance.
Acquisition function name | Parameter | Comments |
---|---|---|
`"ExpectedImprovement"` | N/A | Expected improvement acquisition function. This is the default for `greattunes`. For more details see here or Section 2 in this paper. |
`"NoisyExpectedImprovement"` | `num_fantasies` (default: 20) | Expected improvement acquisition averaged over `num_fantasies` realizations of a single but noisy model. Requires that the Gaussian process model is of the type `FixedNoiseGP`. For more details see here. |
`"qExpectedImprovement"` | `sampler` (default: `botorch.sampling.SobolQMCNormalSampler`) | Monte Carlo-based expected improvement function. For more details see here. |
`"qNoisyExpectedImprovement"` | `sampler` (default: `botorch.sampling.SobolQMCNormalSampler`) | Monte Carlo-based noisy expected improvement function. For more details see here. |
`"PosteriorMean"` | N/A | Posterior mean. Requires the surrogate (Gaussian process) model to have a `mean` property (all implemented models do). For more details see here. |
`"ProbabilityOfImprovement"` | N/A | Probability of improvement over the current best observed value, computed using the analytic formula under a Normal posterior distribution. Requires the outcome to be Gaussian. For more details see here. |
`"qProbabilityOfImprovement"` | `sampler` (default: `botorch.sampling.SobolQMCNormalSampler`) | Monte Carlo-based probability of improvement method. For more details see here. |
`"qSimpleRegret"` | `sampler` (default: `botorch.sampling.SobolQMCNormalSampler`) | Monte Carlo method for simple regret. For more details see here. |
`"UpperConfidenceBound"` | `beta` (default: 0.2) | Analytic upper confidence bound that comprises the posterior mean plus an additional term: the posterior standard deviation weighted by a trade-off parameter, `beta`. For more details see here. |
`"qUpperConfidenceBound"` | `beta` (default: 0.2), `sampler` (default: `botorch.sampling.SobolQMCNormalSampler`) | Monte Carlo-based upper confidence bound method. For more details see here or here. |
`"qKnowledgeGradient"` | `num_fantasies` (default: 20) | Computes the Knowledge Gradient using realizations ("fantasies") for the outer expectation and either the model posterior mean or MC-sampling for the inner expectation. For a fixed number of realizations ("fantasies"), optimizes in a “one-shot” fashion. For more details see here or here. |
`"qMaxValueEntropy"` | N/A | Uses max-value entropy search. This acquisition function computes the mutual information of max values and a candidate point. For more details see here or here. |
`"qMultiFidelityMaxValueEntropy"` | N/A | Multi-fidelity max-value entropy search. For more details see here or here. |
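For orientation, the three analytic acquisition functions in the table have simple closed forms under a Gaussian posterior with mean $\mu(x)$ and standard deviation $\sigma(x)$. The notation below ($f^{*}$ for the best observed response, $\Phi$ and $\varphi$ for the standard normal CDF and PDF) is introduced here and does not appear elsewhere in this document. The square root on $\beta$ is the convention of BoTorch's analytic upper confidence bound; that greattunes forwards `beta` unchanged is an assumption.

```latex
z(x) = \frac{\mu(x) - f^{*}}{\sigma(x)}

\mathrm{EI}(x)  = \mathbb{E}\left[\max\bigl(f(x) - f^{*},\, 0\bigr)\right]
                = \sigma(x)\,\bigl[z(x)\,\Phi(z(x)) + \varphi(z(x))\bigr]

\mathrm{PI}(x)  = \Phi(z(x))

\mathrm{UCB}(x) = \mu(x) + \sqrt{\beta}\,\sigma(x)
```

Larger values of $\beta$ put more weight on the uncertainty term and therefore favor exploration over exploitation.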
Closed-loop optimization refers to situations where the objective function is known, so the framework can iterate towards the optimum on its own. These are addressed via the `.auto` method, which takes a function handle `response_samp_func` as well as a maximum number of iterations `max_iter` as input parameters. See the example above for an illustration of how to use the method.
The optimization can be stopped before `max_iter` steps have been taken by specifying a limit on the relative improvement in the best observed response value (`best_response_value`). This is invoked by providing the parameter `rel_tol` to the `.auto` method.
# some function to optimize
def f(x):
    ...
# parameters
max_iter = 100
rel_tol = 1e-10
# run the auto-method
cls.auto(response_samp_func=f, max_iter=max_iter, rel_tol=rel_tol)
In most cases the best results are found by requiring the `rel_tol` limit to be satisfied for multiple consecutive iterations. This can be achieved by also providing the number of consecutive steps required, `rel_tol_steps`. If `rel_tol_steps` is not provided, the limit on relative improvement only needs to be reached once for convergence.
# some function to optimize
def f(x):
    ...
# parameters
max_iter = 100
rel_tol = 1e-10
rel_tol_steps = 5
# run the auto-method
cls.auto(response_samp_func=f, max_iter=max_iter, rel_tol=rel_tol, rel_tol_steps=rel_tol_steps)
Best practices on using `rel_tol` and `rel_tol_steps` are provided in Example 5 in examples.
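The interplay between `rel_tol` and `rel_tol_steps` can be sketched as follows. The helper `converged` is hypothetical, and the definition of relative improvement used here (absolute change in the best observed response divided by the magnitude of the previous best) is an assumption; the exact criterion implemented by `.auto` is not spelled out in this document.

```python
def converged(best_values, rel_tol, rel_tol_steps=1):
    """Check whether the last rel_tol_steps relative improvements are all within rel_tol.

    best_values: best observed response after each iteration
    """
    if len(best_values) < rel_tol_steps + 1:
        return False  # not enough iterations yet to judge
    recent = best_values[-(rel_tol_steps + 1):]
    for prev, curr in zip(recent, recent[1:]):
        scale = abs(prev) if prev != 0 else 1.0  # avoid division by zero
        if abs(curr - prev) / scale > rel_tol:
            return False
    return True


history = [1.0, 2.0, 2.0]
print(converged(history, rel_tol=1e-10))                   # → True
print(converged(history, rel_tol=1e-10, rel_tol_steps=2))  # → False
```

Requiring several consecutive steps guards against stopping on a single iteration that happened not to improve the best value.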
The true value of Bayesian optimization is its ability to optimize problems which cannot be formulated mathematically. The mathematical method can work as long as a response can be generated, and in fact makes no assumptions on the nature of the problem (except that a maximum is present). Thus, whether the response is generated as a measurement from an experiment, the feedback from users or the output of a defined mathematical function does not matter; all can be optimized via the framework.
Optimization of unknown functions is handled by the methods `.ask` and `.tell`.
- `.ask` provides a best guess of the next covariate data point to sample, given the history of previously sampled points for the problem (that is, `.ask` provides the output of the acquisition function).
- `.tell` is the method to report the observed covariate data point and the associated response.

One call to `.ask` followed by a call to `.tell` performs one iteration of `.auto` from the point of view of the Bayesian optimization; the difference is only in how to interface with it. Examples 2 and 3 in examples show how to use `.ask`-`.tell` to solve problems end-to-end.
To solve a problem, apply these methods iteratively: in each iteration start by calling `.ask`, then use the proposed new data point to sample the system response and provide both this value and the actually sampled covariate values (which can be different from the proposed values) back via `.tell`.
# in below, "cls" is an instantiated version of the TuneSession class (identical initialization as when using the .auto method)
max_iter = 20
for i in range(max_iter):
    # generate candidate
    cls.ask()  # new candidate is last row in cls.proposed_X
    # sample response (beware results must be formulated as torch tensors)
    observed_covars = <from measurement or from cls.proposed_X>
    observed_response = <from measurement or from specified objective function>
    # report response
    cls.tell(covars=observed_covars, response=observed_response)
Observations of covariates and response can be provided manually to `.tell`. To do so, simply call `.tell` without any arguments at each iteration (all bookkeeping will be handled on the backend)
# in below, "cls" is an instantiated version of the TuneSession class (identical initialization as when using the .auto method)
max_iter = 20
for i in range(max_iter):
    # generate candidate
    cls.ask()  # new candidate is last row in cls.proposed_X
    # report response (user is prompted for covariates and response)
    cls.tell()
In this case, the user will be prompted to provide input manually. There will be 3 attempts to provide covariates (another 3 for response), and the method will stop if not successful within these attempts. Provided input data will be validated for number of variables and data type as part of these cycles.
If either `covars` or `response` is not provided as a (named) parameter to `.tell`, the user will be asked to provide it via manual input in the prompt. It is thus possible to get e.g. covariates automatically but manually read off response values from an instrument.
Observed covariates and observed responses are sometimes off. To override the latest datapoint for either, simply provide it again in the same iteration. This will automatically override the latest reported value:
# in below, "cls" is an instantiated version of the TuneSession class (identical initialization as when using the .auto method)
# further assumes that at least one full iteration has been taken
import torch
# define a response
def f(x):
    ...
# generate candidate
cls.ask()  # new candidate is last row in cls.proposed_X
# first result
observed_results = torch.tensor([[it.item() for it in cls.proposed_X[-1]]], dtype=torch.double)
observed_response = torch.tensor([[f(cls.proposed_X[-1]).item()]], dtype=torch.double)
# report first response
cls.tell(covars=observed_results, response=observed_response)
# second result
observed_response_second = observed_response + 1
# update response (overrides the value reported above for this iteration)
cls.tell(covars=observed_results, response=observed_response_second)
Some standard plots and standard methods for presenting the results have been included.
- `plot_1d_latest()`: plots the latest retrained surrogate model (mean and variance), including all sampled data points.
- `plot_convergence()`: plots the relative error between consecutive iterations.
- `plot_best_objective()`: plots the best recorded value of the objective function as a function of the number of iterations.

These methods print their results to the prompt.
- `current_best()`: returns the largest observed response value (observed in either previous or current iteration). Also returns the corresponding values of the covariates.
- `best_predicted()`: returns the largest response predicted from the surrogate model trained on all available data. Two values are returned: the largest mean and the largest of the lower confidence region, i.e. the largest value of the mean minus the first standard deviation (note: heteroskedasticity is allowed, so the standard deviation will vary across different covariates). Also returns the corresponding covariate values. Uses the Nelder-Mead method, a multivariate equivalent to bisection, to find the maximum value of the surrogate model.
We are happy if you would like to invest time in this project! Details are given in CONTRIBUTING.md on how to get started.
A number of examples showing how to use the framework in `jupyter` notebooks are available in the `examples` folder. This includes both closed-loop and iterative usages, as well as a few real-world examples (latter to come!).
- E.C. Garrido-Merchán and D. Hernández-Lobato: Dealing with categorical and integer-valued variables in Bayesian Optimization with Gaussian processes, Neurocomputing vol. 380, 7 March 2020, pp. 20-35, ArXiv preprint
A number of good resources are available for Bayesian optimization, so below follows only a short primer. Interested readers are referred to the references listed below for more information.
Briefly and heuristically, Bayesian optimization works as follows.
- Define an objective function. The goal of the optimization is to maximize this function.
- Define a surrogate model. This is an approximation of the actual functional dependencies underlying the objective function. Because Bayesian optimization builds its own model there is no requirement that the objective function can be written as a mathematical expression.
- Define an acquisition function. This function is applied to the surrogate model to identify the next datapoint to sample (as such, the acquisition function is actually a functional).
- Iterate:
  - Use the acquisition function to identify the next data point to sample.
  - Observe the response of the objective function at the proposed point.
  - Based on all observed covariates and responses of the objective function, update the surrogate model via Bayes' theorem and repeat.
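The steps above can be sketched end-to-end with the standard library only. This is a toy illustration, not how greattunes works internally: it assumes a zero-mean Gaussian process with a squared-exponential kernel of fixed length scale, analytic expected improvement, and a grid search over a single covariate on [0, 1]. All function names are chosen for this example; the objective is the same `f` used in the quick-start above.

```python
import math


def kernel(a, b, length=0.15):
    """Squared-exponential kernel for scalar inputs."""
    return math.exp(-0.5 * ((a - b) / length) ** 2)


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            factor = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= factor * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x


def posterior(xs, ys, x, jitter=1e-6):
    """Posterior mean and standard deviation of the surrogate at x, given all data."""
    K = [[kernel(a, b) + (jitter if i == j else 0.0) for j, b in enumerate(xs)]
         for i, a in enumerate(xs)]
    k = [kernel(a, x) for a in xs]
    mean = sum(ki * wi for ki, wi in zip(k, solve(K, ys)))
    var = kernel(x, x) - sum(ki * vi for ki, vi in zip(k, solve(K, k)))
    return mean, math.sqrt(max(var, 1e-12))


def expected_improvement(mean, std, best):
    """Analytic expected improvement over the best observed response."""
    z = (mean - best) / std
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    return std * (z * cdf + pdf)


def f(x):
    """Objective function (known here, but only ever sampled by the loop)."""
    return -(6 * x - 2) ** 2 * math.sin(12 * x - 4)


grid = [i / 100 for i in range(101)]  # candidate covariate values
xs, ys = [0.5], [f(0.5)]              # initial guess and its response
for _ in range(10):
    best = max(ys)
    # acquisition step: pick the candidate with the largest expected improvement
    x_next = max(grid, key=lambda x: expected_improvement(*posterior(xs, ys, x), best))
    # observation step: sample the objective at the proposed point
    xs.append(x_next)
    ys.append(f(x_next))
    # the surrogate is updated implicitly: posterior() conditions on all data so far

best_x = xs[ys.index(max(ys))]
print(best_x, max(ys))
```

A production implementation would also fit the kernel hyperparameters at each step and optimize the acquisition function continuously rather than on a grid.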
A typical choice of surrogate model class is the Gaussian process, but this is not a strict requirement. Examples exist in which both random forest and various types of neural networks have been used.
Formally, Bayesian optimization considers the function to be optimized as unknown and instead places a Bayesian prior distribution over it. This is the initial surrogate model. Upon observing the response, the prior model is updated to obtain the posterior distribution of functions.
The benefit of Gaussian process models is their explicit modeling of the uncertainty and ease of obtaining the posterior.
Acquisition functions (functionals) propose the best point to sample for a particular problem, given the prior distribution of the surrogate model.
A number of different functions exist, with some typical ones provided in Peter Frazier's Tutorial on Bayesian Optimization. They typically balance exploration and exploitation in different ways.
A list of Bayesian optimization references for later use
- Wikipedia entry on Bayesian optimization
- `BoTorch` introduction to Bayesian optimization
- borealis.ai
- bayesopt, SigOpt page
- Towards Data Science
- Gaussian processes for dummies
- Peter Frazier, Cornell, Bayesian Optimization expert
  - Tutorial on Bayesian Optimization
- Bayesian Optimization, Martin Krasser's blog
- Bayesian Optimization with inequality constraints
- Bayesian deep learning