
Commit

Rename CEED to CEEDesigns (#9)
thevolatilebit authored Nov 28, 2023
1 parent 323fc2e commit d1d09c3
Showing 31 changed files with 105 additions and 102 deletions.
4 changes: 2 additions & 2 deletions LICENSES_THIRD_PARTY
Original file line number Diff line number Diff line change
@@ -1,6 +1,6 @@
-In order to use CEED.jl, it is necessary to download and install several third-party Julia packages, which may be distributed under various licenses. For the most recent list of these packages, refer to `Project.toml` and consult the license terms of the individual packages.
+In order to use CEEDesigns.jl, it is necessary to download and install several third-party Julia packages, which may be distributed under various licenses. For the most recent list of these packages, refer to `Project.toml` and consult the license terms of the individual packages.

-You must agree to the terms of these licenses, in addition to the CEED.jl source code license, in order to use this software.
+You must agree to the terms of these licenses, in addition to the CEEDesigns.jl source code license, in order to use this software.

--------------------------------------------------
Third party software listed by License type
4 changes: 2 additions & 2 deletions Project.toml
@@ -1,6 +1,6 @@
-name = "CEED"
+name = "CEEDesigns"
uuid = "e939450b-799e-4198-a5f5-3f2f7fb1c671"
-version = "0.3.4"
+version = "0.3.5"

[deps]
Clustering = "aaaa29a8-35af-508c-8bc3-b662a17a0fe5"
2 changes: 1 addition & 1 deletion docs/Project.toml
@@ -1,6 +1,6 @@
[deps]
BetaML = "024491cd-cc6b-443e-8034-08ea7eb7db2b"
-CEED = "e939450b-799e-4198-a5f5-3f2f7fb1c671"
+CEEDesigns = "e939450b-799e-4198-a5f5-3f2f7fb1c671"
CSV = "336ed68f-0bac-5ca0-87d4-7b16caf5d00b"
D3Trees = "e3df1716-f71e-5df9-9e2d-98e193103c45"
DataFrames = "a93c6f00-e57d-5684-b7b6-d8193f3e46c0"
6 changes: 3 additions & 3 deletions docs/make.jl
@@ -1,5 +1,5 @@
using Documenter, DocumenterMarkdown, Literate
-using CEED
+using CEEDesigns

# Literate for tutorials
const literate_dir = joinpath(@__DIR__, "..", "tutorials")
@@ -37,7 +37,7 @@ pages = [
]

makedocs(;
-sitename = "CEED.jl",
+sitename = "CEEDesigns.jl",
format = Documenter.HTML(;
prettyurls = false,
edit_link = "main",
@@ -46,4 +46,4 @@ makedocs(;
pages,
)

-deploydocs(; repo = "github.com/Merck/CEED.jl.git")
+deploydocs(; repo = "github.com/Merck/CEEDesigns.jl.git")
38 changes: 19 additions & 19 deletions docs/src/api.md
@@ -1,46 +1,46 @@
# API Documentation

```@meta
-CurrentModule = CEED
+CurrentModule = CEEDesigns
```

## `StaticDesigns`

```@docs
-CEED.StaticDesigns.efficient_designs
-CEED.StaticDesigns.evaluate_experiments
+CEEDesigns.StaticDesigns.efficient_designs
+CEEDesigns.StaticDesigns.evaluate_experiments
```

## `GenerativeDesigns`

```@docs
-CEED.GenerativeDesigns.UncertaintyReductionMDP
-CEED.GenerativeDesigns.EfficientValueMDP
-CEED.GenerativeDesigns.State
-CEED.GenerativeDesigns.Variance
-CEED.GenerativeDesigns.Entropy
+CEEDesigns.GenerativeDesigns.UncertaintyReductionMDP
+CEEDesigns.GenerativeDesigns.EfficientValueMDP
+CEEDesigns.GenerativeDesigns.State
+CEEDesigns.GenerativeDesigns.Variance
+CEEDesigns.GenerativeDesigns.Entropy
```

```@docs
-CEED.GenerativeDesigns.efficient_design
-CEED.GenerativeDesigns.efficient_designs
-CEED.GenerativeDesigns.efficient_value
+CEEDesigns.GenerativeDesigns.efficient_design
+CEEDesigns.GenerativeDesigns.efficient_designs
+CEEDesigns.GenerativeDesigns.efficient_value
```

### Distance-Based Sampling

```@docs
-CEED.GenerativeDesigns.DistanceBased
-CEED.GenerativeDesigns.QuadraticDistance
-CEED.GenerativeDesigns.DiscreteDistance
-CEED.GenerativeDesigns.MahalanobisDistance
-CEED.GenerativeDesigns.Exponential
+CEEDesigns.GenerativeDesigns.DistanceBased
+CEEDesigns.GenerativeDesigns.QuadraticDistance
+CEEDesigns.GenerativeDesigns.DiscreteDistance
+CEEDesigns.GenerativeDesigns.MahalanobisDistance
+CEEDesigns.GenerativeDesigns.Exponential
```

## Plotting

```@docs
-CEED.plot_front
-CEED.make_labels
-CEED.plot_evals
+CEEDesigns.plot_front
+CEEDesigns.make_labels
+CEEDesigns.plot_evals
```
4 changes: 2 additions & 2 deletions docs/src/index.md
@@ -1,4 +1,4 @@
-# CEED.jl: Overview
+# CEEDesigns.jl: Overview

A decision-making framework for the cost-efficient design of experiments, balancing the value of acquired experimental evidence and incurred costs. We have considered two different experimental setups, which are outlined below.

@@ -11,7 +11,7 @@ Here we assume that the same experimental design will be used for a population o

For each subset of experiments, we consider an estimate of the value of acquired information. To give an example, if a set of experiments is used to predict the value of a specific target variable, our framework can leverage a built-in integration with [MLJ.jl](https://github.com/alan-turing-institute/MLJ.jl) to estimate predictive accuracies of machine learning models fitted over subset of experimental features.

-In the cost-sensitive setting of CEED, a user provides the monetary cost and execution time of each experiment. Given the constraint on the maximum number of parallel experiments along with a fixed tradeoff between monetary cost and execution time, we devise an arrangement of each subset of experiments such that the expected combined cost is minimized.
+In the cost-sensitive setting of CEEDesigns, a user provides the monetary cost and execution time of each experiment. Given the constraint on the maximum number of parallel experiments along with a fixed tradeoff between monetary cost and execution time, we devise an arrangement of each subset of experiments such that the expected combined cost is minimized.

Assuming the information values and optimized experimental costs for each subset of experiments, we then generate a set of cost-efficient experimental designs.

2 changes: 1 addition & 1 deletion docs/src/tutorials/GenerativeDesigns.jl
@@ -115,7 +115,7 @@ data = coerce(data, types);

# ## Generative Model for Outcomes Sampling

-using CEED, CEED.GenerativeDesigns
+using CEEDesigns, CEEDesigns.GenerativeDesigns

# As previously discussed, we provide a dataset of historical records, the target variable, along with an information-theoretic measure to quantify the uncertainty about the target variable.

7 changes: 5 additions & 2 deletions docs/src/tutorials/GenerativeDesigns.md
@@ -125,7 +125,7 @@ nothing #hide
## Generative Model for Outcomes Sampling

````@example GenerativeDesigns
-using CEED, CEED.GenerativeDesigns
+using CEEDesigns, CEEDesigns.GenerativeDesigns
````

As previously discussed, we provide a dataset of historical records, the target variable, along with an information-theoretic measure to quantify the uncertainty about the target variable.
@@ -162,7 +162,10 @@ DistanceBased(
target = "HeartDisease",
uncertainty = Entropy,
similarity = Exponential(; λ = 5),
-distance = merge(Dict(c => DiscreteDistance() for c in categorical_feats), Dict(c => QuadraticDistance() for c in numeric_feats))
+distance = merge(
+    Dict(c => DiscreteDistance() for c in categorical_feats),
+    Dict(c => QuadraticDistance() for c in numeric_feats),
+),
);
nothing #hide
````
8 changes: 4 additions & 4 deletions docs/src/tutorials/StaticDesigns.jl
@@ -8,7 +8,7 @@

# For each subset $S \subseteq E$ of experiments, we denote by $v_S$ the value of information acquired from conducting experiments in $S$.

-# In the cost-sensitive setting of CEED, conducting an experiment $e$ incurs a cost $(m_e, t_e)$. Generally, this cost is specified in terms of monetary cost and execution time of the experiment.
+# In the cost-sensitive setting of CEEDesigns, conducting an experiment $e$ incurs a cost $(m_e, t_e)$. Generally, this cost is specified in terms of monetary cost and execution time of the experiment.

# To compute the cost associated with carrying out a set of experiments $S$, we first need to introduce the notion of an arrangement $o$ of the experiments $S$. An arrangement is modeled as a sequence of mutually disjoint subsets of $S$. In other words, $o = (o_1, \ldots, o_l)$ for a given $l\in\mathbb N$, where $\bigcup_{i=1}^l o_i = S$ and $o_i \cap o_j = \emptyset$ for each $1\leq i < j \leq l$.

@@ -44,7 +44,7 @@ data[1:10, :]

# ## Predictive Accuracy

-# The CEED package offers an additional flexibility by allowing an experiment to yield readouts over multiple features at the same time. In our scenario, we can consider the features `RestingECG`, `Oldpeak`, `ST_Slope`, and `MaxHR` to be obtained from a single experiment `ECG`.
+# The CEEDesigns package offers an additional flexibility by allowing an experiment to yield readouts over multiple features at the same time. In our scenario, we can consider the features `RestingECG`, `Oldpeak`, `ST_Slope`, and `MaxHR` to be obtained from a single experiment `ECG`.

# We specify the experiments along with the associated features:

@@ -96,9 +96,9 @@ model = classifier(; n_trees = 20, max_depth = 10)

# ### Performance Evaluation

-# We use `evaluate_experiments` from `CEED.StaticDesigns` to evaluate the predictive accuracy over subsets of experiments. We use `LogLoss` as a measure of accuracy. It is possible to pass additional keyword arguments, which will be passed to `MLJ.evaluate` (such as `measure`, shown below).
+# We use `evaluate_experiments` from `CEEDesigns.StaticDesigns` to evaluate the predictive accuracy over subsets of experiments. We use `LogLoss` as a measure of accuracy. It is possible to pass additional keyword arguments, which will be passed to `MLJ.evaluate` (such as `measure`, shown below).

-using CEED, CEED.StaticDesigns
+using CEEDesigns, CEEDesigns.StaticDesigns

#

8 changes: 4 additions & 4 deletions docs/src/tutorials/StaticDesigns.md
@@ -12,7 +12,7 @@ Let us consider a set of $n$ experiments $E = \{ e_1, \ldots, e_n\}$.

For each subset $S \subseteq E$ of experiments, we denote by $v_S$ the value of information acquired from conducting experiments in $S$.

-In the cost-sensitive setting of CEED, conducting an experiment $e$ incurs a cost $(m_e, t_e)$. Generally, this cost is specified in terms of monetary cost and execution time of the experiment.
+In the cost-sensitive setting of CEEDesigns, conducting an experiment $e$ incurs a cost $(m_e, t_e)$. Generally, this cost is specified in terms of monetary cost and execution time of the experiment.

To compute the cost associated with carrying out a set of experiments $S$, we first need to introduce the notion of an arrangement $o$ of the experiments $S$. An arrangement is modeled as a sequence of mutually disjoint subsets of $S$. In other words, $o = (o_1, \ldots, o_l)$ for a given $l\in\mathbb N$, where $\bigcup_{i=1}^l o_i = S$ and $o_i \cap o_j = \emptyset$ for each $1\leq i < j \leq l$.

@@ -50,7 +50,7 @@ data[1:10, :]

## Predictive Accuracy

-The CEED package offers an additional flexibility by allowing an experiment to yield readouts over multiple features at the same time. In our scenario, we can consider the features `RestingECG`, `Oldpeak`, `ST_Slope`, and `MaxHR` to be obtained from a single experiment `ECG`.
+The CEEDesigns package offers an additional flexibility by allowing an experiment to yield readouts over multiple features at the same time. In our scenario, we can consider the features `RestingECG`, `Oldpeak`, `ST_Slope`, and `MaxHR` to be obtained from a single experiment `ECG`.

We specify the experiments along with the associated features:

@@ -120,10 +120,10 @@ model = classifier(; n_trees = 20, max_depth = 10)

### Performance Evaluation

-We use `evaluate_experiments` from `CEED.StaticDesigns` to evaluate the predictive accuracy over subsets of experiments. We use `LogLoss` as a measure of accuracy. It is possible to pass additional keyword arguments, which will be passed to `MLJ.evaluate` (such as `measure`, shown below).
+We use `evaluate_experiments` from `CEEDesigns.StaticDesigns` to evaluate the predictive accuracy over subsets of experiments. We use `LogLoss` as a measure of accuracy. It is possible to pass additional keyword arguments, which will be passed to `MLJ.evaluate` (such as `measure`, shown below).

````@example StaticDesigns
-using CEED, CEED.StaticDesigns
+using CEEDesigns, CEEDesigns.StaticDesigns
````

````@example StaticDesigns
10 changes: 5 additions & 5 deletions docs/src/tutorials/StaticDesignsFiltration.jl
@@ -14,7 +14,7 @@

# We denote the expected fraction of entities that remain in the triage after conducting a set $S$ of experiments as the filtration rate, $f_S$. In the context of disease triage, this can be interpreted as the fraction of patients for whom the experimental evidence does not provide a 'conclusive' result.

-# In the cost-sensitive setting of CEED, conducting an experiment $e$ incurs a cost $(m_e, t_e)$. Generally, this cost is specified in terms of monetary cost and execution time of the experiment.
+# In the cost-sensitive setting of CEEDesigns, conducting an experiment $e$ incurs a cost $(m_e, t_e)$. Generally, this cost is specified in terms of monetary cost and execution time of the experiment.

# To compute the cost associated with carrying out a set of experiments $S$, we first need to introduce the notion of an arrangement $o$ of the experiments $S$. An arrangement is modeled as a sequence of mutually disjoint subsets of $S$. In other words, $o = (o_1, \ldots, o_l)$ for a given $l\in\mathbb N$, where $\bigcup_{i=1}^l o_i = S$ and $o_i \cap o_j = \emptyset$ for each $1\leq i < j \leq l$.

@@ -90,7 +90,7 @@ data_binary[1:10, :]

# In this scenario, we model the value of information $v_S$ acquired by conducting a set of experiments as the ratio of patients for whom the results across the experiments in $S$ were 'inconclusive', i.e., $|\cap_{e\in S}\{ \text{patient} : \text{inconclusive in } e \}| / |\text{patients}|$. Essentially, the very same measure is used here to estimate the filtration rate.

-# The CEED package offers an additional flexibility by allowing an experiment to yield readouts over multiple features at the same time. In our scenario, we can consider the features `RestingECG`, `Oldpeak`, `ST_Slope`, and `MaxHR` to be obtained from a single experiment `ECG`.
+# The CEEDesigns package offers an additional flexibility by allowing an experiment to yield readouts over multiple features at the same time. In our scenario, we can consider the features `RestingECG`, `Oldpeak`, `ST_Slope`, and `MaxHR` to be obtained from a single experiment `ECG`.

# We specify the experiments along with the associated features:

@@ -105,9 +105,9 @@ experiments = Dict(
# We may also provide additional zero-cost features, which are always available.
zero_cost_features = ["Age", "Sex", "ChestPainType", "ExerciseAngina"]

-# For binary datasets, we may use `evaluate_experiments` from `CEED.StaticDesigns` to evaluate the discriminative power of subsets of experiments.
+# For binary datasets, we may use `evaluate_experiments` from `CEEDesigns.StaticDesigns` to evaluate the discriminative power of subsets of experiments.

-using CEED, CEED.StaticDesigns
+using CEEDesigns, CEEDesigns.StaticDesigns

#

@@ -175,7 +175,7 @@ scatter!(
using MCTS, D3Trees

experiments = Set(vcat.(designs[end][2].arrangement...)[1])
-(; planner) = CEED.StaticDesigns.optimal_arrangement(
+(; planner) = CEEDesigns.StaticDesigns.optimal_arrangement(
costs,
perf_eval,
experiments;
10 changes: 5 additions & 5 deletions docs/src/tutorials/StaticDesignsFiltration.md
@@ -18,7 +18,7 @@ Moreover, it can be assumed that a set of extrinsic decision-making rules is imp

We denote the expected fraction of entities that remain in the triage after conducting a set $S$ of experiments as the filtration rate, $f_S$. In the context of disease triage, this can be interpreted as the fraction of patients for whom the experimental evidence does not provide a 'conclusive' result.

-In the cost-sensitive setting of CEED, conducting an experiment $e$ incurs a cost $(m_e, t_e)$. Generally, this cost is specified in terms of monetary cost and execution time of the experiment.
+In the cost-sensitive setting of CEEDesigns, conducting an experiment $e$ incurs a cost $(m_e, t_e)$. Generally, this cost is specified in terms of monetary cost and execution time of the experiment.

To compute the cost associated with carrying out a set of experiments $S$, we first need to introduce the notion of an arrangement $o$ of the experiments $S$. An arrangement is modeled as a sequence of mutually disjoint subsets of $S$. In other words, $o = (o_1, \ldots, o_l)$ for a given $l\in\mathbb N$, where $\bigcup_{i=1}^l o_i = S$ and $o_i \cap o_j = \emptyset$ for each $1\leq i < j \leq l$.

@@ -100,7 +100,7 @@ data_binary[1:10, :]

In this scenario, we model the value of information $v_S$ acquired by conducting a set of experiments as the ratio of patients for whom the results across the experiments in $S$ were 'inconclusive', i.e., $|\cap_{e\in S}\{ \text{patient} : \text{inconclusive in } e \}| / |\text{patients}|$. Essentially, the very same measure is used here to estimate the filtration rate.

-The CEED package offers an additional flexibility by allowing an experiment to yield readouts over multiple features at the same time. In our scenario, we can consider the features `RestingECG`, `Oldpeak`, `ST_Slope`, and `MaxHR` to be obtained from a single experiment `ECG`.
+The CEEDesigns package offers an additional flexibility by allowing an experiment to yield readouts over multiple features at the same time. In our scenario, we can consider the features `RestingECG`, `Oldpeak`, `ST_Slope`, and `MaxHR` to be obtained from a single experiment `ECG`.

We specify the experiments along with the associated features:

@@ -120,10 +120,10 @@ We may also provide additional zero-cost features, which are always available.
zero_cost_features = ["Age", "Sex", "ChestPainType", "ExerciseAngina"]
````

-For binary datasets, we may use `evaluate_experiments` from `CEED.StaticDesigns` to evaluate the discriminative power of subsets of experiments.
+For binary datasets, we may use `evaluate_experiments` from `CEEDesigns.StaticDesigns` to evaluate the discriminative power of subsets of experiments.

````@example StaticDesignsFiltration
-using CEED, CEED.StaticDesigns
+using CEEDesigns, CEEDesigns.StaticDesigns
````

````@example StaticDesignsFiltration
@@ -204,7 +204,7 @@ The following is a visualisation of the DPW search tree that was used to find an
using MCTS, D3Trees
experiments = Set(vcat.(designs[end][2].arrangement...)[1])
-(; planner) = CEED.StaticDesigns.optimal_arrangement(
+(; planner) = CEEDesigns.StaticDesigns.optimal_arrangement(
costs,
perf_eval,
experiments;
8 changes: 4 additions & 4 deletions readme.md
@@ -1,11 +1,11 @@
<p align="left">
-<img src="docs/src/assets/ceed_light.svg#gh-light-mode-only" alt="CEED.jl logo"/>
-<img src="docs/src/assets/ceed_dark.svg#gh-dark-mode-only" alt="CEED.jl logo"/>
+<img src="docs/src/assets/ceed_light.svg#gh-light-mode-only" alt="CEEDesigns.jl logo"/>
+<img src="docs/src/assets/ceed_dark.svg#gh-dark-mode-only" alt="CEEDesigns.jl logo"/>
</p>

_______

-[![Docs](https://img.shields.io/badge/docs-stable-blue.svg)](https://merck.github.io/CEED.jl/)
+[![Docs](https://img.shields.io/badge/docs-stable-blue.svg)](https://merck.github.io/CEEDesigns.jl/)

A decision-making framework for the cost-efficient design of experiments, balancing the value of acquired experimental evidence and incurred costs. We have considered two different experimental setups, which are outlined below.

@@ -16,7 +16,7 @@ Here we assume that the same experimental design will be used for a population o

For each subset of experiments, we consider an estimate of the value of acquired information. To give an example, if a set of experiments is used to predict the value of a specific target variable, our framework can leverage a built-in integration with [MLJ.jl](https://github.com/alan-turing-institute/MLJ.jl) to estimate predictive accuracies of machine learning models fitted over subset of experimental features.

-In the cost-sensitive setting of CEED, a user provides the monetary cost and execution time of each experiment. Given the constraint on the maximum number of parallel experiments along with a fixed tradeoff between monetary cost and execution time, we devise an arrangement of each subset of experiments such that the expected combined cost is minimized.
+In the cost-sensitive setting of CEEDesigns, a user provides the monetary cost and execution time of each experiment. Given the constraint on the maximum number of parallel experiments along with a fixed tradeoff between monetary cost and execution time, we devise an arrangement of each subset of experiments such that the expected combined cost is minimized.

Assuming the information values and optimized experimental costs for each subset of experiments, we then generate a set of cost-efficient experimental designs.

2 changes: 1 addition & 1 deletion src/CEED.jl → src/CEEDesigns.jl
@@ -1,4 +1,4 @@
-module CEED
+module CEEDesigns

using DataFrames, Plots
export front, plot_front
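Since this commit renames the top-level module, downstream projects must update every `CEED` reference (`using CEED`, qualified names such as `CEED.StaticDesigns`, and `[deps]` entries) to `CEEDesigns`. A minimal migration sketch, assuming GNU sed; the sample file contents are illustrative, mirroring lines from this diff. The whole-word match `\bCEED\b` deliberately skips references that are already `CEEDesigns`.

```shell
# Rewrite whole-word CEED references to CEEDesigns in a sample file.
tmp=$(mktemp)
printf 'using CEED, CEED.StaticDesigns\nsitename = "CEED.jl"\n' > "$tmp"
sed -i 's/\bCEED\b/CEEDesigns/g' "$tmp"   # \b keeps CEEDesigns from re-matching
cat "$tmp"   # prints the renamed lines
rm -f "$tmp"
```

Running the same substitution over a checkout (e.g. `grep -rl CEED . | xargs sed -i 's/\bCEED\b/CEEDesigns/g'`) would reproduce the bulk of this commit; the `Project.toml` version bump to 0.3.5 is a separate, manual change.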