
Commit

Rename CEED to CEEDesigns (#9)
thevolatilebit authored Nov 28, 2023
1 parent 323fc2e commit d1d09c3
Showing 31 changed files with 105 additions and 102 deletions.
4 changes: 2 additions & 2 deletions LICENSES_THIRD_PARTY
Original file line number Diff line number Diff line change
@@ -1,6 +1,6 @@
-In order to use CEED.jl, it is necessary to download and install several third-party Julia packages, which may be distributed under various licenses. For the most recent list of these packages, refer to `Project.toml` and consult the license terms of the individual packages.
+In order to use CEEDesigns.jl, it is necessary to download and install several third-party Julia packages, which may be distributed under various licenses. For the most recent list of these packages, refer to `Project.toml` and consult the license terms of the individual packages.

-You must agree to the terms of these licenses, in addition to the CEED.jl source code license, in order to use this software.
+You must agree to the terms of these licenses, in addition to the CEEDesigns.jl source code license, in order to use this software.

--------------------------------------------------
Third party software listed by License type
4 changes: 2 additions & 2 deletions Project.toml
@@ -1,6 +1,6 @@
-name = "CEED"
+name = "CEEDesigns"
uuid = "e939450b-799e-4198-a5f5-3f2f7fb1c671"
-version = "0.3.4"
+version = "0.3.5"

[deps]
Clustering = "aaaa29a8-35af-508c-8bc3-b662a17a0fe5"
2 changes: 1 addition & 1 deletion docs/Project.toml
@@ -1,6 +1,6 @@
[deps]
BetaML = "024491cd-cc6b-443e-8034-08ea7eb7db2b"
-CEED = "e939450b-799e-4198-a5f5-3f2f7fb1c671"
+CEEDesigns = "e939450b-799e-4198-a5f5-3f2f7fb1c671"
CSV = "336ed68f-0bac-5ca0-87d4-7b16caf5d00b"
D3Trees = "e3df1716-f71e-5df9-9e2d-98e193103c45"
DataFrames = "a93c6f00-e57d-5684-b7b6-d8193f3e46c0"
6 changes: 3 additions & 3 deletions docs/make.jl
@@ -1,5 +1,5 @@
using Documenter, DocumenterMarkdown, Literate
-using CEED
+using CEEDesigns

# Literate for tutorials
const literate_dir = joinpath(@__DIR__, "..", "tutorials")
@@ -37,7 +37,7 @@ pages = [
]

makedocs(;
-sitename = "CEED.jl",
+sitename = "CEEDesigns.jl",
format = Documenter.HTML(;
prettyurls = false,
edit_link = "main",
@@ -46,4 +46,4 @@ makedocs(;
pages,
)

-deploydocs(; repo = "github.com/Merck/CEED.jl.git")
+deploydocs(; repo = "github.com/Merck/CEEDesigns.jl.git")
38 changes: 19 additions & 19 deletions docs/src/api.md
@@ -1,46 +1,46 @@
# API Documentation

```@meta
-CurrentModule = CEED
+CurrentModule = CEEDesigns
```

## `StaticDesigns`

```@docs
-CEED.StaticDesigns.efficient_designs
-CEED.StaticDesigns.evaluate_experiments
+CEEDesigns.StaticDesigns.efficient_designs
+CEEDesigns.StaticDesigns.evaluate_experiments
```

## `GenerativeDesigns`

```@docs
-CEED.GenerativeDesigns.UncertaintyReductionMDP
-CEED.GenerativeDesigns.EfficientValueMDP
-CEED.GenerativeDesigns.State
-CEED.GenerativeDesigns.Variance
-CEED.GenerativeDesigns.Entropy
+CEEDesigns.GenerativeDesigns.UncertaintyReductionMDP
+CEEDesigns.GenerativeDesigns.EfficientValueMDP
+CEEDesigns.GenerativeDesigns.State
+CEEDesigns.GenerativeDesigns.Variance
+CEEDesigns.GenerativeDesigns.Entropy
```

```@docs
-CEED.GenerativeDesigns.efficient_design
-CEED.GenerativeDesigns.efficient_designs
-CEED.GenerativeDesigns.efficient_value
+CEEDesigns.GenerativeDesigns.efficient_design
+CEEDesigns.GenerativeDesigns.efficient_designs
+CEEDesigns.GenerativeDesigns.efficient_value
```

### Distance-Based Sampling

```@docs
-CEED.GenerativeDesigns.DistanceBased
-CEED.GenerativeDesigns.QuadraticDistance
-CEED.GenerativeDesigns.DiscreteDistance
-CEED.GenerativeDesigns.MahalanobisDistance
-CEED.GenerativeDesigns.Exponential
+CEEDesigns.GenerativeDesigns.DistanceBased
+CEEDesigns.GenerativeDesigns.QuadraticDistance
+CEEDesigns.GenerativeDesigns.DiscreteDistance
+CEEDesigns.GenerativeDesigns.MahalanobisDistance
+CEEDesigns.GenerativeDesigns.Exponential
```

## Plotting

```@docs
-CEED.plot_front
-CEED.make_labels
-CEED.plot_evals
+CEEDesigns.plot_front
+CEEDesigns.make_labels
+CEEDesigns.plot_evals
```
4 changes: 2 additions & 2 deletions docs/src/index.md
@@ -1,4 +1,4 @@
-# CEED.jl: Overview
+# CEEDesigns.jl: Overview

A decision-making framework for the cost-efficient design of experiments, balancing the value of acquired experimental evidence and incurred costs. We have considered two different experimental setups, which are outlined below.

@@ -11,7 +11,7 @@ Here we assume that the same experimental design will be used for a population o

For each subset of experiments, we consider an estimate of the value of acquired information. To give an example, if a set of experiments is used to predict the value of a specific target variable, our framework can leverage a built-in integration with [MLJ.jl](https://github.com/alan-turing-institute/MLJ.jl) to estimate predictive accuracies of machine learning models fitted over subset of experimental features.

-In the cost-sensitive setting of CEED, a user provides the monetary cost and execution time of each experiment. Given the constraint on the maximum number of parallel experiments along with a fixed tradeoff between monetary cost and execution time, we devise an arrangement of each subset of experiments such that the expected combined cost is minimized.
+In the cost-sensitive setting of CEEDesigns, a user provides the monetary cost and execution time of each experiment. Given the constraint on the maximum number of parallel experiments along with a fixed tradeoff between monetary cost and execution time, we devise an arrangement of each subset of experiments such that the expected combined cost is minimized.

Assuming the information values and optimized experimental costs for each subset of experiments, we then generate a set of cost-efficient experimental designs.

2 changes: 1 addition & 1 deletion docs/src/tutorials/GenerativeDesigns.jl
@@ -115,7 +115,7 @@ data = coerce(data, types);

# ## Generative Model for Outcomes Sampling

-using CEED, CEED.GenerativeDesigns
+using CEEDesigns, CEEDesigns.GenerativeDesigns

# As previously discussed, we provide a dataset of historical records, the target variable, along with an information-theoretic measure to quantify the uncertainty about the target variable.

7 changes: 5 additions & 2 deletions docs/src/tutorials/GenerativeDesigns.md
@@ -125,7 +125,7 @@ nothing #hide
## Generative Model for Outcomes Sampling

````@example GenerativeDesigns
-using CEED, CEED.GenerativeDesigns
+using CEEDesigns, CEEDesigns.GenerativeDesigns
````

As previously discussed, we provide a dataset of historical records, the target variable, along with an information-theoretic measure to quantify the uncertainty about the target variable.
@@ -162,7 +162,10 @@ DistanceBased(
target = "HeartDisease",
uncertainty = Entropy,
similarity = Exponential(; λ = 5),
-distance = merge(Dict(c => DiscreteDistance() for c in categorical_feats), Dict(c => QuadraticDistance() for c in numeric_feats))
+distance = merge(
+    Dict(c => DiscreteDistance() for c in categorical_feats),
+    Dict(c => QuadraticDistance() for c in numeric_feats),
+),
);
nothing #hide
````
8 changes: 4 additions & 4 deletions docs/src/tutorials/StaticDesigns.jl
@@ -8,7 +8,7 @@

# For each subset $S \subseteq E$ of experiments, we denote by $v_S$ the value of information acquired from conducting experiments in $S$.

-# In the cost-sensitive setting of CEED, conducting an experiment $e$ incurs a cost $(m_e, t_e)$. Generally, this cost is specified in terms of monetary cost and execution time of the experiment.
+# In the cost-sensitive setting of CEEDesigns, conducting an experiment $e$ incurs a cost $(m_e, t_e)$. Generally, this cost is specified in terms of monetary cost and execution time of the experiment.

# To compute the cost associated with carrying out a set of experiments $S$, we first need to introduce the notion of an arrangement $o$ of the experiments $S$. An arrangement is modeled as a sequence of mutually disjoint subsets of $S$. In other words, $o = (o_1, \ldots, o_l)$ for a given $l\in\mathbb N$, where $\bigcup_{i=1}^l o_i = S$ and $o_i \cap o_j = \emptyset$ for each $1\leq i < j \leq l$.

@@ -44,7 +44,7 @@ data[1:10, :]

# ## Predictive Accuracy

-# The CEED package offers an additional flexibility by allowing an experiment to yield readouts over multiple features at the same time. In our scenario, we can consider the features `RestingECG`, `Oldpeak`, `ST_Slope`, and `MaxHR` to be obtained from a single experiment `ECG`.
+# The CEEDesigns package offers an additional flexibility by allowing an experiment to yield readouts over multiple features at the same time. In our scenario, we can consider the features `RestingECG`, `Oldpeak`, `ST_Slope`, and `MaxHR` to be obtained from a single experiment `ECG`.

# We specify the experiments along with the associated features:

@@ -96,9 +96,9 @@ model = classifier(; n_trees = 20, max_depth = 10)

# ### Performance Evaluation

-# We use `evaluate_experiments` from `CEED.StaticDesigns` to evaluate the predictive accuracy over subsets of experiments. We use `LogLoss` as a measure of accuracy. It is possible to pass additional keyword arguments, which will be passed to `MLJ.evaluate` (such as `measure`, shown below).
+# We use `evaluate_experiments` from `CEEDesigns.StaticDesigns` to evaluate the predictive accuracy over subsets of experiments. We use `LogLoss` as a measure of accuracy. It is possible to pass additional keyword arguments, which will be passed to `MLJ.evaluate` (such as `measure`, shown below).

-using CEED, CEED.StaticDesigns
+using CEEDesigns, CEEDesigns.StaticDesigns

#

8 changes: 4 additions & 4 deletions docs/src/tutorials/StaticDesigns.md
@@ -12,7 +12,7 @@ Let us consider a set of $n$ experiments $E = \{ e_1, \ldots, e_n\}$.

For each subset $S \subseteq E$ of experiments, we denote by $v_S$ the value of information acquired from conducting experiments in $S$.

-In the cost-sensitive setting of CEED, conducting an experiment $e$ incurs a cost $(m_e, t_e)$. Generally, this cost is specified in terms of monetary cost and execution time of the experiment.
+In the cost-sensitive setting of CEEDesigns, conducting an experiment $e$ incurs a cost $(m_e, t_e)$. Generally, this cost is specified in terms of monetary cost and execution time of the experiment.

To compute the cost associated with carrying out a set of experiments $S$, we first need to introduce the notion of an arrangement $o$ of the experiments $S$. An arrangement is modeled as a sequence of mutually disjoint subsets of $S$. In other words, $o = (o_1, \ldots, o_l)$ for a given $l\in\mathbb N$, where $\bigcup_{i=1}^l o_i = S$ and $o_i \cap o_j = \emptyset$ for each $1\leq i < j \leq l$.

@@ -50,7 +50,7 @@ data[1:10, :]

## Predictive Accuracy

-The CEED package offers an additional flexibility by allowing an experiment to yield readouts over multiple features at the same time. In our scenario, we can consider the features `RestingECG`, `Oldpeak`, `ST_Slope`, and `MaxHR` to be obtained from a single experiment `ECG`.
+The CEEDesigns package offers an additional flexibility by allowing an experiment to yield readouts over multiple features at the same time. In our scenario, we can consider the features `RestingECG`, `Oldpeak`, `ST_Slope`, and `MaxHR` to be obtained from a single experiment `ECG`.

We specify the experiments along with the associated features:

@@ -120,10 +120,10 @@ model = classifier(; n_trees = 20, max_depth = 10)

### Performance Evaluation

-We use `evaluate_experiments` from `CEED.StaticDesigns` to evaluate the predictive accuracy over subsets of experiments. We use `LogLoss` as a measure of accuracy. It is possible to pass additional keyword arguments, which will be passed to `MLJ.evaluate` (such as `measure`, shown below).
+We use `evaluate_experiments` from `CEEDesigns.StaticDesigns` to evaluate the predictive accuracy over subsets of experiments. We use `LogLoss` as a measure of accuracy. It is possible to pass additional keyword arguments, which will be passed to `MLJ.evaluate` (such as `measure`, shown below).

````@example StaticDesigns
-using CEED, CEED.StaticDesigns
+using CEEDesigns, CEEDesigns.StaticDesigns
````

````@example StaticDesigns
10 changes: 5 additions & 5 deletions docs/src/tutorials/StaticDesignsFiltration.jl
@@ -14,7 +14,7 @@

# We denote the expected fraction of entities that remain in the triage after conducting a set $S$ of experiments as the filtration rate, $f_S$. In the context of disease triage, this can be interpreted as the fraction of patients for whom the experimental evidence does not provide a 'conclusive' result.

-# In the cost-sensitive setting of CEED, conducting an experiment $e$ incurs a cost $(m_e, t_e)$. Generally, this cost is specified in terms of monetary cost and execution time of the experiment.
+# In the cost-sensitive setting of CEEDesigns, conducting an experiment $e$ incurs a cost $(m_e, t_e)$. Generally, this cost is specified in terms of monetary cost and execution time of the experiment.

# To compute the cost associated with carrying out a set of experiments $S$, we first need to introduce the notion of an arrangement $o$ of the experiments $S$. An arrangement is modeled as a sequence of mutually disjoint subsets of $S$. In other words, $o = (o_1, \ldots, o_l)$ for a given $l\in\mathbb N$, where $\bigcup_{i=1}^l o_i = S$ and $o_i \cap o_j = \emptyset$ for each $1\leq i < j \leq l$.

@@ -90,7 +90,7 @@ data_binary[1:10, :]

# In this scenario, we model the value of information $v_S$ acquired by conducting a set of experiments as the ratio of patients for whom the results across the experiments in $S$ were 'inconclusive', i.e., $|\cap_{e\in S}\{ \text{patient} : \text{inconclusive in } e \}| / |\text{patients}|$. Essentially, the very same measure is used here to estimate the filtration rate.

-# The CEED package offers an additional flexibility by allowing an experiment to yield readouts over multiple features at the same time. In our scenario, we can consider the features `RestingECG`, `Oldpeak`, `ST_Slope`, and `MaxHR` to be obtained from a single experiment `ECG`.
+# The CEEDesigns package offers an additional flexibility by allowing an experiment to yield readouts over multiple features at the same time. In our scenario, we can consider the features `RestingECG`, `Oldpeak`, `ST_Slope`, and `MaxHR` to be obtained from a single experiment `ECG`.

# We specify the experiments along with the associated features:

@@ -105,9 +105,9 @@ experiments = Dict(
# We may also provide additional zero-cost features, which are always available.
zero_cost_features = ["Age", "Sex", "ChestPainType", "ExerciseAngina"]

-# For binary datasets, we may use `evaluate_experiments` from `CEED.StaticDesigns` to evaluate the discriminative power of subsets of experiments.
+# For binary datasets, we may use `evaluate_experiments` from `CEEDesigns.StaticDesigns` to evaluate the discriminative power of subsets of experiments.

-using CEED, CEED.StaticDesigns
+using CEEDesigns, CEEDesigns.StaticDesigns

#

@@ -175,7 +175,7 @@ scatter!(
using MCTS, D3Trees

experiments = Set(vcat.(designs[end][2].arrangement...)[1])
-(; planner) = CEED.StaticDesigns.optimal_arrangement(
+(; planner) = CEEDesigns.StaticDesigns.optimal_arrangement(
costs,
perf_eval,
experiments;
10 changes: 5 additions & 5 deletions docs/src/tutorials/StaticDesignsFiltration.md
@@ -18,7 +18,7 @@ Moreover, it can be assumed that a set of extrinsic decision-making rules is imp

We denote the expected fraction of entities that remain in the triage after conducting a set $S$ of experiments as the filtration rate, $f_S$. In the context of disease triage, this can be interpreted as the fraction of patients for whom the experimental evidence does not provide a 'conclusive' result.

-In the cost-sensitive setting of CEED, conducting an experiment $e$ incurs a cost $(m_e, t_e)$. Generally, this cost is specified in terms of monetary cost and execution time of the experiment.
+In the cost-sensitive setting of CEEDesigns, conducting an experiment $e$ incurs a cost $(m_e, t_e)$. Generally, this cost is specified in terms of monetary cost and execution time of the experiment.

To compute the cost associated with carrying out a set of experiments $S$, we first need to introduce the notion of an arrangement $o$ of the experiments $S$. An arrangement is modeled as a sequence of mutually disjoint subsets of $S$. In other words, $o = (o_1, \ldots, o_l)$ for a given $l\in\mathbb N$, where $\bigcup_{i=1}^l o_i = S$ and $o_i \cap o_j = \emptyset$ for each $1\leq i < j \leq l$.

@@ -100,7 +100,7 @@ data_binary[1:10, :]

In this scenario, we model the value of information $v_S$ acquired by conducting a set of experiments as the ratio of patients for whom the results across the experiments in $S$ were 'inconclusive', i.e., $|\cap_{e\in S}\{ \text{patient} : \text{inconclusive in } e \}| / |\text{patients}|$. Essentially, the very same measure is used here to estimate the filtration rate.

-The CEED package offers an additional flexibility by allowing an experiment to yield readouts over multiple features at the same time. In our scenario, we can consider the features `RestingECG`, `Oldpeak`, `ST_Slope`, and `MaxHR` to be obtained from a single experiment `ECG`.
+The CEEDesigns package offers an additional flexibility by allowing an experiment to yield readouts over multiple features at the same time. In our scenario, we can consider the features `RestingECG`, `Oldpeak`, `ST_Slope`, and `MaxHR` to be obtained from a single experiment `ECG`.

We specify the experiments along with the associated features:

@@ -120,10 +120,10 @@ We may also provide additional zero-cost features, which are always available.
zero_cost_features = ["Age", "Sex", "ChestPainType", "ExerciseAngina"]
````

-For binary datasets, we may use `evaluate_experiments` from `CEED.StaticDesigns` to evaluate the discriminative power of subsets of experiments.
+For binary datasets, we may use `evaluate_experiments` from `CEEDesigns.StaticDesigns` to evaluate the discriminative power of subsets of experiments.

````@example StaticDesignsFiltration
-using CEED, CEED.StaticDesigns
+using CEEDesigns, CEEDesigns.StaticDesigns
````

````@example StaticDesignsFiltration
@@ -204,7 +204,7 @@ The following is a visualisation of the DPW search tree that was used to find an
using MCTS, D3Trees
experiments = Set(vcat.(designs[end][2].arrangement...)[1])
-(; planner) = CEED.StaticDesigns.optimal_arrangement(
+(; planner) = CEEDesigns.StaticDesigns.optimal_arrangement(
costs,
perf_eval,
experiments;
8 changes: 4 additions & 4 deletions readme.md
@@ -1,11 +1,11 @@
<p align="left">
-<img src="docs/src/assets/ceed_light.svg#gh-light-mode-only" alt="CEED.jl logo"/>
-<img src="docs/src/assets/ceed_dark.svg#gh-dark-mode-only" alt="CEED.jl logo"/>
+<img src="docs/src/assets/ceed_light.svg#gh-light-mode-only" alt="CEEDesigns.jl logo"/>
+<img src="docs/src/assets/ceed_dark.svg#gh-dark-mode-only" alt="CEEDesigns.jl logo"/>
</p>

_______

-[![Docs](https://img.shields.io/badge/docs-stable-blue.svg)](https://merck.github.io/CEED.jl/)
+[![Docs](https://img.shields.io/badge/docs-stable-blue.svg)](https://merck.github.io/CEEDesigns.jl/)

A decision-making framework for the cost-efficient design of experiments, balancing the value of acquired experimental evidence and incurred costs. We have considered two different experimental setups, which are outlined below.

@@ -16,7 +16,7 @@ Here we assume that the same experimental design will be used for a population o

For each subset of experiments, we consider an estimate of the value of acquired information. To give an example, if a set of experiments is used to predict the value of a specific target variable, our framework can leverage a built-in integration with [MLJ.jl](https://github.com/alan-turing-institute/MLJ.jl) to estimate predictive accuracies of machine learning models fitted over subset of experimental features.

-In the cost-sensitive setting of CEED, a user provides the monetary cost and execution time of each experiment. Given the constraint on the maximum number of parallel experiments along with a fixed tradeoff between monetary cost and execution time, we devise an arrangement of each subset of experiments such that the expected combined cost is minimized.
+In the cost-sensitive setting of CEEDesigns, a user provides the monetary cost and execution time of each experiment. Given the constraint on the maximum number of parallel experiments along with a fixed tradeoff between monetary cost and execution time, we devise an arrangement of each subset of experiments such that the expected combined cost is minimized.

Assuming the information values and optimized experimental costs for each subset of experiments, we then generate a set of cost-efficient experimental designs.

2 changes: 1 addition & 1 deletion src/CEED.jl → src/CEEDesigns.jl
@@ -1,4 +1,4 @@
-module CEED
+module CEEDesigns

using DataFrames, Plots
export front, plot_front
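Since this commit renames the top-level module, downstream projects must update every `CEED` reference (`using CEED`, qualified names such as `CEED.StaticDesigns`, and `[deps]` entries) to `CEEDesigns`. A minimal migration sketch, assuming GNU sed; the sample file contents are illustrative, mirroring lines from this diff. The whole-word match `\bCEED\b` deliberately skips references that are already `CEEDesigns`.

```shell
# Rewrite whole-word CEED references to CEEDesigns in a sample file.
tmp=$(mktemp)
printf 'using CEED, CEED.StaticDesigns\nsitename = "CEED.jl"\n' > "$tmp"
sed -i 's/\bCEED\b/CEEDesigns/g' "$tmp"   # \b keeps CEEDesigns from re-matching
cat "$tmp"   # prints the renamed lines
rm -f "$tmp"
```

Running the same substitution over a checkout (e.g. `grep -rl CEED . | xargs sed -i 's/\bCEED\b/CEEDesigns/g'`) would reproduce the bulk of this commit; the `Project.toml` version bump to 0.3.5 is a separate, manual change.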