Commit 903bd33

Documentation updates
1. Doc updates 2. Vignettes updates
1 parent b4fdd62 commit 903bd33

74 files changed, with 775 additions and 1295 deletions.

.Rbuildignore

Lines changed: 5 additions & 0 deletions
@@ -11,4 +11,9 @@
 ^RESEARCH-NOTICE\.md$
 ^vignettes/images
 ^vignettes/motorcycle.Rmd$
+^vignettes/classification.Rmd$
+^vignettes/large_scale_emulation.Rmd$
+^vignettes/linked_DGP.Rmd$
+^vignettes/seq_design.Rmd$
+^vignettes/seq_design_2.Rmd$
 ^LICENSE\.md$
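
As an illustrative aside that is not part of the commit: entries in `.Rbuildignore` are Perl-style regular expressions matched case-insensitively against paths relative to the package root. The sketch below uses base R's `grepl()` to show how the added patterns would behave. The helper name `is_ignored` and the sample paths are hypothetical and exist only for this illustration.

```r
# Patterns added to .Rbuildignore in this commit, copied verbatim.
patterns <- c(
  "^vignettes/classification.Rmd$",
  "^vignettes/large_scale_emulation.Rmd$",
  "^vignettes/linked_DGP.Rmd$",
  "^vignettes/seq_design.Rmd$",
  "^vignettes/seq_design_2.Rmd$"
)

# Hypothetical helper: TRUE if any pattern matches the given relative path.
is_ignored <- function(path, patterns) {
  any(vapply(
    patterns,
    function(p) grepl(p, path, perl = TRUE, ignore.case = TRUE),
    logical(1)
  ))
}

is_ignored("vignettes/seq_design.Rmd", patterns)  # matched by the fourth pattern
is_ignored("vignettes/other.Rmd", patterns)       # hypothetical path, no match
```

Because the dot before `Rmd` is not escaped in the added patterns, it matches any single character, so a path such as `vignettes/seq_designXRmd` would also match. In practice this is unlikely to matter, but escaping it as `\.` (as the existing `^LICENSE\.md$` entry does) would be stricter.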

NAMESPACE

Lines changed: 0 additions & 4 deletions
@@ -29,9 +29,6 @@ S3method(validate,lgp)
 S3method(vigf,bundle)
 S3method(vigf,dgp)
 S3method(vigf,gp)
-export(Hetero)
-export(NegBin)
-export(Poisson)
 export(alm)
 export(combine)
 export(continue)
@@ -42,7 +39,6 @@ export(draw)
 export(get_thread_num)
 export(gp)
 export(init_py)
-export(kernel)
 export(lgp)
 export(mice)
 export(nllik)

NEWS.md

Lines changed: 4 additions & 2 deletions
@@ -15,8 +15,8 @@
 - The `plot()` function has been updated to generate validation plots for DGP classifiers (i.e., DGP emulators with categorical likelihoods) and linked emulators created by `lgp()` using the new data frame form for `struc`.
 - The `summary()` function has been redesigned to provide both summary tables and visualizations of structure and model specifications for (D)GP and linked (D)GP emulators.
 - A `sample_size` argument has been added to the `validate()` and `plot()` functions, allowing users to adjust the number of samples used for validation when the validation method is set to `sampling`.
-- The following functions are deprecated as of this version and will be removed in the next release: `combine()`, `set_linked_idx()`, `kernel()`, `Poisson()`, `Hetero()`, and `NegBin()`. These functions are no longer maintained. Please refer to the updated package documentation for alternative workflows.
-- The basic node functions `kernel()`, `Hetero()`, `Poisson()`, and `NegBin()`, along with the `struc` argument in the `gp()` and `dgp()` functions, have been deprecated as of this version and will be removed in the next release. Customization of (D)GP specifications can be achieved by modifying the other arguments in `gp()` and `dgp()`.
+- `combine()` and `set_linked_idx()` are deprecated as of this version and will be removed in the next release. These two functions are no longer maintained. Please refer to the updated package documentation for alternative workflows.
+- The basic node functions `kernel()`, `Hetero()`, `Poisson()`, and `NegBin()`, along with the `struc` argument in the `gp()` and `dgp()` functions, have been removed as of this version. Customization of (D)GP specifications can be achieved by modifying the other arguments in `gp()` and `dgp()`.
 - The `draw()` function has been updated for instances of the `bundle` class to allow drawing of design and evaluation plots of all emulators in a single figure.
 - The `plot()` function has been updated for linked emulators generated by `lgp()` using the new data frame form for `struc`.
 - The `design()` function has been redesigned to allow new specifications of the user-supplied `method` function.
@@ -28,6 +28,8 @@
 - The `write()` function now allows `light = TRUE` for both GP emulators and bundles of GP emulators.
 - Two new functions, `serialize()` and `deserialize()`, have been added to allow users to export emulators to multi-session workers for parallel processing.
 - Additional vignettes are available, showcasing large-scale DGP emulation and DGP classification.
+- Enhanced clarity and consistency across the documentation.
+- Improved examples and explanations in vignettes for better user guidance.

 # dgpsi 2.4.0
 - One can now use `design()` to implement sequential designs using `f` and a fixed candidate set passed to `x_cand` with `y_cand = NULL`.

R/alm.R

Lines changed: 28 additions & 25 deletions
@@ -7,10 +7,10 @@
 #' * the S3 class `gp`.
 #' * the S3 class `dgp`.
 #' * the S3 class `bundle`.
-#' @param x_cand a matrix (with each row containing a design point and column representing an input dimension) that gives a candidate set
-#' from which the next design point(s) are determined. If `object` is an instance of the `bundle` class and `aggregate` is not supplied, `x_cand` could also
-#' be a list with length equal to the number of emulators contained in `object`. In this case, each slot in `x_cand` should be a candidate set matrix
-#' for each emulator included in the bundle. Defaults to `NULL`.
+#' @param x_cand a matrix (with each row being a design point and column being an input dimension) that gives a candidate set
+#' from which the next design point(s) are determined. If `object` is an instance of the `bundle` class and `aggregate` is not supplied, `x_cand` can also be a list.
+#' The list must have a length equal to the number of emulators in `object`, with each element being a matrix representing the candidate set for a corresponding
+#' emulator in the bundle. Defaults to `NULL`.
 #' @param n_start an integer that gives the number of initial design points to be used to determine next design point(s). This argument
 #' is only used when `x_cand` is `NULL`. Defaults to `20`.
 #' @param batch_size an integer that gives the number of design points to be chosen. Defaults to `1`.
@@ -33,37 +33,40 @@
 #' of the matrix is equal to:
 #' - the emulator output dimension if `object` is an instance of the `dgp` class; or
 #' - the number of emulators contained in `object` if `object` is an instance of the `bundle` class.
-#' * the output should be a vector that aggregates scores across outputs or emulators at different design points.
+#' * the output should be a vector that gives aggregate scores at different design points.
 #'
-#' Set to `NULL` to disable the aggregation. Defaults to `NULL`.
+#' Set to `NULL` to disable aggregation. Defaults to `NULL`.
 #' @param ... any arguments (with names different from those of arguments used in [alm()]) that are used by `aggregate`
 #' can be passed here.
 #'
 #' @return
-#' 1. If `x_cand` is not `NULL` and:
-#' - `object` is an instance of the `gp` class, a vector is returned with length equal to `batch_size`, giving the positions (i.e., row numbers)
-#' of next design points from `x_cand`.
-#' - `object` is an instance of the `dgp` class, a vector is returned with length equal to `batch_size * D`, giving positions (i.e., row numbers)
-#' of next design points from `x_cand` to be added to the DGP emulator. `D` equals to the number of output dimensions of the DGP
-#' emulator if there is no likelihood layer in the hierarchy. If `object` is a DGP emulator with either `Hetero` or `NegBin` likelihood layer,
-#' `D = 2`. If `object` is a DGP emulator with a `Categorical` likelihood layer, `D` equals to one (for binary output) or `K` (for multi-class output with `K` classes).
-#' - `object` is an instance of the `bundle` class, a matrix is returned with row number equal to `batch_size` and column number equal to the number of
-#' emulators in the bundle, giving positions (i.e., row numbers) of next design points from `x_cand` to be added to individual emulators.
-#' 2. If `x_cand = NULL` and:
-#' - `object` is an instance of the `gp` class, a matrix is returned with row number equal to `batch_size`, giving the next design points to be evaluated.
-#' - `object` is an instance of the `dgp` class, a matrix is returned with row number equal to `batch_size * D` where `D` is the number of output dimensions of the DGP
-#' emulator if no likelihood layer is included. If `object` is a DGP emulator with either `Hetero` or `NegBin` likelihood layer, `D = 2`. If `object` is a DGP emulator
-#' with a `Categorical` likelihood layer, `D` equals to one (for binary output) or `K` (for multi-class output with `K` classes).
-#' - `object` is an instance of the `bundle` class, a list is returned with the length equal to the number of
-#' emulators in the bundle. Each element in the list is a matrix with row number equal to `batch_size`, giving next design points to be added to individual emulators.
+#' 1. If `x_cand` is not `NULL`:
+#' - When `object` is an instance of the `gp` class, a vector of length `batch_size` is returned, containing the positions
+#' (row numbers) of the next design points from `x_cand`.
+#' - When `object` is an instance of the `dgp` class, a vector of length `batch_size * D` is returned, containing the positions
+#' (row numbers) of the next design points from `x_cand` to be added to the DGP emulator.
+#' * `D` is the number of output dimensions of the DGP emulator if no likelihood layer is included.
+#' * For a DGP emulator with a `Hetero` or `NegBin` likelihood layer, `D = 2`.
+#' * For a DGP emulator with a `Categorical` likelihood layer, `D = 1` for binary output or `D = K` for multi-class output with `K` classes.
+#' - When `object` is an instance of the `bundle` class, a matrix is returned with `batch_size` rows and a column for each emulator in
+#' the bundle, containing the positions (row numbers) of the next design points from `x_cand` for individual emulators.
+#' 2. If `x_cand` is `NULL`:
+#' - When `object` is an instance of the `gp` class, a matrix with `batch_size` rows is returned, giving the next design points to be evaluated.
+#' - When `object` is an instance of the `dgp` class, a matrix with `batch_size * D` rows is returned, where:
+#' - `D` is the number of output dimensions of the DGP emulator if no likelihood layer is included.
+#' - For a DGP emulator with a `Hetero` or `NegBin` likelihood layer, `D = 2`.
+#' - For a DGP emulator with a `Categorical` likelihood layer, `D = 1` for binary output or `D = K` for multi-class output with `K` classes.
+#' - When `object` is an instance of the `bundle` class, a list is returned with a length equal to the number of emulators in the bundle. Each
+#' element of the list is a matrix with `batch_size` rows, where each row represents a design point to be added to the corresponding emulator.
 #'
 #' @note
-#' The column order of the first argument of `aggregate` must be consistent with the order of emulator output dimensions (if `object` is an instance of the
-#' `dgp` class), or the order of emulators placed in `object` if `object` is an instance of the `bundle` class.
+#' The first column of the matrix supplied to the first argument of `aggregate` must correspond to the first output dimension of the DGP emulator
+#' if `object` is an instance of the `dgp` class, and so on for subsequent columns and dimensions. If `object` is an instance of the `bundle` class,
+#' the first column must correspond to the first emulator in the bundle, and so on for subsequent columns and emulators.
 #' @references
 #' MacKay, D. J. (1992). Information-based objective functions for active data selection. *Neural Computation*, **4(4)**, 590-604.
 #'
-#' @details See further examples and tutorials at <https://mingdeyu.github.io/dgpsi-R/>.
+#' @details See further examples and tutorials at <`r get_docs_url()`>.
 #' @examples
 #' \dontrun{
 #'
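
As an illustration of the `aggregate` contract documented in the hunk above (a score matrix in, one aggregate score per design point out), here is a minimal base-R sketch. It is not taken from the package: the function name `weighted_aggregate`, the `weights` argument, and the sample score matrix are all hypothetical. It assumes rows of the matrix are design points and columns follow the order of output dimensions (or emulators in a bundle), as the `@note` describes.

```r
# Hypothetical aggregate function: weighted mean of scores across columns.
# `scores`  : matrix, one row per design point, one column per output/emulator.
# `weights` : hypothetical extra argument; per the documentation above, extra
#             arguments used by `aggregate` would be supplied through `...`.
weighted_aggregate <- function(scores, weights = NULL) {
  scores <- as.matrix(scores)
  if (is.null(weights)) weights <- rep(1, ncol(scores))
  stopifnot(length(weights) == ncol(scores))
  # Return a plain vector with one aggregate score per design point (row).
  as.vector(scores %*% (weights / sum(weights)))
}

# Made-up scores for three design points and two outputs.
scores <- matrix(c(0.2, 0.8,
                   0.5, 0.1,
                   0.9, 0.3), ncol = 2, byrow = TRUE)
agg <- weighted_aggregate(scores, weights = c(3, 1))
```

A weighted mean is only one possible choice. Any function that maps the score matrix to a vector with one entry per row would satisfy the documented interface.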

R/design.R

Lines changed: 33 additions & 37 deletions
@@ -51,9 +51,9 @@
 #' * if `object` is an instance of the `bundle` class, `y_test` is a matrix with each row representing the outputs for the corresponding row of `x_test` and each column representing the output of the different emulators in the bundle.
 #'
 #' Set to `NULL` for LOO-based emulator validation. Defaults to `NULL`. This argument is only used if `eval = NULL`.
-#' @param reset A boolean or a vector of booleans indicating whether to reset the hyperparameters of the emulator(s) to their initial values (as set during initial construction) before re-fitting.
+#' @param reset A bool or a vector of bools indicating whether to reset the hyperparameters of the emulator(s) to their initial values (as set during initial construction) before re-fitting.
 #' The re-fitting occurs based on the frequency specified by `freq[1]`. This option is useful when hyperparameters are suspected to have converged to a local optimum affecting validation performance.
-#' - If a single boolean is provided, it applies to every iteration of the sequential design.
+#' - If a single bool is provided, it applies to every iteration of the sequential design.
 #' - If a vector is provided, its length must equal `N` (even if the re-fit frequency specified in `freq[1]` is not 1) and it will apply to the corresponding iterations of the sequential design.
 #'
 #' Defaults to `FALSE`.
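
To make the vector form of `reset` in the hunk above concrete, the following base-R sketch builds a logical vector of length `N` that requests a reset at every fourth iteration. The values of `N` and `reset_every` are hypothetical and chosen only for illustration. The sketch constructs the vector and does not call `design()`.

```r
N <- 12           # assumed number of sequential design iterations
reset_every <- 4  # hypothetical choice: request a reset every 4th iteration

# One entry per iteration, as the documentation requires, regardless of freq[1].
reset <- rep(FALSE, N)
reset[seq(reset_every, N, by = reset_every)] <- TRUE

which(reset)  # iterations at which a reset is requested
```

Such a vector would then be supplied as the `reset` argument together with the same `N`.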
@@ -91,18 +91,18 @@
 #'
 #' If no custom function is provided, a built-in evaluation metric (RMSE or log-loss, in the case of DGP emulators with categorical likelihoods) will be used.
 #' Defaults to `NULL`. See the *Note* section below for additional details.
-#' @param verb a boolean indicating if trace information will be printed during the sequential design.
+#' @param verb a bool indicating if trace information will be printed during the sequential design.
 #' Defaults to `TRUE`.
 #' @param autosave a list that contains configuration settings for the automatic saving of the emulator:
-#' * `switch`: a boolean indicating whether to enable automatic saving of the emulator during sequential design. When set to `TRUE`,
+#' * `switch`: a bool indicating whether to enable automatic saving of the emulator during sequential design. When set to `TRUE`,
 #' the emulator in the final iteration is always saved. Defaults to `FALSE`.
 #' * `directory`: a string specifying the directory path where the emulators will be stored. Emulators will be stored in a sub-directory
 #' of `directory` named 'emulator-`id`'. Defaults to './check_points'.
 #' * `fname`: a string representing the base name for the saved emulator files. Defaults to 'check_point'.
 #' * `save_freq`: an integer indicating the frequency of automatic saves, measured in the number of iterations. Defaults to `5`.
-#' * `overwrite`: a boolean value controlling the file saving behavior. When set to `TRUE`, each new automatic save overwrites the previous one,
+#' * `overwrite`: a bool value controlling the file saving behavior. When set to `TRUE`, each new automatic save overwrites the previous one,
 #' keeping only the latest version. If `FALSE`, each automatic save creates a new file, preserving all previous versions. Defaults to `FALSE`.
-#' @param new_wave a boolean indicating whether the current call to [design()] will create a new wave of sequential designs or add the next sequence of designs to the most recent wave.
+#' @param new_wave a bool indicating whether the current call to [design()] will create a new wave of sequential designs or add the next sequence of designs to the most recent wave.
 #' This argument is relevant only if waves already exist in the emulator. Creating new waves can improve the visualization of sequential design performance across different calls
 #' to [design()] via [draw()], and allows for specifying a different evaluation frequency in `freq`. However, disabling this option can help limit the number of waves visualized
 #' in [draw()] to avoid issues such as running out of distinct colors for large numbers of waves. Defaults to `TRUE`.
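
For reference, the `autosave` list described in the hunk above could be assembled as in the sketch below. The element names and default values are taken from the documentation, except that `switch` is set to `TRUE` here to enable saving. The second half is an assumption rather than documented behaviour: it supposes that saves occur every `save_freq` iterations plus the final iteration, with a hypothetical `N` of 12.

```r
# autosave configuration using the documented element names.
autosave <- list(
  switch    = TRUE,              # enable automatic saving (documented default: FALSE)
  directory = "./check_points",  # documented default directory
  fname     = "check_point",     # documented default base file name
  save_freq = 5,                 # save every 5 iterations (documented default)
  overwrite = FALSE              # keep every saved version (documented default)
)

# ASSUMPTION for illustration only: saves happen at multiples of save_freq
# and, as documented, at the final iteration.
N <- 12  # hypothetical number of iterations
save_iters <- sort(unique(c(seq(autosave$save_freq, N, by = autosave$save_freq), N)))
```

With `overwrite = FALSE`, each of those saves would create a new file, so the number of files grows with the number of saves.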
@@ -123,9 +123,9 @@
 #' if the DGP emulator was constructed without the Vecchia approximation. Otherwise, the number of processes is set to `max physical cores available %/% 2`.
 #' Only use multiple processes when there is a large number of GP components in different layers and optimization of GP components
 #' is computationally expensive. Defaults to `1`.
-#' @param pruning a boolean indicating if dynamic pruning of DGP structures will be implemented during the sequential design after the total number of
+#' @param pruning a bool indicating if dynamic pruning of DGP structures will be implemented during the sequential design after the total number of
 #' design points exceeds `min_size` in `control`. The argument is only applicable to DGP emulators (i.e., `object` is an instance of `dgp` class)
-#' produced by `dgp()` with `struc = NULL`. Defaults to `TRUE`.
+#' produced by `dgp()`. Defaults to `TRUE`.
 #' @param control a list that can supply any of the following components to control the dynamic pruning of the DGP emulator:
 #' * `min_size`, the minimum number of design points required to trigger dynamic pruning. Defaults to 10 times the number of input dimensions.
 #' * `threshold`, the \eqn{R^2} value above which a GP node is considered redundant. Defaults to `0.97`.
@@ -156,8 +156,8 @@
 #' If `target` is not `NULL`, the following additional elements are also included:
 #' - `target`: the target evaluating metric computed by the `eval` or built-in function to stop the sequential design.
 #' - `reached`: indicates whether the `target` was reached at the end of the sequential design:
-#' - a boolean if `object` is an instance of the `gp` or `dgp` class.
-#' - a vector of booleans if `object` is an instance of the `bundle` class, with its length determined as follows:
+#' - a bool if `object` is an instance of the `gp` or `dgp` class.
+#' - a vector of bools if `object` is an instance of the `bundle` class, with its length determined as follows:
 #' - equal to the number of emulators in the bundle when `eval = NULL`.
 #' - equal to the length of the output from `eval` when a custom `eval` function is provided.
 #' - a slot called `type` that gives the type of validation:
@@ -201,7 +201,7 @@
 #' within `f` are handled by appropriately returning `NA`s.
 #' * When defining `eval`, the output metric needs to be positive if [draw()] is used with `log = T`. And one needs to ensure that a lower metric value indicates
 #' a better emulation performance if `target` is set.
-#' @details See further examples and tutorials at <https://mingdeyu.github.io/dgpsi-R/>.
+#' @details See further examples and tutorials at <`r get_docs_url()`>.
 #'
 #' @examples
 #' \dontrun{
@@ -3237,10 +3237,6 @@ check_reset <- function(reset, N){
 check_auto <- function(object){
 auto_pruning <- T
 # exclude user-defined structure
-if (!"internal_dims" %in% names(object[['specs']])) {
-auto_pruning <- F
-return(auto_pruning)
-} else {
 n_layer <- object$constructor_obj$n_layer
 if (object$constructor_obj$all_layer[[n_layer]][[1]]$type!='gp') {
 n_layer <- n_layer - 1
@@ -3257,7 +3253,7 @@ check_auto <- function(object){
 }
 }
 }
-}
+
 return(auto_pruning)
 }

@@ -3342,24 +3338,24 @@ reverse_minmax <- function(normalized_data, limits) {
 return(original_data)
 }

-generic_wrapper <- function(r_func) {
-function(...) {
-# Capture the arguments
-args <- list(...)
-
-# Convert Python-native arguments to R-native if necessary
-args <- lapply(args, function(arg) {
-if (inherits(arg, "python.builtin.object")) {
-reticulate::py_to_r(arg)
-} else {
-arg
-}
-})
-
-# Call the user-provided R function with converted arguments
-result <- do.call(r_func, args)
-
-# Convert the result back to Python-native types
-reticulate::r_to_py(result)
-}
-}
+#generic_wrapper <- function(r_func) {
+# function(...) {
+# # Capture the arguments
+# args <- list(...)
+#
+# # Convert Python-native arguments to R-native if necessary
+# args <- lapply(args, function(arg) {
+# if (inherits(arg, "python.builtin.object")) {
+# reticulate::py_to_r(arg)
+# } else {
+# arg
+# }
+# })
+#
+# # Call the user-provided R function with converted arguments
+# result <- do.call(r_func, args)
+#
+# # Convert the result back to Python-native types
+# reticulate::r_to_py(result)
+# }
+#}
