Improvements to documentation #138

Open · wants to merge 2 commits into master
4 changes: 2 additions & 2 deletions docs/src/features.md
@@ -117,7 +117,7 @@ julia> using AbstractTrees; children(ts[1])
```

## Setting a Custom Objective Function
Xgboost uses a second order approximation, so to provide a custom objective functoin first and
XGBoost uses a second order approximation, so to supply a custom objective function, its first and
second order derivatives must be provided; see the docstring of [`updateone!`](@ref) for more
details.
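The definitions of `ℓ′` and `ℓ″` are elided by this diff view. As a minimal sketch of what such a pair looks like — assuming a squared-error loss, which is not taken from this diff:

```julia
# Sketch only: squared-error loss ℓ(ŷ, y) = (ŷ - y)²/2, differentiated in ŷ.
ℓ′(ŷ, y) = ŷ - y   # first derivative (gradient)
ℓ″(ŷ, y) = 1.0     # second derivative (hessian), constant for this loss
```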

@@ -148,7 +148,7 @@ bst = xgboost((X, y), ℓ′, ℓ″, max_depth=8)
```

## Caching Data From External Memory
Xgboost can be used to cache memory from external memory on disk, see
XGBoost can cache data from external memory on disk; see
[here](https://xgboost.readthedocs.io/en/stable/tutorials/external_memory.html). In the Julia
wrapper this is facilitated by allowing a `DMatrix` to be constructed from any Julia iterator with
[`fromiterator`](@ref). The resulting `DMatrix` holds references to cache files which will have
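The final sentence above is truncated by the diff view. For orientation, a rough sketch of the iterator-based construction it describes — the batch layout (named tuples with fields `X` and `y`) is an assumption, not taken from this diff:

```julia
# Sketch only: stream batches into a DMatrix backed by on-disk cache files.
batches = ((X=randn(100, 3), y=randn(100)) for _ in 1:5)
dm = XGBoost.fromiterator(DMatrix, batches)
```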
30 changes: 13 additions & 17 deletions docs/src/index.md
@@ -22,10 +22,13 @@ ŷ = predict(bst, X)


using DataFrames
df = DataFrame(randn(100,3), [:a, :b, :y])
df = DataFrame(randn(200,3), [:a, :b, :y])

train = DMatrix(df[1:150, [:a, :b]], df[1:150, :y])
test = DMatrix(df[151:end, [:a, :b]], df[151:end, :y])

# can accept tabular data, will keep feature names
bst = xgboost((df[!, [:a, :b]], df.y))
bst = xgboost(train, watchlist=Dict("train"=>train, "test"=>test), num_round=10, max_depth=3, η=0.1, objective="reg:squarederror")

# display importance statistics retaining feature names
importancereport(bst)
@@ -57,15 +60,13 @@ X = [0 missing 1
isequal(DMatrix(X), x) # nullity is preserved
```

!!! note

`DMatrix` must allocate new arrays when fetching values from it. One therefore should avoid
using `DMatrix` directly except with `XGBoost`; retrieving values from this object should be
considered useful mostly only for verification.
`DMatrix` must allocate new arrays when fetching values from it. One should therefore avoid
using `DMatrix` directly except with `XGBoost`; retrieving values from this object is useful
mostly for verification.


### Feature Naming and Tabular Data
Xgboost supports the naming of features (i.e. columns of the feature matrix). This can be useful
XGBoost supports feature naming (i.e. names of columns of the feature matrix). This can be useful
for inspecting trained models.
```julia
X = randn(10,3)
@@ -80,14 +81,9 @@ XGBoost.setfeaturenames!(dm, ["a", "b", "c"]) # can also set after construction
`AbstractVector`s or a `DataFrame`).
```julia
using DataFrames
df = DataFrame(randn(10,3), [:a, :b, :c])

y = randn(10)

DMatrix(df, y)

df[!, :y] = y
DMatrix(df, :y) # equivalent to DMatrix(df, y)
df = DataFrame(randn(10,4), [:a, :b, :c, :y])
dm = DMatrix(df, :y) # equivalent to DMatrix(df[!, Not(:y)], df[!, :y])
```

When constructing a `DMatrix` from a table, the feature names will automatically be set to the names
@@ -134,7 +130,7 @@ this is always a `DMatrix` but arguments will be automatically converted.
### [Parameters](https://xgboost.readthedocs.io/en/stable/parameter.html)
Keyword arguments to `Booster` are xgboost model parameters. These are described in detail
[here](https://xgboost.readthedocs.io/en/stable/parameter.html) and should all be passed exactly as
they are described in the main xgbosot documentation (in a few cases such as Greek letters we also
they are described in the main xgboost documentation (in a few cases such as Greek letters we also
allow unicode equivalents).
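As an illustration (not part of this diff), the `η` keyword used in the examples above is the unicode equivalent of the `eta` learning-rate parameter; either spelling should work:

```julia
X, y = randn(100, 3), randn(100)
# Both keywords name the same learning-rate parameter (eta); the wrapper
# accepts the unicode spelling η as an equivalent.
bst = xgboost((X, y), num_round=10, η=0.1)
bst = xgboost((X, y), num_round=10, eta=0.1)
```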

### Training
@@ -156,7 +152,7 @@ using Statistics
mean(ŷ - y)/std(y)
```

Xgboost expects `Booster`s to be initialized with training data, therefore there is usually no need
XGBoost expects `Booster`s to be initialized with training data, therefore there is usually no need
to define a `Booster` separately from training. A shorthand for the above is provided by
[`xgboost`](@ref).
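The shorthand itself is elided by this diff view; as a rough sketch of the call the sentence describes, with illustrative parameter values:

```julia
# Sketch only: one-call training instead of Booster construction + update!.
bst = xgboost((X, y), num_round=10)
```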
15 changes: 8 additions & 7 deletions src/booster.jl
@@ -396,14 +396,14 @@ end
xgboost(data; num_round=10, watchlist=Dict(), kw...)
xgboost(data, ℓ′, ℓ″; kw...)

Creates an xgboost gradient booster object on training data `data` and runs `nrounds` of training.
Creates an xgboost gradient booster object on training data `data` and runs `num_round` rounds of training.
This is essentially an alias for constructing a [`Booster`](@ref) with `data` and keyword arguments
followed by [`update!`](@ref) for `nrounds`.
followed by [`update!`](@ref) for `num_round` rounds.
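
A rough sketch of that equivalence (keyword handling here is an assumption, not read from this
source file):

```julia
# Sketch only: roughly what xgboost(data; num_round=10, kw...) does.
b = Booster(data; kw...)
update!(b, data; num_round=10)
```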

`watchlist` is a dict the keys of which are strings giving the name of the data to watch
and the values of which are [`DMatrix`](@ref) objects containing the data.
`watchlist` is a `Dict` mapping names to [`DMatrix`](@ref) objects, used to specify data on which to evaluate the model.
If omitted, `watchlist` will be initialized with the training data.

All other keyword arguments are passed to [`Booster`](@ref). With few exceptions these are model
training hyper-parameters, see [here](https://xgboost.readthedocs.io/en/stable/parameter.html) for
a comprehensive list.

@@ -412,9 +412,10 @@ See [`updateone!`](@ref) for more details.

## Examples
```julia
(X, y) = (randn(100,3), randn(100))
train = DMatrix(randn(100,3), randn(100))
test = DMatrix(randn(100,3), randn(100))

b = xgboost((X, y), 10, max_depth=10, η=0.1)
b = xgboost(train, watchlist=Dict("train"=>train, "test"=>test), num_round=10, max_depth=5, η=0.1)

ŷ = predict(b, test)
```