Commit

clarify in docs the meaning of "table" to close #971
ablaom committed Oct 26, 2022
1 parent 238cd07 commit c5ceb3f
Showing 4 changed files with 23 additions and 26 deletions.
5 changes: 3 additions & 2 deletions docs/src/about_mlj.md
@@ -143,8 +143,9 @@ Extract:

## Key features

* Data agnostic, train models on any data supported by the
[Tables.jl](https://github.com/JuliaData/Tables.jl) interface.
* Data agnostic, train most models on any data `X` supported by the
[Tables.jl](https://github.com/JuliaData/Tables.jl) interface (needs `Tables.istable(X)
== true`).

* Extensive, state-of-the-art, support for model composition
(*pipelines*, *stacks* and, more generally, *learning networks*). See more
2 changes: 1 addition & 1 deletion docs/src/common_mlj_workflows.md
@@ -77,7 +77,7 @@ Loading a built-in data set already split into `X` and `y`:

```@example workflows
X, y = @load_iris;
selectrows(X, 1:4) # selectrows works for any Tables.jl table
selectrows(X, 1:4) # selectrows works whenever `Tables.istable(X)==true`.
```

```@example workflows
33 changes: 15 additions & 18 deletions docs/src/getting_started.md
@@ -33,12 +33,11 @@ schema(iris)
```

Because this data format is compatible with
[Tables.jl](https://tables.juliadata.org/stable/), many MLJ methods
(such as `selectrows`, `pretty` and `schema` used above) as well as
many MLJ models can work with it. However, as most new users are
already familiar with the access methods particular to
[DataFrames](https://dataframes.juliadata.org/stable/) (also
compatible with Tables.jl) we'll put our data into that format here:
[Tables.jl](https://tables.juliadata.org/stable/) (and satisfies `Tables.istable(iris) ==
true`), many MLJ methods (such as `selectrows`, `pretty` and `schema` used above) as well
as many MLJ models can work with it. However, as most new users are already familiar with
the access methods particular to [DataFrames](https://dataframes.juliadata.org/stable/)
(also compatible with Tables.jl) we'll put our data into that format here:

```@example doda
import DataFrames
@@ -334,14 +333,12 @@ scitype(X)

### Two-dimensional data

Generally, two-dimensional data in MLJ is expected to be *tabular*.
All data containers compatible with the
[Tables.jl](https://github.com/JuliaData/Tables.jl) interface (which
includes all source formats listed
[here](https://github.com/JuliaData/Tables.jl/blob/master/INTEGRATIONS.md))
have the scientific type `Table{K}`, where `K` depends on the
scientific types of the columns, which can be individually inspected
using `schema`:
Generally, two-dimensional data in MLJ is expected to be *tabular*. All data containers
`X` compatible with the [Tables.jl](https://github.com/JuliaData/Tables.jl) interface and
satisfying `Tables.istable(X) == true` (most of the formats in [this
list](https://github.com/JuliaData/Tables.jl/blob/master/INTEGRATIONS.md)) have the
scientific type `Table{K}`, where `K` depends on the scientific types of the columns,
which can be individually inspected using `schema`:

```@repl doda
schema(X)
@@ -385,10 +382,10 @@ resampling is always more efficient in this case.

### Inputs

Since an MLJ model only specifies the scientific type of data, if that
type is `Table` - which is the case for the majority of MLJ models -
then any [Tables.jl](https://github.com/JuliaData/Tables.jl) format is
permitted.
Since an MLJ model only specifies the scientific type of data, if that type is `Table` -
which is the case for the majority of MLJ models - then any
[Tables.jl](https://github.com/JuliaData/Tables.jl) container `X` is permitted, so long as
`Tables.istable(X) == true`.

Specifically, the requirement for an arbitrary model's input is `scitype(X)
<: input_scitype(model)`.
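
As a minimal sketch of the `Tables.istable` condition described above, the following uses only Tables.jl. The helper `require_table` is hypothetical (it is not part of MLJ or Tables.jl) and simply illustrates how the check separates tables from a plain matrix; the column names and values are made up for illustration.

```julia
using Tables

# Hypothetical helper, not part of MLJ: reject inputs that are not tables.
function require_table(X)
    Tables.istable(X) || throw(ArgumentError("expected a table, got $(typeof(X))"))
    return X
end

# A named tuple of equal-length vectors is a (column) table.
column_table = (height = [1.8, 1.6, 1.7], weight = [80.0, 55.0, 68.0])

# A vector of named tuples is a (row) table.
row_table = [(height = 1.8, weight = 80.0), (height = 1.6, weight = 55.0)]

# A plain matrix is two-dimensional but is not a table.
matrix = rand(3, 2)

istable_column = Tables.istable(column_table)
istable_rows = Tables.istable(row_table)
istable_matrix = Tables.istable(matrix)

# The helper passes tables through and throws for the matrix.
checked = require_table(column_table)
matrix_rejected = try
    require_table(matrix)
    false
catch err
    err isa ArgumentError
end
```

A model declaring a `Table` input scitype would be expected to accept `column_table` or `row_table`, but not `matrix`, on this reading of the text.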
9 changes: 4 additions & 5 deletions docs/src/quick_start_guide_to_adding_models.md
@@ -12,11 +12,10 @@ of how things work with MLJ. In particular, you are familiar with:

- what `Probabilistic`, `Deterministic` and `Unsupervised` models are

- the fact that MLJ generally works with tables rather than
matrices. Here a *table* is a container satisfying the
[Tables.jl](https://github.com/JuliaData/Tables.jl) API (e.g.,
DataFrame, JuliaDB table, CSV file, named tuple of equal-length
vectors)
- the fact that MLJ generally works with tables rather than matrices. Here a *table* is a
container `X` satisfying the [Tables.jl](https://github.com/JuliaData/Tables.jl) API and
satisfying `Tables.istable(X) == true` (e.g., DataFrame, JuliaDB table, CSV file, named
tuple of equal-length vectors)

- [CategoricalArrays.jl](https://github.com/JuliaData/CategoricalArrays.jl)
(if working with finite discrete data, e.g., doing classification)
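
To illustrate the tables-versus-matrices point in the list above, here is a small sketch, assuming only Tables.jl, of wrapping a matrix so that it satisfies `Tables.istable`. The column names `:x1` and `:x2` are made up for illustration.

```julia
using Tables

# A plain matrix: three observations (rows) of two features (columns).
M = [1.0 2.0; 3.0 4.0; 5.0 6.0]

# Wrap the matrix as a table, supplying illustrative column names.
wrapped = Tables.table(M; header = [:x1, :x2])

matrix_is_table = Tables.istable(M)
wrapped_is_table = Tables.istable(wrapped)

# Materialize as a named tuple of equal-length vectors, one per column.
cols = Tables.columntable(wrapped)
first_column = cols.x1
```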
