
Commit 33f3213

make release-tag: Merge branch 'master' into stable

2 parents 96d194f + 0a36d64

28 files changed: +929 -479 lines

.github/ISSUE_TEMPLATE.md

Lines changed: 15 additions & 0 deletions

````diff
@@ -0,0 +1,15 @@
+* ATM version:
+* Python version:
+* Operating System:
+
+### Description
+
+Describe what you were trying to get done.
+Tell us what happened, what went wrong, and what you expected to happen.
+
+### What I Did
+
+```
+Paste the command(s) you ran and the output.
+If there was a crash, please include the traceback here.
+```
````

CLI.md

Lines changed: 97 additions & 19 deletions
````diff
@@ -3,36 +3,34 @@
 **ATM** provides a simple command line client that will allow you to run ATM directly
 from your terminal by simply passing it the path to a CSV file.

-In this example, we will use the default values that are provided in the code, which will use
-the `pollution.csv` that is being generated with the demo datasets by ATM.
+## Quickstart

-## 1. Generate the demo data
+In this example, we will use the default values provided in the code in order to generate
+classifiers.

-**ATM** command line allows you to generate the demo data that we will be using through this steps
-by running the following command:
+### 1. Get the demo data

-```bash
-atm get_demos
-```
+The first step to run **ATM** is to obtain the demo datasets that will be used during
+the rest of the tutorial.

-A print on your console with the generated demo datasets will appear:
+For this demo we will be using the pollution CSV from the
+[demos bucket](https://atm-data.s3.amazonaws.com/index.html), which you can download from
+[here](https://atm-data.s3.amazonaws.com/pollution_1.csv).

-```bash
-Generating file demos/iris.csv
-Generating file demos/pollution.csv
-Generating file demos/pitchfork_genres.csv
-```

-## 2. Create a dataset and generate it's dataruns
+### 2. Create a dataset and generate its dataruns

-Once you have generated the demo datasets, now it's time to create a `dataset` object inside the
+Once you have obtained your demo dataset, it's time to create a `dataset` object inside the
 database. Our command line also triggers the generation of `datarun` objects for this dataset in
 order to automate this process as much as possible:

 ```bash
-atm enter_data
+atm enter_data --train-path path/to/pollution_1.csv
 ```

+Bear in mind that the `--train-path` argument can be a local path, a URL pointing to the CSV
+file, or a complete S3 bucket path.
+
 If you run this command, you will create a dataset with the default values, which is using the
 `pollution_1.csv` dataset from the demo datasets.

````
````diff
@@ -44,7 +42,7 @@ method dt has 2 hyperpartitions
 method knn has 24 hyperpartitions
 Dataruns created. Summary:
 Dataset ID: 1
-Training data: demos/pollution_1.csv
+Training data: path/to/pollution_1.csv
 Test data: None
 Datarun ID: 1
 Hyperpartition selection strategy: uniform
````
````diff
@@ -58,7 +56,7 @@ For more information about the arguments that this command line accepts, please
 atm enter_data --help
 ```

-## 3. Start a worker
+### 3. Start a worker

 **ATM** requires a worker to process the dataruns that are not completed and stored inside the
 database. This worker process will keep running until there are no dataruns `pending`.
````
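A minimal sketch of starting such a worker, assuming the default SQLite setup described later in this diff (the `--sql-config` variant appears at the end of the file):

```bash
# Picks up pending dataruns from the default ./atm.db database and
# keeps processing until none are left pending.
atm worker
```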
````diff
@@ -105,3 +103,83 @@ This command as well offers more information about the arguments that this command
 ```
 atm worker --help
 ```
+
+## Command Line Arguments
+
+You can specify each argument individually on the command line. The names of the
+variables are the same as those described [here](https://hdi-project.github.io/ATM/configuring_atm.html#arguments).
+SQL configuration variables must be prefixed with `sql-`, and AWS configuration variables
+must be prefixed with `aws-`.
+
+### Using command line arguments
+
+Using command line arguments is convenient for quick experiments, or for cases where you
+need to change just a couple of values from the default configuration. For example:
+
+```bash
+atm enter_data --train-path ./data/my-custom-data.csv \
+               --test-path ./data/my-custom-test-data.csv \
+               --selector bestkvel
+```
+
+You can also use a mixture of config files and command line arguments; any command line
+arguments you specify will override the values found in config files.
+
+### Using YAML configuration files
+
+Saving the configuration as YAML files is an easy way to store complicated setups
+or share them with team members.
+
+You should start with the templates provided by the `atm make_config` command:
+
+```bash
+atm make_config
+```
+
+This will generate a folder called `config/templates` in your current working directory
+containing 5 files, which you will need to copy over to the `config` folder and edit
+according to your needs:
+
+```bash
+cp config/templates/*.yaml config/
+vim config/*.yaml
+```
+
+`run.yaml` contains all the settings for a single dataset and datarun. Specify the `train_path`
+to point to your own dataset.
+
+`sql.yaml` contains the settings for the ModelHub SQL database. The default configuration will
+connect to (and create if necessary) a SQLite database at `./atm.db`, relative to the directory
+from which `enter_data.py` is run. If you are using a MySQL database, you will need to change
+the file to something like this:
+
+```yaml
+dialect: mysql
+database: atm
+username: username
+password: password
+host: localhost
+port: 3306
+query:
+```
+
+`aws.yaml` should contain the settings for running ATM in the cloud. This is not necessary
+for local operation.
+
+Once your YAML files have been updated, run the datarun creation command and pass it the paths
+to your new config files:
+
+```bash
+atm enter_data --sql-config config/sql.yaml \
+               --aws-config config/aws.yaml \
+               --run-config config/run.yaml
+```
+
+It's important that the SQL configuration used by the worker matches the configuration you
+passed to `enter_data` -- otherwise, the worker will be looking in the wrong ModelHub
+database for its datarun!
+
+```bash
+atm worker --sql-config config/sql.yaml \
+           --aws-config config/aws.yaml
+```
````
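For contrast with the MySQL example in the new text, a `sql.yaml` for the default local setup might look like the sketch below; `dialect: sqlite` and the empty values are assumptions inferred from the documented `./atm.db` default, not keys confirmed by this commit:

```yaml
# Hypothetical sql.yaml mirroring the documented SQLite default.
dialect: sqlite
database: atm.db    # created at ./atm.db if missing
username:
password:
host:
port:
query:
```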

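And since command line arguments override config file values, a saved `run.yaml` can be reused on a different dataset without editing it; the CSV path below is a placeholder:

```bash
# --train-path on the command line overrides train_path from run.yaml.
atm enter_data --run-config config/run.yaml \
               --train-path ./data/my-other-data.csv
```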