**ATM** provides a simple command line client that allows you to run ATM directly
from your terminal by simply passing it the path to a CSV file.

## Quickstart

In this example, we will use the default values that are provided in the code in order to generate
classifiers.

### 1. Get the demo data

The first step in order to run **ATM** is to obtain the demo datasets that will be used during
the rest of the tutorial.

For this demo we will be using the pollution CSV from the
[demos bucket](https://atm-data.s3.amazonaws.com/index.html), which you can download from
[here](https://atm-data.s3.amazonaws.com/pollution_1.csv).

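For example, you could fetch it from a terminal with `curl` (any download method works); this sketch assumes you want the file in your current working directory:

```bash
# Download the pollution demo dataset from the ATM demos bucket
curl -O https://atm-data.s3.amazonaws.com/pollution_1.csv
```
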
### 2. Create a dataset and generate its dataruns

Once you have obtained your demo dataset, it's time to create a `dataset` object inside the
database. Our command line also triggers the generation of `datarun` objects for this dataset in
order to automate this process as much as possible:

```bash
atm enter_data --train-path path/to/pollution_1.csv
```

Bear in mind that the `--train-path` argument can be a local path, a URL pointing to the CSV file,
or a complete S3 bucket path.

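For instance, since `pollution_1.csv` is also hosted in the demos bucket, the same dataset could be registered directly from its URL instead of a local copy. This is only a sketch of the remote-path form, assuming your machine can reach that bucket; the rest of the tutorial keeps using the local path shown above:

```bash
# Register the dataset from a remote CSV instead of a local file
atm enter_data --train-path https://atm-data.s3.amazonaws.com/pollution_1.csv
```
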
Running `atm enter_data` as shown above with the default values will register the `pollution_1.csv`
demo dataset in the database and print a summary of the generated hyperpartitions and dataruns,
similar to the following:

```bash
method dt has 2 hyperpartitions
method knn has 24 hyperpartitions
Dataruns created. Summary:
    Dataset ID: 1
    Training data: path/to/pollution_1.csv
    Test data: None
    Datarun ID: 1
    Hyperpartition selection strategy: uniform
```

For more information about the arguments that this command line accepts, please run:

```bash
atm enter_data --help
```
### 3. Start a worker

**ATM** requires a worker to process the dataruns that are not completed and are stored inside the
database. This worker process will keep running until there are no `pending` dataruns.

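To get those dataruns processed, start a worker. This is a minimal sketch assuming the default local configuration; the same `atm worker` command appears with explicit config flags at the end of this guide:

```bash
# Start a worker that polls the ModelHub database and processes pending dataruns
atm worker
```
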
This command also offers more information about the arguments that this command line accepts:

```bash
atm worker --help
```
## Command Line Arguments

You can specify each argument individually on the command line. The names of the
variables are the same as those described [here](https://hdi-project.github.io/ATM/configuring_atm.html#arguments).
SQL configuration variables must be prepended by `sql-`, and AWS config variables must be
prepended by `aws-`.

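As an illustrative sketch of that naming convention, and assuming the variable names shown in the `sql.yaml` example below map one-to-one onto flags, individual ModelHub settings could be overridden like this:

```bash
# Override individual SQL settings with sql-prefixed flags (assumed flag names)
atm enter_data --train-path path/to/pollution_1.csv \
    --sql-dialect mysql \
    --sql-database atm
```
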
### Using command line arguments

Using command line arguments is convenient for quick experiments, or for cases where you
need to change just a couple of values from the default configuration. For example:

```bash
atm enter_data --train-path ./data/my-custom-data.csv \
    --test-path ./data/my-custom-test-data.csv \
    --selector bestkvel
```

You can also use a mixture of config files and command line arguments; any command line
arguments you specify will override the values found in config files.

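For example, this sketch reuses the `run.yaml` template described below while overriding only the training data path from the command line:

```bash
# Command line values take precedence over the values in run.yaml
atm enter_data --run-config config/run.yaml \
    --train-path ./data/my-custom-data.csv
```
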
### Using YAML configuration files

Saving the configuration as YAML files is an easy way to store complicated setups
or to share them with team members.

You should start with the templates provided by the `atm make_config` command:

```bash
atm make_config
```
This will generate a folder called `config/templates` in your current working directory which
will contain 5 files. You will need to copy these over to the `config` folder and edit them
according to your needs:

```bash
cp config/templates/*.yaml config/
vim config/*.yaml
```

`run.yaml` contains all the settings for a single dataset and datarun. Specify the `train_path`
to point to your own dataset.

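As a purely hypothetical example of that edit, assuming the copied template contains a top-level `train_path:` entry, you could point it at your own data non-interactively:

```bash
# Hypothetical edit: point run.yaml at your own dataset, keeping the other template defaults
# (GNU sed shown; on macOS use: sed -i '' ...)
sed -i 's|^train_path:.*|train_path: ./data/my-custom-data.csv|' config/run.yaml
```
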
`sql.yaml` contains the settings for the ModelHub SQL database. The default configuration will
connect to (and create if necessary) a SQLite database at `./atm.db`, relative to the directory
from which `enter_data.py` is run. If you are using a MySQL database, you will need to change
the file to something like this:

```yaml
dialect: mysql
database: atm
username: username
password: password
host: localhost
port: 3306
query:
```
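For comparison, here is a sketch of what the SQLite default corresponds to, assuming the same keys as the MySQL example above:

```bash
# Recreate the default configuration: a SQLite database file at ./atm.db (assumed minimal keys)
cat > config/sql.yaml <<'EOF'
dialect: sqlite
database: ./atm.db
EOF
```
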
`aws.yaml` should contain the settings for running ATM in the cloud. This is not necessary
for local operation.

Once your YAML files have been updated, run the datarun creation command and pass it the paths
to your new config files:

```bash
atm enter_data --sql-config config/sql.yaml \
    --aws-config config/aws.yaml \
    --run-config config/run.yaml
```
It's important that the SQL configuration used by the worker matches the configuration you
passed to `enter_data` -- otherwise, the worker will be looking in the wrong ModelHub
database for its dataruns!

```bash
atm worker --sql-config config/sql.yaml \
    --aws-config config/aws.yaml
```