ATM can start a server process that exposes a REST API, implemented on top of Flask.
This document briefly explains how to start it and how to use it.
In this section we will briefly show the basic usage of the REST API.
For more detailed information about all the operations supported by the API, please point your browser to http://127.0.0.1:5000/ and explore the examples provided by the Swagger interface.
To start the REST API server, open a terminal, activate the virtualenv where ATM is installed, and run this command:

```bash
atm start
```
This will start the ATM server as a background service. The REST server will listen on port 5000 of your machine, and if you point your browser at http://127.0.0.1:5000/, you will see the documentation website, which describes all the operations the API supports.
Optionally, the `--port <port>` argument can be added to change the port the server listens on:

```bash
atm start --port 1234
```
If you would like to see the status of the server process, you can run:

```bash
atm status
```

An output similar to this one will appear:

```
ATM is running with 1 worker
ATM REST server is listening on http://127.0.0.1:5000
```
In order to stop the server you can run the following command:

```bash
atm stop
```
Notice that `atm start` will start one worker by default. If you would like to launch more than one, you can do so by adding the `--workers <number_of_workers>` or `-w <number_of_workers>` argument:

```bash
atm start --workers 4
```

For more detailed options, you can run `atm start --help` to obtain a list of the accepted arguments.
Once the server is running, you can register your first dataset using the API. To do so, you need to send the path to your CSV file and the name of your `target_column` in a `POST` request to `/api/datasets`. This call will create a simple `dataset` in our database:
```http
POST /api/datasets HTTP/1.1
Content-Type: application/json

{
    "class_column": "your_target_column",
    "train_path": "path/to/your.csv"
}
```
Once you have created some datasets, you can see them by sending a `GET` request:

```http
GET /api/datasets HTTP/1.1
```

This will return a JSON document with all the information about the stored datasets.
As an example, you can get and register a demo dataset by running the following two commands:

```bash
atm get_demos
curl -v localhost:5000/api/datasets -H'Content-Type: application/json' \
    -d'{"class_column": "class", "train_path": "demos/pollution_1.csv"}'
```
In order to trigger a datarun once you have created a dataset, you have to send its `dataset_id` in a `POST` request to `/api/run`, which will trigger the workers with the default values:

```http
POST /api/run HTTP/1.1
Content-Type: application/json

{
    "dataset_id": id_of_your_dataset
}
```
If you have followed the above example and created a `pollution` dataset in the database, you can run the following `POST` to trigger its datarun:

```bash
curl -v localhost:5000/api/run -H'Content-type: application/json' -d'{"dataset_id": 1}'
```
NOTE: at least one worker must be running in order to process the datarun.
While running, the workers will log what they are doing in the file `atm.log`.
In order to monitor their activity in real time, you can execute this on another terminal:

```bash
tail -f atm.log
```
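Besides tailing the log, you can also follow the progress over the API itself. Here is a short sketch that polls the datarun document until it is marked as complete; it assumes the datarun triggered above got id 1:

```python
import time

import requests

# Poll the datarun document until the workers mark it as complete.
# "complete" is the final status shown in the API outputs below;
# intermediate status values may differ.
while True:
    datarun = requests.get("http://127.0.0.1:5000/api/dataruns/1").json()
    print(datarun["status"])
    if datarun["status"] == "complete":
        break
    time.sleep(5)
```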
Once the database is populated, you can use the REST API to explore the following 4 models:
- Datasets
- Dataruns
- Hyperpartitions
- Classifiers
And these are the operations that can be performed on them:
In order to get all the objects for a single model, you need to make a `GET` request to `/api/<model>`.
The output will be a JSON with 4 entries:

- `num_results`: The number of results found
- `objects`: A list containing a subdocument for each result
- `page`: The current page
- `total_pages`: The number of pages
For example, you can get all the datasets using:

```http
GET /api/datasets HTTP/1.1
```
And the output will be:

```json
{
    "num_results": 1,
    "objects": [
        {
            "class_column": "class",
            "d_features": 16,
            "dataruns": [
                {
                    "budget": 100,
                    "budget_type": "classifier",
                    "dataset_id": 1,
                    "deadline": null,
                    "description": "uniform__uniform",
                    "end_time": "2019-04-11T20:58:11.346733",
                    "gridding": 0,
                    "id": 1,
                    "k_window": 3,
                    "metric": "f1",
                    "priority": 1,
                    "r_minimum": 2,
                    "score_target": "cv_judgment_metric",
                    "selector": "uniform",
                    "start_time": "2019-04-11T20:58:02.514514",
                    "status": "complete",
                    "tuner": "uniform"
                }
            ],
            "description": null,
            "id": 1,
            "k_classes": 2,
            "majority": 0.516666667,
            "n_examples": 60,
            "name": "pollution_1",
            "size_kb": 8,
            "test_path": null,
            "train_path": "/path/to/atm/data/test/pollution_1.csv"
        }
    ],
    "page": 1,
    "total_pages": 1
}
```
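Since results come back paginated, collecting every object of a model means walking through the pages. The following sketch shows that loop; note that passing the page number as a `page` query parameter is an assumption about this API, so adjust it if your server paginates differently:

```python
import requests

def get_all(model):
    """Collect the objects of a model across every result page."""
    objects, page, total_pages = [], 1, 1
    while page <= total_pages:
        data = requests.get(
            "http://127.0.0.1:5000/api/" + model,
            # The `page` query parameter is an assumption about this
            # Flask API; adjust it if your server paginates differently.
            params={"page": page},
        ).json()
        objects.extend(data["objects"])
        total_pages = data["total_pages"]
        page += 1
    return objects

print(len(get_all("datasets")))
```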
In order to get one particular object for a model, you need to make a `GET` request to `/api/<model>/<id>`.
The output will be the document representing the corresponding object.
For example, you can get the dataset with id 1 using:

```http
GET /api/datasets/1 HTTP/1.1
```
And the output will be:

```json
{
    "class_column": "class",
    "d_features": 16,
    "dataruns": [
        {
            "budget": 100,
            "budget_type": "classifier",
            "dataset_id": 1,
            "deadline": null,
            "description": "uniform__uniform",
            "end_time": "2019-04-11T20:58:11.346733",
            "gridding": 0,
            "id": 1,
            "k_window": 3,
            "metric": "f1",
            "priority": 1,
            "r_minimum": 2,
            "score_target": "cv_judgment_metric",
            "selector": "uniform",
            "start_time": "2019-04-11T20:58:02.514514",
            "status": "complete",
            "tuner": "uniform"
        }
    ],
    "description": null,
    "id": 1,
    "k_classes": 2,
    "majority": 0.516666667,
    "n_examples": 60,
    "name": "pollution_1",
    "size_kb": 8,
    "test_path": null,
    "train_path": "/path/to/atm/data/test/pollution_1.csv"
}
```
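For instance, here is a short sketch that fetches this same document from Python and prints the status of each embedded datarun, using the field names shown above:

```python
import requests

# Fetch a single dataset document and inspect its embedded dataruns.
dataset = requests.get("http://127.0.0.1:5000/api/datasets/1").json()
print(dataset["name"], dataset["n_examples"])
for datarun in dataset["dataruns"]:
    print(datarun["id"], datarun["status"])
```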
In order to get all the children objects from one parent object, you need to make a `GET` request to `/api/<parent_model>/<parent_id>/<child_model>`.
The output will be in the same format as if you had requested all the elements from the child model, but with the results filtered by the parent one.
So, for example, in order to get all the dataruns that use the dataset with id 1, you can use:

```http
GET /api/datasets/1/dataruns HTTP/1.1
```
And the output will be (note that some parts have been cut):

```json
{
    "num_results": 1,
    "objects": [
        {
            "budget": 100,
            "budget_type": "classifier",
            "classifiers": [
                {
                    "cv_judgment_metric": 0.8444444444,
                    "cv_judgment_metric_stdev": 0.1507184441,
                    "datarun_id": 1,
                    "end_time": "2019-04-11T20:58:02.600185",
                    "error_message": null,
                    "host": "83.56.245.36",
                    "hyperparameter_values_64": "gAN9cQAoWAsAAABuX25laWdoYm9yc3EBY251bXB5LmNvcmUubXVsdGlhcnJheQpzY2FsYXIKcQJjbnVtcHkKZHR5cGUKcQNYAgAAAGk4cQRLAEsBh3EFUnEGKEsDWAEAAAA8cQdOTk5K/////0r/////SwB0cQhiQwgPAAAAAAAAAHEJhnEKUnELWAkAAABsZWFmX3NpemVxDGgCaAZDCCsAAAAAAAAAcQ2GcQ5ScQ9YBwAAAHdlaWdodHNxEFgIAAAAZGlzdGFuY2VxEVgJAAAAYWxnb3JpdGhtcRJYCQAAAGJhbGxfdHJlZXETWAYAAABtZXRyaWNxFFgJAAAAbWFuaGF0dGFucRVYBgAAAF9zY2FsZXEWiHUu",
                    "hyperpartition_id": 23,
                    "id": 1,
                    "metrics_location": "metrics/pollution_1-4bc39b14.metric",
                    "model_location": "models/pollution_1-4bc39b14.model",
                    "start_time": "2019-04-11T20:58:02.539046",
                    "status": "complete",
                    "test_judgment_metric": 0.6250000000
                },
                ...
            ],
            "dataset": {
                "class_column": "class",
                "d_features": 16,
                "description": null,
                "id": 1,
                "k_classes": 2,
                "majority": 0.516666667,
                "n_examples": 60,
                "name": "pollution_1",
                "size_kb": 8,
                "test_path": null,
                "train_path": "/path/to/atm/data/test/pollution_1.csv"
            },
            "dataset_id": 1,
            "deadline": null,
            "description": "uniform__uniform",
            "end_time": "2019-04-11T20:58:11.346733",
            "gridding": 0,
            "hyperpartitions": [
                {
                    "categorical_hyperparameters_64": "gANdcQAoWAcAAABwZW5hbHR5cQFYAgAAAGwxcQKGcQNYDQAAAGZpdF9pbnRlcmNlcHRxBIiGcQVlLg==",
                    "constant_hyperparameters_64": "gANdcQAoWAwAAABjbGFzc193ZWlnaHRxAVgIAAAAYmFsYW5jZWRxAoZxA1gGAAAAX3NjYWxlcQSIhnEFZS4=",
                    "datarun_id": 1,
                    "id": 1,
                    "method": "logreg",
                    "status": "incomplete",
                    "tunable_hyperparameters_64": "gANdcQAoWAEAAABDcQFjYnRiLmh5cGVyX3BhcmFtZXRlcgpGbG9hdEV4cEh5cGVyUGFyYW1ldGVyCnECY2J0Yi5oeXBlcl9wYXJhbWV0ZXIKUGFyYW1UeXBlcwpxA0sFhXEEUnEFXXEGKEc+5Pi1iONo8UdA+GoAAAAAAGWGcQeBcQh9cQkoWAwAAABfcGFyYW1fcmFuZ2VxCmgGWAUAAAByYW5nZXELXXEMKEfAFAAAAAAAAEdAFAAAAAAAAGV1YoZxDVgDAAAAdG9scQ5oAmgFXXEPKEc+5Pi1iONo8UdA+GoAAAAAAGWGcRCBcRF9cRIoaApoD2gLXXETKEfAFAAAAAAAAEdAFAAAAAAAAGV1YoZxFGUu"
                },
                ...
            ],
            "id": 1,
            "k_window": 3,
            "metric": "f1",
            "priority": 1,
            "r_minimum": 2,
            "score_target": "cv_judgment_metric",
            "selector": "uniform",
            "start_time": "2019-04-11T20:58:02.514514",
            "status": "complete",
            "tuner": "uniform"
        }
    ],
    "page": 1,
    "total_pages": 1
}
```
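Combining the patterns above, here is a sketch that asks for the classifiers of datarun 1 and picks the one with the best cross-validation score, using the field names shown in the output above:

```python
import requests

# Children of a datarun, via /api/<parent_model>/<parent_id>/<child_model>.
data = requests.get("http://127.0.0.1:5000/api/dataruns/1/classifiers").json()

# Keep the finished classifiers and rank them by cross-validation score.
# Note that only the first result page is inspected in this sketch.
complete = [c for c in data["objects"] if c["status"] == "complete"]
best = max(complete, key=lambda c: c["cv_judgment_metric"])
print(best["id"], best["cv_judgment_metric"], best["model_location"])
```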
If you would like to run more workers, or you would like to launch a second ATM process, you can do so by specifying a different PID file. For example:

```bash
atm start --no-server -w 4 --pid additional_workers.pid
```
To check the status of this process, we have to run:

```bash
atm status --pid additional_workers.pid
```

This will print an output like this:

```
ATM is running with 4 workers
```
If you have an ATM process running and you would like to restart it to add more workers or change the port on which it is running, you can do so with the `atm restart` command:

```bash
atm restart
```

This command will restart the server with the default values, so if you would like to use other options, run it with `--help` to see the accepted arguments:

```bash
atm restart --help
```
As we saw before, running the command `atm stop` will terminate the ATM process. However, this command accepts a few arguments in order to control this behaviour:

- `-t TIMEOUT`, `--timeout TIMEOUT`: Time to wait while checking whether the process has been terminated.
- `-f`, `--force`: Kill the process if it does not terminate gracefully.
- `--pid PIDFILE`: PID file to use.
If you would like to run the server in the foreground in order to monitor it for debugging purposes, you can do so by running the following command:

```bash
atm server
```

An output similar to this one should appear in the terminal:
```
 * Serving Flask app "api.setup" (lazy loading)
 * Environment: production
   WARNING: Do not use the development server in a production environment.
   Use a production WSGI server instead.
 * Debug mode: on
 * Running on http://127.0.0.1:5000/ (Press CTRL+C to quit)
 * Restarting with stat
 * Debugger is active!
 * Debugger PIN: 150-127-826
```
For additional arguments, run `atm server --help`.

Note that this command will not launch any worker processes. In order to launch a foreground worker, you have to run `atm worker`.