You are limited only by what you can do programmatically in Python and with APIs.
## Creating your first DAG
Below is a simple DAG workflow you can use as a starting point. It's written to enrich an IPv4 address, but it could be modified for any entity type. Based on how you set up your Airflow instance and define your DAG endpoints, you should name this script to match the endpoint. In this case, where we define the endpoint (see the workflow above) as `/api/v1/dags/scot_entity_[ENTITY_TYPE_PLACEHOLDER]_enrichment/dagRuns`, and assuming you have an entity type named `ipaddr`, your DAG should be named `scot_entity_ipaddr_enrichment`.
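
To make the naming convention concrete, here is a minimal sketch of what a trigger request for the `ipaddr` entity type might look like against Airflow's stable REST API (assuming basic auth is enabled on your Airflow instance). The host, credentials, and conf values are placeholders, not part of the SCOT code:

```
import requests

## Placeholder values; substitute your Airflow host, credentials,
## and the real entity details from SCOT.
entity_type = "ipaddr"
dag_id = f"scot_entity_{entity_type}_enrichment"

res = requests.post(
    f"https://airflow.example.com/api/v1/dags/{dag_id}/dagRuns",
    auth=("airflow_user", "airflow_password"),
    json={"conf": {
        "entity_id": 1234,
        "entity_value": "8.8.8.8",
        "callback_url": "https://scot4-test.domain.com/api/v1",
    }},
)
res.raise_for_status()
```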
### MaxMind DB
This enrichment takes a given IPv4 address and performs a lookup in a database file from MaxMind that can be stored locally on the same server where your Airflow instance is running. You can create an account under MaxMind's GeoLite2 service (https://www.maxmind.com/en/geolite2/signup), which gives you access to their database files, which you can download and use. In this particular example, the database file we expect to use is 'GeoLite2-City.mmdb'. Once you have this file, place it somewhere on your Airflow server and note the file path for later.
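
For reference, here is a minimal sketch of what the lookup itself involves, using the `geoip2` Python package (`pip install geoip2`). The database path and IP address below are placeholder assumptions; use the path where you saved the file:

```
import geoip2.database

## Path is an assumption -- point this at wherever you placed GeoLite2-City.mmdb.
with geoip2.database.Reader("/opt/airflow/data/GeoLite2-City.mmdb") as reader:
    response = reader.city("8.8.8.8")
    print(response.country.iso_code)    ## e.g. "US"
    print(response.city.name)           ## nearest city; may be None
    print(response.location.latitude, response.location.longitude)
```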
### Sample DAG code
See the Airflow documentation for creating a new DAG. Copy the following code and save it as described in the "Creating your first DAG" section above.
```
import requests  ## needed for the callback POST below
from airflow.utils.task_group import TaskGroup
import json
import pendulum
from airflow.decorators import task, dag, task_group
from airflow.models import Variable
from airflow.operators.python import get_current_context
from airflow.models.log import Log
from airflow.utils.db import create_session
from airflow.timetables.trigger import CronTriggerTimetable
import logging
import os


@dag(
    timetable=None, ## Add your schedule interval here; any valid cron notation (https://en.wikipedia.org/wiki/Cron) works. Use CronTriggerTimetable like this example: CronTriggerTimetable("0 */4 * * *", timezone="UTC")
    ## ... remaining DAG arguments omitted here ...
)
def scot4_simple_example():

    ## ... the enrichment task (the MaxMind lookup described above) and the
    ## start of add_enrichment_to_scot, including how callback_url and the
    ## target url are derived from the DAG params, are omitted here ...

    @task
    def add_enrichment_to_scot(results):
        ## Airflow can store secrets; use that to store an API key for your SCOT instance and retrieve it using Variable.get()
        ## If you have a second/multiple SCOT instances, you can store multiple API keys and retrieve the correct one based on which callback URL is being given
        if callback_url is not None and "scot4-prod" in callback_url:
            api_key = Variable.get('scot4-api-key')
        elif callback_url is not None and "scot4-test" in callback_url:
            api_key = Variable.get('scot4-test-api-key')

        if callback_url is not None and results.get('enrichment_data') is not None and len(results['enrichment_data']) > 0:
            for enrichment_data in results['enrichment_data']:
                res = requests.post(url, data=json.dumps(enrichment_data), headers={
                    'Content-Type': 'application/json',
                    'Authorization': f'apikey {api_key}',
                })
                if not res.ok:
                    raise Exception(f"Request to {url} failed: {res.status_code} {res.reason}: {res.text}")
                print(f"Request to {url} succeeded: {res.status_code} {res.reason}: {res.text}")

    ## This is where we define the flow of the DAG.
    ## See https://airflow.apache.org/docs/apache-airflow/stable/concepts/dags.html
    ## for more information about defining the control flow of DAGs.

    task_results = enrichment_task()
    add_enrichment_to_scot(task_results)


dag = scot4_simple_example()
```
### Testing your DAG
See the Airflow documentation for how to trigger a DAG from the Airflow UI. When you trigger a DAG, you're asked to supply values for the params specified in the code. You should already have the entity you want to test with registered/flaired in SCOT so that it has an entity ID. The params are:
- `entity_id`: The ID of the entity you're testing
- `entity_value`: The value of the entity (try the IP for maxmind.com)
- `callback_url`: The full URL to your SCOT API as defined in your SCOT-API config settings (ex. https://scot4-test.domain.com/api/v1)
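
For example, the trigger configuration for a test run might look like the following; the ID and IP here are made-up placeholder values:

```
{
  "entity_id": 1234,
  "entity_value": "198.51.100.7",
  "callback_url": "https://scot4-test.domain.com/api/v1"
}
```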
Once you enter these fields and run your DAG, you'll see its progress and whether the operations completed successfully (green) or not (red). If the DAG failed, click the red box indicating the portion of the DAG that failed, then select the log tab on the right to check for any errors. See the Airflow docs for details.