Architecture
Valentin Kuznetsov edited this page Feb 7, 2018 · 4 revisions
The CMSSpark architecture is shown in the figure below:
It consists of several components:
- a wrapper script, run_spark
- user-provided Python template code (which we call a workflow) that should implement the initialization of the Spark context and the data processing pipeline
- dbs_aaa.py, an example Python template that aggregates data across CMS DBS and AAA records on HDFS. More examples can be found in the same location
- cern_monit.py, an example Python template that sends data to the CERN MONIT system
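To make the idea of a workflow concrete, here is a minimal sketch of what such a template might look like. It is not taken from dbs_aaa.py or any other CMSSpark template: the option names `--hdir` and `--fout`, the JSON input format, and the `dataset` column are all assumptions chosen for illustration. It only shows the two responsibilities listed above, initializing Spark and running a data processing pipeline.

```python
import argparse


def parse_args(argv=None):
    """Parse the command-line options of this workflow (hypothetical names)."""
    parser = argparse.ArgumentParser(description="Example CMSSpark-style workflow")
    parser.add_argument("--hdir", required=True,
                        help="input HDFS directory with JSON records (assumed format)")
    parser.add_argument("--fout", required=True,
                        help="output HDFS directory for the aggregated result")
    return parser.parse_args(argv)


def aggregate(df, key="dataset"):
    """Data processing pipeline: count records per value of `key`.

    The `dataset` column is an assumption, not a documented schema.
    """
    return df.groupBy(key).count()


def main(argv=None):
    # Imported here so the module can be read and tested without Spark installed.
    from pyspark.sql import SparkSession

    opts = parse_args(argv)
    # Step 1: initialize Spark (a SparkSession wraps the Spark context).
    spark = SparkSession.builder.appName("cmsspark_example").getOrCreate()
    try:
        # Step 2: run the pipeline and store the result back on HDFS.
        records = spark.read.json(opts.hdir)
        aggregate(records).write.mode("overwrite").csv(opts.fout, header=True)
    finally:
        spark.stop()
```

A real template would end by calling `main()`. Keeping the pipeline in a separate function such as `aggregate` lets it be exercised independently of how the Spark session is created.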
run_spark loads the provided Python template code and runs its data processing pipeline. It stores the data back to HDFS, where it can be inspected. Optionally, the end user can call the run_spark cern_monit.py bundle to put the data into the CERN MONIT system (via a Stomp AMQ call to a specified end-point).
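The Stomp AMQ call mentioned above could be sketched as follows, assuming the third-party stomp.py package. This is not the code of cern_monit.py: the envelope layout built by `encode_record`, and every parameter of `send_records` (host, port, credentials, destination, producer), are hypothetical placeholders that would have to be replaced with values from the actual MONIT setup.

```python
import json


def encode_record(record, producer):
    """Wrap one result record in a JSON envelope.

    The envelope layout is an assumption, not the documented MONIT format.
    """
    return json.dumps({"producer": producer, "data": record}, sort_keys=True)


def send_records(records, host, port, user, passcode, destination, producer):
    """Send each record as one STOMP message to the given end-point."""
    # Imported lazily: stomp.py is a third-party package.
    import stomp

    conn = stomp.Connection(host_and_ports=[(host, port)])
    conn.connect(user, passcode, wait=True)
    try:
        for record in records:
            conn.send(destination=destination,
                      body=encode_record(record, producer))
    finally:
        # Always close the connection, even if a send fails.
        conn.disconnect()
```

Separating the encoding step from the network call makes it possible to check the message format without a broker.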
All Python template code is based on the PySpark architecture.