From 56c2ad4b0c108afe4005b81d879e6cba6d9e0fa9 Mon Sep 17 00:00:00 2001
From: Hong
Date: Fri, 3 Sep 2021 13:43:12 +0800
Subject: [PATCH] [SQL-DS-CACHE-201] Update guide for OAP 1.2.0 (#204)

---
 docs/Developer-Guide.md |  2 +-
 docs/User-Guide.md      | 12 ++++++------
 2 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/docs/Developer-Guide.md b/docs/Developer-Guide.md
index 7294a930a..93fac8a8f 100644
--- a/docs/Developer-Guide.md
+++ b/docs/Developer-Guide.md
@@ -36,7 +36,7 @@ To use optimized Plasma cache with OAP, you need following components:
 ```
 cd /tmp
 git clone https://github.com/oap-project/arrow.git
-cd arrow && git checkout arrow-4.0.0-oap-1.1.1
+cd arrow && git checkout v4.0.0-oap-1.2.0
 cd cpp
 mkdir release
 cd release
diff --git a/docs/User-Guide.md b/docs/User-Guide.md
index 8b19464f7..b267acecf 100644
--- a/docs/User-Guide.md
+++ b/docs/User-Guide.md
@@ -263,7 +263,7 @@ Socket Configuration -> Intel UPI General Configuration -> Stale AtoS : Disable
 
 For more information you can refer to [Quick Start Guide: Provision Intel® Optane™ DC Persistent Memory](https://software.intel.com/content/www/us/en/develop/articles/quick-start-guide-configure-intel-optane-dc-persistent-memory-on-linux.html)
 
-- SQL Data Source Cache uses Plasma as a node-level external cache service, the benefit of using external cache is data could be shared across process boundaries. [Plasma](http://arrow.apache.org/blog/2017/08/08/plasma-in-memory-object-store/) is a high-performance shared-memory object store and a component of [Apache Arrow](https://github.com/apache/arrow). We have modified Plasma to support PMem, and make it open source on [oap-project-Arrow](https://github.com/oap-project/arrow/tree/arrow-4.0.0-oap-1.1.1) repo. If you have finished [OAP Installation Guide](OAP-Installation-Guide.md), Plasma will be automatically installed and then you just need copy `arrow-plasma-4.0.0.jar` to `$SPARK_HOME/jars`. For manual building and installation steps you can refer to [Plasma installation](./Developer-Guide.md#Plasma-installation).
+- SQL Data Source Cache uses Plasma as a node-level external cache service, the benefit of using external cache is data could be shared across process boundaries. [Plasma](http://arrow.apache.org/blog/2017/08/08/plasma-in-memory-object-store/) is a high-performance shared-memory object store and a component of [Apache Arrow](https://github.com/apache/arrow). We have modified Plasma to support PMem, and make it open source on [oap-project-Arrow](https://github.com/oap-project/arrow/tree/arrow-4.0.0-oap-1.2) repo. If you have finished [OAP Installation Guide](OAP-Installation-Guide.md), Plasma will be automatically installed and then you just need copy `arrow-plasma-4.0.0.jar` to `$SPARK_HOME/jars`. For manual building and installation steps you can refer to [Plasma installation](./Developer-Guide.md#Plasma-installation).
 
 - Refer to configuration below to apply external cache strategy and start Plasma service on each node and start your workload.
 
@@ -280,11 +280,11 @@ spark.executor.instances 6
 
 spark.sql.extensions org.apache.spark.sql.OapExtensions
 # absolute path of the jar on your working node, when in Yarn client mode
-spark.files $HOME/miniconda2/envs/oapenv/oap_jars/plasma-sql-ds-cache--with-spark-.jar,$HOME/miniconda2/envs/oapenv/oap_jars/pmem-common--with-spark-.jar
+spark.files $HOME/miniconda2/envs/oapenv/oap_jars/plasma-sql-ds-cache--with-spark-.jar,$HOME/miniconda2/envs/oapenv/oap_jars/pmem-common--with-spark-.jar,$HOME/miniconda2/envs/oapenv/oap_jars/arrow-plasma-4.0.0.jar
 # relative path to spark.files, just specify jar name in current dir, when in Yarn client mode
-spark.executor.extraClassPath ./plasma-sql-ds-cache--with-spark-.jar:./pmem-common--with-spark-.jar
+spark.executor.extraClassPath ./plasma-sql-ds-cache--with-spark-.jar:./pmem-common--with-spark-.jar:./arrow-plasma-4.0.0.jar
 # absolute path of the jar on your working node,when in Yarn client mode
-spark.driver.extraClassPath $HOME/miniconda2/envs/oapenv/oap_jars/plasma-sql-ds-cache--with-spark-.jar:$HOME/miniconda2/envs/oapenv/oap_jars/pmem-common--with-spark-.jar
+spark.driver.extraClassPath $HOME/miniconda2/envs/oapenv/oap_jars/plasma-sql-ds-cache--with-spark-.jar:$HOME/miniconda2/envs/oapenv/oap_jars/pmem-common--with-spark-.jar:$HOME/miniconda2/envs/oapenv/oap_jars/arrow-plasma-4.0.0.jar
 
 # for parquet file format, enable binary cache
 spark.sql.oap.parquet.binary.cache.enabled true
@@ -355,7 +355,7 @@ Run ```yarn app -destroy plasma-store-service```to destroy it.
 
 This section provides instructions and tools for running TPC-DS queries to evaluate the cache performance of various configurations. The TPC-DS suite has many queries and we select 9 I/O intensive queries to simplify performance evaluation.
 
-We created some tool scripts [oap-benchmark-tool.zip](https://github.com/oap-project/oap-tools/releases/download/v1.1.1-spark-3.1.1/oap-benchmark-tool.zip) to simplify running the workload. If you are already familiar with TPC-DS data generation and running a TPC-DS tool suite, skip our tool and use the TPC-DS tool suite directly.
+We created some tool scripts [oap-benchmark-tool.zip](https://github.com/oap-project/oap-tools/releases/download/v1.2.0/oap-benchmark-tool.zip) to simplify running the workload. If you are already familiar with TPC-DS data generation and running a TPC-DS tool suite, skip our tool and use the TPC-DS tool suite directly.
 
 ### Prerequisites
 
@@ -363,7 +363,7 @@ We created some tool scripts [oap-benchmark-tool.zip](https://github.com/oap-pro
 
 ### Prepare the Tool
 
-1. Download [oap-benchmark-tool.zip](https://github.com/oap-project/oap-tools/releases/download/v1.1.1-spark-3.1.1/oap-benchmark-tool.zip) and unzip to a folder (for example, `oap-benchmark-tool` folder) on your working node.
+1. Download [oap-benchmark-tool.zip](https://github.com/oap-project/oap-tools/releases/download/v1.2.0/oap-benchmark-tool.zip) and unzip to a folder (for example, `oap-benchmark-tool` folder) on your working node.
 2. Copy `oap-benchmark-tool/tools/tpcds-kits` to ***ALL*** worker nodes under the same folder (for example, `/home/oap/tpcds-kits`).
 
 ### Generate TPC-DS Data