Releases: cas-bigdatalab/piflow
PiFlow V1.0 Release
Requirements
- JDK 1.8
- Scala 2.11.8
- Spark-2.1.0, Spark-2.2.0, or Spark-2.3.0 (for other Spark versions, build piflow.jar from source)
- Hadoop-2.6.0 (for other Hadoop versions, build piflow.jar from source)
- Hive-1.2.1 (only if you need Hive; set it up and modify config.properties)
config.properties
spark.master=yarn
spark.deploy.mode=cluster
#hdfs default file system
fs.defaultFS=hdfs://10.0.85.83:9000
#yarn resourcemanager hostname
yarn.resourcemanager.hostname=10.0.85.83
#if you want to use hive, set hive metastore uris
#hive.metastore.uris=thrift://10.0.85.83:9083
#number of data rows shown in the log; set 0 to show none
data.show=10
#monitor the throughput of flow
monitor.throughput=true
#server port
server.port=8001
#h2db port
h2.port=50001
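The file above follows Java-properties-style syntax: one key=value per line, with # marking a comment (note that hive.metastore.uris is commented out by default). As a sanity check before starting the server, a minimal sketch of parsing such a file in Python; the key names come from the listing above, but the parser itself is an illustration, not part of PiFlow:

```python
def parse_properties(text):
    """Parse java-properties-style key=value lines, skipping blanks and # comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # blank line or comment
        key, sep, value = line.partition("=")
        if sep:  # keep only well-formed key=value lines
            props[key.strip()] = value.strip()
    return props

# Excerpt of the config.properties shown above
sample = """
spark.master=yarn
spark.deploy.mode=cluster
#hive.metastore.uris=thrift://10.0.85.83:9083
data.show=10
server.port=8001
"""
props = parse_properties(sample)
print(props["spark.master"])            # yarn
print("hive.metastore.uris" in props)   # False: the line is commented out
```

This makes it easy to verify that a commented-out key (such as hive.metastore.uris) really is inactive before troubleshooting Hive connectivity.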
Command
./start.sh
./stop.sh
./restart.sh
./status.sh
PiFlow V0.9 Release
Requirements
- JDK 1.8
- Scala 2.11.8
- Spark-2.1.0, Spark-2.2.0, or Spark-2.3.0 (for other Spark versions, build piflow.jar from source)
- Hadoop-2.6.0 (for other Hadoop versions, build piflow.jar from source)
- Hive-1.2.1 (only if you need Hive; set it up and modify config.properties)
config.properties
spark.master=yarn
spark.deploy.mode=cluster
#hdfs default file system
fs.defaultFS=hdfs://10.0.85.83:9000
#yarn resourcemanager hostname
yarn.resourcemanager.hostname=10.0.85.83
#if you want to use hive, set hive metastore uris
#hive.metastore.uris=thrift://10.0.85.83:9083
#number of data rows shown in the log; set 0 to show none
data.show=10
#monitor the throughput of flow
monitor.throughput=true
#server port
server.port=8001
#h2db port
h2.port=50001
Command
./start.sh
./stop.sh
./restart.sh
./status.sh
PiFlow V0.8 Release
Requirements
- JDK 1.8
- Scala 2.11.8
- Spark-2.1.0, Spark-2.2.0, or Spark-2.3.0 (for other Spark versions, build piflow.jar from source)
- Hadoop-2.6.0 (for other Hadoop versions, build piflow.jar from source)
- Hive-1.2.1 (only if you need Hive; set it up and modify config.properties)
config.properties
spark.master=yarn
spark.deploy.mode=cluster
#hdfs default file system
fs.defaultFS=hdfs://10.0.85.83:9000
#yarn resourcemanager hostname
yarn.resourcemanager.hostname=10.0.85.83
#if you want to use hive, set hive metastore uris
#hive.metastore.uris=thrift://10.0.85.83:9083
#number of data rows shown in the log; set 0 to show none
data.show=10
#monitor the throughput of flow
monitor.throughput=true
#server port
server.port=8001
#h2db port
h2.port=50001
Command
./start.sh
./stop.sh
./restart.sh
./status.sh
PiFlow V0.7-spark-3.0.0 Release
Requirements
- JDK 1.8
- Scala 2.12.10
- Spark-3.0.0
- Hadoop-3.2.0
config.properties
spark.master=yarn
spark.deploy.mode=cluster
#hdfs default file system
fs.defaultFS=hdfs://10.0.85.83:9000
#yarn resourcemanager hostname
yarn.resourcemanager.hostname=10.0.85.83
#if you want to use hive, set hive metastore uris
#hive.metastore.uris=thrift://10.0.85.83:9083
#number of data rows shown in the log; set 0 to show none
data.show=10
#monitor the throughput of flow
monitor.throughput=true
#server port
server.port=8001
#h2db port
h2.port=50001
Command
./start.sh
./stop.sh
./restart.sh
./status.sh
PiFlow V0.7 Release
Requirements
- JDK 1.8
- Scala 2.11.8
- Spark-2.1.0, Spark-2.2.0, or Spark-2.3.0 (for other Spark versions, build piflow.jar from source)
- Hadoop-2.6.0 (for other Hadoop versions, build piflow.jar from source)
- Hive-1.2.1 (only if you need Hive; set it up and modify config.properties)
config.properties
spark.master=yarn
spark.deploy.mode=cluster
#hdfs default file system
fs.defaultFS=hdfs://10.0.85.83:9000
#yarn resourcemanager hostname
yarn.resourcemanager.hostname=10.0.85.83
#if you want to use hive, set hive metastore uris
#hive.metastore.uris=thrift://10.0.85.83:9083
#number of data rows shown in the log; set 0 to show none
data.show=10
#monitor the throughput of flow
monitor.throughput=true
#server port
server.port=8001
#h2db port
h2.port=50001
Command
./start.sh
./stop.sh
./restart.sh
./status.sh
PiFlow V0.6 Release
Requirements
- JDK 1.8 or newer
- Spark-2.1.0 (for other Spark versions, build piflow.jar from source)
- Hadoop-2.6.0 (for other Hadoop versions, build piflow.jar from source)
- Hive-1.2.1 (only if you need Hive)
Configure
#server Ip and Port
server.ip=10.0.88.70
server.port=8002
#Spark master and deploy mode
spark.master=yarn
spark.deploy.mode=cluster
#yarn related configurations
yarn.resourcemanager.hostname=10.0.88.70
yarn.resourcemanager.address=10.0.88.70:8032
yarn.access.namenode=hdfs://10.0.88.70:9000
yarn.stagingDir=hdfs://10.0.88.70:9000/tmp/
yarn.jars=hdfs://10.0.88.70:9000/user/spark/share/lib/*.jar
yarn.url=http://10.0.88.70:8088/ws/v1/cluster/apps/
#hive metaStore uris
hive.metastore.uris=thrift://10.0.88.71:9083
#piflow server jar folder, please change this parameter to your path
piflow.bundle=/data/piflow/piflow-server-v0.6/lib/piflow-server-0.9.jar
#hdfs paths for checkpoint, debug, and increment data; please create these folders first
checkpoint.path=hdfs://10.0.88.70:9000/user/piflow/checkpoints/
debug.path=hdfs://10.0.88.70:9000/user/piflow/debug/
increment.path=hdfs://10.0.88.70:9000/user/piflow/increment/
#set 0 if you do not want to show data in the log
data.show=10
#h2 db port
h2.port=50002
Command
./start.sh
./stop.sh
./restart.sh
./status.sh
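After running ./start.sh, a quick way to confirm the server came up is a TCP check against the server.ip and server.port values from the Configure section above. A minimal sketch in Python; the host and port below are the example values from this release's config, and the check itself is an illustration (it only proves the port accepts connections, not that PiFlow is fully healthy):

```python
import socket

def port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Example values from config.properties above -- replace with your deployment's.
if port_open("10.0.88.70", 8002):
    print("PiFlow server port is reachable")
else:
    print("PiFlow server port is not reachable; check piflow.log")
```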
PiFlow V0.5 Release
Requirements
- JDK 1.8 or newer
- Spark-2.1.0
- Hadoop-2.6.0
- Hive-1.2.1
- Other products you want to use, such as Elasticsearch, Solr, MongoDB, etc.
Configure
config.properties
#server ip and port
server.ip=10.0.86.191
server.port=8002
#h2 db port
h2.port=50002
#spark and yarn config
spark.master=yarn
spark.deploy.mode=cluster
yarn.resourcemanager.hostname=10.0.86.191
yarn.resourcemanager.address=10.0.86.191:8032
yarn.access.namenode=hdfs://10.0.86.191:9000
yarn.stagingDir=hdfs://10.0.86.191:9000/tmp/
yarn.jars=hdfs://10.0.86.191:9000/user/spark/share/lib/*.jar
yarn.url=http://10.0.86.191:8088/ws/v1/cluster/apps/
#hive config
hive.metastore.uris=thrift://10.0.86.191:9083
#piflow-server.jar path, remember to modify
piflow.bundle=/opt/piflowServer/piflow-server-0.9.jar
#checkpoint hdfs path
checkpoint.path=hdfs://10.0.86.89:9000/piflow/checkpoints/
#debug path
debug.path=hdfs://10.0.88.191:9000/piflow/debug/
#the count of data rows shown in the log; set 0 to show no data
data.show=10
Run Command
- start: ./start.sh or nohup ./start.sh > piflow.log 2>&1 &
- stop: ./stop.sh