Create Upfront Index
Anil Shanbhag edited this page May 13, 2016
Fabric scripts are already configured to do this with three main commands. Before you can use them, you need to set up some configuration.
- First, run `jps` and make sure Hadoop, Spark, and ZooKeeper are up and running.
- Next, open `scripts/fabfile/confs.py`. Change the appropriate settings of `local_`, or create a new conf entry that matches your development environment.
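The `jps` check can be scripted so it is easy to repeat on every machine. This is a minimal sketch, not part of the repository's fabfile: the helper names are hypothetical, and the expected daemon set is an assumption. It assumes HDFS (`NameNode`, `DataNode`), Spark standalone (`Master`, `Worker`) and ZooKeeper (`QuorumPeerMain`), which are the main-class names `jps` reports for those services. A master and a worker run different subsets, so trim the set per machine.

```python
import subprocess

# ASSUMPTION: daemon main-class names for an HDFS + Spark standalone + ZooKeeper
# deployment. Adjust per machine (e.g. YARN would add ResourceManager/NodeManager).
EXPECTED = {
    "NameNode",        # HDFS master
    "DataNode",        # HDFS worker
    "Master",          # Spark standalone master
    "Worker",          # Spark standalone worker
    "QuorumPeerMain",  # ZooKeeper server
}


def running_daemons(jps_output: str) -> set:
    """Parse `jps` output, one "<pid> <MainClass>" per line, into class names."""
    names = set()
    for line in jps_output.splitlines():
        parts = line.split()
        if len(parts) >= 2:
            names.add(parts[1])
    return names


def missing_daemons(jps_output: str, expected=EXPECTED) -> set:
    """Return the expected daemons that do not appear in the `jps` output."""
    return set(expected) - running_daemons(jps_output)


def check_local_daemons(expected=EXPECTED) -> None:
    """Run `jps` on this machine and report any expected daemon that is missing."""
    try:
        out = subprocess.run(["jps"], capture_output=True, text=True).stdout
    except FileNotFoundError:
        print("`jps` not found; is a JDK on your PATH?")
        return
    missing = missing_daemons(out, expected)
    if missing:
        print("Not running:", ", ".join(sorted(missing)))
    else:
        print("All expected daemons are running.")


if __name__ == "__main__":
    check_local_daemons()
```

On a worker you might call `check_local_daemons({"DataNode", "Worker"})` instead of using the full set.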
Then run:

`fab setup:<your conf entry> create_table_info bulk_sample_gen create_robust_tree write_partitions`
Here is what the main commands do:
- `bulk_sample_gen` runs on each machine and samples the input data files, based on the sampling percentage specified in the conf.
- `create_robust_tree` runs only on the master and creates an upfront partitioning tree from the samples.
- `write_partitions` runs on each machine, takes the index as input, and writes the partitioned input data into HDFS.
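The three stages can be pictured with a small in-memory sketch: sample the data, build a partitioning tree from the sample, then route every record to a leaf partition. This is an illustration under stated assumptions, not the project's implementation. It uses Bernoulli sampling and a plain median-split (k-d-tree style) tree, which is a generic stand-in, not necessarily the algorithm behind `create_robust_tree`. All function names are hypothetical, records are assumed to be tuples of numeric attributes, and the output is a dict rather than files in HDFS. In the real workflow, sampling and writing run on every machine, while tree building runs only on the master.

```python
import random
from dataclasses import dataclass
from typing import Optional


def bernoulli_sample(records, fraction, seed=0):
    """Stand-in for bulk_sample_gen: keep each record with probability `fraction`."""
    rng = random.Random(seed)
    return [r for r in records if rng.random() < fraction]


@dataclass
class Node:
    attr: int = -1                  # attribute index to split on (internal nodes)
    cut: float = 0.0                # values <= cut go left, others go right
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    bucket: int = -1                # partition id; >= 0 only for leaves


def build_tree(sample, depth, num_attrs):
    """Stand-in for create_robust_tree: median splits, cycling through attributes."""
    next_id = 0

    def build(rows, level):
        nonlocal next_id
        if level == depth or len(rows) < 2:
            leaf = Node(bucket=next_id)  # leaves get ids in left-to-right order
            next_id += 1
            return leaf
        attr = level % num_attrs
        values = sorted(r[attr] for r in rows)
        cut = values[(len(values) - 1) // 2]  # lower median of the sample
        left = [r for r in rows if r[attr] <= cut]
        right = [r for r in rows if r[attr] > cut]
        return Node(attr=attr, cut=cut,
                    left=build(left, level + 1), right=build(right, level + 1))

    return build(sample, 0)


def route(node, record):
    """Walk from the root to the leaf whose region contains `record`."""
    while node.bucket < 0:
        node = node.left if record[node.attr] <= node.cut else node.right
    return node.bucket


def write_partitions(records, tree):
    """Stand-in for write_partitions: group records by leaf (a real job writes to HDFS)."""
    parts = {}
    for r in records:
        parts.setdefault(route(tree, r), []).append(r)
    return parts
```

For example, `write_partitions(data, build_tree(bernoulli_sample(data, 0.1), 3, 2))` would split two-attribute `data` into at most eight partitions whose boundaries come from a 10% sample.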