Batch Import and Export
hd-shell> server start --app batch_jobs
Running: /home/mpollack/projects/spring-hadoop-samples/shell/target/spring-hadoop-shell/runtime/bin/server -batchAdmin
Server started.
To view the status of the server, use the command 'server status'
hd-shell> server status
server is running
To view the log of the server, use the command 'server log'. Here we show only the last two lines of the log.
hd-shell> server log
01:14:07.540 [server-1] INFO DispatcherServlet - FrameworkServlet 'Batch Servlet': initialization completed in 1550 ms
01:14:07.542 [server-1] INFO log - Started [email protected]:8081
You can launch the UI to browse jobs, execute jobs etc.
hd-shell> launch --console batch_admin
Or use the shell admin commands, shown below.
You can launch the UI for the database browser by typing
hd-shell> launch --console database
The connection information to enter in the database UI is:
Driver Class: org.h2.Driver
JDBC URL: jdbc:h2:tcp://localhost/mem:productdb
User Name: sa
Password:
Then press the Connect button. You can see the contents of the PRODUCT table by selecting it and then pressing the Run button.
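If you prefer typing a query rather than clicking the table name, you can also run a simple SELECT statement in the console; PRODUCT is the table the sample database populates.
SELECT * FROM PRODUCT;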
Make sure you have an empty directory in HDFS.
hd-shell> hadoop fs -rmr /import/data/products
command is:hadoop fs -rmr /import/data/products
Deleted hdfs://localhost:9000/import/data/products
hd-shell> hadoop fs -mkdir /import/data/products
command is:hadoop fs -mkdir /import/data/products
Created hdfs://localhost:9000/import/data/products
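If you want to verify that the directory exists and is empty before launching the import, the same hadoop fs syntax can be used to list it:
hd-shell> hadoop fs -ls /import/data/products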
Then list the batch jobs that are available
hd-shell> admin job-list
name description executionCount launchable incrementable
----------------- -------------- -------------- ---------- -------------
wordcountBatchJob No description 0 true false
exportProducts No description 0 true false
importProducts No description 0 true false
And start the job that imports the data into HDFS
hd-shell> admin job-start --jobName importProducts
id name status startTime duration exitCode
-- -------------- --------- --------- -------- ---------
1 importProducts COMPLETED 10:06:28 00:00:00 COMPLETED
To run the job again, clean out the /import/data/products directory in HDFS and add --jobParameters run=2 to the job-start command so that the job parameters are unique; Spring Batch requires a new set of job parameters to create a new job instance.
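For example, a second run could be launched like this (run is just an arbitrary key used to make the parameter set unique; any previously unused key/value pair works):
hd-shell> admin job-start --jobName importProducts --jobParameters run=2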
You can list the files that were written to HDFS
hd-shell> hadoop fs -ls /import/data/products
command is:hadoop fs -ls /import/data/products
Found 6 items
-rw-r--r-- 3 mpollack supergroup 114 2013-02-26 10:06 /import/data/products/product-0.txt
-rw-r--r-- 3 mpollack supergroup 113 2013-02-26 10:06 /import/data/products/product-1.txt
-rw-r--r-- 3 mpollack supergroup 122 2013-02-26 10:06 /import/data/products/product-2.txt
-rw-r--r-- 3 mpollack supergroup 119 2013-02-26 10:06 /import/data/products/product-3.txt
-rw-r--r-- 3 mpollack supergroup 136 2013-02-26 10:06 /import/data/products/product-4.txt
-rw-r--r-- 3 mpollack supergroup 0 2013-02-26 10:06 /import/data/products/product-5.txt
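To look at the records inside one of the generated files, you can use the standard hadoop fs -cat command, for example:
hd-shell> hadoop fs -cat /import/data/products/product-0.txt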
Next, to export the data from HDFS back into the database (filtering out some of the data along the way), run the export job.
hd-shell> admin job-start --jobName exportProducts --jobParameters hdfsSourceDirectory=/import/data/products/product*.txt
id name status startTime duration exitCode
-- -------------- --------- --------- -------- ---------
6 exportProducts COMPLETED 10:08:38 00:00:00 COMPLETED
In the database browser UI, select the table PRODUCTS_EXPORT and then press the RUN button to see the contents of the table.
To run the export job again, add a unique key-value pair to the job parameters, e.g. run=2. Note that multiple job parameters are separated by commas.
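For example, a second export run might look like this (run=2 is an arbitrary value; note the comma separating the two parameters):
hd-shell> admin job-start --jobName exportProducts --jobParameters hdfsSourceDirectory=/import/data/products/product*.txt,run=2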