Batch Import and Export


Running the import job

hd-shell> server start --app batch_jobs
Running: /home/mpollack/projects/spring-hadoop-samples/shell/target/spring-hadoop-shell/runtime/bin/server -batchAdmin
Server started.

To view the status of the server, use the command 'server status'.

hd-shell> server status
server is running

To view the log of the server, use the command 'server log'. Here we only show the last two lines of the log.

hd-shell> server log
01:14:07.540 [server-1] INFO DispatcherServlet - FrameworkServlet 'Batch Servlet': initialization completed in 1550 ms
01:14:07.542 [server-1] INFO log - Started [email protected]:8081

You can launch the UI to browse and execute jobs.

hd-shell> launch --console batch_admin

Or use the shell admin commands shown below.

You can launch the UI for the database browser by typing

hd-shell> launch --console database

The information to type into the database UI is

        Driver Class: org.h2.Driver
        JDBC URL: jdbc:h2:tcp://localhost/mem:productdb
        User Name: sa
        Password: 

Then press the Connect button. You can see the contents of the PRODUCT table by selecting it and then pressing the Run button.
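
Pressing the Run button executes the SQL shown in the query panel; selecting the PRODUCT table should fill in the equivalent of the query below, which you can also type in directly:

 SELECT * FROM PRODUCT;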

Make sure you have an empty directory in HDFS.

 hd-shell> hadoop fs -rmr /import/data/products
 command is:hadoop fs -rmr /import/data/products

 Deleted hdfs://localhost:9000/import/data/products

 hd-shell> hadoop fs -mkdir /import/data/products
 command is:hadoop fs -mkdir /import/data/products

 Created hdfs://localhost:9000/import/data/products

Then list the batch jobs that are available

 hd-shell> admin job-list
 name               description     executionCount  launchable  incrementable
 -----------------  --------------  --------------  ----------  -------------
 wordcountBatchJob  No description  0               true        false        
 exportProducts     No description  0               true        false        
 importProducts     No description  0               true        false        

And start the job that imports data into HDFS

 hd-shell> admin job-start --jobName importProducts
 id  name            status     startTime  duration  exitCode 
 --  --------------  ---------  ---------  --------  ---------
 1   importProducts  COMPLETED  10:06:28   00:00:00  COMPLETED

To run the job again, clean out the HDFS directory and add --jobParameters run=2 to the job-start command so that the job parameters are unique. (Spring Batch identifies a job instance by its name and parameters, so a completed job cannot be rerun with the same set.)
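
For example, reusing the commands from earlier in this walkthrough (the run key is arbitrary; any key=value pair not used before will work):

 hd-shell> hadoop fs -rmr /import/data/products
 hd-shell> hadoop fs -mkdir /import/data/products
 hd-shell> admin job-start --jobName importProducts --jobParameters run=2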

You can list the files that were created in HDFS

 hd-shell> hadoop fs -ls /import/data/products
 command is:hadoop fs -ls /import/data/products
 Found 6 items
 -rw-r--r--   3 mpollack supergroup        114 2013-02-26 10:06 /import/data/products/product-0.txt
 -rw-r--r--   3 mpollack supergroup        113 2013-02-26 10:06 /import/data/products/product-1.txt
 -rw-r--r--   3 mpollack supergroup        122 2013-02-26 10:06 /import/data/products/product-2.txt
 -rw-r--r--   3 mpollack supergroup        119 2013-02-26 10:06 /import/data/products/product-3.txt
 -rw-r--r--   3 mpollack supergroup        136 2013-02-26 10:06 /import/data/products/product-4.txt
 -rw-r--r--   3 mpollack supergroup          0 2013-02-26 10:06 /import/data/products/product-5.txt
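
To view the contents of an individual file, use hadoop fs -cat with one of the paths from the listing above:

 hd-shell> hadoop fs -cat /import/data/products/product-0.txt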

Running the export job

To export the data from HDFS back into the database, filtering out some of the records along the way, run the export job.

 hd-shell> admin job-start --jobName exportProducts --jobParameters hdfsSourceDirectory=/import/data/products/product*.txt
 id  name            status     startTime  duration  exitCode 
 --  --------------  ---------  ---------  --------  ---------
 6   exportProducts  COMPLETED  10:08:38   00:00:00  COMPLETED

In the database browser UI, select the PRODUCTS_EXPORT table and then press the Run button to see its contents.

To run the export job again, add a unique key-value pair to the --jobParameters value, e.g. run=2. Note that multiple job parameters are separated by commas.
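
For example, combining the source directory parameter from above with a new run key:

 hd-shell> admin job-start --jobName exportProducts --jobParameters hdfsSourceDirectory=/import/data/products/product*.txt,run=2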
