-
Download flume-sources-1.0-SNAPSHOT.jar from this link.
-
Copy the downloaded flume-sources-1.0-SNAPSHOT.jar to the Flume classpath, at the following locations:
    /usr/lib/flume-ng/plugins.d/twitter-streaming/lib/flume-sources-1.0-SNAPSHOT.jar
    /usr/lib/flume-ng/lib/flume-sources-1.0-SNAPSHOT.jar
    /var/lib/flume-ng/plugins.d/twitter-streaming/lib/flume-sources-1.0-SNAPSHOT.jar
If those directories don't exist, create them with sudo mkdir.
-
Edit (or create) the file /root/flume/conf/flume-twitter.conf so that it looks like this:

    TwitterAgent.sources = Twitter
    TwitterAgent.channels = MemChannel
    TwitterAgent.sinks = HDFS

    TwitterAgent.sources.Twitter.type = com.cloudera.flume.source.TwitterSource
    TwitterAgent.sources.Twitter.channels = MemChannel
    TwitterAgent.sources.Twitter.consumerKey = YourConsumerKey
    TwitterAgent.sources.Twitter.consumerSecret = YourConsumerSecret
    TwitterAgent.sources.Twitter.accessToken = YourAccessToken
    TwitterAgent.sources.Twitter.accessTokenSecret = YourAccessTokenSecret
    TwitterAgent.sources.Twitter.keywords = hadoop, big data, analytics, bigdata, cloudera, data science, data scient$

    TwitterAgent.sinks.HDFS.channel = MemChannel
    TwitterAgent.sinks.HDFS.type = hdfs
    TwitterAgent.sinks.HDFS.hdfs.path = hdfs:///user/flume/tweets/
    TwitterAgent.sinks.HDFS.hdfs.fileType = DataStream
    TwitterAgent.sinks.HDFS.hdfs.writeFormat = Text
    TwitterAgent.sinks.HDFS.hdfs.batchSize = 1000
    TwitterAgent.sinks.HDFS.hdfs.rollSize = 0
    TwitterAgent.sinks.HDFS.hdfs.rollCount = 10000

    TwitterAgent.channels.MemChannel.type = memory
    TwitterAgent.channels.MemChannel.capacity = 10000
    TwitterAgent.channels.MemChannel.transactionCapacity = 100
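
The four credential properties in the configuration above are placeholders and need real values from your Twitter application before the agent can connect. As a rough sketch, a small shell check like the following could catch placeholders that were left in by mistake. The function name check_twitter_credentials is hypothetical, not part of Flume.

```shell
#!/bin/sh
# Hypothetical helper (not part of Flume): report credential properties
# that still hold the placeholder values from the sample configuration.
check_twitter_credentials() {
    conf_file="$1"
    if [ ! -r "$conf_file" ]; then
        echo "cannot read $conf_file" >&2
        return 2
    fi
    # Match lines whose value is one of the four sample placeholders.
    if grep -E '= *Your(ConsumerKey|ConsumerSecret|AccessToken|AccessTokenSecret) *$' "$conf_file" >&2; then
        echo "replace the placeholder values listed above" >&2
        return 1
    fi
    return 0
}

# Usage:
#   check_twitter_credentials /root/flume/conf/flume-twitter.conf
```

The function returns 0 when no placeholder remains, 1 when at least one is still present, and 2 when the file cannot be read.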
-
Run this command.
flume-ng agent --name TwitterAgent --conf /root/flume/conf --conf-file /root/flume/conf/flume-twitter.conf
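
While testing, it can help to see the agent's log output directly in the terminal. The sketch below wraps the command above in a hypothetical function, start_twitter_agent, that first checks the configuration file exists and then adds -Dflume.root.logger=INFO,console, the console logging option shown in the Flume User Guide. The FLUME_BIN variable is an assumption added here so the binary can be overridden.

```shell
#!/bin/sh
# Hypothetical wrapper around the flume-ng command shown above.
# FLUME_BIN is an assumed override; it defaults to flume-ng on the PATH.
FLUME_BIN="${FLUME_BIN:-flume-ng}"

start_twitter_agent() {
    conf_dir="${1:-/root/flume/conf}"
    conf_file="$conf_dir/flume-twitter.conf"
    # Fail early with a clear message if the config is missing.
    if [ ! -f "$conf_file" ]; then
        echo "missing $conf_file" >&2
        return 1
    fi
    # Same arguments as the original command, plus console logging.
    "$FLUME_BIN" agent --name TwitterAgent \
        --conf "$conf_dir" \
        --conf-file "$conf_file" \
        -Dflume.root.logger=INFO,console
}

# Usage:
#   start_twitter_agent /root/flume/conf
```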
-
Download json-serde from this link.
-
Copy it to /usr/lib/hive/lib/, or load it with ADD JAR in Hive.
-
Then run hive and enter this command to create a table over the data written by Flume:

    CREATE EXTERNAL TABLE tweets (
      id BIGINT,
      created_at STRING,
      source STRING,
      favorited BOOLEAN,
      retweet_count INT,
      retweeted_status STRUCT<
        text:STRING,
        `user`:STRUCT<screen_name:STRING,name:STRING>>,
      entities STRUCT<
        urls:ARRAY<STRUCT<expanded_url:STRING>>,
        user_mentions:ARRAY<STRUCT<screen_name:STRING,name:STRING>>,
        hashtags:ARRAY<STRUCT<text:STRING>>>,
      text STRING,
      `user` STRUCT<
        screen_name:STRING,
        name:STRING,
        friends_count:INT,
        followers_count:INT,
        statuses_count:INT,
        verified:BOOLEAN,
        utc_offset:INT,
        time_zone:STRING>,
      in_reply_to_screen_name STRING
    )
    ROW FORMAT SERDE "org.openx.data.jsonserde.JsonSerDe"
    LOCATION "/user/flume/tweets";
-
The table will now be created. To make sure it was created correctly, run this query in Hive:
select * from tweets;
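
Once the agent has been running for a while, the query above can return a very large number of rows. As a minimal sketch, the same check can be run from the shell with a row limit, using the hive -e option to execute a single query. The function name preview_tweets, the HIVE_BIN override, and the choice of columns are assumptions made for illustration.

```shell
#!/bin/sh
# Hypothetical helper: preview a few rows of the tweets table.
# HIVE_BIN is an assumed override; it defaults to hive on the PATH.
HIVE_BIN="${HIVE_BIN:-hive}"

preview_tweets() {
    limit="${1:-5}"
    # Accept only digits so the value is safe to place in the query.
    case "$limit" in
        ''|*[!0-9]*)
            echo "limit must be a non-negative integer" >&2
            return 1
            ;;
    esac
    # `user` is backticked here, as in the table definition.
    "$HIVE_BIN" -e "SELECT id, \`user\`.screen_name, text FROM tweets LIMIT $limit;"
}

# Usage:
#   preview_tweets 10
```

If the query returns rows with sensible ids, screen names, and text, the SerDe is parsing the files that Flume wrote to /user/flume/tweets.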