pwendell/spark-twitter-collection: Spark example of collecting tweets and loading into HDFS/S3

How to run this from scratch on EC2:

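1. Launch an Ubuntu 12.04 instance on Amazon EC2 (m1.medium).
2. sudo apt-get update
3. sudo apt-get install -y openjdk-7-jdk
4. sudo apt-get install -y git
5. Clone the repository and change into it:
   git clone https://github.com/pwendell/spark-twitter-collection.git
   cd spark-twitter-collection
6. Export your AWS keys and the output settings, replacing the placeholders
   XXX, YYY and ZZZ with your AWS access key ID, AWS secret access key and
   output directory (HDFS or S3):
   export AWS_ACCESS_KEY_ID=XXX
   export AWS_SECRET_ACCESS_KEY=YYY
   export OUTPUT_BATCH_INTERVAL=3600
   export OUTPUT_DIR=ZZZ
7. cp credentials.txt.template credentials.txt
8. Fill in your Twitter credentials in credentials.txt.
9. From the repository root, start the collector: sbt/sbt run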
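Before starting the collector, it can help to confirm that the settings from the steps above are in place. The script below is a minimal sketch and is not part of the repository: `check_setup` and the file name `check_setup.sh` are hypothetical, and treating OUTPUT_BATCH_INTERVAL as a positive integer is an assumption based on the example value 3600. It does not verify that OUTPUT_DIR is a valid HDFS or S3 path, or that the keys themselves are correct.

```shell
#!/bin/sh
# Hypothetical pre-flight check; NOT part of the spark-twitter-collection repo.
# Verifies the settings from the setup steps before running sbt/sbt run.
# Assumes it is sourced from the repository root, where credentials.txt lives.

check_setup() {
  setup_rc=0

  # The four exported variables must be set and non-empty.
  for setup_name in AWS_ACCESS_KEY_ID AWS_SECRET_ACCESS_KEY OUTPUT_BATCH_INTERVAL OUTPUT_DIR; do
    eval "setup_value=\${$setup_name:-}"
    if [ -z "$setup_value" ]; then
      echo "missing environment variable: $setup_name" >&2
      setup_rc=1
    fi
  done

  # Assumption: the batch interval is a positive whole number (example: 3600).
  if [ -n "${OUTPUT_BATCH_INTERVAL:-}" ]; then
    case "$OUTPUT_BATCH_INTERVAL" in
      *[!0-9]*)
        echo "OUTPUT_BATCH_INTERVAL must be a positive integer" >&2
        setup_rc=1 ;;
      *)
        if [ "$OUTPUT_BATCH_INTERVAL" -le 0 ]; then
          echo "OUTPUT_BATCH_INTERVAL must be a positive integer" >&2
          setup_rc=1
        fi ;;
    esac
  fi

  # credentials.txt is copied from credentials.txt.template and then edited.
  if [ ! -f credentials.txt ]; then
    echo "credentials.txt not found; copy credentials.txt.template first" >&2
    setup_rc=1
  fi

  return "$setup_rc"
}
```

Usage, from the repository root: `. ./check_setup.sh && check_setup && sbt/sbt run`. The check only reports problems and returns a non-zero status, so sbt is launched only when every setting is present.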