Home
João Antonio Ferreira edited this page May 21, 2017
If you get this error:
tar: This does not look like a tar archive
tar: Exiting with failure status due to previous errors
you will need git-lfs support to download the large TAR and GZ files:
sudo apt-get install software-properties-common
sudo apt-get install python-software-properties
sudo add-apt-repository ppa:git-core/ppa
sudo apt-get update
curl -s https://packagecloud.io/install/repositories/github/git-lfs/script.deb.sh | sudo bash
sudo apt-get install git-lfs
git lfs install
git lfs track "*.tar"
git lfs track "*.gz"
cd install/
git checkout .
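If the tar error persists after the checkout, the archive may still be a Git LFS pointer rather than the real file. Pointer files are small text stubs whose first line is `version https://git-lfs.github.com/spec/v1`. The sketch below wraps that check in a helper; the name `is_lfs_pointer` is hypothetical and not part of this repository.

```shell
#!/bin/sh
# Hypothetical helper (an assumption, not shipped with this repository):
# succeeds when the given file is a Git LFS pointer stub, fails otherwise
# (including when the file does not exist).
is_lfs_pointer() {
    [ -f "$1" ] || return 1
    # A pointer file begins with this exact line (Git LFS pointer format v1).
    head -n 1 "$1" | grep -q '^version https://git-lfs\.github\.com/spec/v1$'
}
```

A possible use, with a placeholder file name: `is_lfs_pointer install/some-archive.tar && echo "still a pointer"`. If it reports a pointer, running `git lfs pull` in the repository should fetch the real content.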
docker run -i -t -h my-spark -p 8095:8080 --rm parana/spark bash
Or, if you need to see the test folder inside the container, use:
docker run -i -t -h my-spark -p 8095:8080 --rm -v $PWD/test:/mongo parana/spark bash
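The `-v` option bind-mounts the host folder `$PWD/test` at `/mongo` in the container. When the host path is missing, Docker usually creates an empty directory instead of failing, so `/mongo` would appear empty. A minimal guard, assuming you start the container from the repository root, could look like this; `require_dir` is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper (an assumption, not part of this repository):
# fails with a message on stderr when the given directory does not exist.
require_dir() {
    if [ ! -d "$1" ]; then
        echo "missing directory: $1" >&2
        return 1
    fi
}
```

It can be chained in front of the command above: `require_dir "$PWD/test" && docker run -i -t -h my-spark -p 8095:8080 --rm -v "$PWD/test":/mongo parana/spark bash`.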
Inside the container use this:
cd /mongo/myspark
spark-shell --packages org.mongodb.spark:mongo-spark-connector_2.11:2.0.0,br.com.joao-parana:myspark:1.0-SNAPSHOT
On your host computer, you can open this page in a web browser:
https://www.mongodb.com/presentations/webinar-introducing-the-spark-connector-for-mongodb
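// Read customer.csv, which has no header row, and let Spark infer column types.
// csv(...) selects the built-in CSV source, so no format(...) call is needed.
val df = spark.read.
  option("header", false).
  option("inferSchema", "true").
  csv("data/customer.csv")
df.printSchema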
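// Read region.csv, whose first row holds the column names.
// csv(...) selects the built-in CSV source, so no format(...) call is needed.
val df = spark.read.
  option("header", true).
  option("inferSchema", "true").
  csv("data/region.csv")
df.printSchema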