Compiling #2

Open
saj9191 opened this issue Jul 18, 2018 · 11 comments

Comments

saj9191 commented Jul 18, 2018

Hello,
I'm trying to install Spark on Lambda. When I run

./dev/make-distribution.sh --name spark-lambda-2.1.0 --tgz -Phive -Phadoop-2.7 -Dhadoop.version=2.6.0-qds-0.4.13 -DskipTests

the spark-launcher module fails to build and I get the following error.

[ERROR] Failed to execute goal on project spark-launcher_2.11: Could not resolve dependencies for project org.apache.spark:spark-launcher_2.11:jar:2.1.0: Failure to find com.hadoop.gplcompression:hadoop-lzo:jar:0.4.19 in https://repo1.maven.org/maven2 was cached in the local repository, resolution will not be reattempted until the update interval of central has elapsed or updates are forced -> [Help 1]

I tried to explicitly add hadoop-lzo as a dependency in the launcher pom.xml, but I still get the same error. Is there something I need to download or change to get this to work?
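
For context, the "resolution will not be reattempted" part of the error means Maven has cached the failed download in the local repository. Below is a minimal sketch of two standard ways to force a retry (assuming make-distribution.sh passes extra flags straight through to mvn); neither helps on its own while the artifact is missing from the configured repositories:

# Option 1: pass -U to force Maven to re-check previously failed downloads
./dev/make-distribution.sh --name spark-lambda-2.1.0 --tgz -Phive -Phadoop-2.7 -Dhadoop.version=2.6.0-qds-0.4.13 -DskipTests -U

# Option 2: drop the cached failure markers for the artifact from the local repository
rm -rf ~/.m2/repository/com/hadoop/gplcompression/hadoop-lzo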

Thanks!

@venkata91 (Contributor)

Hi saj9191,

It seems like something changed on our side where we keep the Maven artifacts. We'll fix it and update you here. Thanks for trying it out, and sorry for the inconvenience.

faromero commented Sep 2, 2018

I am having the same issue (I also tried adding the hadoop-lzo dependency manually to pom.xml, with no success). Have there been any updates on resolving this?

@venkata91 (Contributor)

We have also been hitting this issue recently. I will get back with a fix soon and post it here. Thanks for taking the time to try it out.

faromero commented Sep 4, 2018

I believe I have found a solution:
In spark-on-lambda/common/network-common/pom.xml, add the following dependency (as suggested previously):

<dependency>
  <groupId>com.hadoop.gplcompression</groupId>
  <artifactId>hadoop-lzo</artifactId>
  <version>0.4.19</version>
</dependency>

Then, in spark-on-lambda/pom.xml, add the following repository (which hosts hadoop-lzo):

<repository>
  <id>twitter</id>
  <name>Twitter Repository</name>
  <url>http://maven.twttr.com</url>
</repository>

After this, I ran the make-distribution.sh command from your README and was able to build it all the way through.
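
As a quick sanity check (a sketch, not something from this thread), you can confirm the artifact is resolvable from that repository before kicking off the full build, using the standard maven-dependency-plugin get goal:

# Pull hadoop-lzo 0.4.19 into the local repo directly from the Twitter repository
mvn dependency:get \
  -Dartifact=com.hadoop.gplcompression:hadoop-lzo:0.4.19 \
  -DremoteRepositories=twitter::default::http://maven.twttr.com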

@venkata91 (Contributor)

Nice workaround! Let me also try it and post an update.

@venkata91 (Contributor)

Also, may I ask what your use case is, or are you just trying it out?

faromero commented Sep 4, 2018

Thanks for working to update it!

We are working on a research project on using Lambda for what we call "interactive massively parallel" applications, and wanted to compare Spark-on-Lambda to the current state of the art, as well as to our own work!

By the way, from your blog post, do you have the data available that you used for sorting 100 GB in under 10 minutes?

@venkata91 (Contributor)

Interesting! Can you elaborate a bit more on that? By the way, the data is generated with the TeraGen utility from https://github.com/ehiggs/spark-terasort, which you can use to produce your own input.
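
For anyone following along, a rough sketch of generating the input with that tool; the TeraGen class and jar name below follow the spark-terasort README, and the 100g size and s3a:// output URI are placeholders:

# Build spark-terasort, then generate ~100 GB of TeraSort input
git clone https://github.com/ehiggs/spark-terasort && cd spark-terasort
mvn package -DskipTests
spark-submit --class com.github.ehiggs.spark.terasort.TeraGen \
  target/spark-terasort-*-jar-with-dependencies.jar \
  100g s3a://your-bucket/terasort-input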

faromero commented Sep 4, 2018

You can view our work here: we call it gg, and while it was originally intended for compilation, it now supports general-purpose applications (as simple as sorting and as complex as video encoding). Let me know if you have any questions about it (it can be in a different forum instead of this issue thread).

I will try to run your sorting example and let you know if I have any issues!

@venkata91 (Contributor)

Another, easier workaround is to remove the pom.xml additions, basically reverting the commit "Fix pom.xml to have the other Qubole repository location having 2.6.0... (2ca6c68)".

Build your package using this command: ./dev/make-distribution.sh --name spark-lambda-2.1.0 --tgz -Phive -Phadoop-2.7 -DskipTests

And finally, add the jars below to the classpath before starting spark-shell (see the sketch after the reference link):

1. wget http://central.maven.org/maven2/com/amazonaws/aws-java-sdk/1.7.4/aws-java-sdk-1.7.4.jar
2. wget http://central.maven.org/maven2/org/apache/hadoop/hadoop-aws/2.7.3/hadoop-aws-2.7.3.jar

Refer here - https://markobigdata.com/2017/04/23/manipulating-files-from-s3-with-apache-spark/
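
A short sketch of that classpath step; the --jars flag is standard spark-shell usage, and the jar paths assume they were downloaded into the current directory:

wget http://central.maven.org/maven2/com/amazonaws/aws-java-sdk/1.7.4/aws-java-sdk-1.7.4.jar
wget http://central.maven.org/maven2/org/apache/hadoop/hadoop-aws/2.7.3/hadoop-aws-2.7.3.jar

# Start spark-shell with both jars on the classpath
./bin/spark-shell --jars aws-java-sdk-1.7.4.jar,hadoop-aws-2.7.3.jar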

@webroboteu

Hi venkata91, I wrote you an email. I'm looking for an advisor for my startup, a Spark-based web scraping service. The idea is to use this serverless computation approach, but I'm having problems. As soon as you have time, I would like to discuss it further.

venkata91 mentioned this issue May 29, 2019