Development for save_as_text_file #8

gnilrets · 2015-06-11T23:02:05Z

I'd like to contribute to this project and thought I'd start with something simple like saveAsTextFile. Here's a spec that tries to save an RDD to a text file. When the RDD is created from a text file, it works fine. When the RDD is created with parallelize, however, the data written to the file is garbage. Could you give me some insight into what's going wrong here?

1) Spark::RDD .save_as_text_file saves the par_rdd
   Failure/Error: expect(result).to eq (0..5).collect { |i| i.to_s }

   expected: ["0", "1", "2", "3", "4", "5"]
        got: ["[B@2e293b50", "[B@14f518dd", "[B@67335fea", "[B@6b4b2ad1", "[B@13e8365f", "[B@7249a12e"]

   (compared using ==)
 # ./spec/lib/save_spec.rb:34:in `block (4 levels) in <top (required)>'
 # ./spec/lib/save_spec.rb:30:in `block (3 levels) in <top (required)>'

ondra-m · 2015-06-12T06:43:27Z

lib/spark/rdd.rb

+    #   rdd.save_as_text_file(path)
+    #
+    def save_as_text_file(path)
+      jrdd.saveAsTextFile(path)


Check https://github.com/apache/spark/blob/master/python/pyspark/rdd.py#L1426
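
For context on the failure above: strings like `[B@2e293b50` are what Java prints when `toString` is called on a byte array, which suggests the parallelized RDD holds serialized bytes rather than text. The linked PySpark code appears to convert each element to a UTF-8 string before the JVM writes it out. A minimal plain-Ruby sketch of that conversion step follows; `to_text_lines` is a hypothetical helper, not part of ruby-spark, and the use of Marshal as the serializer is an assumption made for illustration.

```ruby
# Assumption: a parallelized RDD carries serialized elements (Marshal is used
# here only for illustration), so the JVM sees opaque bytes, not text.
serialized = (0..5).map { |i| Marshal.dump(i) }

# Hypothetical helper, not part of ruby-spark: convert each element to a
# UTF-8 string so it can be written as one line of a text file.
def to_text_lines(items)
  items.map { |item| item.to_s.encode('UTF-8') }
end

lines = to_text_lines(0..5)

puts serialized.first.bytes.inspect # raw serialized bytes for the element 0
puts lines.inspect                  # readable strings, one per element
```

How such a conversion would be wired into `save_as_text_file` (for example, as a map step run before `jrdd.saveAsTextFile(path)`) depends on how the RDD serializer is bypassed, which this sketch does not cover.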

Development for save_as_text_file

27263d0

ondra-m reviewed Jun 12, 2015