Exception for empty.tail #118

darthsuogles · 2017-08-09T04:26:32Z

tensorframes/src/main/scala/org/tensorframes/Shape.scala

Line 49 in 9a4f596

def tail: Shape = Shape(ds.tail)

https://stackoverflow.com/questions/18647874/safely-get-tail-of-array

phi-dbq · 2017-08-09T17:10:03Z

Below is a small example in which the output tensor is a scalar.

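import numpy as np
import tensorflow as tf
import tensorframes as tfs
from pyspark.sql import Row

# Assumes an active SparkSession `spark` and SparkContext `sc` (e.g. in a notebook).
vec_size = 17
num_vecs = 41 * sc.defaultParallelism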

df = spark.createDataFrame([
    Row(idx=idx, vec=np.random.randn(vec_size).tolist())
    for idx in range(num_vecs)])

analyzed_df = tfs.analyze(df)
tfs.print_schema(analyzed_df)
print(analyzed_df.count())

with tf.Session():
    #x = tf.placeholder(tf.float64, shape=[None, vec_size])
    x = tfs.block(analyzed_df, 'vec')
    z = tf.reduce_mean(x)
    output_df = tfs.map_blocks([z], analyzed_df)

TensorFrames is able to infer the block shapes:

 |-- idx: long (nullable = true) long[41]
 |-- vec: array (nullable = true) double[41,17]

But running output_df.show() throws the following error:

Py4JJavaError: An error occurred while calling o848.buildDF.
: java.lang.UnsupportedOperationException: empty.tail
	at scala.collection.TraversableLike$class.tail(TraversableLike.scala:421)
	at scala.collection.mutable.ArrayOps$ofLong.scala$collection$IndexedSeqOptimized$$super$tail(ArrayOps.scala:246)
	at scala.collection.IndexedSeqOptimized$class.tail(IndexedSeqOptimized.scala:129)
	at scala.collection.mutable.ArrayOps$ofLong.tail(ArrayOps.scala:246)
	at org.tensorframes.Shape.tail(Shape.scala:49)
	at org.tensorframes.ColumnInformation$.structField(ColumnInformation.scala:82)
	at org.tensorframes.impl.DebugRowOps$$anonfun$29.apply(DebugRowOps.scala:357)
	at org.tensorframes.impl.DebugRowOps$$anonfun$29.apply(DebugRowOps.scala:351)
	at scala.collection.immutable.Stream.map(Stream.scala:418)
	at org.tensorframes.impl.DebugRowOps.mapBlocks(DebugRowOps.scala:351)
	at org.tensorframes.impl.DebugRowOps.mapBlocks(DebugRowOps.scala:294)
	at org.tensorframes.impl.PythonOpBuilder.buildDF(PythonInterface.scala:145)

phi-dbq · 2017-08-09T17:22:53Z

It seems that tf.reduce_mean changes the leading (i.e. batch) dimension, so the final DataFrame would not have the same number of rows as the input DataFrame.
It would be nice to tell the user that tensors in fetches must have the same leading dimension as the input tensors.

For this example, changing tf.reduce_mean(x) to tf.reduce_mean(x, axis=1) gives the desired output.
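
The check suggested above could look roughly like the sketch below. It is only an illustration: `check_fetch_shape` is a hypothetical helper, not part of TensorFrames, and it works on plain shape lists (with `None` for an unknown dimension) rather than on real tensors.

```python
def check_fetch_shape(name, input_shape, fetch_shape):
    """Hypothetical helper (not a TensorFrames API).

    Raises ValueError when a fetch cannot be mapped back to one output
    row per input row. Shapes are lists of ints; None means unknown.
    """
    # A scalar fetch has no dimensions at all, so there is no leading
    # (block) dimension to line up with the input rows.
    if len(fetch_shape) == 0:
        raise ValueError(
            "Fetch '%s' is a scalar; it must keep the leading (block) "
            "dimension of the input, whose shape is %s." % (name, input_shape))

    lead_in, lead_out = input_shape[0], fetch_shape[0]
    # Only compare leading dimensions when both are known.
    if lead_in is not None and lead_out is not None and lead_in != lead_out:
        raise ValueError(
            "Fetch '%s' has leading dimension %s but the input block has %s."
            % (name, lead_out, lead_in))


# Shapes taken from the example above: the block is double[41,17].
check_fetch_shape("z", [41, 17], [41])       # OK: like reduce_mean(x, axis=1)
try:
    check_fetch_shape("z", [41, 17], [])     # like reduce_mean(x)
except ValueError as err:
    print(err)
```

With a check of this kind, the user would see a message about the fetch shape instead of the bare empty.tail exception.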

thunterdb · 2017-08-10T21:01:00Z

@phi-dbq thanks. Yes, this error should be made more explicit. Just to confirm, in that case the shape of tf.reduce_mean(x) is going to be [17], if I am not mistaken?

phi-dbq · 2017-08-10T21:05:34Z

@thunterdb thanks. Currently, tf.reduce_mean without the axis parameter reduces along all dimensions, so the result is a scalar.
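
To make the shapes concrete, here is a small pure-Python sketch of the reduction rule. `reduced_shape` is a hypothetical helper written for this thread; it only mimics how a reduction without kept dimensions changes a shape and does not call TensorFlow.

```python
def reduced_shape(shape, axis=None):
    """Shape after a mean-style reduction that does not keep reduced dims.

    Hypothetical helper for illustration only.
    axis=None reduces over every dimension, giving a scalar shape ().
    """
    if axis is None:
        return ()
    # Drop just the reduced axis and keep the others in order.
    return tuple(d for i, d in enumerate(shape) if i != axis)


block = (41, 17)                       # the inferred block shape of 'vec'

print(reduced_shape(block))            # ()     scalar, no leading dimension
print(reduced_shape(block, axis=0))    # (17,)  leading dimension is gone
print(reduced_shape(block, axis=1))    # (41,)  one value per input row
```

Only the axis=1 case keeps the leading dimension of 41. The scalar case has an empty shape, which is consistent with the stack trace above failing inside Shape.tail.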
