5 changes: 4 additions & 1 deletion CHANGES.txt
@@ -1,4 +1,4 @@
3.0.0-Alpha
3.0-Alpha
* Include all DS Specific Features
* Includes CassandraHiveMetastore
* Structured Streaming Sink Support
@@ -12,6 +12,9 @@

*******************************************************************************

2.4.3
* Fix issue with Hadoop3 Compatibility

2.4.2
* Support for Scala 2.12 (SPARKC-458)

10 changes: 5 additions & 5 deletions README.md
@@ -4,10 +4,10 @@

| What | Where |
| ---------- | ----- |
| Packages | [Spark Cassandra Connector Spark Packages Website](https://spark-packages.org/package/datastax/spark-cassandra-connector) |
| Community | Chat with us at [DataStax Academy's #spark-connector Slack channel](#slack) |
| Scala Docs | Most Recent Release (2.4.2): [Spark-Cassandra-Connector](https://datastax.github.io/spark-cassandra-connector/ApiDocs/2.4.2/spark-cassandra-connector/), [Embedded-Cassandra](https://datastax.github.io/spark-cassandra-connector/ApiDocs/2.4.2/spark-cassandra-connector-embedded/) |

| Community | Chat with us at [Datastax and Cassandra Q&A](https://community.datastax.com/index.html) |
| Scala Docs | Most Recent Release (2.4.3): [Spark-Cassandra-Connector](https://datastax.github.io/spark-cassandra-connector/ApiDocs/2.4.2/spark-cassandra-connector/), [Embedded-Cassandra](https://datastax.github.io/spark-cassandra-connector/ApiDocs/2.4.2/spark-cassandra-connector-embedded/) |
| Latest Production Release | [2.4.3](https://mvnrepository.com/artifact/com.datastax.spark/spark-cassandra-connector_2.12/2.4.3) |
| Latest Preview Release | [3.0-alpha](https://mvnrepository.com/artifact/com.datastax.spark/spark-cassandra-connector_2.12/3.0-alpha) |
## Features

*Lightning-fast cluster computing with Apache Spark™ and Apache Cassandra®.*
@@ -105,7 +105,7 @@ This project has also been published to the Maven Central Repository.
For SBT to download the connector binaries, sources and javadoc, put this in your project
SBT config:

libraryDependencies += "com.datastax.spark" %% "spark-cassandra-connector" % "2.4.2"
libraryDependencies += "com.datastax.spark" %% "spark-cassandra-connector" % "2.4.3"

* The default Scala version for Spark 2.0+ is 2.11; please choose the appropriate build (a `build.sbt` sketch follows below). See the
[FAQ](doc/FAQ.md) for more information.
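A minimal `build.sbt` sketch along those lines (the Scala patch version shown is illustrative, not prescribed by the project):

```scala
// Hypothetical build.sbt sketch: pinning scalaVersion to a 2.11 release lets the
// %% operator resolve the matching spark-cassandra-connector_2.11 artifact.
scalaVersion := "2.11.12"

libraryDependencies += "com.datastax.spark" %% "spark-cassandra-connector" % "2.4.3"
```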
8 changes: 4 additions & 4 deletions doc/0_quick_start.md
@@ -15,14 +15,14 @@ Configure a new Scala project with the Apache Spark dependency.

The dependencies are easily retrieved via Maven Central

libraryDependencies += "com.datastax.spark" % "spark-cassandra-connector_2.11" % "2.4.2"
libraryDependencies += "com.datastax.spark" % "spark-cassandra-connector_2.11" % "2.4.3"

The spark-packages libraries can also be used with spark-submit and spark-shell; these
commands will place the connector and all of its dependencies on the classpath of the
Spark Driver and all Spark Executors.

$SPARK_HOME/bin/spark-shell --packages com.datastax.spark:spark-cassandra-connector_2.11:2.4.2
$SPARK_HOME/bin/spark-submit --packages com.datastax.spark:spark-cassandra-connector_2.11:2.4.2
$SPARK_HOME/bin/spark-shell --packages com.datastax.spark:spark-cassandra-connector_2.11:2.4.3
$SPARK_HOME/bin/spark-submit --packages com.datastax.spark:spark-cassandra-connector_2.11:2.4.3

For the list of available versions, see:
- https://spark-packages.org/package/datastax/spark-cassandra-connector
@@ -58,7 +58,7 @@ Run the `spark-shell` with the packages line for your version. To configure
the default Spark Configuration pass key value pairs with `--conf`

$SPARK_HOME/bin/spark-shell --conf spark.cassandra.connection.host=127.0.0.1 \
--packages com.datastax.spark:spark-cassandra-connector_2.11:2.4.2
--packages com.datastax.spark:spark-cassandra-connector_2.11:2.4.3

This command would set the Spark Cassandra Connector parameter
`spark.cassandra.connection.host` to `127.0.0.1`. Change this to the address of one of your Cassandra nodes.
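With the shell up and the connector on the classpath, a short sketch of reading a table (the `test` keyspace and `kv` table are placeholders assumed to already exist):

```scala
// Run inside spark-shell; the import brings in the implicits that add
// cassandraTable to the SparkContext. Keyspace "test" and table "kv" are
// illustrative placeholders.
import com.datastax.spark.connector._

val rdd = sc.cassandraTable("test", "kv")
println(rdd.count)   // number of rows in test.kv
println(rdd.first)   // a CassandraRow for the first row returned
```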
2 changes: 1 addition & 1 deletion doc/13_spark_shell.md
@@ -18,7 +18,7 @@ Find additional versions at [Spark Packages](https://spark-packages.org/package/
```bash
cd spark/install/dir
#Include the --master if you want to run against a spark cluster and not local mode
./bin/spark-shell [--master sparkMasterAddress] --jars yourAssemblyJar --packages com.datastax.spark:spark-cassandra-connector_2.11:2.4.2 --conf spark.cassandra.connection.host=yourCassandraClusterIp
./bin/spark-shell [--master sparkMasterAddress] --jars yourAssemblyJar --packages com.datastax.spark:spark-cassandra-connector_2.11:2.4.3 --conf spark.cassandra.connection.host=yourCassandraClusterIp
```

By default Spark will log everything to the console, and this may be a bit of an overload. To change this, copy and modify the `log4j.properties` template file.
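As a session-scoped alternative sketch (it does not replace the `log4j.properties` approach), the log level can also be lowered from inside the shell:

```scala
// Inside spark-shell: reduce console noise for the current session only.
sc.setLogLevel("WARN")
```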
4 changes: 2 additions & 2 deletions doc/15_python.md
@@ -16,7 +16,7 @@ https://spark-packages.org/package/datastax/spark-cassandra-connector

```bash
./bin/pyspark \
--packages com.datastax.spark:spark-cassandra-connector_2.11:2.4.2
--packages com.datastax.spark:spark-cassandra-connector_2.11:2.4.3
```

### Loading a DataFrame in Python
@@ -46,7 +46,7 @@ source and by specifying keyword arguments for `keyspace` and `table`.

### Saving a DataFrame in Python to Cassandra

A DataFrame can be saved to an *existing* Cassandra table by using the `org.apache.spark.sql.cassandra` source and by specifying keyword arguments for `keyspace` and `table` and a save mode (`append`, `overwrite`, `error` or `ignore`, see [Data Sources API doc](https://spark.apache.org/docs/latest/sql-programming-guide.html#save-modes)).
A DataFrame can be saved to an *existing* Cassandra table by using the `org.apache.spark.sql.cassandra` source and by specifying keyword arguments for `keyspace` and `table` and a save mode (`append`, `overwrite`, `error` or `ignore`, see [Data Sources API doc](https://spark.apache.org/docs/latest/sql-data-sources-load-save-functions.html#save-modes)).

#### Example Saving to a Cassandra Table as a PySpark DataFrame
```python
@@ -208,7 +208,7 @@ object MappedToGettableDataConverter extends StrictLogging {
for (i <- columnValues.indices)
columnValues(i) = converters(i).convert(columnValues(i))
struct.newInstance(columnValues: _*)
case None =>
case _ =>
Member

Maybe I'm just paranoid, but should we do `None | null`? I may just be paranoid (this wouldn't be the first time...)

Author

@acustiq Mar 30, 2020

I would think that the `_` wildcard symbol catches both `None` and `null` as it matches anything:

    case _ => // Wild card pattern -- matches anything

Source: https://docs.scala-lang.org/tutorials/FAQ/finding-symbols.html
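For illustration, a standalone sketch (separate from the converter code) showing that the wildcard arm is reached for both `None` and `null`, while a lone `case None` arm would let `null` fall through to a `MatchError`:

```scala
// Standalone illustration: the wildcard pattern matches anything, including
// null, whereas `case None` only matches the None object.
def describe(x: Option[String]): String = x match {
  case Some(v) => s"value: $v"
  case _       => "no value"      // reached for both None and null
}

describe(Some("a"))                          // "value: a"
describe(None)                               // "no value"
describe(null.asInstanceOf[Option[String]])  // "no value" -- MatchError if the arm were `case None`
```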

null.asInstanceOf[struct.ValueRepr]
}
}
@@ -143,6 +143,13 @@ class MappedToGettableDataConverterSpec extends FlatSpec with Matchers {
row.isNullAt(1) shouldEqual true
}

it should "convert null UDTs to null" in {
val userTable = newTable(loginColumn, passwordColumn)
val converter = MappedToGettableDataConverter[UserWithOption](userTable, userTable.columnRefs)
val row = converter.convert(null)
row shouldEqual null
}

case class UserWithNestedOption(login: String, address: Option[Address])

it should "convert None case class to null" in {