Macrometa Connector Databricks

The Macrometa Spark Connector for Databricks is a versatile and efficient integration tool that enables users to seamlessly connect Macrometa's real-time data streams and collections with Apache Spark within the Databricks environment. This comprehensive connector facilitates the ingestion, processing, and analysis of both streaming and batch data by leveraging Spark's advanced capabilities, allowing users to derive valuable insights and make data-driven decisions.

With its simple installation process and compatibility with Databricks Runtime and Scala, the Macrometa Spark Connector offers two main components: a Streaming Data Connector and a Collection Data Connector.

The Streaming Data Connector handles real-time data streams, while the Collection Data Connector focuses on batch data processing from Macrometa collections.

  1. Source and Target operations for collections use the format com.macrometa.spark.collection.MacrometaTableProvider (a usage sketch follows this list).
  2. Source and Target operations for streams use the format com.macrometa.spark.stream.MacrometaTableProvider.
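For example, a batch read from a Macrometa collection in a Databricks notebook might look like the following sketch. The option keys (regionUrl, apiKey, fabric, collection) are illustrative assumptions, not confirmed connector options; consult the connector documentation for the exact names your version accepts.

```scala
import org.apache.spark.sql.SparkSession

// In Databricks notebooks a SparkSession named `spark` is already available;
// this line is only needed when running outside a notebook.
val spark = SparkSession.builder().appName("macrometa-collection-example").getOrCreate()

// Batch read from a Macrometa collection.
val collectionDf = spark.read
  .format("com.macrometa.spark.collection.MacrometaTableProvider")
  .option("regionUrl", "api-play.paas.macrometa.io") // hypothetical option key
  .option("apiKey", sys.env("MACROMETA_API_KEY"))    // hypothetical option key
  .option("fabric", "_system")                       // hypothetical option key
  .option("collection", "my_collection")             // hypothetical option key
  .load()

collectionDf.show()
```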

Prerequisites

  • Databricks Runtime 11.3 LTS (with Apache Spark 3.3.0)
  • Scala 2.12 or later
  • Macrometa account with access to streams

Considerations

  • When mapping a Macrometa array to a Spark array, the connector uses ArrayType, a collection data type that extends DataType, the superclass of all Spark SQL types. All elements of an ArrayType must have the same element type.
  • During schema auto-inference, the Collection data connector fetches the first 50 documents from a collection and uses the most frequent schema among them. For better accuracy, users are encouraged to specify their own schema when creating the DataFrame.
  • Similarly, during schema auto-inference, the Streaming data connector retrieves the earliest unconsumed message from a stream and uses that message's schema as the base schema. For best results, users should specify their own schema when creating the DataFrame (a sketch follows this list).
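As a rough sketch, an explicit schema can be supplied through Spark's standard DataFrameReader.schema call before loading. The field names and the collection option key below are illustrative assumptions, not taken from the connector's documentation.

```scala
import org.apache.spark.sql.types._

// An explicit schema avoids relying on inference from the first 50 documents
// (collections) or from the earliest unconsumed message (streams).
val customSchema = StructType(Seq(
  StructField("id", StringType, nullable = false),
  StructField("value", DoubleType, nullable = true),
  StructField("tags", ArrayType(StringType), nullable = true) // all elements share one type
))

// Reuses the `spark` session from the earlier sketch.
val df = spark.read
  .format("com.macrometa.spark.collection.MacrometaTableProvider")
  .schema(customSchema)
  .option("collection", "my_collection") // hypothetical option key, as above
  .load()
```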

How to install the Macrometa Databricks Connector

  1. Obtain the JAR file. You can obtain the JAR file for the connector through either of the following methods:

a. Using the Official GitHub Package: Download the pre-built JAR file from the official GitHub package for this repository. For example: app-0.0.1.jar. This is the recommended way for production usage.

b. Building from Source: Clone this repository by running git clone https://github.com/Macrometacorp/macrometa-connector-databricks.git, and then build the JAR file using Gradle. Open a terminal in the root folder of the project and execute the command: ./gradlew clean shadowJar. This method provides the latest code, but it may not be officially released, so it's not recommended for production environments.

  2. Upload the JAR file to your Databricks workspace using the Databricks CLI or the Databricks UI. If you built from source, the generated JAR file, macrometa-connector-databricks.jar, is located in the app/build/libs directory.

  3. Attach the JAR file to your Databricks cluster by following the instructions in the Databricks documentation.

How to use the Macrometa Databricks Connector

Detailed steps for using the collection and stream connectors are listed below:

  1. Getting started with Macrometa Databricks Stream Data Connector
  2. Getting started with Macrometa Databricks Collection Data Connector
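As a minimal sketch of the stream connector, the following reads from one Macrometa stream and writes to another using Spark Structured Streaming. The option keys (regionUrl, apiKey, stream) and the checkpoint path are assumptions; replace them with the values documented for your connector version.

```scala
// Streaming read from one Macrometa stream and write to another.
val streamDf = spark.readStream
  .format("com.macrometa.spark.stream.MacrometaTableProvider")
  .option("regionUrl", "api-play.paas.macrometa.io") // hypothetical option key
  .option("apiKey", sys.env("MACROMETA_API_KEY"))    // hypothetical option key
  .option("stream", "input_stream")                  // hypothetical option key
  .load()

val query = streamDf.writeStream
  .format("com.macrometa.spark.stream.MacrometaTableProvider")
  .option("stream", "output_stream")                      // hypothetical option key
  .option("checkpointLocation", "/tmp/macrometa-checkpoint") // placeholder path
  .start()

query.awaitTermination()
```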