-
First, our team just tested 0.6.4. We <3 <3 the support for delta! 👯 I'm not an expert in Databricks btw, just somebody that can write a little Scala.

So we have this problem. We have multiple notebooks in Databricks, and we would like to attach the notebook name to the extra properties of Spline. In Databricks we have something like this:

    import os
    token = os.environ["NOTEBOOK_PATH"] = dbutils.notebook.entry_point.getDbutils().notebook().getContext().notebookPath().get()
    print("Configured: ", os.environ.get("NOTEBOOK_PATH"))

I thought that maybe I could set an env variable and then read it in the filter:

    class EnvFilter(configuration: Configuration) extends PostProcessingFilter {
      def createExtraProperties() = {
        val notebookName = sys.env("NOTEBOOK_PATH")
        Map(
          "notebookName" -> notebookName
        )
      }
      override def processExecutionEvent(event: ExecutionEvent, ctx: HarvestingContext): ExecutionEvent =
        event.withAddedExtra(createExtraProperties())
      override def processExecutionPlan(plan: ExecutionPlan, ctx: HarvestingContext): ExecutionPlan =
        plan.withAddedExtra(createExtraProperties())
      override def processReadOperation(op: ReadOperation, ctx: HarvestingContext): ReadOperation =
        op.withAddedExtra(createExtraProperties())
      override def processWriteOperation(op: WriteOperation, ctx: HarvestingContext): WriteOperation =
        op.withAddedExtra(createExtraProperties())
      override def processDataOperation(op: DataOperation, ctx: HarvestingContext): DataOperation =
        op.withAddedExtra(createExtraProperties())
    }

But the value is always empty. I also tried writing it into the config, but it is not in the config either. I'm using declarative initialization. I guess I could move to programmatic initialization and set the value then.

I guess my general question is: in Databricks, how can you "set" or "pass" information to the filter, since the filter only has access to the Configuration in the constructor? Is there a singleton somewhere else to read these values from?

Thanks in advance! And thanks for this great project.
Replies: 4 comments 7 replies
-
Maybe to provide a little more context. I tried this:

    import com.databricks.dbutils_v1.DBUtilsHolder.dbutils

    class DBUtilsFilter(configuration: Configuration) extends PostProcessingFilter {
      def createExtraProperties() = {
        Map(
          "notebookName" -> dbutils.notebook.getContext.notebookPath
        )
      }
      override def processExecutionEvent(event: ExecutionEvent, ctx: HarvestingContext): ExecutionEvent =
        event.withAddedExtra(createExtraProperties())
      override def processExecutionPlan(plan: ExecutionPlan, ctx: HarvestingContext): ExecutionPlan =
        plan.withAddedExtra(createExtraProperties())
      override def processReadOperation(op: ReadOperation, ctx: HarvestingContext): ReadOperation =
        op.withAddedExtra(createExtraProperties())
      override def processWriteOperation(op: WriteOperation, ctx: HarvestingContext): WriteOperation =
        op.withAddedExtra(createExtraProperties())
      override def processDataOperation(op: DataOperation, ctx: HarvestingContext): DataOperation =
        op.withAddedExtra(createExtraProperties())
    }

Compiled it with this SBT:

    scalaVersion := "2.12.14"
    name := "dataopsfilter"
    organization := "com.myorg"
    version := "1.0"
    // You can define other libraries as dependencies in your build like this:
    libraryDependencies += "org.scala-lang.modules" %% "scala-parser-combinators" % "1.1.2"
    libraryDependencies += "za.co.absa.spline.agent.spark" %% "agent-core" % "0.6.2"
    libraryDependencies += "com.databricks" % "dbutils-api_2.12" % "0.0.5"

Error:
-
I think the env variable doesn't work because the JVM process might already be running at the time you set the env var in the notebook, and thus it isn't visible to Spline. I'm not sure about the dbutils approach. Programmatic initialization would hardly help here, because neither of those cases is related to the time at which the filter constructor is called.

Try using system properties instead of environment variables:

    // in the notebook
    sys.props("NOTEBOOK_PATH") = ...

    // in the `createExtraProperties()` method
    val notebookPath = sys.props("NOTEBOOK_PATH")
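
To illustrate why the timing of the filter constructor does not matter when the property is read inside `createExtraProperties()`, here is a minimal plain-Scala sketch with no Spline dependency. The `Reader` class and the example path are made up for illustration; only `sys.props` comes from the suggestion above.

```scala
object PropertyTimingSketch {
  // Key name mirrors the NOTEBOOK_PATH used above.
  val Key = "NOTEBOOK_PATH"

  // Hypothetical stand-in for a filter: one lookup at construction, one per call.
  class Reader {
    // Evaluated once, when the instance is constructed.
    val atConstruction: Option[String] = sys.props.get(Key)

    // Evaluated on every call, so values set later are picked up.
    def atCallTime(): Option[String] = sys.props.get(Key)
  }

  def main(args: Array[String]): Unit = {
    sys.props -= Key                            // start from a clean state
    val reader = new Reader                     // constructed before the value exists
    sys.props(Key) = "/Shared/example-notebook" // made-up path, set afterwards

    println(reader.atConstruction) // None
    println(reader.atCallTime())   // Some(/Shared/example-notebook)
  }
}
```

The lookup done per call sees the value even though the instance was created earlier, which is the behaviour a filter needs when the notebook sets the property after Spline has been initialized.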
-
One note about the configuration: the Configuration injected into the filter via the constructor is always only the subset of the configuration that falls under the filter's namespace.

For example, this is the configuration for DataSourcePasswordReplacingFilter:

    spline.postProcessingFilter.dsPasswordReplace.className=za.co.absa.spline.harvester.postprocessing.DataSourcePasswordReplacingFilter
    spline.postProcessingFilter.dsPasswordReplace.replacement=*****

and in the filter constructor I would get the replacement string like this:

    def this(conf: Configuration) = this(
      conf.getRequiredString("replacement")
    )

In other words: only keys starting with spline.postProcessingFilter.dsPasswordReplace. are available inside the filter.

You can see the configuration in …

The configuration must be set before the filter is initialized.
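
The namespacing described above can be pictured with a plain Scala `Map`. This is only a sketch of the idea, not Spline's actual implementation: the `subset` helper and the `spline.some.other.key` entry are hypothetical.

```scala
object ConfigSubsetSketch {
  // Hypothetical helper: keep only keys under `namespace` and strip that prefix.
  def subset(all: Map[String, String], namespace: String): Map[String, String] = {
    val prefix = namespace + "."
    all.collect {
      case (key, value) if key.startsWith(prefix) => key.stripPrefix(prefix) -> value
    }
  }

  def main(args: Array[String]): Unit = {
    val all = Map(
      "spline.postProcessingFilter.dsPasswordReplace.className" ->
        "za.co.absa.spline.harvester.postprocessing.DataSourcePasswordReplacingFilter",
      "spline.postProcessingFilter.dsPasswordReplace.replacement" -> "*****",
      "spline.some.other.key" -> "ignored" // made-up key outside the filter namespace
    )

    val filterConf = subset(all, "spline.postProcessingFilter.dsPasswordReplace")

    println(filterConf.get("replacement"))           // Some(*****)
    println(filterConf.get("spline.some.other.key")) // None
  }
}
```

Under this reading, a value meant for a custom filter would have to be stored under that filter's own `spline.postProcessingFilter.<name>.` prefix to be visible in its constructor.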
-
Thanks so much! Yes, properties worked just fine! I did something like this in the filter:

    class PropertyFilter(configuration: Configuration) extends PostProcessingFilter {
      def createExtraProperties() = {
        Map(
          "notebookName" -> System.getProperty("spline.config.notebook")
        )
      }
      override def processExecutionEvent(event: ExecutionEvent, ctx: HarvestingContext): ExecutionEvent =
        event.withAddedExtra(createExtraProperties())
      override def processExecutionPlan(plan: ExecutionPlan, ctx: HarvestingContext): ExecutionPlan =
        plan.withAddedExtra(createExtraProperties())
      override def processReadOperation(op: ReadOperation, ctx: HarvestingContext): ReadOperation =
        op.withAddedExtra(createExtraProperties())
      override def processWriteOperation(op: WriteOperation, ctx: HarvestingContext): WriteOperation =
        op.withAddedExtra(createExtraProperties())
      override def processDataOperation(op: DataOperation, ctx: HarvestingContext): DataOperation =
        op.withAddedExtra(createExtraProperties())
    }
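
One caveat with the filter above: `System.getProperty` returns `null` when the property has not been set, so a job started outside a notebook would attach a `null` value. A possible null-safe variant of `createExtraProperties()` is sketched below; the `NotebookExtras` object is hypothetical, and only the property key is taken from the filter.

```scala
object NotebookExtras {
  // Same key as in the filter above.
  val PropertyKey = "spline.config.notebook"

  // Option(...) maps a null result to None, so the entry is left out
  // entirely when the property has not been set.
  def createExtraProperties(): Map[String, String] =
    Option(System.getProperty(PropertyKey))
      .fold(Map.empty[String, String])(path => Map("notebookName" -> path))
}
```

Whether an absent key or an explicit placeholder value is preferable depends on how the extra properties are consumed downstream.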
And in the notebook I have a command that sets the property like this: