
Commit 2330033

[SDP] Python decorators
1 parent: f40aaad

2 files changed (+89, -13 lines)


docs/declarative-pipelines/PipelinesHandler.md

Lines changed: 1 addition & 1 deletion
@@ -51,7 +51,7 @@ handlePipelinesCommand(

[handlePipelinesCommand](#handlePipelinesCommand)...FIXME

-### DEFINE_DATASET { #DEFINE_DATASET }
+### <span id="DefineDataset"> DEFINE_DATASET { #DEFINE_DATASET }

[handlePipelinesCommand](#handlePipelinesCommand) prints out the following INFO message to the logs:

docs/declarative-pipelines/index.md

Lines changed: 88 additions & 12 deletions
@@ -132,41 +132,117 @@ from pyspark import pipelines as dp

Declarative Pipelines uses the following [Python decorators](https://peps.python.org/pep-0318/) to describe tables and views:

-* [@dp.append_flow](#append_flow)
-* [create_streaming_table](#create_streaming_table)
-* [@dp.materialized_view](#materialized_view) for materialized views
-* [@dp.table](#table) for streaming and batch tables
-* [@dp.temporary_view](#temporary_view)
+* [@dp.append_flow](#append_flow) for append-only flows
+* [@dp.create_streaming_table](#create_streaming_table) for streaming tables
+* [@dp.materialized_view](#materialized_view) for materialized views (with supporting flows)
+* [@dp.table](#table) for streaming and batch tables (with supporting flows)
+* [@dp.temporary_view](#temporary_view) for temporary views (with supporting flows)

### @dp.append_flow { #append_flow }

```py
append_flow(
-*,
-target: str,
-name: Optional[str] = None,
-spark_conf: Optional[Dict[str, str]] = None,
+    *,
+    target: str,
+    name: Optional[str] = None,
+    spark_conf: Optional[Dict[str, str]] = None,
) -> Callable[[QueryFunction], None]
```

-`append_flow` defines a flow in a pipeline.
+Registers an (append) flow in a pipeline.

`target` is the name of the dataset (_destination_) this flow writes to.

-Sends a `DefineFlow` pipeline command for execution (to be handled by [PipelinesHandler](PipelinesHandler.md#DefineFlow) on a Spark Connect server).
+Sends a `DefineFlow` pipeline command for execution (to [PipelinesHandler](PipelinesHandler.md#DefineFlow) on a Spark Connect server).
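
For illustration, a minimal sketch of an append flow, assuming the `pyspark.pipelines` API shown above (the `events` target, the `raw_events` source, and the flow name are made up):

```py
from pyspark import pipelines as dp
from pyspark.sql import SparkSession

# Assumes the pipeline runtime provides an active Spark (Connect) session.
spark = SparkSession.getActiveSession()

# A streaming table (hypothetical name) for the flow to append to.
dp.create_streaming_table("events")

# All parameters of append_flow are keyword-only; target is required.
@dp.append_flow(target="events", name="events_from_raw")
def events_from_raw():
    # The decorated query function returns the DataFrame whose rows
    # are appended to the target dataset.
    return spark.readStream.table("raw_events")
```
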

### @dp.create_streaming_table { #create_streaming_table }

+```py
+create_streaming_table(
+    name: str,
+    *,
+    comment: Optional[str] = None,
+    table_properties: Optional[Dict[str, str]] = None,
+    partition_cols: Optional[List[str]] = None,
+    schema: Optional[Union[StructType, str]] = None,
+    format: Optional[str] = None,
+) -> None
+```
+
+Registers a `StreamingTable` dataset in a pipeline.
+
+Sends a `DefineDataset` pipeline command for execution (to [PipelinesHandler](PipelinesHandler.md#DefineDataset) on a Spark Connect server).
+
+`dataset_type` is `TABLE`.
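
A usage sketch, assuming the signature above; the table name, schema, and properties are made up. Since `create_streaming_table` returns `None` (no flow is registered), such a table would typically be populated by a separate [@dp.append_flow](#append_flow):

```py
from pyspark import pipelines as dp

# name is the only positional parameter; the rest are keyword-only.
dp.create_streaming_table(
    "sensor_readings",  # hypothetical table name
    comment="Raw sensor readings",
    table_properties={"quality": "bronze"},
    partition_cols=["ingest_date"],
    schema="id BIGINT, ingest_date DATE, reading DOUBLE",  # DDL string (or a StructType)
    format="parquet",
)
```
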
### @dp.materialized_view { #materialized_view }

-Creates a [MaterializedView](MaterializedView.md) (for a table whose contents are defined to be the result of a query).
+```py
+materialized_view(
+    query_function: Optional[QueryFunction] = None,
+    *,
+    name: Optional[str] = None,
+    comment: Optional[str] = None,
+    spark_conf: Optional[Dict[str, str]] = None,
+    table_properties: Optional[Dict[str, str]] = None,
+    partition_cols: Optional[List[str]] = None,
+    schema: Optional[Union[StructType, str]] = None,
+    format: Optional[str] = None,
+) -> Union[Callable[[QueryFunction], None], None]
+```
+
+Registers a [MaterializedView](MaterializedView.md) dataset and the corresponding `Flow` in a pipeline.
+
+For the `MaterializedView`, `materialized_view` sends a `DefineDataset` pipeline command for execution (to [PipelinesHandler](PipelinesHandler.md#DefineDataset) on a Spark Connect server).
+
+* `dataset_type` is `MATERIALIZED_VIEW`.
+
+For the `Flow`, `materialized_view` sends a `DefineFlow` pipeline command for execution (to [PipelinesHandler](PipelinesHandler.md#DefineFlow) on a Spark Connect server).
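
Given the `Union` return type above, the decorator appears to support both bare and parameterized use. A sketch with made-up dataset and source names:

```py
from pyspark import pipelines as dp
from pyspark.sql import SparkSession

# Assumes the pipeline runtime provides an active session.
spark = SparkSession.getActiveSession()

# Bare form: the decorated function is the query function,
# and its name would become the dataset name.
@dp.materialized_view
def daily_totals():
    return spark.read.table("orders").groupBy("order_date").sum("amount")

# Parameterized form: returns a decorator for the query function.
@dp.materialized_view(name="totals_by_region", comment="Totals per region")
def totals_by_region():
    return spark.read.table("orders").groupBy("region").sum("amount")
```
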

### @dp.table { #table }

+```py
+table(
+    query_function: Optional[QueryFunction] = None,
+    *,
+    name: Optional[str] = None,
+    comment: Optional[str] = None,
+    spark_conf: Optional[Dict[str, str]] = None,
+    table_properties: Optional[Dict[str, str]] = None,
+    partition_cols: Optional[List[str]] = None,
+    schema: Optional[Union[StructType, str]] = None,
+    format: Optional[str] = None,
+) -> Union[Callable[[QueryFunction], None], None]
+```
+
+Registers a `StreamingTable` dataset and the corresponding `Flow` in a pipeline.
+
+For the `StreamingTable`, `table` sends a `DefineDataset` pipeline command for execution (to [PipelinesHandler](PipelinesHandler.md#DefineDataset) on a Spark Connect server).
+
+`dataset_type` is `TABLE`.
+
+For the `Flow`, `table` sends a `DefineFlow` pipeline command for execution (to [PipelinesHandler](PipelinesHandler.md#DefineFlow) on a Spark Connect server).
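
A sketch along the same lines, with made-up table and source names; per the list above, `@dp.table` covers both streaming and batch tables, so the kind of flow would follow from the DataFrame the query function returns:

```py
from pyspark import pipelines as dp
from pyspark.sql import SparkSession

# Assumes the pipeline runtime provides an active session.
spark = SparkSession.getActiveSession()

@dp.table(
    name="clean_events",  # hypothetical; presumably defaults to the function name
    partition_cols=["event_date"],
    table_properties={"quality": "silver"},
)
def clean_events():
    # A streaming read makes this behave as a streaming table;
    # spark.read.table(...) would back it with a batch flow instead.
    return spark.readStream.table("events").where("value IS NOT NULL")
```
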
### @dp.temporary_view { #temporary_view }

+```py
+temporary_view(
+    query_function: Optional[QueryFunction] = None,
+    *,
+    name: Optional[str] = None,
+    comment: Optional[str] = None,
+    spark_conf: Optional[Dict[str, str]] = None,
+) -> Union[Callable[[QueryFunction], None], None]
+```
+
[Registers](GraphElementRegistry.md#register_dataset) a `TemporaryView` dataset and a [Flow](Flow.md) in the [GraphElementRegistry](GraphElementRegistry.md#register_flow).

+For the `TemporaryView`, `temporary_view` sends a `DefineDataset` pipeline command for execution (to [PipelinesHandler](PipelinesHandler.md#DefineDataset) on a Spark Connect server).
+
+`dataset_type` is `TEMPORARY_VIEW`.
+
+For the `Flow`, `temporary_view` sends a `DefineFlow` pipeline command for execution (to [PipelinesHandler](PipelinesHandler.md#DefineFlow) on a Spark Connect server).
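
A final sketch, assuming the signature above (names made up). Note that `temporary_view` takes no storage-related parameters (`table_properties`, `partition_cols`, `schema`, `format`), unlike the table decorators:

```py
from pyspark import pipelines as dp
from pyspark.sql import SparkSession

# Assumes the pipeline runtime provides an active session.
spark = SparkSession.getActiveSession()

@dp.temporary_view(comment="Deduplicated customers for joins within this pipeline")
def customers_lookup():
    return spark.read.table("customers").dropDuplicates(["customer_id"])
```
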
## SQL

Spark Declarative Pipelines supports the SQL language to define data processing pipelines.
