Are you willing to submit PR?
What is the problem?
The Spark Declarative Pipelines programming guide does not explain how datasets are stored and refreshed internally. Key information missing includes:
- Default table format: SDP creates tables using Spark's default format (`parquet`, via `spark.sql.sources.default`), but this is not documented. Users don't know what format their tables will be in or how to change it.
- Materialized view refresh behavior: Materialized views perform a full recomputation (TRUNCATE + append) on every pipeline run. This is fundamentally different from database-native materialized views (e.g., PostgreSQL) that support incremental refresh. Users need to understand this to plan for performance and cost.
- Streaming table checkpoint requirement: Streaming tables require a checkpoint directory on a Hadoop-compatible file system, but the relationship between the `storage` field in `spark-pipeline.yml` and checkpoint behavior is not explained.
- Full refresh semantics: The `--full-refresh` / `--full-refresh-all` CLI options are documented, but their actual effect on each dataset type is not described.
How to reproduce
Read the current programming guide and try to answer: "What format are my tables stored in?" or "What happens to my materialized view data on each pipeline run?"
What is the expected behavior?
The programming guide should include a section explaining how datasets are stored and refreshed, covering table format configuration, materialized view refresh mechanics, streaming table checkpoint requirements, and full refresh behavior.
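Such a section could anchor the checkpoint explanation to the pipeline spec itself. The sketch below is hypothetical: only the `storage` field is taken from this issue, and the remaining keys are illustrative assumptions, not documented behavior:

```yaml
# Hypothetical spark-pipeline.yml sketch.
name: sales_pipeline
# Root directory for pipeline state; streaming-table checkpoints are
# presumed to live under this path, which must be on a
# Hadoop-compatible file system.
storage: hdfs:///pipelines/sales
definitions:
  - glob:
      include: transformations/**
```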