Releases: dagster-io/dagster
0.7.5
New
- Added the `IntSource` type, which lets integers be set from environment variables in config.
- You may now set tags on pipeline definitions. These will resolve in the following cases:
  - Loading in the playground view in Dagit will pre-populate the tag container.
  - Loading partition sets from the preset/config picker will pre-populate the tag container with
    the union of pipeline tags and partition tags, with partition tags taking precedence.
  - Executing from the CLI will generate runs with the pipeline tags.
  - Executing programmatically using the `execute_pipeline` API will create a run with the union
    of pipeline tags and `RunConfig` tags, with `RunConfig` tags taking precedence.
  - Scheduled runs (both launched and executed) will have the union of pipeline tags and the
    schedule tags function, with the schedule tags taking precedence.
- Output materialization configs may now yield multiple Materializations, and the tutorial has
  been updated to reflect this.
- We now export the `SolidExecutionContext` in the public API so that users can correctly type hint
  solid compute functions.
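
The tag precedence rules above amount to a dictionary union in which the more specific source wins on key conflicts. The following is a minimal sketch in plain Python, not Dagster's implementation; `merge_tags` and the example tag keys are hypothetical names used only for illustration.

```python
def merge_tags(pipeline_tags, override_tags):
    """Return the union of both tag dicts; override_tags win on key conflicts."""
    merged = dict(pipeline_tags)  # start from the pipeline-level tags
    merged.update(override_tags)  # run, partition, or schedule tags take precedence
    return merged


# Hypothetical tags: the run-level "priority" overrides the pipeline-level one.
pipeline_tags = {"team": "data", "priority": "low"}
run_tags = {"priority": "high"}

print(merge_tags(pipeline_tags, run_tags))
```

The same helper covers the partition and schedule cases by passing the partition or schedule tags as `override_tags`.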
Dagit
- Pipeline run tags are now preserved when resuming/retrying from Dagit.
- Scheduled run stats are now grouped by partition.
- A "preparing" section has been added to the execution viewer. This shows steps that are in
  the process of starting execution.
- Markers emitted by the underlying execution engines are now visualized in the Dagit execution
  timeline.
Bugfix
- Resume/retry now works as expected in the presence of solids that yield optional outputs.
- Fixed an issue where dagster-celery workers were failing to start in the presence of config
  values that were `None`.
- Fixed an issue with attempting to set `threads_per_worker` on Dask distributed clusters.
dagster-postgres
- All postgres config may now be set using environment variables in config.
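
As a rough illustration of what setting config from environment variables involves, the sketch below resolves a config value that is either a literal or a reference to an environment variable. It assumes an `{"env": "VAR_NAME"}` selector form. `resolve_source`, `EXAMPLE_PG_PORT`, and the config keys are hypothetical, and this is not the dagster-postgres implementation.

```python
import os


def resolve_source(value, cast=str):
    """Resolve a literal value or an {"env": NAME} reference, then cast it."""
    if isinstance(value, dict) and set(value) == {"env"}:
        name = value["env"]
        if name not in os.environ:
            raise KeyError("environment variable {} is not set".format(name))
        value = os.environ[name]
    return cast(value)


os.environ["EXAMPLE_PG_PORT"] = "5432"  # hypothetical variable, set here for the demo
config = {"hostname": "localhost", "port": {"env": "EXAMPLE_PG_PORT"}}

hostname = resolve_source(config["hostname"])   # literal string passes through
port = resolve_source(config["port"], cast=int)  # read from the environment, cast to int
```

Failing loudly on an unset variable, rather than silently falling back, keeps a missing secret from turning into a confusing connection error later.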
dagster-aws
- The `s3_resource` now exposes a `list_objects_v2` method corresponding to the underlying boto3
  API. (Thanks, @basilvetas!)
- Added the `redshift_resource` to access Redshift databases.
dagster-k8s
- The `K8sRunLauncher` config now includes the `load_kubeconfig` and `kubeconfig_file` options.
Documentation
- Fixes and improvements.
Dependencies
- dagster-airflow no longer pins its werkzeug dependency.
Community
- We've added opt-in telemetry to Dagster so we can collect usage statistics in order to inform
  development priorities. Telemetry data will motivate projects such as adding features in
  frequently-used parts of the CLI and adding more examples in the docs in areas where users
  encounter more errors.

  We will not see or store solid definitions (including generated context) or pipeline definitions
  (including modes and resources). We will not see or store any data that is processed within solids
  and pipelines.

  If you'd like to opt in to telemetry, please add the following to `$DAGSTER_HOME/dagster.yaml`:

      telemetry:
        enabled: true

- Thanks to @basilvetas and @hspak for their contributions!
0.7.4
New
- It is now possible to use Postgres to back schedule storage by configuring
  `dagster_postgres.PostgresScheduleStorage` on the instance.
- Added the `execute_pipeline_with_mode` API to allow executing a pipeline in test with a specific
  mode without having to specify `RunConfig`.
- Experimental support for retries in the Celery executor.
- It is now possible to set run-level priorities for backfills run using the Celery executor by
  passing `--celery-base-priority` to `dagster pipeline backfill`.
- Added the `@weekly` schedule decorator.
Deprecations
- The `dagster-ge` library has been removed from this release due to drift from the underlying
  Great Expectations implementation.
dagster-pandas
- `PandasColumn` now includes an `is_optional` flag, replacing the previous
  `ColumnExistsConstraint`.
- You can now pass the `ignore_missing_values` flag to `PandasColumn` in order to apply column
  constraints only to the non-missing rows in a column.
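
Applying a constraint only to the non-missing rows can be pictured with a small plain-Python sketch. It is not the dagster-pandas implementation: `is_missing` and `column_satisfies` are hypothetical helpers, and treating a missing row as a failure when the flag is off is an assumption made for the example.

```python
import math


def is_missing(value):
    """Treat None and float NaN as missing."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def column_satisfies(values, predicate, ignore_missing_values=False):
    """Check predicate on every row, optionally skipping missing rows."""
    for value in values:
        if is_missing(value):
            if ignore_missing_values:
                continue  # constraint is not applied to missing rows
            return False  # assumption: a missing row fails the constraint
        if not predicate(value):
            return False
    return True


def positive(value):
    return value > 0


column = [1.5, None, 3.0, float("nan")]
strict = column_satisfies(column, positive)
lenient = column_satisfies(column, positive, ignore_missing_values=True)
```

With the flag off the missing rows cause the check to fail; with it on, only `1.5` and `3.0` are tested.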
dagster-k8s
- The Helm chart now includes provision for an Ingress and for multiple Celery queues.
Documentation
- Improvements and fixes.
0.7.3
New
- It is now possible to configure a Dagit instance to disable executing pipeline runs in a local
  subprocess.
- Resource initialization, teardown, and associated failure states now emit structured events
  visible in Dagit. Structured events for pipeline errors and multiprocess execution have been
  consolidated and rationalized.
- Support Redis queue provider in `dagster-k8s` Helm chart.
- Support external postgresql in `dagster-k8s` Helm chart.
Bugfix
- Fixed an issue with inaccurate timings on some resource initializations.
- Fixed an issue that could cause the multiprocess engine to spin forever.
- Fixed an issue with default value resolution when a config value was set using `SourceString`.
- Fixed an issue when loading logs from a pipeline belonging to a different repository in Dagit.
- Fixed an issue where the CLI command `dagster schedule up` would fail in certain scenarios
  with the `SystemCronScheduler`.
Pandas
- Column constraints can now be configured to permit NaN values.
Dagstermill
- Removed a spurious dependency on sklearn.
Docs
- Improvements and fixes to docs.
- Restored dagster.readthedocs.io.
Experimental
- An initial implementation of solid retries, throwing a `RetryRequested` exception, was added.
  This API is experimental and likely to change.
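
The general pattern of a step asking for a retry by raising an exception can be sketched in plain Python. The `RetryRequested` class below is a local stand-in defined for this example, not Dagster's class. `run_with_retries`, `max_retries`, and `flaky_step` are likewise hypothetical, and the real API may differ.

```python
class RetryRequested(Exception):
    """Local stand-in: raised by a step to ask the runner to try again."""


def run_with_retries(step, max_retries=3):
    """Call step(); re-run it each time it raises RetryRequested, up to max_retries."""
    attempts = 0
    while True:
        try:
            return step()
        except RetryRequested:
            attempts += 1
            if attempts > max_retries:
                raise  # retry budget exhausted, surface the request as a failure


calls = {"count": 0}


def flaky_step():
    """Fails twice, then succeeds on the third call."""
    calls["count"] += 1
    if calls["count"] < 3:
        raise RetryRequested()
    return "done"


result = run_with_retries(flaky_step)
```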
Other
- Renamed property `runtime_type` to `dagster_type` in definitions. The following are deprecated
  and will be removed in a future version.
  - `InputDefinition.runtime_type` is deprecated. Use `InputDefinition.dagster_type` instead.
  - `OutputDefinition.runtime_type` is deprecated. Use `OutputDefinition.dagster_type` instead.
  - `CompositeSolidDefinition.all_runtime_types` is deprecated. Use
    `CompositeSolidDefinition.all_dagster_types` instead.
  - `SolidDefinition.all_runtime_types` is deprecated. Use `SolidDefinition.all_dagster_types`
    instead.
  - `PipelineDefinition.has_runtime_type` is deprecated. Use `PipelineDefinition.has_dagster_type`
    instead.
  - `PipelineDefinition.runtime_type_named` is deprecated. Use
    `PipelineDefinition.dagster_type_named` instead.
  - `PipelineDefinition.all_runtime_types` is deprecated. Use
    `PipelineDefinition.all_dagster_types` instead.
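
A rename like this is commonly handled by keeping the old name as an alias that warns and delegates to the new one. The sketch below shows that general pattern using the standard `warnings` module. `InputDefinitionSketch` is a hypothetical class written for illustration and does not reflect how Dagster implements the deprecation.

```python
import warnings


class InputDefinitionSketch:
    """Hypothetical class for illustration; not Dagster's InputDefinition."""

    def __init__(self, dagster_type):
        self._dagster_type = dagster_type

    @property
    def dagster_type(self):
        """New, preferred property name."""
        return self._dagster_type

    @property
    def runtime_type(self):
        """Deprecated alias: warn, then delegate to the new property."""
        warnings.warn(
            "runtime_type is deprecated, use dagster_type instead",
            DeprecationWarning,
            stacklevel=2,  # point the warning at the caller, not this property
        )
        return self.dagster_type
```

Callers that still read `runtime_type` keep working while the warning tells them which name to migrate to.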
0.7.2
Docs
- New docs site at docs.dagster.io.
- dagster.readthedocs.io is currently stale due to availability issues.
New
- Improvements to S3 Resource. (Thanks @dwallace0723!)
- Better error messages in Dagit.
- Better font/styling support in Dagit.
- Changed `OutputDefinition` to take `is_required` rather than `is_optional` argument. This is to
  remain consistent with changes to `Field` in 0.7.1 and to avoid confusion
  with python's typing and dagster's definition of `Optional`, which indicates None-ability,
  rather than existence. `is_optional` is deprecated and will be removed in a future version.
- Added support for Flower in dagster-k8s.
- Added support for environment variable config in dagster-snowflake.
Bugfixes
- Improved performance in Dagit waterfall view.
- Fixed bug when executing solids downstream of a skipped solid.
- Improved navigation experience for pipelines in Dagit.
- Fixes for the dagster-aws CLI tool.
- Fixed an issue starting Dagit without DAGSTER_HOME set on Windows.
- Fixed pipeline subset execution in partition-based schedules.
0.7.1
Dagit
- Dagit now looks up an available port on which to run when the default port is
not available. (Thanks @rparrapy!)
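
One common way to implement this kind of fallback is to try binding the preferred port and, if that fails, ask the operating system for any free port by binding to port 0. The sketch below uses only the standard `socket` module. `find_available_port` is a hypothetical helper and not Dagit's code.

```python
import socket


def find_available_port(preferred, host="127.0.0.1"):
    """Return preferred if it can be bound, else a free port chosen by the OS."""
    for candidate in (preferred, 0):  # port 0 asks the OS for any free port
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
            try:
                sock.bind((host, candidate))
            except OSError:
                continue  # preferred port is taken, fall through to port 0
            return sock.getsockname()[1]
    raise RuntimeError("no available port found")
```

The probe socket is closed before the port number is returned, so another process could still claim the port before the server binds it. A production implementation would need to handle that race.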
dagster_pandas
- Hydration and materialization are now configurable on `dagster_pandas` dataframes.
dagster_aws
- The `s3_resource` no longer uses an unsigned session by default.
Bugfixes
- Type check messages are now displayed in Dagit.
- Failure metadata is now surfaced in Dagit.
- Dagit now correctly displays the execution time of steps that error.
- Error messages now appear correctly in console logging.
- GCS storage is now more robust to transient failures.
- Fixed an issue where some event logs could be duplicated in Dagit.
- Fixed an issue when reading config from an environment variable that wasn't set.
- Fixed an issue when loading a repository or pipeline from a file target on Windows.
- Fixed an issue where deleted runs could cause the scheduler page to crash in Dagit.
Documentation
- Expanded and improved docs and error messages.
Waiting To Exhale
🎆 🚢 🎆 Dagster 0.7.0: Waiting To Exhale 😤 😌 🍵
We are pleased to announce version 0.7.0 of Dagster, codenamed “Waiting To Exhale”. We set out to make Dagster a solution for production-grade pipelines on modern cloud infrastructure. In service of that goal, we needed to fill gaps and incorporate feedback from the community at large.
Our last release, 0.6.0, expanded Dagster from local developer experience to a hostable product, allowing for scheduling, execution, and monitoring of pipelines in the cloud.
This release goes further, supporting pipelines with 100s and 1000s of nodes, deployable to modern, scalable cloud infrastructure, with dramatically improved monitoring tools, as well as other features.
Given this, 0.7.0 introduces the following:
- Revamped, Scalable Dagit: A completely redesigned Dagit with a more intuitive navigation structure, beautiful look-and-feel, and massive performance improvements to handle pipelines with hundreds or even thousands of nodes.
- Execution Viewer: Executing and historical runs within Dagit use a new live-updating, queryable waterfall viewer. See below for a preview of the new UI:
https://media.giphy.com/media/Rhx6ujovXlvuKaLCGY/giphy.gif
- A Dagster-K8s library which provides the ability to launch runs in ephemeral Kubernetes Pods, as well as an early helm chart for executing pipelines.
- A Dagster-Celery library designed to work with K8s that provides global resource management using dedicated queues, and distributed execution of dagster pipelines across a cluster.
- Streamlined scheduler configuration and new backfill APIs and tools to help manage your scheduled workflows in production.
- A Dagster-Pandas integration that provides useful APIs for dataframe validation, summary statistics emission, and auto-documentation in dagit so that you can better understand and control how data flows through your pipelines.
- Redesigned documentation, examples, and guides to help flesh out the core ideas behind the system.
Warning
There are a substantial number of breaking changes in the 0.7.0 release. These changes affect the scheduler system, config system, required resources, and the type system. We apologize for the thrash, and thank you for bearing with us!
For more info on changes check out the following resources:
Changelog: https://github.com/dagster-io/dagster/blob/master/CHANGES.md
0.7.0 migration guide: https://github.com/dagster-io/dagster/blob/master/070_MIGRATION.md
0.4.0
API Changes
- There is now a new top-level configuration section `storage` which controls whether or not
  execution should store intermediate values and the history of pipeline runs on the filesystem,
  on S3, or in memory. The `dagster` CLI now includes options to list and wipe pipeline run
  history. Facilities are provided for user-defined types to override the default serialization
  used for storage.
- Similarly, there is a new configuration for `RunConfig` where the user can specify
  intermediate value storage via an API.
- `OutputDefinition` now contains an explicit `is_optional` parameter and defaults to being
  not optional.
- New functionality in `dagster.check`: `is_list`
- New functionality in `dagster.seven`: py23-compatible `FileNotFoundError`, `json.dump`,
  `json.dumps`.
- Dagster default logging is now multiline for readability.
- The `Nothing` type now allows dependencies to be constructed between solids that do not have
  data dependencies.
- Many error messages have been improved.
- `throw_on_user_error` has been renamed to `raise_on_error` in all APIs, public and private.
GraphQL
- The GraphQL layer has been extracted out of Dagit into a separate dagster-graphql package.
- `startSubplanExecution` has been replaced by `executePlan`.
- `startPipelineExecution` now supports reexecution of pipeline subsets.
Dagit
- It is now possible to reexecute subsets of a pipeline run from Dagit.
- Dagit's `Execute` tab now opens runs in separate browser tabs and a new `Runs` tab allows you to
  browse and view historical runs.
- Dagit no longer scaffolds configuration when creating new `Execute` tabs. This functionality will
  be refined and revisited in the future.
- Dagit's `Explore` tab is more performant on large DAGs.
- The `dagit -q` command line flag has been deprecated in favor of a separate command-line
  `dagster-graphql` utility.
- The execute button is now greyed out when Dagit is offline.
- The Dagit UI now includes more contextual cues to make the solid in focus and its connections
  more salient.
- Dagit no longer offers to open materializations on your machine. Clicking an on-disk
  materialization now copies the path to your clipboard.
- Pressing Ctrl-Enter now starts execution in Dagit's Execute tab.
- Dagit properly shows List and Nullable types in the DAG view.
Dagster-Airflow
- Dagster-Airflow includes functions to dynamically generate containerized (`DockerOperator`-based)
  and uncontainerized (`PythonOperator`-based) Airflow DAGs from Dagster pipelines and config.
Libraries
- Dagster integration code with AWS, Great Expectations, Pandas, Pyspark, Snowflake, and Spark
  has been reorganized into a new top-level libraries directory. These modules are now
  importable as `dagster_aws`, `dagster_ge`, `dagster_pandas`, `dagster_pyspark`,
  `dagster_snowflake`, and `dagster_spark`.
- Removed dagster-sqlalchemy and dagma.
Examples
- Added the event-pipeline-demo, a realistic web event data pipeline using Spark and Scala.
- Added the Pyspark pagerank example, which demonstrates how to incrementally introduce dagster
into existing data processing workflows.
Documentation
- Docs have been expanded, reorganized, and reformatted.
0.2.8.post3
Hotfix to not put config values in error messages. Had to re-release because of packaging errors in the upload to PyPI (.pyc files or similar were included).
v.0.2.8.post0
Pushing an update because dagit 0.2.8 was getting out-of-date code.