Is your feature request related to a problem?
The defaults for `concat` are excessively permissive: `data_vars="all", coords="different", compat="no_conflicts", join="outer"`. This comment illustrates why this can be hard to predict or understand: a seemingly unrelated option, `decode_cf`, controls whether a variable is in `data_vars` or `coords`, and can result in wildly different concatenation behaviour.
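As a rough illustration of that coupling (a constructed example, not taken from the linked comment; the `temp`/`height` variables are made up), the same variable ends up in `data_vars` or `coords` depending on whether CF decoding runs:

```python
import xarray as xr

# A dataset roughly as it would look with decode_cf=False:
# "height" is a plain data variable, referenced by a CF "coordinates" attribute.
raw = xr.Dataset(
    {
        "temp": (("time",), [1.0, 2.0], {"coordinates": "height"}),
        "height": ((), 10.0),
    },
    coords={"time": [0, 1]},
)

decoded = xr.decode_cf(raw)  # roughly what decode_cf=True does on open

print("height" in raw.data_vars)   # True -> concat handles it under data_vars="all"
print("height" in decoded.coords)  # True -> concat handles it under coords="different"
```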
- This always concatenates `data_vars` along `concat_dim`, even if they did not have that dimension to begin with.
- If the same coordinate variable exists in different datasets/files, the copies are sequentially compared for equality to decide whether they get concatenated.
- The outer join (applied along all dimensions that are not `concat_dim`) can result in very large datasets due to small floating point differences in the indexes, and also in questionable behaviour with staggered grid datasets.
- `"no_conflicts"` basically picks the first non-NaN value after aligning all datasets, but is quite slow (we should be using `duck_array_ops.nanfirst` here, I think).
While "convenient" this really just makes the default experience quite bad with hard-to-understand slowdowns.
Describe the solution you'd like
I propose we migrate to `data_vars="minimal", coords="minimal", join="exact", compat="override"`. This should:

- only concatenate `data_vars` and `coords` variables when they already have `concat_dim`;
- for any variables that do not have `concat_dim`, blindly pick them from the first file;
- prevent ballooning of dimension sizes due to floating point inequalities, thanks to `join="exact"`;
- totally avoid any data reads unless explicitly requested by the user.
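Spelled out with today's keyword arguments (reusing the made-up `ds1`/`ds2` from the sketch above), the proposed behaviour looks like this:

```python
# The nearly-equal "x" indexes now raise instead of silently growing the dataset:
try:
    xr.concat(
        [ds1, ds2],
        dim="time",
        data_vars="minimal",  # only concatenate variables that already have "time"
        coords="minimal",
        join="exact",         # error on mismatched non-concat indexes
        compat="override",    # take non-concatenated variables from the first dataset
    )
except ValueError as err:
    print(err)  # points at the differing "x" index

# With consistent coordinates, variables without "time" are left untouched:
out = xr.concat(
    [ds1, ds2.assign_coords(x=ds1["x"])],
    dim="time",
    data_vars="minimal", coords="minimal", join="exact", compat="override",
)
print(out["station_name"].dims)  # () -> taken from ds1, never expanded along "time"
```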
Unfortunately, this has a pretty big blast radius, so we'd need a long deprecation cycle.
Describe alternatives you've considered
No response
Additional context
xref #4824
xref #1385
xref #8231
xref #5381
xref #2064
xref #2217