-
Notifications
You must be signed in to change notification settings - Fork 3.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
WIP Breaking Change for DataValidation #4240
Draft
oleibman
wants to merge
5
commits into
PHPOffice:master
Choose a base branch
from
oleibman:issue797
base: master
Could not load branches
Branch not found: {{ refName }}
Loading
Could not load tags
Nothing to show
Loading
Are you sure you want to change the base?
Some commits from the old base branch may be removed from the timeline,
and old review comments may become outdated.
Draft
+211
−112
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Fix PHPOffice#797 (marked stale in 2018, but now reopened). Fix PHPOffice#4091. DataValidation is specified in the Xml at the sheet level rather than the cell level. PhpSpreadsheet, however, currently requires a cell and a data validation object for each cell in a range. As a consequence, it may exhaust memory allocating objects that are never really needed. If it reads a spreadsheet which applies DataValidation to an entire column, the reader creates a million cells and a million DataValidation objects. An additional problem is that, when it saves the spreadsheet, it creates a row entry for each of the million rows even if they contain no data. The code is changed to locate DataValidation for a cell by running the DataValidation list to see if the Sqref for any entry matches the cell in question. This eliminates the need to require a DataValidation entry for each cell; indeed, it eliminates the need to create a cell just because it is subject to DataValidation. Initial tests are very encouraging. The memory-exhausting spreadsheet from 797 required over 1GB of memory. With this change in place, that is reduced to 6MB, and DataValidation works as expected for all cells in the range. This change is a work in progress. It needs formal tests. I need to see how it works with inserting or removing rows and columns. I need to see how Excel handles conflicting entries. My early tests indicate that it at least doesn't do anything irrational (like claiming corruption) when dealing with conflicts. It appears that, if I have a DataValidation applied to column A, and another applied to cell A5, whichever appears first in the Xml is the ultimate arbiter of what rule applies to A5. If this change is merged, it will be a breaking range. It will require a new major release (probably PhpSpreadsheet 4.0), and will not happen for at least a couple of months, and probably later.
Existing code for ReferenceHelper adjusts sqref for DataValidation when rows/columns inserted/deleted, but does not adjust formula (where list may be expressed as a range). That is a bug, and I think it needs to be fixed as part of this PR. |
Also add first tests that DV formulas are updated correctly after row/column insert/delete. Many more tests are still needed.
oleibman
added a commit
to oleibman/PhpSpreadsheet
that referenced
this pull request
Dec 1, 2024
This has been requested a few times, most recently issue PHPOffice#4092. Because it's a breaking change, I haven't proceeded with it. But, because I have a breaking change PR PHPOffice#4240 already in the queue, this gives a plan for getting where we want to go (under the extremely likely assumption that most users don't deal with Csv files with Mac line endings). This PR doesn't change the current behavior, but it gets us to a state where a single-line change will be sufficient when the time comes for a new major release.
12 tasks
oleibman
added a commit
to oleibman/PhpSpreadsheet
that referenced
this pull request
Dec 1, 2024
When Html Writer outputs a cell with a boolean value, the result will either be 1 or null-string; neither of these is optimal for anyone looking at the resulting html. Html Reader already has the ability to recognize data types using the html `data-type` attribute, but Html Writer doesn't use it. This PR adds the ability to generate that attribute for booleans. It will generate a string value appropriate for the locale when it encounters a boolean. Html Reader, when it encounters `data-type="b"`, will interpret the result as true if the value is 1 or a string value recognized as true in any locale; it will interpret the result as false if the value is 0, null-string, null, or a string value recognized as false in any locale; if none of the above, it will leave the value as an unchanged string. So, Reader will wind up with the correct result even if its locale is different than what Writer used. Because this is a breaking change, it is opt-in. You need to call `Writer::setBetterBoolean(true)` in order for it take effect. The current default value for that property is false. When it is time to introduce breaking changes (see PR PHPOffice#4240), the default will be changed to true.
11 tasks
oleibman
added a commit
to oleibman/PhpSpreadsheet
that referenced
this pull request
Dec 11, 2024
Fix PHPOffice#4269. In response to issue PHPOffice#456 PR PHPOffice#515, `forceFullCalc` was added to workbook.xml whenever `preCalculateFormulas` was set to false. It is not clear why this should have been needed; attempts to reproduce the error in the original 6.5-year-old issue are unable to reproduce it today. Nevertheless, it is, or was, there for a reason. Today, the forceFullCalc option sets an option where formulas *might* not be recalculated when a cell used in the formula changes. I have not succeeded in finding a situation where it doesn't automatically recalculate, but it probably exists on complicated spreadsheets. To overcome this possibility, Excel offers a button which can be used to recalculate on demand. By itself, this might not be a terrible problem. However, it seems to come with the strange property that any spreadsheets opened at the same time as a forceFullCalc spreadsheet operate as if they too specified forceFullCalc. That *is* a problem, especially since users when closing a spreasheet affected in this way will be prompted to save it even when they haven't changed anything. I am not willing to make a BC change at this time, although I might consider it in future (PR PHPOffice#4240). For now, I am adding a new property `forceFullCalc` with setter (no getter needed) to Xlsx Writer. That property can be `null` (default, in which case the Xml attribute is set as today), or `false` or `true` (in which case the Xml attribute will be set to the Writer attribute). I think that, when `preCalculateFormulas` is set to false, the calling application should give consideration to setting `forceFullCalc` to false as well. All other situations should just use the default.
11 tasks
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Fix #797 (marked stale in 2018, but now reopened). Fix #4091. DataValidation is specified in the Xml at the sheet level rather than the cell level. PhpSpreadsheet, however, currently requires a cell and a data validation object for each cell in a range. As a consequence, it may exhaust memory allocating objects that are never really needed. If it reads a spreadsheet which applies DataValidation to an entire column, the reader creates a million cells and a million DataValidation objects. An additional problem is that, when it saves the spreadsheet, it creates a row entry for each of the million rows even if they contain no data.
The code is changed to locate DataValidation for a cell by running the DataValidation list to see if the Sqref for any entry matches the cell in question. This eliminates the need to require a DataValidation entry for each cell; indeed, it eliminates the need to create a cell just because it is subject to DataValidation. Initial tests are very encouraging. The memory-exhausting spreadsheet from 797 required over 1GB of memory. With this change in place, that is reduced to 6MB, and DataValidation works as expected for all cells in the range.
This change is a work in progress. It needs formal tests. I need to see how it works with inserting or removing rows and columns. I need to see how Excel handles conflicting entries. My early tests indicate that it at least doesn't do anything irrational (like claiming corruption) when dealing with conflicts. It appears that, if I have a DataValidation applied to column A, and another applied to cell A5, whichever appears first in the Xml is the ultimate arbiter of what rule applies to A5.
If this change is merged, it will be a breaking range. It will require a new major release (probably PhpSpreadsheet 4.0), and will not happen for at least a couple of months, and probably later.
This is:
Checklist:
Why this change is needed?
Provide an explanation of why this change is needed, with links to any Issues (if appropriate).
If this is a bugfix or a new feature, and there are no existing Issues, then please also create an issue that will make it easier to track progress with this PR.