-
Notifications
You must be signed in to change notification settings - Fork 115
Add dev config for nested column support #466
Conversation
@andrei-ionescu I created a temp config for disabling nested column support (for v0.5 release). |
@sezruby Please provide a clear explanation of the reason to add this flag. Is there a technical reason to add it and have this "nested column based index creation feature" disabled by default? In the project are other features that are partially supported and I don't understand why this is different and needs to have this new flag. By the way, adding this flag cannot stop one to set it and have the nested column based index creation enabled. One more thing, if such a nested column based index is created, when they'll try to use it, it will not be applied. The results will be correct and impact is only that performance wise Hyperspace does not provide enough performance in this case. The user may just open some tickets in regards to it. On the other hand, I offered my time to work on adding the full nested field support into Hyperspace for v0.5 but due to different priorities and lack of time from your team, the nested field support had to be postponed and I agreed with it, but with this PR I'm under the impression that this feature is in your way and needs to be removed (more work from my side to put it back). Maybe I'm getting it wrong and please correct me if so.
|
@andrei-ionescu Sorry for the lack of explanation. We need to release v0.5 this month to deliver some bug fixes. It's ok to deliver incomplete PRs without the temp config while developing, but for release, we need to disable (or exclude) incomplete features to avoid unnecessary issue handling. It's a temporal / hidden config, so I don't expect anyone will use the config. |
@sezruby The feature of creating an index over a nested field is also hidden, the users may not use it at all. And this is the way Hyperspace did work before I added it. Before adding this functionality I remember that there hasn't been any error specific to the fact that nested fields are not supported - it just did not work. From my perspective with or without it Hyperspace works the same as before. So, let me see if I got right what you're saying:
My questions are:
@rapoth, @imback82: I may need you input here are there are some product related questions that I would like to get some answers. Thank you. |
1=>
in current master:
with this fix:
From this result, I'll
IMO quick failure and a temp config for incomplete / unstable features are a kind of best practice for development. 2=>
This PR doesn't mean that Hyperspace won't support nested field at all in the future, and there's no such decision between our team. This is a quick fix for release, just because I strongly thought that we should disable incomplete features before release to avoid any confusion. It was my mistake that I didn't inform you first before this PR and give enough information about it. 3=> 4=> @rapoth and @imback82 are OOO until next week / next month, so there will be some delay on response. |
Having a feature toggle for a non-trivial under-tested new feature is a good thing. Maybe I should make one for DataSkippingIndex, too 😄 The PR message can be reworded like "Add a feature toggle for nested columns". I think we can drop "temp" in the config key, and make the feature fully pluggable at will. Or maybe "dev" instead of "temp". |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
# Conflicts: # src/test/scala/com/microsoft/hyperspace/index/CreateIndexNestedTest.scala
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is no need to add a flag to disable this functionality. Just check if there is any nested field in the config and throw the needed exception.
Tests can be disabled with ignore
for this release.
I can develop later by removing the exists(_.isNested)
condition and re-enabling the tests.
@sezruby, @clee704: Thanks for clarifying things.
As @clee704 noted, the same approach should be to any big feature regardless if is developed by your internal team or by someone from the open source community.
throw HyperspaceException("Index config is not applicable to dataframe schema.") | ||
} | ||
|
||
// valid state check | ||
// TODO: Temporarily block creating indexes using nested columns until it's fully supported. | ||
if (!(HyperspaceConf.nestedColumnEnabled(spark) || resolvedColumns.get.forall( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why adding a flag? Just check if there is any nested column and throw. And disable the tests.
if (!(HyperspaceConf.nestedColumnEnabled(spark) || resolvedColumns.get.forall( | |
if (resolvedColumns.get.exists(_.isNested)) { |
@sezruby I need a clear time frame when to get back working on the nested field feature. I need to plan my time as I also have other things on my plate and I would like as much as possible to stick to that time frame. |
I'm not sure what do you mean by this. There is no half-implemented feature on the master branch except this. I'm going to add a feature toggle for DataSkippingIndex which is not merged yet, but you are against using a feature toggle. Can you explain why? |
You guys have been merging different PRs with refactor in master that address a part of a future feature. For me that is a partial implemented feature. I don't want to go into this discussion.
What I wanted to say is that I'm glad that you come with this idea of having a unified approach when developing big features.
It is not that I'm against a flag. It is just the fact that there is a simpler approach to disable it. For the "nested feature" the simplest thing to do is to directly throw if a nested field is detected and disable the tests. When I'll resume developing the feature I'll remove the throw lines and enable the tests. Doing it with a flag all require do some more extra cleanup. |
Refactoring is not a feature. A feature provides new functionality to users. Refactoring does not. |
Currently we're not doing it but disabling the tests can drop the code coverage and it would be an issue if we are going to add coverage check to CI. Having a non-tested, unused code sitting around in the codebase doesn't feel right. But if you insist then I'm fine and I'll leave it up to @sezruby. |
I already said that I prefer not to get into this discussion but now that you mentioned it I'll give some feedback. You know the phrase: "don't fix it if it works!" In the majority of the cases refactoring happens due to new features that needs to be added. So, with this being said, any refactor is a partial feature, at least from my perspective. |
I'd like to keep the dev config approach as I don't want to break the tests without knowing it. @andrei-ionescu please note that @clee704 and I are NOT the kind of person who doesn't fix the code just because it's working. @imback82 has even higher standards. BTW WHY ARE YOU ARGUING WITH US? I don't understand what you're doing here. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Disabling temporarily via config seems reasonable to me. Thanks @sezruby!
I forgot to add: the nested column support is only relevant to CoveringIndex. Other index types might support it without any change to the framework. For example, DataSkippingIndex can support nested columns out of the box because it doesn't directly store the column values in the index. Therefore, a more correct fix would be blocking nested columns in CoveringIndex. But this is a minor issue as currently we only have CoveringIndex now. This is just something to remember when we continue to work on the nested column support later. |
@sezruby I'm not arguing with you. I just propose different solutions than what you already did choose to do and I try to compare the differences between your proposal and mine and build a case for my proposal. The fact that it is in contradiction with what you are planning is in fact part of the development itself.
I never did think of you guys having low standards. On the contrary, my impression is that you have pretty high standards. I'm used to work in all kinds of projects (start-up, research, PoCs, production level, etc) and I'm comfortable in any of them. From my experience there should be a balance in regards the standards too, as these may slow the development in the early stages. There are things that I see that you can improve in regards to doing open development. The biggest thing that in my opinion is hindering the open development is that there is a "double standard development" -- what's on your direct interests is treated differently than other things. This is understandable up to a point but after that it drives developers off the project. You need to find the balance. There are other things as well in regards to open development but I'll stop here. If you want to discuss about these open dev things we can have a separate context. |
I'm getting back to my ask which didn't get an answer yet. Can you provide a time frame for me so that I can plan my time accordingly? |
I'm not sure but you can resume after merging #461 - early or mid July. |
What is the context for this pull request?
What changes were proposed in this pull request?
Disable index creation with nested columns until it's fully supported.
Does this PR introduce any user-facing change?
No
How was this patch tested?
Unit test