-
Notifications
You must be signed in to change notification settings - Fork 59
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add an abstract type for AbstractTrees
to AbstractTrees.jl
#115
Comments
the hard thing here is that not all trees in abstract trees have a node, and those the do often have a more useful super type than tree node. |
@oscardssmith, the name of the new type doesn't matter for my issue. It's about having a type (in contrast to not having a type), because dispatch needs a type. We could also call it |
I'm currently implementing the So in order to avoid confusion, it would even be better to name the abstract type I need for the plot recipe Just to make the idea of the new type clearer: An object of (sub)type The question here is, how much of the
Comments on this are welcome 🤓. |
The interface section of the documentation currently says "All trees must define |
Thanks for the hint! In the top section it also states:
So my question from above is already clearly answered. That's good news! 😊 Then I would suggest that we add the new abstract type abstract type AbstractTree end to the Any object Would that be a solution everybody could agree on? |
@roland-KA what exactly is your use case for the There is also an issue with the |
Use Case@jlumpe the use case is, as described above, to enable a visualization of (arbitrary) decision trees. We are currently focused on decision tree implementations within MLJ, but the concept is not limited to that meta-package. The visualization should meet the following goals:
We have realised these goals by using a plot recipe and by requiring each decision tree to implement the A few months ago I already published this article on Towards Data Science, where I explain the whole concept.
No, I could indeed implement a plot recipe that depends exclusively on the
There are at least two reasons for an
... and perhaps it is a good occasion to grow the user base of AbstractTrees.jl and thus appreciating the work that you and your fellow contributors have put into that package 😊. Current state of implementationJust to make the situation clear. We already have
The only part that lacks in order to make the whole thing publicly available is this abstract type that allows us to call tree :: TreeImplementingTheAbstractTreesAPI
plot(tree) and thus enabling Tree vs. NodeYes there are indeed different ways to implement a tree. In some implementations the tree is simply represented by the top-most node. But for the use case described above, this distinction is irrelevant, as it relies only on And from a more general software engineering perspective, I think this distinction shouldn't show up in a type definition. It is the very purpose of an abstract data type (like a tree) to hide details of the implementation. But perhaps I just don't know what applications you have in mind, where this distinction should show up on a type level. Do you have any examples? BTW: This wouldn't conflict with the definition of an |
To make it easier to understand, I have created a diagram of the intended structure, which I described above: Here we can see, that The plot recipe Only a script that wants to visualize a tree has dependencies to several components: To the tree itself, to the plot recipe and to |
Use Case (Part 2)Perhaps I need to explain the use case a bit in more detail. Not everybody is familiar with plot recipes. Each plot recipe is defined on a type (that's why I need something like For that reason all tree implementations not only have to implement struct InfoNode{T} <: AbstractTree
node :: DecisionNode{T}
info :: NamedTuple
end The decision tree from And for plotting we pass actually an object of type |
Be good to get a decision here. @roland-KA 's suggestion is clear, and the use-case comprehensively explained. In view of discussion at #116, |
Hi alltogether, it's now almost one month since I explained my use case in detail and there hasn't been any feedback since. I understand that many may be (or have been) on vacation during this time. But it would be nice to get some answer, so we know how to proceed with our project, because this issue stalls our works completely. If you need more details on the use case, feel free to ask. If you object the proposal, please say so (and explain why) and of course the most pleasing answer would be that you add the abstract type within a reasonable timeframe 😃. |
Hi @jlumpe, @ExpandingMan, @timholy, @oscardssmith you seem to be the most active contributors/maintainers of this package, so I would appreciate your opinion, decision or comment on this issue. Can we add an abstract type |
Sorry, we've been pretty distracted by solving iteration without abusing the compiler, so those of us who've worked on this package recently have been very focused on that (I also am not a maintainer... I don't think, so I wasn't getting notifications on this thread). Anyway, I'm going to have to come back and read the above extremely carefully, but so far my take on your problem is that you have a whole bunch of types that all implement the AbstractTrees interface and you want them to have a common abstract type. In that case, the question this issue needs to answer is: should that type be defined in AbstractTrees.jl or somewhere in plot recipes world? My initial reaction to this is pretty firmly toward the latter, that you should define an abstract type somewhere in your own packages and that AbstractTrees.jl should not implement this. The reason is that trees in general do not have and cannot have a common abstract type. For example, nested arrays are tree nodes, but they cannot descend from a node type since they clearly need to be In spite of the above, it could still be argued that the So, my question at this point is: why not define an |
The Julia package An ADT is specified by a set of operations (thus defining an interface) and it has a name. Naming of such an ADT-interface can be done in Julia in two (complementary) ways:
Now to the question "should that type be defined in
The example of the "nested arrays" above which could be used to implement a TREE is no contradiction to this approach, since nobody is forced to use an abstract type IMHO, the "nested array" example is an unfortunate mixture of interface and implementation (which unfortunately can be seen often in the Julia world) and contradicts the very idea of an ADT because it exhibits the implementation in the interface. You only get the problem described above because these aspects are mixed. Therefore the sentence "trees in general do not have and cannot have a common abstract type" is a contradiction in itself -- a TREE is an abstract data type (in the computer-science sense). But that's another discussion. @ExpandingMan you write: "My main problem with that is that I don't see a clear use case for it." Well, our use case is clearly stated above and I think there may be more cases where users of the |
I certainly agree that there is something lacking in the design of Julia itself where this is concerned, though it's difficult to see what the solution should be and this is a much larger discussion. In the meantime, we simply have no choice about this. Other abstract types such as I am not dead set against adding an In my view it makes it much harder to argue for an There is some precedent here: Tables.jl is probably even more widely implemented than this package. It provides some special wrapper types for use cases that require transforming a generic table into something more specific, but there is no If there is some consensus to add an |
I'm glad to hear that you are open to a pragmatic solution 😊.
Of course the Computer science text books on "data structures & algorithms" typically include in a minimal TREE interface also operations like With respect to the question of usefulness: I was able to implement the complete plot recipe for plotting a tree on the basis of that function (plus |
I think a good question would be - how would this type even be used within the package? Besides defining |
No, it would not be used by AbstractTrees.jl. Nothing internal to AbstractTrees.jl would inherit from this. The purpose is not internal, but to allow more external packages to make use of AbstractTrees.jl. This is not about plotting per se; it just so happens that in the context of plotting and the way plot recipes work, there does not appear to be any way to avoid introducing this type somewhere. Plots.jl is very widely used, mature, and unlikely to change the way recipes work. I think all @roland-KA is requesting here is the addition of a single line of code:
and inviting the community to adopt the convention that any object subtyping |
The concern with these sorts of things is always about adding something dubious to a package with a ton of dependents. The fact that this hasn't been done in e.g. Tables.jl gives me some pause. I suppose my biggest concern is about code in other packages expecting to dispatch on Indeed, I'm finding the arguments both for and against this proposition similarly weak. The only part of any of this discussion that I find compelling is that if it is going to be defined somewhere it might as well be here. I guess that means I've been convinced that this should be added, though I'm not without misgivings. However, I don't think we are going to be able to call it |
Yes. The previous discussions made that clear, my bad. |
Then I would prepare a PR which adds an abstract type And about your concerns @ExpandingMan: I wouldn't call this small change "dubious". The purpose and its use case are clearly specified and not a single user of |
Well others will have to chime in on this as well, I'm not even an admin on this repo. The very next thing that has to be done on this package is that we MUST fix this. I might have an idea how to solve it, I'm going to turn my attention to it as soon as I finish up another project. I can add |
Ok, thanks for taking care of the issue! So we can expect to get an |
I'd feel somewhat better about having that fixed first mainly so we can be more sure about what all the cursor type signatures are going to be, but it's probably not strictly necessary. I can put it in an upcoming PR for that fix, but if you want to make one sooner it's possible it'll get merged. Again, I'm not even the person who'll make the decision here. @oscardssmith what's your take on adding this? |
I think I pretty much agree with you. I don't see it being especially useful, but I also don't think it will cause major problems. |
Oh, it would be great, if we could get it sooner in an upcoming PR (as it seems to me, that there is no easy solution for #117). |
Thanks for advancing the issue! 👍 |
I'm currently implementing a plot recipe for decision trees in order to be able to visualize all decision trees within the MLJ-package.
The goal is a plot recipe that
Plots.jl
Goal 1 is an inherent characteristic of plot recipes. So we have reached it just by using plot recipes. Goal 2 is realised by requiring each decision tree implementation which wants its trees to be visualized to implement a narrow part of the
AbstractTrees
-interface (namelychildren
andprintnode
).I have already done such an implementation for
DecisionTree.jl
(see https://github.com/JuliaAI/DecisionTree.jl/blob/dev/src/abstract_trees.jl). The plot recipe itself exists also (see JuliaAI/DecisionTree.jl#147; the recipe is btw so general that it can be applied to each tree implementing the above mentioned interface, it isn't restricted to decision trees).The only missing part to make all that work, is a type for dispatch. If we have a
tree
which we would like to plot (e.g. an instance ofDecisionTree
) then we want to be able to callplot(tree)
. In order forplot
to chose the correct plot recipe,tree
must be of the type that is associated with the plot recipe.As we want a recipe that is independent of any specific decision tree implementation, this type has to be independent of all these implementations. Therefore I have defined a new abstract type
AbstractNode
.The question is now, where this new type should be placed. A very natural place would be
AbstractTrees.jl
(as theAbstractTrees
-interface is already used as a common denominator).Therefore the question: Could such an abstract type be added to
AbstractTrees.jl
?BTW: If
AbstractTrees.jl
wouldn't be "just" an interface but define also a type (likeAbstractTree
), the whole problem wouldn't exist at all).@ablaom: Do you want to add some remarks?
The text was updated successfully, but these errors were encountered: