-
Notifications
You must be signed in to change notification settings - Fork 8
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Refactor or redesign implementation of graphs with directed and undirected edges #65
Comments
In a more general case, we also have the case of an ADMG and a MAG. In the case of a MAG, the difference is that there is only one edge allowed among nodes, and that there are no primitive inducing paths among non-adjacent nodes. |
So the way i would imagine it is we have a base directed-undirected edge graph class, and then constructors for each type of graph that enforce whatever requirements the specific graph requires. I'll take a stab at this some time this week. |
Do we need a constructor for each type of graph necessarily? Perhaps just assume user knows what they are doing and then provide a function to check validity? Or is that too expensive... Do you think we should also have a base directed-undirected-bidirected-circle graph class then as well? This could lead to a PAG, ADMG (causal DAG w/ bidirectional edges) and a MAG? We just need functional checks to make sure the corresponding PAG, ADMG and MAG are valid? |
This is sort of how ananke is designed - we have a base graph class that has all the edge types and then every kind of graph we actually use is a subclass of that. If we had a directed-undirected-bidirected-circle graph class then everything could just be constructed from that class I suppose. I imagine that for e.g. ADMGs you would just set the undirected and circle parts to empty graphs by default? |
Referencing discussion in #67. As far as names we could always go DUGraph, DBGraph, DBCUGraph. I am still of the opinion that we should create an API so that users don't have to call these graph types directly, but that we have better organization and less duplication of code in the implementation. Best of both worlds, I suppose. Consequently the naming shouldn't be that important. So we have 4 types of edges - directed, bidirected, circle, and undirected. We could in theory create all 4 choose k subtypes for k > 0 (starting with the three that you mentioned above). But if we are creating e.g. a directed-bidirected graph, and then setting the bidirected argument to be empty, it seems somewhat arbitrary? Why would we not take the directed-bidirected-circle-undirected, and then just set all but directed to be empty grpahs? It seems like we could create the one DBCUGraph with all four edge types and then for each kind of graph we actualyl want to consider (PAG, ADMG, DAG, CPDAG etc.) we create a wrapper around DBCUGraph that zeroes out the appropriate edge graphs. If we are seriously following the networkx style of detaching the graphical representation from the operations on the graph (including checks such as validity) then this implementation should suffice, seeing as there should not be much need for any inheritance behaviors. |
I propose to start with the following, and will open a PR/RFC to this effect:
Future proposals:
|
To summarize and clarify, is your proposal open to the idea of either:
? I don't have a great reason to support one over the other currently :p. But see below on some cons of supporting explicitly the class types ADMG, PAG, DAG, CPDAG, etc. The graph classes are not guaranteed without a check to be a "valid ADMG/PAG/CPDAG/etc.". This is an issue that was raised in #67.
The reason the networkx design is "nice" imo (but not ideal) is that checking for validity of a certain "type of graph" is quite expensive. E.g. if we want to ensure that a constructed |
As an aside, seeing the implementation of CG and CPDAG as a subclass of DUGraph, I'm pondering the tradeoffs here:
If we can't arrive at a consensus here, we can discuss next week, or something? |
Apologies for taking a while to get back to this. This would be worth talking about actually, if you have some time tomorrow after the usual meeting. Or happy to discuss another time if that doesn't work. Regarding the proposal of DUGraph + all variants versus DBCUGraph, I am actually not super sure myself. I do think not supporting explicit graph types ADMG or CPDAG would be weird interface-wise, but perhaps the right way to think about this if we don't implement any algorithms inside the graphs themselves. For 1., maybe we can lazily evaluate the validity of the graph (e.g. algorithms don't assume it's a valid instance of the class)? I think you're right that e.g. checking validity at each step is probably not going to be efficient or very tedious to I think 2. makes the most sense design wise. Perhaps we can talk about potential API designs? |
Discussion with @jaron-lee we have decided the following strategy:
In the longer-term, we can consider removing all the specific causal graph class implementations, so that way it's more like networkx, but in the short-term, we can just rely on this approach to be internally lean, and rely on user-feedback to guide this decision. Feel free to add anything I missed. |
Check bnlearn, and Meek for the CPDAG definition. |
Is your feature request related to a problem? Please describe.
At the moment there is a graph class
pywhy_graphs.CPDAG
that handles graphs with directed and undirected edges. However, such graphs appear in other contexts throughout causal inference; for example, chain graphs. These graphs have the same representation but have a distinct and separate meaning.However, simply using the
CPDAG
class to represent chain graphs could be confusing as these concepts are distinct in the literature.Describe the solution you'd like
A simple way to address this is to rename the CPDAG class to something more generic. Unfortunately the correct term (Acyclic Directed Mixed Graph) already applies to graphs with bidirected edges in place of undirected edges; thus another term like "acyclic directed undirected graph" might be considered.
CPDAGs and chain graphs would use this new class to implement their graphical representations. The onus would be on the user to pass this to downstream functions in a way that makes sense (e.g. it would be up to the user to not take output from the PC algorithm and fit a chain graph model to it, for example).
Describe alternatives you've considered
Implement a separate chain graph class, with potential refactoring of the reusable parts for both into a parent class. This would leave recognizable API graph classes that might help users who are familiar with the literature to use the software.
The text was updated successfully, but these errors were encountered: