Data Discrepancy Issue Between Nebula Studio and Spark Connector #715
Comments
@wey-gu any thoughts on this?
Could you please share how you query the same data in each of the two? For example, the query pattern you use.
Nebula Studio: match (v1:vertex_a)-[:edge_a]->(v2:vertex_b) return count(*)
Spark connector: spark.read.format(
Could we assume that every edge_a edge has the source and destination tags used in the MATCH pattern above? If not, the two are not equivalent. If so, dangling edges could explain the difference between the two. If some edge_a edges were inserted without their source/destination vertices, they are dangling edges: they can be scanned from the storage side (with Spark) but cannot be retrieved by some queries. Also, when possible. Ref: https://docs.nebula-graph.io/3.6.0/8.service-tuning/2.graph-modeling/#about_dangling_edges
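To illustrate the two cases above, here is a toy Python model (not NebulaGraph or Spark connector code) of why the counts can diverge: a storage-side scan counts every stored edge_a edge, while a pattern like (vertex_a)-[:edge_a]->(vertex_b) only counts edges whose endpoints exist and carry the expected tags. The vertex IDs, the vertex_c tag, and the helper functions are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Edge:
    src: str
    dst: str


# Hypothetical toy data: vertex ID -> set of tags attached to that vertex.
vertices = {
    "a1": {"vertex_a"},
    "b1": {"vertex_b"},
    "b2": {"vertex_b"},
    "c1": {"vertex_c"},  # hypothetical tag that is NOT in the MATCH pattern
}

# Hypothetical edge_a edges.
edge_a = [
    Edge("a1", "b1"),
    Edge("a1", "b2"),
    Edge("c1", "b1"),  # source exists but has a different tag
    Edge("a9", "b1"),  # dangling: source vertex was never inserted
    Edge("a1", "b9"),  # dangling: destination vertex was never inserted
]


def storage_scan_count(edges):
    """Counts every stored edge, similar to a full scan from the storage side."""
    return len(edges)


def pattern_match_count(edges, vertices, src_tag, dst_tag):
    """Counts edges whose endpoints exist and carry src_tag / dst_tag."""
    return sum(
        1
        for e in edges
        if src_tag in vertices.get(e.src, set())
        and dst_tag in vertices.get(e.dst, set())
    )


def dangling_edges(edges, vertices):
    """Returns edges whose source or destination vertex does not exist."""
    return [e for e in edges if e.src not in vertices or e.dst not in vertices]


if __name__ == "__main__":
    print(storage_scan_count(edge_a))  # 5
    print(pattern_match_count(edge_a, vertices, "vertex_a", "vertex_b"))  # 2
    print(dangling_edges(edge_a, vertices))
    # [Edge(src='a9', dst='b1'), Edge(src='a1', dst='b9')]
```

In this sketch the gap of 3 comes from one edge with an unexpected source tag and two dangling edges. Listing both kinds separately is one way to tell which explanation applies to the real data.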
Summary:
There is a difference in the data/count returned by Nebula Studio compared with the Spark connector. However, when the data is read back with the Spark connector, the count matches. The data in question was written with the Spark connector.
Steps to Reproduce:
Expected Behavior:
The data and count retrieved from Nebula Studio should match the data and count obtained through the Spark connector.
Actual Behavior:
There is a discrepancy in the data or count between Nebula Studio and the Spark connector, even though the count matches when using the Spark connector alone.
Environment Details: