
Migrate code to the new graph structure #15

Closed
ichiriac opened this issue Feb 5, 2017 · 3 comments
ichiriac commented Feb 5, 2017

Rewrite every node with the new graph code.

This task will make the following tasks possible:


ichiriac commented Feb 12, 2017

How-to:

The repository is a graph database. The main problem here is that it is hard to separate node relations in order to serialize the data. Let's say:

```php
<?php // file1.php
class foo { /* .... */ }
```

And another file:

```php
<?php // file2.php
class bar extends foo { /* .... */ }
```

Here is the structure of the nodes:

```
REPOSITORY [
  [ FILE1.PHP ]
  [ FOO : CHILD OF FILE1.PHP; EXTENDED BY BAR ]
  [ FILE2.PHP ]
  [ BAR : CHILD OF FILE2.PHP; EXTENDS FOO ]
]
```

Let's say we want to serialize FILE1.PHP and FILE2.PHP into separate structures. We could implement a node traversal in order to also extract the children of each node. The main problem is references: pointers are stored in an array based on positions, so the loading order may break the logic.
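The position-based pointer problem can be sketched in a few lines. This is a hypothetical illustration (assuming a Node.js implementation; `nodes`, `partial`, and the field names are illustrative, not the library's API), not the actual storage code:

```javascript
// Hypothetical sketch: node relations stored as array positions.

// Full load: positions match the original graph.
const nodes = [
  { name: 'file1.php' },
  { name: 'foo', parent: 0 },            // "parent" = index of file1.php
  { name: 'file2.php' },
  { name: 'bar', parent: 2, extends: 1 } // "extends" = index of foo
];
console.log(nodes[nodes[3].extends].name); // 'foo' — correct

// Partial (lazy) load: only file2.php's shard is in memory,
// so the stored indexes now point at the wrong slots.
const partial = [
  { name: 'file2.php' },
  { name: 'bar', parent: 2, extends: 1 } // index 1 is now bar itself!
];
console.log(partial[partial[1].extends].name); // 'bar' — silently broken
```

The reference does not fail loudly; it just resolves to whatever node happens to sit at that position after a partial load.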

As the lazy loading is based on file entries, each relation should also be tied to them. Another problem is that a class definition could be moved, renamed, or copied/pasted, so a class may change files. The relation is weak: we can't locate a related node with precision, so we must go through an intermediate lookup system.
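One way to picture that intermediate lookup system: relations store only a symbol name, and a separate index maps each name to its current location. A minimal JavaScript sketch (assuming a Node.js implementation; `index`, `resolve`, and the file names are illustrative):

```javascript
// Hypothetical sketch of an intermediate lookup: relations store
// symbol names; an index maps each name to its current location.
const index = new Map([
  ['foo', { file: 'file1.php' }],
  ['bar', { file: 'file2.php' }]
]);

// bar's relation is weak: just the name 'foo', resolved on demand.
const bar = { name: 'bar', extends: 'foo' };
function resolve(name) {
  return index.get(name) || null; // null if the symbol is gone
}
console.log(resolve(bar.extends).file); // 'file1.php'

// Moving foo to another file only touches the index entry;
// bar's stored relation stays valid.
index.set('foo', { file: 'file3.php' });
console.log(resolve(bar.extends).file); // 'file3.php'
```

The weak relation survives moves and renames of the target file because nothing but the index ever records where a symbol physically lives.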

Why lazy loading is a must-have: projects can be huge, with tons of symbols, so putting everything in memory is not the best option. The way to go is to rebuild memory from a caching structure and load shards of data when the system requires them.

  • Weak node relations may pass through indexes; the same goes for reverse lookups
  • The index must be loaded at start and attach files to their symbols
  • Each node may have a UUID value so that references can be stateless

I feel like a generic graph solution is not the best bet 👎

@ichiriac

Good reading: http://highscalability.com/unorthodox-approach-database-design-coming-shard

Data are denormalized. Traditionally we normalize data. Data are splayed out into anomaly-less tables and then joined back together again when they need to be used. In sharding the data are denormalized. You store together data that are used together.

This doesn't mean you don't also segregate data by type. You can keep a user's profile data separate from their comments, blogs, email, media, etc, but the user profile data would be stored and retrieved as a whole. This is a very fast approach. You just get a blob and store a blob. No joins are needed and it can be written with one disk write.
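Applied to this repository, the quoted blob-per-entity idea would mean serializing a file together with all of its child symbols as one unit. A minimal sketch of that denormalized layout (hypothetical, assuming a Node.js implementation; `store` stands in for the on-disk cache):

```javascript
// Hypothetical sketch of the denormalized approach: a file and all
// of its child symbols are serialized together as one blob, so
// reading a file back is a single fetch — no joins, one write.
const store = new Map(); // stands in for the on-disk cache

function saveFile(file, symbols) {
  store.set(file, JSON.stringify({ file, symbols })); // one write
}
function loadFile(file) {
  return JSON.parse(store.get(file));                 // one read
}

saveFile('file1.php', [{ name: 'foo', kind: 'class' }]);
const blob = loadFile('file1.php');
console.log(blob.symbols[0].name); // 'foo'
```

Cross-file relations (like `bar extends foo`) would still need the weak, index-mediated references discussed earlier; only the file-local data is stored together.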

My approach is bad because I want to keep the data normalized; I need to try something else 😄

@ichiriac

done 😸
