Include charts' parent tags in the algolia index #3790

Merged
ikesau merged 2 commits into data-catalog-algolia from index-chart-parent-tags
Jul 29, 2024

Conversation

ikesau
Member

@ikesau ikesau commented Jul 12, 2024

Part of #3781

Adds a new function that finds all ancestors of a given tag and includes them in a chart's record when we index to Algolia.

Examples

  • A chart record that used to have a tag value of ["Cardiovascular Diseases"] will now have ["Cardiovascular Diseases", "Health"]
  • A chart record that used to have a tag value of ["Indoor Air Pollution", "CO2 & Greenhouse Gas Emissions"] will now have ["Indoor Air Pollution", "CO2 & Greenhouse Gas Emissions", "Air Pollution", "Health", "Energy and Environment"]
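A minimal sketch of how the examples above could be produced, assuming the lookup is a plain map from a child tag name to the names of all its ancestors. The real output shape of getParentTagsByChildName is not shown in this conversation, so the type, the helper name `withParentTags`, and the sample hierarchy below are hypothetical.

```typescript
// Hypothetical shape: child tag name -> names of all of its ancestors.
type ParentTagsByChildName = Record<string, string[]>

// Illustrative data only, chosen to reproduce the examples in this description.
// The actual hierarchy in the tag graph may differ.
const parentTagsByChildName: ParentTagsByChildName = {
    "Cardiovascular Diseases": ["Health"],
    "Indoor Air Pollution": ["Air Pollution", "Health", "Energy and Environment"],
    "CO2 & Greenhouse Gas Emissions": ["Energy and Environment"],
}

// Returns the chart's own tags followed by any ancestors not already present.
// A Set keeps insertion order and drops duplicates shared by several tags.
function withParentTags(
    tags: string[],
    parents: ParentTagsByChildName
): string[] {
    const result = new Set<string>(tags)
    for (const tag of tags) {
        for (const parent of parents[tag] ?? []) {
            result.add(parent)
        }
    }
    return Array.from(result)
}

const single = withParentTags(["Cardiovascular Diseases"], parentTagsByChildName)
const multiple = withParentTags(
    ["Indoor Air Pollution", "CO2 & Greenhouse Gas Emissions"],
    parentTagsByChildName
)
```

Deduplicating matters here because two tags on the same chart can share an ancestor, as "Energy and Environment" does in the sample data.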

Example output of getParentTagsByChildName

Contributor

owidbot commented Jul 12, 2024

Quick links (staging server):

Site Admin Wizard

Login: ssh owid@staging-site-index-chart-parent-tags

SVG tester:

Number of differences (default views): 0 ✅
Number of differences (all views): 0 ✅

Edited: 2024-07-12 22:27:06 UTC
Execution time: 1.10 seconds

@larsyencken larsyencken self-requested a review July 15, 2024 09:25
Contributor

@larsyencken larsyencken left a comment

Looks good, but suggestion to push more into SQL.

db/db.ts (two resolved review threads)
])
}

trackParents(tagGraph)
Contributor

I understand that this code works, but the structure of how we're building and working with this graph is more complex than it needs to be.

For example, you need to build the graph twice, once flat, once with the object model, and you also need to do a second query to fetch the tag names. You have to rewrite the parents multiple times instead of computing it just once.

A more standard graph representation would be something like idToNode (and maybe idToChildIds or idToParentId), and would make the subsequent traversal much easier. Then you also skip entirely the fake root node.

In this case, the flat tag graph is basically idToChildNodes, plus the root id. The graph would be better if it was just idToNode, ungrouped, since the node already contains the parent id. Let's imagine that scenario.

const idToNode = getFlatTagGraph(trx)

const tagNameToParentTags = {}
for (const node of Object.values(idToNode)) {
  const name = node.name
  const parents = []

  // Walk up the parent chain, collecting each ancestor's name
  let parentId = node.parentId
  while (parentId) {
    const parent = idToNode[parentId]
    parents.push(parent.name)
    parentId = parent.parentId
  }

  tagNameToParentTags[name] = parents
}

It becomes easy to iterate across every node, and easy to follow the parent chain for every node too.

How much would that mess up other code elsewhere?

Member Author

@ikesau ikesau Jul 18, 2024

I think maybe #3800 might be adding some confusion here 😛

A few things:

  1. With the tag_graph table (not the tags' parentId column) a node can have multiple parents:
     [image]

  2. The fake root node was added as a simple way to coexist with a tags table that has non-topic/area tags in it. Anything that's a child of the root tag node is part of the graph. Things that aren't (e.g. Abstract) can be ignored.

  3. If I hadn't already written code for the tag graph UI, I could have written this function with idToParentIds and idToNode maps, but given I had those 2 functions already (which, if you squint, create the same data structure) it seemed simpler at the time to do it this way.
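For illustration, here is a rough sketch of the idToParentIds and idToNode alternative mentioned above, adapted to nodes with multiple parents. It is not the code in this PR: the `TagNode` shape, the function `getAncestorNames`, the root id of 0, and the sample data are all assumptions made for the example.

```typescript
// Hypothetical node shape; the real tags and tag_graph schemas may differ.
interface TagNode {
    id: number
    name: string
}

// Collects the names of every ancestor of `id`, breadth-first.
// The fake root is skipped, and a `seen` set prevents visiting an
// ancestor twice when it is reachable through several parents.
function getAncestorNames(
    id: number,
    idToNode: Map<number, TagNode>,
    idToParentIds: Map<number, number[]>,
    rootId: number
): string[] {
    const seen = new Set<number>()
    const names: string[] = []
    const queue: number[] = [...(idToParentIds.get(id) ?? [])]
    while (queue.length > 0) {
        const parentId = queue.shift() as number
        if (parentId === rootId || seen.has(parentId)) continue
        seen.add(parentId)
        const parent = idToNode.get(parentId)
        if (parent) names.push(parent.name)
        queue.push(...(idToParentIds.get(parentId) ?? []))
    }
    return names
}

// Illustrative data only: id 0 is the fake root, and "Abstract" has no
// edges in the graph, so it has no ancestors to report.
const nodes: TagNode[] = [
    { id: 1, name: "Health" },
    { id: 2, name: "Energy and Environment" },
    { id: 3, name: "Air Pollution" },
    { id: 4, name: "Indoor Air Pollution" },
    { id: 5, name: "Abstract" },
]
const idToNode = new Map<number, TagNode>()
for (const node of nodes) idToNode.set(node.id, node)

const idToParentIds = new Map<number, number[]>()
idToParentIds.set(1, [0])
idToParentIds.set(2, [0])
idToParentIds.set(3, [1, 2]) // two parents
idToParentIds.set(4, [3])

const indoorAncestors = getAncestorNames(4, idToNode, idToParentIds, 0)
const abstractAncestors = getAncestorNames(5, idToNode, idToParentIds, 0)
```

In this sketch a tag outside the graph simply yields an empty list, which is one way to get the "can be ignored" behaviour without materialising the root as a node.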

We should have a call about this to make sure I'm understanding you correctly though. 🙂

@ikesau ikesau merged commit 6351575 into data-catalog-algolia Jul 29, 2024
23 checks passed
@ikesau ikesau deleted the index-chart-parent-tags branch July 29, 2024 13:48