Add utilities module and functions for reindexing #101

Draft
eliotjordan wants to merge 8 commits into main from reindex-helpers

Conversation

eliotjordan
Member

@eliotjordan eliotjordan commented Sep 25, 2024

Adds example function for resetting processor marker for an indexing stage and restarting the consumer.

refs #47

github-actions bot commented Sep 25, 2024

Container Scanning Status: ✅ Success


ghcr.io/pulibrary/dpul-collections:pr-101 (debian 12.6)
=======================================================
Total: 0 (HIGH: 0, CRITICAL: 0)

@eliotjordan eliotjordan force-pushed the reindex-helpers branch 2 times, most recently from 1a8e124 to c9ae6d8 Compare September 27, 2024 19:02
@eliotjordan eliotjordan changed the base branch from main to i95-cache-version September 27, 2024 19:03
@eliotjordan eliotjordan changed the title Add reindex module and function to reindex solr from cache Add utilities module and functions for reindexing Sep 27, 2024
@eliotjordan eliotjordan marked this pull request as draft September 27, 2024 19:03
Contributor

@tpendragon tpendragon left a comment

More discussion points than specific changes.

GenServer.stop(Figgy.HydrationConsumer)
end

def reindex_all(cache_version) do
Contributor

Do we need to reset the state of everything if we want to reindex? If we want a full re-hydration, can't we just reset the hydrator? And if we want a full retransformation without hydration, can't we just reset the transformer?

IndexingPipeline.get_processor_marker!("figgy_transformer", cache_version)

IndexingPipeline.delete_processor_marker(transformation_processor_marker)
GenServer.stop(Figgy.TransformationConsumer)
Contributor

Should we use Broadway.stop?

alias DpulCollections.IndexingPipeline
alias DpulCollections.IndexingPipeline.Figgy

def reindex_solr(cache_version) do
Contributor

Should these functions be part of the Consumer namespace? So like, Figgy.IndexingConsumer.Restart() or Reset() or ResetTo(nil) or something?

Base automatically changed from i95-cache-version to main September 27, 2024 19:44
@tpendragon
Contributor

Update: I tried doing it this way while I was reindexing for description and it didn't work - if indexing is running, it writes the marker between me updating and restarting the node, no matter how quickly I went. I think we need to stop the producer in-band somehow.

@hackartisan
Member

> Update: I tried doing it this way while I was reindexing for description and it didn't work - if indexing is running, it writes the marker between me updating and restarting the node, no matter how quickly I went. I think we need to stop the producer in-band somehow.

I feel like we should be able to start these Broadway pipelines as named processes, so that we can shut them down by referencing them with an atom instead of needing a PID.
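
A minimal sketch of the ordering discussed in this thread, using only standard-library processes rather than Broadway. `MarkerStore`, `MarkerWriter`, and `ReindexSketch` are hypothetical stand-ins and not modules from this PR: the store plays the role of the processor marker table, and the writer plays the role of a consumer that keeps writing its marker while it runs. The idea is to stop the named writer first, delete the marker only after it has exited, and then start a fresh writer, so nothing can write the marker between the delete and the restart.

```elixir
defmodule MarkerStore do
  # Hypothetical in-memory stand-in for the processor marker table.
  use Agent

  def start_link(_opts \\ []), do: Agent.start_link(fn -> nil end, name: __MODULE__)
  def put(value), do: Agent.update(__MODULE__, fn _ -> value end)
  def get, do: Agent.get(__MODULE__, & &1)
  def delete, do: Agent.update(__MODULE__, fn _ -> nil end)
end

defmodule MarkerWriter do
  # Hypothetical stand-in for a consumer that writes its marker while running.
  # It is registered under its module name, so it can be stopped by atom.
  use GenServer

  def start_link(_opts \\ []), do: GenServer.start_link(__MODULE__, 0, name: __MODULE__)

  @impl true
  def init(count) do
    send(self(), :tick)
    {:ok, count}
  end

  @impl true
  def handle_info(:tick, count) do
    # Write the marker, then schedule the next write in 10 ms.
    MarkerStore.put(count)
    Process.send_after(self(), :tick, 10)
    {:noreply, count + 1}
  end
end

defmodule ReindexSketch do
  # Returns the marker value observed right after deletion (expected to be nil,
  # because the writer has already exited by then).
  def reset do
    # GenServer.stop/1 blocks until the named process has terminated.
    GenServer.stop(MarkerWriter)
    MarkerStore.delete()
    marker_after_delete = MarkerStore.get()
    # Start a fresh writer; in a supervised app the supervisor would do this.
    {:ok, _pid} = MarkerWriter.start_link()
    marker_after_delete
  end
end
```

In this sketch the writer is started by hand. In a supervised application the supervisor's restart strategy would decide whether and when a stopped process comes back, so the delete step would need to happen before that restart.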

3 participants