[FEATURE] - Can gocron read from a persistent storage the scheduled jobs and schedule #533

Open
varsraja opened this issue Aug 2, 2023 · 28 comments
Labels
enhancement New feature or request

Comments

@varsraja

varsraja commented Aug 2, 2023

Is your feature request related to a problem?

We run gocron in a containerized environment where jobs are scheduled from REST API inputs.
We would like some form of persistence to store the details of the jobs we receive, such as their frequency.
When the container, or the instance it runs on, restarts, we would like the gocron scheduler to fetch
the existing job status and schedule the next run of each job accordingly.

Describe the solution you'd like

On a restart, the scheduler would load data from persistent storage and continue the subsequent runs as it would have if the system hadn't rebooted or restarted. The scheduler would update the last run time in persistent storage so that on a fresh startup it can recalculate the next run time.

Describe alternatives you've considered

Have a wrapper function that updates persistent storage with the time each run was last scheduled. On a fresh restart, it would recalibrate each job's start time from its last run time and add the jobs back to the scheduler.

Additional context

@varsraja varsraja added the enhancement New feature or request label Aug 2, 2023
@varsraja
Author

Any update on this?

@JohnRoesler
Contributor

@varsraja I'm not opposed to having this feature in gocron. If it were to be done, I'd like to have an interface that could be implemented for multiple databases (Redis, etc.).

For the implementation in gocron, I would think something like:

  • the scheduler starts, loads entries from storage, and loops through them, updating the lastRun time of each job, and the nextRun if the nextRun is in the future (I think if it's in the past, it would be set to zero time, and the scheduler would pick the right next run time)
  • each time a job is run, the scheduler would write the job's run details to the store

What details would need to be stored? A unique identifier for the job (use the job Name field that the locker uses), plus the last run and next run times of the job.

@varsraja
Author

@JohnRoesler Appreciate the response. I was also looking for something along the lines you have suggested.
A unique job identifier would be required if we need a primary key in the database, along with the last run and next run details.
Basically, the gocron Job details would need to be stored, I guess.

@JohnRoesler
Contributor

Good news: the latest release added UUIDs for jobs, so we now have unique identifiers. The fields on the job struct are private, so we'll want to make a new struct, something like JobStorage, that has public fields for the things that need to be stored. Then when the job runs, we could instantiate a JobStorage struct and save it to the database via the interface that will be defined.

@JohnRoesler
Contributor

Hm, or perhaps, since the interface will be within the gocron project, it can just accept a Job and then convert it to a JobStorage object for a SQL implementation 🤔 Some poking around at the implementation will help flesh out some of these details.

@varsraja
Author

Is it possible to store to a file (JSON dump) as well, apart from databases?
Is there a rough estimate of how many days this could take to implement?
I would love to experiment with the initial cut.

@JohnRoesler
Contributor

> Is it possible to store to a file (JSON dump) as well, apart from databases?

Certainly. I think the beauty of the interface is that it can be implemented in whatever way you'd like. As far as the implementation, please do have a go if you'd like!

@tricknife

Looking forward to this feature. I'm using tags and UUIDs to manage jobs, which is very inconvenient.

@JohnRoesler JohnRoesler added the v1 label Oct 28, 2023
@4zore4

4zore4 commented Nov 8, 2023

> > Is it possible to store to a file (JSON dump) as well, apart from databases?
>
> Certainly. I think the beauty of the interface is that it can be implemented in whatever way you'd like. As far as the implementation, please do have a go if you'd like!

I wonder if I can try to contribute to this?

@JohnRoesler
Contributor

@4zore4 if you are interested in contributing - let's look at adding it to the v2 branch (as that's the future 😄)

@pkoukk pkoukk mentioned this issue Nov 9, 2023
2 tasks
@4zore4

4zore4 commented Nov 9, 2023

Let me take a look.

OK, I will try to add this feature in v2.

@JohnRoesler
Contributor

@4zore4 I think a separate struct for job loading, one that isn't the internalJob or the public Job, would be best. You'll need to consider which fields from the job are important to store and load.

My initial thoughts on what you need (+), don't need (-), and might need (~) from the internalJob:

type internalJob struct {
-	ctx    context.Context
-	cancel context.CancelFunc
+	id     uuid.UUID
+	name   string
+	tags   []string
+	jobSchedule
+	lastRun, nextRun   time.Time
+	function           any
+	parameters         []any
-	timer              clockwork.Timer
+	singletonMode      bool
+	singletonLimitMode LimitMode
~	limitRunsTo        *limitRunsTo // for this to be useful, you'd also have to store the # of runs 
					// when the scheduler is shutting down
~	startTime          time.Time // this isn't useful beyond the initial run
~	startImmediately   bool // this isn't useful beyond the initial run - but if you set
					// start immediately, would you want your job to also start
					// immediately when a new scheduler pod started? I don't
					// think so, you'd want it to continue as close to where it left
					// off as possible.
	// event listeners
+	afterJobRuns          func(jobID uuid.UUID)
+	beforeJobRuns         func(jobID uuid.UUID)
+	afterJobRunsWithError func(jobID uuid.UUID, err error)
}

Another thing we need to make sure is handled: when scheduling the next run, the lastRun may be far enough in the past that the next run is also in the past. I don't think v2 handles that yet.

@4zore4

4zore4 commented Nov 10, 2023

Your idea is very good. As for whether to execute expired tasks or not, I feel that choice should be left to the user.

I'll try to write a demo at the end of the week, if I'm not lazy.

It is worth mentioning that I have used this library in my company's projects. Thank you very much for your contribution.

@JohnRoesler JohnRoesler added the v2 label Nov 27, 2023
@JohnRoesler JohnRoesler removed the v1 label Dec 21, 2023
@kyriakid1s

Any update on this?

@kyriakid1s

kyriakid1s commented Jan 3, 2024

Hello, happy new year,

About the startTime field: I believe it's useful to add it to the jobStorage struct, because if a user wants to run a job in 2 hours (with the OneTimeJob function) and the scheduler shuts down during that period, the job will be lost.

I'm trying to implement this feature in this project. I am not a very experienced programmer but, you know, I'm trying.

@JohnRoesler
Contributor

Another thought: to make storing it as simple as possible, I think converting the job export structure to some sort of string could be worthwhile. Then the export would be to a string, and the import would decode that string back into jobs. Or a slice of strings, so it's not one really long string in the event of many, many jobs.
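
A small sketch of the slice-of-strings idea, with one JSON string per job. `JobExport`, `exportJobs`, and `importJobs` are hypothetical names, and JSON is just one possible encoding; the function and its parameters are deliberately left out here because they cannot be serialized this way.

```go
package main

import (
	"encoding/json"
	"fmt"
	"time"
)

// JobExport is a hypothetical exportable view of a job.
type JobExport struct {
	ID      string    `json:"id"`
	Name    string    `json:"name"`
	Tags    []string  `json:"tags,omitempty"`
	LastRun time.Time `json:"last_run"`
	NextRun time.Time `json:"next_run"`
}

// exportJobs encodes each job as its own JSON string.
func exportJobs(jobs []JobExport) ([]string, error) {
	out := make([]string, 0, len(jobs))
	for _, j := range jobs {
		b, err := json.Marshal(j)
		if err != nil {
			return nil, fmt.Errorf("encode job %s: %w", j.ID, err)
		}
		out = append(out, string(b))
	}
	return out, nil
}

// importJobs decodes strings produced by exportJobs back into jobs.
func importJobs(lines []string) ([]JobExport, error) {
	out := make([]JobExport, 0, len(lines))
	for i, line := range lines {
		var j JobExport
		if err := json.Unmarshal([]byte(line), &j); err != nil {
			return nil, fmt.Errorf("decode entry %d: %w", i, err)
		}
		out = append(out, j)
	}
	return out, nil
}

func main() {
	now := time.Now().UTC()
	lines, err := exportJobs([]JobExport{
		{ID: "1", Name: "cleanup", Tags: []string{"nightly"}, LastRun: now, NextRun: now.Add(24 * time.Hour)},
	})
	if err != nil {
		fmt.Println("export failed:", err)
		return
	}
	fmt.Println(lines[0])

	jobs, err := importJobs(lines)
	if err != nil {
		fmt.Println("import failed:", err)
		return
	}
	fmt.Println("restored:", jobs[0].Name)
}
```

One string per job means any key-value store, SQL text column, or line-oriented file can hold the export without knowing its shape.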

@kyriakid1s

Any thoughts about how to save the function? I am thinking about saving only the function name.

@4zore4

4zore4 commented Jan 6, 2024

Sorry, I haven't updated this yet because the company is busy near the end of the year. But I think your idea is great and consistent with mine, and I have implemented a demo.

// Imports this demo needs. redisJob_test is the author's own package;
// its import path is not shown in the thread.
import (
	"fmt"
	"log"
	"reflect"
	"runtime"
	"strings"
	"testing"
	"time"

	"github.com/go-co-op/gocron/v2"
)

func Test_Job(t *testing.T) {
	var redisJob redisJob_test.RedisJob
	methodMap := initFun(redisJob)
	j, _, _, err := newJob(redisJob.TestReflect, methodMap)
	if err != nil {
		t.Fatal(err)
	}
	// each job has a unique id
	fmt.Println(j.ID())

	// Block so the scheduler keeps running (the original empty for-loop spun a CPU core).
	select {}
}

// newJob schedules function every 10 seconds and returns the job, the
// function's runtime name, and the known method names that match it.
func newJob(function func(), methodMap map[string]reflect.Value) (gocron.Job, string, map[string]string, error) {
	matches := make(map[string]string)

	// Get the function's entry pointer, then resolve it to a name.
	funcPtr := reflect.ValueOf(function).Pointer()
	funcName := runtime.FuncForPC(funcPtr).Name()
	fmt.Println(funcName)

	for methodName := range methodMap {
		if strings.HasPrefix(funcName, methodName) {
			matches[funcName] = methodName
		}
		fmt.Println(methodName)
	}

	s, err := gocron.NewScheduler()
	if err != nil {
		return nil, funcName, matches, err
	}
	j, err := s.NewJob(
		gocron.DurationJob(10*time.Second),
		gocron.NewTask(function),
	)
	if err != nil {
		return nil, funcName, matches, err
	}
	s.Start()

	return j, funcName, matches, nil
}

// initFun maps the runtime name of every method on *RedisJob to its bound method value.
func initFun(redisJob redisJob_test.RedisJob) map[string]reflect.Value {
	methodMap := make(map[string]reflect.Value)
	objValue := reflect.ValueOf(&redisJob)
	objType := objValue.Type()

	for i := 0; i < objType.NumMethod(); i++ {
		method := objType.Method(i)

		funcPtr := method.Func.Pointer()
		methodValue := objValue.MethodByName(method.Name)

		// Use the runtime package to get the name of the function
		funcName := runtime.FuncForPC(funcPtr).Name()
		methodMap[funcName] = methodValue
		log.Println(funcName)
	}
	return methodMap
}

@kyriakid1s

What fields does redisJob have?

@pcfreak30
Contributor

I am also very interested in this and may end up implementing it with gorm/MySQL. I need a background task queue that can survive shutdowns and be distributed long term.

@pcfreak30
Contributor

pcfreak30 commented Feb 25, 2024

I have been thinking about this while working on other components in my project, and while the example from @4zore4 uses function pointers, it will not work at scale IMHO.

My thought jumped to using a wrapper package on a scheduler, which I already have, to use the lock and elector system and manage all jobs.

You would create job names or types and register them against the task functions that handle them. You could then store these, load all tasks on boot, and pick up where you left off. You can't persist the in-memory job struct data (function pointers), especially with multiple nodes running. So some "job manager" abstraction is needed for this.
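
A minimal sketch of such a registry, where only a job type name and a payload are persisted and the function itself is looked up at load time. `Registry`, `Handler`, and `StoredJob` are hypothetical names for illustration, and the string payload is an assumption; this is not how gocron or any linked project implements it.

```go
package main

import (
	"fmt"
	"sync"
)

// Handler runs one stored job; payload is whatever was persisted with it.
type Handler func(payload string) error

// StoredJob is what would be written to storage: a type name, not a function.
type StoredJob struct {
	Type    string
	Payload string
}

// Registry maps job type names to the functions that handle them.
type Registry struct {
	mu       sync.RWMutex
	handlers map[string]Handler
}

func NewRegistry() *Registry { return &Registry{handlers: make(map[string]Handler)} }

// Register binds a job type to a handler; duplicate names are rejected.
func (r *Registry) Register(jobType string, h Handler) error {
	r.mu.Lock()
	defer r.mu.Unlock()
	if _, exists := r.handlers[jobType]; exists {
		return fmt.Errorf("job type %q already registered", jobType)
	}
	r.handlers[jobType] = h
	return nil
}

// Run resolves the stored job's type to a handler and invokes it.
func (r *Registry) Run(job StoredJob) error {
	r.mu.RLock()
	h, ok := r.handlers[job.Type]
	r.mu.RUnlock()
	if !ok {
		return fmt.Errorf("no handler registered for job type %q", job.Type)
	}
	return h(job.Payload)
}

func main() {
	reg := NewRegistry()
	_ = reg.Register("send_email", func(payload string) error {
		fmt.Println("sending email to", payload)
		return nil
	})

	// On boot, every node registers the same handlers and then loads stored jobs.
	stored := []StoredJob{{Type: "send_email", Payload: "user@example.com"}, {Type: "unknown"}}
	for _, job := range stored {
		if err := reg.Run(job); err != nil {
			fmt.Println("skipped:", err)
		}
	}
}
```

Because every node registers the same names at startup, a stored job stays meaningful across restarts and across machines, which a function pointer does not.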

I'm open to thoughts on how this might be designed, but I'll likely end up with an MVP for my needs, which is at least MIT, so others can use it as an example before I put any effort into making it reusable.

@4zore4 @JohnRoesler @varsraja

@JohnRoesler JohnRoesler removed the v2 label May 24, 2024
@pcfreak30
Contributor

I thought I would provide an update. I have forked gocron somewhat and can probably create a PR soon with the change (added WithIdentifier to set the UUID).

But... I have implemented a cron system abstraction here https://github.com/LumeWeb/portal/blob/e44bd0f59300b2d7ee164cef4714543639a65c48/service/cron.go.

Overall, I think it makes the most sense to create a layer on top rather than try to make the library support it directly.

Kudos!

@JohnRoesler
Contributor

@pcfreak30 thanks for sharing that! Yes, I agree with your sense that having it be separate would be the best. Then it can wrap gocron as the core scheduling library without introducing a bunch of complexity that many won't need/use.

@JohnRoesler
Contributor

For the purposes of restoring jobs from a data store, I think we'll need a method, perhaps one or more JobOptions, that supports setting attributes of the job such as LastRun and NextRuns, similar to the newly added WithIdentifier that allows setting the UUID of the job. Yay/nay?

@pcfreak30
Contributor

See https://github.com/LumeWeb/portal/blob/e0caec59acc68a5be80535add4b1b9f32747e0dd/service/cron.go#L94 for inspiration on what I am doing at the moment.

If you have any thoughts on how your idea or another could refactor this code to be better, I'm all ears :).

@Nikola-Milovic

Hey, having a persistence layer can also help with distributed locks, since most storage solutions have functionality around locking.

What's the status of this proposal? Having gocron integrated with Postgres or other storage solutions would open up a ton of possibilities.

@pcfreak30
Contributor

pcfreak30 commented Dec 3, 2024

@Nikola-Milovic I understand what you mean, but I concluded that it's better as an abstraction layer. You can see where I have ended up at https://github.com/LumeWeb/portal/blob/ee3347796fbbb8d42657f08994255d296998a056/service/cron.go (it probably still has bugs).

My system supports Redis and uses gorm for the authoritative data. You can take inspiration from it. I also have several PRs open for things I need in order to implement what I've done, and I run on a fork at the moment.

@pcfreak30
Contributor

You could probably create a dedicated higher-level package focusing on this. gocron IMHO is best as a lower-level lego.
