
Instance Scheduler doesn’t stop RDS instance after MaintenanceWindow is completed. #384

Open
natalia-1702 opened this issue Mar 2, 2023 · 7 comments
Labels
enhancement triaged Has been triaged by solutions team

Comments

@natalia-1702

Instance Scheduler doesn’t stop instance after MaintenanceWindow is completed.

I have deployed the main CloudFormation stack into the central account and the remote stack into all other accounts (workload accounts) in my AWS Organization, so all the RDS instances I schedule are hosted in workload accounts. Regular schedules that I create work perfectly.

According to the documentation, Instance Scheduler can start and stop RDS instances for maintenance windows automatically: it starts instances at the start of the MW and stops them at the end of the MW.

But in my case this feature fails in roughly 50% of cases.

During the MW, RDS instances are started but, for some reason, not stopped by Instance Scheduler.
I don't see any errors in the log files. Just nothing, no event at all.

For example, the MW for the TEST-RDS-INSTANCE instance is 25/02/2023 00:00 - 00:30.
The RDS instance was started at 00:00 but was not stopped at 00:30.


25/02/2023  00:00 INFO    : Maintenance window "RDS preferred Maintenance Window Schedule" used as running period found for instance TEST-RDS-INSTANCE



25/02/2023  00:00 DEBUG   : Listing instance RDS:TEST-RDS-INSTANCE (TEST-RDS-INSTANCE) in region ap-southeast-2 with instance type db.t3.small to be started by scheduler

25/02/2023  00:00 INFO    : Adding start tags [{'Key': 'status', 'Value': 'started'}] to instance arn:aws:rds:ap-southeast-2:<accountnumber>:db:TEST-RDS-INSTANCE

25/02/2023  00:00 INFO    : Starting instances RDS:TEST-RDS-INSTANCE (TEST-RDS-INSTANCE) in region ap-southeast-2

25/02/2023  00:00 INFO    : Scheduler result {'<accountnumber>': {'started': {'ap-southeast-2': [{'TEST-RDS-INSTANCE': {'schedule': 'rds-mon-fri0800-2300-sat-sunxxxx-2000-Brisbane'}}..........]}, 'stopped': {}}}


#DEBUG_SKIPPING_INSTANCE message
#https://github.com/aws-solutions/aws-instance-scheduler/blob/main/source/lambda/schedulers/rds_service.py
#looks like for some reason "IS" tried to stop it at 00:12 instead of 00:30
25/02/2023  00:12 DEBUG   : Skipping rds instance TEST-RDS-INSTANCE because it is not in a start or stop-able state (maintenance)



#25/02/2023  00:30
#there is no "Stopping instances" event for this instance at 25/02/2023 00:30!
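The 00:12 "Skipping" message above indicates RDS was reporting the instance in the "maintenance" status at that moment, a state in which the scheduler can neither start nor stop it. A minimal hypothetical sketch of such a state check (my own simplification, not the actual rds_service.py code):

```python
# Hypothetical simplification of the scheduler's state check -- NOT the
# actual rds_service.py implementation. RDS reports "maintenance" as the
# DB instance status while maintenance is in progress, and an instance
# in that state can be neither started nor stopped.
STARTABLE_STATES = {"stopped"}
STOPPABLE_STATES = {"available"}

def is_schedulable(db_instance_status: str) -> bool:
    """Return True if the scheduler could act on the instance right now."""
    return db_instance_status in STARTABLE_STATES | STOPPABLE_STATES

print(is_schedulable("maintenance"))  # False -> the "Skipping" log line
print(is_schedulable("available"))    # True
```

That explains the skipped 00:12 attempt, but not why no stop is attempted at 00:30 once the instance is "available" again.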

Worth mentioning:

  • Deployment version: 1.4.1
  • At the same time, at least 4 instances were being started/stopped during their MWs.
  • With some other instances this feature does work: they are started and stopped after the MW completes.
  • I couldn't find any pattern explaining why some instances are stopped and others are not.
@natalia-1702 natalia-1702 changed the title Instance Scheduler doesn’t stop instance after MaintenanceWindow is completed. Instance Scheduler doesn’t stop RDS instance after MaintenanceWindow is completed. Mar 2, 2023
@natalia-1702
Author

More information.
Link to the documentation: https://s3.amazonaws.com/solutions-reference/aws-instance-scheduler/latest/instance-scheduler.pdf


Example of a schedule I use:

RdsMonFri08002300SatSunxxxx2000Brisbane:
   Type: 'Custom::ServiceInstanceSchedule'
   Properties:
     Name: 'rds-mon-fri0800-2300-sat-sunxxxx-2000-Brisbane'
     NoStackPrefix: 'True'
     Description: "Some description"
     ServiceToken: >-
       arn:aws:lambda:ap-southeast-2:<accountnumber>:function:infrastructure-instance-scheduler-InstanceSchedulerMain
     Timezone: Australia/Brisbane
     UseMaintenanceWindow: 'True'
     Periods:
       - Description: mon-fri0800-2300
         BeginTime: '08:00'
         EndTime: '23:00'
         WeekDays: Mon-Fri
       - Description: sat-sunxxxx-2000
         EndTime: '20:00'
         WeekDays: Sat-Sun 

region: the only region I use: ap-southeast-2

@CrypticCabub CrypticCabub added the triaged Has been triaged by solutions team label Mar 2, 2023
@natalia-1702
Author

natalia-1702 commented Mar 7, 2023

What we noticed:

It happens only with instances whose schedules contain a period without a start time.
For example, this RDS instance has the following settings:

tag: schedule rds-mon-fri0800-2300-sat-sunxxxx-2000-Brisbane

RdsMonFri08002300SatSunxxxx2000Brisbane:
   Type: 'Custom::ServiceInstanceSchedule'
   Properties:
     Name: 'rds-mon-fri0800-2300-sat-sunxxxx-2000-Brisbane'
     NoStackPrefix: 'True'
     Description: "Some description"
     ServiceToken: >-
       arn:aws:lambda:ap-southeast-2:<accountnumber>:function:infrastructure-instance-scheduler-InstanceSchedulerMain
     Timezone: Australia/Brisbane
     UseMaintenanceWindow: 'True'
     Periods:
       - Description: mon-fri0800-2300
         BeginTime: '08:00'
         EndTime: '23:00'
         WeekDays: Mon-Fri
       - Description: sat-sunxxxx-2000
         EndTime: '20:00'
         WeekDays: Sat-Sun 

Maintenance window: Every Saturday 02:00 - 02:30 UTC+11

On Saturday we expect:
the instance to be started at 02:00 (MW start) and stopped at 02:30 (MW end); after that, if someone starts it after 02:30, it will still be stopped at 20:00 (Brisbane time).

But instead we get this: the instance is started at 02:00 (MW start) and is stopped at 20:00 (Brisbane time) by the period. So it was running for 18 hours…

If we change the MW time and put it after the stop time, it works properly.

Is this behaviour a bug or a feature?

@CrypticCabub
Member

Hi @natalia-1702. The behavior you describe is expected behavior that emerges from the current scheduling logic.

Instance scheduler looks at each day individually, and a 1-sided stop schedule has the effect of splitting a day into two halves: an "any" period (in which the instance can be in any state) and a "stopped" period (in which the instance is expected to be off). Adding a maintenance window injects an extra running period into the middle of the existing schedule which results in a schedule that looks something like this:

any -- running -- any -- stopped

The result is exactly the behavior you describe. Whether this is a bug or a feature is open to some interpretation, but I will bring it up with the rest of the team and add it as a backlog item for evaluation.
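The day-splitting described above can be sketched as follows, using the Saturday schedule from this issue (MW 02:00-02:30, stop-only period ending 20:00). This is a minimal model under my own simplifying assumptions, not the actual scheduler code:

```python
from datetime import time

def desired_state(now: time, mw_start: time, mw_end: time, stop_at: time) -> str:
    """Simplified model: a one-sided (stop-only) period splits the day
    into 'any' (before stop_at) and 'stopped' (after it), and a
    maintenance window injects a 'running' span into the middle."""
    if mw_start <= now < mw_end:
        return "running"   # maintenance window: force the instance on
    if now >= stop_at:
        return "stopped"   # past the one-sided stop time
    return "any"           # scheduler has no opinion: leave it as-is

# Saturday from the issue: MW 02:00-02:30, stop-only period ending 20:00.
checks = [time(1, 0), time(2, 15), time(2, 45), time(21, 0)]
print([desired_state(t, time(2, 0), time(2, 30), time(20, 0)) for t in checks])
# -> ['any', 'running', 'any', 'stopped']
```

Because 02:45 falls back into the "any" span, nothing ever issues a stop when the window closes, so an instance started for maintenance keeps running until 20:00.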

@natalia-1702
Author

natalia-1702 commented Mar 8, 2023

@CrypticCabub I get it , thank you.

To be honest, it would be really good to be able to define schedules like this:
Monday-Sunday:
1st period: starts at 6am, stops at 6pm.
2nd period: no start time, but stops at 10pm.

I believe this would be a very useful feature. If someone from the team powered on the instance between 6pm and 10pm, it would still be stopped at 10pm to save costs.
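The requested behaviour could be modelled as a rule where a stop-only period's EndTime always wins, even after a manual start. A hypothetical sketch only; Instance Scheduler does not currently work this way:

```python
from datetime import time

def wanted_state(now: time) -> str:
    """Hypothetical day model for the requested feature:
    1st period runs 06:00-18:00; a 2nd stop-only period forces
    'stopped' from 22:00 even if someone started the instance
    manually after 18:00."""
    if time(6, 0) <= now < time(18, 0):
        return "running"
    if now >= time(22, 0):
        return "stopped"   # hard stop: overrides manual starts
    return "any"           # 18:00-22:00: manual starts are tolerated

print(wanted_state(time(19, 0)))   # -> any (a manual start stays up)
print(wanted_state(time(22, 30)))  # -> stopped (forced off to save costs)
```

The key difference from the current behavior is that the stop-only period enforces "stopped" after 22:00 rather than leaving the instance in the "any" state.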

Thank you.

@matt-it-guy

Any updates on this issue? We are seeing the same behaviour described in this ticket on version 1.5.6. For example, we have a database that turns on for maintenance and then stays on until the schedule's end time instead of turning off after maintenance.

@CrypticCabub
Member

Hi @matt-it-guy -- do you also have the same scenario I discussed with Natalia above (a 1-sided start/stop period on the same day as the maintenance window, which causes an overlap)?

That particular issue is fundamental to the current implementation of 1-sided periods. The team is considering potential fixes but has not settled on a satisfactory approach yet.

However, if your scenario differs from the overlapping 1-sided period scenario, please share the specific configuration of your schedule so we can dig into the issue further.

@matt-it-guy
Copy link

Thanks for your response. We have the same scenario as Natalia.
