Commit: changes

emilydunenfeld committed Feb 5, 2024
1 parent 163815f commit 06194a1
Showing 37 changed files with 414 additions and 332 deletions.
32 changes: 19 additions & 13 deletions docs/aws/concepts/autoscaling.md
title: Auto Scaling

The best way to optimize costs in the cloud is to not spend it in the first place. Enter Auto Scaling. Auto Scaling leverages the elasticity of the cloud to dynamically provision and remove capacity based on demand. That means that as demands decrease, Auto Scaling will automatically scale down resources and enable you to save on costs accordingly.

Auto Scaling applies to a variety of different services, some of which are described in more detail below. If you're looking for EC2 Auto Scaling concepts, please see the AWS EC2 service page for the [Auto Scaling section](/aws/services/ec2-pricing/#auto-scaling).


## Application Auto Scaling

For other resources in AWS, [Application Auto Scaling](https://docs.aws.amazon.com/autoscaling/application/userguide/what-is-application-auto-scaling.html) provides the ability to adjust provisioned resources.

Application Auto Scaling supports the following services:

* AppStream 2.0 fleets
* Aurora replicas
* Amazon Comprehend document classification and entity recognizer endpoints
* [DynamoDB](/aws/services/dynamodb-pricing/) tables and global secondary indexes
* [Amazon Elastic Container Service (ECS)](/aws/services/ecs-and-fargate-pricing/) services
* ElastiCache for Redis clusters (replication groups)
* Amazon EMR clusters
* Amazon Keyspaces (for Apache Cassandra) tables
* [Lambda](/aws/services/lambda-pricing/) function provisioned concurrency
* Amazon Managed Streaming for Apache Kafka (MSK) broker storage
* Amazon Neptune clusters
* SageMaker endpoint variants
* SageMaker inference components
* SageMaker Serverless provisioned concurrency
* Spot Fleet requests
* Custom resources provided by your own applications or services

## Auto Scaling Strategies

There are various methods by which Auto Scaling can occur. These are listed below in no particular order:

* **Target Scaling** adds or removes capacity to keep a metric as close to a specific value as possible. For example, a target average CPU utilization of 50% across a set of ECS Tasks. If CPU utilization gets too high, nodes are added. If CPU utilization gets too low, nodes are removed.
* **Step Scaling** will adjust capacity up and down by dynamic amounts depending on the magnitude of a metric.
* **Scheduled Scaling** will adjust minimum and maximum capacity settings on a schedule.
* **Simple Scaling** will add or remove EC2 instances from an Auto Scaling Group when an alarm is in an alert state.
* **Predictive Scaling** can leverage historical metrics to preemptively scale EC2 workloads based on daily or weekly trends.
* **Manual Scaling** is possible with EC2 instances if teams need to intervene with an Auto Scaling Group. This allows you to manually adjust the Auto Scaling target without any automation.
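As a rough illustration, the target-tracking strategy above can be sketched as a proportional rule. This is a simplified sketch, not the actual Auto Scaling implementation; the function name and the 50% default are illustrative:

```python
import math


def desired_capacity(current: int, observed_cpu: float, target_cpu: float = 50.0) -> int:
    """Proportional target-tracking rule: resize capacity so that average
    CPU utilization moves back toward the target value."""
    if observed_cpu <= 0:
        # Nothing running hot; keep the fleet where it is (at least one node).
        return max(1, current)
    return max(1, math.ceil(current * observed_cpu / target_cpu))
```

With 4 ECS Tasks averaging 75% CPU against a 50% target, the rule asks for 6 tasks; at 25% it shrinks the set to 2.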

## Other Considerations

Adding capacity is generally an easy process. For compute, it's just a matter of launching new workers from static images or automated standup processes.

Reducing capacity can be tricky depending on the application. Web applications can generally finish their in-flight requests and prepare for termination within 30 seconds, so load balancers are often used to drain requests off instances and then terminate the instances cleanly. Queue/batch workers, on the other hand, need to finish their work, or stash it somewhere, before the node can be terminated. Otherwise, requests and/or data can be lost or incomplete.

DynamoDB Provisioned Capacity has [restrictions](https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/Limits.html) regarding how frequently it can be reduced (four times per day at any time, plus any time when there hasn't been a reduction in the last hour). There are no restrictions regarding increasing capacity. Tables and Secondary Indexes are managed/scaled independently.
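The decrease restriction above can be expressed as a small eligibility check. This is an illustrative sketch of the rule as stated, not an AWS API; `can_decrease_capacity` is a hypothetical helper:

```python
from datetime import datetime, timedelta


def can_decrease_capacity(past_decreases: list[datetime], now: datetime) -> bool:
    """DynamoDB-style rule from the text: four decreases per day at any
    time, plus an extra one whenever the last decrease was over an hour ago."""
    today = [t for t in past_decreases if t.date() == now.date()]
    if len(today) < 4:
        return True
    return now - max(past_decreases) > timedelta(hours=1)
```

After four quick reductions in one day, a fifth is only allowed once an hour has passed since the last one.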

Scaling cooldown can be the trickiest part of the process. It's generally best to aggressively scale up/out and conservatively scale down/in. A long cooldown process might be necessary when scaling out an application with a long startup process, but it can also block future scale out events, resulting in application instability. Scaling policies should be regularly evaluated and tuned.

10 changes: 5 additions & 5 deletions docs/aws/concepts/credits.md
title: Credits on AWS

Most public cloud infrastructure and service providers have a concept of credits. Credits are incentives typically given to customers opening up new accounts to attract them to build upon their platform. They allow you to build, learn, and integrate into providers without having to spend money right away.

Credit allotments usually are around $5,000 or $10,000 depending on the provider, but can be as high as $100,000.

## Startup Credits Across Clouds

One strategy often used by especially cost-conscious startups that can easily move workloads is to receive credits from multiple providers and run workloads on each until the credits expire across all of them. For example, a startup may get $10,000 of AWS credits and $10,000 of GCP credits, run their application on AWS until the $10,000 is completely utilized, then migrate to GCP to use up the $10,000 there, for $20,000 in total free usage.

Typically, this is advised against because the operational overhead of running workloads across multiple clouds typically isn't worth it. The use cases that this tends to work for are very transferable or ephemeral workloads such as training models on GPUs or running containers with no associated state.

## Credit Expiration

It's important to note that credits typically have a lifecycle tied to them that causes them to expire. Oftentimes, this catches customers by surprise. Usually, credits are granted on a one-year basis, which means if you have remaining credits that aren't utilized by the expiration term, they're automatically removed from your account. It's important to keep track of your credit expiration dates, so you are not caught off-guard.
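Tracking expiration is simple date arithmetic; a minimal sketch, assuming the typical one-year term described above:

```python
from datetime import date, timedelta


def days_until_expiration(grant_date: date, as_of: date, term_days: int = 365) -> int:
    """Days of credit eligibility remaining under a fixed-term grant.
    A negative result means the credits have already expired."""
    return (grant_date + timedelta(days=term_days) - as_of).days
```

Credits granted on 2024-01-01 have 183 days left as of 2024-07-01; a grant from the prior year has already gone negative.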


!!! Contribute
    Contribute to this page on [GitHub](https://github.com/vantage-sh/handbook) or join the `#cloud-costs-handbook` channel in the [Vantage Community Slack](https://vantage.sh/slack).
16 changes: 8 additions & 8 deletions docs/aws/concepts/io-operations.md
title: I/O Operations (IOPS) on AWS | Cloud Cost Handbook

## Input/Output Operations

Input/output operations per second (IOPS) are a relatively low-level unit in AWS for measuring disk performance. The maximum size of a single I/O operation is 256 KiB for SSD volumes and 1 GiB for HDD volumes. For general purpose (`gp2`) volumes, 1 GiB of storage provides a baseline of 3 IOPS, so a 1,000 GiB EBS volume has 3,000 IOPS available. For Provisioned IOPS volume types, you are charged for the provisioned IOPS even if you don't fully utilize them.
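The 3-IOPS-per-GiB relationship above is just multiplication; a minimal sketch (`baseline_iops` is an illustrative helper, not an AWS API):

```python
def baseline_iops(volume_gib: int, iops_per_gib: int = 3) -> int:
    """Baseline IOPS implied by a per-GiB ratio, e.g. 3 IOPS per GiB."""
    return volume_gib * iops_per_gib
```

A 1,000 GiB volume gets a 3,000 IOPS baseline, matching the example above.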

As indicated on the [EBS](/aws/services/ebs-pricing) page:

> Provisioned IOPS SSD volumes use a consistent IOPS rate, which you specify when you create the volume, and Amazon EBS delivers the provisioned performance 99.9% of the time.

The [performance consistency](https://blog.maskalik.com/blog/2020/05/31/aws-rds-you-may-not-need-provisioned-iops/) of a Provisioned IOPS volume compared to a general purpose (`gp2`, `gp3`), throughput optimized (`st1`), or cold HDD (`sc1`) volume is going to be better for both random and sequential disk access. Note that for operations with large and sequential accesses, provisioned IOPS are likely less efficient than an `st1` volume.

## IOPS Considerations

- **Volume Type:** There are multiple volume types with different impacts on IOPS.
- **I/O Demand:** Most likely the workload has a bursty demand pattern, where consistently high throughput is not as important as meeting spikes of demand. As the workload deviates from this, provisioned IOPS become more important.
- **Throughput Limits:** The instance will have an upper limit of throughput it can support. For example, an [i2.xlarge](https://instances.vantage.sh/aws/ec2/i2.xlarge.html) can support up to 62,500 IOPS. If the number of Provisioned IOPS is even higher than this limit, it's a waste, because the instance cannot use them all up.
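The throughput-limit point above can be made concrete: IOPS provisioned beyond what the instance can push are paid for but never used. A sketch with hypothetical names; the 62,500 figure is the i2.xlarge limit cited above:

```python
def effective_iops(provisioned: int, instance_limit: int) -> tuple[int, int]:
    """Split provisioned IOPS into (usable, wasted) given the instance's
    own throughput ceiling."""
    usable = min(provisioned, instance_limit)
    return usable, provisioned - usable
```

Provisioning 80,000 IOPS behind an i2.xlarge leaves 17,500 of them unusable.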

## Optimal Provisioned IOPS

The most common cost waste with IOPS is having too many of them. It is commonly believed that the key to [RDS](/aws/services/rds-pricing/) is to have some amount of Provisioned IOPS. Luckily, we don't have to guess.

AWS [suggests](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ebs-io-characteristics.html) inspecting the `VolumeQueueLength` metric in [CloudWatch](/aws/services/cloudwatch-pricing/). This metric reports the number of pending I/O requests for a volume, which keeps the rule of thumb simple: if `VolumeQueueLength` is consistently high relative to the number of provisioned IOPS and latency is an issue, then you should consider increasing the number of provisioned IOPS.
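That heuristic can be sketched as a predicate. The one-pending-request-per-1,000-IOPS threshold below is an assumption for illustration, not official AWS guidance:

```python
def should_increase_piops(queue_length: float, provisioned_iops: int,
                          latency_is_issue: bool,
                          queue_per_kiops: float = 1.0) -> bool:
    """Flag a volume for more provisioned IOPS when its pending-request
    queue runs deeper than roughly one request per 1,000 provisioned IOPS
    (assumed threshold) AND latency is actually a problem."""
    threshold = queue_per_kiops * provisioned_iops / 1000
    return latency_is_issue and queue_length > threshold
```

A volume with 3,000 provisioned IOPS and a steady queue of 10 pending requests is only worth upsizing if users are actually feeling the latency.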

2 changes: 1 addition & 1 deletion docs/aws/concepts/regions.md
title: Regions Pricing

Pricing for public cloud infrastructure providers typically varies by geographic region. Depending on the nature of your applications, you may have no choice but to be located as close to your users as possible for latency purposes. That being said, it is worth comparing prices across regions, as some regions can be significantly cheaper than others.

The [Instances](https://instances.vantage.sh/) pricing tool has prices for popular AWS services in all regions. To see a list of AWS regions, consult this reference [list of AWS regions](/aws/reference/aws-regions).

8 changes: 4 additions & 4 deletions docs/aws/concepts/reserved-instances.md
title: Reserved Instances

Reserved Instances (RIs) are one of the most popular and high-impact cost-reduction methods you can leverage for cutting your bill. Reserved Instances give you the ability to pay upfront for certain AWS services to receive a discount. As a result, if you are able to profile usage across your AWS account and know that you'll hit certain usage levels, Reserved Instances can typically save you money.

Reserved Instances are available to a variety of AWS services such as [EC2](../services/ec2-pricing.md), [ElastiCache](../services/elasticache-pricing.md), and [RDS](../services/rds-pricing.md). AWS Billing automatically applies your Reserved Instance discounted rate when attributes of your instance usage match attributes of an active Reserved Instance. For general compute usage (EC2, Fargate, etc.), [Savings Plans](savings-plans.md) are _always_ preferred to Reserved Instances, since they give you the same discount but are more flexible across all compute.

It's important to note that Reserved Instances aren't actually separate instances. They are merely financial instruments that you buy and are automatically applied to your account. As a result, you can continue to spin up and use On-Demand Instances and purchase Reserved Instances concurrently. As On-Demand Instances match your Reserved Instance attributes, you'll automatically receive discounts.

## Reserved Instance Term

AWS gives different discounts depending on the term that you pay upfront for. You can yield greater savings for paying upfront for longer terms, but lose flexibility as a result. We find that smaller customers just getting started in their infrastructure journey tend to prefer 1-Year Reserved Instances, whereas more mature organizations will leverage 3-Year Reserved Instances for the greatest savings as they can more accurately model and predict their usage.
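The term trade-off can be framed as a break-even calculation. The prices in the example are hypothetical, not AWS rates:

```python
def ri_breakeven_months(on_demand_monthly: float, ri_effective_monthly: float,
                        upfront: float) -> float:
    """Months of steady usage before an upfront RI purchase beats staying
    on-demand. Returns infinity if the RI never pays off."""
    monthly_savings = on_demand_monthly - ri_effective_monthly
    if monthly_savings <= 0:
        return float("inf")
    return upfront / monthly_savings
```

With a hypothetical $100/month on-demand instance, a $240 upfront payment that drops the effective monthly rate to $60 pays for itself after 6 months of steady usage; idle capacity pushes the break-even point out.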

!!! Contribute
Contribute to this page on [GitHub](https://github.com/vantage-sh/handbook) or join the `#cloud-costs-handbook` channel in the [Vantage Community Slack](https://vantage.sh/slack).
