![](https://private-user-images.githubusercontent.com/41976906/282285291-b728aab9-d2ce-41bd-a448-c5c181b61453.gif?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3Mzk2Njk1MzIsIm5iZiI6MTczOTY2OTIzMiwicGF0aCI6Ii80MTk3NjkwNi8yODIyODUyOTEtYjcyOGFhYjktZDJjZS00MWJkLWE0NDgtYzVjMTgxYjYxNDUzLmdpZj9YLUFtei1BbGdvcml0aG09QVdTNC1ITUFDLVNIQTI1NiZYLUFtei1DcmVkZW50aWFsPUFLSUFWQ09EWUxTQTUzUFFLNFpBJTJGMjAyNTAyMTYlMkZ1cy1lYXN0LTElMkZzMyUyRmF3czRfcmVxdWVzdCZYLUFtei1EYXRlPTIwMjUwMjE2VDAxMjcxMlomWC1BbXotRXhwaXJlcz0zMDAmWC1BbXotU2lnbmF0dXJlPTQ2NjcwMmMzYzI4MzUxMjI3YTE4MjcxZjRhYWMzNmU3ZDdmNWZmZWNkMGEyZDQ4ZDNmYTJiYTVlMzE0ODQyMWUmWC1BbXotU2lnbmVkSGVhZGVycz1ob3N0In0.Otk5j8y25L8UVFG3MoFgNrdckk0oN3ds7SyNCzko_ZQ)
Our primary goal was to build a scalable and efficient backend for a ticketing service that could handle high loads with minimal latency. The project focuses exclusively on the backend and infrastructure, omitting a frontend interface to concentrate on the underlying mechanics and performance. This repository highlights the infrastructure components, illustrating our journey through building infrastructure, creating CI/CD pipelines, and managing containers.
Programming Language & Framework: We chose Kotlin and Spring Boot for their expressive syntax and the powerful suite of tools for building web applications efficiently.
Database: We utilized MySQL for its powerful database locking functions, which are essential in managing concurrent operations effectively within our ticketing system. This capability ensures data integrity and consistent performance under high-load scenarios.
Containerization & Orchestration: Kubernetes was used to manage our containerized applications, enabling easy scaling and management across multiple servers hosted on AWS.
Configuration Management: Helm charts helped us streamline the installation and management of our Kubernetes applications.
Continuous Deployment: ArgoCD was employed to automate the deployment process, ensuring our changes were seamlessly and reliably pushed to production.
Infrastructure as Code: Terraform allowed us to define our infrastructure using configuration files, which helped in maintaining consistency and ease of deployment across environments.
Performance Testing: We employed K6 to conduct spike tests, simulating scenarios with excessive simultaneous access to evaluate the performance and robustness of our system under extreme conditions.
Monitoring: We integrated Prometheus and Grafana to monitor our applications and infrastructure, ensuring high availability and performance through real-time insights.
While building the infrastructure, we worked through a range of challenges and optimizations. Below are key resources and discussions that provide insight into our decision-making process and the solutions we implemented:
- Building the Deployment Environment with Terraform: This issue tracks our use of Terraform to automate the provisioning of our entire cloud environment, focusing on reliability and scalability.
- Migration from AWS ALB to Nginx Ingress (Baremetal): To reduce costs, we replaced AWS ALB with a more cost-effective Nginx Ingress setup on bare metal. This discussion details the reasons behind the change and the implementation process.
- Using Public Subnet Node Group to Address NAT Gateway Cost Issues: We opted to configure our EKS cluster using a public subnet node group to avoid high costs associated with NAT gateways.
- How to Scrape Metrics from Multiple Pods Using Spring Actuator: This article explains how we set up Prometheus, via Helm, to scrape metrics from multiple pods, enhancing our monitoring capabilities using Spring Boot’s Actuator.
- Injecting Secrets into EKS Pods Using Terraform: We explored methods to securely inject secrets into our Kubernetes pods using Terraform, ensuring sensitive data is managed safely and effectively.
- Queue System Design Issues: Discusses considerations for preventing update losses, implementing non-blocking APIs, and choosing data structures for the queue system.
- Project Package Structure Considerations: Deliberations on how to organize the project's package structure effectively.
- Convention Documentation: Defines conventions for branch naming, commit messages, HTTP response structures, serialization, testing, and more.
- API Enhancement Considerations: Detailed discussion on time conventions, data transfer between layers (errors, responses), logging best practices, and their implementation.
- Maintaining Over 80% Test Coverage with Jacoco and Codecov: Outlines strategies and efforts to maintain a high level of test coverage using Jacoco and Codecov.
- Integration Testing Environment with Testcontainers and MySQL Container: Describes the setup of an integrated testing environment using Testcontainers and a MySQL Docker container to enhance testing reliability and consistency.
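
The queue system item above mentions preventing lost updates and keeping the API non-blocking. As a rough illustration of that idea only, the sketch below keeps an in-memory queue per event: callers enqueue without waiting, and one drainer at a time applies requests to the seat counter. `EventQueue`, `ReservationRequest`, and the seat counter are hypothetical names for this example and are not taken from the project, whose actual queue may be built differently.

```kotlin
import java.util.concurrent.CompletableFuture
import java.util.concurrent.ConcurrentLinkedQueue

// Hypothetical request type: one user asking for one seat on one event.
data class ReservationRequest(
    val userId: Long,
    val result: CompletableFuture<Boolean> = CompletableFuture()
)

// Hypothetical per-event queue; the real project may use a different design.
class EventQueue(private var remainingSeats: Int) {
    private val pending = ConcurrentLinkedQueue<ReservationRequest>()

    // Non-blocking for the caller: the request is queued and a future is
    // returned immediately, so no request thread waits on a lock.
    fun enqueue(userId: Long): CompletableFuture<Boolean> {
        val request = ReservationRequest(userId)
        pending.add(request)
        return request.result
    }

    // Applies queued requests in arrival order. Only one thread can drain at
    // a time, so the seat counter is never updated concurrently and no
    // decrement is lost. Returns the number of requests processed.
    @Synchronized
    fun drain(): Int {
        var processed = 0
        while (true) {
            val request = pending.poll() ?: break
            val granted = remainingSeats > 0
            if (granted) remainingSeats--
            request.result.complete(granted)
            processed++
        }
        return processed
    }
}
```

In this sketch a background worker would call `drain()` periodically, while request handlers call `enqueue()` and respond once the returned future completes. Contention moves from the database row for the event to a single in-process consumer.
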
![image](https://private-user-images.githubusercontent.com/41976906/282295924-00651cfb-8e03-4857-bc3b-14a400c84cbe.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3Mzk2Njk1MzIsIm5iZiI6MTczOTY2OTIzMiwicGF0aCI6Ii80MTk3NjkwNi8yODIyOTU5MjQtMDA2NTFjZmItOGUwMy00ODU3LWJjM2ItMTRhNDAwYzg0Y2JlLnBuZz9YLUFtei1BbGdvcml0aG09QVdTNC1ITUFDLVNIQTI1NiZYLUFtei1DcmVkZW50aWFsPUFLSUFWQ09EWUxTQTUzUFFLNFpBJTJGMjAyNTAyMTYlMkZ1cy1lYXN0LTElMkZzMyUyRmF3czRfcmVxdWVzdCZYLUFtei1EYXRlPTIwMjUwMjE2VDAxMjcxMlomWC1BbXotRXhwaXJlcz0zMDAmWC1BbXotU2lnbmF0dXJlPTBlYzFmODQxNjViY2E3MDk2NzgxY2E4MDQ5OTRkZDUwM2M3MzY5ZmY4MDdmNmMyMGUyZDVkNTc4NmZhYjAzMzUmWC1BbXotU2lnbmVkSGVhZGVycz1ob3N0In0.8ofboCN9hwp685UbHxtbbiX1vYBLlEzGmapJyxULIpY)
- Considerations for Building the Performance Test Environment: A detailed discussion on the setup and challenges of creating a suitable environment for performance testing.
- Detailed Performance Test Scenarios: This link provides a thorough description of the performance test scenarios used in our project.
- Calculating Costs for Spike Testing Using ALB LCU: An analysis of cost implications when using AWS ALB Load Capacity Units (LCU) for spike testing.
- Creating K6 Performance Test Scripts: Discussion and documentation on how we developed K6 scripts for our performance testing.
- Database Setup for Test Data and Large-Scale Data Insertions: Outlines our approach to preparing the database for testing, including the creation of large datasets.
- Building a Monitoring Environment with Prometheus and Grafana: Details on how we configured Prometheus and Grafana to monitor our application and infrastructure during the performance tests.
- Improved slow queries by adding a single-column index on a table with 1 million records.
- Observed how encryption affects CPU performance: increased the CPU core count and measured how the cost of encryption changed as the encryption level was adjusted.
- Observed changes in JVM CodeHeap usage and performance by repeating the same test after the process started.
- Improved performance of `SELECT COUNT(*)` on ten million records by implementing NoOffset.
- Introduced a queue system after considering lock contention on a single resource (an Event) and observing test results.
- Improved CPU resource usage by modifying the thread pool's thread creation strategy.
- Reduced pending connections by modifying the DB connection pool strategy.
- Improved latency by applying Redis caching.
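
To make the NoOffset item above more concrete, here is a minimal sketch of keyset ("no offset") pagination. It uses an in-memory list in place of MySQL, and the `Reservation` and `Page` types and all function names are assumptions made for this example. Fetching one extra row to detect a next page is one common way to avoid a separate `SELECT COUNT(*)`; we are assuming, not asserting, that the project did something similar.

```kotlin
// Hypothetical row type standing in for a database record.
data class Reservation(val id: Long, val eventId: Long)

// Hypothetical page wrapper: the items plus a flag that replaces a total count.
data class Page(val items: List<Reservation>, val hasNext: Boolean)

// Offset paging, roughly `... ORDER BY id DESC LIMIT :size OFFSET :offset`.
// The database has to walk past `offset` rows before it can return the page.
fun pageByOffset(rows: List<Reservation>, offset: Int, size: Int): List<Reservation> =
    rows.sortedByDescending { it.id }.drop(offset).take(size)

// Keyset paging, roughly `... WHERE id < :lastId ORDER BY id DESC LIMIT :size`.
// The client passes the last id it saw (null for the first page), so an index
// on id can seek directly to the start of the page.
fun pageAfter(rows: List<Reservation>, lastId: Long?, size: Int): List<Reservation> =
    rows.asSequence()
        .filter { lastId == null || it.id < lastId }
        .sortedByDescending { it.id }
        .take(size)
        .toList()

// Fetches size + 1 rows: the extra row only signals that another page exists,
// so no count query is needed to decide whether to show "next".
fun fetchPage(rows: List<Reservation>, lastId: Long?, size: Int): Page {
    val fetched = pageAfter(rows, lastId, size + 1)
    return Page(items = fetched.take(size), hasNext = fetched.size > size)
}
```

A caller would request the first page with `fetchPage(rows, null, size)` and then pass the `id` of the last item of each page as `lastId` for the next one.
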
Junha Ahn | Hayoung Lim | Jeongseop Park | Minjun Kim |
---|---|---|---|
@junha-ahn | @hihahayoung | @ParkJeongseop | @minjun3021 |
Infrastructure (Leader) | Infrastructure | Backend | Backend |