Scaling and Securing Passport: Enhancing Infrastructure for High-Impact Growth #2392

Open
erichfi opened this issue Apr 17, 2024 · 6 comments

Comments

@erichfi
Collaborator

erichfi commented Apr 17, 2024

Scaling and Securing Passport: Enhancing Infrastructure for High-Impact Growth

Overview

This milestone focuses on tightening security and scaling Passport's infrastructure to handle increased load efficiently. The phase runs from April 25th to May 7th, for a total of 19 Dev Days.

Revised Tasks and Priorities

Automating Data Sync

  1. Automating Data Sync: Parquet File Export Integration for Passport Scores with Open Source Observer #2353: Automate the synchronization process to enhance reliability and timeliness of data exchanges.

Monitoring Enhancements

  1. Infra Week: Monitoring of Public and Private APIs #2402: Ensure robust monitoring to detect disruptions or performance issues.
  2. Review & Replace Existing Alarms #2228: Update and streamline alarm systems to improve responsiveness and relevance.
  3. Monitoring of Cron Jobs #2404: Set up monitoring for all scheduled tasks to ensure they run as expected without failures.
  4. Monitoring Database Performance #2403: Implement detailed database performance monitoring to anticipate and mitigate potential issues.
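
As an illustration of the cron job monitoring item (#2404), here is a minimal sketch of a staleness check, assuming each job records the timestamp of its last successful run. The job names, intervals, and the `find_stale_jobs` helper are hypothetical and are not taken from the Passport codebase.

```python
from datetime import datetime, timedelta, timezone


def find_stale_jobs(last_success, expected_interval, now, grace=timedelta(minutes=5)):
    """Return the sorted names of jobs that have not succeeded recently enough.

    A job is stale if it has never succeeded, or if its last success is
    older than its expected interval plus a grace period.
    """
    stale = []
    for name, interval in expected_interval.items():
        last = last_success.get(name)
        if last is None or now - last > interval + grace:
            stale.append(name)
    return sorted(stale)


# Hypothetical jobs and timestamps, for illustration only.
now = datetime(2024, 5, 1, 12, 0, tzinfo=timezone.utc)
expected = {
    "score_export": timedelta(hours=1),
    "cleanup": timedelta(days=1),
}
last_success = {
    "score_export": now - timedelta(hours=3),  # overdue
    "cleanup": now - timedelta(hours=2),       # still within its daily window
}

print(find_stale_jobs(last_success, expected, now))  # ['score_export']
```

A check like this could itself run on a schedule and raise an alarm whenever the returned list is non-empty. How the result is routed to an alerting system is left open here.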

Database Model Changes

  1. Change to db models for data models #2412.

Security and Access Management

  1. Investigate Passport IAM Errors and AWS Alarm #2183: Address identity and access management errors to maintain system integrity.

Comprehensive Testing

  1. End-to-End Testing of Passport App #2405: Conduct thorough testing to ensure all components function seamlessly together.

Resource Management

  1. Resource Tagging #2406: Improve resource tracking and cost management through systematic tagging.
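
To make the tagging item concrete, here is a minimal sketch of a tag-policy check, assuming resources can be listed as a mapping from name to tag dictionary. The required tag keys and resource names are hypothetical. The actual policy would be agreed as part of #2406.

```python
# Hypothetical tag policy; the real set of required keys is not defined in this issue.
REQUIRED_TAGS = ("Project", "Environment", "Owner")


def missing_tags(resources, required=REQUIRED_TAGS):
    """Map each non-compliant resource name to the required tags it lacks.

    A tag counts as missing if the key is absent or its value is empty.
    """
    report = {}
    for name, tags in resources.items():
        missing = [key for key in required if not tags.get(key)]
        if missing:
            report[name] = missing
    return report


# Hypothetical resources, for illustration only.
resources = {
    "scorer-api": {"Project": "passport", "Environment": "production", "Owner": "infra"},
    "scorer-db": {"Project": "passport"},
}

print(missing_tags(resources))  # {'scorer-db': ['Environment', 'Owner']}
```

Running a check like this in CI would keep untagged resources from slipping through, which is what makes cost reports by tag trustworthy.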

Load Testing

  • Load Testing (Pending Tim's Availability): Once Tim is back online, he will prioritize load testing to confirm that the infrastructure can handle expected traffic and to verify the system's resilience and performance under high load.
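
One way to summarize a load test run is to report latency percentiles rather than averages, since tail latency is what degrades first under load. The sketch below uses the nearest-rank method. The sample latencies are made up for illustration and the `percentile` helper is hypothetical, not part of any existing tooling.

```python
import math


def percentile(samples, pct):
    """Return the nearest-rank percentile (0 < pct <= 100) of a non-empty list."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


# Made-up response times in milliseconds from a hypothetical run.
latencies_ms = [120, 95, 110, 480, 105, 99, 101, 130, 97, 102]

print(percentile(latencies_ms, 50))  # 102
print(percentile(latencies_ms, 95))  # 480
```

In this sample the median looks healthy while p95 exposes a single slow request, which is the kind of signal an average would hide.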

Future Considerations

These tasks matter for long-term strategic goals but are not prioritized in the current milestone, either because they are complex or because their immediate impact is lower:

  • Set Up Aurora Serverless DB Settings: Postponed due to the need for thorough evaluation and potential risks.
  • Clean Up PagerDuty Alarms: Integrated into the review of existing alarms, but specific clean-up tasks may be deferred.
  • Pulumi Security Updates: Including WAF setup and AWS Shield Basic—pending further discussion on implications for partners and testing methodologies.
  • Restrict SSH Access: Important for security but not critical for immediate implementation.
  • Fix Docker Access: Essential for operational integrity but can be addressed after more pressing security tasks.
  • Document Processes and Coding Guidelines
  • Preview Step in Pipeline for Pulumi
  • Automate Running Migrations
  • Manage Access Keys
  • Document Deployment Processes
  • Map Out Architecture
  • Map Out DevOps Cycle
  • Rollback GitHub Actions

Strategic Focus

The focus remains on applying the 80/20 principle: prioritize the tasks that deliver the most impact on security and scalability for the least effort. The immediate goal is to make the infrastructure robust enough to handle anticipated traffic surges without compromising security or performance.

Expected Outcomes

  • Enhanced security across all levels of the infrastructure.
  • Improved monitoring and response capabilities.
  • Streamlined and efficient data synchronization with Open Source Observer.
  • Comprehensive testing to ensure functionality and readiness.

This updated milestone draft aims to reflect the current priorities and strategic adjustments necessary for successful project execution, with a clear path outlined for future enhancements and optimizations.

@erichfi
Collaborator Author

erichfi commented Apr 17, 2024

Next Steps:

  • Review and Planning: @nutrina and @larisa17 are tasked with reviewing the notes thoroughly. Please collaborate closely with the engineering team to determine the feasibility and priority of the proposed tasks within the allocated timeframe. If more time is needed, we may split this work into two chunks and handle the remainder in a second or third sprint later.
  • Implementation and Tracking: Following the above discussions, @erichfi will formalize this document into a structured checklist. This checklist will be integrated into GitHub with linked issues, organized according to the agreed-upon priorities in the development backlog.

@erichfi erichfi changed the title from "Boost Security and Scale Infrastructure" to "Scaling and Securing Passport: Enhancing Infrastructure for High-Impact Growth" Apr 17, 2024
@nutrina
Collaborator

nutrina commented Apr 19, 2024

@erichfi we have prioritised the list and created issues.

In order of priority, the list is:

Monitoring

Also, as part of each ticket, update this Notion page to reflect the latest state: https://www.notion.so/gitcoin/Passport-Monitors-PD-Alarms-444bfbe603d146ecbdd54211e1646957
The Notion page should provide an overview of the monitoring topic.

End-to-end testing

Tags

@erichfi
Collaborator Author

erichfi commented Apr 19, 2024

Prio

Top priority
#2402
#2228
#2404
#2403
OSS

#2183
#2405
#2406

Tim Prio:
Load Testing
Then rest of backlog

@erichfi
Collaborator Author

erichfi commented May 13, 2024

Aligned to not include #2406, #2403, #2405, #2409

@nutrina
Collaborator

nutrina commented May 15, 2024

Putting #2402 back into the backlog for now.
The current work is merged but not finalised yet.
