- Functionality of Web Application
- Implemented
- Data Security
- Storage of Secrets
- Encryption of Data
- Middlewares
- Role-based Access Controls (RBAC) Logic
- Automated Attacks Mitigations
- Data Export as per the user's request
- URL Redirect Confirmation
- Account Security
- XSS Mitigation
- Cloud Functions
- Scheduled Cloud Functions
- Chat Security
- End-to-end Integrity
- Data Masking
- Pagination
- Image Validations & Compression
- Integration
- Research
- Login and Register
- 1:1 Chat
- Search (for users, comments, and posts)
- Notifications
- File uploading logic
- Including chunked uploading due to the 32MB limit per request for Google Cloud Run.
- HTML embeds for popular websites like YouTube
- Image content moderation using Google Cloud Platform (GCP) Computer Vision API
- Able to detect explicit content and spoof (aka memes) content.
- Below is a demo of blocking spoof content.
- Asynchronous Python code for the Google Cloud Platform (GCP) APIs
- Since most of GCP's Python client libraries do not support asyncio, I wrote my own asynchronous Python code for the various APIs to improve the performance of the web application.
- Additionally, even where a library did have async support, it had some issues with Python's asyncio event loop.
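To illustrate why blocking client libraries hurt an asynchronous web application, here is a minimal sketch. It does not reproduce the approach described above (native asynchronous code for each API); instead it shows the simpler fallback of offloading a blocking call to a worker thread with `asyncio.to_thread`. `blocking_api_call` is a hypothetical stand-in for a synchronous GCP client method.

```python
import asyncio
import time


def blocking_api_call(name: str) -> str:
    """Hypothetical stand-in for a synchronous GCP client call."""
    time.sleep(0.2)  # simulates network latency
    return f"result for {name}"


async def call_without_blocking(name: str) -> str:
    # Run the blocking call in a worker thread so the event loop stays free.
    return await asyncio.to_thread(blocking_api_call, name)


async def fetch_all(names: list[str]) -> list[str]:
    # The calls overlap instead of running one after another.
    return await asyncio.gather(*(call_without_blocking(n) for n in names))


if __name__ == "__main__":
    print(asyncio.run(fetch_all(["kms", "vision", "web-risk"])))
```

Threads are only a stopgap: a natively asynchronous client avoids the thread pool altogether, which is presumably why the project wrote its own.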
- Storage of Secrets
- All secrets of the web applications such as API access tokens, database credentials, etc. are stored in GCP Secret Manager.
- Encryption of Data
- Using Google Cloud Platform (GCP) Key Management Service (KMS).
- Encrypted on the Application Layer using AES-256-GCM.
- User's data that are encrypted:
- Phone numbers
- As it is only used for SMS 2FA
- Argon2id password hashes
- The encryption acts as a pepper
- Shared Time-based One-Time Password (TOTP) secrets
- Used for 2FA with Google Authenticator or other compatible apps
- Chat messages
- Middlewares
- CSRF
- HMAC-SHA256
- Uses the header and cookie for CSRF validation to prevent CSRF attacks.
- Uses GCP KMS Cloud HSM to generate high-entropy bytes for the CSRF token.
- Session
- HMAC-SHA256
- More flexible than the built-in FastAPI/Starlette session middleware.
- Becomes a session cookie (no expiry date, although the session ID only lasts for a day) when the user does not check the "stay signed in" checkbox.
- If the user checks the "stay signed in" checkbox, the session becomes a persistent cookie that expires after 2 weeks.
- Uses GCP KMS Cloud HSM to generate high-entropy bytes for the session ID.
- Cache control middleware for the web application endpoints.
- For better performance and availability.
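The header-plus-cookie CSRF check with HMAC-SHA256 can be sketched roughly as follows. This is a minimal illustration, not the project's middleware: `SECRET_KEY`, the function names, and the `nonce.signature` token format are assumptions, and the random bytes come from Python's `secrets` module rather than GCP KMS Cloud HSM.

```python
import hashlib
import hmac
import secrets

SECRET_KEY = secrets.token_bytes(32)  # assumption: a server-side signing key


def _sign(session_id: str, nonce: str) -> str:
    msg = f"{session_id}:{nonce}".encode()
    return hmac.new(SECRET_KEY, msg, hashlib.sha256).hexdigest()


def make_csrf_token(session_id: str) -> str:
    """Return 'nonce.signature'; the signature binds the nonce to the session."""
    nonce = secrets.token_hex(32)  # the project sources its randomness from Cloud HSM
    return f"{nonce}.{_sign(session_id, nonce)}"


def is_valid_csrf(session_id: str, cookie_token: str, header_token: str) -> bool:
    # 1. The header and the cookie must carry the same token.
    if not hmac.compare_digest(cookie_token.encode(), header_token.encode()):
        return False
    # 2. The token must have been signed by this server for this session.
    nonce, _, signature = cookie_token.partition(".")
    expected = _sign(session_id, nonce)
    return hmac.compare_digest(signature.encode(), expected.encode())
```

A cross-site request can make the browser send the cookie but cannot read it to copy the token into a header, which is why both copies are compared.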
- Role-based Access Controls (RBAC) logic
- Uses FastAPI's dependency injection feature.
- Clears invalid sessions
- Redirects the user to their default endpoint if not authorised.
- For sensitive routes like the admin pages, it raises an HTTP 404 error if the user is not authorised.
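The authorisation decision described above can be sketched independently of the framework. In the project this logic runs inside a FastAPI dependency; the role names, default endpoints, `Decision` type, and the 307 redirect status below are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass
from typing import Optional

# Assumption: hypothetical roles and their default endpoints.
DEFAULT_ENDPOINT = {"guest": "/login", "user": "/", "admin": "/admin"}


@dataclass
class Decision:
    status: int                        # HTTP status code to respond with
    redirect_to: Optional[str] = None  # set only when redirecting


def authorise(role: str, allowed_roles: set[str], sensitive: bool = False) -> Decision:
    if role in allowed_roles:
        return Decision(status=200)
    if sensitive:
        # Answer 404 so that the existence of the route is not revealed.
        return Decision(status=404)
    # Otherwise send the user back to the default endpoint for their role.
    return Decision(status=307, redirect_to=DEFAULT_ENDPOINT.get(role, "/login"))
```

Returning 404 instead of 403 for admin pages means an unauthorised visitor cannot tell a protected route from one that does not exist.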
- Automated Attacks Mitigations
- Rate limiter using FastAPI limiter
- Uses FastAPI's dependency injection feature.
- Using reCAPTCHA Enterprise to increase friction against bots.
- Integrated Cloudflare with our domain, miraisocial.live, to increase friction against bots and protect against network attacks.
- Data export as per the user's request.
- Fulfils Art. 20 GDPR – Right to data portability.
- Fulfils the PDPA's Data Portability Obligation.
- Uses Cloud Tasks to export the user's data and send it to the user's email.
- Since I deployed the code to Google Cloud Run, it has a maximum runtime of 1 hour, which is sufficient for this project.
- For scalability, one could deploy the code to Google Compute Engine or Google App Engine, which can allow up to 24 hours of processing time.
- URL redirect confirmation for external links posted by users.
- Integrated with Calvin's URL analysis feature for suspicious or malicious URLs.
- Account Security
- Google and Facebook OAuth2 login.
- Forgot Password
- Voluntary revocation of the user's sessions.
- Alerting users when their passwords have been leaked in data breaches, using the reCAPTCHA Enterprise API.
- Takes the username or a canonicalised email together with the user's password, passes them through a Scrypt hash function, and then sends the result to the reCAPTCHA Enterprise API to check whether it is in their database of compromised passwords.
- Added password policy.
- 2FA using Authenticator app or SMS (using Twilio API).
- Single-use 2FA backup code that lets users disable their 2FA in the event that they lose access to their device.
- Location-based login 2FA if the user is logging in from a new location and does not have 2FA enabled.
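For the authenticator-app 2FA, the code the app displays is derived from the shared TOTP secret and the current time. The sketch below follows RFC 6238 with the common defaults (HMAC-SHA1, 30-second steps, 6 digits). The function names and the one-step tolerance window are assumptions, not the project's implementation.

```python
import base64
import hashlib
import hmac
import struct
import time
from typing import Optional


def totp(secret_b32: str, at: Optional[float] = None, digits: int = 6, step: int = 30) -> str:
    """Compute the TOTP code for a base32-encoded shared secret."""
    key = base64.b32decode(secret_b32, casefold=True)
    counter = int((time.time() if at is None else at) // step)
    digest = hmac.new(key, struct.pack(">Q", counter), hashlib.sha1).digest()
    offset = digest[-1] & 0x0F  # dynamic truncation as defined in RFC 4226
    code = struct.unpack(">I", digest[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)


def verify_totp(secret_b32: str, code: str, at: Optional[float] = None, window: int = 1) -> bool:
    """Accept codes from the current step and `window` steps either side (clock drift)."""
    now = time.time() if at is None else at
    return any(
        hmac.compare_digest(totp(secret_b32, now + i * 30).encode(), code.encode())
        for i in range(-window, window + 1)
    )
```

Because the server needs the raw secret to recompute the code, the secret cannot be hashed like a password, which is why it is one of the encrypted fields.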
- XSS mitigation for the web application endpoints.
- Using DOMPurify, `html.escape()`, and Jinja2 to escape dirty user inputs.
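As a small illustration of the server-side part, `html.escape()` from Python's standard library turns markup characters into entities, so user input is rendered as text rather than executed. `render_comment` is a hypothetical helper, not a function from the project.

```python
import html


def render_comment(user_input: str) -> str:
    # quote=True also escapes " and ', which matters inside HTML attributes.
    safe = html.escape(user_input, quote=True)
    return f"<p>{safe}</p>"


print(render_comment('<script>alert("x")</script>'))
# <p>&lt;script&gt;alert(&quot;x&quot;)&lt;/script&gt;</p>
```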
- Cloud Functions
- Create Signed URL (Golang)
- Uses the Golang's Google Cloud Storage (GCS) library to create a signed URL for the user to view the file.
- The signed URL can also carry an expiry time, giving a short-lived URL that becomes invalid once it expires.
- Used in posts and chat messages for confidentiality.
- Sending Emails (Golang)
- The aiosmtplib Python library took a while (~5 mins) to deliver emails to the user, so I wrote a Cloud Function in Golang to send the emails instead, which reduced the delivery time to ~15 seconds.
- Scheduled Cloud Functions
- Using Cloud Scheduler to schedule the Cloud Functions to run at specific intervals.
- Re-encrypt Database (Golang)
- Automated the re-encryption of the user's data when the encryption key in GCP Key Management Service (KMS) is rotated.
- Database Cleanups (Golang)
- As per data retention policy
- Delete expired chat messages.
- Delete orphan comments (comments that are not attached to any posts as the post was deleted).
- Delete the user's data if the user has not logged in for 2 years.
- Delete the user's data if the user has not verified their email for a month.
- Delete the admin's account if the admin has been inactive for more than a month.
- This is also for security reasons, such as minimising the risk of the admin's account being compromised.
- Chat Security
- Allow users to add a chat password for extra security.
- If the user forgets the password, they can reset it by clicking on the "Forgot Password" button which will send an email to the user's email address with a link to disable their chat password protection.
- Disappearing messages that can be configured by either the sender or the receiver.
- The shorter of the two durations is used for the message's self-destruct timer.
- Signed Google Cloud Storage (GCS) URL for files uploaded by users as previously mentioned.
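The rule for choosing the self-destruct timer can be sketched as below. It assumes that `None` means a participant has not configured disappearing messages; that convention and the function name are illustrative only.

```python
from typing import Optional


def effective_timer(sender_secs: Optional[int], receiver_secs: Optional[int]) -> Optional[int]:
    """Return the shortest configured duration, or None if neither side set one."""
    configured = [s for s in (sender_secs, receiver_secs) if s is not None]
    return min(configured) if configured else None
```

Taking the minimum means the more privacy-conscious participant's preference always wins.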
- End-to-end integrity
- Chat messages are verified using CRC32C and MD5 checksums, which were chosen with performance in mind.
- There is no need for SHA256 as the messages are already sent via WebSocket Secure (WSS), which is encrypted and ensures the integrity of the data in transit.
- Web application/API server to GCS server integrity checks are done by sending the file's MD5 and CRC32C checksums to GCS for file integrity validations on Google's end.
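The two checksums can be computed as in the sketch below. Python's standard library has no CRC32C (`zlib.crc32` uses a different polynomial), so a slow but dependency-free bitwise version is written out here; a production system would more likely use an accelerated library. GCS expects both values base64-encoded, with the CRC32C as 4 big-endian bytes. `gcs_checksums` is a hypothetical helper name.

```python
import base64
import hashlib
import struct


def crc32c(data: bytes) -> int:
    """Bitwise CRC32C (Castagnoli), reflected polynomial 0x82F63B78."""
    crc = 0xFFFFFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            crc = (crc >> 1) ^ (0x82F63B78 if crc & 1 else 0)
    return crc ^ 0xFFFFFFFF


def gcs_checksums(data: bytes) -> dict[str, str]:
    """Return base64-encoded CRC32C and MD5 values in the form GCS expects."""
    return {
        "crc32c": base64.b64encode(struct.pack(">I", crc32c(data))).decode(),
        "md5": base64.b64encode(hashlib.md5(data).digest()).decode(),
    }
```

Both checksums detect accidental corruption only; neither is meant to resist deliberate tampering, which is left to the encrypted transport.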
- Data Masking
- Using GCP Computer Vision API and GCP Natural Language API to mask sensitive data in images, PDFs, and text.
- Using regex to also mask sensitive data in text due to the limitations of the GCP Natural Language API.
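The regex fallback might look like the sketch below. The two patterns (a Singapore NRIC-style identifier and a 13 to 16 digit payment card number) and the choice to keep the last four characters visible are assumptions for illustration; the project's actual patterns are not given in this document.

```python
import re

# Assumption: hypothetical patterns, not the project's real rule set.
SENSITIVE_PATTERNS = [
    re.compile(r"\b[STFGM]\d{7}[A-Z]\b"),      # NRIC-style identifier
    re.compile(r"\b\d(?:[ -]?\d){12,15}\b"),   # 13 to 16 digit card number
]


def _mask(match: re.Match) -> str:
    value = match.group(0)
    # Hide everything except the last four characters.
    return "*" * (len(value) - 4) + value[-4:]


def mask_sensitive(text: str) -> str:
    for pattern in SENSITIVE_PATTERNS:
        text = pattern.sub(_mask, text)
    return text


print(mask_sensitive("NRIC: S1234567D"))  # NRIC: *****567D
```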
- Pagination
- Implemented in the chat to prevent:
- The server's and the client's browser memory from being overloaded with too many chat messages which can cause either the server or the client's browser to crash.
- Overloading or getting rate limited by the GCP KMS API.
- Search results are also paginated to prevent:
- The server's and the client's browser memory from being overloaded with too many search results which can cause either the server or the client's browser to crash.
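A minimal sketch of cursor-based pagination for chat history is shown below, using an in-memory list in place of the database. The message shape (`{"id": ...}`), the `before_id` cursor, and the function name are assumptions. Only the messages on the returned page would need to be decrypted, which keeps the number of KMS calls per request bounded.

```python
from typing import Optional


def get_page(
    messages: list[dict], before_id: Optional[int] = None, limit: int = 20
) -> tuple[list[dict], Optional[int]]:
    """Return one newest-first page and the cursor for the next, older page."""
    older = [m for m in messages if before_id is None or m["id"] < before_id]
    page = sorted(older, key=lambda m: m["id"], reverse=True)[:limit]
    # A cursor is returned only when older messages remain beyond this page.
    next_cursor = page[-1]["id"] if len(older) > limit else None
    return page, next_cursor
```

The client passes `next_cursor` back as `before_id` when the user scrolls up, and stops requesting once it receives `None`.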
- Image Validation & Compression
- Helps to reduce the size of the image files uploaded by the user.
- Also checks for decompression bomb attacks which can cause the client's browser to crash.
- Uses Python's Pillow library to compress and do the image decompression bomb attack checks.
- The attacks are executed by uploading a file with a very large resolution (e.g. 10,000 x 10,000 pixels), which can cause severe lag and can even crash the client's browser.
- For posts, the large resolution image will be blocked from being uploaded and the user will be notified.
- For chat messages, the large resolution image will be treated as a normal file and will not be displayed as an image on the user's browser.
- This approach is safer as the user can still view the image by downloading it and viewing it directly on his/her device without crashing the client's browser.
- Original images can be viewed by clicking on the "View Original" button or removing the `?compress=true` query parameter from the image URL.
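The idea behind the decompression bomb check is to reject an image based on its declared resolution before decoding any pixels. The project does this with Pillow; the sketch below instead reads the dimensions straight from a PNG header using only the standard library, so it is an illustration of the principle, not the project's code. `MAX_PIXELS` is a hypothetical limit.

```python
import struct

MAX_PIXELS = 50_000_000  # assumption: hypothetical limit, tune to your needs
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_dimensions(data: bytes) -> tuple[int, int]:
    """Read width and height from the IHDR chunk without decoding the image."""
    if len(data) < 24 or data[:8] != PNG_SIGNATURE or data[12:16] != b"IHDR":
        raise ValueError("not a PNG file")
    width, height = struct.unpack(">II", data[16:24])
    return width, height


def is_decompression_bomb(data: bytes, max_pixels: int = MAX_PIXELS) -> bool:
    width, height = png_dimensions(data)
    return width * height > max_pixels
```

With this limit, a 10,000 x 10,000 image (100 million pixels) is flagged even though its compressed file size may be small.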
- Helped to deploy the Cloud Functions developed by my group members to GCP.
- Integrated Eden's PassportEye OCR with the file uploading logic.
- Integrated Eden's data security enhancements
- Separate database servers for user-related data and admin-related data
- RBAC configurations for the web application endpoints
- Helped to clean up the code and fix bugs in the web application and API.
- Helped to develop asynchronous Python code for the GCP APIs where needed for my group members' features, such as the GCP Web Risk API for Calvin's URL analysis.
- MongoDB Configurations for Data Security
- MongoDB sharding which allows the database to scale horizontally by splitting the data into chunks and distributing them across multiple servers. This helps to provide higher availability and scalability.
- Encryption at rest for the MongoDB database.
- Automatic backups for the MongoDB database (which are also encrypted at rest).
- Multiple nodes for the MongoDB database for automatic failover to provide higher availability.