Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

File upload improvements #2919

Open
wants to merge 14 commits into
base: develop
Choose a base branch
from
Open

Conversation

BCerki
Copy link
Contributor

@BCerki BCerki commented Feb 26, 2025

#2123

This PR

  • set blob chunk size (I tried a couple options, didn't seem to make much of a difference)
  • remove check for same hash -- more efficient to just delete and recreate than read and check
  • the docs explain the file upload strategy
  • consolidated and renamed some of the ninja schemas
  • I'm having trouble with some mypy errors I hope the reviewer can weigh in on
  • I'm also having trouble with a bug where you can't edit an operation to be new entrant because an opted-in field is missing

@BCerki BCerki force-pushed the 2123-remove-files-have-same-hash branch from e775467 to 43cca7d Compare February 27, 2025 19:38
@BCerki BCerki force-pushed the 2123-remove-files-have-same-hash branch from 38918a4 to 0bdf9cc Compare March 6, 2025 23:14
<ul className="m-0 py-0 flex flex-col justify-start">
<li>
<Link
download={fileName}

Check warning

Code scanning / CodeQL

DOM text reinterpreted as HTML Medium

DOM text
is reinterpreted as HTML without escaping meta-characters.
DOM text
is reinterpreted as HTML without escaping meta-characters.

Copilot Autofix AI 19 days ago

To fix the problem, we need to ensure that the fileName is properly escaped before being used in the Link component. This can be achieved by using a library like he (HTML entities) to encode the fileName, which will prevent any HTML or script content from being interpreted by the browser.

  1. Install the he library to handle HTML entity encoding.
  2. Import the he library in the file.
  3. Encode the fileName before using it in the Link component.
Suggested changeset 2
bciers/libs/components/src/form/widgets/FileWidget.tsx

Autofix patch

Autofix patch
Run the following command in your local git repository to apply this patch
cat << 'EOF' | git apply
diff --git a/bciers/libs/components/src/form/widgets/FileWidget.tsx b/bciers/libs/components/src/form/widgets/FileWidget.tsx
--- a/bciers/libs/components/src/form/widgets/FileWidget.tsx
+++ b/bciers/libs/components/src/form/widgets/FileWidget.tsx
@@ -6,2 +6,3 @@
 import Link from "next/link";
+import he from 'he';
 
@@ -99,3 +100,3 @@
             <Link
-              download={fileName}
+              download={he.encode(fileName)}
               href={downloadUrl}
@@ -104,3 +105,3 @@
             >
-              {fileName}
+              {he.encode(fileName)}
             </Link>
EOF
@@ -6,2 +6,3 @@
import Link from "next/link";
import he from 'he';

@@ -99,3 +100,3 @@
<Link
download={fileName}
download={he.encode(fileName)}
href={downloadUrl}
@@ -104,3 +105,3 @@
>
{fileName}
{he.encode(fileName)}
</Link>
bciers/package.json
Outside changed files

Autofix patch

Autofix patch
Run the following command in your local git repository to apply this patch
cat << 'EOF' | git apply
diff --git a/bciers/package.json b/bciers/package.json
--- a/bciers/package.json
+++ b/bciers/package.json
@@ -106,3 +106,4 @@
     "vite-require": "^0.2.3",
-    "vitest": "^3.0.5"
+    "vitest": "^3.0.5",
+    "he": "^1.2.0"
   },
EOF
@@ -106,3 +106,4 @@
"vite-require": "^0.2.3",
"vitest": "^3.0.5"
"vitest": "^3.0.5",
"he": "^1.2.0"
},
This fix introduces these dependencies
Package Version Security advisories
he (npm) 1.2.0 None
Copilot is powered by AI and may make mistakes. Always verify output.
Unable to commit as this autofix suggestion is now outdated
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CodeQL has raised an interesting point here. I'm looking into this more to see if it can actually be exploited.

) -> Tuple[Literal[200], Operation]:
payload = OperationNewEntrantApplicationInWithDocuments(
date_of_first_shipment=details.date_of_first_shipment,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doing it like this instead of **details.dict() was to make mypy happy

Comment on lines 25 to 26
naics_code_id: Optional[int] = None
opt_in: Optional[bool] = False
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These are optional and None so I'm not sure why I'm getting mypy errors

@BCerki BCerki force-pushed the 2123-remove-files-have-same-hash branch 4 times, most recently from 06b749b to 6a71ad4 Compare March 10, 2025 20:33
Copy link
Contributor

@andrea-williams andrea-williams left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Screenshot 2025-03-11 at 3 08 50 PM

I wasn't able to get past the first step of the Registration workflow. Getting this error in the backend:

  File "/Users/awilliam/Documents/cas-registration/bc_obps/service/document_service_v2.py", line 29, in create_or_replace_operation_document
    existing_document = cls.get_operation_document_by_type_if_authorized(user_guid, operation_id, document_type)
                                                                                    ^^^^^^^^^^^^
NameError: name 'operation_id' is not defined

---------------------------------------------ERROR END-------------------------------------------------
Bad Request: /api/registration/operations/df62d793-8cfe-4272-a93e-ea9c9139ff82/registration/operation
[11/Mar/2025 22:07:41] "PUT /api/registration/operations/df62d793-8cfe-4272-a93e-ea9c9139ff82/registration/operation HTTP/1.1" 400 49

@router.get(
"/operations/{uuid:operation_id}/documents/{uuid:document_type}",
response={200: DocumentOut, custom_codes_4xx: Message},
tags=OPERATOR_TAGS,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OPERATION_TAGS instead?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I ended up not using this endpoint because we can use a GCS download link instead

"""
existing_document = cls.get_operation_document_by_type_if_authorized(user_guid, operation_id, document_type)
# if there is an existing document, check if the new one is different
existing_document = DocumentDataAccessService.get_operation_document_by_type(operation_id, type)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why did get_operation_document_by_type_if_authorized get replaced with this? We're no longer checking whether the user is authorized to access the document

Comment on lines +90 to +91
- DocumentServiceV2.create_or_replace_operation_document
- DocumentDataAccessServiceV2.create_document
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
- DocumentServiceV2.create_or_replace_operation_document
- DocumentDataAccessServiceV2.create_document
- DocumentService.create_or_replace_operation_document
- DocumentDataAccessService.create_document

Comment on lines 32 to 37
// this removes the file keys if no new file has been uploaded
if (
(key === "boundary_map" ||
key === "process_flow_diagram" ||
key === "new_entrant_application") &&
typeof rjsfFormData[key] === "string"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please help this idiot understand - where does the check for "no new file has been uploaded" happen?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If a new file has been uploaded, the value is a File. If the value is a string, it means there's no new file. There's some info about this in the file_uploads.md (feel free to suggest improvements), and I'll update this comment to make it clearer too

<ul className="m-0 py-0 flex flex-col justify-start">
<li>
<Link
download={fileName}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CodeQL has raised an interesting point here. I'm looking into this more to see if it can actually be exploited.

@BCerki
Copy link
Contributor Author

BCerki commented Mar 12, 2025

document_service_v2.py

I think this is on old version of the branch--the attachment filename is displaying as the id in the screenshot, and I fixed that in a later commit

@BCerki BCerki force-pushed the 2123-remove-files-have-same-hash branch from 9c912b2 to 8baa162 Compare March 13, 2025 17:40
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants