Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: add "GET /files" controller #21

Open
wants to merge 4 commits into
base: main
Choose a base branch
from
Open

feat: add "GET /files" controller #21

wants to merge 4 commits into from

Conversation

psankhe28
Copy link
Collaborator

@psankhe28 psankhe28 commented Sep 11, 2024

Description

Checklist

  • My code follows the contributing guidelines
    of this project, including, in particular, with regard to any style guidelines
  • The title of my PR complies with the
    Conventional Commits specification; in particular, it clearly
    indicates that a change is a breaking change
  • I acknowledge that all my commits will be squashed into a single commit,
    using the PR title as the commit message
  • I have performed a self-review of my own code
  • I have commented my code in hard-to-understand areas
  • I have updated the user-facing documentation to describe any new or
    changed behavior
  • I have added type annotations for all function/class/method interfaces
    or updated existing ones (only for Python, TypeScript, etc.)
  • I have provided appropriate documentation
    (Google-style Python docstrings) for all
    packages/modules/functions/classes/methods or updated existing ones
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature
    works
  • New and existing unit tests pass locally with my changes
  • I have not reduced the existing code coverage

Comments

Summary by Sourcery

Add a new endpoint to list all files in the MinIO bucket and include an integration test to ensure its functionality.

New Features:

  • Introduce a new endpoint '/list_files' to retrieve a list of all files in the MinIO bucket.

Tests:

  • Add an integration test 'test_get_files' to verify the functionality of the new '/list_files' endpoint.

Signed-off-by: Pratiksha Sankhe <[email protected]>
Copy link
Contributor

sourcery-ai bot commented Sep 11, 2024

Reviewer's Guide by Sourcery

This pull request implements a new endpoint to retrieve a list of all files stored in the MinIO bucket. The changes include adding a new API route, implementing the corresponding controller function, and creating a test for the new endpoint.

Sequence diagram for the '/list_files' endpoint

sequenceDiagram
    actor User
    participant API as Cloud Storage API
    participant Controller as Controller
    participant MinIO as MinIO Storage

    User->>API: GET /list_files
    API->>Controller: Handle request
    Controller->>MinIO: list_objects(bucket_name)
    MinIO-->>Controller: Return list of files
    Controller-->>API: Return JSON with file list
    API-->>User: 200 OK, JSON with file list
Loading

Class diagram for the updated controller

classDiagram
    class Controller {
        +home()
        +list_files()
    }

    class MinIOClient {
        +list_objects(bucket_name)
    }

    Controller --> MinIOClient: uses
    note for list_files "Handles GET requests to '/list_files' endpoint"
Loading

File-Level Changes

Change Details Files
Implement new API endpoint for listing files
  • Add '/list_files' GET route to the API specification
  • Define response schema for the new endpoint
  • Implement 'list_files' controller function to handle requests
  • Use MinIO client to fetch file names from the bucket
  • Return file list as JSON response
cloud_storage_handler/api/specs/specs.yaml
cloud_storage_handler/api/elixircloud/csh/controllers.py
Add integration test for the new list_files endpoint
  • Create 'test_get_files' function to test the new endpoint
  • Mock the GET request to the '/list_files' endpoint
  • Assert the response status code is 200 OK
tests/test_integration/test_operations.py

Assessment against linked issues

Issue Objective Addressed Explanation
#6 Implement GET /list_files operation
#1 Add endpoint for file upload The PR does not implement a file upload endpoint. It only adds an endpoint to list files.
#1 Add endpoint for listing files
#1 Add endpoint for file deletion The PR does not implement a file deletion endpoint. It only adds an endpoint to list files.
#2 Add a README.md file with comprehensive project documentation The PR does not include any changes related to adding a README.md file. Instead, it focuses on implementing a new API endpoint to list files.

Tips and commands

Interacting with Sourcery

  • Trigger a new review: Comment @sourcery-ai review on the pull request.
  • Continue discussions: Reply directly to Sourcery's review comments.
  • Generate a GitHub issue from a review comment: Ask Sourcery to create an
    issue from a review comment by replying to it.
  • Generate a pull request title: Write @sourcery-ai anywhere in the pull
    request title to generate a title at any time.
  • Generate a pull request summary: Write @sourcery-ai summary anywhere in
    the pull request body to generate a PR summary at any time. You can also use
    this command to specify where the summary should be inserted.

Customizing Your Experience

Access your dashboard to:

  • Enable or disable review features such as the Sourcery-generated pull request
    summary, the reviewer's guide, and others.
  • Change the review language.
  • Add, remove or edit custom review instructions.
  • Adjust other review settings.

Getting Help

@psankhe28 psankhe28 marked this pull request as draft September 11, 2024 17:45
Copy link
Contributor

@sourcery-ai sourcery-ai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @psankhe28 - I've reviewed your changes - here's some feedback:

Overall Comments:

  • Consider implementing pagination for the file list to handle large numbers of files more efficiently.
  • The error handling could be more granular. Instead of catching all S3Errors and returning a 500 status code, consider handling different types of S3Errors separately and returning appropriate status codes.
Here's what I looked at during the review
  • 🟡 General issues: 2 issues found
  • 🟢 Security: all looks good
  • 🟢 Testing: all looks good
  • 🟢 Complexity: all looks good
  • 🟢 Documentation: all looks good

Sourcery is free for open source - if you like our reviews please consider sharing them ✨
Help me be more useful! Please click 👍 or 👎 on each comment to tell me if it was helpful.

Copy link

codecov bot commented Sep 11, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 100.00%. Comparing base (0b08ba9) to head (097a550).

Additional details and impacted files
@@            Coverage Diff            @@
##              main       #21   +/-   ##
=========================================
  Coverage   100.00%   100.00%           
=========================================
  Files            4         4           
  Lines           30        45   +15     
=========================================
+ Hits            30        45   +15     
Flag Coverage Δ
test_integration 100.00% <100.00%> (ø)
test_unit 100.00% <ø> (ø)

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@psankhe28 psankhe28 marked this pull request as ready for review October 14, 2024 17:51
Copy link
Contributor

@sourcery-ai sourcery-ai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @psankhe28 - I've reviewed your changes - here's some feedback:

Overall Comments:

  • Consider improving error handling in the list_files function. It currently only catches S3Error, but there might be other exceptions that could occur.
  • The test for the new endpoint (test_get_files) only checks the status code. Consider mocking the MinIO client and testing the actual file list returned for more thorough coverage.
Here's what I looked at during the review
  • 🟢 General issues: all looks good
  • 🟢 Security: all looks good
  • 🟡 Testing: 2 issues found
  • 🟢 Complexity: all looks good
  • 🟢 Documentation: all looks good

Sourcery is free for open source - if you like our reviews please consider sharing them ✨
Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.

tests/test_integration/test_operations.py Outdated Show resolved Hide resolved
tests/test_integration/test_operations.py Outdated Show resolved Hide resolved
tests/test_integration/test_operations.py Show resolved Hide resolved
Signed-off-by: Pratiksha Sankhe <[email protected]>
@uniqueg uniqueg changed the title feat: endpoint to get all files feat: add "GET /files" controller Oct 15, 2024
Copy link
Member

@uniqueg uniqueg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @psankhe28. It's not far off, but the PR shows that you could improve your knowledge of FOCA, REST, OpenAPI etc. Please have a look at some online resources and a few other FOCA-based services and understand what's happening and why (my comments should give you pointers). I think it's worth it, for future PRs and a better understanding of backend programming :)

schema:
type: array
items:
type: string
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You want to return more than just a string, rather a list of objects following a model that you define. At the very least, when uploading, you should create a resource ID, maybe a UUID4 or something similar. Please look at other services and see how they do that.

Actually, it would have made more sense to start with implementing POST /files, as that would have clarified that point better. It is expected by REST semantics that a POST request creates a resource, and that resources has a certain ID, and that ID should be guaranteed to be unique (which a filename is not guaranteed to be).

You could also consider including other basic info, like file size, some sort of hash like MD5 (could be configurable; a hash could then be used to avoid duplicate files to be registered, if we want that; we can discuss that), maybe the MIME type of the file if it's available (or whatever info you get from the file magic string, if present), possibly a description if that is provided during upload (if there is a field for that) - or whatever else DRS needs to create a POST /objects.

But given that you started with GET /files, let's keep the model simple for now and only return a unique identifier and a filename. And then discuss about additional properties when you implement POST /files.

cloud_storage_handler/api/specs/specs.yaml Show resolved Hide resolved
Comment on lines +29 to +30
except S3Error as err:
return jsonify({"error": str(err)}), 500
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Best to raise an application-specific, custom error. Describe it in cloud_storage_handler.exceptions and make sure to also include it in the exceptions dictionary of that module. Have a look at HTTP status codes to see if you find a more appropriate status code than 500. If so, use that, otherwise keep 500. In any case, use the dictionary to map it to 500.

Comment on lines +22 to +27
minio_config = current_app.config.foca.custom.minio
bucket_name = minio_config.bucket_name
minio_client = current_app.config.foca.custom.minio.client.client
objects = minio_client.list_objects(bucket_name)
files = [obj.object_name for obj in objects]
return jsonify({"files": files}), 200
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider implementing a specific class for this controller, in a dedicated module. There will be more code added at a later point, and we don't want these functions to get too complex.

minio_config = current_app.config.foca.custom.minio
bucket_name = minio_config.bucket_name
minio_client = current_app.config.foca.custom.minio.client.client
objects = minio_client.list_objects(bucket_name)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is actually the only line that could raise an S3Error if I'm not mistaken. All the other lines could also raise errors, but not S3Errors.

Please have a look at other FOCA implementations and understand when and when not to define errors. You only want to treat errors that you reasonably expect (and that you want to actually handle in some way). But make sure that you make use of FOCA's built-in error handling. If your error handling doesn't add anything to the existing error handling (which maps all known/defined errors to the error message and code defined in the exceptions dictionary in cloud_storage_handler.exceptions and raises a default 500 errors for any errors it doesn't know about), then leave it.

In this case, you can probably assume that the config is valid, because it was actually validated at some point. If it were not valid (and attributes would be missing), that would really be an unexpected error pointing to some deeper problem that the user shouldn't know about. In such cases, no specific error handling is needed, it is truly an InternalServerError. Admins would then need to investigate, and they would look at the logs and see the actual error trace. That's exactly what we want to happen - so no need to or additional value from handling such errors specifically.

You need to think hard in each case whether there is something that we want to let the user/client to know. Most importantly, will there be a point in retrying or not? An S3 error could be because the S3 service has a temporary glitch, so it might actually be worth trying again. But a BadRequest would always be expected to fail, so clients should not retry.

Here, I do think that you actually handle errors quite right: S3Error is also the only error I'd catch, but you should only put the try / catch around that one line that can actually raise such an error. And you want to define a custom error and use FOCA to map that error to a 500 or more appropriate error response with an informative error message.

Everything else you can rightly ignore, at least for now.

Make sure to learn about FOCA, read on the different HTTP status codes, REST philosophy and error handling, especially in web services.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Implement "GET /list_files" operation
2 participants