Proposal: Object Storage #410

Open
Minivera opened this issue Oct 7, 2022 · 6 comments

@Minivera
Contributor

Minivera commented Oct 7, 2022

Why

Object storage is on the roadmap and I think it would be a great addition to the framework as it would enable a lot of use cases, like hosting static assets for dynamically generated web applications (backend for frontend or micro-frontends) or file storage for users.

Web hosting could be solved by embedding the static assets inside the binary (with //go:embed, maybe using something like #311), but user object storage is a lot harder to do at the moment. For example, consider an endpoint that takes in some form data:

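package files

import (
    "net/http"
)

// UploadFile uploads a file to an S3 bucket.
//encore:api public raw
func UploadFile(w http.ResponseWriter, req *http.Request) {
    // Parse the multipart form, keeping up to 32 MiB in memory.
    if err := req.ParseMultipartForm(32 << 20); err != nil {
        http.Error(w, err.Error(), http.StatusBadRequest)
        return
    }
    file, _, err := req.FormFile("uploadfile")
    if err != nil {
        http.Error(w, err.Error(), http.StatusBadRequest)
        return
    }
    defer file.Close()

    // Upload to an S3 bucket
}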

The code to upload to an S3 bucket can be pretty unwieldy, and requires a lot of configuration and environment variables. Not to mention that all the infrastructure for that S3 bucket has to be handled outside of the Encore deployment process.

A Download endpoint would be similar and comes with the same limitations.

Proposal

Similar to how PubSub is handled, I think it would be great to be able to create a bucket directly in code and use it to download or upload files as bytes. For example:

package files

import (
    "net/http"

    "encore.dev/storage"
)

// Creating the same bucket twice could either panic or reuse the existing bucket.
var UserFiles = storage.NewBucket("user_files", options)

// UploadFile uploads a file to an S3 bucket.
//encore:api public raw
func UploadFile(w http.ResponseWriter, req *http.Request) {
    if err := req.ParseMultipartForm(32 << 20); err != nil {
        http.Error(w, err.Error(), http.StatusBadRequest)
        return
    }
    file, _, err := req.FormFile("uploadfile")
    if err != nil {
        http.Error(w, err.Error(), http.StatusBadRequest)
        return
    }
    defer file.Close()

    UserFiles.Upload("some_name", file)
}

This logic would create a new bucket on the service provider, then upload and download objects as file objects, byte slices, or strings. Locally, persistent file storage could be emulated with a MinIO container on the cluster, like the Postgres container.
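To make the shape of that API a little more concrete, here is a minimal sketch of a bucket that accepts both streams and byte slices. Everything in it (MemBucket, NewMemBucket, Upload, UploadBytes, Download, DownloadBytes) is a hypothetical name chosen for illustration, not an existing Encore API, and the in-memory map only stands in for whatever a local emulator or cloud provider would do.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
	"strings"
	"sync"
)

// MemBucket is a hypothetical in-memory bucket, standing in for a local emulator.
type MemBucket struct {
	mu      sync.RWMutex
	objects map[string][]byte
}

// NewMemBucket returns an empty bucket.
func NewMemBucket() *MemBucket {
	return &MemBucket{objects: make(map[string][]byte)}
}

// UploadBytes stores a copy of data under name, replacing any existing object.
func (b *MemBucket) UploadBytes(name string, data []byte) error {
	b.mu.Lock()
	defer b.mu.Unlock()
	b.objects[name] = append([]byte(nil), data...)
	return nil
}

// Upload reads r to the end and stores the result under name.
func (b *MemBucket) Upload(name string, r io.Reader) error {
	data, err := io.ReadAll(r)
	if err != nil {
		return fmt.Errorf("read %q: %w", name, err)
	}
	return b.UploadBytes(name, data)
}

// DownloadBytes returns a copy of the object stored under name.
func (b *MemBucket) DownloadBytes(name string) ([]byte, error) {
	b.mu.RLock()
	defer b.mu.RUnlock()
	data, ok := b.objects[name]
	if !ok {
		return nil, fmt.Errorf("object %q not found", name)
	}
	return append([]byte(nil), data...), nil
}

// Download returns the object as a stream, similar to a file handle.
func (b *MemBucket) Download(name string) (io.ReadCloser, error) {
	data, err := b.DownloadBytes(name)
	if err != nil {
		return nil, err
	}
	return io.NopCloser(bytes.NewReader(data)), nil
}

func main() {
	bucket := NewMemBucket()
	if err := bucket.Upload("some_name", strings.NewReader("hello")); err != nil {
		panic(err)
	}
	data, err := bucket.DownloadBytes("some_name")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(data))
}
```

A real implementation would likely also take a context.Context on every call; it is left out here to keep the sketch short.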

@sourcec0de

This module provides drivers for fs, azure, aws, and gcp.
Could be a good starting point for encore.Storage
https://gocloud.dev/

@eandre
Member

eandre commented Oct 20, 2022

Thanks @Minivera for writing this up! Totally agree with the direction and the API sketch mirrors my thinking.

The main thing we need to figure out is the API surface area to include in the v1. There are lots of potential things to include, from pre-signed URLs and batch uploads to Cloud Functions in response to bucket/object events.

@sourcec0de that's a very good idea. I've looked at go-cloud in the past and think the API (at least for object storage) is quite good, so that could be a good starting point, with a good overview of what functionality is portable across clouds and what isn't.

@Minivera
Contributor Author

That's an excellent question, I kept it open to start the discussion 😄 I do not know all that's possible with Object Storage, just prefacing in case these suggestions don't cover all the use cases you have in mind. In my opinion, I think a good v1 should cover a few things:

  1. Upload files with or without metadata
  2. Download files, list files, everything I'd expect to manage my files.
  3. Download/update file metadata.
  4. Dependency inject one or more files/file metadata into a service, which could be pretty useful.
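
As a rough illustration of points 1 to 3, the sketch below shows one possible shape for that surface. All names (Bucket, NewBucket, Upload, Download, List, Metadata, SetMetadata, ErrNotFound) are assumptions made up for this example, the storage is a plain in-memory map, and dependency injection (point 4) is not covered.

```go
package main

import (
	"errors"
	"fmt"
	"sort"
	"strings"
)

// ErrNotFound is returned when an object does not exist.
var ErrNotFound = errors.New("object not found")

type object struct {
	data []byte
	meta map[string]string
}

// Bucket is a hypothetical in-memory bucket used to illustrate the v1 surface.
type Bucket struct {
	objects map[string]object
}

// NewBucket returns an empty bucket.
func NewBucket() *Bucket { return &Bucket{objects: make(map[string]object)} }

// Upload stores data under name; meta may be nil to upload without metadata.
func (b *Bucket) Upload(name string, data []byte, meta map[string]string) {
	b.objects[name] = object{data: data, meta: meta}
}

// Download returns the object's contents.
func (b *Bucket) Download(name string) ([]byte, error) {
	obj, ok := b.objects[name]
	if !ok {
		return nil, ErrNotFound
	}
	return obj.data, nil
}

// List returns the names of all objects that start with prefix, sorted.
func (b *Bucket) List(prefix string) []string {
	var names []string
	for name := range b.objects {
		if strings.HasPrefix(name, prefix) {
			names = append(names, name)
		}
	}
	sort.Strings(names)
	return names
}

// Metadata returns the metadata stored with the object.
func (b *Bucket) Metadata(name string) (map[string]string, error) {
	obj, ok := b.objects[name]
	if !ok {
		return nil, ErrNotFound
	}
	return obj.meta, nil
}

// SetMetadata replaces the object's metadata without touching its contents.
func (b *Bucket) SetMetadata(name string, meta map[string]string) error {
	obj, ok := b.objects[name]
	if !ok {
		return ErrNotFound
	}
	obj.meta = meta
	b.objects[name] = obj
	return nil
}

func main() {
	b := NewBucket()
	b.Upload("configs/a.txt", []byte("v1"), map[string]string{"userID": "test"})
	b.Upload("avatars/a.png", []byte{0x89}, nil)
	fmt.Println(b.List("configs/"))
}
```

Keeping metadata updates separate from uploads means a caller can change metadata without re-sending the object, which is how most providers expose it as well.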

Cloud storage triggers are definitely great, but I think for a v1 it might be better to keep them separate? Maybe aim for triggers to be their own generic feature so we can also get things like logging triggers or just general cloud events triggers.

On type safety, I think it would be awesome to have types on metadata and be able to assign those types to paths/buckets or something similar. For example, if I save files in the bucket users under the user_config path (with each object under that path being a unique text file), I'd love to be able to define what I want to save as custom metadata for those files with type safety. I'd want to know that if I save files in that bucket and path, Encore makes sure the files will always have that metadata through Go's type system.

For example:

package files

import (
    "net/http"

    "encore.dev/storage"
)

type Metadata struct {
    userID string
}

type ConfigsMetadata struct {
    Metadata
    version string
}

// "configs" is the path and "*" means all paths. It doesn't have to look like this.
var UserFiles = storage.NewBucket("user_files",
    storage.WithMetadataType("*", Metadata{}),
    storage.WithMetadataType("configs", ConfigsMetadata{}),
)

// UploadFile uploads a file to an S3 bucket.
//encore:api public raw
func UploadFile(w http.ResponseWriter, req *http.Request) {
    // ...

    // Promoted fields cannot be set directly in a struct literal,
    // so the embedded Metadata is spelled out.
    UserFiles.Upload("configs/some_name.txt", file, ConfigsMetadata{
        Metadata: Metadata{userID: "test"},
        version:  "1",
    }) // Error if wrong metadata type
}

// DownloadFile downloads a file from an S3 bucket.
//encore:api public raw
func DownloadFile(w http.ResponseWriter, req *http.Request) {
    // ...

    file, metadata, err := UserFiles.Download("configs/some_name.txt") // I know metadata is of type ConfigsMetadata
}
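
One way the compile-time guarantee could be obtained with plain Go is generics (Go 1.18+). The sketch below is only an assumption about how it might look: Bucket[M], NewBucket, Upload, and Download are hypothetical names, and instead of mapping paths to metadata types inside one bucket it uses one bucket value per metadata type, which is a simplification of the idea above.

```go
package main

import "fmt"

// Metadata is the base metadata shared by all user files.
type Metadata struct {
	UserID string
}

// ConfigsMetadata extends Metadata for configuration files.
type ConfigsMetadata struct {
	Metadata
	Version string
}

type entry[M any] struct {
	data []byte
	meta M
}

// Bucket is a hypothetical bucket whose objects all carry metadata of type M.
type Bucket[M any] struct {
	name    string
	objects map[string]entry[M]
}

// NewBucket creates an empty in-memory bucket for metadata type M.
func NewBucket[M any](name string) *Bucket[M] {
	return &Bucket[M]{name: name, objects: make(map[string]entry[M])}
}

// Upload stores data with metadata; passing any type other than M fails to compile.
func (b *Bucket[M]) Upload(path string, data []byte, meta M) {
	b.objects[path] = entry[M]{data: data, meta: meta}
}

// Download returns the contents and the statically typed metadata.
func (b *Bucket[M]) Download(path string) ([]byte, M, error) {
	e, ok := b.objects[path]
	if !ok {
		var zero M
		return nil, zero, fmt.Errorf("%s: object %q not found", b.name, path)
	}
	return e.data, e.meta, nil
}

func main() {
	configs := NewBucket[ConfigsMetadata]("user_configs")
	configs.Upload("some_name.txt", []byte("..."), ConfigsMetadata{
		Metadata: Metadata{UserID: "test"},
		Version:  "1",
	})
	// configs.Upload("other.txt", nil, Metadata{}) // would not compile: wrong metadata type

	_, meta, err := configs.Download("some_name.txt")
	if err != nil {
		panic(err)
	}
	fmt.Println(meta.UserID, meta.Version) // meta is statically a ConfigsMetadata
}
```

The fields are exported here on the assumption that the metadata would eventually be serialized by code outside the package.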

I also think it would be great to have some kind of file content auto-processing. Imagine, for example, that I am saving Slate documents as JSON in GCS and I want to process them in Encore when saved/loaded. It would be great to be able to define specific types for buckets/paths (like with the metadata idea) that will automatically unmarshal the data loaded from GCS into a struct, slice, or whatever other type I define.

For example:

var UserFiles = storage.NewBucket("user_files")

untypedFile, err := UserFiles.Download("some_path.txt")

type Config struct {
    UserPreferences string
}

// Now I can make sure that the file is decoded into this struct type
var typedConfig Config
err = UserFiles.DownloadInto("some_path.txt", &typedConfig)

// Could also use any type
var typedInt int
err = UserFiles.DownloadInto("some_path.txt", &typedInt) // Would likely fail unmarshalling, which is good!

// And handling that file as just bytes
var rawFile []byte
err = UserFiles.DownloadInto("some_path.txt", &rawFile)
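
A minimal sketch of how DownloadInto could behave, assuming the stored objects are JSON and using the standard encoding/json package for decoding. Bucket, NewBucket, and Upload are hypothetical helpers backed by an in-memory map; only the decoding logic is the point here.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Config is the example payload type from the discussion.
type Config struct {
	UserPreferences string
}

// Bucket is a hypothetical in-memory bucket holding raw object bytes.
type Bucket struct {
	objects map[string][]byte
}

// NewBucket returns an empty bucket.
func NewBucket() *Bucket { return &Bucket{objects: make(map[string][]byte)} }

// Upload stores raw bytes under path.
func (b *Bucket) Upload(path string, data []byte) {
	b.objects[path] = data
}

// DownloadInto decodes the object at path into dst, which must be a pointer.
// A *[]byte receives the raw contents; any other pointer is decoded as JSON.
func (b *Bucket) DownloadInto(path string, dst any) error {
	data, ok := b.objects[path]
	if !ok {
		return fmt.Errorf("object %q not found", path)
	}
	if raw, ok := dst.(*[]byte); ok {
		*raw = append([]byte(nil), data...)
		return nil
	}
	if err := json.Unmarshal(data, dst); err != nil {
		return fmt.Errorf("decode %q: %w", path, err)
	}
	return nil
}

func main() {
	b := NewBucket()
	b.Upload("some_path.txt", []byte(`{"UserPreferences":"dark"}`))

	var cfg Config
	if err := b.DownloadInto("some_path.txt", &cfg); err != nil {
		panic(err)
	}
	fmt.Println(cfg.UserPreferences)

	var n int
	// A JSON object cannot be decoded into an int, so this reports an error at runtime.
	fmt.Println(b.DownloadInto("some_path.txt", &n) != nil)
}
```

As the example shows, a mismatched destination type is only caught when the call runs, not when the program is built.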

But I also think that type safety should be available at compile time, like with the metadata example. The above example is convenient, but nothing is validated at build time, so mismatches will only show up as errors at runtime, which isn't optimal. Maybe something like this:

type Config struct {
    UserPreferences string
}

var UserFiles = storage.NewBucket("user_files", storage.WithFileSchema("*", Config{})) // Now all files are typed as Config at build time.

// An alternative could be to use "sub buckets" to make typing and managing easier. The above could probably not trigger type errors without compiling.
var (
    UserFiles   = storage.NewBucket("user_files", []byte{})
    UserConfigs = storage.NewFolder("configs", Config{})
)
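
The "sub bucket" alternative could be sketched with generics so that the content type is checked by the Go compiler. This is an illustration under assumptions: Folder[T] and NewFolder are hypothetical, NewFolder takes the parent bucket explicitly (unlike the snippet above), objects are encoded as JSON, and storage is an in-memory map.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Config is the example payload type from the discussion.
type Config struct {
	UserPreferences string
}

// Bucket is a hypothetical in-memory bucket holding raw object bytes.
type Bucket struct {
	objects map[string][]byte
}

// NewBucket returns an empty bucket.
func NewBucket() *Bucket { return &Bucket{objects: make(map[string][]byte)} }

// Folder is a typed view over a key prefix of a bucket.
// Every object under the prefix is stored as the JSON encoding of a T.
type Folder[T any] struct {
	bucket *Bucket
	prefix string
}

// NewFolder returns a typed view of the objects under prefix in b.
func NewFolder[T any](b *Bucket, prefix string) *Folder[T] {
	return &Folder[T]{bucket: b, prefix: prefix}
}

// Upload encodes v and stores it under prefix/name.
func (f *Folder[T]) Upload(name string, v T) error {
	data, err := json.Marshal(v)
	if err != nil {
		return fmt.Errorf("encode %q: %w", name, err)
	}
	f.bucket.objects[f.prefix+"/"+name] = data
	return nil
}

// Download loads prefix/name and decodes it into a T.
func (f *Folder[T]) Download(name string) (T, error) {
	var v T
	data, ok := f.bucket.objects[f.prefix+"/"+name]
	if !ok {
		return v, fmt.Errorf("object %q not found", f.prefix+"/"+name)
	}
	if err := json.Unmarshal(data, &v); err != nil {
		return v, fmt.Errorf("decode %q: %w", name, err)
	}
	return v, nil
}

func main() {
	userFiles := NewBucket()
	userConfigs := NewFolder[Config](userFiles, "configs")

	if err := userConfigs.Upload("some_name.json", Config{UserPreferences: "dark"}); err != nil {
		panic(err)
	}
	// userConfigs.Upload("bad.json", 42) // would not compile: 42 is not a Config

	cfg, err := userConfigs.Download("some_name.json")
	if err != nil {
		panic(err)
	}
	fmt.Println(cfg.UserPreferences)
}
```

Objects written to the same prefix by other means could still fail to decode at runtime, so the compile-time guarantee only covers code that goes through the typed folder.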

@matthiasbruns

Any plans to support S3? I would really love to dig into Encore, but with so many missing features, I would again end up with a mixed IaC stack.

@hakimmazouz

Any updates on supporting pre-signed read URLs? I see it as pretty essential when building a SaaS where the bucket needs to be private, but the URLs the API returns in its payloads are authorized to read the resources, so the API does not have to download them and relay them to the user.

An alternative in the docs would be a simple example to circumvent the current bucket abstraction and just manually use S3 buckets.
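
For readers unfamiliar with the idea behind pre-signed read URLs, here is a generic sketch using an HMAC over the object path and an expiry time. It is not the S3 or GCS signing scheme and not an Encore API; SignedURL, Verify, the expires and sig query parameters, and the example host are all made up for illustration. It also assumes paths that need no URL escaping.

```go
package main

import (
	"crypto/hmac"
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"net/url"
	"strconv"
	"time"
)

// sign computes an HMAC-SHA256 over the object path and expiry timestamp.
func sign(secret []byte, path string, expires int64) string {
	mac := hmac.New(sha256.New, secret)
	fmt.Fprintf(mac, "%s\n%d", path, expires)
	return hex.EncodeToString(mac.Sum(nil))
}

// SignedURL returns a read URL for path that is valid until the Unix time expires.
func SignedURL(secret []byte, base, path string, expires int64) string {
	q := url.Values{}
	q.Set("expires", strconv.FormatInt(expires, 10))
	q.Set("sig", sign(secret, path, expires))
	return base + path + "?" + q.Encode()
}

// Verify reports whether rawURL carries a valid, unexpired signature at Unix time now.
func Verify(secret []byte, rawURL string, now int64) bool {
	u, err := url.Parse(rawURL)
	if err != nil {
		return false
	}
	expires, err := strconv.ParseInt(u.Query().Get("expires"), 10, 64)
	if err != nil || now > expires {
		return false
	}
	want := sign(secret, u.Path, expires)
	// hmac.Equal compares in constant time to avoid leaking the signature.
	return hmac.Equal([]byte(want), []byte(u.Query().Get("sig")))
}

func main() {
	secret := []byte("server-side-secret") // placeholder value
	expires := time.Now().Add(15 * time.Minute).Unix()

	link := SignedURL(secret, "https://files.example.com", "/configs/some_name.txt", expires)
	fmt.Println(link)
	fmt.Println(Verify(secret, link, time.Now().Unix()))
}
```

The bucket stays private because only the server knows the secret; the client just receives a link that stops working once the expiry passes.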

@erikcarlsson
Contributor

@hakimmazouz pre-signed upload and download URLs are in our short-term plans. Hoping to make good progress in the next couple of weeks.


6 participants