carrot-kpi · luzzif · Jan 24, 2024 · Jan 22, 2024 · Jan 22, 2024 · Jan 24, 2024
diff --git a/.env.example b/.env.example
@@ -4,7 +4,6 @@ JWT_SECRET="foo"
 DB_CONNECTION_STRING="postgresql://user:[email protected]:5432/data-uploader"
 W3UP_PRINCIPAL_KEY="foo"
 W3UP_DELEGATION_PROOF="foo"
-S3_ENDPOINT="http://foo.bar"
 S3_BUCKET="foo"
 S3_ACCESS_KEY_ID="foo"
 S3_SECRET_ACCESS_KEY="foo"
diff --git a/Dockerfile b/Dockerfile
@@ -46,5 +46,14 @@ ENV W3UP_PRINCIPAL_KEY=$W3UP_PRINCIPAL_KEY
 ARG W3UP_DELEGATION_PROOF
 ENV W3UP_DELEGATION_PROOF=$W3UP_DELEGATION_PROOF
 
+ARG S3_BUCKET
+ENV S3_BUCKET=$S3_BUCKET
+
+ARG S3_ACCESS_KEY_ID
+ENV S3_ACCESS_KEY_ID=$S3_ACCESS_KEY_ID
+
+ARG S3_SECRET_ACCESS_KEY
+ENV S3_SECRET_ACCESS_KEY=$S3_SECRET_ACCESS_KEY
+
 EXPOSE $PORT
 ENTRYPOINT ["node", "index.mjs"]
diff --git a/README.md b/README.md
@@ -19,8 +19,132 @@
 
 # Carrot data uploader
 
-This project implements a simple server that acts as a proxy to various storage
-services. Its API can be accessed with a valid JWT.
+This service is responsible for managing data in the Carrot protocol, which
+primarily falls into two categories at the time of writing:
+
+1. **Templates:** Represent Carrot templates, including Webpack federated React
+   components, CSS, and a `base.json` metadata file.
+2. **Generic specifications:** Comprise JSON files providing information about
+   various entities, such as KPI token campaign specifications and DefiLlama
+   oracle specifications.
+
+### Data States
+
+Data in Carrot exists in two main states:
+
+- **Limbo:** data in limbo doesn't yet need to be persisted but is a potential
+  candidate for persistence. It includes items like Carrot templates with active
+  deployment proposals and specifications for entities that have yet to be
+  created.
+- **Persistent:** data in the persistent state is data that is referenced by
+  on-chain entities within the Carrot protocol. This data needs to be reliably
+  available at all times and for an extremely long period of time.
+
+### On-Chain Data Reference
+
+The on-chain reference mechanism mentioned above, which determines whether data
+should be moved out of limbo and persisted, is based on CIDs following the
+`multiformats` CIDv1 specification. A given CID is considered referenced
+on-chain when a Carrot protocol entity stores it in the blockchain's state. At
+that point the data referenced by that CID needs to be persisted.
+
+### Storage Locations
+
+Data in Carrot is mainly stored in two locations:
+
+- **AWS S3 Bucket:** this is a solution for hot/warm storage of both limbo and
+  persistent data, served through a CloudFront distributed CDN for quick access.
+  The S3 bucket contains all non-expired limbo data (both raw data and IPFS CAR
+  data) plus all persisted data, and is indexed using CIDs for the data itself.
+
+- **IPFS/Filecoin:** here we exclusively store persistent data that needs to be
+  extremely long lived and available in a decentralized way. Web3.storage is
+  utilized for IPFS data uploads and Filecoin persistence operations.
+
+### API endpoints
+
+1. **`/data/s3/json`:** this endpoint can be used to store JSON limbo data. The
+   API takes the raw input JSON, encodes it into the IPFS CAR format and
+   determines the raw data CID. Both the raw content and the CAR file are
+   uploaded to the S3 bucket using the CID as the base key (the raw content uses
+   the CID itself as the key, while the CAR is uploaded under `$CID/car`).
+
+2. **`/data/ipfs`:** this endpoint persists limbo data and replicates it to
+   IPFS/Filecoin. The API accepts a single parameter `cid` which must refer to
+   some limbo data that the caller wants to persist to IPFS/Filecoin. The API
+   fetches the CAR associated with the passed CID (stored on the S3 bucket under
+   `$CID/car`) and stores the fetched CAR file on IPFS/Filecoin through
+   web3.storage's w3up service. The resulting upload CID is checked for
+   consistency and if everything is fine the raw data is also persisted on the
+   S3 bucket while the CAR is deleted from there.
+
+### Benefits of this approach
+
+This centralized approach where only this service manages Carrot data has a few
+extremely important benefits.
+
+#### Deterministic CIDs
+
+IPFS can store data in different formats, and depending on the picked format,
+the same starting data can result in different multihashes once uploaded to the
+network, which in the end results in different CIDs. This is a problem for
+Carrot because the on-chain CID references are immutable and we need some way to
+guarantee that the on-chain CIDs reference some real data that is in fact stored
+on IPFS.
+
+Consider the following example:
+
+1. A template author wants to add a template to Carrot to unlock some specific
+   functionality. He builds the template and ends up with the final template's
+   code, which he uploads to IPFS using a pinning service such as Pinata.
+2. The output of step 1 is the CID of the template's code, which can be used to
+   create a proposal to add the template to Carrot on-chain. The proposal is
+   created.
+3. After some time, the proposal is approved and the template is added to Carrot
+   on-chain. This results in the template code's CID being referenced on-chain,
+   which should make the data persistent in Carrot, as explained above.
+4. The IPFS pinner daemon picks up this added reference and makes the template
+   code persistent on IPFS. In order to do that it downloads the template's code
+   from IPFS and uploads it to web3.storage through a dedicated library. This
+   library follows a different data encoding procedure, resulting in a different
+   multihash and CID at the end of the process. **So at this point the same
+   starting data has been added to IPFS in different ways, resulting in a CID
+   mismatch.**
+5. After some time the author unpins the template's code from Pinata.
+
+The end result? The template's code has been put in limbo and then persisted to
+IPFS in 2 different ways, resulting in 2 different CIDs, and now the limbo data
+is gone. We end up with a dangling CID: **the on-chain reference to the
+template's code points to data that doesn't exist anywhere**.
+
+The best solution to avoid this scenario is to handle both limbo data addition
+and persistent data addition in the same place, and this place is the
+`data-uploader` service. Adding data to limbo will cause the `data-uploader`
+service to calculate this data's CID by creating an IPFS CAR containing the
+data, and returning this CID to the caller. **It's then the responsibility of
+the caller to use that CID to reference the limbo data**. As long as the caller
+does that, we have an extremely strong guarantee that when the data is persisted
+it will be persisted with the same original CID. This is because the persistence
+process is performed by storing the CAR file on IPFS/Filecoin, the same CAR file
+that was originally used to determine the data's CID.
+
+#### Performance and decentralization
+
+Through the double S3/IPFS storing mechanism we can guarantee the best
+properties of both worlds. If a Carrot user doesn't need strong decentralization
+guarantees, he can access all Carrot data from the S3 bucket directly
+through a distributed CloudFront CDN, as the bucket always contains all limbo
+data + persisted data. The addition of the CDN also boosts data delivery
+performance, resulting in a snappier and overall better experience.
+
+For users that want the maximum amount of decentralization and trustlessness
+it's also possible to access Carrot data directly from IPFS, as IPFS will
+have all of Carrot's persistent data at all times. In most cases this won't have
+the same performance as a distributed CloudFront CDN though.
+
+This setup is especially powerful (in both decentralization and trustlessness)
+if coupled with a frontend that allows using a locally hosted IPFS node to
+access the data.
 
 ## Tech used
 

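Note on the "Deterministic CIDs" section of the README diff above: the CID mismatch it describes can be illustrated with a minimal sketch that builds CIDv1 strings by hand using only Node's standard library. This is not the service's code, which per this diff depends on the `multiformats` package and derives CIDs from CAR files. The helper names (`varint`, `base32`, `cidV1`) are hypothetical. It also assumes, for simplicity, that the only difference between the two uploads is the codec tag. In practice the divergence usually comes from how the data is chunked and wrapped, but the effect is the same.

```typescript
import { createHash } from "node:crypto";

// Multicodec codes taken from the public multiformats table.
const RAW = 0x55;
const JSON_CODEC = 0x0200;
const SHA2_256 = 0x12;

// Unsigned varint (LEB128), used for every numeric field of a CID.
function varint(value: number): number[] {
    const out: number[] = [];
    while (value >= 0x80) {
        out.push((value & 0x7f) | 0x80);
        value >>>= 7;
    }
    out.push(value);
    return out;
}

// RFC 4648 base32, lowercase and unpadded (multibase prefix "b").
function base32(bytes: Uint8Array): string {
    const alphabet = "abcdefghijklmnopqrstuvwxyz234567";
    let buffer = 0;
    let bits = 0;
    let out = "";
    for (let i = 0; i < bytes.length; i++) {
        buffer = (buffer << 8) | bytes[i];
        bits += 8;
        while (bits >= 5) {
            bits -= 5;
            out += alphabet[(buffer >>> bits) & 31];
        }
        // Keep only the bits not yet emitted.
        buffer &= (1 << bits) - 1;
    }
    if (bits > 0) out += alphabet[(buffer << (5 - bits)) & 31];
    return out;
}

// CIDv1 = version, codec, then the multihash (hash code, length, digest).
function cidV1(codec: number, data: Uint8Array): string {
    const digest = createHash("sha256").update(data).digest();
    const header = varint(1).concat(
        varint(codec),
        varint(SHA2_256),
        varint(digest.length),
    );
    return "b" + base32(Buffer.concat([Buffer.from(header), digest]));
}

const data = Buffer.from("hello world", "utf8");
const asRaw = cidV1(RAW, data);
const asJson = cidV1(JSON_CODEC, data);

console.log(asRaw);
console.log(asJson);
```

The two printed CIDs differ even though the input bytes are identical, while calling `cidV1` twice with the same codec and bytes always yields the same string. That is the property the README relies on when it derives the CID once, from the CAR, and reuses that same CAR at persistence time.
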
diff --git a/package.json b/package.json
@@ -18,6 +18,7 @@
     "devDependencies": {
         "@commitlint/cli": "^18.4.4",
         "@commitlint/config-conventional": "^18.4.4",
+        "@smithy/types": "^2.9.1",
         "@types/jsonwebtoken": "^9.0.5",
         "@types/pg": "^8.10.9",
         "dotenv": "^16.3.1",
@@ -48,6 +49,7 @@
         "hapi-swagger": "^17.2.0",
         "joi": "^17.11.1",
         "jsonwebtoken": "^9.0.2",
+        "multiformats": "^13.0.1",
         "pg": "^8.11.3",
         "viem": "^2.2.0"
     }