Palisade is no longer under active development.
Windows is not an explicitly supported environment, although where possible Palisade has been made compatible.
For Windows developer environments, we recommend setting up WSL.
For an overview of Palisade, start with the Palisade introduction and the accompanying QuickStart Guide and Developer Guide, which can be found in the Palisade README.
The Palisade-readers repository provides the implementations needed for Palisade to integrate with existing products and technologies.
A good starting point for understanding these modules is the Data Service.
A single request to the Data Service might look like `GET /read/chunked resourceId=hdfs:/some/protected/employee_file0.avro token=some-uuid-token`, which can be broken down into a number of required capabilities:
- Reading data from an `HDFS` cluster
- Deserialising an `Avro` data-stream
- Understanding what an `Employee` datatype looks like and how the rules on the `/protected` directory will apply to the fields (a sketch of such a class follows this list)
- How to return data for the `/read/chunked` API endpoint
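The `Employee` domain class itself is not defined in this repository; a minimal sketch of what such a POJO might look like (the field names here are purely illustrative assumptions) is:

```java
// Hypothetical sketch of an Employee domain class; the real schema is a
// property of the dataset, not of Palisade itself. Field names are assumptions.
public class Employee {
    private String name;
    private String department;
    private String salaryBand;

    // A public no-args constructor and getters/setters allow the class to be
    // (de)serialised reflectively from the Avro data-stream.
    public Employee() {
    }

    public String getName() { return name; }
    public void setName(final String name) { this.name = name; }

    public String getDepartment() { return department; }
    public void setDepartment(final String department) { this.department = department; }

    public String getSalaryBand() { return salaryBand; }
    public void setSalaryBand(final String salaryBand) { this.salaryBand = salaryBand; }
}
```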
The Palisade-readers repository therefore implements many of these functions, abstracted away from the inner workings of the core Palisade-services:
- In this case, the Data Service's default API is for the `/read/chunked` endpoint and is therefore already implemented in the `ReadChunkedDataService`, but we could imagine other protocols.
- To read from an `HDFS` filesystem, we need the Resource Service to discover the available resources (like doing an `ls` on a directory), as well as the Data Service to read the raw bytes of that resource. We implement the Hadoop Resource Service and Hadoop Data Reader to enable this functionality.
- To work with the raw bytes returned from the Data Reader, we need to deserialise them into Java objects. We implement the Avro Serialiser which, given a domain class, will serialise and deserialise between Java objects of that class and plain bytes (see the sketch after this list).
- The domain class for the aforementioned serialiser is in this case `Employee`, which is implemented elsewhere; it is equivalent to a schema definition and is generally a property of the specific dataset, not the Palisade deployment in general. All that is important is that this POJO exists somewhere on the classpath.
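As an illustration of that serialisation step, a minimal sketch using Apache Avro's reflect API to round-trip the `Employee` POJO sketched above to and from plain bytes (this is not the Palisade Avro Serialiser itself, just the underlying idea) might look like:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;

import org.apache.avro.Schema;
import org.apache.avro.file.DataFileStream;
import org.apache.avro.file.DataFileWriter;
import org.apache.avro.reflect.ReflectData;
import org.apache.avro.reflect.ReflectDatumReader;
import org.apache.avro.reflect.ReflectDatumWriter;

public class AvroRoundTripSketch {

    public static void main(final String[] args) throws IOException {
        // Derive an Avro schema from the domain class via reflection
        Schema schema = ReflectData.get().getSchema(Employee.class);

        Employee original = new Employee();
        original.setName("Alice");

        // Serialise the POJO to plain bytes in Avro container-file format
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataFileWriter<Employee> writer = new DataFileWriter<>(new ReflectDatumWriter<>(schema))) {
            writer.create(schema, bytes);
            writer.append(original);
        }

        // Deserialise the bytes back into Java objects of the same class
        try (DataFileStream<Employee> reader = new DataFileStream<>(
                new ByteArrayInputStream(bytes.toByteArray()), new ReflectDatumReader<>(schema))) {
            for (Employee employee : reader) {
                System.out.println(employee.getName());
            }
        }
    }
}
```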
The decoupling of these technology-specific implementations allows Palisade to be flexible enough to be easily integrated into existing tech stacks and datasets.
The above deployment could as easily have been using the S3 Resource Service and S3 Data Reader to serve a request for `GET /read/chunked resourceId=s3:/some/protected/employee_file0.avro token=some-uuid-token`.
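As a sketch of that decoupling (the interface and class names here are simplified assumptions, not Palisade's actual service API), swapping storage backends amounts to wiring in a different implementation of the same interface:

```java
import java.io.InputStream;
import java.net.URI;

// Simplified, hypothetical abstraction: the real Resource Service and Data Reader
// interfaces are richer than this, but the decoupling idea is the same.
interface RawDataReader {
    InputStream read(URI resourceId) throws Exception;
}

class HadoopDataReaderSketch implements RawDataReader {
    @Override
    public InputStream read(final URI resourceId) throws Exception {
        // Would delegate to the Hadoop FileSystem API for hdfs:/ resources
        throw new UnsupportedOperationException("sketch only");
    }
}

class S3DataReaderSketch implements RawDataReader {
    @Override
    public InputStream read(final URI resourceId) throws Exception {
        // Would delegate to the AWS S3 SDK for s3:/ resources
        throw new UnsupportedOperationException("sketch only");
    }
}

class DataReaderFactory {
    // The rest of the deployment depends only on RawDataReader, so serving
    // hdfs:/... or s3:/... resources is a matter of which implementation is wired in.
    static RawDataReader forScheme(final String scheme) {
        return "s3".equals(scheme) ? new S3DataReaderSketch() : new HadoopDataReaderSketch();
    }
}
```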
For information on the different implementations, see the following modules:
- Apache Avro Format
- Apache Hadoop Distributed File System
- Amazon S3 Object Storage