Scalable, redundant, and distributed object store for Apache Hadoop

devgateway/ozone-s3-spark-iceberg-demo

Apache Ozone

Ozone is a scalable, redundant, and distributed object store for Hadoop and Cloud-native environments. Apart from scaling to billions of objects of varying sizes, Ozone can function effectively in containerized environments such as Kubernetes and YARN.

  • MULTI-PROTOCOL SUPPORT: Ozone supports different protocols like S3 and Hadoop File System APIs.
  • SCALABLE: Ozone is designed to scale to tens of billions of files and blocks and, in the future, even more.
  • CONSISTENT: Ozone is a strongly consistent object store. This consistency is achieved by using protocols like Raft.
  • CLOUD-NATIVE: Ozone is designed to work well in containerized environments like YARN and Kubernetes.
  • SECURE: Ozone integrates with Kerberos infrastructure for authentication, supports native ACLs, integrates with Ranger for access control, and supports TDE and on-wire encryption.
  • HIGHLY AVAILABLE: Ozone is a fully replicated system that is designed to survive multiple failures.

Documentation

The latest documentation is generated together with each release and hosted on the Apache website.

Please check the documentation page for more information.

Contact

Ozone is a top-level project under the Apache Software Foundation.

  • Ozone web page
  • Mailing lists
  • Chat: There are a few ways to interact with the community
    • You can find the #ozone channel on the official ASF Slack. Invite link is here.
    • You can use GitHub Discussions to post questions or follow community syncs.
  • There are Open Weekly calls where you can ask anything about Ozone.
    • Past meeting notes are also available from the wiki.
  • Reporting security issues: Please consult SECURITY.md for how to report security vulnerabilities and issues.

Download

Latest release artifacts (source release and binary packages) are available from the Ozone web page.

Quick start

Run Ozone from published Docker image

The easiest way to start a cluster with Docker is:

docker run -p 9878:9878 apache/ozone
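
The container may need some time before the S3 gateway on port 9878 accepts requests. A minimal sketch of a readiness check follows; `wait_for_endpoint` is a hypothetical helper (not part of Ozone), and the attempt count and delay are arbitrary assumptions:

```shell
# Poll an HTTP endpoint until it answers or the attempt budget runs out.
# wait_for_endpoint is a hypothetical helper, not something Ozone ships.
wait_for_endpoint() {
  url="${1:-http://localhost:9878}"   # port published by the docker run command
  attempts="${2:-30}"                 # assumed default, tune as needed
  delay="${3:-2}"                     # seconds between attempts (assumed)
  i=1
  while [ "$i" -le "$attempts" ]; do
    # Without -f, curl succeeds on any HTTP response, even an error status,
    # so success here only means that something is serving the port.
    if curl -s -o /dev/null "$url"; then
      echo "endpoint $url is reachable"
      return 0
    fi
    sleep "$delay"
    i=$((i + 1))
  done
  echo "endpoint $url did not respond after $attempts attempts" >&2
  return 1
}
```

Calling `wait_for_endpoint http://localhost:9878` in a second terminal blocks until the gateway answers or the attempts are used up.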

You can then use the AWS CLI against the S3 endpoint:

aws s3api --endpoint-url http://localhost:9878/ create-bucket --bucket=wordcount
aws s3 --endpoint-url http://localhost:9878 cp --storage-class REDUCED_REDUNDANCY /tmp/testfile s3://wordcount/testfile
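
To check that the upload worked, the object can be listed and downloaded again. The sketch below assumes the AWS CLI already has some credentials configured; `ozone_s3`, `ozone_s3_verify`, and the `OZONE_S3_ENDPOINT` variable are hypothetical names introduced for illustration only:

```shell
# Endpoint of the S3 gateway; override by exporting OZONE_S3_ENDPOINT.
OZONE_S3_ENDPOINT="${OZONE_S3_ENDPOINT:-http://localhost:9878}"

# Hypothetical wrapper: forwards its arguments to the AWS CLI and adds the
# endpoint so it does not have to be repeated on every call.
ozone_s3() {
  aws --endpoint-url "$OZONE_S3_ENDPOINT" "$@"
}

# Hypothetical round trip: list the bucket, download the object to a
# temporary file, and compare it byte for byte with the local original.
ozone_s3_verify() {
  bucket="$1"; key="$2"; original="$3"
  copy="$(mktemp)"
  if ozone_s3 s3 ls "s3://$bucket/" &&
     ozone_s3 s3 cp "s3://$bucket/$key" "$copy" &&
     cmp -s "$original" "$copy"; then
    status=0
  else
    status=1
  fi
  rm -f "$copy"
  return "$status"
}
```

For the upload shown above, `ozone_s3_verify wordcount testfile /tmp/testfile` returns 0 only if the downloaded copy matches the local file.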

Run Ozone from released artifact

If you need a more realistic cluster, download the latest binary release package and start a cluster with docker-compose. After you untar the binary package:

cd compose/ozone
docker-compose up -d --scale datanode=3

The compose folder contains several preconfigured cluster setups (secure, HA, MapReduce example). Check its subfolders for more examples.

Run on Kubernetes

Ozone is a first-class citizen in cloud-native environments. The binary package contains several sets of Kubernetes resource files that show how it can be deployed.
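
As a rough sketch, one of those resource sets could be applied with `kubectl`. The helper name `deploy_ozone_k8s` is hypothetical, and the directory is passed in as an argument because its location inside the package is not stated here:

```shell
# Hypothetical helper: apply every resource file in a directory and then
# list the pods. The directory must be one of the resource sets shipped in
# the binary package; its path is an assumption left to the caller.
deploy_ozone_k8s() {
  resource_dir="$1"
  if [ ! -d "$resource_dir" ]; then
    echo "not a directory: $resource_dir" >&2
    return 1
  fi
  # kubectl apply -f accepts a directory and applies the manifests in it.
  kubectl apply -f "$resource_dir" && kubectl get pods
}
```

The function assumes `kubectl` is already configured to talk to the target cluster.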

Build from source

Ozone can be built with Apache Maven:

mvn clean install -DskipTests

The resulting build can be started with the help of Docker:

cd hadoop-ozone/dist/target/ozone-*/compose/ozone
docker-compose up -d --scale datanode=3

For more information, see the Contribution guideline.

Contribute

All contributions are welcome.

  1. Open a Jira issue.
  2. Create a pull request.

For more information, see the Contribution guideline.

License

The Apache Ozone project is licensed under the Apache 2.0 License. See the LICENSE file for details.
