lakeFS - Data version control for your data lake | Git for data
Updated Jul 3, 2024 - Go
Libraries and tools for interoperability between Hadoop-related open-source software and Google Cloud Platform.
Kafka Connect FileSystem Connector
A curated list of free resources for learning Hadoop.
Library for per-file client-side encryption in Hadoop FileSystems such as HDFS or S3.
A tool for scale and performance testing of HDFS with a specific focus on the NameNode.
A distributed cloud storage system based on Hadoop 🌴
Helm chart for Apache Hadoop using multi-arch docker images
The OctopuFS library helps manage cloud storage, specifically ADLS Gen2. It lets you operate on files (moving, copying, setting ACLs) efficiently. It is designed to work on Databricks but should work on other platforms as well.
Hadoop utility to compact small files
Python wrapper to access Hadoop HDFS REST API
SFTP server that runs on top of HDFS. It is based on Apache sshd and lets you access and operate HDFS through the SFTP protocol.
This repository contains the H1B_Visa Applicants Data Analysis project/case study using Hadoop, undertaken during training at NIIT. The technologies used are MapReduce, Hive, Pig, Sqoop, and shell scripting.
Toy Hadoop cluster combining various SQL-on-Hadoop variants
Terraform module to create Azure HDInsight, a managed, full-spectrum, open-source analytics service. This module creates Apache Hadoop, Apache Spark, Apache HBase, Interactive Query (Apache Hive LLAP), and Apache Kafka clusters.
Spark Streaming via Kafka
A neat and handy place for all Hadoop code