Apache Spark - From installation to performing awesome operations in Apache Spark Stack
👨‍🎓 Learn to design data models, build data warehouses and data lakes, automate data pipelines, and work with massive datasets.
Group 10 Project, Fall 2020, CS 6240: Large-Scale Parallel Data Processing, Khoury College of Computer Sciences, Northeastern University
Assignment 2 of the course 'Distributed Systems Programming' by Meni Adler. In the assignment we build an application that calculates, for any pair of words in the Google n-gram corpus, the probability of each word that can follow that pair.
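A minimal local sketch of the probability the assignment computes, P(w3 | w1, w2) = count(w1 w2 w3) / count(w1 w2), estimated from trigram counts. The function name and the toy counts are illustrative assumptions; the real project computes this at scale over the Google n-gram corpus.

```python
from collections import defaultdict

def trigram_probabilities(trigram_counts):
    """Estimate P(w3 | w1, w2) from raw trigram counts.

    trigram_counts maps (w1, w2, w3) -> count. The conditional
    probability is the trigram count divided by the total count
    of all trigrams sharing the (w1, w2) prefix.
    """
    pair_totals = defaultdict(int)
    for (w1, w2, _w3), c in trigram_counts.items():
        pair_totals[(w1, w2)] += c
    return {
        (w1, w2, w3): c / pair_totals[(w1, w2)]
        for (w1, w2, w3), c in trigram_counts.items()
    }

# Toy corpus: after "new york", "city" appears 3 times and "times" once.
counts = {("new", "york", "city"): 3, ("new", "york", "times"): 1}
probs = trigram_probabilities(counts)
# probs[("new", "york", "city")] == 0.75
```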
How to: find acute exacerbation of COPD events in UK primary care electronic healthcare records
Start and monitor jobs on EMR cluster
ETL logic is written in Spark for transforming the given data set in S3, and queries on the transformed data are run using AWS Redshift. The data sets are in JSON format. All the raw JSON data has to first be uploaded to an S3 source bucket. Using EMR, a Spark job is executed, which fetches the source data from the S3 source bu…
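A minimal local sketch of the transform step this pipeline describes: parse raw JSON records and reshape them into query-ready rows. The field names and the transformation are assumptions for illustration; the actual project runs this as a Spark job on EMR against S3, with Redshift querying the output.

```python
import json

def transform(raw_line):
    """Parse one raw JSON record and reshape the fields of interest
    (field names here are illustrative assumptions)."""
    rec = json.loads(raw_line)
    return {"user_id": rec["userId"], "event": rec["event"].lower()}

# Stand-in for raw JSON lines fetched from the S3 source bucket.
raw = ['{"userId": 1, "event": "PLAY"}', '{"userId": 2, "event": "PAUSE"}']
rows = [transform(line) for line in raw]
# rows[0] == {"user_id": 1, "event": "play"}
```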
Streaming pipeline using AWS MSK and AWS EMR with Spark, retrieving the data from Twitter Streams API
An implementation, in C using the PBC library, of Efficient Secure-Channel Free Public Key Encryption with Keyword Search for EMRs in Cloud Storage
Parsing the common crawl database using Scala and Spark