---
title: "Ecosystem"
---
Apache Flink supports a broad ecosystem and works seamlessly with many other data processing projects and frameworks.
{% toc %}
Connectors provide code for interfacing with various third-party systems.
Currently these systems are supported:
- Apache Kafka (sink/source)
- Elasticsearch 1.x / 2.x / 5.x / 6.x (sink)
- HDFS (sink)
- RabbitMQ (sink/source)
- Amazon Kinesis Streams (sink/source)
- Twitter (source)
- Apache NiFi (sink/source)
- Apache Cassandra (sink)
- Redis, Flume, and ActiveMQ (via Apache Bahir) (sink)
Running an application that uses one of these connectors usually requires additional third-party components to be installed and launched, e.g., the servers for the message queues. Further instructions can be found in the corresponding subsections.
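To make the role of a connector concrete, here is a toy, stdlib-only sketch of the source → transformation → sink shape that connectors plug into. It is an illustration under stated assumptions, not Flink code: `Source`, `Sink`, `InMemorySource`, `CollectingSink`, and `runPipeline` are all hypothetical names invented for this example, and they are not Flink's `SourceFunction`/`SinkFunction` interfaces. A real connector would talk to an external system (e.g. a Kafka broker) instead of an in-memory list.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Locale;
import java.util.function.Function;

/**
 * Toy illustration of the source -> transformation -> sink shape.
 * All types here are HYPOTHETICAL; they are not part of Flink's API.
 */
public class ConnectorSketch {

    /** Hypothetical source: reads records from some external system. */
    interface Source<T> {
        Iterator<T> open();
    }

    /** Hypothetical sink: writes records to some external system. */
    interface Sink<T> {
        void write(T record);
    }

    /** Stand-in for e.g. a message-queue consumer; here it just replays a list. */
    static final class InMemorySource implements Source<String> {
        private final List<String> records;

        InMemorySource(List<String> records) {
            this.records = records;
        }

        public Iterator<String> open() {
            return records.iterator();
        }
    }

    /** Stand-in for e.g. an index writer; here it collects records in memory. */
    static final class CollectingSink implements Sink<String> {
        final List<String> written = new ArrayList<>();

        public void write(String record) {
            written.add(record);
        }
    }

    /** Pulls every record from the source, transforms it, and pushes it to the sink. */
    static <I, O> void runPipeline(Source<I> source, Function<I, O> fn, Sink<O> sink) {
        Iterator<I> it = source.open();
        while (it.hasNext()) {
            sink.write(fn.apply(it.next()));
        }
    }

    /** Runs the toy pipeline with an upper-casing transformation. */
    public static List<String> run(List<String> input) {
        CollectingSink sink = new CollectingSink();
        runPipeline(new InMemorySource(input), s -> s.toUpperCase(Locale.ROOT), sink);
        return sink.written;
    }

    public static void main(String[] args) {
        // prints [KAFKA, RABBITMQ]
        System.out.println(run(List.of("kafka", "rabbitmq")));
    }
}
```

The point of the split is that the pipeline logic in `runPipeline` never needs to know which external system sits at either end; swapping the source or sink implementation is what choosing a different connector amounts to in this simplified model.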
This is a list of third-party packages (i.e., libraries, system extensions, or examples) built on Flink. The Flink community collects links to these packages but does not maintain them. Thus, they do not belong to the Apache Flink project, and the community cannot give any support for them. Is your project missing? Please let us know on the [user/dev mailing list]({{ site.baseurl }}/community.html#mailing-lists).
Apache Zeppelin
Apache Zeppelin is a web-based notebook that enables interactive data analytics and can be used with Flink as an execution engine. See also Jim Dowling's Flink Forward talk about Zeppelin on Flink.
Apache Mahout
Apache Mahout is a machine learning library that will soon feature Flink as an execution engine. Check out Sebastian Schelter's Flink Forward talk about the Mahout-Samsara DSL.
Cascading
Cascading enables a user to build complex workflows easily on Flink and other execution engines. Cascading on Flink is built by dataArtisans and Driven, Inc. See Fabian Hueske's Flink Forward talk for more details.
Apache Beam
Apache Beam is an open-source, unified programming model that you can use to create a data processing pipeline. Flink is one of the back-ends supported by the Beam programming model.
GRADOOP
GRADOOP enables scalable graph analytics on top of Flink and is developed at Leipzig University. Check out Martin Junghanns’ Flink Forward talk.
BigPetStore
BigPetStore is a benchmarking suite that includes a data generator and will be available for Flink soon. See Suneel Marthi's Flink Forward talk for a preview.
FastR
FastR is an implementation of the R language in Java. FastR Flink executes R workloads on top of Flink.
Apache SAMOA
Apache SAMOA (incubating) is a streaming ML library that will soon feature Flink as an execution engine. Albert Bifet introduced SAMOA on Flink in his Flink Forward talk.
Alluxio
Alluxio is an open-source, memory-speed virtual distributed storage system that enables applications to efficiently share and access data across different storage systems in a unified namespace. Here is an example of using Flink to access data through Alluxio.
Python Examples on Flink
A collection of examples using Apache Flink's Python API.
WordCount Example in Clojure
A small WordCount example on how to write a Flink program in Clojure.
Anomaly Detection and Prediction in Flink
flink-htm is a library for anomaly detection and prediction in Apache Flink. The algorithms are based on Hierarchical Temporal Memory (HTM) as implemented by the Numenta Platform for Intelligent Computing (NuPIC).
Apache Ignite
Apache Ignite is a high-performance, integrated, and distributed in-memory platform for computing and transacting on large-scale data sets in real time. See the Flink streaming sink connector for injecting data into an Ignite cache.
Tink temporal graph library
Tink is a temporal graph library built on top of Flink. It supports temporal graph analytics such as different interpretations of the shortest temporal path algorithm, as well as metrics such as temporal betweenness and temporal closeness. The library is the result of Wouter Ligtenberg's thesis.