diff --git a/README.md b/README.md
index 89ca5a4..89c479a 100644
--- a/README.md
+++ b/README.md
@@ -15,7 +15,11 @@ Some examples of what you can easily do with Envelope:
 
 ### Requirements
 
-Envelope requires a CDH5.7+ cluster with Cloudera's distributions of Spark 2.1 (and Kafka 0.10 and Kudu 1.3, if connecting to those components).
+Envelope requires a CDH5.7+ cluster with:
+
+- Cloudera's distribution of Apache Spark 2.1.0, or above
+- Cloudera's distribution of Apache Kafka 2.1.0 (based on Apache Kafka 0.10) or above, if using that component
+- Cloudera's distribution of Apache Kudu 1.3.0, if using that component
 
 ### Compiling Envelope
 
@@ -42,14 +46,14 @@ You can run Envelope by submitting it to Spark with the configuration file for y
 
 A helpful place to monitor your running pipeline is from the Spark UI for the job. You can find this via the YARN ResourceManager UI, which can be found in Cloudera Manager by navigating to the YARN service and then to the ResourceManager Web UI link.
 
-## Get involved
+## More information
 
 If you are ready for more, dive in:
+
 * [User Guide](docs/userguide.adoc) - details on the design, operations, configuration, and usage of Envelope
-* [Configuration Guide](docs/configurations.adoc) - a deep-dive into the parameters and options of Envelope
+* [Configuration Specification](docs/configurations.adoc) - a deep-dive into the configuration options of Envelope
 * [Inputs Guide](docs/inputs.adoc) - detailed information on each provided input, and how to write custom inputs
 * [Derivers Guide](docs/derivers.adoc) - detailed information on each provided deriver, and how to write custom derivers
 * [Planners Guide](docs/planners.adoc) - directions and details on when, why, and how to use planners and associated outputs
 * [Looping Guide](docs/looping.adoc) - information and an example for defining loops in an Envelope pipeline
-* [Decisions Guide](docs/decisions.adoc) - information on using decisions to dynamically choose which parts of the pipeline to run
-* [Contributing to Envelope](docs/contributing.adoc) - guidelines and best practices for both developing and sharing Envelope components and applications
+* [Decisions Guide](docs/decisions.adoc) - information on using decisions to dynamically choose which parts of the pipeline to run
\ No newline at end of file
diff --git a/docs/contributing.adoc b/docs/contributing.adoc
deleted file mode 100644
index 2c8444c..0000000
--- a/docs/contributing.adoc
+++ /dev/null
@@ -1,92 +0,0 @@
-= Contributing to Envelope
-
-:toc: right
-
-*tl;dr*
-
-_Hack:_ Fork the repo, PR against the `develop` branch.
-
-_Build:_ Pull library from Maven repo, submit Spark job
-
-== Hacking Envelope
-
-Envelope is designed for extensibility, so new components are expected and encouraged. Most often, a developer adds a
-new function simply by implementing a core interface, constructing the code as a jar file, and referencing the function
-in the configuration directly.
-
-For work on the framework itself, we follow a mix of two workflows,
-https://www.atlassian.com/git/tutorials/comparing-workflows#gitflow-workflow[Gitflow] and
-https://www.atlassian.com/git/tutorials/comparing-workflows#forking-workflow[Forking].
-
-The central repository implements many of the concepts in Gitflow. For example, Envelope has two main branches, `master`
-and `develop`, the former is designed for releases, the latter for stable development. Hotfixes, etc. follow the
-guidelines set down in the Gitflow documentation.
-
-Development work, however, does not use the branches defined in Gitflow (we might under certain circumstances, but have
-not to date). Instead, we use the Forking workflow, that is a developer creates a branch against `develop` in their own,
-forked repository of Envelope and issues a pull request (PR) from that branch to the `develop` branch on the central
-repository.
-
-=== Pull Requests
-
-Developers should prefix their PRs with the associated JIRA issue, if one is available. For example, "[ENV-74] Remove
-requirement for application name." A PR is reviewed via GitHub's UI, and our convention is that the PR submitter cannot
-be the committer as well. Committed PRs should use "squash and merge" in order to streamline the Git history.
-
-Lastly, commits should be short and descriptive, using present tense verbs, i.e. "Remove requirement for application
-name". Why, you ask? It just reads better, don't you think?
-
-=== Unit Tests
-
-All code should include unit tests whenever possible. We rely on JUnit and JMock. Currently, we do not have full
-integration tests, though this functionality is on the roadmap.
-
-=== Documentation
-
-Envelope documentation should be written in http://asciidoctor.org/docs/asciidoc-syntax-quick-reference/[AsciiDoc] and
-use the `.adoc` file extension. AsciiDoc allows us to publish documentation in many forms and use a number of plugins,
-like graphing, etc. It also is supported natively by GitHub and looks and feels a lot like Markdown.
-
-All public APIs should have Javadocs.
-
-== Building with Envelope
-
-Envelope is designed with configuration in mind, so using Envelope can be as easy as tweaking a configuration file
-and submitted a Spark job.
-
-The pieces to an Envelope application are pretty simple:
-
-. Configuration file (see link:configurations.adoc[Configurations] for details)
-. Supporting JARs, like a custom Deriver
-. Supporting assets, like Morphline configurations or CSV files
-. Envelope JAR
-
-At minimum, the application needs the configuration file and the Envelope JAR.
-
-=== Configuring Extensions
-
-For extensions, like a custom Input or Deriver, defaults are best handled using a `reference.conf` configuration bundled in the extension's JAR:
-
-[source,java]
-----
-
-String REFERENCE_CONFIG = "example-extension-reference.conf";
-Config config; // The primary Envelope configuration (already resolved and passed to the extension)
-config = config.withFallback(ConfigFactory.parseResources(REFERENCE_CONFIG));
-----
-
-Extensions should also mask any sensitive parameters, after handling the defaults:
-
-[source,java]
-----
-if (LOG.isDebugEnabled()) {
-  Map mask = new HashMap<>();
-  mask.put(ExampleExtension.USERNAME_PARAM_CONFIG, "--username--");
-  mask.put(ExampleExtension.PASSWORD_PARAM_CONFIG, "--password--");
-  LOG.debug("Configuration:\n{}", ConfigFactory.parseMap(mask, "Masked")
-    .withFallback(config).root().render());
-}
-----
-
-
-
diff --git a/pom.xml b/pom.xml
index 461c643..9cb76aa 100644
--- a/pom.xml
+++ b/pom.xml
@@ -4,7 +4,7 @@
 
   <groupId>com.cloudera.labs</groupId>
   <artifactId>envelope</artifactId>
-  <version>0.4.0-SNAPSHOT</version>
+  <version>0.4.0</version>
   <packaging>jar</packaging>
   <name>Envelope</name>