se-sic
diff --git a/.drone.yml b/.drone.yml
Lines changed: 1 addition & 1 deletion
diff --git a/NEWS.md b/NEWS.md
Lines changed: 4 additions & 0 deletions
diff --git a/README.md b/README.md
Lines changed: 53 additions & 44 deletions
diff --git a/install.R b/install.R
Lines changed: 1 addition & 0 deletions
diff --git a/tests/codeface-data/results/testing/test_feature/feature/commitMessages.list b/tests/codeface-data/results/testing/test_feature/feature/commitMessages.list
Lines changed: 7 additions & 0 deletions
diff --git a/tests/codeface-data/results/testing/test_proximity/proximity/commitMessages.list b/tests/codeface-data/results/testing/test_proximity/proximity/commitMessages.list
Lines changed: 7 additions & 0 deletions
@@ -64,7 +64,7 @@ steps:
 
 - name: R-3.3
   pull: if-not-exists
-  image: r-base:3.3.3
+  image: r-base:3.3.2
   commands: *runTests
   depends_on: [clone]
 
 
@@ -2,6 +2,10 @@
 
 ## Unversioned
 
+### Added
+- Add functionality to read and process commit messages in order to merge them into the commit data (see issue #180). Three values are available for the new attribute `commit.messages` in `ProjectConf`: `none`, `title` and `messages` (PR #193, 85b1d0572c0fb9f4c062bceb1363b0398f98b85f, fdc414ade1a640f533e809a25cfe012e42b3cffa, 43e1894998e18faff3a65114fa65ee54e1d2f66e)
+- Add functions `cleanup.commit.message.data` and `cleanup.synchronicity.data` to remove commit hashes that are no longer present in the commit data from the commit-message data or the synchronicity data (PR #193, 98e83b037ecc88d9a29e8e4ca93598a9978e85a2)
+
 ### Changed/Improved
 - Add `.drone.yml` to enable running our CI pipelines on drone.io (PR #191, 1c5804b59c582cf34af6970b435add51452fbd11)
 
 
@@ -10,41 +10,41 @@ If you wonder: The name `coronet` derives as an acronym from the words "configur
 
 
 ## Table of contents
-
-- [Integration](#integration)
-    * [Requirements](#requirements)
-        * [R](#r)
-        * [packrat (recommended)](#packrat)
-        * [Folder structure of the input data](#folder-structure-of-the-input-data)
-        * [Needed R packages](#needed-r-packages)
-    * [Submodule](#submodule)
-    * [Selecting the correct version](#selecting-the-correct-version)
-- [Functionality](#functionality)
-    * [Configuration](#configuration)
-    * [Data sources](#data-sources)
-    * [Network construction](#network-construction)
-        * [Data sources for network construction](#data-sources-for-network-construction)
-        * [Types of networks](#types-of-networks)
-        * [Relations](#relations)
-        * [Edge-construction algorithms for author networks](#edge-construction-algorithms-for-author-networks)
-        * [Vertex and edge attributes](#vertex-and-edge-attributes)
-    * [Further functionalities](#further-functionalities)
-        * [Splitting data and networks based on defined time windows](#splitting-data-and-networks-based-on-defined-time-windows)
-        * [Cutting data to unified date ranges](#cutting-data-to-unified-date-ranges)
-        * [Handling data independently](#handling-data-independently)
-    * [How-to](#how-to)
-    * [File/Module overview](#filemodule-overview)
-- [Configuration classes](#configuration-classes)
-    * [ProjectConf](#projectconf)
-        * [Basic information](#basic-information)
-        * [Artifact-related information](#artifact-related-information)
-        * [Revision-related information](#revision-related-information)
-        * [Data paths](#data-paths)
-        * [Splitting information](#splitting-information)
-        * [(Configurable) Data-retrieval-related parameters](#configurable-data-retrieval-related-parameters)
-    * [NetworkConf](#networkconf)
-- [License](#license)
-- [Work in progress](#work-in-progress)
+  - [Integration](#integration)
+    - [Requirements](#requirements)
+      - [`R`](#r)
+      - [`packrat` (recommended)](#packrat-recommended)
+      - [Folder structure of the input data](#folder-structure-of-the-input-data)
+      - [Needed R packages](#needed-r-packages)
+    - [Submodule](#submodule)
+    - [Selecting the correct version](#selecting-the-correct-version)
+  - [Functionality](#functionality)
+    - [Configuration](#configuration)
+    - [Data sources](#data-sources)
+    - [Network construction](#network-construction)
+      - [Data sources for network construction](#data-sources-for-network-construction)
+      - [Types of networks](#types-of-networks)
+      - [Relations](#relations)
+      - [Edge-construction algorithms for author networks](#edge-construction-algorithms-for-author-networks)
+      - [Vertex and edge attributes](#vertex-and-edge-attributes)
+    - [Further functionalities](#further-functionalities)
+      - [Splitting data and networks based on defined time windows](#splitting-data-and-networks-based-on-defined-time-windows)
+      - [Cutting data to unified date ranges](#cutting-data-to-unified-date-ranges)
+      - [Handling data independently](#handling-data-independently)
+    - [How-to](#how-to)
+    - [File/Module overview](#filemodule-overview)
+  - [Configuration classes](#configuration-classes)
+    - [ProjectConf](#projectconf)
+      - [Basic information](#basic-information)
+      - [Artifact-related information](#artifact-related-information)
+      - [Revision-related information](#revision-related-information)
+      - [Data paths](#data-paths)
+      - [Splitting information](#splitting-information)
+      - [(Configurable) Data-retrieval-related parameters](#configurable-data-retrieval-related-parameters)
+    - [NetworkConf](#networkconf)
+  - [Contributing](#contributing)
+  - [License](#license)
+  - [Work in progress](#work-in-progress)
 
 
 ## Integration
@@ -123,6 +123,7 @@ Alternatively, you can run `Rscript install.R` to install the packages.
 - `parallel`: For parallelization
 - `logging`: Logging
 - `sqldf`: For advanced aggregation of `data.frame` objects
+- `data.table`: For faster data processing
 - `testthat`: For the test suite
 - `patrick`: For the test suite
 - `ggplot2`: For plotting of data
@@ -179,11 +180,16 @@ There are two distinguishable types of data sources that are both handled by the
     * Issue data (called `"issues"` internally)
 
 - Additional (orthogonal) data sources (augmentable to main data sources, not splittable)
+    * Commit messages are available through the parameter `commit.messages` in the [`ProjectConf`](#configurable-data-retrieval-related-parameters) class. Three values can be used:
+        1. `none` is the default value and does not add any commit-message data.
+        2. `title` merges the commit-message titles (i.e., the first non-whitespace line of a commit message) into the commit data. This adds a column `title` to the commit data frame.
+        3. `messages` merges both titles and message bodies into the commit data frame. This adds two new columns, `title` and `message`.
     * [PaStA](https://github.com/lfd/PaStA/) data (patch-stack analysis; see also the parameter `pasta` in the [`ProjectConf`](#configurable-data-retrieval-related-parameters) class)
         * Patch-stack analysis to link patches sent to mailing lists and upstream commits
     * Synchronicity information on commits (see also the parameter `synchronicity` in the [`ProjectConf`](#configurable-data-retrieval-related-parameters) class)
         * Synchronous commits are commits that change a source-code artifact that has also been changed by another author within a reasonable time-window.
-
+   
+   
  The important difference is that the *main data sources* are used internally to construct artifact vertices in relevant types of networks. Additionally, these data sources can be used as a basis for splitting `ProjectData` in a time-based or activity-based manner – obtaining `RangeData` instances as a result (see file `split.R` and the contained functions). Thus, `RangeData` objects contain only data of a specific period of time.
 
  The *additional data sources* are orthogonal to the main data sources, can augment them by additional information, and, thus, are not split at any time.
@@ -532,16 +538,23 @@ There is no way to update the entries, except for the revision-based parameters.
 - `commits.filter.untracked.files`
     * Remove all information concerning untracked files from the commit data. This effect becomes clear when retrieving commits using `get.commits.filtered`, because the result then does not contain any commits that solely changed untracked files. Networks built on top of this `ProjectData` do not contain any information about untracked files either.
     * [*`TRUE`*, `FALSE`]
-- `mails.filter.patchstack.mails`
-    * Filter patchstack mails from the mail data. In a thread, a patchstack spans the first sequence of mails where each mail has been authored by the thread creator and has been sent within a short time window after the preceding mail. The mails spanned by a patchstack are called
-'patchstack mails' and for each patchstack, every patchstack mail but the first one are filtered when `mails.filter.patchstack.mails = TRUE`.
-    * [`TRUE`, *`FALSE`*]
+- `commit.messages`
+    * Read and add commit messages to commits. The column `title` will contain the first non-whitespace line of the message and, if `messages` is selected, the column `message` will contain the rest.
+    * [*`none`*, `title`, `messages`]
 - `issues.only.comments`
     * Only use comments from the issue data on disk and no further events such as references and label changes
     * [*`TRUE`*, `FALSE`]
 - `issues.from.source`
     * Choose from which sources the issue data on disk is read in. Multiple sources can be chosen.
     * [*`github`, `jira`*]
+- `mails.filter.patchstack.mails`
+    * Filter patchstack mails from the mail data. In a thread, a patchstack spans the first sequence of mails where each mail has been authored by the thread creator and has been sent within a short time window after the preceding mail. The mails spanned by a patchstack are called
+'patchstack mails', and for each patchstack, every patchstack mail except the first one is filtered out when `mails.filter.patchstack.mails = TRUE`.
+    * [`TRUE`, *`FALSE`*]
+- `pasta`
+    * Read and integrate [PaStA](https://github.com/lfd/PaStA/) data with commit and mail data (columns `pasta` and `revision.set.id`)
+    * [`TRUE`, *`FALSE`*]
+    * **Note**: To include PaStA-based edge attributes, you need to add the `"pasta"` edge attribute to `edge.attributes`.
 - `synchronicity`
     * Read and add synchronicity data to commits (column `synchronicity`)
     * [`TRUE`, *`FALSE`*]
@@ -550,10 +563,6 @@ There is no way to update the entries, except for the revision-based parameters.
     * The time-window (in days) to use for synchronicity data if enabled by `synchronicity = TRUE`
     * [1, *5*, 10, 15]
     * **Note**: If at least one artifact in a commit has been edited by more than one developer within the configured time window, then the whole commit is considered to be synchronous.
-- `pasta`
-    * Read and integrate [PaStA](https://github.com/lfd/PaStA/) data with commit and mail data (columns `pasta` and `revision.set.id`)
-    * [`TRUE`, *`FALSE`*]
-    * **Note**: To include PaStA-based edge attributes, you need to give the `"pasta"` edge attribute for `edge.attributes`.
 
 ### NetworkConf
 
 
@@ -32,6 +32,7 @@ packages = c(
     "parallel",
     "logging",
     "sqldf",
+    "data.table",
     "testthat",
     "patrick",
     "ggplot2",
 
@@ -0,0 +1,7 @@
+32712;"72c8dd25d3dd6d18f46e2b26a5f5b1e2e8dc28d0";"Add stuff"
+32713;"5a5ec9675e98187e1e92561e1888aa6f04faa338";"    Add some more stuff     "
+32710;"3a0ed78458b3976243db6829f63eba3eead26774";"     I added important things     the things are     nothing"
+32714;"1143db502761379c2bfcecc2007fc34282e7ee61";"         I wish it would work now" 
+32715;"418d1dc4929ad1df251d2aeb833dd45757b04a6f";"Wish     intensifies"
+32716;"d01921773fae4bed8186b0aa411d6a2f7a6626e6";"    ...     still     doesn't     work          as expected     " 
+32711;"0a1a5c523d835459c42f33e863623138555e2526";""
@@ -0,0 +1,7 @@
+32712;"72c8dd25d3dd6d18f46e2b26a5f5b1e2e8dc28d0";"Add stuff"
+32713;"5a5ec9675e98187e1e92561e1888aa6f04faa338";"    Add some more stuff     "
+32710;"3a0ed78458b3976243db6829f63eba3eead26774";"     I added important things     the things are     nothing"
+32714;"1143db502761379c2bfcecc2007fc34282e7ee61";"         I wish it would work now" 
+32715;"418d1dc4929ad1df251d2aeb833dd45757b04a6f";"Wish     intensifies"
+32716;"d01921773fae4bed8186b0aa411d6a2f7a6626e6";"    ...     still     doesn't     work          as expected     " 
+32711;"0a1a5c523d835459c42f33e863623138555e2526";""