Commit c72188e

Merge pull request #193 from nlschn/mydev
Add commit message merge functionality

Reviewed-by: Claus Hunsen <[email protected]>
Reviewed-by: Thomas Bock <[email protected]>
Reviewed-by: Christian Hechtl <[email protected]>
2 parents b1eeaf6 + 18843a8 commit c72188e

14 files changed: +1008 -431 lines changed


.drone.yml

Lines changed: 1 addition & 1 deletion
@@ -64,7 +64,7 @@ steps:
 
 - name: R-3.3
 pull: if-not-exists
-image: r-base:3.3.3
+image: r-base:3.3.2
 commands: *runTests
 depends_on: [clone]
 

NEWS.md

Lines changed: 4 additions & 0 deletions
@@ -2,6 +2,10 @@
 
 ## Unversioned
 
+### Added
+- Add functionality to read and process commit messages in order to merge them into the commit data (see issue #180). Three values are available for the new attribute `commit.messages` in `ProjectConf`: `none`, `title` and `messages` (PR #193, 85b1d0572c0fb9f4c062bceb1363b0398f98b85f, fdc414ade1a640f533e809a25cfe012e42b3cffa, 43e1894998e18faff3a65114fa65ee54e1d2f66e)
+- Add functions `cleanup.commit.message.data` and `cleanup.synchronicity.data` to remove commit hashes that are no longer present in the commit data from the commit message data or synchronicity data (PR #193, 98e83b037ecc88d9a29e8e4ca93598a9978e85a2)
+
 ### Changed/Improved
 - Add `.drone.yml` to enable running our CI pipelines on drone.io (PR #191, 1c5804b59c582cf34af6970b435add51452fbd11)
 

README.md

Lines changed: 53 additions & 44 deletions
@@ -10,41 +10,41 @@ If you wonder: The name `coronet` derives as an acronym from the words "configur
 
 
 ## Table of contents
-
-- [Integration](#integration)
-* [Requirements](#requirements)
-* [R](#r)
-* [packrat (recommended)](#packrat)
-* [Folder structure of the input data](#folder-structure-of-the-input-data)
-* [Needed R packages](#needed-r-packages)
-* [Submodule](#submodule)
-* [Selecting the correct version](#selecting-the-correct-version)
-- [Functionality](#functionality)
-* [Configuration](#configuration)
-* [Data sources](#data-sources)
-* [Network construction](#network-construction)
-* [Data sources for network construction](#data-sources-for-network-construction)
-* [Types of networks](#types-of-networks)
-* [Relations](#relations)
-* [Edge-construction algorithms for author networks](#edge-construction-algorithms-for-author-networks)
-* [Vertex and edge attributes](#vertex-and-edge-attributes)
-* [Further functionalities](#further-functionalities)
-* [Splitting data and networks based on defined time windows](#splitting-data-and-networks-based-on-defined-time-windows)
-* [Cutting data to unified date ranges](#cutting-data-to-unified-date-ranges)
-* [Handling data independently](#handling-data-independently)
-* [How-to](#how-to)
-* [File/Module overview](#filemodule-overview)
-- [Configuration classes](#configuration-classes)
-* [ProjectConf](#projectconf)
-* [Basic information](#basic-information)
-* [Artifact-related information](#artifact-related-information)
-* [Revision-related information](#revision-related-information)
-* [Data paths](#data-paths)
-* [Splitting information](#splitting-information)
-* [(Configurable) Data-retrieval-related parameters](#configurable-data-retrieval-related-parameters)
-* [NetworkConf](#networkconf)
-- [License](#license)
-- [Work in progress](#work-in-progress)
+- [Integration](#integration)
+- [Requirements](#requirements)
+- [`R`](#r)
+- [`packrat` (recommended)](#packrat-recommended)
+- [Folder structure of the input data](#folder-structure-of-the-input-data)
+- [Needed R packages](#needed-r-packages)
+- [Submodule](#submodule)
+- [Selecting the correct version](#selecting-the-correct-version)
+- [Functionality](#functionality)
+- [Configuration](#configuration)
+- [Data sources](#data-sources)
+- [Network construction](#network-construction)
+- [Data sources for network construction](#data-sources-for-network-construction)
+- [Types of networks](#types-of-networks)
+- [Relations](#relations)
+- [Edge-construction algorithms for author networks](#edge-construction-algorithms-for-author-networks)
+- [Vertex and edge attributes](#vertex-and-edge-attributes)
+- [Further functionalities](#further-functionalities)
+- [Splitting data and networks based on defined time windows](#splitting-data-and-networks-based-on-defined-time-windows)
+- [Cutting data to unified date ranges](#cutting-data-to-unified-date-ranges)
+- [Handling data independently](#handling-data-independently)
+- [How-to](#how-to)
+- [File/Module overview](#filemodule-overview)
+- [Configuration classes](#configuration-classes)
+- [ProjectConf](#projectconf)
+- [Basic information](#basic-information)
+- [Artifact-related information](#artifact-related-information)
+- [Revision-related information](#revision-related-information)
+- [Data paths](#data-paths)
+- [Splitting information](#splitting-information)
+- [(Configurable) Data-retrieval-related parameters](#configurable-data-retrieval-related-parameters)
+- [NetworkConf](#networkconf)
+- [Contributing](#contributing)
+- [License](#license)
+- [Work in progress](#work-in-progress)
 
 
 ## Integration
@@ -123,6 +123,7 @@ Alternatively, you can run `Rscript install.R` to install the packages.
 - `parallel`: For parallelization
 - `logging`: Logging
 - `sqldf`: For advanced aggregation of `data.frame` objects
+- `data.table`: For faster data processing
 - `testthat`: For the test suite
 - `patrick`: For the test suite
 - `ggplot2`: For plotting of data
@@ -179,11 +180,16 @@ There are two distinguishable types of data sources that are both handled by the
 * Issue data (called `"issues"` internally)
 
 - Additional (orthogonal) data sources (augmentable to main data sources, not splittable)
+* Commit messages are available through the parameter `commit.messages` in the [`ProjectConf`](#configurable-data-retrieval-related-parameters) class. Three values can be used:
+1. `none` is the default value and does not impact the configuration at all.
+2. `title` merges the commit message titles (i.e., the first non-whitespace line of a commit message) into the commit data. This gives the data frame an additional column `title`.
+3. `messages` merges both titles and message bodies into the commit data frame. This adds two new columns `title` and `message`.
 * [PaStA](https://github.com/lfd/PaStA/) data (patch-stack analysis, see also the parameter `pasta` in the [`ProjectConf`](#configurable-data-retrieval-related-parameters) class))
 * Patch-stack analysis to link patches sent to mailing lists and upstream commits
 * Synchronicity information on commits (see also the parameter `synchronicity` in the [`ProjectConf`](#configurable-data-retrieval-related-parameters) class)
 * Synchronous commits are commits that change a source-code artifact that has also been changed by another author within a reasonable time-window.
-
+
+
 The important difference is that the *main data sources* are used internally to construct artifact vertices in relevant types of networks. Additionally, these data sources can be used as a basis for splitting `ProjectData` in a time-based or activity-based manner – obtaining `RangeData` instances as a result (see file `split.R` and the contained functions). Thus, `RangeData` objects contain only data of a specific period of time.
 
 The *additional data sources* are orthogonal to the main data sources, can augment them by additional information, and, thus, are not split at any time.
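
As a rough illustration of the three `commit.messages` values described in this hunk, the following base-R sketch splits a raw message into a title and a body and merges the selected columns into a commit data frame. The helper names `get.title.and.body` and `add.commit.messages`, the key column `hash`, and the sample data are assumptions made for this example; they are not taken from coronet's implementation.

```r
## Hypothetical helper (not coronet's code): the title is the first
## non-whitespace line, the body is everything after it.
get.title.and.body = function(message) {
    lines = strsplit(message, "\n", fixed = TRUE)[[1]]
    non.blank = which(nzchar(trimws(lines)))
    if (length(non.blank) == 0) {
        return(list(title = "", message = ""))
    }
    first = non.blank[1]
    body = paste(lines[-seq_len(first)], collapse = "\n")
    return(list(title = trimws(lines[first]), message = trimws(body)))
}

## Hypothetical helper: add the columns selected by 'mode' to the commit data.
## The key column name "hash" is an assumption.
add.commit.messages = function(commits, messages, mode = c("none", "title", "messages")) {
    mode = match.arg(mode)
    if (mode == "none") {
        return(commits)
    }
    columns = if (mode == "title") c("hash", "title") else c("hash", "title", "message")
    return(merge(commits, messages[, columns, drop = FALSE], by = "hash", all.x = TRUE))
}

## Made-up sample data
parts = get.title.and.body("\n  Add stuff\n\nLonger explanation")
commits = data.frame(hash = "abc", author = "Alice", stringsAsFactors = FALSE)
messages = data.frame(hash = "abc", title = parts$title, message = parts$message,
                      stringsAsFactors = FALSE)

with.none = add.commit.messages(commits, messages, mode = "none")
with.title = add.commit.messages(commits, messages, mode = "title")
with.messages = add.commit.messages(commits, messages, mode = "messages")
```

With `all.x = TRUE`, commits without a matching message keep their row and receive `NA` in the new columns, which is one possible design choice for an orthogonal data source that only augments the commit data.
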
@@ -532,16 +538,23 @@ There is no way to update the entries, except for the revision-based parameters.
 - `commits.filter.untracked.files`
 * Remove all information concerning untracked files from the commit data. This effect becomes clear when retrieving commits using `get.commits.filtered`, because the result then does not contain any commits that solely changed untracked files. Networks built on top of this `ProjectData` do also not contain any information about untracked files.
 * [*`TRUE`*, `FALSE`]
-- `mails.filter.patchstack.mails`
-* Filter patchstack mails from the mail data. In a thread, a patchstack spans the first sequence of mails where each mail has been authored by the thread creator and has been sent within a short time window after the preceding mail. The mails spanned by a patchstack are called
-'patchstack mails' and for each patchstack, every patchstack mail but the first one is filtered when `mails.filter.patchstack.mails = TRUE`.
-* [`TRUE`, *`FALSE`*]
+- `commit.messages`
+* Read and add commit messages to commits. The column `title` will contain the first line of the message and, if selected, the column `message` will contain the rest.
+* [*`none`*, `title`, `messages`]
 - `issues.only.comments`
 * Only use comments from the issue data on disk and no further events such as references and label changes
 * [*`TRUE`*, `FALSE`]
 - `issues.from.source`
 * Choose from which sources the issue data on disk is read in. Multiple sources can be chosen.
 * [*`github`, `jira`*]
+- `mails.filter.patchstack.mails`
+* Filter patchstack mails from the mail data. In a thread, a patchstack spans the first sequence of mails where each mail has been authored by the thread creator and has been sent within a short time window after the preceding mail. The mails spanned by a patchstack are called
+'patchstack mails' and for each patchstack, every patchstack mail but the first one is filtered when `mails.filter.patchstack.mails = TRUE`.
+* [`TRUE`, *`FALSE`*]
+- `pasta`
+* Read and integrate [PaStA](https://github.com/lfd/PaStA/) data with commit and mail data (columns `pasta` and `revision.set.id`)
+* [`TRUE`, *`FALSE`*]
+* **Note**: To include PaStA-based edge attributes, you need to give the `"pasta"` edge attribute for `edge.attributes`.
 - `synchronicity`
 * Read and add synchronicity data to commits (column `synchronicity`)
 * [`TRUE`, *`FALSE`*]
@@ -550,10 +563,6 @@ There is no way to update the entries, except for the revision-based parameters.
 * The time-window (in days) to use for synchronicity data if enabled by `synchronicity = TRUE`
 * [1, *5*, 10, 15]
 * **Note**: If at least one artifact in a commit has been edited by more than one developer within the configured time window, then the whole commit is considered to be synchronous.
-- `pasta`
-* Read and integrate [PaStA](https://github.com/lfd/PaStA/) data with commit and mail data (columns `pasta` and `revision.set.id`)
-* [`TRUE`, *`FALSE`*]
-* **Note**: To include PaStA-based edge attributes, you need to give the `"pasta"` edge attribute for `edge.attributes`.
 
 ### NetworkConf
 

install.R

Lines changed: 1 addition & 0 deletions
@@ -32,6 +32,7 @@ packages = c(
 "parallel",
 "logging",
 "sqldf",
+"data.table",
 "testthat",
 "patrick",
 "ggplot2",
Lines changed: 7 additions & 0 deletions
@@ -0,0 +1,7 @@
+32712;"72c8dd25d3dd6d18f46e2b26a5f5b1e2e8dc28d0";"Add stuff"
+32713;"5a5ec9675e98187e1e92561e1888aa6f04faa338";" Add some more stuff "
+32710;"3a0ed78458b3976243db6829f63eba3eead26774";" I added important things the things are nothing"
+32714;"1143db502761379c2bfcecc2007fc34282e7ee61";" I wish it would work now"
+32715;"418d1dc4929ad1df251d2aeb833dd45757b04a6f";"Wish intensifies"
+32716;"d01921773fae4bed8186b0aa411d6a2f7a6626e6";" ... still doesn't work as expected "
+32711;"0a1a5c523d835459c42f33e863623138555e2526";""
Lines changed: 7 additions & 0 deletions
@@ -0,0 +1,7 @@
+32712;"72c8dd25d3dd6d18f46e2b26a5f5b1e2e8dc28d0";"Add stuff"
+32713;"5a5ec9675e98187e1e92561e1888aa6f04faa338";" Add some more stuff "
+32710;"3a0ed78458b3976243db6829f63eba3eead26774";" I added important things the things are nothing"
+32714;"1143db502761379c2bfcecc2007fc34282e7ee61";" I wish it would work now"
+32715;"418d1dc4929ad1df251d2aeb833dd45757b04a6f";"Wish intensifies"
+32716;"d01921773fae4bed8186b0aa411d6a2f7a6626e6";" ... still doesn't work as expected "
+32711;"0a1a5c523d835459c42f33e863623138555e2526";""
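
The added data files use semicolon-separated fields with quoted strings. The base-R sketch below shows one way such rows could be read and then restricted to hashes that still occur in the commit data, in the spirit of the `cleanup.commit.message.data` function mentioned in `NEWS.md`. The column names `commit.id`, `hash` and `message`, as well as the sample commit data, are assumptions for illustration; coronet's actual reader and cleanup logic may differ.

```r
## Two rows copied from the data file above; the column names are assumptions.
raw = c(
    '32712;"72c8dd25d3dd6d18f46e2b26a5f5b1e2e8dc28d0";"Add stuff"',
    '32713;"5a5ec9675e98187e1e92561e1888aa6f04faa338";" Add some more stuff "'
)
messages = read.table(
    text = raw, sep = ";", quote = "\"", header = FALSE,
    col.names = c("commit.id", "hash", "message"),
    colClasses = c("integer", "character", "character"),
    comment.char = ""
)

## 'strip.white' only affects unquoted fields, so trim quoted text explicitly.
messages[["message"]] = trimws(messages[["message"]])

## Hypothetical commit data that only contains the first hash.
commits = data.frame(hash = "72c8dd25d3dd6d18f46e2b26a5f5b1e2e8dc28d0",
                     stringsAsFactors = FALSE)

## Keep only message rows whose hash is still present in the commit data.
messages.clean = messages[messages[["hash"]] %in% commits[["hash"]], , drop = FALSE]
```

Setting `comment.char = ""` keeps a `#` inside a commit message (for example an issue reference) from being treated as the start of a comment.
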
