Add functionalities to analyze the content of commit messages #281

Open
wants to merge 18 commits into
base: dev
8 changes: 8 additions & 0 deletions README.md
@@ -37,6 +37,7 @@ If you wonder: The name `coronet` derives as an acronym from the words "configur
- [Core/Peripheral classification](#coreperipheral-classification)
- [Count-based metrics](#count-based-metrics)
- [Network-based metrics](#network-based-metrics)
+- [Commit-message functionalities](#commit-message-functionalities)
- [How-to](#how-to)
- [File/Module overview](#filemodule-overview)
- [Configuration classes](#configuration-classes)
@@ -147,6 +148,9 @@ Alternatively, you can run `Rscript install.R` to install the packages.
- `Matrix`: For sparse matrix representation of large adjacency matrices (package version `1.3.0` or higher is required)
- `fastmap`: For fast implementation of a map
- `purrr`: For fast implementation of a mapping function
+- `tm`: For NLP tasks used on commit messages
+- `textstem`: For lemmatization of commit messages
+- `SnowballC`: For text stemming, used by NLP package `tm`

### Submodule

@@ -429,6 +433,10 @@ In this section, we provide descriptions of the different algorithms we provide
* calculates scores based on the eccentricity of vertices in a network
* eccentricity measures the length of the shortest path to each vertex's furthest reachable vertex

+#### Commit-message functionalities
+
+In this section, we give an overview of the functionalities we offer for the textual analysis of commit messages. These comprise basic NLP tasks, namely stemming (`get.stemmed.commit.messages`), tokenization (`get.tokenized.commit.messages`), and lemmatization (`get.lemmatized.commit.messages`), as well as preprocessing (`get.preprocessed.commit.messages`), which covers lowercase transformation and the removal of punctuation, stopwords, and extra whitespace. In addition, you can search commit messages for a set of strings to obtain the matching commits (`get.commit.messages.by.strings`) and obtain token counts for commit messages (`get.commit.message.counts`).
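
The steps named in this paragraph can be sketched directly with the newly added packages. The following is a minimal illustration under stated assumptions, not the implementation from this pull request: the signatures of the `get.*.commit.messages` functions are not shown here, so the sketch calls `tm`, `SnowballC`, and `textstem` directly. The sample messages and the helpers `preprocess.messages` and `matches.any` are hypothetical.

```r
## Hypothetical sample data; not taken from any real repository
messages = c("Fixed the  parser bug!", "Add tests for the network builder.")

## Hypothetical helper: lowercase, then remove punctuation, stopwords, and extra whitespace
preprocess.messages = function(msgs) {
    msgs = tolower(msgs)
    msgs = tm::removePunctuation(msgs)
    msgs = tm::removeWords(msgs, tm::stopwords("english"))
    msgs = tm::stripWhitespace(msgs)
    return(trimws(msgs))
}

preprocessed = preprocess.messages(messages)

## Tokenization: split each preprocessed message at single spaces
tokens = strsplit(preprocessed, " ", fixed = TRUE)

## Token counts per message
token.counts = lengths(tokens)

## Stemming with the Snowball stemmer, applied token by token
stemmed = lapply(tokens, SnowballC::wordStem, language = "english")

## Lemmatization of whole strings using the default dictionary of 'textstem'
lemmatized = textstem::lemmatize_strings(preprocessed)

## Hypothetical helper: TRUE if 'msg' contains at least one of 'strings' literally
matches.any = function(msg, strings) {
    return(any(vapply(strings, function(s) grepl(s, msg, fixed = TRUE), logical(1))))
}

## Keep only the messages that contain one of the search strings
matching = Filter(function(m) matches.any(m, c("parser", "typo")), messages)
```

Stemming and lemmatization are kept as separate steps here because they generally produce different results: a stemmer truncates word endings by rule, whereas a lemmatizer maps words to dictionary forms.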

### How-to

In this section, we give a short example on how to initialize all needed objects and build a bipartite network.
16 changes: 11 additions & 5 deletions install.R
@@ -19,7 +19,7 @@
## Copyright 2020-2024 by Thomas Bock <[email protected]>
## Copyright 2019 by Anselm Fehnker <[email protected]>
## Copyright 2021 by Christian Hechtl <[email protected]>
-## Copyright 2024 by Leo Sendelbach <[email protected]>
+## Copyright 2024-2025 by Leo Sendelbach <[email protected]>
## Copyright 2024 by Maximilian Löffler <s8@[email protected]>
## All Rights Reserved.
##
@@ -48,6 +48,9 @@ packages = c(
"Matrix",
"fastmap",
"purrr",
+"tm",
+"textstem",
+"SnowballC",
"testthat",
"patrick",
"covr"
@@ -76,11 +79,14 @@ if (length(p) > 0) {
}

Matrix.version = installed.packages()[rownames(installed.packages()) == "Matrix", "Version"]
-if (compareVersion(Matrix.version, "1.3.0") == -1) {
-    print("WARNING: Matrix version 1.3.0 or higher is necessary for using coronet. Re-install package Matrix...")
-    install.packages("Matrix", dependencies = NA, verbose = TRUE, quiet = TRUE)
+if (compareVersion(Matrix.version, "1.5.0") == -1) {
+    print("WARNING: Matrix version 1.5.0 or higher is necessary for using coronet. Re-install package Matrix...")
+    matrix.1.5.0.url = "https://cran.r-project.org/src/contrib/Archive/Matrix/Matrix_1.5-0.tar.gz"
+    install.packages(matrix.1.5.0.url, repos = NULL, dependencies = NA, verbose = TRUE, quiet = TRUE)
+    ## redo installation of textstem, which fails if matrix is outdated or not present
+    install.packages("textstem", dependencies = NA, verbose = TRUE, quiet = TRUE)
    Matrix.version = installed.packages()[rownames(installed.packages()) == "Matrix", "Version"]
-    if (compareVersion(Matrix.version, "1.3.0") == -1) {
+    if (compareVersion(Matrix.version, "1.5.0") == -1) {
        print("WARNING: Re-installation of package Matrix did not end up in the necessary package version.")
    }
}
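
The version check in this hunk relies on `compareVersion` from base R's `utils` package, which returns `-1`, `0`, or `1` and splits version strings at both `.` and `-`. This matters because `Matrix` versions are commonly written in the `1.5-0` style, while the check uses `"1.5.0"`. The following minimal sketch illustrates that behavior in isolation; the helper `is.version.too.old` is hypothetical and not part of `install.R`.

```r
## Hypothetical helper, not part of install.R:
## TRUE if 'installed' is an older version than 'required'
is.version.too.old = function(installed, required) {
    return(utils::compareVersion(installed, required) == -1)
}

older = is.version.too.old("1.3-4", "1.5.0")  # older than the required version
equal = is.version.too.old("1.5-0", "1.5.0")  # "1.5-0" and "1.5.0" compare as equal
newer = is.version.too.old("1.6-1", "1.5.0")  # newer than the required version

print(c(older = older, equal = equal, newer = newer))
```

Only the first call yields `TRUE`, so a re-installation as in the hunk above would be triggered solely for versions strictly below the required one.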