-
Notifications
You must be signed in to change notification settings - Fork 15
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Further vertex attributes #92
Comments
While implementing the possibility of active ranges for multiple data sources, the following consistence problem occured: For some calls, NA is added for no active ranges, which is consistent to other vertex attributes, but inconsistent to an empty list, which is added for some other calls and is semantically the best choice.
This leads to inconsistency in the function: if only commits are considered, the active ranges for Klaras commits are NA. If commits and mails are considered, they are an empty list. In my opinion, there are three possibilities:
I think I prefer the third solution. What do you think, @bockthom ? Do you have further ideas or reasons? |
Hm, that is not easy to decide. I see your point, @klaraschlueter.
I don't like solution 2 -- this is way too much overhead in implementation for such a little inconsistency issue. Solution 1 is not the best one -- I agree on that -- but it would be acceptable for me. Regarding solution 3: In the case that we would agree on solution 3, I would do it the other way round: Replace If there is no data available, I would either expect However, in the end, we should think about different interpretations: Is there a difference if the author does not exist in the data or whether (s)he is just inactive? If there would be a difference, can we somehow measure that difference? So, finally, I would go for the reverse version of solution 3 and always return a list of lists 😉 |
So, in summary, the default value should be used in all cases where no data is available for a developer:
And the default value is to be the empty list ( By the way, the default value does not get honored in the function If you need exemplary code on this, I can send you something. |
It is no problem to consider the default value in get.first.activity.data, the required changes are minimal, @clhunsen . However, in all other functions that add vertex attributes, the default value is only used for vertices where no data is available. This is done by passing it to split.and.add.vertex.attribute in the function add.vertex.attribute.first.activity, which is already implemented. Considering the default value instead of NA in get.first.activity.data means to use it in a slightly different case as well: If there is data for a vertice in at least one considered data source, but not in all considered data sources, and take.first.overall is FALSE. If for example the default value is given as list(), authorA was active in mails on 2019-21-01 but not active in commits and authorB was never active, the resulting vertex attributes per author are: In conclusion, I can easily "fix" this, but I'm not sure, if this is, what we want. |
It is possible to decide if an author is non-existent or inactive, if we consider not only the data sources about author activity, but also the graph to add the vertex attributes to. But again, this seems like to much overhead in implementation to me.
I am not sure, if you really think of a list of lists in every case, @bockthom ? In case there is no data for one data source for one author, it should be a simple empty list, shouldn't it? Besides, I will implement the reverse version of solution 3, as you suggested. |
I don't remember how we differentiate between using one data source and multiple data sources. So what was the behavior so far? |
Yes, @bockthom . The type of the vertex attribute only depends on the parameter take.first.over.all.activity.types in the first activity computation and on the equivalent parameter (to be implemented next) in the active ranges computation. If it is FALSE, the attribute is always a list containing for every specified data source the corresponding value. For first.activity, this is a date, for active.ranges, this is a list of ranges. |
With the merge of PR #169, the last item in this issue is fixed. Closing here. 🎉 |
To improve over PR #67, we should consider adding further vertex attributes:
commit.count.author.and.committer
andcommit.count.author.or.committer
(see here, fixed in PR Complement commit count vertex attribute possibilities #127)Related to here, we need to improve the
first.activity
attribute by enabling the user to pass several data sources. (fixed in PR Improve add.vertex.attribute.first.activity #135)Related to here, we need to improve the
active.ranges
attribute by incorporating further data sources. Maybe, we should add a parameteractivity.type
as foradd.vertex.attribute.first.activity
.Also count committers for
add.vertex.attribute.artifact.editor.count
(see here and Correctly incorporate committer data everywhere #84, see PR Add possibility of editor definition to add.vertex.attribute.artifact.editor.count #169)?Currently, we only provide vertex attributes for commits, but we definitely should provide functions to add attributes for e-mails and issues. For some further information, seehere.
(moved into a separate issue, see Add further vertex attributes for artifact vertices #170)
The text was updated successfully, but these errors were encountered: