Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

create GET /ideas?filter[highlyRated]=bottomRateLimit #56

Merged
merged 1 commit into from
Apr 5, 2018

Conversation

agatatalita
Copy link
Member

@agatatalita agatatalita commented Mar 31, 2018

AQL query takes me 90% of the time but hopefully, it will get faster.

Hope it will not make much mess. I created it from commented by and was rebasing while making changes on other branches, still got some conflicts.

@agatatalita
Copy link
Member Author

This is a great work.
But are you, @mrkvon , comfortable with closing it eventually and merging it to production?

Copy link
Member

@mrkvon mrkvon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good work @agatatalita.
There are a few minor proposed changes which we could discuss.
I'd be happy to implement those which you'd approve, if you're not motivated to do so.

async function getIdeasHighlyRated(req, res, next) {
try {
// gather data
const { page: { offset = 0, limit = 5 } = { } } = req.query;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, we should definitely put these values to a config (perhaps different PR)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This I am comfortable with doing in some pull request, maybe when ideas are done (for this sprint)?

apidoc.raml Outdated
@@ -373,6 +373,7 @@ types:
- new ideas: `?sort=-created`
- ideas with provided creators: `?filter[creators]=username0,username1,username2`
- ideas commented by provided users: `?filter[commentedBy]=username0,username1,username2`
- highly rated ideas with minimum rate parameter: `?filter[highlyRated]=bottomValueLimit`
Copy link
Member

@mrkvon mrkvon Apr 2, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. What is the use case for bottomValueLimit limit? We sort the output by sum of votes anyways?

  2. Could i.e. highlyVoted be clearer naming than highlyRated? And talk about sum of votes rather than rate which doesn't carry any specific meaning?

Regarding point 2, I suggest using highlyVoted instead of highlyRated, and voteSum instead of rate throughout this PR.
If @agatatalita feels in favor of these changes but not motivated to do them, i'll do.

Copy link
Member Author

@agatatalita agatatalita Apr 2, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand the first question. Is it a separate question?
I don't understand the second question.

Is the naming part separate from the rest? If so, we can leave it and I will change the naming for the one you suggested and focus on the two first @mrkvon .

Copy link
Member

@mrkvon mrkvon Apr 2, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand the first question. Is it a separate question?
I don't understand the second question.

Edited the comment to be clearer.

I want to understand a use case for different bottomValueLimit. I probably need an example when it would be useful to have the limit.
Why?

  1. We will always get the highest voted ideas on page 1. It's up to user/frontend how far into the pages they want to go.
  2. The limit different than 0 doesn't have a constant impact (i.e. at the beginning 5 can be high, but later (when everybody uses ditup and votes) 100 can be not enough)

Is the naming part separate from the rest?

Yes.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. We would want to list 'the most popular' ideas which would be those with minimum 100 votes. I understand it would give us the similar result to 'get 20 first ideas most voted' but maybe we really want to list just 100+ ideas or ideas with positive votes or ideas which got more interest than ours.

// set bottom limit of rate
FILTER rate >= @rateBottomLimit
// find creator
LET c = (DOCUMENT(id.creator))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note for self: we should DRY this idea enriching. It's repeated over and over in queries for ideas.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't necessarily agree here. Sometimes it is nice to have all the code in one place where you can look at it and not look for it in many files.

// get sum of each idea's votes values
AGGREGATE rate = SUM(vote.value)
// set bottom limit of rate
FILTER rate >= @rateBottomLimit
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this not redundant, if we sort by rate anyways? See the comment above (at .raml file).

Copy link
Member Author

@agatatalita agatatalita Apr 2, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We sort by rate but we may not be willing to show ideas with 0 or lower than zero rates.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

May we actually want to restrict people from seeing low voted ideas? Why?

Well, maybe it doesn't matter. I might be just bikeshedding.

Might be a cliche, but: we should make things as simple as possible, but not simpler.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it is sometimes not necessary to discuss things like this and just let them be if it is really one documented variable which if clear what is it for.
And dry for being dry or simple for being simple, not for saving time and energy for a different work.
Is there any case this variable can be harmful?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. I totally understand your point @agatatalita. I'm nitpicking here.
I don't understand the strong resistance though.

I believe if something can be simplified, it should be. I'm willing to make the changes myself, only if I get approval for them.

Unless we leak some information, the read-only can't be harmful.
The only meaningful value is 0, unless we want to add a user interface for this, which I doubt. (Or do we?) And then it limits our ability to see negatively voted ideas in this list.

If you're still not convinced, I'm leaving this discussion and we keep it as it is.

Copy link
Member Author

@agatatalita agatatalita Apr 2, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I gave a few examples when it can be useful. Why would I be writing it again later if I was able to put the parameter now or that was my imagination of the use case?

Generally also the name highlyVotedIdeas can be confusing. Because we are actually listing all of the ideas, not just highly voted.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the name highlyVotedIdeas can be confusing

This is true, actually. I haven't thought about this. Maybe it should be ?sort=-voteSum or similar. Then we could also have ?sort=voteSum to find the absolutely worst ideas. :)
Maybe also random should be ?sort=random.
Not sure if this matters. :)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe all of this filtering and sorting will be rewritten at some point (because it is quite bothering to be not able to connect sort with filters and I believe people may expect this - so there will be one filtering sorting function or some function for creating queries probably?).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@agatatalita I apologize for the fuss around this. I realize it really is a minor thing. It just takes time for it to fit my rigid mindset. :)

I don't know what future holds, but indeed some nice generalizing refactor may be beneficial at some point.

@@ -7,7 +7,8 @@ const config = require(path.resolve('./config'));

const ideaSerializer = new Serializer('ideas', {
id: 'id',
attributes: ['title', 'detail', 'created', 'creator', 'ideaTags'],
// TODO where to put rate in this serialization
attributes: ['title', 'detail', 'created', 'creator', 'ideaTags', 'rate'],
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have an impression that rate or whatever we want sum of votes to be called, should belong to meta. See the related issue #59.

It can be implemented with jsonapi-serializer something like this:

{
  dataMeta {
    voteSum(record, current) {
      return current.rate;
    }
  }
}

The examples are here: https://github.com/SeyZ/jsonapi-serializer/blob/master/test/serializer.js#L254
Are you opinionated on this, @agatatalita ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, I actually wanted to include it to meta.

FOR vote IN votes
FILTER idea._id == vote._to
// group by idea id
COLLECT id = idea
Copy link
Member

@mrkvon mrkvon Apr 2, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Naming idea id may be confusing. How about i or collectedIdea or renaming idea to i above and using idea here? Just id is confusing with actual id of something.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, to be changed.

static async findHighlyRated(rateBottomLimit, { offset, limit }) {
const query = `
FOR idea IN ideas
FOR vote IN votes
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another way of writing this may be

FOR idea IN ideas
  LET ideaVotes = (FOR v, e IN 1..1 INBOUND idea, INBOUND votes RETURN e) // graph traversal and assigning to ideaVotes variable
  LET voteSum = SUM(ideaVotes[*].value) // simplifying aggregation
  ...

I'd really like to understand which of these would have better performance, if any difference. Now I simply don't know.

This feels a bit more efficient to me, but it's hard to judge without the internal knowledge of Arango. Perhaps we could ask on stackoverflow.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So there is a need for performance tests - I will happily test it. Also may ask the question on the stack - after tests. Probably also you can use 'explain' for AQL and this may give some information.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I also believe the current query will filter out all the ideas which haven't received any votes. So at least a subset of my proposed changes should be applied.

i.e. not FOR idea IN ideas FOR vote IN votes... but FOR idea IN ideas LET ideaVotes=(*subquery*)...

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

'..depends on the one hand which queries you want to perform and the amount of data you have. The general rule of thumb is, that if you do not know the depth of you search, then a graph traversals are likely to be faster than joins. And if you have a known or fixed depth joins are normally faster.' from Arango slack regarding JOINS vs. GRAPHS

And this 'LEFT JOIN' case can be actualy easier to do with graph, I will look at it.

@mrkvon
Copy link
Member

mrkvon commented Apr 2, 2018

Why should I be not comfortable with eventually merging this to master, @agatatalita ?

Copy link
Member

@mrkvon mrkvon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@agatatalita if you approve my fixes, please squash and merge to production. :) I'm happy with this PR as it is.

	* create GET /ideas?filter[highlyRated]=bottomRateLimit

	* renaming rate => voteSum, putting voteSum to meta in response

	* rename + fix in query (we were ignoring ideas with 0 votes before)
@agatatalita agatatalita merged commit 45c7114 into master Apr 5, 2018
@mrkvon
Copy link
Member

mrkvon commented Apr 5, 2018

🎉

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants