Skip to content
This repository has been archived by the owner on Jul 6, 2020. It is now read-only.

Article sorting of feeds in folders messed up #836

Open
t0mcat1337 opened this issue Aug 13, 2015 · 29 comments
Open

Article sorting of feeds in folders messed up #836

t0mcat1337 opened this issue Aug 13, 2015 · 29 comments

Comments

@t0mcat1337
Copy link

Hi everybody,

I have multiple feeds organized in folders. When clicking on the folder title all (unread) articles of all feeds in this folder are displayed as expected, BUT: the sort order (newest to oldest) is messed up... (see screenshot).
sortorder

Bug or feature? ;)

Environment:
Owncloud 8.1.1 (stable)
News 5.3.9
Browser: Firefox 40 (or any other)
PHP 5.5.9

Greetings
T0mc@

@BernhardPosselt
Copy link
Contributor

Feature, articles are sorted by date added (id) which is a requirement for fast auto paging. This happens only in the beginning, afterwards it should basically be sorted by date.

Search the bug tracker for a full explanation ;)

@t0mcat1337
Copy link
Author

So I got u right that articles are not really sorted by date (like a "Select * from ...order by date DESC") when they r displayed but only when they r inserted to the DB what makes the primary key (I think that is what u mean with ID) the sorting thing?
If this is right I see that sorting as I would expect in folders is excluded by this design right now.
So it is absolutly impossible to make sorting more flexible? Adding and/or create an Index to the date field in the DB is not an option?

I could imagine that more people r confused about this behaviour as they see articles not sorted by date despite the app announces them to be

@BernhardPosselt
Copy link
Contributor

see #115

The issue is basically that the offset is invalidated when you mark items read. The only solution I can see is to load everything which would defeat the purpose of autopaging and make the app very slow

@BernhardPosselt
Copy link
Contributor

#812 (comment)

@t0mcat1337
Copy link
Author

I'm currently trying to dig into the news app code in special and Ownclouds AppFramework in common as I have common experiences in PHP/Ajax aso. development for a couple of years now.
Actually I'm thinking of something like a concatenated ID Field in the "_news_items" Table.. somehow this way: New_ID = "pub_date-id" ('1439996640-237113') which could be used as the unique id field itself or just something like an "orderHelper"
Could this be a valid approach?

I try to make those changes by myself actually but as I've just started right now I'm fighting a little with the AppFramework ("findEntities" ;) )

@BernhardPosselt
Copy link
Contributor

Yes, would be a valid approach :)

@BernhardPosselt
Copy link
Contributor

GZ :)

@BernhardPosselt
Copy link
Contributor

Hm, basically you could also

ORDER BY pub_date, id DESC

I've tried that once, however it didn't work out. Can't remember the catch anymore

@BernhardPosselt
Copy link
Contributor

Ah right, that issue was the offset because you can't do:

WHERE pub_date < 123 OR IF pub_date EQUAL < id

Your approach would solve that

@BernhardPosselt
Copy link
Contributor

Some thoughts: since you are combining 2 64 bit long numbers (20 digits), you'd need to pad them with zeros on both ends to avoid a string sorting like this:

  • 1
  • 10
  • 2
  • 3
  • 30
  • 41
  • 5

Combining both padded numbers would therefore turn into a 40 characters long string. Your example (1439996640-237113) would therefore need to become 0000000000143999664000000000000000237113.

@BernhardPosselt
Copy link
Contributor

findEntities does nothing more than map the result from the sql query (assoc array) onto the data objects by using the entity's fromRow method: https://github.com/owncloud/core/blob/master/lib/public/appframework/db/mapper.php#L318

To test your stuff, add a new 40 characters wide text field much like the guid_hash https://github.com/owncloud/news/blob/master/appinfo/database.xml#L211 to the same file and increase the version number to trigger the database migration (e.g. 6.0.2): https://github.com/owncloud/news/blob/master/appinfo/info.xml#L10 :

<field>
    <name>sort_order</name>
    <type>text</type>
    <length>40</length>
    <notnull>false</notnull>
</field>

Then add the field in pascalCase to the item entitiy: https://github.com/owncloud/news/blob/master/db/item.php#L59 :

protected $sortOrder

Finally each time when a new item is inserted:

you'd also need to update the field, e.g.:

$item = $this->itemMapper->insert($item);
$item->generateSortOrder();
$item = $this->itemMapper->update($item);

the generateSortOrder method would ofc have to be created on the Item class first and would look something like this:

public function generateSortOrder() {
    $this->sortOrder = str_pad($this->pubDate, 20, STR_PAD_LEFT) . str_pad($this->id, 20, STR_PAD_LEFT);
}

Once you hit the JS part, notify me

@t0mcat1337
Copy link
Author

sorry for my delay but I was late at home yesterday ;)
Lets see what u wrote:

Hm, basically you could also
ORDER BY pub_date, id DESC
I've tried that once, however it didn't work out. Can't remember the catch anymore

This was the first thing I tried, but then the articles weren't sorted by date in the GUI despite they where, when I fired the resulting SQL directly to the DB. So I thought that there must be some mechanism which sorts the results from the DB afterwards (the JSON Object - as seen in Firebug - was sorted by ID) which tooks me to the "findEntities"...

Some thoughts: since you are combining 2 64 bit long numbers (20 digits), you'd need to pad them >with zeros on both ends to avoid a string sorting like this:

1
10
2
3
30
41
5

Combining both padded numbers would therefore turn into a 40 characters long string. Your >example (1439996640-237113) would therefore need to become >0000000000143999664000000000000000237113.

If one don't need a seperator (like the "-") couldn't this field be a BIGINT? And as the first part is the unix epoch which will always be the size of 10 digits (for a very long time ;) ) will this really be necessary?

findEntities does nothing more than map the result from the sql query (assoc array) onto the data >objects by using the entity's fromRow method: https://github.com/owncloud/core/blob/master
/lib/public/appframework/db/mapper.php#L318

So "findEntities" doesn't do some sort of sorting? Like "sort by the first field in the resulting dataset"? I'm still wondering why the articels in the GUI where sorted different, than when firing directly to the DB (as described above)...

Thanks a lot for the little documentation about how using this stuff. As I said, never worked with this before ;)
Then lets give it a try (hoping I will manage this ;) )

@BernhardPosselt
Copy link
Contributor

BIGINT is usually a 64bit number, check mysql docs:

BIGINT[(M)] [UNSIGNED] [ZEROFILL] A large integer. The signed range is -9223372036854775808 to 9223372036854775807. The unsigned range is 0 to 18446744073709551615.

@BernhardPosselt
Copy link
Contributor

@BernhardPosselt
Copy link
Contributor

@t0mcat1337
Copy link
Author

sooo... what I managed so far:

1.) Inserted a new field "sort_helper" to the table oc_news_items (BIGINT, Not signed)
2.) Manually updated this field, setted it to: CONCAT(pub_date,id) for the whole table. Looks like this now f.e.: '1439969520236639'
3.) Made the XML changes u mentioned above and so forced an OC Update
4.) Changed the SQL in "makeSelectQuery" to : ....'ORDER BY items.sort_helper ' . $ordering;
5.) Debug "findAllFolder" by inserting some fwrites and found out, that the resulting object of "findEntitiesIgnoringNegativeLimit" now is in the right chronological order, as the SQL would assume.
6.) Debugged the same way step by step till the index function of the itemcontroller. At this point the sorting is still correct as in the SQL.

BUT: the articles (items) in the GUI stay NOT sorted in chronological order though. So I assume, now something on the client side (some js) is responsible for this re-sorting, right?

@t0mcat1337
Copy link
Author

I just realized, that the "itemController" correctly includes the new field "sortHelper":

.
.
[items] => Array
        (
            [0] => OCA\News\Db\Item Object
                (
                    [guidHash:protected] => 4b4b121f7ac464f0a23ea0e5b56934ef
                    [guid:protected] => 4b4b121f7ac464f0a23ea0e5b56934ef
                   .
                   .
                    [pubDate:protected] => 1439969520
                   .
                   .
                    [lastModified:protected] => 1439973912
                    [sortHelper:protected] => 1439969520236639
                   .
                   .
                )

BUT the JSON Object in Firefox doesn't??

In between times I simply tried to change these lines:

https://github.com/owncloud/news/blob/master/templates/part.content.php#L24
and
https://github.com/owncloud/news/blob/master/js/controller/ContentController.js#L89

from "id" to "pubDate" (just to give it a try ;) ) but of course this didn't work... I got an ugly JS Error concerning something in Angular that I'm far away from understanding.

@t0mcat1337
Copy link
Author

Ok.... after fighting a little with this Angular stuff without any really success I simply deleted this:

orderBy:[Content.orderBy()] track by item.id"

here
https://github.com/owncloud/news/blob/master/templates/part.content.php#L24

and now - together with my formerly mentioned changes - the GUI seems to respect the chronological order from the origin SQL...

Don't really know if this is a really valid solution... but could be an approach? :S
Will hold an eye on it and see how it now behaves...

@t0mcat1337
Copy link
Author

Perhaps simplyfied this a little:
By changing the SQL in ItemMapper.makeSelectQuery this way:

...ORDER BY CONCAT(items.pub_date,items.id)

there seems to be no need of an extra field in the DB...

@BernhardPosselt
Copy link
Contributor

Concat could work but you'd still need to pad it with zeroes and I think the sorting by varchar is probably already slow enough.

As for the GUI the orderBy is probably not needed since everything gets reset on reload and we add new stuff in order.

@t0mcat1337
Copy link
Author

A right, CONCATs result is a string.. ok.. then lets cast it before sorting like this:

order by cast(concat(pub_date,id) as UNSIGNED)

Tested it in a little demo table with 2 integer fields and numbers with different length. Should work....

@BernhardPosselt
Copy link
Contributor

Well, that won't work since you're overflowing ;)

Like I said you need to use a varchar, but you need to pad it with 0 to deal with the unwanted string sorting algorithms.

@t0mcat1337
Copy link
Author

Overflowing could be indeed an impact... but IMHO just theoretically because:
Unsigned BIGINT in MySQL can be max

18.446.744.073.709.551.615

In my example the casted INT will be:

1.439.969.520.236.639

far away from the maximum.

When I'm thinking right, the last valid ID in the item table could then be

9.999.999.999

Reaching the ID 10.000.000.000 will result in an overflow.

Is it realistic that this will ever happen?

Verified this with my test table (one INT field as PK named 'ID', two BIGINT fields named 'inta' and 'intb') with a few rows of example data like occuring in the oc_news_items. This is my result:

SELECT cast(concat(inta,intb) as UNSIGNED) t,inta,intb 
FROM test
order by t

results in :

t inta intb
144014847223 1440148472 23
1440148472238396 1440148472 238396
1440148472238397 1440148472 238397
14401484722383970 1440148472 2383970
14401484729999999999 1440148472 9999999999
18446744073709551615 1440148472 10000000000

As u can see, the concat overflows in the last row, giving the max possible BIGINT value as result...

Whart do u think about that?

@BernhardPosselt
Copy link
Contributor

Yes an overflow is actually possible on larger installations that run more than 5 years. Out of the roughly 20 digits 10 are occupied for Unix time. The rest is basically 10 billion possible id's. Let's say the install runs 10 years, that's 120 months, so we're left with 80 million ids per month. Assume an install with 20000 users, then you're left with 4000 articles per month and per user. Let's say an average feed has 500 articles per month then only having 8 feeds per user would be enough to make it overflow in 10 years. If there are more feeds than it even happens sooner

@BernhardPosselt
Copy link
Contributor

Another thing: I did some calculations a while ago and I realized that 32 bit integer ids were too little so I upped to 64 bit. Given that unixtime overflows a 32bit integer in 2038 (google 2038 year problem) you can see that this is a bad idea ;D

@t0mcat1337
Copy link
Author

Yes an overflow is actually possible on larger installations that run more than 5 years.
Out of the roughly 20 digits 10 are occupied for Unix time. The rest is basically 10 billion possible id's. Let's say the install runs 10 years, that's 120 months, so we're left with 80 million ids per month. Assume an install with 20000 users, then you're left with 4000 articles per month and per user. Let's say an average feed has 500 articles per month then only having 8 feeds per user would be enough to make it overflow in 10 years. If there are more feeds than it even happens sooner

Which leads me to my origin quistion: Is such a scenario really realistic? 10 years run time? Without any re-installation? 20000 users? each with 500 articles per month?
What brings me to another question: What about houskeeping? Will the articles stay in the table for ever? When there r really billions of datasets in this table, won't there be other problems to be expected as just "a little overflow during cast"?
Or are they DELETEd after beeing read an amount of time later? In this case isn't there a possibility to re-use IDs in MySQL? Can't really remember...

Of course dealing with VARCHAR is a valid way, I'm with u, but it's a little more complicated as u correctly mentioned (filling with 0). I like the KISS principle a lot (Keep it simple, stupid) ;)

As u r the boss of this (btw. fantastic) OC-App u decide which way to go... ;) Just wanted to share my thoughts

@t0mcat1337
Copy link
Author

Another thing: I did some calculations a while ago and I realized that 32 bit integer ids were too little so I upped to 64 bit. Given that unixtime overflows a 32bit integer in 2038 (google 2038 year problem) you can see that this is a bad idea ;D

lol Absolutly ... I wish u, that ur app will be used much longer than 2038 and then this is something what REALLY will happen, with no discussion

@BernhardPosselt
Copy link
Contributor

Yes, scenario is realistic, I know of installs with 10000 users :)

It's better to keep this in mind than to deal with weird bug reports later on ;)

@t0mcat1337
Copy link
Author

Yes, scenario is realistic, I know of installs with 10000 users :)

Ah really? Big GZ for this!

It's better to keep this in mind than to deal with weird bug reports later on ;)

Yes, absolutly correct... just wanted to get an idea of how much the probability is for hitting the overflow... if it where REALLY little it won't be really useful to make more/complicate work than necessary (which then could again include other bugs ;) )

Nevertheless...how does housekeeping work?

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests

2 participants