-
Notifications
You must be signed in to change notification settings - Fork 359
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add backend: ProQuest Federated Search Gateway #3991
base: dev
Are you sure you want to change the base?
Add backend: ProQuest Federated Search Gateway #3991
Conversation
Other than the TODOs, I think this is far enough to be worth a review. Also worth noting that there is no authentication config, it's simply IP-based access. |
* @throws BackendException | ||
* @return void | ||
*/ | ||
public function checkForSRUDiagnosticsError($resultBody) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@sturkel89 pointed out that if you are not on the correct IP range, the error was hidden and there were just zero results. It turns out that SRU likes to return an HTTP 200 no matter what goes wrong; there is a list linked above of dozens of different error codes, including '3' for authentication problems.
This is a general fix to make any such "diagnostic" messages throw an error akin to an HTTP 4xx.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You would think a "diagnostic" message might be benign debug content; but looking at the list, apparently it's always an error of some kind.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be nice to see text suggesting that you don't have access to ProQuest content because you are not on the campus network, but this may be the best we can do for now.
No that's just because we're both running VuFind on our local machines. In a production context, the error would be because the VuFind server was not in ProQuest's configured IP range. Would not matter whether the user had VPN or not.
And FWIW if you turn on development mode in your apache config, the error is more specific (and ugly). I unfortunately can't generate a screenshot of that today because I'm on campus.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can consider this issue resolved, right? No more work to be done?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think so!
Probably best to keep things simple for now; I agree that it would be better to implement (perhaps configurable) base template changes rather than creating a custom ProQuest-specific template. But if what we're doing now is not actively incorrect, it's probably easiest to avoid adding complexity in this initial PR and instead tackle that separately as needed as a follow-up. |
I think this is at a good functional point now.
So I think this is ready for some final reviews and whatever cleanup. |
Thanks for all of the incremental improvements! This is looking a lot better than version 1.0. It's probably just about ready to merge (perhaps pending a response to my question about how date display relates to number of 045 fields). However, I just noticed one other thing that is probably a bug. (I showed Demian and he agrees.) Here it is:
First page of results after viewing a record and then going back to see the results list: Later pages of results after viewing a record and going back to see the results list: |
@sturkel89 This is a good catch. As you and @demiankatz say this is a bug. When I query the underlying API with any I'm going to report the issue to ProQuest, but I'm still waiting on a response to an email from November so not holding my breath. In the meantime I could fix this by doing a duplicate query every time for the facet sidebar, but that seems wasteful when probably 90% of faceting will happen on page 1? Could be convinced otherwise. |
Hard to say what the typical user would do. My preference is to always have the filters available, esp. since PQ can provide such a firehose of results. If the user enters a simple search and gets a huge results set, it may take a couple of screens of browsing for them to realize that they should probably limit the results SOMEHOW. |
I guess there are two ways we could go with this: 1.) If doing the redundant query to fetch the facet data is easy to implement and very fast (e.g. if you set record count to 0 and instantly get summary data back), maybe it's worth doing to make things work better... BUT 2.) If fixing this problem adds complexity or significantly impacts performance, maybe the best course of action is to open a bug-level JIRA ticket documenting the problem so we don't forget it, but giving it some time to stew in case ProQuest can offer a better solution on their end. I suspect option 2 is probably the path of least resistance for now... |
Ok, I found a compromise (that I should have thought of in the first place) to only repeat the search when it's actually needed i.e. when startRecord > 1. Doing the extra search adds about 600ms, which I don't love, but it's not horrible especially if it's only done when necessary. (Setting record count to 0 doesn't appear to speed things up.) |
Well no, I'm doing something wrong now and the offsets are not stable. Looking into it. |
Never mind, now I can't reproduce the issue. |
This seems to be fixed for me! |
Another question/possible bug: When I do a search, then limit results by source, I see that limit across the top of the screen. I'm finding that if I either change the search term or change the search type dropdown (All fields, Title) my limit goes away. Is that intentional? (In Villanova's implementation, these limits are sticky in both the main catalog and in EDS.) |
There are these two settings in searches.ini that control the filter retention behavior for Solr: https://github.com/vufind-org/vufind/blob/dev/config/vufind/searches.ini#L103-L110 Maybe something equivalent is missing in the ProQuest-specific configuration or configuration processing. Let me know if you need me to dig deeper into this, @maccabeelevine! |
Thanks @sturkel89 for catching this. During my early testing I had changed the two config settings that @demiankatz cited. Just restored them to their defaults and it works as expected. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, @maccabeelevine and @sturkel89 -- see below for another round of (mostly very minor) comments.
config/vufind/ProQuestFSG.ini
Outdated
; [NoResultsRecommendations] sections below. | ||
; See the comments above those sections for details on legal settings. You may | ||
; repeat these lines to load multiple recommendations. | ||
;default_top_recommend[] = TopFacets:FacetsTop:Primo |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Stray Primo reference here. Should we just remove this line?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This one is not necessarily wrong (you could??? bring in Primo results at the top) but it's definitely confusing and shout not be there. Removed.
config/vufind/ProQuestFSG.ini
Outdated
|
||
; Checkbox facets are facets, which are getting displayed as checkboxes | ||
; pcAvailability is a special facet. It's used to show all records found in | ||
; PrimoCentral without checking the local availability against a holdingsfile. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This comment also seems to be copied from elsewhere and probably not relevant here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, yes clearly using Primo.ini as a source was not ideal. I've reverted this to (almost) what it says in facets.ini.
config/vufind/ProQuestFSG.ini
Outdated
|
||
; This section shows which search types will display in the basic search box at | ||
; the top of Primo pages. The name of each setting below corresponds with an | ||
; index defined in the Primo API. The value of each setting is the text to |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
More Primo references here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed!
view=full | ||
|
||
; This section provides settings for optional caching of search requests. | ||
[SearchCache] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm pretty sure that your connector here does not support search caching. Am I missing something, or should we remove these settings for now?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be working! It's actually the SRU Connector that uses ConnectorCacheTrait and uses it in search(); looks like I added that in this PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fantastic! Sorry for missing that detail -- didn't look high enough in the class hierarchy.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No worries, I had to check myself anyway, I wrote that part back in October!
; NOTE: This feature is incompatible with SyndeticsPlus content; please use | ||
; regular Syndetics if necessary. | ||
[List] | ||
view=full |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Has anyone tested whether tab/accordion modes work here? Let us know if you'd like @sturkel89 or me to give it a try!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I definitely tested it during development (only added stuff to ProQuest.ini that I tested -- except obviously I copied over a lot of bad Primo comments :) ) but that's definitely not proof it works as designed.
*/ | ||
public function getFacetList($filter = null) | ||
{ | ||
if (null === $this->simplifiedResponseFacets) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we also check here whether any facets are actually turned on? You should be able to check $this->getParams()->getFacetConfig()
and bypass the extra lookup if the Database facet has not been requested.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a good point on efficiency in case someone turns off the Database facet, but I think I can just check $filter passed into this method to see if it's empty?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A null $filter value means "get all configured facets," so you might need to check the config if you get a null value... but you're right that if it's a non-null value, you can look at the filter and not worry about the config.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I implemented the $filter check and it seems to work correctly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah sorry posted my comment before seeing your last one...will review.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you might want to add a && $filter !== null
check in there to be safe; I can't remember under what circumstances the null value is passed in, but if you don't check for null, then the facets will be excluded when somebody asks for all the facets, which seems likely to cause a problem somewhere.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You were right the first time around (I think) -- I swapped out the $filter logic with getFacetConfig
and it seems to work as expected in all cases, including if I hard code $filter = null
they still display.
$this->setRecordCollectionFactory($factory); | ||
} | ||
$this->connector = $connector; | ||
$this->identifier = 'ProQuestFSG'; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor style point: do the extra spaces before =
really serve a purpose here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, but copied from six other Backend classes that do the same :)
* | ||
* PHP version 8 | ||
* | ||
* Copyright (C) Andrew Nagy 2008. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Might want to update this copyright statement. You can certainly give Andrew credit in the author list if this is heavily derived from his original code, but I suspect this is largely your work and justifies a 2025 date.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah that was copy/paste mistake from the SRU connector. I'm leaving this all at 2024 tho.
@@ -51,7 +51,7 @@ | |||
<?php else: ?> | |||
<?php | |||
$journalTitle = $this->driver->getContainerTitle(); | |||
$summDate = $this->driver->getPublicationDates(); | |||
$summDate = $this->driver->getHumanReadablePublicationDates(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Have you done any analysis to be sure this change won't have weird side effects outside of ProQuest? (I haven't looked closely, just double-checking that this is safe and intentional since it impacts a base template; happy to dig in deeper if you'd like me to invest the time).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah fair point, I did some quick display of Solr results and didn't see any difference, and I figured if there was going to be a difference, it would be a good once since this is a template and so the 'human readable' thing presumably applies. But that's not proof, and if it's safer I can roll it back and adjust the new driver accordingly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll take a closer look at this tomorrow with a fresh brain and see how concerned I feel about it. ;-)
@@ -0,0 +1,32 @@ | |||
<?php |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like the bootstrap3 templates you added for this PR didn't get deleted when you merged in the removal of that theme. We should get rid of them now!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
D'oh! When I merged in dev I saw all of the deletions of the existing bootstrap3 templates I had edited, but didn't think to also remove the new ones. Fixed.
…l $filter means display all of them
The ProQuest Federated Search Gateway is an SRU API for searching across research databases licensed via ProQuest. It's old (docs last updated 2016) but still works and per ProQuest customer service:
On the plus side, it has a fairly rich CQL syntax for search. On the minus, there are no facets offered other than the constituent databases.
Implementation is patterned after the soon-to-be-deleted WorldCat backend as they both use SRU.
TODO