Scopus400Error: Error translating query - Refining results with "source title" query argument #89

Open
FabianEUR opened this issue Mar 13, 2024 · 6 comments
Labels
bug Something isn't working

Comments

@FabianEUR

Hi,

Is it possible to refine/process publications from Scopus limited to source titles containing a specified keyword? For example, my query (and variations of it) gives me the above error after refining roughly 2,000 publications:

( TITLE-ABS-KEY ( "recommend* sys*" OR "recommend* servi*" ) AND SRCTITLE ( "comput*" OR "acm" ) )

I had a look at the API reference and the existing issues, but I had some trouble finding an answer to my question.

Thank you

@stijnh
Member

stijnh commented Mar 13, 2024

Thanks for using litstudy!

I cannot see the error. The query looks fine. Does the query work if you use it on the Scopus website?

@FabianEUR
Author

FabianEUR commented Mar 13, 2024

The query works on the Scopus website, and I can find and export the publications. I've tried variations (without quotation marks, with and without brackets, with only one keyword, with and without wildcards, etc.) but always get the same error.

Here is the error:

Scopus400Error                            Traceback (most recent call last)
Cell In[2], line 7
      4 import logging
      5 logging.getLogger().setLevel(logging.CRITICAL)
----> 7 docs_scopus, docs_not_found = litstudy.refine_scopus(docs_scopus)
      8 print(len(docs_scopus), "papers found on Scopus")
      9 print(len(docs_not_found), "papers NOT found on Scopus")

File ~\AppData\Roaming\Python\Python311\site-packages\litstudy\sources\scopus.py:248, in refine_scopus(docs, search_title)
    244                     return ScopusDocument.from_eid(record.eid)
    246     return None
--> 248 return docs._refine_docs(callback)

File ~\AppData\Roaming\Python\Python311\site-packages\litstudy\types.py:53, in DocumentSet._refine_docs(self, callback)
     50 old_docs = []
     52 for i, doc in enumerate(progress_bar(self.docs)):
---> 53     new_doc = callback(doc)
     55     if new_doc is not None:
     56         new_indices.append(i)

File ~\AppData\Roaming\Python\Python311\site-packages\litstudy\sources\scopus.py:236, in refine_scopus.<locals>.callback(doc)
    234 if len(title) > 10 and search_title:
    235     query = f"TITLE({title})"
--> 236     response = ScopusSearch(query, view="STANDARD", download=False)
    237     nresults = response.get_results_size()
    239     if nresults > 0 and nresults < 10:

File ~\AppData\Roaming\Python\Python311\site-packages\pybliometrics\scopus\scopus_search.py:206, in ScopusSearch.__init__(self, query, refresh, view, verbose, download, integrity_fields, integrity_action, subscriber, **kwds)
    204 self._query = query
    205 self._view = view
--> 206 Search.__init__(self, query=query, api='ScopusSearch', count=count,
    207                 cursor=subscriber, download=download,
    208                 verbose=verbose, **kwds)

File ~\AppData\Roaming\Python\Python311\site-packages\pybliometrics\scopus\superclasses\search.py:62, in Search.__init__(self, query, api, count, cursor, download, verbose, **kwds)
     59 self._cache_file_path = get_folder(api, self._view)/stem
     61 # Init
---> 62 Base.__init__(self, params=params, url=URLS[api], download=download,
     63               api=api, verbose=verbose)

File ~\AppData\Roaming\Python\Python311\site-packages\pybliometrics\scopus\superclasses\base.py:66, in Base.__init__(self, params, url, api, download, verbose, *args, **kwds)
     64         self._json = loads(fname.read_text())
     65 else:
---> 66     resp = get_content(url, api, params, *args, **kwds)
     67     header = resp.headers
     69     if ab_ref_retrieval:

File ~\AppData\Roaming\Python\Python311\site-packages\pybliometrics\scopus\utils\get_content.py:116, in get_content(url, api, params, **kwds)
    114         except:
    115             reason = ""
--> 116     raise errors[resp.status_code](reason)
    117 except KeyError:
    118     resp.raise_for_status()

Scopus400Error: Error translating query

--

I'm using Jupyter Notebook and get the same error both via the university VPN and on campus.

stijnh added the bug label on Mar 13, 2024
@stijnh
Member

stijnh commented Mar 13, 2024

This looks like a bug. litstudy searches Scopus for each paper by its title using the query "TITLE({title})", but for certain titles this produces a query with syntax that Scopus rejects. This will need further investigation.

However, I don't really understand the line litstudy.refine_scopus(docs_scopus). You have loaded documents from Scopus into docs_scopus and then want to refine them again using Scopus? Or do you load the original documents from a file?

@FabianEUR
Author

Ahh, maybe that explains why the refining always only works until a certain publication before the error appears.

I exported the .csv from Scopus and loaded the file into docs_scopus and then refined them. Is this only meant to be done for non-Scopus datasets?

@stijnh
Member

stijnh commented Mar 14, 2024

> Ahh, maybe that explains why the refining always only works until a certain publication before the error appears.

Indeed. If you could figure out which publication it fails on, you can remove that one from the dataset as a temporary solution.

> I exported the .csv from Scopus and loaded the file into docs_scopus and then refined them. Is this only meant to be done for non-Scopus datasets?

That is fine: if you load it from a CSV file, it indeed makes sense to refine it afterwards. The function refine_scopus should work on any dataset from any source. It fails here because of a bug :-(
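
To find the publication that triggers the error, one option is to send each title's query on its own and record the ones that raise. The sketch below is not part of litstudy: `find_failing_titles` and the `search` callable are hypothetical names, and `fake_search` only imitates a rejection so the example runs offline. In a real session, `search` could wrap the call shown in the traceback, `ScopusSearch(query, view="STANDARD", download=False)`.

```python
from typing import Callable, Iterable, List, Tuple


def find_failing_titles(
    titles: Iterable[str],
    search: Callable[[str], object],
) -> List[Tuple[int, str, str]]:
    """Return (index, title, error message) for every title whose query raises."""
    failures = []
    for index, title in enumerate(titles):
        query = f"TITLE({title})"  # same query shape as in the traceback
        try:
            search(query)
        except Exception as exc:  # e.g. Scopus400Error from pybliometrics
            failures.append((index, title, str(exc)))
    return failures


def fake_search(query: str) -> None:
    """Offline stand-in (assumption): rejects queries containing a superscript two."""
    if "²" in query:
        raise ValueError("Error translating query")


titles = [
    "A survey of recommender systems",
    "Research on the number of prime numbers between n² and (n+1)²",
]
for index, title, message in find_failing_titles(titles, fake_search):
    print(index, title, message)
```

The reported indices refer to positions in the list of titles, so the matching rows can be dropped from the CSV before calling refine_scopus again.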

@stijnh
Member

stijnh commented Mar 14, 2024

If you would like to look into this issue, we are happy to accept pull requests!

I think what needs to happen is that the title is stripped of punctuation before it is passed to Scopus. For example, if the title is something like:

Research on the number of prime numbers between n² and (n+1)²

The query sent to Scopus will be:

TITLE(Research on the number of prime numbers between n² and (n+1)²)

but all those non-alphabetic characters result in a query that Scopus does not accept.
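
A minimal sketch of that stripping step, assuming it is enough to keep letters and decimal digits and to turn everything else into spaces. The helpers `sanitize_title` and `build_title_query` are hypothetical names, not litstudy functions, and whether Scopus accepts the result for every title is untested.

```python
import unicodedata


def sanitize_title(title: str) -> str:
    """Replace every character that is not a letter or decimal digit with a space."""
    kept = []
    for ch in title:
        category = unicodedata.category(ch)
        # "L*" covers letters (including accented ones), "Nd" covers decimal digits.
        # Punctuation, math symbols, and superscripts such as "²" become spaces.
        kept.append(ch if category.startswith("L") or category == "Nd" else " ")
    # Collapse the runs of spaces left behind by the removed characters.
    return " ".join("".join(kept).split())


def build_title_query(title: str) -> str:
    """Build the TITLE(...) query from the sanitized title."""
    return f"TITLE({sanitize_title(title)})"


print(build_title_query("Research on the number of prime numbers between n² and (n+1)²"))
# prints: TITLE(Research on the number of prime numbers between n and n 1)
```

Classifying characters by Unicode category, rather than keeping ASCII only, keeps accented letters in non-English titles.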

Additionally, in the case where a Document already has a ScopusID, we can query Scopus directly for the publication without having to search by title (I think the CSV file already provides the ScopusID).
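
The identifier-first idea could look roughly like the sketch below. `refine_one` and its two callables are hypothetical. In litstudy, the fetch step might map to `ScopusDocument.from_eid`, which appears in the traceback, but how a loaded Document exposes its Scopus identifier is an assumption left outside this example.

```python
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def refine_one(
    eid: Optional[str],
    title: str,
    fetch_by_eid: Callable[[str], T],
    search_by_title: Callable[[str], Optional[T]],
) -> Optional[T]:
    """Prefer a direct lookup by identifier; fall back to a title search."""
    if eid:
        # No title query is built, so punctuation in the title cannot break it.
        return fetch_by_eid(eid)
    return search_by_title(title)


# Offline stand-ins (assumptions) that only report which path was taken.
by_eid = lambda eid: ("eid", eid)
by_title = lambda title: ("title", title)

print(refine_one("2-s2.0-0000000000", "Some title", by_eid, by_title))
print(refine_one(None, "Some title", by_eid, by_title))
```

With this ordering, only documents without an identifier go through the title search.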
