Migrate fetchers to Search.g4 ANTLR parser. #13691

turhantolgaunal · 2025-08-13T18:34:28Z

Closes #13607

Changed inherited fetcher classes to use ANTLR-generated classes instead of the Lucene libraries.
Changed the ACMPortalFetcher.java logic for transforming the parsed syntax into a URL.

TO DO:

Replace the logic in other fetcher classes to transform the parsed nodes into URL calls.
Modify other fetcher classes to correctly override changed inherited functions.

Steps to test

Run web searches with different options using the Search.g4 syntax.

Mandatory checks

I own the copyright of the code submitted and I license it under the MIT license
Change in CHANGELOG.md described in a way that is understandable for the average user (if change is visible to the user)
Tests created for changes (if applicable)
Manually tested changed features in running JabRef (always required)
Screenshots added in PR description (if change is visible to the user)
Checked developer's documentation: Is the information available and up to date? If not, I outlined it in this pull request.
Checked documentation: Is the information available and up to date? If not, I created an issue at https://github.com/JabRef/user-documentation/issues or, even better, I submitted a pull request to the documentation repository.

-Changed inherited fetcher classes to use ANTLR generated classes instead of lucene libraries. - Changed ACMPortalFetcher.java logic for transforming the parsed syntax to URL

jablib/src/main/java/org/jabref/logic/importer/fetcher/ACMPortalFetcher.java

calixtus · 2025-08-14T09:59:44Z

Great start. To me it looks like you're on the right track.

- Changed ACMPortalFetcherTest unit test code to use Search.g4 generated classes instead of Lucene - Removed trivial comment from ACMPortalFetcher

- Changed AbstractQueryTransformer methods to obey Search.g4 parser rules - Modified ACMPortalFetcher to use the changed transformer logic

trag-bot · 2025-08-17T20:53:54Z

jablib/src/main/java/org/jabref/logic/importer/SearchBasedFetcher.java

-            queryNode = parser.parse(searchQuery, NO_EXPLICIT_FIELD);
-        } catch (QueryNodeParseException e) {
+            queryNode = visitor.visitStart(searchQueryObject.getContext());
+        } catch (Exception e) {


Catching a generic Exception is too broad and can mask specific issues that should be handled differently. More specific exception types should be caught.

Correct. Be specific with your Exception handling. There might always be other exceptions caused by whatever reason that might not be FetcherExceptions in the end.
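
The point about narrow exception handling can be illustrated with a minimal sketch. It does not use ANTLR or JabRef classes: `QuerySyntaxException` and `FetchFailedException` are hypothetical stand-ins for a parser's syntax error and a fetcher-level exception, and `parse` is a toy parser invented for this illustration.

```java
import java.util.List;

// Hypothetical stand-in for a parser's syntax error, so the sketch compiles without ANTLR.
class QuerySyntaxException extends RuntimeException {
    QuerySyntaxException(String message) {
        super(message);
    }
}

// Hypothetical stand-in for a fetcher-level checked exception.
class FetchFailedException extends Exception {
    FetchFailedException(String message, Throwable cause) {
        super(message, cause);
    }
}

class SpecificCatchSketch {
    // Toy parser: rejects unbalanced double quotes, otherwise splits on whitespace.
    static List<String> parse(String query) {
        long quotes = query.chars().filter(c -> c == '"').count();
        if (quotes % 2 != 0) {
            throw new QuerySyntaxException("Unbalanced quote in: " + query);
        }
        return List.of(query.trim().split("\\s+"));
    }

    static List<String> search(String query) throws FetchFailedException {
        try {
            return parse(query);
        } catch (QuerySyntaxException e) {
            // Only the expected syntax failure is translated; the cause is preserved.
            throw new FetchFailedException("A syntax error occurred during parsing of the query.", e);
        }
        // Anything else (for example a NullPointerException for a null query) propagates unchanged.
    }
}
```

With this shape, an unrelated failure is not silently reported as a query syntax problem, which is the risk of `catch (Exception e)`.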

trag-bot · 2025-08-17T20:55:20Z

jablib/src/test/java/org/jabref/logic/importer/fetcher/ArXivFetcherTest.java


-        // There is only a single paper found by searching that contains the exact sequence "Taxonomy of Distributed" in the title.
-        assertEquals(List.of(expected), resultWithPhraseSearch);
+        // The first result should be the expected paper


Comment is trivial and can be derived from the code. It doesn't provide additional information about the reasoning or purpose of the test.

trag-bot · 2025-08-17T20:55:47Z

jablib/src/main/java/org/jabref/logic/search/query/SearchQueryVisitor.java

+        // Check the actual operator token
+        if (ctx.bin_op.getType() == SearchParser.AND) {


Comment provides no additional information beyond what is clearly visible in the code. The comment simply restates what the code does and should be removed.

trag-bot · 2025-08-17T20:55:52Z

jablib/src/main/java/org/jabref/logic/search/query/SearchQueryVisitor.java

+                    || operator == SearchParser.NCEEQUAL
+                    || operator == SearchParser.NREQUAL
+                    || operator == SearchParser.NCREEQUAL) {
+                return null;


Public method should return Optional instead of null to better handle absence of value. This follows modern Java practices and prevents potential NullPointerExceptions.
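
A minimal sketch of the suggested `Optional` return, under stated assumptions: the `Operator` enum and the `field:value` output format are hypothetical placeholders, not the token constants of the generated `SearchParser` or the format of any real transformer.

```java
import java.util.Optional;

class OptionalReturnSketch {
    // Hypothetical operators; the real ones are token constants in the generated parser.
    enum Operator { EQUAL, NOT_EQUAL }

    // Returns the term to send to a web API, or empty when the operator cannot be expressed.
    static Optional<String> toTerm(String field, Operator operator, String value) {
        if (operator == Operator.NOT_EQUAL) {
            return Optional.empty(); // instead of "return null"
        }
        return Optional.of(field + ":" + value);
    }
}
```

A caller can then decide explicitly what absence means, for example `toTerm(field, operator, value).orElse("")`, rather than risking a `NullPointerException` later.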

trag-bot · 2025-08-17T21:02:06Z

jablib/src/test/java/org/jabref/logic/importer/fetcher/BvbFetcherTest.java

@@ -53,7 +51,7 @@ class BvbFetcherTest {

    @Test
    void performTest() throws FetcherException {
-        String searchquery = "effective java author:bloch";
+        String searchquery = "effective java author=bloch";


Variable name contains a typo ('searchquery' instead of 'searchQuery'). Variable names should be correctly spelled to maintain code quality and readability.

trag-bot · 2025-08-17T21:43:34Z

jablib/src/test/java/org/jabref/logic/importer/fetcher/DOABFetcherTest.java

@@ -71,7 +71,8 @@ public static Stream<Arguments> performSearch() {
                                .withField(StandardField.LANGUAGE, "English")
                                .withField(StandardField.KEYWORDS, "Religion, thema EDItEUR::Q Philosophy and Religion::QR Religion and beliefs::QRM Christianity::QRMF Christianity: sacred texts and revered writings::QRMF1 Bibles::QRMF13 New Testaments")
                                .withField(StandardField.PUBLISHER, "Brill"),
-                        "Four Kingdom Motifs before and beyond the Book of Daniel"
+                        "\"Four Kingdom Motifs before and beyond the Book of Daniel\""
+                        // The title needs to be in quotes in order for and to be parsed correctly here


Comment merely explains what is visible in the code (the quotes around the title) without providing additional information about why this is necessary for the parser implementation.
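
Why the quotes matter can be shown with a toy tokenizer. This is only a sketch and not the Search.g4 grammar: it assumes that a bare `and` is read as a boolean operator while a double-quoted phrase is kept as a single term, and the class name and `TERM(...)` output are invented for the illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class QuotedPhraseSketch {
    // Either a double-quoted phrase (group 1) or a run of non-whitespace characters (group 2).
    private static final Pattern TOKEN = Pattern.compile("\"([^\"]*)\"|(\\S+)");

    static List<String> tokenize(String query) {
        List<String> tokens = new ArrayList<>();
        Matcher matcher = TOKEN.matcher(query);
        while (matcher.find()) {
            if (matcher.group(1) != null) {
                // Quoted phrase: kept whole, so an inner "and" stays part of the term.
                tokens.add("TERM(" + matcher.group(1) + ")");
            } else if (matcher.group(2).equalsIgnoreCase("and")) {
                // Bare keyword: treated as an operator (assumption of this toy tokenizer).
                tokens.add("AND");
            } else {
                tokens.add("TERM(" + matcher.group(2) + ")");
            }
        }
        return tokens;
    }
}
```

In this sketch, the unquoted input `before and beyond` yields two terms joined by an operator, while the quoted input `"before and beyond"` yields one term.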

trag-bot · 2025-08-18T18:36:29Z

jablib/src/main/java/org/jabref/logic/importer/fetcher/DOAJFetcher.java

@@ -184,9 +184,9 @@ public Optional<HelpFile> getHelpPage() {
    }

    @Override
-    public URL getURLForQuery(QueryNode luceneQuery) throws URISyntaxException, MalformedURLException {
+    public URL getURLForQuery(BaseQueryNode queryNode) throws URISyntaxException, MalformedURLException {


Method lacks null check for queryNode parameter. According to modern Java practices and JabRef's standards, parameters should be validated using Objects.requireNonNull.
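
A minimal sketch of the requested parameter check. `urlForQuery` is a hypothetical stand-in that takes a plain `String` rather than a `BaseQueryNode`, and the `example.org` URL is a placeholder.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.Objects;

class NullCheckSketch {
    // Hypothetical stand-in for a method such as getURLForQuery.
    static String urlForQuery(String query) {
        // Fail fast with a clear message instead of a later, harder-to-trace failure.
        Objects.requireNonNull(query, "query must not be null");
        return "https://example.org/search?q=" + URLEncoder.encode(query, StandardCharsets.UTF_8);
    }
}
```

An alternative is to document the contract with nullness annotations and let static analysis enforce it; which of the two a project prefers is a convention decision.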

trag-bot · 2025-08-18T20:24:22Z

jablib/src/main/java/org/jabref/logic/importer/fetcher/LOBIDFetcher.java

@@ -46,14 +46,14 @@ public class LOBIDFetcher implements PagedSearchBasedParserFetcher, IdBasedParse
    /**
     * Gets the query URL
     *
-     * @param luceneQuery the search query
+     * @param queryNode the list that contains the parsed nodes


The JavaDoc parameter description is inaccurate and misleading. It describes queryNode as a list when it's actually a BaseQueryNode object representing a search query structure.

trag-bot · 2025-08-18T20:38:20Z

jablib/src/main/java/org/jabref/logic/importer/fetcher/MathSciNet.java

@@ -96,9 +96,9 @@ public URL getURLForEntry(BibEntry entry) throws URISyntaxException, MalformedUR
    }

    @Override
-    public URL getURLForQuery(QueryNode luceneQuery) throws URISyntaxException, MalformedURLException {
+    public URL getURLForQuery(BaseQueryNode queryNode) throws URISyntaxException, MalformedURLException {


Method lacks null check for queryNode parameter. According to modern Java practices and JSpecify guidelines, parameters should be validated or annotated appropriately.

calixtus · 2025-08-19T08:10:56Z

jablib/src/main/java/org/jabref/logic/importer/PagedSearchBasedFetcher.java

-            return this.performSearchPaged(parser.parse(searchQuery, NO_EXPLICIT_FIELD), pageNumber);
-        } catch (QueryNodeParseException e) {
+            return this.performSearchPaged(visitor.visitStart(searchQueryObject.getContext()), pageNumber);
+        } catch (Exception e) {


Try to avoid general exceptions, be specific.

calixtus · 2025-08-19T08:12:39Z

jablib/src/main/java/org/jabref/logic/importer/SearchBasedFetcher.java


    /**
     * Looks for hits which are matched by the given free-text query.
     *
-     * @param searchQuery query string that can be parsed into a lucene query
+     * @param searchQuery query string that can be parsed into a search.g4 query


'g4' is just the default file extension for ANTLR4 files. Just write "search query".

calixtus · 2025-08-19T08:14:03Z

jablib/src/main/java/org/jabref/logic/importer/SearchBasedFetcher.java

-            queryNode = parser.parse(searchQuery, NO_EXPLICIT_FIELD);
-        } catch (QueryNodeParseException e) {
+            queryNode = visitor.visitStart(searchQueryObject.getContext());
+        } catch (Exception e) {


Correct. Be specific with your Exception handling. There might always be other exceptions caused by whatever reason that might not be FetcherExceptions in the end.

trag-bot · 2025-08-19T18:48:54Z

jablib/src/main/java/org/jabref/logic/importer/PagedSearchBasedFetcher.java

-            throw new FetcherException("An error occurred during parsing of the query.");
+            return this.performSearchPaged(visitor.visitStart(searchQueryObject.getContext()), pageNumber);
+        } catch (ParseCancellationException e) {
+            throw new FetcherException("A syntax error occurred during parsing of the query");


The exception message does not end with a period, which violates the text formatting guidelines for consistent message formatting across the codebase.

trag-bot · 2025-08-19T20:04:09Z

jablib/src/main/java/org/jabref/logic/importer/fetcher/ZbMATH.java

        URIBuilder uriBuilder = new URIBuilder("https://zbmath.org/bibtexoutput/");
-        uriBuilder.addParameter("q", new ZbMathQueryTransformer().transformLuceneQuery(luceneQuery).orElse("")); // search all fields
+        uriBuilder.addParameter("q", new ZbMathQueryTransformer().transformSearchQuery(queryNode).orElse("")); // search all fields


Comment is trivial and does not provide additional information beyond what is obvious from the code. The comment should either be removed or enhanced with meaningful explanation.

jabref-machine · 2025-08-19T20:05:11Z

JUnit tests of jablib are failing. You can see which checks are failing by locating the box "Some checks were not successful" on the pull request page. To see the test output, locate "Source Code Tests / Unit tests (pull_request)" and click on it.

You can then run these tests in IntelliJ to reproduce the failing tests locally. We offer a quick test running howto in the section Final build system checks in our setup guide.

pr: migrate fetchers to Search.g4 ANTLR parser.

b02fbde

-Changed inherited fetcher classes to use ANTLR generated classes instead of lucene libraries. - Changed ACMPortalFetcher.java logic for transforming the parsed syntax to URL

trag-bot bot reviewed Aug 13, 2025

calixtus changed the title ~~pr: migrate fetchers to Search.g4 ANTLR parser.~~ Migrate fetchers to Search.g4 ANTLR parser. Aug 14, 2025

calixtus added the "dev: code-quality" and "component: fetcher" labels Aug 14, 2025

turhantolgaunal added 8 commits August 14, 2025 20:32

pr: migrate fetchers to Search.g4 ANTLR parser.

9892bec

- Changed ACMPortalFetcherTest unit test code to use Search.g4 generated classes instead of Lucene - Removed trivial comment from ACMPortalFetcher

pr: migrate fetchers to Search.g4 ANTLR parser.

d802521

- Changed AbstractQueryTransformer methods to obey Search.g4 parser rules - Modified ACMPortalFetcher to use the changed transformer logic

Added a new interface for search nodes in order to parse operators correctly

a645b38

Added 2 new node types and changed SearchQueryNode to implement the new interface

44853c3

Created a new visitor class for parsing the search syntax and modified the search based fetcher classes to use it

a643d22

Updated AbstractQueryTransformer to be more in line with the older code while still being compatible with Search.g4 parser

c890e92

Updated ACMPortalFetcher to use the new parser and node logic

954e1a4

Updated ArxivPortalFetcher to use the new parser and node logic

3977901

trag-bot bot reviewed Aug 17, 2025

Updated BvbFetcher and BvbFetcherTest to use the new parser and node logic

60d2034

trag-bot bot reviewed Aug 17, 2025

turhantolgaunal added 2 commits August 18, 2025 00:26

Updated DBLPFetcher and DBLPQueryTransformerTest to use the new parser and node logic

2f9ea97

Updated DOABFetcher and DOABFetcherTest to use the new parser and node logic

f3d40af

trag-bot bot reviewed Aug 17, 2025

Updated DOAJFetcher to use the new parser and node logic

184a721

trag-bot bot reviewed Aug 18, 2025

turhantolgaunal added 5 commits August 18, 2025 22:02

Updated GoogleScholar and SearchBasedFetcherCapabilityTest to use the new parser and node logic

3ce8865

Updated GvkFetcher GvkFetcherTest and GVKQueryTransformerTest to use the new Search.g4 logic. Fixed how AbstractQueryTransformer handles fields.

b28df9c

Updated IEEE and IEEEQueryTransformerTest to use Search.g4 based parser

9c978dc

Updated InfixTransformerTest to use Search.g4 based parser.

60b6db9

Updated INSPIREFetcher to use Search.g4 based parser.

98aa35f

turhantolgaunal added 3 commits August 18, 2025 23:09

Updated ISIDOREFetcher and ISIDOREFetcherTest to use Search.g4 based parser

efcd08f

Updated JstorFetcher to use Search.g4 based fetcher

45245b8

Updated LOBIDFetcher to use Search.g4 based fetcher

ee311be

trag-bot bot reviewed Aug 18, 2025

Updated MathSciNet to use Search.g4 based fetcher

59175ac

trag-bot bot reviewed Aug 18, 2025

calixtus reviewed Aug 19, 2025

turhantolgaunal added 3 commits August 19, 2025 21:10

Changed general exceptions to more specific ones according to feedback

bf0be13

Removed null pointer exception

492456f

Removed null pointer exception

f561bbd

trag-bot bot reviewed Aug 19, 2025

turhantolgaunal added 3 commits August 19, 2025 21:56

Updated ScholarArchiveFetcher to use the ANTLR parser

d7d4a04

Updated SpringerFetcher to use the ANTLR parser

9d8b24d

Updated rest of the classes needed for compiling

879e5c1

trag-bot bot reviewed Aug 19, 2025
