Implemented arXivId Parsing for PDF with arXivId #12335

ar-rana · 2024-12-24T13:15:07Z

Used the `parse` method in `ArXivIdentifier` to extract the arXiv ID, and added a test case for it using the link given in the issue.
Closes #12000
Closes https://github.com/koppor/jabref/issues/47
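For context on the `parse` step described above, here is a minimal, self-contained sketch of pulling a new-style arXiv identifier (`YYMM.NNNN` or `YYMM.NNNNN`, optionally followed by a version such as `v2`) out of a line of PDF text with a regular expression. The class `ArXivIdSketch` and its `extract` method are hypothetical and are not JabRef's `ArXivIdentifier.parse`. Old-style identifiers such as `hep-th/9901001` are deliberately left out, and the stamp line in `main` is made up.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ArXivIdSketch {
    // Hypothetical pattern: optional "arXiv:" prefix, then YYMM.NNNN or YYMM.NNNNN,
    // then an optional version suffix. Word boundaries avoid matching inside longer digit runs.
    private static final Pattern NEW_STYLE_ID =
            Pattern.compile("\\b(?:arXiv:)?(\\d{4}\\.\\d{4,5})(v\\d+)?\\b");

    // Returns the first identifier found in the line, without prefix and version.
    public static Optional<String> extract(String line) {
        if (line == null) {
            return Optional.empty();
        }
        Matcher matcher = NEW_STYLE_ID.matcher(line);
        return matcher.find() ? Optional.of(matcher.group(1)) : Optional.empty();
    }

    public static void main(String[] args) {
        // Made-up stamp line in the style printed on arXiv PDFs
        System.out.println(extract("arXiv:2401.01234v2 [cs.DL] 3 Jan 2024")); // Optional[2401.01234]
        System.out.println(extract("no identifier here"));                    // Optional.empty
    }
}
```

The word boundaries are what keep a string such as `12345.678901` from yielding a spurious match.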

Mandatory checks

I own the copyright of the code submitted and I licence it under the MIT license
Change in CHANGELOG.md described in a way that is understandable for the average user (if change is visible to the user)
Tests created for changes (if applicable)
Manually tested changed features in running JabRef (always required)
Screenshots added in PR description (for UI changes)
Checked developer's documentation: Is the information available and up to date? If not, I outlined it in this pull request.
Checked documentation: Is the information available and up to date? If not, I created an issue at https://github.com/JabRef/user-documentation/issues or, even better, I submitted a pull request to the documentation repository.

github-actions

Your code currently does not meet JabRef's code guidelines.
We use Checkstyle to identify issues.
Please carefully follow the setup guide for the codestyle.
Afterwards, please run checkstyle locally and fix the issues.

In case of issues with the import order, double check that you activated Auto Import.
You can trigger fixing imports by pressing Ctrl+Alt+O to trigger Optimize Imports.

ar-rana · 2024-12-24T16:05:20Z

@koppor, could you please review this PR and suggest changes where I went wrong?

Siedlerchr · 2024-12-24T18:53:01Z

Thanks for your contribution. Please fix the Checkstyle issues first.

github-actions

Your code currently does not meet JabRef's code guidelines.
We use OpenRewrite to ensure "modern" Java coding practices.
The issues found can be automatically fixed.
Please execute the gradle task rewriteRun, check the results, commit, and push.

You can check the detailed error output by navigating to your pull request, selecting the tab "Checks", section "Tests" (on the left), subsection "OpenRewrite".

ar-rana · 2024-12-25T07:34:19Z

Hello maintainers, could you please review this PR? I have fixed the previous issues.
The extra changes shown in the diff come from merging the latest changes from upstream; please ignore them, as they will not be part of the actual PR.

Siedlerchr · 2024-12-25T17:10:37Z

Code-wise, this looks good to me. You have accidentally committed the csl-styles submodules; see https://devdocs.jabref.org/code-howtos/faq.html#the-problem for a solution.

InAnYan · 2024-12-25T17:44:26Z

Hmm, I tried to import these papers:

https://arxiv.org/abs/2406.14319
https://arxiv.org/abs/2412.06769

But JabRef did not import them properly, and the arXiv ID was always null.

ar-rana · 2024-12-27T14:26:29Z

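I have changed the `getArXivId` method in my local branch to the following:

```java
// Excerpt from PdfContentImporter: curString and extractYear() are members of that class
private String getArXivId(String arXivId) {
    if (arXivId == null) {
        // Try to parse an arXiv identifier from the current line of PDF text
        arXivId = ArXivIdentifier.parse(curString).map(ArXivIdentifier::asString).orElse(null);
        if (arXivId != null) {
            // Skip the identifier plus 7 characters (presumably "arXiv:" and one space)
            // and look for the year in the rest of the line
            if (curString.length() > arXivId.length() + 7) {
                curString = curString.substring(arXivId.length() + 7);
                extractYear();
            }
            return arXivId;
        }
    }
    return arXivId;
}
```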

I also modified the `ArXivIdentifier.parse` method slightly, changing the identifier at line 41 to `String identifier = value.split(" ")[0];`.

This fixes the null arXiv ID problem that @InAnYan encountered when importing the papers externally, but for some reason the titles are still not imported correctly, except for the paper mentioned in the issue.
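The `+ 7` offset in the `getArXivId` snippet is not explained in the thread. One plausible reading, which is an assumption, is that it skips the `arXiv:` prefix (6 characters) plus one separating space, leaving the rest of the stamp line (category and date) for `extractYear()`. This standalone sketch replays that arithmetic on a made-up stamp line; `StampOffsetSketch` and `remainderAfterId` are hypothetical names, not part of `PdfContentImporter`.

```java
public class StampOffsetSketch {
    // Assumption: the line starts with "arXiv:" (6 chars), then the identifier,
    // then a single space, so the remainder begins at id.length() + 7.
    private static final int PREFIX_AND_SEPARATOR = "arXiv:".length() + 1;

    // Hypothetical helper mirroring the substring logic in getArXivId:
    // returns the text after the identifier, or the line unchanged if it is too short.
    static String remainderAfterId(String line, String id) {
        int offset = id.length() + PREFIX_AND_SEPARATOR;
        if (line.length() > offset) {
            return line.substring(offset);
        }
        return line;
    }

    public static void main(String[] args) {
        String line = "arXiv:2401.01234v2 [cs.DL] 3 Jan 2024"; // made-up stamp line
        System.out.println(remainderAfterId(line, "2401.01234v2")); // [cs.DL] 3 Jan 2024
    }
}
```

Because the offset depends only on the identifier's length, it lines up only when the line starts with `arXiv:` immediately followed by the identifier; any leading text would shift the cut to the wrong position.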

github-actions

JUnit tests are failing. In the area "Some checks were not successful", locate "Tests / Unit tests (pull_request)" and click on "Details". This brings you to the test output.

You can then run these tests in IntelliJ to reproduce the failing tests locally. We offer a quick test running howto in the section Final build system checks in our setup guide.

InAnYan · 2024-12-28T14:54:01Z

Hi, ar-rana! Thanks for working on this PR!

As the holidays are coming up, maintainers may be busy over the next few weeks, so don't worry if our feedback is delayed.

Siedlerchr · 2025-01-01T20:49:14Z

src/main/java/org/jabref/logic/importer/fileformat/PdfContentImporter.java

```diff
@@ -609,11 +609,15 @@ private String getDoi(String doi) {
     private String getArXivId(String arXivId) {
         if (arXivId == null) {
-            arXivId = ArXivIdentifier.parse(curString).map(ArXivIdentifier::asString).orElse(null);
+            String arXiv = curString.split(" ")[0];
```


This could lead to a null pointer exception if there is no whitespace and you then access the index.

ar-rana · 2025-01-02

Hello Siedlerchr, I tested this method with empty and non-empty strings, with and without whitespace, and it only throws a null pointer exception if `curString` is null, which does not seem to be the case here. Should I change this, or leave it as is, since `getDoi` works the same way?
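Both comments hinge on how `String.split` behaves. Under the standard `java.lang.String.split(String)` semantics, trailing empty strings are discarded: a string without whitespace comes back whole, so `[0]` is safe; a leading space makes the first element empty; and a whitespace-only string produces an empty array, so `[0]` throws `ArrayIndexOutOfBoundsException` rather than a null pointer exception. Whether `curString` can be whitespace-only at that point depends on how `PdfContentImporter` splits the text into lines, which the thread does not show. `firstTokenGuarded` is a hypothetical alternative for illustration, not code from the PR.

```java
import java.util.Optional;

public class SplitEdgeCases {
    // Mirrors the expression under review: curString.split(" ")[0].
    static String firstTokenUnguarded(String s) {
        return s.split(" ")[0];
    }

    // Hypothetical guarded variant: trims, splits on runs of whitespace, and returns
    // Optional.empty() for null or whitespace-only input instead of throwing.
    static Optional<String> firstTokenGuarded(String s) {
        if (s == null) {
            return Optional.empty();
        }
        String first = s.trim().split("\\s+")[0];
        return first.isEmpty() ? Optional.empty() : Optional.of(first);
    }

    public static void main(String[] args) {
        // No whitespace: the whole string is the only element, so [0] is safe.
        System.out.println(firstTokenUnguarded("2401.01234"));             // 2401.01234
        // Leading space: the first element is "", so the identifier is lost.
        System.out.println(firstTokenUnguarded(" 2401.01234").isEmpty());  // true
        // Whitespace only: every piece is a trailing empty string and is dropped,
        // so split returns an empty array and [0] throws.
        try {
            firstTokenUnguarded("   ");
        } catch (ArrayIndexOutOfBoundsException e) {
            System.out.println("ArrayIndexOutOfBoundsException");
        }
        System.out.println(firstTokenGuarded(" 2401.01234"));              // Optional[2401.01234]
        System.out.println(firstTokenGuarded("   "));                      // Optional.empty
    }
}
```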

Implemented arXivId Parsing for PDF with arXivId

220ddac

github-actions bot reviewed Dec 24, 2024


ar-rana marked this pull request as draft December 24, 2024 15:59

added Optional parameter

32e9867

github-actions bot reviewed Dec 25, 2024


ar-rana added 2 commits December 25, 2024 11:48

Merge branch 'main' into feature/arXivId

9c80e04

merged fixes

89de378

ar-rana marked this pull request as ready for review December 25, 2024 13:20

Siedlerchr added the status: ready-for-review label Dec 25, 2024

removed csl-styles

28755cf

fixed null arxiv issue on external imports

06b6bb0

github-actions bot reviewed Dec 28, 2024


Improved getArxivId Implementation

457bb3f

Siedlerchr reviewed Jan 1, 2025

