ValueError in meteor.py #18

Closed
evanmiltenburg opened this issue Feb 1, 2021 · 8 comments

@evanmiltenburg
Contributor

evanmiltenburg commented Feb 1, 2021

I just used the script for the first time, in a clean Miniconda environment on Mac OS X. This is the error that I got:

(base) MacBook-Air-van-Emiel:GEM-metrics emiel$ python run_metrics.py -r /Users/emiel/Tilburg/Projects/GEM/GEM-metrics/test_data/single_dataset.refs.json /Users/emiel/Tilburg/Projects/GEM/GEM-metrics/test_data/single_dataset.outs.json 
[W 210201 08:36:16 data:49] /Users/emiel/Tilburg/Projects/GEM/GEM-metrics/data/meteor/meteor-1.5.jar not found -- downloading https://github.com/GEM-benchmark/GEM-metrics/releases/download/data/meteor.tar.gz. This may take a few minutes.
...100%, 206 MB, 3052 KB/s, 69 seconds passed
[W 210201 08:37:25 data:54] Extracting from /var/folders/xm/6mtvt2mn4fn08d4c6y1_xv0w0000gn/T/tmpdn5ptzro to /Users/emiel/Tilburg/Projects/GEM/GEM-metrics/data/meteor
Traceback (most recent call last):
  File "run_metrics.py", line 96, in <module>
    main(config)
  File "run_metrics.py", line 63, in main
    values = gem_metrics.compute(outs, refs, srcs, metric_dict)
  File "/Users/emiel/Tilburg/Projects/GEM/GEM-metrics/gem_metrics/__init__.py", line 100, in compute
    values.update(metric.compute(outs, refs))
  File "/Users/emiel/Tilburg/Projects/GEM/GEM-metrics/gem_metrics/meteor.py", line 19, in compute
    meteor, _ = m.compute_score(predictions.untokenized, references.untokenized)
  File "/Users/emiel/Tilburg/Projects/GEM/GEM-metrics/gem_metrics/impl/meteor.py", line 56, in compute_score
    scores.append(float(self.meteor_p.stdout.readline().decode('UTF-8').strip()))
ValueError: could not convert string to float: ''

After that, the script didn't seem to do anything, so I halted it with CTRL+C. Then I ran the script again and got the same error:

python run_metrics.py -r /Users/emiel/Tilburg/Projects/GEM/GEM-metrics/test_data/single_dataset.refs.json /Users/emiel/Tilburg/Projects/GEM/GEM-metrics/test_data/single_dataset.outs.json 
Traceback (most recent call last):
  File "run_metrics.py", line 96, in <module>
    main(config)
  File "run_metrics.py", line 63, in main
    values = gem_metrics.compute(outs, refs, srcs, metric_dict)
  File "/Users/emiel/Tilburg/Projects/GEM/GEM-metrics/gem_metrics/__init__.py", line 100, in compute
    values.update(metric.compute(outs, refs))
  File "/Users/emiel/Tilburg/Projects/GEM/GEM-metrics/gem_metrics/meteor.py", line 19, in compute
    meteor, _ = m.compute_score(predictions.untokenized, references.untokenized)
  File "/Users/emiel/Tilburg/Projects/GEM/GEM-metrics/gem_metrics/impl/meteor.py", line 56, in compute_score
    scores.append(float(self.meteor_p.stdout.readline().decode('UTF-8').strip()))
ValueError: could not convert string to float: ''
@tuetschek
Collaborator

Hmm strange... are you able to run the Meteor JAR separately?

@evanmiltenburg
Contributor Author

Yes:

(base) MacBook-Air-van-Emiel:meteor emiel$ echo "This is a test" > hyp.txt
(base) MacBook-Air-van-Emiel:meteor emiel$ echo "This is a test" > ref.txt
(base) MacBook-Air-van-Emiel:meteor emiel$ java -jar meteor-1.5.jar hyp.txt ref.txt
Meteor version: 1.5

Eval ID:        meteor-1.5-wo-en-no_norm-0.85_0.2_0.6_0.75-ex_st_sy_pa-1.0_0.6_0.8_0.6

Language:       English
Format:         plaintext
Task:           Ranking
Modules:        exact stem synonym paraphrase
Weights:        1.0 0.6 0.8 0.6
Parameters:     0.85 0.2 0.6 0.75

Segment 1 score:	1.0

System level statistics:


           Test Matches                  Reference Matches
Stage      Content  Function    Total    Content  Function    Total
1                1         3        4          1         3        4
2                0         0        0          0         0        0
3                0         0        0          0         0        0
4                0         0        0          0         0        0
Total            1         3        4          1         3        4

Test words:             4
Reference words:        4
Chunks:                 0
Precision:              1.0
Recall:                 1.0
f1:                     1.0
fMean:                  1.0
Fragmentation penalty:  0.0

Final score:            1.0

@tuetschek
Collaborator

That's really weird... Are you able to run E2E metrics (https://github.com/tuetschek/e2e-metrics), or do you get the same error there? In other words, did I screw up the transfer, or is it a different error? Unfortunately I can't try it out on a Mac; I'm on Linux (or Windows, but I don't want to try that).

@evanmiltenburg
Contributor Author

I have good news and I have bad news. The good news is that your port seems to work the same as the original e2e-code. The bad news is that I still get this error:

computing METEOR score...
Traceback (most recent call last):
  File "./measure_scores.py", line 380, in <module>
    evaluate(data_src, data_ref, data_sys, args.table, args.header, args.sys_file, args.python)
  File "./measure_scores.py", line 234, in evaluate
    coco_eval = run_coco_eval(data_ref, data_sys)
  File "./measure_scores.py", line 323, in run_coco_eval
    coco_eval.evaluate()
  File "/Users/emiel/Tilburg/Projects/GEM/e2e-metrics/pycocoevalcap/eval.py", line 54, in evaluate
    score, scores = scorer.compute_score(gts, res)
  File "/Users/emiel/Tilburg/Projects/GEM/e2e-metrics/pycocoevalcap/meteor/meteor.py", line 45, in compute_score
    scores.append(float(self.meteor_p.stdout.readline().decode('UTF-8').strip()))
ValueError: could not convert string to float: ''

@evanmiltenburg
Contributor Author

evanmiltenburg commented Feb 1, 2021

OK, I've located the error. I've added this on line 55: print("Test:", self.meteor_p.stderr.readlines())
That results in this output:

Test: [b'Exception in thread "main" java.util.InputMismatchException\n', b'\tat java.base/java.util.Scanner.throwFor(Scanner.java:939)\n', b'\tat java.base/java.util.Scanner.next(Scanner.java:1594)\n', b'\tat java.base/java.util.Scanner.nextDouble(Scanner.java:2564)\n', b'\tat edu.cmu.meteor.scorer.MeteorStats.<init>(Unknown Source)\n', b'\tat Meteor.scoreStdio(Unknown Source)\n', b'\tat Meteor.main(Unknown Source)\n']

The fix for this issue seems to be discussed here: Maluuba/nlg-eval#91
Apparently it's an issue with newline characters?

Update:
I've tested the fix proposed there but still can't get it to work.
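
For anyone hitting the same `ValueError`, the debugging step above (looking at the child's stderr when stdout comes back empty) could be wrapped in a small helper. This is only a sketch, not code from GEM-metrics: `read_score` and `spawn` are hypothetical names, and `spawn` uses a Python one-liner as a stand-in for the Meteor process so the example runs without Java.

```python
import subprocess
import sys


def read_score(proc):
    """Read one score line from proc.stdout and convert it to float.

    If the line is empty (the child wrote nothing, e.g. because it
    crashed), raise an error that includes the child's stderr instead
    of the bare "could not convert string to float: ''".
    """
    line = proc.stdout.readline().decode("utf-8").strip()
    if line:
        return float(line)
    # Close stdin so a child that is still waiting for input can exit;
    # otherwise reading stderr to the end could block.
    try:
        proc.stdin.close()
    except OSError:
        pass
    err = proc.stderr.read().decode("utf-8", errors="replace")
    raise RuntimeError("child wrote no score to stdout; its stderr was:\n" + err)


def spawn(code):
    """Hypothetical stand-in for the Meteor process: runs a Python one-liner."""
    return subprocess.Popen(
        [sys.executable, "-c", code],
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
    )


if __name__ == "__main__":
    # Simulate a child that dies before printing a score.
    crashing = spawn("import sys; sys.exit('simulated crash')")
    try:
        read_score(crashing)
    except RuntimeError as exc:
        print(exc)
    finally:
        crashing.wait()
```

With something like this in place of the bare `float(...)` call, the Java stack trace would show up in the Python error directly, without a temporary `print`.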

@evanmiltenburg
Contributor Author

Ah it might be a different issue, related to system locale! See: Maluuba/nlg-eval#32

@evanmiltenburg
Contributor Author

That fixed it! The solution is to change lines 26 and 27 of gem_metrics/impl/meteor.py to:

        self.meteor_cmd = ['java', '-jar', '-Xmx2G', '-Duser.language=en', '-Duser.country=US', METEOR_JAR,
                           '-', '-', '-stdio', '-l', meteor_language, '-norm']

Output:

$ python run_metrics.py -r /Users/emiel/Tilburg/Projects/GEM/GEM-metrics/test_data/single_dataset.refs.json /Users/emiel/Tilburg/Projects/GEM/GEM-metrics/test_data/single_dataset.outs.json 
{
    "predictions_file": null,
    "N": 2,
    "total_length": 27,
    "mean_pred_length": 13.5,
    "distinct-1": 0.6296296296296297,
    "vocab_size-1": 17,
    "unique-1": 9,
    "entropy-1": 3.95822916866988,
    "distinct-2": 0.8,
    "vocab_size-2": 20,
    "unique-2": 15,
    "entropy-2": 4.243856189774722,
    "cond_entropy-2": 0.2225626877266411,
    "distinct-3": 0.8695652173913043,
    "vocab_size-3": 20,
    "unique-3": 17,
    "entropy-3": 4.2626923908396215,
    "cond_entropy-3": 0.010140548890983897,
    "total_length-nopunct": 24,
    "mean_pred_length-nopunct": 12.0,
    "distinct-1-nopunct": 0.6666666666666666,
    "vocab_size-1-nopunct": 16,
    "unique-1-nopunct": 9,
    "entropy-1-nopunct": 3.8868421881310122,
    "distinct-2-nopunct": 0.8181818181818182,
    "vocab_size-2-nopunct": 18,
    "unique-2-nopunct": 14,
    "entropy-2-nopunct": 4.095795255000932,
    "cond_entropy-2-nopunct": 0.18150945892357126,
    "distinct-3-nopunct": 0.9,
    "vocab_size-3-nopunct": 18,
    "unique-3-nopunct": 16,
    "entropy-3-nopunct": 4.1219280948873624,
    "cond_entropy-3-nopunct": 0.012496476250064989,
    "msttr-100": NaN,
    "msttr-100_nopunct": NaN,
    "references_file": "/Users/emiel/Tilburg/Projects/GEM/GEM-metrics/test_data/single_dataset.refs.json",
    "bleu": 50.124334172849004,
    "meteor": 0.4427331109107928,
    "local_recall": {
        "1": 0.0,
        "2": 0.2857142857142857,
        "3": 0.8888888888888888
    },
    "rouge1": {
        "low": {
            "precision": 0.75,
            "recall": 0.7428571428571429,
            "fmeasure": 0.7489655172413794
        },
        "mid": {
            "precision": 0.7638888888888888,
            "recall": 0.7487012987012986,
            "fmeasure": 0.7521051362430673
        },
        "high": {
            "precision": 0.7777777777777778,
            "recall": 0.7545454545454545,
            "fmeasure": 0.7552447552447551
        }
    },
    "rouge2": {
        "low": {
            "precision": 0.5151515151515151,
            "recall": 0.4118589743589744,
            "fmeasure": 0.4567901234567901
        },
        "mid": {
            "precision": 0.5171911421911422,
            "recall": 0.45112179487179493,
            "fmeasure": 0.4805318138651472
        },
        "high": {
            "precision": 0.5192307692307692,
            "recall": 0.4903846153846154,
            "fmeasure": 0.5042735042735044
        }
    },
    "rougeL": {
        "low": {
            "precision": 0.6944444444444443,
            "recall": 0.6863636363636364,
            "fmeasure": 0.6889655172413793
        },
        "mid": {
            "precision": 0.6954365079365079,
            "recall": 0.7169913419913421,
            "fmeasure": 0.7022916164295474
        },
        "high": {
            "precision": 0.6964285714285714,
            "recall": 0.7476190476190476,
            "fmeasure": 0.7156177156177156
        }
    },
    "rougeLsum": {
        "low": {
            "precision": 0.6944444444444443,
            "recall": 0.6863636363636364,
            "fmeasure": 0.6889655172413793
        },
        "mid": {
            "precision": 0.6954365079365079,
            "recall": 0.7169913419913421,
            "fmeasure": 0.7022916164295474
        },
        "high": {
            "precision": 0.6964285714285714,
            "recall": 0.7476190476190476,
            "fmeasure": 0.7156177156177156
        }
    }
}
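
Some background on why the locale flags help, as far as I understand it: the stack trace earlier in the thread fails in `Scanner.nextDouble`, and `java.util.Scanner` parses numbers according to the JVM's default locale. On a system whose locale writes decimals with a comma, input such as `0.5` is then rejected. Setting `user.language` and `user.country` pins the default locale for that one JVM. A minimal sketch of building the command this way is below; `build_meteor_cmd` and its parameters are hypothetical names, not part of GEM-metrics, and the JVM options are placed before `-jar` (the order given in the `java` launcher synopsis), whereas the fix above puts them after `-jar` and was reported to work.

```python
def build_meteor_cmd(jar_path, language="en", max_heap="2G"):
    """Build a Meteor stdio command that pins the JVM locale to en_US.

    jar_path, language and max_heap are illustrative parameters; the
    Meteor arguments mirror the ones quoted in the fix above.
    """
    return [
        "java",
        "-Xmx" + max_heap,
        # Pin the default locale so number parsing inside the JVM
        # does not depend on the user's system settings.
        "-Duser.language=en",
        "-Duser.country=US",
        "-jar", jar_path,
        "-", "-", "-stdio", "-l", language, "-norm",
    ]


if __name__ == "__main__":
    print(" ".join(build_meteor_cmd("meteor-1.5.jar")))
```

Note that the locale flags are independent of the `-l` argument: `-l` tells Meteor which language the text is in, while the `-D` properties only control how the JVM formats and parses numbers.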

evanmiltenburg added a commit that referenced this issue Feb 1, 2021
Change the Java locale to en.US so that it works for everyone, regardless of their environment.
@tuetschek
Collaborator

The fix works fine for me – thanks @evanmiltenburg & @sebastianGehrmann !
