ValueError in meteor.py #18

evanmiltenburg · 2021-02-01T07:41:54Z

I just used the script for the first time, in a clean Miniconda environment on Mac OS X. This is the error that I got:

(base) MacBook-Air-van-Emiel:GEM-metrics emiel$ python run_metrics.py -r /Users/emiel/Tilburg/Projects/GEM/GEM-metrics/test_data/single_dataset.refs.json /Users/emiel/Tilburg/Projects/GEM/GEM-metrics/test_data/single_dataset.outs.json 
[W 210201 08:36:16 data:49] /Users/emiel/Tilburg/Projects/GEM/GEM-metrics/data/meteor/meteor-1.5.jar not found -- downloading https://github.com/GEM-benchmark/GEM-metrics/releases/download/data/meteor.tar.gz. This may take a few minutes.
...100%, 206 MB, 3052 KB/s, 69 seconds passed
[W 210201 08:37:25 data:54] Extracting from /var/folders/xm/6mtvt2mn4fn08d4c6y1_xv0w0000gn/T/tmpdn5ptzro to /Users/emiel/Tilburg/Projects/GEM/GEM-metrics/data/meteor
Traceback (most recent call last):
  File "run_metrics.py", line 96, in <module>
    main(config)
  File "run_metrics.py", line 63, in main
    values = gem_metrics.compute(outs, refs, srcs, metric_dict)
  File "/Users/emiel/Tilburg/Projects/GEM/GEM-metrics/gem_metrics/__init__.py", line 100, in compute
    values.update(metric.compute(outs, refs))
  File "/Users/emiel/Tilburg/Projects/GEM/GEM-metrics/gem_metrics/meteor.py", line 19, in compute
    meteor, _ = m.compute_score(predictions.untokenized, references.untokenized)
  File "/Users/emiel/Tilburg/Projects/GEM/GEM-metrics/gem_metrics/impl/meteor.py", line 56, in compute_score
    scores.append(float(self.meteor_p.stdout.readline().decode('UTF-8').strip()))
ValueError: could not convert string to float: ''

After that the script doesn't seem to do anything. So I halted it with CTRL+C. Then I ran the script again. Same error:

python run_metrics.py -r /Users/emiel/Tilburg/Projects/GEM/GEM-metrics/test_data/single_dataset.refs.json /Users/emiel/Tilburg/Projects/GEM/GEM-metrics/test_data/single_dataset.outs.json 
Traceback (most recent call last):
  File "run_metrics.py", line 96, in <module>
    main(config)
  File "run_metrics.py", line 63, in main
    values = gem_metrics.compute(outs, refs, srcs, metric_dict)
  File "/Users/emiel/Tilburg/Projects/GEM/GEM-metrics/gem_metrics/__init__.py", line 100, in compute
    values.update(metric.compute(outs, refs))
  File "/Users/emiel/Tilburg/Projects/GEM/GEM-metrics/gem_metrics/meteor.py", line 19, in compute
    meteor, _ = m.compute_score(predictions.untokenized, references.untokenized)
  File "/Users/emiel/Tilburg/Projects/GEM/GEM-metrics/gem_metrics/impl/meteor.py", line 56, in compute_score
    scores.append(float(self.meteor_p.stdout.readline().decode('UTF-8').strip()))
ValueError: could not convert string to float: ''

tuetschek · 2021-02-01T11:23:26Z

Hmm strange... are you able to run the Meteor JAR separately?

evanmiltenburg · 2021-02-01T11:27:58Z

Yes:

(base) MacBook-Air-van-Emiel:meteor emiel$ echo "This is a test" > hyp.txt
(base) MacBook-Air-van-Emiel:meteor emiel$ echo "This is a test" > ref.txt
(base) MacBook-Air-van-Emiel:meteor emiel$ java -jar meteor-1.5.jar hyp.txt ref.txt
Meteor version: 1.5

Eval ID:        meteor-1.5-wo-en-no_norm-0.85_0.2_0.6_0.75-ex_st_sy_pa-1.0_0.6_0.8_0.6

Language:       English
Format:         plaintext
Task:           Ranking
Modules:        exact stem synonym paraphrase
Weights:        1.0 0.6 0.8 0.6
Parameters:     0.85 0.2 0.6 0.75

Segment 1 score:	1.0

System level statistics:


           Test Matches                  Reference Matches
Stage      Content  Function    Total    Content  Function    Total
1                1         3        4          1         3        4
2                0         0        0          0         0        0
3                0         0        0          0         0        0
4                0         0        0          0         0        0
Total            1         3        4          1         3        4

Test words:             4
Reference words:        4
Chunks:                 0
Precision:              1.0
Recall:                 1.0
f1:                     1.0
fMean:                  1.0
Fragmentation penalty:  0.0

Final score:            1.0

tuetschek · 2021-02-01T12:00:52Z

That's really weird... are you able to run E2E metrics (https://github.com/tuetschek/e2e-metrics), or do you get the same error there (i.e. did I screw up the transfer, or is that a different error)? Unfortunately, I can't try it out on a Mac... I'm on Linux (or Windows, but I don't want to try that).

evanmiltenburg · 2021-02-01T12:29:14Z

I have good news and I have bad news. The good news is that your port seems to work the same as the original e2e-code. The bad news is that I still get this error:

computing METEOR score...
Traceback (most recent call last):
  File "./measure_scores.py", line 380, in <module>
    evaluate(data_src, data_ref, data_sys, args.table, args.header, args.sys_file, args.python)
  File "./measure_scores.py", line 234, in evaluate
    coco_eval = run_coco_eval(data_ref, data_sys)
  File "./measure_scores.py", line 323, in run_coco_eval
    coco_eval.evaluate()
  File "/Users/emiel/Tilburg/Projects/GEM/e2e-metrics/pycocoevalcap/eval.py", line 54, in evaluate
    score, scores = scorer.compute_score(gts, res)
  File "/Users/emiel/Tilburg/Projects/GEM/e2e-metrics/pycocoevalcap/meteor/meteor.py", line 45, in compute_score
    scores.append(float(self.meteor_p.stdout.readline().decode('UTF-8').strip()))
ValueError: could not convert string to float: ''

evanmiltenburg · 2021-02-01T12:39:22Z

OK, I've located the error. I've added this on line 55: print("Test:", self.meteor_p.stderr.readlines())
That results in this output:

Test: [b'Exception in thread "main" java.util.InputMismatchException\n', b'\tat java.base/java.util.Scanner.throwFor(Scanner.java:939)\n', b'\tat java.base/java.util.Scanner.next(Scanner.java:1594)\n', b'\tat java.base/java.util.Scanner.nextDouble(Scanner.java:2564)\n', b'\tat edu.cmu.meteor.scorer.MeteorStats.<init>(Unknown Source)\n', b'\tat Meteor.scoreStdio(Unknown Source)\n', b'\tat Meteor.main(Unknown Source)\n']

The fix for this issue seems to be discussed here: Maluuba/nlg-eval#91
Apparently it's an issue with newline characters?

Update:
I've tested the fix proposed there but still can't get it to work.
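
The debugging step above (looking at the subprocess's stderr when the score line comes back empty) can be sketched as a small guard around the float conversion. This is only an illustration, not the code in gem_metrics/impl/meteor.py: `parse_score_line` and its arguments are hypothetical names, and the caller is assumed to have collected the stderr text already.

```python
def parse_score_line(raw_line: bytes, stderr_text: str = "") -> float:
    """Convert one line read from the scorer's stdout into a float.

    Hypothetical helper: an empty line usually means the child process
    wrote nothing (for example because it crashed), so we report its
    stderr instead of the bare "could not convert string to float" error.
    """
    text = raw_line.decode("UTF-8").strip()
    if not text:
        raise RuntimeError(
            "scorer returned no score; stderr was:\n" + (stderr_text or "<empty>")
        )
    return float(text)


if __name__ == "__main__":
    # A well-formed score line parses as usual.
    print(parse_score_line(b"1.0\n"))
    # An empty line raises an error that carries the Java stack trace.
    try:
        parse_score_line(b"", "java.util.InputMismatchException")
    except RuntimeError as err:
        print(err)
```

With a guard like this, the Java exception would show up directly in the Python traceback rather than needing a temporary print statement.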

evanmiltenburg · 2021-02-01T13:05:12Z

Ah it might be a different issue, related to system locale! See: Maluuba/nlg-eval#32

evanmiltenburg · 2021-02-01T13:14:38Z

That fixed it! The solution is to change lines 26 and 27 of gem_metrics/impl/meteor.py to:

        self.meteor_cmd = ['java', '-jar', '-Xmx2G', '-Duser.language=en', '-Duser.country=US', METEOR_JAR,
                           '-', '-', '-stdio', '-l', meteor_language, '-norm']

Output:

$ python run_metrics.py -r /Users/emiel/Tilburg/Projects/GEM/GEM-metrics/test_data/single_dataset.refs.json /Users/emiel/Tilburg/Projects/GEM/GEM-metrics/test_data/single_dataset.outs.json 
{
    "predictions_file": null,
    "N": 2,
    "total_length": 27,
    "mean_pred_length": 13.5,
    "distinct-1": 0.6296296296296297,
    "vocab_size-1": 17,
    "unique-1": 9,
    "entropy-1": 3.95822916866988,
    "distinct-2": 0.8,
    "vocab_size-2": 20,
    "unique-2": 15,
    "entropy-2": 4.243856189774722,
    "cond_entropy-2": 0.2225626877266411,
    "distinct-3": 0.8695652173913043,
    "vocab_size-3": 20,
    "unique-3": 17,
    "entropy-3": 4.2626923908396215,
    "cond_entropy-3": 0.010140548890983897,
    "total_length-nopunct": 24,
    "mean_pred_length-nopunct": 12.0,
    "distinct-1-nopunct": 0.6666666666666666,
    "vocab_size-1-nopunct": 16,
    "unique-1-nopunct": 9,
    "entropy-1-nopunct": 3.8868421881310122,
    "distinct-2-nopunct": 0.8181818181818182,
    "vocab_size-2-nopunct": 18,
    "unique-2-nopunct": 14,
    "entropy-2-nopunct": 4.095795255000932,
    "cond_entropy-2-nopunct": 0.18150945892357126,
    "distinct-3-nopunct": 0.9,
    "vocab_size-3-nopunct": 18,
    "unique-3-nopunct": 16,
    "entropy-3-nopunct": 4.1219280948873624,
    "cond_entropy-3-nopunct": 0.012496476250064989,
    "msttr-100": NaN,
    "msttr-100_nopunct": NaN,
    "references_file": "/Users/emiel/Tilburg/Projects/GEM/GEM-metrics/test_data/single_dataset.refs.json",
    "bleu": 50.124334172849004,
    "meteor": 0.4427331109107928,
    "local_recall": {
        "1": 0.0,
        "2": 0.2857142857142857,
        "3": 0.8888888888888888
    },
    "rouge1": {
        "low": {
            "precision": 0.75,
            "recall": 0.7428571428571429,
            "fmeasure": 0.7489655172413794
        },
        "mid": {
            "precision": 0.7638888888888888,
            "recall": 0.7487012987012986,
            "fmeasure": 0.7521051362430673
        },
        "high": {
            "precision": 0.7777777777777778,
            "recall": 0.7545454545454545,
            "fmeasure": 0.7552447552447551
        }
    },
    "rouge2": {
        "low": {
            "precision": 0.5151515151515151,
            "recall": 0.4118589743589744,
            "fmeasure": 0.4567901234567901
        },
        "mid": {
            "precision": 0.5171911421911422,
            "recall": 0.45112179487179493,
            "fmeasure": 0.4805318138651472
        },
        "high": {
            "precision": 0.5192307692307692,
            "recall": 0.4903846153846154,
            "fmeasure": 0.5042735042735044
        }
    },
    "rougeL": {
        "low": {
            "precision": 0.6944444444444443,
            "recall": 0.6863636363636364,
            "fmeasure": 0.6889655172413793
        },
        "mid": {
            "precision": 0.6954365079365079,
            "recall": 0.7169913419913421,
            "fmeasure": 0.7022916164295474
        },
        "high": {
            "precision": 0.6964285714285714,
            "recall": 0.7476190476190476,
            "fmeasure": 0.7156177156177156
        }
    },
    "rougeLsum": {
        "low": {
            "precision": 0.6944444444444443,
            "recall": 0.6863636363636364,
            "fmeasure": 0.6889655172413793
        },
        "mid": {
            "precision": 0.6954365079365079,
            "recall": 0.7169913419913421,
            "fmeasure": 0.7022916164295474
        },
        "high": {
            "precision": 0.6964285714285714,
            "recall": 0.7476190476190476,
            "fmeasure": 0.7156177156177156
        }
    }
}
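
For reference, the locale fix can be sketched as a standalone command builder. This is an illustration under assumptions, not the project's actual code: `build_meteor_cmd` and its parameters are hypothetical, while the flags are the ones from the fix above. As far as I understand it, the `-Duser.language` and `-Duser.country` system properties set the JVM's default locale, which `java.util.Scanner` uses when parsing numbers, so forcing `en`/`US` should make it expect a decimal point whatever the system settings are.

```python
from typing import List


def build_meteor_cmd(jar_path: str, language: str = "en") -> List[str]:
    """Build the argument list for running Meteor in stdio mode.

    Hypothetical helper. The locale properties come before the JAR path,
    so the Java launcher treats them as JVM options; everything after the
    JAR path is passed to Meteor itself.
    """
    return [
        "java", "-jar", "-Xmx2G",
        "-Duser.language=en", "-Duser.country=US",  # pin the JVM locale
        jar_path,
        "-", "-", "-stdio", "-l", language, "-norm",
    ]


if __name__ == "__main__":
    # Print the command so it can be inspected or pasted into a shell.
    print(" ".join(build_meteor_cmd("meteor-1.5.jar")))
```

The resulting list can be passed as-is to `subprocess.Popen`, in the same way as `self.meteor_cmd`.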

Change the Java locale to en.US so that it works for everyone, regardless of their environment.

Fix for #18

tuetschek · 2021-02-03T18:01:46Z

The fix works fine for me – thanks @evanmiltenburg & @sebastianGehrmann !

evanmiltenburg added a commit that referenced this issue Feb 1, 2021

Fix for #18

0dd5c66

Change the Java locale to en.US so that it works for everyone, regardless of their environment.

sebastianGehrmann added a commit that referenced this issue Feb 3, 2021

Merge pull request #20 from GEM-benchmark/evanmiltenburg-patch-1

8aa660b

Fix for #18

tuetschek closed this as completed Feb 3, 2021
