stanc gives useless error message when model source has non-ASCII characters #501

ksvanhorn · 2018-03-09T00:47:32Z

Summary:

If you have a file that contains a non-ASCII character, then running stanc on the file just tells you that there was a C++ exception, and nothing else.

Description:

If your Stan source contains a non-ASCII character, then stanc just dies with "c++ exception (unknown reason)", which doesn't help in tracking down the cause of the problem, and makes it look like some sort of system instability.

Reproducible Steps:

Write out the "Eight Schools" model to a file schools.stan, then replace all occurrences of the identifier "mu" with the Unicode character "μ". Save the result as UTF-8. Then run:

stanc("schools.stan")
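For anyone who wants to regenerate the input file, here is a minimal sketch of the steps above. It is written in Python only so that it stands alone; it is not part of RStan. The model text is the Eight Schools example as I recall it from the RStan getting-started guide, so treat its exact wording as an assumption. The names `greekify` and `write_repro` are hypothetical helpers.

```python
import re
from pathlib import Path

# Eight Schools model, reproduced from memory of the RStan getting-started
# guide. ASSUMPTION: the exact text may differ from the file used in the report.
EIGHT_SCHOOLS = """\
data {
  int<lower=0> J;
  real y[J];
  real<lower=0> sigma[J];
}
parameters {
  real mu;
  real<lower=0> tau;
  real eta[J];
}
transformed parameters {
  real theta[J];
  for (j in 1:J)
    theta[j] = mu + tau * eta[j];
}
model {
  target += normal_lpdf(eta | 0, 1);
  target += normal_lpdf(y | theta, sigma);
}
"""


def greekify(source: str, identifier: str = "mu", replacement: str = "\u03bc") -> str:
    """Replace whole-word occurrences of `identifier` with `replacement`.

    The word boundaries keep names such as `mu_0` untouched.
    """
    return re.sub(rf"\b{re.escape(identifier)}\b", replacement, source)


def write_repro(path: str = "schools.stan") -> Path:
    """Write the modified model to `path`, explicitly encoded as UTF-8."""
    out = Path(path)
    out.write_text(greekify(EIGHT_SCHOOLS), encoding="utf-8")
    return out


if __name__ == "__main__":
    print(f"wrote {write_repro()}")
```

Running the script writes schools.stan to the current directory, which can then be passed to stanc as shown above.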

Current Output:

The following error message:

Error in stanc("~/Tmp/foo.stan") : c++ exception (unknown reason)

Expected Output:

Something telling me that I have illegal, non-ASCII characters in my Stan source.

RStan Version:

2.17.2

R Version:

R version 3.4.3 (2017-11-30)

Operating System:

OS X 10.13.3


maverickg · 2018-03-09T02:44:53Z

What is the error message for other problematic Stan model code in your current rstan? What is the error message if you use stanc in cmdstan?

bob-carpenter · 2018-03-09T13:54:52Z

That's not what happens for me. I'm on Mac OS X 10.10.5 and RStan 2.17.3 (one beyond where you're at). Is this still an issue for you in 2.17.3?

parameters {
  real μ;
}
model {
  μ ~ normal(0, 1);
}

Here's what I see:

SYNTAX ERROR, MESSAGE(S) FROM PARSER:

  error in 'modelbdbe5bf695ec_pe' at line 2, column 10
  -------------------------------------------------
     1: parameters {
     2:   real μ;
                 ^
     3: }
  -------------------------------------------------

PARSER EXPECTED: <identifier>

I agree that could be a much clearer message as to why there's a problem.

ksvanhorn · 2018-03-09T18:23:44Z

I think this is related to issue #431; when I reinstalled rstan by compiling from source as recommended here under "Troubleshooting," this problem went away.

BTW, in response to Bob's comment, the really nasty problem here was not having any idea what line contained the error; even the minimal message "PARSER EXPECTED: <identifier>", with a specific line indicated, is hugely more helpful than "c++ exception (unknown reason)".

bob-carpenter · 2018-03-09T20:23:25Z

Thanks for reporting back. I'll close the issue, then. We really do need to be more proactive in the parser about non-ASCII characters and catch the problems right away. I'm going to open an issue to do just that in Stan:

stan-dev/stan#2485

There's already an issue to allow Unicode with UTF-8 encodings; I link to that issue from the issue above.
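Until the parser catches this itself, a check of this kind can be run on the model file before calling stanc. The following is a rough sketch, not Stan's or RStan's implementation: it scans the source text and reports each non-ASCII character with its line and column. `find_non_ascii` and `format_hit` are hypothetical names, and columns are counted in characters, which need not match how the Stan parser counts them.

```python
from typing import Iterator, NamedTuple


class NonAsciiHit(NamedTuple):
    line: int    # 1-based line number
    column: int  # 1-based column, counted in characters (not bytes)
    char: str    # the offending character
    text: str    # the full source line, without its line ending


def find_non_ascii(source: str) -> Iterator[NonAsciiHit]:
    """Yield one hit for every non-ASCII character in `source`."""
    # Split on "\n" only, so that Unicode line separators are flagged
    # rather than silently consumed as line breaks.
    for line_no, raw in enumerate(source.split("\n"), start=1):
        text = raw.rstrip("\r")
        for col_no, ch in enumerate(text, start=1):
            if ord(ch) > 127:
                yield NonAsciiHit(line_no, col_no, ch, text)


def format_hit(hit: NonAsciiHit) -> str:
    """Render a hit with a caret under the offending character."""
    prefix = f"{hit.line:>4}: "
    caret = " " * (len(prefix) + hit.column - 1) + "^"
    return (
        f"non-ASCII character {hit.char!r} (U+{ord(hit.char):04X}) "
        f"at line {hit.line}, column {hit.column}\n"
        f"{prefix}{hit.text}\n"
        f"{caret}"
    )


if __name__ == "__main__":
    # The small model from earlier in this thread.
    model = "parameters {\n  real μ;\n}\nmodel {\n  μ ~ normal(0, 1);\n}\n"
    for hit in find_non_ascii(model):
        print(format_hit(hit))
```

To check a file on disk, read it with an explicit encoding, for example `open("schools.stan", encoding="utf-8").read()`, and pass the result to `find_non_ascii`. The caret alignment assumes a monospaced display in which each character occupies one column.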

bob-carpenter mentioned this issue Mar 9, 2018

illegal encoding warning stan-dev/stan#2485

Closed

bob-carpenter closed this as completed Mar 9, 2018

alashworth mentioned this issue Mar 12, 2019

illegal encoding warning alashworth/test-issue-import#188

Open
