Fixing inter word whitespace for pdfs in .cls #56

Open
wants to merge 1 commit into
base: master

Diego-Zulu

Hey there

I noticed I was having some problems when my CV was getting auto-parsed by some careers websites. The problem was that all parsed text was missing spaces between words.

After digging into it some more, I found out that some PDF viewers mistake inter-word whitespace for other kinds of "unnecessary" whitespace (like the indent due to text justification) and therefore remove it. I also noticed I was not the first one to have this problem.

This was not the case for all PDF viewers, though: macOS Preview correctly inserted the space characters when I copied and pasted text from a rendered Deedy Resume. The whitespace removal did happen in:

  • Firefox's PDF viewer
  • Microsoft Edge's PDF viewer
  • Overleaf's PDF viewer (the default one, when not using the native option)

The workaround I found was:

  1. Forgo XeTeX and compile with LuaLaTeX instead
  2. Add WordSpace={1.1} to the default font. 1.1 was the smallest inter-word space factor I found that fixed the problem
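
As a rough illustration of step 2, here is a minimal standalone sketch using the fontspec `WordSpace` option. It is not the actual diff to the .cls: the font name below is a placeholder chosen only because it ships with TeX Live, and the resume class loads its own fonts with its own options.

```latex
% Compile with LuaLaTeX, e.g.: lualatex example.tex
\documentclass{article}
\usepackage{fontspec}
% WordSpace scales the font's default inter-word space.
% 1.1 is the value proposed in this PR.
% "Latin Modern Roman" is a placeholder font (an assumption for this sketch),
% not the font the resume class actually uses.
\setmainfont[WordSpace={1.1}]{Latin Modern Roman}
\begin{document}
Copy this sentence from the PDF and check that the spaces survive.
\end{document}
```

In the class file itself, the same option would presumably go into the existing option list of the default font declaration, next to whatever options are already there.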

You can see the results here. Just download both of them, open them in one of the affected PDF viewers, and try to copy and paste text from them:

(Make sure your browser's cache is not affecting your tests.)

I'm interested in compiling a list of viewers for which this does and does not work, and in finding quick fixes for wider compatibility. I'll probably try to parse a Deedy Resume with OpenCATS, but I'll need some more time to install it first.


THIS FIX WORKS ON:

  • Firefox's PDF viewer
  • Microsoft Edge's PDF viewer
  • Overleaf's PDF viewer (the default one, when not using the native option)
  • Chrome's PDF viewer*
  • Safari's PDF viewer*
  • Opera's PDF viewer*
  • macOS Preview*
  • Adobe Acrobat on macOS*
  • Microsoft Word's PDF to Word Parser*
  • Google Drive's preview*
  • Google Drive's PDF to Doc Parser**

* = Original already worked with this one
** = Original worked with varying degrees of success, now fixed

DOES NOT WORK ON:

N/A


P.S.: If you need inter-word whitespace to work in a viewer and this quick fix didn't help, try a bigger word space, like 1.5. A bigger word space may help the viewer discern where to add spaces.

P.S.2: I understand this may be a change that won't necessarily be merged into master, as it replaces the XeTeX dependency. Nevertheless, I'm happy so long as people find this useful.
