
Underline Tags #32

Open
alanbacon opened this issue Feb 23, 2016 · 4 comments

Comments

@alanbacon

Underline tags in docx files are missed. The offending lines are parse.py:88 and parse.py:31 on commit 833e658.

The master branch treats the underline tag as having only two possible states, on or off, and does not account for the fact that the underline tag actually carries a string value (in its w:val attribute) such as 'single', 'double', 'dashed', etc. I have a branch of the code that updates the rpr dictionary accordingly by altering the code around the two lines I mentioned: a 'u' field is added to the dictionary, with a string value representing the type of underlining.
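For illustration, here is a minimal sketch of what "store the underline type as a string" could look like when reading a `<w:rPr>` element with the standard library's `xml.etree.ElementTree`. It is not the project's actual `parse.py` code: `parse_run_properties` is a hypothetical helper, reading a bare `<w:u/>` as 'single' is an assumption, and bold/italic are included only to contrast boolean toggles with the string-valued underline.

```python
import xml.etree.ElementTree as ET

# Namespace of the w:* elements in word/document.xml
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
W = "{%s}" % W_NS


def parse_run_properties(rpr_element):
    """Build an rpr dict from a <w:rPr> element (hypothetical helper).

    Toggle properties (bold, italic) become booleans, while underline
    keeps the string from its w:val attribute.
    """
    rpr = {}
    if rpr_element is None:
        return rpr
    for child in rpr_element:
        tag = child.tag.replace(W, "")
        val = child.get(W + "val")
        if tag == "u":
            # Assumption: <w:u/> without w:val is read as 'single';
            # 'none' explicitly switches underlining off, so it is not stored.
            underline = val if val is not None else "single"
            if underline != "none":
                rpr["u"] = underline
        elif tag in ("b", "i"):
            # Toggles: a missing w:val means on; "0"/"false"/"off" mean off.
            rpr[tag] = val not in ("0", "false", "off")
    return rpr


sample = '<w:rPr xmlns:w="%s"><w:b/><w:u w:val="double"/></w:rPr>' % W_NS
print(parse_run_properties(ET.fromstring(sample)))  # {'b': True, 'u': 'double'}
```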

However, I do not know what further implications this will have. Does any other part of this project assume that the 'u' field of rpr will either not exist or take a True or False value?
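One way to limit the risk raised here is to route every read of the 'u' field through a tolerant helper that accepts both the old 1/0 (or True/False) flag and the new string value. Both functions below are hypothetical, not part of the library. Note that an existing truthiness check such as `if rpr.get('u'):` keeps working for any non-empty string, but it would wrongly treat 'none' as underlined if that value were ever stored.

```python
def is_underlined(rpr):
    """Return True if a run is underlined (hypothetical helper).

    Accepts the legacy convention (1/0 or True/False) as well as the
    proposed one (a string such as 'single' or 'double'; 'none' means off).
    """
    value = rpr.get("u")
    if value is None:
        return False
    if isinstance(value, str):
        return value != "none"
    # Legacy numeric/boolean flag: 1/True means underlined, 0/False does not.
    return bool(value)


def underline_style(rpr):
    """Return the underline type as a string, or None if not underlined."""
    if not is_underlined(rpr):
        return None
    value = rpr["u"]
    # Legacy flags carry no type information, so assume a single line.
    return value if isinstance(value, str) else "single"
```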

@danielhjames
Contributor

I see in https://msdn.microsoft.com/en-us/library/office/ff822388%28v=office.15%29.aspx that many types of underlining are possible. So is the intention that, for example:

<w:r>
  <w:rPr>
    <w:u w:val="double"/>
  </w:rPr>
  <w:t>double underlined</w:t>
</w:r>

would become this in the HTML output?

<p>
  <span class="double">double underlined</span>
</p>

Sadly it seems text-decoration-style: double (https://developer.mozilla.org/en-US/docs/Web/CSS/text-decoration-style) only works in Firefox for now.
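A rough sketch of that HTML mapping, assuming the class name is simply the w:val string (as in the example above) and that a small generated stylesheet supplies the look. `run_to_html`, `underline_css` and the `CSS_STYLE` table are hypothetical. Emitting `text-decoration: underline` before `text-decoration-style` means a browser that ignores the second property still draws a plain single underline, which is one way to degrade gracefully given the limited browser support mentioned above.

```python
import html

# Hypothetical mapping from w:u values to CSS text-decoration-style keywords;
# underline types without a close CSS equivalent fall back to 'solid'.
CSS_STYLE = {
    "single": "solid",
    "double": "double",
    "dotted": "dotted",
    "dash": "dashed",
    "wave": "wavy",
}


def run_to_html(text, underline):
    """Serialize one run as a <span> whose class names the underline type."""
    escaped = html.escape(text)
    if underline is None:
        return escaped
    return '<span class="%s">%s</span>' % (html.escape(underline, quote=True), escaped)


def underline_css(types):
    """Build one CSS rule per underline class used in the document."""
    rules = []
    for name in types:
        style = CSS_STYLE.get(name, "solid")
        # Plain underline first, so unsupported styles degrade to a single line.
        rules.append(
            ".%s { text-decoration: underline; text-decoration-style: %s; }"
            % (name, style))
    return "\n".join(rules)
```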

@aerkalov
Contributor

@alanbacon I would say that nothing will be broken by the changes you made. The only problem is that we have 1/0 for the underscore value. As a next step, I assume we would need to preserve the information about what kind of underscore line it is and implement serializers for underscore (and bold, italic, ...). That would cover the case where we only want to serialize text with a single line, or want to use different implementations for underscore, and it would help us with different requirements in projects. As it is with your change, it would always end up as a single line.
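The "serializers per property" idea could be sketched as a small registry, with an option to either preserve the underline type or collapse everything to a single line. The registry, `serialize_run`, `make_underline_serializer` and the `preserve_type` flag are all hypothetical names, not the library's API; the point is only that a project can swap one entry to get different underline output.

```python
import html


def make_underline_serializer(preserve_type=True):
    """Return a function that wraps (already escaped) text for an underline type."""
    def serialize(text, underline):
        if underline is None:
            return text
        if not preserve_type:
            # Collapse every underline type to a plain single line.
            return "<u>%s</u>" % text
        return '<span class="%s">%s</span>' % (html.escape(underline, quote=True), text)
    return serialize


# Hypothetical registry: one serializer per run property.
SERIALIZERS = {
    "b": lambda text, on: "<strong>%s</strong>" % text if on else text,
    "i": lambda text, on: "<em>%s</em>" % text if on else text,
    "u": make_underline_serializer(preserve_type=True),
}


def serialize_run(text, rpr, serializers=SERIALIZERS):
    """Escape the text, then let each present property wrap it in turn."""
    text = html.escape(text)
    for prop in ("u", "i", "b"):
        if prop in rpr:
            text = serializers[prop](text, rpr[prop])
    return text


# A project that only wants single lines swaps in a different underline serializer.
single_only = dict(SERIALIZERS, u=make_underline_serializer(preserve_type=False))
```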

BTW, are you using this lib somewhere? What are your needs? Are you more concerned with the parsing part or the serialization?

@alanbacon
Author

> The only problem is that we have 1/0 for the underscore value

Yes, this is why I was concerned that something somewhere might break. The underline status should really be stored as a string (or possibly an enum), not a boolean. My needs are few: I use this library to read .docx files into the Python workspace, and once I have the Python document object, I write my own custom code to extract the data I'm interested in into my own structured format. This library was very helpful for me, as I did not need any great understanding of the OOXML file structure.
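If the enum route were taken, a minimal sketch might look like the following. The member values are a subset of the ST_Underline values allowed in `w:u/@w:val`; the `Underline` class itself, falling back to SINGLE for values not listed here, and overriding `__bool__` so that legacy `if rpr["u"]:` checks keep meaning "is underlined" are all assumptions of this sketch.

```python
from enum import Enum


class Underline(Enum):
    """Subset of the ST_Underline values allowed in w:u/@w:val (hypothetical)."""
    NONE = "none"
    SINGLE = "single"
    WORDS = "words"
    DOUBLE = "double"
    THICK = "thick"
    DOTTED = "dotted"
    DASH = "dash"
    WAVE = "wave"

    @classmethod
    def from_val(cls, val):
        """Map a w:val string to a member; values not listed fall back to SINGLE."""
        try:
            return cls(val)
        except ValueError:
            return cls.SINGLE

    def __bool__(self):
        # Lets legacy truthiness checks keep meaning "is underlined".
        return self is not Underline.NONE
```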

@alanbacon
Author

Pull request submitted: #34
