[html rowspan] Rowspan is not handled. #1308
I realized later that the animation was missing the w3m output. Maybe next week I can show that. Still, I think the VisiData output shows that in some rows the values are shifted left when the row has no cell of its own in the column covered by the rowspan.
Here is the w3m-only recording: https://asciinema.org/a/SE31VbT4U156d9s1VBFJO7Rxs The first table uses rowspan, and the second shows the same data without rowspan. The first column, which uses rowspan, should cover all of the rows it spans, not just the first row.
Thank you for providing the two nearly identical sets of sample data, one with rowspan and one without. They make the problem much easier to see!
Rowspans used to be at least partially handled! I don't think they were handled in the expected way, and the logic needs to be adjusted, but they did not raise an exception. This is the change where the cell exceptions started: 8a663b8
One thing to note: VisiData expects that the rowspan attribute is in a I.e. this is what VisiData is expecting for
Edit: It seems like
I'm not sure whether you are asking me. I do think rowspan should be handled for
Hi @anjakefala, I looked a little more at this issue. It reminds me how odd HTML tables are as a data format, and when I look for live examples, I find even worse ones. Here is a simple example file I created that shows the difference between header and data rows. The HTML is basically the same for the header as for the body, except that th/thead are replaced with td/tbody. You can see that w3m formats them the same way, but VisiData shows them differently. For the header rows, VisiData seems to do the expected thing and flatten the values into one header row. For the body rows it shows this bug, where colspan/rowspan are ignored. https://asciinema.org/a/7GI0SKWYPecD8hcq1utN6RrPU
<table border>
<thead>
<tr>
<th rowspan=2 colspan=2>1.1 </th>
<th>1.3 </th>
</tr>
<tr>
<th> 2.3 </th>
</tr>
<tr>
<th> 3.1 </th>
<th colspan=2> 3.2 </th>
</tr>
</thead>
<tbody>
<tr>
<td rowspan=2 colspan=2>1.1 </td>
<td>1.3 </td>
</tr>
<tr>
<td> 2.3 </td>
</tr>
<tr>
<td> 3.1 </td>
<td colspan=2> 3.2 </td>
</tr>
</tbody>
</table>
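To see where the left shift comes from, it helps to look at what a parser actually receives for the body rows above: the second row contains only one cell, because its first two slots are covered by the rowspan from the first row. Below is a minimal sketch using Python's standard-library `html.parser`; the class name `CellCollector` and the (text, rowspan, colspan) tuple format are hypothetical choices for illustration, not VisiData's html loader.

```python
from html.parser import HTMLParser


class CellCollector(HTMLParser):
    """Collect <td>/<th> cells per <tr> as (text, rowspan, colspan) tuples."""

    def __init__(self):
        super().__init__()
        self.rows = []     # one list of cell tuples per <tr>
        self._cell = None  # [text, rowspan, colspan] of the cell being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])
        elif tag in ("td", "th"):
            a = dict(attrs)
            # Missing or empty span attributes default to 1.
            self._cell = ["", int(a.get("rowspan") or 1), int(a.get("colspan") or 1)]

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            text, rowspan, colspan = self._cell
            self.rows[-1].append((text.strip(), rowspan, colspan))
            self._cell = None

    def handle_data(self, data):
        # Text outside a cell (whitespace between tags) is ignored.
        if self._cell is not None:
            self._cell[0] += data


body = """
<tbody>
<tr><td rowspan=2 colspan=2>1.1 </td><td>1.3 </td></tr>
<tr><td> 2.3 </td></tr>
<tr><td> 3.1 </td><td colspan=2> 3.2 </td></tr>
</tbody>
"""

parser = CellCollector()
parser.feed(body)
for row in parser.rows:
    print(row)
# [('1.1', 2, 2), ('1.3', 1, 1)]
# [('2.3', 1, 1)]
# [('3.1', 1, 1), ('3.2', 1, 2)]
```

A loader that lines these cells up starting from column 0 would put 2.3 under the first column instead of the third, which is the left shift described at the top of this thread.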
FYI: I found a tool that claims to handle reading tables with colspan/rowspan: https://github.com/rocheio/wiki-table-scrape It works if a table has only one kind of span, but my simple tests suggest it doesn't properly handle both kinds of span on the same cell: it seems to miss the lower-right corner of a cell with colspan=2 rowspan=2.
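A cell with both spans covers a full rectangle of slots, so every combination of spanned row and spanned column has to be filled, not the spanned row and the spanned column separately. A tiny sketch of that footprint (the helper name `covered_slots` is hypothetical):

```python
def covered_slots(row, col, rowspan, colspan):
    """Return every (row, col) slot covered by a cell anchored at (row, col).

    The cell covers a full rowspan x colspan rectangle, so rowspan=2 and
    colspan=2 give four slots, including the lower-right corner.
    """
    return [
        (r, c)
        for r in range(row, row + rowspan)
        for c in range(col, col + colspan)
    ]


print(covered_slots(0, 0, rowspan=2, colspan=2))
# [(0, 0), (0, 1), (1, 0), (1, 1)]
```

Handling the two spans independently, for example copying a cell rightward for colspan and then downward only in its anchor column for rowspan, would skip (1, 1). That is one plausible cause of a missing lower-right corner, though this sketch does not reproduce that tool's actual code.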
VisiData has an alternate way to read HTML with pandas, so I tried that, but I found a new bug: #1986
The pandas reader also seems to have issues with some of the tables I want to be able to read. Here is a deep dive into how to parse HTML tables, including algorithms: https://html.spec.whatwg.org/multipage/tables.html#table-processing-model
One thing I noticed while reading it is that a table is modeled as a 2-D grid of slots, very much like VisiData. A slot can be empty, or it can be occupied by one or more cells (e.g. TD/TH). Each cell occupies the first free slot it reaches in its row, and may occupy more, but only to the right and downward, because of colspan/rowspan:
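The slot model can be sketched in a few lines. This is a simplified take on the WHATWG table processing model linked above, under assumptions of my own: the input is a list of rows of (text, rowspan, colspan) tuples, and every spanned slot gets a copy of the cell's text, which is the expected result from the issue body rather than something the spec requires (the spec only records that those slots are covered by the cell). It does not handle rowspan=0 ("span to the end of the row group") or overlapping cells. The function name `build_grid` is hypothetical.

```python
def build_grid(rows):
    """Place (text, rowspan, colspan) cells into a 2-D grid of slots.

    Each cell is anchored at the first slot in its row that is not already
    occupied (e.g. by a rowspan from an earlier row), then fills
    rowspan x colspan slots to the right and downward. Spanned slots get a
    copy of the cell's text; slots no cell reaches stay None.
    """
    occupied = {}  # (row, col) -> text
    for r, cells in enumerate(rows):
        c = 0
        for text, rowspan, colspan in cells:
            # Skip slots already claimed by rowspans from earlier rows.
            while (r, c) in occupied:
                c += 1
            for dr in range(rowspan):
                for dc in range(colspan):
                    occupied[(r + dr, c + dc)] = text
            c += colspan
    if not occupied:
        return []
    nrows = max(r for r, _ in occupied) + 1
    ncols = max(c for _, c in occupied) + 1
    return [[occupied.get((r, c)) for c in range(ncols)] for r in range(nrows)]


# Body rows of the example table from this thread.
body_rows = [
    [("1.1", 2, 2), ("1.3", 1, 1)],
    [("2.3", 1, 1)],
    [("3.1", 1, 1), ("3.2", 1, 2)],
]
for row in build_grid(body_rows):
    print(row)
# ['1.1', '1.1', '1.3']
# ['1.1', '1.1', '2.3']
# ['3.1', '3.2', '3.2']
```

With the spans expanded this way, 2.3 lands in the third column and the rowspan/colspan cell fills its full 2x2 block, matching how w3m lays the table out.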
Small description
HTML table loading doesn't handle rowspan properly.
Expected result
The data in the rowspan column is duplicated on the spanned rows.
Actual result with screenshot
https://asciinema.org/a/qotdqplkmpKJxPUQUC5kaGHEd
The animation shows the HTML files, how w3m renders the data, and how VisiData shows the data.
The columns with the errors have this exception:
Steps to reproduce with sample data and a .vd
I would expect the data in columns 2 & 3 to be the same for both of these tables.
Regular: 3x3.html (works)
With rowspan: 3x3-rowspan.html (breaks):
Additional context
Please include the version of VisiData. Using the latest version from the develop branch.
I have data sources that I try to use with VisiData that use rowspan to format HTML tables.
There is code in the html loader to handle rowspan for column headers.
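Once rowspan/colspan have been expanded into a full grid of slots, several header rows can be collapsed into one name per column, which is what VisiData appears to do for the header rows in the example table. The sketch below is a hypothetical illustration (the name `flatten_header`, the rule of dropping consecutive repeats, and the space separator are all my assumptions), not the html loader's actual code.

```python
def flatten_header(grid, sep=" "):
    """Collapse several expanded header rows into one name per column.

    Within each column, consecutive repeats (the copies a rowspan leaves
    behind) are dropped, and the remaining non-empty values are joined
    with `sep`.
    """
    names = []
    for col in zip(*grid):
        parts = []
        for value in col:
            if value and (not parts or parts[-1] != value):
                parts.append(value)
        names.append(sep.join(parts))
    return names


# Header rows of the example table, already expanded into slots.
header_grid = [
    ["1.1", "1.1", "1.3"],
    ["1.1", "1.1", "2.3"],
    ["3.1", "3.2", "3.2"],
]
print(flatten_header(header_grid))
# ['1.1 3.1', '1.1 3.2', '1.3 2.3 3.2']
```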