The reason as far as I can tell is that, when a tag contains a child tag as its first element, its 'string' property is None. The alternative is to use the 'text' property, which is at least an empty string if there is no text content, but also properly picks up the text that follows the initial child tag. Here's an example that produced the exception during parsing:
In [1]: import bs4

In [2]: q = bs4.BeautifulSoup("""<td><span class="custom_link" onmouseover="lovd_showToolTip('<A href=\'http://www.ncbi.nlm.nih.gov/clinvar/?term=126217\' target=\'_blank\'>ClinVar-126217</A>', this);">ClinVar-126217</span>; Tea et al. 2014</td>""")
# warning about specifying "html.parser" omitted for brevity

In [3]: print q.string
None

In [4]: print q.text
ClinVar-126217; Tea et al. 2014
I'm unsure if returning what text can be extracted here is correct, or if it should simply return an empty string. I've gone with using columns.text in my patch, which I'll also submit as a PR for you to modify at your discretion.
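To make the two options concrete, here is a minimal sketch of a helper that supports both behaviours. The name cell_text and the keep_nested_text flag are hypothetical and are not part of leiden_database.py; the only Beautiful Soup features assumed are the .string attribute and the get_text() method. The demo at the bottom uses a simplified version of the cell from the example above and needs the third-party bs4 package.

```python
def cell_text(cell, keep_nested_text=True):
    """Return the stripped text of a table cell without raising on None.

    `cell` is expected to be a bs4 Tag. This helper and its flag are
    illustrative assumptions, not code taken from the leiden project.
    """
    if keep_nested_text:
        # get_text() joins the text of all descendants and yields an
        # empty string when the tag holds no text at all.
        return cell.get_text().strip()
    # .string can be None (for example, when the tag has several
    # children), so guard before calling .strip().
    value = cell.string
    return value.strip() if value is not None else ""


if __name__ == "__main__":
    try:
        from bs4 import BeautifulSoup
    except ImportError:
        print("bs4 is not installed; skipping the demo")
    else:
        html = "<td><span>ClinVar-126217</span>; Tea et al. 2014</td>"
        cell = BeautifulSoup(html, "html.parser").td
        print(repr(cell_text(cell)))
        print(repr(cell_text(cell, keep_nested_text=False)))
```

The default mirrors the columns.text approach taken in the patch. Passing keep_nested_text=False gives the stricter alternative, where a cell with mixed content yields an empty string.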
When downloading BRCA2 gene data using extract_data.py, an AttributeError is raised when .strip() is called on columns.string at this line: https://github.com/andrewhill157/leiden/blob/master/leiden/leiden_database.py#L594.