
Nested (infobox) templates result in empty pages #23

Open
IsaacHaze opened this issue Dec 4, 2014 · 1 comment

@IsaacHaze
Contributor

No links are found for the sample pages "Andre Agassi" and "Groen (partij)".

In [1]: from semanticizest.parse_wikidump import clean_text

In [2]: clean_text("""{{ def }}abc""")
Out[2]: 'abc'

In [3]: clean_text("""{{ def {{123}} }}abc""")
Out[3]: ' }}abc'
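The actual `_UNWANTED` pattern is not shown in this issue. As a hypothetical stand-in, a lazy `{{.*?}}` match reproduces Out[3]. It stops at the first `}}` it sees, which closes the inner template, so the outer template's closing braces are left behind:

```python
import re

# Hypothetical stand-in for the template part of _UNWANTED (not the real pattern):
# a lazy match from "{{" to the FIRST "}}" that follows it.
LAZY_TEMPLATE = re.compile(r"\{\{.*?\}\}")

result = LAZY_TEMPLATE.sub("", "{{ def {{123}} }}abc")
print(repr(result))  # ' }}abc'
```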

In [4]: clean_text("""{{ def
   ...: | asd = [[34]]
   ...: | wqe = {{be|blaat}}
   ...: | vrouwen =
   ...: }}
   ...: [[nep:perd|0px]]
   ...: abc
   ...: """)
Out[4]: '\n'

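The _UNWANTED regex needs tweaking: as Out[3] shows, it ends a template at the first }} instead of the matching one, so nested templates are not removed correctly.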
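One possible tweak, sketched here as an assumption rather than the project's actual fix: match only innermost templates (a `{{ ... }}` with no braces inside) and repeat the substitution until nothing changes, so nested templates are peeled from the inside out. `strip_templates` and `_INNER_TEMPLATE` are hypothetical names, not part of semanticizest:

```python
import re

# Hypothetical pattern: an innermost template, i.e. "{{ ... }}" with no "{" or "}" inside.
# [^{}] also matches newlines, so multi-line infoboxes are covered.
_INNER_TEMPLATE = re.compile(r"\{\{[^{}]*\}\}")


def strip_templates(text: str) -> str:
    """Remove (possibly nested) {{...}} templates, innermost first."""
    while True:
        text, n = _INNER_TEMPLATE.subn("", text)
        if n == 0:  # no template left to remove
            return text


infobox = (
    "{{ def\n"
    "| asd = [[34]]\n"
    "| wqe = {{be|blaat}}\n"
    "| vrouwen = \n"
    "}}\n"
    "[[nep:perd|0px]]\n"
    "abc\n"
)
print(repr(strip_templates("{{ def {{123}} }}abc")))  # 'abc'
print(repr(strip_templates(infobox)))  # '\n[[nep:perd|0px]]\nabc\n'
```

Each pass shortens the text, so the loop terminates. A template whose body contains a lone `{` or `}` (for example wikitable syntax `{|`) will not be matched by this pattern; handling that would need a scanner that tracks brace depth instead of a regex.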

IsaacHaze self-assigned this Dec 4, 2014
@larsmans
Contributor

larsmans commented Dec 4, 2014

Meh... stupid nested wikisyntax...
