Todo for v0.3.0 #12

Closed
betatim opened this issue Jun 30, 2020 · 4 comments · Fixed by #13 or #14

Comments

@betatim
Owner

betatim commented Jun 30, 2020

Things to do for v0.3.0

@betatim
Owner Author

betatim commented Jun 30, 2020

For the stretch goal: I'd like to have an outline/table of contents like in this picture:

[image: example outline/table of contents]

To make it happen we probably need PdfFileWriter.addBookmark() and a way to figure out where on the page each <h1> tag is, so we can jump to it. For now I'd just list the <h1> tags and deal with nesting later.

@adavidzh

adavidzh commented Jul 1, 2020

> For the stretch goal: I'd like to have a outline/table of contents like in this picture:
> [...]
> To make it happen we probably need PdfFileWrite.addBookmark() and a way to figure out where on the page the <h1> tag is so we can jump to it. For now I'd just list <h1> tags and deal with nesting later.

I agree that pages are arbitrary and bookmarks would be great.

FWIW, https://site/file.pdf#page=N is a staple for me when sharing information, and I wonder if one could make use of a more granular way of linking into a PDF document.
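Once each heading has a 0-based page index (the index a bookmark call would use), per-section links of the kind mentioned above could be generated. A minimal sketch, assuming the PDF viewer honours the `#page=` open parameter, which counts pages from 1. For the more granular variant it assumes the viewer supports `#nameddest=` (defined in Adobe's PDF Open Parameters; support varies between viewers) and that the PDF actually defines a named destination with that name. `pdf_page_link` and `pdf_nameddest_link` are hypothetical helper names.

```python
from urllib.parse import quote


def pdf_page_link(base_url: str, page_index: int) -> str:
    """Link to a page given a 0-based index.

    The #page= open parameter is 1-based, hence the +1.
    """
    if page_index < 0:
        raise ValueError("page_index must be >= 0")
    return f"{base_url}#page={page_index + 1}"


def pdf_nameddest_link(base_url: str, dest_name: str) -> str:
    """Link to a named destination; only useful if the PDF defines `dest_name`."""
    # Percent-encode everything (including '/' and '#') so the fragment stays intact.
    return f"{base_url}#nameddest={quote(dest_name, safe='')}"


if __name__ == "__main__":
    print(pdf_page_link("https://site/file.pdf", 0))  # https://site/file.pdf#page=1
    print(pdf_nameddest_link("https://site/file.pdf", "Getting started"))
    # https://site/file.pdf#nameddest=Getting%20started
```

Page links only need the page index, while named destinations would also require writing a destination into the PDF for each heading.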

@betatim
Owner Author

betatim commented Jul 1, 2020

```javascript
function getOffset( el ) {
    var _x = 0;
    var _y = 0;
    while( el && !isNaN( el.offsetLeft ) && !isNaN( el.offsetTop ) ) {
        _x += el.offsetLeft - el.scrollLeft;
        _y += el.offsetTop - el.scrollTop;
        el = el.offsetParent;
    }
    return { top: _y, left: _x };
}
```

will compute an element's distance from the top (and left) of a web page, which we can then use with:

```javascript
for (const elem of document.getElementsByTagName("h1")) {
    console.log(elem, getOffset(elem).top, elem.innerText)
}
```

to get the positions of all the <h1>s on the page. Once we have this information, we need to return the position and text from Chromium to Python and then call addBookmark().
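Between "position" and addBookmark() sits one conversion: a vertical offset in CSS pixels has to become a page index. A rough sketch, assuming the on-screen layout matches the printed layout (same content width, print media emulated), a fixed page size and margins, a uniform print scale, and no forced page breaks. CSS defines 1in as 96px. `printable_height_px` and `offset_to_page_index` are hypothetical helper names.

```python
CSS_PX_PER_INCH = 96  # CSS fixes 1in = 96px


def printable_height_px(page_height_in: float,
                        margin_top_in: float = 0.0,
                        margin_bottom_in: float = 0.0,
                        scale: float = 1.0) -> float:
    """Height of one page's content area, in CSS pixels of the laid-out document.

    With a print scale below 1, more layout pixels fit on each page.
    """
    content_in = page_height_in - margin_top_in - margin_bottom_in
    if content_in <= 0 or scale <= 0:
        raise ValueError("margins leave no room or scale is not positive")
    return content_in * CSS_PX_PER_INCH / scale


def offset_to_page_index(top_px: float, page_height_px: float) -> int:
    """0-based index of the page on which an element at `top_px` starts.

    Naive: ignores page breaks that push content onto the next page.
    """
    # Negative offsets (e.g. content scrolled above the origin) clamp to page 0.
    return max(0, int(top_px // page_height_px))


if __name__ == "__main__":
    letter = printable_height_px(11)           # US Letter, no margins -> 1056.0
    print(offset_to_page_index(0, letter))     # 0
    print(offset_to_page_index(1056, letter))  # 1
    print(offset_to_page_index(2500, letter))  # 2
```

If headings are styled with page breaks, or the print layout reflows at a different width, these indices will drift, so they are best treated as an approximation to check against the generated PDF.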

@betatim
Owner Author

betatim commented Jul 1, 2020

This is the Python (using pyppeteer's page.evaluate) we need to do this:

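```python
# Define getOffset() as a global function in the page. With force_expr=True the
# string is evaluated as a script, not called as a function, so the
# declaration stays available to later evaluate() calls.
await page.evaluate("""
function getOffset(el) {
    var _x = 0;
    var _y = 0;
    while (el && !isNaN(el.offsetLeft) && !isNaN(el.offsetTop)) {
        _x += el.offsetLeft - el.scrollLeft;
        _y += el.offsetTop - el.scrollTop;
        el = el.offsetParent;
    }
    return { top: _y, left: _x };
}
""", force_expr=True)

# Collect the vertical offset and text of every <h1>. The returned array of
# plain objects comes back to Python as a list of dicts.
h1s = await page.evaluate("""() => {
    const vals = [];
    for (const elem of document.getElementsByTagName("h1")) {
        vals.push({ top: getOffset(elem).top, text: elem.innerText });
    }
    return vals;
}""")
```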

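Then h1s will be a list holding the text and the "distance from the top of the page" (in CSS pixels, measured from the top of the rendered document rather than of each printed page) for each <h1> tag.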
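From there, the remaining step is one addBookmark() call per heading. A minimal sketch, assuming a PyPDF2 1.x PdfFileWriter that already contains all the pages (its addBookmark(title, pagenum) takes a 0-based page index; newer pypdf releases rename it to add_outline_item) and a single fixed page height in CSS pixels. `add_h1_bookmarks` is a hypothetical helper; any object with an `addBookmark(title, pagenum)` method works, so the example runs with a stand-in writer instead of PyPDF2.

```python
from typing import Any, Dict, List, Tuple


def add_h1_bookmarks(writer: Any, h1s: List[Dict[str, Any]],
                     page_height_px: float) -> List[Tuple[str, int]]:
    """Add one top-level bookmark per <h1>; return the (title, page_index) pairs used.

    Assumes `writer.addBookmark(title, pagenum)` takes a 0-based pagenum and that
    the writer already holds every page a computed index can refer to.
    """
    added = []
    # Sort by vertical position so the outline follows page order.
    for h1 in sorted(h1s, key=lambda h: h["top"]):
        page_index = max(0, int(h1["top"] // page_height_px))
        # innerText may contain line breaks; collapse whitespace for the outline entry.
        title = " ".join(h1["text"].split())
        if not title:
            continue  # skip empty headings
        writer.addBookmark(title, page_index)
        added.append((title, page_index))
    return added


class _PrintingWriter:
    """Hypothetical stand-in for PdfFileWriter so the sketch runs without PyPDF2."""

    def addBookmark(self, title: str, pagenum: int) -> None:
        print(f"bookmark {title!r} -> page index {pagenum}")


if __name__ == "__main__":
    h1s = [{"top": 2300, "text": "Results"}, {"top": 40, "text": "Intro\n"}]
    add_h1_bookmarks(_PrintingWriter(), h1s, page_height_px=1056)
```

Keeping every bookmark at the top level matches the "just list <h1> tags" plan. Nesting would mean passing a parent bookmark later.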

2 participants