Go through notes #1

Open
MichaelCurrin opened this issue Jun 2, 2020 · 0 comments


From plan.txt Dec 2018

Aim:
    Recover URLs which I want to bookmark
        Requires manually reading each page
            Some can be ignored if an area has too much repetition or I don't need them
        Could add to bookmarks to avoid duplication and sort manually into folders
    Make them easy to find
    Read them
    Generate a one-off report as CSV
        Need domains and pages together
        But also group by period visited - column for page to filter by
        Counts instead of actual dates
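
A minimal sketch of the one-off CSV report described above, assuming the dump can be reduced to a plain list of visited URL strings. The names `build_report` and `write_report` and the column names are hypothetical, and grouping by period is left out here.

```python
import csv
import io
from collections import Counter
from urllib.parse import urlsplit


def build_report(visits):
    """Count visits per (domain, page) pair.

    `visits` is assumed to be an iterable of URL strings, one per visit.
    """
    counts = Counter()
    for url in visits:
        parts = urlsplit(url)
        # Use "/" for the page when the URL has no path.
        counts[(parts.netloc, parts.path or "/")] += 1
    return counts


def write_report(counts, out_file):
    """Write one CSV row per (domain, page), with a count instead of dates."""
    writer = csv.writer(out_file)
    writer.writerow(["domain", "page", "visits"])
    for (domain, page), n in sorted(counts.items()):
        writer.writerow([domain, page, n])


# Usage with made-up URLs, written to an in-memory buffer.
buffer = io.StringIO()
write_report(
    build_report([
        "https://example.com/a?x=1",
        "https://example.com/a",
        "https://example.org",
    ]),
    buffer,
)
```

Keeping domain and page as separate columns lets the CSV be filtered or pivoted on either one in a spreadsheet.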

Using a frozen dump to recover tabs from the past year. Afterwards they are sent to bookmarks.


Parsing
    from urllib.parse import urlparse, ParseResult

    urlparse('')
    => ParseResult(scheme='', netloc='', path='', params='', query='', fragment='')

    x = ParseResult(*('scheme', 'netloc', 'path', 'params', 'query', 'fragment'))
    x.geturl()
    => 'scheme://netloc/path;params?query#fragment'
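
As a rough illustration of how the parsing above could feed the "domains and pages" split, here is a small sketch. The helper name `split_url` and the example URL are assumptions, not something from the notes.

```python
from urllib.parse import urlparse


def split_url(url):
    """Return (domain, page) for a URL, dropping query and fragment."""
    parts = urlparse(url)
    return parts.netloc, parts.path


original = "https://example.com/docs/page.html?q=1#top"
domain, page = split_url(original)

# geturl() rebuilds the URL from its parsed components.
rebuilt = urlparse(original).geturl()
```

Here `domain` is the `netloc` component and `page` is the `path` component, so two visits that differ only in query string would count as the same page.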



Unicode
    Errors were just in VC maybe? PyCharm is fine.

    TODO: Find out what encoding is used, in order to handle Unicode characters which appear in URLs (such as the equals sign) and possibly emojis, or at least show emojis as ASCII.
    Some titles contain emojis. Normal Unicode characters can only be parsed after emojis are replaced.
    Some URLs are broken
    Check for which sites
    URL can be found using title and domain search.

    https://stackoverflow.com/questions/33485255/python-decoding-a-string-that-consists-of-both-unicode-code-points-and-unicode
    Input either
        codecs.decode('\\u002d', 'unicode_escape')
        '\u002d'
    Gives
        '-'
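
A possible alternative to `unicode_escape` for titles that mix real Unicode text with literal `\uXXXX` sequences: replace only the escape sequences with a regex, then join surrogate pairs so emojis come out as single characters. This is a sketch under the assumption that emojis in the dump appear as two consecutive `\uXXXX` surrogate escapes; `decode_u_escapes` is a hypothetical name.

```python
import re

# Matches a literal backslash, "u", and four hex digits.
_ESCAPE = re.compile(r"\\u([0-9a-fA-F]{4})")


def decode_u_escapes(text):
    """Decode literal \\uXXXX sequences, leaving other characters untouched."""
    replaced = _ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), text)
    # Surrogate pairs are still two separate code units at this point.
    # A UTF-16 round trip merges each pair into one code point.
    # An unpaired surrogate would raise UnicodeDecodeError here.
    return replaced.encode("utf-16", "surrogatepass").decode("utf-16")
```

Unlike passing the whole string through `unicode_escape`, this leaves any non-ASCII characters already present in the title as they are.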

Categories from transitions
{
    'LINK': 11626,
    'TYPED': 731,
    'AUTO_BOOKMARK': 127,
    'RELOAD': 2313,  # Worth keeping just in case.

    'GENERATED': 588,  # Generated - Google searches
    'AUTO_TOPLEVEL': 579,  # Chrome native - ignore
    'FORM_SUBMIT': 528,
}
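
Counts like these could be produced with a `Counter` over the transition label of each visit. The sketch below assumes the labels are already available as strings and that only `AUTO_TOPLEVEL` is dropped, as the note above suggests; the names `IGNORED` and `count_transitions` are hypothetical.

```python
from collections import Counter

# Assumption: only Chrome-native top-level loads are excluded.
IGNORED = {"AUTO_TOPLEVEL"}


def count_transitions(transitions, ignored=IGNORED):
    """Count visits per transition type, skipping ignored types."""
    return Counter(t for t in transitions if t not in ignored)


sample = ["LINK", "LINK", "TYPED", "AUTO_TOPLEVEL", "RELOAD"]
sample_counts = count_transitions(sample)
```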


Firebase could be a backend accessible from anywhere, but a frontend would still need to be set up on the work and home laptops, and be reachable by cellphone if using that.