update porcelain.status to output string #890

DedSecer · 2021-07-29T14:21:19Z

Issues #889

jelmer · 2021-07-29T22:17:14Z

Unfortunately it's not as simple as this - this changes the API of porcelain.status(), breaking any existing callers.

codecov · 2021-07-30T01:21:28Z

Codecov Report

Merging #890 (e58448b) into master (2f9248d) will increase coverage by 0.00%.
The diff coverage is 88.88%.

@@           Coverage Diff           @@
##           master     #890   +/-   ##
=======================================
  Coverage   84.64%   84.64%           
=======================================
  Files          91       91           
  Lines       22349    22352    +3     
  Branches     2403     2406    +3     
=======================================
+ Hits        18917    18920    +3     
  Misses       3009     3009           
  Partials      423      423

| Impacted Files | Coverage Δ | |
|---|---|---|
| dulwich/cli.py | `0.00% <0.00%> (ø)` | |
| dulwich/porcelain.py | `81.17% <100.00%> (+0.07%)` | ⬆️ |
| dulwich/tests/test_porcelain.py | `99.76% <100.00%> (ø)` | |

Δ = absolute <relative> (impact), ø = not affected, ? = missing data

jelmer · 2021-07-30T08:14:23Z

This still changes the API - any code in applications that use dulwich that expected bytestrings will now receive plain strings and break.

DedSecer · 2021-07-30T08:18:20Z

So shouldn't we change porcelain.status at all?

jelmer · 2021-08-14T15:16:48Z

We need to figure out a path forward - if we change this, it's going to break everybody who relies on this API.

There are a couple of possibilities:

- We break it, but do so at a major release and announce it as part of the API changes
- We add a flag to enable the new behaviour; this is a little bit icky since we'll have to support both behaviours going forward
- We can add another status() function that has the new behaviour
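
For illustration, the second and third possibilities above could look roughly like this. It is only a sketch under stated assumptions, not dulwich's actual code: `_collect_unstaged`, `status_with_flag`, `status_text`, and the `text` flag are hypothetical names, and the stand-in helper returns hard-coded paths instead of reading a repository.

```python
from typing import List, Union


def _collect_unstaged() -> List[bytes]:
    """Hypothetical stand-in for the existing logic, which yields bytes paths."""
    return [b"README.md", b"caf\xc3\xa9.txt"]


def status_with_flag(text: bool = False) -> Union[List[bytes], List[str]]:
    """Flag variant: existing callers keep getting bytes unless they opt in."""
    paths = _collect_unstaged()
    if text:
        # Assumption: UTF-8 with surrogateescape, so callers can recover the bytes.
        return [p.decode("utf-8", "surrogateescape") for p in paths]
    return paths


def status_text() -> List[str]:
    """Separate-function variant: new name, new return type, old API untouched."""
    return [p.decode("utf-8", "surrogateescape") for p in _collect_unstaged()]
```

Either way, the default return type stays bytes, so existing callers would not see a change; the cost is a second code path to maintain.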

jelmer · 2021-08-14T15:14:14Z

dulwich/porcelain.py

        # 2. Get status of unstaged
        index = r.open_index()
        normalizer = r.get_blob_normalizer()
        filter_callback = normalizer.checkin_normalize
        unstaged_changes = list(get_unstaged_changes(index, r.path, filter_callback))
+        unstaged_changes = [file.decode() for file in unstaged_changes]


This is going to decode with the default encoding (UTF-8 for a bare `bytes.decode()`), which fails if the filename contains bytes that aren't valid in that encoding. Even when decoding succeeds, it may not result in the string you were expecting (file system encoding and display encoding may be different).
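
To make the concern concrete, here is a small sketch with a made-up filename. The filename and the choice of Latin-1 are illustrative assumptions, not taken from any real repository.

```python
# Hypothetical filename "café.txt" as a Latin-1 file system would store it.
raw = "café.txt".encode("latin-1")  # b'caf\xe9.txt'

# Case 1: the bytes are not valid UTF-8, so a bare decode() raises.
try:
    raw.decode()  # bytes.decode() defaults to UTF-8 in Python 3
    decoded_ok = True
except UnicodeDecodeError:
    decoded_ok = False

# Case 2: decoding succeeds but produces the wrong text (mojibake).
utf8_raw = "café.txt".encode("utf-8")
wrong = utf8_raw.decode("latin-1")  # no exception, but not the original name
```

The first case fails loudly; the second is arguably worse because nothing signals that `wrong` no longer matches the name on disk.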

Can we set the default encoding in the script that uses dulwich to resolve this?

That would be inconsistent with the way the rest of Dulwich works. And if you're making the caller worry about the encoding, why not let them do the decoding as well?

Generally, the bytes can be decoded with the default encoding. If that doesn't work, we just need to put # -*- coding: xxx -*- at the top of the file. (I think if the default encoding is incorrect, there will be other problems too.)

The encoding on top of the file is for the source file itself, not for decoding at run time.
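
A small sketch of that distinction, assuming a throwaway module compiled from bytes (the module contents and the `<coding-demo>` name are made up for the demonstration). The declaration changes how the literal in the source is read, but a bare `decode()` at run time still uses UTF-8.

```python
# Source bytes for a tiny module that declares Latin-1 as its source encoding.
source = (
    b"# -*- coding: latin-1 -*-\n"
    b"literal = '\xe9'\n"        # raw byte 0xE9 inside a str literal
    b"try:\n"
    b"    b'\\xe9'.decode()\n"   # run-time decode of the same byte value
    b"    runtime_ok = True\n"
    b"except UnicodeDecodeError:\n"
    b"    runtime_ok = False\n"
)

namespace = {}
exec(compile(source, "<coding-demo>", "exec"), namespace)

# The declaration was honoured when parsing the literal...
parsed_literal = namespace["literal"]
# ...but it had no effect on bytes.decode() at run time.
runtime_ok = namespace["runtime_ok"]
```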

If you're a user of Dulwich and you have a repo that happens to use latin1 as encoding while your current encoding is utf8, how would that work?

But if the current encoding doesn't match the default encoding, porcelain.status will still output unreadable bytes. What should we do then?

For some use cases, bytes are sufficient. If you want to display the output, you can also choose what to do in that case - you can ignore invalid characters, replace them with surrogates or e.g. use chardet to try to figure out the proper encoding if it isn't utf8.
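
These caller-side choices map onto the `errors` argument of `bytes.decode()`. A minimal sketch with a made-up Latin-1 filename follows; `decode_for_display` and its fixed fallback list are illustrative assumptions, and a detector such as chardet could take the place of the hard-coded list.

```python
raw = b"caf\xe9.txt"  # hypothetical filename: Latin-1 bytes, invalid as UTF-8

ignored = raw.decode("utf-8", errors="ignore")    # silently drops the bad byte
replaced = raw.decode("utf-8", errors="replace")  # substitutes U+FFFD


def decode_for_display(data: bytes, encodings=("utf-8", "latin-1")) -> str:
    """Try each candidate encoding in turn (hypothetical helper)."""
    for encoding in encodings:
        try:
            return data.decode(encoding)
        except UnicodeDecodeError:
            continue
    # Nothing matched: fall back to a lossy but safe rendering.
    return data.decode("utf-8", errors="replace")


shown = decode_for_display(raw)
```

Both `ignore` and `replace` are lossy, so they suit display only; the original bytes are still needed for any further file or repository operation.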

I'd be open to returning a surrogateescape-decoded string here, since it's possible for the caller to undo that and retrieve the original bytes, but that would still be a breaking change for existing users of the API. We should also force utf8 in that case (since that's the convention for git repositories) rather than whatever the user's encoding is.
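
A short sketch of the round trip that makes surrogateescape attractive, using a made-up non-UTF-8 filename:

```python
raw = b"caf\xe9.txt"  # hypothetical filename that is not valid UTF-8

# The undecodable byte 0xE9 becomes the lone surrogate U+DCE9.
text = raw.decode("utf-8", errors="surrogateescape")

# The caller can undo the decoding and get the exact original bytes back.
restored = text.encode("utf-8", errors="surrogateescape")

# A strict encode (what most output paths do) rejects the lone surrogate.
try:
    text.encode("utf-8")
    strict_ok = True
except UnicodeEncodeError:
    strict_ok = False
```

The string is lossless but not directly printable everywhere, which is one more reason callers would need to know which behaviour they are getting.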

I think the functions in porcelain should work like the git subcommands (as higher-level functions); otherwise I have to decode the bytes myself after calling porcelain.status.

update porcelain.status to output string

8bfcb13

DedSecer added 2 commits July 30, 2021 09:04

update API of porcelain.status

a7cf333

fix sytle checking

e58448b

jelmer reviewed Aug 14, 2021

DedSecer requested a review from jelmer August 22, 2021 03:32

jelmer marked this pull request as draft October 7, 2021 22:14

jelmer force-pushed the master branch 2 times, most recently from f1ae053 to cd30df4 on October 16, 2024 12:55

jelmer Aug 14, 2021

DedSecer Aug 15, 2021

jelmer Aug 30, 2021

DedSecer Oct 9, 2021

jelmer Oct 9, 2021

DedSecer Oct 9, 2021

DedSecer Oct 29, 2021

jelmer Oct 30, 2021

jelmer Oct 30, 2021

DedSecer Oct 30, 2021
