Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: presence of trailing comma (#416) #418

Open
wants to merge 3 commits into
base: main
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
21 changes: 21 additions & 0 deletions crates/core/src/inline_snippets.rs
Original file line number Diff line number Diff line change
Expand Up @@ -300,6 +300,27 @@ fn delete_hanging_comma(
replacements: &mut [(EffectRange, String)],
offset: usize,
) -> Result<(String, Vec<ReplacementInfo>)> {
// Handle the case when after applying replacements, a single comma
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is completely incorrect. You should not be assuming the comma is on a single line.

The original bug is also present even if arguments are on 1 line: https://app.grit.io/studio?key=-Pm14wW6mvH0S41J_hFh8

Please think logically, don't just try to brute force the single case.

Copy link
Contributor Author

@abhishek818 abhishek818 Jul 15, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@morgante
you mean when it is like this? :

bibtex_citation="",
n_samples={_EVAL_SPLIT: 1820}, avg_character_length={_EVAL_SPLIT: 78},

then my current fix prints with a extra space:

bibtex_citation="",
 stats=GeneralDescriptiveStats(n_samples={_EVAL_SPLIT: 1820}, avg_character_length={_EVAL_SPLIT: 78}),

But this is due to the single space present between both the matching words (just after the comma of first argument)

Copy link
Contributor Author

@abhishek818 abhishek818 Jul 15, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't find any other edge cases (except for arguments on multiple lines which works)
Since I am still new to gritql, can you provide a case where it fails.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Your fix completely fails in cases where the arguments are on 1 line. The original bug isn't related to line handling at all and introducing that is a fundamental mistake in your fix.


/// Same as above test, but ensures the behavior doesn't depend on line breaks
#[test]
fn trailing_comma_after_argument_removal_one_line() {
    run_test_expected({
        TestArgExpected {
            pattern: r#"
                language python
                `TaskMetadata($args)` where {
                        $args <: any {
                            contains `n_samples=$_` as $ns_kwarg where {
                                $ns_kwarg <: `n_samples = $ns_val` => .
                            },
                            contains `avg_character_length=$_` as $avg_kwarg where {
                                $avg_kwarg <: `avg_character_length = $avg_val` => `stats=GeneralDescriptiveStats(n_samples=$ns_val, avg_character_length=$avg_val)`
                            },
                        },
                    }
                "#.to_owned(),
            source: r#"
                TaskMetadata(description="Parallel news titles from the Tbilisi City Hall website (https://tbilisi.gov.ge/).", main_score="f1", domains=["News"], text_creation="created", n_samples={_EVAL_SPLIT: 1820}, reference="https://huggingface.co/datasets/jupyterjazz/tbilisi-city-hall-titles")
            "#
                .to_owned(),
            expected: r#"
                TaskMetadata(description="Parallel news titles from the Tbilisi City Hall website (https://tbilisi.gov.ge/).", main_score="f1", domains=["News"], text_creation="created",   reference="https://huggingface.co/datasets/jupyterjazz/tbilisi-city-hall-titles")
            "#
                .to_owned(),
        }
    })
    .unwrap();
}

Also has nothing to do with GritQL, you don't need to change the pattern to see how this is broken.

// gets left behind on a single line
let mut temp_code = code.to_string();
for (range, snippet) in replacements.iter_mut() {
let adjusted_range = adjust_range(&range.effective_range(), offset, &temp_code)?;
if adjusted_range.start > temp_code.len() || adjusted_range.end > temp_code.len() {
bail!("Range {:?} is out of bounds for code:\n{}\n", adjusted_range, temp_code);
}

temp_code.replace_range(adjusted_range.clone(), snippet);

let line_start = temp_code[..adjusted_range.start].rfind('\n').map_or(0, |pos| pos + 1);
let line_end = temp_code[adjusted_range.start..].find('\n').map_or(temp_code.len(), |pos| adjusted_range.start + pos);
let line_content = temp_code[line_start..line_end].trim();

if line_content == "," {
// inclusion of "," index in replacements
range.range.end += 1;
}
}

let deletion_ranges = replacements
.iter()
.filter_map(|r| {
Expand Down
99 changes: 99 additions & 0 deletions crates/core/src/test.rs
Original file line number Diff line number Diff line change
Expand Up @@ -12014,6 +12014,105 @@ fn trailing_comma_import_from_python_with_alias() {
.unwrap();
}

// refer https://github.com/getgrit/gritql/issues/416
#[test]
fn trailing_comma_after_argument_removal() {
run_test_expected({
TestArgExpected {
pattern: r#"
language python
`TaskMetadata($args)` where {
$args <: any {
contains `n_samples=$_` as $ns_kwarg where {
$ns_kwarg <: `n_samples = $ns_val` => .
},
contains `avg_character_length=$_` as $avg_kwarg where {
$avg_kwarg <: `avg_character_length = $avg_val` => `stats=GeneralDescriptiveStats(n_samples=$ns_val, avg_character_length=$avg_val)`
},
},
}
"#.to_owned(),
source: r#"
from pydantic import BaseModel


class TaskMetadata(BaseModel):
n_samples: dict[str, int]
avg_character_length: dict[str, float]


if __name__ == "__main__":
TaskMetadata(
name="TbilisiCityHallBitextMining",
dataset={
"path": "jupyterjazz/tbilisi-city-hall-titles",
"revision": "798bb599140565cca2dab8473035fa167e5ee602",
},
description="Parallel news titles from the Tbilisi City Hall website (https://tbilisi.gov.ge/).",
type="BitextMining",
category="s2s",
eval_splits=[_EVAL_SPLIT],
eval_langs=_EVAL_LANGS,
main_score="f1",
domains=["News"],
text_creation="created",
n_samples={_EVAL_SPLIT: 1820},
reference="https://huggingface.co/datasets/jupyterjazz/tbilisi-city-hall-titles",
date=("2024-05-02", "2024-05-03"),
form=["written"],
task_subtypes=[],
license="Not specified",
socioeconomic_status="mixed",
annotations_creators="derived",
dialect=[],
bibtex_citation="",
avg_character_length={_EVAL_SPLIT: 78},
)
"#
.to_owned(),
expected: r#"
from pydantic import BaseModel


class TaskMetadata(BaseModel):
n_samples: dict[str, int]
avg_character_length: dict[str, float]


if __name__ == "__main__":
TaskMetadata(
name="TbilisiCityHallBitextMining",
dataset={
"path": "jupyterjazz/tbilisi-city-hall-titles",
"revision": "798bb599140565cca2dab8473035fa167e5ee602",
},
description="Parallel news titles from the Tbilisi City Hall website (https://tbilisi.gov.ge/).",
type="BitextMining",
category="s2s",
eval_splits=[_EVAL_SPLIT],
eval_langs=_EVAL_LANGS,
main_score="f1",
domains=["News"],
text_creation="created",

reference="https://huggingface.co/datasets/jupyterjazz/tbilisi-city-hall-titles",
date=("2024-05-02", "2024-05-03"),
form=["written"],
task_subtypes=[],
license="Not specified",
socioeconomic_status="mixed",
annotations_creators="derived",
dialect=[],
bibtex_citation="",
stats=GeneralDescriptiveStats(n_samples={_EVAL_SPLIT: 1820}, avg_character_length={_EVAL_SPLIT: 78}),
)
"#
.to_owned(),
}
})
.unwrap();
}

#[test]
fn python_orphaned_from_imports() {
run_test_expected({
Expand Down
Loading