Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add empty chunk handling, importing python crate dependencies, tx filtering by address, support for new datatypes in python crate #147

Open
wants to merge 13 commits into
base: main
Choose a base branch
from

Conversation

davidthegardens
Copy link

@davidthegardens davidthegardens commented Dec 7, 2023

Motivation

  1. Closes pip installation does not require arrow #137, also pip did not install polars and some other libraries, this was also fixed.
  2. Added filtering transactions by --address, which acts as an OR over filtering by --to-address and --from-address.
  3. Adding the argument --write-empty to prevent from writing empty dataframes to disk
  4. Added support for the new datatypes in the python crate

Solution

  1. added the packages and versions into the build file
  2. created a closure that copies the conditions of to and from filtering, adding the from-address into env.execution, and creating a global vector to log the tx hashes that have been moved (in order to prevent duplicates that arise from async)
  3. added support for the flag, and added a couple lines that checks the dataframes shape. If shape is 0 and --write-empty is false, then it will continue rather than write to disk.
  4. allowed the python crate to take in the new datatypes (like transaction), I can't take much of the credit, this was mostly already supported.

PR Checklist

  • Added Tests
  • Added Documentation
  • Breaking changes

@davidthegardens davidthegardens changed the title various-fixes add empty chunk handling, importing python crate dependencies, added tx filtering option, added new datatype support in python crate Dec 7, 2023
@davidthegardens davidthegardens changed the title add empty chunk handling, importing python crate dependencies, added tx filtering option, added new datatype support in python crate add empty chunk handling, importing python crate dependencies, tx filtering by address, support for new datatypes in python crate Dec 7, 2023
Copy link
Member

@sslivkoff sslivkoff left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hi. finally have some time to go through PR's

some of this PR is ready to merge. other parts need a couple iterations. left some comments (if you would like some parts to be merged faster you can split into multiple PRs)

crates/cli/src/args.rs Outdated Show resolved Hide resolved
crates/freeze/src/datasets/erc20_transfers.rs Outdated Show resolved Hide resolved
let addr_filter: Box<dyn Fn(&Transaction) -> bool + Send> =
if let Some(address) = &request.address {
Box::new(move |tx| {
let mut transactions_set = TRANSACTIONS_SET.lock().unwrap();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what is the reason for using TRANSACTIONS_SET here?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In cases where you prompt 2 or more addresses, anytime both the sender and receiver of the transaction are in --addresses, you will get duplicated transactions in the df. Originally, I used TRANSACTIONS_SET to prevent duplicates.

I've since made things more efficient and avoided global variables by making the list of addresses accessible.
This is how it works now:

from_bool = addresses.contains(from_address)
to_bool = addresses.contains(to_address)
if from_bool and to_bool:
    return from_address==current_address
else:
    return from_address==current_address or to_address==current_address

I'm unsure if the way I'm getting the list of addresses is the best way, but I couldn't figure out how else to do it. Currently, it gets the addresses by parsing 'cli_command' from ExecutionEnv.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice, looking very clean now

additionally I think the request.addresses() will give the relevant info without need for get_addresses(). no need to parse cli arg data, request.addresses() will give you the list of addresses relevant to the current chunk (which might be different from all of the addresses, if youre chunking by address)

Copy link
Author

@davidthegardens davidthegardens Dec 30, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! I can't seem to find .addresses(), in request. There is .address though, which only has one address :/
Since cryo filters transactions by each address separately, I'd need the complete list of addresses in addition to the address for the current chunk to prevent duplicates.

crates/python/pyproject.toml Outdated Show resolved Hide resolved
Copy link
Member

@sslivkoff sslivkoff left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looking nice I think it's almost ready

these various python changes will be very useful for an upcoming python accessibility push I've been working on

let addr_filter: Box<dyn Fn(&Transaction) -> bool + Send> =
if let Some(address) = &request.address {
Box::new(move |tx| {
let mut transactions_set = TRANSACTIONS_SET.lock().unwrap();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice, looking very clean now

additionally I think the request.addresses() will give the relevant info without need for get_addresses(). no need to parse cli arg data, request.addresses() will give you the list of addresses relevant to the current chunk (which might be different from all of the addresses, if youre chunking by address)

crates/python/pyproject.toml Outdated Show resolved Hide resolved
@sslivkoff
Copy link
Member

heyo. everything is looking great on the pr. the last thing is get_addresses(). does request.addresses() have everything you need to avoid the get_addresses() function?

@davidthegardens
Copy link
Author

heyo. everything is looking great on the pr. the last thing is get_addresses(). does request.addresses() have everything you need to avoid the get_addresses() function?

Hey, thank you! So request.addresses() doesn't seem to exist and request.address doesn't have all the information needed unfortunately.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

pip installation does not require arrow
2 participants