Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Using file_account() to filter training data doesn't work for certain use cases #122

Open
redstreet opened this issue Feb 9, 2023 · 4 comments

Comments

@redstreet
Copy link

redstreet commented Feb 9, 2023

smart_importerfilters training data from a single account, obtained by calling the importer'sfile_account()`, as discussed in #30. This presents two questions:

  1. (For my understanding): shouldn't it filter out for accounts in each transaction instead of assuming the importer will always have one single account (that file_account() returns) as a part of every transaction?

  2. My question: There are cases where file_account() is not the account we want to train by. The use case at least a couple of users have run into (including myself) is here. In short, when importing using a commodity-leaf structure, postings look like this:

2020-02-20 * "Transfer"
  Assets:Investments:Fidelity:USD -20 USD
  <to_be_predicted>

file_account() must return Assets:Investments:Fidelity (without the :USD), so bean-file has the correct directory to file this against. However, this means that smart_importer fails to find training data, since that account doesn't exist.

Is there a config option or alternative I'm missing here? If this is a valid use case, I'm happy to help figure out a solution + patch.

@johannesjh
Copy link
Collaborator

Hi, the use case sounds somewhat exotic but legitimate... thank you for bringing this up.

As for a potential solution, what you want to achieve is to not filter training data by account, correct? Training data is filtered in the EntryPredictor.training_data_filter method. You can overwrite this method in inheriting classes. For example:

class PredictPostingsCustom(PredictPostings):
    def training_data_filter(self, txn):
        return True

You can, of course, implement any other filtering logic instead of simply returning True. Is this what you are looking for?

@redstreet
Copy link
Author

redstreet commented Mar 20, 2023

Awesome, that helps! I do have a suggestion though: would smart_importer consider making self.account a list, and also expose it via the constructor?

This way, a) we can have more than one interesting account, and b) can pass in the list without needing to write code.

Here's my sense of why this might becoming a fairly often used case in the future: anyone using commodity-leaf accounts (eg: Assets:Fidelity:FSSIX) would run into this sooner or later, because the filing account (Assets:Fidelity) is not the same as the account smart_importer should filter by. However, I imagine nobody had run into it until now because smart_importer is (understandably) not used with investment accounts.

Cash management accounts, which are becoming popular (anecdotal), change all that: they mix investments with expenses. So a few of us ran into it, including myself. Here's the ugly hack we've been using, just FYI.

@johannesjh
Copy link
Collaborator

johannesjh commented Mar 20, 2023

Glad it helped!

would smart_importer consider making self.account a list

Not sure. This needs careful consideration. If I remember well, the upstream beancount importer classes have a singular account field where importers file the imported CSV or PDF. This may or may not be in conflict with what you are proposing.

@redstreet
Copy link
Author

Not sure. This needs careful consideration. If I remember well, the upstream beancount importer classes have a singular account field where importers file the imported CSV or PDF. This may or may not be in conflict with what you are proposing.

It wouldn't be in conflict because what I'm proposing is to broaden the set of transactions that smart_importer uses as training data. This makes no changes to the singular filing account that beancount upstream classes might expect. Does that make sense or am I missing something here? Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants