Using `file_account()` to filter training data doesn't work for certain use cases #122

redstreet · 2023-02-09T05:52:27Z

smart_importerfilters training data from a single account, obtained by calling the importer'sfile_account()`, as discussed in #30. This presents two questions:

(For my understanding): shouldn't it filter out for accounts in each transaction instead of assuming the importer will always have one single account (that file_account() returns) as a part of every transaction?
My question: There are cases where file_account() is not the account we want to train by. The use case at least a couple of users have run into (including myself) is here. In short, when importing using a commodity-leaf structure, postings look like this:

2020-02-20 * "Transfer"
  Assets:Investments:Fidelity:USD -20 USD
  <to_be_predicted>

file_account() must return Assets:Investments:Fidelity (without the :USD), so bean-file has the correct directory to file this against. However, this means that smart_importer fails to find training data, since that account doesn't exist.

Is there a config option or alternative I'm missing here? If this is a valid use case, I'm happy to help figure out a solution + patch.

The text was updated successfully, but these errors were encountered:

johannesjh · 2023-03-19T14:05:33Z

Hi, the use case sounds somewhat exotic but legitimate... thank you for bringing this up.

As for a potential solution, what you want to achieve is to not filter training data by account, correct? Training data is filtered in the EntryPredictor.training_data_filter method. You can overwrite this method in inheriting classes. For example:

class PredictPostingsCustom(PredictPostings):
    def training_data_filter(self, txn):
        return True

You can, of course, implement any other filtering logic instead of simply returning True. Is this what you are looking for?

redstreet · 2023-03-20T05:09:43Z

Awesome, that helps! I do have a suggestion though: would smart_importer consider making self.account a list, and also expose it via the constructor?

This way, a) we can have more than one interesting account, and b) can pass in the list without needing to write code.

Here's my sense of why this might becoming a fairly often used case in the future: anyone using commodity-leaf accounts (eg: Assets:Fidelity:FSSIX) would run into this sooner or later, because the filing account (Assets:Fidelity) is not the same as the account smart_importer should filter by. However, I imagine nobody had run into it until now because smart_importer is (understandably) not used with investment accounts.

Cash management accounts, which are becoming popular (anecdotal), change all that: they mix investments with expenses. So a few of us ran into it, including myself. Here's the ugly hack we've been using, just FYI.

johannesjh · 2023-03-20T07:24:07Z

Glad it helped!

would smart_importer consider making self.account a list

Not sure. This needs careful consideration. If I remember well, the upstream beancount importer classes have a singular account field where importers file the imported CSV or PDF. This may or may not be in conflict with what you are proposing.

redstreet · 2023-06-30T20:13:53Z

Not sure. This needs careful consideration. If I remember well, the upstream beancount importer classes have a singular account field where importers file the imported CSV or PDF. This may or may not be in conflict with what you are proposing.

It wouldn't be in conflict because what I'm proposing is to broaden the set of transactions that smart_importer uses as training data. This makes no changes to the singular filing account that beancount upstream classes might expect. Does that make sense or am I missing something here? Thanks!

redstreet mentioned this issue Feb 10, 2023

default value for filter_training_data_by_account #30

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Using `file_account()` to filter training data doesn't work for certain use cases #122

Using `file_account()` to filter training data doesn't work for certain use cases #122

redstreet commented Feb 9, 2023 •

edited

Loading

johannesjh commented Mar 19, 2023

redstreet commented Mar 20, 2023 •

edited

Loading

johannesjh commented Mar 20, 2023 •

edited

Loading

redstreet commented Jun 30, 2023

Using file_account() to filter training data doesn't work for certain use cases #122

Using file_account() to filter training data doesn't work for certain use cases #122

Comments

redstreet commented Feb 9, 2023 • edited Loading

johannesjh commented Mar 19, 2023

redstreet commented Mar 20, 2023 • edited Loading

johannesjh commented Mar 20, 2023 • edited Loading

redstreet commented Jun 30, 2023

Using `file_account()` to filter training data doesn't work for certain use cases #122

Using `file_account()` to filter training data doesn't work for certain use cases #122

redstreet commented Feb 9, 2023 •

edited

Loading

redstreet commented Mar 20, 2023 •

edited

Loading

johannesjh commented Mar 20, 2023 •

edited

Loading