TAGS #125

Open
jpduyx opened this issue Oct 13, 2023 · 2 comments

jpduyx commented Oct 13, 2023

I am happy with the smart-importer ... it helps a lot.

I'm trying to figure out why smart-importer doesn't learn the tags from previous transactions and apply them to new ones. Is there a setting I have to change, or where would be the best place to start figuring this out?

@johannesjh
Collaborator

Hi, thank you for your feedback, glad to hear you find it useful. smart-importer currently only predicts payees and accounts. It does not (yet) predict tags; there is no setting for it. But smart-importer could certainly be made to predict tags as well. I like the idea. Some directions in case you would like to get started:

You could start by adding a class PredictTags to __init__.py. I think it will be easiest for your new PredictTags class to derive from EntryPredictor, similar to the PredictPayees and PredictPostings classes. Your class can then override the attribute and weights member variables to specify which attribute shall be predicted (i.e., tags) and which other attributes, with which weights, the prediction is based on.
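To make the idea concrete, here is a minimal sketch of what such a subclass might look like. The EntryPredictor below is only a stand-in so the snippet runs on its own; it assumes the real class exposes attribute and weights as class-level members, as described above. The weight values are illustrative placeholders, not tuned or taken from the project.

```python
class EntryPredictor:
    """Stand-in for smart_importer's EntryPredictor (assumption: the real
    base class reads `attribute` and `weights` from its subclasses)."""

    attribute = None  # name of the entry attribute to predict
    weights = {}      # input attributes and their relative weights


class PredictTags(EntryPredictor):
    """Hypothetical predictor for transaction tags."""

    attribute = "tags"
    # Illustrative weights only; the names follow the attributes mentioned
    # in this thread (narration, payee) plus the day of the month.
    weights = {"narration": 0.8, "payee": 0.5, "date.day": 0.1}
```

Whether the base class can actually handle "tags" as a target attribute without further changes is exactly the open question discussed below.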

The existing EntryPredictor class can predict attributes, but I am not sure whether it can predict tags just yet. Tags may be handled differently from standard attributes in beancount entries. Consequently, you will quite likely have to modify some code to get it to work. Some hints in this direction:

The EntryPredictor.__call__ method is where the overall control flow starts. It consists of four basic steps:

  1. load_training_data loads and filters the training data. I don't think you'll have to modify this.
  2. define_pipeline creates the scikit-learn machine learning pipeline. Amongst other things, it calls the EntryPredictor.targets method, which reads target attribute values from the training data. In your situation, the targets method needs to read tags. The existing implementation can read attributes of beancount entries, but I am not sure whether it can read tags. This may require some code changes.
  3. train_pipeline does what its name says. I don't expect big changes here.
  4. process_entries writes predicted values into the list of imported entries. In your situation, the method needs to write predicted tags. The existing implementation can write attributes of beancount entries, but I am not sure whether it can write tags. This may require some code changes.
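One possible way to handle steps 2 and 4 for tags is sketched below. It rests on the fact that beancount stores a transaction's tags as a set, so the sketch encodes that set as a single string label for training and decodes a predicted label back into a set when writing. The helper names tags_to_label and apply_label are hypothetical and not part of smart-importer; the Transaction namedtuple is a stand-in that mirrors the field names of beancount's Transaction so the snippet runs without beancount installed.

```python
from collections import namedtuple

# Stand-in mirroring the fields of beancount's Transaction directive.
Transaction = namedtuple(
    "Transaction", "meta date flag payee narration tags links postings"
)


def tags_to_label(entry):
    """Hypothetical helper for `targets`: encode the tag set as one string.

    Sorting makes the label independent of set ordering; beancount tags
    contain no whitespace, so a space is a safe separator.
    """
    return " ".join(sorted(entry.tags or ()))


def apply_label(entry, label):
    """Hypothetical helper for `process_entries`: return a copy of the
    entry whose tags are the ones encoded in the predicted label."""
    return entry._replace(tags=frozenset(label.split()))


# A previously booked (training) entry and a freshly imported one.
old = Transaction({}, "2023-10-01", "*", "Cafe", "Lunch",
                  frozenset({"trip", "food"}), frozenset(), [])
new = Transaction({}, "2023-10-13", "*", "Cafe", "Lunch",
                  frozenset(), frozenset(), [])

label = tags_to_label(old)        # what the classifier would learn from
tagged = apply_label(new, label)  # what would be written on prediction
```

Treating each distinct combination of tags as one class keeps the problem single-label, which is a simplification. A multi-label setup that predicts each tag independently would be an alternative, at the cost of a different pipeline.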

Are you interested in working on this?

@jpduyx
Author

jpduyx commented Nov 2, 2023

Thank you for the tips and the challenge ... I'm really curious and interested in giving this a try.
