Smart importer gives duplicate asset postings #130

Open
jbrok opened this issue May 6, 2024 · 0 comments
Hi, first of all, thanks for providing this great package. I've been moving all of my family's small accounts to the Nordigen API in combination with smart_importer.

There's a problem that I'm struggling to debug. About 30% of the transactions from one specific API importer (GoCardless/Nordigen) end up with three postings when using smart_importer. This doesn't occur with file-based importers (CSV, XLS, etc.). Initially, I thought this was caused by a bug in the Nordigen beancounttools importer, but that doesn't seem to be the case. Here's an example:

2023-08-19 * "amazon.co.uk"
  creditorName: "Amazon.co.uk*1f37b5qz4"
  nordref: "64e135f0-75fa-XXXX-XXXXXX-XXXXXX"
  Expenses:Shopping
  Assets:Person1:Bank:Revolut:GBP  ; <--- randomly and incorrectly added by smart_importer
  Assets:Person2:Bank:Revolut:GBP   -5.99 GBP
  
2023-11-24 * "Cloudflare"
  nordref: "6560fd83-XXX-XXXXX-XXXX-XXXXX"
  creditorName: "Cloudflare"
  original: "EUR 4.32"
  Assets:Person1:Bank:Monzo:Checking  ; <--- randomly and incorrectly added by smart_importer
  Expenses:Shopping
  Assets:Person2:Bank:Revolut:EUR    -4.32 EUR

It always seems to add an extra, seemingly random Assets: posting. While researching this a while ago, I came across a smart_importer caching issue, but that issue has since been fixed.

My importer looks like this:

# Nordigen API accounts example
apply_hooks(nordigen.Importer(), [categories, PredictPostings(), DuplicateDetector(comparator=ReferenceDuplicatesComparator('nordref'), window_days=10)])

Removing PredictPostings() from this list gives me the correct results, so I narrowed it down to smart_importer adding the incorrect postings.

I call bean-extract like this:

# filter only .yaml files to debug the Nordigen issue and running it on one account
❯ bean-extract config.py ./import-files/*.yaml -e main.beancount > tmp.beancount && code tmp.beancount
DEBUG:smart_importer.predictor:Loaded training data with 22022 transactions for account , filtered from 22022 total transactions
DEBUG:smart_importer.predictor:Trained the machine learning model.
DEBUG:smart_importer.predictor:Apply predictions with pipeline
DEBUG:smart_importer.predictor:Added predictions to 82 transactions

For the last few months, I've been removing the extra postings with a regex find and replace, but I recently found out that the problem also affects deduplication: the affected transactions are not detected as duplicates. I'm not sure whether this is caused by how the API calls are made or whether it's a smart_importer issue (it seems to be the latter). I also tried forking the code and limiting the prediction to one posting, but that didn't help; it seems the wrong posting still had the highest prediction score.
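For reference, the cleanup workaround can be sketched roughly as below. This is only an illustration of the idea, not the exact regex I use: the function name `strip_extra_asset_postings` and the heuristic (drop any `Assets:` posting without an amount when a transaction has more than two postings) are assumptions made for this example.

```python
import re

# A transaction header such as: 2023-08-19 * "amazon.co.uk"
TXN_START = re.compile(r"^\d{4}-\d{2}-\d{2}\s+[*!]")
# An indented posting line; metadata keys start lowercase, so they do not match.
POSTING = re.compile(r"^\s+(?P<account>[A-Z][\w-]*(?::[\w-]+)+)(?P<rest>.*)$")


def has_amount(rest: str) -> bool:
    """True if the text after the account (ignoring ';' comments) has a number."""
    return re.search(r"\d", rest.split(";")[0]) is not None


def strip_extra_asset_postings(text: str) -> str:
    """Drop amount-less Assets postings from transactions with more than two postings."""
    out, block = [], []

    def flush():
        postings = [line for line in block if POSTING.match(line)]
        for line in block:
            m = POSTING.match(line)
            extra = (
                len(postings) > 2
                and m is not None
                and m["account"].startswith("Assets:")
                and not has_amount(m["rest"])
            )
            if not extra:
                out.append(line)
        block.clear()

    for line in text.splitlines():
        if TXN_START.match(line):
            flush()  # a new transaction starts, so emit the previous one
        block.append(line)
    flush()
    return "\n".join(out)
```

Transactions with two postings are left untouched, so a legitimate auto-balanced `Assets:` leg is not removed. This only hides the symptom in the extracted output; it does not change what smart_importer predicts.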

My beancount file with training data doesn't contain errors, nor transactions with three postings (checked with bean-check and custom scripts).
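The three-postings check can be sketched along these lines. It is a minimal example rather than my actual script: the helper names `transactions_with_many_postings` and `report` are made up for illustration, and it assumes that only transaction entries have a `postings` attribute.

```python
def transactions_with_many_postings(entries, limit=2):
    """Return the entries that carry more than `limit` postings."""
    return [
        entry for entry in entries
        if hasattr(entry, "postings") and len(entry.postings) > limit
    ]


def report(path, limit=2):
    """Load a ledger and print every transaction with more than `limit` postings."""
    # Imported lazily so the helper above works without beancount installed.
    from beancount import loader

    entries, errors, _options = loader.load_file(path)
    for entry in transactions_with_many_postings(entries, limit):
        accounts = ", ".join(p.account for p in entry.postings)
        print(f"{entry.meta['filename']}:{entry.meta['lineno']}: {entry.date} {accounts}")
    print(f"{len(errors)} loader error(s)")
```

Calling `report("main.beancount")` on the training file should list nothing if no transaction has more than two postings.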

Any ideas that can point me in the right direction to a solution? Much appreciated!
