Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

IntArrayContainer merge method creates duplicates int values #383

Closed
gilleshuron opened this issue Nov 26, 2024 · 3 comments
Closed

IntArrayContainer merge method creates duplicates int values #383

gilleshuron opened this issue Nov 26, 2024 · 3 comments
Labels
bug Something isn't working

Comments

@gilleshuron
Copy link

gilleshuron commented Nov 26, 2024

Describe the bug

This method:

org.tribuo.common.tree.impl.IntArrayContainer#merge(java.util.List<int[]>, org.tribuo.common.tree.impl.IntArrayContainer, org.tribuo.common.tree.impl.IntArrayContainer)

... incorrectly merges list of int[], creates duplicated int values during the merge.

At the least, it impacts performance. Variance computing is completely wrong in the CARTRegressionTrainer.

To Reproduce

Create a unit test with this code:

public void bug() {
    IntArrayContainer first = new IntArrayContainer(10);
    IntArrayContainer second = new IntArrayContainer(10);
    List<int[]> list = List.of(
        new int[]{1, 2, 4},
        new int[]{3, 5, 6, 7, 8, 9, 10}
    );
    IntArrayContainer.merge(list, first, second); //<-- buggy method
    Assertions.assertEquals(1, first.array[0]);
    Assertions.assertEquals(2, first.array[1]); //fails !!! array is filed with [1,1,2,2,3,4,4,5,6,7,8,9,10,0,0]
    //expected is [1,2,3,4,5,6,7,8,9,10]

  }

Expected behavior

//expected is [1,2,3,4,5,6,7,8,9,10]

System information:

  • Tribuo Version: [4.3.1]
  • OS: [Windows]
  • Java Version: [21]
  • JDK Vendor: [temurin-21.0.3]

Additional context

The mistake may lie in how the firstBuffer is initialized and updated in each subsequent merge:

public static int[] merge(List<int[]> input, IntArrayContainer firstBuffer, IntArrayContainer secondBuffer) {
    if (input.size() > 0) {
        firstBuffer.fill(input.get(0));
        for (int i = 1; i < input.size(); i++) { // Start from the second array
            merge(firstBuffer, input.get(i), secondBuffer);
            IntArrayContainer tmp = secondBuffer;
            secondBuffer = firstBuffer;
            firstBuffer = tmp;
        }
        return firstBuffer.copy();
    } else {
        return new int[0];
    }
}

Modification made:
Changed the loop start from i = 0 to i = 1. The first array is already used to initialize firstBuffer, so the merging should start from the second array (i = 1).

@gilleshuron gilleshuron added the bug Something isn't working label Nov 26, 2024
@Craigacp
Copy link
Member

Thanks, that's definitely a bug. Maybe it's part of the cause of #374 as well. I'll work up some more tests for IntArrayContainer and put in your fix.

@Craigacp
Copy link
Member

We've merged in a fix, it'll land in 4.3.2.

@gilleshuron
Copy link
Author

Thanks a lot @Craigacp

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

2 participants