[jvm-packages] support inferBatchSize - allow for sparse vectors - remove synchronized #11711

jensgoossens-tomtom · 2025-09-29T10:17:11Z

Hi, I'm trying to incorporate some changes that address limitations we're currently running into with the Scala/Java package:

- inferBatchSize: the batch size is hardcoded in transform even though the inferBatchSize parameter exists. The transform method now uses the parameter instead.
- Synchronized predict: the internal predict is still synchronized (added in 2016), while the documentation states that predictions are thread-safe.
- SparseVector support: providing a SparseVector as features currently fails, because the vector is converted to a dense one internally and then hits an assertion. We have an internal case where the model is trained in Python with the sparse vector optimization set to true and then loaded into Scala/Spark. This makes the transform method unusable, and otherwise I would have to resort to a custom, lower-level implementation. I acknowledge this is a quick fix and that support could be much more elaborate: a caller providing a SparseVector now has to be aware that the model should be trained with sparse vector optimization. (I don't even know whether that is supported in Scala, as I couldn't find a Scala counterpart of the corresponding Python parameter.)
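As a rough illustration of the first point, the sketch below groups a partition's rows into batches whose size comes from a parameter rather than a constant. It is plain Java with no XGBoost or Spark dependency. The class and method names (BatchSketch, toBatches) are hypothetical, and this is not the actual transform code in XGBoost4J-Spark.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical helper, not part of XGBoost4J-Spark. Groups rows from an
// iterator (e.g. one Spark partition) into batches of at most inferBatchSize
// rows, so the batch size is driven by a parameter instead of a constant.
class BatchSketch {
    static <T> List<List<T>> toBatches(Iterator<T> rows, int inferBatchSize) {
        if (inferBatchSize <= 0) {
            throw new IllegalArgumentException("inferBatchSize must be positive");
        }
        List<List<T>> batches = new ArrayList<>();
        List<T> current = new ArrayList<>();
        while (rows.hasNext()) {
            current.add(rows.next());
            if (current.size() == inferBatchSize) {
                batches.add(current);          // full batch: keep it and start a new one
                current = new ArrayList<>();
            }
        }
        if (!current.isEmpty()) {
            batches.add(current);              // last, possibly smaller, batch
        }
        return batches;
    }

    public static void main(String[] args) {
        List<Integer> rows = Arrays.asList(1, 2, 3, 4, 5, 6, 7);
        System.out.println(toBatches(rows.iterator(), 3));  // prints [[1, 2, 3], [4, 5, 6], [7]]
    }
}
```

Materializing every batch into a list keeps the sketch short. A real implementation would more likely build each batch lazily, so that a whole partition never has to sit in memory at once.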
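The SparseVector point depends on how entries that a sparse vector does not store are interpreted. Spark's SparseVector treats them as 0.0, so a plain dense conversion turns them into explicit zeros. As far as we understand, a model trained with the sparse-data optimization in the Python xgboost.spark estimators (enable_sparse_data_optim) treats those entries as missing instead. The sketch below shows the difference on a raw (size, indices, values) row. It is plain Java, the names (SparseRowSketch, densify) are hypothetical, and it is not how XGBoost4J-Spark performs the conversion.

```java
import java.util.Arrays;

// Hypothetical helper, not an XGBoost4J or Spark API. Expands a sparse row given
// as (size, indices, values) into a dense float[] and fills every position the
// sparse row does not store with fillValue.
class SparseRowSketch {
    static float[] densify(int size, int[] indices, double[] values, float fillValue) {
        if (indices.length != values.length) {
            throw new IllegalArgumentException("indices and values must have the same length");
        }
        float[] dense = new float[size];
        Arrays.fill(dense, fillValue);                // unstored entries get fillValue
        for (int k = 0; k < indices.length; k++) {
            dense[indices[k]] = (float) values[k];    // stored entries keep their value
        }
        return dense;
    }

    public static void main(String[] args) {
        int size = 4;
        int[] indices = {0, 2};
        double[] values = {1.5, 3.0};
        // Spark semantics: unstored entries are zeros.
        System.out.println(Arrays.toString(densify(size, indices, values, 0.0f)));
        // prints [1.5, 0.0, 3.0, 0.0]
        // Assumed sparse-optimization semantics: unstored entries are missing (NaN).
        System.out.println(Arrays.toString(densify(size, indices, values, Float.NaN)));
        // prints [1.5, NaN, 3.0, NaN]
    }
}
```

The two rows differ only in the unstored positions. That difference is why a caller passing SparseVector features has to know which convention the model was trained with.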

gutierrezm-tomtom

LGTM

trivialfis · 2025-09-29T11:15:20Z

> the internal predict still is synchronized (added in 2016) while the documentation states

If memory serves, we tried to remove it a few times. It's not the XGBoost prediction method that causes trouble; it's the Java iterator throwing errors when Spark is used without the synchronized predict method.

jensgoossens-tomtom · 2025-09-29T12:50:09Z

I have not encountered any errors when running with these changes.

jensgoossens-tomtom · 2025-09-29T13:45:12Z

@trivialfis I've added a test in BoosterImplTest that calls the predict method concurrently, to verify that concurrent predictions work.
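Purely as an illustration of the general pattern such a test can follow, the sketch below computes reference predictions on one thread, then issues the same calls from a thread pool and checks that every concurrent result matches. The predictor is a hypothetical stand-in (a Function<float[], float[]>), not the XGBoost4J Booster API, and this is not the code added to BoosterImplTest.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

// Sketch of a concurrency check: run the same predictions sequentially and from
// many threads at once, then require identical results. The predictor is a
// hypothetical stand-in for a real model's predict call.
class ConcurrentPredictSketch {
    static boolean sameResultsConcurrently(Function<float[], float[]> predict,
                                           List<float[]> inputs,
                                           int threads, int rounds) throws Exception {
        // Reference results computed on a single thread.
        List<float[]> expected = new ArrayList<>();
        for (float[] x : inputs) {
            expected.add(predict.apply(x));
        }
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Boolean>> checks = new ArrayList<>();
            for (int r = 0; r < rounds; r++) {
                for (int i = 0; i < inputs.size(); i++) {
                    final int idx = i;
                    // Each task predicts one input and compares it with the reference.
                    Callable<Boolean> task =
                        () -> Arrays.equals(predict.apply(inputs.get(idx)), expected.get(idx));
                    checks.add(pool.submit(task));
                }
            }
            for (Future<Boolean> f : checks) {
                if (!f.get()) {
                    return false;  // a concurrent call disagreed with the sequential one
                }
            }
            return true;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        // Stateless stand-in "model" that sums the features, so it is safe to share.
        Function<float[], float[]> sumModel = x -> {
            float s = 0f;
            for (float v : x) s += v;
            return new float[] {s};
        };
        List<float[]> inputs = Arrays.asList(new float[] {1f, 2f}, new float[] {3f, 4f});
        System.out.println(sameResultsConcurrently(sumModel, inputs, 8, 100));  // prints true
    }
}
```

Comparing against results computed on a single thread keeps the check independent of any particular model. It only asserts that concurrency does not change the outputs.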

trivialfis · 2025-09-30T18:55:50Z

@wbo4958

jensgoossens-tomtom added 2 commits September 29, 2025 12:09

feat: support inferBatchSize - allow for sparse vectors - remove synchronized block

2faeca2

add tests

eac6723

gutierrezm-tomtom approved these changes Sep 29, 2025

add test for concurrent prediction now that synchronized is removed

5994779
