Skip to content

Conversation

jensgoossens-tomtom
Copy link

Hi - I'm trying to incorporate some changes based on limitations we're currently having with the scala/java package.

  • InferBatchSize is hardcoded in transform, while the parameter is present - updated to use that instead.

  • the internal predict still is synchronized (added in 2016) while the documentation states that predictions are threadsafe

  • SparseVector support: when providing a SparseVector as features, it now fails because of the internal conversion to Dense and the assertion happening. We internally have a case where the model is being trained with sparse vector optim set to true in Python and we load that model into scala/spark. This makes the transform method unusable and I would otherwise have to resort to some custom, more low level implementation. I acknowledge this is a quickfix, support can be much more elaborate, the caller providing a SparseVector now has to be aware his model should be trained with sparse vector optimization (I don't even know if that's supported in scala as I couldn't find the representative parameter that exists in Python in Scala)

Copy link

@gutierrezm-tomtom gutierrezm-tomtom left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@trivialfis
Copy link
Member

the internal predict still is synchronized (added in 2016) while the documentation states

If memory serves, we tried to remove it a few times. It's not the XBGoost prediction method causing trouble, it's the java iterator throwing out errors when using Spark without the synchronized predict method.

@jensgoossens-tomtom
Copy link
Author

the internal predict still is synchronized (added in 2016) while the documentation states

If memory serves, we tried to remove it a few times. It's not the XBGoost prediction method causing trouble, it's the java iterator throwing out errors when using Spark without the synchronized predict method.

I have not encountered any errors running it with these changes

@jensgoossens-tomtom
Copy link
Author

the internal predict still is synchronized (added in 2016) while the documentation states

If memory serves, we tried to remove it a few times. It's not the XBGoost prediction method causing trouble, it's the java iterator throwing out errors when using Spark without the synchronized predict method.

@trivialfis I've added a test in BoosterImplTest which does concurrent calls to the predict method to verify it works concurrently.

@trivialfis
Copy link
Member

@wbo4958

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants