[jvm-packages] support inferBatchSize - allow for sparse vectors - remove synchronized #11711
Hi - I'm trying to incorporate some changes based on limitations we're currently hitting with the Scala/Java package.
`inferBatchSize` is hardcoded in `transform` even though the parameter is exposed; this change reads the parameter value instead.
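
A minimal sketch of what I mean, with hypothetical param/trait names (the actual declaration in xgboost4j-spark may differ): the batch size used inside `transform` comes from the Spark ML param rather than a hardcoded constant.

```scala
import org.apache.spark.ml.param.{IntParam, Params}

// Hypothetical param trait for illustration only.
trait InferBatchSizeParam extends Params {
  final val inferBatchSize: IntParam = new IntParam(this, "inferBatchSize",
    "number of rows grouped into one prediction batch during transform")
  setDefault(inferBatchSize, 32 << 10)
  def getInferBatchSize: Int = $(inferBatchSize)
}

// Inside transform(), instead of a hardcoded constant:
//   val batches = rowIterator.grouped(32 << 10)
// the configured value is used:
//   val batches = rowIterator.grouped(getInferBatchSize)
```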
The internal predict call is still wrapped in `synchronized` (added in 2016), even though the documentation states that predictions are thread-safe; this change removes that lock.
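
A simplified sketch of the change, with hypothetical class and method names: since predictions are documented as thread-safe, the per-call lock is dropped so concurrent transform tasks can predict in parallel.

```scala
// Names are illustrative, not the actual xgboost4j classes.
class PredictorWrapper(nativePredict: Array[Float] => Array[Float]) {
  // before: every prediction serialized across threads
  // def predict(features: Array[Float]): Array[Float] = this.synchronized {
  //   nativePredict(features)
  // }

  // after: no lock around the native call
  def predict(features: Array[Float]): Array[Float] = nativePredict(features)
}
```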
SparseVector support: passing a SparseVector as features currently fails because of the internal conversion to dense and the assertion that follows. We have a case where a model is trained in Python with sparse-vector optimization enabled and then loaded into Scala/Spark; that makes the transform method unusable, and I would otherwise have to fall back to a custom, lower-level implementation. I acknowledge this is a quick fix and the support could be much more elaborate: a caller providing a SparseVector now has to be aware that the model should have been trained with sparse-vector optimization (I don't even know whether that is supported in Scala, as I couldn't find the equivalent of the Python parameter).
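
Roughly, the idea is the sketch below (the helper name `buildFeatureRow` is hypothetical): instead of unconditionally calling `toDense` and asserting on the dense length, pattern-match on the Spark ML vector type so sparse features keep their indices/values representation.

```scala
import org.apache.spark.ml.linalg.{DenseVector, SparseVector, Vector}

// Returns (indices, values); a null indices array marks a dense row in this sketch.
def buildFeatureRow(features: Vector): (Array[Int], Array[Float]) =
  features match {
    case dv: DenseVector =>
      // dense: full value array, indices implicit
      (null, dv.values.map(_.toFloat))
    case sv: SparseVector =>
      // sparse: keep explicit indices and only the non-zero values
      (sv.indices, sv.values.map(_.toFloat))
  }
```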