
Speed is half of normal jpeg decoders (on ARM) #202

Closed
fzyzcjy opened this issue Oct 12, 2021 · 11 comments · Fixed by #221

Comments

@fzyzcjy

fzyzcjy commented Oct 12, 2021

Hi, thanks for the lib! However, this library seems to run at about half the speed of normal JPEG decoders. On ARM, a normal decoder takes 600ms to decode a 4000x3000 color JPEG, while this lib takes 1100ms.

@HeroicKatora
Member

What 'normal decoder' are you referring to?

@fzyzcjy
Author

fzyzcjy commented Oct 22, 2021

opencv's builtin encoder.

@HeroicKatora
Member

OpenCV does not have a builtin encoder; it uses either libjpeg or libjpeg-turbo. Should this be closed as a duplicate of #155?

@fzyzcjy
Author

fzyzcjy commented Oct 24, 2021

@HeroicKatora That sounds similar, but I have a slightly different idea: maybe another cause is SIMD? libjpeg-turbo says it uses SIMD heavily to speed things up; it even uses hand-crafted assembly code.

@veluca93
Contributor

@HeroicKatora I have noticed that NEON intrinsics are currently only available on nightly. I can think of 3 options for adding SIMD support on NEON, and I'm not sure which is the best.

  • Use an external asm file that contains the NEON-specific code. This would work, but it's a huge pain (among other things, one needs to care about calling conventions and the like...)
  • Use an external c file containing functions implemented with intrinsics. I believe this is entirely superior to the asm approach.
  • Only support NEON on nightly. By far the easiest option, and saves us the pain of writing the CPU detection code if we want to support arm and not just aarch64...

What are your thoughts?
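For concreteness, here is a minimal sketch of the arch-gated approach, with a hypothetical saturating byte-add standing in for a real decode kernel (an illustrative assumption, not jpeg-decoder code). The aarch64 path uses `std::arch::aarch64` intrinsics (nightly-only at the time of this thread), while every other target compiles the scalar loop, so the crate still builds everywhere:

```rust
// Hypothetical kernel: saturating element-wise add of two byte slices.
// On aarch64 we use NEON intrinsics; elsewhere, a scalar fallback.

#[cfg(target_arch = "aarch64")]
pub fn saturating_add_bytes(a: &[u8], b: &[u8]) -> Vec<u8> {
    use std::arch::aarch64::{vld1q_u8, vqaddq_u8, vst1q_u8};
    assert_eq!(a.len(), b.len());
    let mut out = vec![0u8; a.len()];
    let mut i = 0;
    // Process 16 lanes per iteration with the NEON saturating add.
    while i + 16 <= a.len() {
        unsafe {
            let va = vld1q_u8(a.as_ptr().add(i));
            let vb = vld1q_u8(b.as_ptr().add(i));
            vst1q_u8(out.as_mut_ptr().add(i), vqaddq_u8(va, vb));
        }
        i += 16;
    }
    // Scalar tail for the leftover elements.
    for j in i..a.len() {
        out[j] = a[j].saturating_add(b[j]);
    }
    out
}

#[cfg(not(target_arch = "aarch64"))]
pub fn saturating_add_bytes(a: &[u8], b: &[u8]) -> Vec<u8> {
    assert_eq!(a.len(), b.len());
    a.iter().zip(b).map(|(&x, &y)| x.saturating_add(y)).collect()
}

fn main() {
    let out = saturating_add_bytes(&[200, 10, 255], &[100, 20, 1]);
    assert_eq!(out, vec![255, 30, 255]);
    println!("{:?}", out);
}
```

The downside this sketch illustrates is exactly the one raised above: the NEON branch needs either nightly or stabilized intrinsics, and supporting 32-bit arm (not just aarch64) would additionally require runtime CPU feature detection.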

@fintelia
Contributor

How likely is it that the NEON intrinsics will be stabilized in their current form? If it is just a matter of waiting for them to reach stable, then I think that's probably the best option. We'd just leave the code behind a feature flag until the intrinsics were available on stable at our MSRV.

What worries me more is the testing/maintenance strategy, given that the CI and most developers' machines are currently x86. (Not a deal-breaker, mind you; just something to figure out so we don't find ourselves scrambling to track down an ARM machine because we need to test an urgent patch or something.)

@veluca93
Contributor

As for their being stabilized in their current form: considering that they look pretty much identical to their C counterparts (as do the x86 ones), I'd say we're probably fine.

For testing/CI, luckily user-mode QEMU exists :) For libjxl we configured the GitHub CI like this: https://github.com/libjxl/libjxl/blob/main/.github/workflows/build_test.yml#L225. I assume something similar could be set up for Rust, although of course this wouldn't allow benchmarks.
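For Rust specifically, a similar setup could be sketched with a Cargo runner entry (assuming the aarch64-unknown-linux-gnu target, a cross toolchain, and user-mode QEMU are installed; the sysroot path varies by distro):

```toml
# .cargo/config.toml — run cross-compiled tests under user-mode QEMU.
# Assumes `rustup target add aarch64-unknown-linux-gnu` plus a
# gcc-aarch64-linux-gnu cross toolchain and qemu-user.
[target.aarch64-unknown-linux-gnu]
linker = "aarch64-linux-gnu-gcc"
runner = "qemu-aarch64 -L /usr/aarch64-linux-gnu"
```

With this in place, `cargo test --target aarch64-unknown-linux-gnu` would exercise the NEON code paths under emulation, which covers correctness but, as noted, is useless for benchmarks.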

@HeroicKatora
Member

HeroicKatora commented Dec 22, 2021

Regarding NEON, I wonder if it is possible to convince LLVM to auto-vectorize well enough, and in particular to emit the right instructions. We can have specialized methods with explicit target_feature(enable = "…") annotations. For example, compare the instruction output here:

// Explicitly enabling AVX2 lets LLVM auto-vectorize the loop with 256-bit
// instructions; `unsafe` because the caller must guarantee AVX2 is
// actually available at runtime.
#[target_feature(enable = "avx2")]
pub unsafe fn sum_avx2(vec: &[u32]) -> u32 {
    let mut n = 0u32;
    for &i in vec {
        n = n.wrapping_add(i);
    }
    n
}

// Identical loop compiled with the default target features, for comparison.
pub fn sum_me(vec: &[u32]) -> u32 {
    let mut n = 0u32;
    for &i in vec {
        n = n.wrapping_add(i);
    }
    n
}

playground

@HeroicKatora
Member

Ah well, looks like the feature names and annotation for ARM aren't stable either. https://rustc.godbolt.org/z/bcr59rTaa
Kind of odd that the pure codegen appears to be tied so closely to the stabilization of the arch module intrinsics, isn't it?

@veluca93
Contributor

veluca93 commented Dec 22, 2021 via email

@paolobarbolini
Contributor

Regarding NEON, I wonder if it is possible to convince LLVM to auto-vectorize well enough, and in particular to emit the right instructions.

I'm not a fan of proc macros, but it might be worth at least giving it a try via the multiversion crate.

veluca93 added a commit to veluca93/jpeg-decoder that referenced this issue Feb 13, 2022
The code is a direct translation of the sse3 code for x86, and reduces image decoding time to approximately 0.67x of the original on a Samsung A51.

Fixes image-rs#202, or at least improves the situation.

decode a 2268x1512 JPEG time:   [83.619 ms 85.692 ms 87.848 ms]

decode a 2268x1512 JPEG time:   [56.209 ms 57.019 ms 57.876 ms]
                        change: [-35.318% -33.460% -31.535%] (p = 0.00 < 0.05)
veluca93 added a commit to veluca93/jpeg-decoder that referenced this issue Feb 14, 2022
veluca93 added a commit to veluca93/jpeg-decoder that referenced this issue Feb 14, 2022
wartmanm pushed a commit to wartmanm/jpeg-decoder that referenced this issue Sep 15, 2022