Customs Revenues Prediction Using Ensemble Methods (Statistical Modelling vs Machine Learning)
Introduction
This paper considers the problems associated with the prediction of customs revenues by ministries of finance and customs administrations. Accurate predictions of customs revenues improve the liquidity of the central budget and are therefore extremely important for the successful management of public finances. The orthodox approach to forecasting revenues is usually based on tax buoyancy and tax elasticity with respect to some economic proxy. However, this approach has shortcomings that can negatively affect accuracy, and for that reason we examine alternative approaches (machine learning and ensembling). In the era of Big Data and digitisation in customs, new approaches based on computer algorithms can give better results than classic modelling. The paper concludes that ensemble methods that combine heterogeneous models, such as statistical models and machine learning models, can improve forecast accuracy when predicting customs revenues.
Even though customs revenues are collected by customs administrations within customs procedures, forecasting the collection of such revenues is ordinarily performed by ministries of finance. Nevertheless, ministries also need predictions performed by customs administrations themselves. In this regard, we review the approaches most frequently used when planning revenues. The orthodox approach to forecasting tax revenues (including customs revenues) is usually based on tax buoyancy and tax elasticity. Tax buoyancy measures the gross elasticity of tax revenues in relation to the respective macroeconomic variable (for instance, imports or consumption). The main characteristic of this approach is that it measures the overall elasticity of taxes in relation to their base. In the tax elasticity approach, the effects of discretionary fiscal policy measures must first be removed from the time series in order to calculate the coefficient of net tax elasticity in relation to the respective macroeconomic variable. Tax elasticity, according to Jenkins et al. (2000, p. 39), is a relevant factor for forecasting and is most often used by ministries of finance when forecasting tax revenues. Furthermore, to obtain more robust forecasts, the estimated elasticities need to be harmonised with the business cycle in the economy, which has significant effects on revenue collection. The advantage of such forecasting is that the forecasted revenues are tied directly to the macroeconomic indicators, so that when the indicators increase, revenues are expected to increase accordingly. However, this forecasting approach also has its shortcomings.
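The distinction between buoyancy and elasticity can be illustrated with a minimal sketch. It assumes that both coefficients are estimated as the slope of a log-log regression of revenue on the tax base, which is one common way of doing so and not necessarily the procedure used by any particular ministry. All figures are made up for illustration, the helper name `log_log_slope` is hypothetical, and the discretionary effect is simply subtracted from observed revenue, which is a simplification of the adjustment methods used in practice.

```python
import numpy as np

def log_log_slope(revenue, base):
    """Return the OLS slope of log(revenue) regressed on log(base)."""
    x = np.log(np.asarray(base, dtype=float))
    y = np.log(np.asarray(revenue, dtype=float))
    slope, _intercept = np.polyfit(x, y, 1)
    return float(slope)

# Made-up annual import values (the tax base), growing 10% per year.
imports = np.array([100.0, 110.0, 121.0, 133.1, 146.41])

# Made-up underlying revenue, constructed so that its true elasticity is 1.2.
underlying_revenue = 0.05 * imports ** 1.2

# Hypothetical revenue effect of a discretionary measure (e.g. a tariff
# change) in force from the third year onwards.
discretionary_effect = np.array([0.0, 0.0, 0.5, 0.5, 0.5])

# Observed revenue includes the discretionary effect.
observed_revenue = underlying_revenue + discretionary_effect

# Buoyancy: gross response of observed revenue to the base.
buoyancy = log_log_slope(observed_revenue, imports)

# Elasticity: response after removing discretionary effects from the series.
adjusted_revenue = observed_revenue - discretionary_effect
elasticity = log_log_slope(adjusted_revenue, imports)

print(f"buoyancy:   {buoyancy:.3f}")
print(f"elasticity: {elasticity:.3f}")
```

In this constructed example the buoyancy estimate exceeds the elasticity estimate because the discretionary measure raises revenue in the later years, which the unadjusted regression attributes to growth in the base.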
Macroeconomic indicators (which are usually forecasted twice a year) are used when forecasting revenues, but a certain period passes between the moment they are forecasted and the moment they are realised, and this lag can reduce forecast accuracy. In fact, Buettner & Kauder (2009, p. 7) point out that the circumstances forecasters face can significantly affect the accuracy of the forecasts, and this needs to be taken into consideration when evaluating accuracy. The length of this period also varies across countries (for instance, 3.5 months in Austria, six months in Italy and 9.5 months in the Netherlands). To overcome the problems that arise with the orthodox approach, and with the aim of achieving more accurate forecasts, we look at more flexible approaches that are based primarily on data-driven methods. Data-driven methods can be exceptionally useful, since such models allow forecasting with high-frequency data and are of particular benefit for cash management and early warning. The main objective of such models is to make short-term inflow forecasts (daily, weekly or monthly) for a period no longer than two years (Haughton, 2008, p. 1).
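As a minimal sketch of what a data-driven short-term forecast can look like, the example below applies a seasonal naive rule to monthly inflows: each future month is forecast with the value observed in the same month one year earlier. This is a standard baseline and is not a model proposed in the text. The series is synthetic, and the function name `seasonal_naive` and all parameters are assumptions made for illustration.

```python
import numpy as np

def seasonal_naive(history, horizon, season=12):
    """Forecast each future period with the value observed one season earlier."""
    history = np.asarray(history, dtype=float)
    if len(history) < season:
        raise ValueError("need at least one full season of history")
    last_season = history[-season:]
    # Cycle through the last observed season if horizon exceeds one season.
    return np.array([last_season[h % season] for h in range(horizon)])

# Synthetic monthly inflows over three years: linear trend plus a yearly cycle.
months = np.arange(36)
inflows = 100 + 0.5 * months + 10 * np.sin(2 * np.pi * months / 12)

# Hold out the final year to evaluate the forecast.
train, test = inflows[:24], inflows[24:]
forecast = seasonal_naive(train, horizon=12)

# Mean absolute percentage error on the held-out year.
mape = float(np.mean(np.abs((test - forecast) / test)) * 100)
print(f"MAPE on held-out year: {mape:.2f}%")
```

Because the rule ignores the trend, every forecast in this synthetic example falls short by one year of trend growth. Baselines of this kind are mainly useful as a reference point against which statistical and machine learning models can be compared.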