Deep Learning: A Generic Approach for Extreme Condition Traffic Forecasting
Rose Yu∗†, Yaguang Li∗†, Cyrus Shahabi†, Ugur Demiryurek†, Yan Liu†
Abstract
Traffic forecasting is a vital part of intelligent transportation systems. It becomes particularly challenging due to short-term (e.g., accidents, constructions) and long-term (e.g., peak-hour, seasonal, weather) traffic patterns. While most previously proposed techniques focus on normal-condition forecasting, a single framework for extreme-condition traffic forecasting does not exist. To address this need, we propose to take a deep learning approach. We build a deep neural network based on long short-term memory (LSTM) units. We apply Deep LSTM to forecast peak-hour traffic and identify unique characteristics of the traffic data. We further improve the model for post-accident forecasting with the Mixture Deep LSTM model, which jointly models the normal-condition traffic and the patterns of accidents. We evaluate our model on a real-world large-scale traffic dataset from Los Angeles. When trained end-to-end with suitable regularization, our approach achieves 30%-50% improvement over the baselines. We also demonstrate a novel technique to interpret the model via signal stimulation and note interesting observations from the trained neural network.
1 Introduction
Traffic forecasting is a core component of intelligent transportation systems (ITS). The problem has been studied for decades in various communities, ranging from transportation systems (e.g., [19, 23]), through economics (e.g., [18, 6]), to data mining (e.g., [14, 13]). While normal-condition traffic patterns are comparatively easy to predict, an open question in traffic forecasting is how to forecast traffic under extreme conditions, which include both peak-hour and post-accident congestion. National statistics show that in 2013 traffic congestion cost Americans $124 billion in direct and indirect losses. Los Angeles County, whose traffic data is studied in this paper, has consistently ranked as one of the most traffic-congested regions in North America, costing drivers 34% more time on their routes compared to normal traffic conditions [5]. Accurate traffic forecasting under extreme conditions can therefore substantially enhance traffic control, reducing congestion cost and leading to significant improvements in societal welfare.
∗These authors contributed equally.
†Department of Computer Science, University of Southern California, {qiyu,yaguang,shahabi,demiryur,yanliu.cs}@usc.edu
The task is challenging mainly due to the chaotic nature of traffic incidents. On one hand, recurring incidents such as peak hours cause steep drops in traffic speed, leading to non-stationary time series. On the other hand, non-recurring incidents such as accidents are almost unpredictable, introducing unexpected delays in traffic flow. Traditional methods are mostly limited to linear models: they perform well under normal conditions but forecast poorly under extreme conditions. For example, the historical average depends solely on the periodic patterns of the traffic flow and thus can hardly respond to dynamic changes. Auto-regressive integrated moving average (ARIMA) time series models rely on a stationarity assumption, which is violated in the face of abrupt changes in traffic flow [4]. Applying neural networks to capture the non-linearity of traffic flow was studied in the early days [19, 4, 11]; however, the models studied were single-layer networks with few hidden units.
The deep learning approach provides automatic representation learning from raw data, significantly reducing the effort of hand-crafted feature engineering. For traffic forecasting, early attempts include the deep belief network (DBN) [10], the stacked autoencoder [15] and the stacked denoising autoencoder [3]. However, they fail to capture temporal correlations. Deep recurrent neural networks, with great promise in representing dynamic behavior, have recently achieved massive success in sequence modeling, especially for video segmentation [12] and speech recognition [20]. In this paper, we take a pragmatic approach to investigate and enhance the recurrent neural network by studying large-scale, high-resolution transportation data from the LA County road network. The datasets are acquired in real time from various agencies such as CalTrans, the City of LA Department of Transportation, the California Highway Patrol and LA Metro. The data include
traffic flow records from under-pavement loop detectors as well as accident information from police reports. The geographical distribution of these loop detectors is visualized in Figure 1.

Figure 1: Visualization of the geographic distribution of loop detector sensors in the LA/OC area

We start with a state-of-the-art model leveraging the Long Short-Term Memory (LSTM) recurrent neural network. Working with real-world traffic data, we identify several of its unique characteristics. First, directly feeding in the traffic flow sequence is not enough for long-term forecasting during peak hours: time stamp features such as time of day and day of week are important for accurate peak-hour forecasting, and normalization and missing-data imputation are critical for stable training. Second, post-accident traffic is better modeled as a mixture of normal traffic flow patterns and accident-specific features. Towards this end, we propose the Mixture Deep LSTM neural network, which treats the normal traffic as a background signal and interruptions from accidents as event signals; the outputs of the two components are concatenated and then aggregated through a regression layer. Third, LSTM learns the traffic patterns by "memorizing" historical average traffic as well as the interruptions caused by accidents. The stacked autoencoder component of the proposed mixture model performs denoising on the accident-specific features. Figure 2 illustrates the pipeline of our learning procedure.

Figure 2: Training pipeline of the proposed deep learning approach for extreme condition traffic forecasting

When trained end-to-end on real-world traffic data, our approach shows superior performance on both forecasting tasks, with forecasting error reduced by 30%-50% compared to the baseline models. A qualitative study shows that the learned model is able to "memorize" the periodic patterns in traffic flow as well as the accident effects. In summary, our contributions can be stated as follows:
• We investigate the deep learning approach for extreme condition traffic forecasting. In particular, we enhance the Deep LSTM network with careful data cleaning and feature engineering, and we augment the Deep LSTM model with a stacked autoencoder to account for the accident features.
• We apply our framework to forecast traffic speed in peak-hour and post-accident conditions. The architecture can model both the normal traffic patterns and the dynamics of extreme conditions, and we observe superior performance of the proposed model on both tasks.
• We perform model inspection to interpret the learning process. We observe that our model seems to memorize the historical average speed as well as the interruptions caused by accidents.

2 Related Work
A large body of related work on traffic forecasting lies in transportation research; see a recent survey [24] and the references therein. Another line of work touches upon incidents and their impact, with a focus on accident delay prediction. For example, in [7], a multivariate regression model is developed based on predictors such as the number of lanes affected, the number of vehicles involved and the incident duration. The setting is questionable, as the traffic delay caused by an incident can only be predicted after the incident is cleared. More recently, in [17], the authors hand-craft a set of features to predict the time-varying spatial span of the incident impact.
Applying neural networks to traffic forecasting is not a new idea, but most prior work only considers networks with a single hidden layer and few hidden units, which have limited representation power. For example, in [4], the authors conduct a careful comparison between neural networks and time series models such as ARIMA, and conclude that statistical techniques generally give slightly better performance for short-term traffic forecasting. Similar observations are presented in [11], which studies two-hour-ahead forecasting using data recorded every 30 minutes; that model, however, suffers from a noticeable performance drop as the forecasting horizon increases. In [22], a single-layer RNN is used for travel time prediction, but with unsatisfactory performance. In [16], the authors propose a single-layer LSTM network for short-term traffic forecasting. Early computers simply lacked the processing power to handle the long running times required to train large neural networks.
A pioneering work applying deep learning to traffic forecasting is [10], where the authors combine a DBN with multi-task learning. The model contains two steps: feature learning and forecasting. The DBN serves as an unsupervised pre-training step, and its output is fed into a regression layer for forecasting. The empirical results show improved performance over traditional methods, especially for long-term and rush-hour traffic flow prediction. However, the DBN fails to account for the temporal dependence in the speed time series: in the input layer, it assumes the speed reading at each time stamp is independent, so the learned representation can hardly reflect the complex dynamics of traffic flow.
We make the first attempt to apply a deep LSTM recurrent neural network to traffic forecasting. LSTM has recently been popularized through successes in machine translation [20], image caption generation [25] and clinical diagnosis [2]. An attractive property of the LSTM is that it is capable of learning both long-term and short-term temporal dependencies. With careful data cleaning and normalization, we obtain up to one-hour-ahead forecasts with less than 10% error. When applied to accident forecasting, our architecture merges the outputs from the two components instead of stacking them together.
3 Methodology
In this section, we first introduce the basic concepts of neural networks. We then study two extreme-condition traffic forecasting tasks: peak-hour and post-accident. For each task, we provide the problem description, model specification and training details.
3.1 Basic Concepts
Autoencoder. An autoencoder takes an input vector x and transforms it into a hidden representation h. The transformation, typically referred to as the encoder, is

h = σ(W x + b)

The hidden representation h is then mapped back into the reconstructed feature space y by the decoder:

y = σ(W′ h + b′)

Here W, W′ and b, b′ denote the weights and biases respectively, and σ is the sigmoid function (or the softmax function in the multivariate case). An autoencoder is trained by minimizing the reconstruction error ‖y − x‖. Multiple autoencoders can be connected to construct a stacked autoencoder, which can be used to learn multiple levels of non-linear features.
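For concreteness, here is a minimal NumPy sketch of one encode-decode pass and its reconstruction error; the dimensions and the random initialization are illustrative assumptions, not values from the paper.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
d_in, d_hid = 8, 4                                # illustrative sizes
W  = rng.normal(scale=0.1, size=(d_hid, d_in))    # encoder weights W
b  = np.zeros(d_hid)
W2 = rng.normal(scale=0.1, size=(d_in, d_hid))    # decoder weights W'
b2 = np.zeros(d_in)

x = rng.random(d_in)           # input vector
h = sigmoid(W @ x + b)         # encoder: h = sigma(W x + b)
y = sigmoid(W2 @ h + b2)       # decoder: y = sigma(W' h + b')
loss = np.linalg.norm(y - x)   # reconstruction error ||y - x||
```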
Recurrent Neural Network (RNN). A recurrent neural network is a neural network that contains at least one feedback loop. Denote the input vector at time stamp t as x_t, the hidden layer vector as h_t, the weight matrices as W_h and U_h, and the bias term as b_h. The output o_t is a function of the current hidden state. The RNN iteratively computes the hidden layers and outputs via the recursive procedure

h_t = σ(W_h x_t + U_h h_{t−1} + b_h)

o_t = σ(W_o h_t + b_o)

where W_o and b_o are the weight and bias for the output, respectively.
Long Short-Term Memory (LSTM). LSTM is a special type of RNN designed to avoid the vanishing gradient issue of the original RNN model. It replaces the ordinary summation unit of an RNN with a carefully designed memory cell containing gates that protect and control the cell state [9]. The key to LSTMs is the cell state, which allows information to flow along the network; the LSTM can remove or add information to the cell state, carefully regulated by structures called gates: the input gate, the forget gate and the output gate. Denote i_t, f_t, o_t as the input, forget and output gates at time t, and let h_t and s_t be the hidden state and cell state of the memory cell at time t. The architecture of the LSTM is specified as follows:

i_t = σ(W_i x_t + U_i h_{t−1} + b_i)
c_t = tanh(W_c x_t + U_c h_{t−1} + b_c)
f_t = σ(W_f x_t + U_f h_{t−1} + b_f)
o_t = σ(W_o x_t + U_o h_{t−1} + b_o)
s_t = s_{t−1} ◦ f_t + c_t ◦ i_t
h_t = s_t ◦ o_t

where the W and U matrices are the input-to-hidden and hidden-to-hidden weights, the b terms are the biases, and ◦ denotes the Hadamard (element-wise) product.
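The cell update translates directly into code. The following NumPy sketch implements one LSTM step exactly as written above, including the paper's output h_t = s_t ◦ o_t (a standard LSTM would instead use o_t ◦ tanh(s_t)); the parameter dictionary p is a hypothetical container for the weights.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, s_prev, p):
    """One LSTM step following the equations above.

    p holds the weight matrices W_*, U_* and biases b_* for the
    input (i), candidate (c), forget (f) and output (o) paths.
    """
    i = sigmoid(p["Wi"] @ x_t + p["Ui"] @ h_prev + p["bi"])
    c = np.tanh(p["Wc"] @ x_t + p["Uc"] @ h_prev + p["bc"])
    f = sigmoid(p["Wf"] @ x_t + p["Uf"] @ h_prev + p["bf"])
    o = sigmoid(p["Wo"] @ x_t + p["Uo"] @ h_prev + p["bo"])
    s = s_prev * f + c * i   # cell state update (Hadamard products)
    h = s * o                # paper's variant; standard LSTM: o * tanh(s)
    return h, s
```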
Figure 3: Graphic illustration of the Deep LSTM network

Figure 4: Graphic illustration of Mixture Deep LSTM. The component in the blue rectangle is the deep LSTM; the component highlighted in the red rectangle is the stacked autoencoder
3.2 Peak-hour Traffic Forecasting. Peak hour is the period when traffic congestion on roads is at its highest; it normally happens twice every weekday. In this paper, we define the peak-hour periods as 6-9 am and 4-7 pm. During peak-hour congestion, the amount of uncertainty in truck traffic and intersection turning movements makes it difficult to accurately predict the traffic flow. The majority of existing traffic forecasting models are designed for short-term forecasting, usually 5-15 minutes ahead during peak hours. In addition, the forecasting is usually limited to a single time stamp with a fixed forecasting horizon.
The recurring nature of peak-hour traffic motivates us to take advantage of its long-term history. We extract historical traffic reading sub-sequences as input features and treat the future time stamps as outputs. Input-output pairs are generated by sliding a fixed-length window along the entire time series, a moving-window approach commonly adopted in the time series analysis community.
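A minimal sketch of this moving-window construction follows; the window lengths (12 readings in, 6 out) are illustrative assumptions, not the paper's settings.

```python
import numpy as np

def sliding_windows(series, in_len=12, out_len=6):
    """Slide a fixed-length window along a speed series to build
    (input, output) training pairs, e.g. 12 five-minute readings in,
    the next 6 readings out."""
    X, Y = [], []
    for i in range(len(series) - in_len - out_len + 1):
        X.append(series[i : i + in_len])
        Y.append(series[i + in_len : i + in_len + out_len])
    return np.asarray(X), np.asarray(Y)
```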
However, classic time series models such as the autoregressive moving average (ARMA) or the autoregressive integrated moving average (ARIMA) can only capture linear temporal dependencies in the sequences. We tackle this challenge with a deep neural network, which automatically learns the non-linear structure through multiple layers of transformation. We start with the Deep LSTM network.
3.2.1 Deep LSTM. A Deep LSTM network is a type of RNN with multiple hidden layers. It uses the LSTM cell in place of the conventional recurrent unit, thus avoiding the vanishing gradient issue of traditional RNNs, and is able to learn long-term dependencies. Compared with a single-layer LSTM, a deep LSTM naturally employs a temporal hierarchy, with multiple layers operating at different time scales [8]. Similar models have been applied to many real-life sequence modeling problems [20, 25]. Figure 3 shows the standard structure of the Deep LSTM network.
When applying Deep LSTM to traffic forecasting, we find that time stamp features such as time of day and day of week are important for accurate peak-hour forecasting. We also normalize the input sequences to zero mean and unit variance for faster convergence, and we modify the training objective to ignore zero-valued missing observations. In this work, we model individual sensor readings and train them independently. One could also exploit sensor-to-sensor correlation to model all the sensors jointly; we examined this by training 10 sensors together, but due to the instability of the gradients during training we were not able to obtain improved results in the jointly trained case.
3.3 Post-Accident Traffic Forecasting. Traffic accidents interrupt the periodic patterns of daily traffic. We are interested in predicting the traffic flow immediately after an accident occurs. The duration of the recovery depends on incident factors such as accident severity and up/downstream traffic. We formalize the problem as predicting the post-accident sequence from the historical sequence. We also wish to utilize accident-specific features, including severity, location and time.
We first tried to use the deep LSTM by feeding in the sequences right before the accidents and generating the sequences after, but were not able to obtain satisfactory results. We hypothesize this is due to the common delay in accident reports: the input sequences already contain the interruptions caused by the accidents, so the neural network can hardly differentiate between the normal traffic pattern and the effect caused by accidents. Our next solution is motivated by the real-world traffic accident scenario. If we treat the traffic in normal conditions as a background signal and the interruption from extreme incidents as an event signal, the traffic flow in extreme conditions can be modeled as the mixture of the two.
3.3.1 Mixture Deep LSTM. It is desirable to have a framework that models the joint effects of normal traffic and accidents. As LSTM can capture long-term temporal dependency, we use it for normal traffic modeling. For accidents, we utilize an autoencoder to extract latent representations of the static features. We stack multiple layers of LSTM and autoencoder and combine them at the end using a linear regression layer. As shown in Figure 4, the model consists of two components. One is the stacked LSTM layers, highlighted in the blue rectangle, which model the temporal correlations of the traffic flow at multiple scales. The other is the stacked autoencoder, which learns latent representations that are universal across accidents. This leads to our design of the Mixture Deep LSTM network.
Mathematically, the model takes as input N sequences of length T and dimension D_1, which naturally form a tensor X ∈ R^{N×T×D_1}. In addition, the model is provided with N feature vectors of dimension D_2, denoted X′ ∈ R^{N×D_2}. The stacked autoencoder output Y′ ∈ R^{N×D_3} is duplicated over the time dimension, resulting in a tensor of shape N×T×D_3. At the merge layer, a sequence of dimension D_1 + D_3 is generated by concatenating the outputs from the deep LSTM and the stacked autoencoder. The output layer then aggregates the concatenated sequence into the desired result: an output of size N × T′, where T′ is the length of the output sequence.
When applying Mixture Deep LSTM to post-accident traffic forecasting, we use as input the sequences one week before the accidents to approximate the normal traffic pattern, which addresses the delay issue in the accident reports. We feed the accident-specific features into the autoencoder input layer to describe the interruptions caused by the accidents. We unify the two components by duplicating the output of the stacked autoencoder across each time stamp of the input sequence; the resulting sequence is then concatenated with the sequence generated by the deep LSTM. Similar to the regression architecture in the deep LSTM model, the model aggregates the concatenated sequences at the merge layer. Finally, a fully-connected layer transforms the concatenated sequence into the desired output sequence.
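To make the two-branch architecture concrete, below is a minimal tf.keras sketch of a Mixture-Deep-LSTM-style model. It is an illustration, not the authors' exact code: the dimensions T, D1, D2, D3 and T_out are placeholders, modern Keras replaces the 2016 Theano backend, and the two Dense layers stand in for the (pre-trained) encoder half of the stacked autoencoder.

```python
import tensorflow as tf
from tensorflow.keras import layers, Model

T, D1 = 36, 3       # input sequence length and per-step features (assumed)
D2, D3 = 5, 64      # accident feature dim and encoder output dim (assumed)
T_out = 36          # output sequence length (assumed)

seq_in = layers.Input(shape=(T, D1))    # historical traffic sequence
acc_in = layers.Input(shape=(D2,))      # static accident features

# Deep LSTM branch: the background (normal-condition) traffic signal.
h = layers.LSTM(64, return_sequences=True)(seq_in)
h = layers.LSTM(64, return_sequences=True)(h)

# Stacked-autoencoder branch (encoder half): the accident event signal.
e = layers.Dense(64, activation="sigmoid")(acc_in)
e = layers.Dense(D3, activation="sigmoid")(e)
e = layers.RepeatVector(T)(e)           # duplicate over the time dimension

merged = layers.Concatenate(axis=-1)([h, e])   # merge layer
flat = layers.Flatten()(merged)
out = layers.Dense(T_out, activation="linear")(flat)  # regression output

model = Model([seq_in, acc_in], out)
model.compile(optimizer=tf.keras.optimizers.SGD(5e-4), loss="mse")
```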
4 Experiments
We compare the proposed approach with competitive baselines on the two extreme-condition forecasting tasks and present both quantitative and qualitative results for the proposed framework.
Traffic Data. The traffic data used in the experiments are collected by 2,018 loop detectors located on the highways and arterial streets of Los Angeles County (covering 5,400 miles cumulatively) from May 19, 2012 to June 30, 2012. We aggregate the speed readings every 5 minutes. As there is a high ratio of missing values in the raw data, we filter out malfunctioning sensors with more than 20% missing values.
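A small pandas sketch of this cleaning step, assuming the readings are held in a DataFrame with a DatetimeIndex, one column per detector, and missing readings stored as NaN (the frame layout and helper are hypothetical):

```python
import pandas as pd

def filter_sensors(df: pd.DataFrame, max_missing: float = 0.2) -> pd.DataFrame:
    """Keep only loop detectors whose missing-value ratio is at most 20%.

    Assumes `df` has already been aggregated to 5-minute bins, e.g.
    df = df.resample("5min").mean()
    """
    missing_ratio = df.isna().mean()              # per-column missing ratio
    return df.loc[:, missing_ratio <= max_missing]
```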
Accident Data. The accident data are collected from various agencies including the California Highway Patrol (CHP), the LA Department of Transportation (LADOT), and the California Transportation Agencies (CalTrans). We select accidents of three major types: Rolling, Major Injuries and Minor Injuries. The total number of accidents is 6,811, spread across 1,650 sensors. Each accident is associated with a few attributes such as accident type, downstream post mile, and affected traffic direction.
4.1 Baselines. For peak-hour forecasting, we compare our model with widely adopted time series models: 1) ARIMA: the auto-regressive integrated moving average model, widely used for time series prediction; 2) Random Walk: the traffic flow is modeled as a constant with random noise; 3) Historical Average: the traffic flow is modeled as a seasonal process, and a weighted average of previous seasons is used as the prediction. In ARIMA, time information is provided to the model as exogenous variables. We also tried seasonal ARIMA (SARIMA) with a season length of one day or one week; its performance is usually the same as or slightly better than ARIMA's, while it requires much longer training/testing time (often 10×). Thus, we only use ARIMA for the large-scale experiments.
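As an illustration of this baseline, here is a sketch of an ARIMA fit with exogenous time information using the statsmodels library; the (p, d, q) order, the toy data and the time encoding are assumptions, not the paper's tuned settings.

```python
import numpy as np
from statsmodels.tsa.arima.model import ARIMA

# Placeholder data: 5-minute speed readings for one sensor, plus a
# time-of-day index (288 five-minute slots per day) as an exogenous input.
rng = np.random.default_rng(0)
speeds = rng.uniform(20, 70, size=500)
time_of_day = np.arange(500) % 288

model = ARIMA(speeds, exog=time_of_day, order=(3, 1, 2))   # illustrative order
fitted = model.fit()
# One-hour-ahead forecast = 12 five-minute steps.
pred = fitted.forecast(steps=12, exog=np.arange(500, 512) % 288)
```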
For post-accident forecasting, as we forecast the entire sequence after the accident with two different sets of features, we formulate it as a multi-task sequence-to-sequence learning problem. Each prediction task is to forecast the traffic speed at a future time point given historical observations as well as traffic accident attributes. The following approaches are used as baselines: 1) Linear regression: analogous to the ARIMA model, but with additional accident-specific features; 2) Ridge regression: linear regression with L2 regularization; 3) Multi-task Lasso regression: linear regression with L1 regularization.
ARIMA/SARIMA are implemented with the statsmodels1 library, the regression models are built with Sklearn2, and all the neural network architectures are built using Theano3 and Keras4. Grid search is used for parameter tuning, and early stopping [1] is adopted to terminate the training process based on validation performance. We also experimented with other deep neural network architectures, such as feed-forward baselines and RNNs with gated recurrent units; however, the results obtained so far are not comparable to the other baselines, so we omit them for clarity of presentation.
1https://github.com/statsmodels/statsmodels
2https://github.com/scikit-learn/scikit-learn
3https://github.com/Theano/Theano
4https://github.com/fchollet/keras
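As a small illustration of this training setup, the sketch below wires up early stopping on validation loss in modern tf.keras (the paper used Theano-era Keras); the patience value, validation split and variable names are assumptions.

```python
import tensorflow as tf

# Grid search would loop over candidate hyper-parameters; early
# stopping [1] halts training when validation loss stops improving.
early_stop = tf.keras.callbacks.EarlyStopping(
    monitor="val_loss", patience=5, restore_best_weights=True)

# `model`, `X_train`, `y_train` come from the model sketches above:
# model.fit(X_train, y_train, validation_split=0.1, epochs=150,
#           batch_size=4, callbacks=[early_stop])
```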
4.1.1 Deep LSTM. For Deep LSTM, our best architecture consists of 2 hidden LSTM layers of sizes 64 and 32. We also conducted experiments with 3 or more LSTM layers but did not see clear improvement; this may be because the input sequence is low-dimensional and larger/deeper recurrent networks tend to overfit [26]. We incorporate the normalized speed, time of day and day of week as input features, and use a fully connected layer with linear activation to aggregate the features transformed by the hidden layers. In addition, a dropout rate of 10% is applied to each LSTM layer. We use mean absolute percentage error (MAPE) as the loss function. To reduce the negative effect of missing values, which default to 0, we define the loss as the MAPE over non-zero entries, i.e., we only compute MAPE on entries that contain valid sensor readings. For the hyper-parameters, the batch size is set to 4, the learning rate is set to 1e−3, and RMSProp [21] is used as the optimizer. The network is trained for at most 150 epochs. During training, we truncate the gradients for back-propagation through time (BPTT) to 36 unrolled steps, and clip the norm of the gradients to avoid gradient explosion. Note that during testing the state of the model at each step is propagated to the next step, so the model can capture temporal dependencies beyond 36 steps.
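The masked loss described above can be written compactly; the following tf.keras sketch is an illustration under the assumption that missing readings are encoded as zeros, with small epsilon terms added for numerical safety (the clip threshold is also an assumption).

```python
import tensorflow as tf

def masked_mape(y_true, y_pred):
    """MAPE over valid (non-zero) entries only, so that zero-encoded
    missing observations do not contribute to the loss."""
    mask = tf.cast(tf.not_equal(y_true, 0.0), tf.float32)
    ape = tf.abs(y_true - y_pred) / (tf.abs(y_true) + 1e-8)
    return tf.reduce_sum(ape * mask) / (tf.reduce_sum(mask) + 1e-8)

# Mirroring the setup above: RMSProp with learning rate 1e-3 and
# gradient-norm clipping.
optimizer = tf.keras.optimizers.RMSprop(learning_rate=1e-3, clipnorm=1.0)
# model.compile(optimizer=optimizer, loss=masked_mape)
```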
4.1.2 Mixture Deep LSTM. For Mixture Deep LSTM, the best architecture is a 2-layer LSTM of hidden size 64, with a stacked autoencoder of equal size. The outputs of the stacked autoencoder are copied and concatenated to those of the LSTM network, and the final prediction layer is implemented as a fully connected layer. We use linear activations for the LSTM layers and the final prediction layer, and the sigmoid activation for the autoencoder. The loss function is the mean squared error (MSE), excluding zero-valued missing observations.
To forecast the post-accident traffic, we extract as inputs the historical sub-sequences one week before the accidents, feeding the neural network both the raw speed readings and the time stamp features. We also feed 5 static accident features as inputs to the stacked autoencoder. The final prediction layer merges the outputs from the LSTM and the stacked autoencoder, and the model generates the predicted speed sequence for 3 hours after the reported incident. We use SGD with a learning rate of 5e−4 and a maximum of 100 epochs. For each LSTM layer we use a dropout rate of 20%, and for the final prediction layer we impose L2 regularization on both the weights and the activations.
4.2 Peak-hour Traffic Forecasting. We evaluate the performance of the proposed technique by comparing against the baselines. Figure 5 shows the performance of Deep LSTM and the baselines for forecasting horizons from 5 minutes up to 1 hour, with results averaged across sensors. In the experiments, we first generate predictions for the entire testing sequence and then evaluate the performance on peak hours (Figure 5(a)) and off-peak hours (Figure 5(b)) separately.
Generally, as the forecasting horizon increases, the performance of most methods degrades. The only exception is the historical-average method, which predicts by aggregating traffic speeds from previous days and weeks and is therefore insensitive to the forecasting horizon. For normal conditions at off-peak hours, almost all methods perform similarly. The advantage of the deep learning approach becomes clear for forecasting during peak-hour traffic: Deep LSTM achieves as low as 5% MAPE, almost half that of the other baselines. Interestingly, the performance of Deep LSTM stays relatively stable, while the baseline methods suffer significant performance degradation when evaluated at peak hours.
4.3 Post-Accident Traffic Forecasting. We evaluate the empirical performance of Mixture Deep LSTM on the post-accident forecasting task. Figure 6 displays the forecasting MAPE of the different methods for horizons varying from 5 minutes up to 3 hours. Mixture Deep LSTM is roughly 30% better than the baseline methods. The random walk cannot respond to the dynamics of the accident and thus performs poorly, especially during the first 5 minutes after the incident. The regularization in the multi-task ridge regression and the multi-task lasso regression avoids over-fitting and adapts better to the post-accident situation. Mixture Deep LSTM, which jointly models the normal-condition traffic as well as the traffic accident scenario, achieves the best results.
Table 1 shows the forecasting performance evaluated across the complete 3-hour span after the accident. Mixture Deep LSTM performs significantly better than the other baselines.
To justify the use of a recurrent neural network in our architecture as opposed to other neural network designs, we also evaluate the performance of a 3-layer feed-forward network. Its poor performance shows the importance of taking into account the temporal dependencies of the history sequence.

Figure 5: Off-peak and peak-hour traffic forecasting MAPE of Deep LSTM and baselines with respect to different forecasting horizons: (a) peak hour, (b) off-peak hour

Figure 6: Post-accident traffic forecasting MAPE of Mixture Deep LSTM and baselines with respect to different forecasting horizons

Figure 7: Three-hour post-accident predicted sequences of the different methods compared with the ground truth

Table 1: 3-hour post-accident forecasting MAPE comparison of Mixture Deep LSTM and baseline models

Method                        MAPE
Mixture Deep LSTM             0.9700
Deep LSTM                     1.003
Random Walk                   2.7664
Linear Regression             1.6311
Ridge Regression              1.6296
Multi-task Lasso Regression   1.4451
Feed-forward Network          3.6432

Figure 7 shows the predicted sequence
from the time the accident occurs (time 0) up to 3 hours later. The predictions are concatenated with the input sequences to show the temporal dependencies. For the case shown in Figure 7, the sharp drop caused by the accident results in a pattern different from that at the same time in the previous week. The plot shows that Mixture Deep LSTM can correctly predict the mean value of a highly non-stationary long time sequence.
4.4 Model Inspection. To further understand the behavior of the proposed deep architecture, we conduct a series of qualitative studies. Our method of model inspection is inspired by electric stimulation studies in brain science. If we make the analogy of the trained neural network as a human brain, the weights in the network contain the "learned" knowledge; by feeding in different types of stimulation, the responses of the neural network reflect this knowledge. The stimulation is conducted by first feeding the trained neural network a constant-speed input. The neural network transforms the input signal into 1-hour-ahead
predictions. Then we take the output sequence as the
input signal again and repeat the procedure until 48 hours of outputs are generated. Despite the fact that a constant input carries almost no information about the variations in the time series, we receive interesting responses from the model.
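The stimulation procedure can be summarized in a short closed-loop sketch; `model.predict`, the window length and the constant speed are illustrative assumptions about the trained forecaster's interface.

```python
import numpy as np

def stimulate(model, const_speed=60.0, in_len=12, hours=48):
    """Closed-loop stimulation: feed a constant-speed window, then
    repeatedly feed the model's own 1-hour-ahead output back as input
    until `hours` hours of response are produced. Assumes `model`
    maps a (1, in_len, 1) window to the next 12 five-minute readings."""
    window = np.full((1, in_len, 1), const_speed)
    response = []
    for _ in range(hours):
        nxt = model.predict(window, verbose=0)  # one hour of predictions
        response.append(np.ravel(nxt))
        # the newest readings become the next input window
        window = np.concatenate(response)[-in_len:].reshape(1, in_len, 1)
    return np.concatenate(response)
```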
Figure 8: Learned patterns from the Mixture Deep LSTM network: trained-model responses under different stimulation signals, compared with the ground-truth sequence. (a) Model inspection; (b) drop response
In peak-hour forecasting, we observe the importance of incorporating temporal features, i.e., time of day and day of week. To examine this, we stimulate two different neural networks: one trained with temporal features and the other without. Figure 8(a) illustrates the model responses when stimulated with a constant-speed input. As shown in Figure 8(a1), with temporal features the model (blue solid line) generates a trend similar to the historical average (green dotted line), which shows that the model can learn long-term as well as short-term seasonal information from historical data. The model trained without temporal features (Figure 8(a2)) can also generate similar outputs, but the period is considerably shorter than that of the historical average.
Next, we simulate the post-accident scenario by injecting different interruptions into the input stimulation signal and investigating the model response. In Figure 8(b1), the blue dotted line represents the model's response to a constant-speed stimulation. We then generate a stimulation with a 5-minute sudden drop (green dotted line) to simulate a fluctuation; the model response is shown as the red dashed line. We note that this response is exactly the same as the constant-speed response, which shows that the model is robust to ephemeral traffic changes, simulated here by a small fluctuation in the input signal.
However, when the duration of the fluctuation grows, a response starts to appear. Figure 8(b2) shows the response after we increase the duration of the sudden drop to half an hour: following the sudden drop in the input signal, the response quickly decreases and then gradually returns to normal. This reflects the behavior of traffic flow in the face of an accident. Note that the simulated accident in Figure 8(b2) happens at an off-peak hour; the model response is further amplified when the accident happens at peak hour. As shown in Figure 8(b3), when we change the time of the stimulation from morning to afternoon, a more dramatic change is observed. This is because traffic changes at different times of the day have different effects; e.g., a speed decrease during the afternoon usually indicates the arrival of peak hours.
5 Conclusion
In this paper, we studied the problem of traffic forecasting under extreme conditions, in particular peak-hour and post-accident scenarios. We proposed a generic deep learning framework based on the long short-term memory unit. In particular, we used Deep LSTM for peak-hour traffic forecasting with careful feature engineering, and we further proposed a Mixture Deep LSTM architecture that unifies Deep LSTM and the stacked autoencoder. Evaluated on large-scale real-world traffic data, our framework achieves significant performance improvement over the baselines for both conditions. We also employed a novel model inspection method based on signal stimulation; the model responses to those stimulations provided insights into the neural network trained on traffic data.
Acknowledgment
This research has been supported in part by the METRANS Transportation Center under Caltrans contract #65A0533, the USC Integrated Media Systems Center (IMSC), and unrestricted cash gifts from Oracle. Rose Yu and Yan Liu were additionally supported by NSF research grants IIS-1254206 and IIS-1539608. Any opinions, findings and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of any of the sponsors, such as the NSF.
References
[1] Y. Bengio, Practical recommendations for gradient-based training of deep architectures, in Neural Networks: Tricks of the Trade, Springer, 2012, pp. 437–478.
[2] Z. Che, D. Kale, W. Li, M. T. Bahadori, and
Y. Liu, Deep computational phenotyping, in SIGKDD,
ACM, 2015, pp. 507–516.
[3] Q. Chen, X. Song, H. Yamada, and R. Shibasaki, Learning deep representation from big and heterogeneous data for traffic accident inference, in AAAI, 2016.
[4] S. D. Clark, M. S. Dougherty, and H. R. Kirby,
The use of neural networks and time series models for
short term traffic forecasting: a comparative study, in
Traffic Engineering and Control, no. 34, 1993, pp. 311–
318.
[5] A. Downs, Still stuck in traffic: coping with peak-hour
traffic congestion, Brookings Institution Press, 2005.
[6] G. Duranton and M. A. Turner, The fundamental
law of road congestion: Evidence from us cities, The
American Economic Review, (2011), pp. 2616–2652.
[7] A. Garib, A. Radwan, and H. Al-Deek, Estimating
magnitude and duration of incident delays, Journal of
Transportation Engineering, 123 (1997), pp. 459–466.
[8] M. Hermans and B. Schrauwen, Training and analysing deep recurrent neural networks, in NIPS, 2013, pp. 190–198.
[9] S. Hochreiter and J. Schmidhuber, Long short-
term memory, Neural computation, 9 (1997), pp. 1735–
1780.
[10] W. Huang, G. Song, H. Hong, and K. Xie, Deep
architecture for traffic flow prediction: deep belief net-
works with multitask learning, ITS, IEEE Transactions
on, 15 (2014), pp. 2191–2201.
[11] H. R. Kirby, S. M. Watson, and M. S. Dougherty,
Should we use neural networks or statistical models for
short-term motorway traffic forecasting?, International
Journal of Forecasting, 13 (1997), pp. 43–50.
[12] Q. V. Le, W. Y. Zou, S. Y. Yeung, and A. Y. Ng,
Learning hierarchical invariant spatio-temporal features
for action recognition with independent subspace anal-
ysis, in CVPR, IEEE, 2011, pp. 3361–3368.
[13] M. Lippi, M. Bertini, and P. Frasconi, Short-term
traffic flow forecasting: An experimental comparison of
time-series analysis and supervised learning, ITS, IEEE
Transactions on, 14 (2013), pp. 871–882.
[14] W. Liu, Y. Zheng, S. Chawla, J. Yuan, and
X. Xing, Discovering spatio-temporal causal interac-
tions in traffic data streams, in SIGKDD, ACM, 2011,
pp. 1010–1018.
[15] Y. Lv, Y. Duan, W. Kang, Z. Li, and F.-Y.
Wang, Traffic flow prediction with big data: A deep
learning approach, ITS, IEEE Transactions on, 16
(2015), pp. 865–873.
[16] X. Ma, Z. Tao, Y. Wang, H. Yu, and Y. Wang,
Long short-term memory neural network for traffic
speed prediction using remote microwave sensor data,
Transportation Research Part C: Emerging Technolo-
gies, 54 (2015), pp. 187–197.
[17] B. Pan, U. Demiryurek, C. Shahabi, and
C. Gupta, Forecasting spatiotemporal impact of traf-
fic incidents on road networks, in ICDM, IEEE, 2013,
pp. 587–596.
[18] K. A. Small and E. T. Verhoef, The economics of
urban transportation, Routledge, 2007.
[19] B. L. Smith and M. J. Demetsky, Traffic flow fore-
casting: comparison of modeling approaches, Journal of
transportation engineering, 123 (1997), pp. 261–266.
[20] I. Sutskever, O. Vinyals, and Q. V. Le, Sequence
to sequence learning with neural networks, in NIPS,
2014, pp. 3104–3112.
[21] T. Tieleman and G. Hinton, Lecture 6.5-rmsprop:
Divide the gradient by a running average of its recent
magnitude, COURSERA: Neural Networks for Ma-
chine Learning, 4 (2012), p. 2.
[22] J. Van Lint, S. Hoogendoorn, and H. Van Zuylen, Freeway travel time prediction with state-space neural networks: modeling state-space dynamics with recurrent neural networks, Journal of the Transportation Research Board, (2002), pp. 30–39.
[23] E. I. Vlahogianni, J. C. Golias, and M. G.
Karlaftis, Short-term traffic forecasting: Overview of
objectives and methods, Transport reviews, 24 (2004),
pp. 533–557.
[24] E. I. Vlahogianni, M. G. Karlaftis, and J. C. Golias, Short-term traffic forecasting: Where we are and where we're going, Transportation Research Part C: Emerging Technologies, 43 (2014), pp. 3–19.
[25] K. Xu, J. Ba, R. Kiros, K. Cho, A. Courville,
R. Salakhudinov, R. Zemel, and Y. Bengio, Show,
attend and tell: Neural image caption generation with
visual attention, in ICML, 2015, pp. 2048–2057.
[26] W. Zaremba, I. Sutskever, and O. Vinyals, Re-
current neural network regularization, arXiv preprint
arXiv:1409.2329, (2014).