<!DOCTYPE html>
<html lang="en" style="scroll-padding-top: 70px;">
<head>
<meta charset="utf-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0, shrink-to-fit=no">
<link rel="stylesheet"
href="https://fonts.googleapis.com/css?family=Open+Sans:300italic,400italic,600italic,700italic,800italic,400,300,600,700,800">
<link rel="stylesheet" href="https://fonts.googleapis.com/css?family=Lora:400,700,400italic,700italic">
<link href="https://fonts.googleapis.com/css2?family=Exo:wght@400;700&family=Lato:wght@400;700&display=swap" rel="stylesheet">
<link rel="stylesheet" href="/static/expo/fonts/font-awesome.min.css">
<link rel="stylesheet" href="https://use.fontawesome.com/releases/v5.8.1/css/all.css" integrity="sha384-50oBUHEmvpQ+1lW4y57PTFmhCaXp0ML5d60M1M7uH2+nqUivzIebhndOJK28anvf" crossorigin="anonymous">
<link rel="stylesheet" href="https://cdn.jsdelivr.net/npm/[email protected]/dist/css/bootstrap-select.min.css">
<link rel="stylesheet" href="cards.css">
<link rel="stylesheet" href="https://use.fontawesome.com/releases/v5.8.1/css/all.css" integrity="sha384-50oBUHEmvpQ+1lW4y57PTFmhCaXp0ML5d60M1M7uH2+nqUivzIebhndOJK28anvf" crossorigin="anonymous">
<link rel="stylesheet" href="https://cdn.jsdelivr.net/npm/[email protected]/dist/css/bootstrap.min.css" integrity="sha384-xOolHFLEh07PJGoPkLv1IbcEPTNtaed2xpHsD9ESMhqIYd0nLMwNLD69Npy4HI+N" crossorigin="anonymous">
<script src="https://code.jquery.com/jquery-3.6.1.min.js"
integrity="sha256-o88AwQnZB+VDvE9tvIXrMQaPlFFSUTR+nldQm1LuPXQ=" crossorigin="anonymous"></script>
<script>
// Fallback: if the CDN copy of jQuery above failed to load, pull a local copy instead.
if (typeof jQuery === 'undefined') {
var script = document.createElement('script');
script.type = 'text/javascript';
script.src = "/static/core/js/jquery-3.6.1.min.js";
document.head.appendChild(script);
}
</script>
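<!-- Editor's hedged sketch (not in the original page): the same local-fallback
     pattern used for jQuery above could also cover the Bootstrap bundle loaded
     further below. The local path is hypothetical. -->
<script>
window.addEventListener('load', function () {
// Bootstrap 4 registers itself as jQuery plugins; if $.fn.collapse is still
// missing once the page has loaded, the CDN request likely failed.
if (window.jQuery && typeof jQuery.fn.collapse === 'undefined') {
var s = document.createElement('script');
s.src = "/static/core/js/bootstrap.bundle.min.js"; // hypothetical local copy
document.head.appendChild(s);
}
});
</script>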
<script src="https://d3js.org/d3.v5.min.js"></script>
<script src="https://cdn.jsdelivr.net/npm/[email protected]/dist/umd/popper.min.js" integrity="sha384-Q6E9RHvbIyZFJoft+2mJbHaEWldlvI9IOYy5n3zV9zzTtmI3UksdQRVvoxMfooAo" crossorigin="anonymous"></script>
<script src="https://cdn.jsdelivr.net/npm/[email protected]/dist/js/bootstrap.bundle.min.js" integrity="sha384-Fy6S3B9q64WdZWQUiU+q4/2Lc9npb8tCaSX9FK7E8HnRr0Jz8D6OP9dO5Vg3Q9ct" crossorigin="anonymous"></script>
<script src="https://cdn.jsdelivr.net/npm/[email protected]/dist/js/bootstrap-select.min.js"></script>
<script src="https://cdnjs.cloudflare.com/ajax/libs/corejs-typeahead/1.3.1/typeahead.bundle.min.js" integrity="sha512-lEb9Vp/rkl9g2E/LdHIMFTqz21+LA79f84gqP75fbimHqVTu6483JG1AwJlWLLQ8ezTehty78fObKupq3HSHPQ==" crossorigin="anonymous"></script>
<script src="https://cdn.jsdelivr.net/npm/[email protected]/min/moment.min.js"
integrity="sha256-4iQZ6BVL4qNKlQ27TExEhBN1HFPvAvAMbFavKKosSWQ="
crossorigin="anonymous"></script>
<script src="https://cdn.jsdelivr.net/npm/js-cookie@2/src/js.cookie.min.js"></script>
<script src="/static/core/js/ajax-csrf-snippet.js" type="text/javascript"></script>
<script src="/static/virtual/js/virtual.js"></script>
<link rel="stylesheet" href="virtual.css">
<style>
body {
background: #f6f6f6;
}
</style>
</head>
<body>
<!-- NAV -->
<!--
<nav class="navbar sticky-top navbar-expand-lg navbar-light mr-auto" id="main-nav">
<div class="container-fluid">
<a class="navbar-brand" href="/">
<img src="/static/core/img/ICML-logo.svg" height="40px">
</a>
<button class="navbar-toggler" type="button" data-toggle="collapse" data-target="#navbarNav"
aria-controls="navbarNav" aria-expanded="false" aria-label="Toggle navigation">
<span class="navbar-toggler-icon"></span>
</button>
<div class="collapse navbar-collapse text-right flex-grow-1" id="navbarNav">
<ul class="navbar-nav ml-auto">
<li class="nav-item ">
<a class="nav-link" href="/virtual/2022/events/tutorial">Tutorials</a>
</li>
<li class="nav-item dropdown">
<a class="nav-link dropdown-toggle" href="#" id="navbarDropdown" role="button"
data-toggle="dropdown" aria-haspopup="true" aria-expanded="false">
Main Conference
</a>
<div class="dropdown-menu" aria-labelledby="navbarDropdown">
<a class="dropdown-item" href="/virtual/2022/events/oral">Orals</a>
<div class="dropdown-divider"></div>
<a class="dropdown-item" href="/virtual/2022/papers.html">Papers</a>
</div>
</li>
</ul>
</div>
</div>
</nav>
-->
<!-- NAV -->
<nav class="navbar sticky-top navbar-expand-lg navbar-light mr-auto" id="main-nav">
<div class="container-fluid">
<a class="navbar-brand" href="">
<img src="tmlr_logo.jpeg" height="40px">
Transactions on Machine Learning Research
</a>
<button class="navbar-toggler" type="button" data-toggle="collapse" data-target="#navbarNav"
aria-controls="navbarNav" aria-expanded="false" aria-label="Toggle navigation">
<span class="navbar-toggler-icon"></span>
</button>
<div class="collapse navbar-collapse text-right flex-grow-1" id="navbarNav">
<ul class="navbar-nav ml-auto">
<!--
<li class="nav-item dropdown">
<a class="nav-link dropdown-toggle" href="#" id="navbarDropdown" role="button"
data-toggle="dropdown" aria-haspopup="true" aria-expanded="false">
Main Conference
</a>
<div class="dropdown-menu" aria-labelledby="navbarDropdown">
<div class="dropdown-divider"></div>
<a class="dropdown-item" href="/virtual/2023/events/oral">Orals</a>
<div class="dropdown-divider"></div>
<a class="dropdown-item" href="/virtual/2023/events/spotlight">Spotlights</a>
<div class="dropdown-divider"></div>
<a class="dropdown-item" href="/virtual/2023/papers.html">Papers</a>
</div>
</li>
-->
<!--
<li class="nav-item">
<a class="nav-link" href="../">All Papers</a>
</li>
-->
<!--
<li class="nav-item">
<a class="nav-link" href="">Papers with Videos</a>
</li>
-->
<!--
<li class="nav-item">
<a class="nav-link" href="../featured_papers.html">Featured Papers</a>
</li>
-->
<!--
<li class="nav-item ">
<a class="nav-link" href="/virtual/2023/search"><i class="fas fa-search"></i> </a>
</li>
-->
</ul>
</div>
</div>
</nav>
<div class="container">
<br />
<div class="row">
<div class="col-md-12"></div>
<div class="title-centered" style="text-align:center">TMLR Infinite Conference</div>
</div>
</div>
<div class="row">
<div class="col-sm-12">
<div style="max-width: 1500px; margin:auto; border">
<div class="grid-displaycards">
<div class="displaycards touchup-date" id="event-k3Ab6RuJE9">
<div style="width:80%;margin:auto;">
<a class="small-title" href="paper_pages/k3Ab6RuJE9.html">Verbalized Machine Learning: Revisiting Machine Learning with Language Models</a>
</div>
<div class="type_display_name_minus_type"></div>
<div class="author-str">Tim Z. Xiao · Robert Bamler · Bernhard Schölkopf · Weiyang Liu</div>
<div class="author-str higher"></div>
<div class="text-muted touchup-date-div" id="touchup-date-event-k3Ab6RuJE9"></div>
<a href="paper_pages/k3Ab6RuJE9.html">
<img src="http://img.youtube.com/vi/LCl_np5oPWA/0.jpg" class="social-img-thumb rounded" alt="thumbnail"/>
</a>
<div class="abstract-section">
<div>
<a id="abstract-link-k3Ab6RuJE9" class="abstract-link" data-toggle="collapse"
href="#collapse-event-abstract-k3Ab6RuJE9" role="button"
aria-expanded="false" aria-controls="collapse-event-abstract-k3Ab6RuJE9">
Abstract <i id="caret-k3Ab6RuJE9" class="fas fa-caret-right"></i>
</a>
</div>
</div>
<div class="collapse" id="collapse-event-abstract-k3Ab6RuJE9">
<div class="abstract-display">
<p>Motivated by the progress of large language models (LLMs), we introduce the framework of verbalized machine learning (VML). In contrast to conventional machine learning (ML) models that are typically optimized over a continuous parameter space, VML constrains the parameter space to be human-interpretable natural language. Such a constraint leads to a new perspective of function approximation, where an LLM with a text prompt can be viewed as a function parameterized by the text prompt. Guided by this perspective, we revisit classical ML problems, such as regression and classification, and find that these problems can be solved by an LLM-parameterized learner and optimizer. The major advantages of VML include (1) easy encoding of inductive bias: prior knowledge about the problem and hypothesis class can be encoded in natural language and fed into the LLM-parameterized learner; (2) automatic model class selection: the optimizer can automatically select a model class based on data and verbalized prior knowledge, and it can update the model class during training; and (3) interpretable learner updates: the LLM-parameterized optimizer can provide explanations for why an update is performed. We empirically verify the effectiveness of VML, and hope that VML can serve as a stepping stone to stronger interpretability.</p>
</div>
</div>
</div>
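<!-- Editor's hedged sketch (not in the original sources): the caret next to each
     "Abstract" link is presumably flipped by /static/virtual/js/virtual.js when a
     Bootstrap collapse panel opens or closes. One generic way to wire that up: -->
<script>
$(function () {
// Flip the caret icon when an abstract panel opens or closes. Bootstrap 4
// collapse events bubble, so a delegated handler on document covers all cards.
$(document).on('show.bs.collapse hide.bs.collapse', '[id^="collapse-event-abstract-"]', function (e) {
var id = this.id.replace('collapse-event-abstract-', '');
$('#caret-' + id)
.toggleClass('fa-caret-down', e.type === 'show')
.toggleClass('fa-caret-right', e.type === 'hide');
});
});
</script>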
<div class="displaycards touchup-date" id="event-jJOVpnNrEp">
<div style="width:80%;margin:auto;">
<a class="small-title" href="paper_pages/jJOVpnNrEp.html">Video-Language Critic: Transferable Reward Functions for Language-Conditioned Robotics</a>
</div>
<div class="type_display_name_minus_type"></div>
<div class="author-str">Minttu Alakuijala · Reginald McLean · Isaac Woungang · Nariman Farsad · Samuel Kaski · Pekka Marttinen · Kai Yuan</div>
<div class="author-str higher"></div>
<div class="text-muted touchup-date-div" id="touchup-date-event-jJOVpnNrEp"></div>
<a href="paper_pages/jJOVpnNrEp.html">
<img src="http://img.youtube.com/vi/2oac-IKSs6k/0.jpg" class="social-img-thumb rounded" alt="thumbnail"/>
</a>
<div class="abstract-section">
<div>
<a id="abstract-link-jJOVpnNrEp" class="abstract-link" data-toggle="collapse"
href="#collapse-event-abstract-jJOVpnNrEp" role="button"
aria-expanded="false" aria-controls="collapse-event-abstract-jJOVpnNrEp">
Abstract <i id="caret-jJOVpnNrEp" class="fas fa-caret-right"></i>
</a>
</div>
</div>
<div class="collapse" id="collapse-event-abstract-jJOVpnNrEp">
<div class="abstract-display">
<p>Natural language is often the easiest and most convenient modality for humans to specify tasks for robots. However, learning to ground language to behavior typically requires impractical amounts of diverse, language-annotated demonstrations collected on each target robot. In this work, we aim to separate the problem of what to accomplish from how to accomplish it, as the former can benefit from substantial amounts of external observation-only data, and only the latter depends on a specific robot embodiment. To this end, we propose Video-Language Critic, a reward model that can be trained on readily available cross-embodiment data using contrastive learning and a temporal ranking objective, and use it to score behavior traces from a separate actor. When trained on Open X-Embodiment data, our reward model enables 2x more sample-efficient policy training on Meta-World tasks than a sparse reward only, despite a significant domain gap. Using in-domain data but in a challenging task generalization setting on Meta-World, we further demonstrate more sample-efficient training than is possible with prior language-conditioned reward models that are either trained with binary classification, use static images, or do not leverage the temporal information present in video data.</p>
</div>
</div>
</div>
<div class="displaycards touchup-date" id="event-HkmymFPODz">
<div style="width:80%;margin:auto;">
<a class="small-title" href="paper_pages/HkmymFPODz.html">Deep Active Learning in the Open World</a>
</div>
<div class="type_display_name_minus_type"></div>
<div class="author-str">Tian Xie · Jifan Zhang · Haoyue Bai · Robert D Nowak</div>
<div class="author-str higher"></div>
<div class="text-muted touchup-date-div" id="touchup-date-event-HkmymFPODz"></div>
<a href="paper_pages/HkmymFPODz.html">
<img src="http://img.youtube.com/vi/MrKra6ZfVKo/0.jpg" class="social-img-thumb rounded" alt="thumbnail"/>
</a>
<div class="abstract-section">
<div>
<a id="abstract-link-HkmymFPODz" class="abstract-link" data-toggle="collapse"
href="#collapse-event-abstract-HkmymFPODz" role="button"
aria-expanded="false" aria-controls="collapse-event-abstract-HkmymFPODz">
Abstract <i id="caret-HkmymFPODz" class="fas fa-caret-right"></i>
</a>
</div>
</div>
<div class="collapse" id="collapse-event-abstract-HkmymFPODz">
<div class="abstract-display">
<p>Machine learning models deployed in open-world scenarios often encounter unfamiliar conditions and perform poorly in unanticipated situations. As AI systems advance and find application in safety-critical domains, effectively handling out-of-distribution (OOD) data is crucial to building open-world learning systems. In this work, we introduce ALOE, a novel active learning algorithm for open-world environments designed to enhance model adaptation by incorporating new OOD classes via a two-stage approach. First, diversity sampling selects a representative set of examples, followed by energy-based OOD detection to prioritize likely unknown classes for annotation. This strategy accelerates class discovery and learning, even under constrained annotation budgets. Evaluations on three long-tailed image classification benchmarks demonstrate that ALOE outperforms traditional active learning baselines, effectively expanding known categories while balancing annotation cost. Our findings reveal a crucial tradeoff between enhancing known-class performance and discovering new classes, setting the stage for future advancements in open-world machine learning.</p>
</div>
</div>
</div>
<div class="displaycards touchup-date" id="event-IrBYuh9W3T">
<div style="width:80%;margin:auto;">
<a class="small-title" href="paper_pages/IrBYuh9W3T.html">What Makes ImageNet Look Unlike LAION</a>
</div>
<div class="type_display_name_minus_type"></div>
<div class="author-str">Ali Shirali · Moritz Hardt</div>
<div class="author-str higher"></div>
<div class="text-muted touchup-date-div" id="touchup-date-event-IrBYuh9W3T"></div>
<a href="paper_pages/IrBYuh9W3T.html">
<img src="http://img.youtube.com/vi/ioUdKJgRrJ0/0.jpg" class="social-img-thumb rounded" alt="thumbnail"/>
</a>
<div class="abstract-section">
<div>
<a id="abstract-link-IrBYuh9W3T" class="abstract-link" data-toggle="collapse"
href="#collapse-event-abstract-IrBYuh9W3T" role="button"
aria-expanded="false" aria-controls="collapse-event-abstract-IrBYuh9W3T">
Abstract <i id="caret-IrBYuh9W3T" class="fas fa-caret-right"></i>
</a>
</div>
</div>
<div class="collapse" id="collapse-event-abstract-IrBYuh9W3T">
<div class="abstract-display">
<p>ImageNet was famously created by querying several image search engines such as Flickr. What if we recreated ImageNet instead by searching the massive LAION dataset based on image captions alone? In this work, we carry out this counterfactual investigation. We find that the resulting ImageNet recreation, which we call LAIONet, looks distinctly unlike the original. Specifically, the intra-class similarity of images in the original ImageNet is dramatically higher than it is for LAIONet. Consequently, models trained on ImageNet perform significantly worse on LAIONet. We propose a rigorous explanation for the discrepancy in terms of a subtle, yet important, difference in two plausible causal data-generating processes for the respective datasets, which we support with systematic experimentation. In a nutshell, searching based on an image caption alone creates an information bottleneck that mitigates the selection bias otherwise present in image-based filtering. Our explanation formalizes a long-held intuition in the community that ImageNet images are stereotypical, unnatural, and overly simple representations of the class category. At the same time, it provides a simple and actionable takeaway for future dataset creation efforts.</p>
</div>
</div>
</div>
<div class="displaycards touchup-date" id="event-1QeI99nH9k">
<div style="width:80%;margin:auto;">
<a class="small-title" href="paper_pages/1QeI99nH9k.html">Robust High-Dimensional Mean Estimation With Low Data Size, an Empirical Study</a>
</div>
<div class="type_display_name_minus_type"></div>
<div class="author-str">Cullen Anderson · Jeff M. Phillips</div>
<div class="author-str higher"></div>
<div class="text-muted touchup-date-div" id="touchup-date-event-1QeI99nH9k"></div>
<a href="paper_pages/1QeI99nH9k.html">
<img src="http://img.youtube.com/vi/Tp_1rFTRMBI/0.jpg" class="social-img-thumb rounded" alt="thumbnail"/>
</a>
<div class="abstract-section">
<div>
<a id="abstract-link-1QeI99nH9k" class="abstract-link" data-toggle="collapse"
href="#collapse-event-abstract-1QeI99nH9k" role="button"
aria-expanded="false" aria-controls="collapse-event-abstract-1QeI99nH9k">
Abstract <i id="caret-1QeI99nH9k" class="fas fa-caret-right"></i>
</a>
</div>
</div>
<div class="collapse" id="collapse-event-abstract-1QeI99nH9k">
<div class="abstract-display">
<p>Robust statistics aims to compute quantities to represent data where a fraction of it may be arbitrarily corrupted. The most essential statistic is the mean, and in recent years, there has been a flurry of theoretical advancement for efficiently estimating the mean in high dimensions on corrupted data. While several algorithms have been proposed that achieve near-optimal error, they all rely on large data size requirements as a function of dimension. In this paper, we perform extensive experimentation over various mean estimation techniques where data size might not meet this requirement due to the high-dimensional setting.
For data with inliers generated from a Gaussian with known covariance, we find experimentally that several robust mean estimation techniques can practically improve upon the sample mean, with the quantum entropy scaling approach from Dong et al. (NeurIPS 2019) consistently performing the best. However, this consistent improvement is conditioned on a couple of simple modifications to how the steps to prune outliers work in the high-dimension low-data setting, and when the inliers deviate significantly from Gaussianity. In fact, with these modifications, they are typically able to achieve roughly the same error as taking the sample mean of the uncorrupted inlier data, even with very low data size. In addition to controlled experiments on synthetic data, we also explore these methods on large language models, deep pretrained image models, and non-contextual word embedding models that do not necessarily have an inherent Gaussian distribution. Yet, in these settings, a mean point of a set of embedded objects is a desirable quantity to learn, and the data exhibits the high-dimension low-data setting studied in this paper. We show both the challenges of achieving this goal, and that our updated robust mean estimation methods can provide significant improvement over using just the sample mean. We additionally publish a library of Python implementations of robust mean estimation algorithms, allowing practitioners and researchers to apply these techniques and to perform further experimentation.</p>
</div>
</div>
</div>
<div class="displaycards touchup-date" id="event-H4S4ETc8c9">
<div style="width:80%;margin:auto;">
<a class="small-title" href="paper_pages/H4S4ETc8c9.html">Evaluation of Best-of-N Sampling Strategies for Language Model Alignment</a>
</div>
<div class="type_display_name_minus_type"></div>
<div class="author-str">Yuki Ichihara · Yuu Jinnai · Tetsuro Morimura · Kenshi Abe · Kaito Ariu · Mitsuki Sakamoto · Eiji Uchibe</div>
<div class="author-str higher"></div>
<div class="text-muted touchup-date-div" id="touchup-date-event-H4S4ETc8c9"></div>
<a href="paper_pages/H4S4ETc8c9.html">
<img src="http://img.youtube.com/vi/2HC6Bqvrdk4/0.jpg" class="social-img-thumb rounded" alt="thumbnail"/>
</a>
<div class="abstract-section">
<div>
<a id="abstract-link-H4S4ETc8c9" class="abstract-link" data-toggle="collapse"
href="#collapse-event-abstract-H4S4ETc8c9" role="button"
aria-expanded="false" aria-controls="collapse-event-abstract-H4S4ETc8c9">
Abstract <i id="caret-H4S4ETc8c9" class="fas fa-caret-right"></i>
</a>
</div>
</div>
<div class="collapse" id="collapse-event-abstract-H4S4ETc8c9">
<div class="abstract-display">
<p>Best-of-N (BoN) sampling with a reward model has been shown to be an effective strategy for aligning Large Language Models (LLMs) with human preferences at decoding time. However, BoN sampling is susceptible to a problem known as reward hacking: since the reward model is an imperfect proxy for the true objective, an excessive focus on optimizing its value can compromise performance on the true objective. Previous work proposed Regularized BoN sampling (RBoN), a variant of BoN sampling that adds a regularization term to the objective, and showed empirically that it mitigates reward hacking and outperforms BoN sampling (Jinnai et al., 2024). However, Jinnai et al. (2024) introduced RBoN based on a heuristic, without analyzing why such a regularization strategy improves the performance of BoN sampling. The aim of this study is to analyze the effect of regularization strategies on BoN sampling. Using these regularization strategies corresponds to robust optimization, which maximizes the worst case over a set of possible perturbations of the proxy reward. Although the theoretical guarantees are not directly applicable to RBoN, RBoN corresponds to a practical implementation. This paper proposes an extension of the RBoN framework, called Stochastic RBoN sampling (SRBoN), which is a theoretically guaranteed approach to worst-case RBoN over the proxy reward. We then perform an empirical evaluation on the AlpacaFarm and Anthropic hh-rlhf datasets to evaluate which factors of the regularization strategies contribute to improvement in the true reward. In addition, we propose another simple RBoN method, Sentence-Length-Regularized BoN, which performs better in our experiments than the previous methods.</p>
</div>
</div>
</div>
<div class="displaycards touchup-date" id="event-O4CQ5AM5yP">
<div style="width:80%;margin:auto;">
<a class="small-title" href="paper_pages/O4CQ5AM5yP.html">REX: GPU-Accelerated Sim2Real Framework with Delay and Dynamics Estimation</a>
</div>
<div class="type_display_name_minus_type"></div>
<div class="author-str">Bas van der Heijden · Jens Kober · Robert Babuska · Laura Ferranti</div>
<div class="author-str higher"></div>
<div class="text-muted touchup-date-div" id="touchup-date-event-O4CQ5AM5yP"></div>
<a href="paper_pages/O4CQ5AM5yP.html">
<img src="http://img.youtube.com/vi/7j30LUjTx_I/0.jpg" class="social-img-thumb rounded" alt="thumbnail"/>
</a>
<div class="abstract-section">
<div>
<a id="abstract-link-O4CQ5AM5yP" class="abstract-link" data-toggle="collapse"
href="#collapse-event-abstract-O4CQ5AM5yP" role="button"
aria-expanded="false" aria-controls="collapse-event-abstract-O4CQ5AM5yP">
Abstract <i id="caret-O4CQ5AM5yP" class="fas fa-caret-right"></i>
</a>
</div>
</div>
<div class="collapse" id="collapse-event-abstract-O4CQ5AM5yP">
<div class="abstract-display">
<p>Sim2real, the transfer of control policies from simulation to the real world, is crucial for efficiently solving robotic tasks without the risks associated with real-world learning. However, discrepancies between simulated and real environments, especially due to unmodeled dynamics and latencies, significantly impact the performance of these transferred policies. In this paper, we address the challenges of sim2real transfer caused by latency and asynchronous dynamics in real-world robotic systems. Our approach involves developing a novel framework, REX (Robotic Environments with jaX), that uses a graph-based simulation model to incorporate latency effects while optimizing for parallelization on accelerator hardware. Our framework simulates the asynchronous, hierarchical nature of real-world systems, while simultaneously estimating system dynamics and delays from real-world data and implementing delay compensation strategies to minimize the sim2real gap. We validate our approach on two real-world systems, demonstrating its effectiveness in improving sim2real performance by accurately modeling both system dynamics and delays. Our results show that the proposed framework supports both accelerated simulation and real-time processing, making it valuable for robot learning.</p>
</div>
</div>
</div>
<div class="displaycards touchup-date" id="event-HOnL5hjaIt">
<div style="width:80%;margin:auto;">
<a class="small-title" href="paper_pages/HOnL5hjaIt.html">Generalized Tangent Kernel: A Unified Geometric Foundation for Natural Gradient and Standard Gradient</a>
</div>
<div class="type_display_name_minus_type"></div>
<div class="author-str">Qinxun Bai · Steven Rosenberg · Wei Xu</div>
<div class="author-str higher"></div>
<div class="text-muted touchup-date-div" id="touchup-date-event-HOnL5hjaIt"></div>
<a href="paper_pages/HOnL5hjaIt.html">
<img src="http://img.youtube.com/vi/cp7QuHKvI8E/0.jpg" class="social-img-thumb rounded" alt="thumbnail"/>
</a>
<div class="abstract-section">
<div>
<a id="abstract-link-HOnL5hjaIt" class="abstract-link" data-toggle="collapse"
href="#collapse-event-abstract-HOnL5hjaIt" role="button"
aria-expanded="false" aria-controls="collapse-event-abstract-HOnL5hjaIt">
Abstract <i id="caret-HOnL5hjaIt" class="fas fa-caret-right"></i>
</a>
</div>
</div>
<div class="collapse" id="collapse-event-abstract-HOnL5hjaIt">
<div class="abstract-display">
<p>Natural gradients have been widely studied from both theoretical and empirical perspectives, and it is commonly believed that natural gradients have advantages over standard (Euclidean) gradients in capturing the intrinsic geometric structure of the underlying function space and being invariant under reparameterization. However, for function optimization, a fundamental theoretical issue regarding the existence of natural gradients on the function space remains underexplored. We address this issue by providing a geometric perspective and mathematical framework for studying both the natural gradient and the standard gradient that is more complete than existing studies. The key tool that unifies the natural gradient and the standard gradient is a generalized form of the Neural Tangent Kernel (NTK), which we name the Generalized Tangent Kernel (GTK). Using a novel orthonormality property of the GTK, we show that for a fixed parameterization, the GTK determines a Riemannian metric on the entire function space which makes the standard gradient as "natural" as the natural gradient in capturing the intrinsic structure of the parameterized function space. Many aspects of this approach relate to RKHS theory. On the practical side of this theory paper, we showcase how our framework motivates new solutions to the non-immersion/degenerate case of the natural gradient and leads to new families of natural/standard gradient descent methods.</p>
</div>
</div>
</div>
<div class="displaycards touchup-date" id="event-ZRXwHRXm8i">
<div style="width:80%;margin:auto;">
<a class="small-title" href="paper_pages/ZRXwHRXm8i.html">CREW: Facilitating Human-AI Teaming Research</a>
</div>
<div class="type_display_name_minus_type"></div>
<div class="author-str">Lingyu Zhang · Zhengran Ji · Boyuan Chen</div>
<div class="author-str higher"></div>
<div class="text-muted touchup-date-div" id="touchup-date-event-ZRXwHRXm8i"></div>
<a href="paper_pages/ZRXwHRXm8i.html">
<img src="http://img.youtube.com/vi/RINSo3uI0dI/0.jpg" class="social-img-thumb rounded" alt="thumbnail"/>
</a>
<div class="abstract-section">
<div>
<a id="abstract-link-ZRXwHRXm8i" class="abstract-link" data-toggle="collapse"
href="#collapse-event-abstract-ZRXwHRXm8i" role="button"
aria-expanded="false" aria-controls="collapse-event-abstract-ZRXwHRXm8i">
Abstract <i id="caret-ZRXwHRXm8i" class="fas fa-caret-right"></i>
</a>
</div>
</div>
<div class="collapse" id="collapse-event-abstract-ZRXwHRXm8i">
<div class="abstract-display">
<p>With the increasing deployment of artificial intelligence (AI) technologies, the potential of humans working with AI agents has been growing at a great speed. Human-AI teaming is an important paradigm for studying various aspects when humans and AI agents work together. The unique aspect of Human-AI teaming research is the need to jointly study humans and AI agents, demanding multidisciplinary research efforts from machine learning to human-computer interaction, robotics, cognitive science, neuroscience, psychology, social science, and complex systems. However, existing platforms for Human-AI teaming research are limited, often supporting oversimplified scenarios and a single task, or specifically focusing on either human-teaming research or multi-agent AI algorithms. We introduce <strong>CREW</strong>, a platform to facilitate Human-AI teaming research in real-time decision-making scenarios and engage collaborations from multiple scientific disciplines, with a strong emphasis on human involvement. It includes pre-built tasks for cognitive studies and Human-AI teaming with expandable potentials from our modular design. Following conventional cognitive neuroscience research, CREW also supports multimodal human physiological signal recording for behavior analysis. Moreover, CREW benchmarks real-time human-guided reinforcement learning agents using state-of-the-art algorithms and well-tuned baselines. With CREW, we were able to conduct 50 human subject studies within a week to verify the effectiveness of our benchmark.</p>
</div>
</div>
</div>
<div class="displaycards touchup-date" id="event-Utjw2z1ale">
<div style="width:80%;margin:auto;">
<a class="small-title" href="paper_pages/Utjw2z1ale.html">Identifying Spurious Correlations using Counterfactual Alignment</a>
</div>
<div class="type_display_name_minus_type"></div>
<div class="author-str">Joseph Paul Cohen · Louis Blankemeier · Akshay S Chaudhari</div>
<div class="author-str higher"></div>
<div class="text-muted touchup-date-div" id="touchup-date-event-Utjw2z1ale"></div>
<a href="paper_pages/Utjw2z1ale.html">
<img src="http://img.youtube.com/vi/Qdz3woTLCF4/0.jpg" class="social-img-thumb rounded" alt="thumbnail"/>
</a>
<div class="abstract-section">
<div>
<a id="abstract-link-Utjw2z1ale" class="abstract-link" data-toggle="collapse"
href="#collapse-event-abstract-Utjw2z1ale" role="button"
aria-expanded="false" aria-controls="collapse-event-abstract-Utjw2z1ale">
Abstract <i id="caret-Utjw2z1ale" class="fas fa-caret-right"></i>
</a>
</div>
</div>
<div class="collapse" id="collapse-event-abstract-Utjw2z1ale">
<div class="abstract-display">
<p>Models driven by spurious correlations often yield poor generalization performance. We propose the counterfactual (CF) alignment method to detect and quantify spurious correlations of black-box classifiers. Our methodology is based on inputting counterfactual images, generated with respect to one classifier, into other classifiers to see whether they also induce changes in those classifiers' outputs. The relationship between these responses can be quantified and used to identify specific instances where a spurious correlation exists. This is validated by observing intuitive trends in face-attribute and waterbird classifiers, as well as by fabricating spurious correlations and detecting their presence, both visually and quantitatively. Furthermore, utilizing the CF alignment method, we demonstrate that we can evaluate robust optimization methods (GroupDRO, JTT, and FLAC) by detecting a reduction in spurious correlations.</p>
</div>
</div>
</div>
<div class="displaycards touchup-date" id="event-QezxDgd5hf">
<div style="width:80%;margin:auto;">
<a class="small-title" href="paper_pages/QezxDgd5hf.html">Mind the truncation gap: challenges of learning on dynamic graphs with recurrent architectures</a>
</div>
<div class="type_display_name_minus_type"></div>
<div class="author-str">João Bravo · Jacopo Bono · Hugo Ferreira · Pedro Saleiro · Pedro Bizarro</div>
<div class="author-str higher"></div>
<div class="text-muted touchup-date-div" id="touchup-date-event-QezxDgd5hf"></div>
<a href="paper_pages/QezxDgd5hf.html">
<img src="http://img.youtube.com/vi/IdteTB8IzP8/0.jpg" class="social-img-thumb rounded" alt="thumbnail"/>
</a>
<div class="abstract-section">
<div>
<a id="abstract-link-QezxDgd5hf" class="abstract-link" data-toggle="collapse"
href="#collapse-event-abstract-QezxDgd5hf" role="button"
aria-expanded="false" aria-controls="collapse-event-abstract-QezxDgd5hf">
Abstract <i id="caret-QezxDgd5hf" class="fas fa-caret-right"></i>
</a>
</div>
</div>
<div class="collapse" id="collapse-event-abstract-QezxDgd5hf">
<div class="abstract-display">
<p>Systems characterized by evolving interactions, prevalent in social, financial, and biological domains, are effectively modeled as continuous-time dynamic graphs (CTDGs). To manage the scale and complexity of these graph datasets, machine learning (ML) approaches have become essential. However, CTDGs pose challenges for ML because traditional static graph methods fail to account for event timings naturally. Newer approaches, such as graph recurrent neural networks (GRNNs), are inherently time-aware and offer advantages over static methods for CTDGs. Yet, GRNNs face another issue: the short truncation of backpropagation-through-time (BPTT), whose impact has never been properly examined until now. In this work, we demonstrate that this truncation can limit the learning of dependencies more than a hop away, resulting in reduced performance. Through experiments on a novel synthetic task as well as real-world datasets, we reveal that there exists a performance gap between full backpropagation-through-time (F-BPTT) and the truncated backpropagation-through-time (T-BPTT) commonly used to train GRNN models. We term this gap the "truncation gap" and argue that understanding and addressing it is essential as the importance of CTDGs grows, and we discuss potential future directions of research for this type of model.</p>
</div>
</div>
</div>
<div class="displaycards touchup-date" id="event-7CUluLpLxV">
<div style="width:80%;margin:auto;">
<a class="small-title" href="paper_pages/7CUluLpLxV.html">Explaining Explainability: Recommendations for Effective Use of Concept Activation Vectors</a>
</div>
<div class="type_display_name_minus_type"></div>
<div class="author-str">Angus Nicolson · Lisa Schut · Alison Noble · Yarin Gal</div>
<div class="author-str higher"></div>
<div class="text-muted touchup-date-div" id="touchup-date-event-7CUluLpLxV"></div>
<a href="paper_pages/7CUluLpLxV.html">
<img src="http://img.youtube.com/vi/7LYfz5pKZdU/0.jpg" class="social-img-thumb rounded" alt="thumbnail"/>
</a>
<div class="abstract-section">
<div>
<a id="abstract-link-7CUluLpLxV" class="abstract-link" data-toggle="collapse"
href="#collapse-event-abstract-7CUluLpLxV" role="button"
aria-expanded="false" aria-controls="collapse-event-abstract-7CUluLpLxV">
Abstract <i id="caret-7CUluLpLxV" class="fas fa-caret-right"></i>
</a>
</div>
</div>
<div class="collapse" id="collapse-event-abstract-7CUluLpLxV">
<div class="abstract-display">
<p>Concept-based explanations translate the internal representations of deep learning models into a language that humans are familiar with: concepts. One popular method for finding concepts is Concept Activation Vectors (CAVs), which are learnt using a probe dataset of concept exemplars. In this work, we investigate three properties of CAVs: (1) inconsistency across layers, (2) entanglement with other concepts, and (3) spatial dependency. Each property provides both challenges and opportunities in interpreting models. We introduce tools designed to detect the presence of these properties, provide insight into how each property can lead to misleading explanations, and provide recommendations to mitigate their impact. To demonstrate practical applications, we apply our recommendations to a melanoma classification task, showing how entanglement can lead to uninterpretable results and that the choice of negative probe set can have a substantial impact on the meaning of a CAV. Further, we show that understanding these properties can be used to our advantage. For example, we introduce spatially dependent CAVs to test if a model is translation invariant with respect to a specific concept and class. Our experiments are performed on natural images (ImageNet), skin lesions (ISIC 2019), and a new synthetic dataset, Elements. Elements is designed to capture a known ground truth relationship between concepts and classes. We release this dataset to facilitate further research in understanding and evaluating interpretability methods.</p>
</div>
</div>
</div>
<div class="displaycards touchup-date" id="event-BsMMc4MEGS">
<div style="width:80%;margin:auto;">
<a class="small-title" href="paper_pages/BsMMc4MEGS.html">CORE-Bench: Fostering the Credibility of Published Research Through a Computational Reproducibility Agent Benchmark</a>
</div>
<div class="type_display_name_minus_type"></div>
<div class="author-str">Zachary S Siegel · Sayash Kapoor · Nitya Nadgir · Benedikt Stroebl · Arvind Narayanan</div>
<div class="author-str higher"></div>
<div class="text-muted touchup-date-div" id="touchup-date-event-BsMMc4MEGS"></div>
<a href="paper_pages/BsMMc4MEGS.html">
<img src="http://img.youtube.com/vi/Nrml8ta3PFc/0.jpg" class="social-img-thumb rounded" alt="thumbnail"/>
</a>
<div class="abstract-section">
<div>
<a id="abstract-link-BsMMc4MEGS" class="abstract-link" data-toggle="collapse"
href="#collapse-event-abstract-BsMMc4MEGS" role="button"
aria-expanded="false" aria-controls="collapse-event-abstract-BsMMc4MEGS">
Abstract <i id="caret-BsMMc4MEGS" class="fas fa-caret-right"></i>
</a>
</div>
</div>
<div class="collapse" id="collapse-event-abstract-BsMMc4MEGS">
<div class="abstract-display">
<p>AI agents have the potential to aid users on a variety of consequential tasks, including conducting scientific research. To spur the development of useful agents, we need benchmarks that are challenging, but more crucially, directly correspond to real-world tasks of interest. This paper introduces such a benchmark, designed to measure the accuracy of AI agents in tackling a crucial yet surprisingly challenging aspect of scientific research: computational reproducibility. This task, fundamental to the scientific process, involves reproducing the results of a study using the provided code and data. We introduce CORE-Bench (Computational Reproducibility Agent Benchmark), a benchmark consisting of 270 tasks based on 90 scientific papers across three disciplines (computer science, social science, and medicine). Tasks in CORE-Bench consist of three difficulty levels and include both language-only and vision-language tasks. We provide an evaluation system to measure the accuracy of agents in a fast and parallelizable way, saving days of evaluation time for each run compared to a sequential implementation. We evaluated two baseline agents: the general-purpose AutoGPT and a task-specific agent called CORE-Agent. We tested both variants using two underlying language models: GPT-4o and GPT-4o-mini. The best agent achieved an accuracy of 19% on the hardest level of tasks, showing the vast scope for improvement in automating routine scientific tasks. Having agents that can reproduce existing work is a necessary step toward building agents that can conduct novel research and could verify and improve the performance of other research agents. We hope that CORE-Bench can improve the state of reproducibility and spur the development of future research agents.</p>
</div>
</div>
</div>
<div class="displaycards touchup-date" id="event-aWRMvXTvPf">
<div style="width:80%;margin:auto;">
<a class="small-title" href="paper_pages/aWRMvXTvPf.html">Shapley Values of Structured Additive Regression Models and Application to RKHS Weightings of Functions</a>
</div>
<div class="type_display_name_minus_type"></div>
<div class="author-str">Gabriel Dubé · Mario Marchand</div>
<div class="author-str higher"></div>
<div class="text-muted touchup-date-div" id="touchup-date-event-aWRMvXTvPf"></div>
<a href="paper_pages/aWRMvXTvPf.html">
<img src="http://img.youtube.com/vi/hQM7fJ_aBrQ/0.jpg" class="social-img-thumb rounded" alt="thumbnail"/>
</a>
<div class="abstract-section">
<div>
<a id="abstract-link-aWRMvXTvPf" class="abstract-link" data-toggle="collapse"
href="#collapse-event-abstract-aWRMvXTvPf" role="button"
aria-expanded="false" aria-controls="collapse-event-abstract-aWRMvXTvPf">
Abstract <i id="caret-aWRMvXTvPf" class="fas fa-caret-right"></i>
</a>
</div>
</div>
<div class="collapse" id="collapse-event-abstract-aWRMvXTvPf">
<div class="abstract-display">
<p>Shapley values are widely used in machine learning to interpret model predictions. However, they have an important drawback in their computational time, which is exponential in the number of variables in the data. Recent work has yielded algorithms that can efficiently and exactly calculate the Shapley values of specific model families, such as Decision Trees and Generalized Additive Models (GAMs). Unfortunately, these model families are fairly restricted.
Consequently, we present STAR-SHAP, an algorithm for efficiently calculating the Shapley values of Structured Additive Regression (STAR) models, a generalization of GAMs which allow any number of variable interactions. While the computational cost of STAR-SHAP scales exponentially in the size of these interactions, it is independent of the total number of variables. This allows the interpretation of more complex and flexible models. As long as the variable interactions are moderately-sized, the computation of the Shapley values will be fast, even on high-dimensional datasets.
Since STAR models with more than pairwise interactions (e.g. GA2Ms) are seldom used in practice, we also present a new class of STAR models built on the RKHS Weightings of Functions paradigm. More precisely, we introduce a new RKHS Weighting instantiation, and show how to transform it and other RKHS Weightings into STAR models. We therefore introduce a new family of STAR models, as well as the means to interpret their outputs in a timely manner.</p>
</div>
</div>
</div>
<div class="displaycards touchup-date" id="event-MHJlFCqXdA">
<div style="width:80%;margin:auto;">
<a class="small-title" href="paper_pages/MHJlFCqXdA.html">Is Value Functions Estimation with Classification Plug-and- play for Offline Reinforcement Learning?</a>
</div>
<div class="type_display_name_minus_type"></div>
<div class="author-str">Denis Tarasov · Kirill Brilliantov · Dmitrii Kharlapenko</div>
<div class="author-str higher"></div>
<div class="text-muted touchup-date-div" id="touchup-date-event-MHJlFCqXdA"></div>
<a href="paper_pages/MHJlFCqXdA.html">
<img src="http://img.youtube.com/vi/xwfQ2Oa6ycs/0.jpg" class="social-img-thumb rounded" alt="thumbnail"/>
</a>
<div class="abstract-section">
<div>
<a id="abstract-link-MHJlFCqXdA" class="abstract-link" data-toggle="collapse"
href="#collapse-event-abstract-MHJlFCqXdA" role="button"
aria-expanded="false" aria-controls="collapse-event-abstract-MHJlFCqXdA">
Abstract <i id="caret-MHJlFCqXdA" class="fas fa-caret-right"></i>
</a>
</div>
</div>
<div class="collapse" id="collapse-event-abstract-MHJlFCqXdA">
<div class="abstract-display">
<p>In deep Reinforcement Learning (RL), value functions are typically approximated using deep neural networks and trained via mean squared error regression objectives to fit the true value functions. Recent research has proposed an alternative approach, utilizing the cross-entropy classification objective, which has demonstrated improved performance and scalability of RL algorithms. However, existing studies have not extensively benchmarked the effects of this replacement across various domains, as the primary objective was to demonstrate the efficacy of the concept across a broad spectrum of tasks, without delving into in-depth analysis. Our work seeks to empirically investigate the impact of such a replacement in an offline RL setup and to analyze how different aspects affect performance. Through large-scale experiments conducted across a diverse range of tasks using different algorithms, we aim to gain deeper insights into the implications of this approach. Our results reveal that incorporating this change can lead to superior performance over state-of-the-art solutions for some algorithms in certain tasks, while maintaining comparable performance levels in other tasks; for other algorithms, however, this modification can lead to a dramatic performance drop. These findings are crucial for the further application of the classification approach in research and practical tasks.</p>
</div>
</div>
</div>
<div class="displaycards touchup-date" id="event-iVV7IzI55V">
<div style="width:80%;margin:auto;">
<a class="small-title" href="paper_pages/iVV7IzI55V.html">On Inherent Adversarial Robustness of Active Vision Systems</a>
</div>
<div class="type_display_name_minus_type"></div>
<div class="author-str">Amitangshu Mukherjee · Timur Ibrayev · Kaushik Roy</div>
<div class="author-str higher"></div>
<div class="text-muted touchup-date-div" id="touchup-date-event-iVV7IzI55V"></div>
<a href="paper_pages/iVV7IzI55V.html">
<img src="http://img.youtube.com/vi/_o7cw6MI5o0/0.jpg" class="social-img-thumb rounded" alt="thumbnail"/>
</a>
<div class="abstract-section">
<div>
<a id="abstract-link-iVV7IzI55V" class="abstract-link" data-toggle="collapse"
href="#collapse-event-abstract-iVV7IzI55V" role="button"
aria-expanded="false" aria-controls="collapse-event-abstract-iVV7IzI55V">
Abstract <i id="caret-iVV7IzI55V" class="fas fa-caret-right"></i>
</a>
</div>
</div>
<div class="collapse" id="collapse-event-abstract-iVV7IzI55V">
<div class="abstract-display">
<p>Deep Neural Networks (DNNs) are susceptible to adversarial inputs, such as imperceptible noise and naturally occurring challenging samples. This vulnerability likely arises from their passive, one-shot processing approach. In contrast, neuroscience suggests that human vision robustly identifies salient object features by actively switching between multiple fixation points (saccades) and processing surroundings with non-uniform resolution (foveation). This information is processed via two pathways: the dorsal (where) and ventral (what) streams, which identify relevant input portions and discard irrelevant details. Building on this perspective, we outline a deep learning-based active dorsal-ventral vision system and adapt two prior methods, FALcon and GFNet, within this framework to evaluate their robustness. We conduct a comprehensive robustness analysis across three categories: adversarially crafted inputs evaluated under transfer attack scenarios, natural adversarial images, and foreground-distorted images. By learning from focused, downsampled glimpses at multiple distinct fixation points, these active methods significantly enhance the robustness of passive networks, achieving a 2-21% increase in accuracy. This improvement is demonstrated against state-of-the-art transferable black-box attacks. On ImageNet-A, a benchmark for naturally occurring hard samples, we show how distinct predictions from multiple fixation points yield performance gains of 1.5-2 times for both CNN- and Transformer-based networks. Lastly, we qualitatively demonstrate how an active vision system aligns more closely with human perception for structurally distorted images. This alignment leads to more stable and resilient predictions, with fewer catastrophic mispredictions. In contrast, passive methods, which rely on single-shot learning and inference, often lack the necessary structural understanding.</p>
</div>
</div>
</div>
<div class="displaycards touchup-date" id="event-QlTLkH6xRC">
<div style="width:80%;margin:auto;">
<a class="small-title" href="paper_pages/QlTLkH6xRC.html">TOTEM: TOkenized Time Series EMbeddings for General Time Series Analysis</a>
</div>
<div class="type_display_name_minus_type"></div>
<div class="author-str">Sabera J Talukder · Yisong Yue · Georgia Gkioxari</div>
<div class="author-str higher"></div>
<div class="text-muted touchup-date-div" id="touchup-date-event-QlTLkH6xRC"></div>
<a href="paper_pages/QlTLkH6xRC.html">
<img src="http://img.youtube.com/vi/OqrCpdb6MJk/0.jpg" class="social-img-thumb rounded" alt="thumbnail"/>
</a>
<div class="abstract-section">
<div>
<a id="abstract-link-QlTLkH6xRC" class="abstract-link" data-toggle="collapse"
href="#collapse-event-abstract-QlTLkH6xRC" role="button"
aria-expanded="false" aria-controls="collapse-event-abstract-QlTLkH6xRC">
Abstract <i id="caret-QlTLkH6xRC" class="fas fa-caret-right"></i>
</a>
</div>
</div>
<div class="collapse" id="collapse-event-abstract-QlTLkH6xRC">
<div class="abstract-display">
<p>This work studies the problem of time series analysis with generalist (or foundation) models, which are models trained across many data domains. Drawing inspiration from the widespread success of large language models, we consider the simple strategy of discretely tokenizing time series data drawn from a myriad of datasets via self-supervision, then using the fixed tokenization to solve a variety of tasks across many data domains. Canonically, time series models are either trained on a single dataset or built in a task-specific manner (e.g., a forecasting-only model), where many use patches of time as inputs to the model. As such, performant generalist, discrete representation time series models explored across many tasks are of value. Our method, TOkenized Time Series EMbeddings (TOTEM), produces such generalist time series models with minimal or no fine-tuning while exhibiting strong zero-shot performance. We evaluate TOTEM extensively over nearly 500 experiments on three commonly-studied time series tasks with real-world data: imputation (17 baselines, 12 datasets), anomaly detection (19 baselines, 25 datasets), and forecasting (14 baselines, 12 datasets). We conclude that TOTEM matches or outperforms existing state-of-the-art models in both the canonical specialist setting (i.e., training one model on one domain) as well as the generalist setting (i.e., training a single model on many domains), which demonstrates the efficacy of tokenization for general time series analysis. The open-source implementation is available here: https://github.com/SaberaTalukder/TOTEM; a video summary is available here: https://www.youtube.com/watch?v=OqrCpdb6MJk.</p>
</div>
</div>
</div>
<div class="displaycards touchup-date" id="event-lIy0TEUou7">
<div style="width:80%;margin:auto;">
<a class="small-title" href="paper_pages/lIy0TEUou7.html">Modular Quantization-Aware Training for 6D Object Pose Estimation</a>
</div>
<div class="type_display_name_minus_type"></div>
<div class="author-str">Saqib Javed · Chengkun Li · Andrew Lawrence Price · Yinlin Hu · Mathieu Salzmann</div>
<div class="author-str higher"></div>
<div class="text-muted touchup-date-div" id="touchup-date-event-lIy0TEUou7"></div>
<a href="paper_pages/lIy0TEUou7.html">
<img src="http://img.youtube.com/vi/EBNr0qNem8U/0.jpg" class="social-img-thumb rounded" alt="thumbnail"/>
</a>
<div class="abstract-section">
<div>
<a id="abstract-link-lIy0TEUou7" class="abstract-link" data-toggle="collapse"
href="#collapse-event-abstract-lIy0TEUou7" role="button"
aria-expanded="false" aria-controls="collapse-event-abstract-lIy0TEUou7">
Abstract <i id="caret-lIy0TEUou7" class="fas fa-caret-right"></i>
</a>
</div>
</div>
<div class="collapse" id="collapse-event-abstract-lIy0TEUou7">
<div class="abstract-display">
<p>Edge applications, such as collaborative robotics and spacecraft rendezvous, demand efficient 6D object pose estimation on resource-constrained embedded platforms. Existing 6D object pose estimation networks are often too large for such deployments, necessitating compression while maintaining reliable performance. To address this challenge, we introduce Modular Quantization-Aware Training (MQAT), an adaptive and mixed-precision quantization-aware training strategy that exploits the modular structure of modern 6D object pose estimation architectures. MQAT guides a systematic gradated modular quantization sequence and determines module-specific bit precisions, leading to quantized models that outperform those produced by state-of-the-art uniform and mixed-precision quantization techniques. Our experiments showcase the generality of MQAT across datasets, architectures, and quantization algorithms. Additionally, we observe that MQAT quantized models can achieve an accuracy boost (>7% ADI-0.1d) over the baseline full-precision network while reducing model size by a factor of 4x or more.
Project Page: https://saqibjaved1.github.io/MQAT_</p>
</div>
</div>
</div>
<div class="displaycards touchup-date" id="event-q7YXEbFOAt">
<div style="width:80%;margin:auto;">
<a class="small-title" href="paper_pages/q7YXEbFOAt.html">$\clubsuit$ CLOVER $\clubsuit$: Probabilistic Forecasting with Coherent Learning Objective Reparameterization</a>
</div>
<div class="type_display_name_minus_type"></div>
<div class="author-str">Kin G. Olivares · Geoffrey Négiar · Ruijun Ma · Oinam Nganba Meetei · Mengfei Cao · Michael W. Mahoney</div>
<div class="author-str higher"></div>
<div class="text-muted touchup-date-div" id="touchup-date-event-q7YXEbFOAt"></div>
<a href="paper_pages/q7YXEbFOAt.html">
<img src="https://drive.google.com/thumbnail?id=1-xkmuYSB7YQDXOEaBKeFa-pAa1Gq1-i8" class="social-img-thumb rounded" alt="thumbnail"/>
</a>
<div class="abstract-section">
<div>
<a id="abstract-link-q7YXEbFOAt" class="abstract-link" data-toggle="collapse"
href="#collapse-event-abstract-q7YXEbFOAt" role="button"
aria-expanded="false" aria-controls="collapse-event-abstract-q7YXEbFOAt">
Abstract <i id="caret-q7YXEbFOAt" class="fas fa-caret-right"></i>
</a>
</div>
</div>
<div class="collapse" id="collapse-event-abstract-q7YXEbFOAt">
<div class="abstract-display">
<p>Obtaining accurate probabilistic forecasts is an operational challenge in many applications, such as energy management, climate forecasting, supply chain planning, and resource allocation. Many of these applications present a natural hierarchical structure over the forecasted quantities, and forecasting systems that adhere to this hierarchical structure are said to be coherent. Furthermore, operational planning benefits from accuracy at all levels of the aggregation hierarchy. However, building accurate and coherent forecasting systems is challenging: classic multivariate time series tools and neural network methods are still being adapted for this purpose. In this paper, we augment an MQForecaster neural network architecture with a modified multivariate Gaussian factor model that achieves coherence by construction. The factor model samples can be differentiated with respect to the model parameters, allowing optimization on arbitrary differentiable learning objectives that align with the forecasting system's goals, including quantile loss and the scaled Continuous Ranked Probability Score (CRPS). We call our method the Coherent Learning Objective Reparameterization Neural Network (CLOVER). In comparison to state-of-the-art coherent forecasting methods, CLOVER achieves significant improvements in scaled CRPS forecast accuracy, with average gains of 15%, as measured on six publicly-available datasets.</p>
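<p>The coherence-by-construction idea can be sketched as follows: sample only the bottom-level series from a reparameterized Gaussian factor model, then aggregate with a fixed summation matrix so every level adds up exactly. The hierarchy and loss below are toy stand-ins, not the paper's code:</p>
<pre><code># Sketch: coherent, differentiable hierarchical samples (illustrative).
import torch

S = torch.tensor([[1., 1., 1., 1.],   # total = sum of all bottom series
                  [1., 1., 0., 0.],   # aggregate of series 1-2
                  [0., 0., 1., 1.]])  # aggregate of series 3-4

mu = torch.zeros(4, requires_grad=True)         # bottom-level means
F = (0.1 * torch.randn(4, 2)).requires_grad_()  # factor loadings

eps = torch.randn(2)                  # standard Gaussian noise
bottom = mu + F @ eps                 # reparameterized factor-model sample
all_levels = torch.cat([S @ bottom, bottom])  # coherent by construction
loss = all_levels.pow(2).mean()       # stand-in for CRPS / quantile loss
loss.backward()                       # samples differentiate w.r.t. mu and F
</code></pre>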
</div>
</div>
</div>
<div class="displaycards touchup-date" id="event-tYxRyNT0TC">
<div style="width:80%;margin:auto;">
<a class="small-title" href="paper_pages/tYxRyNT0TC.html">Perception Stitching: Zero-Shot Perception Encoder Transfer for Visuomotor Robot Policies</a>
</div>
<div class="type_display_name_minus_type"></div>
<div class="author-str">Pingcheng Jian · Easop Lee · Zachary I. Bell · Michael M. Zavlanos · Boyuan Chen</div>
<div class="author-str higher"></div>
<div class="text-muted touchup-date-div" id="touchup-date-event-tYxRyNT0TC"></div>
<a href="paper_pages/tYxRyNT0TC.html">
<img src="http://img.youtube.com/vi/H6SD9Tcvhrg/0.jpg" class="social-img-thumb rounded" alt="thumbnail"/>
</a>
<div class="abstract-section">
<div>
<a id="abstract-link-tYxRyNT0TC" class="abstract-link" data-toggle="collapse"
href="#collapse-event-abstract-tYxRyNT0TC" role="button"
aria-expanded="false" aria-controls="collapse-event-abstract-tYxRyNT0TC">
Abstract <i id="caret-tYxRyNT0TC" class="fas fa-caret-right"></i>
</a>
</div>
</div>
<div class="collapse" id="collapse-event-abstract-tYxRyNT0TC">
<div class="abstract-display">
<p>Vision-based imitation learning has shown promising capabilities of endowing robots with various motion skills given visual observations. However, current visuomotor policies fail to adapt to drastic changes in their visual observations. We present Perception Stitching, which enables strong zero-shot adaptation to large visual changes by directly stitching novel combinations of visual encoders. Our key idea is to enforce modularity of visual encoders by aligning the latent visual features among different visuomotor policies. Our method disentangles perceptual knowledge from the downstream motion skills and allows the reuse of the visual encoders by directly stitching them to a policy network trained with partially different visual conditions. We evaluate our method in various simulated and real-world manipulation tasks. While baseline methods failed in all attempts, our method achieved zero-shot success in real-world visuomotor tasks. Our quantitative and qualitative analysis of the learned features of the policy network provides more insights into the high performance of our proposed method.</p>
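<p>Schematically, stitching amounts to recombining independently trained modules; the alignment of latent visual features that makes the swap work zero-shot is the paper's contribution and is not shown here (all names below are hypothetical):</p>
<pre><code># Sketch of encoder/policy stitching (module names are illustrative).
import torch.nn as nn

class VisuomotorPolicy(nn.Module):
    def __init__(self, encoder, policy_head):
        super().__init__()
        self.encoder, self.policy_head = encoder, policy_head

    def forward(self, obs):
        return self.policy_head(self.encoder(obs))

# Given policy_a trained under condition A and policy_b under condition B,
# a stitched policy reuses B's (aligned) encoder with A's motor head:
#   stitched = VisuomotorPolicy(policy_b.encoder, policy_a.policy_head)
</code></pre>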
</div>
</div>
</div>
<div class="displaycards touchup-date" id="event-V2SD2uVKEE">
<div style="width:80%;margin:auto;">
<a class="small-title" href="paper_pages/V2SD2uVKEE.html">Zero-shot CLIP Class Forgetting via Text-image Space Adaptation</a>
</div>
<div class="type_display_name_minus_type"></div>
<div class="author-str">Alexey Kravets · Vinay P. Namboodiri</div>
<div class="author-str higher"></div>
<div class="text-muted touchup-date-div" id="touchup-date-event-V2SD2uVKEE"></div>
<a href="paper_pages/V2SD2uVKEE.html">
<img src="tmlr_logo.jpeg" class="social-img-thumb rounded" alt="thumbnail"/>
</a>
<div class="abstract-section">
<div>
<a id="abstract-link-V2SD2uVKEE" class="abstract-link" data-toggle="collapse"
href="#collapse-event-abstract-V2SD2uVKEE" role="button"
aria-expanded="false" aria-controls="collapse-event-abstract-V2SD2uVKEE">
Abstract <i id="caret-V2SD2uVKEE" class="fas fa-caret-right"></i>
</a>
</div>
</div>
<div class="collapse" id="collapse-event-abstract-V2SD2uVKEE">
<div class="abstract-display">
<p>Efficient class forgetting has attracted significant interest due to the high computational cost of retraining models from scratch whenever classes need to be forgotten. This need arises from data privacy regulations, the necessity to remove outdated information, and the potential to enhance model robustness and security.
In this paper, we address class forgetting in the vision-language CLIP model. Modern class forgetting methods for CLIP have demonstrated that zero-shot forgetting is achievable by generating synthetic data and fine-tuning both visual and textual encoders with a regularization loss. Our approach shows that class forgetting in CLIP can be accomplished in a zero-shot manner without any visual data by adapting the shared vision-text space of CLIP, thereby making the class forgetting process more efficient. Our method delivers superior results, demonstrating strong performance and complete class removal, regardless of the visual encoder used in CLIP. Furthermore, we explore what exactly is being targeted by the class forgetting algorithm, discovering some interesting properties of CLIP features.</p>
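<p>For context, zero-shot CLIP classification compares a normalized image embedding against normalized text embeddings in the shared space; class forgetting of the kind described above amounts to adapting that space so one class can no longer be matched. The snippet below illustrates only the shared-space matching, not the forgetting method itself:</p>
<pre><code># Zero-shot matching in CLIP's shared text-image space (illustrative).
import torch
import torch.nn.functional as F

image_feat = F.normalize(torch.randn(512), dim=-1)      # stand-in image embedding
text_feats = F.normalize(torch.randn(10, 512), dim=-1)  # 10 class-name embeddings
logits = text_feats @ image_feat                        # cosine similarities
pred = logits.argmax().item()                           # predicted class index
</code></pre>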
</div>
</div>
</div>
<div class="displaycards touchup-date" id="event-4c9UzDhg49">
<div style="width:80%;margin:auto;">
<a class="small-title" href="paper_pages/4c9UzDhg49.html">On the theoretical limit of gradient descent for Simple Recurrent Neural Networks with finite precision</a>
</div>
<div class="type_display_name_minus_type"></div>
<div class="author-str">Volodimir Mitarchuk · Rémi Emonet · Remi Eyraud · Amaury Habrard</div>
<div class="author-str higher"></div>
<div class="text-muted touchup-date-div" id="touchup-date-event-4c9UzDhg49"></div>
<a href="paper_pages/4c9UzDhg49.html">
<img src="http://img.youtube.com/vi/ap6LOok_Vtk/0.jpg" class="social-img-thumb rounded" alt="thumbnail"/>
</a>
<div class="abstract-section">
<div>
<a id="abstract-link-4c9UzDhg49" class="abstract-link" data-toggle="collapse"
href="#collapse-event-abstract-4c9UzDhg49" role="button"
aria-expanded="false" aria-controls="collapse-event-abstract-4c9UzDhg49">
Abstract <i id="caret-4c9UzDhg49" class="fas fa-caret-right"></i>
</a>
</div>
</div>
<div class="collapse" id="collapse-event-abstract-4c9UzDhg49">
<div class="abstract-display">
<p>Despite their great practical successes, the understanding of neural network behavior is still a topical research issue. In particular, the class of functions learnable under a finite-precision configuration is an open question. In this paper, we propose to study the limits of gradient descent when such a configuration is set for the class of Simple Recurrent Networks (SRNs). We exhibit conditions under which gradient descent will provably fail. We also design a class of SRNs based on Deterministic Finite State Automata (DFA) that fulfills the failure requirements. The definition of this class is constructive: we propose an algorithm that, from any DFA, constructs an SRN that computes exactly the same function, a result of interest in its own right.</p>
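<p>The flavor of a DFA-to-recurrent-network construction can be conveyed with one-hot states and a hard (saturated) activation; the sketch below simulates a two-state parity DFA this way, as an illustration of the idea rather than the paper's algorithm:</p>
<pre><code># Sketch: encoding a DFA's transition function in a recurrent update.
import numpy as np

# Toy DFA over {0, 1}: state 0 iff the input contains an even number of 1s.
delta = {(0, 0): 0, (0, 1): 1, (1, 0): 1, (1, 1): 0}
n_states = 2

W = np.zeros((n_states, n_states, 2))   # W[next_state, current_state, symbol]
for (q, a), q2 in delta.items():
    W[q2, q, a] = 1.0

def srn_step(state_onehot, symbol):
    pre = W[:, :, symbol] @ state_onehot
    return (pre >= 1.0).astype(float)    # hard threshold as saturated activation

h = np.eye(n_states)[0]                  # start in state 0
for a in [1, 0, 1, 1]:
    h = srn_step(h, a)
print("even number of 1s:", bool(h[0]))  # False: the input has three 1s
</code></pre>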
</div>
</div>
</div>
<div class="displaycards touchup-date" id="event-SP8DLl6jgb">
<div style="width:80%;margin:auto;">
<a class="small-title" href="paper_pages/SP8DLl6jgb.html">Feature Distillation Improves Zero-Shot Transfer from Synthetic Images</a>
</div>
<div class="type_display_name_minus_type"></div>
<div class="author-str">Niclas Popp · Jan Hendrik Metzen · Matthias Hein</div>
<div class="author-str higher"></div>
<div class="text-muted touchup-date-div" id="touchup-date-event-SP8DLl6jgb"></div>
<a href="paper_pages/SP8DLl6jgb.html">
<img src="http://img.youtube.com/vi/KbdacNWGiAM/0.jpg" class="social-img-thumb rounded" alt="thumbnail"/>
</a>
<div class="abstract-section">
<div>
<a id="abstract-link-SP8DLl6jgb" class="abstract-link" data-toggle="collapse"
href="#collapse-event-abstract-SP8DLl6jgb" role="button"
aria-expanded="false" aria-controls="collapse-event-abstract-SP8DLl6jgb">
Abstract <i id="caret-SP8DLl6jgb" class="fas fa-caret-right"></i>
</a>
</div>
</div>
<div class="collapse" id="collapse-event-abstract-SP8DLl6jgb">
<div class="abstract-display">
<p>Vision-language foundation models such as CLIP have showcased impressive zero-shot capabilities. However, their applicability in resource-constrained environments is limited due to their size and the resulting latency. Knowledge distillation makes it possible to mitigate these challenges by distilling small image encoders that can replace the large CLIP image encoder. In a zero-shot setting, where only the class names are known, no real domain images can be used for this process. Instead, we investigate the use of synthetic images for this purpose. Unlike existing works that focus on improving the quality of synthetic images to bridge the performance gap compared to training on natural images, we find the choice of loss to be a crucial factor. Specifically, minimizing only the distance between the student and teacher image features, without incorporating image captions in the loss function, increases the robustness to spurious features and data corruptions. As a result, this feature distillation approach greatly improves the transfer performance from synthetic to real images. Leveraging these insights, we are able to train domain-specific students that achieve zero-shot performance comparable to a ViT-B/32 teacher on six fine-grained classification datasets while using up to 92% fewer parameters.</p>
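<p>The caption-free loss this finding points to is simply a distance between student and teacher image features; in schematic form (shapes and names below are illustrative):</p>
<pre><code># Feature-only distillation loss (sketch; no caption/text term involved).
import torch
import torch.nn.functional as F

def feature_distillation_loss(student_feats, teacher_feats):
    s = F.normalize(student_feats, dim=-1)   # student image features
    t = F.normalize(teacher_feats, dim=-1)   # frozen CLIP teacher features
    return (s - t).pow(2).sum(-1).mean()

loss = feature_distillation_loss(torch.randn(8, 512), torch.randn(8, 512))
</code></pre>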
</div>
</div>
</div>
<div class="displaycards touchup-date" id="event-QdGtwjDgub">
<div style="width:80%;margin:auto;">
<a class="small-title" href="paper_pages/QdGtwjDgub.html">Contaminated Online Convex Optimization</a>
</div>
<div class="type_display_name_minus_type"></div>
<div class="author-str">Tomoya Kamijima · Shinji Ito</div>
<div class="author-str higher"></div>
<div class="text-muted touchup-date-div" id="touchup-date-event-QdGtwjDgub"></div>
<a href="paper_pages/QdGtwjDgub.html">
<img src="https://drive.google.com/thumbnail?id=1EwrCZxiGUj_iw5_i787d88x3Nqy-Kcij" class="social-img-thumb rounded" alt="thumbnail"/>
</a>
<div class="abstract-section">
<div>
<a id="abstract-link-QdGtwjDgub" class="abstract-link" data-toggle="collapse"
href="#collapse-event-abstract-QdGtwjDgub" role="button"
aria-expanded="false" aria-controls="collapse-event-abstract-QdGtwjDgub">
Abstract <i id="caret-QdGtwjDgub" class="fas fa-caret-right"></i>
</a>
</div>
</div>
<div class="collapse" id="collapse-event-abstract-QdGtwjDgub">
<div class="abstract-display">
<p>In online convex optimization, some efficient algorithms have been designed for each of the individual classes of objective functions, e.g., convex, strongly convex, and exp-concave. However, existing regret analyses, including those of universal algorithms, are limited to cases in which the objective functions in all rounds belong to the same class and cannot be applied to cases in which the property of objective functions may change in each time step. This paper introduces a novel approach to address such cases, proposing a new regime we term <em>contaminated</em> online convex optimization. For the contaminated case, we demonstrate that the regret is lower bounded by $\Omega(\log T + \sqrt{k})$. Here, $k$ signifies the level of contamination in the objective functions. We also demonstrate that the regret is bounded by $O(\log T+\sqrt{k\log T})$ when universal algorithms are used. When our proposed algorithms with additional information are employed, the regret is bounded by $O(\log T+\sqrt{k})$, which matches the lower bound. These are intermediate bounds between a convex case and a strongly convex or exp-concave case.</p>
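<p>To make the setting concrete, one can picture an online-gradient-descent loop over a loss sequence in which a small number of rounds are merely convex while the rest are strongly convex. The loop below is a generic illustration of such a mixed sequence, not the paper's algorithm or step-size schedule:</p>
<pre><code># Generic OGD over a "contaminated" loss sequence (illustrative only).
import numpy as np

rng = np.random.default_rng(1)
x, T = 0.0, 1000
for t in range(1, T + 1):
    c = rng.normal()
    if 0.01 > rng.random():                        # a rare contaminated round
        grad, eta = np.sign(x - c), 1.0 / np.sqrt(t)   # |x - c|: merely convex
    else:
        grad, eta = 2.0 * (x - c), 1.0 / t             # (x - c)**2: strongly convex
    x -= eta * grad
</code></pre>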
</div>
</div>
</div>
<div class="displaycards touchup-date" id="event-t9c3pfrR1X">
<div style="width:80%;margin:auto;">
<a class="small-title" href="paper_pages/t9c3pfrR1X.html">OmniPred: Language Models as Universal Regressors</a>
</div>
<div class="type_display_name_minus_type"></div>
<div class="author-str">Xingyou Song · Oscar Li · Chansoo Lee · Bangding Yang · Daiyi Peng · Sagi Perel · Yutian Chen</div>
<div class="author-str higher"></div>
<div class="text-muted touchup-date-div" id="touchup-date-event-t9c3pfrR1X"></div>
<a href="paper_pages/t9c3pfrR1X.html">
<img src="http://img.youtube.com/vi/fv-cK9LgQmk/0.jpg" class="social-img-thumb rounded" alt="thumbnail"/>
</a>
<div class="abstract-section">
<div>
<a id="abstract-link-t9c3pfrR1X" class="abstract-link" data-toggle="collapse"
href="#collapse-event-abstract-t9c3pfrR1X" role="button"
aria-expanded="false" aria-controls="collapse-event-abstract-t9c3pfrR1X">
Abstract <i id="caret-t9c3pfrR1X" class="fas fa-caret-right"></i>
</a>
</div>
</div>
<div class="collapse" id="collapse-event-abstract-t9c3pfrR1X">
<div class="abstract-display">
<p>Regression is a powerful tool to accurately predict the outcome metric of a system given a set of parameters, but has traditionally been restricted to methods which are only applicable to a specific task. In this paper, we propose OmniPred, a framework for training language models as universal end-to-end regressors over (x,y) data from arbitrary formats. Using data sourced from Google Vizier, one of the largest proprietary blackbox optimization databases in the world, our extensive experiments demonstrate that language models are capable of very precise numerical regression using only textual representations of mathematical parameters and values, and if given the opportunity to train at scale over multiple tasks, can significantly outperform traditional regression models.</p>
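<p>The interface is easiest to see as plain text-to-text prediction: parameters are serialized to a textual prompt and the metric is emitted as text tokens. The formatting below is a hypothetical illustration, not the exact Vizier serialization used in the paper:</p>
<pre><code># Sketch of the text-to-text regression interface (format is hypothetical).
def to_prompt(params):
    return ", ".join(f"{k}={v}" for k, v in sorted(params.items()))

prompt = to_prompt({"dropout": 0.1, "layers": 6, "learning_rate": 3e-4})
target = "0.8731"   # outcome metric rendered as plain text
# A language model is then trained to map prompt to target with the
# ordinary next-token prediction loss.
</code></pre>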
</div>
</div>
</div>
<div class="displaycards touchup-date" id="event-XxbQAsxrRC">
<div style="width:80%;margin:auto;">
<a class="small-title" href="paper_pages/XxbQAsxrRC.html">Maximally Expressive GNNs for Outerplanar Graphs</a>
</div>
<div class="type_display_name_minus_type"></div>
<div class="author-str">Franka Bause · Fabian Jogl · Patrick Indri · Tamara Drucks · David Penz · Nils Morten Kriege · Thomas Gärtner · Pascal Welke · Maximilian Thiessen</div>
<div class="author-str higher"></div>
<div class="text-muted touchup-date-div" id="touchup-date-event-XxbQAsxrRC"></div>
<a href="paper_pages/XxbQAsxrRC.html">
<img src="http://img.youtube.com/vi/AW6Cy6pcc1Y/0.jpg" class="social-img-thumb rounded" alt="thumbnail"/>
</a>
<div class="abstract-section">
<div>
<a id="abstract-link-XxbQAsxrRC" class="abstract-link" data-toggle="collapse"
href="#collapse-event-abstract-XxbQAsxrRC" role="button"
aria-expanded="false" aria-controls="collapse-event-abstract-XxbQAsxrRC">
Abstract <i id="caret-XxbQAsxrRC" class="fas fa-caret-right"></i>
</a>
</div>
</div>
<div class="collapse" id="collapse-event-abstract-XxbQAsxrRC">
<div class="abstract-display">
<p>We propose a linear time graph transformation that enables the Weisfeiler-Leman (WL) algorithm and message passing graph neural networks (MPNNs) to be maximally expressive on outerplanar graphs. Our approach is motivated by the fact that most pharmaceutical molecules correspond to outerplanar graphs. Existing research predominantly enhances the expressivity of graph neural networks without specific graph families in mind. This often leads to methods that are impractical due to their computational complexity. In contrast, the restriction to outerplanar graphs enables us to encode the Hamiltonian cycle of each biconnected component in linear time. As the main contribution of the paper we prove that our method achieves maximum expressivity on outerplanar graphs. Experiments confirm that our graph transformation improves the predictive performance of MPNNs on molecular benchmark datasets at negligible computational overhead.</p>
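<p>In an outerplanar graph, every biconnected component is a single edge or a cycle whose Hamiltonian cycle is its boundary, which is what makes the linear-time encoding possible. A quick way to inspect these components (the paper's actual edge-annotation scheme is not shown):</p>
<pre><code># Inspecting biconnected components of a small outerplanar graph (sketch).
import networkx as nx

G = nx.cycle_graph(5)      # a 5-cycle: outerplanar and biconnected
G.add_edge(0, 5)           # attach a pendant edge

for comp in nx.biconnected_components(G):
    print(sorted(comp))    # the 5-cycle and the bridge {0, 5}
</code></pre>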
</div>
</div>
</div>
<div class="displaycards touchup-date" id="event-lh6vOAHuvo">
<div style="width:80%;margin:auto;">
<a class="small-title" href="paper_pages/lh6vOAHuvo.html">AGaLiTe: Approximate Gated Linear Transformers for Online Reinforcement Learning</a>
</div>
<div class="type_display_name_minus_type"></div>
<div class="author-str">Subhojeet Pramanik · Esraa Elelimy · Marlos C. Machado · Adam White</div>
<div class="author-str higher"></div>
<div class="text-muted touchup-date-div" id="touchup-date-event-lh6vOAHuvo"></div>
<a href="paper_pages/lh6vOAHuvo.html">
<img src="http://img.youtube.com/vi/-bTe48JIUds/0.jpg" class="social-img-thumb rounded" alt="thumbnail"/>
</a>
<div class="abstract-section">
<div>
<a id="abstract-link-lh6vOAHuvo" class="abstract-link" data-toggle="collapse"
href="#collapse-event-abstract-lh6vOAHuvo" role="button"
aria-expanded="false" aria-controls="collapse-event-abstract-lh6vOAHuvo">
Abstract <i id="caret-lh6vOAHuvo" class="fas fa-caret-right"></i>
</a>
</div>
</div>
<div class="collapse" id="collapse-event-abstract-lh6vOAHuvo">
<div class="abstract-display">
<p>In this paper we investigate transformer architectures designed for partially observable online reinforcement learning. The self-attention mechanism in the transformer architecture is capable of capturing long-range dependencies and is the main reason behind its effectiveness in processing sequential data. Nevertheless, despite their success, transformers have two significant drawbacks that still limit their applicability in online reinforcement learning: (1) in order to remember all past information, the self-attention mechanism requires access to the whole history to be provided as context; (2) the inference cost of transformers is high. In this paper, we introduce recurrent alternatives to the transformer self-attention mechanism that offer context-independent inference cost, leverage long-range dependencies effectively, and perform well in online reinforcement learning tasks. We quantify the impact of the different components of our architecture in a diagnostic environment and assess performance gains in 2D and 3D pixel-based partially-observable environments (e.g., T-Maze, Mystery Path, Craftax, and Memory Maze). Compared with a state-of-the-art architecture, GTrXL, inference in our approach is at least 40% cheaper and reduces memory use by more than 50%. Our approach performs on par with or better than GTrXL, improving upon GTrXL's performance by more than 37% in harder tasks.</p>
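<p>The general shape of such recurrent alternatives is a gated linear-attention update, in which a fixed-size outer-product state stands in for the full history; the recurrence below is a generic sketch, not AGaLiTe's exact parameterization:</p>
<pre><code># Generic gated linear-attention recurrence (illustrative sketch).
import torch

d = 16
S = torch.zeros(d, d)      # fixed-size state replaces the key-value history
gamma = 0.9                # gating/decay factor (learned in practice)

def step(S, q, k, v):
    S = gamma * S + torch.outer(k, v)   # constant-time state update
    return S, S.T @ q                   # readout: v weighted by (k . q)

for _ in range(5):
    q, k, v = torch.randn(3, d).unbind(0)
    S, out = step(S, q, k, v)           # cost independent of context length
</code></pre>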
</div>
</div>
</div>
<div class="displaycards touchup-date" id="event-M3SkSMfWcP">
<div style="width:80%;margin:auto;">
<a class="small-title" href="paper_pages/M3SkSMfWcP.html">Adaptive Multi-step Refinement Network for Robust Point Cloud Registration</a>
</div>
<div class="type_display_name_minus_type"></div>
<div class="author-str">Zhi Chen · Yufan Ren · Tong Zhang · Zheng Dang · Wenbing Tao · Sabine Susstrunk · Mathieu Salzmann</div>
<div class="author-str higher"></div>
<div class="text-muted touchup-date-div" id="touchup-date-event-M3SkSMfWcP"></div>
<a href="paper_pages/M3SkSMfWcP.html">