Recommended reading:
- Collection of 2020-2021 computer vision survey papers
- Collection of outstanding computer vision teams in China and abroad

CVPR 2021 latest information and paper download thread (Papers/Codes/Project/PaperReading/Demos/live-stream talks/paper-sharing sessions, etc.)

Official website: http://cvpr2021.thecvf.com
Dates: June 19-25, 2021
Paper acceptance announced: February 28, 2021

1. CVPR 2021 accepted papers/code, organized by topic (continuously updated)

Table of contents by category:

1. Detection

Image Object Detection
Video Object Detection
3D Object Detection
Human-Object Interaction Detection (HOI Detection)
Camouflaged Object Detection
Rotation Object Detection
Saliency Object Detection
Anomaly Detection in Images
Keypoint Detection

2. Segmentation

Image Segmentation
Panoptic Segmentation
Semantic Segmentation
Instance Segmentation
Superpixel
Video Object Segmentation
Matting
Dense Prediction

3. Image Processing

Super Resolution
Image Restoration/Image Enhancement
Image Shadow Removal/Image Reflection Removal
Image Denoising/Deblurring/Deraining/Dehazing
Image Editing/Image Inpainting
Image Translation
Image Quality Assessment
Style Transfer

4. Estimation

Pose Estimation
Gesture Estimation
Optical Flow/Pose/Motion Estimation
Depth Estimation

5. Image & Video Retrieval/Video Understanding

Action/Activity Recognition/Detection/Segmentation
Person Re-Identification/Detection
Image/Video Captioning

6. Face

Facial Recognition/Detection
Face Generation/Face Synthesis/Face Reconstruction/Face Editing
Face Forgery/Face Anti-Spoofing

10. Text Detection/Recognition

11. Remote Sensing Image

12. GAN/Generative/Adversarial

13. Image Generation/Image Synthesis

View Synthesis

14. Scene Graph

Scene Graph Generation
Scene Graph Prediction
Scene Graph Understanding

16. Visual Reasoning/Visual Question Answering (VQA)

17. Image Classification

18. Neural Network Structure Design

Transformer
Graph Neural Network (GNN)
Neural Architecture Search (NAS)

19. Model Compression

Knowledge Distillation
Pruning
Quantization

20. Model Training/Generalization

Noisy Label
Long-Tailed Distribution

22. Data Processing

Data Augmentation
Representation Learning
Normalization/Regularization (Batch Normalization)
Image Clustering
Image Compression
Anomaly Detection

24. Few-shot/Zero-shot Learning

25. Continual Learning/Life-long Learning

26. Transfer Learning/Domain Adaptation

28. Contrastive Learning

29. Incremental Learning

30. Reinforcement Learning

32. Multi-Modal Learning

Audio-visual Learning

33. Vision-based Prediction

Detection

Image Object Detection

[25] Domain-Specific Suppression for Adaptive Object Detection
paper

[24] Line Segment Detection Using Transformers without Edges (topic: line segment detection)
paper

[23] IQDet: Instance-wise Quality Distribution Sampling for Object Detection
paper

[22] Adaptive Class Suppression Loss for Long-Tail Object Detection
paper | code

[21] DAP: Detection-Aware Pre-training with Weak Supervision
paper

[20] Dense Relation Distillation with Context-aware Aggregation for Few-Shot Object Detection
paper | code

[19] Scale-aware Automatic Augmentation for Object Detection
paper | code

[18] Data-Uncertainty Guided Multi-Phase Learning for Semi-Supervised Object Detection
paper

[17] OTA: Optimal Transport Assignment for Object Detection
paper | code

[16] Distilling Object Detectors via Decoupled Features
paper | code

[15] I^3Net: Implicit Instance-Invariant Network for Adapting One-Stage Object Detectors
paper

[14] Robust and Accurate Object Detection via Adversarial Learning
paper

[13] You Only Look One-level Feature
paper | code

[12] End-to-End Object Detection with Fully Convolutional Network
paper | code
Commentary: Dropping the Transformer, an FCN can also achieve end-to-end detection

[11] FSCE: Few-Shot Object Detection via Contrastive Proposal Encoding
paper

[10] Generalized Focal Loss V2: Learning Reliable Localization Quality Estimation for Dense Object Detection
paper | code
Commentary: Generalized Focal Loss V2 in plain language

[9] MeGA-CDA: Memory Guided Attention for Category-Aware Unsupervised Domain Adaptive Object Detection
paper

[8] OPANAS: One-Shot Path Aggregation Network Architecture Search for Object Detection
paper | code

[7] Semantic Relation Reasoning for Shot-Stable Few-Shot Object Detection
paper

[6] General Instance Distillation for Object Detection
paper

[5] Instance Localization for Self-supervised Detection Pretraining
paper | code

[4] Multiple Instance Active Learning for Object Detection
paper | code

[3] Towards Open World Object Detection
paper | code

[2] Positive-Unlabeled Data Purification in the Wild for Object Detection

[1] UP-DETR: Unsupervised Pre-training for Object Detection with Transformers
paper | code
Commentary: An unsupervised pre-trained detector

Video Object Detection

[4] Dogfight: Detecting Drones from Drones Videos
paper

[3] Depth from Camera Motion and Object Detection
paper

[2] There is More than Meets the Eye: Self-Supervised Multi-Object Detection and Tracking with Sound by Distilling Multimodal Knowledge
paper | video | project

[1] Dogfight: Detecting Drones from Drone Videos

3D Object Detection

[14] SE-SSD: Self-Ensembling Single-Stage Object Detector From Point Cloud
paper | code

[13] Back-tracing Representative Points for Voting-based 3D Object Detection in Point Clouds
paper | code

[12] Objects are Different: Flexible Monocular 3D Object Detection
paper | code

[11] HVPR: Hybrid Voxel-Point Representation for Single-stage 3D Object Detection
paper

[10] GrooMeD-NMS: Grouped Mathematically Differentiable NMS for Monocular 3D Object Detection
paper | code

[9] Delving into Localization Errors for Monocular 3D Object Detection
paper | code

[8] Depth-conditioned Dynamic Message Propagation for Monocular 3D Object Detection
paper | code

[7] LiDAR R-CNN: An Efficient and Universal 3D Object Detector
paper | code

[6] M3DSSD: Monocular 3D Single Stage Object Detector
paper

[5] MonoRUn: Monocular 3D Object Detection by Self-Supervised Reconstruction and Uncertainty Propagation
paper

[4] ST3D: Self-training for Unsupervised Domain Adaptation on 3D Object Detection
paper | code

[3] Center-based 3D Object Detection and Tracking
paper | code

[2] 3DIoUMatch: Leveraging IoU Prediction for Semi-Supervised 3D Object Detection
paper | code | project | video

[1] Categorical Depth Distribution Network for Monocular 3D Object Detection
paper

Human-Object Interaction Detection (HOI Detection)

[7] HOTR: End-to-End Human-Object Interaction Detection with Transformers
paper

[6] Glance and Gaze: Inferring Action-aware Points for One-Stage Human-Object Interaction Detection
paper

[5] Affordance Transfer Learning for Human-Object Interaction Detection
paper | code

[4] Detecting Human-Object Interaction via Fabricated Compositional Learning
paper | code

[3] Reformulating HOI Detection as Adaptive Set Prediction
paper | code

[2] QPIC: Query-Based Pairwise Human-Object Interaction Detection with Image-Wide Contextual Information
paper | code

[1] End-to-End Human Object Interaction Detection with HOI Transformer
paper | code

Camouflaged Object Detection

[2] Uncertainty-aware Joint Salient Object and Camouflaged Object Detection
paper

[1] Simultaneously Localize, Segment and Rank the Camouflaged Objects
paper | code

Rotation Object Detection

[2] ReDet: A Rotation-equivariant Detector for Aerial Object Detection
paper | code

[1] Dense Label Encoding for Boundary Discontinuity Free Rotation Detection
paper | code | commentary: DCL, a new method for rotated object detection

Saliency Object Detection

[3] Weakly Supervised Video Salient Object Detection
paper

[2] Group Collaborative Learning for Co-Salient Object Detection
paper | project

[1] Deep RGB-D Saliency Detection with Depth-Sensitive Attention and Automatic Multi-Modal Fusion
paper

Anomaly Detection in Images

[1] Multiresolution Knowledge Distillation for Anomaly Detection
paper

Keypoint Detection

[1] Skeleton Merger: an Unsupervised Aligned Keypoint Detector
paper | code

Segmentation

Image Segmentation

[11] Every Annotation Counts: Multi-label Deep Supervision for Medical Image Segmentation
paper

[10] Camouflaged Object Segmentation with Distraction Mining (topic: camouflaged object segmentation)
paper

[9] Adaptive Prototype Learning and Allocation for Few-Shot Segmentation
paper

[8] DiNTS: Differentiable Neural Network Topology Search for 3D Medical Image Segmentation
paper

[7] Self-Guided and Cross-Guided Learning for Few-Shot Segmentation
paper

[6] Locate then Segment: A Strong Pipeline for Referring Image Segmentation
paper

[5] Boundary IoU: Improving Object-Centric Image Segmentation Evaluation
paper | code

[4] PointFlow: Flowing Semantics Through Points for Aerial Image Segmentation
paper

[3] FedDG: Federated Domain Generalization on Medical Image Segmentation via Episodic Learning in Continuous Frequency Space
paper | code

[2] Few-Shot Segmentation Without Meta-Learning: A Good Transductive Inference Is All You Need? (topic: few-shot)
paper | code

[1] PointFlow: Flowing Semantics Through Points for Aerial Image Segmentation

Panoptic Segmentation

[4] Panoptic Segmentation Forecasting
paper

[3] Panoptic-PolarNet: Proposal-free LiDAR Point Cloud Panoptic Segmentation
paper

[2] Cross-View Regularization for Domain Adaptive Panoptic Segmentation
paper

[1] 4D Panoptic LiDAR Segmentation
paper

Semantic Segmentation

[24] Self-supervised Augmentation Consistency for Adapting Semantic Segmentation
paper | code

[23] DANNet: A One-Stage Domain Adaptation Network for Unsupervised Nighttime Semantic Segmentation
paper

[22] Improving Online Performance Prediction for Semantic Segmentation
paper

[21] Semantic Segmentation with Generative Models: Semi-Supervised Learning and Strong Out-of-Domain Generalization
paper | code

[20] Progressive Semantic Segmentation
paper

[19] InverseForm: A Loss Function for Structured Boundary-Aware Segmentation
paper

[18] 3D-to-2D Distillation for Indoor Scene Parsing
paper

[17] One Thing One Click: A Self-Training Approach for Weakly Supervised 3D Semantic Segmentation
paper

[16] Background-Aware Pooling and Noise-Aware Loss for Weakly-Supervised Semantic Segmentation
paper

[15] PiCIE: Unsupervised Semantic Segmentation using Invariance and Equivariance in Clustering
paper | code

[14] Source-Free Domain Adaptation for Semantic Segmentation
paper

[13] RobustNet: Improving Domain Generalization in Urban-Scene Segmentation via Instance Selective Whitening
paper | code

[12] Coarse-to-Fine Domain Adaptive Semantic Segmentation with Photometric Alignment and Category-Center Regularization
paper

[11] Cross-Dataset Collaborative Learning for Semantic Segmentation
paper

[10] BBAM: Bounding Box Attribution Map for Weakly Supervised Semantic and Instance Segmentation
paper

[9] Continual Semantic Segmentation via Repulsion-Attraction of Sparse and Disentangled Latent Representations
paper

[8] Semantic Segmentation for Real Point Cloud Scenes via Bilateral Augmentation and Adaptive Fusion
paper

[7] Capturing Omni-Range Context for Omnidirectional Segmentation
paper

[6] MetaCorrection: Domain-aware Meta Loss Correction for Unsupervised Domain Adaptation in Semantic Segmentation
paper

[5] Learning Statistical Texture for Semantic Segmentation
paper

[4] Semi-supervised Domain Adaptation based on Dual-level Domain Mixing for Semantic Segmentation
paper

[3] Multi-Source Domain Adaptation with Collaborative Learning for Semantic Segmentation
paper

[2] Towards Semantic Segmentation of Urban-Scale 3D Point Clouds: A Dataset, Benchmarks and Challenges
paper | code

[1] PLOP: Learning without Forgetting for Continual Semantic Segmentation
paper

Instance Segmentation

[11] A^2-FPN: Attention Aggregation based Feature Pyramid Network for Instance Segmentation
paper

[10] RefineMask: Towards High-Quality Instance Segmentation with Fine-Grained Features
paper | code

[9] Look Closer to Segment Better: Boundary Patch Refinement for Instance Segmentation
paper

[8] Spatial Feature Calibration and Temporal Fusion for Effective One-stage Video Instance Segmentation
paper | code

[7] DARCNN: Domain Adaptive Region-based Convolutional Neural Network for Unsupervised Instance Segmentation in Biomedical Images
paper

[6] Weakly-supervised Instance Segmentation via Class-agnostic Learning with Salient Images
paper | code

[5] FAPIS: A Few-shot Anchor-free Part-based Instance Segmenter
paper

[4] Weakly Supervised Instance Segmentation for Videos with Temporal Mask Consistency
paper

[3] Deep Occlusion-Aware Instance Segmentation with Overlapping BiLayers
paper | code

[2] BBAM: Bounding Box Attribution Map for Weakly Supervised Semantic and Instance Segmentation
paper

[1] End-to-End Video Instance Segmentation with Transformers
paper | code

Superpixel

[1] Learning the Superpixel in a Non-iterative and Lifelong Manner
paper

Video Object Segmentation

[6] Learning Position and Target Consistency for Memory-based Video Object Segmentation
paper

[5] Guided Interactive Video Object Segmentation Using Reliability-Based Attention Maps
paper | code

[4] Target-Aware Object Discovery and Association for Unsupervised Video Multi-Object Segmentation
paper

[3] Efficient Regional Memory Network for Video Object Segmentation
paper

[2] Learning to Recommend Frame for Interactive Video Object Segmentation in the Wild
paper | code

[1] Modular Interactive Video Object Segmentation: Interaction-to-Mask, Propagation and Difference-Aware Fusion
paper | project

Matting

[1] Real-Time High Resolution Background Matting
paper | code | project | video

Dense Prediction

[3] Generic Perceptual Loss for Modeling Structured Output Dependencies
paper

[2] Densely connected multidilated convolutional networks for dense prediction tasks
paper

[1] Dense Contrastive Learning for Self-Supervised Visual Pre-Training
paper | code

Estimation

Human Pose Estimation

[19] Unsupervised Human Pose Estimation through Transforming Shape Templates
paper | project

[18] Body Meshes as Points
paper | code

[17] PoseAug: A Differentiable Pose Augmentation Framework for 3D Human Pose Estimation
paper | code

[16] AGORA: Avatars in Geography Optimized for Regression Analysis
paper | project

[15] Locally Aware Piecewise Transformation Fields for 3D Human Mesh Registration
paper | code

[14] Pose Recognition with Cascade Transformers
paper | code

[13] Lite-HRNet: A Lightweight High-Resolution Network
paper | code

[12] Multi-View Multi-Person 3D Pose Estimation with Plane Sweep Stereo
paper | code

[11] Bottom-Up Human Pose Estimation Via Disentangled Keypoint Regression
paper | code

[10] Monocular 3D Multi-Person Pose Estimation by Integrating Top-Down and Bottom-Up Networks
paper | code

[9] Reconstructing 3D Human Pose by Watching Humans in the Mirror
paper | project

[8] SimPoE: Simulated Character Control for 3D Human Pose Estimation
paper | project

[7] Human POSEitioning System (HPS): 3D Human Pose Estimation and Self-localization in Large Scenes from Body-Mounted Sensors
paper | project

[6] Graph Stacked Hourglass Networks for 3D Human Pose Estimation
paper

[5] From Synthetic to Real: Unsupervised Domain Adaptation for Animal Pose Estimation (topic: animal pose estimation)
paper | code

[4] DCPose: Deep Dual Consecutive Network for Human Pose Estimation
paper | code

[3] Differentiable Multi-Granularity Human Representation Learning for Instance-Aware Human Semantic Parsing
paper | code

[2] CanonPose: Self-supervised Monocular 3D Human Pose Estimation in the Wild

[1] PCLs: Geometry-aware Neural Reconstruction of 3D Pose with Perspective Crop Layers
paper

Gesture Estimation

[5] ContactOpt: Optimizing Contact to Improve Grasps
paper

[4] Fingerspelling Detection in American Sign Language
paper

[3] Read and Attend: Temporal Localisation in Sign Language Videos
paper | [project](https://www.robots.ox.ac.uk/~vgg/research/bslattend/)

[2] Skeleton Based Sign Language Recognition Using Whole-body Keypoints
paper | code

[1] Camera-Space Hand Mesh Recovery via Semantic Aggregation and Adaptive 2D-1D Registration
paper | code

Optical Flow/Pose/Motion Estimation

[16] Extreme Rotation Estimation using Dense Correlation Volumes
paper | project

[15] Motion Representations for Articulated Animation (topic: motion estimation & representation)
paper | code

[14] Self-Supervised Pillar Motion Learning for Autonomous Driving (topic: motion estimation)
paper

[13] Single-view robot pose and joint angle estimation via render & compare
paper | code

[12] Fusing the Old with the New: Learning Relative Camera Pose with Geometry-Guided Uncertainty
paper

[11] VOLDOR: Visual Odometry from Log-logistic Dense Optical flow Residuals (topic: visual odometry)
paper

[10] DSC-PoseNet: Learning 6DoF Object Pose Estimation via Dual-scale Consistency (topic: 6D pose estimation)
paper

[9] Learning optical flow from still images (topic: optical flow estimation)
paper | project

[8] Learning Optical Flow from a Few Matches (topic: optical flow estimation)
paper | code

[7] FESTA: Flow Estimation via Spatial-Temporal Attention for Scene Point Clouds (topic: optical flow estimation)
paper

[6] Wide-Depth-Range 6D Object Pose Estimation in Space (topic: 6D pose estimation)
paper

[5] Deep Two-View Structure-from-Motion Revisited
paper

[4] FS-Net: Fast Shape-based Network for Category-Level 6D Object Pose Estimation with Decoupled Rotation Mechanism (topic: 6D pose estimation)
paper

[3] GDR-Net: Geometry-Guided Direct Regression Network for Monocular 6D Object Pose Estimation (topic: 6D pose estimation)
paper | code

[2] Robust Neural Routing Through Space Partitions for Camera Relocalization in Dynamic Indoor Environments
paper | project

[1] MultiBodySync: Multi-Body Segmentation and Motion Estimation via 3D Scan Synchronization
paper | code

Depth Estimation

[16] Self-Supervised Multi-Frame Monocular Scene Flow
paper | code

[15] Binary TTC: A Temporal Geofence for Autonomous Navigation (topic: time-to-contact estimation)
paper

[14] The Temporal Opportunist: Self-Supervised Multi-Frame Monocular Depth
paper

[13] Lighting, Reflectance and Geometry Estimation from 360° Panoramic Stereo
paper | code

[12] Depth Completion using Plane-Residual Representation
paper

[11] StereoPIFu: Depth Aware Clothed Human Digitization via Stereo Vision
paper | project

[10] Self-supervised Learning of Depth Inference for Multi-view Stereo
paper | code

[9] Depth Completion with Twin Surface Extrapolation at Occlusion Boundaries
paper

[8] S2R-DepthNet: Learning a Generalizable Depth-specific Structural Representation
paper

[7] RGB-D Local Implicit Function for Depth Completion of Transparent Objects
paper | code

[6] LED2-Net: Monocular 360 Layout Estimation via Differentiable Depth Rendering
paper | project

[5] Deep Two-View Structure-from-Motion Revisited
paper

[4] Mask-ToF: Learning Microlens Masks for Flying Pixel Correction in Time-of-Flight Imaging
paper | project

[3] Generalizing to the Open World: Deep Visual Odometry with Online Adaptation
paper

[2] Beyond Image to Depth: Improving Depth Prediction using Echoes
paper | code

[1] PLADE-Net: Towards Pixel-Level Accuracy for Self-Supervised Single-View Depth Estimation with Neural Positional Encoding and Distilled Matting Loss
paper

Image Processing

[1] Invertible Image Signal Processing
paper | code

Super Resolution

[8] Temporal Modulation Network for Controllable Space-Time Video Super-Resolution
paper | code

[7] SRWarp: Generalized Image Super-Resolution under Arbitrary Transformation
paper

[6] Unsupervised Degradation Representation Learning for Blind Super-Resolution
paper | code

[5] Flow-based Kernel Prior with Application to Blind Super-Resolution
paper | code

[4] ClassSR: A General Framework to Accelerate Super-Resolution Networks by Data Characteristic
paper | commentary: ClassSR accelerates image super-resolution, cutting computation by 50% without degrading SR performance

[3] Learning Continuous Image Representation with Local Implicit Image Function
paper | code | video | project

[2] Data-Free Knowledge Distillation For Image Super-Resolution (an SR version of the DAFL algorithm)

[1] AdderSR: Towards Energy Efficient Image Super-Resolution (applies adder networks to image super-resolution)
paper | code
Commentary: Huawei open-sources adder neural networks

Image Restoration/Image Enhancement

[3] Removing Diffraction Image Artifacts in Under-Display Camera via Dynamic Skip Connection Network
paper

[2] NeX: Real-time View Synthesis with Neural Basis Expansion
paper | code

[1] Multi-Stage Progressive Image Restoration
paper | code

Image Shadow Removal/Image Reflection Removal

[3] From Shadow Generation to Shadow Removal
paper

[2] Robust Reflection Removal with Reflection-free Flash-only Cues
paper | code

[1] Auto-Exposure Fusion for Single-Image Shadow Removal
paper | code

Image Denoising/Deblurring/Deraining/Dehazing

[5] Contrastive Learning for Compact Single Image Dehazing
paper | code

[4] Explore Image Deblurring via Blur Kernel Space
paper

[3] Semi-Supervised Video Deraining with Dynamic Rain Generator
paper

[2] ARVo: Learning All-Range Volumetric Correspondence for Video Deblurring
paper

[1] DeFMO: Deblurring and Shape Recovery of Fast Moving Objects
paper | code | video

Image Editing/Inpainting

[11] PD-GAN: Probabilistic Diverse GAN for Image Inpainting
paper

[10] StyleMapGAN: Exploiting Spatial Dimensions of Latent in GAN for Real-time Image Editing
paper | code

[9] Image Inpainting with External-internal Learning and Monochromic Bottleneck
paper

[8] TransFill: Reference-guided Image Inpainting by Merging Multiple Color and Spatial Transformations
paper

[7] DeFLOCNet: Deep Image Editing via Flexible Low-level Controls
paper

[6] Generating Diverse Structure for Image Inpainting With Hierarchical VQ-VAE
paper | code

[5] PISE: Person Image Synthesis and Editing with Decoupled GAN
paper | code

[4] DeFLOCNet: Deep Image Editing via Flexible Low level Controls

[3] PD-GAN: Probabilistic Diverse GAN for Image Inpainting

[2] Anycost GANs for Interactive Image Synthesis and Editing
paper | code

[1] Exploiting Spatial Dimensions of Latent in GAN for Real-time Image Editing

Image Translation

[8] Visualizing Adapted Knowledge in Domain Transfer
paper | code

[7] Memory-guided Unsupervised Image-to-image Translation
paper

[6] ReMix: Towards Image-to-Image Translation with Limited Data
paper

[5] Closing the Loop: Joint Rain Generation and Removal via Disentangled Image Translation
paper

[4] CoMoGAN: continuous model-guided image-to-image translation
paper | code

[3] Spatially-Adaptive Pixelwise Networks for Fast Image Translation
paper | project

[2] Image-to-image Translation via Hierarchical Style Disentanglement
paper | code | commentary: Hierarchical style disentanglement: multi-attribute face manipulation is finally controllable

[1] Encoding in Style: a StyleGAN Encoder for Image-to-Image Translation
paper | code | project

Image Quality Assessment

[1] SDD-FIQA: Unsupervised Face Image Quality Assessment with Similarity Distribution Distance
paper

Style Transfer

[6] Style-Aware Normalized Loss for Improving Arbitrary Style Transfer
paper

[5] Instagram Filter Removal on Fashionable Images
paper

[4] Drafting and Revision: Laplacian Pyramid Network for Fast High-Quality Artistic Style Transfer
paper | code

[3] Rethinking and Improving the Robustness of Image Style Transfer
paper

[2] ArtFlow: Unbiased Image Style Transfer via Reversible Neural Flows
paper

[1] Rethinking Style Transfer: From Pixels to Parameterized Brushstrokes
paper

Face

[6] Continuous Face Aging via Self-estimated Residual Age Embedding
paper

[5] Towards High Fidelity Face Relighting with Realistic Shadows
paper

[4] Unsupervised Disentanglement of Linear-Encoded Facial Semantics
paper

[3] High-fidelity Face Tracking for AR/VR via Deep Lighting Adaptation
paper | project

[2] Structure-Aware Face Clustering on a Large-Scale Graph with 10^7 Nodes
paper | code & project

[1] SDD-FIQA: Unsupervised Face Image Quality Assessment with Similarity Distribution Distance
paper

Facial Recognition/Detection

[11] Feature Decomposition and Reconstruction Learning for Effective Facial Expression Recognition
paper

[10] FACESEC: A Fine-grained Robustness Evaluation Framework for Face Recognition Systems
paper

[9] IronMask: Modular Architecture for Protecting Deep Face Template
paper

[8] HLA-Face: Joint High-Low Adaptation for Low Light Face Detection
paper | project

[7] Dive into Ambiguity: Latent Distribution Mining and Pairwise Uncertainty Estimation for Facial Expression Recognition
paper

[6] Affective Processes: stochastic modelling of temporal context for emotion and facial expression recognition
paper

[5] Cross-Domain Similarity Learning for Face Recognition in Unseen Domains
paper

[4] MagFace: A Universal Representation for Face Recognition and Quality Assessment
paper | code

[3] CRFace: Confidence Ranker for Model-Agnostic Face Detection Refinement
paper

[2] A 3D GAN for Improved Large-pose Facial Recognition
paper

[1] WebFace260M: A Benchmark Unveiling the Power of Million-Scale Deep Face Recognition
paper | benchmark

Face Generation/Face Synthesis/Face Reconstruction/Face Editing

[12] Pose-Controllable Talking Face Generation by Implicitly Modularized Audio-Visual Representation
paper | code

[11] Audio-Driven Emotional Video Portraits
paper

[10] Pixel Codec Avatars
paper

[9] Riggable 3D Face Reconstruction via In-Network Optimization
paper | code

[8] Everything's Talkin': Pareidolia Face Reenactment
paper | project

[7] High-Fidelity and Arbitrary Face Editing
paper

[6] 3DCaricShop: A Dataset and A Baseline Method for Single-view 3D Caricature Face Reconstruction
paper | project

[5] ForgeryNet: A Versatile Benchmark for Comprehensive Forgery Analysis
paper | code

[4] Image-to-image Translation via Hierarchical Style Disentanglement
paper | code

[3] When Age-Invariant Face Recognition Meets Face Age Synthesis: A Multi-Task Learning Framework
paper | code

[2] PISE: Person Image Synthesis and Editing with Decoupled GAN
paper | code

[1] Soft-IntroVAE: Analyzing and Improving Introspective Variational Autoencoders
paper | code | project

Face Forgery/Face Anti-Spoofing

[6] Improving the Efficiency and Robustness of Deepfakes Detection through Precise Geometric Features
paper

[5] Face Forensics in the Wild (a face forgery dataset)
paper | code

[4] Frequency-aware Discriminative Feature Learning Supervised by Single-Center Loss for Face Forgery Detection (topic: face forgery detection)
paper

[3] MagDR: Mask-guided Detection and Reconstruction for Defending Deepfakes
paper

[2] Cross Modal Focal Loss for RGBD Face Anti-Spoofing
paper

[1] Multi-attentional Deepfake Detection
paper

Object Tracking

[17] LightTrack: Finding Lightweight Neural Networks for Object Tracking via One-Shot Architecture Search
paper | code

[16] Multiple Object Tracking with Correlation Learning
paper

[15] Learning to Track Instances without Video Annotations
paper

[14] STMTrack: Template-free Visual Tracking with Space-time Memory Networks
paper | code

[13] Online Multiple Object Tracking with Cross-Task Synergy
paper | code

[12] Towards More Flexible and Accurate Object Tracking with Natural Language: Algorithms and Benchmark
paper

[11] Learnable Graph Matching: Incorporating Graph Partitioning with Deep Feature Learning for Multiple Object Tracking
paper | code

[10] IoU Attack: Towards Temporally Coherent Black-Box Adversarial Attack for Visual Object Tracking
paper | code

[9] Transformer Tracking
paper | code

[8] Transformer Meets Tracker: Exploiting Temporal Context for Robust Visual Tracking
paper

[7] Track to Detect and Segment: An Online Multi-Object Tracker
paper | code

[6] Learning a Proposal Classifier for Multiple Object Tracking
paper | code

[5] Center-based 3D Object Detection and Tracking
paper | code

[4] HPS: localizing and tracking people in large 3D scenes from wearable sensors

[3] Track to Detect and Segment: An Online Multi-Object Tracker
project | video

[2] Probabilistic Tracklet Scoring and Inpainting for Multiple Object Tracking
paper

[1] Rotation Equivariant Siamese Networks for Tracking
paper

Image&Video Retrieval/Video Understanding

[9] 2D or not 2D? Adaptive 3D Convolution Selection for Efficient Video Recognition
paper

[8] FrameExit: Conditional Early Exiting for Efficient Video Recognition
paper

[7] T2VLAD: Global-Local Sequence Alignment for Text-Video Retrieval
paper

[6] Thinking Fast and Slow: Efficient Text-to-Visual Retrieval with Transformers
paper

[5] StyleMeUp: Towards Style-Agnostic Sketch-Based Image Retrieval
paper

[4] More Photos are All You Need: Semi-Supervised Learning for Fine-Grained Sketch Based Image Retrieval
paper | code

[3] Revamping Cross-Modal Recipe Retrieval with Hierarchical Transformers and Self-supervised Learning
paper

[2] On Semantic Similarity in Video Retrieval
paper | code

[1] QAIR: Practical Query-efficient Black-Box Attacks for Image Retrieval
paper

Action/Activity Recognition/Detection/Segmentation/Localization

[22] Home Action Genome: Cooperative Compositional Action Understanding
paper

[21] Weakly Supervised Action Selection Learning in Video
paper | code

[20] Global2Local: Efficient Structure Search for Video Action Segmentation
paper | code

[19] Self-Supervised Learning for Semi-Supervised Temporal Action Proposal
paper

[18] Anchor-Constrained Viterbi for Set-Supervised Action Segmentation
paper

[17] Action Shuffle Alternating Learning for Unsupervised Action Segmentation
paper

[16] Self-supervised Motion Learning from Static Images
paper

[15] CoLA: Weakly-Supervised Temporal Action Localization with Snippet Contrastive Learning
paper

[14] Recognizing Actions in Videos from Unseen Viewpoints
paper

[13] No frame left behind: Full Video Action Recognition
paper

[12] Learning Salient Boundary Feature for Anchor-free Temporal Action Localization
paper | code

[11] Temporal Context Aggregation Network for Temporal Action Proposal Refinement
paper

[10] The Blessings of Unlabeled Background in Untrimmed Videos
paper

[9] Temporally-Weighted Hierarchical Clustering for Unsupervised Action Segmentation
paper | code

[8] Coarse-Fine Networks for Temporal Activity Detection in Videos
paper

[7] Learning Discriminative Prototypes with Dynamic Time Warping
paper

[6] Temporal Action Segmentation from Timestamp Supervision
paper

[5] ACTION-Net: Multipath Excitation for Action Recognition
paper | code

[4] BASAR: Black-box Attack on Skeletal Action Recognition
paper

[3] Understanding the Robustness of Skeleton-based Action Recognition under Adversarial Attack
paper

[2] Temporal Difference Networks for Efficient Action Recognition
paper | code

[1] Behavior-Driven Synthesis of Human Dynamics
paper | code

Person Re-Identification/Detection

[11] BiCnet-TKS: Learning Efficient Spatial-Temporal Representation for Video Person Re-Identification
paper

[10] Unsupervised Multi-Source Domain Adaptation for Person Re-Identification
paper

[9] Combined Depth Space based Architecture Search For Person Re-identification
paper

[8] Neural Feature Search for RGB-Infrared Person Re-Identification
paper

[7] Group-aware Label Transfer for Domain Adaptive Person Re-identification
paper

[6] Lifelong Person Re-Identification via Adaptive Knowledge Accumulation
paper

[5] Anchor-Free Person Search
paper | code

[4] Intra-Inter Camera Similarity for Unsupervised Person Re-Identification
paper

[3] Watching You: Global-guided Reciprocal Learning for Video-based Person Re-identification
paper

[2] Joint Noise-Tolerant Learning and Meta Camera Shift Adaptation for Unsupervised Person Re-Identification
paper

[1] Meta Batch-Instance Normalization for Generalizable Person Re-Identification
paper

Image/Video Caption

[7] Towards Accurate Text-based Image Captioning with Content Diversity Exploration
paper

[6] Human-like Controllable Image Captioning with Verb-specific Semantic Roles
paper | code

[5] Co-Grounding Networks with Semantic Attention for Referring Expression Comprehension in Videos
paper | project

[4] Multiple Instance Captioning: Learning Representations from Histopathology Textbooks and Articles
paper

[3] Open-book Video Captioning with Retrieve-Copy-Generate Network
paper

[2] VX2TEXT: End-to-End Learning of Video-Based Text Generation From Multimodal Inputs
paper

[1] Scan2Cap: Context-aware Dense Captioning in RGB-D Scans
paper | code | project | video

Medical Imaging

[13] Every Annotation Counts: Multi-label Deep Supervision for Medical Image Segmentation
paper

[12] DiNTS: Differentiable Neural Network Topology Search for 3D Medical Image Segmentation
paper

[11] Confluent Vessel Trees with Accurate Bifurcations
paper

[10] Brain Image Synthesis with Unsupervised Multivariate Canonical CSCℓ4Net
paper

[9] XProtoNet: Diagnosis in Chest Radiography with Global and Local Explanations
paper

[8] FedDG: Federated Domain Generalization on Medical Image Segmentation via Episodic Learning in Continuous Frequency Space
paper | code

[7] Multiple Instance Captioning: Learning Representations from Histopathology Textbooks and Articles
paper

[6] Discovering Hidden Physics Behind Transport Dynamics
paper

[5] DeepTag: An Unsupervised Deep Learning Method for Motion Tracking on Cardiac Tagging Magnetic Resonance Images
paper

[4] Multi-institutional Collaborations for Improving Deep Learning-based Magnetic Resonance Image Reconstruction Using Federated Learning
paper | code

[3] 3D Graph Anatomy Geometry-Integrated Network for Pancreatic Mass Segmentation, Diagnosis, and Quantitative Patient Management

[2] Deep Lesion Tracker: Monitoring Lesions in 4D Longitudinal Imaging Studies
paper

[1] Automatic Vertebra Localization and Identification in CT by Spine Rectification and Anatomically-constrained Optimization
paper

Text Detection/Recognition

[6] Fourier Contour Embedding for Arbitrary-Shaped Text Detection
paper

[5] Scene Text Retrieval via Joint Text Detection and Similarity Learning
paper | code

[4] MetaHTR: Towards Writer-Adaptive Handwritten Text Recognition
paper

[3] MOST: A Multi-Oriented Scene Text Detector with Localization Refinement
paper

[2] Read Like Humans: Autonomous, Bidirectional and Iterative Language Modeling for Scene Text Recognition
paper | code

[1] What If We Only Use Real Datasets for Scene Text Recognition? Toward Scene Text Recognition With Fewer Labels
paper | code

Remote Sensing Image

[3] SIPSA-Net: Shift-Invariant Pan Sharpening with Moving Object Alignment for Satellite Imagery
paper

[2] PointFlow: Flowing Semantics Through Points for Aerial Image Segmentation
paper

[1] Deep Gradient Projection Networks for Pan-sharpening (super-resolution)
paper | code

GAN/Generative/Adversarial

[25] Continuous Face Aging via Self-estimated Residual Age Embedding
paper

[24] Unsupervised 3D Shape Completion through GAN Inversion
paper | project

[23] Delving into Data: Effectively Substitute Training for Black-box Attack
paper

[22] LAFEAT: Piercing Through Adversarial Defenses with Latent Features
paper

[21] Surrogate Gradient Field for Latent Space Manipulation
paper

[20] DatasetGAN: Efficient Labeled Data Factory with Minimal Human Effort
paper

[19] Regularizing Generative Adversarial Networks under Limited Data
paper | project | code

[18] Content-Aware GAN Compression
paper

[17] Lipstick ain't enough: Beyond Color Matching for In-the-Wild Makeup Transfer
paper | code

[16] LiBRe: A Practical Bayesian Approach to Adversarial Detection
paper

[15] DivCo: Diverse Conditional Image Synthesis via Contrastive Generative Adversarial Network
paper

[14] Diverse Semantic Image Synthesis via Probability Distribution Modeling
paper | code

[13] HumanGAN: A Generative Model of Humans Images
paper

[12] MetaSimulator: Simulating Unknown Target Models for Query-Efficient Black-box Attacks
paper | code

[11] Soft-IntroVAE: Analyzing and Improving Introspective Variational Autoencoders
paper | code | project

[10] LOHO: Latent Optimization of Hairstyles via Orthogonalization
paper

[9] PISE: Person Image Synthesis and Editing with Decoupled GAN
paper | code

[8] Closed-Form Factorization of Latent Semantics in GANs
paper | code

[7] PD-GAN: Probabilistic Diverse GAN for Image Inpainting

[6] Anycost GANs for Interactive Image Synthesis and Editing
paper | code

[5] Efficient Conditional GAN Transfer with Knowledge Propagation across Classes
paper | code

[4] Exploiting Spatial Dimensions of Latent in GAN for Real-time Image Editing

[3] Hijack-GAN: Unintended-Use of Pretrained, Black-Box GANs
paper

[2] Encoding in Style: a StyleGAN Encoder for Image-to-Image Translation
paper | code | project

[1] A 3D GAN for Improved Large-pose Facial Recognition
paper

Image Generation/Image Synthesis

[23] GeoSim: Realistic Video Simulation via Geometry-Aware Composition for Self-Driving
paper

[22] GIRAFFE: Representing Scenes as Compositional Generative Neural Feature Fields
paper | project

[21] Ensembling with Deep Generative Views
paper | code

[20] StylePeople: A Generative Model of Fullbody Human Avatars
paper | code

[19] See through Gradients: Image Batch Recovery via GradInversion
paper

[18] StEP: Style-based Encoder Pre-training for Multi-modal Image Synthesis
paper

[17] Few-shot Image Generation via Cross-domain Correspondence
paper

[16] IMAGINE: Image Synthesis by Image-Guided Model Inversion
paper

[15] Variational Transformer Networks for Layout Generation
paper

[14] VITON-HD: High-Resolution Virtual Try-On via Misalignment-Aware Normalization
paper

[13] A Closer Look at Fourier Spectrum Discrepancies for CNN-generated Images Detection
paper | code

[12] Semi-supervised Synthesis of High-Resolution Editable Textures for 3D Humans
paper

[11] Few-Shot Human Motion Transfer by Personalized Geometry and Texture Modeling
paper | code

[10] Brain Image Synthesis with Unsupervised Multivariate Canonical CSCℓ4Net
paper

[9] Context-Aware Layout to Image Generation with Enhanced Object Appearance
paper

[8] DivCo: Diverse Conditional Image Synthesis via Contrastive Generative Adversarial Network
paper

[7] HumanGAN: A Generative Model of Humans Images
paper

[6] PISE: Person Image Synthesis and Editing with Decoupled GAN
paper | code

[5] SMPLicit: Topology-aware Generative Model for Clothed People
paper | code

[4] Diversifying Sample Generation for Data-Free Quantization
paper

[3] Diverse Semantic Image Synthesis via Probability Distribution Modeling
paper | code

[2] When Age-Invariant Face Recognition Meets Face Age Synthesis: A Multi-Task Learning Framework
paper | code

[1] Anycost GANs for Interactive Image Synthesis and Editing
paper | code

View Synthesis

[6] Stable View Synthesis
paper | code

[5] Stereo Radiance Fields (SRF): Learning View Synthesis for Sparse Views of Novel Scenes
paper | project

[4] Layout-Guided Novel View Synthesis from a Single Indoor Panorama
paper | project

[3] NeX: Real-time View Synthesis with Neural Basis Expansion
paper | code

[2] ID-Unet: Iterative Soft and Hard Deformation for View Synthesis
paper

[1] Self-Supervised Visibility Learning for Novel View Synthesis
paper

3D Vision

[6] Learning Feature Aggregation for Deep 3D Morphable Models
paper

[5] Deep Polarization Imaging for 3D shape and SVBRDF Acquisition
paper

[4] Unsupervised 3D Shape Completion through GAN Inversion
paper | project

[3] KeypointDeformer: Unsupervised 3D Keypoint Discovery for Shape Control
paper | project

[2] A Deep Emulator for Secondary Motion of 3D Characters
paper

[1] 3D CNNs with Adaptive Temporal Feature Resolutions
paper

Point Cloud

[26] VoxelContext-Net: An Octree based Framework for Point Cloud Compression
paper

[25] Variational Relational Point Completion Network
paper | project

[24] SCALE: Modeling Clothed Humans with a Surface Codec of Articulated Local Elements
paper | code

[23] RPSRNet: End-to-End Trainable Rigid Point Set Registration Network using Barnes-Hut 2D-Tree Representation
paper

[22] View-Guided Point Cloud Completion
paper

[21] DeepI2P: Image-to-Point Cloud Registration via Deep Classification
paper | code

[20] FESTA: Flow Estimation via Spatial-Temporal Attention for Scene Point Clouds
paper

[19] Denoise and Contrast for Category Agnostic Shape Completion
paper

[18] Panoptic-PolarNet: Proposal-free LiDAR Point Cloud Panoptic Segmentation
paper

[17] ReAgent: Point Cloud Registration using Imitation and Reinforcement Learning
paper

[16] Equivariant Point Network for 3D Point Cloud Analysis
paper

[15] PAConv: Position Adaptive Convolution with Dynamic Kernel Assembling on Point Clouds
paper | code

[14] Skeleton Merger: an Unsupervised Aligned Keypoint Detector
paper | code

[13] Cycle4Completion: Unpaired Point Cloud Completion using Cycle Transformation with Missing Region Coding
paper

[12] Semantic Segmentation for Real Point Cloud Scenes via Bilateral Augmentation and Adaptive Fusion
paper

[11] How Privacy-Preserving are Line Clouds? Recovering Scene Details from 3D Lines
paper | code

[10] PointDSC: Robust Point Cloud Registration using Deep Spatial Consistency
paper | code

[9] Robust Point Cloud Registration Framework Based on Deep Graph Matching
paper | code

[8] TPCN: Temporal Point Cloud Networks for Motion Forecasting
paper | code

[7] PointGuard: Provably Robust 3D Point Cloud Classification
paper

[6] Towards Semantic Segmentation of Urban-Scale 3D Point Clouds: A Dataset, Benchmarks and Challenges
paper | code

[5] SpinNet: Learning a General Surface Descriptor for 3D Point Cloud Registration
paper | code

[4] MultiBodySync: Multi-Body Segmentation and Motion Estimation via 3D Scan Synchronization
paper | code

[3] Diffusion Probabilistic Models for 3D Point Cloud Generation
paper | code

[2] Style-based Point Generator with Adversarial Rendering for Point Cloud Completion
paper

[1] PREDATOR: Registration of 3D Point Clouds with Low Overlap
paper | code | project

3D Reconstruction

[18] LASR: Learning Articulated Shape Reconstruction from a Monocular Video
paper | code

[17] Cuboids Revisited: Learning Robust 3D Shape Fitting to Single RGB Images
paper

[16] Multi-person Implicit Reconstruction from a Single Image
paper

[15] CodedStereo: Learned Phase Masks for Large Depth-of-field Stereo
paper

[14] StereoPIFu: Depth Aware Clothed Human Digitization via Stereo Vision
paper | project

[13] Global Transport for Fluid Reconstruction with Learned Self-Supervision
paper | code

[12] Fully Understanding Generic Objects: Modeling, Segmentation, and Reconstruction
paper

[11] Reconstructing 3D Human Pose by Watching Humans in the Mirror
paper | project

[10] Fostering Generalization in Single-view 3D Reconstruction by Learning a Hierarchy of Local and Global Shape Priors
paper

[9] NeuralRecon: Real-Time Coherent 3D Reconstruction from Monocular Video
paper | project

[8] Learning Parallel Dense Correspondence from Spatio-Temporal Descriptors for Efficient and Robust 4D Reconstruction
paper | code

[7] POSEFusion: Pose-guided Selective Fusion for Single-view Human Volumetric Capture
paper | project

[6] Deep Implicit Moving Least-Squares Functions for 3D Reconstruction
paper | code

[5] Model-based 3D Hand Reconstruction via Self-Supervised Learning
paper

[4] 3DCaricShop: A Dataset and A Baseline Method for Single-view 3D Caricature Face Reconstruction
paper | project

[3] Learning Compositional Representation for 4D Captures with Neural ODE
paper

[2] SMPLicit: Topology-aware Generative Model for Clothed People
paper | code

[1] PCLs: Geometry-aware Neural Reconstruction of 3D Pose with Perspective Crop Layers
paper

Model Compression

[4] Skip-Convolutions for Efficient Video Processing (https://arxiv.org/abs/2104.11487)
paper

[3] Content-Aware GAN Compression
paper

[2] Dynamic Slimmable Network
paper | code

[1] Learning Student Networks in the Wild (a model compression and acceleration technique that needs no original training data)
paper | code
Commentary: Huawei Noah's Ark Lab proposes a data-free network compression technique

Knowledge Distillation

[11] Distilling Knowledge via Knowledge Review
paper | code

[10] 3D-to-2D Distillation for Indoor Scene Parsing
paper

[9] Complementary Relation Contrastive Distillation
paper

[8] Distilling Object Detectors via Decoupled Features
paper | code

[7] Refine Myself by Teaching Myself: Feature Refinement via Self-Knowledge Distillation
paper | code

[6] Knowledge Evolution in Neural Networks
paper | code

[5] Semantic-aware Knowledge Distillation for Few-Shot Class-Incremental Learning
paper

[4] Teachers Do More Than Teach: Compressing Image-to-Image Models (https://arxiv.org/abs/2103.03467)
paper | code

[3] General Instance Distillation for Object Detection
paper

[2] Multiresolution Knowledge Distillation for Anomaly Detection
paper

[1] Distilling Object Detectors via Decoupled Features (distillation that separates foreground and background)

Pruning

[3] Convolutional Neural Network Pruning with Structural Redundancy Reduction
paper

[2] Neural Response Interpretation through the Lens of Critical Pathways
paper | code1 | code2

[1] Manifold Regularized Dynamic Network Pruning
paper

Quantization

[3] Network Quantization with Element-wise Gradient Scaling
paper

[2] Zero-shot Adversarial Quantization
paper | code

[1] Learnable Companding Quantization for Accurate Low-bit Neural Networks
paper

Neural Network Structure Design

[14] Heterogeneous Grid Convolution for Adaptive, Efficient, and Controllable Computation
paper

[13] AsymmNet: Towards ultralight convolution neural networks using asymmetrical bottlenecks
paper | code

[12] CondenseNet V2: Sparse Feature Reactivation for Deep Networks
paper

[11] Convolutional Hough Matching Networks
paper

[10] Capsule Network is Not More Robust than Convolutional Network
paper

[9] Diverse Branch Block: Building a Convolution as an Inception-like Unit
paper | code

[8] Scaling Local Self-Attention For Parameter Efficient Visual Backbones
paper

[7] Fast and Accurate Model Scaling
paper

[6] Involution: Inverting the Inherence of Convolution for Visual Recognition
paper | code

[5] Inception Convolution with Efficient Dilation Search
paper | code | commentary: Inception convolution

[4] Coordinate Attention for Efficient Mobile Network Design
paper

[3] Rethinking Channel Dimensions for Efficient Model Design
paper | code

[2] Inverting the Inherence of Convolution for Visual Recognition

[1] RepVGG: Making VGG-style ConvNets Great Again
paper | code
Commentary: RepVGG: minimalist architecture, SOTA performance, making VGG-style models great again

Transformer

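[2] Transformer Interpretability Beyond Attention Visualization
paper | code

[1] Pre-Trained Image Processing Transformer (a pre-trained model for low-level vision)
paper | commentary: Transformers take another stronghold! Topping several low-level vision benchmarks, the pre-trained model IPT is jointly proposed by Peking University, Huawei, and others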

Graph Neural Networks (GNN)

[3] A Peek Into the Reasoning of Neural Networks: Interpreting with Structural Visual Concepts
paper

[2] Quantifying Explainers of Graph Neural Networks in Computational Pathology
paper

[1] Sequential Graph Convolutional Network for Active Learning
paper

Neural Architecture Search (NAS)

[11] Landmark Regularization: Ranking Guided Super-Net Training in Neural Architecture Search
paper

[10] NetAdaptV2: Efficient Neural Architecture Search with Fast Super-Network Training and Architecture Optimization
paper | project

[9] One-Shot Neural Ensemble Architecture Search by Diversity-Guided Search Space Shrinking
paper | code

[8] Dynamic Slimmable Network
paper | code

[7] Prioritized Architecture Sampling with Monto-Carlo Tree Search
paper | code

[6] Searching by Generating: Flexible and Efficient One-Shot NAS with Architecture Generator
paper | code

[5] Contrastive Neural Architecture Search with Neural Architecture Comparators
paper | code

[4] OPANAS: One-Shot Path Aggregation Network Architecture Search for Object
paper | code

[3] AttentiveNAS: Improving Neural Architecture Search via Attentive
paper

[2] ReNAS: Relativistic Evaluation of Neural Architecture Search (the importance of ranking loss in NAS predictors)
paper

[1] HourNAS: Extremely Fast Neural Architecture (reducing the cost of NAS)
paper

Data Processing

Data Augmentation

[2] AutoDO: Robust AutoAugment for Biased Data with Label Noise via Scalable Probabilistic Implicit Differentiation
paper

[1] KeepAugment: A Simple Information-Preserving Data Augmentation
paper

Representation Learning

[15] Representation Learning via Global Temporal Alignment and Cycle-Consistency
paper

[14] Multi-Perspective LSTM for Joint Visual Representation Learning
paper | code

[13] Unsupervised Visual Representation Learning by Tracking Patches in Video
paper | code

[12] A Large-Scale Study on Unsupervised Spatiotemporal Representation Learning
paper

[11] Where and What? Examining Interpretable Disentangled Representations
paper

[10] Seeing Out of tHe bOx: End-to-End Pre-training for Vision-Language Representation Learning
paper

[9] Self-supervised Video Representation Learning by Context and Motion Decoupling
paper

[8] Jigsaw Clustering for Unsupervised Visual Representation Learning
paper | code

[7] Learning by Aligning Videos in Time (video representation)
paper

[6] Vectorization and Rasterization: Self-Supervised Learning for Sketch and Handwriting
paper | code

[5] Neural Parts: Learning Expressive 3D Shape Abstractions with Invertible Neural Networks
paper

[4] VideoMoCo: Contrastive Video Representation Learning with Temporally Adversarial Examples
paper

[3] Spatially Consistent Representation Learning
paper

[2] Removing the Background by Adding the Background: Towards Background Robust Self-supervised Video Representation Learning
paper | code | project | commentary

[1] VirTex: Learning Visual Representations from Textual Annotations
paper | code

Normalization/Regularization

[3] Adaptive Consistency Regularization for Semi-Supervised Transfer Learning
paper | code

[2] Meta Batch-Instance Normalization for Generalizable Person Re-Identification
paper

[1] Representative Batch Normalization with Feature Calibration

Image Clustering

[4] Structure-Aware Face Clustering on a Large-Scale Graph with 10^7 Nodes
paper | code&project

[3] COMPLETER: Incomplete Multi-view Clustering via Contrastive Prediction
paper | code

[2] Improving Unsupervised Image Clustering With Robust Learning
paper | code

[1] Reconsidering Representation Alignment for Multi-view Clustering
paper | code

Image Compression

[4] Learning Scalable ℓ∞-constrained Near-lossless Image Compression via Joint Lossy Image and Residual Compression
paper | code

[3] Checkerboard Context Model for Efficient Learned Image Compression
paper

[2] Slimmable Compressive Autoencoders for Practical Neural Image Compression
paper

[1] Attention-guided Image Compression by Deep Reconstruction of Compressive Sensed Saliency Skeleton
paper

Anomaly Detection

[5] MOS: Towards Scaling Out-of-distribution Detection for Large Semantic Space
paper

[4] MOOD: Multi-level Out-of-distribution Detection
paper

[3] CutPaste: Self-Supervised Learning for Anomaly Detection and Localization
paper

[2] MIST: Multiple Instance Self-Training Framework for Video Anomaly Detection
paper

[1] Learning Placeholders for Open-Set Recognition
paper

Model Training/Generalization

[8] A Bop and Beyond: A Second Order Optimizer for Binarized Neural Networks (optimization algorithm)
paper

[7] Simpler Certified Radius Maximization by Propagating Covariances
paper | video

[6] Differentiable Patch Selection for Image Recognition
paper | code

[5] Towards Evaluating and Training Verifiably Robust Neural Networks
paper | code

[4] Student-Teacher Learning from Clean Inputs to Noisy Inputs
paper

[3] Uncertainty-guided Model Generalization to Unseen Domains
paper

[2] Knowledge Evolution in Neural Networks
paper | code

[1] PGT: A Progressive Method for Training Models on Long Videos
paper | code

Noisy Label

[1] Partially View-aligned Representation Learning with Noise-robust Contrastive Loss
paper | code

Long-Tailed Distribution

[7] Adversarial Robustness under Long-Tailed Distribution
paper | code

[6] Adaptive Class Suppression Loss for Long-Tail Object Detection
paper | code

[5] Improving Calibration for Long-Tailed Recognition
paper | code

[4] Contrastive Learning based Hybrid Networks for Long-Tailed Image Classification
paper

[3] PML: Progressive Margin Loss for Long-tailed Age Classification
paper

[2] MetaSAug: Meta Semantic Augmentation for Long-Tailed Visual Recognition
paper

[1] Distribution Alignment: A Unified Framework for Long-tail Visual Recognition
paper | code

Model Evaluation

[1] Are Labels Necessary for Classifier Accuracy Evaluation? (can an unlabeled test set be used to evaluate a model?)
paper | commentary

Multi-Modal Learning

[8] Distilling Audio-Visual Knowledge by Compositional Contrastive Learning
paper | code

[7] Cross-Modal Center Loss for 3D Cross-Modal Retrieval
paper | code

[6] Deep RGB-D Saliency Detection with Depth-Sensitive Attention and Automatic Multi-Modal Fusion
paper

[5] There is More than Meets the Eye: Self-Supervised Multi-Object Detection and Tracking with Sound by Distilling Multimodal Knowledge
paper | video | project

[4] Deep RGB-D Saliency Detection with Depth-Sensitive Attention and Automatic Multi-Modal Fusion
paper

[3] LaPred: Lane-Aware Prediction of Multi-Modal Future Trajectories of Dynamic Agents
paper

[2] Multimodal Motion Prediction with Stacked Transformers
paper | code

[1] Multi-Modal Fusion Transformer for End-to-End Autonomous Driving
paper

Audio-visual Learning

[7] Spoken Moments: Learning Joint Audio-Visual Representations from Video Descriptions
paper

[6] Visually Informed Binaural Audio Generation without Binaural Audios
paper | project

[5] Looking into Your Speech: Learning Cross-modal Affinity for Audio-visual Speech Separation
paper | project

[4] Localizing Visual Sounds the Hard Way
paper

[3] Can audio-visual integration strengthen robustness under multimodal attacks?
paper

[2] Cyclic Co-Learning of Sounding Object Visual Grounding and Sound Separation
paper

[1] Positive Sample Propagation along the Audio-Visual Event Line
paper | code

Vision-based Prediction

[10] Interpretable Social Anchors for Human Trajectory Forecasting in Crowds
paper

[9] DriveGAN: Towards a Controllable High-Quality Neural Simulation
paper

[8] Learning Semantic-Aware Dynamics for Video Prediction
paper

[7] Divide-and-Conquer for Lane-Aware Diverse Trajectory Prediction
paper

[6] GATSBI: Generative Agent-centric Spatio-temporal Object Interaction
paper

[5] SGCN: Sparse Graph Convolution Network for Pedestrian Trajectory Prediction
paper

[4] LaPred: Lane-Aware Prediction of Multi-Modal Future Trajectories of Dynamic Agents
paper

[3] Multimodal Motion Prediction with Stacked Transformers
paper | code

[2] Video Prediction Recalling Long-term Motion Context via Memory Alignment Learning
paper

[1] MotionRNN: A Flexible Model for Video Prediction with Spacetime-Varying Motions
paper | explainer

Dataset

[16] Detection, Tracking, and Counting Meets Drones in Crowds: A Benchmark
paper | dataset&code

[15] AGORA: Avatars in Geography Optimized for Regression Analysis
paper | project

[14] Towards Good Practices for Efficiently Annotating Large-Scale Image Classification Datasets (dataset annotation)
paper | project

[13] Learning To Count Everything (visual counting)
paper | dataset&code

[12] DexYCB: A Benchmark for Capturing Hand Grasping of Objects
paper | dataset&code

[11] The Multi-Agent Behavior Dataset: Mouse Dyadic Social Interactions
paper | dataset

[10] Deep Animation Video Interpolation in the Wild
paper | dataset&code

[9] Towards Rolling Shutter Correction and Deblurring in Dynamic Scenes
paper | dataset&code

[8] UAV-Human: A Large Benchmark for Human Behavior Understanding with Unmanned Aerial Vehicles
paper

[7] Visual Semantic Role Labeling for Video Understanding
paper | dataset&code

[6] Face Forensics in the Wild (face forgery dataset)
paper | dataset&code

[5] Benchmarking Representation Learning for Natural World Image Collections (natural image classification)
paper

[4] Sewer-ML: A Multi-Label Sewer Defect Classification Dataset and Benchmark
paper | project&dataset

[3] 3DCaricShop: A Dataset and A Baseline Method for Single-view 3D Caricature Face Reconstruction
paper | project

[2] Towards Semantic Segmentation of Urban-Scale 3D Point Clouds: A Dataset, Benchmarks and Challenges
paper | code

[1] Re-labeling ImageNet: from Single to Multi-Labels, from Global to Localized Labels
paper | code

Active Learning

[3] Vab-AL: Incorporating Class Imbalance and Difficulty with Variational Bayes for Active Learning
paper

[2] Multiple Instance Active Learning for Object Detection
paper | code

[1] Sequential Graph Convolutional Network for Active Learning
paper

Few-shot Learning/Zero-shot Learning

[10] Learning Graph Embeddings for Compositional Zero-shot Learning
paper | code

[9] Self-Guided and Cross-Guided Learning for Few-Shot Segmentation
paper

[8] Contrastive Embedding for Generalized Zero-Shot Learning
paper | code

[7] Learning Dynamic Alignment via Meta-filter for Few-shot Learning
paper

[6] Goal-Oriented Gaze Estimation for Zero-Shot Learning
paper | code

[5] Few-Shot Segmentation Without Meta-Learning: A Good Transductive Inference Is All You Need?
paper | code

[4] Counterfactual Zero-Shot and Open-Set Visual Recognition
paper | code

[3] Semantic Relation Reasoning for Shot-Stable Few-Shot Object Detection
paper

[2] Few-shot Open-set Recognition by Transformation Consistency

[1] Exploring Complementary Strengths of Invariant and Equivariant Representations for Few-Shot Learning
paper

Continual Learning/Life-long Learning

[4] Rectification-based Knowledge Retention for Continual Learning
paper

[3] Rainbow Memory: Continual Learning with a Memory of Diverse Samples
paper | code

[2] Efficient Feature Transformations for Discriminative and Generative Continual Learning
paper

[1] Learning the Superpixel in a Non-iterative and Lifelong Manner
paper

Scene Graph

Scene Graph Generation

[4] Bipartite Graph Network with Adaptive Message Passing for Unbiased Scene Graph Generation
paper

[3] Fully Convolutional Scene Graph Generation
paper

[2] Probabilistic Modeling of Semantic Ambiguity for Scene Graph Generation
paper

[1] Exploiting Edge-Oriented Reasoning for 3D Point-based Scene Graph Analysis
paper

Scene Graph Prediction

[1] SceneGraphFusion: Incremental 3D Scene Graph Prediction from RGB-D Sequences
paper

Scene Graph Understanding

[4] Semantic Scene Completion via Integrating Instances and Scene in-the-Loop
paper | code

[3] 3D-to-2D Distillation for Indoor Scene Parsing
paper

[2] Bidirectional Projection Network for Cross Dimension Scene Understanding
paper | code

[1] Monte Carlo Scene Search for 3D Scene Understanding
paper

Visual Localization

[1] LoFTR: Detector-Free Local Feature Matching with Transformers
paper | project

Visual Reasoning/VQA

[9] Found a Reason for me? Weakly-supervised Grounded Visual Question Answering using Capsules
paper

[8] Bridge to Answer: Structure-aware Graph Interaction Network for Video Question Answering
paper

[7] PQA: Perceptual Question Answering
paper

[6] Domain-robust VQA with diverse datasets and methods but no target labels
paper

[5] AGQA: A Benchmark for Compositional Spatio-Temporal Reasoning
paper

[4] Abstract Spatial-Temporal Reasoning via Probabilistic Abduction and Execution
paper | project | supplementary

[3] ACRE: Abstract Causal REasoning Beyond Covariation
paper | project | supplementary

[2] TrafficQA: A Question Answering Benchmark and an Efficient Network for Video Reasoning over Traffic Events
paper | project

[1] Transformation Driven Visual Reasoning
paper | code | project

Image Classification

[5] Benchmarking Representation Learning for Natural World Image Collections
paper

[4] Sewer-ML: A Multi-Label Sewer Defect Classification Dataset and Benchmark
paper | project&dataset

[3] Contrastive Learning based Hybrid Networks for Long-Tailed Image Classification
paper

[2] PML: Progressive Margin Loss for Long-tailed Age Classification
paper

[1] A Realistic Evaluation of Semi-Supervised Learning for Fine-Grained Classification
paper

Transfer Learning/Domain Adaptation

[20] Visualizing Adapted Knowledge in Domain Transfer
paper | code

[19] Instance Level Affinity-Based Transfer for Unsupervised Domain Adaptation
paper | code

[18] Unsupervised Multi-source Domain Adaptation Without Access to Source Data
paper

[17] Curriculum Graph Co-Teaching for Multi-Target Domain Adaptation
paper

[16] Divergence Optimization for Noisy Universal Domain Adaptation
paper

[15] Prototypical Cross-domain Self-supervised Learning for Few-shot Unsupervised Domain Adaptation
paper | project

[14] Progressive Domain Expansion Network for Single Domain Generalization
paper

[13] Dynamic Domain Adaptation for Efficient Inference
paper

[12] Adaptive Methods for Real-World Domain Generalization
paper

[11] OTCE: A Transferability Metric for Cross-Domain Cross-Task Representations
paper

[10] DRANet: Disentangling Representation and Adaptation Networks for Unsupervised Cross-Domain Adaptation
paper

[9] MetaAlign: Coordinating Domain Alignment and Classification for Unsupervised Domain Adaptation
paper

[8] Transferable Semantic Augmentation for Domain Adaptation
paper | code

[7] Dynamic Transfer for Multi-Source Domain Adaptation
paper

[6] Semi-supervised Domain Adaptation based on Dual-level Domain Mixing for Semantic Segmentation
paper

[5] Multi-Source Domain Adaptation with Collaborative Learning for Semantic Segmentation
paper

[4] Continual Adaptation of Visual Representations via Domain Randomization and Meta-learning
paper

[3] Domain Generalization via Inference-time Label-Preserving Target Projections
paper

[2] MetaSCI: Scalable and Adaptive Reconstruction for Video Compressive Sensing
paper | code

[1] FSDR: Frequency Space Domain Randomization for Domain Generalization
paper

Metric Learning

[4] MetricOpt: Learning to Optimize Black-Box Evaluation Metrics
paper

[3] Noise-resistant Deep Metric Learning with Ranking-based Instance Selection
paper

[2] Embedding Transfer with Label Relaxation for Improved Metric Learning
paper

[1] Dynamic Metric Learning: Towards a Scalable Metric Space to Accommodate Multiple Semantic Scales
paper | code

Contrastive Learning

[3] Contrastive Learning based Hybrid Networks for Long-Tailed Image Classification
paper

[2] AdCo: Adversarial Contrast for Efficient Learning of Unsupervised Representations from Self-Trained Negative Adversaries
paper | code | explainer: AdCo, adversarial contrastive learning

[1] Fine-grained Angular Contrastive Learning with Coarse Labels
paper

Incremental Learning

[4] Few-Shot Incremental Learning with Continually Evolved Classifiers
paper

[3] DER: Dynamically Expandable Representation for Class Incremental Learning
paper

[2] Semantic-aware Knowledge Distillation for Few-Shot Class-Incremental Learning
paper

[1] On Learning the Geodesic Path for Incremental Learning
paper

Reinforcement Learning

[2] Unsupervised Visual Attention and Invariance for Reinforcement Learning
paper

[1] Unsupervised Learning for Robust Fitting: A Reinforcement Learning Approach
paper

Meta Learning

[3] Faster Meta Update Strategy for Noise-Robust Deep Learning
paper

[2] Meta-Mining Discriminative Samples for Kinship Verification
paper

[1] MetaSAug: Meta Semantic Augmentation for Long-Tailed Visual Recognition
paper

Uncategorized

Stochastic Image-to-Video Synthesis using cINNs
paper | project

NeRD: Neural 3D Reflection Symmetry Detector
paper | code

Function4D: Real-time Human Volumetric Capture from Very Sparse Consumer RGBD Sensors
paper | project | video

AutoFlow: Learning a Better Training Set for Optical Flow
paper | code

Shot Contrastive Self-Supervised Learning for Scene Boundary Detection
paper

Practical Wide-Angle Portraits Correction with Deep Structured Models
paper

Deep Lucas-Kanade Homography for Multimodal Image Alignment
paper | code

Hierarchical Motion Understanding via Motion Programs (human motion understanding)
paper | project

ManipulaTHOR: A Framework for Visual Object Manipulation
paper

Learning To Count Everything (visual counting)
paper | dataset&code

Ego-Exo: Transferring Visual Representations from Third-person to First-person Videos
paper

Harmonious Semantic Line Detection via Maximal Weight Clique Selection
paper | code

Neural Camera Simulators
paper

All Labels Are Not Created Equal: Enhancing Semi-supervision via Label Grouping and Co-training
paper | code

Shape and Material Capture at Home
paper | project

SOLD2: Self-supervised Occlusion-aware Line Description and Detection (image matching)
paper | code

Progressive Temporal Feature Alignment Network for Video Inpainting
paper | code

A Decomposition Model for Stereo Matching
paper

CFNet: Cascade and Fused Cost Volume for Robust Stereo Matching
paper | code

SMD-Nets: Stereo Mixture Density Networks (stereo matching)
paper | project

De-rendering the World's Revolutionary Artefacts
paper | project

Learning Triadic Belief Dynamics in Nonverbal Communication from Videos (video summarization)
paper

Beyond Short Clips: End-to-End Video-Level Learning with Collaborative Memories
paper

Passive Inter-Photon Imaging
paper

PhySG: Inverse Rendering with Spherical Gaussians for Physics-based Material Editing and Relighting
paper | project

Learning Camera Localization via Dense Scene Matching
paper | code

SimPLE: Similar Pseudo Label Exploitation for Semi-Supervised Classification
paper | code

Online Learning of a Probabilistic and Adaptive Scene Representation
paper

Embracing Uncertainty: Decoupling and De-bias for Robust Temporal Grounding
paper

Model-Contrastive Federated Learning
paper

Repopulating Street Scenes
paper

Visual Room Rearrangement
paper

Tuning IR-cut Filter for Illumination-aware Spectral Reconstruction from RGB
paper

Video Rescaling Networks with Joint Optimization Strategies for Downscaling and Upscaling
paper

Bilevel Online Adaptation for Out-of-Domain Human Mesh Reconstruction
paper | project

Picasso: A CUDA-based Library for Deep Learning over 3D Meshes (mesh simplification)
paper | library

Cloud2Curve: Generation and Vectorization of Parametric Sketches
paper

Learning Probabilistic Ordinal Embeddings for Uncertainty-Aware Regression (uncertainty learning)
paper

SSLayout360: Semi-Supervised Indoor Layout Estimation from 360° Panorama
paper

Convex Online Video Frame Subset Selection using Multiple Criteria for Data Efficient Autonomous Driving
paper

Scene-Intuitive Agent for Remote Embodied Visual Grounding
paper

Relation-aware Instance Refinement for Weakly Supervised Visual Grounding
paper | code

Context-aware Biaffine Localizing Network for Temporal Sentence Grounding
paper

Dynamic Face Video Segmentation via Reinforcement Learning
paper | code

Back to the Feature: Learning Robust Camera Localization from Pixels to Pose
paper | code

Rotation Coordinate Descent for Fast Globally Optimal Rotation Averaging (optimization)
paper

Affect2MM: Affective Analysis of Multimedia Content Using Emotion Causality
paper

Deep Graph Matching under Quadratic Constraint
paper

Deep Gaussian Scale Mixture Prior for Spectral Compressive Imaging
paper | code

Limitations of Post-Hoc Feature Alignment for Robustness
paper

Consensus Maximisation Using Influences of Monotone Boolean Functions
paper

Nutrition5k: Towards Automatic Nutritional Understanding of Generic Food
paper

Structured Scene Memory for Vision-Language Navigation
paper | code

Learning Asynchronous and Sparse Human-Object Interaction in Videos
paper

Self-supervised Geometric Perception
paper

Quantifying Explainers of Graph Neural Networks in Computational Pathology
paper

Exploring Data-Efficient 3D Scene Understanding with Contrastive Scene Contexts
paper | project | video

Data-Free Model Extraction
paper

Patch-NetVLAD: Multi-Scale Fusion of Locally-Global Descriptors for Place Recognition
paper | code

Right for the Right Concept: Revising Neuro-Symbolic Concepts by Interacting with their Explanations
paper

Hierarchical and Partially Observable Goal-driven Policy Learning with Goals Relational Graph
paper

Domain Generalization via Inference-time Label-Preserving Target Projections
paper

DeRF: Decomposed Radiance Fields
project

Multi-Objective Interpolation Training for Robustness to Label Noise
paper | code

CDFI: Compression-Driven Network Design for Frame Interpolation
paper | code

FLAVR: Flow-Agnostic Video Representations for Fast Frame Interpolation
paper | code | project

Deep Animation Video Interpolation in the Wild
paper | code&dataset

Probabilistic Embeddings for Cross-Modal Retrieval
paper

Self-supervised Simultaneous Multi-Step Prediction of Road Dynamics and Cost Map

IIRC: Incremental Implicitly-Refined Classification
paper | project

Fair Attribute Classification through Latent Space De-biasing
paper | code | project

Information-Theoretic Segmentation by Inpainting Error Maximization
paper

Kaleido-BERT: Vision-Language Pre-training on Fashion Domain
paper | code

UC2: Universal Cross-lingual Cross-modal Vision-and-Language Pretraining (video-language learning)

Less is More: CLIPBERT for Video-and-Language Learning via Sparse Sampling
paper | code

D-NeRF: Neural Radiance Fields for Dynamic Scenes
paper | project

Weakly Supervised Learning of Rigid 3D Scene Flow
paper | code | project

2. CVPR2021 Oral

[94] MOS: Towards Scaling Out-of-distribution Detection for Large Semantic Space
paper

[93] Function4D: Real-time Human Volumetric Capture from Very Sparse Consumer RGBD Sensors
paper | project | video

[92] Deep Polarization Imaging for 3D shape and SVBRDF Acquisition
paper

[91] GeoSim: Realistic Video Simulation via Geometry-Aware Composition for Self-Driving
paper

[90] GIRAFFE: Representing Scenes as Compositional Generative Neural Feature Fields
paper | project

[89] DriveGAN: Towards a Controllable High-Quality Neural Simulation
paper

[88] HOTR: End-to-End Human-Object Interaction Detection with Transformers
paper

[87] FrameExit: Conditional Early Exiting for Efficient Video Recognition (video understanding)
paper

[86] Unsupervised Multi-Source Domain Adaptation for Person Re-Identification
paper

[85] Towards Good Practices for Efficiently Annotating Large-Scale Image Classification Datasets
paper | project

[84] KeypointDeformer: Unsupervised 3D Keypoint Discovery for Shape Control
paper | project

[85] ManipulaTHOR: A Framework for Visual Object Manipulation
paper

[84] Variational Relational Point Completion Network
paper | project

[83] Style-Aware Normalized Loss for Improving Arbitrary Style Transfer
paper

[82] Guided Interactive Video Object Segmentation Using Reliability-Based Attention Maps
paper | code

[81] MetricOpt: Learning to Optimize Black-Box Evaluation Metrics
paper

[80] LAFEAT: Piercing Through Adversarial Defenses with Latent Features
paper

[79] Single-view robot pose and joint angle estimation via render & compare
paper | code

[78] Temporal Query Networks for Fine-grained Video Understanding
paper | project

[77] Divide-and-Conquer for Lane-Aware Diverse Trajectory Prediction
paper

[76] Fusing the Old with the New: Learning Relative Camera Pose with Geometry-Guided Uncertainty
paper

[75] DatasetGAN: Efficient Labeled Data Factory with Minimal Human Effort
paper

[74] Pixel Codec Avatars
paper

[73] CodedStereo: Learned Phase Masks for Large Depth-of-field Stereo
paper

[72] Rethinking and Improving the Robustness of Image Style Transfer
paper

[71] Simpler Certified Radius Maximization by Propagating Covariances
paper | video

[70] Global Transport for Fluid Reconstruction with Learned Self-Supervision
paper | code

[69] SOLD2: Self-supervised Occlusion-aware Line Description and Detection (image matching)
paper | code

[68] InverseForm: A Loss Function for Structured Boundary-Aware Segmentation
paper

[67] Learning Triadic Belief Dynamics in Nonverbal Communication from Videos (video summarization)
paper

[66] Adversarial Robustness under Long-Tailed Distribution
paper | code

[65] S2R-DepthNet: Learning a Generalizable Depth-specific Structural Representation
paper

[64] Video Prediction Recalling Long-term Motion Context via Memory Alignment Learning
paper

[63] Passive Inter-Photon Imaging
paper

[62] Jigsaw Clustering for Unsupervised Visual Representation Learning
paper | code

[61] Reconstructing 3D Human Pose by Watching Humans in the Mirror
paper | project

[60] Towards Evaluating and Training Verifiably Robust Neural Networks
paper | code

[59] LED2-Net: Monocular 360 Layout Estimation via Differentiable Depth Rendering
paper | project

[58] A Realistic Evaluation of Semi-Supervised Learning for Fine-Grained Classification
paper

[57] SimPoE: Simulated Character Control for 3D Human Pose Estimation
paper | project

[56] DER: Dynamically Expandable Representation for Class Incremental Learning
paper

[55] Convolutional Hough Matching Networks
paper

[54] A Closer Look at Fourier Spectrum Discrepancies for CNN-generated Images Detection
paper | code

[53] DiNTS: Differentiable Neural Network Topology Search for 3D Medical Image Segmentation
paper

[52] Face Forensics in the Wild (face forgery dataset)
paper

[51] Fully Convolutional Scene Graph Generation
paper

[50] Visual Room Rearrangement
paper

[49] Adaptive Methods for Real-World Domain Generalization
paper

[48] Tuning IR-cut Filter for Illumination-aware Spectral Reconstruction from RGB
paper

[47] Learning Placeholders for Open-Set Recognition
paper

[46] Zero-shot Adversarial Quantization
paper | code

[45] POSEFusion: Pose-guided Selective Fusion for Single-view Human Volumetric Capture
paper | project

[44] RobustNet: Improving Domain Generalization in Urban-Scene Segmentation via Instance Selective Whitening
paper | code

[43] Bidirectional Projection Network for Cross Dimension Scene Understanding
paper | code

[42] Dynamic Slimmable Network
paper | code

[41] Scaling Local Self-Attention For Parameter Efficient Visual Backbones
paper

[40] PGT: A Progressive Method for Training Models on Long Videos
paper | code

[39] Brain Image Synthesis with Unsupervised Multivariate Canonical CSCℓ4Net
paper

[38] Transformer Meets Tracker: Exploiting Temporal Context for Robust Visual Tracking
paper

[37] Deep RGB-D Saliency Detection with Depth-Sensitive Attention and Automatic Multi-Modal Fusion
paper

[36] Rotation Coordinate Descent for Fast Globally Optimal Rotation Averaging (optimization)
paper

[35] MagFace: A Universal Representation for Face Recognition and Quality Assessment
paper | code

[34] CoMoGAN: continuous model-guided image-to-image translation
paper | code

[33] FS-Net: Fast Shape-based Network for Category-Level 6D Object Pose Estimation with Decoupled Rotation Mechanism
paper

[32] Knowledge Evolution in Neural Networks
paper | code

[31] NeX: Real-time View Synthesis with Neural Basis Expansion
paper | code

[30] ForgeryNet: A Versatile Benchmark for Comprehensive Forgery Analysis
paper | code

[29] Dense Contrastive Learning for Self-Supervised Visual Pre-Training
paper | code

[28] Consensus Maximisation Using Influences of Monotone Boolean Functions
paper

[27] Differentiable Multi-Granularity Human Representation Learning for Instance-Aware Human Semantic Parsing
paper | code

[26] Discovering Hidden Physics Behind Transport Dynamics
paper

[25] Learning Continuous Image Representation with Local Implicit Image Function
paper | code | video | project | explainer: Truly stepless zoom! Striking 30x interpolation results: NVIDIA and others open-source LIIF, a clever new approach to image super-resolution

[24] UP-DETR: Unsupervised Pre-training for Object Detection with Transformers
paper | code
explainer: Unsupervised pre-training for detectors

[23] Self-supervised Geometric Perception
paper

[22] DeepTag: An Unsupervised Deep Learning Method for Motion Tracking on Cardiac Tagging Magnetic Resonance Images
paper

[21] Modeling Multi-Label Action Dependencies for Temporal Action Localization
paper

[20] HPS: localizing and tracking people in large 3D scenes from wearable sensors

[19] Real-Time High Resolution Background Matting
paper | code | project | video

[18] Exploring Data-Efficient 3D Scene Understanding with Contrastive Scene Contexts
paper | project | video

[17] Robust Neural Routing Through Space Partitions for Camera Relocalization in Dynamic Indoor Environments
paper | project

[16] MultiBodySync: Multi-Body Segmentation and Motion Estimation via 3D Scan Synchronization
paper | code

[15] Categorical Depth Distribution Network for Monocular 3D Object Detection
paper

[14] PatchmatchNet: Learned Multi-View Patchmatch Stereo
paper | code

[13] Continual Adaptation of Visual Representations via Domain Randomization and Meta-learning
paper

[12] Single-Stage Instance Shadow Detection with Bidirectional Relation Learning

[11] Neural Geometric Level of Detail: Real-time Rendering with Implicit 3D Surfaces
paper | code | project

[9] PREDATOR: Registration of 3D Point Clouds with Low Overlap
paper | code | project

[8] Domain Generalization via Inference-time Label-Preserving Target Projections
paper

[7] Neural Deformation Graphs for Globally-consistent Non-rigid Reconstruction
paper | project | video

[6] Fine-grained Angular Contrastive Learning with Coarse Labels
paper

[5] Less is More: CLIPBERT for Video-and-Language Learning via Sparse Sampling
paper | code

[4] Cross-View Regularization for Domain Adaptive Panoptic Segmentation
paper

[3] Image-to-image Translation via Hierarchical Style Disentanglement
paper | code

[2] Towards Open World Object Detection
paper | code

[1] End-to-End Video Instance Segmentation with Transformers
paper

CVPR2021 Paper Explainers

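[13] Unsupervised pre-training for detectors (CVPR2021 Oral)
Unsupervised pre-trained models have achieved breakthroughs in both NLP (BERT, GPT, XLNet) and CV (MoCo, SimCLR, BYOL). For unsupervised (self-supervised) pre-training, the key is designing a sensible pretext task, such as BERT's masked language model or MoCo's instance discrimination. Each constructs a "label" from the samples without supervision, pre-trains the model on it, and thereby improves downstream performance. For DETR, the question follows naturally: if the CNN can be pre-trained without supervision, can the transformer be pre-trained without supervision too?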
paper | code

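[12] GFLV2: an honest object detection technique, gains at no cost!
This work is the first in detection to use statistics of bounding-box uncertainty to efficiently guide localization quality estimation. It improves one-stage detectors at essentially no cost (in both training and testing), with gains of 1 to 2 AP points.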
paper | code
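The idea can be illustrated with a minimal sketch, assuming each box side is predicted as a discrete probability distribution over bins: a sharp distribution suggests a confident side, a flat one suggests ambiguity. The helper name `distribution_stats` and the choice of top-k values plus their mean as the statistics are assumptions made for illustration; the learned mapping from these statistics to a quality score is left out.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def distribution_stats(side_logits, k=4):
    """Hypothetical helper: top-k probabilities of one box side plus their mean."""
    probs = softmax(side_logits)
    topk = sorted(probs, reverse=True)[:k]
    return topk + [sum(topk) / k]

# A sharply peaked side versus a completely flat one.
sharp = distribution_stats([0.0, 0.0, 9.0, 0.0, 0.0, 0.0, 0.0, 0.0])
flat = distribution_stats([1.0] * 8)
```

Here `sharp[0]` is close to 1 while `flat[0]` is 1/8, so the statistics alone already separate a confident side from an uncertain one.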

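[11] DCL: a new method for rotated object detection
Densely Coded Labels (DCL) is an optimized version of Circular Smooth Label (CSL). DCL improves on two fronts: the overly heavy prediction layer, and the poor handling of square-like objects.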
paper | code
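To see why a dense code makes the prediction layer lighter, here is a small sketch under stated assumptions: angles are discretized into 180 one-degree bins, a classification-style label needs one output per bin, and a dense label needs only enough bits to index the bins. The function names are hypothetical, and Gray coding is used here as one example of a dense code in which neighbouring angles differ by a single bit.

```python
import math

def per_bin_label_length(num_bins):
    """One output per angle bin, as in a classification-style angle label."""
    return num_bins

def dense_code_length(num_bins):
    """Number of bits needed to index every bin with a dense binary code."""
    return math.ceil(math.log2(num_bins))

def gray_code(index, length):
    """Gray-coded bit list (most significant bit first) for a bin index."""
    g = index ^ (index >> 1)
    return [(g >> i) & 1 for i in reversed(range(length))]

num_bins = 180                      # assumed 1-degree granularity
bits = dense_code_length(num_bins)  # far fewer outputs than num_bins
label_45 = gray_code(45, bits)
label_46 = gray_code(46, bits)
```

With these assumptions the angle head shrinks from 180 outputs to 8, and adjacent angle bins map to codes that differ in exactly one position.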

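[10] Hierarchical style disentanglement: multi-attribute face manipulation is finally controllable (CVPR2021 Oral)
Since CycleGAN was proposed, the two biggest problems in image translation have been scalability (handling multiple manipulations at once) and diversity (generating different results). Yet no method has managed to deliver both while keeping the manipulation as intended. In face attribute manipulation, for example, we want to add bangs to a face but the hair color or background changes as well; or we want to add glasses and the gender and age end up changing too. HiSD was designed to solve these problems, and it also supports either generating such styles from noise or extracting them from images.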
paper | code

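[9] Transformer scores again! Tops several low-level task leaderboards: Peking University, Huawei and others jointly propose the pre-trained model IPT
This work studies low-level computer vision tasks (such as denoising, super-resolution, and deraining) and proposes a new pre-trained model, IPT (image processing transformer). To make the most of the transformer, the authors use the well-known ImageNet to build a large number of degraded image pairs, then train IPT (which has multiple heads and multiple tails to fit different degradation models) on these pairs. They also introduce contrastive learning to better adapt to different image processing tasks. After fine-tuning, the pre-trained model can be applied effectively to different tasks. With just one pre-trained model, IPT outperforms SOTA methods on multiple low-level benchmarks.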
paper
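Building degraded image pairs from clean images can be sketched as follows. This is only an illustration under assumptions: images are small grayscale grids stored as nested lists, and the two degradations shown (additive Gaussian noise and 2x average-pool downsampling) and all function names are stand-ins chosen for the example, not the pipeline used by the authors.

```python
import random

def add_noise(img, sigma, rng):
    """Degrade by adding Gaussian noise to every pixel."""
    return [[px + rng.gauss(0.0, sigma) for px in row] for row in img]

def downsample2x(img):
    """Degrade by averaging each 2x2 block into one pixel."""
    h, w = len(img), len(img[0])
    return [[(img[y][x] + img[y][x + 1] + img[y + 1][x] + img[y + 1][x + 1]) / 4.0
             for x in range(0, w - 1, 2)]
            for y in range(0, h - 1, 2)]

def make_pairs(clean, rng):
    """Return (degraded, clean) training pairs keyed by task name."""
    return {
        "denoise": (add_noise(clean, 0.1, rng), clean),
        "super_resolution": (downsample2x(clean), clean),
    }

rng = random.Random(0)
clean = [[float((x + y) % 2) for x in range(4)] for y in range(4)]  # checkerboard
pairs = make_pairs(clean, rng)
```

Each task gets its own degraded input while sharing the same clean target, which is what lets a single backbone with task-specific heads and tails be trained on all of them.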

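[8] Truly stepless zoom! Striking 30x interpolation results: NVIDIA and others open-source LIIF, a clever new approach to image super-resolution
A novel continuous image representation. It builds a clever link between discrete 2D images and continuous 2D images. Thanks to this "continuous representation", an image can be resized to any resolution, achieving "stepless zoom" in the true sense, even up to 30x magnification.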
paper | code | video | project
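A rough sketch of what querying a continuous image can look like, assuming the image is stored as a small grid of latent values and a decoder maps (latent value, offset from the cell centre) to an output value. The names `query` and `render` and the trivial decoder are hypothetical; in practice the decoder would be a learned network.

```python
def query(features, x, y, decode):
    """Evaluate the image at continuous coordinates x, y in [0, 1)."""
    h, w = len(features), len(features[0])
    col = min(int(x * w), w - 1)
    row = min(int(y * h), h - 1)
    cx, cy = (col + 0.5) / w, (row + 0.5) / h  # centre of the nearest cell
    return decode(features[row][col], (x - cx, y - cy))

def render(features, out_h, out_w, decode):
    """Sample the continuous image on an output grid of any size."""
    return [[query(features, (j + 0.5) / out_w, (i + 0.5) / out_h, decode)
             for j in range(out_w)]
            for i in range(out_h)]

# Stand-in decoder that ignores the offset; a learned one would use it.
nearest = lambda z, offset: z

features = [[0.0, 1.0], [2.0, 3.0]]
image = render(features, 4, 6, nearest)
```

Because `render` takes the output size as an argument, the same representation can be sampled at 4x6, 60x60, or any other resolution without retraining.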

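[7] AdCo: adversarial contrastive learning
In self-supervised learning, contrastive learning has already shown clear advantages on downstream classification and detection tasks. How to make full use of negative samples to improve learning efficiency and quality has remained a direction worth exploring. This paper is the first to propose learning the negative samples directly, end to end, in an adversarial manner, reaching SOTA on ImageNet and on downstream tasks. With only 8,192 negative samples (one eighth of what MoCo v2 uses), AdCo reaches the same accuracy. Moreover, these directly trainable negatives still achieve comparable results when their parameter count matches that of the prediction MLP in BYOL. This suggests that, in the era of self-supervised learning, making the negatives learnable lets contrastive learning keep its advantages of efficient learning, stable training, and high accuracy.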
paper | code
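The adversarial part can be sketched in a few lines: the encoder would minimize the contrastive loss, while the negatives take a gradient ascent step on the same loss. The sketch below shows only the negative update for one query, using an InfoNCE-style loss; the function names, temperature, and step size are assumptions for illustration.

```python
import math

def normalize(v):
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def contrastive_loss(q, pos, negs, tau=0.2):
    """InfoNCE-style loss for one query, one positive, and a set of negatives."""
    logits = [dot(q, pos) / tau] + [dot(q, n) / tau for n in negs]
    m = max(logits)
    log_z = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_z - logits[0]

def ascend_negatives(q, pos, negs, tau=0.2, lr=0.5):
    """One gradient ASCENT step on the negatives, followed by re-normalization."""
    logits = [dot(q, pos) / tau] + [dot(q, n) / tau for n in negs]
    m = max(logits)
    exps = [math.exp(l - m) for l in logits]
    z = sum(exps)
    updated = []
    for n, e in zip(negs, exps[1:]):
        p = e / z  # softmax weight; d(loss)/d(n) = p * q / tau
        updated.append(normalize([ni + lr * p * qi / tau for ni, qi in zip(n, q)]))
    return updated

q = normalize([1.0, 0.2, 0.0])
pos = normalize([0.9, 0.3, 0.1])
negs = [normalize([0.0, 1.0, 0.0]), normalize([-1.0, 0.0, 1.0])]
loss_before = contrastive_loss(q, pos, negs)
loss_after = contrastive_loss(q, pos, ascend_negatives(q, pos, negs))
```

After the step the negatives sit closer to the query, so the loss goes up; harder negatives are exactly what the encoder then has to learn to push away.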

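[6] Same SR quality, 50% less computation: ClassSR for accelerating image super-resolution
This work explores how to accelerate super-resolution networks in the low-level vision domain. It combines classification with super-resolution: depending on how hard each sub-patch is to restore, it adaptively picks a suitable SR branch to reduce overall computation. Flat regions that are easy to restore go to a low-complexity branch, while textured regions that are hard to restore go to a high-complexity branch. Without degrading SR quality, the method saves up to 50% of the computation.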
paper
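The routing idea can be sketched as below. Note the assumptions: pixel variance is used as a hand-made stand-in for restoration difficulty (the method described above uses a learned classifier), the thresholds are arbitrary, and the branch names are hypothetical labels rather than real networks.

```python
def split_patches(img, size):
    """Cut a 2D image (nested lists) into non-overlapping size x size patches."""
    return [[row[x:x + size] for row in img[y:y + size]]
            for y in range(0, len(img), size)
            for x in range(0, len(img[0]), size)]

def patch_variance(patch):
    vals = [v for row in patch for v in row]
    mean = sum(vals) / len(vals)
    return sum((v - mean) ** 2 for v in vals) / len(vals)

def route(patch, thresholds=(0.01, 0.1)):
    """Pick a branch by difficulty; variance is a stand-in for a learned classifier."""
    v = patch_variance(patch)
    if v < thresholds[0]:
        return "light"
    if v < thresholds[1]:
        return "medium"
    return "heavy"

# Left half is flat, right half is a high-contrast stripe pattern.
image = [[0.5] * 4 + [0.0, 1.0, 0.0, 1.0] for _ in range(4)]
routes = [route(p) for p in split_patches(image, 4)]
```

The flat patch is sent to the cheap branch and the textured patch to the expensive one, which is where the computation savings come from.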

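[5] MotionRNN: a general video prediction model for complex spatiotemporal motion
Video prediction is widely used in tasks such as precipitation nowcasting, traffic flow prediction, and visual planning for robots. Real-world motion, however, is extremely complex and keeps changing: human motion involves changes of direction, speed, and limb movement, and radar echoes show clouds forming, dissipating, shifting, and deforming. Such complex spatiotemporal variation makes accurate prediction of future motion very challenging.
For complex spatiotemporal motion, we observe that real-world motion can be decomposed in space and time into an overall motion trend and a transient variation. Based on this we propose MotionRNN, which models the motion trend and the transient variation in a unified way. As a general video prediction model, MotionRNN is also flexible: it can be combined with many RNN-based spatiotemporal prediction models and consistently improves their ability to handle complex spatiotemporal motion.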
paper

【4】Statistical Texture Learning
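This work approaches visual learning through the analysis and enhancement of low-level texture detail and validates the idea on segmentation tasks; it is intuitive, well-motivated, and effective. Drawing inspiration from traditional image analysis, we build a Statistical Texture Learning framework that learns low-level texture within CNN architectures (analysis plus enhancement) and delivers substantial performance gains.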
paper

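【3】Anime faces with adjustable facial features and art style: Bolei Zhou's team controls GANs with an unsupervised method (CVPR2021 Oral)
GANs can now not only draw anime girls but also precisely adjust facial features, expression, pose, and drawing style, while keeping the other factors as unchanged as possible when one factor is adjusted. SeFa works with common GAN models such as PGGAN, StyleGAN, BigGAN, and StyleGAN2. It is effective not only on anime faces; it can even steer cats to face up, down, left, or right.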
paper | code | Colab

【2】Inception convolution
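Our work recently accepted to CVPR2021. It uses optimization techniques to search for new convolution patterns, aiming to find a simple, deployment-friendly convolution that improves baselines across downstream tasks.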
paper | code

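【1】RepVGG: minimal architecture, SOTA performance, making VGG-style models great again (CVPR-2021)
Our recent work RepVGG uses structural re-parameterization to realize a VGG-style, single-path, minimal architecture built from 3x3 convolutions all the way through. It reaches SOTA-level speed and performance, with over 80% accuracy on ImageNet, and has been accepted to CVPR-2021. Without NAS, attention, novel activation functions, or even branch structures, using only 3x3 convolutions and ReLU, it still reaches SOTA performance.
paper | open-source pretrained models and code (PyTorch) | MegEngine version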
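
Structural re-parameterization, as named in the RepVGG entry, can be illustrated on a single channel. The sketch assumes a training-time block with three parallel branches (a 3x3 convolution, a 1x1 convolution, and an identity shortcut) and shows that they collapse into one 3x3 kernel. Batch normalization folding and multi-channel kernels are left out for brevity, so this is a simplified illustration rather than the released implementation.

```python
def conv3x3(image, kernel):
    """Zero-padded 'same' 3x3 cross-correlation on a single-channel 2D list."""
    h, w = len(image), len(image[0])
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            acc = 0.0
            for di in range(3):
                for dj in range(3):
                    y, x = i + di - 1, j + dj - 1
                    if 0 <= y < h and 0 <= x < w:
                        acc += kernel[di][dj] * image[y][x]
            out[i][j] = acc
    return out

def three_branch(image, k3, k1):
    """Training-time form: 3x3 conv + 1x1 conv (scalar k1) + identity."""
    y3 = conv3x3(image, k3)
    return [[y3[i][j] + k1 * image[i][j] + image[i][j]
             for j in range(len(image[0]))] for i in range(len(image))]

def merge_branches(k3, k1):
    """Fold the 1x1 weight and the identity into the center of the 3x3 kernel."""
    merged = [row[:] for row in k3]
    merged[1][1] += k1 + 1.0
    return merged

image = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]]
k3 = [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6], [0.7, 0.8, 0.9]]
k1 = 2.0

multi_branch_out = three_branch(image, k3, k1)
single_path_out = conv3x3(image, merge_branches(k3, k1))
```

Both outputs agree up to floating-point rounding, so the branches are only needed during training and the deployed model can run as a plain stack of 3x3 convolutions.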

4. CVPR2021 Workshop

[81] EDPN: Enhanced Deep Pyramid Network for Blurry Image Restoration
paper

[80] ChaLearn LAP Large Scale Signer Independent Isolated Sign Language Recognition Challenge: Design, Results and Future Research
paper

[79] Rethinking of Radar's Role: A Camera-Radar Dataset and Systematic Annotator via Coordinate Alignment
paper

[78] Good Practices and A Strong Baseline for Traffic Anomaly Detection (AI City Challenge, 1st place)
paper | code

[77] Dynamic-OFA: Runtime DNN Architecture Switching for Performance Scaling on Heterogeneous Embedded Platforms
paper

[76] The iWildCam 2021 Competition Dataset
paper

[75] Pareto-Optimal Quantized ResNet Is Mostly 4-bit
paper | code

[74] Neural 3D Scene Compression via Model Compression
paper

[73] BasisNet: Two-stage Model Synthesis for Efficient Inference
paper

[72] Effectively Leveraging Attributes for Visual Similarity
paper

[71] Physically Inspired Dense Fusion Networks for Relighting
paper

[70] Feedback control of event cameras
paper

[69] EQFace: A Simple Explicit Quality Network for Face Recognition
paper | code

[68] S3Net: A Single Stream Structure for Depth Guided Image Relighting
paper

[67] Multi-modal Bifurcated Network for Depth Guided Image Relighting
paper

[66] Renofeation: A Simple Transfer Learning Method for Improved Adversarial Robustness
paper

[65] D-LEMA: Deep Learning Ensembles from Multiple Annotations -- Application to Skin Lesion Segmentation
paper

[64] Pseudo-IoU: Improving Label Assignment in Anchor-Free Object Detection
paper | code

[63] Cluster-driven Graph Federated Learning over Multiple Domains
paper

[62] Perceptual Image Quality Assessment with Transformers
paper | code

[61] CASSOD-Net: Cascaded and Separable Structures of Dilated Convolution for Embedded Vision Systems and Applications
paper

[60] NTIRE 2021 Challenge on Video Super-Resolution
paper

[59] NTIRE 2021 Challenge on Image Deblurring
paper

[58] Differentiable Event Stream Simulator for Non-Rigid 3D Tracking
paper | code

[57] Sign Segmentation with Changepoint-Modulated Pseudo-Labelling
paper

[56] Boosting Co-teaching with Compression Regularization for Label Noise
paper | project

[55] Towards Fair Federated Learning with Zero-Shot Data Augmentation (Data Augmentation)
paper

[54] Unsupervised Detection of Cancerous Regions in Histology Imagery using Image-to-Image Translation (Image Translation; Medical Imaging)
paper

[53] Width Transfer: On the (In)variance of Width Optimization
paper

[52] Three-stream network for enriched Action Recognition (Action Recognition)
paper

[51] Detecting and Matching Related Objects with One Proposal Multiple Predictions
paper | code

[50] The 5th AI City Challenge
paper

[49] Do All MobileNets Quantize Poorly? Gaining Insights into the Effect of Quantization on Depthwise Separable Convolutional Networks Through the Eyes of Multi-scale Distributional Dynamics (Model Compression; Mobile)
paper

[48] Multi-Scale Hourglass Hierarchical Fusion Network for Single Image Deraining (Image Deraining)
paper | code

[47] Class-Incremental Experience Replay for Continual Learning under Concept Drift (Incremental Learning)
paper

[46] SBNet: Segmentation-based Network for Natural Language-based Vehicle Search
paper

[45] Patch Shortcuts: Interpretable Proxy Models Efficiently Find Black-Box Vulnerabilities
paper

[44] Region-Adaptive Deformable Network for Image Quality Assessment
paper | code

[43] Network Space Search for Pareto-Efficient Spaces (Deep Learning Training)
paper

[42] A Strong Baseline for Vehicle Re-Identification (AI City)
paper | code

[41] Multi-task Learning with Attention for End-to-end Autonomous Driving (Autonomous Driving)
paper

[40] Perceptual Loss for Robust Unsupervised Homography Estimation
paper

[39] Table Tennis Stroke Recognition Using Two-Dimensional Human Pose Estimation (Human Pose Estimation)
paper

[38] Comparing Representations in Tracking for Event Camera-based SLAM
paper

[37] Shadow Neural Radiance Fields for Multi-view Satellite Photogrammetry (Remote Sensing)
paper

[36] Distill on the Go: Online knowledge distillation in self-supervised learning (Knowledge Distillation)
paper

[35] An Efficient Approach for Anomaly Detection in Traffic Videos (Video Anomaly Detection)
paper

[34] Class-Incremental Learning with Generative Classifiers (Incremental Learning)
paper

[33] Engineering Sketch Generation for Computer-Aided Design
paper

[32] IB-DRR: Incremental Learning with Information-Back Discrete Representation Replay (Incremental Learning)
paper

[31] Revisiting The Evaluation of Class Activation Mapping for Explainability: A Novel Metric and Experimental Analysis (Explainability)
paper

[30] GAN-Based Data Augmentation and Anonymization for Skin-Lesion Analysis: A Critical Review
paper

[29] Brittle Features May Help Anomaly Detection
paper

[28] I Only Have Eyes for You: The Impact of Masks On Convolutional-Based Facial Expression Recognition
paper

[27] Assessment of deep learning based blood pressure prediction from PPG and rPPG signals
paper

[26] A Two-branch Neural Network for Non-homogeneous Dehazing via Ensemble Learning
paper

[25] End-to-End Interactive Prediction and Planning with Optical Flow Distillation for Autonomous Driving
paper | project

[24] Reconsidering CO2 emissions from Computer Vision
paper

[23] On Training Sketch Recognizers for New Domains
paper

[22] Filtering Empty Camera Trap Images in Embedded Systems
paper

[21] Contrastive Learning Improves Model Robustness Under Label Noise
paper | code

[20] Restoration of Video Frames from a Single Blurred Image with Motion Understanding
paper

[19] LSPnet: A 2D Localization-oriented Spacecraft Pose Estimation Neural Network
paper

[18] Plants Don't Walk on the Street: Common-Sense Reasoning for Reliable Semantic Segmentation
paper

[17] Temporal Consistency Loss for High Resolution Textured and Clothed 3D Human Reconstruction from Monocular Video
paper

[16] A Mathematical Analysis of Learning Loss for Active Learning in Regression
paper

[15] Camera Calibration and Player Localization in SoccerNet-v2 and Investigation of their Representations for Action Spotting
paper

[14] DANICE: Domain adaptation without forgetting in neural image compression
paper

[13] OmniLayout: Room Layout Reconstruction from Indoor Spherical Panoramas
paper | code

[12] Dual Contrastive Learning for Unsupervised Image-to-Image Translation (Image Translation)
paper | code

[11] OmniFlow: Human Omnidirectional Optical Flow
paper

[10] I Find Your Lack of Uncertainty in Computer Vision Disturbing
paper

[9] Fast Walsh-Hadamard Transform and Smooth-Thresholding Based Binary Layers in Deep Neural Networks
paper

[8] Machine-learned 3D Building Vectorization from Satellite Imagery (Remote Sensing)
paper

[7] Graph-based Person Signature for Person Re-Identifications (Person Re-Identification)
paper

[6] Temporally-Aware Feature Pooling for Action Spotting in Soccer Broadcasts (Action Recognition)
paper

[5] Continual learning in cross-modal retrieval (Continual Learning)
paper

[4] Towards Automated and Marker-less Parkinson Disease Assessment: Predicting UPDRS Scores using Sit-stand videos
paper

[3] Efficient Space-time Video Super Resolution using Low-Resolution Flow and Mask Upsampling (Image Super-Resolution)
paper | project

[2] Generalizable Multi-Camera 3D Pedestrian Detection (Pedestrian Detection)
paper

[1] Dealing with Missing Modalities in the Visual Question Answer-Difference Prediction Task through Knowledge Distillation (Knowledge Distillation; Visual Question Answering)
paper

5. To do list

CVPR2021 paper sharing
