Paper reading record

2025.6

Jingze Su 6.24

(25’CVPR) Self-Expansion of Pre-trained Models with Mixture of Adapters for Continual Learning

Qi Li 6.18

(25’CVPR) LSNet: See Large, Focus Small

Qi Li 6.10

(25’CVPR) MobileMamba: Lightweight Multi-Receptive Visual Mamba Network

Gaocheng Zhang 6.10

(25’CVPR) Samba: A Unified Mamba-based Framework for General Salient Object Detection

Chunxiao Chen 6.3

(25’CVPR) Adaptive Rectangular Convolution for Remote Sensing Pansharpening

2025.5

Jiaxin Cai 5.27

(25’CVPR) BEVDiffuser: Plug-and-Play Diffusion Model for BEV Denoising with Ground-Truth Guidance

Jingze Su 5.20

(25’CVPR) OverLoCK: An Overview-first-Look-Closely-next ConvNet with Context-Mixing Dynamic Kernels

Qi Li 5.13

(25’CVPR) Mamba as a Bridge: Where Vision Foundation Models Meet Vision Language Models for Domain-Generalized Semantic Segmentation

Gaocheng Zhang 5.13

(25’CVPR) COB-GS: Clear Object Boundaries in 3DGS Segmentation Based on Boundary-Adaptive Gaussian Splitting

2025.4

Chunxiao Chen 4.23

(25’CVPR) CATANet: Efficient Content-Aware Token Aggregation for Lightweight Image Super-Resolution

Jiaxin Cai 4.15

(25’CVPR) DORNet: A Degradation Oriented and Regularized Network for Blind Depth Super-Resolution

Jingze Su 4.7

(25’CVPR) SAM-REF: Introducing Image-Prompt Synergy during Interaction for Detail Enhancement in the Segment Anything Model

Qi Li 4.1

(25’CVPR) ROS-SAM: High-Quality Interactive Segmentation for Remote Sensing Moving Object

2025.3

Gaocheng Zhang 3.25

(25’Arxiv) Pixel to Gaussian: Ultra-Fast Continuous Super-Resolution with 2D Gaussian Modeling

Jiaxin Cai 3.19

(24’Neurips) Parameter-Inverted Image Pyramid Networks
(25’TCSVT) CPAL: Cross-prompting Adapter with LoRAs for RGB+X Semantic Segmentation

Liwang Chen 3.13

(25’AAAI) Maximizing the Position Embedding for Vision Transformers with Global Average Pooling

2025.2

Jiexin Luo 2.27

(24’ICLR) Convolution Meets LoRA: Parameter Efficient Finetuning for Segment Anything Model

2025.1

Jingze Su 1.16

(24’CVPR) Text-IF: Leveraging Semantic Text Guidance for Degradation-Aware and Interactive Image Fusion

Qi Li 1.9

(24’Neurips) Learning Frequency-Adapted Vision Foundation Model for Domain Generalized Semantic Segmentation
(24’MM) Learning Spectral-Decomposed Tokens for Domain Generalized Semantic Segmentation

Jiaxin Cai 1.2

(25’AAAI) FD2-Net: Frequency-Driven Feature Decomposition Network for Infrared-Visible Object Detection

2024.12

Liwang Chen 12.19

(24’ECCV) Textual Query-Driven Mask Transformer for Domain Generalized Segmentation

Jiexin Luo 12.12

(24’CVPR) UniPT: Universal Parallel Tuning for Transfer Learning with Efficient Parameter and Memory

Jingze Su 12.5

(24’Neurips) Parameter Efficient Adaptation for Image Restoration with Heterogeneous Mixture-of-Experts

————1st stage: over————

2024.9

Jiaxin Cai 9.23

(24’TPAMI) Frequency-aware Feature Fusion for Dense Image Prediction

Qi Li 9.6

(24’ECCV) EAFormer: Scene Text Segmentation with Edge-Aware Transformers

2024.8

Liwang Chen 8.8

(24’ECCV) LookupViT: Compressing visual information to a limited number of tokens

Jiexin Luo 8.1

(24’TPAMI) CrossFormer++: A Versatile Vision Transformer Hinging on Cross-Scale Attention

2024.7

Jingze Su 7.25

(24’TIP) HAFormer: Unleashing the Power of Hierarchy-Aware Features for Lightweight Semantic Segmentation

Jiaxin Cai 7.18

(24’ECCV) IRSAM: Advancing Segment Anything Model for Infrared Small Target Detection

Qi Li 7.11

(24’Arxiv) Vision-RWKV: Efficient and Scalable Visual Perception with RWKV-Like Architectures

2024.6

Jiexin Luo 6.20

(24’AAAI) VMT-Adapter: Parameter-Efficient Transfer Learning for Multi-Task Dense Scene Understanding

Qi Li 6.20

(23’CVPR) PIDNet: A Real-time Semantic Segmentation Network Inspired by PID Controllers

Qi Li 6.13

(24’CVPR) Going Beyond Multi-Task Dense Prediction with Synergy Embedding Models

Jingze Su 6.6

(24’CVPR) Dynamic Adapter Meets Prompt Tuning: Parameter-Efficient Transfer Learning for Point Cloud Analysis

2024.5

Jiaxin Cai 5.23

(24’CVPR) VSCode: General Visual Salient and Camouflaged Object Detection with 2D Prompt Learning
(24’CVPR) Multimodal Prompt Perceiver: Empower Adaptiveness, Generalizability and Fidelity for All-in-One Image Restoration

Jiexin Luo 5.16

(24’CVPR) TransNeXt: Robust Foveal Visual Perception for Vision Transformers

Qi Li 5.9

(24’CVPR) Learning without Exact Guidance: Updating Large-scale High-resolution Land Cover Maps from Low-resolution Historical Labels

2024.4

Jingze Su 4.25

(23’CVPR) Efficient Multimodal Fusion via Interactive Prompting
(24’CVPR) Stronger, Fewer, & Superior: Harnessing Vision Foundation Models for Domain Generalized Semantic Segmentation

Jiaxin Cai 4.25

(24’CVPR) SDSTrack: Self-Distillation Symmetric Adapter Learning for Multi-Modal Visual Object Tracking

Jiexin Luo 4.18

(24’CVPR) Masked AutoDecoder is Effective Multi-Task Vision Generalist
(22’Neurips) A Unified Sequence Interface for Vision Tasks

Jiaxin Cai 4.18

(24’CVPR) MTLoRA: A Low-Rank Adaptation Approach for Efficient Multi-Task Learning
(24’Arxiv) MoE-FFD: Mixture of Experts for Generalized and Parameter-Efficient Face Forgery Detection

Qi Li 4.10

(24’CVPR) SED: A Simple Encoder-Decoder for Open-Vocabulary Semantic Segmentation

Before 2024.4

Qi Li

(21’CVPR) Distilling Knowledge via Knowledge Review
(23’ICCV) Lightweight Image Super-Resolution with Superpixel Token Interaction
(23’ICCV) Multi-modal Gated Mixture of Local-to-Global Experts for Dynamic Image Fusion
(23’ICCV) TaskExpert: Dynamically Assembling Multi-Task Representations with Memorial Mixture-of-Experts

Jiexin Luo

(21’CVPR) Multi-Scale Aligned Distillation for Low-Resolution Detection
(21’ICCV) Channel-wise Knowledge Distillation for Dense Prediction
(23’CVPR) EfficientViT: Memory Efficient Vision Transformer with Cascaded Group Attention
(23’ICCV) EfficientViT: Multi-Scale Linear Attention for High-Resolution Dense Prediction
(22’CVPR) Masked Autoencoders Are Scalable Vision Learners
(24’CVPR) EfficientSAM: Leveraged Masked Image Pretraining for Efficient Segment Anything

Jingze Su

(17’ICLR) Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer
(23’CVPR) Mod-Squad: Designing Mixtures of Experts As Modular Multi-Task Learners
(22’EMNLP) Mixture of Attention Heads: Selecting Attention Heads Per Token
(23’ICCV) Partition-and-Debias: Agnostic Biases Mitigation via A Mixture of Biases-Specific Experts
(23’CVPR) Visual Prompt Multi-Modal Tracking
(23’CVPR) Multimodal Prompting with Missing Modalities for Visual Recognition

Jiaxin Cai

No record