Paper reading record
2025.6
Jingze Su 6.24
(25’CVPR) Self-Expansion of Pre-trained Models with Mixture of Adapters for Continual Learning
Qi Li 6.18
(25’CVPR) LSNet: See Large, Focus Small
Qi Li 6.10
(25’CVPR) MobileMamba: Lightweight Multi-Receptive Visual Mamba Network
Gaocheng Zhang 6.10
(25’CVPR) Samba: A Unified Mamba-based Framework for General Salient Object Detection
Chunxiao Chen 6.3
(25’CVPR) Adaptive Rectangular Convolution for Remote Sensing Pansharpening
2025.5
Jiaxin Cai 5.27
(25’CVPR) BEVDiffuser: Plug-and-Play Diffusion Model for BEV Denoising with Ground-Truth Guidance
Jingze Su 5.20
(25’CVPR) OverLoCK: An Overview-first-Look-Closely-next ConvNet with Context-Mixing Dynamic Kernels
Qi Li 5.13
(25’CVPR) Mamba as a Bridge: Where Vision Foundation Models Meet Vision Language Models for Domain-Generalized Semantic Segmentation
Gaocheng Zhang 5.13
(25’CVPR) COB-GS: Clear Object Boundaries in 3DGS Segmentation Based on Boundary-Adaptive Gaussian Splitting
2025.4
Chunxiao Chen 4.23
(25’CVPR) CATANet: Efficient Content-Aware Token Aggregation for Lightweight Image Super-Resolution
Jiaxin Cai 4.15
(25’CVPR) DORNet: A Degradation Oriented and Regularized Network for Blind Depth Super-Resolution
Jingze Su 4.7
(25’CVPR) SAM-REF: Introducing Image-Prompt Synergy during Interaction for Detail Enhancement in the Segment Anything Model
Qi Li 4.1
(25’CVPR) ROS-SAM: High-Quality Interactive Segmentation for Remote Sensing Moving Object
2025.3
Gaocheng Zhang 3.25
(25’Arxiv) Pixel to Gaussian: Ultra-Fast Continuous Super-Resolution with 2D Gaussian Modeling
Jiaxin Cai 3.19
(24’Neurips) Parameter-Inverted Image Pyramid Networks
(25’TCSVT) CPAL: Cross-prompting Adapter with LoRAs for RGB+X Semantic Segmentation
Liwang Chen 3.13
(25’AAAI) Maximizing the Position Embedding for Vision Transformers with Global Average Pooling
2025.2
Jiexin Luo 2.27
(24’ICLR) Convolution Meets LoRA: Parameter Efficient Finetuning for Segment Anything Model
2025.1
Jingze Su 1.16
(24’CVPR) Text-IF: Leveraging Semantic Text Guidance for Degradation-Aware and Interactive Image Fusion
Qi Li 1.9
(24’Neurips) Learning Frequency-Adapted Vision Foundation Model for Domain Generalized Semantic Segmentation
(24’MM) Learning Spectral-Decomposed Tokens for Domain Generalized Semantic Segmentation
Jiaxin Cai 1.2
(25’AAAI) FD2-Net: Frequency-Driven Feature Decomposition Network for Infrared-Visible Object Detection
2024.12
Liwang Chen 12.19
(24’ECCV) Textual Query-Driven Mask Transformer for Domain Generalized Segmentation
Jiexin Luo 12.12
(24’CVPR) UniPT: Universal Parallel Tuning for Transfer Learning with Efficient Parameter and Memory
Jingze Su 12.5
(24’Neurips) Parameter Efficient Adaptation for Image Restoration with Heterogeneous Mixture-of-Experts
————1st stage: over————
2024.9
Jiaxin Cai 9.23
(24’TPAMI) Frequency-aware Feature Fusion for Dense Image Prediction
Qi Li 9.6
(24’ECCV) EAFormer: Scene Text Segmentation with Edge-Aware Transformers
2024.8
Liwang Chen 8.8
(24’ECCV) LookupViT: Compressing visual information to a limited number of tokens
Jiexin Luo 8.1
(24’TPAMI) CrossFormer++: A Versatile Vision Transformer Hinging on Cross-Scale Attention
2024.7
Jingze Su 7.25
(24’TIP) HAFormer: Unleashing the Power of Hierarchy-Aware Features for Lightweight Semantic Segmentation
Jiaxin Cai 7.18
(24’ECCV) IRSAM: Advancing Segment Anything Model for Infrared Small Target Detection
Qi Li 7.11
(24’Arxiv) Vision-RWKV: Efficient and Scalable Visual Perception with RWKV-Like Architectures
2024.6
Jiexin Luo 6.20
(24’AAAI) VMT-Adapter: Parameter-Efficient Transfer Learning for Multi-Task Dense Scene Understanding
Qi Li 6.20
(23’CVPR) PIDNet: A Real-time Semantic Segmentation Network Inspired by PID Controllers
Qi Li 6.13
(24’CVPR) Going Beyond Multi-Task Dense Prediction with Synergy Embedding Models
Jingze Su 6.6
(24’CVPR) Dynamic Adapter Meets Prompt Tuning: Parameter-Efficient Transfer Learning for Point Cloud Analysis
2024.5
Jiaxin Cai 5.23
(24’CVPR) VSCode: General Visual Salient and Camouflaged Object Detection with 2D Prompt Learning
(24’CVPR) Multimodal Prompt Perceiver: Empower Adaptiveness, Generalizability and Fidelity for All-in-One Image Restoration
Jiexin Luo 5.16
(24’CVPR) TransNeXt: Robust Foveal Visual Perception for Vision Transformers
Qi Li 5.9
(24’CVPR) Learning without Exact Guidance: Updating Large-scale High-resolution Land Cover Maps from Low-resolution Historical Labels
2024.4
Jingze Su 4.25
(23’CVPR) Efficient Multimodal Fusion via Interactive Prompting
(24’CVPR) Stronger, Fewer, & Superior: Harnessing Vision Foundation Models for Domain Generalized Semantic Segmentation
Jiaxin Cai 4.25
(24’CVPR) SDSTrack: Self-Distillation Symmetric Adapter Learning for Multi-Modal Visual Object Tracking
Jiexin Luo 4.18
(24’CVPR) Masked AutoDecoder is Effective Multi-Task Vision Generalist
(22’Neurips) A Unified Sequence Interface for Vision Tasks
Jiaxin Cai 4.18
(24’CVPR) MTLoRA: A Low-Rank Adaptation Approach for Efficient Multi-Task Learning
(24’Arxiv) MoE-FFD: Mixture of Experts for Generalized and Parameter-Efficient Face Forgery Detection
Qi Li 4.10
(24’CVPR) SED: A Simple Encoder-Decoder for Open-Vocabulary Semantic Segmentation
Before 2024.4
Qi Li
(21’CVPR) Distilling Knowledge via Knowledge Review
(23’ICCV) Lightweight Image Super-Resolution with Superpixel Token Interaction
(23’ICCV) Multi-modal Gated Mixture of Local-to-Global Experts for Dynamic Image Fusion
(23’ICCV) TaskExpert: Dynamically Assembling Multi-Task Representations with Memorial Mixture-of-Experts
Jiexin Luo
(21’CVPR) Multi-Scale Aligned Distillation for Low-Resolution Detection
(21’ICCV) Channel-wise Knowledge Distillation for Dense Prediction
(23’CVPR) EfficientViT: Memory Efficient Vision Transformer with Cascaded Group Attention
(23’ICCV) EfficientViT: Multi-Scale Linear Attention for High-Resolution Dense Prediction
(22’CVPR) Masked Autoencoders Are Scalable Vision Learners
(24’CVPR) EfficientSAM: Leveraged Masked Image Pretraining for Efficient Segment Anything
Jingze Su
(17’ICLR) Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer
(23’CVPR) Mod-Squad: Designing Mixtures of Experts As Modular Multi-Task Learners
(22’EMNLP) Mixture of Attention Heads: Selecting Attention Heads Per Token
(23’ICCV) Partition-and-Debias: Agnostic Biases Mitigation via A Mixture of Biases-Specific Experts
(23’CVPR) Visual Prompt Multi-Modal Tracking
(23’CVPR) Multimodal Prompting with Missing Modalities for Visual Recognition
Jiaxin Cai
No record