A Rolling Stone Gathers No Moss: Adaptive Policy Optimization for Stable Self-Evaluation in Large Multimodal Models
AAAI 2026
Full, up-to-date list also on Google Scholar and DBLP.
AccKV: Towards Efficient Audio-Video LLMs Inference via Adaptive-Focusing and Cross-Calibration KV Cache Optimization
AAAI 2026
EcoAgent: An Efficient Device-Cloud Collaborative Multi-Agent Framework for Mobile Automation
AAAI 2026
Graph2Eval: Automatic Multimodal Task Generation for Agents via Knowledge Graphs
CVPR 2026
InfiGUI-G1: Advancing GUI Grounding with Adaptive Exploration Policy Optimization
AAAI 2026
InfiGUIAgent: A Multimodal Generalist GUI Agent with Native Reasoning and Reflection
EACL 2026
MS-Bench: Evaluating LMMs in Ancient Manuscript Study through a Dunhuang Case Study
NeurIPS 2026
ThinkRec: Thinking-based recommendation via LLM
WWW 2026
Towards Meta-Cognitive Knowledge Editing for Multimodal LLMs
WWW 2026
UnicEdit-10M: A Dataset and Benchmark Breaking the Scale-Quality Barrier via Unified Verification for Reasoning-Enriched Edits
CVPR 2026
Instruction Tuning for Large Language Models: A Survey
ACM Computing Surveys 2026
NaviCache: Test-Time Self-Calibration Caching for Video Generation
ICML 2026
CIAR: Interval-based Collaborative Decoding for Image Generation Acceleration
ICLR 2026
RetentiveKV: State-Space Memory for Uncertainty-Aware Multimodal KV Cache Eviction
ACL Findings 2026
Measure Twice, Click Once: Co-evolving Proposer and Visual Critic via Reinforcement Learning for GUI Grounding
arXiv 2026
Reinforcement Learning in Generative Multimodal AI: A Survey
TechRxiv 2026
SafePred: A Predictive Guardrail for Computer-Using Agents via World Models
arXiv 2026
Semantic Trimming and Auxiliary Multi-step Prediction for Generative Recommendation
arXiv 2026
World-Model-Augmented Web Agents with Action Correction
arXiv 2026
CHORD: Customizing Hybrid-precision On-device Model for Sequential Recommendation with Device-cloud Collaboration
ACM MM 2025
Collaboration of Large Language Models and Small Recommendation Models for Device-Cloud Recommendation
KDD 2025
Cuff-KT: Tackling Learners' Real-time Learning Pattern Adjustment via Tuning-Free Knowledge State Guided Model Updating
KDD 2025
Democratizing AI through model fusion: A comprehensive review and future directions
Nexus 2025
Device-Cloud Collaborative Correction for On-Device Recommendation
IJCAI 2025
Disentangled Knowledge Tracing for Alleviating Cognitive Bias
WWW 2025
EcoFace: Audio-Visual Emotional Co-Disentanglement Speech-Driven 3D Talking Face Generation
ICLR 2025
Evaluating the Robustness of Multimodal Agents Against Active Environmental Injection Attacks
ACM MM 2025
ExpTalk: Diverse Emotional Expression via Adaptive Disentanglement and Refined Alignment for Speech-Driven 3D Facial Animation
IJCAI 2025
FedMcon: an adaptive aggregation method for federated learning via meta controller
Frontiers of Information Technology & Electronic Engineering 2025
Forward Once for All: Structural Parameterized Adaptation for Efficient Cloud-coordinated On-device Recommendation
KDD 2025
Knowledge-empowered, collaborative, and co-evolving AI models: The post-LLM roadmap
Engineering 2025
MadaKV: Adaptive Modality-Perception KV Cache Eviction for Efficient Multimodal Long-Context Inference
ACL 2025
MergeNet: Knowledge Migration Across Heterogeneous Models, Tasks, and Modalities
AAAI 2025
Optimize Incompatible Parameters Through Compatibility-aware Knowledge Integration
AAAI 2025
OS Agents: A Survey on MLLM-based Agents for Computer, Phone and Browser Use
ACL 2025
Tackling Device Data Distribution Real-time Shift via Prototype-based Parameter Editing
ACM MM 2025
FedCFA: Alleviating Simpson's Paradox in Model Aggregation with Counterfactual Federated Learning
AAAI 2025
Towards Advanced Mathematical Reasoning for LLMs via First-Order Logic Theorem Proving
EMNLP 2025
Causal Distillation for Alleviating Performance Heterogeneity in Recommender Systems
IEEE Trans. Knowl. Data Eng. 2024
SLED: Structure Learning based Denoising for Recommendation
ACM Trans. Inf. Syst. 2024
Transferring Causal Mechanism over Meta-representations for Target-Unknown Cross-domain Recommendation
ACM Trans. Inf. Syst. 2024
CoreRec: A Counterfactual Correlation Inference for Next Set Recommendation
AAAI 2024
MPOD123: One Image to 3D Content Generation Using Mask-Enhanced Progressive Outline-to-Detail Optimization
CVPR 2024
LLMCO4MR: LLMs-Aided Neural Combinatorial Optimization for Ancient Manuscript Restoration from Fragments with Case Studies on Dunhuang
ECCV (75) 2024
PhiloGPT: A Philology-Oriented Large Language Model for Ancient Chinese Manuscripts with Dunhuang as Case Study
EMNLP 2024
Domaindiff: Boost out-of-Distribution Generalization with Synthetic Data
ICASSP 2024
AuG-KD: Anchor-Based Mixup Generation for Out-of-Domain Knowledge Distillation
ICLR 2024
ModelGPT: Unleashing LLM's Capabilities for Tailored Model Generation
arXiv 2024
DIET: Customized Slimming for Incompatible Networks in Sequential Recommendation
KDD 2024
Cross-modal Observation Hypothesis Inference
ACM Multimedia 2024
GaussianTalker: Speaker-specific Talking Head Synthesis via 3D Gaussian Splatting
ACM Multimedia 2024
Semantic Codebook Learning for Dynamic Recommendation Models
ACM Multimedia 2024
Intelligent Model Update Strategy for Sequential Recommendation
WWW 2024
Personalized Latent Structure Learning for Recommendation
IEEE Trans. Pattern Anal. Mach. Intell. 2023
Edge-Cloud Polarization and Collaboration: A Comprehensive Survey for AI
IEEE Trans. Knowl. Data Eng. 2023
Video-Audio Domain Generalization via Confounder Disentanglement
AAAI 2023
Multi-modal Action Chain Abductive Reasoning
ACL (1) 2023
Weakly-Supervised Spoken Video Grounding via Semantic Interaction Learning
ACL (1) 2023
Are Binary Annotations Sufficient? Video Moment Retrieval via Hierarchical Uncertainty-based Active Learning
CVPR 2023
WINNER: Weakly-supervised hIerarchical decompositioN and aligNment for spatio-tEmporal video gRounding
CVPR 2023
ART: rule bAsed futuRe-inference deducTion
EMNLP 2023
Reconnecting the Broken Civilization: Patchwork Integration of Fragments from Ancient Manuscripts
ACM Multimedia 2023
Unsupervised Domain Adaptation for Video Object Grounding with Cascaded Debiasing Learning
ACM Multimedia 2023
DisCover: Disentangled Music Representation Learning for Cover Song Identification
SIGIR 2023
DUET: A Tuning-Free Device-Cloud Collaborative Parameters Generation Framework for Efficient Device Model Generalization
WWW 2023
MAGIC: Multimodal relAtional Graph adversarIal inferenCe for Diverse and Unpaired Text-based Image Captioning
AAAI 2022
End-to-End Modeling via Information Tree for One-Shot Natural Language Spatial Video Grounding
ACL (1) 2022
MIC: Model-agnostic Integrated Cross-channel Recommender
CIKM 2022
BoostMIS: Boosting Medical Image Semi-supervised Learning with Adaptive Pseudo Labeling and Informative Active Annotation
CVPR 2022
Intelligent Request Strategy Design in Recommender System
KDD 2022
HERO: HiErarchical spatio-tempoRal reasOning with Contrastive Action Correspondence for End-to-End Video Object Grounding
ACM Multimedia 2022
Dilated Context Integrated Network with Cross-Modal Consensus for Temporal Emotion Localization in Videos
ACM Multimedia 2022
Weakly-supervised Disentanglement Network for Video Fingerspelling Detection
ACM Multimedia 2022
Uncovering Causal Effects of Online Short Videos on Consumer Behaviors
WSDM 2022
Re4: Learning to Re-contrast, Re-attend, Re-construct for Multi-interest Recommendation
WWW 2022
Contrastive Learning with Positive-Negative Frame Mask for Music Representation
WWW 2022
Why Do We Click: Visual Impression-aware News Recommendation
ACM Multimedia 2021
MGD-GAN: Text-to-Pedestrian Generation through Multi-Grained Discrimination
PRCV (2) 2021
CauseRec: Counterfactual User Sequence Synthesis for Sequential Recommendation
SIGIR 2021
Future-Aware Diverse Trends Framework for Recommendation
WWW 2021
Comprehensive Information Integration Modeling Framework for Video Titling
KDD 2020
Poet: Product-oriented Video Captioner for E-commerce
ACM Multimedia 2020
DeVLBert: Learning Deconfounded Visio-Linguistic Representations
ACM Multimedia 2020
Temporality-enhanced knowledge memory network for factoid question answering
Frontiers Inf. Technol. Electron. Eng. 2018
Multi-Label Community-Based Question Classification via Personalized Sequence Memory Network Learning
AAAI 2018
Text-to-Image Synthesis via Visual-Memory Creative Adversarial Network
PCM (3) 2018