
Qwen-VL: A Versatile Vision-Language Model for Understanding ...
Sep 19, 2023 · In this work, we introduce the Qwen-VL series, a set of large-scale vision-language models (LVLMs) designed to perceive and understand both texts and images. …
Gated Attention for Large Language Models: Non-linearity, …
Sep 18, 2025 · The authors respond that they will add experiments on the Qwen architecture, provide the hyperparameters, and promise to open-source one of the models. Reviewer bMKL is the …
Rank-1 LoRAs Encode Interpretable Reasoning Signals
Sep 29, 2025 · Specifically, we use a rank-1 LoRA to create a minimal parameter adapter for Qwen-2.5-32B-Instruct which recovers 73-90% of reasoning-benchmark performance …
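As a rough illustration of the setup this snippet describes, the sketch below attaches a rank-1 LoRA adapter to a causal language model with the Hugging Face peft library. The model identifier, target modules, and alpha value are assumptions chosen for illustration, not the paper's configuration.

# Minimal sketch: attach a rank-1 LoRA adapter with peft.
# Model id, target modules, and lora_alpha are illustrative assumptions.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

base = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-32B-Instruct")

config = LoraConfig(
    r=1,                                  # rank-1: each adapted weight gets a single outer-product update
    lora_alpha=16,                        # scaling factor (assumed)
    target_modules=["q_proj", "v_proj"],  # which projections to adapt (assumed)
    lora_dropout=0.0,
    task_type="CAUSAL_LM",
)

model = get_peft_model(base, config)
model.print_trainable_parameters()        # only the rank-1 A/B matrices are trainable

With r=1, each adapted projection receives a single rank-one update, which is what makes the learned adapter direction easy to inspect.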
LLaVA-MoD: Making LLaVA Tiny via MoE-Knowledge Distillation
Jan 22, 2025 · Superior Performance: LLaVA-MoD surpasses larger models like Qwen-VL-Chat-7B in various benchmarks, demonstrating the effectiveness of its knowledge distillation approach.
MagicDec: Breaking the Latency-Throughput Tradeoff for Long …
Jan 22, 2025 · (a) Summary of Scientific Claims and Findings: The paper presents MagicDec, a speculative decoding technique aimed at improving throughput and reducing latency for long …
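For context, the sketch below shows the generic speculative-decoding acceptance step (in the style of standard speculative sampling), not MagicDec's specific drafting strategy; the function name and the draft_probs/target_probs inputs (per-position vocabulary distributions) are assumptions for illustration.

# Generic speculative-decoding verification step (illustrative, not MagicDec's method).
import numpy as np

def verify_draft(draft_tokens, draft_probs, target_probs, rng):
    """Accept a prefix of the drafted tokens; resample the first rejected position."""
    accepted = []
    for i, tok in enumerate(draft_tokens):
        p = target_probs[i][tok]          # target model probability of the drafted token
        q = draft_probs[i][tok]           # draft model probability of the drafted token
        if rng.random() < min(1.0, p / q):
            accepted.append(tok)          # accepted: keep verifying the next drafted token
        else:
            # Rejected: resample from the residual distribution max(p - q, 0), renormalized.
            residual = np.maximum(target_probs[i] - draft_probs[i], 0.0)
            residual /= residual.sum()
            accepted.append(int(rng.choice(len(residual), p=residual)))
            break
    return accepted

The accepted tokens are distributed as if sampled from the target model directly, which is what lets the cheaper draft model improve throughput without changing the output distribution.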
In this paper, we explore a way out and present the newest members of the open-sourced Qwen families: Qwen-VL series. Qwen-VLs are a series of highly performant and versatile vision …
Towards Federated RLHF with Aggregated Client Preference for …
Jan 22, 2025 · For example, our experiments demonstrate that the Qwen-2-0.5B selector provides strong performance enhancements to larger base models like Gemma-2B while ensuring …
Towards Interpretable Time Series Foundation Models - OpenReview
Jun 9, 2025 · Leveraging a synthetic dataset of mean-reverting time series with systematically varied trends and noise levels, we generate natural language annotations using a large …
ADIFF: Explaining audio difference using natural language
Jan 22, 2025 · We evaluate our model using objective metrics and human evaluation and show that our model enhancements lead to significant improvements in performance over naive baseline …
Towards Understanding Distilled Reasoning Models: A...
Mar 5, 2025 · To explore this, we train a crosscoder on Qwen-series models and their fine-tuned variants. Our results suggest that the crosscoder learns features corresponding to various …
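As a rough sketch of the crosscoder idea mentioned here (a shared sparse dictionary encoded from, and decoded back to, the activations of two models so that shared versus model-specific features can be compared), the toy module below is an assumption-laden illustration; the dimensions, loss weights, and class name are not the paper's implementation.

# Minimal crosscoder sketch: one shared sparse code over two models' activations.
import torch
import torch.nn as nn

class Crosscoder(nn.Module):
    def __init__(self, d_act: int, d_latent: int):
        super().__init__()
        self.enc_a = nn.Linear(d_act, d_latent, bias=False)   # encoder for model A activations
        self.enc_b = nn.Linear(d_act, d_latent, bias=False)   # encoder for model B activations
        self.bias = nn.Parameter(torch.zeros(d_latent))
        self.dec_a = nn.Linear(d_latent, d_act, bias=False)   # decoder back to model A space
        self.dec_b = nn.Linear(d_latent, d_act, bias=False)   # decoder back to model B space

    def forward(self, act_a, act_b):
        # One shared sparse code built from both models' contributions.
        f = torch.relu(self.enc_a(act_a) + self.enc_b(act_b) + self.bias)
        return self.dec_a(f), self.dec_b(f), f

def crosscoder_loss(model, act_a, act_b, l1_coef=1e-3):
    # Reconstruct both models' activations, with an L1 penalty encouraging sparse codes.
    rec_a, rec_b, f = model(act_a, act_b)
    recon = ((rec_a - act_a) ** 2).mean() + ((rec_b - act_b) ** 2).mean()
    return recon + l1_coef * f.abs().mean()

Comparing the per-model decoder norms of each latent is one way such a setup can separate features shared by the base and fine-tuned models from features specific to one of them.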