Insights: ggml-org/llama.cpp

Overview

27 Releases published by 1 person
- b5196 published Apr 27, 2025
- b5197 published Apr 27, 2025
- b5198 published Apr 27, 2025
- b5199 published Apr 27, 2025
- b5200 published Apr 27, 2025
- b5201 published Apr 28, 2025
- b5202 published Apr 28, 2025
- b5204 published Apr 28, 2025
- b5205 published Apr 28, 2025
- b5207 published Apr 28, 2025
- b5208 published Apr 28, 2025
- b5209 published Apr 28, 2025
- b5210 published Apr 28, 2025
- b5211 published Apr 28, 2025
- b5212 published Apr 28, 2025
- b5213 published Apr 28, 2025
- b5214 published Apr 28, 2025
- b5215 published Apr 28, 2025
- b5216 published Apr 29, 2025
- b5217 published Apr 29, 2025
- b5218 published Apr 29, 2025
- b5219 published Apr 29, 2025
- b5220 published Apr 29, 2025
- b5221 published Apr 29, 2025
- b5222 published Apr 29, 2025
- b5223 published Apr 29, 2025
- b5225 published Apr 30, 2025

30 Pull requests merged by 16 people
- rpc : fix cache directory initialization (#13188, merged Apr 30, 2025)
- scripts: n_depth support for compare-llama-bench (#13201, merged Apr 29, 2025)
- Prefilling assistant message in openai compatible API (#13174, merged Apr 29, 2025; see the first sketch after this list)
- sampling : when top-k <= 0 -> noop (#13173, merged Apr 29, 2025; see the second sketch after this list)
- llama-bench: fixed size of fields to correctly map to values (#13183, merged Apr 29, 2025)
- CUDA: fix non-cont. inputs for batched mat mul (#13155, merged Apr 29, 2025)
- llama : llm_type order by size (#13177, merged Apr 29, 2025)
- mtmd : add qwen2vl and qwen2.5vl (#13141, merged Apr 29, 2025)
- llama : set qwen3 model type sizes (#13175, merged Apr 29, 2025)
- llama-graph : fix text position for mrope (#13159, merged Apr 29, 2025)
- Nomic Embed Text V2 with Mixture-of-Experts (MoE) architecture (#12466, merged Apr 28, 2025)
- clip : fix model size display (#13153, merged Apr 28, 2025)
- fix(rpc): Improve input validation and error handling (#13069, merged Apr 28, 2025)
- llama-bench: add -d depth arg (#13096, merged Apr 28, 2025)
- mtmd : fix glm-edge redundant token count (#13139, merged Apr 28, 2025)
- delete buffer clear as suggested (#13152, merged Apr 28, 2025)
- llama : (mrope) allow using normal 1D position for text token (#13138, merged Apr 28, 2025)
- clip : refactor set input for cgraph + fix qwen2.5vl input (#13136, merged Apr 28, 2025)
- SYCL: Add all missing unary kernels (#13074, merged Apr 28, 2025)
- readme : update hot topics (#13150, merged Apr 28, 2025)
- common : fix noreturn compile warning (#13151, merged Apr 28, 2025)
- llama-chat : fix typo GML --> GLM (#13143, merged Apr 28, 2025)
- musa: fix typo in cc control (#13144, merged Apr 28, 2025)
- CUDA: fix q_nope_absorbed precision for Deepseek 2 Lite f16 (#13137, merged Apr 28, 2025)
- arg : fix unused variable (#13142, merged Apr 28, 2025)
- llama-bench : Add --override-tensors arg (#12922, merged Apr 27, 2025)
- fix wrong template in GLM4-0414 (#13140, merged Apr 27, 2025)
- musa: fix build warning (#13129, merged Apr 27, 2025)
- Fixes Qwen2.5VL segfault during inference (#13133, merged Apr 27, 2025)
- Add Qwen2.5VL support (#12402, merged Apr 27, 2025)
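
Two of the merged changes above benefit from a short illustration. First, assistant prefill (#13174): with llama-server's OpenAI-compatible API, a trailing assistant message in the request is treated as the start of the reply, and generation continues from it rather than starting fresh. A minimal sketch, assuming a llama-server with this change listening on the default local port 8080 (the prompt and partial reply are invented for illustration):

```python
import requests

resp = requests.post(
    "http://localhost:8080/v1/chat/completions",
    json={
        "model": "default",  # placeholder; the server serves whichever model it was started with
        "messages": [
            {"role": "user", "content": "Name three prime numbers."},
            # Trailing assistant message: generation continues from
            # this partial reply instead of starting a new one.
            {"role": "assistant", "content": "Sure, here are three primes:"},
        ],
    },
)
print(resp.json()["choices"][0]["message"]["content"])
```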
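
Second, the top-k convention (#13173): top-k sampling restricts the candidate set to the k most probable tokens, and the PR title indicates that k <= 0 is now a no-op (no filtering) rather than a pathological case; compare the top_k=0 performance issue #13171 closed below. An illustrative NumPy sketch of that convention, not the llama.cpp implementation itself:

```python
import numpy as np

def top_k_filter(logits: np.ndarray, k: int) -> np.ndarray:
    """Mask all but the k largest logits to -inf; k <= 0 disables filtering."""
    if k <= 0 or k >= logits.shape[-1]:
        return logits  # no-op: keep the full vocabulary
    kth_largest = np.partition(logits, -k)[-k]
    return np.where(logits >= kth_largest, logits, -np.inf)

print(top_k_filter(np.array([1.0, 3.0, 2.0, 0.5]), k=2))  # masks all but top 2
print(top_k_filter(np.array([1.0, 3.0, 2.0, 0.5]), k=0))  # no-op
```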
16 Pull requests opened by 11 people
- CUDA: build archs as virtual for GGML_NATIVE=OFF (#13135, opened Apr 27, 2025)
- PowerPC: Enable MMA for BF16 in llamafile_sgemm (#13148, opened Apr 28, 2025)
- musa: enable MMA (#13149, opened Apr 28, 2025)
- common: Ensure libcommon.so is built if BUILD_SHARED_LIBS=ON (#13156) (#13158, opened Apr 28, 2025)
- [CANN] Update CANN model support status (#13162, opened Apr 29, 2025)
- fix(rpc): validate graph operands (#13167, opened Apr 29, 2025)
- Fix for issue #13170 (#13176, opened Apr 29, 2025)
- ggml-cpu: enable z17 compile detection (#13182, opened Apr 29, 2025)
- mtmd : add C public API (#13184, opened Apr 29, 2025)
- test: non-cont. b in test-backend-ops -o MUL_MAT (#13187, opened Apr 29, 2025)
- vulkan: Handle src1 batch dimension in non-contiguous mat-vec-mul shader (#13191, opened Apr 29, 2025)
- vulkan: use uint array index to avoid glslang bug (#13193, opened Apr 29, 2025)
- kv-cache : add SWA support (#13194, opened Apr 29, 2025)
- [RFC] handling jinja extra template kwargs (Qwen3 enable_thinking feature) (#13196, opened Apr 29, 2025; see the sketch after this list)
- CUDA: batched+noncont MMQ, refactor bs>1 MoE code (#13199, opened Apr 29, 2025)
- arg : allow using -hf offline (#13202, opened Apr 29, 2025)
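
One of the opened PRs above, #13196, concerns forwarding extra keyword arguments to the model's Jinja chat template; per its title, Qwen3 templates read an enable_thinking kwarg to toggle the <think> preamble. A hypothetical fragment rendered with the jinja2 package illustrates the mechanism (the template text is invented for illustration and is not Qwen3's actual template):

```python
from jinja2 import Template

# Invented fragment: real chat templates are far larger, but the
# extra-kwargs mechanism is the same: the template simply reads a
# variable that the caller chooses to pass in at render time.
fragment = Template(
    "{{ prompt }}\n"
    "{% if enable_thinking %}<think>\n{% endif %}"
)
print(fragment.render(prompt="Hello", enable_thinking=True))
print(fragment.render(prompt="Hello", enable_thinking=False))
```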
26 Issues closed by 10 people
- Eval bug: <think> tag with DeepSeek-R1-Distill-Qwen-1.5B-Q5_K_M.gguf (#11325, closed Apr 30, 2025)
- How do I know which operator the code is computing has an error? (#12389, closed Apr 30, 2025)
- Eval bug: KV cache changes the inference results, even when context fits and no quantization (#12396, closed Apr 30, 2025)
- Eval bug: llama-qwen2vl-cli gives too short or cut response (#12408, closed Apr 30, 2025)
- failed to quantize: unknown model architecture: 'qwen3moe' (#13200, closed Apr 29, 2025)
- Misc. bug: top_k=0 Abysmal Performance (#13171, closed Apr 29, 2025)
- Eval bug: llama-bench seems to be broken (#13169, closed Apr 29, 2025)
- Misc. bug: Only using 1 compute core on AMD (#12978, closed Apr 29, 2025)
- Misc. bug: DeepSeek-R1-Distill-Qwen-1.5B-F16.gguf failed to run on Android (#13179, closed Apr 29, 2025)
- Misc. bug: Missing <think> tag in response (DeepSeek R1) (#11861, closed Apr 29, 2025)
- Eval bug: clip_model_loader: model_size not init. (#13147, closed Apr 28, 2025)
- Misc. bug: RPC server crash on `SET_TENSOR` with invalid `ggml_type` (#13067, closed Apr 28, 2025)
- llama-server bug: Prompt caching fails when editing the second user input (#13126, closed Apr 28, 2025)
- Eval bug: Gemma-3 Vision failed with CUDA (#12973, closed Apr 28, 2025)
- Misc. bug: llama-server throws "Unsupported param: tools" (#10920, closed Apr 28, 2025)
- Misc. bug: I'm seeing gibberish output (#11463, closed Apr 28, 2025)
- Eval bug: llama-cpp-deepseek-r1.jinja template will miss the <think> tag (#12107, closed Apr 28, 2025)
- Eval bug: loading model: vk::PhysicalDevice::createDevice: ErrorExtensionNotPresent. Not falling back to CPU (#12163, closed Apr 28, 2025)
- Misc. bug: convert_hf_to_gguf failing for deepseek-r1 full (#12255, closed Apr 28, 2025)
- Eval bug: Segfault in `ggml_compute_forward_dup_bytes` (#12354, closed Apr 28, 2025)
- Trojan:Script/Wacatac.B!ml warning on latest release that's six hours ago. (#12355, closed Apr 28, 2025)
- Misc. bug: llama-server command line options are ignored (#12363, closed Apr 28, 2025)
- Eval bug: [/SOLUTION] visible in granite 8B (#12384, closed Apr 28, 2025)
- Hi all, (#12385, closed Apr 28, 2025)
- Math & Code Benchmark/Testing for GGUFs (#13127, closed Apr 27, 2025)
18 Issues opened by 18 people
- Compile bug: /usr/bin/ld: test-quantize-stats.cpp:(.text+0x2cec): undefined reference to `ggml_get_f32_1d' (#13192, opened Apr 29, 2025)
- Misc. bug: rpc-server crash without cache (#13185, opened Apr 29, 2025)
- Eval bug: Qwen3, failed to parse chat template (jinja) (#13178, opened Apr 29, 2025)
- Misc. bug: llama-parallel segmentation fault (#13172, opened Apr 29, 2025)
- Compile bug: Build fails on ppc64le (#13170, opened Apr 29, 2025)
- Eval bug: Qwen3 30B A3B Q4_0 failed to run (#13168, opened Apr 29, 2025)
- Compile bug: llama-server-cuda docker image build failure (#13166, opened Apr 29, 2025)
- Eval bug: Unreadable output when using qwen2-vl model. (#13165, opened Apr 29, 2025)
- Eval bug: Qwen3 Q4_0 not working with SYCL (#13163, opened Apr 29, 2025)
- Eval bug: SIGILL (#13161, opened Apr 29, 2025)
- Misc. bug: Qwen 3.0 "enable_thinking" parameter not working (#13160, opened Apr 29, 2025)
- bug: ValueError: Architecture qwen3 not supported (#13157, opened Apr 28, 2025)
- Misc. bug: Shared libraries don't properly contain /common/ functions (#13156, opened Apr 28, 2025)
- Misc. bug: Flash Attention not working on CDNA3 ROCm 6.4 MI300 (#13145, opened Apr 28, 2025)
39 Unresolved conversations
Sometimes conversations happen on old items that aren’t yet closed. Here is a list of all the Issues and Pull Requests with unresolved conversations.
- sycl : Implemented reorder Q4_K mmvq (#13109, commented on Apr 29, 2025; 12 new comments)
- sycl : implementation of reordered Q4_0 MMVQ for Intel GPUs (#12858, commented on Apr 30, 2025; 8 new comments)
- ggml : fix more imatrix nan cases (#11773, commented on Apr 29, 2025; 1 new comment)
- `server`: inject date_string in llama 3.x template + fix date for firefunction v2 (#12802, commented on Apr 29, 2025; 1 new comment)
- Misc. bug: CUDA error: device kernel image is invalid (Quadro RTX 8000) (#12717, commented on Apr 28, 2025; 0 new comments)
- Feature Proposal: Server Model Switching at Runtime (#13027, commented on Apr 30, 2025; 0 new comments)
- Feature Request: Qwen 2.5 VL (#11483, commented on Apr 30, 2025; 0 new comments)
- llama : support Jamba hybrid Transformer-Mamba models (#7531, commented on Apr 29, 2025; 0 new comments)
- tool-call: add support for tool-calls using Model Context Protocol (#11556, commented on Apr 29, 2025; 0 new comments)
- [WIP] backend: Integrating QNN (Qualcomm AI Engine Direct) as a dedicated backend for Qualcomm NPUs (#12063, commented on Apr 27, 2025; 0 new comments)
- PR: Refine ggml-hexagon backend (Qualcomm Hexagon NPU backend) for latest ggml, whisper.cpp, llama.cpp (#12326, commented on Apr 29, 2025; 0 new comments)
- `server`: streaming of tool calls and thoughts when `--jinja` is on (#12379, commented on Apr 28, 2025; 0 new comments)
- kv-cache : separate recurrent vs non-recurrent impl (#12799, commented on Apr 30, 2025; 0 new comments)
- Llama-3_1-Nemotron-Ultra-253B-v1 support (#12843, commented on Apr 29, 2025; 0 new comments)
- ggml: Implement yield barrier using futex for improved thread scheduling efficiency (#13079, commented on Apr 30, 2025; 0 new comments)
- [CANN] Simplify the environment variable setting for GGML_CANN_MEM_POOL and GGML_CANN_ASYNC_MODE (#13104, commented on Apr 28, 2025; 0 new comments)
- [sync #10544] llama/ggml: add LLM training support (#13105, commented on Apr 30, 2025; 0 new comments)
- llama : try loading tensors with pre-computed hashes (#13106, commented on Apr 30, 2025; 0 new comments)
- context : allow cache-less context for embeddings (#13108, commented on Apr 30, 2025; 0 new comments)
- convert : improve model arch handling (#13122, commented on Apr 27, 2025; 0 new comments)
- Misc. bug: (#12623, commented on Apr 28, 2025; 0 new comments)
- Compile bug: SYCL backend build fail on debug config (#12602, commented on Apr 28, 2025; 0 new comments)
- Eval bug: Phi-4 mini in iOS with xcframework (#12232, commented on Apr 28, 2025; 0 new comments)
- Feature Request: Support Codestral Mamba (#8519, commented on Apr 28, 2025; 0 new comments)
- Misc. bug: The model's reasoning performance has significantly decreased despite using different versions of the same model architecture, identical parameters, and the same set of questions. (#12816, commented on Apr 28, 2025; 0 new comments)
- Eval bug: GLM-Z1-9B-0414 (#12946, commented on Apr 28, 2025; 0 new comments)
- Feature request: Graphical GGUF viewer (#6715, commented on Apr 28, 2025; 0 new comments)
- Feature Request: convert_hf_to_gguf.py to support model type Qwen2_5_VLForConditionalGeneration (#12642, commented on Apr 29, 2025; 0 new comments)
- Move gguf fuzzers to the llama.cpp repository (#11514, commented on Apr 29, 2025; 0 new comments)
- Compile bug: Vulkan Cross compile for arm64 (#13068, commented on Apr 29, 2025; 0 new comments)
- Eval bug: multimodal llama-gemma3-cli gives nonsensical outputs when used with Vulkan (#13046, commented on Apr 29, 2025; 0 new comments)
- Misc. bug: OpenCL: Issue with Adreno 610 (#13115, commented on Apr 29, 2025; 0 new comments)
- Feature Request: Tensor parallelism (--split-mode row) over rpc (#13083, commented on Apr 29, 2025; 0 new comments)
- server: Bring back multimodal support (#8010, commented on Apr 29, 2025; 0 new comments)
- Misc. bug: Gibberish output on AMD Ryzen 9 8945HS w/ Radeon 780M Graphics since commit 3d82dbcbce2c (#12657, commented on Apr 30, 2025; 0 new comments)
- Eval bug: Unusually high RAM usage on Windows when running DeepSeek V3 Q2_K_XL/IQ2_XXS on hybrid CPU+GPU (vs Linux) (#12651, commented on Apr 30, 2025; 0 new comments)
- Eval bug: llama-qwen2vl-cli --log-disable disables the response rather than the log (#12407, commented on Apr 30, 2025; 0 new comments)
- csm : implement Sesame-based conversation example (#12392, commented on Apr 30, 2025; 0 new comments)
- Feature Request: resize an existing context (#11577, commented on Apr 30, 2025; 0 new comments)