Insights: ggml-org/llama.cpp
Overview
47 Releases published by 1 person
- b5171 published Apr 23, 2025
- b5173 published Apr 23, 2025
- b5174 published Apr 23, 2025
- b5175 published Apr 24, 2025
- b5176 published Apr 24, 2025
- b5177 published Apr 24, 2025
- b5178 published Apr 24, 2025
- b5180 published Apr 24, 2025
- b5181 published Apr 24, 2025
- b5184 published Apr 24, 2025
- b5185 published Apr 24, 2025
- b5186 published Apr 24, 2025
- b5187 published Apr 25, 2025
- b5188 published Apr 25, 2025
- b5189 published Apr 25, 2025
- b5190 published Apr 25, 2025
- b5191 published Apr 25, 2025
- b5192 published Apr 26, 2025
- b5193 published Apr 26, 2025
- b5194 published Apr 26, 2025
- b5195 published Apr 26, 2025
- b5196 published Apr 27, 2025
- b5197 published Apr 27, 2025
- b5198 published Apr 27, 2025
- b5199 published Apr 27, 2025
- b5200 published Apr 27, 2025
- b5201 published Apr 28, 2025
- b5202 published Apr 28, 2025
- b5204 published Apr 28, 2025
- b5205 published Apr 28, 2025
- b5207 published Apr 28, 2025
- b5208 published Apr 28, 2025
- b5209 published Apr 28, 2025
- b5210 published Apr 28, 2025
- b5211 published Apr 28, 2025
- b5212 published Apr 28, 2025
- b5213 published Apr 28, 2025
- b5214 published Apr 28, 2025
- b5215 published Apr 28, 2025
- b5216 published Apr 29, 2025
- b5217 published Apr 29, 2025
- b5218 published Apr 29, 2025
- b5219 published Apr 29, 2025
- b5220 published Apr 29, 2025
- b5221 published Apr 29, 2025
- b5222 published Apr 29, 2025
- b5223 published Apr 29, 2025
52 Pull requests merged by 24 people
- scripts: n_depth support for compare-llama-bench (#13201, merged Apr 29, 2025)
- Prefilling assistant message in openai compatible API (#13174, merged Apr 29, 2025)
- sampling : when top-k <= 0 -> noop (#13173, merged Apr 29, 2025)
- llama-bench: fixed size of fields to correctly map to values (#13183, merged Apr 29, 2025)
- CUDA: fix non-cont. inputs for batched mat mul (#13155, merged Apr 29, 2025)
- llama : llm_type order by size (#13177, merged Apr 29, 2025)
- mtmd : add qwen2vl and qwen2.5vl (#13141, merged Apr 29, 2025)
- llama : set qwen3 model type sizes (#13175, merged Apr 29, 2025)
- llama-graph : fix text position for mrope (#13159, merged Apr 29, 2025)
- Nomic Embed Text V2 with Mixture-of-Experts (MoE) architecture (#12466, merged Apr 28, 2025)
- clip : fix model size display (#13153, merged Apr 28, 2025)
- fix(rpc): Improve input validation and error handling (#13069, merged Apr 28, 2025)
- llama-bench: add `-d` depth arg (#13096, merged Apr 28, 2025)
- mtmd : fix glm-edge redundant token count (#13139, merged Apr 28, 2025)
- delete buffer clear as suggested (#13152, merged Apr 28, 2025)
- llama : (mrope) allow using normal 1D position for text token (#13138, merged Apr 28, 2025)
- clip : refactor set input for cgraph + fix qwen2.5vl input (#13136, merged Apr 28, 2025)
- SYCL: Add all missing unary kernels (#13074, merged Apr 28, 2025)
- readme : update hot topics (#13150, merged Apr 28, 2025)
- common : fix noreturn compile warning (#13151, merged Apr 28, 2025)
- llama-chat : fix typo GML --> GLM (#13143, merged Apr 28, 2025)
- musa: fix typo in cc control (#13144, merged Apr 28, 2025)
- CUDA: fix q_nope_absorbed precision for Deepseek 2 Lite f16 (#13137, merged Apr 28, 2025)
- arg : fix unused variable (#13142, merged Apr 28, 2025)
- llama-bench : Add `--override-tensors` arg (#12922, merged Apr 27, 2025)
- fix wrong template in GLM4-0414 (#13140, merged Apr 27, 2025)
- musa: fix build warning (#13129, merged Apr 27, 2025)
- Fixes Qwen2.5VL segfault during inference (#13133, merged Apr 27, 2025)
- Add Qwen2.5VL support (#12402, merged Apr 27, 2025)
- common : add common_remote_get_content (#13123, merged Apr 26, 2025)
- clip : improve projector naming (#13118, merged Apr 26, 2025)
- ggml: move fp16/bf16 conversion optimizations to CPU backend + export conversion APIs (#13107, merged Apr 26, 2025)
- grammar : handle maxItems == 0 in JSON schema (#13116) (#13117, merged Apr 26, 2025)
- llama : fix K-shift with quantized K and BLAS backend (#13113, merged Apr 25, 2025)
- Force FP32 compute in GLM4 FFN Down (#13101, merged Apr 25, 2025)
- clip : fix pixtral on some GPU backends (#13097, merged Apr 25, 2025)
- [SYCL][OPT] Fix reorder optimization for Q4_0 (#13003, merged Apr 25, 2025)
- rpc : do not wait for response when sending RPC_CMD_SET_TENSOR (#12943, merged Apr 25, 2025)
- clip : remove boi/eoi embeddings for GLM-edge model (⚠️ breaking change) (#13081, merged Apr 24, 2025)
- embeddings : fix batch sizes (#13076, merged Apr 24, 2025)
- sync : ggml (#13098, merged Apr 24, 2025)
- CUDA: use switch statements in constexpr functions (#13095, merged Apr 24, 2025)
- cmake : do not include ./src as public for libllama (#13062, merged Apr 24, 2025)
- clang-tidy : disable warning about missing math parenthesis (#13091, merged Apr 24, 2025)
- arg : add --no-mmproj-offload (#13093, merged Apr 24, 2025)
- arg : clean up handling --mmproj with -hf (#13082, merged Apr 24, 2025)
- metal : fix floating-point range of attention scores in FA kernels (#13090, merged Apr 24, 2025)
- vulkan: matmul gcn tuning (#13016, merged Apr 24, 2025)
- llama-mtmd-cli: Sigint rework in mtmd vision example (#13080, merged Apr 23, 2025)
- mtmd : Support Pixtral 12B (#13065, merged Apr 23, 2025)
- Append mult-eos,half-rope,bos to GLM4-0414 and Z (#13021, merged Apr 23, 2025)
- rpc : add command line option for number of threads for the CPU backend (#13060, merged Apr 23, 2025)
24 Pull requests opened by 15 people
- ggml: Implement yield barrier using futex for improved thread scheduling efficiency (#13079, opened Apr 23, 2025)
- [CANN] Simplify the environment variable setting for GGML_CANN_MEM_POOL and GGML_CANN_ASYNC_MODE (#13104, opened Apr 25, 2025)
- [sync #10544] llama/ggml: add LLM training support (#13105, opened Apr 25, 2025)
- llama : try loading tensors with pre-computed hashes (#13106, opened Apr 25, 2025)
- context : allow cache-less context for embeddings (#13108, opened Apr 25, 2025)
- sycl : Implemented reorder Q4_K mmvq (#13109, opened Apr 25, 2025)
- convert : improve model arch handling (#13122, opened Apr 26, 2025)
- CUDA: build archs as virtual for GGML_NATIVE=OFF (#13135, opened Apr 27, 2025)
- PowerPC: Enable MMA for BF16 in llamafile_sgemm (#13148, opened Apr 28, 2025)
- musa: enable MMA (#13149, opened Apr 28, 2025)
- common: Ensure libcommon.so is build if BUILD_SHARED_LIBS=ON (#13156) (#13158, opened Apr 28, 2025)
- [CANN] Update CANN model support status (#13162, opened Apr 29, 2025)
- fix(rpc): validate graph operands (#13167, opened Apr 29, 2025)
- Fix for issue #13170 (#13176, opened Apr 29, 2025)
- ggml-cpu: enable z17 compile detection (#13182, opened Apr 29, 2025)
- mtmd : add C public API (#13184, opened Apr 29, 2025)
- test: non-cont. b in test-backend-ops -o MUL_MAT (#13187, opened Apr 29, 2025)
- rpc : fix cache directory initialization (#13188, opened Apr 29, 2025)
- vulkan: Handle src1 batch dimension in non-contiguous mat-vec-mul shader (#13191, opened Apr 29, 2025)
- vulkan: use uint array index to avoid glslang bug (#13193, opened Apr 29, 2025)
- kv-cache : add SWA support (#13194, opened Apr 29, 2025)
- [RFC] handling jinja extra template kwargs (Qwen3 enable_thinking feature) (#13196, opened Apr 29, 2025)
- CUDA: batched+noncont MMQ, refactor bs>1 MoE code (#13199, opened Apr 29, 2025)
- arg : allow using -hf offline (#13202, opened Apr 29, 2025)
52 Issues closed by 15 people
- Eval bug: <think> tag with DeepSeek-R1-Distill-Qwen-1.5B-Q5_K_M.gguf (#11325, closed Apr 30, 2025)
- How do I know which operator the code is computing has an error? (#12389, closed Apr 30, 2025)
- Eval bug: KV cache changes the inference results, even when context fits and no quantization (#12396, closed Apr 30, 2025)
- Eval bug: llama-qwen2vl-cli gives too short or cut response (#12408, closed Apr 30, 2025)
- failed to quantize: unknown model architecture: 'qwen3moe' (#13200, closed Apr 29, 2025)
- Misc. bug: top_k=0 Abysmal Performance (#13171, closed Apr 29, 2025)
- Eval bug: llama-bench seems to be broken (#13169, closed Apr 29, 2025)
- Misc. bug: Only using 1 compute core on AMD (#12978, closed Apr 29, 2025)
- Misc. bug: DeepSeek-R1-Distill-Qwen-1.5B-F16.gguf failed to run on Android (#13179, closed Apr 29, 2025)
- Misc. bug: Missing <think> tag in response (DeepSeek R1) (#11861, closed Apr 29, 2025)
- Eval bug: clip_model_loader: model_size not init. (#13147, closed Apr 28, 2025)
- Misc. bug: RPC server crash on `SET_TENSOR` with invalid `ggml_type` (#13067, closed Apr 28, 2025)
- llama-server bug: Prompt caching fails when editing the second user input (#13126, closed Apr 28, 2025)
- Eval bug: Gemma-3 Vision failed with CUDA (#12973, closed Apr 28, 2025)
- Misc. bug: llama-server throws "Unsupported param: tools" (#10920, closed Apr 28, 2025)
- Misc. bug: I'm seeing gibberish output (#11463, closed Apr 28, 2025)
- Eval bug: llama-cpp-deepseek-r1.jinja template will miss the <think> tag (#12107, closed Apr 28, 2025)
- Eval bug: loading model: vk::PhysicalDevice::createDevice: ErrorExtensionNotPresent. Not falling back to CPU (#12163, closed Apr 28, 2025)
- Misc. bug: convert_hf_to_gguf failing for deepseek-r1 full (#12255, closed Apr 28, 2025)
- Eval bug: Segfault in `ggml_compute_forward_dup_bytes` (#12354, closed Apr 28, 2025)
- Trojan:Script/Wacatac.B!ml warning on latest release that's six hours ago. (#12355, closed Apr 28, 2025)
- Misc. bug: llama-server command line options are ignored (#12363, closed Apr 28, 2025)
- Eval bug: [/SOLUTION] visible in granite 8B (#12384, closed Apr 28, 2025)
- Hi all, (#12385, closed Apr 28, 2025)
- Math & Code Benchmark/Testing for GGUFs (#13127, closed Apr 27, 2025)
- Misc. bug: server not exit after `missing result_output tensor` error (#11808, closed Apr 27, 2025)
- Misc. bug: Native API failed. Native API returns: 20 (UR_RESULT_ERROR_DEVICE_LOST) (#11812, closed Apr 27, 2025)
- Misc. bug: JSON schema that defines array with 0 elements generates un-parseable GBNF (#13116, closed Apr 26, 2025)
- Misc. bug: Vulcan premature out of memory exception on AMD Instinct MI60 (#11598, closed Apr 26, 2025)
- Misc. bug: llama-server: SegFault with json_schema containing unsupported pattern (#12252, closed Apr 26, 2025)
- Misc. bug: malloc error in #8 0x00007fffda1ea8ec in unicode_regex_split (#12335, closed Apr 26, 2025)
- GGUF to MLX conversion? (#12349, closed Apr 26, 2025)
- Compile bug: Flash attention with RocmWMMA 6.4 seems to be broken on gfx1201 (#13110, closed Apr 25, 2025)
- Misc. bug: Unsupported op "CPY" / SIGABRT on Apple CPU (#13112, closed Apr 25, 2025)
- Eval bug: load converted deepseek-r1-bf16 gguf, print many ":1case" (#13094, closed Apr 25, 2025)
- Regarding llama-bench and llama-parallel commands (#12106, closed Apr 25, 2025)
- Research: Benchmarking DeepSeek-R1 IQ1_S 1.58bit (#11474, closed Apr 25, 2025)
- Compile bug: undefined reference to `ggml_set_f32_nd' (#12281, closed Apr 25, 2025)
- Misc. bug: llama-embedding asserts: GGML_ASSERT(params.n_batch >= params.n_ctx); (#12860, closed Apr 24, 2025)
- Eval bug: HIP: llama.cpp server locks up when running multiple instances on the same gpu (#12991, closed Apr 24, 2025)
- CUDA compiler out of heap space (#13086, closed Apr 24, 2025)
- Deleted (#13089, closed Apr 24, 2025)
- Misc. bug: Sporadic MUL_MAT Failures in test-backend-ops for Nvidia backend (#11972, closed Apr 24, 2025)
- Misc. bug: Web-UI now unusably slow - over network or locally. (#12026, closed Apr 24, 2025)
- Feature Request: Support for Phi4MMForCausalLM Architecture (#12117, closed Apr 24, 2025)
- Feature Request: Add support for InstellaForCausalLM model architecture (#12270, closed Apr 24, 2025)
- [CANN] bug: Test results on CANN support for Deepseek-V3/R1 (#12324, closed Apr 24, 2025)
- Misc. bug: CPU Usage low in rpc-server mode (#13051, closed Apr 23, 2025)
- Why is a 16 core CPU slower than an 8-core CPU? (#13075, closed Apr 23, 2025)
31 Issues opened by 29 people
- Compile bug: /usr/bin/ld: test-quantize-stats.cpp:(.text+0x2cec): undefined reference to `ggml_get_f32_1d' (#13192, opened Apr 29, 2025)
- Misc. bug: rpc-server crash without cache (#13185, opened Apr 29, 2025)
- Eval bug: Qwen3, failed to parse chat template (jinja) (#13178, opened Apr 29, 2025)
- Misc. bug: llama-parallel segmentation fault (#13172, opened Apr 29, 2025)
- Compile bug: Build fails on ppc64le (#13170, opened Apr 29, 2025)
- Eval bug: Qwen3 30B A3B Q4_0 failed to run (#13168, opened Apr 29, 2025)
- Compile bug: llama-server-cuda docker image build failure (#13166, opened Apr 29, 2025)
- Eval bug: Unreadable output when using qwen2-vl model. (#13165, opened Apr 29, 2025)
- Eval bug: Qwen3 Q4_0 not working with SYCL (#13163, opened Apr 29, 2025)
- Eval bug: SIGILL (#13161, opened Apr 29, 2025)
- Misc. bug: Qwen 3.0 "enable_thinking" parameter not working (#13160, opened Apr 29, 2025)
- bug: ValueError: Architecture qwen3 not supported (#13157, opened Apr 28, 2025)
- Misc. bug: Shared libraries don't properly contain /common/ functions (#13156, opened Apr 28, 2025)
- Misc. bug: Flash Attention not working on CDNA3 ROCm 6.4 MI300 (#13145, opened Apr 28, 2025)
- Plamo arch no longer working (#13130, opened Apr 27, 2025)
- Feature Request: Allow `-hf` to be used offline (#13128, opened Apr 26, 2025)
- Feature Request: Add C api for mtmd (#13124, opened Apr 26, 2025)
- Eval bug: EXAONE fails to run with quantized KV cache (#13121, opened Apr 26, 2025)
- Misc. bug: OpenCL: Issue with Adreno 610 (#13115, opened Apr 25, 2025)
- Feature Request: Kimi-Audio-7B (#13114, opened Apr 25, 2025)
- Feature Request: define key bindings for quick deletion of the previous conversation. (#13111, opened Apr 25, 2025)
- Misc. bug: Retrieval sample not decoding token successfully (#13102, opened Apr 24, 2025)
- Eval: HIP: Llama-server multi-instance lockup (#13100, opened Apr 24, 2025)
- Eval bug: Flash Attention not working with NVIDIA GeForce RTX 4060 Ti (#13092, opened Apr 24, 2025)
- Eval bug: llama-server stays in unresponsive state- CUDA error: out of memory - (#13085, opened Apr 23, 2025)
- Feature Request: Tensor paralellism (--split-mode row) over rpc (#13083, opened Apr 23, 2025)
- Refactor: (clip.cpp) identify and regroup pre-processing strategies (#13077, opened Apr 23, 2025)
84 Unresolved conversations
Sometimes conversations happen on old items that aren’t yet closed. Here is a list of all the Issues and Pull Requests with unresolved conversations.
- sycl : implementation of reordered Q4_0 MMVQ for Intel GPUs (#12858, commented on Apr 30, 2025 • 9 new comments)
- server : (experimental) vision support via libmtmd (#12898, commented on Apr 26, 2025 • 5 new comments)
- ggml-cpu: Integrate fp32=bf16xbf16 SME KleidiAI kernel (#13053, commented on Apr 25, 2025 • 4 new comments)
- ggml : fix more imatrix nan cases (#11773, commented on Apr 29, 2025 • 1 new comment)
- (draft) tts: Orpheus support (#12487, commented on Apr 25, 2025 • 1 new comment)
- `server`: inject date_string in llama 3.x template + fix date for firefunction v2 (#12802, commented on Apr 29, 2025 • 1 new comment)
- tts : implement sesame CSM + Mimi decoder (#12648, commented on Apr 25, 2025 • 1 new comment)
- Reduce enum sizes some are used in structs, which allowed them to be optimized. (#13071, commented on Apr 25, 2025 • 0 new comments)
- llama/ggml: add LLM training support (#10544, commented on Apr 23, 2025 • 0 new comments)
- rpc : copy tensors across servers (#8032, commented on Apr 23, 2025 • 0 new comments)
- llama : support Jamba hybrid Transformer-Mamba models (#7531, commented on Apr 29, 2025 • 0 new comments)
- Feature Proposal: Server Model Switching at Runtime (#13027, commented on Apr 30, 2025 • 0 new comments)
- Feature Request: resize an existing context (#11577, commented on Apr 30, 2025 • 0 new comments)
- csm : implement Sesame-based conversation example (#12392, commented on Apr 30, 2025 • 0 new comments)
- Eval bug: llama-qwen2vl-cli --log-disable rather disables the response, not the log (#12407, commented on Apr 30, 2025 • 0 new comments)
- Eval bug: Unusual high RAM usage on Windows when running DeepSeek V3 Q2_K_XL/IQ2_XXS, on Hybrid CPU+GPU (vs Linux). (#12651, commented on Apr 30, 2025 • 0 new comments)
- Misc. bug: Gibbersish output on AMD Ryzen 9 8945HS w/ Radeon 780M Graphics since commit: 3d82dbcbce2c (#12657, commented on Apr 30, 2025 • 0 new comments)
- server: Bring back multimodal support (#8010, commented on Apr 29, 2025 • 0 new comments)
- Eval bug: multimodal llama-gemma3-cli gives nonsensical outputs when used with Vulkan (#13046, commented on Apr 29, 2025 • 0 new comments)
- Compile bug: Vulkan Cross compile for arm64 (#13068, commented on Apr 29, 2025 • 0 new comments)
- Move gguf fuzzers to the llama.cpp repository (#11514, commented on Apr 29, 2025 • 0 new comments)
- Feature Request: convert_hf_to_gguf.py to support model type Qwen2_5_VLForConditionalGeneration (#12642, commented on Apr 29, 2025 • 0 new comments)
- Feature request: Graphical GGUF viewer (#6715, commented on Apr 28, 2025 • 0 new comments)
- Eval bug: GLM-Z1-9B-0414 (#12946, commented on Apr 28, 2025 • 0 new comments)
- Misc. bug: The model's reasoning performance has significantly decreased despite using different versions of the same model architecture, identical parameters, and the same set of questions. (#12816, commented on Apr 28, 2025 • 0 new comments)
- Fix ChatGLMModel for glm-4-9b cannot find tokenizer merges in model file (#13058, commented on Apr 23, 2025 • 0 new comments)
- [CANN]Support OP MUL_MAT_ID (#13042, commented on Apr 27, 2025 • 0 new comments)
- quantize: Handle user-defined pruning of whole layers (blocks) (#13037, commented on Apr 23, 2025 • 0 new comments)
- gguf-py : avoid requiring PySide6 for packaged scripts (#13036, commented on Apr 24, 2025 • 0 new comments)
- Nix portability improvements (#13005, commented on Apr 25, 2025 • 0 new comments)
- sycl: use DNN in the first part of ggml_sycl_mul_mat_batched_sycl (#12972, commented on Apr 23, 2025 • 0 new comments)
- Llama-3_1-Nemotron-Ultra-253B-v1 support (#12843, commented on Apr 29, 2025 • 0 new comments)
- kv-cache : separate recurrent vs non-recurrent impl (#12799, commented on Apr 29, 2025 • 0 new comments)
- opencl: fix couple crashes (#12795, commented on Apr 24, 2025 • 0 new comments)
- Support for OuteTTS 1.0 (#12794, commented on Apr 25, 2025 • 0 new comments)
- opencl: Add support for multiple devices (#12622, commented on Apr 24, 2025 • 0 new comments)
- `server`: streaming of tool calls and thoughts when `--jinja` is on (#12379, commented on Apr 28, 2025 • 0 new comments)
- PR: Refine ggml-hexagon backend(Qualcomm Hexagon NPU backend) for latest ggml,whisper.cpp,llama.cpp (#12326, commented on Apr 29, 2025 • 0 new comments)
- [WIP]backend: Integrating QNN (Qualcomm AI Engine Direct) as a dedicated backend for Qualcomm NPUs (#12063, commented on Apr 27, 2025 • 0 new comments)
- tool-call: add support for tool-calls using Model Context Protocol (#11556, commented on Apr 29, 2025 • 0 new comments)
- Allow user to compile with any cuda version using github actions (#10928, commented on Apr 25, 2025 • 0 new comments)
- Fix compilation on Pop!_OS 22.04 LTS CUDA (#10835, commented on Apr 25, 2025 • 0 new comments)
- Eval bug: allocating 114296.55 MiB on device 0: cudaMalloc failed: out of memory (#12586, commented on Apr 26, 2025 • 0 new comments)
- Feature Request: Add kv-quant fa kernel variants for head sizes other than 128 (#12989, commented on Apr 25, 2025 • 0 new comments)
- Eval bug: Loading fail on Gemma 3:12b > llama_model_load: error loading model: error loading model hyperparameters: key not found in model: gemma3.attention.layer_norm_rms_epsilon (#12367, commented on Apr 25, 2025 • 0 new comments)
- Misc. bug: [SERVER] Multiple slots, generation speed is degraded after each generation/slot used (#10860, commented on Apr 25, 2025 • 0 new comments)
- Eval bug: Gemma 3 extremly slow prompt processing when using quantized kv cache. (#12352, commented on Apr 25, 2025 • 0 new comments)
- Misc. bug: auto scroll doesn't work in WebUI (#12362, commented on Apr 25, 2025 • 0 new comments)
- Eval bug: inference of 32B eats too much memory on ROCM HIP (5x AMD Radeon Instinct Mi50 (gfx906)) (#12369, commented on Apr 25, 2025 • 0 new comments)
- Misc. bug: vulkan: performance regression after fd123cfead49eb32e386e26b8ef7a6d41554dda5 (#12553, commented on Apr 25, 2025 • 0 new comments)
- Eval bug: Using llama-llava-clip-quantize-cli under CUDA backend conditions will encounter a crash. (#12564, commented on Apr 25, 2025 • 0 new comments)
- Misc. bug: Vulkan performance depends on thread priority (#12976, commented on Apr 24, 2025 • 0 new comments)
- Eval bug: model producing gibberish for Orion14b-chat (#12411, commented on Apr 24, 2025 • 0 new comments)
- Compile bug: common.cuh(3): fatal error c1083 cannot open include file: "ggml.h" : No such file or directory (#13073, commented on Apr 24, 2025 • 0 new comments)
- Misc. bug: llama-cli (vulkan backend) output gibberish with old vulkan sdk (#13044, commented on Apr 24, 2025 • 0 new comments)
- Feature Request: Prefix assistant answer (#11536, commented on Apr 24, 2025 • 0 new comments)
- Feature Request: allow mmap to take advantage of hugepage feature which has 10x speedup (#12444, commented on Apr 24, 2025 • 0 new comments)
- Misc. bug: Flash attention on Vulkan (#12526, commented on Apr 24, 2025 • 0 new comments)
- Eval bug: seemed it cannot convert theQwen2.5-VL-7B-Instruct, please help advice, Thank you. (#12534, commented on Apr 24, 2025 • 0 new comments)
- Eval bug: crash when pooling_type == LLAMA_POOLING_TYPE_MEAN (#12543, commented on Apr 24, 2025 • 0 new comments)
- server : add support for file upload to the Web UI (#11611, commented on Apr 23, 2025 • 0 new comments)
- Add a new `llama_load_model_from_buffer()` method to compliment `llama_load_model_from_file()` (#6311, commented on Apr 23, 2025 • 0 new comments)
- Model Repeats Nonsensical Output (#13066, commented on Apr 23, 2025 • 0 new comments)
- Feature Request: Support Codestral Mamba (#8519, commented on Apr 28, 2025 • 0 new comments)
- Eval bug: Phi-4 mini in iOS with xcframework (#12232, commented on Apr 28, 2025 • 0 new comments)
- Compile bug: SYCL backend build fail on debug config (#12602, commented on Apr 28, 2025 • 0 new comments)
- Misc. bug: (#12623, commented on Apr 28, 2025 • 0 new comments)
- Misc. bug: CUDA error: device kernel image is invalid (Quadro RTX 8000) (#12717, commented on Apr 28, 2025 • 0 new comments)
- Misc. bug: Crashing, forcing BMI2 on non BMI2 CPUs (#12500, commented on Apr 27, 2025 • 0 new comments)
- Eval bug: run failed when run lora adapter(no merged) on android (#12592, commented on Apr 27, 2025 • 0 new comments)
- [New Bitnet Model Support Request] Deepgrove model Bonsai 0.5B - Add Channel Scales (#12598, commented on Apr 27, 2025 • 0 new comments)
- Feature Request: Add "trust_remote_code support" to 'convert_hf_to_gguf.py' for compatibility with modern HF models (#12610, commented on Apr 27, 2025 • 0 new comments)
- Misc. bug: Data check in examples/gguf (#12617, commented on Apr 27, 2025 • 0 new comments)
- Suport for Jamba JambaForCausalLM (#6372, commented on Apr 26, 2025 • 0 new comments)
- Feature Request: Add support for Kokoro TTS (#11050, commented on Apr 26, 2025 • 0 new comments)
- Support Hybrid Models (#12331, commented on Apr 26, 2025 • 0 new comments)
- Compile bug: Linux with CUDA 12.6 (#11696, commented on Apr 26, 2025 • 0 new comments)
- Eval bug: CPU usage is abnormal when running deepseek-r1-671B-Q4_0 weights in Atlas 800T a2 and NPU device (#11966, commented on Apr 26, 2025 • 0 new comments)
- Enhancement: Improve ROCm performance on various quants (benchmarks included) (#11931, commented on Apr 26, 2025 • 0 new comments)
- Eval bug: the swiftui keeps saying the same thing (#12558, commented on Apr 26, 2025 • 0 new comments)
- Misc. bug: performance drop with 2x SYCL GPUs (#12575, commented on Apr 26, 2025 • 0 new comments)
- -ngl to load ·last n layers· to gpu (#12577, commented on Apr 26, 2025 • 0 new comments)
- Qwen2.5-vl support and conversion? (#12584, commented on Apr 26, 2025 • 0 new comments)
- Compile bug: vulkan-shaders-gen hangs when built with address sanitizers (#12581, commented on Apr 26, 2025 • 0 new comments)