Insights: ggml-org/llama.cpp

Overview

27 Releases published by 1 person
- b5196 published Apr 27, 2025
- b5197 published Apr 27, 2025
- b5198 published Apr 27, 2025
- b5199 published Apr 27, 2025
- b5200 published Apr 27, 2025
- b5201 published Apr 28, 2025
- b5202 published Apr 28, 2025
- b5204 published Apr 28, 2025
- b5205 published Apr 28, 2025
- b5207 published Apr 28, 2025
- b5208 published Apr 28, 2025
- b5209 published Apr 28, 2025
- b5210 published Apr 28, 2025
- b5211 published Apr 28, 2025
- b5212 published Apr 28, 2025
- b5213 published Apr 28, 2025
- b5214 published Apr 28, 2025
- b5215 published Apr 28, 2025
- b5216 published Apr 29, 2025
- b5217 published Apr 29, 2025
- b5218 published Apr 29, 2025
- b5219 published Apr 29, 2025
- b5220 published Apr 29, 2025
- b5221 published Apr 29, 2025
- b5222 published Apr 29, 2025
- b5223 published Apr 29, 2025
- b5225 published Apr 30, 2025

30 Pull requests merged by 16 people
- rpc : fix cache directory initialization (#13188, merged Apr 30, 2025)
- scripts: n_depth support for compare-llama-bench (#13201, merged Apr 29, 2025)
- Prefilling assistant message in openai compatible API (#13174, merged Apr 29, 2025; see the first sketch after this list)
- sampling : when top-k <= 0 -> noop (#13173, merged Apr 29, 2025; see the second sketch after this list)
- llama-bench: fixed size of fields to correctly map to values (#13183, merged Apr 29, 2025)
- CUDA: fix non-cont. inputs for batched mat mul (#13155, merged Apr 29, 2025)
- llama : llm_type order by size (#13177, merged Apr 29, 2025)
- mtmd : add qwen2vl and qwen2.5vl (#13141, merged Apr 29, 2025)
- llama : set qwen3 model type sizes (#13175, merged Apr 29, 2025)
- llama-graph : fix text position for mrope (#13159, merged Apr 29, 2025)
- Nomic Embed Text V2 with Mixture-of-Experts (MoE) architecture (#12466, merged Apr 28, 2025)
- clip : fix model size display (#13153, merged Apr 28, 2025)
- fix(rpc): Improve input validation and error handling (#13069, merged Apr 28, 2025)
- llama-bench: add -d depth arg (#13096, merged Apr 28, 2025)
- mtmd : fix glm-edge redundant token count (#13139, merged Apr 28, 2025)
- delete buffer clear as suggested (#13152, merged Apr 28, 2025)
- llama : (mrope) allow using normal 1D position for text token (#13138, merged Apr 28, 2025)
- clip : refactor set input for cgraph + fix qwen2.5vl input (#13136, merged Apr 28, 2025)
- SYCL: Add all missing unary kernels (#13074, merged Apr 28, 2025)
- readme : update hot topics (#13150, merged Apr 28, 2025)
- common : fix noreturn compile warning (#13151, merged Apr 28, 2025)
- llama-chat : fix typo GML --> GLM (#13143, merged Apr 28, 2025)
- musa: fix typo in cc control (#13144, merged Apr 28, 2025)
- CUDA: fix q_nope_absorbed precision for Deepseek 2 Lite f16 (#13137, merged Apr 28, 2025)
- arg : fix unused variable (#13142, merged Apr 28, 2025)
- llama-bench : Add --override-tensors arg (#12922, merged Apr 27, 2025)
- fix wrong template in GLM4-0414 (#13140, merged Apr 27, 2025)
- musa: fix build warning (#13129, merged Apr 27, 2025)
- Fixes Qwen2.5VL segfault during inference (#13133, merged Apr 27, 2025)
- Add Qwen2.5VL support (#12402, merged Apr 27, 2025)
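
Two of the merged changes above benefit from a short illustration. First, assistant prefill (#13174): with llama-server's OpenAI-compatible API, a trailing assistant message in the request is treated as the start of the reply, and generation continues from it rather than starting fresh. A minimal sketch, assuming a llama-server with this change listening on the default local port 8080 (the prompt and partial reply are invented for illustration):

```python
import requests

resp = requests.post(
    "http://localhost:8080/v1/chat/completions",
    json={
        "model": "default",  # placeholder; the server serves whichever model it was started with
        "messages": [
            {"role": "user", "content": "Name three prime numbers."},
            # Trailing assistant message: generation continues from
            # this partial reply instead of starting a new one.
            {"role": "assistant", "content": "Sure, here are three primes:"},
        ],
    },
)
print(resp.json()["choices"][0]["message"]["content"])
```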
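
Second, the top-k convention (#13173): top-k sampling restricts the candidate set to the k most probable tokens, and the PR title indicates that k <= 0 is now a no-op (no filtering) rather than a pathological case; compare the top_k=0 performance issue #13171 closed below. An illustrative NumPy sketch of that convention, not the llama.cpp implementation itself:

```python
import numpy as np

def top_k_filter(logits: np.ndarray, k: int) -> np.ndarray:
    """Mask all but the k largest logits to -inf; k <= 0 disables filtering."""
    if k <= 0 or k >= logits.shape[-1]:
        return logits  # no-op: keep the full vocabulary
    kth_largest = np.partition(logits, -k)[-k]
    return np.where(logits >= kth_largest, logits, -np.inf)

print(top_k_filter(np.array([1.0, 3.0, 2.0, 0.5]), k=2))  # masks all but top 2
print(top_k_filter(np.array([1.0, 3.0, 2.0, 0.5]), k=0))  # no-op
```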
16 Pull requests opened by 11 people
- CUDA: build archs as virtual for GGML_NATIVE=OFF (#13135, opened Apr 27, 2025)
- PowerPC: Enable MMA for BF16 in llamafile_sgemm (#13148, opened Apr 28, 2025)
- musa: enable MMA (#13149, opened Apr 28, 2025)
- common: Ensure libcommon.so is built if BUILD_SHARED_LIBS=ON (#13156) (#13158, opened Apr 28, 2025)
- [CANN] Update CANN model support status (#13162, opened Apr 29, 2025)
- fix(rpc): validate graph operands (#13167, opened Apr 29, 2025)
- Fix for issue #13170 (#13176, opened Apr 29, 2025)
- ggml-cpu: enable z17 compile detection (#13182, opened Apr 29, 2025)
- mtmd : add C public API (#13184, opened Apr 29, 2025)
- test: non-cont. b in test-backend-ops -o MUL_MAT (#13187, opened Apr 29, 2025)
- vulkan: Handle src1 batch dimension in non-contiguous mat-vec-mul shader (#13191, opened Apr 29, 2025)
- vulkan: use uint array index to avoid glslang bug (#13193, opened Apr 29, 2025)
- kv-cache : add SWA support (#13194, opened Apr 29, 2025)
- [RFC] handling jinja extra template kwargs (Qwen3 enable_thinking feature) (#13196, opened Apr 29, 2025; see the sketch after this list)
- CUDA: batched+noncont MMQ, refactor bs>1 MoE code (#13199, opened Apr 29, 2025)
- arg : allow using -hf offline (#13202, opened Apr 29, 2025)
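
One of the opened PRs above, #13196, concerns forwarding extra keyword arguments to the model's Jinja chat template; per its title, Qwen3 templates read an enable_thinking kwarg to toggle the <think> preamble. A hypothetical fragment rendered with the jinja2 package illustrates the mechanism (the template text is invented for illustration and is not Qwen3's actual template):

```python
from jinja2 import Template

# Invented fragment: real chat templates are far larger, but the
# extra-kwargs mechanism is the same: the template simply reads a
# variable that the caller chooses to pass in at render time.
fragment = Template(
    "{{ prompt }}\n"
    "{% if enable_thinking %}<think>\n{% endif %}"
)
print(fragment.render(prompt="Hello", enable_thinking=True))
print(fragment.render(prompt="Hello", enable_thinking=False))
```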
26 Issues closed by 10 people
- Eval bug: <think> tag with DeepSeek-R1-Distill-Qwen-1.5B-Q5_K_M.gguf (#11325, closed Apr 30, 2025)
- How do I know which operator the code is computing has an error? (#12389, closed Apr 30, 2025)
- Eval bug: KV cache changes the inference results, even when context fits and no quantization (#12396, closed Apr 30, 2025)
- Eval bug: llama-qwen2vl-cli gives too short or cut response (#12408, closed Apr 30, 2025)
- failed to quantize: unknown model architecture: 'qwen3moe' (#13200, closed Apr 29, 2025)
- Misc. bug: top_k=0 Abysmal Performance (#13171, closed Apr 29, 2025)
- Eval bug: llama-bench seems to be broken (#13169, closed Apr 29, 2025)
- Misc. bug: Only using 1 compute core on AMD (#12978, closed Apr 29, 2025)
- Misc. bug: DeepSeek-R1-Distill-Qwen-1.5B-F16.gguf failed to run on Android (#13179, closed Apr 29, 2025)
- Misc. bug: Missing <think> tag in response (DeepSeek R1) (#11861, closed Apr 29, 2025)
- Eval bug: clip_model_loader: model_size not init. (#13147, closed Apr 28, 2025)
- Misc. bug: RPC server crash on `SET_TENSOR` with invalid `ggml_type` (#13067, closed Apr 28, 2025)
- llama-server bug: Prompt caching fails when editing the second user input (#13126, closed Apr 28, 2025)
- Eval bug: Gemma-3 Vision failed with CUDA (#12973, closed Apr 28, 2025)
- Misc. bug: llama-server throws "Unsupported param: tools" (#10920, closed Apr 28, 2025)
- Misc. bug: I'm seeing gibberish output (#11463, closed Apr 28, 2025)
- Eval bug: llama-cpp-deepseek-r1.jinja template will miss the <think> tag (#12107, closed Apr 28, 2025)
- Eval bug: loading model: vk::PhysicalDevice::createDevice: ErrorExtensionNotPresent. Not falling back to CPU (#12163, closed Apr 28, 2025)
- Misc. bug: convert_hf_to_gguf failing for deepseek-r1 full (#12255, closed Apr 28, 2025)
- Eval bug: Segfault in `ggml_compute_forward_dup_bytes` (#12354, closed Apr 28, 2025)
- Trojan:Script/Wacatac.B!ml warning on latest release that's six hours ago. (#12355, closed Apr 28, 2025)
- Misc. bug: llama-server command line options are ignored (#12363, closed Apr 28, 2025)
- Eval bug: [/SOLUTION] visible in granite 8B (#12384, closed Apr 28, 2025)
- Hi all, (#12385, closed Apr 28, 2025)
- Math & Code Benchmark/Testing for GGUFs (#13127, closed Apr 27, 2025)
18 Issues opened by 18 people
- Compile bug: /usr/bin/ld: test-quantize-stats.cpp:(.text+0x2cec): undefined reference to `ggml_get_f32_1d' (#13192, opened Apr 29, 2025)
- Misc. bug: rpc-server crash without cache (#13185, opened Apr 29, 2025)
- Eval bug: Qwen3, failed to parse chat template (jinja) (#13178, opened Apr 29, 2025)
- Misc. bug: llama-parallel segmentation fault (#13172, opened Apr 29, 2025)
- Compile bug: Build fails on ppc64le (#13170, opened Apr 29, 2025)
- Eval bug: Qwen3 30B A3B Q4_0 failed to run (#13168, opened Apr 29, 2025)
- Compile bug: llama-server-cuda docker image build failure (#13166, opened Apr 29, 2025)
- Eval bug: Unreadable output when using qwen2-vl model. (#13165, opened Apr 29, 2025)
- Eval bug: Qwen3 Q4_0 not working with SYCL (#13163, opened Apr 29, 2025)
- Eval bug: SIGILL (#13161, opened Apr 29, 2025)
- Misc. bug: Qwen 3.0 "enable_thinking" parameter not working (#13160, opened Apr 29, 2025)
- bug: ValueError: Architecture qwen3 not supported (#13157, opened Apr 28, 2025)
- Misc. bug: Shared libraries don't properly contain /common/ functions (#13156, opened Apr 28, 2025)
- Misc. bug: Flash Attention not working on CDNA3 ROCm 6.4 MI300 (#13145, opened Apr 28, 2025)
39 Unresolved conversations
Sometimes conversations happen on old items that aren’t yet closed. Here is a list of all the Issues and Pull Requests with unresolved conversations.
- sycl : Implemented reorder Q4_K mmvq (#13109, commented on Apr 29, 2025; 12 new comments)
- sycl : implementation of reordered Q4_0 MMVQ for Intel GPUs (#12858, commented on Apr 30, 2025; 8 new comments)
- ggml : fix more imatrix nan cases (#11773, commented on Apr 29, 2025; 1 new comment)
- `server`: inject date_string in llama 3.x template + fix date for firefunction v2 (#12802, commented on Apr 29, 2025; 1 new comment)
- Misc. bug: CUDA error: device kernel image is invalid (Quadro RTX 8000) (#12717, commented on Apr 28, 2025; 0 new comments)
- Feature Proposal: Server Model Switching at Runtime (#13027, commented on Apr 30, 2025; 0 new comments)
- Feature Request: Qwen 2.5 VL (#11483, commented on Apr 30, 2025; 0 new comments)
- llama : support Jamba hybrid Transformer-Mamba models (#7531, commented on Apr 29, 2025; 0 new comments)
- tool-call: add support for tool-calls using Model Context Protocol (#11556, commented on Apr 29, 2025; 0 new comments)
- [WIP] backend: Integrating QNN (Qualcomm AI Engine Direct) as a dedicated backend for Qualcomm NPUs (#12063, commented on Apr 27, 2025; 0 new comments)
- PR: Refine ggml-hexagon backend (Qualcomm Hexagon NPU backend) for latest ggml, whisper.cpp, llama.cpp (#12326, commented on Apr 29, 2025; 0 new comments)
- `server`: streaming of tool calls and thoughts when `--jinja` is on (#12379, commented on Apr 28, 2025; 0 new comments)
- kv-cache : separate recurrent vs non-recurrent impl (#12799, commented on Apr 30, 2025; 0 new comments)
- Llama-3_1-Nemotron-Ultra-253B-v1 support (#12843, commented on Apr 29, 2025; 0 new comments)
- ggml: Implement yield barrier using futex for improved thread scheduling efficiency (#13079, commented on Apr 30, 2025; 0 new comments)
- [CANN] Simplify the environment variable setting for GGML_CANN_MEM_POOL and GGML_CANN_ASYNC_MODE (#13104, commented on Apr 28, 2025; 0 new comments)
- [sync #10544] llama/ggml: add LLM training support (#13105, commented on Apr 30, 2025; 0 new comments)
- llama : try loading tensors with pre-computed hashes (#13106, commented on Apr 30, 2025; 0 new comments)
- context : allow cache-less context for embeddings (#13108, commented on Apr 30, 2025; 0 new comments)
- convert : improve model arch handling (#13122, commented on Apr 27, 2025; 0 new comments)
- Misc. bug: (#12623, commented on Apr 28, 2025; 0 new comments)
- Compile bug: SYCL backend build fail on debug config (#12602, commented on Apr 28, 2025; 0 new comments)
- Eval bug: Phi-4 mini in iOS with xcframework (#12232, commented on Apr 28, 2025; 0 new comments)
- Feature Request: Support Codestral Mamba (#8519, commented on Apr 28, 2025; 0 new comments)
- Misc. bug: The model's reasoning performance has significantly decreased despite using different versions of the same model architecture, identical parameters, and the same set of questions. (#12816, commented on Apr 28, 2025; 0 new comments)
- Eval bug: GLM-Z1-9B-0414 (#12946, commented on Apr 28, 2025; 0 new comments)
- Feature request: Graphical GGUF viewer (#6715, commented on Apr 28, 2025; 0 new comments)
- Feature Request: convert_hf_to_gguf.py to support model type Qwen2_5_VLForConditionalGeneration (#12642, commented on Apr 29, 2025; 0 new comments)
- Move gguf fuzzers to the llama.cpp repository (#11514, commented on Apr 29, 2025; 0 new comments)
- Compile bug: Vulkan Cross compile for arm64 (#13068, commented on Apr 29, 2025; 0 new comments)
- Eval bug: multimodal llama-gemma3-cli gives nonsensical outputs when used with Vulkan (#13046, commented on Apr 29, 2025; 0 new comments)
- Misc. bug: OpenCL: Issue with Adreno 610 (#13115, commented on Apr 29, 2025; 0 new comments)
- Feature Request: Tensor parallelism (--split-mode row) over rpc (#13083, commented on Apr 29, 2025; 0 new comments)
- server: Bring back multimodal support (#8010, commented on Apr 29, 2025; 0 new comments)
- Misc. bug: Gibberish output on AMD Ryzen 9 8945HS w/ Radeon 780M Graphics since commit 3d82dbcbce2c (#12657, commented on Apr 30, 2025; 0 new comments)
- Eval bug: Unusually high RAM usage on Windows when running DeepSeek V3 Q2_K_XL/IQ2_XXS on hybrid CPU+GPU (vs Linux) (#12651, commented on Apr 30, 2025; 0 new comments)
- Eval bug: llama-qwen2vl-cli --log-disable disables the response rather than the log (#12407, commented on Apr 30, 2025; 0 new comments)
- csm : implement Sesame-based conversation example (#12392, commented on Apr 30, 2025; 0 new comments)
- Feature Request: resize an existing context (#11577, commented on Apr 30, 2025; 0 new comments)