Insights: ggml-org/llama.cpp
Overview
47 Releases published by 1 person
- b5171 published Apr 23, 2025
- b5173 published Apr 23, 2025
- b5174 published Apr 23, 2025
- b5175 published Apr 24, 2025
- b5176 published Apr 24, 2025
- b5177 published Apr 24, 2025
- b5178 published Apr 24, 2025
- b5180 published Apr 24, 2025
- b5181 published Apr 24, 2025
- b5184 published Apr 24, 2025
- b5185 published Apr 24, 2025
- b5186 published Apr 24, 2025
- b5187 published Apr 25, 2025
- b5188 published Apr 25, 2025
- b5189 published Apr 25, 2025
- b5190 published Apr 25, 2025
- b5191 published Apr 25, 2025
- b5192 published Apr 26, 2025
- b5193 published Apr 26, 2025
- b5194 published Apr 26, 2025
- b5195 published Apr 26, 2025
- b5196 published Apr 27, 2025
- b5197 published Apr 27, 2025
- b5198 published Apr 27, 2025
- b5199 published Apr 27, 2025
- b5200 published Apr 27, 2025
- b5201 published Apr 28, 2025
- b5202 published Apr 28, 2025
- b5204 published Apr 28, 2025
- b5205 published Apr 28, 2025
- b5207 published Apr 28, 2025
- b5208 published Apr 28, 2025
- b5209 published Apr 28, 2025
- b5210 published Apr 28, 2025
- b5211 published Apr 28, 2025
- b5212 published Apr 28, 2025
- b5213 published Apr 28, 2025
- b5214 published Apr 28, 2025
- b5215 published Apr 28, 2025
- b5216 published Apr 29, 2025
- b5217 published Apr 29, 2025
- b5218 published Apr 29, 2025
- b5219 published Apr 29, 2025
- b5220 published Apr 29, 2025
- b5221 published Apr 29, 2025
- b5222 published Apr 29, 2025
- b5223 published Apr 29, 2025
52 Pull requests merged by 24 people
- scripts: n_depth support for compare-llama-bench (#13201, merged Apr 29, 2025)
- Prefilling assistant message in openai compatible API (#13174, merged Apr 29, 2025)
- sampling : when top-k <= 0 -> noop (#13173, merged Apr 29, 2025)
- llama-bench: fixed size of fields to correctly map to values (#13183, merged Apr 29, 2025)
- CUDA: fix non-cont. inputs for batched mat mul (#13155, merged Apr 29, 2025)
- llama : llm_type order by size (#13177, merged Apr 29, 2025)
- mtmd : add qwen2vl and qwen2.5vl (#13141, merged Apr 29, 2025)
- llama : set qwen3 model type sizes (#13175, merged Apr 29, 2025)
- llama-graph : fix text position for mrope (#13159, merged Apr 29, 2025)
- Nomic Embed Text V2 with Mixture-of-Experts (MoE) architecture (#12466, merged Apr 28, 2025)
- clip : fix model size display (#13153, merged Apr 28, 2025)
- fix(rpc): Improve input validation and error handling (#13069, merged Apr 28, 2025)
- llama-bench: add `-d` depth arg (#13096, merged Apr 28, 2025)
- mtmd : fix glm-edge redundant token count (#13139, merged Apr 28, 2025)
- delete buffer clear as suggested (#13152, merged Apr 28, 2025)
- llama : (mrope) allow using normal 1D position for text token (#13138, merged Apr 28, 2025)
- clip : refactor set input for cgraph + fix qwen2.5vl input (#13136, merged Apr 28, 2025)
- SYCL: Add all missing unary kernels (#13074, merged Apr 28, 2025)
- readme : update hot topics (#13150, merged Apr 28, 2025)
- common : fix noreturn compile warning (#13151, merged Apr 28, 2025)
- llama-chat : fix typo GML --> GLM (#13143, merged Apr 28, 2025)
- musa: fix typo in cc control (#13144, merged Apr 28, 2025)
- CUDA: fix q_nope_absorbed precision for Deepseek 2 Lite f16 (#13137, merged Apr 28, 2025)
- arg : fix unused variable (#13142, merged Apr 28, 2025)
- llama-bench : Add `--override-tensors` arg (#12922, merged Apr 27, 2025)
- fix wrong template in GLM4-0414 (#13140, merged Apr 27, 2025)
- musa: fix build warning (#13129, merged Apr 27, 2025)
- Fixes Qwen2.5VL segfault during inference (#13133, merged Apr 27, 2025)
- Add Qwen2.5VL support (#12402, merged Apr 27, 2025)
- common : add common_remote_get_content (#13123, merged Apr 26, 2025)
- clip : improve projector naming (#13118, merged Apr 26, 2025)
- ggml: move fp16/bf16 conversion optimizations to CPU backend + export conversion APIs (#13107, merged Apr 26, 2025)
- grammar : handle maxItems == 0 in JSON schema (#13116) (#13117, merged Apr 26, 2025)
- llama : fix K-shift with quantized K and BLAS backend (#13113, merged Apr 25, 2025)
- Force FP32 compute in GLM4 FFN Down (#13101, merged Apr 25, 2025)
- clip : fix pixtral on some GPU backends (#13097, merged Apr 25, 2025)
- [SYCL][OPT] Fix reorder optimization for Q4_0 (#13003, merged Apr 25, 2025)
- rpc : do not wait for response when sending RPC_CMD_SET_TENSOR (#12943, merged Apr 25, 2025)
- clip : remove boi/eoi embeddings for GLM-edge model (⚠️ breaking change) (#13081, merged Apr 24, 2025)
- embeddings : fix batch sizes (#13076, merged Apr 24, 2025)
- sync : ggml (#13098, merged Apr 24, 2025)
- CUDA: use switch statements in constexpr functions (#13095, merged Apr 24, 2025)
- cmake : do not include ./src as public for libllama (#13062, merged Apr 24, 2025)
- clang-tidy : disable warning about missing math parenthesis (#13091, merged Apr 24, 2025)
- arg : add --no-mmproj-offload (#13093, merged Apr 24, 2025)
- arg : clean up handling --mmproj with -hf (#13082, merged Apr 24, 2025)
- metal : fix floating-point range of attention scores in FA kernels (#13090, merged Apr 24, 2025)
- vulkan: matmul gcn tuning (#13016, merged Apr 24, 2025)
- llama-mtmd-cli: Sigint rework in mtmd vision example (#13080, merged Apr 23, 2025)
- mtmd : Support Pixtral 12B (#13065, merged Apr 23, 2025)
- Append mult-eos,half-rope,bos to GLM4-0414 and Z (#13021, merged Apr 23, 2025)
- rpc : add command line option for number of threads for the CPU backend (#13060, merged Apr 23, 2025)
24 Pull requests opened by 15 people
- ggml: Implement yield barrier using futex for improved thread scheduling efficiency (#13079, opened Apr 23, 2025)
- [CANN] Simplify the environment variable setting for GGML_CANN_MEM_POOL and GGML_CANN_ASYNC_MODE (#13104, opened Apr 25, 2025)
- [sync #10544] llama/ggml: add LLM training support (#13105, opened Apr 25, 2025)
- llama : try loading tensors with pre-computed hashes (#13106, opened Apr 25, 2025)
- context : allow cache-less context for embeddings (#13108, opened Apr 25, 2025)
- sycl : Implemented reorder Q4_K mmvq (#13109, opened Apr 25, 2025)
- convert : improve model arch handling (#13122, opened Apr 26, 2025)
- CUDA: build archs as virtual for GGML_NATIVE=OFF (#13135, opened Apr 27, 2025)
- PowerPC: Enable MMA for BF16 in llamafile_sgemm (#13148, opened Apr 28, 2025)
- musa: enable MMA (#13149, opened Apr 28, 2025)
- common: Ensure libcommon.so is build if BUILD_SHARED_LIBS=ON (#13156) (#13158, opened Apr 28, 2025)
- [CANN] Update CANN model support status (#13162, opened Apr 29, 2025)
- fix(rpc): validate graph operands (#13167, opened Apr 29, 2025)
- Fix for issue #13170 (#13176, opened Apr 29, 2025)
- ggml-cpu: enable z17 compile detection (#13182, opened Apr 29, 2025)
- mtmd : add C public API (#13184, opened Apr 29, 2025)
- test: non-cont. b in test-backend-ops -o MUL_MAT (#13187, opened Apr 29, 2025)
- rpc : fix cache directory initialization (#13188, opened Apr 29, 2025)
- vulkan: Handle src1 batch dimension in non-contiguous mat-vec-mul shader (#13191, opened Apr 29, 2025)
- vulkan: use uint array index to avoid glslang bug (#13193, opened Apr 29, 2025)
- kv-cache : add SWA support (#13194, opened Apr 29, 2025)
- [RFC] handling jinja extra template kwargs (Qwen3 enable_thinking feature) (#13196, opened Apr 29, 2025)
- CUDA: batched+noncont MMQ, refactor bs>1 MoE code (#13199, opened Apr 29, 2025)
- arg : allow using -hf offline (#13202, opened Apr 29, 2025)
52 Issues closed by 15 people
- Eval bug: <think> tag with DeepSeek-R1-Distill-Qwen-1.5B-Q5_K_M.gguf (#11325, closed Apr 30, 2025)
- How do I know which operator the code is computing has an error? (#12389, closed Apr 30, 2025)
- Eval bug: KV cache changes the inference results, even when context fits and no quantization (#12396, closed Apr 30, 2025)
- Eval bug: llama-qwen2vl-cli gives too short or cut response (#12408, closed Apr 30, 2025)
- failed to quantize: unknown model architecture: 'qwen3moe' (#13200, closed Apr 29, 2025)
- Misc. bug: top_k=0 Abysmal Performance (#13171, closed Apr 29, 2025)
- Eval bug: llama-bench seems to be broken (#13169, closed Apr 29, 2025)
- Misc. bug: Only using 1 compute core on AMD (#12978, closed Apr 29, 2025)
- Misc. bug: DeepSeek-R1-Distill-Qwen-1.5B-F16.gguf failed to run on Android (#13179, closed Apr 29, 2025)
- Misc. bug: Missing <think> tag in response (DeepSeek R1) (#11861, closed Apr 29, 2025)
- Eval bug: clip_model_loader: model_size not init. (#13147, closed Apr 28, 2025)
- Misc. bug: RPC server crash on `SET_TENSOR` with invalid `ggml_type` (#13067, closed Apr 28, 2025)
- llama-server bug: Prompt caching fails when editing the second user input (#13126, closed Apr 28, 2025)
- Eval bug: Gemma-3 Vision failed with CUDA (#12973, closed Apr 28, 2025)
- Misc. bug: llama-server throws "Unsupported param: tools" (#10920, closed Apr 28, 2025)
- Misc. bug: I'm seeing gibberish output (#11463, closed Apr 28, 2025)
- Eval bug: llama-cpp-deepseek-r1.jinja template will miss the <think> tag (#12107, closed Apr 28, 2025)
- Eval bug: loading model: vk::PhysicalDevice::createDevice: ErrorExtensionNotPresent. Not falling back to CPU (#12163, closed Apr 28, 2025)
- Misc. bug: convert_hf_to_gguf failing for deepseek-r1 full (#12255, closed Apr 28, 2025)
- Eval bug: Segfault in `ggml_compute_forward_dup_bytes` (#12354, closed Apr 28, 2025)
- Trojan:Script/Wacatac.B!ml warning on latest release that's six hours ago. (#12355, closed Apr 28, 2025)
- Misc. bug: llama-server command line options are ignored (#12363, closed Apr 28, 2025)
- Eval bug: [/SOLUTION] visible in granite 8B (#12384, closed Apr 28, 2025)
- Hi all, (#12385, closed Apr 28, 2025)
- Math & Code Benchmark/Testing for GGUFs (#13127, closed Apr 27, 2025)
- Misc. bug: server not exit after `missing result_output tensor` error (#11808, closed Apr 27, 2025)
- Misc. bug: Native API failed. Native API returns: 20 (UR_RESULT_ERROR_DEVICE_LOST) (#11812, closed Apr 27, 2025)
- Misc. bug: JSON schema that defines array with 0 elements generates un-parseable GBNF (#13116, closed Apr 26, 2025)
- Misc. bug: Vulcan premature out of memory exception on AMD Instinct MI60 (#11598, closed Apr 26, 2025)
- Misc. bug: llama-server: SegFault with json_schema containing unsupported pattern (#12252, closed Apr 26, 2025)
- Misc. bug: malloc error in #8 0x00007fffda1ea8ec in unicode_regex_split (#12335, closed Apr 26, 2025)
- GGUF to MLX conversion? (#12349, closed Apr 26, 2025)
- Compile bug: Flash attention with RocmWMMA 6.4 seems to be broken on gfx1201 (#13110, closed Apr 25, 2025)
- Misc. bug: Unsupported op "CPY" / SIGABRT on Apple CPU (#13112, closed Apr 25, 2025)
- Eval bug: load converted deepseek-r1-bf16 gguf, print many ":1case" (#13094, closed Apr 25, 2025)
- Regarding llama-bench and llama-parallel commands (#12106, closed Apr 25, 2025)
- Research: Benchmarking DeepSeek-R1 IQ1_S 1.58bit (#11474, closed Apr 25, 2025)
- Compile bug: undefined reference to `ggml_set_f32_nd' (#12281, closed Apr 25, 2025)
- Misc. bug: llama-embedding asserts: GGML_ASSERT(params.n_batch >= params.n_ctx); (#12860, closed Apr 24, 2025)
- Eval bug: HIP: llama.cpp server locks up when running multiple instances on the same gpu (#12991, closed Apr 24, 2025)
- CUDA compiler out of heap space (#13086, closed Apr 24, 2025)
- Deleted (#13089, closed Apr 24, 2025)
- Misc. bug: Sporadic MUL_MAT Failures in test-backend-ops for Nvidia backend (#11972, closed Apr 24, 2025)
- Misc. bug: Web-UI now unusably slow - over network or locally. (#12026, closed Apr 24, 2025)
- Feature Request: Support for Phi4MMForCausalLM Architecture (#12117, closed Apr 24, 2025)
- Feature Request: Add support for InstellaForCausalLM model architecture (#12270, closed Apr 24, 2025)
- [CANN] bug: Test results on CANN support for Deepseek-V3/R1 (#12324, closed Apr 24, 2025)
- Misc. bug: CPU Usage low in rpc-server mode (#13051, closed Apr 23, 2025)
- Why is a 16 core CPU slower than an 8-core CPU? (#13075, closed Apr 23, 2025)
31 Issues opened by 29 people
- Compile bug: /usr/bin/ld: test-quantize-stats.cpp:(.text+0x2cec): undefined reference to `ggml_get_f32_1d' (#13192, opened Apr 29, 2025)
- Misc. bug: rpc-server crash without cache (#13185, opened Apr 29, 2025)
- Eval bug: Qwen3, failed to parse chat template (jinja) (#13178, opened Apr 29, 2025)
- Misc. bug: llama-parallel segmentation fault (#13172, opened Apr 29, 2025)
- Compile bug: Build fails on ppc64le (#13170, opened Apr 29, 2025)
- Eval bug: Qwen3 30B A3B Q4_0 failed to run (#13168, opened Apr 29, 2025)
- Compile bug: llama-server-cuda docker image build failure (#13166, opened Apr 29, 2025)
- Eval bug: Unreadable output when using qwen2-vl model. (#13165, opened Apr 29, 2025)
- Eval bug: Qwen3 Q4_0 not working with SYCL (#13163, opened Apr 29, 2025)
- Eval bug: SIGILL (#13161, opened Apr 29, 2025)
- Misc. bug: Qwen 3.0 "enable_thinking" parameter not working (#13160, opened Apr 29, 2025)
- bug: ValueError: Architecture qwen3 not supported (#13157, opened Apr 28, 2025)
- Misc. bug: Shared libraries don't properly contain /common/ functions (#13156, opened Apr 28, 2025)
- Misc. bug: Flash Attention not working on CDNA3 ROCm 6.4 MI300 (#13145, opened Apr 28, 2025)
- Plamo arch no longer working (#13130, opened Apr 27, 2025)
- Feature Request: Allow `-hf` to be used offline (#13128, opened Apr 26, 2025)
- Feature Request: Add C api for mtmd (#13124, opened Apr 26, 2025)
- Eval bug: EXAONE fails to run with quantized KV cache (#13121, opened Apr 26, 2025)
- Misc. bug: OpenCL: Issue with Adreno 610 (#13115, opened Apr 25, 2025)
- Feature Request: Kimi-Audio-7B (#13114, opened Apr 25, 2025)
- Feature Request: define key bindings for quick deletion of the previous conversation. (#13111, opened Apr 25, 2025)
- Misc. bug: Retrieval sample not decoding token successfully (#13102, opened Apr 24, 2025)
- Eval: HIP: Llama-server multi-instance lockup (#13100, opened Apr 24, 2025)
- Eval bug: Flash Attention not working with NVIDIA GeForce RTX 4060 Ti (#13092, opened Apr 24, 2025)
- Eval bug: llama-server stays in unresponsive state- CUDA error: out of memory - (#13085, opened Apr 23, 2025)
- Feature Request: Tensor paralellism (--split-mode row) over rpc (#13083, opened Apr 23, 2025)
- Refactor: (clip.cpp) identify and regroup pre-processing strategies (#13077, opened Apr 23, 2025)
84 Unresolved conversations
Sometimes conversations happen on old items that aren’t yet closed. Here is a list of all the Issues and Pull Requests with unresolved conversations.
- sycl : implementation of reordered Q4_0 MMVQ for Intel GPUs (#12858, commented on Apr 30, 2025 • 9 new comments)
- server : (experimental) vision support via libmtmd (#12898, commented on Apr 26, 2025 • 5 new comments)
- ggml-cpu: Integrate fp32=bf16xbf16 SME KleidiAI kernel (#13053, commented on Apr 25, 2025 • 4 new comments)
- ggml : fix more imatrix nan cases (#11773, commented on Apr 29, 2025 • 1 new comment)
- (draft) tts: Orpheus support (#12487, commented on Apr 25, 2025 • 1 new comment)
- `server`: inject date_string in llama 3.x template + fix date for firefunction v2 (#12802, commented on Apr 29, 2025 • 1 new comment)
- tts : implement sesame CSM + Mimi decoder (#12648, commented on Apr 25, 2025 • 1 new comment)
- Reduce enum sizes some are used in structs, which allowed them to be optimized. (#13071, commented on Apr 25, 2025 • 0 new comments)
- llama/ggml: add LLM training support (#10544, commented on Apr 23, 2025 • 0 new comments)
- rpc : copy tensors across servers (#8032, commented on Apr 23, 2025 • 0 new comments)
- llama : support Jamba hybrid Transformer-Mamba models (#7531, commented on Apr 29, 2025 • 0 new comments)
- Feature Proposal: Server Model Switching at Runtime (#13027, commented on Apr 30, 2025 • 0 new comments)
- Feature Request: resize an existing context (#11577, commented on Apr 30, 2025 • 0 new comments)
- csm : implement Sesame-based conversation example (#12392, commented on Apr 30, 2025 • 0 new comments)
- Eval bug: llama-qwen2vl-cli --log-disable rather disables the response, not the log (#12407, commented on Apr 30, 2025 • 0 new comments)
- Eval bug: Unusual high RAM usage on Windows when running DeepSeek V3 Q2_K_XL/IQ2_XXS, on Hybrid CPU+GPU (vs Linux). (#12651, commented on Apr 30, 2025 • 0 new comments)
- Misc. bug: Gibbersish output on AMD Ryzen 9 8945HS w/ Radeon 780M Graphics since commit: 3d82dbcbce2c (#12657, commented on Apr 30, 2025 • 0 new comments)
- server: Bring back multimodal support (#8010, commented on Apr 29, 2025 • 0 new comments)
- Eval bug: multimodal llama-gemma3-cli gives nonsensical outputs when used with Vulkan (#13046, commented on Apr 29, 2025 • 0 new comments)
- Compile bug: Vulkan Cross compile for arm64 (#13068, commented on Apr 29, 2025 • 0 new comments)
- Move gguf fuzzers to the llama.cpp repository (#11514, commented on Apr 29, 2025 • 0 new comments)
- Feature Request: convert_hf_to_gguf.py to support model type Qwen2_5_VLForConditionalGeneration (#12642, commented on Apr 29, 2025 • 0 new comments)
- Feature request: Graphical GGUF viewer (#6715, commented on Apr 28, 2025 • 0 new comments)
- Eval bug: GLM-Z1-9B-0414 (#12946, commented on Apr 28, 2025 • 0 new comments)
- Misc. bug: The model's reasoning performance has significantly decreased despite using different versions of the same model architecture, identical parameters, and the same set of questions. (#12816, commented on Apr 28, 2025 • 0 new comments)
- Fix ChatGLMModel for glm-4-9b cannot find tokenizer merges in model file (#13058, commented on Apr 23, 2025 • 0 new comments)
- [CANN]Support OP MUL_MAT_ID (#13042, commented on Apr 27, 2025 • 0 new comments)
- quantize: Handle user-defined pruning of whole layers (blocks) (#13037, commented on Apr 23, 2025 • 0 new comments)
- gguf-py : avoid requiring PySide6 for packaged scripts (#13036, commented on Apr 24, 2025 • 0 new comments)
- Nix portability improvements (#13005, commented on Apr 25, 2025 • 0 new comments)
- sycl: use DNN in the first part of ggml_sycl_mul_mat_batched_sycl (#12972, commented on Apr 23, 2025 • 0 new comments)
- Llama-3_1-Nemotron-Ultra-253B-v1 support (#12843, commented on Apr 29, 2025 • 0 new comments)
- kv-cache : separate recurrent vs non-recurrent impl (#12799, commented on Apr 29, 2025 • 0 new comments)
- opencl: fix couple crashes (#12795, commented on Apr 24, 2025 • 0 new comments)
- Support for OuteTTS 1.0 (#12794, commented on Apr 25, 2025 • 0 new comments)
- opencl: Add support for multiple devices (#12622, commented on Apr 24, 2025 • 0 new comments)
- `server`: streaming of tool calls and thoughts when `--jinja` is on (#12379, commented on Apr 28, 2025 • 0 new comments)
- PR: Refine ggml-hexagon backend(Qualcomm Hexagon NPU backend) for latest ggml,whisper.cpp,llama.cpp (#12326, commented on Apr 29, 2025 • 0 new comments)
- [WIP]backend: Integrating QNN (Qualcomm AI Engine Direct) as a dedicated backend for Qualcomm NPUs (#12063, commented on Apr 27, 2025 • 0 new comments)
- tool-call: add support for tool-calls using Model Context Protocol (#11556, commented on Apr 29, 2025 • 0 new comments)
- Allow user to compile with any cuda version using github actions (#10928, commented on Apr 25, 2025 • 0 new comments)
- Fix compilation on Pop!_OS 22.04 LTS CUDA (#10835, commented on Apr 25, 2025 • 0 new comments)
- Eval bug: allocating 114296.55 MiB on device 0: cudaMalloc failed: out of memory (#12586, commented on Apr 26, 2025 • 0 new comments)
- Feature Request: Add kv-quant fa kernel variants for head sizes other than 128 (#12989, commented on Apr 25, 2025 • 0 new comments)
- Eval bug: Loading fail on Gemma 3:12b > llama_model_load: error loading model: error loading model hyperparameters: key not found in model: gemma3.attention.layer_norm_rms_epsilon (#12367, commented on Apr 25, 2025 • 0 new comments)
- Misc. bug: [SERVER] Multiple slots, generation speed is degraded after each generation/slot used (#10860, commented on Apr 25, 2025 • 0 new comments)
- Eval bug: Gemma 3 extremly slow prompt processing when using quantized kv cache. (#12352, commented on Apr 25, 2025 • 0 new comments)
- Misc. bug: auto scroll doesn't work in WebUI (#12362, commented on Apr 25, 2025 • 0 new comments)
- Eval bug: inference of 32B eats too much memory on ROCM HIP (5x AMD Radeon Instinct Mi50 (gfx906)) (#12369, commented on Apr 25, 2025 • 0 new comments)
- Misc. bug: vulkan: performance regression after fd123cfead49eb32e386e26b8ef7a6d41554dda5 (#12553, commented on Apr 25, 2025 • 0 new comments)
- Eval bug: Using llama-llava-clip-quantize-cli under CUDA backend conditions will encounter a crash. (#12564, commented on Apr 25, 2025 • 0 new comments)
- Misc. bug: Vulkan performance depends on thread priority (#12976, commented on Apr 24, 2025 • 0 new comments)
- Eval bug: model producing gibberish for Orion14b-chat (#12411, commented on Apr 24, 2025 • 0 new comments)
- Compile bug: common.cuh(3): fatal error c1083 cannot open include file: "ggml.h" : No such file or directory (#13073, commented on Apr 24, 2025 • 0 new comments)
- Misc. bug: llama-cli (vulkan backend) output gibberish with old vulkan sdk (#13044, commented on Apr 24, 2025 • 0 new comments)
- Feature Request: Prefix assistant answer (#11536, commented on Apr 24, 2025 • 0 new comments)
- Feature Request: allow mmap to take advantage of hugepage feature which has 10x speedup (#12444, commented on Apr 24, 2025 • 0 new comments)
- Misc. bug: Flash attention on Vulkan (#12526, commented on Apr 24, 2025 • 0 new comments)
- Eval bug: seemed it cannot convert theQwen2.5-VL-7B-Instruct, please help advice, Thank you. (#12534, commented on Apr 24, 2025 • 0 new comments)
- Eval bug: crash when pooling_type == LLAMA_POOLING_TYPE_MEAN (#12543, commented on Apr 24, 2025 • 0 new comments)
- server : add support for file upload to the Web UI (#11611, commented on Apr 23, 2025 • 0 new comments)
- Add a new `llama_load_model_from_buffer()` method to compliment `llama_load_model_from_file()` (#6311, commented on Apr 23, 2025 • 0 new comments)
- Model Repeats Nonsensical Output (#13066, commented on Apr 23, 2025 • 0 new comments)
- Feature Request: Support Codestral Mamba (#8519, commented on Apr 28, 2025 • 0 new comments)
- Eval bug: Phi-4 mini in iOS with xcframework (#12232, commented on Apr 28, 2025 • 0 new comments)
- Compile bug: SYCL backend build fail on debug config (#12602, commented on Apr 28, 2025 • 0 new comments)
- Misc. bug: (#12623, commented on Apr 28, 2025 • 0 new comments)
- Misc. bug: CUDA error: device kernel image is invalid (Quadro RTX 8000) (#12717, commented on Apr 28, 2025 • 0 new comments)
- Misc. bug: Crashing, forcing BMI2 on non BMI2 CPUs (#12500, commented on Apr 27, 2025 • 0 new comments)
- Eval bug: run failed when run lora adapter(no merged) on android (#12592, commented on Apr 27, 2025 • 0 new comments)
- [New Bitnet Model Support Request] Deepgrove model Bonsai 0.5B - Add Channel Scales (#12598, commented on Apr 27, 2025 • 0 new comments)
- Feature Request: Add "trust_remote_code support" to 'convert_hf_to_gguf.py' for compatibility with modern HF models (#12610, commented on Apr 27, 2025 • 0 new comments)
- Misc. bug: Data check in examples/gguf (#12617, commented on Apr 27, 2025 • 0 new comments)
- Suport for Jamba JambaForCausalLM (#6372, commented on Apr 26, 2025 • 0 new comments)
- Feature Request: Add support for Kokoro TTS (#11050, commented on Apr 26, 2025 • 0 new comments)
- Support Hybrid Models (#12331, commented on Apr 26, 2025 • 0 new comments)
- Compile bug: Linux with CUDA 12.6 (#11696, commented on Apr 26, 2025 • 0 new comments)
- Eval bug: CPU usage is abnormal when running deepseek-r1-671B-Q4_0 weights in Atlas 800T a2 and NPU device (#11966, commented on Apr 26, 2025 • 0 new comments)
- Enhancement: Improve ROCm performance on various quants (benchmarks included) (#11931, commented on Apr 26, 2025 • 0 new comments)
- Eval bug: the swiftui keeps saying the same thing (#12558, commented on Apr 26, 2025 • 0 new comments)
- Misc. bug: performance drop with 2x SYCL GPUs (#12575, commented on Apr 26, 2025 • 0 new comments)
- -ngl to load ·last n layers· to gpu (#12577, commented on Apr 26, 2025 • 0 new comments)
- Qwen2.5-vl support and conversion? (#12584, commented on Apr 26, 2025 • 0 new comments)
- Compile bug: vulkan-shaders-gen hangs when built with address sanitizers (#12581, commented on Apr 26, 2025 • 0 new comments)