Skip to content

Tags: ggml-org/llama.cpp

Tags

b5223

Toggle b5223's commit message

Verified

This commit was created on GitHub.com and signed with GitHub’s verified signature.
server : Prefilling assistant message in openai compatible API (#13174)

* Prefilling assistant message in openai compatible API

* fixed indentation

* fixed code convention

* simplify method usage

* no more than one assistant message at end of messages

* merge checks into prefill code

* Update examples/server/utils.hpp

---------

Co-authored-by: matteo <matteo@naspc.lan>
Co-authored-by: Xuan-Son Nguyen <thichthat@gmail.com>

b5222

Toggle b5222's commit message

Verified

This commit was created on GitHub.com and signed with GitHub’s verified signature.
sampling : when top-k <= 0 -> noop (#13173)

ggml-ci

b5221

Toggle b5221's commit message

Verified

This commit was created on GitHub.com and signed with GitHub’s verified signature.
llama-bench: fixed size of fields to correctly map to values (#13183)

b5220

Toggle b5220's commit message

Verified

This commit was created on GitHub.com and signed with GitHub’s verified signature.
CUDA: fix non-cont. inputs for batched mat mul (#13155)

b5219

Toggle b5219's commit message

Verified

This commit was created on GitHub.com and signed with GitHub’s verified signature.
llama : llm_type order by size (#13177)

b5218

Toggle b5218's commit message

Verified

This commit was created on GitHub.com and signed with GitHub’s verified signature.
mtmd : add qwen2vl and qwen2.5vl (#13141)

* llava : add clip_n_output_tokens, deprecate clip_n_patches

* mtmd : add qwen2vl and qwen2.5vl

* decode_embd_batch::set_position_...

* working version

* deprecate llama-qwen2vl-cli

* correct order W, H of clip_embd_nbytes_by_img

* edit existing line in hot topics

b5217

Toggle b5217's commit message

Verified

This commit was created on GitHub.com and signed with GitHub’s verified signature.
llama : set qwen3 model type sizes (#13175)

b5216

Toggle b5216's commit message

Verified

This commit was created on GitHub.com and signed with GitHub’s verified signature.
llama-graph : fix text position for mrope (#13159)

* llama-graph : fix text position for mrope

* fix typo

* explicitly set 4th dim in the loop

b5215

Toggle b5215's commit message

Verified

This commit was created on GitHub.com and signed with GitHub’s verified signature.
model : Nomic Embed Text V2 with Mixture-of-Experts (MoE) architecture (

#12466)

* Nomic Embed Text V2 with Mixture-of-Experts (MoE) architecture

- Adds MoE-based embedding model supporting multilingual embeddings.
- Selects architecture variant based on hyperparameter detection (MoE layers).
- Removes unnecessary subclass initialization checks for clarity.

https://www.nomic.ai/blog/posts/nomic-embed-text-v2

Co-authored-by: Jared Van Bortel <jared@nomic.ai>

* fix tokenizer

* don't rename this tensor

---------

Co-authored-by: Jared Van Bortel <jared@nomic.ai>

b5214

Toggle b5214's commit message

Verified

This commit was created on GitHub.com and signed with GitHub’s verified signature.
clip : fix model size display (#13153)