# Install llama.cpp
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
scl enable gcc-toolset-13 bash # GCC 13 is required
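Once inside the toolset shell, a quick sanity check (this assumes an RHEL/CentOS-family system using Software Collections):
gcc --version # should report 13.x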
# CPU-only build
cmake -B build
cmake --build build --config Release
# GPU (CUDA) build, in a separate directory so it does not overwrite the CPU build
cmake -B build_gpu -DGGML_CUDA=ON
cmake --build build_gpu --config Release
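To verify that a build succeeded, list the produced binaries and print the version string (shown for the CPU build; substitute build_gpu for the CUDA build):
ls build/bin/
build/bin/llama-cli --version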
Convert the Hugging Face model to GGUF:
python convert_hf_to_gguf.py /data/qwen3b/Qwen/Qwen2___5-3B-Instruct/ --outtype f16 --outfile qwen2.5-3b-instruct-f16.gguf
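Before quantizing, the f16 GGUF can be smoke-tested with llama-cli (the prompt and token count here are arbitrary illustrative values):
build/bin/llama-cli -m qwen2.5-3b-instruct-f16.gguf -p "Hello, introduce yourself." -n 64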
Sometimes it is preferable to use fp32 as the starting point for quantization. In that case, use:
python convert_hf_to_gguf.py /data/qwen3b/Qwen/Qwen2___5-3B-Instruct/ --outtype f32 --outfile qwen2.5-3b-instruct-f32.gguf
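Since fp32 stores 4 bytes per weight versus 2 for fp16, expect the f32 file to be roughly twice the size of the f16 one; a quick check:
ls -lh qwen2.5-3b-instruct-*.gguf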
Quantize to Q8_0:
build_gpu/bin/llama-quantize qwen2.5-3b-instruct-f16.gguf qwen2.5-3b-instruct-q8_0.gguf Q8_0
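The quantized model can then be run interactively or served over HTTP (the port is arbitrary; -ngl 99 offloads all layers to the GPU and only applies to the CUDA build):
build_gpu/bin/llama-cli -m qwen2.5-3b-instruct-q8_0.gguf -ngl 99
# or serve it:
build_gpu/bin/llama-server -m qwen2.5-3b-instruct-q8_0.gguf --port 8080 -ngl 99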