YOLOv3 Quantization

YOLOv3 Int8 Quantization

YOLOv3 PyTorch code walkthrough

The PyTorch implementation of YOLOv3 used here: https://github.com/bubbliiiing/yolo3-pytorch/tree/bilibili

YOLOv3-Pytorch

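img: images used for detection.
logs: weight (checkpoint) files.
model_data: class information for the various datasets.
nets: network model definitions:
- __init__.py
- darknet.py: defines the residual block and assembles Darknet53 from convolution, BN, LeakyReLU and residual connections.
- yolo_training.py:
  - YOLOLoss:
    - combines the loss terms
    - decodes the network predictions and measures how much they overlap with the ground truth; predictions that overlap too much are ignored, because those feature points already predict well and are unsuitable as negative samples.
    - computes the loss
    - computes IoU
  - weights_init: weight initialization
- yolo.py:
  - conv2d: bundles convolution, BN and the activation (LeakyReLU) into one module
  - make_last_layers: seven convolutions; the first five extract features and the last two produce the YOLO predictions
  - YoloBody: defines the backbone, loads the weight file, produces the model output tensors, computes the number of yolo_head output channels (75 for VOC, i.e. 3 × (5 + 20)), and outputs three yolo_heads at different scales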
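utils:
- __init__.py
- callbacks.py: used when plotting the loss curves.
- dataloader.py: reads the dataset and applies data augmentation.
- utils_bbox.py: DecodeBox:
  - decodes the network output by adjusting the anchors (prior boxes): the anchor center, which starts at the top-left corner of its grid cell, is first shifted toward the lower right, and then the anchor width and height are rescaled.
  - non-maximum suppression.
- utils_fit.py: backpropagation and loss logging.
- utils_map.py: computes mAP and draws plots.
- utils.py:
  - image conversion
  - resize
  - loading classes, anchors and the learning rate.
VOCdevkit: the dataset. The train, val and test splits are all included and are distinguished by txt files; the complete VOC2007 dataset is present.
get_map.py: outputs the mAP.
kmeans_for_anchors.py: computes the best-fitting anchors.
predict.py: runs prediction on images, videos or a folder.
summary.py: prints the network structure.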
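Two of the numbers above can be checked with a little arithmetic: the yolo_head channel count (75 for VOC) and how DecodeBox turns a raw prediction into a box. The sketch below uses the standard YOLOv3 formulas in plain Python; it is not the repository's DecodeBox code, and the strides (32, 16, 8) and the 116×90 anchor are the usual YOLOv3 defaults, assumed here.

```python
import math


def head_channels(anchors_per_scale: int, num_classes: int) -> int:
    # each anchor predicts 4 box offsets + 1 objectness score + num_classes class scores
    return anchors_per_scale * (5 + num_classes)


def grid_sizes(input_size: int, strides=(32, 16, 8)) -> list:
    # the three YOLOv3 heads downsample the input by 32, 16 and 8
    return [input_size // s for s in strides]


def sigmoid(v: float) -> float:
    return 1.0 / (1.0 + math.exp(-v))


def decode_box(tx, ty, tw, th, cx, cy, anchor_w, anchor_h, stride):
    """Decode one raw prediction into (center x, center y, w, h) in input pixels.

    (cx, cy) is the top-left corner of the grid cell; anchor_w / anchor_h are in input pixels.
    """
    bx = (sigmoid(tx) + cx) * stride  # center moves right/down from the cell corner by 0..1 cells
    by = (sigmoid(ty) + cy) * stride
    bw = anchor_w * math.exp(tw)      # width and height rescale the anchor
    bh = anchor_h * math.exp(th)
    return bx, by, bw, bh


print(head_channels(3, 20))  # VOC: 3 * (5 + 20)
print(grid_sizes(416))       # feature-map sizes for a 416x416 input
print(decode_box(0.0, 0.0, 0.0, 0.0, cx=6, cy=6, anchor_w=116, anchor_h=90, stride=32))
```

With all raw outputs at zero the box sits in the middle of its cell and keeps the anchor's size, which is a quick sanity check for a decoding implementation.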

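Int8 Quantization

Quantization is a technique that reduces model size and speeds up inference by converting floating-point numbers to integers. At inference time, the operators that support it are converted from float32 to int8, which compresses the parameters, reduces memory usage and increases inference speed, at the cost of some accuracy.

Depending on whether the quantized values represent the original range uniformly, quantization is either linear or non-linear. Non-linear quantization is hard to accelerate on general-purpose hardware and more complex to implement, so linear quantization is far more common. Linear quantization is further divided, by whether the floating-point zero maps to the quantized zero, into symmetric quantization (Symmetric) and asymmetric quantization (Asymmetric).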
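To make the symmetric/asymmetric distinction concrete, here is a minimal plain-Python sketch of linear quantization, assuming uint8 [0, 255] for the asymmetric case and int8 for the symmetric case. The function names are illustrative, not PyTorch APIs.

```python
def affine_qparams(x_min, x_max, qmin=0, qmax=255):
    """Asymmetric (affine): map [x_min, x_max] onto [qmin, qmax] with an integer zero_point."""
    # widen the range to include 0.0 so that zero is represented exactly
    x_min, x_max = min(x_min, 0.0), max(x_max, 0.0)
    scale = (x_max - x_min) / (qmax - qmin)
    zero_point = qmin - round(x_min / scale)
    return scale, max(qmin, min(qmax, zero_point))


def symmetric_qparams(x_min, x_max, qmax=127):
    """Symmetric: zero_point is fixed at 0 and the range is [-max|x|, max|x|]."""
    return max(abs(x_min), abs(x_max)) / qmax, 0


def quantize(x, scale, zero_point, qmin, qmax):
    # q = round(x / scale) + zero_point, clamped to the integer range
    return max(qmin, min(qmax, round(x / scale) + zero_point))


def dequantize(q, scale, zero_point):
    # x is approximately (q - zero_point) * scale
    return (q - zero_point) * scale


# hypothetical activation range [-1.0, 3.0]
scale, zp = affine_qparams(-1.0, 3.0)
q = quantize(0.5, scale, zp, 0, 255)
print(scale, zp, q, dequantize(q, scale, zp))

s_scale, s_zp = symmetric_qparams(-1.0, 3.0)
print(s_scale, s_zp, quantize(0.5, s_scale, s_zp, -128, 127))
```

The asymmetric scheme spends all 256 levels on [-1, 3]; the symmetric scheme covers [-3, 3] and so wastes a third of its range on [-3, -1), in exchange for a zero_point of 0 that simplifies the integer arithmetic.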

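Quantization granularity:

Per tensor: all values in a tensor are scaled and offset the same way (one scale and one zero_point).
Per channel: finer-grained. Values along one dimension of the tensor (usually the channel dimension) each get their own scale and offset, so a tensor carries a vector of scales and offsets. This introduces less quantization error than the per-tensor approach.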
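A small plain-Python comparison of the two granularities, assuming symmetric int8 weights and a hypothetical two-channel weight whose channels have very different ranges. Per-channel scales are taken along the output-channel axis (axis 0), the axis PyTorch's per-channel weight observers use by default.

```python
def symmetric_scale(values, qmax=127):
    # symmetric int8: the largest magnitude maps to qmax
    return max(abs(v) for v in values) / qmax


def per_tensor_scale(weight):
    # one scale shared by every value in the tensor
    return symmetric_scale([v for channel in weight for v in channel])


def per_channel_scales(weight):
    # one scale per output channel (axis 0)
    return [symmetric_scale(channel) for channel in weight]


def mean_abs_error(channel, scale, qmin=-128, qmax=127):
    # average round-trip error |dequantize(quantize(v)) - v| over one channel
    total = 0.0
    for v in channel:
        q = max(qmin, min(qmax, round(v / scale)))
        total += abs(q * scale - v)
    return total / len(channel)


# hypothetical weight: channel 0 is tiny, channel 1 is large
weight = [[0.01, -0.02, 0.015], [2.0, -1.5, 0.7]]

t_scale = per_tensor_scale(weight)
c_scales = per_channel_scales(weight)
per_tensor_err = [mean_abs_error(ch, t_scale) for ch in weight]
per_channel_err = [mean_abs_error(ch, s) for ch, s in zip(weight, c_scales)]
print(per_tensor_err)
print(per_channel_err)
```

With one shared scale, the small channel 0 collapses onto just a couple of integer levels and its error is far larger; per-channel scales keep both channels accurate, while channel 1 is unaffected because its scale is the same either way.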

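PyTorch supports two quantization modes:

Eager Mode Quantization: a beta feature. You must manually specify where quantization and dequantization happen, and only modules, not functionals, are supported.
FX Graph Mode Quantization:
- A more automated approach: many manual steps (such as inserting quantization/dequantization points and fusing modules) are handled for you, and functionals are supported.
- It symbolically traces the model, so it cannot be used with arbitrary models: some models are not traceable.
- FX is a PyTorch toolkit that converts a model into a graph representation, on which transformations such as quantization are then performed.
- It is often more efficient than eager mode because it can fuse multiple operations together, reducing computation time and memory usage.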

Concretely, quantization methods fall into two categories, post-training quantization and quantization-aware training:

Post Training Quantization (PTQ)
Quantization Aware Training (QAT)

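Post Training Quantization

PTQ quantizes an already-trained model directly, with no complex fine-tuning or retraining, so its overhead is small.
PTQ needs no data, or only a small amount, to drive quantization, which makes it a good fit for data-sensitive scenarios. However, its accuracy drop can be larger than that of quantization-aware training.
PTQ is further divided into static and dynamic quantization.
- Static quantization (Post Training Static Quantization): weights and activations are both quantized ahead of time, and there is a calibration stage.
  - The quantization parameters for weights and activations are computed offline and used unchanged at inference time.
  - Quantizing activations requires knowing their distribution.
  - Static quantization therefore needs some data to run through the network, collecting activation statistics that determine the quantization parameters.
- Dynamic quantization (Post Training Dynamic Quantization): weights are quantized ahead of time, while the activation range is collected during inference to determine the scale, so there is no calibration stage. It is typically used for NLP models containing LSTM, GRU or RNN layers.
  - Activation quantization parameters are computed on the fly at inference time. Because they follow each actual input, accuracy can be better, but the extra computation adds inference overhead.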
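The difference between the two can be sketched in plain Python: static quantization calibrates once and freezes the scale, dynamic quantization recomputes it for every input. SimpleMinMaxObserver below is a simplified stand-in inspired by PyTorch's MinMaxObserver, not the real class, and the activation values are made up.

```python
def minmax_qparams(x_min, x_max, qmin=0, qmax=255):
    # affine uint8 qparams from an observed range (widened to include 0.0)
    lo, hi = min(x_min, 0.0), max(x_max, 0.0)
    scale = (hi - lo) / (qmax - qmin)
    zero_point = qmin - round(lo / scale)
    return scale, zero_point


class SimpleMinMaxObserver:
    """Tracks the running min/max of every batch it sees during calibration."""

    def __init__(self):
        self.min_val = float("inf")
        self.max_val = float("-inf")

    def __call__(self, batch):
        self.min_val = min(self.min_val, min(batch))
        self.max_val = max(self.max_val, max(batch))

    def qparams(self):
        return minmax_qparams(self.min_val, self.max_val)


# static: calibrate once on representative batches, then freeze the result
calibration_batches = [[0.1, 0.8, 1.9], [0.0, 2.4, 0.3], [0.5, 1.2, 2.0]]  # made-up activations
observer = SimpleMinMaxObserver()
for batch in calibration_batches:
    observer(batch)
static_scale, static_zp = observer.qparams()  # reused unchanged for every inference input


# dynamic: no calibration; qparams are recomputed from each input at inference time
def dynamic_qparams(batch):
    return minmax_qparams(min(batch), max(batch))


dyn_scale, dyn_zp = dynamic_qparams([0.0, 0.6, 1.2])
print(static_scale, static_zp, dyn_scale, dyn_zp)
```

For this input the dynamic scale is smaller, i.e. finer resolution, which is where dynamic quantization's accuracy advantage comes from; the price is computing min/max on every call.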

Operators supported by static and dynamic quantization:

(Figure: supported operator types)

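Quantization Aware Training

Insert fake-quantization operators into the trained model.
They quantize and then dequantize values, simulating the error that quantization introduces.
Then update the weights on the training set and adjust the corresponding quantization parameters, or treat the quantization parameters as learnable and update them during backpropagation.
All computation is done in floating point, but the end result is a quantized model.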
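A plain-Python sketch of the fake-quantization idea, assuming a fixed per-tensor scale of 0.1 and a made-up one-weight regression target. The forward pass sees the quantized weight, and the gradient is passed straight through the rounding (the straight-through estimator), so the float weight keeps being updated. PyTorch's FakeQuantize modules play this role in a real QAT model; the helper names here are illustrative.

```python
def fake_quant(x, scale, zero_point=0, qmin=-128, qmax=127):
    """Quantize then immediately dequantize: the result is a float lying on the int8 grid."""
    q = max(qmin, min(qmax, round(x / scale) + zero_point))
    return (q - zero_point) * scale


def ste_grad(x, scale, zero_point=0, qmin=-128, qmax=127):
    """Straight-through estimator: treat rounding as identity, block the gradient where x is clamped."""
    lo, hi = (qmin - zero_point) * scale, (qmax - zero_point) * scale
    return 1.0 if lo <= x <= hi else 0.0


# toy QAT: learn one float weight w so that fake_quant(w) * x matches y (made-up numbers)
scale = 0.1
w, x, y, lr = 0.73, 2.0, 1.0, 0.05
for step in range(50):
    w_fq = fake_quant(w, scale)                # forward pass uses the fake-quantized weight
    grad_w_fq = 2 * (w_fq * x - y) * x         # derivative of the loss (w_fq * x - y) ** 2
    w -= lr * grad_w_fq * ste_grad(w, scale)   # STE: the gradient reaches the float weight w
print(w, fake_quant(w, scale))
```

The float weight stays in float32 throughout, but training stops once its quantized value lands on the grid point 0.5 that fits the target exactly, which is what "learning with the quantization error in the loop" means.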

Eager Mode Quantization

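Eager mode quantization requires some preparation:

Convert operations that need output requantization (and therefore extra parameters) from functional form to module form, e.g. use torch.nn.ReLU instead of torch.nn.functional.relu.
Specify which parts of the model are quantized by attaching .qconfig attributes to the corresponding submodules.
For static quantization and quantization-aware training, specify where activations are quantized and dequantized (QuantStub and DeQuantStub).
Use FloatFunctional to wrap tensor operations that need special handling during quantization into modules, e.g. add and cat, which need special handling to determine their output quantization parameters.
Fuse modules: combine operations/modules into a single module for better accuracy and performance. Currently supported: [Conv, Relu], [Conv, BatchNorm], [Conv, BatchNorm, Relu], [Linear, Relu]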
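Fusing [Conv, BatchNorm] in eval mode folds BatchNorm's per-channel affine transform into the convolution's weight and bias, so one layer (and one set of quantization parameters) replaces two. A plain-Python sketch for a 1×1 convolution evaluated at a single pixel, with made-up numbers; the helper names are illustrative.

```python
import math


def fold_bn(conv_w, conv_b, gamma, beta, mean, var, eps=1e-5):
    """Fold eval-mode BatchNorm into the preceding conv, one output channel at a time.

    w' = w * gamma / sqrt(var + eps)
    b' = (b - mean) * gamma / sqrt(var + eps) + beta
    """
    folded_w, folded_b = [], []
    for c in range(len(conv_w)):
        factor = gamma[c] / math.sqrt(var[c] + eps)
        folded_w.append([wi * factor for wi in conv_w[c]])
        folded_b.append((conv_b[c] - mean[c]) * factor + beta[c])
    return folded_w, folded_b


def conv1x1(w, b, x):
    # a 1x1 convolution at one pixel is a dot product per output channel
    return [sum(wi * xi for wi, xi in zip(w[c], x)) + b[c] for c in range(len(w))]


def batchnorm(y, gamma, beta, mean, var, eps=1e-5):
    # eval-mode BatchNorm using the running statistics
    return [(y[c] - mean[c]) / math.sqrt(var[c] + eps) * gamma[c] + beta[c] for c in range(len(y))]


# made-up parameters: 2 input channels, 2 output channels
conv_w, conv_b = [[0.5, -1.0], [2.0, 0.25]], [0.1, -0.2]
gamma, beta = [1.5, 0.8], [0.0, 0.3]
mean, var = [0.2, -0.1], [0.9, 2.0]
x = [1.0, 3.0]

reference = batchnorm(conv1x1(conv_w, conv_b, x), gamma, beta, mean, var)
fw, fb = fold_bn(conv_w, conv_b, gamma, beta, mean, var)
fused = conv1x1(fw, fb, x)
print(reference, fused)
```

Because BatchNorm uses fixed running statistics in eval mode, the folded layer reproduces conv followed by BN exactly; a following ReLU is then just a clamp on the fused output.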

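Post Training Dynamic Quantization

The simplest form of quantization.
Weights are quantized ahead of time.
Activations are quantized dynamically during inference. This suits models whose execution time is dominated by loading weights from memory rather than by computing matrix multiplications, which is typical of NLP models such as RNNs, LSTMs and Transformers.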
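A plain-Python sketch of what a dynamically quantized linear layer does, assuming symmetric int8 with one scale per tensor: the weight is quantized once ahead of time, the input scale is computed from the current input, the dot product is accumulated in integers, and the result is rescaled back to float. PyTorch's real kernels differ in detail; the names and numbers here are illustrative.

```python
def symmetric_quantize(values, qmax=127):
    # symmetric int8 with a single scale; returns (int values, scale)
    max_abs = max(abs(v) for v in values)
    scale = max_abs / qmax if max_abs > 0 else 1.0  # avoid a zero scale for an all-zero tensor
    return [round(v / scale) for v in values], scale


def dynamic_linear(x, w_q, w_scale, bias):
    """x: float input; w_q: int8 weight rows quantized ahead of time; returns float outputs."""
    x_q, x_scale = symmetric_quantize(x)  # activation scale computed now, from this very input
    out = []
    for row, b in zip(w_q, bias):
        acc = sum(xi * wi for xi, wi in zip(x_q, row))  # integer dot product (int32 in real kernels)
        out.append(acc * x_scale * w_scale + b)         # rescale to float and add the float bias
    return out


# offline: quantize the weights once (hypothetical 2x3 weight)
weight = [[0.2, -0.4, 0.1], [0.05, 0.3, -0.25]]
bias = [0.0, 0.1]
flat_q, w_scale = symmetric_quantize([v for row in weight for v in row])
w_q = [flat_q[0:3], flat_q[3:6]]

# online: each call quantizes its own input
x = [1.0, 0.5, -2.0]
approx = dynamic_linear(x, w_q, w_scale, bias)
exact = [sum(xi * wi for xi, wi in zip(x, row)) + b for row, b in zip(weight, bias)]
print(approx, exact)
```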

# original model
# all tensors and computations are in floating point
previous_layer_fp32 -- linear_fp32 -- activation_fp32 -- next_layer_fp32
                 /
linear_weight_fp32

# dynamically quantized model
# linear and LSTM weights are in int8
previous_layer_fp32 -- linear_int8_w_fp32_inp -- activation_fp32 -- next_layer_fp32
                  /
linear_weight_int8

# statically quantized model
# weights and activations are in int8
previous_layer_int8 -- linear_with_activation_int8 -- next_layer_int8
                    /
  linear_weight_int8

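Usage:

Define a model.
Create a model instance.
Use the torch API to create the quantized model, specifying which layer types to quantize dynamically.
Run the quantized model.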

import torch

# define a floating point model
class M(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.fc = torch.nn.Linear(4, 4)

    def forward(self, x):
        x = self.fc(x)
        return x

# create a model instance
model_fp32 = M()
# create a quantized model instance
model_int8 = torch.ao.quantization.quantize_dynamic(
    model_fp32,  # the original model
    {torch.nn.Linear},  # a set of layers to dynamically quantize
    dtype=torch.qint8)  # the target dtype for quantized weights

# run the model
input_fp32 = torch.randn(4, 4, 4, 4)
res = model_int8(input_fp32)

Post Training Static Quantization

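In PTSQ, both weights and activations are quantized ahead of time.
Where possible, PyTorch fuses activations into the preceding layers (operator fusion), which further optimizes the model.
Because activations are also quantized ahead of time, a representative dataset is needed for calibration to determine the best quantization parameters for them.
Post-training static quantization is typically used when both memory bandwidth and compute savings matter; CNNs are the typical use case.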
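With both weights and activations in int8, a quantized linear layer can run its inner loop entirely in integers and only needs the calibrated scales and zero_points to requantize the result. A plain-Python sketch, assuming uint8 activations, symmetric int8 weights (zero_point 0), a bias stored as int32 with scale x_scale·w_scale, and made-up calibrated qparams; real backends implement the float multiplier in fixed point.

```python
def quantize(x, scale, zero_point):
    return round(x / scale) + zero_point


def int8_linear(x_q, x_scale, x_zp, w_q, w_scale, b_q, y_scale, y_zp):
    """All-integer linear layer: uint8 input, int8 weights, int32 bias, uint8 output."""
    multiplier = x_scale * w_scale / y_scale  # the only float factor, fixed after calibration
    out = []
    for row, bias in zip(w_q, b_q):
        acc = sum((xq - x_zp) * wq for xq, wq in zip(x_q, row)) + bias  # int32 accumulator
        y = round(acc * multiplier) + y_zp                               # requantize to the output scale
        out.append(max(0, min(255, y)))                                  # clamp to uint8
    return out


# hypothetical calibrated qparams and weights
x_scale, x_zp = 0.01, 0
w_scale = 0.01
y_scale, y_zp = 0.01, 100

x = [0.5, 1.0, 2.0]
w = [[0.3, -0.2, 0.1]]
b = [0.05]

x_q = [quantize(v, x_scale, x_zp) for v in x]
w_q = [[quantize(v, w_scale, 0) for v in row] for row in w]
b_q = [quantize(v, x_scale * w_scale, 0) for v in b]

y_q = int8_linear(x_q, x_scale, x_zp, w_q, w_scale, b_q, y_scale, y_zp)
# float reference: 0.5*0.3 - 1.0*0.2 + 2.0*0.1 + 0.05 = 0.2
print(y_q, [(q - y_zp) * y_scale for q in y_q])
```

The output stays in uint8, so the next int8 layer can consume it directly without going back to float.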

# original model
# all tensors and computations are in floating point
previous_layer_fp32 -- linear_fp32 -- activation_fp32 -- next_layer_fp32
                 /
linear_weight_fp32

# statically quantized model
# weights and activations are in int8
previous_layer_int8 -- linear_with_activation_int8 -- next_layer_int8
                    /
  linear_weight_int8

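Workflow:

Define a model that contains quantization and dequantization points.
Create a model instance.
Set the model to eval mode; the static quantization logic does not work otherwise.
Attach a global qconfig to the model. It holds the quantization settings, e.g. symmetric vs. asymmetric and the observer type such as MinMax.
Specify the layers to fuse; common patterns are conv + relu and conv + batchnorm + relu.
Create the prepared model for static quantization; it observes the activation tensors during calibration.
Calibrate the prepared model with a small representative dataset to determine the activation ranges.
Convert the prepared model into a quantized model, which:
1. quantizes the weights;
2. computes and stores the scale and zero_point (bias) used with each activation tensor, and replaces key operators with quantized implementations.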

Code example:

import torch

# define a model with quantization and dequantization stubs
class M(torch.nn.Module):
    def __init__(self):
        super().__init__()
        # QuantStub 
        self.quant = torch.ao.quantization.QuantStub()

        self.conv = torch.nn.Conv2d(1, 1, 1)
        self.relu = torch.nn.ReLU()

        # DeQuantStub 
        self.dequant = torch.ao.quantization.DeQuantStub()

    def forward(self, x):
        # manually mark where quantization happens
        x = self.quant(x)

        x = self.conv(x)
        x = self.relu(x)

        # manually mark where dequantization happens
        x = self.dequant(x)
        return x

# create a model instance
model_fp32 = M()

# set eval() mode
model_fp32.eval()

# attach the static quantization config
model_fp32.qconfig = torch.ao.quantization.get_default_qconfig('x86')

# fuse modules; common patterns are `conv + relu` and `conv + batchnorm + relu`
model_fp32_fused = torch.ao.quantization.fuse_modules(model_fp32, [['conv', 'relu']])

# create the prepared model (observers are inserted)
model_fp32_prepared = torch.ao.quantization.prepare(model_fp32_fused)

# calibrate with data (random here; use representative samples in practice)
input_fp32 = torch.randn(4, 1, 4, 4)
model_fp32_prepared(input_fp32)

# convert
model_int8 = torch.ao.quantization.convert(model_fp32_prepared)

# run the model, relevant calculations will happen in int8
res = model_int8(input_fp32)

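Quantization Aware Training

Quantization-aware training models the effect of quantization during training: the model "knows" it will be quantized and learns parameters that account for it.

During training, parameters are still trained in float32.
The fake_quant modules simulate the results of int8 quantization.
After conversion, weights and activations are quantized, and activations are fused into the preceding layers where possible. QAT generally yields higher accuracy than static quantization and is commonly used with CNNs.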
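During QAT, activation ranges are usually tracked with a moving average of per-batch min/max rather than a global min/max, so a single outlier batch cannot blow up the range; PyTorch's default QAT qconfig uses MovingAverageMinMaxObserver for this. The plain-Python class below is a simplified illustration of that update rule, not the PyTorch implementation, and averaging_constant=0.5 is chosen only to make the effect visible.

```python
class MovingAverageMinMax:
    """Exponential moving average of per-batch min/max (simplified illustration)."""

    def __init__(self, averaging_constant=0.01):
        self.c = averaging_constant
        self.min_val = None
        self.max_val = None

    def update(self, batch):
        b_min, b_max = min(batch), max(batch)
        if self.min_val is None:   # the first batch initializes the range
            self.min_val, self.max_val = b_min, b_max
        else:                      # later batches only nudge it
            self.min_val += self.c * (b_min - self.min_val)
            self.max_val += self.c * (b_max - self.max_val)
        return self.min_val, self.max_val


obs = MovingAverageMinMax(averaging_constant=0.5)
print(obs.update([0.0, 4.0]))    # (0.0, 4.0): first batch taken as-is
print(obs.update([0.0, 2.0]))    # (0.0, 3.0): moved halfway toward the new max
print(obs.update([0.0, 100.0]))  # (0.0, 51.5): the outlier pulls the max only halfway
```

A smaller constant such as 0.01 moves the range far less per batch, so the scale used by the fake_quant modules changes smoothly while the weights train.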

# original model (all tensors and computations are in floating point)
previous_layer_fp32 -- linear_fp32 -- activation_fp32 -- next_layer_fp32
                      /
    linear_weight_fp32

# model with fake_quants, used to model quantization numerics during training
previous_layer_fp32 -- fq -- linear_fp32 -- activation_fp32 -- fq -- next_layer_fp32
                           /
   linear_weight_fp32 -- fq

# quantized model (weights and activations are both int8):
previous_layer_int8 -- linear_with_activation_int8 -- next_layer_int8
                     /
   linear_weight_int8

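QAT workflow (the first five steps are the same as for static quantization):

Define a model that contains quantization and dequantization points.
Create a model instance.
Set the model to eval mode so that module fusion works.
Attach a global qconfig to the model. It holds the quantization settings, e.g. symmetric vs. asymmetric and the observer type such as MinMax.
Specify the layers to fuse; common patterns are conv + relu and conv + batchnorm + relu.
Create the prepared model, which inserts observers and fake_quants; the model must be switched to training mode with .train() for the QAT logic to work. The observers then record weight and activation tensors.
Run the training loop.
Set the prepared model to eval mode.
Convert the model.

Code: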

import torch

# define a floating point model where some layers could benefit from QAT
class M(torch.nn.Module):
    def __init__(self):
        super().__init__()
        # QuantStub
        self.quant = torch.ao.quantization.QuantStub()

        self.conv = torch.nn.Conv2d(1, 1, 1)
        self.bn = torch.nn.BatchNorm2d(1)
        self.relu = torch.nn.ReLU()

        # DeQuantStub 
        self.dequant = torch.ao.quantization.DeQuantStub()

    def forward(self, x):
        x = self.quant(x)
        x = self.conv(x)
        x = self.bn(x)
        x = self.relu(x)
        x = self.dequant(x)
        return x

# create a model instance
model_fp32 = M()

# model must be set to eval for fusion to work
model_fp32.eval()

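# attach a global qconfig (the QAT variant)
model_fp32.qconfig = torch.ao.quantization.get_default_qat_qconfig('x86')

# fuse the layers that can be fused
model_fp32_fused = torch.ao.quantization.fuse_modules(model_fp32,
    [['conv', 'bn', 'relu']])

# create the prepared model; prepare_qat needs the model in training mode, hence .train()
model_fp32_prepared = torch.ao.quantization.prepare_qat(model_fp32_fused.train())

# run the training loop (a toy loop on random data with a dummy loss, standing in for real training)
optimizer = torch.optim.SGD(model_fp32_prepared.parameters(), lr=0.01)
for _ in range(10):
    input_fp32 = torch.randn(4, 1, 4, 4)
    loss = model_fp32_prepared(input_fp32).pow(2).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

# switch to eval mode
model_fp32_prepared.eval()
# convert the model
model_int8 = torch.ao.quantization.convert(model_fp32_prepared)

# run the model
res = model_int8(input_fp32)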

FX Graph Mode Quantization

In FX mode, configuration is done mainly through qconfig_mapping, an argument of prepare_fx.

Code example:

import torch
from torch.ao.quantization import (
  get_default_qconfig_mapping,
  get_default_qat_qconfig_mapping,
  QConfigMapping,
)
import torch.ao.quantization.quantize_fx as quantize_fx
import copy

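model_fp = UserModel() # your own model

#### post-training dynamic / weight-only quantization ####

# make a deep copy of the model
model_to_quantize = copy.deepcopy(model_fp)
# 1. set eval() mode
model_to_quantize.eval()
# 2. use the default dynamic quantization config
qconfig_mapping = QConfigMapping().set_global(torch.ao.quantization.default_dynamic_qconfig)
# 3. prepare example inputs for tracing; they must be a tuple (input_fp32 is a sample input for UserModel)
example_inputs = (input_fp32,)
# 4. trace the model with the example inputs to create the prepared model
model_prepared = quantize_fx.prepare_fx(model_to_quantize, qconfig_mapping, example_inputs)
# 5. quantize; no calibration needed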
model_quantized = quantize_fx.convert_fx(model_prepared)


#### post-training static quantization ####

# make a deep copy of the model
model_to_quantize = copy.deepcopy(model_fp)
# use the default static quantization config
qconfig_mapping = get_default_qconfig_mapping("qnnpack")
# set eval mode
model_to_quantize.eval()
# prepare the model (traced with example_inputs)
model_prepared = quantize_fx.prepare_fx(model_to_quantize, qconfig_mapping, example_inputs)
# calibrate: run representative sample data through model_prepared (not shown)
# quantize
model_quantized = quantize_fx.convert_fx(model_prepared)


#### 量化感知训练 ####

model_to_quantize = copy.deepcopy(model_fp)
qconfig_mapping = get_default_qat_qconfig_mapping("qnnpack")
model_to_quantize.train()
# prepare
model_prepared = quantize_fx.prepare_qat_fx(model_to_quantize, qconfig_mapping, example_inputs)
# training loop (not shown)
# quantize
model_quantized = quantize_fx.convert_fx(model_prepared)

YOLOv3 Model Quantization

Post-training quantization in FX mode

import os
import time

import torch
import torch.nn as nn
import torch.nn.functional as F
import copy
import torchvision
from torchvision import transforms

from torch.quantization.quantize_fx import prepare_fx, convert_fx
from torch.ao.quantization import QConfigMapping
from torch.ao.quantization.qconfig import default_qconfig

from collections import OrderedDict
from nets.yolo import YoloBody

from torch.utils.data import DataLoader
from utils.dataloader import YoloDataset, yolo_dataset_collate
from utils.utils import get_anchors, get_classes

from nets.darknet import darknet53

def getDataloader():
    train_annotation_path   = '2007_train.txt'
    with open(train_annotation_path) as f:
        train_lines = f.readlines()

    input_shape     = [416, 416]
    train_dataset   = YoloDataset(train_lines, input_shape, num_classes, train = True)
    gen             = DataLoader(train_dataset, shuffle = True, batch_size = 12, num_workers = 4, pin_memory=True,
                                    drop_last=True, collate_fn=yolo_dataset_collate)
    return gen

def calibrate(model, data_loader):
    model.eval()
    i = 1 # 208 iterations in total
    
    for iteration, batch in enumerate(data_loader):
        images, targets = batch[0], batch[1]
        with torch.no_grad():
            
            images  = torch.from_numpy(images).type(torch.FloatTensor)
            targets = [torch.from_numpy(ann).type(torch.FloatTensor) for ann in targets]

            # print(images.shape)

            model(images)
            if i%10==0:
                print(i)
            i+=1

def print_size_of_model(model):
    if isinstance(model, torch.jit.RecursiveScriptModule):
        torch.jit.save(model, "temp.p")
    else:
        torch.jit.save(torch.jit.script(model), "temp.p")
    print("Size (MB):", os.path.getsize("temp.p")/1e6)
    os.remove("temp.p")

if __name__ == "__main__":
    
    state_dict = torch.load('logs/ep099-loss2.616-val_loss4.583.pth', map_location='cpu')

    classes_path    = 'model_data/voc_classes.txt'
    anchors_path    = 'model_data/yolo_anchors.txt'
    anchors_mask    = [[6, 7, 8], [3, 4, 5], [0, 1, 2]]
    class_names, num_classes = get_classes(classes_path)
    anchors, num_anchors     = get_anchors(anchors_path)

    traindataloader = getDataloader() # build the training dataloader (used for calibration)
    
    # print(next(iter(traindataloader))[0])

    float_model = YoloBody(anchors_mask, num_classes, pretrained=False) # keep a copy of the original float model
    float_model.load_state_dict(state_dict, strict=False)
    float_model.to('cpu')
    float_model.eval()

    model_to_quantize = YoloBody(anchors_mask, num_classes, pretrained=False) # the model to be quantized
    model_to_quantize.load_state_dict(state_dict, strict=False)
    model_to_quantize.to('cpu')
    model_to_quantize.eval()

    qconfig = default_qconfig
    qconfig_mapping = QConfigMapping().set_global(qconfig)
    
    # example_inputs = (next(iter(traindataloader))[0])
    example_inputs = (torch.randn((12, 3, 416, 416)),)  # prepare_fx expects a tuple of example inputs

    prepared_model = prepare_fx(model_to_quantize, qconfig_mapping, example_inputs)
    # print(prepared_model.graph)
    calibrate(prepared_model, traindataloader)

    print("BEGIN") 
    quantized_model = convert_fx(prepared_model)
    # print(quantized_model) 
    print("OK") 

    print("Size of model before quantization")
    print_size_of_model(float_model)

    print("Size of model after quantization")
    print_size_of_model(quantized_model)

    torch.jit.save(torch.jit.script(quantized_model), 'logs/outQuant.pth')
    loaded_quantized_model = torch.jit.load('logs/outQuant.pth')
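print_size_of_model above measures the serialized TorchScript files. As a back-of-the-envelope check, int8 storage uses 1 byte per weight instead of 4 for float32, so weight storage shrinks by roughly 4×. The sketch below does that arithmetic for a hypothetical parameter count (not measured from YoloBody); real file sizes also include quantization parameters, any layers left in float, and serialization metadata.

```python
def weight_storage_mb(num_params: int, bytes_per_param: int) -> float:
    # raw storage for the parameters only; serialized files add metadata on top
    return num_params * bytes_per_param / 1e6


num_params = 50_000_000                      # hypothetical parameter count
fp32_mb = weight_storage_mb(num_params, 4)   # float32: 4 bytes per value
int8_mb = weight_storage_mb(num_params, 1)   # int8: 1 byte per value
print(fp32_mb, int8_mb, fp32_mb / int8_mb)
```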

QAT

import torch
from torch.ao.quantization import (
  get_default_qconfig_mapping,
  get_default_qat_qconfig_mapping,
  QConfigMapping,
)
import torch.ao.quantization.quantize_fx as quantize_fx

import numpy as np
import torch.backends.cudnn as cudnn
import torch.optim as optim
from torch.utils.data import DataLoader

from nets.yolo import YoloBody
from nets.yolo_training import YOLOLoss, weights_init
from utils.callbacks import LossHistory
from utils.dataloader import YoloDataset, yolo_dataset_collate
from utils.utils import get_anchors, get_classes
from utils.utils_fit import fit_one_epoch

if __name__ == "__main__":

    Cuda            = True
    classes_path    = 'model_data/voc_classes.txt'
    
    anchors_path    = 'model_data/yolo_anchors.txt'
    anchors_mask    = [[6, 7, 8], [3, 4, 5], [0, 1, 2]]
    
    model_path      = ''
   
    input_shape     = [416, 416]
    pretrained      = False
    
    Init_Epoch          = 0
    Freeze_Epoch        = 0
    Freeze_batch_size   = 8
    Freeze_lr           = 1e-3
    
    UnFreeze_Epoch      = 1
    Unfreeze_batch_size = 4
    Unfreeze_lr         = 1e-4
    
    Freeze_Train        = True
   
    num_workers         = 4
    
    train_annotation_path   = '2007_train.txt'
    val_annotation_path     = '2007_val.txt'

    class_names, num_classes = get_classes(classes_path)
    anchors, num_anchors     = get_anchors(anchors_path)

    model_to_quantize  = YoloBody(anchors_mask, num_classes, pretrained=pretrained)
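    # the model stays on the CPU for tracing; model_train.cuda() moves it to the GPU after preparation

    # quantization-aware training: prepare_qat_fx expects the model in training mode
    qconfig_mapping = get_default_qat_qconfig_mapping("qnnpack")
    model_to_quantize.train()
    example_inputs = (torch.randn((12, 3, 416, 416)),)  # a tuple of example inputs for tracing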

    model_prepared = quantize_fx.prepare_qat_fx(model_to_quantize, qconfig_mapping, example_inputs)

    model_train = model_prepared.train()
    model_train = model_train.cuda()

    yolo_loss    = YOLOLoss(anchors, num_classes, input_shape, Cuda, anchors_mask)
    loss_history = LossHistory("logs/")

    with open(train_annotation_path) as f:
        train_lines = f.readlines()
    with open(val_annotation_path) as f:
        val_lines   = f.readlines()
    num_train   = len(train_lines)
    num_val     = len(val_lines)
    
    if True:
        batch_size  = Freeze_batch_size
        lr          = Freeze_lr
        start_epoch = Init_Epoch
        end_epoch   = Freeze_Epoch
                        
        epoch_step      = num_train // batch_size
        epoch_step_val  = num_val // batch_size
        
        if epoch_step == 0 or epoch_step_val == 0:
            raise ValueError("The dataset is too small to train on; please add more data.")
        
        optimizer       = optim.Adam(model_train.parameters(), lr, weight_decay = 5e-4)
        lr_scheduler    = optim.lr_scheduler.StepLR(optimizer, step_size=1, gamma=0.94)

        train_dataset   = YoloDataset(train_lines, input_shape, num_classes, train = True)
        val_dataset     = YoloDataset(val_lines, input_shape, num_classes, train = False)
        gen             = DataLoader(train_dataset, shuffle = True, batch_size = batch_size, num_workers = num_workers, pin_memory=True,
                                    drop_last=True, collate_fn=yolo_dataset_collate)
        gen_val         = DataLoader(val_dataset  , shuffle = True, batch_size = batch_size, num_workers = num_workers, pin_memory=True, 
                                    drop_last=True, collate_fn=yolo_dataset_collate)

        if Freeze_Train:
            for param in model_prepared.backbone.parameters():
                param.requires_grad = False

        for epoch in range(start_epoch, end_epoch):
            fit_one_epoch(model_train, model_prepared, yolo_loss, loss_history, optimizer, epoch, 
                    epoch_step, epoch_step_val, gen, gen_val, end_epoch, Cuda)
            lr_scheduler.step()
            
    if True:
        batch_size  = Unfreeze_batch_size
        lr          = Unfreeze_lr
        start_epoch = Freeze_Epoch
        end_epoch   = UnFreeze_Epoch
                        
        epoch_step      = num_train // batch_size
        epoch_step_val  = num_val // batch_size
        
        if epoch_step == 0 or epoch_step_val == 0:
            raise ValueError("The dataset is too small to train on; please add more data.")
        
        optimizer       = optim.Adam(model_train.parameters(), lr, weight_decay = 5e-4)
        lr_scheduler    = optim.lr_scheduler.StepLR(optimizer, step_size=1, gamma=0.94)

        train_dataset   = YoloDataset(train_lines, input_shape, num_classes, train = True)
        val_dataset     = YoloDataset(val_lines, input_shape, num_classes, train = False)
        gen             = DataLoader(train_dataset, shuffle = True, batch_size = batch_size, num_workers = num_workers, pin_memory=True,
                                    drop_last=True, collate_fn=yolo_dataset_collate)
        gen_val         = DataLoader(val_dataset  , shuffle = True, batch_size = batch_size, num_workers = num_workers, pin_memory=True, 
                                    drop_last=True, collate_fn=yolo_dataset_collate)

        if Freeze_Train:
            for param in model_prepared.backbone.parameters():
                param.requires_grad = True

        for epoch in range(start_epoch, end_epoch):
            fit_one_epoch(model_train, model_prepared, yolo_loss, loss_history, optimizer, epoch, 
                    epoch_step, epoch_step_val, gen, gen_val, end_epoch, Cuda)
            lr_scheduler.step()

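    # quantized kernels run on the CPU: move the trained model back and switch to eval mode before converting
    model_prepared = model_prepared.cpu().eval()
    model_quantized = quantize_fx.convert_fx(model_prepared)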
    torch.jit.save(torch.jit.script(model_quantized), 'logs/outQATQuant.pth')
    loaded_quantized_model = torch.jit.load('logs/outQATQuant.pth')

    """ 
    model_fused = quantize_fx.fuse_fx(model_quantized)
    torch.jit.save(torch.jit.script(model_fused), 'logs/outQATQuantfused.pth')
    loaded_quantized_model = torch.jit.load('logs/outQATQuantfused.pth') """

Project source code: https://github.com/cauccliu/YOLOv3Quant

https://cauccliu.github.io/2024/05/07/YOLOv3量化/

Author

Liuchang

Posted on

May 7, 2024
