YOLOv8n/s/m/l/x Pruning & Distillation Analysis

After validating pruning and distillation on a single YOLOv8m model, the experiment was extended to YOLOv8n/s/m/l/x to examine whether the same trends hold across model scales. Detailed experimental settings and single-model results are covered in the previous post. All experiments were run on an RTX 4060. On edge devices such as Jetson, the performance gap between baseline and pruned models may become more pronounced.

For pruning, parameter count and inference time were compared between baseline and pruned models. The effect becomes more significant as model size increases. For example, YOLOv8x was reduced from approximately 68.2M to 38.3M parameters (about 44% fewer), while inference time dropped from 25.5 ms to 19.2 ms (about 25% lower). In contrast, YOLOv8n decreased from 3.16M to 2.27M parameters (about 28% fewer), showing relatively limited room for reduction. Overall, larger models exhibit greater parameter compression and more substantial latency improvements. (Fig 1a, 1b)
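The relative reductions behind these figures can be checked with a few lines of Python. This is only a sketch over the numbers quoted above; the helper name `relative_reduction` and the dictionary layout are illustrative and not part of the original experiment code.

```python
def relative_reduction(before: float, after: float) -> float:
    """Return the fractional reduction when going from `before` to `after`."""
    return (before - after) / before


# Values quoted in the text: parameters in millions, latency in milliseconds.
quoted = {
    "YOLOv8n params (M)": (3.16, 2.27),
    "YOLOv8x params (M)": (68.2, 38.3),
    "YOLOv8x latency (ms)": (25.5, 19.2),
}

for name, (before, after) in quoted.items():
    reduction = relative_reduction(before, after)
    print(f"{name}: {before} -> {after} ({reduction:.1%} reduction)")
```

Expressing the change as a fraction of the baseline makes the scale effect easier to see than raw parameter counts, since YOLOv8x starts from a model more than twenty times larger than YOLOv8n.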

For distillation, four variants were compared: the teacher, the pruned student, the student after Knowledge Distillation (KD), and the student after feature KD. The effectiveness of KD varies with model size. In YOLOv8n, mAP50-95 dropped significantly from 0.7485 to 0.638 after pruning, but recovered to around 0.727 with KD, still slightly below the baseline. For models at the scale of YOLOv8m and above, KD not only recovered performance but in some cases surpassed the baseline. For instance, YOLOv8x improved from a baseline of 0.9238 to 0.935 (KD) and 0.9361 (feature KD). (Fig 2)
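One way to quantify "recovery" is the share of the pruning-induced mAP drop that distillation wins back. The sketch below applies that idea to the YOLOv8n numbers quoted above and computes the gain over baseline for YOLOv8x. The function name `recovered_fraction` is hypothetical, and the YOLOv8n KD score is the approximate 0.727 from the text, so the result is indicative only.

```python
def recovered_fraction(baseline: float, pruned: float, distilled: float) -> float:
    """Fraction of the accuracy lost to pruning that distillation recovers.

    1.0 means the distilled model matches the baseline; values above 1.0
    mean it exceeds the baseline.
    """
    lost = baseline - pruned
    if lost <= 0:
        raise ValueError("pruning did not reduce accuracy; recovery is undefined")
    return (distilled - pruned) / lost


# YOLOv8n mAP50-95 values quoted in the text (KD score is approximate).
n_recovery = recovered_fraction(baseline=0.7485, pruned=0.638, distilled=0.727)
print(f"YOLOv8n: KD recovers {n_recovery:.0%} of the pruning loss")

# YOLOv8x: the pruned score is not quoted, so only the gain over baseline is computed.
x_baseline = 0.9238
for label, score in (("KD", 0.935), ("feature KD", 0.9361)):
    print(f"YOLOv8x {label}: {score - x_baseline:+.4f} mAP50-95 vs. baseline")
```

For YOLOv8n this works out to roughly 80% of the lost mAP50-95, while the YOLOv8x gains over baseline are on the order of 0.01.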

Two key trends emerge. First, pruning becomes more effective and stable as model size increases. Second, distillation is not merely a recovery step: in these runs it appears to act as a restructuring process for the pruned model. In smaller models, KD plays a critical role in restoring performance, while in larger models it can lead to further improvements beyond the baseline.

This experiment was conducted on COCO128, so it should be interpreted as a trend analysis rather than an absolute benchmark. Within that scope, the overall workflow of "large model → pruning → distillation" proved to be a consistent and effective optimization strategy across the five model scales.

Figure 1. Efficiency — (a) Parameter Reduction (M) | (b) Inference Time Reduction

Figure 2. Distillation Performance Comparison