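Prefill-Decode Heterogeneity
-
Demystifying the Two-Phase Bottlenecks of LLM Inference: From GPU Microarchitectural Root Causes to Efficient Cross-Scenario Deployment Strategies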
Keywords: LLM Inference, GPU, Prefill-Decode Heterogeneity, Microarchitectural Analysis, Multi-GPU Scaling, Energy Predictability
A Systematic Characterization of LLM Inference on GPUs ht…