FlashOverlap: Minimizing Tail Latency in Communication Overlap for Distributed LLM Training — Rezaul Karim, Austin Wen, Wang Zongzuo, Weiwei Zhang, Yang Liu, Walid Ahmed