The image depicts a man attending a presentation in a professional setting. The room contains computers and laptops, and the man is seated facing a projection screen next to a whiteboard. Several other people in the room are likewise focused on the presentation, and the man appears to be listening attentively to the speaker.

Text transcribed from the image:

FlashAttention-3: Optimizing FlashAttention for H100 GPU
Jay Shah*, Ganesh Bikshandi, Ying Zhang, Vijay Thakkar, Pradeep Ramani, Tri Dao

1. New instructions on H100:
   - WGMMA: higher throughput
   - TMA: faster loading between gmem <-> smem, saves registers
2. Asynchrony
   - Overlap GEMM and softmax (see the first sketch below)
   - Builds on asynchronous WGMMA, TMA, transaction barriers
   - Inter-warpgroup overlapping: warp-specialization, pingpong scheduling
   - Intra-warpgroup overlapping
3. Low-precision
   - FP8: incoherent processing to reduce quantization errors (see the second sketch below)
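To ground item 2: the work being overlapped is the blocked online-softmax loop that FlashAttention computes. Each key/value block requires two GEMMs (S = QKᵀ and the P·V accumulation) plus softmax bookkeeping (running row max, exponentials, rescaling); FA3 issues the GEMMs asynchronously on the tensor cores (WGMMA) so they run concurrently with the softmax work of neighboring iterations or warpgroups. The NumPy sketch below shows only the algorithm being pipelined, not the hardware scheduling itself; `flash_attention_forward` and `block_size` are illustrative names, not from the slide or paper.

```python
import numpy as np

def flash_attention_forward(Q, K, V, block_size=64):
    """Blocked attention with online softmax (single head, no masking).

    Per key/value block: GEMM #1 (S = Q Kb^T), softmax bookkeeping
    (max, exp, rescale), GEMM #2 (O += P Vb). FA3 overlaps the two
    GEMMs with the softmax steps; here they simply run in sequence.
    """
    seq_len, d = Q.shape
    scale = 1.0 / np.sqrt(d)
    O = np.zeros((seq_len, d))
    row_max = np.full(seq_len, -np.inf)  # running max per query row
    row_sum = np.zeros(seq_len)          # running softmax denominator

    for start in range(0, seq_len, block_size):
        Kb = K[start:start + block_size]
        Vb = V[start:start + block_size]

        S = (Q @ Kb.T) * scale                  # GEMM #1 (tensor cores in FA3)
        new_max = np.maximum(row_max, S.max(axis=1))
        P = np.exp(S - new_max[:, None])        # softmax exponentials
        correction = np.exp(row_max - new_max)  # rescale old partial results
        row_sum = row_sum * correction + P.sum(axis=1)
        O = O * correction[:, None] + P @ Vb    # GEMM #2
        row_max = new_max

    return O / row_sum[:, None]

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    Q, K, V = (rng.standard_normal((256, 64)) for _ in range(3))
    S = Q @ K.T / np.sqrt(64)                   # reference: exact attention
    ref = np.exp(S - S.max(axis=1, keepdims=True))
    ref = (ref / ref.sum(axis=1, keepdims=True)) @ V
    assert np.allclose(flash_attention_forward(Q, K, V), ref)
```

The online-softmax rescaling is what makes the blocking legal: each block's partial output is corrected by `exp(row_max - new_max)` when a larger row maximum appears, so no block ever needs to wait for the full row of scores.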
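For item 3, incoherent processing rotates Q and K by an orthogonal matrix M before quantizing to FP8: since M Mᵀ = I, (QM)(KM)ᵀ = QKᵀ exactly, while the rotation spreads outlier features across all channels so the quantization grid fits the data better. FA3 uses a fast Hadamard transform with random signs; the sketch below substitutes a dense random orthogonal matrix and a crude uniform quantizer as stand-ins for real FP8 rounding, purely to demonstrate the error reduction. All names here are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 128
Q = rng.standard_normal((256, d))
K = rng.standard_normal((256, d))
Q[:, 0] *= 50.0  # inject an outlier channel: the hard case for low precision

def quantize(x, levels=127):
    # Symmetric per-tensor uniform quantization; a crude stand-in for
    # FP8 rounding (FP8 spacing is non-uniform, but the outlier problem
    # with a shared per-tensor scale is analogous).
    s = np.abs(x).max() / levels
    return np.round(x / s) * s

# Random orthogonal matrix via QR of a Gaussian. FA3 uses a random-sign
# Hadamard transform instead, which applies in O(d log d).
M, _ = np.linalg.qr(rng.standard_normal((d, d)))

exact = Q @ K.T  # (Q M)(K M)^T equals this exactly, since M M^T = I
err_plain = np.abs(quantize(Q) @ quantize(K).T - exact).mean()
err_rot = np.abs(quantize(Q @ M) @ quantize(K @ M).T - exact).mean()
print(f"mean |error| without rotation: {err_plain:.4f}")
print(f"mean |error| with rotation:    {err_rot:.4f}")  # expect several x smaller
```

The mechanism: the outlier channel forces a coarse per-tensor quantization step for every entry of Q; after rotation its energy is spread over all d channels, the maximum magnitude drops, the step shrinks, and the scores are unchanged because the rotation cancels inside QKᵀ.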