Intro to GPU Programming
Video: Introduction to NVIDIA GPU Computing
Presentation: Inside the NVIDIA HPC SDK: the Compilers, Libraries and Tools for Accelerated Computing
Video: ACM Winter School 2019 on HPC IIT Kanpur

Libraries
Documentation: Math and communication libraries
Presentation: How CUDA Math Libraries Can Help You Unleash the Power of the New NVIDIA A100 GPU
Online Course: Accelerating Applications with GPU-Accelerated Libraries in C/C++ (Request access through your GPU Hackathon organizers)

Programming Models

OpenACC
Web Page: OpenACC.org Resources
Forum: OpenACC Slack Channel
Presentation: Zero to GPU Hero with OpenACC
Training Series: OpenACC Training Series
Videos: Directive Based GPU Programming
GitHub: OpenACC Training Materials
Docker Container: OpenACC Training Materials
Online Course: Fundamentals of Accelerated Computing with OpenACC (Fee-based)

CUDA
Training Series: CUDA Training Series
Presentation: CUDA on NVIDIA Ampere GPU Architecture: Taking Your Algorithms to the Next Level of Performance
Presentation: CUDA New Features And Beyond
Presentation: Developing CUDA Kernels to Push Tensor Cores to the Absolute Limit on NVIDIA A100
Online Course: Fundamentals of Accelerated Computing with CUDA C/C++ (Fee-based)

CUDA Fortran
Documentation: Programming Guide

Python
Presentation: CuPy Overview: NumPy Syntax Computation with Advanced CUDA Features
Presentation: Accelerating Python with CUDA
Video: Valentin Haenel: Create CUDA kernels from Python using Numba and CuPy
Online Course: Fundamentals of Accelerated Computing with CUDA Python (Fee-based)

Kokkos
GitHub: Kokkos Repository
GitHub: Kokkos Tutorials
Forum: Kokkos Slack Channel
Video: Kokkos: C++ Performance Portability for Production

RAJA
GitHub: RAJA Repository
Video: A Tutorial Introduction to RAJA

Multi-GPU
Presentation: Multi-GPU Programming with Message-Passing Interface
Presentation: Multi-GPU Programming
Presentation: A Partitioned Global Address Space Library for Large GPU Clusters
Tools
Video: NVIDIA Nsight™ Systems Tutorial (Use the following Nsight report files to follow the tutorial.)
Video: NVIDIA Nsight Compute Tutorial (Use the following Nsight report files to follow the tutorial.)
Videos: Roofline Analysis Workshop - Part 1 and 2, Part 3
Presentation: What the Profiler is Telling You: How to Get the Most Performance out of Your Hardware
Presentation: Optimizing CUDA Kernels in HPC Simulation and Visualization Codes Using NVIDIA Nsight Compute
Presentation: Roofline Performance Model for HPC and Deep-Learning Applications
Video: Cross-Platform Performance Engineering with Arm Allinea Studio
Blog: GPROF Tutorial – How to use Linux GNU GCC Profiling Tool
Blog: Custom Application Profile Timelines with NVTX

Data Science
Web Page: RAPIDS overview
Web Page: RAPIDS-Getting Started (Hands-on labs and other materials)
Forum: RAPIDS Community
Presentations: Collection of RAPIDS Talks
Online Course: Fundamentals of Accelerated Data Science with RAPIDS (Fee-based)

AI/Deep Learning Libraries, Frameworks, SDKs

Introduction to AI
Presentation: Dive into Deep Learning
Presentation: Do-it-Yourself Automatic Speech Recognition with NVIDIA Technologies
Online Course: Fundamentals of Deep Learning for Computer Vision (Fee-based)
GitHub: Deep Learning Examples (The latest deep learning example networks for training and inference.)
cuDNN
Documentation: cuDNN Developer Guide
Presentation: cuDNN v8 New Advances in Deep Learning Acceleration: APIs, Optimizations, and How to Tackle the Future Challenges in Hardware and Software
Presentation: Deep Learning Training with cuDNN

NVIDIA Clara
Container: NVIDIA Clara Train SDK
Jupyter Notebooks: Intro to Clara Train SDK
Jupyter Notebook: Clara Federated Learning
Presentation: Clara Developer Day: Scalable and Modular Deployment Powered by Clara Deploy SDK
Web Page: NVIDIA Clara™ Parabricks Overview

DeepStream
Presentation: Developing IVA Software Using NVIDIA DeepStream SDK
Online Course: Getting Started with DeepStream for Video Analytics on Jetson Nano (Fee-based)
Online Course: AI Workflows for Intelligent Video Analytics with DeepStream (Fee-based)

TensorRT
Presentation: PyTorch-TensorRT: Accelerating Inference in PyTorch with TensorRT
Presentation: TensorRT inference with TensorFlow 2.0
Online Course: Optimization and Deployment of TensorFlow Models with TensorRT (Fee-based)

TensorFlow
Online Course: Image Classification with TensorFlow: Radiomics - 1p19q Chromosome Status Classification (Fee-based)
Presentation: Extensions of TensorFlow-Based Computational Fluid Dynamics

Keras
Online Course: Modeling Time Series Data with Recurrent Neural Networks in Keras (Fee-based)

Containers
Video: Making Containers Easier with HPC Container Maker
Online Course: High-Performance Computing with Containers (Fee-based)

Advanced
Presentation: Inside the NVIDIA Ampere Architecture
Presentation: Optimizing Applications for NVIDIA Ampere GPU Architecture