Course Outline

Introduction

  • What is ROCm?
  • What is HIP?
  • ROCm vs. CUDA vs. OpenCL
  • Overview of ROCm and HIP features and architecture
  • ROCm for Windows vs. ROCm for Linux

Installation

  • Installing ROCm on Windows
  • Verifying installation and checking device compatibility
  • Updating or uninstalling ROCm on Windows
  • Troubleshooting common installation issues

Getting Started

  • Creating a new ROCm project using Visual Studio Code on Windows
  • Exploring project structure and files
  • Compiling and running the program
  • Displaying output using printf and fprintf

ROCm API

  • Using the ROCm API in host programs
  • Querying device information and capabilities
  • Allocating and deallocating device memory
  • Copying data between host and device
  • Launching kernels and synchronizing the host with the device
  • Handling errors and exceptions

HIP Language

  • Using HIP language in device programs
  • Writing kernels for GPU execution and data manipulation
  • Using data types, qualifiers, operators, and expressions
  • Using built-in functions, variables, and libraries

ROCm and HIP Memory Model

  • Using different memory spaces: global, shared, constant, and local
  • Using various memory objects: pointers, arrays, textures, and surfaces
  • Using memory access modes: read-only, write-only, read-write, etc.
  • Understanding memory consistency models and synchronization mechanisms

ROCm and HIP Execution Model

  • Utilizing execution models: threads, blocks, and grids
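  • Using thread-level built-in variables such as threadIdx.x, blockIdx.x, and blockDim.x (legacy macros: hipThreadIdx_x, hipBlockIdx_x, hipBlockDim_x)
  • Using block-level functions such as __syncthreads and __threadfence_block
  • Using grid-level features such as gridDim.x (legacy macro: hipGridDim_x) and grid-wide synchronization through cooperative groups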
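
As a rough illustration of the thread, block, and grid indexing covered in this module, the sketch below emulates a one-dimensional launch on the CPU in plain C++. It is not HIP code and not course material: the helper names blocks_for and emulate_launch are hypothetical, and the loops stand in for the blockIdx.x * blockDim.x + threadIdx.x calculation a kernel would perform on the device.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: number of blocks needed to cover n elements
// when each block holds block_dim threads (ceiling division).
std::size_t blocks_for(std::size_t n, std::size_t block_dim) {
    assert(block_dim > 0);
    return (n + block_dim - 1) / block_dim;
}

// CPU-side emulation of a 1D launch. Each (block, thread) pair computes the
// global index a kernel would derive from blockIdx.x * blockDim.x + threadIdx.x
// and counts how many times each element is visited.
std::vector<int> emulate_launch(std::size_t n, std::size_t block_dim) {
    std::vector<int> hits(n, 0);
    const std::size_t grid_dim = blocks_for(n, block_dim);
    for (std::size_t block_idx = 0; block_idx < grid_dim; ++block_idx) {
        for (std::size_t thread_idx = 0; thread_idx < block_dim; ++thread_idx) {
            const std::size_t i = block_idx * block_dim + thread_idx;
            if (i < n) {  // bounds guard: the last block may be only partly used
                ++hits[i];
            }
        }
    }
    return hits;
}
```

With n = 1000 and a block size of 256, blocks_for yields 4 blocks (1024 threads), and the bounds guard leaves the final 24 threads idle so that every element is visited exactly once.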

Debugging

  • Debugging ROCm and HIP programs on Windows
  • Using the Visual Studio Code debugger to set breakpoints and inspect variables and call stacks
  • Using the ROCm Debugger for debugging ROCm and HIP programs on AMD devices
  • Using the ROCm Profiler to analyze ROCm and HIP programs on AMD devices

Optimization

  • Optimizing ROCm and HIP programs on Windows
  • Improving memory throughput through coalescing techniques
  • Reducing memory latency using caching and prefetching techniques
  • Optimizing memory access and bandwidth using shared and local memory techniques
  • Measuring and improving execution time and resource utilization using profiling tools

Summary and Next Steps

Requirements

  • An understanding of the C/C++ language and parallel programming concepts.
  • Basic knowledge of computer architecture and memory hierarchy.
  • Experience with command-line tools and code editors.
  • Familiarity with the Windows operating system and PowerShell.

Audience

  • Developers seeking to install and use ROCm on Windows to program AMD GPUs and exploit parallelism.
  • Developers aiming to write high-performance, scalable code compatible with various AMD devices.
  • Programmers interested in exploring low-level GPU programming aspects and optimizing code performance.

Duration: 21 hours
