CUDA Visual Profiler

A useful CUDA tool from NVIDIA

CUDA Visual Profiler Ranking & Summary


  • License: Freeware
  • Publisher Name: NVIDIA
  • Operating Systems: Windows XP / Vista / XP X64 / Vista64
  • File Size: 5.6 MB


CUDA Visual Profiler Description

CUDA Visual Profiler is a profiling tool for CUDA programs running on NVIDIA GPUs, used together with the CUDA Toolkit.

Main features:

Execute a CUDA program with profiling enabled and view the profiler output as a table. The table has the following columns for each GPU method:

  • Timestamp: Start time stamp
  • Method: GPU method name. This is either "memcopy" for memory copies or the name of a GPU kernel.
  • GPU Time
  • CPU Time
  • Stream Id: Identification number for the stream

Columns only for kernel methods:

  • Occupancy
  • Profiler counters:
      • gld uncoalesced: Number of non-coalesced global memory loads
      • gld coalesced: Number of coalesced global memory loads
      • gst uncoalesced: Number of non-coalesced global memory stores
      • gst coalesced: Number of coalesced global memory stores
      • local load: Number of local memory loads
      • local store: Number of local memory stores
      • branch: Number of branch events (instruction and/or sync stack)
      • divergent branch: Number of divergent branches within a warp
      • instructions: Number of dynamic instructions (in fetch)
      • warp serialize: Number of threads in a warp that serialize based on address (GRF or constant)
      • cta launched: Number of CTAs launched on the PM TPC
  • grid size X: Number of blocks in the grid along dimension X
  • grid size Y: Number of blocks in the grid along dimension Y
  • block size X: Number of threads in a block along dimension X
  • block size Y: Number of threads in a block along dimension Y
  • block size Z: Number of threads in a block along dimension Z
  • dyn smem per block: Dynamic shared memory size per block in bytes
  • sta smem per block: Static shared memory size per block in bytes
  • reg per thread: Number of registers per thread

Columns only for memcopy methods:

  • mem transfer dir: Memory transfer direction, 0: host to device, 1: device to host
  • mem transfer size: Memory transfer size in bytes

For more information on profiler counters, refer to the "Interpreting Profiler Counters" section of the profiler documentation. Note that profiler counters are also referred to as profiler signals.
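The columns above lend themselves to a few simple derived figures. The following is a minimal Python sketch, not part of the tool itself, that assumes a row of the profiler table has already been read into a dictionary. The dictionary keys, the kernel name, and every number are hypothetical and chosen only for illustration; only the meaning of the columns (counter definitions, transfer direction codes, sizes in bytes, GPU time in microseconds) comes from the description.

```python
# Hypothetical rows shaped like the profiler output table.
# Key names and all values are made up for illustration.
kernel_row = {
    "method": "my_kernel",        # hypothetical kernel name
    "gpu_time_usec": 120.0,       # GPU Time, assumed to be in microseconds
    "gld_coalesced": 900,
    "gld_uncoalesced": 100,
    "branch": 400,
    "divergent_branch": 20,
}
memcopy_row = {
    "method": "memcopy",
    "gpu_time_usec": 500.0,
    "mem_transfer_dir": 0,            # 0: host to device, 1: device to host
    "mem_transfer_size": 4_000_000,   # bytes
}

# Direction codes as listed for the "mem transfer dir" column.
TRANSFER_DIRECTIONS = {0: "host to device", 1: "device to host"}


def coalesced_load_fraction(row):
    """Share of global memory loads that were coalesced (0.0 to 1.0)."""
    total = row["gld_coalesced"] + row["gld_uncoalesced"]
    return row["gld_coalesced"] / total if total else 0.0


def divergent_branch_fraction(row):
    """Share of branch events that diverged within a warp (0.0 to 1.0)."""
    return row["divergent_branch"] / row["branch"] if row["branch"] else 0.0


def memcopy_bandwidth_mb_per_s(row):
    """Transfer rate in MB/s: bytes per microsecond equals 10^6 bytes per second."""
    return row["mem_transfer_size"] / row["gpu_time_usec"]


print(f"coalesced loads:    {coalesced_load_fraction(kernel_row):.0%}")
print(f"divergent branches: {divergent_branch_fraction(kernel_row):.0%}")
print(f"memcopy direction:  {TRANSFER_DIRECTIONS[memcopy_row['mem_transfer_dir']]}")
print(f"memcopy bandwidth:  {memcopy_bandwidth_mb_per_s(memcopy_row):.0f} MB/s")
```

Ratios such as these are usually easier to compare across runs than the raw counter values, since they do not depend on how long a kernel ran.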
Display the summary profiler table. It has the following columns for each GPU method:

  • Method: Method name
  • #calls: Number of calls
  • GPU usec: Total GPU time in microseconds
  • CPU usec: Total CPU time in microseconds
  • %GPU time: Percentage GPU time
  • Total counts for each profiler counter

Display various kinds of plots:

  • Summary profiling data bar plot
  • GPU Time Height plot
  • GPU Time Width plot
  • Profiler counter bar plot
  • Profiler output table column bar plot
  • Comparison Summary plot

Analyze the profiler output to list methods with a high number of:

  • incoherent stores
  • incoherent loads
  • warp serializations

Compare profiler output for multiple runs of the same program or for different programs. Each program run is referred to as a session.

Save profiling data for multiple sessions. A group of sessions is referred to as a project.

Import/Export CUDA Profiler CSV format data.
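To make the summary table concrete, here is a minimal Python sketch of how per-call rows could be rolled up into the summary columns (#calls, GPU usec, CPU usec, %GPU time). It is an illustration, not the tool's own code. The header names `method`, `gputime`, and `cputime`, the kernel name, and the sample values are assumptions, as is the choice to skip lines starting with `#`; check an actual exported file before relying on any of them.

```python
import csv
import io
from collections import defaultdict

# Made-up sample data. The header names are assumptions for illustration,
# not a guaranteed layout of the profiler's CSV export.
SAMPLE_CSV = """# hypothetical comment/header line
method,gputime,cputime
memcopy,500.0,520.0
my_kernel,120.0,140.0
my_kernel,180.0,200.0
memcopy,200.0,215.0
"""


def summarize(csv_text):
    """Aggregate per-call rows into one summary entry per GPU method."""
    # Assumption: lines starting with '#' are comments and can be skipped.
    lines = [ln for ln in csv_text.splitlines() if ln and not ln.startswith("#")]

    totals = defaultdict(lambda: {"calls": 0, "gpu_usec": 0.0, "cpu_usec": 0.0})
    for row in csv.DictReader(lines):
        entry = totals[row["method"]]
        entry["calls"] += 1
        entry["gpu_usec"] += float(row["gputime"])
        entry["cpu_usec"] += float(row["cputime"])

    # %GPU time is each method's share of the GPU time summed over all methods.
    all_gpu_usec = sum(e["gpu_usec"] for e in totals.values())
    summary = {}
    for method, e in totals.items():
        pct = 100.0 * e["gpu_usec"] / all_gpu_usec if all_gpu_usec else 0.0
        summary[method] = {**e, "pct_gpu_time": pct}
    return summary


summary = summarize(SAMPLE_CSV)
for method, e in summary.items():
    print(
        f"{method}: calls={e['calls']} gpu_usec={e['gpu_usec']:.1f} "
        f"cpu_usec={e['cpu_usec']:.1f} pct_gpu={e['pct_gpu_time']:.1f}"
    )
```

The same roll-up could be extended with a running total per profiler counter, which corresponds to the last column of the summary table.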
