TABARNAC: Tools for Analyzing the Behavior of Applications Running on Numa ArChitectures provides several visualisations of memory traces and gives you hints to improve your applications' memory behavior.
For hints on NUMA optimization, see General Advices.
Note: tabarnac -p expname (where expname is the name of the application you've just traced) should remove them. Removing expname.acc.csv, expname.structsStats.csv, and expname.structsModified.csv will force tabarnac to parse the original trace file again.
The following plot shows the size of each structure in pages (usually one page = 4096 bytes). This information can hint at the importance of the different data structures and at the kinds of optimization that may apply.
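To relate the page counts in the plot to sizes in your source code, the conversion can be sketched as below. This is only an illustration: the function name pages_spanned is hypothetical and not part of TABARNAC, and the 4096-byte default is the usual page size mentioned above, which you can replace with the value reported by your system.

```python
import mmap


def pages_spanned(size_bytes, page_size=4096):
    """Return how many pages are needed to hold size_bytes (rounded up).

    pages_spanned is a hypothetical helper, not a TABARNAC function.
    """
    if size_bytes < 0 or page_size <= 0:
        raise ValueError("size_bytes must be >= 0 and page_size must be > 0")
    # Ceiling division without going through floating point.
    return -(-size_bytes // page_size)


# Page size of the machine running this script (often 4096 bytes).
local_page_size = mmap.PAGESIZE

# An array of 1,000,000 doubles occupies 8,000,000 bytes.
array_pages = pages_spanned(1_000_000 * 8)
```

With 4096-byte pages, the 8,000,000-byte array above spans 1954 pages, since 1953 full pages hold only 7,999,488 bytes.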
Note: if some structures are named AnonymousStruct#n, you should try to recompile your program with the "-g" option and rerun TABARNAC.
The following visualization shows the number of accesses and the read/write ratio for every data structure. Like the previous plot, it helps to understand the importance of each structure.
The next table lists which structures are not read, or not written, by which threads. This information is useful to determine whether duplication is a possible solution.
## [1] Struct Thread Type NbAccess
## <0 rows> (or 0-length row.names)
The following plots show, for each structure, how much each thread accesses each page. The horizontal Avg lines indicate the average number of accesses per page, and the vertical ones the average number of accesses per thread.
If the memory is accessed correctly, groups of threads (from 1 thread up to the maximum number of threads per NUMA node of the experimental machine) should appear. A group of threads is a set of threads accessing (mostly) the same set of pages. Moreover, the average number of accesses should be more or less the same for every thread and for every page.
If you can identify groups of threads working on the same part of a structure, try to bind them to the same NUMA node, along with the part of the structure they access.
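If you prefer to look for such groups programmatically rather than by eye, one possible approach is sketched below. It assumes a hypothetical matrix accesses[t][p] holding the number of accesses of thread t to page p of one structure (this is not TABARNAC's file format), cuts the structure into n_chunks equal parts (for instance one per NUMA node), and groups threads by the part they access most.

```python
from collections import defaultdict


def group_threads_by_chunk(accesses, n_chunks):
    """Group thread ids by the chunk of the structure they access most.

    accesses[t][p] is the number of accesses of thread t to page p
    (a hypothetical layout chosen for this sketch).
    Returns a dict mapping chunk index -> list of thread ids.
    """
    if not accesses or not accesses[0] or n_chunks <= 0:
        raise ValueError("need a non-empty matrix and n_chunks > 0")
    n_pages = len(accesses[0])
    groups = defaultdict(list)
    for tid, row in enumerate(accesses):
        totals = [0] * n_chunks
        for page, count in enumerate(row):
            # Map the page to one of n_chunks contiguous parts.
            totals[page * n_chunks // n_pages] += count
        # On a tie, the lowest chunk index wins.
        dominant = max(range(n_chunks), key=totals.__getitem__)
        groups[dominant].append(tid)
    return dict(groups)


# Four threads, eight pages: threads 0-1 use the first half,
# threads 2-3 use the second half.
example = [
    [5, 5, 5, 5, 0, 0, 0, 0],
    [4, 6, 5, 5, 0, 1, 0, 0],
    [0, 0, 0, 0, 5, 5, 5, 5],
    [0, 0, 1, 0, 6, 4, 5, 5],
]
groups = group_threads_by_chunk(example, n_chunks=2)
```

Each resulting group is a candidate for being bound to one NUMA node together with its chunk of the structure. The equal-sized chunking is a simplification: real access patterns may call for uneven parts.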
If the average number of accesses per thread is imbalanced, some threads access memory too much while others do not. This usually means that the work is not correctly balanced between threads.
If the average number of accesses per page is imbalanced, the workload is not uniformly distributed over the structure. If it is not possible to distribute the accesses differently, you should at least ensure that there are no hotspots (pages accessed a lot by every thread). If this is not possible, interleaving the structure over the NUMA nodes should still improve performance.
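The two imbalance checks and the hotspot criterion can be approximated numerically. The sketch below again assumes a hypothetical accesses[t][p] matrix for one structure. The hot_factor threshold and the reading of "accessed a lot by every thread" as "total above hot_factor times the page average, and touched by all threads" are assumptions made for illustration, not definitions used by TABARNAC.

```python
from statistics import mean


def access_balance(accesses, hot_factor=2.0):
    """Summarize thread and page imbalance for one structure.

    accesses[t][p] is the number of accesses of thread t to page p
    (hypothetical layout). A ratio of 1.0 means perfectly balanced.
    """
    columns = list(zip(*accesses))
    per_thread = [sum(row) for row in accesses]
    per_page = [sum(col) for col in columns]
    if not per_page or sum(per_page) == 0:
        raise ValueError("the access matrix is empty or all zeros")
    avg_thread = mean(per_thread)
    avg_page = mean(per_page)
    # A hotspot here is a page far above the page average that
    # every thread touches at least once (assumed criterion).
    hotspots = [
        page
        for page, col in enumerate(columns)
        if per_page[page] > hot_factor * avg_page and all(c > 0 for c in col)
    ]
    return {
        "thread_imbalance": max(per_thread) / avg_thread,
        "page_imbalance": max(per_page) / avg_page,
        "hotspots": hotspots,
    }


# Every thread hammers page 0 and barely touches the others.
report = access_balance([[10, 1, 1, 1]] * 4)
```

In this example the threads are balanced among themselves, but page 0 receives about three times the average page load from all four threads, so it is reported as a hotspot.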
By default, on recent operating systems, when an application runs on a NUMA machine, data are mapped according to the first touch policy, that is, close to the first thread that uses them. The following plot shows the distribution of first touches over the structures. The first touch pattern should be similar to the distribution pattern of the previous plots. If the plot is a straight line, one thread is initializing the whole structure, which is generally a source of performance issues on NUMA machines. You should try to split the initialization so that each thread is responsible for a part, or manually distribute the structure over the nodes.
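Splitting the initialization amounts to giving each thread a contiguous range of pages to touch first. The sketch below only computes such a partition, and detects the "straight line" case from a list of first-touch owners. Both function names and the first_touch list layout are hypothetical. The actual first touch has to happen in your application's own initialization code, ideally with the same partition as the one used by the compute loops.

```python
def first_touch_ranges(n_pages, n_threads):
    """Split n_pages into n_threads contiguous (start, end) ranges.

    end is exclusive. Sizes differ by at most one page.
    """
    if n_pages < 0 or n_threads <= 0:
        raise ValueError("n_pages must be >= 0 and n_threads must be > 0")
    base, extra = divmod(n_pages, n_threads)
    ranges = []
    start = 0
    for tid in range(n_threads):
        # The first `extra` threads take one additional page.
        length = base + (1 if tid < extra else 0)
        ranges.append((start, start + length))
        start += length
    return ranges


def single_initializer(first_touch):
    """Return the thread id if one thread first-touched every page.

    first_touch[p] is the id of the thread that first touched page p
    (hypothetical layout). Returns None otherwise.
    """
    owners = set(first_touch)
    return owners.pop() if len(owners) == 1 else None


ranges = first_touch_ranges(10, 4)
culprit = single_initializer([0] * 10)
```

Here ten pages are split among four threads as pages 0-2, 3-5, 6-7, and 8-9, and the second call reports thread 0 as the sole initializer of the structure.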