Thread Vulnerability For Architectures
Shrinking transistor sizes and the aggressive low-power operating modes employed by modern architectures tend to increase transient (soft) error rates. To quantify system vulnerability to soft errors and to evaluate the efficiency of fault-tolerance techniques, a metric for comparing vulnerability is required. Although some metrics for vulnerability analysis have been defined in the literature, they are limited in their ability to capture the vulnerability of parallel programs. Given the emerging trend toward multicore architectures and their ensembles, a new metric is needed to quantify the vulnerability of multithreaded applications.
In our research, we explore the soft error reliability analysis of parallel applications running on multicore architectures. As part of our TUBITAK project “Application Scheduling and Optimization for Chip Multiprocessor Architectures”, we introduced and evaluated a novel metric, the Thread Vulnerability Factor (TVF), to quantify thread vulnerability and to qualify the relative vulnerability of parallel applications to soft errors. We also presented a performance-vulnerability analysis for a variety of data-intensive parallel applications and discussed the effects of design choices on system performance and reliability.
We are now applying this metric in two new research directions.
In this work, we use the Simics toolset as our main simulation platform to collect performance and reliability statistics and to compare different schemes quantitatively. We also use the trace and g-cache modules of the toolset to gather data about parallel applications running on the target multicore architecture.
*** We thank the Wind River Company for its software donations (the Simics toolset and other modules) as part of its university program.