Abstract: Owing to advances in GPU architecture and the maturing of development platforms, GPUs are now widely used in scientific computing. By analyzing the GPU memory hierarchy and examining key performance features, this paper illustrates the relationships among GPU architecture, the programming model, and the memory hierarchy. Three basic load-balancing strategies for mapping applications onto the GPU are presented: prefetching, stream computing, and task division. Experiments expose the relationships among these factors and their effect on optimization efficiency.
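The paper's own implementations are not reproduced in the abstract; as a rough illustration of the stream-computing strategy it names, the minimal CUDA sketch below partitions a workload into chunks and issues each chunk's copy-in, kernel launch, and copy-out on its own stream so that transfers and computation can overlap. The kernel (scaleAdd), the problem size, and the chunk count of four are all illustrative assumptions, not details taken from the paper.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Simple element-wise kernel used as a stand-in workload (hypothetical).
__global__ void scaleAdd(const float *in, float *out, int n, float alpha) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = alpha * in[i] + 1.0f;
}

int main() {
    const int N = 1 << 22;          // total elements (illustrative size)
    const int NUM_STREAMS = 4;      // number of chunks / streams (assumption)
    const int CHUNK = N / NUM_STREAMS;

    float *h_in, *h_out;            // pinned host buffers enable async copies
    cudaMallocHost(&h_in,  N * sizeof(float));
    cudaMallocHost(&h_out, N * sizeof(float));
    for (int i = 0; i < N; ++i) h_in[i] = (float)i;

    float *d_in, *d_out;
    cudaMalloc(&d_in,  N * sizeof(float));
    cudaMalloc(&d_out, N * sizeof(float));

    cudaStream_t streams[NUM_STREAMS];
    for (int s = 0; s < NUM_STREAMS; ++s) cudaStreamCreate(&streams[s]);

    // Each chunk's copy-in, kernel, and copy-out are queued on its own stream,
    // so transfers for one chunk can overlap with computation on another.
    for (int s = 0; s < NUM_STREAMS; ++s) {
        int offset = s * CHUNK;
        size_t bytes = CHUNK * sizeof(float);
        cudaMemcpyAsync(d_in + offset, h_in + offset, bytes,
                        cudaMemcpyHostToDevice, streams[s]);
        scaleAdd<<<(CHUNK + 255) / 256, 256, 0, streams[s]>>>(
            d_in + offset, d_out + offset, CHUNK, 2.0f);
        cudaMemcpyAsync(h_out + offset, d_out + offset, bytes,
                        cudaMemcpyDeviceToHost, streams[s]);
    }
    cudaDeviceSynchronize();

    printf("h_out[0] = %f\n", h_out[0]);

    for (int s = 0; s < NUM_STREAMS; ++s) cudaStreamDestroy(streams[s]);
    cudaFree(d_in); cudaFree(d_out);
    cudaFreeHost(h_in); cudaFreeHost(h_out);
    return 0;
}
```

The overlap relies on pinned host memory (cudaMallocHost); with pageable memory the asynchronous copies would fall back to synchronous behavior and the streams would serialize.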