ABSTRACT: In recent vector processors, the ratio of memory bandwidth to computational performance (B/F) has decreased. To compensate for the insufficient B/F, modern vector processors employ an on-chip vector cache. The purpose of this work is to establish a performance tuning strategy that exploits the potential of a vector processor with a vector cache. The strategy systematically combines loop unrolling and cache blocking, two important loop transformations for optimization, to remove performance bottlenecks. The roofline model is employed as the performance model to decide which of the two transformations is performed first: the optimization that is effective in removing the bottleneck is applied preferentially. The number of loop unrolls and the blocking size are determined with a greedy search algorithm. The strategy is evaluated using several applications. The evaluation results show that the proposed strategy improves performance and reduces energy consumption drastically.
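
The two ingredients named above, a roofline-based bottleneck check and a greedy parameter search, can be pictured with a minimal Python sketch. It is an illustration under stated assumptions, not the authors' implementation: the function names, the machine parameters (100 GFLOP/s peak, 50 GB/s bandwidth), the stopping rule of the greedy search, and the toy `measure` function are all hypothetical. The sketch only classifies the bottleneck; which transformation is then prioritized is left to the strategy itself.

```python
def attainable_gflops(peak_gflops, bandwidth_gbs, intensity):
    """Roofline bound: min(peak compute, memory bandwidth * operational intensity)."""
    return min(peak_gflops, bandwidth_gbs * intensity)


def bottleneck(peak_gflops, bandwidth_gbs, intensity):
    """Classify a kernel by which roof limits it (names are illustrative)."""
    if bandwidth_gbs * intensity < peak_gflops:
        return "memory-bound"
    return "compute-bound"


def greedy_search(candidates, measure):
    """Try candidates in order and stop at the first one that fails to improve.

    `candidates` is an ordered list of unroll counts or block sizes;
    `measure` returns a performance figure (higher is better).
    This stopping rule is an assumption, not taken from the paper.
    """
    best = candidates[0]
    best_perf = measure(best)
    for candidate in candidates[1:]:
        perf = measure(candidate)
        if perf <= best_perf:
            break  # greedy: accept the current best once performance stops rising
        best, best_perf = candidate, perf
    return best, best_perf


if __name__ == "__main__":
    # Hypothetical machine with B/F = 50 / 100 = 0.5 and a kernel at 0.25 flop/byte.
    print(attainable_gflops(100.0, 50.0, 0.25), bottleneck(100.0, 50.0, 0.25))
    # Toy stand-in for a real timing run; peaks at an unroll count of 4.
    print(greedy_search([1, 2, 4, 8, 16], lambda u: -(u - 4) ** 2))
```

In practice `measure` would compile and time the kernel with the given unroll count or block size; the greedy rule trades a possibly missed global optimum for far fewer timing runs than an exhaustive sweep.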