ABSTRACT: Current HPC systems incorporate many thousands of cores with five levels of parallelism, mostly managed by software. The trend toward including accelerators at each node adds at least two more levels of parallelism and introduces processor and memory heterogeneity. Exascale systems will incorporate millions of cores running billions of threads. Designing a programming model for such systems that incorporates resilience while preserving (and preferably improving) productivity, scalability, and performance portability will be a challenge. The solution requires a portable node-level communication layer, lightweight socket-level and many-core parallelism, and ultra-lightweight accelerator-side parallelism. It must address the distinct requirements of a hardware-managed, cache-based memory system coupled with a software-managed, stream-based memory system. We provide an overview of a programming model that meets these requirements and allows convenient expression of multi-level parallelism, so that programs can be re-scaled and flexibly decomposed to match the characteristics of the target HPC system.