
Project Overview

 




   
  Background 
   
   

There are currently various discussions about possible designs for exa-scale computers. Most of these designs predict that the number of nodes and cores will increase significantly and that interconnect topologies will become more complicated. On such systems, communication libraries must be redesigned to fulfill the scalability requirements. In particular, memory usage and performance tuning will become the key issues.

As for memory usage, in existing communication libraries each process prepares a receive buffer for each of the other processes. This works efficiently up to hundreds of thousands of processes. However, once the number of processes exceeds 100 million, the receive buffers alone will take more than 100 GB per process, even if each buffer is only 1 KB. At the same time, the memory available per process on exa-scale computers is predicted to stay at the current level or shrink. Communication libraries on such systems must therefore be based on memory-saving protocols.
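
The arithmetic behind this estimate can be checked with a short Python sketch. The function name and the choice of 1 KB = 1024 bytes are assumptions made for illustration; they are not taken from any existing library.

```python
def receive_buffer_bytes(num_processes: int, bytes_per_peer: int) -> int:
    """Bytes one process needs when it keeps one receive buffer per peer.

    Hypothetical helper: it models only the per-peer receive buffers
    described in the text, nothing else a real library would allocate.
    """
    return (num_processes - 1) * bytes_per_peer


if __name__ == "__main__":
    KB = 1024  # assumption: 1 KB = 1024 bytes
    for n in (100_000, 1_000_000, 100_000_000):
        total = receive_buffer_bytes(n, 1 * KB)
        print(f"{n:>11,} processes -> {total / 1e9:8.2f} GB per process")
```

Because the cost grows linearly with the number of peers, 100,000 processes need about 0.1 GB per process, while 100 million processes need just over 100 GB per process.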

As for performance tuning, existing communication libraries rely on static, manual methods. For example, the thresholds at which a collective communication switches algorithms are determined by running benchmark programs when the library is installed. As the number of processes increases and the interconnect topology becomes more complicated, the search space for this optimization will grow explosively. In addition, complex topologies make it quite difficult to predict performance statically. Automatic and dynamic methods will therefore be needed for tuning communication libraries.
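
One way to picture dynamic tuning is an explore-then-commit selector that times each candidate algorithm at runtime instead of relying on install-time thresholds. The following Python sketch is only an illustration under stated assumptions: the class `OnlineSelector`, its methods, and the power-of-two size buckets are hypothetical and do not describe how any existing library tunes its collectives.

```python
import time
from collections import defaultdict


class OnlineSelector:
    """Pick the fastest candidate per message-size bucket from runtime timings."""

    def __init__(self, candidates, trials_per_candidate=3, clock=time.perf_counter):
        self.candidates = dict(candidates)        # name -> callable
        self.trials = max(1, trials_per_candidate)
        self.clock = clock
        # bucket -> candidate name -> list of measured durations (seconds)
        self.samples = defaultdict(lambda: defaultdict(list))

    @staticmethod
    def bucket(message_bytes):
        # Assumption: sizes within the same power of two behave alike.
        return max(message_bytes, 1).bit_length()

    def choose(self, message_bytes):
        stats = self.samples[self.bucket(message_bytes)]
        # Exploration: measure every candidate a few times first.
        for name in self.candidates:
            if len(stats[name]) < self.trials:
                return name
        # Commit: use the candidate with the lowest mean duration so far.
        return min(self.candidates, key=lambda n: sum(stats[n]) / len(stats[n]))

    def run(self, message_bytes, *args):
        name = self.choose(message_bytes)
        start = self.clock()
        result = self.candidates[name](*args)
        elapsed = self.clock() - start
        self.samples[self.bucket(message_bytes)][name].append(elapsed)
        return name, result
```

Each message-size bucket is explored separately, so the choice adapts to the sizes the application actually uses. A production tuner would also have to handle noisy measurements and agreement among processes, which this sketch ignores.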

Another important point for performance tuning is information about the program itself. Existing communication libraries can obtain only the parameters specified when a communication function is invoked, so they cannot analyze how those functions are used in the program. For example, if the library could detect that a function will be invoked many times, it could afford some overhead to apply more aggressive optimization at runtime. There could also be specialized implementations for popular patterns of computation and communication.
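
The idea of paying a one-time overhead for frequently repeated calls can be sketched as a counter per call signature. In this Python illustration, `RepeatAwareCall`, `generic` and `build_specialized` are hypothetical names, and counting past calls is a simple stand-in for the program analysis the text has in mind.

```python
class RepeatAwareCall:
    """Switch to a specialized routine once a call signature repeats often."""

    def __init__(self, generic, build_specialized, threshold=3):
        self.generic = generic                      # always-available slow path
        self.build_specialized = build_specialized  # costly one-time setup
        self.threshold = threshold
        self.counts = {}        # signature -> number of generic calls seen
        self.specialized = {}   # signature -> optimized callable

    def __call__(self, signature, data):
        fast = self.specialized.get(signature)
        if fast is not None:
            return fast(data)
        self.counts[signature] = self.counts.get(signature, 0) + 1
        if self.counts[signature] >= self.threshold:
            # The signature has repeated often enough: pay the setup
            # overhead once so that later calls can take the fast path.
            self.specialized[signature] = self.build_specialized(signature)
        return self.generic(signature, data)
```

With `threshold=3`, the first three calls with a given signature use the generic path, and the specialized routine built during the third call serves every later call with that signature.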

 

 

   
  Targets 
   
 

Advanced Communication Primitives (ACP) Library

 

A communication library designed to achieve sustained scalability toward exa-scale computing environments. To minimize memory consumption at initialization, ACP provides an RDMA model as its basic communication layer.

On interconnect networks where RDMA is supported as a fundamental facility, this layer can be implemented with minimal memory consumption and overhead. Since programming with RDMA requires detailed operations such as memory registration, address exchange and synchronization, ACP also provides several sets of programmer-friendly interfaces as a middle layer.

To let the library consume just enough memory, each interface of the middle layer requires its memory region to be allocated explicitly before use. The region can also be de-allocated explicitly so that the memory can be reused for other purposes. Each interface of this middle layer is primitive and independent.

For example, the channel interface in this layer supports only one-directional, in-order data transfer between a pair of processes. This helps to minimize memory consumption and overhead in the implementation. The independence of the interfaces also enables precise control over when the memory regions for each of them are allocated and de-allocated. At this point, ACP defines interfaces for channels, vectors, lists and memory allocation. Interfaces for group communication, deques, maps, sets and counters are planned.
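
The properties of such a channel (one direction, in-order delivery, explicit allocation and de-allocation) can be modeled in a few lines of Python. This is a toy in-memory model for illustration only: `ToyChannel` and its methods are hypothetical and are not ACP's actual API, and no RDMA or network transfer is involved.

```python
from collections import deque


class ToyChannel:
    """In-memory model of a one-directional, in-order channel (not ACP's API)."""

    def __init__(self, sender: int, receiver: int, slots: int):
        # Creating the channel is the explicit allocation step: memory is
        # reserved only for this sender/receiver pair, and only `slots` messages.
        self.sender = sender
        self.receiver = receiver
        self.slots = slots
        self._queue = deque()
        self._allocated = True

    def send(self, rank: int, message: bytes) -> bool:
        self._check(rank, self.sender, "send")
        if len(self._queue) >= self.slots:
            return False                 # buffer full; caller retries later
        self._queue.append(message)      # FIFO order preserves send order
        return True

    def recv(self, rank: int):
        self._check(rank, self.receiver, "receive")
        return self._queue.popleft() if self._queue else None

    def free(self) -> None:
        # Explicit de-allocation: the region can be reused for other purposes.
        self._queue.clear()
        self._allocated = False

    def _check(self, rank: int, allowed: int, action: str) -> None:
        if not self._allocated:
            raise RuntimeError("channel has been freed")
        if rank != allowed:
            raise PermissionError(f"rank {rank} may not {action} on this channel")
```

Because the channel serves exactly one sender and one receiver, its memory footprint is fixed by `slots` and does not depend on the total number of processes.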

 

   
  Research Documents 
   
 

ISC High Performance 2015 Birds-of-a-Feather (BoF) session:
 How to Achieve Memory-Efficient Communication towards Exascale HPC

                                                     

 

   
  Schedule and History 
   
 

 2015-2016 : Evaluation and feedback.

         ACP-2.1 was released on January 13, 2017.

         ACP-2.0 was released on June 13, 2016.

         ACP-1.2 was released on November 13, 2015.

         ACP-1.1.1 was released on July 13, 2015.

         ACP-1.1 was released on May 11, 2015.

         Mantis BTS became available on February 9, 2015.

 2013-2015 : Design, implement and publish ACE library.

         Hands-on Tutorial of ACP at SC14.

         ACP-1.0 was released on September 1, 2014.

         The specification of ACP-1.0 became available on September 1, 2014.

 2012-2014 : Research and develop individual technologies.

 2011 : Start of the project.

 

 
