Abstract: High-performance computing has seen tremendous growth and acceptance over the last decade, primarily due to the availability of clusters. The performance of these clusters hinges on the communication interface. User-level communication that stores the address translation table in off-chip SRAM greatly increases the design complexity and cost of both the chip and the system. This paper proposes and implements a novel Free-Memory communication interface for many-core processors. Unlike a traditional host channel adapter (HCA) attached through an I/O bus, it has no local memory interface, and it reduces the cost of address translation through an efficient cache management method. Experimental results show that the implemented communication interface not only reduces the design complexity and cost of the chip and system, but also achieves better bandwidth and latency than an InfiniBand QDR HCA.