MEMOnet: Network interface plugged into a memory slot

Noboru Tanabe, Junji Yamamoto, Hiroaki Nishi, Tomohiro Kudoh
RWCP Tsukuba Research Center
Tsukuba Mitsui Building, Tsukuba, JAPAN 305-0032
{tanabe,junji,west,kudoh}@trc.rwcp.or.jp

Yoshihiro Hamada, Hironori Nakajo
Tokyo University of Agriculture and Technology
Naka-cho Koganei, JAPAN 184-8588
hamada@nj.cs.tuat.ac.jp, nakajo@cc.tuat.ac.jp

Hideharu Amano
Keio University
Hiyoshi Yokohama, JAPAN 223-8522
hunga@am.ics.keio.ac.jp

Abstract

The communication architecture of the DIMMnet-1 network interface, based on MEMOnet, is described. MEMOnet is an architecture consisting of a network interface plugged into a memory slot. The DIMMnet-1 prototype will have two banks of PC133 based SO-DIMM slots and an 8Gbps full duplex optical link or two 448MB/s full duplex LVDS channel links. The software overhead incurred to generate a message is only 1 CPU cycle and the estimated hardware delay is less than 100ns using the atomic on-the-fly sending with header TLB. The estimated achievable communication bandwidth with block on-the-fly sending with protection stampable window memory is 440MB/s, which was observed in our experiments writing to the DIMM area with a write combining attribute. This is 3.3 times higher than the maximum bandwidth of PCI. This high performance distributed computing environment is available using economical personal computers with DIMM slots.

1 Introduction

Network-based parallel processing using commodity components, such as personal computers, has received attention as an important parallel-computing environment. Considering the cost and time necessary for the development of dedicated massively parallel machines, connecting personal computers is a more realistic approach to realizing an inexpensive, high performance computing system.

Many high performance clusters use system area networks such as Myrinet[1] as an interconnection.
Myrinet based on the LANai7, a 64bit 66MHz PCI bus and a 1.25 Gbps electrical link is widely used. Its sustained one-way data rate is 140 MB/s and its short-message latency is 13.37 µs. Recently, Myrinet based on the LANai9 running at 132MHz reduced the latency to 9 µs. On the other hand, a new SCI card[2] based on 64bit 33MHz PCI and a 400MB/s link was announced, with a claimed communication latency of 2.3 µs. However, most current PCs located in office environments have only conventional 32bit 33MHz PCI slots. If a Myrinet or PCI-SCI interface were plugged into a conventional PCI slot, its performance would be degraded. In addition, because PCI is a half duplex bus, the bandwidth of a PCI based NIC (network interface card) degrades when sending and receiving operations are executed simultaneously.

In order to recycle the unused computing power available in offices and laboratories, we have to find a broadly adaptable high performance interface on the motherboards of low cost PCs for a high performance NIC.

InfiniBand[3], a high performance I/O specification for server type PCs, is being developed by a consortium in which many major companies participate. However, it is designed not for low cost PCs but for high performance servers.

On the other hand, optical interconnection and high speed LVDS electrical channels are becoming more realistic. 15 Gbps class parallel optical links are getting cheaper as the number of such links adopted in existing systems increases. A pair of low cost LVDS channel link serializer / deserializer chips[4] can be used to provide a 5Gbps electrical link operating over a distance of about 5m. Therefore, a communication bandwidth of over 1GB/s per NIC is becoming more realistic.

Our research group has developed RHiNET (RWCP High performance NETwork) [5][6] systems based on 1 Gbps or 8 Gbps parallel optical links and a PCI bus.
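The bus and link rates quoted so far can be put side by side with a little arithmetic. The following is a minimal sketch, not part of the original design: the helper names `bus_peak_mb_per_s` and `link_mb_per_s` are hypothetical, the nominal 33.33 MHz PCI clock is an assumption, and all results are theoretical peaks that ignore arbitration, protocol and encoding overhead.

```python
def bus_peak_mb_per_s(width_bits: int, clock_mhz: float) -> float:
    """Peak rate in MB/s of a parallel bus moving width_bits once per clock."""
    return width_bits / 8 * clock_mhz


def link_mb_per_s(rate_gbps: float) -> float:
    """Raw link rate converted from Gbps to MB/s, ignoring encoding overhead."""
    return rate_gbps * 1000 / 8


# Assumption: nominal 33.33 MHz clock for conventional PCI.
PCI_CLOCK_MHZ = 33.33

pci_32bit = bus_peak_mb_per_s(32, PCI_CLOCK_MHZ)  # conventional office-PC slot
pci_64bit = bus_peak_mb_per_s(64, PCI_CLOCK_MHZ)  # slot used by the SCI card
optical = link_mb_per_s(8)                        # 8 Gbps parallel optical link

# 440 MB/s is the write-combining bandwidth quoted in the abstract.
dimm_write = 440.0
ratio = dimm_write / pci_32bit

print(f"32bit 33MHz PCI peak : {pci_32bit:7.1f} MB/s")
print(f"64bit 33MHz PCI peak : {pci_64bit:7.1f} MB/s")
print(f"8 Gbps optical link  : {optical:7.1f} MB/s")
print(f"DIMM write / PCI peak: {ratio:.1f}x")
```

Under these assumptions, even the peak rate of a conventional PCI slot is a small fraction of what an 8 Gbps link can carry, and the ratio computed in the last line matches the factor of 3.3 stated in the abstract.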
However, because of PCI bandwidth limitations, bus handling overhead and protocol software overhead, the potential of parallel optical links or LVDS channel links cannot be fully

0-7695-0896-0/00 $10.00 © 2000 IEEE