0018-9340 (c) 2015 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information. This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/TC.2015.2419654, IEEE Transactions on Computers

A Novel Computational Model for Non-linear Divisible Loads on a Linear Network

Chi-Yeh Chen and Chih-Ping Chu

Abstract—This work investigates the problem of distributing a non-linear divisible load on a homogeneous linear network. A novel computational model for non-linear loads, which includes the complete steps for processing them, is proposed. This model solves the problem of the classical model, whose performance degrades when the load is separated. This work also presents an algorithm S (Single-installment) that uses single-installment processing to distribute a non-linear divisible load on a homogeneous linear network. An algorithm M (Multi-installment) that applies multi-installment processing to reduce the initial load distribution time is also proposed. Closed-form expressions for the parallel processing time and speed-up of the proposed algorithms are derived. The speed-up of algorithm S is much better than that of the classical algorithm, which is based on the classical model. Algorithm M outperforms algorithm S in terms of speed-up when the load to be processed is very large or when the start-up costs are small.

Index Terms—divisible load theory, non-linear computational loads, linear network, load distribution, multi-installment.

1 INTRODUCTION

A good load distribution is essential for the efficient utilization of resources in a parallel and distributed system or a multiprocessor system.
Several linear mathematical models have been proposed, such as queuing theory, electric resistive circuit theory, and divisible load theory. Divisible load theory provides ease of computation, a schematic language, and equivalent network element modeling [33]. In divisible load theory, a load can be arbitrarily partitioned and distributed to at least two processors to reduce processing time, such that each part of the load can be processed independently on any processor. The assumption that the load is divisible is reasonable for many practical applications, such as linear algebra [17], image processing, multimedia applications [29], database searching, large-scale data file processing [26], data-intensive applications [27], [34], numerical computing [38], biomedical and bioinformatic applications [30], and Internet packet scheduling [19].

Since Cheng and Robertazzi [12] developed divisible load theory and Agrawal and Jagadish [1] proposed the application of divisible load theory to mathematical programming, many interconnection topologies have been studied; these include bus [19], [42], linear array [10], tree [9], [28], hypercube [11], mesh [10], partitionable networks, arbitrary networks [43], clusters [22], [40], grids [41], and networks of workstations [4].

For large loads, multi-installment processing is well known to minimize the parallel computation time. In multi-installment algorithms, not every processor receives only a single fraction of the load; rather, at least one processor receives two or more fractions. Many multi-installment divisible load algorithms for chains, stars, and trees can be found elsewhere [7], [14], [36], [42].

The authors are with the Department of Computer Science and Information Engineering, National Cheng Kung University, No. 1, University Road, Tainan, Taiwan, ROC. E-mail: chency@csie.ncku.edu.tw, chucp@csie.ncku.edu.tw.
This work investigates a divisible non-linear load on a linear network with static interconnection. The proposed algorithms apply not only to a linear network, but also to a fat-tree network with cut-through switching. Because a linear network can be embedded in a fat-tree network, the proposed algorithms can be applied to such networks using initial boundary processors. For the proposed algorithms, the main difference between a linear network and a fat-tree network is the communication time between two nodes in the network, which is the sum of the time taken to prepare a message for transmission and the time taken by the message to traverse the network to its destination.

Consider a message of size $L$ that is transmitted over a static interconnection network between two directly connected nodes. The communication time is $T_{comm} = \theta_{cm} + T_h + L T_{cm}$, where $\theta_{cm}$ is the time required to set up a message at the two nodes, $T_h$ is the time taken by the header of a message to travel between two directly connected nodes, and $T_{cm}$ is the time taken to transmit a unit of load. If the message traverses $l$ links in a fat-tree network, then the communication time is $T_{comm} = \theta_{cm} + l T_h + L T_{cm}$. The communication time for a message therefore depends not only on the size of the message, but also on its routing path. However, in currently available parallel computers, $T_h$ is quite small and the diameter of most networks is also small. Therefore, the terms $T_h$ and $l T_h$ can be neglected with little loss of accuracy. In this work, the proposed algorithms exclude the start-