To speed up training, the back-propagation algorithm has been parallelized in two different ways. In the first approach, the training set is partitioned for the batch learning implementation. The neural network is duplicated on every processor of the parallel machine, and each processor works on a subset of the training set. After each epoch the calculated weight corrections are broadcast and merged.
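
As an illustration of the training-set partitioning, the following is a minimal sketch in which the processors are simulated by a loop and a single linear neuron with the delta rule stands in for the full network. The function names, the round-robin split, and the learning rate are assumptions made for the example and are not taken from the implementation described here.

```python
def weight_corrections(weights, samples, lr=0.1):
    """Summed delta-rule corrections of one linear neuron over `samples`.

    Stand-in for the per-processor back-propagation pass (assumption).
    """
    corr = [0.0] * len(weights)
    for x, target in samples:
        out = sum(w * xi for w, xi in zip(weights, x))
        err = target - out
        for i, xi in enumerate(x):
            corr[i] += lr * err * xi
    return corr


def partition(samples, p):
    """Split the training set into p disjoint subsets (round-robin, an assumption)."""
    return [samples[k::p] for k in range(p)]


def parallel_epoch(weights, samples, p, lr=0.1):
    """One batch epoch with p simulated processors.

    Every processor holds a copy of `weights`, computes corrections on its
    own subset, and the corrections are then merged by summation.
    """
    partial = [weight_corrections(weights, subset, lr)
               for subset in partition(samples, p)]
    merged = [sum(c[i] for c in partial) for i in range(len(weights))]
    return [w + d for w, d in zip(weights, merged)]
```

Because the batch corrections are plain sums over the training patterns, merging the per-processor corrections gives the same update as a serial batch epoch, up to floating-point rounding.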

The second approach is a parallel calculation of the matrix products that are used in the learning algorithm.
The neurons of each layer are partitioned into **p** disjoint sets and
each set is mapped onto one of the **p** processors.
After the new activations of the neurons in one layer have been calculated, they are broadcast.
We have implemented this on-line training in two variants.
In the first, the parallelization of Morgan et al. [5], a matrix product is not computed on a single processor;
instead, it is accumulated while the partial sums are passed around the ring of processors.
The second method, by Yoon et al. [4], tries to reduce the communication time, at the cost of an overhead in both storage
and the number of computational operations.
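
The neuron partitioning and the circulating partial sums can be sketched as follows. This is only an illustrative simulation under stated assumptions: the processors are simulated sequentially, the neurons are split into contiguous blocks, a sigmoid activation is used, and the function names are hypothetical. It is not claimed to reproduce the exact scheme of Morgan et al. [5].

```python
import math


def blocks(n, p):
    """Partition indices 0..n-1 into p disjoint contiguous sets."""
    base, extra = divmod(n, p)
    out, start = [], 0
    for k in range(p):
        size = base + (1 if k < extra else 0)
        out.append(list(range(start, start + size)))
        start += size
    return out


def forward_partitioned(W, x, p):
    """Each simulated processor computes the activations of its own neurons
    (its rows of W); concatenating the parts models the broadcast."""
    parts = []
    for rows in blocks(len(W), p):
        parts.append([1.0 / (1.0 + math.exp(-sum(w * xi for w, xi in zip(W[j], x))))
                      for j in rows])
    return [a for part in parts for a in part]


def ring_transposed_product(W, delta, p):
    """Compute W^T * delta when processor q stores only its rows of W.

    The partial sum destined for column block d starts at processor d+1,
    visits every processor on the ring once, and ends at processor d.
    On a real machine all p partial sums would travel concurrently.
    """
    rows, cols = blocks(len(W), p), blocks(len(W[0]), p)
    result = [0.0] * len(W[0])
    for d in range(p):
        partial = [0.0] * len(cols[d])
        for step in range(p):
            q = (d + 1 + step) % p  # processor currently holding the partial sum
            for j in rows[q]:
                for k, i in enumerate(cols[d]):
                    partial[k] += W[j][i] * delta[j]
        for k, i in enumerate(cols[d]):
            result[i] = partial[k]
    return result
```

In this sketch no processor ever needs the full weight matrix: the forward pass uses only local rows, and the backward product is assembled from local contributions as each partial sum passes by.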

The parallel algorithms are implemented with both the PARIX and PVM message-passing environments on PARSYTEC's Multicluster and PowerXplorer, which are based on Transputers and PowerPC processors, respectively.

Mon Jun 12 14:12:53 MET DST 1995