This paper describes a measurement study of the effects of thread placement on memory access times on the Kendall Square multiprocessor, the KSRl. The KSRl uses a conventional shared memory programming model in a distributed memory architecture. The architecture is based on a ring of rings of 64-bit superscalar microprocessors. The KSRl has a Cache-Only Memory Architecture (COMA). Memory consists of the local cache memoria attached to each processor. Whenever an address is accessed, the data item is automatically copied to the local cache memory module, 80 that access times for subsequent references will be minimal. If a local cache has space allocated for a particular data item, but does not have a current valid copy of that data item, then it is possible for the cache to acquire a valid read-only copy before it is requested by the local processor due to a request by a different processor that happens to pass by on the ring. This automatic prefetching can greatly reduce the average time for a thread to acquire data items. Because of the automatic prefetching, the time required to obtain a valid copy of a data item does not depend simply on the distance from the owner of the data item, but also depends on the placement and number of other processing threads which ehare the same data item. Also, the strategic placement of processing threads helps programs take advantage of the unique features of the memory architecture which help eliminate memory access bottlenecks for shared data sets. Experiments run on the KSRl across a wide variety of thread configurations show that shared memory access is accelerated through strategic placement of threads which share data. The results indicate strategies for improving the performance of applications programs, and illustrate that KSRl memory access times can remain nearly constant even when the number of participating threads increases.
Please use publisher's recommended citation.