[algogeeks] Re: optimisation Gene Wed Feb 29 09:02:03 2012
For big matrices, using all the caches well is very important. The usual approach is block tiling as you say. You want two blocks to fit nicely in the L1 cache. The most highly optimized schemes will have a hierarchy of tiles where two tiles at the second level will fit in the L2 cache, etc. Additional levels can be based on memory interfaces, interleaving, page size, and even cylinder size on disk (for really huge matrices). The idea is _never_ to generate more cache misses than you need to. A miss causes a factor of 10 to 10000 performance multiple on that operation. Multiplying within the innermost blocks should also consider prefetch: you'll get best performance touching locations in contiguous ascending order. The inner loops are for i = 1 to ma for j = 1 to nb for k = 1 to na r[i,j] += a[i,k] * b[k,j] This ignores that r[i,j] needs to be zeroed out somewhere. But with this assumption, the loops can be juxtaposed in any order without changing the result. You should explore the 6 possible orderings for the best one. Of course you also have to zero out the sums in the best possible manner. FWIW, a GPU will normally outperform a general purpose CPU with ease on this problem. Since even cell phones are getting GPUs these days, tweaking single-processor matrix code is a dying art. Cheers. On Feb 27, 6:57 pm, Arun Vishwanathan <[EMAIL PROTECTED]> wrote: > Hi all, > > We have this challenge to make the fastest executing serial matrix > multiplication code. I have tried using matrix transpose( in C for row > major ) and loop unrolling.I was able to obtain little speedup. Does anyone > have any hints/papers that I could read upon and try to speed up further?I > had tried a bit of block tiling but was not successful. > > Thanks > Arun -- You received this message because you are subscribed to the Google Groups "Algorithm Geeks" group. To post to this group, send email to [EMAIL PROTECTED] To unsubscribe from this group, send email to [EMAIL PROTECTED] For more options, visit this group at http://groups.google.com/group/algogeeks?hl=en.
- [algogeeks] optimisation Arun Vishwanathan 2012/02/27