[mad-dev] A little help needed on code optimisation to deal with delay slots

1 Jul 2004


I have ported the library to our hardware platform (a set-top box with an
IDE interface and an ARC8 processor), but we are running short of
horsepower. The problem is that the multiplier stalls the chip for 10
cycles on every multiply:
    mul64 %r0, %r1
    lsl   %r0, [%mhi], 4    --> stalls until the multiplier finishes
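As a rough sketch of what that sequence computes, here is a portable-C model. It assumes libmad's default 28 fractional bits (which would explain the left shift of the high word by 32 - 28 = 4), and it assumes the full sequence also ORs in the top 4 bits of the low result word. The mhi/mlo variables and the model_* functions are hypothetical names for illustration only.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: libmad-style 32-bit fixed point with 28 fractional bits. */
typedef int32_t mad_fixed_t;
#define MAD_F_FRACBITS 28

/* Hypothetical model of the multiplier's result registers. */
static uint32_t mlo, mhi;

/* Like mul64: latch the full 64-bit signed product into mhi:mlo. */
static void model_mul64(mad_fixed_t x, mad_fixed_t y)
{
    uint64_t p = (uint64_t)((int64_t)x * (int64_t)y);
    mlo = (uint32_t)p;
    mhi = (uint32_t)(p >> 32);
}

/* Reassemble bits 28..59 of the product: (mhi << 4) | (mlo >> 28). */
static mad_fixed_t model_read_result(void)
{
    uint32_t r = (mhi << (32 - MAD_F_FRACBITS)) | (mlo >> MAD_F_FRACBITS);
    return (mad_fixed_t)r;
}

/* Check that the split read-back matches a plain 64-bit shift. */
static void check_against_shift(mad_fixed_t x, mad_fixed_t y)
{
    model_mul64(x, y);
    assert(model_read_result() ==
           (mad_fixed_t)(((int64_t)x * y) >> MAD_F_FRACBITS));
}
```

The result is only available once both halves are in the result registers, so reading the high word right after starting the multiply is where the 10-cycle stall shows up.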
I have started reordering the code so that other, independent instructions
run during those 10 cycles, for example:
<from synth.c dct32:282>
//  t69  = t33 + t34;  t89  = MUL(t33 - t34, costab4);
    mad_f_mul_j(t33 - t34, costab4);   // start multiply; we have 10 cycles to wait
    t69  = t33 + t34;                  // 2-3 cycles
    t70  = t35 - t36;                  // 2-3 cycles (next multiply's operand)
    t89  = mad_f_mul_r();              // read back the product
//  t70  = t35 + t36;  t90  = MUL(t35 - t36, costab28);
    mad_f_mul_j(t70, costab28);        // start multiply; we have 10 cycles to wait
    t70  = t35 + t36;                  // 2-3 cycles
    t113 = t69 + t70;                  // 2-3 cycles
    t71  = t37 - t38;                  // 2-3 cycles (next multiply's operand)
    t90  = mad_f_mul_r();              // read back the product
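The reordering above can be checked off-target with a small portable-C sketch that runs the original order and the hand-scheduled order side by side. mul_start()/mul_read() are hypothetical stand-ins for the mad_f_mul_j()/mad_f_mul_r() pair (their real bodies are not shown here), the 28 fractional bits are assumed from libmad's default, and the coefficient arguments are arbitrary rather than the real costab values.

```c
#include <assert.h>
#include <stdint.h>

typedef int32_t mad_fixed_t;   /* assumption: 32-bit fixed point */
#define MAD_F_FRACBITS 28      /* assumption: libmad's default */

/* Reference: one-shot multiply in the style of a 64-bit mad_f_mul. */
static mad_fixed_t mul_now(mad_fixed_t x, mad_fixed_t y)
{
    return (mad_fixed_t)(((int64_t)x * y) >> MAD_F_FRACBITS);
}

/* Hypothetical split multiply: like the hardware, at most one
   product may be in flight at a time. */
static int64_t in_flight;
static int busy;

static void mul_start(mad_fixed_t x, mad_fixed_t y)
{
    assert(!busy);             /* previous product not read back yet */
    in_flight = (int64_t)x * y;
    busy = 1;
}

static mad_fixed_t mul_read(void)
{
    assert(busy);              /* nothing was started */
    busy = 0;
    return (mad_fixed_t)(in_flight >> MAD_F_FRACBITS);
}

struct bfly { mad_fixed_t t69, t70, t89, t90, t113; };

/* Original order (the commented-out lines): each product is used
   as soon as the multiply is issued. */
static struct bfly bfly_plain(mad_fixed_t t33, mad_fixed_t t34,
                              mad_fixed_t t35, mad_fixed_t t36,
                              mad_fixed_t c4, mad_fixed_t c28)
{
    struct bfly o;
    o.t69 = t33 + t34;  o.t89 = mul_now(t33 - t34, c4);
    o.t70 = t35 + t36;  o.t90 = mul_now(t35 - t36, c28);
    o.t113 = o.t69 + o.t70;
    return o;
}

/* Reordered: independent adds run while each product is in flight. */
static struct bfly bfly_scheduled(mad_fixed_t t33, mad_fixed_t t34,
                                  mad_fixed_t t35, mad_fixed_t t36,
                                  mad_fixed_t c4, mad_fixed_t c28)
{
    struct bfly o;
    mad_fixed_t d;

    mul_start(t33 - t34, c4);  /* multiply issued ...              */
    o.t69 = t33 + t34;         /* ... independent work in the gap  */
    d = t35 - t36;             /* next multiply's operand, early   */
    o.t89 = mul_read();

    mul_start(d, c28);
    o.t70 = t35 + t36;
    o.t113 = o.t69 + o.t70;
    o.t90 = mul_read();
    return o;
}
```

Both functions should return identical values for the same inputs; only the order of the operations changes. The separate temporary `d` plays the role the original code gives to `t70` before it is overwritten with `t35 + t36`.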
This saves around 6 cycles per multiply, so we lose only 4 cycles instead
of 10.
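The cycle arithmetic can be written down as a tiny C helper: each multiply loses whatever part of the 10-cycle latency is not covered by independent work. The function names are hypothetical, and the model ignores any other pipeline effects on the ARC.

```c
#include <assert.h>
#include <stddef.h>

/* Cycles lost on one multiply: the latency not covered by independent
   work scheduled between starting the multiply and reading it back. */
static unsigned stall_cycles(unsigned latency, unsigned filled)
{
    return filled >= latency ? 0u : latency - filled;
}

/* Total cycles lost over n multiplies, where filled[i] is the number
   of cycles of independent work placed after multiply i. */
static unsigned long total_stall(unsigned latency,
                                 const unsigned *filled, size_t n)
{
    unsigned long total = 0;
    size_t i;

    assert(filled != NULL || n == 0);
    for (i = 0; i < n; i++)
        total += stall_cycles(latency, filled[i]);
    return total;
}
```

With a latency of 10 and about 6 cycles filled, stall_cycles() gives the 4 cycles quoted above. Because the saving scales with the number of multiplies actually executed, counting executed multiplies per routine is one way to rank which routines are worth converting first.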
What I would like to know is which routine(s) I should convert first to
save the most time.
Joolz
