Topic : Fast multiplication time

Forum : ST7/STM8

Original Post
June 24, 2010 - 11:27am
Guest

Hi,

I am using an STM8S105C4 with the 16 MHz HSI clock, and I can observe the clock on the CLK_CCO pin.
The ST7/STM8 C Compiler Manual says that a fast multiplication takes about 70 cycles. At 16 MHz, each clock cycle takes 62.5 ns, so a fast multiplication should take about 62.5 ns * 70 = 4.38 us. But when I run the code below and watch pin 3 of port D on the oscilloscope, I see that the multiplication takes 8.8 us.
Does the 70-cycle figure apply to the assembly function ?C?MULS161632 alone, or to the whole multiplication statement (long res = (long)a * b;)?

short a, b;
GPIOD->ODR |= (u8)GPIO_PIN_3; //pin high
long res = (long)a * b; // Long mul, both operands 16-bit: Optimized to muls161632
GPIOD->ODR &= (u8)(~GPIO_PIN_3);//pin low
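
For reference, the arithmetic above can be checked with a small host-side sketch. The helper names cycles_to_ns and ns_to_cycles are hypothetical and are not part of any ST or Raisonance library. The code is meant for a PC, not the STM8 target, because it relies on 64-bit intermediates.

```c
#include <stdint.h>

/* Hypothetical helper: duration in ns of `cycles` CPU cycles at f_cpu_hz. */
static uint32_t cycles_to_ns(uint32_t cycles, uint32_t f_cpu_hz)
{
    /* 1e9 ns per second; the 64-bit intermediate avoids overflow. */
    return (uint32_t)(((uint64_t)cycles * 1000000000ULL) / f_cpu_hz);
}

/* Hypothetical helper: CPU cycles (rounded to nearest) in a measured time. */
static uint32_t ns_to_cycles(uint32_t ns, uint32_t f_cpu_hz)
{
    return (uint32_t)(((uint64_t)ns * f_cpu_hz + 500000000ULL) / 1000000000ULL);
}
```

With these, cycles_to_ns(70, 16000000) gives 4375 ns, matching the 4.38 us estimate, and ns_to_cycles(8800, 16000000) gives about 141 cycles for the measured 8.8 us pulse. The measured pulse presumably also covers operand loading, the store to res, and the pin write itself, so it is not a pure measurement of the library routine.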

Thanks.

Replies
June 25, 2010 - 10:08am
Raisonance Support Team

Hello Volkan,

Thanks for your report.
I made a quick experiment with the following code running on a STM8S208RBT6:

#pragma OPTIMIZE(7,SPEED)       // Full speed ahead!

long res;
unsigned short a = 3;
unsigned short b = 5;

void main(void)
{
    // Undivided master clock for full speed
    *(char*)0x50C6 = 0x00;      // CKDIV

    // Quick-and-dirty-but-portable setup of port B
    *(char*)0x5007 = 0xFF;      // PBDDR
    *(char*)0x5008 = 0xFF;      // PBCR1
    *(char*)0x5009 = 0x00;      // PBCR2

    while ( 1 )
    {
        *(char*)0x5005 = 0x00;  // Clear Port B
        res = (long)a * b;      // Long mul, both operands 16-bit: Optimized to muls161632
        *(char*)0x5005 = 0xFF;  // Set Port B
    }
}

I admit that the code is not very clean, but at least it is 100% portable to other compilers, so you can check for yourself how good our results are.

With a scope I measured the low pulse (the time for the multiplication and the assignment to res) and got the following results. I used RKit-STM8 v2.30, which will be released next week, but 2.28 should give the same figures:
- No optimization: 13.86us
- Size optimization: 11.51us
- Speed optimization: 4.34us

This is even better than in the specification!

Just ensure that your code is calling the fast function. You must be in speed optimization mode, and your variables should be unsigned so that the mulu161632 function from the C library is called. If mulu3232 is called, then you are using a full 32-bit multiplication, which is undoubtedly slower...

Regards,
Bruno

July 21, 2010 - 4:48pm
Guest

Hi Bruno,

Sorry for the late answer!

Following your advice, I changed my code as shown below, saw mulu161632 being called, and it took 4.5 us as expected.

/* CODE STARTS */
static unsigned long Result;
static unsigned int x = 10233;
static unsigned int y = 13777;

GPIOD->ODR |= (u8)GPIO_PIN_3; //To measure
Result = (unsigned long)x * y;
GPIOD->ODR &= (u8)(~GPIO_PIN_3); //To measure
/* CODE ENDS */

Thank you for this.

But when I change the unsigned variables to signed ones, as below, I can see that mulu3232 is called, whereas I expected muls161632.

/* CODE STARTS */
static signed long Result;
static signed int x = 10233;
static signed int y = 13777;

GPIOD->ODR |= (u8)GPIO_PIN_3; //To measure
Result = (signed long)x * y;
GPIOD->ODR &= (u8)(~GPIO_PIN_3); //To measure
/* CODE ENDS */

What is my mistake? Why is muls161632 not called?

Thanks a lot.

July 22, 2010 - 9:56am
Raisonance Support Team

Hi,

I'm glad that these hints have been helpful.

Concerning muls161632: the signed case cannot be reduced to a plain 16x16->32 multiplication, because sign extension may be involved. The multiplication routine cannot know whether the sign bit must be extended into the result, so such a routine cannot be built in a safe manner.

What you can do is use only unsigned variables (if possible), or handle the sign yourself, although the code would be a little more complex.
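
As a rough illustration of handling the sign by hand, here is a minimal sketch. The function name mul_s16_s16 is hypothetical and not part of the RKit library. It assumes that the unsigned pattern (uint32_t)ua * ub is compiled to the fast unsigned routine, as the unsigned example earlier in this thread suggests. The sign issue shows up with the bit patterns 0xFFFF and 0x0002: read as unsigned values their product is 0x0001FFFE, but read as signed values (-1 and 2) it must be 0xFFFFFFFE.

```c
#include <stdint.h>

/* Hypothetical helper: signed 16x16->32 multiply built on an unsigned
 * 16x16->32 product. Not a library routine; a sketch only. */
static int32_t mul_s16_s16(int16_t a, int16_t b)
{
    /* Magnitudes as unsigned 16-bit values. Computing 0u - x in unsigned
     * arithmetic avoids signed overflow for a == -32768. */
    uint16_t ua = (a < 0) ? (uint16_t)(0u - (uint16_t)a) : (uint16_t)a;
    uint16_t ub = (b < 0) ? (uint16_t)(0u - (uint16_t)b) : (uint16_t)b;

    /* Unsigned 16x16->32 product (assumed to hit the fast unsigned path). */
    uint32_t p = (uint32_t)ua * ub;

    /* p <= 32768 * 32768 = 2^30, so it always fits in int32_t.
     * The result is negative only when exactly one operand is negative. */
    return ((a < 0) != (b < 0)) ? -(int32_t)p : (int32_t)p;
}
```

A call such as Result = mul_s16_s16(x, y); would replace the signed multiplication statement. Whether this actually beats the full 32-bit routine on a given compiler version has to be measured, since the extra tests and negations cost cycles too.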

I hope this helps,
Bruno

July 22, 2010 - 12:25pm
Guest

Hi Bruno,

But in the "ST7/STM8 C Compiler Manual, doc. ver. 2010-04-27" on page 50, the first example says that:

short a, b; --> signed values
long res = (long)a * b; // Long mul, both operands 16-bit: Optimized to muls161632

In fact, when I build my code with compiler ver. 2.28.10, it generates muls161632, but compiler ver. 2.30.10 generates mulu3232.
Could you please check this difference?

Thank you.

July 23, 2010 - 10:18am
Raisonance Support Team

Hi Volkan,

Thanks for this in-depth analysis. We reproduced the issue in our labs.
It looks like the unsigned patterns work fine, but signed multiplication is always analyzed as "dangerous" by the compiler, so it skips the muls161632 optimization.

I'll get back to you on this when we have further details.
Best Regards,
Bruno