# Efficient Assembly multiplication

By : Marcin Wasilewski
Date : September 24 2020, 03:00 AM
The interesting part of this exercise is finding ways to use one or two LEA, SHL, and/or ADD/SUB instructions to implement multiplication by various constants.
Dispatching on the fly for a single multiply isn't very interesting: it would require either actual JIT compilation or having every possible sequence already present in a giant table of tiny code blocks (like the cases of a switch statement).
Some example sequences, starting with compiler output for x*15:
``````
mul15:              # gcc -O3 -mtune=bdver2
    mov     eax, edi
    sal     eax, 4
    sub     eax, edi
    ret
``````
``````
mul15:             # clang -O3 -mtune=bdver2
    lea     eax, [rdi + 4*rdi]
    lea     eax, [rax + 2*rax]
    ret
``````
``````
    lea     eax, [rdi + 4*rdi]      # x*5
    lea     eax, [rdi + 2*rax]      # x + 2*(x*5) = x*11
``````
``````
mul17:
    mov     eax, edi
    sal     eax, 4                  # x*16
    add     eax, edi                # x*16 + x = x*17
    ret
``````
``````
mul17:
    lea    eax,  [rdi + 8*rdi]  ; x*9
    lea    eax,  [rax + 8*rdi]  ; x*9 + x*8 = x*17
``````
``````
#define MULFUN(c) int mul##c(int x) { return x*c; }
MULFUN(9)
MULFUN(10)
MULFUN(11)
MULFUN(12)
...
``````
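The sequences above can be mirrored in C to check that each decomposition really computes the intended product. This is only a sketch: the function names are hypothetical, and `uint32_t` is used (rather than the `int` in the macro above) so that shifts and wraparound are well defined and match what the 32-bit register arithmetic does.

```c
#include <assert.h>
#include <stdint.h>

/* x*15 as one shift and one subtract, like the gcc sequence: (x << 4) - x */
uint32_t mul15_shift_sub(uint32_t x) {
    return (x << 4) - x;
}

/* x*15 as two LEA-style steps, like the clang sequence */
uint32_t mul15_two_lea(uint32_t x) {
    uint32_t t = x + (x << 2);   /* lea eax, [rdi + 4*rdi]  -> x*5  */
    return t + (t << 1);         /* lea eax, [rax + 2*rax]  -> x*15 */
}

/* x*11: the second LEA adds the original x instead of the intermediate */
uint32_t mul11_two_lea(uint32_t x) {
    uint32_t t = x + (x << 2);   /* x*5 */
    return x + (t << 1);         /* x + 2*(x*5) = x*11 */
}

/* x*17 as shift plus add: (x << 4) + x */
uint32_t mul17_shift_add(uint32_t x) {
    return (x << 4) + x;
}

/* x*17 as two LEA-style steps: x*9, then x*9 + x*8 */
uint32_t mul17_two_lea(uint32_t x) {
    uint32_t t = x + (x << 3);   /* x*9 */
    return t + (x << 3);         /* x*9 + x*8 = x*17 */
}

/* Hypothetical self-check: every variant must agree with a plain multiply. */
void check_mul_sequences(uint32_t x) {
    assert(mul15_shift_sub(x) == x * 15u);
    assert(mul15_two_lea(x)   == x * 15u);
    assert(mul11_two_lea(x)   == x * 11u);
    assert(mul17_shift_add(x) == x * 17u);
    assert(mul17_two_lea(x)   == x * 17u);
}
```

Calling `check_mul_sequences` on a handful of values, including ones that overflow 32 bits, is a cheap way to validate a new decomposition before hand-writing the assembly for it.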

## Suggestions needed on most efficient way to retrieve an assembly version from an external assembly

By : Burhan Abiç
Date : March 29 2020, 07:55 AM
Take a look at `System.Reflection.AssemblyName`. You can do:
``````
AssemblyName.GetAssemblyName("foo.dll").Version
``````

## Multiplication or if: what is more efficient?

By : kmanross
Date : March 29 2020, 07:55 AM
The answer is: you really don't need to worry about this at this juncture. If your code runs too slowly, then you can experiment with some optimization. However, "too slow" is subjective.
Premature Optimization Is The Root Of All Evil.

## Assembly code/AVX instructions for multiplication of complex numbers. (GCC inline assembly)

By : Luke
Date : March 29 2020, 07:55 AM
As you surmise, the problem is that you haven't told GCC which registers you are clobbering. I'd be surprised if GCC doesn't yet support putting YMM registers in the clobber list; what version of GCC are you using?
In any event, it will almost certainly suffice to put the corresponding XMM registers in the clobber list instead (each XMM register is the low 128 bits of the matching YMM register):
``````
: "=m" (ret) : "m" (*a), "m" (*b) : "%xmm1", "%xmm2");
``````
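For context, here is a minimal sketch of where such a clobber list sits in a complete GCC extended-asm statement. It is not the asker's complex-multiplication routine: the operation is a simple four-float addition chosen as a stand-in, and the `vec4` type and `add4` function are hypothetical names. The inline assembly is only compiled on x86-64 with a GCC-compatible compiler; elsewhere a plain C loop is used instead.

```c
#include <assert.h>

/* Hypothetical 16-byte operand type, so "m" constraints cover all four floats. */
typedef struct { float v[4]; } vec4;

/* ret = a + b, element by element. */
void add4(vec4 *ret, const vec4 *a, const vec4 *b) {
    assert(ret && a && b);
#if defined(__GNUC__) && defined(__x86_64__)
    __asm__ (
        "movups %1, %%xmm1\n\t"       /* xmm1 = *a            */
        "movups %2, %%xmm2\n\t"       /* xmm2 = *b            */
        "addps  %%xmm2, %%xmm1\n\t"   /* xmm1 += xmm2         */
        "movups %%xmm1, %0"           /* *ret = xmm1          */
        : "=m" (*ret)                 /* output operand       */
        : "m" (*a), "m" (*b)          /* input operands       */
        : "xmm1", "xmm2");            /* registers the asm overwrites */
#else
    /* Portable fallback for other targets or compilers. */
    for (int i = 0; i < 4; i++) {
        ret->v[i] = a->v[i] + b->v[i];
    }
#endif
}
```

Without the clobber list the compiler is free to assume `xmm1` and `xmm2` still hold whatever it placed there before the asm statement, which is the kind of silent corruption described above.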

## Efficient 4x4 matrix multiplication (C vs assembly)

By : Scenario Dis
Date : March 29 2020, 07:55 AM
There is a way to accelerate the code and outplay the compiler. It does not involve any sophisticated pipeline analysis or deep code micro-optimisation (which doesn't mean that it couldn't benefit further from these). The optimisation uses three simple tricks, numbered in the comments of the code:
``````
    .text
    .align 32                           # 1. function entry alignment
    .globl matrixMultiplyASM            #    (for a faster call)
    .type matrixMultiplyASM, @function
matrixMultiplyASM:
    movaps   (%rdi), %xmm0
    movaps 16(%rdi), %xmm1
    movaps 32(%rdi), %xmm2
    movaps 48(%rdi), %xmm3
    movq $48, %rcx                      # 2. loop reversal
1:                                      #    (for simpler exit condition)
    movss (%rsi, %rcx), %xmm4           # 3. extended address operands
    shufps $0, %xmm4, %xmm4             #    (faster than pointer calculation)
    mulps %xmm0, %xmm4
    movaps %xmm4, %xmm5
    movss 4(%rsi, %rcx), %xmm4
    shufps $0, %xmm4, %xmm4
    mulps %xmm1, %xmm4
    addps %xmm4, %xmm5                  # accumulate the partial product
    movss 8(%rsi, %rcx), %xmm4
    shufps $0, %xmm4, %xmm4
    mulps %xmm2, %xmm4
    addps %xmm4, %xmm5
    movss 12(%rsi, %rcx), %xmm4
    shufps $0, %xmm4, %xmm4
    mulps %xmm3, %xmm4
    addps %xmm4, %xmm5
    movaps %xmm5, (%rdx, %rcx)
    subq $16, %rcx                      # one 'sub' (vs 'add' & 'cmp')
    jge 1b                              # SF=OF, idiom: jump if non-negative
    ret
``````
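To test a routine like this, it helps to have a plain C reference with the same data flow. The sketch below is an assumption about the intended semantics, read off the assembly: three pointers to 4x4 row-major `float` matrices arrive in `%rdi`, `%rsi`, `%rdx` (named `a`, `b`, `out` here), and each output row is the sum of the rows of `a` scaled by the four elements of the matching row of `b`, so `out = b * a`. The name `matrix_multiply_ref` is hypothetical.

```c
#include <assert.h>

/*
 * Reference for the assembly routine: out = b * a for 4x4 row-major matrices.
 * Assumes out does not overlap a or b.
 */
void matrix_multiply_ref(const float *a, const float *b, float *out) {
    assert(out != a && out != b);
    /* Walk the rows from last to first, like the reversed loop on %rcx. */
    for (int row = 3; row >= 0; row--) {
        float acc[4];
        /* First term: b[row][0] broadcast and multiplied by row 0 of a. */
        for (int col = 0; col < 4; col++) {
            acc[col] = b[4 * row + 0] * a[4 * 0 + col];
        }
        /* Remaining terms: multiply by rows 1..3 of a and accumulate. */
        for (int k = 1; k < 4; k++) {
            for (int col = 0; col < 4; col++) {
                acc[col] += b[4 * row + k] * a[4 * k + col];
            }
        }
        /* Store the finished row, like the final movaps to (%rdx, %rcx). */
        for (int col = 0; col < 4; col++) {
            out[4 * row + col] = acc[col];
        }
    }
}
```

Comparing the output of the assembly routine against this function on a few inputs, including a pair of matrices that do not commute, catches both arithmetic slips and a swapped operand order.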

## Efficient multiplication

By : LabRat
Date : March 29 2020, 07:55 AM
Well, yes. There are more advanced multiplication methods.
A quick and easy way to speed up your algorithm a bit is to move from base 10 (one decimal digit per element) to a number system that is more appropriate for computers. Working with 32-bit or 64-bit integers as the digits of a power-of-two base will be much faster: you do more work per calculation and also get rid of all the modulo calculations.
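As a rough illustration of that idea, here is a sketch of schoolbook multiplication where each "digit" is a 32-bit word (base 2^32) and a 64-bit intermediate holds the product plus carry. The function name `bigmul` and the little-endian limb layout are assumptions made for this example; the original question's data layout is not shown.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Multiply two non-negative big integers stored as little-endian arrays of
 * 32-bit limbs. out must have room for na + nb limbs and must not overlap
 * a or b.
 */
void bigmul(const uint32_t *a, size_t na,
            const uint32_t *b, size_t nb,
            uint32_t *out) {
    assert(out != a && out != b);
    for (size_t i = 0; i < na + nb; i++) {
        out[i] = 0;
    }
    for (size_t i = 0; i < na; i++) {
        uint64_t carry = 0;
        for (size_t j = 0; j < nb; j++) {
            /* Cannot overflow: (2^32-1)^2 + 2*(2^32-1) = 2^64 - 1. */
            uint64_t t = (uint64_t)a[i] * b[j] + out[i + j] + carry;
            out[i + j] = (uint32_t)t;   /* low 32 bits: the new digit */
            carry = t >> 32;            /* high 32 bits: the carry    */
        }
        out[i + nb] = (uint32_t)carry;
    }
}
```

The truncating cast and the shift replace the `% 10` and `/ 10` steps of a decimal implementation, and each inner-loop iteration processes 32 bits of input instead of a single decimal digit.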