Efficient Assembly multiplication


By : Marcin Wasilewski
Date : September 24 2020, 03:00 AM
The interesting part of this exercise is finding ways to use one or two LEA, SHL, and/or ADD/SUB instructions to implement multiplies by various constants.
Actually dispatching on the fly for a single multiply isn't very interesting, and would mean either actual JIT compiling or that you have every possible sequence already present in a giant table of tiny blocks of code. (Like switch statements.)
code :
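mul15:              # gcc -O3 -mtune=bdver2
        mov     eax, edi
        sal     eax, 4              # x*16
        sub     eax, edi            # x*16 - x = x*15
        ret
mul15:             # clang -O3 -mtune=bdver2
        lea     eax, [rdi + 4*rdi]  # x*5
        lea     eax, [rax + 2*rax]  # x*5 + 2*(x*5) = x*15
        ret
                   # no label given for the next pair; as written it computes x*11
        lea     eax, [rdi + 4*rdi]  # x*5
        lea     eax, [rdi + 2*rax]  # x + 2*(x*5) = x*11
mul17:
        mov     eax, edi
        sal     eax, 4              # x*16
        add     eax, edi            # x*16 + x = x*17
        ret
mul17:
        lea    eax,  [rdi + 8*rdi]  # x*9
        lea    eax,  [rax + 8*rdi]  # x*9 + x*8 = x*17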
#define MULFUN(c) int mul##c(int x) { return x*c; }
MULFUN(9)
MULFUN(10)
MULFUN(11)
MULFUN(12)
...
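The same decompositions can be written in C as a rough sketch. The function names (`mul15_shift`, `mul15_lea`, `mul17_shift`, `mul17_lea`, `mul_const`) are made up for illustration, and unsigned arithmetic is used so that wraparound matches what the 32-bit register sequences do. The `mul_const` switch is only a toy version of the "table of tiny blocks" idea, not something a compiler would emit.

```c
#include <stdint.h>

/* x*15 as shift and subtract (the gcc sequence): 16x - x */
static inline uint32_t mul15_shift(uint32_t x) { return (x << 4) - x; }

/* x*15 as two LEA-style steps (the clang sequence): 5x, then 5x + 2*(5x) */
static inline uint32_t mul15_lea(uint32_t x)
{
    uint32_t t = x + 4 * x;   /* lea eax, [rdi + 4*rdi] */
    return t + 2 * t;         /* lea eax, [rax + 2*rax] */
}

/* x*17 as shift and add: 16x + x */
static inline uint32_t mul17_shift(uint32_t x) { return (x << 4) + x; }

/* x*17 as two LEA-style steps: 9x, then 9x + 8x */
static inline uint32_t mul17_lea(uint32_t x)
{
    uint32_t t = x + 8 * x;   /* lea eax, [rdi + 8*rdi] */
    return t + 8 * x;         /* lea eax, [rax + 8*rdi] */
}

/* Hypothetical dispatcher: one pre-written sequence per known constant,
   with a plain multiply as the fallback for everything else. */
static inline uint32_t mul_const(uint32_t x, unsigned c)
{
    switch (c) {
    case 15: return mul15_shift(x);
    case 17: return mul17_shift(x);
    default: return x * c;
    }
}
```

In practice the compiler picks between these forms itself based on the tuning target, so hand-writing them in C mostly serves to check the arithmetic.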



Suggestions needed on most efficient way to retrieve an assembly version from an external assembly


By : Burhan Abiç
Date : March 29 2020, 07:55 AM
Take a look at System.Reflection.AssemblyName. You can do:
code :
AssemblyName.GetAssemblyName("foo.dll").Version

Multiplication or if: what is more efficient?


By : kmanross
Date : March 29 2020, 07:55 AM
The answer is: you really don't need to worry about this at this juncture. If your code runs too slowly, then you can experiment with some optimization. However, "too slow" is subjective.
Premature Optimization Is The Root Of All Evil.

Assembly code/AVX instructions for multiplication of complex numbers. (GCC inline assembly)


By : Luke
Date : March 29 2020, 07:55 AM
As you surmise, the problem is that you haven’t told GCC which registers you are clobbering. I’d be surprised if they don’t yet support putting YMM registers in the clobber list; what version of GCC are you using?
In any event, it will almost certainly suffice to put the corresponding XMM registers in the clobber list instead:
code :
: "=m" (ret) : "m" (*a), "m" (*b) : "%xmm1", "%xmm2");
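For context, here is a minimal sketch of a complete extended-asm statement with that kind of clobber list. It is not the asker's AVX complex multiply: it is a plain SSE four-float addition, and the `vec4` type and `add4` function are hypothetical names chosen for the example. A portable C fallback is included for compilers or targets without x86-64 GCC-style inline assembly.

```c
/* Hypothetical 4-float vector type, used only for this example. */
typedef struct { float v[4]; } vec4;

/* Returns a + b element-wise. On x86-64 with GCC or Clang the work is
   done in inline assembly that uses xmm1 and xmm2 as scratch registers,
   so both are named in the clobber list. */
static inline vec4 add4(const vec4 *a, const vec4 *b)
{
    vec4 ret;
#if defined(__x86_64__) && defined(__GNUC__)
    __asm__ (
        "movups %1, %%xmm1\n\t"      /* load *a (unaligned load is safe here) */
        "movups %2, %%xmm2\n\t"      /* load *b */
        "addps  %%xmm2, %%xmm1\n\t"  /* xmm1 += xmm2 */
        "movups %%xmm1, %0"          /* store the sum to ret */
        : "=m" (ret)
        : "m" (*a), "m" (*b)
        : "xmm1", "xmm2");
#else
    /* Fallback for other targets: same result in plain C. */
    for (int i = 0; i < 4; i++)
        ret.v[i] = a->v[i] + b->v[i];
#endif
    return ret;
}
```

Without the clobber list the compiler is free to assume xmm1 and xmm2 still hold whatever it put there before the statement, which is the kind of silent corruption described above.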

Efficient 4x4 matrix multiplication (C vs assembly)


By : Scenario Dis
Date : March 29 2020, 07:55 AM
There is a way to accelerate the code and outplay the compiler. It does not involve any sophisticated pipeline analysis or deep code micro-optimisation (which doesn't mean that it couldn't further benefit from these). The optimisation uses three simple tricks, marked 1 to 3 in the code comments:
code :
    .text
    .align 32                           # 1. function entry alignment
    .globl matrixMultiplyASM            #    (for a faster call)
    .type matrixMultiplyASM, @function
matrixMultiplyASM:
    movaps   (%rdi), %xmm0
    movaps 16(%rdi), %xmm1
    movaps 32(%rdi), %xmm2
    movaps 48(%rdi), %xmm3
    movq $48, %rcx                      # 2. loop reversal
1:                                      #    (for simpler exit condition)
    movss (%rsi, %rcx), %xmm4           # 3. extended address operands
    shufps $0, %xmm4, %xmm4             #    (faster than pointer calculation)
    mulps %xmm0, %xmm4
    movaps %xmm4, %xmm5
    movss 4(%rsi, %rcx), %xmm4
    shufps $0, %xmm4, %xmm4
    mulps %xmm1, %xmm4
    addps %xmm4, %xmm5
    movss 8(%rsi, %rcx), %xmm4
    shufps $0, %xmm4, %xmm4
    mulps %xmm2, %xmm4
    addps %xmm4, %xmm5
    movss 12(%rsi, %rcx), %xmm4
    shufps $0, %xmm4, %xmm4
    mulps %xmm3, %xmm4
    addps %xmm4, %xmm5
    movaps %xmm5, (%rdx, %rcx)
    subq $16, %rcx                      # one 'sub' (vs 'add' & 'cmp')
    jge 1b                              # SF=OF: loop while %rcx >= 0
    ret
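As a reading aid, the routine above can be mirrored in plain C. This is a sketch derived from the assembly as listed, assuming row-major 4x4 float matrices and the System V argument order (first pointer in %rdi, second in %rsi, output in %rdx). The name `matmul4_rows` is invented here, and the input and output arrays are assumed not to overlap.

```c
/* Row i of out = sum over k of m2[i][k] * (row k of m1),
   which is what the movss/shufps/mulps/addps groups compute. */
static inline void matmul4_rows(const float *m1, const float *m2, float *out)
{
    for (int i = 3; i >= 0; i--) {            /* counts down like %rcx = 48, 32, 16, 0 */
        float row[4] = {0.0f, 0.0f, 0.0f, 0.0f};
        for (int k = 0; k < 4; k++) {
            float s = m2[4 * i + k];          /* the scalar that shufps broadcasts */
            for (int j = 0; j < 4; j++)
                row[j] += s * m1[4 * k + j];  /* mulps + addps against row k of m1 */
        }
        for (int j = 0; j < 4; j++)
            out[4 * i + j] = row[j];          /* movaps %xmm5, (%rdx, %rcx) */
    }
}
```

Comparing the output of this version against the assembly routine on a few matrices is a cheap way to confirm the operand order before benchmarking.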

Efficient multiplication


By : LabRat
Date : March 29 2020, 07:55 AM
Well, yes. There are more advanced multiplication methods.
A quick and easy way to speed up your algorithm a bit is to move from base 10 (one decimal digit per element) to a number system which is more appropriate for computers. Working with 32-bit or 64-bit binary integers as the digits will be much faster. You do more work per calculation and also get rid of all the modulo calculations.
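A minimal sketch of that idea, assuming the number is stored as an array of 32-bit "digits" (limbs) with the least significant limb first. The function name `bigmul` and this layout are assumptions for the example, not taken from the question. It is still the schoolbook method, only with base 2^32 instead of base 10, so the carry falls out of a shift instead of a division or modulo.

```c
#include <stddef.h>
#include <stdint.h>

/* Schoolbook multiplication in base 2^32.
   a has na limbs, b has nb limbs, out must have room for na + nb limbs.
   Limbs are little-endian: index 0 is the least significant. */
static inline void bigmul(const uint32_t *a, size_t na,
                          const uint32_t *b, size_t nb,
                          uint32_t *out)
{
    for (size_t i = 0; i < na + nb; i++)
        out[i] = 0;

    for (size_t i = 0; i < na; i++) {
        uint64_t carry = 0;
        for (size_t j = 0; j < nb; j++) {
            /* 32x32 -> 64-bit product plus two 32-bit terms cannot overflow 64 bits */
            uint64_t t = (uint64_t)a[i] * b[j] + out[i + j] + carry;
            out[i + j] = (uint32_t)t;   /* low 32 bits are the new digit */
            carry = t >> 32;            /* high 32 bits carry to the next digit */
        }
        out[i + nb] = (uint32_t)carry;
    }
}
```

Each inner step handles 32 bits of each operand at once, where a decimal-digit version handles fewer than 4 bits.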