coding techniques, hand optimization

Antmeopohedron (gseidman@speckle.ncsl.nist.gov)
Tue, 5 Dec 1995 11:45:41 -0500 (EST)


<sigh> This is getting ugly. Sorry about so much included text.

} > Say what? An if statement requires at a minimum a branch instruction and an
} > instruction for each of the assignments, plus an unconditional jump after
} > one of the assignments, four instructions, i.e.
} >
} > BRANCH on <boolean> to <true clause>
} > STORE <variable> value 0
} > JUMP to <end of true clause>
} > true clause:
} > STORE <variable> value 2
} > end of true clause:
} >
} > A multiply will by one instruction if the multiply instruction can do its
} > own memory storing, two if it can't:
} >
} > MULTIPLY <boolean> by 2 into <register or <variable>>
} > #and, if MULTIPLY does not do memory storage:
} > STORE <variable> from <register>
} >
} > In addition, execution speed is slowed by a branch instruction due to
} > today's pipelined RISC chips. There are advantages to the MULTIPLY over the
} > IF, *especially* if this code will be run (looped, most likely) over and
} > over. This is a hand optimization, not something the compiler can do. Do we
} > want efficient coding techniques available, or do we simply want to count
} > on unlimited computing power?
} >
} > <UNNECESSARY AND FRUSTRATED FLAME>
} > Take a class on it and try again!
} > </UNNECESSARY AND FRUSTRATED FLAME>
}
} Mmm. You should be more careful when you make statements like this. The
} branch will be faster than the multiply. No question at all.

I question that...

} Multiply > 4 cycles, probably, particularly if it involves FP conversion.

BZZZT! It's a boolean, i.e. integer. The destination variable could be a
floating point, though it is not specified as such in this example, but the
multiply would still be integer unless the constant we are multiplying by
is floating point.

} Branch < 3 cycles, with stalls, and maybe 1 with correct branch
} prediction (95% of cases). If you are talking a loop, then the branch
} will frequently be predicted right, with no branch penalty.

BZZZT! This is not a loop branch. This is a conditional branch based on a
boolean variable about which we can make no assumptions. Don't give speak
of branch prediction.

} Anyone who designs a modern CPU with a branch that is slower than a multiply
} should be shot. Most applications branch more than they multiply. Note I
} say "most", not all.

BZZZT! I did not say a branch was slower, I said that it reduces pipelining
speedup. A branch may have to clear half a pipeline (possibly with
rollback, depending on how things are done).

} Finally, if the compiler knows that the multiplication is by a constant,
} and it knows the latency of the multiply, it can replace the multiply
} by a sequence of shifts and adds. Don't laugh, gcc does this on an HP,
} compiling with -O. That would be faster than the branch, probably. Depends
} on register allocation and so on.

BZZ-, well, no, you're right. But you are arguing in my favor. This says
that the multiply will be even faster than using an actual multiply
instruction.

} Modern processors have such complex dynamic behaviour that statements of
} the type you made are no longer valid. There are too many factors that
} can bring you down in flames.
}
} Maybe you should take a class. Don't bother telling me to, I just did. ;-)

<NECESSARY AND FRUSTRATED FLAME>
Erm, 'scuse me? "...such complex dynamic behavior..."? "...too many
factors..."? How long ago did you take your class? Did you actually go to
class? Did you not study "modern processors," just the old VAX chip and
8088?
</NECESSARY AND FRUSTRATED FLAME>

} Steve.
--Greg


  • Next message: Omar Eljumaily: "Re: Displaying Copyright Notices"
  • Previous message: Antmeopohedron: "coding techniques, hand optimization"
  • In reply to: Stephen Chenney: "Re: Ambiguities in VRML-Syntax"
  • Next in thesad: Antmeopohedron: "Re: coding techniques, hand optimization"