With extended asm you can read and write C variables from assembler and perform jumps from assembler code to C labels. Extended asm syntax uses colons (‘:’) to delimit the operand parameters after the assembler template:

    asm [volatile] ( AssemblerTemplate
                     : OutputOperands
                     [ : InputOperands
                     [ : Clobbers ] ])

    asm [volatile] goto ( AssemblerTemplate
                          :
                          : InputOperands
                          : Clobbers
                          : GotoLabels)
The asm keyword is a GNU extension. When writing code that can be compiled with -ansi and the various -std options, use __asm__ instead of asm (see Alternate Keywords).
volatile
The typical use of extended asm statements is to manipulate input values to produce output values. However, your asm statements may also produce side effects. If so, you may need to use the volatile qualifier to disable certain optimizations. See Volatile.
goto
This qualifier informs the compiler that the asm statement may perform a jump to one of the labels listed in the GotoLabels. See GotoLabels.
GotoLabels
When you are using the goto form of asm, this section contains the list of all C labels to which the code in the AssemblerTemplate may jump. See GotoLabels.
asm statements may not perform jumps into other asm statements, only to the listed GotoLabels. GCC's optimizers do not know about other jumps; therefore they cannot take account of them when deciding how to optimize.
The total number of input + output + goto operands is limited to 30.
The asm
statement allows you to include assembly instructions directly
within C code. This may help you to maximize performance in time-sensitive
code or to access assembly instructions that are not readily available to C
programs.
Note that extended asm
statements must be inside a function. Only
basic asm
may be outside functions (see Basic Asm).
Functions declared with the naked
attribute also require basic
asm
(see Function Attributes).
While the uses of asm
are many and varied, it may help to think of an
asm
statement as a series of low-level instructions that convert input
parameters to output parameters. So a simple (if not particularly useful)
example for i386 using asm
might look like this:
    int src = 1;
    int dst;

    asm ("mov %1, %0\n\t"
         "add $1, %0"
         : "=r" (dst)
         : "r" (src));

    printf("%d\n", dst);

This code copies src to dst and adds 1 to dst.
GCC's optimizers sometimes discard asm
statements if they determine
there is no need for the output variables. Also, the optimizers may move
code out of loops if they believe that the code will always return the same
result (i.e. none of its input values change between calls). Using the
volatile
qualifier disables these optimizations. asm
statements
that have no output operands, including asm goto
statements,
are implicitly volatile.
This i386 code demonstrates a case that does not use (or require) the
volatile
qualifier. If it is performing assertion checking, this code
uses asm
to perform the validation. Otherwise, dwRes
is
unreferenced by any code. As a result, the optimizers can discard the
asm
statement, which in turn removes the need for the entire
DoCheck
routine. By omitting the volatile
qualifier when it
isn't needed you allow the optimizers to produce the most efficient code
possible.
    void DoCheck(uint32_t dwSomeValue)
    {
       uint32_t dwRes;

       // Assumes dwSomeValue is not zero.
       asm ("bsfl %1,%0"
         : "=r" (dwRes)
         : "r" (dwSomeValue)
         : "cc");

       assert(dwRes > 3);
    }
The next example shows a case where the optimizers can recognize that the input
(dwSomeValue
) never changes during the execution of the function and can
therefore move the asm
outside the loop to produce more efficient code.
Again, using volatile
disables this type of optimization.
    void do_print(uint32_t dwSomeValue)
    {
       uint32_t dwRes;

       for (uint32_t x=0; x < 5; x++)
       {
          // Assumes dwSomeValue is not zero.
          asm ("bsfl %1,%0"
            : "=r" (dwRes)
            : "r" (dwSomeValue)
            : "cc");

          printf("%u: %u %u\n", x, dwSomeValue, dwRes);
       }
    }
The following example demonstrates a case where you need to use the
volatile
qualifier.
It uses the x86 rdtsc
instruction, which reads
the computer's time-stamp counter. Without the volatile
qualifier,
the optimizers might assume that the asm
block will always return the
same value and therefore optimize away the second call.
    uint64_t msr;

    asm volatile ( "rdtsc\n\t"    // Returns the time in EDX:EAX.
            "shl $32, %%rdx\n\t"  // Shift the upper bits left.
            "or %%rdx, %0"        // 'Or' in the lower bits.
            : "=a" (msr)
            :
            : "rdx");

    // The cast keeps the %llx conversion correct whatever uint64_t is.
    printf("msr: %llx\n", (unsigned long long) msr);

    // Do other work...

    // Reprint the timestamp
    asm volatile ( "rdtsc\n\t"    // Returns the time in EDX:EAX.
            "shl $32, %%rdx\n\t"  // Shift the upper bits left.
            "or %%rdx, %0"        // 'Or' in the lower bits.
            : "=a" (msr)
            :
            : "rdx");

    printf("msr: %llx\n", (unsigned long long) msr);
GCC's optimizers do not treat this code like the non-volatile code in the earlier examples. They do not move it out of loops or omit it on the assumption that the result from a previous call is still valid.
Note that the compiler can move even volatile asm
instructions relative
to other code, including across jump instructions. For example, on many
targets there is a system register that controls the rounding mode of
floating-point operations. Setting it with a volatile asm
, as in the
following PowerPC example, does not work reliably.
    asm volatile("mtfsf 255, %0" : : "f" (fpenv));
    sum = x + y;
The compiler may move the addition back before the volatile asm
. To
make it work as expected, add an artificial dependency to the asm
by
referencing a variable in the subsequent code, for example:
    asm volatile ("mtfsf 255,%1" : "=X" (sum) : "f" (fpenv));
    sum = x + y;
Under certain circumstances, GCC may duplicate (or remove duplicates of) your assembly code when optimizing. This can lead to unexpected duplicate symbol errors during compilation if your asm code defines symbols or labels. Using ‘%=’ (see AssemblerTemplate) may help resolve this problem.
An assembler template is a literal string containing assembler instructions. The compiler replaces tokens in the template that refer to inputs, outputs, and goto labels, and then outputs the resulting string to the assembler. The string can contain any instructions recognized by the assembler, including directives. GCC does not parse the assembler instructions themselves and does not know what they mean or even whether they are valid assembler input. However, it does count the statements (see Size of an asm).
You may place multiple assembler instructions together in a single asm
string, separated by the characters normally used in assembly code for the
system. A combination that works in most places is a newline to break the
line, plus a tab character to move to the instruction field (written as
‘\n\t’).
Some assemblers allow semicolons as a line separator. However, note
that some assembler dialects use semicolons to start a comment.
Do not expect a sequence of asm
statements to remain perfectly
consecutive after compilation, even when you are using the volatile
qualifier. If certain instructions need to remain consecutive in the output,
put them in a single multi-instruction asm statement.
Accessing data from C programs without using input/output operands (such as by using global symbols directly from the assembler template) may not work as expected. Similarly, calling functions directly from an assembler template requires a detailed understanding of the target assembler and ABI.
Since GCC does not parse the assembler template, it has no visibility of any symbols it references. This may result in GCC discarding those symbols as unreferenced unless they are also listed as input, output, or goto operands.
In addition to the tokens described by the input, output, and goto operands, these tokens have special meanings in the assembler template:
‘%=’
Outputs a number that is unique to each instance of the asm statement in the entire compilation. This option is useful when creating local labels and referring to them multiple times in a single template that generates multiple assembler instructions.
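As a rough illustration of ‘%=’, the sketch below builds a small loop out of a local label whose name is made unique per asm instance. The function name count_iterations and the label prefix .Lloop are assumptions chosen for this example, and the asm path is only compiled on x86 targets using the default ‘att’ dialect; other targets fall back to plain C.

```c
#include <assert.h>

/* Hypothetical helper: runs a countdown loop inside the asm and returns
   how many times the loop body executed (which equals n). */
unsigned count_iterations(unsigned n)
{
    unsigned iterations = 0;

    assert(n != 0);  /* the loop body always executes at least once */

#if defined(__i386__) || defined(__x86_64__)
    /* %= expands to a number unique to this asm instance, so the label
       stays unique even if the compiler duplicates or inlines this code. */
    __asm__ (".Lloop%=:\n\t"
             "add $1, %[it]\n\t"
             "sub $1, %[n]\n\t"
             "jnz .Lloop%="
             : [it] "+r" (iterations), [n] "+r" (n)
             : /* No inputs. */
             : "cc");
#else
    /* Portable fallback with the same result. */
    do { iterations++; } while (--n != 0);
#endif
    return iterations;
}
```

Both uses of ‘%=’ within one template expand to the same number, which is what lets the jump find its label.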
Multiple assembler dialects in asm templates

On targets such as x86, GCC supports multiple assembler dialects. The -masm option controls which dialect GCC uses as its default for inline assembler. The target-specific documentation for the -masm option contains the list of supported dialects, as well as the default dialect if the option is not specified. This information may be important to understand, since assembler code that works correctly when compiled using one dialect will likely fail if compiled using another. See x86 Options.
If your code needs to support multiple assembler dialects (for example, if you are writing public headers that need to support a variety of compilation options), use constructs of this form:
{ dialect0 | dialect1 | dialect2... }
This construct outputs dialect0
when using dialect #0 to compile the code,
dialect1
for dialect #1, etc. If there are fewer alternatives within the
braces than the number of dialects the compiler supports, the construct
outputs nothing.
For example, if an x86 compiler supports two dialects (‘att’, ‘intel’), an assembler template such as this:
"bt{l %[Offset],%[Base] | %[Base],%[Offset]}; jc %l2"
is equivalent to one of

    "btl %[Offset],%[Base] ; jc %l2"   /* att dialect */
    "bt %[Base],%[Offset]; jc %l2"     /* intel dialect */
Using that same compiler, this code:
"xchg{l}\t{%%}ebx, %1"
corresponds to either

    "xchgl\t%%ebx, %1"                 /* att dialect */
    "xchg\tebx, %1"                    /* intel dialect */
There is no support for nesting dialect alternatives.
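A minimal sketch of the dialect construct in a complete function might look like the following. The function name add_u32 is hypothetical, and the example assumes an x86 compiler with the two dialects described above; on other targets it simply falls back to C.

```c
#include <stdint.h>

/* Hypothetical helper: returns a + b using one add instruction whose
   operand order depends on the assembler dialect in use. */
uint32_t add_u32(uint32_t a, uint32_t b)
{
#if defined(__i386__) || defined(__x86_64__)
    /* att:   addl <b>, <a>     (source first)
       intel: add  <a>, <b>     (destination first) */
    __asm__ ("add{l %[b], %[a]| %[a], %[b]}"
             : [a] "+r" (a)
             : [b] "r" (b)
             : "cc");
#else
    a += b;  /* portable fallback */
#endif
    return a;
}
```

The text outside the braces is shared by both dialects, so only the parts that actually differ need to be written twice.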
An asm
statement has zero or more output operands indicating the names
of C variables modified by the assembler code.
In this i386 example, old (referred to in the template string as %0) and *Base (as %1) are outputs and Offset (%2) is an input:

    bool old;

    __asm__ ("btsl %2,%1\n\t" // Turn on zero-based bit #Offset in Base.
             "sbb %0,%0"      // Use the CF to calculate old.
       : "=r" (old), "+rm" (*Base)
       : "Ir" (Offset)
       : "cc");

    return old;
Operands are separated by commas. Each operand has this format:
[ [asmSymbolicName] ] constraint (cvariablename)
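asmSymbolicName
Specifies a symbolic name for the operand. Reference the name in the assembler template by enclosing it in square brackets (for example, ‘%[Value]’). The scope of the name is the asm statement that contains the definition. Any valid C variable name is acceptable, including names already defined in the surrounding code. No two operands within the same asm statement can use the same symbolic name.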
When not using an asmSymbolicName, use the (zero-based) position
of the operand
in the list of operands in the assembler template. For example if there are
three output operands, use ‘%0’ in the template to refer to the first,
‘%1’ for the second, and ‘%2’ for the third.
Output constraints must begin with either ‘=’ (a variable overwriting an
existing value) or ‘+’ (when reading and writing). When using
‘=’, do not assume the location contains the existing value
on entry to the asm
, except
when the operand is tied to an input; see Input Operands.
After the prefix, there must be one or more additional constraints
(see Constraints) that describe where the value resides. Common
constraints include ‘r’ for register and ‘m’ for memory.
When you list more than one possible location (for example, "=rm"
),
the compiler chooses the most efficient one based on the current context.
If you list as many alternates as the asm
statement allows, you permit
the optimizers to produce the best possible code.
If you must use a specific register, but your Machine Constraints do not
provide sufficient control to select the specific register you want,
local register variables may provide a solution (see Local Reg Vars).
When the compiler selects the registers to use to represent the output operands, it does not use any of the clobbered registers (see Clobbers).
Output operand expressions must be lvalues. The compiler cannot check whether
the operands have data types that are reasonable for the instruction being
executed. For output expressions that are not directly addressable (for
example a bit-field), the constraint must allow a register. In that case, GCC
uses the register as the output of the asm
, and then stores that
register into the output.
Operands using the ‘+’ constraint modifier count as two operands
(that is, both as input and output) towards the total maximum of 30 operands
per asm
statement.
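As a small illustration of a read-write (‘+’) operand, the sketch below reverses the bytes of a 32-bit value in place. The function name byte_swap32 is an assumption for this example, and the asm path is restricted to x86 targets, with a plain C fallback elsewhere.

```c
#include <stdint.h>

/* Hypothetical helper: reverses the byte order of v. */
uint32_t byte_swap32(uint32_t v)
{
#if defined(__i386__) || defined(__x86_64__)
    /* "+r": the same register is read as the input and then overwritten
       with the result. bswap does not change the flags, so no "cc". */
    __asm__ ("bswap %0" : "+r" (v));
#else
    v = (v >> 24) | ((v >> 8) & 0x0000ff00u)
      | ((v << 8) & 0x00ff0000u) | (v << 24);
#endif
    return v;
}
```

Although only one operand is written in the source, it counts as two towards the operand limit described above.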
Use the ‘&’ constraint modifier (see Modifiers) on all output operands that must not overlap an input. Otherwise, GCC may allocate the output operand in the same register as an unrelated input operand, on the assumption that the assembler code consumes its inputs before producing outputs. This assumption may be false if the assembler code actually consists of more than one instruction.
The same problem can occur if one output parameter (a) allows a register constraint and another output parameter (b) allows a memory constraint. The code generated by GCC to access the memory address in b can contain registers which might be shared by a, and GCC considers those registers to be inputs to the asm. As above, GCC assumes that such input registers are consumed before any outputs are written. This assumption may result in incorrect behavior if the asm writes to a before using b. Combining the ‘&’ modifier with the register constraint on a ensures that modifying a does not affect the address referenced by b. Otherwise, the location of b is undefined if a is modified before using b.
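The following sketch shows a multi-instruction template where the ‘&’ modifier matters: the output is written by the first instruction while an input is still needed by later ones. The function name a_plus_2b is hypothetical, and the asm path assumes an x86 target using the ‘att’ dialect.

```c
#include <stdint.h>

/* Hypothetical helper: computes a + 2*b with three instructions. */
uint32_t a_plus_2b(uint32_t a, uint32_t b)
{
    uint32_t out;

#if defined(__i386__) || defined(__x86_64__)
    /* out is written by the first instruction, but b is still read by
       the second and third. Without '&', GCC would be free to place
       out and b in the same register, which would corrupt b. */
    __asm__ ("mov %[a], %[out]\n\t"
             "add %[b], %[out]\n\t"
             "add %[b], %[out]"
             : [out] "=&r" (out)
             : [a] "r" (a), [b] "r" (b)
             : "cc");
#else
    out = a + 2 * b;  /* portable fallback */
#endif
    return out;
}
```

With a plain "=r" constraint the code might still appear to work, depending on register allocation, which is what makes a missing early-clobber hard to notice.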
asm
supports operand modifiers on operands (for example ‘%k2’
instead of simply ‘%2’). Typically these qualifiers are hardware
dependent. The list of supported modifiers for x86 is found at
x86 Operand modifiers.
If the C code that follows the asm
makes no use of any of the output
operands, use volatile
for the asm
statement to prevent the
optimizers from discarding the asm
statement as unneeded
(see Volatile).
This code makes no use of the optional asmSymbolicName. Therefore it references the first output operand as %0 (were there a second, it would be %1, etc). The number of the first input operand is one greater than that of the last output operand. In this i386 example, that makes Mask referenced as %1:

    uint32_t Mask = 1234;
    uint32_t Index;

    asm ("bsfl %1, %0"
       : "=r" (Index)
       : "r" (Mask)
       : "cc");
That code overwrites the variable Index
(‘=’),
placing the value in a register (‘r’).
Using the generic ‘r’ constraint instead of a constraint for a specific
register allows the compiler to pick the register to use, which can result
in more efficient code. This may not be possible if an assembler instruction
requires a specific register.
The following i386 example uses the asmSymbolicName syntax. It produces the same result as the code above, but some may consider it more readable or more maintainable since reordering index numbers is not necessary when adding or removing operands. The names aIndex and aMask are only used in this example to emphasize which names get used where. It is acceptable to reuse the names Index and Mask.

    uint32_t Mask = 1234;
    uint32_t Index;

    asm ("bsfl %[aMask], %[aIndex]"
       : [aIndex] "=r" (Index)
       : [aMask] "r" (Mask)
       : "cc");
Here are some more examples of output operands.
    uint32_t c = 1;
    uint32_t d;
    uint32_t *e = &c;

    asm ("mov %[e], %[d]"
       : [d] "=rm" (d)
       : [e] "rm" (*e));

Here, d may either be in a register or in memory. Since the compiler might already have the current value of the uint32_t location pointed to by e in a register, you can enable it to choose the best location for d by specifying both constraints.
Input operands make values from C variables and expressions available to the assembly code.
Operands are separated by commas. Each operand has this format:
[ [asmSymbolicName] ] constraint (cexpression)
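asmSymbolicName
Specifies a symbolic name for the operand. Reference the name in the assembler template by enclosing it in square brackets (for example, ‘%[Value]’). The scope of the name is the asm statement that contains the definition. Any valid C variable name is acceptable, including names already defined in the surrounding code. No two operands within the same asm statement can use the same symbolic name.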
When not using an asmSymbolicName, use the (zero-based) position
of the operand
in the list of operands in the assembler template. For example if there are
two output operands and three inputs,
use ‘%2’ in the template to refer to the first input operand,
‘%3’ for the second, and ‘%4’ for the third.
Input constraint strings may not begin with either ‘=’ or ‘+’. When you list more than one possible location (for example, ‘"irm"’), the compiler chooses the most efficient one based on the current context. If you must use a specific register, but your Machine Constraints do not provide sufficient control to select the specific register you want, local register variables may provide a solution (see Local Reg Vars).
Input constraints can also be digits (for example, "0"
). This indicates
that the specified input must be in the same place as the output constraint
at the (zero-based) index in the output constraint list.
When using asmSymbolicName syntax for the output operands,
you may use these names (enclosed in brackets ‘[]’) instead of digits.
cexpression
This is the C variable or expression being passed to the asm statement as input. The enclosing parentheses are a required part of the syntax.
When the compiler selects the registers to use to represent the input operands, it does not use any of the clobbered registers (see Clobbers).
If there are no output operands but there are input operands, place two consecutive colons where the output operands would go:
__asm__ ("some instructions" : /* No outputs. */ : "r" (Offset / 8));
Warning: Do not modify the contents of input-only operands
(except for inputs tied to outputs). The compiler assumes that on exit from
the asm
statement these operands contain the same values as they
had before executing the statement.
It is not possible to use clobbers
to inform the compiler that the values in these inputs are changing. One
common work-around is to tie the changing input variable to an output variable
that never gets used. Note, however, that if the code that follows the
asm
statement makes no use of any of the output operands, the GCC
optimizers may discard the asm
statement as unneeded
(see Volatile).
asm
supports operand modifiers on operands (for example ‘%k2’
instead of simply ‘%2’). Typically these qualifiers are hardware
dependent. The list of supported modifiers for x86 is found at
x86 Operand modifiers.
In this example using the fictitious combine
instruction, the
constraint "0"
for input operand 1 says that it must occupy the same
location as output operand 0. Only input operands may use numbers in
constraints, and they must each refer to an output operand. Only a number (or
the symbolic assembler name) in the constraint can guarantee that one operand
is in the same place as another. The mere fact that foo
is the value of
both operands is not enough to guarantee that they are in the same place in
the generated assembler code.
asm ("combine %2, %0" : "=r" (foo) : "0" (foo), "g" (bar));
Here is an example using symbolic names.
asm ("cmoveq %1, %2, %[result]" : [result] "=r"(result) : "r" (test), "r" (new), "[result]" (old));
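Since the combine instruction above is fictitious, a runnable sketch of a matching constraint may help. It assumes an x86 target and the ‘att’ dialect, and the function name sub_tied is hypothetical; other targets use a C fallback.

```c
#include <stdint.h>

/* Hypothetical helper: returns a - b using a two-operand subtract. */
uint32_t sub_tied(uint32_t a, uint32_t b)
{
    uint32_t result;

#if defined(__i386__) || defined(__x86_64__)
    /* "0" ties the input a to output operand 0, so the register holding
       result already contains a when the instruction executes. */
    __asm__ ("sub %2, %0"
             : "=r" (result)
             : "0" (a), "r" (b)
             : "cc");
#else
    result = a - b;  /* portable fallback */
#endif
    return result;
}
```

This fits instructions that read and overwrite the same operand while letting the C code keep separate input and output variables.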
While the compiler is aware of changes to entries listed in the output
operands, the inline asm
code may modify more than just the outputs. For
example, calculations may require additional registers, or the processor may
overwrite a register as a side effect of a particular assembler instruction.
In order to inform the compiler of these changes, list them in the clobber
list. Clobber list items are either register names or the special clobbers
(listed below). Each clobber list item is a string constant
enclosed in double quotes and separated by commas.
Clobber descriptions may not in any way overlap with an input or output
operand. For example, you may not have an operand describing a register class
with one member when listing that register in the clobber list. Variables
declared to live in specific registers (see Explicit Reg Vars) and used
as asm
input or output operands must have no part mentioned in the
clobber description. In particular, there is no way to specify that input
operands get modified without also specifying them as output operands.
When the compiler selects which registers to use to represent input and output operands, it does not use any of the clobbered registers. As a result, clobbered registers are available for any use in the assembler code.
Here is a realistic example for the VAX showing the use of clobbered registers:
asm volatile ("movc3 %0, %1, %2" : /* No outputs. */ : "g" (from), "g" (to), "g" (count) : "r0", "r1", "r2", "r3", "r4", "r5");
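For readers without a VAX at hand, here is a smaller sketch of the same idea for x86, assuming the ‘att’ dialect. The template uses ecx as a scratch register and lists it as a clobber, so the compiler keeps both operands out of it. The function name shift_left is hypothetical.

```c
#include <stdint.h>

/* Hypothetical helper: shifts value left by count (taken modulo 32). */
uint32_t shift_left(uint32_t value, uint32_t count)
{
#if defined(__i386__) || defined(__x86_64__)
    /* The variable-count shl instruction takes its count from cl, so the
       template copies the count into ecx and declares ecx clobbered. */
    __asm__ ("mov %[n], %%ecx\n\t"
             "shl %%cl, %[v]"
             : [v] "+r" (value)
             : [n] "r" (count)
             : "ecx", "cc");
#else
    value <<= (count & 31);  /* portable fallback, same masking as x86 */
#endif
    return value;
}
```

In real code the "c" constraint would let GCC load the count directly; the explicit copy here exists only to show a register clobber.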
Also, there are two special clobber arguments:
"cc"
The "cc" clobber indicates that the assembler code modifies the flags register. On some machines, GCC represents the condition codes as a specific hardware register; "cc" serves to name this register. On other machines, condition code handling is different, and specifying "cc" has no effect. But it is valid no matter what the target.
"memory"
The "memory" clobber tells the compiler that the assembly code performs memory reads or writes to items other than those listed in the input and output operands (for example, accessing the memory pointed to by one of the input parameters). To ensure memory contains correct values, GCC may need to flush specific register values to memory before executing the asm. Further, the compiler does not assume that any values read from memory before an asm remain unchanged after that asm; it reloads them as needed.
Using the "memory"
clobber effectively forms a read/write
memory barrier for the compiler.
Note that this clobber does not prevent the processor from doing
speculative reads past the asm
statement. To prevent that, you need
processor-specific fence instructions.
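A common use of this clobber is an empty asm that acts purely as a compiler-level barrier. The sketch below is illustrative only: the names compiler_barrier, publish, payload, and ready are assumptions, and, as noted above, the barrier constrains the compiler but not the processor.

```c
/* Hypothetical shared state for this sketch. */
int payload;
int ready;

/* An empty template with a "memory" clobber emits no instructions, but
   the compiler may not move memory accesses across it or keep memory
   values cached in registers over it. */
void compiler_barrier(void)
{
    __asm__ __volatile__ ("" : : : "memory");
}

/* Stores the payload, then sets the flag. The barrier keeps the compiler
   from reordering the two stores; it does not order them in hardware. */
void publish(int value)
{
    payload = value;
    compiler_barrier();
    ready = 1;
}
```

Code that really shares data between threads would normally also need atomic operations or processor fence instructions.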
Flushing registers to memory has performance implications and may be an issue for time-sensitive code. You can use a trick to avoid this if the size of the memory being accessed is known at compile time. For example, if accessing ten bytes of a string, use a memory input like:

    "m"( ({ struct { char x[10]; } *p = (void *)ptr ; *p; }) )
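The sketch below applies the same idea with a named struct and a cast instead of a statement expression. It assumes an x86 target and the ‘att’ dialect; the type ten_bytes and the function last_of_ten are hypothetical names. The extra "m" operand is never referenced in the template; it only tells the compiler which ten bytes the asm may read, so no "memory" clobber is needed.

```c
/* Hypothetical type that describes the ten-byte region being read. */
struct ten_bytes { char x[10]; };

/* Hypothetical helper: returns the last byte of a ten-byte buffer. */
unsigned last_of_ten(const char *ptr)
{
    unsigned out;

#if defined(__i386__) || defined(__x86_64__)
    /* The load goes through the pointer held in a register; the unnamed
       "m" input declares the memory that the load depends on. */
    __asm__ ("movzbl 9(%[p]), %[out]"
             : [out] "=r" (out)
             : [p] "r" (ptr), "m" (*(const struct ten_bytes *) ptr));
#else
    out = (unsigned char) ptr[9];  /* portable fallback */
#endif
    return out;
}
```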
asm goto
allows assembly code to jump to one or more C labels. The
GotoLabels section in an asm goto
statement contains
a comma-separated
list of all C labels to which the assembler code may jump. GCC assumes that
asm
execution falls through to the next statement (if this is not the
case, consider using the __builtin_unreachable
intrinsic after the
asm
statement). Optimization of asm goto
may be improved by
using the hot
and cold
label attributes (see Label Attributes).
An asm goto
statement cannot have outputs.
This is due to an internal restriction of
the compiler: control transfer instructions cannot have outputs.
If the assembler code does modify anything, use the "memory"
clobber
to force the
optimizers to flush all register values to memory and reload them if
necessary after the asm
statement.
Also note that an asm goto
statement is always implicitly
considered volatile.
To reference a label in the assembler template,
prefix it with ‘%l’ (lowercase ‘L’) followed
by its (zero-based) position in GotoLabels plus the number of input
operands. For example, if the asm has three inputs and references two labels, refer to the first label as ‘%l3’ and the second as ‘%l4’.
Alternately, you can reference labels using the actual C label name enclosed
in brackets. For example, to reference a label named carry
, you can
use ‘%l[carry]’. The label must still be listed in the GotoLabels
section when using this approach.
Here is an example of asm goto
for i386:
    asm goto (
        "btl %1, %0\n\t"
        "jc %l2"
        : /* No outputs. */
        : "r" (p1), "r" (p2)
        : "cc"
        : carry);

    return 0;

    carry:
    return 1;
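The fragment above can be wrapped in a complete function. The following sketch does so using the symbolic ‘%l[name]’ form for the label; it assumes an x86 target, the ‘att’ dialect, and a compiler that supports asm goto. The function name bit_is_set and the label name is_set are hypothetical.

```c
#include <stdint.h>

/* Hypothetical helper: returns 1 if the given bit (taken modulo 32)
   is set in value, and 0 otherwise. */
int bit_is_set(uint32_t value, uint32_t bit)
{
#if defined(__i386__) || defined(__x86_64__)
    /* bt copies the selected bit into the carry flag; jc then jumps to
       the C label listed in the GotoLabels section. */
    __asm__ goto ("bt %[bit], %[value]\n\t"
                  "jc %l[is_set]"
                  : /* No outputs. */
                  : [value] "r" (value), [bit] "r" (bit)
                  : "cc"
                  : is_set);
    return 0;  /* fall-through path: bit was clear */

is_set:
    return 1;
#else
    return (int) ((value >> (bit & 31)) & 1u);  /* portable fallback */
#endif
}
```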
The following example shows an asm goto
that uses a memory clobber.
    int frob(int x)
    {
      int y;
      asm goto ("frob %%r5, %1; jc %l[error]; mov (%2), %%r5"
                : /* No outputs. */
                : "r"(x), "r"(&y)
                : "r5", "memory"
                : error);
      return y;
    error:
      return -1;
    }
References to input, output, and goto operands in the assembler template
of extended asm
statements can use
modifiers to affect the way the operands are formatted in
the code output to the assembler. For example, the
following code uses the ‘h’ and ‘b’ modifiers for x86:
uint16_t num; asm volatile ("xchg %h0, %b0" : "+a" (num) );
These modifiers generate this assembler code:
xchg %ah, %al
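Wrapped in a function, the same template gives a 16-bit byte swap. This is only a sketch for x86 targets; the function name swap_bytes16 is hypothetical, and other targets use a C fallback.

```c
#include <stdint.h>

/* Hypothetical helper: exchanges the two bytes of num. */
uint16_t swap_bytes16(uint16_t num)
{
#if defined(__i386__) || defined(__x86_64__)
    /* With the "a" constraint, %h0 prints the high-byte register (ah)
       and %b0 the low-byte register (al) of operand 0. */
    __asm__ ("xchg %h0, %b0" : "+a" (num));
#else
    num = (uint16_t) ((num << 8) | (num >> 8));  /* portable fallback */
#endif
    return num;
}
```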
The rest of this discussion uses the following code for illustrative purposes.
    int main()
    {
       int iInt = 1;

    top:

       asm volatile goto ("some assembler instructions here"
       : /* No outputs. */
       : "q" (iInt), "X" (sizeof(unsigned char) + 1)
       : /* No clobbers. */
       : top);
    }
With no modifiers, this is what the output from the operands would be for the ‘att’ and ‘intel’ dialects of assembler:
| Operand | masm=att | masm=intel |
|---|---|---|
| %0 | %eax | eax |
| %1 | $2 | 2 |
| %2 | $.L2 | OFFSET FLAT:.L2 |
The table below shows the list of supported modifiers and their effects.
| Modifier | Description | Operand | masm=att | masm=intel |
|---|---|---|---|---|
| z | Print the opcode suffix for the size of the current integer operand (one of b/w/l/q). | %z0 | l | |
| b | Print the QImode name of the register. | %b0 | %al | al |
| h | Print the QImode name for a “high” register. | %h0 | %ah | ah |
| w | Print the HImode name of the register. | %w0 | %ax | ax |
| k | Print the SImode name of the register. | %k0 | %eax | eax |
| q | Print the DImode name of the register. | %q0 | %rax | rax |
| l | Print the label name with no punctuation. | %l2 | .L2 | .L2 |
| c | Require a constant operand and print the constant expression with no punctuation. | %c1 | 2 | 2 |
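As a rough sketch of how the ‘b’ and ‘w’ modifiers can be used, the two functions below zero-extend the low 8 and low 16 bits of a 32-bit value. They assume an x86 target and the ‘att’ dialect; the names low_byte and low_word are hypothetical.

```c
#include <stdint.h>

/* Hypothetical helper: returns the low 8 bits of in, zero-extended. */
uint32_t low_byte(uint32_t in)
{
    uint32_t out;
#if defined(__i386__) || defined(__x86_64__)
    /* %b1 prints the byte-sized name of the register holding in. The "q"
       constraint keeps in in a register that has such a name. */
    __asm__ ("movzbl %b1, %0" : "=r" (out) : "q" (in));
#else
    out = in & 0xffu;  /* portable fallback */
#endif
    return out;
}

/* Hypothetical helper: returns the low 16 bits of in, zero-extended. */
uint32_t low_word(uint32_t in)
{
    uint32_t out;
#if defined(__i386__) || defined(__x86_64__)
    /* %w1 prints the 16-bit name of the register holding in. */
    __asm__ ("movzwl %w1, %0" : "=r" (out) : "r" (in));
#else
    out = in & 0xffffu;  /* portable fallback */
#endif
    return out;
}
```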
x86 floating-point asm operands

On x86 targets, there are several rules on the usage of stack-like registers in the operands of an asm. These rules apply only to the operands that are stack-like registers:
Given a set of input registers that die in an asm, it is necessary to know which are implicitly popped by the asm, and which must be explicitly popped by GCC.
An input register that is implicitly popped by the asm
must be
explicitly clobbered, unless it is constrained to match an
output operand.
For any input register that is implicitly popped by an asm, it is necessary to know how to adjust the stack to compensate for the pop.
If any non-popped input is closer to the top of the reg-stack than
the implicitly popped register, it would not be possible to know what the
stack looked like—it's not clear how the rest of the stack “slides
up”.
All implicitly popped input registers must be closer to the top of the reg-stack than any input that is not implicitly popped.
It is possible that if an input dies in an asm
, the compiler might
use the input register for an output reload. Consider this example:
asm ("foo" : "=t" (a) : "f" (b));
This code says that input b
is not popped by the asm
, and that
the asm
pushes a result onto the reg-stack, i.e., the stack is one
deeper after the asm
than it was before. But, it is possible that
reload may think that it can use the same register for both the input and
the output.
To prevent this from happening, if any input operand uses the ‘f’ constraint, all output register constraints must use the ‘&’ early-clobber modifier.
The example above is correctly written as:
asm ("foo" : "=&t" (a) : "f" (b));
Output operands must specifically indicate which register an output
appears in after an asm
. ‘=f’ is not allowed: the operand
constraints must select a class with a single register.
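Output operands may not be “inserted” between existing stack registers. Since no 387 opcode uses a read/write operand, all output operands are dead before the asm, and are pushed by the asm.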
It makes no sense to push anywhere but the top of the reg-stack.
Output operands must start at the top of the reg-stack: output operands may not “skip” a register.
Some asm statements may need extra stack space for internal calculations. This can be guaranteed by clobbering stack registers unrelated to the inputs and outputs.
This asm
takes one input, which is internally popped, and produces two outputs.
asm ("fsincos" : "=t" (cos), "=u" (sin) : "0" (inp));
This asm
takes two inputs, which are popped by the fyl2xp1
opcode,
and replaces them with one output. The st(1)
clobber is necessary
for the compiler to know that fyl2xp1
pops both inputs.
asm ("fyl2xp1" : "=t" (result) : "0" (x), "u" (y) : "st(1)");