Duplicate Symbol? What?
Ever run into this one?
duplicate symbol _name in:
/var/folders/6x/vqbtyyvd5r136lb7hpb0qvmw0000gp/T/1-siEqxi.o
/var/folders/6x/vqbtyyvd5r136lb7hpb0qvmw0000gp/T/2-bMFgvM.o
ld: 1 duplicate symbol for architecture x86_64
clang: error: linker command failed with exit code 1 (use -v to see invocation)
make: *** [all] Error 1
It's a bitch. Lemme explain what's up.
# A contrived example
Let's say you have the following files:
main.c:
#include "name.h"
int main(int argc, char* argv[]) {
    say_name();
    return 0;
}

name.h:
#ifndef NAME_H
#define NAME_H
char* name = "Heisenberg";
void say_name();
#endif

name.c:
#include <stdio.h>
#include "name.h"
void say_name() {
    printf("%s\n", name);
}

Makefile:
all:
	cc main.c name.c -o say_my_name
When you run make, you get the error we spoke about in the intro to this article. Yours may differ slightly if you're not using clang and not on a Mac, but the premise will be the same: the linker thinks you're defining a variable called name twice.
# Isn't this what the include guard is meant to prevent?
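Not quite... It's tempting to think that if we wrap an #ifndef preprocessor directive around our headers they'll only be included once in our program, but that idea stems from a misunderstanding of what headers are for and how our programs go from C into machine code. An include guard only stops a header from being pasted into the same C file twice. Every C file is preprocessed and compiled on its own, so each file that includes the header still gets its own copy of whatever is in it.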
Let's dive into what the cc command is doing.
# Compiler drivers
The compilation of your code is a multi-stage thing that invokes a number of different tools. The cc command just wraps them all up nicely for us so that we don't have to worry about them. This is called a "compiler driver", because it drives the full compilation process for us.
Here's what's going to happen when we run make:
- Run the C preprocessor, cpp, on main.c
- Compile the preprocessed main.c into assembly code, main.s
- Assemble main.s into machine code in an "object file", main.o
- Repeat the above three steps for name.c to get name.o
- Link main.o and name.o together to create the executable, say_my_name
To illustrate:
main.c -> main.s -> main.o
                          \
                           *-> say_my_name
                          /
name.c -> name.s -> name.o
# What the hell is an object file?
An object file consists of, at least, a symbol table and some machine code. It's a binary file that contains all of the functions and data from a specific C file in a way that allows other programs to pick and choose things to use from it.
# What the hell is a symbol table?
A symbol table is a set of key/value pairs. When you write a function, the function name is the key for that function's value, which is its binary code. Similarly, when you define variables, the name of the variable is the key and the value is, well, the value.
Let's take a look at an object file's symbol table:
$ nm name.o
0000000000000060 s EH_frame0
0000000000000026 s L_.str
0000000000000031 s L_.str1
0000000000000038 D _name
                 U _printf
0000000000000000 T _say_name
0000000000000078 S _say_name.eh
The nm command allows us to see what symbols are in an object file. Symbols come in a variety of shapes and sizes. Each line of the output above gives the address of the symbol, the type of the symbol and the name of the symbol, in that order.
If you take a look inside the object file, you'll notice that the addresses specified above aren't references to how many bytes into the file the symbol is. Object files have "headers", which describe how to read the file and what "sections" it contains.
Object files have a number of different "sections". They're just logical separations. The "text" section is code, the "data" section is initialised data (any initialised global variable ends up in here), and the "BSS" section is for uninitialised data. There are some more sections but these are the ones important to our explanation.
The symbol type "T" refers to code, the symbol type "D" refers to data. Notice our _say_name and _name symbols. Also notice _printf, which is type "U", which means "undefined". This is because we've used the printf function but haven't defined it. It's defined in libc, which comes in at the linking stage, later in the compilation process.
How about we take a look inside main.o as well? Hrm. Running nm main.o shows that this object file also has a _name symbol of its own. If we use the strings command, the situation starts to make sense:
$ strings main.o
Heisenberg
$ strings name.o
Heisenberg
# Aha!
Both files contain the symbol and the string constant! Because the compilation process does each C file separately and then links them together later, both object files end up having their own definition of _name, which causes the linker to throw the duplicate symbol error.
# The solution!
If you just scrolled to here from the error message at the top of the post, that's cool but I recommend giving the post a read so that this solution makes sense.
You need to move the definition of name out of the header file and into the C file.
Doing this doesn't change the object file for name.o in the slightest. The _name symbol will still be in there and you don't need to make any changes to main.c. You just avoid accidentally redefining name in main.c through the inclusion of name.h.
The changed files:
name.h:
#ifndef NAME_H
#define NAME_H
void say_name();
#endif

name.c:
#include <stdio.h>
#include "name.h"
char* name = "Heisenberg";
void say_name() {
    printf("%s\n", name);
}
$ make
cc main.c name.c -o say_my_name
$ ./say_my_name
Heisenberg
And we're golden :)
# Q: What if I want to access the name variable in main.c?
The _name symbol is accessible inside name.o; you just have to tell main.c to look for it.
If we take a look at the new symbol table for main.o, we'll see no reference to _name:
$ nm main.o
0000000000000048 s EH_frame0
0000000000000000 T _main
0000000000000060 S _main.eh
                 U _say_name
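This is what the extern keyword is for in C. It declares that a variable exists without defining it, so the compiler leaves an undefined reference to _name in main.o for the linker to resolve against name.o. Check this out: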
main.c:
#include <stdio.h>
#include "name.h"
extern char* name;
int main(int argc, char* argv[]) {
    printf("%s\n", name);
    return 0;
}
Build and run:
$ make
cc main.c name.c -o say_my_name
$ ./say_my_name
Heisenberg