Duplicate Symbol? What?

Ever run into this one?

duplicate symbol _name in:
/var/folders/6x/vqbtyyvd5r136lb7hpb0qvmw0000gp/T/1-siEqxi.o
/var/folders/6x/vqbtyyvd5r136lb7hpb0qvmw0000gp/T/2-bMFgvM.o
ld: 1 duplicate symbol for architecture x86_64
clang: error: linker command failed with exit code 1 (use -v to see invocation)
make: *** [all] Error 1

It's a bitch. Lemme explain what's up.

# A contrived example

Let's say you have the following files:

main.c:

#include "name.h"

int main(int argc, char* argv[]) {
    say_name();
    return 0;
}

name.h:

#ifndef NAME_H
#define NAME_H

char* name = "Heisenberg";
void say_name();

#endif

name.c:

#include <stdio.h>
#include "name.h"

void say_name() {
    printf("%s\n", name);
}

Makefile:

all:
	cc main.c name.c -o say_my_name

When you run make, you get the error from the top of this post. Yours may look slightly different if you're not using clang on a Mac, but the premise will be the same: the linker thinks you're defining the variable name twice.

# Isn't this what the include guard is meant to prevent?

Not quite... An include guard only stops a header from being pasted into the same C file more than once. It's tempting to think that if we wrap an ifndef preprocessor directive around our headers they'll only be included once in our whole program, but that idea stems from a misunderstanding of what headers are for and how our programs go from C into machine code.
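To see what the guard does and doesn't do, here's a minimal single-file sketch. The two guarded blocks stand in for name.h being pasted into the same C file twice, and name_length is a made-up helper that isn't part of the example program.

```c
#include <string.h>

/* Stand-in for the first `#include "name.h"`: NAME_H isn't defined yet,
 * so the preprocessor keeps everything up to the #endif. */
#ifndef NAME_H
#define NAME_H
char *name = "Heisenberg";
#endif

/* Stand-in for a second `#include "name.h"` in the SAME file: NAME_H is
 * already defined, so this copy is thrown away before the compiler runs.
 * Without the guard this line would be a redefinition error. */
#ifndef NAME_H
#define NAME_H
char *name = "Heisenberg";
#endif

/* Hypothetical helper, only here so there's something to call. */
size_t name_length(void) {
    return strlen(name);
}
```

The guard's memory only lasts while the preprocessor is working on one C file. The next C file starts with NAME_H undefined and gets its own copy of everything in the header.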

Let's dive into what the cc command is doing.

# Compiler drivers

The compilation of your code is a multi-stage thing that invokes a number of different tools. The cc command just wraps them all up nicely for us so that we don't have to worry about them. This is called a "compiler driver", because it drives the full compilation process for us.

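Here's what's going to happen when we run make: each C file is preprocessed and compiled into assembly (a .s file), each assembly file is assembled into an object file (a .o file), and finally the linker combines the object files into the say_my_name executable.

To illustrate: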

main.c -> main.s -> main.o
                           \
                            *> - say_my_name
                           /
name.c -> name.s -> name.o

# What the hell is an object file?

An object file consists of, at least, a symbol table and some machine code. It's a binary file that contains all of the functions and data from a specific C file in a way that allows other programs to pick and choose things to use from it.

# What the hell is a symbol table?

A symbol table is a set of key/value pairs. When you write a function, the function name is the key and the value is where that function's binary code lives in the object file. Similarly, when you define a variable, the name of the variable is the key and the value is where its data lives.

Let's take a look at an object file's symbol table:

$ nm name.o
0000000000000060 s EH_frame0
0000000000000026 s L_.str
0000000000000031 s L_.str1
0000000000000038 D _name
                 U _printf
0000000000000000 T _say_name
0000000000000078 S _say_name.eh

The nm command allows us to see what symbols are in an object file. Symbols come in a variety of shapes and sizes. In the output above, we have the address of the symbol, the type of the symbol and the name of the symbol.

If you take a look inside the object file, you'll notice that the addresses specified above aren't references to how many bytes into the file the symbol is. Object files have "headers", which describe how to read the file and what "sections" it contains.

Object files have a number of different "sections". They're just logical separations. The "text" section is code, the "data" section is initialised data (any global variable with an initial value ends up in here), and the "BSS" section is for uninitialised data. There are some more sections but these are the ones important to our explanation.

The symbol type "T" refers to code, the symbol type "D" refers to data. Notice our _say_name and _name symbols. Also notice _printf, which is type "U", which means "undefined". This is because we've used the printf function but haven't defined it. It's defined in libc, which comes in at a later stage in the compilation process.
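Here's a rough guide to which C constructs tend to produce which nm letters. It's a sketch, not real output: counter, scratch, hidden, bump and report are made-up names, and the exact letters (especially for uninitialised data) vary between platforms, compilers and optimisation levels.

```c
#include <stdio.h>

int counter = 42;        /* initialised global: data section, type "D" */
int scratch[16];         /* uninitialised global: BSS-style storage; the
                            letter varies (commonly "B", "S" or "C") */
static int hidden = 7;   /* initialised but file-local: typically a
                            lowercase "d", if the optimiser keeps it */

/* Defined here and visible to other files: type "T". */
int bump(void) {
    scratch[0] = hidden;
    counter += 1;
    return counter;
}

/* Also type "T". printf is used but not defined in this file, so it
 * shows up in the symbol table as type "U". */
void report(void) {
    printf("%d\n", counter);
}
```

Compiling a file like this with cc -c and running nm on the result is a quick way to check what your own toolchain does.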

How about we take a look inside main.o as well? Running nm main.o shows _main with type "T", _say_name with type "U" (main.c calls it but doesn't define it) and, sitting right there with type "D", _name.

Hrm. This object file also has a _name symbol. If we use the strings command, the situation starts to make sense:

$ strings main.o
Heisenberg
$ strings name.o
Heisenberg

# Aha!

Both files contain the symbol and the string constant! Because the compilation process does each C file separately and then links them together later, both object files end up having their own definition of _name, which causes the linker to throw the duplicate symbol error.
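To make the linker's complaint concrete, here's a toy model of the check it performs. This is a sketch under heavy assumptions: the struct layouts and function names are invented for illustration, and a real linker is far more involved than this.

```c
#include <stddef.h>
#include <string.h>

/* One entry of a simplified symbol table (hypothetical layout). */
struct symbol {
    const char *name;
    int defined;   /* 1 = defined here ("T"/"D"), 0 = undefined ("U") */
};

struct object_file {
    const char *path;
    const struct symbol *symbols;
    size_t count;
};

/* Count how many object files provide a definition of `name`. */
static size_t count_definitions(const struct object_file *objs, size_t nobjs,
                                const char *name) {
    size_t total = 0;
    for (size_t i = 0; i < nobjs; i++)
        for (size_t j = 0; j < objs[i].count; j++)
            if (objs[i].symbols[j].defined &&
                strcmp(objs[i].symbols[j].name, name) == 0)
                total++;
    return total;
}

/* Return the first symbol defined more than once, or NULL if there is none. */
const char *find_duplicate_symbol(const struct object_file *objs, size_t nobjs) {
    for (size_t i = 0; i < nobjs; i++)
        for (size_t j = 0; j < objs[i].count; j++) {
            const struct symbol *s = &objs[i].symbols[j];
            if (s->defined && count_definitions(objs, nobjs, s->name) > 1)
                return s->name;
        }
    return NULL;
}

/* The object files from this post, reduced to the symbols that matter. */
static const struct symbol main_syms[] = {
    {"_main", 1}, {"_name", 1}, {"_say_name", 0},
};
static const struct symbol name_syms[] = {
    {"_name", 1}, {"_printf", 0}, {"_say_name", 1},
};
/* main.o as it would look if name.h no longer defined name. */
static const struct symbol main_fixed_syms[] = {
    {"_main", 1}, {"_say_name", 0},
};

const struct object_file broken_build[] = {
    {"main.o", main_syms, 3},
    {"name.o", name_syms, 3},
};
const struct object_file fixed_build[] = {
    {"main.o", main_fixed_syms, 2},
    {"name.o", name_syms, 3},
};
```

Calling find_duplicate_symbol(broken_build, 2) returns "_name", while find_duplicate_symbol(fixed_build, 2) returns NULL. Undefined symbols like _say_name in main.o never count as duplicates, because they're requests for a definition rather than definitions themselves.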

# The solution!

If you just scrolled to here from the error message at the top of the post, that's cool but I recommend giving the post a read so that this solution makes sense.

You need to move the definition of name out of the header file and into the C file.

Doing this doesn't change name.o in the slightest. The _name symbol will still be in there, and you don't need to make any changes to main.c. You just avoid accidentally defining name a second time in main.o through the inclusion of name.h.

The changed files:

name.h:

#ifndef NAME_H
#define NAME_H

void say_name();

#endif

name.c:

#include <stdio.h>
#include "name.h"

char* name = "Heisenberg";

void say_name() {
    printf("%s\n", name);
}

Build and run:

$ make
cc main.c name.c -o say_my_name
$ ./say_my_name
Heisenberg

And we're golden :)

# Q: What if I want to access the name variable in main.c?

The _name symbol is accessible inside name.o. You just have to tell main.c to look for it.

If we take a look at the new symbol table for main.o, we'll see no reference to _name:

$ nm main.o
0000000000000048 s EH_frame0
0000000000000000 T _main
0000000000000060 S _main.eh
                 U _say_name

This is what the extern keyword is for in C. Check this out:

#include <stdio.h>
#include "name.h"

extern char* name;

int main(int argc, char* argv[]) {
    printf("%s\n", name);
    return 0;
}

Build and run:

$ make
cc main.c name.c -o say_my_name
$ ./say_my_name
Heisenberg
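If more than one file needs name, a common convention is to put the extern declaration in name.h instead of repeating it in every C file. Here's a minimal sketch squeezed into one listing, with comments marking which file each part would live in. The layout is a suggestion, not something the example above requires.

```c
#include <stdio.h>

/* --- would live in name.h --- */
#ifndef NAME_H
#define NAME_H

extern char *name;      /* declaration: "name exists somewhere", no storage */
void say_name(void);    /* declaration of the function */

#endif

/* --- would live in name.c --- */
char *name = "Heisenberg";   /* the one and only definition */

void say_name(void) {
    printf("%s\n", name);
}
```

Every file that includes the header gets the declaration, which creates an undefined reference for the linker to resolve, and only name.o carries the definition.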
