__attribute__
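__attribute__ is a compiler directive used in C, C++, and Objective-C. It usually appears in code in the form __attribute__((xxx)) and lets developers state requirements to the compiler, influencing processes such as the Static Analyzer, Name Mangling, and Code Generation.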
Attribute Syntax
The attribute syntax is described in the official GCC documentation, Attribute Syntax:
An attribute specifier is of the form __attribute__ ((attribute-list)). An attribute list is a possibly empty comma-separated sequence of attributes, where each attribute is one of the following:
- Empty. Empty attributes are ignored.
- An attribute name (which may be an identifier such as unused, or a reserved word such as const).
- An attribute name followed by a parenthesized list of parameters for the attribute. These parameters take one of the following forms:
  - An identifier. For example, mode attributes use this form.
  - An identifier followed by a comma and a non-empty comma-separated list of expressions. For example, format attributes use this form.
  - A possibly empty comma-separated list of expressions. For example, format_arg attributes use this form with the list being a single integer constant expression, and alias attributes use this form with the list being a single string constant.
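The following is a minimal C sketch, assuming GCC or Clang, that gives one declaration for each form in the list. All the names in it (spare_helper, square, si_int, format_msg, pick_format, aligned_counter) are hypothetical. Only the attributes themselves come from the GCC documentation. mode(SI) is assumed to mean a 4-byte integer, which holds on the common 8-bit-byte targets.

```c
#include <stdarg.h>
#include <stdio.h>

/* Bare attribute name (an identifier): silences "defined but not used". */
static int spare_helper(void) __attribute__((unused));
static int spare_helper(void) { return 0; }

/* Bare attribute name that is a reserved word: 'const' promises the result
   depends only on the arguments, so the compiler may fold repeated calls. */
int square(int x) __attribute__((const));
int square(int x) { return x * x; }

/* Name + single identifier parameter: mode(SI) requests a 4-byte integer. */
typedef int si_int __attribute__((mode(SI)));

/* Name + identifier + expression list: format(printf, 3, 4) lets the
   compiler check arguments 4.. against the format string in argument 3. */
int format_msg(char *buf, size_t n, const char *fmt, ...)
    __attribute__((format(printf, 3, 4)));
int format_msg(char *buf, size_t n, const char *fmt, ...) {
    va_list ap;
    va_start(ap, fmt);
    int written = vsnprintf(buf, n, fmt, ap);
    va_end(ap);
    return written;
}

/* Name + expression list: format_arg(1) says the returned string is a format
   derived from argument 1 (typical for translation helpers). */
const char *pick_format(const char *fmt) __attribute__((format_arg(1)));
const char *pick_format(const char *fmt) { return fmt; }

/* Several attributes in one comma-separated attribute list. */
static int aligned_counter __attribute__((aligned(16), used)) = 0;
```

A call such as format_msg(buf, sizeof buf, pick_format("%d^2=%d"), 7, square(7)) is still type-checked against "%d^2=%d". This works because format_arg tells the compiler to look through pick_format.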
used
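used tells the compiler that the symbol being declared must be kept. With used, a function or variable is not optimized away in Release builds even if nothing references it. Without it, the Release linker removes unreferenced sections. For details, see the GNU documentation: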
used
This attribute, attached to a variable with static storage, means that the variable must be emitted even if it appears that the variable is not referenced.
When applied to a static data member of a C++ class template, the attribute also means that the member is instantiated if the class itself is instantiated.
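Here is a minimal sketch of used on a static variable and a static function, assuming GCC or Clang. build_tag and debug_hook are hypothetical names. At -O2 the compiler would normally discard an unreferenced static such as build_tag. Marking it used forces it to be emitted.

```c
/* Without 'used', an optimizing compiler may drop this unreferenced static
   array entirely; with it, the bytes are always emitted into the object file. */
static const char build_tag[] __attribute__((used)) = "build-tag: demo-1.0";

/* 'used' also works on functions: the body is emitted even though nothing
   in this translation unit calls it. */
static int debug_hook(void) __attribute__((used));
static int debug_hook(void) { return 42; }
```

You can confirm the string survives by running strings or nm on the object file. Whether the linker also keeps it while dead-stripping (for example with ld's -dead_strip on Apple platforms) depends on the toolchain and object format, so verify that part on your own setup.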
section
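Normally the compiler places objects in the data or bss section of the DATA segment. Sometimes, however, we need data to live in a special section, and the section attribute achieves that. For example, BeeHive stores its module registration data in the "BeehiveMods" section of the __DATA segment.
section is usually applied to global variables. Here is the GNU documentation's description of the section attribute: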
section ("section-name")
Normally, the compiler places the objects it generates in sections like data and bss. Sometimes, however, you need additional sections, or you need certain particular variables to appear in special sections, for example to map to special hardware. The section attribute specifies that a variable (or function) lives in a particular section. For example, this small program uses several specific section names:
```c
struct duart a __attribute__ ((section ("DUART_A"))) = { 0 };
struct duart b __attribute__ ((section ("DUART_B"))) = { 0 };
char stack[10000] __attribute__ ((section ("STACK"))) = { 0 };
int init_data __attribute__ ((section ("INITDATA")));

main() {
  /* Initialize stack pointer */
  init_sp (stack + sizeof (stack));

  /* Initialize initialized data */
  memcpy (&init_data, &data, &edata - &data);

  /* Turn on the serial ports */
  init_duart (&a);
  init_duart (&b);
}
```

Use the section attribute with global variables and not local variables, as shown in the example.
You may use the section attribute with initialized or uninitialized global variables but the linker requires each object be defined once, with the exception that uninitialized variables tentatively go in the common (or bss) section and can be multiply “defined”. Using the section attribute changes what section the variable goes into and may cause the linker to issue an error if an uninitialized variable has multiple definitions. You can force a variable to be initialized with the -fno-common flag or the nocommon attribute.
Some file formats do not support arbitrary sections so the section attribute is not available on all platforms. If you need to map the entire contents of a module to a particular section, consider using the facilities of the linker instead.
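The GCC example relies on hardware-specific symbols (duart, init_sp, data, edata). Below is a self-contained sketch that only places data. It assumes the section-naming conventions of the two common formats: Mach-O expects "segment,section", while ELF takes a bare name. The section name demo_cfg and all variable names are hypothetical.

```c
/* Mach-O section names use the "segment,section" form; ELF uses a plain name.
   Adjacent string literals are concatenated before the attribute is parsed. */
#if defined(__APPLE__)
#define DEMO_SECTION(name) __attribute__((used, section("__DATA," name)))
#else
#define DEMO_SECTION(name) __attribute__((used, section(name)))
#endif

/* Initialized globals: placed in demo_cfg instead of __data / .data. */
int demo_baud_rate DEMO_SECTION("demo_cfg") = 115200;
int demo_parity    DEMO_SECTION("demo_cfg") = 0;

/* An explicit initializer (even = { 0 }) sidesteps the common-symbol issue
   the GCC text describes for uninitialized globals. */
char demo_buffer[64] DEMO_SECTION("demo_cfg") = { 0 };
```

After compiling to an object file, objdump -h (ELF) or otool -l (Mach-O) should list a demo_cfg section. The sketch keeps only writable data in that section. GCC reports a "section type conflict" when objects with incompatible flags, such as const and non-const data, share one section.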
For more examples of __attribute__ in use, see FBTweak.
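The compiler lets us store a given piece of data in the section we choose through __attribute__((section("segment_name,section_name"))), the "segment,section" form used on Mach-O.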
In the BeeHive framework:
```objc
@class BeeHive;
char * kShopModule_mod __attribute((used, section("__DATA,""BeehiveMods"""))) = """ShopModule""";
```
__attribute__((section("name"))) specifies which section the data goes into. __attribute__((used)) marks the data so that the linker does not strip it as unused.
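The expansion above is produced by a registration macro. The sketch below is a hypothetical, portable C version of the same idea, not BeeHive's actual macro. REGISTER_MODULE(ShopModule) defines kShopModule_mod, points it at the string "ShopModule", and stores the pointer in a section named DemoMods. As before, it assumes the Mach-O "segment,section" versus ELF bare-name convention.

```c
#if defined(__APPLE__)
#define MODS_SECTION "__DATA,DemoMods"
#else
#define MODS_SECTION "DemoMods"
#endif

/* #name turns the argument into a string literal; k##name##_mod pastes it
   into the variable name (ShopModule -> kShopModule_mod). */
#define REGISTER_MODULE(name) \
    char *k##name##_mod __attribute__((used, section(MODS_SECTION))) = #name

REGISTER_MODULE(ShopModule);
REGISTER_MODULE(UserModule);
```

used matters here because no code refers to these variables by name. They are only ever found by scanning the section.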
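The file the compiler produces from source code is called an object file. Structurally it is already in an executable file format; it just has not been linked yet. The main executable formats are:
- PE (Portable Executable) on Windows
- ELF (Executable and Linkable Format) on Linux
- Mach-O (Mach Object File Format) on macOS/iOS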
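After compilation, a program is split mainly into two parts: program instructions and program data. The code segment holds the instructions, while the .data and .bss sections belong to the data segment.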
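As shown in the figure above, a Mach-O file consists of a Header, Load commands, and Data (the actual contents of each Segment). Executables, library files, dSYM files, dynamic libraries, and the dynamic linker we commonly deal with all use this format.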
A modern compiler's workflow is roughly:
source code → preprocessor → compiler → assembler → object code → linker → executables. The final packaged file can then be loaded and run by the computer.
After compilation, each section stores the following:
Segment and Section Name | Contents |
---|---|
__TEXT,__text | Executable machine code. The compiler places only executable code in this section; no tables or data of any sort are stored here. |
__TEXT,__cstring | Constant C strings. A C string is a sequence of non-null bytes that ends with a null byte (‘\0’). The static linker coalesces constant C string values, removing duplicates, when building the final product. |
__TEXT,__picsymbol_stub | Position-independent indirect symbol stubs. See “Indirect Addressing” for more information. |
__TEXT,__symbol_stub | Indirect symbol stubs. See “Indirect Addressing” for more information. |
__TEXT,__const | Initialized constant variables. The compiler places all data declared const in this section. |
__TEXT,__literal4 | 4-byte literal values. The compiler places single-precision floating point constants in this section. The static linker coalesces these values, removing duplicates, when building the final product. With some CPU architectures, it is more efficient for the compiler to use immediate load instructions rather than adding to this section. |
__TEXT,__literal8 | 8-byte literal values. The compiler places double-precision floating point constants in this section. The static linker coalesces these values, removing duplicates, when building the final product. With some CPU architectures, it is more efficient for the compiler to use immediate load instructions rather than adding to this section. |
__DATA,__data | Initialized mutable variables, such as writable C strings and data arrays. |
__DATA,__la_symbol_ptr | Lazy symbol pointers, which are indirect references to functions imported from a different file. See “Indirect Addressing” for more information. |
__DATA,__nl_symbol_ptr | Non-lazy symbol pointers, which are indirect references to data items imported from a different file. See “Indirect Addressing” for more information. |
__DATA,__dyld | Information used by the static linker. |
__DATA,__const | Uninitialized constant variables. |
__DATA,__mod_init_func | Module initialization functions. The C++ compiler places static constructors here. |
__DATA,__mod_term_func | Module termination functions. |
__DATA,__bss | Data for uninitialized static variables (for example, static int i;). |
__DATA,__common | Uninitialized imported symbol definitions (for example, int i;) located in the global scope (outside of a function declaration). |
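From the table above:
- __TEXT,__text: executable machine code (the code section);
- __TEXT,__const: initialized constants; the compiler places all data declared const in this section;
- __DATA,__data: initialized mutable global variables;
- __DATA,__bss: uninitialized static variables, both global and local, for example static int i;
- __DATA,__common: uninitialized global variables declared at file scope, for example int i;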
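Global variables live in global (static) memory, and so do local variables declared static: their scope is local, but their lifetime is global.
"Global" here describes lifetime rather than scope, which is why the two notions are sometimes mixed up. Two global variables defined next to each other are usually, though not guaranteed to be, adjacent in memory. This simple fact can be handy. If a global variable gets corrupted, check the code that accesses its neighboring variables for possible out-of-bounds writes.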
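The C file below ties the table and the lifetime discussion to concrete declarations. Each definition is annotated with where it typically lands in a Mach-O object. The annotations are assumptions based on the table above and on current compiler defaults. Exact placement varies with compiler, optimization level, and flags such as -fcommon, so check with size -m or otool -s on your own build. All names are hypothetical.

```c
int g_counter = 7;              /* initialized, mutable   -> __DATA,__data   */
const int g_limit = 100;        /* initialized const      -> __TEXT,__const  */
static int s_hits;              /* uninitialized static   -> __DATA,__bss    */
int g_tentative;                /* tentative definition   -> __DATA,__common
                                   with -fcommon; __DATA,__bss with
                                   -fno-common (the default since GCC 10
                                   and Clang 11)                             */
const char *g_greeting = "hi";  /* the pointer is mutable -> __DATA,__data;
                                   the literal "hi" -> __TEXT,__cstring      */

int bump(void) {
    static int calls;           /* local scope, global lifetime: __DATA,__bss */
    s_hits++;
    return ++calls;             /* keeps counting across calls */
}
```

bump shows the scope/lifetime split. calls is visible only inside bump, yet its value survives between calls because it lives in static storage rather than on the stack.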
constructor
The previous section only covered how to store data in a special section. How do we read that data back?
First, let's introduce __attribute__((constructor)).
constructor: as the name suggests, a function with this attribute is called when the executable (or shared library) is loaded. You can think of it as running before main():
```c
__attribute__((constructor))
```
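Both constructor functions and +load run before main, but +load runs slightly earlier. When dyld (the dynamic linker, the very starting point of a program) loads an image (think of it as a Mach-O file), it first notifies the Objective-C runtime to load all the classes in it, and each class's +load is called as that class is loaded. Only after all of them are loaded does dyld call every constructor function in the image. That makes a constructor an excellent moment for sneaky work:
- All classes have finished loading
- main has not run yet
- Unlike +load, it does not have to be attached to a class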
Reading values from a section
Now let's see how to read back the data stored in a special section.
BeeHive's source contains the following code:
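```objc
__attribute__((constructor))
void initProphet() {
    /* ... registers a callback via _dyld_register_func_for_add_image(...) ... */
}
```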
Here void initProphet() is marked with __attribute__((constructor)), and its timing was covered in the previous section. Its body calls _dyld_register_func_for_add_image, so let's look at what that function does.
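_dyld_register_func_for_add_image registers a callback that dyld invokes as it links images: every time dyld loads an image, it runs the registered callbacks. We can register our own callback with the declaration below. At registration time, the callback is also run for every image that is already loaded: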
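```c
/* ... */
extern void _dyld_register_func_for_add_image(void (*func)(const struct mach_header* mh, intptr_t vmaddr_slide));
```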
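For every image, once it has been dynamically linked, the callback void (*func)(const struct mach_header* mh, intptr_t vmaddr_slide) is called. It receives the image's mach_header and an intptr_t vmaddr_slide, which is the offset by which the image was shifted (slid) in virtual memory.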
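Below is a sketch of registering an image-add callback, assuming the <mach-o/dyld.h> declaration shown above. on_add_image, count_cb and count_loaded_images are hypothetical names. Apple's API exists only on Darwin, so the #else branch swaps in a different technique for Linux: glibc's dl_iterate_phdr. It walks the currently loaded objects once instead of notifying about future loads. It is an analogy, not an equivalent.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE   /* recommended by the dl_iterate_phdr man page */
#endif
#include <stddef.h>
#include <stdint.h>

#if defined(__APPLE__)
#include <mach-o/loader.h>
#include <mach-o/dyld.h>

static int g_images_seen;

/* Called once for each image already loaded (during registration), then
   again for every image loaded later, e.g. through dlopen. */
static void on_add_image(const struct mach_header *mh, intptr_t vmaddr_slide) {
    (void)mh;
    (void)vmaddr_slide;
    g_images_seen++;
}

int count_loaded_images(void) {
    static int registered;
    if (!registered) {            /* register only once: callbacks stay installed */
        registered = 1;
        _dyld_register_func_for_add_image(on_add_image);
    }
    return g_images_seen;
}
#else
#include <link.h>

/* Linux analog: invoked once per currently loaded object. */
static int count_cb(struct dl_phdr_info *info, size_t size, void *data) {
    (void)info;
    (void)size;
    ++*(int *)data;
    return 0;                     /* 0 = keep iterating */
}

int count_loaded_images(void) {
    int n = 0;
    dl_iterate_phdr(count_cb, &n);
    return n;
}
#endif
```

On Darwin the registration is guarded because a callback registered this way cannot be removed. A second registration would make every later image count twice.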
mach_header is a data structure defined in usr/include/mach-o/loader.h:
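```c
/*
 * The 32-bit mach header appears at the very beginning of the object file for
 * 32-bit architectures.
 */
struct mach_header {
	uint32_t	magic;		/* mach magic number identifier */
	cpu_type_t	cputype;	/* cpu specifier */
	cpu_subtype_t	cpusubtype;	/* machine specifier */
	uint32_t	filetype;	/* type of file */
	uint32_t	ncmds;		/* number of load commands */
	uint32_t	sizeofcmds;	/* the size of all the load commands */
	uint32_t	flags;		/* flags */
};
```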
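Calling BHReadConfiguration gives us the class names of the modules previously registered in the special BeehiveMods section. The function returns them as an array of class-name strings.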
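This hypothetical C sketch shows the kind of work a BHReadConfiguration-style reader does: it registers names into a section and then reads them back. It is not BeeHive's code.

- On Darwin it calls getsectiondata on the main executable's header, _mh_execute_header. That symbol only exists when linking an executable; BeeHive instead receives each image's mach_header in its dyld callback.
- On ELF it relies on the __start_<section> and __stop_<section> symbols that GNU ld and LLD synthesize for sections whose names are valid C identifiers.

REGISTER_MODULE, DemoMods and read_registered_modules are hypothetical names.

```c
#include <stddef.h>

#if defined(__APPLE__)
#include <mach-o/getsect.h>
#include <mach-o/ldsyms.h>
#define MODS_SECTION "__DATA,DemoMods"
#else
#define MODS_SECTION "DemoMods"
#endif

/* Store a pointer to the stringized module name in the DemoMods section. */
#define REGISTER_MODULE(name) \
    char *k##name##_mod __attribute__((used, section(MODS_SECTION))) = #name

REGISTER_MODULE(ShopModule);
REGISTER_MODULE(UserModule);

#if !defined(__APPLE__)
/* Linker-synthesized bounds of the DemoMods section (ELF only). */
extern char *__start_DemoMods[];
extern char *__stop_DemoMods[];
#endif

/* Copies up to `max` registered names into `out`; returns how many exist. */
size_t read_registered_modules(const char **out, size_t max) {
    char **begin;
    size_t count;
#if defined(__APPLE__)
    unsigned long size = 0;
    begin = (char **)getsectiondata(&_mh_execute_header, "__DATA", "DemoMods", &size);
    count = begin ? size / sizeof(char *) : 0;   /* section holds char * entries */
#else
    begin = __start_DemoMods;
    count = (size_t)(__stop_DemoMods - __start_DemoMods);
#endif
    for (size_t i = 0; i < count && i < max; i++)
        out[i] = begin[i];
    return count;
}
```

The order of entries within the section is not guaranteed, so treat the result as a set rather than a list.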
References:
- Clang Attributes 黑魔法小记 (Notes on Clang attribute "black magic")
- Attribute Syntax
- Overview of the Mach-O Executable Format
- PARSING MACH-O FILES
I have since moved into the education industry; feel free to connect on WeChat: CaryaLiu