[GLIBC.0x01] glibc 2.23 malloc Source Code Analysis - II: malloc

This article was last updated at: noon, October 27, 2023

Just a simple bit of memory leaking

0x00.malloc

When writing a C program and we need to dynamically allocate a suitably sized block of memory, we usually reach for the malloc function (one way the C++ new operator is implemented is as a wrapper around malloc).

The glibc source defines the malloc function as follows.

First, in malloc/malloc.h we can see the external declaration of malloc:

/* Allocate SIZE bytes of memory.  */
extern void *malloc (size_t __size) __THROW __attribute_malloc__ __wur;

Here we run into three odd-looking symbols: __THROW, __attribute_malloc__ and __wur. Conventional C textbooks never seem to mention anything like them.

__THROW: does not throw exceptions under C++

This is a macro, defined in misc/sys/cdefs.h, with the following code:

/* GCC can always grok prototypes.  For C++ programs we add throw()
to help it optimize the function calls. But this works only with
gcc 2.8.x and egcs. For gcc 3.2 and up we even mark C functions
as non-throwing using a function attribute since programs can use
the -fexceptions options for C code as well. */
# if !defined __cplusplus && __GNUC_PREREQ (3, 3)
# define __THROW __attribute__ ((__nothrow__ __LEAF))
# define __THROWNL __attribute__ ((__nothrow__))
# define __NTH(fct) __attribute__ ((__nothrow__ __LEAF)) fct
# else
# if defined __cplusplus && __GNUC_PREREQ (2,8)
# define __THROW throw ()
# define __THROWNL throw ()
# define __NTH(fct) __LEAF_ATTR fct throw ()
# else
# define __THROW
# define __THROWNL
# define __NTH(fct) fct
# endif
# endif

In a C++ environment it declares that the function will not throw exceptions, so the compiler does not need to generate exception-handling code around calls, which improves the generated code.

In a C environment it has no effect.

__attribute_malloc__: optimization for malloc-like functions

This symbol is likewise defined in misc/sys/cdefs.h:

/* At some point during the gcc 2.96 development the `malloc' attribute
for functions was introduced. We don't want to use it unconditionally
(although this would be possible) since it generates warnings. */
#if __GNUC_PREREQ (2,96)
# define __attribute_malloc__ __attribute__ ((__malloc__))
#else
# define __attribute_malloc__ /* Ignore */
#endif

For gcc 2.96 and later, this code uses the compiler's __attribute__ mechanism.

The gcc manual describes the attribute as follows:

malloc

Using this attribute can improve optimization. Compiler predicts that a function with the attribute returns non-null in most cases. Functions like malloc and calloc have this property because they return a pointer to uninitialized or zeroed-out storage. However, functions like realloc do not have this property, as they can return a pointer to storage containing pointers.

Roughly: malloc as an attribute can be used to optimize code around memory allocation functions such as malloc and calloc (so malloc is, amusingly, tagged with an attribute named after itself). It also points out that realloc does not use this attribute, because realloc may return a pointer to a block of memory that already contains other pointers.

For gcc versions below 2.96, the __attribute_malloc__ macro expands to nothing.
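
As a minimal sketch (not glibc code), this is how one would put the same attribute on one's own allocator; the wrapper name my_alloc is made up for illustration:

#include <stdlib.h>

/* Tell gcc the returned pointer is "fresh" storage that aliases nothing else,
   just like __attribute_malloc__ does for glibc's malloc(). */
__attribute__ ((__malloc__)) void *
my_alloc (size_t n)
{
    return calloc (1, n);   /* zeroed, non-aliasing storage */
}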

__GNUC_PREREQ()

Defined in features.h:

/* Convenience macros to test the versions of glibc and gcc.
Use them like this:
#if __GNUC_PREREQ (2,8)
... code requiring gcc 2.8 or later ...
#endif
Note - they won't work for gcc1 or glibc1, since the _MINOR macros
were not defined then. */
#if defined __GNUC__ && defined __GNUC_MINOR__
# define __GNUC_PREREQ(maj, min) \
((__GNUC__ << 16) + __GNUC_MINOR__ >= ((maj) << 16) + (min))
#else
# define __GNUC_PREREQ(maj, min) 0
#endif

Roughly, it checks the gcc version (the name is short for "gcc pre-requirement"): for example, __GNUC_PREREQ (3, 3) compares (__GNUC__ << 16) + __GNUC_MINOR__ against (3 << 16) + 3. We won't expand on it further here.

__wur: warn if the function's return value is not used

This is again a macro defined in misc/sys/cdefs.h:

/* If fortification mode, we warn about unused results of certain
function calls which can lead to problems. */
#if __GNUC_PREREQ (3,4)
# define __attribute_warn_unused_result__ \
__attribute__ ((__warn_unused_result__))
# if __USE_FORTIFY_LEVEL > 0
# define __wur __attribute_warn_unused_result__
# endif
#else
# define __attribute_warn_unused_result__ /* empty */
#endif
#ifndef __wur
# define __wur /* Ignore */
#endif

If the gcc version is 3.4 or later, __attribute_warn_unused_result__ is first defined as __attribute__ ((__warn_unused_result__)).

The gcc manual describes this attribute as follows:

warn_unused_result

The warn_unused_result attribute causes a warning to be emitted if a caller of the function with this attribute does not use its return value. This is useful for functions where not checking the result is either a security problem or always a bug, such as realloc.

int fn () __attribute__ ((warn_unused_result));
int foo ()
{
if (fn () < 0) return -1;
fn ();
return 0;
}

results in warning on line 5.

Roughly: if the program does not use the function's return value, the compiler emits a warning. Then again, who ever pays attention to warnings x

When __USE_FORTIFY_LEVEL > 0, __wur is enabled, with the behavior described above: if the function's return value goes unused, a warning is emitted.

For gcc versions below 3.4, the macro expands to nothing.

__USE_FORTIFY_LEVEL

Defined in features.h:

#if defined _FORTIFY_SOURCE && _FORTIFY_SOURCE > 0
# if !defined __OPTIMIZE__ || __OPTIMIZE__ <= 0
# warning _FORTIFY_SOURCE requires compiling with optimization (-O)
# elif !__GNUC_PREREQ (4, 1)
# warning _FORTIFY_SOURCE requires GCC 4.1 or later
# elif _FORTIFY_SOURCE > 1
# define __USE_FORTIFY_LEVEL 2
# else
# define __USE_FORTIFY_LEVEL 1
# endif
#endif
#ifndef __USE_FORTIFY_LEVEL
# define __USE_FORTIFY_LEVEL 0
#endif

If _FORTIFY_SOURCE is not defined, __USE_FORTIFY_LEVEL is set to 0; otherwise its value is examined: greater than 1 gives level 2, anything else gives level 1. Either way, as long as _FORTIFY_SOURCE is defined and greater than 0, __wur ends up enabled.

_FORTIFY_SOURCE

This is tied to the gcc command line: it can be specified with -D_FORTIFY_SOURCE, e.g. gcc -O2 -D_FORTIFY_SOURCE=1 foo.c.

On Ubuntu 18 the default is -D_FORTIFY_SOURCE=2.

strong_alias: malloc is just an alias for __libc_malloc

Above we briefly analyzed the three markers attached to malloc in malloc.h: declare it non-throwing under C++ to optimize code generation, let the compiler optimize around its malloc-like return value, and emit a warning when its return value goes unused.
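
Putting the three together, here is a sketch of what the declaration looks like once the macros are fully expanded for a C translation unit under a modern gcc (illustrative, written out by hand rather than copied from preprocessor output; __LEAF is resolved to ", __leaf__"):

#include <stddef.h>

extern void *malloc (size_t __size)
    __attribute__ ((__nothrow__, __leaf__))          /* __THROW              */
    __attribute__ ((__malloc__))                     /* __attribute_malloc__ */
    __attribute__ ((__warn_unused_result__));        /* __wur                */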

However, in malloc.c we cannot find a direct definition of a function named malloc. Could it be that this function does not actually exist 🦄 - in fact, it really does not x

What we can find in malloc.c is the following line:

strong_alias (__libc_malloc, __malloc) strong_alias (__libc_malloc, malloc)

Here we meet a new symbol: strong_alias.

This macro is defined in libc-symbols.h:

/* GCC understands weak symbols and aliases; use its interface where
possible, instead of embedded assembly language. */

/* Define ALIASNAME as a strong alias for NAME. */
# define strong_alias(name, aliasname) _strong_alias(name, aliasname)
# define _strong_alias(name, aliasname) \
extern __typeof (name) aliasname __attribute__ ((alias (#name)));

strong_alias expands to the _strong_alias macro, which in turn expands to extern __typeof (name) aliasname __attribute__ ((alias (#name))); - macros nested all the way down.

Together with the glibc comment, this is easy to read: malloc is declared here as an alias of __libc_malloc, i.e. __libc_malloc is the function that is really called when we call malloc.

Now the mystery is solved.
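
For reference, a sketch of what strong_alias (__libc_malloc, malloc) becomes after expansion (written out by hand for illustration):

/* "malloc" becomes a strong alias that resolves to the very same symbol
   as __libc_malloc; both names end up pointing at one function body. */
extern __typeof (__libc_malloc) malloc __attribute__ ((alias ("__libc_malloc")));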

__typeof

A GNU C extension keyword used to obtain the type of an expression.

Example usage:

#include <stdio.h>

int foo(int n) { return n; }

int main(void)
{
    __typeof(foo(0)) var = 42;  /* foo(0) is not evaluated; only its type (int) is taken */
    printf("%d\n", var);
    return 0;
}

In this case __typeof picks up the return type of foo(), namely int, and var is therefore declared as an int.

Note that __typeof(char) and __typeof('a') do not yield the same type: in C a character constant such as 'a' already has type int, so the latter gives int.
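
A tiny standalone check of that claim (C11 is assumed for _Static_assert):

#include <stdio.h>

/* in C, character constants have type int, unlike in C++ */
_Static_assert (sizeof (__typeof ('a')) == sizeof (int), "'a' has type int in C");
_Static_assert (sizeof (__typeof ((char) 'a')) == sizeof (char), "an explicit cast gives char");

int main (void) { puts ("ok"); return 0; }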

So we finally turn to the declaration of __libc_malloc, in front of which sits a long comment about malloc:

/*
malloc(size_t n)
Returns a pointer to a newly allocated chunk of at least n bytes, or null
if no space is available. Additionally, on failure, errno is
set to ENOMEM on ANSI C systems.

If n is zero, malloc returns a minumum-sized chunk. (The minimum
size is 16 bytes on most 32bit systems, and 24 or 32 bytes on 64bit
systems.) On most systems, size_t is an unsigned type, so calls
with negative arguments are interpreted as requests for huge amounts
of space, which will often fail. The maximum supported value of n
differs across systems, but is in all cases less than the maximum
representable value of a size_t.
*/
void* __libc_malloc(size_t);
libc_hidden_proto (__libc_malloc)

So we can finally conclude: when we call malloc, the function that actually gets called is __libc_malloc.

Next, let's look at the source of __libc_malloc.

0x01.__libc_malloc

This code also lives in malloc/malloc.c. Let's first read the comment about malloc once more:

/*
malloc(size_t n)
Returns a pointer to a newly allocated chunk of at least n bytes, or null
if no space is available. Additionally, on failure, errno is
set to ENOMEM on ANSI C systems.

If n is zero, malloc returns a minumum-sized chunk. (The minimum
size is 16 bytes on most 32bit systems, and 24 or 32 bytes on 64bit
systems.) On most systems, size_t is an unsigned type, so calls
with negative arguments are interpreted as requests for huge amounts
of space, which will often fail. The maximum supported value of n
differs across systems, but is in all cases less than the maximum
representable value of a size_t.
*/
void* __libc_malloc(size_t);
libc_hidden_proto (__libc_malloc)

(Translated by arttnba3:)

malloc(size_t n)

Returns a pointer to a newly allocated chunk of at least n bytes, or NULL if no space is available. Additionally, on ANSI C systems, errno is set to ENOMEM on failure.

If n is 0, malloc returns a minimum-sized chunk (16 bytes on most 32-bit systems, 24 or 32 bytes on 64-bit systems). On most systems size_t is an unsigned type, so a request with a negative size is interpreted as a request for a huge amount of memory, which will usually fail. The maximum supported value of n differs across systems, but is always smaller than the largest value representable by size_t.

errno

A variable storing the error code, defined in errno.h; the specific error codes can be found in sysdeps/mach/hurd/bits/errno.h.

ENOMEM has the value 12.

(The author guesses ENOMEM stands for "error: no memory".)

In other words, malloc(0) does not return a memory region of size 0; it returns a minimum-sized chunk.
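
A quick way to see this in practice (a small standalone example; the value printed is what one would typically see on 64-bit glibc 2.23, i.e. a 0x20 chunk with 0x18 usable bytes):

#include <stdio.h>
#include <stdlib.h>
#include <malloc.h>

int main (void)
{
    void *p = malloc (0);
    /* malloc(0) still hands back a real, minimum-sized chunk */
    printf ("p = %p, usable size = %zu\n", p, malloc_usable_size (p));
    free (p);
    return 0;
}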

We can also notice that right below the declaration of __libc_malloc there is another odd line - libc_hidden_proto.

libc_hidden_proto: hiding the function symbol

This symbol is defined in include/libc-symbols.h:

#if IS_IN (libc)
# define libc_hidden_proto(name, attrs...) hidden_proto (name, ##attrs)
... // too much code in between; not copied here
#else
# define libc_hidden_proto(name, attrs...)
...
#endif

Inside libc this macro expands to the hidden_proto macro, which is also defined in include/libc-symbols.h:

#if defined SHARED && !defined NO_HIDDEN
# ifndef __ASSEMBLER__
# define __hidden_proto_hiddenattr(attrs...) \
__attribute__ ((visibility ("hidden"), ##attrs))
# define hidden_proto(name, attrs...) \
__hidden_proto (name, , __GI_##name, ##attrs)
# define hidden_tls_proto(name, attrs...) \
__hidden_proto (name, __thread, __GI_##name, ##attrs)
# define __hidden_proto(name, thread, internal, attrs...) \
extern thread __typeof (name) name __asm__ (__hidden_asmname (#internal)) \
__hidden_proto_hiddenattr (attrs);
...
#else
# ifndef __ASSEMBLER__
# define hidden_proto(name, attrs...)
...
#endif

After peeling off several layers of nesting, this is again just an attribute, namely visibility ("hidden"). The gcc manual describes it as follows:

hidden

Hidden visibility indicates that the entity declared has a new form of linkage, which we call “hidden linkage”. Two declarations of an object with hidden linkage refer to the same object if they are in the same shared object.

Roughly: this attribute declares the function's visibility as hidden, giving it a new form of linkage called "hidden linkage" - to programs that link against the library, the symbol is hidden.

Outside of libc this macro expands to nothing.

attribute: visibility("hidden")

Put simply, hidden is used to suppress the export of a symbol.

Imagine the following situation: program a links against both libc-arttnba1.so and libc-arttnba2.so, and both libraries define a function with the same name. Which one the program ends up calling then depends on the order in which the libraries are loaded - clearly a recipe for chaotic calls.

With the visibility("hidden") attribute, a function suppresses the export of its own symbol.

Going back to the scenario above: if both libraries define foo(), and the foo() in libc-arttnba2.so is marked with visibility("hidden"), then no matter in which order the two shared libraries are loaded, only the foo() in libc-arttnba1.so will ever be called from outside.
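
A minimal sketch of the attribute in user code (the names foo/bar are made up for illustration):

/* not exported: other shared objects cannot bind to this symbol */
__attribute__ ((visibility ("hidden")))
int foo (void)
{
    return 1;
}

/* default visibility: exported as usual; its call to foo() is bound to the
   local hidden definition, regardless of what other DSOs define */
int bar (void)
{
    return foo ();
}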

Next, let's look at the definition of the function itself.

Definition of __libc_malloc:

Reading the libc source for the first time, one is invariably surprised by how short this function is:

void *
__libc_malloc (size_t bytes)
{
mstate ar_ptr;
void *victim;

void *(*hook) (size_t, const void *)
= atomic_forced_read (__malloc_hook);
if (__builtin_expect (hook != NULL, 0))
return (*hook)(bytes, RETURN_ADDRESS (0));

arena_get (ar_ptr, bytes);

victim = _int_malloc (ar_ptr, bytes);
/* Retry with another arena only if we were able to find a usable arena
before. */
if (!victim && ar_ptr != NULL)
{
LIBC_PROBE (memory_malloc_retry, 1, bytes);
ar_ptr = arena_get_retry (ar_ptr, bytes);
victim = _int_malloc (ar_ptr, bytes);
}

if (ar_ptr != NULL)
(void) mutex_unlock (&ar_ptr->mutex);

assert (!victim || chunk_is_mmapped (mem2chunk (victim)) ||
ar_ptr == arena_for_chunk (mem2chunk (victim)));
return victim;
}
libc_hidden_def (__libc_malloc)

At the top, three variables are defined: ar_ptr of type mstate, a void pointer victim, and a function pointer hook whose return type is a void pointer.

It is easy to guess that ar_ptr is the arena pointer, victim will point to the allocated chunk, and hook is the __malloc_hook we so often attack in CTF challenges.

1. Check whether the __malloc_hook hook is NULL; if not, call it

At the very start of the function, atomic_forced_read() is called to assign hook, with __malloc_hook passed in as its argument.

Let's first look at atomic_forced_read(), a macro defined in include/atomic.h:

#ifndef atomic_forced_read
# define atomic_forced_read(x) \
({ __typeof (x) __x; __asm ("" : "=r" (__x) : "0" (x)); __x; })
#endif

Translated literally, the name means a "forced atomic read".

Roughly, it reads x into a register via an atomic operation and returns it. An atomic operation, simply put, is one that cannot be interrupted by thread scheduling; the point here is to prevent the value from being re-read (and possibly changed in between) under multi-threaded contention.

Next, the declaration of __malloc_hook:

void *weak_variable (*__malloc_hook)
(size_t __size, const void *) = malloc_hook_ini;

So __malloc_hook is a function pointer that initially points at the malloc_hook_ini function.

weak_variable

Defined in malloc.c as follows:

#ifndef weak_variable
/* In GNU libc we want the hook variables to be weak definitions to
avoid a problem with Emacs. */
# define weak_variable weak_function
#endif

Another nested definition; following it into include/libc-symbols.h we find:

/* This comes between the return type and function name in
a function definition to make that definition weak. */
# define weak_function __attribute__ ((weak))
# define weak_const_function __attribute__ ((weak, __const__))

So this too is a gcc attribute. The manual describes it as follows:

weak

The weak attribute causes a declaration of an external symbol to be emitted as a weak symbol rather than a global. This is primarily useful in defining library functions that can be overridden in user code, though it can also be used with non-function declarations. The overriding symbol must have the same type as the weak symbol. In addition, if it designates a variable it must also have the same size and alignment as the weak symbol. Weak symbols are supported for ELF targets, and also for a.out targets when using the GNU assembler and linker.

In short: for a symbol declared weak, the linker first looks for a matching strong (global) definition and only falls back to the weak one if none exists, without raising a multiple-definition error. In other words, when a symbol may or should be overridable by user code, it can be declared weak.
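
A minimal two-file sketch of that override behavior (file and function names are made up for illustration):

/* lib.c - the library provides a weak default implementation */
__attribute__ ((weak)) int do_log (const char *msg)
{
    (void) msg;
    return 0;               /* default: do nothing */
}

/* app.c - a strong definition in user code silently wins over the weak one */
#include <stdio.h>

int do_log (const char *msg)
{
    return printf ("%s\n", msg);
}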

If the hook is not NULL, the function it points to is called and its result returned - and __libc_malloc's journey ends right there for the moment.

Since the initial value of __malloc_hook is not NULL, the very first call to malloc is guaranteed to be diverted into malloc_hook_ini.

① malloc_hook_ini: reset __malloc_hook to NULL, run ptmalloc initialization, then call __libc_malloc again to allocate the block

Because malloc_hook_ini is guaranteed to be called once on the first malloc call, let's look at this function first.

malloc_hook_ini is declared in malloc.c as follows:

static void *malloc_hook_ini (size_t sz,
const void *caller) __THROW;

The function itself is defined in malloc/hook.c:

static void *
malloc_hook_ini (size_t sz, const void *caller)
{
__malloc_hook = NULL;
ptmalloc_init ();
return __libc_malloc (sz);
}

This function does three small things:

  • Sets __malloc_hook to NULL, so that the second and all later calls to malloc no longer enter malloc_hook_ini and instead continue down the normal __libc_malloc path
  • Calls the ptmalloc_init() function
  • Calls __libc_malloc() again to allocate a suitable block and return it to the user

So next we look at the ptmalloc_init() function.

② ptmalloc_init(): initialize the ptmalloc heap manager

This function is defined in malloc/arena.c:

static void
ptmalloc_init (void)
{
if (__malloc_initialized >= 0)
return;

__malloc_initialized = 0;

#ifdef SHARED
/* In case this libc copy is in a non-default namespace, never use brk.
Likewise if dlopened from statically linked program. */
Dl_info di;
struct link_map *l;

if (_dl_open_hook != NULL
|| (_dl_addr (ptmalloc_init, &di, &l, NULL) != 0
&& l->l_ns != LM_ID_BASE))
__morecore = __failing_morecore;
#endif

thread_arena = &main_arena;
thread_atfork (ptmalloc_lock_all, ptmalloc_unlock_all, ptmalloc_unlock_all2);
const char *s = NULL;
if (__glibc_likely (_environ != NULL))
{
char **runp = _environ;
char *envline;

while (__builtin_expect ((envline = next_env_entry (&runp)) != NULL,
0))
{
size_t len = strcspn (envline, "=");

if (envline[len] != '=')
/* This is a "MALLOC_" variable at the end of the string
without a '=' character. Ignore it since otherwise we
will access invalid memory below. */
continue;

switch (len)
{
case 6:
if (memcmp (envline, "CHECK_", 6) == 0)
s = &envline[7];
break;
case 8:
if (!__builtin_expect (__libc_enable_secure, 0))
{
if (memcmp (envline, "TOP_PAD_", 8) == 0)
__libc_mallopt (M_TOP_PAD, atoi (&envline[9]));
else if (memcmp (envline, "PERTURB_", 8) == 0)
__libc_mallopt (M_PERTURB, atoi (&envline[9]));
}
break;
case 9:
if (!__builtin_expect (__libc_enable_secure, 0))
{
if (memcmp (envline, "MMAP_MAX_", 9) == 0)
__libc_mallopt (M_MMAP_MAX, atoi (&envline[10]));
else if (memcmp (envline, "ARENA_MAX", 9) == 0)
__libc_mallopt (M_ARENA_MAX, atoi (&envline[10]));
}
break;
case 10:
if (!__builtin_expect (__libc_enable_secure, 0))
{
if (memcmp (envline, "ARENA_TEST", 10) == 0)
__libc_mallopt (M_ARENA_TEST, atoi (&envline[11]));
}
break;
case 15:
if (!__builtin_expect (__libc_enable_secure, 0))
{
if (memcmp (envline, "TRIM_THRESHOLD_", 15) == 0)
__libc_mallopt (M_TRIM_THRESHOLD, atoi (&envline[16]));
else if (memcmp (envline, "MMAP_THRESHOLD_", 15) == 0)
__libc_mallopt (M_MMAP_THRESHOLD, atoi (&envline[16]));
}
break;
default:
break;
}
}
}
if (s && s[0])
{
__libc_mallopt (M_CHECK_ACTION, (int) (s[0] - '0'));
if (check_action != 0)
__malloc_check_init ();
}
void (*hook) (void) = atomic_forced_read (__malloc_initialize_hook);
if (hook != NULL)
(*hook)();
__malloc_initialized = 1;
}
1) Update the ptmalloc initialization flag

At the very start it checks whether __malloc_initialized is greater than or equal to 0; only if it is not does any of the following run, and the first step is then to set it to 0. It is easy to see that this is a flag marking whether ptmalloc has already been initialized.

The variable is also defined in arena.c:

/* Already initialized? */
int __malloc_initialized = -1;

Its initial value is -1, guaranteeing that the body of this function runs exactly once.

2) Check the SHARED macro

Next it checks whether the SHARED macro is defined. The comment explains: if this copy of libc lives in a non-default namespace, never use the brk system call; likewise if this libc was dlopen()ed by a statically linked program.

The author has not located where this macro is defined, so we will leave it aside for now.

3) Set main_arena as the thread arena, and add atfork_mem to the fork_handler list

Next, the thread's arena is set to main_arena - that is, before ptmalloc is initialized the default is to use main_arena rather than to create a new arena.

After that comes the thread_atfork() macro, defined in sysdeps/nptl/malloc-machine.h:

#define ATFORK_MEM static struct fork_handler atfork_mem

#ifdef SHARED
# define thread_atfork(prepare, parent, child) \
atfork_mem.prepare_handler = prepare; \
atfork_mem.parent_handler = parent; \
atfork_mem.child_handler = child; \
atfork_mem.dso_handle = __dso_handle; \
atfork_mem.refcntr = 1; \
__linkin_atfork (&atfork_mem)
#else
# define thread_atfork(prepare, parent, child) \
atfork_mem.prepare_handler = prepare; \
atfork_mem.parent_handler = parent; \
atfork_mem.child_handler = child; \
atfork_mem.dso_handle = &__dso_handle == NULL ? NULL : __dso_handle; \
atfork_mem.refcntr = 1; \
__linkin_atfork (&atfork_mem)
#endif

The atfork_mem variable here is a fork_handler structure, defined in sysdeps/nptl/fork.h:

struct fork_handler
{
struct fork_handler *next;
void (*prepare_handler) (void);
void (*parent_handler) (void);
void (*child_handler) (void);
void *dso_handle;
unsigned int refcntr;
int need_signal;
};

It is easy to see that fork_handler structures are likewise chained into a singly linked list.

So the thread_atfork() macro fills in three function pointers and finally calls __linkin_atfork(), defined in nptl/register-atfork.c:

void
attribute_hidden
__linkin_atfork (struct fork_handler *newp)
{
do
newp->next = __fork_handlers;
while (catomic_compare_and_exchange_bool_acq (&__fork_handlers,
newp, newp->next) != 0);
}

So the job of __linkin_atfork() is to add atfork_mem to the fork_handler list.

The net effect of this whole sequence is that when the program later calls fork() to create a child process, the three functions pointed to by atfork_mem are invoked (prepare before the fork, then the parent/child handlers afterwards).
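
The same idea is exposed to application code as pthread_atfork(); a minimal sketch of the analogous registration (the handler names here are placeholders, not the glibc ones):

#include <pthread.h>
#include <unistd.h>

static void prepare (void) { /* e.g. take the allocator locks, like ptmalloc_lock_all      */ }
static void parent  (void) { /* release them in the parent, like ptmalloc_unlock_all       */ }
static void child   (void) { /* release/reset them in the child, like ptmalloc_unlock_all2 */ }

int main (void)
{
    pthread_atfork (prepare, parent, child);   /* registered once, fired around every fork() */

    if (fork () == 0)
        _exit (0);
    return 0;
}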

4) Handling of the environment variables via __environ

Next it checks whether the _environ variable is NULL. That variable is a weak alias of __environ, defined in posix/environ.c:

/* This must be initialized; we cannot have a weak alias into bss.  */
char **__environ = NULL;
weak_alias (__environ, environ)

/* The SVR4 ABI says `_environ' will be the name to use
in case the user overrides the weak alias `environ'. */
weak_alias (__environ, _environ)

weak_alias works just like strong_alias except that it defines a weak alias, so we won't repeat the details here.

__environ is normally non-NULL, so the branch-prediction wrapper macro __glibc_likely() is used here (its definition was covered earlier).

Next the next_env_entry() function is called, also defined in arena.c:

static char *
internal_function
next_env_entry (char ***position)
{
char **current = *position;
char *result = NULL;

while (*current != NULL)
{
if (__builtin_expect ((*current)[0] == 'M', 0)
&& (*current)[1] == 'A'
&& (*current)[2] == 'L'
&& (*current)[3] == 'L'
&& (*current)[4] == 'O'
&& (*current)[5] == 'C'
&& (*current)[6] == '_')
{
result = &(*current)[7];

/* Save current position for next visit. */
*position = ++current;

break;
}

++current;
}

return result;
}

Its job is roughly to find, in __environ, an entry that starts with "MALLOC_" and return the address just past that prefix; if none is found it returns NULL.

Usually there is no such string in __environ, which is why the __builtin_expect macro is used here.

If such an entry exists, strcspn() is called to locate the '=' character; if there is no '=', the loop goes back and calls next_env_entry() again.

The author suspected this could cause an infinite loop (though note that next_env_entry() saves its position in *position, so each call continues from the next entry).
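
As a small illustration (not glibc code) of the parsing step above: for an environment entry such as MALLOC_TOP_PAD_=131072, next_env_entry() returns a pointer just past "MALLOC_", and the strcspn()/memcmp() pair then splits name and value:

#include <stdio.h>
#include <string.h>

int main (void)
{
    const char *envline = "TOP_PAD_=131072";     /* what envline points at after "MALLOC_" */
    size_t len = strcspn (envline, "=");          /* 8 == strlen ("TOP_PAD_") */

    if (envline[len] == '=' && len == 8 && memcmp (envline, "TOP_PAD_", 8) == 0)
        printf ("M_TOP_PAD value: %s\n", &envline[len + 1]);   /* "131072" */
    return 0;
}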

The "MALLOC_"-prefixed variables the function handles are the following:

  • MALLOC_CHECK_
  • MALLOC_TOP_PAD_
  • MALLOC_PERTURB_
  • MALLOC_MMAP_MAX_
  • MALLOC_ARENA_MAX
  • MALLOC_ARENA_TEST
  • MALLOC_TRIM_THRESHOLD_
  • MALLOC_MMAP_THRESHOLD_

Note that MALLOC_CHECK_ is handled separately later in ptmalloc_init(), while the rest are processed through the __libc_mallopt() function.

That function is defined in malloc.c:

int
__libc_mallopt (int param_number, int value)
{
mstate av = &main_arena;
int res = 1;

if (__malloc_initialized < 0)
ptmalloc_init ();
(void) mutex_lock (&av->mutex);
/* Ensure initialization/consolidation */
malloc_consolidate (av);

LIBC_PROBE (memory_mallopt, 2, param_number, value);

switch (param_number)
{
case M_MXFAST:
if (value >= 0 && value <= MAX_FAST_SIZE)
{
LIBC_PROBE (memory_mallopt_mxfast, 2, value, get_max_fast ());
set_max_fast (value);
}
else
res = 0;
break;

case M_TRIM_THRESHOLD:
LIBC_PROBE (memory_mallopt_trim_threshold, 3, value,
mp_.trim_threshold, mp_.no_dyn_threshold);
mp_.trim_threshold = value;
mp_.no_dyn_threshold = 1;
break;

case M_TOP_PAD:
LIBC_PROBE (memory_mallopt_top_pad, 3, value,
mp_.top_pad, mp_.no_dyn_threshold);
mp_.top_pad = value;
mp_.no_dyn_threshold = 1;
break;

case M_MMAP_THRESHOLD:
/* Forbid setting the threshold too high. */
if ((unsigned long) value > HEAP_MAX_SIZE / 2)
res = 0;
else
{
LIBC_PROBE (memory_mallopt_mmap_threshold, 3, value,
mp_.mmap_threshold, mp_.no_dyn_threshold);
mp_.mmap_threshold = value;
mp_.no_dyn_threshold = 1;
}
break;

case M_MMAP_MAX:
LIBC_PROBE (memory_mallopt_mmap_max, 3, value,
mp_.n_mmaps_max, mp_.no_dyn_threshold);
mp_.n_mmaps_max = value;
mp_.no_dyn_threshold = 1;
break;

case M_CHECK_ACTION:
LIBC_PROBE (memory_mallopt_check_action, 2, value, check_action);
check_action = value;
break;

case M_PERTURB:
LIBC_PROBE (memory_mallopt_perturb, 2, value, perturb_byte);
perturb_byte = value;
break;

case M_ARENA_TEST:
if (value > 0)
{
LIBC_PROBE (memory_mallopt_arena_test, 2, value, mp_.arena_test);
mp_.arena_test = value;
}
break;

case M_ARENA_MAX:
if (value > 0)
{
LIBC_PROBE (memory_mallopt_arena_max, 2, value, mp_.arena_max);
mp_.arena_max = value;
}
break;
}
(void) mutex_unlock (&av->mutex);
return res;
}
libc_hidden_def (__libc_mallopt)

It first checks the value of __malloc_initialized: if it is below 0, ptmalloc_init() is called - looping right back to where we came from - but since it was already set to 0 at the start of ptmalloc_init(), that cannot happen here.

It then takes main_arena's mutex and calls malloc_consolidate() to flush the chunks out of the fastbins; we analyze that function in detail later.

After that, depending on the parameter, it assigns the given value to the corresponding member of the malloc parameters, then unlocks main_arena and returns.
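
The same code path is reachable from user programs through the public mallopt() wrapper; a minimal usage sketch (the values here are arbitrary examples):

#include <malloc.h>

int main (void)
{
    mallopt (M_MMAP_THRESHOLD, 1024 * 1024);   /* sets mp_.mmap_threshold */
    mallopt (M_MMAP_MAX, 0);                   /* sets mp_.n_mmaps_max: forbid mmap-backed chunks */
    mallopt (M_PERTURB, 0xAA);                 /* sets perturb_byte */
    return 0;
}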

For the string "MALLOC_CHECK_", __libc_mallopt() is called first (setting check_action from the digit), and afterwards the value of check_action is examined: if it is nonzero, __malloc_check_init() is called.

The author thinks this step could actually have been folded into the switch above instead of having its own if.

check_action is defined in malloc.c:

#ifndef DEFAULT_CHECK_ACTION
# define DEFAULT_CHECK_ACTION 3
#endif

static int check_action = DEFAULT_CHECK_ACTION;

This static variable defaults to 3; after __libc_mallopt() overwrites it with the value taken from MALLOC_CHECK_, __malloc_check_init() is called whenever the resulting check_action is nonzero (i.e. for any MALLOC_CHECK_ value other than 0).

That function is defined in malloc/hook.c:

static int disallow_malloc_check;
/* Activate a standard set of debugging hooks. */
void
__malloc_check_init (void)
{
if (disallow_malloc_check)
{
disallow_malloc_check = 0;
return;
}
using_malloc_checking = 1;
__malloc_hook = malloc_check;
__free_hook = free_check;
__realloc_hook = realloc_check;
__memalign_hook = memalign_check;
}

If disallow_malloc_check is nonzero, it is reset to 0 and the function returns immediately; otherwise using_malloc_checking is set to 1 and the four hooks are each pointed at their corresponding *_check functions.

5) Check the __malloc_initialize_hook hook

Finally, it checks whether the __malloc_initialize_hook hook is NULL; if not, the function it points to is called.

This hook is defined in malloc.c:

void weak_variable (*__malloc_initialize_hook) (void) = NULL;

So initially this hook is NULL.

Finally, __malloc_initialized is set to 1 and the function is done.

2. Acquire an arena

Next, the arena_get() macro is used to try to obtain an arena. Its implementation was covered in detail earlier, so we won't repeat it here.

Since ptmalloc_init() already set thread_arena to main_arena, the first malloc call simply locks main_arena and uses it.

Threads that call malloc later will first check whether free_list has an idle arena available; if not, they try to create a new arena in the Memory Mapping Segment, and if that fails they have no choice but to share an arena with other threads.

⭐ 3. Call _int_malloc() to allocate memory

__libc_malloc() then calls _int_malloc() to actually perform the allocation. In other words, the core allocation routine is really _int_malloc(); __libc_malloc() itself is just a wrapper around it that adds some heap-manager housekeeping. We analyze _int_malloc() in detail later.

4. Check the result; if it is NULL, obtain another arena and call _int_malloc() again

If _int_malloc() fails to obtain a suitable chunk, it returns NULL to its caller, so the result is checked here.

Since ptmalloc considers that the failure may be because the arena could not provide the requested space, it first calls arena_get_retry() to try to obtain another arena.

① arena_get_retry(): obtain another arena

This function is defined in arena.c:

/* If we don't have the main arena, then maybe the failure is due to running
out of mmapped areas, so we can try allocating on the main arena.
Otherwise, it is likely that sbrk() has failed and there is still a chance
to mmap(), so try one of the other arenas. */
static mstate
arena_get_retry (mstate ar_ptr, size_t bytes)
{
LIBC_PROBE (memory_arena_retry, 2, bytes, ar_ptr);
if (ar_ptr != &main_arena)
{
(void) mutex_unlock (&ar_ptr->mutex);
/* Don't touch the main arena if it is corrupt. */
if (arena_is_corrupt (&main_arena))
return NULL;

ar_ptr = &main_arena;
(void) mutex_lock (&ar_ptr->mutex);
}
else
{
(void) mutex_unlock (&ar_ptr->mutex);
ar_ptr = arena_get2 (bytes, ar_ptr);
}

return ar_ptr;
}

The comment describes two possible situations:

  • If the arena we were using is not main_arena, the failure probably means the mmapped region (the Memory Mapping Segment) has been exhausted, so we fall back to main_arena and allocate there (via (s)brk)
  • If the arena we were using is main_arena, then (s)brk has probably failed (the heap segment is exhausted), but there may still be room in the mmapped regions, so ptmalloc tries one of the other arenas

The flow therefore branches as follows:

1) The original arena is not main_arena: unlock it and take main_arena

First the originally used arena is unlocked; then main_arena is checked for corruption. If main_arena has been corrupted, NULL is returned directly; otherwise main_arena is locked.

2) The original arena is main_arena: unlock it and call arena_get2() for a different arena

If the original arena was main_arena, arena_get2() is called to obtain an arena; main_arena is passed in as the avoid_arena argument, so we cannot end up with main_arena again.

Finally the obtained arena (possibly NULL) is returned and the function is done.

② Call _int_malloc() again to allocate memory

Perhaps the only thing worth noting here is that what _int_malloc() returns is a mem pointer, not a chunk pointer, so it can be handed straight back to the user.
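
For reference, the relation between the two pointers is just an offset of one chunk header; a sketch consistent with the glibc macros:

#define SIZE_SZ        (sizeof (size_t))
/* mem points 2 * SIZE_SZ bytes past the start of the chunk header */
#define chunk2mem(p)   ((void *) ((char *) (p) + 2 * SIZE_SZ))
#define mem2chunk(mem) ((void *) ((char *) (mem) - 2 * SIZE_SZ))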

5. Verify the result and return the block

The assert roughly checks the following three points:

  • whether the obtained chunk pointer is NULL (nothing was allocated)
  • whether the chunk was obtained via mmap
  • (if the chunk was not obtained via mmap) whether the arena the chunk belongs to is the arena we just used

The last point uses the arena_for_chunk() macro, which was analyzed earlier and won't be repeated here; in short, it returns the arena a chunk belongs to.

If none of the three conditions holds, something probably went wrong during allocation: the assert fires, abort() terminates the process, and we get a core dump.

If any of the checks passes, the obtained block is returned to the calling user and the whole malloc journey is over.

libc_hidden_def: hiding the function definition

Like libc_hidden_proto, this is defined in libc-symbols.h:

#if IS_IN (libc)
# define libc_hidden_proto(name, attrs...) hidden_proto (name, ##attrs)
# define libc_hidden_tls_proto(name, attrs...) hidden_tls_proto (name, ##attrs)
# define libc_hidden_def(name) hidden_def (name)
...
#else
# define libc_hidden_proto(name, attrs...)
# define libc_hidden_tls_proto(name, attrs...)
# define libc_hidden_def(name)
...
#endif

Again a nested definition; following it to hidden_def:

#if defined SHARED && !defined NO_HIDDEN
# ifndef __ASSEMBLER__
...
# define hidden_def(name) __hidden_ver1(__GI_##name, name, name);
...

# else
/* For assembly, we need to do the opposite of what we do in C:
in assembly gcc __REDIRECT stuff is not in place, so functions
are defined by its normal name and we need to create the
__GI_* alias to it, in C __REDIRECT causes the function definition
to use __GI_* name and we need to add alias to the real name.
There is no reason to use hidden_weak over hidden_def in assembly,
but we provide it for consistency with the C usage.
hidden_proto doesn't make sense for assembly but the equivalent
is to call via the HIDDEN_JUMPTARGET macro instead of JUMPTARGET. */
# define hidden_def(name) strong_alias (name, __GI_##name)
...
#else
# ifndef __ASSEMBLER__
# define hidden_proto(name, attrs...)
# define hidden_tls_proto(name, attrs...)
# else
# define HIDDEN_JUMPTARGET(name) JUMPTARGET(name)
# endif /* Not __ASSEMBLER__ */
# define hidden_weak(name)
# define hidden_def(name)
...
#endif

As one might expect, libc_hidden_def() works just like libc_hidden_proto earlier, except that what it hides is the function definition, so we won't expand on it here.

0x02.malloc_consolidate

Since this is a very important function in ptmalloc, and one frequently leveraged in pwn challenges, let's analyze its code first.

It is defined in malloc.c and is used to flush the chunks out of the fastbins and merge adjacent free chunks:

/*
------------------------- malloc_consolidate -------------------------

malloc_consolidate is a specialized version of free() that tears
down chunks held in fastbins. Free itself cannot be used for this
purpose since, among other things, it might place chunks back onto
fastbins. So, instead, we need to use a minor variant of the same
code.

Also, because this routine needs to be called the first time through
malloc anyway, it turns out to be the perfect place to trigger
initialization code.
*/

static void malloc_consolidate(mstate av)
{
mfastbinptr* fb; /* current fastbin being consolidated */
mfastbinptr* maxfb; /* last fastbin (for loop control) */
mchunkptr p; /* current chunk being consolidated */
mchunkptr nextp; /* next chunk to consolidate */
mchunkptr unsorted_bin; /* bin header */
mchunkptr first_unsorted; /* chunk to link to */

/* These have same use as in free() */
mchunkptr nextchunk;
INTERNAL_SIZE_T size;
INTERNAL_SIZE_T nextsize;
INTERNAL_SIZE_T prevsize;
int nextinuse;
mchunkptr bck;
mchunkptr fwd;

/*
If max_fast is 0, we know that av hasn't
yet been initialized, in which case do so below
*/

if (get_max_fast () != 0) {
clear_fastchunks(av);

unsorted_bin = unsorted_chunks(av);

/*
Remove each chunk from fast bin and consolidate it, placing it
then in unsorted bin. Among other reasons for doing this,
placing in unsorted bin avoids needing to calculate actual bins
until malloc is sure that chunks aren't immediately going to be
reused anyway.
*/

maxfb = &fastbin (av, NFASTBINS - 1);
fb = &fastbin (av, 0);
do {
p = atomic_exchange_acq (fb, 0);
if (p != 0) {
do {
check_inuse_chunk(av, p);
nextp = p->fd;

/* Slightly streamlined version of consolidation code in free() */
size = p->size & ~(PREV_INUSE|NON_MAIN_ARENA);
nextchunk = chunk_at_offset(p, size);
nextsize = chunksize(nextchunk);

if (!prev_inuse(p)) {
prevsize = p->prev_size;
size += prevsize;
p = chunk_at_offset(p, -((long) prevsize));
unlink(av, p, bck, fwd);
}

if (nextchunk != av->top) {
nextinuse = inuse_bit_at_offset(nextchunk, nextsize);

if (!nextinuse) {
size += nextsize;
unlink(av, nextchunk, bck, fwd);
} else
clear_inuse_bit_at_offset(nextchunk, 0);

first_unsorted = unsorted_bin->fd;
unsorted_bin->fd = p;
first_unsorted->bk = p;

if (!in_smallbin_range (size)) {
p->fd_nextsize = NULL;
p->bk_nextsize = NULL;
}

set_head(p, size | PREV_INUSE);
p->bk = unsorted_bin;
p->fd = first_unsorted;
set_foot(p, size);
}

else {
size += nextsize;
set_head(p, size | PREV_INUSE);
av->top = p;
}

} while ( (p = nextp) != 0);

}
} while (fb++ != maxfb);
}
else {
malloc_init_state(av);
check_malloc_state(av);
}
}

It first checks global_max_fast:

  • If it is nonzero, the arena's FASTCHUNKS_BIT flag is set (via clear_fastchunks(), marking "no fast chunks"), and the function will now flush the chunks out of this arena's fastbins
  • If it is zero, this thread's arena has not been initialized yet, so malloc_init_state() is called to initialize it, then the check_malloc_state() macro runs and the function is done; outside of DEBUG builds (when the MALLOC_DEBUG macro is undefined) that macro is empty

In the already-initialized case, pointers to the first and last elements of the arena's fastbinsY array are then obtained, and the array is walked with two nested loops:

Outer loop: iterate over the fastbinsY array

Pulled out on its own, the loop nest looks like this:

do {
p = atomic_exchange_acq (fb, 0);
if (p != 0)
{
do {
// inner loop there
} while ( (p = nextp) != 0);
}
} while (fb++ != maxfb);

It is not hard to see that the outer loop's job is to take the elements of fastbinsY one by one and hand them to the inner loop for further processing.

The macro atomic_exchange_acq() is used here: it atomically stores a new value and returns the old one. Similar macros were analyzed earlier, so we won't repeat that here.

Although we mentioned earlier that indices 7 and beyond of fastbinsY are normally unused, the loop here still walks over them.

Inner loop: walk the fastbin list and merge free chunks

The inner loop starts from the list head obtained in the outer loop and walks the fastbin list, merging free chunks to reduce memory fragmentation.

If the obtained head node is NULL, this iteration is skipped; otherwise we proceed with the steps below.

There is a check_inuse_chunk() macro here, which in non-DEBUG builds (MALLOC_DEBUG undefined) is likewise empty.

First the size of the current chunk is fetched, with the PREV_INUSE and NON_MAIN_ARENA flag bits cleared; that is so the location of the adjacent higher-address chunk can be computed accurately during the merging that follows.

Many blog posts about malloc_consolidate() speak of "forward consolidation" and "backward consolidation", but which direction is forward and which is backward leaves the author in the dark (the author didn't spot forward/backward in the comments either...).

Below we follow the usual convention: merging toward higher addresses is called "forward consolidation", and merging toward lower addresses "backward consolidation".

① Backward consolidation: unlink the adjacent lower-address chunk (if it is already free)

First the chunk's PREV_INUSE flag is checked. If it is not 1, the adjacent lower-address chunk must be a free chunk sitting in the bins: when a chunk is in use or sits in a fastbin, the PREV_INUSE flag of its higher-address neighbor is never cleared - only when a chunk goes into the bins is that flag cleared. We analyze this mechanism in detail with _int_free() later.

In that case, the prev_size field of the current chunk is added to the size variable computed above, and the chunk pointer is moved back so that it points at that free chunk.


Finally the unlink macro takes that chunk out of its bin, and the backward consolidation is complete.


Note that the size of that free chunk is computed here from the current chunk's prev_size field, rather than read directly from the size field of the lower-address free chunk itself.

② Forward consolidation: merge into the top chunk if adjacent to it, otherwise try to merge and insert into the unsorted bin

1) nextchunk is the top chunk

First, nextchunk is checked against the top chunk. If they are the same, the flow is much simpler: nextchunk's size is added to this chunk's size, the PREV_INUSE flag is set, the arena's top chunk is set to this chunk, and we move on to the next step.


2) nextchunk is not the top chunk

If nextchunk is not the top chunk, the PREV_INUSE flag of the chunk physically adjacent to nextchunk at the higher address is checked first (i.e. whether nextchunk itself is in use):

  • If it is nonzero, nextchunk's own PREV_INUSE bit is cleared (recording that the chunk before it is now free)


  • If it is zero, nextchunk's size is added in and nextchunk is taken out of its bin with unlink


Next, the (merged) chunk is placed into the unsorted bin; note that it is inserted at the head, becoming the first node of the unsorted bin.


Then the final size of the chunk is checked: if it is not within the smallbin range, its fd_nextsize and bk_nextsize fields are set to NULL.

Finally, the chunk's PREV_INUSE bit is set to 1, and the prev_size field of the physically adjacent next chunk is set to this chunk's size (via set_head()/set_foot()).


③ Check the next chunk in the list; if it is not NULL, start the next iteration

Even though the chunk pointer may have been modified during merging, its fd was saved beforehand (nextp); if that is not NULL, the next iteration starts, so the walk over the list can continue.

0x03._int_malloc

This function is the core of ptmalloc's allocation machinery. It is defined in malloc.c:

/*
------------------------------ malloc ------------------------------
*/

static void *
_int_malloc (mstate av, size_t bytes)
{
INTERNAL_SIZE_T nb; /* normalized request size */
unsigned int idx; /* associated bin index */
mbinptr bin; /* associated bin */

mchunkptr victim; /* inspected/selected chunk */
INTERNAL_SIZE_T size; /* its size */
int victim_index; /* its bin index */

mchunkptr remainder; /* remainder from a split */
unsigned long remainder_size; /* its size */

unsigned int block; /* bit map traverser */
unsigned int bit; /* bit map traverser */
unsigned int map; /* current word of binmap */

mchunkptr fwd; /* misc temp for linking */
mchunkptr bck; /* misc temp for linking */

const char *errstr = NULL;

/*
Convert request size to internal form by adding SIZE_SZ bytes
overhead plus possibly more to obtain necessary alignment and/or
to obtain a size of at least MINSIZE, the smallest allocatable
size. Also, checked_request2size traps (returning 0) request sizes
that are so large that they wrap around zero when padded and
aligned.
*/

checked_request2size (bytes, nb);

/* There are no usable arenas. Fall back to sysmalloc to get a chunk from
mmap. */
if (__glibc_unlikely (av == NULL))
{
void *p = sysmalloc (nb, av);
if (p != NULL)
alloc_perturb (p, bytes);
return p;
}

/*
If the size qualifies as a fastbin, first check corresponding bin.
This code is safe to execute even if av is not yet initialized, so we
can try it without checking, which saves some time on this fast path.
*/

if ((unsigned long) (nb) <= (unsigned long) (get_max_fast ()))
{
idx = fastbin_index (nb);
mfastbinptr *fb = &fastbin (av, idx);
mchunkptr pp = *fb;
do
{
victim = pp;
if (victim == NULL)
break;
}
while ((pp = catomic_compare_and_exchange_val_acq (fb, victim->fd, victim))
!= victim);
if (victim != 0)
{
if (__builtin_expect (fastbin_index (chunksize (victim)) != idx, 0))
{
errstr = "malloc(): memory corruption (fast)";
errout:
malloc_printerr (check_action, errstr, chunk2mem (victim), av);
return NULL;
}
check_remalloced_chunk (av, victim, nb);
void *p = chunk2mem (victim);
alloc_perturb (p, bytes);
return p;
}
}

/*
If a small request, check regular bin. Since these "smallbins"
hold one size each, no searching within bins is necessary.
(For a large request, we need to wait until unsorted chunks are
processed to find best fit. But for small ones, fits are exact
anyway, so we can check now, which is faster.)
*/

if (in_smallbin_range (nb))
{
idx = smallbin_index (nb);
bin = bin_at (av, idx);

if ((victim = last (bin)) != bin)
{
if (victim == 0) /* initialization check */
malloc_consolidate (av);
else
{
bck = victim->bk;
if (__glibc_unlikely (bck->fd != victim))
{
errstr = "malloc(): smallbin double linked list corrupted";
goto errout;
}
set_inuse_bit_at_offset (victim, nb);
bin->bk = bck;
bck->fd = bin;

if (av != &main_arena)
victim->size |= NON_MAIN_ARENA;
check_malloced_chunk (av, victim, nb);
void *p = chunk2mem (victim);
alloc_perturb (p, bytes);
return p;
}
}
}

/*
If this is a large request, consolidate fastbins before continuing.
While it might look excessive to kill all fastbins before
even seeing if there is space available, this avoids
fragmentation problems normally associated with fastbins.
Also, in practice, programs tend to have runs of either small or
large requests, but less often mixtures, so consolidation is not
invoked all that often in most programs. And the programs that
it is called frequently in otherwise tend to fragment.
*/

else
{
idx = largebin_index (nb);
if (have_fastchunks (av))
malloc_consolidate (av);
}

/*
Process recently freed or remaindered chunks, taking one only if
it is exact fit, or, if this a small request, the chunk is remainder from
the most recent non-exact fit. Place other traversed chunks in
bins. Note that this step is the only place in any routine where
chunks are placed in bins.

The outer loop here is needed because we might not realize until
near the end of malloc that we should have consolidated, so must
do so and retry. This happens at most once, and only when we would
otherwise need to expand memory to service a "small" request.
*/

for (;; )
{
int iters = 0;
while ((victim = unsorted_chunks (av)->bk) != unsorted_chunks (av))
{
bck = victim->bk;
if (__builtin_expect (victim->size <= 2 * SIZE_SZ, 0)
|| __builtin_expect (victim->size > av->system_mem, 0))
malloc_printerr (check_action, "malloc(): memory corruption",
chunk2mem (victim), av);
size = chunksize (victim);

/*
If a small request, try to use last remainder if it is the
only chunk in unsorted bin. This helps promote locality for
runs of consecutive small requests. This is the only
exception to best-fit, and applies only when there is
no exact fit for a small chunk.
*/

if (in_smallbin_range (nb) &&
bck == unsorted_chunks (av) &&
victim == av->last_remainder &&
(unsigned long) (size) > (unsigned long) (nb + MINSIZE))
{
/* split and reattach remainder */
remainder_size = size - nb;
remainder = chunk_at_offset (victim, nb);
unsorted_chunks (av)->bk = unsorted_chunks (av)->fd = remainder;
av->last_remainder = remainder;
remainder->bk = remainder->fd = unsorted_chunks (av);
if (!in_smallbin_range (remainder_size))
{
remainder->fd_nextsize = NULL;
remainder->bk_nextsize = NULL;
}

set_head (victim, nb | PREV_INUSE |
(av != &main_arena ? NON_MAIN_ARENA : 0));
set_head (remainder, remainder_size | PREV_INUSE);
set_foot (remainder, remainder_size);

check_malloced_chunk (av, victim, nb);
void *p = chunk2mem (victim);
alloc_perturb (p, bytes);
return p;
}

/* remove from unsorted list */
unsorted_chunks (av)->bk = bck;
bck->fd = unsorted_chunks (av);

/* Take now instead of binning if exact fit */

if (size == nb)
{
set_inuse_bit_at_offset (victim, size);
if (av != &main_arena)
victim->size |= NON_MAIN_ARENA;
check_malloced_chunk (av, victim, nb);
void *p = chunk2mem (victim);
alloc_perturb (p, bytes);
return p;
}

/* place chunk in bin */

if (in_smallbin_range (size))
{
victim_index = smallbin_index (size);
bck = bin_at (av, victim_index);
fwd = bck->fd;
}
else
{
victim_index = largebin_index (size);
bck = bin_at (av, victim_index);
fwd = bck->fd;

/* maintain large bins in sorted order */
if (fwd != bck)
{
/* Or with inuse bit to speed comparisons */
size |= PREV_INUSE;
/* if smaller than smallest, bypass loop below */
assert ((bck->bk->size & NON_MAIN_ARENA) == 0);
if ((unsigned long) (size) < (unsigned long) (bck->bk->size))
{
fwd = bck;
bck = bck->bk;

victim->fd_nextsize = fwd->fd;
victim->bk_nextsize = fwd->fd->bk_nextsize;
fwd->fd->bk_nextsize = victim->bk_nextsize->fd_nextsize = victim;
}
else
{
assert ((fwd->size & NON_MAIN_ARENA) == 0);
while ((unsigned long) size < fwd->size)
{
fwd = fwd->fd_nextsize;
assert ((fwd->size & NON_MAIN_ARENA) == 0);
}

if ((unsigned long) size == (unsigned long) fwd->size)
/* Always insert in the second position. */
fwd = fwd->fd;
else
{
victim->fd_nextsize = fwd;
victim->bk_nextsize = fwd->bk_nextsize;
fwd->bk_nextsize = victim;
victim->bk_nextsize->fd_nextsize = victim;
}
bck = fwd->bk;
}
}
else
victim->fd_nextsize = victim->bk_nextsize = victim;
}

mark_bin (av, victim_index);
victim->bk = bck;
victim->fd = fwd;
fwd->bk = victim;
bck->fd = victim;

#define MAX_ITERS 10000
if (++iters >= MAX_ITERS)
break;
}

/*
If a large request, scan through the chunks of current bin in
sorted order to find smallest that fits. Use the skip list for this.
*/

if (!in_smallbin_range (nb))
{
bin = bin_at (av, idx);

/* skip scan if empty or largest chunk is too small */
if ((victim = first (bin)) != bin &&
(unsigned long) (victim->size) >= (unsigned long) (nb))
{
victim = victim->bk_nextsize;
while (((unsigned long) (size = chunksize (victim)) <
(unsigned long) (nb)))
victim = victim->bk_nextsize;

/* Avoid removing the first entry for a size so that the skip
list does not have to be rerouted. */
if (victim != last (bin) && victim->size == victim->fd->size)
victim = victim->fd;

remainder_size = size - nb;
unlink (av, victim, bck, fwd);

/* Exhaust */
if (remainder_size < MINSIZE)
{
set_inuse_bit_at_offset (victim, size);
if (av != &main_arena)
victim->size |= NON_MAIN_ARENA;
}
/* Split */
else
{
remainder = chunk_at_offset (victim, nb);
/* We cannot assume the unsorted list is empty and therefore
have to perform a complete insert here. */
bck = unsorted_chunks (av);
fwd = bck->fd;
if (__glibc_unlikely (fwd->bk != bck))
{
errstr = "malloc(): corrupted unsorted chunks";
goto errout;
}
remainder->bk = bck;
remainder->fd = fwd;
bck->fd = remainder;
fwd->bk = remainder;
if (!in_smallbin_range (remainder_size))
{
remainder->fd_nextsize = NULL;
remainder->bk_nextsize = NULL;
}
set_head (victim, nb | PREV_INUSE |
(av != &main_arena ? NON_MAIN_ARENA : 0));
set_head (remainder, remainder_size | PREV_INUSE);
set_foot (remainder, remainder_size);
}
check_malloced_chunk (av, victim, nb);
void *p = chunk2mem (victim);
alloc_perturb (p, bytes);
return p;
}
}

/*
Search for a chunk by scanning bins, starting with next largest
bin. This search is strictly by best-fit; i.e., the smallest
(with ties going to approximately the least recently used) chunk
that fits is selected.

The bitmap avoids needing to check that most blocks are nonempty.
The particular case of skipping all bins during warm-up phases
when no chunks have been returned yet is faster than it might look.
*/

++idx;
bin = bin_at (av, idx);
block = idx2block (idx);
map = av->binmap[block];
bit = idx2bit (idx);

for (;; )
{
/* Skip rest of block if there are no more set bits in this block. */
if (bit > map || bit == 0)
{
do
{
if (++block >= BINMAPSIZE) /* out of bins */
goto use_top;
}
while ((map = av->binmap[block]) == 0);

bin = bin_at (av, (block << BINMAPSHIFT));
bit = 1;
}

/* Advance to bin with set bit. There must be one. */
while ((bit & map) == 0)
{
bin = next_bin (bin);
bit <<= 1;
assert (bit != 0);
}

/* Inspect the bin. It is likely to be non-empty */
victim = last (bin);

/* If a false alarm (empty bin), clear the bit. */
if (victim == bin)
{
av->binmap[block] = map &= ~bit; /* Write through */
bin = next_bin (bin);
bit <<= 1;
}

else
{
size = chunksize (victim);

/* We know the first chunk in this bin is big enough to use. */
assert ((unsigned long) (size) >= (unsigned long) (nb));

remainder_size = size - nb;

/* unlink */
unlink (av, victim, bck, fwd);

/* Exhaust */
if (remainder_size < MINSIZE)
{
set_inuse_bit_at_offset (victim, size);
if (av != &main_arena)
victim->size |= NON_MAIN_ARENA;
}

/* Split */
else
{
remainder = chunk_at_offset (victim, nb);

/* We cannot assume the unsorted list is empty and therefore
have to perform a complete insert here. */
bck = unsorted_chunks (av);
fwd = bck->fd;
if (__glibc_unlikely (fwd->bk != bck))
{
errstr = "malloc(): corrupted unsorted chunks 2";
goto errout;
}
remainder->bk = bck;
remainder->fd = fwd;
bck->fd = remainder;
fwd->bk = remainder;

/* advertise as last remainder */
if (in_smallbin_range (nb))
av->last_remainder = remainder;
if (!in_smallbin_range (remainder_size))
{
remainder->fd_nextsize = NULL;
remainder->bk_nextsize = NULL;
}
set_head (victim, nb | PREV_INUSE |
(av != &main_arena ? NON_MAIN_ARENA : 0));
set_head (remainder, remainder_size | PREV_INUSE);
set_foot (remainder, remainder_size);
}
check_malloced_chunk (av, victim, nb);
void *p = chunk2mem (victim);
alloc_perturb (p, bytes);
return p;
}
}

use_top:
/*
If large enough, split off the chunk bordering the end of memory
(held in av->top). Note that this is in accord with the best-fit
search rule. In effect, av->top is treated as larger (and thus
less well fitting) than any other available chunk since it can
be extended to be as large as necessary (up to system
limitations).

We require that av->top always exists (i.e., has size >=
MINSIZE) after initialization, so if it would otherwise be
exhausted by current request, it is replenished. (The main
reason for ensuring it exists is that we may need MINSIZE space
to put in fenceposts in sysmalloc.)
*/

victim = av->top;
size = chunksize (victim);

if ((unsigned long) (size) >= (unsigned long) (nb + MINSIZE))
{
remainder_size = size - nb;
remainder = chunk_at_offset (victim, nb);
av->top = remainder;
set_head (victim, nb | PREV_INUSE |
(av != &main_arena ? NON_MAIN_ARENA : 0));
set_head (remainder, remainder_size | PREV_INUSE);

check_malloced_chunk (av, victim, nb);
void *p = chunk2mem (victim);
alloc_perturb (p, bytes);
return p;
}

/* When we are using atomic ops to free fast chunks we can get
here for all block sizes. */
else if (have_fastchunks (av))
{
malloc_consolidate (av);
/* restore original bin index */
if (in_smallbin_range (nb))
idx = smallbin_index (nb);
else
idx = largebin_index (nb);
}

/*
Otherwise, relay to handle system-dependent cases
*/
else
{
void *p = sysmalloc (nb, av);
if (p != NULL)
alloc_perturb (p, bytes);
return p;
}
}
}

At the very beginning the function declares a number of variables used by the allocation logic that follows; their purposes are given in the comments, so we won't repeat them here.

I. Convert the request size to the normalized size

At the start of _int_malloc(), the size requested by the user is converted to the corresponding size aligned to MALLOC_ALIGNMENT. The macro checked_request2size() is used here, which computes the value via the request2size() macro: roughly, the requested size is first increased by the chunk header overhead and then rounded up to the alignment (with a floor of MINSIZE).


See 0x00.II.1 for details.
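
A small standalone sketch reproducing that arithmetic with the usual 64-bit defaults (SIZE_SZ = 8, MALLOC_ALIGNMENT = 16, MINSIZE = 32 are assumed here, not taken from a particular build):

#include <stdio.h>

#define SIZE_SZ            8UL
#define MALLOC_ALIGN_MASK  15UL
#define MINSIZE            32UL

static unsigned long request2size (unsigned long req)
{
    return (req + SIZE_SZ + MALLOC_ALIGN_MASK < MINSIZE)
           ? MINSIZE
           : (req + SIZE_SZ + MALLOC_ALIGN_MASK) & ~MALLOC_ALIGN_MASK;
}

int main (void)
{
    unsigned long reqs[] = { 0x0, 0x18, 0x19, 0x28, 0x38 };
    for (int i = 0; i < 5; i++)                 /* prints 0x20 0x20 0x30 0x30 0x40 */
        printf ("malloc(%#lx) -> chunk size %#lx\n", reqs[i], request2size (reqs[i]));
    return 0;
}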

II. Check whether the arena is usable

As described above, obtaining a usable arena is done in __libc_malloc(); all _int_malloc() has to do is use the arena passed in by its caller. However, __libc_malloc() may have failed to obtain a usable arena (see 0x02.2 & 0x02.4).

So the arena passed in is checked first: if it is NULL, the existing arenas cannot satisfy the request, and sysmalloc() is called directly to allocate memory via the mmap system call, with the result returned straight to the caller.

If sysmalloc returns a non-NULL result, alloc_perturb() is called to pre-process the obtained block. That function is defined in malloc.c:

/* ------------------ Testing support ----------------------------------*/

static int perturb_byte;

static void
alloc_perturb (char *p, size_t n)
{
if (__glibc_unlikely (perturb_byte))
memset (p, perturb_byte ^ 0xff, n);
}

If the static global perturb_byte is nonzero, the first n bytes of the block (the user-requested size) are filled with the value perturb_byte ^ 0xff.

The XOR looks odd at first; presumably it simply makes the fill pattern written on allocation differ from the one written on free (free_perturb() fills with perturb_byte itself), which helps catch use of uninitialized or freed memory.

III. Requests in the fastbin size range

If the required chunk size is within the fastbin range (<= 0x40 on 32-bit, <= 0x80 on 64-bit; see 0x00.IV), the following happens:

1. Try to take a chunk from the fastbin at the corresponding index

2. (If a chunk was obtained) perform the chunk-size sanity check (see the sketch after this list)
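
As a concrete illustration of steps 1 and 2 (the macro matches malloc.c; the 64-bit values are assumed):

#include <stdio.h>

#define SIZE_SZ 8
/* offset 2 to use otherwise unindexable first 2 bins */
#define fastbin_index(sz) \
  ((((unsigned int) (sz)) >> (SIZE_SZ == 8 ? 4 : 3)) - 2)

int main (void)
{
    /* on 64-bit: chunk size 0x20 -> index 0, 0x30 -> 1, ..., 0x80 -> 6 */
    for (unsigned long sz = 0x20; sz <= 0x80; sz += 0x10)
        printf ("chunk size %#lx -> fastbin index %u\n", sz, fastbin_index (sz));

    /* After popping a chunk from fastbinsY[idx], _int_malloc() recomputes
       fastbin_index (chunksize (victim)) and compares it with idx; a mismatch
       (e.g. a faked size field in a fastbin-attack scenario) triggers
       "malloc(): memory corruption (fast)". */
    return 0;
}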

0x04.sysmalloc

Defined in malloc.c, this is the routine that actually goes through the system-call interfaces to obtain memory:

/* ----------- Routines dealing with system allocation -------------- */

/*
sysmalloc handles malloc cases requiring more memory from the system.
On entry, it is assumed that av->top does not have enough
space to service request for nb bytes, thus requiring that av->top
be extended or replaced.
*/

static void *
sysmalloc (INTERNAL_SIZE_T nb, mstate av)
{
mchunkptr old_top; /* incoming value of av->top */
INTERNAL_SIZE_T old_size; /* its size */
char *old_end; /* its end address */

long size; /* arg to first MORECORE or mmap call */
char *brk; /* return value from MORECORE */

long correction; /* arg to 2nd MORECORE call */
char *snd_brk; /* 2nd return val */

INTERNAL_SIZE_T front_misalign; /* unusable bytes at front of new space */
INTERNAL_SIZE_T end_misalign; /* partial page left at end of new space */
char *aligned_brk; /* aligned offset into brk */

mchunkptr p; /* the allocated/returned chunk */
mchunkptr remainder; /* remainder from allocation */
unsigned long remainder_size; /* its size */


size_t pagesize = GLRO (dl_pagesize);
bool tried_mmap = false;


/*
If have mmap, and the request size meets the mmap threshold, and
the system supports mmap, and there are few enough currently
allocated mmapped regions, try to directly map this request
rather than expanding top.
*/

if (av == NULL
|| ((unsigned long) (nb) >= (unsigned long) (mp_.mmap_threshold)
&& (mp_.n_mmaps < mp_.n_mmaps_max)))
{
char *mm; /* return value from mmap call*/

try_mmap:
/*
Round up size to nearest page. For mmapped chunks, the overhead
is one SIZE_SZ unit larger than for normal chunks, because there
is no following chunk whose prev_size field could be used.

See the front_misalign handling below, for glibc there is no
need for further alignments unless we have have high alignment.
*/
if (MALLOC_ALIGNMENT == 2 * SIZE_SZ)
size = ALIGN_UP (nb + SIZE_SZ, pagesize);
else
size = ALIGN_UP (nb + SIZE_SZ + MALLOC_ALIGN_MASK, pagesize);
tried_mmap = true;

/* Don't try if size wraps around 0 */
if ((unsigned long) (size) > (unsigned long) (nb))
{
mm = (char *) (MMAP (0, size, PROT_READ | PROT_WRITE, 0));

if (mm != MAP_FAILED)
{
/*
The offset to the start of the mmapped region is stored
in the prev_size field of the chunk. This allows us to adjust
returned start address to meet alignment requirements here
and in memalign(), and still be able to compute proper
address argument for later munmap in free() and realloc().
*/

if (MALLOC_ALIGNMENT == 2 * SIZE_SZ)
{
/* For glibc, chunk2mem increases the address by 2*SIZE_SZ and
MALLOC_ALIGN_MASK is 2*SIZE_SZ-1. Each mmap'ed area is page
aligned and therefore definitely MALLOC_ALIGN_MASK-aligned. */
assert (((INTERNAL_SIZE_T) chunk2mem (mm) & MALLOC_ALIGN_MASK) == 0);
front_misalign = 0;
}
else
front_misalign = (INTERNAL_SIZE_T) chunk2mem (mm) & MALLOC_ALIGN_MASK;
if (front_misalign > 0)
{
correction = MALLOC_ALIGNMENT - front_misalign;
p = (mchunkptr) (mm + correction);
p->prev_size = correction;
set_head (p, (size - correction) | IS_MMAPPED);
}
else
{
p = (mchunkptr) mm;
set_head (p, size | IS_MMAPPED);
}

/* update statistics */

int new = atomic_exchange_and_add (&mp_.n_mmaps, 1) + 1;
atomic_max (&mp_.max_n_mmaps, new);

unsigned long sum;
sum = atomic_exchange_and_add (&mp_.mmapped_mem, size) + size;
atomic_max (&mp_.max_mmapped_mem, sum);

check_chunk (av, p);

return chunk2mem (p);
}
}
}

/* There are no usable arenas and mmap also failed. */
if (av == NULL)
return 0;

/* Record incoming configuration of top */

old_top = av->top;
old_size = chunksize (old_top);
old_end = (char *) (chunk_at_offset (old_top, old_size));

brk = snd_brk = (char *) (MORECORE_FAILURE);

/*
If not the first time through, we require old_size to be
at least MINSIZE and to have prev_inuse set.
*/

assert ((old_top == initial_top (av) && old_size == 0) ||
((unsigned long) (old_size) >= MINSIZE &&
prev_inuse (old_top) &&
((unsigned long) old_end & (pagesize - 1)) == 0));

/* Precondition: not enough current space to satisfy nb request */
assert ((unsigned long) (old_size) < (unsigned long) (nb + MINSIZE));


if (av != &main_arena)
{
heap_info *old_heap, *heap;
size_t old_heap_size;

/* First try to extend the current heap. */
old_heap = heap_for_ptr (old_top);
old_heap_size = old_heap->size;
if ((long) (MINSIZE + nb - old_size) > 0
&& grow_heap (old_heap, MINSIZE + nb - old_size) == 0)
{
av->system_mem += old_heap->size - old_heap_size;
arena_mem += old_heap->size - old_heap_size;
set_head (old_top, (((char *) old_heap + old_heap->size) - (char *) old_top)
| PREV_INUSE);
}
else if ((heap = new_heap (nb + (MINSIZE + sizeof (*heap)), mp_.top_pad)))
{
/* Use a newly allocated heap. */
heap->ar_ptr = av;
heap->prev = old_heap;
av->system_mem += heap->size;
arena_mem += heap->size;
/* Set up the new top. */
top (av) = chunk_at_offset (heap, sizeof (*heap));
set_head (top (av), (heap->size - sizeof (*heap)) | PREV_INUSE);

/* Setup fencepost and free the old top chunk with a multiple of
MALLOC_ALIGNMENT in size. */
/* The fencepost takes at least MINSIZE bytes, because it might
become the top chunk again later. Note that a footer is set
up, too, although the chunk is marked in use. */
old_size = (old_size - MINSIZE) & ~MALLOC_ALIGN_MASK;
set_head (chunk_at_offset (old_top, old_size + 2 * SIZE_SZ), 0 | PREV_INUSE);
if (old_size >= MINSIZE)
{
set_head (chunk_at_offset (old_top, old_size), (2 * SIZE_SZ) | PREV_INUSE);
set_foot (chunk_at_offset (old_top, old_size), (2 * SIZE_SZ));
set_head (old_top, old_size | PREV_INUSE | NON_MAIN_ARENA);
_int_free (av, old_top, 1);
}
else
{
set_head (old_top, (old_size + 2 * SIZE_SZ) | PREV_INUSE);
set_foot (old_top, (old_size + 2 * SIZE_SZ));
}
}
else if (!tried_mmap)
/* We can at least try to use to mmap memory. */
goto try_mmap;
}
else /* av == main_arena */


{ /* Request enough space for nb + pad + overhead */
size = nb + mp_.top_pad + MINSIZE;

/*
If contiguous, we can subtract out existing space that we hope to
combine with new space. We add it back later only if
we don't actually get contiguous space.
*/

if (contiguous (av))
size -= old_size;

/*
Round to a multiple of page size.
If MORECORE is not contiguous, this ensures that we only call it
with whole-page arguments. And if MORECORE is contiguous and
this is not first time through, this preserves page-alignment of
previous calls. Otherwise, we correct to page-align below.
*/

size = ALIGN_UP (size, pagesize);

/*
Don't try to call MORECORE if argument is so big as to appear
negative. Note that since mmap takes size_t arg, it may succeed
below even if we cannot call MORECORE.
*/

if (size > 0)
{
brk = (char *) (MORECORE (size));
LIBC_PROBE (memory_sbrk_more, 2, brk, size);
}

if (brk != (char *) (MORECORE_FAILURE))
{
/* Call the `morecore' hook if necessary. */
void (*hook) (void) = atomic_forced_read (__after_morecore_hook);
if (__builtin_expect (hook != NULL, 0))
(*hook)();
}
else
{
/*
If have mmap, try using it as a backup when MORECORE fails or
cannot be used. This is worth doing on systems that have "holes" in
address space, so sbrk cannot extend to give contiguous space, but
space is available elsewhere. Note that we ignore mmap max count
and threshold limits, since the space will not be used as a
segregated mmap region.
*/

/* Cannot merge with old top, so add its size back in */
if (contiguous (av))
size = ALIGN_UP (size + old_size, pagesize);

/* If we are relying on mmap as backup, then use larger units */
if ((unsigned long) (size) < (unsigned long) (MMAP_AS_MORECORE_SIZE))
size = MMAP_AS_MORECORE_SIZE;

/* Don't try if size wraps around 0 */
if ((unsigned long) (size) > (unsigned long) (nb))
{
char *mbrk = (char *) (MMAP (0, size, PROT_READ | PROT_WRITE, 0));

if (mbrk != MAP_FAILED)
{
/* We do not need, and cannot use, another sbrk call to find end */
brk = mbrk;
snd_brk = brk + size;

/*
Record that we no longer have a contiguous sbrk region.
After the first time mmap is used as backup, we do not
ever rely on contiguous space since this could incorrectly
bridge regions.
*/
set_noncontiguous (av);
}
}
}

if (brk != (char *) (MORECORE_FAILURE))
{
if (mp_.sbrk_base == 0)
mp_.sbrk_base = brk;
av->system_mem += size;

/*
If MORECORE extends previous space, we can likewise extend top size.
*/

if (brk == old_end && snd_brk == (char *) (MORECORE_FAILURE))
set_head (old_top, (size + old_size) | PREV_INUSE);

else if (contiguous (av) && old_size && brk < old_end)
{
/* Oops! Someone else killed our space.. Can't touch anything. */
malloc_printerr (3, "break adjusted to free malloc space", brk,
av);
}

/*
Otherwise, make adjustments:

* If the first time through or noncontiguous, we need to call sbrk
just to find out where the end of memory lies.

* We need to ensure that all returned chunks from malloc will meet
MALLOC_ALIGNMENT

* If there was an intervening foreign sbrk, we need to adjust sbrk
request size to account for fact that we will not be able to
combine new space with existing space in old_top.

* Almost all systems internally allocate whole pages at a time, in
which case we might as well use the whole last page of request.
So we allocate enough more memory to hit a page boundary now,
which in turn causes future contiguous calls to page-align.
*/

else
{
front_misalign = 0;
end_misalign = 0;
correction = 0;
aligned_brk = brk;

/* handle contiguous cases */
if (contiguous (av))
{
/* Count foreign sbrk as system_mem. */
if (old_size)
av->system_mem += brk - old_end;

/* Guarantee alignment of first new chunk made from this space */

front_misalign = (INTERNAL_SIZE_T) chunk2mem (brk) & MALLOC_ALIGN_MASK;
if (front_misalign > 0)
{
/*
Skip over some bytes to arrive at an aligned position.
We don't need to specially mark these wasted front bytes.
They will never be accessed anyway because
prev_inuse of av->top (and any chunk created from its start)
is always true after initialization.
*/

correction = MALLOC_ALIGNMENT - front_misalign;
aligned_brk += correction;
}

/*
If this isn't adjacent to existing space, then we will not
be able to merge with old_top space, so must add to 2nd request.
*/

correction += old_size;

/* Extend the end address to hit a page boundary */
end_misalign = (INTERNAL_SIZE_T) (brk + size + correction);
correction += (ALIGN_UP (end_misalign, pagesize)) - end_misalign;

assert (correction >= 0);
snd_brk = (char *) (MORECORE (correction));

/*
If can't allocate correction, try to at least find out current
brk. It might be enough to proceed without failing.

Note that if second sbrk did NOT fail, we assume that space
is contiguous with first sbrk. This is a safe assumption unless
program is multithreaded but doesn't use locks and a foreign sbrk
occurred between our first and second calls.
*/

if (snd_brk == (char *) (MORECORE_FAILURE))
{
correction = 0;
snd_brk = (char *) (MORECORE (0));
}
else
{
/* Call the `morecore' hook if necessary. */
void (*hook) (void) = atomic_forced_read (__after_morecore_hook);
if (__builtin_expect (hook != NULL, 0))
(*hook)();
}
}

/* handle non-contiguous cases */
else
{
if (MALLOC_ALIGNMENT == 2 * SIZE_SZ)
/* MORECORE/mmap must correctly align */
assert (((unsigned long) chunk2mem (brk) & MALLOC_ALIGN_MASK) == 0);
else
{
front_misalign = (INTERNAL_SIZE_T) chunk2mem (brk) & MALLOC_ALIGN_MASK;
if (front_misalign > 0)
{
/*
Skip over some bytes to arrive at an aligned position.
We don't need to specially mark these wasted front bytes.
They will never be accessed anyway because
prev_inuse of av->top (and any chunk created from its start)
is always true after initialization.
*/

aligned_brk += MALLOC_ALIGNMENT - front_misalign;
}
}

/* Find out current end of memory */
if (snd_brk == (char *) (MORECORE_FAILURE))
{
snd_brk = (char *) (MORECORE (0));
}
}

/* Adjust top based on results of second sbrk */
if (snd_brk != (char *) (MORECORE_FAILURE))
{
av->top = (mchunkptr) aligned_brk;
set_head (av->top, (snd_brk - aligned_brk + correction) | PREV_INUSE);
av->system_mem += correction;

/*
If not the first time through, we either have a
gap due to foreign sbrk or a non-contiguous region. Insert a
double fencepost at old_top to prevent consolidation with space
we don't own. These fenceposts are artificial chunks that are
marked as inuse and are in any case too small to use. We need
two to make sizes and alignments work out.
*/

if (old_size != 0)
{
/*
Shrink old_top to insert fenceposts, keeping size a
multiple of MALLOC_ALIGNMENT. We know there is at least
enough space in old_top to do this.
*/
old_size = (old_size - 4 * SIZE_SZ) & ~MALLOC_ALIGN_MASK;
set_head (old_top, old_size | PREV_INUSE);

/*
Note that the following assignments completely overwrite
old_top when old_size was previously MINSIZE. This is
intentional. We need the fencepost, even if old_top otherwise gets
lost.
*/
chunk_at_offset (old_top, old_size)->size =
(2 * SIZE_SZ) | PREV_INUSE;

chunk_at_offset (old_top, old_size + 2 * SIZE_SZ)->size =
(2 * SIZE_SZ) | PREV_INUSE;

/* If possible, release the rest. */
if (old_size >= MINSIZE)
{
_int_free (av, old_top, 1);
}
}
}
}
}
} /* if (av != &main_arena) */

if ((unsigned long) av->system_mem > (unsigned long) (av->max_system_mem))
av->max_system_mem = av->system_mem;
check_malloc_state (av);

/* finally, do the allocation */
p = av->top;
size = chunksize (p);

/* check that one of the above allocation paths succeeded */
if ((unsigned long) (size) >= (unsigned long) (nb + MINSIZE))
{
remainder_size = size - nb;
remainder = chunk_at_offset (p, nb);
av->top = remainder;
set_head (p, nb | PREV_INUSE | (av != &main_arena ? NON_MAIN_ARENA : 0));
set_head (remainder, remainder_size | PREV_INUSE);
check_malloced_chunk (av, p, nb);
return chunk2mem (p);
}

/* catch all failure paths */
__set_errno (ENOMEM);
return 0;
}
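
To see which of the paths in this tail of sysmalloc actually hands memory back, here is a minimal user-space sketch (not glibc code, just an observation demo), assuming a 64-bit glibc 2.23 with the default 128 KiB mmap threshold; it peeks at the size field stored immediately before each pointer malloc returns, where IS_MMAPPED is bit 0x2:

#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    void *small = malloc(0x20);     /* small request: carved from the main arena's top chunk */
    void *large = malloc(0x40000);  /* 256 KiB request: above the default mmap threshold */
    if (!small || !large)
        return 1;

    /* in the glibc chunk layout, the size field sits right before the user pointer */
    size_t small_size = ((size_t *) small)[-1];
    size_t large_size = ((size_t *) large)[-1];

    printf("small: addr=%p size=%#zx IS_MMAPPED=%zu\n",
           small, small_size, (small_size >> 1) & 1);
    printf("large: addr=%p size=%#zx IS_MMAPPED=%zu\n",
           large, large_size, (large_size >> 1) & 1);
    return 0;
}

On such a system the small chunk should show IS_MMAPPED clear, since it comes out of the final "carve the request off av->top" step above, while the large chunk should show the bit set and live in a separate mapping far from the brk heap, matching the mmap path at the start of this excerpt.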

0x05.new_heap:

new_heap() is defined in arena.c; it allocates a new sub-heap for a non-main arena directly through the mmap system call. The code is as follows:

/* Create a new heap.  size is automatically rounded up to a multiple
of the page size. */

static heap_info *
internal_function
new_heap (size_t size, size_t top_pad)
{
size_t pagesize = GLRO (dl_pagesize);
char *p1, *p2;
unsigned long ul;
heap_info *h;

if (size + top_pad < HEAP_MIN_SIZE)
size = HEAP_MIN_SIZE;
else if (size + top_pad <= HEAP_MAX_SIZE)
size += top_pad;
else if (size > HEAP_MAX_SIZE)
return 0;
else
size = HEAP_MAX_SIZE;
size = ALIGN_UP (size, pagesize);

/* A memory region aligned to a multiple of HEAP_MAX_SIZE is needed.
No swap space needs to be reserved for the following large
mapping (on Linux, this is the case for all non-writable mappings
anyway). */
p2 = MAP_FAILED;
if (aligned_heap_area)
{
p2 = (char *) MMAP (aligned_heap_area, HEAP_MAX_SIZE, PROT_NONE,
MAP_NORESERVE);
aligned_heap_area = NULL;
if (p2 != MAP_FAILED && ((unsigned long) p2 & (HEAP_MAX_SIZE - 1)))
{
__munmap (p2, HEAP_MAX_SIZE);
p2 = MAP_FAILED;
}
}
if (p2 == MAP_FAILED)
{
p1 = (char *) MMAP (0, HEAP_MAX_SIZE << 1, PROT_NONE, MAP_NORESERVE);
if (p1 != MAP_FAILED)
{
p2 = (char *) (((unsigned long) p1 + (HEAP_MAX_SIZE - 1))
& ~(HEAP_MAX_SIZE - 1));
ul = p2 - p1;
if (ul)
__munmap (p1, ul);
else
aligned_heap_area = p2 + HEAP_MAX_SIZE;
__munmap (p2 + HEAP_MAX_SIZE, HEAP_MAX_SIZE - ul);
}
else
{
/* Try to take the chance that an allocation of only HEAP_MAX_SIZE
is already aligned. */
p2 = (char *) MMAP (0, HEAP_MAX_SIZE, PROT_NONE, MAP_NORESERVE);
if (p2 == MAP_FAILED)
return 0;

if ((unsigned long) p2 & (HEAP_MAX_SIZE - 1))
{
__munmap (p2, HEAP_MAX_SIZE);
return 0;
}
}
}
if (__mprotect (p2, size, PROT_READ | PROT_WRITE) != 0)
{
__munmap (p2, HEAP_MAX_SIZE);
return 0;
}
h = (heap_info *) p2;
h->size = size;
h->mprotect_size = size;
LIBC_PROBE (memory_heap_new, 2, h, h->size);
return h;
}
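
The point of the MMAP/__munmap dance above is that every sub-heap must start on a HEAP_MAX_SIZE-aligned address: new_heap first tries the cached aligned_heap_area, otherwise it reserves twice HEAP_MAX_SIZE with PROT_NONE and trims the slack on both sides, and only as a last resort hopes a single HEAP_MAX_SIZE mapping happens to be aligned. A minimal sketch (not glibc code) of the "map double, trim the slack" trick, assuming the 64 MiB HEAP_MAX_SIZE of 64-bit glibc 2.23:

#include <stdio.h>
#include <stdint.h>
#include <stddef.h>
#include <sys/mman.h>

#define HEAP_MAX_SIZE (64UL * 1024 * 1024)   /* 64 MiB on 64-bit glibc 2.23 */

static void *reserve_aligned_heap(void)
{
    /* reserve twice the needed size so an aligned window must exist inside it */
    char *p1 = mmap(NULL, HEAP_MAX_SIZE << 1, PROT_NONE,
                    MAP_PRIVATE | MAP_ANONYMOUS | MAP_NORESERVE, -1, 0);
    if (p1 == MAP_FAILED)
        return NULL;

    /* round up to the next multiple of HEAP_MAX_SIZE */
    char *p2 = (char *) (((uintptr_t) p1 + HEAP_MAX_SIZE - 1) & ~(HEAP_MAX_SIZE - 1));
    size_t front = p2 - p1;

    if (front)
        munmap(p1, front);                               /* slack before the aligned window */
    munmap(p2 + HEAP_MAX_SIZE, HEAP_MAX_SIZE - front);   /* slack after it */

    /* new_heap would now mprotect(p2, size, PROT_READ | PROT_WRITE)
       and fill in the heap_info header at p2 */
    return p2;
}

int main(void)
{
    void *heap = reserve_aligned_heap();
    printf("aligned sub-heap base: %p\n", heap);
    return 0;
}

Because every sub-heap is aligned this way, heap_for_ptr() in arena.c can recover the heap_info header of any non-main-arena chunk with a single mask, (heap_info *) ((unsigned long) ptr & ~(HEAP_MAX_SIZE - 1)), which is exactly what the heap_for_ptr (old_top) lookup in the non-main-arena branch of sysmalloc above relies on. Note that only the first size bytes are made readable/writable via mprotect; the rest of the HEAP_MAX_SIZE window stays PROT_NONE until grow_heap extends it.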
