How to set pthread max stack size
The API pthread_attr_setstacksize(pthread_attr_t *attr, size_t stacksize)
is to set the minimum stack size (in bytes) allocated for the created thread stack. But how to set the maximum stack size? Thanks
I just want to put in a comment about stack size: If you try to lookup «how much memory each thread eats up» and it ends up looking like your app takes some base memory + about 8MB per thread, you can probably relax. The memory is not actually used/locked until you actually fill it up. It’s often a bad idea to limit the stack size and there’s no need for it in most cases either, except for embedded systems ofc with very limited resources.
3 Answers
If you manage the allocation of memory for the stack yourself using pthread_attr_setstack, you can set the stack size exactly. So in that case the min is the same as the max. For example, the code below illustrates a case where the program tries to access more memory than is allocated for the stack, and as a result the program segfaults.

    #include <pthread.h>
    #include <stdlib.h>

    #define PAGE_SIZE 4096
    #define STK_SIZE (10 * PAGE_SIZE)

    void *stack;
    pthread_t thread;
    pthread_attr_t attr;

    void *dowork(void *arg)
    {
        /* Deliberately larger than the thread's stack, so the writes overflow it. */
        int data[2 * STK_SIZE];
        int i;
        for (i = 0; i < 2 * STK_SIZE; i++) {
            data[i] = i;
        }
        return NULL;
    }

    int main(int argc, char **argv)
    {
        /* Page-aligned buffer that will serve as the thread's stack. */
        posix_memalign(&stack, PAGE_SIZE, STK_SIZE);
        pthread_attr_init(&attr);
        /* Pass the buffer itself, not the address of the pointer variable. */
        pthread_attr_setstack(&attr, stack, STK_SIZE);
        pthread_create(&thread, &attr, dowork, NULL);
        pthread_exit(0);
    }
If you rely on memory that is automatically allocated then you can specify a minimum amount but not a maximum. However, if the stack use of your thread exceeds the amount you specified then your program may segfault.
Note that the man page for pthread_attr_setstacksize says:
A thread’s stack size is fixed at the time of thread creation. Only the main thread can dynamically grow its stack.
To see an example of this program try taking a look at this link
You can experiment with the code segment that they provide and see that if you do not allocate enough stack space that it is possible to have your program segfault.
The default stack size per thread is usually very large on most pthreads implementations. If you are running into issues where you need it larger than the default, though, you can try to increase it using pthread_attr_setstacksize(), checking the return value as you do, and trying successively larger values. The actual limit will probably be based upon per-process address space limits on your specific system.
If you just want to find out what the default stack size is, you can use pthread_attr_getstacksize(). It is not going to grow larger than that, unless you tell it to by adjusting it yourself before you create the thread(s).
Some code for adjusting stack sizes with pthreads is shown here https://stackoverflow.com/a/15356607/2159730
Note: the example in the link is for trying to minimize stack usage, and finding a minimum (which is dangerous, without a lot of testing). Sort of the opposite of what you seem to want to do, but the example may still be helpful to you. Just approach it differently.
In general, you should probably not need to raise the pthread stack size above the default. If you find you do, you should probably try to understand why that is happening. It’s fairly unusual.
If you could explain what’s actually happening in your own code, instead of a general question, other solutions might present themselves. I’m wondering if you are actually trying to reduce the per thread stack size, and just not explaining yourself clearly. If that’s the case, the link might be helpful for that reason as well.
Setting the default stack size on Linux globally for the program
So I’ve noticed that the default stack size for threads on Linux is 8MB (if I’m wrong, PLEASE correct me), and, incidentally, 1MB on Windows. This is quite bad for my application, as on a 4-core processor that means 64 MB of space is used JUST for threads! The worst part is, I’m never using more than 100kb of stack per thread (I abuse the heap a LOT ;)). My solution right now is to limit the stack size of threads. However, I have no idea how to do this portably. Just for context, I’m using Boost.Thread for my threading needs. I’m okay with a little bit of #ifdef hell, but I’d like to know how to do it easily first. Basically, I want something like this (where windows_* is linked on Windows builds, and posix_* is linked under Linux builds):

    // windows_stack_limiter.c
    int limit_stack_size()
    {
        // Windows impl.
        return 0;
    }

    // posix_stack_limiter.c
    int limit_stack_size()
    {
        // Linux impl.
        return 0;
    }

    // stack_limiter.cpp
    int limit_stack_size();
    static volatile int placeholder = limit_stack_size();
How do I flesh out those functions? Or, alternatively, am I just doing this entirely wrong? Remember I have no control over the actual thread creation (no new params to CreateThread on Windows), as I’m using Boost.Thread.
It’s not as bad as you think. Only address space is reserved, not actual RAM. So e.g. if you spawn 8 threads, and each thread actually uses 100kb of stack space, then you’ve only used up 800kb of RAM, not 64MB.
Default stack size for pthreads
As I understand, the default stack size for a pthread on Linux is 16K. I am getting strange results on my 64-bit Ubuntu install.
    pthread_attr_init(&attr);
    pthread_attr_getstacksize(&attr, &stacksize);
    printf("Thread stack size = %zu bytes \n", stacksize);

Prints:

    Thread stack size = 8388608 bytes

You’re thinking of the 16k per-thread kernel stacks. That is a totally separate issue from user-space stack memory. Kernel stacks are tiny because they can’t be paged or lazily allocated, and have to be contiguous pages in physical memory (see elinux.org/Kernel_Small_Stacks). Having an extremely high number of total threads can be a problem on i386, where address space is limited, especially with 8k stacks by default for 32-bit.
2 Answers
Actually, your virtual stack size is 8388608 bytes (8 MB). Of course, it’s natural to conclude that this can’t be right, because that’s a ridiculously large amount of memory for every thread to consume for its stack when 99% of the time a couple of KB is probably all they need.
The good news is that your thread only uses the amount of physical memory that it actually needs. This is one of the magical powers that your OS gets from using the hardware Memory Management Unit (MMU) in your processor. Here’s what happens:
- The OS allocates 8 MB of virtual memory for your stack by setting up the MMU’s page tables for your thread. This requires very little RAM to hold the page table entries only.
- When your thread runs and tries to access a virtual address on the stack that doesn’t have a physical page assigned to it yet, a hardware exception called a «page fault» is triggered by the MMU.
- The CPU core responds to the page fault exception by switching to a privileged execution mode (which has its own stack) and calling the page fault exception handler function inside the kernel.
- The kernel allocates a page of physical RAM to that virtual memory page and returns back to the user space thread.
The user space thread sees none of that work. From its point of view, it just uses the stack as if the memory was there all along. Meanwhile, the stack automatically grows (or doesn’t) to meet the thread’s needs.
The MMU is a key part of the hardware of today’s computer systems. In particular, it’s responsible for a lot of the «magic» in the system, so I highly recommend learning more about what the MMU does, and about virtual memory in general. Also, if your application is performance sensitive and deals with a significant amount of data, you should understand how the TLB (the MMU’s page table cache) works and how you can restructure your data or your algorithms to maximize your TLB hit rate.
Size of a process/thread in Linux
What is the size of a process/thread in Linux? When a process/thread is created, along with task_struct and other data structures inside it, is there anything else? Is the stack of a process/thread allocated upon process/thread initialization (fixed size)? Or is it allocated when necessary (like virtual memory)? How can I know what size a standard process/thread is when it is created in memory?
2 Answers
When a large block of memory (> pagesize = 4096 bytes) is first allocated on Linux it uses special «null» memory pages in the pagetable that aren’t backed by anything, so when a thread is started it will allocate ~1 MB of these zero pages for a thread stack. As the stack grows the pages are then converted into real memory backed pages. Because of this «null» page backing it is generally okay to have liberally large stacks.
Threads and processes are both created with the same underlying syscall, clone(2). It has lots of options and does lots of stuff. See man clone for a detailed explanation.
Large blocks of memory are allocated with an anonymous mmap(2) call.
You may also be interested in doing a web search for «linux overcommit bit»
(If you want to refine your question, I can be more specific.)
Thanks. So, each thread is reserved with 1MB for its stack size. However, you said that the memory of a thread is only allocated when it is really needed, which means the physical memory doesn’t lose another physical 1MB until the thread writes something to memory, does it? If this is the case, consider that my kernel is 50 MB (for example) and my memory is 70MB, can I still allocate more than 20 threads? Or the kernel actually reserves 1MB in physical memory?
Also, when I type ulimit -a, my stack size limit is 8192 kb. Is this the upper limit per process/thread?
The stack is initially formed with null pages and not using physical memory. Reads will return 0, writes will cause a page fault causing them to become backed with real memory. Say you push the first item on the stack, only the first page will be backed using a total of 4096 bytes of physical memory. Once you push more than 4096 bytes, page 2 will be backed by physical memory — and so on. So you can see the stack only uses approximately the amount of physical memory it needs (within a granularity of 4096 bytes).
kthreads are for use inside the kernel (by device drivers, for example) and are not accessible from userland processes. I don’t know much about them, sorry. I don’t know if the clone syscall calls start_kthread to make a new process or thread, or if kthread is a separate kernel facility.
What Andrew said is true, but it doesn’t mean your thread/process doesn’t «use memory» from the moment it’s created. The space reserved for stacks always consumes virtual address space in your process, which means with large thread stacks you’ll quickly run out of addresses on 32-bit machines (just about 300 threads with default thread-stack-size on glibc will exhaust virtual address space). Also, stacks contribute to commit charge, which determines the total amount of memory that can be allocated when overcommit is disabled.
Linux by default pre-commits 128k for the main thread’s stack, and allows more to be obtained automatically if commit charge has not been exhausted. Thread stacks are allocated entirely by userspace (glibc/NPTL, on most Linux systems) and cannot grow beyond their initial size. Depending on the version and system settings, glibc/NPTL usually defaults to allocating somewhere between 2 MB and 10 MB per thread.
Thanks. What about the stack size in kbytes shown by ulimit -a? In the end, is the size of a thread the size of the thread_info struct + kernel stack (8KB) + thread stack (user stack, 2MB~10MB)?
This is a good point about 32-bit virtual address space running out. I live in 64 bit land where this is not an issue.