Linux threads are processes

Distinction between processes and threads in Linux

After reading up on this answer and «Linux Kernel Development» by Robert Love and, subsequently, on the clone() system call, I discovered that processes and threads in Linux are (almost) indistinguishable to the kernel. There are a few differences between them (described as «more sharing» or «less sharing» in the quoted SO question), but I still have some unanswered questions. I recently worked on a program involving a couple of POSIX threads and decided to experiment on this premise. In a process that creates two threads, each thread of course gets a unique value from pthread_self(), but not from getpid(). A sample program I created follows:

#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>
#include <unistd.h>
#include <pthread.h>

/* Thread entry point: print the IDs that this thread sees. */
void* threadMethod(void* arg)
{
    int intArg = (int) *((int*) arg);

    int32_t pid = getpid();
    uint64_t pti = pthread_self();

    printf("[Thread %d] getpid() = %d\n", intArg, pid);
    printf("[Thread %d] pthread_self() = %lu\n", intArg, pti);

    return NULL; /* a void* function must return a value */
}

int main(void)
{
    pthread_t threads[2];

    int thread1 = 1;
    if ((pthread_create(&threads[0], NULL, threadMethod, (void*) &thread1)) != 0)
    {
        fprintf(stderr, "pthread_create: error\n");
        exit(EXIT_FAILURE);
    }

    int thread2 = 2;
    if ((pthread_create(&threads[1], NULL, threadMethod, (void*) &thread2)) != 0)
    {
        fprintf(stderr, "pthread_create: error\n");
        exit(EXIT_FAILURE);
    }

    /* The main thread prints the same pair of IDs for comparison. */
    int32_t pid = getpid();
    uint64_t pti = pthread_self();

    printf("[Process] getpid() = %d\n", pid);
    printf("[Process] pthread_self() = %lu\n", pti);

    if ((pthread_join(threads[0], NULL)) != 0)
    {
        fprintf(stderr, "Could not join thread 1\n");
        exit(EXIT_FAILURE);
    }

    if ((pthread_join(threads[1], NULL)) != 0)
    {
        fprintf(stderr, "Could not join thread 2\n");
        exit(EXIT_FAILURE);
    }

    return 0;
}

(This was compiled [ gcc -pthread -o thread_test thread_test.c ] on 64-bit Fedora; because pthread_t is a 64-bit type on this platform, the code will require minor changes to compile on 32-bit editions.) The output I get is as follows:

[bean@fedora ~]$ ./thread_test
[Process] getpid() = 28549
[Process] pthread_self() = 140050170017568
[Thread 2] getpid() = 28549
[Thread 2] pthread_self() = 140050161620736
[Thread 1] getpid() = 28549
[Thread 1] pthread_self() = 140050170013440
[bean@fedora ~]$

By using scheduler locking in gdb, I can keep the program and its threads alive so I can capture what top says, which, just showing processes, is:

  PID USER  PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
28602 bean  20   0 15272 1112  820 R  0.4  0.0   0:00.63 top
 2036 bean  20   0  108m 1868 1412 S  0.0  0.0   0:00.11 bash
28547 bean  20   0  231m  16m 7676 S  0.0  0.4   0:01.56 gdb
28549 bean  20   0 22688  340  248 t  0.0  0.0   0:00.26 thread_test
28561 bean  20   0  107m 1712 1356 S  0.0  0.0   0:00.07 bash

And with threads shown:

  PID USER  PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
28617 bean  20   0 15272 1116  820 R 47.2  0.0   0:00.08 top
 2036 bean  20   0  108m 1868 1412 S  0.0  0.0   0:00.11 bash
28547 bean  20   0  231m  16m 7676 S  0.0  0.4   0:01.56 gdb
28549 bean  20   0 22688  340  248 t  0.0  0.0   0:00.26 thread_test
28552 bean  20   0 22688  340  248 t  0.0  0.0   0:00.00 thread_test
28553 bean  20   0 22688  340  248 t  0.0  0.0   0:00.00 thread_test
28561 bean  20   0  107m 1860 1432 S  0.0  0.0   0:00.08 bash

It seems to be quite clear that programs, or perhaps the kernel, have a distinct way of defining threads in contrast to processes. Each thread has its own PID according to top — why?


Are Linux kernel threads really kernel processes?

Is a kernel thread the same as a kernel process, since Linux processes support shared memory spaces between parent and child, or is it truly a different entity?

4 Answers

The documentation can be pretty confusing, so here is the «real» Linux model:

  • inside the Linux kernel, something that can be run (& scheduled) is called a «process»,
  • each process has a system-unique Process ID (PID), and a Thread Group ID (TGID),
  • a «normal» process has PID=TGID and no other process shares this TGID value,
  • a «threaded» process is a process whose TGID value is shared by other processes,
  • several processes sharing the same TGID also share, at least, the same memory space and signal handlers (sometimes more),
  • if a «threaded» process has PID=TGID, it can be called «the main thread»,
  • calling getpid() from any process will return its TGID (= «main thread» PID),
  • calling gettid() from any process will return its PID (!),
  • any kind of process can be created with the clone(2) system call,
  • what is shared between processes is decided by passing specific flags to clone(2),
  • the numeric directory names listed by ls /proc (as /proc/NUMBER) are TGIDs,
  • the numeric directory names in /proc/TGID/task (as /proc/TGID/task/NUMBER) are PIDs,
  • even though you don’t see every existing PID with ls /proc, you can still do cd /proc/any_PID.

Conclusion: from the kernel's point of view, only processes exist, each having its own unique PID, and a so-called thread is just a different kind of process (sharing, at least, the same memory space and signal handlers with one or several others).


Note: the implementation of the «thread» concept in Linux has led to a vocabulary confusion, and if getpid() does not do what you thought, it is because its behavior follows POSIX compatibility (threads are supposed to share a common PID).

Suggestion: using the word «task» may help when referring to something runnable without getting into the process/thread confusion so much.

There is absolutely no difference between a thread and a process on Linux. If you look at clone(2) you will see a set of flags that determine what is shared, and what is not shared, between the threads.

Classic processes are just threads that share nothing; under Linux you can share whichever components you want.

This is not the case on other OS implementations, where there are much more substantial differences.

Threads are processes under Linux. They are created with the clone system call, which returns a process ID that can be sent a signal via the kill system call, just like a process. Thread processes are visible in ps output. The clone call is passed flags which determine how much of the parent process’s environment is shared with the thread process.

The pthreads(7) man page says that for the current NPTL (Native POSIX Threads Library) implementation, «all of the threads in a process are placed in the same thread group; all members of a thread group share the same PID.» In the obsolete LinuxThreads implementation, each «thread» has its own PID.

@Totor is that true, though? On recent Linux systems I observe that each thread still has its own PID (viewable via htop or the /proc subsystem, for example) as well as a TGID, which is essentially the PID of the «main thread» in the process that spawned a given thread.


Previous answers are excellent, pointing out that threads are processes inside the Linux kernel and that you can clone() any subset of the process state you like anyway.

But I think it’s helpful to remember that it matters how much context can be shared or must be saved uniquely, and how many cycles it may take for a context switch, which may depend on how much is likely to be different, not just as far as the OS is concerned, but also in the hardware, e.g., the TLB. So it matters what is cloned and what is shared.

At the application level, a new thread (as conventionally understood, sharing the memory image, current directory, open file handles, etc.) is always cheaper than a new process that at best only initially shares any of this. Even if the process is forked with copy-on-write, as soon as it writes, you do have to make the copy. This is why, in designing an application, it’s a lot more reasonable to create 10,000 threads than 10,000 processes. The reasons to do a new process are to run a different executable or to firewall for security reasons.
