Turn off buffering in pipe
The long_running_command prints progress but I’m unhappy with it. I’m using print_progress to make it nicer (namely, I print the progress on a single line). The problem: connecting a pipe to stdout also activates a 4 KiB buffer, so the nice print program gets nothing … nothing … nothing … a whole lot 🙂. How can I disable the 4 KiB buffer for long_running_command (no, I do not have the source)?
So when you run long_running_command without piping you can see the progress updates properly, but when piping they get buffered?
The lack of a simple way of controlling buffering has been a problem for decades. For example, see marc.info/?l=glibc-bug&m=98313957306297&w=4 which basically says «I can’t be arsed doing this and here’s some clap-trap to justify my position».
It is actually stdio, not the pipe, that causes the delay while waiting for enough data. Pipes do have a capacity, but as soon as any data is written to the pipe, it is immediately ready to read at the other end.
15 Answers
Another way to skin this cat is to use the stdbuf program, which is part of GNU coreutils (FreeBSD also has its own version).
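For example, reusing the placeholder names from the question:

stdbuf -i0 -o0 -e0 long_running_command | print_progress   # unbuffer stdin, stdout and stderr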
This turns off buffering completely for input, output and error. For some applications, line buffering may be more suitable for performance reasons:
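stdbuf -oL long_running_command | print_progress   # line-buffer stdout only

Here -oL sets only stdout to line-buffered mode; the command names are again the question’s placeholders.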
Note that it only works for stdio buffering (printf(), fputs(), …) of dynamically linked applications, and only if the application doesn’t otherwise adjust the buffering of its standard streams itself, though that should cover most applications.
@qdii stdbuf does not work with tee , because tee overwrites the defaults set by stdbuf . See the manual page of stdbuf .
@lepe Bizarrely, unbuffer has dependencies on x11 and tcl/tk, meaning it actually needs >80 MB if you’re installing it on a server without them.
@qdii stdbuf uses the LD_PRELOAD mechanism to insert its own dynamically loaded library, libstdbuf.so. This means that it will not work with these kinds of executables: those with setuid or file capabilities set, statically linked ones, or ones not using the standard libc. In these cases it is better to use the solutions with unbuffer / script / socat. See also stdbuf with setuid/capabilities.
@jchook Yes, what was said in the accepted answer using unbuffer above also applies here: «for longer pipelines, you may have to unbuffer each command»
You can use the unbuffer command (which comes as part of the expect package), e.g.
unbuffer long_running_command | print_progress
unbuffer connects to long_running_command via a pseudo-terminal (pty), which makes the system treat it as an interactive process, therefore not using the 4 KiB buffering in the pipeline that is the likely cause of the delay.
For longer pipelines, you may have to unbuffer each command (except the final one), e.g.
unbuffer x | unbuffer -p y | z
Note: On Debian systems, this is called expect_unbuffer and is in the expect-dev package, not the expect package.
@bdonlan: At least on Ubuntu (debian-based), expect-dev provides both unbuffer and expect_unbuffer (the former is a symlink to the latter). The links are available since expect 5.44.1.14-1 (2009).
unbuffer is in the main expect package on debian now (it’s still a symlink to expect_unbuffer , which is also in the main expect package)
Yet another way to get line-buffered output from long_running_command is to use the script command, which runs your long_running_command in a pseudo-terminal (pty).
script -q /dev/null long_running_command | print_progress        # (FreeBSD, Mac OS X)
script -q -c "long_running_command" /dev/null | print_progress   # (Linux)
+1 Nice trick; since script is such an old command, it should be available on all Unix-like platforms.
It seems like script reads from stdin, which makes it impossible to run such a long_running_command in the background, at least when started from an interactive terminal. To work around this, I was able to redirect stdin from /dev/null, since my long_running_command doesn’t use stdin.
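A sketch of that workaround using the Linux form from the answer above (untested, and only appropriate if the command really doesn’t read stdin):

script -q -c "long_running_command" /dev/null < /dev/null | print_progress &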
One significant disadvantage: ctrl-z no longer works (i.e. I can’t suspend the script). This can be fixed by, for example: echo | sudo script -c /usr/local/bin/ec2-snapshot-all /dev/null | ts , if you don’t mind not being able to interact with the program.
Using script worked for me where stdbuf did not. Use script -e -c
For grep, sed and awk you can force output to be line buffered. You can use:

grep --line-buffered
Force output to be line buffered. By default, output is line buffered when standard output is a terminal and block buffered otherwise.

sed -u (--unbuffered)
Make output line buffered.
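For instance, a sketch of a pipeline using these flags (they only remove the buffering added by grep and sed themselves, not any buffering inside long_running_command, as the comments below point out; awk has no such flag, but calling fflush() after each print has the same effect in gawk and most other awk implementations):

long_running_command | grep --line-buffered pattern | sed -u 's/^/>> /' | awk '{ print; fflush() }'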
Using grep (etc.) like this won’t work. By the time you’ve executed long_running_command, it’s too late: its output will already have been buffered before it even gets to grep.
Should this work with grep --line-buffered pattern *many*many*files* | head? It looks like grep processes all the files before feeding the output lines to head.
If the problem is libc modifying its buffering/flushing when output does not go to a terminal, you should try socat. You can create a bidirectional stream between almost any kind of I/O mechanism. One of those is a forked program speaking to a pseudo-tty.
socat EXEC:long_running_command,pty,ctty STDIO
This will:
- create a pseudo tty
- fork long_running_command with the slave side of the pty as stdin/stdout
- establish a bidirectional stream between the master side of the pty and the second address (here it is STDIO)
If this gives you the same output as long_running_command , then you can continue with a pipe.
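That is, if the output looks right, a sketch combining this answer’s command with the pipe from the question would be:

socat EXEC:long_running_command,pty,ctty STDIO | print_progress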
Edit: Wow, I did not see the unbuffer answer! Well, socat is a great tool anyway, so I might just leave this answer.
long_running_command 1>&2 |& print_progress
The problem is that libc will line-buffer when stdout goes to a screen and block-buffer when stdout goes to a file, but use no buffering at all for stderr.
I don’t think it’s a problem with the pipe buffer; it’s all about libc’s buffering policy.
OK, what’s happening is that with both zsh (where |& comes from, adapted from csh) and bash, when you do cmd1 >&2 |& cmd2, both fd 1 and 2 are connected to the outer stdout. So it works at preventing buffering when that outer stdout is a terminal, but only because the output doesn’t go through the pipe (so print_progress prints nothing). So it’s the same as long_running_command & print_progress (except that print_progress’s stdin is a pipe that has no writer). You can verify with ls -l /proc/self/fd >&2 |& cat compared to ls -l /proc/self/fd |& cat.
That’s because |& is short for 2>&1 |, literally. So cmd1 1>&2 |& cmd2 is cmd1 1>&2 2>&1 | cmd2. Both fd 1 and 2 end up connected to the original stderr, and nothing is left writing to the pipe. (s/outer stdout/outer stderr/g in my previous comment.)
It used to be the case, and probably still is the case, that when standard output is written to a terminal, it is line buffered by default — when a newline is written, the line is written to the terminal. When standard output is sent to a pipe, it is fully buffered — so the data is only sent to the next process in the pipeline when the standard I/O buffer is filled.
That’s the source of the trouble. I’m not sure whether there is much you can do to fix it without modifying the program writing into the pipe. You could use the setvbuf() function with the _IOLBF mode to unconditionally put stdout into line-buffered mode, but I don’t see an easy way to enforce that on a program. Or the program could call fflush() at appropriate points (after each line of output), but the same comment applies.
I suppose that if you replaced the pipe with a pseudo-terminal, then the standard I/O library would think the output was a terminal (because it is a type of terminal) and would line buffer automatically. That is a complex way of dealing with things, though.
Is there a way to set up a Linux pipe to be non-buffering or line-buffering?
My program is controlling an external application on Linux, passing input commands via a pipe to the external application’s stdin, and reading output via a pipe from the external application’s stdout. The problem is that writes to pipes are buffered by block, not by line, and therefore delays occur before my app receives data output by the external application. The external application cannot be altered to add explicit fflush() calls.

When I set the external application to /bin/cat -n (which echoes back the input, with line numbers added), it works correctly; it seems cat flushes after each line. The only way to force the external application to flush is to send an exit command to it; as it receives the command, it flushes, and all the answers appear on stdout just before exiting.

I’m pretty sure that Unix pipes are an appropriate solution for this kind of interprocess communication (pseudo server-client), but maybe I’m wrong. (I’ve just copied some text from a similar question: Force another program’s standard output to be unbuffered using Python)
4 Answers
Don’t use a pipe. Use a pty instead. Ptys (pseudo-ttys) have the benefit of being line buffered if you want it, which provides you with simple framing for your data stream.
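As a quick shell-level sketch of the idea (hypothetical: external_app stands in for your external application, and this reuses socat from the answer above rather than whatever your controlling program does; echo=0 keeps the pty from echoing your commands back into the output):

socat EXEC:external_app,pty,ctty,echo=0 STDIO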
It is not the pty that decides about the buffering. It is the application (or the library it uses) that contains the output buffer and switches to line buffering when the output goes to a (pseudo-)terminal.