Files being used by a unix process
The fuser command lets me know which processes are using a file or directory. I’m looking for a command that does the opposite: one that lets me know which files are being used by a process.
lsof stands for “LiSt Open Files”. This shell command seems deceptively simple: It lists information about files opened by processes on a UNIX box.
Despite its (apparently) modest mission statement, lsof is actually one of the most powerful and useful UNIX commands. Its raw power comes from one of UNIX’s design principles, often described as “in UNIX everything is a file”. What this means is that the lsof concept of an open file covers not only regular files but also the following:
- Directories
- Streams or network files (for example, Internet or UNIX domain sockets and NFS files)
- Native libraries (for example, .so or .dylib dynamic libraries linked to a process)
- Block and character special files (for example, disk volume, external hard drive, console, or mouse)
- Pipes
Wait, I Cannot Find lsof on My System!
lsof is such a popular tool that it has been ported to pretty much all UNIX dialects (Linux, Mac OS X, BSD, Solaris, and so on). If it is unavailable on your box, use your usual package management system to install it. You can find lsof packages for Solaris on Sun Freeware.
While I wouldn’t begrudge anyone learning DTrace or gaining experience installing software, Solaris already has a command that shows the files a process has open: /usr/bin/pfiles
% tail -f /etc/motd &
[1] 6033
% pfiles 6033
6033: tail -f /etc/motd
  Current rlimit: 256 file descriptors
   0: S_IFREG mode:0644 dev:182,65538 ino:163065 uid:0 gid:3 size:54
      O_RDONLY|O_LARGEFILE
      /etc/motd
   1: S_IFCHR mode:0620 dev:299,0 ino:718837882 uid:101 gid:7 rdev:24,3
      O_RDWR|O_NOCTTY|O_LARGEFILE
      /dev/pts/3
   2: S_IFCHR mode:0620 dev:299,0 ino:718837882 uid:101 gid:7 rdev:24,3
      O_RDWR|O_NOCTTY|O_LARGEFILE
      /dev/pts/3
- You can use the ls command and grep to find the open files whose paths mention chrome:
$ ls -l /proc/*/fd | grep "chrome"
lrwx------ 1 ba abc 64 Jul 16 22:19 104 -> /home/abc/.config/google-chrome/Default/Cookies
lr-x------ 1 abc abc 64 Jul 16 22:19 113 -> /opt/google/chrome/nacl_irt_x86_64.nexe
lrwx------ 1 abc abc 64 Jul 16 22:19 121 -> /home/abc/.cache/google-chrome/Default/Cache/data_0
lrwx------ 1 abc abc 64 Jul 16 22:19 122 -> /home/abc/.cache/google-chrome/Default/Cache/data_1
lrwx------ 1 abc abc 64 Jul 16 22:19 123 -> /home/abc/.cache/google-chrome/Default/Cache/data_2
lr-x------ 1 abc abc 64 Jul 16 22:19 125 -> /home/abc/.config/google-chrome/Dictionaries/en-US-3-0.bdic
Another way to get the same result, using lsof and grep:
$ lsof | grep "chrome"
chrome 2204 abc cwd DIR 8,5 4096 1441794 /home/abc
chrome 2204 abc rtd DIR 8,5 4096 2 /
chrome 2204 abc txt REG 8,5 87345336 5111885 /opt/google/chrome/chrome
chrome 2204 abc mem REG 8,5 4202496 1443927 /home/abc/.cache/google-chrome/Default/Media Cache/data_3
chrome 2204 abc mem REG 8,5 1056768 1443926 /home/abc/.cache/google-chrome/Default/Media Cache/data_2
chrome 2204 abc mem REG 8,5 270336 1443925 /home/abc/.cache/google-chrome/Default/Media Cache/data_1
chrome 2204 abc mem REG 8,5 45056 1443924 /home/abc/.cache/google-chrome/Default/Media Cache/data_0
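Grepping the full lsof listing also matches lines where "chrome" only appears in a path or user name, and it makes lsof scan every process first. Here is a minimal sketch that lets lsof do the selection instead. The function names are made up for illustration; -c, -p and -F are standard lsof options, and lsof's warnings on stderr are discarded here for brevity:

```shell
#!/usr/bin/env bash
# Hypothetical helpers (the names are illustrative, not part of lsof).

# List the open files of every process whose command name starts with $1.
files_by_command() {
    lsof -c "$1" 2>/dev/null
}

# Print only the names of the files opened by PID $1, one per line.
# -F n requests machine-readable output; file name lines start with "n".
names_by_pid() {
    lsof -p "$1" -F n 2>/dev/null | sed -n 's/^n//p' | sort -u
}

# Examples:
#   files_by_command chrome
#   names_by_pid 2204
```

The -F output is meant for scripts, so it is less fragile to parse than the column layout shown above.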
This is a classic application for dtrace.
I can’t remember the syntax exactly, but you can have a trace fire every time a file is opened by any process on the system. It can be done on a running system with far less overhead than I expected. If you’re running Solaris as an administrator, dtrace is your best friend. Even if you’re not a programmer, it is quite simple to learn and a VERY powerful system query tool.
I cannot test it: it does not seem to be installed on my Solaris servers. If you can post an example, that would help.
On some Unix systems (for example, Linux), every file opened by a process has a file descriptor (FD) number, and the descriptors are listed under /proc/PID/fd.
ls -la /proc/2055/fd
total 0
dr-x------ 2 kent kent 0 Nov 19 21:44 .
dr-xr-xr-x 7 kent kent 0 Nov 19 21:42 ..
lr-x------ 1 kent kent 64 Nov 19 21:44 0 -> /dev/null
l-wx------ 1 kent kent 64 Nov 19 21:44 1 -> /home/kent/.xsession-errors
lrwx------ 1 kent kent 64 Nov 19 21:44 10 -> socket:[3977613]
lrwx------ 1 kent kent 64 Nov 19 21:44 11 -> /home/kent/.googleearth/Cache/dbCache.dat
lrwx------ 1 kent kent 64 Nov 19 21:44 12 -> /home/kent/.googleearth/Cache/dbCache.dat.index
lrwx------ 1 kent kent 64 Nov 19 21:44 13 -> socket:[3978765]
lrwx------ 1 kent kent 64 Nov 19 21:44 14 -> socket:[3978763]
lrwx------ 1 kent kent 64 Nov 19 21:44 15 -> socket:[3978766]
lrwx------ 1 kent kent 64 Nov 19 21:44 17 -> socket:[3978764]
l-wx------ 1 kent kent 64 Nov 19 21:44 2 -> /home/kent/.xsession-errors
lr-x------ 1 kent kent 64 Nov 19 21:44 3 -> pipe:[3977583]
l-wx------ 1 kent kent 64 Nov 19 21:44 4 -> pipe:[3977583]
lr-x------ 1 kent kent 64 Nov 19 21:44 5 -> pipe:[3977584]
l-wx------ 1 kent kent 64 Nov 19 21:44 6 -> pipe:[3977584]
lr-x------ 1 kent kent 64 Nov 19 21:44 7 -> pipe:[3977587]
l-wx------ 1 kent kent 64 Nov 19 21:44 8 -> pipe:[3977587]
lrwx------ 1 kent kent 64 Nov 19 21:44 9 -> socket:[3977588]
Additionally, sometimes you even get "FDINFO" (I think this depends on a kernel option on Linux):
cat /proc/2055/fdinfo/11
pos: 232741818
flags: 02
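The flags value in fdinfo is the octal open(2) flag word. Its two lowest bits give the access mode (0 read-only, 1 write-only, 2 read-write), so the flags: 02 above means the file is open for both reading and writing. Here is a rough sketch that combines /proc/PID/fd with /proc/PID/fdinfo. It assumes Linux with /proc mounted, and the function name is invented for this example:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print "FD MODE POS TARGET" for each descriptor of a PID.
# Assumes Linux with /proc mounted; reading another user's process needs root.
fd_report() {
    local pid=$1 link fd target info flags pos mode
    for link in /proc/"$pid"/fd/*; do
        [ -L "$link" ] || continue               # glob matched nothing
        fd=${link##*/}
        target=$(readlink "$link") || continue   # descriptor closed meanwhile
        info=/proc/$pid/fdinfo/$fd
        flags=$(awk '$1 == "flags:" { print $2 }' "$info" 2>/dev/null)
        pos=$(awk '$1 == "pos:" { print $2 }' "$info" 2>/dev/null)
        mode='?'
        if [ -n "$flags" ]; then
            # The kernel prints flags with a leading 0, so the shell reads
            # the value as octal; the two lowest bits are the access mode.
            case $(( $flags & 3 )) in
                0) mode=r ;;
                1) mode=w ;;
                2) mode=rw ;;
            esac
        fi
        printf '%s\t%s\t%s\t%s\n' "$fd" "$mode" "${pos:-?}" "$target"
    done
}

# Example: fd_report 2055
```

The pos column is the current file offset, which gives a crude progress indicator for a process that is reading or writing a large file.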
Linux — How to track all files accessed by a process?
Is there a way to track all file I/O for a given process? All I really need is the locations of files being read from or written to by a given process (and ideally whether it was a read or a write operation, although that’s not as important). I can run the process and track it rather than needing to attach to an existing process, which I would assume is significantly simpler. Is there any kind of wrapper utility I can run a process through that will monitor file access?
lsof :
As a starter, try lsof with the -p option:
This lists all currently open files, file descriptors, and sockets for the process with the given process ID.
For your special needs, see what I can offer as a solution to monitor a php script :
php foo.php &
_pid=$!
lsof -r1 -p $_pid
kill %1 # if you want to kill php script
strace :
I recommend the use of strace . Unlike lsof , it stays running for as long as the process is running. It will print out which syscalls are being called when they are called. -e trace=file filters only for syscalls that access the filesystem:
sudo strace -f -t -e trace=file php foo.php
or for an already running process :
sudo strace -f -t -e trace=file -p PID
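The trace is verbose, so it often helps to reduce it to the list of paths that were opened successfully. Here is a minimal sketch, assuming strace's default line format (for example `openat(AT_FDCWD, "path", flags) = 3`) and paths without embedded double quotes. The function name and the trace.log file name are placeholders:

```shell
#!/usr/bin/env bash
# Hypothetical filter: read strace output on stdin and print each path that
# an open()/openat() call opened successfully (the call returned a number).
# Failed calls end in "= -1 ERRNO (...)" and are skipped.
opened_paths() {
    sed -nE 's/.*open(at)?\([^"]*"([^"]*)".*= [0-9]+$/\2/p' | sort -u
}

# Example: write the trace to a file, then summarize it.
#   strace -f -e trace=file -o trace.log php foo.php
#   opened_paths < trace.log
```

Other file-related syscalls that -e trace=file reports, such as stat or access, are ignored by this filter on purpose: it only answers "which files were actually opened".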
Thanks, that’s a good starting point! It works for processes that are already running at the moment it’s run. I’m trying to do this for a PHP script for its entire execution, tracking the files from the start of the process until it exits. Looking at the help, there’s a -r repeat option, but this seems to periodically scan the files that are currently open rather than report the files that have been opened. Essentially I want to do this: lsof -p $$ && exec php foo.php. This doesn’t seem to list files that are opened by foo.php.
Thanks, that’s certainly providing more relevant information and showing all the PHP extensions being loaded. The script opens file.txt; unfortunately, file.txt is not listed in the output. I can verify the file is being opened by amending the script to print the contents of file.txt, but I still don’t see file.txt in the output of lsof.
To properly trace an AppImage, I needed to run strace as root but the command using my own user. This got the job done: sudo strace -fte trace=%file -u $(id -un)
Mixing your two solutions together becomes perfect: php foo.php & sudo strace -f -t -e trace=file -p $! especially for short running tasks.
Besides strace, there is another option that does not substantially slow down the monitored process. Using the Linux kernel’s fanotify (not to be confused with the more popular inotify), it is possible to monitor whole mount points for I/O activity. With unshared mount namespaces, the mounts of a given process can be isolated from the rest of the system (a key technology behind Docker).
An implementation of this concept can be found in shournal, which I am the author of.
$ shournal -e sh -c 'cat foo > bar'
$ shournal --query --history 1
.
1 written file(s):
/home/user/bar
1 read file(s):
/home/user/foo
External links are always highly appreciated as sources, but imagine this one was to become invalid — your solution would be unsalvageable for future SO users. Please consider posting code here and explaining your solution so we all can learn.
@harmonica141: That’s always the problem: what to write and what to omit. A complete, minimal example would be not much shorter than the example at the bottom at man7.org/linux/man-pages/man7/fanotify.7.html . In fact, it could be almost the same with a leading unshare( CLONE_NEWNS); . Do you think it would be helpful to include the full source here?
strace is an amazing tool but its output is a bit verbose.
If you want, you can use a tool I’ve written that processes strace output and provides a CSV report of all files accessed (TCP sockets too) with the following data:
1. Filename
2. Read/Written bytes
3. Number of read/write operations
4. Number of times the file was opened
It can be run on new processes or processes already running (using /proc/fd data).
I found it useful for debugging scenarios and performance analysis.
You can find it here: iotrace
Filename, Read bytes, Written bytes, Opened, Read op, Write op
/dev/pts/1,1,526512,0,1,8904
socket_127.0.0.1:47948->127.0.0.1:22,1781764,396,0,8905,11
myfile.txt,65,0,9,10,0
pipe:[3339],0,0,0,1,0
Afterward, you can process the CSV data in Excel or other tools for sorting or other analysis required.
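For quick checks, the report can also be ranked directly in the shell. Here is a small sketch, assuming the column order of the sample report (written bytes in the third column) and file names that contain no commas. The function name and report.csv are made up for illustration:

```shell
#!/usr/bin/env bash
# Hypothetical helper: show the N entries with the most written bytes.
# Reads the CSV report on stdin; column 3 is "Written bytes".
top_writers() {
    tail -n +2 |                # drop the header line
        sort -t, -k3,3nr |      # numeric sort on column 3, largest first
        head -n "${1:-5}"       # keep the first N lines (default 5)
}

# Example: top_writers 3 < report.csv
```

Changing -k3,3nr to -k2,2nr ranks by bytes read instead.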
The downside is you need to download & compile and it isn’t always 100% accurate.
HOW TO : Find list of files used by a process in Linux
Quick howto on finding the list of files being accessed by a process in Linux. I needed to find this for troubleshooting an issue where a particular process was using an abnormally high percentage of CPU. I wanted to find out what this particular process was doing and accessing.
- Find the process ID (pid) of the process you want to analyze by running [code]ps -ef | grep NAME_OF_PROCESS[/code]
- Find the files the process is accessing at a given time by running [code]sudo ls -l /proc/PROCESS_ID/fd[/code]
For example, to find the list of files being accessed by mysql, I would first run
[code] ps -ef | grep mysqld [/code]
which would show the output as
[code]$ ps -ef | grep mysqld
mysql 3304 1 0 Feb04 ? 00:00:23 /usr/sbin/mysqld
samurai 23389 23374 0 14:57 pts/0 00:00:00 grep --color=auto mysqld
[/code]
I can then find the list of files being used by mysql by running
[code] sudo ls -l /proc/3304/fd [/code]
[code]lrwx------ 1 root root 64 Feb 7 15:00 0 -> /dev/null
lrwx------ 1 root root 64 Feb 7 15:00 1 -> /var/log/mysql/error.log
lrwx------ 1 root root 64 Feb 7 15:00 10 -> socket:[4958]
lrwx------ 1 root root 64 Feb 7 15:00 11 -> /tmp/ibdu9WRh (deleted)
lrwx------ 1 root root 64 Feb 7 15:00 12 -> socket:[4959]
lrwx------ 1 root root 64 Feb 7 15:00 14 -> /var/lib/mysql/blog/wp_term_relationships.MYI
lrwx------ 1 root root 64 Feb 7 15:00 15 -> /var/lib/mysql/blog/wp_postmeta.MYI
lrwx------ 1 root root 64 Feb 7 15:00 17 -> /var/lib/mysql/blog/wp_term_relationships.MYD
lrwx------ 1 root root 64 Feb 7 15:00 18 -> /var/lib/mysql/blog/wp_term_taxonomy.MYI
lrwx------ 1 root root 64 Feb 7 15:00 2 -> /var/log/mysql/error.log
lrwx------ 1 root root 64 Feb 7 15:00 20 -> /var/lib/mysql/blog/wp_postmeta.MYD
lrwx------ 1 root root 64 Feb 7 15:00 21 -> /var/lib/mysql/blog/wp_term_taxonomy.MYD
lrwx------ 1 root root 64 Feb 7 15:00 22 -> /var/lib/mysql/blog/wp_terms.MYI
lrwx------ 1 root root 64 Feb 7 15:00 23 -> /var/lib/mysql/blog/wp_terms.MYD
lrwx------ 1 root root 64 Feb 7 15:00 3 -> /var/lib/mysql/ibdata1
lrwx------ 1 root root 64 Feb 7 15:00 4 -> /tmp/ibvANyz7 (deleted)
lrwx------ 1 root root 64 Feb 7 15:00 5 -> /tmp/ibonS0mU (deleted)
lrwx------ 1 root root 64 Feb 7 15:00 6 -> /tmp/ibcKctaH (deleted)
lrwx------ 1 root root 64 Feb 7 15:00 7 -> /tmp/ibB5DS5t (deleted)
lrwx------ 1 root root 64 Feb 7 15:00 8 -> /var/lib/mysql/ib_logfile0
lrwx------ 1 root root 64 Feb 7 15:00 9 -> /var/lib/mysql/ib_logfile1
[/code]
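The two steps can be combined into one helper. This is a sketch under a few assumptions: Linux with /proc, pgrep available, and enough privileges to read the target's fd directory (run it as root for other users' processes). The function name is invented for this example:

```shell
#!/usr/bin/env bash
# Hypothetical helper: list what every process named exactly $1 has open.
# Prints one "PID FD -> TARGET" line per open descriptor.
files_of() {
    local name=$1 pid link
    for pid in $(pgrep -x "$name"); do
        for link in /proc/"$pid"/fd/*; do
            [ -L "$link" ] || continue    # unreadable or empty fd directory
            printf '%s %s -> %s\n' "$pid" "${link##*/}" "$(readlink "$link")"
        done
    done
}

# Example (as root): files_of mysqld
```

Using pgrep -x avoids the extra "grep mysqld" line that shows up in the ps output, because it matches the process name exactly instead of searching the whole command line.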