How to find out which files the program has accessed?
(Ubuntu Linux) I started a program; how can I find out which files and I/O this program has accessed? I know there is software that makes it easy to get this information on Windows.
2 Answers
You could be asking one of two questions:
Q: How do I find out which files a currently running program has open?
A: Use the lsof tool (see the next answer for details).
Q: I want to know which files a program opens when it runs.
A: Use the strace tool. Read its man page and other documentation for more information.
P.S. Using strace will slow down the program’s execution noticeably.
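For example, a minimal sketch (./myprog and trace.log here are just placeholders for your own program and an output file): run the program under strace, log every file-related syscall, and inspect the log afterwards.
strace -f -e trace=file -o trace.log ./myprog
grep -E 'open|openat|creat' trace.log   # show only the calls that open or create files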
lsof is great for finding all the open files of a program by its PID. Just run lsof -a -p <PID>. It can also show open sockets. A man page for it is at http://linux.die.net/man/8/lsof. You may have to run lsof as root, unless it is setuid root and executable by others.
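For example (PID 1234 below is only a placeholder for your program’s process ID):
lsof -a -p 1234       # all files the process currently has open
lsof -a -p 1234 -i    # restrict the listing to its network sockets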
For I/O utilization, try iotop. iotop requires Python; more information about it is at http://guichaz.free.fr/iotop/. You should be able to install both tools using apt-get.
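On Ubuntu, installing and trying them out might look roughly like this (again, 1234 stands in for the real PID):
sudo apt-get install lsof iotop
sudo iotop -o -p 1234    # -o shows only threads that are actually doing I/O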
Neither of these tools makes it easy to find files that a program accessed right after starting and then quickly closed. This happens with many applications that read a configuration file on startup, for example. To find these sorts of files, I wrote a utility that recursively finds all files in a directory which have been created, accessed or modified within a given number of seconds.

If you have no idea where the program may have accessed files, it can search the entire system from /. If you really want to localize the files it may have accessed, the program could be run inside a chroot to confine it.

Below is the utility, which I call tfind; it is implemented in Perl. When run with no arguments it prints its usage. One way to use it is to start your program, wait a few seconds, then run tfind --atime 15 --ctime 15 --mtime 15 / to see all the files that have been accessed, created or modified in the past 15 seconds, and then eliminate the files that are known to be accessed by other programs, for example /var/log/messages. The remaining files are probably those accessed by your program.
#!/usr/bin/perl
# tfind - find files less than n seconds old
# usage: tfind [ --atime i --ctime j --mtime k ] path
use strict;
use warnings;
use File::Find;
use Getopt::Long;

my $prog = $0;
$prog =~ s,.*/,,;    # basename of the script

my $usage = <<"EOF";
$prog finds files in path that have been accessed, changed or modified
within a given number of seconds. The bundled find utilities only detect
time changes to the nearest day, but for monitoring it can be useful to
find files touched within shorter time periods.

Usage: $prog [ --help --atime i --ctime j --mtime k ] path

Options
  --atime i   true if the file was accessed within the last i seconds
  --ctime j   true if the file's status was changed within the last j seconds
  --mtime k   true if the file's data was modified within the last k seconds
  --help      show this help screen
(i, j and k must each be a positive integer or 0)

Examples
  $prog --atime 2 dir      files in dir accessed within the last 2 seconds
  $prog --ctime 600 dir    files in dir with status changes in the last 10 minutes
  $prog --mtime 3600 dir   files in dir modified within the last hour
  $prog --atime 2 --ctime 600 --mtime 3600 dir
                           files in dir meeting all three conditions
EOF

my ($opt_help, $opt_atime, $opt_ctime, $opt_mtime) = ('', '', '', '');
GetOptions(
    "help"    => \$opt_help,
    "atime=s" => \$opt_atime,
    "ctime=s" => \$opt_ctime,
    "mtime=s" => \$opt_mtime,
) or die $usage;

if ($opt_help or @ARGV != 1) { print $usage; exit; }
my $path = shift;

# Each option that was given must be a positive integer or 0.
for ( [ atime => \$opt_atime ], [ ctime => \$opt_ctime ], [ mtime => \$opt_mtime ] ) {
    my ($name, $ref) = @$_;
    next if $$ref eq '';
    $$ref =~ s/^\s+|\s+$//g;                  # trim surrounding whitespace
    unless ($$ref =~ /^\+?\d+$/) {
        print "$name argument \"$$ref\" is not a positive integer or 0.\n";
        exit 1;
    }
}

# Nothing to report if no time tests were requested.
exit unless length $opt_atime or length $opt_ctime or length $opt_mtime;

# Walk the tree and print every file that satisfies all requested tests.
find(sub {
    my $now = time;
    my ($atime, $mtime, $ctime) = (lstat $_)[8, 9, 10];
    return unless defined $ctime;             # skip files we cannot stat
    return if length $opt_atime and $now - $atime >= $opt_atime;
    return if length $opt_mtime and $now - $mtime >= $opt_mtime;
    return if length $opt_ctime and $now - $ctime >= $opt_ctime;
    print "$File::Find::name\n";
}, $path);
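Putting it together, one possible session could look like the following; the program name, the output file and the grep filters are only illustrative.
./myprog &                                   # start the program to be observed
sleep 15                                     # give it time to read its config files, etc.
tfind --atime 15 --ctime 15 --mtime 15 / > recent-files.txt
grep -v -e '^/proc/' -e '^/var/log/' recent-files.txt   # drop files touched by other activity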
Linux — How to track all files accessed by a process?
Is there a way to track all file I/O for a given process? All I really need is the locations of files being read from or written to by a given process (and ideally whether it was a read or a write operation, although that’s not as important). I can run the process and track it rather than needing to attach to an existing process, which I would assume is significantly simpler. Is there any kind of wrapper utility I can run a process through that will monitor file access?
4 Answers
lsof:
Try doing this as a starter:
lsof -p <PID>
This command will list all currently open files, file descriptors and sockets for the process with the given process ID.
For your specific need, here is what I can offer as a solution to monitor a PHP script:
php foo.php &
_pid=$!
lsof -r1 -p $_pid
kill %1    # if you want to kill the php script
strace :
I recommend the use of strace. Unlike lsof, it stays running for as long as the process is running. It will print out which syscalls are being called as they are called. -e trace=file filters for only those syscalls that access the filesystem:
sudo strace -f -t -e trace=file php foo.php
or, for an already running process:
sudo strace -f -t -e trace=file -p <PID>
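If the live output is too noisy, one option is to write the trace to a file and search it afterwards; the log path and grep target below are just examples:
sudo strace -f -t -e trace=file -o /tmp/php-files.log php foo.php
grep file.txt /tmp/php-files.log    # did the script ever touch file.txt?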
Thanks, that’s a good starting point! It works for processes already running at the moment it’s run. I’m trying to do this for a PHP script over its entire execution, tracking the files from the start of the process until it exits. Looking at the help, there’s a -r repeat option, but this seems to periodically scan the files that are open by the process rather than the files that have been opened. Essentially I want to do this: lsof -p $$ && exec php foo.php, but this doesn’t seem to list files that are opened by foo.php.
Thanks, that’s certainly providing more relevant information and showing all the PHP extensions being loaded. Unfortunately, file.txt (which the script opens) is not listed in the output. I can verify the file is being opened by amending the script to print the contents of file.txt, but I still don’t see file.txt in the output of lsof.
To properly trace an AppImage, I needed to run strace as root but run the traced command as my own user. This got the job done: sudo strace -fte trace=%file -u $(id -un)
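A complete invocation might look roughly like this, where Some.AppImage is only a stand-in for the actual file (strace writes its trace to stderr, hence the 2> redirection):
sudo strace -f -e trace=%file -u "$(id -un)" ./Some.AppImage 2> appimage-files.log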
Mixing your two solutions together works perfectly, especially for short-running tasks: php foo.php & sudo strace -f -t -e trace=file -p $!
Besides strace there is another option which does not substantially slow down the monitored process. Using the Linux kernel’s fanotify (not to be confused with the more popular inotify), it is possible to monitor whole mount points for I/O activity. With unshared mount namespaces, the mounts of a given process can be isolated from the rest of the system (a key technology behind Docker).
An implementation of this concept can be found in shournal, of which I am the author.
$ shournal -e sh -c 'cat foo > bar'
$ shournal --query --history 1
  1 written file(s): /home/user/bar
  1 read file(s): /home/user/foo
External links are always highly appreciated as sources, but imagine this one was to become invalid — your solution would be unsalvageable for future SO users. Please consider posting code here and explaining your solution so we all can learn.
@harmonica141: That’s always the problem: what to write and what to omit. A complete, minimal example would not be much shorter than the example at the bottom of man7.org/linux/man-pages/man7/fanotify.7.html. In fact, it could be almost the same, with a leading unshare(CLONE_NEWNS);. Do you think it would be helpful to include the full source here?
strace is an amazing tool but its output is a bit verbose.
If you want, you can use a tool I’ve written which processes strace output and provides a CSV report of all files accessed (TCP sockets too) with the following data:
1. Filename
2. Read/Written bytes
3. Number of read/write operations
4. Number of time the file was opened
It can be run on new processes or on processes already running (using /proc/<pid>/fd data).
I found it useful for debugging scenarios and performance analysis.
You can find it here: iotrace
Filename, Read bytes, Written bytes, Opened, Read op, Write op
/dev/pts/1,1,526512,0,1,8904
socket_127.0.0.1:47948->127.0.0.1:22,1781764,396,0,8905,11
myfile.txt,65,0,9,10,0
pipe:[3339],0,0,0,1,0
Afterward, you can process the CSV data in Excel or other tools for sorting or whatever other analysis is required.
The downside is that you need to download and compile it, and it isn’t always 100% accurate.