Linux delete open file

(How) does deleting open files on Linux and a FAT file system work?

It’s clear to me how deleting open files works on filesystems that use inodes — unlink() just decreases the link count to zero, and when the last file handle to the file is closed, the inode will be removed. But how does it work when using a file system that doesn’t use inodes, like FAT32, with Linux? Some experiments suggest that deleting open files is still possible (unlike on Windows, where the unlink call wouldn’t succeed), but what happens when the file system is uncleanly unmounted? How does Linux mark the files as unlinked, when the file system itself doesn’t support such an operation? Is the directory entry just deleted, but retained in memory (that would guarantee deletion after unmounting in any case, but would leave the file system in an inconsistent state), or will the deletion only be marked in memory, and written at the time the last file handle is closed, avoiding possible corruption, but restoring the deleted files after an unclean unmount?
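
To make the experiment concrete, here is a minimal C sketch of what "deleting an open file" looks like from user space. The path /mnt/usb/testfile is an assumption; it only exercises the FAT-specific behaviour discussed below if it lies on a vfat mount.

/* unlink_open.c - minimal sketch: unlink a file while keeping it open.
 * The path is an assumption; point it at a file on a vfat mount. */
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
    const char *path = "/mnt/usb/testfile";   /* assumed vfat mount point */
    int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0644);
    if (fd < 0) { perror("open"); return 1; }

    if (unlink(path) < 0) {                   /* succeeds on Linux, even on vfat */
        perror("unlink");
        return 1;
    }

    /* The directory entry is gone, but the open descriptor still works. */
    const char msg[] = "still writable after unlink\n";
    if (write(fd, msg, sizeof msg - 1) < 0)
        perror("write");

    pause();   /* keep the file open; uncleanly unmounting the medium now leaves
                  clusters that fsck.vfat will later report as orphaned */
    return 0;
}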

What makes you feel that deleting the file and having the processes simply use its cached content in memory would corrupt the file system?

@jlliagre: I believe lxgr meant to say, «Does Linux just delete (i.e., clear) the directory entry (on the disk), but leave all the file’s data blocks on the disk, allocated (i.e., not releasing them to the free list, or whatever FAT’s equivalent of the free list is), retaining in memory sufficient control information to allow the process(es) that have the file open to continue to access it, but flagging it (in memory) as a deleted file, so the blocks will be released when the last process closes the file?» Because that would leave allocated blocks with no way to find them after a hard crash.

@Scott, yes, that is exactly what I meant. It seems that Linux actually does that, because when I uncleanly unmount a FAT file system with opened but deleted files on it, there is always some corruption detected by fsck.vfat, and it always is about blocks not marked as free but also not part of any file — which would suggest that the directory entry is deleted, but the corresponding blocks in the FAT are not set to show up as free space until the last handle is closed.

1 Answer

You are correct in your assumption that while all directory entries are deleted immediately after calling unlink(), the actual blocks that physically make up the file are only cleared on disk when nothing is using the inode anymore. (I say «directory entries» because in vfat, a file can actually have several of those, because of how vfat’s long file name support is implemented.)

In this context, by inode, I mean the structure in memory that the Linux kernel uses for handling files. It is used even when the filesystem is not «inode based». In the case of vfat, the inode is simply backed by some blocks on disk.


Taking a look at the Linux kernel source code, we see that vfat_unlink, which implements the unlink() system call for vfat, does roughly the following (extremely simplified for illustration):

static int vfat_unlink(struct inode *dir, struct dentry *dentry)
{
        struct inode *inode = dentry->d_inode;
        struct fat_slot_info sinfo;
        int err;

        /* ... locking and lookup of the directory entry omitted ... */
        err = fat_remove_entries(dir, &sinfo);
        if (err)
                return err;
        clear_nlink(inode);
        /* ... timestamp updates, detaching and unlocking omitted ... */
        return err;
}

  1. fat_remove_entries simply removes the entry for the file in its directory.
  2. clear_nlink sets the link count for the inode to 0, which means that no file (i.e. no directory entry) points to this inode anymore.

Note that at this point, neither the inode nor its physical representation are touched in any way (except for the decreased link count), so it still happily exists in memory and on disk, as if nothing happened!

(By the way, it’s also interesting to note that vfat_unlink always sets the link count to 0 instead of just decrementing it with drop_nlink. This should tip you off that FAT filesystems do not support hard links, and it is a further indication that FAT itself does not know of any separate inode concept.)
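
That point is easy to verify from user space: since the vfat driver provides no link operation, the VFS is expected to reject hard link creation on a vfat mount (on current kernels typically with EPERM). A minimal sketch, with /mnt/usb assumed to be a vfat mount and the file names hypothetical:

/* hardlink_vfat.c - sketch: hard links are not supported on FAT filesystems.
 * Paths are assumptions; /mnt/usb is taken to be a vfat mount. */
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

int main(void)
{
    int fd = open("/mnt/usb/original", O_WRONLY | O_CREAT, 0644);
    if (fd < 0) { perror("open"); return 1; }
    close(fd);

    /* vfat defines no ->link inode operation, so the VFS refuses this. */
    if (link("/mnt/usb/original", "/mnt/usb/hardlink") < 0)
        printf("link failed as expected: %s\n", strerror(errno));
    else
        printf("link unexpectedly succeeded\n");

    return 0;
}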

Now let’s take a look at what happens when the inode is evicted. evict_inode is called when we do not want the inode in memory anymore. At its earliest, this can of course only happen when no process holds any open file descriptor to that inode anymore (but may in theory also happen at a later time). The FAT implementation for evict_inode looks (again, simplified) like this:

static void fat_evict_inode(struct inode *inode)
{
        truncate_inode_pages(&inode->i_data, 0);
        if (!inode->i_nlink) {
                inode->i_size = 0;
                fat_truncate_blocks(inode, 0);
        }
        invalidate_inode_buffers(inode);
        clear_inode(inode);
}

The magic happens exactly within the if clause: if the inode’s link count was 0, it means that no directory entry is actually pointing to it. So we set its size to 0 and truncate it down to 0 bytes, which deletes it from disk by freeing up the blocks it was made of.

So, the corruption you are experiencing in your experiments is easily explained: just as you suspected, the directory entry has already been removed (by vfat_unlink), but because the inode wasn’t evicted yet, the actual blocks were still untouched, and were still marked in the FAT (an acronym for File Allocation Table) as used. fsck.vfat, however, detects that there is no directory entry which points to those blocks anymore, complains, and repairs it.

By the way, CHKDSK would not just clear those blocks by marking them as free, but create new files in the root directory pointing to the first block in each chain, with names like FILE0001.CHK.


What happens internally when deleting an opened file in Linux

The question is: why is the «deleted» file still accessible by the process which opened it? And how is that done by the operating system?

EDIT: By UFDT I mean the file descriptor table of the process, which holds the file descriptors of the files opened by the process (each process has its own UFDT), and by GFDT the global file descriptor table; there is only one GFDT in the system (in RAM, in our case).


2 Answers

I have never really heard about those UFDT and GFDT acronyms, but your view of the system sounds mostly right. I think you lack some detail in your description of how open files are managed by the kernel, and perhaps this is where your confusion comes from. I’ll try to give a more detailed description.

First, there are three data structures used to keep track of and manage open files:

  • Each process has a table of file descriptors. Each entry in this table stores a file descriptor and the file descriptor status flags (as of now, the only such flag is O_CLOEXEC ). The file descriptor is just a pointer to an entry in the opened files table, which I cover next. The integer returned by open(2) and family is usually an index into this file descriptor table; each process has its own table, which is why open(2) and family may return the same value for different processes opening different files.
  • There is one opened files table in the entire system. Each file descriptor table entry of each process references one of these entries in the opened files table. There is one entry in this table for each opened file: if two processes open the same file, two entries in this global table are created, even though it’s the same file. Each entry in the files table stores the file status flags (opened for reading, writing, appending, etc.) and the current file offset. This is why different processes can read from and write to different offsets in the same file concurrently, as long as each of them opened the file on its own (see the sketch after this list).
  • Each entry in the opened files table also references an entry in the vnode table. The vnode table is a global table that has one entry for each unique file. If processes A, B, and C open file D, there will be only one vnode table entry, referenced by all 3 of the file table entries (in Linux, there is really no vnode; rather there is an inode, but let’s keep this description generic and conceptual). The vnode entry contains pretty much the same information as the traditional inode (file size, other attributes, etc.), but it also contains other information useful for opened files, such as file locks that are active, who owns them, which portions of the file they lock, etc. This vnode entry also stores pointers to the file’s data blocks on disk.
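
To make the middle layer concrete, here is a minimal sketch (the file name example.txt is hypothetical) showing that two separate open() calls on the same file get separate entries in the opened files table, and therefore independent offsets, while a descriptor duplicated with dup() shares the same entry and the same offset:

/* offsets.c - sketch: separate open() calls get separate open-file-table
 * entries (independent offsets); dup()ed descriptors share one entry
 * (shared offset). "example.txt" is a hypothetical file name. */
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
    int fd1 = open("example.txt", O_RDWR | O_CREAT | O_TRUNC, 0644);
    int fd2 = open("example.txt", O_RDONLY);   /* second, independent entry */
    int fd3 = dup(fd1);                        /* shares fd1's entry */
    if (fd1 < 0 || fd2 < 0 || fd3 < 0) { perror("open/dup"); return 1; }

    if (write(fd1, "hello", 5) != 5)           /* moves fd1's (and fd3's) offset */
        perror("write");

    printf("fd1 offset: %ld\n", (long)lseek(fd1, 0, SEEK_CUR));  /* 5 */
    printf("fd2 offset: %ld\n", (long)lseek(fd2, 0, SEEK_CUR));  /* 0 */
    printf("fd3 offset: %ld\n", (long)lseek(fd3, 0, SEEK_CUR));  /* 5, shared with fd1 */
    return 0;
}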

Deleting a file consists of calling unlink(2). This function unlinks a file from a directory. Each file inode on disk has a count of the number of links pointing to it; the file is only really removed if the link count reaches 0 and it is not open (or 2 in the case of directories, since a directory references itself and is also referenced by its parent). In fact, the manpage for unlink(2) is very specific about this behavior:

unlink — delete a name and possibly the file it refers to

So, instead of looking at unlinking as deleting a file, look at it as deleting a file name, and maybe the file it refers to.
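
The link count the answer refers to is directly observable from user space with stat(2). A minimal sketch, with hypothetical file names, run on a filesystem that supports hard links:

/* nlink.c - sketch: watch st_nlink change as names are added and removed.
 * File names are hypothetical; run on a filesystem with hard link support. */
#include <fcntl.h>
#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

static void show(const char *path)
{
    struct stat st;
    if (stat(path, &st) == 0)
        printf("%s: nlink = %lu\n", path, (unsigned long)st.st_nlink);
}

int main(void)
{
    close(open("name1", O_WRONLY | O_CREAT | O_TRUNC, 0644));
    show("name1");              /* nlink = 1 */

    link("name1", "name2");     /* add a second name for the same inode */
    show("name1");              /* nlink = 2 */

    unlink("name1");            /* "delete a name"; the inode survives */
    show("name2");              /* nlink = 1, data still reachable via name2 */

    unlink("name2");            /* last name gone: the file itself is removed */
    return 0;
}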


When unlink(2) detects that there is an active vnode table entry referring to this file, it doesn’t delete the file from the filesystem. Nothing else happens. Yes, you can’t find the file on your filesystem anymore. find(1) won’t find it. You can’t open it in new processes.

But the file is still there. It just doesn’t appear in any directory entry.

For example, if it’s a huge file, and if you run df or du, you will see that space usage is the same. The file is still there, on disk, you just can’t reach it.

So, any reads or writes take place as usual — the file data blocks are accessible through the vnode table entry. You can still know the file size. And the owner. And the permissions. All of it. Everything’s there.

When the process terminates or explicitly closes the file, the operating system checks the inode. If the number of links pointing to the inode is 0 and this was the last process holding the file open (the vnode table entry keeps a reference count of opens for exactly this purpose), then the file is purged.
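
Putting the pieces together, here is a minimal sketch (file name hypothetical, the /proc inspection Linux-specific) of a single process observing all of this: after unlink() the name is gone and fstat() reports a link count of 0, yet the data is still readable through the descriptor, and the blocks are only released when the descriptor is closed.

/* deleted_but_open.c - sketch: an unlinked file stays fully usable while open.
 * The file name is hypothetical. The /proc/self/fd check is Linux-specific. */
#include <fcntl.h>
#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

int main(void)
{
    int fd = open("scratch.dat", O_RDWR | O_CREAT | O_TRUNC, 0644);
    if (fd < 0) { perror("open"); return 1; }
    if (write(fd, "persistent data", 15) != 15) perror("write");

    unlink("scratch.dat");                     /* removes the name only */

    struct stat st;
    if (fstat(fd, &st) == 0)                   /* nlink is 0, size is still 15 */
        printf("st_nlink = %lu, st_size = %lld\n",
               (unsigned long)st.st_nlink, (long long)st.st_size);

    char buf[16] = {0};
    if (pread(fd, buf, 15, 0) > 0)             /* the data is all still there */
        printf("read back: %s\n", buf);

    char proc[64], target[256];
    snprintf(proc, sizeof proc, "/proc/self/fd/%d", fd);
    ssize_t n = readlink(proc, target, sizeof target - 1);
    if (n > 0) {
        target[n] = '\0';
        printf("%s -> %s\n", proc, target);    /* ends in " (deleted)" on Linux */
    }

    close(fd);                                 /* now the blocks are freed */
    return 0;
}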


Deleting an open file

Does anyone know whether it is possible to delete a file that a stream of data is being written to, i.e. the file is open for writing by another application, and at some point I need to truncate it to zero size, on Unix/Linux? I tried echo > file, echo -n > file, head -c 0 > file, cat /dev/null > file, but the size is not reset!

4 Answers

If the file is not locked (and usually it is not), it can easily be deleted. However, the file disappearing from the directory tree does not mean it has actually been removed. In fact, the file will continue to exist until the application writing to it closes it. At the same time, the file can no longer be opened. In this intermediate state it will keep occupying space on disk. I am not sure this behavior holds on every file system, but under ext3 I have definitely observed it.

PS: so the answer to your question is that this is a bad idea after all, and it is worth refraining from it. All the more so because it will definitely not be portable to other platforms (such as Windows).

lsof takes a file name as input, and since the file has been deleted, there is no such file any more. Consequently, lsof reports that the file is not found. But in fact the body of the file still exists, even though there are no longer any links to it from the directory tree.

cy6ergn0m@cgmachine ~ $ cat - > ~/delete &
[1] 22129
cy6ergn0m@cgmachine ~ $
[1]  + suspended (tty input)  cat - > ~/delete
cy6ergn0m@cgmachine ~ $ /usr/sbin/lsof ~/delete
COMMAND   PID      USER   FD   TYPE DEVICE SIZE/OFF     NODE NAME
cat     21808 cy6ergn0m   1w    REG    8,7        0 19128453 /home/cy6ergn0m/delete
cy6ergn0m@cgmachine ~ $ rm -f ~/delete
cy6ergn0m@cgmachine ~ $ /usr/sbin/lsof ~/delete
lsof: status error on /home/cy6ergn0m/delete: No such file or directory
lsof 4.83
 latest revision: ftp://lsof.itap.purdue.edu/pub/tools/unix/lsof/
 latest FAQ: ftp://lsof.itap.purdue.edu/pub/tools/unix/lsof/FAQ
 latest man page: ftp://lsof.itap.purdue.edu/pub/tools/unix/lsof/lsof_man
 usage: [-?abhlnNoOPRtUvVX] [+|-c c] [+|-d s] [+D D] [+|-f[gG]] [-F [f]]
 [-g [s]] [-i [i]] [+|-L [l]] [+m [m]] [+|-M] [-o [o]] [-p s]
 [+|-r [t]] [-s [p:s]] [-S [t]] [-T [t]] [-u s] [+|-w] [-x [fl]] [--] [names]
Use the ``-h'' option to get more help information.

