Linux async file io

Linux Asynchronous I/O

Asynchronous I/O (AIO) is a method for performing I/O operations so that the process that issued an I/O request is not blocked until the operation completes. Instead, after submitting an I/O request, the process continues to execute its code and can check the status of the request later.

There are several means to accomplish asynchronous I/O in Linux:

  • kernel system calls
  • a user-space library that uses the kernel system calls internally (libaio)
  • AIO emulated entirely in user space, without any kernel support (currently librt, which is part of libc)

I/O Models

Mode          | Blocking                              | Non-blocking
Synchronous   | read/write                            | read/write (O_NONBLOCK)
Asynchronous  | I/O multiplexing (select/poll/epoll)  | AIO
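
To make the "synchronous, non-blocking" cell of the table concrete, here is a minimal sketch of read() on a descriptor opened with O_NONBLOCK. The helper name read_would_block() is made up for this illustration; it uses an empty pipe so that a blocking read would otherwise hang.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical helper: returns 1 if a non-blocking read() on an empty pipe
 * fails immediately with EAGAIN, 0 if it behaves otherwise, and -1 if the
 * pipe could not be set up.
 */
int read_would_block(void)
{
    int fds[2];
    char byte;
    int result = 0;

    if (pipe(fds) < 0)
        return -1;

    /* Switch the read end to non-blocking mode. */
    int flags = fcntl(fds[0], F_GETFL);
    if (flags < 0 || fcntl(fds[0], F_SETFL, flags | O_NONBLOCK) < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }

    /* Nothing has been written yet, so a blocking read would hang here. */
    ssize_t n = read(fds[0], &byte, 1);
    if (n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK))
        result = 1;

    close(fds[0]);
    close(fds[1]);
    return result;
}
```

The call is still synchronous: the data transfer, when it happens, takes place inside read(). With AIO the transfer itself proceeds while the process does other work.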

AIO System Calls

ABI Interface

The AIO system call entry points are located in the fs/aio.c file in the kernel’s source code. Types and constants exported to user space reside in the /usr/include/linux/aio_abi.h header file.

The Linux kernel provides only 5 system calls for performing asynchronous I/O.

#include <linux/aio_abi.h>

int io_setup(unsigned nr_events, aio_context_t *ctxp);
int io_destroy(aio_context_t ctx);
int io_submit(aio_context_t ctx, long nr, struct iocb **iocbpp);
int io_cancel(aio_context_t ctx, struct iocb *, struct io_event *result);
int io_getevents(aio_context_t ctx, long min_nr, long nr,
                 struct io_event *events, struct timespec *timeout);

Every I/O request that is submitted to an AIO context is represented by an I/O control block structure, struct iocb.

io_submit() takes the AIO context ID, the size of the array and the array itself as arguments. Note that the array should contain pointers to the iocb structures, not the structures themselves.

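The return code of io_submit() can be one of the following values:

  • ret = nr: all iocbs were accepted for processing.
  • 0 <= ret < nr: only the first ret iocbs were accepted; the remaining ones were not submitted.
  • ret < 0: an error occurred (with the syscall() wrappers used in this article, the return value is -1 and errno holds the error code).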
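
Because io_submit() may accept fewer iocbs than requested, callers often resubmit the remainder. The following is a minimal sketch of that idea, not a definitive implementation; submit_all() and io_submit_raw() are hypothetical names, and the sketch simply stops at the first error or zero-length result instead of retrying forever.

```c
#define _GNU_SOURCE
#include <linux/aio_abi.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Thin wrapper around the raw system call (hypothetical name). */
static long io_submit_raw(aio_context_t ctx, long nr, struct iocb **iocbpp)
{
    return syscall(SYS_io_submit, ctx, nr, iocbpp);
}

/*
 * Hypothetical helper: keep submitting until all nr iocbs are accepted.
 * Returns the number of accepted iocbs. A result below nr means that
 * io_submit() failed (errno describes why) or accepted nothing.
 */
long submit_all(aio_context_t ctx, long nr, struct iocb **iocbpp)
{
    long done = 0;

    while (done < nr) {
        /* Pass only the part of the pointer array not yet accepted. */
        long ret = io_submit_raw(ctx, nr - done, iocbpp + done);
        if (ret <= 0)
            break;
        done += ret;
    }
    return done;
}
```

A caller would compare the result with nr and, on a short count, inspect errno before deciding whether to retry later.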

After an iocb is submitted, we can perform any other actions without waiting for the I/O to complete. For every completed I/O request (successful or not) the kernel creates an io_event structure. To obtain the list of io_events (and consequently all completed iocbs), the io_getevents() system call should be used. When calling io_getevents(), one needs to specify:

  1. which AIO context to get events from (the ctx variable)
  2. a buffer where the kernel should store the events (the events variable)
  3. the minimal number of events one wants to get. If fewer than this number of iocbs have completed so far, io_getevents() will block until enough events appear. See item 5 for how to control the blocking time.
  4. the maximum number of events one wants to get. This usually is the size of the events buffer (the second 1 in the io_getevents() call of the example program)
  5. a timeout, because if not enough events are available we don’t want to wait forever. One can specify a relative deadline as the last argument. NULL means to wait indefinitely. If io_getevents() should not block at all, the timespec timeout structure needs to be initialized to zero seconds and zero nanoseconds.
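
The timeout handling described in the list above can be sketched as two small helpers. Both names (reap_nonblocking() and reap_with_deadline()) are hypothetical, and the sketch assumes a 64-bit system where the C library's struct timespec matches the layout the kernel expects.

```c
#define _GNU_SOURCE
#include <linux/aio_abi.h>
#include <sys/syscall.h>
#include <time.h>
#include <unistd.h>

/*
 * Hypothetical helper: collect up to max_nr completed events without
 * blocking. Returns the number of events stored in events, or -1 with
 * errno set.
 */
long reap_nonblocking(aio_context_t ctx, struct io_event *events, long max_nr)
{
    /* A zeroed timespec means "return immediately". */
    struct timespec zero = { .tv_sec = 0, .tv_nsec = 0 };

    /* min_nr = 0: do not wait for any particular number of events. */
    return syscall(SYS_io_getevents, ctx, 0L, max_nr, events, &zero);
}

/*
 * Hypothetical helper: wait at most ms milliseconds for at least one event.
 * Returns 0 if the deadline passed with nothing completed.
 */
long reap_with_deadline(aio_context_t ctx, struct io_event *events,
                        long max_nr, long ms)
{
    struct timespec ts = {
        .tv_sec = ms / 1000,
        .tv_nsec = (ms % 1000) * 1000000L,
    };

    return syscall(SYS_io_getevents, ctx, 1L, max_nr, events, &ts);
}
```

Passing NULL instead of a timespec pointer would turn the second helper into an unbounded wait.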

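The return code of io_getevents() can be:

  • ret = max_nr: the events buffer was filled completely; more completed events may still be pending.
  • min_nr <= ret < max_nr: all events that were available have been returned, and the call did not have to wait for more.
  • 0 <= ret < min_nr: the timeout expired before min_nr events completed.
  • ret < 0: an error occurred.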

struct io_event

/* read() from /dev/aio returns these structures. */
struct io_event {
    __u64 data;   /* the data field from the iocb */
    __u64 obj;    /* what iocb this event came from */
    __s64 res;    /* result code for this event */
    __s64 res2;   /* secondary result */
};
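
For read and write requests the res field holds the number of bytes transferred on success or a negated errno value on failure. A minimal sketch of turning that into the familiar read()/write() convention follows; event_result() is a hypothetical helper, not part of any library.

```c
#include <errno.h>
#include <linux/aio_abi.h>

/*
 * Hypothetical helper: translate an io_event into the read()/write()
 * convention. Returns the byte count on success, or -1 with errno set
 * when the request failed.
 */
long long event_result(const struct io_event *ev)
{
    if (ev->res < 0) {
        errno = (int)-ev->res;   /* res carries a negated errno value */
        return -1;
    }
    return ev->res;              /* number of bytes transferred */
}
```

The data field can be used alongside this to find out which request the result belongs to, since it echoes the aio_data value of the originating iocb.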

struct iocb

/*
 * we always use a 64bit off_t when communicating
 * with userland. its up to libraries to do the
 * proper padding and aio_error abstraction
 */
struct iocb {
    /* these are internal to the kernel/libc. */
    __u64 aio_data;   /* data to be returned in event's data */
    __u32 PADDED(aio_key, aio_reserved1);
                      /* the kernel sets aio_key to the req # */

    /* common fields */
    __u16 aio_lio_opcode;   /* see IOCB_CMD_ above */
    __s16 aio_reqprio;
    __u32 aio_fildes;

    __u64 aio_buf;
    __u64 aio_nbytes;
    __s64 aio_offset;

    /* extra parameters */
    __u64 aio_reserved2;   /* TODO: use this for a (struct sigevent *) */

    /* flags for the "struct iocb" */
    __u32 aio_flags;

    /*
     * if the IOCB_FLAG_RESFD flag of "aio_flags" is set, this is an
     * eventfd to signal AIO readiness to
     */
    __u32 aio_resfd;
};   /* 64 bytes */
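
The aio_flags and aio_resfd fields allow a completion to be signalled through an eventfd instead of (or in addition to) io_getevents(). The following is a rough sketch under the assumption that the file system supports AIO writes; write_and_wait_eventfd() is a hypothetical name, and the value 42 stored in aio_data is an arbitrary tag.

```c
#define _GNU_SOURCE
#include <linux/aio_abi.h>
#include <stdint.h>
#include <string.h>
#include <sys/eventfd.h>
#include <sys/syscall.h>
#include <unistd.h>

/*
 * Hypothetical helper: write len bytes from buf to fd at offset 0 and wait
 * for the completion through an eventfd. Returns the eventfd counter (the
 * number of completions signalled), or -1 on failure.
 */
long long write_and_wait_eventfd(int fd, const void *buf, size_t len)
{
    aio_context_t ctx = 0;
    struct iocb cb;
    struct iocb *list[1] = { &cb };
    uint64_t completions = 0;
    long long result = -1;

    int efd = eventfd(0, 0);
    if (efd < 0)
        return -1;
    if (syscall(SYS_io_setup, 1, &ctx) < 0) {
        close(efd);
        return -1;
    }

    memset(&cb, 0, sizeof(cb));
    cb.aio_data = 42;                        /* arbitrary tag, echoed in io_event.data */
    cb.aio_lio_opcode = IOCB_CMD_PWRITE;
    cb.aio_fildes = fd;
    cb.aio_buf = (uint64_t)(uintptr_t)buf;
    cb.aio_nbytes = len;
    cb.aio_offset = 0;
    cb.aio_flags = IOCB_FLAG_RESFD;          /* ask the kernel to signal aio_resfd */
    cb.aio_resfd = efd;

    /* read() on the eventfd blocks until at least one completion is signalled. */
    if (syscall(SYS_io_submit, ctx, 1L, list) == 1 &&
        read(efd, &completions, sizeof(completions)) == sizeof(completions))
        result = (long long)completions;

    syscall(SYS_io_destroy, ctx);
    close(efd);
    return result;
}
```

Because an eventfd is an ordinary file descriptor, it can also be registered with select/poll/epoll, which lets AIO completions share an event loop with sockets.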

AIO Command

/* /usr/include/linux/aio_abi.h */
enum {
    IOCB_CMD_PREAD = 0,
    IOCB_CMD_PWRITE = 1,
    IOCB_CMD_FSYNC = 2,
    IOCB_CMD_FDSYNC = 3,
    /* These two are experimental.
     * IOCB_CMD_PREADX = 4,
     * IOCB_CMD_POLL = 5,
     */
    IOCB_CMD_NOOP = 6,
    IOCB_CMD_PREADV = 7,
    IOCB_CMD_PWRITEV = 8,
};

  • IOCB_CMD_PREAD positioned read; corresponds to pread() system call.
  • IOCB_CMD_PWRITE positioned write; corresponds to pwrite() system call.
  • IOCB_CMD_FSYNC sync file’s data and metadata with disk; corresponds to fsync() system call.
  • IOCB_CMD_FDSYNC sync file’s data and metadata with disk, but only metadata needed to access modified file data is written; corresponds to fdatasync() system call.
  • IOCB_CMD_PREADV vectored positioned read, sometimes called “scattered input”; corresponds to preadv() system call.
  • IOCB_CMD_PWRITEV vectored positioned write, sometimes called “gathered output”; corresponds to pwritev() system call.
  • IOCB_CMD_NOOP defined in the header file, but is not used anywhere else in the kernel.

The semantics of the other fields in the iocb structure depend on the command specified.
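
As an illustration of how the field meanings change with the command, compare a plain positioned read with its vectored variant: for IOCB_CMD_PREAD, aio_buf points to the buffer and aio_nbytes is its size in bytes, while for IOCB_CMD_PREADV, aio_buf points to an array of struct iovec and aio_nbytes is the number of array entries. The helper names prep_pread() and prep_preadv() below are hypothetical.

```c
#include <linux/aio_abi.h>
#include <stdint.h>
#include <string.h>
#include <sys/uio.h>

/* Hypothetical helper: fill an iocb for a positioned read into one buffer. */
void prep_pread(struct iocb *cb, int fd, void *buf, size_t len, long long off)
{
    memset(cb, 0, sizeof(*cb));
    cb->aio_lio_opcode = IOCB_CMD_PREAD;
    cb->aio_fildes = fd;
    cb->aio_buf = (uint64_t)(uintptr_t)buf;   /* destination buffer */
    cb->aio_nbytes = len;                     /* number of bytes to read */
    cb->aio_offset = off;
}

/* Hypothetical helper: the vectored variant reuses the same two fields. */
void prep_preadv(struct iocb *cb, int fd, const struct iovec *iov,
                 int iovcnt, long long off)
{
    memset(cb, 0, sizeof(*cb));
    cb->aio_lio_opcode = IOCB_CMD_PREADV;
    cb->aio_fildes = fd;
    cb->aio_buf = (uint64_t)(uintptr_t)iov;   /* pointer to the iovec array */
    cb->aio_nbytes = iovcnt;                  /* number of iovec entries */
    cb->aio_offset = off;
}
```

Zeroing the whole structure first keeps the reserved fields at zero.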

AIO Context

An AIO context is a set of data structures that the kernel maintains to perform AIO.

Every process can have multiple AIO contexts, so each AIO context in a process needs an identifier.

A pointer to the ctx variable is passed to io_setup() as the second argument, and the kernel fills this variable with a context identifier. Interestingly, aio_context_t is actually just an unsigned long, defined in the kernel ( linux/aio_abi.h ) as follows:

typedef unsigned long aio_context_t; 

The first argument of io_setup() function is the maximum number of requests that can simultaneously reside in the context.
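
A minimal sketch of the context life cycle follows. It also shows a detail that is easy to miss: the variable must be zero before the call, otherwise io_setup() fails with EINVAL. The helper names context_roundtrip() and setup_with_dirty_ctx() are invented for this illustration.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <linux/aio_abi.h>
#include <sys/syscall.h>
#include <unistd.h>

/*
 * Hypothetical helper: create a context sized for nr_events in-flight
 * requests and tear it down again. Returns 0 on success, -1 with errno set.
 */
int context_roundtrip(unsigned nr_events)
{
    aio_context_t ctx = 0;   /* must be zero before io_setup() */

    if (syscall(SYS_io_setup, nr_events, &ctx) < 0)
        return -1;

    /* ctx now holds an opaque identifier chosen by the kernel. */
    return (int)syscall(SYS_io_destroy, ctx);
}

/* Returns the errno produced when ctx is not zero-initialized, 0 if none. */
int setup_with_dirty_ctx(void)
{
    aio_context_t ctx = 1;   /* deliberately wrong */

    if (syscall(SYS_io_setup, 1, &ctx) == 0) {
        syscall(SYS_io_destroy, ctx);
        return 0;
    }
    return errno;
}
```

Every successful io_setup() should be paired with io_destroy(), because the events reserved by a context count against a system-wide limit.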

syscall()

#define _GNU_SOURCE         /* See feature_test_macros(7) */
#include <unistd.h>
#include <sys/syscall.h>    /* For SYS_xxx definitions */

long syscall(long number, ...);

syscall() is a small library function that invokes the system call whose assembly language interface has the specified number with the specified arguments. Employing syscall() is useful, for example, when invoking a system call that has no wrapper function in the C library.

syscall() saves CPU registers before making the system call, restores the registers upon return from the system call, and stores any error code returned by the system call in errno(3) if an error occurs.

Symbolic constants for system call numbers can be found in the header file <sys/syscall.h>.
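
A small sketch of both behaviours described above: a successful call through syscall(), and a failing one whose error code ends up in errno. The function names are hypothetical.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Calls getpid through the generic syscall() entry point. */
long raw_getpid(void)
{
    return syscall(SYS_getpid);
}

/*
 * Closes an invalid descriptor on purpose: syscall() returns -1 and the
 * kernel's error code is stored in errno. Returns that errno value,
 * or 0 if the call unexpectedly succeeded.
 */
int raw_close_invalid_errno(void)
{
    errno = 0;
    if (syscall(SYS_close, -1) == -1)
        return errno;
    return 0;
}
```

The AIO wrappers in the example that follows rely on exactly this convention, which is why they can be combined with perror().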

Example

#define _GNU_SOURCE             /* syscall() is not POSIX */

#include <stdio.h>              /* perror(), printf() */
#include <unistd.h>             /* syscall(), close() */
#include <sys/syscall.h>        /* __NR_* definitions */
#include <linux/aio_abi.h>      /* AIO types and constants */
#include <fcntl.h>              /* open(), O_RDWR */
#include <string.h>             /* memset() */
#include <inttypes.h>           /* uint64_t, uintptr_t */
#include <time.h>               /* struct timespec */

/* glibc provides no wrappers for the AIO system calls, so define them. */
static inline int io_setup(unsigned nr, aio_context_t *ctxp)
{
    return syscall(__NR_io_setup, nr, ctxp);
}

static inline int io_destroy(aio_context_t ctx)
{
    return syscall(__NR_io_destroy, ctx);
}

static inline int io_submit(aio_context_t ctx, long nr, struct iocb **iocbpp)
{
    return syscall(__NR_io_submit, ctx, nr, iocbpp);
}

static inline int io_getevents(aio_context_t ctx, long min_nr, long max_nr,
                               struct io_event *events, struct timespec *timeout)
{
    return syscall(__NR_io_getevents, ctx, min_nr, max_nr, events, timeout);
}

int main(void)
{
    aio_context_t ctx;
    struct iocb cb;
    struct iocb *cbs[1];
    char data[4096];
    struct io_event events[1];
    int ret;
    int fd;
    int i;

    /* O_CREAT requires a mode argument. */
    fd = open("/tmp/test", O_RDWR | O_CREAT, 0644);
    if (fd < 0) {
        perror("open");
        return -1;
    }

    ctx = 0;    /* must be zero before io_setup() */
    ret = io_setup(128, &ctx);
    if (ret < 0) {
        perror("io_setup");
        return -1;
    }

    /* setup I/O control block */
    memset(&cb, 0, sizeof(cb));
    cb.aio_fildes = fd;
    cb.aio_lio_opcode = IOCB_CMD_PWRITE;

    /* command-specific options */
    for (i = 0; i < 4096; ++i)
        data[i] = 'A';
    cb.aio_buf = (uint64_t)(uintptr_t)data;
    cb.aio_offset = 0;
    cb.aio_nbytes = 4096;

    cbs[0] = &cb;

    ret = io_submit(ctx, 1, cbs);
    if (ret != 1) {
        if (ret < 0)
            perror("io_submit");
        else
            fprintf(stderr, "io_submit failed\n");
        return -1;
    }

    /* get reply: block until the single request has completed */
    ret = io_getevents(ctx, 1, 1, events, NULL);
    printf("events: %d\n", ret);

    ret = io_destroy(ctx);
    if (ret < 0) {
        perror("io_destroy");
        return -1;
    }

    close(fd);
    return 0;
}

System Tuning

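/proc/sys/fs/aio-max-nr: system-wide limit on the total number of events that all AIO contexts may reserve through io_setup()
/proc/sys/fs/aio-nr: number of events currently reserved by all active AIO contexts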
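
Both files contain a single decimal number, so they are easy to inspect from a program. The sketch below uses hypothetical helper names (read_counter() and aio_events_available()) and treats the difference between the two values as a rough estimate only, since the kernel may reserve more events per context than were requested.

```c
#include <stdio.h>

/*
 * Hypothetical helper: read a single decimal number from a file such as
 * /proc/sys/fs/aio-nr. Returns the value, or -1 if the file cannot be
 * opened or parsed.
 */
long long read_counter(const char *path)
{
    long long value;
    FILE *f = fopen(path, "r");

    if (f == NULL)
        return -1;
    if (fscanf(f, "%lld", &value) != 1)
        value = -1;
    fclose(f);
    return value;
}

/* Rough number of events that could still be reserved, or -1 if unknown. */
long long aio_events_available(void)
{
    long long max = read_counter("/proc/sys/fs/aio-max-nr");
    long long cur = read_counter("/proc/sys/fs/aio-nr");

    if (max < 0 || cur < 0)
        return -1;
    return max - cur;
}
```

When io_setup() fails with EAGAIN, comparing these two values is a quick way to check whether the system-wide limit has been reached.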

libaio

Install

[oxnz@localhost aio]$ sudo yum install libaio-devel
[oxnz@localhost aio]$ rpm -ql libaio
/lib64/libaio.so.1
/lib64/libaio.so.1.0.0
/lib64/libaio.so.1.0.1
/usr/share/doc/libaio-0.3.109
/usr/share/doc/libaio-0.3.109/COPYING
/usr/share/doc/libaio-0.3.109/TODO
[oxnz@localhost aio]$ rpm -ql libaio-devel
/usr/include/libaio.h
/usr/lib64/libaio.so

Syscall Wrappers

/* /usr/include/libaio.h */

/* Actual syscalls */
extern int io_setup(int maxevents, io_context_t *ctxp);
extern int io_destroy(io_context_t ctx);
extern int io_submit(io_context_t ctx, long nr, struct iocb *ios[]);
extern int io_cancel(io_context_t ctx, struct iocb *iocb, struct io_event *evt);
extern int io_getevents(io_context_t ctx_id, long min_nr, long nr,
                        struct io_event *events, struct timespec *timeout);

Helper Functions

static inline void io_prep_pread(struct iocb *iocb, int fd, void *buf, size_t count, long long offset);
static inline void io_prep_pwrite(struct iocb *iocb, int fd, void *buf, size_t count, long long offset);
static inline void io_prep_preadv(struct iocb *iocb, int fd, const struct iovec *iov, int iovcnt, long long offset);
static inline void io_prep_pwritev(struct iocb *iocb, int fd, const struct iovec *iov, int iovcnt, long long offset);
static inline void io_prep_poll(struct iocb *iocb, int fd, int events);
static inline void io_prep_fsync(struct iocb *iocb, int fd);
static inline void io_prep_fdsync(struct iocb *iocb, int fd);

static inline int io_poll(io_context_t ctx, struct iocb *iocb, io_callback_t cb, int fd, int events);
static inline int io_fsync(io_context_t ctx, struct iocb *iocb, io_callback_t cb, int fd);
static inline int io_fdsync(io_context_t ctx, struct iocb *iocb, io_callback_t cb, int fd);

static inline void io_set_eventfd(struct iocb *iocb, int eventfd);

struct iocb

struct io_iocb_poll {
    PADDED(int events, __pad1);
};  /* result code is the set of result flags or -'ve errno */

struct io_iocb_sockaddr {
    struct sockaddr *addr;
    int len;
};  /* result code is the length of the sockaddr, or -'ve errno */

struct io_iocb_common {
    PADDEDptr(void *buf, __pad1);
    PADDEDul(nbytes, __pad2);
    long long offset;
    long long __pad3;
    unsigned flags;
    unsigned resfd;
};  /* result code is the amount read or -'ve errno */

struct io_iocb_vector {
    const struct iovec *vec;
    int nr;
    long long offset;
};  /* result code is the amount read or -'ve errno */

struct iocb {
    PADDEDptr(void *data, __pad1);  /* Return in the io completion event */
    PADDED(unsigned key, __pad2);   /* For use in identifying io requests */

    short aio_lio_opcode;
    short aio_reqprio;
    int aio_fildes;

    union {
        struct io_iocb_common c;
        struct io_iocb_vector v;
        struct io_iocb_poll poll;
        struct io_iocb_sockaddr saddr;
    } u;
};

struct io_event {
    PADDEDptr(void *data, __pad1);
    PADDEDptr(struct iocb *obj, __pad2);
    PADDEDul(res, __pad3);
    PADDEDul(res2, __pad4);
};

Example

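/* Build with: gcc example.c -laio */
#include <err.h>        /* err(), errx() */
#include <fcntl.h>      /* open() */
#include <libaio.h>     /* io_setup(), io_prep_pwrite(), ... */
#include <stdio.h>      /* printf() */
#include <stdlib.h>
#include <string.h>     /* memset(), strlen(), strerror() */
#include <time.h>       /* struct timespec */
#include <unistd.h>     /* close(), sleep() */

int main(void)
{
    io_context_t ctx;
    struct iocb iocb;
    struct iocb *iocbs[1];
    struct io_event events[1];
    struct timespec timeout;
    int fd;
    int ret;

    /* O_CREAT requires a mode argument. */
    fd = open("/tmp/test", O_WRONLY | O_CREAT, 0644);
    if (fd < 0)
        err(1, "open");

    /* libaio functions return a negative errno value instead of setting errno. */
    memset(&ctx, 0, sizeof(ctx));
    ret = io_setup(10, &ctx);
    if (ret != 0)
        errx(1, "io_setup: %s", strerror(-ret));

    const char *msg = "hello";
    io_prep_pwrite(&iocb, fd, (void *)msg, strlen(msg), 0);
    iocb.data = (void *)msg;    /* returned in the completion event */

    iocbs[0] = &iocb;

    ret = io_submit(ctx, 1, iocbs);
    if (ret != 1) {
        io_destroy(ctx);
        errx(1, "io_submit: %s", ret < 0 ? strerror(-ret) : "request not accepted");
    }

    while (1) {
        timeout.tv_sec = 0;
        timeout.tv_nsec = 500000000;
        /* min_nr is 0, so this call only polls and does not wait for the timeout. */
        if (io_getevents(ctx, 0, 1, events, &timeout) == 1) {
            close(fd);
            break;
        }
        printf("not done yet\n");
        sleep(1);
    }

    io_destroy(ctx);
    return 0;
}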
