Batch processing in Linux


Batch processing toolset for Linux / Unix


portnov/batchd


README.md

batchd is a toolset for batch processing on Linux/Unix. It lets you:

  • Create and manage queues of tasks (batch jobs);
  • Specify time periods (schedules) when jobs from each queue can be executed;
  • Run batch jobs on localhost or on several machines one-by-one or in parallel.

batchd focuses on batch jobs: tasks that take a long time to run (minutes to days) and consume a lot of computational power, possibly everything the machine has. Examples of such jobs are:

  • Scientific calculations (physical modelling or numeric experiments on differential equations, for example);
  • Building large software products from source code;
  • Running integration test suites;
  • Rendering complex 3D scenes or animations;
  • Executing complex reports on large databases;
  • Backups;
  • and so on.

Running such tasks whenever you happen to need them quickly becomes inconvenient. Suppose you are a 3D artist who has just finished a scene and wants to render it. Rendering will take all the computational power of your machine, yet you want to keep working on the next scene, so it makes sense to run the render as a night job. When the next scene is ready, you want it rendered right after the first. Now you have a queue of batch jobs.

batchd supports simple host management for several kinds of virtual hosts:

  • LibVirt-supported hypervisors (KVM, QEMU, Xen, Virtuozzo, VMware ESX, LXC, bhyve and more);
  • Docker containers;
  • AWS EC2 instances;
  • Linode.com instances.

batchd can automatically start such hosts when there are jobs to be executed on them, and automatically stop hosts when there is no need for them.

The batchd suite consists of the following components:

  • Client utilities. These let you create queues, put jobs into queues and so on. The following clients are currently available:
    • Command-line utility, batch. This is currently the most complete client.
    • Python3+Qt5 GUI client. It lets you view and edit queues, and view, create and edit jobs.
    • Blender Python client. This is a simple add-on for Blender that lets you put rendering jobs into a batchd queue from Blender's UI.
    • Web client. It lets you view queues and create and view jobs. It is mainly intended for job creation and monitoring.

    There is also a command-line utility called batchd-admin for administrative tasks. Currently it is used to create the superuser (root), since the first user cannot be created via the REST API.

    See also the REST API description in the REST.API file.

    batchd uses a relatively simple access control model based on users and permissions. Each user can have one of the predefined permissions. Permissions can be assigned in relation to:

    • A specific queue or any queue
    • A specific job type or any type (only for creating jobs)
    • A specific host or any host (only for creating jobs)

    The following permissions are supported:

    • SuperUser: a user with this permission is a superuser and can do anything.
    • CreateJobs: permission to create jobs. Can be granted in relation to a queue and a job type.
    • ViewJobs: permission to view job details and results.
    • ManageJobs: permission to edit and delete jobs.
    • ViewQueues: permission to view queue details.
    • ManageQueues: permission to create, edit and delete queues.
    • ViewSchedules: permission to view schedules. It cannot be granted in relation to a queue or job type.
    • ManageSchedules: permission to create, edit or delete schedules. It cannot be granted in relation to a queue or job type.

    Only the superuser can manage users and their permissions.

    The following options are available for authenticating users of the REST API:

    • Basic HTTP authentication. Passwords are sent in clear text, so this is secure only if the channel is protected with HTTPS. HTTPS can be provided by an external tool, for example nginx.
    • Unconditional authentication of the user named in the X-Auth-User: HTTP header. This is intended for cases where authentication is done by an external system, for example when nginx checks client HTTPS certificates.
    • Disabled authentication. In this mode all users are treated as superusers. This can be useful when running on localhost.

    All configuration files are in YAML format and stored under /etc/batchd/ or under ~/.config/batchd/. The config files are:

    • batchd/batchd.yaml. This file contains global options: database connection, worker threads count and so on.
    • batchd/client.yaml. This file contains default settings for the batch command-line client.
    • batchd/hosts/$hostname.yaml. Such files describe remote hosts (which are available through the SSH protocol with public key auth).
    • batchd/jobtypes/$type.yaml. Such files describe job types, that is, the kinds of jobs that can be run. See also the description below.

    For examples, see the sample configs in the sample-configs/ directory.

    The batch command-line client also supports the following environment variables:

    • BATCH_MANAGER_URL
    • BATCH_QUEUE
    • BATCH_TYPE
    • BATCH_HOST
    • BATCH_USERNAME
    • BATCH_PASSWORD

    The priority of these options is as follows:

    • Options specified explicitly on the command line have the highest priority.
    • Environment variables are checked next.
    • Then the client.yaml config file is checked.
    • If there is no client.yaml, or the option is not specified in it, the default value is used:
      • The default manager URL is http://localhost:9681.
      • The default queue is «default».
      • The default job type is «command».
      • The default host name is «undefined», which means the local host is used.
      • The CLI and Python clients use the current OS user name as the batchd user name by default.
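The lookup order above can be sketched in plain shell. This only illustrates the precedence rules and is not how the batch client is implemented; the function resolve_option, its argument names and the queue name nightly are hypothetical.

```shell
#!/bin/sh
# Sketch of the lookup order: command line > environment > client.yaml > default.
# resolve_option and its arguments are hypothetical names used for illustration.
resolve_option() {
    cli_value=$1      # value given on the command line ("" if absent)
    env_value=$2      # value of the environment variable ("" if unset)
    config_value=$3   # value read from client.yaml ("" if missing)
    default_value=$4  # built-in default
    if [ -n "$cli_value" ]; then
        printf '%s\n' "$cli_value"
    elif [ -n "$env_value" ]; then
        printf '%s\n' "$env_value"
    elif [ -n "$config_value" ]; then
        printf '%s\n' "$config_value"
    else
        printf '%s\n' "$default_value"
    fi
}

# No command-line option and no config entry: the environment variable wins.
BATCH_QUEUE=nightly
queue=$(resolve_option "" "${BATCH_QUEUE:-}" "" "default")
echo "queue: $queue"

# Nothing set anywhere: fall back to the documented default manager URL.
unset BATCH_MANAGER_URL
url=$(resolve_option "" "${BATCH_MANAGER_URL:-}" "" "http://localhost:9681")
echo "manager url: $url"
```

In practice this means that exporting BATCH_QUEUE in a shell session changes the target queue for every later batch invocation in that session, unless a queue is also given on the command line.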

      A job type describes how to execute jobs of a specific kind:

      • Shell command to be run. ${}-syntax can be used for parameter substitution.
      • Types of job parameters. The following types are currently supported:
        • String: just a string.
        • Int: an integer number.
        • InputFile: the parameter is a path to a file that should be used as input. If the job runs on a remote host, then before the command is executed all files specified in InputFile parameters are copied from the host where the batchd dispatcher is running to the remote host (via the SCP protocol).
        • OutputFile: the parameter is a path to a file that should be used as output. If the job runs on a remote host, then after the command is executed all files specified in OutputFile parameters are copied from the remote host to the host where the batchd dispatcher is running (via the SCP protocol).
      • Action to take when a job fails:
        • continue: the job is marked as failed and the dispatcher simply proceeds with other jobs.
        • retry:
          • Retry immediately: the job is marked as new and left at the beginning of the queue, so the dispatcher retries it shortly. The maximum number of retries is given by the `count` parameter; the default maximum is 1.
          • Retry later: similar to the previous option, but the job is moved to the end of the queue, so the dispatcher retries it after all previously queued jobs have finished.

          A schedule describes the time periods when jobs can be executed. Periods can be specified in two ways:

          • By time of day; for example, time: []. Several periods can be specified. If no periods are specified, this is treated as «any time of day».
          • By week day; for example, weekdays: [«Saturday», «Sunday»]. If no week days are specified, this is treated as «any week day».

          $ sudo apt-get install stack
          $ cd batchd/
          $ stack install --flag batchd:docker --flag batchd:libvirt --flag batchd:ec2
          $ vi .config/batchd/batchd.yaml # Please refer to sample-configs/ directory
          $ batchd-admin upgrade-db
          $ batchd-admin create-superuser


          Introduction

          Linux provides a wide range of command-line tools for getting repetitive work done quickly.

          One such tool is the xargs command, which performs batch processing over multiple files or other inputs.

          What Is the xargs Command

          The xargs command is a standard Unix utility, available on Linux, that builds and executes command lines from standard input. It is commonly used in combination with other commands such as find, grep, and ls to process a large number of files or inputs.

          By default it reads input items separated by blanks or newlines and runs the specified command one or more times, appending as many items to each command line as will fit. If no command is given, it runs echo.
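A minimal sketch of this default behaviour, using echo as the command so that the resulting command line is visible. The input items a, b and c are arbitrary.

```shell
# Three input items separated by a space and by newlines.
# With no options, xargs packs them into a single echo command line.
one_call=$(printf 'a b\nc\n' | xargs echo)
echo "$one_call"
```

All three items end up as arguments of a single echo invocation, which is why commands such as rm or gzip can handle many files per run.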

          The basic syntax of the xargs command is as follows:

          xargs [options] [command [initial-arguments]]

          Some of the commonly used options include:

          • -a file: read items from a file instead of standard input (GNU xargs).
          • -I replace-str: replace occurrences of replace-str in the initial-arguments with the input item.
          • -n num: use at most num arguments per command line.
          • -P max-procs: run up to max-procs processes at once.
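The effect of -n and -I is easiest to see with echo as the command. The following is a small sketch with made-up input values.

```shell
# -n 2: at most two items per command line, so echo runs twice.
pairs=$(printf '%s\n' 1 2 3 4 | xargs -n 2 echo)
echo "$pairs"

# -I {}: one command per input line, with {} replaced by that line.
greetings=$(printf '%s\n' alice bob | xargs -I {} echo "hello, {}")
echo "$greetings"
```

With -n 2 the four items are split across two echo calls; with -I each input line triggers its own command.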

          Here are some practical examples of how the xargs command can be used for batch processing:

          Find and Delete Files

          Suppose you have a directory containing thousands of files, and you want to delete all files with a particular extension, say .log. You can use a command such as the following to accomplish this:

          find . -name "*.log" | xargs rm

          The find command searches for all files with the .log extension, and passes them to the xargs command.

          The xargs command then runs rm with those file names as arguments, deleting them.
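File names that contain spaces or quotes can break a plain find-to-xargs pipeline, because xargs splits its input on whitespace. A more defensive sketch of the same find-and-delete pattern is shown below. It assumes GNU or BSD versions of find and xargs (for -print0, -0 and -r) and works in a scratch directory with made-up file names.

```shell
# Work in a scratch directory so the example cannot touch real files.
workdir=$(mktemp -d)
touch "$workdir/app.log" "$workdir/old report.log" "$workdir/notes.txt"

# -print0 and -0 separate names with NUL bytes, so names with spaces survive.
# -r skips running rm when find matches nothing.
# "--" stops rm from treating a file name that starts with "-" as an option.
find "$workdir" -type f -name '*.log' -print0 | xargs -0 -r rm --

remaining=$(ls "$workdir")
echo "$remaining"
rm -r "$workdir"
```

Only notes.txt is left afterwards; both .log files are removed, including the one with a space in its name.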

          Convert Multiple Files

          Suppose you have a directory containing several image files in the .png format, and you want to convert them all to .jpg format. You can use the following command to achieve this:

          ls *.png | xargs -I {} convert {} {}.jpg

          The ls command lists all files with the .png extension, which are then passed to the xargs command.

          The xargs command uses the -I option to replace occurrences of {} in the convert command with the input items. The convert command (part of ImageMagick) then converts each file to the .jpg format. Because .jpg is appended to the full input name, image.png becomes image.png.jpg.
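Appending .jpg to the input name yields names such as logo.png.jpg. The sketch below shows one way to derive logo.jpg instead, by passing each name to a small sh script and stripping the suffix with ${1%.png}. It is a dry run: echo prints the convert commands rather than executing them, so ImageMagick is not required to try it. The file names are made up.

```shell
workdir=$(mktemp -d)
touch "$workdir/logo.png" "$workdir/team photo.png"

# Dry run: echo prints the convert command that would be executed.
# Remove "echo" to run ImageMagick for real.
# ${1%.png} strips the .png suffix, so the output is logo.jpg, not logo.png.jpg.
planned=$(cd "$workdir" && find . -maxdepth 1 -name '*.png' -print0 |
  xargs -0 -I {} sh -c 'echo convert "$1" "${1%.png}.jpg"' _ {})
echo "$planned"
rm -r "$workdir"
```

Passing the name as a positional argument ($1) instead of embedding {} in the script text keeps unusual file names from being interpreted as shell code.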

          Parallel Execution

          Suppose you have a directory containing several large text files, and you want to compress them all using gzip. You can use the following command to achieve parallel execution using the xargs command:

          ls *.txt | xargs -P 4 -n 1 gzip

          The ls command lists all files with the .txt extension, which are then passed to the xargs command.

          The xargs command uses the -P option to specify the maximum number of processes to run at once (in this case, 4), and the -n option to specify the number of arguments per command line (in this case, 1).

          The gzip command then compresses each file in parallel, effectively speeding up the process.
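The same parallel pattern can be tried safely in a scratch directory. This sketch assumes GNU or BSD find and xargs and uses find -print0 instead of ls so that unusual file names are handled correctly; the file names and contents are made up.

```shell
workdir=$(mktemp -d)
for name in a b c d e; do
    printf 'sample text\n' > "$workdir/$name.txt"
done

# Up to 4 gzip processes at a time, one file per process.
find "$workdir" -type f -name '*.txt' -print0 | xargs -0 -P 4 -n 1 gzip

# xargs returns only after all gzip processes have finished.
compressed=$(find "$workdir" -type f -name '*.txt.gz' | wc -l | tr -d ' ')
echo "compressed files: $compressed"
rm -r "$workdir"
```

A value for -P close to the number of CPU cores is a common starting point, since gzip is CPU-bound.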

          Execute Multiple Commands

          Suppose you have a list of files and you want to convert each one to a PDF and compress it. You can use the following command to accomplish this:

          cat files.txt | xargs -I {} sh -c 'convert {} -compress zip {}.pdf'

          The cat command reads the list of files from the files.txt file, which are then passed to the xargs command.

          The xargs command uses the -I option to replace occurrences of {} in the sh command with the input items.

          The sh command runs convert on each file. The -compress zip option makes convert apply Zip compression while writing the PDF, so conversion and compression happen in a single invocation. Because the command runs through sh -c, further commands can be chained inside the quotes with && or ;.
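A sketch of running two separate commands per input item is shown below. The commands cp and gzip stand in for whatever steps are needed, and the file names are made up. Each name is handed to sh as the positional argument $1 rather than substituted into the script text.

```shell
workdir=$(mktemp -d)
printf 'first\n'  > "$workdir/one.txt"
printf 'second\n' > "$workdir/two.txt"
printf '%s\n' "$workdir/one.txt" "$workdir/two.txt" > "$workdir/files.txt"

# The file name arrives as $1, so it is never parsed as shell code.
# Two commands per file: make a backup copy, then compress the original.
xargs -I {} sh -c 'cp "$1" "$1.bak" && gzip "$1"' _ {} < "$workdir/files.txt"

result=$(ls "$workdir" | sort | tr '\n' ' ')
echo "$result"
rm -r "$workdir"
```

The && between the two commands means the original is compressed only if the backup copy succeeded.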

          Conclusion

          The xargs command is a versatile and powerful tool that can be used for batch processing tasks on Linux. It allows you to efficiently process a large number of files or inputs in one go, saving you time and effort.

          By combining it with other commands, you can perform complex operations on multiple files in a single command, simplifying your workflow and increasing productivity.

          Whether you’re managing large data sets or automating repetitive tasks, the xargs command is a valuable addition to your Linux toolkit.
