Linux compressing file system

Carles Mateo

Creating a compressed filesystem with Linux and ZFS (using just files)

It is often very convenient to have a compressed filesystem, that is, a filesystem that compresses data transparently in real time.

This not only reduces the space used, it can also increase the effective I/O performance. For example, if writing a 1 GB log file to disk takes 5 seconds, you get 200 MB/s. If writing the same 1 GB file takes 0.5 seconds, you get 2000 MB/s, or 2 GB/s. The trick is that only 100 MB were really written, because the data was compressed before it reached the disk.

This also works for reading. 100 MB are read from disk and then decompressed in memory (in chunks, not everything is loaded at once). Assuming the same speed for reading and writing (which is usual for sequential access on SAS drives), we read from disk for 0.5 seconds instead of 5. Let’s imagine decompression costs 0.2 seconds of CPU time. That’s 0.7 seconds versus 5 seconds.
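The arithmetic above can be reproduced with a small sketch. The function name `effective_mbps` is made up for illustration, and the figures are the ones used in the text (1 GB counted as 1000 MB, a 10:1 compression ratio, 0.2 s of CPU time).

```shell
# Effective throughput seen by the application, in MB/s.
# $1 = logical size in MB
# $2 = time spent on disk I/O, in seconds
# $3 = CPU time spent compressing or decompressing, in seconds (default 0)
effective_mbps() {
    awk -v size="$1" -v disk="$2" -v cpu="${3:-0}" \
        'BEGIN { printf "%.0f\n", size / (disk + cpu) }'
}

effective_mbps 1000 5         # uncompressed write
effective_mbps 1000 0.5       # write with only 100 MB reaching the disk
effective_mbps 1000 0.5 0.2   # read: 0.5 s on disk plus 0.2 s of CPU
```

The three calls print 200, 2000 and 1429, so even after paying for decompression the read is still about seven times faster than the uncompressed case.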

So, assuming you have ZFS installed on your desktop computer, these instructions will let you create a compressed ZFS filesystem and mount it.

ZFS can create pools using disks, partitions, or other block devices such as loop devices, and also regular files.

# Create the File that will hold the Filesystem, 1GB

root@xeon:/home/carles# dd if=/dev/zero of=/home/carles/compressedfile.000 bs=1M count=1024
1024+0 records in
1024+0 records out
1073741824 bytes (1.1 GB, 1.0 GiB) copied, 0.621923 s, 1.7 GB/s

zpool create compressedpool /home/carles/compressedfile.000

# If you don’t have automount set, then set the mountpoint

zfs set mountpoint=/compressedpool compressedpool

# Set the compression. LZ4 is fast and well balanced

zfs set compression=lz4 compressedpool

# Copy in a very compressible file of about 1GB. Don’t use just 0s, as those are optimized away 🙂

# I copied real logs myself

ls -al --block-size=M *.log
-rw------- 1 carles carles 1329M Sep 26 14:34 messages.log
root@xeon:/home/carles# cp messages.log /compressedpool/

Even though the pool has only 1 GB, we managed to copy a 1.33 GB file.

When we check, only 142 MB are really being used, thanks to the compression.

root@xeon:/home/carles# zfs list
NAME             USED  AVAIL  REFER  MOUNTPOINT
compressedpool   142M   738M   141M  /compressedpool
root@xeon:/home/carles# df /compressedpool
Filesystem     1K-blocks   Used Available Use% Mounted on
compressedpool    899584 144000    755584  17% /compressedpool
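As a quick check of those figures, the ratio between the size of the copied file and the space used can be worked out as below. This is only a sketch: `compress_ratio` is a made-up helper name, and the commented `zfs get` line assumes the pool from this article exists.

```shell
# Ratio between logical size and space really used (both in the same unit).
compress_ratio() {
    awk -v logical="$1" -v used="$2" \
        'BEGIN { printf "%.2fx\n", logical / used }'
}

# 1329M copied, 142M used, as reported by ls and zfs list
compress_ratio 1329 142

# ZFS keeps track of the same figure itself:
# zfs get compressratio,used,logicalused compressedpool
```

With the numbers from this example the function prints 9.36x. The value ZFS reports may differ slightly, since it accounts for metadata and block sizes.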

By default ZFS only imports pools that are based on drives, so to import your file-based pool after a reboot, or after running zpool export compressedpool, you must specify the directory that holds the files:

zpool import -d /home/carles compressedpool

You can also create a pool using several files on different hard drives. That way you can build a mirror, RAIDZ1, RAIDZ2 or RAIDZ3 and, just as with a pool based on drives, not lose any data if you lose a physical drive.
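A minimal sketch of the mirror case, assuming two files that live on different physical drives. The helper name `create_file_mirror`, the pool name and the paths in the usage line are all made up for illustration.

```shell
# Create a mirrored, LZ4-compressed pool backed by two regular files.
# $1 = pool name
# $2, $3 = absolute paths of the backing files, ideally on different drives
# $4 = size of each backing file (default 1G)
create_file_mirror() {
    pool=$1; file_a=$2; file_b=$3; size=${4:-1G}
    # Sparse files are enough, ZFS allocates the space as it writes
    truncate -s "$size" "$file_a" "$file_b" || return 1
    zpool create "$pool" mirror "$file_a" "$file_b" || return 1
    zfs set compression=lz4 "$pool"
}

# Usage, as root (hypothetical mount points):
# create_file_mirror mirrorpool /mnt/disk1/mirror.000 /mnt/disk2/mirror.000
```

Replacing `mirror` with `raidz1`, `raidz2` or `raidz3` and passing more files would give the other layouts mentioned above.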

If you use one file on each of several hard drives, you aggregate their bandwidth.

You can also do this in your instances or VMs. Create one file of 1 GB and create a pool on it for compressed logs or compressed core dumps. If you need more space later, you can add another file to the pool. You don’t need any redundancy: just create a pool with mountpoint /var/log or /var/core and grow it as you need.
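Growing the pool later could look like the sketch below. `grow_pool` is a hypothetical helper name, and the second file name simply continues the numbering used earlier in this article.

```shell
# Add one more backing file to an existing file-based pool.
# $1 = pool name
# $2 = absolute path of the new backing file
# $3 = size of the new file (default 1G)
grow_pool() {
    pool=$1; file=$2; size=${3:-1G}
    truncate -s "$size" "$file" || return 1
    zpool add "$pool" "$file"
}

# Usage, as root: point the pool at /var/log, then grow it when needed.
# zfs set mountpoint=/var/log compressedpool
# grow_pool compressedpool /home/carles/compressedfile.001
```

Keep in mind that mounting over /var/log hides whatever was already in that directory, so move the existing logs first. Without redundancy, losing any one of the backing files means losing the pool.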

Logs and core dumps compress very well. For example, a core dump of 54 MB shrinks to around 645 KB if you compress it with a tool like bzip2. ZFS lets you choose between several compression algorithms, so you can expect huge space savings for logs and core dumps.
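To get a feeling for how compressible your own logs or core dumps are before creating a pool, something like the following sketch can help. It uses the command-line compressors rather than ZFS itself, so the numbers are only a rough indication; `compare_compressors` is a made-up name, and tools that are not installed are skipped.

```shell
# Print the compressed size in bytes of file $1 for every compressor
# found on this machine, next to the original size.
compare_compressors() {
    file=$1
    printf '%-8s %s\n' original "$(wc -c < "$file" | tr -d ' ')"
    for tool in gzip bzip2 xz zstd; do
        command -v "$tool" >/dev/null 2>&1 || continue
        printf '%-8s %s\n' "$tool" "$("$tool" -c < "$file" | wc -c | tr -d ' ')"
    done
}

# Usage:
# compare_compressors /var/log/syslog
```

ZFS compresses each block on its own, so its ratios are usually somewhat lower than what these tools achieve on a whole file.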

This entry was posted in Performance, Storage and tagged Compression, LZ4, ZFS on 2018-09-26 by Carles Mateo.

What Is the Best Compression Tool in Linux?

In this article, we will compare all the best and most popular Linux compression tools. This will include benchmark tests to see which compression method performs the best, and we’ll also weigh the pros and cons of compatibility and other areas. Compression methods covered will be gzip, xz, bzip2, 7zip, zip, rar, and zstd (Zstandard).

Linux gives us a lot of options when we need to compress files. While that’s definitely a good thing, it can lead to confusion about which one should be used. Let’s start by comparing each method across a few key areas.

Compression Benchmark Test

Although compression ratio should not be the only determining factor when deciding on which tool to use, it will definitely play a big role.

For our benchmark test, we’ll try compressing a copy of the 2002 video game Age of Mythology with a variety of tools. Older video games like AOM make for a good test, since compression methods weren’t up to par with today’s technology and video games contain a wide range of file formats, like audio, video, images, binary files, text, etc. The total size of this video game installation is 1350 MB.

Default Compression Results

Here are the results of our compression test when we use each tool’s default compression level. You can see the resulting compressed size, time elapsed, and the precise commands we used to perform the compression.

Compression   Size     Time Elapsed   Command
gzip          955 MB   1:45           tar cfz AOM.tar.gz AOM/
xz            856 MB   16:06          tar cfJ AOM.tar.xz AOM/
bzip2         943 MB   5:36           tar cfj AOM.tar.bz2 AOM/
7zip          851 MB   10:59          7z a AOM.7z AOM/
zip           956 MB   1:41           zip -r AOM.zip AOM/
rar           877 MB   6:37           rar a AOM.rar AOM/*
zstd          934 MB   0:43           tar --zstd -cf AOM.tar.zst AOM/

Highest Compression Results

And here are the results when we use each tool’s maximum compression level. A higher compression level usually results in some minor space savings, but can take the tool a lot longer to perform the job. The commands we use below are utilizing the absolute maximum compression level for each tool.

Compression   Size     Time Elapsed   Command
gzip          954 MB   2:10           tar cf - AOM/ | gzip -9 - > AOM.tar.gz
xz            847 MB   27:32          tar cf - AOM/ | xz -9e - > AOM.tar.xz
bzip2         943 MB   5:42           tar cf - AOM/ | bzip2 -9 - > AOM.tar.bz2
7zip          845 MB   16:41          7z a -mx=9 AOM.7z AOM/
zip           955 MB   2:05           zip -9 -r AOM.zip AOM/
rar           876 MB   6:31           rar a -m5 AOM.rar AOM/*
zstd          873 MB   22:19          tar -I 'zstd --ultra -22' -cf AOM.tar.zst AOM/

And the Winner Is…

According to our benchmark test:

For compression ratio, the best compression tool on Linux is 7zip.

For compression speed, the best compression tool on Linux is Zstandard (zstd).

Potential for Varying Results

Keep in mind that you should take these benchmark results with a grain of salt. Depending on the type of files you’re compressing, and the hardware of your PC or server, you could get very different results in compression ratio and speed. This benchmark test works well as a very general measurement of the compression tools listed, but every situation is going to be different. If in doubt, try out a few of them yourself – that’s why we’ve given you the commands for each compression tool.

Note also that we used the normal compression level and maximum compression level for each tool. There are a lot of other choices than just these two options. You could use some value in between, or even use a lesser compression level so the files compress very quickly.
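To see what the in-between levels do for your own data, a sketch like the one below prints the compressed size at each level you ask for. It uses gzip only, and `gzip_levels` is a made-up helper name. Other tools accept similar numeric levels, but their ranges differ.

```shell
# Print the compressed size of file $1 at each gzip level
# passed as a further argument (1 = fastest, 9 = smallest).
gzip_levels() {
    file=$1; shift
    for level in "$@"; do
        size=$(gzip -"$level" -c < "$file" | wc -c | tr -d ' ')
        printf 'gzip -%s  %s bytes\n' "$level" "$size"
    done
}

# Usage:
# gzip_levels AOM.tar 1 6 9
```

Prefixing the call with `time` shows how much longer the higher levels take on the same input.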

Compatibility

Compression ratio and speed aren’t the only concern. Not always, anyway.

On Linux systems, tar is the usual format for archives. Compression is then added to the tar file, resulting in extensions like .tar.gz, .tar.bz2 and .tar.xz. The tar format is able to combine files into a single archive, while preserving all of the Linux file permissions. Its compatibility with Linux file systems is why it’s preferred on Linux.

On other operating systems, like Windows, the .zip format is much more common. Zip files are usually pretty painless to open on Linux, but tar files don’t always enjoy the same privilege on Windows. Zip files also won’t preserve file permissions on Linux.
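The point about tar and permissions is easy to check yourself. The sketch below packs a file with restrictive permissions, extracts it elsewhere and prints the resulting mode. `tar_roundtrip_mode` is a made-up name, and `stat -c` assumes GNU coreutils as found on most Linux systems.

```shell
# Round-trip a file through a .tar.gz archive and print the
# permission bits of the extracted copy.
tar_roundtrip_mode() {
    work=$(mktemp -d) || return 1
    mkdir "$work/src" "$work/out"
    echo "secret" > "$work/src/key.txt"
    chmod 600 "$work/src/key.txt"
    tar -czf "$work/archive.tar.gz" -C "$work/src" key.txt
    tar -xzpf "$work/archive.tar.gz" -C "$work/out"
    stat -c '%a' "$work/out/key.txt"
    rm -rf "$work"
}

tar_roundtrip_mode
```

The extracted file comes back with mode 600, the same as the original.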

Why’s this matter? Well, depending on what you’re doing with your compressed archive, you may need to take the filetype into consideration. For example, it’s better to share zip files with Windows users. If you’re sharing the archive with Linux users, then it won’t matter as much. Users of both systems usually need extra software if they’re going to extract the contents of a 7z, rar, or zstd file.
