felixhandte released this
Aug 19, 2019
We discovered an issue in the v1.4.2 release, which can degrade the effectiveness of dictionary compression. This release fixes that issue.
felixhandte released this
Jul 25, 2019
This release is a small one, that corrects an issue discovered in the previous release. Zstandard v1.4.1 included a bug in decompressing v0.5 legacy frames, which is fixed in v1.4.2.
felixhandte released this
Jul 19, 2019
This release is primarily a maintenance release.
It includes a few bug fixes, including a fix for a rare data corruption bug, which could only be triggered in a niche use case, when doing all of the following: using multithreading mode, with an overlap size >= 512 MB, using a strategy >= ZSTD_btlazy, and compressing more than 4 GB. None of the default compression levels meet these requirements (not even --ultra ones).
This release also includes some performance improvements, among which the primary improvement is that Zstd decompression is ~7% faster, thanks to @mgrice.
See this comparison of decompression speeds at different compression levels, measured on the Silesia Corpus, on an Intel i9-9900K with GCC 9.1.0.
terrelln released this
Apr 16, 2019
The main focus of the v1.4.0 release is the stabilization of the advanced API.
The advanced API provides a way to set specific parameters during compression and decompression in an API and ABI compatible way. For example, it allows you to compress with multiple threads, enable --long mode, set frame parameters, and load dictionaries. It is compatible with ZSTD_compressStream*() and ZSTD_compress2(). There is also an advanced decompression API that allows you to set parameters like maximum memory usage, and load dictionaries. It is compatible with the existing decompression functions ZSTD_decompressStream() and ZSTD_decompressDCtx().
The old streaming functions are all compatible with the new API, and the documentation provides the equivalent function calls in the new API. For example, see ZSTD_initCStream(). The stable functions will remain supported, but the functions in the experimental sections, like ZSTD_initCStream_usingDict(), will eventually be marked as deprecated and removed in favor of the new advanced API.
The examples have all been updated to use the new advanced API. If you have questions about how to use the new API, please refer to the examples, and if they are unanswered, please open an issue.
Zstd's fastest compression level just got faster! Thanks to ideas from Intel's igzip and @gbtucker, we've made level 1, zstd's fastest strategy, 6-8% faster in most scenarios. For example on the Silesia Corpus with level 1, we see 0.2% better compression compared to zstd-1.3.8, and these performance figures on an Intel i9-9900K:
A new experimental function ZSTD_decompressBound() has been added by @shakeelrao. It is useful when decompressing zstd data in a single shot that may, or may not have the decompressed size written into the frame. It is exact when the decompressed size is written into the frame, and a tight upper bound within 128 KB, as long as ZSTD_e_flush and ZSTD_flushStream() aren't used. When ZSTD_e_flush is used, in the worst case the bound can be very large, but this isn't a common scenario.
The parameter ZSTD_c_literalCompressionMode and the CLI flag --[no-]compress-literals allow users to explicitly enable and disable literal compression. By default literals are compressed with positive compression levels, and left uncompressed for negative compression levels. Disabling literal compression boosts compression and decompression speed, at the cost of compression ratio.
Cyan4973 released this
Dec 27, 2018
· 11 commits to dev since this release
v1.3.8 main focus is the stabilization of the advanced API.
This API has been in the making for more than a year, and makes it possible to trigger advanced features, such as multithreading, --long mode, or detailed frame parameters, in a straightforward and extensible manner. Some examples are provided in this blog entry. To make this vision possible, the advanced API relies on sticky parameters, which can be stacked on top of each other in any order. This makes it possible to introduce new features in the future without breaking API nor ABI.
This API has provided a good experience in our infrastructure, and we hope it will prove easy to use and efficient in your applications. Nonetheless, before being branded "stable", this proposal must spend a last round in "staging area", in order to generate comments and feedback from new users. It's planned to be labelled "stable" by v1.4.0, which is expected to be next release, depending on received feedback.
The experimental section still contains a lot of prototypes which are largely redundant with the new advanced API. Expect them to become deprecated, and then later dropped in some future. Transition towards the newer advanced API is therefore highly recommended.
Decoding speed has been improved again, primarily for some specific scenarios : frames using large window sizes (--ultra or --long), and cold dictionary. Cold dictionary is expected to become more important in the near future, as solutions relying on thousands of dictionaries simultaneously will be deployed.
The higher compression levels get a slight compression ratio boost, mostly visible for small (<256 KB) and large (>32 MB) data streams. This change benefits asymmetric scenarios (compress ones, decompress many times), typically targeting level 19.
A noticeable addition, @terrelln introduces the --rsyncable mode to zstd. Similar to gzip --rsyncable, it generates a compressed frame which is friendly to rsync in case of limited changes : a difference in the input data will only impact a small localized amount of compressed data, instead of everything from the position onward due to cascading impacts. This is useful for very large archives regularly updated and synchronized over long distance connections (as an example, compressed mailboxes come to mind).
The method used by zstd preserves the compression ratio very well, introducing only very tiny losses due to synchronization points, meaning it's no longer a sacrifice to use --rsyncable. Here is an example on silesia.tar, using default compression level :
Speaking of compression of level : it's now possible to use environment variable ZSTD_CLEVEL to influence default compression level. This can prove useful in situations where it's not possible to provide command line parameters, typically when zstd is invoked "under the hood" by some calling process.
Lastly, anyone interested in embedding a small zstd decoder into a space-constrained application will be interested in a new set of build macros introduced by @felixhandte, which makes it possible to selectively turn off decoder features to reduce binary size even further. Final binary size will of course vary depending on target assembler and compiler, but in preliminary testings on x64, it helped reducing the decoder size by a factor 3 (from ~64KB towards ~20KB).
terrelln released this
Nov 29, 2018
Zstandard regression testing data
Cyan4973 released this
Jun 28, 2018
Zstandard v1.3.5 is a maintenance release focused on dictionary compression performance.
Compression is generally associated with the act of willingly requesting the compression of some large source. However, within datacenters, compression brings its best benefits when completed transparently. In such scenario, it's actually very common to compress a large number of very small blobs (individual messages in a stream or log, or records in a cache or datastore, etc.). Dictionary compression is a great tool for these use cases.
This release makes dictionary compression significantly faster for these situations, when compressing small to very small data (inputs up to ~16 KB).
The above image plots the compression speeds at different input sizes for zstd v1.3.4 (red) and v1.3.5 (green), at levels 1, 3, 9, and 18. The benchmark data was gathered on an Intel Xeon CPU E5-2680 v4 @ 2.40GHz. The benchmark was compiled with clang-7.0, with the flags -O3 -march=native -mtune=native -DNDEBUG. The file used in the results shown here is the osdb file from the Silesia corpus, cut into small blocks. It was selected because it performed roughly in the middle of the pack among the Silesia files.
Intel Xeon CPU E5-2680 v4 @ 2.40GHz
-O3 -march=native -mtune=native -DNDEBUG
The new version saves substantial initialization time, which is increasingly important as the average size to compress becomes smaller. The impact is even more perceptible at higher levels, where initialization costs are higher. For larger inputs, performance remain similar.
Users can expect to measure substantial speed improvements for inputs smaller than 8 KB, and up to 32 KB depending on the context. The expected speed-up ranges from none (large, incompressible blobs) to many times faster (small, highly compressible inputs). Real world examples up to 15x have been observed.
The compression levels have been slightly adjusted, taking into consideration the higher top speed of level 1 since v1.3.4, and making level 19 a substantially stronger compression level while preserving the 8 MB window size limit, hence keeping an acceptable memory budget for decompression.
It's also possible to select the content of libzstd by modifying macro values at compilation time. By default, libzstd contains everything, but its size can be made substantially smaller by removing support for the dictionary builder, or legacy formats, or deprecated functions. It's even possible to build a compression-only or a decompression-only library.
zstd --list does not work with non-interactive tty. This issue is fixed in dev branch.
Cyan4973 released this
Mar 26, 2018
The v1.3.4 release of Zstandard is focused on performance, and will offers nice speed boost in most scenarios.
zstd cli will now performs compression in parallel with I/O operations by default. This requires multi-threading capability (which is also enabled by default). It doesn't sound like much, but effectively improves throughput by 20-30%, depending on compression level and underlying I/O performance.
For example, on a Mac OS-X laptop with an Intel Core i7-5557U CPU @ 3.10GHz, running time zstd enwik9 at default compression level (2) on a SSD gives the following :
This is a nice boost to all scripts using zstd cli, typically in network or storage tasks. The effect is even more pronounced at faster compression setting, since the CLI overlaps a proportionally higher share of compression with I/O.
Previous default behavior (blocking single thread) is still available, accessible through --single-thread long command. It's also the only mode available when no multi-threading capability is detected.
Some core routines have been refined to provide more speed on newer cpus, making better use of their out-of-order execution units. This is more sensible on the decompression side, and even more so with gcc compiler.
Example on the same platform, running in-memory benchmark zstd -b1 silesia.tar :
zstd -b1 silesia.tar
So far, compression level 1 has been the fastest one available. Starting with v1.3.4, there will be additional choices. Faster compression levels can be invoked using negative values. On the command line, the equivalent one can be triggered using --fast[=#] command.
Negative compression levels sample data more sparsely, and disable Huffman compression of literals, translating into faster decoding speed.
It's possible to create one's own custom fast compression level by using strategy ZSTD_fast, increasing ZSTD_p_targetLength to desired value, and turning on or off literals compression, using ZSTD_p_compressLiterals.
Performance is generally on par or better than other high speed algorithms. On below benchmark (compressing silesia.tar on an Intel Core i7-6700K CPU @ 4.00GHz) , it ends up being faster and stronger on all metrics compared with quicklz and snappy at --fast=2. It also compares favorably to lzo with --fast=3. lz4 still offers a better speed / compression combo, with zstd --fast=4 approaching close.
Applications which were considering Zstandard but were worried of being CPU-bounded are now able to shift the load from CPU to bandwidth on a larger scale, and may even vary temporarily their choice depending on local conditions (to deal with some sudden workload surge for example).
zstd-1.3.2 introduced the long range mode, capable to deduplicate long distance redundancies in a large data stream, a situation typical in backup scenarios for example. But its usage in association with multi-threading was discouraged, due to inefficient use of memory. zstd-1.3.4 solves this issue, by making long range match finder run in serial mode, like a pre-processor, before passing its result to backend compressors (regular zstd). Memory usage is now bounded to the maximum of the long range window size, and the memory that zstdmt would require without long range matching. As the long range mode runs at about 200 MB/s, depending on the number of cores available, it's possible to tune compression level to match the LRM speed, which becomes the upper limit.
zstd -T0 -5 --long file # autodetect threads, level 5, 128 MB window
zstd -T16 -10 --long=31 file # 16 threads, level 10, 2 GB window
As illustration, benchmarks of the two files "Linux 4.7 - 4.12" and "Linux git" from the 1.3.2 release are shown below. All compressors are run with 16 threads, except "zstd single 2 GB". zstd compressors are run with either a 128 MB or 2 GB window size, and lrzip compressor is run with lzo, gzip, and xz backends. The benchmarks were run on a 16 core Sandy Bridge @ 2.2 GHz.
The association of Long Range Mode with multi-threading offers now some very compelling results for large stream scenarios.
This release also brings its usual list of small improvements and bug fixes, as detailed below :