Despite the minor version increment (7.2 to 7.2.1), this release is actually quite substantial. It is called 7.2.1 rather than 7.3 in order to align version numbers between regular Kakadu and the Speed-Pack.
This release wraps up quite a number of promised improvements that did not make it into version 7.2. The most significant of these is that the `kdu_stripe_compressor' and `kdu_stripe_decompressor' APIs have been augmented with SIMD accelerations (x86 processors only for now) that dramatically speed up the process of converting between most application-defined buffer organizations (across a range of data precisions) and Kakadu's four internal numerical representations.
Although these conversion and data reorganization steps are very simple, performing them with regular sample-by-sample operations can lead to a situation in which the inherently single-threaded data conversion process dominates the overall performance of a compression or decompression engine, since all other steps can be heavily multi-threaded. The `kdu_stripe_compressor' and `kdu_stripe_decompressor' interfaces are very convenient for application developers, but prior to this release the most efficient way to drive Kakadu was to interface directly to the more fundamental internal numeric representations via `kdu_multi_analysis' or `kdu_multi_synthesis'. This is no longer necessary and probably not even desirable, since almost all conversions and buffer reorganizations of interest are now handled automatically and efficiently by the stripe-compressor/stripe-decompressor interface.
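To make the cost concrete, the following is a minimal sketch of the kind of sample-by-sample conversion described above. It is not Kakadu code and does not use the Kakadu API: the function name `deinterleave_to_fixed', the constant `kFixPointBits', and the choice of a 16-bit fixed-point target with 13 fraction bits are all assumptions made purely for illustration. The sketch takes an interleaved buffer of unsigned 8-bit samples (for example RGBRGB...), separates it into one plane per component, removes the unsigned offset and scales each sample into the fixed-point range.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Illustrative only: an assumed fixed-point format with 13 fraction bits.
constexpr int kFixPointBits = 13;

// Hypothetical helper: convert interleaved unsigned 8-bit samples into one
// plane per component, level-shifted by 128 and scaled so that the 8-bit
// range maps to roughly [-0.5, 0.5) in the assumed fixed-point format.
std::vector<std::vector<std::int16_t>>
deinterleave_to_fixed(const std::vector<std::uint8_t>& src,
                      std::size_t num_components)
{
    assert(num_components > 0 && src.size() % num_components == 0);
    const std::size_t num_pixels = src.size() / num_components;
    const int scale = 1 << (kFixPointBits - 8); // 8-bit input precision

    std::vector<std::vector<std::int16_t>> planes(
        num_components, std::vector<std::int16_t>(num_pixels));

    // One load, one subtract, one multiply and one store per sample.
    // This scalar loop is the part that a SIMD implementation would
    // process many samples at a time.
    for (std::size_t i = 0; i < num_pixels; ++i) {
        for (std::size_t c = 0; c < num_components; ++c) {
            const int centred = static_cast<int>(src[i * num_components + c]) - 128;
            planes[c][i] = static_cast<std::int16_t>(centred * scale);
        }
    }
    return planes;
}
```

Because a loop like this touches every sample of every component on a single thread, it can easily become the bottleneck once the wavelet transform and block coding stages are spread across many cores, which is the situation the SIMD accelerations are intended to avoid.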
The second useful feature of this release is that the makefiles have been rationalized and augmented with a fully configured set of makefiles for the MINGW-64 toolchain (Minimal GNU toolchain for Win64). This allows high quality binaries to be built for 64-bit Windows operating systems without the need for Microsoft's Visual Studio -- although you would still need Visual Studio to build the GUI "kdu_winshow" application and the .NET managed bindings for C# and Visual Basic. As a result, you can now build for the following operating systems directly using the makefiles: Win64, OSX (32/64 bit Intel/PPC), Linux (32/64 bit) and Solaris. This is in addition to the Visual Studio workspaces (Win32 and Win64) and the XCODE project files (OSX 32/64 bit).
A third improvement is that you can now build the .NET managed bindings for C# and Visual Basic using 64-bit binaries -- previously these interfaces were only built for 32-bit libraries. To build for 64-bit libraries, we had to work around a known bug related to 64-bit atomic intrinsics in Visual Studio 2010 and previous releases (fixed only in Visual Studio 2012).
A fourth useful improvement is that the `kdu_multi_analysis' and `kdu_multi_synthesis' interfaces, on top of which almost all compression/decompression APIs are built, can now automatically select good double buffering parameters for interfacing multi-threaded DWT (where selected) and multi-threaded block coding modules. This automatic option is now selected by default by all of the demo applications, since it allows the amount of interface memory to be set in a manner that is sensitive to the image dimensions and processing precision so as to make better use of CPU cache resources.
The "kdu_buffered_compress" and "kdu_buffered_expand" demo executables have been augmented to support more image file formats (PPM and BMP) and to use the high speed conversions now present in `kdu_stripe_compressor' and `kdu_stripe_decompressor' as effectively as possible -- they are very fast, and indeed "kdu_buffered_compress" is now generally faster than "kdu_compress", while being almost as capable and vastly simpler to understand. At the same time, an in-memory image replication mode has been added to "kdu_buffered_compress" to allow compression of very large images to be tested without necessarily having access to a massive image on the hard disk.
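The idea behind in-memory replication can be sketched as follows. This is a hypothetical illustration rather than the code used by "kdu_buffered_compress": the `SmallImage' structure and the `replicated_row' function are invented names, and the sketch assumes a single-component, row-major, 8-bit source image. Each line of a much larger virtual image is produced on demand by tiling the small source image horizontally and vertically, so the large image never has to exist on disk or in memory as a whole.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical container for a small single-component source image,
// stored row by row.
struct SmallImage {
    std::size_t width;
    std::size_t height;
    std::vector<std::uint8_t> samples; // width * height entries
};

// Hypothetical helper: return line `y' of a virtual image that is
// `out_width' samples wide, formed by repeating `src' in both directions.
std::vector<std::uint8_t> replicated_row(const SmallImage& src,
                                         std::size_t y,
                                         std::size_t out_width)
{
    assert(src.width > 0 && src.height > 0);
    assert(src.samples.size() == src.width * src.height);

    // Vertical replication: wrap the requested line into the source height.
    const std::uint8_t* line = src.samples.data() + (y % src.height) * src.width;

    // Horizontal replication: wrap each column into the source width.
    std::vector<std::uint8_t> row(out_width);
    for (std::size_t x = 0; x < out_width; ++x) {
        row[x] = line[x % src.width];
    }
    return row;
}
```

A compressor that consumes its input one line or one stripe at a time could call a function like this repeatedly with increasing `y', so that only the small source image and the current stripe need to be resident in memory.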
There are substantial improvements to the efficiency with which new JPIP channels are instantiated by `kdu_client' (useful for apps that want to set up a whole lot of parallel communication channels at once).
There are also a few minor bug fixes in this release.