By Leet on 2025-06-02 20:54:04
Most people do not store ISOs for Windows after using them. Some more tech-savvy people store a couple of them so that they can easily access them to reimage computers. Finally, some people like me store every single installer they have as a collection. I frequently require different Windows editions, variants, versions, and languages for testing software, so I have a big library.
Managing such a library gets inconvenient. Very, very fast. There are several reasons for this.
First and foremost, each ISO file is 3-6 GB, depending on the version, so a large library takes up a lot of space. Luckily, the 'Windows ISO Community' has a solution for this: SmartVersion. SmartVersion uses a binary diff format, SVF, to store the changes between two files. For example, you could have the following files: en-us_windows7.iso and nl-nl_windows7.svf. SmartVersion can then use the SVF file to convert the English ISO into the Dutch one. The English version is known as the 'source' file and the Dutch version as the 'target' file. Because SVF files are extremely space-efficient, I can keep thousands of ISOs in my collection. Fully extracted, this library would be over 3 TiB, but with SVF it is only about 100 GiB.
The second issue is that SVF files can be nested, i.e. you need multiple SVF files to get the desired ISO. An example of this:
a.iso + a_to_b.svf => b.iso
b.iso + b_to_c.svf => c.iso
In this case, you store only a.iso and all SVF files. When you want to get c.iso from your library, you first have to extract b.iso and then apply the second SVF to it. Fortunately, SmartVersion can also combine SVF files, so a_to_b.svf and b_to_c.svf can be converted to a_to_c.svf. This way, you only have to extract once. The actual issue is not the extraction though, but rather that it is not always clear which SVF and source ISO files are needed to retrieve a specific image.
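To make the lookup problem concrete, here is a minimal Python sketch of how such a chain could be resolved. It is not how SmartVersion or my own tooling works internally. The index layout (PATCHES, STORED) and the function name resolve_chain are assumptions made up for illustration, using the a/b/c example from above.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical index: target ISO name -> (source ISO name, SVF file name).
PATCHES: Dict[str, Tuple[str, str]] = {
    "b.iso": ("a.iso", "a_to_b.svf"),
    "c.iso": ("b.iso", "b_to_c.svf"),
}

# Hypothetical set of ISOs that are stored in full (no SVF needed).
STORED: Set[str] = {"a.iso"}


def resolve_chain(
    target: str,
    patches: Dict[str, Tuple[str, str]],
    stored: Set[str],
) -> Tuple[str, List[str]]:
    """Return the stored base ISO and the SVF files to apply, in order."""
    svfs: List[str] = []
    seen: Set[str] = set()
    current = target
    # Walk backwards from the target until we hit an ISO we actually have.
    while current not in stored:
        if current in seen:
            raise ValueError(f"cycle detected at {current}")
        seen.add(current)
        if current not in patches:
            raise KeyError(f"nothing in the index produces {current}")
        source, svf = patches[current]
        svfs.append(svf)
        current = source
    # We collected the SVFs target-first, so flip them into apply order.
    svfs.reverse()
    return current, svfs


if __name__ == "__main__":
    base, chain = resolve_chain("c.iso", PATCHES, STORED)
    print(base, chain)
```

For c.iso this walks back through b.iso to a.iso and returns the two SVF files in the order they need to be applied. The hard part in practice is building and maintaining the index itself, which is exactly what a database is for.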
Finally, it is very difficult to store information about these files. The SVF files only give you the filename, and the actual specifics of the ISO (build, edition, licensing, etc.) are often hard to determine from the filename alone. For example:
en_windows_7_enterprise_x64_dvd_x15-70749.iso
From this name, the language, main version, edition and architecture are quite clear. But what build of Windows is this? What licensing does it use? I know the answer purely from experience (RTM, Volume), but for some of my ISOs I still do not know exactly what they contain. Having written Panther2K, I know a thing or two about WIM files, so I can easily obtain metadata if I have access to the WIM files. Unfortunately, extracting all 3 TiB of ISOs from the SVFs is not viable. At the very least, writing that many terabytes is not good for my SSD, and I would like to avoid it when all I need is metadata.
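As a rough illustration of how much (and how little) a filename gives you, here is a small Python sketch that parses names shaped like the example above. The regular expression and the field names are my own assumptions for this one naming style. Many ISOs follow other conventions, and this is not the parser the final system uses.

```python
import re
from typing import Dict, Optional

# Assumed pattern for names like en_windows_7_enterprise_x64_dvd_x15-70749.iso.
# Other naming schemes will simply not match.
PATTERN = re.compile(
    r"^(?P<language>[a-z]{2}(?:-[a-z]{2})?)_"
    r"windows_(?P<version>[0-9.]+)_"
    r"(?P<edition>[a-z_]+?)_"
    r"(?P<arch>x86|x64)_"
    r"dvd_(?P<suffix>[a-z0-9-]+)\.iso$",
    re.IGNORECASE,
)


def parse_iso_name(filename: str) -> Optional[Dict[str, Optional[str]]]:
    """Extract what the filename reveals; return None if it does not match."""
    match = PATTERN.match(filename)
    if match is None:
        return None
    info: Dict[str, Optional[str]] = dict(match.groupdict())
    # These fields are not encoded in the name and stay unknown.
    info["build"] = None
    info["licensing"] = None
    return info


if __name__ == "__main__":
    print(parse_iso_name("en_windows_7_enterprise_x64_dvd_x15-70749.iso"))
```

The build and licensing fields stay empty no matter how clever the pattern gets, which is why the metadata has to come from the image contents rather than from the name.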
My solution for this is a database system for Windows installers. Note that this also includes ESD files, which I use to quickly create Panther2K USBs. The requirements for the software are as follows:
Such a system comes with a lot of challenges. In this blog series, I will try to go through each of the components of the system to see what is involved in creating it. Stay tuned, because there are a lot of cool technologies and tricks used to make this run smoothly.
xx Leet