Someone I know recently switched from using physical cross-stitch patterns to an app called Markup R-XP. While using the app, they noticed that you can export projects as ZIP archives, but the archives were encrypted. For curiosity's sake, I decided to take a look and see if I could figure out the password. In this blog post, I describe three attempts I made at determining the password to decrypt the archive and how one ultimately succeeded.

First, some observations about the app and export process. The app never asks for a password, so it must either be hardcoded or automatically generated. This video indicates that you can share projects across multiple devices, so it can't be a per-device password. It could generate a password based on file names, but a hardcoded password is the simplest explanation.

As for the ZIP file, the sample I had been given contained about 100 files. It consisted of the following files:

data.json
CoverImage.jpg
Upload/pattern.pdf (remember this one)
Symbol/*.jpg

The last entry (the "Symbol" folder) contained the bulk of the files. Based on the names (e.g. "7c7dcq19.hdedefaultSymbol.jpg"), I assumed they were symbols in the pattern. (I was able to verify that this assumption was correct later.)

First Attempt - Bruteforcing

My immediate thought when seeing this was to just try a dictionary attack. After all, humans are usually pretty terrible at picking passwords, and encrypting cross-stitch patterns is hardly a security-critical operation. From what I can tell, the encryption mostly seems to be a way to prevent people from accidentally breaking their own patterns, so it doesn't matter if the encryption is actually secure or not.

I ran zip2john and john to try to crack the ZIP, using both John the Ripper's default "password.lst" list and SecLists' "xato-net-10-million-passwords.txt" list. After exhausting both lists, I let John continue running overnight for 7 hours, trying approximately 1.6 trillion passwords. After it failed to crack the archive in that time frame, I started to wonder if the developer had picked a strong password or if it was dynamically generated based on the archive.
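
For a sense of scale, those numbers can be turned into a guess rate and compared against a few password spaces. This is a back-of-the-envelope sketch: the 1.6 trillion and 7 hour figures come from the run described above, while the alphabets and the 8-character limit are arbitrary assumptions, and John's incremental mode does not actually walk the space in simple exhaustive order.

```python
# Figures from the overnight run described above.
GUESSES = 1.6e12
HOURS = 7

# Average guesses per second over the whole run.
rate = GUESSES / (HOURS * 3600)


def keyspace(alphabet_size: int, max_len: int) -> int:
    """Number of passwords of length 1..max_len over the given alphabet."""
    return sum(alphabet_size ** n for n in range(1, max_len + 1))


def days_to_exhaust(alphabet_size: int, max_len: int) -> float:
    """Days needed to try every such password at the measured rate."""
    return keyspace(alphabet_size, max_len) / rate / 86400


# Alphabet sizes are illustrative assumptions, not anything known about the app.
for label, size in [("lowercase", 26), ("lowercase + digits", 36),
                    ("letters + digits", 62), ("printable ASCII", 95)]:
    print(f"{label:>20}: {keyspace(size, 8):.2e} candidates, "
          f"{days_to_exhaust(size, 8):,.1f} days")
```

Under those assumptions, the overnight run is enough to cover every all-lowercase password of up to 8 characters, but not lowercase plus digits at the same length. Anything longer or with a larger alphabet is out of reach for this approach.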

Second Attempt - Reverse Engineering

Though I was a bit disappointed by my failure thus far, I was also confident that (at least theoretically) I would be able to recover the password from the app itself. After all, users were never prompted to enter a password, so it must be in the app somewhere.

I was able to install the app from Google Play and use ADB to pull the app off my phone (albeit as several APK files). I joined the APKs using APKEditor and decoded/disassembled the combined APK using Apktool. (Yes, I could have used just one tool, but I actually discovered Apktool first, used that to disassemble everything, then discovered APKEditor, then combined the APKs and disassembled the result.)

I started to look through the disassembled code, at which point I realized I had no clue what I was doing. While I have a bit of experience reversing Linux and Windows binaries, Android development and reverse engineering are a whole different animal. Despite my ignorance, I pressed on and tried to get a feel for the general structure of the app.

I started looking for strings like "zip", "encrypt", "export", etc. and I quickly found some points of interest. However, all of them seemed to be related to zipping (i.e. combining) an array or collecting logs. Eventually, I started looking at the "lib/arm64-v8a" directory, which included a bunch of .dll.so files. One file caught my eye, named "libaot-Telerik.Zip.dll.so". I got as far as opening Ghidra before realizing that reverse engineering a library that likely only created the ZIP (rather than handling the password) was pointless.

Ultimately, after going down many more rabbit holes and circular paths, I determined that the password either wasn't there or (more likely) I didn't know how to spot it. Defeated, I started researching other approaches.

Third Attempt - Known Plaintext

I don't remember exactly what I was looking for, but I eventually stumbled across bkcrack on GitHub, which seemed promising. After downloading it, I determined that the app used ZipCrypto to create its archives, which is vulnerable to the known-plaintext attack that bkcrack is specifically designed to exploit.
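
Which scheme an entry uses is recorded in the ZIP metadata itself. The sketch below is one way to check it with Python's standard zipfile module; it is not necessarily how the check was done for this post, and the names encryption_kind and describe are made up for illustration. It relies on three details of the ZIP format: bit 0 of the general-purpose flag marks an encrypted entry, bit 6 marks PKWARE's strong encryption, and WinZip-style AES entries store 99 in the compression-method field.

```python
import zipfile

AES_METHOD = 99          # WinZip AES marker stored in the compression-method field
FLAG_ENCRYPTED = 0x01    # general-purpose bit 0
FLAG_STRONG = 0x40       # general-purpose bit 6


def encryption_kind(flag_bits: int, compress_type: int) -> str:
    """Classify one entry from its flag bits and compression method."""
    if not flag_bits & FLAG_ENCRYPTED:
        return "none"
    if compress_type == AES_METHOD:
        return "aes"
    if flag_bits & FLAG_STRONG:
        return "strong"
    # Encrypted, but neither AES nor strong encryption: legacy ZipCrypto.
    return "zipcrypto"


def describe(path: str) -> dict:
    """Map every entry name in the archive at `path` to its encryption kind."""
    with zipfile.ZipFile(path) as zf:
        return {info.filename: encryption_kind(info.flag_bits, info.compress_type)
                for info in zf.infolist()}
```

Calling describe("export.zip") (a hypothetical file name) on an archive like the one here should report "zipcrypto" for each entry. Only the central directory is read, so no password is needed.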

Now I just had to find a known plaintext. You may remember from my foreshadowing in the introduction that there was a file named "Upload/pattern.pdf". The uncompressed size of this PDF turned out to be the exact size of the pattern used to create the project within the app. The app appears to just store a copy of the pattern in the file, presumably so that it can display it in the project. Since I control which PDF I use, I should have a known plaintext. One slight issue: the PDF itself doesn't act as the plaintext; instead, the compressed PDF acts as the plaintext. The tutorial file from bkcrack describes running the attack against compressed data as "the not so easy way" since "one would have to guess how those first bytes are compressed".
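
To see why known plaintext is so damaging, it helps to look at the cipher itself. The following is a sketch of the ZipCrypto stream cipher as described in the ZIP specification: three 32-bit keys, updated with CRC-32 and a multiplication. The class and function names, the password, and the sample data are all made up for this example. It does not implement bkcrack's attack; it only shows that XORing known plaintext with ciphertext yields the keystream, which is the starting point for recovering the internal keys.

```python
def _crc32_table():
    """Standard CRC-32 lookup table (reflected polynomial 0xEDB88320)."""
    table = []
    for n in range(256):
        c = n
        for _ in range(8):
            c = (c >> 1) ^ 0xEDB88320 if c & 1 else c >> 1
        table.append(c)
    return table


CRC_TABLE = _crc32_table()


class ZipCryptoKeys:
    """The three 32-bit keys that make up ZipCrypto's internal state."""

    def __init__(self, password: bytes):
        self.k0, self.k1, self.k2 = 0x12345678, 0x23456789, 0x34567890
        for byte in password:
            self.update(byte)

    def update(self, plain_byte: int) -> None:
        # The state is advanced with the plaintext byte, never the ciphertext.
        self.k0 = CRC_TABLE[(self.k0 ^ plain_byte) & 0xFF] ^ (self.k0 >> 8)
        self.k1 = ((self.k1 + (self.k0 & 0xFF)) * 134775813 + 1) & 0xFFFFFFFF
        self.k2 = CRC_TABLE[(self.k2 ^ (self.k1 >> 24)) & 0xFF] ^ (self.k2 >> 8)

    def keystream_byte(self) -> int:
        t = (self.k2 | 2) & 0xFFFF
        return ((t * (t ^ 1)) >> 8) & 0xFF


def encrypt(password: bytes, plaintext: bytes) -> bytes:
    keys, out = ZipCryptoKeys(password), bytearray()
    for p in plaintext:
        out.append(p ^ keys.keystream_byte())
        keys.update(p)
    return bytes(out)


def decrypt(password: bytes, ciphertext: bytes) -> bytes:
    keys, out = ZipCryptoKeys(password), bytearray()
    for c in ciphertext:
        p = c ^ keys.keystream_byte()
        keys.update(p)
        out.append(p)
    return bytes(out)


# Hypothetical password and data, purely for demonstration. A real archive
# entry also starts with a 12-byte encryption header, which is omitted here.
secret = b"hunter2"
known_plain = b"\x00" * 4 + b"first compressed bytes"
cipher = encrypt(secret, known_plain)

# An attacker who knows the plaintext gets the keystream without the password.
keystream = bytes(p ^ c for p, c in zip(known_plain, cipher))
print(keystream.hex())
```

Because the state only ever depends on the password and the plaintext seen so far, a short run of keystream constrains the three keys heavily, and once the keys are known the password itself is no longer needed to decrypt.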

At the time, I (foolishly) believed that this would be a fairly straightforward task. After all, ZIP files are the universal archive format. My first attempt was to use the Info-ZIP zip program to create a ZIP file containing just "pattern.pdf". However, I noticed that the compressed size of my "pattern.pdf" was different from the compressed size of "pattern.pdf" in the given archive. Further testing revealed that changing tools (7-Zip vs Info-ZIP) and compression levels changed the compressed data significantly.
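
The same effect is easy to reproduce with a single library. The sketch below uses Python's zlib to produce raw Deflate streams (the format stored inside ZIP entries) at a few compression levels; the sample data and helper names are made up, and zlib is only a stand-in for whatever encoders Info-ZIP, 7-Zip, or Telerik actually use.

```python
import zlib


def raw_deflate(data: bytes, level: int) -> bytes:
    """Compress to a raw Deflate stream (negative wbits: no zlib header or checksum)."""
    comp = zlib.compressobj(level, zlib.DEFLATED, -15)
    return comp.compress(data) + comp.flush()


def raw_inflate(blob: bytes) -> bytes:
    """Decompress a raw Deflate stream."""
    return zlib.decompress(blob, -15)


# Made-up stand-in for the PDF; any compressible input shows the same effect.
sample = b"%PDF-1.7\n" + b"cross-stitch pattern row\n" * 200

streams = {level: raw_deflate(sample, level) for level in (0, 1, 6, 9)}
for level, blob in streams.items():
    # Same input every time, yet the size and the leading bytes can change.
    print(f"level {level}: {len(blob)} bytes, starts with {blob[:12].hex()}")
```

Every stream decompresses to the same input, which is exactly why this is a problem for the attack: there are many valid compressed forms of one file, and only the one the original tool produced is useful as plaintext.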

Ultimately, I tested every compression level for both tools, only to find that none of them matched the compressed size in the given archive. In retrospect, this should have been obvious, but I was hopeful that I would stumble across an easy solution. (Note: testing also revealed that the compressed size of a file as detected by bkcrack would increase by 12 bytes if the file was encrypted. If bkcrack reported an encrypted file was 30 bytes, the known plaintext would actually have a compressed (but unencrypted) size of 18 bytes.)
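
The size bookkeeping from that note can be written down as a small helper. This is a sketch using zipfile.ZipInfo objects; the function names are hypothetical, and the 12-byte adjustment applies to ZipCrypto entries only (AES entries carry different overhead).

```python
import zipfile

ENCRYPTION_HEADER_LEN = 12  # ZipCrypto prepends a 12-byte header to each entry


def plain_compressed_size(info: zipfile.ZipInfo) -> int:
    """Compressed size of an entry with the ZipCrypto header subtracted."""
    if info.flag_bits & 0x1:  # bit 0: entry is encrypted
        return info.compress_size - ENCRYPTION_HEADER_LEN
    return info.compress_size


def sizes_match(encrypted: zipfile.ZipInfo, candidate: zipfile.ZipInfo) -> bool:
    """True if an unencrypted candidate could be the plaintext of the encrypted entry."""
    return (encrypted.file_size == candidate.file_size
            and plain_compressed_size(encrypted) == plain_compressed_size(candidate))
```

A matching size does not prove that the bytes are identical, but a mismatch is a quick sign that the candidate was compressed differently.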

Despite the mismatches in file size, I decided to try to run the attack anyway. After all, I didn't need the compressed files to be exactly the same; I only needed the first 12 bytes or so to be identical. Attempting to run the attack revealed a further problem: how do I get the compressed data? I only need the compressed bytes of the PDF; the header is completely useless and would break bkcrack. I was able to locate a specification for ZIP files and wrote a terrible Python script (see appendix) designed to parse ZIPs just enough to grab some of the compressed data.

With the compressed data in hand, I ran the attack for each unique ZIP archive, only for all of them to fail. The authors of bkcrack recommend trying "various compression software" to find a correct plaintext, so I began searching for other tools that could generate ZIP files. I remembered the "libaot-Telerik.Zip.dll.so" file in the APK, so I tried to see if that was a well-known library. I was never able to track down that exact file, but I did find Telerik.Windows.Zip.dll, which seemed like a promising place to start.

I set up a Telerik account (thank you burner emails) and downloaded the installer, which the site graciously gave to me in the form of an EXE. I suppose I should have guessed based on the file name, but Telerik (or at least the version I was using) seemed to be Windows-exclusive. I am not running Windows, so I downloaded a Windows 10 ISO from Microsoft, set up a VM, installed Visual Studio and .NET, made the necessary sacrifices to the Windows gods, and set up a sample project using their "hello world" script. Lo and behold, the size of the ZIP it created exactly matched the size I was looking for.

After transferring the ZIP file back to the host OS, I started bkcrack. After just a few minutes, I was able to recover the state and decrypt the ZIP file. After just a few more minutes, I was able to recover the actual password used to encrypt the ZIP files.

Conclusion

Now that I have the password, what will I do with it? Probably nothing, ultimately, except for maybe changing the cover image of the project to something funny. I won't be posting either the state or the password here just to be safe, but I hope that my efforts here convince you to never use ZipCrypto again. I know I'll never touch it unless it's to do this to some other archive. The Telerik DLL that the app uses recently (as of 2024 Q1) introduced support for using AES instead of ZipCrypto.

If you happen to be the developer and you think that this is a security issue and you're interested in fixing it, using AES would be the easiest way to go about it (in my opinion), especially since you (presumably) don't have any compatibility concerns with other apps. For anyone else reading this, switching to AES would mitigate the attack used here, but support for AES encryption in ZIP files is far from universal. If you're willing to sacrifice compatibility for security, I'd say go for it.

Oh, one final note. If you're the dev of the app, send me a message (@abus.sh on Bluesky or <name of your app> at abus[.]sh) or something. Your app is cool and legitimately very useful. I'm glad that you developed it, and I know the person who sent me the original ZIP is getting a lot of use out of it.

Appendices and Random Notes

A Note on the Password

After I located the password, I tried to search for it directly in the app, both as an APK and its disassembled form. However, I was unable to locate it in either case. I suspect I am simply an idiot who doesn't understand Android reverse engineering, but it does seem a bit strange to me. Perhaps the password is retrieved or derived at runtime?

Reimplementing Telerik's Compression

As seen earlier, Telerik.Windows.Zip.dll produces different compressed output from both 7-Zip and Info-ZIP (presumably the same Deflate format, just with different encoder choices). It might be an interesting project to re-implement its compressor in a more portable way, though I'm not sure I have the time for it right now. If anyone does decide to tackle this (or if anyone finds a project that already does that), please let me know!

My Terrible Python Code

This is not pretty, but it did the job. For the known-plaintext attack to work, I needed to extract some of the compressed data from the unencrypted reference archive without also grabbing any header information. A bit of googling and looking at the spec for ZIP files told me how to parse the file header (section 4.3 was particularly useful). This script just reads the local file header of the first entry and grabs the first 20 bytes of compressed data. It can also print the header information, but I commented those sections out since I only cared about the compressed data.

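You may notice that I have a commented line to always read 100 bytes of compressed data, rather than what the "compressed size" field indicates. For reasons that I'm sure make sense, the "compressed size" field was sometimes 0, which meant that I had to hardcode a value for those cases. (Most likely this is the data-descriptor case from the spec: when bit 3 of the general-purpose flag is set, the CRC and sizes in the local header are written as zero and the real values are stored after the compressed data and in the central directory.) I'm sure I could have fixed it, but I simply can't be bothered at this point.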

import sys

if len(sys.argv) != 2:
    # sys.argv[0] is the script name; printing sys.argv itself would dump the whole list
    print(f'Usage: {sys.argv[0]} [file.zip]')
    sys.exit(-1)

with open(sys.argv[1], 'rb') as f:
    # Header
    header = f.read(4)
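    signature = b'\x50\x4b\x03\x04'  # local file header signature, "PK\x03\x04"
    if header != signature:
        # Nesting the same quote type inside an f-string only works on Python 3.12+,
        # so the expected value lives in a variable instead.
        print(f'Invalid header: expected {signature}, got {header}')
        sys.exit(-1)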
    
    # Version
    version = int.from_bytes(f.read(2), byteorder='little')
    #print(f'version={version}')

    # Bit flag
    # The general-purpose flag field is 16 bits wide, so pad to 16 binary digits
    bit_flag = bin(int.from_bytes(f.read(2), byteorder='little'))[2:].rjust(16, '0')
    #print(f'flag={bit_flag}')

    # Compression method
    # There is a list of what the values mean, I just can't be bothered to add them
    method = int.from_bytes(f.read(2), byteorder='little')
    match method:
        case 8:
            method = 'deflate'
        case _:
            method = f'unknown ({method})'
    #print(f'method={method}')

    # File modified time
    mod_time = f.read(2)
    #print(f'mod_time={mod_time}')

    # File modified date
    mod_date = f.read(2)
    #print(f'mod_date={mod_date}')

    # CRC
    crc = hex(int.from_bytes(f.read(4), byteorder='little'))[2:]
    #print(f'crc={crc}')

    # Compressed size
    comp_size = int.from_bytes(f.read(4), byteorder='little')
    #print(f'comp_size={comp_size}')

    # Uncompressed size
    uncomp_size = int.from_bytes(f.read(4), byteorder='little')
    #print(f'uncomp_size={uncomp_size}')

    # File name length
    file_name_len = int.from_bytes(f.read(2), byteorder='little')
    #print(f'file_name_len={file_name_len}')

    # Extra field length
    ext_field_len = int.from_bytes(f.read(2), byteorder='little')
    #print(f'ext_field_len={ext_field_len}')

    # File name
    file_name = f.read(file_name_len).decode()
    #print(f'file_name={file_name}')

    # Extra field
    ext_field = f.read(ext_field_len)
    #print(f'ext_field={ext_field}')

    data = f.read(comp_size)
    #data = f.read(100)
    #print(data[:20])
    sys.stdout.buffer.write(data[:20])
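
The zero "compressed size" problem can be sidestepped by taking the size from the central directory instead of the local header. The following is a sketch of that idea using Python's zipfile and struct modules, not a drop-in replacement for the script above: the function name compressed_bytes is made up, and the demo builds a small in-memory archive with placeholder data so that it runs without any input file.

```python
import io
import struct
import zipfile
import zlib

# Fixed 30-byte part of a local file header: signature, version, flags, method,
# mod time, mod date, CRC-32, compressed size, uncompressed size,
# file name length, extra field length.
LOCAL_HEADER = struct.Struct("<4sHHHHHIIIHH")


def compressed_bytes(fileobj, member: str, count: int = 20) -> bytes:
    """Return up to `count` bytes of the raw compressed data for `member`."""
    # zipfile reads the central directory, whose sizes are filled in even when
    # the local header stores zeros for them.
    with zipfile.ZipFile(fileobj) as zf:
        info = zf.getinfo(member)

    fileobj.seek(info.header_offset)
    fields = LOCAL_HEADER.unpack(fileobj.read(LOCAL_HEADER.size))
    if fields[0] != b"PK\x03\x04":
        raise ValueError("not a local file header")

    # Skip the variable-length file name and extra field to reach the data.
    name_len, extra_len = fields[9], fields[10]
    fileobj.seek(name_len + extra_len, io.SEEK_CUR)
    return fileobj.read(min(count, info.compress_size))


# Demo on an in-memory archive; the payload is a made-up placeholder.
payload = b"%PDF-1.7\n" + b"stitch " * 500
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
    zf.writestr("Upload/pattern.pdf", payload)

prefix = compressed_bytes(buf, "Upload/pattern.pdf")
full = compressed_bytes(buf, "Upload/pattern.pdf", count=len(buf.getvalue()))
print(prefix.hex())
print(zlib.decompress(full, -15) == payload)
```

With a real archive, the same function can be given a file opened in binary mode, and the returned prefix written out for bkcrack to use as plaintext. It also finds the entry by name rather than assuming it is the first one in the file.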

Pointlessly Encrypted Files - Defeating Markup R-XP's Encryption

@abus.sh

2024-12-17T15:10:03.386Z