Far over the misty mountains cold - iLEAPP Threema Parser

Over Christmas, a family member asked me for a little help with data migration between two iPhones.
This also involved transferring chats from the Threema application from one device to the other.
Although this was not possible as desired (without setting up the second device again), a new iLEAPP parser at least made the chats readable.
Threema
The paid app can be purchased from the App Store (Adam ID 578665578).
Threema is essentially a chat application with E2E encryption and a focus on data protection and privacy. The app's code is even open source: Github. Developed in Switzerland, Threema is mainly used in German-speaking countries. The Swiss army and administration rely on Threema, as do various German politicians. But the application is also considered an (data protection) secure alternative to WhatsApp internationally. Recently, the app was featured in the Darcy Jenson case. Three Australian citizens are alleged to have received instructions from a “Mr. Big” via Threema to murder Australian Zivan Radmanovic on the Indonesian island of Bali.
The fact that no phone number or email address is required for registration contributes to the anonymity of using Threema.
So far, I have only looked at the iOS version. An analysis of the Android version may follow later.
Data backup
Conveniently, the Threema database is not only included in an FFS backup, but also in an iTunes backup (and thus in a UFADE PRFS backup). User data is located (as is often the case with iOS applications) in the “Shared” directory:
/private/var/mobile/Containers/Shared/AppGroup/{uuid}
Outsourced media files are located in the hidden subfolder: .ThreemaData_SUPPORT/_EXTERNAL_DATA/
Special thanks go to my colleague Bruno Fischer, who provided me with several innocuous data sets to test my parser and create this documentation.
Sources
User data is almost exclusively stored in the database: ThreemaData.sqlite. Thankfully, almost everything here (compared to PotatoChat) is readable in plain text and logically structured.
The following tables are particularly interesting:
| Table | Content |
|---|---|
ZMESSAGE | Chats and group chats (messages) |
ZCONVERSATION | Chats and group chats (overview) |
ZCONTACT | Known users |
Z_PRIMARYKEY | Overview of message types |
ZAUDIODATA | Content sent as audio (m4a) |
ZIMAGEDATA | Content sent as images (jpg) |
ZVIDEODATA | Content sent as videos (mp4) |
ZFILEDATA | Content sent as files (original file format) |
Processing
The actual messages and most of the content can be found in the ZMESSAGE table. Texts are stored in plain text.
The following columns of the ZMESSAGE table were used for processing the messages:
| Column | Content |
|---|---|
Z_PK | Primary key - unique assignment of a message entry |
ZDATE | Timestamp of the message in Cocoa format (seconds since January 1, 2001) |
ZISOWN | 1 = message sent by yourself, 0 = message from another user |
ZSENDER | Z_PK from ZCONTACT if it is a group chat message |
ZREAD | 1 = message has been read, 0 = message has not been read |
ZTEXT | Actual message |
ZCAPTION | Text associated with a media file |
Z_ENT | Message type - mapped with Z_ENT from Z_PRIMARYKEY |
ZFILENAME | File name if the attachment was sent as a file |
ZMIMETYPE | Mimetype (e.g., “image/jpeg”) if the attachment was sent as a file |
ZAUDIO | Z_PK from ZAUDIODATA if the file was sent as audio |
ZIMAGE | Z_PK from ZIMAGEDATA if the file was sent as an image |
ZVIDEO | Z_PK from ZVIDEODATA if the file was sent as a video |
ZLONGITUDE | Longitude for sent locations |
ZLATITUDE | Latitude for sent locations |
ZACCURACY | Accuracy of the location data |
Media
The primary keys stored in the ZMESSAGE table enable individual media to be mapped to specific messages:
ZMESSAGE.ZAUDIO | ZMESSAGE.ZIMAGE | ZMESSAGE.ZVIDEO | ZMESSAGE.Z_PK | |
|---|---|---|---|---|
| Mapping: | ZAUDIODATA.Z_PK | ZIMAGEDATA.Z_PK | ZVIDEODATA.Z_PK | ZFILEDATA.ZMESSAGE |
| Medium / Reference: | ZAUDIODATA.ZDATA | ZIMAGEDATA.ZDATA | ZVIDEODATA.ZDATA | ZFILEDATA.ZDATA |
The media could be found as blobs in the ZDATA column. However, a distinction is made here:
First byte is 01: The rest of this blob contains the actual media file in binary form. The following image shows the typical JPEG header (FF D8 FF E0):

First byte is 02: The ASCII string between the leading 02 and the trailing 00 byte contains a UUID as a reference to the media file (in the image: F61847CA-3769-457E-A345-BB1A09C651A1):

A file with this name can be found under .ThreemaData_SUPPORT/_EXTERNAL_DATA/. All files in this directory have no file extension, regardless of media type:

SQL Query
Now, the insights gained just need to be translated into processing logic that is as universal as possible. The following SQL query serves as the data basis for iLEAPP:
SELECT
m.Z_PK AS MID,
m.ZDATE as COCOA,
conv.Z_PK as C_ID,
m.ZISOWN as OUT,
CASE
WHEN conv.ZGROUPNAME IS NOT NULL
THEN conv.ZGROUPNAME
WHEN cont.ZFIRSTNAME IS NOT NULL AND cont.ZLASTNAME IS NULL
THEN cont.ZFIRSTNAME
WHEN cont.ZFIRSTNAME IS NOT NULL AND cont.ZLASTNAME IS NOT NULL
THEN cont.ZFIRSTNAME || ' ' || cont.ZLASTNAME
WHEN cont.ZFIRSTNAME IS NULL AND cont.ZLASTNAME IS NOT NULL
THEN cont.ZLASTNAME
ELSE cont.ZPUBLICNICKNAME
END AS CHAT,
CASE
WHEN m.ZISOWN = 1 THEN
'local user'
WHEN m.ZSENDER IS NOT NULL THEN
CASE
WHEN sd.ZFIRSTNAME IS NOT NULL AND sd.ZLASTNAME IS NULL
THEN sd.ZFIRSTNAME
WHEN sd.ZFIRSTNAME IS NOT NULL AND sd.ZLASTNAME IS NOT NULL
THEN sd.ZFIRSTNAME || ' ' || sd.ZLASTNAME
WHEN sd.ZFIRSTNAME IS NULL AND sd.ZLASTNAME IS NOT NULL
THEN sd.ZLASTNAME
ELSE sd.ZPUBLICNICKNAME
END
ELSE
CASE
WHEN conv.ZGROUPNAME IS NOT NULL
THEN conv.ZGROUPNAME
WHEN cont.ZFIRSTNAME IS NOT NULL AND cont.ZLASTNAME IS NULL
THEN cont.ZFIRSTNAME
WHEN cont.ZFIRSTNAME IS NOT NULL AND cont.ZLASTNAME IS NOT NULL
THEN cont.ZFIRSTNAME || ' ' || cont.ZLASTNAME
WHEN cont.ZFIRSTNAME IS NULL AND cont.ZLASTNAME IS NOT NULL
THEN cont.ZLASTNAME
ELSE cont.ZPUBLICNICKNAME
END
END AS SENDER,
CASE
WHEN m.ZISOWN = 0 THEN
'local user'
ELSE
CASE
WHEN conv.ZGROUPNAME IS NOT NULL
THEN conv.ZGROUPNAME
WHEN cont.ZFIRSTNAME IS NOT NULL AND cont.ZLASTNAME IS NULL
THEN cont.ZFIRSTNAME
WHEN cont.ZFIRSTNAME IS NOT NULL AND cont.ZLASTNAME IS NOT NULL
THEN cont.ZFIRSTNAME || ' ' || cont.ZLASTNAME
WHEN cont.ZFIRSTNAME IS NULL AND cont.ZLASTNAME IS NOT NULL
THEN cont.ZLASTNAME
ELSE cont.ZPUBLICNICKNAME
END
END AS RECEIVER,
m.ZREAD as "READ",
p.Z_NAME as "MTYPE",
CASE
WHEN m.ZTEXT IS NOT NULL THEN m.ZTEXT
WHEN m.ZCAPTION IS NOT NULL THEN m.ZCAPTION
ELSE NULL
END AS MESSAGE,
CASE
WHEN ad.ZDATA IS NOT NULL THEN ad.ZDATA
WHEN vd.ZDATA IS NOT NULL THEN vd.ZDATA
WHEN id.ZDATA IS NOT NULL THEN id.ZDATA
WHEN fd.ZDATA IS NOT NULL THEN fd.ZDATA
ELSE NULL
END AS MEDIA,
CASE
WHEN tn.ZDATA IS NOT NULL THEN tn.ZDATA
ELSE NULL
END AS THUMBNAIL,
m.ZFILENAME AS FILENAME,
m.ZMIMETYPE AS MIMETYPE,
m.ZLATITUDE AS LATITUDE,
m.ZLONGITUDE AS LONGITUDE,
m.ZACCURACY AS ACCURACY
FROM
ZMESSAGE m
JOIN
Z_PRIMARYKEY p
ON m.Z_ENT = p.Z_ENT
LEFT JOIN ZAUDIODATA ad ON ad.Z_PK = m.ZAUDIO
LEFT JOIN ZVIDEODATA vd ON vd.Z_PK = m.ZVIDEO
LEFT JOIN ZIMAGEDATA id ON id.Z_PK = m.ZIMAGE
LEFT JOIN ZFILEDATA fd ON fd.ZMESSAGE = m.Z_PK
LEFT JOIN ZIMAGEDATA tn ON tn.Z_PK = m.ZTHUMBNAIL
LEFT JOIN ZCONTACT sd ON sd.Z_PK = m.ZSENDER
LEFT JOIN
ZCONVERSATION conv
ON m.ZCONVERSATION = conv.Z_PK
LEFT JOIN
ZCONTACT cont
ON conv.ZCONTACT = cont.Z_PK;
Output
As announced at the beginning, I created an iLEAPP parser (Threema.py) and was able to reliably prepare the chats in the expected form for the test data at hand. This parser already takes into account the upcoming release of LAVA and already contains the necessary parameters for display as a “Conversation”:


UFADE Adjustments
At the beginning, I explained that the Threema database (but without WAL and SHM) is also included in an iTunes backup. This makes it much easier to generate test data, but it also revealed a previously unknown (and rare) problem with the decryption logic used by UFADE in the iOSbackup module.
Bruno provided me with Threema data from FFS and PRFS backups. It was striking that the PRFS Threema databases were declared “malformed,” while the FFS databases opened without any problems. Comparisons of two databases from the same device showed that the PRFS data was too small here – the last approx. 100kb of the database were missing.
When reviewing the iOSbackup code, I identified a truncate() function as the “culprit,” which trims each output file to the specified size from manifest.db. If this function is commented out, the PRFS backup created afterwards also contains a readable Threema database. Since the missing truncate can cause problems with other data types (e.g., plist), a workaround has been integrated into UFADE that only prevents truncate for SQLite databases after checking the file size.
