matryoshka ctf problem from Texsaw2023


Matryoshka was a ctf problem from Texsaw2023 (CTF Time), a beginner/intermediate oriented ctf. I casually competed in this with a few other officers of the UTD Computer Security Group under the team name Charliott (Charles and my name, Elliot, combined); we were also joined by Takémaru and David. We ended up placing quite well considering most of us on the team were planning to just do some homework that day.


Walkthrough

For this problem we are given a few things to start.

  • doll1.zip
  • Important to note in the problem description:
    • need a password to “open” each doll
    • need “some kind of steganography tools”

Doll 1

Let’s see what’s in doll1.zip

> unzip doll1.zip && cd doll1
Archive:  doll1.zip
  inflating: doll1/doll1.jpg

> ls
doll1.jpg

We are given a single doll image, shown below.

We know at this point we need some password, and a stego tool to extract info from the image.

My first thought was to use the most overpowered ctf tool known… strings.

> strings doll1.jpg
# a bunch of misc output, but at the very bottom...
password:sneakythief

I knew strings my beloved wouldn’t fail me.

  • If that had not worked, I would have manually scanned through the strings output, or moved on to trying other tools like binwalk.
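For anyone curious what strings is roughly doing here, below is a small Python sketch: it pulls out runs of printable ASCII (4 or more characters, the tool's default minimum) and keeps the ones that mention a password. The function names and the sample bytes are made up for illustration, and this is not how the real tool is implemented.

```python
import re

def printable_strings(data: bytes, min_len: int = 4) -> list[str]:
    """Return runs of printable ASCII at least min_len long, roughly like `strings`."""
    pattern = rb"[\x20-\x7e]{%d,}" % min_len
    return [m.decode("ascii") for m in re.findall(pattern, data)]

def find_password_hints(data: bytes) -> list[str]:
    """Keep only the extracted strings that mention 'password' (case-insensitive)."""
    return [s for s in printable_strings(data) if "password" in s.lower()]

# Hypothetical stand-in for a JPEG with text appended after the image data.
sample = b"\xff\xd8\xff\xe0\x00\x10JFIF\x00\x01junk\xff\xd9\npassword:sneakythief\n"
print(find_password_hints(sample))
```

On the real file this would be called as find_password_hints(open("doll1.jpg", "rb").read()).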

Now that we have the password, we can use steghide (the only stego tool I know for extracting hidden data with a password).

> steghide extract -sf doll1.jpg -p sneakythief
wrote extracted data to "linktodoll2.txt".

> cat linktodoll2.txt
https://drive.google.com/file/d/1DWTzhosMfQtrsOLqgPecmRYfmX8cOkIB/view?usp=share_link

The extraction worked! There was a hidden text file with a link to a Google Drive file. Let’s download the contents of this link and continue on to doll 2.

Doll 2

For doll 2 we are given a directory with the following contents…

> tree doll2/
doll2.jpg
part1
└── md5hash1.txt
part2
└── md5hash2.txt

As expected, we are given another doll image.

And in the contents of the other two folders…

> cat part*/md5*
dd679302de4ce83d961f95a1facca536
5400711cd704e87ed3fd11556cc174ae

In the part1 and part2 folders, we are given two MD5 hashes (guessing by the names of the files, md5hash1.txt and md5hash2.txt. Would be funny if that was a lie t_t).

The doll image shows two salt shakers with uwu and owo on them, and next to the shakers, two hash browns. I think it’s a good guess that these are salts for the hashes, so I created a file to write both hashes down and appended the salts to them.

> cat combined.hash
dd679302de4ce83d961f95a1facca536:uwu  # md5hash1.txt
5400711cd704e87ed3fd11556cc174ae:owo  # md5hash2.txt

At the time of first solving this problem, I was quite fortunate to have access to a gpu.

We can go ahead and attempt to crack these hashes with hashcat.

# prime-run : run hashcat on the gpu
# -O -w3    : use optimized kernels and the "high" workload profile
# -m 20     : hash mode 20, md5($salt.$pass)
# -a 0      : attack mode 0, "straight" attack with wordlists
# then the target hashes, then the directory of password wordlists
> prime-run hashcat -O -w3 -m 20 -a 0 \
        combined.hash \
        /usr/share/wordlists/seclists/Passwords/

Looking at the terminal output after hashcat is done, I can see it only recovered one of the two hashes.

# manually removed unimportant filler from this output
Session..........: hashcat
Status...........: Exhausted
Hash.Mode........: 20 (md5($salt.$pass))
Hash.Target......: combined.hash
###
Recovered........: 1/2 (50.00%) Digests (total), 1/2 (50.00%) Digests (new), 1/2 (50.00%) Salts
###

It took me a bit of thinking to understand why, but after looking at the doll image again, I realized the placement of the salt was different for the two hash browns. Let’s rerun the same hashcat command, but this time using hash mode 10, which is the same as 20 except that it appends the salt to the password instead of prepending it.

# manually removed unimportant filler from this output
Session..........: hashcat
Status...........: Cracked
Hash.Mode........: 10 (md5($pass.$salt))
Hash.Target......: combined.hash
###
Recovered........: 2/2 (100.00%) Digests (total), 1/2 (50.00%) Digests (new), 2/2 (100.00%) Salts
###

hashcat terminated with status Cracked! You may be wondering why it shows 2/2 recovered even though we only switched the hash mode. This is because of hashcat’s potfile, where hashcat caches every hash you have cracked so you don’t waste processing cracking the same hash again. Let’s cat my potfile.

> cat ~/.local/share/hashcat/hashcat.potfile
dd679302de4ce83d961f95a1facca536:uwu:pizza
5400711cd704e87ed3fd11556cc174ae:owo:pasta
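Each potfile line reads hash:salt:password. The only difference between modes 20 and 10 is which side of the password the salt goes on before hashing. Here is a small Python sketch of that check, assuming a hypothetical helper crack_salted_md5 and a toy wordlist (hashcat does the same thing far faster over real wordlists). The demo hash is generated in the script itself rather than taken from the challenge.

```python
import hashlib

def md5_hex(text: str) -> str:
    """Hex MD5 digest of a string."""
    return hashlib.md5(text.encode()).hexdigest()

def crack_salted_md5(target: str, salt: str, words: list[str]):
    """Try each word with the salt prepended (mode 20) and appended (mode 10)."""
    for word in words:
        if md5_hex(salt + word) == target:
            return word, "md5($salt.$pass)"
        if md5_hex(word + salt) == target:
            return word, "md5($pass.$salt)"
    return None  # nothing in the wordlist matched either layout

# Self-made demo hash: password first, then salt (the mode 10 layout).
demo_target = md5_hex("hunter2" + "xyz")
print(crack_salted_md5(demo_target, "xyz", ["letmein", "hunter2"]))
```

Feeding it the two challenge hashes with the salts uwu and owo and a real wordlist would show which hash uses which layout.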

Looks like hash1 is pizza and hash2 is pasta. Let’s concatenate these and use that as the password to doll2.jpg.

> steghide extract -sf doll2.jpg -p pizzapasta
wrote extracted data to "linktodoll3.txt".

Yay! Success on doll 2.

> cat linktodoll3.txt
https://drive.google.com/file/d/1xv7ImiZmUgokSdTopiXrtWVeDkCkl3Im/view?usp=share_link

Another drive link to follow.

Doll 3

Let’s take a look at what’s inside doll3

> tree doll3/
doll3.jpg
hint1
└── convertthenshift.txt
hint2
└── weirdaudio.wav
hint3
└── noize.js

And the image of doll3…

It seems that for this doll we have to crack three different hints, and use the combination of those to find the next doll.

Let’s take a look at hint1.

Hint 1

> cat hint1/convertthenshift.txt
01100101 01110011 01110000 00100000 01111001 01101100 01111000 01110000 00100000 01111010 01110001 00100000 01100101 01110011 01110000 00100000 01111010 01100011 01110010 01101100 01111001 01110100 01101011 01101100 01100101 01110100 01111010 01111001

Hint 1 is a long binary sequence, so as a first step I think we should drop it into CyberChef.

Adding the From Binary operation to our recipe in CyberChef, with our input, gives us the following text output.
esp ylxp zq esp zcrlytkletzy

This looks quite “English-like”, and considering the name of the text document is convertthenshift, I think it makes sense to try shifting the letters until we get meaningful output. So adding the ROT operation to this recipe, and messing with the offset a bit (a shift of 15 does it)…

We get the plaintext: the name of the organization
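The same two CyberChef steps can be sketched in a few lines of Python. The helper names from_binary and rot are my own, and the loop just prints every possible shift so the readable one can be picked out by eye.

```python
BITS = (
    "01100101 01110011 01110000 00100000 01111001 01101100 01111000 01110000 "
    "00100000 01111010 01110001 00100000 01100101 01110011 01110000 00100000 "
    "01111010 01100011 01110010 01101100 01111001 01110100 01101011 01101100 "
    "01100101 01110100 01111010 01111001"
)

def from_binary(bits: str) -> str:
    """Turn space-separated 8-bit groups into ASCII text."""
    return "".join(chr(int(chunk, 2)) for chunk in bits.split())

def rot(text: str, shift: int) -> str:
    """Rotate lowercase letters forward by shift, leaving everything else alone."""
    out = []
    for ch in text:
        if "a" <= ch <= "z":
            out.append(chr((ord(ch) - ord("a") + shift) % 26 + ord("a")))
        else:
            out.append(ch)
    return "".join(out)

converted = from_binary(BITS)
for shift in range(1, 26):
    print(shift, rot(converted, shift))
```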

We can start filling in the blanks of the sentence given to us from the image.

What is [the name of the organization] that [hint 2] to [hint 3]?

Hint 2

For hint2, there is a wav file called weirdaudio.wav

The audio itself does not hold any meaning to my ears when listening.

  • When I first did this problem, I started with attempting to dissect the file. This did not lead to anything interesting, so after a bit, I searched wav frequency analyzer online. One of the top results was on dcode.fr; dcode is great, so I used that.

Using dcode.fr’s spectral analysis tool, we can see some text in the spectrogram.

Hint 2 is: issued the web certificate.

So now,
What is [the name of the organization] that [issued the web certificate] to [hint 3]?

Hint 3

Now onto the last hint. We have been given a javascript source file. It’s quite long so I am not going to copy all of it here. I can tell that it’s likely obfuscated though; var names, for example, have been replaced with hex identifiers.

I have little to no js experience even though the syntax seems quite easy, so the first thing I did was search “Javascript deobfuscator”. After going through a few websites, I found this absolutely amazing site deobfuscate.relative.im.

It simplified 61 lines of jumbled garbage into this…

function printHint() {
  let _0x10d288 = 'Hint 3 is: https://defcon.org/'
  console.log(
    "Let's see if you can piece together the string variable containing hint 3"
  )
}
printHint()

Now we know the combined hints are…
What is [the name of the organization] that [issued the web certificate] to [https://defcon.org/]?

Simple enough. Checking the cert in Firefox, we can see…

The organization is Hellenic Academic and Research Institutions CA

Referencing the original doll image, it says to use the first 43 characters, including spaces, as the password. This comes out to…
Hellenic Academic and Research Institutions
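Counting 43 characters by hand is error-prone, so a quick Python slice can confirm it. The issuer string below is just the organization name quoted above.

```python
issuer = "Hellenic Academic and Research Institutions CA"

# Keep the first 43 characters, spaces included, as the doll image instructs.
password = issuer[:43]
print(repr(password), len(password))
```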

So let’s try that as the password!

> steghide extract -sf doll3.jpg -p "Hellenic Academic and Research Institutions"
wrote extracted data to "linktodoll4.txt".
> cat linktodoll4.txt
https://drive.google.com/file/d/1Pj70CF0Yhjm05rHi0qVgiYyJ0CPKCYfI/view?usp=share_link

Another drive link containing the next doll. Onto doll 4…

Doll 4

Let’s take a look at what’s inside.

> ls doll4/
doll4.jpg  guardian

We got another doll image of course…

Now let’s check this other file.

> file guardian
guardian: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, for GNU/Linux 2.6.32, BuildID[sha1]=58fe99bb338995b496be0f71f1c5c305e25c2066, not stripped

A not-stripped ELF binary, cool! Seems like this will be a reversing problem. Let’s run it to see what happens. (This is a ctf run by a university, so I’m not too worried about this being malicious lol.)

> chmod +x guardian && ./guardian
Here is the password:
0u1u111t1u1u0t05020

Uh oh, it looks like the password got severely damaged somehow. I am afraid there is nothing I can do to fix it.

Alright, this doesn’t make too much sense, so let’s pop this in Cutter. I am using the Ghidra plugin for Cutter here, so we can see the decompilation.

Using Cutter we can go to main in the functions tab. Let’s take a look at the decompiled contents…

void main(int argc, char **argv)
{
    char **var_38h;
    int var_2ch;
    char *s;
    int64_t var_20h;
    int64_t var_18h;
    
    setbuf(_stdout, 0);
    setbuf(_stdin, 0);
    setbuf(_stderr, 0);
    s = (char *)0x0;
    var_20h = 0;
    var_18h._0_4_ = 0;
    genesis((int64_t)&s);
    puts("Here is the password:");
    exitium((int64_t)&s);
    puts(&s);
    sleep(2);
    puts(
        "\nUh oh, it looks like the password got severely damaged somehow. I am afraid there is nothing I can do to fix it."
        );
    return;
}

Pretty standard main from what I can tell. I see that we are passing a char array s around to two user-defined functions, genesis and exitium. exitium is called right after the Here is the password: prompt, just before s is printed with puts, so it looks like it is responsible for what ends up on screen. I think for this step we should pop this in gdb and see if we can find anything interesting. (Note: I’m using gdb+gef here.)

> gdb guardian
GNU gdb (GDB) 13.1
# hiding all the license junk

Let’s put a breakpoint at genesis, since this is where s is passed first. Again, this binary isn’t stripped, so it’s super easy to go where we want.

gef> b genesis
Breakpoint 1 at 0x400681

Breakpoint set, now let’s run it and print out some context (doing context so we don’t have to explode this whole page with irrelevant stuff).

gef> r
gef> context code threads
──────────────────────────────────────────────────────────────────────────────────────── code:x86:64 ────
     0x400678 <frame_dummy+40> jmp    0x4005f0 <register_tm_clones>
     0x40067d <genesis+0>      push   rbp
     0x40067e <genesis+1>      mov    rbp, rsp
 →   0x400681 <genesis+4>      mov    QWORD PTR [rbp-0x18], rdi
     0x400685 <genesis+8>      mov    DWORD PTR [rbp-0x4], 0x0
     0x40068c <genesis+15>     jmp    0x4006f4 <genesis+119>
     0x40068e <genesis+17>     mov    eax, DWORD PTR [rbp-0x4]
     0x400691 <genesis+20>     movsxd rdx, eax
     0x400694 <genesis+23>     mov    rax, QWORD PTR [rbp-0x18]
──────────────────────────────────────────────────────────────────────────────────────────── threads ────
[0] Id 1, Name: "guardian", stopped 0x400681 in genesis (), reason: BREAKPOINT
─────────────────────────────────────────────────────────────────────────────────────────────────────────

So we are now at the start of genesis. My guess is that this function fills the contents of the char array s, so let’s get to the end of this function and see what the stack looks like.

gef> finish
gef> context stack
─────────────────────────────────────────────────────────────────────────────────────── stack ────
0x007fffffffe6d0│+0x0000: 0x007fffffffe818  →  0x007fffffffeae8  →  "/home/morgan/Downloads/doll4/guardian"$rsp
0x007fffffffe6d8│+0x0008: 0x0000000100000000
0x007fffffffe6e0│+0x0010: "queenofthenest85321"$rdi
0x007fffffffe6e8│+0x0018: "henest85321"
0x007fffffffe6f0│+0x0020: 0x00000000313233 ("321"?)
0x007fffffffe6f8│+0x0028: 0x007ffff7ffdab0  →  0x007ffff7fca000  →  0x03010102464c457f
0x007fffffffe700│+0x0030: 0x0000000000000001	 ← $rbp
0x007fffffffe708│+0x0038: 0x007ffff7dbf790  →   mov edi, eax
──────────────────────────────────────────────────────────────────────────────────────────────────

Taking a look at the stack, we can see an interesting string: queenofthenest85321. This looks like it will be the password, but let’s think about why first to confirm.

What’s important to notice here is that $rdi is pointing to this string’s position on the stack. When the address of s was passed to genesis, a pointer to s’s location on the stack was loaded into $rdi. Now that we are right outside of the function, $rdi has not been overwritten and still contains the pointer to s.

So, assuming that the exitium function breaks this string during output, we can at least try this as the password before doing any more digging.

> steghide extract -sf doll4.jpg -p "queenofthenest85321"
wrote extracted data to "linktodoll5.txt".
> cat linktodoll5.txt
https://drive.google.com/file/d/1ijPF7E1diKyVTYX5nT7wcIO_DBVUCyp9/view?usp=share_link

Great! Looks like that was the password!

Let’s follow the drive link to doll 5.

Doll 5

Opening this drive link we get one image. It’s of the smallest baby doll yet, and the flag!!!

  • I used tesseract to extract the text instead of typing it all out. It got a couple of characters wrong, but after fixing those I submitted this as the flag and got the points!

This was a great challenge for a beginner’s ctf. It went over so many different categories and put them all into one challenge. At first I thought this chal was 400 points (highest in the ctf) due to difficulty, but I think it was partially because of the sheer time you have to spend solving this one. Applause to the writer of Matryoshka and Texsaw for a great ctf :D
