PHDays AICTF 2025 Writeup

I got 8th place in PHDays AICTF 2025. This is a writeup for it.

Floor Check

We are given a single parquet file. Looking at the table, the rows and columns visually form the characters of the flag.

D .mode box
D select * from read_parquet('miccheck_a8749ce.parquet');
┌───┬───┬───┬───┬───┬───┬───┬───┬───┬───┐
│ 0 │ 1 │ 2 │ 3 │ 4 │ 5 │ 6 │ 7 │ 8 │ 9 │
├───┼───┼───┼───┼───┼───┼───┼───┼───┼───┤
│ - │ - │ - │ - │ - │ 8 │ - │ - │ - │ - │
│ - │ - │ - │ - │ - │ 8 │ - │ - │ - │ - │
│ - │ - │ - │ - │ 8 │ - │ 8 │ - │ - │ - │
│ - │ - │ - │ - │ 8 │ - │ 8 │ - │ - │ - │
│ - │ - │ - │ 8 │ - │ - │ - │ 8 │ - │ - │
│ - │ - │ - │ 8 │ - │ - │ - │ 8 │ - │ - │
│ - │ - │ 8 │ 8 │ 8 │ 8 │ 8 │ 8 │ 8 │ - │
│ - │ - │ 8 │ - │ - │ - │ - │ - │ 8 │ - │
│ - │ 8 │ - │ - │ - │ - │ - │ - │ - │ 8 │
│ - │ 8 │ - │ - │ - │ - │ - │ - │ - │ 8 │
│ - │ - │ - │ - │ - │ - │ - │ - │ - │ - │
│ - │ - │ - │ - │ - │ - │ - │ - │ - │ - │
│ - │ - │ - │ - │ - │ - │ - │ - │ - │ - │
│ - │ - │ - │ - │ - │ - │ - │ - │ - │ - │
│ - │ - │ - │ - │ - │ - │ - │ - │ - │ - │
│ - │ - │ 8 │ - │ - │ - │ - │ - │ - │ - │
│ - │ - │ 8 │ - │ - │ - │ - │ - │ - │ - │
│ - │ - │ 8 │ - │ - │ - │ - │ - │ - │ - │
│ - │ - │ 8 │ - │ - │ - │ - │ - │ - │ - │
│ - │ - │ 8 │ - │ - │ - │ - │ - │ - │ - │
... etc.
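
The box borders make the shapes harder to read than they need to be. Here is a minimal sketch for printing the grid compactly, assuming the rows come back as tuples of single-character strings with "-" as the background. render_rows is a hypothetical helper name, and the sample rows are typed in by hand from the table above:

```python
def render_rows(rows, background="-"):
    """Join each row's cells into one line, blanking out background cells."""
    lines = []
    for row in rows:
        cells = [" " if str(cell) == background else str(cell) for cell in row]
        lines.append("".join(cells).rstrip())
    return "\n".join(lines)


# First four rows of the table above, copied by hand for illustration.
sample = [
    tuple("-----8----"),
    tuple("-----8----"),
    tuple("----8-8---"),
    tuple("----8-8---"),
]
print(render_rows(sample))
```

With the real file, the rows could instead come from something like duckdb.sql("select * from read_parquet('miccheck_a8749ce.parquet')").fetchall().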

VoiceGuard

We are given a reference sample of someone's voice and asked to clone it to speak new text. The premise is security verification. I ran the IndexTTS model locally and passed this easily.

Athrad Edhellen

This was a frustrating one. You can see the final script in action on YouTube. You have to solve 100 captchas, each showing 6 Tengwar number symbols to decode. The code I wrote to automate this is shit: basically a bunch of heuristics, plus more heuristics to override the first ones. The challenge used 6 fonts, and for each one I gathered examples of the 12 symbols. Then I just "slide" these representative examples across each new captcha and look for matches (the image is thresholded first to remove noise). Basic Selenium code handles the browser side, so the whole captcha-solving process is automated.
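
The core "threshold, then slide" step can be sketched in a few lines. This is not my actual script (that one is mostly heuristics), just a minimal illustration under some assumptions: images are plain lists of grayscale rows, templates are already binarized and have the same height as the captcha, and all the names (threshold, match_score, best_match, the toy templates) are made up for the example.

```python
def threshold(image, cutoff=128):
    """Binarize a grayscale image (rows of 0-255 ints): 1 = ink, 0 = background."""
    return [[1 if px < cutoff else 0 for px in row] for row in image]


def match_score(binary, template, x):
    """Fraction of template pixels that agree with the image at column offset x."""
    hits = total = 0
    for img_row, tpl_row in zip(binary, template):
        for dx, tpl_px in enumerate(tpl_row):
            hits += img_row[x + dx] == tpl_px
            total += 1
    return hits / total


def best_match(binary, templates):
    """Slide every template across the image; return (score, label, x) of the best fit."""
    best = (0.0, None, -1)
    width = len(binary[0])
    for label, template in templates.items():
        tpl_width = len(template[0])
        for x in range(width - tpl_width + 1):
            score = match_score(binary, template, x)
            if score > best[0]:
                best = (score, label, x)
    return best


# Toy 3x3 "symbols" standing in for the real Tengwar examples.
templates = {
    "a": [[1, 0, 1], [0, 1, 0], [1, 0, 1]],
    "b": [[1, 1, 1], [1, 0, 1], [1, 1, 1]],
}
# Toy 3x8 captcha: white background (255) with symbol "b" drawn at column 4.
captcha = [
    [255, 255, 255, 255, 0, 0, 0, 255],
    [255, 255, 255, 255, 0, 255, 0, 255],
    [255, 255, 255, 255, 0, 0, 0, 255],
]
result = best_match(threshold(captcha), templates)
print(result)
```

To read all 6 symbols you would keep the strong matches and order them by x instead of taking only the single best one, which is where the heuristics start piling up.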

Ham Filter

We are allowed to poison a spam-detection model with 100 training samples. Our goal is to get the test score of the model below 60%. We are also given access to the entire training data. My idea was to take the spam messages, generate my own data based on them, and wrongly label the generated messages as not spam. First, we can get the spam messages like so:

D .mode json
D select message from read_parquet('train_data_chocobo.parquet') where label=1;
[{"message":"-PLS STOP bootydelious (32/F) is inviting you to be her friend. Reply YES-434 or NO-434 See her: www.SMS.ac/u/bootydelious STOP? Send STOP FRND to 62468"},
{"message":"network operator. The service is free. For T & C's visit 80488.biz"},
{"message":"XMAS Prize draws! We are trying to contact U. Todays draw shows that you have won a å£2000 prize GUARANTEED. Call 09058094565 from land line. Valid 12hrs only"},
... etc.

Then we can paste this JSON into the Python script. The script takes the messages and samples new words according to how frequently each word appears. I initially set the length l to between 20 and 30 words, but that only dropped the score from 87% to 83%. I noticed that although only 100 rows were allowed, 1 MB of data was allowed. So I extended the length by a factor of 10, which dropped the test score enough to reveal the flag.

import random
from collections import Counter

import duckdb

data = [...]

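# Count how often each whitespace-separated word appears across all spam messages.
acc = []
for x in data:
    acc += x['message'].split()
c = Counter(acc)
words = list(c.keys())
ws = list(c.values())  # sampling weights, aligned with words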

con = duckdb.connect()
con.execute('CREATE TABLE foo (label BIGINT, message VARCHAR);')
for _ in range(100):  # the challenge allows at most 100 poisoned rows
    l = random.randint(200, 300)  # words per message (originally 20 to 30)
    con.execute(
        'INSERT INTO foo (label, message) VALUES (0, ?);',
        [' '.join(random.choices(words, weights=ws, k=l))],
    )
print(con.execute('SELECT * FROM foo;').fetchall())
con.execute("COPY foo TO 'my_custom_data.parquet' (FORMAT 'parquet');")
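
A quick back-of-the-envelope check shows how much room the 1 MB allowance leaves. This is only a sketch under assumptions: I am treating the limit as 1,000,000 bytes of raw message text (the real limit may apply to the Parquet file, which adds overhead and compression), the 10-character word length is a made-up worst case, and payload_bytes is a hypothetical helper.

```python
def payload_bytes(messages):
    """Total UTF-8 size of the message column, ignoring Parquet overhead and compression."""
    return sum(len(m.encode("utf-8")) for m in messages)


# Hypothetical worst case: 100 rows of 300 ten-character words, space separated.
worst_case = [" ".join(["x" * 10] * 300) for _ in range(100)]
total = payload_bytes(worst_case)
print(total, total < 1_000_000)
```

Even this pessimistic estimate stays well under the limit, so 200 to 300 words per row fits comfortably.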

Rate My Car

We upload an image to the website and the judge (a multimodal LLM) returns a score between 0 and 100 based on the car in the image. Our goal is to get a score of 1337. I used the following low-resolution image to break the LLM, which, given previous challenges, I suspected to be GPT-4o.

[Image: aictf gpt4o visual jailbreak]

Flag Leak

We are given a webpage with 1000 YouTube videos. We are told one of them contains footage of a person submitting the flag. I noticed that a lot of the videos were popular, that is, they had a lot of views, whereas a video uploaded just for the challenge would presumably have very few. So I scraped the view count of every video like so:

import os

for i, line in enumerate(
    """
[... 1000 youtube video urls here]
""".strip().splitlines()
):
    # Quote the URL so the shell does not interpret characters like ? or & in it.
    os.system(f'yt-dlp --print "%(upload_date)s.%(view_count)s" "{line}" >{i}; echo "{line}" >>{i}')

Then I sorted the results by views and, wouldn't you know it, the first video (9 views) was the correct one. It showed some Russian guy developing a ping pong app with ChatGPT, and at the end he submits the flag.

9 20250519 https://www.youtube.com/watch?v=7TE0p3kx5p0
110 20240809 https://www.youtube.com/watch?v=YK5Db_HOFC0
121 20250305 https://www.youtube.com/watch?v=fWu18blUXDo
191 20191019 https://www.youtube.com/watch?v=XCviY4qNQ_Q
1015 20221212 https://www.youtube.com/watch?v=OGmvoj5Dj_A
... etc.
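
The sorting step can be sketched like this. It assumes each scrape file holds "<upload_date>.<view_count>" on the first line and the URL on the second, as written by the loop above. parse_record and rank_by_views are hypothetical names, and this is a reconstruction rather than the exact code I ran.

```python
def parse_record(text):
    """Parse one scrape file: '<upload_date>.<view_count>' on line 1, the URL on line 2."""
    lines = text.strip().splitlines()
    if len(lines) != 2:
        return None  # yt-dlp printed nothing (or something unexpected) for this video
    date, _, views = lines[0].partition(".")
    if not views.isdigit():
        return None  # e.g. "NA" when the view count is unavailable
    return int(views), date, lines[1]


def rank_by_views(texts):
    """Return (views, upload_date, url) tuples, least viewed first."""
    records = [r for r in map(parse_record, texts) if r is not None]
    return sorted(records)


# Two records taken from the results above, in the on-disk format.
samples = [
    "20240809.110\nhttps://www.youtube.com/watch?v=YK5Db_HOFC0\n",
    "20250519.9\nhttps://www.youtube.com/watch?v=7TE0p3kx5p0\n",
]
ranked = rank_by_views(samples)
for views, date, url in ranked:
    print(views, date, url)
```

For the real run, the texts would be read from the files named 0 through 999 that the scrape loop wrote.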

Brauni

We are given an executable and need to make it output "Valid". IDA gives this code:

int __fastcall main(int argc, const char **argv, const char **envp)
{
  int v4; // [rsp+10h] [rbp-90h] BYREF
  unsigned int v5; // [rsp+14h] [rbp-8Ch]
  signed int i; // [rsp+18h] [rbp-88h]
  int DeviceCount; // [rsp+1Ch] [rbp-84h]
  unsigned int *v8; // [rsp+20h] [rbp-80h] BYREF
  __int64 v9; // [rsp+28h] [rbp-78h] BYREF
  unsigned int v10; // [rsp+30h] [rbp-70h]
  __int64 v11; // [rsp+34h] [rbp-6Ch] BYREF
  unsigned int v12; // [rsp+3Ch] [rbp-64h]
  _DWORD v13[18]; // [rsp+40h] [rbp-60h] BYREF
  unsigned __int64 v14; // [rsp+88h] [rbp-18h]

  v14 = __readfsqword(0x28u);
  if ( argc == 2 )
  {
    DeviceCount = cudaGetDeviceCount(&v4, argv, envp);
    if ( !DeviceCount && v4 )
    {
      v5 = 16;
      if ( strlen(argv[1]) < 0x10 )
        v5 = strlen(argv[1]);
      for ( i = 0; i < (int)v5; ++i )
        v13[i] = argv[1][i];
      cudaMalloc<unsigned int>(&v8, 4LL * (int)v5);
      cudaMemcpy(v8, v13, 4LL * (int)v5, 1LL);
      dim3::dim3((dim3 *)&v11, v5, 1, 1);
      dim3::dim3((dim3 *)&v9, 1, 1, 1);
      if ( !(unsigned int)_cudaPushCallConfiguration(v9, v10, v11, v12, 0LL, 0LL) )
        kernel(v8);
      cudaMemcpy(v13, v8, 4LL * (int)v5, 2LL);
      if ( v13[0] == 1 )
        puts("Valid");
      else
        puts("Fail");
      cudaFree(v8);
      return 0;
    }
    else
    {
      puts("Fail: No CUDA-capable GPU found.");
      return 1;
    }
  }
  else
  {
    puts("Usage: program <key>");
    return 0;
  }
}

So clearly all the important logic is in the kernel. We can output the PTX code.

In short, since I forget where I put the code: you can express the kernel's checks as Z3 constraints and solve for the input, and hence the flag.

Conclusion

I definitely could've completed a few more tasks, but I was tired. For one of them, you had to exploit a torch.load() vuln (pickle RCE), but I kept getting an internal server error when trying to download the files to exfiltrate the flag. Another involved a SQL injection in the semantic search feature to leak all user notes, but the notes are encrypted client-side. So I tried using vec2text to invert the embedding, and it led to very promising results. Unfortunately, they were not good enough because the IP address was garbage (I don't have an OpenAI key to improve it with num_steps).