RC4 Encryption: From Cryptographic Theory to Practical Implementation

RC4: From Basic XOR to Stream Cryptography

In previous posts, we explored XOR, our first ally in payload obfuscation. It’s simple, fast, and does the job. But here’s the problem:

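XOR takes a key and repeats it. Wherever the payload contains runs of identical bytes (zero padding, for example), the key itself shows through in the output, and the same payload always produces the same repeating patterns. AVs from 15 years ago could already detect it.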

Enter RC4: the bigger brother of XOR. During the 90s and 2000s, RC4 was the de facto standard in browsers, TLS, and enterprise systems. Today it’s prohibited in TLS (RFC 7465) and considered cryptographically weak, but for our purpose, evading static signatures, it remains far more effective than repeating-key XOR.

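Why? Because RC4 never XORs your data with the key directly. It expands the key into a long pseudorandom keystream with no key-length period, so the encrypted payload shows none of the repeating patterns that give XOR away. The keystream is deterministic, though: the same key always produces the same keystream.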

In this tutorial, you’ll learn three ways to implement RC4: a pure Rust implementation (maximum control) and two native Windows APIs. Choose your path.

What is RC4? Understanding the Algorithm

Did you know…?
RC4 was designed in 1987 by Ron Rivest as a simple and fast stream cipher. For years it was the de facto standard in browsers and protocols like TLS/SSL, and it was also the cipher behind WEP. However, in 2015 the IETF prohibited RC4 cipher suites in every version of TLS (RFC 7465) because of practical attacks on its biased keystream, and TLS 1.3 does not include it at all. Today it’s considered cryptographically weak… but for payload obfuscation it remains effective against static signatures.

The Concept: A Pseudorandom Sequence Generator

Here’s the twist: XOR repeats the key. RC4 is completely different.

RC4 generates a sequence of bytes that APPEARS random, but is reproducible if you know the key. Then, it XORs your payload with that sequence.

Let’s visualize it:

Your key: "malghost123"
                ↓
    RC4 KSA (Key Scheduling)
         [mixes the key]
                ↓
    RC4 PRGA (Byte Generation)
                ↓
Generated sequence: 0x7E, 0xC2, 0x5A, 0x91, 0x23, 0xD4...  (example values)
                ↓
Your payload:       0x90, 0xCC, 0x44, 0x2E, 0xAA, 0x3F...
                ↓
XOR result:         0xEE, 0x0E, 0x1E, 0xBF, 0x89, 0xEB...

Here’s the magic: The sequence 0x7E, 0xC2, 0x5A… has no short repeating period. Individual byte values can recur, but each byte is generated dynamically from the evolving S-box rather than read from a repeated key. So if you encrypt the same payload twice with the same key, you get the SAME output (reproducible). But from the outside, it looks completely random.

With XOR: Same key = same repeated bytes = detectable patterns.

With RC4: Same key = one long keystream that looks random = no key-length patterns for a static signature to match.
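To make the XOR half of that comparison concrete, here is a minimal sketch of repeating-key XOR applied to a run of zero bytes. The helper name xor_repeat and the 4-byte key are made up for this illustration; they are not part of the code used later in this post.

```rust
/// XORs `data` with `key`, repeating the key as many times as needed.
/// (Hypothetical helper, only for this illustration.)
fn xor_repeat(key: &[u8], data: &[u8]) -> Vec<u8> {
    assert!(!key.is_empty(), "key must not be empty");
    data.iter()
        .zip(key.iter().cycle())
        .map(|(d, k)| d ^ k)
        .collect()
}

fn main() {
    // Binary formats are full of zero padding, and 0x00 ^ k == k.
    let zeros = [0u8; 12];
    let out = xor_repeat(b"abcd", &zeros);

    // The "ciphertext" is simply the key repeated three times.
    println!("{}", String::from_utf8_lossy(&out)); // prints "abcdabcdabcd"
}
```

Any region of identical bytes in the input produces the same kind of periodic output, which is exactly the pattern described above.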

The Honest Truth: Real Limitations

RC4 isn’t magic, it has limitations you must understand:

⚠️ RC4 is deprecated - The IETF prohibited it in TLS (RFC 7465, 2015) due to serious cryptographic vulnerabilities.
⚠️ The payload is exposed in memory - Eventually, when you execute it, it will appear unencrypted.
⚠️ EDR + Behavioral Analysis - Modern EDRs detect malicious behavior, not just signatures.

But here’s the key: RC4 EVADES static signature-based detections. It’s excellent for bypassing the first step (file analysis).

To maximize its effectiveness, combine RC4 with:

🔑 Dynamically generated keys
🛡️ Anti-debugging and anti-VM
👻 Behavior that appears legitimate
🔍 Less obvious APIs for injection

RC4 Architecture: Two Critical Components

RC4 is simple in concept but effective in practice. It works in two phases:

KSA (Key Scheduling Algorithm) - Initial setup
PRGA (Pseudo-Random Generation Algorithm) - Byte generation

Let’s explore each:

Phase 1: KSA (Key Scheduling Algorithm)

Think of it this way: you have an array of 256 bytes [0, 1, 2, 3, ..., 255]. The goal of KSA is to completely shuffle this array based on your key, so the output appears random.

Why? Because the next phase (PRGA) will use this “shuffled” array to generate sequence bytes. Without this shuffle, encryption would be predictable.

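fn rc4_init(context: &mut Rc4Context, key: &[u8]) {
    // An empty key would make `i % key.len()` panic with a division by zero
    assert!(!key.is_empty(), "RC4 key must not be empty");
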
    // Initialize S-box with values 0-255
    for i in 0..256 {
        context.s[i] = i as u8;
    }

    // Permute S using the key
    let mut j: u8 = 0;
    for i in 0..256 {
        j = j.wrapping_add(context.s[i]).wrapping_add(key[i % key.len()]);

        // Swap values
        context.s.swap(i, j as usize);
    }

    context.i = 0;
    context.j = 0;
}

What happens: The original S-box [0, 1, 2, 3, ..., 255] transforms into something like [0xAB, 0x3D, 0xF2, ..., 0x5C] based on your key.
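One property worth checking for yourself: KSA only ever swaps entries, so the S-box stays a permutation of 0..=255 no matter what key you feed it. The sketch below is a standalone variant of the key schedule that returns the S-box directly instead of filling an Rc4Context; the names ksa and is_permutation are chosen for this example only.

```rust
/// Runs the RC4 key schedule and returns the resulting S-box.
/// (Standalone variant for illustration; it does not use Rc4Context.)
fn ksa(key: &[u8]) -> [u8; 256] {
    assert!(!key.is_empty(), "RC4 key must not be empty");

    // Start from the identity permutation [0, 1, 2, ..., 255]
    let mut s = [0u8; 256];
    for (i, v) in s.iter_mut().enumerate() {
        *v = i as u8;
    }

    // Shuffle it with swaps driven by the key bytes
    let mut j: u8 = 0;
    for i in 0..256 {
        j = j.wrapping_add(s[i]).wrapping_add(key[i % key.len()]);
        s.swap(i, j as usize);
    }
    s
}

/// Returns true if every value 0..=255 appears exactly once.
fn is_permutation(s: &[u8; 256]) -> bool {
    let mut seen = [false; 256];
    for &v in s.iter() {
        if seen[v as usize] {
            return false;
        }
        seen[v as usize] = true;
    }
    true
}

fn main() {
    let s = ksa(b"malghost123");
    println!("still a permutation: {}", is_permutation(&s));
    println!("same key, same S-box: {}", s == ksa(b"malghost123"));
}
```

Because the shuffle depends only on the key, running it twice with the same key yields the same S-box, which is what makes decryption possible later.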

Phase 2: PRGA (Pseudo-Random Generation Algorithm)

This is where the magic happens. We use the “shuffled” S-box from KSA to generate a stream of bytes that appear random but are reproducible.

In each iteration:

We move two indices (i, j) within the S-box
We swap values in the S-box
We extract a byte from the sequence
We XOR that byte with your payload

The result is a unique sequence that appears random but is identical if you repeat the process with the same key.

fn rc4_cipher(context: &mut Rc4Context, input: &[u8], output: &mut [u8]) {
    assert_eq!(input.len(), output.len());

    for (in_byte, out_byte) in input.iter().zip(output.iter_mut()) {
        // Generate the next pseudorandom index
        context.i = context.i.wrapping_add(1);
        context.j = context.j.wrapping_add(context.s[context.i as usize]);

        // Swap values in the S-box
        context.s.swap(context.i as usize, context.j as usize);

        // Generate sequence byte and XOR
        let k_index = context.s[context.i as usize].wrapping_add(context.s[context.j as usize]);
        *out_byte = in_byte ^ context.s[k_index as usize];
    }
}

What happens: For each byte of the payload, the S-box is modified pseudorandomly and an “unpredictable” byte is generated that XORs with your data.
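Before trusting any RC4 implementation, it is worth checking it against a known test vector. The sketch below folds KSA and PRGA into a single one-shot function (the name rc4 and its signature are choices made for this example, not the API used in the rest of the post) and verifies the widely published vector: key "Key", plaintext "Plaintext", ciphertext BB F3 16 E8 D9 40 AF 0A D3.

```rust
/// One-shot RC4: key schedule followed by keystream XOR.
/// (Compact variant for illustration; encryption and decryption are the same call.)
fn rc4(key: &[u8], data: &[u8]) -> Vec<u8> {
    assert!(!key.is_empty(), "RC4 key must not be empty");

    // KSA: shuffle the identity permutation using the key
    let mut s = [0u8; 256];
    for (i, v) in s.iter_mut().enumerate() {
        *v = i as u8;
    }
    let mut j: u8 = 0;
    for i in 0..256 {
        j = j.wrapping_add(s[i]).wrapping_add(key[i % key.len()]);
        s.swap(i, j as usize);
    }

    // PRGA: produce one keystream byte per input byte and XOR it in
    let (mut i, mut j) = (0u8, 0u8);
    data.iter()
        .map(|&byte| {
            i = i.wrapping_add(1);
            j = j.wrapping_add(s[i as usize]);
            s.swap(i as usize, j as usize);
            let k = s[s[i as usize].wrapping_add(s[j as usize]) as usize];
            byte ^ k
        })
        .collect()
}

fn main() {
    let ciphertext = rc4(b"Key", b"Plaintext");
    assert_eq!(
        ciphertext,
        [0xBB, 0xF3, 0x16, 0xE8, 0xD9, 0x40, 0xAF, 0x0A, 0xD3]
    );

    // Running the same function again with the same key restores the input
    assert_eq!(rc4(b"Key", &ciphertext), b"Plaintext");
    println!("test vector OK");
}
```

If your own implementation disagrees with this vector, the usual suspects are an off-by-one in the i index or a missing wrap at 256.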

Method 1: Pure Rust Implementation

Why choose this method?

🎓 It’s the most educational, you understand how RC4 really works
🔧 Complete control over every step
🖥️ Works on any platform (Windows, Linux, macOS)
🛡️ Memory safety guaranteed by Rust (no buffer overflows)
⚠️ It’s the most “obvious” for static analysis (signatures recognize this implementation)

This is the most straightforward way: implement RC4 from scratch in pure Rust.

Structure and Base Modules

#[derive(Clone)]
struct Rc4Context {
    i: u8,
    j: u8,
    s: [u8; 256],
}

impl Rc4Context {
    fn new() -> Self {
        Rc4Context {
            i: 0,
            j: 0,
            s: [0; 256],
        }
    }
}

Context Initialization (KSA)

/// Initializes the RC4 context with the provided key
///
/// Parameters:
///  - context: Mutable RC4 context
///  - key: The encryption key
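fn rc4_init(context: &mut Rc4Context, key: &[u8]) {
    // An empty key would make `i % key.len()` panic with a division by zero
    assert!(!key.is_empty(), "RC4 key must not be empty");
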
    // Initialize S-box: [0, 1, 2, ..., 255]
    for i in 0..256 {
        context.s[i] = i as u8;
    }

    // KSA permutation: shuffles S-box using the key
    let mut j: u8 = 0;
    for i in 0..256 {
        j = j.wrapping_add(context.s[i]).wrapping_add(key[i % key.len()]);
        context.s.swap(i, j as usize);
    }

    context.i = 0;
    context.j = 0;
}

Encryption/Decryption (PRGA)

/// Encrypts or decrypts data using RC4
///
/// Parameters:
///  - context: Already initialized RC4 context
///  - input: Input data (plaintext or ciphertext)
///  - output: Mutable output buffer
fn rc4_cipher(context: &mut Rc4Context, input: &[u8], output: &mut [u8]) {
    assert_eq!(input.len(), output.len(), "Input and output must have the same size");

    for (in_byte, out_byte) in input.iter().zip(output.iter_mut()) {
        // Generate the next pseudorandom index
        context.i = context.i.wrapping_add(1);
        context.j = context.j.wrapping_add(context.s[context.i as usize]);

        // Swap elements in the S-box
        context.s.swap(context.i as usize, context.j as usize);

        // Generate sequence byte and XOR
        let k_index = context.s[context.i as usize].wrapping_add(context.s[context.j as usize]);
        *out_byte = in_byte ^ context.s[k_index as usize];
    }
}

Complete Program: Interactive Demonstration

Here we use the Rc4Context, rc4_init and rc4_cipher functions defined in the previous sections. The main() demonstrates the complete flow:

use std::io;

// Use the definitions of Rc4Context, rc4_init and rc4_cipher from the previous sections

fn main() {
    // Example payload (binary data simulating machine code)
    let payload = vec![
        0x55, 0x89, 0xE5, 0x83, 0xEC, 0x10, 0xC7, 0x45, 0xF8, 0x00, 0x00, 0x00, 0x00,
        0x8B, 0x45, 0xF8, 0x83, 0xC0, 0x01, 0x89, 0x45, 0xF8, 0x8B, 0x45, 0xF8
    ];

    let rc4_key = b"malghost123";
    let mut encrypted = vec![0u8; payload.len()];
    let mut decrypted = vec![0u8; payload.len()];

    println!("[*] RC4 Encryption/Decryption Demonstration");
    println!("[*] Payload size: {} bytes\n", payload.len());

    // Show original
    print!("[+] ORIGINAL PAYLOAD:\n    ");
    for byte in &payload {
        print!("{:02x} ", byte);
    }
    println!("\n\nPress ENTER to continue...");
    let mut input = String::new();
    io::stdin().read_line(&mut input).unwrap();

    // STEP 1: Encryption
    println!("\n[*] Encrypting with RC4...");

    let mut ctx1 = Rc4Context::new();
    rc4_init(&mut ctx1, rc4_key);
    rc4_cipher(&mut ctx1, &payload, &mut encrypted);

    print!("[+] ENCRYPTED PAYLOAD:\n    ");
    for byte in &encrypted {
        print!("{:02x} ", byte);
    }
    println!("\n\nPress ENTER to continue...");
    let mut input = String::new();
    io::stdin().read_line(&mut input).unwrap();

    // STEP 2: Decryption
    println!("\n[*] Decrypting with RC4...");

    let mut ctx2 = Rc4Context::new();
    rc4_init(&mut ctx2, rc4_key);
    rc4_cipher(&mut ctx2, &encrypted, &mut decrypted);

    print!("[+] DECRYPTED PAYLOAD:\n    ");
    for byte in &decrypted {
        print!("{:02x} ", byte);
    }
    println!();

    // Verification
    if payload == decrypted {
        println!("\n[✓] Verification SUCCESSFUL: Decrypted matches original");
    } else {
        println!("\n[!] Verification FAILED: Data corrupted");
    }

    println!("\n\nPress ENTER to exit...");
    let mut input = String::new();
    io::stdin().read_line(&mut input).unwrap();
}

Method 2: SystemFunction032 (Native Windows API)

Did you know Windows already has RC4 built-in? There’s a little-known API called SystemFunction032 that implements RC4 natively.

Why choose this method?

🛡️ Uses official Windows cryptographic code
⚡ Faster native performance
👻 Fewer static signatures
⚠️ Undocumented API. Microsoft doesn’t guarantee future compatibility

The tradeoff: It’s an “undocumented” API, so it could change. But it’s been available since Windows XP and still works on Windows 11.

Cargo.toml

To use Windows APIs from Rust, you need the windows crate (published by Microsoft’s windows-rs project):

[package]
name = "rc4_systemfunction032"
version = "0.1.0"
edition = "2024"

[dependencies]
windows = { version = "0.51", features = [
    "Win32_Foundation",
    "Win32_System_LibraryLoader",
] }

Defining SystemFunction032

SystemFunction032 function signature in C:

// Function declaration (exported from Advapi32.dll)
NTSTATUS WINAPI SystemFunction032(
    struct USTRING *Data,       // Pointer to structure with data to encrypt/decrypt
    const struct USTRING *Key   // Pointer to structure with RC4 key
);

Parameters:

Data: USTRING structure pointing to your data. Important: modified IN-PLACE (encryption overwrites original data)
Key: USTRING structure pointing to your RC4 key
Return: NTSTATUS (0 = success, negative values = error)
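The layout of that structure matters, because the function reads it as raw memory. As a rough, platform-independent sketch (no Windows call involved), this is how the structure can be mirrored in Rust and how its size can be checked. The names Ustring and describe are invented for this example, and the field layout is the one assumed throughout this post: two 32-bit lengths followed by a pointer.

```rust
use std::mem::size_of;

/// Mirror of the USTRING layout assumed in this post:
/// two 32-bit lengths followed by a pointer to the bytes.
#[repr(C)]
struct Ustring {
    length: u32,
    maximum_length: u32,
    buffer: *mut u8,
}

/// Describes `buf` as a Ustring. Returns None if the buffer is too
/// large for the 32-bit length fields instead of silently truncating.
fn describe(buf: &mut [u8]) -> Option<Ustring> {
    if buf.len() > u32::MAX as usize {
        return None;
    }
    let len = buf.len() as u32;
    Some(Ustring {
        length: len,
        maximum_length: len,
        buffer: buf.as_mut_ptr(),
    })
}

fn main() {
    let mut data = *b"hello";
    let u = describe(&mut data).expect("buffer fits in 32 bits");

    // 16 bytes on 64-bit targets, 12 bytes on 32-bit targets
    println!("struct size: {} bytes", size_of::<Ustring>());
    println!(
        "length = {}, maximum_length = {}, null buffer = {}",
        u.length,
        u.maximum_length,
        u.buffer.is_null()
    );
}
```

The explicit size check is a design choice for this sketch: a plain `as u32` cast would silently truncate the length of a buffer larger than 4 GiB.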

Using SystemFunction032

use windows::core::PCSTR;
use windows::Win32::System::LibraryLoader::{LoadLibraryA, GetProcAddress};

#[repr(C)]
struct USTRING {
    length: u32,
    maximum_length: u32,
    buffer: *mut u8,
}

type SystemFunction032Fn = unsafe extern "system" fn(
    *mut USTRING,
    *mut USTRING,
) -> i32;

fn rc4_encryption_via_systemfunc032(
    rc4_key: &[u8],
    payload_data: &mut [u8],
) -> bool {
    unsafe {
        let advapi32 = match LoadLibraryA(PCSTR(b"Advapi32.dll\0".as_ptr())) {
            Ok(lib) => lib,
            Err(_) => {
                println!("[!] Failed to load Advapi32");
                return false;
            }
        };

        if advapi32.is_invalid() {
            println!("[!] Invalid Advapi32 handle");
            return false;
        }

        let system_function_032 = GetProcAddress(
            advapi32,
            PCSTR(b"SystemFunction032\0".as_ptr()),
        );

        if system_function_032.is_none() {
            println!("[!] Failed to get SystemFunction032");
            return false;
        }

        let mut data = USTRING {
            length: payload_data.len() as u32,
            maximum_length: payload_data.len() as u32,
            buffer: payload_data.as_mut_ptr(),
        };

        let mut key = USTRING {
            length: rc4_key.len() as u32,
            maximum_length: rc4_key.len() as u32,
            buffer: rc4_key.as_ptr() as *mut u8,
        };

        let func: SystemFunction032Fn = std::mem::transmute(system_function_032.unwrap());
        let status = func(&mut data, &mut key);

        if status != 0 {
            println!("[!] SystemFunction032 failed: 0x{:08X}", status);
            return false;
        }

        println!("[+] RC4 encryption successful with SystemFunction032");
        true
    }
}

Complete Program: Demonstration

Here we use the rc4_encryption_via_systemfunc032 function defined in the previous section. The main() shows the complete flow:

// Reuses the imports and the definitions of USTRING, SystemFunction032Fn and
// rc4_encryption_via_systemfunc032 from the previous section. If everything lives
// in one file, do not repeat the `use` lines: duplicate imports fail to compile.

fn main() {
    let mut payload = vec![
        0x4D, 0x5A, 0x90, 0x00, 0x03, 0x00, 0x00, 0x00,
        0x04, 0x00, 0x00, 0x00, 0xFF, 0xFF, 0x00, 0x00,
        0xB8, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00
    ];

    let rc4_key = b"malghost123";

    println!("[*] RC4 encryption using SystemFunction032");
    println!("[+] Size: {} bytes\n", payload.len());

    // Show original
    print!("[*] ORIGINAL:\n    ");
    for byte in &payload {
        print!("{:02x} ", byte);
    }
    println!("\n");

    // Encrypt
    if !rc4_encryption_via_systemfunc032(rc4_key, &mut payload) {
        println!("[!] Encryption error");
        return;
    }

    print!("[+] ENCRYPTED:\n    ");
    for byte in &payload {
        print!("{:02x} ", byte);
    }
    println!("\n");

    // Decrypt (RC4 is symmetric: the same call with the same key reverses the encryption)
    if !rc4_encryption_via_systemfunc032(rc4_key, &mut payload) {
        println!("[!] Decryption error");
        return;
    }

    print!("[+] DECRYPTED:\n    ");
    for byte in &payload {
        print!("{:02x} ", byte);
    }
    println!();
}

Compilation

# Compile for Windows x86_64
cargo build --release

# Run
cargo run --release

Method 3: SystemFunction033 (Alternative Native)

If SystemFunction032 is patched or detected, there’s a twin sister: SystemFunction033. It implements the same RC4 as SystemFunction032.

Defining SystemFunction033

The signature of SystemFunction033 is identical to SystemFunction032:

// Function declaration (exported from Advapi32.dll)
NTSTATUS WINAPI SystemFunction033(
    struct USTRING *Data,       // Pointer to structure with data to encrypt/decrypt
    const struct USTRING *Key   // Pointer to structure with RC4 key
);

What’s the difference then?

Both functions have the same signature and use the same USTRING structure. The internal difference is in how Windows implements each one, but for the external user, they’re interchangeable.

SystemFunction033 Implementation

use windows::core::PCSTR;
use windows::Win32::System::LibraryLoader::{LoadLibraryA, GetProcAddress};

#[repr(C)]
struct USTRING {
    length: u32,
    maximum_length: u32,
    buffer: *mut u8,
}

type SystemFunction033Fn = unsafe extern "system" fn(
    *mut USTRING,
    *mut USTRING,
) -> i32;

/// Encrypts/decrypts data using SystemFunction033 (native Windows RC4)
///
/// Parameters:
///  - rc4_key: RC4 encryption key
///  - payload_data: Payload buffer (mutable)
fn rc4_encryption_via_systemfunc033(
    rc4_key: &[u8],
    payload_data: &mut [u8],
) -> bool {
    unsafe {
        // Load Advapi32.dll
        let advapi32 = match LoadLibraryA(PCSTR(b"Advapi32.dll\0".as_ptr())) {
            Ok(lib) => lib,
            Err(_) => {
                println!("[!] Failed to load Advapi32");
                return false;
            }
        };

        if advapi32.is_invalid() {
            println!("[!] Invalid Advapi32 handle");
            return false;
        }

        // Get address of SystemFunction033
        let system_function_033 = GetProcAddress(
            advapi32,
            PCSTR(b"SystemFunction033\0".as_ptr()),
        );

        if system_function_033.is_none() {
            println!("[!] Failed to get SystemFunction033");
            return false;
        }

        // Create USTRING structures (declared key-first here; the call below
        // still passes data first, then key, exactly like SystemFunction032)
        let mut key = USTRING {
            length: rc4_key.len() as u32,
            maximum_length: rc4_key.len() as u32,
            buffer: rc4_key.as_ptr() as *mut u8,
        };

        let mut data = USTRING {
            length: payload_data.len() as u32,
            maximum_length: payload_data.len() as u32,
            buffer: payload_data.as_mut_ptr(),
        };

        // Call SystemFunction033
        let func: SystemFunction033Fn = std::mem::transmute(system_function_033.unwrap());
        let status = func(&mut data, &mut key);

        if status != 0 {
            println!("[!] SystemFunction033 failed: 0x{:08X}", status);
            return false;
        }

        println!("[+] RC4 encryption successful with SystemFunction033");
        true
    }
}

Security Best Practices

1. Never Store the Key in Plain Text

If you write the key directly in the code, any static analysis (strings, objdump) will find it.

❌ This is instant death:

let key = b"malghost123";  // strings binary would show "malghost123"

✅ Better alternatives:

// Option 1: Individual bytes (makes searching harder)
let key = [b'm', b'a', b'l', b'g', b'h', b'o', b's', b't', b'1', b'2', b'3'];

// Option 2: Generate dynamically at runtime with rand
use rand::Rng;

let mut rng = rand::thread_rng();
let mut key: [u8; 32] = [0; 32];
rng.fill(&mut key);  // Doesn't depend on hardcoded data

// Option 3: Encoded key + decryption
let encoded_key = [0x45u8, 0x67, 0x89];
let mut key = [0u8; 32];
for (i, &byte) in encoded_key.iter().enumerate() {
    key[i] = byte ^ 0xAA;  // Decrypts using simple XOR
}

Pro tip: Combine dynamic generation with environment data. For example, use the hash of the MAC address as part of the key. This way, the key is unique per machine and doesn’t appear hardcoded.

2. Use Long Keys

How long is “long enough”?

8 bytes = absolute minimum (weak)
16 bytes = recommended (good)
32 bytes = excellent (strong)

// 32-byte key (256 bits)
let rc4_key: [u8; 32] = [
    0x6d, 0x61, 0x6c, 0x64, 0x65, 0x76, 0x31, 0x32,
    0x33, 0x2d, 0x6c, 0x6f, 0x6e, 0x67, 0x2d, 0x6b,
    0x65, 0x79, 0x2d, 0x66, 0x6f, 0x72, 0x2d, 0x72,
    0x63, 0x34, 0x2d, 0x65, 0x6e, 0x63, 0x72, 0x79
];

3. Layered Encryption (Defense in Depth)

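Apply RC4 twice with different keys. Keep expectations modest: because RC4 only XORs the data with a keystream, two layers are equivalent to a single XOR with the two keystreams combined. Layering changes the resulting bytes, but it adds little real protection when both keys ship in the same binary: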

// Two-layer encryption
let mut ctx1 = Rc4Context::new();
rc4_init(&mut ctx1, &key1);
let mut encrypted1 = vec![0u8; payload.len()];
rc4_cipher(&mut ctx1, &payload, &mut encrypted1);

let mut ctx2 = Rc4Context::new();
rc4_init(&mut ctx2, &key2);
let mut encrypted2 = vec![0u8; encrypted1.len()];
rc4_cipher(&mut ctx2, &encrypted1, &mut encrypted2);

// Decryption (reverse order shown for clarity; XOR layers commute, so any order works)
let mut ctx3 = Rc4Context::new();
rc4_init(&mut ctx3, &key2);
let mut decrypted1 = vec![0u8; encrypted2.len()];
rc4_cipher(&mut ctx3, &encrypted2, &mut decrypted1);

let mut ctx4 = Rc4Context::new();
rc4_init(&mut ctx4, &key1);
let mut decrypted2 = vec![0u8; decrypted1.len()];
rc4_cipher(&mut ctx4, &decrypted1, &mut decrypted2);
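It helps to see what two RC4 layers amount to mathematically. The sketch below uses a compact one-shot rc4 function and a small xor_bytes helper (both written for this illustration, with made-up keys) to check two facts: the order of the layers does not matter, and the double-encrypted result equals the payload XORed with the two keystreams combined.

```rust
/// One-shot RC4 (compact variant written for this illustration).
fn rc4(key: &[u8], data: &[u8]) -> Vec<u8> {
    assert!(!key.is_empty(), "RC4 key must not be empty");
    let mut s = [0u8; 256];
    for (i, v) in s.iter_mut().enumerate() {
        *v = i as u8;
    }
    let mut j: u8 = 0;
    for i in 0..256 {
        j = j.wrapping_add(s[i]).wrapping_add(key[i % key.len()]);
        s.swap(i, j as usize);
    }
    let (mut i, mut j) = (0u8, 0u8);
    data.iter()
        .map(|&byte| {
            i = i.wrapping_add(1);
            j = j.wrapping_add(s[i as usize]);
            s.swap(i as usize, j as usize);
            byte ^ s[s[i as usize].wrapping_add(s[j as usize]) as usize]
        })
        .collect()
}

/// XORs two equally long byte slices.
fn xor_bytes(a: &[u8], b: &[u8]) -> Vec<u8> {
    assert_eq!(a.len(), b.len());
    a.iter().zip(b.iter()).map(|(x, y)| x ^ y).collect()
}

fn main() {
    let payload = b"example data".to_vec();
    let (key1, key2) = (b"first-key", b"second-key");

    // Layer order does not matter: both results are identical
    let one_then_two = rc4(key2, &rc4(key1, &payload));
    let two_then_one = rc4(key1, &rc4(key2, &payload));
    assert_eq!(one_then_two, two_then_one);

    // Encrypting zeros exposes each raw keystream
    let zeros = vec![0u8; payload.len()];
    let combined = xor_bytes(&rc4(key1, &zeros), &rc4(key2, &zeros));

    // Two layers == one XOR with the combined keystream
    assert_eq!(one_then_two, xor_bytes(&payload, &combined));
    println!("two RC4 layers collapse into a single XOR");
}
```

In other words, the second layer behaves like a different key for the same cipher rather than an independent barrier.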

Limitations and Reality

Being completely honest: RC4 isn’t a silver bullet. It has real limitations you must understand.

What RC4 DOES do well

✅ Evades static signatures - Signatures written for the plaintext payload no longer match the encrypted bytes.
✅ Avoids superficial analysis - Quick analysis won’t reveal the payload.
✅ It’s fast and simple - Low CPU and memory usage.

What RC4 CANNOT do

❌ Doesn’t protect against deep analysis - A malware researcher doesn’t need to break RC4: the key ships with the binary and can be recovered in minutes.
❌ The payload is exposed in memory - When you execute it, it appears unencrypted (inevitably).
❌ EDR is smarter now - Modern EDRs detect behavior, not just signatures.

RC4 is a first line of defense. It stops automatic static detections. But if a human analyst looks at your binary + behavior, RC4 is just a small obstacle.

Summary: What We’ve Learned

In this tutorial on RC4 encryption, we covered:

✅ Stream cipher theory - How RC4 generates sequences that appear random
✅ RC4 architecture - KSA (shuffling) and PRGA (generation) explained clearly
✅ Three ways to implement RC4 in Rust: Pure, SystemFunction032 and SystemFunction033
✅ Security best practices - Dynamic keys, long keys, layered encryption
✅ Real limitations - What RC4 protects and what it doesn’t

The Final Reflection: Uncomfortable Truths

RC4 is considerably more robust than XOR. It evades static signatures that would stop most payloads. But it’s NOT a silver bullet. If you think RC4 alone makes you invisible, you’re wrong. EDRs, malware analysts, and modern defense systems are MUCH more sophisticated.

RC4 is the first step. But if someone really wants to analyze you, they don’t need to break RC4 at all: they recover the key from the binary or dump the decrypted payload from memory in minutes.

Hope this tutorial helped you understand RC4 not as “magic”, but as what it really is: a well-designed algorithm that generates pseudorandom sequences for payload obfuscation.

Ready for the next level? Explore anti-debugging, anti-VM, or advanced injection techniques in the next MalGhost posts. 🔓