Encryption and Decryption of text within a Video

With the rapid advancement of digital technologies and the widespread use of multimedia platforms, securing sensitive information during transmission and storage has become increasingly important. Traditional encryption techniques provide confidentiality but do not conceal the existence of data, which may attract unwanted attention. Steganography, on the other hand, hides the presence of data by embedding it within a cover medium such as an image, audio, or video file; without encryption, however, the hidden information can be read by anyone who discovers it.

In this system, the text message is first encrypted using a symmetric-key algorithm, AES (Advanced Encryption Standard), in CBC (Cipher Block Chaining) mode. AES is widely recognized for its strength and efficiency, making it well suited to securing textual data. The encrypted text, which now appears as unintelligible binary data, is then embedded into a cover video. The embedding process uses image-based steganographic techniques, typically by modifying the Least Significant Bits (LSBs) of pixel values in selected video frames. At the receiver's end, the system performs the reverse operation: it first extracts the encrypted data from the video and then decrypts it using the shared secret key to recover the original message.

This project demonstrates a practical application of multimedia security, combining the strengths of cryptography and steganography for secure and covert communication. It can be applied in fields such as military communication, watermarking, copyright protection, and secure messaging.

Ensuring the confidentiality, integrity, and invisibility of sensitive data during transmission is a major challenge. There is a critical need for a system that not only protects the content of sensitive text through encryption but also conceals its existence within a multimedia medium such as video. This project addresses these challenges by proposing a system that combines AES encryption and video-based steganography, offering a dual-layered approach to secure text communication.

Dataset Description

The dataset consists of two primary components:

1. Cover Video:

  • A single video file in .mp4 format is used as the cover medium for embedding the encrypted text.
  • The video has the following characteristics:
    • Resolution: 1920×1080 pixels (Full HD) or 1280×720 pixels (HD), depending on the available video quality.
    • Frames Per Second (FPS): 30 FPS to maintain smooth playback and adequate frame extraction capacity.
    • Duration: Varies (typically between 30 and 60 seconds) to provide sufficient frames for embedding the encrypted text.
  • The video is divided into individual frames (extracted using libraries like OpenCV) for pixel-level manipulation during the embedding process.
  • Each frame is then resized (if necessary) to ensure it matches the model or processing requirements (e.g., 256×256 pixels).

2. Secret Text File:

  • A text file that contains the plaintext message to be securely embedded into the video.
  • The text file is encrypted using AES-CBC (Advanced Encryption Standard in Cipher Block Chaining mode) with a secret key and initialization vector (IV).

Data Preprocessing

Video Preprocessing

Extraction:
The cover video is segmented into individual frames. This allows pixel-level access for embedding the encrypted text. Frame extraction is performed at the video's original frame rate (e.g., 30 frames per second).

Resizing:
Each extracted frame is resized, if necessary, to a uniform dimension (e.g., 256×256 pixels) so that all frames match the processing requirements and the embedding step operates on frames of a consistent size.

Normalization:
The pixel values of the frames are normalized to maintain consistency and improve processing efficiency during embedding.

Selection:
Depending on the size of the encrypted message, a subset of frames is selected for embedding. This selection ensures that the number of frames is sufficient to hold the encrypted binary data without loss.
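
The frame-selection step can be illustrated with a small capacity calculation. This is a minimal sketch and not the project's own code: the function name frames_needed is hypothetical, and it assumes one hidden bit per color-channel value and three channels per pixel.

```python
import math


def frames_needed(payload_bytes: int, width: int, height: int,
                  channels: int = 3, bits_per_value: int = 1) -> int:
    """Return how many frames are required to hold the encrypted payload.

    Assumes each channel value of each pixel carries `bits_per_value`
    hidden bits (1 for plain LSB embedding).
    """
    if payload_bytes < 0:
        raise ValueError("payload_bytes must be non-negative")
    payload_bits = payload_bytes * 8
    capacity_bits_per_frame = width * height * channels * bits_per_value
    return math.ceil(payload_bits / capacity_bits_per_frame)


# A 256x256 RGB frame offers 256 * 256 * 3 = 196,608 bits (24,576 bytes).
frames_for_message = frames_needed(100_000, 256, 256)
```

Under these assumptions a single 256×256 frame holds 24,576 bytes, so even short videos provide far more capacity than a typical text message needs.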

Algorithms Used:

This project integrates two key technologies, cryptography and steganography, to securely hide a secret message inside a video. The following algorithms are employed:

1. AES-CBC Encryption (Cryptography)
Advanced Encryption Standard (AES) is a symmetric block cipher used to encrypt the secret text before embedding it into the video. The Cipher Block Chaining (CBC) mode is chosen for enhanced security: each plaintext block is XORed with the previous ciphertext block before encryption. A randomly generated Initialization Vector (IV) is used for the first block to prevent pattern leakage.
Input: Plaintext message
Output: Ciphertext (encrypted binary)
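
The CBC chaining rule described above can be written compactly. The notation is introduced here for illustration only: E_K and D_K denote AES encryption and decryption under key K, and P_i and C_i are the i-th 16-byte plaintext and ciphertext blocks.

```latex
C_0 = \mathrm{IV}, \qquad
C_i = E_K\!\left(P_i \oplus C_{i-1}\right), \qquad
P_i = D_K\!\left(C_i\right) \oplus C_{i-1}, \qquad i = 1, \dots, n
```

Because each ciphertext block depends on the previous one, identical plaintext blocks produce different ciphertext blocks, which is the pattern-hiding property mentioned above.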

Key Features:

  • 128-bit or 256-bit key length
  • Random IV for security
  • Padding applied to make the message a multiple of the block size (16 bytes)
Together, these features ensure confidentiality: even if the embedded data is extracted, it remains unintelligible without the correct key and IV.
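
The padding step listed above can be sketched in a few lines. This is an illustrative pure-Python version of PKCS7 padding (the scheme named later in the implementation highlights), not the project's code; the function names are hypothetical, and a cryptographic library's own padding utilities would normally be used instead.

```python
BLOCK_SIZE = 16  # AES block size in bytes


def pkcs7_pad(data: bytes, block_size: int = BLOCK_SIZE) -> bytes:
    """Append N bytes of value N so the length is a multiple of block_size."""
    pad_len = block_size - (len(data) % block_size)
    return data + bytes([pad_len]) * pad_len


def pkcs7_unpad(padded: bytes, block_size: int = BLOCK_SIZE) -> bytes:
    """Remove PKCS7 padding, rejecting malformed input."""
    if not padded or len(padded) % block_size != 0:
        raise ValueError("padded data length is not a multiple of the block size")
    pad_len = padded[-1]
    if pad_len < 1 or pad_len > block_size:
        raise ValueError("invalid padding length")
    if padded[-pad_len:] != bytes([pad_len]) * pad_len:
        raise ValueError("invalid padding bytes")
    return padded[:-pad_len]


padded = pkcs7_pad(b"hello")          # 5 bytes of data + 11 bytes of padding
original = pkcs7_unpad(padded)        # recovers b"hello"
```

Note that a message whose length is already a multiple of 16 bytes still receives a full block of padding, so the unpadding step is always unambiguous.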
2. LSB-Based Video Steganography (Steganography)
Least Significant Bit (LSB) steganography is used to embed the encrypted binary data into the least significant bits of the pixel values of selected video frames.
Input: Ciphertext (in binary form), cover video frames
Output: Stego frames (frames with hidden encrypted text)
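
The core LSB operation is a single bit replacement per channel value. The sketch below illustrates it on one pixel; the helper names and the sample pixel are hypothetical and chosen only for the example.

```python
def set_lsb(value: int, bit: int) -> int:
    """Return an 8-bit channel value with its least significant bit replaced."""
    return (value & 0xFE) | (bit & 1)


def get_lsb(value: int) -> int:
    """Read back the hidden bit from a channel value."""
    return value & 1


pixel = (200, 117, 54)   # hypothetical (R, G, B) values of one cover pixel
bits = (1, 0, 1)         # three ciphertext bits to hide

stego_pixel = tuple(set_lsb(v, b) for v, b in zip(pixel, bits))
recovered = tuple(get_lsb(v) for v in stego_pixel)
```

Each channel value changes by at most 1 out of 255, which is why the modification is not visible to the eye.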

Design:

Flowchart:

Methodology:

The proposed system combines AES-based encryption with LSB video steganography to securely embed a secret text message into a video file. The entire methodology is divided into several sequential stages to ensure both security and imperceptibility. The key steps are as follows:
1. Input Acquisition
A plain text file containing the secret message is used as input. A cover video (e.g., MP4 format) serves as the medium to hide the encrypted data.
2. Text Encryption Using AES-CBC
The plaintext is encrypted using the Advanced Encryption Standard (AES) in Cipher Block Chaining (CBC) mode. A symmetric key and a randomly generated initialization vector (IV) are used to enhance security. The output is a ciphertext that is unreadable without the correct key and IV.
3. Binary Conversion
The encrypted ciphertext is converted into a binary string. Each byte of ciphertext is represented as an 8-bit binary value, making it suitable for bit-wise embedding into video frames.
4. Video Preprocessing
The cover video is processed using computer vision techniques to extract frames at a fixed frame rate (e.g., 30 FPS). All frames are resized to a uniform dimension (e.g., 256×256) to maintain consistency during embedding. Pixel values of each frame are normalized for further processing.
5. Data Embedding Using LSB Steganography
The binary string obtained from the encrypted text is embedded into the least significant bits (LSBs) of pixel values in the extracted video frames. Each bit of the binary data is hidden in the LSB of a color channel (Red, Green, or Blue) of selected pixels. The embedding is done sequentially until all bits are hidden.
6. Stego Video Generation
Once the data embedding is complete, the modified frames are reassembled into a video format. The frame rate, resolution, and codec are kept consistent with the original video to maintain its appearance. The resulting stego video visually resembles the original video but contains the hidden encrypted message.
7. Decryption and Extraction (Optional)
At the receiving end, the stego video can be processed in reverse:

  • Extract frames
  • Retrieve embedded bits from LSBs
  • Reconstruct encrypted binary
  • Decrypt using the same AES key and IV to recover the original text
This methodology ensures a two-layer security model — cryptographic confidentiality via AES and data hiding via steganography — thereby providing secure, imperceptible communication through video media.

Implementation

The implementation of this project involves the integration of cryptography and steganography techniques using various Python-based libraries. The process is divided into modular components for clarity, scalability, and reusability. Below are the key stages of implementation:
1. Environment Setup
The implementation was carried out in a Python environment using the following libraries:
  • OpenCV for video processing (frame extraction and reassembly)
  • NumPy for numerical operations and image array manipulation
  • PyCryptodome for AES encryption and decryption
  • OS / Glob for file handling and directory management
2. Text Encryption Module
The secret message is first read from a .txt file. AES encryption is applied using a predefined key and a randomly generated Initialization Vector (IV). The output ciphertext is securely stored and prepared for binary conversion.

Text encryption Code


The provided Python script implements the encryption step using AES in GCM mode, which provides both confidentiality and integrity for the message. While the design sections describe AES-CBC, the script itself uses AES-GCM. The process starts by deriving an AES key from a password and a randomly generated salt using PBKDF2HMAC with the SHA-256 hash algorithm. Key derivation makes brute-force attacks on the password computationally expensive.

The encrypt_message function encrypts the provided secret message. It begins by generating a random salt and passing it, together with the password, to the key derivation function (derive_key). A random 12-byte Initialization Vector (IV) is then generated so that encryption produces a different ciphertext even when the same message and password are reused; a unique IV is essential for AES-GCM to be secure.

Next, the AES cipher is initialized in GCM mode. The message is padded to a multiple of the AES block size (16 bytes), although GCM itself does not require padding, and is then encrypted using the cipher's encryptor. The resulting ciphertext, the authentication tag (used for verification during decryption), the IV, and the salt are each saved to separate files in Base64 format. These files are stored in a dedicated directory.

This approach ensures that all components needed for decryption (ciphertext, IV, tag, and salt) are available later, and that the message cannot be read without the key derived from the password.
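
The key-derivation and Base64 storage steps described above can be sketched with the Python standard library alone. This is an illustration and not the project's script: it uses hashlib.pbkdf2_hmac in place of the PBKDF2HMAC class named in the text, and the iteration count, salt length, and sample password are assumptions.

```python
import base64
import hashlib
import os


def derive_key(password: str, salt: bytes, iterations: int = 100_000) -> bytes:
    """Derive a 32-byte (256-bit) AES key using PBKDF2-HMAC-SHA256."""
    return hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, iterations, dklen=32
    )


# Encryption side: fresh random salt and 12-byte IV for every message.
salt = os.urandom(16)   # salt length is an assumption
iv = os.urandom(12)     # 12-byte IV, as described for AES-GCM
key = derive_key("example password", salt)

# Base64 text is what would be written to the separate salt/IV files.
salt_b64 = base64.b64encode(salt).decode("ascii")
iv_b64 = base64.b64encode(iv).decode("ascii")

# Decryption side: decode the stored salt and re-derive the same key.
restored_key = derive_key("example password", base64.b64decode(salt_b64))
```

Only the salt and IV need to be stored; the key itself is never written to disk because it can be re-derived from the password.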
3. Binary Data Conversion
The encrypted ciphertext is converted to a binary string. This binary stream is used as the hidden data to embed into the cover video frames.
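
A minimal sketch of this conversion is shown below. The function names are hypothetical; each byte becomes eight characters, most significant bit first.

```python
def bytes_to_bits(data: bytes) -> str:
    """Convert bytes to a string of '0'/'1', 8 bits per byte, MSB first."""
    return "".join(f"{byte:08b}" for byte in data)


def bits_to_bytes(bits: str) -> bytes:
    """Inverse of bytes_to_bits; the bit string length must be a multiple of 8."""
    if len(bits) % 8 != 0:
        raise ValueError("bit string length must be a multiple of 8")
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))


ciphertext = b"\x00\xff\x41"               # stand-in for real ciphertext bytes
bit_string = bytes_to_bits(ciphertext)     # 24 characters
round_trip = bits_to_bytes(bit_string)
```

The inverse function is what the receiver applies after collecting the LSBs from the stego frames.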
4. Frame Extraction and Preprocessing
The cover video is read and split into individual frames. Each frame is resized to a fixed resolution (e.g., 256×256). Frames are normalized if necessary to prepare them for bit-level manipulation.

The preprocessing module is responsible for preparing the input video for further processing in the steganography system. It performs two primary tasks: extracting individual frames from the video and separating the audio track.

Initially, the system defines a structured directory setup to organize data. A base directory is used to store all relevant files, with subdirectories specifically for extracted frames and audio. When the function is called with a video path, it first determines a unique name for the session based on the video filename (excluding the extension). This name is used to create dedicated folders for storing the frames and the audio associated with that video.

For frame extraction, the system checks whether the frames for the video have already been extracted. If not, it opens the video using OpenCV and reads each frame sequentially. These frames are then saved as individual image files (in .png format) in the corresponding session folder. This allows frame-level access for embedding or analysis in the later stages of the steganography process.

In parallel, the module also handles audio extraction. Using the moviepy library, it isolates the audio stream from the video and saves it in .wav format. This ensures that the audio can be preserved and later reattached to the modified video if needed.

This modular preprocessing approach not only organizes video content efficiently but also prevents unnecessary reprocessing by checking whether the output already exists. It ensures that both video frames and audio are readily available in a clean and reusable format for the subsequent embedding and reconstruction processes.
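
The frame-extraction part of this module might look roughly like the sketch below. It is not the project's code: the function names, the "data" base directory, and the frame_0001.png naming scheme are assumptions, and audio extraction is omitted. OpenCV is imported inside extract_frames so that the path helper can be used on its own.

```python
import os


def session_name(video_path: str) -> str:
    """Session name = video filename without directory or extension."""
    return os.path.splitext(os.path.basename(video_path))[0]


def extract_frames(video_path: str, base_dir: str = "data") -> str:
    """Save every frame of the video as a PNG and return the frames folder."""
    import cv2  # OpenCV (opencv-python), assumed to be installed

    frames_dir = os.path.join(base_dir, "frames", session_name(video_path))
    if os.path.isdir(frames_dir) and os.listdir(frames_dir):
        return frames_dir  # frames already extracted; skip reprocessing

    os.makedirs(frames_dir, exist_ok=True)
    capture = cv2.VideoCapture(video_path)
    if not capture.isOpened():
        raise IOError(f"could not open video: {video_path}")

    index = 1
    while True:
        ok, frame = capture.read()
        if not ok:  # no more frames
            break
        # PNG is lossless, so pixel values are stored exactly.
        cv2.imwrite(os.path.join(frames_dir, f"frame_{index:04d}.png"), frame)
        index += 1
    capture.release()
    return frames_dir
```

Zero-padded frame numbers keep the files in playback order when they are later sorted by name.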

Code for extraction



5. Data Embedding Module (LSB Steganography)
The binary string is embedded into the Least Significant Bits of pixel values of the extracted frames. Bits are embedded in a sequential manner across the RGB channels to maximize capacity without affecting visual quality. A mapping is maintained for where data is embedded for accurate retrieval.

Code for embedding frames



This module focuses on embedding an encrypted message into video frames using Least Significant Bit (LSB) steganography. The LSB technique modifies the least significant bits of the image pixels to store binary data, ensuring that the changes are visually imperceptible. The process begins by reading the encrypted message from a file, where it is stored in Base64 encoding. The message is first decoded and then converted into its binary representation so that it can be embedded into the pixel data of the video frames. Each byte of the encrypted message is transformed into an 8-bit binary value. Next, the system loads each image frame from a specified directory containing previously extracted video frames. It flattens each frame's pixel data to a one-dimensional array, making it easier to sequentially embed bits of the message.
Each bit of the binary message is embedded into the least significant bit of the pixel values, preserving the original appearance of the image to the human eye. This embedding continues frame by frame until the entire encrypted message is hidden within the video. Each modified frame is saved with a new filename indicating that it contains embedded data. This method ensures that the secret message is securely and invisibly embedded into the video content without noticeable degradation in visual quality.
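
The frame-by-frame embedding described above can be sketched with NumPy as follows. This is a minimal illustration under stated assumptions, not the project's module: frames are taken to be already loaded as uint8 arrays (for example via OpenCV), and the function name embed_bits is hypothetical.

```python
import numpy as np


def embed_bits(frames, bits):
    """Hide a '0'/'1' string in the LSBs of a list of uint8 frames.

    Bits are written sequentially into each flattened frame; frames beyond
    the end of the payload are returned unchanged. The input frames are
    not modified.
    """
    if set(bits) - {"0", "1"}:
        raise ValueError("bits must contain only '0' and '1'")
    payload = np.array([int(b) for b in bits], dtype=np.uint8)
    capacity = sum(frame.size for frame in frames)
    if payload.size > capacity:
        raise ValueError("payload does not fit in the supplied frames")

    stego_frames = []
    offset = 0
    for frame in frames:
        flat = frame.flatten()                      # flatten() returns a copy
        chunk = payload[offset:offset + flat.size]  # bits for this frame
        # Clear the LSB of the first len(chunk) values, then set it to the bit.
        flat[:chunk.size] = (flat[:chunk.size] & 0xFE) | chunk
        offset += chunk.size
        stego_frames.append(flat.reshape(frame.shape))
    return stego_frames


# Two tiny all-white "frames" (2x2 pixels, 3 channels = 12 values each).
cover = [np.full((2, 2, 3), 255, dtype=np.uint8) for _ in range(2)]
stego = embed_bits(cover, "0" * 16)
```

In this example the 16 payload bits fill the first frame and spill over into the first four values of the second frame.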
6. Video Reconstruction
The modified frames (stego frames) are compiled back into a video using the same frame rate as the original video. The output is a stego video that visually resembles the original but carries hidden encrypted data.

Code for Reconstruction of frames



This module is responsible for reconstructing a video from a series of modified image frames, which may contain hidden encrypted information. The process begins by collecting and sorting all image frames from a specified directory. These frames are assumed to be saved in sequential order (e.g., frame_0001.png, frame_0002.png) as they were originally extracted from a video. The first frame is read to determine the resolution of the video (i.e., width and height), which is essential for initializing the video writer. Using the OpenCV library, a video writer object is created with the specified frame rate and MP4 codec. Each image frame is then sequentially written to this video writer, effectively stitching the images back together into a continuous video file. Once all frames are written, the video writer is released and the output video is saved to the specified location. An additional function is provided to play the reconstructed video, displaying each frame in a window until the playback ends or the user interrupts it. This module is crucial for visualizing the final stego video after embedding the encrypted message within its frames.
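
A rough sketch of the reconstruction step is given below. It is not the project's code: the function names, the default frame rate, and the "mp4v" codec tag are assumptions. OpenCV is imported inside frames_to_video so that the frame-listing helper works without it.

```python
import glob
import os


def list_frames(frames_dir: str, pattern: str = "*.png"):
    """Return frame paths sorted by name (zero-padded names sort in order)."""
    return sorted(glob.glob(os.path.join(frames_dir, pattern)))


def frames_to_video(frames_dir: str, output_path: str, fps: float = 30.0) -> int:
    """Write all frames in frames_dir to a video file; return the frame count."""
    import cv2  # OpenCV (opencv-python), assumed to be installed

    paths = list_frames(frames_dir)
    if not paths:
        raise ValueError(f"no frames found in {frames_dir}")

    first = cv2.imread(paths[0])
    height, width = first.shape[:2]          # resolution from the first frame
    fourcc = cv2.VideoWriter_fourcc(*"mp4v")  # MP4 codec tag (assumption)
    writer = cv2.VideoWriter(output_path, fourcc, fps, (width, height))
    for path in paths:
        writer.write(cv2.imread(path))
    writer.release()
    return len(paths)
```

One caveat applies to this design: lossy video codecs, including typical MP4 encoders, generally alter pixel values during compression, so LSB payloads are only guaranteed to survive if the stego frames are kept in a lossless form such as the PNG files or a lossless codec.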
7. Decryption and Extraction
The stego video is read frame by frame and the LSBs of the pixel values are extracted. The binary data is then reassembled and decrypted using the same AES key and IV to retrieve the original message.
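
The bit-retrieval step can be sketched as follows. This is an illustration rather than the project's code: the function name is hypothetical, frames are assumed to be uint8 NumPy arrays, and the receiver is assumed to know the payload length in bytes.

```python
import numpy as np


def extract_payload(frames, n_bytes: int) -> bytes:
    """Read n_bytes of hidden data from the LSBs of the given frames."""
    if n_bytes == 0:
        return b""
    if not frames:
        raise ValueError("no frames supplied")
    # Collect the least significant bit of every channel value, in order.
    lsbs = np.concatenate([frame.flatten() & 1 for frame in frames])
    n_bits = n_bytes * 8
    if lsbs.size < n_bits:
        raise ValueError("frames do not contain enough bits")
    # packbits groups 8 bits per byte, most significant bit first.
    return np.packbits(lsbs[:n_bits]).tobytes()


# Build a tiny stego frame whose LSBs spell out b"Hi".
hidden_bits = [int(b) for b in "0100100001101001"]   # 'H' = 0x48, 'i' = 0x69
values = [100 + bit for bit in hidden_bits] + [100, 101]
frame = np.array(values, dtype=np.uint8).reshape(2, 3, 3)

message = extract_payload([frame], 2)
```

The recovered bytes are the ciphertext, which is then passed to the decryption step.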

Code for Decryption of secret message



This module performs the decryption of an AES-GCM encrypted message that was previously hidden within video frames. The process starts by retrieving the necessary cryptographic components stored in separate files: the encrypted message, initialization vector (IV), authentication tag, and salt—each encoded in Base64 format. The system uses the Password-Based Key Derivation Function (PBKDF2) along with SHA-256 hashing to derive a 256-bit AES key from a user-supplied password and the extracted salt. Once the key is derived, the AES cipher is initialized in Galois/Counter Mode (GCM) using the IV and tag to ensure both confidentiality and integrity of the message. The ciphertext is then decrypted, and the original plaintext message is recovered after removing any padding added during the encryption phase. Finally, the decrypted message is saved to a text file and displayed on the console for verification. This step is critical in reversing the encryption process and recovering the original hidden data securely.


Implementation Highlights:

  • AES-GCM Decryption: Utilized the AES cipher in Galois/Counter Mode (GCM) for secure and authenticated decryption of the embedded message.
  • Key Derivation using PBKDF2: Generated a 256-bit AES key from a password using PBKDF2HMAC with SHA-256 and a randomly generated salt to enhance security against brute-force attacks.
  • Base64 Encoding/Decoding: Handled all cryptographic components (ciphertext, IV, tag, salt) using Base64 encoding for file-safe storage and reliable transmission.
  • Secure Unpadding: Applied PKCS7 unpadding after decryption to retrieve the original plaintext message without padding artifacts.
  • Modular File Handling: Segregated encrypted data and cryptographic parameters into individual files, improving clarity, modularity, and debugging convenience.
  • Error Handling: Wrapped the decryption logic in a try-except block to handle exceptions gracefully and display appropriate error messages.

Output


The image shows a steganalysis comparison



After LSB-based embedding, the stego video retained high visual similarity to the cover video, with negligible perceptual distortion. Quantitative analysis using metrics like PSNR (Peak Signal-to-Noise Ratio) showed values above 40 dB, indicating high fidelity. Histogram analysis revealed minimal distributional changes, and noise variance in stego frames remained close to cover frames, further proving subtle embedding. The steganalysis module verified the robustness of the embedding process by confirming the presence of hidden data through detectable yet minimal differences. Overall, the system achieved a successful balance between imperceptibility, security, and reliability, making it suitable for secure video data transmission and privacy-preserving applications.
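
For reference, PSNR between a cover frame and a stego frame can be computed as in the sketch below. This is the standard definition for 8-bit images, not the project's own evaluation code, and the function name is hypothetical.

```python
import numpy as np


def psnr(cover: np.ndarray, stego: np.ndarray, max_value: float = 255.0) -> float:
    """Peak Signal-to-Noise Ratio in dB between two equally shaped frames."""
    diff = cover.astype(np.float64) - stego.astype(np.float64)
    mse = np.mean(diff ** 2)          # mean squared error over all values
    if mse == 0:
        return float("inf")           # identical frames
    return float(10.0 * np.log10(max_value ** 2 / mse))


# Worst case for 1-bit LSB embedding: every channel value changes by 1.
cover_frame = np.full((4, 4, 3), 100, dtype=np.uint8)
stego_frame = cover_frame + 1
worst_case_db = psnr(cover_frame, stego_frame)
```

Even in this worst case the mean squared error is 1, giving about 48.13 dB, which is consistent with the values above 40 dB reported here.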

Conclusion


The project successfully demonstrates a secure and efficient method for embedding encrypted text into a video using a combination of AES-CBC encryption and LSB steganography. By applying cryptographic encryption before the embedding process, the system ensures that even if the hidden data is extracted, it remains unreadable without the decryption key and initialization vector. The use of video frames as a medium for steganography offers a large capacity for hidden data, while the LSB method preserves the perceptual quality of the original video, making the modifications virtually imperceptible to the human eye. Throughout the implementation, care was taken to ensure data integrity, maintain synchronization between frames, and securely manage the encryption parameters. The system can be extended for secure message transmission, digital watermarking, and other privacy-preserving multimedia applications. In conclusion, this approach provides a two-layered security model that combines the confidentiality of cryptography with the concealment of steganography, offering a practical solution for covert communication in digital media.