Video Encoding – Part 2 (Video Codecs)
In the previous section, Video Encoding Part-1, we looked at how individual images are represented and compressed using formats like BMP, PNG, and JPEG. However, video is not just a collection of independent images: it is compressed both within each frame (spatial compression) and across frames (temporal compression), exploiting the similarities between consecutive frames.
1. Sequence of images and video encoding
-
A video is a sequence of such images (called frames) that are stored or transmitted one after another.
-
As discussed in the previous article, an image is captured (for example from a camera) as a raw frame, which can be very large in size. To make it practical for storage and transmission, it is compressed into formats like JPEG.
-
As a first step, each frame is compressed using image compression techniques (similar to JPEG).
-
However, video encoding goes a step further: instead of treating every frame independently, it compares each frame with its adjacent frames and encodes only the differences between them.
-
This process is called video encoding, and it significantly reduces the amount of data, often by more than 10×, compared to sending individually compressed images for every frame.
-
Video encoding is usually a computationally intensive process, and depending on the parameters chosen, it can consume significant processor and system resources.
-
Video codecs typically use a combination of:
- Intra-frame compression (within a frame, similar to JPEG)
- Inter-frame compression (between frames, using motion and differences)
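The benefit of encoding differences can be illustrated with a small sketch. It is a deliberate simplification: the frames are synthetic grayscale images (a noisy background with a moving square), and `zlib` stands in for the compression stage. Real codecs use block-based motion prediction and transforms rather than plain pixel subtraction, and the helper names `make_frame`, `diff`, and `undiff` are invented for this illustration.

```python
import random
import zlib

WIDTH, HEIGHT = 64, 48  # tiny grayscale frames, one byte per pixel

# Fixed pseudo-random background so a single frame is hard to compress.
_rng = random.Random(0)
BACKGROUND = bytes(_rng.randrange(256) for _ in range(WIDTH * HEIGHT))


def make_frame(square_x: int) -> bytes:
    """Build a synthetic frame: static background plus a bright 8x8 square."""
    pixels = bytearray(BACKGROUND)
    for y in range(20, 28):
        for x in range(square_x, square_x + 8):
            pixels[y * WIDTH + x] = 255
    return bytes(pixels)


def diff(current: bytes, previous: bytes) -> bytes:
    """Per-pixel difference modulo 256; mostly zeros when frames are similar."""
    return bytes((c - p) % 256 for c, p in zip(current, previous))


def undiff(delta: bytes, previous: bytes) -> bytes:
    """Reverse of diff(): rebuild the current frame from the previous one."""
    return bytes((d + p) % 256 for d, p in zip(delta, previous))


frames = [make_frame(10 + i) for i in range(5)]  # square moves 1 px per frame

# Intra only: every frame is compressed on its own.
intra_only = sum(len(zlib.compress(f)) for f in frames)

# Intra + inter: first frame compressed whole, later frames as differences.
inter = len(zlib.compress(frames[0])) + sum(
    len(zlib.compress(diff(cur, prev))) for prev, cur in zip(frames, frames[1:])
)

print(f"independent frames: {intra_only} bytes, with differences: {inter} bytes")
```

Because only the edges of the moving square change from one frame to the next, each difference frame is almost entirely zeros and compresses far better than a full frame. The decoder reverses the process with `undiff`, which is why a difference frame is useless without the frame it refers to.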
2. Video encoding algorithms
-
Various codec standards implement this combination of intra-frame and inter-frame compression, such as H.264 (AVC) and H.265 (HEVC).
-
Each algorithm offers different trade-offs:
- Some prioritize better visual quality
- Some prioritize faster encoding/decoding speed
- Some aim for higher compression efficiency (smaller file size at similar quality)
-
In addition to the algorithm itself, several encoding parameters also impact the final output:
- Bitrate: controls how much data is used per second (higher = better quality, larger size)
- Frame rate (FPS): affects smoothness and bandwidth
- Resolution: higher resolution increases data size
- Keyframe interval (I-frame frequency): impacts seeking, latency, and compression efficiency
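To see how resolution, frame rate, and bitrate interact, the arithmetic can be sketched as follows. The 12 bits per pixel assumes 8-bit YUV 4:2:0 input, and the 5 Mbit/s target is an arbitrary illustrative value, not a recommendation; both function names are made up for this example.

```python
def raw_bitrate_bps(width: int, height: int, fps: int, bits_per_pixel: int = 12) -> int:
    """Uncompressed data rate in bits per second.

    bits_per_pixel=12 assumes 8-bit YUV 4:2:0 sampling; use 24 for 8-bit RGB.
    """
    return width * height * bits_per_pixel * fps


def encoded_size_bytes(bitrate_bps: float, duration_s: float) -> float:
    """Approximate stream size at a given average bitrate (video only)."""
    return bitrate_bps * duration_s / 8


raw = raw_bitrate_bps(1920, 1080, 30)  # 1080p at 30 fps
target = 5_000_000                     # hypothetical 5 Mbit/s target bitrate

print(f"raw: {raw / 1e6:.1f} Mbit/s")                      # raw: 746.5 Mbit/s
print(f"compression ratio needed: {raw / target:.0f}:1")   # 149:1
print(f"one minute at target: {encoded_size_bytes(target, 60) / 1e6:.1f} MB")  # 37.5 MB
```

Doubling the frame rate or moving to a higher resolution raises the raw rate proportionally, so the encoder must either compress harder (lower quality) or be given a higher bitrate.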
3. Frame types (I, P, B)
-
Video codecs do not treat all frames equally. Instead, they use different frame types to reduce redundancy and improve compression efficiency.
-
The three main frame types are:
I-frames (intra-coded frames, also known as keyframes): a fully self-contained video frame that is encoded independently, without referencing any other frames, and acts as a complete image in the video stream.
- Fully self-contained frames
- Similar to a standalone image (like JPEG)
- Do not depend on any other frame
- Used as reference points for decoding and seeking
P-frames (Predicted frames): a video frame that stores only the differences from a previous frame. Because it is a difference frame, it cannot be reconstructed on its own; it depends on an earlier reference frame (an I-frame or another P-frame).
- Store only the difference from previous frames
- Depend on past I-frames or P-frames
- Require less data than I-frames
- Common in normal video playback
B-frames (Bidirectional frames): a video frame that is reconstructed using information from both previous and next reference frames (I-frames or P-frames), allowing for higher compression efficiency.
- Use information from both previous and next frames
- Achieve the highest compression efficiency
- More complex to encode/decode
- May increase latency slightly
By combining these frame types, video codecs significantly reduce file size while maintaining visual quality.
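The way these frame types are arranged can be sketched with two small helpers. The functions `gop_pattern` and `decode_order` are hypothetical names for this illustration, and the fixed, regular pattern is an assumption; real encoders often choose frame types adaptively. A run of frames starting at an I-frame is commonly called a group of pictures (GOP).

```python
def gop_pattern(keyframe_interval: int, b_frames: int) -> list[str]:
    """Frame types in display order for one group of pictures (GOP).

    keyframe_interval: number of frames from one I-frame to the next.
    b_frames: number of B-frames placed between consecutive reference frames.
    """
    types = []
    for i in range(keyframe_interval):
        if i == 0:
            types.append("I")
        elif i % (b_frames + 1) == 0:
            types.append("P")
        else:
            types.append("B")
    return types


def decode_order(types: list[str]) -> list[int]:
    """Reorder display indices so each B-frame follows both of its references."""
    order, pending_b = [], []
    for index, frame_type in enumerate(types):
        if frame_type == "B":
            pending_b.append(index)   # must wait for the next reference frame
        else:
            order.append(index)       # I or P: decode the reference first...
            order.extend(pending_b)   # ...then the B-frames displayed before it
            pending_b.clear()
    order.extend(pending_b)  # trailing B-frames would need the next GOP's I-frame
    return order


pattern = gop_pattern(keyframe_interval=7, b_frames=2)
print("".join(pattern))       # IBBPBBP
print(decode_order(pattern))  # [0, 3, 1, 2, 6, 4, 5]
```

The second output shows why B-frames can add latency: frame 3 must be received and decoded before frames 1 and 2 can be shown, so the decoder has to buffer and reorder frames.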
4. Codecs vs Containers
Codec (a compression and decompression technique)
-
A video codec defines how video frames are encoded and decoded.
-
Examples: H.264 and H.265.
-
A codec compresses frames and generates I, P, and B frames
Container
-
A container is a file format that stores the encoded video frames (i.e. the codec's output) together with audio streams, subtitles, and metadata.
-
Examples: MP4, MKV, and AVI (an older container).
-
In other words, a codec is the algorithm that compresses video frames into I, P, and B frames; a container defines how those frames and other information, such as audio, are stored in a file.
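The separation between the two concepts can be modeled with a small data-structure sketch. The `Stream` and `Container` classes and the `remux` function are hypothetical and exist only for this illustration; they do not correspond to any real library, and in practice each container format supports only certain codecs.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Stream:
    kind: str   # "video", "audio", or "subtitle"
    codec: str  # how the payload is compressed, e.g. "H.264" or "AAC"


@dataclass
class Container:
    file_format: str  # how streams are packaged, e.g. "MP4" or "MKV"
    streams: list[Stream] = field(default_factory=list)
    metadata: dict[str, str] = field(default_factory=dict)


def remux(source: Container, new_format: str) -> Container:
    """Repackage the same encoded streams in another container (no re-encoding)."""
    return Container(new_format, list(source.streams), dict(source.metadata))


movie = Container(
    "MP4",
    streams=[Stream("video", "H.264"), Stream("audio", "AAC")],
    metadata={"title": "demo"},
)
mkv = remux(movie, "MKV")
print(mkv.file_format, [s.codec for s in mkv.streams])
```

Changing the container leaves the codecs untouched: the compressed frames are the same, only the packaging around them differs.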
5. Video resolutions (CIF, QCIF, HD, FHD, etc.)
-
Video resolution defines the spatial size of each frame, i.e., how many pixels make up a single image in a video stream.
-
Earlier formats like QCIF and CIF are standardized low-resolution formats designed for video conferencing and surveillance systems at a time when bandwidth and processing power were very limited.
-
Modern formats like HD and Full HD (FHD) are used for high-quality streaming and display systems.
Video resolution comparison (QCIF → 8K)
| Format | Resolution (W × H) | Total Pixels | Aspect Ratio | Era / Use Case |
|---|---|---|---|---|
| QCIF | 176 × 144 | ~25K | 11:9 | Very low-bandwidth video calls, early mobile video |
| CIF | 352 × 288 | ~101K | 11:9 | Early video conferencing, CCTV systems |
| 4CIF | 704 × 576 | ~405K | 11:9 | Improved surveillance, broadcast systems |
| SD (480p) | 720 × 480 | ~345K | 4:3 / 16:9 | DVD, legacy TV systems |
| HD (720p) | 1280 × 720 | ~921K | 16:9 | Basic HD streaming |
| FHD (1080p) | 1920 × 1080 | ~2.07M | 16:9 | Standard streaming, video conferencing |
| QHD (1440p, often marketed as 2K) | 2560 × 1440 | ~3.69M | 16:9 | High-end monitors, gaming, streaming |
| 4K (UHD) | 3840 × 2160 | ~8.29M | 16:9 | Ultra HD streaming, cinema-grade content |
| 8K (UHD) | 7680 × 4320 | ~33.18M | 16:9 | Premium displays, professional production |
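The "Total Pixels" and "Aspect Ratio" columns follow directly from the width and height, as this short sketch shows (`describe` is a name chosen for this example). It computes the ratio of the stored pixel grid. For SD (720 × 480) that works out to 3:2, because SD formats use non-square pixels and are displayed at 4:3 or 16:9.

```python
from math import gcd


def describe(width: int, height: int) -> tuple[int, str]:
    """Return (total pixels, storage aspect ratio) for a frame size."""
    divisor = gcd(width, height)
    return width * height, f"{width // divisor}:{height // divisor}"


resolutions = {
    "QCIF": (176, 144),
    "CIF": (352, 288),
    "SD (480p)": (720, 480),
    "FHD (1080p)": (1920, 1080),
    "4K (UHD)": (3840, 2160),
}

for name, (w, h) in resolutions.items():
    pixels, ratio = describe(w, h)
    print(f"{name:12} {w} x {h}  {pixels:>9,} pixels  {ratio}")
```

Each step from FHD to 4K to 8K quadruples the pixel count, which is why the resolution setting has such a strong effect on bitrate and encoding cost.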
6. Audio codecs
-
Audio data can also be compressed, similar to video, to reduce storage size and bandwidth usage during transmission.
-
Unlike a video frame, which is a 2D signal (X and Y axes), audio is a 1D time-domain signal, so its compression algorithms are different and operate over time rather than spatial dimensions.
-
Common audio codecs include formats like AAC and MP3, which exploit redundancy in sound signals and human hearing perception.
-
Audio compression is outside the scope of this article and will be covered in a separate discussion.
7. Audio and video synchronization
-
Audio and video are kept in sync using presentation timestamps (PTS) attached to both streams.
-
Each video frame and audio chunk has a time reference that tells the player when to play it.
-
During playback, the player aligns audio and video on a common timeline.
-
If one stream arrives earlier (common in streaming), the player uses a buffer to wait and match timing.
-
The goal is simple: lips, sound, and motion should match naturally during playback.
-
This topic is not covered in this article; it is mentioned for the sake of curious readers.
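For curious readers, the timestamp comparison described above can be sketched in a few lines. This assumes the audio playback position is used as the master clock, which is a common choice but not the only one. The function name `sync_action` and the 40 ms tolerance are illustrative assumptions, not values taken from any standard or player.

```python
def sync_action(video_pts: float, audio_clock: float, tolerance: float = 0.040) -> str:
    """Decide what to do with the next video frame, using audio as master clock.

    video_pts and audio_clock are in seconds on the same timeline.
    tolerance (40 ms here) is an illustrative value, not a standard.
    """
    drift = video_pts - audio_clock
    if drift > tolerance:
        return "wait"  # frame is early: keep it in the buffer a little longer
    if drift < -tolerance:
        return "drop"  # frame is late: skip it to catch up with the audio
    return "show"      # close enough: present the frame now


audio_clock = 0.20  # seconds of audio already played (example value)
for pts in (0.04, 0.20, 0.36):
    print(f"video PTS {pts:.2f}s -> {sync_action(pts, audio_clock)}")
```

With the audio at 0.20 s, the frame stamped 0.04 s is dropped as late, the frame stamped 0.20 s is shown, and the frame stamped 0.36 s waits in the buffer.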