Video Encoding – Part 1 (Image Formats)

In this article, we look at how images are actually represented in digital systems—from raw sensor data to common formats like RGB, YUV, JPEG, and PNG. Before we even talk about video, we need to understand how a single frame is stored, because video is just a sequence of these frames.

This is where image formats become the foundation of everything in video encoding.


1. Raw image representation (BMP)

  • An image is fundamentally a grid of pixels, where each pixel stores visual information about a single point in the scene. These pixels are either grayscale (black & white) or color (RGB) depending on how the image is represented.

  • In a grayscale image, each pixel stores a single value (typically 1 byte) representing intensity, ranging from black to white.

  • In a color image, each pixel stores three values corresponding to the intensities of the primary colors Red, Green, and Blue (RGB).

  • Therefore, the total uncompressed image size is typically W × H × 1 for grayscale and W × H × 3 for RGB images.

  • A common file format used to store this raw pixel data directly is BMP, which stores pixel values with minimal or no compression, making it large in size but simple in structure.

BMP size table

| Resolution | Pixels (W×H) | Format | Channels | Bytes per pixel | Total size |
|---|---|---|---|---|---|
| 360p | 640 × 360 | Grayscale | 1 | 1 byte | 230,400 bytes (~225 KB) |
| 360p | 640 × 360 | RGB | 3 | 3 bytes | 691,200 bytes (~675 KB) |
| 720p | 1280 × 720 | Grayscale | 1 | 1 byte | 921,600 bytes (~900 KB) |
| 720p | 1280 × 720 | RGB | 3 | 3 bytes | 2,764,800 bytes (~2.64 MB) |
| 1080p | 1920 × 1080 | Grayscale | 1 | 1 byte | 2,073,600 bytes (~2.0 MB) |
| 1080p | 1920 × 1080 | RGB | 3 | 3 bytes | 6,220,800 bytes (~5.9 MB) |
| 2K | 2560 × 1440 | Grayscale | 1 | 1 byte | 3,686,400 bytes (~3.5 MB) |
| 2K | 2560 × 1440 | RGB | 3 | 3 bytes | 11,059,200 bytes (~10.5 MB) |
| 4K | 3840 × 2160 | Grayscale | 1 | 1 byte | 8,294,400 bytes (~7.9 MB) |
| 4K | 3840 × 2160 | RGB | 3 | 3 bytes | 24,883,200 bytes (~23.7 MB) |

Note: When the pixel data is stored in a file such as .bmp, a header describing the image (for example its width, height, and bit depth) is added on top of the sizes above.
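The W × H × channels rule can be checked with a few lines of Python. This is a minimal sketch that counts pixel data only and ignores any file header; the function name `raw_image_size` and the `bytes_per_channel` parameter are illustrative choices, not part of any library.

```python
def raw_image_size(width: int, height: int, channels: int, bytes_per_channel: int = 1) -> int:
    """Return the uncompressed pixel-data size in bytes (file header not included)."""
    return width * height * channels * bytes_per_channel


# Reproduce two rows of the size table: 1 channel for grayscale, 3 for RGB.
for name, (w, h) in {"720p": (1280, 720), "1080p": (1920, 1080)}.items():
    gray = raw_image_size(w, h, channels=1)
    rgb = raw_image_size(w, h, channels=3)
    print(f"{name}: grayscale {gray:,} bytes, RGB {rgb:,} bytes ({rgb / 2**20:.2f} MiB)")
```

The MB figures in the table are binary megabytes (bytes divided by 1,048,576), which is why the sketch divides by `2**20`.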


2. Compressed image (JPEG and PNG)

Raw images are very large in size because they store pixel data directly without compression. This makes them expensive in terms of both storage and transmission bandwidth, especially for high-resolution images.

  • To solve this, images are typically compressed using standard encoding algorithms that reduce file size while preserving visual quality.

  • Some of the most common image compression formats are JPEG and PNG.

  • JPEG is a lossy compression format that supports both color (RGB) and grayscale images. It is highly efficient for natural photographs.

  • PNG is a lossless compression format and supports RGBA images, meaning it can store Red, Green, Blue, and Alpha (transparency) channels without losing any information.

JPEG Image Size Comparison (Lossy Compression)

| Resolution | Pixels (W×H) | Format | Channels | Size formula | Raw size | JPEG size (typical) |
|---|---|---|---|---|---|---|
| 360p | 640×360 | Grayscale | 1 | W × H × 1 | 225 KB | 30 – 100 KB |
| 360p | 640×360 | RGB | 3 | W × H × 3 | 675 KB | 50 – 150 KB |
| 720p | 1280×720 | Grayscale | 1 | W × H × 1 | 900 KB | 100 – 300 KB |
| 720p | 1280×720 | RGB | 3 | W × H × 3 | 2.64 MB | 200 – 600 KB |
| 1080p | 1920×1080 | Grayscale | 1 | W × H × 1 | 1.98 MB | 200 – 700 KB |
| 1080p | 1920×1080 | RGB | 3 | W × H × 3 | 5.93 MB | 400 KB – 1.5 MB |
| 2K | 2560×1440 | Grayscale | 1 | W × H × 1 | 3.5 MB | 400 KB – 1.5 MB |
| 2K | 2560×1440 | RGB | 3 | W × H × 3 | 10.5 MB | 800 KB – 3 MB |
| 4K | 3840×2160 | Grayscale | 1 | W × H × 1 | 7.9 MB | 1 – 4 MB |
| 4K | 3840×2160 | RGB | 3 | W × H × 3 | 23.7 MB | 2 – 8 MB |

PNG Image Size Comparison (Lossless Compression)

| Resolution | Pixels (W×H) | Format | Channels | Size formula | Raw size | PNG size (lossless range) |
|---|---|---|---|---|---|---|
| 360p | 640×360 | Grayscale | 1 | W × H × 1 | 225 KB | 150 – 500 KB |
| 360p | 640×360 | RGB | 3 | W × H × 3 | 675 KB | 200 – 600 KB |
| 360p | 640×360 | RGBA | 4 | W × H × 4 | 900 KB | 250 – 800 KB |
| 720p | 1280×720 | Grayscale | 1 | W × H × 1 | 900 KB | 500 KB – 1.5 MB |
| 720p | 1280×720 | RGB | 3 | W × H × 3 | 2.64 MB | 800 KB – 2.5 MB |
| 720p | 1280×720 | RGBA | 4 | W × H × 4 | 3.52 MB | 1 – 3 MB |
| 1080p | 1920×1080 | Grayscale | 1 | W × H × 1 | 1.98 MB | 1 – 4 MB |
| 1080p | 1920×1080 | RGB | 3 | W × H × 3 | 5.93 MB | 1.5 – 5.5 MB |
| 1080p | 1920×1080 | RGBA | 4 | W × H × 4 | 7.91 MB | 2 – 7 MB |
| 2K | 2560×1440 | Grayscale | 1 | W × H × 1 | 3.5 MB | 2 – 7 MB |
| 2K | 2560×1440 | RGB | 3 | W × H × 3 | 10.5 MB | 3 – 10 MB |
| 2K | 2560×1440 | RGBA | 4 | W × H × 4 | 14.0 MB | 4 – 12 MB |
| 4K | 3840×2160 | Grayscale | 1 | W × H × 1 | 7.9 MB | 5 – 18 MB |
| 4K | 3840×2160 | RGB | 3 | W × H × 3 | 23.7 MB | 7 – 25 MB |
| 4K | 3840×2160 | RGBA | 4 | W × H × 4 | 31.6 MB | 10 – 30 MB |

3. Typical size reduction

  • JPEG → big reduction, loses detail
  • PNG → small/moderate reduction, preserves detail

| Format | Size vs raw | What it means |
|---|---|---|
| JPEG | ~10× to 30× smaller | Lossy compression (removes visual detail) |
| PNG | ~1× to 5× smaller (sometimes larger than raw) | Lossless compression (preserves all data) |
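Why lossless compression varies so much with image content can be seen with Python's standard `zlib` module, which implements DEFLATE, the compression method PNG uses. This is only a rough sketch: it compresses synthetic 640×360 grayscale pixel buffers directly and skips PNG's per-row filtering and file structure, so the numbers are indicative rather than real PNG sizes. The three test images are made up for illustration.

```python
import os
import zlib

WIDTH, HEIGHT = 640, 360
raw_size = WIDTH * HEIGHT  # 8-bit grayscale: 1 byte per pixel

# Three synthetic images with very different amounts of redundancy.
flat = bytes([128]) * raw_size                                           # uniform gray
gradient = bytes(x % 256 for _ in range(HEIGHT) for x in range(WIDTH))   # same ramp on every row
noise = os.urandom(raw_size)                                             # random pixels

for name, pixels in [("flat", flat), ("gradient", gradient), ("noise", noise)]:
    packed = zlib.compress(pixels, 9)
    # Lossless: decompressing gives back exactly the original bytes.
    assert zlib.decompress(packed) == pixels
    print(f"{name:8s} raw={raw_size:,} compressed={len(packed):,} "
          f"ratio={raw_size / len(packed):.1f}x")
```

The flat and gradient buffers shrink dramatically, while the random buffer does not shrink at all and comes out marginally larger than the input because of container overhead.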

4. Typical uses of JPEG and PNG

JPEG is used when you want small file size for natural images.

  • Photos (phone camera images)
  • Web images (blogs, websites)
  • Social media uploads
  • Storage of large photo libraries

PNG is used when you need perfect quality preservation.

  • UI graphics (icons, buttons)
  • Logos
  • Text in images (screenshots)
  • Diagrams / charts
  • images with transparency (alpha channel)

Even though PNG is lossless (like a raw BMP image), in most cases it is smaller in size, and often much smaller for graphics with large flat-colored areas.


5. YUV format (alternative to RGB)

  • YUV is a way to represent images where color is split into three components:

    Y = Luma (brightness)
    U = Chrominance (blue difference)
    V = Chrominance (red difference)

  • So instead of storing R G B per pixel, we store Y + U + V

What does each channel (Y, U, V) mean?

  • Y (luma): brightness / grayscale structure of the image
  • U: how “blue-ish” or “yellow-ish” the color is
  • V: how “red-ish” or “green-ish” the color is

Note: Brightness is separated from color.
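To make the split concrete, here is a minimal per-pixel conversion sketch. The article does not say which YUV variant it has in mind, so the coefficients below are an assumption: they are the full-range BT.601 values used by JPEG/JFIF (strictly speaking YCbCr), with U and V offset by 128 so they fit in 8 bits. The function name `rgb_to_yuv` is illustrative.

```python
def rgb_to_yuv(r: int, g: int, b: int) -> tuple[int, int, int]:
    """Convert one 8-bit RGB pixel to 8-bit YUV (full-range BT.601, as in JFIF)."""
    y = 0.299 * r + 0.587 * g + 0.114 * b                # weighted brightness
    u = -0.168736 * r - 0.331264 * g + 0.5 * b + 128     # blue difference, centered at 128
    v = 0.5 * r - 0.418688 * g - 0.081312 * b + 128      # red difference, centered at 128

    def clamp(x: float) -> int:
        return max(0, min(255, round(x)))

    return clamp(y), clamp(u), clamp(v)


print(rgb_to_yuv(128, 128, 128))  # (128, 128, 128): gray has neutral chroma
print(rgb_to_yuv(255, 0, 0))      # red: low-to-mid Y, V pushed to the maximum
print(rgb_to_yuv(0, 0, 255))      # blue: dark Y, U pushed to the maximum
```

Any shade of gray maps to U = V = 128, which shows that all of the brightness information lives in Y alone.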

Why use YUV instead of RGB

  • In RGB, every pixel is stored as R, G, B.
  • But human vision does not perceive all three equally.
  • Problem: RGB mixes brightness and color together in each channel, which is inefficient for compression and video.

  • Human vision is very sensitive to brightness (detail, edges) and less sensitive to fine color changes.

  • So instead of: R G B (all equally important)
    We get:
    Y (important structure)
    U, V (color info)


6. YUV advantages

We use YUV instead of RGB because it separates brightness from color, which matches human vision and enables much more efficient compression, especially in video systems. The details follow.

i) Compression becomes easier

We can reduce color resolution:

  • keep full Y
  • reduce U and V (e.g. YUV420, to be covered in more detail later)
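As a rough illustration of reducing U and V, the sketch below shrinks one chroma plane to half its width and half its height by averaging each 2×2 block, while the Y plane would be left untouched. Averaging is only one possible choice assumed here; real pipelines may use other filters or sample positions. The function name `subsample_420` and the sample values are made up.

```python
def subsample_420(plane: list[list[int]]) -> list[list[int]]:
    """Average each 2x2 block of a chroma plane (even width and height assumed)."""
    height, width = len(plane), len(plane[0])
    return [
        [
            # +2 rounds to nearest instead of always rounding down.
            (plane[y][x] + plane[y][x + 1] + plane[y + 1][x] + plane[y + 1][x + 1] + 2) // 4
            for x in range(0, width, 2)
        ]
        for y in range(0, height, 2)
    ]


u_plane = [
    [100, 102, 200, 200],
    [104, 106, 200, 200],
    [50, 50, 60, 64],
    [50, 50, 68, 72],
]
u_small = subsample_420(u_plane)
print(u_small)  # [[103, 200], [50, 66]]
```

The 4×4 plane (16 samples) becomes a 2×2 plane (4 samples), so each chroma plane needs a quarter of the original storage.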

ii) Video becomes efficient

Most video codecs assume:

  • brightness matters most
  • color can be compressed more

(more on video codecs in part 2 of this series)

iii) Better signal separation

Processing becomes easier:

  • sharpening → apply to Y only
  • color correction → apply to U/V only
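A small sketch of the "U/V only" idea: the function below changes color saturation by scaling the chroma values around their neutral point and never touches Y. It assumes 8-bit pixels with U and V centered at 128, and the name `scale_saturation` is hypothetical.

```python
def scale_saturation(pixels: list[tuple[int, int, int]], factor: float) -> list[tuple[int, int, int]]:
    """Scale U and V around the neutral value 128; Y (brightness) is left unchanged."""
    def clamp(x: float) -> int:
        return max(0, min(255, round(x)))

    return [
        (y, clamp(128 + (u - 128) * factor), clamp(128 + (v - 128) * factor))
        for (y, u, v) in pixels
    ]


pixels = [(100, 148, 108), (200, 128, 128)]
print(scale_saturation(pixels, 0.5))  # [(100, 138, 118), (200, 128, 128)]
print(scale_saturation(pixels, 0.0))  # [(100, 128, 128), (200, 128, 128)]: grayscale
```

Doing the same in RGB would require touching all three channels of every pixel, because each of them carries both brightness and color.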

7. Images from camera (YUV420 vs MJPEG)

  • When connected to a computer, a camera provides video as a sequence of images (frames) at a fixed frame rate such as 30 fps or 15 fps.

  • In most camera output modes these frames are independent images: each frame is captured and transmitted separately, rather than depending heavily on its neighbors as in encoded video formats (e.g., H.264), which will be described in part 2 of this series.

  • Most cameras (not all) support two main types of image streams at this stage (before video encoding happens):


    i) YUV420: a chroma-subsampled form of the YUV color representation (i.e., not RGB). It stores brightness (Y) at full resolution and the color components (U and V) at half the width and half the height, making it efficient for real-time processing and video pipelines.


    ii) MJPEG (Motion JPEG): each frame is a JPEG-compressed image (typically decoded to RGB for color cameras). Each frame is compressed independently using the JPEG algorithm.

YUV420 vs MJPEG from camera

  • Imagine that a camera provides 1000×1000 images at 30 fps for 1 second:

| Format | Type | Lossy? | Representation | Size per frame | Size per second | Key idea |
|---|---|---|---|---|---|---|
| YUV420 | Raw video format | ❌ No (representation only) | Y + U/4 + V/4 (chroma subsampled) | ~1.5 MB | ~45 MB | Fixed bandwidth, no compression |
| MJPEG | Compressed video stream | ✅ Yes | One JPEG per frame | ~0.2 – 1.0 MB | ~6 – 30 MB | Compression depends on image content |

  • Depending on our application, we read the input in either YUV420 or MJPEG.

  • Some cameras may support only one format.
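The YUV420 figures for the 1000×1000, 30 fps example can be reproduced with a short calculation. This sketch assumes 8 bits per sample and even frame dimensions; the function name `yuv420_frame_bytes` is illustrative.

```python
def yuv420_frame_bytes(width: int, height: int) -> int:
    """8-bit YUV 4:2:0: a full-size Y plane plus U and V planes at half width and half height."""
    y_bytes = width * height
    chroma_bytes = (width // 2) * (height // 2)  # size of one chroma plane
    return y_bytes + 2 * chroma_bytes


WIDTH, HEIGHT, FPS = 1000, 1000, 30
frame = yuv420_frame_bytes(WIDTH, HEIGHT)
per_second = frame * FPS
print(f"{frame:,} bytes per frame, {per_second:,} bytes per second")
# 1,500,000 bytes per frame, 45,000,000 bytes per second
```

That works out to 1.5 bytes per pixel on average, half of the 3 bytes per pixel needed for raw RGB, and the rate stays constant regardless of what the camera sees.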

Note: See the article on Video for Linux (V4L2) for more details.


8. What is next?

  • In the next article in this series, we will move from image representation to video conversion, covering how individual image frames are combined into a continuous stream.

  • We'll see how formats like YUV and MJPEG fit into real video pipelines.

  • We will also look at the core concepts behind video encoding along with some practical, system-level considerations.


Some references