A few days ago I stumbled upon someone asking how (digital) video files work. I’d spent a lot of time looking into video formats during my first year of university. It wasn’t part of the Computer Science course I studied, but I was interested in video file formats at the time, so I looked into them. This means I have a good idea of how most of the major (digital) video formats work, so I decided to give my “go to” answer regarding video formats.
The original context for the question was related to mkv (Matroska video format) files and how they worked. Here is my answer:
The first thing you need to know is that mkv is a container. Imagine that a video file is a bucket. Within this bucket there are two items: a book full of pictures (the video segment of the file) and an audio tape (the audio segment of the file).
The book can take many different forms: 4×3 prints (MPEG-2 DVD video stream); landscape paintings (H.264 Blu-ray video); passport photos (DivX/Xvid); or 8×10 prints (x264 video). In the very same way, the audio tape can take many different formats, too.
Within the container, each item carries information on what it is (called the header) and how to start reading or listening to it (an identifier for the software codec required to turn it into moving pictures or sound). However, different playback devices have different support for the containers.
Most playback devices support the different stream types (divx, x264/H.264, MPEG, etc.; in the example above, the book and audio tape), but they don’t understand every container type. They, literally, cannot figure out what kind of bucket they’re looking at.
A video file really is just a container for other files. Most video file formats contain one video stream and, at least, one audio stream.
Here, “stream” refers to a formatted stream of bits.
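If it helps to see the bucket as code, here is a rough sketch in Python. The class names (`Stream`, `Container`) and the example codecs are ones I have made up for illustration; they are not taken from any real library, and real containers hold a lot more bookkeeping than this.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Stream:
    """One item in the bucket (hypothetical model, for illustration only)."""
    kind: str   # "video", "audio" or "subtitle"
    codec: str  # how the bits are formatted, e.g. "h264" or "aac"


@dataclass
class Container:
    """The bucket itself: a format name plus the streams it holds."""
    format: str                                   # e.g. "mkv" or "mp4"
    streams: List[Stream] = field(default_factory=list)

    def streams_of(self, kind: str) -> List[Stream]:
        # Pick out every stream of one kind (all the audio tapes, say).
        return [s for s in self.streams if s.kind == kind]


# A typical file: one video stream, one audio stream, one subtitle stream.
movie = Container("mkv", [
    Stream("video", "h264"),
    Stream("audio", "aac"),
    Stream("subtitle", "srt"),
])
```

Asking for `movie.streams_of("audio")` hands back just the audio tape, which is roughly what a player does once it has understood the bucket.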
Most modern video container formats (mkv, mp4, avi *shudder* etc.) have support for multiple subtitle streams, too. This means that the subtitle track can be embedded within the container without having to be “burnt” into the video.
The act of hard coding subtitles (sometimes referred to as “burning them in”) requires editing each frame of the video stream to include the subtitles relevant to that frame. This leads to a video file with subtitles that “can’t be turned off”, because they’re actually part of the visual information. The opposite of this is soft subtitling: the subtitles are provided as a separate stream, or even as a separate file, and loaded in only if the user/viewer requests them.
You can think of a subtitled movie on VHS as being hard subtitled, and (most) DVD and Blu-ray movies as being soft subtitled.
I used “most” for DVD/Blu-ray, because a lot of early Tartan Asia releases were just digitised versions of the VHS sources. This meant that the subtitles were burnt into the image.
Almost any video and audio stream combination is supported by most hardware players, due to the video and audio being processed separately. However, not all of the containers are supported, especially the open source ones like mkv. The reasons for this are mostly political: a company like Apple has little incentive to make the chips in an iPhone read mkv when it would rather you used its chosen format (m4v, which is just mp4 with DRM support).
This leads to a lot of “x device won’t play my <insert format here> files. I guess I have to convert them.” In actual fact, you don’t. Like I said earlier, the streams can be played back fine; it’s just the containers that the hardware doesn’t know how to read.
It’s like the container is in Urdu, the streams are in Latin and you speak French. It’s possible to read the Latin parts, but you have no idea where to start because parts of each are separated by sections of Urdu.
To convert or not to convert?
Most of the time, you have no need to convert the streams within the container. You just need to relocate them to another container, one that your player can read.
Say you have an mkv video with x264 video and AAC audio.
x264 is an open source encoder for H.264 (the Blu-ray video codec), and AAC is the audio format used by iTunes and many other audio services (like DAB+ radio).
Most hardware players can play the streams, as these are the formats that HD video is delivered in on video streaming sites (like YouTube, Hulu and Netflix). But the mkv part throws a lot of players off, because they don’t know how to look into the bucket (to go back to the metaphor for a moment). So, you just need to change the container.
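That decision can be sketched in a few lines of Python. The `PLAYER` table below is a made-up example of what one device might support (an assumption for illustration, not a real spec sheet), and `what_to_do` is a hypothetical helper name.

```python
# Hypothetical capability table for one player. These values are assumptions
# for illustration, not the published specs of any real device.
PLAYER = {
    "containers": {"mp4", "avi"},
    "codecs": {"h264", "aac", "mp3"},
}


def what_to_do(container, codecs, player):
    """Decide between playing, swapping the container, or converting."""
    # If the player can't decode one of the streams, nothing short of
    # converting that stream will help.
    if any(codec not in player["codecs"] for codec in codecs):
        return "convert"
    # The streams are fine, so an unreadable bucket only needs swapping.
    if container not in player["containers"]:
        return "swap container"
    return "play as is"
```

With the table above, `what_to_do("mkv", ["h264", "aac"], PLAYER)` comes back as "swap container": the streams are fine and only the bucket is the problem.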
How do I do that?
There are a few ways to do this:
- Convert the entire file (including the streams) to another, supported, format
- Swap the container out for a different container
There are others, but I’ll just focus on these two for now.
If you choose to convert the file completely, you’re going to lose data; it will take a long time; and you’ll be left with a crippled file (different stream formats have different limits on resolution, bits per pixel, bit rate, quantisation, etc.).
If you choose to swap the container out, you won’t lose data (as you’re not touching the streams); it won’t take a long time; and the file will be, for all intents and purposes, exactly the same.
How do I do that?
The quickest and easiest way to swap containers is to use FFmpeg. FFmpeg is a command-line utility for reading, converting and writing video and audio files. You give it an input file and tell it what you want out, and it can copy the streams from one container into another without re-encoding them.
FFmpeg is actually more complex than that, and can do quite a lot more than what we’re about to use it for.
Say you have a video file called:
Baby’s first steps.mkv
The file contains two streams:
- x264 video
- MP3 audio
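If you aren’t sure what streams a file contains, ffprobe (which ships alongside FFmpeg) can tell you. Here is a minimal Python sketch, assuming ffprobe is installed and on your PATH. The function names `probe` and `list_streams` are my own, and `SAMPLE_REPORT` is a hand-written, trimmed example of the JSON shape rather than real output from a real file. Note that ffprobe names the codec, so video encoded with x264 shows up as "h264".

```python
import json
import subprocess


def probe(path):
    """Ask ffprobe for each stream's type and codec, returned as JSON text.

    Needs ffprobe on the PATH; raises CalledProcessError if ffprobe fails.
    """
    result = subprocess.run(
        ["ffprobe", "-v", "error",
         "-show_entries", "stream=codec_type,codec_name",
         "-of", "json", path],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def list_streams(report):
    """Turn ffprobe's JSON report into (type, codec) pairs."""
    streams = json.loads(report).get("streams", [])
    return [(s.get("codec_type"), s.get("codec_name")) for s in streams]


# Hand-written sample in the shape ffprobe produces (illustrative only).
SAMPLE_REPORT = """
{"streams": [
    {"codec_name": "h264", "codec_type": "video"},
    {"codec_name": "mp3", "codec_type": "audio"}
]}
"""
```

On a real file you would call `list_streams(probe("some-file.mkv"))`; for the sample report above you get one video stream and one audio stream back.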
Now suppose you want to watch this back on a PlayStation 3. The PS3 cannot read your file as it is because it cannot understand the container, but we know that it can play the individual streams (it can play Netflix movies and MP3 audio files; this is how we know). So what can we do?
If you have FFmpeg installed, all you need to do is run this command in a command prompt (assuming you are running Windows):
ffmpeg -i <path-to-input-file>.mkv -vcodec copy -acodec copy <full-name-of-new-file>.mp4
This will tell FFmpeg to create a new file called “full-name-of-new-file.mp4” and copy the video and audio streams (as is) from “path-to-input-file.mkv” into it. If a file name contains spaces, like the one above, wrap it in double quotes. Copy this new file to a memory stick, jam the memory stick into your PS3 and, voilà, a file that it can read.
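If you’d rather script this than type it out each time, the same command can be built in Python. This is a minimal sketch, assuming FFmpeg is installed and on your PATH. The helper names `remux_command` and `remux` are my own invention, and the flags are exactly the ones from the command above.

```python
import subprocess
from pathlib import Path


def remux_command(source, target_ext=".mp4"):
    """Build the FFmpeg command that copies the streams into a new container."""
    src = Path(source)
    dst = src.with_suffix(target_ext)  # same name, new bucket
    return ["ffmpeg", "-i", str(src),
            "-vcodec", "copy",         # copy the video stream untouched
            "-acodec", "copy",         # copy the audio stream untouched
            str(dst)]


def remux(source, target_ext=".mp4"):
    """Run the command; raises CalledProcessError if FFmpeg reports failure."""
    subprocess.run(remux_command(source, target_ext), check=True)


# Passing the arguments as a list means spaces in the name need no quoting.
cmd = remux_command("Baby's first steps.mkv")
```

Calling `remux("Baby's first steps.mkv")` should leave a “Baby's first steps.mp4” next to the original, with the streams inside it untouched.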
And there you have it: a quick and dirty guide to video formats and how they (kind of) work.