Video streaming is the set of technologies behind YouTube video playback, real-time video chats in Skype and online broadcasts of your favorite team's match. With some limitations, even a TV program from, say, 50 years ago could have been called video streaming, even though it was neither digital nor interactive.
And even more so, we can claim that video streaming began with the first TV set back in the 1920s, based on the original invention by Paul Nipkow in 1884. Really, 1884. It's time to celebrate its 130th anniversary this year.
Today video streaming is understood as a set of technologies that comply with the following requirements:
Media is streamed over IP;
Video is either live or can be accessed on demand;
Video on demand can be seeked, played and paused;
Optionally, live video may be rewound, paused for a short period of time or recorded.
Why is video streaming important?
Humanity started with rock painting, continued with clay tablets and cuneiform, and then information dissemination really took off with Gutenberg's printing press and, later, with radio. The main conclusion: the bandwidth available to information streams has grown constantly throughout history. Which is faster: to read a 200-page book on engine transmission maintenance or to watch a 30-minute video guide? In most cases the latter. And with the advent of video streaming we can even skip the parts of a video we are not interested in and focus on the most relevant information, or pause and rewind a live stream from a conference to refresh our memory. Another trend is that we are sometimes too busy even to watch a video, and want the relevant information extracted from a clip automatically, based on our criteria.
What are the video streaming components?
Distributed File Storage
Video content is stored in a special file store. For worldwide video delivery, the video storage should be:
Fail-safe, meaning it should serve several copies of each video file;
Distributed, i.e. with copies available for each target region;
Without a single point of failure which requires deployment of multiple file storage nodes in every region.
Fortunately, this infrastructure does not necessarily have to be rolled out by the video content provider. Multiple solutions, such as CDNs, can help if HTTP-based video streaming is used.
Some video delivery protocols, those not based on HTTP delivery, require the deployment of so-called edge servers. In short, a limited number of origin servers serve the video files; their configuration is normally designed to handle failover and keep the video content always available. But the number of origin servers is not enough to handle millions of simultaneous viewers. Instead, each viewer connects to an edge server, which is a de facto proxy to the origin server or to another edge server. This approach allows streaming video, even live content, with minimal network delays.
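To make the idea concrete, here is a minimal sketch of mapping a viewer to a regional edge server with a fallback to the origin. The region names and hostnames are hypothetical; real deployments typically use DNS-based geo-routing or anycast rather than a static table:

```python
# Hypothetical region -> edge server mapping; real systems use DNS
# geo-routing or anycast instead of a static table like this.
EDGE_SERVERS = {
    "eu": ["edge1.eu.example.com", "edge2.eu.example.com"],
    "us": ["edge1.us.example.com"],
}
ORIGIN_SERVER = "origin.example.com"

def pick_server(viewer_region: str, viewer_id: int) -> str:
    """Return the server a viewer should connect to.

    Viewers are spread across the region's edge servers; if the
    region has no edges deployed, the viewer falls back to the origin.
    """
    edges = EDGE_SERVERS.get(viewer_region)
    if not edges:
        return ORIGIN_SERVER              # no nearby edge: go to origin
    return edges[viewer_id % len(edges)]  # simple round-robin by viewer id
```

The round-robin is only a placeholder for whatever load-balancing policy an operator actually uses; the point is that the viewer never talks to the origin directly when an edge is available.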
DRM stands for Digital Rights Management, a set of technologies aimed at protecting digital content from unauthorized reproduction after sale. DRM implementations vary from simple content scrambling to advanced encryption of video streams, and even more sophisticated algorithms that use strong math to verify the digital signatures of video content. Basically, DRM enforces a business model of selling a digital copy per user or per device. Obviously, monetizing video-on-demand service subscriptions implies the use of a DRM or custom encryption solution, which protects content by making it exclusively available to subscribers. Almost every modern delivery protocol supports video stream encryption, including HLS, MPEG-DASH and RTMP.
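As a toy illustration of the "simple content scrambling" end of that spectrum, the sketch below XORs a media chunk with a keystream derived from a shared secret. This is not real DRM (production systems use AES-128 or stronger, as in HLS and MPEG-DASH); it only shows that scrambling is symmetric and keyed, so only a party holding the secret can recover the content:

```python
import hashlib

def keystream(secret: bytes, length: int) -> bytes:
    """Derive a pseudo-random keystream of the given length from a secret."""
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(secret + counter.to_bytes(4, "big")).digest()
        counter += 1
    return out[:length]

def scramble(chunk: bytes, secret: bytes) -> bytes:
    """XOR a media chunk with the keystream; applying it twice descrambles."""
    ks = keystream(secret, len(chunk))
    return bytes(a ^ b for a, b in zip(chunk, ks))
```

Because XOR is its own inverse, the same function both scrambles and descrambles, which is roughly how stream ciphers used in real content protection behave as well.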
Some delivery protocols (HLS, MPEG-DASH) are natively supported by video playback devices (smart TVs, the Apple iPhone, etc.), while others (RTMP) sometimes require custom video players (for instance, FlowPlayer or JWPlayer). In any case, most playback solutions support DRM. In some cases, for example for websites, players have to be customized. The level of customization varies, from themes and color setup to handling low-level encryption protocols and authentication/authorization schemas.
What is the present state of video streaming?
RTMP and RTMFP
RTMP is a proprietary Adobe Inc. protocol on top of the TCP/IP transport, designed for streaming video and audio and for transmitting arbitrary metadata, including user-defined commands.
It has several variations built on top of the standard RTMP stack:
RTMPS — RTMP over SSL;
RTMPE — encrypted RTMP, using the AES-128 algorithm for transmitted data;
RTMPT — RTMP encapsulated in HTTP, intended to work around limitations introduced by some firewalls and NATs;
RTMFP — RTMP for P2P networks, using UDP for NAT traversal.
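To give a feel for how low-level RTMP is, every session opens with a fixed handshake. Per the published RTMP specification, the client first sends C0 (a single protocol-version byte, 3) followed by C1 (1536 bytes: a 4-byte timestamp, 4 zero bytes and 1528 bytes of random data). A sketch of building those two packets:

```python
import os
import struct
import time

RTMP_VERSION = 3   # C0: protocol version byte
C1_SIZE = 1536     # C1: 4-byte time + 4-byte zero + 1528 random bytes

def build_c0_c1() -> bytes:
    """Build the C0+C1 packets that open an RTMP handshake."""
    c0 = bytes([RTMP_VERSION])
    timestamp = struct.pack(">I", int(time.time()) & 0xFFFFFFFF)
    zero = b"\x00\x00\x00\x00"
    random_fill = os.urandom(C1_SIZE - 8)
    return c0 + timestamp + zero + random_fill
```

The server answers with S0/S1/S2 and the client completes the exchange with C2; only after that do the actual media and command messages flow.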
Being a passive protocol (except for RTMFP), it does not have the active client connection limitations of RTP/RTSP.
Natively supported by Flash players and Adobe's technology stack.
Can be customized by passing user-defined functions.
Scaling a server deployment is nontrivial, as the underlying protocol is TCP.
Open-source servers are mostly buggy due to the unclear specification (including difficulties with AMF0/AMF3 header processing).
Only a limited number of codecs is supported by RTMP.
HTTP-Based Protocols (from HLS/HDS to MPEG-DASH)
HLS was one of the first HTTP-based streaming protocols, implemented by Apple Inc. It reuses the popular m3u8 format (known from WinAmp) for playlist descriptions. Items in an m3u8 playlist are normally MPEG2-TS container chunks containing interleaved video and audio data, each 5 to 60 seconds long. M3u8 files can be hierarchical (i.e. listing other m3u8 files for the same content); this is needed for localized video broadcasting and/or different video qualities.
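For illustration, a hierarchical setup might look like this (hostless paths and bitrates are made up). A master playlist points to per-quality playlists:

```
#EXTM3U
#EXT-X-STREAM-INF:BANDWIDTH=1280000,RESOLUTION=640x360
low/index.m3u8
#EXT-X-STREAM-INF:BANDWIDTH=2560000,RESOLUTION=1280x720
high/index.m3u8
```

and each media playlist then lists the actual MPEG2-TS chunks with their durations:

```
#EXTM3U
#EXT-X-VERSION:3
#EXT-X-TARGETDURATION:10
#EXT-X-MEDIA-SEQUENCE:0
#EXTINF:10.0,
segment0.ts
#EXTINF:10.0,
segment1.ts
#EXT-X-ENDLIST
```

A live playlist simply omits the `#EXT-X-ENDLIST` tag and keeps growing as new chunks are produced.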
Very simple human readable format for playlists.
Both live and video-on-demand broadcasting are supported.
No need for special servers: video can be split into MPEG2-TS chunks and m3u8 playlists, and this content can then be uploaded to any HTTP server.
Support for data encryption for licensed broadcasting.
Easy scaling: CDNs can deliver the m3u8 playlists and file chunks to the end user over HTTP.
The delay between a live event being recorded and played back ranges from 2 seconds in theory to 30-60 seconds in practice.
A video seek operation requires downloading the whole chunks containing the video at the desired timestamps.
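A sketch of what that last point implies: given the chunk durations from a playlist, a seek must locate and fetch the entire chunk covering the target timestamp, even if only a few seconds of it are needed:

```python
def chunk_for_seek(chunk_durations: list[float], target: float) -> int:
    """Return the index of the playlist chunk containing `target` seconds.

    The player must download this whole chunk before it can render the
    frame at `target`; that full-chunk download is the seek overhead.
    """
    elapsed = 0.0
    for i, duration in enumerate(chunk_durations):
        if elapsed + duration > target:
            return i
        elapsed += duration
    return len(chunk_durations) - 1  # clamp to the last chunk
```

With 10-second chunks, seeking to second 12 means downloading the entire second chunk (seconds 10-20) just to show two seconds into it.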
Why are there so many technologies and what are the core problems with the video streaming?
Variety of Media Formats
In the modern digital world, even large associations, expert groups and institutes cannot stop the format wars; one of the most recent examples is Blu-ray vs. HD DVD. The variety of video streaming formats reflects real underlying technological challenges. For instance, the central idea behind the RTP/RTSP protocols was to imitate a DVR remote control. In the heyday of QuickTime Streaming Server this was enough. Later, the RTMP format took its place, as it provided a custom control and interaction channel in addition to video stream management. The evolution of video formats continued along at least two branches: one intended to reuse the existing HTTP infrastructure to deliver video (HLS, HDS, MPEG-DASH); another borrowed an idea from P2P networks to reduce the load on servers by using some of each client's throughput to broadcast to other users.
Variety of Terminal Devices
Nowadays, there is a huge variety of makes and models of video-streaming-capable devices: smartphones, laptops and desktops, tablets, smart TVs, embedded devices, built-in terminals and much, much more. Each device, depending on its make, operating system, and hardware and software capabilities, has its own set of supported codecs and media formats. Unfortunately, there is no uniformly supported solution for video streaming.
For instance, if it is required to support old Android devices, the latest iPhones and legacy browsers (without HTML5 support), then at least three types of media delivery format are needed: HLS, RTSP and RTMP.
Until MPEG-DASH is widely adopted, the target devices should be analyzed prior to deploying a video-on-demand streaming solution.
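A minimal sketch of such an analysis, with a hypothetical device taxonomy (real products key off user-agent strings and device capability databases rather than clean category names):

```python
# Hypothetical device category -> preferred delivery format, roughly
# matching the example above; real detection parses user-agent strings.
PREFERRED_FORMAT = {
    "ios": "HLS",
    "android_modern": "HLS",
    "android_legacy": "RTSP",
    "browser_html5": "HLS",
    "browser_legacy": "RTMP",
}

def delivery_format(device: str) -> str:
    """Pick a delivery format for a device, defaulting to HLS."""
    return PREFERRED_FORMAT.get(device, "HLS")

def required_formats(devices: set[str]) -> set[str]:
    """The set of formats a service must deploy to cover its audience."""
    return {delivery_format(d) for d in devices}
```

Running `required_formats` over the expected audience tells the operator how many parallel packaging pipelines (HLS, RTSP, RTMP) the deployment actually needs.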
What are the latest trends in video streaming?
Social Networks Integration
YouTube is an example of a rather simple social network built around video content, allowing users to subscribe to video channels, leave comments, and embed videos into their homepages or social network accounts.
The vision for the future of social network integration expressed by many researchers boils down to this: next-generation web products should allow users to build their own playlists, form context-based playlists semi-automatically, and aggregate different video resources. In addition, video content must be available anytime, anywhere: on mobile and desktop platforms, smart TVs, etc.
User-Defined and Automatic Metadata Generation
Following the previous paragraph, modern solutions should enable users to integrate any video with their own metadata. For example, a football match or a favorite music video could be enhanced with user notes, video clips and other media, which can be saved and shared with other people.
Content Search and Image Recognition
One of Google’s services allows searching using an image rather than text, as well as searching for images using text descriptions. The same approach applied to video content is the future of video streaming technology. Metadata such as voice transcripts, recognized objects and their textual representations should be generated in the background for uploaded or live videos and associated with them. Such metadata will enable new end-user experiences, for instance, looking up an event in a movie by querying something like “The note on the table from Steve...” and obtaining playback of the relevant parts of the video as a response.