#  Video Decode
Video decoding is the process of extracting individual frames from a media volume that contains video.  The process involves opening the media file, requesting specific frames, receiving those frames, and attaching timing information to each frame so that the frames can be sequenced correctly.

## Background
Non-streaming media frames are typically stored in one of two ways.  Either all the frames are stored together in a single wrapped container, such as an MXF file, or they are stored as an explicit series of frames, such as a folder full of .ari files.

File-based video is most often stored in a compressed or proprietary format, so accessing the stored video requires interaction with other media libraries.  Before any frames can be accessed, the system must determine which libraries are capable of decoding them and, if more than one library qualifies, select the most appropriate one.
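
The selection step described above can be sketched as a small registry lookup. This is a minimal illustration, not a real library API: `DecoderBackend`, `select_backend`, the backend names, and the priority scheme are all hypothetical, and matching on file extension is an assumption made to keep the example short (a real implementation would more likely probe the file contents).

```python
from dataclasses import dataclass
from pathlib import Path
from typing import List, Optional


@dataclass(frozen=True)
class DecoderBackend:
    """Hypothetical description of one decoding library."""
    name: str
    extensions: frozenset  # file extensions this backend claims to handle
    priority: int          # higher value wins when several backends match


def select_backend(path: str, backends: List[DecoderBackend]) -> Optional[DecoderBackend]:
    """Return the highest-priority backend able to decode `path`, or None."""
    ext = Path(path).suffix.lower()
    capable = [b for b in backends if ext in b.extensions]
    return max(capable, key=lambda b: b.priority, default=None)


# Illustrative registry; the names and priorities are made up.
BACKENDS = [
    DecoderBackend("generic", frozenset({".mxf", ".mov", ".mp4"}), priority=1),
    DecoderBackend("vendor_sdk", frozenset({".mxf", ".ari"}), priority=5),
]
```

With this registry, an `.mxf` file matches both backends and resolves to the higher-priority one, while an unrecognized extension yields `None` so the caller can report that no decoder is available.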

Once the appropriate library is selected and configured, it can be asked to supply video frames.  The library needs to be told which frame(s) to decode, typically by supplying a frame index or a timestamp derived from the video's frame rate.  Some decoders offer extended decode parameters that allow the specification of bit depth, color space, data type, hardware mode, etc.; the exact options offered are specific to each decoder.
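
The relationship between a frame index and a timestamp is simple arithmetic on the frame rate. The sketch below uses Python's `fractions.Fraction` so that non-integer rates such as 30000/1001 stay exact; the function names are illustrative and assume a constant frame rate with the first frame at time zero.

```python
import math
from fractions import Fraction


def frame_to_time(index: int, frame_rate: Fraction) -> Fraction:
    """Start time, in seconds, of frame `index` at a constant frame rate."""
    return Fraction(index) / frame_rate


def time_to_frame(seconds: Fraction, frame_rate: Fraction) -> int:
    """Index of the frame whose display interval contains `seconds`."""
    return math.floor(seconds * frame_rate)


NTSC = Fraction(30000, 1001)  # roughly 29.97 fps, kept exact as a rational
```

Keeping the rate rational avoids the drift that accumulates when a value like 29.97 is stored as a float and multiplied by large frame indices.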

To permit re-sequencing further along the media pipeline, timing information has to be associated with each video frame.  The most common way to do this is to wrap the decoded frame in a container that also carries the timing.  These containers are commonly referred to as sample buffers.  The timing is usually expressed as a rational value, with the numerator denoting the position and the denominator denoting the time scale; CMTime is the standard rational time type used in macOS video processing.
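
To make the container idea concrete, here is a minimal sketch of a sample buffer that pairs frame data with rational timing. `RationalTime` and `SampleBuffer` are hypothetical types written for this example; they only mirror the value/timescale idea and are not the Core Media types themselves.

```python
from dataclasses import dataclass
from fractions import Fraction


@dataclass(frozen=True)
class RationalTime:
    value: int      # numerator: position measured in ticks
    timescale: int  # denominator: ticks per second

    def seconds(self) -> Fraction:
        return Fraction(self.value, self.timescale)


@dataclass(frozen=True)
class SampleBuffer:
    pixels: bytes                     # decoded frame data (placeholder)
    presentation_time: RationalTime   # when the frame should be shown
    duration: RationalTime            # how long the frame is shown


def wrap_frame(pixels: bytes, index: int, ticks_per_frame: int, timescale: int) -> SampleBuffer:
    """Attach timing to frame `index`, assuming a constant frame duration."""
    return SampleBuffer(
        pixels=pixels,
        presentation_time=RationalTime(index * ticks_per_frame, timescale),
        duration=RationalTime(ticks_per_frame, timescale),
    )
```

For 30000/1001 fps material, a timescale of 30000 with 1001 ticks per frame keeps every presentation time an exact integer number of ticks.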


## Implementation 
 
### Coordination independent of buffer decoding 

The most basic implementation of video decoding leaves the coordination of the supplied buffers to the entity responsible for decoding the frames.  However, this pairing of responsibilities can lead to suboptimal performance because individual media decoding libraries differ in their capabilities.  Some libraries are robust and have decoders that can efficiently handle multiple asynchronous decode requests.  In that case, the decoder could be allowed to coordinate the return of the requested frames.  This is not universal, though: some decoding libraries only accept sequential decode requests on each decoder.  Even then, it is often possible to create multiple decoding interfaces to the same file which CAN simultaneously access the common video data and yield a performance increase.  In this situation the coordination of frame requests MUST reside outside of the decoder, and some other entity has to manage the requests to each decoder and sequence the frames appropriately for use further down the pipeline.
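
The arrangement described above, several sequential decoders on the same file with an external entity ordering the results, might look roughly like the following. This is a sketch under stated assumptions: `SequentialDecoder` is a hypothetical stand-in for a library handle that can only decode one frame at a time, and `FrameCoordinator` is an illustrative name, not an existing class.

```python
import queue
from concurrent.futures import ThreadPoolExecutor


class SequentialDecoder:
    """Hypothetical library handle that decodes one frame per call."""

    def __init__(self, path: str):
        self.path = path

    def decode(self, index: int) -> str:
        return f"{self.path}#frame{index}"  # placeholder for pixel data


class FrameCoordinator:
    """Spreads frame requests over several decoders and restores frame order."""

    def __init__(self, path: str, decoder_count: int = 4):
        self._idle = queue.Queue()
        for _ in range(decoder_count):
            self._idle.put(SequentialDecoder(path))
        self._pool = ThreadPoolExecutor(max_workers=decoder_count)

    def _decode_one(self, index: int):
        decoder = self._idle.get()       # borrow an idle decoder
        try:
            return decoder.decode(index)  # each decoder serves one request at a time
        finally:
            self._idle.put(decoder)      # hand it back for the next request

    def frames(self, indices):
        """Yield (index, frame) pairs in the order the indices were requested."""
        indices = list(indices)
        # Executor.map runs the calls concurrently but yields results in input order.
        yield from zip(indices, self._pool.map(self._decode_one, indices))

    def close(self):
        self._pool.shutdown()
```

Because the ordering lives in the coordinator, the decoder class stays unaware of how many siblings it has. Note that `Executor.map` submits every request up front, so a production version would likely bound the number of in-flight frames to limit memory use.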

Another aspect of coordination that exists independently of the decoder is frame rate modification.  It is not uncommon for the frame rate of a source video to be changed for the output.  An example is reducing the frame rate of a video file shot at a very high frame rate.  Videos recorded at high frame rates store more frames per second, and more frames mean larger files and slower decoding.  Experts tend to think the human eye can pick up differences only up to somewhere in the neighborhood of 30-60 frames per second.  Thus files recorded at high frame rates but intended for human consumption, such as playback, don't require all of their frames to be useful.

When the frame rate is modified, there are two ways to accomplish the change.  The simplest is to keep the number of frames but change the video's duration.  In a reduction from 60 fps to 30 fps, the duration of the video doubles because each frame has a presentation duration of about 0.0333 (1/30) seconds instead of its recorded duration of about 0.0167 (1/60) seconds.  This type of modification is useful for speeding up or slowing down the presentation of a file.  The other way is to keep the video's duration but change the number of frames, either by dropping frames to decrease the frame rate or by duplicating frames to increase it.  This type of modification is useful when playing back or transcoding high-fps files for human viewing.  To make this work, some entity has to keep track of the frames and decide which frames are dropped or duplicated.  This is another reason for the frame coordinator to live independently of the decoder.  The decoder's job is to decode the specified frames, and it should not concern itself with anything else.  It is the coordinator's job to tell the decoder which frames to decode and to handle all the bookkeeping needed for a correct frame rate modification.
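
Both approaches reduce to a small amount of bookkeeping that the coordinator can own. The sketch below is one possible formulation, assuming constant frame rates and a nearest-earlier-frame rule for choosing source frames; the function names are illustrative.

```python
import math
from fractions import Fraction


def retimed_duration(frame_count: int, new_rate) -> Fraction:
    """Keep every frame and change the duration: seconds at the new rate."""
    return Fraction(frame_count) / Fraction(new_rate)


def resample_indices(frame_count: int, src_rate, dst_rate) -> list:
    """Keep the duration and change the frame count.

    Returns, for each output frame, the index of the source frame to decode.
    Skipped indices are dropped frames; repeated indices are duplicated frames.
    """
    src, dst = Fraction(src_rate), Fraction(dst_rate)
    duration = Fraction(frame_count) / src
    out_count = math.floor(duration * dst)
    return [min(frame_count - 1, math.floor(i * src / dst)) for i in range(out_count)]
```

For example, `resample_indices(6, 60, 30)` gives `[0, 2, 4]` (every other frame dropped), and `resample_indices(3, 30, 60)` gives `[0, 0, 1, 1, 2, 2]` (every frame duplicated). The decoder only ever sees the resulting list of indices, which keeps the frame rate logic entirely in the coordinator.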
