Thursday, September 27, 2007

Visually Lossless and how to back it up.

CineForm is the company most guilty of using the term Visually Lossless (VL) -- we may even have coined it. Red has benefited from CineForm's success with wavelets, and they also use the term, as they should. But it is a marketing term, so we should all do our best to back it up. While visual inspection of 1:1 (or magnified) pixel data at 12" viewing distance is a good way to test for VL, there is a different way to measure whether the loss from compression is significant, one that doesn't involve human inspection. In most high bit-rate image compression there is a point at which the loss becomes insignificant; this occurs when the compression error falls well below the inherent noise floor of the imaging device.

Back in March '07, Scott Billups, George Palmer and Mark Chiolis helped us out with a Viper test shoot at Plus-8 to compare HDCAM-SR to CineForm 444 -- thanks again guys. Those following this blog may remember several posts on the subject. Using a Billups Chart in front of the Viper in Filmstream mode, the stationary image noise power measured 38.97dB PSNR -- this is the frame-to-frame difference due to sensor noise and other environmentals in an otherwise completely static image. Yet the difference between any uncompressed frame and its HDCAM-SR equivalent was 49.8dB, more than 10dB less distortion (a massive margin) than the inherent noise level. CineForm 444 was 51.03dB, better still (a higher number means less distortion.) While adjacent-frame noise is not normally measured as a distortion, it is a very good indicator of the amount of change you must stay below for the image to look the same after compression -- assuming you are also trying to store the noise, which I feel is an important part of being visually lossless. In film scans, the film grain is the noise floor you need to stay below, and the compression should preserve the shape of the grain to reach VL.
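For anyone wanting to reproduce this style of measurement, here is a minimal sketch in Python of an adjacent-frame PSNR calculation. It assumes the frames have already been loaded as numpy arrays of 10-bit values; the frame loading and the "frames" list are hypothetical, not part of the original test setup.

    import numpy as np

    def psnr(frame_a, frame_b, peak=1023.0):
        """Peak signal-to-noise ratio in dB; peak=1023 for 10-bit data."""
        a = frame_a.astype(np.float64)
        b = frame_b.astype(np.float64)
        mse = np.mean((a - b) ** 2)
        return 10.0 * np.log10(peak * peak / mse)

    # Frame-to-frame noise of an otherwise static shot ("frames" is a
    # hypothetical list of captured frames as numpy arrays):
    # noise_db = np.mean([psnr(frames[i], frames[i + 1])
    #                     for i in range(len(frames) - 1)])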

Taking this test further, I get a little wacky, but please bear with me. I averaged 72 frames of the uncompressed stationary image to extract out the sensor noise -- the 72 frames give an effective 37dB of noise suppression. This new average frame is the ideal frame; it is what the camera would have seen if noise could be eliminated in the camera. By comparing each unique uncompressed frame to the ideal, we can now measure the uncompressed acquisition distortion, just as if it were a lossy format. The PSNR error between the uncompressed signal and the ideal frame was 41.86dB. When you compare the same ideal frame to the compressed results, you get 41.38dB and 41.63dB for HDCAM-SR and CineForm respectively (CineForm 444 was compressing to around 5:1 using Filmscan1 mode, and HDCAM-SR compressed to around 4:1.) The less than 0.5dB difference between the lightly compressed results and the uncompressed capture is why CineForm and Sony SR can safely claim to be visually lossless.
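The averaging step is equally simple to sketch, reusing psnr() from above (again assuming the 72 uncompressed frames are already loaded as numpy arrays; "frames" and "decoded_frame" are hypothetical names):

    import numpy as np

    def ideal_frame(frames):
        """Average a stack of frames of a static scene; averaging N frames
        attenuates the random sensor noise by a factor of sqrt(N)."""
        stack = np.stack([f.astype(np.float64) for f in frames])
        return stack.mean(axis=0)

    # ideal = ideal_frame(frames)              # frames: 72 uncompressed captures
    # acquisition_db = psnr(frames[0], ideal)  # distortion of uncompressed capture
    # codec_db = psnr(decoded_frame, ideal)    # distortion after encode/decode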

Ideal image:

Uncompressed image:

HDCAM-SR image:

CineForm image:

Download the full screen DPX versions here.

Notes
1: Look at the black background to see how much noise has been removed by averaging sequential frames.

2: The reduction in noise in the ideal image can make it look softer, as noise is often read as a visual cue for sharpness -- add some noise back and it will seem sharper again.

3: There is some edge enhancement that must have occurred in-camera; all the images show it, so it is not a compression artifact.

Update 10/1/07: Here is a good write-up on the Viper's edge enhancement that I commented on in note 3.

Friday, September 21, 2007

10-bit log vs 12-bit linear

I have written on the subject of linear vs. log encoding before, but the issue keeps coming up as more users attempt to compare REDCODE to CineForm RAW. While there is plenty of mythology around most of what RED does, the smoke around REDCODE is clearing, so comparisons are being made -- and will continue to be made -- between the only two wavelet-based RAW compressors in existence. While I would love to do a head-to-head test starting with the same uncompressed RAW frames, that will be difficult until the Red RAWPORT is ready. So this article does not attempt a quality comparison between REDCODE and CineForm RAW; instead I hope to reveal the impacts of 10-bit log (used by CineForm RAW) vs. 12-bit linear (publicly disclosed as used by REDCODE) as they relate to lossy* compression.

* Lossy -- sounds bad, doesn't it? But even visually lossless compression is not mathematically identical to the original. To reliably achieve image compression above about 2:1, lossy compression is required.

In the cases of both CineForm RAW and REDCODE, the decision to apply a log or linear curve is a design choice, not a software/hardware limitation. CineForm compression delivers up to 12-bit precision (see info on CineForm 444), so 12-bit linear compression with CineForm RAW is certainly an option. Similarly, REDCODE could have chosen to compress 10-bit log; both algorithms could compress 12-bit log with suitable source data. So while there is marketing advantage to the number 12 over 10, and similarly some marketing advantage to Log over Linear (too often incorrectly associated with "video"), there are real-world quality impacts that we want to explore in more detail.

* For those wanting a refresher on Log vs. Linear vs. Video Gamma, please see this ProLost blog entry by Stu Maschwitz.

For this post I tried to do a little research on the impact of log vs. linear compression on visible artifacts, but I didn't get very far. Almost all compression analysis for video and still imagery is done on 8-bit data that already has a display curve applied (typically a 2.2 gamma curve.) There are good reasons for this: when testing for the visual impact of compression, you typically test using the target display presentation format, i.e. resolution, gamma curve and bit-depth match the output display. For digital cinematography you can't assume an output curve, as you are shooting to allow for a wide range of post processing before delivery to a wide variety of display types with differing curves -- there is no way of knowing your output delivery curve at shoot time.

There is an assumption in the above paragraph that compression introduces less distortion when applied to the output curve. We see this every time we switch on the television or download a video: the compression is applied to the final color- and gamma-corrected sequence. An alternative approach might be to encode with one curve and have the decoder output to another, yet we don't see this much, and certainly never in distribution formats. All this gets into the hairy subject of human visual modeling and the suppression of noise in the shadow regions of the image. The reason common curves like 2.2 gamma are applied for distribution derives from the classic book Video Demystified by Keith Jack:
...[has the] advantage in combating noise, as the eye is approximately equally sensitive to equal relative intensity changes.
There is a lot in that short sentence. Starting with the second part, about the eye's sensitivity: light that increases in intensity from 2 to 4 (these could be counts of candles, or photons per unit time) is perceived the same as light increasing from 50 to 100 -- each is perceived as getting approximately twice as bright. Now think of the analog broadcast days, where the channel is very sensitive to noise. Noise is an additive function, so noise of + or - 1 in this example could result in reception of 1 to 5 and 49 to 101. The noise in the brighter image will not be seen, yet the darker values are significantly distorted. Gamma-encoding the source would transmit 2 to 4 as roughly 28 to 38, and 50 to 100 as 122 to 167 (using 2.2 gamma.) With the same noise added to the gamma-corrected values, the final displayed values (the display device reverses the gamma) would be 1.8 to 4.1 and 49.5 to 100.5. These numbers show the darker regions of the image greatly improved without compromising the highlights.
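The arithmetic above is easy to verify; here is a small sketch that reproduces those numbers with a 2.2 gamma on 8-bit-range values (the +1 models the additive channel noise):

    def gamma_encode(v, g=2.2, peak=255.0):
        return peak * (v / peak) ** (1.0 / g)

    def gamma_decode(v, g=2.2, peak=255.0):
        return peak * (v / peak) ** g

    for v in (2.0, 4.0, 50.0, 100.0):
        coded = gamma_encode(v)  # approx 28, 39, 122, 167 (the 38 in the
                                 # text is the same value rounded down)
        noisy = coded + 1.0      # additive noise survives the channel
        print(v, round(coded), round(gamma_decode(noisy), 1))
    # The +1 of noise decodes to only about +0.2 in the shadows,
    # but about +1 in the highlights, where it is not visible.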

So what does all this have to do with digital image compression? The introduction of compression is equivalent to the additive noise effect I just discussed. While compression artifacts are not as random as in the analog world, they are additive in the same way, so the impact on shadow regions of the image is the same. Some might think you could design compression technology that compresses the shadows less -- sounds like a great idea -- yet that already exists: adding a curve to pre-emphasize the shadows does exactly this. Let's look at why compression noise is additive; if you don't care about that, skip to the next paragraph.

Examining why compression distortion is additive, just like analog noise, requires a basic understanding of image compression. Visual compression technologies like DCT and wavelets divide the pixel data by frequency; low-frequency data is more important to the eye than high-frequency data, and this is exploited to reduce the amount of data transmitted. The simplest example is to transmit the average of adjacent pixels (low frequency = (v1+v2)/2) at full precision, and also transmit the difference of the adjacent pixels (high frequency = (v2-v1)/2) with less precision (compression through quantization). That is the basis of DCT and wavelets; the differences between the two arise in how the low- and high-frequency values are calculated. Let's reconsider our original pairs 2,4 and 50,100, imagining they are adjacent pixel values. The low-pass data is (2+4)/2 = 3 and (50+100)/2 = 75; the high-pass data is (4-2)/2 = 1 and (100-50)/2 = 25. If we transmit the data without quantization (no lossy compression), the original image can be perfectly reconstructed as 3-1=2, 3+1=4 and 75-25=50, 75+25=100. Now, to model compression, let's quantize the high-frequency components by 2. With fractions rounded off we get (4-2)/(2*2) = 0 and (100-50)/(2*2) = 12, and the reconstructed image is 3,3 and 51,99 (e.g. 75-12*2 = 51, 75+12*2 = 99.) All the shadow detail has been lost, yet the highlights are visually lossless. Distortion due to quantization is +/-1 in the shadows and also +/-1 in the highlights -- quantization impacts dark and light regions equally, just like analog noise. Doing the same compression with curved data (28,38 and 122,167) yields reconstructed values of 29,37 and 122,166, which are then displayed as 2.1,3.6 and 50.4,99.2, leaving significantly more shadow detail. This example shows why compression is typically applied to curved data.
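Here is that toy example as code: a one-level average/difference transform with the high-pass quantized by 2. This is a sketch of the principle only, not of any shipping codec:

    def encode_pair(v1, v2, quant=2):
        low = (v1 + v2) // 2              # transmitted at full precision
        high = ((v2 - v1) // 2) // quant  # quantized (the lossy step)
        return low, high

    def decode_pair(low, high, quant=2):
        h = high * quant
        return low - h, low + h

    for pair in ((2, 4), (50, 100), (28, 38), (122, 167)):
        low, high = encode_pair(*pair)
        print(pair, "->", decode_pair(low, high))
    # linear: (2, 4) -> (3, 3) shadows crushed; (50, 100) -> (51, 99)
    # curved: (28, 38) -> (29, 37); (122, 167) -> (122, 166)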

If linear is so poor with shadows, wouldn't the optimum curve represent each doubling of light (each stop) with the same amount of precision? This seems completely reasonable. Instead of storing linear light, where the values 2048 to 4095 represent just the top stop, why not divide the available values among the number of stops the camera can shoot? Let's say (for simple math) your camera has around 10 stops of latitude; that would place around 400 levels per stop in 12-bit, or 100 levels per stop in 10-bit. Now in your creative post-production color grading, it wouldn't matter whether you use the top five stops or the bottom five to create a contrasty image -- the quality would be the same. It turns out we don't live in an ideal world: while mapping the top stop of 2048-4095 down to around 900-1000 (in 10-bit) is fine, expanding the 10th stop's values of 0 to 8 up to 0 to 100, while preserving all the shadow detail, also preserves (beautifully) all the detail of the sensor noise, which is always present and which is difficult to compress. It should also be obvious that expanding the value "8" to "100" doesn't gain you 100 discrete levels in the last stop -- for that you would need the currently impossible: a 16-bit sensor with about 90dB SNR. So while these new cameras claim 11 stops, don't go digging too far into the shadows.
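To see just how unevenly linear coding spreads its precision, here is a quick sketch counting the 12-bit code values that land in each of 10 stops, versus the even ~400-per-stop split described above:

    # Count 12-bit linear code values per stop; stop 1 is the brightest.
    top = 4096
    for stop in range(1, 11):
        lo, hi = top // 2 ** stop, top // 2 ** (stop - 1)
        print(f"stop {stop}: values {lo}..{hi - 1} -> {hi - lo} levels")
    # stop 1: 2048..4095 -> 2048 levels ... stop 10: 4..7 -> 4 levels;
    # an even split of 4096 values over 10 stops would be ~400 per stop.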

* NOTE: 400 levels per stop using 12-bit precision sounds 4 times better than 100 levels using 10-bit. However, once you compress the image, the difference mostly goes away: to achieve the same data rate the 12-bit encoder has to quantize its data 4 times as much. The extra 12-bit precision only really starts to pay off with very light compression, say between 5:1 and 2:1.

The curve that is applied is a compromise between compression noise immunity and coupling too much sensor noise into the output signal. As a result there is no standard curve for digital acquisition; the individual sensor characteristics and the bitrate of the acquisition compression all play a part in designing a curve. This goes for the Thomson Viper Filmstream curve recording to an HDCAM-SR deck as much as for an SI-2K recording to CineForm RAW -- the curves are different. But as long as the curve is known, it is reversible, allowing linear reconstruction as/when needed. So the curve is just a pre-filter for optimum quality compression.

Now for some real-world images. To clearly demonstrate this issue I started with an uncompressed image that I shot RAW with my 6MPixel Pentax *IST-D. Using Photoshop I produced a 16-bit linear TIFF source for After Effects. Here is the source image displayed as linear without any gamma correction.



Here the source is corrected with a 2.2 gamma for the display, looking very much like the shooting environment.


To help showcase the shadow distortion, I zoomed in on a dark region that has some worthy detail. I then applied a little "creative" adjustment, increasing the gamma to 3.0 (from 2.2) to enhance the shadow detail. Below is the same image encoded linearly as 4:4:4 using the worst quality settings for CineForm and JPEG2000. (I used the worst (highest compression) settings to help enhance compression artifacts -- as bitrates increase, the artifacts diminish.) I would have preferred to use only JPEG2000 (I don't like running CineForm at this low quality), however the AE implementation of J2K via QuickTime is only 8-bit, so it introduces banding as well as linear compression issues. You can see there is a different look to JPEG2000 and CineForm when heavily compressed, yet both show problems in the shadows with a linear source. With CineForm set to Low and JPEG2000 set to 0 (on the quality slider), the compression ranged between 23:1 and 25:1. The images have their linear output corrected to a gamma of 3.0. Click any image to see it at 1:1 scale.

While there are plenty of compression artifacts in the dark chrome of the lamp, the white of the lamp shade shows the noise becoming very blotchy as the natural detail/texture carried in the noise is lost.

The images below have a Cineon log curve applied before compression. The Cineon curve, while good for film, is not well optimized for digital sources; it sets the black level to 95 and the white to 685, giving you around 9 bits of curve to cover the 12-bit linear source. Even so, the results show the benefit of log encoding. The images below have their Cineon curve reversed and the same 3.0 gamma applied via the Levels filter.
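For reference, here is a sketch of the commonly used form of the Cineon conversion with black at code 95 and white at 685 (0.002 density per code value over a 0.6 display gamma). This is the conventional published formula, not necessarily the exact math After Effects applies:

    import math

    def cineon_to_linear(code, black=95.0, white=685.0):
        """Map 10-bit Cineon log codes to linear light, white = 1.0."""
        offset = 10.0 ** ((black - white) * 0.002 / 0.6)
        gain = 10.0 ** ((code - white) * 0.002 / 0.6)
        return (gain - offset) / (1.0 - offset)

    def linear_to_cineon(lin, black=95.0, white=685.0):
        offset = 10.0 ** ((black - white) * 0.002 / 0.6)
        return white + (0.6 / 0.002) * math.log10(lin * (1.0 - offset) + offset)

    print(cineon_to_linear(95.0), cineon_to_linear(685.0))  # 0.0, 1.0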

So a log curve designed around the camera's sensor and its compression processing will deliver superior shadow quality through compression than linear coding will, without visible impact to the highlights. While the quality of the latter images is superior, it is worth noting that the bitrate didn't change by more than a couple of percent either way. Although it's a different topic, this also demonstrates that compression data rate alone is not a good indicator of image quality.
So that I don't end the pictorial showing only heavy compression, here are a couple of log-encoded screen shots at 8.8:1 and 5.5:1 compression; notice that at these higher data rates the Cineon log encoding looks identical to the 16-bit TIFF source. The bottom line is that properly designed log curves optimized for the camera will produce better resulting images than coding linear data.


The case for Linear:

While I have done my best to make the case for log or gamma encoding before compression, are there any advantages to linear coding? First, uncompressed 12-bit linear does contain more tonal information in the highlights without sacrificing the shadows. You would likely store the data as a linear 16-bit TIFF sequence, which produces very large amounts of data -- this 6MPixel example at 24fps would be 829 MB/s (for 4K images you'll generate 1+ GB/s.) If your workflow instead uses uncompressed 10-bit log DPX files, your data rate is still high at 552 MB/s (768 MB/s for 4K), so the difference might seem small enough to stick with the 12-bit. But while you mathematically have more data, you would likely never see the difference.
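Those data rates are straightforward arithmetic. A quick sketch, assuming the *IST-D's 3008x2008 frame size (my assumption; it reproduces the figures above to within rounding), RGB at 16 bits per channel for the TIFF case, and 10-bit RGB packed into 32 bits per pixel for DPX:

    MiB = 1024 ** 2
    pixels = 3008 * 2008  # assumed 6MPixel frame
    fps = 24

    tiff_rate = pixels * 3 * 2 * fps / MiB  # 16-bit RGB: ~829 MB/s
    dpx_rate = pixels * 4 * fps / MiB       # 10-bit RGB in 32 bits: ~553 MB/s
    print(round(tiff_rate), round(dpx_rate))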

Just like uncompressed, compressed linear data has more tonal detail in the top couple of stops -- so an overexposed image may do well through compression. Linear encoding can be considered a curve that emphasizes the highlights as far as the human eye is concerned, so there may be some shooting conditions where that emphasis is beneficial. Yet overexposing your digital camera's image leads to unwanted clipping; generally for digital acquisition a slight under-exposure is preferred.

Linear coding also looks great in mathematical models that measure compression distortion. Metrics like PSNR and even SSIM interpret curves as producing unwanted distortion (when referencing the linear image), even if the results actually look better with a curve applied. This is one reason I'm careful about relying only on these measurements when tweaking CineForm codec quality: you are in danger of making an image look better to a computer but not to the final viewer.

Linear compression eats shadow noise -- that may be perceived as a good thing. Many have discussed in the Red online forums that wavelets can be used as noise reduction filters -- that is true. Unfortunately it is not possible to completely separate noise from detail; some detail will be lost through noise filtering. Noise filtering can help with compression, giving the compressor less to do as a means to reduce the data rate. While noise can be added back, lost information cannot. If you can encode the image including the shadow noise, you provide the most flexibility in post, as noise can always be filtered later.

Finally, linear coding is a little easier when performing operations like white balance and the color matrix, as these operations typically occur in the camera before the curve is applied, and the curve before the compression stage -- true of all traditional camera formats: DV, HDV, DVCPRO-HD, HDCAM, etc. All these camera technologies use curves so that compression to 8-bit still provides good results. In the new RAW cameras, white balance and color matrix operations are deferred to the post-production environment, which is one of the key reasons RAW acquisition is so compelling -- improved flexibility through wider image latitude. If you apply a curve to aid compression, you have to remove that curve before you can correctly do white balancing, saturation, and other linear operations in compositing tools. If the curve is customized for the camera, like Viper Filmstream or the SI-2K's log curve, downstream tools may not know how to reverse it to allow linear processing. This can be a real workflow concern, so vendors like Thomson publish the curves for Viper Filmstream, and with the SI-2K using CineForm RAW the curve management is handled by the decoder, presenting linear data upon request.

Note: For those who want to know the curve used by the SI-2K, it is defined by output = Log base 90 (input*89+1).
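In code form, that curve and its inverse look like this (input and output normalized to the 0..1 range, which the formula implies, since an input of 1.0 maps to log base 90 of 90 = 1.0):

    import math

    def si2k_encode(lin):
        """SI-2K log curve: output = log base 90 of (input*89 + 1)."""
        return math.log(lin * 89.0 + 1.0) / math.log(90.0)

    def si2k_decode(log_val):
        """Inverse curve: input = (90**output - 1) / 89."""
        return (90.0 ** log_val - 1.0) / 89.0

    print(si2k_encode(0.0), si2k_encode(1.0))  # 0.0, 1.0
    print(si2k_decode(si2k_encode(0.18)))      # round-trips to 0.18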

While converting curved pixel data back to linear for these fundamental color operations does add a small amount of compute time, this is all under the hood in the CineForm RAW workflow. Our goal is to work towards the most optimized workflow without sacrificing quality. We view curves as one of the elements required for good compression; the black box of our compression includes the input and output curves as needed. Considering the whole black box, CineForm RAW supports linear input and linear output, but without linear compression artifacts.

Wednesday, September 19, 2007

An "Intermediate Codec" -- Is that term valid anymore?

Some of you may have seen our recent announcement that CineForm compression has been selected for a 300-screen digital theater rollout in India; please check out the press release on CineForm.com. While we are very excited about participating in this new market, it has put our codec in an unusual position: our software tools now participate in every part of the film production workflow. Acquisition: we are the acquisition format in cameras like Silicon Imaging and soon Weisscam (see the Weisscam press release), we are inside digital disk recorders like Wafian, and we are an output format in the upcoming CODEX recorders. Post-production: we are widely used natively as an online post-production format on the PC, and increasingly on the Mac. And now exhibition: the same format used throughout acquisition and post is beginning to drive digital theater projection at HD, 2K, and hopefully beyond.

CineForm created its compression technology because codecs designed for tape acquisition or fixed-channel distribution were simply not good enough for the visual quality and multi-resolution workflow demands of advanced post-production. We initially became the "intermediate" between heavily compressed source formats and heavily compressed distribution formats. But it turns out that if you are good at post, you can also be good at some acquisition and distribution markets. I need to stop saying that codecs designed for acquisition and distribution "suck" for post, as we are now the exception. :) It remains very true that no one compression format is suitable for all markets, so no one will be downloading CineForm for streaming media; H264 is 10 times more efficient for that market. Yet try editing H264 and you will know why CineForm has its market.

The design parameters for CineForm compression have not changed. It is well known in compression circles that of the three design parameters -- quality, speed, and size -- you can only pick two. CineForm is one of the few to select quality and speed; acquisition and distribution formats typically choose size first. In the professional acquisition space, size is becoming less important (though you still don't want to store 4K uncompressed; even 2K/HD is a burden for many), as hard disk and flash-based recording systems aren't limited to 25Mbit/s the way DV/HDV tape has been -- file size can now increase to reach your desired quality. Digital theater markets are also very much quality-driven and less sensitive to size, as today they already store between 80 and 250Mb/s for compressed content delivered to the screen. So why did Cinemeta in India select CineForm? CineForm's quality and speed together result in a system cost savings: they get CineForm's acknowledged "quality", while the "speed" means content can be played back in software on standard PC platforms, without the hardware acceleration required by competing solutions.

Tuesday, September 04, 2007

Congratulations to Red

If you haven't heard, Red just shipped their first 25 cameras -- likely the most anticipated camera launch in recent history. During Red's short public existence, there have been many changes in the way people consider their camera purchases, with a far greater choice of resolutions, frame rates, lens mounts, sensor sizes, form factors, and camera designs that actually consider the post-production workflow at acquisition. Cameras are no longer bound by the existing standards, enabling many new players to enter the market, and while Red isn't the first, it is the most industry-changing to date.

The Red One has yet to approach feature completion, and I haven't gotten to play with a Red camera yet, but the images posted by the first customers are looking nice. Not that we now have the ultimate imaging device -- I feel the opposite: this is only the start, as there is plenty of room to grow for Red, for the existing players, and for any other startup that wants to give it a go.