Move some functions to kdm_with_metadata

[dcpomatic.git] / doc / design / resampling.tex
diff --git a/doc/design/resampling.tex b/doc/design/resampling.tex

index 65182411e74f3571579b8a4f850535bdc5b8e780..22d88e3759aca46d27f82b34ace79676659f82dd 100644
--- a/doc/design/resampling.tex
+++ b/doc/design/resampling.tex
@@ -1,7 +1,8 @@
  \documentclass{article}
+\usepackage{amsmath}
  \begin{document}
  
-Here is what resampling we need to do.  Content video is at $C_V$ fps, audio at $C_A$.  
+Here is what resampling we need to do.  Content video is at $C_V$ fps, audio at $C_A$.
  
  \section{Easy case 1}
  
@@ -18,8 +19,9 @@ $C_V$ is a DCI rate, $C_A$ is not.  e.g.\ if $C_V = 24$, $C_A = 44.1\times{}10^3
  \textbf{Resample $C_A$ to the DCI rate.}
  
  \section{Hard case 1}
+\label{sec:hard1}
  
-$C_V$ is not a DCI rate, $C_A$ is.  e.g.\ if $C_V = 25$, $C_A =
+$C_V$ is not a DCI rate, $C_A$ is, e.g.\ if $C_V = 25$, $C_A =
  48\times{}10^3$.  We will run the video at a nearby DCI rate $F_V$,
  meaning that it will run faster or slower than it should.  We resample
  the audio to $C_V C_A / F_V$ and mark it as $C_A$ so that it, too,
@@ -31,5 +33,69 @@ resample audio to $25 * 48\times{}10^3 / 24 = 50\times{}10^3$.
  \medskip
  \textbf{Resample $C_A$ to $C_V C_A / F_V$}
  
+\section{Hard case 2}
+
+Neither $C_V$ nor $C_A$ is a DCI rate, e.g.\ if $C_V = 25$, $C_A =
+44.1\times{}10^3$.  We will run the video at a nearby DCI rate $F_V$,
+meaning that it will run faster or slower than it should.  We first
+resample the audio to a DCI rate $F_A$, then proceed as in
+Section~\ref{sec:hard1}, with $F_A$ in place of $C_A$.
+
+\medskip
+\textbf{Resample $C_A$ to $C_V F_A / F_V$}
+
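+As a worked example, take $C_V = 25$ and $C_A = 44.1\times{}10^3$ as
+above, and assume we pick $F_V = 24$ and $F_A = 48\times{}10^3$.  The
+two resampling steps are then
+\begin{align*}
+C_A = 44.1\times{}10^3 &\rightarrow F_A = 48\times{}10^3 \\
+F_A = 48\times{}10^3 &\rightarrow \frac{C_V F_A}{F_V} = \frac{25 \times 48\times{}10^3}{24} = 50\times{}10^3
+\end{align*}
+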
+
+\section{The general case}
+
+Given a DCP running at $F_V$ and $F_A$ and a piece of content at $C_V$
+and $C_A$, resample the audio to $R_A$ where
+\begin{align*}
+R_A &= \frac{C_V F_A}{F_V}
+\end{align*}
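+
+As a consistency check, this formula reduces to the earlier rules.
+When $C_V$ is already a DCI rate we can take $F_V = C_V$, and when
+$C_A$ is already a DCI rate we can take $F_A = C_A$:
+\begin{align*}
+F_V = C_V &\implies R_A = F_A \\
+F_A = C_A &\implies R_A = \frac{C_V C_A}{F_V}
+\end{align*}
+The first is the ``resample $C_A$ to the DCI rate'' rule, and the
+second is the rule of Section~\ref{sec:hard1}.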
+
+Once this is done, consider 1 second's worth of content samples ($C_A$
+samples).  We have turned them into $R_A$ samples, which should still
+last 1 second.  These samples are then played back at $F_A$ samples
+per second, so they last $R_A / F_A$ seconds.  Hence content time is
+scaled to DCP time by a factor of $R_A / F_A$, which, substituting for
+$R_A$, is $C_V / F_V$.
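+
+For instance, assuming the example values of Section~\ref{sec:hard1}
+($C_V = 25$, $F_V = 24$) and $F_A = 48\times{}10^3$, 1 second of
+content audio becomes $R_A = 50\times{}10^3$ samples and 1 second of
+content video is 25 frames, so
+\begin{align*}
+\text{audio: } \frac{R_A}{F_A} &= \frac{50\times{}10^3}{48\times{}10^3} = \frac{25}{24}\,\text{s} \\
+\text{video: } \frac{C_V}{F_V} &= \frac{25}{24}\,\text{s}
+\end{align*}
+and the two last the same time in the DCP.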
+
+
+\section{Another explanation}
+
+Say we have some content at a video rate $C_V$ and we want to
+run it at DCP video rate $F_V$.  It's always the video rates that
+decide what to do, since we don't have an equivalent to audio
+resampling in the video domain.
+
+We can just mark the video as $F_V$ and it will play at $F_V / C_V$
+times its original speed.  Let's call this factor $S = F_V / C_V$.
+
+An equivalent for audio would be to take the content audio at a rate
+$C_A$ and mark it as $C_A S$.  The same audio samples would then be
+played back $S$ times as fast, just like the video frames, so the
+audio would stay in sync with the video.
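+
+For example, assuming $C_V = 25$, $F_V = 24$ and $C_A = 48\times{}10^3$,
+\begin{align*}
+S = \frac{24}{25}, \qquad C_A S = 48\times{}10^3 \times \frac{24}{25} = 46080,
+\end{align*}
+so the audio would have to be marked as 46080\,Hz.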
+
+In general we cannot do this in practice, as the only allowed DCP
+audio rates are 48\,kHz and 96\,kHz.  Instead, we resample to some new
+rate $P$ and mark it as $Q$ where $Q / P = S$.  Resampling does not
+change the sound, just how many samples are used to describe it, so
+this is equivalent to marking the original audio as $C_A S$.
+
+Then we set $Q = F_A = 48$\,kHz so that $P = 48000 / S$, i.e.\
+$P = C_V F_A / F_V$.
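+
+Taking $Q = F_A$ and substituting $S = F_V / C_V$, the last step is
+\begin{align*}
+P = \frac{Q}{S} = \frac{F_A}{F_V / C_V} = \frac{C_V F_A}{F_V} = R_A,
+\end{align*}
+the same resampling rate as in the general case.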
+
+Note that the original sampling rate of the audio content, $C_A$, is
+irrelevant: $P$ depends only on $C_V$, $F_V$ and $F_A$.  Also, skipping
+or doubling of video frames is analogous
+to audio resampling: the data are the same, just represented with more
+or fewer samples.
+
+
+\section{Further thoughts}
+
+Consider the case where the content video rate is $C_V = 24$ and the
+DCP video rate is $F_V = 25$.  Then 46080 resampled samples of audio
+content last 1\,s at the original rate, or $24/25$\,s at the DCP rate,
+and 1\,s of DCP is made up of 48000 resampled content samples.
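+
+These figures follow from the general case, assuming $F_A = 48\times{}10^3$:
+\begin{align*}
+R_A &= \frac{C_V F_A}{F_V} = \frac{24 \times 48\times{}10^3}{25} = 46080 \\
+\frac{R_A}{F_A} &= \frac{46080}{48\times{}10^3} = \frac{24}{25}\,\text{s} \\
+\frac{48\times{}10^3}{R_A} &= \frac{25}{24}\,\text{s of content per second of DCP}
+\end{align*}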
  
  \end{document}