# Introduction to Pinholes

Pinhole cameras let light converge into a tiny hole (a pinhole!) and then diverge on the other side onto an image plane. In old cameras, this plane was the film. In digital cameras, it's a little more complicated, but the premise remains the same. This allows us to create this lovely diagram that forms the core of our vision systems:

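[Figure: side view of a pinhole camera (not to scale), with the real world on one side of the pinhole and the camera on the other; labels: f, py]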

# Interpreting the Triangles

Ok, so what the heck does that mean? This is a vertical side-view of a camera, where the intersection of the two dashed lines is the pinhole of the camera. The section of the horizontal dashed line on the camera's side (labelled as f) is the distance between the pinhole and the image plane.

The image plane's height is marked as py, indicating pixel y-axis length, and it represents a single column of pixels in the captured images.

This diagram doesn't have to be a vertical side-view, though; we can simply rename the py line to px and it is now a horizontal side-view, where the px line represents a single row of pixels in the captured image.

# Applications of the Triangles

These triangles allow us to approximate the location of a target, provided that we can detect it with computer vision, and that we have some knowns about it. Let's draw in a few more things on the diagram:

[Figure: the same side view (not to scale) with added labels: target, h, d, target (in image), ph]

That was a lot. Let's break it down part by part: the point labelled target is whatever we're targeting, such as a piece of retroreflective tape on the wall. The point labelled target (in image) indicates the position of the pixel that shows the target in the camera's image. You'll notice that it's upside down on the diagram; this is a side effect of the pinhole structure.

Next, the segment marked h is the height of the target relative to the camera's height (the vertical distance from the camera to the target). The segment marked d is the horizontal distance from the camera to the target. The segment marked ph is the distance, in pixels, from the centre of the image to the pixel that the target appears on in the image.
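As a rough sketch of how ph might be measured in practice, the snippet below assumes that the image centre sits at half the image height and that pixel rows are numbered from the top of the image, as they are in most image libraries. The function name and its parameters are made up for illustration and do not come from any particular vision library.

```python
def pixel_offset_from_centre(target_row: float, image_height: int) -> float:
    """Return ph: the vertical distance, in pixels, from the image centre
    to the row where the target appears.

    Assumption: rows are counted from the top of the image, so a target
    that appears above the centre gives a positive result.
    """
    centre_row = image_height / 2  # assumed location of the image centre
    return centre_row - target_row


# Hypothetical example: a 480-pixel-tall image with the target on row 40.
ph = pixel_offset_from_centre(target_row=40, image_height=480)
print(ph)
```

A target that appears below the centre would give a negative value under this convention, which corresponds to a target that is lower than the camera.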

So how do we use this diagram? Well, we can use the principle of similar triangles to solve for various unknowns: the triangle inside the camera is similar to the one outside the camera, so their corresponding sides are proportional. Here's what that looks like in an equation:

$$
\frac{h}{p_h} = \frac{d}{f}
$$

Since p_h and f are known (p_h because it's in the image and f because it's constant and intrinsic to the camera), we can solve for either d or h as long as we know the other one. For a real-world application, if we know that the retroreflective tape is 8 feet above the camera (remember that h is measured relative to the camera, not the ground), we can solve for how far away it is from the camera. For example's sake, let's say that its pixel height is 200 px, and the camera's focal length is 678 px. We can solve for d by cross-multiplying:

$$
\begin{split}
\frac{8 \textrm{ ft}}{200 \textrm{ px}} &= \frac{d}{678 \textrm{ px}} \\
\frac{8 \textrm{ ft} \times 678 \textrm{ px}}{200 \textrm{ px}} &= d \\
27.12 \textrm{ ft} &= d
\end{split}
$$
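The same calculation can be sketched in a few lines of Python. This is only an illustration of the similar-triangles equation above; the function names are hypothetical, and the code assumes that h, p_h, and f are measured as described in this article (h relative to the camera, p_h from the image centre, f in pixels).

```python
def distance_to_target(h: float, p_h: float, f: float) -> float:
    """Solve h / p_h = d / f for d, the distance to the target."""
    if p_h == 0:
        # The target is level with the camera, so the triangles collapse
        # and the distance cannot be recovered from this equation.
        raise ValueError("p_h must be non-zero to solve for d")
    return h * f / p_h


def height_of_target(d: float, p_h: float, f: float) -> float:
    """Solve h / p_h = d / f for h, the height relative to the camera."""
    return d * p_h / f


# The worked example from the text: h = 8 ft, p_h = 200 px, f = 678 px.
d = distance_to_target(h=8.0, p_h=200.0, f=678.0)
print(f"{d:.2f} ft")  # 27.12 ft
```

Because the pixel units cancel, d comes out in whatever unit h was given in, and `height_of_target` recovers h in the same way when d is the known quantity.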

It's as easy as that! No complicated trigonometry or linear algebra. Things get quite a bit harder when you know neither the distance nor the height, or if your target is more complex than a simple piece of retroreflective tape or an AprilTag that can be represented by a single point.