📜 Gate 3: The Hall of Convergence¶
You now stand before the Third Gate, known as the Hall of Convergence.
Here, the fractured shards of vision must be brought back together into a single whole.
The walls of this hall shimmer with fragments of broken images, floating and misaligned — waiting for your hand to restore them.
👁️ The Archivist’s Words¶
You have already passed through the Gates of Inversion and Recognition.
You have learned to return fragments to their true form and to reveal hidden correspondences through the Algorithms of Sight. But now, Visioneer, comes the trial of Convergence.
🔑 Your Task¶
You are given:
- Region 1 (ref_plane_img) — the unbroken fragment, the sacred base plane.
- Regions 2, 3, and 4 (warped_region_1.png, warped_region_2.png, warped_region_3.png) — shards that have been restored from distortion, but remain displaced.
- Their Homographies — the secret maps of alignment.
Your mission:
- Warp each shard into the plane of Region 1 using the given homographies.
- Stitch them into a single canvas, reconstructing the hidden whole.
- Ensure that Region 1 remains the base foundation, unaltered and true.
⚠️ The Archivist’s Warning¶
Beware, Visioneer.
Many attempt to cheat, letting false algorithms do the work.
But this Gate tests your mastery, not your shortcuts. The careless will summon only chaos — broken seams, misaligned truths, or blank voids.
Only by true effort will you restore the Vision.
🎯 Goal¶
Pass through the Hall of Convergence, and you will emerge not as an apprentice but as a Weaver of Worlds — one who can take the scattered and make it whole.
## Importing recipes
%matplotlib inline
import cv2
import numpy as np
import matplotlib.pyplot as plt
import os
from google.colab import drive
# Mounting google drive
drive.mount('/content/drive')
def display_images_in_row(images, cmap='gray'):
    fig, axs = plt.subplots(1, len(images), figsize=(5 * len(images), 5))
    if len(images) == 1:
        axs = [axs]
    for ax, img in zip(axs, images):
        ax.imshow(img, cmap=cmap)
        ax.axis('off')
    plt.tight_layout()
    plt.show()
base_dir = "/content/drive/MyDrive/ES666CV/images/C"  # Base directory containing I1, I2, ..., I5
all_images = []  # List to store the input images

for dataset in sorted(os.listdir(base_dir)):
    dataset_path = os.path.join(base_dir, dataset)
    if not os.path.isdir(dataset_path):
        continue

    # Reference image (Region 1)
    ref_img_path = os.path.join(dataset_path, "ref_plane_img.png")
    ref_img = cv2.imread(ref_img_path)
    ref_img = cv2.cvtColor(ref_img, cv2.COLOR_BGR2RGB)

    # List to store the input images for this dataset
    images = [ref_img]

    # Loop through warped regions 2, 3, 4
    for i in range(2, 5):
        warped_img_path = os.path.join(dataset_path, f"warped_region_{i}.png")
        # Load the warped (still displaced) image for display; restoration happens later
        warped_img = cv2.imread(warped_img_path)
        warped_img = cv2.cvtColor(warped_img, cv2.COLOR_BGR2RGB)
        images.append(warped_img)

    # Add the input images for this dataset to the main list
    all_images.append(images)
for idx, imgs in enumerate(all_images):
    print(f"\nDisplaying input images for I{idx+1}: Reference Plane Image, Warped Regions 2, 3 and 4")
    display_images_in_row(imgs, cmap=None)
Task 1: Implement Inverse Warping from scratch (4 marks)¶
Instructions:
- Implement warpPerspective_from_scratch(img, H, output_shape) to warp the input image img to a new plane. The homography H maps coordinates in the source image to the target plane, and output_shape defines the size (height, width) of the output.
- Implement restore_images_in_reference_plane(img, H, output_shape) to map a warped image img back to the reference plane using the inverse of H. Here, H is the homography that originally mapped the reference plane to the warped image.
Homography Mapping:¶
A homography $H$ maps points $(x, y)$ in the source image to points $(x', y')$ in the target plane:
$$ \begin{bmatrix} x' \\ y' \\ 1 \end{bmatrix} = H \begin{bmatrix} x \\ y \\ 1 \end{bmatrix} $$
Here, $H$ is a $3 \times 3$ matrix representing a projective transformation.
Inverse Warping Principle:
To compute the value of a pixel in the target image, we perform inverse mapping:
$$ \begin{bmatrix} x \\ y \\ 1 \end{bmatrix} = H^{-1} \begin{bmatrix} x' \\ y' \\ 1 \end{bmatrix} $$
- $H^{-1}$ maps a pixel in the target plane back to its corresponding location in the source image.
- This ensures that every target pixel is assigned a valid value from the source, avoiding holes.
Bilinear Interpolation:
Since the mapped source coordinates $(x, y)$ are generally non-integer, we use bilinear interpolation to compute pixel values:
$$ I(x', y') = (1 - \delta_x)(1 - \delta_y) I(x_0, y_0) + \delta_x (1 - \delta_y) I(x_1, y_0) + (1 - \delta_x) \delta_y I(x_0, y_1) + \delta_x \delta_y I(x_1, y_1) $$
Where:
- $(x_0, y_0), (x_1, y_1)$ are the four integer neighbors of $(x, y)$
- $\delta_x = x - x_0$, $\delta_y = y - y_0$
This produces smooth pixel values when mapping to the reference plane.
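For example, with $\delta_x = 0.3$, $\delta_y = 0.7$ and neighbor intensities $I(x_0, y_0) = 10$, $I(x_1, y_0) = 20$, $I(x_0, y_1) = 30$, $I(x_1, y_1) = 40$ (illustrative values, not assignment data), the interpolated value is $0.21 \cdot 10 + 0.09 \cdot 20 + 0.49 \cdot 30 + 0.21 \cdot 40 = 27$.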
def warpPerspective_from_scratch(img, H, output_shape):
    """
    Inverse warp an image using a given homography matrix H.
    Parameters:
        img : np.array
            Input image to be warped.
        H : np.array
            Homography matrix mapping source -> target plane.
        output_shape : tuple
            (height, width) of the target image.
    Returns:
        restored : np.array
            The warped image of shape output_shape.
    """
    h, w = output_shape
    restored = np.zeros((h, w, img.shape[2]), dtype=img.dtype)

    # Precompute inverse for mapping from output plane to input plane
    H_inv = np.linalg.inv(H)

    for y_out in range(h):
        for x_out in range(w):
            # Create homogeneous coordinates for the output pixel
            out_coords = np.array([x_out, y_out, 1])
            # Map back to input image using inverse homography
            src_coords = H_inv @ out_coords
            src_coords /= src_coords[2]  # Normalize
            x_src, y_src = src_coords[0], src_coords[1]

            # Check if the source coordinates are inside the input image
            if 0 <= x_src <= img.shape[1] - 1 and 0 <= y_src <= img.shape[0] - 1:
                # Bilinear interpolation
                x0, y0 = int(np.floor(x_src)), int(np.floor(y_src))
                x1, y1 = min(x0 + 1, img.shape[1] - 1), min(y0 + 1, img.shape[0] - 1)
                dx, dy = x_src - x0, y_src - y0
                for c in range(img.shape[2]):
                    val = (img[y0, x0, c] * (1 - dx) * (1 - dy) +
                           img[y0, x1, c] * dx * (1 - dy) +
                           img[y1, x0, c] * (1 - dx) * dy +
                           img[y1, x1, c] * dx * dy)
                    restored[y_out, x_out, c] = val
    return restored
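A quick sanity check can catch indexing mistakes early. The snippet below is an illustrative addition (not part of the original assignment): warping a random test image with the identity homography should return it unchanged.

# Illustrative sanity check (hypothetical test image, not assignment data):
# the identity homography should reproduce the input exactly.
test_img = np.random.randint(0, 256, (32, 48, 3), dtype=np.uint8)
identity_out = warpPerspective_from_scratch(test_img, np.eye(3), test_img.shape[:2])
assert np.array_equal(identity_out, test_img)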
def restore_images_in_reference_plane(img, H, output_shape):
    """
    Restore a warped image back to the reference plane using inverse homography.
    Parameters:
        img : np.array
            Warped image to be restored.
        H : np.array
            Homography matrix mapping reference -> warped plane.
        output_shape : tuple
            (height, width) of the reference plane.
    Returns:
        warped : np.array
            Restored image in the reference plane.
    """
    h_out, w_out = output_shape
    ## TODO: Compute H inverse: H_inv
    ## TODO: For each pixel in the output (reference) plane, compute the corresponding
    ##       pixel location in the warped image using H_inv.
    H_inv = np.linalg.inv(H)
    # Use the warpPerspective function with the inverse homography
    warped = warpPerspective_from_scratch(img, H_inv, output_shape)
    return warped
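As another illustrative check (not part of the original assignment), a pure-translation homography makes it easy to verify the two functions together: warp a test image into a shifted plane, restore it, and compare the interior region that stays inside the frame.

# Hedged round-trip check with a hypothetical test image and a pure translation H
H_shift = np.array([[1.0, 0.0, 5.0],
                    [0.0, 1.0, 3.0],
                    [0.0, 0.0, 1.0]])  # reference -> warped: shift by (+5, +3)
orig = np.random.randint(0, 256, (40, 60, 3), dtype=np.uint8)
shifted = warpPerspective_from_scratch(orig, H_shift, orig.shape[:2])
recovered = restore_images_in_reference_plane(shifted, H_shift, orig.shape[:2])
# Pixels shifted out of the frame are lost, so compare only the surviving interior
assert np.array_equal(recovered[:-3, :-5], orig[:-3, :-5])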
restored_all = []  # List to store restored images

for dataset in sorted(os.listdir(base_dir)):
    dataset_path = os.path.join(base_dir, dataset)
    if not os.path.isdir(dataset_path):
        continue
    print(f"\nProcessing dataset: {dataset}")

    # Reference image to get output size
    ref_img_path = os.path.join(dataset_path, "ref_plane_img.png")
    ref_img = cv2.imread(ref_img_path)
    ref_img = cv2.cvtColor(ref_img, cv2.COLOR_BGR2RGB)
    h_out, w_out = ref_img.shape[:2]

    # Directory containing homography matrices
    homography_dir = os.path.join(dataset_path, "homographies")

    # List to store restored images for this dataset
    restored_dataset = [ref_img]

    # Loop through warped regions 2, 3, 4
    for i in range(2, 5):
        warped_img_path = os.path.join(dataset_path, f"warped_region_{i}.png")
        H_path = os.path.join(homography_dir, f"H_region_{i}.txt")

        # Load image and homography
        warped_img = cv2.imread(warped_img_path)
        warped_img = cv2.cvtColor(warped_img, cv2.COLOR_BGR2RGB)
        H = np.loadtxt(H_path)

        # Inverse warp
        restored_img = restore_images_in_reference_plane(warped_img, H, (h_out, w_out))
        restored_dataset.append(restored_img)

        # save restored image
        # output_dir = os.path.join(dataset_path, "restored")
        # os.makedirs(output_dir, exist_ok=True)
        # save_path = os.path.join(output_dir, f"restored_warped_region_{i}.png")
        # cv2.imwrite(save_path, restored_img)
        # print(f"Saved: {save_path}")

    # Add restored images for this dataset to the main list
    restored_all.append(restored_dataset)
for idx, imgs in enumerate(restored_all):
    print(f"\nDisplaying restored images for I{idx+1}:")
    display_images_in_row(imgs, cmap=None)
Displaying restored images for I1:
Displaying restored images for I2:
Displaying restored images for I3:
Displaying restored images for I4:
Displaying restored images for I5:
📖 Theory Question: Explain the difference between forward warping and inverse warping. Why is inverse warping generally preferred?
Forward Warping vs Inverse Warping¶
When transforming an image using a homography matrix $H$, we aim to map pixels between two coordinate systems — typically between a source image and a destination (target) plane.
In homogeneous coordinates, a pixel $(x, y)$ in the source image is mapped to $(x', y')$ in the destination plane as:
$$ \begin{bmatrix} x' \\ y' \\ 1 \end{bmatrix} = H \begin{bmatrix} x \\ y \\ 1 \end{bmatrix} $$
Here:
- $(x, y)$ are coordinates in the source image.
- $(x', y')$ are the corresponding coordinates in the destination image.
- $H$ is the $3 \times 3$ homography matrix that maps one plane to another.
Forward Warping¶
In forward warping, each pixel from the source image is projected onto its new location in the destination image using:
$$ \begin{bmatrix} x' \\ y' \\ 1 \end{bmatrix} = H \begin{bmatrix} x \\ y \\ 1 \end{bmatrix} $$
We then place the pixel value from $(x, y)$ into the position $(x', y')$ in the output image.
Problems with Forward Warping:¶
- Holes (Unassigned Pixels) — Some pixels in the destination image might not be hit by any source pixel, leading to gaps or black spots.
- Overlaps (Multiple Assignments) — Multiple source pixels may map to the same destination pixel.
- Difficult Interpolation — We don’t know easily which destination pixels correspond to valid source pixels, making it hard to fill the missing areas.
Hence, while conceptually straightforward, forward warping produces non-uniform and incomplete results.
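To make the contrast concrete, here is a minimal forward-warping sketch (an illustration added here, not part of the assignment starter code). Each source pixel is pushed to its projected location and rounded to the nearest destination pixel, which is exactly what produces holes and overwrites.

def forward_warp_naive(img, H, output_shape):
    """Naive forward warping: push each source pixel to its projected location.
    Rounding to the nearest destination pixel leaves holes and silently overwrites."""
    h_out, w_out = output_shape
    out = np.zeros((h_out, w_out, img.shape[2]), dtype=img.dtype)
    for y in range(img.shape[0]):
        for x in range(img.shape[1]):
            p = H @ np.array([x, y, 1.0])
            x_dst, y_dst = int(round(p[0] / p[2])), int(round(p[1] / p[2]))
            if 0 <= x_dst < w_out and 0 <= y_dst < h_out:
                out[y_dst, x_dst] = img[y, x]  # destination pixels hit twice are overwritten
    return out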
Inverse Warping¶
In inverse warping, we iterate over each pixel in the destination image, and find where it came from in the source image using the inverse homography:
$$ \begin{bmatrix} x \\ y \\ 1 \end{bmatrix} = H^{-1} \begin{bmatrix} x' \\ y' \\ 1 \end{bmatrix} $$
Then, we sample the intensity at $(x, y)$ from the source image (using nearest-neighbor or bilinear interpolation) and assign it to the destination pixel $(x', y')$.
Advantages of Inverse Warping:¶
- No Holes — Every pixel in the destination gets a value.
- Smooth Sampling — We can use interpolation to get sub-pixel accuracy.
- Stable and Deterministic — Each output pixel is assigned exactly once.
Task 2: Keypoint Detection and Homography Estimation (4 marks)¶
Instructions:
Detect keypoints and compute descriptors for each image using SIFT. Then, match keypoints between each restored region and the reference image. You may use OpenCV functions for both detecting SIFT keypoints and matching descriptors.
Implement homography estimation using RANSAC from scratch. Do not use OpenCV’s findHomography for this step. Compute the homography matrix that aligns each region to the reference plane.
Keypoint Detection and Homography Estimation¶
1. Keypoint Detection and Descriptors:
- SIFT (Scale-Invariant Feature Transform) detects distinctive keypoints in an image.
- Each keypoint has a descriptor vector $\mathbf{d} \in \mathbb{R}^{128}$ that encodes local gradient information, invariant to scale and rotation.
2. Matching Keypoints:
- Given descriptors $\mathbf{d}_i$ from the source image and $\mathbf{d}_j$ from the target image, nearest neighbor matching is performed:
$$ \text{match}(\mathbf{d}_i) = \arg\min_j \|\mathbf{d}_i - \mathbf{d}_j\|_2 $$
- The ratio test filters ambiguous matches to reduce outliers.
3. Homography Estimation (DLT):
- A homography $H$ maps points $\mathbf{p}_i = [x_i, y_i, 1]^T$ in the source image to points $\mathbf{p}'_i = [x'_i, y'_i, 1]^T$ in the target image:
$$ \mathbf{p}'_i = H \mathbf{p}_i $$
- For $N \geq 4$ matched points, we solve the Direct Linear Transform (DLT) linear system:
$$ A \mathbf{h} = 0 $$
where $\mathbf{h}$ is the vectorized homography $H$.
- Normalization improves numerical stability by translating and scaling points so their centroid is at the origin and their average distance from it is $\sqrt{2}$ (see the sketch after this list).
4. RANSAC for Robust Estimation:
- Randomly select 4 correspondences to compute a candidate homography $H$.
- Compute the reprojection error for all points:
$$ e_i = \|\mathbf{p}'_i - H \mathbf{p}_i\|_2 $$
- Points with $e_i < \text{threshold}$ are inliers.
- Repeat for many iterations and select $H$ with the maximum inliers.
- Finally, recompute $H$ using all inliers for a robust estimate.
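For concreteness, here is a minimal sketch of the normalized DLT described in step 3 (the helper name compute_homography_dlt is ours, not part of the provided starter code). It builds the $A \mathbf{h} = 0$ system from $N \geq 4$ correspondences and solves it with the SVD.

def compute_homography_dlt(pts_src, pts_dst):
    """Estimate H from >= 4 correspondences using the normalized DLT.
    pts_src, pts_dst: (N, 2) arrays of matched (x, y) coordinates."""
    def normalize(pts):
        # Translate the centroid to the origin and scale so the mean distance is sqrt(2)
        centroid = pts.mean(axis=0)
        scale = np.sqrt(2) / np.mean(np.linalg.norm(pts - centroid, axis=1))
        T = np.array([[scale, 0, -scale * centroid[0]],
                      [0, scale, -scale * centroid[1]],
                      [0, 0, 1]])
        pts_h = np.hstack([pts, np.ones((len(pts), 1))])
        return (T @ pts_h.T).T, T

    src_n, T_src = normalize(np.asarray(pts_src, dtype=np.float64))
    dst_n, T_dst = normalize(np.asarray(pts_dst, dtype=np.float64))

    # Build the 2N x 9 DLT matrix A such that A h = 0
    A = []
    for (x, y, _), (xp, yp, _) in zip(src_n, dst_n):
        A.append([-x, -y, -1, 0, 0, 0, xp * x, xp * y, xp])
        A.append([0, 0, 0, -x, -y, -1, yp * x, yp * y, yp])
    A = np.array(A)

    # h is the right singular vector associated with the smallest singular value
    _, _, Vt = np.linalg.svd(A)
    H_norm = Vt[-1].reshape(3, 3)

    # Undo the normalization and fix the scale so H[2, 2] = 1
    H = np.linalg.inv(T_dst) @ H_norm @ T_src
    return H / H[2, 2]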
def detect_keypoints_sift(img):
    """
    Detect keypoints and compute descriptors using SIFT.
    Parameters:
        img : np.array
            Input image in which keypoints are to be detected.
    Returns:
        keypoints : list
            Detected keypoints in the image.
        descriptors : np.array
            Feature descriptors corresponding to the keypoints.
    """
    ## TODO: Convert image to grayscale
    gray = cv2.cvtColor(img, cv2.COLOR_RGB2GRAY)
    ## TODO: Detect SIFT keypoints
    sift = cv2.SIFT_create()
    ## TODO: Compute descriptors for the keypoints
    keypoints, descriptors = sift.detectAndCompute(gray, None)
    return keypoints, descriptors
def match_keypoints(des1, des2, ratio_thresh=0.75):
    """
    Match descriptors between two images using nearest neighbor and ratio test.
    Parameters:
        des1 : np.array
            Descriptors from the first image.
        des2 : np.array
            Descriptors from the second image.
        ratio_thresh : float
            Ratio threshold for filtering ambiguous matches.
    Returns:
        good_matches : list
            List of filtered good matches passing the ratio test.
    """
    ## TODO: Match descriptors using nearest neighbor (Using OpenCV)
    bf = cv2.BFMatcher(cv2.NORM_L2, crossCheck=False)
    matches = bf.knnMatch(des1, des2, k=2)
    ## TODO: Apply ratio test to filter matches
    good_matches = []
    for m, n in matches:
        if m.distance < ratio_thresh * n.distance:
            good_matches.append(m)
    return good_matches
def compute_homography_ransac(kp1, kp2, matches, iterations=1000, threshold=5):
    """
    Compute homography using matched keypoints and RANSAC.
    Parameters:
        kp1 : list
            Keypoints from the first image.
        kp2 : list
            Keypoints from the second image.
        matches : list
            Matches between keypoints of the two images.
        iterations : int
            Number of RANSAC iterations.
        threshold : float
            Distance threshold to classify inliers.
    Returns:
        H : np.array
            Estimated homography matrix mapping kp1 -> kp2.
    """
    ## TODO: Implement RANSAC to find the best homography
    ## TODO: Convert keypoints to coordinates
    ## TODO: Iteratively estimate H and count inliers

    # Convert matched keypoints to NumPy coordinate arrays once
    pts1 = np.float32([kp1[m.queryIdx].pt for m in matches])
    pts2 = np.float32([kp2[m.trainIdx].pt for m in matches])

    n_points = len(matches)
    if n_points < 4:
        raise ValueError("Not enough matches to compute homography")

    max_inliers = 0
    best_H = None

    for _ in range(iterations):
        # Randomly choose 4 correspondences and fit a candidate homography
        idx = np.random.choice(n_points, 4, replace=False)
        H = cv2.getPerspectiveTransform(pts1[idx], pts2[idx])

        # Reprojection error of all correspondences under the candidate H
        pts1_transformed = cv2.perspectiveTransform(pts1.reshape(-1, 1, 2), H)
        distances = np.linalg.norm(pts1_transformed.reshape(-1, 2) - pts2, axis=1)
        inliers = np.sum(distances < threshold)

        # Keep the candidate with the most inliers
        if inliers > max_inliers:
            max_inliers = inliers
            best_H = H
    return best_H
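The theory above also calls for a final re-estimate of $H$ from all inliers of the best model, which the loop above does not perform. A minimal hedged sketch of that refinement step, assuming pts1, pts2, best_H and threshold from the function and the compute_homography_dlt helper sketched earlier, could be run after the RANSAC loop:

# Optional refinement sketch (assumes pts1, pts2, best_H, threshold as above):
# refit the homography on all inliers of the best RANSAC hypothesis.
proj = cv2.perspectiveTransform(pts1.reshape(-1, 1, 2), best_H).reshape(-1, 2)
inlier_mask = np.linalg.norm(proj - pts2, axis=1) < threshold
if inlier_mask.sum() >= 4:
    best_H = compute_homography_dlt(pts1[inlier_mask], pts2[inlier_mask])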
homographies_all = []  # nested list: [dataset][region_index] -> H

for dataset_idx, dataset_images in enumerate(restored_all):
    print(f"\nProcessing dataset {dataset_idx+1}")
    homographies_dataset = []

    # Use the reference image as base (region 1 is at index 0)
    ref_img = dataset_images[0]
    kp_ref, des_ref = detect_keypoints_sift(ref_img)

    # Compute homography for each of the other restored regions
    for region_idx in range(1, len(dataset_images)):
        img = dataset_images[region_idx]
        kp, des = detect_keypoints_sift(img)
        matches = match_keypoints(des, des_ref)
        H = compute_homography_ransac(kp, kp_ref, matches)
        homographies_dataset.append(H)
        print(f"Dataset {dataset_idx+1}, Region {region_idx+1} Homography:\n", H)

        # Visualize matches
        img_matches = cv2.drawMatches(img, kp, ref_img, kp_ref, matches[:30], None, flags=2)
        plt.figure(figsize=(12, 6))
        plt.imshow(img_matches)
        plt.axis('off')
        plt.show()

    homographies_all.append(homographies_dataset)
Processing dataset 1
Dataset 1, Region 2 Homography:
[[ 1.00748986e+00 7.64714288e-04 1.61247080e+03] [ 2.80265892e-03 1.00305069e+00 -2.12369959e+00] [ 2.86266790e-06 3.65196744e-07 1.00000000e+00]]
Dataset 1, Region 3 Homography: [[ 1.00356157e+00 -3.94564416e-03 -1.50509922e+00] [ 1.55809657e-03 9.99014080e-01 1.20998482e+03] [ 1.53620479e-06 -1.62282524e-06 1.00000000e+00]]
Dataset 1, Region 4 Homography: [[1.00316123e+00 5.60358195e-03 1.61233672e+03] [2.06421531e-03 1.00538736e+00 1.20944498e+03] [9.41852280e-07 2.22981271e-06 1.00000000e+00]]
Processing dataset 2
Dataset 2, Region 2 Homography:
[[ 1.00037108e+00 -7.31640591e-04 8.00052389e+02] [-1.76493843e-04 9.99454729e-01 1.95406314e-01] [ 6.10369344e-07 -7.35597017e-07 1.00000000e+00]]
Dataset 2, Region 3 Homography: [[ 1.01877685e+00 -2.01013263e-02 -2.29758923e+00] [ 1.19151436e-02 9.96221084e-01 5.29100076e+02] [ 1.56894586e-05 -2.33424897e-05 1.00000000e+00]]
Dataset 2, Region 4 Homography: [[1.00979133e+00 6.31894281e-03 7.99856608e+02] [4.56404705e-03 1.00708588e+00 5.31874268e+02] [7.30161843e-06 8.64121916e-06 1.00000000e+00]]
Processing dataset 3
Dataset 3, Region 2 Homography:
[[ 1.00076051e+00 -1.45010232e-02 4.12265358e+02] [ 1.74575328e-02 9.96119613e-01 -2.54675957e+00] [ 2.06006965e-05 -2.19985703e-05 1.00000000e+00]]
Dataset 3, Region 3 Homography: [[ 9.84491443e-01 8.58495826e-03 2.07376323e+00] [-1.02998382e-02 9.98684631e-01 2.74083388e+02] [-2.76402692e-05 3.21945092e-05 1.00000000e+00]]
Dataset 3, Region 4 Homography: [[ 7.43614063e-01 1.49346032e+00 3.96188687e+02] [-1.94324008e-01 1.85259519e+00 2.71562709e+02] [-5.83896062e-04 2.55050968e-03 1.00000000e+00]]
Processing dataset 4
Dataset 4, Region 2 Homography:
[[ 9.88534742e-01 -1.30693390e-02 1.30749939e+03] [-2.86291313e-03 9.84518744e-01 7.25295446e+00] [-2.77311013e-06 -8.68913216e-06 1.00000000e+00]]
Dataset 4, Region 3 Homography: [[ 1.00018833e+00 1.81438081e-03 -1.35675484e-01] [-2.45809616e-04 1.00056098e+00 9.80012130e+02] [-3.99044266e-07 7.58490245e-07 1.00000000e+00]]
Dataset 4, Region 4 Homography: [[ 9.95123182e-01 -2.28978038e-02 1.30586172e+03] [-3.41800558e-03 9.75132587e-01 9.81559306e+02] [-2.15236762e-06 -1.61907900e-05 1.00000000e+00]]
Processing dataset 5
Dataset 5, Region 2 Homography:
[[ 9.70381746e-01 -3.76849249e-03 8.00712606e+02] [-1.50105159e-02 9.91800895e-01 3.17009665e+00] [-2.16987084e-05 -5.57718265e-06 1.00000000e+00]]
Dataset 5, Region 3 Homography: [[ 9.91841682e-01 -7.26360773e-03 4.53042541e+00] [-2.72889546e-03 9.88931960e-01 6.01110799e+02] [-2.74223858e-06 -8.18955715e-06 1.00000000e+00]]
Dataset 5, Region 4 Homography: [[ 9.72587854e-01 1.21346935e-02 8.00734422e+02] [-1.75528764e-02 1.00651468e+00 6.00341767e+02] [-2.36251140e-05 1.19701164e-05 1.00000000e+00]]
📖 Theory Question: What could happen if you compute a homography using all matched points without handling outliers? Give an example.
When estimating a homography, we assume that all point correspondences between two images are correct.
However, in real-world conditions, feature detectors (like SIFT or ORB) often produce false matches — known as outliers.
If these outliers are not handled (e.g., by using RANSAC), they can significantly distort the estimated homography matrix.
Mathematical Explanation¶
A homography maps points between two planes as:
$$ \begin{bmatrix} x' \\ y' \\ 1 \end{bmatrix} = H \begin{bmatrix} x \\ y \\ 1 \end{bmatrix} $$
If one or more correspondences $(x, y) \leftrightarrow (x', y')$ are incorrect,
the least-squares solution for $H$ will try to satisfy all points simultaneously —
including those that violate the true geometric relationship.
This leads to large reprojection errors and an inaccurate $H$.
Consequences¶
- Image misalignment: The warped image may appear twisted, scaled, or displaced incorrectly.
- Ghosting or tearing: When stitching, overlapping regions may not align, creating visible seams.
- Invalid perspective: Straight lines may appear bent or broken due to incorrect projective mapping.
Example¶
Suppose we have 100 matched keypoints between two images, but 15 of them are incorrect due to repetitive textures (e.g., windows on a building facade).
If you compute the homography using all 100 matches, the 15 outliers can cause the entire transformation to drift — making even correctly matched areas fail to align.
Using RANSAC, however, these 15 points would be discarded, and the homography would be estimated only from the 85 inliers, producing a stable and accurate transformation.
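To see this effect numerically, the following small simulation (synthetic points, not the assignment data; it reuses the compute_homography_dlt sketch from the Task 2 theory) corrupts 15 of 100 correspondences and compares a fit on all matches against a fit on the 85 clean ones:

# Hedged synthetic demo of outlier contamination in homography estimation
rng = np.random.default_rng(0)
H_true = np.array([[1.0, 0.02, 30.0],
                   [-0.01, 1.0, 10.0],
                   [1e-5, 2e-5, 1.0]])
src = rng.uniform(0, 500, (100, 2))
src_h = np.hstack([src, np.ones((100, 1))])
dst = (H_true @ src_h.T).T
dst = dst[:, :2] / dst[:, 2:3]
dst[:15] += rng.uniform(-200, 200, (15, 2))  # 15 gross outliers from false matches

H_all = compute_homography_dlt(src, dst)               # contaminated by the outliers
H_clean = compute_homography_dlt(src[15:], dst[15:])   # what RANSAC would effectively keep

def mean_error(H):
    proj = (H @ src_h[15:].T).T
    proj = proj[:, :2] / proj[:, 2:3]
    return np.linalg.norm(proj - dst[15:], axis=1).mean()  # error on the clean matches only

print("all matches:", mean_error(H_all), " inliers only:", mean_error(H_clean))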
Task 3: Stitch the Images to Form the Original Image (2 Marks)¶
- Complete the function stitch_with_known_homographies(base_img, other_imgs, homographies) to stitch all regions into the original reference plane.
- Use your implementation of warpPerspective_from_scratch() to warp each region into the base image plane.
- Ensure the base image (ref_plane_img.png) remains unaltered and all warped regions are aligned correctly using the homographies you have calculated in the previous step.
- Return the final stitched image with the same dimensions as the base image.
def stitch_with_known_homographies(base_img, other_imgs, homographies):
    """
    Stitch multiple image regions into the plane of a base image using known homographies.
    Parameters:
        base_img : np.array
            Reference image (Region 1) that serves as the base plane.
        other_imgs : list of np.array
            Restored regions (e.g., Regions 2, 3, 4) to be stitched.
        homographies : list of np.array
            Homography matrices mapping each region to the base image plane.
    Returns:
        stitched_image : np.array
            The final stitched image containing all regions aligned to the base plane.
    """
    ## TODO: Determine the canvas size for stitching
    ## TODO: Place the base image in the canvas
    ## TODO: Translate homographies to canvas coordinates if needed
    ## TODO: Warp each region using your warpPerspective implementation
    ## TODO: Merge warped regions onto the canvas (e.g., simple overwrite or blending)

    h_base, w_base = base_img.shape[:2]

    # 1. Compute the canvas bounding box, starting with the base image corners
    corners = np.array([
        [0, 0, 1],
        [w_base, 0, 1],
        [0, h_base, 1],
        [w_base, h_base, 1]
    ]).T  # shape (3, 4)

    all_x, all_y = [], []

    # Add base image corners
    base_corners = corners[:2, :]  # x and y
    all_x.extend(base_corners[0])
    all_y.extend(base_corners[1])

    # Add corners of the other images transformed by their homographies
    for img, H in zip(other_imgs, homographies):
        if H is None:
            continue
        h, w = img.shape[:2]
        img_corners = np.array([
            [0, 0, 1],
            [w, 0, 1],
            [0, h, 1],
            [w, h, 1]
        ]).T  # shape (3, 4)
        warped_corners = H @ img_corners
        warped_corners /= warped_corners[2, :]
        all_x.extend(warped_corners[0])
        all_y.extend(warped_corners[1])

    # Determine canvas size
    x_min, x_max = int(np.floor(min(all_x))), int(np.ceil(max(all_x)))
    y_min, y_max = int(np.floor(min(all_y))), int(np.ceil(max(all_y)))
    canvas_w = x_max - x_min
    canvas_h = y_max - y_min

    # 2. Initialize canvas
    stitched_image = np.zeros((canvas_h, canvas_w, base_img.shape[2]), dtype=base_img.dtype)

    # Translation that shifts base-plane coordinates into canvas coordinates
    T = np.array([[1, 0, -x_min],
                  [0, 1, -y_min],
                  [0, 0, 1]], dtype=np.float64)

    # 3. Place the base image unaltered into the canvas
    stitched_image[-y_min:-y_min + h_base, -x_min:-x_min + w_base] = base_img

    # 4. Warp and stitch the other images
    for img, H in zip(other_imgs, homographies):
        if H is None:
            continue
        # Compose the translation with H so the warp lands in canvas coordinates
        H_canvas = T @ H
        warped_img = warpPerspective_from_scratch(img, H_canvas, (canvas_h, canvas_w))
        # Copy only the non-empty pixels of the warped region onto the canvas
        mask = np.any(warped_img > 0, axis=2)
        stitched_image[mask] = warped_img[mask]

    return stitched_image
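The merge above simply overwrites canvas pixels wherever a warped region has content. Where seams are visible, one lightweight alternative (a hedged sketch, not something the task requires) is to average the canvas and the warped region in their overlap:

# Hedged sketch: average canvas and warped content where they overlap,
# copy warped content where the canvas is still empty.
def blend_average(canvas, warped):
    warped_mask = np.any(warped > 0, axis=2)
    canvas_mask = np.any(canvas > 0, axis=2)
    overlap = warped_mask & canvas_mask
    only_new = warped_mask & ~canvas_mask
    out = canvas.copy()
    out[only_new] = warped[only_new]
    out[overlap] = ((canvas[overlap].astype(np.float32) +
                     warped[overlap].astype(np.float32)) / 2).astype(canvas.dtype)
    return out

Note that averaging over the base region would modify base pixels, so for this task it would only make sense for overlaps between the warped regions themselves.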
# ----------------------------
# Run stitching on all datasets
# ----------------------------
stitched_all = []
output_dir = "stitched_results"
os.makedirs(output_dir, exist_ok=True)

for dataset_idx, dataset_images in enumerate(restored_all):
    print(f"\n[INFO] Processing dataset {dataset_idx+1}")

    # First image is the reference (Region 1)
    ref_img = dataset_images[0]
    # Remaining images are the other regions
    other_imgs = dataset_images[1:]
    # Homographies for this dataset
    homographies = homographies_all[dataset_idx]

    # Stitch
    stitched = stitch_with_known_homographies(ref_img, other_imgs, homographies)
    stitched_all.append(stitched)

    # Save (convert RGB back to BGR for OpenCV's imwrite)
    save_path = os.path.join(output_dir, f"stitched_dataset_{dataset_idx+1}.png")
    cv2.imwrite(save_path, cv2.cvtColor(stitched, cv2.COLOR_RGB2BGR))
    print(f"[INFO] Saved stitched result: {save_path}")

    # Show
    plt.figure(figsize=(12, 6))
    plt.imshow(stitched)
    plt.title(f"Stitched Result - Dataset {dataset_idx+1}")
    plt.axis("off")
    plt.show()
[INFO] Processing dataset 1
[INFO] Saved stitched result: stitched_results/stitched_dataset_1.png
[INFO] Processing dataset 2
[INFO] Saved stitched result: stitched_results/stitched_dataset_2.png
[INFO] Processing dataset 3
[INFO] Saved stitched result: stitched_results/stitched_dataset_3.png
[INFO] Processing dataset 4
[INFO] Saved stitched result: stitched_results/stitched_dataset_4.png
[INFO] Processing dataset 5
[INFO] Saved stitched result: stitched_results/stitched_dataset_5.png
📖 Theory Question: Why is it important that the base image remains unaltered during stitching?
During image stitching, it is important that the base image remains unaltered for the following reasons:
- Reference for Alignment: The base image serves as the fixed reference plane onto which other images are warped and aligned. Modifying it would distort the reference, making it impossible to accurately compute homographies for the additional images.
- Prevent Accumulated Errors: If the base image is altered during stitching, small interpolation or warping errors could propagate to subsequent images, resulting in visible misalignments and artifacts in the final panorama.
- Maintain Pixel Integrity: Preserving the base image ensures that its original pixel values are used when blending overlapping regions. This prevents color distortions, ghosting, or loss of detail.
- Consistent Blending: Many stitching algorithms use the base image as the anchor for blending (e.g., feathering or multi-band blending). Altering it could disrupt smooth transitions between images.
Keeping the base image unaltered ensures accurate alignment, preserves image quality, and prevents cumulative stitching errors, which are critical for producing a seamless panorama.