📜 Welcome to the Order of Visioneers¶
You are a new intern in the ancient Order of Visioneers — a secret society tasked with recovering lost knowledge encoded in ancient visual artifacts. The Grand Archivist tells you:
The Great Archive has been corrupted. Our only hope is to decode and reconstruct the fragments using the Algorithms of Sight. Only those who pass through the Five Gates may restore the World of Vision
But the Archivist also warns:
Beware, interns. Those who attempt to pass the Gates with borrowed visions or false answers are cursed. Their sight is clouded, their names erased from the Hall of Records, and they are cast forever into the Void of Zero — where no Visioneer has honor, nor power. Only true effort unlocks the path.
You must pass through five gates, each guarded by a unique challenge — and each challenge maps to your assignments. Let's depart for the Gate 1.
🌀 Gate 1 : The Chamber of Singular Truths¶
Theme: SVD, Image Reconstruction, Background-Foreground Separation
Deep in the archives lies a corrupted portrait of the First Visioneer. The portrait has lots of unnecessary elements and distraction. To pass the gate, you must:
Decompose the portrait into its elemental forms, reconstruct what once was but with necessary elements only.
Run the following code to setup the assignment
## Importing recipes
%matplotlib inline
import imageio
import numpy as np
import imageio.v2 as imageio
import moviepy.editor as mpe
from google.colab import drive
import matplotlib.pyplot as plt
from skimage.color import rgb2gray
from skimage.transform import resize
import matplotlib.animation as animation
from moviepy.editor import ImageSequenceClip
# Mounting google drive
drive.mount('/content/drive')
## Making recipes
video = mpe.VideoFileClip("/content/drive/My Drive/ES666CV/video/A/input.mp4")
def video_to_array(video):
frames = np.array([rgb2gray(frame) for frame in video.iter_frames()])
return frames
def array_to_video(images):
fig, ax = plt.subplots()
width, height = 320, 240 # pixels
dpi = 200 # dots per inch
fig.set_size_inches(width / dpi, height / dpi)
ax.axis('off') # Remove axes
fig.subplots_adjust(left=0, right=1, bottom=0, top=1)
im = ax.imshow(images[0], cmap='gray', vmin=0, vmax=1) # Initial frame
def update(frame):
im.set_array(images[frame])
return [im]
ani = animation.FuncAnimation(fig, update, frames=len(images), interval=1000/25) # ~25 fps
# Save as MP4 (requires FFmpeg; alternatively save as GIF with writer='pillow')
ani.save('temp.mp4', writer='ffmpeg', fps=25, dpi=200)
video = mpe.VideoFileClip('temp.mp4')
plt.close()
return video
def display_images_in_row(images, cmap='gray'):
fig, axs = plt.subplots(1, len(images), figsize=(5 * len(images), 5))
if len(images) == 1:
axs = [axs]
for ax, img in zip(axs, images):
ax.imshow(img, cmap=cmap)
ax.axis('off')
plt.tight_layout()
plt.show()
/usr/local/lib/python3.12/dist-packages/moviepy/config_defaults.py:47: SyntaxWarning: invalid escape sequence '\P'
IMAGEMAGICK_BINARY = r"C:\Program Files\ImageMagick-6.8.8-Q16\magick.exe"
/usr/local/lib/python3.12/dist-packages/moviepy/video/io/ffmpeg_reader.py:294: SyntaxWarning: invalid escape sequence '\d'
lines_video = [l for l in lines if ' Video: ' in l and re.search('\d+x\d+', l)]
/usr/local/lib/python3.12/dist-packages/moviepy/video/io/ffmpeg_reader.py:367: SyntaxWarning: invalid escape sequence '\d'
rotation_lines = [l for l in lines if 'rotate :' in l and re.search('\d+$', l)]
/usr/local/lib/python3.12/dist-packages/moviepy/video/io/ffmpeg_reader.py:370: SyntaxWarning: invalid escape sequence '\d'
match = re.search('\d+$', rotation_line)
WARNING:py.warnings:/usr/local/lib/python3.12/dist-packages/moviepy/video/io/sliders.py:61: SyntaxWarning: "is" with 'str' literal. Did you mean "=="?
if event.key is 'enter':
Mounted at /content/drive
Task1: Forge the weapon - SVD (3 Marks)¶
Implement Singular Value Decomposition (SVD) from scratch. You can use Numpy's Eigen Value Decomposition for your implementation.
Write your code in the commented part only
Concept¶
Singular Value Decomposition (SVD) factorizes any matrix $A \in \mathbb{R}^{m \times n}$ as:
$$ A = U S V^T $$
- ( U ): left singular vectors (orthonormal, size $m \times r$)
- ( S ): diagonal matrix of singular values ($sigma_i \geq 0$)
- ( V ): right singular vectors (orthonormal, size ($n \times r$) )
- ( r = $\text{rank}(A)$ )
Mathematics¶
- Compute ( A^T A ) and its eigen-decomposition:
$$ A^T A = V \Lambda V^T $$
- Singular values:
$$ \sigma_i = \sqrt{\lambda_i} $$
- Left singular vectors:
$$ U_i = \frac{A v_i}{\sigma_i} $$
- Form ( S ) with singular values on the diagonal.
Thus,
$$ A = U S V^T $$
Insight¶
- Top singular values capture the main structure (patterns).
- Truncating small ones gives low-rank approximation → useful for compression and denoising.
import numpy as np
def compute_svd(matrix: np.array, epsilon: float = 1e-10) -> tuple:
m, n = matrix.shape
# Step 1: eigen decomposition of A^T A
ATA = matrix.T @ matrix
eigvals, eigvecs = np.linalg.eigh(ATA)
# Step 2: sort in descending order
sorted_indices = np.argsort(eigvals)[::-1]
eigvals = eigvals[sorted_indices]
eigvecs = eigvecs[:, sorted_indices]
# Step 3: singular values (non-negative)
singular_values = np.sqrt(np.maximum(eigvals, 0))
# Step 4: apply epsilon cutoff (truncate small singular values)
mask = singular_values > epsilon
singular_values = singular_values[mask]
V = eigvecs[:, mask]
# Step 5: compute U (no explicit normalization here)
U = (matrix @ V) / singular_values
# Step 6: build diagonal S
S = np.zeros((len(singular_values), len(singular_values)))
np.fill_diagonal(S, singular_values)
Vt = V.T
return U, S, Vt
Task 2: Decompose into elemental forms and reconstruct from true elements discarding unncessary elements (3 Marks)¶
- Decompose image using SVD and reconstruct the image using only top $k$ singular values.
- Experiment with different values of $k$ and visualize the results.
- Plot reconstruction error by measuring Mean squared error between reconstructed and original image for different values of $k$.
url = "https://hips.hearstapps.com/hmg-prod/images/dog-puppy-on-garden-royalty-free-image-1586966191.jpg?crop=1xw:0.74975xh;0,0.190xh&resize=1200:*" # If this image becomes unavailable (rare!) read some other from web
image = imageio.imread(url)
gray_image = rgb2gray(image)
plt.figure(figsize=(6, 6))
plt.imshow(gray_image, cmap='gray')
plt.title('Original Grayscale Image')
plt.axis('off')
plt.show()
def compress_image_using_topk_singular_values(image, k):
U, S, Vt = compute_svd(image)
# Extract top-k components
U_k = U[:, :k]
S_k = S[:k, :k]
Vt_k = Vt[:k, :]
# Reconstruct with top-k singular values
compressed_image = U_k @ S_k @ Vt_k
return compressed_image
def reconstruction_error(original, reconstructed):
mse = np.mean((original - reconstructed) ** 2)
return mse
# Plot Reconstruction error w.r.t. 'k'
m, n = gray_image.shape
k_values = np.linspace(1, min(m, n), 15, dtype=int)
plt.figure(figsize=(12, 12))
for i, k in enumerate(k_values):
compressed_image = compress_image_using_topk_singular_values(gray_image, k)
re = reconstruction_error(gray_image, compressed_image)
plt.subplot(3, 5, i + 1)
plt.imshow(compressed_image, cmap='gray')
plt.title(f'k = {k} , RE = {re:.4f}')
plt.axis('off')
plt.tight_layout()
plt.show()
# Plot Reconstruction error w.r.t. 'k'
k_values = np.linspace(1, min(m, n), 100, dtype=int) # Lets sample more 100 k values for analysing how k affects reconstruction quality.
## Your code here for plot
errors = []
for k in k_values:
compressed_image = compress_image_using_topk_singular_values(gray_image, k)
re = reconstruction_error(gray_image, compressed_image)
errors.append(re)
plt.figure(figsize=(8,5))
plt.plot(k_values, errors, marker='o', linewidth=2)
plt.xlabel("k")
plt.ylabel("Reconstruction Error")
plt.title("Reconstruction Error vs. k")
plt.grid(True)
plt.show()
What value of $k$ do you think is enough for the image reconstruction. What is the benefit of this method?
Answer: From the reconstruction error vs. 𝑘 plot, we can see that the error decreases rapidly at first and then flattens out. This means that the first few hundred singular values already capture most of the important visual structure of the image, while the remaining singular values contribute very little. For this image, a 𝑘 around 50–100 is usually enough to reconstruct the image with good visual quality, as the details and overall appearance are preserved and the error becomes very small beyond this point.
The benefit of this method is that it allows us to achieve dimensionality reduction and compression. Instead of storing all pixels, we only need the top k singular values and vectors. This reduces storage and computation while still preserving the essential features of the image. In addition, the decomposition separates important structure (captured by large singular values) from noise or less relevant details (captured by small singular values).
Task 3: Separate the truth from chaos and distraction (Marks 4)¶
Perform background-foreground separation using SVD.
In this task you might need help. So, the archivist called an expert who is too old to code but wise enough to guide you.
The expert will guide you and you will have to code it. First lets welcom the expert, "Prof. Dhoomketu" from Dholakpur
video.ipython_display(width=300)
Moviepy - Building video __temp__.mp4. Moviepy - Writing video __temp__.mp4
Moviepy - Done ! Moviepy - video ready __temp__.mp4
In the above video, we want to focus on the building. But, the moving cars are distracting us. We want to remove them. Follow the instruction by Prof. Dhoomketu to complete this task
Prof. Dhoomketu: Currently we have video frames of shape (num_frames, height, width). Write a function that transforms this tensor to matrix of shape (height, width, num_frames) [there was typo here as announced in google classroom] (height * width, num_frames). Ensure that the order of the frames is preserved.
frames = video_to_array(video)
def tarnsform_video_frames(frames: np.array):
num_frames, h, w = frames.shape
# Reshape to (h*w, num_frames)
matrix = frames.reshape(num_frames, h*w).T
return matrix
matrix = tarnsform_video_frames(frames)
Prof. Dhoomketu: Reconstruct the matrix using SVD with with very few of the top singular values (2 or 4 ). And rearrange the matrix to the original frame shape.
## Write your code here and assign the results to the variables below based on Prof. Dhoomketu's instruction
reconstructed_matrix = compress_image_using_topk_singular_values(matrix,4)
reconstructed_frames = reconstructed_matrix.T.reshape(frames.shape)
Prof. Dhoomketu: Now visualize the first frames of the original and reconstructed videos. Just run the below code. I have brought the visualization codes for you.
display_images_in_row([frames[0], reconstructed_frames[0], frames[0] - reconstructed_frames[0]])
Prof. Dhoomketu: If you see that the cars are gone from the frames, it means you where able to successfully perform the separation of the distracting cars from the background. Run the following code to visualize the separated cars from the video
cars_only = frames - reconstructed_frames
cars_only = array_to_video(cars_only)
cars_only.ipython_display(width=300)
Moviepy - Building video __temp__.mp4. Moviepy - Writing video __temp__.mp4
Moviepy - Done ! Moviepy - video ready __temp__.mp4
Prof. Dhoomketu: If you where able to successfully separate the cars, congratulations! But, I want to check if you understand whats going on behind the scenes? Why do you think the SVD-based approach was effective in this case?
HINT: Complete the following code for hints. Resize the original matrix and reconstructed matrix to shape (200,200). Visualize the resized matrices to get the hint. You can use the following syntax for resizing:
resized_original = resize(original, (200,200),anti_aliasing=False)
When we resize and visualize the original and reconstructed matrices, we notice that:
The original matrix shows tiny pixel-level changes across rows, reflecting the presence of moving cars in different frames.
The low-rank reconstruction (few singular values) looks almost uniform across rows, keeping only the static structures (the background/building) while discarding the small changes (cars).
This happens because SVD captures the most redundant and correlated information in the top singular values, which correspond to the background that stays nearly constant across frames. The moving cars, being less correlated, only appear in smaller singular values and disappear when we truncate them.
We resized to 200×200 so that these global patterns could be visualized more clearly — the original matrix was too large to spot small variations, while the resized version highlights the contrast between stable background and fluctuating foreground.
## Write your code here
resized_original_matrix = resize(matrix,(200,200),anti_aliasing=False)
resized_reconstructed_matrix = resize(reconstructed_matrix,(200,200),anti_aliasing=False)
display_images_in_row([resized_original_matrix, resized_reconstructed_matrix])
Prof. Dhoomketu: Q1: What do you observe? Answer my previous question using the observation here to complete the assignment. Q2: Why did we resized the matrices for visualization?
Answer: In the resized original matrix, we observe small variations across rows that correspond to moving cars in different frames. However, in the low-rank reconstruction these variations vanish, since the static background remains largely constant across frames and can be described with only a few dominant singular values.
SVD works well here because it captures the redundant, correlated information (the background) in the top singular values, while the less correlated variations (moving cars) are pushed into the smaller singular values.
The original matrix is too large to visualize effectively, making it difficult to notice such subtle differences. Resizing to 200×200 simplifies the computation, emphasizes overall structure over pixel-level noise, and makes the contrast between static background and moving objects much clearer.