## Description

HW2

1 CSE 252A Computer Vision I – Homework 2

1.1 Instructions

• All solutions must be written in this notebook

• Programming aspects of this assignment must be completed using Python in this notebook.

• If you want to modify the skeleton code, you can do so. This has been provided just to

provide you with a framework for the solution.

• You may use python packages for basic linear algebra (you can use numpy or scipy for basic

operations), but you may not use packages that directly solve the problem.

• If you are unsure about using a specific package or function, then ask the instructor and

teaching assistants for clarification.

• You must submit this notebook exported as a pdf. You must also submit this notebook as

.ipynb file.

• You must submit both files (.pdf and .ipynb) on Gradescope. You must mark each problem

on Gradescope in the pdf.

1.2 Problem 1: Perspective Projection and Homogenous Coordinates [10 pts]

1.2.1 Part 1 [5 pts]

Consider a perspective projection where a point

P = [x y z]

T

1

is projected onto an image plane Π′

represented by k = f

′ 0 as shown in the following figure.

The first second and third coordinate axes are denoted by i, j, k respectively.

Consider the projection of two rays in the world coordinate system

Q1 = [7 -3 1] + t[8 2 4]

Q2 = [2 -5 9] + t[8 2 4]

where −∞ ≤ t ≤ −1.

Calculate the coordinates of the endpoints of the projection of the rays onto the image plane.

Identify the vanishing point based on the coordinates.

1.2.2 Part 2 [3 pts]

Prove that all parallel lines have the same vanishing point.

1.2.3 Part 3 [2 pts]

Show that the use of homogenous coordinates can convert an affine transformation into that of a

linear one. Recall that an affine transformation of any vector x is described by Ax + b.

1.3 Problem 2: Image Formation and Rigid Body Transformations [10 points]

In this problem we will practice rigid body transformations and image formations through the

projective camera model. The goal will be to photograph the following four points

AP1 = [-1 -0.5 2]

T

,

AP2 = [1 -0.5 2]

T

,

AP3 = [1 0.5 2]

T

,

AP4 = [-1 0.5 2]

T

To do this we will need two matrices. Recall, first, the following formula for rigid body transformation

BP =

B

AR

AP +

BOA

2

Where BP is the point coordinate in the target (B) coordinate system. AP is the point coordinate in

the source (A) coordinate system. B

AR is the rotation matrix from A to B, and BOA is the origin of

the coordinate system A expressed in B coordinates.

The rotation and translation can be combined into a single 4 × 4 extrinsic parameter matrix,

Pe

, so that BP = Pe

·

AP.

Once transformed, the points can be photographed using the intrinsic camera matrix, Pi which

is a 3 × 4 matrix.

Once these are found, the image of a point, AP, can be calculated as Pi

· Pe

·

AP.

We will consider four different settings of focal length, viewing angles and camera positions

below. For each of these calculate:

a) Extrinsic transformation matrix,

b) Intrinsic camera matrix under the perspective camera assumption.

c) Calculate the image of the four vertices and plot using the supplied functions

Your output should look something like the following image (Your output values might not

match, this is just an example)

Sample Plots

1. [No rigid body transformation]. Focal length = 1. The optical axis of the camera is aligned

with the z-axis.

2. [Translation]. BOA = [0 0 1]

T

. Focal length = 1. The optical axis of the camera is aligned with

the z-axis.

3. [Translation and Rotation]. Focal length = 1. B

AR encodes a 30 degrees around the z-axis and

then 60 degrees around the y-axis. BOA = [0 0 1]

T

.

4. [Translation and Rotation, long distance]. Focal length = 5. B

AR encodes a 30 degrees around

the z-axis and then 60 degrees around the y-axis. BOA = [0 0 13]

T

.

You can refer the Richard Szeliski starting page 36 for image formation and the extrinsic matrix.

Intrinsic matrix calculation for perspective camera models was covered in class and

can be referred in slide 2

https://cseweb.ucsd.edu/classes/fa19/cse252A-a/lec2.pdf

You can also refer lecture 3 of the previous year’s course as well for further information

3

if you wish!

http://cseweb.ucsd.edu/classes/fa18/cse252A-a/lec3.pdf

We will not use a full intrinsic camera matrix (e.g. that maps centimeters to pixels, and defines

the coordinates of the center of the image), but only parameterize this with f , the focal length. In

other words: the only parameter in the intrinsic camera matrix under the perspective assumption

is f .

[ ]: import numpy as np

import matplotlib.pyplot as plt

import math

# convert points from euclidian to homogeneous

def to_homog(points): #here always remember that points is a 3×4 matrix

# write your code here

# convert points from homogeneous to euclidian

def from_homog(points_homog):

# write your code here

# project 3D euclidian points to 2D euclidian

def project_points(P_int, P_ext, pts):

# write your code here

return from_homog(pts_final)

[ ]: # Change the three matrices for the four cases as described in the problem

# in the four camera functions geiven below. Make sure that we can see the␣

,→formula

# (if one exists) being used to fill in the matrices. Feel free to document␣

,→with

# comments any thing you feel the need to explain.

def camera1():

# write your code here

return P_int_proj, P_ext

def camera2():

# write your code here

return P_int_proj, P_ext

4

def camera3():

# write your code here

return P_int_proj, P_ext

def camera4():

# write your code here

return P_int_proj, P_ext

# Use the following code to display your outputs

# You are free to change the axis parameters to better

# display your quadrilateral but do not remove any annotations

def plot_points(points, title=”, style=’.-r’, axis=[]):

inds = list(range(points.shape[1]))+[0]

plt.plot(points[0,inds], points[1,inds],style)

for i in range(len(points[0,inds])):

plt.annotate(str(“{0:.3f}”.format(points[0,inds][i]))+”,”+str(“{0:.3f}”.

,→format(points[1,inds][i])),(points[0,inds][i], points[1,inds][i]))

if title:

plt.title(title)

if axis:

plt.axis(axis)

plt.tight_layout()

def main():

point1 = np.array([[-1,-.5,2]]).T

point2 = np.array([[1,-.5,2]]).T

point3 = np.array([[1,.5,2]]).T

point4 = np.array([[-1,.5,2]]).T

points = np.hstack((point1,point2,point3,point4))

for i, camera in enumerate([camera1, camera2, camera3, camera4]):

P_int_proj, P_ext = camera()

plt.subplot(2, 2, i+1)

plot_points(project_points(P_int_proj, P_ext, points), title=’Camera %d␣

,→Projective’%(i+1), axis=[-0.6,2.5,-0.75,0.75])

plt.show()

main()

5

1.4 Problem 3: Homography [12 pts]

Consider a vision application in which components of the scene are replaced by components from

another image scene.

In this problem, we will implement partial functionality of a smartphone camera scanning

application (Example: CamScanner) that, in case you’ve never used before, takes pictures of documents and transforms it by warping and aligning to give an image similar to one which would’ve

been obtained through using a scanner.

The transformation can be visualized by imagining the use of two cameras forming an image

of a scene with a document. The scene would be the document you’re trying to scan placed

on a table and one of the cameras would be your smart phone camera, forming the image that

you’ll be uploading and using in this assignment. There can also be an ideally placed camera,

oriented in the world in such a way that the image it forms of the scene has the document perfectly

algined. While it is unlikely you can hold your phone still enough to get such an image, we can

use homography to transform the image you take into the image that the ideally placed camera

would have taken.

This digital replacement is accomplished by a set of corresponding points for the document in

both the source (your picture) and target (the ideal) images. The task then consists of mapping

the points from the source to their respective points in the target image. In the most general case,

there would be no constraints on the scene geometry, making the problem quite hard to solve.

If, however, the scene can be approximated by a plane in 3D, a solution can be formulated much

more easily even without the knowledge of camera calibration parameters.

To solve this section of the homework, you will begin by understanding the transformation

that maps one image onto another in the planar scene case. Then you will write a program that

implements this transformation and use it to warp some document into a well aligned document

(See the given example to understand what we mean by well aligned).

To begin with, we consider the projection of planes in images. imagine two cameras C1 and

C2 looking at a plane π in the world. Consider a point P on the plane π and its projection p =

[u1, v1, 1]

T

in the image 1 and q = [u2, v2, 1]

T

in image 2.

There exists a unique, upto scale, 3 × 3 matrix H such that, for any point P:

q ≈ H p

Here ≈ denotes equality in homogeneous coordinates, meaning that the left and right hand sides

are proportional. Note that H only depends on the plane and the projection matrices of the two

cameras.

The interesting thing about this result is that by using H we can compute the image of P that

would be seen in the camera with center C2 from the image of the point in the camera with center

at C1, without knowing the three dimensional location. Such an H is a projective transformation

of the plane, called a homography.

In this problem, complete the code for computeH and warp functions that can be used in the

skeletal code that follows.

There are three warp functions to implement in this assignment, example ouputs of which are

shown below. In warp1, you will create a homography from points in your image to the target

image (Mapping source points to target points). In warp2, the inverse of this process will be done.

In warp3, you will create a homography between a given image and your image, replacing your

document with the given image.

6

1.

2.

3.

7

4. In the context of this problem, the source image refers to the image of a document you take

that needs to be replaced into the target.

5. The target image can start out as an empty matrix that you fill out using your code.

6. You will have to implement the computeH function that computes a homography. It takes

in exactly four point correspondences between the source image and target image in homogeneous coordinates respectively and returns a 3 × 3 homography matrix.

7. You will also have to implement the three warp functions in the skeleton code given and

plot the resultant image pairs. For plotting, make sure that the target image is not smaller

than the source image.

Note: We have provided test code to check if your implementation for computeH is correct.

All the code to plot the results needed is also provided along with the code to read in the images

and other data required for this problem. Please try not to modify that code.

You may find following python built-ins helpful: numpy.linalg.svd, numpy.meshgrid

[ ]: import numpy as np

from PIL import Image

import matplotlib.pyplot as plt

# load image to be used – resize to make sure it’s not too large

# You can use the given image as well

# A large image will make testing you code take longer; once you’re satisfied␣

,→with your result,

# you can, if you wish to, make the image larger (or till your computer memory␣

,→allows you to)

source_image = np.array(Image.open(“photo.jpg”))/255

# display images

plt.imshow(source_image)

# Align the polygon such that the corners align with the document in your␣

,→picture

# This polygon doesn’t need to overlap with the edges perfectly, an␣

,→approximation is fine

# The order of points is clockwise, starting from bottom left.

x_coords = [0,0,0,0]

y_coords = [0,0,0,0]

# Plot points from the previous problem is used to draw over your image

# Note that your coordinates will change once you resize your image again

source_points = np.vstack((x_coords, y_coords))

plot_points(source_points)

plt.show()

8

print (source_image.shape)

[ ]: def computeH(source_points, target_points):

# returns the 3×3 homography matrix such that:

# np.matmul(H, source_points) = target_points

# where source_points and target_points are expected to be in homogeneous

# make sure points are 3D homogeneous

assert source_points.shape[0]==3 and target_points.shape[0]==3

#Your code goes here

H_mtx = np.zeros((3,3)) #Fill in the H_mtx with appropriate values.

return H_mtx

#######################################################

# test code. Do not modify

#######################################################

def test_computeH():

source_points = np.array([[0,0.5],[1,0.5],[1,1.5],[0,1.5]]).T

target_points = np.array([[0,0],[1,0],[2,1],[-1,1]]).T

H = computeH(to_homog(source_points), to_homog(target_points))

mapped_points = from_homog(np.matmul(H,to_homog(source_points)))

print (from_homog(np.matmul(H,to_homog(source_points[:,1].reshape(2,1)))))

plot_points(source_points,style=’.-k’)

plot_points(target_points,style=’*-b’)

plot_points(mapped_points,style=’.:r’)

plt.show()

print(‘The red and blue quadrilaterals should overlap if ComputeH is␣

,→implemented correctly.’)

test_computeH()

[ ]: def warp(source_img, source_points, target_size):

# Create a target image and select target points to create a homography␣

,→from source image to target image,

# in other words map all source points to target points and then create

# a warped version of the image based on the homography by filling in the␣

,→target image.

# Make sure the new image (of size target_size) has the same number of␣

,→color channels as source image

assert target_size[2]==source_img.shape[2]

#Your code goes here

return target_img

# Use the code below to plot your result

result = warp(source_image, source_points, (0,0,0)) #Choose appropriate target␣

,→size

9

plt.subplot(1, 2, 1)

plt.imshow(source_image)

plt.subplot(1, 2, 2)

plt.imsave(“myop.png”,result)

plt.imshow(result)

plt.show()

The output of warp1 of your code probably has some striations or noise. The larger you make

your target image, the less it will resemble the document in the source image. Why is this happening?

To fix this, implement warp2, by creating an inverse homography matrix and fill in the target

image.

[ ]: def warp2(source_img, source_points, target_size):

# Create a target image and select target points to create a homography␣

,→from target image to source image,

# in other words map each target point to a source point, and then create a␣

,→warped version

# of the image based on the homography by filling in the target image.

# Make sure the new image (of size target_size) has the same number of␣

,→color channels as source image

#Your code goes here

return target_img

# Use the code below to plot your result

result = warp2(source_image, source_points, (0,0,0)) #Choose appropriate size

plt.subplot(1, 2, 1)

plt.imshow(source_image)

plt.subplot(1, 2, 2)

plt.imshow(result)

plt.imsave(“warp2.png”,result)

plt.show()

Try playing around with the size of your target image in warp1 versus in warp2, additionally

you can also implement nearest pixel interpolation or bi-linear interpolations and see if that makes

a difference in your output.

In warp3, you’ll be replacing the document in your image with a provided image. Read in

“ucsd_logo.png” as the source image, keeping your document as the target.

[ ]: # Load the given UCSD logo image

def warp3(target_image, target_points, source_image):

#Your code goes here

return target_image

# Use the code below to plot your result

result1 = warp3(target_image, target_points, source_image)

10

plt.subplot(1, 2, 1)

plt.imshow(source_image2)

plt.subplot(1, 2, 2)

plt.imshow(result1)

plt.imsave(“warp3.png”,result1)

plt.show()

1.5 Problem 4: Surface Rendering [18 pts]

In this portion of the assignment we will be exploring different methods of approximating local

illumination of objects in a scene. This last section of the homework will be an exercise in rendering

surfaces. Here, you need use the surface normals and the masks from the provided pickle files,

with various light sources, different materials, and using a number of illumination models. For the

sake of simplicity, multiple reflections of light rays, and occlusion of light rays due to object/scene

can be ignored.

1.5.1 Data

The surface normals and masks are to be loaded from the respective pickle files. For comparison,

You should display the rendering results for both normals calculated from the original image and

the diffuse components. There are 2 images that we will be playing with namely one of a sphere

and the other of a pear.

Assume that the albedo map is uniform.

1.5.2 Lambertian Illumination

One of the simplest models available to render 3D objections with illumination is the Lambertian

model. This model finds the apparent brightness to an observer using the direction of the light

source L and the normal vector on the surface of the object N. The brightness intensity at a given

point on an object’s surface, Id, with a single light source is found using the following relationship:

Id = L · N(IlC)

where, C and Il are the the color and intensity of the light source respectively.

1.5.3 Phong Illumination

One major drawback of Lambertian illumination is that it only considers the diffuse light in its

calculation of brightness intensity. One other major component to illumination rendering is the

specular component. The specular reflectance is the component of light that is reflected in a single

direction, as opposed to all directions, which is the case in diffuse reflectance. One of the most

used models to compute surface brightness with specular components is the Phong illumination

model. This model combines ambient lighting, diffused reflectance as well as specular reflectance

to find the brightness on a surface. Phong shading also considers the material in the scene which

is characterized by four values: the ambient reflection constant (ka), the diffuse reflection constant

(kd

), the specular reflection constant (ks) and α the Phong constant, which is the ‘shininess’ of an

object. Furthermore, since the specular component produces ‘rays’, only some of which would be

observed by a single observer, the observer’s viewing direction (V) must also be known. For some

11

scene with known material parameters with M light sources the light intensity Iphong on a surface

with normal vector N seen from viewing direction V can be computed by:

Iphong = kaIa + ∑

m∈M

{kd(Lm · N)Im,d + ks(Rm · V)

α

Im,s} ,

Rm = 2N(Lm · N) − Lm,

where Ia, is the color and intensity of the ambient lighting, Im,d and Im,s are the color values for

the diffuse and specular light of the mth light source.

1.5.4 Rendering

Please complete the following:

1. Write the function lambertian() that calculates the Lambertian light intensity given the

light direction L with color and intensity C and Il = 1, and normal vector N. Then use this

function in a program that calculates and displays the specular sphere and the pear using

each of the two lighting sources found in Table 1. Note: You do not need to worry about material

coefficients in this model.

2. Write the function phong() that calculates the Phong light intensity given the material constants (ka, kd

, ks

, α), V = (0, 0, 1)

⊤, N and some number of M light sources. Then use this

function in a program that calculates and displays the specular sphere and the pear using

each of the sets of coefficients found in Table 2 with each light source individually, and both

light sources combined.

Hint: To avoid artifacts due to shadows, ensure that any negative intensities found are set to zero.

Table 1: Light Sources

m Location Color (RGB)

1 (−1

3

,

1

3

,

1

3

)

⊤ (1, 1, 1)

2 (1, 0, 0)

⊤ (1, .5, .5)

Table 2: Material Coefficients

Mat. ka kd ks α

1 0 0.1 0.75 5

2 0 0.5 0.1 5

3 0 0.5 0.5 10

1.5.5 Part 1. Loading pickle files and plotting the normals [4 pts] (Sphere – 2pts, Pear – 2pts)

In this first part, you are required to work with 2 images, one of a sphere and the other one of a

pear. The pickle file normals.pickle is a list consisting of 4 numpy matrices which are

1) Normal Vectors for the sphere with specularities removed (Diffuse component)

2) Normal Vector for the sphere

3) Normal Vectors for the pear with specularities removed (Diffuse component)

12

4) Normal vectors for the pear

Please load the normals and plot them using the function plot_normals which is provided.

[ ]: def plot_normals(diffuse_normals, original_normals):

# Stride in the plot, you may want to adjust it to different images

stride = 5

normalss = diffuse_normals

normalss1 = original_normals

print(“Normals:”)

print(“Diffuse”)

# showing normals as three separate channels

figure = plt.figure()

ax1 = figure.add_subplot(131)

ax1.imshow(normalss[…, 0])

ax2 = figure.add_subplot(132)

ax2.imshow(normalss[…, 1])

ax3 = figure.add_subplot(133)

ax3.imshow(normalss[…, 2])

plt.show()

print(“Original”)

figure = plt.figure()

ax1 = figure.add_subplot(131)

ax1.imshow(normalss1[…, 0])

ax2 = figure.add_subplot(132)

ax2.imshow(normalss1[…, 1])

ax3 = figure.add_subplot(133)

ax3.imshow(normalss1[…, 2])

plt.show()

[ ]: #Plot the normals for the sphere and pear for both the normal and diffuse␣

,→components.

#1 : Load the different normals

# LOAD HERE

#2 : Plot the normals using plot_normals

#What do you observe? What are the differences between the diffuse component␣

,→and the original images shown?

#PLOT HERE

1.5.6 Part 2. Lambertian model [6 pts]

Fill in your implementation for the rendered image using the lambertian model.

[ ]: def normalize(img):

assert img.shape[2] == 3

maxi = img.max()

13

mini = img.min()

return (img – mini)/(maxi-mini)

[ ]: def lambertian(normals, lights, color, intensity, mask):

”’Your implementation”’

image = np.ones((normals.shape[0], normals.shape[1], 3))

return (image)

Plot the rendered results for both the sphere and the pear for both the original and the diffuse

components. Remember to first load the masks from the masks.pkl file. The masks.pkl file is a list

consisting of 2 numpy arrays1)Mask for the sphere

2)Mask for the pear

Remember to plot the normalized image using the function normalize which is provided.

[ ]: # Load the masks for the sphere and pear

# LOAD HERE

# Output the rendering results for Pear

dirn1 = np.array([[1.0],[0],[0]])

color1 = np.array([[1],[.5],[.5]])

dirn2 = np.array([[-1.0/3],[1.0/3],[1.0/3]])

color2 = np.array([[1],[1],[1]])

#Display the rendering results for pear for both diffuse and for both the light␣

,→sources

[ ]: # Output the rendering results for Sphere

dirn1 = np.array([[1.0],[0],[0]])

color1 = np.array([[1],[.5],[.5]])

dirn2 = np.array([[-1.0/3],[1.0/3],[1.0/3]])

color2 = np.array([[1],[1],[1]])

#Display the rendering results for sphere for both diffuse and for both the␣

,→light sources

1.5.7 Part 3. Phong model [8 pts]

Please fill in your implementation for the Phong model below.

[ ]: def phong(normals, lights, color, material, view, mask):

”’Your implementation”’

return (image)

With the function completed, plot the rendering results for the sphere and pear (both diffuse

and original compnents) for all the materials and light sources and also with the combination of

both the light sources.

[ ]: # Output the rendering results for sphere

view = np.array([[0],[0],[1]])

material = np.array([[0.1,0.75,5],[0.5,0.1,5],[0.5,0.5,10]])

lightcol1 = np.array([[-1.0/3,1],[1.0/3,1],[1.0/3,1]])

14

lightcol2 = np.array([[1,1],[0,0.5],[0,0.5]])

#Display rendered results for sphere for all materials and light sources and␣

,→combination of light sources

[ ]: # Output the rendering results for the pear.

view = np.array([[0],[0],[1]])

material = np.array([[0.1,0.75,5],[0.5,0.1,5],[0.5,0.5,10]])

lightcol1 = np.array([[-1.0/3,1],[1.0/3,1],[1.0/3,1]])

lightcol2 = np.array([[1,1],[0,0.5],[0,0.5]])

#Display rendered results for pear for all materials and light sources and␣

,→combination of light sources

15