CSC2503—Foundations of Computer Vision

Assignment 1: Photometric Stereo

The purpose of this assignment is to construct a depth field from a series of images of an object from

the same viewpoint, but under different illumination conditions. We assume orthographic projection

of an object with Lambertian reflectance, illuminated by several distant point light sources. This

problem is widely known as Photometric Stereo.

Your goal is to 1) recover the light source directions, 2) estimate the surface normals at each pixel, 3)

estimate the albedo of the object at each pixel, and 4) estimate the depth (or height) at each pixel.

You will be given a collection of test images.¹ For each of six objects you will receive 12 images.

For a single object all the images are taken from the same camera position and viewing direction.

They are, however, taken under different point light sources; this is the key. In addition to the 12

images there is a mask image for each object that specifies those pixels to which the object projects

in the images. Finally, in addition to the test objects there are images taken of a chrome sphere

(from which you will estimate eight of the light source directions).

Get the handout code A1handout.zip from the course dropbox. A template for the solution is

solntemplate.m. This file and others in the zip file are used in the following questions.

What to hand in. Write a short report in PDF format addressing each of the questions below

(LaTeX- or Word-formatted reports are preferred, but hand-written reports scanned to PDF along with a small sample of result images are acceptable). The report should be self-contained

in that the marker should not have to run your Matlab code to be convinced your code is correct.

In your report you can assume that the marker knows the context of the questions, so do not spend

time repeating material in the hand-out or in class notes. Pack your PDF report and the completed

Matlab files into a zip file called A1.zip and submit it via the submit command on a CS Teaching

Lab computer. You need to have a Teaching Lab account to do this so make sure you get your account

set up well before the due date. Accounts for all registered students have already been set up (the

username is your UTORid).

1. Calibration. Before we can extract surface normals from images, we have to calibrate our capture

setup. We need to specify the transformation from the scene to image coordinates. We also need

to determine the power and direction of the various illuminants, as well as the camera response

function.

For this assignment, the camera response function has already been radiometrically calibrated.

In particular, the gamma correction has been undone, and any vignetting has been corrected for.

What this means is that you can treat pixel values as (proportional to) scene radiance. So for a Lambertian surface the image brightness Iν(~x) at pixel ~x, in each of the colour channels ν = R, G, B, is given by

Iν(~x) = aν(~x) ⌊~n(~x) · L~⌋.      (1)

Here aν(~x), for ν = R, G, B, is the albedo of the surface at pixel ~x (within the range [0, 255]), ~n(~x) is the surface normal, and ⌊u⌋ = max(u, 0) denotes rectification. Also L~ represents both the direction and intensity of the light source.

¹Many thanks to Steve Seitz at the University of Washington for providing the calibrated images.


We assume the strength of the different light sources has been balanced, and take ||L~ || = 1.

The only remaining task in calibrating the capture setup is therefore to estimate the light source

directions. The method we suggest is based on images of a shiny chrome sphere, in the same

location as all the other objects, illuminated with the same point sources as the other objects.

Since we know the object is spherical, we can determine the normal at any given point on its

surface, and therefore we can also compute the reflection direction for any spot on the surface.

Toward this end we will assume a particularly simple projection model. First, for simplicity, let’s

assume that the world coordinate frame and the camera coordinate frame are coincident, so that

the extrinsic mapping is the identity. And let’s assume a right-handed coordinate system (Y

points down, X points right, and the optical axis coincides with the positive Z-axis). Finally let's

assume an orthographic projection. Accordingly, the direction of the camera from any point on

the objects of interest in the scene is (0, 0, −1).
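Although the handout code is in Matlab, the reflection geometry above is easy to check independently. The following NumPy sketch (an illustration only, with hypothetical names; it is not the required fitChromeSphere.m, and it assumes the sphere centre, radius, and a highlight pixel have already been located from the mask and images) recovers a light direction from a single specular highlight:

```python
import numpy as np

def light_from_highlight(hx, hy, cx, cy, radius):
    """Light direction from a specular highlight at pixel (hx, hy) on a
    chrome sphere with image centre (cx, cy) and radius (in pixels).
    Under orthographic projection the sphere normal at the highlight is
    n = ((hx-cx)/r, (hy-cy)/r, nz), with nz < 0 so the normal faces the
    camera.  The mirror relation L = 2(n.v)n - v, with view direction
    v = (0, 0, -1), then gives the (unit) light direction."""
    nx = (hx - cx) / radius
    ny = (hy - cy) / radius
    nz = -np.sqrt(max(0.0, 1.0 - nx**2 - ny**2))  # camera-facing normal
    n = np.array([nx, ny, nz])
    v = np.array([0.0, 0.0, -1.0])                # direction toward camera
    L = 2.0 * np.dot(n, v) * n - v                # mirror reflection of v
    return L / np.linalg.norm(L)
```

A highlight exactly at the sphere centre gives ~n = (0, 0, −1) and hence L~ = (0, 0, −1), i.e. a light directly behind the camera; a highlight part-way out along the sphere's X-axis gives a light from the side. Both are useful sanity checks for your Matlab implementation.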

With these assumptions we can specify the mapping from points in the world, X~ = (X, Y, Z)^T, to the locations of image pixels, ~x = (x, y)^T, comprising the intrinsic and extrinsic transformations, as follows:

~x = [ s1  0   o1 ] [ 1 0 0 0 ] ( X~ )
     [ 0   s2  o2 ] [ 0 1 0 0 ] (  1 )      (2)
                    [ 0 0 0 1 ]

where s1 and s2 are determined by the pixel dimensions, and (o1, o2)^T is a simple coordinate shift of the origin (which we can take to be zero). Here we can use square pixels of unit size, so s1 = s2 = 1. With these choices the orthographic projection reduces to ~x = (x, y)^T = (X, Y)^T.

Your Task. Given this setup, complete the M-function fitChromeSphere.m so that it estimates

the light source directions from the eight images of the chrome sphere. (These are the same light

source directions for the first 8 images for each different object.) In your Matlab code avoid

looping over image pixels. You can test your code by running the script file solntemplate.m

(up to the first continue statement). This script calls getLightDir.m, which in turn calls

fitChromeSphere.m. In your report briefly describe the algorithm you used here and your reasons for choosing it. (If you have trouble with this question, default light source directions are

provided in getLightDir.m. These are incorrect, but will let you begin doing the other parts of

this assignment.)

2. Computing Surface Normals and Grey Albedo. Let’s begin with a grey-level image I

(perhaps just the average of the three colour channels, R, G, and B). Then, given the model

assumptions made above of diffuse objects and distant point light sources, the relationship between

the surface normal ~n(~x) and albedo a(~x) at some scene point ~p which projects to image pixel ~x is

given by

I(~x) = a(~x) ~n(~x) · L~.      (3)

This is a monochrome version of equation (1), with I(~x) denoting the image brightness at pixel

~x. Also, for convenience, we have dropped the rectification of the ~n · L~ term. Essentially we are

assuming that the light source is positioned such that no visible piece of the surface is oriented

away from the light source. This assumption may not hold exactly, but it will simplify our

estimation algorithms. Since our images are already balanced as described above, we assume the


incoming irradiance from each light is 1, that is ||L~ || = 1. Finally, it is also assumed that ~n(~x) is

a unit vector.

Given the lighting directions computed in question 1, equation (3) provides one constraint on

three unknowns, namely, the albedo a(~x) and two independent components of the direction of

~n(~x). With three images, taken under linearly independent lighting directions, we should be able

to uniquely constrain a(~x) and ~n(~x). With more than three images we may not be able to find an

exact solution, but we can formulate a least-squares estimator for the unknowns a(~x) and ~n(~x).

Initially, let the unknowns at each pixel be written as a single 3-vector, for example, let ~g(~x) =

a(~x)~n(~x). We therefore wish to estimate the vector ~g(~x) at pixel ~x. Toward this end, our objective

function for pixel ~x, given greyscale measurements Ij (~x) in image j, is

E(~x) = Σ_{j=1..8} ( Ij(~x) − ~g(~x) · L~j )²      (4)

Here j indexes just the first eight images, that is, the ones for which we obtained estimates of the light source directions L~j in question 1 above. The objective E(~x) is then minimized with respect to ~g(~x).

Your Task. Derive the solution by differentiating E(~x) with respect to the elements of ~g (for

a fixed pixel ~x) and then setting the derivatives to zero. Show your work here. Finally, given

~g(~x), it follows that the grey-level albedo a(~x) is just the magnitude of ~g(~x) and ~n(~x) is of course

the direction of ~g(~x). Implement your solution in the M-file fitReflectance.m, which has been

started in the handout code. In your Matlab code, avoid looping over image pixels. When you

complete this M-file, run the script file solntemplate.m up to the continue statement to look

at your result. The script file will display figures of your estimated grey albedo, the components

of the recovered normals, and the image reconstructions along with their errors. Comment on

the quality of the (grey-level) image reconstructions. What sort of modelling errors are most

apparent?
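As a cross-check on your derivation, note that the resulting per-pixel linear least-squares problem can be solved for all pixels at once with a single matrix product. A NumPy sketch (hypothetical array names and shapes, not the fitReflectance.m interface):

```python
import numpy as np

def fit_reflectance(I, L):
    """I: (P, J) grey values, one row per pixel, one column per image.
    L: (J, 3) matrix whose rows are the unit light directions.
    Each row g of G minimises sum_j (I[p, j] - g . L[j])^2, i.e. it is
    the least-squares solution of L g = I[p, :], computed here for all
    pixels at once via the pseudo-inverse of L.
    Returns per-pixel albedo a (P,) and unit normals n (P, 3)."""
    G = I @ np.linalg.pinv(L).T            # (P, 3): g = pinv(L) I, vectorised
    a = np.linalg.norm(G, axis=1)          # albedo is the magnitude of g
    n = G / np.maximum(a, 1e-12)[:, None]  # unit normal is the direction of g
    return a, n
```

The Matlab version should be equally free of loops over pixels.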

3. Computing RGB Albedo and Relighting. Above we computed the normal and albedo for

a monochrome (or grayscale) image. We can now use that normal to compute the albedo for all

three colour channels. You can formulate this as a least-squares estimator for the albedo aν(~x)

for pixel ~x and color channel ν, minimizing

Eν(~x) = Σ_{j=1..8} ( Iν(~x) − aν(~x) ~n(~x) · L~j )²      (5)

with ν ranging over R, G, and B, where all quantities are known except of course for aν(~x).

Your Task. To minimize equation (5), differentiate Eν(~x) with respect to aν(~x), set the derivative

to zero and solve. Show your work here. Complete the corresponding section of the script file

solntemplate.m. When this is completed the code should display your estimated albedo (now in

colour, check this on the cat, owl, or rock image sets), and will synthesize new images from novel

light source positions. Comment on the quality of the newly synthesized images. (E.g., do they

appear realistic?) Supposing that the recovered albedos and surface normals are correct, what

are some of the main remaining modelling errors in generating these synthetically lit images?
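Since the unknown in equation (5) is a single scalar per pixel and channel, the estimator reduces to a ratio of two sums over the images. A NumPy sketch (hypothetical names; your report should still show the derivation that justifies this form):

```python
import numpy as np

def fit_albedo(I_nu, n, L):
    """I_nu: (P, J) brightness of each masked pixel in one colour channel
    across the J images.  n: (P, 3) unit normals from question 2.
    L: (J, 3) unit light directions from question 1.
    Minimising sum_j (I_j - a * n.L_j)^2 over the scalar a gives, per
    pixel,  a = sum_j I_j (n.L_j) / sum_j (n.L_j)^2."""
    s = n @ L.T                                    # (P, J) shading n . L_j
    return (I_nu * s).sum(axis=1) / (s**2).sum(axis=1)
```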


4. Surface Fitting. Finally we want to find a surface which has (roughly) these normals. Due to

noise in the estimated normals, we cannot simply integrate the surface normals along particular

paths to obtain the surface. The trouble is that, due to noise, the value we get will typically be

path dependent. Therefore, rather than attempting to compute a unique surface that is exactly

consistent with all the normals, we will use a simple least-squares estimator that is approximately

consistent with the normals.

To formulate the problem, let's first express the surface as S~(X, Y) = (X, Y, Z(X, Y))^T, in world coordinates. We can then write two surface tangent vectors as

~τX = ∂S~/∂X = (1, 0, ZX)^T      (6)

~τY = ∂S~/∂Y = (0, 1, ZY)^T      (7)

where ZX = ∂Z/∂X and ZY = ∂Z/∂Y. The surface normal, a unit vector in the direction of the cross product ~τY × ~τX (with this order chosen so that the normal faces the camera), is therefore given by

~n(X, Y) = (ZX, ZY, −1)^T / ||(ZX, ZY, −1)||      (8)

We already estimated this normal ~n(~x) at every pixel in question 2 above.

To formulate our constraints on depth, we exploit equation (2), and write the depth derivatives, ∂Z/∂(X, Y) = (ZX, ZY), as

∂Z/∂(X, Y) = ∂Z/∂(x, y) ∂(x, y)/∂(X, Y) = ∂Z/∂(x, y) [ s1 0 ; 0 s2 ] = ( s1 ∂Z/∂x , s2 ∂Z/∂y )      (9)

Here we denote the pixel ~x as the 2-vector (x, y)^T. Next, we use first-order forward differences to

approximate the depth derivatives, yielding

ZX ≈ s1 (Z(x+1, y) − Z(x, y)) (10)

ZY ≈ s2 (Z(x, y+1) − Z(x, y)) (11)

We then substitute these two approximations into equations (6) and (7) to obtain approximate

tangent vectors. Ignoring image noise and errors in the numerical differentiation, we know that

these tangent vectors are perpendicular to the surface normal; i.e.,

~τX(x, y) · ~n(x, y) = 0 (12)

~τY (x, y) · ~n(x, y) = 0 (13)

These equations constrain the unknown depths Z using the known normal vectors. However,

given any solution Z(x, y) it follows that Z(x, y) + c is also a solution for any constant c. To

constrain this remaining degree of freedom, set Z(x0, y0) = 0 at one pixel (x0, y0).
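Writing ~n = (n1, n2, n3)^T and substituting the approximate tangents from (10) and (11) into (12) and (13), each pair of constraints becomes a pair of linear equations in the unknown depths:

n1 + n3 s1 (Z(x+1, y) − Z(x, y)) = 0

n2 + n3 s2 (Z(x, y+1) − Z(x, y)) = 0

Each such equation contributes one row of the matrix A below, with two nonzero entries ±n3 sk, and one entry of the vector ~v, namely −n1 or −n2.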

With some algebraic manipulation, one can rewrite the collection of constraints for pixels within

the object mask as

A ~z = ~v , (14)


where ~z is a vector comprising the values in the depth map Z(x, y), the entries of the matrix A include the third component of the normal at each pixel (on the diagonal and a few off-diagonals), and the vector ~v comprises the first and second elements of the normal vector at each pixel. For an N × M image the matrix A has NM columns and just under 2NM rows, so it is very large. Fortunately, most of its entries are zero, so you should exploit Matlab's sparse matrix representations and solvers. Given A and ~v, the Matlab backslash operation A\~v computes the least-squares solution for ~z; for a large sparse system Matlab dispatches to efficient sparse solvers.
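For intuition about how A and ~v are assembled, here is a SciPy sketch (illustrative only: it assumes unit pixel sizes s1 = s2 = 1, ignores the object mask, anchors Z at the corner pixel, and loops over pixels for clarity, whereas your Matlab getDepthFromNormals.m must respect the mask and avoid such loops):

```python
import numpy as np
from scipy import sparse
from scipy.sparse.linalg import lsqr

def depth_from_normals(n):
    """n: (H, W, 3) unit normals (n1, n2, n3), with s1 = s2 = 1.
    Each constraint tau_X . n = 0 or tau_Y . n = 0 becomes
        n3 (Z(x+1, y) - Z(x, y)) = -n1
        n3 (Z(x, y+1) - Z(x, y)) = -n2
    plus the anchor Z(0, 0) = 0.  Solves A z = v in the least-squares
    sense with a sparse iterative solver."""
    H, W = n.shape[:2]
    idx = np.arange(H * W).reshape(H, W)        # column index of Z(x, y)
    rows, cols, vals, v = [], [], [], []
    r = 0
    for y in range(H):
        for x in range(W):
            n1, n2, n3 = n[y, x]
            if x + 1 < W:                        # tau_X constraint row
                rows += [r, r]; cols += [idx[y, x + 1], idx[y, x]]
                vals += [n3, -n3]; v.append(-n1); r += 1
            if y + 1 < H:                        # tau_Y constraint row
                rows += [r, r]; cols += [idx[y + 1, x], idx[y, x]]
                vals += [n3, -n3]; v.append(-n2); r += 1
    rows.append(r); cols.append(idx[0, 0]); vals.append(1.0); v.append(0.0)
    A = sparse.coo_matrix((vals, (rows, cols)), shape=(r + 1, H * W)).tocsr()
    z = lsqr(A, np.array(v))[0]
    return z.reshape(H, W)
```

On a planar surface (whose normals the forward differences represent exactly) this recovers the depth map up to the anchored constant, which makes a convenient unit test for your own implementation.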

Your Task. Complete the M-file getDepthFromNormals.m so that it uses the above algorithm

to estimate the depths Z(~x). The handout code should then display your estimate. Describe, in

no more than a few paragraphs, your assessment of where the technique works well, and where

there are failures. Where the technique fails to produce nice results, explain as best you can the likely causes of the problems. Can you formulate ways to fix these problems?

5. BONUS [for 2 extra marks]: Estimate Light Source Direction Given Scene Properties.

The image data set for each object includes four additional images taken from the same viewpoint

as before, but with unknown light source directions L~ . These are the last four images in each

data set, and for these we do not have corresponding images of the chrome sphere.

Your Task. Formulate a simple minimization problem to estimate both the direction and magnitude of L~ for each of these four additional images, given your results for a(~x) and ~n(~x) from question 2 above. Your write-up should clearly describe this minimization problem, and the algorithm you used to solve it. Implement the algorithm in the appropriate part of solntemplate.m.

Your recovered light sources should be stored as columns in the 3 × 4 array Lrec.
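One natural formulation is least squares over the masked pixels: with ~g(~x) = a(~x) ~n(~x) known from question 2, minimize E(L~) = Σ~x (I(~x) − ~g(~x) · L~)², which is linear in the three unknown components of L~. A NumPy sketch (hypothetical names, not the handout interface):

```python
import numpy as np

def estimate_light(I, g):
    """I: (P,) grey values at the masked pixels in one novel image.
    g: (P, 3) known products a(x) n(x) from question 2.
    Minimising sum_p (I[p] - g[p] . L)^2 over L is a 3-parameter linear
    least-squares problem; lstsq returns L, whose norm is the intensity
    and whose direction is the light direction."""
    L, *_ = np.linalg.lstsq(g, I, rcond=None)
    return L
```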

In addition, suppose one of the novel images was actually illuminated by two separate, distant,

point light sources of reduced intensity. If you ignore self-shadowing and the overall brightness,

could you estimate both light source directions from such an image? If so, explain how. If not,

what could you estimate about the two light source directions? Explain. (This is a theoretical question; you do not have to program anything for this last part. Although, if you are curious,

you can try this by computing the average of any two images for a particular data set. This will

provide a very good approximation of the image of the object under both illuminants, each with

reduced intensity.)

