The goals / steps of this project are the following:
- Perform a Histogram of Oriented Gradients (HOG) feature extraction on a labeled training set of images and train a Linear SVM classifier
- Optionally, you can also apply a color transform and append binned color features, as well as histograms of color, to your HOG feature vector.
- Note: for those first two steps don't forget to normalize your features and randomize a selection for training and testing.
- Implement a sliding-window technique and use your trained classifier to search for vehicles in images.
- Run your pipeline on a video stream (start with the test_video.mp4 and later implement on full project_video.mp4) and create a heat map of recurring detections frame by frame to reject outliers and follow detected vehicles.
- Estimate a bounding box for vehicles detected.
1. Explain how (and identify where in your code) you extracted HOG features from the training images.
The code for getting the HOG features was taken directly from the coursework.
```python
# Define a function to return HOG features and visualization.
# vis == False means we do not want an image back; True produces an output image.
def get_hog_features(img, orient, pix_per_cell, cell_per_block,
                     vis=False, feature_vec=True):
    if vis:
        features, hog_image = hog(img,
                                  orientations=orient,
                                  pixels_per_cell=(pix_per_cell, pix_per_cell),
                                  cells_per_block=(cell_per_block, cell_per_block),
                                  transform_sqrt=False,
                                  visualise=vis,  # spelled `visualize` in newer scikit-image
                                  feature_vector=feature_vec)
        return features, hog_image
    else:
        features = hog(img,
                       orientations=orient,
                       pixels_per_cell=(pix_per_cell, pix_per_cell),
                       cells_per_block=(cell_per_block, cell_per_block),
                       transform_sqrt=False,
                       visualise=vis,
                       feature_vector=feature_vec)
        return features
```
I visualize the returned HOG images after cell 5.
```python
# Choose random indices into the car and non-car image lists read in earlier
car_ind = np.random.randint(0, len(cars))
notcar_ind = np.random.randint(0, len(notcars))

# Read in a car and a non-car image
car_image = mpimg.imread(cars[car_ind])
notcar_image = mpimg.imread(notcars[notcar_ind])

color_space = 'RGB'
orient = 9
pix_per_cell = 8
cell_per_block = 2
hog_channel = 0
spatial_size = (16, 16)
hist_bins = 16
spatial_feat = True
hist_feat = True
hog_feat = True

car_features, car_hog_image = single_img_features(car_image,
                                                  color_space=color_space,
                                                  spatial_size=spatial_size,
                                                  hist_bins=hist_bins,
                                                  orient=orient,
                                                  pix_per_cell=pix_per_cell,
                                                  cell_per_block=cell_per_block,
                                                  hog_channel=hog_channel,
                                                  spatial_feat=spatial_feat,
                                                  hist_feat=hist_feat,
                                                  hog_feat=hog_feat,
                                                  vis=True)

notcar_features, notcar_hog_image = single_img_features(notcar_image,
                                                        color_space=color_space,
                                                        spatial_size=spatial_size,
                                                        hist_bins=hist_bins,
                                                        orient=orient,
                                                        pix_per_cell=pix_per_cell,
                                                        cell_per_block=cell_per_block,
                                                        hog_channel=hog_channel,
                                                        spatial_feat=spatial_feat,
                                                        hist_feat=hist_feat,
                                                        hog_feat=hog_feat,
                                                        vis=True)

images = [car_image, car_hog_image, notcar_image, notcar_hog_image]
titles = ['car image ' + str(car_image.shape), 'car HOG image',
          'notcar image ' + str(notcar_image.shape), 'notcar HOG image']
fig = plt.figure(figsize=(12, 3))
visualize(fig, 1, 4, images, titles)
```
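The `visualize` call at the end lays the four images out in a row. That helper comes from the walkthrough and is defined in an earlier cell of my notebook; a minimal sketch of what it does, assumed from its call signature above (single-channel images such as the HOG visualizations get a heat-style colormap):

```python
import matplotlib.pyplot as plt

# Sketch of the walkthrough-style plotting helper (assumed from how it is
# called above): lay `imgs` out on a rows x cols grid with `titles`.
def visualize(fig, rows, cols, imgs, titles):
    for i, img in enumerate(imgs):
        plt.subplot(rows, cols, i + 1)
        plt.title(titles[i])
        if len(img.shape) < 3:
            plt.imshow(img, cmap='hot')  # single-channel (e.g. a HOG image)
        else:
            plt.imshow(img)
```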
The output looks like the images below.
2. Explain how you settled on your final choice of HOG parameters.
I then explored different color spaces on random images from each of the two classes and displayed them to get a feel for what the skimage.hog() output looks like. The other settings were kept the same, based on the coursework and the walkthrough video.
```python
color_space = ''  # substituted with each color space listed below
orient = 9
pix_per_cell = 8
cell_per_block = 2
hog_channel = 0
spatial_size = (16, 16)
hist_bins = 16
spatial_feat = True
hist_feat = True
hog_feat = True
```
- YCrCb (what I used for the project; should be the same as above)
- RGB
- HSV
- LUV
- HLS
Based on the images created above, it was not obvious that one color space would be better than another, so I decided to wait, run the classifier, and see whether any color space produced clearly better results.
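Switching color spaces only changes the conversion step before feature extraction. As a minimal sketch of such a helper, assuming RGB input as returned by mpimg.imread (the coursework's `convert_color`, used in the search code below with a `conv='RGB2YCrCb'` string argument, follows the same idea; this `to_color_space` name is hypothetical):

```python
import cv2
import numpy as np

# Hypothetical helper: map a color_space name to the cv2 conversion code.
# Assumes the input image is RGB.
def to_color_space(img, color_space='YCrCb'):
    codes = {'HSV': cv2.COLOR_RGB2HSV,
             'LUV': cv2.COLOR_RGB2LUV,
             'HLS': cv2.COLOR_RGB2HLS,
             'YCrCb': cv2.COLOR_RGB2YCrCb}
    if color_space == 'RGB':
        return np.copy(img)
    return cv2.cvtColor(img, codes[color_space])
```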
3. Describe how (and identify where in your code) you trained a classifier using your selected HOG features (and color features if you used them).
The classifier code is located in cell 10 and, again, was taken from the coursework and the walkthrough video.
```python
# Define feature parameters
color_space = 'YCrCb'
orient = 9
pix_per_cell = 8
cell_per_block = 2
hog_channel = 'ALL'
spatial_size = (16, 16)
hist_bins = 16
spatial_feat = True
hist_feat = True
hog_feat = True

t = time.time()
n_samples = 2000
# Generate n_samples random indices (the same indices are reused for the
# notcars list, which assumes both lists are of comparable length)
random_idxs = np.random.randint(0, len(cars), n_samples)
test_cars = np.array(cars)[random_idxs]
test_notcars = np.array(notcars)[random_idxs]

car_features = extract_features(test_cars,
                                color_space=color_space,
                                spatial_size=spatial_size,
                                hist_bins=hist_bins,
                                orient=orient,
                                pix_per_cell=pix_per_cell,
                                cell_per_block=cell_per_block,
                                hog_channel=hog_channel,
                                spatial_feat=spatial_feat,
                                hist_feat=hist_feat,
                                hog_feat=hog_feat)

notcar_features = extract_features(test_notcars,
                                   color_space=color_space,
                                   spatial_size=spatial_size,
                                   hist_bins=hist_bins,
                                   orient=orient,
                                   pix_per_cell=pix_per_cell,
                                   cell_per_block=cell_per_block,
                                   hog_channel=hog_channel,
                                   spatial_feat=spatial_feat,
                                   hist_feat=hist_feat,
                                   hog_feat=hog_feat)

print(time.time() - t, 'Seconds to compute features...')

X = np.vstack((car_features, notcar_features)).astype(np.float64)
# Fit a per-column scaler and normalize the feature vectors
X_scaler = StandardScaler().fit(X)
scaled_X = X_scaler.transform(X)
# Labels: 1 for cars, 0 for non-cars
y = np.hstack((np.ones(len(car_features)), np.zeros(len(notcar_features))))

rand_state = np.random.randint(0, 100)
X_train, X_test, y_train, y_test = train_test_split(scaled_X, y,
                                                    test_size=0.1,
                                                    random_state=rand_state)

print('Using:', orient, 'orientations,', pix_per_cell, 'pixels per cell,',
      cell_per_block, 'cells per block ...etc...')
print('Feature vector length:', len(X_train[0]))

# Use a linear SVC
svc = LinearSVC()
t = time.time()
svc.fit(X_train, y_train)
print(round(time.time() - t, 2), 'Seconds to train SVC...')
print('Test accuracy of SVC =', round(svc.score(X_test, y_test), 4))
```
- YCrCb results
- RGB results
- HSV results
- LUV results
- HLS results
The results overall were not terribly different, but LUV, RGB, and YCrCb had the lowest scores.
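One refinement worth flagging: following the coursework, my notebook fits the StandardScaler on all samples before splitting, which leaks test-set statistics into the normalization. A sketch of fitting the scaler on the training split only, reusing the X, y, and rand_state variables from above:

```python
# Sketch: split first, then fit the scaler on training data only.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.1, random_state=rand_state)
X_scaler = StandardScaler().fit(X_train)  # statistics from training split only
X_train = X_scaler.transform(X_train)
X_test = X_scaler.transform(X_test)
```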
1. Describe how (and identify where in your code) you implemented a sliding window search. How did you decide what scales to search and how much to overlap windows?
This code was taken directly from the coursework and from the walkthrough video.
```python
out_images = []
out_maps = []
out_titles = []
out_boxes = []

ystart = 400
ystop = 656
# Scale the entire image and subsample the array
scale = 1.8

for img_src in example_images:
    img_boxes = []
    t = time.time()
    count = 0
    img = mpimg.imread(img_src)
    draw_img = np.copy(img)
    # Make a heat map
    heatmap = np.zeros_like(img[:, :, 0])
    img = img.astype(np.float32) / 255

    img_tosearch = img[ystart:ystop, :, :]
    ctrans_tosearch = convert_color(img_tosearch, conv='RGB2YCrCb')
    if scale != 1:
        imshape = ctrans_tosearch.shape
        ctrans_tosearch = cv2.resize(ctrans_tosearch,
                                     (np.int(imshape[1] / scale), np.int(imshape[0] / scale)))

    ch1 = ctrans_tosearch[:, :, 0]
    ch2 = ctrans_tosearch[:, :, 1]
    ch3 = ctrans_tosearch[:, :, 2]

    # Use // to get whole-number block counts
    nxblocks = (ch1.shape[1] // pix_per_cell) - 1
    nyblocks = (ch1.shape[0] // pix_per_cell) - 1
    nfeat_per_block = orient * cell_per_block**2
    window = 64
    nblocks_per_window = (window // pix_per_cell) - 1
    cells_per_step = 2
    nxsteps = (nxblocks - nblocks_per_window) // cells_per_step
    nysteps = (nyblocks - nblocks_per_window) // cells_per_step

    # Compute individual channel HOG features for the entire image
    hog1 = get_hog_features(ch1, orient, pix_per_cell, cell_per_block, feature_vec=False)
    hog2 = get_hog_features(ch2, orient, pix_per_cell, cell_per_block, feature_vec=False)
    hog3 = get_hog_features(ch3, orient, pix_per_cell, cell_per_block, feature_vec=False)

    for xb in range(nxsteps):
        for yb in range(nysteps):
            count += 1
            ypos = yb * cells_per_step
            xpos = xb * cells_per_step
            # Subsample the precomputed HOG arrays for this window
            hog_feat1 = hog1[ypos:ypos + nblocks_per_window, xpos:xpos + nblocks_per_window].ravel()
            hog_feat2 = hog2[ypos:ypos + nblocks_per_window, xpos:xpos + nblocks_per_window].ravel()
            hog_feat3 = hog3[ypos:ypos + nblocks_per_window, xpos:xpos + nblocks_per_window].ravel()
            hog_features = np.hstack((hog_feat1, hog_feat2, hog_feat3))

            xleft = xpos * pix_per_cell
            ytop = ypos * pix_per_cell

            # Extract the image patch
            subimg = cv2.resize(ctrans_tosearch[ytop:ytop + window, xleft:xleft + window], (64, 64))

            # Get color features
            spatial_features = bin_spatial(subimg, size=spatial_size)
            hist_features = color_hist(subimg, nbins=hist_bins)

            # Scale features and make a prediction
            test_features = X_scaler.transform(
                np.hstack((spatial_features, hist_features, hog_features)).reshape(1, -1))
            test_prediction = svc.predict(test_features)

            if test_prediction == 1:
                # Map the window back to original-image coordinates
                xbox_left = np.int(xleft * scale)
                ytop_draw = np.int(ytop * scale)
                win_draw = np.int(window * scale)
                cv2.rectangle(draw_img,
                              (xbox_left, ytop_draw + ystart),
                              (xbox_left + win_draw, ytop_draw + win_draw + ystart),
                              (0, 0, 255), 6)
                img_boxes.append(((xbox_left, ytop_draw + ystart),
                                  (xbox_left + win_draw, ytop_draw + win_draw + ystart)))
                heatmap[ytop_draw + ystart:ytop_draw + win_draw + ystart,
                        xbox_left:xbox_left + win_draw] += 1

    print(time.time() - t, 'seconds to run, total windows =', count)

    out_images.append(draw_img)
    # One title per output image: the detection overlay and its heatmap
    out_titles.append(img_src[-12:])
    out_titles.append(img_src[-12:])
    out_images.append(heatmap)
    out_maps.append(heatmap)
    out_boxes.append(img_boxes)

fig = plt.figure(figsize=(12, 24))
visualize(fig, 8, 2, out_images, out_titles)
```
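To get a feel for the overlap these settings imply: cells_per_step = 2 at 8 pixels per cell is a 16-pixel stride on the resized strip, i.e. 75% overlap between adjacent 64-pixel windows. A quick, self-contained sanity check of the search geometry, assuming 1280-pixel-wide frames as in the project video:

```python
# Sanity check of the search geometry above (assumed parameter values).
pix_per_cell, cells_per_step, window, scale = 8, 2, 64, 1.8
strip_h, strip_w = 656 - 400, 1280           # ystart..ystop strip, frame width
h, w = int(strip_h / scale), int(strip_w / scale)
nxblocks, nyblocks = w // pix_per_cell - 1, h // pix_per_cell - 1
nblocks_per_window = window // pix_per_cell - 1
nxsteps = (nxblocks - nblocks_per_window) // cells_per_step
nysteps = (nyblocks - nblocks_per_window) // cells_per_step
# Prints: 160 windows, each about 115 px on the frame
print(nxsteps * nysteps, 'windows, each about', int(window * scale), 'px on the frame')
```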
2. Show some examples of test images to demonstrate how your pipeline is working. What did you do to optimize the performance of your classifier?
Here is where YCrCb became my choice because it produced the best results. Here are some example images run through the pipeline:
HLS, for example, produced more false positives with the same settings.
1. Provide a link to your final video output. Your pipeline should perform reasonably well on the entire project video (somewhat wobbly or unstable bounding boxes are ok as long as you are identifying the vehicles most of the time with minimal false positives.)
Here's a link to my video result
2. Describe how (and identify where in your code) you implemented some kind of filter for false positives and some method for combining overlapping bounding boxes.
- How did you reduce false positives in the pipeline to make it more reliable?
I was able to reduce false positives by tweaking the ystart, ystop, and scale variables.
- Did you apply thresholding in order to improve on the performance of the classifier?
I applied a threshold using the code below, taken from stage 35 of the course, inside the process_image function. I ended up setting the threshold to 0 and instead narrowing the search region.

```python
....
heat_map = apply_threshold(heat_map, 0)
....
```
The method is defined in cell 33.
```python
def apply_threshold(heatmap, threshold):
    # Zero out pixels below the threshold
    heatmap[heatmap <= threshold] = 0
    # Return the image after applying the threshold
    return heatmap

def process_image(img):
    out_img, heat_map = find_cars(img, scale)
    heat_map = apply_threshold(heat_map, 0)
    # Label connected heatmap regions and draw one box per region
    labels = label(heat_map)
    draw_img = draw_labeled_bboxes(np.copy(img), labels)
    return draw_img
```
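draw_labeled_bboxes is the coursework helper that combines overlapping detections: label() assigns each connected heatmap region an integer, and one box is drawn around each region's extent. A sketch of the coursework-style version:

```python
# Sketch: draw one bounding box around each labeled region returned by
# scipy.ndimage.measurements.label().
def draw_labeled_bboxes(img, labels):
    for car_number in range(1, labels[1] + 1):
        # Pixels belonging to this car's labeled region
        nonzero = (labels[0] == car_number).nonzero()
        nonzeroy, nonzerox = np.array(nonzero[0]), np.array(nonzero[1])
        bbox = ((int(np.min(nonzerox)), int(np.min(nonzeroy))),
                (int(np.max(nonzerox)), int(np.max(nonzeroy))))
        cv2.rectangle(img, bbox[0], bbox[1], (0, 0, 255), 6)
    return img
```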
1. Briefly discuss any problems / issues you faced in your implementation of this project. Where will your pipeline likely fail? What could you do to make it more robust?
I have yet to try it out on a busy street. I suspect it would fail where there is a lot of movement, such as windy conditions where things blow across the road, although that is just a guess; I have not tested it. I am also unsure how well it would do in less-than-ideal driving conditions such as rain, snow, or night-time driving.
Overall I feel this is a good start, but to make something like this truly useful it would need a lot of testing in different conditions.
Improvements
- Need to adjust the scale so I can detect vehicles farther away in the image.
- Need to do a better job smoothing the boxes between frames in order to eliminate flashing boxes (see the sketch after this list).
- I would like to develop this further to provide information about the other vehicles, such as their speed and distance from the car.
- I would like to figure out a way to mark cars coming at me from the other direction as well. In this video example it is unimportant, but on a two-lane road I think it would be more useful.
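As a hypothetical sketch of the smoothing improvement (not implemented in my notebook): keep a short history of heatmaps and threshold their average, so single-frame detections and dropouts are damped. It assumes the find_cars, apply_threshold, label, and draw_labeled_bboxes functions from above.

```python
from collections import deque
import numpy as np

HEAT_HISTORY = deque(maxlen=8)  # heatmaps from the last 8 frames

def process_image_smoothed(img):
    out_img, heat_map = find_cars(img, scale)
    HEAT_HISTORY.append(heat_map)
    # Average over recent frames before thresholding, then box as before
    avg_heat = np.mean(HEAT_HISTORY, axis=0)
    labels = label(apply_threshold(avg_heat, 1))
    return draw_labeled_bboxes(np.copy(img), labels)
```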