Hello! Soshin here.
For R-CNN and Fast R-CNN I mostly just reviewed the papers without implementing them,
but for Faster R-CNN I implemented it as well. (There do seem to be some bugs, though...)
# Faster R-CNN Architecture
Faster R-CNN is essentially Fast R-CNN with a Region Proposal Network (RPN) added on top.
The weakness of Fast R-CNN
Selective Search runs on the CPU (the bottleneck in computation time)
Faster R-CNN's solution
Handle region proposals inside the model itself
This is what makes it a truly end-to-end trainable model,
but the training procedure turned out to be trickier than I expected.
# Faster R-CNN Training Process
A conv layer, commonly called the backbone, extracts a feature map.
The feature map is fed into the RPN layer, which generates region proposals.
Bounding box regression and classification are then performed on the feature map using the region proposals obtained from the RPN.
RoI Pooling + RoI Head (Fast R-CNN)
Training is driven by two losses: the RPN loss and the RoI (Fast R-CNN head) loss.
These are summed into a single total loss, which updates the entire network.
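Before diving into the from-scratch code, here is a quick illustration of that two-loss structure using torchvision's reference implementation (this is not part of the original walkthrough, and the boxes and labels are arbitrary placeholders):
import torch
import torchvision

model = torchvision.models.detection.fasterrcnn_resnet50_fpn(num_classes=21)
model.train()
images = [torch.rand(3, 800, 800)]
targets = [{"boxes": torch.tensor([[30., 20., 500., 400.]]), "labels": torch.tensor([6])}]
loss_dict = model(images, targets)  # loss_objectness / loss_rpn_box_reg (RPN) + loss_classifier / loss_box_reg (RoI head)
total_loss = sum(loss_dict.values())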
# Understanding Faster R-CNN at the Code Level
Understanding Faster R-CNN at the code level took me about three full days. (Maybe I'm just slow...)
The code I referenced:
1. Faster RCNN from scratch Github
2. Ganghee-Lee/Faster-RCNN-TensorFlow Github
3. Simple Faster RCNN pytorch Github
Three repositories in total.
The walkthrough below follows the first one.
# Package Imports and Custom utils
import numpy as np
import torch
import torch.nn as nn
import torch.nn.functional as F
import torchvision
from utils import *
utils contains helpers for IoU computation, non-maximum suppression (NMS), bounding box -> loc conversion, and loc -> bounding box conversion (the full utils.py is listed at the end of this post).
# Sample Data, Feature Extraction
image = torch.zeros((1, 3, 800, 800)).float()
image_size = (800, 800)
# bbox -> y1, x1, y2, x2
bbox = torch.FloatTensor([[20, 30, 400, 500], [300, 400, 500, 600]])
labels = torch.LongTensor([6, 8])
sub_sample = 16
vgg16 = torchvision.models.vgg16(pretrained=True)
req_features = vgg16.features[:30]
print(req_features)
output_map = req_features(image)
print(output_map.shape)
The sample data is an 800x800 image with two bounding boxes and two labels.
sub_sample is the total downsampling factor: the feature map shrinks from 800 to 50, i.e. four halvings (2**4), so it is 16.
Here are the original image and its boxes.
output_map is the result of passing the image through the conv layers (VGG16 here).
# Anchor Generation
anchor_scale = [8, 16, 32]
ratio = [0.5, 1, 2] # H/W
len_anchor_scale = len(anchor_scale)
len_ratio = len(ratio)
len_anchor_template = len_anchor_scale * len_ratio
anchor_template = np.zeros((9, 4))
for idx, scale in enumerate(anchor_scale):
h = scale * np.sqrt(ratio) * sub_sample
w = scale / np.sqrt(ratio) * sub_sample
y1 = -h/2
x1 = -w/2
y2 = h/2
x2 = w/2
anchor_template[idx*len_ratio:(idx+1)*len_ratio, 0] = y1
anchor_template[idx*len_ratio:(idx+1)*len_ratio, 1] = x1
anchor_template[idx*len_ratio:(idx+1)*len_ratio, 2] = y2
anchor_template[idx*len_ratio:(idx+1)*len_ratio, 3] = x2
print(anchor_template)
The anchor scales and ratios define the box shapes to evaluate (anchor_template).
Each anchor position gets 3 scales x 3 ratios = 9 templates.
Here is a diagram of the templates: 9 in total across sizes and aspect ratios.
feature_map_size = (50, 50)
# The first center coords are (8, 8)
ctr_y = np.arange(8, 800, 16)
ctr_x = np.arange(8, 800, 16)
ctr = np.zeros((*feature_map_size, 2))
for idx, y in enumerate(ctr_y):
ctr[idx, :, 0] = y
ctr[idx, :, 1] = ctr_x
print(ctr.shape)
These are the points where the anchor_template will be placed.
Plotting them gives the 50x50 grid of anchor centers shown below.
anchors = np.zeros((*feature_map_size, 9, 4))
for idx_y in range(feature_map_size[0]):
for idx_x in range(feature_map_size[1]):
anchors[idx_y, idx_x] = (ctr[idx_y, idx_x] + anchor_template.reshape(-1, 2, 2)).reshape(-1, 4)
anchors = anchors.reshape(-1, 4)
print(anchors.shape) # (22500, 4)
The anchor template is then applied at each of these centers.
The blue box is the 800x800 image,
and applying the template at every center produces the 22,500 (50x50x9) anchor boxes shown.
# anchor box labeling for RPN
valid_index = np.where((anchors[:, 0] >= 0)
&(anchors[:, 1] >= 0)
&(anchors[:, 2] <= 800)
&(anchors[:, 3] <= 800))[0]
print(valid_index.shape) # 8940
Boxes that extend beyond the image can't really be used, so we exclude them.
valid_labels = np.empty((valid_index.shape[0],), dtype=np.int32)
valid_labels.fill(-1)
valid_anchors = anchors[valid_index]
print(valid_anchors.shape) # (8940,4)
print(bbox.shape) # torch.Size([2,4])
The RPN doesn't care about the class; it only needs to propose regions that contain an object, so its labels are just 1 and 0.
Also, some of the 8,940 valid anchors overlap and some are unnecessary,
so every label is initialized to -1, meaning "ignore".
ious = bbox_iou(valid_anchors, bbox.numpy()) # anchor 8940 : bbox 2
pos_iou_thres = 0.7
neg_iou_thres = 0.3
# Scenario A
anchor_max_iou = np.amax(ious, axis=1)
pos_iou_anchor_label = np.where(anchor_max_iou >= pos_iou_thres)[0]
neg_iou_anchor_label = np.where(anchor_max_iou < neg_iou_thres)[0]
valid_labels[pos_iou_anchor_label] = 1
valid_labels[neg_iou_anchor_label] = 0
# Scenario B
gt_max_iou = np.amax(ious, axis=0)
gt_max_iou_anchor_label = np.where(ious == gt_max_iou)[0]
print(gt_max_iou_anchor_label)
valid_labels[gt_max_iou_anchor_label] = 1
We compute the IoU between the valid anchors and the ground-truth boxes.
ious is a matrix whose rows are anchors and whose columns are the ground-truth boxes:
its shape is (8940, 2), holding each anchor's IoU with each box.
Anchors with IoU >= 0.7 are labeled positive and those below 0.3 negative.
Since few anchors usually exceed 0.7, Scenario A applies the thresholds from the paper,
and Scenario B additionally labels, for each ground-truth box, the anchor(s) with the highest IoU as positive.
n_sample_anchors = 256
pos_ratio = 0.5
total_n_pos = len(np.where(valid_labels == 1)[0])
n_pos_sample = int(n_sample_anchors*pos_ratio) if total_n_pos > n_sample_anchors*pos_ratio else total_n_pos
n_neg_sample = n_sample_anchors - n_pos_sample
pos_index = np.where(valid_labels == 1)[0]
if len(pos_index) > n_sample_anchors*pos_ratio:
disable_index = np.random.choice(pos_index, size=len(pos_index)-n_pos_sample, replace=False)
valid_labels[disable_index] = -1
neg_index = np.where(valid_labels == 0)[0]
disable_index = np.random.choice(neg_index, size=len(neg_index) - n_neg_sample, replace=False)
valid_labels[disable_index] = -1
Then only 256 anchors are kept across positives and negatives; the rest are disabled (-1).
If there are fewer than 128 positives, the remainder is filled with negatives.
# Each anchor corresponds to a box
argmax_iou = np.argmax(ious, axis=1)
max_iou_box = bbox[argmax_iou].numpy()
print(max_iou_box.shape) # 8940, 4
print(valid_anchors.shape) # 8940, 4
anchor_loc_format_target = format_loc(valid_anchors, max_iou_box)
print(anchor_loc_format_target.shape) # 8940, 4
The code above uses ious to find, for each anchor, which ground-truth box has the higher IoU:
(0.37312, 0.38272) gives index 1, and (0.38272, 0.37312) gives index 0.
This yields an array of 8,940 indices like 1 0 1 0 0 0 0 1 0, ...
Indexing the boxes with it assigns a box to every anchor, giving an (8940, 4) array.
Then the format_loc function from utils (hand-written) converts each anchor and its assigned box into regression targets,
i.e. the offsets the box regressor should learn to predict.
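Concretely, format_loc computes the standard box-regression parameterization (the same formulas appear as comments in utils.py at the end of this post):
t_x = (x - x_a) / w_a
t_y = (y - y_a) / h_a
t_w = log(w / w_a)
t_h = log(h / h_a)
where (x, y, w, h) are the center and size of the assigned ground-truth box and (x_a, y_a, w_a, h_a) those of the anchor.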
anchor_target_labels = np.empty((len(anchors),), dtype=np.int32)
anchor_target_format_locations = np.zeros((len(anchors), 4), dtype=np.float32)
anchor_target_labels.fill(-1)
anchor_target_labels[valid_index] = valid_labels
anchor_target_format_locations[valid_index] = anchor_loc_format_target
print(anchor_target_labels.shape) # 22500,
print(anchor_target_format_locations.shape) # 22500, 4
This gives the final set of anchors with labels and regression targets assigned, expanded back to the full 22,500 anchors.
# RPN
Everything above can be seen as preparation for the RPN.
mid_channel = 512
in_channel = 512
n_anchor = 9
conv1 = nn.Conv2d(in_channel, mid_channel, 3, 1, 1)
reg_layer = nn.Conv2d(mid_channel, n_anchor*4, 1, 1, 0)
cls_layer = nn.Conv2d(mid_channel, n_anchor*2, 1, 1, 0)
With a VGG backbone the conv layers output 512 channels, so in_channel is 512.
Box regression outputs 9 anchors * 4 (location values),
and box classification outputs 9 anchors * 2 (object or not).
x = conv1(output_map)
anchor_pred_format_locations = reg_layer(x)
anchor_pred_scores = cls_layer(x)
print(anchor_pred_format_locations.shape) # torch.Size([1, 36, 50, 50])
print(anchor_pred_scores.shape) # torch.Size([1, 18, 50, 50])
After weight initialization, the extracted feature map is passed through the conv layer,
and locations and classes are predicted from it.
This gives regression and classification predictions at every position of the (50, 50) feature map.
anchor_pred_format_locations = anchor_pred_format_locations.permute(0, 2, 3, 1).contiguous().view(1, -1, 4)
anchor_pred_scores = anchor_pred_scores.permute(0, 2, 3, 1).contiguous().view(1, -1, 2)
objectness_pred_scores = anchor_pred_scores[:, :, 1]
To compare them with the ground-truth anchors built above, we reshape them to match.
print(anchor_target_labels.shape)
print(anchor_target_format_locations.shape)
print(anchor_pred_scores.shape)
print(anchor_pred_format_locations.shape)
gt_rpn_format_locs = torch.from_numpy(anchor_target_format_locations)
gt_rpn_scores = torch.from_numpy(anchor_target_labels)
rpn_format_locs = anchor_pred_format_locations[0]
rpn_scores = anchor_pred_scores[0]
target holds the ground truth built from the bboxes; pred holds the RPN predictions.
Both regression tensors are (22500, 4); the target labels are (22500,) and the predicted scores are (22500, 2).
The targets are converted from numpy to torch, and we take the single element of the batch (the batch size is 1 here).
####### Object or not loss
rpn_cls_loss = F.cross_entropy(rpn_scores, gt_rpn_scores.long(), ignore_index=-1)
print(rpn_cls_loss)
####### location loss
mask = gt_rpn_scores > 0
mask_target_format_locs = gt_rpn_format_locs[mask]
mask_pred_format_locs = rpn_format_locs[mask]
print(mask_target_format_locs.shape)
print(mask_pred_format_locs.shape)
x = torch.abs(mask_target_format_locs - mask_pred_format_locs)
rpn_loc_loss = ((x<0.5).float()*(x**2)*0.5 + (x>0.5).float()*(x-0.5)).sum()
print(rpn_loc_loss)
The objectness (object or not) is trained with cross-entropy loss; for the locations, only anchors labeled as actual objects are masked in and used to compute the loss.
rpn_lambda = 10
N_reg = mask.float().sum()
rpn_loss = rpn_cls_loss + rpn_lambda / N_reg * rpn_loc_loss
print(rpn_loss)
The cls loss and loc loss are then combined with a lambda weight.
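Written out, this matches the RPN loss from the paper, with lambda = 10 and N_reg taken here as the number of positive anchors in the mask:
L_RPN = L_cls + (lambda / N_reg) * sum_i smoothL1(t_i, t_i*)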
# Generating Proposal to Feed Fast R-CNN
This step keeps only those proposals from the RPN that will actually be fed to Fast R-CNN.
nms_thresh = 0.7
n_train_pre_nms = 12000
n_train_post_nms = 2000
n_test_pre_nms = 6000
n_test_post_nms = 300
min_size = 16
Non-maximum suppression (NMS) compares the IoU between boxes, ranked by objectness score, and removes the overlapping duplicates. The threshold here is nms_thresh = 0.7.
Before NMS only the top 12,000 proposals are kept,
and after NMS 2,000 final proposals remain.
These 2,000 proposals are what Fast R-CNN is trained on.
Proposals whose width or height is smaller than 16 are also discarded.
At test time 6,000 and 300 are used instead (not used in this code).
print(anchors.shape) # 22500, 4
print(anchor_pred_format_locations.shape) # 22500, 4
rois = deformat_loc(anchors=anchors, formatted_base_anchor=anchor_pred_format_locations[0].data.numpy())
print(rois.shape) # 22500, 4
print(rois)
#[[ -37.56205856 -83.65124834 55.51502551 96.9647187 ]
# [ -59.50866938 -56.68875009 64.91222143 72.23375052]
# [ -81.40298363 -41.99777969 96.39533509 49.35743635]
# ...
# [ 610.35422226 414.3952291 979.0893042 1163.98340092]
# [ 538.20066833 564.81064224 1041.29725647 1063.15491104]
# [ 432.48094419 606.7697889 1166.24708388 973.39356325]]
This part is a bit confusing: the predicted offsets are decoded back into RoIs (bounding boxes) using the anchors.
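To make it concrete, deformat_loc (listed in utils.py at the end) simply inverts the parameterization used earlier:
x = t_x * w_a + x_a
y = t_y * h_a + y_a
w = w_a * exp(t_w)
h = h_a * exp(t_h)
and the corners are then recovered as (x - w/2, y - h/2, x + w/2, y + h/2).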
rois[:, 0:4:2] = np.clip(rois[:, 0:4:2], a_min=0, a_max=image_size[0])
rois[:, 1:4:2] = np.clip(rois[:, 1:4:2], a_min=0, a_max=image_size[1])
print(rois)
# [[ 0. 0. 55.51502551 96.9647187 ]
# [ 0. 0. 64.91222143 72.23375052]
# [ 0. 0. 96.39533509 49.35743635]
# ...
# [610.35422226 414.3952291 800. 800. ]
# [538.20066833 564.81064224 800. 800. ]
# [432.48094419 606.7697889 800. 800. ]]
Values that fall outside the image are then clipped to the image boundaries.
h = rois[:, 2] - rois[:, 0]
w = rois[:, 3] - rois[:, 1]
valid_index = np.where((h>min_size)&(w>min_size))[0]
valid_rois = rois[valid_index]
valid_scores = objectness_pred_scores[0][valid_index].data.numpy()
Boxes whose height or width is smaller than 16 are removed,
and the remainder are sorted by objectness score.
valid_score_order = valid_scores.ravel().argsort()[::-1]
pre_train_valid_score_order = valid_score_order[:n_train_pre_nms]
pre_train_valid_rois = valid_rois[pre_train_valid_score_order]
pre_train_valid_scores = valid_scores[pre_train_valid_score_order]
print(pre_train_valid_rois.shape) # 12000, 4
print(pre_train_valid_scores.shape) # 12000,
print(pre_train_valid_score_order.shape) # 12000,
We take the top 12,000 before applying NMS,
keep_index = nms(rois=pre_train_valid_rois, scores=pre_train_valid_scores, nms_thresh=nms_thresh)
post_train_valid_rois = pre_train_valid_rois[keep_index][:n_train_post_nms]
post_train_valid_scores = pre_train_valid_scores[keep_index][:n_train_post_nms]
print(post_train_valid_rois.shape) # 2000, 4
print(post_train_valid_scores.shape) # 2000,
then apply NMS and keep only 2,000 RoIs.
Even 2,000 is more than you might expect.
# anchor box labeling for Fast R-CNN
n_sample = 128
pos_ratio = 0.25
pos_iou_thresh = 0.5
neg_iou_thresh_hi = 0.5
neg_iou_thresh_lo = 0.0
From here on, the procedure mirrors how ground truth was built for the RPN.
The only difference is that this ground truth is for Fast R-CNN (actual class labels and actual bounding boxes).
ious = bbox_iou(post_train_valid_rois, bbox.numpy())
print(ious.shape) # 2000, 2
We compute the IoU between the 2,000 RoIs obtained above and the ground-truth boxes.
For the RPN this matrix was (8940, 2); here only 2,000 RoIs are compared, so it is (2000, 2).
bbox_assignments = ious.argmax(axis=1)
roi_max_ious = ious.max(axis=1)
roi_target_labels = labels[bbox_assignments]
print(roi_target_labels.shape) # 2000
Each RoI is assigned the actual label (6 or 8) of whichever box has the larger IoU:
if column 0 is larger the label is 6, if column 1 is larger it is 8 (check the box labels at the top of the walkthrough if this is confusing).
This produces an array like 6 8 6 6 8 6 6 6,
but of course not all of these can be targets.
total_n_pos = len(np.where(roi_max_ious >= pos_iou_thresh)[0])
n_pos_sample = int(n_sample*pos_ratio) if total_n_pos > n_sample*pos_ratio else total_n_pos
n_neg_sample = n_sample - n_pos_sample
print(n_pos_sample) # 10
print(n_neg_sample) # 118
So, based on the positive IoU threshold, only 128 RoIs (n_sample) are sampled across positives and negatives.
pos_index = np.where(roi_max_ious >= pos_iou_thresh)[0]
pos_index = np.random.choice(pos_index, size=n_pos_sample, replace=False)
neg_index = np.where((roi_max_ious < neg_iou_thresh_hi) & (roi_max_ious > neg_iou_thresh_lo))[0]
neg_index = np.random.choice(neg_index, size=n_neg_sample, replace=False)
print(pos_index.shape) # 10
print(neg_index.shape) # 118
We gather the positive and negative indices (since pos_ratio is 0.25, at most 32 positive boxes are kept),
keep_index = np.append(pos_index, neg_index)
post_sample_target_labels = roi_target_labels[keep_index].data.numpy()
post_sample_target_labels[len(pos_index):] = 0
post_sample_rois = post_train_valid_rois[keep_index]
and finally only the sampled RoIs are kept.
Plotting just the positives gives the figure above:
the lower-left box is label 6 and the upper-right box is label 8,
so the green boxes are RoIs for label 6 and the red ones are RoIs for label 8.
post_sample_bbox = bbox[bbox_assignments[keep_index]]
post_sample_format_rois = format_loc(anchors=post_sample_rois, base_anchors=post_sample_bbox.data.numpy())
print(post_sample_format_rois.shape)
Converting these into the loc format used to compare against Fast R-CNN's predictions finishes the target boxes.
# Fast R-CNN
rois = torch.from_numpy(post_sample_rois).float()
print(rois.shape) # 128, 4
# roi_indices = torch.zeros((len(rois),1), dtype=torch.float32)
# print(rois.shape, roi_indices.shape)
# indices_and_rois = torch.cat([roi_indices, rois], dim=1)
# print(indices_and_rois.shape)
The RoIs are converted to torch tensors.
The commented-out code assigns a batch index to each RoI and concatenates [index, roi]
so that the computation can be done per batch (the batch size is 1 here).
RoI Pooling
size = (7, 7)
adaptive_max_pool = nn.AdaptiveMaxPool2d(size)
# correspond to feature map
rois.mul_(1/16.0)
rois = rois.long()
RoI pooling extracts a fixed-size output for each RoI.
The 128 RoIs are first mapped onto the 50x50 feature-map coordinates (divided by 16).
output = []
num_rois = len(rois)
for roi in rois:
roi_feature = output_map[..., roi[0]:roi[2]+1, roi[1]:roi[3]+1]
output.append(adaptive_max_pool(roi_feature))
output = torch.cat(output, 0)
print(output.shape) # 128, 512, 7, 7
Passing each RoI through the pooling layer and extracting a fixed-size output gives a (128, 512, 7, 7) result.
The RoI pooling layer is what makes the network independent of the input image size.
output_ROI_pooling = output.view(output.size(0), -1)
print(output_ROI_pooling.shape) # 128, 25088
Flattening this gives a (128, 25088) array.
RoI Head & Classifier, BBox Regression
roi_head = nn.Sequential(nn.Linear(25088, 4096),
nn.Linear(4096, 4096))
cls_loc = nn.Linear(4096, 21*4)
cls_loc.weight.data.normal_(0, 0.01)
cls_loc.bias.data.zero_()
cls_score = nn.Linear(4096, 21)
cls_score.weight.data.normal_(0, 0.01)
cls_score.bias.data.zero_()
x = roi_head(output_ROI_pooling)
roi_cls_loc = cls_loc(x)
roi_cls_score = cls_score(x)
print(roi_cls_loc.shape, roi_cls_score.shape) # 128, 84 / 128, 21
Finally, fully connected layers classify each RoI into 20 classes + 1 background,
and the location head outputs 4 values per class (21 * 4 = 84).
Fast R-CNN Loss
print(roi_cls_loc.shape) # 128, 84
print(roi_cls_score.shape) # 128, 21
These are the predictions.
print(post_sample_format_rois.shape) # 128, 4
print(post_sample_target_labels.shape) # 128,
gt_roi_cls_loc = torch.from_numpy(post_sample_format_rois).float()
gt_roi_cls_label = torch.from_numpy(post_sample_target_labels).long()
And these are the ground-truth values.
roi_cls_loss = F.cross_entropy(roi_cls_score, gt_roi_cls_label)
print(roi_cls_loss)
The classification loss is cross-entropy,
num_roi = roi_cls_loc.size(0)
roi_cls_loc = roi_cls_loc.view(-1, 21, 4)
roi_cls_loc = roi_cls_loc[torch.arange(num_roi), gt_roi_cls_label]
print(roi_cls_loc.shape)
mask = gt_roi_cls_label>0
mask_loc_pred = roi_cls_loc[mask]
mask_loc_target = gt_roi_cls_loc[mask]
print(mask_loc_pred.shape) # 10, 4
print(mask_loc_target.shape) # 10, 4
x = torch.abs(mask_loc_pred-mask_loc_target)
roi_loc_loss = ((x<0.5).float()*x**2*0.5 + (x>0.5).float()*(x-0.5)).sum()
print(roi_loc_loss)
and, just as in the RPN, Fast R-CNN masks out background RoIs so that bounding box regression is computed only where the label is not background.
roi_lambda = 10
N_reg = (gt_roi_cls_label>0).float().sum()
roi_loss = roi_cls_loss + roi_lambda / N_reg * roi_loc_loss
print(roi_loss)
Applying lambda gives the total loss for Fast R-CNN.
# Faster R-CNN Total Loss
total_loss = rpn_loss + roi_loss
The total loss of Faster R-CNN is the sum of rpn_loss and roi_loss.
Back-propagating this loss updates the network.
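As a minimal sketch of that update step, using the variables defined in the walkthrough above (the optimizer choice is just illustrative and not part of the original code):
params = (list(req_features.parameters()) + list(conv1.parameters())
          + list(reg_layer.parameters()) + list(cls_layer.parameters())
          + list(roi_head.parameters()) + list(cls_loc.parameters()) + list(cls_score.parameters()))
optimizer = torch.optim.SGD(params, lr=1e-3, momentum=0.9)  # illustrative optimizer
optimizer.zero_grad()
total_loss.backward()
optimizer.step()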
Turning this whole procedure into modules was the painful part.
# PyTorch Modularization
The code below isn't cleaned up and is fairly long. (It still has some bugs and isn't generalized to different image sizes or other situations.)
faster_rcnn.py
import torch
import torch.nn as nn
import torch.nn.functional as F
import torchvision
from torchvision.ops import RoIPool
import numpy as np
from utils import *
# Backbone
from backbone import get_bb_clf
# bbox = torch.FloatTensor([[30,20,500,400], [400,300,600,500]])
# labels = torch.LongTensor([6, 8])
from creator_tools import *
# RPN
class RPN(nn.Module):
def __init__(
self, in_c=512, mid_c=512,
image_size=(800,800), sub_sample=16,
anchor_scale=[8,16,32], ratio=[0.5,1,2],
):
super(RPN, self).__init__()
self.rpn = nn.Conv2d(in_c, mid_c, 3, 1, 1)
self.relu = nn.ReLU(inplace=True)
n_anchor = len(anchor_scale) * len(ratio)
self.reg = nn.Conv2d(mid_c, n_anchor*4, 1, 1, 0)
self.cls = nn.Conv2d(mid_c, n_anchor*2, 1, 1, 0)
self.anchor_base = generate_anchors(image_size, sub_sample=sub_sample,
anchor_scale=anchor_scale, ratio=ratio)
self.proposal_layer = ProposalCreator(self)
weight_init(self.rpn)
weight_init(self.reg)
weight_init(self.cls)
# x : feature map
def forward(self, x, img_size, scale=1.):
n, _, h, w = x.shape
anchor = self.anchor_base
n_anchor = anchor.shape[0] // (h * w) # 9
x = self.rpn(x)
x = self.relu(x)
pred_loc = self.reg(x) # batch, anchor*4, height, width
pred_cls = self.cls(x) # batch, anchor*2, height, width
pred_loc = pred_loc.permute(0, 2, 3, 1).contiguous().view(n, -1, 4) # batch anchors (coor)
pred_cls = pred_cls.permute(0, 2, 3, 1).contiguous() # batch anchors (obj)
pred_sfmax_cls = F.softmax(pred_cls.view(n, h, w, n_anchor, 2), dim=4)
pred_fg_cls = pred_sfmax_cls[:,:,:,:,1].contiguous()
pred_fg_cls = pred_fg_cls.view(n, -1)
pred_cls = pred_cls.view(n, -1, 2)
pred_object = pred_cls[:, :, 1]
rois = []
roi_indices = []
for i in range(n):
roi = self.proposal_layer(
pred_loc[i].cpu().data.numpy(),
pred_fg_cls[i].cpu().data.numpy(),
anchor, img_size,scale=scale)
batch_index = i * np.ones((len(roi),), dtype=np.int32)
rois.append(roi)
roi_indices.append(batch_index)
rois = np.concatenate(rois, axis=0)
roi_indices = np.concatenate(roi_indices, axis=0)
return pred_loc, pred_cls, rois, roi_indices, anchor
# target_loc, target_cls come from assign_cls_loc
def rpn_loss(pred_loc, pred_cls, target_loc, target_cls, rpn_lamda=10):
# cls loss
# print(pred_cls.shape)
gt_rpn_cls = torch.from_numpy(target_cls).long().to('cuda:0')
pred_rpn_cls = pred_cls[0].to('cuda:0')
# print(pred_rpn_cls.shape, gt_rpn_cls.shape)
rpn_cls_loss = F.cross_entropy(pred_rpn_cls, gt_rpn_cls, ignore_index=-1)
# reg loss
gt_rpn_loc = torch.from_numpy(target_loc).to('cuda:0')
pred_rpn_loc = pred_loc[0].to('cuda:0')
mask = gt_rpn_cls > 0
mask_gt_loc = gt_rpn_loc[mask]
mask_pred_loc = pred_rpn_loc[mask]
x = torch.abs(mask_gt_loc - mask_pred_loc)
rpn_loc_loss = ((x<0.5).float()*(x**2)*0.5 + (x>0.5).float()*(x-0.5)).sum()
N_reg = mask.float().sum()
rpn_loss = rpn_cls_loss + rpn_lamda / N_reg * rpn_loc_loss
return rpn_cls_loss, rpn_loc_loss, rpn_loss
# class RoIHead(nn.Module):
# def __init__(self, n_class, roi_size, spatial_scale, classifier):
# super(RoIHead, self).__init__()
# self.classifier = classifier
# self.cls_loc
class FastRCNN(nn.Module):
def __init__(self, classifier, n_class=21, size=(7,7), spatial_scale=(1./16)):
super(FastRCNN, self).__init__()
self.roi = RoIPool(size, spatial_scale)
self.roi_pool = nn.AdaptiveMaxPool2d(size)
self.classifier = classifier
self.reg = nn.Linear(4096, n_class*4)
weight_init(self.reg)
self.cls = nn.Linear(4096, n_class)
weight_init(self.cls)
def forward(self, feature_map, rois, roi_indices):
# correspond to feature map
roi_indices = totensor(roi_indices).float()
rois = totensor(rois).float()
indices_rois = t.cat([roi_indices[:, None], rois], dim=1).contiguous()
pool = self.roi(feature_map, indices_rois)
pool = pool.view(pool.size(0), -1)
x = self.classifier(pool)
roi_loc = self.reg(x)
roi_cls = self.cls(x)
return roi_loc, roi_cls
# gt_loc = torch.from_numpy(final_rois).float()
# gt_cls = torch.from_numpy(final_cls).long()
def fastrcnn_loss(roi_loc, roi_cls, gt_loc, gt_cls): # [128, 84], [128, 21], [128, 4] torch float, [128, 1] torch long
roi_cls = roi_cls.to('cuda:0')
gt_cls = gt_cls.to('cuda:0')
roi_loc = roi_loc.to('cuda:0')
gt_loc = torch.from_numpy(gt_loc).float().to('cuda:0')
# print(roi_cls)
# print(roi_cls.shape, gt_cls.shape, roi_loc.shape, gt_loc.shape)
cls_loss = F.cross_entropy(roi_cls, gt_cls)
# print(cls_loss)
num_roi = roi_loc.size(0)
roi_loc = roi_loc.view(-1, 21, 4)
roi_loc = roi_loc[torch.arange(num_roi), gt_cls]
mask = gt_cls>0
mask_loc_pred = roi_loc[mask]
mask_loc_target = gt_loc[mask]
x = torch.abs(mask_loc_pred-mask_loc_target)
loc_loss = ((x<0.5).float()*x**2*0.5 + (x>0.5).float()*(x-0.5)).sum()
# print(loc_loss)
roi_lamda = 10
N_reg = (gt_cls>0).float().sum()
roi_loss = cls_loss + roi_lamda / N_reg * loc_loss
return cls_loss, loc_loss, roi_loss
class FasterRCNN(nn.Module):
def __init__(self, backbone, rpn, head):
super(FasterRCNN, self).__init__()
self.backbone = backbone
self.rpn = rpn
self.head = head # Fast R-CNN
self.proposal_target_creator = ProposalTargetCreator()
def forward(self, img, bboxes, labels):
b, c, h, w = img.shape
##### backbone
feature_map = self.backbone(img)
##### RPN
# anchors = generate_anchors((w, h))
# target_cls, target_loc = assign_cls_loc(bboxes, anchors, (w, h))
pred_loc, pred_cls, rois, roi_indices, anchor = self.rpn(feature_map, (w,h), scale=1.)
target_cls, target_loc = assign_cls_loc(bboxes, anchor, (w,h))
rpn_cls_loss, rpn_loc_loss, t_rpn_loss = rpn_loss(pred_loc, pred_cls, target_loc, target_cls)
# pred_loc, pred_cls, pred_object =
sample_roi, gt_roi_loc, gt_roi_label = self.proposal_target_creator(rois, bboxes, labels)
sample_roi_index = t.zeros(len(sample_roi))
##### HEAD - Fast RCNN
final_loc, final_cls = self.head(feature_map, sample_roi, sample_roi_index)
roi_cls_loss, roi_loc_loss, t_roi_loss = fastrcnn_loss(final_loc, final_cls, gt_roi_loc, gt_roi_label)
t_loss = torch.sum(t_roi_loss + t_rpn_loss)
return rpn_loc_loss, rpn_cls_loss, roi_cls_loss, roi_loc_loss, t_loss
# post_train_rois, post_train_scores = generate_proposal(anchors, pred_loc, pred_cls, pred_object, (w, h))
# final_rois, final_cls = assign_targets(post_train_rois, post_train_scores, bboxes, labels)
# final_rois, final_cls = torch.from_numpy(final_rois).float(), torch.from_numpy(final_cls).long()
# rois = torch.from_numpy(final_rois).float()
# roi_loc, roi_cls = self.fastrcnn(feature_map, final_rois)
# gt_loc = final_rois
# gt_cls = final_cls
# return final_loc, final_cls, rois, roi_indices
def fasterrcnn_loss(rpn_loss, roi_loss):
return torch.sum(rpn_loss + roi_loss)
class FasterRCNNSEMob(FasterRCNN):
down_size = 16
def __init__(self, n_fg_class=20, ratios=[0.5, 1, 2], anchor_scales=[8,16,32]):
backbone, classifier = get_bb_clf()
rpn = RPN()
head = FastRCNN(classifier, n_class=n_fg_class+1, spatial_scale=(1./16))
super(FasterRCNNSEMob, self).__init__(backbone, rpn, head)
def assign_cls_loc(bboxes, anchors, image_size, pos_thres=0.7, neg_thres=0.3, n_sample=256, pos_ratio=0.5):
valid_idx = np.where((anchors[:, 0] >= 0)
&(anchors[:, 1] >= 0)
&(anchors[:, 2] <= image_size[0])
&(anchors[:, 3] <= image_size[1]))[0]
# print(valid_idx.shape)
valid_cls = np.empty((valid_idx.shape[0], ), dtype=np.int32)
valid_cls.fill(-1)
valid_anchors = anchors[valid_idx]
ious = bbox_iou(valid_anchors, bboxes.numpy())
# print(ious.shape) # 8940, 2
# positives in valid_cls are produced by two scenarios
# a
iou_by_anchor = np.amax(ious, axis=1) # max IoU per anchor
pos_idx = np.where(iou_by_anchor >= pos_thres)[0]
neg_idx = np.where(iou_by_anchor < neg_thres)[0]
valid_cls[pos_idx] = 1
valid_cls[neg_idx] = 0
# b
iou_by_gt = np.amax(ious, axis=0) # max IoU per ground-truth box
gt_idx = np.where(ious == iou_by_gt)[0]
# print(gt_idx)
valid_cls[gt_idx] = 1
total_n_pos = len(np.where(valid_cls == 1)[0])
n_pos = int(n_sample*pos_ratio) if total_n_pos > n_sample*pos_ratio else total_n_pos
n_neg = n_sample - n_pos
# keep at most 256 of the valid labels; disable the rest
pos_index = np.where(valid_cls == 1)[0]
# print(pos_index, len(pos_index, n_pos))
if len(pos_index) > n_sample*pos_ratio:
disable_index = np.random.choice(pos_index, size=len(pos_index)-n_pos, replace=False)
valid_cls[disable_index] = -1
neg_index = np.where(valid_cls == 0)[0]
disable_index = np.random.choice(neg_index, size=len(neg_index) - n_neg, replace=False)
valid_cls[disable_index] = -1
# final valid class labels (object or not)
# print(len(np.where(valid_cls==1)[0]), len(np.where(valid_cls==0)[0]))
# valid loc
# assign each anchor the loc of whichever ground-truth box has the higher IoU
argmax_iou = np.argmax(ious, axis=1)
max_iou_box = bboxes[argmax_iou].numpy() # must have the same shape as valid_anchors
valid_loc = format_loc(valid_anchors, max_iou_box)
# print(valid_loc.shape) # 8940, 4 dx dy dw dh
# scatter the valid labels (pos/neg) computed so far back into the full anchor set at the valid indices
target_cls = np.empty((len(anchors),), dtype=np.int32)
target_cls.fill(-1)
target_cls[valid_idx] = valid_cls
# scatter the computed dx, dy, dw, dh back into the full anchor set at the valid indices
target_loc = np.zeros((len(anchors), 4), dtype=np.float32)
target_loc[valid_idx] = valid_loc
# print(target_cls.shape)
# print(target_loc.shape)
return target_cls, target_loc
# for Fast RCNN
def generate_proposal(anchors, pred_loc, pred_cls, pred_object, image_size,
n_train_pre_nms=12000,
n_train_post_nms=2000,
n_test_pre_nms=6000,
n_test_post_nms=300,
min_size=16, nms_thresh=0.7):
rois = deformat_loc(anchors=anchors, formatted_base_anchor=pred_loc[0].cpu().data.numpy())
np.where(rois[:,0])
rois[:, [0,2]] = np.clip(rois[:, [0,2]], a_min=0, a_max=image_size[0]) # x [0 ~ 800] width
rois[:, [1,3]] = np.clip(rois[:, [1,3]], a_min=0, a_max=image_size[1]) # y [0 ~ 800] height
w = rois[:, 2] - rois[:, 0]
h = rois[:, 3] - rois[:, 1]
valid_idx = np.where((h>min_size)&(w>min_size))[0]
valid_rois = rois[valid_idx]
valid_scores = pred_object[0][valid_idx].cpu().data.numpy()
order_idx = valid_scores.ravel().argsort()[::-1]
pre_train_idx = order_idx[:n_train_pre_nms]
pre_train_rois = valid_rois[pre_train_idx]
pre_train_scores = valid_scores[pre_train_idx]
keep_index = nms(rois=pre_train_rois, scores=pre_train_scores, nms_thresh=nms_thresh)
post_train_rois = pre_train_rois[keep_index][:n_train_post_nms]
post_train_scores = pre_train_scores[keep_index][:n_train_post_nms]
return post_train_rois, post_train_scores
def assign_targets(post_train_rois, post_train_scores, bboxes, labels,
n_sample = 128,
pos_ratio = 0.25,
pos_thresh = 0.5,
neg_thresh_hi = 0.5,
neg_thresh_lo = 0.0):
ious = bbox_iou(post_train_rois, bboxes.numpy())
# cls
bbox_idx = ious.argmax(axis=1)
box_max_ious = ious.max(axis=1)
final_cls = labels[bbox_idx] # (2000,) holds the object class values
total_n_pos = len(np.where(box_max_ious >= pos_thresh)[0])
n_pos = int(n_sample*pos_ratio) if total_n_pos > n_sample*pos_ratio else total_n_pos
n_neg = n_sample - n_pos
pos_index = np.where(box_max_ious >= pos_thresh)[0]
pos_index = np.random.choice(pos_index, size=n_pos, replace=False)
neg_index = np.where((box_max_ious < neg_thresh_hi) & (box_max_ious >= neg_thresh_lo))[0]
neg_index = np.random.choice(neg_index, size=n_neg, replace=False)
keep_index = np.append(pos_index, neg_index)
final_cls = final_cls[keep_index].data.numpy()
final_cls[len(pos_index):] = 0
final_rois = post_train_rois[keep_index]
post_sample_bbox = bboxes[bbox_idx[keep_index]]
d_rois = format_loc(anchors=final_rois, base_anchors=post_sample_bbox.data.numpy())
return final_rois, final_cls
def weight_init(l):
if type(l) in [nn.Conv2d]:
l.weight.data.normal_(0, 0.01)
l.bias.data.zero_()
if __name__ == "__main__":
pass
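Putting it together, the module is meant to be driven roughly like this (a hedged sketch: backbone.py, creator_tools.py and utils.py below must be importable, a GPU is assumed because the loss functions move tensors to 'cuda:0', and as noted above the code still has bugs, so treat this as the intended usage rather than a verified run):
import torch
from faster_rcnn import FasterRCNNSEMob

model = FasterRCNNSEMob().to('cuda:0')
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3, momentum=0.9)  # illustrative optimizer

image = torch.zeros((1, 3, 800, 800)).float().to('cuda:0')
bbox = torch.FloatTensor([[30, 20, 500, 400], [400, 300, 600, 500]])
labels = torch.LongTensor([6, 8])

# forward returns the individual losses plus the combined total loss
rpn_loc_loss, rpn_cls_loss, roi_cls_loss, roi_loc_loss, total_loss = model(image, bbox, labels)
optimizer.zero_grad()
total_loss.backward()
optimizer.step()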
backbone.py
import torch.nn as nn
class SEBlock(nn.Module):
def __init__(self, c, r=16):
super(SEBlock, self).__init__()
self.squeeze = nn.AdaptiveAvgPool2d(1)
self.excitation = nn.Sequential(
nn.Linear(c, c // r, bias=False),
nn.ReLU(inplace=True),
nn.Linear(c // r, c, bias=False),
nn.Sigmoid()
)
def forward(self, x):
b, c, _, _ = x.size()
se = self.squeeze(x).view(b, c)
se = self.excitation(se).view(b, c, 1, 1)
return x * se.expand_as(x)
def mobile_block(in_dim, out_dim, stride=1):
return nn.Sequential(
nn.Conv2d(in_channels=in_dim, out_channels=in_dim, kernel_size=3, stride=stride, padding=1, groups=in_dim),
nn.BatchNorm2d(in_dim),
nn.ReLU(inplace=True),
nn.Conv2d(in_channels=in_dim, out_channels=out_dim, kernel_size=1, stride=1, padding=0),
nn.BatchNorm2d(out_dim),
nn.ReLU(inplace=True),
SEBlock(c=out_dim, r=16),
)
class SEMobileNet(nn.Module):
def __init__(self, width_multi=1, resolution_multi=1, num_classes=1000):
super(SEMobileNet, self).__init__()
base_width = int(32 * width_multi)
self.conv = nn.Sequential(
nn.Conv2d(in_channels=3, out_channels=base_width, kernel_size=3, stride=2, padding=1),
nn.BatchNorm2d(base_width),
nn.ReLU(inplace=True),
mobile_block(base_width, base_width*2),
mobile_block(base_width*2, base_width*4, 2),
mobile_block(base_width*4, base_width*4),
mobile_block(base_width*4, base_width*8, 2),
mobile_block(base_width*8, base_width*8),
mobile_block(base_width*8, base_width*16, 2), # 800x800 -> 50x50
*[mobile_block(base_width*16, base_width*16) for _ in range(5)], # 512 channel
mobile_block(base_width*16, base_width*32, 2),
mobile_block(base_width*32, base_width*32),
nn.AvgPool2d(7),
)
self.classifier = nn.Linear(base_width*32, num_classes)
def forward(self, x):
x = self.conv(x)
x = x.view(x.size(0), -1)
x = self.classifier(x)
return x
def get_bb_clf():
model = SEMobileNet()
backbone = model.conv[:-3]
for p in backbone[:2].parameters():
p.requires_grad = False
return backbone, nn.Sequential(nn.Linear(512*7*7, 4096), nn.ReLU(inplace=True))
creator_tools.py (from the Simple Faster RCNN pytorch code)
from utils import *
import torch
class ProposalCreator:
"""Proposal regions are generated by calling this object.
The :meth:`__call__` of this object outputs object detection proposals by
applying estimated bounding box offsets
to a set of anchors.
This class takes parameters to control number of bounding boxes to
pass to NMS and keep after NMS.
If the paramters are negative, it uses all the bounding boxes supplied
or keep all the bounding boxes returned by NMS.
This class is used for Region Proposal Networks introduced in
Faster R-CNN [#]_.
.. [#] Shaoqing Ren, Kaiming He, Ross Girshick, Jian Sun. \
Faster R-CNN: Towards Real-Time Object Detection with \
Region Proposal Networks. NIPS 2015.
Args:
nms_thresh (float): Threshold value used when calling NMS.
n_train_pre_nms (int): Number of top scored bounding boxes
to keep before passing to NMS in train mode.
n_train_post_nms (int): Number of top scored bounding boxes
to keep after passing to NMS in train mode.
n_test_pre_nms (int): Number of top scored bounding boxes
to keep before passing to NMS in test mode.
n_test_post_nms (int): Number of top scored bounding boxes
to keep after passing to NMS in test mode.
force_cpu_nms (bool): If this is :obj:`True`,
always use NMS in CPU mode. If :obj:`False`,
the NMS mode is selected based on the type of inputs.
min_size (int): A paramter to determine the threshold on
discarding bounding boxes based on their sizes.
"""
def __init__(self,
parent_model,
nms_thresh=0.7,
n_train_pre_nms=12000,
n_train_post_nms=2000,
n_test_pre_nms=6000,
n_test_post_nms=300,
min_size=16
):
self.parent_model = parent_model
self.nms_thresh = nms_thresh
self.n_train_pre_nms = n_train_pre_nms
self.n_train_post_nms = n_train_post_nms
self.n_test_pre_nms = n_test_pre_nms
self.n_test_post_nms = n_test_post_nms
self.min_size = min_size
def __call__(self, loc, score,
anchor, img_size, scale=1.):
"""input should be ndarray
Propose RoIs.
Inputs :obj:`loc, score, anchor` refer to the same anchor when indexed
by the same index.
On notations, :math:`R` is the total number of anchors. This is equal
to product of the height and the width of an image and the number of
anchor bases per pixel.
Type of the output is same as the inputs.
Args:
loc (array): Predicted offsets and scaling to anchors.
Its shape is :math:`(R, 4)`.
score (array): Predicted foreground probability for anchors.
Its shape is :math:`(R,)`.
anchor (array): Coordinates of anchors. Its shape is
:math:`(R, 4)`.
img_size (tuple of ints): A tuple :obj:`height, width`,
which contains image size after scaling.
scale (float): The scaling factor used to scale an image after
reading it from a file.
Returns:
array:
An array of coordinates of proposal boxes.
Its shape is :math:`(S, 4)`. :math:`S` is less than
:obj:`self.n_test_post_nms` in test time and less than
:obj:`self.n_train_post_nms` in train time. :math:`S` depends on
the size of the predicted bounding boxes and the number of
bounding boxes discarded by NMS.
"""
# NOTE: when test, remember
# faster_rcnn.eval()
# to set self.traing = False
if self.parent_model.training:
n_pre_nms = self.n_train_pre_nms
n_post_nms = self.n_train_post_nms
else:
n_pre_nms = self.n_test_pre_nms
n_post_nms = self.n_test_post_nms
# Convert anchors into proposal via bbox transformations.
# roi = loc2bbox(anchor, loc)
roi = deformat_loc(anchor, loc)
# Clip predicted boxes to image.
roi[:, [0,2]] = np.clip(roi[:, [0,2]], a_min=0, a_max=img_size[0]) # x [0 ~ 800] width
roi[:, [1,3]] = np.clip(roi[:, [1,3]], a_min=0, a_max=img_size[1]) # y [0 ~ 800] height
w = roi[:, 2] - roi[:, 0]
h = roi[:, 3] - roi[:, 1]
# Remove predicted boxes with either height or width < threshold.
min_size = self.min_size * scale
keep = np.where((h >= min_size) & (w >= min_size))[0]
roi = roi[keep, :]
score = score[keep]
# Sort all (proposal, score) pairs by score from highest to lowest.
# Take top pre_nms_topN (e.g. 6000).
order = score.ravel().argsort()[::-1]
if n_pre_nms > 0:
order = order[:n_pre_nms]
roi = roi[order, :]
score = score[order]
# Apply nms (e.g. threshold = 0.7).
# Take after_nms_topN (e.g. 300).
# unNOTE: somthing is wrong here!
# TODO: remove cuda.to_gpu
keep = nms(
torch.from_numpy(roi).cuda(),
torch.from_numpy(score).cuda(),
self.nms_thresh)
if n_post_nms > 0:
keep = keep[:n_post_nms]
roi = roi[keep] #.cpu().numpy()
return roi
class ProposalTargetCreator(object):
"""Assign ground truth bounding boxes to given RoIs.
The :meth:`__call__` of this class generates training targets
for each object proposal.
This is used to train Faster RCNN [#]_.
.. [#] Shaoqing Ren, Kaiming He, Ross Girshick, Jian Sun. \
Faster R-CNN: Towards Real-Time Object Detection with \
Region Proposal Networks. NIPS 2015.
Args:
n_sample (int): The number of sampled regions.
pos_ratio (float): Fraction of regions that is labeled as a
foreground.
pos_iou_thresh (float): IoU threshold for a RoI to be considered as a
foreground.
neg_iou_thresh_hi (float): RoI is considered to be the background
if IoU is in
[:obj:`neg_iou_thresh_hi`, :obj:`neg_iou_thresh_hi`).
neg_iou_thresh_lo (float): See above.
"""
def __init__(self,
n_sample=128,
pos_ratio=0.25, pos_iou_thresh=0.5,
neg_iou_thresh_hi=0.5, neg_iou_thresh_lo=0.0
):
self.n_sample = n_sample
self.pos_ratio = pos_ratio
self.pos_iou_thresh = pos_iou_thresh
self.neg_iou_thresh_hi = neg_iou_thresh_hi
self.neg_iou_thresh_lo = neg_iou_thresh_lo # NOTE:default 0.1 in py-faster-rcnn
def __call__(self, roi, bbox, label,
loc_normalize_mean=(0., 0., 0., 0.),
loc_normalize_std=(0.1, 0.1, 0.2, 0.2)):
"""Assigns ground truth to sampled proposals.
This function samples total of :obj:`self.n_sample` RoIs
from the combination of :obj:`roi` and :obj:`bbox`.
The RoIs are assigned with the ground truth class labels as well as
bounding box offsets and scales to match the ground truth bounding
boxes. As many as :obj:`pos_ratio * self.n_sample` RoIs are
sampled as foregrounds.
Offsets and scales of bounding boxes are calculated using
:func:`model.utils.bbox_tools.bbox2loc`.
Also, types of input arrays and output arrays are same.
Here are notations.
* :math:`S` is the total number of sampled RoIs, which equals \
:obj:`self.n_sample`.
* :math:`L` is number of object classes possibly including the \
background.
Args:
roi (array): Region of Interests (RoIs) from which we sample.
Its shape is :math:`(R, 4)`
bbox (array): The coordinates of ground truth bounding boxes.
Its shape is :math:`(R', 4)`.
label (array): Ground truth bounding box labels. Its shape
is :math:`(R',)`. Its range is :math:`[0, L - 1]`, where
:math:`L` is the number of foreground classes.
loc_normalize_mean (tuple of four floats): Mean values to normalize
coordinates of bouding boxes.
loc_normalize_std (tupler of four floats): Standard deviation of
the coordinates of bounding boxes.
Returns:
(array, array, array):
* **sample_roi**: Regions of interests that are sampled. \
Its shape is :math:`(S, 4)`.
* **gt_roi_loc**: Offsets and scales to match \
the sampled RoIs to the ground truth bounding boxes. \
Its shape is :math:`(S, 4)`.
* **gt_roi_label**: Labels assigned to sampled RoIs. Its shape is \
:math:`(S,)`. Its range is :math:`[0, L]`. The label with \
value 0 is the background.
"""
n_bbox, _ = bbox.shape
roi = np.concatenate((roi, bbox), axis=0)
pos_roi_per_image = np.round(self.n_sample * self.pos_ratio)
iou = bbox_iou(roi, bbox)
gt_assignment = iou.argmax(axis=1)
max_iou = iou.max(axis=1)
# Offset range of classes from [0, n_fg_class - 1] to [1, n_fg_class].
# The label with value 0 is the background.
gt_roi_label = label[gt_assignment] + 1
# Select foreground RoIs as those with >= pos_iou_thresh IoU.
pos_index = np.where(max_iou >= self.pos_iou_thresh)[0]
pos_roi_per_this_image = int(min(pos_roi_per_image, pos_index.size))
if pos_index.size > 0:
pos_index = np.random.choice(
pos_index, size=pos_roi_per_this_image, replace=False)
# Select background RoIs as those within
# [neg_iou_thresh_lo, neg_iou_thresh_hi).
neg_index = np.where((max_iou < self.neg_iou_thresh_hi) &
(max_iou >= self.neg_iou_thresh_lo))[0]
neg_roi_per_this_image = self.n_sample - pos_roi_per_this_image
neg_roi_per_this_image = int(min(neg_roi_per_this_image,
neg_index.size))
if neg_index.size > 0:
neg_index = np.random.choice(
neg_index, size=neg_roi_per_this_image, replace=False)
# The indices that we're selecting (both positive and negative).
keep_index = np.append(pos_index, neg_index)
gt_roi_label = gt_roi_label[keep_index]
gt_roi_label[pos_roi_per_this_image:] = 0 # negative labels --> 0
sample_roi = roi[keep_index]
# Compute offsets and scales to match sampled RoIs to the GTs.
gt_roi_loc = format_loc(sample_roi, bbox[gt_assignment[keep_index]])
gt_roi_loc = ((gt_roi_loc - np.array(loc_normalize_mean, np.float32)
) / np.array(loc_normalize_std, np.float32))
return sample_roi, gt_roi_loc, gt_roi_label
utils.py (from the Faster RCNN from scratch code)
import numpy as np
"""
tools to convert specified type
"""
import torch as t
import numpy as np
def tonumpy(data):
if isinstance(data, np.ndarray):
return data
if isinstance(data, t.Tensor):
return data.detach().cpu().numpy()
def totensor(data, cuda=True):
if isinstance(data, np.ndarray):
tensor = t.from_numpy(data)
if isinstance(data, t.Tensor):
tensor = data.detach()
if cuda:
tensor = tensor.cuda()
return tensor
def scalar(data):
if isinstance(data, np.ndarray):
return data.reshape(1)[0]
if isinstance(data, t.Tensor):
return data.item()
def generate_anchors(image_size, sub_sample=16, anchor_scale=[8,16,32], ratio=[0.5,1,2]):
len_ratio = len(ratio)
anchor_base = np.zeros((len(anchor_scale)*len_ratio, 4)) # 9x4
for idx, scale in enumerate(anchor_scale):
w = scale / np.sqrt(ratio) * sub_sample
h = scale * np.sqrt(ratio) * sub_sample
x1, y1, x2, y2 = -w/2, -h/2, w/2, h/2
anchor_base[idx*len_ratio:(idx+1)*len_ratio] = np.c_[x1, y1, x2, y2]
feature_map_size = image_size[0] // sub_sample, image_size[1] // sub_sample
ctr_x = np.arange(sub_sample//2, image_size[0], sub_sample)
ctr_y = np.arange(sub_sample//2, image_size[1], sub_sample)
ctr = np.zeros((*feature_map_size, 2))
for idx, y in enumerate(ctr_y):
ctr[idx, :, 0] = ctr_x
ctr[idx, :, 1] = y
anchors = np.zeros((*feature_map_size, *anchor_base.shape))
for idx_x in range(feature_map_size[0]):
for idx_y in range(feature_map_size[1]):
anchors[idx_x, idx_y] = (ctr[idx_x, idx_y] + anchor_base.reshape(-1, 2, 2)).reshape(-1, 4)
return anchors.reshape(-1, 4)
# compute bbox IoU: (num_of_boxes1, 4) x (num_of_boxes2, 4)
# bboxes_1: anchors, bboxes_2: target boxes
# column order: x1 y1 x2 y2
def bbox_iou(bboxes_1, bboxes_2):
len_bboxes_1 = bboxes_1.shape[0]
len_bboxes_2 = bboxes_2.shape[0]
ious = np.zeros((len_bboxes_1, len_bboxes_2))
for idx, bbox_1 in enumerate(bboxes_1):
yy1_max = np.maximum(bbox_1[1], bboxes_2[:, 1])
xx1_max = np.maximum(bbox_1[0], bboxes_2[:, 0])
yy2_min = np.minimum(bbox_1[3], bboxes_2[:, 3])
xx2_min = np.minimum(bbox_1[2], bboxes_2[:, 2])
height = np.maximum(0.0, yy2_min - yy1_max)
width = np.maximum(0.0, xx2_min - xx1_max)
eps = np.finfo(np.float32).eps
inter = height * width
union = (bbox_1[3] - bbox_1[1]) * (bbox_1[2] - bbox_1[0]) + \
(bboxes_2[:, 3] - bboxes_2[:, 1]) * (bboxes_2[:, 2] - bboxes_2[:, 0]) - inter + eps
iou = inter / union
ious[idx] = iou
return ious # ious (num_of_boxes1, num_of_boxes2)
# (x1, y1, x2, y2) -> (x, y, w, h) -> (dx, dy, dw, dh)
'''
t_{x} = (x - x_{a})/w_{a}
t_{y} = (y - y_{a})/h_{a}
t_{w} = log(w/ w_a)
t_{h} = log(h/ h_a)
anchors are the anchors
base_anchors are the boxes
'''
def format_loc(anchors, base_anchors):
width = anchors[:, 2] - anchors[:, 0]
height = anchors[:, 3] - anchors[:, 1]
ctr_x = anchors[:, 0] + width*0.5
ctr_y = anchors[:, 1] + height*0.5
base_width = base_anchors[:, 2] - base_anchors[:, 0]
base_height = base_anchors[:, 3] - base_anchors[:, 1]
base_ctr_x = base_anchors[:, 0] + base_width*0.5
base_ctr_y = base_anchors[:, 1] + base_height*0.5
eps = np.finfo(np.float32).eps
height = np.maximum(eps, height)
width = np.maximum(eps, width)
dx = (base_ctr_x - ctr_x) / width
dy = (base_ctr_y - ctr_y) / height
dw = np.log(base_width / width)
dh = np.log(base_height / height)
anchor_loc_target = np.stack((dx, dy, dw, dh), axis=1)
return anchor_loc_target
# (dx, dy, dw, dh) -> (x, y, w, h) -> (x1, y1, x2, y2)
'''
anchors are the default anchors
formatted_base_anchors are the boxes with (dy, dx, dh, dw)
'''
def deformat_loc(anchors, formatted_base_anchor):
width = anchors[:, 2] - anchors[:, 0]
height = anchors[:, 3] - anchors[:, 1]
ctr_x = anchors[:, 0] + width*0.5
ctr_y = anchors[:, 1] + height*0.5
dx, dy, dw, dh = formatted_base_anchor.T
base_width = np.exp(dw) * width
base_height = np.exp(dh) * height
base_ctr_x = dx * width + ctr_x
base_ctr_y = dy * height + ctr_y
base_anchors = np.zeros_like(anchors)
base_anchors[:, 0] = base_ctr_x - base_width*0.5
base_anchors[:, 1] = base_ctr_y - base_height*0.5
base_anchors[:, 2] = base_ctr_x + base_width*0.5
base_anchors[:, 3] = base_ctr_y + base_height*0.5
return base_anchors
# non-maximum-suppression
def nms(rois, scores, nms_thresh):
# print(scores, scores.shape)
order = (-scores).argsort().cpu().data.numpy()#[::-1]
# x1, y1, x2, y2 = rois.T
rois = rois.cpu().data.numpy()
keep_index = []
# print(order.size)
while order.size > 0:
i = order[0]
keep_index.append(i)
ious = bbox_iou(rois[i][np.newaxis, :], rois[order[1:]])
inds = np.where(ious <= nms_thresh)[1]
order = order[inds + 1]
return np.asarray(keep_index)
After several days of writing, reading, and re-checking this code, my brain was fried.
Maybe my head is just too analog for AI...
Ref.
Faster R-CNN PyTorch from scratch (KrisHan999)
Faster R-CNN TensorFlow Github (ganghee-lee)
Simple Faster R-CNN PyTorch (chenyuntc)