Hey there! This is how you can develop a deep learning computer vision system quickly, in about two days.

1) Come up with a use case
   * Use AI to detect when cars cross into the bike lane
     - To be seen in a broader context. The real use case is something like:
       + monitor a large set of crossings, roads, etc.
       + identify the most dangerous spots, violations, etc.
       + act, impose fines, ...

2) Get data
   * Google "Watch CCTV camera feed"
     - 2nd hit is http://www.insecam.org/
   * Browse a few cameras and find this one: http://www.insecam.org/en/view/891271/
   * Capture images from the CCTV feed
     - Google "stack overflow capture camera feed"
     - 2nd hit: https://stackoverflow.com/questions/55957298/how-to-read-live-video-feed-or-video-on-demand-feed-in-python
     - write: capture-data.py

3) Label data
   * Use CVAT: https://cvat.org

4) Train the algorithm
   * Follow the PyTorch tutorial https://pytorch.org/tutorials/intermediate/torchvision_tutorial.html#finetuning-from-a-pretrained-model
     - I had previously shown and posted code on barcode bounding box detection
     - here, exchange Faster R-CNN (train-faster-rcnn.py) with Mask R-CNN
     - train-mask-rcnn.py
   * Activate your PyTorch environment (see https://pytorch.org/ for installation)
       env-pytorch\Scripts\activate
   * Train with a batch size that fits the memory of your GPU, and adjust the learning rate in proportion to the number of GPUs and the batch size, e.g. 0.02 / 16 = 0.00125 for a single GPU with batch size 1
       python train-mask-rcnn.py -b 1 --lr 0.00125

5) Be happy
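
For the capture step in 2), here is a rough sketch of what a script like capture-data.py could look like. It is not the original script: it assumes OpenCV (the `cv2` package) is installed and that the camera exposes a stream URL that `cv2.VideoCapture` can open. The names `frame_path` and `capture_frames`, the output folder, the frame count and the interval are all made up for illustration.

```python
import time
from pathlib import Path


def frame_path(out_dir, index):
    """Build a zero-padded file name such as data/frame_000007.jpg (hypothetical naming scheme)."""
    return Path(out_dir) / f"frame_{index:06d}.jpg"


def capture_frames(stream_url, out_dir="data", n_frames=100, interval_s=5.0):
    """Save up to n_frames images from a video stream, one every interval_s seconds.

    Assumption: stream_url points at a feed that OpenCV can decode.
    Returns the number of frames actually written.
    """
    import cv2  # imported here so the helper above works without OpenCV

    Path(out_dir).mkdir(parents=True, exist_ok=True)
    cap = cv2.VideoCapture(stream_url)
    if not cap.isOpened():
        raise RuntimeError(f"could not open stream: {stream_url}")

    saved = 0
    try:
        while saved < n_frames:
            ok, frame = cap.read()
            if not ok:
                break  # stream ended or dropped
            cv2.imwrite(str(frame_path(out_dir, saved)), frame)
            saved += 1
            time.sleep(interval_s)  # spread frames out to get varied scenes
    finally:
        cap.release()
    return saved


# Example call (replace the placeholder with the stream URL of your camera):
# capture_frames("<stream url>", out_dir="data", n_frames=200, interval_s=10.0)
```

Spacing the frames out in time tends to give more varied images to label than saving consecutive frames. On a live feed, sleeping between reads may return slightly stale buffered frames, which is usually fine for collecting training data.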
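
The learning rate arithmetic in 4) can be written down as a tiny helper. This is a sketch of linear scaling, assuming the reference setting behind "0.02 / 16" is a learning rate of 0.02 for a total batch of 16 images; the function name `scaled_lr` and its parameters are hypothetical and not part of any training script.

```python
def scaled_lr(batch_size, n_gpus=1, base_lr=0.02, base_total_batch=16):
    """Scale the learning rate linearly with the total batch size.

    Assumption: base_lr was tuned for base_total_batch images per step.
    batch_size is the number of images per GPU.
    """
    if batch_size < 1 or n_gpus < 1:
        raise ValueError("batch_size and n_gpus must be at least 1")
    total_batch = batch_size * n_gpus
    return base_lr * total_batch / base_total_batch


# One GPU, one image per step, as in the training command above
print(scaled_lr(batch_size=1))  # 0.00125
```

If your GPU fits a larger batch, pass the matching values to both flags, for example `-b 4` together with the result of `scaled_lr(4)`.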