Some closed-circuit television (CCTV) systems do not have microphones, so sound intensity information is not available from them. We present a method to estimate traffic noise levels using only video frames as input. To that end, we trained a fully connected layer on top of VGG16 (pretrained on ImageNet) using a dataset generated automatically by a single camera with a mono microphone pointed at a busy traffic crossroads with cars, trucks, and motorbikes. During training, color images serve as the network inputs and the true average noise levels serve as the targets. The trained network successfully tracked noise level trends, with a correlation of 0.597 between predictions and measurements, despite being blind to the temporal properties of the data. These results suggest that average noise level targets are sufficient for convolutional neural networks to detect noise-generating sources within a traffic scene.
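As an illustration of the two quantities the abstract relies on, the following minimal sketch computes an average noise level for a window of audio samples and the correlation between predicted and measured levels. It rests on assumptions not stated in the text: the level is taken as the mean-square power in decibels relative to full scale, the correlation is taken to be Pearson's coefficient, and the function names `average_level_db` and `pearson_correlation` are hypothetical.

```python
import math


def average_level_db(samples, eps=1e-12):
    """Average level of an audio window in dB relative to full scale.

    Assumption: samples are floats in [-1, 1]; the paper's exact
    definition of "average noise level" may differ.
    """
    mean_square = sum(s * s for s in samples) / len(samples)
    # eps avoids log10(0) for a silent window
    return 10.0 * math.log10(mean_square + eps)


def pearson_correlation(predicted, measured):
    """Pearson correlation between two equal-length sequences.

    Assumption: the reported correlation is Pearson's coefficient.
    """
    n = len(predicted)
    mean_p = sum(predicted) / n
    mean_m = sum(measured) / n
    cov = sum((p - mean_p) * (m - mean_m) for p, m in zip(predicted, measured))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_m = sum((m - mean_m) ** 2 for m in measured)
    return cov / math.sqrt(var_p * var_m)
```

In this sketch, each video frame would be paired with `average_level_db` of the audio recorded around it to form a training target, and `pearson_correlation` would compare the network's per-frame predictions with those targets on held-out data.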