Hello, my name is George Bouris.
I have a lot of experience in image and video processing.
I will build this application in C#, using ffmpeg behind the scenes. It will function exactly as you describe.
Since this is about a CTA, though, I think you should consider overlaying the CTA message on the target video with suitable transparency. In this case the CTA can be a video or even a single image (logo, message, button, etc.) that is placed in one of the four corners or in the center (a custom location is also possible) and appears at a user-set time.
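To make the overlay idea concrete, here is a rough sketch of how the ffmpeg command could be assembled. It is written in Python only for brevity (the application itself would be in C#), and the function name, parameters, and default values are my own assumptions, not part of your specification. It uses ffmpeg's standard overlay and colorchannelmixer filters with a timeline enable expression, and it targets a still-image CTA; a video CTA would also need its timestamps shifted, which is not handled here.

```python
def build_overlay_args(video, cta, output, position="bottom-right",
                       start=5, duration=None, opacity=0.8, margin=20):
    """Build an ffmpeg argument list that overlays a CTA image on a video.

    Hypothetical helper for illustration. `position` is one of the named
    presets below or an (x, y) tuple of pixel coordinates.
    """
    if not 0.0 <= opacity <= 1.0:
        raise ValueError("opacity must be between 0 and 1")

    # In the overlay filter, W/H are the main video size, w/h the CTA size.
    presets = {
        "top-left": (f"{margin}", f"{margin}"),
        "top-right": (f"W-w-{margin}", f"{margin}"),
        "bottom-left": (f"{margin}", f"H-h-{margin}"),
        "bottom-right": (f"W-w-{margin}", f"H-h-{margin}"),
        "center": ("(W-w)/2", "(H-h)/2"),
    }
    if isinstance(position, tuple):
        x, y = str(position[0]), str(position[1])  # custom location
    else:
        x, y = presets[position]

    # Show the CTA from `start` onward, or only for `duration` seconds.
    if duration is None:
        enable = f"gte(t,{start})"
    else:
        enable = f"between(t,{start},{start + duration})"

    filter_graph = (
        # Give the CTA an alpha channel and scale it to the chosen opacity.
        f"[1:v]format=rgba,colorchannelmixer=aa={opacity}[cta];"
        # Composite it over the main video at the chosen place and time.
        f"[0:v][cta]overlay=x={x}:y={y}:enable='{enable}'[out]"
    )
    return [
        "ffmpeg", "-i", video, "-i", cta,
        "-filter_complex", filter_graph,
        "-map", "[out]", "-map", "0:a?",  # keep the original audio if present
        "-c:a", "copy", output,
    ]


if __name__ == "__main__":
    print(" ".join(build_overlay_args("input.mp4", "logo.png", "output.mp4")))
```

The resulting list can be passed directly to a process launcher without a shell; the single quotes around the enable expression are interpreted by ffmpeg's own filter parser, so the commas inside it are not mistaken for filter separators.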
We can discuss this solution further if you are interested. The cost will be higher, though.