Can you provide a sample video so we can understand your use case?
As far as recruitment goes, AI can help detect emotions, and rate of speech can be detected too.
Real-world issues would be: distinguishing the interviewer's voice from the candidate's. Apart from that, there could be more than one person on the panel asking questions, which needs careful, deliberate thinking and the right algorithm/logic.
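To make the rate-of-speech point concrete, here is a minimal sketch of how words per minute could be worked out per speaker. It assumes the audio has already been transcribed and split by speaker (that separation step is the hard part and is not shown). The segment format, the speaker labels and the function name words_per_minute are all made up for illustration, not taken from any specific tool.

```python
from collections import defaultdict

# Hypothetical input: transcript segments already labelled by speaker,
# with start/end times in seconds. Labels and text are illustrative only.
segments = [
    {"speaker": "interviewer_1", "start": 0.0, "end": 6.0,
     "text": "Tell us about your last project please"},
    {"speaker": "candidate", "start": 6.0, "end": 16.0,
     "text": "I led a small team that built a video screening tool "
             "and I handled most of the backend work myself"},
    {"speaker": "interviewer_2", "start": 16.0, "end": 19.0,
     "text": "How large was the team"},
]


def words_per_minute(segments):
    """Return a dict mapping each speaker to their average words per minute."""
    words = defaultdict(int)      # total words spoken per speaker
    seconds = defaultdict(float)  # total speaking time per speaker
    for seg in segments:
        words[seg["speaker"]] += len(seg["text"].split())
        seconds[seg["speaker"]] += seg["end"] - seg["start"]
    # Skip speakers with zero recorded duration to avoid dividing by zero.
    return {s: words[s] * 60 / seconds[s] for s in words if seconds[s] > 0}


if __name__ == "__main__":
    for speaker, rate in words_per_minute(segments).items():
        print(f"{speaker}: {rate:.0f} words per minute")
```

Keeping the panel members as separate labels (interviewer_1, interviewer_2) means the candidate's rate is never mixed with anyone else's, which is the reason the speaker separation has to be sorted out first.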
That said, we are game for this, but we need to discuss your budget + timeline.
Suggestions from our side: microphone and internet issues can create lag and buffering in the video. If you are using an online platform to fetch these videos, all these aspects need to be finalised. Let us know if this makes sense!