[audio] standardize conv output length computation

As discussed offline with @ArthurZucker, a lot of audio models need to know the output length of their conv1d layers (e.g. to compute attention mask lengths). Because of this, the same formula is duplicated and rewritten in each model integration... let's standardize it!
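For context, here is a minimal sketch of what a shared helper could look like. It assumes the standard output-length formula from the `torch.nn.Conv1d` docs, `floor((L_in + 2*padding - dilation*(kernel_size - 1) - 1) / stride + 1)`, applied layer by layer over a conv stack. The names `conv1d_output_length` and `conv_stack_output_length` are hypothetical and are not the API this PR adds. The kernel/stride values in the usage block mirror the default Wav2Vec2 feature extractor config and are only illustrative.

```python
from typing import Optional, Sequence


def conv1d_output_length(
    length: int,
    kernel_size: int,
    stride: int = 1,
    padding: int = 0,
    dilation: int = 1,
) -> int:
    """Output length of a single Conv1d layer (hypothetical helper).

    Follows the torch.nn.Conv1d formula:
        floor((L_in + 2*padding - dilation*(kernel_size - 1) - 1) / stride + 1)
    """
    # Python's // floors on ints, so floor(x / s + 1) == x // s + 1.
    return (length + 2 * padding - dilation * (kernel_size - 1) - 1) // stride + 1


def conv_stack_output_length(
    length: int,
    kernel_sizes: Sequence[int],
    strides: Sequence[int],
    paddings: Optional[Sequence[int]] = None,
) -> int:
    """Apply conv1d_output_length through a stack of conv layers (hypothetical helper)."""
    if paddings is None:
        paddings = [0] * len(kernel_sizes)
    for kernel_size, stride, padding in zip(kernel_sizes, strides, paddings):
        length = conv1d_output_length(length, kernel_size, stride, padding)
    return length


if __name__ == "__main__":
    # Illustrative values: default Wav2Vec2 feature extractor kernels/strides, no padding.
    kernels = (10, 3, 3, 3, 3, 2, 2)
    strides = (5, 2, 2, 2, 2, 2, 2)
    # Per-example input lengths in samples, e.g. the row sums of an attention mask.
    input_lengths = [16000, 8000]
    print([conv_stack_output_length(n, kernels, strides) for n in input_lengths])  # → [49, 24]
```

The resulting per-example output lengths are what a model would then use to build the downsampled attention mask. Note that the sketch does not clamp: an input shorter than the receptive field yields a non-positive length, where PyTorch itself would raise an error.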