Training;Costs;Signal processing;Transformers;Real-time systems;Decoding;Synchronization;speech recognition;speech translation;streaming;joint;timestamp

Leveraging Timestamp Information for Serialized Joint Streaming Recognition and Translation