Subtitle loader

Register name used to load filter: txtin
This filter may be automatically loaded during graph resolution.

This filter reads subtitle data from an input PID and produces subtitle frames on a single output PID.
The filter supports the following formats:

  • SRT: https://en.wikipedia.org/wiki/SubRip
  • WebVTT: https://www.w3.org/TR/webvtt1/
  • TTXT: https://wiki.gpac.io/xmlformats/TTXT-Format-Documentation
  • QT 3GPP Text XML (TexML): Apple QT6, likely deprecated
  • TTML: https://www.w3.org/TR/ttml2/
  • SUB: one subtitle per line formatted as {start_frame}{end_frame}text
  • SSA (Substation Alpha): basic parsing support for common files
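The SUB line layout described above can be illustrated with a minimal Python sketch. This is not the filter's own parser; the function name `parse_sub_line` and the `fps` parameter (needed to turn frame numbers into seconds) are assumptions made for illustration only.

```python
import re

# One SUB line has the form {start_frame}{end_frame}text
SUB_LINE = re.compile(r"^\{(\d+)\}\{(\d+)\}(.*)$")

def parse_sub_line(line, fps=25.0):
    """Return (start_seconds, end_seconds, text), or None if the line does not match.

    fps is a hypothetical parameter: the format stores frame numbers,
    so a frame rate is required to derive times.
    """
    match = SUB_LINE.match(line.strip())
    if match is None:
        return None
    start_frame = int(match.group(1))
    end_frame = int(match.group(2))
    return (start_frame / fps, end_frame / fps, match.group(3))
```

For example, `parse_sub_line("{50}{100}Hello")` yields a cue from 2.0 to 4.0 seconds at the assumed 25 frames per second.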

Input files must be in UTF-8 or UTF-16 format, with or without BOM. The internal frame format is:

  • WebVTT (and SRT if desired): ISO/IEC 14496-30 VTT cues
  • TTML: ISO/IEC 14496-30 XML subtitles
  • stxt and sbtt: ISO/IEC 14496-30 text stream and text subtitles
  • Others: 3GPP/QT Timed Text

TTML Support

If the ttml_split option is set, the TTML document is split into independent time segments by inspecting all overlapping subtitles in the body.
Empty periods in TTML result in empty TTML documents, or are skipped if the no_empty option is set.
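The idea of splitting into non-overlapping time segments can be sketched in Python as follows. This is only an illustration of the principle, not the filter's actual algorithm: the cue representation as `(begin, end, text)` tuples in seconds and the function name `split_segments` are assumptions.

```python
def split_segments(cues, no_empty=False):
    """Split cues into non-overlapping time segments.

    cues: list of (begin, end, text) tuples, times in seconds (assumed layout).
    Returns a list of (seg_begin, seg_end, [active texts]).
    """
    # Every cue begin or end time is a potential segment boundary.
    bounds = sorted({t for begin, end, _ in cues for t in (begin, end)})
    segments = []
    for seg_begin, seg_end in zip(bounds, bounds[1:]):
        # A cue is active in the segment if the two intervals intersect.
        active = [text for begin, end, text in cues
                  if begin < seg_end and end > seg_begin]
        if not active and no_empty:
            continue  # mimic no_empty: drop segments with no subtitle
        segments.append((seg_begin, seg_end, active))
    return segments
```

With cues `A` (0 to 4), `B` (2 to 6) and `C` (8 to 10), the sketch produces five segments, one of which (6 to 8) is empty and is dropped when `no_empty` is true.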

The first sample has a CTS assigned as indicated by ttml_cts:

  • a numerator of -2 indicates the first CTS is 0
  • a numerator of -1 indicates the first CTS is the first active time in document
  • a numerator >= 0 indicates the CTS to use for first sample
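The three cases above can be summarized in a short Python sketch. The function name `first_sample_cts` and its arguments are hypothetical; the behavior for numerators below -2 is not described here, so the sketch rejects them.

```python
from fractions import Fraction

def first_sample_cts(num, den, first_active_time):
    """Return the CTS (in seconds) of the first sample for ttml_cts=num/den.

    first_active_time: first active time in the document, in seconds.
    """
    if num == -2:
        return Fraction(0)                  # first CTS is 0
    if num == -1:
        return Fraction(first_active_time)  # first active time in document
    if num >= 0:
        return Fraction(num, den)           # explicit CTS
    # Other negative numerators are not documented; treated as an error here.
    raise ValueError("unsupported ttml_cts numerator")
```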

When TTML splitting is disabled, the duration of the TTML sample is given by ttml_dur if it is not 0; otherwise it is set to the document duration.

By default, media resources are kept as declared in TTML2 documents.

ttml_embed can be used to embed the resources referenced in <head> or <body> inside the TTML sample:

  • for <source>, <image>, <audio>, <font>, local URIs indicated in src will be loaded and src rewritten.
  • for <data> with base64 coding, the data will be decoded, <data> element removed and parent <source> rewritten with src attribute inserted.

The embedded data is added as a subsample to the TTML frame, and the referring elements will use src=urn:mpeg:14496-30:N with N the index of the subsample.

A subtitle zero time may be specified using ttml_zero. This removes all subtitles before the given time T0 and rewrites each subtitle begin/end time T to T-T0 with millisecond accuracy.

Warning: The original time formatting (ticks, frames/subframes ...) is lost when this option is used; all times are converted to HH:MM:SS.ms.
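The time rewriting can be sketched in Python as below. This is an illustration under stated assumptions, not the filter's code: cues are `(begin, end, text)` tuples with clock times, a cue is treated as "before T0" when its begin time is earlier than T0, and all function names are hypothetical.

```python
def clock_to_ms(clock):
    """Convert "HH:MM:SS" or "HH:MM:SS.mmm" to integer milliseconds."""
    hours, minutes, seconds = clock.split(":")
    whole, _, frac = seconds.partition(".")
    millis = int((frac + "000")[:3])  # pad or truncate to 3 digits
    return (int(hours) * 3600 + int(minutes) * 60 + int(whole)) * 1000 + millis

def ms_to_clock(ms):
    """Format integer milliseconds as HH:MM:SS.mmm."""
    seconds, millis = divmod(ms, 1000)
    minutes, seconds = divmod(seconds, 60)
    hours, minutes = divmod(minutes, 60)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}.{millis:03d}"

def apply_zero(cues, zero):
    """Drop cues starting before zero and shift the others by -zero."""
    t0 = clock_to_ms(zero)
    shifted = []
    for begin, end, text in cues:
        begin_ms = clock_to_ms(begin)
        if begin_ms < t0:
            continue  # assumption: "before T0" is decided on the begin time
        end_ms = clock_to_ms(end)
        shifted.append((ms_to_clock(begin_ms - t0), ms_to_clock(end_ms - t0), text))
    return shifted
```

With a zero time of `10:00:00`, a cue running from `10:00:05.500` to `10:00:07.250` becomes `00:00:05.500` to `00:00:07.250`.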

The subtitle zero time must be prefixed with T when the option is not set as a global argument:
Example

gpac -i test.ttml:ttml_zero=T10:00:00 [...]  
MP4Box -add test.ttml:sopt:ttml_zero=T10:00:00 [...]  
gpac -i test.ttml --ttml_zero=10:00:00 [...]  
gpac -i test.ttml --ttml_zero=T10:00:00 [...]  
MP4Box -add test.ttml --ttml_zero=10:00:00 [...]

Simple Text Support

The text loader can convert input files into simple text streams of a single packet by forcing the codec type on the input:
Example

gpac -i test.txt:#CodecID=stxt  [...]  
gpac fin:pck="Text Data":#CodecID=stxt  [...]

The content of the source file will be the payload of the text sample. The stxtmod option selects WebVTT, TX3G or simple text as the output format.
In this mode, the stxtdur option is used to control the duration of the generated subtitle:

  • a positive value always forces the duration
  • a negative value forces the duration if input packet duration is not known
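The two rules above can be written as a small Python sketch. The function name `effective_duration` is hypothetical, and two points are assumptions not stated here: the magnitude of a negative value is used as the forced duration, and a value of 0 leaves the packet duration untouched.

```python
def effective_duration(stxtdur, packet_duration):
    """Pick the subtitle duration from stxtdur and the input packet duration.

    packet_duration is None when the input packet duration is not known.
    """
    if stxtdur > 0:
        return stxtdur            # positive: always force the duration
    if stxtdur < 0 and packet_duration is None:
        return -stxtdur           # negative: force only if duration unknown (assumed magnitude)
    return packet_duration        # otherwise keep the input duration (assumption for 0)
```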

Notes

When reframing simple text streams from demuxers (e.g. subtitles from MKV), the output format of these streams can be selected using stxtmod.

When importing SRT, SUB or SSA files, the output format of the PID can be selected using stxtmod.

Options

nodefbox (bool, default: false): skip default text box

noflush (bool, default: false): skip final sample flush for srt

fontname (str): default font

fontsize (uint, default: 18): default font size

lang (str): default language

width (uint, default: 0): default width of text area

height (uint, default: 0): default height of text area

txtx (uint, default: 0): default horizontal offset of text area: -1 (left), 0 (center) or 1 (right)

txty (uint, default: 0): default vertical offset of text area: -1 (bottom), 0 (center) or 1 (top)

zorder (sint, default: 0): default z-order of the PID

timescale (uint, default: 1000): default timescale of the PID

ttml_split (bool, default: true): split ttml doc in non-overlapping samples

ttml_cts (lfrac, default: -1/1): first sample cts - see filter help

ttml_dur (frac, default: 0/1): sample duration when not splitting - see filter help

ttml_embed (bool, default: false): force embedding TTML resources

ttml_zero (str): set subtitle zero time for TTML

no_empty (bool, default: false): do not send empty samples

stxtdur (frac, default: 1): duration for simple text

stxtmod (enum, default: tx3g): text stream mode for simple text streams and SRT inputs

  • stxt: output PID formatted as simple text stream
  • sbtt: output PID formatted as subtitle text stream
  • tx3g: output PID formatted as TX3G/Apple stream
  • vtt: output PID formatted as WebVTT stream
  • webvtt: same as vtt (for backward compatibility)
