2413.01(a) The “Sequence Listing XML” is a Single File Encoded Using Unicode UTF-8 [R-07.2022]

[Editor Note: This section is applicable to all applications filed on or after July 1, 2022, having disclosures of nucleotide and/or amino acid sequences as defined in 37 CFR 1.831(b). The formatting of XML (eXtensible Markup Language) elements in this section may appear different from that shown in Standard ST.26, which may be accessed at: www.wipo.int/export/sites/www/standards/en/pdf/03-26-01.pdf.]

37 CFR 1.833 Requirements for a “Sequence Listing XML” for nucleotide and/or amino acid sequences as part of a patent application filed on or after July 1, 2022.

  • (a) The “Sequence Listing XML” as required by § 1.831(a) must be presented as a single file in XML 1.0 encoded using Unicode UTF-8, where the character set complies with paragraphs 40 and 41 and Annex IV of WIPO Standard ST.26 (incorporated by reference, see § 1.839).
  • *****
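
As a rough illustration of the file-level requirement in § 1.833(a), the following Python sketch checks that the raw bytes of a candidate file decode as UTF-8 and, if the file begins with an XML declaration, that the declaration names XML version 1.0 and (when an encoding is declared) UTF-8. The function name check_utf8_xml10 and its regular expression are hypothetical, not part of any USPTO or WIPO tool; the sketch does not test well-formedness or any other ST.26 requirement.

```python
import re

# Hypothetical pattern: matches an optional XML declaration at the start of
# the text, capturing the version and (if present) the encoding value.
_XML_DECL = re.compile(
    r'<\?xml\s+version\s*=\s*["\']([^"\']+)["\']'
    r'(?:\s+encoding\s*=\s*["\']([^"\']+)["\'])?'
)


def check_utf8_xml10(data: bytes) -> list[str]:
    """Return problems found in the raw file bytes; an empty list means none."""
    try:
        text = data.decode("utf-8")  # strict decoding: any invalid byte raises
    except UnicodeDecodeError as exc:
        return [f"not valid UTF-8: {exc}"]
    if text.startswith("\ufeff"):
        text = text[1:]  # skip a byte order mark when looking for the declaration
    problems = []
    match = _XML_DECL.match(text)  # re.match anchors at the start of the text
    if match:
        version, encoding = match.group(1), match.group(2)
        if version != "1.0":
            problems.append(f"XML version is {version!r}, expected '1.0'")
        if encoding is not None and encoding.upper() != "UTF-8":
            problems.append(f"declared encoding is {encoding!r}, expected 'UTF-8'")
    return problems
```

Reading the file in binary mode and decoding strictly, rather than opening it in text mode, keeps an encoding error from being silently replaced or guessed around.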

According to WIPO Standard ST.26, the entire “Sequence Listing XML” must be contained within one file. The file must be encoded using Unicode UTF-8, with the following restrictions:

  • (1) the information contained in the elements ApplicantName, InventorName, and InventionTitle of the general information part, and in the element NonEnglishQualifier_value of the sequence data part, may be composed of any valid Unicode characters indicated in the XML 1.0 specification except the Unicode control code points 0000-001F and 007F-009F. The reserved characters ", &, ', <, and > (Unicode code points 0022, 0026, 0027, 003C, and 003E, respectively) must be replaced as set forth in the table below; and
  • (2) the information contained in all other elements and attributes of the general information part and in all other elements and attributes of the sequence data part must be composed of printable characters (including the space character) from the Unicode Basic Latin code table (i.e., limited to Unicode code points 0020 through 007E; see Annex IV). The reserved characters ", &, ', <, and > (Unicode code points 0022, 0026, 0027, 003C, and 003E, respectively) must be replaced as set forth in the table below (WIPO Standard ST.26, paragraph 40).
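
The two character-set restrictions can be sketched in Python as a check over a parsed document. This is a minimal sketch, assuming the file has already been parsed with xml.etree.ElementTree (so reserved characters appear unescaped in the parsed values) and that only attribute values and the text of leaf elements carry "information"; the names WIDE_ELEMENTS, is_wide_char, is_basic_latin_char, and find_bad_chars are hypothetical, and the code is not an official validator.

```python
import xml.etree.ElementTree as ET

# Elements whose content follows restriction (1); all others follow restriction (2).
WIDE_ELEMENTS = {"ApplicantName", "InventorName", "InventionTitle",
                 "NonEnglishQualifier_value"}


def is_wide_char(ch: str) -> bool:
    """Valid XML 1.0 character, excluding U+0000-U+001F and U+007F-U+009F."""
    cp = ord(ch)
    return (0x20 <= cp <= 0x7E or 0xA0 <= cp <= 0xD7FF
            or 0xE000 <= cp <= 0xFFFD or 0x10000 <= cp <= 0x10FFFF)


def is_basic_latin_char(ch: str) -> bool:
    """Printable Basic Latin character, U+0020 through U+007E."""
    return 0x20 <= ord(ch) <= 0x7E


def find_bad_chars(root: ET.Element) -> list[tuple[str, str]]:
    """Return (location, code point) pairs for characters outside the allowed set."""
    bad = []
    for elem in root.iter():
        # Attribute values are checked against restriction (2).
        for name, value in elem.attrib.items():
            bad += [(f"{elem.tag}@{name}", f"U+{ord(c):04X}")
                    for c in value if not is_basic_latin_char(c)]
        # Only leaf-element text is checked, so indentation between child
        # elements is not flagged (a simplifying assumption of this sketch).
        if len(elem) == 0 and elem.text:
            allowed = is_wide_char if elem.tag in WIDE_ELEMENTS else is_basic_latin_char
            bad += [(elem.tag, f"U+{ord(c):04X}")
                    for c in elem.text if not allowed(c)]
    return bad
```

For example, an accented letter such as U+00E9 passes inside ApplicantName but is reported inside any element not listed in restriction (1), and a tab character (U+0009) is reported everywhere.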

WIPO Standard ST.26 specifies that in an XML instance of a “Sequence Listing XML”, numeric character references must not be used, and that the following reserved characters must be replaced by the corresponding predefined entities when used in an attribute value or in element content:

List of Reserved Characters and Predefined Entities

Reserved Character    Predefined Entity
<                     &lt;
>                     &gt;
&                     &amp;
"                     &quot;
'                     &apos;

Reproduced from WIPO Standard ST.26, paragraph 41.

The only character entity references permitted are the predefined entities set forth above (WIPO Standard ST.26, paragraph 41).
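
A serializer following paragraph 41 has to map the five reserved characters to the predefined entities and must never emit numeric character references. Python's html.escape, for instance, writes the apostrophe as the numeric reference &#x27;, so it would not satisfy this rule as-is. The sketch below, using the hypothetical names escape_value and disallowed_references, performs the replacement in a single pass (so the & introduced by one replacement is not itself replaced again) and scans raw XML text for any reference other than the five predefined entities. It is a simplified illustration; for example, it does not skip comments or CDATA sections.

```python
import re

# The five predefined entities from the table above (ST.26, paragraph 41).
PREDEFINED = {"<": "&lt;", ">": "&gt;", "&": "&amp;", '"': "&quot;", "'": "&apos;"}
_TABLE = str.maketrans(PREDEFINED)

# Any numeric character reference or named entity reference,
# e.g. &#233; &#xE9; &eacute; &lt;
_REFERENCE = re.compile(r"&(#[0-9]+|#x[0-9A-Fa-f]+|[A-Za-z_][\w.-]*);")
_ALLOWED_NAMES = {"lt", "gt", "amp", "quot", "apos"}


def escape_value(value: str) -> str:
    """Replace each reserved character with its predefined entity in one pass."""
    return value.translate(_TABLE)


def disallowed_references(xml_text: str) -> list[str]:
    """Return every reference in raw XML other than the five predefined entities."""
    return [m.group(0) for m in _REFERENCE.finditer(xml_text)
            if m.group(1) not in _ALLOWED_NAMES]
```

For example, escape_value applied to the title `5' & 3' ends` yields `5&apos; &amp; 3&apos; ends`, which disallowed_references accepts, whereas a file containing `&#233;` or `&eacute;` would be reported.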