2412.05(a) Use of Sequentially Numbered Sequence Identifiers in the “Sequence Listing XML” [R-07.2022]

[Editor Note: This section is applicable to all applications filed on or after July 1, 2022, having disclosures of nucleotide and/or amino acid sequences as defined in 37 CFR 1.831(b). Formatting representations of XML (eXtensible Markup Language) elements in this section appear different from those shown in Standard ST.26, which may be accessed at: www.wipo.int/export/sites/www/standards/en/pdf/03-26-01.pdf.]

37 CFR 1.832 Representation of nucleotide and/or amino acid sequence data in the “Sequence Listing XML” part of a patent application filed on or after July 1, 2022.

  • (a) Each disclosed nucleotide or amino acid sequence that meets the requirements of § 1.831(b) must appear separately in the “Sequence Listing XML.” Each sequence set forth in the “Sequence Listing XML” must be assigned a separate sequence identifier. The sequence identifiers must begin with 1 and increase sequentially by integers as defined in paragraph 10 of WIPO Standard ST.26 (incorporated by reference, see § 1.839).
  • *****

In accordance with 37 CFR 1.832(a), the sequence identifiers in the “Sequence Listing XML” must begin with 1 and increase sequentially by integers. At a minimum, this requirement means that each sequence must be assigned a different number for purposes of identification. However, where practical and for ease of reference, sequences should be presented in the “Sequence Listing XML” in numerical order and in the order in which they are discussed in the application.
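As an illustration of the numbering rule in 37 CFR 1.832(a), the following minimal Python sketch compares a list of sequence identifiers, taken in the order they appear in the “Sequence Listing XML,” against the expected series 1, 2, 3, and so on. The function name `check_sequence_ids` is hypothetical, and the sketch assumes the identifiers have already been extracted as integers; it is not an official USPTO or WIPO validation tool.

```python
def check_sequence_ids(ids: list[int]) -> list[str]:
    """Return a description of every identifier that breaks the 1, 2, 3, ... series.

    Hypothetical helper: assumes `ids` lists the sequence identifiers in the
    order they appear in the listing. An empty result means the numbering
    starts at 1 and increases by 1 with no gaps or repeats. After a gap,
    every later identifier is also reported as out of place.
    """
    problems = []
    for position, seq_id in enumerate(ids, start=1):
        if seq_id != position:
            problems.append(f"position {position}: expected {position}, found {seq_id}")
    return problems


print(check_sequence_ids([1, 2, 4]))  # prints ['position 3: expected 3, found 4']
```

A skipped identifier is still numbered in this series; it is simply filled with “000” rather than omitted, so a gap reported by this check signals a missing element, not an allowed skip.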

WIPO Standard ST.26, paragraph 10, requires that each “sequence” be assigned a separate sequence identifier, including a sequence that is identical to a region of a longer sequence. Such a “sequence” is one that is disclosed anywhere in an application by enumeration of its residues and can be represented as:

  • (a) an unbranched sequence or a linear region of a branched sequence containing ten or more specifically defined nucleotides, wherein adjacent nucleotides are joined by:
    • (i) a 3’ to 5’ (or 5’ to 3’) phosphodiester linkage; or
    • (ii) any chemical bond that results in an arrangement of adjacent nucleobases that mimics the arrangement of nucleobases in naturally occurring nucleic acids; or
  • (b) an unbranched sequence or a linear region of a branched sequence containing four or more specifically defined amino acids, wherein the amino acids form a single peptide backbone, i.e., adjacent amino acids are joined by peptide bonds. (WIPO Standard ST.26, paragraph 7).
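The length thresholds in paragraph 7 can be sketched as a simple count. This Python example is illustrative only and rests on stated assumptions: the residues are given as one-letter codes for a single unbranched sequence or linear region; the linkage conditions in (a)(i), (a)(ii), and (b) are assumed to be met, since they cannot be checked from a residue string; and, on the assumption that the ST.26 definition of “specifically defined” excludes the ambiguity symbols “n” (nucleotides) and “X” (amino acids), those symbols are not counted. The function name `must_be_listed` is hypothetical.

```python
NUCLEOTIDE_MIN = 10  # "ten or more specifically defined nucleotides", item (a)
AMINO_ACID_MIN = 4   # "four or more specifically defined amino acids", item (b)


def must_be_listed(residues: str, is_nucleotide: bool) -> bool:
    """Rough check of the paragraph 7 length thresholds (hypothetical helper).

    Assumption: "n" (nucleotides) and "X" (amino acids) are not specifically
    defined residues and so are excluded from the count. Linkage type is
    not examined; it is assumed to satisfy item (a) or (b).
    """
    if is_nucleotide:
        defined = sum(1 for r in residues.lower() if r != "n")
        return defined >= NUCLEOTIDE_MIN
    defined = sum(1 for r in residues.upper() if r != "X")
    return defined >= AMINO_ACID_MIN


print(must_be_listed("MKXA", is_nucleotide=False))  # prints False
```

In the usage line, “MKXA” has four residues but only three specifically defined ones, so under these assumptions it falls below the amino acid threshold.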

Where no sequence is present for a sequence identifier, i.e., an intentionally skipped sequence, “000” must be used in place of a sequence. The total number of sequences must be indicated in the “Sequence Listing XML” and must equal the total number of sequence identifiers, whether each is followed by a sequence or by “000”.
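The counting rule above can be sketched with Python's standard `xml.etree.ElementTree` module. This is a minimal sketch under assumptions to verify against the ST.26 document type definition: the declared total is carried in a `SequenceTotalQuantity` element directly under the root, and each sequence identifier is the `sequenceIDNumber` attribute of a `SequenceData` element. The function name `total_matches_identifiers` and the abbreviated sample listing are hypothetical and do not form a complete, valid ST.26 file.

```python
import xml.etree.ElementTree as ET


def total_matches_identifiers(xml_text: str) -> bool:
    """Check that the declared total equals the number of sequence identifiers.

    Assumed element names: SequenceTotalQuantity (declared total) and
    SequenceData with a sequenceIDNumber attribute (one per identifier).
    """
    root = ET.fromstring(xml_text)
    declared_text = root.findtext("SequenceTotalQuantity")
    if declared_text is None:
        return False  # no declared total to compare against
    # Skipped sequences ("000") are still SequenceData elements, so they count.
    identifiers = [sd.get("sequenceIDNumber") for sd in root.findall("SequenceData")]
    return int(declared_text.strip()) == len(identifiers)


# Abbreviated, hypothetical listing: one sequence and one skipped sequence.
SAMPLE = """<ST26SequenceListing>
  <SequenceTotalQuantity>2</SequenceTotalQuantity>
  <SequenceData sequenceIDNumber="1">
    <INSDSeq><INSDSeq_sequence>MKTA</INSDSeq_sequence></INSDSeq>
  </SequenceData>
  <SequenceData sequenceIDNumber="2">
    <INSDSeq><INSDSeq_sequence>000</INSDSeq_sequence></INSDSeq>
  </SequenceData>
</ST26SequenceListing>"""

print(total_matches_identifiers(SAMPLE))  # prints True
```

Because the skipped sequence 2 is represented by its own `SequenceData` element, it contributes to the count exactly as a real sequence does.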

Intentionally skipped sequences must be included in the “Sequence Listing XML” and represented as follows:

  • (a) the element SequenceData and its attribute sequenceIDNumber, with the sequence identifier of the skipped sequence provided as the value;
  • (b) the elements INSDSeq_length, INSDSeq_moltype, and INSDSeq_division, present but with no value provided;
  • (c) the element INSDSeq_feature-table must not be included; and
  • (d) the element INSDSeq_sequence with the string “000” as the value. (WIPO Standard ST.26, paragraph 58)
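The four requirements above can be illustrated by building the element for a skipped sequence with Python's standard `xml.etree.ElementTree` module. This is a minimal sketch, not a reference implementation: the function name `skipped_sequence` is hypothetical, and the `INSDSeq` wrapper element placed between `SequenceData` and the child elements listed in (b) and (d) is an assumption based on the ST.26 document type definition, so the output should be checked against the Standard before use.

```python
import xml.etree.ElementTree as ET


def skipped_sequence(seq_id: int) -> ET.Element:
    """Build a SequenceData element for an intentionally skipped sequence."""
    # (a) SequenceData with the skipped identifier as sequenceIDNumber
    sd = ET.Element("SequenceData", {"sequenceIDNumber": str(seq_id)})
    # Assumed wrapper element, per our reading of the ST.26 DTD
    insdseq = ET.SubElement(sd, "INSDSeq")
    # (b) present but with no value provided
    for name in ("INSDSeq_length", "INSDSeq_moltype", "INSDSeq_division"):
        ET.SubElement(insdseq, name)
    # (c) INSDSeq_feature-table is intentionally not created
    # (d) the string "000" in place of a sequence
    ET.SubElement(insdseq, "INSDSeq_sequence").text = "000"
    return sd


print(ET.tostring(skipped_sequence(3), encoding="unicode"))
```

Emitting the empty elements rather than omitting them reflects item (b), while leaving out INSDSeq_feature-table altogether reflects item (c).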