2422 Nucleotide and/or Amino Acid Sequence Disclosures in Patent Applications Subject to WIPO ST.25 [R-07.2022]

[Editor Note: This section is not applicable to applications filed on or after July 1, 2022, having disclosures of nucleotide and/or amino acid sequences as defined in 37 CFR 1.831(b). See MPEP §§ 2412-2419 for guidance on WIPO ST.26 requirements for applications filed on or after July 1, 2022.]

37 CFR 1.821  Nucleotide and/or amino acid sequence disclosures in patent applications.

  • (a) Nucleotide and/or amino acid sequences, as used in §§ 1.821 through 1.825, are interpreted to mean an unbranched sequence of 4 or more amino acids or an unbranched sequence of 10 or more nucleotides. Branched sequences are specifically excluded from this definition. Sequences with fewer than four specifically defined nucleotides or amino acids are specifically excluded from this section. “Specifically defined” means those amino acids other than “Xaa” and those nucleotide bases other than “n,” defined in accordance with Appendices A through F to this subpart. Nucleotides and amino acids are further defined as follows:
    • (1) Nucleotides: Nucleotides are intended to embrace only those nucleotides that can be represented using the symbols set forth in Appendix A to this subpart. Modifications (e.g., methylated bases) may be described as set forth in Appendix B to this subpart but shall not be shown explicitly in the nucleotide sequence.
    • (2) Amino acids: Amino acids are those L-amino acids commonly found in naturally occurring proteins and are listed in appendix C to this subpart. Those amino acid sequences containing D-amino acids are not intended to be embraced by this definition. Any amino acid sequence that contains post-translationally modified amino acids may be described as the amino acid sequence that is initially translated using the symbols shown in appendix C to this subpart, with the modified positions (e.g., hydroxylations or glycosylations) being described as set forth in appendix D to this subpart, but these modifications shall not be shown explicitly in the amino acid sequence. Any peptide or protein that can be expressed as a sequence using the symbols in appendix C to this subpart, in conjunction with a description in the Feature section, to describe, for example, modified linkages, cross links and end caps, non-peptidyl bonds, etc., is embraced by this definition.
  • Note 1 to paragraph (a): Appendices A through F to this subpart contain Tables 1–6 of the World Intellectual Property Organization (WIPO) Handbook on Industrial Property Information and Documentation, Standard ST.25: Standard for the Presentation of Nucleotide and Amino Acid Sequence Listings in Patent Applications (2009).
  • (b) Patent applications which contain disclosures of nucleotide and/or amino acid sequences, in accordance with the definition in paragraph (a) of this section, shall, with regard to the manner in which the nucleotide and/or amino acid sequences are presented and described, conform exclusively to the requirements of §§ 1.821 through 1.825.
  • (c) Patent applications that contain disclosures of nucleotide and/or amino acid sequences, as defined in paragraph (a) of this section, must contain a “Sequence Listing,” which is a separate part of the specification containing each of those nucleotide and/or amino acid sequences and associated information using the symbols and format in accordance with the requirements of §§ 1.822 and 1.823. The “Sequence Listing” must be submitted as follows, except for a national stage entry under § 1.495(b)(1), where the “Sequence Listing” has been previously communicated by the International Bureau or originally filed in the United States Patent and Trademark Office and complies with Patent Cooperation Treaty (PCT) Rule 5.2:
    • (1) As an ASCII plain text file, in compliance with § 1.824, submitted via the USPTO patent electronic filing system or on a read-only optical disc under § 1.52(e), accompanied by an incorporation by reference statement of the ASCII plain text file, in a separate paragraph of the specification, in accordance with § 1.77(b)(5);
    • (2) As a PDF file via the USPTO patent electronic filing system; or
    • (3) On physical sheets of paper.
  • (d) Where the description or claims of a patent application discuss a sequence that is set forth in the “Sequence Listing,” in accordance with paragraph (c) of this section, reference must be made to the sequence by use of the sequence identifier (§ 1.823(a)(5)), preceded by “SEQ ID NO:” or the like, in the text of the description or claims, even if the sequence is also embedded in the text of the description or claims of the patent application. Where a sequence is presented in a drawing, reference must be made to the sequence by use of the sequence identifier (§ 1.823(a)(5)), either in the drawing or in the Brief Description of the Drawings, where the correlation between multiple sequences in the drawing and their sequence identifiers (§ 1.823(a)(5)) in the Brief Description is clear.
  • (e)
    • (1) If the “Sequence Listing” under paragraph (c) of this section is submitted in an application filed under 35 U.S.C. 111(a) as a PDF file (§ 1.821(c)(2)) via the USPTO patent electronic filing system or on physical sheets of paper (§ 1.821(c)(3)), then the following must be submitted:
      • (i) A CRF of the “Sequence Listing,” in accordance with the requirements of § 1.824; and
      • (ii) A statement that the sequence information contained in the CRF submitted under paragraph (e)(1)(i) of this section is identical to the sequence information contained in the “Sequence Listing” under paragraph (c) of this section.
    • (2) If the “Sequence Listing” under paragraph (c) of this section in an application submitted under 35 U.S.C. 371 is a PDF file (paragraph (c)(2) of this section) or on physical sheets of paper (paragraph (c)(3) of this section), and not also as an ASCII plain text file, in compliance with § 1.824 (paragraph (c)(1) of this section), then the following must be submitted:
      • (i) A CRF of the “Sequence Listing,” in accordance with the requirements of § 1.824; and
      • (ii) A statement that the sequence information contained in the CRF submitted under paragraph (e)(2)(i) of this section is identical to the sequence information contained in the “Sequence Listing” under paragraph (c)(2) or (3) of this section.
    • (3) If a “Sequence Listing” in ASCII plain text format, in compliance with § 1.824, has not been submitted for an international application under the PCT, and that application contains disclosures of nucleotide and/or amino acid sequences, as defined in paragraph (a) of this section, and is to be searched by the United States International Searching Authority or examined by the United States International Preliminary Examining Authority, then the following must be submitted:
      • (i) A CRF of the “Sequence Listing,” in accordance with the requirements of § 1.824;
      • (ii) The late furnishing fee for providing a “Sequence Listing” in response to an invitation, as set forth in § 1.445(a)(5); and
      • (iii) A statement that the sequence information contained in the CRF, submitted under paragraph (e)(3)(i) of this section, does not go beyond the disclosure in the international application as filed, or a statement that the information recorded in the ASCII plain text file, submitted under paragraph (e)(3)(i) of this section, is identical to the sequence listing contained in the international application as filed, as applicable.
    • (4) The CRF may not be retained as a part of the patent application file.
  • (f) [reserved]
  • (g) If any of the requirements of paragraphs (b) through (e) of this section are not satisfied at the time of filing under 35 U.S.C. 111(a) or at the time of entering the national stage under 35 U.S.C. 371, the applicant will be notified and given a period of time within which to comply with such requirements in order to prevent abandonment of the application. Any amendment to add or replace a “Sequence Listing” and CRF copy thereof in reply to a requirement under this paragraph must be submitted in accordance with the requirements of § 1.825.
  • (h) If any of the requirements of paragraph (e)(3) of this section are not satisfied at the time of filing an international application under the PCT, and the application is to be searched by the United States International Searching Authority or examined by the United States International Preliminary Examining Authority, the applicant may be sent a notice necessitating compliance with the requirements within a prescribed time period. Where a “Sequence Listing” under PCT Rule 13ter is provided in reply to a requirement under this paragraph, it must be accompanied by a statement that the information recorded in the ASCII plain text file under paragraph (e)(3)(i) of this section is identical to the sequence listing contained in the international application as filed, or does not go beyond the disclosure in the international application as filed, as applicable. It must also be accompanied by the late furnishing fee, as set forth in § 1.445(a)(5). If the applicant fails to timely provide the required CRF, the United States International Searching Authority shall search only to the extent that a meaningful search can be performed without the CRF, and the United States International Preliminary Examining Authority shall examine only to the extent that a meaningful examination can be performed without the CRF.
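
The length thresholds in 37 CFR 1.821(a) can be illustrated with a minimal sketch. The function name and the list-of-symbols input format are hypothetical, chosen only for illustration. The sketch assumes the sequence is unbranched and otherwise within the definition, and it does not test for branching, D-amino acids, or any other exclusion in the rule.

```python
# Illustrative only: names and input format are assumptions, not part of the rule.
MIN_LENGTH = {"nucleotide": 10, "amino_acid": 4}
UNDEFINED_SYMBOL = {"nucleotide": "n", "amino_acid": "Xaa"}


def meets_sequence_definition(residues, kind):
    """Apply the length tests of 37 CFR 1.821(a) to an unbranched sequence.

    residues: list of symbols, e.g. ["a", "c", "g"] or ["Ala", "Xaa"].
    kind: "nucleotide" or "amino_acid".
    """
    # "Specifically defined" residues are those other than "n" or "Xaa".
    defined = sum(1 for r in residues if r != UNDEFINED_SYMBOL[kind])
    return len(residues) >= MIN_LENGTH[kind] and defined >= 4


print(meets_sequence_definition(list("acgtacgtac"), "nucleotide"))
print(meets_sequence_definition(["Ala", "Xaa", "Gly", "Ser"], "amino_acid"))
```

Under these assumptions, a ten-base sequence with only three bases other than "n" would not meet the definition, even though it satisfies the ten-nucleotide minimum.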

I. APPENDICES A-F REFERENCED IN 37 CFR 1.821 AND 1.822

37 CFR 1.821 and 37 CFR 1.822 reference Appendices A-F, which contain Tables 1–6 of the World Intellectual Property Organization (WIPO) Handbook on Industrial Property Information and Documentation, Standard ST.25: Standard for Nucleotide and Amino Acid Sequence Listings in Patent Applications (2009) (hereinafter WIPO Standard ST.25 (2009)). Appendices A-F are reproduced below. The current version of WIPO Standard ST.25 is available online at www.wipo.int/export/sites/www/standards/en/pdf/03-25-01.pdf.

Appendix A provides that the bases of a nucleotide sequence should be represented using the following one-letter symbol for nucleotide sequence characters:

Appendix A to Subpart G of Part 1 – List of Nucleotides
Symbol  Meaning                                 Origin of designation
a       a                                       adenine.
g       g                                       guanine.
c       c                                       cytosine.
t       t                                       thymine.
u       u                                       uracil.
r       g or a                                  purine.
y       t/u or c                                pyrimidine.
m       a or c                                  amino.
k       g or t/u                                keto.
s       g or c                                  strong interactions 3H-bonds.
w       a or t/u                                weak interactions 2H-bonds.
b       g or c or t/u                           not a.
d       a or g or t/u                           not c.
h       a or c or t/u                           not g.
v       a or g or c                             not t, not u.
n       a or g or c or t/u, unknown, or other   any.
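
As an illustration only, the Appendix A symbols can be held in a lookup table and used to flag characters that are not listed. The names below are hypothetical. The sketch writes "t" where the table says "t/u", and it is case-sensitive because the table lists lowercase symbols; both choices are assumptions of the sketch rather than statements of the rule.

```python
# Appendix A symbols mapped to the bases each one may stand for.
# Assumption: "t" is used wherever the table says "t/u".
NUCLEOTIDE_SYMBOLS = {
    "a": "a", "g": "g", "c": "c", "t": "t", "u": "u",
    "r": "ga", "y": "tc", "m": "ac", "k": "gt",
    "s": "gc", "w": "at",
    "b": "gct", "d": "agt", "h": "act", "v": "agc",
    "n": "agct",
}


def invalid_symbols(sequence):
    """Return the sorted characters of `sequence` not listed in Appendix A."""
    return sorted(set(sequence) - set(NUCLEOTIDE_SYMBOLS))


def possible_bases(symbol):
    """Return the set of bases that an Appendix A symbol may represent."""
    return set(NUCLEOTIDE_SYMBOLS[symbol])


print(invalid_symbols("acgtnryx"))
print(possible_bases("v"))
```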

Appendix B provides that modified bases may be represented as the corresponding unmodified bases in the sequence itself, if the modification is further described in numeric identifier <223> of the Feature section of the “Sequence Listing”. The symbols from the list below may be used in the description (i.e., the specification and drawing, or in the Feature section of the “Sequence Listing”) but these symbols may not be used in the sequence itself. Modifications not listed in Appendix B may also be represented as the corresponding unmodified base in the sequence itself, and the modification should be described using its full chemical name in the Feature section of the “Sequence Listing”.

Appendix B to Subpart G of Part 1 – List of Modified Nucleotides
Symbol Meaning
ac4c 4-acetylcytidine.
chm5u 5-(carboxyhydroxymethyl)uridine.
cm 2′-O-methylcytidine.
cmnm5s2u 5-carboxymethylaminomethyl-2-thiouridine.
cmnm5u 5-carboxymethylaminomethyluridine.
d dihydrouridine.
fm 2′-O-methylpseudouridine.
gal q beta, D-galactosylqueuosine.
gm 2′-O-methylguanosine.
i inosine.
i6a N6-isopentenyladenosine.
m1a 1-methyladenosine.
m1f 1-methylpseudouridine.
m1g 1-methylguanosine.
m1i 1-methylinosine.
m22g 2,2-dimethylguanosine.
m2a 2-methyladenosine.
m2g 2-methylguanosine.
m3c 3-methylcytidine.
m5c 5-methylcytidine.
m6a N6-methyladenosine.
m7g 7-methylguanosine.
mam5u 5-methylaminomethyluridine.
mam5s2u 5-methoxyaminomethyl-2-thiouridine.
man q beta, D-mannosylqueuosine.
mcm5s2u 5-methoxycarbonylmethyl-2-thiouridine.
mcm5u 5-methoxycarbonylmethyluridine.
mo5u 5-methoxyuridine.
ms2i6a 2-methylthio-N6-isopentenyladenosine.
ms2t6a N-((9-beta-D-ribofuranosyl-2-methylthiopurine-6-yl)carbamoyl)threonine.
mt6a N-((9-beta-D-ribofuranosylpurine-6-yl)N-methylcarbamoyl)threonine.
mv uridine-5-oxyacetic acid-methylester.
o5u uridine-5-oxyacetic acid.
osyw wybutoxosine.
p pseudouridine.
q queuosine.
s2c 2-thiocytidine.
s2t 5-methyl-2-thiouridine.
s2u 2-thiouridine.
s4u 4-thiouridine.
t 5-methyluridine.
t6a N-((9-beta-D-ribofuranosylpurine-6-yl)-carbamoyl)threonine.
tm 2′-O-methyl-5-methyluridine.
um 2′-O-methyluridine.
yw wybutosine.
x 3-(3-amino-3-carboxy-propyl)uridine, (acp3)u.

Appendix C provides that the amino acids should be represented using the following three-letter symbols with the first letter as a capital.

Appendix C to Subpart G of Part 1 – List of Amino Acids
Symbol Meaning
Ala Alanine.
Cys Cysteine.
Asp Aspartic Acid.
Glu Glutamic Acid.
Phe Phenylalanine.
Gly Glycine.
His Histidine.
Ile Isoleucine.
Lys Lysine.
Leu Leucine.
Met Methionine.
Asn Asparagine.
Pro Proline.
Gln Glutamine.
Arg Arginine.
Ser Serine.
Thr Threonine.
Val Valine.
Trp Tryptophan.
Tyr Tyrosine.
Asx Asp or Asn.
Glx Glu or Gln.
Xaa unknown or other.

Appendix D provides that modified and unusual amino acids may be represented as the corresponding unmodified amino acids in the sequence itself if the modification is further described in numeric identifier <223> of the Feature section of the “Sequence Listing”. The symbols from the list below may be used in the description (i.e., the specification and drawings, or in the Feature section of the “Sequence Listing”) but these symbols may not be used in the sequence itself. Modifications not listed in Appendix D may also be represented as the corresponding unmodified amino acid in the sequence itself, and the modification should be described using its full chemical name in the Feature section of the “Sequence Listing”.

Appendix D to Subpart G of Part 1 – List of Modified and Unusual Amino Acids
Symbol Meaning
Aad 2-Aminoadipic acid.
bAad 3-Aminoadipic acid.
bAla beta-Alanine, beta-Aminopropionic acid.
Abu 2-Aminobutyric acid.
4Abu 4-Aminobutyric acid, piperidinic acid.
Acp 6-Aminocaproic acid.
Ahe 2-Aminoheptanoic acid.
Aib 2-Aminoisobutyric acid.
bAib 3-Aminoisobutyric acid.
Apm 2-Aminopimelic acid.
Dbu 2,4 Diaminobutyric acid.
Des Desmosine.
Dpm 2,2′-Diaminopimelic acid.
Dpr 2,3-Diaminopropionic acid.
EtGly N-Ethylglycine.
EtAsn N-Ethylasparagine.
Hyl Hydroxylysine.
aHyl allo-Hydroxylysine.
3Hyp 3-Hydroxyproline.
4Hyp 4-Hydroxyproline.
Ide Isodesmosine.
aIle allo-Isoleucine.
MeGly N-Methylglycine, sarcosine.
MeIle N-Methylisoleucine.
MeLys 6-N-Methyllysine.
MeVal N-Methylvaline.
Nva Norvaline.
Nle Norleucine.
Orn Ornithine.
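
The relationship between Appendices C and D can be sketched as a simple classification of the symbols found in an amino acid sequence. The function name, the report keys, and the space-separated input format are hypothetical. The sketch only flags an Appendix D symbol as one that belongs in the description or in numeric identifier <223> rather than in the sequence itself; it does not choose the corresponding unmodified amino acid.

```python
# Symbols copied from Appendix C (allowed in the sequence itself).
APPENDIX_C = set(
    "Ala Cys Asp Glu Phe Gly His Ile Lys Leu Met Asn Pro Gln Arg "
    "Ser Thr Val Trp Tyr Asx Glx Xaa".split()
)
# Symbols copied from Appendix D (description or Feature section only).
APPENDIX_D = set(
    "Aad bAad bAla Abu 4Abu Acp Ahe Aib bAib Apm Dbu Des Dpm Dpr EtGly "
    "EtAsn Hyl aHyl 3Hyp 4Hyp Ide aIle MeGly MeIle MeLys MeVal Nva Nle Orn".split()
)


def classify_residues(sequence_text):
    """Sort each (position, symbol) pair into one of three report buckets."""
    report = {"allowed": [], "describe_in_223": [], "unrecognized": []}
    for position, symbol in enumerate(sequence_text.split(), start=1):
        if symbol in APPENDIX_C:
            key = "allowed"
        elif symbol in APPENDIX_D:
            key = "describe_in_223"
        else:
            key = "unrecognized"
        report[key].append((position, symbol))
    return report


print(classify_residues("Met Hyl Gly foo"))
```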

Appendix E provides for feature keys related to nucleotide sequences.

Appendix E to Subpart G of Part 1 – List of Feature Keys Related to Nucleotide Sequences
Key Description
allele a related individual or strain contains stable, alternative forms of the same gene, which differs from the presented sequence at this location (and perhaps others).
attenuator (1) region of DNA at which regulation of termination of transcription occurs, which controls the expression of some bacterial operons; (2) sequence segment located between the promoter and the first structural gene that causes partial termination of transcription.
C_region constant region of immunoglobulin light and heavy chains, and T-cell receptor alpha, beta, and gamma chains; includes one or more exons depending on the particular chain.
CAAT_signal CAAT box; part of a conserved sequence located about 75 bp upstream of the start point of eukaryotic transcription units which may be involved in RNA polymerase binding; consensus=GG (C or T) CAATCT.
CDS coding sequence; sequence of nucleotides that corresponds with the sequence of amino acids in a protein (location includes stop codon); feature includes amino acid conceptual translation.
conflict independent determinations of the “same” sequence differ at this site or region.
D-loop displacement loop; a region within mitochondrial DNA in which a short stretch of RNA is paired with one strand of DNA, displacing the original partner DNA strand in this region; also used to describe the displacement of a region of one strand of duplex DNA by a single stranded invader in the reaction catalyzed by RecA protein.
D-segment diversity segment of immunoglobulin heavy chain, and T-cell receptor beta chain.
enhancer a cis-acting sequence that increases the utilization of (some) eukaryotic promoters, and can function in either orientation and in any location (upstream or downstream) relative to the promoter.
exon region of genome that codes for portion of spliced mRNA; may contain 5’UTR, all CDSs, and 3’UTR.
GC_signal GC box; a conserved GC-rich region located upstream of the start point of eukaryotic transcription units which may occur in multiple copies or in either orientation; consensus=GGGCGG.
gene region of biological interest identified as a gene and for which a name has been assigned.
iDNA intervening DNA; DNA which is eliminated through any of several kinds of recombination.
intron a segment of DNA that is transcribed, but removed from within the transcript by splicing together the sequences (exons) on either side of it.
J_segment joining segment of immunoglobulin light and heavy chains, and T-cell receptor alpha, beta, and gamma chains.
LTR long terminal repeat, a sequence directly repeated at both ends of a defined sequence, of the sort typically found in retroviruses.
mat_peptide mature peptide or protein coding sequence; coding sequence for the mature or final peptide or protein product following post-translational modification; the location does not include the stop codon (unlike the corresponding CDS).
misc_binding site in nucleic acid which covalently or non-covalently binds another moiety that cannot be described by any other Binding key (primer_bind or protein_bind).
misc_difference feature sequence is different from that presented in the entry and cannot be described by any other Difference key (conflict, unsure, old_sequence, mutation, variation, allele, or modified_base).
misc_feature region of biological interest which cannot be described by any other feature key; a new or rare feature.
misc_recomb site of any generalized, site-specific or replicative recombination event where there is a breakage and reunion of duplex DNA that cannot be described by other recombination keys (iDNA and virion) or qualifiers of source key (/insertion_seq, /transposon, /proviral).
misc_RNA any transcript or RNA product that cannot be defined by other RNA keys (prim_transcript, precursor_RNA, mRNA, 5’clip, 3’clip, 5’UTR, 3’UTR, exon, CDS, sig_peptide, transit_peptide, mat_peptide, intron, polyA_site, rRNA, tRNA, scRNA, and snRNA).
misc_signal any region containing a signal controlling or altering gene function or expression that cannot be described by other Signal keys (promoter, CAAT_signal, TATA_signal, –35_signal, –10_signal, GC_signal, RBS, polyA_signal, enhancer, attenuator, terminator, and rep_origin).
misc_structure any secondary or tertiary structure or conformation that cannot be described by other Structure keys (stem_loop and D-loop).
modified_base the indicated nucleotide is a modified nucleotide and should be substituted for by the indicated molecule (given in the mod_base qualifier value).
mRNA messenger RNA; includes 5′ untranslated region (5’UTR), coding sequences (CDS, exon) and 3′ untranslated region (3’UTR).
mutation a related strain has an abrupt, inheritable change in the sequence at this location.
N_region extra nucleotides inserted between rearranged immunoglobulin segments.
old_sequence the presented sequence revises a previous version of the sequence at this location.
polyA_signal recognition region necessary for endonuclease cleavage of an RNA transcript that is followed by polyadenylation; consensus=AATAAA.
polyA_site site on an RNA transcript to which will be added adenine residues by post-transcriptional polyadenylation.
precursor_RNA any RNA species that is not yet the mature RNA product; may include 5′ clipped region (5’clip), 5′ untranslated region (5’UTR), coding sequences (CDS, exon), intervening sequences (intron), 3′ untranslated region (3’UTR), and 3′ clipped region (3’clip).
prim_transcript primary (initial, unprocessed) transcript; includes 5′ clipped region (5’clip), 5′ untranslated region (5’UTR), coding sequences (CDS, exon), intervening sequences (intron), 3′ untranslated region (3’UTR), and 3′ clipped region (3’clip).
primer_bind non-covalent primer binding site for initiation of replication, transcription, or reverse transcription; includes site(s) for synthetic, for example, PCR primer elements.
promoter region on a DNA molecule involved in RNA polymerase binding to initiate transcription.
protein_bind non-covalent protein binding site on nucleic acid.
RBS ribosome binding site.
repeat_region region of genome containing repeating units.
repeat_unit single repeat element.
rep_origin origin of replication; starting site for duplication of nucleic acid to give two identical copies.
rRNA mature ribosomal RNA; the RNA component of the ribonucleoprotein particle (ribosome) which assembles amino acids into proteins.
S_region switch region of immunoglobulin heavy chains; involved in the rearrangement of heavy chain DNA leading to the expression of a different immunoglobulin class from the same B-cell.
satellite many tandem repeats (identical or related) of a short basic repeating unit; many have a base composition or other property different from the genome average that allows them to be separated from the bulk (main band) genomic DNA.
scRNA small cytoplasmic RNA; any one of several small cytoplasmic RNA molecules present in the cytoplasm and (sometimes) nucleus of a eukaryote.
sig_peptide signal peptide coding sequence; coding sequence for an N-terminal domain of a secreted protein; this domain is involved in attaching nascent polypeptide to the membrane; leader sequence.
snRNA small nuclear RNA; any one of many small RNA species confined to the nucleus; several of the snRNAs are involved in splicing or other RNA processing reactions.
source identifies the biological source of the specified span of the sequence; this key is mandatory; every entry will have, as a minimum, a single source key spanning the entire sequence; more than one source key per sequence is permissible.
stem_loop hairpin; a double-helical region formed by base-pairing between adjacent (inverted) complementary sequences in a single strand of RNA or DNA.
STS Sequence Tagged Site; short, single-copy DNA sequence that characterizes a mapping landmark on the genome and can be detected by PCR; a region of the genome can be mapped by determining the order of a series of STSs.
TATA_signal TATA box; Goldberg-Hogness box; a conserved AT-rich septamer found about 25 bp before the start point of each eukaryotic RNA polymerase II transcript unit which may be involved in positioning the enzyme for correct initiation; consensus=TATA(A or T)A(A or T).
terminator sequence of DNA located either at the end of the transcript or adjacent to a promoter region that causes RNA polymerase to terminate transcription; may also be site of binding of repressor protein.
transit_peptide transit peptide coding sequence; coding sequence for an N-terminal domain of a nuclear-encoded organellar protein; this domain is involved in post-translational import of the protein into the organelle.
tRNA mature transfer RNA, a small RNA molecule (75-85 bases long) that mediates the translation of a nucleic acid sequence into an amino acid sequence.
unsure author is unsure of exact sequence in this region.
V_region variable region of immunoglobulin light and heavy chains, and T-cell receptor alpha, beta, and gamma chains; codes for the variable amino terminal portion; can be made up from V_segments, D_segments, N_regions, and J_segments.
V_segment variable segment of immunoglobulin light and heavy chains, and T-cell receptor alpha, beta, and gamma chains; codes for most of the variable region (V_region) and the last few amino acids of the leader peptide.
variation a related strain contains stable mutations from the same gene (for example, RFLPs, polymorphisms, etc.) which differ from the presented sequence at this location (and possibly others).
3’clip 3′-most region of a precursor transcript that is clipped off during processing.
3’UTR region at the 3′ end of a mature transcript (following the stop codon) that is not translated into a protein.
5’clip 5′-most region of a precursor transcript that is clipped off during processing.
5’UTR region at the 5′ end of a mature transcript (preceding the initiation codon) that is not translated into a protein.
–10_signal pribnow box; a conserved region about 10 bp upstream of the start point of bacterial transcription units which may be involved in binding RNA polymerase; consensus=TAtAaT.
–35_signal a conserved hexamer about 35 bp upstream of the start point of bacterial transcription units; consensus=TTGACa [ ] or TGTTGACA [ ].

Appendix F provides for feature keys related to protein sequences.

Appendix F to Subpart G of Part 1 – List of Feature Keys Related to Protein Sequences
Key Description
CONFLICT different papers report differing sequences.
VARIANT authors report that sequence variants exist.
VARSPLIC description of sequence variants produced by alternative splicing.
MUTAGEN site which has been experimentally altered.
MOD_RES post-translational modification of a residue.
ACETYLATION N-terminal or other.
AMIDATION generally at the C-terminal of a mature active peptide.
BLOCKED undetermined N- or C-terminal blocking group.
FORMYLATION of the N-terminal methionine.
GAMMA-CARBOXYGLUTAMIC ACID HYDROXYLATION of asparagine, aspartic acid, proline, or lysine.
METHYLATION generally of lysine or arginine.
PHOSPHORYLATION of serine, threonine, tyrosine, aspartic acid or histidine.
PYRROLIDONE CARBOXYLIC ACID N-terminal glutamate which has formed an internal cyclic lactam.
SULFATATION generally of tyrosine.
LIPID covalent binding of a lipidic moiety.
MYRISTATE myristate group attached through an amide bond to the N-terminal glycine residue of the mature form of a protein or to an internal lysine residue.
PALMITATE palmitate group attached through a thioether bond to a cysteine residue or through an ester bond to a serine or threonine residue.
FARNESYL farnesyl group attached through a thioether bond to a cysteine residue.
GERANYL-GERANYL geranyl-geranyl group attached through a thioether bond to a cysteine residue.
GPI-ANCHOR glycosyl-phosphatidylinositol (GPI) group linked to the alpha-carboxyl group of the C-terminal residue of the mature form of a protein.
N-ACYL DIGLYCERIDE N-terminal cysteine of the mature form of a prokaryotic lipoprotein with an amide-linked fatty acid and a glyceryl group to which two fatty acids are linked by ester linkages.
DISULFID disulfide bond; the ‘FROM’ and ‘TO’ endpoints represent the two residues which are linked by an intra-chain disulfide bond; if the ‘FROM’ and ‘TO’ endpoints are identical, the disulfide bond is an interchain one and the description field indicates the nature of the cross-link.
THIOLEST thiolester bond; the ‘FROM’ and ‘TO’ endpoints represent the two residues which are linked by the thiolester bond.
THIOETH thioether bond; the ‘FROM’ and ‘TO’ endpoints represent the two residues which are linked by the thioether bond.
CARBOHYD glycosylation site; the nature of the carbohydrate (if known) is given in the description field.
METAL binding site for a metal ion; the description field indicates the nature of the metal.
BINDING binding site for any chemical group (co-enzyme, prosthetic group, etc.); the chemical nature of the group is given in the description field.
SIGNAL extent of a signal sequence (prepeptide).
TRANSIT extent of a transit peptide (mitochondrial, chloroplastic, or for a microbody).
PROPEP extent of a propeptide.
CHAIN extent of a polypeptide chain in the mature protein.
PEPTIDE extent of a released active peptide.
DOMAIN extent of a domain of interest on the sequence; the nature of that domain is given in the description field.
CA_BIND extent of a calcium-binding region.
DNA_BIND extent of a DNA-binding region.
NP_BIND extent of a nucleotide phosphate binding region; the nature of the nucleotide phosphate is indicated in the description field.
TRANSMEM extent of a transmembrane region.
ZN_FING extent of a zinc finger region.
SIMILAR extent of a similarity with another protein sequence; precise information, relative to that sequence, is given in the description field.
REPEAT extent of an internal sequence repetition.
HELIX secondary structure: Helices, for example, Alpha-helix, 3(10) helix, or Pi-helix.
STRAND secondary structure: Beta-strand, for example, Hydrogen bonded beta-strand, or Residue in an isolated beta-bridge.
TURN secondary structure: Turns, for example, H-bonded turn (3-turn, 4-turn, or 5-turn).
ACT_SITE amino acid(s) involved in the activity of an enzyme.
SITE any other interesting site on the sequence.
INIT_MET the sequence is known to start with an initiator methionine.
NON_TER the residue at an extremity of the sequence is not the terminal residue; if applied to position 1, this signifies that the first position is not the N-terminus of the complete molecule; if applied to the last position, it signifies that this position is not the C-terminus of the complete molecule; there is no description field for this key.
NON_CONS non consecutive residues; indicates that two residues in a sequence are not consecutive and that there are a number of unsequenced residues between them.
UNSURE uncertainties in the sequence; used to describe region(s) of a sequence for which the authors are unsure about the sequence assignment.

II. INTERNATIONAL AND FOREIGN APPLICATIONS

The requirements of 37 CFR 1.821 through 37 CFR 1.825 are the result of an effort to harmonize the USPTO requirements with international sequence listing requirements to the extent possible. The requirements of 37 CFR 1.821 through 37 CFR 1.825 substantially correspond to the requirements of WIPO Standard ST.25 (2009). However, the requirements of 37 CFR 1.821 through 37 CFR 1.825 are less stringent than the requirements of WIPO Standard ST.25 (2009). Thus, applicants who have filed or wish to file international applications or applications in countries that adhere to WIPO Standard ST.25 (2009) should be aware of the following requirements:

  • (A) The data in numeric identifier <221> must use selections from Tables 5 and 6 of WIPO Standard ST.25 (2009) to comply with that standard. The terms from these Tables are considered language neutral vocabulary;
  • (B) WIPO Standard ST.25 (2009), paragraph 24, requires a blank line between numeric identifiers in the sequence listing when the digit in the first or second position of the numeric identifier changes;
  • (C) Where the sequence listing forming part of the description of the international application contains free text, e.g., free text in numeric identifier <223>, any such free text shall be repeated in the main part of the description in the language thereof (PCT Rule 5.2(b)). It is recommended that the free text in the language of the main part of the description be put in a specific section of the description called “Sequence Listing Free Text”;
  • (D) A sequence listing filed after the international filing date is generally not considered to be part of the disclosure and usually will not be published as part of the international application publication (see PCT Article 34 and PCT Rules 26 and 91 for exceptions); and
  • (E) Paragraph 4(v) of WIPO Standard ST.25 (2009) requires an accompanying statement with the specific wording “the information recorded in electronic form furnished under PCT Rule 13ter is identical to the sequence listing as contained in the international application”.
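
The blank-line rule in item (B) can be sketched as follows. This is a minimal illustration, not a validator for WIPO Standard ST.25 (2009). The function name is hypothetical, the sample entries are placeholders, and the sketch assumes each numeric identifier starts its own line in the form <NNN>.

```python
import re

# Matches a three-digit numeric identifier at the start of a line, e.g. "<110>".
IDENTIFIER = re.compile(r"^<(\d{3})>")


def add_blank_lines(lines):
    """Insert a blank line wherever the first or second digit of the
    numeric identifier differs from that of the preceding identifier."""
    out = []
    previous = None  # first two digits of the last identifier seen
    for line in lines:
        match = IDENTIFIER.match(line)
        if match:
            prefix = match.group(1)[:2]
            changed = previous is not None and prefix != previous
            # Avoid doubling a blank line that is already present.
            if changed and out and out[-1] != "":
                out.append("")
            previous = prefix
        out.append(line)
    return out


# Placeholder listing header, for illustration only.
sample = [
    "<110> Applicant name",
    "<120> Title of invention",
    "<140> Application number",
    "<141> Filing date",
    "<150> Prior application number",
]
print("\n".join(add_blank_lines(sample)))
```

In this sketch no blank line is inserted between <140> and <141>, because only the third digit changes, while a blank line is inserted before <150>.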

With further regard to requirements (A) and (B), it is noted that PatentIn Version 3.5.1 software (see MPEP § 2430) generates sequence listings that meet all of the requirements of WIPO Standard ST.25 (2009). Applicants should similarly be aware that filing requirements for sequence listings may differ between a national US application, a foreign application and an international application during the international phase. For example, where an international application is filed in paper, the sequence listing part of the international application must similarly be provided in paper. In addition, a copy of the sequence listing in ASCII plain text, to be used for the purpose of the international search (PCT Rule 13ter), must be filed on read-only optical disc or via the USPTO electronic filing system. Furthermore, in contrast to US national applications, a sequence listing filed with RO/US in ASCII plain text that is 300 MB or more in size is not subject to a size fee during the international phase of an international application.