Description
Hi,
I ran into an issue with the Formatter in the new version of VariantValidator (the same variant gave no problem with the previous version). For many variants, Formatter returned no results for old RefSeq transcripts (from cBioPortal data), even though gene2transcripts lists the transcript and VariantValidator correctly validates the variant on that same transcript. After a few tests, I realized that my transcript is handled correctly if I set specify_transcripts = "raw" (no difference between GRCh37 and GRCh38). For example:
Recent transcript, works perfectly
import VariantFormatter.simpleVariantFormatter

# Request a single, current RefSeq transcript
specify_transcripts = "NM_000546.6"
variant = {"genesymbol": "TP53", "hgvsg": "NC_000017.10:g.7578449C>T", "assembly": "GRCh37"}
formatter_result = VariantFormatter.simpleVariantFormatter.format(batch_input = variant["hgvsg"], genome_build = variant["assembly"], transcript_model = "refseq", specify_transcripts = specify_transcripts)
Output:
{
  "NC_000017.10:g.7578449C>T": {
    "errors": [],
    "flag": null,
    "NC_000017.10:g.7578449C>T": {
      "p_vcf": "17:7578449:C:T",
      "g_hgvs": "NC_000017.10:g.7578449C>T",
      "selected_build": "GRCh37",
      "genomic_variant_error": null,
      "hgvs_t_and_p": {
        "NM_000546.6": {
          "t_hgvs": "NM_000546.6:c.481G>A",
          "p_hgvs_tlc": "NP_000537.3:p.(Ala161Thr)",
          "p_hgvs_slc": "NP_000537.3:p.(A161T)",
          "select_status": {
            "refseq_select": true,
            "mane_select": true
          },
          "gene_info": {
            "symbol": "TP53",
            "hgnc_id": "HGNC:11998"
          },
          "transcript_version_warning": null,
          "gapped_alignment_warning": null,
          "gap_statement": null,
          "transcript_variant_error": null
        }
      }
    }
  },
  "metadata": {
    "variantvalidator_version": "3.0.2.dev57+gf344e4e.d20250402",
    "variantvalidator_hgvs_version": "2.2.1.dev7+g59392c5",
    "vvta_version": "vvta_2025_02",
    "vvseqrepo_db": "VV_SR_2025_02/master",
    "vvdb_version": "vvdb_2025_3",
    "variantformatter_version": "3.0.1.dev13+ga2cbf30"
  }
}
Old transcript, no result
specify_transcripts = "NM_001126112.2"
variant = {"genesymbol": "TP53", "hgvsg": "NC_000017.10:g.7578449C>T", "assembly": "GRCh37"}
formatter_result = VariantFormatter.simpleVariantFormatter.format(batch_input = variant["hgvsg"], genome_build = variant["assembly"], transcript_model = "refseq", specify_transcripts = specify_transcripts)
Output:
{
  "NC_000017.10:g.7578449C>T": {
    "errors": [],
    "flag": null,
    "NC_000017.10:g.7578449C>T": {
      "p_vcf": "17:7578449:C:T",
      "g_hgvs": "NC_000017.10:g.7578449C>T",
      "selected_build": "GRCh37",
      "genomic_variant_error": null,
      "hgvs_t_and_p": {
        "intergenic": {
          "alt_genomic_loci": null
        }
      }
    }
  },
  "metadata": {
    "variantvalidator_version": "3.0.2.dev57+gf344e4e.d20250402",
    "variantvalidator_hgvs_version": "2.2.1.dev7+g59392c5",
    "vvta_version": "vvta_2025_02",
    "vvseqrepo_db": "VV_SR_2025_02/master",
    "vvdb_version": "vvdb_2025_3",
    "variantformatter_version": "3.0.1.dev13+ga2cbf30"
  }
}
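The failing case can be recognised from the output structure alone: the only key under "hgvs_t_and_p" is "intergenic". Below is a minimal sketch of such a check. The helper names `reported_transcripts` and `is_intergenic_only` are hypothetical and not part of VariantFormatter. The sketch assumes the result is a Python dict laid out like the output above, with the inner key identical to the submitted genomic description.

```python
def reported_transcripts(formatter_result, hgvsg):
    """Return the keys found under hgvs_t_and_p for one genomic variant.

    Assumes the nested layout result[hgvsg][hgvsg]["hgvs_t_and_p"]
    seen in the output above (an assumption, not a documented contract).
    """
    per_variant = formatter_result.get(hgvsg, {}).get(hgvsg, {})
    # hgvs_t_and_p may be missing or None; treat both as "nothing reported"
    return list((per_variant.get("hgvs_t_and_p") or {}).keys())


def is_intergenic_only(formatter_result, hgvsg):
    """True when no transcript was mapped and only 'intergenic' is reported."""
    return reported_transcripts(formatter_result, hgvsg) == ["intergenic"]
```

With the result shown above for NM_001126112.2, `is_intergenic_only(formatter_result, variant["hgvsg"])` would be true, although the variant lies within TP53.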
With specify_transcripts set to "raw", my old transcript NM_001126112.2 is in the result and correctly handled
specify_transcripts = "raw"
variant = {"genesymbol": "TP53", "hgvsg": "NC_000017.10:g.7578449C>T", "assembly": "GRCh37"}
formatter_result = VariantFormatter.simpleVariantFormatter.format(batch_input = variant["hgvsg"], genome_build = variant["assembly"], transcript_model = "refseq", specify_transcripts = specify_transcripts)
Output (only a short extract):
"NM_001126116.2": {
  "t_hgvs": "NM_001126116.2:c.85G>A",
  "p_hgvs_tlc": "NP_001119588.1:p.(Ala29Thr)",
  "p_hgvs_slc": "NP_001119588.1:p.(A29T)",
  "select_status": {},
  "gene_info": {
    "symbol": "TP53",
    "hgnc_id": "HGNC:11998"
  },
  "transcript_version_warning": null,
  "gapped_alignment_warning": null,
  "gap_statement": null,
  "transcript_variant_error": null
}
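The workaround can be sketched as a small wrapper: ask for the specific transcript first, and if it is absent from the result, rerun with "raw" and pick that transcript out of the full list. The function name `format_with_raw_fallback` is hypothetical. The formatter is passed in as `format_fn` so the sketch does not depend on an installed VariantFormatter, and it assumes the returned value is a dict with the layout shown in the outputs above.

```python
def format_with_raw_fallback(format_fn, hgvsg, assembly, transcript):
    """Return the hgvs_t_and_p entry for `transcript`, or None if not found.

    format_fn is expected to accept the same keyword arguments as the calls
    in this report (batch_input, genome_build, transcript_model,
    specify_transcripts).
    """
    def transcripts_for(selection):
        result = format_fn(
            batch_input=hgvsg,
            genome_build=assembly,
            transcript_model="refseq",
            specify_transcripts=selection,
        )
        # Assumed layout: result[hgvsg][hgvsg]["hgvs_t_and_p"]
        inner = result.get(hgvsg, {}).get(hgvsg, {})
        return inner.get("hgvs_t_and_p") or {}

    # First attempt: request only the transcript of interest
    direct = transcripts_for(transcript)
    if transcript in direct:
        return direct[transcript]

    # Fallback: request all transcripts and filter afterwards
    return transcripts_for("raw").get(transcript)
```

Usage would look like `format_with_raw_fallback(VariantFormatter.simpleVariantFormatter.format, variant["hgvsg"], variant["assembly"], "NM_001126112.2")`. The fallback costs a second, slower call, so it only runs when the direct request comes back without the transcript.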
I can still run with specify_transcripts set to "raw" when necessary, so it's not a huge problem and it's not urgent.
Thanks!