Error parsing XML files #647

@shaniAmare

Hi to the developers,

I don't understand why I get the following error, and only during the clinical data processing step:

read_xml.character(xmlfile) : 
  Start tag expected, '<' not found [4]

I ran the following:

query_clin <- GDCquery(project = "TCGA-BRCA",
                       data.type = "Clinical Supplement", 
                       data.category = "Clinical",
                       data.format = "XML")
GDCdownload(query_clin, files.per.chunk = 20)
clinical_data <- GDCprepare_clinic(query_clin, clinical.info = "patient")

It fails with the error above about 31% of the way through.

I manually checked the downloaded files for malformed XML, but could not find any.

To check, I tried:

cd ~/BC_CosMx/GDCdata/TCGA-BRCA/Clinical/Clinical_Supplement
# Prepend "<" to any XML file whose first byte is not "<"
find . -type f -name "*.xml" | while read -r file; do
  first_char=$(head -c 1 "$file")
  if [ "$first_char" != "<" ]; then
    echo "<$(cat "$file")" > "$file"
  fi
done

None of the files were modified, so I assume the XML files are fine.
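
The loop above prints nothing and rewrites files in place, so it is hard to tell from its output whether any file failed the check. A read-only variant might look like the sketch below. It is only an illustration: the function name `list_suspect_xml` and the `EMPTY` / `NO_LEADING_TAG` labels are made up for this example, and it checks only for empty files and the first byte, not full XML well-formedness.

```shell
# Hypothetical helper: print every *.xml file under DIR that is empty or
# whose first byte is not "<". Files are only read, never modified.
list_suspect_xml() {
  dir="${1:-.}"
  find "$dir" -type f -name "*.xml" | while IFS= read -r file; do
    if [ ! -s "$file" ]; then
      # Zero-length file, e.g. an interrupted download
      echo "EMPTY: $file"
    elif [ "$(head -c 1 "$file")" != "<" ]; then
      # First byte is something other than "<"
      echo "NO_LEADING_TAG: $file"
    fi
  done
}
```

Calling `list_suspect_xml ~/BC_CosMx/GDCdata/TCGA-BRCA/Clinical/Clinical_Supplement` would list any suspect files. No output means every `*.xml` file is non-empty and starts with `<`.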

I have been doing all of this in the latest RStudio, using the RStudio terminal for the XML check, both provided to me through an HPC cluster.

Please help me figure out what I am doing wrong. A quick reply would be much appreciated: I have a conference in a few days where I would like to present some results from this analysis.

Many thanks,
Shani.
