Column 'line_idx' gets 'object' dtype for empty frames. #64

@ZivBA

If a PDB file has no records of a certain type (for instance, no HETATM or no ATOM records), the resulting empty dataframe is created with the 'line_idx' column as dtype 'object' (default for pandas?).
I've noticed that there's no 'line_idx' entry in 'pdb_atomdict' (engines.py).
One suggested fix would be to add it to that dict, which would remove the need for this 'hack' (starred) in 'pandas_pdb.py', line 363:

            df = pd.DataFrame(r[1], columns=[c['id'] for c in
                                             pdb_records[r[0]]] ** + ['line_idx']** )

Unfortunately, I have no idea whether this would have a cascading effect, as I'm certain this was done on purpose.
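For reference, here is a minimal sketch of the dtype behaviour outside of the library. The column names and the sample row are made up for illustration (they only stand in for the real record fields), and it assumes 'line_idx' arrives as a Python int when rows are present:

```python
import pandas as pd

# Hypothetical column names standing in for the PDB record fields.
columns = ["record_name", "atom_number", "line_idx"]

# No rows, as when a PDB file has no HETATM records.
empty = pd.DataFrame([], columns=columns)

# One row, as when records are present; line_idx holds a Python int here.
filled = pd.DataFrame([["ATOM", "1", 0]], columns=columns)

empty_dtype = empty["line_idx"].dtype
filled_dtype = filled["line_idx"].dtype

# With no data to infer from, pandas falls back to 'object' for the empty
# frame, while the populated frame gets an integer dtype.
print(empty_dtype, filled_dtype)
```

So the same column ends up with a different dtype depending only on whether any rows exist.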

Another quick and dirty workaround would be to add the (starred) line:

            for c in pdb_records[r[0]]:
                try:
                    df[c['id']] = df[c['id']].astype(c['type'])
                except ValueError:
                    # expect ValueError if float/int columns are empty strings
                    df[c['id']] = pd.Series(np.nan, index=df.index)
            **df['line_idx'] = df['line_idx'].astype(int)**

after the dtype correction loop that immediately follows the code at line 363 shown above.

This is an incredibly minor issue, but it has caused some unexpected glitches for me: I fetch the columns of type 'object' and convert them to string in both the ATOM and HETATM frames, and because one frame has the wrong datatype for 'line_idx', the conversion crashes.
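A rough sketch of the cast-based workaround applied to standalone frames follows. The helper name `coerce_line_idx` and the two-column frames are hypothetical and not part of the library; the only point is that an empty 'object' column can be cast to int without error, so both frames end up with matching dtypes:

```python
import pandas as pd


def coerce_line_idx(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df whose 'line_idx' column is integer-typed.

    Hypothetical helper for illustration only.
    """
    out = df.copy()
    out["line_idx"] = out["line_idx"].astype(int)
    return out


# Hypothetical stand-ins for the ATOM and (empty) HETATM frames.
cols = ["record_name", "line_idx"]
atom = pd.DataFrame([["ATOM", 0], ["ATOM", 1]], columns=cols)
hetatm = pd.DataFrame([], columns=cols)

atom_fixed = coerce_line_idx(atom)
hetatm_fixed = coerce_line_idx(hetatm)

# After the cast, 'line_idx' has the same integer dtype in both frames,
# so it is no longer picked up when selecting 'object' columns.
print(atom_fixed["line_idx"].dtype, hetatm_fixed["line_idx"].dtype)
```

Until something like this is in place, user code can apply the same cast to each frame before filtering columns by dtype.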
