Description
If a PDB file has no records of a certain type (for instance, no HETATM or no ATOM records), the resulting (empty) DataFrame is created with its 'line_idx' column of dtype 'object', which is the pandas default for columns of an empty DataFrame.
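A minimal reproduction of the dtype mismatch in plain pandas, without biopandas itself. The column names and row values are made up for illustration; only the 'line_idx' column mirrors the library.

```python
import pandas as pd

# Illustrative columns; 'line_idx' mirrors the column biopandas appends
cols = ["record_name", "atom_number", "line_idx"]

atom_rows = [["ATOM", 1, 0], ["ATOM", 2, 1]]  # a record type that is present
hetatm_rows = []                              # e.g. a file without HETATM lines

atom_df = pd.DataFrame(atom_rows, columns=cols)
hetatm_df = pd.DataFrame(hetatm_rows, columns=cols)

print(atom_df["line_idx"].dtype)    # int64
print(hetatm_df["line_idx"].dtype)  # object
```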
I've noticed that there's no 'line_idx' entry in the 'pdb_atomdict' (engines.py).
One possible fix would be to add it to that dict, removing the need for this 'hack' (the appended `+ ['line_idx']`) in 'pandas_pdb.py', line 363:

    df = pd.DataFrame(r[1], columns=[c['id'] for c in
                                     pdb_records[r[0]]] + ['line_idx'])  # <- the 'hack'
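A rough sketch of what this first option could look like. The record list below is a hypothetical, heavily simplified stand-in for the dicts in engines.py: it models only the 'id' and 'type' keys that the dtype loop in pandas_pdb.py reads, and `build_frame` is a made-up helper. The real entries carry more keys (used, for instance, when writing files), which is exactly where a cascading effect could appear.

```python
import numpy as np
import pandas as pd

# Hypothetical, simplified stand-in for a record list in engines.py
atom_records = [
    {"id": "record_name", "type": str},
    {"id": "atom_number", "type": int},
    {"id": "line_idx", "type": int},  # proposed new entry
]

def build_frame(rows, records):
    # Columns come straight from the records, so no "+ ['line_idx']" is needed
    df = pd.DataFrame(rows, columns=[c["id"] for c in records])
    for c in records:
        try:
            df[c["id"]] = df[c["id"]].astype(c["type"])
        except ValueError:
            # expect ValueError if float/int columns are empty strings
            df[c["id"]] = pd.Series(np.nan, index=df.index)
    return df

empty = build_frame([], atom_records)
filled = build_frame([["ATOM", 1, 0]], atom_records)
# 'line_idx' now gets an integer dtype in both cases, not object
print(empty["line_idx"].dtype, filled["line_idx"].dtype)
```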
Unfortunately, I have no idea whether this would have a cascading effect, as I'm certain this was done on purpose.
Another quick-and-dirty workaround would be to add the marked line after the dtype-correction loop that directly follows the code at line 363:

    for c in pdb_records[r[0]]:
        try:
            df[c['id']] = df[c['id']].astype(c['type'])
        except ValueError:
            # expect ValueError if float/int columns are empty strings
            df[c['id']] = pd.Series(np.nan, index=df.index)
    df['line_idx'] = df['line_idx'].astype(int)  # added line
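Until something like this lands in the library, the same cast can be applied on the user side after loading. The helper `normalize_line_idx` below is hypothetical. It assumes the parsed frames are held in a dict of DataFrames keyed by record type (as with the 'ATOM' and 'HETATM' frames of `PandasPdb().df`), and it uses an explicit 'int64' rather than `int` so the result does not depend on the platform's default integer width.

```python
import pandas as pd

def normalize_line_idx(frames):
    """Hypothetical helper: cast 'line_idx' to int64 in every frame that has it."""
    for key, df in list(frames.items()):
        if "line_idx" in df.columns:
            frames[key] = df.astype({"line_idx": "int64"})
    return frames

# Mock frames: one populated, one empty (as for a file without HETATM records)
frames = {
    "ATOM": pd.DataFrame({"atom_name": ["N", "CA"], "line_idx": [0, 1]}),
    "HETATM": pd.DataFrame(columns=["atom_name", "line_idx"]),
}
print(frames["HETATM"]["line_idx"].dtype)  # object
normalize_line_idx(frames)
print(frames["HETATM"]["line_idx"].dtype)  # int64
```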
This is an incredibly minor issue, but it has caused some unexpected glitches for me when selecting the columns of dtype 'object' and converting them to strings in both the ATOM and HETATM frames: one frame would have the wrong dtype for 'line_idx', and the conversion would crash.
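A small illustration of why the mismatch surfaces in this kind of code. The frames are mocks and the float cast only imitates the library's dtype loop, which never touches 'line_idx'. Selecting object columns then picks up 'line_idx' in the empty frame but not in the populated one, so any conversion applied to "all object columns" treats the two frames differently.

```python
import pandas as pd

cols = ["atom_name", "x_coord", "line_idx"]
atom = pd.DataFrame([["N", 1.0, 0], ["CA", 2.0, 1]], columns=cols)
hetatm = pd.DataFrame([], columns=cols)  # no HETATM records

# Imitate the dtype loop: typed columns are cast, 'line_idx' is left alone
for df in (atom, hetatm):
    df["x_coord"] = df["x_coord"].astype(float)

atom_obj = atom.select_dtypes(include="object").columns
hetatm_obj = hetatm.select_dtypes(include="object").columns
print("line_idx" in atom_obj)    # False
print("line_idx" in hetatm_obj)  # True
```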