Column 'line_idx' gets 'object' dtype for empty frames. #64

If a PDB file has no records of a certain type (for instance no HETATM or no ATOM), the resulting empty dataframe is created with the 'line_idx' column as dtype 'object' (presumably the pandas default for empty columns).
I've noticed that there's no 'line_idx' entry in 'pdb_atomdict' (engines.py).
One suggested fix would be to add it to that dict, removing the need for the 'hack' (marked with a comment below) in 'pandas_pdb.py', line 363:
```python
            df = pd.DataFrame(r[1], columns=[c['id'] for c in
                                             pdb_records[r[0]]] + ['line_idx'])  # <-- the 'hack'
```
Unfortunately, I have no idea whether this would have a cascading effect, as I'm certain this was done on purpose.
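
The dtype difference itself can be reproduced with plain pandas, without BioPandas. The following is only a minimal sketch: the column names are placeholders chosen for illustration, not the real PDB record columns.

```python
import pandas as pd

# Placeholder column names (an assumption), with 'line_idx' appended
# the same way the constructor call above appends it.
columns = ['atom_number', 'atom_name'] + ['line_idx']

# No rows, as when a PDB file contains no HETATM records.
empty = pd.DataFrame([], columns=columns)

# One row, where the line index is a real integer.
filled = pd.DataFrame([[1, 'CA', 0]], columns=columns)

# With no data to infer from, pandas falls back to 'object';
# with data, the integer dtype is inferred.
print(empty['line_idx'].dtype)
print(filled['line_idx'].dtype)
```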

Another quick and dirty workaround would be to add the line marked with a comment below:
```python
            for c in pdb_records[r[0]]:
                try:
                    df[c['id']] = df[c['id']].astype(c['type'])
                except ValueError:
                    # expect ValueError if float/int columns are empty strings
                    df[c['id']] = pd.Series(np.nan, index=df.index)
            df['line_idx'] = df['line_idx'].astype(int)  # <-- added line
```
It would go after the dtype correction loop, which immediately follows the code at line 363 shown above.
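
As a rough check that such a cast is safe on a frame with no rows, here is a small standalone sketch. `cast_line_idx` is a hypothetical helper written for this illustration and is not part of BioPandas; the column names are placeholders too.

```python
import pandas as pd


def cast_line_idx(df):
    """Return a copy of df with 'line_idx' cast to int (hypothetical helper)."""
    out = df.copy()
    out['line_idx'] = out['line_idx'].astype(int)
    return out


# Empty frame: 'line_idx' starts out as 'object'.
empty = pd.DataFrame([], columns=['atom_name', 'line_idx'])
fixed = cast_line_idx(empty)

# The cast succeeds on zero rows and leaves an integer column behind.
print(fixed['line_idx'].dtype)
```

Until something like this lands in the library, the same cast can presumably be applied by the caller to the frames it reads.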

This is an incredibly minor issue, but it has caused some unexpected glitches for me: when I fetch the columns of type 'object' and convert them to strings in both the ATOM and HETATM frames, one frame has the wrong dtype for 'line_idx' and the conversion crashes.
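
The exact failing call is not shown here, so the following is only a sketch of how the mismatch can surface: two frames with the same (placeholder) columns disagree on dtypes as soon as one of them is empty, so code that selects columns by dtype treats them differently.

```python
import pandas as pd

# Placeholder columns (an assumption), standing in for the ATOM/HETATM frames.
columns = ['atom_name', 'line_idx']
atom = pd.DataFrame([['CA', 0]], columns=columns)  # has records
hetatm = pd.DataFrame([], columns=columns)         # no records

# Columns whose dtype differs between the two frames.
mismatched = [c for c in columns if atom[c].dtype != hetatm[c].dtype]
print(mismatched)
```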

