
Unreliable reduced formula order #4475

@Lattay

Description


Python version

3.13.5

Pymatgen version

2025.5.2

Operating system version

Fedora 25

Current behavior

Because undefined electronegativities are represented as NaN, the reduced_formula property of Composition is unreliable: its element order can change with the order in which the elements were supplied. This in turn breaks downstream tools such as get_protostructure_label from matbench-discovery (https://github.com/janosh/matbench-discovery/blob/75f0ca207d9b60b64b65790b318e6680a9ac74c5/matbench_discovery/structure/prototype.py#L187-L188).

This happens because every comparison involving NaN evaluates to False, so the result of sorting elements by electronegativity depends on their input order.
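A minimal sketch of the mechanism, assuming that formula ordering boils down to sorting element symbols by an electronegativity key. The `ELECTRONEGATIVITY` table and the `order_by_electronegativity` helper are hypothetical stand-ins, not pymatgen's internals: Zn and O use their Pauling values and Ar is stored as NaN.

```python
# Hypothetical lookup table: Pauling values for Zn and O, NaN for Ar,
# standing in for an "undefined" electronegativity.
ELECTRONEGATIVITY = {"Zn": 1.65, "O": 3.44, "Ar": float("nan")}


def order_by_electronegativity(symbols):
    """Sort element symbols by electronegativity, lowest first."""
    return sorted(symbols, key=lambda sym: ELECTRONEGATIVITY[sym])


nan = ELECTRONEGATIVITY["Ar"]
# Every comparison involving NaN is False, even NaN == NaN.
print(nan < 1.65, nan > 1.65, nan == nan)  # False False False

# Same elements, two insertion orders, two different results.
print(order_by_electronegativity(["Zn", "Ar", "O"]))  # ['Zn', 'Ar', 'O']
print(order_by_electronegativity(["Ar", "Zn", "O"]))  # ['Ar', 'Zn', 'O']
```

Since neither `nan < 1.65` nor `1.65 < nan` holds, the sort never finds a reason to move Ar, so in this small case it stays wherever it was inserted. That mirrors the `Zn3ArO3` / `ArZn3O3` pair in the minimal example.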

Expected Behavior

The reduced formula is relied upon by many high-throughput packages, so it needs to be consistent regardless of the order in which the elements are given.

Minimal example

Here is a demonstration:

>>> import pymatgen.core
>>> pymatgen.core.Composition({"Zn":3,"Ar":1,"O":3}).reduced_formula
[warning about unphysical electronegativity]
'Zn3ArO3'
>>> pymatgen.core.Composition({"Ar":1,"Zn":3,"O":3}).reduced_formula
[same warning]
'ArZn3O3'

A simple solution would be to use float("+inf") for "undefined" electronegativity: it is equally unphysical but, unlike NaN, it compares consistently with every other float.
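A minimal sketch of the proposed fix, again using a hypothetical table and helper rather than pymatgen's actual code. NaN values are mapped to `float("inf")` inside the sort key, and the element symbol is added as a secondary key. That tie-breaker is an extra assumption of this sketch, not part of the proposal above.

```python
import math
from itertools import permutations

# Hypothetical lookup table: Pauling values for Zn and O; elements
# without a tabulated value (Ar, Ne) are stored as NaN.
ELECTRONEGATIVITY = {
    "Zn": 1.65,
    "O": 3.44,
    "Ar": float("nan"),
    "Ne": float("nan"),
}


def sort_key(symbol):
    """Deterministic key: undefined (NaN) values become +inf and sort last."""
    x = ELECTRONEGATIVITY[symbol]
    if math.isnan(x):
        x = float("inf")
    # The symbol breaks ties, e.g. between several elements mapped to +inf.
    return (x, symbol)


def order_by_electronegativity(symbols):
    return sorted(symbols, key=sort_key)


# Every insertion order now yields the same sequence.
results = {
    tuple(order_by_electronegativity(p))
    for p in permutations(["Zn", "Ar", "O", "Ne"])
}
print(results)  # {('Zn', 'O', 'Ar', 'Ne')}
```

Mapping NaN to +inf alone is enough when only one element lacks a value. If two or more do, they all map to +inf and tie, and because Python's sort is stable they would keep their input order. The symbol tie-breaker removes that remaining order dependence.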

Relevant files to reproduce this bug

No response
