Commit f2a5d65

Aggregate publication counts in AuthorSearch().subject_areas
1 parent 52adf0f commit f2a5d65

4 files changed: +29 additions, -20 deletions

docs/reference/scopus/AuthorSearch.rst

Lines changed: 10 additions & 15 deletions
@@ -32,9 +32,8 @@ You can obtain a search summary just by printing the object:
 .. code-block:: python
 
     >>> print(s)
-    Search 'AUTHLAST(Selten) and AUTHFIRST(Reinhard)' yielded 2 authors as of 2024-05-11:
-    Selten, Reinhard; AUTHOR_ID:6602907525 (74 document(s))
-    Selten, Reinhard; AUTHOR_ID:57213632570 (1 document(s))
+    Search 'AUTHLAST(Selten) and AUTHFIRST(Reinhard)' yielded 1 author as of 2025-06-13:
+    Selten, Reinhard; AUTHOR_ID:6602907525 (76 document(s))
 
 
 To determine the the number of results use the `.get_results_size()` method, even before you download the results:
@@ -43,7 +42,7 @@ To determine the the number of results use the `.get_results_size()` method, eve
 
     >>> other = AuthorSearch("AUTHLAST(Selten)", download=False)
     >>> other.get_results_size()
-    27
+    30
 
 
 Primarily, the class provides a list of `namedtuples <https://docs.python.org/3/library/collections.html#collections.namedtuple>`_ storing author EIDs, which you can use for the :doc:`AuthorRetrieval <AuthorRetrieval>` class, and corresponding information:
@@ -54,27 +53,23 @@ Primarily, the class provides a list of `namedtuples <https://docs.python.org/3/
     [Author(eid='9-s2.0-6602907525', orcid=None, surname='Selten', initials='R.',
             givenname='Reinhard', affiliation='Universitat Bonn', documents=74,
             affiliation_id='60007493', city='Bonn', country='Germany',
-            areas='ECON (73); MATH (19); BUSI (16)')]
+            areas='ECON (72); BUSI (8)')]
 
 
+Please note that Scopus sometimes returns duplicate `areas`, which are then aggregated (e.g. 'ECON (51); ECON (21)' → 'ECON (72)').
+
 Working with namedtuples is straightforward: Using `pandas <https://pandas.pydata.org/>`_, you can quickly convert the results set into a DataFrame:
 
 .. code-block:: python
 
     >>> import pandas as pd
     >>> pd.set_option('display.max_columns', None)
     >>> print(pd.DataFrame(s.authors))
-                      eid  orcid  surname  initials  givenname  \
-    0   9-s2.0-6602907525   None   Selten        R.   Reinhard
-    1  9-s2.0-57213632570   None   Selten        R.   Reinhard
-
-                         affiliation  documents  affiliation_id     city  country  \
-    0               Universität Bonn         74        60007493     Bonn  Germany
-    1  Southwest Jiaotong University          1        60010421  Chengdu    China
+                     eid  orcid  surname  initials  givenname       affiliation  \
+    0  9-s2.0-6602907525   None   Selten        R.   Reinhard  Universität Bonn
 
-                                 areas
-    0  ECON (73); MATH (19); BUSI (16)
-    1                         COMP (3)
+       documents  affiliation_id  city  country                areas
+    0         76        60007493  Bonn  Germany  ECON (72); BUSI (8)
 
 
 Downloaded results are cached to expedite subsequent analyses. This information may become outdated. To refresh the cached results if they exist, set `refresh=True`, or provide an integer that will be interpreted as maximum allowed number of days since the last modification date. For example, if you want to refresh all cached results older than 100 days, set `refresh=100`. Use `ab.get_cache_file_mdate()` to obtain the date of last modification, and `ab.get_cache_file_age()` to determine the number of days since the last modification.
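The documentation note above describes merging duplicate `areas` entries. As a rough illustration of that rule, the sketch below applies it to an already formatted string such as 'ECON (51); ECON (21)'. This is not how pybliometrics does it (the commit aggregates the raw Scopus records before formatting them); `aggregate_areas_string` is a hypothetical helper, and the `ABBREV (count)` entry format is assumed from the examples shown.

```python
import re

# One entry such as "ECON (51)"; the count may be empty, as in "ECON ()".
ENTRY = re.compile(r"^\s*(?P<abbrev>\S*)\s*\((?P<count>\d*)\)\s*$")


def aggregate_areas_string(areas: str) -> str:
    """Sum the counts of duplicate abbreviations in an `areas` string.

    Hypothetical helper for illustration only. Abbreviations keep their
    first-seen order, and a total of 0 is printed as "()" to mirror the commit.
    """
    totals: dict[str, int] = {}
    for entry in areas.split(";"):
        match = ENTRY.match(entry)
        if match is None:
            raise ValueError(f"unexpected entry: {entry!r}")
        count = int(match["count"]) if match["count"] else 0
        totals[match["abbrev"]] = totals.get(match["abbrev"], 0) + count
    return "; ".join(f"{abbrev} ({'' if n == 0 else n})" for abbrev, n in totals.items())


print(aggregate_areas_string("ECON (51); ECON (21)"))            # ECON (72)
print(aggregate_areas_string("ECON (51); BUSI (8); ECON (21)"))  # ECON (72); BUSI (8)
```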

pybliometrics/scopus/author_search.py

Lines changed: 5 additions & 4 deletions
@@ -3,7 +3,7 @@
 
 from pybliometrics.superclasses import Search
 from pybliometrics.utils import check_integrity, check_parameter_value, \
-    check_field_consistency, listify, make_search_summary
+    check_field_consistency, get_and_aggregate_subjects, listify, make_search_summary
 
 
 class AuthorSearch(Search):
@@ -15,7 +15,8 @@ def authors(self) -> Optional[list[namedtuple]]:
         documents affiliation affiliation_id city country areas)`.
 
         All entries are `str` or `None`. Areas combines abbreviated subject
-        areas followed by the number of documents in this subject.
+        areas followed by the number of documents in this subject. The number of
+        documents on duplicate subject areas is summed up.
 
         Raises
         ------
@@ -35,8 +36,8 @@ def authors(self) -> Optional[list[namedtuple]]:
             aff = item.get('affiliation-current', {})
             fields = item.get('subject-area',
                               [{'@abbrev': '', '@frequency': ''}])
-            areas = [f"{d.get('@abbrev', '')} ({d.get('@frequency', '')})"
-                     for d in listify(fields)]
+            subjects = get_and_aggregate_subjects(fields)
+            areas = [f"{abbrev} ({'' if freq == 0 else freq})" for abbrev, freq in subjects.items()]
             new = auth(eid=item.get('eid'),
                        orcid=item.get('orcid'),
                        initials=name.get('initials'),
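To see what the change to `authors` does to the `areas` field, the sketch below runs the removed list comprehension and the added one side by side on three made-up `subject-area` records. It assumes the entries are finally joined with '; ', which matches the documented output but is not visible in this diff. The aggregated dictionary is typed in by hand (standing in for what `get_and_aggregate_subjects(fields)` returns) so the snippet runs without pybliometrics.

```python
# Hypothetical raw records as they might appear under 'subject-area'
# (ECON is listed twice, which is the case the commit addresses).
fields = [{'@abbrev': 'ECON', '@frequency': '51'},
          {'@abbrev': 'BUSI', '@frequency': '8'},
          {'@abbrev': 'ECON', '@frequency': '21'}]

# Before the commit: one entry per raw record, duplicates included.
old_areas = [f"{d.get('@abbrev', '')} ({d.get('@frequency', '')})" for d in fields]

# After the commit: `subjects` stands in for get_and_aggregate_subjects(fields);
# it is written out by hand so this snippet runs on its own.
subjects = {'ECON': 72, 'BUSI': 8}
new_areas = [f"{abbrev} ({'' if freq == 0 else freq})" for abbrev, freq in subjects.items()]

# Assumption: the entries are joined with "; ", as the documented output suggests.
print("; ".join(old_areas))  # ECON (51); BUSI (8); ECON (21)
print("; ".join(new_areas))  # ECON (72); BUSI (8)
```

One consequence of the `'' if freq == 0 else freq` branch is that a missing or non-numeric `@frequency` and a genuine count of 0 both appear as empty parentheses.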

pybliometrics/scopus/tests/test_AuthorSearch.py

Lines changed: 1 addition & 1 deletion
@@ -19,7 +19,7 @@ def test_authors():
     expected = Author(eid='9-s2.0-6602907525', orcid=None, surname='Selten',
                       initials='R.', givenname='Reinhard', affiliation='Universität Bonn',
                       documents=76, affiliation_id='60007493', city='Bonn',
-                      country='Germany', areas='ECON (78); MATH (21); BUSI (16)')
+                      country='Germany', areas='ECON (72); BUSI (8)')
     assert s1.authors[0] == expected
 
 
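The expected values in `test_authors` appear to track live Scopus data, since the counts changed together with the documentation. A network-free check of the aggregation itself could look like the following pytest-style sketch. The helper body is copied from `parse_content.py` in this commit so the block runs on its own; inside the repository one would import it instead, as `author_search.py` does via `pybliometrics.utils`. The test names and record values are hypothetical.

```python
# Copied from pybliometrics/utils/parse_content.py in this commit so the
# tests below run without pybliometrics installed.
def get_and_aggregate_subjects(fields):
    """Get and aggregate subject areas from Scopus AuthorSearch."""
    frequencies = {}
    for field in fields:
        abbrev = field.get('@abbrev', '')
        freq_str = field.get('@frequency', '')
        frequency = int(freq_str) if freq_str.isdigit() else 0
        if abbrev in frequencies:
            frequencies[abbrev] += frequency
        else:
            frequencies[abbrev] = frequency
    return frequencies


# Hypothetical records with a duplicated ECON entry.
FIELDS = [{'@abbrev': 'ECON', '@frequency': '51'},
          {'@abbrev': 'BUSI', '@frequency': '8'},
          {'@abbrev': 'ECON', '@frequency': '21'}]


def test_duplicate_subjects_are_summed():
    assert get_and_aggregate_subjects(FIELDS) == {'ECON': 72, 'BUSI': 8}


def test_order_follows_first_occurrence():
    assert list(get_and_aggregate_subjects(FIELDS)) == ['ECON', 'BUSI']


def test_missing_frequency_counts_as_zero():
    # Same default that AuthorSearch.authors uses when 'subject-area' is absent.
    assert get_and_aggregate_subjects([{'@abbrev': '', '@frequency': ''}]) == {'': 0}
```

Saved as a `test_*.py` file, pytest would collect the three functions automatically.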
pybliometrics/utils/parse_content.py

Lines changed: 13 additions & 0 deletions
@@ -64,6 +64,19 @@ def deduplicate(lst):
         [])
     return new
 
+def get_and_aggregate_subjects(fields):
+    """Get and aggregate subject areas from Scopus AuthorSearch."""
+    frequencies = {}
+    for field in fields:
+        abbrev = field.get('@abbrev', '')
+        freq_str = field.get('@frequency', '')
+        frequency = int(freq_str) if freq_str.isdigit() else 0
+        if abbrev in frequencies:
+            frequencies[abbrev] += frequency
+        else:
+            frequencies[abbrev] = frequency
+    return frequencies
+
 
 def get_id(s, integer=True):
     """Helper function to return the Scopus ID at a fixed position."""

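The new helper iterates over `fields` directly, whereas the removed code in `author_search.py` wrapped it in `listify(fields)`. If Scopus ever returned a single subject area as a lone dict rather than a list (an assumption here, suggested only by that removed call), iterating over it would yield its keys, which are strings without a `.get` method. The sketch below is a hypothetical variant, `aggregate_subjects`, that normalizes that case and uses `collections.Counter` for the summation; it is not part of pybliometrics.

```python
from collections import Counter
from typing import Union

Field = dict[str, str]


def aggregate_subjects(fields: Union[Field, list[Field]]) -> dict[str, int]:
    """Sum '@frequency' per '@abbrev', keeping first-seen order.

    Hypothetical variant of get_and_aggregate_subjects: it also accepts a
    single dict and treats missing or non-numeric frequencies as 0.
    """
    if isinstance(fields, dict):  # a lone record instead of a list
        fields = [fields]
    counts = Counter()
    for field in fields:
        freq_str = str(field.get('@frequency', '')).strip()
        # isdecimal() (unlike isdigit()) only accepts characters int() can parse.
        frequency = int(freq_str) if freq_str.isdecimal() else 0
        # Counter returns 0 for unseen keys, so += also creates new entries,
        # including entries whose total stays 0.
        counts[field.get('@abbrev', '')] += frequency
    return dict(counts)


print(aggregate_subjects([{'@abbrev': 'ECON', '@frequency': '51'},
                          {'@abbrev': 'BUSI', '@frequency': '8'},
                          {'@abbrev': 'ECON', '@frequency': '21'}]))
# {'ECON': 72, 'BUSI': 8}
print(aggregate_subjects({'@abbrev': 'MATH', '@frequency': '19'}))
# {'MATH': 19}
```

`Counter` keeps first-insertion order like a regular dict, so on list input with plain numeric strings the result matches the committed helper.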