Steve edited this page Sep 22, 2017 · 23 revisions

Page data

Page (en) data { cache, categories, claims, contributors, description, endpoint, exhtml, exrest, extext, extract, files, html, image, infobox, label, lang, languages, lead, length, links, modified, pageid, parsetree, props, random, title, url, url_raw, views, watchers, what, wikibase, wikidata, wikidata_url, wikitext }

page.data['cache']

<dict> HTTP request, response, info cache

Each API query action is cached. The cache prevents duplicate API requests and captures valuable debugging information.

Example:

>>> gandhi.info('query')
{'bytes': 4988.0,
 'content': 'application/json; charset=utf-8',
 'kB/s': '15.0',
 'seconds': '0.333',
 'status': 200,
 'url': 'https://en.wikipedia.org/w/api.php?action=query&exintro&inprop=displaytitle|url&list=random&pithumbsize=240&ppprop=wikibase_item&prop=extracts|images|info|pageimages|pageprops|pageterms&redirects&rnlimit=1&rnnamespace=0&titles=Gandhi',
 'user-agent': 'wptools/0.2.3 (https://github.com/siznax/wptools) PycURL/7.43.0 libcurl/7.54.0 SecureTransport zlib/1.2.8'}

Methods: get_claims(), get_imageinfo(), get_parse(), get_query(), get_restbase(), get_wikidata()
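
The cached 'url' records exactly what was asked of the API, which helps when debugging. The following is a minimal sketch, not part of wptools, that splits the URL from the example above into its query parameters. It assumes Python 3 and uses only the standard library; the variable names are illustrative.

```python
from urllib.parse import urlsplit, parse_qs

# URL copied from the cache example above
url = ('https://en.wikipedia.org/w/api.php?action=query&exintro'
       '&inprop=displaytitle|url&list=random&pithumbsize=240'
       '&ppprop=wikibase_item'
       '&prop=extracts|images|info|pageimages|pageprops|pageterms'
       '&redirects&rnlimit=1&rnnamespace=0&titles=Gandhi')

# keep_blank_values retains flag-style parameters such as "exintro"
raw = parse_qs(urlsplit(url).query, keep_blank_values=True)

# The URL separates multiple values with "|", so split each one into a list
params = {key: values[0].split('|') for key, values in raw.items()}

for key in sorted(params):
    print(key, params[key])
```

Flag-style parameters such as "exintro" and "redirects" come back as a list holding one empty string.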

page.data['categories']

<list> all categories page belongs to

This is a list of all categories that the page belongs to, as they appear on the page. We get this from the Mediawiki API:Categories module.

Example:

gandhi.categories: <list(66)>

Methods: get_query()

page.data['claims']

<dict> Q entity claims that resolve to wikidata labels

Claims are Q entities that have labels in different languages. For example Q1 is "universe" in English. Claims are collected when fetching wikidata. They are later resolved into labels for the specified language to populate the wikidata page attribute.

Example:

>>> gandhi.claims
{u'Q129286': 'citizenship',
 u'Q5': 'instance',
 u'Q6512732': 'category',
 u'Q668': 'citizenship'}

Methods: get_claims()

page.data['contributors']

<int> total number of contributors to this page

This is the total of logged-in AND anonymous contributors from Mediawiki API:Contributors. Fascinating!

Example:

gandhi.contributors: 2118

Methods: get_query()

page.data['description']

<str> short description

This is a short description of the page from Mediawiki or Wikidata. When it is available it is often very enlightening.

Example:

gandhi.description: pre-eminent leader of Indian nationalism during British-ruled India

Methods: get_query(), get_wikidata()

page.data['endpoint']

<str> RESTBase entry point requested

This attribute contains the endpoint formed by wptools to make a request to a RESTBase /page/ entry point.

Example:

gandhi.endpoint: /page/summary/Mahatma_Gandhi

Methods: get_restbase()

page.data['exhtml']

<str> RESTBase page extract in HTML

This is the RESTBase "extract_html" (summary) in limited HTML. It does not contain interwiki links, citations or infoboxes. It is basically a truncated version of page.data['extract'].

Example:

gandhi.exhtml: <str(1036)> <p><b>Mohandas Karamchand Gandhi</b> (<span></...

Methods: get_restbase('/page/summary')

page.data['exrest']

<str> RESTBase page extract in plain text

This is the RESTBase "extract" (summary) in plain text. It is basically a truncated version of page.data['extext'].

Example:

gandhi.exrest: <str(889)> Mohandas Karamchand Gandhi (; Hindustani: [ˈmoː...

Methods: get_restbase('/page/summary')

page.data['extext']

<str> page extract in plain text

This is the lead section, or summary, of the page in plain text. It does not include infoboxes, and some data is removed by the API.

Example:

gandhi.extext: <str(2987)> **Mohandas Karamchand Gandhi** (; Hindustani: ...

Methods: get_query()

page.data['extract']

<str> page extract in limited HTML

This is the lead section, or summary, of the page in limited HTML. It is simple markup only; no wikilinks, citations, infoboxes, etc.

Example:

gandhi.extract: <str(3192)> <p><b>Mohandas Karamchand Gandhi</b> (<span><...

Methods: get_query()

page.data['files']

<list> list of files embedded in this page

This is the list of embedded (image, audio, video) files included on this page from Mediawiki API:Images. Awesome!

Example:

gandhi.files: <list(53)>
>>> [x for x in gandhi.files if not x.endswith('jpg') and not x.endswith('svg')]
[u'File:Mohandas Karamchand Gandhi pronunciation 3.oga',
 u'File:Salt March.ogg',
 u'File:Salt March.ogv',
 u'File:Socrates.png',
 u'File:Young India.png']

Methods: get_query()
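
To see what kinds of files a page embeds, the list can be tallied by file extension. This is a small sketch assuming Python 3; the list is the filtered output from the example above, and `extension` is a hypothetical helper, not a wptools function.

```python
import os
from collections import Counter

# The non-jpg, non-svg files from the example above
files = ['File:Mohandas Karamchand Gandhi pronunciation 3.oga',
         'File:Salt March.ogg',
         'File:Salt March.ogv',
         'File:Socrates.png',
         'File:Young India.png']

def extension(name):
    """Return the lowercased extension of a file title, without the dot."""
    return os.path.splitext(name)[1].lstrip('.').lower()

by_extension = Counter(extension(name) for name in files)
print(by_extension)
```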

page.data['html']

<str> page content in full HTML

This is the most performant way to get page HTML outside of running your own Mediawiki instance. It is verbatim what Mediawiki is serving for that page.

Example:

gandhi.html: <str(1394338)> <!DOCTYPE html><html prefix="dc: http://purl....

Methods: get_restbase('/page/html')

page.data['image']

<list> representative image(s) for this page

The epitome ("single most appropriate") image data contained in each API response is stored in this attribute with a kind label. They are often all the same image file. These are NOT all the images/files contained in a page (that would be page.data['files']), only the so-called PageImage that aims to be a representative image for the page. See the Images documentation for more details.

Example:

>>> gandhi.pageimage()
['query-pageimage',
 'query-thumbnail',
 'parse-image',
 'wikidata-image',
 'rest-image',
 'rest-thumb']

>>> gandhi.pageimage('thumb')
{'file': u'Portrait_Gandhi.jpg',
 u'height': 240,
 'kind': 'query-thumbnail',
 'url': u'https://upload.wikimedia.org/wikipedia/commons/thumb/d/d1/Portrait_Gandhi.jpg/160px-Portrait_Gandhi.jpg',
 u'width': 160}

Methods: get_imageinfo(), get_parse(), get_query(), get_restbase(), get_wikidata()
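
Because each entry carries a kind label, an image can be selected by matching on that label. The sketch below only illustrates the idea on plain dicts and is not the wptools implementation of pageimage(). The `pick_image` helper is hypothetical, and the second sample entry is an assumption modeled on the kinds listed above.

```python
# Sample entries: the first is from the example above, the second is assumed
images = [
    {'kind': 'query-thumbnail', 'file': 'Portrait_Gandhi.jpg',
     'width': 160, 'height': 240},
    {'kind': 'wikidata-image', 'file': 'Portrait Gandhi.jpg'},
]

def pick_image(images, token=None):
    """Return the first image whose kind contains token, or None if no match.

    With no token, the first image in the list is returned.
    """
    for image in images:
        if token is None or token in image['kind']:
            return image
    return None

thumb = pick_image(images, 'thumb')
print(thumb['kind'], thumb['file'])
```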

page.data['infobox']

<dict> parsed infobox data

This attribute contains Infobox template data extracted from a page's parsetree. Unfortunately, there is usually more data available from a page's infobox than from wikidata. See the Infoboxes documentation for details.

Example:

>>> gandhi.infobox
{'alma_mater': '[[University College London]]<ext><name>ref</name><attr/><i...
 'alt': u'The face of Gandhi in old age\u2014smiling, wearing glasses, and ...
 'birth_date': '{{Birth date|df|=|yes|1869|10|2}}',
 'birth_name': 'Mohandas Karamchand Gandhi',
 'birth_place': '[[Porbandar State]], [[Kathiawar Agency]], [[British India...
 'children': '{{hlist|[[Harilal Gandhi|Harilal]]|[[Manilal Gandhi|Manilal]]...
 'death_cause': '[[Assassination of Mahatma Gandhi|Assassination]]',
 'death_date': '{{Death date and age|df|=|yes|1948|1|30|1869|10|2}}',
 'death_place': 'New Delhi, [[Delhi]], [Dominion of India] (now India)',
 'father': '[[Karamchand Uttamchand Gandhi|Karamchand Gandhi]]',
 'honorific_prefix': '[[Mahatma]]',
 'image': 'MKGandhi.jpg',
 'known_for': '[[Indian Independence Movement]],<br>[[Peace movement]]',
 'mother': 'Putlibai Gandhi',
 'movement': '[[Indian independence movement]]',
 'name': 'Mohandas Karamchand Gandhi',
 'occupation': '{{hlist|Lawyer|Politician|Activist|Writer|Soldier}}',
 'other_names': 'Mahatma Gandhi, Bapu, Gandhiji',
 'party': '[[Indian National Congress]]',
 'resting_place': '[[Raj Ghat and associated memorials|Raj Ghat]], Delhi',
 'signature': 'Mohandas K. Gandhi signature.svg',
 'spouse': '{{marriage|[[Kasturba Gandhi]]|1883|1944|end|=|died}}'}

Methods: get_parse()
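
Infobox values are raw wikitext, so they usually need some cleanup before use. As one example, this sketch replaces wikilinks with their display text. It assumes Python 3, `strip_wikilinks` is a hypothetical helper rather than a wptools function, and it leaves templates such as {{hlist|...}} untouched.

```python
import re

# [[target|label]] becomes "label"; [[target]] becomes "target"
WIKILINK = re.compile(r'\[\[(?:[^\]|]*\|)?([^\]]*)\]\]')

def strip_wikilinks(value):
    """Replace each wikilink in an infobox value with its display text."""
    return WIKILINK.sub(r'\1', value)

# Values taken from the infobox example above
print(strip_wikilinks('[[Karamchand Uttamchand Gandhi|Karamchand Gandhi]]'))
print(strip_wikilinks('[[Indian National Congress]]'))
print(strip_wikilinks('Putlibai Gandhi'))
```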

page.data['label']

<str> Wikidata label

This is the Wikidata label (common name) in the language specified.

Example:

gandhi.label: Mahatma Gandhi

Methods: get_query(), get_wikidata()

page.data['lang']

<str> language code of this page

This is the Mediawiki language code of the page requested. The default language is "en" (English).

Example:

gandhi.lang: en

Methods: __init__

page.data['languages']

<list> languages available

This is the list of languages that this page can be found in on other Wikipedias from Mediawiki API:Langlinks. Each entry contains the language code and the name of the page rendered in that language. What a treasure!

Example:

gandhi.languages: <list(167)>
>>> gandhi.languages[5]
{u'lang': u'ar',
 u'title': u'\u0645\u0647\u0627\u062a\u0645\u0627 \u063a\u0627\u0646\u062f\u064a'}

>>> print gandhi.languages[5]['title']
مهاتما غاندي

Methods: get_query()
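
Since each entry pairs a language code with a title, a lookup by code is a natural next step. A minimal sketch, assuming Python 3 and a list shaped like the example above; `title_in` is a hypothetical helper, not part of wptools.

```python
# One entry copied from the example above
languages = [
    {'lang': 'ar',
     'title': '\u0645\u0647\u0627\u062a\u0645\u0627 \u063a\u0627\u0646\u062f\u064a'},
]

def title_in(languages, code):
    """Return the page title for a language code, or None if it is absent."""
    titles = {entry['lang']: entry['title'] for entry in languages}
    return titles.get(code)

print(title_in(languages, 'ar'))
```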

page.data['lead']

<str> lead section full HTML

This is the page's lead section, or summary, in full HTML including references, citations, and infoboxes.

Example:

gandhi.lead: <str(15770)> <span><p><b>Mohandas Karamchand Gandhi</b> (<sp...

Methods: get_restbase('/page/mobile-sections-lead')

page.data['length']

<int> page length in bytes

This is the size of the page in bytes from Mediawiki API:Info.

Example:

gandhi.length: 264260

Methods: get_query()

page.data['links']

<list> list of interwiki links

This is the list of interwiki links from the page's parsetree.

Example:

>>> pprint(gandhi.links)
[u'https://biblio.wiki/wiki/Mohandas_K._Gandhi',
 u'https://commons.wikimedia.org/wiki/Special:Search/Mohandas_K._Gandhi',
 u'https://en.wikiquote.org/wiki/Special:Search/Mohandas_K._Gandhi',
 u'https://en.wikisource.org/wiki/An_Autobiography_or_The_Story_of_my_Experiments_with_Truth',
 u'https://en.wikisource.org/wiki/Chronology_of_Mahatma_Gandhi%27s_life/India_1918',
 u'https://en.wikisource.org/wiki/Special:Search/Author:Mohandas_K._Gandhi',
 u'https://en.wikisource.org/wiki/The_Collected_Works_of_Mahatma_Gandhi',
 u'https://en.wikisource.org/wiki/The_Collected_Works_of_Mahatma_Gandhi/Volume_II',
 u'https://en.wikisource.org/wiki/The_Collected_Works_of_Mahatma_Gandhi/Volume_II/March_1897_Memorial',
 u'https://www.wikidata.org/wiki/Q1001']

Methods: get_parse()

page.data['modified']

<dict> last modified dates

This attribute contains the last modified dates of the page and its associated wikidata.

Example:

>>> gandhi.modified
{'page': u'2017-08-13T10:14:17Z', 'wikidata': u'2017-08-13T04:30:27Z'}

Methods: get_query(), get_restbase(), get_wikidata()
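
The timestamps are ISO 8601 strings in UTC, so they can be parsed and compared directly. This sketch, assuming Python 3 and the values from the example above, computes how long after the wikidata edit the page was last modified. `parse_timestamp` is an illustrative helper.

```python
from datetime import datetime

# Values copied from the example above
modified = {'page': '2017-08-13T10:14:17Z',
            'wikidata': '2017-08-13T04:30:27Z'}

def parse_timestamp(value):
    """Parse a 'YYYY-MM-DDTHH:MM:SSZ' timestamp into a naive UTC datetime."""
    return datetime.strptime(value, '%Y-%m-%dT%H:%M:%SZ')

lag = parse_timestamp(modified['page']) - parse_timestamp(modified['wikidata'])
print(lag)
```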

page.data['pageid']

<int> Wikipedia database ID

This is the numeric identifier of the page in the Mediawiki database. It is useful as a pivot point for wptools to gather information across APIs.

Example:

gandhi.pageid: 19379

Methods: get_query(), get_parse()

page.data['parsetree']

<str> page parsetree XML

This is the full parsetree XML for the page which is used by wptools to parse infoboxes. It is certainly useful for a great many other things too.

Example:

gandhi.parsetree: <str(333213)> <root><template><title>Redirect</title><p...

Methods: get_parse()
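
To give a feel for how template data can be pulled out of a parsetree, here is a rough sketch using the standard library XML parser. It is not the wptools parser. The XML fragment is a hypothetical, heavily shortened stand-in for a real parsetree, and `template_params` is an illustrative name.

```python
import xml.etree.ElementTree as ET

# Hypothetical, shortened parsetree; a real one is far larger
parsetree = (
    '<root>'
    '<template><title>Redirect</title>'
    '<part><name index="1"/><value>Gandhi</value></part></template>'
    '<template><title>Infobox person\n</title>'
    '<part><name> birth_name </name><equals>=</equals>'
    '<value> Mohandas Karamchand Gandhi\n</value></part>'
    '<part><name> mother </name><equals>=</equals>'
    '<value> Putlibai Gandhi\n</value></part>'
    '</template>'
    '</root>'
)

def template_params(parsetree, prefix):
    """Return named parameters of the first template whose title starts with prefix."""
    root = ET.fromstring(parsetree)
    for template in root.iter('template'):
        title = (template.findtext('title') or '').strip()
        if not title.startswith(prefix):
            continue
        params = {}
        for part in template.findall('part'):
            name = (part.findtext('name') or '').strip()
            value = part.find('value')
            # Positional parameters have an empty <name index="N"/>; skip them
            if name and value is not None:
                # itertext() flattens nested elements, so nested markup is lost
                params[name] = ''.join(value.itertext()).strip()
        return params
    return {}

print(template_params(parsetree, 'Infobox'))
```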

page.data['props']

<dict> page wikidata properties

This attribute contains any properties found in the page's wikidata. These properties are basically wikidata values. In wikibase, entities have claims (labels) and properties (values). Properties can have claims as values.

Example:

>>> gandhi.props
{u'P18': [u'Portrait Gandhi.jpg'],
 u'P27': [u'Q129286', u'Q668'],
 u'P31': [u'Q5'],
 u'P345': [u'nm0003987'],
 u'P569': [u'+1869-10-02T00:00:00Z'],
 u'P570': [u'+1948-01-30T00:00:00Z'],
 u'P910': [u'Q6512732']}

Methods: get_wikidata()
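
Time values such as P569 (date of birth) and P570 (date of death) arrive as strings with a leading sign. The sketch below, assuming Python 3, turns them into dates and derives an age. `parse_wikidata_date` is a hypothetical helper that only handles positive years with a valid month and day; anything else would need extra handling.

```python
from datetime import date

# Values copied from the example above
props = {'P569': ['+1869-10-02T00:00:00Z'],
         'P570': ['+1948-01-30T00:00:00Z']}

def parse_wikidata_date(value):
    """Parse '+YYYY-MM-DDT00:00:00Z' into a date (positive years only)."""
    year, month, day = value.lstrip('+').split('T')[0].split('-')
    return date(int(year), int(month), int(day))

born = parse_wikidata_date(props['P569'][0])
died = parse_wikidata_date(props['P570'][0])

# Subtract one year if the final birthday had not yet been reached
age = died.year - born.year - ((died.month, died.day) < (born.month, born.day))
print(born, died, age)
```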

page.data['random']

<str> a random Mediawiki title

This attribute contains a random title that we get for free with some requests.

Example:

gandhi.random: Elfcon

Methods: get_query()

page.data['title']

<str> the page's normalized title

This is the normalized title of the page from the Mediawiki database, with proper case and underscores inserted.

Example:

gandhi.title: Mahatma_Gandhi

Methods: get_parse(), get_query(), get_random(), get_restbase(), get_wikidata()

page.data['url']

<str> canonical URL

This is the canonical URL formed from Mediawiki convention.

Example:

gandhi.url: https://en.wikipedia.org/wiki/Mahatma_Gandhi

Methods: get_query(), get_restbase()

page.data['url_raw']

<str> raw wikitext URL

This is the ostensible direct link to a page's wikitext. However, this link does not resolve correctly in many cases.

Example:

gandhi.url_raw: https://en.wikipedia.org/wiki/Mahatma_Gandhi?action=raw

Methods: get_query(), get_restbase()
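
Both URLs follow the same convention seen in the examples: a language subdomain, the /wiki/ path, and the normalized title. A minimal sketch of that convention, assuming Python 3 and a Wikipedia host; `page_urls` is a hypothetical helper and does not cover other wikis.

```python
from urllib.parse import quote

def page_urls(title, lang='en'):
    """Return (url, url_raw) for a title, assuming a Wikipedia host."""
    # Spaces become underscores; other unsafe characters are percent-encoded
    path = quote(title.replace(' ', '_'))
    url = 'https://{}.wikipedia.org/wiki/{}'.format(lang, path)
    return url, url + '?action=raw'

url, url_raw = page_urls('Mahatma Gandhi')
print(url)
print(url_raw)
```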

page.data['views']

<int> average daily page views for last 60 days

We average the daily page views from the last 60 days from API:Query prop=pageviews. No way!

Example:

gandhi.views: 21479

Methods: get_query()
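
The averaging itself is simple. The sketch below assumes Python 3 and a mapping of dates to daily counts in which a day without data is None. The numbers are made up for illustration, and `average_views` is a hypothetical helper, so the rounding may differ from what wptools does.

```python
# Hypothetical daily counts keyed by date; None marks a day with no data
pageviews = {'2017-08-10': 21000,
             '2017-08-11': None,
             '2017-08-12': 22000,
             '2017-08-13': 21500}

def average_views(pageviews):
    """Return the integer mean of the daily counts, ignoring missing days."""
    counts = [count for count in pageviews.values() if count is not None]
    if not counts:
        return 0
    return sum(counts) // len(counts)

print(average_views(pageviews))
```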

page.data['watchers']

<int> number of page watchers

This is simply the number of people watching the page from Mediawiki API:Info. Intriguing!

Example:

gandhi.watchers: 1724

Methods: get_query()

page.data['what']

<str> wikidata classification

This is Wikidata Property:P31 "instance of", which basically tells us something about what this page is. Incredibly useful if you're not familiar with the title and want to know what kind of data you are looking at.

Example:

gandhi.what: human

Methods: get_wikidata()

page.data['wikibase']

<str> wikibase item ID

This is the wikibase item identifier that represents an object, concept, or event in Wikidata.

Example:

gandhi.wikibase: Q1001

Methods: get_wikidata()

page.data['wikidata']

<dict> the actual wikidata for a page

This is the collection of wikidata that wptools has managed to gather for a page. Claim labels and properties that have claim values have been resolved into this attribute. As the Wikidata project matures, it will come closer to what we get with page.data['infobox'], but much better because it will have a standardized structure!

Example:

>>> gandhi.wikidata
{'IMDB': u'nm0003987',
 'birth': u'+1869-10-02T00:00:00Z',
 'category': u'Category:Mahatma Gandhi',
 'citizenship': [u'British Raj', u'India'],
 'death': u'+1948-01-30T00:00:00Z',
 'image': u'Portrait Gandhi.jpg',
 'instance': u'human'}

Methods: get_claims(), get_wikidata()
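
The relationship between props, claims, and this attribute can be sketched as a label lookup. The code below is an illustration only, assuming Python 3. The two lookup tables are hypothetical and were inferred from the examples on this page; wptools obtains labels from the API rather than from hard-coded tables. `resolve` is an illustrative name.

```python
# Values copied from the page.data['props'] example
props = {'P18': ['Portrait Gandhi.jpg'],
         'P27': ['Q129286', 'Q668'],
         'P31': ['Q5'],
         'P345': ['nm0003987'],
         'P569': ['+1869-10-02T00:00:00Z'],
         'P570': ['+1948-01-30T00:00:00Z'],
         'P910': ['Q6512732']}

# Hypothetical lookup tables inferred from the examples on this page
property_labels = {'P18': 'image', 'P27': 'citizenship', 'P31': 'instance',
                   'P345': 'IMDB', 'P569': 'birth', 'P570': 'death',
                   'P910': 'category'}
entity_labels = {'Q129286': 'British Raj', 'Q668': 'India', 'Q5': 'human',
                 'Q6512732': 'Category:Mahatma Gandhi'}

def resolve(props, property_labels, entity_labels):
    """Map property IDs to labels and replace Q-entity values with their labels."""
    wikidata = {}
    for pid, values in props.items():
        label = property_labels.get(pid)
        if label is None:
            continue  # skip properties we have no label for
        resolved = [entity_labels.get(value, value) for value in values]
        # Single values are unwrapped; multiple values stay a list
        wikidata[label] = resolved[0] if len(resolved) == 1 else resolved
    return wikidata

print(resolve(props, property_labels, entity_labels))
```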

page.data['wikidata_url']

<str> wikidata URL

This is simply the URL to a page's Wikidata page.

Example:

gandhi.wikidata_url: https://www.wikidata.org/wiki/Q1001

Methods: get_parse(), get_query(), get_restbase(), get_wikidata()

page.data['wikitext']

<str> page wikitext

This is the raw wikitext used to render Mediawiki pages. It took me a while to figure out that there is absolutely no hope of reproducing the HTML that results from Mediawiki and its vast ecosystem of templates and add-ons from the raw wikitext. Phenomenal!

Example:

gandhi.wikitext: <str(262702)> {{Redirect|Gandhi}}{{pp-move-indef}}{{pp-s...

Methods: get_parse()
