diff --git a/analyses/2019_04_KingCyrus_#35_uniqueIDs_in_dataset.ipynb b/analyses/2019_04_KingCyrus_#35_uniqueIDs_in_dataset.ipynb
new file mode 100644
index 0000000..00bb1bb
--- /dev/null
+++ b/analyses/2019_04_KingCyrus_#35_uniqueIDs_in_dataset.ipynb
@@ -0,0 +1,949 @@
+{
+ "cells": [
+ {
+ "cell_type": "code",
+ "execution_count": 1,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "import dask.dataframe as dd"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": []
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 2,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "data_set = './sample_10percent_value_1000_only.parquet' # create a variable for the dataset "
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": []
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "TASK: We want to identify scripts that have stored or created unique ids. We also want to identify scripts that have not been storing or creating unique ids."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "ANALYSIS: When a site is visited for the first time, cookies are installed for subsequent visits. A Web site might generate a unique ID number for each visitor and store the number on the machine of each user using a cookie file. Since these files are stored in the local storage of the machine, we want to look at calls, probably to storage APIs, to discover scripts that have been setting unique ids.\n",
+ "\n",
+ "Creating unique IDs is a pointer to fingerprinting. However, it is not an axiom that the storage of unique IDs mean that fingerprinting truly occurred. If we also fail to detect unique IDs in storage, we can also not conclude fingerprinting did not happen. This task, in essence, is foundational in helping us understand what a script is possibly doing. Our local storage APIs are: window.localStorage, window.document.cookie, and window.sessionStorage."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": []
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "SECTION 1: In this section, we will do the following:\n",
+ "\n",
+ "a. Take a look at the data \n",
+ "\n",
+ "b. Check number of calls to local Storage\n",
+ "\n",
+ "c. Show scripts that made calls to local Storage\n",
+ "\n",
+ "d. Show scripts that did not make calls to local Storage."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": []
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "SECTION 1A: We take a look at the data"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": []
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 3,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/html": [
+ "
\n",
+ "\n",
+ "
\n",
+ " \n",
+ " \n",
+ " | \n",
+ " argument_0 | \n",
+ " argument_1 | \n",
+ " argument_2 | \n",
+ " argument_3 | \n",
+ " argument_4 | \n",
+ " argument_5 | \n",
+ " argument_6 | \n",
+ " argument_7 | \n",
+ " argument_8 | \n",
+ " arguments | \n",
+ " ... | \n",
+ " location | \n",
+ " operation | \n",
+ " script_col | \n",
+ " script_line | \n",
+ " script_loc_eval | \n",
+ " script_url | \n",
+ " symbol | \n",
+ " time_stamp | \n",
+ " value_1000 | \n",
+ " value_len | \n",
+ "
\n",
+ " \n",
+ " \n",
+ " \n",
+ " | 0 | \n",
+ " None | \n",
+ " None | \n",
+ " None | \n",
+ " None | \n",
+ " None | \n",
+ " None | \n",
+ " None | \n",
+ " None | \n",
+ " None | \n",
+ " {} | \n",
+ " ... | \n",
+ " https://vk.com/widget_comments.php?app=2297596... | \n",
+ " get | \n",
+ " 1 | \n",
+ " 163 | \n",
+ " | \n",
+ " https://vk.com/js/api/xdm.js?1449919642 | \n",
+ " window.name | \n",
+ " 2017-12-16 19:02:31.406 | \n",
+ " fXDcab74 | \n",
+ " 8 | \n",
+ "
\n",
+ " \n",
+ " | 1 | \n",
+ " None | \n",
+ " None | \n",
+ " None | \n",
+ " None | \n",
+ " None | \n",
+ " None | \n",
+ " None | \n",
+ " None | \n",
+ " None | \n",
+ " {} | \n",
+ " ... | \n",
+ " https://vk.com/widget_comments.php?app=2297596... | \n",
+ " get | \n",
+ " 9 | \n",
+ " 164 | \n",
+ " | \n",
+ " https://vk.com/js/api/xdm.js?1449919642 | \n",
+ " window.name | \n",
+ " 2017-12-16 19:02:31.407 | \n",
+ " fXDcab74 | \n",
+ " 8 | \n",
+ "
\n",
+ " \n",
+ " | 2 | \n",
+ " None | \n",
+ " None | \n",
+ " None | \n",
+ " None | \n",
+ " None | \n",
+ " None | \n",
+ " None | \n",
+ " None | \n",
+ " None | \n",
+ " {} | \n",
+ " ... | \n",
+ " https://vk.com/widget_comments.php?app=2297596... | \n",
+ " get | \n",
+ " 67 | \n",
+ " 1 | \n",
+ " | \n",
+ " https://vk.com/js/al/aes_light.js?592436914 | \n",
+ " window.navigator.userAgent | \n",
+ " 2017-12-16 19:02:31.659 | \n",
+ " Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko... | \n",
+ " 68 | \n",
+ "
\n",
+ " \n",
+ " | 3 | \n",
+ " None | \n",
+ " None | \n",
+ " None | \n",
+ " None | \n",
+ " None | \n",
+ " None | \n",
+ " None | \n",
+ " None | \n",
+ " None | \n",
+ " {} | \n",
+ " ... | \n",
+ " https://pos.baidu.com/s?hei=70&wid=670&di=u313... | \n",
+ " get | \n",
+ " 1 | \n",
+ " 1 | \n",
+ " | \n",
+ " https://cpro.baidustatic.com/cpro/ui/noexpire/... | \n",
+ " window.navigator.userAgent | \n",
+ " 2017-12-16 00:24:09.355 | \n",
+ " Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko... | \n",
+ " 68 | \n",
+ "
\n",
+ " \n",
+ " | 4 | \n",
+ " None | \n",
+ " None | \n",
+ " None | \n",
+ " None | \n",
+ " None | \n",
+ " None | \n",
+ " None | \n",
+ " None | \n",
+ " None | \n",
+ " {} | \n",
+ " ... | \n",
+ " http://serienjunkies.org/smilf/smilf-season-1-... | \n",
+ " get | \n",
+ " 32 | \n",
+ " 59 | \n",
+ " | \n",
+ " https://apis.google.com/js/plusone.js?_=151338... | \n",
+ " window.document.cookie | \n",
+ " 2017-12-16 01:24:30.372 | \n",
+ " _ga=GA1.2.1529583939.1513387469; _gid=GA1.2.17... | \n",
+ " 288 | \n",
+ "
\n",
+ " \n",
+ "
\n",
+ "
5 rows × 26 columns
\n",
+ "
"
+ ],
+ "text/plain": [
+ " argument_0 argument_1 argument_2 argument_3 argument_4 argument_5 \\\n",
+ "0 None None None None None None \n",
+ "1 None None None None None None \n",
+ "2 None None None None None None \n",
+ "3 None None None None None None \n",
+ "4 None None None None None None \n",
+ "\n",
+ " argument_6 argument_7 argument_8 arguments ... \\\n",
+ "0 None None None {} ... \n",
+ "1 None None None {} ... \n",
+ "2 None None None {} ... \n",
+ "3 None None None {} ... \n",
+ "4 None None None {} ... \n",
+ "\n",
+ " location operation script_col \\\n",
+ "0 https://vk.com/widget_comments.php?app=2297596... get 1 \n",
+ "1 https://vk.com/widget_comments.php?app=2297596... get 9 \n",
+ "2 https://vk.com/widget_comments.php?app=2297596... get 67 \n",
+ "3 https://pos.baidu.com/s?hei=70&wid=670&di=u313... get 1 \n",
+ "4 http://serienjunkies.org/smilf/smilf-season-1-... get 32 \n",
+ "\n",
+ " script_line script_loc_eval \\\n",
+ "0 163 \n",
+ "1 164 \n",
+ "2 1 \n",
+ "3 1 \n",
+ "4 59 \n",
+ "\n",
+ " script_url \\\n",
+ "0 https://vk.com/js/api/xdm.js?1449919642 \n",
+ "1 https://vk.com/js/api/xdm.js?1449919642 \n",
+ "2 https://vk.com/js/al/aes_light.js?592436914 \n",
+ "3 https://cpro.baidustatic.com/cpro/ui/noexpire/... \n",
+ "4 https://apis.google.com/js/plusone.js?_=151338... \n",
+ "\n",
+ " symbol time_stamp \\\n",
+ "0 window.name 2017-12-16 19:02:31.406 \n",
+ "1 window.name 2017-12-16 19:02:31.407 \n",
+ "2 window.navigator.userAgent 2017-12-16 19:02:31.659 \n",
+ "3 window.navigator.userAgent 2017-12-16 00:24:09.355 \n",
+ "4 window.document.cookie 2017-12-16 01:24:30.372 \n",
+ "\n",
+ " value_1000 value_len \n",
+ "0 fXDcab74 8 \n",
+ "1 fXDcab74 8 \n",
+ "2 Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko... 68 \n",
+ "3 Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko... 68 \n",
+ "4 _ga=GA1.2.1529583939.1513387469; _gid=GA1.2.17... 288 \n",
+ "\n",
+ "[5 rows x 26 columns]"
+ ]
+ },
+ "execution_count": 3,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "ddf = dd.read_parquet(data_set, engine='pyarrow')\n",
+ "ddf.head()"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "Fig. 1"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "From the table Fig. 1, the \"script_url\" column tells us the scripts making the call"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "To the task of the day. We want to check the number of calls to local storage made by scripts. "
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": []
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 4,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# A list of localstorage spaces\n",
+ "stores = ['window.localStorage','window.document.cookie', 'window.sessionStorage']"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": []
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 21,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# We start by defining a function'findID'. Then we read the parquet file into a variable\n",
+ "#Second line involves filtering the calls made to ensure we extract only those that made calls to local Storage\n",
+ "# We then return the result for further analysis toward the task\n",
+ "\n",
+ "def findID(data):\n",
+ " ddf = dd.read_parquet(data, engine='pyarrow') \n",
+ " unique_symbl = ddf.symbol.isin(stores) \n",
+ " new_table = ddf[unique_symbl]\n",
+ " return new_table"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": []
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "SECTION 1B: Number of calls to each local Storage "
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": []
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 6,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "window.document.cookie 3521632\n",
+ "window.localStorage 866712\n",
+ "window.sessionStorage 408549\n",
+ "Name: symbol, dtype: int64"
+ ]
+ },
+ "execution_count": 6,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "findID(data_set).symbol.value_counts().compute() # number of times calls were made to local Storage"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": []
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "The code below corresponds with what the function gave us above. We had to use head(10) to keep things compact. \n",
+ "However, all the local Storage APIs were captured. "
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 7,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "window.document.cookie 3521632\n",
+ "window.navigator.userAgent 1542070\n",
+ "window.Storage.getItem 1046105\n",
+ "window.localStorage 866712\n",
+ "window.Storage.setItem 408566\n",
+ "window.sessionStorage 408549\n",
+ "window.Storage.removeItem 284856\n",
+ "window.name 248318\n",
+ "CanvasRenderingContext2D.fillStyle 196829\n",
+ "window.navigator.plugins[Shockwave Flash].description 184937\n",
+ "Name: symbol, dtype: int64"
+ ]
+ },
+ "execution_count": 7,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "# We want to get the exact number of times calls were made to local storage. This will serve as a test for any code \n",
+ "# we write to \n",
+ "\n",
+ "ddf = dd.read_parquet(data_set, engine='pyarrow')\n",
+ "ddf.symbol.value_counts().compute().head(10)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": []
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": []
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "SECTION 1C: Below we check for scripts that made calls to local Storage using the funtion in Section 1B. "
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": []
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 32,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "0 https://apis.google.com/js/plusone.js?_=151338...\n",
+ "1 https://assets.adobedtm.com/caacec67651710193d...\n",
+ "2 https://assets.adobedtm.com/caacec67651710193d...\n",
+ "3 https://www.google-analytics.com/analytics.js\n",
+ "4 https://www.google-analytics.com/plugins/ua/li...\n",
+ "5 https://www.canada.ca/etc/designs/canada/wet-b...\n",
+ "6 https://g.alicdn.com/sea/sitenav-global/0.8.0/...\n",
+ "7 https://g.alicdn.com/tb/tracker/3.0.7/index.js\n",
+ "8 https://maniform.world.tmall.com/category-1282...\n",
+ "9 https://g.alicdn.com/sanwant/shop-render/0.0.9...\n",
+ "10 https://g.alicdn.com/alilog/mlog/aplus_v2.js\n",
+ "11 https://g.alicdn.com/tb/tracker/4.0.1/p/index/...\n",
+ "12 https://g.alicdn.com/aliww/web.ww/scripts/webw...\n",
+ "13 https://g.alicdn.com/kissy/k/1.4.2/??io-min.js...\n",
+ "14 https://g.alicdn.com/tbc/??search-suggest/1.3....\n",
+ "15 https://g.alicdn.com/shop/wangpu/1.7.5/??decor...\n",
+ "16 https://g.alicdn.com/secdev/sufei_data/3.2.2/i...\n",
+ "17 https://platform.twitter.com/widgets/follow_bu...\n",
+ "18 https://www.coches.net/scripts/common.min.js?2...\n",
+ "19 https://c.ccdn.es/JSHandler.ashx?v=20171205&fi...\n",
+ "20 https://tags.tiqcdn.com/utag/schibsted/coches....\n",
+ "21 https://cdn.optimizely.com/js/8475962892.js\n",
+ "22 https://static.criteo.net/js/ld/ld.js\n",
+ "23 https://tags.tiqcdn.com/utag/schibsted/coches....\n",
+ "24 https://jssdk.pulse.schibsted.com/autoTrackerC...\n",
+ "25 https://sb.scorecardresearch.com/beacon.js\n",
+ "26 https://script.hotjar.com/modules-526d80f8c014...\n",
+ "27 https://acdn.adnxs.com/ast/ast.js\n",
+ "28 https://tag.aticdn.net/561882/smarttag.js\n",
+ "29 https://pos.baidu.com/lcdm?rdid=3124385&dc=3&d...\n",
+ " ... \n",
+ "95771 https://connect.facebook.net/en_US/sdk.js?_=15...\n",
+ "95772 http://s7.addthis.com/static/sh.7a295a410262af...\n",
+ "95773 https://cdnjs.cloudflare.com/ajax/libs/fingerp...\n",
+ "95774 https://staticxx.facebook.com/connect/xd_arbit...\n",
+ "95775 https://staticxx.facebook.com/connect/xd_arbit...\n",
+ "95776 http://pos.baidu.com/s?hei=250&wid=300&di=u163...\n",
+ "95777 http://static4style.qiumibao.com/libs/angular....\n",
+ "95778 https://hm.baidu.com/hm.js?3212511d67978fc36e9...\n",
+ "95779 http://s7.addthis.com/static/sh.7a295a410262af...\n",
+ "95780 https://www.shop.com.mm/coffee/\n",
+ "95781 https://staticxx.facebook.com/connect/xd_arbit...\n",
+ "95782 https://www.notebooksbilliger.de/changhong+led...\n",
+ "95783 https://cloud.tencent.com/product/Tcaplus?lang=en\n",
+ "95784 http://sync.teads.tv/iframe?pid=68631&userId=c...\n",
+ "95785 https://static.foodity.com/shopping-tools/jami...\n",
+ "95786 https://staticxx.facebook.com/connect/xd_arbit...\n",
+ "95787 https://www.jcrew.com/checkout2/shoppingbag.js...\n",
+ "95788 https://cdn.dynamicyield.com/api/8767798/api_s...\n",
+ "95789 https://us-sonar.sociomantic.com/js/2010-07-01...\n",
+ "95790 http://g.alicdn.com/sd/ncpc/nc.js?t=1513384598\n",
+ "95791 http://www.qichacha.com/g_HEN\n",
+ "95792 https://platform.linkedin.com/js/xdrpc.html?v=...\n",
+ "95793 https://platform.twitter.com/widgets/follow_bu...\n",
+ "95794 http://www.qvc.com/akam/10/5de35063\n",
+ "95795 http://www.qvc.com/content/electronics/portabl...\n",
+ "95796 http://hz.cityhouse.cn/district/YH/\n",
+ "95797 http://fast.aufeminin.demdex.net/dest5.html?d_...\n",
+ "95798 https://staticxx.facebook.com/connect/xd_arbit...\n",
+ "95799 http://am15.net/x/uid.php?rand=1231999510&uid=...\n",
+ "95800 http://s7.addthis.com/static/sh.7a295a410262af...\n",
+ "Name: script_url, Length: 95801, dtype: object"
+ ]
+ },
+ "execution_count": 32,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "# Scripts that made calls to local storage APIs. \n",
+ "\n",
+ "findID(data_set).script_url.unique().compute() # we used 'unique()' to eliminate duplicates. "
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": []
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "From the result above, we deduce that 95, 801 unique scripts made calls to local Storage."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": []
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "SECTION 1D: We now ascertain scripts that did not make calls to local Storage"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": []
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 29,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "def other_script(data):\n",
+ " ddf = dd.read_parquet(data, engine='pyarrow') \n",
+ " no_calls = ddf[ddf.symbol.isin(stores) == False]\n",
+ " return no_calls"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": []
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 33,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "0 https://vk.com/js/api/xdm.js?1449919642\n",
+ "1 https://vk.com/js/al/aes_light.js?592436914\n",
+ "2 https://cpro.baidustatic.com/cpro/ui/noexpire/...\n",
+ "3 https://apis.google.com/_/scs/apps-static/_/js...\n",
+ "4 https://assets.adobedtm.com/caacec67651710193d...\n",
+ "5 https://www.google-analytics.com/analytics.js\n",
+ "6 https://assets.adobedtm.com/caacec67651710193d...\n",
+ "7 https://www.canada.ca/etc/designs/canada/wet-b...\n",
+ "8 https://www.canada.ca/etc/designs/canada/wet-b...\n",
+ "9 https://s0.2mdn.net/879366/Enabler_01_197.js\n",
+ "10 https://g.alicdn.com/kissy/k/1.4.2/seed-min.js\n",
+ "11 https://g.alicdn.com/sea/sitenav-global/0.8.0/...\n",
+ "12 https://g.alicdn.com/tb/tracker/3.0.7/index.js\n",
+ "13 https://g.alicdn.com/sanwant/shop-render/0.0.9...\n",
+ "14 https://g.alicdn.com/mtb/videox/0.1.33/videox-...\n",
+ "15 https://uaction.alicdn.com/js/ua.js\n",
+ "16 https://g.alicdn.com/alilog/mlog/aplus_v2.js\n",
+ "17 https://g.alicdn.com/tb/tracker/4.0.1/p/index/...\n",
+ "18 https://g.alicdn.com/aliww/web.ww/scripts/webw...\n",
+ "19 https://maniform.world.tmall.com/category-1282...\n",
+ "20 https://g.alicdn.com/secdev/sufei_data/3.2.2/i...\n",
+ "21 https://z.moatads.com/thetradedeskv27587456874...\n",
+ "22 https://www.coches.net/scripts/common.min.js?2...\n",
+ "23 https://c.ccdn.es/JSHandler.ashx?v=20171205&fi...\n",
+ "24 https://tags.tiqcdn.com/utag/schibsted/coches....\n",
+ "25 https://static.criteo.net/js/ld/ld.js\n",
+ "26 https://c.ccdn.es/nochange/scripts/pixels.min....\n",
+ "27 https://tags.tiqcdn.com/utag/schibsted/coches....\n",
+ "28 https://jssdk.pulse.schibsted.com/autoTrackerC...\n",
+ "29 https://script.hotjar.com/modules-526d80f8c014...\n",
+ " ... \n",
+ "138784 http://s7.addthis.com/static/sh.7a295a410262af...\n",
+ "138785 https://www.shop.com.mm/coffee/\n",
+ "138786 https://x.cnt.my/async/track/?r=0.290653635635...\n",
+ "138787 https://x.cnt.my/async/track/?r=0.226036316397...\n",
+ "138788 https://staticxx.facebook.com/connect/xd_arbit...\n",
+ "138789 http://seasonvar.ru/serial-9836-Hrupkost_-2-se...\n",
+ "138790 https://www.notebooksbilliger.de/nbbvewsyyf.js\n",
+ "138791 https://sslwidget.criteo.com/event?a=1521&v=4....\n",
+ "138792 https://ad.doubleclick.net/ddm/adi/N711134.151...\n",
+ "138793 https://cloud.tencent.com/product/Tcaplus?lang=en\n",
+ "138794 https://static.foodity.com/shopping-tools/jami...\n",
+ "138795 https://staticxx.facebook.com/connect/xd_arbit...\n",
+ "138796 http://d2auhg88b162pf.cloudfront.net/assets/ap...\n",
+ "138797 https://www.jcrew.com/checkout2/shoppingbag.js...\n",
+ "138798 https://www.jcrew.com/media/wro/js/checkout-co...\n",
+ "138799 https://cdn.dynamicyield.com/api/8767798/api_s...\n",
+ "138800 http://g.alicdn.com/sd/ncpc/nc.js?t=1513384598\n",
+ "138801 http://www.qichacha.com/g_HEN\n",
+ "138802 https://platform.linkedin.com/js/xdrpc.html?v=...\n",
+ "138803 http://company.hani.co.kr/css/script.js\n",
+ "138804 https://platform.twitter.com/widgets/follow_bu...\n",
+ "138805 https://platform.twitter.com/widgets/tweet_but...\n",
+ "138806 https://googleads.g.doubleclick.net/dbm/ad?dbm...\n",
+ "138807 http://www.qvc.com/akam/10/5de35063\n",
+ "138808 http://hz.cityhouse.cn/js/jquery.js\n",
+ "138809 http://hz.cityhouse.cn/js/highcharts/highchart...\n",
+ "138810 http://hz.cityhouse.cn/district/YH/\n",
+ "138811 https://staticxx.facebook.com/connect/xd_arbit...\n",
+ "138812 http://am15.net/x/uid.php?rand=1231999510&uid=...\n",
+ "138813 https://ulogin.ru/version/2.0/html/drop.html?i...\n",
+ "Name: script_url, Length: 138814, dtype: object"
+ ]
+ },
+ "execution_count": 33,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "other_script(data_set).script_url.unique().compute()"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": []
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "We can see that a total of 138,814 scripts did not call local Storage"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": []
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "SECTION 2: CONCLUSION"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": []
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "1. If you add the total number of calls made to each local Storage, you will get 4, 796, 893. Window.cookie has the\n",
+ "highest number of calls, which is 3,521,632. \n",
+ "\n",
+ "2. We decided to use 'unique()' in a bid to get only the unique scripts and keep things simple. Using unique(), the \n",
+ "total number of scripts that made local storage calls stood at 95, 801(SECTION 1C). Without 'unique()', the number of \n",
+ "scripts would tally with the total number of times the local storage APIs were called. To verify that claim, we do \n",
+ "the following:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 36,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "4796893"
+ ]
+ },
+ "execution_count": 36,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "len(findID(data_set))"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "As you can see, the total number of scripts tally with the number of times local Storage APIs were called. "
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": []
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "3. The number of unique scripts that did not make calls to local storage APIs stood at 138, 813. If we get rid of \n",
+ "'unique()', we get:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 37,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "6495974"
+ ]
+ },
+ "execution_count": 37,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "len(other_script(data_set))"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "4. If we add the number of scripts that made calls to local APIs and the ones that didn't, we will get the actual\n",
+ "number of rows in the data set. "
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 38,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "11292867"
+ ]
+ },
+ "execution_count": 38,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "total_scripts = 4796893 + 6495974\n",
+ "total_scripts"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "So the total number of scripts is 11, 292, 867. The number of rows in the data set is:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 39,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "11292867"
+ ]
+ },
+ "execution_count": 39,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "len(ddf)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": []
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": []
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": []
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": []
+ }
+ ],
+ "metadata": {
+ "kernelspec": {
+ "display_name": "Python 3",
+ "language": "python",
+ "name": "python3"
+ },
+ "language_info": {
+ "codemirror_mode": {
+ "name": "ipython",
+ "version": 3
+ },
+ "file_extension": ".py",
+ "mimetype": "text/x-python",
+ "name": "python",
+ "nbconvert_exporter": "python",
+ "pygments_lexer": "ipython3",
+ "version": "3.6.8"
+ }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 2
+}