Pierre de Malliard edited this page Nov 16, 2016 · 2 revisions

Facebook Project

A few weeks ago, I was talking with my roommates about the different ways people use Facebook. In particular, we asked ourselves whether male users tend to have more female friends or more male friends, and vice versa. I decided to check this for my own account, but since I have more than 500 friends, I wrote this small program rather than count by hand. I am sharing it on my GitHub in case someone wants to run the same check on their own account. Have fun!

Access to Facebook

Required Packages:

from selenium import webdriver
import time
import os
from selenium.webdriver.common.keys import Keys

Access to Facebook through Selenium

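# Placeholder path: point this at the ChromeDriver executable on your machine
chromedriver = "/path/to/chromedriver"
os.environ["webdriver.chrome.driver"] = chromedriver
driver = webdriver.Chrome(chromedriver)
driver.get("https://facebook.com")
time.sleep(4)  # wait for the login page to load
# find_element_by_id is the Selenium 3 API; Selenium 4 uses find_element(By.ID, ...)
elem = driver.find_element_by_id("email")
elem.send_keys("XXXX@facebook.com")
elem = driver.find_element_by_id("pass")
elem.send_keys("XXXXX")
elem.send_keys(Keys.RETURN)
time.sleep(5)  # wait for the login to complete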

driver.get("https://www.facebook.com/firstname.lastname/friends?source_ref=pb_friends_tl")

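for i in range(10):
    # scroll to the current bottom of the page so the next batch of friends is requested
    driver.execute_script("window.scrollTo(0, document.body.scrollHeight)")
    time.sleep(1)  # give the page time to load before scrolling again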

At this point, the browser has opened the user's profile with the overview of all friends. By default, only the first 120 friends are displayed, so it is important to scroll down the page until the entire list is loaded. It is now possible to parse the page and look for the list of friends using the BeautifulSoup package and the re module.
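The scrolling step can also be written as a loop that stops once the page height no longer grows, instead of scrolling a fixed number of times. The following is only a sketch: the helper name `scroll_to_bottom` and its `pause` and `max_rounds` parameters are my own assumptions, and it relies only on the driver's `execute_script` method.

```python
import time

def scroll_to_bottom(driver, pause=1.0, max_rounds=50):
    """Scroll until the page height stops growing or max_rounds is reached.

    Returns the number of scroll rounds performed.
    (Hypothetical helper, not part of the original program.)
    """
    last_height = driver.execute_script("return document.body.scrollHeight")
    rounds = 0
    while rounds < max_rounds:
        # jump to the current bottom of the page
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight)")
        time.sleep(pause)  # give the page time to load the next batch
        rounds += 1
        new_height = driver.execute_script("return document.body.scrollHeight")
        if new_height == last_height:
            break  # nothing new was loaded, so assume the list is complete
        last_height = new_height
    return rounds
```

With the Selenium session from above, this would be called as `scroll_to_bottom(driver)`. A slow connection may need a larger `pause`, otherwise the loop can stop before the next batch has arrived.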

Parse the Facebook HTML source code

Packages:

from bs4 import BeautifulSoup
import re

Show all the tags:

page_data_soup = BeautifulSoup(driver.page_source,'lxml')
page_data_soup.find_all('a')

Scan the string form of the page:

string=str(page_data_soup.find_all('a'))
friends=list()
for i in range(len(string)):
    if string[i:i+4]=="fref":
        friends.append(string[i-30:i])

pattern="friends_tab.>.?.?.?.?.?.?.?.?.?.?.?.?.?.?.?.?.?.?.?.?.?.?.?.?.?.?.?.?.?.?</a>"
regex=re.compile(pattern)
friends=regex.findall(string)

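friends_first=friends  # same list object, so friends is updated in place as well
for i in range(len(friends_first)):
    # drop the 13-character prefix matched by friends_tab.> and the trailing </a>
    friends_first[i]=friends_first[i][13:-4]
    # split on spaces from the right and keep the leftmost piece
    # (the first name, for names of up to four words)
    friends_first[i]=friends_first[i].rsplit(' ',3)[0]
    friends_first[i]=friends_first[i].upper()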
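The extraction can also be done with a capture group, which avoids slicing off a fixed number of characters afterwards. This is a minimal sketch, not the method used above: `SAMPLE_HTML` is made-up markup (the real Facebook HTML differs and changes over time), `extract_first_names` is a hypothetical helper, and it keeps only the first word of each name, which differs slightly from the rsplit call.

```python
import re

# Made-up markup for illustration only; real Facebook pages look different.
SAMPLE_HTML = (
    '<a href="https://www.facebook.com/jane.doe?fref=pb&hc_location=friends_tab">Jane Doe</a>'
    '<a href="https://www.facebook.com/john.smith?fref=pb&hc_location=friends_tab">John Smith</a>'
    '<a href="/help">Help</a>'
)

# Capture the link text of anchors whose opening tag ends with friends_tab">
FRIEND_LINK = re.compile(r'friends_tab">([^<]+)</a>')

def extract_first_names(html):
    """Return the upper-cased first word of each friend link's text."""
    names = []
    for full_name in FRIEND_LINK.findall(html):
        parts = full_name.split()
        if parts:  # skip links whose text is only whitespace
            names.append(parts[0].upper())
    return names

first_names = extract_first_names(SAMPLE_HTML)
```

On the live page, `extract_first_names(driver.page_source)` would be the equivalent call, assuming the links still end with `friends_tab">`.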

Names database

I then looked for a names database on the web. Since I have lived in Germany, France, China and the US, I had to combine several CSV files to cover most of the names: French Database, German Database.

Each CSV file has two columns: Name / Sex

import csv
with open("names.csv") as csvfile:
    reader = csv.reader(csvfile)
    document=[]
    for row in reader:
        document.append(row)
name_list=dict()

# start at 1 to skip the header row; keys are upper-cased names, values are the sex
for i in range(1,len(document)):
    name_list[document[i][0].upper()]=document[i][1]
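Since several databases have to be combined, the loading step can be wrapped in a function that merges any number of files into one dictionary. The sketch below makes several assumptions: every file is comma-separated with a header row and the two columns Name / Sex, the first file that lists a name wins, and `load_names` and the in-memory sample data are hypothetical stand-ins for the real databases.

```python
import csv
import io

def load_names(csv_files):
    """Merge several Name/Sex CSV files into one {NAME: sex} dictionary."""
    names = {}
    for csvfile in csv_files:
        reader = csv.reader(csvfile)
        next(reader, None)  # skip the header row
        for row in reader:
            if len(row) < 2:
                continue  # ignore blank or malformed lines
            # setdefault keeps the entry from the first file that lists the name
            names.setdefault(row[0].strip().upper(), row[1].strip().lower())
    return names

# Made-up in-memory files standing in for the French and German databases.
french = io.StringIO("Name,Sex\nPierre,m\nMarie,f\n")
german = io.StringIO("Name,Sex\nHans,m\nMarie,f\nHeidi,f\n")
name_list = load_names([french, german])
```

With real files, the open file objects are passed in the same way as the `io.StringIO` objects here.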

Count male and female friends

At this point we are all set, and I can count my friends...

male=0
female=0
unknown=list()

for friend in friends:
    try:
        if name_list[friend.upper()]=="f":
            female+=1
        elif name_list[friend.upper()]=="m":
            male+=1
        else:
            unknown.append(friend.upper())
    except KeyError:  # the name is not in the database
        unknown.append(friend.upper())
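The same count can be written with a dictionary lookup instead of try/except, together with the percentage calculation. This is a sketch under assumptions: `count_by_sex` and `share` are hypothetical helpers, the sample names are made up, and the percentages are taken over the friends whose sex is known.

```python
def count_by_sex(friends, name_list):
    """Return (male, female, unknown); unknown lists the unmatched names."""
    male, female, unknown = 0, 0, []
    for friend in friends:
        sex = name_list.get(friend.upper())  # None if the name is missing
        if sex == "m":
            male += 1
        elif sex == "f":
            female += 1
        else:
            unknown.append(friend.upper())
    return male, female, unknown

def share(count, total):
    """Percentage of total, or 0.0 when total is zero."""
    return 100.0 * count / total if total else 0.0

# Made-up sample data, not the author's friend list.
sample_names = {"PIERRE": "m", "MARIE": "f", "HANS": "m"}
male, female, unknown = count_by_sex(["Pierre", "Marie", "Hans", "Bjorn"], sample_names)
known = male + female
print("male: {:.0f}%, female: {:.0f}%, unknown: {}".format(
    share(male, known), share(female, known), unknown))
```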

Unknown Names

For names that are not common (my Nordic friends, for example), I then update the dictionary manually:

for name in unknown:
    print("I unfortunately don't know all names, please help me. Is {} a f or m ?".format(name))
    sex=input("Enter f or m")
    name_list[name]=sex

and re-run the last part

In my personal case, I have more male friends (60% vs. 40%). This is probably because I studied engineering.
