Testing current full web crawling functionality #114

@BradKML

Description

I am currently testing whether PyWebCopy can download an entire website (subdomain) rather than just a single webpage. Unfortunately, it did not work as intended, even though save_webpage and save_website are supposed to behave differently.

import os  # directory setup borrowed from https://stackoverflow.com/a/14125914
from pywebcopy import save_website

relative_path = r'book_test'
current_directory = os.getcwd()
final_directory = os.path.join(current_directory, relative_path)
os.makedirs(final_directory, exist_ok=True)

save_website(url='https://www.nateliason.com/notes',
             project_folder=final_directory,
             project_name="test_site",
             bypass_robots=True, debug=True, open_in_browser=False,
             delay=None, threaded=False)

In the debug logs, none of the requests go beyond the main URL; the crawler never descends into the other pages linked from it. What could be the cause of this?
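For context on what "descending into linked pages" involves: a site crawler extracts links from each fetched page and keeps only those within the target subdomain. Below is a minimal stdlib-only sketch of that scoping step (this is an illustration, not pywebcopy's actual internals; the example HTML and URLs are hypothetical):

from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse

class LinkExtractor(HTMLParser):
    """Collects href values from anchor tags."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

def in_scope_links(base_url, html):
    """Return absolute URLs from html that stay on the same host as base_url."""
    parser = LinkExtractor()
    parser.feed(html)
    base_host = urlparse(base_url).netloc
    result = []
    for href in parser.links:
        absolute = urljoin(base_url, href)  # resolve relative hrefs
        if urlparse(absolute).netloc == base_host:
            result.append(absolute)
    return result

html = '<a href="/notes/deep-work">in scope</a> <a href="https://example.com/">off-site</a>'
print(in_scope_links("https://www.nateliason.com/notes", html))
# only the same-subdomain link survives the filter

If save_website only ever requests the main URL, the equivalent step inside pywebcopy is presumably either not finding the links or filtering all of them out.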
