Missing Data Files Due to 404 Errors in get-all-data.sh #45

@ciioprof0

The script get-all-data.sh downloads most of the data files successfully but fails to download the following files due to 404 errors:

  • data/nmt/eng-fra.txt
  • data/nmt/simplest_eng_fra.csv
  • data/yelp/raw_train.csv

Example Error Content

In the case of raw_train.csv, the downloaded file contains the following HTML, indicating a "404 Not Found" error:

<html lang="en" dir="ltr">
  <meta charset="utf-8">
  <meta name="viewport" content="initial-scale=1, minimum-scale=1, width=device-width">
  <title>Error 404 (Not Found)!!1</title>
  <style>
    /* Truncated for brevity */
  </style>
  <main>
    <a href="//www.google.com">
      <span id="logo" aria-label="Google" role="img"></span>
    </a>
    <p><b>404.</b> <ins>That’s an error.</ins></p>
    <p>The requested URL was not found on this server. <ins>That’s all we know.</ins></p>
  </main>
</html>

Steps to Reproduce

  1. Run the get-all-data.sh script as described in the book.
  2. Observe that the listed files are created, but each one contains an HTML 404 error page instead of the expected data.
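
To confirm which files are affected after running the script, a small check like the one below may help. It is only a sketch: it assumes the paths are relative to the repository root, and the helper names (`looks_like_html`, `audit`) are hypothetical, not part of the book's code. It flags a file as an error page if its first bytes look like HTML markup.

```python
from pathlib import Path

# Files named in this report; assumed to be relative to the repository root.
SUSPECT_FILES = [
    "data/nmt/eng-fra.txt",
    "data/nmt/simplest_eng_fra.csv",
    "data/yelp/raw_train.csv",
]


def looks_like_html(path, probe_bytes=512):
    """Return True if the file starts with HTML markup rather than data."""
    with open(path, "rb") as handle:
        head = handle.read(probe_bytes).lstrip().lower()
    return head.startswith((b"<html", b"<!doctype html"))


def audit(paths, root="."):
    """Map each relative path to 'missing', 'html-error-page', or 'ok'."""
    report = {}
    for rel in paths:
        full = Path(root) / rel
        if not full.is_file():
            report[rel] = "missing"
        elif looks_like_html(full):
            report[rel] = "html-error-page"
        else:
            report[rel] = "ok"
    return report


if __name__ == "__main__":
    for rel, status in audit(SUSPECT_FILES).items():
        print(f"{status:16} {rel}")
```

Running this from the repository root prints one status per file. Note that an "ok" result only means the file does not start with HTML; it does not verify that the data is complete.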

Expected Behavior

The script should download the required data files or provide updated instructions if the files have been moved or removed.

Suggestions

  • Verify whether the file links (Google Drive IDs or other sources) are still valid.
  • Provide updated links or alternative sources for the missing files.
  • If the files are no longer available, include placeholder data or instructions to generate equivalent datasets.
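
Independently of fixing the links, the download step could refuse to save an error page as if it were data. The sketch below illustrates the idea with Python's standard `urllib`; it is not how get-all-data.sh currently works, the `fetch` helper is hypothetical, and the URL is whatever source the maintainers choose. It reads the whole response into memory, which is fine for a sketch but may not suit very large files.

```python
import os
import tempfile
import urllib.request
from pathlib import Path


def fetch(url, dest, probe_bytes=512):
    """Download url to dest, refusing to save an HTML error page."""
    dest = Path(dest)
    dest.parent.mkdir(parents=True, exist_ok=True)

    # urlopen raises urllib.error.HTTPError for 4xx/5xx HTTP responses,
    # so a plain 404 never reaches the write step below.
    with urllib.request.urlopen(url) as response:
        body = response.read()

    # Some hosts answer with an HTML page even when the status looks fine,
    # so the start of the payload is checked as well.
    head = body[:probe_bytes].lstrip().lower()
    if head.startswith((b"<html", b"<!doctype html")):
        raise ValueError(f"{url} returned an HTML page, not data")

    # Write to a temporary file first so dest is only created on success.
    fd, tmp_name = tempfile.mkstemp(dir=dest.parent)
    with os.fdopen(fd, "wb") as tmp:
        tmp.write(body)
    os.replace(tmp_name, dest)
    return dest
```

With a check like this, a dead link fails loudly at download time rather than leaving a file such as raw_train.csv filled with HTML.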

Thank you for addressing this issue!
