Skip to content

Only Download Qdrant Collection When Necessary #162

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 3 commits into
base: main
Choose a base branch
from

Conversation

audieleon
Copy link
Contributor

Made the CLI store the collection and some metadata about it, so when you run gaianet init again, it won't download the file if the local version is still current. The version check is against the x-repo-commit header, which is now recorded in a file in a new folder to hold these snapshots.

Made the CLI store the collection and some metadata about it, so when you run `gaianet init` again, it won't download it again if the local version is still current. The check is against the `x-repo-commit` header, which is now recorded in a file in a new folder to hold these snapshots.
left some log files in when I should have deleted.
Copy link
Contributor

juntao commented Mar 4, 2025

Hello, I am a code review agent on flows.network. Here are my reviews of changed source code files in this PR.


gaianet

Potential issues

  1. Issue 1: Incomplete Command in start Function
    The command to start the LlamaEdge API Server is incomplete and truncated, ending with --batch-size $chat_batch_size,$embedding_bat. This will cause a syntax error and failure of the script.

  2. Issue 2: Incorrect PID File Handling in create_collection
    In the create_collection function, the PID file (qdrant_pid) is not properly managed if Qdrant was already running. If qdrant_already_running is true, the PID should not be written or used later to kill a non-owned process.

  3. Issue 3: Potential Data Loss in validate_snapshot
    The validate_snapshot function starts Qdrant and creates a test collection for validation. However, it does not ensure that this temporary state is cleaned up if the validation fails, which could leave leftover data or running processes.

Summary of changes

    • Added need_snapshot_update function: Checks if the snapshot needs to be updated by comparing remote and local commit hashes.
  • Implemented snapshot validation in create_collection: Validates the downloaded snapshot before importing it into Qdrant.
  • Improved error handling and messages: Enhanced user feedback with clear success and error messages.

@audieleon
Copy link
Contributor Author

Addressing the potential issues from the automated code review:

1. **Issue with `sed_in_place` function**: The `sed_in_place` function does not handle the case where the file path contains spaces or special characters, which can lead to incorrect sed commands and potential data corruption.

I didn't change the sed_in_place function, and it's working. The bot is technically correct though.

2. **Issue in `need_snapshot_update` function**: The `get_remote_snapshot_hash` function relies on an HTTP header (`X-Repo-Commit`) that may not be present or consistent across all servers hosting snapshots, leading to false negatives when checking for snapshot updates.

This is OK, but the question did reveal a mistake I made. The header I use, if not present, should just result in a download of the collection. I had that backwards in my prior code, and have checked in a change to fix that. HuggingFace does use that header, so for data coming from there, this method works. Other data will always download if they don't include this header.

3. **Incomplete `cmd` array in `start` subcommand**: The `cmd` array is incomplete and ends abruptly with `--prompt-templat`, which will cause an error when attempting to execute the command.

This is not correct, and is likely the context window cutting off at that line in the code. This is line 493 in the gaianet code, and the code is working.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants