Description
https://github.com/spring/pr-downloader/blob/master/src/Downloader/Rapid/Sdp.cpp#L197-L205
This closes each downloaded file, then reopens/rereads and checks md5:
pr-downloader/src/FileSystem/FileSystem.cpp
Lines 48 to 54 in 5397db3

    bool CFileSystem::fileIsValid(const FileData* mod, const std::string& filename) const
    {
        HashMD5 md5hash;
        int bytes;
        unsigned char data[IO_BUF_SIZE];
        FILE* f = propen(filename.c_str(), "rb");
        gzFile inFile = gzdopen(fileno(f), "rb");

Since the curl write calls are sequential:
pr-downloader/src/Downloader/Rapid/Sdp.cpp
Line 276 in 5498204

    curl_easy_setopt(curlw->GetHandle(), CURLOPT_WRITEFUNCTION, write_streamed_data);

... this blocks the download loop until each file is opened, written to, closed, re-opened, loaded in a gzip file object, md5 checked and closed again.
There are much better and faster ways to check the files in memory, during the streamer download, chunk by chunk, using gzip, md5 and crc objects. CRC might be the fastest, and the sdp provides such data. A Python example (thoroughly untested early code that is really unstable):
https://github.com/serg9/rapid_nonet/blob/master/receive_loop.py#L118-L146
There is only one file close and no re-reading of the file from disk. It uses zlib.decompressobj and binascii.crc32, which are meant for on-the-fly stream decompression and checksumming; there have to be C++ ways to do it too.
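To make the idea concrete, here is a minimal Python sketch of checking a gzip stream chunk by chunk while it arrives. It is not the code from the linked receive_loop.py and not pr-downloader's implementation; the names `StreamVerifier` and `verify_in_chunks` are made up for illustration, and it assumes the expected checksums cover the decompressed bytes.

```python
import binascii
import gzip
import hashlib
import zlib


class StreamVerifier:
    """Hypothetical helper: inflate gzip data chunk by chunk and
    keep a running CRC32, MD5 and size of the decompressed bytes."""

    def __init__(self):
        # 16 + MAX_WBITS tells zlib to expect a gzip header and trailer.
        self._inflater = zlib.decompressobj(16 + zlib.MAX_WBITS)
        self._crc = 0
        self._md5 = hashlib.md5()
        self.size = 0

    def feed(self, chunk):
        """Call once per received chunk; the same chunk can be
        written to the output file without ever re-reading it."""
        self._update(self._inflater.decompress(chunk))

    def finish(self):
        """Flush the inflater and return (crc32, md5 hex digest, size)."""
        self._update(self._inflater.flush())
        return self._crc & 0xFFFFFFFF, self._md5.hexdigest(), self.size

    def _update(self, plain):
        # binascii.crc32 accepts the previous value to continue a running CRC.
        self._crc = binascii.crc32(plain, self._crc)
        self._md5.update(plain)
        self.size += len(plain)


def verify_in_chunks(compressed, chunk_size=4096):
    """Simulate a streamed download by slicing a byte string into chunks."""
    verifier = StreamVerifier()
    for offset in range(0, len(compressed), chunk_size):
        verifier.feed(compressed[offset:offset + chunk_size])
    return verifier.finish()
```

In a real download loop, `feed` would be called from the write callback for each received chunk, and the result of `finish` compared against the values listed in the sdp once the file is complete. Whether CRC alone is a sufficient check is a separate decision; the sketch computes both so either can be used.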
Memory usage is not a problem since each received chunk fits comfortably in RAM. I don't know if I have some disk issues or if it's my current fstab setup, but the current version of pr-downloader keeps my disk at 100% utilization and the download speed is ~100-500kbps max (since it waits for the disk). In-RAM decompression plus CRC check is substantially faster on my machine.
For me, this makes it unusable under Linux. It might also be related to how disk caching is handled, but I didn't look into that because Python kind of handles that on its own.
I can't replicate the crash I told you about now but I'll keep trying.
If not enough Linux users get this, it should be marked low priority (I have special mounts).
strace: http://pastebin.com/f0HEJ1X6
strace with time of day: http://pastebin.com/dV6070GX (this makes the sluggishness obvious).