I was trying to train on ImageNet and noticed that `DataLoader` kept running into OOM (out-of-memory) errors, so I had to write my own simple loader. It may not be as fast, but it doesn't crash.
Here is a minimal working example (MWE) that demonstrates the issue:
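```julia
using MLUtils
using Random

# Synthetic stand-in for ImageNet: each observation is a new random
# 224×224×3 Float32 image and an integer label in 1:1000.
struct RandDataset
    len::Int
end

Base.length(dataset::RandDataset) = dataset.len

function Base.getindex(dataset::RandDataset, idx::Int)
    return rand(Float32, 224, 224, 3), rand(1:1000)
end

# Indexing with several indices returns a vector of (image, label) tuples.
function Base.getindex(dataset::RandDataset, indices::AbstractVector{Int})
    return [dataset[idx] for idx in indices]
end

function Base.getindex(dataset::RandDataset, range::AbstractRange{Int})
    return [dataset[idx] for idx in range]
end

# Minimal batching iterator used as a workaround for DataLoader.
struct SimpleLoader{D}
    dataset::D
    batchsize::Int
    shuffle::Bool
    indices::Vector{Int}
end

function SimpleLoader(dataset, batchsize::Int; shuffle::Bool=false)
    indices = collect(1:numobs(dataset))
    return SimpleLoader(dataset, batchsize, shuffle, indices)
end

# Number of batches per epoch; the last batch may be partial.
Base.length(loader::SimpleLoader) = cld(length(loader.indices), loader.batchsize)

# Copies the requested samples into a freshly allocated 4-D batch array.
# The image size is hard-coded because it is known in advance; each slot
# is written by exactly one thread, so the threaded loop has no data races.
function load_batch(dataset, indices::Vector{Int})
    actual_batch_size = length(indices)
    batch_array = Array{Float32}(undef, 224, 224, 3, actual_batch_size)
    labels = Vector{Int}(undef, actual_batch_size)
    Threads.@threads for idx in eachindex(indices)
        i = indices[idx]
        img, lbl = dataset[i]
        batch_array[:, :, :, idx] .= img
        labels[idx] = lbl
    end
    return batch_array, labels
end

# The iteration state is the position of the next batch's first index.
function Base.iterate(loader::SimpleLoader, state::Int = 1)
    if state > length(loader.indices)
        return nothing
    end
    # Reshuffle in place at the start of each epoch.
    if loader.shuffle && state == 1
        Random.shuffle!(loader.indices)
    end
    end_idx = min(state + loader.batchsize - 1, length(loader.indices))
    batch_inds = loader.indices[state:end_idx]
    batch = load_batch(loader.dataset, batch_inds)
    return (batch, end_idx + 1)
end

dataset = RandDataset(1_200_000)
# Uncomment to benchmark MLUtils.DataLoader instead of SimpleLoader:
#dl = DataLoader(dataset, batchsize=1024, collate=true, parallel=true, partial=true)
dl = SimpleLoader(dataset, 1024)
@time for _ in dl end
@time for _ in dl end
```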
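`SimpleLoader` loads each batch synchronously, so whatever consumes the batches waits on loading. One way to overlap the two while still capping memory is to feed batches through a bounded `Channel` from a background task. This is a minimal sketch; `prefetch` and its `capacity` keyword are hypothetical names, not part of the MLUtils API:

```julia
# Hypothetical helper (not part of MLUtils): runs `iter` in a background task
# and hands batches over through a bounded Channel, so producing the next
# batch overlaps with consuming the current one.
function prefetch(iter; capacity::Int = 2)
    # `spawn = true` lets the producer task run on another thread when Julia
    # is started with several threads. The channel is closed automatically
    # when the do-block returns, which ends iteration on the consumer side.
    return Channel{Any}(capacity; spawn = true) do ch
        for batch in iter
            put!(ch, batch)  # blocks while `capacity` batches are already queued
        end
    end
end

# Usage with the MWE: roughly `capacity + 2` batches are alive at once
# (queued, being produced, being consumed).
# for (x, y) in prefetch(SimpleLoader(dataset, 1024); capacity = 2)
#     # train on x, y
# end
```

With 1024-image batches of about 0.6 GB each, `capacity = 2` keeps the in-flight data to a few GB, and raising `capacity` trades memory for more slack between producer and consumer.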
Results for `DataLoader` with 16 threads:

```
92.184423 seconds (8.12 M allocations: 1.315 TiB, 18.53% gc time, 319 lock conflicts, 2.68% compilation time)
91.870471 seconds (6.62 M allocations: 1.314 TiB, 16.23% gc time, 210 lock conflicts)
Command being timed: "julia --project=. --threads=16 benchmark_imagenet_loading.jl"
User time (seconds): 286.29
System time (seconds): 297.57
Percent of CPU this job got: 312%
Elapsed (wall clock) time (h:mm:ss or m:ss): 3:07.08
Average shared text size (kbytes): 0
Average unshared data size (kbytes): 0
Average stack size (kbytes): 0
Average total size (kbytes): 0
Maximum resident set size (kbytes): 40386076
Average resident set size (kbytes): 0
Major (requiring I/O) page faults: 0
Minor (reclaiming a frame) page faults: 61203304
Voluntary context switches: 3746914
Involuntary context switches: 1050
Swaps: 0
File system inputs: 0
File system outputs: 0
Socket messages sent: 0
Socket messages received: 0
Signals delivered: 0
Page size (bytes): 4096
Exit status: 0
```
Results for `SimpleLoader` with 16 threads:

```
74.446160 seconds (4.85 M allocations: 1.314 TiB, 48.53% gc time, 13.69% compilation time)
74.961892 seconds (3.71 M allocations: 1.314 TiB, 49.67% gc time)
Command being timed: "julia --project=. --threads=16 benchmark_imagenet_loading.jl"
User time (seconds): 444.61
System time (seconds): 632.58
Percent of CPU this job got: 709%
Elapsed (wall clock) time (h:mm:ss or m:ss): 2:31.84
Average shared text size (kbytes): 0
Average unshared data size (kbytes): 0
Average stack size (kbytes): 0
Average total size (kbytes): 0
Maximum resident set size (kbytes): 2678888
Average resident set size (kbytes): 0
Major (requiring I/O) page faults: 0
Minor (reclaiming a frame) page faults: 163159298
Voluntary context switches: 2486217
Involuntary context switches: 865
Swaps: 0
File system inputs: 0
File system outputs: 0
Socket messages sent: 0
Socket messages received: 0
Signals delivered: 0
Page size (bytes): 4096
Exit status: 0
```
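The totals in both logs can be sanity-checked with a little arithmetic. This sketch assumes the MWE's 224×224×3 `Float32` images and 1,200,000 samples per epoch, and that the `kbytes` reported by GNU time are 1024-byte units:

```julia
# Back-of-envelope check of the `@time` and GNU time figures above.
bytes_per_image = 224 * 224 * 3 * sizeof(Float32)   # 602_112 bytes
n_samples = 1_200_000

# `getindex` allocates a fresh image for every sample, and the batching step
# copies it into a 4-D array, so each sample costs two image-sized buffers.
alloc_per_epoch_tib = 2 * n_samples * bytes_per_image / 2^40   # ≈ 1.314 TiB

# "Maximum resident set size (kbytes)" converted from KiB to GiB.
kib_to_gib(kib) = kib / 2^20
rss_dataloader_gib = kib_to_gib(40_386_076)   # ≈ 38.5 GiB
rss_simple_gib = kib_to_gib(2_678_888)        # ≈ 2.55 GiB
```

Both loaders allocate essentially the same amount per epoch, so total allocation alone does not explain the roughly 15× gap in peak resident memory.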
I know I'm not using the `buffer` option (I couldn't figure out how to), but even so, the simple loader peaks at about 2.6 GiB of resident memory while `DataLoader` peaks at about 38.5 GiB, and `DataLoader` isn't faster either (about 92 s versus 75 s per epoch).
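Reusing one preallocated buffer instead of allocating a fresh batch array on every iteration can also be done by hand in the simple loader. The sketch below is a hypothetical `load_batch!`, an in-place variant of the MWE's `load_batch`; it is not MLUtils' `buffer` mechanism, and `ToyDataset` is a made-up stand-in so the example runs on its own:

```julia
# Hypothetical in-place variant of the MWE's `load_batch`: the caller
# allocates `batch_array` and `labels` once and reuses them for every batch.
function load_batch!(batch_array::Array{Float32,4}, labels::Vector{Int},
                     dataset, indices::AbstractVector{Int})
    n = length(indices)
    size(batch_array, 4) >= n && length(labels) >= n ||
        throw(ArgumentError("buffers are smaller than the batch"))
    Threads.@threads for k in 1:n
        img, lbl = dataset[indices[k]]   # the dataset itself still allocates `img`
        batch_array[:, :, :, k] .= img   # each thread writes a disjoint slice
        labels[k] = lbl
    end
    # Views hide stale slots when the final batch is smaller than the buffer.
    return view(batch_array, :, :, :, 1:n), view(labels, 1:n)
end

# Hypothetical tiny dataset: sample i is a 2×2×1 image filled with i.
struct ToyDataset end
Base.getindex(::ToyDataset, i::Int) = (fill(Float32(i), 2, 2, 1), i)

buf = Array{Float32}(undef, 2, 2, 1, 4)   # allocated once, reused per batch
lbls = Vector{Int}(undef, 4)
x, y = load_batch!(buf, lbls, ToyDataset(), [3, 7])
```

In the MWE the buffers would be `224×224×3×1024` and `load_batch!` would replace `load_batch` inside `iterate`. That removes the per-batch array, roughly half of the per-epoch allocation since `getindex` still allocates each image, but the returned views alias the buffer, so each batch must be fully consumed before the next one is loaded.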
Since I know the item shapes and use them to preallocate in my simple loader, I don't expect `DataLoader` to be faster or to use less memory, but the amount of memory it uses is far too high. Is there a memory leak somewhere?
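One way to tell a leak apart from allocation pressure is to run several epochs and watch how much memory stays reachable after a full garbage collection. A rough sketch; `memory_profile` is a hypothetical helper, and `Base.gc_live_bytes` is an unexported Base function:

```julia
# Hypothetical diagnostic: drain `epochs` full passes over a freshly built
# loader and record memory after a full garbage collection each time.
function memory_profile(make_loader; epochs::Int = 3)
    return map(1:epochs) do e
        for _ in make_loader() end   # one full epoch, batches discarded
        GC.gc()                      # full collection: only reachable data stays live
        (epoch = e,
         maxrss_gib = Sys.maxrss() / 2^30,         # process peak RSS so far
         live_gib = Base.gc_live_bytes() / 2^30)   # heap still reachable after GC
    end
end
```

For example, `memory_profile(() -> SimpleLoader(dataset, 1024))`, or the same with the `DataLoader` call from the MWE. A `live_gib` that keeps climbing across epochs would suggest something is retaining batches; a flat `live_gib` alongside a high peak RSS would point toward transient memory (for example, many large batches alive at once, or garbage not yet collected) rather than a leak.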