Problem
Given the following deployment configuration:
export PGRSTBENCH_SEPARATE_PG="false" # this is just to make the deployment faster
export PGRSTBENCH_EC2_INSTANCE_TYPE="t3a.micro" # needs micro at minimum otherwise at nano the middleware instance becomes unresponsive
$ postgrest-bench-deploy
...
Using 1000 VUs for a k6 load test causes many failures:
$ postgrest-bench-k6 1000 k6/POSTBulk.js
time="2025-02-10T23:45:17Z" level=warning msg="Request Failed" error="Post \"http://pgrst/employee?columns=employee_id,first_name,last_name,title,reports_to,birth_date,hire_date,address,city,state,country,postal_code,phone,fax,email\": write tcp 10.0.0.132:36880->10.0.0.250:80: use of closed network connection"
time="2025-02-11T00:04:40Z" level=warning msg="Request Failed" error="Post \"http://pgrst/employee?columns=employee_id,first_name,last_name,title,reports_to,birth_date,hire_date,address,
city,state,country,postal_code,phone,fax,email\": read tcp 10.0.0.132:52914->10.0.0.250:80: read: connection reset by peer"
..
data_received..................: 4.4 MB 129 kB/s
data_sent......................: 198 MB 5.7 MB/s
http_req_blocked...............: avg=43.35ms min=1.12µs med=3.7µs max=11.24s p(90)=40.87ms p(95)=92.11ms
http_req_connecting............: avg=43.19ms min=0s med=0s max=11.24s p(90)=40.31ms p(95)=91.21ms
http_req_duration..............: avg=767.83ms min=0s med=953.54ms max=4.4s p(90)=1.24s p(95)=1.57s
{ expected_response:true }...: avg=1.02s min=10.53ms med=993.78ms max=4.4s p(90)=1.35s p(95)=1.72s
✗ http_req_failed................: 25.35% ✓ 18776 ✗ 55283
http_req_receiving.............: avg=1.4ms min=0s med=40.54µs max=120.22ms p(90)=403.47µs p(95)=8.3ms
http_req_sending...............: avg=2.43ms min=0s med=45.23µs max=452.1ms p(90)=2.86ms p(95)=10.87ms
http_req_tls_handshaking.......: avg=0s min=0s med=0s max=0s p(90)=0s p(95)=0s
http_req_waiting...............: avg=763.99ms min=0s med=952.48ms max=4.4s p(90)=1.23s p(95)=1.57s
http_reqs......................: 37030 1077.252069/s
iteration_duration.............: avg=816.73ms min=812.71µs med=956.25ms max=11.3s p(90)=1.26s p(95)=1.63s
iterations.....................: 37029 1077.222978/s
vus............................: 0 min=0 max=1000
vus_max........................: 1000 min=1000 max=1000
No error is logged by journalctl -u postgrest. But in the Nginx log we can see a lot of connect() failed (111: Connection refused) errors:
$ postgrest-bench-ssh pgrst
$ cat /var/log/nginx/error.log | less
2025/02/10 23:37:53 [error] 1301#1301: *15884 connect() failed (111: Connection refused) while connecting to upstream, client: 10.0.0.132, server: , request: "POST /employee?columns=employee_id,first_name,last_name,title,reports_to,birth_date,hire_date,address,city,state,country,postal_code,phone,fax,email HTTP/1.1", upstream: "http://[::1]:3000/employee?columns=employee_id,first_name,last_name,title,reports_to,birth_date,hire_date,address,city,state,country,postal_code,phone,fax,email", host: "pgrst"
...
Not sure if PostgREST or Nginx is at fault here.
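To help tell the two apart, one option (not tried here) is to watch PostgREST's listen socket on the pgrst instance while the test runs. This assumes PostgREST listens on TCP port 3000, as the upstream URL above suggests; the commands are plain Linux utilities, not part of the benchmark scripts:
$ postgrest-bench-ssh pgrst
# Recv-Q/Send-Q of a listening socket show the current accept queue and its limit
$ ss -ltn 'sport = :3000'
# kernel counters that grow when the accept queue overflows or SYNs are dropped
$ netstat -s | grep -i -e 'listen queue' -e 'SYNs to LISTEN'
Connection refused in particular means the connect() got an RST, so it is also worth confirming that PostgREST is actually listening on the exact address Nginx proxies to (the errors above show it trying [::1]:3000).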
Observations
- At 800 VUs, there are no errors:
$ postgrest-bench-k6 800 k6/POSTBulk.js
Running k6 with 800 vus
data_received..................: 4.3 MB 134 kB/s
data_sent......................: 188 MB 5.9 MB/s
http_req_blocked...............: avg=6.35ms min=1.08µs med=2.8µs max=416.84ms p(90)=4.47µs p(95)=10.6µs
http_req_connecting............: avg=6.27ms min=0s med=0s max=416.79ms p(90)=0s p(95)=0s
http_req_duration..............: avg=896.09ms min=12.59ms med=845.24ms max=12.15s p(90)=1.48s p(95)=2.3s
{ expected_response:true }...: avg=896.09ms min=12.59ms med=845.24ms max=12.15s p(90)=1.48s p(95)=2.3s
✓ http_req_failed................: 0.00% ✓ 0 ✗ 53375
http_req_receiving.............: avg=70.81µs min=21.02µs med=45.5µs max=22.49ms p(90)=81.92µs p(95)=109.59µs
http_req_sending...............: avg=490.92µs min=24.07µs med=49.92µs max=76.33ms p(90)=102.82µs p(95)=315.1µs
http_req_tls_handshaking.......: avg=0s min=0s med=0s max=0s p(90)=0s p(95)=0s
http_req_waiting...............: avg=895.53ms min=12.47ms med=845.12ms max=12.14s p(90)=1.48s p(95)=2.3s
http_reqs......................: 26688 836.207664/s
iteration_duration.............: avg=903.03ms min=13.08ms med=845.99ms max=12.28s p(90)=1.48s p(95)=2.34s
iterations.....................: 26687 836.176331/s
vus............................: 0 min=0 max=800
vus_max........................: 800 min=800 max=800
- Using a unix socket instead of TCP brings the failure rate down somewhat, to 16% from the previous 25% (a sketch of the config difference follows the results):
$ export PGRSTBENCH_WITH_UNIX_SOCKET="false"
$ postgrest-bench-deploy
$ postgrest-bench-k6 1000 k6/POSTBulk.js
data_received..................: 4.3 MB 135 kB/s
data_sent......................: 193 MB 6.0 MB/s
http_req_blocked...............: avg=25.87ms min=1.09µs med=3.05µs max=11.24s p(90)=36.15ms p(95)=78.73ms
http_req_connecting............: avg=25.69ms min=0s med=0s max=11.24s p(90)=35.73ms p(95)=77.85ms
http_req_duration..............: avg=903.86ms min=0s med=1.02s max=12.3s p(90)=1.34s p(95)=1.76s
{ expected_response:true }...: avg=1.07s min=14.32ms med=1.04s max=12.3s p(90)=1.46s p(95)=1.88s
✗ http_req_failed................: 16.22% ✓ 10494 ✗ 54195
http_req_receiving.............: avg=998.27µs min=0s med=42.96µs max=169.44ms p(90)=110.52µs p(95)=655.59µs
http_req_sending...............: avg=1.8ms min=0s med=46.79µs max=278.75ms p(90)=325.06µs p(95)=8.9ms
http_req_tls_handshaking.......: avg=0s min=0s med=0s max=0s p(90)=0s p(95)=0s
http_req_waiting...............: avg=901.05ms min=0s med=1.02s max=12.29s p(90)=1.34s p(95)=1.76s
http_reqs......................: 32345 1003.925992/s
iteration_duration.............: avg=933.73ms min=685.4µs med=1.02s max=12.31s p(90)=1.35s p(95)=1.78s
iterations.....................: 32344 1003.894954/s
vus............................: 0 min=0 max=1000
vus_max........................: 1000 min=1000 max=1000
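For context, a minimal sketch of what TCP vs. unix-socket proxying looks like on the Nginx side; this is illustrative only, the real config (and socket path) is whatever nixops.nix generates:
upstream postgrest {
  # TCP mode: proxy to PostgREST over loopback
  server localhost:3000;
  # unix socket mode would point at a local socket instead, e.g.
  # server unix:/run/postgrest.sock;   # path illustrative
}

server {
  listen 80;
  location / {
    proxy_pass http://postgrest;
  }
}
Going through a unix socket removes the loopback TCP handshake and ephemeral-port churn between Nginx and PostgREST, which may explain part of the drop in failures.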
- Raising the VUs to 1200 adds a new Nginx error:
2025/02/11 00:15:19 [alert] 40240#40240: 1024 worker_connections are not enough
2025/02/11 00:15:19 [alert] 40240#40240: *36991 1024 worker_connections are not enough while connecting to upstream, client: 10.0.0.132, server: , request: "POST /employee?columns=employee_id,first_name,last_name,title,reports_to,birth_date,hire_date,address,city,state,country,postal_code,phone,fax,email HTTP/1.1", upstream: "http://127.0.0.1:3000/employee?columns=employee_id,first_name,last_name,title,reports_to,birth_date,hire_date,address,city,state,country,postal_code,phone,fax,email", host: "pgrst"
2025/02/11 00:15:30 [error] 40240#40240: *38310 connect() failed (111: Connection refused) while connecting to upstream, client: 10.0.0.132, server: , request: "POST /employee?columns=employee_id,first_name,last_name,title,reports_to,birth_date,hire_date,address,city,state,country,postal_code,phone,fax,email HTTP/1.1", upstream: "http://[::1]:3000/employee?columns=employee_id,first_name,last_name,title,reports_to,birth_date,hire_date,address,city,state,country,postal_code,phone,fax,email", host: "pgrst"
- The 1024 worker_connections value is set here: https://github.com/PostgREST/postgrest-benchmark/blob/master/nixops.nix#L182
- So it could be that 1000 VUs is too close to Nginx's worker_connections limit and that's what causes the failures (a sketch for raising the limit follows these observations).
- Increasing the instance size (e.g. to m5a.4xlarge) reduces the failure rate to 0.
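If the worker_connections ceiling is indeed the limiting factor, raising it should make the 1000-VU run clean without changing the instance size. A minimal sketch of the relevant Nginx directives (values picked arbitrarily for illustration; in this setup they would be applied through nixops.nix rather than a hand-edited nginx.conf):
# worker_connections is bounded by the worker's open-file limit, so raise that too
worker_rlimit_nofile 8192;

events {
  # per-worker cap; client connections and upstream connections both count against it
  worker_connections 4096;
}
Each proxied request ties up both a downstream (k6) and an upstream (PostgREST) connection in the worker handling it, so 1000 concurrent VUs can plausibly exhaust a 1024-connection budget.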