Commit f43ab1f
Optimize two-locus site operations
This PR is a combination of three separate modifications. They are
described below and in #3290. Fixes (#3290).
* Two-locus malloc optimizations
This revision moves all malloc operations out of the hot loop in
two-locus statistics, instead providing pre-allocated regions of memory
that the two-locus framework will use to perform work. Instead of simply
passing each pre-allocated array into each function call, we introduce a
simple structure called `two_locus_work_t`, which stores the statistical
results, and provides temporary arrays for storing the normalisation
constants. Setup and teardown methods for this work structure are
provided. Python and C tests are passing and valgrind reports no errors.
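
The pre-allocation pattern described above can be sketched as follows. This is a minimal illustration, not the actual `two_locus_work_t` definition: the type name `work_sketch_t`, its fields, and the placeholder weights are all assumptions made for the example.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for `two_locus_work_t`; field names are assumptions. */
typedef struct {
    size_t result_len; /* number of statistic outputs */
    size_t norm_len;   /* capacity of the normalisation scratch array */
    double *result;    /* statistical results, accumulated in place */
    double *norm;      /* temporary normalisation constants */
} work_sketch_t;

/* Teardown: safe to call on a partially initialised structure. */
static void
work_sketch_free(work_sketch_t *w)
{
    free(w->result);
    free(w->norm);
    w->result = NULL;
    w->norm = NULL;
    w->result_len = 0;
    w->norm_len = 0;
}

/* Setup: every allocation happens here, once, before the hot loop.
 * Both lengths are assumed to be non-zero. */
static int
work_sketch_init(work_sketch_t *w, size_t result_len, size_t norm_len)
{
    w->result_len = 0;
    w->norm_len = 0;
    w->result = calloc(result_len, sizeof(*w->result));
    w->norm = calloc(norm_len, sizeof(*w->norm));
    if (w->result == NULL || w->norm == NULL) {
        work_sketch_free(w);
        return -1;
    }
    w->result_len = result_len;
    w->norm_len = norm_len;
    return 0;
}

/* One hot-loop iteration: reuses w->norm as scratch and never allocates. */
static void
work_sketch_accumulate(work_sketch_t *w, const double *values, size_t n)
{
    size_t k;

    assert(n > 0 && n <= w->norm_len);
    for (k = 0; k < n; k++) {
        w->norm[k] = 1.0 / (double) n; /* placeholder weights */
    }
    for (k = 0; k < n; k++) {
        w->result[0] += w->norm[k] * values[k];
    }
}
```

The caller runs setup once, calls the accumulate step for every site pair, and runs teardown at the end, so the loop body contains no allocator calls.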
* Refactor bit array API, rename to bitset
As discussed in #2834, this patch renames tsk_bit_array_t to
tsk_bitset_t. Philosophically, we treat these as sets and not arrays,
performing intersections, unions, and membership tests. Therefore, it
makes sense to alter the API to use set theoretic vocabulary, describing
the intent more precisely. Fundamentally, the bitset structure is a list
of N independent bitsets. Each operation on two sets must select the row
on which to operate. The tsk_bitset_t originally tracked `len` only,
which was N, the number of sets. For convenience, we also track the
`row_len`, which is the number of unsigned integers per row. If we
multiply `row_len` by `TSK_BITSET_BITS`, we get the number of bits that
each set (or row) in the list of bitsets will hold.
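
To make the `len` / `row_len` relationship concrete, here is a rough sketch of such a layout. The names `sketch_bitset_t` and `SKETCH_BITS` are hypothetical stand-ins for `tsk_bitset_t` and `TSK_BITSET_BITS`, and the rounding-up rule used to derive `row_len` is an assumption.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define SKETCH_BITS 32u /* bits per chunk; stands in for TSK_BITSET_BITS */

/* Hypothetical layout: `len` rows, each made of `row_len` 32-bit chunks. */
typedef struct {
    size_t len;     /* N, the number of independent sets */
    size_t row_len; /* unsigned integers per row */
    uint32_t *data; /* len * row_len chunks, stored row by row */
} sketch_bitset_t;

/* Allocate `len` sets, each able to hold at least `num_bits` bits. */
static int
sketch_bitset_init(sketch_bitset_t *bs, size_t num_bits, size_t len)
{
    bs->len = len;
    /* Round up so a partially used final chunk is still allocated. */
    bs->row_len = (num_bits + SKETCH_BITS - 1) / SKETCH_BITS;
    bs->data = calloc(len * bs->row_len, sizeof(*bs->data));
    return bs->data == NULL ? -1 : 0;
}

/* Number of bits each row can hold: row_len * SKETCH_BITS. */
static size_t
sketch_bitset_row_capacity(const sketch_bitset_t *bs)
{
    return bs->row_len * SKETCH_BITS;
}

static void
sketch_bitset_free(sketch_bitset_t *bs)
{
    free(bs->data);
    bs->data = NULL;
}
```

For example, asking for 70 bits per set yields a `row_len` of 3 and a per-row capacity of 96 bits.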
We had also discussed each set theoretic operation accepting a row index
instead of a pointer to a row within the bitset object. Now, each
operation accepts a row index for each bitset structure passed into the
function. This simplifies the consumption of this API considerably,
removing the need to store and track many intermediate temporary
array pointers. We also see some performance improvements from this
cleanup. For DRY purposes, I've created a private macro,
`BITSET_DATA_ROW`, which abstracts away the pointer arithmetic for
selecting a row out of the list of sets. Because of these changes,
`tsk_bit_array_get_row` is no longer needed and has been removed from
the API.
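
A row-index based operation might look roughly like the sketch below. `SKETCH_DATA_ROW` is a hypothetical analogue of the private `BITSET_DATA_ROW` macro, and the function name and argument order are assumptions rather than the real signatures.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout: `len` rows of `row_len` 32-bit chunks each. */
typedef struct {
    size_t len;
    size_t row_len;
    uint32_t *data;
} sketch_bitset_t;

/* Pointer arithmetic for selecting one row, kept in a single place. */
#define SKETCH_DATA_ROW(bs, row) ((bs)->data + (row) * (bs)->row_len)

/* out[out_row] = a[a_row] AND b[b_row]; callers pass indices, not pointers. */
static void
sketch_bitset_intersect(const sketch_bitset_t *a, size_t a_row,
    const sketch_bitset_t *b, size_t b_row, sketch_bitset_t *out, size_t out_row)
{
    const uint32_t *pa = SKETCH_DATA_ROW(a, a_row);
    const uint32_t *pb = SKETCH_DATA_ROW(b, b_row);
    uint32_t *po = SKETCH_DATA_ROW(out, out_row);
    size_t i;

    assert(a->row_len == b->row_len && a->row_len == out->row_len);
    assert(a_row < a->len && b_row < b->len && out_row < out->len);
    for (i = 0; i < a->row_len; i++) {
        po[i] = pa[i] & pb[i];
    }
}
```

Because the row lookup is hidden behind the macro, call sites no longer need to hold on to intermediate row pointers.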
This patch does not change the size of the "chunk", which is the
unsigned integer storing bits. It remains a 32-bit unsigned integer,
which is most performant for bit counting (popcount). I've streamlined
the macros used to determine which integer in the row will be used to
store a particular bit. Everything now revolves around the
`TSK_BITSET_BITS` macro, which is simply 32, and bitshift operations have
been converted to unsigned integer division.
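
As an illustration of chunk addressing driven by a single bits-per-chunk constant, consider the sketch below. All names are hypothetical, the chunk index uses integer division and the offset within the chunk uses the remainder, and the portable bit-counting loop is a stand-in for whatever popcount the real code uses.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_BITS 32u /* stands in for TSK_BITSET_BITS */

typedef struct {
    size_t len;
    size_t row_len;
    uint32_t *data;
} sketch_bitset_t;

#define SKETCH_DATA_ROW(bs, row) ((bs)->data + (row) * (bs)->row_len)

/* Add `bit` to the set stored in `row`. */
static void
sketch_bitset_set_bit(sketch_bitset_t *bs, size_t row, uint32_t bit)
{
    uint32_t *r = SKETCH_DATA_ROW(bs, row);

    assert(bit / SKETCH_BITS < bs->row_len);
    r[bit / SKETCH_BITS] |= (uint32_t) 1 << (bit % SKETCH_BITS);
}

/* Membership test: non-zero if `bit` is in the set stored in `row`. */
static int
sketch_bitset_contains(const sketch_bitset_t *bs, size_t row, uint32_t bit)
{
    const uint32_t *r = SKETCH_DATA_ROW(bs, row);

    return (r[bit / SKETCH_BITS] >> (bit % SKETCH_BITS)) & 1u;
}

/* Portable popcount for one chunk: clears the lowest set bit each pass. */
static uint32_t
sketch_popcount32(uint32_t v)
{
    uint32_t c = 0;

    while (v != 0) {
        v &= v - 1;
        c++;
    }
    return c;
}

/* Number of members in the set stored in `row`. */
static size_t
sketch_bitset_count(const sketch_bitset_t *bs, size_t row)
{
    const uint32_t *r = SKETCH_DATA_ROW(bs, row);
    size_t i, total = 0;

    for (i = 0; i < bs->row_len; i++) {
        total += sketch_popcount32(r[i]);
    }
    return total;
}
```

With a power-of-two constant, compilers typically lower the division and remainder to the same shift and mask a hand-written version would use, so the more readable form should not cost anything.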
Testing has been refactored to reflect these changes, removing tests
that operate on specific rows. Tests in C and Python are passing and
valgrind shows no errors.
Fixes (#2834).
* Precompute A/B Counts and Biallelic Summary Func
Precompute A/B counts for each sample set. We were previously computing
them redundantly for each site pair in our results matrix. The
precomputation happens in a function called `get_mutation_sample_sets`,
which takes our list of sets (`tsk_bitset_t`) for each mutation and
intersects the samples carrying a particular mutation with the sample sets
passed in by the user. The result is an expanded list of sets with one
set per mutation per sample set. During this operation, we compute the
number of samples containing the given allele for each mutation,
avoiding the need to perform redundant count operations on the data.
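
The precomputation step can be sketched along these lines. This is not the body of `get_mutation_sample_sets`: the function name, the use of flat chunk arrays, and the output ordering (mutation-major, then sample set) are assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Portable popcount for one 32-bit chunk. */
static uint32_t
sketch_popcount32(uint32_t v)
{
    uint32_t c = 0;

    while (v != 0) {
        v &= v - 1;
        c++;
    }
    return c;
}

/*
 * For every (mutation, sample set) pair, intersect the samples carrying the
 * mutation with the sample set, store the result in `out`, and record its
 * size in `counts`. Output row index is m * num_sets + k (an assumed order).
 * All rows have `row_len` chunks.
 */
static void
sketch_expand_sample_sets(const uint32_t *mut_samples, size_t num_muts,
    const uint32_t *sample_sets, size_t num_sets, size_t row_len,
    uint32_t *out, uint32_t *counts)
{
    size_t m, k, i, row;

    for (m = 0; m < num_muts; m++) {
        for (k = 0; k < num_sets; k++) {
            row = m * num_sets + k;
            counts[row] = 0;
            for (i = 0; i < row_len; i++) {
                out[row * row_len + i]
                    = mut_samples[m * row_len + i] & sample_sets[k * row_len + i];
                /* Count while the chunk is hot, so no second pass is needed. */
                counts[row] += sketch_popcount32(out[row * row_len + i]);
            }
        }
    }
}
```

Once the counts are stored, each site pair only needs to intersect two precomputed rows to obtain the joint count; the per-allele counts are simple lookups.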
In addition to precomputation, we add a non-normalized version of
`compute_general_two_site_stat_result` for situations where we're
computing stats from biallelic loci. We dispatch the computation of the
result based on the number of alleles in the two loci we're comparing.
If the number of alleles in both loci is 2, then we simply perform an LD
computation on the derived alleles for the two loci. As a result, we
remove the need to compute a matrix of LD values, then take a weighted
sum. This is much more efficient and means that we only run the full
multiallelic LD routine on sites that are multiallelic.

1 parent e956149 commit f43ab1f
4 files changed: +457 -360 lines changed
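
The biallelic dispatch described in the commit message can be illustrated with the sketch below. It is only a rough model under stated assumptions: the function names are hypothetical, allele index 1 is assumed to be the derived allele, the statistic shown is D, and the haplotype-frequency weighting in the general path is a placeholder rather than the weighting the library actually applies.

```c
#include <assert.h>
#include <stddef.h>

/* A two-site summary function over haplotype and allele counts. */
typedef double sketch_stat_func(double n_ab, double n_a, double n_b, double n);

/* D = p_AB - p_A * p_B */
static double
sketch_stat_D(double n_ab, double n_a, double n_b, double n)
{
    return n_ab / n - (n_a / n) * (n_b / n);
}

/*
 * General path: evaluate `f` for every allele pair and take a weighted sum.
 * `hap` is a row-major num_a x num_b matrix of haplotype counts. The weights
 * used here (haplotype frequencies) are an assumption for illustration.
 */
static double
sketch_general_stat(size_t num_a, size_t num_b, const double *hap,
    const double *count_a, const double *count_b, double n, sketch_stat_func *f)
{
    double sum = 0.0;
    size_t a, b;

    for (a = 0; a < num_a; a++) {
        for (b = 0; b < num_b; b++) {
            sum += (hap[a * num_b + b] / n)
                   * f(hap[a * num_b + b], count_a[a], count_b[b], n);
        }
    }
    return sum;
}

/* Biallelic path: one evaluation on the derived alleles, no matrix, no sum. */
static double
sketch_biallelic_stat(const double *hap, const double *count_a,
    const double *count_b, double n, sketch_stat_func *f)
{
    return f(hap[1 * 2 + 1], count_a[1], count_b[1], n);
}

/* Dispatch on the number of alleles at the two sites. */
static double
sketch_two_site_stat(size_t num_a, size_t num_b, const double *hap,
    const double *count_a, const double *count_b, double n, sketch_stat_func *f)
{
    if (num_a == 2 && num_b == 2) {
        return sketch_biallelic_stat(hap, count_a, count_b, n, f);
    }
    return sketch_general_stat(num_a, num_b, hap, count_a, count_b, n, f);
}
```

In this model a biallelic pair costs a single call to the summary function, while the nested loop is reserved for pairs where at least one site has more than two alleles.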