Added different predictors proc and new AVX2 SAD #45

DTL2020 · 2021-10-16T16:26:30Z

Added different predictors proc and new AVX2 SAD for ExhaustiveSearch

Added new more AVX2 Esa search sp2 and sp4. One pass only without immediate array of SADs.

pinterf · 2021-10-19T14:17:55Z

Does this second commit replace the previous one, so are they doing the same just the new one is quicker/uses less temporary storage?
Really needs human readable C implementation of these black boxes without any tricks inside.

DTL2020 · 2021-10-19T15:19:59Z

Greetings ! Todays commint it is my long awaiting first release of idea about full Esa search inside SIMD register file (at least it only reloads ref lines from cache at V-steps and not produce immediate write traffic - only 1 final output write result). It possibly perspective for future multicores chips because it creates much less write traffic (that cause cache coherence protocol transactions between cores I think). Though it uses a bit more SIMD AVX2 instructions. Also it uses much less intermetiate memory for immediate SADs of all search positions array and it is good step to the future of 'array of blocks vs ref plane' search (need only 1 SAD and 1 x,y memory output per block). Also it skips simple but may be still slow 2 loops search inside 5x5 (or 9x8 for r=4) immediate array for minsad. About final output - I use m256 write instruction because do not found AVX2 for writing 64bits only. As I remember it may be penalty fo using SSE instruction for writing half of SSE-128 register after AVX2-256 - may I wrong. I make it this morning and also debug with SDE - I go to a day work this evening and will have a time at work to test many already comittet tests tomorrow. It was an idea to write you a message about new Esa search functions update after more tests with real 6core AVX2 CPU for both speed and output MDegrainN results and encoding results. I will try to write more results tomorrow till end of day (in eupore). About C-reference - as it is very simple I still not dot it as promise. I hope tomorrow. It all about: Old processing for(dy = -r; dy < +r; dy++) for(dx = -r; dx < +r; dx++) ArraySADs[dx, dy] = CalcSAD(dx, dy); } } +make search for minsad and save minsad_x and minsad_y in C-loops New method: minsad=max_sad_value; xcurrent=0, ycurrent=0; for(dy = -r; dy < +r; dy++) for(dx = -r; dx < +r; dx++) sad = CalcSAD(dx, dy); if (sad < minsad) { minsad=sad; x_minsad=xcurrent; y_minsad=xcurrent; } xcurrent+=1; } ycurrent+=1; } save ymm_reg with x_minsad, y_minsad, minsad and parse.

…

On 10/19/21, Ferenc Pintér ***@***.***> wrote: Does this second commit replace the previous one, so are they doing the same just the new one is quicker/uses less temporary storage? Really needs human readable C implementation of these black boxes without any tricks inside. -- You are receiving this because you authored the thread. Reply to this email directly or view it on GitHub: #45 (comment)

pinterf · 2021-10-20T07:21:16Z

(Note: Until I upload an .editorconfig file (same folder as .gitignore), can you set your editors to use spaces (not tabs), indent size=2?)
Or you can create your own with the following content, and Visual Studio will read and use it for the project:

root = true

# NOTE: editorconfig-core-c has a hardcoded limitation of 50 chars per .ini section name

[*.{cpp,h,hpp,c,asm}]
indent_style = space
indent_size = 2
trim_trailing_whitespace = true
end_of_line = crlf
insert_final_newline = true

[*.md]
indent_style = space
trim_trailing_whitespace = false

pinterf · 2021-10-20T10:47:33Z

Integrated by cherry pick directly from your repo. Usually this is the way for codes which are not ready-to-merge.
Howto: I'm using VS IDE. From developer command prompt:

git remote add DTLmvtools https://github.com/DTL2020/mvtools.git
git fetch DTLmvtools

then cherry picked from IDE and continued with the integration.

You'll need optsearchoption = 1 for MAnalyze and MRecalculate
My test script resulted in 13,72 fps with, 12.99 fps without the option.

clip=ffms2(film)

ov=4
blksize = 8
blksizev = 8
optSearchOption = 0
chroma=false

sup = MSuper(clip, pel=2)
fv1 = MAnalyse(sup, isb=False, delta=1, overlap=ov, blksize = blksize, blksizev = blksizev, optSearchOption = optSearchOption, chroma=chroma )
bv1 = MAnalyse(sup, isb=True, delta=1, overlap=ov, blksize = blksize, blksizev = blksizev, optSearchOption = optSearchOption, chroma=chroma )
fv2 = MAnalyse(sup, isb=False, delta=2, overlap=ov, blksize = blksize, blksizev = blksizev, optSearchOption = optSearchOption, chroma=chroma )
bv2 = MAnalyse(sup, isb=True, delta=2, overlap=ov, blksize = blksize, blksizev = blksizev, optSearchOption = optSearchOption, chroma=chroma )
fv3 = MAnalyse(sup, isb=False, delta=3, overlap=ov, blksize = blksize, blksizev = blksizev, optSearchOption = optSearchOption, chroma=chroma )
bv3 = MAnalyse(sup, isb=True, delta=3, overlap=ov, blksize = blksize, blksizev = blksizev, optSearchOption = optSearchOption, chroma=chroma )
clip = MDegrain3(clip, sup, bv1, fv1, bv2, fv2, bv3, fv3, thSAD=1000)
clip

Added C-ref functions (uses SAD() as old), fixed bug with ends of ArraySADs for sp2, added prefetch attempt.

DTL2020 · 2021-10-20T17:05:04Z

I uploaded new commint with C-ref functions and some fixes to _avx2. Do it correct or I need to take (fork) latest repository and add new files to it and create new pull request ?
About new full-SIMD minSAD search - unfortunately it still shows worst speed (in compare with ArrayofSADs and C-search of minsad). But I found the minSAD search is also some time-critical part of function so I try to made currently half-SIMD and half-C version. I commit it to separate branch and it need debug. So in the last commit to main branch these new functions removed.
Addition: shame on me - I finally found already existed 'levels' param on MAnalyse() so will try to see how it is need to set to get nLevelsMax to 1 and 2.

DTL2020 · 2021-10-21T10:13:00Z

I delete and fork my repository again to have new copy with a places where to put C-ref functions and corrected _avx2. So need to open new pull-request with new commited changes or it will adds to this ?

DJATOM · 2021-10-21T10:21:36Z

You can reset your commits in SourceTree as shown on the screenshot (or via git cli, but I don't remember exact command)

Then force-push to your fork
Then pull changes from main repo (via github interface)
Then add your new changes
I suggest to make new branches for pull requests, since it's not convenient to pullback your unaccepted commits every time.

DTL2020 · 2021-10-21T10:26:51Z

Oh - I see pinterf already merge my latest commit. The only comment (it looks it not works in web-interface of github ?) - line 803 of PlaneofBlocks_avx2.cpp -

mvtools/Sources/PlaneOfBlocks_avx2.cpp

Line 802 in 0361d4b

    
           ymm3_Ref_67 = _mm256_loadu2_m128i((__m128i*)(pucRef + nRefPitch[0] * (i + 7)), (__m128i*)(pucRef + nRefPitch[0] * (i + 6)));

It looks I or you forgot to add prefetch line:
_mm_prefetch(const_cast<const CHAR*>(reinterpret_cast<const CHAR*>(pucRef + nRefPitch[0] * (i + 8))), _MM_HINT_NTA); // prefetch next Ref row

(same as now in line 114 in sp4 function).

I have more ideas for optimize of Esa_avx2 search (more compacting storage of current and ref data for sad calculation and using of 2 _mm256_sad_epu8() instructions instead of 4 for 8x8 block and may be more attempts to use SIMD minsad instruction instead of 25-points C-loop search). Will try to make tests if it help to speed to the end of Saturday 23.10.2021.

pinterf · 2021-10-21T10:46:39Z

The merge was full manual, since the commits and changes involved different files and coding conventions.
Seems I missed that line. Now that big changes happen almost daily, I'm gonna leave a longer time until merge to have things tested and settle down.

DTL2020 · 2021-10-21T18:59:11Z

(Note: Until I upload an .editorconfig file (same folder as .gitignore), can you set your editors to use spaces (not tabs), indent size=2?) Or you can create your own with the following content, and Visual Studio will read and use it for the project:
root = true

# NOTE: editorconfig-core-c has a hardcoded limitation of 50 chars per .ini section name

[*.{cpp,h,hpp,c,asm}]
indent_style = space
indent_size = 2
trim_trailing_whitespace = true
end_of_line = crlf
insert_final_newline = true

[*.md]
indent_style = space
trim_trailing_whitespace = false

I load all new commits and this new file. But after I commit new edits to separate branch - DTL2020@f23f587 It strangely changes PlaneOfBlocks.cpp file with almost all old deletions and all new additions. Is it correct and as expected ? I use VS2017 at home.

DTL2020 · 2021-10-23T15:54:14Z

Line 206 of PlaneofBlocks.cpp - used and instead of &&

mvtools/Sources/PlaneOfBlocks.cpp

Line 206 in bbc2ac7

if (nBlkSizeX == 8 && nBlkSizeY == 8 and pixelsize == 1 && !chroma) {

It confuses Intel C++ 19.1 compiler. But looks like works with VS2019.

Also found the main issue of the new _avx2 search functions - they work correctly only with nPel=1. Because Pel=2 uses different x,y of vector to memory pointer of start address of each ref block conversion (not contiguous even in Horizontal x-direction). In old functions ExpandingSearch (and in c_ref for _avx2) it is done by calling of GetRefBlock() for each ref block in the search. And current _avx2 functions load all columns of ref plane expecting it is contigous in memory.

So need 1 more check for use. For nPel=2 it looks also possible to make separate set of functions but it need completely different 'gathering' of input data or re-design pel=2 plane storage to be contigous in memory (?).

pinterf · 2021-10-23T16:24:17Z

aaaaa, only noticed for fifth re-read. So yes, there is 'and' instead of && :)
Shame on me, I'm parallelly working on a Delphi project :)

pinterf · 2021-10-23T16:29:23Z

I load all new commits and this new file. But after I commit new edits to separate branch - DTL2020@f23f587 It strangely changes PlaneOfBlocks.cpp file with almost all old deletions and all new additions. Is it correct and as expected ? I use VS2017 at home.

Dunno. Line endings??? CRLF vs LF?

DTL2020 · 2021-10-23T16:55:18Z

I make new commit to main branch - DTL2020@0f3e9d7 . Now the sp2 going to be faster and also for nPel=1 added sp1 using latest method of calculating sad and searching for minimum. Sp4 still old and sp3 need to be added to have all sp 1,2,3,4 filled for nPel=1 at least.

It looks like this commit not listed here. Do I need to close this and open new pull-request ? Also it looks editor again change something in all lines of files and github list almost all lines as changed - do not know what happens.

pinterf · 2021-10-23T20:43:15Z

I'm gonna then cherry-pick again. I've still got your repo as remote so I can easily access your commits.

DTL2020 · 2021-10-24T10:16:51Z

I load all new commits and this new file. But after I commit new edits to separate branch - DTL2020@f23f587 It strangely changes PlaneOfBlocks.cpp file with almost all old deletions and all new additions. Is it correct and as expected ? I use VS2017 at home.

Dunno. Line endings??? CRLF vs LF?

As I see it mostly about line beginnings - almost every line is shifted (to the left or right). May be it re-format all document to
indent_style = space
indent_size = 2 ?

pinterf · 2021-10-24T20:08:29Z

I hope that was the last time when I needed to cherry-pick. Commited a lot, including (hopefully) all your changes, uniformizing spaces and line endings. It was easy when I commited alone but now there must be order.
I think you'll need to re-pull the repo.

DTL2020 · 2021-10-24T20:42:09Z

Yes - it looks most new is integrated.

Some known inconsistency between C-refs and _avx2 Esa search functions: If there are more 1 equal minsad value the C-ref return last found x,y pair and _avx2 looks like returns first (as the instruction _mm_minpos_epu16() internal logic). So if more equal result required - may be make C-ref reverse order scan (from botom right to top left ) ? Though it is rare enough situation and may be not any danger for MDegrainN because if 2 or more blocks have equal sad they possibly can be blended without distortions. But the some 'intelectual logic' may treat as different motion vectors. Also the old Expanding search may be make different result because of different scan order (from center of area to the edges) and _avx2 uses scan from top-left corner. So it also creates different result if 'workarea' contain >1 blocks with completely equal sad value. Also the Expanding Search from center uses 'cost of new minsad' compare at each search step (CheckMV()) and new _avx2 only at the end of full scan of area.

pinterf · 2021-10-25T09:03:29Z

Once I had a week investigating why different runs sometimes yield different output. (Debugging nightmare: we get a slightly different vectors for once in ten thousand frames, happening once in 10 runs)
Well, when avstp is used, some search 'exit' logic depends on which thread (which group of lines of blocks) reaches a specific threshold sooner. The result is not 100% deterministic and cannot be used as a ground truth when I'd compare changes in the code. (I we'd had such unit test it would definitely fail sometimes). Look at my // MAnalyze mt-inconsistency reason comments in PlaneOfBlocks.cpp

DTL2020 added 2 commits October 16, 2021 19:24

Added different predictors proc and new AVX2 SAD

2183927

Added different predictors proc and new AVX2 SAD for ExhaustiveSearch

Added new more AVX2 Esa search sp2 and sp4

c9d7636

Added new more AVX2 Esa search sp2 and sp4. One pass only without immediate array of SADs.

Added C-ref functions, fixed some bug, +

c12a904

Added C-ref functions (uses SAD() as old), fixed bug with ends of ArraySADs for sp2, added prefetch attempt.

Added different predictors proc and new AVX2 SAD #45

Are you sure you want to change the base?

Added different predictors proc and new AVX2 SAD #45

Uh oh!

Conversation

DTL2020 commented Oct 16, 2021

Uh oh!

pinterf commented Oct 19, 2021

Uh oh!

DTL2020 commented Oct 19, 2021 via email

Uh oh!

pinterf commented Oct 20, 2021

Uh oh!

pinterf commented Oct 20, 2021

Uh oh!

DTL2020 commented Oct 20, 2021 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

DTL2020 commented Oct 21, 2021

Uh oh!

DJATOM commented Oct 21, 2021

Uh oh!

DTL2020 commented Oct 21, 2021 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

pinterf commented Oct 21, 2021

Uh oh!

DTL2020 commented Oct 21, 2021

Uh oh!

DTL2020 commented Oct 23, 2021 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

pinterf commented Oct 23, 2021

Uh oh!

pinterf commented Oct 23, 2021

Uh oh!

DTL2020 commented Oct 23, 2021

Uh oh!

pinterf commented Oct 23, 2021

Uh oh!

DTL2020 commented Oct 24, 2021

Uh oh!

pinterf commented Oct 24, 2021

Uh oh!

DTL2020 commented Oct 24, 2021

Uh oh!

pinterf commented Oct 25, 2021

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

DTL2020 commented Oct 20, 2021 •

edited

Loading

DTL2020 commented Oct 21, 2021 •

edited

Loading

DTL2020 commented Oct 23, 2021 •

edited

Loading