Skip to content

V5 optimize setof function performance ? #2724

@roytan883

Description

@roytan883

Feature description

As mentioned here: custom-queries
LIMIT/OFFSET pagination has performance issue when over 100,000 records.

In V4, we implement our custom setof function in such a complicated way.

I hope the V5 can provide some kind of framework to create optimized setof function.

Motivating example

Graphile is very powerful, and performance is very fast.
But the setof function is some kind of hidden bomb.
In our recent project, development stage is good.
But after deploy to production, import 1,000,000 real data records.
The setof function become very slow.

we have tested it, OFFSET 900,000 records limit 10, need over 15s. It is very slow and unacceptable.
to solve it, currently, we implement our custom-queries by a very complicated way.

here is out V4 solution:

Our V4 custom-queries

As the doc says: One way to solve this is to have your function apply its own internal limits and filters which can be exposed as GraphQL field arguments - if you reduce the amount of data that the function can produce (e.g. to 100 rows) then it reduces the potential cost of having this function in your schema.

The primary goal is keeping the setof function input/output no breaking change.
we implement our custom-queries in such a complicated way:

1, pass args to setof function (as graphile setof function: first, offset, last, after, before)

CREATE OR REPLACE FUNCTION test.fn_test_setof_optimize_aaa(_filters jsonb[] DEFAULT NULL,
                                                           _orders jsonb[] DEFAULT NULL,
                                                           first int DEFAULT NULL,
                                                           "offset" int DEFAULT NULL,
                                                           last int DEFAULT NULL,
                                                           after text DEFAULT NULL,
                                                           before text DEFAULT NULL)
RETURNS test.test_aaa_page_connection -- return type is custom, not setof

2, add hook pluin to make gql Cursor type for after and before

because text type can't be recognize as GQL Cursor type, need a pluin to do this:

module.exports = makeProcessSchemaPlugin(schema => {
  const stringField = schema._typeMap.String
  const fileds = schema._typeMap.Query._fields
  const fnNames = Object.keys(schema._typeMap.Query._fields)
  for (let fnName of fnNames) {
    if (fnName.startsWith('fn')) {
      const item = fileds[fnName]
      const args = item.args
      if (Array.isArray(args)) {
        for (let arg of args) {
          const type = arg.type
          if (['after', 'before'].includes(arg.name)
            && typeof type === 'object' && type !== null
            && typeof stringField === 'object' && stringField !== null
            && type.constructor === stringField.constructor) {
            arg.type = schema._typeMap.Cursor
          }
        }
      }
    }
  }
  return schema;
});

3, add setof context at pg_setting

const gql = require('graphql-tag')

pgSettings: async (req) => {

let qlParsed = gql`${gqlString}`

// recursive function
setContextOpt(gqlParsed, 'definitions[0].selectionSet.selections', [])

// ...
const fieldsToCheck = ['totalCount', 'nodes', 'edges', 'pageInfo']

//...

subSelections.forEach(subSelection => {
  const subSelectionName = _.get(subSelection, 'name.value', '')
  if (fieldsToCheck.includes(subSelectionName)) {
	ret[`opt.setof.${underscoreName}.has${subSelectionName.charAt(0).toUpperCase() + subSelectionName.slice(1)}`] = true
  }
})

//...

// pgSettings output like:
// opt.setof.fn_test_setof_optimize_aaa.hasTotalCount: true
// opt.setof.fn_test_setof_optimize_aaa.hasNodes: true

4, create setof custom return type (actually, this should be step 1)

CREATE TYPE test.type_test_aaa_edge AS (
    cursor varchar,
    node   test.test_aaa
);

CREATE TYPE test.test_aaa_page_connection AS (
    nodes       test.test_aaa[],
    edges       test.type_test_aaa_edge[],
    total_count INT,
    page_info   basic.type_basic_page_info
);

5, finally, optimize the custom function

IF COALESCE(CURRENT_SETTING('opt.setof.' || 'fn_test_setof_optimize_aaa' || '.hasTotalCount', TRUE), '') = 'true' THEN
	-- ...
	-- set total_count in test.test_aaa_page_connection
END IF;

IF COALESCE(CURRENT_SETTING('opt.setof.' || 'fn_test_setof_optimize_aaa' || '.hasNodes', TRUE), '') = 'true'
 OR COALESCE(CURRENT_SETTING('opt.setof.' || 'fn_test_setof_optimize_aaa' || '.hasEdges', TRUE), '') = 'true'
 OR COALESCE(CURRENT_SETTING('opt.setof.' || 'fn_test_setof_optimize_aaa' || '.hasPageInfo', TRUE), '') = 'true'
 THEN
	-- ...
	-- use offset and first, optimize use index, two steps query, only return need records, like 10 records
	-- use after and first, fast cursor location query, best performance
	-- create own cursor text for each edge
	-- get one more to check hasNextPage
END IF;

Also, need other optimize _orders and _filter. and _uniq_key to create cursor. and Multicolumn Indexes for performance.

After those optimization, we success reduce the function time from 15s to 50~500ms. And client no need change any code.

suggest V5 solution

No need step 1~4, just use smart tag, only need implement step 5 pg function.
like this smart tags:

CREATE OR REPLACE FUNCTION test.fn_test_aa_bb_cc(...)
RETURNS jsonb -- include: total_count, nodes, edges, page_info

COMMENT ON FUNCTION test.fn_test_aa_bb_cc  IS
    E'@setofFnOpt\n@setofFnOptOutType test.test_table_kkk';

one more question

is it possible implement it by smart tag using plugin do all stuff in V4? currently, this solution use hook plugin, pgSetting function, pg custom type, seems not elegant.

Metadata

Metadata

Assignees

No one assigned

    Type

    No type

    Projects

    Status

    🌳 Triage

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions