[FEA] Add case-insensitive options and additional string predicates to GFQL #697

@lmeyerov

labels: enhancement, gfql, help wanted, good-first-issue

Is your feature request related to a problem? Please describe.

GFQL's string predicates have inconsistent support for case sensitivity and are missing some common string operations that users expect from a query language. Currently:

  • startswith() and endswith() are always case-sensitive with no option to change this
  • Several useful string predicates available in pandas are not exposed in GFQL
  • The documentation doesn't clearly indicate which predicates support case-insensitive matching

Describe the solution you'd like

  1. Add case parameter to startswith() and endswith()

    # Current (case-sensitive only)
    n({"name": startswith("John")})
    
    # Proposed (with optional case parameter)
    n({"name": startswith("john", case=False)})  # Would match "John", "JOHN", etc.
    n({"email": endswith(".COM", case=False)})   # Would match ".com", ".Com", etc.

  2. Add commonly needed string predicates

    # Length checking
    n({"name": len_eq(5)})         # Exactly 5 characters
    n({"name": len_gt(10)})        # More than 10 characters
    n({"name": len_between(3, 20)}) # Between 3 and 20 characters
    
    # Strip/trim operations  
    n({"name": strip_eq("John")})   # Strip whitespace before comparing
    
    # Not/inverse operations
    n({"name": not_contains("test")})  # Doesn't contain pattern
    n({"name": not_match("^[0-9]+$")}) # Doesn't match regex

  3. Consider adding fuzzy/similarity matching (stretch goal)

    n({"name": similar_to("Jon", threshold=0.8)})  # Fuzzy string matching
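
To make the intended semantics of the proposed predicates concrete, here is a minimal sketch of what each one could compute on a plain pandas Series. The free functions below are hypothetical stand-ins named after the proposal, not existing GFQL API. Inclusive bounds for `len_between`, regex matching by default for `not_contains`, and the use of `difflib` for `similar_to` are all assumptions. Missing values are left out of scope.

```python
import difflib

import pandas as pd


# Hypothetical helpers named after the proposed predicates.
# Each takes a Series of strings and returns a boolean mask.
def len_eq(s: pd.Series, n: int) -> pd.Series:
    return s.str.len() == n


def len_gt(s: pd.Series, n: int) -> pd.Series:
    return s.str.len() > n


def len_between(s: pd.Series, lo: int, hi: int) -> pd.Series:
    # Assumption: both bounds are inclusive.
    return s.str.len().between(lo, hi)


def strip_eq(s: pd.Series, value: str) -> pd.Series:
    # Strip surrounding whitespace before comparing.
    return s.str.strip() == value


def not_contains(s: pd.Series, pat: str, regex: bool = True) -> pd.Series:
    # Assumption: mirrors the pandas default of treating pat as a regex.
    return ~s.str.contains(pat, regex=regex)


def not_match(s: pd.Series, pat: str) -> pd.Series:
    # str.match anchors at the start of each string.
    return ~s.str.match(pat)


def similar_to(s: pd.Series, target: str, threshold: float = 0.8) -> pd.Series:
    # Assumption: similarity is difflib's SequenceMatcher ratio in [0, 1].
    ratios = s.map(lambda v: difflib.SequenceMatcher(None, target, v).ratio())
    return ratios >= threshold


names = pd.Series(["John", " John ", "12345", "test_user", "Alexandria"])

summary = pd.DataFrame({
    "name": names,
    "len_eq_5": len_eq(names, 5),
    "strip_eq_John": strip_eq(names, "John"),
    "not_contains_test": not_contains(names, "test"),
    "not_match_digits": not_match(names, "^[0-9]+$"),
    "similar_to_Jon": similar_to(names, "Jon", threshold=0.8),
})
print(summary)
```

A real implementation would wrap these in predicate classes rather than free functions, and the similarity metric for `similar_to` would need its own discussion since `difflib` has no cuDF equivalent.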

Describe alternatives you've considered

  1. Query strings - Users can use pandas query syntax, but this is less discoverable and not type-safe
  2. Custom predicates - Users can implement their own, but built-in support would be better
  3. Post-processing - Filter results after the query, but this is less efficient
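
For comparison, alternatives 1 and 3 can be approximated today on a plain DataFrame. This is only a sketch: the `nodes` table below is a made-up stand-in for a graph's node table, and how such a query string or mask would be wired into a GFQL call is not shown.

```python
import pandas as pd

# Stand-in for a node table (hypothetical data).
nodes = pd.DataFrame({
    "id": [0, 1, 2, 3],
    "name": ["John Smith", "JOHN DOE", "Joan Jett", "johnny"],
})

# Alternative 1: a pandas query string. engine="python" is passed explicitly
# so that the .str accessor calls do not depend on numexpr.
via_query = nodes.query('name.str.lower().str.startswith("john")', engine="python")

# Alternative 3: post-process with a boolean mask once the rows are available.
mask = nodes["name"].str.lower().str.startswith("john")
via_mask = nodes[mask]

print(via_query)
print(via_mask)
```

Both forms select the same rows, but the query string hides the predicate inside an opaque string, and the mask only runs after the full result has been computed.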

Additional context

  • The implementation should follow the existing pattern in graphistry/compute/predicates/str.py
  • All new predicates should inherit from ASTPredicate and delegate to pandas string methods
  • Tests should be added to ensure compatibility with both pandas and cuDF backends
  • Documentation should be updated in docs/source/gfql/spec/language.md

Implementation hints

For case-insensitive startswith/endswith, the implementation could use pandas' case conversion:

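    def startswith(self, pat, case=True, na=None):
        # Case-insensitive: lowercase both the column values and the pattern
        if not case:
            return self._series.str.lower().str.startswith(pat.lower(), na=na)
        # Case-sensitive (default): delegate directly to pandas
        return self._series.str.startswith(pat, na=na)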
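
The hint above can be tried out as a standalone sketch on a plain pandas Series. The function names `startswith_ci` and `endswith_ci` are hypothetical, and defaulting `na` to `False` (so missing values never match) is an assumption made here to keep the result a plain boolean mask. Only single string patterns are handled.

```python
import pandas as pd


def startswith_ci(series: pd.Series, pat: str, case: bool = True, na: bool = False) -> pd.Series:
    """Sketch of startswith with an optional case flag (hypothetical helper)."""
    if not case:
        # Lowercase both sides so the comparison ignores case.
        return series.str.lower().str.startswith(pat.lower(), na=na)
    return series.str.startswith(pat, na=na)


def endswith_ci(series: pd.Series, pat: str, case: bool = True, na: bool = False) -> pd.Series:
    """Sketch of endswith with an optional case flag (hypothetical helper)."""
    if not case:
        return series.str.lower().str.endswith(pat.lower(), na=na)
    return series.str.endswith(pat, na=na)


names = pd.Series(["John", "JOHN", "johnny", "Joan", None])
emails = pd.Series(["a@x.com", "b@y.COM", "c@z.org", None])

print(startswith_ci(names, "john", case=False))
print(endswith_ci(emails, ".COM", case=False))
```

One design point to settle during implementation: `str.lower()` is not full Unicode case folding, so if non-ASCII names matter, `str.casefold()` may be the better choice where the backend supports it.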

This would be a great first issue for someone familiar with Python and pandas who wants to contribute to GFQL!
