name: Feature request
about: Suggest an idea for this project
title: "[FEA] Add case-insensitive options and additional string predicates to GFQL"
labels: enhancement, gfql, help wanted, good-first-issue
assignees: ''
Is your feature request related to a problem? Please describe.
GFQL's string predicates have inconsistent support for case sensitivity and are missing some common string operations that users expect from a query language. Currently:
- startswith() and endswith() are always case-sensitive, with no option to change this
- Several useful string predicates available in pandas are not exposed in GFQL
- The documentation doesn't clearly indicate which predicates support case-insensitive matching
Describe the solution you'd like
- Add case parameter to startswith() and endswith()

    # Current (case-sensitive only)
    n({"name": startswith("John")})

    # Proposed (with optional case parameter)
    n({"name": startswith("john", case=False)})  # Would match "John", "JOHN", etc.
    n({"email": endswith(".COM", case=False)})   # Would match ".com", ".Com", etc.
- Add commonly needed string predicates

    # Length checking
    n({"name": len_eq(5)})           # Exactly 5 characters
    n({"name": len_gt(10)})          # More than 10 characters
    n({"name": len_between(3, 20)})  # Between 3 and 20 characters

    # Strip/trim operations
    n({"name": strip_eq("John")})    # Strip whitespace before comparing

    # Not/inverse operations
    n({"name": not_contains("test")})    # Doesn't contain pattern
    n({"name": not_match("^[0-9]+$")})   # Doesn't match regex
- Consider adding fuzzy/similarity matching (stretch goal)

    n({"name": similar_to("Jon", threshold=0.8)})  # Fuzzy string matching
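To make the stretch goal more concrete, here is a rough sketch of what a similarity threshold could mean. The proposal does not name a metric, so this uses the standard library's `difflib.SequenceMatcher` purely as a stand-in. The helper names `similarity` and `similar_to`, the case-insensitive comparison, and the treatment of missing values as non-matches are all assumptions for illustration, not existing GFQL behavior.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Similarity ratio in [0, 1], compared case-insensitively (an assumption)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def similar_to(values, target, threshold=0.8):
    """Hypothetical helper: True where a value is at least `threshold` similar.

    Missing values (None) are treated as non-matches.
    """
    return [v is not None and similarity(v, target) >= threshold for v in values]


names = ["Jon", "John", "Joan", "Jane", None]
matches = similar_to(names, "Jon", threshold=0.8)
print(list(zip(names, matches)))
```

A real implementation would likely need a vectorized metric that works on both pandas and cuDF, which is part of why this is flagged as a stretch goal.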
Describe alternatives you've considered
- Query strings - Users can use pandas query syntax, but this is less discoverable and not type-safe
- Custom predicates - Users can implement their own, but built-in support would be better
- Post-processing - Filter results after the query, but this is less efficient
Additional context
- The implementation should follow the existing pattern in graphistry/compute/predicates/str.py
- All new predicates should inherit from ASTPredicate and delegate to pandas string methods
- Tests should be added to ensure compatibility with both pandas and cuDF backends
- Documentation should be updated in docs/source/gfql/spec/language.md
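For reference, the delegation to pandas string methods mentioned above might look roughly like this for the proposed length, strip, and negated predicates. This is a sketch on a plain pandas Series, not the ASTPredicate wiring. The variable names are illustrative, and how missing values should behave is an open question that the final design would need to settle.

```python
import pandas as pd

names = pd.Series(["Alice", " John ", "test_user", "12345", None], dtype="object")

# len_eq(5): exactly 5 characters; missing values compare as False
len_eq_5 = names.str.len() == 5

# len_between(3, 20): inclusive on both ends, as Series.between is by default
len_between_3_20 = names.str.len().between(3, 20)

# strip_eq("John"): strip surrounding whitespace before comparing
strip_eq_john = names.str.strip() == "John"

# not_contains("test"): literal (non-regex) match, then invert.
# Assumption: with na=False, a missing value counts as "does not contain".
not_contains_test = ~names.str.contains("test", regex=False, na=False)

# not_match("^[0-9]+$"): invert a regex match anchored at the start
not_match_digits = ~names.str.match(r"^[0-9]+$", na=False)

print(pd.DataFrame({
    "name": names,
    "len_eq_5": len_eq_5,
    "len_between_3_20": len_between_3_20,
    "strip_eq_john": strip_eq_john,
    "not_contains_test": not_contains_test,
    "not_match_digits": not_match_digits,
}))
```

Note that the negated forms return True for the missing value here, while the positive forms return False. Whether that asymmetry is desirable is worth deciding explicitly, and covering with tests on both backends.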
Implementation hints
For case-insensitive startswith/endswith, the implementation could use pandas' case conversion:
    def startswith(self, pat, case=True, na=None):
        if not case:
            # Lower-case both the values and the pattern before comparing
            return self._series.str.lower().str.startswith(pat.lower(), na=na)
        return self._series.str.startswith(pat, na=na)
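To try the idea outside of GFQL, the same logic can be written as free functions over a pandas Series. This is only a standalone sketch: the function names mirror the proposal, but the signatures (taking the Series as the first argument) are an assumption for demonstration and not the actual predicate API.

```python
import pandas as pd


def startswith(series, pat, case=True, na=None):
    """Sketch: prefix match with optional case-insensitivity."""
    if not case:
        return series.str.lower().str.startswith(pat.lower(), na=na)
    return series.str.startswith(pat, na=na)


def endswith(series, pat, case=True, na=None):
    """Sketch: suffix match with optional case-insensitivity."""
    if not case:
        return series.str.lower().str.endswith(pat.lower(), na=na)
    return series.str.endswith(pat, na=na)


names = pd.Series(["John", "JOHN", "johnny", "Joan", None], dtype="object")
emails = pd.Series(["a@x.com", "b@x.COM", "c@x.org"], dtype="object")

insensitive = startswith(names, "john", case=False, na=False)
sensitive = startswith(names, "john", na=False)
dot_com = endswith(emails, ".COM", case=False, na=False)

print(insensitive.tolist())
print(sensitive.tolist())
print(dot_com.tolist())
```

One design point to consider: `str.lower()` does not cover every Unicode case-folding rule, so `str.casefold()` may be a better fit if non-ASCII text matters, provided the cuDF backend supports an equivalent.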
This would be a great first issue for someone familiar with Python and pandas who wants to contribute to GFQL!