Commit 9b50d8d

anuunchin and rudolfix authored
Minor adjsutments to cursor rules (#630)
* Minor adjsutments
* fix typing

---------

Co-authored-by: Marcin Rudolf <rudolfix@rudolfix.org>
1 parent 3269931 commit 9b50d8d

9 files changed: +26 -32 lines changed


ai/cursor/.cursor/rules/build-rest-api.mdc

Lines changed: 9 additions & 8 deletions
@@ -1,31 +1,32 @@
 ---
-description:
+description: Crucial guidelines to build a dlt rest api source
 globs:
 alwaysApply: true
 ---
 ## Prerequisities to writing a source
 
-1. VERY IMPORTANT. When writing new source, you should have example available in rest_api_pipeline.py file. Use github rest api source for the general structure of the code. If you do not see this file, ask user to add it
-2. Recall OpenAPI spec. You will need to figure out the same information that OpenAPI spec contains on each API.
+1. VERY IMPORTANT. When writing a new source, you should have an example available in the rest_api_pipeline.py file.
+Use this example or the github rest api source example from dlt's documentation on rest api for the general structure of the code. If you do not see this file rest_api_pipeline.py, ask the user to add it
+2. Recall OpenAPI spec. You will figure out the same information that the OpenAPI spec contains for each API.
 3. In particular:
 - API base url
 - type of authentication
 - list of endpoints with method GET (you can read data for those)
-4. You will need to figure additional information that is required for successful data extraction
+4. You will figure out additional information that is required for successful data extraction
 - type of pagination
 - if data from an endpoint can be loaded incrementally
 - unwrapping end user data from a response
 - write disposition of the endpoint: append, replace, merge
 - in case of merge, you need to find primary key that can be compound
-5. Some endpoints take data from other endpoints. For example `comments` endpoint needs `post id` to get list of comments per particular post. You'll need to figure out such connections
-6. **ASK USER IF YOU MISS CRUCIAL INFORMATION** You should make sure user provided you with enough information to figure out the above. Below are the most common possibilities
+5. Some endpoints take data from other endpoints. For example, in the github rest api source example from dlt's documentation, the `comments` endpoint needs `post id` to get the list of comments per particular post. You'll need to figure out such connections
+6. **ASK USER IF YOU MISS CRUCIAL INFORMATION** You will make sure the user has provided you with enough information to figure out the above. Below are the most common possibilities
 - open api spec (file or link)
 - any other api definition, for example Airbyte low code yaml
 - a source code in Python, java or c# of such connector or API client
 - a documentation of the api or endpoint
 7. In case you find more than 10 endpoints and you do not get instructions which you should add to the source, ask user.
-8. please make sure you use right pagination and you use exactly the arguments that are available in pagination guide. do not try to guess anything. remember that we have many paginator types that are configured differently
-9. When creating pipeline instance add progress="log" as parameter
+8. Make sure you use the right pagination and use exactly the arguments that are available in the pagination guide. do not try to guess anything. remember that we have many paginator types that are configured differently
+9. When creating pipeline instance add progress="log" as parameter `pipeline = dlt.pipeline(..., progress="log")`
 10. When fixing a bug report focus only on a single cause. ie. incremental, pagination or authentication or wrong dict fields
 11. You should have references for paginator types, authenticator types and general reference for rest api in you context. **DO NOT GUESS. DO NOT INVENT CODE. YOU SHOULD HAVE DOCUMENTATION FOR EVERYTHING YOU NEED. IF NOT - ASK USER**

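For orientation, the checklist in this rule (base url, authentication, pagination, data unwrapping, write disposition, primary key, parent/child endpoints) maps roughly onto a declarative config for dlt's rest api source. The sketch below is only an illustration: the API URL, resource names, token placeholder, and the choice of a `header_link` paginator are all hypothetical and are not taken from this commit; the exact keys and paginator arguments should be checked against the dlt rest api and pagination references, as the rule itself demands.

```python
# Illustrative sketch only: a plain dict in the general shape of a dlt
# rest api source config. Every URL, name, and value here is hypothetical.
config = {
    "client": {
        "base_url": "https://api.example.com/v1/",       # API base url
        "auth": {"type": "bearer", "token": "<token>"},  # type of authentication
        "paginator": {"type": "header_link"},            # type of pagination (assumed)
    },
    "resources": [
        {
            "name": "posts",
            # data_selector unwraps end user data from the response
            "endpoint": {"path": "posts", "data_selector": "data"},
            "write_disposition": "merge",  # merge needs a primary key
            "primary_key": "id",
        },
        {
            "name": "comments",
            "endpoint": {
                # child endpoint: takes `post_id` from the parent `posts` resource
                "path": "posts/{post_id}/comments",
                "params": {
                    "post_id": {"type": "resolve", "resource": "posts", "field": "id"},
                },
            },
            "write_disposition": "append",
        },
    ],
}

# Names of the endpoints that would be read with GET.
resource_names = [resource["name"] for resource in config["resources"]]
```

A dict like this would typically be handed to the rest api source and run with a pipeline created as in rule 9, `dlt.pipeline(..., progress="log")`.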
ai/cursor/.cursor/rules/dlt.mdc

Lines changed: 1 addition & 1 deletion
@@ -1,5 +1,5 @@
 ---
-description:
+description: Information about dlt
 globs:
 alwaysApply: true
 ---

ai/cursor/.cursor/rules/rest_api_extract_parameters.mdc

Lines changed: 1 addition & 1 deletion
@@ -1,7 +1,7 @@
 ---
 description: This rule helps identify and extract ALL necessary parameters from API documentation to build a dlt REST API source
 globs:
-alwaysApply: false
+alwaysApply: true
 ---
 # REST API Parameter Extraction Guide

ai/cursor/.cursor/rules/rest_api_pagination.mdc

Lines changed: 1 addition & 1 deletion
@@ -1,7 +1,7 @@
 ---
 description: Use this rule when writing REST API Source to configure right pagination type for an Endpoint
 globs:
-alwaysApply: false
+alwaysApply: true
 ---
 
 # dlt REST API Pagination Configuration Guide

sources/hubspot/__init__.py

Lines changed: 3 additions & 3 deletions
@@ -134,7 +134,7 @@ def crm_objects(
 
 
 def crm_object_history(
-    object_type: THubspotObjectType,
+    object_type: str,
     api_key: str,
     props: Optional[Sequence[str]] = None,
     include_custom_props: bool = True,
@@ -143,7 +143,7 @@ def crm_object_history(
     Fetch the history of property changes for a given CRM object type.
 
     Args:
-        object_type (THubspotObjectType): Type of HubSpot object (e.g., 'company', 'contact').
+        object_type (str): Type of HubSpot object (e.g., 'company', 'contact').
         api_key (str, optional): API key for HubSpot authentication.
         props (Optional[Sequence[str]], optional): List of properties to retrieve. Defaults to None.
         include_custom_props (bool, optional): Include custom properties in the result. Defaults to True.
@@ -356,7 +356,7 @@ def pipelines_for_objects(
         Iterator[DltResource]: dlt resources for pipelines and stages.
     """
 
-    def get_pipelines(object_type: THubspotObjectType) -> Iterator[TDataItems]:
+    def get_pipelines(object_type: str) -> Iterator[TDataItems]:
         yield from fetch_data(
             CRM_PIPELINES_ENDPOINT.format(objectType=object_type),
             api_key=api_key_inner,

sources/jira/__init__.py

Lines changed: 1 addition & 1 deletion
@@ -33,7 +33,7 @@ def jira(
         res_function = dlt.resource(
             get_paginated_data, name=endpoint_name, write_disposition="replace"
         )(
-            **endpoint_parameters,
+            **endpoint_parameters,  # type: ignore[arg-type]
             subdomain=subdomain,
             email=email,
             api_token=api_token,

sources/pipedrive/__init__.py

Lines changed: 2 additions & 1 deletion
@@ -94,7 +94,8 @@ def pipedrive_source(
         name="deals_flow", write_disposition="merge", primary_key="id"
     )(_get_deals_flow)(pipedrive_api_key)
 
-    yield leads(pipedrive_api_key, update_time=since_timestamp)
+    # if simple value is passed in place of incremental, it will be used as initial value
+    yield leads(pipedrive_api_key, update_time=since_timestamp)  # type: ignore[arg-type]
 
 
 def _get_deals_flow(

sources/unstructured_data/__init__.py

Lines changed: 4 additions & 2 deletions
@@ -2,7 +2,7 @@
 
 import asyncio
 import os
-from typing import Dict
+from typing import Dict, Optional
 
 import dlt
 from dlt.common import logger
@@ -18,7 +18,7 @@
 
 @dlt.resource
 def unstructured_to_structured_resource(
-    queries: Dict[str, str] = INVOICE_QUERIES,
+    queries: Optional[Dict[str, str]] = dlt.config.value,
     openai_api_key: str = dlt.secrets.value,
     vectorstore: str = "chroma",
     table_name: str = "unstructured_to_structured_resource",
@@ -43,6 +43,8 @@ def unstructured_to_structured_resource(
     """
     if openai_api_key:
         os.environ["OPENAI_API_KEY"] = openai_api_key
+    if queries is None:
+        queries = dict(INVOICE_QUERIES)
 
     return dlt.transformer(
         convert_data,

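The `queries` change above follows a common Python pattern: accept `None` instead of using a shared module-level dict as the default, then fall back to a copy inside the function. The sketch below shows that pattern in isolation. It uses a plain `None` default rather than `dlt.config.value`, and the contents of `INVOICE_QUERIES` and the helper name `resolve_queries` are hypothetical stand-ins, not code from this repository.

```python
from typing import Dict, Optional

# Hypothetical stand-in for the module-level INVOICE_QUERIES constant.
INVOICE_QUERIES: Dict[str, str] = {
    "invoice_number": "What is the invoice number?",
    "total": "What is the total amount?",
}


def resolve_queries(queries: Optional[Dict[str, str]] = None) -> Dict[str, str]:
    """Return the caller's queries, or a fresh copy of the defaults."""
    if queries is None:
        # dict(...) makes a shallow copy, so callers cannot mutate the constant.
        queries = dict(INVOICE_QUERIES)
    return queries


defaults = resolve_queries()
defaults["due_date"] = "When is the invoice due?"  # only the copy changes
custom = resolve_queries({"vendor": "Who issued the invoice?"})
```

Copying the defaults means a caller that edits the returned dict does not alter what later calls see.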
uv.lock

Lines changed: 4 additions & 14 deletions
Some generated files are not rendered by default.
