
Commit d802779

Continue rules (#637)
* Initial commit with rules
* IDE rules in separate folders
* Message file copied, contributing.md updated
1 parent 9b50d8d commit d802779


54 files changed: +6703 −24 lines
Lines changed: 58 additions & 0 deletions
@@ -0,0 +1,58 @@
name: test transpiled rules

on:
  pull_request:
    branches:
      - master
  workflow_dispatch:

jobs:

  test_transpiled_rules:
    name: Check that transpiled rules are up to date
    runs-on: ubuntu-latest

    steps:

      - name: Check out
        uses: actions/checkout@v4

      - name: Install uv
        uses: astral-sh/setup-uv@v4
        with:
          python-version: "3.9"

      - name: Install dlt
        run: uv run pip install dlt

      - name: Setup Node.js
        uses: actions/setup-node@v4
        with:
          node-version: '18'

      - name: Install rules-cli
        run: npm install -g rules-cli

      - name: Run transpile-rules
        run: make transpile-rules

      - name: Check for changes
        run: |
          # Add any new files that might have been generated
          git add -A

          # Check if there are any differences
          if ! git diff --staged --quiet; then
            echo "❌ Transpiled rules are out of date!"
            echo ""
            echo "The following files have changes:"
            git diff --staged --name-only
            echo ""
            echo "Please run 'make transpile-rules' locally and commit the changes."
            exit 1
          else
            echo "✅ Transpiled rules are up to date!"
          fi

      - name: Run dlt ai command test
        run: uv run pytest tests/test_dlt_ai.py

CONTRIBUTING.md

Lines changed: 13 additions & 3 deletions
@@ -100,9 +100,19 @@ your source will be distributed to other users once it is accepted into our repo
## Walkthrough: Modify or add rules files for LLM-enabled IDEs
In this section, you will learn how to contribute rules files.
-1. Follow the [coding prerequisites](#coding-prerequisites) to setup the repository
-2. On your branch, add or modify rules files under the `/ai` directory
-3. Verify that the rules are properly formatted and work with the target IDE.
+
+### 1. How the `ai/` directory works
+
+1. The `.rules` folder is the master folder from which the files are rendered and copied into the respective IDE folders with the appropriate structure and format.
+2. Rendering and copying of the files are handled by the `make transpile-rules` command.
+3. The `make transpile-rules` command relies on Continue's [rules CLI](https://github.com/continuedev/rules).
+
+### 2. How to modify or add rules
+
+1. Follow the [coding prerequisites](#coding-prerequisites) to set up the repository.
+2. On your branch, add or modify rules files under the `/ai/.rules/` directory.
+3. Verify that the rules are properly formatted.
+4. Run `make transpile-rules` to update the respective IDE folders.
4. Proceed to the pull request section to [create a pull request to the main repo](#making-a-pull-request-to-the-main-repo-from-fork). Please explain for what use cases these rules are useful and share what IDE version you're using.

## Coding Prerequisites

Makefile

Lines changed: 12 additions & 0 deletions
@@ -32,6 +32,18 @@ format:
format-lint: format lint

transpile-rules:
	cd ai && \
	uv run rules render claude && mkdir -p claude && mv CLAUDE.md claude/ && cp .rules/.message claude/ && \
	uv run rules render amp && mkdir -p amp && mv AGENT.md amp/ && cp .rules/.message amp/ && \
	uv run rules render codex && mkdir -p codex && mv AGENT.md codex/ && cp .rules/.message codex/ && \
	uv run rules render cody && mkdir -p cody && cp -r .sourcegraph cody/ && rm -rf .sourcegraph && cp .rules/.message cody/ && \
	uv run rules render cline && mkdir -p cline && cp -r .clinerules cline/ && rm -rf .clinerules && cp .rules/.message cline/ && \
	uv run rules render cursor && mkdir -p cursor && cp -r .cursor cursor/ && rm -rf .cursor && cp .rules/.message cursor/ && \
	uv run rules render continue && mkdir -p continue && cp -r .continue continue/ && rm -rf .continue && cp .rules/.message continue/ && \
	uv run rules render windsurf && mkdir -p windsurf && cp -r .windsurf windsurf/ && rm -rf .windsurf && cp .rules/.message windsurf/ && \
	uv run rules render copilot && mkdir -p copilot && cp -r .github copilot/ && rm -rf .github && cp .rules/.message copilot/

test:
	uv run pytest tests

ai/.rules/.message

Lines changed: 3 additions & 0 deletions
@@ -0,0 +1,3 @@
This command and rule set are a work in progress. Currently we provide rules specific to one workflow: creating REST API sources and pipelines from legacy code, OpenAPI specs, and REST API documentation, but also from scratch.

ai/.rules/build-rest-api.md

Lines changed: 256 additions & 0 deletions
@@ -0,0 +1,256 @@
---
globs:
description: Crucial guidelines to build a dlt rest api source
alwaysApply: true
---
## Prerequisites to writing a source

1. VERY IMPORTANT. When writing a new source, you should have an example available in the rest_api_pipeline.py file.
Use this example or the github rest api source example from dlt's documentation on rest api for the general structure of the code. If you do not see the file rest_api_pipeline.py, ask the user to add it.
2. Recall the OpenAPI spec. You will figure out the same information that the OpenAPI spec contains for each API.
3. In particular:
   - API base url
   - type of authentication
   - list of endpoints with method GET (you can read data for those)
4. You will figure out additional information that is required for successful data extraction:
   - type of pagination
   - if data from an endpoint can be loaded incrementally
   - unwrapping end user data from a response
   - write disposition of the endpoint: append, replace, merge
   - in case of merge, you need to find a primary key, which can be compound
5. Some endpoints take data from other endpoints. For example, in the github rest api source example from dlt's documentation, the `comments` endpoint needs a `post id` to get the list of comments per particular post. You'll need to figure out such connections.
6. **ASK THE USER IF YOU ARE MISSING CRUCIAL INFORMATION.** Make sure the user has provided you with enough information to figure out the above. Below are the most common possibilities:
   - an OpenAPI spec (file or link)
   - any other API definition, for example an Airbyte low-code YAML
   - the source code in Python, Java, or C# of such a connector or API client
   - documentation of the API or endpoint
7. If you find more than 10 endpoints and you do not get instructions on which ones to add to the source, ask the user.
8. Make sure you use the right pagination and use exactly the arguments that are available in the pagination guide. Do not try to guess anything. Remember that there are many paginator types that are configured differently.
9. When creating the pipeline instance, add progress="log" as a parameter: `pipeline = dlt.pipeline(..., progress="log")` (see the sketch after this list).
10. When fixing a bug report, focus only on a single cause, i.e. incremental loading, pagination, authentication, or wrong dict fields.
11. You should have references for paginator types, authenticator types, and the general REST API reference in your context. **DO NOT GUESS. DO NOT INVENT CODE. YOU SHOULD HAVE DOCUMENTATION FOR EVERYTHING YOU NEED. IF NOT - ASK THE USER.**
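For prerequisite 9, a minimal sketch of creating the pipeline instance with progress logging enabled (the pipeline, destination, and dataset names below are illustrative, not part of the rule):

```python
import dlt

# Illustrative names; the point of this sketch is only progress="log",
# which logs extraction and load progress while the pipeline runs.
pipeline = dlt.pipeline(
    pipeline_name="rest_api_example",
    destination="duckdb",
    dataset_name="rest_api_data",
    progress="log",
)
```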
## Look for Required Client Settings
When scanning docs or legacy code, first extract the API-level configuration, including:

Base URL:
• The API's base URL (e.g. "https://api.pipedrive.com/").

Authentication:
• The type of authentication used (commonly "api_key" or "bearer").
• The name/key (e.g. "api_token") and its placement (usually in the query).
• Use secrets (e.g. dlt.secrets["api_token"]) to keep credentials secure.

Headers (optional):
• Check if any custom headers are required.
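As a minimal sketch of the client-level settings above (the base URL, secret name, and header are placeholders, not a real API):

```python
import dlt

# Placeholder client block mirroring the checklist above: base URL, auth, optional headers.
client = {
    "base_url": "https://api.example.com/v1/",
    "auth": {
        "type": "api_key",
        "name": "api_token",
        "api_key": dlt.secrets["api_token"],  # keep the credential in secrets, not in code
        "location": "query",
    },
    "headers": {
        "Accept": "application/json",  # only if the API requires custom headers
    },
}
```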
## Authentication Methods
Configure the appropriate authentication method:

API Key Authentication:
```python
"auth": {
    "type": "api_key",
    "name": "api_key",
    "api_key": dlt.secrets["api_key"],
    "location": "query"  # or "header"
}
```

Bearer Token Authentication:
```python
"auth": {
    "type": "bearer",
    "token": dlt.secrets["bearer_token"]
}
```

Basic Authentication:
```python
"auth": {
    "type": "basic",
    "username": dlt.secrets["username"],
    "password": dlt.secrets["password"]
}
```

OAuth2 Authentication:
```python
"auth": {
    "type": "oauth2",
    "token_url": "https://auth.example.com/oauth/token",
    "client_id": dlt.secrets["client_id"],
    "client_secret": dlt.secrets["client_secret"],
    "scopes": ["read", "write"]
}
```
## Find right pagination type
These are the available paginator types to be used in the `paginator` field of `endpoint`:

* `json_link`: The link to the next page is in the body (JSON) of the response
* `header_link`: The links to the next page are in the response headers
* `offset`: The pagination is based on an offset parameter, with the total items count either in the response body or explicitly provided
* `page_number`: The pagination is based on a page number parameter, with the total pages count either in the response body or explicitly provided
* `cursor`: The pagination is based on a cursor parameter, with the value of the cursor in the response body (JSON)
* `single_page`: The response will be interpreted as a single-page response, ignoring possible pagination metadata
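For example, an endpoint whose responses carry the next-page URL in the JSON body could use the `json_link` paginator roughly like this (the `path` and `next_url_path` values are placeholders for whatever the actual API returns):

```python
# Sketch of an endpoint-level paginator configuration; the JSONPath is a placeholder.
endpoint = {
    "path": "posts",
    "paginator": {
        "type": "json_link",             # next-page link lives in the response body
        "next_url_path": "paging.next",  # JSONPath to that link
    },
}
```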
## Different Paginations per Endpoint are possible
When analyzing the API documentation, carefully check for multiple pagination strategies:

• Different Endpoint Types:
  - Some endpoints might use cursor-based pagination
  - Others might use offset-based pagination
  - Some might use page-based pagination
  - Some might use link-based pagination

• Documentation Analysis:
  - Look for sections describing different pagination methods
  - Check if certain endpoints have special pagination requirements
  - Verify if pagination parameters differ between endpoints
  - Look for examples showing different pagination patterns

• Implementation Strategy:
  - Configure pagination at the endpoint level rather than globally (see the sketch after this list)
  - Use the appropriate paginator type for each endpoint
  - Document which endpoints use which pagination strategy
  - Test pagination separately for each endpoint type
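As a rough sketch of endpoint-level pagination with two different strategies in one source (resource names, paths, and response field names are made up for illustration):

```python
# Two resources, each with its own paginator; all names and JSONPaths are illustrative.
resources = [
    {
        "name": "products",
        "endpoint": {
            "path": "products",
            "paginator": {
                "type": "page_number",        # page-based pagination
                "base_page": 1,
                "total_path": "total_pages",  # where the response reports the page count
            },
        },
    },
    {
        "name": "orders",
        "endpoint": {
            "path": "orders",
            "paginator": {
                "type": "cursor",                   # cursor returned in the response body
                "cursor_path": "meta.next_cursor",  # JSONPath to the cursor value
                "cursor_param": "cursor",           # query parameter that sends it back
            },
        },
    },
]
```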
## Select the right data from the response
In each endpoint the interesting data (typically an array of objects) may be wrapped
differently. You can unwrap this data by using `data_selector`.

Data Selection Patterns:
```python
"endpoint": {
    "data_selector": "data.items.*",              # Basic array selection
    "data_selector": "data.*.items",              # Nested array selection
    "data_selector": "data.{id,name,created_at}", # Field selection
}
```
## Resource Defaults & Endpoint Details
135+
Ensure that the default settings applied across all resources are clearly delineated:
136+
137+
Defaults:
138+
• Specify the default primary key (e.g., "id").
139+
• Define the write disposition (e.g., "merge").
140+
• Include common endpoint parameters (for example, a default limit value like 50).
141+
142+
Resource-Specific Configurations:
143+
• For each resource, extract the endpoint path, method, and any additional query parameters.
144+
• If incremental loading is supported, include the minimal incremental configuration (using fields like "start_param", "cursor_path", and "initial_value"), but try to keep it within the REST API config portion.
145+
146+
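A minimal sketch of such a defaults block (the values mirror the end-to-end example further below):

```python
# Defaults applied to every resource unless a resource overrides them.
resource_defaults = {
    "primary_key": "id",           # default primary key
    "write_disposition": "merge",  # default write mode
    "endpoint": {
        "params": {"limit": 50},   # common default query parameter
    },
}
```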
## Incremental Loading Configuration
Configure incremental loading for efficient data extraction. Your task is to get only new data from
the endpoint.

Typically you will identify a query parameter that allows you to get items that are newer than a certain date:

```py
{
    "path": "posts",
    "data_selector": "results",
    "params": {
        "created_since": "{incremental.start_value}",  # Uses cursor value in query parameter
    },
    "incremental": {
        "cursor_path": "created_at",
        "initial_value": "2024-01-25T00:00:00Z",
    },
}
```
## End to end example
Below is an annotated template that illustrates how your output should look. Use it as a reference to guide your extraction:

```python
import dlt
from dlt.sources.rest_api import rest_api_source

# Build the REST API config
source = rest_api_source({
    "client": {
        "base_url": "https://api.pipedrive.com/",  # Extract this from the docs/legacy code
        "auth": {
            "type": "api_key",                     # Use the documented auth type
            "name": "api_token",
            "api_key": dlt.secrets["api_token"],   # Replace with secure token reference
            "location": "query"                    # Typically a query parameter for API keys
        }
    },
    "resource_defaults": {
        "primary_key": "id",            # Default primary key for resources
        "write_disposition": "merge",   # Default write mode
        "endpoint": {
            "params": {
                "limit": 50             # Default query parameter for pagination size
            }
        }
    },
    "resources": [
        {
            "name": "deals",            # Example resource name extracted from code or docs
            "endpoint": {
                "path": "v1/recents",   # Endpoint path to be appended to base_url
                "method": "GET",        # HTTP method (default is GET)
                "params": {
                    "items": "deal",
                    "since_timestamp": "{incremental.start_value}"
                },
                "data_selector": "data.*",  # JSONPath to extract the actual data
                "paginator": {              # Endpoint-specific paginator
                    "type": "offset",
                    "offset": 0,
                    "limit": 100
                },
                "incremental": {            # Optional incremental configuration
                    "cursor_path": "update_time",
                    "initial_value": "2023-01-01 00:00:00"
                }
            }
        }
    ]
})

if __name__ == "__main__":
    pipeline = dlt.pipeline(
        pipeline_name="pipedrive_rest",
        destination="duckdb",
        dataset_name="pipedrive_data"
    )
    pipeline.run(source)
```
## How to Apply This Rule
Extraction:
• Search both the REST API docs and any legacy pipeline code for all mentions of "cursor" or "pagination".
• Identify the exact keys and JSONPath expressions needed for the cursor field.
• Look for authentication requirements and rate limiting information.
• Identify any dependent resources and their relationships.
• Check for multiple pagination strategies across different endpoints.

Configuration Building:
• Assemble the configuration in a dictionary that mirrors the structure in the example.
• Ensure that each section (client, resource defaults, resources) is as declarative as possible.
• Implement proper state management and incremental loading where applicable.
• Configure rate limiting based on API requirements.
• Configure pagination at the endpoint level when multiple strategies exist.

Verification:
• Double-check that the configuration uses the REST API config keys correctly.
• Verify that no extraneous Python code is introduced.
• Test the configuration with mock responses.
• Verify rate limiting and error handling.
• Test pagination separately for each endpoint type.

Customization:
• Allow for adjustments (like modifying the "initial_value") where incremental loading is desired.
• Customize rate limiting parameters based on API requirements.
• Adjust batch sizes and pagination parameters as needed.
• Implement custom error handling and retry logic where necessary.
• Handle different pagination strategies appropriately.

ai/.rules/dlt.md

Lines changed: 16 additions & 0 deletions
@@ -0,0 +1,16 @@
---
globs:
description: Information about dlt
alwaysApply: true
---

# Guidelines
1. dlt means "data load tool". It is an open source Python library installable via `pip install dlt`.
2. To create a new pipeline, use `dlt init <source> <destination>`.
3. The dlt library comes with the `dlt` CLI. Add the `--help` flag to any command to verify its specs.
4. The preferred way to configure dlt (sources, resources, destinations, etc.) is to use `.dlt/config.toml` and `.dlt/secrets.toml`.
5. During development, always set `dev_mode=True` when creating a dlt Pipeline: `pipeline = dlt.pipeline(..., dev_mode=True)`. This allows you to reset the pipeline's schema and state between iterations (see the sketch after this list).
6. Use type annotations only if you're certain you're properly importing the types.
7. Use dlt's REST API source if loading data from the web.
8. Use dlt's SQL source when loading data from an SQL database or backend.
9. Use dlt's filesystem source if loading data from files (CSV, PDF, Parquet, JSON, and more). This works for local filesystems and cloud buckets (AWS, Azure, GCP, MinIO, etc.).
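As a minimal sketch of guideline 5 (the pipeline, destination, and dataset names are illustrative):

```python
import dlt

# Development pipeline: dev_mode=True resets the pipeline's schema and state
# between runs, so each iteration starts clean.
pipeline = dlt.pipeline(
    pipeline_name="dev_pipeline",
    destination="duckdb",
    dataset_name="dev_data",
    dev_mode=True,
)

# Any iterable of dicts can be loaded; a tiny inline example:
load_info = pipeline.run([{"id": 1, "name": "example"}], table_name="items")
print(load_info)
```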
