@@ -1,31 +1,32 @@
 ---
-description:
+description: Crucial guidelines to build a dlt rest api source
 globs:
 alwaysApply: true
 ---
 ## Prerequisities to writing a source

-1. VERY IMPORTANT. When writing new source, you should have example available in rest_api_pipeline.py file. Use github rest api source for the general structure of the code. If you do not see this file, ask user to add it
-2. Recall OpenAPI spec. You will need to figure out the same information that OpenAPI spec contains on each API.
+1. VERY IMPORTANT. When writing a new source, you should have an example available in the rest_api_pipeline.py file.
+Use this example or the github rest api source example from dlt's documentation on rest api for the general structure of the code. If you do not see this file rest_api_pipeline.py, ask the user to add it
+2. Recall OpenAPI spec. You will figure out the same information that the OpenAPI spec contains for each API.
 3. In particular:
 - API base url
 - type of authentication
 - list of endpoints with method GET (you can read data for those)
-4. You will need to figure additional information that is required for successful data extraction
+4. You will figure out additional information that is required for successful data extraction
 - type of pagination
 - if data from an endpoint can be loaded incrementally
 - unwrapping end user data from a response
 - write disposition of the endpoint: append, replace, merge
 - in case of merge, you need to find primary key that can be compound
-5. Some endpoints take data from other endpoints. For example `comments` endpoint needs `post id` to get list of comments per particular post. You'll need to figure out such connections
-6. **ASK USER IF YOU MISS CRUCIAL INFORMATION** You should make sure user provided you with enough information to figure out the above. Below are the most common possibilities
+5. Some endpoints take data from other endpoints. For example, in the github rest api source example from dlt's documentation, the `comments` endpoint needs `post id` to get the list of comments per particular post. You'll need to figure out such connections
+6. **ASK USER IF YOU MISS CRUCIAL INFORMATION** You will make sure the user has provided you with enough information to figure out the above. Below are the most common possibilities
 - open api spec (file or link)
 - any other api definition, for example Airbyte low code yaml
 - a source code in Python, java or c# of such connector or API client
 - a documentation of the api or endpoint
 7. In case you find more than 10 endpoints and you do not get instructions which you should add to the source, ask user.
-8. please make sure you use right pagination and you use exactly the arguments that are available in pagination guide. do not try to guess anything. remember that we have many paginator types that are configured differently
-9. When creating pipeline instance add progress="log" as parameter
+8. Make sure you use the right pagination and use exactly the arguments that are available in the pagination guide. do not try to guess anything. remember that we have many paginator types that are configured differently
+9. When creating pipeline instance add progress="log" as parameter `pipeline = dlt.pipeline(..., progress="log")`
 10. When fixing a bug report focus only on a single cause. ie. incremental, pagination or authentication or wrong dict fields
 11. You should have references for paginator types, authenticator types and general reference for rest api in you context. **DO NOT GUESS. DO NOT INVENT CODE. YOU SHOULD HAVE DOCUMENTATION FOR EVERYTHING YOU NEED. IF NOT - ASK USER**

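For readers of this change, here is a rough sketch of how the items in rules 3 to 5 and rule 9 could map onto a dlt rest_api configuration. It is not taken from rest_api_pipeline.py or from the rules file. The API (`api.example.com`), the `posts` and `comments` resources, the field names, and the `missing_items` helper are all hypothetical, and the helper only handles dict-style resources and endpoints. The config keys and the paginator arguments reflect my understanding of dlt's rest_api source and should be checked against the pagination and rest api references, as rules 8 and 11 require.

```python
from typing import Any

# Hypothetical API: base URL, paths and field names are placeholders.
CONFIG: dict[str, Any] = {
    "client": {
        "base_url": "https://api.example.com/v1/",                # rule 3: API base url
        "auth": {"type": "bearer", "token": "<from secrets>"},    # rule 3: type of authentication
        # rule 4: type of pagination (assumed: next-page URL in the JSON body)
        "paginator": {"type": "json_link", "next_url_path": "paging.next"},
    },
    # rule 4: write disposition, and a primary key because it is "merge"
    "resource_defaults": {"primary_key": "id", "write_disposition": "merge"},
    "resources": [
        {
            "name": "posts",
            "endpoint": {
                "path": "posts",
                "data_selector": "data",  # rule 4: unwrap user data from the response
                "params": {
                    # rule 4: incremental loading on an assumed "updated_at" field
                    "since": {
                        "type": "incremental",
                        "cursor_path": "updated_at",
                        "initial_value": "2024-01-01T00:00:00Z",
                    },
                },
            },
        },
        {
            "name": "comments",
            "endpoint": {
                # rule 5: comments need the post id from the parent resource
                "path": "posts/{post_id}/comments",
                "params": {
                    "post_id": {"type": "resolve", "resource": "posts", "field": "id"},
                },
            },
        },
    ],
}


def missing_items(config: dict[str, Any]) -> list[str]:
    """Hypothetical helper: list checklist items that the config does not answer yet."""
    missing: list[str] = []
    client = config.get("client", {})
    defaults = config.get("resource_defaults", {})
    if not client.get("base_url"):
        missing.append("API base url")
    resources = config.get("resources", [])
    if not resources:
        missing.append("list of GET endpoints")
    for res in resources:
        name = res.get("name", "?")
        endpoint = res.get("endpoint", {})
        if "paginator" not in endpoint and "paginator" not in client:
            missing.append(f"{name}: type of pagination")
        disposition = res.get("write_disposition", defaults.get("write_disposition"))
        if disposition is None:
            missing.append(f"{name}: write disposition")
        elif disposition == "merge" and not res.get("primary_key", defaults.get("primary_key")):
            missing.append(f"{name}: primary key for merge")
    return missing


def run(config: dict[str, Any]) -> Any:
    """Load the source; dlt is imported lazily so the checklist works without it."""
    import dlt
    from dlt.sources.rest_api import rest_api_source

    pipeline = dlt.pipeline(
        pipeline_name="example_api",   # placeholder names
        destination="duckdb",
        dataset_name="example_data",
        progress="log",                # rule 9
    )
    return pipeline.run(rest_api_source(config))
```

Anything `missing_items` reports is a question for the user (rule 6) rather than something to fill in by guessing.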