🔨 **test(test_real_detection): Raise error for non-memory model load failures**
Update test to ensure non-memory errors raise `DetectError` instead of falling back to a smaller model.
📝 **docs(README): Clarify fallback behavior and add language code mapping guide**
Explain conditions for model fallback and provide guidance on mapping language codes to English names using `langcodes`.
README.md (49 additions, 0 deletions)
@@ -166,6 +166,55 @@ print(detector.detect("Some very long text..."))
- When truncation happens, a WARNING is logged because it may reduce accuracy.
- `max_input_length=80` truncates overly long inputs; set `None` to disable if you prefer no truncation.

### Fallback Behavior

- As of the latest change, the library only falls back to the bundled small model when a MemoryError occurs while loading the large model.
- For other errors (e.g., I/O/permission errors, corrupted files, invalid paths), the error is raised as `DetectError` so you can diagnose the root cause quickly.
- This avoids silently masking real issues and prevents unnecessary re-downloads that can slow execution.

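The fallback policy can be illustrated with a minimal sketch. This is not the library's actual loader: `load_model`, `load_large`, and `load_small` are hypothetical names, and `DetectError` here is a local stand-in class rather than the library's own exception.

```python
class DetectError(Exception):
    """Local stand-in for the library's DetectError (assumption for this sketch)."""


def load_model(load_large, load_small):
    """Try the large model; fall back to the small one only on MemoryError.

    `load_large` and `load_small` are hypothetical zero-argument callables.
    """
    try:
        return load_large()
    except MemoryError:
        # Only out-of-memory conditions trigger the fallback.
        return load_small()
    except Exception as exc:
        # Everything else (I/O, permissions, corrupted file, bad path)
        # surfaces to the caller with the original error chained.
        raise DetectError(f"Failed to load model: {exc}") from exc
```

Chaining with `from exc` keeps the original exception available as `__cause__`, which is what makes the root cause quick to diagnose.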

### Language Codes → English Names

The detector returns fastText language codes (e.g., `en`, `zh`, `ja`, `pt-br`). To present user-friendly names, you can map codes to English names using a third-party library. Example using `langcodes`:

```python
# pip install langcodes
# (recent langcodes releases may also need the `language_data` package for display names)
from langcodes import Language

OVERRIDES = {
    # fastText-specific or variant tags commonly used
    "yue": "Cantonese",
    "wuu": "Wu Chinese",
    "arz": "Egyptian Arabic",
    "ckb": "Central Kurdish",
    "kab": "Kabyle",
    "zh-cn": "Chinese (China)",
    "zh-tw": "Chinese (Taiwan)",
    "pt-br": "Portuguese (Brazil)",
}


def code_to_english_name(code: str) -> str:
    # Normalize separators and case, e.g. 'pt_BR' -> 'pt-br'
    code = code.replace("_", "-").lower()
    if code in OVERRIDES:
        return OVERRIDES[code]
    try:
        # Display name in English; e.g. 'Portuguese (Brazil)'
        return Language.get(code).display_name("en")
    except Exception:
        # Try the base language (e.g., 'pt' from 'pt-br')
        base = code.split("-")[0]
        try:
            return Language.get(base).display_name("en")
        except Exception:
            return code


# Usage
from fast_langdetect import detect

result = detect("Olá mundo", low_memory=False)
print(code_to_english_name(result["lang"]))  # Portuguese (Brazil) or Portuguese
```

Alternatively, `pycountry` can be used for ISO 639 lookups (install with `pip install pycountry`), combined with a small override dict for non-standard tags like `pt-br`, `zh-cn`, `yue`, etc.
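
A rough sketch of the `pycountry` variant might look like the following. The function name `iso_name` and the override table are illustrative assumptions, not part of the library; the sketch assumes two-letter codes are ISO 639-1 and three-letter codes are ISO 639-3, and it returns the code unchanged when `pycountry` is missing or has no match.

```python
# pip install pycountry
try:
    import pycountry
except ImportError:  # keep the sketch usable without the dependency
    pycountry = None

# Hypothetical override table for tags that are not plain ISO 639 codes
OVERRIDES = {
    "pt-br": "Portuguese (Brazil)",
    "zh-cn": "Chinese (China)",
    "zh-tw": "Chinese (Taiwan)",
    "yue": "Cantonese",
}


def iso_name(code: str) -> str:
    # Normalize separators and case, e.g. 'pt_BR' -> 'pt-br'
    code = code.replace("_", "-").lower()
    if code in OVERRIDES:
        return OVERRIDES[code]
    # Drop any region or script suffix and look up the base language
    base = code.split("-")[0]
    if pycountry is None or len(base) not in (2, 3):
        return code
    key = "alpha_2" if len(base) == 2 else "alpha_3"
    try:
        match = pycountry.languages.get(**{key: base})
    except KeyError:
        # Some pycountry releases raise instead of returning None
        match = None
    return match.name if match is not None else code
```

Checking the override dict first keeps variant tags such as `pt-br` from collapsing to their base language.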