🔧 refactor(infer.py): integrate max_input_length for auto-truncation and improve parameter deprecation warnings
📚 docs(README.md): update usage examples and add input handling details
🔖 chore(pyproject.toml): bump version to 0.4.0 for new features and improvements
All notable changes to this project will be documented in this file.

## [0.4.0] - 2025-09-15

- Behavior: Always replace newline characters in input to prevent FastText errors. This adjustment is logged at DEBUG level only.
- Default input truncation: Truncate inputs to 80 characters by default for stable predictions. Configurable via `LangDetectConfig(max_input_length=...)`; set `None` to disable.
- Simplified config: Removed previously proposed `verbose` and `replace_newlines` options; newline replacement is unconditional and logging of adjustments is controlled by global logger level.
- Logging: Deprecated-parameter messages lowered from WARNING to INFO to reduce noise.
- Documentation: README now includes language code → name mapping guidance and an explicit model license note (CC BY-SA 3.0) alongside MIT for code.
@@ -151,6 +147,25 @@ result = detector.detect("Hello world")
For text splitting based on language, please refer to the [split-lang](https://github.com/DoodleBears/split-lang)
repository.

### Input Handling

You can control input truncation via `LangDetectConfig`; log verbosity follows the global logger level:

```python
from fast_langdetect import LangDetectConfig, LangDetector

config = LangDetectConfig(
    max_input_length=80,  # default: auto-truncate long inputs for stable results
)
detector = LangDetector(config)
print(detector.detect("Some very long text..."))
```
- Newlines are always replaced with spaces to avoid FastText errors (logged at DEBUG level only, so silent by default).
- When truncation happens, a WARNING is logged because it may reduce accuracy.
- `max_input_length=80` truncates overly long inputs; set it to `None` to disable truncation.
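
The input handling described above can be sketched in plain Python. This is only an illustration under stated assumptions: `normalize_input` and the logger name `input_handling_sketch` are hypothetical and not part of `fast_langdetect`, and the log levels used here are assumptions taken from the notes above rather than from the library's source.

```python
import logging
from typing import Optional

# Hypothetical logger name, chosen only for this sketch.
logger = logging.getLogger("input_handling_sketch")


def normalize_input(text: str, max_input_length: Optional[int] = 80) -> str:
    """Illustrative stand-in for the input handling described above.

    Not part of fast_langdetect; the library's internals may differ.
    """
    # Newlines are always replaced with spaces, regardless of configuration.
    if "\n" in text:
        text = text.replace("\n", " ")
        logger.debug("Replaced newline characters with spaces")

    # Truncate only when a limit is set; None disables truncation.
    if max_input_length is not None and len(text) > max_input_length:
        logger.warning(
            "Input truncated from %d to %d characters",
            len(text),
            max_input_length,
        )
        text = text[:max_input_length]
    return text


if __name__ == "__main__":
    # The global logger level decides which adjustment messages are shown.
    logging.basicConfig(level=logging.DEBUG)
    print(normalize_input("Hello\nworld"))
    print(len(normalize_input("x" * 200)))
    print(len(normalize_input("x" * 200, max_input_length=None)))
```

Raising the level passed to `logging.basicConfig` (for example to `logging.ERROR`) would hide both messages in this sketch, which mirrors the idea that verbosity is governed by the logger level rather than by a config option.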
## Benchmark 📊
For detailed benchmark results, refer
@@ -180,3 +195,12 @@ models
year={2016}
}
```

## License 📄

- Code: Released under the MIT License (see `LICENSE`).
- Models: This package uses the pre-trained fastText language identification models (`lid.176.ftz` bundled for offline use and `lid.176.bin` downloaded as needed). These models are licensed under the Creative Commons Attribution-ShareAlike 3.0 (CC BY-SA 3.0) license.
- Attribution: fastText language identification models by Facebook AI Research. See the fastText docs and license for details:
- Note: If you redistribute or modify the model files, you must comply with CC BY-SA 3.0. Inference usage via this library does not change the license of the model files themselves.