-- Remove old, obsolete, deprecated, and experimental code.
-- Thai2fit (Upgrade ULMFiT-related codes to fastai 1.0)
-- ThaiNER 1.0
-- Remove sentiment analysis
-- Improved word_tokenize (newmm, mm) and dict_word_tokenize
-- Improved POS-tagging
-- More and improved examples
-- see [PyThaiNLP 2.0 change log](https://github.com/PyThaiNLP/pythainlp/issues/118)
+- Improved `word_tokenize` ("newmm" and "mm" engines); a `custom_dict` dictionary can be provided
+- Improved `pos_tag` part-of-speech tagging
+- New `NorvigSpellChecker` spell checker class, which can be initialized with a custom dictionary.
+- New `thai2fit` (replacing `thai2vec`, upgrade ULMFiT-related code to fastai 1.0)
+- Updated ThaiNER to 1.0
+- You may need to [update your existing ThaiNER models from PyThaiNLP 1.7](https://github.com/PyThaiNLP/pythainlp/wiki/Upgrade-ThaiNER-from-PyThaiNLP-1.7-to-PyThaiNLP-2.0)
+- Remove old, obsolete, deprecated, duplicated, and experimental code.
+- Sentiment analysis is no longer part of the library, but rather [a text classification example](https://github.com/PyThaiNLP/pythainlp/blob/dev/notebooks/sentiment_analysis.ipynb).
+- See more examples in [Get Started notebook](https://github.com/PyThaiNLP/pythainlp/blob/dev/notebooks/pythainlp-get-started.ipynb)
+- [Upgrading from 1.7](https://thainlp.org/pythainlp/docs/2.0/notes/pythainlp-1_7-2_0.html)

 ## Install

@@ -62,8 +54,8 @@ Install it with pip, for example: `pip install marisa_trie‑0.7.5‑cp36‑cp36

 ## Links

-- User guide: [English](https://colab.research.google.com/drive/1MQ10D1mJC5r1vQAHcj4ShoRS14vz8ZF-), [ภาษาไทย](https://colab.research.google.com/drive/1rEkB2Dcr1UAKPqz4bCghZV7pXx2qxf89)
+- User guide: [English](https://github.com/PyThaiNLP/pythainlp/blob/dev/notebooks/pythainlp-get-started.ipynb), [ภาษาไทย](https://colab.research.google.com/drive/1rEkB2Dcr1UAKPqz4bCghZV7pXx2qxf89)
README.md (52 additions, 50 deletions)
@@ -14,25 +14,27 @@ Thai Natural Language Processing in Python.
 PyThaiNLP is a Python package for text processing and linguistic analysis, similar to `nltk` but with a focus on the Thai language.

-- [Current PyThaiNLP stable release is 2.0](https://github.com/PyThaiNLP/pythainlp/tree/master)
-- PyThaiNLP 2.0 supports Python 3.6+. Some functions may work with older versions of Python 3, but they are not well tested and will not be supported. See [PyThaiNLP 2.0 change log](https://github.com/PyThaiNLP/pythainlp/issues/118).
-- Python 2.7+ users can use PyThaiNLP 1.6.
+**This document is for the development branch (post 2.0). Things will break.**

-**This document is for the development branch (post 2.0). Things will break. For the stable branch document, see [master](https://github.com/PyThaiNLP/pythainlp/tree/master).**
+- The latest stable release is [2.0.4](https://github.com/PyThaiNLP/pythainlp/tree/master)
+- PyThaiNLP 2 supports Python 3.6+. Some functions may work with older versions of Python 3, but they are not well tested and will not be supported. See [change log](https://github.com/PyThaiNLP/pythainlp/issues/118).
+- [Upgrading from 1.7](https://thainlp.org/pythainlp/docs/2.0/notes/pythainlp-1_7-2_0.html)
+- [Upgrade ThaiNER from 1.7](https://github.com/PyThaiNLP/pythainlp/wiki/Upgrade-ThaiNER-from-PyThaiNLP-1.7-to-PyThaiNLP-2.0)
+- Python 2.7+ users can use PyThaiNLP 1.6.

 📫 Follow us on Facebook [PyThaiNLP](https://www.facebook.com/pythainlp/)

 ## Capabilities

-- Convenient character and word classes, like Thai consonants (```pythainlp.thai_consonants```), vowels (```pythainlp.thai_vowels```), digits (```pythainlp.thai_digits```), and stop words (```pythainlp.corpus.thai_stopwords```) -- comparable to constants like ```string.ascii_letters```, ```string.digits```, and ```string.punctuation```
-- Thai word segmentation (```word_tokenize```), including subword segmentation based on Thai Character Cluster (```tcc```) and ETCC (```etcc```)
-- Thai romanization and transliteration (```romanize```, ```transliterate```)
-- Thai part-of-speech taggers (```pos_tag```)
-- Read out numbers as Thai words (```bahttext```, ```num_to_thaiword```)
-- Thai collation (sort by dictionary order) (```collate```)
+- Convenient character and word classes, like Thai consonants (`pythainlp.thai_consonants`), vowels (`pythainlp.thai_vowels`), digits (`pythainlp.thai_digits`), and stop words (`pythainlp.corpus.thai_stopwords`) -- comparable to constants like `string.ascii_letters`, `string.digits`, and `string.punctuation`
+- Thai word segmentation (`word_tokenize`), including subword segmentation based on Thai Character Cluster (`subword_tokenize`)
+- Thai transliteration (`transliterate`)
+- Thai part-of-speech taggers (`pos_tag`)
+- Read out numbers as Thai words (`bahttext`, `num_to_thaiword`)
+- Thai collation (sort by dictionary order) (`collate`)
+- Thai spelling suggestion and correction (`spell` and `correct`)
+- Thai soundex (`soundex`) with three engines (`lk82`, `udom83`, `metasound`)
 - Thai WordNet wrapper
 - and much more - see examples in [PyThaiNLP Get Started notebook](https://github.com/PyThaiNLP/pythainlp/blob/dev/notebooks/pythainlp-get-started.ipynb).
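
Reviewer note: the word segmentation entry above, together with the `custom_dict` option mentioned in the changelog part of this diff, relies on dictionary lookup. As a rough illustration of why the dictionary contents change the result, here is a minimal sketch of greedy longest matching in plain Python. It is a simpler technique than the "newmm" engine and is not PyThaiNLP's implementation; the function name `longest_match_tokenize` and the small word sets are made up for this example, and the code does not call the library.

```python
def longest_match_tokenize(text, dictionary):
    """Greedy left-to-right longest matching against a set of known words.

    Hypothetical helper for illustration only; not part of PyThaiNLP.
    Characters that start no dictionary word become single-character tokens.
    """
    # Longest word length bounds how far ahead we need to look.
    max_len = max((len(word) for word in dictionary), default=1)
    tokens = []
    i = 0
    while i < len(text):
        match = text[i]  # fallback: emit one character
        # Try the longest candidate first, then shrink.
        for j in range(min(len(text), i + max_len), i, -1):
            if text[i:j] in dictionary:
                match = text[i:j]
                break
        tokens.append(match)
        i += len(match)
    return tokens


text = "ฉันรักภาษาไทย"  # "I love the Thai language"

# A dictionary that contains the compound "ภาษาไทย" keeps it as one token.
with_compound = {"ฉัน", "รัก", "ภาษา", "ไทย", "ภาษาไทย"}
print(longest_match_tokenize(text, with_compound))
# ['ฉัน', 'รัก', 'ภาษาไทย']

# A custom dictionary without the compound splits it in two.
without_compound = {"ฉัน", "รัก", "ภาษา", "ไทย"}
print(longest_match_tokenize(text, without_compound))
# ['ฉัน', 'รัก', 'ภาษา', 'ไทย']
```

The same input yields different tokens depending only on the word set, which is the practical effect of supplying a custom dictionary to a dictionary-based tokenizer.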
@@ -60,20 +62,20 @@ For some advanced functionalities, like word vector, extra packages may be neede
 $ pip install pythainlp[extra1,extra2,...]
 ```

-where ```extras``` can be
-- ```artagger``` (to support artagger part-of-speech tagger)*
-- ```deepcut``` (to support deepcut machine-learnt tokenizer)
-- ```icu``` (for ICU support in transliteration and tokenization)
-- ```ipa``` (for International Phonetic Alphabet support in transliteration)
-- ```ml``` (to support fastai 1.0.22 ULMFiT models)
-- ```ner``` (for named-entity recognizer)
-- ```thai2fit``` (for Thai word vector)
-- ```thai2rom``` (for machine-learnt romanization)
-- ```full``` (install everything)
+where `extras` can be
+- `artagger` (to support artagger part-of-speech tagger)*
+- `deepcut` (to support deepcut machine-learnt tokenizer)
+- `icu` (for ICU, International Components for Unicode, support in transliteration and tokenization)
+- `ipa` (for IPA, International Phonetic Alphabet, support in transliteration)
+- `ml` (to support fastai 1.0.22 ULMFiT models)
+- `ner` (for named-entity recognizer)
+- `thai2fit` (for Thai word vector)
+- `thai2rom` (for machine-learnt romanization)
+- `full` (install everything)

-* Note: the standard ```artagger``` package from PyPI will not work on Windows; please ```pip install https://github.com/wannaphongcom/artagger/tarball/master#egg=artagger``` instead.
+* Note: the standard `artagger` package from PyPI will not work on Windows; please ```pip install https://github.com/wannaphongcom/artagger/tarball/master#egg=artagger``` instead.

-** See ```extras``` and ```extras_require``` in [```setup.py```](https://github.com/PyThaiNLP/pythainlp/blob/dev/setup.py) for package details.
+** See `extras` and `extras_require` in [`setup.py`](https://github.com/PyThaiNLP/pythainlp/blob/dev/setup.py) for package details.
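
Reviewer note: the `pythainlp[extra1,extra2,...]` form above is a comma-separated list of extras inside square brackets. The sketch below builds that requirement string from a list of names and rejects names that are not in the README's list. The helper `pip_requirement` and the constant `KNOWN_EXTRAS` are hypothetical and not part of PyThaiNLP; the names are copied from this diff, and the authoritative list is whatever `extras_require` in `setup.py` defines on the branch being installed.

```python
# Extras named in the README diff above; setup.py remains the source of truth.
KNOWN_EXTRAS = (
    "artagger", "deepcut", "icu", "ipa", "ml",
    "ner", "thai2fit", "thai2rom", "full",
)


def pip_requirement(extras):
    """Return a requirement string such as 'pythainlp[icu,ner]'.

    Hypothetical helper for illustration; raises ValueError on unknown names.
    """
    unknown = [name for name in extras if name not in KNOWN_EXTRAS]
    if unknown:
        raise ValueError("unknown extras: " + ", ".join(unknown))
    if not extras:
        return "pythainlp"
    return "pythainlp[" + ",".join(extras) + "]"


print(pip_requirement(["icu", "ner"]))  # pythainlp[icu,ner]
print(pip_requirement([]))              # pythainlp
```

The resulting string is what would follow `pip install`; in most shells it is safer to quote it, since square brackets can be interpreted as glob patterns.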