
[BUG] Sentence context greater than 512 characters #64

@xei

Description

I tried to correct spelling mistakes in a large text.

import spacy
import contextualSpellCheck

spacy_nlp = spacy.load(
    'en_core_web_sm',
    disable=['parser', 'ner']  # disable extra components for efficiency
)
contextualSpellCheck.add_to_pipe(spacy_nlp)

# corpus_raw: a list of raw text strings (defined elsewhere)
corpus_spacy = [spacy_nlp(doc) for doc in corpus_raw]

At first, I faced this error:
ValueError: [E030] Sentence boundaries unset. You can add the 'sentencizer' component to the pipeline with: nlp.add_pipe('sentencizer'). Alternatively, add the dependency parser or sentence recognizer, or set sentence boundaries by setting doc[i].is_sent_start.

So, I added the sentencizer component to the pipeline.

import spacy
import contextualSpellCheck

spacy_nlp = spacy.load(
    'en_core_web_sm',
    disable=['parser', 'ner']  # disable extra components for efficiency
)
spacy_nlp.add_pipe('sentencizer')
contextualSpellCheck.add_to_pipe(spacy_nlp)

corpus_spacy = [spacy_nlp(doc) for doc in corpus_raw]

This time I faced this error:
RuntimeError: The expanded size of the tensor (837) must match the existing size (512) at non-singleton dimension 1. Target sizes: [1, 837]. Tensor sizes: [1, 512]
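
The 512 in the traceback matches BERT's maximum sequence length, so a quick way to confirm which sentences overflow is to count sub-word tokens per sentence. Here is a sketch; it assumes the bert-base-cased tokenizer (which contextualSpellCheck appears to use by default) and the 'contextual spellchecker' pipe name, both assumptions on my part:

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained('bert-base-cased')

# Temporarily disable the spell checker so long texts can be inspected
# without triggering the RuntimeError. The pipe name is an assumption.
with spacy_nlp.select_pipes(disable=['contextual spellchecker']):
    for text in corpus_raw:
        for sent in spacy_nlp(text).sents:
            n_tokens = len(tokenizer(sent.text)['input_ids'])
            if n_tokens > 512:
                print(n_tokens, sent.text[:80])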

I guess this is due to BERT's 512-token input limit. However, I believe there should be a way to catch this error and bypass the spell check for the offending text.
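
Until the library handles this internally, one workaround is to catch the error and re-run the offending document with the spell checker disabled. Again a sketch, not the library's own API; the 'contextual spellchecker' pipe name is an assumption based on the README:

def safe_nlp(text):
    """Run the full pipeline; fall back to no spell check on overflow."""
    try:
        return spacy_nlp(text)
    except RuntimeError:
        # A sentence exceeded the 512-token window; skip spell checking only,
        # keeping the rest of the pipeline (tagger, sentencizer) intact.
        with spacy_nlp.select_pipes(disable=['contextual spellchecker']):
            return spacy_nlp(text)

corpus_spacy = [safe_nlp(doc) for doc in corpus_raw]

This way the rest of the corpus still gets spell-checked, and only the over-long documents are processed without it.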
