Add new FAQ entry on --license-text #4476

Merged
merged 2 commits into from
Jul 22, 2025
11 changes: 8 additions & 3 deletions docs/source/cli-reference/basic-options.rst
@@ -623,8 +623,8 @@
The option ``--license-text-diagnostics`` is a sub-option of and requires the options
``--license`` and ``--license-text``.

In the matched license text, include diagnostic highlights surrounding with square brackets []
words that are not matched.
This adds a new attribute that holds the matched license text with diagnostic highlights:
words that are not matched are surrounded by square brackets ``[]``.

In a normal scan, whole lines of text are included in the matched license text, including parts
that are possibly unmatched.
@@ -645,9 +645,14 @@
obtaining a copy of this software and associated documentation files (the \"Software\"),
to deal in the Software without restriction

With Diagnostics on::
With Diagnostics on (new attribute with the matched text diagnostics)::

"matched_text":
"License Copyright (c) 2000 - 2006 The Legion Of The Bouncy Castle
(http://www.bouncycastle.org) Permission is hereby granted, free of charge, to any person
obtaining a copy of this software and associated documentation files (the \"Software\"),
to deal in the Software without restriction
"matched_text_diagnostics":
"License [Copyright] ([c]) [2000] - [2006] [The] [Legion] [Of] [The] [Bouncy] [Castle]
([http]://[www].[bouncycastle].[org]) Permission is hereby granted, free of charge, to any person
obtaining a copy of this software and associated documentation files (the \"Software\"),
50 changes: 50 additions & 0 deletions docs/source/misc/faq.rst
@@ -82,3 +82,53 @@ When scanning binaries, the line numbers are just a relative indication of where
a detection was found: there is no such thing as lines in a binary. The numbers
reported are based on the strings extracted from the binaries, typically broken
as new lines with each NULL character.


How does ScanCode's ``--license-text`` option work exactly?
-----------------------------------------------------------

Is the matched text included in the result exactly the lines of text from the
input file that are covered by the ``start_line`` and ``end_line`` fields of the
result? In other words, if I post-processed the input file and extracted lines
``start_line`` through ``end_line``, would I get exactly the ``matched_text``
contents? Or is there some more "magic" involved in populating the
``matched_text`` field?

ScanCode does a bit more than look at the start and end lines: matching is based
on words, not on lines of the scanned text, so a whole line is not always matched.

For instance, with this command::

$ echo "Foo is a wonder piece of code. Licensed under the GPL. " \
"For support contact foo@bar.com " > tst
$ scancode --license --license-text --license-text-diagnostics --yaml - tst
...
license_detections:
- license_expression: gpl-1.0-plus
license_expression_spdx: GPL-1.0-or-later
matches:
- license_expression: gpl-1.0-plus
license_expression_spdx: GPL-1.0-or-later
from_file: tst
start_line: 1
end_line: 1
matcher: 2-aho
score: '100.0'
matched_length: 4
match_coverage: '100.0'
rule_relevance: 100
rule_identifier: gpl_85.RULE
rule_url: https://github.com/nexB/scancode-toolkit/tree/develop/src/licensedcode/data/rules/gpl_85.RULE
matched_text: Foo is a wonder piece of code. Licensed under the GPL.
For support contact foo@bar.com
matched_text_diagnostics: Licensed under the GPL.
...

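In this output:

- ``matched_text`` contains the whole lines from ``start_line`` to ``end_line``,
  including words that were not matched, such as "Foo is a wonder piece of code."
- ``matched_text_diagnostics`` is based on the exact matched words, here
  "Licensed under the GPL."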
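
The first point can be checked with standard shell tools. This is a minimal
sketch, not part of ScanCode: it recreates the ``tst`` file from the example and
hard-codes ``start_line: 1`` and ``end_line: 1`` from the output above instead of
parsing ScanCode's results. The ``START`` and ``END`` variable names are
illustrative only.

```shell
# Recreate the sample file used in the example (a single line of text).
echo "Foo is a wonder piece of code. Licensed under the GPL. " \
     "For support contact foo@bar.com " > tst

# Values copied by hand from the example output (hypothetical variable names).
START=1
END=1

# Print lines START through END (inclusive) of the scanned file.
# These whole lines, unmatched words included, are what matched_text reflects.
sed -n "${START},${END}p" tst
```

The printed line contains the same words as ``matched_text``, including the
unmatched "Foo is a wonder piece of code." and the support address, while
``matched_text_diagnostics`` keeps only "Licensed under the GPL."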

Note that ``matched_text_diagnostics`` also includes "tagged" gaps, that is, extra
unmatched words found between the matched words, highlighted with square brackets ``[]``.
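
To list only the highlighted unmatched words, the bracketed tokens can be pulled
out of a ``matched_text_diagnostics`` value. This is a minimal sketch that assumes
the value has already been copied into a hypothetical ``DIAG`` shell variable
(here an excerpt of the Bouncy Castle example for ``--license-text-diagnostics``)
and that ``grep`` supports the ``-o`` option, as GNU and BSD grep do; it does not
read ScanCode output directly.

```shell
# Hypothetical variable holding an excerpt of a matched_text_diagnostics value.
DIAG='License [Copyright] ([c]) [2000] - [2006] [The] [Legion] [Of] [The] [Bouncy] [Castle]'

# grep -o prints each "[...]" token on its own line;
# sed then strips the leading "[" and trailing "]".
printf '%s\n' "$DIAG" | grep -o '\[[^]]*]' | sed 's/^\[//; s/]$//'
```

Each unmatched word is printed on its own line, starting with ``Copyright``
and ``c``; "License" is not listed because it was matched.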
