@TG1999
Contributor

@TG1999 TG1999 commented Oct 20, 2025

Signed-off-by: Tushar Goel <tushar.goel.dav@gmail.com>
@TG1999
Contributor Author

TG1999 commented Oct 20, 2025

root@tg1999-Precision-3561:/home/tg1999/Desktop/scancode.io# python3 etc/scripts/run_d2d_scio.py --input-file ./project_data/from-from-intbitset.tar.gz:from --input-file ./project_data/tointbitset.whl:to --option Python --output res1.json
Files copied to: /home/tg1999/Desktop/scancode.io/d2d
Using Postgres host port: 59533
d2460ad0155ddaadb7dca767180e651f334c9bdfe9d35d37745b46ba6226cceb
Waiting for Postgres to be ready...
Postgres is ready.
Running ScanCode pipeline:
Running: /usr/bin/docker run --rm -v /home/tg1999/Desktop/scancode.io/d2d:/code -e DATABASE_URL=postgresql://scancode:scancode@host.docker.internal:59533/scancode --network host ghcr.io/aboutcode-org/scancode.io:latest sh -c scanpipe create-project scanpipe_a04dde0b --input-file /code/from-from-intbitset.tar.gz:from --input-file /code/tointbitset.whl:to --pipeline map_deploy_to_develop:Python, && scanpipe execute --project scanpipe_a04dde0b
Project scanpipe_a04dde0b created with work directory /opt/scancodeio/var/projects/scanpipe_a04dde0b-34ffa9c8
Files copied to the project inputs directory:
- from-from-intbitset.tar.gz
- tointbitset.whl
INFO Run[a75e6434-8b22-4a61-a92e-774dd7521fe1] Enter `execute_pipeline_task` Run.pk=a75e6434-8b22-4a61-a92e-774dd7521fe1
Start the map_deploy_to_develop pipeline execution...
INFO Run[a75e6434-8b22-4a61-a92e-774dd7521fe1] Run pipeline: "map_deploy_to_develop" on project: "scanpipe_a04dde0b"
INFO 
Updating directory fingerprints for 2 directories.
INFO Updating directory DB objects...
INFO Scan 0 codebase resources with scan_file
INFO Starting ProcessPoolExecutor with 15 max_workers
INFO Scan 1 codebase resources with scan_file
INFO Starting ProcessPoolExecutor with 15 max_workers
INFO Project scanpipe_a04dde0b collect_license_detections:
INFO   Processing: from/from-intbitset/intbitset.pyx for licenses
INFO Run[a75e6434-8b22-4a61-a92e-774dd7521fe1] Update Run instance with exitcode, output, and end_date
map_deploy_to_develop successfully executed on project scanpipe_a04dde0b
Running: /usr/bin/docker run --rm -v /home/tg1999/Desktop/scancode.io/d2d:/code -e DATABASE_URL=postgresql://scancode:scancode@host.docker.internal:59533/scancode --network host ghcr.io/aboutcode-org/scancode.io:latest sh -c scanpipe output --project scanpipe_a04dde0b --format json --print
scancode_db_e6d63e

This is the output from running the script in this PR.

@JonoYang
Member

For simplicity and safety, I would consider using Docker compose to handle the database service. You can create a new docker-compose.yml that has scanpipe and the database, something along the lines of:

name: scancodeio-d2d
services:
  db:
    image: docker.io/library/postgres:13
    env_file:
      - docker.env
    volumes:
      - db_data:/var/lib/postgresql/data/
    shm_size: "1gb"
    restart: always
    healthcheck:
      test: [ "CMD-SHELL", "pg_isready -U $${POSTGRES_USER} -d $${POSTGRES_DB}" ]
      interval: 10s
      timeout: 5s
      retries: 5

  worker:
    image: ghcr.io/aboutcode-org/scancode.io:latest
    env_file:
      - docker.env
    volumes:
      - .env:/opt/scancodeio/.env
      - /etc/scancodeio/:/etc/scancodeio/
      - workspace:/var/scancodeio/workspace/
    depends_on:
      - db


volumes:
  db_data:
  workspace:

You can then run scanpipe commands with, for example, docker compose -f docker-compose.d2d.yml run worker scanpipe --help

@TG1999
Contributor Author

TG1999 commented Oct 22, 2025

@JonoYang we have a follow-up issue for this: #1913

@tdruez
Contributor

tdruez commented Oct 22, 2025

I agree with @JonoYang, we do not want to reinvent orchestration here.
Let's use what we already have and know works well.
This script could simply be a wrapper around the suggested docker-compose configuration.
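
The wrapper idea could be sketched roughly as below. This is only an illustration, not the script from this PR: `build_compose_command` is a hypothetical helper, and the compose file name and `worker` service name are taken from the suggestion above. It only builds the argument list, so it can be checked without Docker being available.

```python
import shlex


def build_compose_command(
    scanpipe_args,
    compose_file="docker-compose.d2d.yml",  # file name from the comment above
    service="worker",  # service name from the suggested compose file
):
    """Return the argv list that runs a scanpipe command in the compose service."""
    return [
        "docker", "compose", "-f", compose_file,
        "run", service, "scanpipe", *scanpipe_args,
    ]


if __name__ == "__main__":
    # Print the command instead of executing it, so the sketch is side-effect free.
    print(shlex.join(build_compose_command(["--help"])))
```

To execute it for real, the returned list could be passed to `subprocess.run(cmd, check=True)`.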

Also, in case of a simple one-off pipeline run, what about using the dedicated run command?
See:


The run command needs a few improvements, which are being added in #1916.

Running it would look something like this:

# 1. Start a postgres service
docker run -d --name scancodeio-run-db postgres:17

# 2. Run d2d pipeline
docker run --rm \
  -v "$(pwd)":/codedrop \
  ghcr.io/aboutcode-org/scancode.io:latest \
  run map_deploy_to_develop:Python intbitset.tar.gz:from,intbitset.whl:to \
  > results.json

# Use download URLs
docker run --rm \
  ghcr.io/aboutcode-org/scancode.io:latest \
  run map_deploy_to_develop:Python https://url/intbitset.tar.gz#from,https://url/intbitset.whl#to \
  > results.json
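
For anyone scripting this, the input syntax used in the two examples above (a `:tag` suffix for local files, a `#tag` fragment for download URLs, entries joined by commas) could be generated with something like the following. The helper names are hypothetical and the rule is inferred only from the examples in this comment.

```python
def format_input(location, tag):
    """Attach a d2d tag: '#tag' for download URLs, ':tag' for local paths."""
    is_url = location.startswith(("http://", "https://"))
    separator = "#" if is_url else ":"
    return f"{location}{separator}{tag}"


def format_d2d_inputs(from_location, to_location):
    """Join the tagged 'from' and 'to' inputs into one comma-separated argument."""
    return ",".join(
        [format_input(from_location, "from"), format_input(to_location, "to")]
    )


if __name__ == "__main__":
    print(format_d2d_inputs("intbitset.tar.gz", "intbitset.whl"))
    print(format_d2d_inputs(
        "https://url/intbitset.tar.gz", "https://url/intbitset.whl"
    ))
```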

@TG1999
Contributor Author

TG1999 commented Oct 22, 2025

When I use the above Docker Compose file with the run command, I get this error.

root@tg1999-Precision-3561:/home/tg1999/Desktop/scancode.io# docker compose -f docker-compose.d2d.yml run worker scanpipe create-project dd --input-file ./project_data/from-from-intbitset.tar.gz:from --input-file ./project_data/tointbitset.whl:to --pipeline map_deploy_to_develop:Python && scanpipe execute --project dd
WARN[0000] Found orphan containers ([scancodeio-d2d-worker-run-739aa81c3565]) for this project. If you removed or renamed this service in your compose file, you can run this command with the --remove-orphans flag to clean it up. 
[+] Creating 1/1
 ✔ Container scancodeio-d2d-db-1  Running                                                                                               0.0s 
Traceback (most recent call last):
  File "/opt/scancodeio/.venv/lib/python3.13/site-packages/django/db/backends/base/base.py", line 278, in ensure_connection
    self.connect()
    ~~~~~~~~~~~~^^
  File "/opt/scancodeio/.venv/lib/python3.13/site-packages/django/utils/asyncio.py", line 26, in inner
    return func(*args, **kwargs)
  File "/opt/scancodeio/.venv/lib/python3.13/site-packages/django/db/backends/base/base.py", line 255, in connect
    self.connection = self.get_new_connection(conn_params)
                      ~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^
  File "/opt/scancodeio/.venv/lib/python3.13/site-packages/django/utils/asyncio.py", line 26, in inner
    return func(*args, **kwargs)
  File "/opt/scancodeio/.venv/lib/python3.13/site-packages/django/db/backends/postgresql/base.py", line 332, in get_new_connection
    connection = self.Database.connect(**conn_params)
  File "/opt/scancodeio/.venv/lib/python3.13/site-packages/psycopg/connection.py", line 118, in connect
    raise last_ex.with_traceback(None)
psycopg.errors.ConnectionTimeout: connection timeout expired

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/opt/scancodeio/.venv/bin/scanpipe", line 7, in <module>
    sys.exit(command_line())
             ~~~~~~~~~~~~^^
  File "/opt/scancodeio/scancodeio/__init__.py", line 98, in command_line
    execute_from_command_line(sys.argv)
    ~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^
  File "/opt/scancodeio/.venv/lib/python3.13/site-packages/django/core/management/__init__.py", line 442, in execute_from_command_line
    utility.execute()
    ~~~~~~~~~~~~~~~^^
  File "/opt/scancodeio/.venv/lib/python3.13/site-packages/django/core/management/__init__.py", line 436, in execute
    self.fetch_command(subcommand).run_from_argv(self.argv)
    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^
  File "/opt/scancodeio/.venv/lib/python3.13/site-packages/django/core/management/base.py", line 413, in run_from_argv
    self.execute(*args, **cmd_options)
    ~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/scancodeio/.venv/lib/python3.13/site-packages/django/core/management/base.py", line 459, in execute
    output = self.handle(*args, **options)
  File "/opt/scancodeio/scanpipe/management/commands/create-project.py", line 42, in handle
    self.create_project(
    ~~~~~~~~~~~~~~~~~~~^
        name=options["name"],
        ^^^^^^^^^^^^^^^^^^^^^
    ...<8 lines>...
        create_global_webhook=not options["no_global_webhook"],
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    )
    ^
  File "/opt/scancodeio/scanpipe/management/commands/__init__.py", line 554, in create_project
    return create_project(
        name=name,
    ...<9 lines>...
        command=self,
    )
  File "/opt/scancodeio/scanpipe/management/commands/__init__.py", line 457, in create_project
    project.full_clean(exclude=["slug"])
    ~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^
  File "/opt/scancodeio/.venv/lib/python3.13/site-packages/django/db/models/base.py", line 1636, in full_clean
    self.validate_unique(exclude=exclude)
    ~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^
  File "/opt/scancodeio/.venv/lib/python3.13/site-packages/django/db/models/base.py", line 1378, in validate_unique
    errors = self._perform_unique_checks(unique_checks)
  File "/opt/scancodeio/.venv/lib/python3.13/site-packages/django/db/models/base.py", line 1488, in _perform_unique_checks
    if qs.exists():
       ~~~~~~~~~^^
  File "/opt/scancodeio/.venv/lib/python3.13/site-packages/django/db/models/query.py", line 1288, in exists
    return self.query.has_results(using=self.db)
           ~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^
  File "/opt/scancodeio/.venv/lib/python3.13/site-packages/django/db/models/sql/query.py", line 660, in has_results
    return compiler.has_results()
           ~~~~~~~~~~~~~~~~~~~~^^
  File "/opt/scancodeio/.venv/lib/python3.13/site-packages/django/db/models/sql/compiler.py", line 1542, in has_results
    return bool(self.execute_sql(SINGLE))
                ~~~~~~~~~~~~~~~~^^^^^^^^
  File "/opt/scancodeio/.venv/lib/python3.13/site-packages/django/db/models/sql/compiler.py", line 1572, in execute_sql
    cursor = self.connection.cursor()
  File "/opt/scancodeio/.venv/lib/python3.13/site-packages/django/utils/asyncio.py", line 26, in inner
    return func(*args, **kwargs)
  File "/opt/scancodeio/.venv/lib/python3.13/site-packages/django/db/backends/base/base.py", line 319, in cursor
    return self._cursor()
           ~~~~~~~~~~~~^^
  File "/opt/scancodeio/.venv/lib/python3.13/site-packages/django/db/backends/base/base.py", line 295, in _cursor
    self.ensure_connection()
    ~~~~~~~~~~~~~~~~~~~~~~^^
  File "/opt/scancodeio/.venv/lib/python3.13/site-packages/django/utils/asyncio.py", line 26, in inner
    return func(*args, **kwargs)
  File "/opt/scancodeio/.venv/lib/python3.13/site-packages/django/db/backends/base/base.py", line 277, in ensure_connection
    with self.wrap_database_errors:
         ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/scancodeio/.venv/lib/python3.13/site-packages/django/db/utils.py", line 91, in __exit__
    raise dj_exc_value.with_traceback(traceback) from exc_value
  File "/opt/scancodeio/.venv/lib/python3.13/site-packages/django/db/backends/base/base.py", line 278, in ensure_connection
    self.connect()
    ~~~~~~~~~~~~^^
  File "/opt/scancodeio/.venv/lib/python3.13/site-packages/django/utils/asyncio.py", line 26, in inner
    return func(*args, **kwargs)
  File "/opt/scancodeio/.venv/lib/python3.13/site-packages/django/db/backends/base/base.py", line 255, in connect
    self.connection = self.get_new_connection(conn_params)
                      ~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^
  File "/opt/scancodeio/.venv/lib/python3.13/site-packages/django/utils/asyncio.py", line 26, in inner
    return func(*args, **kwargs)
  File "/opt/scancodeio/.venv/lib/python3.13/site-packages/django/db/backends/postgresql/base.py", line 332, in get_new_connection
    connection = self.Database.connect(**conn_params)
  File "/opt/scancodeio/.venv/lib/python3.13/site-packages/psycopg/connection.py", line 118, in connect
    raise last_ex.with_traceback(None)
django.db.utils.OperationalError: connection timeout expired
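
The traceback only says that the connection attempt timed out; it does not say why. One way to narrow this down would be a plain TCP reachability check against the database host and port from inside the worker container. A minimal sketch, where `is_port_open` is a hypothetical helper and the `db` host name and port 5432 are assumptions based on the compose file above:

```python
import socket


def is_port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False


if __name__ == "__main__":
    # Assumed values: the compose service is named "db", Postgres default port.
    print(is_port_open("db", 5432))
```

A successful TCP connection only shows that the port is reachable, not that Postgres is accepting queries; the `pg_isready` healthcheck in the compose file covers that part.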

@TG1999
Contributor Author

TG1999 commented Oct 23, 2025

@TG1999 TG1999 marked this pull request as draft October 23, 2025 10:50
@tdruez
Contributor

tdruez commented Oct 24, 2025

@TG1999 #1916 has been merged and released: https://github.com/aboutcode-org/scancode.io/releases/tag/v35.4.1

Documented at https://scancodeio.readthedocs.io/en/latest/quickstart.html#use-postgresql-for-better-performance

Pull the latest ScanCode.io Docker image

docker pull ghcr.io/aboutcode-org/scancode.io:latest

Start a PostgreSQL Database Service

docker run -d \
  --name scancodeio-run-db \
  -e POSTGRES_DB=scancodeio \
  -e POSTGRES_USER=scancodeio \
  -e POSTGRES_PASSWORD=scancodeio \
  -e POSTGRES_INITDB_ARGS="--encoding=UTF-8 --lc-collate=en_US.UTF-8 --lc-ctype=en_US.UTF-8" \
  -v scancodeio_pgdata:/var/lib/postgresql/data \
  -p 5432:5432 \
  postgres:17

Stop the service with docker rm -f scancodeio-run-db once done.

Run the map_deploy_to_develop pipeline on remote inputs

FROM_URL=https://github.com/aboutcode-org/scancode.io/raw/refs/heads/main/scanpipe/tests/data/d2d-python/from-intbitset.tar.gz
TO_URL=https://github.com/aboutcode-org/scancode.io/raw/refs/heads/main/scanpipe/tests/data/d2d-python/to-intbitset.whl

docker run --rm \
  --network host \
  -e SCANCODEIO_NO_AUTO_DB=1 \
  ghcr.io/aboutcode-org/scancode.io:latest \
  run map_deploy_to_develop ${FROM_URL}#from,${TO_URL}#to \
  > results.json

Run the map_deploy_to_develop pipeline on local inputs

docker run --rm \
  -v "$(pwd)":/codedrop \
  --network host \
  -e SCANCODEIO_NO_AUTO_DB=1 \
  ghcr.io/aboutcode-org/scancode.io:latest \
  run map_deploy_to_develop:Python intbitset.tar.gz:from,intbitset.whl:to \
  > results.json
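
A wrapper script could assemble these invocations rather than hard-coding them. The following is a sketch only: `build_run_command` is a hypothetical helper, while the image name, flags, and environment variable are copied from the commands above.

```python
import shlex

IMAGE = "ghcr.io/aboutcode-org/scancode.io:latest"


def build_run_command(pipeline, inputs, codedrop=None):
    """Build the `docker run` argv for a one-off pipeline run.

    `codedrop` is the host directory to mount at /codedrop for local inputs;
    leave it as None when the inputs are download URLs.
    """
    cmd = ["docker", "run", "--rm"]
    if codedrop:
        cmd += ["-v", f"{codedrop}:/codedrop"]
    cmd += ["--network", "host", "-e", "SCANCODEIO_NO_AUTO_DB=1"]
    cmd += [IMAGE, "run", pipeline, inputs]
    return cmd


if __name__ == "__main__":
    cmd = build_run_command(
        "map_deploy_to_develop:Python",
        "intbitset.tar.gz:from,intbitset.whl:to",
        codedrop="/path/to/inputs",  # placeholder host directory
    )
    print(shlex.join(cmd))
```

The results can then be captured by running the list with `subprocess.run(cmd, stdout=output_file, check=True)`, which mirrors the `> results.json` redirection.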

@TG1999 TG1999 marked this pull request as ready for review October 27, 2025 09:48
Signed-off-by: Tushar Goel <tushar.goel.dav@gmail.com>

.. code-block:: bash
./run_mapping.sh ./from.tar.gz ./to.whl "" results.txt false
Contributor

Having to provide "" for the options when empty is not ideal.
Options should be passed as an optional --options OPTIONS argument.
Likewise, false should become a --spin-db flag.

Contributor Author

So how should the options be passed, then?

--option Python --option Java

or

--option "Python,Java"

Contributor Author

@TG1999 TG1999 Oct 28, 2025

I have pushed a version that uses --options "Python,Java" for now. Please check.
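
For reference, the interface discussed in this thread could look like the sketch below if written with Python's argparse. The merged script may be structured differently; this only illustrates the `--options "Python,Java"` and `--spin-db` shape. The positional names and the `pipeline_name` helper are hypothetical, and the `pipeline:Group1,Group2` form is assumed from the `map_deploy_to_develop:Python` usage earlier in the conversation.

```python
import argparse


def parse_args(argv=None):
    """Parse a hypothetical d2d wrapper command line."""
    parser = argparse.ArgumentParser(description="Run the d2d mapping (sketch).")
    parser.add_argument("from_path", help="Input for the 'from' side")
    parser.add_argument("to_path", help="Input for the 'to' side")
    parser.add_argument(
        "--options",
        default="",
        help='Comma-separated optional steps, e.g. "Python,Java"',
    )
    parser.add_argument(
        "--spin-db",
        action="store_true",
        help="Start a database service before running",
    )
    return parser.parse_args(argv)


def pipeline_name(options, base="map_deploy_to_develop"):
    """Append the selected options to the pipeline name, if any were given."""
    groups = [group.strip() for group in options.split(",") if group.strip()]
    return f"{base}:{','.join(groups)}" if groups else base


if __name__ == "__main__":
    args = parse_args(["from.tar.gz", "to.whl", "--options", "Python,Java"])
    print(pipeline_name(args.options), args.spin_db)
```

With this shape, omitting `--options` needs no empty `""` placeholder, and `--spin-db` is simply present or absent.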

Signed-off-by: Tushar Goel <tushar.goel.dav@gmail.com>
Contributor

@tdruez tdruez left a comment

One last minor change request; we should be ready to merge after this one.

Signed-off-by: Tushar Goel <tushar.goel.dav@gmail.com>
@tdruez tdruez merged commit c5ec222 into main Oct 28, 2025
15 checks passed
@tdruez tdruez deleted the add_d2d_script branch October 28, 2025 08:26
Member

@AyanSinhaMahapatra AyanSinhaMahapatra left a comment

@TG1999 sorry about the late review, here are some doc improvement suggestions for your consideration. These can be addressed separately.

+-----------------+-------------------------------------------------------------+
| Argument        | Description                                                 |
+=================+=============================================================+
| ``from-path``   | Path to the base deployment/scan file                       |
Member

Here the from and to sides should be more closely aligned with the explanation at https://github.com/aboutcode-org/scancode.io/blob/main/scanpipe/pipelines/deploy_to_develop.py#L42, so it is clear that one is the source side and the other is the deployed side. The base/target naming used here is new and a bit confusing.

+-----------------+-------------------------------------------------------------+
| ``to-path``     | Path to the target deployment/scan file                     |
+-----------------+-------------------------------------------------------------+
| ``options``     | D2D pipeline parameters (can be empty ``""``)               |
Member

options should also be more descriptive, to communicate that these are ecosystem-specific optional steps. Instead of "can be empty", we could state which parameters are optional and which are required.

We should probably also have a reference page on all the ecosystems supported in d2d and the capabilities supported for each, and link to that page. I opened a separate issue for this: #1922

Run ScanCode.io Mapping Script
================================

This script executes the ``map_deploy_to_develop`` mapping workflow from
Member

Insert an RST link to https://github.com/aboutcode-org/scancode.io/blob/main/docs/built-in-pipelines.rst?plain=1#L188; cross-links between related documentation pages are always useful.

Member

Maybe a short description of d2d (similar to the pipeline docstring) would also be useful?
