adlfs telemetric concerns #499

@MABERG13-github

Hi,

I looked around this repo and the main Dask repo for an answer to this question but could not find one, so maybe someone can answer it quickly.

To read CSV or Parquet files from Gen2 storage, you use the following:

import dask.dataframe as dd

storage_options={'account_name': ACCOUNT_NAME, 'account_key': ACCOUNT_KEY}

ddf = dd.read_csv('abfs://{CONTAINER}/{FOLDER}/*.csv', storage_options=storage_options)
ddf = dd.read_parquet('az://{CONTAINER}/folder.parquet', storage_options=storage_options)

To quote your README:

"Operations against the Gen2 Datalake are implemented by leveraging Azure Blob Storage Python SDK."

If I follow the link to the Azure Blob Storage Python SDK and scroll down to the "Data Collection" header, I start to get concerned. To quote that README:

"The software may collect information about you and your use of the software and send it to Microsoft. Microsoft may use this information to provide services and improve our products and services. You may turn off the telemetry as described below. You can learn more about data collection and use in the help documentation and Microsoft’s privacy statement. For more information on the data collected by the Azure SDK, please visit the Telemetry Guidelines page."

How do I turn off this telemetry collection when using dd.read_parquet? The SDK instructions require you to pass a very specific class to the client, and I do not see how to do that through dd.read_parquet. Or is telemetry turned off by default?

I raise this as an issue because the policy at the company I work with would not let me use this library, and in turn Dask, with a telemetry feature like this turned on. So hopefully this can be clarified in the documentation. Or if I'm just dumb and missed this in the existing docs, please point me to the right place :)

With best regards

Marcus
