Description
Hi,
I looked around and tried to find an answer to this question in this repo and the main dask repo, so maybe someone could just answer me quickly.
If you want to read CSV or Parquet files from Gen2 storage, you use the following:
import dask.dataframe as dd

# Note the f-string prefix; without it the literal text "{CONTAINER}" is used
storage_options = {'account_name': ACCOUNT_NAME, 'account_key': ACCOUNT_KEY}
ddf = dd.read_csv(f'abfs://{CONTAINER}/{FOLDER}/*.csv', storage_options=storage_options)
ddf = dd.read_parquet(f'az://{CONTAINER}/folder.parquet', storage_options=storage_options)
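For reference, here is the same pattern written out with concrete (entirely made-up) values, and without the dask call itself, just to show what actually gets passed through. As far as I can tell there is no documented key in storage_options for opting out of the SDK's telemetry, which is exactly my question:

```python
# Hypothetical placeholder values, for illustration only
ACCOUNT_NAME = "myaccount"
ACCOUNT_KEY = "not-a-real-key"
CONTAINER = "mycontainer"

# The dict that dask forwards to the adlfs filesystem; I see no documented
# entry here that would disable the Azure SDK's data collection.
storage_options = {"account_name": ACCOUNT_NAME, "account_key": ACCOUNT_KEY}

# The f-string prefix is required so the placeholders are substituted
path = f"az://{CONTAINER}/folder.parquet"
print(path)  # az://mycontainer/folder.parquet
```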
To quote your README:
"Operations against the Gen2 Datalake are implemented by leveraging Azure Blob Storage Python SDK."
If I go to the link and scroll down to the Data Collection header, I start to get concerned. To quote that README:
"The software may collect information about you and your use of the software and send it to Microsoft. Microsoft may use this information to provide services and improve our products and services. You may turn off the telemetry as described below. You can learn more about data collection and use in the help documentation and Microsoft’s privacy statement. For more information on the data collected by the Azure SDK, please visit the Telemetry Guidelines page."
How do I turn off this telemetry collection when using dd.read_parquet, since it requires you to pass a very specific class to the client? Or is it turned off by default?
I raise this as an issue because the policy at the company I work for would not let me use this library, and in turn the Dask library, with a telemetry feature like this turned on. So hopefully this could be clarified in the documentation. Or, if I'm just dumb and missed it in the existing docs, please point me to the right place :)
With best regards
Marcus