Commit 675bb2f

Fix

1 parent efdbed0 commit 675bb2f

21 files changed: +168 -169 lines changed

.markdownlint.yaml

Lines changed: 2 additions & 0 deletions
@@ -1,3 +1,5 @@
 MD041: false
 MD013: false
 MD033: false
+MD004:
+  style: dash

docs/concepts/dev/inside.md

Lines changed: 27 additions & 15 deletions
@@ -1,34 +1,46 @@
-Hopsworks provides a complete self-service development environment for feature engineering and model training. You can develop programs as Jupyter notebooks or jobs, customize the bundled FTI (feature, training and inference pipeline) python environments, you can manage your source code with Git, and you can orchestrate jobs with Airflow.
 
-<img src="../../../assets/images/concepts/dev/dev-inside.svg">
+Hopsworks provides a complete self-service development environment for feature engineering and model training.
+You can develop programs as Jupyter notebooks or jobs, customize the bundled FTI (feature, training, and inference pipeline) Python environments, manage your source code with Git, and orchestrate jobs with Airflow.
+
+<img src="../../../assets/images/concepts/dev/dev-inside.svg" alt="Hopsworks Development Environment" />
 
 ### Jupyter Notebooks
 
-Hopsworks provides a Jupyter notebook development environment for programs written in Python, Spark, Flink, and SparkSQL. You can also develop in your IDE (PyCharm, IntelliJ, etc), test locally, and then run your programs as Jobs in Hopsworks. Jupyter notebooks can also be run as Jobs.
+Hopsworks provides a Jupyter notebook development environment for programs written in Python, Spark, Flink, and SparkSQL.
+You can also develop in your IDE (PyCharm, IntelliJ, etc.), test locally, and then run your programs as Jobs in Hopsworks.
+Jupyter notebooks can also be run as Jobs.
 
 ### Source Code Control
 
-Hopsworks provides source code control support using Git (GitHub, GitLab or BitBucket). You can securely checkout code into your project and commit and push updates to your code to your source code repository.
+Hopsworks provides source code control support using Git (GitHub, GitLab, or BitBucket).
+You can securely check out code into your project, then commit and push updates to your source code repository.
 
 ### FTI Pipeline Environments
 
-Hopsworks postulates that building ML systems following the FTI pipeline architecture is best practice. This architecture consists of three independently developed and operated ML pipelines:
+Hopsworks postulates that building ML systems following the FTI pipeline architecture is best practice.
+This architecture consists of three independently developed and operated ML pipelines:
 
-* Feature pipeline: takes as input raw data that it transforms into features (and labels)
-* Training pipeline: takes as input features (and labels) and outputs a trained model
-* Inference pipeline: takes new feature data and a trained model and makes predictions
+- Feature pipeline: takes as input raw data that it transforms into features (and labels)
+- Training pipeline: takes as input features (and labels) and outputs a trained model
+- Inference pipeline: takes new feature data and a trained model and makes predictions
 
-In order to facilitate the development of these pipelines Hopsworks bundles several python environments containing necessary dependencies. Each of these environments may then also be customized further by cloning it and installing additional dependencies from PyPi, Conda channels, Wheel files, GitHub repos or a custom Dockerfile. Internal compute such as Jobs and Jupyter is run in one of these environments and changes are applied transparently when you install new libraries using our APIs. That is, there is no need to write a Dockerfile, users install libraries directly in one or more of the environments. You can setup custom development and production environments by creating separate projects or creating multiple clones of an environment within the same project.
+To facilitate the development of these pipelines, Hopsworks bundles several Python environments containing the necessary dependencies.
+Each of these environments can be customized further by cloning it and installing additional dependencies from PyPI, Conda channels, Wheel files, GitHub repos, or a custom Dockerfile.
+Internal compute such as Jobs and Jupyter runs in one of these environments, and changes are applied transparently when you install new libraries using our APIs.
+That is, there is no need to write a Dockerfile; users install libraries directly in one or more of the environments.
+You can set up custom development and production environments by creating separate projects or by creating multiple clones of an environment within the same project.
 
 ### Jobs
 
-In Hopsworks, a Job is a schedulable program that is allocated compute and memory resources. You can run a Job in Hopsworks:
+In Hopsworks, a Job is a schedulable program that is allocated compute and memory resources.
+You can run a Job in Hopsworks:
 
-* From the UI
-* Programmatically with the Hopsworks SDK (Python, Java) or REST API
-* From Airflow programs (either inside our outside Hopsworks)
-* From your IDE using a plugin ([PyCharm/IntelliJ plugin](https://plugins.jetbrains.com/plugin/15537-hopsworks))
+- From the UI
+- Programmatically with the Hopsworks SDK (Python, Java) or REST API
+- From Airflow programs (either inside or outside Hopsworks)
+- From your IDE using a plugin ([PyCharm/IntelliJ plugin](https://plugins.jetbrains.com/plugin/15537-hopsworks))
 
 ### Orchestration
 
-Airflow comes out-of-the box with Hopsworks, but you can also use an external Airflow cluster (with the Hopsworks Job operator) if you have one. Airflow can be used to schedule the execution of Jobs, individually or as part of Airflow DAGs.
+Airflow comes out-of-the-box with Hopsworks, but you can also use an external Airflow cluster (with the Hopsworks Job operator) if you have one.
+Airflow can be used to schedule the execution of Jobs, individually or as part of Airflow DAGs.
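
Editor's note on the environments paragraph in this hunk: below is a minimal Python sketch of customizing a bundled environment through the Hopsworks client, with no Dockerfile involved. The accessor and method names (`get_environment_api`, `get_environment`, `install_requirements`) and the environment and file names are assumptions for illustration; verify the exact calls against the Hopsworks API reference.

```python
import hopsworks

# Log in to the Hopsworks cluster; the API key is read from the
# environment or prompted for interactively.
project = hopsworks.login()

# Hypothetical accessor for the project's Python environments.
env_api = project.get_environment_api()

# Fetch one of the bundled FTI environments (name is illustrative) and
# install extra PyPI dependencies into it; the change is applied
# transparently to Jobs and Jupyter sessions using this environment.
env = env_api.get_environment("python-feature-pipeline")
env.install_requirements("Resources/requirements.txt")
```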

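And to make the "Programmatically with the Hopsworks SDK" bullet concrete, a hedged sketch of creating and running a Job from Python; treat `get_jobs_api`, `get_configuration`, and `create_job` as assumed names to verify against the SDK documentation, and the script path as illustrative.

```python
import hopsworks

project = hopsworks.login()
jobs_api = project.get_jobs_api()  # assumed accessor name

# Start from a default Python job configuration and point it at a script
# already uploaded to the project filesystem (path is illustrative).
config = jobs_api.get_configuration("PYTHON")
config["appPath"] = "/Projects/my_project/Resources/feature_pipeline.py"

job = jobs_api.create_job("feature_pipeline_job", config)

# Run the job and block until the execution terminates; the same job can
# also be scheduled from Airflow or triggered over the REST API.
execution = job.run(await_termination=True)
```
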
docs/concepts/fs/feature_view/offline_api.md

Lines changed: 8 additions & 8 deletions
@@ -1,19 +1,19 @@
 The feature view provides an *Offline API* for
 
-* creating training data
-* creating batch (scoring) data
+- creating training data
+- creating batch (scoring) data
 
 ## Training Data
 
 Training data is created using a feature view. You can create training data as either:
 
-* in-memory Pandas/Polars DataFrames, useful when you have a small amount of training data;
-* materialized training data in files, in a file format of your choice (such as .tfrecord, .csv, or .parquet).
+- in-memory Pandas/Polars DataFrames, useful when you have a small amount of training data;
+- materialized training data in files, in a file format of your choice (such as .tfrecord, .csv, or .parquet).
 
 You can apply filters when creating training data from a feature view:
 
-* start-time and end-time, for example, to create the train-set from an earlier time range, and the test-set from a later (unseen) time range;
-* feature value features, for example, only train a model on customers from a particular country.
+- start-time and end-time, for example, to create the train-set from an earlier time range and the test-set from a later (unseen) time range;
+- feature value filters, for example, to only train a model on customers from a particular country.
 
 Note that filters are not applied when retrieving feature vectors using feature views, as we only look up features for a specific entity, like a customer. In this case, the application should know that predictions for this customer should be made on the model trained on customers in the USA, for example.

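For reference against the training-data options and filters described in this hunk, here is a rough HSFS sketch; the feature view name, dates, and exact method signatures (`train_test_split`, `create_train_test_split`) are assumptions to verify against the HSFS documentation.

```python
import hopsworks

project = hopsworks.login()
fs = project.get_feature_store()
feature_view = fs.get_feature_view("transactions_view", version=1)  # illustrative name

# Option 1: in-memory DataFrames, for small training sets.
X_train, X_test, y_train, y_test = feature_view.train_test_split(test_size=0.2)

# Option 2: materialized training data with time-range filters, so the
# test set covers a later (unseen) range. Dates are illustrative.
td_version, materialization_job = feature_view.create_train_test_split(
    train_start="2024-01-01",
    train_end="2024-06-30",
    test_start="2024-07-01",
    test_end="2024-09-30",
    data_format="csv",
)
```
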
@@ -44,8 +44,8 @@ Test data can also be split into evaluation sets to help evaluate a model for po
 
 Batch data for scoring models is created using a feature view. Similar to training data, you can create batch data as either:
 
-* in-memory Pandas/Polars DataFrames, useful when you have a small amount of data to score;
-* materialized data in files, in a file format of your choice (such as .tfrecord, .csv, or .parquet)
+- in-memory Pandas/Polars DataFrames, useful when you have a small amount of data to score;
+- materialized data in files, in a file format of your choice (such as .tfrecord, .csv, or .parquet).
 
 Batch data requires specification of a `start_time` for the start of the batch scoring data. You can also specify the `end_time` (default is the current date).

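A corresponding sketch for batch scoring data, again with illustrative names and dates; `get_batch_data` with `start_time`/`end_time` follows the behavior described above, but verify the signature in the HSFS documentation.

```python
import hopsworks

project = hopsworks.login()
fs = project.get_feature_store()
feature_view = fs.get_feature_view("transactions_view", version=1)

# Fetch batch scoring data for a time window; end_time is optional and
# defaults to the current date.
batch_df = feature_view.get_batch_data(
    start_time="2024-10-01",
    end_time="2024-10-07",
)

# predictions = model.predict(batch_df)  # 'model' is a hypothetical trained model
```
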
docs/concepts/fs/feature_view/online_api.md

Lines changed: 1 addition & 1 deletion
@@ -8,4 +8,4 @@ A feature vector is a row of features (without the primary key(s) and event time
 
 It may be the case that for any given feature vector, not all features will come pre-engineered from the feature store. Some features will be provided by the client (or at least the raw data to compute the feature will come from the client). We call these 'passed' features and, similar to precomputed features from the feature store, they can also be transformed by the HSFS client in the method:
 
-* feature_view.get_feature_vector(entry, passed_features={...})
+- feature_view.get_feature_vector(entry, passed_features={...})
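
Since the hunk above names the method directly, a short usage sketch; the feature view name, entity key, and passed feature are illustrative.

```python
import hopsworks

project = hopsworks.login()
fs = project.get_feature_store()
feature_view = fs.get_feature_view("transactions_view", version=1)

# Look up the precomputed features for one entity by primary key, and
# supply the 'passed' feature value provided by the client at request time.
vector = feature_view.get_feature_vector(
    entry={"customer_id": 42},           # primary key of the entity
    passed_features={"amount": 310.5},   # client-supplied feature
)
```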

docs/concepts/mlops/prediction_services.md

Lines changed: 4 additions & 4 deletions
@@ -2,10 +2,10 @@ A prediction service is an end-to-end analytical or operational machine learning
 
 A prediction service consists of the following components:
 
-* feature pipeline(s),
-* training pipeline,
-* inference pipeline (for either batch predictions or online predictions)
-* a sink for predictions - either a store or a user-interface.
+- feature pipeline(s),
+- training pipeline,
+- inference pipeline (for either batch predictions or online predictions),
+- a sink for predictions - either a store or a user interface.
 
 ## Analytical ML

docs/concepts/projects/cicd.md

Lines changed: 4 additions & 4 deletions
@@ -11,10 +11,10 @@ You can create dev, staging, and prod projects - either on the same cluster, but
 
 Hopsworks supports the versioning of ML assets, including:
 
-* Feature Groups: the version of its schema - breaking schema changes require a new version and backfilling the new version;
-* Feature Views: the version of its schema, and breaking schema changes only require a new version;
-* Models: the version of a model;
-* Deployments: the version of the deployment of a model - a model with the same version can be found in >1 deployment.
+- Feature Groups: the version of its schema - breaking schema changes require a new version and backfilling the new version;
+- Feature Views: the version of its schema, and breaking schema changes only require a new version;
+- Models: the version of a model;
+- Deployments: the version of the deployment of a model - a model with the same version can be found in >1 deployment.
 
 ## Pytest for feature logic and feature pipeline tests

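To illustrate the Feature Group bullet above - a breaking schema change means registering a new version and backfilling it - here is a hedged HSFS sketch with an illustrative name and schema.

```python
import hopsworks
import pandas as pd

project = hopsworks.login()
fs = project.get_feature_store()

# A breaking schema change: register version 2 of the feature group...
fg_v2 = fs.get_or_create_feature_group(
    name="transactions",
    version=2,  # bumped for the breaking change
    primary_key=["customer_id"],
    event_time="event_ts",
)

# ...and backfill it from a recomputed DataFrame matching the new schema.
backfill_df = pd.DataFrame({
    "customer_id": [1, 2],
    "event_ts": pd.to_datetime(["2024-01-01", "2024-01-02"]),
    "amount_usd": [10.0, 25.5],  # e.g., a renamed/retyped column
})
fg_v2.insert(backfill_df)
```
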
docs/concepts/projects/search.md

Lines changed: 4 additions & 4 deletions
@@ -6,10 +6,10 @@ description: "Documentation on the Hopsworks capabilities to discover machine-le
 
 Hopsworks supports free-text search to discover machine-learning assets:
 
-* features
-* feature groups
-* feature views
-* training data
+- features
+- feature groups
+- feature views
+- training data
 
 You can use the search bar at the top of your project to free-text search for the names or descriptions of any ML asset. You can also search using keywords or tags that are attached to an ML asset.

docs/concepts/projects/storage.md

Lines changed: 7 additions & 7 deletions
@@ -1,12 +1,12 @@
 Every project in Hopsworks has its own private assets:
 
-* a Feature Store (including both Online and Offline Stores)
-* a Filesystem subtree (all directory and files under /Projects/<project_name>/)
-* a Model Registry
-* Model Deployments
-* Kafka topics
-* OpenSearch indexes (including KNN indexes - the vector DB)
-* a Hive Database
+- a Feature Store (including both Online and Offline Stores)
+- a Filesystem subtree (all directories and files under /Projects/<project_name>/)
+- a Model Registry
+- Model Deployments
+- Kafka topics
+- OpenSearch indexes (including KNN indexes - the vector DB)
+- a Hive Database
 
 Access to these assets is controlled using project membership ACLs (access-control lists). Users in a project who have a *Data Owner* role have read/write access to these assets. Users in a project who have a *Data Scientist* role have mostly read-only access to these assets, with the exception of the ability to write to well-known directories (Resources, Jupyter, Logs).

docs/setup_installation/admin/ha-dr/dr.md

Lines changed: 2 additions & 2 deletions
@@ -6,8 +6,8 @@ The state of the Hopsworks cluster is divided into data and metadata and distrib
 
 The following services contain critical state that should be backed up:
 
-* **RonDB**: as mentioned above, the RonDB is used by Hopsworks to store the cluster metadata as well as the data for the online feature store.
-* **HopsFS**: HopsFS stores the data for the batch feature store as well as checkpoints and logs for feature engineering applications.
+- **RonDB**: as mentioned above, RonDB is used by Hopsworks to store the cluster metadata as well as the data for the online feature store.
+- **HopsFS**: HopsFS stores the data for the batch feature store as well as checkpoints and logs for feature engineering applications.
 
 Backing up service/application metrics and service/application logs is out of the scope of this guide. By default, metrics and logs are rotated after 7 days. Application logs are available on HopsFS when the application has finished and, as such, are backed up with the rest of HopsFS’ data.

docs/setup_installation/admin/ha-dr/ha.md

Lines changed: 4 additions & 4 deletions
@@ -2,13 +2,13 @@
 
 At a high level, a Hopsworks cluster can be divided into 4 groups of nodes. Each node group should be deployed according to the requirements (e.g., 3/5/7 nodes for the head node group) to guarantee the availability of the components.
 
-* **Head nodes**: The head node is responsible for running all the metadata, public API, and user interface services that are required for Hopsworks to provide its functionality. They need to be deployed in an odd number (1, 3, 5) as the head nodes run services like Zookeeper and OpenSearch which enforce consistency through quorum based protocols. The head nodes are also responsible for managing the services running on the remaining group of nodes.
-* **Worker nodes**: The worker node is responsible for executing the feature engineering pipeline code as well as storing the data for the offline feature store (HopsFS). In an on-prem deployment, the data is stored and replicated on the workers’ local hard drives. By default the data is replicated across 3 workers. In a cloud deployment, HopsFS’ data is persisted in a cloud object store (Amazon S3, Azure Blob Storage, Google Cloud Blob Storage) and the HopsFS datanodes are responsible for persisting, retrieving and caching of blocks from the object store.
-* **RonDB Data nodes**:
+- **Head nodes**: Head nodes are responsible for running all the metadata, public API, and user interface services that are required for Hopsworks to provide its functionality. They need to be deployed in an odd number (1, 3, 5), as the head nodes run services like ZooKeeper and OpenSearch, which enforce consistency through quorum-based protocols. The head nodes are also responsible for managing the services running on the remaining groups of nodes.
+- **Worker nodes**: Worker nodes are responsible for executing the feature engineering pipeline code as well as storing the data for the offline feature store (HopsFS). In an on-prem deployment, the data is stored and replicated on the workers’ local hard drives. By default, the data is replicated across 3 workers. In a cloud deployment, HopsFS’ data is persisted in a cloud object store (Amazon S3, Azure Blob Storage, Google Cloud Blob Storage) and the HopsFS datanodes are responsible for persisting, retrieving, and caching blocks from the object store.
+- **RonDB Data nodes**:
 These nodes are responsible for storing the services’ metadata (Hopsworks, HopsFS, Hive Metastore, Airflow) as well as the data for the online feature store.
 For high availability, at least two data nodes should be deployed, and RonDB is typically configured with a replication factor of 2, as it uses synchronous replication with 2-phase commit, not a quorum-based replication protocol.
 More advanced deployment patterns and best practices are covered in the [RonDB documentation](https://docs.rondb.com).
-* **Query brokers**: The query brokers are the entry point for querying the online feature store. They handle authentication, authorization and execution of the requests for online feature data being submitted from the feature store APIs. At least two query brokers should be deployed to achieve high availability. Query brokers are stateless. Additional query brokers should be deployed to handle additional load and clients.
+- **Query brokers**: The query brokers are the entry point for querying the online feature store. They handle authentication, authorization, and execution of the requests for online feature data submitted from the feature store APIs. At least two query brokers should be deployed to achieve high availability. Query brokers are stateless; additional query brokers should be deployed to handle additional load and clients.
 
 Example deployment:
