
Commit 58386ba

Place a sentence per line
That is, replace ([^0-9])\. ([A-Z]) with $1.\n$2 plus process exceptions.
1 parent 675bb2f commit 58386ba

182 files changed: +3193 -1663 lines changed


docs/concepts/dev/outside.md

Lines changed: 5 additions & 2 deletions
@@ -1,5 +1,8 @@
-You can write programs that use Hopsworks in any [Python, Spark, PySpark, or Flink environment](../../user_guides/integrations/index.md). Hopsworks also running SQL queries to compute features in external data warehouses. The Feature Store can also be queried with SQL.
+You can write programs that use Hopsworks in any [Python, Spark, PySpark, or Flink environment](../../user_guides/integrations/index.md).
+Hopsworks also supports running SQL queries to compute features in external data warehouses.
+The Feature Store can also be queried with SQL.

-There is REST API for Hopsworks that can be used with a valid API key, generated in Hopsworks. However, it is often easier to develop your programs against SDKs available in Python and Java/Scala for HSFS, in Python for HSML, and in Python for the Hopsworks API.
+There is a REST API for Hopsworks that can be used with a valid API key, generated in Hopsworks.
+However, it is often easier to develop your programs against SDKs available in Python and Java/Scala for HSFS, in Python for HSML, and in Python for the Hopsworks API.

<img src="../../../assets/images/concepts/dev/dev-outside.svg">
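To make the SDK route concrete, here is a minimal sketch of connecting from an external Python environment with an API key, using the `hopsworks` SDK's login flow; the host and project name are hypothetical placeholders.

```python
import hopsworks

# Connect from an external environment with an API key generated in Hopsworks.
# The host, project, and key value below are placeholders.
project = hopsworks.login(
    host="my-cluster.hopsworks.ai",
    project="fraud_demo",
    api_key_value="MY_GENERATED_API_KEY",
)
fs = project.get_feature_store()  # entry point to the HSFS SDK
```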

docs/concepts/fs/feature_group/external_fg.md

Lines changed: 5 additions & 1 deletion
@@ -1,4 +1,8 @@
-External feature groups are offline feature groups where their data is stored in an external table. An external table requires a data source, defined with the Connector API (or more typically in the user interface), to enable HSFS to retrieve data from the external table. An external feature group doesn't allow for offline data ingestion or modification; instead, it includes a user-defined SQL string for retrieving data. You can also perform SQL operations, including projections, aggregations, and so on. The SQL query is executed on-demand when HSFS retrieves data from the external Feature Group, for example, when creating training data using features in the external table.
+External feature groups are offline feature groups whose data is stored in an external table.
+An external table requires a data source, defined with the Connector API (or, more typically, in the user interface), to enable HSFS to retrieve data from the external table.
+An external feature group doesn't allow for offline data ingestion or modification; instead, it includes a user-defined SQL string for retrieving data.
+You can also perform SQL operations, including projections, aggregations, and so on.
+The SQL query is executed on-demand when HSFS retrieves data from the external Feature Group, for example, when creating training data using features in the external table.

In the image below, we can see that HSFS currently supports a large number of data sources, including any JDBC-enabled source, Snowflake, Data Lake, Redshift, BigQuery, S3, ADLS, GCS, RDS, and Kafka.
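As an illustration, a minimal hsfs-style sketch of defining an external feature group over a Snowflake table; the connector name, table, and columns are hypothetical, and method names may differ between hsfs versions.

```python
import hsfs

connection = hsfs.connection()  # assumes a configured Hopsworks connection
fs = connection.get_feature_store()

# Data source defined earlier via the Connector API or the UI (hypothetical name).
snowflake = fs.get_storage_connector("snowflake_fraud")

# The SQL string is executed on-demand whenever this feature group is read.
external_fg = fs.create_external_feature_group(
    name="transactions_external",
    version=1,
    storage_connector=snowflake,
    query="SELECT tid, cc_num, amount FROM transactions",
    primary_key=["tid"],
)
external_fg.save()
```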

docs/concepts/fs/feature_group/feature_monitoring.md

Lines changed: 2 additions & 1 deletion
@@ -8,7 +8,8 @@ HSFS supports monitoring features on your Feature Group by:
## Scheduled Statistics

-After creating a Feature Group in HSFS, you can setup statistics monitoring to compute statistics over one or more features on a scheduled basis. Statistics are computed on the whole or a subset of feature data (i.e., detection window) already inserted into the Feature Group.
+After creating a Feature Group in HSFS, you can set up statistics monitoring to compute statistics over one or more features on a scheduled basis.
+Statistics are computed on all of the feature data already inserted into the Feature Group, or on a subset of it (i.e., a detection window).
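As a rough illustration, scheduled statistics monitoring with a detection window might be configured as below; the method names, cron format, and window options are assumptions; consult the feature monitoring user guide for the exact API.

```python
# Sketch only: assumes a feature group handle `fg` and an hsfs version with
# feature monitoring support; names and arguments are assumptions.
fg.create_statistics_monitoring(
    name="amount_daily_stats",
    feature_name="amount",             # monitor a single feature
    cron_expression="0 0 12 ? * * *",  # compute statistics daily at noon
).with_detection_window(
    time_offset="1w",                  # statistics over the last week of data
).save()
```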

## Statistics Comparison

docs/concepts/fs/feature_group/feature_pipelines.md

Lines changed: 32 additions & 10 deletions
@@ -1,46 +1,68 @@
-A feature pipeline is a program that orchestrates the execution of a dataflow graph of data validation, aggregation, dimensionality reduction, transformation, and other feature engineering steps on input data to create and/or update feature data. With HSFS, you can write feature pipelines in different languages as shown in the figure below.
+A feature pipeline is a program that orchestrates the execution of a dataflow graph of data validation, aggregation, dimensionality reduction, transformation, and other feature engineering steps on input data to create and/or update feature data.
+With HSFS, you can write feature pipelines in different languages, as shown in the figure below.

<img src="../../../../assets/images/concepts/fs/feature-pipelines.svg">
### Data Sources

-Your feature pipeline needs to connect to some (external) data source to read the data to be processed. Python, Spark, and Flink have connectors to a huge number of different data sources, while SQL feature pipelines are often restricted to a single data source (for example, your connector to SnowFlake only runs SQL on SnowFlake). SparkSQL, in contrast, can be used over tables that originate in different data sources.
+Your feature pipeline needs to connect to some (external) data source to read the data to be processed.
+Python, Spark, and Flink have connectors to a huge number of different data sources, while SQL feature pipelines are often restricted to a single data source (for example, your connector to Snowflake only runs SQL on Snowflake).
+SparkSQL, in contrast, can be used over tables that originate in different data sources.
### Data Validation

-In order to be able to train and serve models that you can rely on, you need clean, high quality features. Data validation operations include removing bad data, removing or imputing missing values, and identifying problems such as feature shift. HSFS supports Great Expectations to specify data validation rules that are executed in the client before features are written to the Feature Store. The validation results are collected and shown in Hopsworks.
+In order to train and serve models that you can rely on, you need clean, high-quality features.
+Data validation operations include removing bad data, removing or imputing missing values, and identifying problems such as feature shift.
+HSFS supports Great Expectations to specify data validation rules that are executed in the client before features are written to the Feature Store.
+The validation results are collected and shown in Hopsworks.
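For instance, a minimal sketch of attaching a Great Expectations suite to a feature group, assuming a feature group handle `fg` and the classic Great Expectations Pandas API:

```python
import great_expectations as ge
import pandas as pd

df = pd.DataFrame({"tid": [1, 2], "amount": [10.0, 25.0]})

# Build an expectation suite from a sample of the data.
ge_df = ge.from_pandas(df)
ge_df.expect_column_values_to_be_not_null("tid")
ge_df.expect_column_values_to_be_between("amount", min_value=0, max_value=100000)
suite = ge_df.get_expectation_suite()

# Attach the suite so expectations run in the client before every write;
# with a STRICT policy, inserts that fail validation are rejected (sketch only).
fg.save_expectation_suite(suite, validation_ingestion_policy="STRICT")
fg.insert(df)
```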

### Aggregations

-Aggregations are used to summarize large datasets into more concise, signal-rich features. Popular aggregations include count(), sum(), mean(), median(), stddev(), min(), and max(). These aggregations produce a single number (a numerical feature) that captures information about a potentially large dataset. Both numerical and categorical features are often transformed before being used to train or serve models.
+Aggregations are used to summarize large datasets into more concise, signal-rich features.
+Popular aggregations include count(), sum(), mean(), median(), stddev(), min(), and max().
+These aggregations produce a single number (a numerical feature) that captures information about a potentially large dataset.
+Both numerical and categorical features are often transformed before being used to train or serve models.
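For example, a minimal Pandas sketch that collapses a transaction log into one signal-rich row per card (column names are illustrative):

```python
import pandas as pd

txns = pd.DataFrame({
    "cc_num": [111, 111, 222, 222, 222],
    "amount": [10.0, 25.0, 7.5, 80.0, 3.2],
})

# One row per card: each aggregate becomes a single numerical feature.
features = (
    txns.groupby("cc_num")["amount"]
        .agg(["count", "sum", "mean", "max"])
        .reset_index()
)
print(features)
```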

### Dimensionality Reduction

-If input data is impractically large or if it has a significant amount of redundancy, it can often be transformed into a reduced set of features with dimensionality reduction (often called feature extraction). Popular dimensionality algorithms include embedding algorithms, PCA, and TSNE.
+If input data is impractically large or has a significant amount of redundancy, it can often be transformed into a reduced set of features with dimensionality reduction (often called feature extraction).
+Popular dimensionality reduction algorithms include embedding algorithms, PCA, and t-SNE.
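As a quick sketch, scikit-learn's PCA compressing 100 partially redundant columns into 10 components (shapes are illustrative):

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 100))   # 100 partially redundant raw columns

pca = PCA(n_components=10)         # keep the 10 strongest components
X_reduced = pca.fit_transform(X)   # shape: (1000, 10)
print(X_reduced.shape, pca.explained_variance_ratio_.sum())
```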

### Transformations

-Transformations are covered in more detail in [training/inference pipelines](../feature_view/training_inference_pipelines.md), as transformations typically happen after the feature store. If you store transformed features in feature groups, the feature data is no longer useful for EDA (as it near to impossible for Data Scientists to understand the transformed values). It also makes it impossible for inference pipelines to log untransformed feature values and predictions for an operational model. There is one use case for storing transformed features in feature groups - when you need to have ultra low latency when reading precomputed features (and online transformations when reading features add too much latency for your use case). The figure below shows to include transformations in your feature pipelines.
+Transformations are covered in more detail in [training/inference pipelines](../feature_view/training_inference_pipelines.md), as transformations typically happen after the feature store.
+If you store transformed features in feature groups, the feature data is no longer useful for EDA (as it is near impossible for Data Scientists to understand the transformed values).
+It also makes it impossible for inference pipelines to log untransformed feature values and predictions for an operational model.
+There is one use case for storing transformed features in feature groups: when you need ultra-low latency when reading precomputed features (and online transformations when reading features would add too much latency for your use case).
+The figure below shows how to include transformations in your feature pipelines.

<img src="../../../../assets/images/concepts/fs/feature-pipelines-with-transformations.svg">
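For context, a typical model-specific transformation looks like the scikit-learn sketch below; per the text above, this would normally live in the training/inference pipeline rather than in the feature pipeline:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

amounts = np.array([[10.0], [25.0], [7.5], [80.0]])

# Fit on training data only, then reuse the same fitted scaler at inference
# time so training and serving see identically transformed values.
scaler = StandardScaler()
amounts_scaled = scaler.fit_transform(amounts)
```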

### Feature Engineering in Python

-Python is the most widely used framework for feature engineering due to its extensive library support for aggregations (Pandas/Polars), data validation (Great Expectations), and dimensionality reduction (embeddings, PCA), and transformations (in Scikit-Learn, TensorFlow, PyTorch). Python also supports open-source feature engineering frameworks used for automated feature engineering, such as [featuretools](https://www.featuretools.com/) that supports relational and temporal sources.
+Python is the most widely used framework for feature engineering due to its extensive library support for aggregations (Pandas/Polars), data validation (Great Expectations), dimensionality reduction (embeddings, PCA), and transformations (Scikit-Learn, TensorFlow, PyTorch).
+Python also supports open-source frameworks for automated feature engineering, such as [featuretools](https://www.featuretools.com/), which supports relational and temporal sources.
### Feature Engineering in Spark/PySpark

-Spark is popular as a feature engineering framework as it can scale to process larger volumes of data than Python, and provides native support for aggregations, and it supports many of the same data validation (Great Expectations), and dimensionality reduction algorithms (embeddings, PCA) as Python. Spark also has native support for transformations, which are useful for analytical models (batch scoring), but less useful for operational models, where online transformations are required, and Spark environments are less common. Online model serving environments typically only support online transformations in Python.
+Spark is popular as a feature engineering framework because it can scale to process larger volumes of data than Python, provides native support for aggregations, and supports many of the same data validation (Great Expectations) and dimensionality reduction (embeddings, PCA) libraries as Python.
+Spark also has native support for transformations, which are useful for analytical models (batch scoring), but less useful for operational models, where online transformations are required and Spark environments are less common.
+Online model serving environments typically only support online transformations in Python.
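The per-card aggregation from the Pandas sketch above scales out in PySpark, for example (column names again illustrative):

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

txns = spark.createDataFrame(
    [(111, 10.0), (111, 25.0), (222, 7.5)],
    ["cc_num", "amount"],
)

# Distributed aggregation: one feature row per card.
features = txns.groupBy("cc_num").agg(
    F.count("*").alias("txn_count"),
    F.sum("amount").alias("total_spend"),
)
features.show()
```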

### Feature Engineering in SQL

-SQL has grown in popularity for performing heavy lifting in feature pipelines - computing aggregates on data - when the input data already resides in a data warehouse. Data warehouses also support data validation, for example, through Great Expectations in DBT. However, SQL is not mature as a platform for transformations and dimensionality reductions, where UDFs are applied row-wise.
+SQL has grown in popularity for performing the heavy lifting in feature pipelines (computing aggregates on data) when the input data already resides in a data warehouse.
+Data warehouses also support data validation, for example, through Great Expectations in DBT.
+However, SQL is not mature as a platform for transformations and dimensionality reduction, where UDFs are applied row-wise.

You can do aggregation in SQL for data in your data warehouse or database.
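For example, the same aggregation expressed declaratively in SQL, here run through SparkSQL over a temporary view so the sketch stays in Python (table and column names are illustrative):

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
txns = spark.createDataFrame(
    [(111, 10.0), (111, 25.0), (222, 7.5)],
    ["cc_num", "amount"],
)
txns.createOrReplaceTempView("transactions")

features = spark.sql("""
    SELECT cc_num,
           COUNT(*)    AS txn_count,
           SUM(amount) AS total_spend
    FROM transactions
    GROUP BY cc_num
""")
features.show()
```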

### Feature Engineering in Flink

-Apache Flink is a powerful and flexible framework for stateful feature computation operations over unbounded and bounded data streams. It is used for feature engineering when you need very fresh features computed in real-time. Flink provides a rich set of operators and functions such as time windows and aggregation operations that can be applied to keyed and/or global window streams. Flink’s stateful operations allow users to maintain and update state across multiple data records or events, which is particularly useful for feature engineering tasks such as sessionization and/or maintaining rolling aggregates over a sliding window of data.
+Apache Flink is a powerful and flexible framework for stateful feature computation over unbounded and bounded data streams.
+It is used for feature engineering when you need very fresh features computed in real-time.
+Flink provides a rich set of operators and functions, such as time windows and aggregation operations, that can be applied to keyed and/or global window streams.
+Flink’s stateful operations allow users to maintain and update state across multiple data records or events, which is particularly useful for feature engineering tasks such as sessionization and/or maintaining rolling aggregates over a sliding window of data.

Flink feature engineering pipelines are supported in Java/Scala only.

Lines changed: 12 additions & 5 deletions
@@ -1,6 +1,10 @@
-As a programmer, you can consider a feature, in machine learning, to be a variable associated with some entity that contains a value that is useful for helping train a model to solve a prediction problem. That is, the feature is just a variable with predictive power for a machine learning problem, or task.
+As a programmer, you can consider a feature, in machine learning, to be a variable associated with some entity that contains a value that is useful for helping train a model to solve a prediction problem.
+That is, a feature is just a variable with predictive power for a machine learning problem, or task.

-A feature group is a table of features, where each feature group has a primary key, and optionally an event_time column (indicating when the features in that row were observed), and a partition key. Collectively, they are referred to as columns. The partition key determines how to layout the feature group rows on disk such that you can efficiently query the data using queries with the partition key. For example, if your partition key is the day and you have hundreds of days worth of data, with a partition key, you can query the day for only a given day or a range of days, and only the data for those days will be read from disk.
+A feature group is a table of features, where each feature group has a primary key, and optionally an event_time column (indicating when the features in that row were observed) and a partition key.
+Collectively, these are referred to as columns.
+The partition key determines how to lay out the feature group rows on disk so that you can efficiently query the data with queries on the partition key.
+For example, if your partition key is the day and you have hundreds of days' worth of data, you can query the data for only a given day or a range of days, and only the data for those days will be read from disk.

<img src="../../../../assets/images/concepts/fs/feature-group-table.png">
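A minimal hsfs-style sketch of declaring such a feature group (the feature group name, keys, and the surrounding `fs` and `df` handles are illustrative):

```python
# Sketch only: assumes a feature store handle `fs` and a DataFrame `df`
# whose columns include the declared keys.
fg = fs.create_feature_group(
    name="card_transactions_agg",
    version=1,
    primary_key=["cc_num"],          # entity identifier
    event_time="transaction_time",   # when the row's features were observed
    partition_key=["day"],           # on-disk layout for efficient range reads
    online_enabled=True,             # also materialize to the online store
    description="Aggregated card transaction features",
)
fg.insert(df)
```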

@@ -12,10 +16,13 @@ Feature groups can be stored in a low-latency "online" database and/or in low co
#### Online Storage

-The online store stores only the latest values of features for a feature group. It is used to serve pre-computed features to models at runtime.
+The online store stores only the latest values of features for a feature group.
+It is used to serve pre-computed features to models at runtime.

#### Offline Storage

-The offline store stores the historical values of features for a feature group so that it may store much more data than the online store. Offline feature groups are used, typically, to create training data for models, but also to retrieve data for batch scoring of models.
+The offline store stores the historical values of features for a feature group, so it may hold much more data than the online store.
+Offline feature groups are typically used to create training data for models, but also to retrieve data for batch scoring of models.

-In most cases, offline data is stored in Hopsworks, but through the implementation of data sources, it can reside in an external file system. The externally stored data can be managed by Hopsworks by defining ordinary feature groups or it can be used for reading only by defining [External Feature Group](external_fg.md).
+In most cases, offline data is stored in Hopsworks, but through the implementation of data sources, it can reside in an external file system.
+The externally stored data can be managed by Hopsworks by defining ordinary feature groups, or it can be used for reading only by defining an [External Feature Group](external_fg.md).
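To make the online/offline split concrete, a hedged sketch of how each store is typically read with hsfs (the feature view name and key are illustrative):

```python
# Offline store: historical feature data, e.g. for training or batch scoring.
historical_df = fg.read()

# Online store: latest values per primary key, served at low latency.
# A feature view over the group is the usual access path (hsfs-style calls).
fv = fs.create_feature_view(
    name="card_transactions_view",
    version=1,
    query=fg.select_all(),
)
fv.init_serving()
row = fv.get_feature_vector({"cc_num": 111})
```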

docs/concepts/fs/feature_group/fg_statistics.md

Lines changed: 7 additions & 3 deletions
@@ -6,14 +6,18 @@ HSFS supports monitoring, validation, and alerting for features:
### Statistics

-When you create a Feature Group in HSFS, you can configure it to compute statistics over the features inserted into the Feature Group by setting the `statistics_config` dict parameter, see [Feature Group Statistics](../../../../user_guides/fs/feature_group/statistics/) for details. Every time you write to the Feature Group, new statistics will be computed over all of the data in the Feature Group.
+When you create a Feature Group in HSFS, you can configure it to compute statistics over the features inserted into the Feature Group by setting the `statistics_config` dict parameter; see [Feature Group Statistics](../../../../user_guides/fs/feature_group/statistics/) for details.
+Every time you write to the Feature Group, new statistics will be computed over all of the data in the Feature Group.
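For instance, the feature group declaration extended with a `statistics_config`; the keys shown are commonly documented ones, but treat them as assumptions for your hsfs version:

```python
# Sketch only: assumes a feature store handle `fs`.
fg = fs.create_feature_group(
    name="card_transactions_agg",
    version=1,
    primary_key=["cc_num"],
    statistics_config={
        "enabled": True,        # recompute statistics on every write
        "histograms": True,     # per-feature value distributions
        "correlations": True,   # pairwise feature correlations
    },
)
```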

### Data Validation

-You can define expectation suites in Great Expectations and associate them with feature groups. When you write to a feature group, the expectations are executed, then you can define a policy on the feature group for what to do if any expectation fails.
+You can define expectation suites in Great Expectations and associate them with feature groups.
+When you write to a feature group, the expectations are executed; you can define a policy on the feature group for what to do if any expectation fails.

<img src="../../../../assets/images/concepts/fs/fg-expectations.svg">

### Alerting

-HSFS also supports alerts, that can be triggered when there are problems in your feature pipelines, for example, when a write fails due to an error or a failed expectation. You can send alerts to different alerting endpoints, such as email or Slack, that can be configured in the Hopsworks UI. For example, you can send a slack message if features being written to a feature group are missing some input data.
+HSFS also supports alerts that can be triggered when there are problems in your feature pipelines, for example, when a write fails due to an error or a failed expectation.
+You can send alerts to different alerting endpoints, such as email or Slack, which can be configured in the Hopsworks UI.
+For example, you can send a Slack message if features being written to a feature group are missing some input data.
