From d0dd9bd3c5a7c5188d3f803a5fd3b6f051a800d3 Mon Sep 17 00:00:00 2001 From: Mamta Wardhani Date: Wed, 21 May 2025 16:08:19 +0530 Subject: [PATCH 1/9] [Edit] SQL: DATEDIFF() --- .../concepts/dates/terms/datediff/datediff.md | 191 ++++++++++++++---- 1 file changed, 154 insertions(+), 37 deletions(-) diff --git a/content/sql/concepts/dates/terms/datediff/datediff.md b/content/sql/concepts/dates/terms/datediff/datediff.md index 66f952e9b2e..9426633b080 100644 --- a/content/sql/concepts/dates/terms/datediff/datediff.md +++ b/content/sql/concepts/dates/terms/datediff/datediff.md @@ -1,72 +1,189 @@ --- Title: 'DATEDIFF()' -Description: 'Calculates and returns the difference between two date values. Available in SQL Server and MySQL.' +Description: 'Calculates the difference between two date or timestamp values and returns the result as an integer.' Subjects: - - 'Data Science' + - 'Computer Science' + - 'Web Development' Tags: - 'Database' - 'Date' - - 'Queries' - - 'MySQL' - - 'SQL Server' + - 'Functions' + - 'SQL' CatalogContent: - 'learn-sql' - 'paths/analyze-data-with-sql' - - 'paths/design-databases-with-postgresql' --- -**`DATEDIFF()`** is a function found in SQL Server and MySQL that calculates and returns the difference between two date values. +The **`DATEDIFF()`** function calculates the difference between two date or timestamp values and returns the result as an integer in a specified unit of time. This powerful function allows developers and analysts to easily measure time intervals between dates, which is essential for reporting, data analysis, and application development. -## SQL Server Syntax +`DATEDIFF()` serves as a cornerstone for date-based calculations in SQL Server, enabling users to perform operations like calculating ages, measuring durations of events, determining time elapsed between transactions, and creating date-based business metrics. Its versatility makes it invaluable for virtually any application that deals with temporal data. 
+ +## Syntax ```pseudo -DATEDIFF(datePart, date1, date2) +DATEDIFF(interval, date1, date2) ``` -The `DATEDIFF()` function in SQL Server has three required parameters: +**Parameters:** + +- `interval`: The time unit in which the difference will be calculated. Valid values include: + - `year`, `yy`, `yyyy`: Years + - `quarter`, `qq`, `q`: Quarters + - `month`, `mm`, `m`: Months + - `dayofyear`, `dy`, `y`: Day of the year + - `day`, `dd`, `d`: Days + - `week`, `wk`, `ww`: Weeks + - `hour`, `hh`: Hours + - `minute`, `mi`, `n`: Minutes + - `second`, `ss`, `s`: Seconds + - `millisecond`, `ms`: Milliseconds + - `microsecond`, `mcs`: Microseconds + - `nanosecond`, `ns`: Nanoseconds +- `date1`: The start date for the calculation. Can be a date, datetime, datetime2, smalldatetime, or time data type, or an expression that resolves to one of these types. +- `date2`: The end date for the calculation. Can be a date, datetime, datetime2, smalldatetime, or time data type, or an expression that resolves to one of these types. + +**Return value:** -- `datePart` is the part of the date to return. It can be one of the following formats: - - Year: `year`, `yyyy`, `yy` - - Quarter: `quarter`, `qq`, `q` - - Week: `week`, `ww`, `wk` - - Weekday: `weekday`, `dw`, `w` - - Second: `second`, `ss`, `s` - - Month: `month`, `mm`, `m` - - Minute: `minute`, `mi`, `n` - - Millisecond: `millisecond`, `ms` - - Hour: `hour`, `hh` - - Day of Year: `dayofyear` - - Day: `day`, `dy`, `y` -- `date1` and `date2` are the dates to compare. It can be in several formats, one being the `yyyy/mm/dd` format. +The `DATEDIFF()` function returns an integer representing the number of time units (specified by the interval parameter) between date1 and date2. 
-### Example 1 +## Example 1: Basic Date Difference Calculation -The following example calculates the difference in months between `2020/05/18` and `2022/05/18`: +This example demonstrates how to calculate the difference between two dates in various time intervals: ```sql -SELECT DATEDIFF(month, '2020/05/18', '2022/05/18'); /* Output: 24 */ +-- Calculate difference between two dates in years, months, and days +SELECT + DATEDIFF(year, '2020-01-15', '2023-09-20') AS YearDiff, + DATEDIFF(month, '2020-01-15', '2023-09-20') AS MonthDiff, + DATEDIFF(day, '2020-01-15', '2023-09-20') AS DayDiff; ``` -### Example 2 +Output produced by this code will be: + +| YearDiff | MonthDiff | DayDiff | +| -------- | --------- | ------- | +| 3 | 44 | 1344 | + +This example calculates the difference between January 15, 2020, and September 20, 2023, in years, months, and days. The results show there are 3 years, 44 months, or 1344 days between these dates. + +## Example 2: Calculating Age in Years -The following example returns the difference in seconds between `2021/09/30 08:22:04` and `2021/09/30 08:25:06`: +This example demonstrates how to use `DATEDIFF()` to calculate a person's age in years from their birthdate. 
```sql -SELECT DATEDIFF(second, '2021/09/30 08:22:04', '2021/09/30 08:25:06'); /* Output: 182 */ +-- Create a sample table with employee data +CREATE TABLE Employees ( + EmployeeID INT PRIMARY KEY, + FirstName VARCHAR(50), + LastName VARCHAR(50), + BirthDate DATE, + HireDate DATE +); + +-- Insert sample data +INSERT INTO Employees (EmployeeID, FirstName, LastName, BirthDate, HireDate) +VALUES + (1, 'John', 'Smith', '1985-06-15', '2010-03-20'), + (2, 'Sarah', 'Johnson', '1992-11-30', '2015-07-10'), + (3, 'Michael', 'Brown', '1978-02-23', '2005-09-15'); + +-- Calculate ages as of current date +SELECT + EmployeeID, + FirstName + ' ' + LastName AS EmployeeName, + BirthDate, + DATEDIFF(year, BirthDate, GETDATE()) AS Age +FROM + Employees +ORDER BY + Age DESC; ``` -## MySQL Syntax +The output generated by this code (when run in 2025) will be: -MySQL only requires two date parameters in the `DATEDIFF()` function and will return the number of days between `date1` and `date2`. +| EmployeeID | EmployeeName | BirthDate | Age | +| ---------- | ------------- | ---------- | --- | +| 3 | Michael Brown | 1978-02-23 | 47 | +| 1 | John Smith | 1985-06-15 | 40 | +| 2 | Sarah Johnson | 1992-11-30 | 33 | -```pseudo -DATEDIFF(date1, date2) -``` +This example shows how to calculate an employee's age by finding the difference in years between their birthdate and the current date. Note that this calculation provides the raw year difference (the current year minus the birth year) and doesn't account for whether the birthday has occurred yet in the current year, so it can overstate an age by one. -### Example +## Example 3: Business Metrics with `DATEDIFF()` -The following example returns the difference in days between `2019-07-05` and `2018-12-24`: +This example demonstrates how to use `DATEDIFF()` for business reporting metrics, such as calculating order processing times and identifying delayed shipments.
```sql -SELECT DATEDIFF("2019-07-05", "2018-12-24"); /* Output: 193 */ +-- Create sample orders table +CREATE TABLE Orders ( + OrderID INT PRIMARY KEY, + CustomerID INT, + OrderDate DATETIME, + ShipDate DATETIME, + DeliveryDate DATETIME +); + +-- Insert sample data +INSERT INTO Orders (OrderID, CustomerID, OrderDate, ShipDate, DeliveryDate) +VALUES + (1001, 101, '2023-01-10 09:30:00', '2023-01-11 14:15:00', '2023-01-15 11:20:00'), + (1002, 102, '2023-01-12 13:45:00', '2023-01-13 10:30:00', '2023-01-14 16:45:00'), + (1003, 103, '2023-01-15 11:20:00', '2023-01-18 09:45:00', '2023-01-22 13:10:00'), + (1004, 104, '2023-01-16 14:55:00', '2023-01-17 16:30:00', '2023-01-21 09:30:00'), + (1005, 105, '2023-01-18 10:15:00', NULL, NULL); + +-- Calculate processing, shipping, and total handling times +SELECT + OrderID, + OrderDate, + ShipDate, + DeliveryDate, + -- Processing time (from order to shipment) + DATEDIFF(hour, OrderDate, ShipDate) AS ProcessingHours, + -- Shipping time (from shipment to delivery) + DATEDIFF(day, ShipDate, DeliveryDate) AS ShippingDays, + -- Total time (from order to delivery) + DATEDIFF(day, OrderDate, DeliveryDate) AS TotalDays, + -- Identify delayed shipments (processing > 24 hours) + CASE + WHEN DATEDIFF(hour, OrderDate, ShipDate) > 24 THEN 'Delayed' + ELSE 'On Time' + END AS ShipmentStatus +FROM + Orders +WHERE + ShipDate IS NOT NULL; ``` + +The output of this code will be: + +| OrderID | OrderDate | ShipDate | DeliveryDate | ProcessingHours | ShippingDays | TotalDays | ShipmentStatus | +| ------- | ------------------- | ------------------- | ------------------- | --------------- | ------------ | --------- | -------------- | +| 1001 | 2023-01-10 09:30:00 | 2023-01-11 14:15:00 | 2023-01-15 11:20:00 | 29 | 4 | 5 | Delayed | +| 1002 | 2023-01-12 13:45:00 | 2023-01-13 10:30:00 | 2023-01-14 16:45:00 | 21 | 1 | 2 | On Time | +| 1003 | 2023-01-15 11:20:00 | 2023-01-18 09:45:00 | 2023-01-22 13:10:00 | 70 | 4 | 7 | Delayed | +| 1004 | 2023-01-16 14:55:00 
| 2023-01-17 16:30:00 | 2023-01-21 09:30:00 | 26 | 4 | 5 | Delayed | + +This example demonstrates how `DATEDIFF()` can be used to calculate important business metrics for order processing. The query calculates the processing time in hours, shipping time in days, and total handling time in days. It also identifies delayed shipments based on processing times exceeding 24 hours. + +## Frequently Asked Questions + +### 1. How to calculate date difference between two dates in SQL? + +In SQL Server, use the `DATEDIFF()` function with an appropriate interval parameter like day, month, or year. For example, `DATEDIFF(day, '2023-01-01', '2023-01-15')` will return 14 days. + +### 2. Does `DATEDIFF()` include both the start and end dates in its calculation? + +`DATEDIFF()` counts the number of interval boundaries crossed between the two dates. For example, when using 'day', it counts the number of midnight boundaries crossed, not the full 24-hour periods. + +### 3. Why does `DATEDIFF(year, '2022-12-31', '2023-01-01')` return 1 even though it's just one day apart? + +Because `DATEDIFF()` counts calendar boundaries, not complete intervals. Since the dates span across a year boundary, it returns 1 year, even though the difference is only one day. + +### 4. Does `DATEDIFF()` take time zones into account? + +No, SQL Server's `DATEDIFF()` does not account for time zones or daylight saving time transitions. All calculations are done in the server's local time zone. + +### 5. Can I use `DATEDIFF()` with time-only values? + +Yes, you can use time data types with `DATEDIFF()`, but only with time-related intervals like second, minute, and hour. Using day or larger intervals with time-only values will always return 0. 
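The boundary-counting behavior described in questions 2 and 3 can be sketched outside the database. The following is a minimal Python illustration, not SQL Server's implementation: `datediff` and `age_in_years` are hypothetical helper names introduced here, and only the `year`, `month`, and `day` intervals are modeled.

```python
from datetime import date


def datediff(interval: str, start: date, end: date) -> int:
    """Count the interval boundaries crossed between start and end (hypothetical helper)."""
    if interval == "year":
        # Only the year numbers matter, not how much time has actually elapsed
        return end.year - start.year
    if interval == "month":
        return (end.year - start.year) * 12 + (end.month - start.month)
    if interval == "day":
        # For pure dates, every midnight crossed is one calendar day
        return (end - start).days
    raise ValueError(f"unsupported interval: {interval}")


def age_in_years(birth: date, today: date) -> int:
    """Raw year difference, minus one if the birthday has not occurred yet this year."""
    years = datediff("year", birth, today)
    if (today.month, today.day) < (birth.month, birth.day):
        years -= 1
    return years


print(datediff("year", date(2022, 12, 31), date(2023, 1, 1)))   # 1
print(datediff("month", date(2020, 1, 15), date(2023, 9, 20)))  # 44
print(datediff("day", date(2020, 1, 15), date(2023, 9, 20)))    # 1344
print(age_in_years(date(1985, 6, 15), date(2025, 5, 21)))       # 39
```

The last call shows why a raw year difference is not the same as an age: the boundary count between 1985 and 2025 is 40, while the person is still 39 until June 15.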
From 9f5c19b395cbdcf9e0abd067a634bc9778ddb33c Mon Sep 17 00:00:00 2001 From: Mamta Wardhani Date: Wed, 21 May 2025 16:09:54 +0530 Subject: [PATCH 2/9] Update datediff.md --- .../concepts/dates/terms/datediff/datediff.md | 191 ++++-------------- 1 file changed, 37 insertions(+), 154 deletions(-) diff --git a/content/sql/concepts/dates/terms/datediff/datediff.md b/content/sql/concepts/dates/terms/datediff/datediff.md index 9426633b080..66f952e9b2e 100644 --- a/content/sql/concepts/dates/terms/datediff/datediff.md +++ b/content/sql/concepts/dates/terms/datediff/datediff.md @@ -1,189 +1,72 @@ --- Title: 'DATEDIFF()' -Description: 'Calculates the difference between two date or timestamp values and returns the result as an integer.' +Description: 'Calculates and returns the difference between two date values. Available in SQL Server and MySQL.' Subjects: - - 'Computer Science' - - 'Web Development' + - 'Data Science' Tags: - 'Database' - 'Date' - - 'Functions' - - 'SQL' + - 'Queries' + - 'MySQL' + - 'SQL Server' CatalogContent: - 'learn-sql' - 'paths/analyze-data-with-sql' + - 'paths/design-databases-with-postgresql' --- -The **`DATEDIFF()`** function calculates the difference between two date or timestamp values and returns the result as an integer in a specified unit of time. This powerful function allows developers and analysts to easily measure time intervals between dates, which is essential for reporting, data analysis, and application development. +**`DATEDIFF()`** is a function found in SQL Server and MySQL that calculates and returns the difference between two date values. -`DATEDIFF()` serves as a cornerstone for date-based calculations in SQL Server, enabling users to perform operations like calculating ages, measuring durations of events, determining time elapsed between transactions, and creating date-based business metrics. Its versatility makes it invaluable for virtually any application that deals with temporal data. 
- -## Syntax +## SQL Server Syntax ```pseudo -DATEDIFF(interval, date1, date2) +DATEDIFF(datePart, date1, date2) ``` -**Parameters:** - -- `interval`: The time unit in which the difference will be calculated. Valid values include: - - `year`, `yy`, `yyyy`: Years - - `quarter`, `qq`, `q`: Quarters - - `month`, `mm`, `m`: Months - - `dayofyear`, `dy`, `y`: Day of the year - - `day`, `dd`, `d`: Days - - `week`, `wk`, `ww`: Weeks - - `hour`, `hh`: Hours - - `minute`, `mi`, `n`: Minutes - - `second`, `ss`, `s`: Seconds - - `millisecond`, `ms`: Milliseconds - - `microsecond`, `mcs`: Microseconds - - `nanosecond`, `ns`: Nanoseconds -- `date1`: The start date for the calculation. Can be a date, datetime, datetime2, smalldatetime, or time data type, or an expression that resolves to one of these types. -- `date2`: The end date for the calculation. Can be a date, datetime, datetime2, smalldatetime, or time data type, or an expression that resolves to one of these types. - -**Return value:** +The `DATEDIFF()` function in SQL Server has three required parameters: -The `DATEDIFF()` function returns an integer representing the number of time units (specified by the interval parameter) between date1 and date2. +- `datePart` is the unit in which the difference is measured. It can be one of the following formats: + - Year: `year`, `yyyy`, `yy` + - Quarter: `quarter`, `qq`, `q` + - Week: `week`, `ww`, `wk` + - Weekday: `weekday`, `dw`, `w` + - Second: `second`, `ss`, `s` + - Month: `month`, `mm`, `m` + - Minute: `minute`, `mi`, `n` + - Millisecond: `millisecond`, `ms` + - Hour: `hour`, `hh` + - Day of Year: `dayofyear`, `dy`, `y` + - Day: `day`, `dd`, `d` +- `date1` and `date2` are the dates to compare. They can be given in several formats, one being the `yyyy/mm/dd` format.
-## Example 1: Basic Date Difference Calculation +### Example 1 -This example demonstrates how to calculate the difference between two dates in various time intervals: +The following example calculates the difference in months between `2020/05/18` and `2022/05/18`: ```sql --- Calculate difference between two dates in years, months, and days -SELECT - DATEDIFF(year, '2020-01-15', '2023-09-20') AS YearDiff, - DATEDIFF(month, '2020-01-15', '2023-09-20') AS MonthDiff, - DATEDIFF(day, '2020-01-15', '2023-09-20') AS DayDiff; +SELECT DATEDIFF(month, '2020/05/18', '2022/05/18'); /* Output: 24 */ ``` -Output produced by this code will be: - -| YearDiff | MonthDiff | DayDiff | -| -------- | --------- | ------- | -| 3 | 44 | 1344 | - -This example calculates the difference between January 15, 2020, and September 20, 2023, in years, months, and days. The results show there are 3 years, 44 months, or 1344 days between these dates. - -## Example 2: Calculating Age in Years +### Example 2 -This example demonstrates how to use `DATEDIFF()` to calculate a person's age in years from their birthdate. 
+The following example returns the difference in seconds between `2021/09/30 08:22:04` and `2021/09/30 08:25:06`: ```sql --- Create a sample table with employee data -CREATE TABLE Employees ( - EmployeeID INT PRIMARY KEY, - FirstName VARCHAR(50), - LastName VARCHAR(50), - BirthDate DATE, - HireDate DATE -); - --- Insert sample data -INSERT INTO Employees (EmployeeID, FirstName, LastName, BirthDate, HireDate) -VALUES - (1, 'John', 'Smith', '1985-06-15', '2010-03-20'), - (2, 'Sarah', 'Johnson', '1992-11-30', '2015-07-10'), - (3, 'Michael', 'Brown', '1978-02-23', '2005-09-15'); - --- Calculate ages as of current date -SELECT - EmployeeID, - FirstName + ' ' + LastName AS EmployeeName, - BirthDate, - DATEDIFF(year, BirthDate, GETDATE()) AS Age -FROM - Employees -ORDER BY - Age DESC; +SELECT DATEDIFF(second, '2021/09/30 08:22:04', '2021/09/30 08:25:06'); /* Output: 182 */ ``` -The output generated by this code will be: +## MySQL Syntax -| EmployeeID | EmployeeName | BirthDate | Age | -| ---------- | ------------- | ---------- | --- | -| 3 | Michael Brown | 1978-02-23 | 47 | -| 1 | John Smith | 1985-06-15 | 39 | -| 2 | Sarah Johnson | 1992-11-30 | 32 | +MySQL only requires two date parameters in the `DATEDIFF()` function and will return the number of days between `date1` and `date2`. -This example shows how to calculate an employee's age by finding the difference in years between their birthdate and the current date. Note that this calculation provides the raw year difference and doesn't account for whether the birthday has occurred yet in the current year. +```pseudo +DATEDIFF(date1, date2) +``` -## Example 3: Business Metrics with `DATEDIFF()` +### Example -This example demonstrates how to use `DATEDIFF()` for business reporting metrics, such as calculating order processing times and identifying delayed shipments. 
+The following example returns the difference in days between `2019-07-05` and `2018-12-24`: ```sql --- Create sample orders table -CREATE TABLE Orders ( - OrderID INT PRIMARY KEY, - CustomerID INT, - OrderDate DATETIME, - ShipDate DATETIME, - DeliveryDate DATETIME -); - --- Insert sample data -INSERT INTO Orders (OrderID, CustomerID, OrderDate, ShipDate, DeliveryDate) -VALUES - (1001, 101, '2023-01-10 09:30:00', '2023-01-11 14:15:00', '2023-01-15 11:20:00'), - (1002, 102, '2023-01-12 13:45:00', '2023-01-13 10:30:00', '2023-01-14 16:45:00'), - (1003, 103, '2023-01-15 11:20:00', '2023-01-18 09:45:00', '2023-01-22 13:10:00'), - (1004, 104, '2023-01-16 14:55:00', '2023-01-17 16:30:00', '2023-01-21 09:30:00'), - (1005, 105, '2023-01-18 10:15:00', NULL, NULL); - --- Calculate processing, shipping, and total handling times -SELECT - OrderID, - OrderDate, - ShipDate, - DeliveryDate, - -- Processing time (from order to shipment) - DATEDIFF(hour, OrderDate, ShipDate) AS ProcessingHours, - -- Shipping time (from shipment to delivery) - DATEDIFF(day, ShipDate, DeliveryDate) AS ShippingDays, - -- Total time (from order to delivery) - DATEDIFF(day, OrderDate, DeliveryDate) AS TotalDays, - -- Identify delayed shipments (processing > 24 hours) - CASE - WHEN DATEDIFF(hour, OrderDate, ShipDate) > 24 THEN 'Delayed' - ELSE 'On Time' - END AS ShipmentStatus -FROM - Orders -WHERE - ShipDate IS NOT NULL; +SELECT DATEDIFF("2019-07-05", "2018-12-24"); /* Output: 193 */ ``` - -The output of this code will be: - -| OrderID | OrderDate | ShipDate | DeliveryDate | ProcessingHours | ShippingDays | TotalDays | ShipmentStatus | -| ------- | ------------------- | ------------------- | ------------------- | --------------- | ------------ | --------- | -------------- | -| 1001 | 2023-01-10 09:30:00 | 2023-01-11 14:15:00 | 2023-01-15 11:20:00 | 29 | 4 | 5 | Delayed | -| 1002 | 2023-01-12 13:45:00 | 2023-01-13 10:30:00 | 2023-01-14 16:45:00 | 21 | 1 | 2 | On Time | -| 1003 | 2023-01-15 11:20:00 | 
2023-01-18 09:45:00 | 2023-01-22 13:10:00 | 70 | 4 | 7 | Delayed | -| 1004 | 2023-01-16 14:55:00 | 2023-01-17 16:30:00 | 2023-01-21 09:30:00 | 26 | 4 | 5 | Delayed | - -This example demonstrates how `DATEDIFF()` can be used to calculate important business metrics for order processing. The query calculates the processing time in hours, shipping time in days, and total handling time in days. It also identifies delayed shipments based on processing times exceeding 24 hours. - -## Frequently Asked Questions - -### 1. How to calculate date difference between two dates in SQL? - -In SQL Server, use the `DATEDIFF()` function with an appropriate interval parameter like day, month, or year. For example, `DATEDIFF(day, '2023-01-01', '2023-01-15')` will return 14 days. - -### 2. Does `DATEDIFF()` include both the start and end dates in its calculation? - -`DATEDIFF()` counts the number of interval boundaries crossed between the two dates. For example, when using 'day', it counts the number of midnight boundaries crossed, not the full 24-hour periods. - -### 3. Why does `DATEDIFF(year, '2022-12-31', '2023-01-01')` return 1 even though it's just one day apart? - -Because `DATEDIFF()` counts calendar boundaries, not complete intervals. Since the dates span across a year boundary, it returns 1 year, even though the difference is only one day. - -### 4. Does `DATEDIFF()` take time zones into account? - -No, SQL Server's `DATEDIFF()` does not account for time zones or daylight saving time transitions. All calculations are done in the server's local time zone. - -### 5. Can I use `DATEDIFF()` with time-only values? - -Yes, you can use time data types with `DATEDIFF()`, but only with time-related intervals like second, minute, and hour. Using day or larger intervals with time-only values will always return 0. 
From 33dfcb07c461cc6ff98424efd6c55b569dd62fc5 Mon Sep 17 00:00:00 2001 From: Mamta Wardhani Date: Sun, 8 Jun 2025 00:08:41 +0530 Subject: [PATCH 3/9] [Edit] Python: Pandas: .to_datetime() --- .../terms/to-datetime/to-datetime.md | 202 +++++++++++++++--- 1 file changed, 175 insertions(+), 27 deletions(-) diff --git a/content/pandas/concepts/built-in-functions/terms/to-datetime/to-datetime.md b/content/pandas/concepts/built-in-functions/terms/to-datetime/to-datetime.md index 2fecb8d79f1..76217812993 100644 --- a/content/pandas/concepts/built-in-functions/terms/to-datetime/to-datetime.md +++ b/content/pandas/concepts/built-in-functions/terms/to-datetime/to-datetime.md @@ -1,56 +1,204 @@ --- Title: '.to_datetime()' -Description: 'Returns a pandas datetime object for a given object, such as a Series or DataFrame' +Description: 'Converts various date and time representations into standardized pandas datetime objects for time series analysis.' Subjects: - - 'Data Science' - 'Computer Science' + - 'Data Science' Tags: - - 'Date' - - 'Display' + - 'Data Types' + - 'Functions' + - 'Time' - 'Pandas' CatalogContent: - 'learn-python-3' - 'paths/data-science' --- -The **`.to_datetime()`** function returns a pandas datetime object for a given object, often an array or dictionary-like type such as a Series or DataFrame. +The **`.to_datetime()`** function in Pandas transforms various date and time representations into standardized pandas datetime objects. It serves as the primary mechanism for converting strings, integers, lists, Series, or DataFrames containing date-like information into `datetime64` objects that can be used for time series analysis and date arithmetic operations. + +This function is essential in data preprocessing workflows where raw data contains dates in multiple formats, making it difficult to perform temporal analysis. 
Common use cases include converting CSV file date columns from strings to datetime objects, standardizing mixed date formats within datasets, handling Unix timestamps from APIs, parsing dates with different regional formats, and creating time series indexes for financial or scientific data analysis. The function provides robust error handling and format inference capabilities, making it indispensable for real-world data cleaning scenarios. ## Syntax -This function returns a value in datetime format. Various input arguments can be used as described below. +```pseudo +pandas.to_datetime(arg, errors='raise', dayfirst=False, yearfirst=False, + utc=False, format=None, exact=<no_default>, unit=None, + infer_datetime_format=<no_default>, origin='unix', cache=True) +``` + +**Parameters:** + +- `arg`: The object to convert to datetime. Can be scalar, array-like, Series, or DataFrame/dict-like +- `errors`: How to handle parsing errors - 'raise' (default), 'coerce', or 'ignore' +- `dayfirst`: Boolean, if True parses dates with day first (e.g., "31/12/2023" as Dec 31) +- `yearfirst`: Boolean, if True parses dates with year first when ambiguous +- `utc`: Boolean, if True returns UTC DatetimeIndex +- `format`: String format to parse the datetime (e.g., '%Y-%m-%d') +- `exact`: Boolean, if True requires exact format match +- `unit`: Unit for numeric timestamps ('D', 's', 'ms', 'us', 'ns') +- `infer_datetime_format`: Boolean, attempts to infer format for faster parsing (deprecated) +- `origin`: Reference date for numeric values, default 'unix' (1970-01-01) +- `cache`: Boolean, use cache for improved performance with duplicate values + +**Return value:** + +The function returns datetime-like objects depending on input type: + +- **Scalar input**: Returns pandas Timestamp +- **Array-like input**: Returns DatetimeIndex +- **Series input**: Returns Series with datetime64[ns] dtype +- **DataFrame input**: Returns Series with datetime64[ns] dtype from assembled columns + +## Example 1: Basic String Conversion
Using `.to_datetime()` + +This example demonstrates the fundamental usage of `.to_datetime()` for converting date strings into pandas datetime objects: ```py -pandas.to_datetime(arg, format=None, errors='raise', dayfirst=False, yearfirst=False, utc=None, box=True, infer_datetime_format=False, origin='unix', cache=True) +import pandas as pd + +# Create a Series with various date string formats +date_strings = pd.Series(['2023-01-15', '2023-02-20', '2023-03-25', '2023-04-30']) + +# Convert string dates to datetime objects +converted_dates = pd.to_datetime(date_strings) + +print("Original strings:") +print(date_strings) +print("\nConverted to datetime:") +print(converted_dates) +print(f"\nData type: {converted_dates.dtype}") +``` + +The output produced by this code will be: + +```shell +Original strings: +0 2023-01-15 +1 2023-02-20 +2 2023-03-25 +3 2023-04-30 +dtype: object + +Converted to datetime: +0 2023-01-15 +1 2023-02-20 +2 2023-03-25 +3 2023-04-30 +dtype: datetime64[ns] + +Data type: datetime64[ns] ``` -| Parameter Name | Data Type | Usage | -| ----------------------- | ------------------------------------------------------------------------------ | -------------------------------------------------------------------------------------------------------------------------------- | -| `arg` | int, float, str, datetime, list, tuple, 1-d array, Series, DateFrame/dict-like | Converts given data into a datetime | -| `errors` | 'ignore', 'raise', 'coerce' | The given keyword determines the handling of errors | -| `dayfirst` | bool (default `False`) | Specifies that the str or list-like object begins with a day | -| `yearfirst` | bool (default `True`) | Specifies that the str or list-like object begins with a year | -| `utc` | bool (default `None`) | When `True`, output is converted to UTC time zone | -| `format` | str (default `None`) | Pass a strftime to specify the format of the datetime conversion | -| `exact` | bool (default `True`) | Determines how the format 
parameter is applied | -| `unit` | str (default 'ns') | Specifies the units of the passed object | -| `infer_datetime_format` | bool (default `False`) | When `True`, and no format has been passed, the datetime string will be based on the first non-`NaN` element within the object | -| `origin` | scalar (default unix) | Sets the reference date | -| `cache` | bool (default `True`) | Allows the use of a unique set of converted dates to apply the conversion (only applied when object contains at least 50 values) | +This example shows how `.to_datetime()` automatically recognizes standard date formats and converts them to pandas datetime objects. The resulting Series has `datetime64[ns]` dtype, enabling time-based operations and analysis. -## Example +## Example 2: Financial Data Processing -The code below demonstrates the conversion of a string to a datetime object with the `.to_datetime()` function. +This example shows how to process financial data with mixed date formats and handle missing values, a common scenario in real-world datasets: ```py import pandas as pd -my_list = ['11/09/30'] +import numpy as np + +# Create a DataFrame simulating financial data with mixed date formats +financial_data = pd.DataFrame({ + 'trade_date': ['2023-01-15', '15/02/2023', '03-25-2023', 'invalid_date', '2023-04-30'], + 'stock_price': [150.25, 155.80, 148.90, 152.10, 159.75], + 'volume': [1000000, 1200000, 950000, 1100000, 1300000] +}) + +# Convert dates with error handling for invalid entries +financial_data['trade_date'] = pd.to_datetime( + financial_data['trade_date'], + format='mixed', # Parse each value individually (pandas 2.0+); otherwise the format inferred from the first value applies to every row + errors='coerce', # Convert invalid dates to NaT + dayfirst=False # Assume month comes first in ambiguous dates +) + +# Display the processed data +print("Financial data with processed dates:") +print(financial_data) + +# Check for any missing dates after conversion +missing_dates = financial_data['trade_date'].isna().sum() +print(f"\nNumber of invalid dates converted to NaT: {missing_dates}") -xyz =
pd.to_datetime(my_list, dayfirst=True) -print(xyz) +# Filter out rows with invalid dates for analysis +clean_data = financial_data.dropna(subset=['trade_date']) +print(f"\nClean data shape: {clean_data.shape}") ``` -This example results in the following output:: +The output of this code is: ```shell -DatetimeIndex(['2030-11-09'], dtype='datetime64[ns]', freq=None) +Financial data with processed dates: + trade_date stock_price volume +0 2023-01-15 150.25 1000000 +1 2023-02-15 155.80 1200000 +2 2023-03-25 148.90 950000 +3 NaT 152.10 1100000 +4 2023-04-30 159.75 1300000 + +Number of invalid dates converted to NaT: 1 + +Clean data shape: (4, 3) +``` + +This example demonstrates handling real-world financial data where dates might be in different formats or contain invalid entries. Using `errors='coerce'` converts unparseable dates to NaT (Not a Time), allowing the analysis to continue with valid data. + +## Codebyte Example: Sensor Data Time Series Analysis + +This example processes sensor data with Unix timestamps and demonstrates creating a time series index for scientific data analysis: + +```codebyte/python +import pandas as pd +import numpy as np + +# Create sensor data with Unix timestamps (seconds since 1970-01-01) +sensor_timestamps = [1672531200, 1672534800, 1672538400, 1672542000, 1672545600] # Hourly readings +temperature_readings = [23.5, 24.1, 23.8, 24.3, 24.7] +humidity_readings = [45.2, 46.8, 44.9, 47.1, 48.3] + +# Create DataFrame with sensor data +sensor_data = pd.DataFrame({ + 'timestamp': sensor_timestamps, + 'temperature_c': temperature_readings, + 'humidity_percent': humidity_readings +}) + +# Convert Unix timestamps to datetime objects +sensor_data['datetime'] = pd.to_datetime( + sensor_data['timestamp'], + unit='s' # Specify that timestamps are in seconds +) + +# Set datetime as index for time series analysis +sensor_data.set_index('datetime', inplace=True) + +# Drop the original timestamp column +sensor_data.drop('timestamp', axis=1, 
inplace=True) + +print("Processed sensor data with datetime index:") +print(sensor_data) + +# Demonstrate time series capabilities +print(f"\nData collection period: {sensor_data.index[0]} to {sensor_data.index[-1]}") +print(f"Average temperature: {sensor_data['temperature_c'].mean():.1f}°C") + +# Inspect the index frequency (None unless a frequency has been set on the index) +print(f"\nTime series index frequency: {sensor_data.index.freq}") ``` + +This example shows how to process sensor data with Unix timestamps, which is common in IoT applications and scientific data collection. Converting timestamps to datetime objects and using them as an index enables powerful time series analysis capabilities in pandas. + +## Frequently Asked Questions + +### 1. Can I convert multiple date columns at once? + +Yes, you can apply `to_datetime()` to multiple columns using `apply()` or process each column individually. For DataFrames with separate year, month, day columns, pass the DataFrame directly to `to_datetime()` and it will automatically assemble the datetime from the columns. + +### 2. How do I handle dates before 1677 or after 2262? + +Pandas `datetime64[ns]` cannot represent dates outside this range. Under the default `errors='raise'`, values that cannot be represented raise an `OutOfBoundsDatetime` error, while `errors='coerce'` turns them into `NaT`. In cases where pandas cannot return its own datetime types, the result holds Python datetime objects instead of Timestamp objects, which may have reduced functionality for time series operations. + +### 3. Can I specify custom origins for Unix timestamps? + +Yes, use the `origin` parameter to set a custom reference date. For example, `origin='2000-01-01'` will interpret numeric values as time units from that date instead of the Unix epoch.
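The 1677 and 2262 limits in question 2 follow from storing nanoseconds since the Unix epoch in a signed 64-bit integer, and the `origin` parameter in question 3 amounts to adding an offset to a reference date. The sketch below reproduces both with the Python standard library only. It is an illustration under those assumptions rather than pandas' own code, and `ns_to_datetime` and `from_origin` are hypothetical helper names.

```python
from datetime import datetime, timedelta

EPOCH = datetime(1970, 1, 1)
MAX_NS = 2**63 - 1  # largest nanosecond count a signed 64-bit integer can hold


def ns_to_datetime(ns: int) -> datetime:
    """Convert nanoseconds since the Unix epoch to a datetime (hypothetical helper)."""
    # Python datetimes stop at microseconds, so sub-microsecond digits are dropped
    seconds, remainder_ns = divmod(ns, 1_000_000_000)
    return EPOCH + timedelta(seconds=seconds, microseconds=remainder_ns // 1000)


def from_origin(days: int, origin: datetime) -> datetime:
    """Interpret a number as days counted from a custom origin (hypothetical helper)."""
    return origin + timedelta(days=days)


print(ns_to_datetime(-MAX_NS).date())  # 1677-09-21
print(ns_to_datetime(MAX_NS).date())   # 2262-04-11

# Ten days after a 2000-01-01 origin
print(from_origin(10, datetime(2000, 1, 1)).date())  # 2000-01-11

# The first sensor timestamp from the Codebyte example, read as seconds since the epoch
print(EPOCH + timedelta(seconds=1672531200))  # 2023-01-01 00:00:00
```

The nanosecond resolution is what makes the representable window so narrow: the same 64-bit integer counting seconds instead would cover a far larger span of years.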
From ae3c26f6b77f7fd3c55ca502537425d2882ca11c Mon Sep 17 00:00:00 2001 From: Avdhoot <50920321+avdhoottt@users.noreply.github.com> Date: Wed, 25 Jun 2025 23:35:42 +0530 Subject: [PATCH 4/9] Update content/pandas/concepts/built-in-functions/terms/to-datetime/to-datetime.md --- .../built-in-functions/terms/to-datetime/to-datetime.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/content/pandas/concepts/built-in-functions/terms/to-datetime/to-datetime.md b/content/pandas/concepts/built-in-functions/terms/to-datetime/to-datetime.md index 76217812993..23a04d73d55 100644 --- a/content/pandas/concepts/built-in-functions/terms/to-datetime/to-datetime.md +++ b/content/pandas/concepts/built-in-functions/terms/to-datetime/to-datetime.md @@ -14,9 +14,9 @@ CatalogContent: - 'paths/data-science' --- -The **`.to_datetime()`** function in Pandas transforms various date and time representations into standardized pandas datetime objects. It serves as the primary mechanism for converting strings, integers, lists, Series, or DataFrames containing date-like information into `datetime64` objects that can be used for time series analysis and date arithmetic operations. +The **`.to_datetime()`** function in Pandas transforms various date and time representations into standardized pandas datetime objects. It serves as the primary mechanism for converting strings, integers, lists, Series, or [DataFrames](https://www.codecademy.com/resources/docs/pandas/dataframe) containing date-like information into `datetime64` objects that can be used for time series analysis and date arithmetic operations. -This function is essential in data preprocessing workflows where raw data contains dates in multiple formats, making it difficult to perform temporal analysis. 
Common use cases include converting CSV file date columns from strings to datetime objects, standardizing mixed date formats within datasets, handling Unix timestamps from APIs, parsing dates with different regional formats, and creating time series indexes for financial or scientific data analysis. The function provides robust error handling and format inference capabilities, making it indispensable for real-world data cleaning scenarios. +This function is essential in data preprocessing workflows where raw data contains dates in multiple formats, making temporal analysis difficult. Common use cases include converting CSV file date columns from strings to datetime objects, standardizing mixed date formats within datasets, handling Unix timestamps from APIs, parsing dates with different regional formats, and creating time series indexes for financial or scientific data analysis. The function provides robust error handling and format inference capabilities, making it indispensable for real-world data cleaning scenarios. ## Syntax From c8e08dc51cabb3f747f5da15029c4a42c0a11a2d Mon Sep 17 00:00:00 2001 From: Avdhoot <50920321+avdhoottt@users.noreply.github.com> Date: Wed, 25 Jun 2025 23:36:29 +0530 Subject: [PATCH 5/9] Update content/pandas/concepts/built-in-functions/terms/to-datetime/to-datetime.md --- .../built-in-functions/terms/to-datetime/to-datetime.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/content/pandas/concepts/built-in-functions/terms/to-datetime/to-datetime.md b/content/pandas/concepts/built-in-functions/terms/to-datetime/to-datetime.md index 23a04d73d55..4f837a291dc 100644 --- a/content/pandas/concepts/built-in-functions/terms/to-datetime/to-datetime.md +++ b/content/pandas/concepts/built-in-functions/terms/to-datetime/to-datetime.md @@ -28,11 +28,11 @@ pandas.to_datetime(arg, errors='raise', dayfirst=False, yearfirst=False, **Parameters:** -- `arg`: The object to convert to datetime. 
Can be scalar, array-like, Series, or DataFrame/dict-like +- `arg`: The object to convert to `datetime`. Can be scalar, array-like, Series, or `DataFrame`/dict-like - `errors`: How to handle parsing errors - 'raise' (default), 'coerce', or 'ignore' - `dayfirst`: Boolean, if True parses dates with day first (e.g., "31/12/2023" as Dec 31) - `yearfirst`: Boolean, if True parses dates with year first when ambiguous -- `utc`: Boolean, if True returns UTC DatetimeIndex +- `utc`: Boolean, if True returns UTC `DatetimeIndex` - `format`: String format to parse the datetime (e.g., '%Y-%m-%d') - `exact`: Boolean, if True requires exact format match - `unit`: Unit for numeric timestamps ('D', 's', 'ms', 'us', 'ns') From bc703c25fedbd0a2bc3c162b12396ca13c1db860 Mon Sep 17 00:00:00 2001 From: Avdhoot <50920321+avdhoottt@users.noreply.github.com> Date: Wed, 25 Jun 2025 23:36:56 +0530 Subject: [PATCH 6/9] Update content/pandas/concepts/built-in-functions/terms/to-datetime/to-datetime.md --- .../built-in-functions/terms/to-datetime/to-datetime.md | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/content/pandas/concepts/built-in-functions/terms/to-datetime/to-datetime.md b/content/pandas/concepts/built-in-functions/terms/to-datetime/to-datetime.md index 4f837a291dc..c124f6b2099 100644 --- a/content/pandas/concepts/built-in-functions/terms/to-datetime/to-datetime.md +++ b/content/pandas/concepts/built-in-functions/terms/to-datetime/to-datetime.md @@ -45,9 +45,9 @@ pandas.to_datetime(arg, errors='raise', dayfirst=False, yearfirst=False, The function returns datetime-like objects depending on input type: - **Scalar input**: Returns pandas Timestamp -- **Array-like input**: Returns DatetimeIndex -- **Series input**: Returns Series with datetime64[ns] dtype -- **DataFrame input**: Returns Series with datetime64[ns] dtype from assembled columns +- **Array-like input**: Returns `DatetimeIndex` +- **Series input**: Returns Series with `datetime64[ns]` dtype +- 
**`DataFrame` input**: Returns Series with `datetime64[ns]` dtype from assembled columns ## Example 1: Basic String Conversion Using `.to_datetime()` From 1fa8a1c7a8b99979579854aaca83f67a50288337 Mon Sep 17 00:00:00 2001 From: Avdhoot <50920321+avdhoottt@users.noreply.github.com> Date: Wed, 25 Jun 2025 23:37:21 +0530 Subject: [PATCH 7/9] Update content/pandas/concepts/built-in-functions/terms/to-datetime/to-datetime.md --- .../built-in-functions/terms/to-datetime/to-datetime.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/content/pandas/concepts/built-in-functions/terms/to-datetime/to-datetime.md b/content/pandas/concepts/built-in-functions/terms/to-datetime/to-datetime.md index c124f6b2099..faadf3b0744 100644 --- a/content/pandas/concepts/built-in-functions/terms/to-datetime/to-datetime.md +++ b/content/pandas/concepts/built-in-functions/terms/to-datetime/to-datetime.md @@ -51,7 +51,7 @@ The function returns datetime-like objects depending on input type: ## Example 1: Basic String Conversion Using `.to_datetime()` -This example demonstrates the fundamental usage of `.to_datetime()` for converting date strings into pandas datetime objects: +The following example demonstrates the fundamental usage of `.to_datetime()` for converting date strings into pandas `datetime` objects: ```py import pandas as pd From 0875637723ad8911588d03dcccb283ceb3303272 Mon Sep 17 00:00:00 2001 From: Avdhoot <50920321+avdhoottt@users.noreply.github.com> Date: Wed, 25 Jun 2025 23:38:34 +0530 Subject: [PATCH 8/9] Update content/pandas/concepts/built-in-functions/terms/to-datetime/to-datetime.md --- .../built-in-functions/terms/to-datetime/to-datetime.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/content/pandas/concepts/built-in-functions/terms/to-datetime/to-datetime.md b/content/pandas/concepts/built-in-functions/terms/to-datetime/to-datetime.md index faadf3b0744..c9e5d5c162e 100644 --- 
a/content/pandas/concepts/built-in-functions/terms/to-datetime/to-datetime.md +++ b/content/pandas/concepts/built-in-functions/terms/to-datetime/to-datetime.md @@ -93,7 +93,7 @@ This example shows how `.to_datetime()` automatically recognizes standard date f ## Example 2: Financial Data Processing -This example shows how to process financial data with mixed date formats and handle missing values, a common scenario in real-world datasets: +The following example shows how to process financial data with mixed date formats and handle missing values, a common scenario in real-world datasets: ```py import pandas as pd From ce98054fae610462c005e96a7ab72766b7399401 Mon Sep 17 00:00:00 2001 From: Avdhoot <50920321+avdhoottt@users.noreply.github.com> Date: Wed, 25 Jun 2025 23:39:04 +0530 Subject: [PATCH 9/9] Update content/pandas/concepts/built-in-functions/terms/to-datetime/to-datetime.md --- .../built-in-functions/terms/to-datetime/to-datetime.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/content/pandas/concepts/built-in-functions/terms/to-datetime/to-datetime.md b/content/pandas/concepts/built-in-functions/terms/to-datetime/to-datetime.md index c9e5d5c162e..24d5e56098f 100644 --- a/content/pandas/concepts/built-in-functions/terms/to-datetime/to-datetime.md +++ b/content/pandas/concepts/built-in-functions/terms/to-datetime/to-datetime.md @@ -146,7 +146,7 @@ This example demonstrates handling real-world financial data where dates might b ## Codebyte Example: Sensor Data Time Series Analysis -This example processes sensor data with Unix timestamps and demonstrates creating a time series index for scientific data analysis: +The following example processes sensor data with Unix timestamps and demonstrates creating a time series index for scientific data analysis: ```codebyte/python import pandas as pd