[12.x] prefer "datetime" types over "timestamp" types #54256

Draft
wants to merge 1 commit into
base: master

Conversation

browner12
Contributor

@browner12 browner12 commented Jan 18, 2025

I know this has been brought up before, but I'm going to make the case again why datetimes are the superior data type compared to timestamps, and why Laravel should make these their recommended and default types for v12 and beyond.

Premises

  • All dates should be stored as UTC values (although we will not hinder people who choose to do otherwise)
  • Any specifics given in this PR are for MySQL

What this PR does NOT do

  • force Laravel to auto convert values to a given timezone. the Laravel team has made their opinions on automatic conversion known. it is on the user to submit values to the database in their desired timezone.

Parity between datetime and timestamp

Storage requirements

timestamp requires 4 bytes for storage. datetime requires 5 bytes for storage. Both need up to 3 additional bytes for fractional-second precision.

One of the proposals for solving the 2038 problem for timestamp is to increase it to a 64 bit integer, which would increase its storage requirements to 8 bytes.

https://dev.mysql.com/doc/refman/8.4/en/storage-requirements.html#data-types-storage-reqs-date-time

Performance

There has been some confusion in other related PRs, Issues, and Discussions about how the performance of datetime would be worse than timestamp because it stores the date as a string, and string comparison is slower than integer comparison.

datetime is actually stored internally in a fixed length binary format which allows comparisons to be just as efficient as integer comparison.
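
To make the "fixed length binary format" point concrete, here is a minimal sketch in plain PHP. It assumes the 40-bit field layout described in the MySQL internals documentation (1 sign bit, 17 bits for year*13+month, 5 bits each for day and hour, 6 bits each for minute and second). The `packDatetime` helper is hypothetical and only illustrates why chronological order and integer order coincide; it is not MySQL's actual code.

```php
<?php
// Illustrative only: pack a date and time into one 40-bit integer using the
// assumed DATETIME field widths (sign 1, year*13+month 17, day 5, hour 5,
// minute 6, second 6).
function packDatetime(int $y, int $m, int $d, int $h, int $i, int $s): int
{
    $yearMonth = $y * 13 + $m;   // fits in 17 bits for years up to 9999

    return (1 << 39)             // sign bit, set for non-negative values
        | ($yearMonth << 22)
        | ($d << 17)
        | ($h << 12)
        | ($i << 6)
        | $s;
}

$earlier = packDatetime(2024, 12, 31, 23, 59, 59);
$later   = packDatetime(2025, 1, 1, 0, 0, 0);

var_dump($earlier < $later);     // a plain integer comparison orders the two dates
var_dump($later < (1 << 40));    // the packed value fits in 5 bytes
```

Because the most significant fields sit in the highest bits, comparing two packed values is a single integer comparison, which is the property the performance argument relies on.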

For testing, I created a table with the following migration:

Schema::create('tests', function (Blueprint $table) {
    $table->id();
    $table->timestamp('timestamp');
    $table->dateTime('datetime');
    $table->timestamps();
});

I filled the table with 100,000 rows with a random date stored in both the "timestamp" and "datetime" fields. I ran the following queries and had results consistently within 1ms of each other.

SELECT * FROM `tests` WHERE `timestamp` < '2025-01-01';
SELECT * FROM `tests` WHERE `datetime` < '2025-01-01';

Allow using "CURRENT_TIMESTAMP"

Both data types allow using the "CURRENT_TIMESTAMP" for both an initial value and an "on update" value.

datetime benefits

Solves the 2038 issue

timestamp fields store their value internally as a signed 32 bit integer, which means any dates after 2038/01/19 are not valid for timestamps. this is not as big of an issue right now, since most stored dates are in the past, but could potentially be a huge problem when we reach that date. it does affect current use, too, when you may be storing a future date, like an expiration.

datetime fields have a minimum value of 1000-01-01 and a maximum value of 9999-12-31, giving us a much wider valid date range, and eliminating the 2038 problem
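
As a quick check of where the 2038 cutoff comes from, this small plain-PHP sketch converts the largest signed 32-bit value, read as seconds since the Unix epoch, into a calendar date. Nothing here is Laravel-specific.

```php
<?php
// Largest value a signed 32-bit integer can hold.
$maxInt32 = 2 ** 31 - 1; // 2147483647

// The "@" prefix makes PHP read the number as a Unix timestamp in UTC.
$lastMoment = new DateTimeImmutable('@' . $maxInt32);

echo $lastMoment->format('Y-m-d H:i:s'), " UTC\n"; // 2038-01-19 03:14:07 UTC
```

Any expiration date or scheduled event past that second cannot be represented in a 32-bit timestamp column.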

Ignorant of Server/SQL timezone

Lastly, what may be the most important of all the benefits of datetime, it is completely ignorant of the timezone set on either the server or SQL, while timestamp is not.

When a date is entered into a timestamp it will first attempt to convert it to UTC for internal storage. This is dependent on a couple of factors. SQL could have its own explicitly set timezone. More likely, it will be set to "SYSTEM" which means it defers to the timezone set on the OS. Either way, issues arise when SQL deems its timezone to be something other than UTC. Let's say for example, SQL's timezone is set to CST(-6). When it receives a value for a timestamp field, it will interpret the value it receives as a CST value, and convert it to UTC for internal storage, and then also convert it back to CST when the value is retrieved. Now, whether you actually intended to give it a CST value is irrelevant, because all you really care about is that the value you gave it is EXACTLY what you got back.

As long as that SQL timezone value stays the same, you're actually kind of ok, even if things don't technically match up. However, things can go very poorly if the SQL timezone changes.

Imagine again we have our server with the timezone set to CST. We insert a row with a CST value, and SQL converts the timestamp field to UTC internally. Now someone comes along and sees that the server is set to CST, but should probably be UTC because that's pretty standard for servers. Unfortunately that simple change would mess up all of our data. Now when that row is retrieved from the database, SQL sees the server is in UTC, so it just gives the internal value it stored back to us, even though that's not correct and should have been converted.

This means the value we put into the database is NOT the value we got out! Some might argue that's intentional, but I would say for the large majority of people any timezone other than UTC on the server is pure happenstance or oversight, and not actually what they intended.

If we switch to datetime fields, SQL ignores any server or SQL timezone settings and simply stores the value you give it, and returns exactly the same value when you request it. By making ourselves ignorant of any server settings, we actually protect ourselves from any unintentional errors like mentioned above.

For some real numbers, assume we started with a server in CST. The table shows how timestamp and datetime differ.

| Data Type | Submitted Value | Internal Value | Returned Value with Server CST | Returned Value with Server UTC |
|-----------|---------------------|---------------------|---------------------|---------------------|
| timestamp | 2020-02-12 12:00:00 | 2020-02-12 18:00:00 | 2020-02-12 12:00:00 | 2020-02-12 18:00:00 |
| datetime  | 2020-02-12 12:00:00 | 2020-02-12 12:00:00 | 2020-02-12 12:00:00 | 2020-02-12 12:00:00 |
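
The timestamp round trip can be mimicked in plain PHP. This is a sketch, not MySQL itself: `timestampWrite` and `timestampRead` are hypothetical helpers that copy the behaviour described here (convert from the session timezone to UTC on write, and from UTC back to the session timezone on read), and `America/Chicago` stands in for CST.

```php
<?php
// Hypothetical helper: what a TIMESTAMP column does with an incoming value.
function timestampWrite(string $value, string $sessionTz): string
{
    return (new DateTimeImmutable($value, new DateTimeZone($sessionTz)))
        ->setTimezone(new DateTimeZone('UTC'))
        ->format('Y-m-d H:i:s');
}

// Hypothetical helper: what a TIMESTAMP column does when the value is read.
function timestampRead(string $storedUtc, string $sessionTz): string
{
    return (new DateTimeImmutable($storedUtc, new DateTimeZone('UTC')))
        ->setTimezone(new DateTimeZone($sessionTz))
        ->format('Y-m-d H:i:s');
}

$submitted = '2020-02-12 12:00:00';
$stored    = timestampWrite($submitted, 'America/Chicago'); // 2020-02-12 18:00:00

echo timestampRead($stored, 'America/Chicago'), "\n"; // 2020-02-12 12:00:00
echo timestampRead($stored, 'UTC'), "\n";             // 2020-02-12 18:00:00

// A DATETIME column has no such step: the string goes in and comes out unchanged.
```

The second read is the failure case: the server timezone changed between write and read, so the value that comes back is six hours off from what was submitted.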

Questionable Changes

One thing I did not change was the softDeletes() method. I think ideally it would change to using datetimes internally, and then a new softDeletesTimestamp() method would be created for that specific use. However, I'm not sure how that would affect existing usage of softDeletes() that were executed when it used timestamps.

"datetimes" are the better default choice for date related columns, and should be the recommended way from Laravel going forward

- address 2038 issue
- only 1 extra byte
- internal binary storage for equal performance
- ignorant of server/SQL timezone
@browner12 browner12 marked this pull request as draft January 18, 2025 23:45
@browner12 browner12 marked this pull request as ready for review January 19, 2025 00:35
@Rizky92

Rizky92 commented Jan 19, 2025

Here's my two cents. While timestamp has the 2038 problem, one alternative is to store it as UNSIGNED BIGINT. PHP itself already supports 64-bit timestamps. Internally, Laravel already uses UNSIGNED INT for some of its tables, which can be changed to BIGINT without a breaking change.

I'm not sure about the performance implications, although I believe they should be minimal on both sides.

The only drawback is that storing as UNSIGNED BIGINT may confuse users because it carries less meaning, and using the value as a timestamp in other languages that may not have 64-bit support yet would make it unreadable.

datetime is fine, I think I'm just too keen on having to deal with timezones at application level or if your constraint is the storage.

@ziming
Contributor

ziming commented Jan 19, 2025

I personally feel it is better to wait until closer to 2038 and see what the consensus on timestamps is. Maybe by then there will be a better solution, or it will be a non-issue.

@browner12
Contributor Author

@Rizky92 storing as a BIGINT is a poor solution because then we lose readability in DB GUIs, and it will increase storage costs to 8 bytes.

datetime is fine, I think I'm just too keen on having to deal with timezones at application level or if your constraint is the storage.

I don't understand this point. can you elaborate?

@ziming we have a data type literally called "datetime" that was built to handle dates and times. if we start enforcing this good standard now, the 2038 problem literally goes away. regardless of the 2038 aspect, using datetime also takes a foot gun away from people with regards to timezones. now is the time for this better solution.

@Rizky92

Rizky92 commented Jan 19, 2025

@Rizky92 storing as a BIGINT is a poor solution because then we lose readability in DB GUIs, and it will increase storage costs to 8 bytes.

I understand that. That's why it's one alternative among many with the potential to avoid a breaking change. :)

datetime is fine, I think I'm just too keen on having to deal with timezones at application level or if your constraint is the storage.

I don't understand this point. can you elaborate?

My bad. I shouldn't have added that. I was wondering whether it's relevant to the scope of this PR. Basically, datetime loses the timezone information that timestamp has, because timestamp internally uses the database server timezone to offset the datetime information. Changing the columns to datetime means new apps must explicitly define which timezone they live in.

@taylorotwell
Member

Hey @browner12 - thanks for this PR. Are there any breaking changes for existing applications?

@antonkomarev
Contributor

antonkomarev commented Jan 19, 2025

In our MySQL application we stopped using the TIMESTAMP data type because its behavior can differ depending on server settings. This may lead to big date issues.

@browner12
Contributor Author

@taylorotwell shoot, I knew there was something in my original post I forgot that I wanted to add.

As far as I can tell, there would be no breaking changes because this PR only affects stubs, which would only affect migrations going forward.

I've also set up a small application with 2 models, one using $table->timestamps() and one using $table->datetimes(). They seem to work just fine alongside each other, as Laravel's handling of the casts does the heavy lifting.

As stated, the one questionable change we could make is to have $table->softDeletes() switch to using a datetime, and then create a dedicated $table->softDeletesTimestamp() for users who still want to use the old way. My concern about not doing this is that people will still just use $table->softDeletes() because they won't be any the wiser about the underlying behavior. I don't think making the change would have an effect on existing applications because the migrations wouldn't re-execute. However, you could have scenarios where someone's local environment differs from production if they often run artisan migrate:fresh --seed or something similar. Locally, they would have datetime deleted_at fields, and on production they would have timestamp deleted_at fields. If the production database is only using UTC anyway, this doesn't really make a difference.

Would love some others' thoughts on this.


I've also done some testing locally about updating the column definition of a timestamp field to a datetime, and it's actually pretty straightforward. For example:

ALTER TABLE `test` CHANGE `updated_at` `updated_at` datetime NULL;

Basically what seems to happen is the value you see remains unchanged. It just loses its awareness of the server/SQL timezone setting. Again, if you were doing everything in UTC anyway, there is no impact.

@browner12 browner12 changed the title prefer "datetime" types over "timestamp" types [12.x] prefer "datetime" types over "timestamp" types Jan 19, 2025
@donnysim
Contributor

There's also a "timezone" option you can specify on the connection in config (at least for mysql; it is not present by default) to switch the timezone for how timestamps are retrieved, independent of the server. Overall I'd say this change is necessary not only because of timestamp issues, but also to make more devs aware that the majority of datepickers return values based on the local user timezone if not set to ISO format (and even then, JS ISO is not really compatible with PHP ISO validation), not the server's or the project's, and this should be handled even if the project targets a single country, since you don't have to live in a country to use its site. There's also not a lot of content around this to make more developers aware of it, and I often encounter dangerous changes to dates made by juniors who don't know why things are the way they are.

@kminek
Contributor

kminek commented Jan 21, 2025

There's also a "timezone" option you can specify on the connection in config (at least for mysql; it is not present by default) to switch the timezone for how timestamps are retrieved, independent of the server. Overall I'd say this change is necessary not only because of timestamp issues, but also to make more devs aware that the majority of datepickers return values based on the local user timezone if not set to ISO format (and even then, JS ISO is not really compatible with PHP ISO validation), not the server's or the project's, and this should be handled even if the project targets a single country, since you don't have to live in a country to use its site. There's also not a lot of content around this to make more developers aware of it, and I often encounter dangerous changes to dates made by juniors who don't know why things are the way they are.

Maybe this timezone option on the mysql connection should be present by default in the app skeleton and set to the application timezone from config.app. Just my two cents.

@donnysim
Contributor

donnysim commented Jan 21, 2025

@kminek it's not as simple as just adding it. To set a time zone, the mysql.time_zone_name table must contain it or it will throw an exception, and on Windows that table is empty by default, so you have to download and import the data. It also adds an additional SQL call on init. And in all cases you must ensure that the table contains your specified time zone, or your site will be unavailable because that one added SQL call will fail.

@browner12
Contributor Author

yah, while related, the database config timezone option is out of scope of this PR. and actually, switching to datetime types makes it all moot anyway.

@TheLevti
Contributor

TheLevti commented Jan 28, 2025

Let the user/app decide/configure what he wants as default instead of forcing framework defaults. In my personal experience I discourage using datetime anywhere in MySQL databases and even in postgres (there you even have timestamp+tz support). The argument about the year 2038 is too weak as by then it most likely will have been resolved.

The tricky part lies in clients being able to connect in different timezones (see connection time_zone session variable) for different reasons. The timestamp type ensures that the data is always stored in UTC regardless of what timezone the connection is in. Should the connection timezone change, you will still get correct values back when reading them. (Please don't mix it up with app and connection timezone mismatch; that's another problem.)

We had and still have this legacy issue that we drag with us, because back in the day developers chose to use datetime columns for critical tables instead of timestamp. It becomes a problem because once a client with a non-UTC connection starts to populate those columns, you are forced to forever and ever connect with the same timezone the values were stored in; otherwise you will get a variable mismatch depending on the time of the year (think daylight saving timezones that switch between +1/+2). Let's say newer systems start to connect to this old database: if they cannot all be aligned to connect with the same timezone the old entries were written with, all queries and inline operations would need to adjust the values by +1 or +2 hours dynamically. It's a huge mess. When you deal with money and transactions and many different tech stacks connect to this database, this is highly critical.
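
The "+1/+2" mismatch can be seen with a short plain-PHP sketch. `Europe/Berlin` is only an example zone chosen for illustration, and `hoursAheadOfUtc` is a hypothetical helper; it reports how far a naive wall-clock value written by such a client sits from the same instant in UTC.

```php
<?php
// Hypothetical helper: offset, in whole hours, between a naive local value
// and UTC for the time zone that wrote it.
function hoursAheadOfUtc(string $naiveValue, string $writerTz): int
{
    $local = new DateTimeImmutable($naiveValue, new DateTimeZone($writerTz));

    return intdiv($local->getOffset(), 3600); // getOffset() returns seconds
}

echo hoursAheadOfUtc('2025-01-15 12:00:00', 'Europe/Berlin'), "\n"; // 1 (winter time)
echo hoursAheadOfUtc('2025-07-15 12:00:00', 'Europe/Berlin'), "\n"; // 2 (summer time)
```

A reader connecting in UTC would have to know both the writer's zone and the date of each row to apply the right correction, which is the maintenance burden described above.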

As a rule of thumb, use timestamp for marking something like an event that has happened in the world or application (created_at, updated_at, disabled_at, credited_at, etc.). Use datetime when you need to specify a time and date not bound to the flow of time or timezone (e.g. a recurring payment date that has to happen e.g. on the 2nd of a month at 15:00).

@browner12
Contributor Author

As a rule of thumb, use timestamp for marking something like an event that has happened in the world or application (created_at, updated_at, disabled_at, credited_at, etc.). Use datetime when you need to specify a time and date not bound to the flow of time or timezone (e.g. a recurring payment date that has to happen e.g. on the 2nd of a month at 15:00).

This is starting to feel more and more like the comment everyone just regurgitates about timestamp vs datetime rather than actual valid advice. At the end of the day what we want is an unambiguous point in time. Both of the types can provide that, but they both make very different assumptions.

datetime assumes everything is given to it in an explicit timezone (traditionally and very recommended UTC).

timestamp assumes that it is receiving values in the given SQL timezone, whether that's set at the server level, the GLOBAL SQL level, or the SESSION SQL level.

The difference here is in which assumption is more dangerous. IMO, for the vast majority of users, the MUCH safer option is to completely ignore any server, SQL GLOBAL, or SQL SESSION timezone settings, and to assume that everything is handled in UTC. Then it is the responsibility of the code to apply any timezone related adjustments to user facing output.


The argument about the year 2038 is too weak as by then it most likely will have been resolved.

We can solve it now, AND we can solve it with a more space efficient result (5 bytes) than what will probably end up being the solution which will store 8 bytes.


The problem lies in clients being able to connect in different timezones (see connection time_zone session variable).

I'm guessing the VAST majority of users are not connecting to their database with custom timezone variables. While Laravel supports it, most users don't even know it exists because it is not part of the default config options. You also have the downside of it requiring an extra query on EVERY connection. Most users are just relying on whatever SQL GLOBAL value they have set (which is hopefully UTC).


As I re-read your comment, what it seems to really come down to is WHERE the timezone conversion is happening. You are doing it via SQL. I'm advocating for it in the code because:

  1. More efficient, fewer queries.
  2. Less risky because server timezone variable changes are inconsequential.
  3. Where the majority of users already handle their timezone adjustments.

Let the user/app decide/configure what he wants as default instead of forcing framework defaults.

We already are enforcing a "default", currently a timestamp. This proposal is to change that default to the better datetime option. Remember at the end of the day, these are just stubs. If you truly want to stick with timestamps you can. We're just trying to set the best default for the 99%.

@TheLevti
Contributor

TheLevti commented Jan 28, 2025

How did you come to the conclusion that for 99% the default should be datetime? The majority never need date ranges outside the timestamp range; most of the time it's created_at and updated_at anyway. Timestamp is more efficient in storage and index size, and it handles multiple clients using the same database well, especially when it is not easy to enforce the same datetime handling on each client. The majority are not running a microservice architecture where each service has its own database.

So I would keep the status quo, no need to change something for everyone for no urgent reason.

@donnysim
Contributor

I mean, this is just a stub. Any argument for or against datetime can be countered with the same arguments switched around for timestamp, with the only difference being that datetime can store beyond the limit; it just depends on the use. I'm all for datetime as the default when it comes to multi-timezone apps, but the downside is that any Laravel package you use that adds dates uses timestamps, which means either you use timestamps to avoid the footgun, or you accept your fate and monitor all those migrations. Overall neither datetime nor timestamp is a golden bullet when it comes to handling dates from various users, as you still need to do the conversion to a specific timezone; it just depends on which timezone. If you don't think about this, you don't have the problems you're describing.

@TheLevti
Contributor

I know, it's kind of pointless to argue about stubs, but then why do we need to change it? Did it cause trouble or issues? It feels like we are changing this because of personal opinion. If the average user understands the differences or has the need to use this or that, he will change it in his project.

@donnysim
Contributor

donnysim commented Jan 28, 2025

Thinking about this more, I'm all for datetime, but having mixed datetime and timestamp columns in the same codebase without any reason sounds really bad, and that is what this change will lead to, since nobody really pays attention to it. It can have unexpected side effects, so it is not a great idea as a default unless all packages switch too.

@browner12
Contributor Author

If the average user understands the differences or has the need to use this or that, he will change it in his project.

I would argue the average user doesn't understand, and is exactly the reason I'm trying to get rid of the footgun that is timestamp for the majority of users.

I mean, this is just a stub, any argument for or against datetime can be countered with the same switched arguments for timestamp

I disagree. I laid out some very clear points about why datetime is a better storage type for dates. If you have counterpoints to any of the points, I welcome that. If you have points about why timestamp is better, I'd love to hear those as well.


As for the point about mix-and-matching both types, whether that happens internally or from a 3rd party package, it won't matter as long as your SQL timezone variable never changes. Whether you are using datetime or timestamp, as long as the SQL timezone doesn't change, the value you put in is the value you get out.


Here is the crux of the whole argument:

  • the value you put into SQL should equal the value you get out of SQL
  • changing your SQL timezone is dangerous

this can be summarized in the following table:

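| Data Type | Input | SQL Timezone on Insert | SQL Internal Storage | SQL Timezone on Retrieval | Retrieved Value |
|-----------|-------|------------------------|----------------------|---------------------------|-----------------|
| timestamp | 1200  | UTC                    | 1200                 | UTC                       | 1200            |
| timestamp | 1200  | CST                    | 1800                 | CST                       | 1200            |
| timestamp | 1200  | CST                    | 1800                 | EST                       | 1300            |
| datetime  | 1200  | UTC                    | 1200                 | UTC                       | 1200            |
| datetime  | 1200  | CST                    | 1200                 | CST                       | 1200            |
| datetime  | 1200  | CST                    | 1200                 | EST                       | 1200            |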

We want the "Input" column to always equal the "Retrieved Value" column. As you can see in row 3, this doesn't happen when we change the SQL Timezone, and is the source of the problems.

There are some people who want the behavior in line 3, as @TheLevti might, but I would argue those are the small minority of people.

For the vast majority of people we should default to datetime type fields, store ALL values in UTC, and require the code to convert any values that need to be displayed to users in their local timezone.
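
A minimal plain-PHP sketch of that last step, converting in code rather than in SQL. The stored value and the `$userTimezone` setting are made-up examples; in a real application the zone would come from wherever user preferences are kept.

```php
<?php
// Value read back from a DATETIME column, known by convention to be UTC.
$stored = '2025-01-28 18:00:00';

// Hypothetical per-user setting.
$userTimezone = 'America/Chicago';

// Interpret the stored string as UTC, then shift it for display only.
$utc   = new DateTimeImmutable($stored, new DateTimeZone('UTC'));
$local = $utc->setTimezone(new DateTimeZone($userTimezone));

echo $local->format('Y-m-d H:i T'), "\n"; // 2025-01-28 12:00 CST
```

The stored value never changes; only the presentation does, so a change to the server or session timezone has nothing to corrupt.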

@donnysim
Contributor

donnysim commented Jan 28, 2025

I disagree. I laid out some very clear points about why datetime is a better storage type for dates. If you have counterpoints to any of the points, I welcome that. If you have points about why timestamp is better, I'd love to hear those as well.

Don't get me wrong, I'm fully on board with datetime, UTC and such, and have used them for many years without any problems, accepting my fate, you could say, by also ensuring all third party migrations use them so as not to mix and match. My only concern is what the overall effect will be, because all libraries will still use timestamps while your application uses datetimes, which I really doubt many will take note of.

I personally won't be affected by the stub change, as I always publish the stubs and 3rd party migrations and adjust them to use datetime, and to prevent them from magically changing as they would in this case, so maybe my concern is overblown.

@TheLevti
Contributor

TheLevti commented Jan 28, 2025

@browner12 You can't make such baseless assumptions about who uses what, which part is the majority, what they know, and what is best for them. You provided 0 evidence or statistics about how users use this framework.

Please be aware that this is the most popular php framework, not a hobby product you are developing with a bunch of people. You can not know why or if users use different time zones in applications or for their database connection or if they are required to use something else than UTC. And then saying that they should all switch to UTC is just ridiculous. You can not decide for everyone how to do something right, just because you have a strong opinion about it, maybe in your own product, but not in a framework that millions depend on.

As such, any breaking change that is not absolutely necessary (e.g. because it otherwise affects everyone) must be avoided. And yes, in case you are not aware, this would be a breaking change. Imagine someone already using a non-UTC connection who is used to creating models/classes via stubs. Suddenly new tables or columns are not stored in UTC anymore, but in his timezone. As a result, reads from e.g. a BI tool that connects with UTC would then show inconsistent results between those columns, requiring datetime juggling to fix that. People will suddenly have a mix of datetime/timestamp columns which differ from each other by a potentially variable number of hours.

Also, just because it's right for the majority does not mean you can just dump the minority and let them suffer.

If you have a need to use datetime in your stubs, feel free to overwrite the stubs and change the default, but changing this for everyone, because of opinion is totally unnecessary.

@browner12
Contributor Author

browner12 commented Jan 28, 2025

You provided 0 evidence or statistics about how users use this framework.

Fair point. I meant my numbers as hyperbole rather than statistic. I thought that was obvious, but I'll try to be more clear next time.

Also fair, my usage assumptions are a priori. They are based on my years as a programmer and in the Laravel ecosystem, rather than hard data. My assumptions are based on anecdotal things I've seen in many projects over the years, and on the discussions I've had over the previous couple of weeks that most programmers don't seem to fully understand the timezone implications of their data type.


And then saying that they should all switch to UTC is just ridiculous.

I am not suggesting that everyone switch to UTC, I'm stating that UTC is the best storage option generally speaking. This is not just my opinion, but the opinion of the framework as well.

https://laravel.com/docs/11.x/eloquent-mutators#date-casting-and-timezones
https://github.com/laravel/framework/blob/11.x/config/app.php#L71


As I've stated previously, having both timestamp and datetime fields in one application is completely fine, and results in no breaking changes. For most users (my opinion) who are completely unaware of all this nuance and do everything in UTC anyway, they would see no differences. The value they put into the DB is the value they get out.

For users like yourself in the minority (my opinion) who have different timezones depending on the DB connection, you would have some annoyances, but still no breaking changes. You would have to be cognizant of the field type when writing your code, to know if you needed to perform the timezone adjustment in the code, or if it was already handled for you in SQL.


Also, just because it's right for the majority does not mean you can just dump the minority and let them suffer.

IMO default stubs ARE for the majority. We have the opportunity here to give the majority a much better data type and I think we should take it. The intent is not to make the minority suffer, and the framework has a very simple one time fix so the minority doesn't have to suffer. Any users who wish to stick with timestamps, either because they disagree with the points laid out in this about why datetime is better, or because they wish to maintain consistency with their existing application, can publish their own stub.

@TheLevti
Contributor

TheLevti commented Jan 29, 2025

Honestly at this point I don't mind anymore. I would anyway review each column to have the ideal type and not just blindly accept defaults.

If this gets accepted, we should for sure have it clearly highlighted in the 11 -> 12 migration guide.

@@ -16,8 +16,8 @@ return new class extends Migration
             $table->string('type');
             $table->morphs('notifiable');
             $table->text('data');
-            $table->timestamp('read_at')->nullable();
-            $table->timestamps();
+            $table->dateTime('read_at')->nullable();
Contributor

@antonkomarev antonkomarev Jan 29, 2025

Is the method really named dateTime and not datetime? Looks strange because we have datetimes method below and not dateTimes.

Contributor Author

there is a casing inconsistency that I was going to address after this PR.

@taylorotwell taylorotwell marked this pull request as draft January 30, 2025 09:13
8 participants