I have written the query below to count how many events occur in each hour over a week.

    select Hour, count(Hour) from (
        select
            hour(max(events.`created_at`)) as 'Hour',
            count(*) as 'Count'
        from events
        where created_at >= '2025-03-31' and created_at < '2025-04-06'
        group by
            hour(events.`created_at`), schedule_id
        order by Hour
    ) as temp group by Hour;

The outer query is required so that it groups together all of the hours that are the same regardless of the `schedule_id`. Multiple records (all in the same hour) may share the same `schedule_id`, but I only want to know how many unique `schedule_id` instances are found, i.e.:
id | schedule_id | created_at |
---|---|---|
1 | 50 | 2025-04-01 09:05:05 |
2 | 50 | 2025-04-01 09:06:05 |
3 | 51 | 2025-04-01 09:07:05 |
4 | 52 | 2025-04-01 10:44:44 |
would then return
Hour | count(Hour) |
---|---|
9 | 2 |
10 | 1 |
because while there are 3 records that were created between 9am and 10am, there are only 2 unique schedules (50 and 51).
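To make the expected result concrete, here is a minimal sketch that rebuilds the four sample rows and counts distinct schedules per hour in a single level. The table name `events_sample` and the column types are assumptions made for illustration, not the real schema, and the single-level form is an alternative formulation rather than the query above. It should give the same result as the nested query as long as `schedule_id` is never NULL, since `COUNT(DISTINCT ...)` ignores NULLs while the nested `GROUP BY` would count a NULL group.

```sql
-- Hypothetical scratch table mirroring the sample rows (types are assumed).
CREATE TABLE events_sample (
    id          INT PRIMARY KEY,
    schedule_id INT NOT NULL,
    created_at  DATETIME NOT NULL
);

INSERT INTO events_sample (id, schedule_id, created_at) VALUES
    (1, 50, '2025-04-01 09:05:05'),
    (2, 50, '2025-04-01 09:06:05'),
    (3, 51, '2025-04-01 09:07:05'),
    (4, 52, '2025-04-01 10:44:44');

-- One row per hour of day, counting each schedule_id once per hour.
SELECT
    HOUR(created_at)            AS hr,
    COUNT(DISTINCT schedule_id) AS schedules
FROM events_sample
WHERE created_at >= '2025-03-31' AND created_at < '2025-04-06'
GROUP BY HOUR(created_at)
ORDER BY hr;
-- hr = 9  -> schedules = 2 (50 and 51)
-- hr = 10 -> schedules = 1 (52)
```

Whether this form is actually faster on the real table is untested; it only pins down what "unique schedules per hour" means here.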
However, this query is very slow. On a table of 39 million rows, it takes 15 seconds, and the actual table that it needs to be run on is much larger. Any ideas how I could improve this query?
`hour` seems to be very important, perhaps an indexed field in each record would eliminate deriving critical information before sorting/selecting ops...)

`schedule_id` with two different hours? How many `events` records are expected to match any provided condition? What is the structure of the relevant indexes on the `events` table?
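The information asked for in the comments (existing indexes, how many rows match the date range, and how the query is executed) could be gathered with statements along these lines. This is only a sketch: the index name `idx_events_created_schedule` is hypothetical, and whether such an index helps, or is affordable to build on a table this size, is an assumption that would need to be checked against the real data.

```sql
-- List the indexes currently defined on the table.
SHOW INDEX FROM events;

-- How many rows fall inside the date range used by the query.
SELECT COUNT(*)
FROM events
WHERE created_at >= '2025-03-31' AND created_at < '2025-04-06';

-- Show the execution plan MySQL chooses for the inner aggregation.
EXPLAIN
SELECT HOUR(created_at), schedule_id
FROM events
WHERE created_at >= '2025-03-31' AND created_at < '2025-04-06'
GROUP BY HOUR(created_at), schedule_id;

-- Hypothetical composite index: lets the range filter on created_at and
-- the read of schedule_id be served from the index alone.
-- Assumption only; building it on a very large table can be slow.
CREATE INDEX idx_events_created_schedule
    ON events (created_at, schedule_id);
```

Posting the output of `SHOW INDEX` and `EXPLAIN` alongside the question would let reviewers see whether the date filter is using an index or scanning the full table.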