0
\$\begingroup\$

I have the following 3 tables, and I don't have the option to modify them.

  • [Logs] table with [log_type], [log_time], [log_data], and other irrelevant columns
  • [Places] table with [name], [place_index], and other irrelevant columns
  • [Addresses] table with [place_index], [road_0], ...,[road_15], and other irrelevant columns

Logs of [log_type] 12 or 15 has one of the roads in the [Addresses] table in [log_data]; logs of [log_type] 1 has one of the names from [Places] table; logs of [log_type] 7 is similar to type 1 but with a space and some other info after the name.

What the query has to do is to select a list of "12" or "15" logs which

  • does not have a "1" log before it, which corresponds to a place (log_data) containing its corresponding road.

or

  • has the "1" log before it but with a "7" log in between, which also corresponds to the same place.

For example,

assuming Road_X is in PlaceI, Road_Y is in PlaceJ, and Road_Ex, Road_Z are in PlaceK, and given the following table.

--------------------------------------------
| id | log_type |  log_time  | log_data    |
--------------------------------------------
| 1  |    7     | 2018-01-30 |  PlaceA 2.3 |
--------------------------------------------
| 2  |    1     | 2018-01-30 |  PlaceB     |
--------------------------------------------
| 3  |    12    | 2018-01-31 |  Road_Ex    |
--------------------------------------------
| 4  |    7     | 2018-02-01 |  PlaceB 2.5 |
--------------------------------------------
| 5  |    7     | 2018-02-01 |  PlaceI 2.7 |
--------------------------------------------
| 6  |    15    | 2018-02-02 |  Road_X     |
--------------------------------------------
| 7  |    1     | 2018-02-04 |  PlaceI     |
--------------------------------------------
| 8  |    1     | 2018-02-05 |  PlaceJ     |
--------------------------------------------
| 9  |    12    | 2018-02-06 |  Road_X     |
--------------------------------------------
| 10 |    15    | 2018-02-06 |  Road_Y     |
--------------------------------------------
| 11 |    1     | 2018-02-10 |  PlaceK     |
--------------------------------------------
| 12 |    12    | 2018-02-11 |  Road_Z     |
--------------------------------------------
| 13 |    12    | 2018-02-11 |  Road_Ex    |
--------------------------------------------
| 14 |    7     | 2018-02-12 |  PlaceI 2.7 |
--------------------------------------------
| 15 |    7     | 2018-02-12 |  PlaceJ 2.8 |
--------------------------------------------
| 16 |    12    | 2018-02-17 |  Road_X     |
--------------------------------------------
| 17 |    15    | 2018-02-18 |  Road_Y     |
--------------------------------------------

The query should return rows 3, 6, 16, 17.

This is the best I could come up with so far:

declare @since datetimeoffset
set @since = '2018-02-10-05:00'
select a.[log_type], a.[log_time], a.[log_data]
from 
    (select [log_type], [log_data], [log_time],
        (select [name] 
         from [Places]
         where [place_index] =
            (select [place_index] 
             from [Addresses] 
             where [log_data] in 
                ([road_0], [road_1], [road_2], [road_3], [road_4], 
                    [road_5], [road_6], [road_7], [road_8], [road_9],
                    [road_10], [road_11], [road_12], [road_13], [road_14], 
                    [road_15]))) as [place_name]
     from [Logs]
     where [log_type] in (12, 15) and [log_time] >= @since) a
outer apply 
    (select TOP 1 [log_data] from 
        (select TOP 1 [log_data], [log_time]
         from [Logs]
         where [log_type] = 1 and [log_data] = [place_name] 
            and [log_time] >= @since and [log_time] < a.[log_time]
         order by [log_time] desc
         UNION ALL
         (select TOP 1 NULL, [log_time]
          from [Logs]
          where [log_type] = 7 and LEFT([log_data], CHARINDEX(' ', [log_data]) - 1) = [place_name] 
          and [log_time] >= @since and [log_time] < a.[log_time]
          order by [log_time] desc)) i
     order by [log_time] desc) b
where b.[log_data] is null
order by a.[log_time] desc

Note that because I need to reference [place_name] from table a in the subquery, I had to use outer apply instead of left join. However, it takes 5 seconds to generate less than 40 records from around 30k records in [Logs]. I've also tried replacing outer apply with where, but there's no notable difference.

Can the query be further optimized or will I need to move this business logic to the service that sends this query?


EDIT

DDL for the tables (only including relevant columns):

CREATE TABLE [Logs](
    [log_time] [datetimeoffset](7) NOT NULL, -- indexed
    [log_type] [int] NULL,
    [log_data] [varchar](300) NULL
)

CREATE TABLE [Places](
    [place_index] [smallint] NOT NULL,       -- PK, indexed
    [name] [nchar](253) NULL
)

CREATE TABLE [Addresses](
    [place_index] [smallint] NOT NULL,       -- PK, indexed
    [road_0] [char](32) NULL,
    [road_1] [char](32) NULL,
    [road_2] [char](32) NULL,
    [road_3] [char](32) NULL,
    [road_4] [char](32) NULL,
    [road_5] [char](32) NULL,
    [road_6] [char](32) NULL,
    [road_7] [char](32) NULL,
    [road_8] [char](32) NULL,
    [road_9] [char](32) NULL,
    [road_10] [char](32) NULL,
    [road_11] [char](32) NULL,
    [road_12] [char](32) NULL,
    [road_13] [char](32) NULL,
    [road_14] [char](32) NULL,
    [road_15] [char](32) NULL
) 
\$\endgroup\$
4
  • \$\begingroup\$ Can you post DDL for the tables? That would save a tremendous amount of time reviewing the query. \$\endgroup\$
    – Wes H
    Commented Feb 21, 2018 at 14:09
  • \$\begingroup\$ @WesH Thanks for the advice. I've included DDL for the 3 tables. \$\endgroup\$ Commented Feb 22, 2018 at 3:35
  • \$\begingroup\$ Can you also include some data insert scripts? That will ensure my results are correct and I also really don't want to type your data in manually. We'll just need enough to validate your question. \$\endgroup\$
    – Wes H
    Commented Feb 22, 2018 at 14:05
  • \$\begingroup\$ You didn't include ID in your DDL. What table is that part of? \$\endgroup\$
    – Wes H
    Commented Feb 22, 2018 at 14:31

1 Answer 1

1
\$\begingroup\$

Here is an attempt, though I can't verify it works because you didn't provide insert scripts and didn't include the ID column in the DDL. You also don't mention if log 1/7 is only considered for the NEXT log 12/15 for the same place or if it applies to ALL log 12/15 after the log 1/7 for that place. This script assumes all.

DECLARE @since DATETIMEOFFSET
;

SET @since = '2018-02-10-05:00'
;

WITH PrevLogs AS
  (
    SELECT      p.place_index,  /* return the place index for log one */
                One.log_time,
                Seven.log_type
      FROM      dbo.Logs AS One
      JOIN      dbo.Places AS p
        ON One.log_data = p.name
      LEFT JOIN dbo.Logs AS Seven
        ON Seven.log_data LIKE One.log_data + '%' /* Log data One matches the first portion of Log data Seven */
           AND  Seven.log_type = 7 /* Only match log Seven to the log One */
           AND  One.log_time < Seven.log_time   /* Seven log has to be after the One log */
      WHERE     One.log_type = 1    /* Only return logs of type 1 */
  ),
     LogsIWant AS
  (
    SELECT  a.place_index, /* return the place index of the raods in log 12/15 */
            l.log_time,
            l.log_type,
            l.log_data
      FROM  dbo.Logs AS l
      JOIN  dbo.Addresses AS a
        ON l.log_data IN ( a.road_0, a.road_1, a.road_2, a.road_3, a.road_4, a.road_5, a.road_6, a.road_7, a.road_8, a.road_9, a.road_10, a.road_11, a.road_12, a.road_13, a.road_14, a.road_15 )
      WHERE l.log_type IN ( 12, 15 )
            AND l.log_time >= @since
  )
  SELECT        liw.place_index,
                liw.log_time,
                liw.log_type,
                liw.log_data,
                pl.place_index,
                pl.log_type
    FROM        LogsIWant AS liw
    LEFT JOIN   PrevLogs AS pl
      ON liw.place_index = pl.place_index
      AND pl.log_time < liw.log_time /* Log 1/7 are before Log 12/15 */
    WHERE       pl.place_index IS NULL /* Log 1 does not exist */
                OR  pl.log_type IS NOT NULL /* Log 1 & Log 7 exist */
;

To your question of can this be further optimized, I would answer not really. This structure violates so many principals of data quality, it would take less time to explain what it does do right.

Twice in my IT career I have chosen to change jobs rather than put up with the mandate that "Change is not an option". If I was handed a system that was structured like this and told I could not make changes, I would do the best I could while looking for another job.

Sorry to be blunt, but the real problem here is a horrible data model.

I don't know if adding an index would count as a change, but a simple improvement you could make would be a non-clustered index on the name column of the Place table. That would at least improve the join performance from the logs table.

Good luck

\$\endgroup\$
3
  • \$\begingroup\$ Thanks for the query and your honest advice! "...a horrible data model." I can't agree more... We have bad designs all over the place with no time given to re-design. Adding an index should be possible, but I believe the bottle neck is with the outer apply (or where) subquery. The main query runs within a second for more than 100k records, which is good enough for my purpose. I will try out your query and let you know. \$\endgroup\$ Commented Feb 23, 2018 at 12:45
  • \$\begingroup\$ Forgot to add, [place_index] is the ID of [Places] and [Addresses], and [Logs] doesn't have an ID... \$\endgroup\$ Commented Feb 23, 2018 at 12:54
  • \$\begingroup\$ I've tried your solution and even added an index on log_data, but the result is the same. I actually have 6 versions, all of which perform almost, if not exactly, the same, with or without index... I think you pointed the right direction though. If I take out the requirement of checking log_type 7, then using LEFT JOIN with index on log_data out performs the other versions. Thanks for the effort regardless! \$\endgroup\$ Commented Feb 26, 2018 at 2:55

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.