Be part of JetBrains PHPverse 2026 on June 9 – a free online event bringing PHP devs worldwide together.

garrettmassey's avatar

GroupBy vs Distinct

I am working to get some specific analytic data for each fiscal quarter for our application, and I am not sure if the code is getting the right values. The data looks like this:

| ID | UUID | City | County | Zip | IP | session_id | deviceID | created_at |
| 1 | 1234 | city | county | 0 | 127.0.0.1 | abcd | string_1 | 2022-06-01 |
| 2 | 1235 | city | county | 0 | 127.0.0.1 | efgh | string_1 | 2022-06-01 |
| 3 | 4324 | city | county | 0 | 127.0.0.1 | ffsd | string_2 | 2022-06-01 |
| 4 | 2213 | city | county | 0 | 127.0.0.1 | a441 | string_2 | 2022-06-01 |

Basically, it's a log of users that access the system and some meta information. We count each unique visit with the UUID, and then we count the session IDs, and the device IDs. Device IDs are a cookie that is stored for the maximum time, so it's not perfect but it accomplishes what we want for most cases.

What I am trying to access is this:

  1. the total number of records between 2 dates. The code works for this.
  2. the number of 'unique' records, where they are identified by the device_id.
  3. the total-per-day of records between two dates, this should return an array with the date and the total for that date
  4. the unique-per-day of records between two dates, this should return an array with the date and the count of unique records for that date

So for this example, the total records are 4, and the unique records are 2. The unique per day is 2 and total per day is 1, since all records are for the same date. But if there were more dates, I would have the total and unique counts for each date in the range defined.

The code I have for this is here:

    $currentTotal = AccessRecord::whereBetween(
        'created_at',
        [$currentQuarterDates[$currentQuarter]['start'], $today]
    )->count();
    
    $currentUnique = AccessRecord::whereBetween(
        'created_at',
        [$currentQuarterDates[$currentQuarter]['start'], $today]
    )->groupBy('deviceID')->count();
    
    $dailyTotal = AccessRecord::whereBetween(
        'created_at',
        [$currentQuarterDates[$currentQuarter]['start'], $today]
    )->selectRaw('count(*) as count, DATE(created_at) as date')
     ->groupBy('date')
     ->get()
     ->toArray();
    
    $dailyUnique = AccessRecord::whereBetween(
        'created_at',
        [$currentQuarterDates[$currentQuarter]['start'], $today]
    )->selectRaw('count(distinct deviceID) as count, DATE(created_at) as date')
     ->groupBy('date')
     ->get()
     ->toArray();

I am testing this with some old data, and it's hard to tell if this is behaving the way we want it to.

0 likes
12 replies
sr57's avatar

@garrettmassey

it's hard to tell if this is behaving the way we want it to

How will you validate the answer(s) you'll get on this forum?

garrettmassey's avatar

@sr57 Good question! I'm not sure. I have been working on writing my own test data to run the code against, so that I have a clearer picture of the answer, but it's taking me a hot minute due to a hand injury that slows down my typing.

sr57's avatar

@garrettmassey

Ok, when your validation will work you'll see if your queries are Ok, and if not, your question will be clear

  • input data
  • query
  • result
  • expected result

That said, GROUP BY without aggregate function works as DISTINNCT

3 likes
garrettmassey's avatar

@sr57 I just finished testing with some new data, and I changed the groupBy to Distinct to clear things up, and I am getting the results I expected.

Thank you!

sr57's avatar

@garrettmassey

You are welcome, but why do you award the best answer? What interest for a future reader?

garrettmassey's avatar

@sr57 Just in case someone ever runs into a similar situation, they can see what I did to solve my issue / question, or what someone else did if they had the best answer. there have been times when I have encountered similar issues to someone else and their solution wasn't exactly what I needed but it was close enough to help.

vincent15000's avatar

First of all you have to be sure of the behavior you're waiting for.

Then you will be able to validate the code.

garrettmassey's avatar
garrettmassey
OP
Best Answer
Level 6

After running the code against some custom test data, I realized that I have to use Distinct instead of GroupBy but otherwise it works.

Here is the final code:

$currentTotal = AccessRecord::whereBetween(
    'created_at',
    [$currentQuarterDates[$currentQuarter]['start'], $today]
)->count();

$currentUnique = AccessRecord::whereBetween(
    'created_at',
    [$currentQuarterDates[$currentQuarter]['start'], $today]
)->distinct('deviceID')->count();

$dailyTotal = AccessRecord::whereBetween(
    'created_at',
    [$currentQuarterDates[$currentQuarter]['start'], $today]
)->selectRaw('count(*) as count, DATE(created_at) as date')
 ->groupBy('date')
 ->get()
 ->toArray();

$dailyUnique = AccessRecord::whereBetween(
    'created_at',
    [$currentQuarterDates[$currentQuarter]['start'], $today]
)->selectRaw('count(distinct deviceID) as count, DATE(created_at) as date')
 ->groupBy('date')
 ->get()
 ->toArray();
vincent15000's avatar

@garrettmassey Distinct is to retrieve distinct results from a query which can return several times the same line. Where as group by is to retrieve some agregate information from several lines, for example to sum some field among several lines.

Tray2's avatar

@garrettmassey I suggest that you try to avoid using distinct, it slows your query down a bit when using it.

Please or to participate in this conversation.