Excel tip: calculating DISTINCT COUNT of items, similar to what you’d do in SQL

Posted: (EET/GMT+2)

 

Sometimes, you need to manipulate lots of Excel data, and if you are a developer, you sometimes start thinking why doesn't Excel directly support SQL like features, like DISTINCT COUNT from a set of values. Turns out that you can indeed calculate a distinct count ("number of unique values") from a set of data, but there's no single function for it.

Note: In SQL, the construct "DISTINCT COUNT" wouldn't return the correct results, but instead the count of all items. Instead, you must use a construct like "COUNT(DISTINCT n)" where "n" is a column name of a table containing the data. However, in this post, I'm referring to this construct with a more English-friendly name "DISTINCT COUNT". See below for a concrete SQL example.

Now back to Excel, which doesn't have a function like SQL does built-in. Even so, you can simulate this function, but it requires a couple of steps. Here are brief instructions; this works best with data that has a clean format, but should work with any text data.

  1. Start by putting the data into a single column, for example A.
  2. Create an empty column on the right (B), and use Excel’s FREQUENCY function to get the frequency (number of occurrences) of items in the list. FREQUENCY is an array function in Excel, which means it returns as many results as there are input values. Thus, you need as many empty cells in your sheet as you do have input values. Example. If you have input values in A1:A10, you’d add the FREQUENCY function to cells B1:B10. Do this: select the cells B1:B10, and then press F2 to start editing. Type in "=FREQUENCY(A1:A10;A1:A10)" without the quotes, and then press Ctrl+Shift+Enter. This will add the array function to all the selected cells, and put curly brackes { and } around the function name. Note: the two FREQUENCE function parameters, data_array and bins_array must be the same areas.
  3. Now in B1:B10, you have numeric values to correspond to the number of how many times a given item appears in the source data A1:A10. You will see values from zero upwards.
  4. Next, make sure you have enough empty cells on the right of the FREQUENCY data (add a new column if need be). This would be column C. Add an IF function to C1, saying "=IF(B1>=1;1;0)". This IF function returns one if the value in B1 is one or more, or zero if the value is zero. Copy this function down to cells C2:C10, so that you have as many numbers in C as you do have in both A and B. All values in C column should be either zero or one.
  5. Finally, calculate a sum of all the values in column C. For instance: "=SUM(C1:C10)". This is the final result.

If you want to try out your skills, enter the following sample data to A1:A10: 26, 30, 29, 29, 31, 28, 25, 33, 36 and 30. The DISTINCT COUNT for these values is eight, with numbers 29 and 30 doubling. More accurately, these two numbers have a frequency of two.

To verify, try this SQL in SQL Server:

SELECT COUNT(DISTINCT Val)
FROM (VALUES (26), (30), (29), (29), (31),
(28), (25), (33), (36), (30)) AS MyTable(Val)

Sometimes, it might be easier to take your Excel data into an SQL table and do the count from there, but that's another story...