LINUX GAZETTE
[ Prev ][ Table of Contents ][ Front Page ][ Talkback ][ FAQ ][ Next ]

"Linux Gazette...making Linux just a little more fun!"


Using Aggregate Functions and Operators in PostgreSQL

By Branden R Williams


Preface:
This article assumes the reader can do basic SELECT, INSERT, UPDATE, and DELETE queries to and from a SQL database. If you are not sure on how these functions work, please read a tutorial on how these types of queries work. Specifically if you can use a SELECT query, then you are armed with enough information to read through this document with a high level of understanding. That said, lets get on to aggregate functions!

Summary:
In the beginning of this rather extensive article, I will cover how to use the five most common and basic aggregate functions on PostgreSQL. Those functions are count(), min(), max(), avg(), and sum(). Then I will cover how to use several common operators that exist for your use in PostgreSQL. Depending on your development environment, a good philosophy to practice is letting your DataBase Management System (DBMS) craft your results so that they are immediately usable in your code with little or no processing. Good examples for the reasoning behind this philosophy are exhibited when using aggregates. Finally, I will cover how to use several common operators with our aggregate function counterparts that exist for your use in PostgreSQL. Depending on your development environment, a good philosophy to practice is letting your DataBase Management System (DBMS) craft your results so that they are immediately usable in your code with little or no processing. In this article, I will demonstrate how to use some simple operators in your queries to craft data exactly as you need it.

What is an aggregate function?
An aggregate function is a function such as count() or sum() that you can use to calculate totals. In writing expressions and in programming, you can use SQL aggregate functions to determine various statistics and values. Aggregate functions can greatly reduce the amount of coding that you need to do in order to get information from your database.

(Excerpt from the PostgreSQL 7.1 manual)
aggregate_name (expression)
aggregate_name (ALL expression)
aggregate_name (DISTINCT expression)
aggregate_name ( * )

where aggregate_name is a previously defined aggregate, and expression is any expression that does not itself contain an aggregate expression.

The first form of aggregate expression invokes the aggregate across all input rows for which the given expression yields a non-NULL value. (Actually, it is up to the aggregate function whether to ignore NULLs or not --- but all the standard ones do.) The second form is the same as the first, since ALL is the default. The third form invokes the aggregate for all distinct non-NULL values of the expression found in the input rows. The last form invokes the aggregate once for each input row regardless of NULL or non-NULL values; since no particular input value is specified, it is generally only useful for the count() aggregate function.

Consider this example. You are writing a program which tracks sales of books. You have a table called the "sale" table that contains the book title, book price, and date of purchase. You want to know what the total amount of money that you made by selling books for the month of March 2001. Without aggregate functions, you would have to select all the rows with a date of purchase in March 2001, iterate through them one by one to calculate the total. Now if you only have 10 rows, this does not make a big difference (and if you only sell 10 books a month you should hope those are pretty high dollar!). But consider a bookstore that sells on average 2000 books a month. Now iterating through each row one by one does not sound so efficient does it?

With aggregate functions you can simply select the sum() of the book price column for the month of March 2001. Your query will return one value and you will not have to iterate through them in your code!

The SUM() function.
The sum() function is very useful as described in the above example. Based on our fictitious table, consider the following.

table sale (
	book_title varchar(200),
	book_price real,
	date_of_purchase datetime
)

Without aggregate functions:
SELECT * FROM sale WHERE date_of_purchase BETWEEN '03/01/2001' AND '04/01/2001';

This returns all rows which correspond to a sale in the month of March 2001.

With aggregate functions:
SELECT SUM(book_price) AS total FROM sale WHERE date_of_purchase BETWEEN '03/01/2001' AND '04/01/2001';

This returns a single row with a single column called total containing the total books sold in the month of March 2001.

You can also use mathematical operators within the context of the sum() function to add additional functionality. Say for instance, you wanted to get the value of 20% of your sum of book_price as all of your books have a 20% markup built in to the price. Your aggregate would look like:

SELECT SUM(book_price) AS total, SUM(book_price * .2) AS profit FROM sale WHERE date_of_purchase BETWEEN '03/01/2001' AND '04/01/2001';

If you look on a grander scale, you will see even more uses for the sum() function. For example calculating commissions, generating detailed reports, and generating running statistical totals. When writing a report, it is much easier to have SQL do the math for you and simply display the results than attempting to iterate through thousands or millions of records.

The count() function.
Yet another useful aggregate function is count(). This function allows you to return the number of rows that match a given criteria. Say for example you have a database table that contains news items and you want to display your current total of news items in the database without selecting them all and iterating through them one by one. Simply do the following:

SELECT COUNT(*) AS myCount FROM news;

This will return the total number of news articles in your database.

The MAX() and MIN() functions.
These two functions will simply return the maximum or minimum value in a given column. This may be useful if you want to very quickly know the highest priced book you sold and the lowest price book you sold (back to the bookstore scenario). That query would look like this.

SELECT MAX(book_price) AS highestPrice, MIN(book_price) AS lowestPrice FROM sale WHERE date_of_purchase BETWEEN '03/01/2001' AND '04/01/2001';

Again, this simply prevents you from having to select EVERYTHING from the database, iterate through each row one by one, and calculate the result by hand.

The AVG() function.
This particular aggregate is definitely very useful. Any time you would like to generate an average value for any number of fields, you can use the avg() aggregate. Without aggregates, you would once again have to iterate through all rows returned, sum up your column and take a count of the number of rows, then do your math. In our bookstore example, say you would like to calculate the average book price that was sold during March 2001. Your query would look like this.

SELECT AVG(book_price) AS avg_price FROM sale WHERE date_of_purchase BETWEEN '03/01/2001' AND '04/01/2001';

What is an operator?
An operator is something that performs on operation or function on the values that are around it. For an example of this, let's look at Mathematical Operators. If you wanted to subtract the values from two fields in a select statement, you would use the subtraction (-) operator.

SELECT salesperson_name, revenue - cost AS commission FROM sales;

What will be returned is the results of the revenue each sales person brought in minus the cost of the products that they sold which will yield their commission amount.

salesperson_namecommission
Branden Williams234.43
Matt Springfield87.74

Operators can be VERY useful when you have complex calculations or a need to produce the exact results you need without having your script do any text or math based processing.

Let's refer to our bookstore example. You are writing a program which will show you the highest margin books (largest amount of profit per book) so that your marketing monkey can place them closer to the door of the store. Instead of doing your math on the fly while iterating through your result set, you can have the result set display the correct information for you.

table inventory (
	book_title varchar(200),
	book_cost real,
	selling_price real
)

SELECT book_title, selling_price - book_cost AS profit ORDER BY profit DESC;

Which will produce results similar to the following.

book_titleprofit
How To Scam Customers Into Buying Your Books15.01
How To Crash Windows 200013.84

Now your marketing guy can very quickly see which books are the highest margin books.

Another good use for operators is when you are selecting information from one table to another. For example, you may have a temporary table that you select product data into so that it can be proofread before it is sent into some master data table. Shopping Carts make great examples of this. You can take the pertinent information from your production tables and place it in a temporary table to be then removed, quantity increased, or discounts added before it is placed into your master order table.

In an example like this, you would not want to select out your various kinds of information, perform some functions to get them just right, and then insert them back into your temporary table. You can simply do it all in one query by using operators. It also creates less of a headache when you are dealing with very dynamic data. Let the database handle as much of your dynamic data as it can.

Now I would like to go into some specific operators and their functions. To see a complete list of operators, in your pgsql interface window type '\do'.

The +, -, *, and / operators.
These are the basic math operators that you can use in PostgreSQL. See above for good examples on how to use them. A few additional examples are here.

Many more uses for math operators will be revealed in the next article in this series which combines operators with aggregate functions.

Inequality (<, >, <=, >=) operators.
You most likely have used these in the WHERE clause of a specific SQL query. For instance.

SELECT book_title FROM inventory WHERE selling_price >= '30.00';

This query will select all books that have a selling price of $30.00 or more. You could even extend that to our profit example earlier and do the following.

SELECT book_title, selling_price - book_cost AS profit WHERE selling_price - book_cost >= '14.00' ORDER BY profit DESC;

Which will only produce the following results.

book_titleprofit
How To Scam Customers Into Buying Your Books15.01

This can allow you to set thresholds for various kinds of queries which is very useful in reporting.

The || (concatenate) operator.
When doing any sort of text concatenation, this operator comes in handy. Say for instance, you have a product category which has many different products within it. You might want to print out the product category name as well as the product item on the invoice.

SELECT category || CAST(': ' AS VARCHAR) || productname AS title FROM products;

Notice the use of the CAST() function. Concatenate will require knowledge about the elements it is operating on. You must tell PostgreSQL that the string ': ' is of type VARCHAR in order for your operator to function.

Your results may look like:

title
Music CDs: Dave Matthews, Listener Supported
DVDs: Airplane

In the previous articles, I showed you some simple ways to use operators and aggregate functions to help speed up your applications. The true power of operators and aggregate functions come when you combine their respective powers together. You can cut down on the lines of code your application will need by simply letting your database handle that for you. This article will arm you with a plethora of information on this subject.

Our Scenario:
You are hired to create a web-based shopping application. Here is your database layout for your order table.

create table orders (
	orderid integer (autoincrement),
	customerid integer,
	subtotal real,
	tax real,
	shipping real
)

create table orderdetail (
	orderid integer,
	productid integer,
	price real,
	qty integer
)

create table taxtable (
	state varchar(2),
	rate real
)

create table products (
	productid integer,
	description varchar(100),
	price real
)

create table cart (
	sessionid varchar(30),
	productid integer,
	price real,
	qty integer
)

In this example, I will use database driven shopping carts instead of storing the cart information in a session. However, I will need a sessionID to keep up with the changes in the database. Our cart table contains the current pre-checkout shopping cart. Orders and Orderdetail contain the completed order with items. We can calculate each order's Grand Total by adding up the sub parts when needed for tracking or billing. Finally, products is our product table which contains a price and description.

The point of this exercise is to pass as much of the computation back to the database so that your application layer does not have to perform many trips to and from the database, as well as to reduce the lines of code required to complete your task. In this example, several of your items are stored in a database table so they may be dynamic. Those items are the basis of your subtotal, tax and shipping calculations. If you do not use operators and aggregates (and potentially subqueries), you will run the risk of making many trips around the database and putting added overhead into your application layer. I will break down the calculation of each of those items for you, as well as an example of how to put it all together in the end.

The subtotal calculation.
This is a rather simple calculation, and will only use an aggregate function and simple operator to extract. In our case.

SELECT SUM(price*qty) AS subtotal FROM cart WHERE sessionid = '9j23iundo239new';

All we need is the sum of the results from every price * qty calculation. This shows how you can combine the power of operators and aggregates very nicely. Remember that the SUM aggregate will return the total sum from every calculation that is performed on a PER ROW basis. Don't forget your order of operations!

The tax calculation.
This one can be kind of tricky without some fancy SQL. I will be using COALESCE to determine the actual tax rate. COALESCE takes two arguments. If the results of the first argument are null, it will return the second. It is very handy in situations like this. Below is the query. Note: _subtotal_ is simply a placeholder.

SELECT _subtotal_ * COALESCE(tax, 0) AS tax FROM tax WHERE state = 'TX';

In the final query, I will show you how all these will add up so try not to get confused by my nifty placeholders.

The shipping calculation.
For simplicity, we will just assume that you charge shipping based on a $3 fee per item. You could easily expand that to add some fancy calculations in as well. By adding a weight field to your products table, you could easily calculate shipping based on an algorithm. In our instance, we will just count the number of items in our cart and multiply that by 3.

SELECT COUNT(*) * 3 FROM cart AS shipping WHERE sessionid = '9j23iundo239new';

Tying it all together.
Now that I have shown you how to get the results for those calculations separately, lets tie them all together into one big SQL query. This query will handle all of those calculations, and then place them into the orders table for you.

INSERT INTO orders (customerid, subtotal, tax, shipping) VALUES (customerid, (SELECT SUM(price*qty) FROM cart WHERE sessionid = '9j23iundo239new'), (SELECT SUM(price*qty) FROM cart WHERE sessionid = '9j23iundo239new') * (SELECT COALESCE(tax, 0) FROM tax WHERE state = 'TX'), (SELECT COUNT(*) * 3 FROM cart WHERE sessionid = '9j23iundo239new'));

Additionally, if you had a Grand Total field in your orders table, you could complete this by adding up the sub items in either a separate query, or inside your INSERT query. The first of those two examples might look like this.

UPDATE orders SET grandtotal = subtotal+tax+shipping WHERE orderid = 29898;

To move the rest of the items from the cart table to the orderdetail table the following two queries can be issued in sequence.

INSERT INTO orderdetail (orderid, productid, price, qty) values SELECT _yourorderid_, productid, price, qty FROM cart WHERE sessionid = '9j23iundo239new';
DELETE FROM cart WHERE sessionid = '9j23iundo239new';

Conclusion:
Aggregate functions can greatly simplify and speed up your applications by allowing the SQL server to handle these kinds of calculations. In more complex applications they can be used to return customized results from multiple tables for reporting and other functions. Operators can greatly enhance the quality of the results that you return from your database. The correct use of operators and aggregate functions can not only increase the speed and accuracy of your application, but it also can greatly reduce your code base by removing unneeded lines of code for looping through result sets, simple calculations, and other line hogs.

I hope that you enjoy reading and learning from this article as much as I enjoyed writing it!

Branden R Williams

Branden is currently a consultant for Elliptix, an e-business and security consulting firm he co-founded this year. He has over 10 years of experience in various Internet-related technology disciplines including Unix administration, network infrastructure design and deployment, and many scripting and programming languages. For the last six years, Branden has been designing, building and deploying enterprise-scale e-commerce applications. His real-world experience is backed up by a Bachelors of Business Administration in Marketing from the University of Texas, Arlington. Branden can also be reached at brw@brw.net.


Copyright © 2001, Branden R Williams.
Copying license http://www.linuxgazette.net/copying.html
Published in Issue 70 of Linux Gazette, September 2001

[ Prev ][ Table of Contents ][ Front Page ][ Talkback ][ FAQ ][ Next ]