How is this logically possible?

StevenLeonCooper · Feb 15, 2014

GlennB said:
I don't really understand that part. Multiple customers on a given invoice?

Not sorry Many-to-one. I wrote it wrong and corrected it like 5 times on my phone so obviously I confused myself there. One customer, many invoices/orders.

GlennB said:
But, generally speaking, disallowing nulls when a value is not yet known suggests to me a large number of tables to cater for every possible event where there might still be an unknown value, leading to some heroic table joins when all is finally done. Either that, or fill unknown values with defaults - e.g. 0 for the invoice id of an uninvoiced transaction. That would be horrible.

Yes, it requires a fair amount of joining but the benefit is that the database only contains extant data, no conceptual placeholders. In the long run I think it comes down to manageability and performance. Completely avoiding NULL values at all costs would probably light up the eyes of anyone passionate about relational theory but when you have to do 6 different joins just to find out what item a customer ordered on a certain date, it suddenly seems like it would be easier to sift through paper records than conceptualize the next Query you're going to need.

I would consider NULL values as more of a yellow flag to reconsider your schema. There may be a good reason to allow NULL values but generally I would assume there's equally-good or better reason not to.

ETA: Regarding that first part, it could go either way. Depending on how you run your business you might consider a customer a physical person and therefore you may have invoices that you do want to be linked to multiple customers.

John and Bob run a business together. John and Bob both buy supplies on a company card from you and likewise they buy supplies individually. It may be pertinent to maintain their individual identities for their individual invoices and also their combined invoice. Of course you could also distinguish between customers and businesses or by payment method ... there are many ways to skin that cat but I at least see a logical reason to have a many-to-many relationship between invoices and customers.

GlennB · Feb 16, 2014

StevenLeonCooper said:
ETA: Regarding that first part, it could go either way. Depending on how you run your business you might consider a customer a physical person and therefore you may have invoices that you do want to be linked to multiple customers.

Yeah, we bumped into a parallel situation in a certain system under development. Branches of a company phoned through orders (for delivery to that branch, making it appear that they were "the customer") but invoices were lumped together and went to head office. It got interesting when we decided also to cater for regional head offices that might represent groups of branches but not all of them, just to be safe

But recalling all this stuff lead me to a funny article, The Mother of all database Normalization Debates where "all hell lets loose"

paulhutch · Feb 18, 2014

More often than not when our DB Admin/It guy comes to me with a problem it is due to some vendor's software relying on NULL values to indicate something positive about the data e.g. not yet invoiced. Since there are so many ambiguities in the way NULL values are handled I avoid them for any data field that needs to be searched in my designs.

When I'm designing a data set I avoid using NULL by assigning a default value that indicates empty/not yet assigned. For a calculable numeric value I tend to use either the minimum or maximum allowed value. With a numeric ID, e.g. check number, I usually use zero. For other types it usually only takes a few minutes to come up with a suitable value.

Of course since I'm an embedded systems design guy it 's very possible that my techniques have some draw back for large scale systems.

GlennB · Feb 18, 2014

paulhutch said:
When I'm designing a data set I avoid using NULL by assigning a default value that indicates empty/not yet assigned. For a calculable numeric value I tend to use either the minimum or maximum allowed value.

Good luck with that, as you might end up with 'division by zero' when zero is the assigned default. Assign a huge default value for the divisor? You get a tiny result where there shouldn't even be a result. NULL avoids this, as the calculation doesn't even take place.

eta: there are many other reasons to avoid this horror, and one springs to mind... You want to query those rows where the value of a column is not yet known (which happens quite a lot). Under your system, any passing programmer needs to know - on paper or in their head -your chosen artificial default value for 'not yet known'.

Too horrible to contemplate. Think of the children

StevenLeonCooper · Feb 19, 2014

GlennB said:
Good luck with that, as you might end up with 'division by zero' when zero is the assigned default. Assign a huge default value for the divisor? You get a tiny result where there shouldn't even be a result. NULL avoids this, as the calculation doesn't even take place.

eta: there are many other reasons to avoid this horror, and one springs to mind... You want to query those rows where the value of a column is not yet known (which happens quite a lot). Under your system, any passing programmer needs to know - on paper or in their head -your chosen artificial default value for 'not yet known'.

Too horrible to contemplate. Think of the children

Loll which is why I prefer rows to only contain data that is known. If anything you almost never run into a scenario where you're adding NULL/DefaultValue checking to your queries whenever you're trying to use aggregate functions. It feels silly adding "WHERE RowValue <> NULL" at the end of everything.

SUSpilot · Feb 19, 2014

StevenLeonCooper said:
Loll which is why I prefer rows to only contain data that is known. If anything you almost never run into a scenario where you're adding NULL/DefaultValue checking to your queries whenever you're trying to use aggregate functions. It feels silly adding "WHERE RowValue NULL" at the end of everything.

Unfortunately, this happens when very large enterprises with lots (thousands) of automated and manual data entry points feed historical data into back-end systems like Teradata on a batch basis. I'm so used to typing "and <fieldname> is not null" into where, qualify, and join clauses, I don't even realize I'm doing it.

A great example would be the case of a crew inadvertently pulling a car out of an industry before we get the bill of lading - the waybill date will be null for the shipment until we get the paperwork (in fact, I flip that the other way and do a test for null to see where this is a problem - fortunately, in the EDI universe, this is a relatively rare occurrence).

GlennB · Feb 19, 2014

StevenLeonCooper said:
Loll which is why I prefer rows to only contain data that is known. If anything you almost never run into a scenario where you're adding NULL/DefaultValue checking to your queries whenever you're trying to use aggregate functions. It feels silly adding "WHERE RowValue <> NULL" at the end of everything.

There is another reason I can think for allowing NULLs, even in a heavily normalised system, and that's index size and efficiency.

Thinking about a specific system I worked on, there was a column on a transaction table called tl_to_be_invoiced which was populated with "Y" when pricing routines were complete, and the column was indexed to make the invoicing routine efficient. In fact only a tiny % of rows would be set to "Y" on a given day, as most of the stuff was historic and awaiting archiving.

Given that the invoicable rows are now permanent residents of the table then to avoid NULL in that column you'd have to set it to "N" or somesuch once it had been invoiced. Every transaction would then have a non-null entry in the index.

The alternative seems to be to create a new row in an invoiced_transactions table while deleting the original from invoicable_transactions. Just to avoid an insignificant null value that's really not a problem, afaics.

But I'm going back years here, and might be mis-recalling the technicalities of the matter.

StevenLeonCooper · Feb 19, 2014

GlennB said:
There is another reason I can think for allowing NULLs, even in a heavily normalised system, and that's index size and efficiency.

Thinking about a specific system I worked on, there was a column on a transaction table called tl_to_be_invoiced which was populated with "Y" when pricing routines were complete, and the column was indexed to make the invoicing routine efficient. In fact only a tiny % of rows would be set to "Y" on a given day, as most of the stuff was historic and awaiting archiving.

Given that the invoicable rows are now permanent residents of the table then to avoid NULL in that column you'd have to set it to "N" or somesuch once it had been invoiced. Every transaction would then have a non-null entry in the index.

The alternative seems to be to create a new row in an invoiced_transactions table while deleting the original from invoicable_transactions. Just to avoid an insignificant null value that's really not a problem, afaics.

But I'm going back years here, and might be mis-recalling the technicalities of the matter.

That kind of seems like a scenario where you've specifically built out the table structure for the express purpose of leveraging NULL values and your table is not specifically used for permanently storing data.

I've never really dealt with something like that before. To me it seems like your entire table could just have easily been a column in another table.

I think that's a scenario where you have to figure out how much normalization is right for the situation. I don't exactly share the previously-mentioned author's view that NULL = EVIL; I just appreciate the concern for the existence of NULL values. I like when a set of tables is designed to represent real data and only represents real data. I have seen first-hand, however, the pitfalls of normalization taken to the nth degree for the sake of "scalability".

Here's an interesting article on it from Coding Horror: http://www.codinghorror.com/blog/2008/07/maybe-normalizing-isnt-normal.html

aggle-rithm · Feb 21, 2014

StevenLeonCooper said:
I was reading a book on relational theory and the author suggested that the mere existence of null values in a DB was a bad sign. It does beg the question: If it's null then why does it exist? Any column worth having should probably have a default value and if you find yourself having hundreds of rows with NULL in a particular column, there's probably a different way to structure the table.

If you follow all the normalization rules, there would never be null values. There would be a lot of tables, though, and there is a performance penalty for having all those joins whenever you want to query the data.

CORed · Feb 24, 2014

aggle-rithm said:
Ah-HA! Mystery solved.

There were records in there with NULL values for invoice number. It just so happens that those records were entered, by accident, just a few minutes before I stumbled across this mystery.

Thanks for the link.

Sounds like your application has inadequate input validation (and also maybe your table definition needs to be set not to allow null values for the Invoice number field as well). I would assume that the Invoice number is a primary key for at least some of the tables involved. It should not be possible (whether through application design, or, better, in the database) to put NULL values in a primary key.

aggle-rithm · Feb 24, 2014

CORed said:
Sounds like your application has inadequate input validation (and also maybe your table definition needs to be set not to allow null values for the Invoice number field as well). I would assume that the Invoice number is a primary key for at least some of the tables involved. It should not be possible (whether through application design, or, better, in the database) to put NULL values in a primary key.

The NULL values were accidentally entered through SQL Mgt Studio.

aggle-rithm · Feb 24, 2014

GlennB said:
Additional - why might an invoice number be recorded as VARCHAR ?

If the invoice is a credit/debit memo, the number is preceded by a "C" or a "D".

Also, since invoice numbers aren't used to calculate numbers, it is improper semantics to store it as an integer. Sure, it would take up less space, but we're not in the 1970's anymore. Storage is cheap.

GlennB · Feb 24, 2014

aggle-rithm said:
If the invoice is a credit/debit memo, the number is preceded by a "C" or a "D".

Please don't take this personally, but that strikes me as horrific design.

Any report on 'Credits by credit note number' (say) then requires a SUBSTR to pick the 'C' then an ORDER BY of the whole column to list them numerically as long as the numeric part is correctly packed with leading 000s.

11 is less than 100. The way you're doing it you'd have to store them as "C000011" and "C000100" or somesuch. Then you're in deep poo if the numeric value exceeds what those 000's can handle, as you've had to pre-define their maximum length.

Dragonrock · Feb 24, 2014

Many years ago I worked on a software help desk and I accidentally discovered how to fix a bug that had been a thorn in the side of the help desk and product development since the inception of the software.

It was a travel agency accounting software package and the way travel agencies purchased airline tickets was through a company called ARC (Airline Reporting Corporation) which was owned equally by all member airlines. No tickets were purchased by from the airline itself except for Southwest Airlines tickets. Each week the travel agency would run the ARC report which was a list of all the ARC tickets sold that week and submit this report to the ARC along with the purchase price minus the agency's commission. Southwest airlines tickets were purchased directly from the airline and as such should not appear on the ARC report.

The bug was that Southwest tickets would occasionally appear on the ARC report and no one could figure out why. When viewed through the applications all the tickets appeared correct. The error was only on the ARC report. The only fix was to print out the report, cross out the Southwest tickets, and update the totals by hand. The primary column used to determine if a ticket should be on the ARC report was called "ARCNum". All tickets had an ARC Number which determined the ARC report it should be included with. Southwest tickets had a null in ARCNum.

I was doing some troubleshooting unrelated to this issue and what I need was a list of the distinct values and a count of how many rows each value had. I ran a simple query that loaded the distinct values into an temp table and then ran a row count for each value in the temp table. I knew this query would take a while so I ran it overnight.

The results were something like this:
ARCNum Count
ABC123 15,000
RCIP45 365,365
XYZPDQ 19,101
NULL 3,167
NULL 19

I wondered why there were 2 rows for NULL and when I asked the guys in product development I was told that it was one of those things that occassionally happened. I didn't like this answer so I kept looking. If I did a rowcount where ARCNum was null I still got the 3,167 but I couldn't find the missing 19 rows. I don't remember what made me think about it, but I changed my query from "where ARCNum is NULL" to "where ARCNum = "NULL". That gave me my missing 19 rows. The column was alphanumeric so it took "NULL" as a text string which meant it had an ACRNum of NULL rather than a null value in the column. Everything looked fine in the applications as null values were transferred as the text string "NULL" so it looked the same on that side.

I took my evidence to my contact in product development and a few hours latter got called into a meeting with my supervisor. The "meeting" consisted of me getting yelled at by the manager of product development for "reading SQL" which I hadn't been doing. I was told that if I were caught reading proprietary SQL again I could be terminated with cause.

I proceeded to mind my own business after that and ran a secret underground bug fix campaign where I fixed the Southwest-tickets-on-the-ARC-report issue only for the customers who I knew would keep quiet about it. My fix was simply changing the "NULL" values to a null value.

The best part was about a year after this occurred product development made a big deal out of a fix that solved the issue. Their fix was to add 600 lines to the ARC report SQL that reviewed each ticket and removed from the results any offending Southwest tickets.

How is this logically possible?

StevenLeonCooper

Thinker

GlennB

Loggerheaded, earth-vexing fustilarian

paulhutch

Master Poster

GlennB

Loggerheaded, earth-vexing fustilarian

StevenLeonCooper

Thinker

SUSpilot

Graduate Poster

GlennB

Loggerheaded, earth-vexing fustilarian

StevenLeonCooper

Thinker

aggle-rithm

Ardent Formulist

CORed

Penultimate Amazing

aggle-rithm

Ardent Formulist

aggle-rithm

Ardent Formulist

GlennB

Loggerheaded, earth-vexing fustilarian

Dragonrock

Militant Elvisian Tacoist