Thursday, March 8, 2012

Audit Table - Design Question

I am looking for feedback on how to best design an auditing database for
tracking the progress of a file through various applications and vendors.
We need to be able to track delays in processing, as well as lost or missing
files.
We have a process where we send voice files out to a third party vendor to
have them transcribed, then returned to us as text files. I need to audit
the following events:
1. When was the voice file sent to the vendor
2. When was the voice file received by the vendor
3. When did the vendor return the transcribed text file to us
Each of these events could be tracked in a separate table, and if we apply
strict rules of normalization they probably should. For auditing reports I
would probably create a view with full outer joins on all three tables, or
many views with various joins for various purposes. It is expected that
these tables would contain up to one million rows each, with one and only
one row in each table corresponding to a row in each other table. In
theory, there could be errors at any stage of the process that results in
one event never occurring, effectively creating orphan records.
However, since this is strictly for auditing the progress of a particular
file through the systems, it seems to me that I could use a single table,
and update an existing row for each event, or create a new row if one does
not exist. This leaves me with questions regarding how to apply a primary
key and unique constraints. It also, however, would greatly speed up our
ability to run reports.
Here is a description of the events and data we will be tracking:
EVENT 1
On our end the voice data file names are unique, and we have the following
information available:
1.1. file name (strSentFileName)
1.2. IP address of the PC that submitted the file (strPC_ID)
1.3. ID of the person who dictated the voice file (strProviderID)
1.4. Date and time the file was submitted to the vendor (datSent)
EVENT 2
When the vendor receives the file, they send back to us a confirmation and w
e enter the following data:
2.1. Original File Name (strSentFileName)
2.2. New file name if applicable (strReturnFileName)
2.3. Unique identifier from their system (strJobID)
2.4. Date and time the confirmation was received (datConfirmed)
EVENT 3
When we receive the transcribed text file back from the vendor, we enter the
following data:
3.1. Unique identifier from their system (strJobID)
3.2. Date and time the final text file was received (datReturned)
Below is the DDL that I have thus far, but I am still mulling over how best
to design this. I originally thought of a rather denormalized approach to
this design, then rethought it and came up with a more normalized design.
Because we are auditing a process and could have errors that need to be
tracked, we expect orphans (or nulls) that complicate each approach. Any
advice as to how best design this would be much appreciated, as would
criticism of each of the approaches below.
Please excuse the camel case naming convention. I don't care for it myself,
but it is a standard within our environment.
--********************************
--Script to create a single, denormalized table for auditing
--********************************
-- No keys are defined, since any of the 3 events described above could fail
to occur
-- strSentFileName and strJobID should be either unique, or NULL
CREATE TABLE [dbo].[tblTranscriptionAudit] (
[strSentFileName] [varchar] (50) NULL ,
[strJobID] [varchar] (10) NULL ,
[strReturnFileName] [varchar] (50) NULL ,
[datSent] [smalldatetime] NULL ,
[datConfirmed] [smalldatetime] NULL ,
[datReturned] [smalldatetime] NULL ,
[strPC_IP] [varchar] (15) NULL ,
[strProviderID] [varchar] (10) NULL
) ON [PRIMARY]
GO
--********************************
--Script to create 3 normalized tables for auditing
--********************************
CREATE TABLE [dbo].[tblVoiceFilesSent] (
[strSentFileName] [varchar] (50) NOT NULL ,
[datSent] [smalldatetime] NOT NULL ,
[strPC_IP] [varchar] (15) NOT NULL ,
[strProviderID] [varchar] (50) NOT NULL
CONSTRAINT [PK_tblVoiceFilesSent] PRIMARY KEY CLUSTERED
([strSentFileName])
) ON [PRIMARY]
GO
-- note that the FK constraint here may have to be removed.
-- We could (due to errors) get a confirmation on a file that was never
submitted
CREATE TABLE [dbo].[tblReceiptConfirmed] (
[strSentFileName] [varchar] (50) NOT NULL ,
[strJobID] [varchar] (10) NOT NULL ,
[strReturnFileName] [varchar] (50) NOT NULL ,
[datConfirmed] [smalldatetime] NOT NULL
CONSTRAINT [PK_tblReceiptConfirmed] PRIMARY KEY CLUSTERED
([strJobID]),
CONSTRAINT [FK_tblReceiptConfirmed_tblVoiceFilesSen
t] FOREIGN KEY
([strSentFileName])
REFERENCES [tblVoiceFilesSent] ([strSentFileName])
ON DELETE CASCADE ON UPDATE CASCADE
) ON [PRIMARY]
GO
-- note that the FK constraint here may have to be removed.
-- We could (due to errors) get a returned text file that was never
submitted, or never confirmed
CREATE TABLE [dbo].[tblTextFileReturned] (
[strJobID] [varchar] (10) NOT NULL ,
[datReturned] [smalldatetime] NOT NULL
CONSTRAINT [PK_tblTextFileReturned] PRIMARY KEY CLUSTERED
([strJobID]),
CONSTRAINT [FK_tblTextFileReturned_tblReceiptConfir
med] FOREIGN KEY
([strJobID])
REFERENCES [tblReceiptConfirmed] ([strJobID])
ON DELETE CASCADE ON UPDATE CASCADE
) ON [PRIMARY]
GOOn Thu, 16 Mar 2006 14:36:28 -0500, Jim Underwood wrote:

>I am looking for feedback on how to best design an auditing database for
>tracking the progress of a file through various applications and vendors.
>We need to be able to track delays in processing, as well as lost or missin
g
>files.
(snip description)
>Here is a description of the events and data we will be tracking:
>EVENT 1
>On our end the voice data file names are unique, and we have the following
>information available:
>1.1. file name (strSentFileName)
>1.2. IP address of the PC that submitted the file (strPC_ID)
>1.3. ID of the person who dictated the voice file (strProviderID)
>1.4. Date and time the file was submitted to the vendor (datSent)
>EVENT 2
>When the vendor receives the file, they send back to us a confirmation and
w
>e enter the following data:
>2.1. Original File Name (strSentFileName)
>2.2. New file name if applicable (strReturnFileName)
>2.3. Unique identifier from their system (strJobID)
>2.4. Date and time the confirmation was received (datConfirmed)
>EVENT 3
>When we receive the transcribed text file back from the vendor, we enter th
e
>following data:
>3.1. Unique identifier from their system (strJobID)
>3.2. Date and time the final text file was received (datReturned)
Hi Jim,
Is it also possible that any of the above attributes are changed later
(other than for the obvious reason of correcting an input error)? And if
they are ever changed, do you need to keep a history of the previously
entered data, and when and by who the data was changed?
If the answer is no, then I don't think you need a seperate audit table.
In fact, I wouldn't use the term auditing at all, since people associate
that term with a full logging of who changed what data when.
All the above are, in my eyes, attributes of your voice files. Some of
those attributes are related to events (e.g. DateSent, DateConfirmed,
DateReturned), some are not directly related to events (though you
happen to receive that bit of information exactly when an event
happens).
I think you can add all these attrbitues to your existing VoiceFiles
table (assuming that you have any).
Here's how I would design the table (I took the liberty of losing the
silly str-, dat-, and tbl-prefixes, adding primary key constraints and
NOT NULL constraints and improving some column names. I also added some
check constraints to ensure the sanity of the column contents).
Note: If the ID of the person who dictated the file is already known
when the VoiceFiule is entered in the DB (which I can imagine), then
change the DEFAULT NULL to NOT NULL for the relevant column.
CREATE TABLE VoiceFiles
(VoiceFileName varchar(50) NOT NULL,
-- NOTE: Replaces "strSentFileName" column
-- Columns not mentioned in this message but already present
-- in your current VoiceFiles table.
Sender_PC_IPAddr varchar(15) DEFAULT NULL,
Provider_ID varchar(10) DEFAULT NULL,
-- NOTE: I dislike the name, but the obvious
-- alternative ("dictator") is no good either
DateSentToVendor smalldatetime DEFAULT NULL,
FileNameFromVendor varchar(50) DEFAULT NULL,
VendorJobID varchar(10) DEFAULT NULL,
DateConfirmedByVendor smalldatetime DEFAULT NULL,
DateReturnedFromVendor smalldatetime DEFAULT NULL,
PRIMARY KEY (VoiceFileName),
FOREIGN KEY (Sender_PC_IPAddr) REFERENCES PCs,
FOREIGN KEY (Provider_D) REFERENCES Persons,
CHECK (CASE WHEN DateSentToVendor IS NULL
THEN CASE WHEN Sender_PC_IPAddr IS NULL
AND ProviderID IS NULL
AND DateConfirmedByVendor IS NOT NULL
THEN 'Good'
ELSE 'Bad'
END
ELSE CASE WHEN Sender_PC_IPAddr IS NOT NULL
AND ProviderID IS NOT NULL
THEN 'Good'
ELSE 'Bad'
END
END = 'Good')
CHECK (CASE WHEN DateConfirmedByVendor IS NULL
THEN CASE WHEN FileNameFromVendor IS NULL
AND VendorJobID IS NULL
AND DateReturnedFromVendor IS NOT NULL
THEN 'Good'
ELSE 'Bad'
END
ELSE CASE WHEN FileNameFromVendor IS NOT NULL
AND DateReturnedFromVendor IS NOT NULL
THEN 'Good'
ELSE 'Bad'
END
END = 'Good')
CHECK (DateConfirmedByVendor <= DateSentToVendor)
CHECK (DateReturnedFromVendor <= DateSentToVendor)
CHECK (DateReturnedFromVendor <= DateConfirmedByVendor)
)
I didn't add a UNIQUE constraint for VendorJobID, since SQL Server
insists on ignoring the standard and rejecting a second NULL entry in a
nullable column with UNIQUE constraint. There are workaround taht allow
you to create a constraint to mimic the way UNIQUE should behave; google
for them or ask in a follow-up message.
(snip)
>-- note that the FK constraint here may have to be removed.
>-- We could (due to errors) get a confirmation on a file that was never
>submitted
(snip)
>-- note that the FK constraint here may have to be removed.
>-- We could (due to errors) get a returned text file that was never
>submitted, or never confirmed
I recommend putting those confirmations and those returned text files in
a seperate table. After all, I expect (and hope!) that your company
won't handle these confirmations and returned text files the same way
that other confirmations and returned text files are handled.
Hugo Kornelis, SQL Server MVP|||Hugo,
Thanks for all your feedback. I think I will go with a separate table
(probably two) for any later transactions that occur without the presence of
an originating transaction. You are absolutely correct that these will not
be handled in the same way as the other (non-orphaned) transactions. Doing
this also alleviates my concerns about defining a primary key on
VoiceFileName, which was my main stumbling block with this setup.
I know what you are saying about the unique constraint with nulls. I
bookmarked a page or two regarding "nullbusters" when I first saw it, and I
think I may use a similar approach here.
The silly prefixes I am stuck with, unfortunately (internal development
standards) but I do find your column names to be more intuitive than what I
had scribbled down.
All in all your feedback has been a big help.
Thank you!
"Hugo Kornelis" <hugo@.perFact.REMOVETHIS.info.INVALID> wrote in message
news:i3nr12hbps3nj374sof83ab758q0pc07be@.
4ax.com...
> On Thu, 16 Mar 2006 14:36:28 -0500, Jim Underwood wrote:
> Hi Jim,
> Is it also possible that any of the above attributes are changed later
> (other than for the obvious reason of correcting an input error)? And if
> they are ever changed, do you need to keep a history of the previously
> entered data, and when and by who the data was changed?
> If the answer is no, then I don't think you need a seperate audit table.
> In fact, I wouldn't use the term auditing at all, since people associate
> that term with a full logging of who changed what data when.
> All the above are, in my eyes, attributes of your voice files. Some of
> those attributes are related to events (e.g. DateSent, DateConfirmed,
> DateReturned), some are not directly related to events (though you
> happen to receive that bit of information exactly when an event
> happens).
> I think you can add all these attrbitues to your existing VoiceFiles
> table (assuming that you have any).
> Here's how I would design the table (I took the liberty of losing the
> silly str-, dat-, and tbl-prefixes, adding primary key constraints and
> NOT NULL constraints and improving some column names. I also added some
> check constraints to ensure the sanity of the column contents).
> Note: If the ID of the person who dictated the file is already known
> when the VoiceFiule is entered in the DB (which I can imagine), then
> change the DEFAULT NULL to NOT NULL for the relevant column.
> CREATE TABLE VoiceFiles
> (VoiceFileName varchar(50) NOT NULL,
> -- NOTE: Replaces "strSentFileName" column
> -- Columns not mentioned in this message but already present
> -- in your current VoiceFiles table.
> Sender_PC_IPAddr varchar(15) DEFAULT NULL,
> Provider_ID varchar(10) DEFAULT NULL,
> -- NOTE: I dislike the name, but the obvious
> -- alternative ("dictator") is no good either
> DateSentToVendor smalldatetime DEFAULT NULL,
> FileNameFromVendor varchar(50) DEFAULT NULL,
> VendorJobID varchar(10) DEFAULT NULL,
> DateConfirmedByVendor smalldatetime DEFAULT NULL,
> DateReturnedFromVendor smalldatetime DEFAULT NULL,
> PRIMARY KEY (VoiceFileName),
> FOREIGN KEY (Sender_PC_IPAddr) REFERENCES PCs,
> FOREIGN KEY (Provider_D) REFERENCES Persons,
> CHECK (CASE WHEN DateSentToVendor IS NULL
> THEN CASE WHEN Sender_PC_IPAddr IS NULL
> AND ProviderID IS NULL
> AND DateConfirmedByVendor IS NOT NULL
> THEN 'Good'
> ELSE 'Bad'
> END
> ELSE CASE WHEN Sender_PC_IPAddr IS NOT NULL
> AND ProviderID IS NOT NULL
> THEN 'Good'
> ELSE 'Bad'
> END
> END = 'Good')
> CHECK (CASE WHEN DateConfirmedByVendor IS NULL
> THEN CASE WHEN FileNameFromVendor IS NULL
> AND VendorJobID IS NULL
> AND DateReturnedFromVendor IS NOT NULL
> THEN 'Good'
> ELSE 'Bad'
> END
> ELSE CASE WHEN FileNameFromVendor IS NOT NULL
> AND DateReturnedFromVendor IS NOT NULL
> THEN 'Good'
> ELSE 'Bad'
> END
> END = 'Good')
> CHECK (DateConfirmedByVendor <= DateSentToVendor)
> CHECK (DateReturnedFromVendor <= DateSentToVendor)
> CHECK (DateReturnedFromVendor <= DateConfirmedByVendor)
> )
> I didn't add a UNIQUE constraint for VendorJobID, since SQL Server
> insists on ignoring the standard and rejecting a second NULL entry in a
> nullable column with UNIQUE constraint. There are workaround taht allow
> you to create a constraint to mimic the way UNIQUE should behave; google
> for them or ask in a follow-up message.
> (snip)
> (snip)
> I recommend putting those confirmations and those returned text files in
> a seperate table. After all, I expect (and hope!) that your company
> won't handle these confirmations and returned text files the same way
> that other confirmations and returned text files are handled.
> --
> Hugo Kornelis, SQL Server MVP

No comments:

Post a Comment