Monday, October 22, 2012

Goldmine - Exporting History and Email Bodies in One Step

STEP 1
First create a view in SQL that will do a left join of the ContHist and Mailbox tables. This means that it will pull in every single ContHist record and only pull in information from the Mailbox table where there is a match on the field they are joined on.

NOTE: I am only including the columns that I typically use in a migration, there are more ContHist and Mailbox columns to be aware of. Also I am only pulling the first 4,000 characters of both the notes and mail messages because of the limitations of Excel. As part of this view we are converting both Notes (on ContHist) and RFC822 (on Mailbox) from blob to varchar(4000).

SQL QUERY 1


GO
CREATE VIEW [dbo].[HistoryANDMail]
AS

Select

c.USERID, c.ACCOUNTNO, c.RECTYPE, c.ONDATE, c.REF as [Subject History - REF],
CAST(CAST(c.NOTES AS varbinary(4000)) AS varchar(4000)) AS NOTES,
c.LINKRECID, c.recid,

CAST(CAST(m.RFC822 AS varbinary(4000)) AS varchar(4000)) AS RFC822

From Conthist as c
Left Join MAILBOX as m
ON c.LINKRECID = m.recid
WHERE c.ACCOUNTNO is not null and datalength(c.ACCOUNTNO)>0

END SQL QUERY 1


STEP 2
Now you have a view that contains all ContHist data converted and ready to be exported to excel. I like to try to strip the HTML before going into excel because they excel files can be very large and hard to deal with. Here is the process for this. Following a great blog post you can create a function that removes HTML from specific columns.

Essentially copy and paste this into a query and run it, this will create a user defined function we will use later.

SQL QUERY 2


CREATE FUNCTION [dbo].[udf_StripHTML]
(@HTMLText VARCHAR(MAX))
RETURNS VARCHAR(MAX)
AS
BEGIN
DECLARE @Start INT
DECLARE @End INT
DECLARE @Length INT
SET @Start = CHARINDEX('<',@HTMLText)
SET @End = CHARINDEX('>',@HTMLText,CHARINDEX('<',@HTMLText))
SET @Length = (@End - @Start) + 1
WHILE @Start > 0
AND @End > 0
AND @Length > 0
BEGIN
SET @HTMLText = STUFF(@HTMLText,@Start,@Length,'')
SET @Start = CHARINDEX('<',@HTMLText)
SET @End = CHARINDEX('>',@HTMLText,CHARINDEX('<',@HTMLText))
SET @Length = (@End - @Start) + 1
END
RETURN LTRIM(RTRIM(@HTMLText))
END
GO

END SQL QUERY 2

STEP 3
Now we need to run an export task sending a query we will write into an excel file. Right click on the database > tasks > export data... When it comes time for the query enter the following.

NOTE: this has to be done in sets of 50,000 due to the size limitation of a spreadsheet in excel 2007 and I was unable to run this on a spreadsheet using 2010.



SQL QUERY 3


SELECT

USERID, ACCOUNTNO, RECTYPE, ONDATE, [Subject History - REF], 
dbo.udf_stripHTML(NOTES) as NOTES, LINKRECID, recid, dbo.udf_stripHTML(RFC822) as RFC822

from 

( SELECT
    ROW_NUMBER() OVER (ORDER BY recid ASC) AS ROW_NUMBER,
    *
  FROM dbo.FINALHistoryANDMail
) foo
WHERE ROW_NUMBER <= 50000


END SQL QUERY 3


Hope this helps! 

Strip HTML from SQL Columns

Just came across this, great info!

http://blog.sqlauthority.com/2007/06/16/sql-server-udf-user-defined-function-to-strip-html-parse-html-no-regular-expression/


Goldmin Use:
What I did was create a view converting the image fields in SQL to varchar, then when exporting the query I ran the udf_stripHTML(column) function so when the data is being exported to excel the HTML is coming off then.

SQL Row Count for All Tables

There have been times I want to see how many rows are in each table of a database without going into each table and running a query to see the row count. This returns a list of all tables with more than 2 rows of data and the row count for each table.

SELECT OBJECT_NAME(OBJECT_IDTableNamest.row_count
FROM sys.dm_db_partition_stats st
WHERE index_id 2
ORDER BY st.row_count DESC