Karuta’s ASP & M$ SQLserver

Dicas, códigos e outras coisinhas de meus estudos para não esquecer…

Archive for agosto \31\UTC 2011

Gerar JSON em qualquer cláusula SQL – Consuming JSON Strings in SQL Server

Posted by karuta em agosto 31, 2011

CREATE PROCEDURE [dbo].[GetJSON] (
@ParameterSQL AS VARCHAR(MAX)
)
AS
BEGIN

DECLARE @SQL NVARCHAR(MAX)
DECLARE @XMLString VARCHAR(MAX)
DECLARE @XML XML
DECLARE @Paramlist NVARCHAR(1000)
SET @Paramlist = N'@XML XML OUTPUT'
SET @SQL = 'WITH PrepareTable (XMLString) '
SET @SQL = @SQL + 'AS ( '
SET @SQL = @SQL + @ParameterSQL+ ' FOR XML RAW, TYPE, ELEMENTS '
SET @SQL = @SQL + ') '
SET @SQL = @SQL + 'SELECT @XML = XMLString FROM PrepareTable '
EXEC sp_executesql @SQL, @Paramlist, @XML=@XML OUTPUT
SET @XMLString = CAST(@XML AS VARCHAR(MAX))

DECLARE @JSON VARCHAR(MAX)
DECLARE @Row VARCHAR(MAX)
DECLARE @RowStart INT
DECLARE @RowEnd INT
DECLARE @FieldStart INT
DECLARE @FieldEnd INT
DECLARE @KEY VARCHAR(MAX)
DECLARE @Value VARCHAR(MAX)

DECLARE @StartRoot VARCHAR(100); SET @StartRoot = '<row>'
DECLARE @EndRoot VARCHAR(100); SET @EndRoot = '</row>'
DECLARE @StartField VARCHAR(100); SET @StartField = '<'
DECLARE @EndField VARCHAR(100); SET @EndField = '>'

SET @RowStart = CharIndex(@StartRoot, @XMLString, 0)
SET @JSON = ''
WHILE @RowStart > 0
BEGIN
	SET @RowStart = @RowStart+Len(@StartRoot)
	SET @RowEnd = CharIndex(@EndRoot, @XMLString, @RowStart)
	SET @Row = SubString(@XMLString, @RowStart, @RowEnd-@RowStart)
	SET @JSON = @JSON+'{'

	-- for each row
	SET @FieldStart = CharIndex(@StartField, @Row, 0)
	WHILE @FieldStart > 0
	BEGIN
		-- parse node key
		SET @FieldStart = @FieldStart+Len(@StartField)
		SET @FieldEnd = CharIndex(@EndField, @Row, @FieldStart)
		SET @KEY = SubString(@Row, @FieldStart, @FieldEnd-@FieldStart)
		SET @JSON = @JSON+'"'+@KEY+'":'

		-- parse node value
		SET @FieldStart = @FieldEnd+1
		SET @FieldEnd = CharIndex('</', @Row, @FieldStart)
		SET @Value = SubString(@Row, @FieldStart, @FieldEnd-@FieldStart)
		SET @JSON = @JSON+'"'+@Value+'",'

		SET @FieldStart = @FieldStart+Len(@StartField)
		SET @FieldEnd = CharIndex(@EndField, @Row, @FieldStart)
		SET @FieldStart = CharIndex(@StartField, @Row, @FieldEnd)
	END	
	IF LEN(@JSON)>0 SET @JSON = SubString(@JSON, 0, LEN(@JSON))
	SET @JSON = @JSON+'},'
	--/ for each row

	SET @RowStart = CharIndex(@StartRoot, @XMLString, @RowEnd)
END
IF LEN(@JSON)>0 SET @JSON = SubString(@JSON, 0, LEN(@JSON))
SET @JSON = '[' + @JSON + ']'
SELECT @JSON

END

exemplo de uso:
EXEC GetJSON 'SELECT * FROM dbo.Employee_TBL'
retorno do select:
[{"UserId":"7C92EB27-DD81-498E-82CE-18192C940328","FirstName":"Bill","LastName":"Gates","Age":"45","LastLogin":"2008-11-02T00:00:00"},{"UserId":"58E75687-8D0F-423A-8A76-4B1D750F62FD","FirstName":"Steve","LastName":"Job","Age":"52","LastLogin":"2008-10-17T00:00:00"}]
Anúncios

Posted in sql server, store procedure | Leave a Comment »

SQL Server 2005: Using PARTITION and RANK in your criteria

Posted by karuta em agosto 10, 2011

The RANK and PARTITION features in 2005 are simply amazing.  They make so many “classic” SQL problems very easy to solve.  For example, consider the following table:

create table Batting
(Player varchar(10), Year int, Team varchar(10), HomeRuns int, primary key(Player,Year))insert into Batting
select ‘A’,2001,’Red Sox’,13 union all
select ‘A’,2002,’Red Sox’,23 union all
select ‘A’,2003,’Red Sox’,19 union all
select ‘A’,2004,’Red Sox’,14 union all
select ‘A’,2005,’Red Sox’,11 union all
select ‘B’,2001,’Yankees’,42 union all
select ‘B’,2002,’Yankees’,39 union all
select ‘B’,2003,’Yankees’,42 union all
select ‘B’,2004,’Yankees’,29 union all
select ‘C’,2002,’Yankees’,2 union all
select ‘C’,2003,’Yankees’,3 union all
select ‘C’,2004,’Red Sox’,6 union all
select ‘C’,2005,’Red Sox’,9

Suppose we would like to find out which year each player hit their most home runs, and which team they played for.  As a tie-breaker, return the latest year.

In SQL 2000, we had to do it in two steps:  First, get the MAX(HomeRuns) per player, and then join back to the Batting table to return the rest of the data:

select b.*
from
batting b
inner join
(    select player, Max(HomeRuns) as MaxHR
from Batting
group by player
) m
on b.Player = m.player and b.HomeRuns = m.MaxHRPlayer     Year        Team       HomeRuns
———- ———– ———- ———–
A          2002        Red Sox    23
B          2001        Yankees    42
B          2003        Yankees    42
C          2005        Red Sox    9

(4 row(s) affected)

Note that for player ‘B’, we get two rows back since he has two years that tie for the most home runs (2001 and 2003).  How do we return just the latest year?  Guess what — it requires *another* pass through the table:

select b.*
from
batting b
inner join
(    select player, Max(HomeRuns) as MaxHR
from Batting
group by player
) m
on b.Player = m.player and b.HomeRuns = m.MaxHR
inner join
(  select player, homeRuns, Max(Year) as MaxYear
from Batting
group by Player, HomeRuns
) y
on m.player= y.player and m.maxHR = y.HomeRuns and b.Year = y.MaxYearPlayer     Year        Team       HomeRuns
———- ———– ———- ———–
C          2005        Red Sox    9
A          2002        Red Sox    23
B          2003        Yankees    42

(3 row(s) affected)

An alternate way to do this is to calculate the “ranking” of each home run for each player, using a correlated subquery:

select b.*,
(select count(*) from batting b2 where b.player = b2.player and b2.HomeRuns >= b.HomeRuns) as Rank
from batting bPlayer     Year        Team       HomeRuns    Rank
———- ———– ———- ———– ———–
A          2001        Red Sox    13          4
A          2002        Red Sox    23          1
A          2003        Red Sox    19          2
A          2004        Red Sox    14          3
A          2005        Red Sox    11          5
B          2001        Yankees    42          2
B          2002        Yankees    39          3
B          2003        Yankees    42          2
B          2004        Yankees    29          4
C          2002        Yankees    2           4
C          2003        Yankees    3           3
C          2004        Red Sox    6           2
C          2005        Red Sox    9           1

However, notice that we still have not handled ties! (notice that Player “B” has no #1 ranking, just two #2 rankings!) To do that, we must make things a little more complicated:

select b.*,
(select count(*) from batting b2 where b.player = b2.player and (b2.HomeRuns > b.HomeRuns or (b2.HomeRuns = b.HomeRuns and b2.Year >= b.Year))) as Rank
from batting bPlayer     Year        Team       HomeRuns    Rank
———- ———– ———- ———– ———–
A          2001        Red Sox    13          4
A          2002        Red Sox    23          1
A          2003        Red Sox    19          2
A          2004        Red Sox    14          3
A          2005        Red Sox    11          5
B          2001        Yankees    42          2
B          2002        Yankees    39          3
B          2003        Yankees    42          1
B          2004        Yankees    29          4
C          2002        Yankees    2           4
C          2003        Yankees    3           3
C          2004        Red Sox    6           2
C          2005        Red Sox    9           1

(13 row(s) affected)

And, with that, we can use our “ranking” formula to return only the #1 rankings to get our results by moving the subquery to the WHERE clause:

select b.*
from batting b
where (select count(*) from batting b2 where b.player = b2.player and (b2.HomeRuns > b.HomeRuns or (b2.HomeRuns = b.HomeRuns and b2.Year >= b.Year))) =1Player     Year        Team       HomeRuns
———- ———– ———- ———–
C          2005        Red Sox    9
A          2002        Red Sox    23
B          2003        Yankees    42

(3 row(s) affected)

And that’s all you need to do this in SQL 2000 ! Easy, right?   Hmmm … well, maybe not!

The new ranking functions in SQL Server 2005 let us quickly calculate each row’s ranking within a set based on a partition and an ordering. Think of the partition as almost like a GROUP BY, where you the use of the word “per” in your specifications often indicate which columns you are grouping or partitioning on.  Since we want to return the top Home Run year per Player, we partition on Player.  This means that the rows for each Player will get a value of 1-x for their ranking.  The ranking itself is based on HomeRuns, so we want to return the rank for each partition ORDER’ed BY HomeRuns from high-to-low (DESC).

Using the new features, in SQL 2005 we can use the following SELECT to return each year’s Home Run ranking for each Player:

select Player, Year, HomeRuns, Rank() over (Partition BY Player order by HomeRuns DESC) as Rank
from
Batting

Now, like before, we have to deal with ties.  But now, it is much easier — we just add a secondary sort.  Since we want to the latest year to rank higher, we just add “Year DESC” to our ORDER BY:

select Player, Year, HomeRuns,Rank() over (Partition BY Player order by HomeRuns DESC, Year DESC) as Rank
from
Batting

Much easier than the old way!  However, we are still not done.  The above SQL returns all years with the Ranking beside it.  We are only interested in returning all rows with a Rank of 1.  However, if we try:

select Player, Year, HomeRuns
from
Batting
where
Rank() over (Partition BY Player order by HomeRuns DESC, Year DESC) = 1

We get an error:

Windowed functions can only appear in the SELECT or ORDER BY clauses.

We can easily fix this by wrapping the SELECT in a derived table:

select * from
(
select Player, Year, HomeRuns,  Rank() over (Partition BY Player order by HomeRuns DESC, Year DESC) as Rank
from
Batting
) tmp
where Rank = 1

And there you go … It’s easy to read and edit, and only 1 reference to the table is needed!  Preliminary testing also indicates that this is much more efficient than doing things “SQL 2000-style”.

Other challenging SQL statements to write in the pre-2005 world are also easily solved in SQL 2005.  For example, the classic SQL problem ofreturning the “top x per group” based on a sort, or returning a “row number” or sequence for an entire result set or groups within that set, are now very, very easy using the RANK() feature.  Try it out!

Fonte:  Jeff’s SQL Server Blog

Posted in store procedure | Leave a Comment »

SQL Server 2005: Using OVER() with Aggregate Functions

Posted by karuta em agosto 10, 2011

One of new features in SQL 2005 that I haven’t seen much talk about is that you can now add aggregate functions to any SELECT (even without a GROUP BY clause) by specifying an OVER() partition for each function. Unfortunately, it isn’t especially powerful, and you can’t do running totals with it, but it does help you make your code a little shorter and in many cases it might be just what you need.

The way it works is similar to joining an aggregated copy of a SELECT to itself. For example, consider the following:

select customerID, productID, orderDate, orderAmount
from Orders

customerID  productID   orderDate               orderAmount
----------- ----------- ----------------------- ---------------------
1           1           2007-01-01 00:00:00.000 20.00
1           2           2007-01-02 00:00:00.000 30.00
1           2           2007-01-05 00:00:00.000 23.00
1           3           2007-01-04 00:00:00.000 18.00
2           1           2007-01-03 00:00:00.000 74.00
2           1           2007-01-06 00:00:00.000 34.00
2           2           2007-01-08 00:00:00.000 10.00

(7 row(s) affected)

You can now easily return the total orderAmount per customer as an additional column in this SELECT, simply by adding an aggregate SUM() function with an OVER() clause:

select customerID,  productID, orderDate, orderAmount,
      sum(orderAmount) OVER (Partition by CustomerID) as Total
from Orders

customerID  productID   orderDate               orderAmount   Total
----------- ----------- ----------------------- ------------- ---------
1           1           2007-01-01 00:00:00.000 20.00         91.00
1           2           2007-01-02 00:00:00.000 30.00         91.00
1           2           2007-01-05 00:00:00.000 23.00         91.00
1           3           2007-01-04 00:00:00.000 18.00         91.00
2           1           2007-01-03 00:00:00.000 74.00         118.00
2           1           2007-01-06 00:00:00.000 34.00         118.00
2           2           2007-01-08 00:00:00.000 10.00         118.00

(7 row(s) affected)

The previous SQL is essentially shorthand for:

select
    o.customerID, o.productID, o.orderDate, o.orderAmount, t.Total
from
    Orders o
inner join
   (
    select customerID, sum(orderAmount) as Total from Orders group by customerID
   )
  t on t.customerID = o.customerID

since the two return the same results.

Note that the total returned using SUM(..) OVER (..) is not the total for the entire table, just for the scope of the SELECT where it is used. For example, if you add a filter to the SELECT to return only rows for ProductID 2, the totals will reflect that criteria as well:

select customerID,  productID, orderDate, orderAmount,
      sum(orderAmount) OVER (Partition by CustomerID) as Total
from Orders
where productID = 2

customerID  productID   orderDate               orderAmount   Total
----------- ----------- ----------------------- ------------  ------------
1           2           2007-01-02 00:00:00.000 30.00         53.00
1           2           2007-01-05 00:00:00.000 23.00         53.00
2           2           2007-01-08 00:00:00.000 10.00         10.00

(3 row(s) affected)

That is a nice advantage over the old way of linking to a derived table, since in that case you’d need to repeat the criteria for both the primary (outer) SELECT and also the derived table.

Typically, SUM(..) OVER(..) is most useful for calculating a percentage of a total for each row. For example, for each Order we can calculate the percentage of that order’s orderAmount compared to the customer’s total orderAmount:

select customerID,  productID, orderDate, orderAmount,
       orderAmount / sum(orderAmount) OVER (Partition by CustomerID) as Pct
from Orders

customerID  productID   orderDate               orderAmount   Pct
----------- ----------- ----------------------- ------------  -------
1           1           2007-01-01 00:00:00.000 20.00         0.2197
1           2           2007-01-02 00:00:00.000 30.00         0.3296
1           2           2007-01-05 00:00:00.000 23.00         0.2527
1           3           2007-01-04 00:00:00.000 18.00         0.1978
2           1           2007-01-03 00:00:00.000 74.00         0.6271
2           1           2007-01-06 00:00:00.000 34.00         0.2881
2           2           2007-01-08 00:00:00.000 10.00         0.0847

(7 row(s) affected)

Of course, be sure that you don’t encounter any divide by zero errors by using a CASE if necessary.

While I’ve made many references to using the SUM() function, of course this technique works with any of the other aggregate functions as well, such as MIN() or AVG(). For example, you could return only Orders where the orderAmount is below the average for the product that was ordered by writing:

select x.*
from
(
	select customerId, productID, orderDate, orderAmount,
		avg(orderAmount) over (partition by productID) as ProductAvg
	from orders
) x
where x.orderAmount < x.productAvg

customerId  productID   orderDate               orderAmount   ProductAvg
----------- ----------- ----------------------- ------------- -----------
1           1           2007-01-01 00:00:00.000 20.00         42.6666
2           1           2007-01-06 00:00:00.000 34.00         42.6666
2           2           2007-01-08 00:00:00.000 10.00         21.00

(3 row(s) affected)

It is my understanding that some SQL implementations allow you to use SUM(..) OVER (..) to calculate running totals for a SELECT, but unfortunately that does not appear to be possible using SQL Server 2005. However, there are other ways to accomplish this in T-SQL if you really need to; my general recommendation is to do this at your presentation layer if those totals are not needed for further processing at the database.

By Jeff Smith on 21 May 2007 – http://www.sqlteam.com/article/sql-sever-2005-using-over-with-aggregate-functions

Posted in store procedure | Leave a Comment »

SQL SERVER – Tips from the SQL Joes 2 Pros Development Series – Aggregates with the Over Clause

Posted by karuta em agosto 10, 2011

Aggregates with the Over Clause

You have likely heard the business term “Market Share”. If your company is the biggest and has sold 15 million units in an industry that has sold a total of 50 million units then your company’s market share is 30% (15/50 = .30). Market share represents your number divide by the sum of all other numbers. In JProCo the biggest grant (Ben@Moretechnology.com) is $41,000 and the total of all grants is $193,700. Therefore the Ben grant is 21.6% of the whole set of grants for the company.

The two simple queries in the figure below show all the Grant table records and the sum of the grant amounts.

If we want to show the total amount next to every record of the table – or just one record of the table – SQL Server gives us the same error. It does not find the supporting aggregated language needed to support the SUM( ) aggregate function.

Adding the OVER( ) clause allows us to see the total amount next to each grant. We see 193,700 next to each record in the result set.

The sum of all 10 grants is $193,700. Recall the largest single grant (007) is $41,000. Doing the quick math in our head, we recognize $41,000 is around 1/5of ~$200,000 and guesstimate that Grant 007 is just over 20% of the total.

Thanks to the OVER clause, there’s no need to guess. We can get the precise percentage. To accomplish this, we will add an expression that does the same math we did in our head. We want the new column to divide each grant amount by $193,700 (the total of all the grants).

By listing the total amount of all grants next to each individual grant, we automatically get a nice reference for how each individual grant compares to the total of all JProCo grants. The new column is added and confirms our prediction that Grant 007 represents just over 21% of all grants.

Notice that the figures in our new column appear as ratios. Percentages are 100 times the size of a ratio. Example:  the ratio 0.2116 represents a percentage of 21.16%. Multiplying a ratio by 100 will show the percentage. To finish, give the column a descriptive title, PercentOfTotal.

In today post we examined the basic over clause with an empty set of Parenthesis. The over clause actually have many variations which we will see in tomorrow’s post.

Note: If you want to setup the sample JProCo database on your system you can watch this video. For this post you will want to run the SQLQueriesChapter5.0Setup.sql script from Volume 2.

Reference:  Pinal Dave (http://blog.SQLAuthority.com)

 

Posted in store procedure | Leave a Comment »