Tuesday, March 27, 2012

Auto created statistics and missing statistics

Hello group.

I have an issue, which has bothered me for a while now:

I'm wondering why the column statistics, which SQL Server wants me to
create, if I turn off auto-created statistics, are so important to the
optimizer?

Example: from Northwind (with auto create stats off), I do the following:

SELECT * FROM Customers WHERE Country = 'Sweden'

My query plan show a clustered index scan, which is expected - no index
exists for Country. BUT, the query plan also shows, that the optimizer is
missing a statistic on Country, which tells me, that the optimizer would
benefit from knowing this.

I cannot see why? (and I've been trying for a while now).

If I create the missing statistics, nothing happens in the query plan (and
why should it?). I could understand it, if the optimizer suggested an index
on Country - this would make sense, but if creating the missing index, query
analyzer creates the statistics with an empty index, which seems to me to be
less than usable.

I've been thinking long and hard about this, but haven't been able to reach
a conclusion :) It has some relevance to my work, because allowing the
optimizer to create missing statistics limits my options for designing
indexes (e.g. covering) for some rather wide tables, so I'm thinking why not
turn it off altogether. But I would like to know the consequences - hope
somebody has already delved into this, and knows a good explanation.

Rgds
Jesper"Jesper Jensen" <moellemand@.post.tdcadsl.dk> wrote in message
news:40727f2e$0$237$edfadb0f@.dread12.news.tele.dk. ..
> Hello group.
> I have an issue, which has bothered me for a while now:
> I'm wondering why the column statistics, which SQL Server wants me to
> create, if I turn off auto-created statistics, are so important to the
> optimizer?
> Example: from Northwind (with auto create stats off), I do the following:
> SELECT * FROM Customers WHERE Country = 'Sweden'
> My query plan show a clustered index scan, which is expected - no index
> exists for Country. BUT, the query plan also shows, that the optimizer is
> missing a statistic on Country, which tells me, that the optimizer would
> benefit from knowing this.
> I cannot see why? (and I've been trying for a while now).
> If I create the missing statistics, nothing happens in the query plan (and
> why should it?). I could understand it, if the optimizer suggested an
index
> on Country - this would make sense, but if creating the missing index,
query
> analyzer creates the statistics with an empty index, which seems to me to
be
> less than usable.
> I've been thinking long and hard about this, but haven't been able to
reach
> a conclusion :) It has some relevance to my work, because allowing the
> optimizer to create missing statistics limits my options for designing
> indexes (e.g. covering) for some rather wide tables, so I'm thinking why
not
> turn it off altogether. But I would like to know the consequences - hope
> somebody has already delved into this, and knows a good explanation.
> Rgds
> Jesper

http://msdn.microsoft.com/library/d...l/statquery.asp

Simon|||Thanks, Simon, informative article, but ...

... it doesn't really explain the stuff, that I wrote. The closest I get to
an explanation, when reading this is 'These statistics are created for
columns where the optimizer would have to estimate the approximate density
or distribution otherwise'.

I knew this, but I still do not know, why the optimizer needs to know the
density and/or distribution?? I can see no valid reason, and therefore I can
see no good reason for enabling auto-creation of stats.

What I probably looking for is a good example, where the use of an
automatically created stat saves time, cycles and IOs :)

Best Rgds - Jesper

"Simon Hayes" <sql@.hayes.ch> skrev i en meddelelse
news:4072f05a$1_2@.news.bluewin.ch...
> "Jesper Jensen" <moellemand@.post.tdcadsl.dk> wrote in message
> news:40727f2e$0$237$edfadb0f@.dread12.news.tele.dk. ..
> > Hello group.
> > I have an issue, which has bothered me for a while now:
> > I'm wondering why the column statistics, which SQL Server wants me to
> > create, if I turn off auto-created statistics, are so important to the
> > optimizer?
> > Example: from Northwind (with auto create stats off), I do the
following:
> > SELECT * FROM Customers WHERE Country = 'Sweden'
> > My query plan show a clustered index scan, which is expected - no index
> > exists for Country. BUT, the query plan also shows, that the optimizer
is
> > missing a statistic on Country, which tells me, that the optimizer would
> > benefit from knowing this.
> > I cannot see why? (and I've been trying for a while now).
> > If I create the missing statistics, nothing happens in the query plan
(and
> > why should it?). I could understand it, if the optimizer suggested an
> index
> > on Country - this would make sense, but if creating the missing index,
> query
> > analyzer creates the statistics with an empty index, which seems to me
to
> be
> > less than usable.
> > I've been thinking long and hard about this, but haven't been able to
> reach
> > a conclusion :) It has some relevance to my work, because allowing the
> > optimizer to create missing statistics limits my options for designing
> > indexes (e.g. covering) for some rather wide tables, so I'm thinking why
> not
> > turn it off altogether. But I would like to know the consequences - hope
> > somebody has already delved into this, and knows a good explanation.
> > Rgds
> > Jesper
>
http://msdn.microsoft.com/library/d...l/statquery.asp
> Simon|||"Jesper Jensen" <moellemand@.post.tdcadsl.dk> wrote in message
news:4072f91b$0$300$edfadb0f@.dread12.news.tele.dk. ..
> Thanks, Simon, informative article, but ...
> ... it doesn't really explain the stuff, that I wrote. The closest I get
to
> an explanation, when reading this is 'These statistics are created for
> columns where the optimizer would have to estimate the approximate density
> or distribution otherwise'.
> I knew this, but I still do not know, why the optimizer needs to know the
> density and/or distribution?? I can see no valid reason, and therefore I
can
> see no good reason for enabling auto-creation of stats.
> What I probably looking for is a good example, where the use of an
> automatically created stat saves time, cycles and IOs :)
> Best Rgds - Jesper

OK, here's another informative article :-)

http://www.winnetmag.com/SQLServer/...2075/22075.html

In summary, index statistics exist only for the first column in an index,
but auto-created (or manually created) statistics can exist for any column.
This gives the optimizer extra information, which might mean it chooses a
different, more efficient index for a query.

Check out the example on the second page of the article - on my system, this
reduced the logical reads required for the query from 104 to 43.

But you're correct to consider that there can be an impact on performance in
some situations:

http://support.microsoft.com/defaul...kb;en-us;195565

Simon|||Thanks, Simon, that one did the trick.

One less mystery.

On my machine, QA tells me that the two queries (the index scan on
ProductID/Quantity vs. the clustered index scan) takes 43.65 and 56.35% cost
respectively. I would argue, that this saving is not worth the 'used up
index space'. In my professional life, I've seen tables, which are wide
enough (200+ columns) to demand, that precious index space is saved.

Basically, I think there are too many 'ifs' before an auto-created index
saves performance, but I appreciate the optimization idea behind it.

Thanks - Jesper

"Simon Hayes" <sql@.hayes.ch> skrev i en meddelelse
news:4073121e_1@.news.bluewin.ch...
> "Jesper Jensen" <moellemand@.post.tdcadsl.dk> wrote in message
> news:4072f91b$0$300$edfadb0f@.dread12.news.tele.dk. ..
> > Thanks, Simon, informative article, but ...
> > ... it doesn't really explain the stuff, that I wrote. The closest I get
> to
> > an explanation, when reading this is 'These statistics are created for
> > columns where the optimizer would have to estimate the approximate
density
> > or distribution otherwise'.
> > I knew this, but I still do not know, why the optimizer needs to know
the
> > density and/or distribution?? I can see no valid reason, and therefore I
> can
> > see no good reason for enabling auto-creation of stats.
> > What I probably looking for is a good example, where the use of an
> > automatically created stat saves time, cycles and IOs :)
> > Best Rgds - Jesper
> OK, here's another informative article :-)
> http://www.winnetmag.com/SQLServer/...2075/22075.html
> In summary, index statistics exist only for the first column in an index,
> but auto-created (or manually created) statistics can exist for any
column.
> This gives the optimizer extra information, which might mean it chooses a
> different, more efficient index for a query.
> Check out the example on the second page of the article - on my system,
this
> reduced the logical reads required for the query from 104 to 43.
> But you're correct to consider that there can be an impact on performance
in
> some situations:
> http://support.microsoft.com/defaul...kb;en-us;195565
> Simon|||"Jesper Jensen" <moellemand@.post.tdcadsl.dk> wrote in message news:<40732877$0$274$edfadb0f@.dread12.news.tele.dk>...
> Thanks, Simon, that one did the trick.
> One less mystery.
> On my machine, QA tells me that the two queries (the index scan on
> ProductID/Quantity vs. the clustered index scan) takes 43.65 and 56.35% cost
> respectively. I would argue, that this saving is not worth the 'used up
> index space'. In my professional life, I've seen tables, which are wide
> enough (200+ columns) to demand, that precious index space is saved.
> Basically, I think there are too many 'ifs' before an auto-created index
> saves performance, but I appreciate the optimization idea behind it.
> Thanks - Jesper

<snip
Well, you have to be careful about reaching conclusions based on
simple queries using small data sets. It's possible that a complex
join involving millions of rows would give a more significant
difference. To get a definite answer for your environment, you would
have to do some benchmarking, with and without statistics.

Simon

No comments:

Post a Comment