


Re: Compact databases without having to close the database (vacuum) sheng Fri Oct 29 16:00:11 2010

On Sep 24, 9:14 am, Thomas Mueller <[EMAIL PROTECTED]>
wrote:
Hi Thomas,

Thanks for the response and sorry for the late reply. Here are some
investigations I've done so far.

> Do you use some kind of clustering? If not, how do you ensure the
> servers (hardware) doesn't fail?

We don't have any kind of clustering, but we do have a redundancy
mechanism for the H2 database.

>
> Did you check the heap memory? A slowdown can be caused by a memory
> leak. To verify, run jps -l (to get the process id) and then "jmap
> -histo <pid>". Please send me or post the result of a slow process.

I have checked the historical trend of the heap memory usage, and I
don't think there is a memory leak.

>
> If this is not the problem, could you send me a "slow" and a "fast"
> database? Maybe the index pages get "almost empty" somehow.
>

Sorry, I couldn't send you the dump, but I am working on a SQL script
that simulates the operations in production to see if I can reproduce
the problem. I hope I can get back to you with the script soon. The
steps below are what I am going to put in the SQL script; they mirror
the operations that were run against one of the huge tables in my slow
prod dump (a rough sketch of the script follows the list).

1. Assume the table in the initial database is empty.
2. Insert the hour X data for 8000 distinct IDs (8000 rows).
3. Assume each ID has 1000 updates within hour X: update the 8000 rows (8000 * 1000) times in a random order.
4. Repeat 2. and 3. 24 times to simulate the operations executed within a day. Note that the primary key is (ID, START_TIME).
5. Repeat 4. 6 times to simulate the operations executed within 6 days.
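
Roughly, the SQL script will look something like this (the VALUE
column, the timestamp literals and the RAND() payload are placeholders
I made up; only DATA_TABLE, ID and START_TIME match the real table):

--Sketch of the simulation script (steps 2 and 3)--
CREATE TABLE DATA_TABLE(
  ID INT,
  START_TIME TIMESTAMP,
  VALUE DOUBLE,
  PRIMARY KEY(ID, START_TIME)
);

-- Step 2: insert the hour X data for 8000 distinct IDs
INSERT INTO DATA_TABLE
SELECT X, TIMESTAMP '2010-10-01 00:00:00', 0
FROM SYSTEM_RANGE(1, 8000);

-- Step 3: executed (8000 * 1000) times per hour, with a random ID
-- picked by the driver program; later hours use a shifted START_TIME
-- (e.g. DATEADD('HOUR', X, ...)).
UPDATE DATA_TABLE SET VALUE = RAND()
WHERE ID = ? AND START_TIME = TIMESTAMP '2010-10-01 00:00:00';
---------------------------------------------------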

The purpose of this test is to see whether these operations fragment
the database file over a long period, which might be the cause of the
slow queries. I am curious how the records end up being stored in the
database file after these operations. What kind of file organization
technique does H2 use?

> If you can't send the database, could you run the Recover tool and
> send me or post the "Statistics" section of a slow and a fast
> database? Please note the databases should contain a similar amount of
> data, otherwise the comparison is meaningless.

Here are the statistics of three dumps from my prod environment.
1. Fast database with only around one day of data.
---- Statistics ----------
-- page count: 127878 free: 2250
-- page data head: 4911650 empty: 166158508 rows: 52092210
-- page count type: 0 1% count: 1652
-- page count type: 1 85% count: 108966
-- page count type: 2 3% count: 4417
-- page count type: 4 8% count: 10523
-- page count type: 5 0% count: 197
-- page count type: 6 0% count: 8
-- page count type: 7 0% count: 5
-- page count type: 8 1% count: 2107


2. Slow database with around 6 days of data.
---- Statistics ----------
-- page count: 1141210 free: 0
-- page data head: 40972096 empty: 1627068176 rows: 404881840
-- page count type: 1 88% count: 1012169
-- page count type: 2 3% count: 37966
-- page count type: 4 7% count: 89348
-- page count type: 5 0% count: 1654
-- page count type: 6 0% count: 70


3. The same slow database as in 2., but with SHUTDOWN DEFRAG run before the Recover tool.
---- Statistics ----------
-- page count: 1141210 free: 0
-- page data head: 40972096 empty: 1627068161 rows: 404881855
-- page count type: 1 88% count: 1012169
-- page count type: 2 3% count: 37966
-- page count type: 4 7% count: 89348
-- page count type: 5 0% count: 1654
-- page count type: 6 0% count: 70

> Another possible reason could be the order of pages stored on disk.
> For a large database it can be important, especially when using media
> with very long seek times (such as CD or DVD). It can also be a
> problem for hard disks, but usually not. There is a new experimental
> feature to re-organize the pages: SHUTDOWN DEFRAG.
>

The following is a continuation of the experiment on the slow database.
The test was run on H2 version 1.2.144, using the slow database dump
from H2 version 1.2.135.

The slow H2 database dump is 2.6GB in size. One of the huge tables has
990401 rows. I ran the following query, which requires a table scan,
against that huge table. The query took about 30 minutes to finish;
I didn't expect a table scan to take that long.

--The slowest query in this analysis------------
SELECT
  *
FROM DATA_TABLE
ORDER BY ID, START_TIME DESC
LIMIT 1;
...
(1 row, 1733484 ms)
-----------------------------------------------


I used SHUTDOWN DEFRAG to re-order the pages in the dump. The same
query then took only about 30 seconds to finish, roughly 60 times
faster than before SHUTDOWN DEFRAG.

--Re-run the slowest query in the analysis-------
SELECT
  *
FROM DATA_TABLE
ORDER BY ID, START_TIME DESC
LIMIT 1;
...
(1 row, 27808 ms)
-----------------------------------------------
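
For reference, the defrag step between the two runs was just the
following statement (it re-organizes the pages and closes the
database, so the database has to be reopened before re-running the
query):

--Defrag step between the two runs-----------------
SHUTDOWN DEFRAG;
---------------------------------------------------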

Regards,
Sheng
