闲时会写写程序的非技术人员。
分类: 数据库开发技术
2010-08-04 16:39:21
原文出处不详,大致意思,作者比较数据分别储存于文件和DB中的操作速度
文件在新建、更新、备份、可视化等方面优于DB,而且不用配DB服务器;DB在数据操作方面好于文件(sql语句),而且对于大数据量有优势。
试验结果,查找单个文件时,文件效率好于数据库,而数据库查找多个数据时效率要好于文件,且随着数据量的上升优势愈来愈明显。因此对于大量存储,最好还是使用数据库。
While writing , I’ve been debating with myself regarding my use of the filesystem as a datastore. While the filesystem certainly makes creating, updating, visualizing, backing up, and restoring data much easier than it would be in a database, it adds many hardships. First of all, the convenience of SQL is thrown out the window. While it is nice that using the filesystem doesn’t require a database server, not being able to use a database server means that more programming is involved. Additionally, things like searching through all the entries in the system become difficult, not to mention slow. Another downfall is that the filesystem limits the amount of metadata for each entry that can be kept in a simple fashion.
However, the biggest question on my mind was whether or not using a database server would be faster or slower when performing the most commonly requested actions: getting a list of recent items from the entire system, getting a list of items from a category, and getting one item. I decided to write a test case.
I created 6000 empty files in 20 directories. I also created a table in a mysql database that simulated the filesystem: name, mtime, dir, data. I added indexes on dir, name, and mtime. Then I started testing. In each case the test is run 10 times. Then the average is displayed. For the database tests, mysql_connect is called each time.
Getting the filenames of the 10 most recent entries from the entire system.
FILESYSTEM
TIME: 1.7814919948578 TIME: 1.7425200939178 TIME: 1.8071219921112 TIME: 1.6778069734573 TIME: 1.6711789369583 TIME: 1.7414019107819 TIME: 1.6959699392319 TIME: 1.6531630754471 TIME: 1.7546479701996 TIME: 1.6758890151978 TOT TIME: 17.201191902161 AVG TIME: 1.5637447183782DATABASE
TIME: 0.0039100646972656 TIME: 0.001039981842041 TIME: 0.00095093250274658 TIME: 0.00096702575683594 TIME: 0.00095295906066895 TIME: 0.00098395347595215 TIME: 0.0009620189666748 TIME: 0.0009760856628418 TIME: 0.00094294548034668 TIME: 0.00095808506011963 TOT TIME: 0.012644052505493 AVG TIME: 0.0011494593186812
Getting the filenames of the 10 most recent files in a single directory.
FILESYSTEM
TIME: 0.055459976196289 TIME: 0.053847074508667 TIME: 0.044721961021423 TIME: 0.043873071670532 TIME: 0.043742060661316 TIME: 0.043787956237793 TIME: 0.043717980384827 TIME: 0.04374098777771 TIME: 0.043833017349243 TIME: 0.04370105266571 TOT TIME: 0.46042513847351 AVG TIME: 0.041856830770319DATABASE
TIME: 0.0095839500427246 TIME: 0.0055500268936157 TIME: 0.005547046661377 TIME: 0.0055389404296875 TIME: 0.0056079626083374 TIME: 0.00553297996521 TIME: 0.005499005317688 TIME: 0.0055099725723267 TIME: 0.0053470134735107 TIME: 0.0053049325942993 TOT TIME: 0.059021830558777 AVG TIME: 0.0053656209598888
Getting one item.
FILESYSTEM
TIME: 0.00032293796539307 TIME: 0.00021898746490479 TIME: 0.00017297267913818 TIME: 0.00016999244689941 TIME: 0.00027298927307129 TIME: 0.00017201900482178 TIME: 0.00016689300537109 TIME: 0.00016403198242188 TIME: 0.0001760721206665 TIME: 0.00017201900482178 TOT TIME: 0.0020089149475098 AVG TIME: 0.0001826286315918DATABASE
TIME: 0.0042519569396973 TIME: 0.0011199712753296 TIME: 0.0010420083999634 TIME: 0.0010360479354858 TIME: 0.0010439157485962 TIME: 0.0010349750518799 TIME: 0.001041054725647 TIME: 0.0010310411453247 TIME: 0.0010330677032471 TIME: 0.0064520835876465 TOT TIME: 0.019086122512817 AVG TIME: 0.0017351020466198
The database was 1360 times faster than the filesystem when looking for the 10 most recent items in the entire system. The database was 7.8 times faster when looking for the 10 most recent items in a single directory. However, the filesystem was 9.5 times faster at getting a single file.
These numbers skew greater and greater towards the database as the number of items increases. And, in the one place that the filesystem wins, the operation being performed is so un-time-consuming in general, that the increase in the speed of the filesystem doesn’t amount to much.
These tests were performed with the database being on the same server as the running script. Additionally, the server performing these actions was, basically, not performing anything else at the time. If your database server is only accessible over a 2400bps modem link, your results will differ greatly. Additionally, if your database server is heavily loaded, while your web server isn’t, you may also see very different results.
Benchmarks are crap, for the most part. They don’t really mean a whole lot, unless they represent the exact cases in which you will be using the functions being tested. However, in this case, they DO represent exactly what I will be doing.
What does this mean? Inklog will no longer use the file system as its main method of data storage.