Okay, here is the deal, I am working on a project which requires me to store files in my database. But I have recently just switched to postgresql from mysql, so I had to do some research on what methods postgresql offers for file storage. In mysql GLOB would have been my choice, but postgresql offers two methods, BYTEA which is postgresql’s answer to GLOB, or store the files in Large Objects (oid).
I asked around to find out what method would be the fastes, but I could not find the answer I was looking for, so I ended up spending the night testing it for myself, this blog will show the results.
The scripts
byteaTest_save.php
ini_set('max_execution_time', '0'); ini_set('memory_limit', '128M'); $time = microtime(); $time = explode(' ', $time); $time = $time[1] + $time[0]; $begintime = $time; /* bytea test DB: CREATE TABLE images ( id serial NOT NULL, "name" text, image bytea, CONSTRAINT images_pkey PRIMARY KEY (id) ) WITH (OIDS=FALSE); ALTER TABLE images OWNER TO "storeTest"; */ function saveBytea($Conn, $Filename) { global $Dir; $FileContent = pg_escape_bytea(file_get_contents($Dir.'/'.$Filename)); $Query = "INSERT INTO images (name, image) VALUES ('".$Filename."', '".$FileContent."');"; pg_query($Conn, $Query); unset($FileContent); } $strConn = "host=localhost dbname=byteaTest user=storeTest password=test sslmode=require"; $Conn = pg_connect($strConn); global $Dir; $Dir = "/mnt/hdb/Pictures/BU billeder"; $Handle = opendir($Dir); while(false !== ($File = readdir($Handle))) { if($File !== "." && $File !== "..") { saveBytea($Conn, $File); } } closedir($Handle); pg_close($Conn); $time = microtime(); $time = explode(" ", $time); $time = $time[1] + $time[0]; $endtime = $time; $totaltime = ($endtime - $begintime); $totaltime = sprintf("%.4f", $totaltime); $Memory = memory_get_peak_usage(); echo "Execution time: " .$totaltime. " seconds. "; echo "Peak memory usage: " . $Memory/1024/1024 . " mb.";
byteaTest_load.php
ini_set('max_execution_time', '0'); ini_set('memory_limit', '128M'); $time = microtime(); $time = explode(' ', $time); $time = $time[1] + $time[0]; $begintime = $time; /* bytea test DB: CREATE TABLE images ( id serial NOT NULL, "name" text, image bytea, CONSTRAINT images_pkey PRIMARY KEY (id) ) WITH (OIDS=FALSE); ALTER TABLE images OWNER TO "storeTest"; */ function loadBytea($Conn, $ID) { global $Dir; $Query = pg_query($Conn, "SELECT name, image FROM images WHERE id=".$ID.";"); $Row = pg_fetch_row($Query); $FileContent = pg_unescape_bytea($Row[1]); file_put_contents($Dir.'/'.$Row[0], $FileContent); unset($FileContent); } $strConn = "host=localhost dbname=byteaTest user=storeTest password=test sslmode=require"; $Conn = pg_connect($strConn); global $Dir; $Dir = "/mnt/hdb/oidVsbytea"; $Query = pg_query("SELECT id FROM images;") or die(pg_last_error($Conn)); while($Row = pg_fetch_assoc($Query)) { loadBytea($Conn, $Row['id']); } pg_close($Conn); $time = microtime(); $time = explode(" ", $time); $time = $time[1] + $time[0]; $endtime = $time; $totaltime = ($endtime - $begintime); $totaltime = sprintf("%.4f", $totaltime); $Memory = memory_get_peak_usage(); echo "Execution time: " .$totaltime. " seconds.\n"; echo "Peak memory usage: " . $Memory/1024/1024 . " mb.";
oidTest_save.php
ini_set('max_execution_time', '0'); ini_set('memory_limit', '128M'); $time = microtime(); $time = explode(' ', $time); $time = $time[1] + $time[0]; $begintime = $time; /* oid test DB: CREATE TABLE images ( id serial NOT NULL, "name" text, image oid, CONSTRAINT images_pkey PRIMARY KEY (id) ) WITH (OIDS=FALSE); ALTER TABLE images OWNER TO "storeTest"; */ function saveOid($Conn, $Filename) { global $Dir; $FileContent = file_get_contents($Dir.'/'.$Filename); pg_query($Conn, "begin;"); $Oid = pg_lo_create($Conn); $Query = "INSERT INTO images (name, image) VALUES ('".$Filename."', '".$Oid."');"; pg_query($Conn, $Query); $OidHandle = pg_lo_open($Conn, $Oid, "w"); pg_lo_write($OidHandle, $FileContent); pg_lo_close($OidHandle); pg_query($Conn, "commit;"); unset($FileContent); } $strConn = "host=localhost dbname=oidTest user=storeTest password=test sslmode=require"; $Conn = pg_connect($strConn); global $Dir; $Dir = "/mnt/hdb/Pictures/BU billeder"; $Handle = opendir($Dir); while(false !== ($File = readdir($Handle))) { if($File !== "." && $File !== "..") { saveOid($Conn, $File); } } closedir($Handle); pg_close($Conn); $time = microtime(); $time = explode(" ", $time); $time = $time[1] + $time[0]; $endtime = $time; $totaltime = ($endtime - $begintime); $totaltime = sprintf("%.4f", $totaltime); $Memory = memory_get_peak_usage(); echo "Execution time: " .$totaltime. " seconds.\n"; echo "Peak memory usage: " . $Memory/1024/1024 . " mb.";
oidTest_load.php
ini_set('max_execution_time', '0'); ini_set('memory_limit', '128M'); $time = microtime(); $time = explode(' ', $time); $time = $time[1] + $time[0]; $begintime = $time; /* bytea test DB: CREATE TABLE images ( id serial NOT NULL, "name" text, image bytea, CONSTRAINT images_pkey PRIMARY KEY (id) ) WITH (OIDS=FALSE); ALTER TABLE images OWNER TO "storeTest"; */ function loadOid($Conn, $ID) { global $Dir; $Query = pg_query($Conn, "SELECT name, image FROM images WHERE id=".$ID.";"); $Row = pg_fetch_row($Query); pg_query($Conn, "begin;"); $Handle = pg_lo_open($Conn, $Row[1], "r"); $FileContent = pg_lo_read($Handle, 4194304); pg_query($Conn, "commit;"); file_put_contents($Dir.'/'.$Row[0], $FileContent); unset($FileContent); } $strConn = "host=localhost dbname=oidTest user=storeTest password=test sslmode=require"; $Conn = pg_connect($strConn); global $Dir; $Dir = "/mnt/hdb/oidVsbytea"; $Query = pg_query("SELECT id FROM images;") or die(pg_last_error($Conn)); while($Row = pg_fetch_assoc($Query)) { loadOid($Conn, $Row['id']); } pg_close($Conn); $time = microtime(); $time = explode(" ", $time); $time = $time[1] + $time[0]; $endtime = $time; $totaltime = ($endtime - $begintime); $totaltime = sprintf("%.4f", $totaltime); $Memory = memory_get_peak_usage(); echo "Execution time: " .$totaltime. " seconds.\n"; echo "Peak memory usage: " . $Memory/1024/1024 . " mb.";
oidTestClear.php
$strConn = "host=localhost dbname=oidTest user=storeTest password=test sslmode=require"; $Conn = pg_connect($strConn); $Query = pg_query($Conn, "SELECT image FROM images"); while($R = pg_fetch_assoc($Query)) { pg_lo_unlink($Conn, $R['image']); }
NB: These scripts might / or might not be optimized, if you got any changes that might give a better result, please leave a comment.
The test
Every script was run five times, everytime with a clean database and a database restart. The oidTest took abit longer, since I had to unlink every oid that was created and then clear the database (oidTestClear.php).
The scripts was using a dir I have with 1075 pictures, filesizes is from 6kb to 3724kb.
Server hardware:
AMD Athlon XP 2200+ (Clocked to 1.62Ghz)
256Mb Ram
The results
1 | 2 | 3 | 4 | 5 | Average | Memory | |
---|---|---|---|---|---|---|---|
bytea – save | 402,79s | 379,63s | 380,25s | 378,39s | 377,82s | 383,78s | 41,21MB |
bytea – load | 228,42s | 228,09s | 227,85s | 228,35s | 228,48s | 228,24s | 18,51MB |
oid – save | 114,17s | 133,75s | 121,21s | 138,75s | 127,59s | 127,09s | 3,7MB |
oid – load | 45,53s | 47,45s | 45,58s | 46,97s | 46,6s | 46,43s | 7,7MB |
Conclusion
From my tests, I most say I am suprised, I was expecting that bytea would be a little slower then oid, since it’s data had to be escaped before inserting it to the database, but I was not expecting it to be 3-4 times slower then oid.
Another thing I realized was that bytea creates a huge overhead while processing the data, 41,21MB vs 3,7MB.
From my point of view, oid is at all time the best choice if data storage in a postgresql database is a requirement, bytea is just too slow and memory hungry.
Notes
Bytea might have an advantage over oid, bytea is, from what I know, easier to maintaine and handle, since it’s data is stored inside of the table, instead of in separate place in the database. But then again, if you lose your table with oid’s, your data would still be present, this is not the case with bytea.
very helpful post, I’m facing the same problem , Thanks!
Does delete from a row from your table images generate a delete cascade to the pg_largeobject table associated to the deleted oid?
Thanks
This is most likely one #of the# best article that ever cross my reference. I do not see why anybody should disagree. It may be too easy #for them# to comprehend…anyway nice work i’m coming again right here for More Great Stuff!!
Interesting, really useful post for me