In an earlier debate here, I had talked about the fact the although there is a very strong case in favor of externalizing the BLOBs out of SQL Server to inexpensive tiers of storage for most of the SharePoint installations, there are certain BLOBs and architectural situations still which would require you to act otherwise. “Which BLOBs to externalize?” generally gets a traditional response, that prevails among the practitioners. That is, the databases are better suited for small objects while filesystems are suitable for large objects. SQL Server is particularly optimized for 8KB or less structured data, but SharePoint usually carries documents larger in size. For small, structured data, database queries are faster than opening files and opening large files on filesystem is faster than accessing large BLOBs in database.
This approach may work for many typical SharePoint installations, but not many quantitative benchmarks are available to establish it across all types of SharePoint architecture, SharePoint storage and SharePoint content. Moreover, boundary between “small” or “large” is pretty fuzzy as well. Not all SharePoint content being a good candidate for externalization, you should consider a number of parameters, before deciding “which” of your BLOBs may be externalized. So, naturally a very important parameter is the BLOB size.
A research report, a little old but still relevant, from Microsoft Research available here summarizes the correlation between BLOB size and its appropriate storage location in the form of following benchmarks:
• BLOBs less than 256KB are more efficiently stored in the database
• Files larger than 1MB are more efficiently stored in the file system
• Content between 256KB and 1MB could be stored in either, but it is better to leave it on the filesystem.
So, generalizing, firstly if you have very few large (> 256KB) documents, you might want to retain them on SQL Server provided that the total size doesn’t exceed 200GB. However, in a typical SharePoint environment, this is hardly the case. Secondly, If more than half of your BLOBs are large (>256KB), then externalize all of them. The third situation is more likely to occur where you have a variety of sizes in SharePoint repositories. In an ideal case, you must have a way to filer content based on both individual file sizes and the need to keep the overall data file size below 200GB.
In a medium to large scale SharePoint environment, there can be millions of documents with varying sizes and accumulating in TBs. In such a case, you should consistently and reliably be able to separate the documents which you want to externalize from the ones which you want to retain inside the SQL Server in an configure-once, use-repeatedly setting. In fact, this is only possible if you use a solution tightly integrated with SharePoint and is more than a standalone EBS or RBS provider such as StorageEdge. StorageEdge allows you to filter content based on a number of criteria when you want to externalize it. It has size-based, filename-based and author-based filtering options to externalize the contents based on the benchmarks provided earlier in this discussion.
Figure 1: Filters in StorageEdge for Content Externalization
In addition to the above-mentioned size based benchmarks, which should work well for most of the SharePoint systems, you must also consider that within these ranges, how much of the BLOB payload is write-intensive, and the BLOB retention age.