The solution works perfectly in most cases, but recently I've found it too slow. A directory can be stored in the file system or in memory. While search indexes are cached in memory, document contents are loaded from the directory each time. I haven't checked the Java implementation, but the .NET one seems to parse the file very slowly. The file system version additionally requires a lot of disk access, which is probably cached well by the operating system but is not efficient for large indexes (in my case the file size is over 500 MB). The in-memory directory implementation tends to leak, which is unacceptable.
The solution below demonstrates how to enable document caching in Lucene.
The cache needs to be added to the FieldsReader class. The code below shows only the changes that need to be made:
- A cache dictionary needs to be added (here a SortedList keyed by document number).
- The cache needs to be cleared at the end of the Close() method.
- At the beginning of the Doc() method, the document is returned from the cache if it has already been loaded.
- After it has been read in the Doc() method, the document is added to the cache.
Further optimization can be found here.
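public sealed class FieldsReader
{
    // Cache of already loaded documents, keyed by document number.
    private SortedList<int, Document> cache = new SortedList<int, Document>();

    //---------

    public void Close()
    {
        //---------
        cache.Clear();
    }

    //---------

    public Document Doc(int n)
    {
        // Return the cached document if it has already been loaded.
        if (cache.ContainsKey(n))
        {
            return cache[n];
        }

        //---------

        // Remember the freshly read document before returning it.
        if (!cache.ContainsKey(n))
        {
            cache.Add(n, doc);
        }
        return doc;
    }

    //---------
}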
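Since the snippet above shows only the changed lines, here is a minimal, self-contained sketch of the same pattern that can be run outside Lucene. The CachingReader class and its load delegate are hypothetical stand-ins for FieldsReader and its read-and-parse step; they are not part of Lucene.NET. A Dictionary is used instead of SortedList only to keep the sketch short.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for FieldsReader: the "load" delegate plays the role
// of the expensive read-and-parse step that the real Doc() method performs.
public sealed class CachingReader<TDoc>
{
    private readonly Dictionary<int, TDoc> cache = new Dictionary<int, TDoc>();
    private readonly Func<int, TDoc> load;

    public CachingReader(Func<int, TDoc> load)
    {
        this.load = load;
    }

    public TDoc Doc(int n)
    {
        TDoc doc;
        // Serve the document from the cache if it was loaded before.
        if (cache.TryGetValue(n, out doc))
        {
            return doc;
        }

        // Otherwise load it once and remember it for later calls.
        doc = load(n);
        cache[n] = doc;
        return doc;
    }

    public void Close()
    {
        // Drop all cached documents when the reader is closed.
        cache.Clear();
    }
}

public static class Demo
{
    public static void Main()
    {
        int loads = 0;
        var reader = new CachingReader<string>(n => { loads++; return "doc " + n; });

        reader.Doc(5);
        reader.Doc(5);              // second call is answered from the cache
        Console.WriteLine(loads);   // prints 1

        reader.Close();
        reader.Doc(5);              // cache was cleared, so the document is loaded again
        Console.WriteLine(loads);   // prints 2
    }
}
```

Like the SortedList version, this sketch never evicts entries and is not thread-safe, so memory use grows with the number of distinct documents read until Close() is called.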