There are three factors to take into account when determining system recommendations:
The system requirements for HQ depend directly on usage and configuration. An HQ instance contains a local Indexing Agent that can, on its own, perform indexing. If only the local Agent is used, requirements are the same as those of an Agent. If an HQ instance is not being used for indexing, system requirements can be relaxed.
Same as an Agent, but a minimum of 2 CPUs when using only remote Agents.
Same as an Agent, but 2-4 GB of RAM when using only remote Agents.
Storage requirements for HQ are modest. Most of the data created by HQ is transient and will be deleted after indexing is complete. Systems that index larger numbers of files (resulting in large extraction queues) will have increased storage requirements.
In Vose, the Indexing Agents perform file indexing (although HQ can also index data separately from Agents). The indexing process is highly CPU-dependent, so on machines running Agents the emphasis is primarily on CPU.
The Indexing Agent is designed to use all of the CPU capacity on a machine to maximize indexing throughput, so in general, more CPUs are better.
On a machine running an Indexing Agent, indexing data such as Microsoft Office files requires the Agent to process large amounts of text and other data in memory.
Storage requirements for an Indexing Agent are modest. Most of the data created by the Agent is transient and will be deleted after indexing is complete. Systems that index larger numbers of files (resulting in large extraction queues) will have increased storage requirements.
For a Flex Index Node, the emphasis is on memory and storage. In a Flex deployment, the index is divided into subsets called Shards, which are distributed among multiple servers. While Flex Index storage requirements can be significant, they are not as extreme as for a Voyager Server running a local index.
CPU requirements are roughly the same as those of a Voyager Server running a local index.
Memory requirements are roughly the same as those of a Voyager Server running a local index.
Running out of disk space is a catastrophic event for a search index and can result in data loss. It is important to estimate disk requirements properly up front and to monitor disk usage routinely.
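Routine monitoring can be as simple as a scheduled script. The following is a minimal sketch using only the Python standard library; the function name `disk_usage_report` and the 80% warning threshold are assumptions made for illustration, not Vose settings or recommendations.

```python
import shutil


def disk_usage_report(path, warn_fraction=0.80):
    """Return (used_fraction, needs_attention) for the volume holding `path`.

    `warn_fraction` is an illustrative threshold, not a product default.
    """
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
    used_fraction = usage.used / usage.total
    return used_fraction, used_fraction >= warn_fraction


if __name__ == "__main__":
    # Replace "." with the directory that holds the index.
    fraction, needs_attention = disk_usage_report(".")
    print(f"{fraction:.1%} of the volume is used; warning={needs_attention}")
```

Pointing the script at the directory that holds the index and running it on a schedule gives early warning before the volume fills.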
One factor that drastically affects disk space requirements when the index contains a mix of data formats (Office, GIS, Imagery, etc.) is whether or not the text field is stored by default. Storing text can increase disk usage by orders of magnitude.
In a Vose installation, the meta folder (thumbnails, metadata) must also be considered when determining disk usage. It is recommended to estimate the size of the meta folder using the same methodology as for the search index, but as a general guide the meta folder takes approximately 16 MB per 1000 documents.
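As a quick worked example of the 16 MB per 1000 documents figure, the sketch below turns a document count into a rough meta folder size. The function and constant names are hypothetical, and the result is only a first approximation; measuring a representative sample remains the recommended approach.

```python
META_MB_PER_1000_DOCS = 16  # rule-of-thumb figure quoted in the text


def estimate_meta_folder_mb(document_count):
    """Rough meta folder size in MB for a given number of documents."""
    return document_count * META_MB_PER_1000_DOCS / 1000


if __name__ == "__main__":
    for count in (1_000, 250_000, 1_000_000):
        print(f"{count:>9} documents: about {estimate_meta_folder_mb(count):,.0f} MB")
```

By this estimate, one million documents need roughly 16,000 MB for the meta folder, in addition to the space taken by the search index itself.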
CPU requirements are similar to those of an Agent.
Solr indexing benefits from access to more memory, so RAM requirements can be high. A typical configuration is to have 16 GB of total RAM available on the system with 6 GB assigned to the Voyager Server JVM process. Voyager Server uses the remaining memory to store the index files.
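The split in the typical configuration can be checked with simple arithmetic. This is an illustrative sketch only: the helper name `memory_left_for_index_gb` is hypothetical, and the calculation ignores memory used by the operating system and other processes.

```python
def memory_left_for_index_gb(total_ram_gb, jvm_heap_gb):
    """Memory (GB) left over once the JVM process has its assigned share."""
    if jvm_heap_gb > total_ram_gb:
        raise ValueError("JVM allocation cannot exceed total system RAM")
    return total_ram_gb - jvm_heap_gb


if __name__ == "__main__":
    # Typical configuration from the text: 16 GB total, 6 GB for the JVM.
    print(memory_left_for_index_gb(16, 6), "GB remain for the index files")
```

With the typical figures, 10 GB remain for the index files.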
Disk requirements for a Voyager Server are the same as those of a Flex Index Node. See the previous section for details.
The approximate size of the meta folder (thumbnails, metadata) is the same for both settings: 16 MB per 1000 documents.
See Vose Software Requirements for a list of the software that Vose requires for some repository connectors, extractors, pipeline steps, and processing tasks.