Solr Collection vs Cores

Solr, Lucene

Solr Problem Overview


I struggle to understand the difference between collections and cores. If I understand it correctly, cores are separate indexes, and a collection consists of cores, so essentially they follow the same logic of separation, i.e. separate cores and separate collections have separate endpoints.

I have the following scenario: I am building a backend for a cloud service for several online shops. Each shop has a set of products, to which customers can add reviews. I want to index static data (product information) separately from dynamic data (reviews) so that I can improve performance.

How can I best separate them in Solr?

Solr Solutions


Solution 1 - Solr

From the SolrCloud Documentation

> Collection: A single search index.
>
> Shard: A logical section of a single collection (also called Slice). Sometimes people will talk about "Shard" in a physical sense (a manifestation of a logical shard)
>
> Replica: A physical manifestation of a logical Shard, implemented as a single Lucene index on a SolrCore
>
> Leader: One Replica of every Shard will be designated as a Leader to coordinate indexing for that Shard
>
> SolrCore: Encapsulates a single physical index. One or more make up logical shards (or slices) which make up a collection.
>
> Node: A single instance of Solr. A single Solr instance can have multiple SolrCores that can be part of any number of collections.
>
> Cluster: All of the nodes you are using to host SolrCores.

So basically a Collection (Logical group) has multiple cores (physical indexes).
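
The relationships in that glossary can be modeled as a small data structure. This is only an illustrative sketch: the class names, the `products` collection, the node addresses and the core names are made up for the example and are not part of any Solr API.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Replica:
    core_name: str           # each replica is one physical SolrCore (one Lucene index)
    node: str                # the Solr instance (node) hosting this core
    is_leader: bool = False  # one replica per shard coordinates indexing


@dataclass
class Shard:
    name: str
    replicas: list = field(default_factory=list)


@dataclass
class Collection:
    name: str
    shards: list = field(default_factory=list)

    def cores(self):
        """All physical cores that together make up this logical index."""
        return [r for s in self.shards for r in s.replicas]


def cores_by_node(collections):
    """Group core names by the node that hosts them."""
    layout = defaultdict(list)
    for collection in collections:
        for replica in collection.cores():
            layout[replica.node].append(replica.core_name)
    return dict(layout)


# Hypothetical layout: 2 shards x 2 replicas spread over two nodes.
products = Collection("products", [
    Shard("shard1", [Replica("products_shard1_replica1", "node-a:8983", True),
                     Replica("products_shard1_replica2", "node-b:8983")]),
    Shard("shard2", [Replica("products_shard2_replica1", "node-b:8983", True),
                     Replica("products_shard2_replica2", "node-a:8983")]),
])

print(len(products.cores()))      # one logical index, several physical cores
print(cores_by_node([products]))  # each node hosts cores from both shards
```

In this layout the single collection is backed by four cores, and each node holds one core from every shard.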

Also, check the discussion

Solution 2 - Solr

Core

In Solr, a core is composed of a set of configuration files, Lucene index files, and Solr’s transaction log.

A Solr core is a uniquely named, managed, and configured index running in a Solr server; a Solr server can host one or more cores. A core is typically used to separate documents that have different schemas.

Collection

Solr also uses the term collection, which only has meaning in the context of a Solr cluster in which a single index is distributed across multiple servers.

SolrCloud introduces the concept of a collection, which extends the concept of a uniquely named, managed, and configured index to one that is split into shards and distributed across multiple servers.

Solution 3 - Solr

As per my understanding:

In distributed search,

A collection is a logical index spread across multiple servers. A core is the part of a server that runs one collection.

In non-distributed search,

A single server running Solr can have multiple collections, and each of those collections is also a core. So a collection and a core are the same thing if search is not distributed.

Summary

  1. The part of a collection that lives on one server is called a core.
  2. A collection is the same as an index.
  3. One Solr server can have many cores.
  4. A collection is a logical index. (Example usage for multiple collections: say two teams in the same group are not big enough to justify a full Solr server of their own, but they also do not want to mix their data in a single index. They can then create separate collections/indexes, which will keep their data separate.)
  5. It's better to use a separate SolrCloud cluster rather than create more collections if the data for a collection is big enough (not sure, comments please?)

Solution 4 - Solr

Single instance

On a single instance, Solr has something called a SolrCore that is essentially a single index. If you want multiple indexes, you create multiple SolrCores.

Solr Cloud

With SolrCloud, a single index can span multiple Solr instances. This means that a single index can be made up of multiple SolrCores on different machines. We call all of these SolrCores that make up one logical index a collection.

A collection is essentially a single index that spans many SolrCores, both for index scaling and for redundancy. If you wanted to move your 2-SolrCore Solr setup to SolrCloud, you would have 2 collections, each made up of multiple individual SolrCores.
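
As a rough sketch of that move, the snippet below only builds the request URLs for Solr's CoreAdmin API (standalone) and Collections API (SolrCloud); it does not send anything. The host and port, the `products` and `reviews` names (borrowed from the question's scenario), and the shard and replica counts are assumptions for illustration, and real requests usually need further parameters, such as a config set, that are omitted here.

```python
from urllib.parse import urlencode

SOLR = "http://localhost:8983/solr"  # assumed host and port


def create_core_url(name):
    # Standalone mode: one SolrCore is one index (CoreAdmin API).
    return f"{SOLR}/admin/cores?" + urlencode({"action": "CREATE", "name": name})


def create_collection_url(name, num_shards, replication_factor):
    # SolrCloud mode: one collection is one logical index that Solr
    # splits into shards and replicates across nodes (Collections API).
    params = {
        "action": "CREATE",
        "name": name,
        "numShards": num_shards,
        "replicationFactor": replication_factor,
    }
    return f"{SOLR}/admin/collections?" + urlencode(params)


# A 2-core standalone setup...
standalone = [create_core_url(n) for n in ("products", "reviews")]

# ...becomes 2 collections in SolrCloud, each backed by several cores.
cloud = [create_collection_url(n, 2, 2) for n in ("products", "reviews")]

for url in standalone + cloud:
    print(url)
```

With 2 shards and a replication factor of 2, each of the two collections would be backed by four SolrCores.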

Solution 5 - Solr

From Solr Wiki:

> Collections are made up of one or more shards. Shards have one or more replicas. Each replica is a core. A single collection represents a single logical index.

Solution 6 - Solr

This explains the use of cores and collections.

Single instance

When dealing with a single Solr instance, you query cores.

The admin UI of a single Solr instance has no collection selector:

[Screenshot: Single Solr Instance]

Solr Cloud

When dealing with SolrCloud, you query collections. The collections are organized into different cores (replicas, shards) on different Solr instances.

The admin UI of a SolrCloud instance has both a collection selector and a core selector, although the cores listed there are, technically, instances:

[Screenshot: Solr Cloud instance]
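
From the client's side the two cases look almost identical, because the name in the request path is a core in one case and a collection in the other. The sketch below builds a query URL for Solr's `/select` handler; the host, port, `products` name and query string are assumed values for illustration, and nothing is sent over the network.

```python
from urllib.parse import urlencode


def select_url(base, name, query):
    # `name` is a core name on a standalone instance
    # and a collection name in SolrCloud.
    return f"{base}/{name}/select?" + urlencode({"q": query})


base = "http://localhost:8983/solr"  # assumed host and port

# Same URL shape in both modes; only the meaning of "products" differs.
url = select_url(base, "products", "id:1")
print(url)
```

In SolrCloud, a request addressed to a collection is distributed by Solr across the cores that make up that collection.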

Solution 7 - Solr

From the Solr docs:

> Usage: solr create [-c name] [-d confdir] [-n configName] [-shards #] [-replicationFactor #] [-p port] [-V]
>
> Create a core or collection depending on whether Solr is running in standalone (core) or SolrCloud mode (collection). In other words, this action detects which mode Solr is running in, and then takes the appropriate action (either create_core or create_collection).
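
To make the quoted usage text concrete, here is a small sketch that assembles such a command line and mirrors the mode-dependent choice described above. The flags are taken from the usage text as quoted and may differ in other Solr versions; the `bin/solr` path, the helper names and the `products` name are assumptions for illustration. The command is only built, not executed.

```python
def solr_create_args(name, shards=None, replication_factor=None, port=None):
    """Build an argument list for `solr create` using the flags quoted above."""
    args = ["bin/solr", "create", "-c", name]  # script path is an assumption
    if shards is not None:
        args += ["-shards", str(shards)]
    if replication_factor is not None:
        args += ["-replicationFactor", str(replication_factor)]
    if port is not None:
        args += ["-p", str(port)]
    return args


def action_for_mode(cloud_mode):
    # Mirrors the quoted behavior: standalone creates a core,
    # SolrCloud creates a collection.
    return "create_collection" if cloud_mode else "create_core"


args = solr_create_args("products", shards=2, replication_factor=2)
print(" ".join(args))
print(action_for_mode(cloud_mode=True))
print(action_for_mode(cloud_mode=False))
```

The same `create` invocation therefore yields a core on a standalone instance and a collection on a SolrCloud cluster.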

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

| Content Type | Original Author | Original Content on Stackoverflow |
| --- | --- | --- |
| Question | NeatNerd | View Question on Stackoverflow |
| Solution 1 - Solr | Jayendra | View Answer on Stackoverflow |
| Solution 2 - Solr | Nanhe Kumar | View Answer on Stackoverflow |
| Solution 3 - Solr | user2250246 | View Answer on Stackoverflow |
| Solution 4 - Solr | Kaidul | View Answer on Stackoverflow |
| Solution 5 - Solr | happs | View Answer on Stackoverflow |
| Solution 6 - Solr | Matthias M | View Answer on Stackoverflow |
| Solution 7 - Solr | Sambit Tripathy | View Answer on Stackoverflow |