Using randomly generated indexes and simple queries, you can randomly select
documents from a collection or collection group in Cloud Firestore.
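This answer is broken into 4 sections, with different options in each: generating
the random indexes, querying them, selecting multiple random documents, and
reseeding for ongoing randomness.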
Auto-Id version
If you are using the randomly generated automatic ids provided in our client
libraries, you can use this same system to randomly select a document. In this
case, the randomly ordered index is the document id.
Later in our query section, the random value you generate is a new auto-id (iOS,
Android, Web), the field you query is the __name__ field, and the 'low value'
mentioned later is an empty string. This is by far the easiest way to generate
the random index, and it works regardless of language and platform.
By default, the document name (__name__) is only indexed ascending, and you also
cannot rename an existing document short of deleting and recreating. If you need
either of these, you can still use this method and just store an auto-id as an
actual field called random rather than overloading the document name for this
purpose.
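As a rough illustration of storing an auto-id in a field called random, here is a
minimal Python sketch. It assumes the auto-id format is 20 characters drawn from
A-Z, a-z and 0-9; the names new_auto_id, AUTO_ID_ALPHABET and AUTO_ID_LENGTH are
made up for this example and are not part of any client library.

```python
import secrets
import string

# Assumption: automatic ids are 20 characters drawn from A-Z, a-z, 0-9.
AUTO_ID_ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits
AUTO_ID_LENGTH = 20


def new_auto_id() -> str:
    """Return a random id shaped like an automatic document id."""
    return "".join(secrets.choice(AUTO_ID_ALPHABET) for _ in range(AUTO_ID_LENGTH))


# A document that keeps its random index in a regular field instead of
# overloading the document name. The "title" field is only illustrative.
document = {"title": "example", "random": new_auto_id()}
```

Because every generated id is a non-empty string, the empty string sorts before all
of them, which is why it can serve as the 'low value'.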
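Random Integer version
When you write a document, generate a random integer in a bounded range and store
it in a field called random. You should consider which languages you need, as there
will be different considerations. While Swift is easy, JavaScript notably can have
a gotcha:
32-bit integer: Great for small datasets (at ~10K documents a collision is unlikely)
64-bit integer: Large datasets (note: JavaScript's Number type cannot represent
every 64-bit integer exactly)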
This will create an index with your documents randomly sorted. Later in our query
section, the random value you generate will be another one of these values, and the
'low value' mentioned later will be -1.
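A minimal sketch of generating the integer index in Python follows. The specific
bounds are assumptions chosen for illustration: non-negative values below 2**32 for
the small option, below 2**63 for the large one, and 2**53 as a cap that stays
within what a JavaScript Number can hold exactly. The function name random_index
is hypothetical.

```python
import secrets

# Assumed ranges for this sketch; all values are non-negative so that -1 can
# act as the 'low value' that sorts before every stored index.
SMALL_LIMIT = 2**32    # 32-bit sized range
LARGE_LIMIT = 2**63    # fits a signed 64-bit integer
JS_SAFE_LIMIT = 2**53  # largest range a JavaScript Number holds exactly
LOW_VALUE = -1


def random_index(limit: int = SMALL_LIMIT) -> int:
    """Return a uniformly random integer in the range 0 .. limit - 1."""
    return secrets.randbelow(limit)


# Stored when the document is written; "title" is only illustrative.
document = {"title": "example", "random": random_index()}
```

If any client reading or writing the field is JavaScript, staying under
JS_SAFE_LIMIT is one way to sidestep the 64-bit gotcha.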
For all these options, you'll want to generate a new random value in the same form
as the indexed values you created when writing the document, denoted by the
variable random below. We'll use this value to find a random spot on the index.
Wrap-around
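Now that you have a random value, you can query for a single document: ask for the
first document whose index is >= random. If that returns nothing, query again
starting from the 'low value' for your index.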
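To make the wrap-around logic concrete, here is a sketch that runs against an
in-memory stand-in for the index: a sorted list of (random, document id) pairs. It
is not Firestore client code. first_at_or_after plays the role of a query that
filters on random >= value, orders by random, and limits to 1; both function names
are invented for this example.

```python
import bisect

LOW_VALUE = -1  # sorts before every stored integer index


def first_at_or_after(index, value):
    """Stand-in for: where random >= value, order by random, limit 1."""
    # (value,) compares lower than any (value, doc_id) pair, so this finds
    # the first entry whose random field is >= value.
    pos = bisect.bisect_left(index, (value,))
    return index[pos] if pos < len(index) else None


def pick_wrap_around(index, random_value):
    """Pick one entry, wrapping to the start of the index if needed."""
    hit = first_at_or_after(index, random_value)
    if hit is None:
        # Nothing at or above the random spot: retry from the low value.
        hit = first_at_or_after(index, LOW_VALUE)
    return hit


index = sorted([(409496, "A"), (436496, "B"), (818992, "C")])
```

With this index, a random value of 420000 lands on B, while 900000 finds nothing
above it and wraps around to A.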
Bi-directional
The wrap-around method is simple to implement and allows you to optimize storage
with only an ascending index enabled. One downside is the possibility of values
being unfairly shielded. For example, if the first 3 documents (A, B, C) out of 10K
have random index values of A:409496, B:436496, C:818992, then A and C each have
just less than a 1/10K chance of being selected, whereas B is effectively shielded
by the proximity of A and has only roughly a 1/160K chance.
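The numbers in that example can be checked with a little arithmetic. The sketch
below assumes the index values are spread over a 32-bit range (2**32 possible
values), which is not stated above but is consistent with the quoted odds. With a
>= query, a document is picked when the random value falls in the gap between the
previous document's value and its own, so its chance is roughly gap / range.

```python
INDEX_RANGE = 2**32  # assumption: values are spread over a 32-bit range
LOW_VALUE = -1

values = {"A": 409496, "B": 436496, "C": 818992}


def selection_odds(values, index_range):
    """Map each document to N, where its chance of selection is about 1/N."""
    odds = {}
    previous = LOW_VALUE
    for name, value in sorted(values.items(), key=lambda item: item[1]):
        gap = value - previous  # random values in this gap select the document
        odds[name] = round(index_range / gap)
        previous = value
    return odds


odds = selection_odds(values, INDEX_RANGE)
```

B's gap is only 27000 wide, which works out to about 1 in 159K, while A and C sit
between 1 in 10K and 1 in 12K. The sketch ignores the small extra share A would
gain from wrap-around.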
Rather than querying in a single direction and wrapping around if a value is not
found, you can instead randomly select between >= and <=, which reduces the
probability of unfairly shielded values by half, at the cost of double the index
storage.
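A sketch of the bi-directional variant, again against an in-memory sorted list of
(random, document id) pairs rather than real Firestore calls. One assumption here:
when the chosen direction finds nothing, the sketch wraps to the opposite end of
the index. Falling back to a query in the other direction would be another
reasonable choice. The name pick_bidirectional is hypothetical.

```python
import bisect
import random


def pick_bidirectional(index, random_value, ascending):
    """Pick one entry using >= (ascending) or <= (descending)."""
    if not index:
        return None
    keys = [key for key, _ in index]
    if ascending:
        # Stand-in for: where random >= value, order by random, limit 1.
        pos = bisect.bisect_left(keys, random_value)
        if pos == len(keys):
            pos = 0  # nothing above: wrap to the lowest entry
    else:
        # Stand-in for: where random <= value, order by random desc, limit 1.
        pos = bisect.bisect_right(keys, random_value) - 1
        if pos < 0:
            pos = len(keys) - 1  # nothing below: wrap to the highest entry
    return index[pos]


index = sorted([(409496, "A"), (436496, "B"), (818992, "C")])
# Flip a fair coin for the direction on every selection.
choice = pick_bidirectional(index, 420000, ascending=random.random() < 0.5)
```

For a random value of 420000, the ascending direction lands on B and the
descending direction lands on A, so B is no longer shielded half of the time.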
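Rinse & Repeat
Simply repeat the single-document selection, generating a new random value each
time. This method will give you random sequences of documents without worrying
about seeing the same patterns repeatedly.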
The trade-off is it will be slower than the next method since it requires a
separate round trip to the service for each document.
Keep it coming
In this approach, simply increase the limit to the number of documents you want.
It's a little more complex, as the call might return anywhere from 0 to limit
documents.
You'll then need to get the missing documents in the same manner, but with the
limit reduced to only the difference. If you know there are more documents in total
than the number you are asking for, you can optimize by ignoring the edge case of
never getting back enough documents on the second call (but not the first).
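Here is a sketch of the two-call pattern against an in-memory sorted list of
(random, document id) pairs. query_from stands in for a query that filters on
random >= start, orders by random, and applies a limit; it is not a Firestore API.
As an extra safeguard that is not described above, the second call only keeps
entries below the original random value, so a collection smaller than the requested
count cannot return the same document twice.

```python
import bisect

LOW_VALUE = -1


def query_from(index, start, limit):
    """Stand-in for: where random >= start, order by random, limit."""
    keys = [key for key, _ in index]
    pos = bisect.bisect_left(keys, start)
    return index[pos:pos + limit]


def pick_many(index, random_value, count):
    """Return up to count entries, wrapping around at most once."""
    first = query_from(index, random_value, count)
    missing = count - len(first)
    if missing == 0:
        return first  # best case: a single call was enough
    # Second call: start from the low value with the limit reduced to the
    # difference, skipping anything the first call already covered.
    second = [entry for entry in query_from(index, LOW_VALUE, missing)
              if entry[0] < random_value]
    return first + second


index = [(10, "A"), (20, "B"), (30, "C"), (40, "D"), (50, "E")]
```

Asking for 3 documents at random value 35 returns D and E from the first call and
A from the second.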
The trade-off with this solution is in repeated sequences. While the documents are
randomly ordered, if you ever end up overlapping ranges you'll see the same pattern
you saw before. There are ways to mitigate this concern discussed in the next
section on reseeding.
This approach is faster than 'Rinse & Repeat' as you request all the documents in
a single call in the best case, or 2 calls in the worst case.
Reseeding
Note that inserted documents will end up woven in between existing ones, gradually
changing the probabilities, as will deleting documents. If the insert/delete rate is
too small given the number of documents, there are a few strategies for addressing
this.
Multi-Random
Rather than worrying about reseeding, you can always create multiple random indexes
per document, then randomly select one of those indexes each time. For example,
have the field random be a map with subfields 1 to 3.
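A sketch of what such a document and the per-query choice might look like. The
32-bit sized range and the helper names are assumptions for illustration. The
chosen index is returned as path segments rather than a dotted string, since how a
numeric subfield name must be written in a field path can differ between client
libraries.

```python
import random
import secrets

INDEX_LIMIT = 2**32          # assumption: 32-bit sized range
SUBFIELDS = ("1", "2", "3")  # subfields 1 to 3 of the random map


def new_random_map():
    """Build the random field: one independent random value per subfield."""
    return {name: secrets.randbelow(INDEX_LIMIT) for name in SUBFIELDS}


def choose_index_field(rng=random):
    """Pick which subfield to filter and order by for this query."""
    return ("random", rng.choice(SUBFIELDS))


# "title" is only illustrative; random holds three independent indexes.
document = {"title": "example", "random": new_random_map()}
field_path = choose_index_field()
```

Each subfield sorts the collection in a different random order, so two queries
with nearby random values are less likely to show the same neighbors.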
Reseed on writes
Any time you update a document, re-generate the random value(s) of the random
field. This will move the document around in the random index.
Reseed on reads
If the random values generated are not uniformly distributed (they're random, so
this is expected), then the same document might be picked a disproportionate amount
of the time. This is easily counteracted by updating the randomly selected document
with new random values after it is read.
Since writes are more expensive and can hotspot, you can elect to update on read
only a subset of the time (e.g., if random(0,100) === 0, update).
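A minimal sketch of sampled reseeding on read. The 1-in-100 rate and the 32-bit
sized range are illustrative assumptions, and after_read returns the updated
document instead of writing it anywhere, so the actual write to the database is
left out.

```python
import random
import secrets

INDEX_LIMIT = 2**32   # assumption: 32-bit sized range
RESEED_ONE_IN = 100   # reseed on roughly 1% of reads


def should_reseed(rng=random) -> bool:
    """Return True on about one read in RESEED_ONE_IN."""
    return rng.randrange(RESEED_ONE_IN) == 0


def after_read(document, rng=random):
    """Return the document, with a fresh random value on sampled reads."""
    if should_reseed(rng):
        # Copy with a new index value; the caller would write this back.
        return {**document, "random": secrets.randbelow(INDEX_LIMIT)}
    return document
```

Lowering RESEED_ONE_IN moves documents around the index more often at the cost of
more writes.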