Building a Scalable Follower Feed with Firestore



I have written several articles on this subject over the years. Along the way I changed my thinking, saw other people's ideas, and changed my mind again. Here I will cover everything you need to know in one article.

What is a Follower Feed?

A follower feed is something that sounds trivial but is hard to do in noSQL: show a list of the latest posts by users you follow. While a regular feed can show the latest posts by all users, a follower feed shows only the posts by the users you follow.

Schema / Data Model

Whether you're using a GraphDB, noSQL, or SQL, the data model is still generally the same. You're going to have posts, users, and follows.

Posts
  id
  title
  authorId
  ...
Users
  id
  username
  ...
Followers
  follower_id
  following_id

This translates to noSQL like so:

posts/{postId}
  id
  title
  authorId
  createdAt
  ...
users/{userId}
  username
  ...
followers/{followerId}
  following_id
  ...

Although, as you will see, this will not work from a querying perspective. You could also easily use subcollections instead of root collections for any of the collections, but the result is the same.

Queries

Here are the queries that show what we're trying to achieve:

SQL 1

SELECT *
FROM Posts p
WHERE p.authorId IN (
  SELECT following_id FROM Followers WHERE follower_id = $UID
)
ORDER BY p.createdAt DESC

SQL 2

SELECT p.*
FROM Posts p
JOIN Followers f ON f.following_id = p.authorId
WHERE f.follower_id = $UID
ORDER BY p.createdAt DESC

GraphQL

query {
  queryPost(
    where: { authorId: { follower: { id: $UID } } },
    order: { desc: createdAt }
  ) {
    id
    title
    createdAt
    ...
  }
}

So we really just end up with a many-to-many relationship, like so:

Posts <- Followers -> Users

If we want to translate that to Firestore noSQL, we get something like this:

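// Query 1: the ids of everyone the current user follows
const followersSnap = await db.collection('followers')
  .where('follower_id', '==', uid)
  .get();
const following = followersSnap.docs.map((doc) => doc.data().following_id);

// Query 2: posts written by any of those users
const postsSnap = await db.collection('posts')
  .where('authorId', 'in', following)
  .get();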

But the 'in' operator only accepts a small number of values (10 at the time of writing), so we are limited to following that many people. We are also really doing two queries on the frontend, instead of a backend join.
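One common workaround for the value limit is to split the list of followed ids into batches and run one query per batch. The sketch below is only an illustration: `fetchPostsByAuthors` is a hypothetical callback (not part of the code above) that would wrap the Firestore 'in' query, and it is replaced here by an in-memory stand-in so the example runs on its own.

```javascript
// Split a list into batches no larger than `size`.
function chunk(items, size) {
  const batches = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}

// Hypothetical helper: one query per batch, merged and sorted on the client.
// With Firestore, fetchPostsByAuthors(batch) would wrap something like
// db.collection('posts').where('authorId', 'in', batch).get().
async function fetchFeed(following, fetchPostsByAuthors, inLimit = 10) {
  const pages = await Promise.all(
    chunk(following, inLimit).map((batch) => fetchPostsByAuthors(batch))
  );
  // Newest first; createdAt is assumed to be a numeric timestamp here.
  return pages.flat().sort((a, b) => b.createdAt - a.createdAt);
}

// In-memory stand-in for the posts collection (illustration only).
const allPosts = [
  { id: 'p1', authorId: 'u1', createdAt: 100 },
  { id: 'p2', authorId: 'u12', createdAt: 300 },
  { id: 'p3', authorId: 'u99', createdAt: 200 },
];
const fakeFetch = async (batch) =>
  allPosts.filter((post) => batch.includes(post.authorId));

// The user follows u1 through u25, which needs three batches of at most 10.
const following = Array.from({ length: 25 }, (_, i) => `u${i + 1}`);
fetchFeed(following, fakeFetch).then((feed) => {
  console.log(feed.map((post) => post.id)); // [ 'p2', 'p1' ]
});
```

This removes the hard cap on how many users you can follow, but it still costs one query per batch and leaves all merging and sorting to the client.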

Tag Feed

We also run into the problem of other desired feeds, like following tags. We may want one feed with the latest posts for tags we follow, one with the latest posts from users we follow, or both. Then you get into weighted queries: if a post matches both a tag and a user we follow, should it rank higher? As we have seen with other social media, we may want to artificially promote or demote certain types of posts, or use AI to create more addictive feeds. These advanced feed types are beyond the scope of this post, and are not a good fit for simple aggregations.

Imperfect Versions

So, let's see what we can do if we are not worried about scaling to millions of users.

Version 1 - Frontend Nightmare

Do all query combining, indexing, etc on the frontend. This will cost you a lot and be slow. No thanks.

A sister version is to keep a following array in users/{userID}. You can then grab just one document with all the users you follow, then grab their posts on the frontend. Better, but still over-reading.

Version 2 - Build Your Own Feed

This is one of my ideas. Basically, when each user logs in, they update their feed on the spot. They will save the last updated date, and populate their feed in the background. This makes sense to me in certain circumstances, but is still not the best.
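As a rough sketch of that login-time refresh, the step can be modeled as a pure function over plain objects. The names `refreshFeed`, `state`, and `candidatePosts` are hypothetical, and timestamps are assumed to be plain numbers; in a real app the candidate posts would come from a Firestore query filtered by the saved date.

```javascript
// Hypothetical login-time refresh. `state` is the user's stored feed,
// `candidatePosts` stands in for posts fetched from followed users.
function refreshFeed(state, candidatePosts, now) {
  // Only posts created since the last refresh are new to this feed.
  const fresh = candidatePosts.filter(
    (post) => post.createdAt > state.lastUpdatedAt
  );
  const posts = [...state.posts, ...fresh].sort(
    (a, b) => b.createdAt - a.createdAt
  );
  // Save the new "last updated" date alongside the feed.
  return { lastUpdatedAt: now, posts };
}

const stored = { lastUpdatedAt: 150, posts: [{ id: 'p1', createdAt: 100 }] };
const fetched = [
  { id: 'p1', createdAt: 100 }, // older than the last refresh, skipped
  { id: 'p2', createdAt: 200 },
  { id: 'p3', createdAt: 300 },
];
const updated = refreshFeed(stored, fetched, 400);
console.log(updated.posts.map((post) => post.id)); // [ 'p3', 'p2', 'p1' ]
```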

Version 3 - Fireship's Method

This method was quite complex, but is definitely worth understanding.

Basically he has this data model:

followers/{followerID}
  recentPosts: [ ...5 recent posts here ],
  lastPost: (timestamp of the most recent post, used for ordering),
  users: [ user_following_ids ]
posts/{postId}
  ...post content here
users/{userId}
  ...user content here

With this query:

const followedUsers = await db.collection('followers')
  .where('users', 'array-contains', followerID)
  .orderBy('lastPost', 'desc')
  .limit(10)
  .get();

You create a Firestore trigger function on posts that aggregates the latest 5 posts into recentPosts.

This works great, but because it uses an array, there is a limit on how many followers you can have, and a limit on the number of posts you can show on the frontend. You still need to sort all the latest posts on the client. This is a great idea, but a hack nonetheless.
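The two moving parts described here can be sketched as plain functions. This is not Fireship's actual code: `pushRecentPost` and `mergeRecentPosts` are hypothetical names, and the documents are plain objects with numeric timestamps so the example runs without Firestore.

```javascript
// Trigger-side step (sketch): keep only the newest `max` posts in recentPosts.
function pushRecentPost(recentPosts, post, max = 5) {
  return [post, ...recentPosts]
    .sort((a, b) => b.createdAt - a.createdAt)
    .slice(0, max);
}

// Frontend step (sketch): flatten recentPosts from every followed user's
// document and sort the combined list, newest first.
function mergeRecentPosts(followerDocs, limit) {
  return followerDocs
    .flatMap((doc) => doc.recentPosts)
    .sort((a, b) => b.createdAt - a.createdAt)
    .slice(0, limit);
}

const docs = [
  { recentPosts: [{ id: 'a2', createdAt: 40 }, { id: 'a1', createdAt: 10 }] },
  { recentPosts: [{ id: 'b1', createdAt: 30 }] },
];
console.log(mergeRecentPosts(docs, 2).map((post) => post.id)); // [ 'a2', 'b1' ]
```

The client-side sort in `mergeRecentPosts` is the part the text calls out: the query returns documents ordered by lastPost, not a single ordered list of posts.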

It is interesting to note here that he believes mass-duplication is unscalable if you do the math, due to the cost of mass duplication for someone with millions of users. He is right and wrong here.

Version 4 - Albert's Version

This is the best hacked version I found on Stack Overflow. It basically says to store the posts like so:

users/{userId}
  recentPosts: [ ...1000 recent posts ],
  recentPostsLastUpdatedAt: Date
posts/{postId}
  ...post content here
following/{followerId}
  following: [ ...users following ]

You aggregate up to 1000 posts on the user document in this version. Then it tells you to fetch the users you follow in batches of 10:

query(
  usersRef,
  where('userID', 'in', [FOLLOWEE_ID_1, FOLLOWEE_ID_2, …]),
  where('recentPostsLastUpdatedAt', '>', LAST_QUERIED_AT)
)

Once you get all users, you have all user posts, which is potentially 1 million posts for the price of 1000 reads. I like the thinking here, but still not for me. Again, too much frontend sorting, and over-complicated when you're just starting out.
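The arithmetic behind that claim can be written out as a small sketch. `estimateReads` is a hypothetical helper, and it assumes the worst case: every followed user's document is read once and each one holds the full 1000 posts.

```javascript
// Rough read estimate for Version 4 (worst case, illustration only).
function estimateReads(followedUsers, postsPerUser, batchSize) {
  return {
    // One 'in' query per batch of followed user ids.
    queries: Math.ceil(followedUsers / batchSize),
    // Each matching user document counts as one read.
    documentReads: followedUsers,
    // Every user document carries its own recentPosts array.
    postsAvailable: followedUsers * postsPerUser,
  };
}

console.log(estimateReads(1000, 1000, 10));
// { queries: 100, documentReads: 1000, postsAvailable: 1000000 }
```

The recentPostsLastUpdatedAt filter should lower the read count further on later visits, since only users who posted after the last query are returned.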

Version 5 - My Crazy Aggregation Version

So, I came up with a theoretical idea for a scalable version. It uses arrays to save money, but ultimately made no sense. Imagine using a 3 step aggregation to ultimately get a feed collection like this:

feed/{postID}
  createdAt
  followers: [ ...first 1000 followers ]

This gave you a neat query like so:

db.collection('feed')
  .where('followers', 'array-contains', userId)
  .orderBy('createdAt', 'desc');

But ultimately, it was too unrealistic and unreliable. While arrays save money, they are limited and require more splitting.

Version 6 - Mass Aggregation

The biggest problem with mass aggregation is the limits of Firebase Functions; it could time out. However, it can be solved.

Imagine creating an onWrite function for the posts collection. This could trigger a callable function, say populateFollowerFeed(). This function could look like this:

populateFollowerFeed({
  data: change.after.data(),
  startId: '0x12slsl2sls',
  num: 20
});

Yes, you can call a function inside a function. This function would go through a follower collection (either a subcollection, or a query within a root collection) to get the ids of the users who follow the post's author. It could then add the created or updated post to each of those users' feed collections.

This function would call itself again with the next startId, until there are no more follower ids. This prevents function timeouts. You should probably have it delete aggregated posts as well.
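The paging loop can be sketched without Firestore by using in-memory stand-ins. Everything below is an assumption made for illustration: `followersOf`, `feeds`, and `getFollowerPage` are hypothetical, and the recursive call is a local call where a real deployment would start a new function invocation.

```javascript
// In-memory stand-ins (illustration only, not Firestore APIs).
const followersOf = { alice: ['u01', 'u02', 'u03', 'u04', 'u05'] };
const feeds = new Map(); // userId -> Map(postId -> post)

// Returns up to `num` follower ids that sort after `startId`.
function getFollowerPage(authorId, startId, num) {
  const ids = (followersOf[authorId] || []).slice().sort();
  const from = startId ? ids.findIndex((id) => id > startId) : 0;
  return from === -1 ? [] : ids.slice(from, from + num);
}

// Hypothetical populateFollowerFeed: writes one page, then calls itself
// with the last id of that page as the next cursor.
async function populateFollowerFeed({ data, startId = null, num = 20 }) {
  const page = getFollowerPage(data.authorId, startId, num);
  if (page.length === 0) return 0; // no more followers: stop recursing
  for (const followerId of page) {
    if (!feeds.has(followerId)) feeds.set(followerId, new Map());
    // Keyed by post id, so re-running a page overwrites instead of duplicating.
    feeds.get(followerId).set(data.id, data);
  }
  const rest = await populateFollowerFeed({
    data,
    startId: page[page.length - 1],
    num,
  });
  return page.length + rest;
}

const post = { id: 'p1', authorId: 'alice', title: 'Hello' };
populateFollowerFeed({ data: post, num: 2 }).then((written) => {
  console.log(written); // 5
});
```

Keying each feed entry by the post id is what makes a retried page safe to run again, which matters when a function invocation fails partway through.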

The beauty of this is that you could have another callable function for populateTagsFeed(). This could be important if you want to mix and match your posts by followed tags as well.

Yes, this gets expensive for writes. However, it is simple, idempotent, and cheap for small to mid-size databases. I disagree that this is unfeasible: Firestore, like most noSQL databases, is optimized for reads rather than writes, so duplicating data at write time is the expected way to think. If you have 1,000,000 users, the costs should be minor compared to your real needs.
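To get a feel for the write volume, the fan-out can be estimated with simple arithmetic. This is a sketch under stated assumptions: `fanOutCost` is a hypothetical helper, and `pricePer100kWrites` is a placeholder the caller supplies, not a quoted Firestore price, so check current pricing before relying on any number.

```javascript
// Fan-out cost sketch. pricePer100kWrites is a placeholder supplied by the
// caller, not a quoted Firestore price.
function fanOutCost({ postsPerDay, avgFollowers, pricePer100kWrites }) {
  // Each post is copied once into every follower's feed.
  const writesPerDay = postsPerDay * avgFollowers;
  const costPerDay = (writesPerDay / 100000) * pricePer100kWrites;
  return { writesPerDay, costPerDay };
}

const estimate = fanOutCost({
  postsPerDay: 5000,
  avgFollowers: 200,
  pricePer100kWrites: 0.2, // placeholder value for illustration
});
console.log(estimate.writesPerDay); // 1000000
```

The cost grows with posts times followers, so a handful of accounts with very large followings can dominate the bill while the typical user stays cheap.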

The Firebase Way

Mass Aggregation may be the Firebase way. However, I suspect, considering they recommend Algolia for searching, that the Firebase team would recommend using an external database for your feeds.

However, keep in mind that Algolia and other noSQL databases made for searching cannot do the joins required for a simple follower feed.

My recommendation would be to use RedisGraph. I am a huge fan of it. You would still need a posts trigger to keep it up to date. You can find several cloud-hosted versions. It is also scalable, although potentially expensive. However, the speed is probably worth it for you. Another option may be to use BigQuery with a Firebase trigger.

Outside of these options, you may want to think about another database. However, unless you have millions of users, Firestore should work just fine for your use case with basic aggregation techniques.

J