What's the best manner of implementing a social activity stream?


Ruby on-Rails Problem Overview


I'm interested in hearing your opinions on the best way of implementing a social activity stream (Facebook is the most famous example). The problems/challenges involved are:

  • Different types of activities (posting, commenting ..)
  • Different types of objects (post, comment, photo ..)
  • 1-n users involved in different roles ("User x replied to User y's comment on User z's post")
  • Different views of the same activity item ("you commented .." vs. "your friend x commented" vs. "user x commented .." => 3 representations of a "comment" activity)

.. and some more, especially if you take it to a high level of sophistication, as Facebook does, for example by combining several activity items into one ("users x, y and z commented on that photo").

Any thoughts or pointers on patterns, papers, etc on the most flexible, efficient and powerful approaches to implementing such a system, data model, etc. would be appreciated.

Although most of the issues are platform-agnostic, chances are I'll end up implementing such a system on Ruby on Rails.

Ruby on-Rails Solutions


Solution 1 - Ruby on-Rails

I have created such a system, and I took this approach:

Database table with the following columns: id, userId, type, data, time.

  • userId is the user who generated the activity
  • type is the type of the activity (e.g. wrote blog post, added photo, commented on user's photo)
  • data is a serialized object with meta-data for the activity where you can put in whatever you want

This limits the searches/lookups you can do in the feeds to users, time and activity types, but in a Facebook-type activity feed this isn't really limiting. And with correct indices on the table, the lookups are fast.

With this design you would have to decide what metadata each type of event should require. For example a feed activity for a new photo could look something like this:

{id:1, userId:1, type:PHOTO, time:2008-10-15 12:00:00, data:{photoId:2089, photoName:A trip to the beach}}

You can see that, although the name of the photo is most certainly stored in some other table containing the photos, and I could retrieve the name from there, I duplicate the name in the metadata field, because you don't want to do any joins on other database tables if you want speed. And in order to display, say, 200 different events from 50 different users, you need speed.

Then I have classes that extend a basic FeedActivity class for rendering the different types of activity entries. Grouping of events would be built into the rendering code as well, to keep complexity away from the database.
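As a rough illustration of the design above, here is a minimal Ruby sketch. Only the column names and the FeedActivity base class come from the answer; the Row struct, the PhotoActivity and CommentActivity subclasses, the COMMENT type, the RENDERERS lookup and render_feed are hypothetical names chosen for the example, and the rows live in memory rather than in a database table.

```ruby
require "json"
require "time"

# Hypothetical row shape mirroring the columns: id, userId, type, data, time.
Row = Struct.new(:id, :user_id, :type, :data, :time)

# Base renderer; subclasses turn one row into display text.
class FeedActivity
  attr_reader :row, :meta

  def initialize(row)
    @row = row
    @meta = JSON.parse(row.data) # denormalized metadata, so no joins are needed
  end

  def render
    "User #{row.user_id} did something"
  end
end

class PhotoActivity < FeedActivity
  def render
    "User #{row.user_id} added the photo \"#{meta['photoName']}\""
  end
end

class CommentActivity < FeedActivity
  def render
    "User #{row.user_id} commented on photo #{meta['photoId']}"
  end
end

# Assumed mapping from the type column to a renderer class.
RENDERERS = { "PHOTO" => PhotoActivity, "COMMENT" => CommentActivity }.freeze

def build_activity(row)
  RENDERERS.fetch(row.type, FeedActivity).new(row)
end

# Grouping happens in the rendering layer: comments on the same photo
# collapse into a single line, everything else is rendered on its own.
def render_feed(rows)
  activities = rows.sort_by(&:time).reverse.map { |r| build_activity(r) }
  groups = activities.group_by do |a|
    a.is_a?(CommentActivity) ? [:comment, a.meta["photoId"]] : [:single, a.row.id]
  end
  groups.values.map do |group|
    next group.first.render if group.size == 1
    users = group.map { |a| "User #{a.row.user_id}" }.uniq.join(", ")
    "#{users} commented on photo #{group.first.meta['photoId']}"
  end
end

SAMPLE_ROWS = [
  Row.new(1, 1, "PHOTO", '{"photoId":2089,"photoName":"A trip to the beach"}', Time.parse("2008-10-15 12:00:00")),
  Row.new(2, 2, "COMMENT", '{"photoId":2089}', Time.parse("2008-10-15 12:05:00")),
  Row.new(3, 3, "COMMENT", '{"photoId":2089}', Time.parse("2008-10-15 12:10:00"))
].freeze

FEED_LINES = render_feed(SAMPLE_ROWS)
puts FEED_LINES
# User 3, User 2 commented on photo 2089
# User 1 added the photo "A trip to the beach"
```

Because the grouping key is computed in Ruby, changing how items are combined does not require touching the table or its indices.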

Solution 2 - Ruby on-Rails

This is a very good presentation outlining how Etsy.com architected their activity streams. It's the best example I've found on the topic, though it's not rails specific.

http://www.slideshare.net/danmckinley/etsy-activity-feeds-architecture

Solution 3 - Ruby on-Rails

We've open sourced our approach: https://github.com/tschellenbach/Stream-Framework. It's currently the largest open source library aimed at solving this problem.

The same team which built Stream Framework also offers a hosted API, which handles the complexity for you. Have a look at getstream.io. There are clients available for Node, Python, Rails and PHP.

In addition, have a look at this High Scalability post where we explain some of the design decisions involved: http://highscalability.com/blog/2013/10/28/design-decisions-for-scaling-your-high-traffic-feeds.html

This tutorial will help you set up a system like Pinterest's feed using Redis. It's quite easy to get started with.

To learn more about feed design I highly recommend reading some of the articles which we based Feedly on:

Though Stream Framework is Python-based, it wouldn't be too hard to use from a Ruby app. You could simply run it as a service and stick a small HTTP API in front of it. We are considering adding an API to access Feedly from other languages. At the moment you'll have to roll your own though.

Solution 4 - Ruby on-Rails

The biggest issues with event streams are visibility and performance; you need to restrict the events displayed to be only the interesting ones for that particular user, and you need to keep the amount of time it takes to sort through and identify those events manageable. I've built a smallish social network; I found that at small scales, keeping an "events" table in a database works, but that it gets to be a performance problem under moderate load.

With a larger stream of messages and users, it's probably best to go with a messaging system, where events are sent as messages to individual profiles. This means that you can't easily subscribe to someone's event stream and see their previous events, but you are simply rendering a small group of messages when you need to render the stream for a particular user.
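To make the trade-off concrete, here is a minimal in-memory Ruby sketch of that "send events as messages to individual profiles" idea (often called fan-out on write). The InboxFeed class, its method names and the MAX_ITEMS cap are assumptions made for illustration; a real system would back the inboxes with a queue or datastore rather than hashes.

```ruby
# In-memory stand-ins for a follower graph and per-user inboxes (hypothetical).
class InboxFeed
  MAX_ITEMS = 100 # assumed cap so each inbox stays small and cheap to render

  def initialize
    @followers = Hash.new { |h, k| h[k] = [] } # author => [follower, ...]
    @inboxes   = Hash.new { |h, k| h[k] = [] } # user   => [message, ...], newest first
  end

  def follow(follower, author)
    @followers[author] << follower unless @followers[author].include?(follower)
  end

  # Fan-out on write: copy the event into every current follower's inbox.
  def publish(author, text)
    message = { author: author, text: text }
    @followers[author].each do |follower|
      inbox = @inboxes[follower]
      inbox.unshift(message)
      inbox.pop while inbox.size > MAX_ITEMS
    end
  end

  # Reading a stream is a slice of one inbox; nothing is filtered at read time.
  def stream_for(user, limit = 20)
    @inboxes[user].first(limit)
  end
end

feed = InboxFeed.new
feed.follow("bob", "alice")
feed.publish("alice", "posted a photo")
feed.follow("carol", "alice") # carol subscribes late
feed.publish("alice", "wrote a comment")

p feed.stream_for("bob").map { |m| m[:text] }   # ["wrote a comment", "posted a photo"]
p feed.stream_for("carol").map { |m| m[:text] } # ["wrote a comment"]
```

Carol's stream shows the limitation mentioned above: events published before she subscribed were never delivered to her inbox, so they do not appear.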

I believe this was Twitter's original design flaw: I remember reading that they were hitting the database to pull in and filter their events. This had everything to do with architecture and nothing to do with Rails, which (unfortunately) gave birth to the "Ruby doesn't scale" meme. I recently saw a presentation where the developer used Amazon's Simple Queue Service (http://aws.amazon.com/sqs/) as the messaging backend for a Twitter-like application that would have far higher scaling capabilities. It may be worth looking into SQS as part of your system if your loads are high enough.

Solution 5 - Ruby on-Rails

If you are willing to use separate software, I suggest the Graphity server, which solves exactly this problem for activity streams (building on top of the neo4j graph database).

The algorithms have been implemented as a standalone REST server so that you can host your own server to deliver activity streams: http://www.rene-pickhardt.de/graphity-server-for-social-activity-streams-released-gplv3/

In the paper and benchmark I showed that the cost of retrieving news streams depends only linearly on the number of items you want to retrieve, without any of the redundancy you would get from denormalizing the data:

http://www.rene-pickhardt.de/graphity-an-efficient-graph-model-for-retrieving-the-top-k-news-feeds-for-users-in-social-networks/

At the link above you will find screencasts and a benchmark of this approach (showing that Graphity is able to retrieve more than 10k streams per second).

Solution 6 - Ruby on-Rails

// one entry per actual event
events {
id, timestamp, type, data
}

// one entry per event, per feed containing that event
events_feeds {
event_id, feed_id
}

When the event is created, decide which feeds it appears in and add those to events_feeds. To get a feed, select from events_feeds, join in events, order by timestamp. Filtering and aggregation can then be done on the results of that query. With this model, you can change the event properties after creation with no extra work.
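The two-table model can be sketched in Ruby with in-memory collections standing in for the tables. The FeedStore class and its method names are hypothetical, and the newest-first ordering is an assumption; the answer only says to order by timestamp.

```ruby
Event = Struct.new(:id, :timestamp, :type, :data)

class FeedStore
  def initialize
    @events = {}       # stand-in for the events table: id => Event
    @events_feeds = [] # stand-in for events_feeds: [event_id, feed_id] pairs
  end

  # Store the event once, then link it to every feed it should appear in.
  def create_event(event, feed_ids)
    @events[event.id] = event
    feed_ids.each { |feed_id| @events_feeds << [event.id, feed_id] }
  end

  # Roughly: select from events_feeds, join in events, order by timestamp.
  def feed(feed_id)
    @events_feeds
      .select { |_event_id, fid| fid == feed_id }
      .map { |event_id, _fid| @events.fetch(event_id) }
      .sort_by(&:timestamp)
      .reverse
  end

  def event(id)
    @events.fetch(id)
  end
end

store = FeedStore.new
store.create_event(Event.new(1, 100, "comment", { text: "Nice!" }), %w[alice bob])
store.create_event(Event.new(2, 200, "photo", { name: "Beach" }), %w[bob])

# Editing the single stored event is visible in every feed that links to it.
store.event(1).data[:text] = "Nice photo!"

p store.feed("bob").map(&:id)           # [2, 1]
p store.feed("alice").first.data[:text] # "Nice photo!"
```

Since each feed holds only references, the event body is never copied, which is why changing event properties after creation needs no extra work.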

Solution 7 - Ruby on-Rails

I started to implement a system like this yesterday, here's where I've got to...

I created a StreamEvent class with the properties Id, ActorId, TypeId, Date, ObjectId and a hashtable of additional Details key/value pairs. This is represented in the database by a StreamEvent table (Id, ActorId, TypeId, Date, ObjectId) and a StreamEventDetails table (StreamEventId, DetailKey, DetailValue).

The ActorId, TypeId and ObjectId allow for a Subject-Verb-Object event to be captured (and later queried). Each action may result in several StreamEvent instances being created.

I've then created a subclass of StreamEvent for each type of event, e.g. LoginEvent, PictureCommentEvent. Each of these subclasses has more context-specific properties such as PictureId, ThumbNail, CommentText, etc. (whatever is required for the event), which are actually stored as key/value pairs in the hashtable/StreamEventDetails table.

When pulling these events back from the database I use a factory method (based on the TypeId) to create the correct StreamEvent class.

Each subclass of StreamEvent has a Render(context As StreamContext) method which outputs the event to screen based on the passed StreamContext class. The StreamContext class allows options to be set based on the context of the view. If you look at Facebook, for example, your news feed on the homepage lists the full names (and links to their profiles) of everyone involved in each action, whereas when looking at a friend's feed you only see their first name (but the full names of other actors).
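The answer's own code is not shown (and its Render signature suggests VB.NET), so the following is only a Ruby sketch of the same shape: a factory keyed on the type id plus a context-aware render method. The USERS table, the EVENT_TYPES mapping, the StreamContext fields and the "You" wording are assumptions; ObjectId is called target_id here because object_id is already a built-in Ruby method.

```ruby
# Hypothetical view context: who is looking, and whether to show full names.
StreamContext = Struct.new(:viewer_id, :full_names)

USERS = { 1 => "Ann Lee", 2 => "Raj Patel" }.freeze # made-up user names

class StreamEvent
  attr_reader :actor_id, :target_id, :details

  def initialize(actor_id, target_id, details = {})
    @actor_id = actor_id
    @target_id = target_id
    @details = details # key/value pairs, as in the StreamEventDetails table
  end

  # Show "You", a full name, or a first name depending on the view context.
  def actor_name(context)
    return "You" if context.viewer_id == actor_id
    name = USERS.fetch(actor_id)
    context.full_names ? name : name.split.first
  end

  def render(context)
    "#{actor_name(context)} did something"
  end
end

class LoginEvent < StreamEvent
  def render(context)
    "#{actor_name(context)} logged in"
  end
end

class PictureCommentEvent < StreamEvent
  def render(context)
    "#{actor_name(context)} commented on picture #{details['PictureId']}: #{details['CommentText']}"
  end
end

# Assumed mapping from TypeId to subclass.
EVENT_TYPES = { 1 => LoginEvent, 2 => PictureCommentEvent }.freeze

# Factory method: pick the subclass from the TypeId stored with the row.
def build_event(type_id, actor_id, target_id, details = {})
  EVENT_TYPES.fetch(type_id, StreamEvent).new(actor_id, target_id, details)
end

event = build_event(2, 1, 55, { "PictureId" => 55, "CommentText" => "Great shot" })
puts event.render(StreamContext.new(1, true))  # You commented on picture 55: Great shot
puts event.render(StreamContext.new(2, true))  # Ann Lee commented on picture 55: Great shot
puts event.render(StreamContext.new(2, false)) # Ann commented on picture 55: Great shot
```

Keeping the viewer-dependent wording inside render means the stored event stays the same no matter which page displays it.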

I haven't implemented an aggregate feed (Facebook home) yet, but I imagine I'll create an AggregateFeed table with the fields UserId and StreamEventId, populated based on some kind of 'Hmmm, you might find this interesting' algorithm.

Any comments would be massively appreciated.

Solution 8 - Ruby on-Rails

If you do decide that you're going to implement in Rails, perhaps you will find the following plugin useful:

ActivityStreams: http://github.com/face/activity_streams/tree/master

If nothing else, you'll get to look at an implementation, both in terms of the data model, as well as the API provided for pushing and pulling activities.

Solution 9 - Ruby on-Rails

I had a similar approach to that of heyman - a denormalized table containing all of the data that would be displayed in a given activity stream. It works fine for a small site with limited activity.

As mentioned above, it is likely to face scalability issues as the site grows. Personally, I am not worried about the scaling issues right now. I'll worry about that at a later time.

Facebook has obviously done a great job of scaling so I would recommend that you read their engineering blog, as it has a ton of great content -> http://www.facebook.com/notes.php?id=9445547199

I have been looking into better solutions than the denormalized table I mentioned above. Another way I have found of accomplishing this is to condense all the content that would be in a given activity stream into a single row. It could be stored in XML, JSON, or some serialized format that could be read by your application. The update process would be simple too. Upon activity, place the new activity into a queue (perhaps using Amazon SQS or something else) and then continually poll the queue for the next item. Grab that item, parse it, and place its contents in the appropriate feed object stored in the database.

The good thing about this method is that you only need to read a single database table whenever that particular feed is requested, rather than grabbing a series of tables. Also, it allows you to maintain a finite list of activities as you may pop off the oldest activity item whenever you update the list.
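A small Ruby sketch of this single-row-per-feed idea follows. Ruby's built-in Queue stands in for SQS, a hash stands in for the database table, and the function names and the MAX_FEED_ITEMS cap are all hypothetical choices made for the example.

```ruby
require "json"

MAX_FEED_ITEMS = 3 # assumed cap, kept tiny for the demo

# Stand-in for a table with one row per feed: feed_id => serialized JSON.
FEED_ROWS = {}
# Stand-in for Amazon SQS or another message queue.
ACTIVITY_QUEUE = Queue.new

def enqueue_activity(feed_id, activity)
  ACTIVITY_QUEUE << { "feed_id" => feed_id, "activity" => activity }
end

# Worker step: take each queued item, parse the feed row, prepend the new
# activity, pop the oldest entries beyond the cap, and write the row back.
def drain_queue
  until ACTIVITY_QUEUE.empty?
    job = ACTIVITY_QUEUE.pop
    items = JSON.parse(FEED_ROWS.fetch(job["feed_id"], "[]"))
    items.unshift(job["activity"])
    items.pop while items.size > MAX_FEED_ITEMS
    FEED_ROWS[job["feed_id"]] = JSON.generate(items)
  end
end

# Reading a feed touches exactly one row.
def read_feed(feed_id)
  JSON.parse(FEED_ROWS.fetch(feed_id, "[]"))
end

%w[login photo comment like].each do |verb|
  enqueue_activity("user:1", { "verb" => verb })
end
drain_queue

p read_feed("user:1").map { |a| a["verb"] } # ["like", "comment", "photo"]
```

The oldest item ("login") has been dropped, which is the finite-list behaviour described above.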

Hope this helps! :)

Solution 10 - Ruby on-Rails

There are two railscasts about such an activity stream:

Those solutions don't cover all your requirements, but they should give you some ideas.

Solution 11 - Ruby on-Rails

I think Plurk's approach is interesting: they supply your entire timeline in a format that looks a lot like Google Finance's stock charts.

It may be worth looking at Ning to see how a social network works. The developer pages look especially helpful.

Solution 12 - Ruby on-Rails

I solved this a few months ago, but I think my implementation is too basic.
I created the following models:

HISTORY_TYPE

ID           - The id of the history type
NAME         - The name (type of the history)
DESCRIPTION  - A description

HISTORY_MESSAGES

ID
HISTORY_TYPE - A message of history belongs to a history type
MESSAGE      - The message to print, I put variables to be replaced by the actual values

HISTORY_ACTIVITY

ID
MESSAGE_ID    - The message ID to use
VALUES        - The data to use

Example

MESSAGE_ID_1 => "User %{user} created a new entry"
ACTIVITY_ID_1 => MESSAGE_ID = 1, VALUES = {user: "Rodrigo"}
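
The %{user} placeholder in the example matches Ruby's named format references, so rendering can be sketched as below. The in-memory HISTORY_MESSAGES hash, the second template, the HistoryActivity struct and render_history are hypothetical stand-ins for the tables above.

```ruby
# Hypothetical in-memory versions of HISTORY_MESSAGES and HISTORY_ACTIVITY.
HISTORY_MESSAGES = {
  1 => "User %{user} created a new entry",
  2 => "User %{user} commented on %{title}" # made-up second template
}.freeze

HistoryActivity = Struct.new(:id, :message_id, :values)

# Fill the template's named placeholders with the activity's stored values.
# Kernel#format raises KeyError if a placeholder has no matching key.
def render_history(activity)
  format(HISTORY_MESSAGES.fetch(activity.message_id), activity.values)
end

puts render_history(HistoryActivity.new(1, 1, { user: "Rodrigo" }))
# User Rodrigo created a new entry
```

The keys in VALUES need to be symbols for the named references to resolve, so values loaded from a serialized column would have to be converted first.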

Solution 13 - Ruby on-Rails

After implementing activity streams to enable social feeds, microblogging, and collaboration features in several applications, I realized that the base functionality is quite common and could be turned into an external service that you utilize via an API. If you are building the stream into a production application and do not have unique or deeply complex needs, utilizing a proven service may be the best way to go. I would definitely recommend this for production applications over rolling your own simple solution on top of a relational database.

My company Collabinate (http://www.collabinate.com) grew out of this realization, and we have implemented a scalable, high performance activity stream engine on top of a graph database to achieve it. We actually utilized a variant of the Graphity algorithm (adapted from the early work of @RenePickhardt who also provided an answer here) to build the engine.

If you want to host the engine yourself or require specialized functionality, the core code is actually open source for non-commercial purposes, so you're welcome to take a look.

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content Type | Original Author | Original Content on Stackoverflow
Question | mort | View Question on Stackoverflow
Solution 1 - Ruby on-Rails | heyman | View Answer on Stackoverflow
Solution 2 - Ruby on-Rails | Mark Kennedy | View Answer on Stackoverflow
Solution 3 - Ruby on-Rails | Thierry | View Answer on Stackoverflow
Solution 4 - Ruby on-Rails | Tim Howland | View Answer on Stackoverflow
Solution 5 - Ruby on-Rails | Rene Pickhardt | View Answer on Stackoverflow
Solution 6 - Ruby on-Rails | jedediah | View Answer on Stackoverflow
Solution 7 - Ruby on-Rails | jammus | View Answer on Stackoverflow
Solution 8 - Ruby on-Rails | Alderete | View Answer on Stackoverflow
Solution 9 - Ruby on-Rails | | View Answer on Stackoverflow
Solution 10 - Ruby on-Rails | Benjamin Crouzier | View Answer on Stackoverflow
Solution 11 - Ruby on-Rails | warren | View Answer on Stackoverflow
Solution 12 - Ruby on-Rails | Rodrigo | View Answer on Stackoverflow
Solution 13 - Ruby on-Rails | Mafuba | View Answer on Stackoverflow