When analyzing data sets, getting “instant” insight is often critical for business decisions. This is why one may want to dedicate all available resources to a set of tasks delivering analysis to stakeholders. Here is an example of a Twitter feed analysis using R (a statistics-oriented language) and ProActive. It gathers messages, then performs distributed analysis on distinct computing nodes.
How it works
In selected cities, this workflow searches Twitter feeds containing a chosen set of keywords and returns a chosen number of tweets. (Due to restrictions from Twitter, the number of cities cannot be greater than 15: “Rate limits are divided into 15 minute intervals [...] 15 calls every 15 minutes.”, https://dev.twitter.com/rest/public/rate-limiting)
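Since each city costs one search call, the 15-calls-per-15-minutes limit translates directly into a cap on the city list. A minimal sketch of that check (the `cities` vector and its contents are illustrative assumptions):

```r
# Hypothetical input: one Twitter search call will be made per city.
cities <- c("Paris", "London", "Berlin", "Madrid")

# Twitter's rate limit: 15 calls per 15-minute window.
max_calls <- 15
if (length(cities) > max_calls) {
  stop(sprintf("Twitter allows only %d search calls per 15 minutes; got %d cities.",
               max_calls, length(cities)))
}
```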
ProActive first creates as many tasks as there are cities, then runs them in parallel on ProActive nodes. In each of these R-based tasks, the node will:
- connect to Twitter,
- perform the desired search,
- analyze the information,
- store the result in the ProActive dataspace (a shared storage environment adapted to distributed systems).
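The four steps above could be sketched in a single per-city R task roughly as follows. This is a hedged illustration, not the article's actual workflow code: the credentials, city, keywords, and the toy keyword-count “analysis” are all assumptions, and `localspace` is assumed here to point at the task's ProActive dataspace directory.

```r
# Sketch of one per-city task, assuming the twitteR package is installed
# and that ProActive exposes the task's dataspace path as `localspace`.
library(twitteR)

# 1. Connect to Twitter (placeholder credentials -- supply your own keys).
setup_twitter_oauth("CONSUMER_KEY", "CONSUMER_SECRET",
                    "ACCESS_TOKEN", "ACCESS_SECRET")

city    <- "Paris"
geocode <- "48.8566,2.3522,30km"   # lat,long,radius for the chosen city

# 2. Perform the desired search: one rate-limited call fetching the
#    chosen number of tweets matching the keywords near this city.
tweets <- searchTwitter("bigdata OR analytics", n = 100, geocode = geocode)

# 3. Analyze the information -- here, a toy keyword-frequency count.
texts  <- sapply(tweets, function(t) t$getText())
counts <- sapply(c("bigdata", "analytics"),
                 function(k) sum(grepl(k, texts, ignore.case = TRUE)))

# 4. Store the result in the ProActive dataspace so other tasks
#    (e.g. a final merge task) can read it.
write.csv(counts, file.path(localspace, paste0(city, "_counts.csv")))
```

Because each task writes its result into the shared dataspace, a final task can later collect the per-city files and merge them into one report.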