DataArt is a large company with many highly engaged people who, apart from their project work, enthusiastically pursue plenty of things in their free time. One of our colleagues from the Voronezh office conducted research to identify the most influential members of a social network. Knowing the influencers is essential for marketing campaigns because it helps to identify channels for distributing information. He developed an application that analyzes the social graph and calculates the influence associated with each profile. This research could also serve data validation purposes, since information about influential objects should be verified first.
The search for the most influential objects in a social web
In today’s world, communication between people has shifted from personal contact to social media. With the increasing popularity of social networks, it has become a trend to have your own page with personal data, to search for friends by interest, to create groups, etc. The amount of information stored by social networks is constantly growing. Most of this information is in raw form, so it is difficult to use for marketing purposes until you process it and obtain meaningful results.
This article describes in detail how to find the most influential profiles, how to use this information for digital marketing, and how to identify users with a suspiciously high level of activity.
Let’s represent the social network as a graph in which people are nodes. If two objects are somehow related (they are friends or just communicate), there is a line between them.
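The graph described above can be sketched in code as a mapping from each person to the set of people they are connected to. This is only an illustration: the function name `build_graph` and the sample names are hypothetical and do not come from the original application.

```python
from collections import defaultdict

def build_graph(edges):
    """Build an undirected graph as {person: set of connected people}."""
    graph = defaultdict(set)
    for a, b in edges:
        # A friendship is mutual, so the line is stored in both directions.
        graph[a].add(b)
        graph[b].add(a)
    return dict(graph)

# Hypothetical sample data: each pair is one connection.
friendships = [("anna", "boris"), ("anna", "vera"), ("boris", "vera"), ("vera", "gleb")]
graph = build_graph(friendships)

# The number of connections per person is the simplest (and too naive) measure.
degree = {person: len(friends) for person, friends in graph.items()}
```

Counting connections alone is cheap, but it treats every connection as equally valuable.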
So, let’s form an intuitive notion of influence:
The object in the middle has six connections, but influence depends not only on the number of connections but also on the influence of those connections.
The image above shows that the object has three connections, and those connections have connections of their own, which altogether gives the initial object some influence in the network. We need to formalize the notion of influence so that it takes into account both the number of connections our target has and the influence of those connections.
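The article does not state which formula the application used. One common way to formalize this notion is eigenvector centrality, where a node's score is proportional to the sum of its neighbours' scores. The sketch below illustrates that assumption and is not the author's method; the function name `influence_scores` and the sample graph are hypothetical.

```python
def influence_scores(graph, iterations=100):
    """Approximate eigenvector centrality by power iteration.

    `graph` maps every node to the set of its neighbours; each neighbour
    must also appear as a key. Scores are scaled so the top node has 1.0.
    """
    scores = {node: 1.0 for node in graph}
    for _ in range(iterations):
        # Each node keeps its own score and adds its neighbours' scores.
        # Keeping the own score (iterating with A + I instead of A) does not
        # change the ranking but avoids oscillation on bipartite graphs.
        updated = {
            node: scores[node] + sum(scores[n] for n in neighbours)
            for node, neighbours in graph.items()
        }
        top = max(updated.values())
        scores = {node: value / top for node, value in updated.items()}
    return scores

# Hypothetical sample: "b" and "e" both have exactly one connection.
sample = {
    "a": {"b", "c", "d"},
    "b": {"a"},
    "c": {"a"},
    "d": {"a", "e"},
    "e": {"d"},
}
scores = influence_scores(sample)
```

In this sample, `b` and `e` each have a single connection, yet `b` scores higher because its only neighbour `a` is better connected than `d`. That is the behaviour the intuitive notion asks for.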
The data was gathered from Vk.com, a Russian social network (a Facebook analogue) with more than 220 million users, 47 million of whom use the site every day.
To analyze the data, we created an application that gathered information about the target’s connections and their connections (that is, second- and third-order friends).
The following picture represents the whole system: the target is black, first-order friends are red, second-order friends are green, and third-order friends are yellow:
The social network’s API does not allow more than three requests per second, so we had to leave the machine running overnight to gather the third order of connections.
In fact, had we downloaded fourth-order connections, we would most likely have included 99% of all VK users. The widespread six degrees of separation theory states that everyone is six or fewer steps away, by way of introduction, from any other person in the world. VK is used mainly in the CIS states, so this figure should be smaller there. Besides, on social networks people frequently connect with people they don’t even know in real life, which decreases the figure further.
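Gathering connections order by order amounts to a breadth-first traversal that stops at a fixed depth. The sketch below is a minimal illustration of that idea, not the application's actual code: `fetch_friends` is a hypothetical callable standing in for whatever API call returns a user's friend list, and the tiny `fake_network` replaces the real social network.

```python
from collections import deque

def crawl(target, fetch_friends, max_order):
    """Breadth-first crawl up to `max_order` steps away from `target`.

    Returns ({user: order}, {user: set of known connections}).
    """
    order = {target: 0}
    graph = {}
    queue = deque([target])
    while queue:
        user = queue.popleft()
        if order[user] >= max_order:
            # Last order: the friend list is not loaded, so this user only
            # keeps the connections through which they were discovered.
            graph.setdefault(user, set())
            continue
        friends = set(fetch_friends(user))
        graph.setdefault(user, set()).update(friends)
        for friend in friends:
            graph.setdefault(friend, set()).add(user)
            if friend not in order:
                order[friend] = order[user] + 1
                queue.append(friend)
    return order, graph

# Hypothetical stand-in for the real network.
fake_network = {"t": ["a", "b"], "a": ["t", "c"], "b": ["t"], "c": ["a", "d"], "d": ["c"]}
order, graph = crawl("t", lambda user: fake_network.get(user, []), max_order=2)
```

With `max_order=2`, user `c` is discovered but not expanded, so `d` never enters the data set.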
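A limit of three requests per second can be respected by spacing calls evenly. The sketch below shows one way to do that. The class name `Throttle` is hypothetical, and the time estimate assumes one request per user whose friend list is fetched, which the article does not confirm.

```python
import time

class Throttle:
    """Allow at most `rate` calls per second by spacing them evenly."""

    def __init__(self, rate, clock=time.monotonic, sleep=time.sleep):
        self.interval = 1.0 / rate
        self.clock = clock      # injectable so the class can be tested
        self.sleep = sleep
        self.next_allowed = 0.0

    def wait(self):
        """Block until the next request may be sent."""
        delay = self.next_allowed - self.clock()
        if delay > 0:
            self.sleep(delay)
        # Schedule the next slot one interval after this call.
        self.next_allowed = max(self.clock(), self.next_allowed) + self.interval

# Rough estimate under the stated assumption: fetching friend lists for about
# 40 thousand users at 3 requests per second.
estimated_hours = 40_000 / 3 / 3600
```

Calling `throttle.wait()` before each API request keeps the crawler under the limit. Under the assumption above, the estimate comes to roughly 3.7 hours, which is consistent with leaving the machine running overnight.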
So, we have two sets of data:
- About 40 thousand users for two orders of friends.
- 4.2 million users for three orders of friends.
In the first set, our target is ranked first in influence. This can be explained by how the data was loaded. Everyone from the last order of friends has only one connection, because the friend list for these objects is empty. As a result, the first-order friends, whose influence is largely based on second-order friends, receive a low rating. Because of this, the results are not entirely objective. However, we can observe some interesting data: the first 30 rows contain half of local State University students involved in social and cultural activities. In practice, this method helps to obtain a list of the most active users, which may be very useful.
So let’s turn to the second set of data. Here, the target is ranked 3,539th.
The reason is clear: here, we are talking about influence within the target’s city.
This list contains a bunch of accounts dedicated to different cultural and social activities within the city, such as excursions, music events and so on.
As for real people’s accounts, they belong mostly to well-known photographers and musicians of our city. However, the activity of some pages looks suspicious: the influence is too high considering the small amount of information about the person, so we don’t really know what purpose the person behind the page uses it for. These may be spam or aggressive SMM accounts.
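The check described above, high influence combined with little profile information, could be sketched as a simple filter. Everything in this example is hypothetical: the field names, the thresholds, and the sample profiles are illustrative and do not come from the study.

```python
def flag_suspicious(profiles, min_influence, min_filled_fields):
    """Return ids of profiles with high influence but little profile data."""
    return [
        p["id"]
        for p in profiles
        if p["influence"] >= min_influence and p["filled_fields"] < min_filled_fields
    ]

# Hypothetical sample: influence scaled to 0..1, filled_fields counts
# non-empty profile fields (city, education, interests, and so on).
profiles = [
    {"id": "photographer", "influence": 0.92, "filled_fields": 14},
    {"id": "empty_page", "influence": 0.88, "filled_fields": 1},
    {"id": "regular_user", "influence": 0.10, "filled_fields": 2},
]
suspicious = flag_suspicious(profiles, min_influence=0.8, min_filled_fields=3)
```

A filter like this only produces candidates for manual review; it cannot tell on its own whether a page is spam.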