Social Media Network Harvesting: Some considerations

By Jane Ginn

May 28, 2014

By Martin D. Zimmermann & R. Jane Ginn

It will come as no surprise to the power users reading this post that almost all Internet-connected people freely (and unwisely) share their personal and/or business information on one or more social media platforms. Some social media companies openly use business models that harvest user data for advertising purposes. Though privacy settings are generally customizable, users often fail to read the privacy policy before signing up for a site’s products or services. As a result, users are often subjected to unwanted third-party advertising and other uses of their personally identifiable information (PII).


Furthermore, many of these social media companies change their privacy policies over time, which can cause unwanted leakage of customer PII. The use of photographs is particularly disturbing for some users, especially with newly emerging ‘face recognition’ technology currently being used by Google with their Picasa and Glass products, as noted by CBC News (2011). There are cases where photos of individuals are scraped from these sites for use on other sites without the users’ permission and without compensation. One can speculate that the selection is random and automated. Users who care about the privacy of their PII must take it upon themselves to run periodic scans on their own names, using multiple search engines, to check for privacy violations.
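The periodic self-check described above can be partly scripted. What follows is a minimal sketch, not a definitive tool. It assumes three public search URL patterns, which are not taken from this post and may change over time, and a hypothetical helper named build_self_search_urls. It only builds query links for a person to open manually; it does not fetch or scrape anything.

```python
from urllib.parse import quote_plus

# Assumed public search URL patterns; these are illustrative and may change.
SEARCH_ENGINES = {
    "google": "https://www.google.com/search?q={query}",
    "bing": "https://www.bing.com/search?q={query}",
    "duckduckgo": "https://duckduckgo.com/?q={query}",
}


def build_self_search_urls(full_name, extra_terms=()):
    """Return one exact-phrase search URL per engine for the given name.

    Hypothetical helper: the name and signature are illustrative only.
    """
    # Wrap the name in double quotes so engines treat it as an exact phrase.
    phrase = '"{}"'.format(full_name.strip())
    query = " ".join([phrase, *extra_terms])
    encoded = quote_plus(query)
    return {
        engine: template.format(query=encoded)
        for engine, template in SEARCH_ENGINES.items()
    }


if __name__ == "__main__":
    # Print the links so the user can open each one and review the results.
    for engine, url in build_self_search_urls("Jane Example", ["photo"]).items():
        print(engine, url)
```

Running the same queries on a schedule and comparing the results by hand is enough for a personal check; automating the retrieval of results may conflict with a search engine's terms of service.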

For example, an unscientific test of this thesis was performed using a series of regular, controlled checks of Facebook. The objective was to determine whether a sample of profile pictures could be found publicly available on other sites. When the owners of the monitored pictures were contacted, each one claimed that the last picture they uploaded was marked ‘private’ according to the then-applicable Facebook privacy settings. None of those contacted were aware that Facebook’s privacy policies had changed to facilitate public sharing of their profile pictures.

The main point of this discussion is not to find flaws in social networking websites but rather to illustrate the disconnect between the actions of corporations and the expectations of their customers surrounding the use of PII and confidential data. Even if information sharing or data leakage is unintentional on the part of major corporations, security gaps are inevitable.

As demonstrated by the constant flow of press accounts, the information on these giant networks is often leaked by malicious threat actors hacking in from the outside, through unintentional disclosures by negligent employees, or through fraud by malicious insiders. As a result of these wide-open data leakage scenarios, even within the open source intelligence (OSINT) arena alone, there is an opportunity to collect large quantities of PII data simply by knowing where to look and what tools to use.

This is clearly a positive development within the Internet ecosystem for spammers, cyber-criminals and espionage agents.  These threat actors harvest email addresses, identification numbers, user name/password combinations, credit card information, health information and other PII for nefarious purposes. This practice is what we are calling Social Network Information Harvesting (SNIH).  And, for the privacy reasons outlined above, it is becoming more widespread and troublesome as computing speed increases, storage capacity grows and automated harvesting tools become more widely available at lower costs to the threat actors.

However, SNIH is also a valuable asset for other types of actors: law enforcement officials tracking down criminals, along with government and military intelligence officers seeking to prevent terrorist activity. These actors can also make use of harvesting techniques.

SNIH is thus a double-edged sword when defined as purely gathering information about people that is freely or easily available on these social networks. SNIH can be applied operationally in a far more systematic and advanced manner, and in different types of scenarios, such as the protection of critical infrastructure and government facilities. Nonetheless, under some conditions the repercussions of SNIH can be quite serious from a human rights perspective. This raises broader ethical issues about privacy and Internet governance as well. However, it is beyond the scope of this paper to speculate on these issues. Rather, we would like to outline some of the technologies currently in use for SNIH. What follows is a description of a SNIH Operations Use Case for law enforcement (LE) officials that would employ some of the state-of-the-art technologies and harvesting techniques currently in use by criminal threat actors.

SNIH Operations Use Case

OSINT SNIH implementers usually create a small game, poll or application for users to engage and ‘play with’ or access on social media sites. Most of these applications ask for permission to access data such as email address, location, marital status, friend-list contacts and/or other sensitive PII. It should be noted that in many cases this is, in legal terms, a fully legitimate approach to collecting information that is voluntarily shared with the application’s developer. This type of software has come to be known as “Spyware.” It is the business model of participatory surveillance upon which we have built the entire Internet; however, it is unfortunate for targeted legitimate users who do not voluntarily share their information. Those who have not chosen to share PII, but whose private data are leaked due to security flaws in websites or applications, can be seen as collateral damage in this privacy-infringing ecosystem.

But when threat actors use OSINT SNIH by implanting a malicious Trojan within a Spyware application, the outcome can be quite troublesome for the victim. An infected victim can become an unwitting accomplice in criminal activity when his/her computer processing power is used by a ‘bot’ that is controlled remotely by a bot-master. Under this scenario, the victim is transformed into a perpetrator of criminal action while at the same time experiencing a significant loss of privacy. He/she then becomes part of a collective risk element that all other legitimate users have to defend against, simply by falling prey to the actions of malicious actors.

But to return to the OSINT SNIH Use Case, information harvested specifically by LE can be utilized to find and take down botnets that are further targeting innocent victims. Anonymous OSINT SNIH mining by LE, combined with network traffic analysis, can deliver highly valuable threat actor data and lead to further clues for attribution, an important step in the take-down strategy. If properly executed, this can be used to reveal and predict attack trends and guide the design of other offensive countermeasures to be taken against threat actors. It can also help to reveal patterns that may, ultimately, lead LE to further clues about organizational actors that occupy different niches in the criminal ecosystem.

This type of analysis is still in its infancy apart from expensive commercial vendor solutions such as Recorded Future, Palantir, Maltego, and CaseFile. Fortunately for LE, there are also free open source alternatives such as Creepy, theHarvester, Jigsaw, and the NetworkX Python library. In addition, some agencies with dedicated cybercrime units can use their own coded crawlers, throwaway scrapers and miners, keyword trigger bots and IRC bots. Knowledge of how to use collector feeds for analyzing data generated from intrusion detection system/intrusion prevention system (IDS/IPS) signatures such as YARA, Snort, and Bro can enhance the native primary research the LE unit engages in.
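To make the idea of a keyword trigger bot concrete, here is a minimal sketch of the matching step only. It assumes the text has already been collected lawfully into a list of lines. The Hit record and the find_keyword_hits function are hypothetical names introduced for illustration; they are not part of any tool named above.

```python
import re
from dataclasses import dataclass


@dataclass
class Hit:
    """One keyword match: where it occurred and what the line said."""
    line_number: int
    keyword: str
    text: str


def find_keyword_hits(lines, keywords):
    """Return a Hit for every keyword found as a whole word in a line.

    Hypothetical helper for illustration; matching is case-insensitive.
    """
    # Compile one whole-word, case-insensitive pattern per keyword.
    patterns = {
        kw: re.compile(r"\b" + re.escape(kw) + r"\b", re.IGNORECASE)
        for kw in keywords
    }
    hits = []
    for number, line in enumerate(lines, start=1):
        for kw, pattern in patterns.items():
            if pattern.search(line):
                hits.append(Hit(number, kw, line.strip()))
    return hits


if __name__ == "__main__":
    sample = [
        "New botnet panel for sale",
        "The weather is nice today",
        "Selling fresh dumps and BOTNET access",
    ]
    for hit in find_keyword_hits(sample, ["botnet", "dumps"]):
        print(hit.line_number, hit.keyword, hit.text)
```

Whole-word matching keeps the false-positive rate down compared with plain substring search, at the cost of missing obfuscated spellings; a real deployment would likely need a richer rule format such as the signature languages mentioned above.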

Similarly, honeypots, ADHD and Security Onion (including traditional and specific tools such as SANCP, NTP-NG, Netsniff-ng, and TCPDump) can be used to further substantiate the attribution objective. Webpage/site update and modification monitoring plug-ins, DownThemAll-type scrapers, RSS feed tools (Akregator, Feedly), and the IFTTT trigger service can also be used. Some LE units can establish their own monitoring sensors and run VLAN sniffers against various databases of malware or anomaly detection systems. This provides a small sample of the tools and strategies that LE can use to implement an OSINT SNIH system.
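The page update monitoring mentioned above usually comes down to comparing a stored fingerprint of a page with a fresh one. The following is a minimal sketch under stated assumptions: page text is supplied by the caller (no network access is performed), and content_digest and detect_changes are hypothetical helper names, not functions of any plug-in listed here.

```python
import hashlib


def content_digest(page_text):
    """Return a SHA-256 hex digest of the text, ignoring whitespace layout."""
    # Collapse runs of whitespace so reformatting alone does not count as a change.
    normalized = " ".join(page_text.split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def detect_changes(previous_digests, current_pages):
    """Compare digests from the last run with fresh page snapshots.

    previous_digests: dict mapping URL -> digest stored by the last run
    current_pages:    dict mapping URL -> page text captured in this run
    Returns (changed_urls, new_digests). A URL seen for the first time
    is reported as changed.
    """
    new_digests = {url: content_digest(text) for url, text in current_pages.items()}
    changed = sorted(
        url for url, digest in new_digests.items()
        if previous_digests.get(url) != digest
    )
    return changed, new_digests


if __name__ == "__main__":
    first_run = {"https://example.org/a": "hello  world", "https://example.org/b": "same"}
    changed, stored = detect_changes({}, first_run)
    print("first run:", changed)

    second_run = {"https://example.org/a": "hello world", "https://example.org/b": "different"}
    changed, stored = detect_changes(stored, second_run)
    print("second run:", changed)
```

Storing only digests rather than full page copies keeps the monitoring data small, though it means the analyst can see that a page changed but not what changed.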

In closing, we must emphasize the best-in-class “Sec-Ops” nature of this Use Case. From an implementation point of view, these types of operations must be subject to strict confidentiality to avoid compromise by people with malicious intent.  Furthermore, confidentiality must be maintained to ensure LE officers are not targeted by threat actors implementing botnet and data harvesting activities.


CBC. (2011). Google’s foray into face recognition raises privacy concerns. CBC News: Science & Technology.

Herold, R. (2006). The definitive guide to: Security inside the perimeter. San Francisco, CA:
