A wide variety of approaches to, or ways of seeing, social media data have been undertaken. The question is whether they are technically adequate or simply depict what you get.
Calling APIs is becoming routine for researchers; it is not rocket science. In the past few years, for instance, the list of open source research tools (e.g. extraction and visualization software) has grown, and the tools themselves have become simpler. It has never been so easy to collect, visualise and think with data. Here are some lists of such tools: Digital Research Tools (for Data-Driven, Social Media, and Internet-related Research); social media data collection tools; data extraction for Facebook and Twitter; DMI Tools; Médialab Tools. Nevertheless, operational simplicity should not upstage technical appreciation. That is to say, both social media APIs and digital methods have particular functionalities, which may facilitate but also compromise research. To illustrate this, I am going to present key points that caught my attention in the process of collecting, visualizing and analysing data on Brazil’s protests in March 2017. Instagram was the chosen platform, with data extraction based on hashtags.
I am not interested in going through the basic questions one should ask before the data collection process. I will therefore focus on some technical concerns that emerged from closely observing the data collection processes, complemented with a preliminary visual analysis (co-tag network and most active users). Generally, the Instagram API provides researchers with access to datasets that go back years in time, and this is very important to keep in mind. Before we get onto that, just a quick note about protests in Brazil.
Brazilians’ dissatisfaction has been reflected in massive protests since the June Journeys in 2013. However, the scenario changes according to how these demonstrations came into existence, who was leading them, and whether they were spontaneous or fabricated movements. The March protests in particular made history: in 2015, the anti-Dilma Rousseff and anti-government demonstrations; in 2016, the rise of political polarization; and in 2017, protests against Michel Temer, the reform of the pension system, and corruption. All of them had named leaders and sponsors.
Data Collection Process: March 2017 in Brazil
March 2017 had two different moments: one on March 15, led by the Labour Party and left wing supporters, and the other on March 26, led by the same groups who were at the head of the 2015 protests: Movimento Brasil Livre (MBL or Free Brazil Movement) and Vem Pra Rua (Come to the street). In both events, different hashtags were adopted before, during and after the demonstrations. Below you can see the specific hashtags used to collect public content from the Instagram API, e.g. media items info, most active users and co-tag network.
Considering that the protests in Brazil took place at different times and locations, the strategy for March 15 was to call the Instagram API at four moments: in the late morning, afternoon, and evening of March 15, and on the morning of March 16. As you can see below, more media items were collected on the morning after the protests, with the exception of media items related to #foratemer. As a result, for the March 26 protests, data extraction was conducted on the following day.
At first sight, the number of media items for the most significant hashtags of March 26, namely #vemprarua (come to the streets) or #lulanacadeia (Lula in jail), exceeds that of the tags adopted for March 15, e.g. #foratemer (get out Temer), #diretasjá (direct elections right now) or #grevegeral (general strike). However, since the data samples provided by the Instagram API go back months or years in time, I checked the total number of media items posted exclusively on the day of the protests, and the relative amount from March 1st until the day after each protest (see below). The results are quite interesting: the representative hashtags of March 26 do not yield a significant list of media items on the day of the event, but rather across the month. Meanwhile, #foratemer and #grevegeral have a substantial number of media items on the day of the protest.
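The check described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the rows in `MEDIA_ITEMS` are made up, the (hashtag, timestamp) layout is a hypothetical stand-in for the media-items file, and the function names are mine, not part of any extraction tool.

```python
from datetime import date, datetime, timedelta

# Hypothetical rows standing in for the media-items file: (hashtag, ISO timestamp).
# The values are made up for illustration and are not taken from the dataset.
MEDIA_ITEMS = [
    ("foratemer", "2017-03-15T10:05:00"),
    ("foratemer", "2017-03-15T18:40:00"),
    ("foratemer", "2016-09-04T12:00:00"),
    ("vemprarua", "2017-03-26T15:30:00"),
    ("vemprarua", "2017-03-10T09:00:00"),
    ("vemprarua", "2015-03-15T16:20:00"),
]


def count_in_window(items, tag, start, end):
    """Count media items for `tag` posted between `start` and `end`, inclusive."""
    total = 0
    for item_tag, created in items:
        posted = datetime.fromisoformat(created).date()
        if item_tag == tag and start <= posted <= end:
            total += 1
    return total


def summarise(items, tag, protest_day):
    """Compare the whole sample with the month window and the protest day itself."""
    month_start = protest_day.replace(day=1)
    day_after = protest_day + timedelta(days=1)
    return {
        # everything the API returned for this tag, however old
        "total": count_in_window(items, tag, date.min, date.max),
        # from the 1st of the month until the day after the protest
        "month_to_day_after": count_in_window(items, tag, month_start, day_after),
        # items posted on the protest day only
        "protest_day": count_in_window(items, tag, protest_day, protest_day),
    }


if __name__ == "__main__":
    for tag, day in [("foratemer", date(2017, 3, 15)), ("vemprarua", date(2017, 3, 26))]:
        print(tag, summarise(MEDIA_ITEMS, tag, day))
```

A large gap between "total" and "protest_day" for a tag is the signal that the summary figure mostly reflects older material rather than the event itself.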
What I mean to say is that, in the process of data collection, one should not choose keywords solely according to the summary first given by the extraction software (see first tables); it is necessary to check the output files as well (see second table). A second point concerns the co-tag network (see first tables). For instance, the tags that enabled the collection of more than 18000 media items (in orange) do not have similar numbers of nodes and edges (see the tags: #foratemer, #vemprarua, #MBL, #bolsonaro2018, #lulanacadeia and #lavajato). The quantity of media items does not reflect the quantity of tags used in those same media items. A third point to consider is iterations; for instance, one iteration gets 20 items, so 10 iterations may get approximately 200 items. The extraction software adopted here permits a maximum of 1000 iterations. So, opting for fewer (or more) iterations, you get fewer (or more) media items. However, while the number of iterations may dictate the quantity of media items you can get, it does not change a key technical feature of the Instagram API: the final sample always goes back months or years in time.
In short, some variables to account for in the hashtag-based data collection process on Instagram: the choice of words matters (good hashtags! check output files!); timing (be aware of when to make API calls); iterations (no matter the chosen number of iterations, the final sample always goes back years in time); and opening up the files, not only to verify whether you got what you want (or need), but also to uncover new insights and analytical opportunities.
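The iteration arithmetic can be written down as a small sketch. The two constants come from the figures quoted above (20 items per iteration, a ceiling of 1000 iterations); the function names are hypothetical, and the result is an approximate upper bound rather than a guarantee of what the API returns.

```python
ITEMS_PER_ITERATION = 20   # figure quoted in the text: one iteration gets 20 items
MAX_ITERATIONS = 1000      # ceiling of the extraction software, per the text


def expected_items(iterations):
    """Approximate upper bound on media items for a given number of iterations."""
    if iterations < 1:
        raise ValueError("iterations must be at least 1")
    capped = min(iterations, MAX_ITERATIONS)  # requests above the ceiling are capped
    return capped * ITEMS_PER_ITERATION


def iterations_needed(target_items):
    """Smallest iteration count expected to reach `target_items`, or None if out of reach."""
    needed = -(-target_items // ITEMS_PER_ITERATION)  # ceiling division
    return needed if needed <= MAX_ITERATIONS else None


if __name__ == "__main__":
    for n in (10, 100, 1000):
        print(n, "iterations ->", expected_items(n), "items at most")
```

Note that this estimate only concerns volume. It says nothing about the time span covered, which is the point made above: whatever the number of iterations, the sample reaches back in time.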
Data Visualization and Technical Principles
What follows is a presentation of the Instagram Co-Tag Network: #foratemer #grevegeral #diretasjá. The main differences between the subsequent graphs are the period of data extraction (late in the MORNING, AFTERNOON and EVENING of March 15), the chosen attributes and metrics and, consequently, the results obtained with preliminary visual content analysis. The initial idea of collecting data at different moments was to visualize the evolution of co-mentioned tags over the day. By the end of this experiment, however, what I had come to understand were basic technical principles for co-tag network analysis.
The first principle concerns the choice of iterations in the data collection process (see graph 1, Iterations Matter, which is the result of 100 iterations). In this undirected network, node size means “count”, node colour ranks “count” (red shows the most mentioned tags), and the thickness of edges points to co-occurring (correlational) tags. At first sight, a great number of people who use #diretasjá (direct elections right now) also mention #foratemer (get out Temer). Then, at the top, we see a dense group of tags that represents the Labour Party, left wing militants or supporters, and hashtags adopted by social movements, for instance #vemprademocracia (come to democracy), #eusouaresistência (I am resistance), #frentebrasilpopular (Brazil Popular Front), and #respeitemeuvoto (respect my vote). The tags at the bottom comprise capital cities, Lula’s supporters, the anti-Temer group and a critique of mainstream media, e.g. #vejagolpista (Veja scammer) and #globogolpista (Globo scammer). The group of tags on the left (not very dense) mainly represents coup (#golpe), scammer (#golpistas) and get out Temer, Renan (Calheiros) and Aécio (Neves).
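To make the graph attributes concrete, here is a minimal sketch of how a co-tag network can be derived from media items. It is not the extraction software's implementation. It assumes each media item has been reduced to its list of hashtags, and the sample lists are made up.

```python
from collections import Counter
from itertools import combinations

# Hypothetical media items, each reduced to its list of hashtags (made-up values).
MEDIA_TAGS = [
    ["foratemer", "diretasjá", "grevegeral"],
    ["foratemer", "diretasjá"],
    ["foratemer", "golpe"],
    ["grevegeral"],
]


def build_cotag_network(media_tags):
    """Return (node_counts, edge_weights) for an undirected co-tag network."""
    node_counts = Counter()   # how many media items mention each tag ("count")
    edge_weights = Counter()  # how many media items mention both tags of a pair
    for tags in media_tags:
        unique = sorted(set(tags))  # ignore repeats; sorting gives each pair one canonical order
        node_counts.update(unique)
        # every unordered pair of tags in the same item is one co-occurrence
        edge_weights.update(combinations(unique, 2))
    return node_counts, edge_weights


if __name__ == "__main__":
    nodes, edges = build_cotag_network(MEDIA_TAGS)
    print(nodes.most_common())
    print(edges.most_common())
```

In this sketch, `node_counts` corresponds to node size and `edge_weights` to edge thickness. It also shows why the number of media items and the number of nodes or edges can diverge: an item with a single tag adds to the count but contributes no edge.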
For those who closely follow Brazil’s demonstrations, the correlations and connections in graph 1 do not bring new insights; it is more of the same. Thus, the data collection process was repeated in the late afternoon and again late in the evening, but this time querying 1000 iterations in both cases. Diagrams 2 and 3 (below) reflect technical principles related to the analytical process: the Instagram API provides chronological networks, and event-based hashtags call for ego-network analysis, e.g. #grevegeral (general strike) or #queromeaposentar (I want to retire).
As previously mentioned, the Instagram API returns chronological samples; media items are ordered according to when they were tagged. As a result, the retrieved GDF file gathers months or years of a particular tag’s correlational network. This limits the visual analysis of particular events, for example when scrutinising March 15 through #grevegeral (general strike), but at the same time broadens the network analysis through a historical perspective. To grasp a particular event, it is then advisable to opt for ego-network analysis (see graph 3 above) and, in doing so, filter recent (or new) correlational tags. Regarding the historical perspective (see graph 2 above), a rich analysis can be delivered, but it requires researchers to have followed the matter over time.
Let me briefly present the chronological network of #foratemer, #diretasjá, and #grevegeral. First, here is how far back in time the sample for each hashtag goes:
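The ego-network idea, combined with a recency filter, can be sketched as follows. The edge list and its dates are made up, and the sketch assumes that a "first seen together" date has been attached to each edge from the media-items file, which is an assumption of mine rather than something the GDF file is stated to provide.

```python
from datetime import date

# Hypothetical edges: (tag_a, tag_b, date the pair was first seen together).
# All values are made up for illustration.
EDGES = [
    ("grevegeral", "queromeaposentar", date(2017, 3, 15)),
    ("grevegeral", "foratemer", date(2016, 9, 7)),
    ("grevegeral", "greve", date(2012, 5, 1)),
    ("foratemer", "queromeaposentar", date(2017, 3, 15)),
    ("foratemer", "golpe", date(2016, 4, 17)),
]


def ego_network(edges, ego, since=None):
    """Edges among `ego` and its direct neighbours, optionally only recent ones."""
    if since is not None:
        # keep only pairs first seen on or after the cut-off date
        edges = [e for e in edges if e[2] >= since]
    neighbours = {ego}
    for a, b, _ in edges:
        if a == ego:
            neighbours.add(b)
        elif b == ego:
            neighbours.add(a)
    # keep edges whose two endpoints both belong to the ego's neighbourhood
    return [e for e in edges if e[0] in neighbours and e[1] in neighbours]


if __name__ == "__main__":
    print(len(ego_network(EDGES, "grevegeral")))                          # full history
    print(len(ego_network(EDGES, "grevegeral", since=date(2017, 3, 1))))  # recent pairs only
```

The first call keeps the historical neighbourhood of the tag; the second narrows it to pairs that appeared around the event, which is closer to what an event-focused reading needs.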
|#foratemer (get out Temer): from 10 October 2015 to 15 March 2017|
|#diretasjá (direct elections right now): from 10 August 2012 to 15 March 2017|
|#grevegeral (general strike): from 24 November 2011 to 15 March 2017|
Second, an overview of the hashtags that compose this chronological network. By October 2015, Brazilians had already seen three massive protests (March 15, April 12, August 16) organized by Movimento Brasil Livre (MBL) and Vem Pra Rua together with supporters of the impeachment (or the coup) of Dilma Rousseff. According to the dataset, #foratemer started being used in October 2015; at that time Michel Temer was vice-president of Brazil. #grevegeral can certainly be linked to different strikes that have occurred in Brazil since 2011, including the stoppage of March 15, 2017. #diretasjá is generally related to presidential elections or critical moments such as the alarming political crisis and corruption scandals in Brazil.
Finally, a brief interpretation of the Chronological Network of #foratemer, #grevegeral and #diretasjá (see graph above). What first catches my attention is how the right wing and the left wing both engage with #foratemer (see the two circles with dotted lines separated by #foratemer). At the top there are three sub-clusters. The yellow one displays opposition to Dilma Rousseff’s administration and the Labour Party (such complaints might have taken place between 2013 and 2015). The blue cluster brings claims of corruption, e.g. #corrupcaocbf (corruption in CBF, the Brazilian Football Confederation), and tags about government measures concerning CPMF (financial transaction tax) and FGTS (the Length-of-Service Guarantee Fund), e.g. #naoroubemeufgts (do not steal my FGTS). #panelaço (a form of protest orchestrated through social media in which people bang their pots at the same time, making a great deal of noise for a certain period) and #pixuleco also indicate that the blue cluster emerged in 2015 and remains active. The gray cluster depicts strong opposition to the former president Lula and reveals the supporters of Jair Bolsonaro, a Brazilian congressman (and ex-army captain) who intends to run in the 2018 presidential elections. He is known for advocating far-right political views and has also been compared with Donald Trump.
At the bottom, the big circle with dotted lines shows the left wing, especially the Labour Party. In the orange cluster we see strong opposition to ‘Temer scammer’ (#temergolpista), Jair Bolsonaro, Sérgio Moro, who is a Federal Judge, and TV Globo, Brazil’s largest broadcast network. In gray are the feminist movements and tags related to demonstrations against Temer during Carnival in Belo Horizonte. On the left are the capitals and cities in which protests took place. Now, if you look at the orange component as a whole, which has #diretasjá as its principal node, you can see that the Labour Party raises the same issues throughout, being more consistent than the divided right wing.
It seems like “everyone hates Temer”: from the leftist to the rightist parties, from the feminist movement to carnival players, not to mention the main capitals and cities in Brazil. At this first stage of analysis, it is possible to anticipate the shape of the race to be the next president of Brazil: on one side, a cohesive left wing strongly supportive of the former president Lula for the 2018 presidential elections; on the other side, a non-cohesive right wing with the emergence of alt-right politics under the name of Jair Bolsonaro.
Further work is required here. In-depth analysis and qualitative interpretation of chronological networks must be aligned with the media items dataset, because it indicates the period of time in which hashtags were mentioned. In so doing, one can better read the historical context and interpret the chronological network.
Instagram’s most active users by hashtag mention
Moving forward, let me now draw your attention to key actors, that is, to detecting Instagram’s most active users by hashtag mention. The next visualisation allows researchers to evaluate changes in ranked or valued lists over time (see the tool: RankFlow). In the present case, you can visualize the users who engage most with particular hashtags on Instagram from a chronological perspective (see chart below), according to representative tags adopted throughout the protests on March 26, 2017.
Who were the most active actors? Right wing or alt-right agents or supporters, partisan organizations, and movements disguised under non-official, comical or fictional character profiles. These profiles often embrace memes as a form of political expression or opinion, along with the alt-right’s visual language. The highlight here is the significant number of #lulanacadeia mentions, particularly by direitadaopressão (right of oppression) and chegadecorruptos (no more corrupt people). According to the database, the tag #lulanacadeia only becomes active in February 2016 (quite recent activity), whereas #vemprarua (come to the street) begins in December 2012; the latter has been an emblematic hashtag for Brazilian demonstrations since the June Journeys in 2013.
Another aspect to consider: a strong connection between #lulanacadeia and #vemprarua, as well as between #lulanacadeia and #lavajato (Operation Car Wash). These links simply confirm the political profile and polarization of protests in Brazil; it seems that what counts is to strip the Labour Party of power or to remove its potential candidates from the presidential race (in this case, Lula). Also evident, on the other hand, is the appearance of devotees of Jair Bolsonaro and fans of Sergio Moro (not only in the names of the Instagram accounts but also in the content displayed in the most active users’ timelines).
No doubt this visualization calls for in-depth content and visual analyses (further work to be done!). For now, I leave you with some questions to consider: for key actor analysis, would it make any difference to restrict the data to the time period of the protests? Would the most active actors hold the most engaged posts on Instagram? What could be detected by crossing information between the tags mentioned by specific actors and co-tag network analysis? Would key actors’ activity have an impact on co-tag network analysis?
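A ranked list per period is the kind of input such a chart needs. The sketch below shows one hedged way to produce it in Python. The usernames and records are invented, the (username, hashtag, period) layout is an assumption about how the users file could be reshaped, and nothing here reproduces RankFlow's own input format.

```python
from collections import Counter, defaultdict

# Hypothetical (username, hashtag, "YYYY-MM") records; all values are made up.
MENTIONS = [
    ("user_a", "lulanacadeia", "2017-02"),
    ("user_a", "lulanacadeia", "2017-02"),
    ("user_b", "lulanacadeia", "2017-02"),
    ("user_b", "lulanacadeia", "2017-03"),
    ("user_b", "lulanacadeia", "2017-03"),
    ("user_c", "lulanacadeia", "2017-03"),
    ("user_a", "vemprarua", "2017-03"),
]


def rank_users_by_period(mentions, tag, top_n=10):
    """Map each period to its top users for `tag`, as (username, mention count) pairs."""
    per_period = defaultdict(Counter)
    for user, item_tag, period in mentions:
        if item_tag == tag:
            per_period[period][user] += 1
    # periods in chronological order, users ranked by number of mentions
    return {
        period: counts.most_common(top_n)
        for period, counts in sorted(per_period.items())
    }


if __name__ == "__main__":
    for period, ranking in rank_users_by_period(MENTIONS, "lulanacadeia").items():
        print(period, ranking)
```

Reading the rankings side by side shows which accounts rise or fall from one period to the next, which is the change over time that the chart is meant to expose.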
The mosaic below brings together some of the most liked Instagram posts throughout March 26 (have fun!).
 Those would be: 1. What digital objects are available? 2. What media content can be part of my analysis? 3. How far back in time can data be retrieved? 4. What are the standard output files? 5. What are the possible approaches for visualising and analysing data?
 In 2015, four big protests took place in Brazil: March 15, April 12, August 16, and December 13.
 The extraction software that I used (and will not name here because it is still in sandbox mode) creates co-tag networks around keywords or places. When calling the Instagram API with tags as endpoints, the tool generates two tabular files (one containing a list of media with meta-information, and the other with information on the users related to those media) and a GDF file to analyse in Gephi (see more about tags as endpoints on Instagram: https://www.instagram.com/developer/endpoints/tags/). Apps in Sandbox Mode have lower limits than Live apps (see Instagram API Rate Limits). Sandbox apps’ rate limit: 500/hour.
 Undirected Graph. Visualization Algorithm: Force Atlas 2. Filter: count 60 – 1891. Nodes: 66 (1.14% visible), total: 5796. Edges: 1142 (1.47% visible), total: 77570. Node size: count. Colours: ranking “count” (blue » red); red shows most mentioned tags.
 Undirected Graph. Visualization Algorithm: Force Atlas 2. Ego network: #grevegeral. Filter: degree range 500 – 24355. Nodes: 135 (0.44% visible), total: 30580. Edges: 6201 (1.4% visible), total: 443712. Node size: degree. Colours: ranking.
 In 2017, the street carnival of Brazil was marked by carnival players and artists all over the country protesting against the president.
 Here is how far back in time the sample for each hashtag goes:
|#VemPraRua (come to the street): from December 2012 to March 2017|
|#MBL (Free Brazil Movement): from August 2012 to March 2017|
|#LavaJato (Operation Car Wash): from October 2014 to March 2017|
|#FimdoForoPrivilegiado (no more privileged forum): from June 2016 to March 2017|
|#LulanaCadeia (Lula in jail): from February 2016 to March 2017|