VCX

The DeepTech 50 Twitter Network

by Han de Jong

Introduction

The Deeptech 50 is a Crunchbase overview of 50 venture capital (VC) firms operating in the 'deep tech' space. It is essential for these firms to be visible to both potential investors and promising startups.

Goal

To identify and study the Twitter sub-network in which the VC firms from the Deeptech 50 list are embedded.

Source Material

The Deeptech 50 overview was pulled from Crunchbase. Of the 50 VC firms, 45 list a Twitter handle on their Crunchbase profile. These firms, referred to as the Deeptech 45, were included in the analysis. Twitter data was collected through the Twitter API, using the Tweepy library, from December 20 to 28, 2020.

Methods

To identify the Twitter network in which the Deeptech 45 are embedded, we will first 'branch in' by identifying shared connections among the Deeptech 45 and then 'branch out' by mapping the network in which those shared connections are embedded. This method should delineate any Twitter community if the original seed population represents a distinct network.

The rationale is as follows: We are interested in a population 'A' (e.g., people interested in 'cats'), of which A1, A2, and A3 are members (e.g., three users with a cat in their profile picture). Note that in terms of connectivity, A1, A2, and A3 are not necessarily at the center of A. However, by mapping all Twitter accounts connected to A1, A2, and A3, we should be able to delineate the core of A (e.g., user profiles associated with famous cats or cat owners). After identifying the core of A, we then 'branch out' and map users connected to this core. This should provide a representative sample of the center of the network in which A1, A2, and A3 are embedded.
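The two steps can be sketched in a few lines of Python. This is a minimal illustration only, not the code used for this study: the follow graph is a hypothetical dictionary mapping each account to the set of accounts it follows, the account names are invented, and the `min_shared` threshold (how many seed accounts a candidate must be connected to) is an assumption made for the example.

```python
from collections import Counter


def branch_in(follows, seeds, min_shared=2):
    """Return accounts connected to at least `min_shared` seed accounts."""
    counts = Counter()
    for seed in seeds:
        # Neighbours of a seed: accounts it follows plus accounts following it.
        neighbours = set(follows.get(seed, set()))
        neighbours |= {a for a, targets in follows.items() if seed in targets}
        neighbours.discard(seed)
        counts.update(neighbours)
    return {account for account, n in counts.items() if n >= min_shared}


def branch_out(follows, core):
    """Return the core plus every account directly connected to it."""
    network = set(core)
    for account, targets in follows.items():
        if account in core:
            network |= targets      # accounts followed by a core member
        elif targets & core:
            network.add(account)    # accounts following a core member
    return network


# Hypothetical toy data: account -> set of accounts it follows.
follows = {
    "A1": {"cat_star"},
    "A2": {"cat_star", "cat_owner"},
    "A3": {"cat_owner", "dog_fan"},
    "cat_star": {"cat_owner", "vet"},
    "cat_owner": {"cat_star"},
    "fan": {"cat_star"},
}
seeds = {"A1", "A2", "A3"}

core = branch_in(follows, seeds)     # shared connections of the seeds
network = branch_out(follows, core)  # everything connected to that core
```

In this toy example the core consists of the two accounts that more than one seed is connected to, and the account followed by only a single seed is left out of the final network.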

Results

To define the network in which the Deeptech 45 are embedded, we first mapped the connections between the Twitter accounts of the Deeptech 45. We observed that almost no bidirectional connections exist among the firms; most connections are one-directional, with slightly less than half of the VC firms following First Round and/or Kleiner Perkins (Figures 1 and 2).

Figure 1: Follows (source of the arrow is the follower) within the Deeptech 45.

Figure 2: The degree of each node in the graph in Figure 1, where the degree is the number of Twitter connections (follows) of that node.

A similar pattern emerges when examining Twitter interactions. Figure 3 shows information flow within the Deeptech 45, analyzing retweets, quote tweets, responses, and '@' mentions from the last 1,000 tweets on each user's profile. A16z and First Round are major sources of information, but Kleiner Perkins, despite its many followers, is not a major source of interactions.

Figure 3: Twitter interactions within Deeptech 45. The target of the arrow is the user being mentioned, and the arrow shade refers to the number of interactions. The size of the node refers to the number of times this user was mentioned.
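The interaction counts behind a figure of this kind can be sketched as follows. This is an illustration under stated assumptions, not the code used for this study: each tweet is represented by a hypothetical dictionary with an `author` field and a `mentions` field, where `mentions` is assumed to already combine retweets, quote tweets, responses, and '@' mentions. The field names and account names are invented and do not reflect the Twitter API's own data model.

```python
from collections import Counter


def interaction_edges(tweets, members):
    """Count interactions (author -> mentioned account) between members."""
    edges = Counter()
    for tweet in tweets:
        author = tweet["author"]
        if author not in members:
            continue
        # Count each mentioned account once per tweet; ignore self-mentions.
        for target in set(tweet["mentions"]):
            if target in members and target != author:
                edges[(author, target)] += 1
    return edges


def times_mentioned(edges):
    """Total incoming interactions per account (node size in a plot)."""
    totals = Counter()
    for (_, target), weight in edges.items():
        totals[target] += weight
    return totals


# Hypothetical toy data.
tweets = [
    {"author": "firm_a", "mentions": ["firm_b"]},
    {"author": "firm_a", "mentions": ["firm_b", "outsider"]},
    {"author": "firm_c", "mentions": ["firm_b", "firm_a"]},
    {"author": "firm_b", "mentions": []},
]
members = {"firm_a", "firm_b", "firm_c"}

edges = interaction_edges(tweets, members)  # arrow weights
mentioned = times_mentioned(edges)          # node sizes
```

The edge weights correspond to the arrow shade and the per-account totals to the node size described in the caption above.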

Primary Network

Next, we looked at the network in which the Deeptech 45 are embedded. To do this, we assumed that the Deeptech 45 represent a distinct subnetwork on Twitter and that this network can be delineated by tracking Twitter follows. Since the total number of follows and followers of the Deeptech 45 is approximately 675,000, we restricted the analysis to bi-directional follows. Figure 4 shows connections among the 100 most central Twitter accounts and the Deeptech 45. Most Deeptech 45 accounts are not central in this network. The five most central accounts are highlighted in Figures 5-9, and they all appear to operate in the tech VC sector.

Figure 4: Bi-directional connections of the 145 most central Twitter users. Node size reflects the degree of the node. The original Deeptech 45 are in red.
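Restricting a follow graph to bi-directional follows and ranking accounts by centrality can be sketched as below. The sketch assumes that 'most central' means the highest degree in the mutual-follow graph, as the caption of Figure 11 suggests. The follow dictionary and account names are hypothetical, and this is not the code used for this study.

```python
def mutual_edges(follows):
    """Return undirected edges where both accounts follow each other."""
    edges = set()
    for account, targets in follows.items():
        for target in targets:
            if target != account and account in follows.get(target, set()):
                # frozenset makes (a, b) and (b, a) the same edge.
                edges.add(frozenset((account, target)))
    return edges


def top_by_degree(edges, n):
    """Rank accounts by number of mutual follows and return the top n."""
    degree = {}
    for edge in edges:
        for account in edge:
            degree[account] = degree.get(account, 0) + 1
    # Sort by degree (descending), then by name for a stable order.
    return sorted(degree.items(), key=lambda item: (-item[1], item[0]))[:n]


# Hypothetical toy data: account -> set of accounts it follows.
follows = {
    "hub": {"x", "y", "z"},
    "x": {"hub", "y"},
    "y": {"hub", "x"},
    "z": {"hub"},
    "lurker": {"hub", "x"},  # follows others but is not followed back
}

edges = mutual_edges(follows)
ranking = top_by_degree(edges, 2)
```

Accounts that only follow others without being followed back, such as `lurker` here, drop out of the mutual-follow graph entirely, which is what keeps the network small enough to analyze.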

Figure 5: Aileen Lee

Figure 6: Danielle Morrill

Figure 7: Katie Roof

Figure 8: Dave McClure

Figure 9: Charles Hudson

Based on profile descriptions and the high number of bidirectional follows, it seems likely that the mapped network represents a distinct subgroup on Twitter. However, since the Deeptech 45 are somewhat peripheral, the network may not be exclusively defined by deep technology and venture capital. Random sampling showed many accounts belong to female tech influencers and leaders, suggesting other defining characteristics.

Secondary Network

Having identified the core of the Twitter network in which the Deeptech 45 are embedded, we next studied a more complete version of that network. To do so, we applied the same method as before to the members of the primary network (i.e., mapping bilateral connections) and collected the 500 most central accounts. The final network shows a dense cluster with the Deeptech 45 still on the periphery (Figures 10 and 11).

Figure 10: The secondary network, plotted as in Figure 4.

Figure 11: The 25 most central accounts (based on bi-directional follows) of the Twitter sub-network in which the Deeptech 45 are embedded on the periphery. Degrees on the y-axis represent bi-directional follows.

Mapping information flow in the secondary network

We mapped information flow using the last 1,000 tweets on profiles within the secondary network (Figure 12). Despite their peripheral position, some Deeptech 45 members are central in terms of interactions (Figures 13-16).

Figure 12: The flow of information within the secondary network. The target of each arrow is the source of the information, and the arrow shade reflects the number of interactions. The node size refers to the number of times this user was mentioned. The Deeptech 45 are represented by red nodes, while the rest of the network is depicted by yellow nodes. Several main influences from outside the network are highlighted in green.

Figure 13: The Deeptech 45 sorted by information centrality, a measure of how well their tweets propagated throughout the secondary network.

Other heavily interacted accounts are shown in Figure 14, using the same measure (information centrality).

Figure 14: The 50 (top 10%) most prominent sources of information in the secondary network. Notably, from the Deeptech 45, Lightspeed, NFX, and A16Z are among the top 10% most mentioned accounts.

The account with the highest information centrality is @hunterwalk (Figure 15). This likely reflects both the quality and the frequency of their tweets.

Figure 15: Hunter Walk, whose tweets are most prominently represented throughout the secondary network.

Furthermore, we identified 75 accounts that were heavily interacted with from within this network while not being part of the network based on connectivity. These 'outside influences' are depicted in Figure 16. They include major news sources (TechCrunch, NYT, Forbes and the Wall Street Journal) but also the account of a London-based individual named 'Harry Stebbings'.

Figure 16: The 50 most influential accounts from outside the secondary network.

Figure 17: Harry Stebbings, who is one of the most influential non-newspaper-associated accounts from outside the secondary network.
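Identifying such outside influences amounts to counting mentions that originate inside the network but point to accounts outside it. A minimal sketch, using a hypothetical tweet structure (an `author` field and a `mentions` list) and invented account names rather than the data or code of this study:

```python
from collections import Counter


def outside_influences(tweets, members, top_n):
    """Most-mentioned non-member accounts in tweets written by members."""
    counts = Counter()
    for tweet in tweets:
        if tweet["author"] in members:
            # Count each outside account at most once per tweet.
            counts.update(m for m in set(tweet["mentions"]) if m not in members)
    return counts.most_common(top_n)


# Hypothetical toy data.
tweets = [
    {"author": "a", "mentions": ["news_site", "b"]},
    {"author": "b", "mentions": ["news_site", "podcaster"]},
    {"author": "c", "mentions": ["news_site", "podcaster"]},
    {"author": "stranger", "mentions": ["news_site"]},  # author not a member
]
members = {"a", "b", "c"}

influences = outside_influences(tweets, members, top_n=2)
```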

The secondary network is mostly based in the San Francisco Bay Area, with some East Coast presence, suggesting it may be best defined as the "Silicon Valley VC Network" (Figure 18).

Figure 18: Same as Figure 12, but plotted on top of a world map.

As a final analysis, we compared the number of in-group followers of each Deeptech 45 firm with its information centrality within the secondary network. We assumed a linear relationship between these two measures, since VC firms with many followers are more likely to have their tweets seen, retweeted and mentioned. Firms that outperform this relationship (i.e., have a high information centrality despite a low number of in-group followers) could be said to reach a higher 'rate of return' on their Twitter activity (e.g., NFX, Lightspeed), presumably because their tweets are of higher quality. Note, however, that this analysis is limited to the network defined above; it is possible that firms that 'under-perform' are in fact targeting a different Twitter community. As an example, Y Combinator appears to under-perform in this network, but it is a major presence in the VC world, and the network we mapped here might not be its specific target audience.

Figure 19: Information centrality vs in-group connections (degrees). The black line represents the linear best fit of the data and shows to what extent information centrality (mentions) can statistically be explained by in-group followers within the secondary network. Node size reflects the total (i.e., in-group and out-group) number of followers.

Figure 20: Same as in Figure 19, but for the entire secondary network. Note how several accounts outperform in terms of Twitter interactions relative to the number of within-network followers that they have. Examples include @hunterwalk, @semil and @samirkaji.
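The comparison with a linear best fit can be sketched as an ordinary least-squares regression, where accounts with a positive residual 'outperform' the fit. The numbers and account names below are invented for illustration; they are not results from this study, and the fitting procedure actually used for Figures 19 and 20 may differ.

```python
def linear_fit(xs, ys):
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def residuals(accounts, slope, intercept):
    """Observed centrality minus the centrality predicted from followers."""
    return {
        name: y - (slope * x + intercept)
        for name, (x, y) in accounts.items()
    }


# Hypothetical data: account -> (in-group followers, information centrality).
accounts = {
    "firm_a": (10, 1.0),
    "firm_b": (20, 2.0),
    "firm_c": (30, 5.0),
    "firm_d": (40, 4.0),
}

xs = [x for x, _ in accounts.values()]
ys = [y for _, y in accounts.values()]
slope, intercept = linear_fit(xs, ys)
res = residuals(accounts, slope, intercept)

# Accounts above the fitted line outperform relative to their followers.
outperformers = sorted(name for name, r in res.items() if r > 0)
```

A positive residual only indicates outperformance within the mapped network; as noted above, an account with a negative residual may simply be addressing a different audience.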

Conclusion and Discussion

This study shows that it is possible to identify and delineate a distinct Twitter subgroup and identify major sources of content within that network. Specifically, we mapped the Twitter network in which the Deeptech 45 is embedded. Future research should investigate why certain accounts achieve higher interaction rates per follower. Additionally, we aim to use topic analysis to map the flow of information through the secondary network in more detail.