
Analytics Tools Landscape Summary

This document briefly describes the most common forms of analytics tools in use in both traditional digital publishing and academic publishing, with an emphasis on what they are designed to measure and what goals they are designed to serve.

Published on Jun 30, 2020

This document also includes a feature comparison of three analytics products that the PubPub team evaluated in early 2020, ahead of a transition from Keen to Heap for both product and web analytics.

Digital Publishing Analytics Landscape Overview

Web Analytics

Most standard web analytics tools provide a relatively consistent set of metrics that track user behavior on a website. These tools tend to be geared toward media and e-commerce sites and are used mainly by marketers, so they emphasize how users arrive at a site and what paths they take on the way to converting on a defined “goal,” often purchasing a product. They also tend to integrate with ad-buying platforms to link ad campaigns to on-site goal conversions.

Examples: Google Analytics, Adobe Analytics (formerly Omniture), Chartbeat

Product Analytics

Product analytics tools collect data similar to that of web analytics tools, but are geared toward people building software that is itself a product (e.g., PubPub). They focus on letting product managers probe how users engage with specific features of a web product, to determine which features are most valuable to users. These tools tend to measure more fine-grained (but more privacy-invasive) interactions than web analytics tools do, such as complex interactions with specific site elements and user behavior patterns over time.

Examples: Heap, Mixpanel, Keen

Social Analytics

Web analytics are collected only from on-site activity, but much traffic today originates in conversations on social platforms, so a number of companies offer platforms that track activity on social media. Social media companies make money by selling advertisements against the data they collect, so they tend to limit access to data about content posted on their platforms. Social analytics tools fill in those gaps.

These fall into two main categories.

Campaign Analytics

Most social media sites provide fairly extensive data on posts you make from your own accounts. Campaign analytics tools often include schedulers and automatically collect and organize metrics across your accounts. Depending on the network and the analytics it provides, these metrics typically include:

  • Post reach (paid and unpaid)

  • Post views

  • Locations where the post was seen (newsfeed vs. sidebar vs. recommendation)

  • Engagements with the post (likes, comments, shares, favorites, etc.)

  • Clicks on the post, if it’s a link

  • Video views, if it’s a video

  • Video view duration, if it’s a video

  • Frequency

Examples: Hootsuite, Sprout Social

Social Monitoring

These tools specialize in monitoring social media posts and giving clients the ability to “listen” for posts that contain URLs or phrases related to the client’s content. This is the only way to know, for example, whether someone posted about your article on Twitter without tagging you. The metrics these tools can provide are severely limited by most social media networks, but they can typically tell you:

  • Number of times a link was shared

  • Number of engagements (likes, shares, comments, etc.)

  • Sentiment of posts, decided by an NLP algorithm

These tools often provide news monitoring as well by tracking Google News, LexisNexis, etc.

Examples: Brandwatch, CrowdTangle, Awario
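As a rough illustration of the “listening” idea described in this section, the following sketch filters a set of posts for tracked URLs or phrases and tallies mentions and engagements. The `Post` structure, field names, and sample data are all hypothetical; real monitoring tools work from whatever data each social network exposes, which is not shown here.

```python
from dataclasses import dataclass


@dataclass
class Post:
    # Hypothetical post record; real platforms expose different fields.
    text: str
    likes: int = 0
    shares: int = 0
    comments: int = 0


def matches(post, urls, phrases):
    """Return True if the post mentions any tracked URL or phrase (case-insensitive)."""
    text = post.text.lower()
    return any(u.lower() in text for u in urls) or any(p.lower() in text for p in phrases)


def summarize_mentions(posts, urls, phrases):
    """Count matching posts and total their engagements."""
    hits = [p for p in posts if matches(p, urls, phrases)]
    return {
        "mentions": len(hits),
        "engagements": sum(p.likes + p.shares + p.comments for p in hits),
    }


# Illustrative sample data only.
posts = [
    Post("Great read: https://example.org/pub/analytics", likes=3, shares=1),
    Post("Anyone seen the Analytics Tools Landscape summary?", comments=2),
    Post("Unrelated post about lunch", likes=10),
]
summary = summarize_mentions(
    posts,
    urls=["example.org/pub/analytics"],
    phrases=["analytics tools landscape"],
)
print(summary)
```

Sentiment scoring is left out of this sketch, since it depends on the NLP model a given tool uses.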

Alternative Scholarly Metrics Landscape Overview

Altmetrics platforms attempt to devise normalized metrics for scholarly work based on a combination of social, news, and citation monitoring.

Dimensions

Dimensions queries its database for a given DOI and provides the following metrics:

  • Total citations

  • Recent citations (citations in the last two years)

  • Field Citation Ratio (for articles over two years old, the relative citation performance of an article, when compared to similarly-aged articles in its subject area, where 1.0 is average)

  • Relative Citation Ratio (for articles over two years old, the relative citation performance of an article, when compared to other articles in its area of research, normalized to 1.0 against all NIH-funded articles in Dimensions)

  • Citing research categories (which categories most frequently cite the article)
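To make the idea of a field-normalized ratio concrete, here is a minimal sketch that divides an article's citation count by the average citation count of a peer group, so that 1.0 means “average for the group.” It assumes the peer group (similarly aged articles in the same subject area) has already been selected; the function name and sample numbers are illustrative, and Dimensions' actual calculation may differ in its details.

```python
from statistics import mean


def field_citation_ratio(article_citations, peer_citations):
    """Citations of one article divided by the mean citations of its peer group.

    `peer_citations` is assumed to hold citation counts for similarly aged
    articles in the same subject area (selection of peers is not shown).
    """
    baseline = mean(peer_citations)
    if baseline == 0:
        raise ValueError("peer group has no citations; ratio is undefined")
    return article_citations / baseline


# Hypothetical peer group with an average of 5 citations.
peers = [2, 4, 6, 8]
ratio = field_citation_ratio(10, peers)
print(ratio)  # 2.0, i.e. twice the peer-group average
```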

Altmetric

Altmetric monitors a number of sources for mentions of a given article (by DOI, URL, and title) and attempts to calculate a weighted “attention” score showing how much interest the article has received. The score takes into account the volume of mentions, the types of sources, and the quality of the mentions’ authors.[1] Altmetric uses the following sources:[2]

  • Public policy documents

  • Mainstream media via a manually curated list of RSS feeds[3]

  • Blogs via a manually curated list of RSS feeds

  • Citations, via Dimensions

  • Online reference managers, via Mendeley, including demographic data on people who have cited your work from Mendeley

  • Post-publication peer review from PubPeer and Publons

  • English-language Wikipedia citations

  • Open Syllabus Project data

  • Patents, via IFI CLAIMS

  • Research Highlights via F1000Prime

  • Social media

    • Facebook (mentions on manually curated list of public pages)

    • Twitter

    • LinkedIn, Google+, Sina Weibo, Pinterest History

  • Other platform monitoring

    • YouTube

    • Reddit

    • Stack Overflow
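The general shape of a weighted attention score can be sketched as a sum of mention counts multiplied by per-source weights. The weights below are invented purely for illustration and are not Altmetric's actual values; the sketch also ignores the author-quality adjustment mentioned above.

```python
# Hypothetical weights per source type; NOT Altmetric's actual values.
SOURCE_WEIGHTS = {
    "news": 5.0,
    "blog": 3.0,
    "policy": 3.0,
    "wikipedia": 2.0,
    "twitter": 1.0,
}


def attention_score(mention_counts, weights=SOURCE_WEIGHTS, default_weight=0.5):
    """Sum mentions per source, scaled by that source's weight.

    Sources missing from `weights` fall back to `default_weight`.
    """
    return sum(
        weights.get(source, default_weight) * count
        for source, count in mention_counts.items()
    )


# Illustrative mention counts for one article.
mentions = {"news": 2, "blog": 1, "twitter": 10, "reddit": 4}
score = attention_score(mentions)
print(score)  # 2*5.0 + 1*3.0 + 10*1.0 + 4*0.5 = 25.0
```

The point of weighting is that ten tweets and two news stories can contribute equally to the score even though their raw volumes differ.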

Library Analytics

There is a final class of analytics that libraries use to understand the usage of digital collections and to make purchasing decisions based on that usage. This mostly falls outside our remit, as it involves reporting requests for content organized by the content’s access policy, but we may want to implement COUNTER feeds if we build deeper integrations into library systems.

The standard for this type of data is the COUNTER system, defined here.

KFG’s Initial Rough Ideas

  • Research to find out what audience metrics actually predict certain forms of impact for the scholarly community, and which ones we can safely not collect.

  • Combine ethically collected on-site data about user behavior (time on page, scroll depth, etc.) with monitoring from Crossref, social media, news sites, etc.

  • Survey a sample of users using interactive widgets and compare with behavioral metrics to see if certain behavioral metrics predict key impacts like understanding, mind change, etc.

  • Allow authors/admins to manually add reports of qualitative impact (e.g., a classroom invited me to speak) to their articles’ impact sections.

  • Use PubPub’s article history system to more quantitatively track comments and reviews, and display metrics like “review coverage” to readers, or benchmark article impacts/retractions/corrections against the type and quality of reviews they received.

  • Open-source (O/S) analytics tools

PubPub Vendor Feature Comparison

The following is a feature comparison of Google Analytics (a commonly requested vendor), Keen (what PubPub currently uses), and Heap (the vendor PubPub just switched to). Because we have access to the underlying data for Keen and Heap, a number of metrics are labeled “to define”: we have the data needed to display them but have not yet defined them.

| Feature | Google Analytics | Keen | Heap |
| --- | --- | --- | --- |
| Type | Audience/Ecommerce | Audience/Product | Product |
| Customizable Dashboards | Yes | Yes, with PubPub eng | Yes, with PubPub eng |
| Raw data access | No | Yes | Yes |
| User-definable metrics | No | Yes | Yes |
| Users | Unique users who have initiated a session within selected time; identifier can be set by Google or by admin[4] | Unique users who have initiated a session within selected time; identifier set by Keen and PubPub user ID when logged in | Unique users who have initiated a session within selected time; identifier set by Heap and PubPub user ID when logged in[5] |
| Sessions | Period of engagement by user until 30 minutes of inactivity | To define | A session is a period of activity from a single user in your app or website. It can include many pageviews or events. On web, a session ends after 30 minutes of pageview inactivity from the user. On mobile, a session ends after 5 minutes of inactivity, regardless of the app's background or foreground state. |
| Bounce Rate | % of single-page sessions where no page interaction occurred | To define | To define |
| Session Duration | Period of time from first event recorded to last event recorded within a session | To define | No |
| Time on Page | No | Average length of time spent on pages during the selected time | No |
| Pageviews | Total number of views of pages during period, including repeated views[6] | Total number of views of pages during period, including repeated views | Total number of views of pages during period, including repeated views |
| New Users | Users visiting in this time period who have not been seen before | To define | To define |
| Language | Set from browser setting | Set from browser setting | Set from browser setting |
| Location | Set from IP address | Set from IP address | Set from IP address |
| System | Set from browser agent string | Set from browser agent string | Set from browser agent string |
| Device | Set from browser agent string | Set from browser agent string | Set from browser agent string |
| Interests | Google Ads network | No | No |
| Frequency | # of sessions per user after first session | To define | To define |
| Recency | # of days since last session during the time frame | To define | To define |
| Page Depth | Number of sessions for which a user visited at least X pages | To define | To define |
| Referrer | Set from HTTP request header | Set from HTTP request header | Set from HTTP request header |
| Search Query | Set from HTTP request header | Set from HTTP request header | Set from HTTP request header |
| Campaign Tags | Set from URL segments | Set from URL segments | Set from URL segments |
| Custom Events | Defined by admin; ex ante | Defined by admin; ex ante | Defined by admin; post hoc |
| Custom Variables | Yes | Not really | Yes |
| Conversion Pathing | Defined by admin; ex ante | Defined by admin; ex ante | Defined by admin; post hoc |
| Site Search | No; Yes (only two values are given for this row, and which vendor has no entry is unclear) | | |
| Realtime | Yes | No | Not really |
| Demographics | From Google Ad tracking | No | No |
| Interests | From Google Ad tracking | No | No |
| Benchmarks | From Google Ad tracking | To define | To define |
| Google Search Console | Links Google Search Console to analytics. Must opt in to ad-driven features. | No | No |
| E-Commerce Funnels | Defined by admin; ex ante | Defined by admin; ex ante | Defined by admin; post hoc + integrations |
| Scroll Height | | Collecting; to define | Can be collected |
| Includes PubPub Data | No | Yes | Yes |
| Audience Segmentation | Defined by admin; post hoc | Not really | Defined by admin; post hoc |
| Period over Period | Yes | Not really | Yes |
| Email Reports | Must login | No | Yes |
| Report sharing | Limited | No | Yes |
| 3rd-Party Integrations | No | No | Yes |
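Several of the “to define” metrics above could in principle be derived from raw event data. The following is a minimal sketch, assuming events are available as `(user_id, timestamp, page)` tuples (a hypothetical shape, not the actual Keen or Heap export format). It groups events into sessions using the 30-minute inactivity rule from the table and, because only pageview events are modeled, approximates a bounce as a session with a single pageview.

```python
from datetime import datetime, timedelta

# Inactivity window taken from the session definitions in the table above.
SESSION_TIMEOUT = timedelta(minutes=30)


def sessionize(events, timeout=SESSION_TIMEOUT):
    """Group (user_id, timestamp, page) events into per-user sessions.

    A new session starts when the user changes or when the gap since the
    user's previous event exceeds `timeout`.
    """
    sessions = []
    current = None
    for user, ts, page in sorted(events, key=lambda e: (e[0], e[1])):
        if current is None or current["user"] != user or ts - current["end"] > timeout:
            current = {"user": user, "start": ts, "end": ts, "pages": []}
            sessions.append(current)
        current["end"] = ts
        current["pages"].append(page)
    return sessions


def summarize_sessions(sessions):
    """Compute session count, pageviews, bounce rate, and mean session duration."""
    total = len(sessions)
    bounces = sum(1 for s in sessions if len(s["pages"]) == 1)
    durations = [(s["end"] - s["start"]).total_seconds() for s in sessions]
    return {
        "sessions": total,
        "pageviews": sum(len(s["pages"]) for s in sessions),
        "bounce_rate": bounces / total if total else 0.0,
        "avg_session_seconds": sum(durations) / total if total else 0.0,
    }


# Illustrative events only.
t0 = datetime(2020, 6, 30, 9, 0)
events = [
    ("u1", t0, "/pub/a"),
    ("u1", t0 + timedelta(minutes=10), "/pub/b"),
    ("u1", t0 + timedelta(minutes=50), "/pub/a"),  # 40-minute gap starts a new session
    ("u2", t0 + timedelta(minutes=5), "/pub/a"),
]
stats = summarize_sessions(sessionize(events))
print(stats)
```

Session duration here follows the “first event to last event” definition, so a single-pageview session has a duration of zero.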
