A scientific website hides paid content in its source code to get a better Google ranking

27 June 2014 by Paige Brown Jarreau, posted in Uncategorized

This is a guest blog post by Samuel Péan (@SamuelPEAN on Twitter). Samuel has a PhD in Behavioural Sciences / Ecology. He is currently a freelance data scientist and web analyst. Visit his website here.

Scientific journals don’t usually use the full possibilities of the Internet… Their websites are quite static and old-fashioned. There often isn’t any good social network integration, no dynamic content, etc. So, when I discovered JoVE for the first time, I was quite excited! “JoVE” stands for the “Journal of Visualized Experiments” and the principle is quite simple and efficient: providing a video with every article to visualize the aim of the study, the protocol, the results, etc. Brilliant! This sounds like the perfect mix between traditional publishing and the Web 2.0. But this website has some strange practices...

Google is your friend!

If you’re not familiar with website management, you have to know there are a bunch of standards to respect. Respecting these standards leads to two important things:

  1. Your website will be displayed the same way on all computers / browsers
  2. Your website will rank better in Google

This second point is really important: optimizing your website content to fit Google’s requirements is a real challenge called “Search Engine Optimization” (SEO). If you do it well, you can be really well placed in Google search results. If you fail hard, your website will never appear. But if you try to cheat, be ready to face Google’s wrath (your website is generally blacklisted).

Why is SEO a serious thing? Because Google is a HUGE source of traffic, and traffic is money. So, basically, you need a good SEO strategy to generate traffic and also a good analytics strategy to quantify it (best landing pages, most searched keywords). The better you generate traffic and know your visitors, the better you’ll be able to monetize your content and publish stuff that fits your audience.

Source: http://www.shutterstock.com/pic.mhtml?id=54889285

Because JoVE stakes everything on digital content (videos and digitally published articles), you might assume they are experts on SEO practices, secured content, neat analytics strategy, etc. But they appear to be making some big mistakes.

Only subscribers can access full-text… Really?

JoVE’s economic model is quite simple. There are no ads on the website, so the only way they generate money is by making institutions pay for a subscription to access all articles. They also have open access articles: in this case, like most other open access journals, it’s the authors who pay the peer-review and publication fees. So, the more traffic JoVE generates, the more chances the journal has to get subscribers. If you are not a JoVE subscriber, you can only access the first 21 seconds of each video, the article abstract and the first few sentences of each article section. So, the most important thing for the JoVE team should be to protect the bulk of each article from non-subscribers.

But the JoVE team decided to use an audacious technique to do so: they actually load the whole article content in the source code, and just add a CSS mask (a kind of filter) to hide the bulk of the text from non-subscribers. That means that even if you are not a JoVE subscriber, you can read the full article just by displaying the page source code.
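To make the point concrete, here is a minimal sketch of why a CSS mask protects nothing. The page below is invented for illustration (the `masked` class name and the markup are assumptions, not JoVE’s actual code), and the parser simply reads every text node in the HTML without ever looking at the stylesheet, just as anyone viewing the page source would.

```python
from html.parser import HTMLParser

# Hypothetical page: class names and markup are invented for this example.
PAGE = """
<html><head><style>.masked { display: none; }</style></head>
<body>
  <p class="abstract">Abstract visible to everyone.</p>
  <div class="masked"><p>Full protocol text meant for subscribers only.</p></div>
</body></html>
"""


class TextCollector(HTMLParser):
    """Collect every text node outside <style>/<script>, ignoring CSS entirely."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0  # > 0 while inside <style> or <script>

    def handle_starttag(self, tag, attrs):
        if tag in ("style", "script"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("style", "script") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and not self._skip_depth:
            self.chunks.append(text)


def all_text_in_source(html):
    """Return every piece of text present in the HTML, hidden by CSS or not."""
    parser = TextCollector()
    parser.feed(html)
    parser.close()
    return parser.chunks


print(all_text_in_source(PAGE))
```

The “subscriber-only” paragraph comes out next to the abstract, because `display: none` only tells a browser not to draw the element; the text was still sent to the visitor.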

I discovered this just a few days ago, and decided to make a little Chrome extension that breaks this filter to display the full content on JoVE’s website. I even found a surprisingly easy way to access the full video associated with any article (by comparing the open access article scheme with subscriber-only articles). I made a quick-and-dirty video to show the “exploit” and sent it to the JoVE team. (Editor Note: Samuel did not make this video public, but instead chose ethically to keep it private and bring it immediately to the attention of JoVE.)

Let’s be clear, I’m not a hacker and I’m not talking about intrusion on their servers or stuff like that… I just disabled the CSS mask they display for non-subscribers through the source code accessed via my browser, and I also guessed where their videos were stored.

I’ve since been contacted by a JoVE team member via Skype, and I was quite surprised by his answer: it seems JoVE puts the full text in the source code even for non-subscribers on purpose, because “completely leaving out text will reduce our SEO scores”...

Interesting... Because there is an important thing to consider: when Google’s robots scan the JoVE website, they are treated as “non-subscribers” and so can only scan what non-subscribers can access. So JoVE’s SEO experts hope that putting the full text in the source code will allow Google’s robots to scan it, giving them a better Google rank, more traffic and potentially more subscribers.

Why hiding full-text in source code is the worst idea ever...

Obviously, the first reason not to do that is that anyone can read it from the source. But this strategy is also neither ethical nor effective in terms of both SEO and server management.

Concerning the SEO part, if you think putting the whole article in the source code will give you a better ranking, you probably haven’t understood how search engines work… If you have a sexy title and a good summary, with neat keywords that appear in both of them (but not too often), that’s enough. I think what JoVE’s SEO experts hope is that if someone types a word that only appears at the end of the article but not in the title / summary / keywords, Google will display their link! Hmm… OK… but if that word does not appear in your title / summary / keywords, Google will not consider your website a relevant result. So this strategy is pointless in terms of SEO, because Google considers A LOT of parameters other than word occurrence to sort websites in a results page. Your position also depends on how frequently you update your website, whether your page was the first one published on that topic, whether your website generates a lot of traffic and whether people who already searched these keywords clicked on your website as a valid result.

Source http://www.shutterstock.com/pic.mhtml?id=96961685

In terms of server management, sending a lot of hidden information leads to huge extra costs. When you have a website, you need a server to host it, and your web hosting provider will charge you depending on both the space you need and the traffic you generate. The more space you need, the more you pay. The more information you need to send to your visitors, the more you pay… easy. Although this may seem anecdotal, these hidden articles that are sent to EVERY user are clearly a waste in terms of bandwidth management, because you transfer information that is useless for the visitor but you actually pay for this transfer.

Another funny thing: JoVE shows the video to non-subscribers for just 21 seconds… but this video is not 21 seconds long. It’s a “preview” of a video that has the same total duration and takes up the same storage space as the “full” video available to subscribers. After 21 seconds the video is supposed to disappear, but actually after 23 seconds the video displays a huge JoVE logo (just in case). So basically they host the full video twice: one copy with a logo after 23 seconds for non-subscribers, and the normal one. This also makes no sense in terms of server space management. And if you pause the video, your browser will keep buffering content beyond the 21-second mark; you will never see this content but your browser will have downloaded it. So JoVE’s servers sent it to you for nothing. More bandwidth costs for nothing… I don’t know about you guys, but if I need a video preview, I’ll just cut the first 21 seconds of the video to make a mini preview file, and that’s it.
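As a rough sketch of that last idea, a short preview file can be cut with the `ffmpeg` command-line tool. The helper below only builds the command; the function name and the file names are made up for the example, and it assumes `ffmpeg` is installed if you actually run the result.

```python
def build_preview_command(source, preview, seconds=21):
    """Build an ffmpeg command that keeps only the first `seconds` of `source`.

    -t limits the output duration; -c copy copies the streams without
    re-encoding, so the cut is fast but may be a little off the exact second.
    """
    return [
        "ffmpeg",
        "-i", source,        # input: the full-length video
        "-t", str(seconds),  # stop writing output after this many seconds
        "-c", "copy",        # copy audio/video streams as-is (no re-encoding)
        preview,             # output: the small preview file
    ]


# Hypothetical file names, for illustration only.
command = build_preview_command("full_article_video.mp4", "preview_21s.mp4")
print(" ".join(command))

# To actually create the file (requires ffmpeg on the PATH):
#   import subprocess
#   subprocess.run(command, check=True)
```

The preview file then contains only those first seconds, so there is nothing extra to host twice and nothing hidden for a non-subscriber’s browser to buffer.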

...because people deserve to know!

I used to think JoVE was a brilliant concept, but now that I’ve scratched the surface I’m convinced this site is managed by people who don’t have a good comprehension of how a website should work. When I discovered the breach, I was wondering if they were just lazy or maybe ignorant… but when they answered me that changing this will reduce their SEO scores I was quite shocked… Do they actually know how the internet works? I hope so but I’m afraid they actually don’t… Two things seem really clear to me:

  1. They do not respect their subscribers, because they make them pay just to toggle a CSS filter and to display a full video which is not really protected and is accessible in 5 minutes to anyone who knows how to read HTML / XML
  2. Sending a full-text article and a full-length video to non-subscribers is a huge mismanagement of server bandwidth, and who do you think will pay for the additional server costs due to this bad strategy? Subscribers and the ones who pay to publish their articles open access.

I think their practices reveal some ignorance and arrogance. The real question was: should I publish this? Because revealing this could lead to a bad buzz and a bad buzz could lead to a loss of money for the company and a loss of money could lead to layoffs… But is this company safe? Do they deserve to exist with these practices? JoVE doesn’t have an official impact factor. They have calculated their own unofficial impact factor, which is 1.19.

So, I weighed the pros and cons and I chose to publish this because people deserve to know! If this has consequences for JoVE, that would be the price for their bad policy.

To conclude I would say that as a young 2.0 scientist, I sometimes don’t really understand scientific journals… I think some of their practices come from the past. They decide whether or not to publish your article, your article can be rejected by an anonymous reviewer without any valid reason, they make money from your work without any remuneration for authors, they don’t pay reviewers and they traditionally require copyright transfer from authors. There are articles from 1902 still stuck behind paywalls and you’re sometimes not allowed to archive the “editor version” of your own article on your personal website. Scientists often accept this only because it’s a kind of heritage from the past, when journals were the only way to disseminate your research worldwide. But when you try to explain these rules to people from “the real world” (I mean those outside of academic research), they don’t easily understand. Institutes still pay expensive subscriptions for journals even if now everything is digital and hosting PDF files costs less than printing and sending tons of paper… So I don’t really know where the subscription money goes. I think JoVE is just an example and I also think journals with a full Open Access policy are more worthy (because they disseminate research for real and don’t stifle it behind paywalls). But in general journals have to be careful, because I wouldn’t be surprised if in a few years new ways to disseminate peer-reviewed research without the need for an editor become the new standard.

We always had to pay to access knowledge: for decades we bought encyclopedias in paper format, then CD-ROM / DVD. But if you had told someone 20 years ago that in the future we would have Wikipedia, an online, collaborative and free encyclopedia, no one would have believed you. In “publish or perish” times, and because academic research takes time to integrate new technologies, journals still think researchers need them to publish. But in fact, journals need researchers to exist… So, I think they need to at least respect the research community and act with more transparency.

Author bio:

I am a Biologist turned web analyst. I have a MSc in Marine Ecology and a PhD in Behavioral Sciences. From video analysis of swimming fish to web data analysis, I specialized in massive data sets and statistics. I am a Science addict, Nature lover and technophile. I am also interested in Science communication, Open Access and Net neutrality. You can follow me on Twitter @SamuelPEAN. 


8 Responses to “A scientific website hides paid content in its source code to get a better Google ranking”

  1. Joseph G Wallis

    That type of practice is known as cloaking, when you give the search robots a different version of your page than the user, and can (and probably will) get you a major penalty from Google. It's a short term gain/long term loss situation.

    http://en.wikipedia.org/wiki/Cloaking

  2. dev

    Very nicely done. Great read & as much as I'm a fan of JOVE, I'm surprised at their willingness to use such a flimsy paywall. Although American scientists might not be that web savvy, it seems our foreign counterparts are a lot more ambitious when it comes to paywall hacking.

  3. Graham Steel (@McDawg)

    Fascinating post. I stumbled across JoVE in December 2007 and spoke with them in January 2008. I really liked the sound of what they were doing and at that time, they were fully open access, so that was an even bigger draw for me. I was asked if I would like to blog for them which I did for ~16 months.

    All was going great for everyone until 1st April 2009 when this happened. http://friendfeed.com/brembs/327c9872/can-someone-confirm-that-jove-has-gone-closed

    • Samuel Péan

      Wow... I didn't know about that story! They could have changed their policy / pricing only for the new articles but closing suddenly the access to the whole journal is quite rude...
      Another really weird thing I discovered about their economic model after writing that blog post (because I didn't know that was possible for a "scientific" journal to do that) is that there are SPONSORED articles on JoVE O_O (like this one > http://www.jove.com/video/50948/v3-stain-free-workflow-for-practical-convenient-reliable-total). This is REALLY pernicious because it's structured like a real article, you can cite it like a real article, but it's not! It's obviously quite clear when you are on JoVE website that the article is sponsored, but if you see the sponsored article cited as a reference in another publication, you wouldn't be able to notice this is actually a kind of ads. Nothing in the citation can give you a clue about whether or not the article is sponsored (they could add "[SPONSORED]" before the title, like some bloggers do on their sponsored posts). What they do is actually just giving a DOI number to commercials and the ability to researchers to cite them.
      I'm not completely closed to monetizing a journal website... They have an audience mainly composed by researchers, some companies want to reach that audience, that's how business works... But they shouldn't do it this way.
      It's pretty sad because I think their initial idea of doing short films to illustrate scientific articles was really cool, but they're actually doing everything to kill that brilliant idea...

  4. Jenny R

    The other baffling move on their part is that they *don't* have a normal style paywall. Access to the closed material is available only via institutional subscription, and it's not cheap. An individual researcher cannot purchase a subscription or one-time access to an individual article. So they want broad SEO findability, but only sell access to people at rich institutions. Whut?
