How to make an offline copy of a website

To create an offline, browsable copy of a website, you can use the tool wget.
I will guide you through the steps to create an offline copy that fits your needs.

Download the website

wget --mirror --no-check-certificate -e robots=off --timestamping --recursive --level=inf \
--no-parent --page-requisites --convert-links --adjust-extension --backup-converted -U Mozilla \
--reject-regex feed -R '*.gz, *.tar, *.mp3, *.zip, *.flv, *.mpg, *.pdf' http://test.com

Change into the directory of the offline copy, which wget names after the host of the downloaded URL:

cd test.com

Clean up temp files

Because of the --backup-converted option, wget keeps a backup of each converted file with the suffix .orig. Remove these backups:

find . -type f -name '*.orig' -exec rm -f {} +

Optimize images

If you want, you can convert JPEG images to a lower quality to save disk space:

find . -type f -iname '*.jpg' -exec mogrify -strip -quality 20 {} +

PNG files can be converted to JPEG as well, but the converted files must keep their original .png filenames so that links in the offline copy do not break.

# convert each PNG to JPEG (mogrify writes name.jpg next to the original file)
find . -type f -iname '*.png' -exec mogrify -strip -quality 20 -format jpg {} +
# rename the converted .jpg files back to the original .png/.PNG names
find . -type f -iname '*.png' -exec sh -c 'for f; do mv -- "${f%.*}.jpg" "$f"; done' sh {} +

Anti-Adblock-Killer Guide to fight Anti-Adblock

Anti-Adblock Introduction

If you use an ad blocker like uBlock, you sometimes run into Anti-Adblock technology on a website that blocks you from accessing its content. A well-known recent example is forbes.com. I will use a website as an example to show how you can defeat this by creating a custom whitelist rule that disarms the Anti-Adblock but still blocks the advertisements and tracking. You can call it an Anti-Anti-Adblock or Anti-Adblock-Killer.

Create an Anti-Adblock-Killer for an example website

The example website for this tutorial is sc2casts.com, which shows a nag screen. When you open the website with an ad blocker enabled, you receive a message like this:

Anti-Adblock Nag Screen

Open the developer tools (F12 in Chrome on Windows), click the magnifier to inspect an element, and then click the dialog that shows the Anti-Adblock message.

Anti-Adblock Nag Screen Div

The source code shows the div that displays this message. Its name suggests that the div id is random, and refreshing the page and inspecting the div again proves it. Therefore, blacklisting or whitelisting based on the div id would not work. Now search for the id of the div in the code by pressing Ctrl+F. One of the matches is:

Anti-Adblock Nag Screen JavaScript

As the code and the name of the canRunAds variable suggest, it is used to determine whether ads can be shown or the user runs an ad blocker. Searching for canRunAds reveals no code location that sets it, so it must be set by an external script (unless the code that sets it is obfuscated, of course). Open the Network tab of the developer tools and refresh the page. All network requests that were blocked by the ad blocker are now marked red.

Anti-Adblock Network View Developer Tools

Right-click each of the blocked scripts and select Open in new tab. The URL http://sc2casts.com/tt/banner_ads.js reveals an interesting script:

var canRunAds = true;

Now all we need to do is whitelist this specific URL in uBlock. Open the uBlock settings, select My rules, click Edit below Temporary rules and insert a new line:

sc2casts.com http://sc2casts.com/tt/banner_ads.js script allow

Save the whitelist rules and refresh the website. Voilà, the nag screen is gone and the site still shows no ads.
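
To see why allowing this one script is enough, here is a minimal sketch of how such a bait check typically works. The site's real check was not shown in full, so the function name shouldShowNag and the plain object standing in for the browser's window are assumptions made for illustration only.

```javascript
// Hypothetical reconstruction of a bait-script check; not the site's actual code.
// `globals` stands in for the browser's `window` object.
function shouldShowNag(globals) {
  // banner_ads.js only contains `var canRunAds = true;`.
  // If an ad blocker stops that request, the variable is never defined.
  return globals.canRunAds === undefined;
}

// Bait script blocked by the ad blocker: the nag screen would be shown.
console.log(shouldShowNag({}));                  // true
// Bait script allowed by the uBlock rule: no nag screen.
console.log(shouldShowNag({ canRunAds: true })); // false
```

Since the bait script does nothing except define the variable, allowing it lets the check pass while all real ad and tracking requests stay blocked.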

Update the Anti-Adblock-Killer

Update 2015-08-2: The website's Anti-Adblock code changed and the solution above does not work anymore. A look into the code of the website reveals the newly added JavaScript code:

var script = document.createElement('script');
    console.log("1");
    script.onerror = function(){
        script.onerror = null;
        document.body.removeChild(script);
        document.getElementById("OOXGW").style.display="";
        _gaq.push(['_trackEvent', 'adblock popup', "show popup", document.location+""]);
    }
    script.src = "http://pagead2.googlesyndication.com/pagead/js/r20150723/r20150728/show_ads_impl.js";
    document.body.appendChild(script);
</script>

This code uses the onerror handler of the dynamically created script element to catch a failed HTTP request, which in our case is the blocked ad script. When the request fails, the handler makes the nag screen visible. In the source code of the nag screen you can see that it uses the CSS class headline. If you search for it in the HTML, you will see that it is only used in the nag screen. That allows us to use the class to find the nag screen with a jQuery selector and, from there, traverse up the DOM tree to the main div of the nag screen, which has a random id. Install Tampermonkey for Chrome or Greasemonkey for Firefox and add a new user script with the following content:

// ==UserScript==
// @name         sc2casts.com Anti-Adblock Killer
// @require      http://ajax.googleapis.com/ajax/libs/jquery/1.11.3/jquery.min.js
// @namespace    http://www.codejuggle.dj/
// @version      1.0
// @description  Removes the nag screen on http://sc2casts.com/ when using an ad-blocker
// @author       CornelK
// @match        http://sc2casts.com/*
// @grant        none
// ==/UserScript==

// jQuery is already loaded by the @require line above,
// so it does not need to be injected into the page again.
// Wait until the DOM is ready, then remove the nag screen container,
// which sits three levels above the element with the class "headline".
$(function () {
    $('.headline').parent().parent().parent().remove();
});

Refresh the site and voilà, the nag screen is gone again (for now).
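
If you would rather not load jQuery for a single selector, the same traversal can be sketched with plain DOM calls. This is a hedged alternative rather than the script used above: the helper name ancestor is made up for this example, and the three levels simply mirror the .parent().parent().parent() chain.

```javascript
// Hypothetical helper: walk `levels` steps up from `node` via parentElement.
// Returns null if the tree ends before that many steps were taken.
function ancestor(node, levels) {
  let current = node;
  for (let i = 0; i < levels && current; i++) {
    current = current.parentElement;
  }
  return current || null;
}

// In a browser (for example inside a userscript), remove the nag screen container.
// The guard keeps the snippet harmless when no DOM is available.
if (typeof document !== "undefined") {
  const headline = document.querySelector(".headline");
  const nag = headline && ancestor(headline, 3);
  if (nag) {
    nag.remove();
  }
}
```

Like the jQuery version, this depends on the headline class being used only inside the nag screen and on the nesting depth staying at three levels.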

How to filter Facebook ads and annoyances using CSS

To apply a user stylesheet to a website, you need a browser plugin like Stylish, which exists for Chrome and Firefox. The website for this plugin is userstyles.org, which hosts many user-made stylesheets for websites.

  • Install the plugin
  • Open Stylish extension options
  • Click ‘Add New Style’
  • Enter ‘Facebook’ in the name field and check ‘Enabled’
  • Copy & paste the following stylesheet into ‘Code’
  • Click the ‘Specify’ button and switch the drop down to ‘URLs on the domain’ and enter ‘facebook.com’
  • Click ‘Save’ and open Facebook to see the difference

/* Custom user styles for facebook.com by www.codejuggle.dj */

/* Apps */
#appsNav,

/* Suggested Page */
.uiStreamStoryAttachmentOnly,

/* Sponsored */
#pagelet_ego_pane_w, .ego_section, .ego_unit_container,

/* List Suggestions */
#pagelet_friend_list_suggestions,

/* Like pages */
#pagesNav > ul > li:nth-child(3),

/* Like your favourite Pages in feeds */
.megaphone_story_wrapper,

/* Like Similar Stories */
._5j5v,

/* Suggested Post - profile picture can't be filtered yet */
.uiStreamHeadlineWithLikeButton, .uiStreamHeadlineWithLikeButton~h5, .uiStreamHeadlineWithLikeButton~div, .uiStreamHeadlineWithLikeButton~form {
  display: none !important;
}

.fbx #globalContainer {
  width:1200px
}

.hasLeftCol .homeWiderContent div#contentArea {
  padding-left:18px;
  padding-right:25px;
  width:725px
}

Completely filtering suggested posts is not possible with the current CSS standard because it lacks a parent selector. The W3C Working Draft for Selectors Level 4 provides a syntax to define the subject of a selector, which would help with filtering this. Some other styles that I am using: