The Fresh Loaf

News & Information for Amateur Bakers and Artisan Bread Enthusiasts

Data being taken from your computer during TFL visits

ElPanadero

Nothing really new here, just hopefully some useful info. As you probably know, every time you visit a website, programs, scripts and processes run without you generally being aware of them, and data is taken from your computer and sent to companies all over the globe. Often these are marketing companies who may then spam you or telephone you, but either way your details are likely going into massive data warehouses. Apart from collecting data, these processes open up many "outbound connections" on your PC, which can affect performance, and of course the data they take counts towards any monthly data allowance you have with your broadband provider.

If you want to stop this from happening, you have to identify the IP addresses where all this data is going, work out who owns them and then instruct your PC firewall to block those IP addresses. I have done this for all of the main websites that I use on a daily basis. I did it because I was appalled at just how many "outbound connections" were opening up every time I visited my favourite websites; sometimes it was over 100 at a time. It is time-consuming to identify the IP addresses involved and search the Net to find out which companies own them. However, I have done this for The Fresh Loaf website. Surprisingly, I found that every time I visited this great site, data from my computer was being sent to over 30 companies!

Now it is one thing for us to see and accept the presence of advertisements on websites (which are often needed to fund the site) but it is another thing entirely to allow a site to take your data and send it to so many different companies. Especially when you can't be sure just what that data is or how it will be used.

Since I have done the groundwork to identify most of the IP addresses that TFL seems to be linked to, I figured it would be useful to some of you to post up a list of them.  If you know how to use your firewall, you can simply enter each of these IPs into the "Blocked Zone" area.  Once you do this, your data won't be taken again when you visit this website.   Obviously this only prevents data scraping when you visit TFL.  Ideally you also need to do the same for all your favourite sites.  I can't do that for you, but I can give you the list for TFL.  So here it is:

IP Addresses collecting data on TFL

(note that Google and Amazon IPs are often involved in data collection activities even if you haven't visited their sites.  It is big business ! Blocking these IPs will not affect your use of the actual Google or Amazon websites - but if it did you could easily unblock the IP anyway).

Block the following IPs

199.96.57.6 - Twitter
174.129.250.140 - Amazon
77.242.204.10 - InterNAP Network Services U.K. Limited
208.78.169.230 - Federated Media Publishing
74.117.199.102 - ADIFY CORPORATION
198.47.127.15 - Pubmatic Amsterdam, NL
66.117.28.68 - WebSideStory Search and Content Solutions
185.29.134.233 - RIPE Network Coordination Center
66.12.68.41 - Genuity DSL
54.75.234.41 - Merck and Co.
31.13.90.2 - Facebook Ireland Ltd
64.233.166.84 - Google
68.232.35.139 - EdgeCast Networks
172.228.91.168 - America Online
87.248.212.154 - LIMELIGHT NETWORKS
104.66.78.88 - Unknown
151.249.94.55 - CDNetworks Inc.
204.236.238.224 - Amazon.com
54.194.194.206 - Merck and Co.
23.251.142.20 - Google Inc user content
64.34.226.93 - IP Tones (Peer 1 Network)
204.2.197.201 - Media6Degrees Inc
64.12.68.41 - America Online
185.29.133.223 - RIPE Network Coordination Center
205.234.175.175 - CacheNetworks
95.140.226.151 - LIMELIGHT NETWORKS
151.249.94.37 - CDNetworks Inc
74.125.133.95 - Google
173.194.126.184 - Google
31.13.90.17 - Facebook Ireland Ltd
104.28.8.97 - Unknown
174.35.0.0 - 174.35.0.255 - CDNETWORKS (ALLTEL Corporation)

The IP for the TFL site itself appears to be 198.199.69.68.  Obviously DO NOT block this IP.
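For anyone whose firewall wants CIDR notation rather than single addresses or start/end ranges, here is a rough Python sketch of how a list in this format could be converted. It only uses the standard `ipaddress` module. The function names and the three sample lines are just for illustration, and the "IP - owner" parsing is an assumption about the list format, not something any firewall requires.

```python
import ipaddress

# A few entries copied from the list above; the parsing format is an assumption.
ENTRIES = """\
199.96.57.6 - Twitter
31.13.90.2 - Facebook Ireland Ltd
174.35.0.0 - 174.35.0.255 - CDNETWORKS (ALLTEL Corporation)
"""

# The TFL server itself, which must never end up in the block list.
SITE_IP = ipaddress.ip_address("198.199.69.68")


def parse_entry(line):
    """Return (networks, owner) for an 'IP - owner' or 'start - end - owner' line."""
    parts = [p.strip() for p in line.split(" - ")]
    first = ipaddress.ip_address(parts[0])
    last = first
    rest = parts[1:]
    if rest:
        try:
            # If the second field is also an address, the line is a range.
            last = ipaddress.ip_address(rest[0])
            rest = rest[1:]
        except ValueError:
            pass
    networks = list(ipaddress.summarize_address_range(first, last))
    return networks, " - ".join(rest)


def build_blocklist(text):
    """Return (cidr, owner) pairs, skipping any network that contains SITE_IP."""
    blocked = []
    for line in text.splitlines():
        if not line.strip():
            continue
        networks, owner = parse_entry(line)
        for net in networks:
            if SITE_IP in net:
                continue
            blocked.append((str(net), owner))
    return blocked


for cidr, owner in build_blocklist(ENTRIES):
    print(cidr, owner)
```

The output still has to be entered into your firewall by whatever means it provides; the sketch only tidies the list and guards against blocking the site's own address by accident.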

Cheers

EP

Floydm

I disagree with your characterization on a number of levels, El Panadero, most notably in that I see no evidence that any of these sites are keeping connections open or taking random data. As far as I can tell, most (all?) serve up an ad or register a page visit, some for analytics and, yes, others for marketing purposes. I have disclosed what I know about this on the about page.

If you are using Chrome, you can use the inspector to watch all the HTTP requests coming and going (right click, Inspect Element, click Network, refresh the page). You can also see the content of the JavaScript code that executes on your machine, some of which is human-readable and some of which is minified for performance reasons, which also makes it difficult to follow. Firefox and Safari have similar tools available too. I don't know about IE because I don't use it. If connections were being kept open or reestablished, you would see evidence of that in the inspector. I know this because building web applications that do exactly that is what I do professionally.
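If you would rather tally the requests than read them one by one, the Network panel can export what it recorded as a HAR file, which is plain JSON. Below is a minimal Python sketch that counts requests per third-party host. The function names and the example.org/.net/.com hosts are made up for illustration, and the in-memory sample stands in for a real export.

```python
import json
from collections import Counter
from urllib.parse import urlsplit


def load_har(path):
    """Read a HAR file exported from the browser's network inspector."""
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)


def third_party_hosts(har, site_host):
    """Count requests per hostname that is not site_host or a subdomain of it."""
    counts = Counter()
    for entry in har["log"]["entries"]:
        host = urlsplit(entry["request"]["url"]).hostname or ""
        if host == site_host or host.endswith("." + site_host):
            continue
        counts[host] += 1
    return counts


# Tiny in-memory stand-in for an exported HAR file (hypothetical URLs).
sample = {"log": {"entries": [
    {"request": {"url": "https://www.example.org/index.html"}},
    {"request": {"url": "https://ads.example.net/banner.js"}},
    {"request": {"url": "https://ads.example.net/pixel.gif"}},
    {"request": {"url": "https://widgets.example.com/like.js"}},
]}}

for host, n in third_party_hosts(sample, "example.org").most_common():
    print(host, n)
```

With a real export you would pass `load_har("some-file.har")` instead of `sample`. This only counts requests; it says nothing about what each request contained.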

Try the inspector on TFL, but try it on other sites too. Any site that has a Twitter button or Facebook like widget with a counter on it is making similar connections.

I've made considerable effort to prevent cross-site scripting attacks on the site, which are the primary way that sites with community-contributed content end up leaking private personal information. It isn't the vendors doing it; it is malicious users.

ElPanadero

Hi there

First off let me say my post is absolutely no criticism of the site which is excellent and if you have taken it as such then I apologise. Nevertheless, I'm personally sick of my personal data being "scraped", "tracked" or otherwise taken by the plethora of spying tactics out there so I have simply begun tackling that problem. You say:

"I see no evidence that any of these sites are keeping connections open"

Being kept open isn't really my issue. It's the fact that they open at all, then move some data to some foreign company and then close, that I don't like or want. These short outbound connections are being established every single time I visit the site.

"I see no evidence that any of these sites are . . . taking random data"

Call me simplistic, but these outbound connections to these company servers are being made for a reason, and data IS involved because my firewall shows the number of bytes/KB/MB transferred. Now I could spend hours analysing that data as you suggest, but frankly, as my visits to TFL are for the purposes of discussing bread matters, I see no reason to analyse why data, any data, of any type or nature, should be going from my computer to Merck & Co, a US pharmaceutical company! Or for that matter to Federated Media Publishing, Media6Degrees Inc, America Online or Limelight Networks. There are various reports on the web about Limelight Networks scouring websites, and in one case a web administrator emailed them to complain and was told that the issue was probably a "spider" associated with one of their advertising partners. Hmmmm.

So in the end I conclude that I don't need to analyse what data these companies are collecting when I visit the TFL site; I simply don't want them doing it because none of it has anything to do with bread discussion. Therefore I'm blocking them all. Even widgets like Twitter and Facebook buttons should not impact ME, the end user. They are scripted buttons on the TFL site itself for use by end users who might choose to use those buttons. It would make sense that IF and WHEN those buttons are used, a connection to the Twitter/Facebook sites might then be established, but there is absolutely no reason for such connections to be made if I don't use those buttons at all . . . and yet they are!

The same goes for connections to Amazon. I do use Amazon a lot personally, but what does that have to do with my viewing of The Fresh Loaf website? Why should an outbound connection open to Amazon whilst I am viewing TFL? It's a nonsense !

My view is that most (if not all) of this "stuff" is associated with ads and scripts, and as such you don't generally have control of what advertisers put or embed into their adverts. I'm not blaming anyone here, nor am I complaining. I'm simply passing on the groundwork I have done so others don't have to do the hours of work that I did to determine all of the rogue IP addresses involved here.

Bottom line is, none of the IP connections I listed need to be there for our viewing pleasure and usage of TFL. So I have personally blocked them all in my firewall. The number of outbound connections that appear when I visit has dropped from 20-30 to just 1-2. Job done.

I'm very happy in my paranoid world ! :-)

EP

cranbo

My view is that most (if not all) of this "stuff" is associated with ads and scripts

You're absolutely right, and these ads provide revenue to Floyd to help keep this site going. 

Yes, I also find ads annoying in general (and the tech behind targeted ads somewhat intrusive). But it's a price I'm willing to pay to participate in this community. Having worked in tech for many years, I understand the amount of effort it takes to keep a site like this running smoothly, so if the ads help Floyd, so be it.

In any case thanks for your insight & data. 

ElPanadero

I agree that ads and whatever comes with them are required to fund sites like this. That is something that Floyd manages with his sponsors. The presence of the ads, however, IN NO WAY means that you have to watch them or allow them to appear, and it most certainly doesn't mean you need to allow those adverts to invasively remove marketing data from your computer.

You can prevent the ads from appearing in the first place by simply disabling scripting (these are the Internet Explorer menu steps):

Tools > Internet Options > Security > Custom Level > Scripting > Active scripting > Disable

The site runs a hell of a lot faster when you're not having to wait for all the Ad stuff to load in.

Then as I have outlined in the OP you can prevent these sites from data scraping your computer by simply blocking the IP addresses in your firewall.   None of this affects Floyd's advert sponsors.  Websites get paid just for renting the page space for the ads in the first place.

ATB

Floydm

there is absolutely no reason for such connections to be made if I don't use those [Twitter|Facebook] buttons at all.

The buttons, images, and counters are delivered by JavaScript from those providers. That is how they know whether you've already liked something or how many likes/tweets the pages have. That also means that connections are made to those service providers and that, yes, they know that you've viewed a given page. As I said, that is true on TFL or any other website you visit that has embedded social media widgets. Having these widgets is a convenience for site visitors and administrators but, yes, it also means those vendors know more about your internet behaviour than you may be comfortable with.
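To see which outside providers a page embeds directly, you can scan its HTML for script, iframe and image tags whose source lives on another host. Here is a small Python sketch using the standard `html.parser` module. The class name, the sample page and the example hosts are all hypothetical, and it only sees tags present in the page source, not anything a script injects later.

```python
from html.parser import HTMLParser
from urllib.parse import urlsplit


class EmbedFinder(HTMLParser):
    """Collect hostnames of scripts, iframes and images served from other hosts."""

    def __init__(self, site_host):
        super().__init__()
        self.site_host = site_host
        self.hosts = set()

    def handle_starttag(self, tag, attrs):
        if tag not in ("script", "iframe", "img"):
            return
        src = dict(attrs).get("src")
        if not src:
            return
        # Relative URLs have no hostname and are served by the site itself.
        host = urlsplit(src).hostname
        if host and host != self.site_host:
            self.hosts.add(host)


# Hypothetical page source with one local script and two embedded widgets.
SAMPLE_PAGE = """
<html><body>
<script src="/static/site.js"></script>
<script src="https://widgets.example.com/like.js"></script>
<iframe src="//social.example.net/counter"></iframe>
<img src="https://www.example.org/logo.png">
</body></html>
"""

finder = EmbedFinder("www.example.org")
finder.feed(SAMPLE_PAGE)
print(sorted(finder.hosts))
```

Every host it reports is one the browser contacts just to render the page, whether or not you ever click the widget.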

There is a new Do Not Track header that is being experimented with, but it is not widely observed yet. We'll see if it gets any traction.

Websites get paid just for renting the page space for the ads in the first place.

That is incorrect. Many advertisers pay a flat rate per thousand ad views, some pay per banner click or per conversion (sale or sign up). No one pays me a flat fee just to rent space here, so if you block all the advertising IPs, I don't get paid. I'm not telling you not to do it, just letting you know what the ramifications are.
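As a rough illustration of the per-thousand-views arithmetic, here is a tiny Python sketch. The function names are made up, and the figures are purely hypothetical, not real numbers for this or any site.

```python
def cpm_revenue(ad_views, rate_per_thousand):
    """Revenue under a flat rate per thousand ad views."""
    return ad_views / 1000 * rate_per_thousand


def lost_revenue(ad_views, rate_per_thousand, blocked_share):
    """Revenue lost when blocked_share (0 to 1) of the views are never requested."""
    return cpm_revenue(ad_views * blocked_share, rate_per_thousand)


# Hypothetical figures for illustration only: 50,000 monthly ad views,
# a rate of 2.00 per thousand, and a tenth of visitors blocking the ad hosts.
print(cpm_revenue(50_000, 2.00))
print(lost_revenue(50_000, 2.00, 0.10))
```

The point is only that under this model revenue scales with views that are actually requested, so a blocked request is a view that never counts, click or no click.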

As far as I recall the only potential advertisers who've offered me flat fees have been "native advertising" sponsors who have offered to pay me a flat amount per post I make about their products. I have not accepted those offers because my preference is to keep the line between content and advertising clearly delineated. 

Personally, if being tracked and seeing ads bothers you, my advice would be to use the "Privacy Mode" (AKA "Incognito Mode") that most modern browsers ship with. That will flush all the cookies and cached data given to you at the end of your browser session, thus making it very difficult for any advertiser to track your behaviour over more than one browsing session.  That plus any number of ad blocking tools available will get you a long way toward what you seek. 

ElPanadero

"Many advertisers pay a flat rate per thousand ad views, some pay per banner click or per conversion (sale or sign up). No one pays me a flat fee just to rent space here, so if you block all the advertising IPs, I don't get paid. I'm not telling you not to do it, just letting you know what the ramifications are."

Hold on though. I, and likely most others, are not actually clicking on those banners or ads. So if you're being paid per click then it makes no difference whether I block them or not. Point of note here, I never, ever click on any ad in any web page (except accidentally - which is often their hope). If I see something in an ad that interests me, I would go direct to that website via URL or via Google search.

The tracking issues are far, far greater than you are making out. Just Google "Facebook Is Tracking Your Every Move on the Web" to see a plethora of sites discussing this huge issue. I don't use Facebook, but they are still tracking me. When I use TFL, my browser back button is instantly populated with a link to www.facebook.com/plugins/like.ph. So simply by using a FB widget on the site, TFL users are exposed to Facebook's invasive practices even if they never go near that widget. Facebook tracks our site visits and passes that info on. Google and Twitter are doing similar things. Their scripts/links contain web bugs that are tracking everything we do regardless of whether or not we actually visit their websites. Even if you delete cookies, their web bugs keep finding ways to track.

I am aware of the "Do Not Track" tools out there but the most reliable way to deal with much of this is to simply edit your hosts file and redirect these sites. This is how I did it:

Start > Programs > Accessories
Right-click Notepad, select Run as administrator

In Notepad I then opened the hosts file in C:\Windows\System32\drivers\etc

Then added the following lines into it:
127.0.0.1 www.google-analytics.com # block Analytics
127.0.0.1 ssl.google-analytics.com
127.0.0.1 google-analytics.com
127.0.0.1 www.googleadservices.com # block remarketing
127.0.0.1 googleadservices.com
127.0.0.1 facebook.com
127.0.0.1 twitter.com
127.0.0.1 www.facebook.com
127.0.0.1 www.twitter.com

Now none of these particular tracking sites can get to my PC or collect anything from it. Obviously if I actually wanted to use Facebook I would comment out the facebook lines by putting a # in front of them. My hosts file is actually a hell of a lot bigger than this. I have 100s of ad and tracking sites redirected as above.
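If you switch entries on and off often, the commenting and uncommenting can be scripted. Below is a minimal Python sketch that works on the text of a hosts file rather than the file itself. The function name `set_blocked` is made up for illustration, and it assumes entries use 127.0.0.1 as in the lines above.

```python
def set_blocked(hosts_text, domain, blocked):
    """Restore (blocked=True) or comment out (blocked=False) the 127.0.0.1
    entries for domain and its www. variant, leaving other lines untouched."""
    names = {domain, "www." + domain}
    out = []
    for line in hosts_text.splitlines():
        # Drop any leading comment marker so active and disabled entries match.
        body = line.lstrip("# ").strip()
        # Ignore a trailing comment when reading the address and hostnames.
        fields = body.split("#", 1)[0].split()
        if len(fields) >= 2 and fields[0] == "127.0.0.1" and names & set(fields[1:]):
            line = body if blocked else "# " + body
        out.append(line)
    return "\n".join(out) + "\n"


# Sample text in the same format as the entries above.
HOSTS = (
    "127.0.0.1 www.google-analytics.com # block Analytics\n"
    "127.0.0.1 facebook.com\n"
    "127.0.0.1 twitter.com\n"
    "127.0.0.1 www.facebook.com\n"
)

print(set_blocked(HOSTS, "facebook.com", blocked=False))
```

Writing the result back to the real hosts file would still need an editor or script running with administrator rights, so the sketch deliberately stops at producing the text.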

There are many people out there who believe that if you are resigned to using social network sites like FB and Twitter then you should use a different browser specifically for those sites and for nothing else. In the fullness of time the true scale and weight of all this tracking will become evident to those who pretended it wasn't happening.

I'm not trying to deprive you of income here, I'm simply taking steps to protect myself.

Floydm

Hold on though. I, and likely most others, are not actually clicking on those banners or ads. So if you're being paid per click then it makes no difference whether I block them or not.

The first bit of my comment was "Many advertisers pay a flat rate per thousand ad views".  Zeroing out the hosts means the ads are never requested, thus the ad views never happen.

My hosts file is actually a hell of a lot bigger than this. I have 100s of ad and tracking sites redirected as above.

Then you have far more time on your hands than I do. 

-Floyd

mwilson

Protect yourself from what exactly?! I think I would be more concerned about giving myself anxiety by fretting over such trivial matters. Life is too short. None of the data that goes to facebook, amazon and such personally identifies you, until you give them your personal information. I was going to suggest, as Floyd did, just using incognito mode rather than going to the trouble of individually blocking sites. I hear you though when it comes to site loading speed.

Take a chill pill man.

ElPanadero

"None of the data that goes to facebook, amazon and such personally identifies you, until you give them your personal information"

This is a tad naive, my friend. These sites would like you to believe that it's all just a bit of harmless marketeering, but I truly believe it isn't. About a month ago, I decided to take a look at solar panels for my house. I surfed 2-3 websites involved in solar schemes, such as those that install them for free. About 1 hour later, out of the blue, I received a phone call from a solar panel provider. That pretty much ended any naive belief I had that all this data scraping is not personal. But each to their own. As you say, there are more important things to worry about.

Les Nightingill

It's not possible for someone to get your phone number just because you visited a website. You are speculating, after saying that you didn't take the time to analyze the contents of the data packets.

jaywillie

Ghostery is a free browser plugin/extension that stops trackers. It works very well for me. (On this very page, eight trackers attempted to load. Ghostery blocked six of them as needless.) Along with Ad Block, another plugin, my browsing is substantially more private.

Ghostery can be configured to allow or disallow trackers. It makes a guess at which are necessary for successful viewing of the page, and blocks all others. If I find that I need some tracker to view or use a site (the ability to access comments sections often is initially blocked, for instance), I can easily activate it.

No connection to these products, just a satisfied customer. I apologize to Floyd if these cause him to lose some income, but I prefer to keep my info to myself.

jaywillie

BobBoule

I use Ghostery to block trackers. It's so much easier to use than manually editing the hosts file, plus it's constantly updated, and I can see on each site I visit whether there are new beacons/trackers showing up, so it makes me feel safer.

I own my own website that has ads to help me support it, so I'm sensitive to both sets of needs, and yes, when I want to buy something online I momentarily turn off Ghostery (extremely simple to do), click on an ad in TFL, make my purchase, then turn it back on. That's my way of balancing the equation easily.