James Crisp

Software dev, tech, mind hacks and the occasional personal bit

Rails serving big password protected files – Capistrano Rsync & X-Sendfile

Say you’ve got a few app servers, and you want to serve up some largish files from your rails app (eg, pdfs) behind a login screen. Well, you could put them on s3 and redirect the user to s3 with expiring links. However, this would mean the eventual URL the user gets in their browser is going to be a s3 URL that expires in an ugly way with an XML error when the link expires. And if the link (copied from the address bar) is shared around, it’ll work for non-authorised people for a little while. Then when the s3 link expires, the receivers of the link will never get to see your site/product (maybe they might want to register/subscribe), instead they’ll just get a yucky s3 API looking error in XML, and nowhere to go.

Well, where could we put the files? How about using the file system on the app servers?

Two things to solve.. how to efficiently ship the files to the app servers, and how to serve them without tying up expensive Ruby processes.

1. Shipping the files to the app servers with Capistrano and Rsync

If you’re using docker or similar, you might want to bake the files into the images, if there aren’t too huge.

Assuming you’ve got a more classic style of deploy though..

Welcome old friends Capistrano and Rsync. Using Rsync we can ensure we minimise time / data sending files using binary diffs. We can also do the file transfers simultaneously to app app servers using cap. Here’s the tasks I put together. The deploy_pdfs task will even set up the shared directory for us.

We’re sticking the files into the ‘shared/pdfs’ directory created by capistrano on the app servers. Locally, we have them sitting in a ‘pdfs’ directory in the root of the rails app. This might seem inconsistent (and it is), but the reason is due to limitations/security restrictions with X-Sendfile.

2. Serving the files with X-Sendfile/Apache to let Rails get on with other things

So Rails provides a helpful send_file method. Awesome! Just password protect a controller action and then send_file away! Oh but wait, that will tie up our expensive/heavy ruby processes sending files. Fine for Dev, but not so great for production. The good news is we can hand this work off to our lightweight Apache/nginx processes. Since I use Apache/Ubuntu, that’s what I’ll cover here, but the nginx setup is similar. Using the X-Sendfile header, our rails app can tell the web server a path to a file to send to the client.

How to set up Apache & Rails for X-Sendfile

Ok let’s get Apache rolling:

You need to whitelist the path that files can be sent from, and it can’t be a soft link. It needs to be an absolute path on disk. Hence we are using the ‘shared’ directory capistrano creates, rather than a soft linked directory in ‘current’. X-Sendfile header itself lets you send files without a path (just looks for the files in the whitelisted path), but unfortunately we can’t use this as Rails send_file checks the path exists and raises if it can’t find the file.

In your rails app in production.rb add:

  # Hand off to apache to send big files
  config.action_dispatch.x_sendfile_header = 'X-Sendfile'

In development you probably don’t need this since you won’t be using a server that supports x_sendfile. Without this config, rails will just read the file on disk and send it itself.

In a controller somewhere, just use send_file to hand off to Apache. You’ll need to specify the absolute path to the file in the ‘shared’ directory. I’d suggest putting the path to the shared directory in an environment variable or config file (however you do this usually for your app per environment), and then just append the relevant filename on to it. Also, remember to validate the requested filename (I use a whitelist of filenames to be sure), to avoid the possibility of malicious requests getting sent private files they shouldn’t from elsewhere on disk.

BigDecimal fix for Rails 4 with Ruby 2.4

Rails 4.2.9 works well with Ruby 2.4.2 except for an incompatible change with invalid decimal values and String#to_d. BigDecimal was changed in 1.3.0 (which ships with Ruby 2.4) to throw an exception on invalid values passed to the constructor. This also impacted String#to_d and caused it to raise an exception when it didn’t previously.

If string_amount is an empty String, code like this:

string_amount.to_d

will throw an exception like:

ArgumentError:
  invalid value for BigDecimal(): ""

rather than returning 0 as it did on ruby 2.3 and below.

This is handled in Rails 5 but the change has not been ported back to Rails 4.

If you’re on Rails 4, you’ve got a few options. You could add a monkey patch in your application.rb or initializer:

class String
  def to_d
    begin
      BigDecimal(self)
    rescue ArgumentError
      BigDecimal(0)
    end
  end
end

Or alternatively upgrade the BigDecimal version in your bundle. Ie, add to your Gemfile:

gem "bigdecimal", ">= 1.3.2"

Bundler will ensure you get the later version of BigDecimal rather than the one that ships with Ruby 2.4, and the behaviour was later fixed in BigDecimal 1.3.2 (but Ruby does not include it yet). See the Ruby issue for more details.

WordPress.com vs Self Hosted WordPress

Previously, this blog was hosted on a Virtual Private Server (VPS) with Linode. No problem there, it worked for many years. However, the VPS has other uses, and this blog needed to move.

WordPress.com
Wordpress.com looked like the most attractive option. Automatic updates, automatic backups, automatic CDN etc. So I exported my blog and imported into WordPress.com with a $5 AUD/month personal account. This price gives you a limited setup (no custom plugins), but that was OK as their JetPack plugin provided anti-spam for comments and a form plugin to make a contact form. The price was reasonable too. Alas, the import was very slow eventually imported only 50% of my media (about 40mb of 80mb). URLs to images were also incorrect and showed up as broken even for media that was imported, as the URL structure changed on WordPress.com (links go to a CDN instead). WordPress.com support promised that the image URLs would be updated by a batch job during the night. This never happened. On the subject of the missing media, support told me I needed to upload my media in ~5mb chunks, as the import process could time out but didn’t give any output to the user or support. Ie, they recommended I do 16 manual exports and imports from old blog to new to avoid timeouts. At this point, things felt too creaky to continue, and I asked for a refund, which WordPress.com support kindly agreed to.

Hosting on WordPress.com has some other minor differences. The comment form shows up differently (maybe from jetpack), maybe a little less attractive than my theme, but not a show stopper. There are two admin consoles, the normal WordPress one, and their custom one. There is some overlap between. It is a workable setup. You can’t add Google Analytics though without upgrading your plan, and there’s no way to get everything out easily (you can get out your WordPress xml export, but not media files or settings).

In conclusion, WordPress.com was mainly unsuccessful for me as it was unable to import my old blog. With no access to files or database, it was also impossible for me to manually correct any issues. Perhaps if you were starting a new blog, it could be a good option.

Shared Account with InMotion
After some research online, I went with inmotion.com shared hosting for WordPress. They promise to update your WordPress automatically (I also installed an additional plugin to keep the other plugins up to date). They offer a full and automated WordPress install (all plugins, themes etc available), backup functionality (though you have to ask them to restore), and full ssh access. Speed is good so far, and they are very reasonably priced ($2.95 USD/month if you sign up for 3 years with this coupon). They offer a 90 day back money guarantee.

I tried the same import from my old blog. It was more successful than wordpress.com but didn’t get all media and some links were still broken. I rsync-ed the missing files across (handy to have ssh access) and ran a quick DB update to fix links. InMotion even had a support page for this. Their online support was helpful when I needed to know the IP of my shared host.

So far so good, my blog is now set up and working and I’d recommend InMotion.

Scalability / Performance Tip – Use Cloudflare
I use Cloudflare to provide a CDN, HTTPS and reverse proxy caching for my blog. Their free account is sufficient and means that if my blog ever gets super busy, the load will be handled by Cloudflare, rather than overloading my shared host. Their CDN setup also means the blog loads faster from non-US locations.

Solving Calendar Sync problems on Android 7 Nougat

Recently, the phone calendar on my Samsung Galaxy S6 stopped synchronising with Google calendar. When I went to Google Accounts Sync in Settings, Calendar had the spinner next to it, but it was didn’t spin. Meanwhile, the calendar didn’t sync, and the battery was being chewed through more quickly than usual.

How to fix? Well the first thing I tried was deleting Calendar Storage. This worked for a day or two, and then the problem reoccurred. Next, tried deleting all my Google accounts and adding them back. That worked for about a week.

Finally, by a stroke of good luck, I was looking at the sync screen when I’d just plugged in the phone to charge – it synced fine. A lead at last! Likely something related to power settings!

I’d already poked around in the usual Device Maintenance > Battery > App power monitor screen, and all Calendar related apps where in the ‘Unmonitored’ list so wouldn’t be put to sleep. This section wasn’t the cause of the problem. I finally found the solution, in an additional hidden set of power saving options.

So.. To fix, go to Settings > Device maintenance > Battery > Battery Usage button > Vertical … at top right > Optimise battery usage. Choose All Apps from the drop down. Then disable Optimise for Calendar storage and your calendar apps. Voila! Finally calendar will sync reliably again!

UPDATE – Other syncs

I also had a similar issue with Google Sheets, Google Docs and Google Drive Sync. The same change in ‘Optimise battery usage’ settings for each of these fixed their sync as well.

Talk: Winning at HTTPS

For the first time in a little while, I’ll be giving a talk at the Sydney ALT.NET user group:

HTTPS is ever more pervasive, with few sites still using plain HTTP. Want to be the guy or girl on the team who actually understands HTTPS, can set up certificates and fix issues that come up? Sometimes this is left to an ops team, but there are benefits and impacts that cannot be ignored in development.

James has migrated several sites from HTTP to HTTPS and has tips and tricks to share.

Tues 25 October from 6pm at ThoughtWorks Sydney office, Lvl8 51 Pitt St, Sydney.
RSVP on Meetup (for pizza and beer!)

You can find the slides from the night here.

Talk: Add a billion row data warehouse to your App.. with Redshift, sql and duct tape!

Come along to Sydney ALT.NET for a BIG data night.

I’ll be giving a talk on Redshift:

Started to hit the point where your transactional database is not the right place for running reporting queries and experimental data science? Keen to chuck in more data from web logs, CRMs, facebook, etc so you can start learning more about your users? Come along to Sydney ALT.NET on June 24th to see an easy way to do it with AWS Redshift, mapping SQL and some simple scripting duct tape.

We also have a co-presented talk on Azure’s Hadoop implementation, HD Insights and Power BI: The Power of the Elephant in the Microsoft Cloud given by Jibin Johnson and Simon Waight from the Azure User Group.

From 6pm at ThoughtWorks Sydney office Lvl8 51 Pitt St, 24 June 14.
Remember to get your free ticket. See you there!

Moving to HTTPS, Rails force_ssl and Rollback

Background
We recently moved the Getup site from mixed HTTP/HTTPS to completely HTTPS. The primary driver was to ensure that sessions were never sent in plain text over the wire, to avoid session hijacking. There are other benefits too, such as protecting personal details from eavesdropping over the wire, proving site authenticity and generally simplifying the code. Checking around the web, Twitter is HTTPS only, and even Google search is all HTTPS (when you are logged in).

Rails HTTPS and force_ssl
The easiest and simplest way of moving a Rails 3 app to all HTTPS is to simply set force_ssl = true in relevant environment files. This then causes Rack ssl middleware to be loaded as the first middleware. As you can see from the code, this middleware does a variety of good stuff, that really ensures once HTTPS, always HTTPS!

  • 301 permanent redirect to HTTPS (cached forever in most browsers so you never hit the http url again)
  • Secure on the cookies so they can never be sent over HTTP (important they don’t accidentally go with a redirect for example!)
  • HSTS to ensure that in supported browsers, nobody can ever go to HTTP again!

HSTS
HSTS (HTTP Strict Transport Security) is offered by Chrome and Firefox ensures that for a given time (usually a long time, eg 20 years in the case of Twitter!), it is not possible to go to the site over HTTP again. To activate this, the site only needs to send the Strict-Transport-Security header once. You can check and manage what’s in your HSTS store in Chrome with chrome://net-internals/#hsts

Rollback
When moving to HTTPS, we wanted to ensure we could rollback to a previous release in case of problems. Using force_ssl out of the box precludes this – if you roll back after a 301 redirect or HSTS loaded by a client browser, your site will no longer be accessible!

We used a small monkey patch which turns off HSTS and uses a 302 temporary redirect, rather than a 301. This means that rollback to a previous release works fine. Here’s the patch:

require 'rack/ssl'

module ::Rack
  class SSL

    def redirect_to_https(env)
      req        = Request.new(env)
      url        = URI(req.url)
      url.scheme = "https"
      url.host   = @host if @host
      status     = 302
      headers    = { 'Content-Type'  => 'text/html',
                     'Location'      => url.to_s,
                     'Cache-Control' => 'no-cache, no-store'
                   }
      [status, headers, []]
    end

    def hsts_headers
      {}
    end

  end
end

This patch is only needed temporarily, until you decide that you no longer would want to deploy a release before force_ssl.

Performance
So is HTTPS slower than HTTP? It is to start with in initiating the first request, as ssl needs to be negotiated and set up. This leads to a few more round trips. If your clients and servers are in same country, this is pretty insignificant. Round trips from Australia to USA for example are more significant but not a major stumbling block, as long as you use Keep-Alive on the connection to ensure that later requests re-use the set up from the first request.

Assets are fully cacheable over HTTPS using the usual HTTP headers, and have been even since early Internet Explorer versions. You do need to make sure that you always load HTTPS assets though, to ensure you don’t get mixed-mode warnings in the web browser. We found some useful HTTPS performance tips here.

Load on the servers was not an issue for us as we are using Amazon Elastic Load Balancers (ELB) for our HTTPS implementation. The web/app servers don’t get involved as they are just reverse proxied by the ELB, which manages the HTTPS sessions.

Redirect Gotchas!
We have a few other domains which simply redirect to the canonical www.getup.org.au domain. Out of the box, the rack ssl middleware loads first, before our redirect middleware. This meant that for these additional domains, we got a nasty certificate warning in the browser as it is sent to HTTPS first (on the wrong domain), and then gets the redirect to the canonical domain that has the valid certificate. Changing the order of middleware to do redirects first, and then HTTPS is an easy solution.

Conclusion
The move to full HTTPS has gone smoothly and we didn’t end up needing rollback. However, it was worth having the monkey patch so that rollback was possible as an insurance policy against unexpected major problems.

Talk on Tues: Moving to HTTPS

I’ll be giving a talk at Sydney ALT.NET on Tues:

After recently moving the Getup site fully to HTTPS, James will share with you security pitfalls, the justification for the move from mixed HTTP/HTTPS, lessons learnt, and performance tips. A romp through the protocols of the web with riffs on status codes, HSTS, domain verification, and interesting headers. This talk could save your bacon.

From 6pm at ThoughtWorks Sydney office on Pitt St. Remember to RSVP on the Sydney ALT.NET site to help with catering. See you there!

iAwards Win for CommunityRun / ControlShift!

Very pleased to announce that CommunityRun and the ControlShift platform have won the NSW iAwards in the Community Category.

Many thanks to everyone from GetUp, ThoughtWorks and ControlShift labs for all the hard work and perseverance. I’m proud to have been part of the team to build a tool that lets anyone start and run their own campaign to improve things in their community.

Talk tonight: Responsive Layout with HTML5

I’ll be giving a talk at Sydney ALT.NET tonight:

Want to build a web application which dynamically changes layout to best suit the client, be it mobile, tablet or desktop with the same HTML? Fun times with HTML5, Bootstrap, HAML and Sass. You’ll get to see it in action, and the code behind the magic.

From 6pm at ThoughtWorks Sydney office on Pitt St. Remember to RSVP on the Sydney ALT.NET site to help with catering. See you there!

Page 1 of 18

Powered by WordPress & Theme by Anders Norén