
Importing Excel 365 CSVs with Ruby on OSX

On May 5, 2020

Up until 2016, Excel for the Mac provided a handy CSV export format called “Windows CSV”, which used iso-8859-1/Windows-1252 encoding. It was reliable, handled simple extended characters like degree signs, and could be read in Ruby with code like:

require 'csv'

CSV.foreach(file, encoding: 'iso-8859-1', headers: true, skip_blanks: true) do |row|
    # ... process each row ...
end

Unfortunately, support for this format was dropped in Excel 365. Many RiskAssess data files were in this format, as the earlier versions of Excel did not properly export to valid UTF-8.

Excel 365 now has a Save As option of “CSV UTF-8 (Comma-delimited)”. This is UTF-8, with a few twists! Sometimes it includes a UTF-8 format first character, and sometimes not. Sometimes it includes the UTF-8 format first character plus a BOM (byte order mark), another invisible character. According to Wikipedia, “The Unicode Standard permits the BOM in UTF-8 but does not require or recommend its use”. This makes it trickier to import. Code like this:

CSV.foreach(file, encoding: 'UTF-8', headers: true, skip_blanks: true) do |row|
    # ... process each row ...
end

will handle the first two possibilities, but not the BOM. The BOM ends up as an invisible character in the data, causing trouble. It is possible to import with encoding ‘bom|utf-8’, which will handle this case. Another option is to run the data through the dos2unix command (easily installed with brew), which does general tidying including removing the unnecessary BOM.
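
As a minimal sketch of the ‘bom|utf-8’ approach (the read_excel_csv helper name is hypothetical, not part of the original import code), the following reads a file the same way whether or not it starts with a BOM:

```ruby
require 'csv'

# Hypothetical helper: returns each data row as a Hash keyed by header.
# The 'bom|utf-8' encoding asks Ruby to skip a leading UTF-8 BOM if one
# is present, and otherwise to read the file as plain UTF-8.
def read_excel_csv(path)
  CSV.foreach(path, encoding: 'bom|utf-8', headers: true, skip_blanks: true).map(&:to_h)
end
```

With plain ‘UTF-8’ the first header of a BOM-prefixed file would carry the invisible character, so a lookup like row['name'] would quietly return nil.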

Also watch out for new lines inside cells: the “Windows CSV” format used “\r” to denote them, while the new UTF-8 export uses “\n”.

Fixing ‘Invalid query parameters: invalid %-encoding’ in a Rails App

By James

On May 28, 2018

In Ruby / Rails, Technical

Sometimes users manually edit query strings in the address bar, and make requests that have invalid encodings. Unfortunately Rails does not handle this neatly and the exception bubbles up. Eg,

ActionController::BadRequest
ActionView::Template::Error: Invalid query parameters: invalid %-encoding (%2Fsearch%2Fall%Forder%3Ddescending%26page%3D5%26sort%3Dcreated_at)

from:
/rack/lib/rack/utils.rb:127:in `rescue in parse_nested_query'

[Note: This was with Passenger, which passed the request through to the app – your mileage may vary with other servers]

In the case of my app, these corrupted query strings are not that important, but users are receiving 500 server error pages. Sometimes they end up with a bad query string URL cached in browser history, so they keep going back to it rather than to the home page.

A simple solution, that gives a good user experience for my app, is to simply drop the query string on a request completely if it has invalid encoding. See my implementation using Rack middleware below:
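
A minimal sketch of this kind of middleware, not necessarily identical to the original implementation: the class name DropInvalidQueryString is hypothetical, and it only checks for invalid %-escapes (using Ruby's own URI decoding), not every way a query string can be malformed.

```ruby
require 'uri'

# Hypothetical class name. Rack middleware that blanks the query string
# when it contains an invalid %-escape, so the app never tries to parse it.
class DropInvalidQueryString
  def initialize(app)
    @app = app
  end

  def call(env)
    begin
      # Raises ArgumentError ("invalid %-encoding") when a '%' is not
      # followed by two hex digits.
      URI.decode_www_form_component(env['QUERY_STRING'].to_s)
    rescue ArgumentError
      env['QUERY_STRING'] = ''
    end
    @app.call(env)
  end
end
```

In a Rails app, something like config.middleware.insert_before 0, DropInvalidQueryString in application.rb should place it ahead of the rest of the stack. The exact position is an assumption and may need adjusting for your setup.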

Doing a Website Re-design or new look

By James

On March 29, 2018

In RiskAssess, Soft Skills and Mind Hacks, Technical

Having recently been updating the look of RiskAssess, I thought I’d share a few important things to remember.

About 4.5% of people are colour blind. This means a lot of your users! Make sure your site has sufficient contrast in the colour choices. Great tools are available online to help:

Stage the changes as much as possible. Do them incrementally rather than all at once. That way people have time to get used to them, and you have the usual incremental software development benefits like earlier releases with lower risk of bugs.

If you have an established user base, make sure the new design is recognisably connected with the old design, so people don’t feel it’s all changed.

Test on the hardware your users use, and design for it. A design with fancy fonts and subtle colours might look good on a big Retina iMac, but how will it look on old low res LCDs and small netbook/laptop screens?

What are the demographics of your users? If they skew older, then heavier fonts, more contrast etc may be vital for readability.

As always, test on target browsers, screen sizes, mobile, tablet etc to ensure all your users have a good experience.

Ask some of your users for feedback! Yes, really! If you’ve got an established site, you really need to make sure you will be delighting people, not annoying them with a new look. Even if it’s only CSS changes, there’s likely a lot you may not have thought of that your users will spot right away that you’ll want to take on board.

Warn all your users that the change will be happening. Provide screenshots, explain why the new look is better for them, and give everyone the chance to check it out and give feedback and get ready for the change.

Encourage people to give feedback once the new design goes live. Take it on board humbly (even if it hurts), and react quickly to fix any accidental losses of functionality or oversights. Much better if users tell you what they don’t like so you can fix it.

Talk: Introducing Elixir

By James

On February 20, 2018

In ALT.NET, Talks, Technical

I’ll be speaking at the Sydney ALT.NET user group next Tues (27 Feb).

Yet another language!?? But wait, Elixir is way cool! Imagine the speed and concurrency of Erlang (it’s built on the Erlang VM), the neatness of functional programming (but only when you want it) and the sexy expressive style of Ruby, all rolled into one attractive language. And, yeah, it has a good web framework too.

Tues 27 Feb from 6pm at ThoughtWorks Sydney office
NOTE NEW LOCATION: 50 Carrington Street, Level 10

RSVP on Meetup (for pizza and beer!)

Check out talk material on Github (slides in VIM 🙂 )

And you can watch a recording of the talk.

Rails serving big password protected files – Capistrano Rsync & X-Sendfile

By James

On November 9, 2017

In Ruby / Rails, Technical

Say you’ve got a few app servers, and you want to serve up some largish files from your Rails app (eg, PDFs) behind a login screen. Well, you could put them on S3 and redirect the user to S3 with expiring links. However, this would mean the URL the user ends up with in their browser is an S3 URL, which fails in an ugly way with an XML error once the link expires. And if the link (copied from the address bar) is shared around, it’ll work for non-authorised people for a little while. Then when the S3 link expires, the receivers of the link will never get to see your site/product (maybe they might want to register/subscribe); instead they’ll just get a yucky S3 API-looking error in XML, and nowhere to go.

Well, where could we put the files? How about using the file system on the app servers?

Two things to solve.. how to efficiently ship the files to the app servers, and how to serve them without tying up expensive Ruby processes.

1. Shipping the files to the app servers with Capistrano and Rsync

If you’re using docker or similar, you might want to bake the files into the images, if they aren’t too huge.

Assuming you’ve got a more classic style of deploy though..

Welcome old friends Capistrano and Rsync. Using Rsync we can minimise the time and data spent sending files, thanks to binary diffs. We can also do the file transfers simultaneously to all the app servers using cap. Here are the tasks I put together. The deploy_pdfs task will even set up the shared directory for us.
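
A rough sketch of what such a task could look like in Capistrano 3. This is an illustration rather than the original tasks: the file location, the rsync flags and the way the remote user and hostname are read from the host object are all assumptions about your setup.

```ruby
# lib/capistrano/tasks/pdfs.rake (hypothetical location)
desc 'Rsync the local pdfs/ directory to shared/pdfs on every app server'
task :deploy_pdfs do
  # Hosts are handled in parallel, so the transfers overlap
  on roles(:app), in: :parallel do |host|
    remote_dir = "#{shared_path}/pdfs"

    # Create shared/pdfs on the server if it does not exist yet
    execute :mkdir, '-p', remote_dir

    # Run rsync on the local machine, pushing to this host.
    # Assumes the SSH user is set in the server definition.
    run_locally do
      execute :rsync, '-az', 'pdfs/', "#{host.user}@#{host.hostname}:#{remote_dir}/"
    end
  end
end
```

It would be run with something like cap production deploy_pdfs, either by hand or hooked into the normal deploy flow.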

We’re sticking the files into the ‘shared/pdfs’ directory created by capistrano on the app servers. Locally, we have them sitting in a ‘pdfs’ directory in the root of the rails app. This might seem inconsistent (and it is), but the reason is due to limitations/security restrictions with X-Sendfile.

2. Serving the files with X-Sendfile/Apache to let Rails get on with other things

So Rails provides a helpful send_file method. Awesome! Just password protect a controller action and then send_file away! Oh but wait, that will tie up our expensive/heavy ruby processes sending files. Fine for Dev, but not so great for production. The good news is we can hand this work off to our lightweight Apache/nginx processes. Since I use Apache/Ubuntu, that’s what I’ll cover here, but the nginx setup is similar. Using the X-Sendfile header, our rails app can tell the web server a path to a file to send to the client.

How to set up Apache & Rails for X-Sendfile

Ok let’s get Apache rolling:

You need to whitelist the path that files can be sent from, and it can’t be a soft link. It needs to be an absolute path on disk. Hence we are using the ‘shared’ directory capistrano creates, rather than a soft linked directory in ‘current’. X-Sendfile header itself lets you send files without a path (just looks for the files in the whitelisted path), but unfortunately we can’t use this as Rails send_file checks the path exists and raises if it can’t find the file.

In your rails app in production.rb add:

  # Hand off to apache to send big files
  config.action_dispatch.x_sendfile_header = 'X-Sendfile'

In development you probably don’t need this since you won’t be using a server that supports x_sendfile. Without this config, rails will just read the file on disk and send it itself.

In a controller somewhere, just use send_file to hand off to Apache. You’ll need to specify the absolute path to the file in the ‘shared’ directory. I’d suggest putting the path to the shared directory in an environment variable or config file (however you do this usually for your app per environment), and then just append the relevant filename on to it. Also, remember to validate the requested filename (I use a whitelist of filenames to be sure), to avoid the possibility of malicious requests getting sent private files they shouldn’t from elsewhere on disk.
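
A minimal sketch of the filename check and path building described above. The PDF_DIR environment variable, the fallback path, the ALLOWED_PDFS list and the pdf_path_for helper are all hypothetical names chosen for illustration.

```ruby
# Hypothetical whitelist of the files that may be served
ALLOWED_PDFS = %w[guide.pdf handbook.pdf].freeze

# Returns the absolute path for a whitelisted filename, or nil for
# anything else (including path traversal attempts like "../secret").
def pdf_path_for(filename, shared_dir: ENV.fetch('PDF_DIR', '/var/www/myapp/shared/pdfs'))
  return nil unless ALLOWED_PDFS.include?(filename)

  File.join(shared_dir, filename)
end

# In a login-protected controller action (sketch only):
#   path = pdf_path_for(params[:filename])
#   if path
#     send_file path, type: 'application/pdf', disposition: 'inline'
#   else
#     head :not_found
#   end
```

Because only exact whitelist matches get through, the requested name is never used to build a path outside the shared directory.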

BigDecimal fix for Rails 4 with Ruby 2.4

By James

On September 26, 2017

In Ruby / Rails, Technical

Rails 4.2.9 works well with Ruby 2.4.2 except for an incompatible change with invalid decimal values and String#to_d. BigDecimal was changed in 1.3.0 (which ships with Ruby 2.4) to throw an exception on invalid values passed to the constructor. This also impacted String#to_d and caused it to raise an exception when it didn’t previously.

If string_amount is an empty String, code like this:

string_amount.to_d

will throw an exception like:

ArgumentError:
  invalid value for BigDecimal(): ""

rather than returning 0 as it did on ruby 2.3 and below.

This is handled in Rails 5 but the change has not been ported back to Rails 4.

If you’re on Rails 4, you’ve got a few options. You could add a monkey patch in your application.rb or initializer:

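require 'bigdecimal'
require 'bigdecimal/util'

class String
  # Return 0 rather than raising when the string is not a valid decimal
  def to_d
    BigDecimal(self)
  rescue ArgumentError
    BigDecimal(0)
  end
end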

Or alternatively upgrade the BigDecimal version in your bundle. Ie, add to your Gemfile:

gem "bigdecimal", ">= 1.3.2"

Bundler will then ensure you get the later version of BigDecimal rather than the one that ships with Ruby 2.4. The behaviour was fixed in BigDecimal 1.3.2, but Ruby does not include that version yet. See the Ruby issue for more details.
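
If neither option suits, an explicit helper at the call sites avoids patching String at all. This is just a sketch of that alternative; the decimal_or_zero name is hypothetical and it is intended for String input.

```ruby
require 'bigdecimal'

# Hypothetical helper: converts a string to BigDecimal, falling back to 0
# for values that BigDecimal() rejects (such as "" or nil).
def decimal_or_zero(value)
  BigDecimal(value)
rescue ArgumentError, TypeError
  BigDecimal(0)
end
```

The trade-off is that every call site has to use the helper instead of to_d, which is more typing but keeps the behaviour change visible.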

WordPress.com vs Self Hosted WordPress

By James

On September 5, 2017

In Technical

Previously, this blog was hosted on a Virtual Private Server (VPS) with Linode. No problem there, it worked for many years. However, the VPS has other uses, and this blog needed to move.

WordPress.com
WordPress.com looked like the most attractive option. Automatic updates, automatic backups, automatic CDN etc. So I exported my blog and imported into WordPress.com with a $5 AUD/month personal account. This price gives you a limited setup (no custom plugins), but that was OK as their JetPack plugin provided anti-spam for comments and a form plugin to make a contact form. The price was reasonable too. Alas, the import was very slow and eventually imported only 50% of my media (about 40MB of 80MB). URLs to images were also incorrect and showed up as broken even for media that was imported, as the URL structure changed on WordPress.com (links go to a CDN instead). WordPress.com support promised that the image URLs would be updated by a batch job during the night. This never happened. On the subject of the missing media, support told me I needed to upload my media in ~5MB chunks, as the import process could time out but didn’t give any output to the user or support. Ie, they recommended I do 16 manual exports and imports from old blog to new to avoid timeouts. At this point, things felt too creaky to continue, and I asked for a refund, which WordPress.com support kindly agreed to.

Hosting on WordPress.com has some other minor differences. The comment form shows up differently (maybe from JetPack), maybe a little less attractive than my theme, but not a show stopper. There are two admin consoles, the normal WordPress one, and their custom one. There is some overlap between them. It is a workable setup. You can’t add Google Analytics though without upgrading your plan, and there’s no way to get everything out easily (you can get out your WordPress XML export, but not media files or settings).

In conclusion, WordPress.com was mainly unsuccessful for me as it was unable to import my old blog. With no access to files or database, it was also impossible for me to manually correct any issues. Perhaps if you were starting a new blog, it could be a good option.

Shared Account with InMotion
After some research online, I went with inmotion.com shared hosting for WordPress. They promise to update your WordPress automatically (I also installed an additional plugin to keep the other plugins up to date). They offer a full and automated WordPress install (all plugins, themes etc available), backup functionality (though you have to ask them to restore), and full ssh access. Speed is good so far, and they are very reasonably priced ($2.95 USD/month if you sign up for 3 years with this coupon). They offer a 90 day money back guarantee.

I tried the same import from my old blog. It was more successful than WordPress.com, but it didn’t get all the media and some links were still broken. I rsync-ed the missing files across (handy to have ssh access) and ran a quick DB update to fix links. InMotion even had a support page for this. Their online support was helpful when I needed to know the IP of my shared host.

So far so good, my blog is now set up and working and I’d recommend InMotion.

Scalability / Performance Tip – Use Cloudflare
I use Cloudflare to provide a CDN, HTTPS and reverse proxy caching for my blog. Their free account is sufficient and means that if my blog ever gets super busy, the load will be handled by Cloudflare, rather than overloading my shared host. Their CDN setup also means the blog loads faster from non-US locations.

Solving Calendar Sync problems on Android 7 Nougat

By James

On August 26, 2017

In Android, Personal, Technical

Recently, the phone calendar on my Samsung Galaxy S6 stopped synchronising with Google calendar. When I went to Google Accounts Sync in Settings, Calendar had the spinner next to it, but it didn’t spin. Meanwhile, the calendar didn’t sync, and the battery was being chewed through more quickly than usual.

How to fix? Well, the first thing I tried was deleting Calendar Storage. This worked for a day or two, and then the problem reoccurred. Next, I tried deleting all my Google accounts and adding them back. That worked for about a week.

Finally, by a stroke of good luck, I was looking at the sync screen when I’d just plugged in the phone to charge – it synced fine. A lead at last! Likely something related to power settings!

I’d already poked around in the usual Device Maintenance > Battery > App power monitor screen, and all Calendar related apps were in the ‘Unmonitored’ list so wouldn’t be put to sleep. This section wasn’t the cause of the problem. I finally found the solution in an additional hidden set of power saving options.

So.. To fix, go to Settings > Device maintenance > Battery > Battery Usage button > Vertical … at top right > Optimise battery usage. Choose All Apps from the drop down. Then disable Optimise for Calendar storage and your calendar apps. Voila! Finally calendar will sync reliably again!

UPDATE – Other syncs

I also had a similar issue with Google Sheets, Google Docs and Google Drive Sync. The same change in ‘Optimise battery usage’ settings for each of these fixed their sync as well.

Talk: Winning at HTTPS

By James

On October 18, 2016

In ALT.NET, Cloud, Design / Architecture, Talks, Technical

For the first time in a little while, I’ll be giving a talk at the Sydney ALT.NET user group:

HTTPS is ever more pervasive, with few sites still using plain HTTP. Want to be the guy or girl on the team who actually understands HTTPS, can set up certificates and fix issues that come up? Sometimes this is left to an ops team, but there are benefits and impacts that cannot be ignored in development.

James has migrated several sites from HTTP to HTTPS and has tips and tricks to share.

Tues 25 October from 6pm at ThoughtWorks Sydney office, Lvl8 51 Pitt St, Sydney.
RSVP on Meetup (for pizza and beer!)

You can find the slides from the night here.

Talk: Add a billion row data warehouse to your App.. with Redshift, sql and duct tape!

By James

On June 16, 2014

In ALT.NET, Cloud, Design / Architecture, Talks, Technical

Come along to Sydney ALT.NET for a BIG data night.

I’ll be giving a talk on Redshift:

Started to hit the point where your transactional database is not the right place for running reporting queries and experimental data science? Keen to chuck in more data from web logs, CRMs, facebook, etc so you can start learning more about your users? Come along to Sydney ALT.NET on June 24th to see an easy way to do it with AWS Redshift, mapping SQL and some simple scripting duct tape.

We also have a co-presented talk on Azure’s Hadoop implementation, HD Insights and Power BI: The Power of the Elephant in the Microsoft Cloud given by Jibin Johnson and Simon Waight from the Azure User Group.

From 6pm at ThoughtWorks Sydney office Lvl8 51 Pitt St, 24 June 14.
Remember to get your free ticket. See you there!