

Solving mysterious null values in Mysql date columns, stored by a Rails app

A few months ago, while doing some detailed database backup and restore testing, I discovered that out of millions of records with user-entered dates, a handful had null dates in a database column. I scratched my head for a while and couldn’t work out how this had happened, as the field is validated in Rails against empty/null values.

Just today, I got an exception report from a different part of the system, which runs a query based on the user-entered date, and it revealed the source of this extremely rare problem! So.. drum roll..

Accidentally, a user had entered a date with the year 20223.

This is a valid date in Ruby/Rails, but too big to be stored in a MySQL DATE column (which only supports years up to 9999), so it had ended up (silently) being stored as null!

Easily fixed by limiting the date range a bit!
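For example, a validation along these lines (the model and attribute names here are illustrative) keeps out-of-range years from ever reaching the database:

class Experiment < ApplicationRecord
  validates :performed_on, presence: true

  # MySQL's DATE type only covers years 1000-9999, so reject
  # anything outside a sensible range before it is silently lost
  validate :performed_on_within_sensible_range

  private

  def performed_on_within_sensible_range
    return if performed_on.blank?
    unless performed_on.between?(Date.new(1900, 1, 1), Date.new(2100, 12, 31))
      errors.add(:performed_on, "year must be between 1900 and 2100")
    end
  end
end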

Testing performance before upgrading from Mysql 5.7 to Mysql 8

Mysql 5.7 is reaching end of life in October, so it is becoming important to upgrade. I am using Percona Mysql and the upgrade process with Apt and the Percona repositories is simple. Mysql automatically upgrades files and tables to suit the new version too. There are config file changes required, and quite a lot of defaults changed in Mysql 8, but that is not the focus of this post.

Reading up on the performance differences between Mysql 5.7 and Mysql 8, the story is mixed. In my case, Mysql 8 was definitely slower out of the box for queries that were not well indexed: often around 50% slower on larger selects. With well-indexed queries, the difference was negligible. I kept looking for a config setting that would bring back the old Mysql 5.7 performance, but found no way to make Mysql 8 perform as well out of the box. The solution I found was to add more indexes. These indexes had not been required for good performance in 5.7; in Mysql 8, they proved vital.
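For illustration, additions of this kind go into a Rails migration like the sketch below (the table and column names are hypothetical, not the ones from my app):

class AddMysql8Indexes < ActiveRecord::Migration[7.0]
  def change
    # Composite indexes to cover larger selects that Mysql 8
    # runs noticeably slower than 5.7 without them
    add_index :orders, [:account_id, :created_at]
    add_index :order_lines, [:order_id, :product_id]
  end
end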

Considering the difference in performance between versions, and the additional indexes I had put in place, I wanted to test performance before upgrading my production setup to Mysql 8. A good way to do this would have been to mirror incoming requests between real production and a cloned production server, and watch the comparative performance. This is arguably the best option, but it requires quite a bit of infrastructure work: HTTPS has to be offloaded from the production server to something that mirrors the requests to the clone and ignores its responses. AWS supports traffic mirroring, but I decided it wasn’t the best option in my situation, as it would have required significant infrastructure and production changes for my setup.

The alternative that worked in my case was to record all database queries during a period of medium-high site load, and then replay them on clone servers to test the relative performance between database server versions and index additions. Tools exist to do this, but they are a bit dated.

If you’re interested in using this approach, first record all queries to a file at a time of significant load, using the Slow Query Log on your production server. Simultaneously, take a snapshot of the server that you can use for creating clone servers at this point in time.

To test relative performance on clones, I used Percona Playback. This is a very handy tool, but old and unmaintained; still, it was the best option I found. To make Playback work, you need to run it under CentOS 7, which is easily achieved using Docker. I tried updating the code to run on modern Ubuntu, but it was too big a job: the libraries it depended on had changed too much.

To install, set up a data directory and grab and build my Dockerfile (I updated it a little from @lichnost’s version):

mkdir -p playback-docker/data
cd playback-docker
curl -L https://raw.githubusercontent.com/jcrisp/percona-query-playback/master/Dockerfile > Dockerfile
docker build -t percona-playback .

Transfer your query log (eg, MyServerName-slow.log) to the clone. I’d also recommend taking a snapshot at this point, and creating new clones for performance testing from this new snapshot, since it now has Percona Playback installed and the query log file available.

To replay the queries in full on a cloned database server:

docker run --mount type=bind,source=$PWD/data,target=/app/data  --mount type=bind,source=/var/run/mysqld/mysqld.sock,target=/var/lib/mysql/mysql.sock percona-playback --mysql-max-retries 1 --mysql-host localhost --mysql-port 3306 --mysql-username <USERNAME> --mysql-password <PASSWORD> --mysql-schema <DATABASE_NAME> --query-log-file /app/data/MyServerName-slow.log

Note that this is “destructive”. All updates/inserts/deletes will be applied, so the run is only fully valid once. Re-running it will lead to errors like duplicate ID inserts or deletes of records that don’t exist any more.

For a non-destructive performance test, you can filter out just the selects using another tool, pt-query-digest, which is happily still maintained and can be installed as part of the percona-toolkit package. To make a select-only log file:

cat MyServerName-slow.log |  pt-query-digest --filter '$event->{arg} =~ m/^select/i' --output slowlog > selects.log

For performance comparisons, I ran the non-destructive select-only version first to warm up the database before running the full destructive version as the real test.

This approach gave me confidence that, with additional indexes, Mysql 8 would be faster than Mysql 5.7 on a real production load for my app.

Percona MySQL: Collecting All Queries with the Slow Query Log

If you want to collect all queries running on your database for a period of time, for analysis or replay, the simplest way to do this is using the Slow Query Log.

To start collecting:

SET GLOBAL slow_query_log=1;
SET GLOBAL long_query_time=0;
SET GLOBAL log_slow_admin_statements=1;
SET GLOBAL log_queries_not_using_indexes=1;
SET GLOBAL slow_query_log_always_write_time=0;
SET GLOBAL slow_query_log_use_global_control="all";

To stop collecting:

SET GLOBAL slow_query_log=0;
SET GLOBAL long_query_time=10;
SET GLOBAL log_slow_admin_statements=0;
SET GLOBAL log_queries_not_using_indexes=0;
SET GLOBAL slow_query_log_always_write_time=10;

The statements will be logged to your mysql data directory, into a file named after the hostname with -slow.log on the end. For example:

/var/lib/mysql/MyServerName-slow.log

The performance degradation from logging all queries was not significant in my case, with my site under medium-high load.

ChatGPT Programming Test

I was writing a little function in Ruby to count the working days between two dates, and thought I’d ask ChatGPT to have a go at it.

It is easy to read, explicit, and fairly idiomatic Ruby (though not concise). It is not the most performant implementation, but nothing terrible. It also mostly works, but has a bug in some cases.

Here is the code:

def working_days_between(start_date, end_date)
  # Make sure the start date is before the end date
  start_date, end_date = end_date, start_date if start_date > end_date
  
  # Calculate the number of days between the two dates
  days = (end_date - start_date).to_i
  
  # Calculate the number of weekends between the two dates
  weekends = ((start_date..end_date).count { |date| date.saturday? || date.sunday? })
  
  # Subtract the weekends from the total number of days to get the number of working days
  working_days = days - weekends
  
  return working_days
end

If you have the start/end date on a weekend, then you get a negative answer. Eg,

working_days_between(Date.parse("Sat, 04 Mar 2023"), Date.parse("Sun, 05 March 2023"))
 => -1

It is because the weekend count includes both the start date and the end date, while the day subtraction (end_date - start_date) gives only 1. Ie, working_days = 1 - 2 = -1

A human could easily have made the same mistake, mind you.

A better / simpler implementation is:

(from_date...to_date).count { |date| date.on_weekday? }

Note the 3 dots (...) for the date range, which does not include the end date.
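Wrapped up as a standalone function (using plain Date predicates, since on_weekday? comes from ActiveSupport), the weekend example above now returns 0:

require "date"

def working_days_between(from_date, to_date)
  from_date, to_date = to_date, from_date if from_date > to_date
  # The three-dot range excludes to_date, so a Sat..Sun span counts no weekdays
  (from_date...to_date).count { |date| !date.saturday? && !date.sunday? }
end

working_days_between(Date.parse("Sat, 04 Mar 2023"), Date.parse("Sun, 05 Mar 2023"))
 => 0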

Later, I tried asking ChatGPT to regenerate the answer multiple times. It gave me quite a different version every time – some versions with bugs, some with no functions, some with support for public holidays, etc.

Talk: Credit cards / Gateways

Tune in to the next Sydney ALT.NET meetup on Tuesday (30 Dec)! I’ll be giving a talk from around 6pm.

Accept credit cards: Gateways, architectures, code, and… money!

Have you been thinking of accepting credit card payments for your new clever MVP, startup, or maybe even in your day job? Well, you’re in luck! James will give you a primer on how to do it simply and securely, based on his recent journey into the Gateway jungle.

Please RSVP on meetup and join the Twitch stream for some fun!

Rails ActiveModel, with nested objects and validation

So maybe you have a model that is not backed by a database table? ActiveModel is meant to cover this scenario and give you the usual ActiveRecord goodness and validation.

But the story gets much harder if you want to have nested objects. In a normal ActiveRecord::Base backed model, you can simply say:

accepts_nested_attributes_for :order_lines

and Rails will manage form-submitted nested parameters for you.

Life is not so simple, or as well documented, with nested objects on ActiveModel. accepts_nested_attributes_for is not available, but some of the underpinnings are.

So enough talk, how do you make it work? I’ll show you with an Order / OrderLines example.
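A minimal sketch of the idea (the attribute names are illustrative):

class OrderLine
  include ActiveModel::Model

  attr_accessor :product, :quantity

  validates :product, presence: true
end

class Order
  include ActiveModel::Model

  attr_accessor :customer_name
  attr_reader :order_lines

  validates :customer_name, presence: true
  validate :order_lines_are_valid

  def initialize(attributes = {})
    @order_lines = []
    super
  end

  # Rails' nested form parameter handling looks for a setter
  # with this exact *_attributes= naming convention
  def order_lines_attributes=(attributes)
    @order_lines = attributes.values.map { |attrs| OrderLine.new(attrs) }
  end

  private

  # Pull child errors up onto the parent, so valid? on the Order
  # fails and the messages show at the top of the form
  def order_lines_are_valid
    order_lines.each do |line|
      next if line.valid?
      line.errors.full_messages.each do |message|
        errors.add(:base, "Order line: #{message}")
      end
    end
  end
end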

Note the very special name: order_lines_attributes=. This hooks into the Rails handling of nested form parameters. Also the valid? method propagates the child errors up to the parent object, so that they show up at the top of the form.

Now how do you do the nested form? It’s similar to normal database backed nested records.
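A sketch under the same assumptions (orders_path is illustrative; the url option is needed because the model has no table-backed route of its own):

<%= form_with model: @order, url: orders_path do |form| %>
  <%= form.text_field :customer_name %>

  <%# Because Order defines order_lines_attributes=, fields_for treats
      order_lines as a nested attributes association and emits inputs
      named order[order_lines_attributes][0][product] and so on %>
  <%= form.fields_for :order_lines do |line_form| %>
    <%= line_form.text_field :product %>
    <%= line_form.number_field :quantity %>
  <% end %>

  <%= form.submit %>
<% end %>

In the controller, permit the nested keys with something like params.require(:order).permit(:customer_name, order_lines_attributes: [:product, :quantity]).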

Hope this helps; it is not documented anywhere I could find, and was worked out mainly through reading the Rails source.

Rails: Removing error divs around labels

Rails makes it very easy to style fields with errors on your form. Unfortunately, the same error DIV with class ‘.field_with_errors’ is applied around labels as well as inputs/checkboxes/selects. This tends to mess up the layout and double up the error display. To fix this, you can configure the field_error_proc in your application.rb to ignore labels. The code below calls the original error decoration proc for all types of tags except labels.
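A sketch of that configuration, keeping a reference to the default proc so every other tag still gets the standard wrapper:

# In config/application.rb
default_field_error_proc = ActionView::Base.field_error_proc

config.action_view.field_error_proc = proc do |html_tag, instance|
  if html_tag.start_with?("<label")
    html_tag # labels pass through undecorated
  else
    # everything else keeps the default <div class="field_with_errors"> wrapper
    default_field_error_proc.call(html_tag, instance)
  end
end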

Lightning Talks at Sydney ALT.NET (30 June)

At next Sydney ALT.NET meetup (30 June), I’ll be giving a couple of short talks from around 7pm:

Building a Rust microservice, dockerising and deploying on Google Kubernetes Engine
Walk through of a little side project, showing you how this tech works and fits together.

Terminal COVID-19 tracker in your postcode
Brought to you in one line, with a little investigation in the Chrome inspector, and some simple terminal commands piped together.

Feel free to tune in on the Twitch stream for some fun!


Now available to watch on YouTube.
You can check out the repo for Rust & k8s on GitHub.
And the one liner for COVID-19 tracking is:

curl https://nswdac-covid-19-postcode-heatmap.azurewebsites.net/datafiles/data_Cases2.json | ruby -e 'puts STDIN.gets.gsub("},", "},\n")' | grep <your postcode>

Importing Excel 365 CSVs with Ruby on OSX

Up until 2016, Excel for the Mac provided a handy CSV export format called “Windows CSV”, which used iso-8859-1/Windows-1252 encoding. It was reliable, handled simple extended characters like degree signs, and could be read in Ruby with code like:

CSV.foreach(file, encoding:'iso-8859-1', headers:true, skip_blanks:true) do |row|
    ....
end

Unfortunately, support for this format was dropped in Excel 365. Many RiskAssess data files were in this format, as the earlier versions of Excel did not properly export to valid UTF-8.

Excel 365 now has a Save As option of “CSV UTF-8 (Comma-delimited)”. This is UTF-8.. with a few twists! Sometimes the file starts with a UTF-8 encoded first character, and sometimes not. Sometimes it starts with a BOM (byte order mark), another invisible character. According to Wikipedia, “The Unicode Standard permits the BOM in UTF-8 but does not require or recommend its use”. This makes it trickier to import. Code like this:

CSV.foreach(file, encoding:'UTF-8', headers:true, skip_blanks:true) do |row|
    ....
end

will handle the first 2 possibilities, but not the BOM. The BOM ends up as an invisible character in the data, causing trouble. It is possible to import with encoding ‘bom|utf-8’, which handles this case. Another option is to run the data through the dos2unix command (easily installed with brew), which does general tidying, including removing the unnecessary BOM.
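With that encoding string, the BOM is consumed by the IO layer instead of leaking into the first header, mirroring the earlier snippets:

CSV.foreach(file, encoding:'bom|utf-8', headers:true, skip_blanks:true) do |row|
    ....
end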

Also watch out: the “Windows CSV” format previously used “\r” to denote new lines inside cells, while the new UTF-8 export uses “\n”.
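If you have a mix of old and new exports around, a small normalization step (a hypothetical helper) smooths over the difference:

# Convert old-style "\r" in-cell newlines to "\n" so downstream
# code only has to handle one convention
def normalize_cell_newlines(value)
  value.is_a?(String) ? value.gsub("\r", "\n") : value
end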

Fixing ‘Invalid query parameters: invalid %-encoding’ in a Rails App

Sometimes users manually edit query strings in the address bar, and make requests that have invalid encodings. Unfortunately Rails does not handle this neatly and the exception bubbles up. Eg,

ActionController::BadRequest
ActionView::Template::Error: Invalid query parameters: invalid %-encoding (%2Fsearch%2Fall%Forder%3Ddescending%26page%3D5%26sort%3Dcreated_at)

from:
/rack/lib/rack/utils.rb:127:in `rescue in parse_nested_query'

[Note: This was with Passenger, which passed the request through to the app – your mileage may vary with other servers]

In the case of my app, these corrupted query strings are not that important, but users are receiving 500 server error pages. Sometimes they end up with a bad query string URL cached in browser history, so they keep going back to it rather than to the home page.

A simple solution that gives a good user experience for my app is to drop the query string completely when it has an invalid encoding. See a sketch of my Rack middleware implementation below.
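The middleware parses the query string eagerly and blanks it out when Rack rejects the encoding (the class name and insertion point are illustrative):

# app/middleware/drop_invalid_query_strings.rb
class DropInvalidQueryStrings
  def initialize(app)
    @app = app
  end

  def call(env)
    begin
      # Parse eagerly; Rack raises on invalid %-encoding
      Rack::Utils.parse_nested_query(env["QUERY_STRING"].to_s)
    rescue ArgumentError
      # Rack's InvalidParameterError subclasses ArgumentError; drop the
      # bad query string and let the request continue without it
      env["QUERY_STRING"] = ""
    end
    @app.call(env)
  end
end

Register it early in the stack, eg with config.middleware.insert_before 0, DropInvalidQueryStrings in config/application.rb.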

