Data too long for column error after upgrading to Rails 4

After upgrading Rails 3.2 to 4.0 we had an interesting and yet unexpected new error:

ActiveRecord::StatementInvalid: Mysql2::Error: Data too long for column 'target_column'...

It turned out that MySQL strict mode was not enabled in our Rails 3 project. And as a consequence MySQL silently truncated the content of all string or text type columns in case if they didn’t fit in the size of the given data type. In Rails 4 strict mode is enabled by default, which is a very useful option and I definitely recommend to turn it on to avoid nasty bugs to hide.

So as a hotfix I decided to truncate the content to be aligned with the previous functionality.

I started with a simple test:

# spec/models/my_model_spec.rb
# target_column is a text field of the my_model model
describe '#truncate_target_column_to_fit_into_max_storage_size' do
  context 'when the target_column field is blank' do
    it 'throws no error' do
      described_class.create(target_column: nil)
    end
  end

  context 'when the target_column field is too long' do
    it 'truncates it' do
      max_length_of_target_column = 65_535
      target_column = 'a' * (max_length_of_target_column + 1)

      model_instance = described_class.create(target_column: target_column)
      expect(model_instance.target_column.length).to eq(max_length_of_target_column)
    end
  end
end

Then the implementation came along:

class MyModel < ActiveRecord::Base
  before_save :truncate_target_column_to_fit_into_max_storage_size

  private

  def truncate_target_column_to_fit_into_max_storage_size
    return if target_column.blank?

    max_char_length = 65_535

    self.target_column = target_column[0, max_char_length]
  end
end

As a result the error rate dropped dramatically, but not 100%.

I found an example from the logs, which had less than 65 000 characters. I was one step closer to start tearing my hair out. As my last desperate attempt I tried target_column.bytesize which returned way more than 65 536, which is the storage limit in bytes of the text data type in MySQL. Hmmm, sudden light flooded my brain: it is a spanish product and spanish people like to use chars like ç which is not an ASCII character and needs more space to be stored.

A quick test in IRB proved my wild hypothesis.

'c'.bytesize #=> 1
'ç'.bytesize #=> 2

Meaning truncating upon character length is not enough if the content has non-ASCII chars.

Let’s add a test for this new case:

# spec/models/my_model_spec.rb
describe '#truncate_target_column_to_fit_into_max_storage_size' do
  context 'when the target_column field is blank' do
    it 'throws no error' do
      described_class.create(target_column: nil)
    end
  end

  context 'when the target_column field is too long' do
    context 'when it has only ascii chars' do
      it 'truncates it' do
        max_length_of_target_column = 65_535
        target_column = 'a' * (max_length_of_target_column + 1)

        cre_request = described_class.create(target_column: target_column)
        expect(cre_request.target_column.length).to eq(max_length_of_target_column)
      end
    end

    context 'when it starts with two unicode chars' do
      it 'truncates it and removes two more normal characters to compensate the extra space unicode needs to store chars' do
        max_length_of_target_column = 65_535
        target_column = 'çç' + ('a' * max_length_of_target_column)

        cre_request = described_class.create(target_column: target_column)
        expect(cre_request.target_column.length).to eq(max_length_of_target_column - 2)
      end
    end
  end
end

And don’t forget to pimp up the hotfix:

class MyModel < ActiveRecord::Base
  before_save :truncate_target_column_to_fit_into_max_storage_size

  def truncate_target_column_to_fit_into_max_storage_size
    return if target_column.blank?

    max_field_size_in_bytes = 65_535

    self.target_column = target_column[0, max_field_size_in_bytes]

    while target_column.bytesize > max_field_size_in_bytes
      self.target_column = target_column[0..-2]
    end
  end
end

And Taa Daa, the errors disappeared into thin air. In the real application these fields we had problems with are not filled by the user, but our automated processes. Meaning, after a fresh Rails 4 deploy this was quicker than talking with other departments and changing the business logic.

Update:
Rails 6 added String#truncate_bytes to truncate strings to a maximum bytesize without breaking multibyte characters or grapheme clusters 🐿

Comments