A Disturbing Trend

Background

I recently attended a hands-on training class for the latest version of MATLAB/Simulink®. During the training session, somebody noticed that Simulink files now have a .slx suffix. My first thought was "How many Simulink models will inadvertently end up as corrupted Excel spreadsheets because people mistakenly put a .xls suffix on them?" But looking a little deeper, I realized that there was a fundamental change here, and one I think deserves some attention since it represents, in my opinion, a disturbing trend.

The issue here is not with changing the file extension or any potential confusion with Excel. The real issue is with the format of the underlying files. In the Simulink case, this appears to be a "one step forward, two steps back" kind of thing. Upon closer inspection, I realized that these .slx files are, in fact, just zip files. The good news here is that these zip files, in turn, contain XML files, so Simulink content can now more easily be probed and inspected by off-the-shelf tools (although they bundle some additional complexity in the form of Open Packaging Conventions). I'm not sure to what extent The MathWorks will support users working directly with the underlying content instead of using the APIs. Still, it seems like a step in the right direction over the previous file format which, while raising all the same concerns over direct access, also required a special parser.
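
To make the "just zip files" observation concrete, here is a minimal sketch using Python's standard zipfile module. It lists the XML members of a zip-based model file. The demo archive and its member names (model/diagram.xml and so on) are invented for illustration; they are an assumption, not the actual layout of a .slx file.

```python
import io
import zipfile


def list_xml_members(archive):
    """Return the sorted names of XML members in a zip-based model file.

    `archive` may be a path or a file-like object.
    """
    with zipfile.ZipFile(archive) as zf:
        return sorted(n for n in zf.namelist() if n.lower().endswith(".xml"))


def build_demo_archive():
    """Build an in-memory zip with hypothetical member names (not a real .slx)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("model/diagram.xml", "<diagram><block name='Gain'/></diagram>")
        zf.writestr("metadata/info.xml", "<info/>")
        zf.writestr("thumbnail.png", b"\x89PNG")
    buf.seek(0)
    return buf


if __name__ == "__main__":
    for name in list_xml_members(build_demo_archive()):
        print(name)
```

The same function should work on any real zip-based file if you pass it a path, but whether a vendor supports poking at the contents this way is a separate question.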

So XML is, in my opinion, a step forward. But the "two steps back" is the fact that all of this parseable textual content is being stuffed away in a binary format that is opaque to version control tools. And The MathWorks aren't the only ones doing this in the modeling community.

Why do I care?

Because it makes version control a serious pain in the ass.

We are in the midst of a renaissance of high-quality and free version control tools. We have a solid incumbent in Subversion, a centralized version control system. We also have two pretty heavy-hitting newcomers from the distributed version control system (DVCS) camp in Git and Mercurial. These are all great tools and they should become an integral part of any modeler's approach to managing their models. After all, models are software (whether people think about them that way or not), so why not use the best possible tools out there, especially when they are battle-proven and free?

By implementing this "text files wrapped in a zip file" format, I imagine that tool vendors are trying to hide complexity and ease distribution. I think this is a mistake.

Don't get me wrong, I think using zip files as a format for distribution is a great idea. I've championed it myself in proposals to the Modelica community and I support this approach in the FMI standard as well. But the key difference here is that these are distribution formats, not the native model representations. Giving people the option of packaging their content up for distribution in a single file (e.g. the Java jar file format) is great. But forcing them to use that as the storage format for their source code is, frankly, a terrible idea.

What to do?

Just as with Java jar files, these zipped formats should be supported, but they should be primarily for distribution. There should also be a format that elaborates the underlying content into a directory structure full of versionable (i.e. text-based) files. This elaborated format should be the default for saving your work to the file system when building models, and the zipped format should be reserved for redistribution. Supporting such an elaborated format involves nothing more than being able to work with the contents on the file system instead of inside the zip file.
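
As a rough sketch of how little is involved, the two directions (elaborating a zip into a directory tree, and bundling a directory tree back into a zip) can be written with Python's standard library. The function names elaborate and bundle, and the file names in the demo, are my own for illustration; this is not how any particular tool implements it.

```python
import os
import tempfile
import zipfile


def elaborate(archive_path, target_dir):
    """Unpack a zip-based model into a plain directory tree (trusted archives only)."""
    with zipfile.ZipFile(archive_path) as zf:
        zf.extractall(target_dir)


def bundle(source_dir, archive_path):
    """Pack a directory tree into a single zip file for distribution."""
    with zipfile.ZipFile(archive_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for root, _dirs, files in os.walk(source_dir):
            for name in sorted(files):
                full = os.path.join(root, name)
                # Store paths relative to the source directory.
                zf.write(full, os.path.relpath(full, source_dir))


def round_trip_demo():
    """Bundle a tiny hypothetical model directory, unpack it, and return one file's text."""
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "src")
        os.makedirs(os.path.join(src, "model"))
        with open(os.path.join(src, "model", "diagram.xml"), "w", encoding="utf-8") as f:
            f.write("<diagram/>\n")
        archive = os.path.join(tmp, "demo.zip")
        bundle(src, archive)
        out = os.path.join(tmp, "out")
        elaborate(archive, out)
        with open(os.path.join(out, "model", "diagram.xml"), encoding="utf-8") as f:
            return f.read()
```

With something like this, the directory tree is what goes under version control and the zip file is produced only when it is time to hand the model to someone else.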

And another thing...

While we are on the subject, I want to mention one other related point. Don't store version control information in source files. It drives me crazy when tools save fields like "Last Saved" as a time+date stamp inside source files.

Why?

Consider a use case where my colleague and I both check out a model. We both make separate changes to the model that impact different lines in the textual representation. We both save and then try to merge our changes. Our "real" changes don't conflict, and any of the tools I mentioned before could trivially reconcile our versions. But the file also contains this "Last Saved" line, and we each have different (semantically meaningless) information on that line, so we end up with a merge conflict.
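
The scenario can be sketched with a deliberately simplified model of a line-based merge. This is not how Git, Mercurial or Subversion actually merge; it assumes every edit replaces a line in place, and the file contents and the LastSaved field name are made up for illustration.

```python
def changed_lines(base, edited):
    """Indices of lines that differ, assuming edits only replace lines in place."""
    return {i for i, (a, b) in enumerate(zip(base, edited)) if a != b}


def conflicting_lines(base, mine, theirs):
    """Indices of lines that both sides changed, and changed to different text."""
    both = changed_lines(base, mine) & changed_lines(base, theirs)
    return sorted(i for i in both if mine[i] != theirs[i])


# Hypothetical model file: a timestamp line followed by two parameters.
base = ["LastSaved 2012-01-01 09:00", "gain = 1.0", "delay = 0.1"]
mine = ["LastSaved 2012-01-02 10:15", "gain = 2.0", "delay = 0.1"]
theirs = ["LastSaved 2012-01-02 11:30", "gain = 1.0", "delay = 0.5"]

# With the timestamp in the file, both sides touch line 0.
with_stamp = conflicting_lines(base, mine, theirs)

# Without it, the real edits land on different lines and nothing conflicts.
without_stamp = conflicting_lines(base[1:], mine[1:], theirs[1:])
```

Here with_stamp contains only the index of the timestamp line, while without_stamp is empty: the only thing standing between us and a clean merge is the line that carries no modeling information.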

The lesson is: don't put version control information in the files. Let the version control system and the operating system manage that data. By placing this kind of "meaningless" (at least from a modeling perspective) information in the files, you are impeding the user's ability to use quality tools.

Conclusion

Model development is software development and, as a result, can benefit from many of the great tools (version control and otherwise) that help make software developers productive. But in order to fit into the ecosystem effectively, we need to adopt the conventions of software, which means storing source code as text, with the option of bundled, binary formats for distribution.
