My approach to crafting a reference architecture

Reference architecture approach

A reference architecture is a blueprint that provides guidance on how to design, build, and deploy a product or system. Over my time as a technical writer, I have worked closely with:

software engineers,
hardware engineers,
system architects,
product managers and
sales leads

to create reference architectures that outline key product components, their interactions, and best practices, helping customers ensure consistency and scalability.

How I create a reference architecture

I typically include the following sections in any reference architecture:

Core components: High-level building blocks of the system. Ideally, each core component is linked back to a customer goal at this stage; for example, a block storage approach designed specifically to work well in typical high-performance computing environments.
Relationships and interactions: How components communicate and integrate. This helps customers understand how the solution may be integrated into their overall approach.
Standards and best practices: Guidelines for implementation, security, and performance.
Customisation guidance: How to adapt the architecture to specific use cases. Depending on the solution, this may also be where I provide information on compatibility with popular products from other vendors.

Depending on the exact nature of the product, I may also include a use cases section, though often this is provided to customers via solution briefs or case studies supplementing the reference architecture.

Again, depending on the exact product, I may also include a design considerations section. This explores what a customer may want to prioritise in their solution (e.g. data durability, performance) and how the product achieves these goals. It's similar to the core components section, but is written for sales leads who work with key decision makers in a potential client organisation.

I generally set up this structure before beginning consultation with subject matter experts, using the document structure to help focus them on filling in the gaps in my own knowledge about a product. Once the main body of the document is completed, I'll add an executive summary.

Excerpts from my reference architecture work

Below are examples of material I have written for various reference architectures. Some information has been modified to meet client privacy requirements.

Example core component

RGW (RADOS Gateway)

Compatible with the Amazon S3 and OpenStack Swift APIs, RGW (RADOS Gateway) is an HTTP server providing a RESTful gateway to the cluster. Because both APIs share a common namespace, you can write data with one and retrieve it with the other. It scales very well and is incredibly fast, particularly with large object sizes.

Figure: Applications accessing RADOS via RGW

Example best practice

Plan for power and cooling efficiency

Power consumption and cooling will be a large part of your ongoing expenses. Three tips for keeping these costs low when planning your solution:

Design your data centre layout around airflow, at both the floorplan level and the rack level.
Choose storage software that can intelligently share the load across storage media, avoiding “hot spots” created in your data centre by a handful of overworked appliances.
Choose hardware that optimises for cooling efficiency, for example, appliances that use integrated circuits rather than chipsets; these typically require less cooling for their processors.

Through our task-specific approach, [Product 1] has been finely tuned to reduce both power and cooling requirements. As a result, we can pack more petabytes of storage into every rack installed and still be well within the available power and cooling budget.

Example design consideration

Data durability

The durability of a cluster refers to its capacity to ensure data remains consistent and intact. Highly durable clusters are less likely to lose data from bit rot, drive failures, or any other form of data corruption.

Choices that affect your cluster’s durability include:

what software-based methods of data integrity checking will meet your durability requirements,
how much of your storage capacity you are willing to reserve for replication (data backups),
what replication method your cluster uses (some methods are more demanding on compute capacity than others, and some offer additional space-saving algorithms), and
how you will spread data throughout your cluster and across fault domains.
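
To make the capacity trade-off in the list above concrete, here is a minimal Python sketch comparing how much usable space is left under two common protection schemes. The excerpt does not name specific replication methods, so both schemes (full-copy replication and k+m erasure coding), the function names, and the 1,200 TB raw capacity are assumptions chosen purely for illustration.

```python
def usable_replicated(raw_tb: float, replicas: int) -> float:
    """Usable capacity when every object is stored `replicas` times in full."""
    if replicas < 1:
        raise ValueError("replicas must be at least 1")
    return raw_tb / replicas


def usable_erasure_coded(raw_tb: float, k: int, m: int) -> float:
    """Usable capacity when each object is split into k data chunks plus m coding chunks."""
    if k < 1 or m < 0:
        raise ValueError("k must be at least 1 and m must not be negative")
    return raw_tb * k / (k + m)


# Hypothetical cluster with 1,200 TB of raw capacity (illustrative figure only).
raw_tb = 1200.0
print(usable_replicated(raw_tb, 3))        # 400.0 TB usable with three full copies
print(usable_erasure_coded(raw_tb, 4, 2))  # 800.0 TB usable with 4 data + 2 coding chunks
```

In this sketch, three-way replication leaves a third of the raw capacity usable, while the 4+2 erasure-coded layout leaves two thirds. That saving typically costs extra compute when data is written or rebuilt, which is the trade-off the list describes.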

The winning combination of [Product 1] and [Product 2] offers multiple, powerful methods for preserving your data. On its own, [Product 2] can ensure your replicated data is spread carefully across failure domains, but with [Product 1], [Product 2]’s ability to keep your data safe truly shines. And by using [Product feature], the complexity of managing data distribution is simplified, making it easier to ensure your data is intelligently distributed across your cluster.

“[Wendy's] ability to translate complex technical concepts into clear, concise, and accessible content for a variety of audiences has been a great asset to SoftIron.”

Kenneth Van Alstyne, Chief Technology Officer
