A storage product built to solve a niche problem in population-scale genomics has quietly grown into a commercially available filesystem handling petabytes of clinical research data for multiple major pharmaceutical companies. Its creator, Paradigm4, presented the product at the IT Press Tour and is now positioning it as infrastructure for AI and machine learning workloads.

Paradigm4, a Boston-area data infrastructure company with deep roots in bioinformatics, presented flexFS at the IT Press Tour in Boston in June 2026. The product, now in version 1.9, addresses a structural tension in modern cloud computing: the majority of applications — AI training frameworks, HPC pipelines, analytics engines, and increasingly AI agents — communicate via POSIX file interfaces, while economically rational storage at scale means object storage services such as AWS S3, Azure Blob, or Google Cloud Storage. Object storage is inexpensive and elastic, but introduces latency and exposes an API that most software tooling does not natively speak.

The company traces flexFS’s origins to work with UK Biobank licensees who needed to process hundreds of terabytes of genomic data across hundreds of parallel compute nodes simultaneously. Gary Planthaber, Paradigm4’s CTO and the inventor of flexFS, described evaluating existing options — open source solutions including JuiceFS, ObjectiveFS, and S3FS, as well as managed services such as Amazon’s FSx for Lustre and EFS — and concluding that none satisfied the combination of throughput, cost, full POSIX compliance, and suitability for regulated life-sciences data. The company built flexFS internally as a result.

The architecture separates metadata handling from file data I/O. A dedicated flexFS metadata server provides low-latency responses to filesystem operations — directory listings, permission checks, inode updates — while file data is written to and read from object storage in parallel chunks. Each file is split into blocks, each block assigned a unique object identifier; this approach allows parallel retrieval across the object store rather than sequential access through a single server. An optional proxy group — effectively a write-back cache using RAM and NVMe storage — sits between compute instances and the object backend for latency-sensitive workloads. The proxy tier can be configured at the volume level, enabling selective caching for small random I/O patterns while allowing large sequential reads to access the object store directly.
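The block-and-object layout described above can be illustrated with a minimal Python sketch. It is not flexFS code: the in-memory dictionary standing in for an object store, the tiny block size, and the function names `write_file` and `read_file` are all assumptions made for illustration. It shows only the general idea of splitting a file into independently addressed blocks and fetching them concurrently.

```python
import uuid
from concurrent.futures import ThreadPoolExecutor

BLOCK_SIZE = 4  # bytes; deliberately tiny so the example is easy to follow (assumed value)

object_store = {}  # stand-in for an object storage bucket: object id -> bytes


def write_file(data: bytes) -> list[str]:
    """Split data into blocks, store each under its own object id, and
    return the ordered id list (the part a metadata server would track)."""
    block_ids = []
    for offset in range(0, len(data), BLOCK_SIZE):
        block_id = uuid.uuid4().hex  # unique identifier per block
        object_store[block_id] = data[offset:offset + BLOCK_SIZE]
        block_ids.append(block_id)
    return block_ids


def read_file(block_ids: list[str], workers: int = 8) -> bytes:
    """Fetch all blocks concurrently and reassemble them in order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Executor.map yields results in the order of its inputs,
        # so the blocks come back in file order even if fetched out of order.
        blocks = pool.map(object_store.__getitem__, block_ids)
        return b"".join(blocks)


ids = write_file(b"genomic data payload")
restored = read_file(ids)
print(len(ids), restored)
```

Because each block is a separate object, the reads do not have to pass through a single server in sequence, which is the property the architecture relies on for throughput.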

flexFS runs across five deployment configurations: single-region cloud, multi-region and multi-cloud, on-premises, hybrid, and converged, where storage services run co-resident on compute nodes. In the converged configuration, validated jointly with Oracle on OCI, the system demonstrated performance approaching local NVMe levels despite data persisting in networked object storage.

The most detailed production data presented covers a top-five global biopharmaceutical company using flexFS for a Research Data Commons — a global repository for clinical and genomics data. The deployment now holds 1.14 petabytes across more than 160 million files and folders. According to figures presented by Paradigm4, the customer saved $1.44 million in 2025 alone compared with the alternative AWS stack, which would have combined FSx for Lustre, EFS, EBS, and S3. Over the 43-month deployment from September 2022 to March 2026, cumulative savings reached $3.13 million, representing 55 percent of what the AWS-native configuration would have cost. At the current scale of 1.14 petabytes, the complete flexFS plus S3 bill runs to $110,000 per month; the equivalent EFS storage component alone would cost $141,000 monthly.
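The totals behind these figures are not stated directly, but they can be backed out from the numbers quoted. The short Python calculation below is a rough sketch: it assumes the 55 percent figure applies to the full 43-month AWS-native cost, treats 1.14 petabytes as 1,140 decimal terabytes, and inherits the rounding in the published figures, so the results are approximations rather than Paradigm4's own numbers.

```python
cumulative_savings = 3.13e6  # USD over 43 months (from the presentation)
savings_share = 0.55         # savings as a share of the AWS-native cost
months = 43

# If savings are 55 percent of the AWS-native bill, the bill itself follows.
implied_aws_cost = cumulative_savings / savings_share
implied_flexfs_cost = implied_aws_cost - cumulative_savings
average_monthly_savings = cumulative_savings / months

flexfs_monthly = 110_000     # USD, flexFS plus S3 at current scale
efs_monthly = 141_000        # USD, EFS storage component alone
capacity_tb = 1_140          # 1.14 PB in decimal terabytes (assumption)

monthly_gap = efs_monthly - flexfs_monthly
flexfs_per_tb = flexfs_monthly / capacity_tb

print(f"Implied AWS-native cost over {months} months: ${implied_aws_cost / 1e6:.2f}M")
print(f"Implied flexFS cost over {months} months:     ${implied_flexfs_cost / 1e6:.2f}M")
print(f"Average savings per month:                 ${average_monthly_savings:,.0f}")
print(f"Current monthly gap versus EFS alone:      ${monthly_gap:,}")
print(f"flexFS plus S3 per TB per month:           ${flexfs_per_tb:.2f}")
```

On these assumptions the AWS-native stack would have cost roughly $5.69 million over the period, against roughly $2.56 million for flexFS plus S3.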

The cost advantage, according to Paradigm4, is structural rather than a pricing concession. FSx for Lustre provisions storage in 2.4 TiB increments and cannot be reduced in size without migrating data — a process the company notes caused extended researcher downtime at the customer site. FSx also links throughput capacity to deployed storage, meaning organizations must overprovision storage to obtain the throughput they need. Over 43 months, Paradigm4 calculates that overprovisioning waste amounted to $332,000 for the pharma customer. flexFS, by contrast, charges for bytes actually stored with no minimum provisioning units, and scales throughput independently of storage.
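A small Python sketch makes the overprovisioning mechanism concrete. Only the 2.4 TiB increment comes from the presentation. The function names and the `tib_for_throughput` input (the capacity that would have to be provisioned purely to reach a throughput target) are hypothetical, and the article gives no throughput-per-TiB ratio, so none is assumed here.

```python
import math
from decimal import Decimal

INCREMENT_TIB = Decimal("2.4")  # provisioning step cited in the article


def provisioned_tib(needed_tib, tib_for_throughput=0):
    """Capacity that must be provisioned when storage comes in fixed
    increments and throughput scales with provisioned capacity.

    needed_tib:         data actually stored
    tib_for_throughput: capacity required only to hit a throughput target
                        (hypothetical input, not a published figure)
    """
    target = max(Decimal(str(needed_tib)), Decimal(str(tib_for_throughput)))
    steps = math.ceil(target / INCREMENT_TIB)  # round up to whole increments
    return steps * INCREMENT_TIB


def wasted_tib(needed_tib, tib_for_throughput=0):
    """Provisioned capacity that holds no data."""
    return provisioned_tib(needed_tib, tib_for_throughput) - Decimal(str(needed_tib))


# 5 TiB of data fits in three increments (7.2 TiB provisioned).
print(provisioned_tib(5), wasted_tib(5))

# If the throughput target needs 20 TiB of capacity, nine increments are required.
print(provisioned_tib(5, 20), wasted_tib(5, 20))
```

Under a pay-per-byte model the equivalent of `wasted_tib` is zero by construction, which is the structural difference Paradigm4 is pointing to.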

Beyond cost, the presentation highlighted several operational properties. flexFS implements point-in-time recovery at no additional charge through a redirect-on-write block allocation scheme: when a block is overwritten, the new data is written to a new object identifier while the old one is retained for the configured retention period. This allows administrators to mount a read-only snapshot of a volume at any past point in time without interrupting running workloads. Server updates pause I/O for under one second; client updates occur via FUSE session handoff with no unmount required. A Kubernetes CSI driver with Helm chart support enables direct volume mounting in pods. An optimized find utility queries the metadata server directly rather than traversing the mounted filesystem, reducing search time on volumes with hundreds of millions of files.
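The redirect-on-write idea can be sketched in a few lines of Python. This is a toy model under stated assumptions, not the flexFS implementation: the `Volume` class, its method names, the integer timestamps, and the pruning rule are all invented for illustration, and writes are assumed to arrive in timestamp order.

```python
import itertools


class Volume:
    """Toy redirect-on-write volume: an overwrite never modifies an existing object."""

    def __init__(self):
        self.objects = {}   # object id -> bytes (stand-in for object storage)
        self.history = {}   # block number -> list of (timestamp, object id), oldest first
        self._ids = itertools.count(1)

    def write_block(self, block_no, data, timestamp):
        # New data always goes to a fresh object id; older versions stay in place.
        object_id = f"obj-{next(self._ids)}"
        self.objects[object_id] = data
        self.history.setdefault(block_no, []).append((timestamp, object_id))

    def read_block(self, block_no, as_of=None):
        """Return the block as it looked at `as_of` (latest version if None)."""
        versions = self.history.get(block_no, [])
        if as_of is not None:
            versions = [v for v in versions if v[0] <= as_of]
        if not versions:
            return None
        return self.objects[versions[-1][1]]

    def prune(self, cutoff):
        """Drop versions that were already superseded at `cutoff` (retention expiry)."""
        for block_no, versions in self.history.items():
            older = [v for v in versions if v[0] <= cutoff]
            for _, object_id in older[:-1]:   # keep the version still live at cutoff
                del self.objects[object_id]
            self.history[block_no] = versions[max(len(older) - 1, 0):]


vol = Volume()
vol.write_block(0, b"v1", timestamp=100)
vol.write_block(0, b"v2", timestamp=200)

print(vol.read_block(0))             # current contents
print(vol.read_block(0, as_of=150))  # contents as of an earlier point in time
```

Because old objects are never modified in place, a point-in-time read needs only a different lookup in the version history, which is why such a snapshot can be served without pausing writers.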

Paradigm4 is extending the product into four newer use cases. For data lakehouse environments, the company ran TPC-H benchmark queries at scale factor 100 against Spark, Spark with Comet, and Spark with Gluten, comparing S3 direct, flexFS without caching, and flexFS with proxy caching. The fastest configuration, Spark plus Gluten with flexFS proxied, completed all 22 queries in 176 seconds against 1,191 seconds for baseline Spark on S3, roughly 6.8 times faster. For coupled-architecture databases such as massively parallel data warehouses or graph and vector databases, flexFS can decouple compute from storage, enabling independent scaling and, according to company figures, reducing total cost of ownership by up to 60 percent without requiring code changes in the database layer. For AI and ML training workloads, the system targets GPU idle time caused by S3 throughput saturation during random reads or model checkpointing; Paradigm4 claims a twofold speedup over S3 direct even without the proxy cache, rising further with caching enabled. For agentic AI deployments, the POSIX environment and shared namespace allow agents to exchange file paths rather than copying data payloads, with byte-range I/O limiting unnecessary data transfer on large files.
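The agentic pattern in the last use case, passing a path and a byte range instead of the data itself, needs nothing beyond ordinary file I/O. The Python sketch below is illustrative only: a temporary directory stands in for a shared mount, and the message format and the `read_range` helper are hypothetical rather than part of any flexFS or agent-framework API.

```python
import os
import tempfile


def read_range(path, offset, length):
    """Read `length` bytes starting at `offset` without loading the whole file."""
    with open(path, "rb") as f:
        f.seek(offset)
        return f.read(length)


# Producer agent: writes a large file once and hands over a reference to it.
shared_dir = tempfile.mkdtemp()  # stand-in for a shared mount point (assumption)
path = os.path.join(shared_dir, "results.bin")
with open(path, "wb") as f:
    f.write(b"HEADER--" + b"x" * 1000 + b"SUMMARY!")

# The message carries a location, not the payload (hypothetical format).
message = {"path": path, "offset": 1008, "length": 8}

# Consumer agent: reads only the bytes it was pointed to.
summary = read_range(message["path"], message["offset"], message["length"])
print(summary)
```

On a shared POSIX namespace both agents would see the same path, so the only data that crosses the network is the range the consumer actually requests.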

Paradigm4 is also soliciting feedback from analysts on a proposed market category it calls the “File Lakehouse,” intended to describe platforms combining object-storage economics with POSIX filesystem semantics for unstructured data workloads, AI training, and agentic computing. The concept would position the file lakehouse alongside the data lakehouse and coupled-architecture DBMS as a distinct infrastructure layer in modern AI and analytics architectures.

flexFS is ISO 27001 certified, supports end-to-end AES-256 encryption with keys held exclusively on compute nodes, and claims eleven nines of data durability on hyperscale cloud backends. A community edition, limited to five terabytes and without proxy group support, is available at no cost under a community license. Enterprise installations typically deploy in under one hour.

By Jakob Jung

Dr. Jakob Jung is Editor-in-Chief of Security Storage and Channel Germany. He has been working in IT journalism for more than 20 years. His career includes Computer Reseller News, Heise Resale, Informationweek, Techtarget (storage and data center) and ChannelBiz. He also freelances for numerous IT publications, including Computerwoche, Channelpartner, IT-Business, Storage-Insider and ZDnet. His main topics are channel, storage, security, data center, ERP and CRM. Contact by email: jakob.jung@security-storage-und-channel-germany.de
