Apache Iceberg is a hot topic right now, and looks to be the future standard for representing tables in object storage. Hive tables are overdue for a replacement. People talk about table format wars: Apache Iceberg vs Delta Lake vs Apache Hudi and so on, but the “war” at the forefront of my mind isn’t which table format will become dominant, but the battle between open and closed – open table formats vs walled gardens.
The ultimate guide to table format internals - all my writing so far
The teacher's nemesis
A few months ago I wrote Learning and Reviewing System Internals - Tactics and Psychology. One thing I touched on was the need to build a mental model in order to grok a codebase or learn how a complex system works. The mental model gets developed piece by piece, through layers of abstraction.
Today I am also writing about mental models and abstractions, but from the perspective of team/project leaders and their role in onboarding new team/project members. In this context, the team lead and senior engineers are teachers, and how effective they are has a material impact on the success of the team. However, there are real challenges, and leaders can fail without being aware of it, with potentially poor outcomes if left unaddressed.
The curse of Conway and the data space
Conway’s Law:
"Any organization that designs a system (defined broadly) will produce a design whose structure is a copy of the organization's communication structure."
This is playing out worldwide across hundreds of thousands of organizations, and nowhere is it more evident than in the split between software development and data analytics teams. These two groups usually have different reporting structures, right up to, or immediately below, the executive team.
This is a problem now and is only growing.