# How to navigate a sea of data when it comes to container freight statistics

## Does container shipping data matter?

A visit to any shop will reveal that most items on sale were not produced in your country. Most often they were made abroad and imported on container ships. This is a consequence of globalization and specialization: each country produces an ever narrower range of goods and services and imports the rest. In the US, the value of seaborne imports stood at 1.2 trillion dollars in 2021 (6% of GDP), two thirds of which was containerized (source: [US goods trade by mode](https://www.bts.gov/browse-statistical-products-and-data/freight-facts-and-figures/us-international-freight-trade)).

During 2022, freight rates skyrocketed, the cost of transporting goods multiplied by 5, and there were delays in the arrival of vessels and in the processing of cargo. Transport infrastructure had become a bottleneck in global economic activity, at least in the short term.

Transport routes, both on land and at sea, have an impact on consumer choice. Improvements in transportation in recent decades, such as the introduction of the container, growing vessel capacities (the largest ships can carry 24 thousand containers, compared to just 10 thousand two decades ago) and automation in terminals and warehouses, all contribute to wider consumer choice and lower prices.

## What are the sources of container shipping data?

Container shipping data is in general chaotic, fragmented and error-prone. That's because no regulation anywhere in the world mandates disclosure of transactions, so self-reporting dominates. However, the main references are as follows:

### UNCTAD / World Bank / Lloyd's List

This is perhaps the most widely quoted source. The methodology is described [here](https://unctadstat.unctad.org/wds/TableViewer/tableView.aspx?ReportId=13321) (click on the information icon near "Container port throughput, annual"). It mixes government statistics, port authority data releases, and private sources.
However, because the sources of data are heterogeneous, the resulting dataset is not, in my opinion, accurate. Reporting by Chinese ports includes a fair amount of double counting, and hence Chinese ports have a disproportionate, biased presence at the top of world rankings. This also affects another widely cited source, Lloyd's List. Other disadvantages of the dataset are that it's univariate (only total throughput in TEU) and that it's only available at yearly frequency. It's nevertheless a good starting point.

### Port authorities

Port authorities are in general reliable, as they have first-hand access to the events they measure. The main drawback is that each port authority chooses how and what data to publish. Most ports don't release data. Those that do choose idiosyncratic metrics: some publish total trade, some split inflows and outflows. Some publish monthly, others quarterly, biannually or annually. Some include empty containers and others don't. Some include transshipments. Some publish as PDF or CSV, others through Tableau or Power BI dashboards.

Because of this heterogeneity in metrics and formats, and because many ports publish nothing at all, port authorities remain a limited and niche source of data. In general, United States port authorities produce better and more consistent reports than anyone else, possibly owing to demands of the Department of Transportation.

### Government data

Because all merchandise entering or leaving a country is declared to customs, governments hold data that is relevant to the shipping industry. However, little of this data is published, and what is published often comes with a long delay and a loss of granularity. Such is the case of monthly goods export datasets, published by almost all countries in the world around 30 days after the end of the month, in euro or dollar value. These datasets also mix in other cargo, like commodities, and other transport modes, like road, rail or pipelines.
[Eurostat](https://ec.europa.eu/eurostat/statistics-explained/index.php?title=Maritime_transport_of_goods_-_quarterly_data) or the [Department for Transport](https://www.gov.uk/government/statistics/port-freight-quarterly-statistics-april-to-june-2023) in the UK, among others, do provide breakdowns by port and type of cargo, usually with several months of delay.

### Private companies

Here it gets interesting. Imports and exports are in general an opaque flow. The number of containers, the vessels calling at port, the contents of containers: these are variables that few individuals know, and the information is decentralized. However, some options exist.

In the US, the government allows domestic companies to request the release of trade data under the Freedom of Information Act (FOIA), and so much of the customs data is accessed and resold by private companies. Such is the case of the [PIERS database](https://www.spglobal.com/marketintelligence/en/mi/products/piers.html) by Standard and Poor's. This dataset holds quite a lot of information about the buyer, the seller, the cargo category, and the country of origin. However, it's published with a couple of months of delay and it's specific to the US. In addition, much of the buyer information is hidden behind agents (freight forwarders, brokers, transport companies), which makes it hard to identify the final recipient of the goods.

[Container Trades Statistics](https://www.containerstatistics.com/) is perhaps the most recognized source of data. It results from a concerted effort by shipping operators to map the industry: they commit to sharing manifest data (the manifest being the document underlying all containerized shipping operations). After processing, the raw manifests yield aggregate estimates of trade flows between countries and of the cost of transportation on those trades. On the downside, the data is published with some 40 days of lag and there are limitations in its granularity.
For instance, no data at the port level or by cargo category is released.

And finally, Econdb. The [maritime dataset](https://www.econdb.com/maritime/ports/) in Econdb merges a variety of data sources available across the world. These include, among others, [AIS](https://en.wikipedia.org/wiki/Automatic_identification_system), shipping line schedules, vessel fleets and port specifications. Drawing on techniques from the oil shipping industry, it infers cargo utilization from vessel draught. Proprietary data pipelines process high-frequency AIS data to identify vessel operations and cargo exchanges, and finally build trade aggregates, which can be accessed in near real time.
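
To give a feel for the draught-based idea, here is a minimal Python sketch. It is not Econdb's actual method, which isn't detailed here. It simply assumes that utilization scales linearly between a vessel's empty (ballast) draught and its fully laden (design) draught, and that both reference draughts are known for the vessel. The names `Vessel`, `estimate_utilization` and `estimate_net_exchange_teu`, as well as the example figures, are hypothetical and only serve the illustration.

```python
from dataclasses import dataclass


@dataclass
class Vessel:
    name: str
    capacity_teu: int          # nominal capacity in TEU
    ballast_draught_m: float   # assumed draught when sailing empty
    design_draught_m: float    # assumed draught when fully laden


def estimate_utilization(vessel: Vessel, reported_draught_m: float) -> float:
    """Share of capacity in use, under a linear draught assumption.

    The result is clamped to [0, 1] because reported draughts can be
    noisy or fall outside the reference range.
    """
    span = vessel.design_draught_m - vessel.ballast_draught_m
    if span <= 0:
        raise ValueError("design draught must exceed ballast draught")
    share = (reported_draught_m - vessel.ballast_draught_m) / span
    return min(1.0, max(0.0, share))


def estimate_net_exchange_teu(vessel: Vessel,
                              arrival_draught_m: float,
                              departure_draught_m: float) -> float:
    """Net TEU change during a port call.

    Positive means net loading, negative means net discharge.
    """
    before = estimate_utilization(vessel, arrival_draught_m)
    after = estimate_utilization(vessel, departure_draught_m)
    return (after - before) * vessel.capacity_teu


# Hypothetical vessel and draught readings, for illustration only.
ship = Vessel("EXAMPLE", capacity_teu=20000,
              ballast_draught_m=8.0, design_draught_m=16.0)
print(estimate_utilization(ship, 12.0))             # 0.5
print(estimate_net_exchange_teu(ship, 14.0, 12.0))  # -5000.0
```

In practice the relationship is unlikely to be this clean: container weights vary widely, ballast water and fuel also move the draught, and the draught reported over AIS is entered by the crew. A real pipeline would need calibration against known volumes.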