Mannheim Web Panel (MWP)

Site content by Julian Oliver Dörr and Sebastian Schmidt


This site introduces the Mannheim Web Panel (MWP) – a panel dataset of website contents extracted from corporate websites for a large sample of German companies.
The MWP was developed at the ZEW – Leibniz Centre for European Economic Research as part of the project Business and Economic Research Data Center (BERD@BW).

Why corporate websites?

Company websites are an important source of economic data. Firms use them to spread product and service information (related to establishing a public image), to conduct transactions (e-business processes) and to facilitate opinion sharing (electronic word-of-mouth) (Blazquez & Domenech, 2018). Recent economic studies have used corporate website data to:

Why panel data?

Firm characteristics, diffusion processes such as technological advances and technology adoption, and business relations are not static but evolve over time. Capturing this information requires continuous monitoring of corporate websites. For this reason, the ZEW – Leibniz Centre for European Economic Research has been scraping corporate website contents since 2018 and has organised these contents as a panel that is updated every three to six months.

Is the data available for researchers?

The MWP data is stored in ZEW's cloud infrastructure (Seafile) and can be accessed by external researchers for research purposes via ZEW's Research Data Centre (ZEW-FDZ). After signing a licence agreement, users get full access to the MWP in a secure environment provided by ZEW. For more details, please contact:

Dr. Sandra Gottschalk
Phone: +49 (0)621 1235-267

How is the data structured?

The MWP contains a large amount of web data from the corporate websites of German companies. The general scraping framework used to build the MWP is available on GitHub. The scraping parameters for the MWP are standardised: for each website, the first 50 subpages are downloaded, and subpages with shorter URLs within the corporate web domain are more likely to be scraped. Web pages in German are preferred during scraping, so the majority of the text content in the MWP is in German.
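The selection rule described above can be illustrated with a short Python sketch. It is not the code of the MWP scraping framework; it only restates the stated priorities under our own assumptions. Candidate subpages arrive as hypothetical `(url, lang)` pairs, where `lang` is a language code from some external language detector. The hypothetical helper `select_subpages` keeps pages on the firm's domain, ranks German pages before others and shorter URLs before longer ones, and returns at most 50 of them.

```python
from urllib.parse import urlparse

MAX_SUBPAGES = 50  # cap per website, as stated for the MWP


def on_domain(url, domain):
    """True if the URL's host is `domain` or one of its subdomains."""
    host = urlparse(url).netloc.lower()
    return host == domain or host.endswith("." + domain)


def select_subpages(candidates, domain, max_pages=MAX_SUBPAGES):
    """Hypothetical ranking of (url, lang) pairs for one corporate website.

    Assumption: `lang` is an ISO 639-1 code ("de", "en", ...) supplied by an
    external language detector; the real scraper may decide differently.
    """
    own = [(url, lang) for url, lang in candidates if on_domain(url, domain)]
    # Sort key: German pages first (False < True), then shorter URLs first.
    ranked = sorted(own, key=lambda pair: (pair[1] != "de", len(pair[0])))
    return [url for url, _ in ranked[:max_pages]]


pages = [
    ("https://www.example.de/produkte/software", "de"),
    ("https://www.example.de/en", "en"),
    ("https://www.example.de/", "de"),
    ("https://www.other.com/", "de"),
]
print(select_subpages(pages, "example.de", max_pages=3))
# ['https://www.example.de/', 'https://www.example.de/produkte/software', 'https://www.example.de/en']
```

Sorting on a tuple key keeps the two priorities separate. Language decides first, and URL length only breaks ties within a language group.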

In the following, we describe the data structure and how to access the panel in more detail using Python. Note that the data can be accessed with any other programming language as well.

How can the data be accessed?

The files can be loaded for further analysis, e.g., with pandas. For this, the following modules are necessary.

How can single files be accessed?
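A single file can be loaded with `pandas.read_csv`. The sketch below assumes comma-separated files and uses the hypothetical column names `ID`, `url` and `text`. Check the actual delimiter and column names of the MWP files before reusing it. The wrapper `load_mwp_file` is likewise hypothetical, and an in-memory `io.StringIO` buffer stands in for a real file path so that the example runs on its own.

```python
import io

import pandas as pd


def load_mwp_file(path_or_buffer, columns=None, sep=","):
    """Read one MWP CSV file into a DataFrame (hypothetical helper).

    `columns` restricts parsing to the listed columns (passed to usecols),
    which saves memory; all values are read as strings.
    """
    return pd.read_csv(path_or_buffer, sep=sep, usecols=columns, dtype=str)


# Stand-in for a real MWP file; the column names "ID", "url" and "text"
# are hypothetical placeholders, not the documented MWP schema.
sample = io.StringIO(
    "ID,url,text\n"
    "1001,https://www.example.de/,Willkommen bei Example\n"
    "1002,https://www.muster.de/,Digitale Lösungen für den Mittelstand\n"
)
df = load_mwp_file(sample, columns=["ID", "text"])
print(df.shape)  # (2, 2)
```

With real data, pass the file path instead of the buffer. Reading IDs as strings avoids losing leading zeros if identifiers contain them.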
How can an entire wave be accessed?

Data from an entire wave can be accessed by looping over its CSV files. Keep in mind that the files are very large, which can exhaust your machine's memory. It is therefore sensible to keep only the data you need while looping.
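The loop can be sketched with `pandas.read_csv` in chunked mode. This keeps only `chunksize` rows in memory at a time and retains just the IDs of matching rows. The function `search_wave`, the column names `ID` and `text`, and the comma delimiter are assumptions for illustration. For a real wave, you would pass the file paths instead, e.g. collected with `pathlib.Path.glob("*.csv")` from the wave's directory.

```python
import io

import pandas as pd


def search_wave(files, term, id_col="ID", text_col="text", sep=",", chunksize=10_000):
    """Return the sorted unique IDs whose text contains `term` (case-insensitive).

    `files` may hold paths or file-like objects; column names are hypothetical.
    """
    matches = set()
    for f in files:
        # Parse only the two needed columns, `chunksize` rows at a time.
        reader = pd.read_csv(f, sep=sep, usecols=[id_col, text_col],
                             dtype=str, chunksize=chunksize)
        for chunk in reader:
            # Plain substring match; empty texts (NaN) count as no match.
            hit = chunk[text_col].str.contains(term, case=False, regex=False, na=False)
            matches.update(chunk.loc[hit, id_col])
    return sorted(matches)


# In-memory stand-ins for two CSV files of one wave.
wave = [
    io.StringIO("ID,text\n1001,Willkommen\n1002,Digitale Lösungen\n"),
    io.StringIO("ID,text\n1003,Digitalisierung im Handwerk\n1002,Kontakt\n"),
]
print(search_wave(wave, "digital"))  # ['1002', '1003']
```

Collecting IDs in a set means a firm is reported once even if several of its subpages match.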

The output above lists the IDs of companies whose corporate websites contain the search term "digital". This is only meant as a high-level example of how to work with the MWP.
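Because the MWP is a panel, a keyword search like this can be repeated for every wave and the results compared. For instance, this shows which firms start mentioning a term between two waves. The sketch below is a hypothetical illustration with made-up ID lists. `term_adopters` returns IDs that match in the later wave but not in the earlier one, restricted to firms whose websites were observed in both waves. This prevents a website that is missing from the earlier wave from being mistaken for adoption.

```python
def term_adopters(earlier_hits, later_hits, observed_both):
    """IDs matching a term in the later wave only, among firms seen in both waves."""
    return sorted((set(later_hits) - set(earlier_hits)) & set(observed_both))


# Made-up results of the same keyword search in two waves.
hits_wave_a = ["1002"]
hits_wave_b = ["1002", "1003", "1005"]
# Firms whose websites were scraped in both waves ("1005" was not).
observed = ["1001", "1002", "1003"]

print(term_adopters(hits_wave_a, hits_wave_b, observed))  # ['1003']
```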

Can the MWP be combined with other data?

The MWP can be combined with further firm-level data hosted at ZEW, for example the Mannheim Innovation Panel (MIP) and the IAB/ZEW Start-up Panel. Such linkages allow the MWP to realise its full potential. Find out more on the website of the ZEW-FDZ.
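Inside the secure environment, such a linkage could amount to a join on a common firm identifier. The sketch below assumes that MWP search results and another firm-level dataset share a hypothetical `ID` column. The actual linkage keys and procedures are set by the ZEW-FDZ. The second dataset and its column `rd_active` are placeholders, not MIP variables. A left join with `indicator=True` keeps every firm of the other dataset and flags whether it appears among the MWP matches.

```python
import pandas as pd

# IDs whose websites matched a search term (e.g. the result of a wave search).
mwp_hits = pd.DataFrame({"ID": ["1002", "1003"]})
# Placeholder firm-level dataset; "rd_active" is a made-up variable.
firms = pd.DataFrame({"ID": ["1001", "1002", "1003"], "rd_active": [False, True, True]})

# Left join keeps all firms; validate guards against duplicate IDs on either side.
merged = firms.merge(mwp_hits, on="ID", how="left",
                     validate="one_to_one", indicator=True)
merged["mentions_term"] = merged["_merge"] == "both"
merged = merged.drop(columns="_merge")
print(merged)
```

If the MWP side holds one row per subpage, deduplicate it first (e.g. with `drop_duplicates("ID")`). Otherwise, `validate="one_to_one"` raises a `MergeError`.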

Interested in how to extend the MWP to span a much wider time horizon? A future post will introduce a framework for extending the MWP and creating your own web panels.