I-Media-Cities: A Digital Ecosystem Enriching A Searchable Treasure Trove Of Audio Visual Assets

paper, specified "long paper"


Digital Cultural Ecosystems

The paradigm of ecosystems for digital cultural contents, the so-called DCEs (Digital Cultural Ecosystems), can facilitate and foster the democratization of knowledge. This process started in the 1990s with the first applications of ICT tools to Cultural Heritage (Felicori and Guidazzoli, 2003). Open digital frameworks allow access to cultural digital resources and their enrichment, enabling new research and studies. Moreover, these systems can bring cultural content to different audiences in innovative ways and include citizens in the process of content enrichment (Apollonio et al., 2017; Paneva-Marinova et al., 2017).
With respect to digital Cultural Heritage, citizens should be more than consumers (ViMM Manifesto full version, ViMM project, retrieved on 3 April 2019). Audience participation should be improved at an active level, implementing user-oriented perspectives.

The same principle underlies the I-Media-Cities project, which aims to involve both researchers and the general public not merely as users who run queries on the platform, but as active contributors who tag and further enrich the digital contents it delivers.

The I-Media-Cities project

The I-Media-Cities Horizon 2020 project is an initiative of 9 European Film Libraries, 5 research institutions, 2 technological providers and a specialist in digital business models to share, provide access to and valorise audiovisual (A/V) content about the history and identity of European cities. A huge quantity of fictional and non-fictional A/V works (from the end of the 19th century onwards) has been made available for research and creative purposes, describing cities in all aspects, including physical transformation and social dynamics (Table n. 1).

The platform, with its dynamic processing pipeline and automatic video analysis tools, enables the enrichment of the moving images' meta-data (Fig. 1). By also allowing manual and automatic annotation of the A/V content, the platform creates a new digital ecosystem (Caraceni et al., 2017).

Fig. 1: IMC Movie Processing Pipeline

The innovative elements of the I-Media-Cities (IMC) platform are not limited to bringing the audiovisual contents of nine archives into a single collection and access point. Once collected, the A/V material is processed by a series of algorithms that automatically enrich it with a set of meta-data. It is now common for people to enrich web contents with various kinds of data, in particular through social media. However, this manual activity is extremely laborious, especially when, as in the IMC project, it involves annotating videos at the level of the shot or the single frame. Automatic meta-data, on the other hand, while not yet achieving the accuracy of manual enrichment, can add a great deal of information. To this end, IMC has integrated a set of tools provided by Fraunhofer for the automatic analysis of video and images. These algorithms provide information on video quality, identify camera movements such as zoom or pan, segment videos into scenes (shots), and detect and recognize objects and other elements (object detection) that appear in the videos on the IMC platform. The object detection tool, thanks to deep learning techniques, can identify, for example, people, means of transport, and architectural or urban elements: it differentiates between man and woman, adult and child; it picks out trams, carriages or bicycles, a square or a fountain, and so on (Fig. 2).

Figure 2: Visualisation of the object detection activity. Green marks objects detected with a confidence higher than 80%, yellow between 50 and 80%, and red lower than 50%.

This is a particularly interesting and complex procedure, which has to cope with sometimes scarred films that were digitized several years ago and date back as far as the end of the 19th century.
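
The colour bands of Fig. 2 can be illustrated with a minimal sketch. The function name, the representation of confidence as a fraction between 0 and 1, and the treatment of the exact 50% and 80% boundaries (both assigned to yellow here) are assumptions made for illustration; the caption does not state how the platform handles them.

```python
def confidence_colour(confidence: float) -> str:
    """Map a detection confidence in [0, 1] to the colour bands of Fig. 2.

    Assumption: values of exactly 0.5 and 0.8 fall in the yellow band.
    """
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must lie in [0, 1]")
    if confidence > 0.8:
        return "green"
    if confidence >= 0.5:
        return "yellow"
    return "red"


# Hypothetical detections as (label, confidence) pairs, not real platform output.
detections = [("tram", 0.93), ("bicycle", 0.62), ("fountain", 0.31)]
coloured = [(label, confidence_colour(score)) for label, score in detections]
```

A reviewer correcting automatic annotations could, for instance, start from the red band, where errors are most likely.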

Figure 3: IMC platform: video content item page displaying manual and automatic meta-data. Tags are colour coded (automatic, manual from researchers, manual from general users).

Since the automatic tools still retain a certain margin of error, a sophisticated online “frame by frame” analysis tool has been set up for the manual correction of inaccuracies in the shot detection, which is particularly useful for older videos (Fig. 3).

Figure 4: IMC platform: geo-localised search results

The result of this process is data that are transformed into semantic data, directly understandable by people as well as by computers. The IMC meta-data model brings together two kinds of data: the descriptive meta-data supplied by the archives, which are uploaded together with the audiovisual materials and are based primarily on the CEN EN19507 standard; and the data generated once the material has been loaded onto the platform. As mentioned, the generated meta-data can be the result of an automatic or manual enrichment process (Baraldi et al., 2016) and are modelled using the W3C Web Annotation Data Model (WADM).
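
To give an idea of what a WADM annotation looks like, the following minimal sketch builds a tagging annotation for a temporal segment of a video. The property names (`motivation`, `TextualBody`, `FragmentSelector`, `conformsTo`) come from the W3C Web Annotation Data Model and the Media Fragments URI syntax; the helper function, the example IRIs and the choice of a temporal fragment selector are assumptions for illustration, not the actual IMC serialisation.

```python
import json


def make_tag_annotation(annotation_iri: str, video_iri: str,
                        start_s: float, end_s: float, term: str) -> dict:
    """Build a WADM-style tagging annotation for a time range of a video."""
    if end_s <= start_s:
        raise ValueError("segment end must be after its start")
    return {
        "@context": "http://www.w3.org/ns/anno.jsonld",
        "id": annotation_iri,
        "type": "Annotation",
        "motivation": "tagging",
        "body": {"type": "TextualBody", "value": term, "purpose": "tagging"},
        "target": {
            "source": video_iri,
            "selector": {
                "type": "FragmentSelector",
                "conformsTo": "http://www.w3.org/TR/media-frags/",
                # Temporal media fragment: start and end in seconds.
                "value": f"t={start_s:g},{end_s:g}",
            },
        },
    }


# Hypothetical IRIs and segment times, used only for illustration.
annotation = make_tag_annotation(
    "https://example.org/annotations/1",
    "https://example.org/videos/42",
    12.0, 18.5, "tram",
)
print(json.dumps(annotation, indent=2))
```

Because the body and the target are separate resources, the same structure can carry an automatic or a manual tag; the creator of the annotation would distinguish the two.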

In I-Media-Cities a semantic search engine was developed to process the requests coming from the user interface; by analysing all the available meta-data, it provides researchers and citizens with the search results.

The meta-data system includes different types of annotation, associated with different levels of detail:

- original meta-data of the video or image, providing information at the level of the whole content;
- automatic annotations at the level of a single segment of the video (scene or shot);
- manual annotations at the level of a single segment of the video (scene or shot), together with the tags and geotags (geolocalised annotations) with which the content has been enriched.

Table n.1 - Contents delivered to IMC platform up to February 2019

In particular, because the automatic and manual annotations are linked to a single segment of the video (scene or shot), it is possible to collect scenes from different videos in which the same element appears: for example, the tramways and traffic of the early 1900s (Fig. 5).

Figure 5: IMC platform: search by term
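
The idea of gathering shots from different videos that share a term can be sketched with a simple in-memory index. The class, the field names and the sample records are hypothetical; the platform's semantic search engine is not described at this level of detail, so this illustrates the principle rather than the implementation.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class ShotAnnotation:
    video_id: str
    shot_index: int
    term: str
    source: str  # "automatic" or "manual"


def index_by_term(annotations):
    """Group (video_id, shot_index) pairs under each lower-cased term."""
    index = defaultdict(set)
    for ann in annotations:
        index[ann.term.lower()].add((ann.video_id, ann.shot_index))
    return index


def shots_for(index, term):
    """Return the shots annotated with a term, across all videos, sorted."""
    return sorted(index.get(term.lower(), set()))


# Hypothetical annotations, not real platform data.
annotations = [
    ShotAnnotation("video-a", 3, "tram", "automatic"),
    ShotAnnotation("video-b", 1, "Tram", "manual"),
    ShotAnnotation("video-a", 5, "fountain", "automatic"),
]
index = index_by_term(annotations)
tram_shots = shots_for(index, "Tram")
```

Since the index is keyed on shots rather than on whole videos, a query for a term returns only the relevant segments of each film.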

The automatic annotations guarantee that all the audiovisual material is analysed and annotated homogeneously, at least on a common set of aspects, while the manual annotations add detailed information, also in the form of textual notes, bibliographic references, and links to similar material both internal and external to I-Media-Cities. At the moment, there are 59,457 manual annotations (with 1,708 different terms); 422,123 automatic annotations (with 78 different terms); and 6,411 geotags for 1,091 different places.

The performance of the IMC platform relies upon the High Performance Computing resources of Cineca, which allow the use of the most suitable hardware architecture for each of the analysis algorithms applied.

In order to present the contents of the research adequately, particular attention is paid to the development of the visual interface, which also allows searches to be started directly from the map on which all the A/V contents are geolocated, with the selection narrowed through a time bar (Fig. 4). For these aspects a user experience evaluation process is applied, in accordance with the Agile methodology that pervades the project (Cohen et al., 2004). The Agile methodology implies an iterative approach, with several phases of development: check, correction, check and development (Tab. 2).
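
The map-based search with a time bar described above amounts to filtering geotagged items by a map area and a range of years. The following is a minimal sketch under stated assumptions: the class names, the rectangular bounding box, the use of a single year per item, and the sample coordinates are all hypothetical, and the box test ignores areas that cross the 180th meridian.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GeoTaggedItem:
    item_id: str
    lat: float
    lon: float
    year: int


@dataclass(frozen=True)
class BoundingBox:
    south: float
    west: float
    north: float
    east: float

    def contains(self, lat: float, lon: float) -> bool:
        """True if the point lies inside the box (edges included)."""
        return self.south <= lat <= self.north and self.west <= lon <= self.east


def search_map(items, box, year_from, year_to):
    """Return the ids of items inside the box and within the year range."""
    return [
        item.item_id
        for item in items
        if box.contains(item.lat, item.lon) and year_from <= item.year <= year_to
    ]


# Hypothetical items and map area, used only for illustration.
items = [
    GeoTaggedItem("item-1", 44.49, 11.34, 1910),
    GeoTaggedItem("item-2", 44.50, 11.35, 1955),
    GeoTaggedItem("item-3", 37.98, 23.73, 1912),
]
visible_area = BoundingBox(south=44.4, west=11.2, north=44.6, east=11.5)
results = search_map(items, visible_area, 1900, 1920)
```

Moving the time bar corresponds to changing `year_from` and `year_to`, while panning or zooming the map corresponds to changing the bounding box.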

Table 2: Agile methodology flow chart

During the first phase, user and system requirements were gathered by breaking the project and its requirements down into small units of user functionality, called user stories and use cases, and prioritizing them. Each Film Heritage Institution (FHI) and research partner provided a list of vision items and user stories, which were categorised and grouped into use cases by the Coordinator of the project and Cineca staff. During this process, the technical partners detailed each use case with more technical information, such as technological enablers.

A series of training sessions and a set of “living documents” were prepared in order to enable this Agile process among researchers and users and to make them aware of what was already available and feasible. “Living documents” are continuously updated, following both the development of the project and the state of the art in the different areas the project touches.

The final phase of the project foresees the opening of the platform to the wider public, and not only to researchers. The enrichment of contents can therefore take advantage of crowd-sourcing, subject to a check by researchers and archives on all the information added.

Among the meta-data sent by the archives and associated with each content there is a “right status” value, selected from the following list:
- In copyright
- EU Orphan Work
- In copyright - Educational use permitted
- In copyright - Non-commercial use permitted
- Public Domain
- No Copyright - Contractual Restrictions
- No Copyright - Non-Commercial Use Only
- No Copyright - Other Known Legal Restrictions
- No Copyright - United States
- Copyright Undetermined
All contents, whatever their “right status”, can be viewed by everybody, including the general users group. Depending on the “right status”, the archives can upload low resolution versions with watermarking.

The results of the research can be presented by archives and researchers as virtual exhibitions, set up in a 3D Web environment within the platform itself.


I-Media-Cities meets two main requirements of the Cultural Heritage sector: gathering information of different types and from different sources, and enriching the original information with automatic or manual annotations. Each multimedia content, image or video, enters a process in which the asset is associated with the original meta-data belonging to the archive, is then enriched with automatic meta-data extracted by different algorithms, and is further enhanced by manual annotations. In the next phase of the project, particular attention will be devoted to improving the user experience of the general public (Krug, 2014). Citizens will be able to view the public domain content of the archives, to search, browse and annotate A/V content, and to share their discoveries.


Apollonio, F. I., Rizzo, F., Bertacchi, S., Dall’Osso, G., Corbelli, A. and Grana, C. (2017). SACHER: Smart Architecture for Cultural Heritage in Emilia Romagna. In Grana, C. and Baraldi, L. (eds), Digital Libraries and Archives, vol. 733. Cham: Springer International Publishing, pp. 142–56 doi:10.1007/978-3-319-68130-6_12. http://link.springer.com/10.1007/978-3-319-68130-6_12 (accessed 4 March 2019).

Baraldi, L., Grana, C. and Cucchiara, R. (2016). Scene-driven Retrieval in Edited Videos using Aesthetic and Semantic Deep Features. Proceedings of the 2016 ACM on International Conference on Multimedia Retrieval - ICMR ’16. New York, New York, USA: ACM Press, pp. 23–29 doi:10.1145/2911996.2912012. http://dl.acm.org/citation.cfm?doid=2911996.2912012 (accessed 4 March 2019).

Caraceni, S., Carpene, M., D’Antonio, M., Fiameni, G., Guidazzoli, A., Imboden, S., Liguori, M. C., et al. (2017). I-media-cities, a searchable platform on moving images with automatic and manual annotations. 2017 23rd International Conference on Virtual System & Multimedia (VSMM). Dublin: IEEE, pp. 1–8 doi:10.1109/VSMM.2017.8346274. https://ieeexplore.ieee.org/document/8346274/ (accessed 4 March 2019).

Cohen, D., Lindvall, M. and Costa, P. (2004). An Introduction to Agile Methods. Advances in Computers, vol. 62. Elsevier, pp. 1–66 doi:10.1016/S0065-2458(03)62001-2. https://linkinghub.elsevier.com/retrieve/pii/S0065245803620012 (accessed 4 March 2019).

Felicori, M. and Guidazzoli, A. (2003). Experiences in immersive graphics and creation of multimedia database for promoting and enhancing cultural heritage. Roma: IRI Management CRIC-Centro Ricerche su Innovazione e Competitività Eventi Progetti Speciali.

Krug, S. (2014). Don’t Make Me Think, Revisited: A Common Sense Approach to Web Usability. Third edition. Berkeley, Calif.: New Riders.

Paneva-Marinova, D., Pavlov, R. and Kotuzov, N. (2017). Approach for Analysis and Improved Usage of Digital Cultural Assets for Learning Purposes. Cybernetics and Information Technologies, 17(3): 140–51 doi:10.1515/cait-2017-0035.
