As cities become smarter, there is a growing demand not only to understand how a city works, but also to detect trends as they unfold, to take action and adapt cities as much as possible ahead of needs and opportunities.
Cities are centers of economic prosperity and powerhouses of creative thinking and new technologies. They also consume the majority of resources and produce most of the waste. Increasingly, cities and public and private organizations make data available for independent analysis, to foster innovation and engage citizens.
New types of data are also generated outside of any coordinated effort, as by-products of digitalization, digital consumption and the internet of things. These include social media, billing records, credit card transactions and many more. Multiple secular trends align to make data on cities, their functioning and their evolution available at a scale and depth considered impossible or uneconomical until recently.
This is a unique opportunity for cities: by creating transparency in systems and activities, we can accelerate the identification of best practices and create new opportunities to balance prosperity with people's needs and the protection of the planet.
We looked at social media to measure some large-scale phenomena that are traditionally difficult to measure. In particular, we wanted to assess whether Twitter data (one year of data, from May 2012 to May 2013) can be used to measure the social diversity of a city as well as the overall perception of its people (mood). We considered five very different cities to assess the methods and compare the results:
We define diversity as the number of ethnic communities present in a given area of a city. Dozens of different indicators are used in the literature to measure community, social and ethnic diversity. We used the number of languages used on Twitter as a proxy for diversity. A language detection algorithm was applied to each tweet published within a city's boundary during the 12-month period. The total number of languages was measured separately for residents and for tourists (Twitter users whose home base is outside the city).
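The counting step described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the pipeline actually used: the tweet fields `text` and `home_city`, the function `count_languages` and the keyword-based `toy_detector` are all hypothetical names introduced here, and a real analysis would plug in a proper language detection library in place of the toy detector.

```python
def count_languages(tweets, detect_language, city):
    """Count distinct detected languages for residents and for tourists.

    tweets: iterable of dicts with the (assumed) keys 'text' and 'home_city'.
    detect_language: any callable mapping text to a language code,
        or to None when the language cannot be determined.
    city: name of the city whose boundary the tweets were collected in.
    """
    languages = {"residents": set(), "tourists": set()}
    for tweet in tweets:
        lang = detect_language(tweet["text"])
        if lang is None:
            continue  # skip tweets with no detectable language
        group = "residents" if tweet["home_city"] == city else "tourists"
        languages[group].add(lang)
    return {group: len(langs) for group, langs in languages.items()}


def toy_detector(text):
    """Stand-in detector for illustration only (assumption, not a real method)."""
    markers = {"the": "en", "el": "es", "le": "fr"}
    for word in text.lower().split():
        if word in markers:
            return markers[word]
    return None


# Made-up sample data, purely to show the expected input shape.
sample_tweets = [
    {"text": "Enjoying the park", "home_city": "London"},
    {"text": "Me gusta el parque", "home_city": "London"},
    {"text": "J'adore le parc", "home_city": "Paris"},
    {"text": "12345", "home_city": "Paris"},
]

language_counts = count_languages(sample_tweets, toy_detector, "London")
```

Keeping the detector as a parameter makes it easy to compare how sensitive the diversity proxy is to the choice of detection algorithm, which matters for short texts such as tweets.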
Mood analysis is a common research area in social media data mining, and one that has developed rapidly in the last few years. It offers unique insight into the soft dimensions of urban dynamics, and can be treated as a broad proxy for collective satisfaction with particular city elements or services.
We assessed mood by adapting the method of Dodds et al. (2011), which scores each message based on the 'happiness' of the words in a tweet. The method is attractive because of its simplicity and because of the extensive crowd-sourced corpus used to score individual words. However, it also has many limitations. Above all, it does not consider the context of a word, which can introduce bias given the complexity of human language. Any language processing technique is also sensitive to the length of the analyzed text, which is particularly important given the limited length of tweets.
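A word-based happiness score of this kind can be sketched as follows. This is a simplified illustration, not the adapted method itself: the scores in `TOY_HAPPINESS` are invented for the example and are not taken from the crowd-sourced corpus, the 1 to 9 scale and the exclusion of near-neutral words are assumptions based on our reading of the published method, and `tweet_happiness` is a hypothetical name.

```python
import re

# Invented scores on an assumed 1 (sad) to 9 (happy) scale.
# These are NOT values from the real crowd-sourced word list.
TOY_HAPPINESS = {
    "love": 8.4,
    "happy": 8.3,
    "rain": 5.0,
    "the": 5.0,
    "traffic": 3.2,
    "hate": 2.3,
}


def tweet_happiness(text, scores, centre=5.0, neutral_band=1.0):
    """Average the happiness of the rated words in one tweet.

    Words missing from `scores` are ignored. Words whose score lies
    within `neutral_band` of `centre` are also ignored (an assumption),
    so that near-neutral words do not dilute the signal.
    Returns None when no rated word remains.
    """
    words = re.findall(r"[a-z']+", text.lower())
    rated = [
        scores[w]
        for w in words
        if w in scores and abs(scores[w] - centre) >= neutral_band
    ]
    if not rated:
        return None
    return sum(rated) / len(rated)


positive_example = tweet_happiness("I love the rain", TOY_HAPPINESS)
negative_example = tweet_happiness("I hate traffic", TOY_HAPPINESS)
unrated_example = tweet_happiness("hello world", TOY_HAPPINESS)
```

The first example also illustrates the length problem noted above: only one rated word survives, so the whole tweet inherits the score of "love" regardless of context.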
To capture the spatial variability of mood in a city, we aggregated the mood scores of individual tweets into an index of prevailing mood, which indicates for every area whether the majority of tweets are positive or negative.
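One possible way to build such an index is sketched below. The definition of an area and the threshold separating positive from negative tweets are not specified here, so this sketch makes its own assumptions: areas are square cells of a regular latitude/longitude grid, a tweet counts as positive when its score is above a neutral value of 5.0, and `prevailing_mood` and `cell_size` are hypothetical names.

```python
import math
from collections import defaultdict


def prevailing_mood(scored_tweets, cell_size=0.01, neutral=5.0):
    """Label each grid cell by the majority sign of its tweets.

    scored_tweets: iterable of (latitude, longitude, score) tuples.
    cell_size: side of a grid cell in degrees (assumed area definition).
    neutral: score separating positive from negative (assumption).
    Returns a dict mapping cell indices to 'positive', 'negative' or 'tie'.
    """
    counts = defaultdict(lambda: [0, 0])  # cell -> [positive, negative]
    for lat, lon, score in scored_tweets:
        if score == neutral:
            continue  # exactly neutral tweets vote for neither side
        cell = (math.floor(lat / cell_size), math.floor(lon / cell_size))
        counts[cell][0 if score > neutral else 1] += 1

    index = {}
    for cell, (positive, negative) in counts.items():
        if positive > negative:
            index[cell] = "positive"
        elif negative > positive:
            index[cell] = "negative"
        else:
            index[cell] = "tie"
    return index


# Made-up coordinates and scores, purely to show the input shape.
sample_scores = [
    (51.505, -0.125, 7.0),
    (51.506, -0.124, 6.5),
    (51.504, -0.126, 3.0),
    (51.525, -0.105, 2.0),
]

mood_index = prevailing_mood(sample_scores)
```

A majority vote of this kind is robust to a few extreme tweets, but it discards how strongly positive or negative each tweet is; averaging the scores per cell would be an alternative design.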