Culture venues clustering in Toulouse¶


Author: Akim van Eersel
Date: 2020-12-15

Urban places with a high population density¶

In Toulouse, France, as in many other cities, some neighborhoods are well known for having many bars located very close to one another.
The question addressed in this work also concerns clustering, but not of bars: the theme is culture and the related places open to the public.

Since there are clusters of bars, are there geographic groupings of cultural points?
And if so, are these cultural places more or less grouped according to their category?

This analysis presents the distribution of cultural places referenced on Foursquare, with a summary analysis of their geographical distribution and of how it relates to their category.
It could also serve as a starting point for more in-depth conclusions, given additional work.

Data collection stage¶

  • The main points, with their location and category, are retrieved from the Foursquare database through its API.

    • However, Foursquare's data is biased: only a very small fraction of all the cultural places is listed.
  • From the Data.toulouse-metropole webpage, the comptages-pietons dataset counts pedestrian flows in different streets of Toulouse. It is published by Toulouse Métropole under the Open Database License, with the last data input on 2020-02-13.

    • These points could help to cluster the cultural venues better.

  • From the Foursquare API, 29 sites were collected and grouped into 4 cultural category bins: show, exposition-formation, play, monument.
  • Pedestrian flow counts consist of 3136 measures in 96 different streets over 5 years.
    • After cleaning irrelevant values and grouping by measurement address and year to take the median value, 79 data points remain.
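
The cleaning and grouping step for the pedestrian counts could look roughly like the sketch below. It is not the notebook's code: the record layout, the street names, the values, and the cleaning rule (drop missing or negative counts) are all assumptions made for illustration. It uses only the standard library; with pandas, a `groupby(...).median()` call would do the same job.

```python
from collections import defaultdict
from statistics import median

# Hypothetical raw records: (address, year, pedestrian count).
# All values are made up; None stands for an unusable measure.
raw_counts = [
    ("Rue A", 2018, 1200),
    ("Rue A", 2018, 1400),
    ("Rue A", 2019, 1000),
    ("Rue B", 2019, None),
    ("Rue B", 2019, 300),
    ("Rue B", 2019, -5),
]


def median_by_address_and_year(records):
    """Drop invalid counts, then take the median per (address, year)."""
    groups = defaultdict(list)
    for address, year, count in records:
        # Assumed cleaning rule: keep only non-negative numeric counts.
        if count is None or count < 0:
            continue
        groups[(address, year)].append(count)
    return {key: median(values) for key, values in groups.items()}


medians = median_by_address_and_year(raw_counts)
for (address, year), value in sorted(medians.items()):
    print(address, year, value)
```

Each remaining (address, year) pair becomes one data point, which is how several thousand raw measures can shrink to a few dozen points.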

Foursquare cultural venues¶

Note: Left-click on a point to get its name.

In [16]:
toulouse_map
Out[16]:
[Interactive map: Foursquare cultural venues]

Pedestrian flow counts¶

Note: Left-click on a point to get its address and median value.

In [24]:
toulouse_map
Out[24]:
[Interactive map: pedestrian flow count points]

Proximity criteria¶

The two datasets are linked by selecting, for each Foursquare site, the nearest pedestrian flow measurements. Here, the criterion is a simple disk centered on each cultural place: a measurement point that falls inside a disk is considered near the site at the center of that disk.
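
A minimal sketch of such a disk criterion is given here, assuming great-circle (haversine) distances on latitude/longitude pairs. The 300 m radius, the coordinates, and the names `haversine_m` and `counters_near` are illustrative assumptions, not values or functions taken from the notebook.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two points given in degrees."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = phi2 - phi1
    dlmb = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * asin(sqrt(a))


def counters_near(venue, counters, radius_m):
    """Return the counting points that fall inside the disk around a venue."""
    return [
        c for c in counters
        if haversine_m(venue["lat"], venue["lon"], c["lat"], c["lon"]) <= radius_m
    ]


# Illustrative coordinates only (roughly central Toulouse).
venue = {"name": "venue", "lat": 43.6045, "lon": 1.4440}
counters = [
    {"name": "near", "lat": 43.6050, "lon": 1.4445},
    {"name": "far", "lat": 43.6200, "lon": 1.4440},
]

nearby = counters_near(venue, counters, radius_m=300)  # assumed radius
print([c["name"] for c in nearby])
```

The radius is the only tuning parameter: a larger disk keeps more venues but links them to less local pedestrian flows.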

Note: Use down arrow key to see the second part of this dual slide.

In [30]:
toulouse_map
Out[30]:
[Interactive map: proximity disks around the cultural venues]

Filter Foursquare sites from proximity pedestrian flow counts¶

In [32]:
toulouse_map
Out[32]:
[Interactive map: Foursquare sites filtered by proximity to pedestrian flow counts]

Machine learning clustering¶

Without pedestrian flow counts¶

After selecting the best-fitting parameters from dendrograms for the hierarchical algorithm, the map of venues colored by cluster group shows an interesting split.
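
For readers unfamiliar with hierarchical clustering, the sketch below shows the bottom-up (agglomerative) idea in plain Python. It is not the notebook's implementation: average linkage, Euclidean distance, and the toy points are assumptions. In practice, `scipy.cluster.hierarchy` (`linkage`, `dendrogram`, `fcluster`) would be the usual route, and the dendrogram is what helps choose where to cut the tree.

```python
from itertools import combinations
from math import dist


def agglomerative(points, n_clusters):
    """Naive average-linkage agglomerative clustering.

    Returns a list of cluster labels aligned with `points`.
    """
    # Start with every point in its own cluster (stored as index lists).
    clusters = [[i] for i in range(len(points))]

    def average_linkage(a, b):
        # Mean pairwise Euclidean distance between members of two clusters.
        total = sum(dist(points[i], points[j]) for i in a for j in b)
        return total / (len(a) * len(b))

    while len(clusters) > n_clusters:
        # Find the closest pair of clusters (a < b) and merge b into a.
        a, b = min(
            combinations(range(len(clusters)), 2),
            key=lambda pair: average_linkage(clusters[pair[0]], clusters[pair[1]]),
        )
        clusters[a].extend(clusters[b])
        del clusters[b]

    labels = [0] * len(points)
    for label, members in enumerate(clusters):
        for i in members:
            labels[i] = label
    return labels


# Toy 2D points forming two well-separated groups (not real venue data).
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0), (5.1, 5.0), (5.0, 5.1)]
labels = agglomerative(points, n_clusters=2)
print(labels)
```

Stopping the merges at `n_clusters` plays the role of cutting the dendrogram at a chosen height.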

Note: Use down arrow key to see the second part of this dual slide.

In [35]:
toulouse_map
Out[35]:
[Interactive map: venues colored by cluster, without pedestrian flow counts]

With pedestrian flow counts¶

Among the venues kept by the proximity filter, monument has a single venue, so it is better to reassign it to the more relevant of the two remaining categories, which in this case is exposition-formation.
Only two categories are then left: show and exposition-formation, with 7 and 5 venues respectively.
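
The reassignment can be expressed as a small relabeling step. In this sketch the helper name `merge_singletons` is hypothetical, and the label list is built only from the counts stated above (7 show, 4 exposition-formation and 1 monument before merging).

```python
from collections import Counter

# Category labels of the venues kept by the proximity filter,
# reconstructed from the counts given in the text.
categories = ["show"] * 7 + ["exposition-formation"] * 4 + ["monument"]


def merge_singletons(labels, fallback):
    """Reassign every category that occurs only once to `fallback`."""
    counts = Counter(labels)
    return [fallback if counts[label] == 1 else label for label in labels]


merged = merge_singletons(categories, fallback="exposition-formation")
print(Counter(merged))
```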

In [38]:
toulouse_map
Out[38]:
[Interactive map: venues colored by cluster, with pedestrian flow counts]

Discussion¶

  • Without pedestrian flow counts:
    • Considering only the geographical aspect, the results seem surprisingly adequate. Generally speaking, each category is distributed with a dominance in one specific cluster.
    • However, because the categories are unevenly represented (14-6-6-3) and have few occurrences, clustering is necessarily less suited to certain categories (such as monument).
    • The upper-right cluster, n°1, includes 8 (57%) of the show venues. This group is strongly linked both to the geographical layout and to show sites.
    • For the other clusters, the geographical aspect seems to weigh more than the category, given how the categories are distributed.

  • With pedestrian flow counts:
    • Among the 4 clusters, each of the two venue categories is mainly concentrated in a different cluster.
    • However, with fewer data points, the upper-right cluster made mostly of show venues only vaguely remains and is heavily fragmented.
    • The median of the pedestrian flow counts plays an important role in the clustering, probably more than the category feature.
      • These data points and this method do not lead to relevant or different conclusions from before. For now, adding pedestrian flow counts affects the data too much by reducing it. Thus, unfortunately but as expected, this analysis is inconclusive.
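
Statements such as "cluster n°1 includes 57% of the show venues" come from cross-tabulating categories against cluster labels. A minimal sketch of that computation follows; the function name and the toy labels are assumptions, and the numbers are not the notebook's results.

```python
from collections import Counter


def category_share_by_cluster(categories, labels):
    """Map each (category, cluster) pair to the fraction of that
    category's venues that fall in that cluster."""
    totals = Counter(categories)
    pairs = Counter(zip(categories, labels))
    return {(cat, cl): n / totals[cat] for (cat, cl), n in pairs.items()}


# Toy data for illustration only: one category and one cluster label per venue.
categories = ["show", "show", "show", "play", "play", "monument"]
labels = [1, 1, 2, 2, 2, 3]

share = category_share_by_cluster(categories, labels)
for (cat, cl), fraction in sorted(share.items()):
    print(f"{cat:10s} cluster {cl}: {fraction:.0%}")
```

A category whose share is concentrated in a single cluster is "dominant" there in the sense used above; with very few venues per category, these fractions change sharply when a single venue moves.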

Conclusion¶

To return to the original question:
since there are clusters of bars, are there geographic groupings of cultural points?
And if so, are these cultural places more or less grouped according to their category?

In a sense, it is possible to say that Toulouse has at least one cluster of cultural places, and that it is strongly linked to one cultural category, namely show venues.
But these data points do not make it possible to assess clusters of cultural venues in the same way as clusters of bars and pubs, which are very dense and less dispersed.

However, adding categories and/or pedestrian flow counts to the clustering algorithms does not add any value.