Statistics Course

Abstract: This lighthearted study illustrates the use and analysis of a bipartite network. Following a long research tradition, we use data from within the university. In this graph, nodes are either professors or programs, and each edge indicates how many hours a given professor teaches within a given program.

Research leads and some hypotheses

Actors can engage in different kinds of strategies. Either they concentrate their resources on one activity, or they diversify their resource allocation among numerous activities.

This strategic choice can reveal what they seek from their participation in activities:

  • By concentrating their resources on one activity, actors gain privileged access to the activity’s public resources, and strong support from the activity.

  • By diversifying, actors build a larger network of ties and relationships. If one activity fails or stops, their risk of being expelled from the network is lessened.

Is there an optimal mix?

Actors’ experience within the network can also explain the outcome of the allocation problem. The more knowledgeable they are about the network, the more efficient their allocation decisions will be.

Over time, do we see actors imitating the strategies of the best-performing actors and activities?

New activities tend to consume more resources from fewer actors (low entropy). But the survival of an activity requires engagement from a wider range of actors (increasing entropy).

New actors enter the network with a given level of entropy. But their strategy is truly revealed afterwards, when they make rational choices to adjust their level of entropy.

Some activities rely on actors that are central to the network, whereas others seek resources at the periphery. In a similar way, actors may choose to make their way to the network’s core or, for some reason, remain at its periphery.

The data

For the purpose of this study, we gathered schedule data from a top-ranked university’s business school. The schedule data list training programs (activities) and professors (actors). Each professor chooses to teach in programs depending on their available time, their field of competence, and their knowledge of the program’s resources (in terms of money and contacts). The schedule reports, for each day, the number of hours each professor spent teaching in each program. The data are aggregated by school year over an eight-year period.

The data form a bipartite graph. Nodes are of two types: programs or professors. Edges are undirected but weighted by the number of hours a professor chooses to allocate to a program.
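The aggregation step can be sketched as follows. This is only an illustration: the record layout, the names `records`, `school_year` and `aggregate`, and the assumption that a school year starts in September are all hypothetical and do not come from the university's actual schedule export.

```python
from collections import defaultdict
from datetime import date

# Hypothetical daily records: (day, professor, program, hours taught).
records = [
    (date(2015, 10, 5), "prof_1", "prog_A", 3.0),
    (date(2016, 2, 12), "prof_1", "prog_A", 1.5),
    (date(2016, 9, 20), "prof_1", "prog_A", 2.0),
    (date(2016, 3, 1), "prof_2", "prog_B", 4.0),
]

def school_year(day, start_month=9):
    """Label a date by the calendar year in which its school year starts.

    The September start is an assumption made for this sketch.
    """
    return day.year if day.month >= start_month else day.year - 1

def aggregate(daily_records):
    """Sum hours per (school year, professor, program)."""
    totals = defaultdict(float)
    for day, professor, program, hours in daily_records:
        totals[(school_year(day), professor, program)] += hours
    return dict(totals)

yearly = aggregate(records)
for key in sorted(yearly):
    print(key, yearly[key])
```

Each key of `yearly` then corresponds to one weighted edge of the bipartite graph for a given school year.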

A dummy example to present metrics and methods

To help the reader gain better insight into what the graph representation of schedule data may look like, we provide a small dummy example in Figure 1 with 9 professors and 3 programs. Professors are represented by grey bullets and numbered from one to nine. Programs are represented by red squares and labeled A to C. The thickness of an edge reflects the number of hours a professor devotes to a program.

Figure 1: Dummy example network with 9 professors and 3 programs.

It is straightforward to note that professor 5 is diversifying their allocation of time over programs A through C, whereas professor 4, for example, is concentrating their resources in program B, and professor 8 is strongly linked to program C while keeping a small relation with program A. It should also be pointed out that programs do not necessarily “consume” the same amount of resources. Some programs may need a lighter overall engagement from professors, like A, whereas others are more demanding, like C. In a way, programs can also be said to have a diversification strategy. For example, program B relies heavily on professor 4 and may be endangered if that professor retires, whereas program A is less dependent on any given professor.

This network can be represented in matrix form, which clarifies the computation of the descriptive statistics used to analyze this set of relationships. The matrix is a table with programs as rows and professors as columns; each cell indicates how many hours a professor devotes to a program.

Prog    1    2    3    4    5    6    7    8    9
A       1    3    0    0    2    0    2    1    0
B       0    1    1.5  5    1    1    0    0    0
C       0    0    0    0    3.5  3    0    6    4.5
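As a minimal sketch, the matrix can be stored as a mapping from program to a row of hours and converted into the weighted edge list of the bipartite graph. The names `HOURS` and `edge_list` are chosen for this illustration only.

```python
# Hours copied from the dummy matrix: rows are programs, columns are professors 1..9.
HOURS = {
    "A": [1, 3, 0, 0, 2, 0, 2, 1, 0],
    "B": [0, 1, 1.5, 5, 1, 1, 0, 0, 0],
    "C": [0, 0, 0, 0, 3.5, 3, 0, 6, 4.5],
}

def edge_list(hours):
    """Return (program, professor, weight) triples for every non-zero cell."""
    return [
        (program, professor, weight)
        for program, row in hours.items()
        for professor, weight in enumerate(row, start=1)
        if weight > 0
    ]

edges = edge_list(HOURS)
print(len(edges))  # number of edges in the dummy network
for edge in edges:
    print(edge)
```

A zero cell means that no edge exists between that professor and that program, so it is left out of the list.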

Degree and degree distribution (separate distributions for professors and programs)

In a given graph, the degree of a node is the number of edges connected to it. For example, professor 7 has a degree of 1, professor 5 a degree of 3, and program C a degree of 4. The degree distribution shows how many nodes have a given degree. In bipartite graphs, these distributions are best reported separately for each type of node.
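The degrees quoted above can be checked with a short sketch that counts the non-zero cells of the dummy matrix. The variable names are illustrative.

```python
from collections import Counter

# Hours copied from the dummy matrix: rows are programs, columns are professors 1..9.
HOURS = {
    "A": [1, 3, 0, 0, 2, 0, 2, 1, 0],
    "B": [0, 1, 1.5, 5, 1, 1, 0, 0, 0],
    "C": [0, 0, 0, 0, 3.5, 3, 0, 6, 4.5],
}

# A program's degree is the number of non-zero cells in its row.
program_degree = {
    program: sum(1 for h in row if h > 0) for program, row in HOURS.items()
}

# A professor's degree is the number of non-zero cells in their column.
columns = list(zip(*HOURS.values()))
professor_degree = {
    professor: sum(1 for h in column if h > 0)
    for professor, column in enumerate(columns, start=1)
}

# One degree distribution per node type: degree -> number of nodes.
program_distribution = Counter(program_degree.values())
professor_distribution = Counter(professor_degree.values())

print(program_degree)
print(professor_degree)
print(dict(program_distribution), dict(professor_distribution))
```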

Resources consumed and offered

The total resources consumed by each program are the row sums of the matrix, and the total resources offered by each professor are its column sums. Which one to use depends on whether the analysis focuses on programs or on professors.
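For the dummy matrix, these totals can be obtained as in the following sketch (the names `consumed` and `offered` are illustrative):

```python
# Hours copied from the dummy matrix: rows are programs, columns are professors 1..9.
HOURS = {
    "A": [1, 3, 0, 0, 2, 0, 2, 1, 0],
    "B": [0, 1, 1.5, 5, 1, 1, 0, 0, 0],
    "C": [0, 0, 0, 0, 3.5, 3, 0, 6, 4.5],
}

# Row sums: hours consumed by each program.
consumed = {program: sum(row) for program, row in HOURS.items()}

# Column sums: hours offered by each professor (index 0 is professor 1).
offered = [sum(column) for column in zip(*HOURS.values())]

print(consumed)
print(offered)
```

Both sets of totals add up to the same grand total, since every hour offered by a professor is consumed by a program.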

Entropy (E)

\[E = - \sum_{i} \frac{x_i}{X} \log\left(\frac{x_i}{X}\right)\]

Here \(x_i\) is the number of hours carried by edge \(i\) of a given node, \(X\) is the node’s total number of hours (the sum of the \(x_i\)), and \(\log\) is the natural logarithm. Cells with zero hours contribute nothing to the sum.

Entropy is never negative, and reaches its maximum when, for a given professor (program), resources are equally distributed over the different programs (professors). Its value falls to zero when all of the professor’s (program’s) resources are devoted to a single program (professor). Hence, a strategy of concentration (diversification) of resource allocation will decrease (increase) entropy. We should also note that entropy is not affected by the total amount of resources a professor (program) offers (consumes).
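Both properties follow directly from the definition. As a short worked check, with \(n\) denoting the number of edges of the node and \(c > 0\) an arbitrary rescaling factor (both symbols are introduced here for illustration): when the resources are split equally, every share equals \(1/n\), so

\[E_{\max} = - \sum_{i=1}^{n} \frac{1}{n} \log\left(\frac{1}{n}\right) = \log(n)\]

and multiplying every edge weight by \(c\) leaves each share unchanged, hence the entropy as well:

\[\frac{c \, x_i}{c \, X} = \frac{x_i}{X}\]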

Example of an actor’s entropy

prof2: \(- \left( \frac{3}{4} \log\left(\frac{3}{4}\right) + \frac{1}{4} \log\left(\frac{1}{4}\right) \right) = 0.56\)

prof4: \(- \left( \frac{5}{5} \log\left(\frac{5}{5}\right) + 0 \right) = 0\)

prof5: \(- \left( \frac{2}{6.5} \log\left(\frac{2}{6.5}\right) + \frac{1}{6.5} \log\left(\frac{1}{6.5}\right) + \frac{3.5}{6.5} \log\left(\frac{3.5}{6.5}\right) \right) = 0.98\)
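These three values can be reproduced with a few lines of code. The sketch below assumes the natural logarithm, which is the base that matches the figures above; the function name `entropy` is illustrative.

```python
import math

# Hours copied from the dummy matrix: rows are programs, columns are professors 1..9.
HOURS = {
    "A": [1, 3, 0, 0, 2, 0, 2, 1, 0],
    "B": [0, 1, 1.5, 5, 1, 1, 0, 0, 0],
    "C": [0, 0, 0, 0, 3.5, 3, 0, 6, 4.5],
}

def entropy(weights):
    """Entropy of the shares w / total, using the natural logarithm.

    Zero weights are skipped, and each term is written as p * log(1 / p),
    which equals -p * log(p) and is never negative.
    """
    total = sum(weights)
    if total == 0:
        return 0.0
    return sum((w / total) * math.log(total / w) for w in weights if w > 0)

# One column per professor; professor 1 comes first.
columns = list(zip(*HOURS.values()))
professor_entropy = {
    professor: entropy(column) for professor, column in enumerate(columns, start=1)
}

for professor in (2, 4, 5):
    print(professor, round(professor_entropy[professor], 2))
```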

Example of an activity’s entropy
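No worked example survives under this heading in the source. As an illustration computed here from the dummy matrix with the same formula (these figures are not reported in the source), program A receives 1, 3, 2, 2 and 1 hours out of a total of 9:

\[E_A = - \left( 2 \times \frac{1}{9} \log\left(\frac{1}{9}\right) + \frac{3}{9} \log\left(\frac{3}{9}\right) + 2 \times \frac{2}{9} \log\left(\frac{2}{9}\right) \right) \approx 1.52\]

The same computation for program B, which receives 1, 1.5, 5, 1 and 1 hours out of 9.5, gives approximately 1.34. The lower value reflects its heavier reliance on professor 4.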




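Example of the total network entropy: treating each cell of the matrix as a share of the 35.5 hours in the whole network gives an entropy of 2.44.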
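The source does not spell out how the network-level figure is obtained. One reading that reproduces it, sketched below as an assumption, applies the entropy formula to every non-zero cell of the dummy matrix, with the grand total of hours as the denominator and the natural logarithm.

```python
import math

# Hours copied from the dummy matrix: rows are programs, columns are professors 1..9.
HOURS = {
    "A": [1, 3, 0, 0, 2, 0, 2, 1, 0],
    "B": [0, 1, 1.5, 5, 1, 1, 0, 0, 0],
    "C": [0, 0, 0, 0, 3.5, 3, 0, 6, 4.5],
}

# Every non-zero cell is one edge of the network.
cells = [h for row in HOURS.values() for h in row if h > 0]
grand_total = sum(cells)

# Entropy of the edge shares h / grand_total, written as p * log(1 / p).
network_entropy = sum((h / grand_total) * math.log(grand_total / h) for h in cells)

print(grand_total)
print(round(network_entropy, 2))
```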

The university network

Real data, and implications for reproducibility

Unlike email data (Enron network) or social ties (karate club), schedule data are widely available thanks to the digitalization of firms’ timetables. Hence, this kind of analysis is highly reproducible in any organization where schedules are managed online. Managers can perform such an analysis for the sake of their own careers, or to gain better insight into the organization they are managing.

A view of the network

Figure: The university network. Red dots are programs; blue dots are professors.
