Thursday, 25 November 2010

Assessing reliability

ReCal ("Reliability Calculator") is an online utility that computes intercoder/interrater reliability coefficients for nominal, ordinal, interval, or ratio-level data. I haven't tried it out yet, but will need to test reliability on my analysis, so I'm bookmarking it here.

Wednesday, 10 November 2010

Counting words in an Excel cell

=IF(LEN(TRIM(A1))=0,0,LEN(TRIM(A1))-LEN(SUBSTITUTE(A1," ",""))+1)

I knew there must be a way of doing this - but hadn't tracked down the exact formula until now. This one piece of information made the whole Excel course worthwhile. Now that I have the formula it looks like such an easy thing to Google that I can't think why I didn't manage to track it down by myself. I suppose I was just convinced that there would be a WORDCOUNT function buried somewhere in Excel.
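For anyone checking the formula's logic outside Excel, the same trim-then-count approach can be sketched in Python (this mirrors what the formula does, rather than calling any Excel API):

```python
def word_count(cell: str) -> int:
    """Count words the way the Excel formula does: trim surrounding
    whitespace, then count the words separated by spaces."""
    trimmed = cell.strip()
    if not trimmed:
        return 0  # matches the IF(LEN(TRIM(A1))=0,0,...) branch
    # split() collapses runs of whitespace, as Excel's TRIM does
    return len(trimmed.split())

print(word_count("  counting words in a cell  "))  # → 5
print(word_count(""))                              # → 0
```

The guard for the empty cell matters in both versions: without it, an empty string would be reported as one word.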

Friday, 17 September 2010

Analysis of Retweets on Twitter

Noting a useful resource.
Blog post by Brian Solis on 'The Science of Retweets on Twitter', examining a report by Dan Zarrella.
This is an overall analysis of Twitter, which provides some possible benchmarks for comparison when focusing on a subset of Twitter.
He records the most retweeted words and the least retweeted words - noting that retweets require a slightly higher reading age than Tweets.
Punctuation is more common in retweets than in Tweets - except for semi-colons, which show up as the 'only unretweetable punctuation mark'.
And retweets vary by day of the week and by time of day - with Friday evening being a top time for retweeting.

Wednesday, 15 September 2010

Analysing Elluminate dialogue

Bit of a summer break between blog posts, during which I've turned my attention from the Twitter stream around the OU online conference in June to the Elluminate chat around the presentations.
I'm still looking for indicators of exploratory talk that can be used to identify where learning is likely to be taking place.
People at the conference used chat a lot. For example, to take the afternoon session of 22 June, there were 858 separate contributions. I can divide these roughly into four groupings:
  • chat about content (526 contributions)
  • chat about tools (101 contributions about eg the conference format, Elluminate and Twitter)
  • social chat (215 contributions including hi, bye and thanks)
  • and blank contributions (16). 
That means about 61% of the chat was focused on the content of the presentation. This seems pretty high - I've got a presentation from an old ALT conference running on my computer at the moment, and nobody has typed anything in the chat box during the first half hour.
I've picked out 94 words and phrases that could be indicators of exploratory dialogue. These include 'have you looked at', 'have you read', 'do you mean', 'my understanding' and 'next step'.
Here's an example of a chat contribution that my indicators flag as a possible example of exploratory dialogue.
An initial run-through suggests that this list is good for picking out the areas where learning dialogue seems to be taking place. As you'd expect, the exploratory dialogue is mainly in the sections of the chat related to content, and there are not only areas where these indicators are more common but also people who use these words and phrases more than others.
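The flagging step described above can be sketched as a simple substring scan. The phrase list here is just the illustrative subset quoted in the post, not the full set of 94 indicators:

```python
# Illustrative subset of the 94 indicator words and phrases
INDICATORS = ["have you looked at", "have you read", "do you mean",
              "my understanding", "next step"]

def flag_exploratory(contribution: str) -> list[str]:
    """Return the indicator phrases found in one chat contribution."""
    text = contribution.lower()
    return [phrase for phrase in INDICATORS if phrase in text]

# Hypothetical chat contributions, for illustration only
chat = [
    "Have you read the Mercer chapter on exploratory talk?",
    "hi everyone!",
    "my understanding is that the next step is coding the data",
]
for line in chat:
    print(flag_exploratory(line))
```

A real run would also need to handle punctuation and spelling variants, but even this crude matching is enough to surface the content-focused stretches of the chat.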

Thursday, 19 August 2010

Retweets and hashtags as indicators of learning

Following on from my last post, I'm searching for examples of exploratory dialogue in Twitter, on the assumption that the presence of this type of dialogue suggests that learning is taking place. I'm looking at the 110 Tweets that went out using the conference hashtag #ouconf10 during the afternoon conference session on 22 June 2010. Previously identified characteristics of exploratory dialogue are: analysis, challenges, counter-challenges, explanations, explicit reasoning, justifications and reflection on the perspectives of others.
The table above codes the Tweets. By far the biggest category is the retweet. We don't retweet in F2F conversation, so this isn't an identified characteristic of exploratory dialogue.
Retweeting is akin to quotation, although perhaps quotation requires more cognitive input because it suggests that the quoter has remembered something (either the quotation or where it can be found) and has identified that it could be relevant to the conversation. Retweeting does not require the use of memory, but it does help to flag what participants in the dialogue are identifying as important elements. I therefore think it can be classified as cumulative dialogue, which is another (perhaps lower-level) form of learning dialogue.
In cumulative dialogue: 'Speakers build positively but uncritically on what the others have said. Partners use talk to construct ‘common knowledge’ by accumulation. Cumulative talk is characterized by repetitions, confirmations and elaborations' (Mercer & Littleton, 2007, p. 59).
If this is the case, a third of the conference Tweets in this sample can be characterised as cumulative dialogue. Is this an example of a learning analytic?
If a retweet contains a conference hashtag this is an indicator that learning may be taking place.
I'll take a look at this in more detail in my next post.

Wednesday, 18 August 2010

Identifying learning dialogue

I've switched to a different data source in my quest for learning analytics. I'm currently looking at interaction around the OU's online learning and technology conference this June. I have access to a lot of data around this conference, but I'm currently focusing on two elements: the Twitter stream, and the text chat that took place in Elluminate during the sessions.
The two data sources are superficially similar - short textual contributions shared online over a limited period of time and focused on similar subjects. Although they have asynchronous features, and have all been archived, they are largely synchronous communications.
I'm trying to dig into these to go beyond rich description of what happened, to find some underlying patterns that may be helpful for identifying/supporting learning in the future. I am currently focusing on / flitting between four areas: language, resources, individuals and networks.
In terms of language, I'm trying to identify patterns of speech and interaction that suggest learning may be taking place. I'm currently focusing on the patterns that Neil Mercer and his colleagues identified as characteristic of exploratory dialogue: analysis, explanations, explicit reasoning, justifications, reflection on the perspectives of others, challenges and counter-challenges. And I've narrowed the focus to learning about content, rather than to learning about the tools or learning about other people - so I'm setting aside discussions about how to make sure you can hear the speaker, or which types of biscuit are best to eat in a coffee break. I know those are all examples of learning, but I'm not looking for ways of encouraging reflection on the merits of custard creams.
This takes me on to resources - because descriptions of exploratory dialogue were developed in a face-to-face context, where the resources to hand were limited. In the context of online dialogue, maybe it is this linking out that supports exploration - or maybe it is linking out and then returning for discussion that is important - or maybe the important thing is linking out that moves the discussion to another venue (if that turns out to be the case it's going to be very difficult to research).
And maybe it's a mistake to set aside the people, because I may find that learning is associated with certain individuals. That may be because they are making interesting contributions, or because they are central nodes linking networks of people, or networks of resources, or because they are contributing think pieces elsewhere, or because they are asking interesting questions. In that case, learning about people may be key, because it's important to find and follow these people. So what marks individuals out as key? Is it because they initiated the hashtag, or is it that every conversation peters out if they are not involved, or do they mark themselves out as confident/off-the-wall by initiating conversations about virtual biscuits?
So many possibilities.

Friday, 16 July 2010

Learning/teaching analytics - digging deeper

Following on from my last post, why isn’t the news on my Research skills required by PhD students cloudscape as good as it looks at first sight?

Here are the viewing figures. 856 page views of my Cloudscape. Lovely, regular peaks in the viewing figures every Monday. What happens every Monday? I go and check out how much people are using my clouds – that’s what. So at least 400 of those page views are me collecting data. That, in turn, affects my bounce rate – which is 0% on all those data collecting days. In this case, the site’s own analytics are more helpful – the cloudscape has had 323 views (it doesn’t count the logged-in author).

Something else I get from the viewing figures is a spike on 28 June. What happened there? It seems that someone from The Open University arrived three times from Google with related research queries and spent 2 hours 50 minutes on the site. That’s great, they were obviously finding it useful – but that cuts everyone else’s average time on the site considerably.

More depressingly, here are 20 of the 32 searches that brought people to this cloud – where they then spent no time and bounced away from the site again. Some of them are probably in the wrong place and need to leave to look for IT skills or UK skills. The majority, though, should be staying – I’m losing at least 12% of visitors who should have found this a useful resource.

What do visitors see in that fraction of a second they spend on my Cloudscape before bouncing away? Possibly the first line of text ‘Research Skills required by PhD students, as defined by the UK Research Councils’. And, yes, half of those bouncing visitors are outside the UK – although only an eighth of views originate outside the UK. Time for a rewrite – these are generic research skills, relevant worldwide.

More broadly, what have I learned about implementing analytics on a learning site?
  • Search for a way to set aside visits by the site owner and the content author – it looks as if this needs to be done on the site itself, rather than through Google Analytics.
  • Set aside outliers for separate consideration.
  • Give content authors access to the keywords that are working for that content, and the keywords that are not.
  • In fact, if the aim is to improve the learning/teaching effectiveness of resources, it would be good if authors could access short analytics reports without having to filter requests through the site owner.

Google Analytics and learning/teaching potential

What else can Google Analytics tell me about the potential of my resources on Cloudworks as teaching materials? Well, I’d expect my Cloudscapes, which are essentially index pages, to lead people to the associated clouds (content pages).

I’ll focus here on Research Skills required by PhD Students. I created this back in February – and it involved a fairly straight transfer of Web materials to Web 2.0. Pretty much all of this material was already available on the Open University Intranet, I moved it over to Cloudworks, sorted out the dead links, added some new links and it’s now open for people to add to, comment on and discuss.

The original, Intranet, version of the web page didn’t work too well. The page stats over a 14-month period showed that people were following the links based on where they fell on the page. Links at the top left did very well, followed by the first link after each heading. Links on the bottom right were only followed once or twice a month. This looks like a classic browsing pattern - people arrive, click around to see what is on offer, but don’t make any serious use of the page or its linked resources.

When the material was added to Cloudworks, the pattern of usage became more even. The previously dominant A1: Recognising research problems was replaced by B2: Compliance with ethical requirements which, more recently, has been overtaken by B6: Justifying research methods. I like to think that that reflects seasonal variation in the concerns of PhD students at the OU (the target audience for the clouds and for the original site) but that’s currently just a guess.

If the Cloudscape is doing its job as a learning/teaching resource I’d expect to see a low level of bounces from that page, a high level of people moving through to the linked clouds, people moving through to the pages related to their original search terms, and those people who do move on to linked clouds spending at least a few minutes on the site.

I seem to have a fair-sized readership - 848 page views since February. Two-thirds of those arriving on the page are bouncing directly off the site, but those who arrive from elsewhere on the site don't tend to leave when they get to my cloudscape. I’m glad to see that they are following the links, and that those who stay on the site move to linked pages.

The unique visitors who landed directly on the page spent nearly 8 hours there in total, which averages out at around four minutes each. What’s more, people with relevant, detailed searches seem to be spending time on the site.

According to my original hypothesis, Google Analytics is showing this to be a page that looks likely to be supporting teaching and learning. People want to find out more about skills required for research, this page provides links to relevant material, they follow these links and spend some time looking at them.

All good news? Not entirely. In my next post I’ll look at some of the things that are going wrong, and the analytics that help to identify these.

Thursday, 15 July 2010

Using analytics to improve practice

I’m looking at a Cloudworks page I set up earlier this year, when I was creating resources for Open University research students in connection with a postgrad conference here at the university. Each of the presentations at the conference had its own cloud, and they have all received a fairly steady amount of visits each week since the conference. The most popular cloud, and the one I’m looking at here, is Skills Audit (currently with 199 views, according to the Cloudscape figures).

Students at the OU have to complete a Skills Audit at the end of the first year of their PhD – considering their progress in relation to a set of research skills identified and agreed by the UK research councils. Having done it myself, I’d say it was a valuable exercise, but I remember resenting having this chore imposed upon me.

Before I looked at Google Analytics, I was fairly pleased with this cloud and its performance. It’s aimed at first-year PhD students at the OU and it’s picking up around ten views a week at the moment. It links to complete lists of research skills and resources for developing and assessing them, it links to the actual skills audit that OU students have to fill in, and it links to the rest of the postgrad conference, and it contains a liveblog of a detailed conference session on the Skills Audit, as well as a biography of the speaker. Not only that, but the SocialLearn gadgets on the page point visitors to other related resources. All the links and info you need in one handy cloud.

I’m not so happy, though, when I get to the analytics. Visitors are spending an average of 43 seconds on the page, but only one or two have spent much longer there. More seriously, I have an 88.6% bounce rate, so visitors aren’t using this as a gateway to other resources on Cloudworks. This gets even worse (90.65%) when I look at the key words people used to get to the page.

Let’s take a closer look at a representative sample. That long line of zeroes represents time spent on the page, the 100%s represent the bounce rate and the exit rate. So people are arriving who should be looking for exactly this resource and, in less than a second, are consistently deciding that what I have provided is no good for them.

And, when I spend half a second on the cloud, I can see exactly why. The cloud begins with a long biography of the seminar presenter – that’s not what people are looking for. I know why it’s at the top – it was the only content I had available before the conference took place – but it shouldn’t be there now.

So the analytics have helped me to improve my own practice – now I need to think through how I can extend this finding in order to have a wider impact.

Learning analytics / teaching analytics

My current focus is on learning analytics. How can we tell from site analytics whether someone is learning, engaging in activities that have been shown to support learning, or exhibiting behaviours that are associated with learning? And, rather than develop Wheel 2.0, I'm looking at available analytics and whether they can be harnessed to do this. Hence the current focus on Google Analytics.

A problem is that identifying learning means that I need to be able to associate activity with specific individuals or groups of individuals. As I discussed below, I can do that to some limited extent with Google Analytics but it's not really set up for me to do that and, more to the point, focusing on individuals in this way feels intrusive and, I think, would need informed consent from those concerned if I pursued it to any extent.

So Google Analytics can give me some pointers as to whether learning activities/behaviours are taking place, but to link this to individual learners or groups of learners would involve another set of analytics, and those learners would have to be aware of what was taking place.

Coming at this from another perspective - how about teaching analytics? I'm not thinking here of time/motion studies about level of activity and output - I'm more interested in helping teachers / educators judge the value their output has for others. Google Analytics are potentially more helpful here, because the authors of online resources, and the creators of online discussions have publicly identified themselves, and so resources can be tied to individuals.

So, if I examine an online resource, I could look at how many visits it receives, how long those visits last and whether people move on to use linked resources. I can examine the effects of sharing a link to that resource on Twitter. How effective are my Twitter links compared with those of people widely known for their expertise in the field?

More broadly, if I look at an educational resource (I'm currently looking at Cloudworks) I can begin to identify the most effective resources and behaviours. In my next post, I'll describe some preliminary work I have done on this.

Wednesday, 14 July 2010

Hourly reports on Google Analytics

Google Analytics allows you to break down activity on your site by hour, but this function isn’t easy to find in the current version.

I have set up a custom report to do this (custom reporting is available on the left-hand side of your Google Analytics screen).

Custom reports are set up by dragging metrics (blue) from the left of the screen and adding dimensions (green).

I have dragged over Entrances – which counts how many people arrive on the site – and split it down by hours of the day. So, for example, I can see that between 9am and 10am there have been 1015 site entrances during the last month – and there have only been 100 between 1am and 2am.

As usual, I can filter the report by selecting a particular date, or range of dates, at the top of the screen. On 21 June most people arrived in the hour before midday, while only two people arrived between 1am and 2am.

My hourly report is set to drill down to ‘Page Title’ (that’s the second green dimension that I dragged on to my custom report). This means I can click on any hour of the day and see where those entrances took place during that hour.

As my focus is on the two-day OU online conference (21-22 June 2010) I can now focus right in and see where people arrived on the site during specific sessions.
What is more, I can then subdivide that information, so I know how many people arriving on a certain page during a certain session are new visitors, or which country they come from.
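The custom report does all this inside Google Analytics itself, but the underlying aggregation is easy to picture. A sketch of the equivalent by-hour count over exported entrance timestamps (the data here is hypothetical):

```python
from collections import Counter
from datetime import datetime

# Hypothetical exported entrance log: (timestamp, landing page title)
entrances = [
    ("2010-06-21 09:15", "Conference home"),
    ("2010-06-21 09:40", "Session 1"),
    ("2010-06-21 11:05", "Session 2"),
    ("2010-06-21 11:50", "Session 2"),
]

# Count entrances per hour of the day - the 'Entrances' metric
# split by the hour dimension
by_hour = Counter(
    datetime.strptime(ts, "%Y-%m-%d %H:%M").hour for ts, _ in entrances
)
for hour in sorted(by_hour):
    print(f"{hour:02d}:00-{hour:02d}:59  {by_hour[hour]} entrances")
```

Drilling down to page title would just mean counting over `(hour, title)` pairs instead of hour alone.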

By the time I have narrowed it to time, date and city where the visitor is based, though, the level of granularity is such that it has ethical implications, because I now have a fairly good idea who some of those individual visitors are. I can click through and see who their service provider is (for example ‘open university’, ‘university of leeds’) and what their connection speed is (an online conference on dial-up? - ouch).

Using Google Analytics at this level of granularity seems/is intrusive. That’s a tension for my work on learning analytics - because how can learning analytics work if they don’t split down to individual level? I guess the distinction has to be that they should be in the user’s hands: users should be able to turn them on and off, and to decide their own privacy levels in different situations.

Academic Analytics: A New Tool for a New Era

I've followed a link in OUseful Info to Educause, where I'm looking at
Academic Analytics: A New Tool for a New Era
John P. Campbell, Peter B. DeBlois, and Diana G. Oblinger
EDUCAUSE Review, vol. 42, no. 4 (July/August 2007): 40–57

This identifies several uses for analytics in education:
  • To manage enrolment, using standardised exam scores, high school coursework, and other information to determine which applicants will be admitted.
  • To inform fund-raising: by building a data warehouse containing information about alumni and friends, institutions can use predictive models to identify the donors who are most likely to give.
  • To aid retention, by identifying the students most at risk of dropping out.
  • To assess which proactive interventions have the best influence on academic success and retention.
  • To predict student success within a course.

They also highlight three characteristics of successful academic analytics-based projects (link to referenced PDF):
  1. Leaders who are committed to evidence-based decision-making
  2. Administrative staff who are skilled at data analysis
  3. A flexible technology platform that is available to collect, mine, and analyze data

Within the OU, the IET Student Statistics department takes a leading role in analytics projects like these. Other departments, such as Communications, also make use of analytics data.

My focus is on learning analytics - how we can use online analytics to identify learning, conditions that support learning and behaviours that support learning.

Building on previous experience

Tony Hirst helpfully Twittered a Google search term for locating his posts on Google Analytics and his uncourse
allintitle: "course analytics"

He began (24 October 2007) with a focus on:
"four distribution (rather than average) measures that are useful for analysing user behaviour on non-ecommerce websites:
* Visitor loyalty - how often has each user visited the site over a given period;
* Visitor recency - of all the people who have visited the site, how many have visited in the last N days;
* Length of visit - how long do visitors stay on site;
* Depth of visit - how many pages on the site are seen on each visit"
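The four distribution measures can be sketched over a toy visit log of (visitor id, visit date, minutes on site, pages viewed). The data here is hypothetical; real figures would come from the analytics export:

```python
from collections import Counter
from datetime import date

# Hypothetical visit log: (visitor id, date, minutes on site, pages)
visits = [
    ("a", date(2010, 7, 1), 25, 4),
    ("a", date(2010, 7, 8), 30, 6),
    ("b", date(2010, 7, 2), 5, 1),
    ("c", date(2010, 7, 9), 40, 8),
]
today = date(2010, 7, 10)

# Visitor loyalty: how often each visitor came in the period
loyalty = Counter(v[0] for v in visits)
# Visitor recency: who visited in the last N (here 7) days
recency = {v[0] for v in visits if (today - v[1]).days <= 7}
# Length of visit: minutes per visit; depth of visit: pages per visit
lengths = [v[2] for v in visits]
depths = [v[3] for v in visits]

print(loyalty)
print(sorted(recency))
print(lengths, depths)
```

The point of keeping these as distributions rather than averages is visible even here: one long, deep visit (visitor c) would drag an average well away from the typical visit.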

His focus (25 Oct) was on how "website analytics can be applied to online course websites in order to gain a better understanding of online study habits and the behaviour of students taking an online course". The length-of-visit figure gave an idea of how long students were willing to spend online studying course material (approx 30 mins in this case).

On 26 Oct he focused on timing of visits. Students on the course appeared to be less likely to visit on a Saturday, and seemed to be online more at lunchtimes and in the early to mid-evening. Apart from these daily and weekly patterns, there were also spikes associated with deadlines.

This is a small dataset (around 100 students) in the context of the OU, and they were studying an online computer course which makes them likely to be atypical in terms of computer use. Still, it gives some broad hypotheses - students will prefer online material provided in up to 30-minute chunks and they are more likely to be available for collaborative activities in lunchtimes and evenings.

Monday, 12 July 2010

Learning analytics

In this blog I will be thinking about learning analytics. How can we identify learning when it takes place online without formal assessment?