Overview

The following components are available in the 1.0 release:

  • Canvas Extractor
  • Google Classroom Extractor
  • Schoology Extractor
  • LMS Data Store Loader

Please see LMS Toolkit for more information about the purpose of these tools.

Note

The LMS Data Store Loader pushes CSV files, created by the extractors, into a SQL Server database. That database can be the same as an Ed-Fi ODS. However, all of the data are loaded into tables in the lms schema instead of the edfi schema.
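
For example, after the loader has run, a short sketch along these lines can confirm which tables were created in the lms schema. This is only an illustration: it assumes the pyodbc package and the Microsoft ODBC driver are installed, and the server and database names are hypothetical placeholders for your own SQL Server instance.

Code Block
language: python
# A minimal sketch to confirm which tables the LMS Data Store Loader created.
# Assumptions: pyodbc is installed; the server/database names below are
# hypothetical placeholders for your own SQL Server instance.
import pyodbc

connection = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=localhost;DATABASE=EdFi_ODS;Trusted_Connection=yes;"
)
cursor = connection.cursor()
cursor.execute(
    "SELECT TABLE_NAME FROM INFORMATION_SCHEMA.TABLES "
    "WHERE TABLE_SCHEMA = 'lms' ORDER BY TABLE_NAME"
)
for row in cursor.fetchall():
    print(row.TABLE_NAME)
connection.close()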

Prerequisites

Warning

Python 3.9.5 has a bug that causes the extractors to crash, and thus should not be used. The Alliance's testing has used 3.9.4.


Note
title: Note on Python Version

In practice, these tools have only been tested on Windows 10; however, they should work on any operating system that supports Python 3.9.
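
If in doubt about which interpreter will run the tools, a quick guard such as the following sketch can verify it up front:

Code Block
language: python
# Quick sanity check: refuse to run on the known-bad Python 3.9.5.
import sys

if sys.version_info[:3] == (3, 9, 5):
    raise SystemExit("Python 3.9.5 has a known bug; use 3.9.4 instead.")

print(f"Python {sys.version.split()[0]} OK")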

Running the Tools

The LMS Toolkit components can be installed as dependencies in other Python scripts, or they can be run as stand-alone command line scripts from the source code.

From Packages

The following commands install all four tools into the active virtual environment; however, each tool is independent, and you can install only the tools you need.

Code Block
language: bash
pip install edfi-canvas-extractor
pip install edfi-google-classroom-extractor
pip install edfi-schoology-extractor
pip install edfi-lms-ds-loader


Tip

To install the most current pre-release version, add the --pre flag on each command.

We have developed sample Jupyter notebooks that demonstrate execution of each extractor paired with execution of the LMS Data Store Loader.


From Source Code

The source code repository has detailed information on each tool. To get started, clone or download the repository and review the main readme file for instructions on how to configure and execute the extractors from the command line.


Runtime Arguments and Options

Whether you run the extractors by incorporating them into an existing Python package or by using the stand-alone command line utilities from the source repository, there are a number of required and optional arguments. When running from the command line, provide the --help option to see the full set of options for each extractor.

Argument | Applies To | Required? | Purpose
Feature | All | No | Defines which optional features are to be retrieved from the upstream system. Default: none. See the list of available features below.
Log Level | All | No | Valid options are: DEBUG, INFO (default), WARNING, ERROR, CRITICAL.
Output Directory | All | No | The output directory for the generated CSV files. Defaults to: ./data.
Sync database directory | All | No | Directory for storing a SQLite database that is used to synchronize data between successive executions of the tool. Defaults to: ./data.
Classroom account | Google Classroom | Yes | The email address of the Google Classroom admin account.
Usage start date | Google Classroom | No | Start date for the usage data pull, in YYYY-MM-DD format.
Usage end date | Google Classroom | No | End date for the usage data pull, in YYYY-MM-DD format.
Client key | Schoology | Yes | Schoology client key.
Client secret | Schoology | Yes | Schoology client secret.
Page size | Schoology | No | Page size for the paginated requests. Defaults to: 200. Maximum value: 200.
Input directory | Schoology | No | Input directory for usage CSV files.
Base URL | Canvas | Yes | The Canvas API base URL.
Access token | Canvas | Yes | The Canvas API access token.
Start Date | Canvas | Yes | Start date for the range of classes and events to include, in YYYY-MM-DD format.
End Date | Canvas | Yes | End date for the range of classes and events to include, in YYYY-MM-DD format.

Available features (for the Feature argument):

  • Activities: section activities and system activities.
  • Attendance: attendance data. Only applies to Schoology.
  • Assignments: assignments and submissions.
  • Grades: section-level grades (assignment grades are included on the submissions resource). Experimental and only implemented for Canvas at this time.

Note: Sections, Section Associations, and Users are always pulled from the source system.


Tip

To retrieve multiple features with one call to the command line interface, list them separated by spaces or commas. Examples:

Code Block
language: bash
# Two ways to get these three optional features:
poetry run python .\edfi_google_classroom_extractor -f activities, grades, assignments
poetry run python .\edfi_google_classroom_extractor -f activities grades assignments

# Retrieve only the "activities" data (in addition to the core data set).
# Note the use of the "long flag" instead of `-f`.
poetry run python .\edfi_google_classroom_extractor --feature activities


Using Extractor Output

The LMS Data Store Loader pushes the extractor-created CSV files into a SQL Server database, where the data are available for use via standard SQL Server interfaces and tools. However, the CSV files can also be consumed directly to perform many interesting analyses. We have developed a set of Jupyter notebooks that demonstrate analytics tasks that can be performed in Python with the Pandas framework, reading the raw CSV files. Sample output from these notebooks is visible directly in GitHub, without needing to run the code locally.
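
To give a flavor of what those notebooks do, the sketch below reads one extractor-generated CSV file with Pandas. The file name and column name are hypothetical, since the exact layout depends on the extractor and the features enabled:

Code Block
language: python
from pathlib import Path

import pandas as pd

# Hypothetical file name: the extractors write CSV files under the output
# directory (default: ./data); the exact folder layout varies by extractor.
csv_file = Path("./data/assignments.csv")

assignments = pd.read_csv(csv_file)
assignments.info()  # show column names, types, and row counts

# Example analysis, assuming the file has a SectionIdentifier column:
print(assignments.groupby("SectionIdentifier").size().sort_values(ascending=False))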

Operational Concerns

Logging



Logging Configuration When Installing from Packages

When you incorporate the LMS Toolkit components as package dependencies in other Python scripts, you need to pass the log level to the main facade class and define the logging format. For example:

Code Block
language: python
import logging
import sys

from edfi_schoology_extractor.helpers.arg_parser import MainArguments as s_args
from edfi_schoology_extractor import extract_facade

# Set up global logging
logging.basicConfig(stream=sys.stdout, level=logging.INFO)

# Prepare parameters (UPPERCASE names are placeholders for your own values)
arguments = s_args(
    client_key=KEY,
    client_secret=SECRET,
    output_directory=OUTPUT_DIRECTORY,
    # ----------- Here is the log level setting -----------
    log_level=LOG_LEVEL,
    # -----------------------------------------------------
    page_size=200,
    input_directory=None,
    sync_database_directory=SYNC_DATABASE_DIRECTORY,
)

# Run the Schoology extractor
extract_facade.run(arguments)



Logging Configuration When Running from Source Code

When running from source code, each extractor logs output to the console; these log messages can be captured by redirecting the output to a file:

Code Block
language: bash
title: Redirect to file
poetry run python edfi-canvas-extractor > 2021-05-02-canvas.log

The above example assumes that all configuration has been placed into a .env file or environment variables.

The log level defaults to INFO. You can reduce the number of log messages by changing the level to WARNING, or increase logging by changing it to DEBUG. The log level can be set at the command line, in a .env file, or in an environment variable (the exact variable name depends on the extractor; run the extractor with --help for more information).

Code Block
language: bash
title: Set level to DEBUG
poetry run python edfi-canvas-extractor --log-level DEBUG > 2021-05-02-canvas.log



Security

Upstream APIs

Each API has its own process for securing access. Please see the respective readme files for more information.

Data Storage

Because the LMS Toolkit deals with student data, both the filesystem and the database (if uploading to SQL Server) are subject to the same access restrictions as the Ed-Fi ODS database.

Database Permissions

The LMS Data Store Loader manages its own database tables. Thus, the first time you run the tool, the credentials used to connect to SQL Server need db_ddladmin rights in order to create the necessary tables. Subsequent executions can use an account with more restrictive permissions, such as the db_datawriter role.
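
For example, a database administrator might provision the loader's account along these lines. This is a sketch only: "lms_loader" is a hypothetical database user, the connection string is a placeholder, and the ALTER ROLE statements are standard SQL Server syntax.

Code Block
language: python
# Sketch of provisioning the loader's SQL Server account via pyodbc.
# "lms_loader" is a hypothetical database user; adjust the connection string.
import pyodbc

connection = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=localhost;DATABASE=EdFi_ODS;Trusted_Connection=yes;",
    autocommit=True,
)
cursor = connection.cursor()

# First run: db_ddladmin lets the loader create its tables in the lms schema.
cursor.execute("ALTER ROLE db_ddladmin ADD MEMBER lms_loader;")

# Subsequent runs: db_datawriter is enough for loading data.
cursor.execute("ALTER ROLE db_datawriter ADD MEMBER lms_loader;")
connection.close()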

Scheduling

The APIs provided by these three learning management systems are well defined at a granular level. From a performance perspective, this means that the process of getting a complete set of data is very chatty and may take a long time to run. The exact impact is difficult to predict, although the time will generally scale in proportion to the number of course sections. Some of the APIs also have no mechanism for restricting the date range or requesting only changed data, so each execution of those extractors re-pulls the entire data set.

If running on a daily basis, then we recommend running after normal school hours to minimize contention with network traffic to the source system. If running weekly, then it may be best to run over the weekend. 

It should be trivial to call these programs from Windows Task Scheduler, Linux cron, or a workflow engine such as Apache Airflow.
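
As one illustration, a minimal Apache Airflow DAG along these lines could run the Canvas extractor each weeknight after school hours. The repository path and schedule are assumptions to adapt, and configuration is presumed to come from a .env file as described above:

Code Block
language: python
# Minimal Airflow sketch (Airflow 2.x assumed): run the Canvas extractor
# weeknights at 22:00. The working directory below is hypothetical.
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

with DAG(
    dag_id="lms_toolkit_canvas_extractor",
    schedule_interval="0 22 * * 1-5",  # weeknights, after school hours
    start_date=datetime(2021, 5, 1),
    catchup=False,
) as dag:
    BashOperator(
        task_id="run_canvas_extractor",
        bash_command=(
            "cd /opt/lms-toolkit/src/canvas-extractor "
            "&& poetry run python edfi-canvas-extractor"
        ),
    )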
