Overview of encoded Application

This document does not contain installation or operating instructions. See README.rst for those.

Encoded is a python/javascript application for storing, modifying, retrieving and displaying the metadata (as JSON objects) for the ENCODE project. The application was designed specifically to store metadata for high-throughput genomics experiments, but the overall architecture is suitable for any set of highly linked objects.

The “deep” backend is a simple Postgres object database. The relational database does not store any specific information about the objects but simply tracks transactions and keys. CRUD (Create/Read/Update/Delete) in this database is governed by a python Pyramid app. This python app can stand alone and provide JSON objects via GET directly from the database.

Elasticsearch is used to deeply and robustly index the entire object store and provide extremely fast read access and powerful search capability.

The browser-accessible frontend is written in ReactJS and uses the same Pyramid URL dispatch as the backend, but converts the JSON returned by a GET request into XHTML for viewing in a web browser.

SOURCE CODE ORGANIZATION

The top level is organized into the following folders:

  • .ebextensions - contains all EB environment provisioning scripts
  • bin - contains some misc scripts, such as macpoetry-build and test
  • deploy - contains remaining deployment scripts
  • docs - contains the source of this documentation
  • examples - XXX: Unused?
  • jest
  • node_modules
  • parts - contains WSGI process executables
  • scripts - XXX: Unused?
  • src - main source code

The src directory contains all the python and javascript code for the frontend and backend:

  • commands - the python source for command line scripts used for synching, indexing and other utilities independent of the main Pyramid application
  • docs - contains some miscellaneous docs
  • locust - contains locust load testing code
  • schemas - JSON schemas (JSONSchema, JSON-LD) describing allowed types and values for all metadata objects
  • static - Frontend JS (components), SCSS/CSS (HTML styling), images, fonts and frontend JS libraries
  • tests - Unit and integration tests
  • upgrade - python instructions for upgrading stored objects from older schema versions to the latest schema
  • workflow_examples - XXX: document me
  • workflow_test_inserts - XXX: document me

BACKEND

  • Application (responds to web requests) - the main config files are *.ini in the root encoded directory.

Guts

views

The guts of the web application are in the views package. Views.views defines the Item and Collection classes that the web app will respond to via URLs like /{things}/ (returns a Collection of Things) and /{things}/{id} (returns a Thing).
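
The two URL shapes above can be illustrated with a minimal, framework-free sketch. It does not use Pyramid and is not the application's code; the STORE dictionary, the resolve function and the sample data are hypothetical names introduced only for this illustration.

```python
# Hypothetical in-memory store: collection name -> {id -> JSON object}.
STORE = {
    "experiments": {
        "exp1": {"uuid": "exp1", "title": "Example experiment"},
    },
}


def resolve(path):
    """Map a URL path to a collection listing or to a single item."""
    parts = [p for p in path.split("/") if p]
    if len(parts) == 1 and parts[0] in STORE:
        # /{things}/ -> every object in the collection
        return list(STORE[parts[0]].values())
    if len(parts) == 2 and parts[1] in STORE.get(parts[0], {}):
        # /{things}/{id} -> one object
        return STORE[parts[0]][parts[1]]
    raise KeyError(path)
```

Here resolve("/experiments/") yields a list of objects, while resolve("/experiments/exp1") yields a single object; any other path raises KeyError, which a real web app would turn into a 404 response.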

Other modules in the views package correspond to non-core views that the app will respond to.

  • user.py - user objects, which are handled specially
  • access_key.py - generation/modification of access keys for programmatic access
  • search.py - constructs the ES query and passes it through to :9200
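
As a rough idea of what constructing an ES query and passing it to port 9200 can look like, here is a sketch that only builds the request and does not send it. It is not the contents of search.py: the function names, the default host, the "_all" index and the choice of a query_string query are all assumptions made for illustration.

```python
import json


def build_search_body(text, size=25):
    """Build a simple Elasticsearch request body for a free-text search."""
    return {"query": {"query_string": {"query": text}}, "size": size}


def search_request(text, host="localhost", index="_all"):
    """Return the URL and JSON body that would be sent to Elasticsearch."""
    url = f"http://{host}:9200/{index}/_search"
    return url, json.dumps(build_search_body(text))


url, body = search_request("HeLa")
```

The returned pair could then be handed to any HTTP client. Keeping query construction separate from transport makes the query logic easy to test without a running Elasticsearch instance.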

snovault.py

snovault.py defines the core Collection and Item classes which are the python representation of linked JSON objects and groups (collections) of linked JSON objects. It contains the business logic for updating JSON objects via PATCH and the recursive GETs necessary for embedded objects.
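
The recursive GETs mentioned above can be pictured with a small sketch. It is not snovault's implementation; the OBJECTS store, the path-style references and the embed function are hypothetical, and real links may be represented differently (for example by uuid).

```python
# Hypothetical object store keyed by path-like identifiers.
OBJECTS = {
    "/labs/lab1": {"uuid": "lab1", "title": "Example Lab"},
    "/experiments/exp1": {
        "uuid": "exp1",
        "lab": "/labs/lab1",
        "replicates": ["/replicates/rep1"],
    },
    "/replicates/rep1": {"uuid": "rep1", "experiment": "/experiments/exp1"},
}


def embed(value, seen=()):
    """Recursively replace references to stored objects with the objects."""
    if isinstance(value, str) and value in OBJECTS:
        if value in seen:
            return value  # leave a back-reference flat to avoid cycling
        return embed(OBJECTS[value], seen + (value,))
    if isinstance(value, list):
        return [embed(v, seen) for v in value]
    if isinstance(value, dict):
        return {k: embed(v, seen) for k, v in value.items()}
    return value


full = embed("/experiments/exp1")
```

In this sketch the experiment's lab and replicates are expanded in place, while the replicate's link back to the experiment stays a plain reference, since expanding it would loop forever.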

AuthZ

  • authentication.py
  • authorization.py
  • persona.py
  • JSON data schema
    definition
    Each object type has a .json schema file in /schemas. The objects are linked and embedded within each other by reference, forming a graph structure. “Mixins” are sub-schemas included in more than one object type definition. Each schema file is versioned, and mapping an object from an older schema version to a newer one is called upgrading.
    validation
    Objects are validated as they are POSTed or PATCHed to the application (via HTTP). Not sure when/how the validation is hooked in
    upgrading
    No idea
    linked and embedded objects
    Sorcery
  • Postgres Storage
    • Loading
  • Elasticsearch & Indexing
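
The idea of versioned schemas and upgrading, described under "JSON data schema" above, can be sketched as a chain of small steps that each move an object up one version. This is only an illustration of the concept and not the code in the upgrade directory: the schema_version property name, the step functions and the property changes are all assumptions.

```python
def upgrade_1_to_2(obj):
    """Hypothetical step: version 2 renamed 'name' to 'title'."""
    obj = dict(obj)
    obj["title"] = obj.pop("name")
    return obj


def upgrade_2_to_3(obj):
    """Hypothetical step: version 3 added a 'status' property."""
    obj = dict(obj)
    obj.setdefault("status", "released")
    return obj


# Each step is registered under the version it upgrades from.
UPGRADES = {"1": upgrade_1_to_2, "2": upgrade_2_to_3}
CURRENT_VERSION = "3"


def upgrade(obj):
    """Apply upgrade steps in order until the object is at CURRENT_VERSION."""
    version = obj.get("schema_version", "1")
    while version != CURRENT_VERSION:
        obj = UPGRADES[version](obj)
        version = str(int(version) + 1)
        obj["schema_version"] = version
    return obj


old = {"schema_version": "1", "name": "x"}
new = upgrade(old)
```

Chaining single-version steps means an object stored at any older version can be brought up to date without writing a separate converter for every pair of versions.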

FRONTEND

The pyramid app handles all URL dispatch and fetches JSON objects from Elasticsearch (or optionally, the database directly). These can be either individual objects or Collections (arrays) of objects. The objects can either be “flat” with no linked objects embedded, or with some or all linked objects embedded in the response.

The scope of embedding is decided on an object-by-object basis, in the type definitions in the /src/encoded/types directory. Each object type has an ‘embedded’ list defined, which dictates which objects will be embedded during the Elasticsearch indexing process. Whole objects or specific fields of objects can be embedded. For linked objects (those with linkTo’s in the schema) that are not explicitly added to the ‘embedded’ list, three fields are included automatically, regardless of whether or not these are calculated properties: link_id, display_title, and uuid.
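
The embedding rules in the paragraph above can be sketched as follows. This is a simplified illustration, not the indexing code: the LINKED data, the embed_links function and the dotted entry format ("lab" for a whole object, "award.project" for one field) are assumptions, and the text does not say whether the three default fields are also added when specific fields are listed, so this sketch does not add them in that case.

```python
# The three fields the text above says are always included for linked objects.
DEFAULT_FIELDS = ("link_id", "display_title", "uuid")

# Hypothetical linked objects keyed by uuid; all values are placeholders.
LINKED = {
    "lab1": {"uuid": "lab1", "link_id": "link-lab1",
             "display_title": "Example Lab", "city": "Boston"},
    "award1": {"uuid": "award1", "link_id": "link-award1",
               "display_title": "Example Award", "project": "demo"},
    "user1": {"uuid": "user1", "link_id": "link-user1",
              "display_title": "Example User", "email": "user@example.org"},
}


def embed_links(item, link_fields, embedded):
    """Expand the linkTo fields of `item` according to an `embedded` list."""
    result = dict(item)
    for field in link_fields:
        target = LINKED[item[field]]
        prefix = field + "."
        wanted = [e[len(prefix):] for e in embedded if e.startswith(prefix)]
        if field in embedded:
            keys = list(target)      # whole object embedded
        elif wanted:
            keys = wanted            # only the listed fields
        else:
            keys = DEFAULT_FIELDS    # not listed: default fields only
        result[field] = {k: target[k] for k in keys if k in target}
    return result


item = {"uuid": "exp1", "lab": "lab1", "award": "award1", "submitted_by": "user1"}
expanded = embed_links(item, ["lab", "award", "submitted_by"],
                       embedded=["lab", "award.project"])
```

With this input the lab is embedded whole, the award carries only its project field, and submitted_by, which is not in the list, is reduced to link_id, display_title and uuid.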

For more information on embedding, see docs/embedding-and-indexing.rst in snovault.

  • renderers.py - code that determines whether to return HTML or JSON based on request, as well as code for starting the node subprocess renderer.js which converts the ReactJS pages into XHTML.
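
The HTML-or-JSON decision can be pictured with a short sketch. The actual rules in renderers.py are not described here, so this is only one plausible approach: the wants_json function is hypothetical, the format=json parameter comes from the API section of this document, and the use of the Accept header is an assumption.

```python
def wants_json(query_params, accept_header=""):
    """Return True when the response should be JSON rather than HTML."""
    if query_params.get("format") == "json":
        return True  # an explicit format=json always wins
    # Strip quality values such as ";q=0.9" and compare bare media types.
    accepted = [part.split(";")[0].strip() for part in accept_header.split(",")]
    return "application/json" in accepted and "text/html" not in accepted
```

A browser request that accepts text/html would get the rendered page, while a script that sends only Accept: application/json, or adds format=json to the URL, would get the raw JSON.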

Use of NodeJS

About ReactJS

Component Pages

HTML pages are written in Javascript using JSX and ReactJS. These files are in src/static/components. Each object type has a component which describes how both the individual item and the collection pages are rendered. Other pages include home and search. JSX allows the JS file itself to serve like an HTML template, similar to other web frameworks.

Boilerplate and Parent Classes

  • app.js
  • globals.js
  • mixins.js
  • errors.js
  • home.js
  • item.js
  • collection.js
  • fetched.js
  • edit.js
  • testing.js

User Pages (Templates)

  • index.js
  • antibody.js
  • biosample.js
  • dataset.js
  • experiment.js
  • platform.js
  • search.js
  • target.js

Views and Sections (Templates)

  • dbxref.js
  • navbar.js
  • footer.js

API

Parameters (to be supplied in POST object or via GET url parameters):

  • datastore=(database|elasticsearch) default: elasticsearch
  • format=json Return JSON objects instead of XHTML from browser.
  • limit=((int)|all) return only some or all objects in a collection
  • Searching
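
As an illustration of how the parameters above combine in a GET URL, here is a small sketch using only the Python standard library. The build_url helper, the example.org host and the /experiments/ path are hypothetical; only the parameter names and values come from the list above.

```python
from urllib.parse import urlencode


def build_url(base, path, **params):
    """Join a base URL, a path and optional query parameters."""
    query = urlencode(params)
    return f"{base}{path}?{query}" if query else f"{base}{path}"


url = build_url("https://example.org", "/experiments/",
                format="json", limit="all", datastore="database")
```

The resulting URL asks for every object in the collection as JSON, read directly from the database instead of from Elasticsearch.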