Overview of encoded Application

This document does not contain installation or operating instructions. See README.rst for those.

Encoded is a python/javascript application for storing, modifying, retrieving and displaying the metadata (as JSON objects) for the ENCODE project. The application was designed specifically to store metadata for high-throughput genomics experiments, but the overall architecture is suitable for any set of highly linked objects.

The “deep” backend is a simple Postgres object database. The relational database does not store any specific information about the objects but simply tracks transactions and keys. CRUD (Create/Read/Update/Delete) in this database is governed by a python Pyramid app. This python app can stand alone and provide JSON objects via GET directly from the database.

Elasticsearch is used to deeply and robustly index the entire object store and provide extremely fast read access and powerful search capability.

The browser-accessible frontend is written in ReactJS and uses the same Pyramid URL dispatch as the backend, but converts the JSON returned by a GET request into XHTML for viewing in a web browser.

SOURCE CODE ORGANIZATION

The top level is organized into the following folders:

.ebextensions - contains all EB (AWS Elastic Beanstalk) environment provisioning scripts

bin - contains some misc scripts, such as macpoetry-build and test

deploy - contains remaining deployment scripts

docs - contains the source of this documentation

examples - XXX: Unused?

jest

node_modules

parts - contains WSGI process executables

scripts - XXX: Unused?

src - main source code

The src directory contains all the Python and JavaScript code for the frontend and backend:

commands - the Python source for command line scripts used for syncing, indexing and other utilities independent of the main Pyramid application

docs - contains some miscellaneous docs

locust - contains locust load testing code

schemas - JSON schemas (JSONSchema, JSON-LD) describing allowed types and values for all metadata objects

static - Frontend JS (components), SCSS/CSS (HTML styling), images, fonts and frontend JS libraries

tests - Unit and integration tests

upgrade - python instructions for upgrading old objects stored to the latest schema

workflow_examples - XXX: document me

workflow_test_inserts - XXX: document me

BACKEND

Application (responds to web requests) - the main config files are *.ini in the root encoded directory.

Guts

views

The guts of the web application are in the views package. views.views defines the Item and Collection classes that the web app will respond to via URLs like /{things}/ (returns a Collection of Things) and /{things}/{id} (returns a Thing).
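The two URL shapes above can be illustrated with a small standard-library sketch. This is not how the views package is implemented (Pyramid's URL dispatch does the real work); the patterns, the function name `classify_path` and the sample id are all hypothetical and exist only to show how a path maps to a Collection or an Item.

```python
import re

# Hypothetical patterns mirroring the two URL shapes described above.
COLLECTION_RE = re.compile(r"^/(?P<things>[^/]+)/$")
ITEM_RE = re.compile(r"^/(?P<things>[^/]+)/(?P<id>[^/]+)/?$")


def classify_path(path):
    """Return ("collection", things, None) or ("item", things, id).

    Returns None when the path matches neither shape.
    """
    match = COLLECTION_RE.match(path)
    if match:
        return ("collection", match.group("things"), None)
    match = ITEM_RE.match(path)
    if match:
        return ("item", match.group("things"), match.group("id"))
    return None


if __name__ == "__main__":
    print(classify_path("/experiments/"))        # a Collection of Things
    print(classify_path("/experiments/abc123"))  # a single Thing
```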

Other modules in the views package correspond to non-core views that the app will respond to.

user.py - user objects are special-cased

access_key.py - generation/modification of access keys for programmatic access

search.py - constructs the ES query and passes it through to Elasticsearch (port 9200)

snovault.py

snovault.py defines the core Collection and Item classes which are the python representation of linked JSON objects and groups (collections) of linked JSON objects. It contains the business logic for updating JSON objects via PATCH and the recursive GETs necessary for embedded objects.

AuthZ

authentication.py

authorization.py

persona.py

JSON data schema

definition

Each object type has a .json schema file in /schemas. The objects are linked and embedded within each other by reference, forming a graph structure. “Mixins” are sub-schemas included in more than one object type definition. Each schema file is versioned, and mapping an object from an older schema to a new one is called upgrading.
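To make the ideas of mixins and upgrading concrete, here is a minimal sketch in plain Python. Everything in it is an assumption for illustration: the mixin contents, the `lab_name` to `lab` rename, the `schema_version` field name and both function names are hypothetical and do not reflect the actual schemas or the code in the upgrade directory.

```python
import copy

# Hypothetical mixin: a sub-schema shared by several object types.
SUBMITTED_MIXIN = {"properties": {"submitted_by": {"type": "string"}}}


def apply_mixins(schema, mixins):
    """Return a copy of `schema` with each mixin's properties merged in."""
    merged = copy.deepcopy(schema)
    props = merged.setdefault("properties", {})
    for mixin in mixins:
        for name, definition in mixin.get("properties", {}).items():
            # Properties declared directly on the schema take precedence.
            props.setdefault(name, copy.deepcopy(definition))
    return merged


def upgrade_1_to_2(obj):
    """Hypothetical upgrade step from schema version 1 to 2.

    Renames 'lab_name' to 'lab' and bumps the version, leaving the
    input object unmodified.
    """
    upgraded = dict(obj)
    if "lab_name" in upgraded:
        upgraded["lab"] = upgraded.pop("lab_name")
    upgraded["schema_version"] = "2"
    return upgraded
```

Writing each upgrade as a small function from one version to the next means several steps can be chained to bring a very old object up to the latest schema.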

validation

Objects are validated as they are POSTed or PATCHed to the application (via HTTP). Not sure when/how the validation is hooked in

upgrading

No idea

linked and embedded objects

Sorcery

Postgres Storage

Loading

Elasticsearch & Indexing

FRONTEND

The pyramid app handles all URL dispatch and fetches JSON objects from Elasticsearch (or optionally, the database directly). These can be either individual objects or Collections (arrays) of objects. The objects can either be “flat” with no linked objects embedded, or with some or all linked objects embedded in the response.

The scope of embedding is decided on an object-by-object basis, as defined in the /src/encoded/types directory. Each object type defines an ‘embedded’ list, which dictates which linked objects are embedded during the Elasticsearch indexing process. Whole objects or specific fields of objects can be embedded. For linked objects (linkTo’s in the schema) that are not explicitly added to the ‘embedded’ list, three fields are automatically included, regardless of whether they are calculated properties: link_id, display_title, and uuid.

For more information on embedding, see docs/embedding-and-indexing.rst in snovault.
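The embedding rule can be sketched as follows. This is a simplified illustration, not the snovault implementation: the in-memory `STORE`, the sample objects, the `link_id` value format and the function `embed` are hypothetical, and the sketch handles only whole-object embedding (not embedding of specific fields).

```python
# Hypothetical in-memory object store keyed by uuid.
STORE = {
    "u1": {
        "uuid": "u1",
        "link_id": "/labs/lab-a/",  # made-up value format
        "display_title": "Lab A",
        "city": "Boston",
    },
    "u2": {
        "uuid": "u2",
        "link_id": "/biosamples/sample-1/",
        "display_title": "Sample 1",
        "lab": "u1",  # a linkTo, stored as a reference
    },
}

# Fields included for linked objects that are not in the 'embedded' list.
DEFAULT_FIELDS = ("link_id", "display_title", "uuid")


def embed(obj, link_fields, embedded):
    """Return a copy of `obj` with its linked fields expanded.

    A field listed in `embedded` is replaced by the whole linked object;
    any other linked field gets only the three default fields.
    """
    result = dict(obj)
    for field in link_fields:
        if field not in obj:
            continue
        target = STORE[obj[field]]
        if field in embedded:
            result[field] = dict(target)
        else:
            result[field] = {key: target[key] for key in DEFAULT_FIELDS}
    return result
```

For example, `embed(STORE["u2"], ["lab"], ["lab"])` returns the sample with the full lab object in place of the reference, while passing an empty `embedded` list yields only `link_id`, `display_title` and `uuid` for the lab.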

renderers.py - code that determines whether to return HTML or JSON based on request, as well as code for starting the node subprocess renderer.js which converts the ReactJS pages into XHTML.

Use of NodeJS

About ReactJS

Component Pages

HTML pages are written in Javascript using JSX and ReactJS. These files are in src/static/components. Each object type has a component which describes how both the individual item and the collection pages are rendered. Other pages include home and search. JSX allows the JS file itself to serve like an HTML template, similar to other web frameworks.

Boilerplate and Parent Classes

app.js

globals.js

mixins.js

errors.js

home.js

item.js

collection.js

fetched.js

edit.js

testing.js

User Pages (Templates)

index.js

antibody.js

biosample.js

dataset.js

experiment.js

platform.js

search.js

target.js

Views and Sections (Templates)

dbxref.js

navbar.js

footer.js

API

Parameters (supplied in the POST object or as GET URL parameters):

datastore=(database|elasticsearch) default: elasticsearch

format=json return JSON objects instead of XHTML when requested from a browser

limit=((int)|all) return only some or all objects in a collection
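As a sketch of how these parameters combine in a GET request, the helper below assembles a query string with the standard library. The function name `build_url`, its keyword arguments, and the host and path used in the example are hypothetical; only the parameter names and values (datastore, format, limit) come from the list above.

```python
from urllib.parse import urlencode


def build_url(base, path, datastore="elasticsearch", limit=None, as_json=True):
    """Build a GET URL using the datastore, format and limit parameters."""
    if datastore not in ("database", "elasticsearch"):
        raise ValueError("datastore must be 'database' or 'elasticsearch'")
    params = {}
    if datastore != "elasticsearch":
        # elasticsearch is the default, so it is only sent when overridden.
        params["datastore"] = datastore
    if as_json:
        params["format"] = "json"
    if limit is not None:
        # limit is either an integer or the literal string "all".
        params["limit"] = str(limit)
    query = urlencode(params)
    return f"{base}{path}?{query}" if query else f"{base}{path}"


if __name__ == "__main__":
    # Host, port and collection name are placeholders.
    print(build_url("http://localhost:8000", "/experiments/", limit="all"))
```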

Searching