Overview

Data Collection

Architecture

SourceRank

Package Managers

Everything in Libraries.io begins with package managers. Background tasks regularly check each supported package manager for new or updated libraries. A library cannot be added to the system unless it exists on one of those package managers.
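The polling step can be pictured as a diff between what is already stored and a fresh listing from one package manager. This is a minimal sketch. It assumes, hypothetically, that each poll yields a mapping of library name to latest version string. Real package managers expose very different feeds, and `find_changes` is an illustrative name rather than part of Libraries.io.

```python
def find_changes(known: dict[str, str], listing: dict[str, str]) -> tuple[list[str], list[str]]:
    """Compare a fresh package-manager listing against stored state.

    Both arguments map library name -> latest version (a hypothetical shape).
    Returns (new_names, updated_names), each sorted alphabetically.
    """
    # Libraries only enter the system if the package manager lists them.
    new = sorted(name for name in listing if name not in known)
    # A stored library counts as updated when its latest version has changed.
    updated = sorted(
        name
        for name, version in listing.items()
        if name in known and known[name] != version
    )
    return new, updated


known = {"left-pad": "1.1.0", "lodash": "4.17.20"}
listing = {"left-pad": "1.1.0", "lodash": "4.17.21", "is-odd": "3.0.1"}
print(find_changes(known, listing))  # (['is-odd'], ['lodash'])
```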

Projects, Versions and Dependencies

Each library is stored in the Project table. If a library’s package manager supports published version numbers, each new version is stored in the Version table, and if the package manager supports library dependencies, they are recorded in the Dependency table.
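The storage rules above can be sketched as a small data model. The class names mirror the Project, Version and Dependency tables, but the fields, the `record_release` helper, and the choice to attach dependencies to the version that declares them are assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Dependency:
    name: str          # name of the library depended upon
    requirements: str  # version constraint as published, e.g. "^1.2.0"


@dataclass
class Version:
    number: str
    # Assumption: dependencies hang off the version that declares them.
    dependencies: list[Dependency] = field(default_factory=list)


@dataclass
class Project:
    platform: str  # the package manager, e.g. "npm"
    name: str
    versions: list[Version] = field(default_factory=list)


def record_release(project: Project, number: str, deps: list[tuple[str, str]],
                   supports_versions: bool, supports_dependencies: bool) -> None:
    """Store a release according to what the package manager supports (hypothetical helper)."""
    if not supports_versions:
        return  # nothing to put in the Version table
    if any(v.number == number for v in project.versions):
        return  # only new versions are stored
    version = Version(number)
    if supports_dependencies:
        version.dependencies = [Dependency(n, r) for n, r in deps]
    project.versions.append(version)


project = Project("npm", "example-lib")
record_release(project, "1.0.0", [("left-pad", "^1.1.0")], True, True)
record_release(project, "1.0.0", [], True, True)  # duplicate release is ignored
print(len(project.versions))  # 1
```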

Repositories

Libraries.io then augments package manager data with data from a repository, if the package manager references one. GitHub repositories are downloaded and stored in the Repository table. On creation, a background Sidekiq task is kicked off to download related information from GitHub, including:

Some package managers, such as Go and Bower, don’t have a concept of published versions and often fall back to using tags from a source repository when available. Libraries.io likewise attempts to use GitHub tags as a fallback for every package manager that doesn’t provide version information.
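A minimal sketch of the version fallback, assuming package-manager versions and repository tags are both available as plain strings. The tag filter (`VERSION_TAG`, which accepts tags like `1.2.3` or `v1.2` and strips the `v`) is a hypothetical heuristic, not the rule Libraries.io uses.

```python
import re

# Hypothetical heuristic: accept tags such as "1.2.3" or "v1.2" as version numbers.
VERSION_TAG = re.compile(r"^v?(\d+(?:\.\d+)*)$")


def resolve_versions(published: list[str], repo_tags: list[str]) -> list[str]:
    """Prefer package-manager versions; otherwise fall back to repository tags."""
    if published:
        return list(published)
    versions = []
    for tag in repo_tags:
        match = VERSION_TAG.match(tag)
        if match:  # skip tags like "latest" that don't look like versions
            versions.append(match.group(1))  # drop the optional "v" prefix
    return versions


print(resolve_versions([], ["v1.0.0", "latest", "2.1"]))  # ['1.0.0', '2.1']
```

This pattern skips tags such as `v2.0.0-rc1`; a real implementation would need its own policy for pre-release tags.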

Licenses

License names are run through SPDX. This standardises the many different ways of writing the same license into a single canonical name, which is then used for filtering in search and for the listing on https://libraries.io/licenses. A library can have multiple licenses, as it may contain other libraries whose license conditions are enforced upward. If a project doesn’t have any license data from the package manager, it falls back to using the (singular) Repository license.

If a project has a non-standard or commercial license it’s currently normalized to “Other” and is not indexed in search.
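A sketch of the normalisation and fallback rules above. `SPDX_ALIASES` is a tiny hypothetical alias table standing in for the full SPDX License List, and `normalize_license` and `project_licenses` are illustrative names. Unrecognised names, including commercial licenses, map to “Other”.

```python
from typing import Optional

# Hypothetical alias table; a real normaliser would consult the full SPDX License List.
SPDX_ALIASES = {
    "mit": "MIT",
    "mit license": "MIT",
    "the mit license": "MIT",
    "apache 2.0": "Apache-2.0",
    "apache-2.0": "Apache-2.0",
    "apache license, version 2.0": "Apache-2.0",
    "bsd-3-clause": "BSD-3-Clause",
    "isc": "ISC",
}


def normalize_license(name: str) -> str:
    """Map one free-form license name to an SPDX identifier, or "Other"."""
    return SPDX_ALIASES.get(name.strip().lower(), "Other")


def project_licenses(package_licenses: list[str], repository_license: Optional[str]) -> list[str]:
    """Normalise a project's licenses, falling back to the repository's single license."""
    raw = package_licenses or ([repository_license] if repository_license else [])
    result: list[str] = []
    for name in raw:
        spdx = normalize_license(name)
        if spdx not in result:  # "MIT" and "MIT License" collapse to one entry
            result.append(spdx)
    return result


print(project_licenses(["MIT License", "mit", "Apache 2.0"], None))  # ['MIT', 'Apache-2.0']
print(project_licenses([], "ISC"))  # ['ISC']
print(project_licenses(["Acme Commercial EULA"], None))  # ['Other']
```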

Architecture

Libraries.io is made up of a number of micro-services that work together. The following diagram provides a high-level overview:

Overview of Libraries.io architecture

Components

The main bits are:

Core Web App

Parser services

The API

Firehose: Server-sent Events API for Libraries.io releases

Libraries

GitHub Firehose

  • GitHub Firehose: Server-sent Events firehose of the GitHub public timeline
  • Dispatch: Dispatch events from various sources into the Libraries.io job queue
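The Dispatch service’s role can be sketched as routing incoming events to named background jobs. The event type names below are real GitHub event types, but the routing table, the job names, and the use of an in-memory `queue.Queue` in place of Libraries.io’s actual job queue are all assumptions for illustration.

```python
import queue

# Hypothetical routing table from GitHub event type to a background job name.
# The job names are illustrative, not Libraries.io's actual job classes.
ROUTES = {
    "PushEvent": "sync_repository",
    "ReleaseEvent": "sync_repository_tags",
    "CreateEvent": "sync_repository_tags",
}


def dispatch(events: list[dict], jobs: queue.Queue) -> int:
    """Enqueue a job for every event with a known route; return how many were enqueued."""
    enqueued = 0
    for event in events:
        job = ROUTES.get(event.get("type", ""))
        if job is None:
            continue  # e.g. WatchEvent: no job cares about it
        jobs.put((job, event["repo"]))
        enqueued += 1
    return enqueued


jobs = queue.Queue()
events = [{"type": "PushEvent", "repo": "rails/rails"}, {"type": "WatchEvent", "repo": "golang/go"}]
print(dispatch(events, jobs))  # 1
print(jobs.get_nowait())  # ('sync_repository', 'rails/rails')
```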

Webhooks

  • Lib2Issues: Create GitHub Issues from Libraries.io webhooks
  • Travis Rebuilder: Rerun Travis CI tests after any dependency is updated
  • Sentinel: Automated dependency updates for Node.js projects

Bots

Firehose

Tools

Other

  • Documentation: Documentation for the whole Libraries.io project
  • Support: Public issue tracker for Libraries.io users
  • Assets: Non-code assets for Libraries.io
  • GitHub Companion: Google Chrome extension that adds Libraries.io to GitHub repo pages
  • D3 Dependencies: D3 dependency graph visualization from the Libraries.io API

Retired repositories

  • Librarian: Node.js web service for parsing dependencies from manifests
  • Librarian-parsers: Node.js library for parsing dependencies from manifests
  • Librarian-cli: Node.js CLI for parsing dependencies from manifests
  • Gem Parser: Web service for parsing Ruby and CocoaPods manifests
  • GithubUrls: Parse GitHub repo details from a variety of URLs
  • LibHub: Minimalistic GitHub client for Node.js
  • Favicon: Generates Libraries.io favicons for a given colour or language
  • First PR Bot: Tweets whenever someone opens their first open source pull request on GitHub
  • Languages: Just the language names and colors from github-linguist

SourceRank

SourceRank is the name of the algorithm we use to rank search results. The maximum SourceRank score is currently around 30 points.

Our analysis is broken down into:

Code

  • Does the project have any outdated dependencies? Tag: any_outdated_dependencies Score: -2

Community

  • How many ‘stars’ does the project have? Tag: stars Score: +log(stars)/2
  • How many contributors does the project have? Tag: contributors Score: +log(contributors)/2
  • How many ‘subscribers’ does the project have? Tag: subscribers Score: +log(subscribers)/2
  • Has there been an update within the last six months? Tag: recently_pushed Score: +1
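A sketch of the Community scores. It assumes base-10 logarithms, a score of 0 for a count of zero (where the logarithm is undefined), no rounding, and “six months” approximated as 182 days. The document does not state the log base or any rounding, so treat these as placeholders; `community_score` is a hypothetical name.

```python
import math
from datetime import datetime, timedelta

# Assumption: "six months" is approximated as 182 days.
SIX_MONTHS = timedelta(days=182)


def log_term(count: int) -> float:
    # Assumption: base-10 logarithm; a count of zero scores 0 because log(0) is undefined.
    return math.log10(count) if count > 0 else 0.0


def community_score(stars: int, contributors: int, subscribers: int,
                    last_pushed: datetime, now: datetime) -> dict[str, float]:
    """Return the Community part of the breakdown, keyed by tag."""
    return {
        "stars": log_term(stars) / 2,
        "contributors": log_term(contributors) / 2,
        "subscribers": log_term(subscribers) / 2,
        "recently_pushed": 1 if now - last_pushed <= SIX_MONTHS else 0,
    }


now = datetime(2024, 6, 1)
breakdown = community_score(10_000, 100, 10, datetime(2024, 5, 1), now)
print(sum(breakdown.values()))
```

Under those assumptions, 10,000 stars, 100 contributors and 10 subscribers contribute 2, 1 and 0.5 points respectively, plus 1 point for the recent push, for 4.5 in total.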

Distribution

  • Is there a link to the source code? Tag: repository_present Score: +1
  • Does the project use versioning? Tag: versions_present Score: +1
  • Does every version use semantic versioning? Tag: follows_semver Score: +1
  • Has the project reached version 1.0.0 yet? Tag: one_point_oh Score: +1
  • Is the project more than six months old? Tag: not_brand_new Score: +1
  • Has the project had a release within the last six months? Tag: recent_release Score: +1
  • Are all published versions marked as ‘pre-release’ by the maintainer? Tag: all_prereleases Score: -2
  • Has the project been removed from the package manager? Tag: is_removed Score: -5
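A sketch of how the Distribution tags could be derived from a project’s release history. `Release`, `major`, `distribution_score` and the simplified `SEMVER` pattern are hypothetical. The sketch treats any version whose major number is at least 1 as having reached 1.0.0, and approximates six months as 182 days.

```python
import re
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional

SIX_MONTHS = timedelta(days=182)  # assumption: six months ~ 182 days
# Simplified SemVer check (MAJOR.MINOR.PATCH with optional pre-release/build);
# unlike the full SemVer 2.0.0 grammar, it does not reject leading zeros.
SEMVER = re.compile(r"^\d+\.\d+\.\d+(?:-[0-9A-Za-z.-]+)?(?:\+[0-9A-Za-z.-]+)?$")


@dataclass
class Release:
    number: str
    published_at: datetime
    prerelease: bool  # flag set by the maintainer on the package manager


def major(number: str) -> int:
    """Leading integer of a version string, or 0 if there is none."""
    match = re.match(r"v?(\d+)", number)
    return int(match.group(1)) if match else 0


def distribution_score(releases: list[Release], repository_url: Optional[str],
                       created_at: datetime, removed: bool, now: datetime) -> dict[str, int]:
    """Return the Distribution part of the breakdown, keyed by tag."""
    numbers = [r.number for r in releases]
    return {
        "repository_present": 1 if repository_url else 0,
        "versions_present": 1 if releases else 0,
        "follows_semver": 1 if releases and all(SEMVER.match(n) for n in numbers) else 0,
        "one_point_oh": 1 if any(major(n) >= 1 for n in numbers) else 0,
        "not_brand_new": 1 if now - created_at > SIX_MONTHS else 0,
        "recent_release": 1 if any(now - r.published_at <= SIX_MONTHS for r in releases) else 0,
        "all_prereleases": -2 if releases and all(r.prerelease for r in releases) else 0,
        "is_removed": -5 if removed else 0,
    }


now = datetime(2024, 6, 1)
releases = [Release("0.9.0", datetime(2023, 1, 1), False), Release("1.0.0", datetime(2024, 5, 1), False)]
scores = distribution_score(releases, "https://github.com/example/lib", datetime(2022, 1, 1), False, now)
print(sum(scores.values()))  # 6
```

For this example history all six positive checks pass and neither penalty applies.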

Documentation

  • Does the project have a readme file? Tag: readme_present Score: +1
  • Does the project have a valid license? Tag: license_present Score: +1
  • Does the project have a description, homepage, repository link or keywords? Tag: basic_info_present Score: +1
  • Is the project marked as deprecated by the owner? Tag: is_deprecated Score: -5
  • Is the project marked as unmaintained by the maintainer? Tag: is_unmaintained Score: -5
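A sketch of the Documentation tags. `ProjectInfo`, its fields and `documentation_score` are hypothetical. Reading the basic-info check as “at least one of description, homepage, repository link or keywords” is an interpretation of the “or” in the criterion; the real threshold may differ.

```python
from dataclasses import dataclass, field


@dataclass
class ProjectInfo:
    # Hypothetical fields standing in for the project's stored metadata.
    has_readme: bool = False
    license_valid: bool = False
    description: str = ""
    homepage: str = ""
    repository_url: str = ""
    keywords: list[str] = field(default_factory=list)
    deprecated: bool = False
    unmaintained: bool = False


def documentation_score(p: ProjectInfo) -> dict[str, int]:
    """Return the Documentation part of the breakdown, keyed by tag."""
    # Interpretation: any one non-empty field is enough for basic_info_present.
    basic_info = any([p.description, p.homepage, p.repository_url, p.keywords])
    return {
        "readme_present": 1 if p.has_readme else 0,
        "license_present": 1 if p.license_valid else 0,
        "basic_info_present": 1 if basic_info else 0,
        "is_deprecated": -5 if p.deprecated else 0,
        "is_unmaintained": -5 if p.unmaintained else 0,
    }


info = ProjectInfo(has_readme=True, license_valid=True, homepage="https://example.com")
print(sum(documentation_score(info).values()))  # 3
```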

Usage

  • How many Projects are dependent on this project? Tag: dependent_projects Score: +log(dependent_projects)*2
  • How many Repositories are dependent on this project? Tag: dependent_repositories Score: +log(dependent_repositories)
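A sketch of the Usage scores, together with a helper that adds up any number of per-category breakdowns into a single SourceRank value. Base-10 logarithms, a score of 0 for a zero count, and no rounding are assumptions, since the document does not specify them; `usage_score` and `source_rank` are hypothetical names.

```python
import math


def log_term(count: int) -> float:
    # Assumption: base-10 logarithm; zero dependents score 0 because log(0) is undefined.
    return math.log10(count) if count > 0 else 0.0


def usage_score(dependent_projects: int, dependent_repositories: int) -> dict[str, float]:
    """Return the Usage part of the breakdown, keyed by tag."""
    return {
        "dependent_projects": log_term(dependent_projects) * 2,
        "dependent_repositories": log_term(dependent_repositories),
    }


def source_rank(*breakdowns: dict[str, float]) -> float:
    """Add up every tag score from any number of per-category breakdowns."""
    return sum(score for breakdown in breakdowns for score in breakdown.values())


usage = usage_score(dependent_projects=1000, dependent_repositories=100)
other = {"readme_present": 1, "any_outdated_dependencies": -2}  # illustrative values
print(source_rank(usage, other))
```

With 1,000 dependent projects (3 * 2 = 6 points), 100 dependent repositories (2 points), a readme (+1) and outdated dependencies (-2), the total works out to 7 under these assumptions.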

TODO

Expand upon:

  • GitHub Firehose
  • Repository monitoring
  • Repository Dependencies
  • Distributed package managers (Carthage)
  • Notifications
  • Webhooks
  • Deprecated and unmaintained detection
  • Removal detection
  • Recommendations
  • Firehose
  • REST API
  • Dependency warnings
  • Project suggestions
  • Project mutes
  • Subscriptions