Overview
Package Managers
Everything in Libraries.io begins with package managers. Background tasks regularly check each supported package manager for new or updated libraries. A library cannot be added to the system unless it exists on one of those package managers.
Projects, Versions and Dependencies
Each library is stored in the Project table. If a library’s package manager supports published version numbers, each new version is stored in the Version table, and if the package manager supports library dependencies, they are recorded in the Dependency table.
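The relationship between these three tables can be sketched with Python dataclasses. This is an illustration only: the field names and the `record_release` helper are hypothetical rather than the real Libraries.io schema, and the sketch assumes dependencies are attached to individual versions.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Dependency:
    # Hypothetical fields: the library depended upon and its version requirement.
    project_name: str
    requirements: str


@dataclass
class Version:
    number: str
    # Assumption: dependencies are recorded per version.
    dependencies: list[Dependency] = field(default_factory=list)


@dataclass
class Project:
    name: str
    platform: str  # the package manager the library was found on
    versions: list[Version] = field(default_factory=list)


def record_release(project: Project, number: str, deps: dict[str, str] | None = None) -> Version:
    """Store a new version, plus its dependencies when the package manager reports them."""
    version = Version(number=number)
    for dep_name, requirement in (deps or {}).items():
        version.dependencies.append(Dependency(dep_name, requirement))
    project.versions.append(version)
    return version


# Usage with made-up data.
project = Project(name="example-lib", platform="npm")
record_release(project, "1.0.0", {"left-pad": "^1.3.0"})
```

For a package manager without dependency data, `record_release` is simply called without `deps`, and the version is stored with an empty dependency list.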
Repositories
Libraries.io then augments package manager data with data from a Repository, if the package manager references one. GitHub repositories are downloaded and stored in the Repository table; on creation, a background Sidekiq task is kicked off to download related information from GitHub, including:
- Readme
- Tags
- Contributors
- Owner (GitHub user or GitHub Org)
- Source repository (if it’s a fork)
- Dependency Manifests
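The create-then-enqueue flow can be sketched in Python, with the standard-library `queue` module standing in for Sidekiq. The names `create_repository`, `sync_repository` and `run_worker` are hypothetical, and the GitHub REST API paths are an assumption about which endpoints such a job might call; the source does not say which endpoints Libraries.io actually uses.

```python
from __future__ import annotations

import queue

# Stand-in for the Sidekiq job queue (assumption: a single in-process FIFO queue).
jobs: queue.Queue[tuple[str, str]] = queue.Queue()
# Stand-in for the Repository table, keyed by "owner/name".
repositories: dict[str, dict] = {}


def create_repository(full_name: str, is_fork: bool = False) -> dict:
    """Store the repository row, then enqueue a background sync job for it."""
    record = {"full_name": full_name, "fork": is_fork}
    repositories[full_name] = record
    jobs.put(("sync_repository", full_name))
    return record


def sync_repository(full_name: str) -> list[str]:
    """Return the GitHub REST API paths a sync job might request (hypothetical selection)."""
    owner = full_name.split("/")[0]
    paths = [
        f"/repos/{full_name}/readme",        # Readme
        f"/repos/{full_name}/tags",          # Tags
        f"/repos/{full_name}/contributors",  # Contributors
        f"/users/{owner}",                   # Owner (user or organization)
        f"/repos/{full_name}/contents/",     # Root listing, to look for dependency manifests
    ]
    if repositories[full_name]["fork"]:
        # For forks, the repository payload includes the parent and source repositories.
        paths.append(f"/repos/{full_name}")
    return paths


def run_worker() -> list[list[str]]:
    """Drain the queue the way a background worker would, one job at a time."""
    results = []
    while not jobs.empty():
        job_name, full_name = jobs.get()
        if job_name == "sync_repository":
            results.append(sync_repository(full_name))
        jobs.task_done()
    return results
```

Enqueuing the sync instead of running it inline keeps repository creation fast; the slower GitHub requests happen later, in the worker.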
Some package managers, such as Go and Bower, have no concept of published versions and often fall back to using tags from a source repository when one is available. Libraries.io likewise attempts to use GitHub tags as a fallback for all package managers that don’t provide version information.
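The tag fallback can be sketched as follows. The `versions_for` function and the tag-matching rule (strip an optional leading `v`, skip tags that don't start with a digit) are assumptions for illustration, not the rule Libraries.io applies.

```python
from __future__ import annotations

import re

# Hypothetical rule: a tag counts as a version if it starts with a digit, optionally after "v".
TAG_PATTERN = re.compile(r"^v?(\d.*)$")


def versions_for(published_versions: list[str], repository_tags: list[str]) -> list[str]:
    """Prefer versions from the package manager; otherwise fall back to repository tags."""
    if published_versions:
        return published_versions
    versions = []
    for tag in repository_tags:
        match = TAG_PATTERN.match(tag)
        if match:  # skip tags such as "latest" that don't look like version numbers
            versions.append(match.group(1))
    return versions
```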
Licenses
License names are run through SPDX, which standardises the many different ways of writing the same license into a single identifier. That identifier is used for filtering in search and for the listing at https://libraries.io/licenses. A library can have multiple licenses, as it may contain other libraries whose license conditions are enforced upward. If a project doesn’t have any license data from the package manager, it falls back to using the (singular) Repository license.
If a project has a non-standard or commercial license, it is currently normalized to “Other” and is not indexed in search.
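A minimal sketch of the normalization and fallback rules above. The alias table is a tiny hypothetical sample (the real work is done by the SPDX library, whose API is not shown here) and `normalize_licenses` and `searchable_licenses` are illustrative names; only the SPDX identifiers `MIT`, `Apache-2.0` and `BSD-3-Clause` are real.

```python
from __future__ import annotations

# Tiny hypothetical alias table; a real implementation would consult the full SPDX list.
SPDX_ALIASES = {
    "mit": "MIT",
    "mit license": "MIT",
    "apache 2.0": "Apache-2.0",
    "apache-2.0": "Apache-2.0",
    "apache license, version 2.0": "Apache-2.0",
    "bsd-3-clause": "BSD-3-Clause",
    "new bsd license": "BSD-3-Clause",
}


def normalize_licenses(package_licenses: list[str], repository_license: str | None) -> list[str]:
    """Map raw license names to SPDX identifiers, applying the fallbacks described above."""
    # Fall back to the single repository license when the package manager gives none.
    raw = package_licenses or ([repository_license] if repository_license else [])
    normalized = []
    for name in raw:
        spdx_id = SPDX_ALIASES.get(name.strip().lower(), "Other")  # non-standard -> "Other"
        if spdx_id not in normalized:
            normalized.append(spdx_id)
    return normalized


def searchable_licenses(licenses: list[str]) -> list[str]:
    """Licenses normalized to "Other" are left out of the search index."""
    return [name for name in licenses if name != "Other"]
```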
Architecture
Libraries.io is made up of a number of micro-services that work together. The following diagram provides a high-level overview:
Components
The main bits are:
Core Web App
- Libraries.io: The main website and the data store
Parser services
- mix-deps-json: Elixir parser for Hex dependency manifests
- clojars json: Converts clojars.org data to JSON
- Carthage Parser: Web service for parsing Carthage manifests
- Yarn Parser: Web service for parsing yarn.lock manifests
- Pydeps: Web service for calculating dependencies of Python modules via pip
- Cocoapods API: Web service for indexing the CocoaPods Specs repo
- gradle parser: Web service for parsing Gradle dependencies
- npm-update-stream: Web service for indexing the npm update stream
The API
- Firehose: Server-Sent Events API for Libraries.io releases
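The Firehose streams releases using the Server-Sent Events wire format, in which `field: value` lines build up an event and a blank line dispatches it. A small Python parser for that format is sketched below. It reads from a string rather than a live connection, ignores the `id` and `retry` fields, and the release payload in the usage lines is made up, since the real Firehose field names are not given here.

```python
from __future__ import annotations

import json
from collections.abc import Iterator


def parse_sse(stream: str) -> Iterator[tuple[str, str]]:
    """Yield (event_type, data) pairs from text in the Server-Sent Events wire format."""
    event_type, data_lines = "message", []
    for line in stream.splitlines():
        if not line:
            # A blank line dispatches the event collected so far.
            if data_lines:
                yield event_type, "\n".join(data_lines)
            event_type, data_lines = "message", []
            continue
        if line.startswith(":"):
            continue  # comment line, often used as a keep-alive
        field, _, value = line.partition(":")
        if value.startswith(" "):
            value = value[1:]  # the format strips one leading space from the value
        if field == "event":
            event_type = value
        elif field == "data":
            data_lines.append(value)
        # "id" and "retry" fields are ignored in this sketch.


# Hypothetical payload: these field names are not taken from the real Firehose.
sample = ': keep-alive\n\ndata: {"name": "left-pad", "version": "1.3.0"}\n\n'
for event_type, data in parse_sse(sample):
    release = json.loads(data)
    print(event_type, release["name"], release["version"])  # prints: message left-pad 1.3.0
```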
Libraries
- Bibliothecary: Parses manifest files
- Gemnasium Parser: An improved fork of gemnasium-parser
- Semantic Range: A Ruby port of node-semver for comparing semantic versions and checking whether they fall within ranges
- SemanticInterval: Converts interval range syntax into semantic version range syntax
- License Compatibility: Checks compatibility between different SPDX licenses
- SPDX: Standardises licenses
- Pictogram: Logos for programming languages and package managers
- Package Managers: Metadata about every package manager that Libraries.io supports
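As an illustration of the kind of check Semantic Range performs, here is a Python sketch of a single node-semver rule, the caret range, restricted to plain `MAJOR.MINOR.PATCH` versions without pre-release tags. The function names are hypothetical and this is not the library's API.

```python
from __future__ import annotations


def parse_version(text: str) -> tuple[int, int, int]:
    """Parse a plain MAJOR.MINOR.PATCH version (pre-release tags are not handled)."""
    major, minor, patch = (int(part) for part in text.split("."))
    return major, minor, patch


def caret_upper_bound(base: tuple[int, int, int]) -> tuple[int, int, int]:
    """Exclusive upper bound of ^base: the left-most non-zero component stays fixed."""
    major, minor, patch = base
    if major > 0:
        return (major + 1, 0, 0)
    if minor > 0:
        return (0, minor + 1, 0)
    return (0, 0, patch + 1)


def satisfies_caret(version: str, range_base: str) -> bool:
    """True if version is within ^range_base, e.g. ^1.2.3 means >=1.2.3 <2.0.0."""
    low = parse_version(range_base)
    return low <= parse_version(version) < caret_upper_bound(low)
```

Comparing tuples gives the component-by-component ordering that semantic versions need, which is why the sketch converts strings to integer tuples before comparing.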
GitHub Firehose
- GitHub Firehose: Server-Sent Events firehose of the GitHub public timeline
- Dispatch: Dispatches events from various sources into the Libraries.io job queue
Webhooks
- Lib2Issues: Creates GitHub issues from Libraries.io webhooks
- Travis Rebuilder: Reruns Travis CI tests after any dependency is updated
- Sentinel: Automated dependency updates for Node.js projects
Bots
- Just Open Sourced: Tweets whenever a repository is open-sourced on GitHub
- Libby: The Libraries.io Hubot
Firehose
- Firehose Stream: Live-streaming visualization of Libraries.io releases
Tools
- Required files (library): Ensures that certain files exist in all our repositories
- Required files: Files that should exist in every Libraries.io repository
- libsearch: CLI for searching Libraries.io via the API
- Picto: CLI for managing logos in Pictogram
Other
- Documentation: Documentation for the whole Libraries.io project
- Support: Public issue tracker for Libraries.io users
- Assets: Non-code assets for Libraries.io
- GitHub Companion: Google Chrome extension that adds Libraries.io to GitHub repository pages
- D3 Dependencies: D3 dependency graph visualization from the Libraries.io API
Retired repositories
- Librarian: Node.js web service for parsing dependencies from manifests
- Librarian-parsers: Node.js library for parsing dependencies from manifests
- Librarian-cli: Node.js CLI for parsing dependencies from manifests
- Gem Parser: Web service for parsing Ruby and CocoaPods manifests
- GithubUrls: Parses GitHub repository details from a variety of URLs
- LibHub: Minimalistic GitHub client for Node.js
- Favicon: Generates Libraries.io favicons for a given colour or language
- First PR Bot: Tweets whenever someone opens their first open source pull request on GitHub
- Languages: Just the language names and colors from github-linguist
SourceRank
SourceRank is the name of the algorithm we use to rank search results. The maximum SourceRank score is currently around 30 points.
Our analysis is broken down into:
Code
- Does the project have any outdated dependencies? Tag: any_outdated_dependencies, Score: -2
Community
- How many ‘stars’ does the project have? Tag: stars, Score: +log(stars)/2
- How many contributors does the project have? Tag: contributors, Score: +log(contributors)/2
- How many ‘subscribers’ does the project have? Tag: subscribers, Score: +log(subscribers)/2
- Has there been an update within the last six months? Tag: recently_pushed, Score: +1
Distribution
- Is there a link to the source code? Tag: repository_present, Score: +1
- Does the project use versioning? Tag: versions_present, Score: +1
- Does every version use semantic versioning? Tag: follows_semver, Score: +1
- Has the project reached version 1.0.0 yet? Tag: one_point_oh, Score: +1
- Is the project more than six months old? Tag: not_brand_new, Score: +1
- Has the project had a release within the last six months? Tag: recent_release, Score: +1
- Are all published versions marked as ‘pre-release’ by the maintainer? Tag: all_prereleases, Score: -2
- Has the project been removed from the package manager? Tag: is_removed, Score: -5
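Most Distribution criteria are simple yes/no checks over a project's versions and dates. The sketch below shows how they might be computed, assuming each version is given as a hypothetical `(number, published_at, is_prerelease)` tuple, “six months” is approximated as 182 days, and “semantic versioning” means matching a simplified `MAJOR.MINOR.PATCH[-pre][+build]` pattern. The function name and its inputs are illustrative, not taken from Libraries.io.

```python
from __future__ import annotations

import re
from datetime import datetime, timedelta

# Simplified semver pattern: MAJOR.MINOR.PATCH with optional pre-release and build parts.
SEMVER = re.compile(r"^(\d+)\.(\d+)\.(\d+)(?:-[0-9A-Za-z.-]+)?(?:\+[0-9A-Za-z.-]+)?$")
SIX_MONTHS = timedelta(days=182)  # assumption: "six months" approximated as 182 days


def distribution_score(
    repository_url: str | None,
    versions: list[tuple[str, datetime, bool]],  # (number, published_at, is_prerelease)
    created_at: datetime,
    is_removed: bool,
    now: datetime,
) -> int:
    score = 0
    if repository_url:
        score += 1  # repository_present
    if versions:
        score += 1  # versions_present
        matches = [SEMVER.match(number) for number, _, _ in versions]
        if all(matches):
            score += 1  # follows_semver
        # one_point_oh: some stable (non-pre-release) semver version with major >= 1.
        if any(m and int(m.group(1)) >= 1 and not pre for m, (_, _, pre) in zip(matches, versions)):
            score += 1
        if any(now - published <= SIX_MONTHS for _, published, _ in versions):
            score += 1  # recent_release
        if all(pre for _, _, pre in versions):
            score -= 2  # all_prereleases
    if now - created_at > SIX_MONTHS:
        score += 1  # not_brand_new
    if is_removed:
        score -= 5  # is_removed
    return score
```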
Documentation
- Does the project have a readme file? Tag: readme_present, Score: +1
- Does the project have a valid license? Tag: license_present, Score: +1
- Does the project have a description, homepage, repository link or keywords? Tag: basic_info_present, Score: +1
- Is the project marked as deprecated by the owner? Tag: is_deprecated, Score: -5
- Is the project marked as unmaintained by the maintainer? Tag: is_unmaintained, Score: -5
Usage
- How many Projects are dependent on this project? Tag: dependent_projects, Score: +log(dependent_projects)*2
- How many Repositories are dependent on this project? Tag: dependent_repositories, Score: +log(dependent_repositories)
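Putting the criteria together, the overall SourceRank is the sum of the points listed above. The sketch below encodes those scores as two tables and adds them up. The logarithm base and any rounding are not stated here, so the sketch assumes base-10 logarithms with no rounding and treats a zero count as contributing nothing; `source_rank`, `FLAG_POINTS` and `LOG_MULTIPLIERS` are hypothetical names.

```python
from __future__ import annotations

import math

# Points for each yes/no criterion, as listed above.
FLAG_POINTS = {
    "any_outdated_dependencies": -2,
    "recently_pushed": 1,
    "repository_present": 1,
    "versions_present": 1,
    "follows_semver": 1,
    "one_point_oh": 1,
    "not_brand_new": 1,
    "recent_release": 1,
    "all_prereleases": -2,
    "is_removed": -5,
    "readme_present": 1,
    "license_present": 1,
    "basic_info_present": 1,
    "is_deprecated": -5,
    "is_unmaintained": -5,
}

# Multiplier applied to log(count) for each count-based criterion, as listed above.
LOG_MULTIPLIERS = {
    "stars": 0.5,
    "contributors": 0.5,
    "subscribers": 0.5,
    "dependent_projects": 2.0,
    "dependent_repositories": 1.0,
}


def source_rank(flags: dict[str, bool], counts: dict[str, int]) -> float:
    """Add up the criteria. Assumption: log is base 10 and zero counts contribute 0."""
    score = float(sum(points for tag, points in FLAG_POINTS.items() if flags.get(tag, False)))
    for tag, multiplier in LOG_MULTIPLIERS.items():
        count = counts.get(tag, 0)
        if count > 0:
            score += math.log10(count) * multiplier
    return score
```

Under these assumptions, a project meeting the ten positive yes/no criteria, with 100 stars, 10 contributors, 10 subscribers, 100 dependent projects and 1,000 dependent repositories, would score 10 + 1 + 0.5 + 0.5 + 4 + 3 = 19.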
TODO
Expand upon:
- GitHub Firehose
- Repository monitoring
- Repository Dependencies
- Distributed package managers (Carthage)
- Notifications
- Webhooks
- Deprecated and unmaintained detection
- Removal detection
- Recommendations
- Firehose
- REST API
- Dependency warnings
- Project suggestions
- Project mutes
- Subscriptions