Tested on Ubuntu 20.04:

  • Install docker and docker-compose
  • Install and start Airbyte
      mkdir ~/build
      cd ~/build
      git clone https://github.com/airbytehq/airbyte.git
      cd airbyte
      # Edit BASIC_AUTH_PASSWORD in .env from the default
      ./run-ab-platform.sh 
    
  • Once you see the Airbyte banner, the UI is ready at http://localhost:8000. The server binds to 0.0.0.0:8000, so it listens on all network interfaces, not just localhost.
    • You will be asked for a username and password. By default, the username is airbyte and the password is password.
    • Once you deploy Airbyte to your servers, be sure to change these in your .env file.
    • Configure Airbyte to connect the Sample Data source to the Local JSON destination.
    • Pick test as the destination path. This produces three output files, one per stream, named /tmp/airbyte_local/_airbyte_raw_{stream_name}.jsonl.
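To confirm that the sync wrote data, the raw files can be inspected with a few lines of Python. This is a minimal sketch, assuming each line is a JSON object whose payload sits under an `_airbyte_data` key (check this against your own files). The helper name `read_airbyte_records` is hypothetical and not part of Airbyte.

```python
import json
from pathlib import Path


def read_airbyte_records(path):
    """Return the payload dict from each line of an Airbyte raw JSONL file.

    Assumption: every non-empty line is a JSON object and the synced record
    is stored under "_airbyte_data". Lines without that key are returned whole.
    """
    records = []
    with Path(path).open(encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue  # tolerate blank lines
            row = json.loads(line)
            records.append(row.get("_airbyte_data", row))
    return records
```

Call it with the path of one of the three output files, with {stream_name} replaced by the actual stream name, and print the first few records to see what the loader will receive.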
  • Create the build folder:
      mkdir ~/build/airbyte-dagster-langchain
      cd ~/build/airbyte-dagster-langchain
    
  • Set up venv, and enter the virtual environment:
      python3 -m venv ~/.venv/langchain
      . ~/.venv/langchain/bin/activate
      pip install openai faiss-cpu requests beautifulsoup4 tiktoken dagster_managed_elements langchain dagster dagster-airbyte dagit
    
  • Download https://github.com/airbytehq/dagster-langchain/blob/main/ingest.py
  • Edit the airbyte_loader to point it to one of the configured /tmp/airbyte_local/_airbyte_raw_{stream_name}.jsonl files
  • Add the username and password to the AirbyteResource:
      airbyte_instance = AirbyteResource(
          host="localhost",
          port="8000",
          username="airbyte",
          password="password",
      )
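Rather than hard-coding the credentials, they could be read from the environment, so that changing the password does not require editing ingest.py. This is a minimal sketch, assuming two hypothetical variable names, AIRBYTE_USERNAME and AIRBYTE_PASSWORD (neither is defined by Airbyte or Dagster), with the default credentials as fallbacks.

```python
import os


def airbyte_connection_settings(env=None):
    """Build keyword arguments for AirbyteResource from environment variables.

    AIRBYTE_USERNAME and AIRBYTE_PASSWORD are hypothetical names chosen for
    this sketch; the fallbacks are the default Airbyte credentials.
    """
    env = os.environ if env is None else env
    return {
        "host": "localhost",
        "port": "8000",
        "username": env.get("AIRBYTE_USERNAME", "airbyte"),
        "password": env.get("AIRBYTE_PASSWORD", "password"),
    }


# In ingest.py this would be used as:
#   airbyte_instance = AirbyteResource(**airbyte_connection_settings())
```

Passing a plain dict as `env` makes the function easy to try out without touching the real environment.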
  • Set up Dagster (DAGSTER_HOME must already be set; see the next step):
      if [ ! -d "$DAGSTER_HOME" ]; then mkdir -p "$DAGSTER_HOME"; fi
      touch "${DAGSTER_HOME}/dagster.yaml"
    
  • Put your OPENAI_API_KEY and DAGSTER_HOME in your ~/.bashrc:
      export OPENAI_API_KEY=XXXg
      export DAGSTER_HOME=~/build/airbyte-dagster-langchain/dagster_home    
    
  • Re-source ~/.bashrc if necessary, then re-enter the virtual environment.
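Before starting Dagster, it can help to confirm that the environment is set up as described. This is a minimal sketch of such a check. The function name `check_dagster_env` is hypothetical, and the checks only mirror the steps in this guide rather than anything Dagster itself enforces.

```python
import os
from pathlib import Path


def check_dagster_env(env=None):
    """Return a list of setup problems found (an empty list means all good)."""
    env = os.environ if env is None else env
    problems = []
    for name in ("OPENAI_API_KEY", "DAGSTER_HOME"):
        if not env.get(name):
            problems.append(f"{name} is not set")
    home = env.get("DAGSTER_HOME")
    if home:
        home_path = Path(home).expanduser()
        if not home_path.is_dir():
            problems.append(f"{home_path} is not a directory")
        elif not (home_path / "dagster.yaml").is_file():
            problems.append(f"{home_path / 'dagster.yaml'} is missing")
    return problems


if __name__ == "__main__":
    for problem in check_dagster_env():
        print("WARNING:", problem)
```

Run it inside the activated virtual environment; no output means both variables are set and dagster.yaml exists.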
  • Start Dagster:
      dagster dev -f ingest.py -h 0.0.0.0
    

    The -h parameter sets the host that the Dagster UI (dagit) binds to, and -p sets the port (default 3000). To ingest multiple sources, use ingest_all.py instead of ingest.py.

  • Start Streamlit:
      . ~/.venv/streamlit/bin/activate
      streamlit run chatbot.py --server.port 8080
    
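  • Open the UI at port 8080, the value passed to --server.port (Streamlit's default port, 8501, applies only when that flag is omitted).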
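With three services running, a quick reachability check can show which one is down. This is a minimal sketch using plain TCP connections. The port numbers are taken from the commands in this guide (8000 for Airbyte, 3000 for Dagster, 8080 as passed to --server.port for Streamlit), and the names `is_listening` and `SERVICES` are hypothetical.

```python
import socket


def is_listening(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


# Ports as used in this guide; adjust if you changed -p or --server.port.
SERVICES = {"Airbyte": 8000, "Dagster": 3000, "Streamlit": 8080}

if __name__ == "__main__":
    for name, port in SERVICES.items():
        state = "up" if is_listening("localhost", port) else "not reachable"
        print(f"{name} (port {port}): {state}")
```

A successful connection only shows that something is listening on the port, not that the service behind it is healthy.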